Okay, so some of you might be using Mellanox FPGA cards, which basically bypass the bus to give lower latency on your network response time.
Now, say you have been running an OS like SUSE with a boatload of bonded NICs, and you want to migrate the OS and all the bonded NIC configurations in an automated fashion using Ansible or some other configuration management tool.
What some of you might run into is that when the OS comes up for the first time, some of the Mellanox NICs boot up in InfiniBand mode, which results in the bonded NICs showing up as down. I will show you how to determine this and fix it.
.
So the first thing you want to do is determine which bonds are showing down.
How to check which bonds are down:
1. grep -c down /proc/net/bonding/*
◦ This prints, for each bond, a count of the lines containing "down". Any bond with a count above 0 has an interface that is down.
Example
root@ansibleclient:~> grep -c down /proc/net/bonding/*
.
/proc/net/bonding/bond1:0
/proc/net/bonding/bond2:0
/proc/net/bonding/bond3:1 (this indicates that one interface is down)
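The grep count tells you which bond has a problem, but not which interface. Here is a rough sketch that prints the bond and the down slave together. The function name down_slaves is my own, and it assumes the usual /proc/net/bonding layout where each "Slave Interface:" line is followed by that slave's "MII Status:" line.

```shell
#!/bin/sh
# Sketch only: down_slaves is a made-up helper name, not a system tool.
# Prints "<bond> <slave>" for every slave reporting "MII Status: down".
# $1 is the bonding directory (defaults to /proc/net/bonding).
down_slaves() (
    dir=${1:-/proc/net/bonding}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        awk -v bond="$(basename "$f")" '
            # Remember which slave section we are in
            /^Slave Interface:/ { slave = $3 }
            # The bond-level MII Status comes before any slave, so skip it
            /^MII Status: down/ && slave != "" { print bond, slave }
        ' "$f"
    done
)
```

Running down_slaves with no argument reads the live /proc/net/bonding directory.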
.
2. Once you have determined that a bond has an interface that is down, you want to figure out whether it's the Mellanox NIC.
• cat /proc/net/bonding/bond3
i. This will give you the names and MAC addresses of the NICs inside the bond.
Example
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
.
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth4
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
.
Slave Interface: eth4
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:02:c9:e9:e9:11
Slave queue ID: 0
.
Slave Interface: eth5
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:02:c9:e9:e9:12
Slave queue ID: 0
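If you have a lot of bonds, reading each file by eye gets old quickly. This is a small sketch that pulls out just the slave names and their permanent MAC addresses. The name bond_slave_macs is made up for this example.

```shell
#!/bin/sh
# Sketch only: bond_slave_macs is a hypothetical helper name.
# Prints "<slave> <permanent MAC>" for each slave in the bond file given as $1.
bond_slave_macs() (
    awk '
        /^Slave Interface:/   { slave = $3 }
        # Fields: Permanent(1) HW(2) addr:(3) MAC(4)
        /^Permanent HW addr:/ { print slave, $4 }
    ' "$1"
)
```

For example, bond_slave_macs /proc/net/bonding/bond3 would list one line per slave for the bond shown above.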
.
3. What you want to do next is run 'ip a' and see if those interfaces are listed.
.
Example: the output should look something like this. It also shows whether each interface is up or down, which is very important when troubleshooting. If you don't see the down NIC here (for our example, let's say it's eth5), this could mean it's in InfiniBand mode and not Ethernet mode.
.
[root@nick ansible]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:26:9a:33:59 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
valid_lft 82770sec preferred_lft 82770sec
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:26:88:5a:fd brd ff:ff:ff:ff:ff:ff
inet 192.168.1.11/24 brd 192.168.1.255 scope global noprefixroute dynamic enp0s8
valid_lft 82773sec preferred_lft 82773sec
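Instead of scanning the 'ip a' output by eye, you can check the kernel's interface list directly. A minimal sketch, assuming you already know which interface names you expect (for example from your ifcfg files). The helper name missing_ifaces is my own.

```shell
#!/bin/sh
# Sketch only: missing_ifaces is a hypothetical helper name.
# $1 is the directory listing network interfaces (normally /sys/class/net),
# the remaining arguments are the interface names you expect to exist.
# Prints every expected name that is not present.
missing_ifaces() (
    net_dir=$1
    shift
    for name in "$@"; do
        [ -e "$net_dir/$name" ] || echo "$name"
    done
)
```

Something like missing_ifaces /sys/class/net eth4 eth5 would print eth5 if that interface never showed up as an Ethernet device.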
.
4. Okay, now we need to determine whether eth5 is in fact the Mellanox card, so we need the NIC information, which 'ethtool -i' will give us.
.
Example
It will look something like this.
.
[root@nick ansible]# ethtool -i eth5
driver: e1000
version: 7.3.21-k8-NAPI
firmware-version:
expansion-rom-version:
bus-info: 0000:18:00.0 (this is the important info you need)
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
.
• Now you want to take the bus info and determine whether it is in fact the Mellanox card.
.
Example
.
[root@nick ansible]# lspci -s 0000:18:00.0
18:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]
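The same lookup can be done without ethtool and lspci by reading sysfs. This is a sketch under a couple of assumptions: the helper names pci_addr_of and is_mellanox are mine, and the check relies on 0x15b3 being the Mellanox PCI vendor ID.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.

# Print the PCI address behind a network interface.
# $1 = interface name, $2 = interface directory (defaults to /sys/class/net).
pci_addr_of() (
    # <iface>/device is a symlink to the PCI device directory
    basename "$(readlink -f "${2:-/sys/class/net}/$1/device")"
)

# Succeed if the PCI device directory in $1 belongs to a Mellanox card.
# 0x15b3 is the Mellanox PCI vendor ID.
is_mellanox() (
    [ "$(cat "$1/vendor" 2>/dev/null)" = "0x15b3" ]
)
```

On a live box you could combine them as: is_mellanox "/sys/bus/pci/devices/$(pci_addr_of eth5)" && echo "eth5 is Mellanox".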
.
5. Okay, now we know for sure that this NIC is in fact the Mellanox NIC that is down. We want to manually force it into Ethernet mode, but first check what mode each port reports. The mlx4 driver numbers the ports starting at 1.
• cat /sys/bus/pci/devices/0000\:18\:00.0/mlx4_port1
ii. If this doesn't return "ETH", then it's in InfiniBand mode.
• cat /sys/bus/pci/devices/0000\:18\:00.0/mlx4_port2
iii. If this doesn't return "ETH", then it's in InfiniBand mode.
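To check every port on the card in one go, something like the following works as a read-only sketch. The function name show_port_modes is made up, and it simply prints whatever each mlx4_port file contains.

```shell
#!/bin/sh
# Sketch only: show_port_modes is a hypothetical helper name.
# $1 is the PCI device directory, e.g. /sys/bus/pci/devices/0000:18:00.0
# Prints "<port file> <reported mode>" for each mlx4 port found there.
show_port_modes() (
    dev_dir=$1
    for port in "$dev_dir"/mlx4_port[0-9]*; do
        [ -e "$port" ] || continue   # no ports found, nothing to print
        printf '%s %s\n' "$(basename "$port")" "$(cat "$port")"
    done
)
```

Nothing is written to the card here, so it is safe to run on a production box.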
.
6. Now what we want to do is manually change the NIC to Ethernet mode.
• echo eth > /sys/bus/pci/devices/0000\:18\:00.0/mlx4_port1
• echo eth > /sys/bus/pci/devices/0000\:18\:00.0/mlx4_port2
iv. If you cat them now, they should say "ETH".
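If you don't want to touch ports that are already fine, here is a sketch that only writes to the ports not reporting Ethernet. The name force_eth is my own, and the comparison is case-insensitive because I am not assuming the exact casing the driver prints.

```shell
#!/bin/sh
# Sketch only: force_eth is a hypothetical helper name.
# $1 is the PCI device directory, e.g. /sys/bus/pci/devices/0000:18:00.0
# Writes "eth" to every mlx4 port that does not already report Ethernet.
force_eth() (
    dev_dir=$1
    for port in "$dev_dir"/mlx4_port[0-9]*; do
        [ -e "$port" ] || continue
        # Lower-case the reported mode so "ETH" and "eth" compare the same
        mode=$(tr '[:upper:]' '[:lower:]' < "$port")
        case $mode in
            *eth*) ;;   # already Ethernet, leave it alone
            *)
                echo "switching $(basename "$port") from '$mode' to eth"
                echo eth > "$port"
                ;;
        esac
    done
)
```

You would call it as force_eth /sys/bus/pci/devices/0000:18:00.0, as root.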
.
Okay, so now when you do 'ip a' you should see the NICs up, and if you check the status of the bonds there should be 0 interfaces down. You might have to bring the interface down and up.
.
7. You can do this simply by
• ifdown eth5 && ifup eth5
v. If there are no errors, the cursor will simply move to the next line after a brief delay.
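The link can take a moment to come back after the ifdown and ifup, so checking the bond immediately may still show it down. Here is a sketch that polls the bond file for a few seconds. The function name wait_for_bond and the default of 10 tries are my own choices.

```shell
#!/bin/sh
# Sketch only: wait_for_bond is a hypothetical helper name.
# $1 = bond file (e.g. /proc/net/bonding/bond3), $2 = number of tries (default 10).
# Succeeds as soon as no line reports "MII Status: down", fails after the last try.
wait_for_bond() (
    bond_file=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if ! grep -q '^MII Status: down' "$bond_file"; then
            exit 0   # only leaves this function's subshell
        fi
        tries=$((tries - 1))
        sleep 1
    done
    exit 1
)
```

A typical use would be: ifdown eth5 && ifup eth5 && wait_for_bond /proc/net/bonding/bond3 || echo "bond3 still has a down interface".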
.
.
Now, the issue here is that you may not be able to get RPMs from Mellanox that are supported by the patching process in your organisation. In that case you're going to need a way to ensure that the NIC starts up in Ethernet mode whenever the server reboots; otherwise you could be in a very bad situation if the server boots and the NIC comes up in InfiniBand mode.
.
So there are a couple of ideas I came up with to solve this.
Options:
1. You can simply add the echo lines to /etc/rc.local
• echo eth > /sys/bus/pci/devices/0000\:18\:00.0/mlx4_port1
• echo eth > /sys/bus/pci/devices/0000\:18\:00.0/mlx4_port2
i. This should bring the ports back to "ETH"; however, you might need to add some more lines to bring the interface up properly.
.
2. This is the approach I chose, and the cooler way to go about it. In Red Hat 7 you can define an ifup-pre-local script, which will run any time "ifup" is run.
Here is how you set that up.
1. Create a file called "/etc/sysconfig/network-scripts/ifup-pre-local"
a. vi /etc/sysconfig/network-scripts/ifup-pre-local
.
2. Now you can add whatever script you want. My colleague and I came up with a script that looks at the MAC addresses in the ifcfg files and at the PCI bus info, and if one of those MACs shows up on an InfiniBand interface, it runs the echo to move that card's ports into eth mode.
.
ADD this inside and save the file
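#!/bin/bash
#
# ifup-pre-local: move Mellanox (mlx4) ports that came up in InfiniBand mode back to Ethernet.

LID="00:00:00:00"   # ID of the last MAC that matched, used to count ports per card
for i in `ls /etc/sysconfig/network-scripts/ifcfg-* 2> /dev/null`
do
    # Pull the quoted MAC out of lines such as HWADDR="00:02:c9:e9:e9:11"
    for j in `grep HWADDR "$i" | awk -F\" '{print $2}'`
    do
        ID1=$(echo "$j" | awk -F: '{print $2":"$3}')   # bytes 2-3 of the MAC
        ID2=$(echo "$j" | awk -F: '{print $4":"$5}')   # bytes 4-5 of the MAC
        ID="$ID1:$ID2"
        PORT=$(echo "$j" | cut -c 16-17)               # last byte of the MAC (not used below)
        # Any ibN interface under a PCI device means that port is in InfiniBand mode
        for k in `ls /sys/bus/pci/devices/0000\:*\:00.0/net/ib[0-9]/address 2> /dev/null`
        do
            # Does the InfiniBand address contain this MAC's bytes?
            grep "$ID1.*$ID2" "$k" 1> /dev/null
            if [ $? -eq 0 ]; then
                # First match for this card is port 1, the next match is port 2
                if [ "x$ID" != "x$LID" ]; then
                    mlxport=1
                else
                    let "mlxport++"
                fi
                LID=$ID
                # Field 6 of the path is the PCI address, e.g. 0000:18:00.0
                p=$(echo "$k" | awk -F/ '{print "/sys/bus/pci/devices/"$6"/"}')
                echo "Running: echo eth > ${p}mlx4_port${mlxport}"
                echo eth > "${p}mlx4_port${mlxport}"
            fi
        done
    done
done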
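The least obvious part of the script is how it matches a MAC from the ifcfg file against an InfiniBand address, so here is that step pulled out on its own as a sketch. The helper names are mine, and the InfiniBand address in the usage note is a made-up value for illustration, not output from a real card.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.

# Build the grep pattern the script uses from a MAC address:
# bytes 2-3, then anything, then bytes 4-5.
mac_to_pattern() (
    id1=$(echo "$1" | awk -F: '{print $2":"$3}')
    id2=$(echo "$1" | awk -F: '{print $4":"$5}')
    echo "$id1.*$id2"
)

# Succeed if the InfiniBand address in $2 matches the MAC in $1.
mac_matches_ib() (
    echo "$2" | grep -q "$(mac_to_pattern "$1")"
)
```

For the MAC 00:02:c9:e9:e9:11 the pattern is 02:c9.*e9:e9, which would match an illustrative address such as 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:e9:e9:11. The ".*" in the middle is there because the MAC bytes are not contiguous inside the InfiniBand address.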
.
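3. Next you want to create a symlink inside /sbin, because that is where ifup looks for the script.
c. Now create the symlink for ifup-pre-local and make sure the script is executable
ii. ln -s /etc/sysconfig/network-scripts/ifup-pre-local /sbin/ifup-pre-local
iii. chmod +x /etc/sysconfig/network-scripts/ifup-pre-local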
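If you want to do the install and the sanity check in one step, a sketch like this would do it. The function name install_hook is made up, and it assumes ifup only calls the hook when the file is executable.

```shell
#!/bin/sh
# Sketch only: install_hook is a hypothetical helper name.
# $1 = the script you wrote, $2 = where the symlink should live.
# Makes the script executable, (re)creates the symlink, then verifies
# that the link resolves to something executable.
install_hook() (
    src=$1
    dest=$2
    chmod +x "$src" &&
    ln -sfn "$src" "$dest" &&
    [ -x "$dest" ]
)
```

On the server itself that would be install_hook /etc/sysconfig/network-scripts/ifup-pre-local /sbin/ifup-pre-local, run as root.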
.
Now when you run ifup, it will run that script, which checks whether any of those buses and MACs are in InfiniBand mode and brings them into eth mode. It's safer to do it this way, because if you restart the network and for some reason the NIC goes back into InfiniBand mode, someone new who had no idea about this would spend a while trying to figure it out.
.
.
.
How to deploy this fix via an Ansible role coming soon……
.
.
.