This post will take you through the installation and configuration of an Infiniband card on a server running Red Hat Enterprise Linux 5.4. These steps are applicable to any version of Red Hat 5, and will probably work with version 6 as well. It has been surprisingly hard to find all of these steps in one document.
Required packages
openib-1.4.1-6.el5.noarch
libibverbs-1.1.3-2.el5.x86_64
libnes-0.9.0-2.el5.x86_64
libibumad-1.3.3-1.el5.x86_64
opensm-libs-3.3.3-2.el5.x86_64
swig-1.3.29-2.el5.x86_64
ibutils-libs-1.2-11.1.el5.x86_64
ibutils-1.2-11.1.el5.x86_64
(provides ibdiagnet and others)
opensm-3.3.3-2.el5.x86_64
libibmad-1.3.3-1.el5.x86_64
infiniband-diags-1.5.3-1.el5.x86_64
(provides handy tools like ibstat and ibstatus)
libibverbs-utils-1.1.3-2.el5.x86_64
(provides handy tools ibv_devinfo and ibv_devices)
libibverbs-devel-1.1.3-2.el5.x86_64
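If your system is registered with the appropriate Red Hat channels, all of the above can be pulled in with a single yum transaction (the -libs packages come along as dependencies):

$ yum install openib libibverbs libibverbs-utils libibverbs-devel \
      libibumad libibmad libnes opensm swig ibutils infiniband-diags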
Hardware
First, make sure your hardware is working correctly:
$ lspci | grep fini
Make sure the card shows up! If not, there is a basic hardware problem. Try re-seating the card or moving it to another PCI slot.
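On this machine (an MT25204-based Mellanox card, as the dmesg and ibv_devinfo output below will show), the match looks something like this; your PCI address and model string will differ:

$ lspci | grep fini
51:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniBand single port SDR/DDR PCI-E HCA]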
Kernel driver
If you have installed the openib package, the Infiniband kernel modules should be present. Reboot the system and look at the kernel boot messages for a good clue about which driver you need:
$ dmesg | grep mth
ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
ib_mthca: Initializing 0000:51:00.0
I need the mthca driver, so I use modprobe to load the kernel module:
$ modprobe ib_mthca
Note that even though the correct driver was mentioned in the boot messages, the module is NOT automatically loaded. I added this module to the /etc/modules file so that it will be loaded automatically at boot. Now, check the module:
$ lsmod | grep ib_mthca
ib_mthca              158053  0
ib_mad                 70757  5 ib_mthca,ib_umad,ib_cm,ib_sa,mlx4_ib
ib_core               104901  17 ib_mthca,ib_iser,ib_srp,rds,ib_sdp,ib_ipoib,rdma_ucm,rdma_cm,ib_ucm,ib_uverbs,ib_umad,ib_cm,iw_cm,ib_sa,mlx4_ib,ib_mad,iw_cxgb3
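If the module is not loading at boot for you, note that stock Red Hat init scripts look for an executable /etc/rc.modules script (run by rc.sysinit early in startup) rather than the Debian-style /etc/modules file. A minimal sketch of that approach:

$ cat /etc/rc.modules
#!/bin/sh
# Load the Mellanox InfiniBand HCA driver at every boot
modprobe ib_mthca
$ chmod +x /etc/rc.modules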
Once the kernel driver is loaded, you should see a directory under /sys/class/infiniband:
$ ls /sys/class/infiniband
mthca0
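The port state is also exposed through sysfs, which is convenient for scripted health checks; it reads ACTIVE only after a subnet manager has brought the port up (expect INIT or DOWN otherwise):

$ cat /sys/class/infiniband/mthca0/ports/1/state
4: ACTIVE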
User-space driver
Okay, that’s working, but ibv_devices and ibv_devinfo still report:
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
Now install the appropriate user-space driver:
$ yum install libmthca
Check again:
$ ibv_devices
    device                 node GUID
    ------              ----------------
    mthca0              0005ad00000c1588

[root@evc2 ~]# ibv_devinfo
hca_id: mthca0
        transport:              InfiniBand (0)
        fw_ver:                 1.2.917
        node_guid:              0005:ad00:000c:1588
        sys_image_guid:         0005:ad00:0100:d050
        vendor_id:              0x05ad
        vendor_part_id:         25204
        hw_ver:                 0xA0
        board_id:               HCA.Cheetah-DDR.20
        phys_port_cnt:          1
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         1
                        port_lid:       275
                        port_lmc:       0x00
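An aside for other hardware: if dmesg had reported the mlx4_ib kernel driver instead (ConnectX-family cards), the matching user-space package would be libmlx4:

$ yum install libmlx4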
If you still get the error
Failed to get IB devices list: Function not implemented
the root cause might be that you still need to load the uverbs kernel module:
$ modprobe ib_uverbs
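If that was the problem, add ib_uverbs to your boot-time module list alongside ib_mthca, and confirm it is now loaded:

$ lsmod | grep ib_uverbs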
For comparison, here is the full ifconfig output from this machine once ib0 has been configured (the configuration itself is covered in the next section); note the InfiniBand link encapsulation and the 2044-byte MTU on ib0:

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:14:5E:F4:3A:A8
          inet addr:172.20.102.2  Bcast:172.20.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:253460 errors:0 dropped:0 overruns:0 frame:0
          TX packets:140500 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:351821939 (335.5 MiB)  TX bytes:12147168 (11.5 MiB)
          Interrupt:185 Memory:e4000000-e4012800

ib0       Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.21.102.2  Bcast:172.21.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:25 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:1400 (1.3 KiB)  TX bytes:0 (0.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1516 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1516 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2394328 (2.2 MiB)  TX bytes:2394328 (2.2 MiB)
TCP over Infiniband
Following the Red Hat documentation, create a device configuration file called
/etc/sysconfig/network-scripts/ifcfg-ib0
Make sure to set the right IP address for YOUR configuration: (EDIT: Removed TYPE parameter)
DEVICE=ib0
BOOTPROTO=none
ONBOOT=yes
IPADDR=172.21.102.2
NETMASK=255.255.0.0
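Then bring the interface up. Note that the ib0 device only exists while the ib_ipoib kernel module is loaded (it already appears in the lsmod output above), so load it first if needed:

$ modprobe ib_ipoib
$ ifup ib0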
Now you should be able to ping the Infiniband interface (assuming the cable is plugged in to a working fabric):
$ ping ivc2
PING ivc2 (172.21.102.2) 56(84) bytes of data.
64 bytes from ivc2 (172.21.102.2): icmp_seq=1 ttl=64 time=2.38 ms
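With the link up, the tools from infiniband-diags give a quick per-port summary. A sketch of what ibstatus should report here (the LIDs match the ibv_devinfo output above; the GID and the 4X DDR rate are illustrative assumptions for this card):

$ ibstatus mthca0
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0005:ad00:000c:1589
        base lid:        0x113
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            20 Gb/sec (4X DDR)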
Enable connected mode
Edit: 21 Sept 2013
By default, Red Hat does not enable “connected mode” on Infiniband. Enabling connected mode can substantially speed up IP-over-IB transport. Add the following line to the config file you created in the previous step:
CONNECTED_MODE=Yes
and restart the ib0 interface.
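After the restart you can confirm the mode through sysfs; in connected mode the interface also supports a much larger MTU (up to 65520 bytes) than the 2044-byte datagram default:

$ service network restart
$ cat /sys/class/net/ib0/mode
connected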
You have TYPE=Ethernet in the config file; that should be TYPE=InfiniBand.
Now that you mention it, I don’t know where that TYPE keyword came from. It’s not present in the ifcfg-ib0 file on my Red Hat systems, and it’s not mentioned in the Red Hat docs, so I removed it.
Hi – I have been searching the web for possible solutions, but I am having a problem where the rate for mthca0 and mthca1 degrades from 20 to 10 Gb/sec whenever the IB switch is rebooted. Is this a known problem that you are aware of? Also, when I run “ibv_devinfo -v”, I notice the ports have “subnet_timeout: 18”. How can I change this value? Is it in seconds or msec? Is there a site or doc that explains these values and how to modify them?
$ ibv_devinfo -v
hca_id: mthca1
transport: InfiniBand (0)
fw_ver: 1.2.0
node_guid: 0002:c902:0029:6588
sys_image_guid: 0002:c902:0029:658b
vendor_id: 0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id: IBM0020000002
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffff000
max_qp: 64512
max_qp_wr: 16384
device_cap_flags: 0x00001c76
max_sge: 27
max_sge_rd: 0
max_cq: 65408
max_cqe: 131071
max_mr: 131056
max_pd: 32764
max_qp_rd_atom: 4
max_ee_rd_atom: 0
max_res_rd_atom: 258048
max_qp_init_rd_atom: 128
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 8192
max_mcast_qp_attach: 56
max_total_mcast_qp_attach: 458752
max_ah: 0
max_fmr: 0
max_srq: 960
max_srq_wr: 16384
max_srq_sge: 27
max_pkeys: 64
local_ca_ack_delay: 15
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 8
port_lmc: 0x00
max_msg_sz: 0x80000000
port_cap_flags: 0x02510a68
max_vl_num: 4 (3)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 64
gid_tbl_len: 32
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 5.0 Gbps (2)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0002:c902:0029:6589
hca_id: mthca0
transport: InfiniBand (0)
fw_ver: 1.2.0
node_guid: 0002:c902:0029:72a4
sys_image_guid: 0002:c902:0029:72a7
vendor_id: 0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id: IBM0020000002
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffff000
max_qp: 64512
max_qp_wr: 16384
device_cap_flags: 0x00001c76
max_sge: 27
max_sge_rd: 0
max_cq: 65408
max_cqe: 131071
max_mr: 131056
max_pd: 32764
max_qp_rd_atom: 4
max_ee_rd_atom: 0
max_res_rd_atom: 258048
max_qp_init_rd_atom: 128
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 8192
max_mcast_qp_attach: 56
max_total_mcast_qp_attach: 458752
max_ah: 0
max_fmr: 0
max_srq: 960
max_srq_wr: 16384
max_srq_sge: 27
max_pkeys: 64
local_ca_ack_delay: 15
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 7
port_lmc: 0x00
max_msg_sz: 0x80000000
port_cap_flags: 0x02510a68
max_vl_num: 4 (3)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 64
gid_tbl_len: 32
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 5.0 Gbps (2)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0002:c902:0029:72a5
Thanks,
Neuhru
I wish I understood Infiniband well enough to answer that question! I have not experienced that problem, but then again, our IB switches have not been rebooted in years. You might want to read some of the references I’ve gathered here:
https://shocksolution.com/2012/12/the-infiniband-troubleshooting-quick-reference/
At what point in the above documentation do you install drivers from the OFED website?
Amit, I did not download or install any third-party drivers. The Red Hat RPMs provided drivers for RDMA and VERBS.
Thanks for the info, it is hard to find. I tried this on Red Hat 6.5.
For Red Hat 6.5 and above:

$ yum groupinfo "Infiniband Support"
$ yum groupinstall "Infiniband Support"

Also install the optional RPMs, then start the services:

$ service rdma start
$ service opensm start
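To have both services come up automatically after a reboot on RHEL 6:

$ chkconfig rdma on
$ chkconfig opensm on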
Thank you for the update! I no longer have a system with Infiniband, so I am unable to update this post.