Glossary of Infiniband Terminology:
- GID: Global Identifier
- GUID: Globally Unique Identifier (also known as Direct Address)
- HCA: Host Channel Adapter
- LID: Local Identifier
- TCA: Target Channel Adapter
- SM: Subnet Manager
Infiniband: The Host Perspective
Every host on an Infiniband fabric has three identifiers: GUID, GID, and LID.

A GUID is similar in concept to a MAC address: it is a 64-bit EUI-64 value consisting of a 24-bit manufacturer’s prefix and a 40-bit device identifier. The HCA itself has a node GUID, and each of its ports has its own port GUID.

The Global Identifier (GID) is a 128-bit identifier similar to an IPv6 address (technically, a GID is a valid IPv6 address with restrictions). It consists of a 64-bit subnet prefix followed by the 64-bit port GUID. The GID is used for routing between subnets. The default subnet prefix is fe80::/64 (0xfe80000000000000).

Finally, there is the local identifier (LID), which is assigned by the subnet manager. The LID is a 16-bit identifier that is unique within a subnet. Unicast LIDs, the kind assigned to hosts, range from 0x0001 to 0xBFFF (1 to 49,151); 0x0000 is reserved, and values from 0xC000 upward are set aside for multicast and the permissive LID. LIDs are usually expressed in hexadecimal notation (such as 0xb1). Routing within a subnet is managed by LID.

On a Linux server, the GUID, GID, and LID are exposed as text files under /sys/class/infiniband. The exact path depends on which Infiniband driver is used on the system. For a system with the MLX4 driver (such as RedHat/CentOS 5.x), the commands are:
    cat /sys/class/infiniband/mlx4_0/node_guid
    0002:c903:0001:0a48
    cat /sys/class/infiniband/mlx4_0/ports/1/gids/0
    fe80:0000:0000:0000:0002:c903:0001:0a49
    cat /sys/class/infiniband/mlx4_0/ports/1/lid
    0x14a
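Note that the lower 64 bits of the GID are the port GUID (0002:c903:0001:0a49 here), which on this adapter is the node GUID plus one. If the port GUID is what you need, you can read it straight out of the GID instead of fetching it separately.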
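To make the relationship between these identifiers concrete, here is a minimal shell sketch. It assumes the GID is printed fully expanded (eight 4-digit groups, as in the sysfs output above), that its upper 64 bits are the subnet prefix and its lower 64 bits the port GUID, and that LIDs follow the ranges in the InfiniBand architecture specification (0x0001-0xBFFF unicast, 0xC000-0xFFFE multicast, 0xFFFF permissive). The function names gid_prefix, gid_port_guid, and lid_class are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers; not part of any InfiniBand package.

# First four groups of a fully expanded GID: the 64-bit subnet prefix.
gid_prefix()    { printf '%s\n' "$1" | cut -c1-19; }

# Last four groups of a fully expanded GID: the 64-bit port GUID.
gid_port_guid() { printf '%s\n' "$1" | cut -c21-39; }

# Classify a LID given in hex (0x14a) or decimal (330).
lid_class() {
    _lid=$(( $1 ))
    if   [ "$_lid" -lt 0 ] || [ "$_lid" -gt 65535 ]; then echo invalid
    elif [ "$_lid" -eq 0 ];     then echo reserved
    elif [ "$_lid" -le 49151 ]; then echo unicast      # 0x0001-0xBFFF
    elif [ "$_lid" -le 65534 ]; then echo multicast    # 0xC000-0xFFFE
    else                             echo permissive   # 0xFFFF
    fi
}

# Sample values copied from the article.
gid=fe80:0000:0000:0000:0002:c903:0001:0a49
echo "subnet prefix: $(gid_prefix "$gid")"      # fe80:0000:0000:0000
echo "port GUID:     $(gid_port_guid "$gid")"   # 0002:c903:0001:0a49
echo "LID 0x14a is:  $(lid_class 0x14a)"        # unicast

# Walk every HCA port visible in sysfs, whatever the driver names the device.
# Prints nothing on a machine without InfiniBand hardware.
for port in /sys/class/infiniband/*/ports/*; do
    [ -r "$port/lid" ] || continue
    lid=$(cat "$port/lid")
    gid=$(cat "$port/gids/0")
    echo "$port: LID $lid ($(lid_class "$lid")), port GUID $(gid_port_guid "$gid")"
done
```

Globbing over /sys/class/infiniband/* avoids hard-coding mlx4_0, so the same loop should work with other drivers as long as they expose the same sysfs layout.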
Infiniband: Switch Perspective
Understanding an Infiniband network from the perspective of a switch is more difficult. The concepts are the same, but a large chassis-based “Infiniband switch” is actually composed of multiple spine and leaf (line) modules, each of which is considered a “switch” with its own GUID and LID! For example, consider this snippet of output from the ibswitches command on a Voltaire Grid Director 4200-L50:
    Switch : 0x0008f10500652600 ports 36 "Voltaire sLB-4018 Line 4 Chip 1 4200 #4200-CF50" enhanced port 0 lid 125 lmc 0
    Switch : 0x0008f10500652758 ports 36 "Voltaire sLB-4018 Line 3 Chip 1 4200 #4200-CF50" enhanced port 0 lid 127 lmc 0
    Switch : 0x0008f10500652748 ports 36 "Voltaire sLB-4018 Line 2 Chip 1 4200 #4200-CF50" enhanced port 0 lid 126 lmc 0
    Switch : 0x0008f10500380986 ports 36 "Voltaire 40Gb InfiniBand Switch Module for IBM BladeCenter" enhanced port 0 lid 124 lmc 0
    Switch : 0x0008f1050038084c ports 36 "Voltaire 40Gb InfiniBand Switch Module for IBM BladeCenter" enhanced port 0 lid 123 lmc 0
    Switch : 0x0008f105007521a6 ports 36 "Voltaire sFB-4200 Spine 4 Chip 1 4200 #4200-CF50" enhanced port 0 lid 128 lmc 0
    Switch : 0x0008f105007521dc ports 36 "Voltaire sFB-4200 Spine 3 Chip 1 4200 #4200-CF50" enhanced port 0 lid 131 lmc 0
    Switch : 0x0008f105007521cc ports 36 "Voltaire sFB-4200 Spine 2 Chip 1 4200 #4200-CF50" enhanced port 0 lid 129 lmc 0
    Switch : 0x0008f105007521d8 ports 36 "Voltaire sFB-4200 Spine 1 Chip 1 4200 #4200-CF50" enhanced port 0 lid 130 lmc 0
    Switch : 0x0008f10500652742 ports 36 "Voltaire sLB-4018 Line 1 Chip 1 4200 #4200-CF50" enhanced port 0 lid 1 lmc 0
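Output like this is easier to read once it is boiled down to LID, GUID, and module name. The following is a rough sketch, assuming each line has the shape shown above (GUID, port count, quoted description, then "lid N lmc M"); the function name summarize_switches is invented for this example, and lines that do not match the pattern are silently skipped.

```shell
#!/bin/sh
# Hypothetical helper: reduce ibswitches-style lines on stdin to
# "LID|GUID|description", sorted numerically by LID.
summarize_switches() {
    sed -n -E 's/^Switch[[:space:]]*:[[:space:]]*(0x[0-9a-fA-F]+) ports [0-9]+ "([^"]*)".* lid ([0-9]+) lmc [0-9]+.*$/\3|\1|\2/p' |
        sort -t '|' -k1,1n
}

# Three sample lines copied from the article.
summarize_switches <<'EOF'
Switch : 0x0008f10500652600 ports 36 "Voltaire sLB-4018 Line 4 Chip 1 4200 #4200-CF50" enhanced port 0 lid 125 lmc 0
Switch : 0x0008f105007521a6 ports 36 "Voltaire sFB-4200 Spine 4 Chip 1 4200 #4200-CF50" enhanced port 0 lid 128 lmc 0
Switch : 0x0008f10500652742 ports 36 "Voltaire sLB-4018 Line 1 Chip 1 4200 #4200-CF50" enhanced port 0 lid 1 lmc 0
EOF
```

On a live fabric the same function could be fed directly, as in `ibswitches | summarize_switches`. Sorting by LID makes it easy to see that every line and spine module in the chassis holds its own LID.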