Most business-class switches and routers support monitoring via SNMP. The main barrier to implementing SNMP monitoring is the sheer number of parameters a device exposes, which makes it hard to isolate the ones that matter. This example shows how to monitor the status of a specific port. Here, the term “port” also covers link aggregation groups (LAGs) and other logical groups of ports such as VLANs.
Use Case
A top-of-rack switch has several important link aggregation groups (LAGs). For example, one LAG has connections to two identical routers in a VRRP cluster. Other LAGs provide redundant network paths to independent NICs on the same server. I want to be alerted if the port status changes on any critical LAG. The ISP delivers Internet to a single port on the switch, which I also want to monitor.
Finding the Right SNMP Strings
Use the snmpwalk utility to list all of the parameters that can be monitored, and search the output for the interfaces of interest. The RFC1213-MIB, which is supported by many network devices, is used to interpret the results. In this example, two switches are stacked, but SNMP presents the stack as a single logical device. Note that the description string for ifDescr.74 specifies that it corresponds to port 22 on unit 2. The number after "ifDescr." is the interface index, and the same index selects the matching ifOperStatus entry used in the check later on.
snmpwalk -v2c -c my_community switch-01 RFC1213-MIB::interfaces | less
...
IF-MIB::ifDescr.420 = STRING: lag 3
IF-MIB::ifDescr.421 = STRING: lag 4
IF-MIB::ifDescr.422 = STRING: lag 5
IF-MIB::ifDescr.423 = STRING: lag 6
...
IF-MIB::ifDescr.74 = STRING: Unit: 2 Slot: 0 Port: 22 10G - Level
...
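On a stacked switch the walk can run to hundreds of rows, so it can help to filter it down to index and description pairs. The following is a minimal sketch, assuming the output format shown above; list_ifindex is a hypothetical helper name, not part of Net-SNMP, and the sample rows are copied from the walk.

```shell
#!/bin/sh
# list_ifindex is a hypothetical helper, not part of Net-SNMP.
# It reads snmpwalk output on stdin and prints "ifIndex<TAB>description"
# for every ifDescr row whose description matches the given awk regex.
list_ifindex() {
    pattern=$1
    awk -v pat="$pattern" '
        /^IF-MIB::ifDescr\.[0-9]+ = STRING: / {
            idx = $1
            sub(/^IF-MIB::ifDescr\./, "", idx)   # keep only the numeric index
            desc = $0
            sub(/^[^=]*= STRING: /, "", desc)    # keep only the description
            if (desc ~ pat) printf "%s\t%s\n", idx, desc
        }'
}

# Sample rows copied from the walk above.
sample='IF-MIB::ifDescr.420 = STRING: lag 3
IF-MIB::ifDescr.421 = STRING: lag 4
IF-MIB::ifDescr.74 = STRING: Unit: 2 Slot: 0 Port: 22 10G - Level'

# Print only the LAG rows from the sample.
printf '%s\n' "$sample" | list_ifindex '^lag '
```

Against a live device, the same function could be fed directly from the walk, for example by piping the snmpwalk command shown above into list_ifindex '^lag ' instead of into less.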
Nagios Check Configuration
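Using the interface indexes found above, I configured the Nagios check to alert if any key LAG or port goes down. Each -o option names one ifOperStatus entry, and the -r 1 that follows it requires the returned value to match 1, the ifOperStatus code for up: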
define service {
    service_description Port Link Status
    check_command check_snmp!-C my_community -o ifOperStatus.420 -r 1 -m RFC1213-MIB -o ifOperStatus.421 -r 1 -m RFC1213-MIB -o ifOperStatus.422 -r 1 -m RFC1213-MIB -o ifOperStatus.423 -r 1 -m RFC1213-MIB -o ifOperStatus.74 -r 1 -m RFC1213-MIB switch-01
    host_name switch-01
    use generic-service,Default_collector_server
    contact_groups +admins
}
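The check_command line repeats the same three options for every interface, which is easy to get wrong when editing by hand. As a rough sketch, the argument string could be generated from a list of interface indexes; build_check_args is a hypothetical helper written for this example and is not a feature of Nagios or of check_snmp.

```shell
#!/bin/sh
# build_check_args is a hypothetical helper, not a Nagios or plugin feature.
# Usage: build_check_args COMMUNITY IFINDEX...
build_check_args() {
    community=$1
    shift
    args="-C $community"
    for idx in "$@"; do
        # One -o/-r/-m group per interface, as in the service definition above.
        args="$args -o ifOperStatus.$idx -r 1 -m RFC1213-MIB"
    done
    printf '%s\n' "$args"
}

# Indexes taken from the snmpwalk output earlier in this example.
build_check_args my_community 420 421 422 423 74
```

The printed string is what follows "check_snmp!" in the service definition; the host name (switch-01 here) still has to be appended at the end, as in the original line.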