I recently found the following message in the system logs for one of the compute nodes in the STOKES cluster:
init: Id "co" respawning too fast: disabled for 5 minutes
This caught my attention because the OS image on this node should be identical to the image deployed on every other node in the cluster. Why was it the only one producing this strange warning? I searched the web and learned that the following line in /etc/inittab triggers the warning:
co:2345:respawn:/sbin/agetty ttyS1 19200 vt100-nav
However, this line is also found in the inittab file on every other node, so the inittab entry itself is not at fault. Many of the solutions I found suggest commenting out this line. That is not a real solution, since the line is actually correct. After reading this post about co respawning too fast, I realized that the problem with my server was that one of the physical serial ports was not working. The output from dmesg showed:
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
brd: module loaded
This node should have TWO serial ports! Only ttyS0 was detected, so the agetty that inittab starts on ttyS1 had no device to open; it exited immediately and init kept respawning it. I rebooted the node and confirmed that the second serial port was not configured in the UEFI. I suspect that this node “lost” a serial port when the battery on the motherboard had to be replaced a few months back. Surprisingly, I had to reconfigure the serial ports twice before it worked. Now dmesg shows:
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:03: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
…and the annoying warning is no longer filling my log files with noise! I suspect that this points to the root problem experienced by other admins, such as these disgruntled Amazon Web Services users. My guess is that Amazon made a change in their virtual machines that eliminated one or more virtual serial ports, wreaking havoc for users whose instances depended on them.
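The check I did by hand, comparing the serial device named in the inittab entry against the devices the kernel reports in dmesg, can be sketched in Python. This is only a rough illustration: the function names (getty_ports, detected_serial_ports, missing_ports) are made up for this example, the sample text is copied from this post, and it only looks at ttyS devices in respawn entries that launch some flavor of getty.

```python
import re

# Sample inputs copied from this post. On a real node you would read
# /etc/inittab and capture the output of dmesg instead.
INITTAB = "co:2345:respawn:/sbin/agetty ttyS1 19200 vt100-nav\n"

DMESG_BEFORE = """\
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
"""

DMESG_AFTER = DMESG_BEFORE + """\
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:03: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
"""


def getty_ports(inittab_text):
    """Map inittab id -> tty device for respawn entries that run a getty."""
    ports = {}
    for line in inittab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out entries
        # inittab fields are id:runlevels:action:process
        fields = line.split(":", 3)
        if len(fields) != 4:
            continue
        entry_id, _runlevels, action, process = fields
        if action != "respawn" or "getty" not in process:
            continue
        # \b keeps the "tty" inside "agetty" from matching
        match = re.search(r"\btty\w+", process)
        if match:
            ports[entry_id] = match.group(0)
    return ports


def detected_serial_ports(dmesg_text):
    """Return the set of ttyS devices mentioned in the dmesg text."""
    return set(re.findall(r"\bttyS\d+", dmesg_text))


def missing_ports(inittab_text, dmesg_text):
    """Return inittab entries whose serial device the kernel did not report."""
    found = detected_serial_ports(dmesg_text)
    return {
        entry_id: dev
        for entry_id, dev in getty_ports(inittab_text).items()
        if dev.startswith("ttyS") and dev not in found
    }


print(missing_ports(INITTAB, DMESG_BEFORE))  # {'co': 'ttyS1'}
print(missing_ports(INITTAB, DMESG_AFTER))   # {}
```

An entry reported as missing points at the hardware or firmware configuration rather than at inittab, which is why commenting out the line only hides the symptom.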
Error is looping. How did you stop the error message?