2024-07-30
On not automatically reconnecting to IPMI Serial-over-LAN consoles
One of the things that the IPMI (network) protocol supports is Serial over LAN (SOL), which can be used to expose a server's serial console over your BMC's management network. These days, servers are starting to drop physical serial ports, making IPMI SOL your only way of getting a serial console. The conserver serial console management software supports IPMI SOL (if built with the appropriate libraries), and you can directly access SOL serial consoles with IPMI programs.
However, as I mentioned in passing in yesterday's entry, IPMI SOL access has a potential problem: only one SOL connection is allowed at a time, and if someone makes a new SOL connection, any old one is automatically disconnected. This disconnection is invisible to the old IPMI SOL client until (and unless) it attempts to send something to the SOL console, at which point it apparently gets a timeout. This is bad for a program like conserver, which in many situations will only read SOL console output in order to log it and never send any input to the SOL console.
(This BMC behavior may not be universal, based on some comments in FreeIPMI.)
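To make the failure mode concrete, here is a toy Python model of the behavior as I understand it. This is only a sketch: ToyBMC, ToySession, and SolTimeout are names I made up for illustration, not anything from FreeIPMI or the IPMI specification, and a real BMC may well behave differently.

```python
class SolTimeout(Exception):
    """Stands in for the timeout a displaced SOL client gets when it sends."""


class ToySession:
    """One client's view of a SOL connection (hypothetical, not a FreeIPMI type)."""

    def __init__(self, bmc):
        self.bmc = bmc
        self.received = []  # console output this client has seen

    def send(self, data):
        # Only an attempt to send reveals that the session was displaced.
        if self.bmc.active is not self:
            raise SolTimeout("no response from BMC")


class ToyBMC:
    """Toy BMC that allows a single SOL session; the newest one wins."""

    def __init__(self):
        self.active = None

    def connect(self):
        # The previous session (if any) is dropped without being told.
        self.active = ToySession(self)
        return self.active

    def console_output(self, line):
        # Serial output goes only to whoever holds the session now.
        if self.active is not None:
            self.active.received.append(line)


bmc = ToyBMC()
logger = bmc.connect()        # e.g. conserver, which only reads
bmc.console_output("line 1")
by_hand = bmc.connect()       # someone connects directly
bmc.console_output("line 2")

print(logger.received)        # the logger simply stops seeing output
try:
    logger.send(b"x")
except SolTimeout as e:
    print("logger found out only on send:", e)
```

In this model the read-only logger never gets an error; it just silently misses everything after the second connection is made.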
Conserver uses FreeIPMI for IPMI SOL access, and FreeIPMI supports a special 'serial keepalive' option (which you can configure in libipmiconsole.conf) to detect and remedy this. As covered in comments in ipmiconsole.h, this option (normally) works by periodically sending a NUL character to the SOL console, which will eventually make the BMC tell you that the serial connection has been broken and that you need to re-create your IPMI SOL session in order to get serial output again.
When I first read about this option I was enthused about putting it into our configuration, so that conserver would automatically re-establish stolen SOL connections. Then I thought about it a bit more and decided that this probably wasn't a good idea. The problem is that there's no way to tell if another IPMI SOL session is active at the moment or not (at least with this option); all we can do is unconditionally take the SOL console back. If one of us has made a SOL connection, done some stuff, and disconnected again, this is fine. If one of us is in the process of using a live SOL connection right now, this is bad.
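The tug-of-war can be sketched in a few lines of Python. Again, this is a toy under stated assumptions: OneSlotBMC, StaleSession, and keepalive_tick are hypothetical names, and the sketch only mimics the general idea of a serial keepalive, not FreeIPMI's actual implementation.

```python
class StaleSession(Exception):
    """Stands in for the timeout a displaced SOL client sees when it sends."""


class OneSlotBMC:
    """Hypothetical BMC model: the newest SOL connection always wins."""

    def __init__(self):
        self.owner = None

    def connect(self, who):
        self.owner = who  # silently displaces the previous owner

    def send(self, who, data):
        if self.owner != who:
            raise StaleSession(who)


def keepalive_tick(bmc, who):
    """One keepalive cycle: send a NUL and reconnect if the session is stale.

    Returns True if the session had to be re-created.
    """
    try:
        bmc.send(who, b"\x00")
        return False
    except StaleSession:
        # Nothing here can tell whether the other session is idle or in use,
        # so the only available move is to take the console back.
        bmc.connect(who)
        return True


bmc = OneSlotBMC()
bmc.connect("conserver")
bmc.connect("sysadmin")   # by-hand SOL session during an outage
stolen_back = keepalive_tick(bmc, "conserver")
print(stolen_back, bmc.owner)
```

After the keepalive cycle runs, the by-hand session is the stale one, and the person using it finds out the next time they type something.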
This is especially so because about the only time when we'd resort to using a direct IPMI SOL connection instead of logging in to the console server and using conserver is when either we can't get to the console server or the console server can't get to the BMC of the machine we want to connect to. These are stressful situations when something is already wrong, so the last thing we want is to compound our problems by having a serial console connection stolen in the middle of our work.
Not configuring FreeIPMI with serial keepalives doesn't completely eliminate this problem; it could still happen if the console server machine is (re)booted or conserver is restarted. Both of these will cause conserver to start up, make a bunch of IPMI SOL connections, and steal any current by-hand SOL connections away from us. But at least it's less likely.