2024-07-29
Handling (or not) the serial console of our serial console server
We've had a central serial console server for a long time. It has two purposes: it logs all of the (serial) console output from servers and various other pieces of hardware (which on Linux machines includes things like kernel messages), and it allows us to log in to machines over their serial console. For a long time this server was a hand-built one-off machine, but recently we've been rebuilding it on our standard Ubuntu framework (much like our central syslog server). Our standard Ubuntu framework includes setting up a (kernel) serial console, which made me ask myself what we were going to do with the console server's serial console.
We have a matrix of options. We can direct the serial console to either a physical serial port or to the BMC's Serial over LAN system. Once the serial console is somewhere, we can ignore it except when we want to manually use it, connect it to the console server's regular conserver instance, or connect it to a new conserver instance on some other machine (which would have to be using either IPMI Serial-over-LAN or a USB serial port, depending on which serial console we pick).
Connecting the console server's serial port to itself would let us log routine serial console output in the same place that we put all of the other serial console output. However, it wouldn't allow us to capture kernel logs if the machine crashed for some reason (one valuable thing that our current serial console setup provides), or to log in through the serial console if the console server fell off the network. Setting up a backup, single-host conserver on another machine would allow us to do both, at the cost of having a second conserver machine to think about.
Using Serial-over-LAN would allow us to log in to the console server over its serial console from any other machine that has access to what has become our IPMI/BMC network, which is a number of them (it's that way for emergency access purposes). However, it requires that the BMC network be up, which is to say that all of the relevant switches are working. A direct (USB) serial connection would only require the other machine to be up and reachable.
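To make the matrix of options a bit more concrete, here is a minimal Python sketch that enumerates the combinations and tags each one with the tradeoffs discussed above. The names (`DESTINATIONS`, `HANDLERS`, `describe`, `option_matrix`) and the exact wording of the labels are made up for illustration; the properties are only my reading of the preceding paragraphs, not a complete analysis.

```python
from itertools import product

# Where the kernel serial console is directed.
DESTINATIONS = ("physical serial port", "BMC Serial-over-LAN")

# What we do with the console once it is going somewhere.
HANDLERS = ("ignore", "local conserver", "remote conserver")


def describe(destination, handler):
    """Summarize one cell of the matrix (illustrative labels only)."""
    sol = destination == "BMC Serial-over-LAN"
    remote = handler == "remote conserver"
    return {
        "destination": destination,
        "handler": handler,
        # Any conserver instance logs routine console output.
        "logs_routine_output": handler != "ignore",
        # Only a conserver on another machine survives a console server crash.
        "captures_crash_output": remote,
        # How a conserver on another machine would have to reach the console.
        "remote_link": ("IPMI Serial-over-LAN" if sol else "USB serial port")
        if remote
        else None,
        # What has to be working to reach the console from elsewhere.
        "access_requires": "BMC network up"
        if sol
        else "other machine up and reachable",
    }


def option_matrix():
    """Return all destination/handler combinations, destinations first."""
    return [describe(d, h) for d, h in product(DESTINATIONS, HANDLERS)]


if __name__ == "__main__":
    for row in option_matrix():
        print(row["destination"], "+", row["handler"], "->", row)
```

Running this prints six rows, one per combination, which is a reasonable checklist to argue over even if the labels are cruder than the real tradeoffs.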
Of course we can split the difference. We could have the Linux kernel serial console on the physical serial port and also have logins enabled on the Serial-over-LAN serial port. In a lot of situations this would still give us remote access to the console server, although we wouldn't be able to trigger things like Magic SysRq over the SoL connection since it's not a kernel console.
(Unfortunately you can only have one kernel serial console.)
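As a rough illustration of splitting the difference, the following Python sketch builds the two pieces of configuration involved: a kernel `console=` command line argument for the physical port and the name of the systemd `serial-getty@` unit that would provide logins on the Serial-over-LAN port. The function name `split_console_plan` is hypothetical, and the device names `ttyS0` and `ttyS1` and the 115200 baud rate are assumptions; which `ttyS` device is the physical port and which is the SoL port depends on the hardware and the BIOS settings.

```python
def split_console_plan(kernel_port="ttyS0", login_port="ttyS1", baud=115200):
    """Describe a kernel console on one serial port plus logins on another.

    The default port names and baud rate are assumptions for illustration;
    check the actual hardware before using them.
    """
    if kernel_port == login_port:
        raise ValueError("the login-only port must differ from the kernel console port")
    return {
        # Kernel command line argument, in the usual console=device,options form.
        "kernel_cmdline": f"console={kernel_port},{baud}n8",
        # systemd template unit instance that runs a getty on the other port.
        "getty_unit": f"serial-getty@{login_port}.service",
    }


if __name__ == "__main__":
    plan = split_console_plan()
    print("kernel command line:", plan["kernel_cmdline"])
    print("enable for logins: ", plan["getty_unit"])
```

The login-only port gets a getty but no kernel messages, which is why things like Magic SysRq aren't available over it.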
My current view is that the easiest thing to start with is to set the serial console to the Serial-over-LAN port and then not have anything collecting kernel messages from it. If we decide we want to change that, we can harvest SoL serial console messages from either the console server itself or from another machine. In an emergency, a SoL port can be accessed from any machine with BMC network access, not just from its conserver machine, unlike a physical serial port (which would have to be accessed from the other machine connected to it).
(In our current conserver setup, you don't really want to access the SoL port from another machine if you can avoid it. Doing so will quietly break the connection from conserver on the console server until you restart conserver. It's possible we could work around this with libipmiconsole.conf settings.)