2024-07-25
How I almost set up a recursive syslog server
Over on the Fediverse, I mentioned an experience I had today:
Today I experienced that when you tell a syslog server to forward syslog to another server, it forwards everything. Including anything it was sent by other servers. And to confuse you, those forwarded messages will often be logged with the original host names, so you can wonder what these weird servers are that are sending you unexpected traffic.
At least I caught this before we had the central syslog server forward to itself. That probably would have been fun™.
You might wonder how on earth you do this to yourself without noticing, and the answer is the (dangerous) power of standardized installs.
We've had a central syslog server for a long time, along with another syslog server that we run for machines that are operated by Points of Contact and sit on internal sandbox networks. For much of this time, these syslog servers have been completely custom-installed machines; for example, they ran RHEL and then CentOS when we'd switched to Ubuntu for the rest of our machines. The current hardware and OS setup on these machines has been aging, so we've been working on replacing them. This time around, rather than doing a custom install, we decided to make these machines one variant of our standard Ubuntu install, supplemented by a small per-machine customization process. There are some potential downsides to this, since the machines have somewhat less security isolation, but we felt the advantages were worth it (for example, now they'll be part of our standard update system).
Part of our standard Ubuntu install configures the installed machine's syslog daemon to forward a copy of all syslog messages to our central syslog server; specifically, this is part of the standard scripts that are run on a machine to give it our general baseline setup. This is standard and so basically invisible, so I didn't think of this syslog forwarding when putting together the post-install customization instructions for these syslog servers. Fortunately, the first syslog server we rebuilt and put into production was the additional syslog server for other people's logs, not the central server for our own logs. It was also fortunate that today I had a reason to look at a set of logs on our central syslog server with low enough volume that I could spot out-of-place entries immediately and then start tracking them down.
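To make the failure mode concrete, here is a minimal Python sketch of how a message travels when every syslog daemon forwards everything it logs, including what other machines sent it. The host names and the `forward_to` table are made up for illustration and are not our real configuration, and the model simply stops at the first repeated host, where a real daemon forwarding to itself would presumably keep going.

```python
def trace_message(origin, forward_to):
    """Follow one message through forward-everything syslog daemons.

    forward_to maps a host to the single host it forwards all logs to
    (hypothetical names). Returns (path, looped): the hosts that handle
    the message, in order, and whether it reached a host that had
    already seen it.
    """
    path = [origin]
    seen = {origin}
    current = origin
    while current in forward_to:
        nxt = forward_to[current]
        if nxt in seen:
            # The message has come back to a host it already visited.
            return path + [nxt], True
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path, False


# The situation that was caught: the rebuilt sandbox syslog server
# relays its clients' messages on to the central server.
relayed = {
    "sandbox-client": "sandbox-syslog",
    "sandbox-syslog": "central-syslog",
}

# The situation that was narrowly avoided: the central server also gets
# the baseline setup and forwards to itself.
recursive = dict(relayed, **{"central-syslog": "central-syslog"})

print(trace_message("sandbox-client", relayed))
print(trace_message("sandbox-client", recursive))
```

In the first case the central server ends up logging a message that originated on a machine it has never heard of, which is exactly the sort of entry that looked out of place. In the second case the trace reports a loop.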
This sort of thing is fairly closely related to the general large-environment issue where you have recursive dependencies or recursive relationships between services, often without realizing it. You can even get direct self-dependencies, for example if you don't remember to change your DHCP server away from getting its network configuration by DHCP. In that sort of case you're probably going to notice the first time you reboot the machine in production (assuming you don't have redundant DHCP servers; if you do, you might not find this out until you're cold-starting your entire environment).
(Some self-usage is harmless and even a good thing. For example, you probably want your internal DNS resolvers to do any necessary DNS lookups through themselves, instead of trying to find some other DNS resolver for them.)
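If you have your service dependencies written down somewhere (which is a big assumption), one way to look for these recursive relationships is a depth-first search that reports every back edge it finds, while skipping self-usage you've decided is fine. The following Python is a sketch under that assumption; the `deps` table, the service names, and the `allowed_self` parameter are all hypothetical, and it reports one cycle per back edge rather than enumerating every possible cycle.

```python
def find_cycles(deps, allowed_self=frozenset()):
    """Return dependency cycles, each as a list of service names.

    deps maps a service to the services it needs in order to start.
    A service depending on itself is ignored if it is in allowed_self.
    """
    cycles = []
    state = {}   # service -> "visiting" or "done"
    stack = []   # services on the current depth-first path

    def visit(node):
        state[node] = "visiting"
        stack.append(node)
        for dep in deps.get(node, ()):
            if dep == node and node in allowed_self:
                continue  # deliberate, harmless self-usage
            if state.get(dep) == "visiting":
                # Back edge: dep is already on the current path.
                cycles.append(stack[stack.index(dep):] + [dep])
            elif dep not in state:
                visit(dep)
        stack.pop()
        state[node] = "done"

    for service in deps:
        if service not in state:
            visit(service)
    return cycles


deps = {
    "dhcp": ["dhcp"],          # DHCP server configured by DHCP
    "dns": ["dns"],            # resolver that resolves through itself
    "syslog": ["dns"],
    "web": ["dns", "syslog"],
}

print(find_cycles(deps, allowed_self={"dns"}))
```

Here only the DHCP server's dependency on itself is reported, because the DNS resolver using itself has been explicitly allowed.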