== Your server BMCs can need to be rebooted every so often

Over on the Fediverse [[I said https://mastodon.social/@cks/109683498578654891]]:

> A sysadmin tip: if your [[BMC/IPMI
> https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface]]
> is doing weird things, restart (reboot) it. Server BMCs are little
> computers running ancient versions of Linux with software that's
> probably terribly written and they stay running forever, which means
> all sorts of opportunities for slow bugs. Reboot away!
>
> This is brought to you by the BMC with a KVM-over-IP that wouldn't
> accept '2' entered on the (virtual) keyboard in any way or form. Until
> I rebooted the BMC. \\
> PS: Our IP addresses have 2s in them.

(This probably isn't the only weird BMC glitch we've experienced, but
it's the first one where I tried rebooting the BMC and that fixed it.)

A number of people shared additional stories in the replies, and I
especially 'liked'
[[@frederic@chaos.social https://chaos.social/@frederic/109683675954689732]]'s:

> Same for IPMI hardware sensors: Thought the motherboard was damaged
> because half the sensors were reported as "n/a". Rebooting magically
> fixed this. {{AB:🙈:see no evil emoji}}

This happens for more or less the reasons I mentioned above. BMCs
naturally accumulate very large uptimes because they don't normally
reboot when your server reboots; if you don't do anything special, your
BMC will normally stay up for as long as the server has power. In many
places this can amount to years of uptime, and it's a rare set of
software that can stand up to that even if you don't use it much.
Server vendors typically don't want you to think about this, and I
don't believe 'BMC uptime' is generally exposed anywhere.

(Routinely querying the BMC's sensor readings via IPMI may actually make
this worse, since then the BMC's software is active to answer those
queries.
I should probably make [[our metrics system PrometheusGrafanaSetup-2019]]
notice when a server decreases the number of IPMI metrics it exposes
without a reboot.)

Modern BMCs can generally reboot themselves without rebooting their host
(the actual server), although you may want to test this to be sure since
[[apparently some vendors can do that differently https://honk.bewilderbeest.net/u/zev/h/63l6vyZV3WKDHZbz35]].

PS: How I encountered this is that I was reinstalling a server using
KVM-over-IP, and I hit the portion of the base Ubuntu 22.04 install
where I had to enter the subnet and various associated IP addresses.
Our network has a '2' in it, so all of that failed. Helpfully, the
KVM-over-IP software had a virtual keyboard so I could see it wasn't
just some browser weirdness intercepting a '2' from my real keyboard;
even the virtual keyboard's '2' key wouldn't get through to the Ubuntu
22.04 installer running on the server being reinstalled. Since rebooting
the BMC didn't reboot the host, I could verify that rebooting the BMC
alone fixed the problem; once the BMC had rebooted, my KVM-over-IP
session could enter all digits.

(I'm glad that it occurred to me to reboot the BMC, instead of just
grumbling and going down to the machine room to do the install with the
physical console.)
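
The idea of noticing when a server starts exposing fewer IPMI metrics without having rebooted could be sketched roughly as follows. This is only an illustration under stated assumptions, not taken from any real metrics setup: the `Sample` record, its field names, and the `sensor_drops` function are all hypothetical, and it assumes you can already obtain, for each scrape, the host's boot time and a count of the IPMI sensors that returned a real reading.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Sample:
    """One scrape of a server (hypothetical record, for illustration only)."""
    timestamp: float       # when the scrape happened (Unix seconds)
    host_boot_time: float  # host boot time as reported at that scrape
    sensor_count: int      # IPMI sensors that returned a real reading


def sensor_drops(samples: Iterable[Sample]) -> List[Tuple[Sample, Sample]]:
    """Return (before, after) pairs of consecutive scrapes where the
    sensor count fell although the host boot time did not change."""
    drops = []
    prev = None
    for cur in sorted(samples, key=lambda s: s.timestamp):
        if prev is not None:
            same_boot = cur.host_boot_time == prev.host_boot_time
            if same_boot and cur.sensor_count < prev.sensor_count:
                drops.append((prev, cur))
        prev = cur
    return drops


# Made-up example data: the count halves at t=300 with no host reboot,
# then the host reboots before the t=400 scrape.
history = [
    Sample(timestamp=100.0, host_boot_time=50.0, sensor_count=40),
    Sample(timestamp=200.0, host_boot_time=50.0, sensor_count=40),
    Sample(timestamp=300.0, host_boot_time=50.0, sensor_count=20),
    Sample(timestamp=400.0, host_boot_time=350.0, sensor_count=18),
]

for before, after in sensor_drops(history):
    print(f"sensors fell {before.sensor_count} -> {after.sensor_count} "
          f"at t={after.timestamp} with no host reboot")
```

A drop flagged this way is only a hint that the BMC may be misbehaving and might benefit from a restart. With ipmitool, `ipmitool mc reset cold` is the usual way to ask a BMC to restart itself, but whether that leaves the host untouched on your hardware is something to test first.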