The DBus daemon and out of memory conditions (and systemd)

October 23, 2019

We have a number of systems where, for reasons beyond the scope of this entry, we enable strict overcommit. In this mode, when you reach the system's memory limits, the Linux kernel will deny memory allocations but usually not trigger the OOM killer to terminate processes. It's up to programs to deal with failed memory allocations as best they can, which doesn't always go very well. On the most common sort of machine that we operate this way, we've set the vm.admin_reserve_kbytes sysctl to reserve enough space for root so that most or all of our system management scripts keep working and we at least don't get deluged in email from cron about jobs failing. This mostly works.

(The sysctl is documented in vm.txt, part of the Linux kernel's sysctl documentation.)
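As an illustration, here is a minimal Python sketch of checking whether a machine is in this configuration by reading the sysctls straight out of /proc/sys/vm. The function names and the returned dictionary are my own invention for this example; the only things taken from the kernel are the two sysctl names and the fact that vm.overcommit_memory is 2 in strict overcommit mode.

```python
from pathlib import Path

# Meaning of the vm.overcommit_memory values; 2 is strict overcommit.
OVERCOMMIT_MODES = {0: "heuristic", 1: "always", 2: "strict"}


def read_vm_sysctl(name, root="/proc/sys/vm"):
    """Return the integer value of a vm.* sysctl by reading it from procfs.

    'root' is a parameter only so the function can be pointed at a
    test directory; on a real system the default is what you want.
    """
    return int(Path(root, name).read_text().split()[0])


def describe_overcommit(root="/proc/sys/vm"):
    """Summarize the overcommit mode and the reserve for privileged processes."""
    mode = read_vm_sysctl("overcommit_memory", root)
    reserve_kb = read_vm_sysctl("admin_reserve_kbytes", root)
    return {
        "mode": OVERCOMMIT_MODES.get(mode, "unknown"),
        "strict": mode == 2,
        "admin_reserve_kbytes": reserve_kb,
    }
```

Calling describe_overcommit() with no arguments on one of these machines should report strict mode along with whatever reserve has been configured.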

Recently several of these machines hit an interesting failure mode that required rebooting them, even after the memory pressure had gone away. The problem is DBus, or more specifically the DBus daemon. The direct manifestation of the problem is that dbus-daemon logs an error message:

dbus-daemon[670]: [system] dbus-daemon transaction failed (OOM), sending error to sender inactive

After this error message is logged, attempts to do certain sorts of systemd-related DBus operations hang until they time out (if the software doing them has a timeout). Logins over SSH take quite a while to give you a shell, for example, as they fail to create sessions:

pam_systemd(sshd:session): Failed to create session: Connection timed out
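If you want to spot this state from logs, a rough sketch is to match the two messages quoted above. The patterns below are built only from those two lines as they appear on our machines; other versions of dbus or systemd may word them differently, and the classify() function and its labels are hypothetical names for this example.

```python
import re

# Built from the dbus-daemon message quoted above; the PID will vary.
DBUS_OOM = re.compile(
    r"dbus-daemon\[\d+\]: \[system\] dbus-daemon transaction failed \(OOM\)"
)

# Built from the pam_systemd message quoted above; the service name
# inside the parentheses (here sshd:session) is allowed to vary.
SESSION_TIMEOUT = re.compile(
    r"pam_systemd\([^)]*\): Failed to create session: Connection timed out"
)


def classify(line):
    """Return a short label for a log line, or None if it is unrelated."""
    if DBUS_OOM.search(line):
        return "dbus-oom"
    if SESSION_TIMEOUT.search(line):
        return "session-timeout"
    return None
```

The first label is the interesting one, since it marks the point where things went wrong; the session timeouts are only a symptom that shows up afterward.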

The most relevant problem for us on these machines is that attempts to query metrics from the Prometheus host agent start hanging, likely because we have it set to pull information from systemd and this is done over DBus. Eventually there are enough hung metric probes so that the host agent starts refusing our attempts immediately.

The DBus daemon is not easy to restart (systemd will normally refuse to let you do it directly, for example), so I haven't found any good way of clearing this state. So far my method of recovering a system in this state is to reboot it, which I generally have to do with 'reboot -f' because a plain 'reboot' hangs (it's probably trying to talk to systemd over DBus).

I believe that part of what creates this issue is that the DBus daemon is not protected by vm.admin_reserve_kbytes. In practice that sysctl reserves space for UID 0 processes, but dbus-daemon doesn't run as UID 0; it runs as its own UID (often messagebus), for good security-related reasons. As far as I know, there's no way to protect an arbitrary UID through vm.admin_reserve_kbytes; strictly speaking, it applies only to processes that hold a relatively powerful Linux security capability, cap_sys_admin, which an unprivileged daemon like this normally doesn't have. And unified cgroups (cgroup v2) don't have a true guaranteed memory reservation, just a best effort one (and we're using cgroup v1 anyway, which doesn't have anything here).
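One way to check this for a particular process is to look at the Uid and CapEff lines in its /proc/<pid>/status file. The sketch below parses that text and reports the effective UID and whether the effective capability set includes cap_sys_admin (bit 21 in the kernel's capability numbering). The function names are made up for this example, and treating the effective set as the one that matters is my assumption about how the kernel applies the reserve.

```python
CAP_SYS_ADMIN = 21  # capability bit number from <linux/capability.h>


def parse_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def reserve_info(status_text):
    """Report the effective UID and whether cap_sys_admin is effective."""
    fields = parse_status(status_text)
    # The Uid line lists the real, effective, saved, and filesystem UIDs.
    euid = int(fields["Uid"].split()[1])
    # CapEff is a hexadecimal bitmask of the effective capabilities.
    cap_eff = int(fields["CapEff"], 16)
    has_admin = bool((cap_eff >> CAP_SYS_ADMIN) & 1)
    return {"euid": euid, "cap_sys_admin": has_admin}
```

To use it, read the status file for the dbus-daemon PID (for example the PID shown in its log messages) and pass the contents to reserve_info(). If cap_sys_admin comes back False, the process gets no help from the admin reserve.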

We're probably making this DBus issue much more likely to happen by having the Prometheus host agent talk to systemd, since this generates DBus traffic every time our Prometheus setup pulls host metrics from the agent (currently, every 15 seconds). At the same time, the systemd information is useful to find services that are dead when they shouldn't be and other problems.

(It would be an improvement if the Prometheus host agent would handle this sort of DBus timeout during queries, but that would only mean we got host metrics back, not that DBus was healthy again.)
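To illustrate the general idea of bounding a probe so that it fails instead of hanging, here is a small Python sketch that runs an external command with a timeout. This is not how the Prometheus host agent works internally; the probe() function, its return convention, and the default timeout are all assumptions made for the example.

```python
import subprocess


def probe(cmd, timeout_s=5.0):
    """Run cmd and return (ok, output).

    ok is False if the command exits with a non-zero status or does not
    finish within timeout_s seconds; on a timeout, subprocess.run() kills
    the child process before raising TimeoutExpired.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return result.returncode == 0, result.stdout.strip()
```

Something like probe(["busctl", "--system", "list"]) could serve as a crude check of whether the system bus is still answering, although a probe that times out only tells you that DBus is unhealthy; it does nothing to fix it.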

PS: For us, all of this is happening on Ubuntu 18.04 with their version of systemd 237 and dbus 1.12.2. However, I suspect that this isn't Ubuntu-specific. I also doubt that this is systemd-specific; I rather suspect that any DBus service using the system bus is potentially affected, and it's just that the most commonly used ones are from systemd and its related services.

(In fact, on our Ubuntu 18.04 servers there doesn't seem to be much on the system bus apart from systemd-related things, so if there are DBus problems at all, they're going to show up there.)

Written on 23 October 2019.

Last modified: Wed Oct 23 22:31:07 2019