Wandering Thoughts archives


Systemd's DynamicUser feature is (currently) dangerous

Yesterday I described how timesyncd couldn't be restarted on one of our Ubuntu 18.04 machines. The specific cause of the failure was timesyncd attempting to access /var/lib/private/systemd/timesync and failing because /var/lib/private is only accessible by root, not by the UID that timesyncd was running as. My diagnostic efforts left me puzzled as to how this was supposed to work at all, but Trent Lloyd (@lathiat) pointed me to the answer, which is in Lennart Poettering's article Dynamic Users with systemd. That article introduces the overall system, explains the role of /var/lib/private, and covers how timesyncd is supposed to get access through an inaccessible directory. I'll quote the explanation for that:

[Access through /var/lib/private] is achieved by invoking the service process in a slightly modified mount name-space: it will see most of the file hierarchy the same way as everything else on the system ([...]), except for /var/lib/private, which is over-mounted with a read-only tmpfs file system instance, with a slightly more liberal access mode permitting the service read access. [...]

Since timesyncd is not able to get access through /var/lib/private, you might guess that something has gone wrong in the process of setting up this slightly modified mount namespace. Indeed this turned out to be the case. The machine that this happened on is an NFS client and (as is usual) its UID 0 is mapped to an unprivileged UID on our fileservers. On this machine there were some FUSE mounts in the home directories of users who have their $HOME not world readable (our default $HOME permissions are owner-only, to avoid accidents). When systemd was setting up the 'slightly modified mount name-space' it attempted to access these FUSE mounts as part of binding them into the namespace, but it failed because UID 0 had no permissions to look inside user home directories.
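On a machine where things work, one way to see the 'slightly modified mount name-space' is to compare the mount namespace of the service's process with that of PID 1. The following is a minimal sketch, assuming a Linux /proc; the function names are my own and purely illustrative. Reading another process's namespace link generally needs root, and on a machine with the failure there is no running timesyncd to inspect, so this is mostly useful for confirming what a healthy service looks like.

```python
import os


def mount_namespace_id(pid):
    """Return the mount namespace identifier of a process, e.g. 'mnt:[4026531840]'."""
    # /proc/<pid>/ns/mnt is a magic symlink; its target names the namespace.
    return os.readlink(f"/proc/{pid}/ns/mnt")


def shares_mount_namespace(pid_a, pid_b):
    """True if both processes are in the same mount namespace."""
    return mount_namespace_id(pid_a) == mount_namespace_id(pid_b)
```

If shares_mount_namespace(1, service_pid) is true for a service that uses DynamicUser and StateDirectory, the service is not running in a private namespace, which is the situation described here.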

This failure caused systemd to give up attempting to set up the namespace. However, systemd did not abort unit activation or even log an error message. Instead it went on to try to start timesyncd without this special namespace, despite the fact that timesyncd uses both DynamicUser and StateDirectory and so starting it normally was essentially guaranteed to fail.

(Although my initial case was dangling FUSE mounts, it soon developed that any FUSE mounts would do it, for example a sshfs or smbfs mount in a user's NFS mounted home directory when the home directory isn't world-accessible.)
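To find out whether a machine has any FUSE mounts at all, you can scan /proc/mounts for FUSE filesystem types. This is a rough sketch; the function name is hypothetical, and it only lists the mounts without checking whether UID 0 can actually reach them.

```python
def fuse_mounts(mounts_text):
    """Return (mount_point, fstype) pairs for FUSE filesystems in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # FUSE filesystems appear as "fuse", "fuse.sshfs", "fuseblk" and so on.
        # "fusectl" is the kernel's control filesystem, not a user mount.
        if fstype.startswith("fuse") and fstype != "fusectl":
            found.append((mount_point, fstype))
    return found
```

Typical use would be fuse_mounts(open("/proc/mounts").read()). Mount points containing spaces are escaped as \040 in that file and are returned here in their escaped form.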

Systemd's failure to handle errors in setting up the namespace here has been raised as systemd issue 9835. However, merely logging an error or aborting the unit activation would not actually fix the core problem; it would merely let you see exactly why your timesyncd or whatever service is failing to start. The core problem is that systemd's current design for DynamicUser intrinsically blows up if systemd and UID 0 don't have full access to every mount that's visible on the system.

(Well, DynamicUser plus StateDirectory, but the idea seems to be that pretty much every service using dynamic users will have a systemd managed state directory.)

In my opinion, this makes using DynamicUser surprisingly dangerous. A systemd service that is set to use it can't be reliably started or restarted on all systems; it only works on some systems, some of the time (but those happen to be the common case). If there's ever a problem setting up the special namespace that each such service requires, things fail. Machines that are NFS clients are the obvious case, since the client's UID 0 often has limited privileges, but I believe that there are likely to be others.

(And of course services can be restarted for random and somewhat unpredictable reasons, such as package updates or other services being restarted. You should not assume that you can always control these circumstances, or completely predict the state of the system when they happen.)

linux/SystemdDynamicUserDangerous written at 21:51:36

A timesyncd total failure and systemd's complete lack of debuggability

Last November, I wrote an entry about how we were switching to using systemd's timesyncd on our Ubuntu machines. Ubuntu 18.04 defaults to using timesyncd just as 16.04 does, and when we set up our standard Ubuntu 18.04 environment we stuck with that default behavior (although we customize the list of NTP servers). Then today I discovered that timesyncd had silently died on one of our 18.04 servers back on July 20th, and worse, it couldn't be restarted.

Specifically, it reported:

systemd-timesyncd[10940]: Failed to create state directory: Permission denied

The state directory it's complaining about is /var/lib/systemd/timesync, which is actually a symlink to /var/lib/private/systemd/timesync (at least on systems that are in good order; if the symlink has had something happen to it, you can apparently get other errors from timesyncd). I had a clever informed theory about what was wrong with things, but it turns out strace says I'm wrong.

(To my surprise, doing 'strace -f -p 1' on this system did not produce either explosions or an impossibly large amount of output. This would have been a very different thing on a system that was actually in use; this is basically an almost idle server being used as part of our testing of 18.04 before we upgrade our production servers to it.)

According to strace, what is failing is timesyncd's attempts to access /var/lib/private/systemd/timesync as its special UID (and GID) 'systemd-timesync'. This is failing for the prosaic reason that /var/lib/private is owner-only and owned by root. Since this works on all of our other Ubuntu 18.04 machines, presumably the actual failure is somewhere else.

The real problem here is that it is impossible to diagnose or debug this situation. Simply to get this far I had to read the systemd source code (to find the code in timesyncd that printed this specific error message) and then search through 25,000 lines of strace output. And I still don't know what the problem is or how to fix it. I'm not even confident that rebooting the server will change anything, especially when all the relevant pieces on this server seem to be just the same as the pieces on other, working servers.

(I do know that according to logs this failure started happening immediately after the systemd package was upgraded and re-executed itself. On the other hand, the systemd upgrade also happened on other Ubuntu 18.04 machines, and they didn't have their timesyncds explode.)

Since systemd has no clear diagnostic information here, I spent a great deal of time chasing a red herring. If you look at /var/lib/private/systemd/timesync on such a failing system, it will be owned by a numeric UID and GID, while on working systems it will be owned by the magically special login and group 'systemd-timesync'. This is systemd's 'dynamic user' facility in action, combined with systemd itself creating the /var/lib/private/systemd/timesync directory (with the right login and group) before exec'ing the timesyncd binary. When timesyncd fails to start, systemd removes the login and group but leaves the directory behind, now not owned by any existing login or group.
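The 'numeric UID and GID' symptom is easy to check for programmatically: look up the directory's owner and see whether the UID and GID currently resolve to names. This is a small sketch with a hypothetical helper name; it uses Python's standard pwd and grp modules, which go through the normal system name lookup. Keep in mind that an unresolvable owner is a consequence of the failed start, not its cause.

```python
import grp
import os
import pwd


def owner_names(path):
    """Return (uid, login or None, gid, group or None) for the owner of path."""
    st = os.stat(path)
    try:
        login = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # No login currently maps to this UID.
        login = None
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # No group currently maps to this GID.
        group = None
    return st.st_uid, login, st.st_gid, group
```

A login or group of None for /var/lib/private/systemd/timesync would correspond to the numeric ownership described above.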

(You might think that the 'failed to create state directory' error message would mean that timesyncd was the one actually creating the state directory, but strace says otherwise; the mkdir() happens before the exec() does, while the new process that will become timesyncd is still in systemd's code. timesyncd's code does try to create the directory, but presumably the internal systemd functions it's using are fine if the directory is already there with the right ownership and so on.)

I am rather unhappy about this situation, and I am even unhappier that there is effectively nothing that we can do about any aspect of it except to stop using timesyncd (which is now something that I will be arguing for, especially since this server drifted more than half a second out of synchronization before I found this issue entirely by coincidence). Reporting a bug to either systemd or to Ubuntu is hopeless (systemd will tell me to reproduce on the latest version, Ubuntu will ignore it as always). This is simply what happens when the systemd developers produce a design and an implementation that doesn't explain how it actually works and doesn't contain any real support for field diagnosis. Once again we get to return to the era of 'reboot the server, maybe that will fix it'. Given systemd's general current attitude, I don't expect this to change any time soon. Adding documentation of systemd's internals and diagnosis probes would be admitting that the internals can have bugs, problems, and issues, and that's just not supposed to happen.

PS: The extra stupid thing about the whole situation is that the only thing /var/lib/systemd/timesync is used for is to hold a zero-length file whose timestamp is used to track the last time the clock was synchronized, and non-root users can't even see this file on Ubuntu 18.04.

Update: I've identified the cause of this problem, which is described in my new entry on how systemd's DynamicUser feature is dangerous. The short version is that systemd silently failed to set up a custom namespace that would have given timesyncd access to /var/lib/private because it could not deal with FUSE mounts in NFS mounted user home directories that were not world-accessible.

linux/SystemdTimesyncdFailure written at 01:52:59


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.