A fundamental limitation of systemd's per-user fair share scheduling

September 3, 2017

Up until now, I've been casually talking about systemd supporting per-user fair share scheduling, both when writing about the basic mechanics and in things like getting cron jobs to cooperate. But really, both of those point out a fundamental limitation, which is that systemd doesn't actually have per-user fair share scheduling; what it really has is per-slice fair share scheduling. You can create per-user fair share scheduling from this only to the extent that you can arrange for all of a given user's processes to wind up somewhere under their user-${UID}.slice. If you can't arrange for all of the significant processes to be put under user-${UID}.slice, you don't get complete per-user fair share scheduling; some processes will escape to be scheduled separately and possibly (very) unfairly.
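To see where a given process actually landed, you can parse /proc/&lt;pid&gt;/cgroup. Here's a small sketch; the helper name is mine, and the parsing assumes either cgroup v2 (a single '0::/...' line) or cgroup v1 with a name=systemd hierarchy:

```python
# Sketch: find the innermost systemd slice containing a process by
# parsing /proc/<pid>/cgroup.  Assumes cgroup v2's single "0::/..."
# line or cgroup v1's name=systemd hierarchy; helper name is my own.
import os

def slice_of(pid):
    """Return the innermost *.slice containing pid, or None."""
    with open("/proc/%d/cgroup" % pid) as f:
        for line in f:
            hier, ctrls, path = line.rstrip("\n").split(":", 2)
            if hier == "0" or "name=systemd" in ctrls:
                # Walk from the leaf back up toward the root until
                # we hit a slice component.
                for part in reversed(path.strip("/").split("/")):
                    if part.endswith(".slice"):
                        return part
    return None

print(slice_of(os.getpid()))
```

A process that escaped per-user scheduling will report something under system.slice here rather than user-${UID}.slice.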

This may sound like an abstract limitation, so let me give you a concrete case where it matters. We run a departmental web server, where users can run processes to handle web requests in various ways, both via CGIs and via user-managed web servers. Both of these can experience load surges of various sorts and sometimes this can result in them eating a bunch of CPU. It would be nice if user processes could have their CPU usage shared fairly among everyone, so that one user with a bunch of CPU-heavy requests wouldn't starve everyone else out of the CPU.

User-managed web servers are started either from cron with @reboot entries or manually, by the user logging in and (re)starting them; in both cases we can arrange for the processes to be under user-${UID}.slice and so be subject to per-user fair share scheduling. However, user CGIs are run via suexec, and suexec doesn't use PAM (unlike cron); it just directly changes UID to the target user. As a result, all suexec CGI processes are found in apache2.service under the system slice, and so will never be part of per-user fair share scheduling.
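One way to see this problem in action is to look for processes owned by ordinary users whose cgroup doesn't mention their user-${UID}.slice; suexec-run CGIs show up this way because they simply inherit Apache's cgroup. A rough sketch (the function name and the UID cutoff of 1000 are my assumptions):

```python
# Sketch: list processes owned by ordinary users (UID >= min_uid, an
# assumption) whose cgroup does not mention their user-${UID}.slice,
# i.e. processes that have escaped per-user fair share scheduling the
# way suexec CGIs under apache2.service do.
import os

def escaped_user_processes(min_uid=1000):
    escaped = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            uid = os.stat("/proc/%s" % entry).st_uid
            with open("/proc/%s/cgroup" % entry) as f:
                cgroups = f.read()
        except OSError:
            continue  # the process exited while we were looking
        if uid >= min_uid and ("user-%d.slice" % uid) not in cgroups:
            escaped.append((int(entry), uid))
    return escaped

print(escaped_user_processes())
```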

(Even if you could make suexec use PAM and so set up systemd sessions for CGIs it runs if you wanted to, it's not clear that you'd want to be churning through that many session scopes and perhaps user slice creations and removals. I'm honestly not sure I'd trust systemd to be resilient in the face of creating huge numbers of very short-lived sessions, especially many at once if you get a load surge against some CGIs.)

As far as I can see, there's no way to solve this within the current state of systemd, especially for the case of CGIs. Systemd would probably need a whole new raft of features (likely including having the user-${UID}.slice linger around even with no processes under it). Plus we'd need a new version of suexec that explicitly got systemd to put new processes in the right slices (or used PAM so a PAM module could do this).
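As a sketch of the direction such a suexec might take: systemd-run is a real systemd tool, and --scope and --slice are real options of it; --scope keeps stdin and stdout attached, which CGI execution needs. Whether Apache's user could be allowed to create scopes in arbitrary user slices is another matter (it normally requires privileges), and suexec would still have to do its own UID switch, so this is speculation about slice placement, not a working design:

```python
# Hypothetical sketch: a systemd-aware suexec wrapper could delegate
# slice placement to systemd-run (a real tool; --scope, --slice, and
# --quiet are real options).  --scope keeps stdio attached, which CGI
# execution needs; creating a scope in another user's slice normally
# requires privileges, and suexec itself would still switch UIDs.
def cgi_wrapper_argv(uid, cgi_argv):
    return [
        "systemd-run",
        "--scope",                      # transient scope, stdio stays attached
        "--slice=user-%d.slice" % uid,  # land under the target user's slice
        "--quiet",
        "--",
    ] + list(cgi_argv)

print(cgi_wrapper_argv(1000, ["/usr/lib/cgi-bin/hello.cgi"]))
```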

Sidebar: This is also a general limitation of Linux

Linux has chosen to implement per-user fair share scheduling through a general mechanism for doing fair share scheduling of (c)groups. Doing it this way has always required that you somehow arrange for all user processes to wind up in a per-user cgroup (whether through PAM modules, manipulating things by hand when creating processes, or a daemon that watches for processes in the wrong spot and moves them). When processes fall through the cracks, they aren't scheduled appropriately. If anything, systemd makes it easier to get close to full per-user fair share scheduling than previous tools did.
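The raw mechanism is simple to sketch. Under cgroup v2, per-user fair sharing amounts to one cgroup per user with equal cpu.weight values, plus writing every user process's PID into that cgroup's cgroup.procs. On a real system this needs root and the cpu controller enabled in the parent's cgroup.subtree_control; the function name and the root parameter are mine, the latter so the sketch can be pointed somewhere harmless:

```python
# Sketch of the raw cgroup v2 mechanism systemd builds on.  On a real
# system this needs root and the cpu controller enabled via the
# parent's cgroup.subtree_control; the `root` parameter (my addition)
# lets the sketch be pointed at a harmless directory instead.
import os

def put_in_user_cgroup(pid, uid, root="/sys/fs/cgroup"):
    cgdir = os.path.join(root, "user-%d" % uid)
    os.makedirs(cgdir, exist_ok=True)
    # Equal weights give users equal CPU shares under contention,
    # regardless of how many processes each user has.
    with open(os.path.join(cgdir, "cpu.weight"), "w") as f:
        f.write("100\n")
    # Moving a process is just writing its PID here; any process you
    # forget to move escapes fair share scheduling, which is exactly
    # the suexec problem above.
    with open(os.path.join(cgdir, "cgroup.procs"), "w") as f:
        f.write("%d\n" % pid)
```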
