cgroups: so close and yet so far away from per-user fair scheduling
Suppose, not entirely hypothetically, that you have some shared, multiuser compute servers. Further suppose that sometimes, person A is running a single compute process while person B is running, oh, nine. Since Linux divides CPU time among processes without caring who owns them, this means that person A is getting 1/10th of the CPU while person B is getting 9/10ths of it. This doesn't seem entirely fair; it would be better if A and B split the CPU 50/50 regardless of how many compute jobs each ran.
(Since these are all multi-CPU machines, the real examples are more complicated. But yes, periodically there are more compute jobs than there are cores.)
Modern Linux kernels come with support for cgroups, which is designed to enable this sort of stuff. I'll cut to the chase: cgroups can at least in theory do exactly the per-user fair scheduling that we want here. In practice Linux is let down by the current state of the user tools, which lack the features you need to make this feasible.
How to use cgroups to create per-user fair scheduling is pretty
straightforward; you just put each user into their own cgroup (or at
least each real user, you might want to do something different with
system daemon UIDs) and give each user's cgroup the same CPU shares
value. The system will then evenly divide the available CPU up between
all users with active processes. The obvious place to manage all of this
is in a PAM module, which can create the per-user cgroup on the fly the
first time it's necessary and so on.
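As a sketch, the per-user setup comes down to a few writes into the cgroup filesystem. This assumes cgroup v1 with the cpu controller mounted at /sys/fs/cgroup/cpu; the users/ subdirectory and the setup_user_cgroup function name are my own invention, and on a real machine this would all need to run as root:

```shell
#!/bin/sh
# CGROOT is parameterized so the logic is visible; on a real system it
# would be the cpu controller mount point, /sys/fs/cgroup/cpu.
CGROOT=${CGROOT:-/sys/fs/cgroup/cpu}

setup_user_cgroup() {
    user=$1; pid=$2
    cg="$CGROOT/users/$user"
    mkdir -p "$cg"                 # create the per-user cgroup on the fly
    echo 1024 > "$cg/cpu.shares"   # same weight for every user
    echo "$pid" > "$cg/tasks"      # move the user's session leader into it
}
```

A PAM session module would do the equivalent of `setup_user_cgroup $USER $$` at login; because every user cgroup has the same cpu.shares value, the scheduler splits CPU time evenly between cgroups with runnable processes, regardless of how many processes each contains.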
The kernel support for all of this is there, as are most of the user-level tools you'd need (in the form of libcg and associated programs); there's even a PAM module that classifies users into cgroups based on a configuration file or two. However, what the tools don't have is any ability to have generic entries in the configuration files for creating cgroups and assigning users to them. If you want one cgroup per user, you get to write them all out explicitly (and then the tools will create them all ahead of time). Oh sure, you can generate the config files with a script, but you also have to poke various daemons every time you want your config file changes to take effect. Things get annoying fast.
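To illustrate what 'writing them out explicitly' means: with libcg you wind up with one stanza per user in cgconfig.conf plus a matching rule in cgrules.conf, repeated for every single user (the users/ hierarchy name here is illustrative; the file formats are libcg's own):

```
# /etc/cgconfig.conf: one group stanza per user, for all thousand users
group users/fred {
    cpu {
        cpu.shares = 1024;
    }
}

# /etc/cgrules.conf: map each login to their cgroup, again per user
fred    cpu    users/fred
```

What you'd want instead is a single generic entry, something morally equivalent to 'every UID above 1000 gets its own cgroup with these settings', and that is exactly what the configuration files have no syntax for.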
(I also wonder how happy the kernel will be to have a thousand or so cgroups, almost all of which are unused at any given time (given that only a handful of our users will log on to a compute server at once).)
PS: the tragic thing is that a hard-coded PAM module would be almost trivial (and I've written PAM modules before). But that would mean building and maintaining a custom PAM module, and this issue is not quite important enough here to justify that.
(Like most sysadmins, we get a modest case of hives from locally developed software. The closer we can be to stock systems the happier we are, because it means that someone else is maintaining the software.)
systemd, cgroups, and per-user fair scheduling
It appears that this entire issue will be rendered
moot for us if and when Ubuntu does an LTS release
that's based on
systemd. Per this blog posting, more
current versions of systemd can already put users into per-user cgroups
for you with the right options set on the systemd PAM module.
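As I understand it, the knob in question is an argument to pam_systemd in the PAM session stack; something along these lines (the controllers= option existed in systemd versions of roughly this era, but check your version's pam_systemd manpage before trusting this):

```
# /etc/pam.d/common-session (sketch)
session    optional    pam_systemd.so controllers=cpu
```

With that set, logind creates and tears down the per-user cgroup in the cpu hierarchy itself, which is exactly the hard-coded PAM module I was wishing for above, except that someone else maintains it.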
I admit that I'm kind of looking forward to this.
(Well, I'm not looking forward to yet another init replacement and
init system, but there doesn't seem to be anything I can do about it,
and with luck systemd will be the last one for a while.)