cgroups: so close and yet so far away from per-user fair scheduling

June 17, 2011

Suppose, not entirely hypothetically, that you have some shared, multiuser compute servers. Further suppose that sometimes, person A is running a single compute process while person B is running, oh, nine. Since Linux divides CPU time among processes without caring who owns them, this means that person A is getting 1/10th of the CPU while person B is getting 9/10ths of it. This doesn't seem entirely fair; it would be better if A and B split the CPU 50/50 regardless of how many compute jobs each ran.

(Since these are all multi-CPU machines, the real examples are more complicated. But yes, periodically there are more compute jobs than there are cores.)

Modern Linux kernels come with support for cgroups, which are designed to enable this sort of thing. I'll cut to the chase: cgroups can at least in theory do exactly the per-user fair scheduling that we want here. In practice Linux is let down by the current state of the user tools, which lack the features you need to make this feasible.

How to use cgroups to create per-user fair scheduling is pretty straightforward; you just put each user into their own cgroup (or at least each real user, you might want to do something different with system daemon UIDs) and give each user cgroup the same cpu.shares value. The system will then evenly divide the available CPU up between all users with active processes. The obvious place to manage all of this is in a PAM module, which can create the per-user cgroup on the fly the first time it's necessary and so on.
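To make the arithmetic concrete, here is a minimal sketch of the two ways of dividing the CPU. It assumes a single, fully contended CPU (the real multi-CPU case is messier), and the function names are made up for illustration; nothing here talks to the kernel.

```python
from fractions import Fraction


def per_process_split(jobs):
    """CPU fraction per user when every process is weighted equally."""
    total = sum(jobs.values())
    return {user: Fraction(n, total) for user, n in jobs.items()}


def per_user_split(jobs, shares=None):
    """CPU fraction per user when each user is one cgroup.

    Each active user's cgroup gets a weight (standing in for cpu.shares);
    users with no running jobs take no part in the division.
    """
    shares = shares or {}
    active = [user for user, n in jobs.items() if n > 0]
    weights = {user: shares.get(user, 1024) for user in active}
    total = sum(weights.values())
    return {user: Fraction(w, total) for user, w in weights.items()}


if __name__ == "__main__":
    jobs = {"A": 1, "B": 9}  # person A runs one job, person B runs nine
    print(per_process_split(jobs))  # A gets 1/10, B gets 9/10
    print(per_user_split(jobs))     # A and B get 1/2 each
```

With equal weights the number of jobs each person runs drops out of the result entirely, which is the whole point.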

The kernel support for all of this is there, as are most of the user-level tools you'd need (in the form of libcg and associated programs); there's even a PAM module, which classifies users into cgroups based on a configuration file or two. However, what the tools don't have is any way to write generic entries in the configuration files for creating cgroups and assigning users to them. If you want one cgroup per user, you get to write them all out explicitly (and then the tools will create them all ahead of time). Oh sure, you can generate the config files with a script, but you also have to poke various daemons every time you want your config file changes to take effect. Things get annoying fast.
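For illustration, the "generate the config files with a script" approach might look something like the sketch below. The output syntax is my understanding of libcg's cgconfig.conf and cgrules.conf formats, so check it against your version's manpages. The `users/` parent group, the UID cutoff of 1000, and all of the function names are assumptions of mine. The sketch only builds the text; it doesn't install the files or poke any daemons.

```python
import pwd


def real_users(min_uid=1000):
    """Login names with UID >= min_uid (an assumed cutoff for 'real' users)."""
    return sorted(p.pw_name for p in pwd.getpwall() if p.pw_uid >= min_uid)


def cgconfig_entry(user, shares=1024):
    """One explicit per-user group definition, cgconfig.conf style."""
    return (
        f"group users/{user} {{\n"
        f"    cpu {{\n"
        f"        cpu.shares = {shares};\n"
        f"    }}\n"
        f"}}\n"
    )


def cgrules_entry(user):
    """One classification rule, cgrules.conf style: user, controller, group."""
    return f"{user}\tcpu\tusers/{user}/\n"


def render(users, shares=1024):
    """Return (cgconfig text, cgrules text) covering every listed user."""
    config = "".join(cgconfig_entry(u, shares) for u in users)
    rules = "".join(cgrules_entry(u) for u in users)
    return config, rules


if __name__ == "__main__":
    config, rules = render(real_users())
    print(config)
    print(rules)
```

Every user gets their own stanza and their own rule, so both files grow linearly with the number of accounts and have to be regenerated whenever an account is added.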

(I also wonder how happy the kernel will be to have a thousand or so cgroups, almost all of which are unused at any given time (given that only a handful of our users will log on to a compute server at once).)

PS: the tragic thing is that a hard-coded PAM module would be almost trivial (and I've written PAM modules before). But that would mean building and maintaining a custom PAM module, and this issue is not quite important enough here to justify that.
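To show why it would be almost trivial, here is a rough sketch of the steps such a module would take at session start, written in Python rather than as real PAM code. It assumes the cgroup v1 `cpu` controller is mounted somewhere (the mount point is passed in, since it varies between systems) and uses a hypothetical `users/` parent group; `cpu.shares` and `tasks` are the standard v1 control files.

```python
import os


def place_in_user_cgroup(cpu_root, user, pid, shares=1024):
    """Create the user's cgroup if needed, set its weight, and move pid into it.

    cpu_root is wherever the cgroup v1 cpu controller is mounted.
    Returns the path of the per-user cgroup directory.
    """
    group = os.path.join(cpu_root, "users", user)
    # On a real cgroup filesystem, mkdir is what creates the cgroup.
    os.makedirs(group, exist_ok=True)
    # Give every user the same weight so that they split the CPU evenly.
    with open(os.path.join(group, "cpu.shares"), "w") as f:
        f.write(f"{shares}\n")
    # Writing a PID to 'tasks' moves that process into the cgroup;
    # processes it later forks start out in the same cgroup.
    with open(os.path.join(group, "tasks"), "w") as f:
        f.write(f"{pid}\n")
    return group
```

A real module would do the equivalent in C from its session hook, running as root, and would presumably also want to clean up empty per-user cgroups at some point.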

(Like most sysadmins, we get a modest amount of hives at locally developed software. The closer we can be to stock systems the happier we are, because it means that someone else is maintaining the software.)

Sidebar: systemd, cgroups, and per-user fair scheduling

It appears that this entire issue will be rendered moot for us if and when Ubuntu does an LTS release that's based on systemd. Per this blog posting and more documentation, current versions of systemd can already put users into per-user cgroups for you with the right options set on the systemd PAM module.

I admit that I'm kind of looking forward to this.

(Well, I'm not looking forward to yet another init replacement and init system, but there doesn't seem to be anything I can do about that. Maybe systemd will be the last one for a while.)

Comments on this page:

From at 2011-06-17 03:50:49:

that is provided ubuntu uses systemd. I would not hold my breath.

Right now it is only in fedora 15, so it is safe to assume systemd will be in a technology preview for rhel 6.3 or 6.4 (one year from now); then one higher for production. If you need this functionality, use another vendor :-) like scientific linux (if you will not pay for rhel licenses, that is).

From at 2012-11-30 23:55:16:

looks like someone wrote some patches for this here.

This would enable what you're talking about.

By cks at 2012-12-01 01:47:20:

That's an interesting and hopeful development (and thank you for mentioning it here). I'll cross my fingers that this patch is accepted and then makes it into some future Ubuntu LTS version.

We looked at the problem and apparently the only missing thing is a way to create the user cgroup on the fly. If cgroups are pre-created the PAM module classifies the processes correctly and fair sharing works great.

Another problem we found was the lack of documentation on how to get things done. Your post was helpful in that regard.

Our contribution is here:

By lilydjwg at 2020-05-15 08:43:50:

This now works with some simple systemd configurations, i.e. create a user-.slice.d/resources.conf and put


in it. At least it works for unified cgroups (cgroup2).

Written on 17 June 2011.

Last modified: Fri Jun 17 00:26:17 2011