How we use the SLURM job scheduler system on our compute servers

November 26, 2021

The Slurm Workload Manager, often called just SLURM, is what a lot of supercomputers and big compute clusters use to schedule and run all of the jobs that people want to run on them. However, you don't have to use it in such grand environments. We use it in a much more modest one, and most people here use it in a relatively simple way. We've actually had two iterations of our SLURM environment, one that I described in 2019 and the current one.

Our motivation for using SLURM at all is that we have a pool of compute servers of varying capacity, and some GPU servers as well. A few of these compute servers are general login servers, but the problem with these is that they're a free-for-all; anyone can log in at any time and start using CPU (and perhaps memory, although that can't be fair-share scheduled so it's first come, first served). Traditionally people have wanted to reserve some dedicated amount of resources that are theirs for some amount of time. SLURM does exactly that.

As experienced by our researchers, our SLURM setup lets them reserve some number of cores and some amount of memory on a compute server for a job, for up to a maximum time (and also GPUs, if they want those and the machine has them). Sometimes people will be picky about what sort of machine (or what specific machine) they want to use. Other times they'll take whichever of our highly varied compute servers SLURM picks from those that have enough resources free at the moment. When SLURM grants your allocation, which may be immediately, you can run an interactive login session on the compute server or just run a program or script. Your program or session gets access to the resources you asked for and were given, and no more. If the compute server has leftover resources, someone else can allocate them (or you can allocate them for another job).

(When a job ends, perhaps because it hit the maximum number of days we allow a single job to run for, SLURM terminates all of that job's processes on the compute server.)
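
As a rough illustration of what such a request looks like, here is a minimal Python sketch that assembles an `srun` command line from the pieces described above. The `srun` options used (`--cpus-per-task`, `--mem`, `--time`, `--gres`, `--nodelist`, `--pty`) are standard SLURM ones, but the helper function, its parameter names, and the idea that GPUs are requested as `gpu:N` generic resources are assumptions for the example, not a description of our actual configuration.

```python
from typing import List, Optional, Sequence


def build_srun_command(
    cores: int,
    memory: str,
    time_limit: str,
    gpus: int = 0,
    node: Optional[str] = None,
    command: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an srun argument list for one job (hypothetical helper)."""
    argv = [
        "srun",
        f"--cpus-per-task={cores}",  # number of cores to reserve
        f"--mem={memory}",           # memory on the node, e.g. "16G"
        f"--time={time_limit}",      # wall-clock limit, e.g. "1-00:00:00"
    ]
    if gpus > 0:
        # Assumes GPUs are configured as a generic resource named "gpu".
        argv.append(f"--gres=gpu:{gpus}")
    if node is not None:
        # Being picky: ask for one specific machine.
        argv.append(f"--nodelist={node}")
    if command is None:
        # No program given: ask for an interactive shell instead.
        argv += ["--pty", "bash"]
    else:
        argv += list(command)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so this works without SLURM.
    print(" ".join(build_srun_command(4, "16G", "1-00:00:00", gpus=1)))
```

Leaving out `node` corresponds to taking whatever machine SLURM picks. Passing a program as `command` corresponds to running a program or script instead of an interactive session.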

Researchers can have more than one job active at once, up to a relatively large limit. Internally, SLURM keeps track of what resources people have used recently and uses this information to do fair scheduling when all of our compute or GPU servers are busy for long enough, so that everyone who is competing for resources gets roughly the same amount over the long term. This does have some caveats; basically, our SLURM setup always lets you use resources that are currently free, so in order to get fair scheduling, people have to be willing to submit jobs that won't execute immediately.

(In practice our compute servers generally aren't all busy, and when they are all busy it's usually not for very long.)
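
To make the fair scheduling idea a bit more concrete, here is a deliberately simplified Python sketch of "whoever has used the least recently goes first". It is not SLURM's actual priority computation, which is considerably more involved. The function names, the data layout, and the use of a simple half-life decay on past usage are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# One usage record: (timestamp in seconds, core-seconds consumed).
UsageEvent = Tuple[float, float]


def decayed_usage(events: List[UsageEvent], now: float, half_life: float) -> float:
    """Sum past usage, halving the weight of usage for every half_life elapsed."""
    return sum(amount * 0.5 ** ((now - when) / half_life) for when, amount in events)


def order_pending(
    pending_users: List[str],
    usage: Dict[str, List[UsageEvent]],
    now: float,
    half_life: float,
) -> List[str]:
    """Order users with queued jobs so the lightest recent users come first.

    sorted() is stable, so users with equal usage keep their submission order.
    """
    return sorted(
        pending_users,
        key=lambda user: decayed_usage(usage.get(user, []), now, half_life),
    )


if __name__ == "__main__":
    # Hypothetical users and usage figures, purely for illustration.
    history = {"alice": [(0.0, 1000.0)], "bob": [(0.0, 100.0)]}
    print(order_pending(["alice", "bob", "carol"], history, now=0.0, half_life=3600.0))
```

This ordering only affects jobs that are waiting in the queue. A job that fits into currently free resources starts right away regardless of how much its owner has used, which is the caveat described above.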

Our researchers can in theory use our SLURM setup for a lot of additional, more sophisticated things, but in practice 'run something on a compute server with some guaranteed resources' is the big usage. I suspect that most researchers never bother to look deeper in our documentation and the SLURM manual pages than how to make requests, see what compute servers are available and what they have, and maybe see what jobs are active or queued up.
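
For the "see what compute servers are available and what they have" part, a small Python sketch like the following could wrap `sinfo`. The `sinfo` options and format codes used here are standard (`%N` node name, `%c` CPUs, `%m` memory in megabytes, `%G` generic resources such as GPUs), but the function names and the returned dictionary layout are made up for the example.

```python
import subprocess
from typing import Dict

# One line per node, no header, fields separated by "|".
SINFO_CMD = ["sinfo", "--Node", "--noheader", "-o", "%N|%c|%m|%G"]


def parse_sinfo(text: str) -> Dict[str, dict]:
    """Parse the output of SINFO_CMD into a dict keyed by node name."""
    nodes: Dict[str, dict] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, cpus, mem_mb, gres = line.split("|", 3)
        # A node in several partitions is listed once per partition;
        # keying by name keeps a single entry for it.
        nodes[name] = {
            "cpus": int(cpus.rstrip("+")),
            "memory_mb": int(mem_mb.rstrip("+")),
            "gres": gres,
        }
    return nodes


def list_nodes() -> Dict[str, dict]:
    """Run sinfo and return per-node resources (needs SLURM installed)."""
    result = subprocess.run(SINFO_CMD, check=True, capture_output=True, text=True)
    return parse_sinfo(result.stdout)
```

The matching command for seeing active and queued jobs is `squeue`, for example `squeue -u` followed by a login name to limit the listing to one person's jobs.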

Sidebar: Our changing SLURM setup over time

Our first SLURM setup tried to use SLURM to allocate entire compute servers at a time, which imitated our previous manual system. Even at the time I noted that this wasn't how SLURM wanted to work, and eventually that mismatch became too much. Our current setup lets SLURM allocate and schedule fine-grained resources the way it wants to, which means that people can be allocated only part of a machine and they have to figure out how many resources to ask for for their jobs.

Written on 26 November 2021.

Last modified: Fri Nov 26 22:19:08 2021