The Unix background of Linux's 'file-max' and nr_open kernel limits on file descriptors
Somewhat recently, Lennart Poettering wrote about File Descriptor Limits. In passing, Poettering said (emphasis mine):
[...] Specifically on Linux there are two system-wide sysctls:
fs.file-max. (Don't ask me why one uses a dash and the other an underscore, or why there are two of them...) [...]
I can't help much about the first question, but the answer to the second one may be that Linux is carrying on a tradition that goes deep in the history of Unix (and, it turns out, to its early implementation). Specifically, it goes back to the very simple kernel designs of Research Unix versions, such as V7 (the famous starting point for so much diversity).
Unix started out as a small and simple system, and the Research Unix kernels often used simple and what we would consider brute force data structures. In particular, early Unixes tended to use fixed size arrays of things that kernels today allocate dynamically. When it comes to open files and file descriptors, there are two things that you have to keep track of. Each process has some number of file descriptors, then the underlying open files may be shared between processes and so have to be tracked in some global state.
V7 and other Research Unixes implemented this in a straightforward
way. Each process had a fixed size array of its open files, the
u_ofile array in the user structure, and
then there was another fixed size global array for all open files,
the global file array (declared in c.c; the
struct file type itself is defined in file.h). The
sizes of both of these arrays were set when you built the kernel,
in param.h, and
influenced how much of the PDP-11's very limited memory the resulting
kernel would take up.
(If you read through the V7 param.h, you can see that V7 had any number of very small limits. The limit on the total number of open files may seem small, but standard input, standard output, and standard error are often widely shared; each login session might reasonably use only one open file for all of them for the shell and all of the processes it runs interactively.)
The existence of these compiled in limits lasted a fair while; the
4.3 BSD user.h
still has a fixed size array of file descriptors for each process,
for example (4.4 BSD switched to a dynamic scheme). So did Linux
0.96c, as seen in sched.h
(and also a fixed size global array of open file structures; see
the implementation of
sys_open in open.c).
Once the actual allocation of both the per-process set of file descriptors and the global set of open files became more dynamic, people naturally started putting limits on just how dynamic this could be. It was natural to make the per-process limit a resource limit (at least normally) while making the global limit a kernel tunable. Linux also has a kernel limit that caps how many open files a process can have, overriding the normal resource limits.
Having two separate limits even on kernels which dynamically allocate these things makes some sense, but not necessarily a lot of it. A limit on the number of file descriptors that a single process can have open at once will save you from a program with a coding error that leaks open files (especially if it leaks them rapidly). A separate limit on the total number of open file descriptors across all processes is effectively a limit on the amount of memory one area of the kernel can lock down, which is at least potentially useful.
(I expect that Poettering knows all of this background, but other
people don't necessarily, so I decided to write about it. I mentioned
some of this in my entry on
dup(2) and shared file descriptors.)
PS: The obvious speculation about why Linux's sysctl for per process
open file descriptors has an underscore in its name is that the original
kernel #define was called
NR_OPEN. Since the original kernel #define
for the global maximum was
NR_FILE, this doesn't explain why fs.file-max
uses a dash in its name.