Wandering Thoughts archives

2015-04-28

There's no portable way to turn a file descriptor read only or write only

It all started when John Regehr asked a good question in a tweet:

serious but undoubtedly stupid question: why does writing to file descriptor 0 in Linux and OS X work?

My first impulse was to say 'lazy code that starts with a general read/write file descriptor and doesn't bother to make it read only when the fd becomes a new process's standard input', but I decided to check the manual pages first. Much to my surprise it turns out that in Unix there is no portable way to turn a read/write file descriptor into a read-only or write-only one.
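
As a quick illustration of the behaviour in question, here's a minimal sketch (mine, not anything from the tweet thread). Run from an interactive terminal, the write() to fd 0 normally succeeds, because fd 0 is the terminal device, opened read/write and simply inherited by the process:

    /* Demonstrate writing to file descriptor 0. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char msg[] = "written to fd 0\n";

        if (write(0, msg, sizeof(msg) - 1) < 0)
            perror("write to fd 0");
        else
            printf("the write to fd 0 succeeded\n");
        return 0;
    }

Run it with standard input redirected from a file the shell opened read-only (for example './a.out </dev/null') and the write() fails with EBADF instead, because that descriptor really was created read-only.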

In theory the obvious way to do this is with fcntl(fd, F_SETFL, O_RDONLY) (or O_WRONLY as applicable). In practice, this is explicitly documented as not working on both Linux and FreeBSD; on them you're not allowed to affect the file access mode, only things like O_NONBLOCK. It's not clear if this behavior is compliant with the Single Unix Specification for fcntl(), but either way it's how a very large number of real systems behave in the field today so we're stuck with it.
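
Here's a sketch of the attempt that doesn't work (assuming a Linux or FreeBSD system). F_SETFL silently ignores the O_ACCMODE bits there, so the access mode that F_GETFL reports is unchanged afterwards and fd 0 stays read/write:

    /* Try (and fail) to change fd 0's access mode via F_SETFL. */
    #include <fcntl.h>
    #include <stdio.h>

    static const char *mode_name(int flags)
    {
        switch (flags & O_ACCMODE) {
        case O_RDONLY: return "read-only";
        case O_WRONLY: return "write-only";
        case O_RDWR:   return "read/write";
        default:       return "unknown";
        }
    }

    int main(void)
    {
        int before = fcntl(0, F_GETFL);

        /* Ask for fd 0 to become read-only; the access mode bits are ignored. */
        fcntl(0, F_SETFL, (before & ~O_ACCMODE) | O_RDONLY);

        int after = fcntl(0, F_GETFL);
        printf("fd 0 was %s and is now %s\n",
               mode_name(before), mode_name(after));
        return 0;
    }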

This means that if you have, say, a shell, the shell cannot restrict the plain commands it starts so that they have read-only standard input and write-only standard output and standard error. The best it can do is pass on its own stdin, stdout, and stderr; if those were handed to the shell with full read/write access, it has to pass them to your process with that access intact, and so your process can write to fd 0. Only when the shell is making new file descriptors can it restrict them to be read only or write only, which means pipelines and file redirections.
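
As a simplified sketch of the shell's two situations (mine, not any real shell's code, and /etc/hostname is just a stand-in file): for a plain command fds 0, 1, and 2 are inherited across fork()/exec() with whatever access mode they already have, while only for a redirection does the shell create a fresh descriptor and get to choose O_RDONLY or O_WRONLY for it.

    #include <fcntl.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void run(const char *cmd, const char *stdin_file)
    {
        pid_t pid = fork();
        if (pid == 0) {
            if (stdin_file != NULL) {
                /* Redirection: a brand new fd, so we get to pick the mode. */
                int fd = open(stdin_file, O_RDONLY);
                dup2(fd, 0);
                close(fd);
            }
            /* Otherwise fd 0 is simply whatever this 'shell' itself got. */
            execlp(cmd, cmd, (char *)NULL);
            _exit(127);
        }
        waitpid(pid, NULL, 0);
    }

    int main(void)
    {
        run("cat", "/etc/hostname");   /* like 'cat < file': fd 0 is read-only */
        run("true", NULL);             /* plain command: fd 0 passed through as-is */
        return 0;
    }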

Further, it turns out that in a fair number of cases it's natural to start out with a single read/write file descriptor (and in a few it's basically required). For one example, anything run on a pseudo-tty that was set up through openpty() will be this way, as openpty() only gives you a single file descriptor for the slave side of the pty and that descriptor necessarily has to be opened read/write. There are any number of other cases, so I'm not going to try to run through them all.
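
Here's a rough sketch of the openpty() case (assuming the Linux placement of openpty() in <pty.h>, linked with -lutil; the BSDs and OS X put it in <util.h> or <libutil.h>). The child's fds 0, 1, and 2 all wind up as the same slave descriptor, which is necessarily open read/write:

    #include <pty.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int master, slave;
        char buf[128];
        ssize_t n;

        if (openpty(&master, &slave, NULL, NULL, NULL) < 0) {
            perror("openpty");
            return 1;
        }

        if (fork() == 0) {
            /* Child: the single read/write slave fd becomes 0, 1, and 2. */
            dup2(slave, 0);
            dup2(slave, 1);
            dup2(slave, 2);
            close(master);
            close(slave);
            execlp("sh", "sh", "-c", "echo fd 0 here is read/write", (char *)NULL);
            _exit(127);
        }

        /* Parent: read back what the child wrote, via the master side. */
        n = read(master, buf, sizeof(buf));
        if (n > 0)
            fwrite(buf, 1, (size_t)n, stdout);
        return 0;
    }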

(At this point it may also have reached the point of backwards compatibility, due to ioctl() calls on terminals and ptys. I'm honestly not sure of the rules for what terminal ioctls need read and/or write permissions on the file descriptors, and I bet a bunch of other people aren't either. In that sort of environment, new programs that set up shells might be able to restrict fds 0, 1, and 2 to their correct modes but don't dare do so lest they break various shells and programs that have gotten away with being casual and uncertain.)

PS: If you want to see how a shell's or a command's file descriptors are set up, you can use lsof. The letter after each file descriptor's number tells you whether it's open read-only (r), write-only (w), or read/write (u).

FdPermissionsLimitation written at 00:29:11

2015-04-03

Understanding the (original) meaning of Unix load average

Most everyone knows the load average, and almost every system administrator knows that it's not necessarily a useful measure today. The problem is that the load average combines two measurements, as it counts both how many processes are trying to run and how many processes are currently waiting for IO to finish. This means that a machine having a big load average tells you very little by itself; do you have a lot of processes using the CPU, a lot of processes doing IO, a few processes doing very slow IO, or perhaps a bunch of processes waiting for an NFS server to come back to life?
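
For the record, here is a simplified floating-point illustration of how that number is traditionally computed (my sketch, not any kernel's actual code; the five-second sampling interval and one-minute window are just common choices, and real kernels use fixed-point arithmetic). Each sample counts runnable plus IO-waiting processes and folds the count into an exponentially decaying average:

    /* Compile with -lm for exp(). */
    #include <math.h>
    #include <stdio.h>

    #define SAMPLE_INTERVAL 5.0    /* seconds between samples */

    static double loadavg_1min;

    static void update_loadavg(int nrunnable, int niowait)
    {
        double decay = exp(-SAMPLE_INTERVAL / 60.0);
        double n = nrunnable + niowait;

        loadavg_1min = loadavg_1min * decay + n * (1.0 - decay);
    }

    int main(void)
    {
        /* Pretend 3 runnable and 2 IO-waiting processes persist for five
           simulated minutes; the 1-minute average climbs toward 5. */
        for (int i = 0; i < 60; i++)
            update_loadavg(3, 2);
        printf("1-minute load average: %.2f\n", loadavg_1min);
        return 0;
    }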

As it happens, I think there is an explanation for what the load average is supposed to mean and originally did mean, back in the early days of Unix. To put it simply, it's how soon your process would get to run.

To see how this makes sense, let's rewind time to the Vaxes that 3BSD ran on when the load average was added to Unix. On those machines, two things were true: in CPU-relative terms IO was faster than it is now, and the CPU was simply slow in general, so that doing anything much took appreciable compute time. This means that a process waiting on 'fast' disk IO is probably going to have the IO complete before you do much computation yourself, and then it's going to use enough CPU time dealing with the IO results that you'll notice, even if it's doing relatively simple processing. So runnable processes are directly contending for the CPU right now, and 'busy' processes in IO wait will be contending for it before you can do very much (and the kernel will soon be doing some amount of computing on their behalf). Both sorts of processes will delay yours, and so merging them together into a single 'load average' figure makes sense.

This breaks down (and broke down) as CPUs became much faster in an absolute sense as well as much faster than IO. Today a process doing only basic IO processing will use only tiny amounts of CPU time and your CPU-needing process will probably hardly notice or be delayed by it. This makes the number of processes in IO wait basically meaningless as a predictor of how soon a ready process can run and how much of the CPU it'll get; you can do a lot before their slow IO completes and when it does complete they often need almost no CPU time before they go back to waiting on IO again. There's almost no chance that a 'busy' process in IO wait will block your process from getting a CPU slice.

(As a side note, including some indicator of disk load into 'load average' also makes a lot of sense in a memory-constrained environment where a great deal of what you type at your shell prompt requires reading things off disk, which is what early BSDs on Vaxes usually were. A 100% unused CPU doesn't help you if you're waiting for the test(1) binary to be read in from disk in the face of 10 other processes trying to do their own disk IO.)

LoadAverageMeaning written at 04:04:02

2015-04-02

When the load average was added to Unix

For reasons beyond the scope of this entry itself, I recently became curious about when the concept of 'load average' first appeared in Unix. Fortunately we have the Unix tree from the Unix Heritage Society, so I can answer this question by digging through various historical Unix trees.

The answer appears to be that the concept of load average first shows up in 3BSD. In the 3BSD /usr/src/cmd directory, uptime.c is allegedly dated to October 4th 1979. The 3BSD kernel source for vmsched.c already has the normal definition of load average; it counts both runnable processes and processes waiting in uninterruptible sleep (which is theoretically always a short-term thing). I believe that 3BSD also marks the appearance of vmstat.

As far as I can tell, V7 Unix lacked both an uptime command and any kernel support for accumulating the information. I was going to say that V7 didn't have any way to see how loaded your system was, but it did have a basic version of iostat, and its kernel kept some degree of information about things like system versus user time, as you can see from the iostat manpage.

My personal suspicion is that 3BSD grew support for keeping track of how loaded the system was (via both uptime and the more detailed vmstat) because Berkeley started using 3BSD for undergraduate computing in student labs, where you could not simply ask your colleagues whether someone was running something big and, if so, whether they could stop it for a while. But I don't actually know if Berkeley was using 3BSD in undergrad labs this early on or if they only started doing it a few years later with 4BSD et al.

(UCB may also have wanted to have some idea of how well their new paged virtual memory system was working in practice.)

As a side note, I do like the BUGS section of the 3BSD vmstat manual page (at the end):

So many numbers print out that its sometimes hard to figure out what to watch.

This has only become more true over time.

LoadAverageOrigin written at 01:14:46

