Wandering Thoughts archives

2007-11-13

Why vfork() got created (part 2)

Although fork() theoretically copies the address space of the parent process to create the child's address space, Unix normally does not actually make a real copy. Instead it marks everything as copy on write and then waits for either the parent or the child to dirty a page, whereupon it does copies only as much as it has to. This makes forks much faster, which is very important since things in Unix fork a lot.

(Disclaimer: this is the traditional implementation for paged virtual memory. I don't know what V7 did, since it only had swapping.)

At least, that is the theory and the ideal; in this case, it was not the practice.

The second and probably real reason that vfork() exists is that when UC Berkeley was adding paged virtual memory to Unix, they couldn't get copy on write working on their cheap VAX 11/750s (although it did work on the higher end 11/780s). To avoid a fairly bad performance hit on what were already low end machines, they added vfork() (which doesn't require copy on write to run fast) and modified sh and csh to use it.

The specific problem was apparently bugs in the 750 microcode that caused writes to read-only pages in the stack to not fault correctly. One of the reasons that the 750 was cheaper than the 780 was that it did a number of things in microcode that the 780 did in hardware, which explains why 780s didn't have this problem.

(My source for the details is a message from John Levine.)

WhyVforkII written at 22:48:23

2007-11-11

Why vfork() got created (part 1)

The fork() system call presents a problem for strict virtual memory overcommit. Because it duplicates a process's virtual address space, strictly correct accounting requires that the child process be instantly charged for however much committed address space the parent process has; if this puts the system over the commit limit, the fork() should fail.

At the same time, in practice most fork() child processes don't use very much of the memory that they're being charged for; almost all the time they touch only a few pages and then throw everything away by calling exec(). Failing such a fork() because the strict accounting says you should is very irritating to people; it is the system using robot logic. But at the time of the fork(), the system has no way of telling parsimonious 'good' child processes that will promptly exec() from child processes that are going to stick around and do a lot of work and use a lot of their committed address space.

The answer (I will not call it a solution) is vfork(), which is a fork() that doesn't require the kernel to charge the child process for any committed address space. This allows large processes to spawn other programs without running into artificial limits, at the cost of being a hack itself.

(In order to make this work the child can't actually get any pages of its own; instead it gets to use the parent's pages, and to make this work the parent process gets frozen until the child exits or exec()s.)

Actually this is a bit of a lie, because it is only half the reason that vfork() exists. But that's another entry, because this one is already long enough.

WhyVforkI written at 22:28:43

2007-11-07

Understanding the virtual memory overcommit issue

First, a definition: the committed address space is the total amount of virtual memory that the kernel might have to supply real memory pages for, either in swap space or in RAM. In other words, this is how much memory the kernel has committed to supplying to programs if they all decide to touch all of the memory they've requested from the kernel.

(This is less than the total amount of virtual memory used in the system, since some things, like program code and memory mapped files, don't need swap space.)

In the old days, how much committed address space a Unix kernel would give out was simple but limited: the amount of swap space you had. When people started moving beyond this, they ran into two issues:

  • the kernel needs some amount of memory for itself in order to operate.

  • programs do not necessarily use all of the memory that they've requested from the kernel, especially when the request is sort of implicit (such as when a process fork()s).

If we could ignore both issues, the committed address space the kernel should give out would be simple: the sum of physical memory plus swap space. Since we can't, the question is how much we should adjust the number for each issue. Unfortunately both issues are unpredictable; they depend on what you're doing with your system and on how cautious you need to be about never hitting a situation where the kernel has overcommitted memory. So there is no universal answer, only heuristics and tuning knobs, and the various Unixes have wound up making different choices.

Note that these are choices. While people sometimes argue back and forth about them, the overall problem is a hard one and there is no universal right answer for what committed address space limit to use and how to behave in the face of overcommit.

Sidebar: the results of running out

If the kernel runs into its limit on committed address space it starts giving errors when asked to do operations that require more, so programs stop being able to do things like malloc() memory or fork() or start new processes with big writeable data areas. If the kernel discovers that it has overcommitted itself it is generally forced to start killing processes when they try to use pages of memory that the kernel can't actually supply at the moment.

(Sometimes the kernel winds up in a worse situation, if for example it needs memory for its own use but can't get it. This can lock up an entire machine instead of just killing processes.)

Programmers and system administrators generally prefer the former to the latter; it is a lot easier to cope with malloc() failing than random processes getting abruptly killed. At the same time they want failures to only happen when the system is genuinely out of memory, not when the kernel is just being conservative.

MemoryOvercommit written at 23:23:15
