Wandering Thoughts archives

2007-09-26

Thinking about why Apache waits for CGIs to close standard output

This month, someone came to the local Unix group with an issue: they were using a CGI program to kick off a background process, but when they visited the CGI's URL in their web browser their browser just sat there, spinning the little 'I am fetching the page' throbber. After the dust had settled and the right advice had been given (the background process needed to have its standard input, standard output, and standard error redirected to /dev/null), I got to thinking about why Apache is not more operator-friendly here.

(What is happening is that Apache is waiting to see end of file on the CGI's output, which is not coming because the background process still has a copy of it and might write something to it someday.)
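The effect can be reproduced without Apache at all. The following is a minimal sketch, assuming a Unix system with /bin/sh and a sleep command; the helper name time_to_eof and the two script strings are made up for illustration. The parent process stands in for Apache and a short shell script stands in for the CGI.

```python
import subprocess
import time

# Stand-in "CGI" scripts (hypothetical). Both start a background process and exit at once.
LEAKY = "echo started; sleep 2 &"
FIXED = "echo started; sleep 2 </dev/null >/dev/null 2>&1 &"


def time_to_eof(script: str) -> tuple[bytes, float]:
    """Run script via /bin/sh and report its output and how long EOF took to arrive."""
    start = time.monotonic()
    with subprocess.Popen(
        ["/bin/sh", "-c", script],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
    ) as proc:
        # read() returns only once every process holding the write end
        # of the pipe has closed it, not merely when the shell exits.
        output = proc.stdout.read()
    return output, time.monotonic() - start


if __name__ == "__main__":
    for name, script in (("leaky", LEAKY), ("fixed", FIXED)):
        output, elapsed = time_to_eof(script)
        print(f"{name}: {output!r} after {elapsed:.1f}s")
```

In the leaky case the shell exits right away but the reader still waits until the background sleep finishes, because sleep inherited the write end of the pipe. In the fixed case the redirections mean sleep never holds the pipe, so end of file arrives as soon as the shell exits.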

Because the CGI talks to Apache instead of directly to the network, Apache could in principle notice when the CGI died and properly finish off the HTTP reply itself. But it's worth thinking about the conditions under which Apache could do that.

Apache can clearly finish off the HTTP reply when it sees end of file on the CGI's output; whether or not the CGI is still running, it can no longer add anything more to the HTTP reply.

However, Apache can't just finish off the HTTP reply when it sees that the CGI has died, because the CGI might just have written a bunch of output that Apache hasn't processed yet. The real condition Apache would have to check is that the CGI has died and there is no more pending data on the CGI's output.

So, although the idea looks simple, the actual condition is less simple (and it is subject to races if the CGI started a background process that actually is producing output for the HTTP reply).
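To make the 'CGI has died and there is no pending data' condition concrete, here is a rough sketch of a reader that stops on either real end of file or that combined condition. This is not how Apache works; the function name and polling interval are invented for illustration, and it assumes a Unix system with /bin/sh.

```python
import os
import select
import subprocess


def read_until_exit_and_drained(script: str, poll_interval: float = 0.05) -> bytes:
    """Collect a stand-in CGI's output, stopping at EOF or at 'exited and nothing pending'."""
    proc = subprocess.Popen(
        ["/bin/sh", "-c", script],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
    )
    fd = proc.stdout.fileno()
    os.set_blocking(fd, False)
    chunks = []
    while True:
        # Check for exit BEFORE checking for data: anything the CGI itself
        # wrote before dying is then guaranteed to be visible to select().
        exited = proc.poll() is not None
        timeout = 0 if exited else poll_interval
        readable, _, _ = select.select([fd], [], [], timeout)
        if readable:
            data = os.read(fd, 65536)
            if data == b"":
                break  # genuine end of file: nobody can add to the reply
            chunks.append(data)
            continue
        if exited:
            break  # the CGI is dead and the pipe is empty right now
    proc.stdout.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(read_until_exit_and_drained("echo hello; sleep 2 &"))
```

The order of the two checks matters; looking for data first and the exit second would leave a window where the CGI writes its last output and dies in between. Note that this sketch silently drops anything a background process writes later, which is exactly the race mentioned above.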

Sidebar: explaining the whole situation clearly

As an aside, the question also got me thinking about how to clearly explain what was going on in the whole situation. I think the best approach is to start with the idea of abstract communication channels between Apache and the CGI (and between the browser and the web server), and talk about how each side closes down its communication channel to signal that it is done.

(Then you can use shell scripts as CGIs to show that the programs run by the shell as part of the script clearly must have access to the channels, and a background process is just a program that hasn't exited.)

WhyApacheCGIWait written at 22:55:43

2007-09-17

How mmap(2) requires a unified buffer cache

In a previous entry I mentioned that Sun's addition of mmap() basically forced their hand on having a unified buffer cache. Today I feel like elaborating on that.

The problem with having mmap() and not a unified buffer cache is page coherence. If you have some programs using mmap() and some using regular read() and write(), you will wind up with two copies of pages, one mapped into process memory and one in the buffer cache. Because virtual memory and the buffer cache are not unified, there is nothing that keeps these two copies in sync with each other; programs will see an unpredictable mix of new and old data, depending on what pages got forced out of virtual memory when.
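On a system that does have a unified buffer cache, the two access paths see each other's changes immediately. The following is a small sketch of what that coherence looks like from user space, assuming a Unix system where a shared file mapping and read()/write() are backed by the same pages (true of modern Linux and the BSDs, as far as I know); the function name is made up for illustration.

```python
import mmap
import os
import tempfile


def mmap_and_read_agree() -> tuple[bytes, bytes]:
    """Change a file through mmap() and through write(), checking each via the other path."""
    with tempfile.TemporaryFile() as f:
        f.write(b"old data" + b"\0" * 4088)  # 4096 bytes in total
        f.flush()
        fd = f.fileno()
        # On Unix, Python's mmap defaults to a shared, read-write mapping.
        with mmap.mmap(fd, 4096) as m:
            m[0:8] = b"NEW data"            # store through the mapping
            via_read = os.pread(fd, 8, 0)   # look at it through the read() path
            os.pwrite(fd, b"newer!!!", 0)   # store through the write() path
            via_map = bytes(m[0:8])         # look at it through the mapping
    return via_read, via_map


if __name__ == "__main__":
    print(mmap_and_read_agree())
```

With separate, unsynchronized copies of the page, either of the two values could come back as stale data instead.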

(A related problem is finding already-mapped pages so that you can share mappings across processes, which means you're going to need some sort of mapping index anyways.)

Since you want to let people mmap() more file pages than you have buffer cache, you can't just have mmap() use the buffer cache to hold mapped-in file pages. You could reuse your mapping index to create coherence without technically having a unified buffer cache, but I think that there would be various issues, and at that point you're so close to a unified buffer cache that you might as well go the rest of the way.

(The one benefit of not unifying the buffer cache is that at least theoretically you have a clear way to avoid file IO eating your virtual memory system.)

UnifiedCacheMmap written at 22:32:07

