How mmap(2) requires a unified buffer cache

September 17, 2007

In a previous entry I mentioned that Sun's addition of mmap() basically forced their hand on having a unified buffer cache. Today I feel like elaborating on that.

The problem with having mmap() without a unified buffer cache is page coherence. If some programs use mmap() and others use regular read() and write() on the same file, you wind up with two copies of its pages: one mapped into process memory and one in the buffer cache. Because virtual memory and the buffer cache are not unified, nothing keeps these two copies in sync with each other; programs will see an unpredictable mix of new and old data, depending on which pages were forced out of virtual memory and when.
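To make the coherence requirement concrete, here is a small sketch of what programs expect to be able to do. It assumes a Unix-like system with a unified cache (such as a modern Linux) and Python 3; the function name demo_coherence is made up for illustration. It changes a file once through a shared mapping and once through the file descriptor, and reads each change back the other way.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

def demo_coherence():
    """Modify a file via a shared mapping and via pwrite(), reading each back the other way."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * PAGE)        # populate one page through write()
        with mmap.mmap(fd, PAGE) as m:   # a shared mapping is the default on Unix
            # Store through the mapping, then read through the file descriptor.
            m[0:5] = b"hello"
            via_read = os.pread(fd, 5, 0)
            # Write through the file descriptor, then load through the mapping.
            os.pwrite(fd, b"world", 5)
            via_map = bytes(m[5:10])
        return via_read, via_map
    finally:
        os.close(fd)
        os.unlink(path)

print(demo_coherence())
```

With a unified cache both paths touch the same physical page, so each side sees the other's change immediately. With separate copies, either read could return the old "A" bytes until something forced the two copies to be reconciled.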

(A related problem is finding already-mapped pages so that you can share mappings across processes, which means you're going to need some sort of mapping index anyway.)
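As a rough illustration of what such a mapping index does, here is a toy model in Python. It is not how any real kernel implements this; PageIndex, lookup, and the "inode-42" identifier are all hypothetical names. The point is only that lookups are keyed by file identity and page number, so two users of the same file page get the same in-memory page.

```python
class PageIndex:
    """Toy index from (file id, page number) to one shared in-memory page."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.pages = {}

    def lookup(self, file_id, offset):
        # Any byte offset within a page resolves to the same key.
        key = (file_id, offset // self.page_size)
        if key not in self.pages:
            # Stand-in for reading the page in from disk.
            self.pages[key] = bytearray(self.page_size)
        return self.pages[key]

index = PageIndex()
a = index.lookup("inode-42", 0)      # first "process" maps page 0
b = index.lookup("inode-42", 100)    # second "process" hits the same page
a[0:2] = b"hi"                       # a change made by one is visible to the other
other = index.lookup("inode-42", 4096)  # the next page is a distinct object
```

Once every access to file data goes through one index like this, whether it comes from mmap() or from read() and write(), there is only one copy of each page and nothing left to keep in sync.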

Since you want to let people mmap() more file pages than fit in the buffer cache, you can't just have mmap() use the buffer cache to hold mapped-in file pages. You could reuse your mapping index to create coherence without technically having a unified buffer cache, but I think that there would be various issues, and at that point you're so close to a unified buffer cache that you might as well go the rest of the way.

(The one benefit of not unifying the buffer cache is that at least theoretically you have a clear way to avoid file IO eating your virtual memory system.)


Last modified: Mon Sep 17 22:32:07 2007