== A little modern Unix twitch

Every so often, I just want to read a file (or a bunch of files) without doing anything to them. I have all sorts of reasons for this; sometimes I want to prime the OS's disk cache, or I want to time the file read speed, or I want to put some IO load on the system, or any number of other reasons (the worst is to just update the file atimes). The common element is that I don't care what happens to the file data after it gets read off the disk, so I dump it in _/dev/null_.

When I do this these days, I never do the redirection to _/dev/null_ in the same process that is doing the reading; instead, I always feed things through a pipe. In other words, instead of running:

> _cat file >/dev/null_

I run:

> _cat file | cat >/dev/null_

This is completely wasteful and annoying, but the problem with not doing this is that far too many commands and Unix systems are too smart for their own good these days; there are all sorts of things that notice you are writing to _/dev/null_ and optimize away all of the read IO that I want to happen. Putting a pipe in the middle kills all of those optimizations, because no matter how optimized the writer is, the data has to go across the pipe, which means that the reader has to actually *read* it.

Sometimes this is unnecessary paranoia, but it's easier to be paranoid and slightly inefficient all of the time than to try to remember when I can be completely efficient and when I can't be. (It's not as if an extra _cat_ process really matters on any modern system.)

=== Sidebar: How this optimization can happen naturally

I can't swear that I'm remembering something that actually happened in a real Unix, but here's an example of how this sort of stuff can get optimized without any individual component being too crazy. You need two pieces:

* a version of _cat_ that prefers to work by _mmap()_'ing each source file and then _write()_'ing it to the output in one go.
This is less absurd than it sounds; when _mmap()_ was first introduced, a lot of people became very enthused about using it on everything (which sometimes led to fun bugs when these programs were asked to work on something that couldn't be _mmap()_'d). You can even argue that this version of _cat_ is better because it doesn't try to guess the right buffer size; it just defers everything to the operating system. (If the kernel has bits of the file in kernel buffers, it can even do 'zero copy' IO, where it doesn't have to copy things to user level on a _read()_ only to immediately copy them back into the kernel on the following _write()_.)

* a kernel that optimizes _write()_ calls to _/dev/null_ by not actually copying data from the user-level process into the kernel only to then discard it; instead, it just checks that the buffer given to it is a valid one and then returns immediate success.

When the file is _mmap()_'d, nothing is immediately read from disk; instead, the reads happen when the mapped pages are touched and produce virtual memory faults. If you wrote to a real file, this would happen when the _write()_ started copying data from your process into kernel buffers; however, because the _write()_ to _/dev/null_ never does this copy, it never causes any page faults on the mapped pages and thus never does any IO to read the source file. Ergo, '_cat file >/dev/null_' does nothing real and runs startlingly fast.

It's hard to argue with either of these optimizations in isolation (apart from the whole issue of hitting everything with the _mmap()_ stick), but when they combine you get an unfortunate result.
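To make the first piece concrete, here's a minimal sketch of an mmap-happy _cat_. The choice of Python, the name _mmap_cat()_, and the _read()_-based fallback are all my own illustrative assumptions, not code from any real _cat_. It maps the whole file with _mmap.mmap()_ and hands a _memoryview_ of the mapping straight to _os.write()_; slicing the mapping itself would copy the bytes into a new object and so fault in every page, which is exactly the read this sort of _cat_ never does on its own.

```python
import mmap
import os
import sys

# Hypothetical illustration of an "mmap() everything" cat; not the source
# of any real cat implementation.


def _copy_by_read(f, out_fd: int, bufsize: int = 64 * 1024) -> int:
    """Fallback for files that can't be mmap()'d: a plain read()/write() loop."""
    total = 0
    while True:
        data = f.read(bufsize)
        if not data:
            return total
        view = memoryview(data)
        while len(view) > 0:
            n = os.write(out_fd, view)  # write() may be short; keep going
            view = view[n:]
            total += n


def mmap_cat(path: str, out_fd: int) -> int:
    """Copy one file to out_fd by mapping it and passing the mapping to write().

    Returns the number of bytes written. This code never reads the file data
    itself; the mapped pages are only faulted in if the kernel's write()
    actually copies from them.
    """
    with open(path, "rb") as f:
        try:
            # A length of 0 means "map the whole file".
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except (ValueError, OSError):
            # Empty files, pipes, and many devices can't be mapped; this is
            # where over-enthusiastic mmap() users used to grow bugs.
            return _copy_by_read(f, out_fd)
        with mm:
            whole = memoryview(mm)  # a view, not a copy: no page is touched here
            written = 0
            try:
                while written < len(whole):
                    chunk = whole[written:]
                    try:
                        written += os.write(out_fd, chunk)
                    finally:
                        chunk.release()  # drop exports so mm can be closed
            finally:
                whole.release()
            return written


if __name__ == "__main__":
    # Usage: python3 mmap_cat.py FILE... >/dev/null
    for name in sys.argv[1:]:
        mmap_cat(name, sys.stdout.fileno())
```

Run as '_python3 mmap_cat.py file >/dev/null_', whether the file is ever read depends on how the kernel handles _write()_ to _/dev/null_. Run as '_python3 mmap_cat.py file | cat >/dev/null_', the _write()_ into the pipe has to copy the bytes, so the mapped pages get faulted in and the file really is read.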