2011-11-27
Why processing things in inode order is a good idea
In a note on yesterday's entry on readdir()'s ordering, a commentator wrote (in part):
Note many utils use FTS which sorts directory entries by inode [...]
It may not be obvious why this is a good thing to do, so let me take a shot at it.
Suppose that you want to do something that either looks at or
touches the inodes of the files in a directory; perhaps your ls
or
find
needs to stat()
them, or your chmod
needs to change their
permissions. What is the right order to process the files in?
As always, modern disks are seek limited. You can't do anything to change how many seeks it takes to read the directory (or in general how fast it happens), because you don't control anything about the order that you get directory entries in; as discussed last entry, the kernel returns directory entries in whatever it wants to. But you can control what order the kernel reads inodes. So we want to ask for inodes in whatever order minimizes seeks.
In general, you don't know exactly how a filesystem organizes where inodes go on the disk and they are usually not all in one contiguous area but scattered in various spots over the disk. However, filesystems have historically put inodes on the disk in increasing order of inode number; you can be pretty certain that inode X+1000 is in a block that is after the block that inode X is in (at least as far as logical block numbers go). Asking for inodes in increasing numerical order thus at least means that the disk only seeks forward (and probably the minimum distance possible) and it maximizes the chances that the kernel will be able to read several inodes of interest in one block. Asking for inodes in any other order increases the chances that the kernel will have to seek back and forth over the disk to give them to you.
(There are some filesystems where this is no longer true, primarily filesystems (such as ZFS) which never rewrite things in place. That means that every time an inode is modified it has to be written to a new place on disk, which means that the (fixed) inode number of a file has no bearing on where on disk the inode has wound up.)