About the order that readdir()
returns entries in
In a mostly unrelated article (seen via Planet Sysadmin) I recently noticed the following:
readdir(3) just returns the directory entries in the order they are linked together, which is also not related to inode numbering but as best as I can tell is from outer leaf inwards (since the most recently created file is listed first).
On most systems, readdir(3)
is a modestly warmed over
version of the underlying 'read directory entries'
system call, and returns directory entries in the same order that the
system call does. In theory a Unix kernel can return directory entries
in whatever order it wants (including, say, sorted alphabetically). In
practice kernels almost always give you directory entries in what I will
call 'traversal order', whatever the natural order is for entries in the
on-disk data structures that represent a directory.
For a very long time, Unix directories on disk were simple linear arrays (first with fixed-size entries and then with variable sized ones when BSD introduced long filenames). When a new entry was added, it was generally put in the first spot where there was room; at the end if none of the directory's entries had been deleted, and perhaps earlier if a filename of a suitable length had been deleted earlier. The kernel read directories in forward linear order, starting at the first block of the directory and going up, and so returned entries in this order.
(In the original simple Unix filesystems, inode numbers were also allocated in a straightforward 'first free number' order, so the order of directory entry creation could correspond quite well to inode order. The Berkeley FFS changed this somewhat by allocating inodes in a more scattered manner.)
Modern Unix systems commonly use some sort of non-linear directories under at least some circumstances (a linear data structure may still be more efficient for small directories); generally these are some variant of balanced trees. The natural traversal order is tree order, but what that is is very implementation dependent. I believe it's common to hash filenames and then insert entries into the tree in hash order, but hashes (and thus the hash order) vary tremendously between filesystems, and I'm sure that somewhere there is a filesystem that doesn't hash names and just inserts them straight in some order.
(Because this is a per-filesystem thing, it follows that the traversal order can be different for different directories on the same system even if they have the same entries created in the same order, either because the directories are using different filesystem types or just because some parameters were set differently on the different filesystems.)
|
|