== A _find_ optimization and a piece of history, all in one

One of the floating pieces of modern Unix lore is that if you are doing a _find_ that matches against both filenames and other properties of the file, it's best to put the filename match first. That is, if you want to find zero-sized object files the right order is:

> _find . -name '*.o' -size 0 -print_

I called this a piece of modern Unix lore for good reason; this wasn't necessarily true in the old days (and even today it isn't always true, depending on the filesystem and how smart your version of _find_ is).

First, let's cover why this can be the faster order. When _find_ is processing a given directory entry it already has the name, but it doesn't know the file size; to find out the file size it would have to _stat()_ the file, which takes an extra system call and possibly an extra disk read IO. So if _find_ can make a decision on the directory entry just by checking its name, it can save a _stat()_.

But wait. In order to properly traverse a directory tree, _find_ needs to know if a directory entry is a subdirectory or something else, and in the general case that takes a _stat()_. This gets us back to being just as slow, because regardless of the order of _find_ operations _find_ is going to have to _stat()_ the name sooner or later just to find out if it needs to _chdir()_ into it. So how can _find_ still optimize this?

(There are some clever optimizations that _find_ can do under some circumstances, but we'll skip those for now.)

What happened is that a while back, Unix filesystem developers realized that it was very common for programs reading directories to need to know a bit more about directory entries than just their names, especially their file types (_find_ is the obvious case, but also consider things like '_ls -F_').
Given that the type of an active inode never changes, it's possible to embed this information straight in the directory entry and then return it to user level, and that's what developers did; on some systems, _readdir(3)_ will now return directory entries with an additional ((d_type)) field that has the directory entry's type.

(This required changes to both filesystems, to embed the information in the on-disk directory format, and the system call API, to get it to user space. Hence it only works on some filesystems on some versions of Unix.)

Given ((d_type)), _find_ can completely avoid _stat()_'ing directory entries if it only needs to know their name or their type. However, it has to _stat()_ the directory entry if it needs to know more information, such as the size.

(And if the ((d_type)) of directory entries ever gets corrupted, you can get [[very odd results ../linux/FSCorruptionAndDType]].)
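
To make the ordering concrete, here is a minimal Python sketch of the same idea. It is not how _find_ itself is implemented. It relies on Python's _os.scandir()_, which is documented to use the directory entry's type information where the operating system and filesystem provide it and to fall back to a _stat()_ otherwise. The names ((find_empty)) and ((demo)) are made up for this illustration, and the counter only records how many entries the code asked the size of, not how many system calls the kernel actually saw.

```python
import fnmatch
import os
import tempfile


def find_empty(root, pattern):
    """Return (paths, size_lookups) for empty regular files under root
    whose names match pattern. Hypothetical helper, for illustration only."""
    matches = []
    size_lookups = 0
    pending = [root]
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Type check: answered from the directory entry's type
                # where available, otherwise this falls back to a stat.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                    continue
                # Cheap test first: the name is already in the entry.
                if not fnmatch.fnmatchcase(entry.name, pattern):
                    continue
                if not entry.is_file(follow_symlinks=False):
                    continue
                # Expensive test last: the size needs stat information.
                size_lookups += 1
                if entry.stat(follow_symlinks=False).st_size == 0:
                    matches.append(entry.path)
    return sorted(matches), size_lookups


def demo():
    """Build a small throwaway tree and search it for empty '*.o' files."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        files = [
            ("a.o", b""),
            ("b.o", b"x"),
            ("c.txt", b""),
            (os.path.join("sub", "d.o"), b""),
        ]
        for rel, data in files:
            with open(os.path.join(root, rel), "wb") as f:
                f.write(data)
        matches, size_lookups = find_empty(root, "*.o")
        return [os.path.relpath(m, root) for m in matches], size_lookups


if __name__ == "__main__":
    print(demo())
```

In the throwaway tree that ((demo)) builds, ((c.txt)) is rejected on its name alone, so only the three ((.o)) files ever have their size looked up. Swapping the name test and the size test would give the same matches but would ask for the size of every non-directory entry.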