Wandering Thoughts archives

2010-11-08

A find optimization and a piece of history, all in one

One of the floating pieces of modern Unix lore is that if you are doing a find that matches against both filenames and other properties of the file, it's best to put the filename match first. That is, if you want to find zero-sized object files the right order is:

find . -name '*.o' -size 0 -print

I called this a piece of modern Unix lore for good reason; this wasn't necessarily true in the old days (and even today it isn't always true, depending on the filesystem and how smart your version of find is).

First, let's cover why this can be the faster order. When find is processing a given directory entry it already has the name, but it doesn't know the file size; to find out the size it has to stat() the file, which costs an extra system call and possibly an extra disk read. Since find's tests are implicitly ANDed together and evaluated left to right, a failed -name test means find never evaluates -size for that entry. So if find can rule out a directory entry just by checking its name, it can skip the stat().
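To make the saving concrete, here is a minimal Python sketch (Python is just the illustration language here; find itself is C) that evaluates the equivalent of `-name '*.o' -size 0` on a single directory in both orders and counts how many stat() calls each order needs. The helper names and the counter are hypothetical instrumentation, not anything find exposes, and the sketch models only the per-entry tests, not the directory traversal.

```python
import fnmatch
import os
import tempfile


def entry_matches(dirpath, name, name_first, stat_count):
    """Evaluate the equivalent of `-name '*.o' -size 0` for one entry.

    stat_count is a one-element list used as a (hypothetical) counter
    of how many stat() calls the evaluation needed.
    """
    def name_test():
        # Needs only the name, which readdir() already gave us.
        return fnmatch.fnmatchcase(name, "*.o")

    def size_test():
        # Needs the inode, so it costs an lstat() system call.
        # find's -size 0 (512-byte blocks, rounded up) means st_size == 0.
        stat_count[0] += 1
        return os.lstat(os.path.join(dirpath, name)).st_size == 0

    # Python's `and` short-circuits the same way find's implicit -a does.
    if name_first:
        return name_test() and size_test()
    return size_test() and name_test()


def scan(dirpath, name_first):
    """Return (matching names, number of stat() calls) for one directory."""
    stat_count = [0]
    hits = [name for name in sorted(os.listdir(dirpath))
            if entry_matches(dirpath, name, name_first, stat_count)]
    return hits, stat_count[0]


def demo():
    """Compare both orders on a scratch directory with three files."""
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "a.o"), "w").close()        # empty object file
        with open(os.path.join(d, "b.c"), "w") as f:     # non-empty source
            f.write("int x;\n")
        open(os.path.join(d, "notes.txt"), "w").close()  # empty, wrong name
        return scan(d, name_first=True), scan(d, name_first=False)
```

Here `demo()` returns `((['a.o'], 1), (['a.o'], 3))`: both orders find the same file, but the name-first order does one stat() instead of three.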

But wait. In order to properly traverse a directory tree, find needs to know whether a directory entry is a subdirectory or something else, and in the general case that takes a stat(). This gets us back to being just as slow: regardless of the order of the tests, find is going to have to stat() every entry sooner or later just to find out whether it needs to chdir() into it. So how can find still optimize this?
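The cost described above can be sketched in Python as a walker that only has names to go on. It is a simplified model under stated assumptions: `os.listdir()` stands in for a readdir() without a usable d_type, full paths stand in for find's chdir(), and the counter is hypothetical instrumentation. Every entry gets an lstat() whether or not any -name test would reject it.

```python
import os
import stat
import tempfile


def walk_with_lstat(top, stat_count):
    """Yield every path under top, as a find without d_type must.

    os.listdir() returns bare names, so deciding whether to descend
    costs one lstat() per entry, whatever tests the find expression
    would run afterwards. stat_count is a one-element counter list.
    """
    for name in sorted(os.listdir(top)):
        path = os.path.join(top, name)
        yield path
        stat_count[0] += 1
        mode = os.lstat(path).st_mode  # lstat: don't follow symlinks
        if stat.S_ISDIR(mode):
            yield from walk_with_lstat(path, stat_count)


def demo():
    """Walk a scratch tree with three files and one subdirectory."""
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, "sub"))
        for rel in ("a.o", "notes.txt", os.path.join("sub", "b.o")):
            open(os.path.join(d, rel), "w").close()
        count = [0]
        paths = [os.path.relpath(p, d) for p in walk_with_lstat(d, count)]
        return paths, count[0]
```

On Linux, `demo()` returns `(['a.o', 'notes.txt', 'sub', 'sub/b.o'], 4)`: all four entries cost an lstat(), including `notes.txt`, which a `-name '*.o'` test would have rejected from its name alone.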

(There are some clever optimizations that find can do under some circumstances, but we'll skip those for now.)

What happened is that a while back, Unix filesystem developers realized that it was very common for programs reading directories to need to know a bit more about directory entries than just their names, especially their file types (find is the obvious case, but also consider things like 'ls -F'). Since the type of an active inode never changes, it's possible to embed this information directly in the directory entry and return it to user level, and that's what developers did; on some systems, readdir(3) now returns directory entries with an additional d_type field that holds the entry's file type.

(This required changes to both the filesystems, to store the type in their on-disk directory entries, and the system call API, to get it to user space. Hence it only works on some filesystems on some versions of Unix.)

Given d_type, find can completely avoid stat()'ing directory entries if it only needs to know their name or their type. However, it has to stat() the directory entry if it needs to know more information, such as the size.
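Here is a minimal Python sketch of `find top -name '*.o' -size 0 -print` that takes advantage of d_type. It relies on `os.scandir()`, whose `DirEntry.is_dir()` is documented as usually needing no system call on Unix because it uses the d_type from readdir(); on filesystems that report `DT_UNKNOWN` it falls back to a stat() internally. `DirEntry.stat()` always costs a system call on Unix, so it is only reached for entries whose names already match. The function names are illustrative, not taken from any real find.

```python
import fnmatch
import os
import tempfile


def find_empty_objects(top):
    """Sketch of `find top -name '*.o' -size 0 -print` using d_type.

    Traversal decisions use entry.is_dir(), which normally reads
    d_type; only name matches pay for entry.stat() to get the size.
    """
    found = []
    with os.scandir(top) as it:
        for entry in sorted(it, key=lambda e: e.name):
            # Name test first; stat() only if the name matches.
            if (fnmatch.fnmatchcase(entry.name, "*.o")
                    and entry.stat(follow_symlinks=False).st_size == 0):
                found.append(entry.path)
            # Descend using the type from d_type (no symlink following).
            if entry.is_dir(follow_symlinks=False):
                found.extend(find_empty_objects(entry.path))
    return found


def demo():
    """Run the sketch on a small scratch tree."""
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, "sub"))
        for rel in ("a.o", "notes.txt", os.path.join("sub", "b.o")):
            open(os.path.join(d, rel), "w").close()
        with open(os.path.join(d, "sub", "c.o"), "w") as f:
            f.write("not empty")
        return [os.path.relpath(p, d) for p in find_empty_objects(d)]
```

On Linux, `demo()` returns `['a.o', 'sub/b.o']`. Only the three `.o` entries are stat()'d; `notes.txt` and `sub` are handled from their names and d_type alone.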

(And if the d_type of directory entries ever gets corrupted, you can get very odd results.)

unix/FindAndDTypeOptimization written at 23:41:54

When Linux's rp_filter might make sense

I wrote a grumpy entry about the net.ipv4.conf.*.rp_filter setting back here, where I said that it didn't make any sense. Well, I can actually come up with one situation where it may make sense: virtualization, and specifically the private networks that virtual machines live on.

One relatively common virtual machine setup is NAT-based, where the virtual machines get IP addresses on a private virtual network on the host. While the host doesn't route to its virtual network (or networks, if you have a big enough virtualization setup), it may be listening for various guest related services on its internal IP address. You don't want these services to be reachable from outside the machine; instead, you really do want the machine's private virtual network to be an isolated network. Using the rp_filter setting is the easy way to achieve this, and its drawbacks are irrelevant because the isolated network is both private and disconnected from any other real machine.
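To make concrete what the setting actually checks, here is a simplified Python model of strict reverse path filtering (rp_filter=1; 0 disables the check and 2 is the looser mode). A packet is accepted only if the route back to its source address would leave through the interface the packet arrived on. The routing table, the `virbr0` interface name, and the 192.168.122.0/24 network are hypothetical examples (they resemble a common libvirt NAT setup), and the model ignores policy routing, multipath routes, IPv6, and everything else the kernel's real check handles.

```python
import ipaddress

# Hypothetical routing table for a NAT-based virtualization host:
# the default route goes out eth0, the private guest network is on virbr0.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "eth0",
    ipaddress.ip_network("192.168.122.0/24"): "virbr0",
}


def route_for(addr):
    """Longest-prefix-match lookup of an IPv4 address in ROUTES."""
    addr = ipaddress.ip_address(addr)
    best = max((net for net in ROUTES if addr in net),
               key=lambda net: net.prefixlen)
    return ROUTES[best]


def strict_rp_accept(src, in_iface):
    """Simplified strict reverse path check: accept the packet only if
    replies to its source would go out the interface it came in on."""
    return route_for(src) == in_iface
```

With this table, a guest address arriving on `virbr0` is accepted, while the same guest address arriving on `eth0` (a spoofed packet from outside) is dropped, as is an outside address showing up on `virbr0`.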

(Well, most of the drawbacks. Interesting things may happen if your guest virtual machines try to talk to the real IP address of your host.)

While you can use IP filtering rules or policy-based routing to achieve the same effects, both require you to know the IP addresses of some or all of the things involved; this means you need to generate the rules for each machine and update them whenever its IP addresses or connectivity change. Using rp_filter is a lazy, blunt hammer that spares you all of this.

However, I still believe that it's a mistake to turn rp_filter on by default. At least right now, I think that most users are not going to use virtualization, so the better approach (although it takes slightly more work) is to turn rp_filter on only when people configure virtualization.
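Turning rp_filter on only for the virtualization network could look something like the following minimal Python sketch, under stated assumptions. It relies on the rule in the kernel's ip-sysctl documentation that the value actually used on an interface is the larger of `conf/all/rp_filter` and `conf/<interface>/rp_filter`, so a per-interface setting only takes effect selectively if `all` stays at 0. The `virbr0` name is a hypothetical example, the helper names are illustrative, writing to the real `/proc` tree needs root, and `make_fake_conf()` exists only so the helpers can be tried without touching the system.

```python
import tempfile
from pathlib import Path

PROC_CONF = Path("/proc/sys/net/ipv4/conf")


def effective_rp_filter(iface, root=PROC_CONF):
    """Return the rp_filter mode applied on iface: the larger of
    conf/all/rp_filter and conf/<iface>/rp_filter."""
    all_mode = int((root / "all" / "rp_filter").read_text())
    if_mode = int((root / iface / "rp_filter").read_text())
    return max(all_mode, if_mode)


def enable_strict_rp_filter(iface, root=PROC_CONF):
    """Turn on strict mode (1) for one interface only."""
    (root / iface / "rp_filter").write_text("1\n")


def make_fake_conf(modes):
    """Build a throwaway directory shaped like /proc/sys/net/ipv4/conf.

    The directory is left behind; it is only for experimenting.
    """
    root = Path(tempfile.mkdtemp())
    for iface, mode in modes.items():
        (root / iface).mkdir()
        (root / iface / "rp_filter").write_text(f"{mode}\n")
    return root
```

For example, with `root = make_fake_conf({"all": 0, "eth0": 0, "virbr0": 0})`, calling `enable_strict_rp_filter("virbr0", root)` makes the effective mode 1 on `virbr0` while `eth0` stays at 0. If `all` were already 1, every interface would be filtered regardless.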

(The counter-argument to this is that having rp_filter on all the time prevents the situation where you turn on virtualization and suddenly your networking explodes. Of course it does this by making your networking explode from the start, but you're more likely to notice that.)

linux/MaybeSensibleRpfilter written at 00:26:19



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.