A sudden realization about Unix access time updates and disk mirrors

August 21, 2010

I have a small confession: we have only recently turned off file access time updates on our fileservers. While turning off atime is one of the most common pieces of advice for speeding up overall filesystem performance (see here for one explanation why), we haven't felt that our filesystems were particularly slow, and sysadmins famously have a lot of inertia.

(What spurred me into action was being exposed to a bunch of stuff on ZFS performance tuning, which caused me to start thinking again about some long-standing minor performance issues we have from time to time.)

In the process of doing this I came to a belated realization about atime updates on filesystems that are mirrored:

Atime updates steal IOPs from all of your disks.

(In this they are not unique; all write IO does this.)

If you have an IO load that is primarily random reads (which we do overall), your limiting factor is IO operations per second; a modern disk is seek limited and can give you only somewhere between 100 and 150 IOPs. The usual assumption about random reads is that on average they'll keep all disks equally busy, so with N disks you get ~100 * N IOPs out of your overall system.

But on a mirrored system, writes have to be written to all of the disks. Each write costs one IO operation on every disk, where a read would cost one on only a single disk, and so it reduces the IOPs left over for reads on all of them. This means that atime updates can have a larger effect on your IO load than is immediately obvious. In this they are not alone; any other source of quiet asynchronous write IO can also have such an indirect impact.
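To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The function name, the 100 IOPs per disk figure, and the 20 atime writes a second are all made up for illustration, and the model assumes that every logical write costs exactly one IO on each copy with no batching, which real filesystems generally improve on.

```python
def read_iops_left(n_disks, per_disk_iops, writes_per_sec, copies):
    """Estimate the random-read IOPs left after write IO is paid for.

    copies is how many disks each logical write must land on:
    1 for plain striping, 2 for a two-way mirror, and so on.
    This is a deliberately crude model (an assumption, not a measurement).
    """
    total_iops = n_disks * per_disk_iops
    # Each logical write consumes one IO operation on every copy.
    physical_writes = writes_per_sec * copies
    return max(0, total_iops - physical_writes)


# Two disks at an assumed 100 IOPs each, with 20 atime updates a second.
print(read_iops_left(2, 100, 20, copies=1))  # unmirrored: 180
print(read_iops_left(2, 100, 20, copies=2))  # two-way mirror: 160
```

Under these assumptions the same 20 writes a second cost twice as many read IOPs on the mirror, which is the whole effect in miniature.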

(In our case it is probably still not enough to make an obvious difference, and certainly turning off atime updates didn't produce any immediately obvious performance improvements.)

This whole idea really should not have been so slow to show up in my mind, because I have certainly read about this issue a lot in the context of scaling traditional SQL databases through master/slave replication. Slave database replicas work great for dealing with read load, but you still get killed when your write load goes up since writes have to be applied to all slaves as well as on the master.

(I probably first ran into this idea in Brad Fitzpatrick's various presentations about scaling LiveJournal.)

Sidebar: backups and access time

One of the reasons that turning off access time updates was an easy decision for us is that our backup system makes them relatively useless anyway. It works by reading things through the filesystem, and thus updates every file's access time when it backs the file up.

The traditional two uses of access time are detecting files that haven't been looked at for a while and detecting what files have been looked at recently. Here, a file's atime is never older than the most recent backup (which will be in the past week or two), no matter how long it's been since the user looked at it, and a recent access time may just mean that the backup system has backed the file up recently.
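As an illustration of the first of these uses, here is a minimal Python sketch of a 'not looked at for a while' scan. The function name and its parameters are hypothetical, not something we actually run. On a filesystem where backups read every file, a scan like this with a cutoff longer than the backup cycle would simply find nothing.

```python
import os
import time


def files_not_accessed_since(root, days, now=None):
    """Yield paths under root whose access time is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() reads only metadata, so it does not itself
                # update the file's access time.
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_atime < cutoff:
                yield path
```

Something like `list(files_not_accessed_since("/some/dir", 90))` would then list candidates for archiving, where the directory and the 90 day cutoff are again just placeholders.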
