Wandering Thoughts archives


Disk write buffering and its interactions with write flushes

Pretty much every modern system defaults to having data you write to filesystems be buffered by the operating system and only written out asynchronously or when you specifically request that it be flushed to disk, which raises general questions about how much write buffering you want. Now suppose, not hypothetically, that you're doing write IO that is pretty much always going to be explicitly flushed to disk (with fsync() or the equivalent) before the programs doing it consider this write IO 'done'. You might get this situation when you're writing and rewriting mail folders, or when the dominant write source is updating a write ahead log.
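As a minimal sketch of the write pattern I have in mind, here is an append that isn't considered 'done' until fsync() returns. The function name and the log-style usage are made up for illustration; real mail and database software is more involved.

```python
import os


def append_durably(path: str, record: bytes) -> None:
    """Append record to path and return only after fsync() has completed.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(record)
        while view:
            # write() only hands the data to the OS, where it is buffered.
            written = os.write(fd, view)
            view = view[written:]
        # The caller only treats the write as done once this returns.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Everything before the os.fsync() call can be buffered by the OS; the fsync() is where the process actually waits for the disk.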

In this situation where the data being written is almost always going to be flushed to disk, I believe the tradeoffs are a bit different than in the general write case. Broadly, you can never actually write at a rate faster than the write rate of the underlying storage, since in the end you have to wait for your write data to actually get to disk before you can proceed. I think this means that you want the OS to start writing out data to disk almost immediately as your process writes data; delaying the write out will only take more time in the long run, unless for some reason the OS can write data faster when you ask for the flush than it could have earlier. In theory and in isolation, you may want these writes to be asynchronous (up until the process asks for the disk flush, at which point you have to synchronously wait for them), because the process may be able to generate data faster if it's not stalling waiting for individual writes to make it to disk.

(In OS tuning jargon, we'd say that you want writeback to start almost immediately.)

However, journaling filesystems and concurrency add some extra complications. Many journaling filesystems have the journal as a central synchronization point, where only one disk flush can be in progress at once, so if several processes ask for disk flushes at more or less the same time they can't proceed independently. If you have multiple processes all doing write IO that they will eventually flush and you want to minimize the latency that processes experience, you have a potential problem if different processes write different amounts of data. A process that asynchronously writes a lot of data and then flushes it to disk will obviously have a potentially long flush, and this flush will delay the flushes done by other processes writing less data, because everything is running through the chokepoint that is the filesystem's journal.
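To put rough numbers on this, here is a deliberately simplified model. It assumes flushes are served strictly one at a time, in arrival order, at a fixed disk write rate; real journaling filesystems are more subtle than this, and the function name and figures are invented for illustration.

```python
def flush_finish_times(pending_bytes_in_order, disk_bytes_per_sec):
    """Return when each flush finishes, in seconds from the first request.

    Simplifying assumption: only one flush runs at a time and they are
    served in arrival order, so each one waits for all earlier ones.
    """
    finish_times = []
    elapsed = 0.0
    for pending in pending_bytes_in_order:
        elapsed += pending / disk_bytes_per_sec
        finish_times.append(elapsed)
    return finish_times


MIB = 1024 * 1024
# A big writer with 2 GiB of buffered data flushes just ahead of a small
# writer with 1 MiB, on storage that sustains 100 MiB/s (made-up figures).
big_then_small = flush_finish_times([2048 * MIB, 1 * MIB], 100 * MIB)
small_alone = flush_finish_times([1 * MIB], 100 * MIB)
```

Under these assumptions the small writer's flush takes roughly 20.5 seconds when it lands behind the big one, against about a hundredth of a second on its own.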

In this situation I think you want the process that's writing a lot of data to be forced to delay, to turn its potentially asynchronous writes into more synchronous ones that are restricted to the true disk write rate. This avoids having a large overhang of pending writes when it finally flushes, which hopefully avoids other processes getting stuck with a big delay as they try to flush. Although it might be ideal if processes with less write volume could write asynchronously, I think it's probably okay if all of them are forced down to relatively synchronous writes, with all processes getting an equal share of the disk write bandwidth. Even in this situation the processes with less data to write and flush will finish faster, lowering their latency.
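What I'm describing here is the OS forcing the delay. If you control the heavy writer, a rough application-side approximation is to have it flush every so often so that it never builds up a large overhang itself. This is a substitute technique and not the OS mechanism; the function name and the thresholds are arbitrary choices for illustration.

```python
import os


def write_with_bounded_backlog(fd, data, max_unflushed=1 << 20, chunk=64 * 1024):
    """Write data to fd, calling fsync() whenever max_unflushed bytes pile up.

    Hypothetical helper. Returns the number of fsync() calls made.
    """
    view = memoryview(data)
    unflushed = 0
    flushes = 0
    while view:
        written = os.write(fd, view[:chunk])
        view = view[written:]
        unflushed += written
        if unflushed >= max_unflushed:
            # Wait for the disk now, so the backlog stays small.
            os.fsync(fd)
            unflushed = 0
            flushes += 1
    if unflushed:
        # Final flush for whatever is left over.
        os.fsync(fd)
        flushes += 1
    return flushes
```

This trades some throughput in the heavy writer for a shorter final flush, which is the same tradeoff as having the OS throttle it.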

To translate this to typical system settings, I believe that you want to aggressively trigger disk writeback and perhaps deliberately restrict the total amount of buffered writes that the system can have. Rather than allowing multiple gigabytes of outstanding buffered writes and deferring writeback until a gigabyte or more has accumulated, you'd set things to trigger writebacks almost immediately and then force processes doing write IO to wait for disk writes to complete once you have more than a relatively small volume of outstanding writes.
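I haven't named an operating system here, but as an example, on Linux the relevant knobs are the vm.dirty_background_bytes and vm.dirty_bytes sysctls (setting these takes over from their *_ratio counterparts). A small sketch that renders sysctl.conf-style lines follows; the helper name and the specific sizes are assumptions for illustration, not recommendations.

```python
def writeback_sysctl_lines(background_bytes: int, hard_limit_bytes: int) -> list:
    """Render sysctl.conf-style lines for Linux dirty-page limits.

    Hypothetical helper. background_bytes is where background writeback
    starts; hard_limit_bytes is where writing processes are made to wait.
    """
    if not 0 < background_bytes < hard_limit_bytes:
        # Keep the background threshold below the hard limit so that
        # writeback starts before writers are blocked.
        raise ValueError("need 0 < background_bytes < hard_limit_bytes")
    return [
        f"vm.dirty_background_bytes = {background_bytes}",
        f"vm.dirty_bytes = {hard_limit_bytes}",
    ]


MIB = 1024 * 1024
# Illustrative values only: start writeback at 16 MiB of dirty data and
# make writers wait once 64 MiB is outstanding.
lines = writeback_sysctl_lines(16 * MIB, 64 * MIB)
```

The right numbers depend heavily on your storage and workload, so treat any specific values as something to measure and not something to copy.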

(This is in contrast to typical operating system settings, which will often allow you to use a relatively large amount of system RAM for asynchronous writes and will not aggressively start writeback. This would especially make a difference on systems with a lot of RAM.)

tech/WriteBufferingAndSyncs written at 21:59:25

