Thinking about how much asynchronous disk write buffering you want

May 15, 2017

Pretty much every modern system defaults to having data you write to filesystems be buffered by the operating system and only written out asynchronously; you have to take special steps either to make your write IO synchronous or to force it to disk (which can lead to design challenges). When the operating system is buffering data like this, one obvious issue is the maximum amount of data it should let you buffer up before you have to slow down or stop.
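
As a minimal sketch of what those 'special steps' look like on a Unix-like system, here are the two usual approaches in Python: forcing data out after a normal buffered write with os.fsync(), and opening the file with O_SYNC so that each write is synchronous. The helper names are made up for illustration, and O_SYNC doesn't exist on every platform, so this falls back to a plain buffered write without it.

```python
import os


def write_then_fsync(path, data):
    """Buffered write, then explicitly force the data to disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's own buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to push its buffer to the disk


def write_synchronously(path, data):
    """Open with O_SYNC (where available) so each write() is synchronous."""
    # Assumption: on platforms without os.O_SYNC this is just a normal
    # buffered write, because the flag falls back to 0.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        written = 0
        while written < len(data):
            written += os.write(fd, data[written:])
    finally:
        os.close(fd)
```

Anything that doesn't do one of these is relying on the OS's write buffering, which is the case the rest of this entry is about.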

Let's start with two obvious observations. First, if you write enough data, you will always have to eventually slow down to the sustained write speed of your disk system. The system only has so much RAM; even if the OS lets you use it all, eventually it will all be filled up with your pending data and at that point you can only put more data into the buffer when some earlier data has drained out. That data drains out at the sustained write speed of your disk system. The corollary to this is that if you're going to write enough data, there is very little benefit to letting you fill up a large write buffer; the operating system might as well limit you to the sustained disk write speed relatively early.
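
To put rough numbers on this first observation, here is a toy model in Python. It assumes the program can push data into the buffer at some fixed fill_rate, that the OS drains the buffer at disk_rate from the very start, and that everything is measured in MB and MB/s; the function and parameter names are mine, and real systems are messier than this.

```python
def time_to_write(total, buffer_size, disk_rate, fill_rate):
    """Seconds until the program's last write returns (not until data is on disk).

    total and buffer_size are in MB; disk_rate and fill_rate are in MB/s.
    """
    if fill_rate <= disk_rate:
        # The disk keeps up, so the buffer never fills.
        return total / fill_rate
    # The buffer fills at the difference between the two rates.
    time_to_fill = buffer_size / (fill_rate - disk_rate)
    accepted_at_full_speed = fill_rate * time_to_fill
    if total <= accepted_at_full_speed:
        # Everything fits before the buffer fills up.
        return total / fill_rate
    # After the buffer is full, the program is held to the disk's speed.
    remaining = total - accepted_at_full_speed
    return time_to_fill + remaining / disk_rate


if __name__ == "__main__":
    for total in (500, 10_000, 1_000_000):
        t = time_to_write(total, buffer_size=1000, disk_rate=100, fill_rate=2000)
        print(total, "MB:", round(t, 2), "s vs", total / 100, "s unbuffered")
```

In this simplified model, once you write more than the buffer can absorb, the saving over an unbuffered write is a constant buffer_size / disk_rate seconds. A 1 GB buffer in front of a 100 MB/s disk saves you about ten seconds whether you write 10 GB or 1 TB, which is why it matters less and less the more you write.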

Second, RAM being used for write buffers is often space taken away from other productive uses for that RAM. Sometimes you will read back some of the written data and be able to get it from RAM, but if it is purely written (for now) then the RAM is otherwise wasted, apart from any benefits that write buffering may get you. By corollary with our first observation, buffering huge amounts of write-only data for a program that is going to be limited by disk write speed is not productive (because it can't even speed the program up).

So what are the advantages of having some amount of write buffering, and how much do we need to get them?

  • It speeds up programs that write occasionally or only once and don't force their data to be flushed to the physical disk. If their data fits into the write buffer, these programs can continue immediately (or exit immediately), possibly giving them a drastic performance boost. The OS can then write the data out in the background as other things happen.

    (Per our first observation, this doesn't help if the collection of programs involved writes too much data too fast and overwhelms the disks and possibly your RAM with the speed and volume.)

  • It speeds up programs that write in bursts of high bandwidth. If your program writes a 1 GB burst every minute, a 1 GB or more write buffer means that it can push that GB into the OS very fast, instead of being limited to the (say) 100 MB/s of actual disk write bandwidth and taking ten seconds or more to push out its data burst. The OS can then write the data out in the background and clear the write buffer in time for your next burst.

  • It can eliminate writes entirely for temporary data. If you write data, possibly read it back, and then delete the data fast enough, the data need never be written to disk if it can all be kept in the write buffer. Explicitly forcing data to disk obviously defeats this, which leads to some tradeoffs in programs that create temporary files.

  • It allows the OS to aggregate writes together for better performance and improved data layout on disk. This is most useful when your program issues comparatively small writes to the OS, because otherwise there may not be much improvement to be had from aggregating big writes into really big writes. OSes generally have their own limits on how much they will aggregate together and how large a single IO they'll issue to disks, which clamps the benefit here.

    (Some of the aggregation benefit comes from the OS being able to do a bunch of metadata updates at once, for example to mark a whole bunch of disk blocks as now used.)

    More write buffer here may help if you're writing to multiple different files, because it allows the OS to hold back writes to some of those files to see if you'll write more data to them soon enough. The more files you write to, the more streams of write aggregation the OS may want to keep active and the more memory it may need for this.

    (With some filesystems, write aggregation will also lead to less disk space being used. Many filesystems that compress data are one example, and ZFS in general can be another one, especially on RAIDZ vdevs.)

  • If the OS starts writing out data in the background soon enough, a write buffer can reduce the amount of time a program takes to write a bunch of data and then wait for it to be flushed to disk. How much this helps depends partly on how fast the program can generate data to be written; for the best benefit, you want this to be faster than the disk write rate but not so fast that the program is done before much background write IO can be started and completed.

    (Effectively this converts apparently synchronous writes into asynchronous writes, where actual disk IO overlaps with generating more data to be written.)
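
The last case above, where disk IO overlaps with generating data, can be sketched with some idealized arithmetic. This assumes a write buffer big enough that the program never blocks, steady rates in MB/s, and an OS that either starts writeback immediately or not at all until the final flush; the function name and parameters are hypothetical.

```python
def write_and_flush_time(total, generate_rate, disk_rate, early_writeback):
    """Seconds to generate `total` MB of data and have all of it on disk.

    With early_writeback the disk works while the program is still generating
    data; without it, nothing reaches the disk until the final flush.
    """
    generate_time = total / generate_rate
    disk_time = total / disk_rate
    if early_writeback:
        # The two phases overlap, so the slower one sets the total time.
        return max(generate_time, disk_time)
    # No overlap: generate everything first, then wait for the disk.
    return generate_time + disk_time


if __name__ == "__main__":
    print(write_and_flush_time(1000, 200, 100, early_writeback=True))
    print(write_and_flush_time(1000, 200, 100, early_writeback=False))
```

In this model the time saved is whichever of the two phases is shorter, so the overlap buys you less and less as the program generates its data faster relative to the disk.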

Some of these benefits require the OS to make choices that push against each other. For example, the faster the OS starts writing out buffered data in the background, the more it speeds up the overlapping write and compute case but the less chance it has to avoid flushing data to disk that's written but then rapidly deleted (or otherwise discarded).

How much write buffering you want for some of these benefits depends very much on what your programs do (individually and perhaps in the aggregate). If your programs write only in bursts or fall into the 'write and go on' pattern, you only need enough write buffer to soak up however much data they're going to write in a burst so you can smooth it out again. Buffering up huge amounts of data for them beyond that point doesn't help (and may hurt, both by stealing RAM from more productive uses and by leaving more data exposed in case of a system crash or power loss).

There is also somewhat of an inverse relationship between the useful size of your write buffer and the speed of your disk system. The faster your disk system can write data, the less write buffer you need in order to soak up medium sized bursts of writes because that write buffer clears faster. Under many circumstances you don't need the write buffer to store all of the data; you just need it to store the difference between what the disks can write over a given time and what your applications are going to produce over that time.

(Conversely, very slow disks may in theory call for very big OS write buffers, but there are often practical downsides to that.)
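
The 'store the difference' idea is simple enough to write down. As a rough sketch, assuming the disks drain the buffer at a steady disk_rate for the whole time the burst is arriving, with everything in MB, MB/s, and seconds (the function names and the one second burst duration are my own illustrative choices):

```python
def buffer_needed(produced, duration, disk_rate):
    """MB of write buffer needed to absorb `produced` MB written over
    `duration` seconds while the disks drain at disk_rate MB/s throughout."""
    return max(0.0, produced - disk_rate * duration)


def drain_time(buffered, disk_rate):
    """Seconds for the disks to clear `buffered` MB once the burst is over."""
    return buffered / disk_rate


if __name__ == "__main__":
    # A 1 GB burst pushed into the OS over one second.
    for disk_rate in (100, 500):
        need = buffer_needed(1000, 1, disk_rate)
        print(disk_rate, "MB/s disks:", need, "MB buffered,",
              drain_time(need, disk_rate), "s to drain")
```

With these made-up numbers, going from 100 MB/s disks to 500 MB/s disks drops the buffer needed for the same burst from 900 MB to 500 MB, and the buffer clears in one second instead of nine.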

Written on 15 May 2017.

Last modified: Mon May 15 23:55:52 2017