A surprise about which of our machines has the highest disk write volume

October 14, 2017

Once upon a time, hard drives and SSDs just had time-based warranties. These days, many SSDs have warranties that are more like cars; they're good for so much time or so many terabytes written, whichever comes first, and different SSD makers and models can have decidedly different maximum figures for this. So, as part of investigating what SSDs to get for future usage here, we've been looking into what sort of write volume we see on both our ZFS fileservers (well, on the iSCSI backends for them) and on the system SSDs of those of our Ubuntu servers that have them. The result was a bit surprising.
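The "time or terabytes written, whichever comes first" warranty model is easy to turn into arithmetic. Here is a small sketch of it in Python, using hypothetical warranty figures of my own (the 150 TBW and 5-year numbers are illustrative, not any specific drive's rating):

```python
def warranty_years(tbw_limit_tb, annual_writes_tb, time_limit_years):
    """Years until an SSD warranty runs out: the time limit or the
    total-bytes-written limit, whichever is hit first."""
    return min(tbw_limit_tb / annual_writes_tb, time_limit_years)

# A hypothetical 5-year / 150 TBW drive seeing 32 TB of writes a year:
# 150 / 32 is about 4.7 years, so the write limit expires before the
# time limit does. At 10 TB a year, the 5-year clock runs out first.
```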

Before I started looking into this, I probably would have guessed that the highest write volume would be on the SSDs of the ZFS pool that holds our /var/mail filesystem. I might also have guessed that some of the oldest disks in ZFS pools on our most active fileserver would be pretty active. While both of these are high up in the write volume rankings, neither has our highest write volume.

Our highest write volume turns out to happen on the system SSDs in our central mail machine; they see about 32 TB of writes a year, compared to about 23 TB of writes a year on the busiest iSCSI backend disks on our most active fileserver. The oldest and most active SSDs involved in the mail spool have seen only about 10 TB of writes a year, which is actually below many of our more active ZFS pool disks (on several fileservers). The central mail machine's IO activity is also heavily unbalanced in favour of writes; with some hand-waving about the numbers, the machine runs about 80% writes (by the amount of data involved) or more. The disks in the ZFS pools show much lower write-to-read ratios; an extreme case is the mail spool's disks, which see only 12% writes by IO volume.
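For what it's worth, figures like these can be pulled straight out of /proc/diskstats on Linux, where per-device sector counts are always in 512-byte units regardless of the device's actual sector size. A rough sketch of the calculation (my own illustration, not the tooling we actually used to gather our numbers):

```python
def parse_diskstats_line(line):
    """Return (device, bytes_read, bytes_written) from one line of
    /proc/diskstats. Fields 6 and 10 (1-based) are sectors read and
    sectors written; diskstats sectors are always 512 bytes."""
    fields = line.split()
    name = fields[2]
    bytes_read = int(fields[5]) * 512
    bytes_written = int(fields[9]) * 512
    return name, bytes_read, bytes_written

def write_fraction(bytes_read, bytes_written):
    """Fraction of total IO volume (by bytes) that is writes."""
    total = bytes_read + bytes_written
    return bytes_written / total if total else 0.0
```

These counters reset on reboot, so getting a yearly figure means sampling them over time (or using SMART's lifetime attributes instead).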

My current theory is that this huge write volume is because Exim does a lot of small writes to things like message files and log files and then fsync()'s them out to disk all the time. Exim uses three files for each message and updates two of them frequently as message deliveries happen; updates almost certainly involve fsync(), and then on top of that the filesystem is busy making all the necessary file creations, renames, and deletions be durable. We're using ext4, but even there the journal has to be forced to disk at every step.
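The general durability dance involved here can be sketched like this; this is my own illustration of the classic write, fsync, rename, fsync-the-directory pattern, not Exim's actual code:

```python
import os

def durable_write(path, data):
    """Durably write data to path: write a temporary file, fsync it,
    rename it into place, then fsync the containing directory so the
    rename itself survives a crash. Each fsync forces the filesystem
    journal out to disk."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # file data and metadata to disk
    os.rename(tmp, path)       # atomic replacement
    dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)        # make the rename durable too
    finally:
        os.close(dirfd)
```

Do something like this two or three times per message (plus log appends that also get fsync()'d), and the write volume adds up fast even though each individual write is small.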

(This certainly seems to be something involving Exim, as our external mail gateway has the same highly unbalanced write-to-read ratio. The gateway is doing roughly 4 TB of writes a year, but that's still quite high for our Ubuntu system SSDs.)

PS: All of these figures for SSDs are before any internal write amplification that the SSD itself does. My understanding is that SSD warranty figures are quoted before write amplification, as the user-written write volume.


Comments on this page:

By jmwallach at 2017-10-14 05:18:59:

Reading the previous post it seems like Exim is doing its work on the NFS mounts, but you mention several operations per message; is that happening in /var/run or some such?

By cks at 2017-10-14 16:30:33:

While Exim is processing a message, it keeps various internal data files for it in /var/spool/exim4 (and it also writes to logs in /var/log). On our systems, both of these are on the local system disks. For local deliveries, messages are ultimately written to /var/mail, which is NFS-mounted.

(Or people can write them to files in their home directories or other NFS-mounted filesystems.)

On the external mail gateway, messages just flow through to the central mail server, so everything happens in /var/spool/exim4 and /var/log and so on, which are all on the local disks.

Written on 14 October 2017.