How to force a disk write cache flush operation on Linux

October 24, 2013

Suppose, not entirely hypothetically, that you want to test how well some new storage hardware and disks stand up to a lot of write cache flush operations. You don't care about high-level filesystem operations; you just want to hammer on the disks and the disk interconnects.

I will cut to the chase: the simplest and most direct way of doing this on Linux is to call fsync() on the disk's block device. This appears to always generate a SYNCHRONIZE_CACHE operation (or the SATA equivalent), although this will be a no-op in the disk if there are no cached writes.

(The one exception is that the Linux SCSI layer does not issue cache synchronization operations unless it thinks that the disk's write cache is enabled. Since SATA disks go through the SCSI layer these days, this applies to SATA drives too.)
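As a minimal sketch of the approach, the following Python opens a path and calls fsync() on it repeatedly, timing each call. The function name hammer_flushes and its count parameter are my own invention for illustration. It assumes that a read-only open is sufficient, which is the case on Linux since fsync() does not require a writable file descriptor there.

```python
import os
import time


def hammer_flushes(path, count=1000):
    """Call fsync() on `path` `count` times; return per-call durations in seconds."""
    # Assumption: a read-only open is enough for fsync() on Linux, which
    # also avoids any risk of accidentally writing to the device.
    fd = os.open(path, os.O_RDONLY)
    durations = []
    try:
        for _ in range(count):
            start = time.monotonic()
            os.fsync(fd)  # on a block device this should become a cache flush
            durations.append(time.monotonic() - start)
    finally:
        os.close(fd)
    return durations
```

You would run something like hammer_flushes("/dev/sdX", 10000) as root, with /dev/sdX standing in for whatever disk you are testing. If the disk's write cache is disabled, the calls should return almost immediately because no flush is sent to the disk at all.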

Calling fsync() on files on filesystems can issue write cache flush operations under at least some circumstances, depending on the exact filesystem. However, I can't follow the ext3 and ext4 code clearly enough to be completely sure that they always flush the write cache on fsync() one way or another, although I suspect that they do. In any case, I generally prefer testing low-level disk performance using the raw block devices.

(It appears that under at least some circumstances, calling fsync() on things on extN filesystems will not directly flush the disk's write cache but will instead simply issue writes that are intended to bypass it. These may then get translated into write flushes on disks that don't support such a write bypass.)

Sidebar: Where this lives in the current 3.12.0-rc6 code

Since I went digging for this in the kernel source and would hate to have not written it down if I ever need it again:

  • blkdev_fsync() in fs/block_dev.c calls blkdev_issue_flush() and is what handles fsync() on block devices.
  • blkdev_issue_flush() in block/blk-flush.c issues a BIO operation with WRITE_FLUSH.
  • WRITE_FLUSH is a bitmap of BIO flags, including REQ_FLUSH.
  • sd_prep_fn() in drivers/scsi/sd.c catches REQ_FLUSH operations and calls scsi_setup_flush_cmnd(), which sets the SCSI command to SYNCHRONIZE_CACHE.
  • For SATA disks, libata translates SYNCHRONIZE_CACHE into either ATA_CMD_FLUSH or ATA_CMD_FLUSH_EXT through ata_get_xlat_func() and ata_scsi_flush_xlat() in drivers/ata/libata-scsi.c.

Whether or not the disk has write cache enabled comes into this through the SCSI layer:

  • sd_revalidate_disk() in drivers/scsi/sd.c configures the general block layer flush settings for the particular device based on whether the device has WCE (write cache enabled) and possibly supports FUA (Force Unit Access), which allows writes to bypass the write cache. This is done by calling blk_queue_flush().
  • blk_queue_flush() in block/blk-settings.c sets the request queue's flush_flags field to the value passed in.
  • generic_make_request_checks() in block/blk-core.c filters out flush flags and flush requests for queues that did not advertise them, ie any 'SCSI' drive that didn't advertise at least WCE.
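To make the interaction of these three pieces concrete, here is a toy model in Python. It is not kernel code: the bit values are made up, the function names queue_flush_flags and filter_request are hypothetical, and the rule that FUA is only advertised when WCE is also set is my reading of the code rather than something stated above.

```python
# Hypothetical bit values, chosen only for this illustration; the real
# kernel constants differ.
REQ_FLUSH = 1 << 0
REQ_FUA = 1 << 1


def queue_flush_flags(wce, fua_supported):
    """Model of the value sd_revalidate_disk() hands to blk_queue_flush()."""
    flags = 0
    if wce:
        flags |= REQ_FLUSH
        # Assumption: FUA is only advertised when the write cache is on.
        if fua_supported:
            flags |= REQ_FUA
    return flags


def filter_request(request_flags, flush_flags, has_data):
    """Model of the filtering done in generic_make_request_checks().

    Returns (flags_after_filtering, reaches_driver).
    """
    if request_flags & (REQ_FLUSH | REQ_FUA) and not flush_flags:
        # The queue advertised nothing, so strip the flush-related flags.
        request_flags &= ~(REQ_FLUSH | REQ_FUA)
        if not has_data:
            # A pure flush has nothing left to do and never reaches the disk.
            return request_flags, False
    return request_flags, True
```

In this model, a bare flush sent to a drive without WCE, filter_request(REQ_FLUSH, queue_flush_flags(False, False), has_data=False), is dropped before it reaches the driver, which matches the exception noted earlier for disks with the write cache disabled.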

For a given SCSI drive, the state of WCE is reported in the sysfs attribute cache_type and the state of FUA in, surprise, FUA. This includes SATA drives (which are handled as SCSI drives, more or less).
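A small sketch for checking these attributes before a test run follows. It assumes they appear under /sys/class/scsi_disk/<address>/, which is where I would expect to find them; the function name scsi_cache_settings and the sysfs_root parameter are made up for this example.

```python
import glob
import os


def scsi_cache_settings(sysfs_root="/sys/class/scsi_disk"):
    """Return {scsi_address: {"cache_type": ..., "FUA": ...}} per SCSI disk."""
    settings = {}
    # Assumption: each disk has a directory named after its SCSI address.
    for disk_dir in sorted(glob.glob(os.path.join(sysfs_root, "*"))):
        entry = {}
        for attr in ("cache_type", "FUA"):
            try:
                with open(os.path.join(disk_dir, attr)) as f:
                    entry[attr] = f.read().strip()
            except OSError:
                entry[attr] = None  # attribute missing or unreadable
        settings[os.path.basename(disk_dir)] = entry
    return settings
```

Printing the result of scsi_cache_settings() should show at a glance which drives the kernel will actually send flushes to.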

For more kernel internal details, see this bit of kernel documentation. This may be useful to understand the interaction of various bits and pieces in the source code I've inventoried above.

By the way, git grep turns out to be really handy for this sort of thing.

PS: I don't know if there's a straightforward way to force FUA writes. You'd expect that O_SYNC writes would do it but I can't prove it from my reading of kernel source code so far, although I haven't dug deeply on this.
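If you want to experiment with this, an O_SYNC write looks like the sketch below. Whether it actually turns into a FUA write at the disk is exactly the open question, so treat this as a way to generate such writes for observation and not as a confirmed way to force FUA. The function name osync_write is hypothetical, and be warned that pointing it at a block device overwrites whatever is stored at that offset.

```python
import os


def osync_write(path, data, offset=0):
    """Write `data` at `offset` through a descriptor opened with O_SYNC.

    WARNING: this overwrites existing data at `offset` in `path`.
    Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_SYNC)
    try:
        # With O_SYNC the write should not return until the data is
        # considered stable; how the kernel achieves that is unverified here.
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)
```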

Written on 24 October 2013.

Last modified: Thu Oct 24 00:31:00 2013