How to force a disk write cache flush operation on Linux
Suppose, not entirely hypothetically, that you want to test how well some new storage hardware and disks stand up to a lot of write cache flush operations; you don't care about high level filesystem operations, you just want to hammer on the disks and the disk interconnects.
I will cut to the chase: the simplest and most direct way of doing this on Linux is to call fsync() on a (or the) disk block device. This appears to always generate a SYNCHRONIZE_CACHE operation (or the SATA equivalent), although this will be a no-op in the disk if there are no cached writes.
(The one exception is that the Linux SCSI layer does not issue cache synchronization operations unless it thinks that the disk's write cache is enabled. Since SATA disks go through the SCSI layer these days, this applies to SATA drives too.)
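As a minimal sketch of the fsync()-on-a-block-device approach, assuming Python 3 on Linux: the function name flush_device and its count parameter are made up for illustration, and opening a real block device this way will normally need root.

```python
import os
import time


def flush_device(path, count=1):
    """Open `path` and call fsync() on it `count` times.

    Returns the elapsed wall-clock time in seconds. `path` is meant to be
    a disk block device such as /dev/sdX (a placeholder name), but any
    writable file works for trying the function out.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        start = time.monotonic()
        for _ in range(count):
            # On a block device this is the call that should end up as a
            # cache flush command, provided the kernel thinks the disk's
            # write cache is enabled.
            os.fsync(fd)
        return time.monotonic() - start
    finally:
        os.close(fd)

# Hypothetical usage, as root:
#   print(flush_device("/dev/sdX", count=1000))
```

Since a flush with nothing cached is a no-op inside the disk, a heavier test would presumably write a block or two before each fsync(); that is left out here to keep the sketch non-destructive.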
Calling fsync() on files on filesystems can issue write cache flush operations under at least some circumstances, depending on the exact filesystem. However, I can't follow the ext3 and ext4 code clearly enough to be completely sure that they always flush the write cache on fsync() one way or another, although I suspect that they do. In any case I generally prefer testing low-level disk performance using the raw block devices.
(It appears that under at least some circumstances, calling fsync() on things on extN filesystems will not directly flush the disk's write cache but will instead simply issue writes that are intended to bypass it. These may then get translated into write flushes on disks that don't support such a write bypass.)
Sidebar: Where this lives in the current 3.12.0-rc6 code
Since I went digging for this in the kernel source and would hate to have not written it down if I ever need it again:
- blkdev_fsync() in fs/block_dev.c calls blkdev_issue_flush() and is what handles fsync() on block devices.
- blkdev_issue_flush() in block/blk-flush.c issues a BIO operation with WRITE_FLUSH. WRITE_FLUSH is a bitmap of BIO flags, including REQ_FLUSH.
- sd_prep_fn() in drivers/scsi/sd.c catches REQ_FLUSH operations and calls scsi_setup_flush_cmnd(), which sets the SCSI command to SYNCHRONIZE_CACHE.
- SATA disks translate SYNCHRONIZE_CACHE into either ATA_CMD_FLUSH or ATA_CMD_FLUSH_EXT in ata_get_xlat_func() and ata_scsi_flush_xlat() in drivers/ata/libata-scsi.c.
Whether or not the disk has write cache enabled comes into this through the SCSI layer:
- sd_revalidate_disk() in drivers/scsi/sd.c configures the general block layer flush settings for the particular device based on whether the device has WCE set and whether it supports FUA (which allows writes to bypass the write cache). This is done by calling blk_queue_flush().
- blk_queue_flush() in block/blk-settings.c sets the request queue's flush_flags field to the value passed in.
- generic_make_request_checks() in block/blk-core.c filters out flush flags and flush requests for queues that did not advertise them, ie any 'SCSI' drive that didn't advertise at least WCE.
For a given SCSI drive, the state of WCE is reported in the sysfs attribute cache_type and the state of FUA in, surprise, FUA. This includes SATA drives (which are handled as SCSI drives, more or less).
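To check these attributes for every disk at once, something like the following sketch may help. It assumes the attributes show up under /sys/class/scsi_disk/<H:C:T:L>/, which is where I would expect to find them but which is an assumption here, not something spelled out above; the function name read_cache_settings is made up.

```python
import glob
import os


def read_cache_settings(sysfs_root="/sys/class/scsi_disk"):
    """Return {device_id: {"cache_type": ..., "FUA": ...}} per SCSI disk.

    `sysfs_root` is an assumed location for the sd driver's attributes;
    it is a parameter so that another directory can be substituted.
    """
    settings = {}
    for dev_dir in sorted(glob.glob(os.path.join(sysfs_root, "*"))):
        entry = {}
        for attr in ("cache_type", "FUA"):
            try:
                with open(os.path.join(dev_dir, attr)) as f:
                    entry[attr] = f.read().strip()
            except OSError:
                # Attribute missing or unreadable on this system.
                entry[attr] = None
        settings[os.path.basename(dev_dir)] = entry
    return settings

# Hypothetical usage:
#   for dev, attrs in read_cache_settings().items():
#       print(dev, attrs)
```

If cache_type does not indicate a write-back cache for a disk, then per the filtering described above, fsync() on that disk's block device should not result in any flush command being sent.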
For more kernel internal details, see this bit of kernel documentation. This may be useful to understand the interaction of various bits and pieces in the source code I've inventoried above.
By the way, git grep turns out to be really handy for this sort of thing.
PS: I don't know if there's a straightforward way to force FUA writes. You'd expect that O_SYNC writes would do it but I can't prove it from my reading of kernel source code so far, although I haven't dug deeply into this.
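For anyone who wants to experiment with this, here is a small sketch of an O_SYNC write, assuming Python 3 on Linux. To be clear, it only shows how to issue such a write; whether the kernel turns it into a FUA write is exactly the part that is unverified. The function name sync_write is made up for illustration.

```python
import os


def sync_write(path, data, offset=0):
    """Write `data` at `offset` through a descriptor opened with O_SYNC.

    Returns the number of bytes written. Whether this reaches the disk as
    a FUA write, as a write followed by a cache flush, or as something
    else is NOT established here; it is only a starting point for tests.
    """
    fd = os.open(path, os.O_WRONLY | os.O_SYNC)
    try:
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)

# Hypothetical usage, as root, on a scratch disk whose contents you do
# not care about (this overwrites the first sector):
#   sync_write("/dev/sdX", b"\0" * 512)
```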