What happens in ZFS when you have 4K sector disks in an ashift=9 vdev
Suppose, not entirely hypothetically, that you've somehow wound up with some 4K 'Advanced Format' disks (disks with a 4 KByte physical sector size but 512 byte emulated (aka logical) sectors) in a ZFS pool (or vdev) that has an ashift of 9 and thus expects disks with a 512 byte sector size (ashift is the base-2 logarithm of the sector size a vdev assumes, so ashift=9 means 512 bytes and ashift=12 means 4 KBytes). If you import or otherwise bring up the pool, you get slightly different results depending on the ZFS implementation.
In ZFS on Linux, you'll get one ZFS Event Daemon (zed) event for each disk, with a class of vdev.bad_ashift. I don't believe this event carries any extra information about the mismatch; it's up to you to use the information on the specific disk and the vdev in the event to figure out who has what ashift values. In the
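(In practice, as far as I know you can see the full event with 'zpool events -v', a vdev's ashift with 'zdb -C <pool>', and the logical and physical sector sizes that your disks advertise with 'lsblk -o NAME,LOG-SEC,PHY-SEC' on Linux.)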
current Illumos source, it looks like you get a somewhat more
straightforward message, although I'm not sure how it trickles out
to user level. At the kernel level it says:
Disk, '<whatever>', has a block alignment that is larger than the pool's alignment.
This message is not completely correct, since it's the vdev ashift that matters here, not the pool ashift, and it also doesn't tell you what the vdev ashift or the device ashift are; you're once again left to look those up yourself.
(I was going to say that the only likely case is a 4K Advanced Format disk in an ashift=9 vdev, but these days you might find some SSDs or NVMe drives that advertise a physical sector size larger than 4K.)
This is explicitly a warning, not an error. Both the ZFS on Linux and Illumos code have a comment to this effect (differing only in 'post an event' versus 'issue a warning'):
/*
 * Detect if the alignment requirement has increased.
 * We don't want to make the pool unavailable, just
 * post an event instead.
 */
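For illustration, the check around this comment looks roughly like the following. This is a paraphrased sketch from my reading of vdev_open in vdev.c, not the verbatim source; the exact condition and the zfs_ereport_post() arguments vary between versions and implementations:

/*
 * Paraphrased from vdev_open(); 'ashift' is the physical sector
 * based ashift that opening the actual device just reported.
 */
if (vd->vdev_ops->vdev_op_leaf &&
    ashift > vd->vdev_top->vdev_ashift) {
        /* ZFS on Linux posts a vdev.bad_ashift zevent here;
         * Illumos instead cmn_err()s the warning quoted above. */
        zfs_ereport_post(FM_EREPORT_ZFS_DEVICE_BAD_ASHIFT,
            spa, vd, NULL, 0, 0);
}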
This is a warning despite the fact that your disks can accept IO for 512-byte sectors, because what ZFS cares about (for various reasons) is the physical sector size, not the logical one. A vdev with ashift=9 really wants to be used on disks with real 512-byte physical sectors, not on disks that just emulate them.
(In a world of SSDs and NVMe drives that have relatively opaque and complex internal sizes, this is rather less of an issue than it is (or was) with spinning rust. Your SSD is probably lying to you no matter what nominal physical sector size it advertises.)
The good news is that as far as I can tell, this warning has no further direct effect on pool operation. At least in ZFS on Linux, the actual disk's ashift is only looked up in one place, when the disk is opened as part of a vdev, and the general 'open a vdev' code discards it after this warning; it doesn't get saved anywhere for later use. So I believe that ZFS IO, space allocations, and even uberblock writes will continue as before.
(Interested parties can look at vdev_open in vdev.c. Disks are opened in vdev_disk.c.)
That ZFS continues operating after this warning doesn't mean that life is great, at least if you're using HDs. Since no ZFS behavior changes here, using disks with 4K physical sectors in an ashift=9 vdev will likely leave your disk (or disks) doing a lot of read/modify/write operations when ZFS does unaligned writes (as it can often do). This both performs relatively badly and leaves you potentially exposed to damage to unrelated data if there's a power loss part way through.
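To make the read/modify/write problem concrete, here is a hypothetical sketch in C of what a drive with 4K physical sectors and 512-byte logical sectors must do for a 512-byte write that doesn't cover a whole physical sector. All of the names here are made up for illustration and real firmware is far more complicated, but the three-step shape is the point:

#include <stdint.h>
#include <string.h>

#define PHYS_SECTOR 4096  /* what the platters actually use */
#define LOG_SECTOR   512  /* what the drive emulates */

/* A 512-byte logical write to LBA 'lba' of the (pretend) media. */
static void
write_logical(uint8_t *media, uint64_t lba, const uint8_t *buf)
{
    uint64_t offset = lba * LOG_SECTOR;
    uint64_t phys = offset / PHYS_SECTOR;
    uint8_t sector[PHYS_SECTOR];

    /* 1: read the whole 4K physical sector containing the LBA */
    memcpy(sector, media + phys * PHYS_SECTOR, PHYS_SECTOR);
    /* 2: modify just the 512 bytes being written */
    memcpy(sector + offset % PHYS_SECTOR, buf, LOG_SECTOR);
    /* 3: write the whole 4K physical sector back, including 3.5K
     * of unrelated data; a power loss here can damage that data */
    memcpy(media + phys * PHYS_SECTOR, sector, PHYS_SECTOR);
}

An ashift=12 vdev sidesteps all of this, because ZFS then never issues writes that are smaller than 4 KBytes or not 4K aligned.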
(But, as before, it's a lot better than not being able to replace old dying disks with new working ones. You just don't want to wind up in this situation if you have a choice, which is a good part of why I advocate for creating basically all pools as 'ashift=12' from the start.)
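(With ZFS on Linux, 'zpool create -o ashift=12 ...' forces this at creation time; if I remember right, 'zpool add' and 'zpool attach' accept the same -o ashift= override when you're adding devices later.)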
PS: ZFS events are sort of documented in the zfs-events manpage, but the current description of vdev.bad_ashift is not really helpful. Also, I wish that the ZFS on Linux project itself had the current manpages online (well, apart from the manpage source in the Github repo, since most people find manpages in their raw form hard to read).