2024-03-16
Some more notes on Linux's ionice
and kernel IO priorities
In the long ago past, Linux gained some support for block IO
priorities,
with some limitations that I noticed the first time I looked into
this. These days the Linux kernel has more ways to schedule and limit IO, for example in cgroups v2 and its IO
controller.
However ionice
is still there and now I want to note some more things, since I
just looked at ionice again (for reasons outside the scope of this
entry).
First, ionice
and the IO priorities it sets are specifically
only for read IO and synchronous write IO, per ioprio_set(2)
(this is
the underlying system call that ionice
uses to set priorities).
This is reasonable, since IO priorities are attached to processes,
while asynchronous write IO is generally issued by entirely
different kernel tasks, in situations where the urgency of doing
the write is unrelated to the IO priority of the process that
originally did the write. This is a somewhat unfortunate limitation
since often it's write IO that is the slowest thing and the source
of the largest impacts on overall performance.
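As an illustration of the interface, here is a minimal sketch in C of putting the current process into the 'idle' IO scheduling class, which is roughly what 'ionice -c 3' does. There's no glibc wrapper for ioprio_set(), so you go through syscall(); the IOPRIO_* constants here are taken from the ioprio_set(2) manual page.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* Constants from ioprio_set(2); glibc has no wrapper for this syscall. */
    #define IOPRIO_WHO_PROCESS   1
    #define IOPRIO_CLASS_IDLE    3
    #define IOPRIO_CLASS_SHIFT   13
    #define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | (data))

    int main(void)
    {
        /* who = 0 means 'the calling process'. The idle class ignores
           the priority data, so we pass 0. */
        if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                    IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)) == -1) {
            perror("ioprio_set");
            exit(1);
        }
        /* Any read IO and synchronous write IO we do from here on is
           in the idle class (if the IO scheduler pays attention). */
        return 0;
    }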
IO priorities are only effective with some Linux kernel IO schedulers, such as BFQ. For obvious reasons they aren't effective with the 'none' scheduler, which is also the default scheduler for NVMe drives. I'm (still) unable to tell if IO priorities work if you're using software RAID instead of sitting your (supported) filesystem directly on top of a SATA, SAS, or NVMe disk. I believe that IO priorities are unlikely to work with ZFS, partly because ZFS often issues read IOs through its own kernel threads instead of directly from your process and those kernel threads probably aren't trying to copy around IO priorities.
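If you're curious which scheduler a given disk is using, the active one is shown in brackets in /sys/block/&lt;device&gt;/queue/scheduler. Here is a little sketch in C that just reads and prints that file for one disk (the device name is merely an example; substitute your own):

    #include <stdio.h>

    int main(void)
    {
        /* 'sda' is only an example device name. The currently active
           scheduler is the one in [brackets], e.g. "mq-deadline [bfq] none". */
        FILE *fp = fopen("/sys/block/sda/queue/scheduler", "r");
        char buf[256];

        if (fp == NULL) {
            perror("fopen");
            return 1;
        }
        if (fgets(buf, sizeof(buf), fp) != NULL)
            fputs(buf, stdout);
        fclose(fp);
        return 0;
    }

As root you can also write one of the listed scheduler names to that file to switch schedulers, which is one way to get BFQ (and thus working IO priorities) on a particular disk.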
Even if they pass through software RAID, IO priorities apply at the level of disk devices (of course). This means that each side of a software RAID mirror will apply IO priorities only 'locally', for the IO issued to it, and I don't believe there will be any global priorities for read IO to the overall software RAID mirror. I don't know if this will matter in practice. Since IO priorities only apply to disks, they obviously don't apply (on the NFS client) to NFS read IO. Similarly, IO priorities don't apply to data read from the kernel's buffer/page caches, since this data is already in RAM and doesn't need to be read from disk. This can give you an ionice'd program that is still 'reading' lots of data (and that data will be less likely to be evicted from kernel caches).
Since we mostly use some combination
of software RAID, ZFS, and NFS, I don't think ionice
and IO priorities
are likely to be of much use for us. If we want to limit the impact a
program's IO has on the rest of the system, we need different measures.