Our likely iSCSI parameter tuning
Now that I have some idea of what the various iSCSI parameters do and control, I can think about how we might want to change them. The necessary disclaimer is that at the moment all of this is theoretical; none of it has been tested or validated.
My general belief is that our IO load is mostly reads and is somewhere between truly random and short sequential reads (i.e., sequential reads of short files). I expect that most of our writes are asynchronous, but some of them are synchronous as ZFS commits transaction groups.
(I have not actually verified this belief, partly because measuring this stuff is hard. Please do not suggest DTrace as the obvious easy answer unless you also have a DTrace script that gives good answers to these questions, at scale.)
Given this, my first overall conclusion is that tuning iSCSI parameters probably isn't that important for us, provided that they are sane to start with. Bandwidth is not really an issue in general and the major latency tuning you can do is for writes, which are not that important for us. Dedicated tuning could modestly lower the protocol overhead for reads, which in theory slightly reduces latency, but probably not by much.
This doesn't mean that there's nothing to tune, though. Here's what I think we'll want:
- InitialR2T set to No, so that the initiator does not have to wait
  for the target before sending write data. But then everyone wants
  InitialR2T set to No.
(It should really be the iSCSI default and then targets that have weird requirements and limitations would require it be Yes. But this grump really calls for a separate entry.)
- a maximum PDU size that lets the target return all of the data
  for a typical small read in a single PDU. The default of 8 Kbytes
is probably not enough, but I wouldn't go all that large either
(at least not without a lot of testing).
I don't think that ZFS does any small synchronous writes, so it's not worth worrying about fitting writes into a single PDU.
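To put some rough numbers on this, here is a little arithmetic sketch of how many data-in PDUs a single read takes at a given maximum PDU size (the function name and the example sizes are mine, purely for illustration):

```python
def pdus_for_read(read_size, max_data_segment):
    """Number of data-in PDUs needed to return read_size bytes
    when each PDU can carry at most max_data_segment bytes."""
    return -(-read_size // max_data_segment)  # ceiling division

# A 64 Kbyte read at the iSCSI default of 8 Kbytes takes 8 PDUs...
print(pdus_for_read(64 * 1024, 8192))    # 8
# ...but only one PDU if the negotiated maximum is 64 Kbytes or more.
print(pdus_for_read(64 * 1024, 65536))   # 1
```

Each extra PDU is extra protocol overhead on the wire, which is where the modest read-side savings would come from.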
- a maximum burst size that's at least large enough to allow ZFS to
read its typical large read size in a single request. On many
pools, I believe that this will be the record size, normally 128
Kbytes. A larger maximum burst size is not a problem and may
sometimes be an advantage.
- a first burst length that is large enough to allow ZFS to do uberblock updates without waiting for an R2T; this assumes that InitialR2T is No. I believe that this is 128 Kbytes.
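The two burst-size conditions above can be expressed as a small sketch; the 128 Kbyte figure is ZFS's default recordsize, and the function names are mine, not anything from the iSCSI specification:

```python
RECORD_SIZE = 128 * 1024  # ZFS's normal recordsize

def read_fits_one_request(max_burst_length, record_size=RECORD_SIZE):
    """A full-record read can be done in a single iSCSI request."""
    return max_burst_length >= record_size

def write_avoids_r2t(first_burst_length, initial_r2t, write_size=RECORD_SIZE):
    """With InitialR2T set to No, a write of this size needs no
    R2T round trip before the data can start flowing."""
    return (not initial_r2t) and first_burst_length >= write_size

print(read_fits_one_request(256 * 1024))    # True: larger is fine
print(write_avoids_r2t(128 * 1024, False))  # True
print(write_avoids_r2t(64 * 1024, False))   # False: waits for an R2T
```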
Since we can set InitialR2T to No, I don't think we really care about ImmediateData. If it naturally defaults to 'yes' there's no reason to change it, and if it defaults to 'no' there's not much reason to bother turning it on. The exception would be if it turns out to be harmless to set the maximum PDU size very large (because neither side does any fixed buffer allocations based on it), large enough that typical writes easily fit into a single PDU.
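Put together, on an initiator using open-iscsi this might look something like the following iscsid.conf fragment. This is a sketch, not something we've tested; the specific byte values are illustrative, and I'm assuming open-iscsi's parameter names here:

```
# Let the initiator send write data without waiting for an R2T.
node.session.iscsi.InitialR2T = No
# Comfortably larger than a full 128 Kbyte ZFS record read.
node.session.iscsi.MaxBurstLength = 262144
# Large enough that a 128 Kbyte write needs no R2T at all.
node.session.iscsi.FirstBurstLength = 131072
# Let a typical small read come back in a single data-in PDU.
node.conn[0].iscsi.MaxRecvDataSegmentLength = 65536
```

Whatever the initiator asks for, the actual values are negotiated with the target, so the target's own limits have to be sane too.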