Thinking about when not disabling iSCSI's InitialR2T matters

May 5, 2011

A commenter on a recent iSCSI entry asked me if disabling initial R2T had made any difference. I can't answer the question because we haven't actually done any of my iSCSI tuning ideas yet, but in the course of saying so I wound up thinking about why I didn't expect disabling initial R2T to make much of a difference for us.

Let's backtrack for a moment and ask what performance impact not disabling initial R2T has. After the dust settles, requiring an initial R2T delays every SCSI WRITE by the time it takes to send the first R2T from the target to the initiator (and to have it processed on both ends). It also adds an extra iSCSI PDU to the network from target to initiator (which may or may not result in an actual extra TCP packet, depending on what else is going on at the time). When does this matter?
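
To make the mechanics concrete, here is a minimal Python sketch of a single WRITE's timeline with and without an initial R2T. Every number in it (the LAN latency, the R2T handling time, the usable link rate, the disk service time) is an assumption I picked for illustration, not a measurement from our environment, and the model ignores iSCSI details like immediate and unsolicited data.

    # Back-of-the-envelope model of a single iSCSI WRITE.
    # Every number here is an assumed placeholder, not a measurement.
    ONE_WAY_LAN = 0.05e-3       # assumed one-way LAN latency: 0.05 ms
    R2T_HANDLING = 0.05e-3      # assumed time to generate and process the R2T
    LINK_BYTES_PER_SEC = 117e6  # rough usable gigabit Ethernet payload rate
    DISK_IO_TIME = 8e-3         # assumed physical disk service time: 8 ms

    def write_time(size_bytes, initial_r2t):
        """Rough completion time for one SCSI WRITE, in seconds."""
        t = ONE_WAY_LAN                       # WRITE command reaches the target
        if initial_r2t:
            # The initiator must wait for an R2T before sending any data.
            t += R2T_HANDLING + ONE_WAY_LAN
        t += size_bytes / LINK_BYTES_PER_SEC  # data transfer to the target
        t += DISK_IO_TIME                     # the actual disk IO
        t += ONE_WAY_LAN                      # status returns to the initiator
        return t

    for size in (4 * 1024, 128 * 1024):
        with_r2t = write_time(size, initial_r2t=True)
        without = write_time(size, initial_r2t=False)
        print(f"{size // 1024:4d} KB: {with_r2t * 1000:.2f} ms with an initial R2T, "
              f"{without * 1000:.2f} ms without "
              f"({(with_r2t - without) * 1e6:.0f} us extra)")

With these assumed numbers the extra cost is the same fixed hundred microseconds or so per WRITE regardless of its size, and that fixed delay is what the rest of this entry is weighing.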

I will skip to my conclusion: InitialR2T is a setting that only really matters over high-latency WAN connections and perhaps in some exotic situations with synchronous writes to very, very fast storage.

Most writes are asynchronous. Roughly speaking, delaying an asynchronous write is unimportant provided that both ends can handle enough outstanding writes to fill up the available bandwidth; by definition, no one is stalling for a specific asynchronous write to complete, so all we need is to avoid a general stall. So for an initial R2T to make an actual performance difference we need to be dealing with a situation where this is not true, where either the writes are synchronous or the systems cannot handle enough asynchronous writes to fill up the bandwidth.
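
To put a rough number on 'enough outstanding writes to fill up the available bandwidth', here is a hedged Little's law sketch (concurrency equals throughput times per-write time). The link rate, write sizes, and per-write completion times are all assumptions for illustration, not measurements.

    # Little's law sketch: writes that must be outstanding to keep a link full.
    # All of these numbers are assumptions for illustration.
    LINK_BYTES_PER_SEC = 117e6  # rough usable gigabit Ethernet payload rate
    BASE_WRITE_TIME = 1.0e-3    # assumed per-write time without an initial R2T
    R2T_DELAY = 0.1e-3          # the ~0.1 ms an initial R2T adds on a local LAN

    def outstanding_writes(write_size_bytes, per_write_seconds):
        """Writes that must be in flight for writes alone to fill the link."""
        writes_per_sec = LINK_BYTES_PER_SEC / write_size_bytes
        return writes_per_sec * per_write_seconds

    for size in (4 * 1024, 128 * 1024):
        without = outstanding_writes(size, BASE_WRITE_TIME)
        with_r2t = outstanding_writes(size, BASE_WRITE_TIME + R2T_DELAY)
        print(f"{size // 1024:4d} KB writes: {without:5.1f} outstanding without "
              f"an initial R2T, {with_r2t:5.1f} with one")

Under these assumptions the initial R2T only asks for about ten percent more writes in flight; as long as both ends can queue that many, the aggregate write bandwidth doesn't change and the delay is invisible.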

But neither case is enough by itself, because what we also require is that waiting for an R2T adds visible amounts of time to the overall request. Although I haven't looked at packet traces to verify this, I expect competent iSCSI targets to generate R2Ts basically instantly when they get an iSCSI WRITE PDU, and the typical local LAN packet latency is on the order of a tenth of a millisecond (assuming that the LAN from the target to the initiator is not saturated). This time is dwarfed by the time it takes to do disk IO with a physical disk (and to transfer significant amounts of information over a gigabit link).

Ergo, the R2T delay only matters when it starts rising to some visible fraction of the time that the rest of the SCSI WRITE takes, both the actual disk IO and the data transfer time. The easiest way to get this is with slow R2T response times, such as you might get over a high latency WAN link. In theory you might get this with a very fast disk subsystem on the target, but even then I think you'd have to be in an unusual situation for a tenth of a millisecond per write to matter.
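
To put rough numbers on 'a visible fraction', here is a hedged sketch that compares the R2T delay against the rest of a 128 KB WRITE's time for a local LAN versus a high-latency WAN, and for a spinning disk versus much faster storage. The latencies and service times are assumptions I made up for illustration.

    # Rough comparison of the R2T delay against the rest of a SCSI WRITE.
    # All latencies and service times are assumptions for illustration.
    LINK_BYTES_PER_SEC = 117e6  # rough usable gigabit Ethernet payload rate
    WRITE_SIZE = 128 * 1024     # assumed write size

    cases = {
        # name: (one-way network latency, storage service time)
        "LAN, spinning disk": (0.05e-3, 8e-3),
        "LAN, fast storage":  (0.05e-3, 0.1e-3),
        "WAN, spinning disk": (10e-3, 8e-3),
        "WAN, fast storage":  (10e-3, 0.1e-3),
    }

    for name, (one_way, service) in cases.items():
        transfer = WRITE_SIZE / LINK_BYTES_PER_SEC
        rest_of_write = transfer + service  # data transfer plus the actual IO
        r2t_delay = one_way + 0.05e-3       # R2T trip back plus handling it
        fraction = r2t_delay / (rest_of_write + r2t_delay)
        print(f"{name:19s}: R2T delay {r2t_delay * 1000:5.2f} ms, "
              f"{fraction:5.1%} of the write's total time")

With these assumed numbers the delay is a percent or so of a spinning-disk write on a LAN but more than half of the total once a WAN round trip is involved, which is exactly the 'visible fraction' situation.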

(It's possible that this could matter if you are doing small random writes to a fast SSD. The smaller the writes are (and the faster they're serviced), the more outstanding writes you need in order to fill up the available bandwidth. I do not feel like doing the math right now to work out actual numbers for this, plus where are you getting more than 100 Mbytes/sec of small writes from?)
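
For the curious, that skipped math is short enough to sketch. Here is one version of it under assumed numbers of my own choosing (4 KB writes and an SSD that services each one in 50 microseconds), using the same Little's law arithmetic as the earlier sketch; none of this is measured from a real setup.

    # The skipped math, with assumed numbers: 4 KB writes to a fast SSD.
    # Every figure here is an assumption for illustration, not a measurement.
    LINK_BYTES_PER_SEC = 117e6  # rough usable gigabit Ethernet payload rate
    WRITE_SIZE = 4 * 1024       # assumed small random write size
    SSD_SERVICE = 0.05e-3       # assumed SSD service time per write: 50 us
    R2T_DELAY = 0.1e-3          # the ~0.1 ms an initial R2T adds on a local LAN

    writes_per_sec = LINK_BYTES_PER_SEC / WRITE_SIZE  # ops/s to fill the link
    wire_time = WRITE_SIZE / LINK_BYTES_PER_SEC       # time to move one write
    base = wire_time + SSD_SERVICE
    with_r2t = base + R2T_DELAY

    # Little's law again: outstanding writes = ops/s * per-write time.
    print(f"{writes_per_sec:,.0f} small writes/s to fill the link")
    print(f"outstanding writes needed: {writes_per_sec * base:.1f} without an "
          f"initial R2T, {writes_per_sec * with_r2t:.1f} with one")

Under these assumptions the initial R2T roughly doubles the number of writes you need in flight, but both figures stay in the single digits, so it still takes an unusual source of more than 100 Mbytes/sec of small writes before that tenth of a millisecond becomes the limiting factor.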

Oh well. I suppose this simplifies our theoretical future iSCSI tuning efforts.
