The apparent origins of some odd limitations in the iSCSI protocol

May 2, 2011

The iSCSI protocol has some odd features and defaults; yesterday I grumbled about how InitialR2T defaults to 'yes', for example. In many ways it is not the sort of protocol that you would design if you were going to do a TCP-based remote block access protocol, even setting aside the idea of transporting SCSI commands across the network.

Now, I wasn't there at the time, so I have no idea what the real reasons were for these protocol decisions; all I can do is guess. But from the outside, it certainly looks like these decisions were made so that people could build relatively inexpensive, relatively dumb (and possibly hardware-accelerated) iSCSI target implementations. Take the issue of 'Ready to Transfer' (R2T) messages from the target to the initiator. By requiring R2T messages, a target can pre-allocate limited receive buffers and then strictly control the flow of data into them; it knows that it can never receive valid data that it has not already allocated a buffer for, because it allocates the buffer before it sends the R2T. This is a perfect feature for things with limited resources and hardware that wants to do direct DMA transfers, but it's not how most TCP protocols usually work.
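To make the buffer argument concrete, here is a minimal Python sketch of a target that reserves a buffer before it grants permission to send. It is a toy model, not real iSCSI: the `ToyTarget` class, its method names, and the integer tags are all made up for illustration, and nothing here follows the actual PDU formats.

```python
class ToyTarget:
    """Toy model of a target that only accepts write data it has solicited.

    Hypothetical illustration only; the names do not come from the iSCSI spec.
    """

    def __init__(self, buffer_pool_bytes):
        self.free_bytes = buffer_pool_bytes  # limited receive buffer pool
        self.granted = {}                    # tag -> buffer reserved for that transfer
        self.next_tag = 0

    def request_write(self, length):
        """Return an R2T-like (tag, length) grant, or None if no buffer is free."""
        if length > self.free_bytes:
            return None                      # initiator must wait; nothing is sent
        self.free_bytes -= length            # reserve the buffer BEFORE granting
        tag = self.next_tag
        self.next_tag += 1
        self.granted[tag] = bytearray(length)
        return tag, length

    def receive_data(self, tag, payload):
        """Accept data only for an outstanding grant of exactly this size."""
        buf = self.granted.get(tag)
        if buf is None or len(payload) != len(buf):
            raise ValueError("unsolicited or mis-sized data")
        buf[:] = payload                     # stands in for a DMA into the buffer
        del self.granted[tag]
        self.free_bytes += len(buf)          # buffer goes back to the pool
        return bytes(buf)
```

The point of the sketch is the ordering in `request_write`: because the reservation happens before the grant is returned, `receive_data` never has to find memory for data it did not ask for.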

(Of course, this sort of decision harks back to SCSI itself, which also has the 'target tells you when to send write data' feature (among other things). But this was a sensible decision for SCSI, which operated in a quite different and more direct environment than a TCP stream and with very limited hardware on the disks (well, at least initially). In SCSI you really could DMA the incoming data directly from the wire to the data buffers (and then on to disk) without having to do other work. This is not so true in a TCP-based protocol, which has to decode TCP headers and reassemble the TCP stream before it can even start with such things.)

I can see why iSCSI wants to have this sort of feature available (in part, it enables building simple iSCSI target implementations that transport iSCSI commands more or less directly to physical disks). But I really think that iSCSI should have been specified so that these features were not the default, so that the starting assumption was that you had fully intelligent initiators and targets and that you wanted the best performance possible by default. Although I have not looked at the protocol in detail, my guess is that this would also have required some additional features in the protocol, things like dynamic control of 'receive windows' for write data.
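One purely hypothetical shape for such a 'receive window' is a byte budget that the target can resize as it acknowledges data, somewhat like a TCP window. The sketch below is an assumption about what that could look like from the initiator's side; `WriteWindow` and its methods are invented for illustration and do not correspond to anything in the iSCSI specification.

```python
class WriteWindow:
    """Initiator-side view of a hypothetical target-controlled write window."""

    def __init__(self, window_bytes):
        self.window_bytes = window_bytes  # how much unacknowledged data is allowed
        self.in_flight = 0                # bytes sent but not yet acknowledged

    def can_send(self, length):
        return self.in_flight + length <= self.window_bytes

    def send(self, length):
        """Account for write data sent without waiting for per-command permission."""
        if not self.can_send(length):
            raise RuntimeError("window full; wait for an acknowledgement")
        self.in_flight += length

    def acknowledge(self, length, new_window_bytes=None):
        """Target has consumed `length` bytes and may resize the window."""
        if length > self.in_flight:
            raise ValueError("acknowledged more than was in flight")
        self.in_flight -= length
        if new_window_bytes is not None:
            self.window_bytes = new_window_bytes
```

With something like this, an initiator could send write data immediately while the window has room, and a target under memory pressure could shrink the window instead of forcing a round trip on every write.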

PS: I don't think that ATA-over-Ethernet (AoE) does any better than iSCSI here. While simpler in some respects, AoE has its own protocol issues.

Sidebar: why iSCSI doesn't need R2T and so on for read data

It might strike you that there is an odd asymmetry in iSCSI; write operations require permission from the target before the initiator sends data, but read operations do not require permission from the initiator before the target sends data. The difference is that the initiator already controls the amount and timing of incoming read data, because it made the read request to start with. The equivalent of a read R2T is the read request itself. Write requests are different because the target doesn't initiate them and so can get hit with arbitrary requests with arbitrary amounts of data at random times.

I tend to think that this does have some drawbacks for low-resource initiators (they must artificially fragment a contiguous read stream in order to limit the incoming data), but it makes for a simpler target implementation (the target doesn't have to keep a bunch of buffered data sitting around until the initiator allows it to be offloaded) and I suspect that this was what was on the minds of the people creating the iSCSI protocol.
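The 'artificial fragmentation' a low-resource initiator has to do can be sketched as follows. This is a minimal illustration under my own assumptions: `split_read` is a hypothetical helper, and reads are modelled as plain (starting block, block count) pairs rather than real SCSI commands.

```python
def split_read(start_block, block_count, max_blocks_per_request):
    """Split one contiguous read into requests the initiator can buffer.

    Returns a list of (start_block, block_count) tuples. Each request is
    the initiator's implicit permission for the target to send that much
    data, so capping the request size caps the incoming data.
    """
    if max_blocks_per_request <= 0:
        raise ValueError("max_blocks_per_request must be positive")
    requests = []
    block = start_block
    remaining = block_count
    while remaining > 0:
        n = min(remaining, max_blocks_per_request)
        requests.append((block, n))
        block += n
        remaining -= n
    return requests
```

For example, `split_read(0, 10, 4)` turns a single 10-block read into three requests, so at most 4 blocks' worth of data is ever outstanding per request.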

Written on 02 May 2011.

Last modified: Mon May 2 01:16:39 2011