The advantages of iSCSI's MC/S for multipathing

October 19, 2008

In theory, iSCSI has a feature called 'multiple connections per session', commonly abbreviated as MC/S. In iSCSI terminology, a 'session' is a single logical connection between an iSCSI initiator and an iSCSI target, and a 'connection' is a TCP connection. MC/S lets a session be composed of multiple TCP connections, each of which can use a different set of IP addresses and thus a different network path.

(In practice, apparently very few iSCSI initiators and targets actually support MC/S. I suspect that MC/S is quite complicated at the protocol level, much like the rest of iSCSI, and this has created a disincentive to actually implement it.)

MC/S isn't the only way to do multipathing with iSCSI; an iSCSI client can also do it at a higher level, by creating multiple sessions (each with its own set of network parameters, so that each uses a different network path). So why have MC/S at all, and why have I said in the past that MC/S would be the best way to do multipathing?

The simple answer is that MC/S is the best way because when you use MC/S, everyone involved actually knows that there is multipathing going on. The problem with multiple sessions is that the iSCSI target, at least, has no idea that the separate sessions are really a single object; instead, it treats them as entirely separate.

For example, the two sessions will probably have separate command ordering constraints; a write barrier or command flush on one session won't affect commands flowing over the other. Unless the high-level multipathing code in the initiator handles write barriers specially, your write barriers are only partial barriers, which may cause heartburn for your filesystems. (You can probably think of other potential problems.)
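To make the partial barrier problem concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: the `drain` function, the session names and the `BARRIER` marker are all made up for illustration, and a real target's scheduling is far more involved. The one property it models is that each session is processed in order while nothing orders commands across sessions.

```python
from collections import deque


def drain(sessions, order):
    """Toy target: at each step, take the next command from the named session.

    Per-session order is preserved, but nothing orders commands across
    sessions. A BARRIER only constrains the session it arrived on.
    (Hypothetical model, not how any particular target is implemented.)
    """
    queues = {name: deque(cmds) for name, cmds in sessions.items()}
    on_disk = []
    for name in order:
        cmd = queues[name].popleft()
        if cmd != "BARRIER":
            on_disk.append(cmd)
    return on_disk


sessions = {
    "A": ["w1", "BARRIER", "w3"],
    "B": ["w2"],  # issued before the barrier, but on the other session
}

# One schedule the target is free to pick: finish session A first.
result = drain(sessions, ["A", "A", "A", "B"])
print(result)
```

In this schedule `w3`, which the initiator issued after the barrier, reaches the disk before `w2`, which it issued before the barrier. The barrier on session A did its job for session A and nothing more.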

In theory the initiator's top-level multipathing code can and should cope with this; however, there is likely to be at least a performance penalty. Consider how you would have to implement write barriers. It is not good enough to simply send write barriers down both sessions, because nothing forces cross-session synchronization (that is, nothing stops commands on session A from going to the disk before the write barrier on session B has completed). Instead you are going to need to do something like send write barriers down both sessions and then send writes to only one session until both barriers are reported as complete.
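The 'something like' approach above could be sketched roughly as follows. This is a minimal illustration and not real multipathing code: the class `BarrierAwareDispatcher`, its methods and the round-robin policy are all hypothetical, and it only tracks which session each command would be sent down rather than talking to any iSCSI stack.

```python
class BarrierAwareDispatcher:
    """Hypothetical initiator-side dispatcher over several iSCSI sessions."""

    def __init__(self, session_names):
        self.sessions = list(session_names)
        self.sent = {name: [] for name in self.sessions}
        self.pending = set()  # sessions whose barrier has not completed yet
        self._next = 0        # round-robin cursor for normal operation

    def write(self, block):
        if self.pending:
            # A barrier is still outstanding somewhere: use one session only.
            target = self.sessions[0]
        else:
            target = self.sessions[self._next % len(self.sessions)]
            self._next += 1
        self.sent[target].append(block)
        return target

    def barrier(self):
        # Send the barrier down every session, then wait for all of them.
        for name in self.sessions:
            self.sent[name].append("BARRIER")
        self.pending = set(self.sessions)

    def barrier_completed(self, name):
        # Called when a session reports that its barrier is complete.
        self.pending.discard(name)


d = BarrierAwareDispatcher(["A", "B"])
before = [d.write("w1"), d.write("w2")]   # spread over both sessions
d.barrier()
during = [d.write("w3"), d.write("w4")]   # restricted to one session
d.barrier_completed("A")
still_waiting = d.write("w5")             # B's barrier is still outstanding
d.barrier_completed("B")
after = [d.write("w6"), d.write("w7")]    # back to using both sessions
```

The performance penalty shows up in the middle of the run: between issuing the barriers and hearing back from the slowest session, every write goes down a single path, so the second path's bandwidth is idle.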

(Here I'm assuming that you don't have to worry about reads from one session crossing writes from the other session and returning what is, from your perspective, stale data, since you should be satisfying such reads out of your own local cache. However, there are probably situations where this is not entirely true.)

My understanding is that this is why the documentation for various iSCSI targets that I've read strongly suggests not using multiple sessions from the same initiator. One of the reasons that I feel that we can get away with it in our setup is that we are using ZFS, which is already reasonably cautious about disks lying to it.

(Also, we don't have a choice; we have to have connection redundancy, and neither end of our iSCSI setup supports MC/S right now.)

Last modified: Sun Oct 19 23:13:38 2008