What we did to get iSCSI multipathing working on Solaris 10 update 8
First, we're using Solaris MPxIO, not iSCSI's own multipathing; neither Solaris nor (I believe) our Linux backends support MC/S. Also, see my earlier notes.
To start with, each iSCSI backend has two different IP addresses on two different networks. Then:
- we had to configure MPxIO so that it recognized our iSCSI backends as valid multipathing targets. This is done by adding the vendor and product ID to /kernel/drv/scsi_vhci.conf in a special magic format. The comments in this file are actually a good guide to what you need to do. These days, you may find that your iSCSI backend is already automatically recognized by MPxIO and you don't need to do anything, especially if you're using a popular commercial one.
Rebooting is required to activate this, but before you do so be very sure that your iSCSI disks have unique serial numbers.
You will know that multipathing is working for your disks when they show up as very long names instead of nice short ones.
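As a sketch of what such an entry looks like (the vendor and product strings here are invented; as the file's comments explain, the vendor ID field must be blank-padded out to eight characters before the product ID starts):

```
# /kernel/drv/scsi_vhci.conf fragment -- hypothetical IDs.
# "EXAMPLE " is the 8-character blank-padded vendor ID,
# "IETDISK" the product ID of the backend's exported disks.
device-type-scsi-options-list =
        "EXAMPLE IETDISK", "symmetric-option";
symmetric-option = 0x1000000;
```

The 'symmetric-option' value tells MPxIO that all paths to the device are equivalent, which is what you want for a simple two-network iSCSI setup like ours.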
- we use static target configuration, so we configured each target for each of its iSCSI network IP addresses. I don't know how well this works with any of the dynamic discovery mechanisms, but based on previous experience I would make sure that your targets are only advertising the IPs for your actual storage networks, not, say, their management interface IP as well.
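For one target this looks roughly like the following (the IQN and the storage-network IPs are made-up examples; each backend gets one static-config entry per storage network):

```shell
# Hypothetical target IQN, registered once per storage-network IP:
iscsiadm add static-config iqn.2009-01.example.com:backend1,10.1.0.10:3260
iscsiadm add static-config iqn.2009-01.example.com:backend1,10.2.0.10:3260
# Check what the initiator now knows about:
iscsiadm list static-config
```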
- make sure that the Solaris iSCSI initiator is configured to make two connections per target. You have to do this after the target is configured with both IPs; otherwise, Solaris will happily make two connections to the same IP address, which is not what you want. (This is done with 'iscsiadm modify initiator-node -c 2'.)
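To see whether the second connection per target actually came up, something like this should do (I'm going from memory on the exact output fields, so treat this as a sketch):

```shell
# Ask for two connections per target session:
iscsiadm modify initiator-node -c 2
# The verbose target listing shows per-session connection details;
# each target should now report two connections, one per storage IP.
iscsiadm list target -v
```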
- we found that we could not reliably use the onboard nVidia Ethernet ports on our SunFire X2200s. Apparently not even Sun could get good drivers for them. We switched to an Intel dual NIC card and had no problems.
Solaris MPxIO defaults to round-robin use of all of the available paths, which is what you want if you want maximum performance.
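You can inspect the paths and the load-balancing policy with mpathadm; a sketch (the long device name here is an invented example, yours will differ):

```shell
# List all multipathed logical units MPxIO knows about:
mpathadm list lu
# Show path states and the load-balance setting for one of them
# (expect two paths, both in an OK/active state, for our setup):
mpathadm show lu /dev/rdsk/c4t600144F0ABCD1234d0s2
```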
Once we got it up, this setup has worked reliably and without problems for us, and at full speed. The fileservers and the backends can talk to each other at gigabit wire bandwidth, and a suitable set of operations (talking to enough different disks) on a Solaris fileserver can read data at over 200 MBytes/sec, saturating both gigabit links to the backends. Note that this is without jumbo frames.
(This speed is purely local; since the fileservers only have a single gigabit link for NFS clients, they will not normally do more than 100 MBytes/sec of IO to the backends. Well, I suppose writes could multiply this; if you managed to write to a pool with enough disks, the 100 MBytes/sec from an NFS client could wind up doubling due to mirroring. In practice, our NFS clients are just not that active.)
Sidebar: troubleshooting performance issues
At the network level iSCSI is just a TCP stream, so the first thing to do with an iSCSI performance issue is to make sure that your network is working in general. If you cannot get sustained full wire bandwidth between your initiator and your target using a tool like ttcp, you have a general network issue that you need to fix first.
(This is obviously much easier to test if your targets are running some sort of general OS and aren't just closed appliances.)
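A minimal version of this check with iperf (substitute ttcp if that's what you have; the host name is invented):

```shell
# On the target (backend):
iperf -s
# On the initiator (fileserver), over the storage network;
# a healthy gigabit link should sustain somewhere around
# 900+ Mbits/sec of TCP throughput:
iperf -c backend1-storage -t 30
```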
If you see network problems, go through the normal network troubleshooting steps. For example, try connecting an initiator and a target together with a crossover cable so that you can rule out your switch. If you're using jumbo frames, try turning them off to see if it improves things; in fact, consider leaving them turned off unless you see a clear performance advantage to them (my experience with jumbo frames has not been very positive).
Next, of course, you need to verify that the disks on the target can actually deliver the performance that you expect to see. Some machines may have bad or underpowered disk subsystems, or some of the disks might have quietly gone bad. Again, this is a lot easier if your targets are running a real operating system where you can get in to directly measure local disk performance.
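A crude local read-speed check on a target running a general OS, assuming a Linux backend like ours (the device name is a made-up example; this only reads, so it is safe on a live disk):

```shell
# Sequential read of 4 GB from one backend disk; a healthy
# single disk of this era should manage tens of MBytes/sec,
# and dd reports the rate when it finishes.
dd if=/dev/sdb of=/dev/null bs=1M count=4096
```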
You should only start tuning iSCSI parameters or blaming the iSCSI software once you've verified that the network and the disks are both fine. Otherwise you may waste a lot of time chasing red herrings and dead ends.