Wandering Thoughts archives

2009-03-16

An important gotcha with iSCSI multipathing in Solaris 10

Here's something important to know about Solaris's MPxIO multipathing: MPxIO identifies disks only by their serial numbers and other device identifiers. So if two Solaris devices report the same serial number, MPxIO concludes that they are two paths to the same physical disk; it pays no attention to the underlying paths themselves, such as which iSCSI targets are involved.

This matters a great deal with iSCSI, because at least some iSCSI target implementations have SCSI serial numbers that are set in software (and are thus subject to human error). If you accidentally duplicate some serial numbers between different disks, Solaris's MPxIO will happily decide that they are all the same disk and start distributing IO among them. The result will not make your filesystem very happy. (If you are using ZFS, you have probably just lost the entire pool and possibly the system as well.)

(This is similar to my previous mistake along these lines, but much bigger. I am fortunate that I made this mistake in testing.)
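
As a rough cross-check (a sketch using standard Solaris tools), you can ask MPxIO what it has joined together and look at what serial numbers the remaining devices report; a logical unit with more paths than you expect is a warning sign, although once MPxIO has merged devices you will only see the single surviving device:

# mpathadm list lu
# iostat -En | grep 'Serial No' | sort | uniq -d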

Or in short: when you set up iSCSI targets, make very sure that they have unique SCSI serial numbers (and other identifiers).

It's hard to fault MPxIO for this behavior, since part of MPxIO's job as a high-level multipathing system is to join together the same drive when it's visible over multiple different transports (for example, a drive that is visible over both FibreChannel and iSCSI, however peculiar that may be). Still, it makes adding new targets a bit nerve-wracking, since I know that one mistake or oversight in the configuration of a new iSCSI backend may destroy a pool on an unrelated set of storage.

(This is where I wish Solaris (and our iSCSI backends) had iSCSI specific multipathing, which would avoid this problem because it knows that two completely different targets can never be the same disk.)
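
One rough precaution when adding a new backend (a sketch; the file names are just for illustration) is to capture MPxIO's view before and after: the new target's LUNs should show up as new logical units with the expected number of paths, not as extra paths on logical units that already exist:

# mpathadm list lu > /tmp/lu-before
(configure and enable the new iSCSI target as usual)
# mpathadm list lu > /tmp/lu-after
# diff /tmp/lu-before /tmp/lu-after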

ISCSIMultipathGotcha written at 23:47:29

2009-03-06

What can keep a ZFS pool busy and prevent it from exporting

Suppose that you have some ZFS-based NFS fileservers, perhaps in some sort of failover configuration, and that you want to export some pools from one. So you do:

# zpool export tank
cannot unmount '/x/y': Device busy

The usual tools don't show any local processes using the filesystem (not that there really are any, the server being purely an NFS server), and there's no actual NFS activity. In short: rather puzzling. Also annoying, since it stops the pool from being exported.
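
In case it's not clear, the 'usual tools' here are things like fuser against the mount point; a minimal sketch, which in a situation like this reports no local users at all:

# fuser -c /x/y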

(This can also happen with 'zfs upgrade -a', which apparently unmounts and remounts the filesystems as it upgrades them.)

When this has happened to us, the thing holding the filesystem busy turned out to be the Solaris NFS lock manager. Temporarily disabling it with svcadm allowed the pool to be exported without problems:

# svcadm disable svc:/network/nfs/nlockmgr
# zpool export tank
# svcadm enable svc:/network/nfs/nlockmgr

I suspect that this will lose any locks attached to the filesystem, and of course it has side effects if you're still allowing NFS traffic to filesystems in other pools (we weren't).

(My theory is that the NFS lock manager holds filesystems busy when it has active locks against something on the filesystem. Unfortunately I don't know of any way of inspecting its state, although I'm sure there is one. (And using the kernel debugger seems a bit overkill.))

Having written all of this, I checked the fine manual and I see that 'zpool export' has a -f option, and I don't think we tried that when we ran into this problem. However, I feel better knowing what the actual cause is (and 'zfs upgrade' has no -f, so we'd have had to deal with the issue sooner or later).
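
For completeness, the forced version is presumably just the following, although as I said I don't think we've actually tried it in this situation:

# zpool export -f tank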

ZFSBusyPool written at 01:40:28

2009-03-02

Some gotchas with ZFS in Solaris 10 update 6

Apart from the outright issue with the ZFS failmode setting, here are some things that I've run across during our testing and usage of ZFS on Solaris 10 update 6:

  • 'zpool status -x' is of the opinion that a pool using an older on-disk format is an error, or at least something that should show up in its output. It is completely mistaken, and this mistake makes it far less useful than it could be, since it now reports all of our pools all of the time, regardless of whether or not real errors have happened (see the sketch after this list).

  • while it seems that 'zpool upgrade' is transparent when the system is in use, 'zfs upgrade' does not seem to be; in my one test so far, active NFS clients got IO errors during it.

    (I did not test what local programs would see, since it's not relevant for our environment.)
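
A rough workaround for the first issue (a sketch using standard commands) is to cross-check what 'zpool status -x' is complaining about against which pools and filesystems are merely at an old version; run with no arguments, 'zpool upgrade' and 'zfs upgrade' just report what is out of date without changing anything:

# zpool status -x
# zpool upgrade
# zfs upgrade

For the second issue, 'zfs upgrade' can also be pointed at a single filesystem at a time instead of using -a, which at least limits the window in which NFS clients can see errors.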

The new ZFS feature of pools having host ownership works, even with un-upgraded pools (which is useful for us), but is a bit peculiar. The important part is that if a fileserver goes down, its pools get imported on a different machine, and then the fileserver reboots, it will no longer automatically import those pools at boot (and thus destroy them). This makes failover scenarios much safer.

However, it seems that Solaris will no longer import pools at all on boot if they were last used on another machine, even if the pools have been released with 'zpool export' (and it gives no boot-time error messages about this). This might seem like an obscure scenario, but consider something like a power supply failure; what we'd like to do is import the pools on a backup server, fix the primary server's power supply so that we can boot it, export the pools on the backup server, and power up and boot the primary server and have it just go. Instead, we'll have to remember to manually import all of the pools (slowly).

(In writing this it occurs to me to wonder if 'zpool import -a' on the primary machine would actually work. I suspect not, but it's worth testing.)
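
The manual version is straightforward but tedious (a sketch; 'tank' stands in for each real pool name): list what Solaris can see and then import each pool by name. After a clean 'zpool export' on the backup server this shouldn't need -f, but if 'zpool import' claims a pool may be in use by another system, -f is what overrides that.

# zpool import
# zpool import tank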

ZFSSolaris10U6Gotchas written at 01:18:21

