Wandering Thoughts archives

2007-07-25

Solaris Volume Manager and iSCSI: a problematic interaction

Solaris Volume Manager (which I still call DiskSuite) keeps information about the state of its logical volumes in what it calls a 'metadevice state database' (a metadb for short). You normally keep a number of replicas of this state database, scattered around the physical devices that DiskSuite is managing for you. When you are using metasets, all of the metadb replicas have to be on disks in the metaset. This is a logical consequence of the DiskSuite tools needing to update the metadata to reflect which machine owns a metaset; if there were metadata on a disk outside the metaset, DiskSuite on another machine wouldn't necessarily be able to update it.
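
For illustration (the set name here is made up), you can look at the local replicas with a plain metadb and at a particular metaset's replicas with metadb -s; the -i option also prints a legend explaining the status flags:

# metadb -i
# metadb -s mystorage -i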

DiskSuite's approach to dealing with unavailable metadb replicas is simple: it panics the system if it loses metadb quorum, where quorum is half of the metadb replicas plus one. This is actually spelled out explicitly in the metadb manpage, along with the reasoning.

(Technically it may survive with exactly half of the metadb replicas; I can't test right now.)
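
(To make the arithmetic concrete: with six replicas, quorum is 6/2 + 1 = 4, so you can lose two replicas and keep going; losing three drops you to exactly half and into the uncertain territory above, and losing four is a guaranteed panic.)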

Now we get to the iSCSI side of the problem: if the Solaris iSCSI initiator loses connectivity to an iSCSI target, it offlines all of the disks exported by that target, which in turn immediately tells DiskSuite that the metadb replicas on all of those disks are now unavailable. If this drops you below quorum (for any metaset), your system promptly panics.

(This is different from the behavior of FibreChannel, where glitches in FC connectivity just produce IO errors for any ongoing IO and don't yank the metadb replicas out from under DiskSuite.)

The net result is that if you are using Solaris Volume Manager to manage iSCSI-based storage in metasets, you need to build metasets that include disks (logical or otherwise) from at least three different iSCSI targets, or the loss of connectivity to a single target will kill your entire machine.

(And you need to carefully balance the number of metadb replicas across all of your targets so that one target doesn't have too many replicas.)
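
As a concrete sketch (the set name, host name, and disk names here are all made up; the important part is that the three disks live on three different iSCSI targets), building such a metaset looks something like this:

# metaset -s iscsiset -a -h fileserver
# metaset -s iscsiset -a c0t1d0 c0t2d0 c0t3d0
# metadb -s iscsiset

DiskSuite normally puts a metadb replica on each drive as it's added, and the metadb -s at the end lets you check how the replicas actually wound up distributed. With one replica per target, losing any single target costs you one replica out of three, which still leaves a majority of them available.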

2007-07-12

An interesting mistake with ZFS and iSCSI

First I'll show you the symptoms, then I'll explain what I did to shoot myself in the foot:

# zpool create tank01 c0t38d0
# zpool create tank02 c0t39d0
# zpool replace tank01 c0t38d0 c0t42d0

(Time passes, the resilver finishes, and zpool status tank01 shows no use of c0t38d0.)

# zpool attach -f tank02 c0t39d0 c0t38d0
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c0t42d0s0 is part of active ZFS pool tank01. Please see zpool(1M).

(Note which disk the error complains about: c0t42d0s0, not the c0t38d0 that I was actually trying to attach.)

All of these disks are iSCSI disks being exported from a Linux machine. The error condition was persistent, lasting through reboots, zpool export and zpool import and so on, while at the same time nothing said that c0t38d0 was in use or active.

How I shot myself in the foot is simple: I configured all of the iSCSI disks with the same ScsiId value. When I set up the Linux target software, I'd assumed that its 'SCSI ID' was something like a model name, partly because there's also a ScsiSN parameter for each disk's nominal serial number. I was totally wrong; it needs to be a unique identifier, just like the ScsiSN values (and if left alone, the target software would have handled it).
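
To illustrate with IET (the iSCSI Enterprise Target), which uses exactly these parameter names as per-LUN options in ietd.conf, here is a rough sketch of the mistake (the target name, paths, and values are all made up):

Target iqn.2007-07.com.example:disks
        Lun 0 Path=/dev/vg0/lun0,Type=fileio,ScsiId=cslab,ScsiSN=cslab0000
        Lun 1 Path=/dev/vg0/lun1,Type=fileio,ScsiId=cslab,ScsiSN=cslab0001

and of the fix, which is to give each LUN its own ScsiId (or simply to leave ScsiId out entirely):

Target iqn.2007-07.com.example:disks
        Lun 0 Path=/dev/vg0/lun0,Type=fileio,ScsiId=cslab0000,ScsiSN=cslab0000
        Lun 1 Path=/dev/vg0/lun1,Type=fileio,ScsiId=cslab0001,ScsiSN=cslab0001

The ScsiId winds up in the SCSI device identification data that the Solaris initiator uses to work out disk identity, which is presumably why two LUNs with the same ScsiId look like two paths to one disk.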

What is presumably going on is that ZFS noticed that c0t38d0 has the same ScsiId as c0t42d0, concluded that they were two names for the same actual disk (which is easily possible in a multi-path setup), and sensibly refused to let me shoot myself in the foot. The one thing I don't understand is why the complaint only ever names c0t42d0, which is the last of the iSCSI disks.
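
One way to check a theory like this (a sketch; I haven't gone back and verified it against these particular disks) is to compare the identity that Solaris has derived for each disk, which shows up as a devid property on the disk instances in prtconf -v output:

# prtconf -v | less

If c0t38d0 and c0t42d0 really have ended up with the same devid, it's no surprise that ZFS treats them as one disk.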

