2009-03-16
An important gotcha with iSCSI multipathing in Solaris 10
Here's something important to know about Solaris's MPxIO multipathing: MPxIO identifies disks only by their serial numbers and identifiers. So if two Solaris devices have the same serial number, MPxIO concludes that they are two paths to the same physical disk; it has no actual knowledge of underlying path issues, such as iSCSI target identifiers.
This matters a great deal on iSCSI, because at least some iSCSI targets have serial numbers that are set in software. If you accidentally duplicate some serial numbers between different disks, Solaris's MPxIO will happily decide that they are all the same disk and start distributing IO among them. The result will not make your filesystem very happy. (If you are using ZFS, you have probably just lost the entire pool and possibly the system as well.)
(This is similar to my previous mistake along these lines, but much bigger. I am fortunate that I made this mistake in testing.)
Or in short: when you set up iSCSI targets, make very sure that they have unique SCSI serial numbers et al.
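One cheap safeguard is to check for serial number collisions before a new backend is ever made visible to Solaris. Here is a minimal sketch in Python. It assumes you can somehow collect (target name, LUN, serial) records from your backends, and that each (target, LUN) pair is supposed to be a distinct physical disk. The record layout, the find_duplicate_serials name, and the sample IQNs and serials are all invented for illustration; none of this comes from MPxIO or any particular iSCSI target software.

```python
from collections import defaultdict


def find_duplicate_serials(luns):
    """Return {serial: [(target, lun), ...]} for every serial number that
    shows up on more than one distinct (target, lun) pair.

    `luns` is an iterable of (target_name, lun_number, serial) tuples.
    This layout is a made-up convention for the sketch.
    """
    by_serial = defaultdict(set)
    for target, lun, serial in luns:
        by_serial[serial].add((target, lun))
    # Keep only serials claimed by more than one disk.
    return {s: sorted(where) for s, where in by_serial.items() if len(where) > 1}


# Hypothetical inventory: the second backend was set up from a copied
# configuration and so reuses a serial number from the first.
inventory = [
    ("iqn.2009-03.example:backend1", 0, "SN-0001"),
    ("iqn.2009-03.example:backend1", 1, "SN-0002"),
    ("iqn.2009-03.example:backend2", 0, "SN-0001"),
]

for serial, where in find_duplicate_serials(inventory).items():
    print("serial %s is claimed by: %s" % (serial, where))
```

If the same disk is deliberately exported through more than one target, a check like this would flag it too, so it would need a whitelist of intended duplicates in that sort of setup.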
It's hard to fault MPxIO for this behavior, since part of MPxIO's job as a high level multipathing system is to join together the same drive when it's visible over multiple different transport mediums (for example, a drive that is visible over both FibreChannel and iSCSI, however peculiar that may be). Still, it makes adding new targets a bit nerve-wracking, since I know that one mistake or oversight with the configuration of a new iSCSI backend may destroy a pool on an unrelated set of storage.
(This is where I wish Solaris (and our iSCSI backends) had iSCSI specific multipathing, which would avoid this problem because it knows that two completely different targets can never be the same disk.)
Complex data structures and the two sorts of languages
I've written before about the two sorts of languages, namely the ones where you can write code that runs as fast as the builtin features, and ones where you can't. For now, let me call these 'fast' and 'slow' languages, on the grounds that it is a slightly less loaded set of terms than other possible ones.
It's recently struck me that one of the effects of this language gap on the 'slow' languages is that it basically cuts off your ability to write new implementations of interesting data structures that are speedy and efficient. I think that this is important because it is my sense that it is usually exactly when time and space are at a premium that you want to write such a new data structure; otherwise, you might as well just use the builtin ones.
(There are cases where an algorithm is most clearly and compactly expressed by using a specific data structure, so it is worth using it even when it is not the 'best' from a performance perspective.)
In turn this may mean that you just can't use such languages to tackle problems that call for such specific, specialized data structures, because your implementation will wind up too slow to be usable. (This is different from the language being slow in general; in the sort of case I am thinking of, the problem would be perfectly feasible in the language if only you had a fast implementation of the underlying data structure.)
As someone who likes playing around with programming, I find this regrettable for more than the obvious reason. For example, it means that there really is not much point in implementing, say, splay trees in Python (normally my preferred language), because while the result will work it probably won't run fast enough to be useful for any real code I write.
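To get a rough feel for the size of the gap, here is a small sketch that times membership tests against a hand-written tree and against the builtin set. It is only an illustration: the tree is a plain unbalanced binary search tree rather than a splay tree, the Node and BST names and the sizes are my own choices, and a set is a hash table rather than a tree, so this measures the overhead of doing the work in Python-level code more than it compares algorithms. The actual numbers will vary by machine and Python version.

```python
import random
import timeit


class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class BST:
    """A plain unbalanced binary search tree, written entirely in Python."""

    def __init__(self):
        self.root = None

    def insert(self, key):
        if self.root is None:
            self.root = Node(key)
            return
        node = self.root
        while True:
            if key == node.key:
                return  # already present
            if key < node.key:
                if node.left is None:
                    node.left = Node(key)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(key)
                    return
                node = node.right

    def __contains__(self, key):
        node = self.root
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False


# Random keys keep the unbalanced tree reasonably shallow.
keys = random.sample(range(1000000), 20000)
tree = BST()
for k in keys:
    tree.insert(k)
builtin = set(keys)

probes = keys[:2000]
t_tree = timeit.timeit(lambda: [p in tree for p in probes], number=5)
t_set = timeit.timeit(lambda: [p in builtin for p in probes], number=5)
print("pure-Python tree: %.4fs, builtin set: %.4fs" % (t_tree, t_set))
```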