Solaris is not an enterprise operating system
Why I say this is best explained as a two-part quiz. The first question: supposing that you are developing an operating system, when do you bring iSCSI disks online during boot?
1. before bringing up ZFS pools and mounting all non-system filesystems.
2. after bringing up ZFS pools and so on.
3. while bringing up ZFS pools and so on, so that the two activities can race with each other in case you have ZFS pools on iSCSI disks.
The second question: suppose that your original answer to the first question was #1 and your new answer is #3. How long do you allow this bug to remain unfixed?
1. clearly this is important, so a more or less immediate fix.
2. certainly a fix in six months, and definitely before the next significant release.
3. more than a year and one significant release (so far).
Oracle's current answers are #3 and #3 respectively. Perhaps all of their 'enterprise' customers avoid iSCSI and it is strictly a minor hobbyist protocol, or perhaps all of their enterprise customers are busy assiduously avoiding ZFS in favour of other options (I can't say I blame them).
Regardless of other issues, what this says to me is that Oracle does not consider iSCSI support to be at all a priority and therefore we are being extremely unwise to build anything on top of Solaris plus iSCSI. Evidently when iSCSI works it is because we are lucky, not because Oracle actually thinks it's important. Among other things, this does render pretty much moot all of my thinking about enticing ZFS features that might cause us to upgrade from Solaris 10 Update 8 (which does not have this bug, obviously).
(Fortunately my co-worker discovered and isolated this while our recently built S10U9 server was not yet in real production. And clearly we are going to have to add more things to our testing procedures, such as 'reboot machine to make sure pools return'.)
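A 'reboot machine to make sure pools return' test could be scripted along the following lines. This is only a rough sketch, not what we actually run: it assumes a Python 3 interpreter is available on the machine, that `zpool list -H -o name,health` prints one tab-separated `name<TAB>health` line per pool, and that you pass the pool names you expect on the command line. The function names `parse_zpool_health` and `check_pools` are made up for illustration.

```python
import subprocess
import sys


def parse_zpool_health(output):
    """Turn 'zpool list -H -o name,health' output into {pool: health}.

    Assumes tab-separated fields; lines without a tab (such as a
    'no pools available' message) are ignored.
    """
    pools = {}
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            pools[fields[0].strip()] = fields[1].strip()
    return pools


def check_pools(expected, output):
    """Return a list of problems for the expected pool names.

    A pool is a problem if it is absent from the output or if its
    health is anything other than ONLINE.
    """
    pools = parse_zpool_health(output)
    problems = []
    for name in expected:
        health = pools.get(name)
        if health is None:
            problems.append(f"{name}: missing")
        elif health != "ONLINE":
            problems.append(f"{name}: {health}")
    return problems


def main(expected):
    # Ask ZFS for the current pool list in script-friendly form.
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        capture_output=True, text=True, check=True,
    )
    problems = check_pools(expected, result.stdout)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0


# Only run against the real system when pool names are given.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Run after a reboot with the expected pool names as arguments, a non-zero exit status means at least one pool is missing or not healthy. A pool whose iSCSI disks lost the boot-time race may well show up with a health other than ONLINE rather than disappearing entirely, which is why the sketch checks for both.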
Update: this issue is fixed in x86 patch 147441-05 (with the bug listed as '7012256 pools on iSCSI devices unavailable upon boot'). This was released (to general Solaris users) no earlier than November 4th and possibly later, more than a year after the bug was first seen, so I believe that my point remains. It's nice to see that Oracle did finally get around to fixing this, though.