Wandering Thoughts archives

2011-12-10

ZFS pool activation and iSCSI (part II)

Here's an interesting question that should have occurred to me much earlier:

Why is a bug about boot-time ordering between iSCSI disk discovery and ZFS pool activation fixed with a kernel patch?

I don't have a sure answer to this; the best I can do is a theory. But before we get there, let's talk about how ZFS pool activation seems to connect up with the Solaris iSCSI initiator.

At the SMF level, the iSCSI initiator is svc:/network/iscsi/initiator. Nothing explicitly depends on it (at least according to 'svcs -D'). Despite this, on our S10U8 machines it finishes starting immediately before svc:/system/filesystem/local does (which is exactly what you want, since the latter SMF service seems to be what starts ZFS pools). Exactly why SMF uses or enforces this order is opaque to me. For that matter, it's not clear that SMF itself is enforcing the order; because SMF only shows you the order that services finished starting in, not the order they started in, it's possible that the start order is different from the finish order.

(A great deal of SMF is opaque, annoying, or both.)
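
If you want to poke at this yourself, the relevant svcs invocations look something like the following (the FMRIs are the ones from our systems):

    # What, if anything, explicitly depends on the iSCSI initiator:
    svcs -D svc:/network/iscsi/initiator
    # What the initiator service itself depends on:
    svcs -d svc:/network/iscsi/initiator
    # The STIME column shows when each service last changed state, which
    # for online services is when they finished starting:
    svcs svc:/network/iscsi/initiator svc:/system/filesystem/local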

Now it's time for theorizing.

If we take SMF at its word, there is no explicit ordering dependency in SMF. Any ordering we get is either a lucky coincidence or enforced by something else, and I don't believe it's a lucky coincidence. The obvious candidate to enforce an ordering is the kernel, since it handles both the iSCSI and ZFS parts of all of this. It would make a kind of sense if the kernel delayed ZFS pool activation until iSCSI discovery had finished; it's very analogous to how kernels often delay things for SCSI disk discovery.

Given that iSCSI disk discovery can be quite protracted, it would also make sense if at some point a clever Sun kernel developer broke that absolute dependency so that the boot could still proceed even if iSCSI discovery was taking ages; such a dependency break would match the symptoms we saw here, where 'zfs mount -a' ran after iSCSI discovery had started but before it had finished. The fix for this kernel dependency issue would of course be another kernel change.

(Since Oracle no longer updates the OpenSolaris source code it's impossible to verify this theory. Besides, my patience for spelunking Solaris kernel code is pretty close to being exhausted.)

ZFSPoolActivationII written at 01:07:20

2011-12-07

Understanding the Solaris iSCSI initiator (a bit)

If you're an innocent person (like I used to be), the Solaris iSCSI initiator appears to work much as it does on other Unixes. You have an administrative command (iscsiadm), a system daemon (iscsid, which is what the SMF service svc:/network/iscsi/initiator starts), and a kernel component that presumably turns iSCSI connections into SCSI disks. Unfortunately this view of Solaris is highly misleading or, as I should actually admit, wrong.
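
You can see this apparent structure with 'svcs -p', which lists the processes associated with a service; on a running system it should show iscsid under the initiator service:

    # Show the service's state and the processes SMF associates with it:
    svcs -p svc:/network/iscsi/initiator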

Because of the complexity involved, most systems split the iSCSI initiator into two pieces: a user-level daemon that does target discovery, iSCSI login, and session initiation, and a kernel component that takes established iSCSI sessions and does the actual iSCSI IO over them. Solaris does not work this way.

In Solaris, the entire iSCSI protocol stack is in the kernel, including all target discovery. Yes, this includes the extra protocols used for finding targets (iSNS and SendTargets). That tempting-looking iscsid daemon actually has only two little jobs: it tells the kernel to start up the iSCSI initiator (and keep it running) and it does hostname lookups for the kernel. Oh, and it tries to avoid reporting 'service ready' to SMF until the kernel seems to have completed iSCSI discovery or discovery has stalled out.
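
For illustration, the usual way to configure discovery from user level is iscsiadm, which given all of the above is really just talking to the in-kernel stack (these are the standard subcommands, with a placeholder target address):

    # Turn on SendTargets discovery and give the kernel a target portal
    # to ask (192.0.2.1:3260 is a placeholder):
    iscsiadm modify discovery --sendtargets enable
    iscsiadm add discovery-address 192.0.2.1:3260
    # Ask the in-kernel initiator what targets it currently knows about:
    iscsiadm list target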

(iscsid does not even read and write the iSCSI initiator configuration database in /etc/iscsi; the kernel does it directly. By the way, the database is stored as a serialized nvlist (of course). Normally there are two copies, the current database and the previous database.)

None of this is documented, of course, or at best it's only documented if you read carefully between the lines in the way that the Solaris people want you to.

PS: According to comments in the OpenSolaris iscsid code, the hostname lookup is incomplete. iscsid returns only a single IP address to the kernel for a hostname, regardless of how many addresses the host has; it picks the first one that the underlying library call returns.

Sidebar: when iscsid reports things to SMF

Because I was just looking at this in the source code and we may need it sometime: first, if the kernel reports that all forms of iSCSI target discovery have completed, service startup is obviously done. Failing that, iscsid gives up and declares 'service started' once 60 seconds have gone by without any new LUNs being discovered. As long as you keep discovering at least one LUN a minute, SMF will keep waiting for svc:/network/iscsi/initiator to complete.
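
In shell pseudocode, my understanding of the logic is roughly the following sketch. This is not the real iscsid code; count_luns and discovery_complete are made-up stand-ins for what iscsid actually learns from the kernel:

    # A sketch only; count_luns and discovery_complete are hypothetical.
    idle=0
    last=$(count_luns)
    while [ "$idle" -lt 60 ]; do
        if discovery_complete; then
            break                       # the kernel says discovery is done
        fi
        sleep 5
        now=$(count_luns)
        if [ "$now" -ne "$last" ]; then
            last=$now; idle=0           # a newly discovered LUN resets the clock
        else
            idle=$((idle + 5))
        fi
    done
    # only now does iscsid report 'service started' to SMF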

(What effect this has on the rest of the system is unclear, since nothing depends on the iSCSI initiator in SMF from what I can see.)

SolarisISCSIInitiator written at 02:05:03

2011-12-06

What I know about boot time ZFS pool activation (part I)

In response to my entry on the boot time ZFS and iSCSI sequencing bug, a commenter asked if SMF dependencies could be used to work around the issue. As it happens, this is not a simple question to answer, because how ZFS pools are activated at boot time is at best obscure (at least as far as I can tell). Here's what I think is going on, which has to come with a lot of disclaimers.

ZFS pool information for pools that will be imported during boot is in /etc/zfs/zpool.cache; this is a serialized nvlist of pool information. zpool.cache is read in by the kernel very early during boot; as far as I can disentangle the OpenSolaris code, it's loaded when the ZFS module is first loaded (or as the root filesystem is being brought up, if the root filesystem is a ZFS one). However, this doesn't seem to actually activate the ZFS pools; it just sets up the (potential) pool configuration in the kernel.
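
As an aside, I believe you can dump the cache file's contents in human-readable form with zdb, which is handy for seeing exactly what pool configuration the kernel starts from:

    # Print the pool configurations recorded in /etc/zfs/zpool.cache:
    zdb -C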

(ZFS pool activation is, or at least seems to be, when the kernel tries to find all of the pool's devices and either finds enough of them to start the pool up or marks it as failed. Thus ZFS pool activation is the point at which all devices need to have been brought up.)

It's not clear to me when and how ZFS pools are actually activated. At a low level, pools seem to be activated on demand when they are looked at. However, there is no high-level SMF service that says 'activate ZFS pools'; instead, pools seem to get activated as a side effect of other SMF services. I suspect that the primary path to ZFS pool activation is the 'zfs mount -a' that is done in the SMF svc:/system/filesystem/local service (this is what prints the 'Reading ZFS config:' message that you see during Solaris boot). There is also some special magic for activating ZFS swap volumes (exactly where the magic lives depends on which Solaris 10 update you're on), which may activate pools that have swap volumes on them.
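
On our Solaris 10 machines the start method for svc:/system/filesystem/local is the /lib/svc/method/fs-local script (the path may differ on other Solaris versions), so you can check this directly:

    # What command SMF runs to start local filesystems:
    svcprop -p start/exec svc:/system/filesystem/local
    # Where the ZFS mounting happens in the method script:
    grep zfs /lib/svc/method/fs-local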

How iSCSI comes into this picture is sufficiently complicated that it needs another entry.

ZFSPoolActivationI written at 02:30:11

