2013-06-25
Balancing Illumos against ZFS on Linux
Every so often I poke at some aspect of our fileserver replacement project (where we need to replace our current Solaris 10 Update 8 servers with something modern enough to handle 4K sector disks), but at the moment things are moving slowly. One reason for this slowness is that I hope that things will get clearer as time goes on.
Currently I'm looking at Illumos and ZFS on Linux. With Illumos, I brought up most of our environment on an OmniOS VM and it all worked (including some tricky bits) and almost all of it worked just like Solaris. With ZFS on Linux I have the core basics up but I'm slowly chasing a NFS performance issue. And in a way this encapsulates the overall issue for me.
The risk with ZFS on Linux is integration issues between the ZFS codebase and Linux. My NFS write performance issue is clearly an issue at this join point and I have few concrete ideas for how to either troubleshoot it or resolve it. It's probably not the only such integration issue out there, and the only way to smoke them out (or to be sure that they aren't going to affect us) may be to run ZoL in production in our environment.
(I admit that that's the pessimistic view.)
The risk with Illumos is the same as it always has been: that we won't be able to find an Illumos distribution that is mature and supported for a long time, or at least not a distribution that has what we want. OmniOS has what we want and tracking it over time will tell me something about the other attributes. Not huge amounts, though, so I think I am going to have to start following some mailing lists so I can get an informed idea of how things are going.
(A project's mailing lists often give you a somewhat too pessimistic view of how healthy the project is, because they tend to attract people with problems or gripes instead of all of the people who are happy. But seeing what the problems and gripes are is itself interesting, as is finding out what the explosive political issues are. It's just that mailing lists are time consuming, and it's hard to sustain interest if you don't care about the problems and are just there to get the lay of the land.)
Given that our fileservers are going to be locked-down appliances that we rarely update or even touch, my somewhat reluctant current belief is that any Illumos distribution is probably going to wind up less risky than ZFS on Linux. In practice we can have much more confidence in the core ZFS, NFS, iSCSI, and multipathing environment on Illumos because basically all of it comes from Solaris, and we have plenty of experience with most of the Solaris bits. If the worst comes to the worst, lack of updates is not a huge drawback once we freeze the production system.
2013-06-11
The good and bad of IPS (as I see it)
IPS (the 'Image Packaging System') is the new packaging system used in Solaris 11 and (more importantly) many Illumos-derived distributions; it replaces Solaris 10 packages and patches. I have previously described IPS as being more or less like git: it puts all files together in a hash-based content store and then has 'packages' that are basically just indexes into the store. This contrasts with the traditional Linux approach to packaging, where each package is an archive of some sort that contains all of the actual files in the package.
The attractive part of IPS is what the content store approach does for repositories and for package updates. If files are the same between two versions of a package (or between multiple packages), the repository only needs to store one copy, and the package update or install process can detect that you already have the needed file installed. This mimics the practical behavior of Solaris 10 patches, which only included changed files (as opposed to the Linux approach, where changing just one file in a package causes you to re-issue the entire package).
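To make the content-store idea concrete, here is a minimal sketch in Python. The names and layout are entirely hypothetical (this is not IPS's actual on-disk format or API): files are stored once under their content hash, and a 'package' is just a manifest mapping install paths to hashes.

```python
import hashlib

class ContentStore:
    """Hash-addressed store: identical file content is stored exactly once."""

    def __init__(self):
        self.blobs = {}      # content hash -> file contents
        self.manifests = {}  # package name -> {install path: content hash}

    def add_package(self, name, files):
        """Add a package; files maps install path -> contents (bytes)."""
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs[digest] = data  # a no-op if this content already exists
            manifest[path] = digest
        self.manifests[name] = manifest

store = ContentStore()
store.add_package("pkg-1.0", {"/bin/tool": b"v1", "/etc/conf": b"defaults"})
store.add_package("pkg-1.1", {"/bin/tool": b"v2", "/etc/conf": b"defaults"})
# Only three blobs exist: the unchanged /etc/conf is shared between versions,
# so updating from 1.0 to 1.1 only needs to transfer the new /bin/tool.
```

Note how small an update is in this model: only the changed blob needs fetching, which is exactly the Solaris 10 patch behavior the entry describes.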
(This also minimizes what needs to be digitally signed. Much as in git, you don't need to digitally sign the files themselves, just the package index data. The all-in-one Linux package format means that you generally need to sign and verify large blobs of data.)
The bad part of IPS is what it does to downloading and storing packages. As far as I know, files are downloaded from IPS repositories in the same way that they're stored; you ask for them one by one and they then dribble in bit by bit. As we've learned the hard way, this is not a great way to do things on the modern Internet (or in general) because each separate fetch requires a new connection (or at least a new request) and that has various consequences.
(IPS packages are normally fetched over HTTP or HTTPS but I don't know if the IPS client and server are smart enough to take advantage of HTTP connection reuse.)
I'm also not enthused about how this makes package repositories harder to manage and exposes them to subtle forms of breakage (such as a file that's listed in package manifests but not present in the repository). Pruning old packages is now necessarily a whole-repository operation, since you can't just remove their files without seeing if any other package uses them.
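Pruning in this model amounts to a whole-repository garbage collection: you must walk every remaining manifest to find the hashes still in use before you can safely delete anything. A hypothetical sketch (again not how IPS actually does it), reusing the manifest-and-blobs shape from above:

```python
def prune(blobs, manifests):
    """Delete blobs not referenced by any remaining package manifest.

    blobs: dict of content hash -> contents
    manifests: dict of package name -> {install path: content hash}
    """
    # Mark: collect every hash still referenced by some manifest.
    live = {h for manifest in manifests.values() for h in manifest.values()}
    # Sweep: drop everything unreferenced.
    for h in list(blobs):
        if h not in live:
            del blobs[h]
    return blobs

blobs = {"a": b"one", "b": b"two", "c": b"orphan"}
manifests = {"pkg": {"/bin/x": "a", "/etc/y": "b"}}
prune(blobs, manifests)
# "c" is gone; "a" and "b" survive because a manifest still references them.
```

The point is the cost structure: even removing one obsolete package requires reading every other manifest in the repository, which is what makes pruning a whole-repository operation rather than a local one.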
I suspect that Sun developed IPS this way to preserve the small sizes and small installation changes of Solaris 10 patches (which transfer and install only the changed files instead of the whole package). I prefer the simpler approach of Linux packages (and I note that Linux package updates themselves can optimize both transfer size and install time actions).