Our Linux ZFS fileservers work much like our OmniOS ones

December 31, 2018

I wrote recently about our new generation of Linux-based ZFS fileservers. One of the things that I covered only in passing is how surprisingly similar they are to our current OmniOS fileservers. Although we've changed from OmniOS to Linux, which is a significant shift at one level, a great deal of the administration and how we work with them has basically not changed at all. From the perspective of using and operating our ZFS fileservers, very little has fundamentally changed and most of our practices and procedures are just as before. Looking at things so far, I think that there are three reasons for this (or perhaps four, if I'm going to be pessimistic).

The first reason is that modern Unixes are quite similar to each other. On top of that, OmniOS is less foreign than Solaris was (it used many more GNU versions of things than Solaris had, and on top of that we added various stuff), and by moving from OmniOS to Linux we're moving 'upwards', to an environment with more GNU tools, programs with more features, and so on. Moving from Linux to Solaris would be dislocating (and was), because so many things we're used to would be missing (including /bin/sh as a POSIX shell), but OmniOS is more mainstream than Solaris and we're moving the other way.

The second reason is that the two big ZFS commands, zfs and zpool, are basically the same between OmniOS and Linux (and in general between all ZFS versions). This is pretty much by design. Right from the start on Solaris, ZFS defined its own management commands in addition to the kernel components, and people who ported ZFS to other environments have kept those commands as unchanged as possible. ZFS on Linux does have some Linux-specific components and aspects, such as its systemd services and ZED, but we don't interact with them on a day-to-day basis. Because the ZFS commands are the same and OmniOS was substantially similar to Linux, many of our management scripts and so on can (and do) operate basically the same on Linux as they did on OmniOS, or are even the same scripts.
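To make the 'same commands on both sides' point concrete, here is a minimal sketch of the sort of script that ports unchanged. It is not one of our actual scripts; the function names are made up for illustration. It relies only on 'zfs list' with -H (no headers, tab-separated), -p (exact numeric values), and -o (property selection), which behave the same way on OmniOS and on ZFS on Linux.

```python
import subprocess


def parse_zfs_list(text, fields):
    """Parse tab-separated 'zfs list -H -p' output into a list of dicts.

    'fields' must be the same property names, in the same order, that
    were passed to 'zfs list -o'.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError(f"unexpected field count in line: {line!r}")
        rows.append(dict(zip(fields, values)))
    return rows


def zfs_list(fields=("name", "used", "available")):
    """Run 'zfs list' in scripted mode and return the parsed rows.

    Nothing here is platform-specific; the command line is identical
    on OmniOS and on Linux.
    """
    cmd = ["zfs", "list", "-H", "-p", "-o", ",".join(fields)]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_zfs_list(result.stdout, fields)
```

The parsing is split out from the command invocation so that it can be exercised on canned output without a ZFS pool at hand. Values are left as strings; a real script would convert the numeric properties as needed.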

The third reason (and the final positive one) is that we do a lot of our ZFS administration through a few locally written front end commands. Since we wrote these commands, we can make them behave exactly the same on Linux as on OmniOS, even if the actual underlying mechanisms are significantly different (for example, we do extensive translation of NFS export options, but that's all hidden by the local command). If we had to work at the level of straight ZFS commands on a routine basis, some things would be more noticeably different; for example, OmniOS device names are quite different from Linux device names.

(In the OmniOS days, transforming the device names into a far more understandable form was much more important than it is now, since we were using iSCSI and thus really wanted to know which iSCSI backend a particular disk was on. Today, most of the important things are visible in the Linux device names we use, although they still require some mental translation.)
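As a rough illustration of how a front end can hide this sort of difference, the sketch below renders one neutral description of NFS access in two styles: a Solaris-style share option string (rw=, ro=, and root= lists with colon-separated hosts) and a Linux /etc/exports-style line. This is not our actual command; the function names, the dictionary layout, and the host names are all hypothetical, and real export handling has many more options than this covers.

```python
def solaris_style(access):
    """Render an access description as a share_nfs-style option string.

    'access' is a hypothetical dict such as
    {"rw": [...hosts...], "ro": [...hosts...], "root": [...hosts...]}.
    """
    parts = []
    for key in ("rw", "ro", "root"):
        hosts = access.get(key, [])
        if hosts:
            parts.append(f"{key}=" + ":".join(hosts))
    return ",".join(parts)


def linux_style(path, access):
    """Render the same description as an /etc/exports-style line.

    Hosts listed under "root" get no_root_squash, which is the closest
    Linux equivalent of Solaris's root= access list.
    """
    root_hosts = set(access.get("root", []))
    clients = []
    for mode in ("rw", "ro"):
        for host in access.get(mode, []):
            opts = [mode]
            if host in root_hosts:
                opts.append("no_root_squash")
            clients.append(f"{host}({','.join(opts)})")
    return f"{path} " + " ".join(clients)
```

The point is only that the person running the front end describes the access once, and the command picks the right rendering for whichever system it is running on.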

The final reason is that we haven't yet had to troubleshoot issues, which is the area where there are clear and significant differences. While we have much more ongoing metrics on our new Linux fileservers than we ever set up on OmniOS, we have no equivalents of our DTrace monitoring scripts. For all that Linux is a more familiar environment for finding some problems, there's some information that the DTrace scripts gave us ready access to that I'm not sure we have any good equivalents of.

(For instance, I'm not sure we have a good way to find out which clients are the most active ones for a given fileserver. There's nfswatch, but I'm not sure that's going to be enough. We can (and do) gather client-side statistics from our own clients, but that doesn't help us if a significant amount of traffic comes from other people's client machines, which it sometimes does.)

Of course, getting here took a bunch of work. We had to adapt and modify our local programs and some of our scripts, design and build some new systems to replace things we were doing on OmniOS, and figure out how to hook things into existing Linux and ZFS on Linux facilities like ZED. But now that we're here, it's pleasant how similar the new environment is to the old, as far as operating it goes.

PS: Someday eBPF and bpftrace and so on may make a solid DTrace replacement, with some work on our part to build equivalent scripts, but it's not there out of the box right now on Ubuntu 18.04 LTS.

PPS: Of course, as far as our NFS clients are concerned everything is just the same. Filesystems are mounted in the same way, using the same paths; it's just that the fileserver names changed. With our automounter replacement, clients don't even really care about the fileserver names either; all NFS mounts are driven by a magic file that's automatically generated from our master data file of ZFS filesystems and where they are.
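To sketch the idea of generating the mount file from master data: the code below assumes a made-up master file format of one filesystem per line, with three whitespace-separated fields (client mount point, fileserver name, exported path) and '#' comments. Our real master file and generated file are not in this format; the function names are hypothetical too. What it shows is why clients don't care about fileserver names: moving a filesystem only changes the middle field, and the mount point stays the same.

```python
def build_mount_map(master_text):
    """Turn master data lines into a {mountpoint: "server:path"} dict.

    Assumed (hypothetical) line format: '<mountpoint> <server> <export path>',
    with blank lines and '#' comments ignored.
    """
    mounts = {}
    for lineno, raw in enumerate(master_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        mountpoint, server, export_path = fields
        mounts[mountpoint] = f"{server}:{export_path}"
    return mounts


def render_mount_file(mounts):
    """Render the map as sorted 'mountpoint<TAB>server:path' lines."""
    return "".join(f"{mp}\t{src}\n" for mp, src in sorted(mounts.items()))
```

Regenerating the file after editing the master data is then the only step needed to point clients at a filesystem's new home.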

Written on 31 December 2018.

Last modified: Mon Dec 31 22:31:56 2018