Wandering Thoughts archives


When you make changes, ZFS updates much less stuff than I thought

In the past, for example in my entry on how ZFS bookmarks can work with reasonable efficiency, I have given what I think of as the standard explanation of how ZFS's copy on write nature forces changes to things like the data in a file to ripple up all the way to the top of the ZFS hierarchy. To quote myself:

If you have an old directory with an old file and you change a block in the old file, the immutability of ZFS means that you need to write a new version of the data block, a new version of the file metadata that points to the new data block, a new version of the directory metadata that points to the new file metadata, and so on all the way up the tree, [...]

This is wrong. ZFS is structured so that it doesn't have to ripple changes all the way up through the filesystem just because you changed a piece of it down in the depths of a directory hierarchy.

How this works is through the usual CS trick of a level of indirection. All objects in a ZFS filesystem have an object number, which we've seen come up before, for example in ZFS delete queues. Once it's created, the object number of something never changes. Almost everything in a ZFS filesystem refers to other objects in the filesystem by their object number, not by their (current) disk location. For example, directories in your filesystem refer to things by their object numbers:

# zdb -vv -bbbb -O ssddata/homes cks/tmp/testdir
   Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
  1003162    1   128K    512      0     512    512  100.00  ZFS directory
    microzap: 512 bytes, 1 entries
       ATESTFILE = 1003019 (type: Regular File)

The directory doesn't tell us where ATESTFILE is on the disk, it just tells us that it's object 1003019.

In order to find where objects are, ZFS stores a per filesystem mapping from object number to actual disk locations that we can sort of think of as a big file; these are called object sets. More exactly, each object number maps to a ZFS dnode, and the ZFS dnodes are stored in what is conceptually an on-disk array ('indexed' by the object number). As far as I can tell, an object's dnode is the only thing that knows where its data is located on disk.

So, suppose that we overwrite data in ATESTFILE. ZFS's copy on write property means that we have to write a new version of the data block, possibly a new version of some number of indirect blocks (if the file is big enough), and then a new version of the dnode so that it points to the new data block or indirect block. Because the dnode itself is part of a block of dnodes in the object set, we must write a new copy of that block of dnodes and then ripple the changes up the indirect blocks and so on (eventually reaching the uberblock as part of a transaction group commit). However, we don't have to change any directories in the ZFS filesystem, no matter how deep the file is in them; while we changed the file's dnode (or if you prefer, the data in the dnode), we didn't change its object number, and the directories only refer to it by object number. It was object number 1003019 before we wrote data to it and it's object number 1003019 after we did, so our cks/tmp/testdir directory is untouched.
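To make the indirection concrete, here is a toy model of it in Python. This is only a sketch and is in no way how ZFS is actually implemented: `ToyPool`, `ToyFilesystem` and their methods are hypothetical names, a 'dnode' is reduced to a single disk address, and indirect blocks, dnode blocks and transaction groups are left out entirely. What it shows is just that overwriting the file moves the file's data and changes its dnode, while the directory's own data stays where it was.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPool:
    """Append-only 'disk': every write lands at a fresh address."""
    blocks: dict = field(default_factory=dict)
    next_addr: int = 0

    def write(self, contents):
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = contents
        return addr


@dataclass
class ToyFilesystem:
    pool: ToyPool
    # Toy object set: object number -> address of the object's current data.
    # This single address stands in for the dnode (an assumption of the
    # sketch; real dnodes hold much more).
    dnodes: dict = field(default_factory=dict)

    def create(self, objnum, contents):
        self.dnodes[objnum] = self.pool.write(contents)

    def overwrite(self, objnum, contents):
        # Copy on write: the new data goes to a new address and only the
        # dnode for this object number is changed to point at it.
        self.dnodes[objnum] = self.pool.write(contents)

    def read(self, objnum):
        return self.pool.blocks[self.dnodes[objnum]]


fs = ToyFilesystem(ToyPool())
fs.create(1003019, b"old file data")
# A directory is itself an object; its data maps names to object numbers.
fs.create(1003162, {"ATESTFILE": 1003019})

dir_addr_before = fs.dnodes[1003162]
file_addr_before = fs.dnodes[1003019]

fs.overwrite(1003019, b"new file data")

dir_addr_after = fs.dnodes[1003162]
file_addr_after = fs.dnodes[1003019]

# Looking the file up through the untouched directory finds the new data.
data_via_directory = fs.read(fs.read(1003162)["ATESTFILE"])
```

After the overwrite, `file_addr_after` differs from `file_addr_before` but `dir_addr_after` is the same as `dir_addr_before`, and going through the directory still gets you to the new data because the object number did not change.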

Once I thought about it, this isn't particularly different from how conventional Unix filesystems work (what ZFS calls an object number is what we conventionally call an inode number). It's especially forced by the nature of a copy on write Unix filesystem, given that due to hardlinks a file may be referred to from multiple directories. If we had to update every directory a file was linked from whenever the file changed, we'd need some way to keep track of them all, and that would cause all sorts of implementation issues.

(Now that I've realized this it all feels obvious and necessary. Yet at the same time I've been casually explaining ZFS copy on write updates wrong for, well, years. And yes, when I wrote "directory metadata" in my earlier entry, I meant the filesystem directory, not the object set's 'directory' of dnodes.)

Sidebar: The other reason to use inode numbers or object numbers

Although modern filesystems may have 512 byte inodes or dnodes, Unix has traditionally used ones that were smaller than a disk block and thus that were packed several to a (512 byte) disk block. If you need to address something smaller than a disk block, you can't just use the disk block number where the thing is; you need either the disk block number plus an index into it, or you can make things more compact by just having a global index number, ie the inode number.

The original Unix filesystems made life even simpler by storing all inodes in one contiguous chunk of disk space toward the start of the filesystem. This made calculating the disk block that held a given inode a pretty simple process. (For the sake of your peace of mind, you probably don't want to know just how simple it was in V7.)
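As a rough illustration of why a contiguous inode area makes this simple, here is a sketch in Python. All of the layout constants are assumptions picked for the example (64-byte inodes, 512-byte blocks, the inode area starting at block 2, inode numbers starting at 1), and `locate_inode` is a hypothetical helper, not code from any real filesystem.

```python
BLOCK_SIZE = 512          # assumed disk block size in bytes
INODE_SIZE = 64           # assumed on-disk inode size in bytes
INODES_PER_BLOCK = BLOCK_SIZE // INODE_SIZE
FIRST_INODE_BLOCK = 2     # assumed: the inode area starts at this block
FIRST_INODE_NUMBER = 1    # assumed: inode numbers start at 1


def locate_inode(inum):
    """Return (disk block number, byte offset in that block) for an inode."""
    if inum < FIRST_INODE_NUMBER:
        raise ValueError("invalid inode number: %d" % inum)
    index = inum - FIRST_INODE_NUMBER
    block = FIRST_INODE_BLOCK + index // INODES_PER_BLOCK
    offset = (index % INODES_PER_BLOCK) * INODE_SIZE
    return block, offset
```

With these assumed numbers, a single small integer is enough to find an inode; there's no need to store a block number and a separate index into the block.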

solaris/ZFSDirectoriesAndChanges written at 22:48:33

Running servers (and services) well is not trivial

It all started with a Reddit comment thread called The "mass exodus" from Github to GitLab: 10 days later. In it, someone commented that they didn't understand why there was a need for cloud Git services in the first place, since running your own Git server for your company was easy enough. I think that part of this view is due to the difference between 'on premise' and 'off premise' approaches to environments, but as a sysadmin I had a twitchy reaction to the 'it's easy' part.

These days, it's often relatively easy to 'just set up a server' or a service, especially if you already work in the cloud. Spin up a VM or a Docker image, install some stuff, done, right? Well, not if you want this to be reliable infrastructure. So let's run down what you or I would have to do to set up a general equivalent of Github for internal company use:

  • Figure out the Git server software you want to use, including whether you want the full Github experience or a web Git thing that people can pull from and push to. Or you could go very old school and demand that people use SSH logins to a Unix machine where they do filesystem level git operations, although I'm not sure that would work well for very long.
  • Possibly figure out how to configure this Git server software for your particular requirements, setup, and available other pieces.

  • Figure out how you're going to handle authentication for this Git service. Do you have a company authentication system? How will you tie this service to any 2FA that you use (and you probably want to consider 2FA)?

    If you can't outsource all authentication to some other system in your company, you've just taken on an ongoing maintenance task of adding, removing, and updating users and groups (and any other things that have to authenticate to the Git server). Failure to do this well, or failure to be hooked into other aspects of the company's authentication and HR systems, can result in fun things like fired employees still retaining high-privilege access to your Git server (ie the company's Git server). You probably don't want that.

    (Non-integrated authentication causes sufficiently many problems that it's featured in a sysadmin test.)

  • Install a 'server' to run all of the necessary components on. In the best case, you use Docker or something that uses Docker images and there's a pre-packaged Docker image that you can throw on. In the worst case, you get to find and install a physical server for this, including hooking it into any fleet-wide management systems so that it automatically gets kept up to date on security patches, and then install a host of un-packaged software from source (or install random binaries you download from the Internet, if you feel like doing that).

    If you're using Docker or a VPS in the cloud, don't forget to figure out how you're going to get persistent storage.

  • Figure out how to back up the important data on the system. Even if you have persistent cloud storage, you want some form of backups or ability to roll back in time, because sooner or later someone will accidentally do something destructive to a bit of the system (eg 'oops, I mass-deleted a bunch of open issues by mistake') and you'll need to fix it.

    Once you have backups set up, make sure that you're monitoring them on an ongoing basis so that you can find out if they break.

  • If your company has continuous integration systems and similar development automation, or has production servers that pull from your Git repos (or that get pushed to by them), you're going to need to figure out how to connect all of this to the Git server software. This includes things like how to authenticate various parties to each other, unless everyone can pull anything from your Git server.

I'm going to generously assume that the system never has performance problems (which you'd have to troubleshoot) and never experiences software issues with your chosen Git server and any databases and the like that it may depend on (which you'd have to troubleshoot too). Once set up, it stays quietly running in the corner and doesn't actively require your attention. This is, shall we say, not always the experience that you actually get.

(I'm also assuming that you can just get TLS certificates from Let's Encrypt and that you know how to do this. Or perhaps the Git server stuff you picked does this all for you.)

Unfortunately we're not done, because the world is not a nice place. Even with the service just working, we still have more things to worry about:

  • Figure out how to get notified if there are security issues or important bugfixes in either the Git server software or the underlying environment that it runs on. You can be confident that there will be some, sooner or later.

  • Even without security problems, someday you're going to have to update to a new version of the Git server software. Will it be fully compatible as a drop-in replacement? If your Git server is important to your company, you don't really want to just drop the new version in and hope for the best; you're going to have to spend time building out an environment to test the new version in (with something like your data).

    New versions may require changes to other bits of the system, any local customizations, or to things you integrated the old version with. Updates are often a pain but at the same time you have to do them sooner or later.

  • In general you need to worry about how to secure both the Git server software and the underlying environment it runs on. The defaults are not necessarily secure and are not necessarily appropriate for your company.

  • You may want to set up some degree of monitoring for things like disk space usage. If this Git server is important, people will notice right away if it goes down, but they may not notice in time if the disk space is just quietly getting closer and closer to running out because more and more people in the company are using it for more stuff.
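As a very minimal sketch of the disk space part of this, here is a check written in Python using the standard library's `shutil.disk_usage()`. The 15% threshold, the function names, and the idea of running this periodically and sending the message somewhere are all assumptions for illustration; a real setup would more likely feed this sort of number into whatever monitoring and alerting system you already have.

```python
import shutil


def check_free(total, free, min_free_fraction=0.15):
    """Return a warning string if free space is below the threshold, else None."""
    if total <= 0:
        return "cannot determine filesystem size"
    free_fraction = free / total
    if free_fraction < min_free_fraction:
        return "low disk space: only %.1f%% free" % (free_fraction * 100)
    return None


def disk_space_warning(path, min_free_fraction=0.15):
    """Check the filesystem that holds 'path' (threshold is an assumed default)."""
    usage = shutil.disk_usage(path)
    return check_free(usage.total, usage.free, min_free_fraction)


# Example: check the filesystem holding the current directory.
warning = disk_space_warning(".")
if warning is not None:
    print(warning)
```

A fixed threshold like this only tells you that space is already low; noticing that usage is steadily growing takes keeping history over time, which this sketch doesn't attempt.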

If this is something that matters to the company and the company is more than a few people, it's also not just 'you' (a single person) who will be looking after the server. The company needs at least a few people involved so that you can go on vacation or even just get sick without the company running the risk of the central Git server that all the developers use just falling over and no one knowing how to bring it back.

In some environments this Git server will either be exposed to the Internet (even if it's only used by company people) or at least available across multiple 'internal network' locations because, say, your developers are not really on the same network as your production servers in the cloud. This will likely raise additional network security issues and perhaps authentication issues. This is especially the case if you have to expose this Git service to the Internet. Mere obscurity and use only by company insiders is not enough any more these days; there are systems that mass scan the entire IPv4 Internet and keep world-accessible databases of what they find. If you have open ports or services, you have problems, and that means you're going to have to do the work to close things down.

Basically all of this applies to pretty much any service, not just Git. If you want to run a mailer, or an IMAP server, or a web server hosting a blog or CMS, or a DNS server, or whatever, you will have to face all of these issues and often more. Under the right circumstances, the initial setup can appear extremely trivial; you grab a Docker image, you start it somewhere, you make your internal DNS point a name to it, and you can call it up in your browser or otherwise start poking it. But that's just the start of a long and tiresome journey to a reliable, secure service that can be sustained over the long haul and won't blow up in your face someday.

I'm naturally inclined toward the 'on premise' view of things where we do stuff internally. But all of this is why, if someone approached me at work about setting up a Github-like service locally, I would immediately ask 'is there some reason we can't pay Github to do this for us?' I'm pretty confident that Github'll do it better, and if staff time isn't free they'll probably do it cheaper.

PS: I've probably missed some things you'd need to think about and tackle, which just goes to show how non-trivial this really is.

sysadmin/RunningServersNotTrivial written at 00:54:37
