Wandering Thoughts archives

2010-10-27

A modern VCS mistake enabled by working on live trees

I've written before about sysadmin style use of version control, where we typically use a VCS for documenting the live tree instead of developing something. Recently I ran into a mistake that modern VCSes and this style of working combine to enable: deleting a file from the live directory tree but forgetting to delete it from the VCS.

Since the live tree is, well, live, deleting the file actually takes effect and so you don't notice anything wrong. In fact, in a sense nothing is wrong except that your VCS repository no longer accurately reflects the state of the live system.

(If you make a habit of running 'hg status' or the equivalent you may notice the complaints about this, but our working practice is 'hg diff, make changes, hg diff, hg commit', and at least in our current version of Mercurial this produces no complaints; as a result, we sailed along in this situation for some time.)

There are at least two dangers in this situation. The first is that you haven't captured information about when the file was actually deleted from your live system, and you may want that information someday (you can sometimes guess based on other changes). The second is that if you ever have to actually check out the tree (in any version, either the current one or an older one), these deleted files will spring back to life. Sometimes this is just surprising and distracting clutter, but it could be a problem if the file is in some place where its mere presence will cause it to be used for something.

(If the removed file has to be mentioned in a configuration file in order to do anything, having it reappear is generally harmless because you will have also committed the change to the configuration file that stops using it.)

There's also the mirror-image mistake of adding a file to the live tree without committing it to the VCS repository, but we haven't made that one yet.
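As an illustration, here is a minimal sketch of a periodic consistency check. It assumes Mercurial's one-letter status codes ('!' for a file the repository tracks but which is missing from the working directory, '?' for a file that is present but has never been committed); the live tree path is purely hypothetical.

    #!/usr/bin/env python
    # Flag drift between a live tree and its Mercurial repository.
    #   '!'  tracked in the repository but missing from the working
    #        directory (the deletion mistake described above)
    #   '?'  present in the working directory but never committed
    #        (the mirror-image mistake)
    import subprocess
    import sys

    LIVE_TREE = "/our/live/tree"     # hypothetical path

    def drifted_files(tree):
        out = subprocess.check_output(["hg", "status", "-R", tree],
                                      universal_newlines=True)
        for line in out.splitlines():
            code, name = line[0], line[2:]
            if code in "!?":
                yield code, name

    problems = list(drifted_files(LIVE_TREE))
    for code, name in problems:
        label = "deleted but still tracked" if code == "!" else "never committed"
        print("%s: %s (%s)" % (LIVE_TREE, name, label))
    sys.exit(1 if problems else 0)

Run from cron against each live tree, something like this would have flagged our situation well before a checkout brought the deleted files back to life.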

VCSRemovingFilesMistake written at 01:56:47

2010-10-15

How big our fileserver environment is (part 2)

Yesterday I counted up (or ran down) the hardware side of how big our fileserver environment is. Today I'll cover the amount of disk space we have.

Our raw disk space is simple to compute, impressive sounding, and highly misleading. With four backends of 750 GB disks and five backends of 1 TB disks, we have 96 TB of raw space available (in disk vendor terabytes), or 60 TB counting only the production backends. However, all of our space is mirrored, so that is really 30 TB (actually slightly less because we have a 750 GB four-way mirror), and then you take out a disk from each backend for spares, and the resulting ~27 TB number is still misleading.

How we really think about disk space is in terms of 'chunks', by which we generally mean mirrored pairs of our standard-sized LUNs of roughly 250 GB each; a 750 GB disk holds three such LUNs and a 1 TB disk holds four, so a mirrored pair of disks gives three or four chunks respectively. At the ZFS pool level, all of our space is handled in terms of chunks, as they are what we add to pools and so on.

After the dust settles, we have:

  • 110 chunks total across all of the production backends; after various overheads, this would be about 25 TB of space as ZFS sees it.
  • 70 chunks allocated to our various ZFS pools, for a total of 15.6 TB of space as ZFS sees it.
  • 9.3 TB of space actually used by data (as counted by ZFS).

So we have a bit over half our potential space actually allocated, and a bit over half of our allocated space actually being used.
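For the curious, the numbers above hang together; here is a back-of-the-envelope check. The twelve-disks-per-backend figure is not stated directly but is what the raw totals imply, and the sketch assumes that one disk per production backend is held back as a spare and that production consists of the four 750 GB backends plus two 1 TB ones (which is what the 60 TB figure implies).

    # Back-of-the-envelope check, in disk vendor units (1 TB = 1000 GB).
    # The 12-disks-per-backend figure is inferred, not stated.
    DISKS_PER_BACKEND = 12

    # (number of backends, disk size in GB, 250 GB LUNs per disk)
    all_backends = [(4, 750, 3), (5, 1000, 4)]
    production   = [(4, 750, 3), (2, 1000, 4)]

    raw_tb = sum(n * DISKS_PER_BACKEND * size / 1000.0
                 for n, size, _ in all_backends)
    prod_tb = sum(n * DISKS_PER_BACKEND * size / 1000.0
                  for n, size, _ in production)
    print("raw space, all backends: %.0f TB" % raw_tb)     # 96 TB
    print("raw space, production:   %.0f TB" % prod_tb)    # 60 TB

    # One disk per production backend is held back as a spare, and a
    # chunk is a mirrored pair of LUNs, one from each side.
    usable = DISKS_PER_BACKEND - 1
    luns = sum(n * usable * lpd for n, _, lpd in production)
    chunks = luns // 2
    print("production chunks: %d (~%.1f TB at 250 GB each)"
          % (chunks, chunks * 250 / 1000.0))                # 110, 27.5 TB

This reproduces the 96 TB, 60 TB, and 110-chunk figures; the drop from 27.5 TB to roughly 25 TB is ZFS-level overhead.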

The allocation and space usage are spread very unevenly among our ZFS pools. The smallest pool is one chunk and the largest is 12 chunks; the least-used pool has only 27.5 GB of data in it, while the most-used pool has 1.8 TB (not coincidentally, it is also the 12-chunk pool).

Pools and filesystems: we have 25 ZFS pools in total, with 174 different filesystems between them (we have more NFS mounts than ZFS-level filesystems for reasons that don't fit in the margins of this entry). We have so many filesystems for complicated reasons, but part of it is how we've chosen to structure and organize filesystems here. ZFS makes filesystems cheap, so this is fine.

(The hot spare backend has no space allocated or used on it. The test fileserver and backends have a fluctuating amount of their space allocated and used, but how much is irrelevant.)

OurScaleII written at 00:38:13

2010-10-13

How big our fileserver environment is (part 1)

I've said before that I call us a midsized environment (for a number of reasons, including that we're clearly not small and also nowhere near the size of large environments). Now I feel like sharing some actual numbers on how big and not-big we are. Today's entry is about the hardware of our fileserver environment; a future one will cover data size.

(This is necessarily a snapshot in time and is likely to change in the future.)

Our fileserver environment is made up from fileservers and backends. We currently have seven fileserver machines: four production fileservers, one hot spare, and two test fileservers. One of the test fileservers is currently inactive because it hasn't been reinstalled after we used it to build a fast OS upgrade for the production machines.

(To the extent that we have roles for the test machines, one is intended to be an exact duplicate of the production machines so that we have a test environment for reproducing problems, and the other is for things like OS upgrades or patches.)

We currently have nine backends. Six are in production, one is a hot spare, and two are for testing; which one is the hot spare backend has changed over time as failures have caused us to bring the hot spare into production and turn the old production backend into the hot spare. Currently, all backends have a full set of disks; four backends (the first four we built) use 750 GB disks and the other five use 1 TB disks, including both test backends.

(We plan to raid the test backends for spare disks at some point when our spares pool is depleted but haven't needed to so far.)

As you could partly guess from the pattern of disk sizes, our initial production deployment was three fileservers and four backends; from there we expanded into test machines, hot spares, and then a fourth production fileserver and its two production backends.

Using so much hardware for hot spares and testing is a lot less extravagant than it seems. Because of university budget issues we pretty much bought all of this hardware in two large chunks, and once we had the hardware we felt that we might as well use it for something productive instead of having it sit in storage until we needed to expand the production environment. And we still have more hardware in storage, although we've more or less run out of unclaimed 1 TB drives.

(Some environments would be space, power, or cooling constrained; we haven't run into that yet.)

Sidebar: how much hardware we need for backend testing

We need two backends because that's how production machines are set up; all ZFS vdevs are made up from mirrored pairs, each side coming from a different backend. There's a fair amount of testing where we need to be able to duplicate this mirroring.

We need enough disks to fill a single backend, because we want to be able to do a full scale load test on a single backend; this verifies, among other things, that the software and hardware can really handle driving all of the disks at once.

We don't strictly need enough disks to fill up both test backends, although it's convenient to have that many because then you don't have to worry about shuffling disks around based on what testing you want to do.

OurScaleI written at 23:50:32

2010-10-05

Why people combine NFS with Samba servers

One of the things that the Samba people say in response to certain problem reports is that you should not use Samba on NFS-mounted filesystems. This is impractical advice in many production environments, and I will tell you why: namespaces. Or less abstractly, making it so that users do not have to know what fileserver their files come from.

We are not atypical as midsized environments go, and right now we have four fileservers and around two hundred filesystems distributed between those fileservers. We do not particularly want users to have to know or care exactly which fileserver hosts their home directory (partly because it can change sometimes). This is easy on Unix machines, where users just use filesystem names and the sysadmins make sure that those filesystems are mounted from the right places; the result is a single global namespace of filesystems.

Using Samba to re-export NFS mounts allows us to preserve this property for SMB/CIFS clients as well, in a straightforward configuration. We have one Samba server which users map shares from, and they don't have to care where the storage really comes from. From both the user perspective and the Samba perspective, NFS is fusing multiple fileserver namespaces together into a single global namespace where everyone is indifferent as to where the files are really coming from.

It looks like it is at least theoretically possible to use SMB features to do a pure-Samba version of this fused namespace (you can, under some circumstances, make a Samba share that just redirects clients to another SMB server). However, it also appears that the Samba configuration would be significantly more complicated, and I don't know if clients expose the SMB redirections involved to the users (this would be undesirable).
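To make the contrast concrete, here is an illustrative smb.conf fragment; it is not our actual configuration, the share and path names are made up, and the referral half relies on Samba's msdfs support, whose exact syntax you should check against smb.conf(5) for your version.

    # Illustration only; all share and path names here are hypothetical.

    [global]
        # enable Samba's MS-DFS support, needed for the referral share
        # below (it may already be on by default in your version)
        host msdfs = yes

    # The approach described above: a share whose path is an NFS mount,
    # so the Samba server simply re-exports storage from a fileserver.
    [csdata]
        # NFS-mounted from the real fileserver
        path = /nfs/fs2/csdata
        read only = no

    # The pure-SMB alternative: a share that is nothing but a DFS
    # referral pointing clients at a share on another SMB server.
    [csdata-dfs]
        # the path must exist but is never actually served
        path = /var/lib/samba/empty
        msdfs proxy = \\fs2smb\csdata

Whether clients follow such a referral transparently (and whether users ever see it) is up to the client; with the NFS re-export there is nothing for the client to follow in the first place.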

Of course this omits the elephant in the corner of the room, namely having to run Samba on your fileservers. In our case this would be highly undesirable purely on various practical grounds. For others it is alarming on load and security grounds: you are running much more on your fileservers, both in terms of processes and in terms of having to run large amounts of code that has historically had a number of security issues. Sticking NFS in the middle gives you significantly more isolation.

WhyNFSSamba written at 01:28:23

