Wandering Thoughts archives

2011-07-20

Why I would like my mailer to have a real programming language (part 2)

In illustrated form, to go with the previous explanation of this.

The actual configuration change that I just made, amounting to part of one line:

< require_files = $local_part:$home/.forward
> require_files = <; $local_part;$home/.forward;${if !IS_SPAM {!$home/.forward-nonspam}}

(The additions and changes are the '<;' separator declaration at the start, the switch from ':' to ';' between list items, and the added ${if ...} clause at the end.)

The amount of added comments necessary to explain this configuration: 17 lines, not counting an entire previous entry (with an antecedent for background). Part of this is to explain the logic and what is going on, and part of this is to explain a necessary workaround because of interactions due to how Exim has chosen to do various sorts of string expansions.

(There are three separate sorts of string interpretation going on in this one line. It's fun.)

Don't ask how long this small change took to develop and test, despite the logic being simple and easily expressed when written down in plain language.

Sidebar: the levels of string interpretation here

Because someone someday may ask, here are the three levels that Exim is doing:

  1. a purely textual, configuration-file-level macro substitution that expands IS_SPAM into an Exim string expansion condition.
  2. splitting require_files on the list separator boundaries, either ':' (original line) or ';' (changed line).
  3. string-expanding the ${if ...} clause.

The separator has to change because (wait for it) IS_SPAM expands to something that has :'s in it. This fooled me during debugging for some time, because the pre-macro-substitution version does not have any :'s so it looks safe from step 2.
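
For concreteness, here is a minimal sketch of the sort of macro that trips over step 2. The actual IS_SPAM macro isn't shown here, so the header name and condition below are invented; the important part is the ':' that Exim's $h_...: header reference syntax drags along with it:

  # hypothetical macro; note the ':' that ends the header reference
  IS_SPAM = match{$h_X-Spam-Status:}{\N^Yes\N}

After macro substitution that ':' is sitting in the middle of the require_files value, so a default colon-separated list gets split right through the header reference in step 2, well before the ${if ...} clause is expanded in step 3. Starting the list with '<;' switches the list separator to ';' and lets the expanded macro survive the split in one piece.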

A decently designed programming language would be a lot cleaner here. Unfortunately, Exim is probably trying to avoid being a Lisp instead.

sysadmin/ProgrammableMailersII written at 15:55:41

Our ZFS spares handling system (part 3)

In part 1 I mentioned that our spares system pulls the list of disks to use as spares from files, and that how those files are maintained was beyond the scope of that entry. Well, time to talk about that.

From more or less the beginning of our ZFS fileserver system we've had an administrative system that captured a record of all pools on each physical server and a list of all disks visible to that server and how they were being used by ZFS. This system is relatively crude; it's shell scripts that run once a day and then scp their output to a central location (which is then replicated back to all fileservers). By combining the information from all of the local disk usage files, the spares file building system can get a global view of both what disks exist and how they're used.

(At this point I will pause to note that all through our system we translate iSCSI disk names from the local Solaris cNt... names to symbolic names that have the iSCSI target and logical disk involved. This is a hugely important step and avoids so many potential problems.)

Although we have disk usage information for physical servers, the spares files are built for our virtual fileservers; each virtual fileserver has its own list of spares, even if two virtual fileservers happen to be using the same physical server at the moment. We do this because each of our iSCSI backends is typically dedicated to a single virtual fileserver and we want to keep things that way even when we have to activate spares. The overall spares handling environment goes to some pains to make this work.

The whole process of building the spares files for the virtual fileservers is controlled by a configuration file with directives. There are two important sorts of directives in the file:

fs8 use backend lincoln

This means that the virtual fileserver fs8 should use as spare disks any unused disks on the iSCSI backend called lincoln.

all exclude pool fs2-core-01

This means that all virtual fileservers should avoid (as spares) any logical disks that share a physical disk with a disk used by the ZFS pool fs2-core-01 (which happens to be the pool that hosts our /var/mail, which is quite sensitive to the increased IO load of a resilver).

(There are variants of these directives that allow us to be more specific about things, but in practice we don't need them.)
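
To make the build process a bit more concrete, here is a rough sketch in Python of how these two directives might be applied to the global disk usage information. Everything here (the data structures, the symbolic disk names, the use of the backend as a stand-in for 'shares a physical disk') is invented for illustration; the real build system is a set of shell scripts and doesn't look like this:

  # Hypothetical sketch of the spares selection pass; all names and data
  # structures are invented for illustration.

  # Global view assembled from the per-server disk usage files:
  # symbolic disk name -> the ZFS pool using it, or None if unused.
  disk_usage = {
      "lincoln/d0": "fs8-core-01",
      "lincoln/d1": None,
      "lincoln/d2": None,
      "grant/d0": "fs2-core-01",
      "grant/d1": None,
  }

  def backend_of(disk):
      # These invented symbolic names carry the iSCSI backend as a prefix.
      return disk.split("/", 1)[0]

  # Directives from the configuration file, in order.
  directives = [
      ("fs8", "use backend", "lincoln"),
      ("all", "exclude pool", "fs2-core-01"),
  ]

  def build_spares(fileserver):
      """Return the disks that this virtual fileserver may use as spares."""
      spares = []
      for who, what, arg in directives:
          if what == "use backend" and who == fileserver:
              spares.extend(d for d, pool in disk_usage.items()
                            if backend_of(d) == arg and pool is None)
          elif what == "exclude pool" and who in ("all", fileserver):
              # The real rule excludes disks sharing a *physical* disk with
              # the pool; this sketch only has backends to work with.
              excluded = {backend_of(d) for d, pool in disk_usage.items()
                          if pool == arg}
              spares = [d for d in spares if backend_of(d) not in excluded]
      return spares

  print(build_spares("fs8"))   # ['lincoln/d1', 'lincoln/d2']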

The spares-files build process is run on a single machine from cron, normally once a day. This low frequency of automated rebuilds is generally perfectly fine because disk usage information changes only very slowly. If we're replacing a backend there is a series of steps we have to do by hand to get the spares files rebuilt promptly, but that's an exceptional circumstance.

In theory we could have put all of this initial spares selection logic straight into the spares handling program. In practice, I feel that there's a very strong reason to keep them separate (in addition to this making both aspects of the spares problem simpler). Since we want each potential spare disk to only ever be usable by a single virtual fileserver, overall spares selection is inherently a global process. Global processes should be done once and centrally, because this avoids any chance that two systems will ever do them separately and disagree over what the answer should be. If we only ever generate spare disk lists in one place, we have a strong assurance that only serious program bugs will ever cause two fileservers to think that they can use the same disk as a spare. If the fileservers did this themselves, there are all sorts of (de)synchronization issues that could cause such duplication.

(We can also post-process the output files to check that this constraint holds true.)
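
As a sketch of what that post-processing check amounts to (the spares file names and the one-disk-per-line format here are assumptions for illustration):

  # Hypothetical consistency check: no disk may appear in more than one
  # virtual fileserver's spares file.
  import sys

  def check_spares_files(paths):
      seen = {}   # disk -> the spares file it first appeared in
      ok = True
      for path in paths:
          with open(path) as f:
              for line in f:
                  disk = line.strip()
                  if not disk or disk.startswith("#"):
                      continue
                  if disk in seen and seen[disk] != path:
                      print(f"{disk} is a spare in both {seen[disk]} and {path}")
                      ok = False
                  seen.setdefault(disk, path)
      return ok

  if __name__ == "__main__":
      # e.g.: python check-spares.py fs8.spares fs9.spares ...
      sys.exit(0 if check_spares_files(sys.argv[1:]) else 1)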

solaris/ZFSOurSparesSystemIII written at 00:16:01

