Wandering Thoughts archives


A caching and zone refresh problem with Unbound

Like many people, we have internal resolving DNS servers that everyone's laptops and so on are supposed to use for their DNS. These used to run Bind and now run Unbound, mostly because OpenBSD switched which nameservers they like. Also like many people, we have a collection of internal zones and internal zone views, which are managed from a private internal master DNS server. This has led to a problem with our Unbound setup that we actually don't know how to solve.

When we make an update to internal DNS and reload the private master with it, we want this to be reflected on the resolving DNS servers essentially immediately so that people see the DNS change right away. In the days when we ran Bind on the resolving servers, this was easy; we configured the resolving Bind to be a secondary for our internal zones and set the master up with also-notify entries for it. When the master detected a zone change, it sent notifications out and the resolving Bind immediately loaded the new data. Done.

It's not clear how to achieve something like this with Unbound. Unbound doesn't listen to NOTIFY messages and do anything with them (although it's listed as a TODO item in the source code). While you can force an incredibly low TTL on DNS records so that the new DNS information will be seen within a minute or two, this setting is global; you can't apply it to just some zones, like, say, your own, and leave everything else cached as normal. In theory we could set absurdly low TTLs on everything in the internal views on the private master, which would propagate through to Unbound. In practice, how the internal views are built makes this infeasible; it would be a major change in what is a delicate tangle of a complex system.

(With low TTLs there's also the issue of cached negative entries, since we're often adding names that didn't exist before but may have been looked up to see that nope, there is no such hostname.)
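For reference, the global knobs involved look something like this in unbound.conf; both apply to everything the resolver caches, which is exactly the problem. (The option names are real Unbound settings, though cache-max-negative-ttl only exists in sufficiently modern versions; the values are illustrative.)

```
server:
    # Cap the TTL on all cached answers, for every zone everywhere:
    cache-max-ttl: 60
    # Cached negative answers (NXDOMAIN and friends) are capped
    # separately, in Unbound versions recent enough to have this:
    cache-max-negative-ttl: 60
```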

Unbound can be told to flush specific zones via unbound-control and in theory this works remotely (if you configure it explicitly, among other things). In practice I have a number of qualms about this approach, even if it's scripted, and Unbound's documentation explicitly says that flushing zones is a slow operation.
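If we did script this, the remote flushing would presumably boil down to running unbound-control against each resolver for each internal zone. A minimal sketch of the command building (the server address and zone names are made up, and it assumes remote control is already configured with keys on both ends):

```python
def flush_commands(server, zones):
    """Build the unbound-control invocations needed to make 'server'
    drop its cached data for our internal zones. Each command would
    then be run via subprocess.run(cmd, check=True)."""
    # flush_zone drops all cached data at and under each zone name.
    cmds = [["unbound-control", "-s", server, "flush_zone", zone]
            for zone in zones]
    # flush_negative additionally clears cached negative answers
    # (NXDOMAINs), which matters when we've just added names that
    # clients already looked up and failed on.
    cmds.append(["unbound-control", "-s", server, "flush_negative"])
    return cmds
```

Even scripted, this still runs one slow flush per zone per resolver, which is part of why the approach is unappealing.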

Given that Unbound explicitly doesn't support NOTIFY yet, there's probably no real good solution to this. That's sometimes how it goes.

(We have what I'm going to call a workaround that we currently feel we can live with, but I'm not going to tell you what it is because it's not really an appealing one.)

sysadmin/UnboundZoneRefreshProblem written at 00:30:58


Sudo and changes in security expectations (and user behaviors)

Sudo has a well known, even famous default behavior; if you try to use sudo and you don't have sudo privileges, it sends an email alert off to the sysadmins (sometimes these are useful). In my view, this is the sign of a fundamental assumption in sudo's security model, namely that it's only going to be used by authorized people or by malicious parties. If you're not a sysadmin or an operator or so on, you know that you have no business running sudo, so you don't. Given the assumption that unauthorized people don't innocently run sudo, it made sense to send alert email about it by default.

Once upon a time that security model was perfectly sensible, back in the days when Unix machines were big and uncommon and theoretically always run by experienced professionals. Oh, sure, maybe the odd sysadmin or operator would accidentally run sudo on the wrong machine, but you could expect that ordinary people would never touch it. Today, however, those days are over. Unix machines are small and pervasive, there are tons of people who have some involvement in sysadmin things on one, and sudo has been extremely successful. The natural result is that there are a lot of people out there who are following canned howto instructions without really thinking about them, and these instructions say to use sudo to get things done.

(Sometimes the use of sudo is embedded into an installation script or the like. The Let's Encrypt standard certbot-auto script works this way, for instance; it blithely uses sudo to do all sorts of things to your system without particularly warning you, asking for permission, or the like.)

In other words, the security model that there's basically no innocent unauthorized use of sudo is now incorrect, at least on multi-user Unix systems. There are plenty of such innocent attempts, and in some environments (such as ours) they're the dominant ones. Should this cause sudo's defaults to change? That I don't know, but the pragmatic answer is that in the grand Unix tradition, leaving the defaults unchanged is easier.
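For those who do want to change the behavior, the knobs live in sudoers. A sketch (the flag names are real sudo Defaults options; the mail address is obviously made up):

```
# /etc/sudoers -- edit with visudo.
# mail_no_user is on by default: mail when someone not listed in
# sudoers runs sudo at all. Turning it off quiets innocent attempts:
Defaults !mail_no_user
# while still mailing on more suspicious events, such as bad
# passwords from people who *are* listed:
Defaults mail_badpass
Defaults mailto = "sysadmins@example.org"
```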

(There remain Unix environments where there shouldn't be any such unauthorized uses, of course. Arguably multi-user Unix environments are less common now than such systems, where you very much do want to get emailed if, eg, the web server UID suddenly tries to run sudo.)

unix/SudoAndSecurityAssumptions written at 01:03:59


Your C compiler's optimizer can make your bad programs compile

Every so often I learn something by having something brought to my awareness. Today's version is that one surprising side effect of optimization in C compilers can be to make your program compile. The story starts with John Regehr's tweets:

@johnregehr: tonight I am annoyed by: a program that both gcc and clang compile at -O2 and neither compiles at -O0

@johnregehr: easier to arrange than you might think

void bar(void);
static int x;
int main(void) { if (x) bar(); }

(With the Fedora 23 versions of gcc 5.3.1 and clang 3.7.0, this will fail to compile at -O0 but compile at -O1 or higher.)

I had to think about this for a bit before the penny dropped and I realized where the problem was and why it happens this way. John Regehr means that this is the whole program (there are no other files), so bar() is undefined. However, x is always 0; it's initialized to 0 by the rules of C, it's not visible outside this file since it's static, and there's nothing here that creates a path to change it. With optimization, the compiler can thus turn main() into:

int main(void) { if (0) bar(); }

This makes the call to bar() unreachable, so the compiler removes it. This leaves the program with no references to bar(), so it can be linked even though bar() is not defined anywhere. Without optimization, an attempt to call bar() remains in the compiled code (even though it will never be reached in execution) and then linking fails with an error about 'undefined reference to `bar''.
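You can reproduce this with more or less any gcc or clang (a sketch; it assumes a cc on your $PATH and scratch space in /tmp):

```shell
cat > /tmp/opt-compile.c <<'EOF'
void bar(void);
static int x;
int main(void) { if (x) bar(); }
EOF
# -O0 leaves the call to bar() in the object code, so linking fails
# with an undefined reference; -O2 removes it and the link succeeds.
cc -O0 /tmp/opt-compile.c -o /tmp/opt0 2>/dev/null \
  && echo "-O0: linked (unexpected)" || echo "-O0: link failed"
cc -O2 /tmp/opt-compile.c -o /tmp/opt2 && echo "-O2: linked"
```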

(The optimized code that both gcc and clang generate boils down to 'int main(void) { return 0; }', as you'd expect. You can explore the actual code through the very handy GCC/clang compiler explorer.)

As reported on Twitter by Jed Davis, this can apparently happen accidentally in real code (in this case Firefox, with a C++ variant).

programming/COptimizerMakingProgramsCompile written at 00:15:00


Our central web server, Apache, and slow downloads

Recently, our monitoring system alerted us that our central web server wasn't responding. This has happened before and so I was able to immediately use our mod_status URL to see that yep, all of the workers were busy. This time around it wasn't from a popular but slow CGI or a single IP address; instead, a bunch of people were downloading PDFs of slides for a course here. Slowly, apparently (although the PDFs aren't all that big).

This time around I took the simple approach to deal with the problem; I increased the maximum number of workers by a bunch. This is obviously not an ideal solution, as we're using the prefork MPM and so more workers means more processes which means more memory used up (and potentially more thrashing in the kernel for various things). In our specific situation I figured that this would have relatively low impact, as worker processes that are just handling static file transfers to slow clients don't need many resources.

A high maximum workers setting is dangerous in general, though. With a different access pattern, the number of workers we have configured right now could easily kill the entire server (which would have a significant impact on a whole lot of people). Serving static files to a whole lot of slow clients is not a new problem (in fact it's a classic problem), but Apache has traditionally not had any clever way to handle it.

Modern versions of Apache have the event MPM, which its documentation claims is supposed to deal reasonably with this situation. We're using the prefork MPM, but I don't think we've carefully evaluated our choice here; it's just both the Ubuntu default and the historical behavior. We may want to reconsider this, as I don't think there's any reason we couldn't switch away from prefork (we don't enable PHP in the central web server).
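For what it's worth, the mechanical part of such a switch on Ubuntu is small. A sketch (the directives are real event MPM settings, but the numbers here are illustrative stock defaults, not tuned values):

```
# a2dismod mpm_prefork && a2enmod mpm_event, then adjust
# /etc/apache2/mods-available/mpm_event.conf along these lines:
<IfModule mpm_event_module>
    StartServers              2
    MinSpareThreads          25
    MaxSpareThreads          75
    ThreadsPerChild          25
    MaxRequestWorkers       400
    MaxConnectionsPerChild    0
</IfModule>
```

With threads instead of processes, a given MaxRequestWorkers costs far less memory than the same worker count under prefork.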

(Per this serverfault question and answer, in the ever increasing world of HTTPS the event MPM is basically equivalent to the worker MPM. However, our central web server is still mostly HTTP, so could benefit here, and even the worker MPM is apparently better than prefork for avoiding memory load and so on.)

PS: There are potential interactions between NFS IO stalls and the choice of MPM here, but in practice in our environment any substantial NFS problem rapidly causes the web server to grind to a halt in general. If it grinds to a halt a little bit faster with the event or worker MPM than with the prefork one, this is not a big deal.

web/ApacheDownloadOverloadIssue written at 01:33:31


How we do MIME attachment type logging with Exim

Last time around I talked about the options you have for how to log attachment information in an Exim environment. Out of our possible choices, we opted to do attachment logging using an external program that's run through Exim's MIME ACL, and to report the result to syslog in the program. All of this is essentially the least-effort choice. Exim parses MIME for us, and having the program do the logging means that it gets to make the decisions about just what to log.

However, the details are worth talking about, so let's start with the actual MIME ACL stanza we use:

warn
  # used only for side effects
  # only act on potentially interesting parts
  condition = ${if or { \
     {and{{def:mime_content_disposition}{!eq{$mime_content_disposition}{inline}}}} \
     {match{$mime_content_type}{\N^(application|audio|video|text/xml|text/vnd)\N}} \
    } }
  decode = default
  # set a dummy variable to get ${run} executed
  set acl_m1_astatus = ${run {/etc/exim4/alogger/alogger.py \
     --subject ${quote:$header_subject:} \
     --csdnsbl ${quote:$header_x-cs-dnsbl:} \
     $message_exim_id \
     ${quote:$mime_content_type} \
     ${quote:$mime_content_disposition} \
     ${quote:$mime_filename} \
     ${quote:$mime_decoded_filename} }}

(See my discussion of quoting for ${run} for what's happening here.)

The initial 'condition =' is an attempt to only run our external program (and write decoded MIME parts out to disk) for MIME parts that are likely to be interesting. Guessing what is an attachment is complicated and the program makes the final decision, but we can pre-screen some things. The parts we consider interesting are any MIME parts that explicitly declare themselves as non-inline, plus any inline MIME parts that have a Content-Type that's not really an inline thing.

There is one complication here, which is our check that $mime_content_disposition is defined. You might think that there's always going to be some content-disposition, but it turns out that when Exim says the MIME ACL is invoked on every MIME part it really means every part. Specifically, the MIME ACL is also invoked on the message body in a MIME email that is not a multipart (just, eg, a text/plain or text/html message). These single-part MIME messages can be detected because they don't have a defined content-disposition; we consider this to basically be an implicit 'inline' disposition and thus not interesting by itself.

The entire warn stanza exists purely to cause the ${run} to execute (this is a standard ACL trick; warn stanzas are often used purely for their side effects, with no actual warning involved). The easiest way to get that to happen is to (nominally) set the value of an ACL variable, as we do here. Setting an ACL variable makes Exim do string expansion in a harmless context that we can basically make into a no-op, which is what we need here.

(Setting a random ACL variable to cause string expansion to be done for its side effects is a useful Exim pattern in general. Just remember to add a comment saying it's deliberate that this ACL variable is never used.)

The actual attachment logger program is written in Python because basically the moment I started writing it, it got too complicated to be a shell script. It looks at the content type, the content disposition, and any claimed MIME filename in order to decide whether this part should be logged about or ignored (using the set of heuristics I outlined here). It uses the decoded content to sniff for ZIP and RAR archives and get their filenames (slightly recursively). We could have run more external programs for this, but it turns out that there are handy Python modules (eg the zipfile module) that will do the work for us. Working in pure Python probably doesn't perform as well as some of the alternatives, but it works well enough for us with our current load.
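Our actual logger isn't public, but the ZIP-sniffing part is simple enough to sketch with the standard zipfile module. (The function name and the exact heuristics here are mine for illustration, not the production code's; real sniffing would also handle RAR and recurse into nested archives.)

```python
import io
import os
import zipfile

def zip_member_extensions(data):
    """If 'data' looks like a ZIP archive, return the set of file
    extensions of its members; otherwise return None. Per our logging
    principles, we care about extensions, not full filenames."""
    # Local file header magic for a ZIP archive.
    if not data.startswith(b"PK\x03\x04"):
        return None
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return {os.path.splitext(name)[1].lower()
                    for name in zf.namelist()
                    if not name.endswith("/")}
    except zipfile.BadZipFile:
        return None
```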

(In accord with my general principles, the program is careful to minimize the information it logs. For instance, we log only information about extensions, not filenames.)

The program is also passed the contents of some of the email headers so that it can add important information from them to the log message. Our anti-spam system adds a spam or virus marker to the Subject: header for recognized bad stuff, so we look for that marker and log if the attachment is part of a message scored that way. This is important for telling apart file types in real email that users actually care about from file types in spam that users probably don't.

(We've found it useful to log attachment type information on inbound email both before and after it passes through our anti-spam system. The 'before' view gives us a picture of what things look like before virus attachment stripping and various rejections happen, while the 'after' view is what our users actually might see in their mailboxes, depending on how they filter things marked as spam.)

Sidebar: When dummy variables aren't

I'll admit it: our attachment logger program prints out a copy of what it logs and our actual configuration uses $acl_m1_astatus later, which winds up containing this copy. We currently immediately reject all messages with ZIP files with .exes in them, and rather than parse MIME parts twice it made more sense to reuse the attachment logger's work by just pattern-matching its output.

sysadmin/EximOurAttachmentLogging written at 00:51:59


Why Python can't have a full equivalent of Go's gofmt

I mentioned in passing here that people are working on Python equivalents of Go's gofmt and since then I've played around a bit with yapf, which was the most well developed one that I could find. Playing around with yapf (and thinking more about how to deal with my Python autoindent problem) brought home a realization, which is that Python fundamentally can't have a full, true equivalent of gofmt.

In Go, you can be totally sloppy in your pre-gofmt code; basically anything goes. Specifically, you don't need to indent your Go code in any particular way or even at all. If you're banging out a quick modification to some Go code, you can just stuff it in with either completely contradictory indentation or no indentation at all. More broadly, you can easily survive an editor with malfunctioning auto-indentation for Go code. Sure, your code will look ugly before you gofmt it, but it'll still work.

With Python, you can't be this free and casual. Since indentation is semantically meaningful, you must get the indentation correct right from the start; you can't leave it out or be inconsistent. A Python equivalent of gofmt can change the indentation level you use (and change some aspects of indentation style), but it can't add indentation for you in the way that gofmt does. This means that malfunctioning editor auto-indent is quite a bit more damaging (as is not having it at all); since indentation is not optional, you must correct or add it by hand, all the time. In Python, either you or your editor are forced to be less sloppy than you can be in Go.

(Sure, Go requires that you put in the { and } to denote block start and end, but those are easy and fast compared to getting indentation correct.)

Of course, you can start out with minimal, fast to create indentation; Python will let you do one or two space indents if you really want. But once you run yapf on your initial code, in many cases you're going to be stuck matching it for code changes. Python will tolerate a certain amount of indentation style mismatches, but not too much (Python 3 is less relaxed here than Python 2). Also, I'm confident that I don't know just how much sloppiness one can get away with here, so in practice I think most people are going to be matching the existing indentation even if they don't strictly have to. I know that I will be.
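A concrete illustration of how little slop Python 3 tolerates (this is plain standard Python, no yapf involved):

```python
# Mixing tabs and spaces at the same block level is an outright
# error in Python 3, where Python 2 silently assumed 8-space tabs:
mixed = "if True:\n\tx = 1\n        y = 2\n"
try:
    compile(mixed, "<example>", "exec")
    error = None
except TabError as exc:
    error = type(exc).__name__
print(error)  # TabError

# But different indent widths in *separate* blocks are still fine:
consistent = "if True:\n  a = 1\nif True:\n        b = 2\n"
compile(consistent, "<example>", "exec")  # no complaint
```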

I hadn't thought about this asymmetry before my editor of choice started not getting my Python auto-indentation quite right, but it's now rather more on my mind.

python/PythonNoFullGofmt written at 00:02:48


Some options for logging attachment information in an Exim environment

Suppose, not entirely hypothetically, that you use Exim as your mailer and you would like to log information about the attachments your users get sent. There are a number of different ways in Exim that you can do this, each with their own drawbacks and advantages. As a simplifying measure, let's assume that you want to do this during the SMTP conversation so that you can potentially reject messages with undesirable attachments (say ZIP files with Windows executables in them).

The first decision to make is whether you will scan and analyze the entire message in your own separate code, or let Exim break the message up into various MIME parts and look at them one-by-one. Examining the entire message at once means that you can log full information about its structure in one place, but it also means that you're doing all of the MIME processing yourself. The natural place to take a look at the whole message is with Exim's anti-virus content-scanning system; you would hook into it in a way similar to how we hooked our milter-based spam rejection into Exim.

(You'll want to use a warn stanza to just cause the scanner to run, and maybe to give you some stuff that you'll get Exim to log with the log_message ACL directive.)

If you want to let Exim section the message up into various different MIME parts for you, then you want a MIME ACL (covered in the Content scanning at ACL time chapter of the documentation). At this point you have another decision to make, which is whether you want to run an external program to analyze the MIME part or whether to rely only on Exim. The advantage of doing things entirely inside Exim is that Exim doesn't have to decode the MIME part to a file for your external program (and then run an outside program for each MIME part); the disadvantage is that you can only log MIME part information and can't do things like spot suspicious attempts to conceal ZIP files.

Mechanically, having Exim do it all means you'd just have a warn stanza in your MIME ACL that logged information like $mime_content_disposition, $mime_content_type, $mime_filename or its extension, and so on, using log_message =. You wouldn't normally use decode = because you have little use for decoding the part to a file unless you're going to have an outside program look at it. If you wanted to run a program against MIME parts, you'd use decode = default and then run the program with $mime_decoded_filename and possibly other arguments via ${run} in, for example, a 'set acl_m1_blah = ...' line.

(There are some pragmatic issues here that I'm deferring to another entry.)

Allowing Exim to section the message up for you is easier in many ways, but has two drawbacks. First, Exim doesn't really provide any way to get the MIME structure of the message, because you just get a stream of parts; you don't necessarily see, for example, how things are nested. The second is that processing things part by part obviously makes it harder to log all the information about a message's file types in a single line; the natural way is to log a separate line for each part, as you process it.

Speaking of logging, if you're running an external program (either for the entire message or for each MIME part) you need to decide whether your program will do the logging or whether you're going to have the program pass information back to Exim and have Exim log it. Passing information back to Exim is more work but means that you'll see your attachment information along with the other log lines for the message. Logging to a place like syslog may make the information more conveniently visible and it's generally going to be easier.

Sidebar: Exim's MIME parsing versus yours

Exim's MIME parsing is in C and is presumably done on an in-place version of the message that Exim already has on disk. It thus should be quite efficient (until you start decoding parts) and hopefully reasonably security hardened. Parsing a message's MIME structure yourself means relying on the speed, quality, resilience against broken MIME messages, and security of whatever code either you write or your language of choice already has for MIME parsing, and it requires Exim to reconstitute a full copy of the message for you.

My experience with Python's standard MIME parsing module was that it's at least somewhat fragile against malformed input. This isn't a security risk (it's Python), but it did mean that my code wound up spending a bunch of time recovering from MIME parsing explosions and trying to extract some information from the mess anyways. I wouldn't be surprised if other languages had standard packages that assumed well-formed input and threw errors otherwise (and it's hard to blame them; dealing with malformed MIME messages is a specialized need).
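To illustrate the flavor of this with Python's standard email package: the parser doesn't throw on malformed structure, it records defects and soldiers on, and the work is in deciding what to do with what it couldn't parse. (The broken message here is a deliberately minimal example of mine.)

```python
import email

# A message that claims to be multipart but whose declared
# boundary never actually appears in the body.
raw = (b"From: someone@example.com\n"
       b"Content-Type: multipart/mixed; boundary=\"XYZ\"\n"
       b"\n"
       b"there is no --XYZ boundary line anywhere in this body\n")

msg = email.message_from_bytes(raw)
# Parsing succeeds; the problems are recorded on msg.defects
# (typically a StartBoundaryNotFoundDefect here) rather than raised.
defects = [type(d).__name__ for d in msg.defects]
print(defects)
```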

(Admittedly I don't know how well Exim itself deals with malformed MIME messages and MIME parts. Hopefully it parses them as much as possible, but it may just throw up its hands and punt.)

sysadmin/EximAttachmentLoggingOptions written at 01:07:41


How Exim's ${run ...} string expansion operator does quoting

Exim has a complicated string expansion system with various expansion operations. One of these is ${run}, which runs a command to get its output (or just its exit status if you only care about that). The documentation for ${run} says, in part:

${run{<command> <args>}{<string1>}{<string2>}}
The command and its arguments are first expanded as one string. The string is split apart into individual arguments by spaces, [...]

Since the arguments are split by spaces, when there is a variable expansion which has an empty result, it will cause the situation that the argument will simply be omitted when the program is actually executed by Exim. If the script/program requires a specific number of arguments and the expanded variable could possibly result in this empty expansion, the variable must be quoted. [...]

What this documentation does not say is just how the command line is supposed to be quoted. For reasons to be covered later I have recently become extremely interested in this question, so I now have some answers.

The short answer is that the command is interpreted in the same way as it is in the pipe transport. Specifically:

Unquoted arguments are delimited by white space. If an argument appears in double quotes, backslash is interpreted as an escape character in the usual way. If an argument appears in single quotes, no escaping is done.

The usual way that backslashed escape sequences are handled is covered in character escape sequences in expanded strings.

Although the documentation for ${run} suggests using the sg operator to substitute dangerous characters, it appears that the much better approach is to use the quote operator instead. Using quote is simple and will allow you to pass through arguments unchanged, instead of either mangling characters with sg or doing complicated insertions of backslashes and so on. Note that this 'passing through unchanged' will include passing through literal newlines, which may be something you have to guard against in the command you're running. In fact, it appears that almost any time you're putting an Exim variable into a ${run} command line you should slap a ${quote:...} around it. Maybe the variable can't have whitespace or other dangerous things in it, but why take the chance?
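Put together, the splitting rules amount to something like the following sketch. This is a Python re-implementation of the documented behavior, not Exim's actual C code, and it simplifies backslash escapes to 'take the next character literally' where real Exim also expands sequences like \n:

```python
def exim_split(cmdline):
    """Split an already-expanded ${run} command line the documented
    way: arguments are delimited by whitespace, double quotes honour
    backslash escapes, single quotes are taken verbatim."""
    args = []
    i, n = 0, len(cmdline)
    while i < n:
        # skip the whitespace between arguments
        while i < n and cmdline[i].isspace():
            i += 1
        if i >= n:
            break
        cur = []
        while i < n and not cmdline[i].isspace():
            c = cmdline[i]
            if c == "'":                      # single quotes: no escaping
                i += 1
                while i < n and cmdline[i] != "'":
                    cur.append(cmdline[i])
                    i += 1
                i += 1                        # skip the closing quote
            elif c == '"':                    # double quotes: backslash escapes
                i += 1
                while i < n and cmdline[i] != '"':
                    if cmdline[i] == "\\" and i + 1 < n:
                        i += 1
                    cur.append(cmdline[i])
                    i += 1
                i += 1
            else:
                cur.append(c)
                i += 1
        args.append("".join(cur))
    return args
```

This also shows the vanishing-argument hazard from the documentation: an unquoted variable that expands to nothing simply drops out of the argument list, while a ${quote:}-ed one survives as an empty argument.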

(I suspect that the ${run} documentation was written at a time that quote didn't exist, but I haven't checked this.)

This documentation situation is less than ideal, to put it one way. It's possible that you can work all of this out without reading the Exim source code if you read all of the documentation at once and can hold it all in your head, but that's often not how documentation is used; instead it gets consulted as a sporadic reference. The ${run} writeup should at least have pointers to the sections with specific information on quoting, and ideally would have at least a brief inline discussion of its quoting rules.

(I also believe that the rules surrounding ${run}'s handling of argument expansion are dangerous and wrong, but it's too late to fix them now. See this entry and also this one.)

Sidebar: where in the Exim source this is

Since I had to read the Exim source to get my answer, I might as well note down where I found things.

${run} itself is handled in the EITEM_RUN case in expand_string_internal in expand.c. The actual command handling is done by calling transport_set_up_command, which is in transport.c. This handles single quotes itself in inline code but defers double quote handling to string_dequote in string.c, which calls string_interpret_escape to handle backslashed escape sequences.

(It looks like transport_set_up_command is called by various different things under various circumstances that I'm not going to try to decode the Exim source code to nail down.)

sysadmin/EximRunAndQuoting written at 00:54:14


Some notes on UID and GID remapping in the Illumos/OmniOS NFS server

As part of looking into this whole issue, I recently wound up reading the current manpages for the OmniOS NFS server and thus discovered that it can remap UIDs and GIDs for clients via the uidmap= and gidmap= NFS share options. Server side NFS ID remapping is not as neat or scalable as client side remapping, but it does solve my particular problem, so I've now played around with it a bit and have some notes.

The feature itself is an Illumos one and is about two years old now, so it's probably been integrated into most Illumos-based releases that you want to use (even though we mostly don't update, you really do want to do so every so often). It's certainly in OmniOS r151014, which is what we use on our fileservers.

The share_nfs manpage describes mappings as [clnt]:[srv]:access_list. Despite its name, the access_list bit is just for matching the client; it doesn't create or change any NFS mount access permissions, which are still set through rw= and so on. You can also use a different mechanism in each place for identifying clients, say a netgroup for filesystem access and then a hostname to identify the client for remapping (which might be handy if using a netgroup has side effects).
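Concretely, a share using this might look like the following (the netgroup, hostname, IDs, and path are all made up; the option syntax is from the share_nfs manpage):

```
# Map client UID/GID 501 to server UID/GID 1001 for NFS requests
# from one specific client; rw= still controls actual access.
share -F nfs \
  -o rw=@ourhosts,uidmap=501:1001:nfsclient.example.com,gidmap=501:1001:nfsclient.example.com \
  /export/home
```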

The manpage also describes uidmap (and gidmap) as 'remapping the user ID (uid) in the incoming request to some other uid'. This is a completely accurate description in that the server does not remap IDs in replies, such as in the results from stat() system calls. For example, if you remap your client UID to your server UID, 'ls -l' will show that your files are owned by the server-side UID, not you on the client. This is potentially confusing in general and will probably cause anything that does client-side UID or GID checking to incorrectly reject you.

(This design decision is probably due to the fact that the UID and GID mapping is not necessarily 1:1, either on the server or for that matter on the client. And yes, I can imagine merely somewhat perverse client side uses of mapping a second local UID to a server UID that also exists on the client.)

Note that in general UID remapping is probably more important than GID remapping, since you can always force a purely server side group list (as far as I know, the NFS server's own group lookup entirely overwrites the group list from the client).

PS: I don't know how well this scales on any level. Since all of the mappings have to be specified as share options, I expect that this won't really be very pleasant to deal with if you're trying to do a lot of remapping (either many IDs for some clients or many clients with a few ID remappings).

solaris/NFSServerUIDRemapping written at 01:27:04


Keeping around an index to the disk bays on our iSCSI backends

Today, for the first time in a year or perhaps more, we had a disk failure in one of our iSCSI backends (or at least we detected it today). As part of discovering that I was out of practice at dealing with this, I wound up having to hunt around to find our documentation on how iSCSI disk identifiers mapped to drive bays in our 16-bay Supermicro cases. This made me realize that probably we should do better at this.

The gold standard would be to label the actual drive sleds themselves with what iSCSI disk they are, but there are two problems with this. First, there's just not much good space on the front of each drive sled. Second and more severe, the actual assignment isn't tied to the HD's serial number or anything to do with the drive sled, but to what bay the drive sled is in. Labeling the drive sleds thus has the same problem as comments in code, and we must be absolutely certain that each drive sled is put in the proper bay or has its label removed or updated if it's moved. I'm not quite willing to completely believe this any more than I ever completely believe code comments, and that means we're always going to need to double-check.

While we have documentation (as mentioned), what we don't have is a printed out version of that documentation in our machine room. The whole thing fits on a single printed page, so there are plenty of obvious places it could go; probably the best place is on the exposed side of one of the fileserver racks. The heart of the documentation is a little chart, so we could cut out a few copies and put them up on the front of some of the racks that hold the fileservers and iSCSI backends. That would make the full documentation accessible when we're in the machine room, and keep the most important part visible when we're in front of the iSCSI backends about to pull a disk.

This isn't the only bit of fileserver and iSCSI backend information that would be handy to have immediately accessible in paper form in the machine room, either. I think I have some thinking, some planning, and some printing out to do in my future.

(We used to keep printed documentation about some things in the machine room, but then time passed, it got out of date or irrelevant, and we removed the old racks that it had previously been attached to. And we're a lot less likely to need things like 'this is theoretically the order to turn things on after a total power failure' than more routine documentation like 'what disk maps to what drive bay'.)

sysadmin/DriveChassisBayLabels written at 22:51:43
