Wandering Thoughts archives

2016-07-14

Your C compiler's optimizer can make your bad programs compile

Every so often I learn something simply by having it brought to my attention. Today's version is that one surprising side effect of optimization in C compilers can be to make your program compile. The story starts with John Regehr's tweets:

@johnregehr: tonight I am annoyed by: a program that both gcc and clang compile at -O2 and neither compiles at -O0

@johnregehr: easier to arrange than you might think

void bar(void);
static int x;
int main(void) { if (x) bar(); }

(With the Fedora 23 versions of gcc 5.3.1 and clang 3.7.0, this will fail to compile at -O0 but compile at -O1 or higher.)

I had to think about this for a bit before the penny dropped and I realized where the problem was and why it happens this way. John Regehr means that this is the whole program (there are no other files), so bar() is undefined. However, x is always 0; it's initialized to 0 by the rules of C, it's not visible outside this file since it's static, and there's nothing here that creates a path to change it. With optimization, the compiler can thus turn main() into:

int main(void) { if (0) bar(); }

This makes the call to bar() unreachable, so the compiler removes it. This leaves the program with no references to bar(), so it can be linked even though bar() is not defined anywhere. Without optimization, an attempt to call bar() remains in the compiled code (even though it will never be reached in execution) and then linking fails with an error about 'undefined reference to `bar''.
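A quick way to see this for yourself, assuming a gcc- or clang-style compiler is available as cc (the demo.c filename is just for illustration):

```shell
# Reproduce the effect: the same program fails to link at -O0
# but links fine at -O2, because the dead call to bar() is removed.
command -v cc >/dev/null 2>&1 || exit 0
cat > demo.c <<'EOF'
void bar(void);
static int x;
int main(void) { if (x) bar(); }
EOF
cc -O0 -o demo0 demo.c 2>/dev/null || echo "O0: link failed (undefined bar)"
cc -O2 -o demo2 demo.c && echo "O2: linked fine"
```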

(The optimized code that both gcc and clang generate boils down to 'int main(void) { return 0; }', as you'd expect. You can explore the actual code through the very handy GCC/clang compiler explorer.)

As reported on Twitter by Jed Davis, this can apparently happen accidentally in real code (in this case Firefox, with a C++ variant).

programming/COptimizerMakingProgramsCompile written at 00:15:00

2016-07-13

Our central web server, Apache, and slow downloads

Recently, our monitoring system alerted us that our central web server wasn't responding. This has happened before and so I was able to immediately use our mod_status URL to see that yep, all of the workers were busy. This time around it wasn't from a popular but slow CGI or a single IP address; instead, a bunch of people were downloading PDFs of slides for a course here. Slowly, apparently (although the PDFs aren't all that big).

This time around I took the simple approach to deal with the problem; I increased the maximum number of workers by a bunch. This is obviously not an ideal solution, as we're using the prefork MPM and so more workers means more processes which means more memory used up (and potentially more thrashing in the kernel for various things). In our specific situation I figured that this would have relatively low impact, as worker processes that are just handling static file transfers to slow clients don't need many resources.
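Concretely, the change amounts to bumping the prefork MPM's limits, something like this (the numbers here are illustrative, not our actual settings):

```apache
# Illustrative prefork MPM settings; real values are site-specific.
# ServerLimit must be raised along with MaxRequestWorkers.
<IfModule mpm_prefork_module>
    StartServers          10
    MinSpareServers       10
    MaxSpareServers       30
    ServerLimit          512
    MaxRequestWorkers    512
</IfModule>
```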

A high maximum workers setting is dangerous in general, though. With a different access pattern, the number of workers we have configured right now could easily kill the entire server (which would have a significant impact on a whole lot of people). Serving static files to a whole lot of slow clients is not a new problem (in fact it's a classic problem), but Apache has traditionally not had any clever way to handle it.

Modern versions of Apache have the event MPM, which its documentation claims is supposed to deal reasonably with this situation. We're using the prefork MPM, but I don't think we've carefully evaluated our choice here; it's just both the Ubuntu default and the historical behavior. We may want to reconsider this, as I don't think there's any reason we couldn't switch away from prefork (we don't enable PHP in the central web server).

(Per this serverfault question and answer, in the ever-increasing world of HTTPS the event MPM is basically equivalent to the worker MPM. However, our central web server is still mostly HTTP, so it could benefit here, and even the worker MPM is apparently better than prefork for avoiding memory load and so on.)

PS: There are potential interactions between NFS IO stalls and the choice of MPM here, but in practice in our environment any substantial NFS problem rapidly causes the web server to grind to a halt in general. If it grinds to a halt a little bit faster with the event or worker MPM than with the prefork one, this is not a big deal.

web/ApacheDownloadOverloadIssue written at 01:33:31

2016-07-12

How we do MIME attachment type logging with Exim

Last time around I talked about the options you have for how to log attachment information in an Exim environment. Out of our possible choices, we opted to do attachment logging using an external program that's run through Exim's MIME ACL, and to report the result to syslog in the program. All of this is essentially the least-effort choice. Exim parses MIME for us, and having the program do the logging means that it gets to make the decisions about just what to log.

However, the details are worth talking about, so let's start with the actual MIME ACL stanza we use:

# used only for side effects
warn
  # only act on potentially interesting parts
  condition = ${if or { \
     {and{{def:mime_content_disposition}{!eq{$mime_content_disposition}{inline}}}} \
     {match{$mime_content_type}{\N^(application|audio|video|text/xml|text/vnd)\N}} \
    } }
  # decode the part to disk so the program below can examine it
  decode = default
  # set a dummy variable to get ${run} executed
  set acl_m1_astatus = ${run {/etc/exim4/alogger/alogger.py \
     --subject ${quote:$header_subject:} \
     --csdnsbl ${quote:$header_x-cs-dnsbl:} \
     $message_exim_id \
     ${quote:$mime_content_type} \
     ${quote:$mime_content_disposition} \
     ${quote:$mime_filename} \
     ${quote:$mime_decoded_filename} }}

(See my discussion of quoting for ${run} for what's happening here.)

The initial 'condition =' is an attempt to only run our external program (and write decoded MIME parts out to disk) for MIME parts that are likely to be interesting. Guessing what is an attachment is complicated and the program makes the final decision, but we can pre-screen some things. The parts we consider interesting are any MIME parts that explicitly declare themselves as non-inline, plus any inline MIME parts that have a Content-Type that's not really an inline thing.

There is one complication here, which is our check that $mime_content_disposition is defined. You might think that there's always going to be some content-disposition, but it turns out that when Exim says the MIME ACL is invoked on every MIME part it really means every part. Specifically, the MIME ACL is also invoked on the message body in a MIME email that is not a multipart (just, eg, a text/plain or text/html message). These single-part MIME messages can be detected because they don't have a defined content-disposition; we consider this to basically be an implicit 'inline' disposition and thus not interesting by itself.

The entire warn stanza exists purely to cause the ${run} to execute (this is a standard ACL trick; warn stanzas are often used just as a place to put ACL verbs). The easiest way to get that to happen is to (nominally) set the value of an ACL variable, as we do here. Setting an ACL variable makes Exim do string expansion in a harmless context that we can basically make into a no-op, which is what we need here.

(Setting a random ACL variable to cause string expansion to be done for its side effects is a useful Exim pattern in general. Just remember to add a comment saying it's deliberate that this ACL variable is never used.)

The actual attachment logger program is written in Python because basically the moment I started writing it, it got too complicated to be a shell script. It looks at the content type, the content disposition, and any claimed MIME filename in order to decide whether this part should be logged about or ignored (using the set of heuristics I outlined here). It uses the decoded content to sniff for ZIP and RAR archives and get their filenames (slightly recursively). We could have run more external programs for this, but it turns out that there are handy Python modules (eg the zipfile module) that will do the work for us. Working in pure Python probably doesn't perform as well as some of the alternatives, but it works well enough for us with our current load.
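The ZIP-sniffing part is straightforward with the standard library. This is a sketch of the idea rather than our actual logger code (the function name is mine, not the program's):

```python
import os
import zipfile

def zip_member_extensions(path):
    """If the decoded MIME part at 'path' is really a ZIP archive,
    return the lowercased extensions of its members; otherwise None.
    A sketch of the content sniffing described above."""
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as zf:
        # Skip directory entries; collect each member's extension once.
        return sorted({os.path.splitext(name)[1].lower()
                       for name in zf.namelist()
                       if not name.endswith("/")})
```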

(In accord with my general principles, the program is careful to minimize the information it logs. For instance, we log only information about extensions, not filenames.)

The program is also passed the contents of some of the email headers so that it can add important information from them to the log message. Our anti-spam system adds a spam or virus marker to the Subject: header for recognized bad stuff, so we look for that marker and log if the attachment is part of a message scored that way. This is important for telling apart file types in real email that users actually care about from file types in spam that users probably don't.

(We've found it useful to log attachment type information on inbound email both before and after it passes through our anti-spam system. The 'before' view gives us a picture of what things look like before virus attachment stripping and various rejections happen, while the 'after' view is what our users actually might see in their mailboxes, depending on how they filter things marked as spam.)

Sidebar: When dummy variables aren't

I'll admit it: our attachment logger program prints out a copy of what it logs and our actual configuration uses $acl_m1_astatus later, which winds up containing this copy. We currently immediately reject all messages with ZIP files with .exes in them, and rather than parse MIME parts twice it made more sense to reuse the attachment logger's work by just pattern-matching its output.

sysadmin/EximOurAttachmentLogging written at 00:51:59

2016-07-11

Why Python can't have a full equivalent of Go's gofmt

I mentioned in passing here that people are working on Python equivalents of Go's gofmt and since then I've played around a bit with yapf, which was the most well developed one that I could find. Playing around with yapf (and thinking more about how to deal with my Python autoindent problem) brought home a realization, which is that Python fundamentally can't have a full, true equivalent of gofmt.

In Go, you can be totally sloppy in your pre-gofmt code; basically anything goes. Specifically, you don't need to indent your Go code in any particular way or even at all. If you're banging out a quick modification to some Go code, you can just stuff it in with either completely contradictory indentation or no indentation at all. More broadly, you can easily survive an editor with malfunctioning auto-indentation for Go code. Sure, your code will look ugly before you gofmt it, but it'll still work.

With Python, you can't be this free and casual. Since indentation is semantically meaningful, you must get the indentation correct right from the start; you can't leave it out or be inconsistent. A Python equivalent of gofmt can change the indentation level you use (and change some aspects of indentation style), but it can't add indentation for you in the way that gofmt does. This means that malfunctioning editor auto-indent is quite a bit more damaging (as is not having it at all); since indentation is not optional, you must correct or add it by hand, all the time. In Python, either you or your editor are forced to be less sloppy than you can be in Go.

(Sure, Go requires that you put in the { and } to denote block start and end, but those are easy and fast compared to getting indentation correct.)
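You can see the asymmetry directly: Python rejects unindented code at parse time, so there is nothing left for a formatter to fix afterwards. A small demonstration:

```python
# A Go formatter sees a parseable (if ugly) program; Python never
# gets that far, because missing indentation is a parse error.
source = "if True:\nprint('hi')\n"
try:
    compile(source, "<demo>", "exec")
    print("compiled")
except IndentationError as exc:
    print("rejected:", exc.msg)
```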

Of course, you can start out with minimal, fast to create indentation; Python will let you do one or two space indents if you really want. But once you run yapf on your initial code, in many cases you're going to be stuck matching it for code changes. Python will tolerate a certain amount of indentation style mismatches, but not too much (Python 3 is less relaxed here than Python 2). Also, I'm confident that I don't know just how much sloppiness one can get away with here, so in practice I think most people are going to be matching the existing indentation even if they don't strictly have to. I know that I will be.

I hadn't thought about this asymmetry before my editor of choice started not getting my Python auto-indentation quite right, but it's now rather more on my mind.

python/PythonNoFullGofmt written at 00:02:48

2016-07-10

Some options for logging attachment information in an Exim environment

Suppose, not entirely hypothetically, that you use Exim as your mailer and you would like to log information about the attachments your users get sent. There are a number of different ways in Exim that you can do this, each with their own drawbacks and advantages. As a simplifying measure, let's assume that you want to do this during the SMTP conversation so that you can potentially reject messages with undesirable attachments (say ZIP files with Windows executables in them).

The first decision to make is whether you will scan and analyze the entire message in your own separate code, or let Exim break the message up into various MIME parts and look at them one-by-one. Examining the entire message at once means that you can log full information about its structure in one place, but it also means that you're doing all of the MIME processing yourself. The natural place to take a look at the whole message is with Exim's anti-virus content-scanning system; you would hook into it in a way similar to how we hooked our milter-based spam rejection into Exim.

(You'll want to use a warn stanza to just cause the scanner to run, and maybe to give you some stuff that you'll get Exim to log with the log_message ACL directive.)

If you want to let Exim section the message up into various different MIME parts for you, then you want a MIME ACL (covered in the Content scanning at ACL time chapter of the documentation). At this point you have another decision to make, which is whether you want to run an external program to analyze the MIME part or whether to rely only on Exim. The advantage of doing things entirely inside Exim is that Exim doesn't have to decode the MIME part to a file for your external program (and then run an outside program for each MIME part); the disadvantage is that you can only log MIME part information and can't do things like spot suspicious attempts to conceal ZIP files.

Mechanically, having Exim do it all means you'd just have a warn stanza in your MIME ACL that logged information like $mime_content_disposition, $mime_content_type, $mime_filename or its extension, and so on, using log_message =. You wouldn't normally use decode = because you have little use for decoding the part to a file unless you're going to have an outside program look at it. If you wanted to run a program against MIME parts, you'd use decode = default and then run the program with $mime_decoded_filename and possibly other arguments via ${run} in, for example, a 'set acl_m1_blah = ...' line.
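A log-only warn stanza of this sort might look like the following (a sketch, not a tested production configuration; which variables you log is up to you):

```
warn
  log_message = mime part: ct=$mime_content_type \
                cd=$mime_content_disposition fn=$mime_filename
```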

(There are some pragmatic issues here that I'm deferring to another entry.)

Allowing Exim to section the message up for you is easier in many ways, but has two drawbacks. First, Exim doesn't really provide any way to get the MIME structure of the message, because you just get a stream of parts; you don't necessarily see, for example, how things are nested. The second is that processing things part by part obviously makes it harder to log all the information about a message's file types in a single line; the natural way is to log a separate line for each part, as you process it.

Speaking of logging, if you're running an external program (either for the entire message or for each MIME part) you need to decide whether your program will do the logging or whether you're going to have the program pass information back to Exim and have Exim log it. Passing information back to Exim is more work but means that you'll see your attachment information along with the other log lines for the message. Logging to a place like syslog may make the information more conveniently visible and it's generally going to be easier.

Sidebar: Exim's MIME parsing versus yours

Exim's MIME parsing is in C and is presumably done on an in-place version of the message that Exim already has on disk. It thus should be quite efficient (until you start decoding parts) and hopefully reasonably security hardened. Parsing a message's MIME structure yourself means relying on the speed, quality, resilience against broken MIME messages, and security of whatever code either you write or your language of choice already has for MIME parsing, and it requires Exim to reconstitute a full copy of the message for you.

My experience with Python's standard MIME parsing module was that it's at least somewhat fragile against malformed input. This isn't a security risk (it's Python), but it did mean that my code wound up spending a bunch of time recovering from MIME parsing explosions and trying to extract some information from the mess anyways. I wouldn't be surprised if other languages had standard packages that assumed well-formed input and threw errors otherwise (and it's hard to blame them; dealing with malformed MIME messages is a specialized need).

(Admittedly I don't know how well Exim itself deals with malformed MIME messages and MIME parts. Hopefully it parses them as much as possible, but it may just throw up its hands and punt.)

sysadmin/EximAttachmentLoggingOptions written at 01:07:41

2016-07-09

How Exim's ${run ...} string expansion operator does quoting

Exim has a complicated string expansion system with various expansion operations. One of these is ${run}, which runs a command to get its output (or just its exit status if you only care about that). The documentation for ${run} says, in part:

${run{<command> <args>}{<string1>}{<string2>}}
The command and its arguments are first expanded as one string. The string is split apart into individual arguments by spaces, [...]

Since the arguments are split by spaces, when there is a variable expansion which has an empty result, it will cause the situation that the argument will simply be omitted when the program is actually executed by Exim. If the script/program requires a specific number of arguments and the expanded variable could possibly result in this empty expansion, the variable must be quoted. [...]

What this documentation does not say is just how the command line is supposed to be quoted. For reasons to be covered later I have recently become extremely interested in this question, so I now have some answers.

The short answer is that the command is interpreted in the same way as it is in the pipe transport. Specifically:

Unquoted arguments are delimited by white space. If an argument appears in double quotes, backslash is interpreted as an escape character in the usual way. If an argument appears in single quotes, no escaping is done.

The usual way that backslashed escape sequences are handled is covered in character escape sequences in expanded strings.

Although the documentation for ${run} suggests using the sg operator to substitute dangerous characters, it appears that the much better approach is to use the quote operator instead. Using quote is simple and will allow you to pass through arguments unchanged, instead of either mangling characters with sg or doing complicated insertions of backslashes and so on. Note that this 'passing through unchanged' will include passing through literal newlines, which may be something you have to guard against in the command you're running. In fact, it appears that almost any time you're putting an Exim variable into a ${run} command line you should slap a ${quote:...} around it. Maybe the variable can't have whitespace or other dangerous things in it, but why take the chance?
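In practice that looks like the following sketch (the script path is hypothetical; the point is that every variable is wrapped in ${quote:...}):

```
# Pass variables to ${run} safely, whatever characters they contain.
set acl_m0_res = ${run {/usr/local/bin/log-subject \
     ${quote:$header_subject:} \
     ${quote:$mime_filename} }}
```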

(I suspect that the ${run} documentation was written at a time that quote didn't exist, but I haven't checked this.)

This documentation situation is less than ideal, to put it one way. It's possible that you can work all of this out without reading the Exim source code if you read all of the documentation at once and can hold it all in your head, but that's often not how documentation is used; instead it gets consulted as a sporadic reference. The ${run} writeup should at least have pointers to the sections with specific information on quoting, and ideally would have at least a brief inline discussion of its quoting rules.

(I also believe that the rules surrounding ${run}'s handling of argument expansion are dangerous and wrong, but it's too late to fix them now. See this entry and also this one.)

Sidebar: where in the Exim source this is

Since I had to read the Exim source to get my answer, I might as well note down where I found things.

${run} itself is handled in the EITEM_RUN case in expand_string_internal in expand.c. The actual command handling is done by calling transport_set_up_command, which is in transport.c. This handles single quotes itself in inline code but defers double quote handling to string_dequote in string.c, which calls string_interpret_escape to handle backslashed escape sequences.

(It looks like transport_set_up_command is called by various different things under various circumstances that I'm not going to try to decode the Exim source code to nail down.)

sysadmin/EximRunAndQuoting written at 00:54:14

2016-07-08

Some notes on UID and GID remapping in the Illumos/OmniOS NFS server

As part of looking into this whole issue, I recently wound up reading the current manpages for the OmniOS NFS server and thus discovered that it can remap UIDs and GIDs for clients via the uidmap= and gidmap= NFS share options. Server side NFS ID remapping is not as neat or scalable as client side remapping, but it does solve my particular problem, so I've now played around with it a bit and have some notes.

The feature itself is an Illumos one and is about two years old now, so it's probably been integrated into most Illumos-based releases that you want to use (even though we mostly don't update, you really do want to do so every so often). It's certainly in OmniOS r151014, which is what we use on our fileservers.

The share_nfs manpage describes mappings as [clnt]:[srv]:access_list. Despite its name, the access_list bit is just for matching the client; it doesn't create or change any NFS mount access permissions, which are still set through rw= and so on. You can also use a different mechanism in each place for identifying clients, say a netgroup for filesystem access and then a hostname to identify the client for remapping (which might be handy if using a netgroup has side effects).
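As a concrete (hypothetical) example, a share using these options might look like this; the host names, paths, and IDs are invented:

```
# Client 'apps0' has its UID 1001 seen as server UID 5001 (and GID
# 1001 as GID 5001); mount access itself is still governed by rw=.
share -F nfs -o rw=ourhosts,uidmap=1001:5001:apps0,gidmap=1001:5001:apps0 /export/home
```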

The manpage also describes uidmap (and gidmap) as 'remapping the user ID (uid) in the incoming request to some other uid'. This is a completely accurate description in that the server does not remap IDs in replies, such as in the results from stat() system calls. For example, if you remap your client UID to your server UID, 'ls -l' will show that your files are owned by the server-side UID, not you on the client. This is potentially confusing in general and will probably cause anything that does client-side UID or GID checking to incorrectly reject you.

(This design decision is probably due to the fact that the UID and GID mapping is not necessarily 1:1, either on the server or for that matter on the client. And yes, I can imagine merely somewhat perverse client side uses of mapping a second local UID to a server UID that also exists on the client.)

Note that in general UID remapping is probably more important than GID remapping, since you can always force a purely server-side group list (as far as I know, this server-side lookup entirely overwrites the group list from the client).

PS: I don't know how well this scales on any level. Since all of the mappings have to be specified as share options, I expect that this won't really be very pleasant to deal with if you're trying to do a lot of remapping (either many IDs for some clients or many clients with a few ID remappings).

solaris/NFSServerUIDRemapping written at 01:27:04

2016-07-06

Keeping around an index to the disk bays on our iSCSI backends

Today, for the first time in a year or perhaps more, we had a disk failure in one of our iSCSI backends (or at least we detected it today). As part of discovering that I was out of practice at dealing with this, I wound up having to hunt around to find our documentation on how iSCSI disk identifiers mapped to drive bays in our 16-bay Supermicro cases. This made me realize that probably we should do better at this.

The gold standard would be to label the actual drive sleds themselves with what iSCSI disk they are, but there are two problems with this. First, there's just not much good space on the front of each drive sled. Second and more severe, the actual assignment isn't tied to the HD's serial number or anything to do with the drive sled itself, but to what bay the drive sled is in. Labeling the drive sleds thus has the same problem as comments in code, and we must be absolutely certain that each drive sled is put in the proper bay or has its label removed or updated if it's moved. I'm not quite willing to completely believe this any more than I ever completely believe code comments, and that means we're always going to need to double check.

While we have documentation (as mentioned), what we don't have is a printed out version of that documentation in our machine room. The whole thing fits on a single printed page, so there are plenty of obvious places it could go; probably the best place is on the exposed side of one of the fileserver racks. The heart of the documentation is a little chart, so we could cut out a few copies and put them up on the front of some of the racks that hold the fileservers and iSCSI backends. That would make the full documentation accessible when we're in the machine room, and keep the most important part visible when we're in front of the iSCSI backends about to pull a disk.

This isn't the only bit of fileserver and iSCSI backend information that would be handy to have immediately accessible in paper form in the machine room, either. I think I have some thinking, some planning, and some printing out to do in my future.

(We used to keep printed documentation about some things in the machine room, but then time passed, it got out of date or irrelevant, and we removed the old racks that it had previously been attached to. And we're a lot less likely to need things like 'this is theoretically the order to turn things on after a total power failure' than more routine documentation like 'what disk maps to what drive bay'.)

sysadmin/DriveChassisBayLabels written at 22:51:43

It turns out that viruses do try to conceal their ZIP files

One of the interesting things that happens when you start to log information about what types of files your users get in email is that you get to discover certain sorts of questionable things that people actually do ('people' in a loose sense). Here's one interesting MIME part, extracted from our logs:

attachment application/octet-stream; MIME file ext: .jpeg; zip exts: .js

The 'attachment' bit is the Content-Disposition and the nominal MIME type comes from the Content-Type. The MIME filename (which came either from Content-Type or Content-Disposition) had a .jpeg extension; however, our logging program found that the attachment actually was a ZIP file with a single .js file inside it, not a JPG image. Our anti-spam software later identified it as malware.

(I didn't set out to write an attachment type logging program that did content sniffing, but the Python zipfile module has a very convenient function for it and it's much simpler to structure the code that way instead of trying to maintain a list of file extensions and/or Content-Types that correspond to ZIP files.)
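Recreating the logged case is easy, which shows how thin the disguise is. Here a 'fake_jpeg' attachment body is really a ZIP with a single .js member (the names are invented for illustration):

```python
import io
import zipfile

# Build a ZIP in memory that a mail might claim is a .jpeg attachment.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("invoice.js", "// payload")
fake_jpeg = buf.getvalue()

# Content sniffing sees straight through the claimed name.
print(zipfile.is_zipfile(io.BytesIO(fake_jpeg)))   # True, despite the .jpeg name
with zipfile.ZipFile(io.BytesIO(fake_jpeg)) as zf:
    print(zf.namelist())                           # ['invoice.js']
```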

I vaguely knew that any number of file formats were actually ZIP files under the hood; there's .jar files, for example, and a number of the modern '* office' suites and programs use ZIP as their underlying format. Our file type logging program has peered inside any number of those sorts of attachments (as well as inside regular .zip attachments). I also knew that it was theoretically possible for bad actors to try to smuggle ZIP files through as some other file type. But I didn't expect to see it, especially so fast.

(To be fair, most malware does seem to stick to .zip files, not infrequently even with real MIME Content-Types. I suspect that malware wants to make it easy for people to open up the bad stuff that it sends them.)

PS: Hopefully no real content filtering software is fooled by this sort of transparent ruse. It's not as if ZIP archives are hard to detect. Sadly, the fact that (some) malware does this kind of thing makes me suspect that some important software actually is defeated by it.

PPS: All of the cases seem to be from the same malware run, based on how they all happened today and have various other indicators in common.

spam/VirusesDoConcealZipFiles written at 01:31:39

2016-07-04

A feature I wish the Linux NFS client had: remapping UIDs and GIDs

My office workstation is a completely standalone machine, deliberately not dependent on, say, our NFS fileserver infrastructure. As a sysadmin's primary machine, there are obvious motivations for this and it's not something I'm ever going to change, but at the same time it does have its drawbacks and it'd be nice to sometimes NFS mount my fileserver home directory on my workstation.

Unfortunately there is a small but important obstacle. For peculiar historical reasons, the UID and GID of my local workstation account are not the same as on our fileservers (and thus our Ubuntu servers and so on). My login name is the same so things like SSH work fine, but NFS cares about the actual UID and GID and I wind up out of luck as far as NFS mounts go.

In theory the solution to this is NFS v4. In practice we don't use NFS v4 now, we're very unlikely to add NFS v4 any time soon for general use, and there is exactly zero chance that we'll add NFS v4 to our environment just so that I can NFS mount my fileserver home directory on my office workstation. Put bluntly, there are much easier solutions to that particular problem, ones that put all the work on my head where it rightfully belongs.

Hence my wish that the NFS client would support remapping UIDs and GIDs between the NFS server's view and the local Unix's view. In my particular situation I'd even be happy with a mount option that said 'always tell the server that we're UID X and GID Y', because that's all I need.

There's a pro and a con argument for doing this in the NFS client instead of the NFS server. The pro argument is that it's easier to scale this administratively if you can do it in the client. If it's done on the client, only the person running the client has to care; if it's done on the server, the server administrators have to be involved every time another client needs another UID or GID remapping.

The con argument is that NFS v3 'security' is already leaky enough without handing people a totally convenient client-side way of subverting it (well, if you have root on a client). Yes, sure, you can already do this in a number of ways if you have client root, but all of those ways take at least some work. This feature would make it trivially easy, and there's a benefit to avoiding that.

(I expect that the real answer here is 'the Linux NFS maintainers have no interest in adding NFS v3 UID and GID mapping code in either the client or the server; use NFS v4 if you need this'. On the other hand, they did add NFS v3 server support for more than 16 GIDs.)

linux/NFSClientIDRemapWish written at 23:22:44

