Wandering Thoughts


Exim's (log) identifiers are basically unique on a given machine

Exim gives each incoming email message an identifier; these look like '1XgWdJ-00020d-7g'. Among other things, this identifier is used for all log messages about the particular email message. Since Exim normally splits information about each message across multiple lines, you routinely need to reassemble or at least match multiple lines for a single message. As a result of this need to aggregate multiple lines, I've quietly wondered for a long time just how unique these log identifiers were. Clearly they weren't going to repeat over the short term, but if I gathered tens or hundreds of days of logs for a particular system, would I find repeats?

The answer turns out to be no. Under normal circumstances Exim's message IDs here will be permanently unique on a single machine, although you can't count on global uniqueness across multiple machines (although the odds are pretty good). The details of how these message IDs are formed are in the Exim documentation's chapter 3.4. On most Unixes and with most Exim configurations they are a per-second timestamp, the process PID, and a final subsecond timestamp, and Exim takes care to guarantee that the timestamps will be different for the next possible message with the same PID.

(Thus a cross-machine collision would require the same message time down to the subsecond component plus the same PID on both machines. This is fairly unlikely but not impossible. Exim has a setting that can force more cross-machine uniqueness.)

This means that aggregation of multi-line logs can be done with simple brute force approaches that rely on ID uniqueness. Heck, to group all the log lines for a given message together you can just sort on the ID field, assuming you do a stable sort so that things stay in timestamp order when the IDs match.

(As they say, this is relevant to my interests and I finally wound up looking it up today. Writing it down here insures I don't have to try to remember where I found it in the Exim documentation the next time I need it.)

PS: like many other uses of Unix timestamps, all of this uniqueness potentially goes out the window if you allow time on your machine to actually go backwards. On a moderate volume machine you'd still have to be pretty unlucky to have a collision, though.

sysadmin/EximLogIdUniqueness written at 00:20:33; Add Comment


Some numbers on our inbound and outbound TLS usage in SMTP

As a result of POODLE, it's suddenly rather interesting to find out the volume of SSLv3 usage that you're seeing. Fortunately for us, Exim directly logs the SSL/TLS protocol version in a relatively easy to search for format; it's recorded as the 'X=...' parameter for both inbound and outbound email. So here's some statistics, first from our external MX gateway for inbound messages and then from our other servers for external deliveries.

Over the past 90 days, we've received roughly 1.17 million external email messages. 389,000 of them were received with some version of SSL/TLS. Unfortunately our external mail gateway currently only supports up to TLS 1.0, so the only split I can report is that only 130 of these messages were received using SSLv3 instead of TLS 1.0. 130 messages is low enough for me to examine the sources by hand; the only particularly interesting and eyebrow-raising ones were a couple of servers at a US university and a .nl ISP.

(I'm a little bit surprised that our Exim doesn't support higher TLS versions, to be honest. We're using Exim on Ubuntu 12.04, which I would have thought would support something more than just TLS 1.0.)

On our user mail submission machine, we've delivered to 167,000 remote addresses over the past 90 days. Almost all of them, 158,000, were done with SSL/TLS. Only three of them used SSLv3 and they were all to the same destination; everything else was TLS 1.0.

(It turns out that very few of our user submitted messages were received with TLS, only 0.9%. This rather surprises me but maybe many IMAP programs default to not using TLS even if the submission server offers it. All of these small number of submissions used TLS 1.0, as I'd hope.)

Given that our Exim version only supports TLS 1.0, these numbers are more boring than I was hoping they'd be when I started writing this entry. That's how it goes sometimes; the research process can be disappointing as well as educating.

(I did verify that our SMTP servers really only do support up to TLS 1.0 and it's not just that no one asked for a higher version than that.)

One set of numbers I'd like to get for our inbound email is how TLS usage correlates with spam score. Unfortunately our inbound mail setup makes it basically impossible to correlate the bits together, as spam scoring is done well after TLS information is readily available.

Sidebar: these numbers don't quite mean what you might think

I've talked about inbound message deliveries and outbound destination addresses here because that's what Exim logs information about, but of course what is really encrypted is connections. One (encrypted) connection may deliver multiple inbound messages and certainly may be handed multiple RCPT TO addresses in the same conversation. I've also made no attempt to aggregate this by source or destination, so very popular sources or destinations (like, say, Gmail) will influence these numbers quite a lot.

All of this means that this sort of numbers can't be taken as an indication of how many sources or destinations do TLS with us. All I can talk about is message flows.

(I can't even talk about how many outgoing messages are completely protected by TLS, because to do that I'd have to work out how many messages had no non-TLS deliveries. This is probably possible with Exim logs, but it's more work than I'm interested in doing right now. Clearly what I need is some sort of easy to use Exim log aggregator that will group all log messages for a given email message together and then let me do relatively sophisticated queries on the result.)

spam/CSLabTLSUsage2014-10 written at 23:27:51; Add Comment

Revisiting Python's string concatenation optimization

Back in Python 2.4, CPython introduced an optimization for string concatenation that was designed to reduce memory churn in this operation and I got curious enough about this to examine it in some detail. Python 2.4 is a long time ago and I recently was prompted to wonder what had changed since then, if anything, in both Python 2 and Python 3.

To quickly summarize my earlier entry, CPython only optimizes string concatenations by attempting to grow the left side in place instead of making a new string and copying everything. It can only do this if the left side string only has (or clearly will have) a reference count of one, because otherwise it's breaking the promise that strings are immutable. Generally this requires code of the form 'avar = avar + ...' or 'avar += ...'.

As of Python 2.7.8, things have changed only slightly. In particular concatenation of Unicode strings is still not optimized; this remains a byte string only optimization. For byte strings there are two cases. Strings under somewhat less than 512 bytes can sometimes be grown in place by a few bytes, depending on their exact sizes. Strings over that can be grown if the system realloc() can find empty space after them.

(As a trivial root, CPython also optimizes concatenating an empty string to something by just returning the other string with its reference count increased.)

In Python 3, things are more complicated but the good news is that this optimization does work on Unicode strings. Python 3.3+ has a complex implementation of (Unicode) strings, but it does attempt to do in-place resizing on them under appropriate circumstances. The first complication is that internally Python 3 has a hierarchy of Unicode string storage and you can't do an in-place concatenation of a more complex sort of Unicode string into a less complex one. Once you have compatible strings in this sense, in terms of byte sizes the relevant sizes are the same as for Python 2.7.8; Unicode string objects that are less than 512 bytes can sometimes be grown by a few bytes while ones larger than that are at the mercy of the system realloc(). However, how many bytes a Unicode string takes up depends on what sort of string storage it is using, which I think mostly depends on how big your Unicode characters are (see this section of the Python 3.3 release notes and PEP 393 for the gory details).

So my overall conclusion remains as before; this optimization is chancy and should not be counted on. If you are doing repeated concatenation you're almost certainly better off using .join() on a list; if you think you have a situation that's otherwise, you should benchmark it.

(In Python 3, the place to start is PyUnicode_Append() in Objects/unicodeobject.c. You'll probably also want to read Include/unicodeobject.h and PEP 393 to understand this, and then see Objects/obmalloc.c for the small object allocator.)

Sidebar: What the funny 512 byte breakpoint is about

Current versions of CPython 2 and 3 allocate 'small' objects using an internal allocator that I think is basically a slab allocator. This allocator is used for all overall objects that are 512 bytes or less and it rounds object size up to the next 8-byte boundary. This means that if you ask for, say, a 41-byte object you actually get one that can hold up to 48 bytes and thus can be 'grown' in place up to this size.

python/ExaminingStringConcatOptII written at 00:37:10; Add Comment


Vegeta, a tool for web server stress testing

Standard stress testing tools like siege (or the venerable ab, which you shouldn't use) are all systems that do N concurrent requests at once and see how your website stands up to this. This model is a fine one for putting a consistent load on your website for a stress test, but it's not actually representative of how the real world acts. In the real world you generally don't have, say, 50 clients all trying to repeatedly make and re-make one request to you as fast as they can; instead you'll have 50 new clients (and requests) show up every second.

(I wrote about this difference at length back in this old entry.)

Vegeta is a HTTP load and stress testing tool that I stumbled over at some point. What really attracted my attention is that it uses a 'N requests a second' model, instead of the concurrent request model. As a bonus it will also report not just average performance but also on outliers in the form of 90th and 99th percentile outliers. It's written in Go, which some of my readers may find annoying but which I rather like.

I gave it a try recently and, well, it works. It does what it says it does, which means that it's now become my default load and stress testing tool; 'N new requests a second' is a more realistic and thus interesting test than 'N concurrent requests' for my software (especially here, for obvious reasons).

(I may still do N concurrent requests tests as well, but it'll probably mostly be to see if there are issues that come up under some degree of consistent load and if I have any obvious concurrency race problems.)

Note that as with any HTTP stress tester, testing with high load levels may require a fast system (or systems) with plenty of CPUs, memory, and good networking if applicable. And as always you should validate that vegeta is actually delivering the degree of load that it should be, although this is actually reasonably easy to verify for a 'N new request per second' tester.

(Barring errors, N new requests a second over an M second test run should result in N*M requests made and thus appearing in your server logs. I suppose the next time I run a test with vegeta I should verify this myself in my test environment. In my usage so far I just took it on trust that vegeta was working right, which in light of my ab experience may be a little bit optimistic.)

web/VegetaLoadTesting written at 02:03:33; Add Comment


During your crisis, remember to look for anomalies

This is a war story.

Today I had one of those valuable learning experiences for a system administrator. What happened is that one of our old fileservers locked up mysteriously, so we power cycled it. Then it locked up again. And again (and an attempt to get a crash dump failed). We thought it might be hardware related, so we transplanted the system disks into an entirely new chassis (with more memory, because there was some indications that it might be running out of memory somehow). It still locked up. Each lockup took maybe ten or fifteen minutes from the reboot, and things were all the more alarming and mysterious because this particular old fileserver only had a handful of production filesystems still on it; almost all of them had been migrated to one of our new fileservers. After one more lockup we gave up and went with our panic plan: we disabled NFS and set up to do an emergency migration of the remaining filesystems to the appropriate new fileserver.

Only as we started the first filesystem migration did we notice that one of the ZFS pools was completely full (so full it could not make a ZFS snapshot). As we were freeing up some space in the pool, a little light came on in the back of my mind; I remembered reading something about how full ZFS pools on our ancient version of Solaris could be very bad news, and I was pretty sure that earlier I'd seen a bunch of NFS write IO at least being attempted against the pool. Rather than migrate the filesystem after the pool had some free space, we selectively re-enabled NFS fileservice. The fileserver stayed up. We enabled more NFS fileservice. And things stayed happy. At this point we're pretty sure that we found the actual cause of all of our fileserver problems today.

(Afterwards I discovered that we had run into something like this before.)

What this has taught me is during an inexplicable crisis, I should try to take a bit of time to look for anomalies. Not specific anomalies, but general ones; things about the state of the system that aren't right or don't seem right.

(There is a certain amount of hindsight bias in this advice, but I want to mull that over a bit before I wrote more about it. The more I think about it the more complicated real crisis response becomes.)

sysadmin/CrisisLookForAnomalies written at 00:54:50; Add Comment


My experience doing relatively low level X stuff in Go

Today I wound up needing a program that spoke the current Firefox remote control protocol instead of the old -remote based protocol that Firefox Nightly just removed. I had my choice between either adding a bunch of buffer mangling to a very old C program that already did basically all of the X stuff necessary or trying to do low-level X things from a Go program. The latter seemed much more interesting and so it's what I did.

(The old protocol was pretty simple but the new one involves a bunch of annoying buffer packing.)

Remote controlling Firefox is done through X properties, which is a relatively low level part of the X protocol (well below the usual level of GUIs and toolkits like GTK and Qt). You aren't making windows or drawing anything; instead you're grubbing around in window trees and getting obscure events from other people's windows. Fortunately Go has low level bindings for X in the form of Andrew Gallant's X Go Binding and his xgbutil packages for them (note that the XGB documentation you really want to read is for xgb/xproto). Use of these can be a little bit obscure so it very much helped me to read several examples (for both xgb and xgbutil).

All told the whole experience was pretty painless. Most of the stumbling blocks I ran into were because I don't really know X programming and because I was effectively translating from an older X API (Xlib) that my original C program was using to XCB, which is what XGB's API is based on. This involved a certain amount of working out what old functions that the old code was calling actually did and then figuring out how to translate them into XGB and xgbutil stuff (mostly the latter, because xgbutil puts a nice veneer over a lot of painstaking protocol bits).

(I was especially pleased that my Go code for the annoying buffer packing worked the first time. It was also pretty easy and obvious to write.)

One of the nice little things about using Go for this is that XGB turns out to be a pure Go binding, which means it can be freely cross compiled. So now I can theoretically do Firefox remote control from essentially any machine I remotely log into around here. Someday I may have a use for this, perhaps for some annoying system management program that insists on spawning something to show me links.

(Cross machine remote control matters to me because I read my email on a remote machine with a graphical program, and of course I want to click on links there and have them open in my workstation's main Firefox.)

Interested parties who want either a functional and reasonably commented example of doing this sort of stuff in Go or a program to do lightweight remote control of Unix Firefox can take a look at the ffox-remote repo. As a bonus I have written down in comments what I now know about the actual Firefox remote control protocol itself.

programming/GoLowLevelX written at 00:54:09; Add Comment


Don't use dd as a quick version of disk mirroring

Suppose, not entirely hypothetically, that you initially set up a server with one system disk but have come to wish that it had a mirrored pair of them. The server is in production and in-place migration to software RAID requires a downtime or two, so as a cheap 'in case of emergency' measure you stick in a second disk and then clone your current system disk to it with dd (remember to fsck the root filesystem afterwards).

(This has a number of problems if you ever actually need to boot from the second disk, but let's set them aside for now.)

Unfortunately, on a modern Linux machine you have just armed a time bomb that is aimed at your foot. It may never go off, or it may go off more than a year and a half later (when you've forgotten all about this), or it may go off the next time you reboot the machine. The problem is that modern Linux systems identify their root filesystem by its UUID, not its disk location, and because you cloned the disk with dd you now have two different filesystems with the same UUID.

(Unless you do something to manually change the UUID on the cloned copy, which you can. But you have to remember that step. On extN filesystems, it's done with tune2fs's -U argument; you probably want '-U random'.)

Most of the time, the kernel and initramfs will probably see your first disk first and inventory the UUID on its root partition first and so on, and thus boot from the right filesystem on the first disk. But this is not guaranteed. Someday the kernel may get around to looking at sdb1 before it looks at sda1, find the UUID it's looking for, and mount your cloned copy as the root filesystem instead of the real thing. If you're lucky, the cloned copy is so out of date that things fail explosively and you notice immediately (although figuring out what's going on may take a bit of time and in the mean time life can be quite exciting). If you're unlucky, the cloned copy is close enough to the real root filesystem that things mostly work and you might only have a few little anomalies, like missing log files or mysteriously reverted package versions or the like. You might not even really notice.

(This is the background behind my recent tweet.)

linux/DDMirroringDanger written at 02:13:02; Add Comment


Why system administrators hate security researchers every so often

So in the wake of the Bash vulnerability I was reading this Errata Security entry on Bash's code (via due to an @0xabad1dea retweet) and I came across this:

So now that we know what's wrong, how do we fix it? The answer is to clean up the technical debt, to go through the code and make systematic changes to bring it up to 2014 standards.

This will fix a lot of bugs, but it will break existing shell-scripts that depend upon those bugs. That's not a problem -- that's what upping the major version number is for. [...]

I cannot put this gently, so here it goes: FAIL.

The likely effect of any significant amount of observable Bash behavior changes (for behavior that is not itself a security bug) will be to leave security people feeling smug and the problem completely unsolved. Sure, the resulting Bash will be more secure. A powered off computer in a vault is more secure too. What it is not is useful, and the exact same thing is true of cavalierly breaking things in the name of security.

Bash's current behavior is relied on by a great many scripts written by a great many people. If you change any significant observable part of that behavior, so that scripts start breaking, you have broken the overall system that Bash is a part of. Your change is not useful. It doesn't matter if you change Bash's version number because changing the version number does nothing to magically fix those broken scripts.

Fortunately (for sysadmins), the Bash maintainers are extremely unlikely to take changes that will cause significant breakage in scripts. Even if the Bash maintainers take them, many distribution maintainers will not take them. In fact the distributions who are most likely to not take the fixes are the distributions that most need them, ie the distributions that have Bash as /bin/sh and thus where the breakage will cause the most pain (and Bashisms in such scripts are not necessarily bugs). Hence such a version of Bash, if one is ever developed by someone, is highly likely to leave security researchers feeling smug about having fixed the problem even if people are too obstinate to pick up their fix and to leave systems no more secure than before.

But then, this is no surprise. Security researchers have been ignoring the human side of their nominal field for a long time.

(As always, social problems are the real problems. If your proposed technical solution to a security issue is not feasible in practice, you have not actually fixed the problem. As a corollary, calling for such fixes is much the same as hoping magical elves will fix the problem.)

sysadmin/SecurityResearcherFail written at 01:12:55; Add Comment


Bashisms in #!/bin/sh scripts are not necessarily bugs

In the wake of Shellshock, any number of people have cropped up in any number of places to say that you should always be able to change a system's /bin/sh to something other than Bash because Bashisms in scripts that are specified to use #!/bin/sh are a bug. It is my heretical view that these people are wrong in general (although potentially right in specific situations).

First, let us get a trivial root out of the way: a Unix distribution is fully entitled to assume that you have not changed non-adjustable things. If a distribution ships with /bin/sh as Bash and does not have a supported way to change it to some other shell, then the distribution is fully entitled to write its own #!/bin/sh shell scripts so that they use Bashisms. This may be an unwise choice on the distribution's part, but it's not a bug unless they have an official policy that all of their shell scripts should be POSIX-only.

(Of course the distribution may act on RFEs that their #!/bin/sh scripts not use Bashisms. But that's different from it being a bug.)

Next, let's talk about user scripts. On a system where /bin/sh is always officially Bash, ordinary people are equally entitled to assume that your systems have not been manually mangled into unofficial states. As a result they are also entitled to write their #!/bin/sh scripts with Bashisms in them, because these scripts work properly on all officially supported system configurations. As with distributions, this may not be a wise choice (since it may cause pain if and when they ever move those scripts to another Unix system) but it is not a bug. The only case when it even approaches being a bug is when the distribution has officially included large warnings saying '/bin/sh is currently Bash but it may be something else someday, you should write all /bin/sh shell scripts to POSIX only, and here is a tool to help with that'.

There are some systems where this is the case and has historically been the case, and on those systems you can say that people using Bashisms in #!/bin/sh scripts clearly have a bug by the system's official policy. There are also quite a number of systems where this is or has not been the case, where the official /bin/sh is Bash and always has been. On those systems, Bashisms in #!/bin/sh scripts are not a bug.

(By the way, only relatively recently have you been able to count on /bin/sh being POSIX compatible; see here. Often it's had very few guarantees.)

By the way, as a pragmatic matter a system with only Bash as /bin/sh is likely to have plenty of /bin/sh shell scripts with Bashisms in them even if the official policy is that you should only use POSIX features in such scripts. This is a straightforward application of one of my aphorisms of system administration (and perhaps also this one). These scripts have a nominal bug, but of course people are not going to be happy if you break them.

sysadmin/BashAsShAndBashisms written at 02:06:38; Add Comment


System metrics need to be documented, not just to exist

As a system administrator, I love systems that expose metrics (performance, health, status, whatever they are). But there's a big caveat to that, which is that metrics don't really exist until they're meaningfully documented. Sadly, documenting your metrics is much less common than simply exposing them, perhaps because it takes much more work.

At the best of times this forces system administrators and other bystanders to reverse engineer your metrics from your system's source code or from programs that you or other people write to report on them. At the worst this makes your metrics effectively useless; sysadmins can see the numbers and see them change, but they have very little idea of what they mean.

(Maybe sysadmins can dump them into a stats tracking system and look for correlations.)

Forcing people to reverse engineer the meaning of your stats has two bad effects. The obvious one is that people almost always wind up duplicating this work, which is just wasted effort. The subtle one is that it is terribly easy for a mistake about what the metrics means to become, essentially, superstition that everyone knows and spreads. Because people are reverse engineering things in the first place, it's very easy for mistakes and misunderstandings to happen; then people write the mistake down or embody it in a useful program and pretty soon it is being passed around the Internet since it's one of the few resources on the stats that exist. One mistake will be propagated into dozens of useful programs, various blog posts, and so on, and through the magic of the Internet many of these secondary sources will come off as unhesitatingly authoritative. At that point, good luck getting any sort of correction out into the Internet (if you even notice that people are misinterpreting your stats).

At this point some people will suggest that sysadmins should avoid doing anything with stats that they reverse engineer unless they are absolutely, utterly sure that they're correct. I'm sorry, life doesn't work this way. Very few sysadmins reverse engineer stats for fun; instead, we're doing it to solve problems. If our reverse engineering solves our problems and appears sane, many sysadmins are going to share their tools and what they've learned. It's what people do these days; we write blog posts, we answer questions on Stackoverflow, we put up Github repos with 'here, these are the tools that worked for me'. And all of those things flow around the Internet.

(Also, the suggestion that people should not write tools or write up documentation unless they are absolutely sure that they are correct is essentially equivalent to asking people not to do this at all. To be absolutely sure that you're right about a statistic, you generally need to fully understand the code. That's what they call rather uncommon.)

sysadmin/StatsNeedDocumentation written at 01:24:54; Add Comment

(Previous 10 or go back to October 2014 at 2014/10/12)

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.