Wandering Thoughts archives

2014-09-21

One reason why Go can have methods on nil pointers

I was recently reading an article on why Go can call methods on nil pointers (via) and wound up feeling that it was incomplete. It's hard to talk about 'the' singular reason that Go can do this, because a lot of design decisions went into the mix, but I think that one underappreciated reason it happens is that Go doesn't have inheritance.

In a typical language with inheritance, you can both override methods on child classes and pass a pointer to a child class instance to a function that expects a pointer to a parent class instance (and the function can then call methods on that pointer). This combination implies that the actual machine code in the function cannot simply make a static call to the appropriate parent class method function; instead it must go through some sort of dynamic dispatch, typically a table of method function pointers (a vtable), so that it calls the child's method function instead of the parent's when passed what is actually a pointer to a child instance.

With non-nil pointers to objects, you have a natural place to put such a vtable (or rather a pointer to it), because the object has actual storage associated with it. But a nil pointer has no storage associated with it, so you can't naturally do this. That leaves the question of how, given a nil pointer, you find the correct vtable; after all, it might be a nil pointer of the child class, which should call child class methods.

Because Go has no inheritance this problem does not come up. If your function takes a pointer to a concrete type and you call t.Method(), the compiler statically knows which exact function you're calling; it doesn't need to do any sort of dynamic lookup. Thus it can easily make this call even when given a nil. In effect the compiler gets to rewrite a call to t.Method() to something like ttype_Method(t).
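
(To make this concrete, here's a small self-contained Go sketch. The List type and its Len method are made up for illustration, but the behavior is real: calling a method on a nil pointer of a concrete type just works, because the call is resolved statically.)

  package main

  import "fmt"

  // List is a tiny concrete type; a nil *List is a perfectly good
  // (empty) list as far as its methods are concerned.
  type List struct {
      value int
      next  *List
  }

  // Len works on a nil receiver because the call is resolved
  // statically; in effect the compiler calls List_Len(l).
  func (l *List) Len() int {
      n := 0
      for ; l != nil; l = l.next {
          n++
      }
      return n
  }

  func main() {
      var l *List          // a nil pointer with no storage behind it
      fmt.Println(l.Len()) // prints 0; no dynamic dispatch involved
  }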

But wait, you may say. What about interfaces? These have exactly the dynamic dispatch problem I was just talking about. The answer is that Go actually represents interface values that are pointers as two pointers; one is the actual value and the other points to (among other things) a vtable for the interface (which is populated based on the concrete type). Because the compiler statically knows that it is dealing with an interface instead of a concrete type, it builds code that calls indirectly through this vtable.

(As I found out, this can lead to a situation where what you think is a nil pointer is not actually a nil pointer as Go sees it because it has been wrapped up as an interface value.)
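
(A minimal sketch of that gotcha, with a made-up type T, looks like this; the pointer is nil but the interface value wrapping it is not:)

  package main

  import "fmt"

  type T struct{}

  func (t *T) String() string { return "a T" }

  func main() {
      var p *T               // a nil *T
      var s fmt.Stringer = p // an interface value holding (type *T, value nil)

      fmt.Println(p == nil) // true
      fmt.Println(s == nil) // false: the interface itself isn't nil because
                            // it carries type information, even though the
                            // pointer inside it is nil
  }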

Of course you could do this two-pointer trick with concrete types too if you wanted to, but it would have the unfortunate effect of adding an extra word to the size of all pointers. Most languages are not interested in paying that cost just to enable nil pointers to have methods.

(Go doesn't have inheritance for other reasons; it's probably just a happy coincidence that it enables nil pointers to have methods.)

PS: it follows that if you want to add inheritance to Go for some reason, you need to figure out how to solve this problem of nil pointers with methods (likely in a way that doesn't double the size of all pointers). Call this an illustration of how language features can be surprisingly intertwined with each other.

programming/GoNilMethodsWhy written at 01:38:12; Add Comment

2014-09-19

My view on using VLANs for security

I've recently read some criticism of the security value of VLANs. Since we use VLANs heavily I've been thinking a bit about this issue and today I feel like writing up my opinions. The short version is that I don't think using VLANs is anywhere close to being an automatic security failure. It's much more nuanced (and secure) than that.

My overall opinion is that the security of your VLANs rests on the security of the switches (and hosts) that the VLANs are carried on, barring switch bugs that allow you to hop between VLANs in various ways or to force traffic to leak from one to another. The immediate corollary is that the most secure VLANs are the ones that are on as few switches as possible. Unfortunately this cuts against both flexibility and uniformity; it's certainly easier if you have all of your main switches carry all of your VLANs by default, since that makes their configurations more similar and means it's much less work to surface a given VLAN at a given point.

(This also depends on your core network topology. A chain or a ring can force you to reconfigure multiple intermediate switches if VLAN A now needs to be visible at a new point B, whereas a star topology pretty much ensures that only a few directly involved switches need to be touched.)

Because it's configured (partly) through software instead of purely by physical changes, a VLAN-based setup is more vulnerable to surreptitious evil changes. All an attacker has to do is gain administrative switch access and they can often make a VLAN available to something or somewhere it shouldn't be. As a corollary, it's harder to audit a VLAN-based network than one that is purely physical, in that you need to check the VLAN port configurations in addition to the physical wiring.

(Since basically all modern switches are VLAN-capable even if you don't use the features, I don't think that avoiding VLANs means that an attacker who wants to get a new network on to a machine needs the machine to have a free network port. They can almost certainly arrange a way to smuggle the network to the machine as a tagged link on an existing port.)

So in summary I think that VLANs are somewhat less secure than separate physical networks but not all that much less secure, since your switches should be fairly secure in general (both physically and for configuration changes). But if you need ultimate security you do want or need to build out physically separate networks. However, my suspicion is that most people don't have security needs that are this high and so are fine with using just VLANs for security isolation.

(Of course there are political situations where having many networks on one switch may force you to give all sorts of people access to that switch so that they can reconfigure 'their' network. If you're in this situation I think that you have several problems, but VLANs do seem like a bad idea because they lead to that shared switch awkwardness.)

Locally we don't have really ultra-high security needs and so our VLAN setup is good enough for us. Our per-group VLANs are more for traffic isolation than for extremely high security, although of course they and the firewalls between the VLANs do help increase the level of security.

Sidebar: virtual machines, hosts, VLANs, and security

One relatively common pattern that I've read about for virtual machine hosting is to have all of the VLANs delivered to your host machines and then to have some sort of internal setup that routes appropriate networks to all of the various virtual machines on a particular host. At one level you can say that this is obviously a point of increased vulnerability with VLANs; the host machine is basically operating as a network switch in addition to its other roles so it's an extra point of vulnerability (perhaps an especially accessible one if it can have the networking reconfigured automatically).

My view is that to say this is to misread the actual security vulnerability here. The real vulnerability is not having VLANs; it is hosting virtual machines on multiple different networks (presumably of different security levels) on the same host machine. With or without VLANs, all of those networks have to get to that host machine, so it has access to all of them and can be used to commit evil with or to any of them. To really increase security here you need to deliver fewer networks to each host machine (which of course has the side effect of making them less uniform and constraining which host machines a given virtual machine can run on).

(The ultimate version is that each host machine is only on a single network for virtual machines, which means you need at least as many host machines as you have networks you want to deploy VMs on. This may not be too popular with the people who set your budgets.)

tech/VLANSecurityView written at 23:20:14; Add Comment

What I mean by passive versus active init systems

I have in the past talked about passive versus active init systems without quite defining what I meant by that, except sort of through context. Since this is a significant division between init systems that dictates a lot of other things, I've decided to fix that today.

Put simply, an active init system is one that actively tracks the status of services as part of its intrinsic features; a passive init system is one that does not. The minimum behavior of an active init system is that it knows what services have been activated and not later deactivated. Better active init systems know whether services are theoretically still active or if they've failed on their own.

(Systemd, upstart, and Solaris's SMF are all active init systems. In general any 'event-based' init system that starts services in response to events will need to be active, because it needs to know which services have already been started and which ones haven't and thus are candidates for starting now. System V init's /etc/init.d scripts are a passive init system, although /etc/inittab is an active one. Most modern daemon supervision systems are active systems.)

One direct consequence is that an active init system essentially has to do all service starting and stopping itself, because this is what lets it maintain an accurate record of what services are active. You may run commands to do this, but they have to talk to the init system itself. By contrast, in a passive init system the commands you run to start and stop services can be and often are just shell scripts; this is the archetype of System V init.d scripts. You can even legitimately start and stop services entirely outside of the scripts, although things may get a bit confused.

(In the *BSDs things can be even simpler in that you don't have scripts and you may just run the daemons. I know that OpenBSD tends to work this way but I'm not sure if FreeBSD restarts stuff quite that directly.)

An active init system is also usually more communicative with the outside world. Since it knows the state of services it's common for the init system to have a way to report this status to people who ask, and of course it has to have some way of being told either to start and stop services or at least that particular services have started and stopped. Passive init systems are much less talkative; System V init basically has 'change runlevel' and 'reread /etc/inittab' and that's about it as far its communication goes (and it doesn't even directly tell you what the runlevel is; that's written to a file that you read).

Once you start down the road to an active init system, in practice you wind up wanting some way to track daemon processes so you can know if a service has died. Without this an active init system is basically flying blind in that it knows what theoretically started okay but it doesn't necessarily know what's still running. This can be done by requiring cooperative processes that don't do things like detach themselves from their parents or it can be done with various system specific Unix extensions to track groups of processes even if they try to wander off on their own.
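
(To make the 'cooperative processes' end of this concrete, here is a deliberately minimal Go sketch of the core of an active supervisor: it starts the service itself, the service stays in the foreground, and a plain Wait() is enough to notice that it has died. The daemon path and flag are hypothetical, and a real active init system would of course track many services and expose their state somehow.)

  package main

  import (
      "log"
      "os/exec"
  )

  // superviseOnce starts one service that stays in the foreground (ie it
  // doesn't daemonize) and notices when it dies. A real active init
  // system would track many services this way and expose their state
  // over some control channel.
  func superviseOnce(name string, args ...string) {
      cmd := exec.Command(name, args...)
      if err := cmd.Start(); err != nil {
          log.Printf("%s: failed to start: %v", name, err)
          return
      }
      log.Printf("%s: active (pid %d)", name, cmd.Process.Pid)

      // Because we started the process ourselves and it doesn't detach
      // from us, a plain Wait() is enough to see that the service died.
      err := cmd.Wait()
      log.Printf("%s: no longer running: %v", name, err)
  }

  func main() {
      // Both the path and the flag here are hypothetical.
      superviseOnce("/usr/sbin/some-daemon", "--no-daemonize")
  }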

As we can see from this, active init systems are more complicated than passive ones. Generally the more useful features they offer and the more general they are the more complicated they will be. A passive init system can be done with shell scripts; an attractive active one requires some reasonably sophisticated C programming.

PS: An active init system that notices when services die can offer a feature where it will restart them for you. In practice most active init systems aren't set up to do this for most services for various reasons (that may or may not be good ones).

(This entry was partly sparked by reading parts of this mail thread that showed up in my Referer logs because it linked to some of my other entries.)

unix/PassiveVsActiveInitSystems written at 00:57:11; Add Comment

2014-09-18

Ubuntu's packaging failure with mcelog in 14.04

For vague historical reasons we've had the mcelog package in our standard package set. When we went to build our new 14.04 install setup, this blew up on us; on installation, some of our machines would report more or less the following:

Setting up mcelog (100-1fakesync1) ...
Starting Machine Check Exceptions decoder: CPU is unsupported
invoke-rc.d: initscript mcelog, action "start" failed.
dpkg: error processing package mcelog (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 mcelog
E: Sub-process /usr/bin/dpkg returned an error code (1)

Here we see a case where a collection of noble intentions have had terrible results.

The first noble intention is a desire to warn people that mcelog doesn't work on all systems. Rather than silently run uselessly or silently exit successfully, mcelog instead reports an error and exits with a failure status.

The second noble intention is the standard Debian noble intention (inherited by Ubuntu) of automatically starting most daemons on installation. You can argue that this is a bad idea for things like database servers, but for basic system monitoring tools like mcelog and SMART monitoring I think most people actually want this; certainly I'd be a bit put out if installing smartd didn't actually enable it for me.

(A small noble intention is that the init script passes mcelog's failure status up, exiting with a failure itself.)

The third noble intention is the standard Debian behavior where an init script that fails when started from the package's postinstall script causes the postinstall script itself to exit with an error (it's in a standard dh_installinit stanza). When the package postinstall script errors out, dpkg itself flags this as a problem (as well it should) and boom, your entire package install step is reporting an error and your auto-install scripts fall down. Or at least ours do.

The really bad thing about this is that server images can change hardware. You can transplant disks from one machine to another for various reasons; you can upgrade the hardware of a machine but preserve the system disks; you can move virtual images around; you can (as we do) have standard machine building procedures that want to install a constant set of packages without having to worry about the exact hardware you're installing on. This mcelog package behavior damages this hardware portability in that you can't safely install mcelog in anything that may change hardware. Even if the initial install succeeds or is forced, any future update to mcelog will likely cause you problems on some of your machines (since a package update will likely fail just like a package install).

(This is a packaging failure, not an mcelog failure; given that mcelog cannot work on some machines it's installed on, the init script failure should not cause a fatal postinstall script failure. Of course the people who packaged mcelog may well not have known that it had this failure mode on some machines.)

I'm sort of gratified to report that Debian has a bug for this, although the progress of the bug does not fill me with great optimism and of course it's probably not important enough to ever make it into Ubuntu 14.04 (although there's also an Ubuntu bug).

PS: since mcelog has never done anything particularly useful for us, we have not been particularly upset over dropping it from our list of standard packages. Running into the issue was a bit irritating, but mcelog seems to be historically good at irritation.

PPS: the actual problem mcelog has is even more stupid than 'I don't support this CPU'; in our case it turns out to be 'I need a special kernel module loaded for this machine but I won't do it for you'. It also syslogs (but does not usefully print) a message that says:

mcelog: AMD Processor family 16: Please load edac_mce_amd module.#012: Success

See eg this Fedora bug and this Debian bug. Note that the message really means 'family 16 and above', not 'family 16 only'.

linux/McelogUbuntuFailure written at 01:57:09; Add Comment

2014-09-17

In praise of Solaris's pfiles command

I'm sure that at one point I was introduced to pfiles through a description that called it the Solaris version of lsof for a single process. This is true as far as it goes and I'm certain that I used pfiles as nothing more than this for a long time, but it understates what pfiles can do for you. This is because pfiles will give you a fair amount more information than lsof will, and much of that information is useful stuff to know.

Like lsof, pfiles will generally report what a file descriptor maps to (file, device, network connection, and Solaris IPC 'doors', often with information about what process is on the other end of the door). Unlike on some systems, the pfiles information is good enough to let you track down who is on the other end of Unix domain sockets and pipes. Socket endpoints are usually reported directly; pipe information generally takes cross-correlating with other processes to see who else has an S_IFIFO with the specific ino open.

(You would think that getting information on the destination of Unix domain sockets would be basic information, but on some systems it can take terrible hacks.)

Pfiles will also report some state information for sockets, like the socket flags and the send and receive buffers. Personally I don't find this deeply useful and I wish that pfiles also showed things like the TCP window and ACK state. Fortunately you can get this protocol information with 'netstat -f inet -P tcp' or 'netstat -v -f inet -P tcp' (if you want lots of details).

Going beyond this lsof-like information, pfiles will also report various fcntl() and open() flags for the file descriptor. This will give you basic information like the FD's read/write status, but it goes beyond this; for example, you can immediately see whether or not a process has its sockets open in non-blocking mode (which can be important). This is often stuff that is not reported by other tools and having it handy can save you from needing deep dives with DTrace, a debugger, or the program source code.

(I'm sensitive to several of these issues because my recent Amanda troubleshooting left me needing to chart out the flow of pipes and to know whether some sockets were nonblocking or not. I could also have done with information on TCP window sizes at the time, but I didn't find the netstat stuff until just now. That's how it goes sometimes.)

solaris/PfilesPraise written at 02:03:57; Add Comment

2014-09-15

My collection of spam and the spread of SMTP TLS

One of the new things that my sinkhole SMTP server does on my workstation is support TLS, unlike my old real mail server there (which dates from a very, very long time ago). This has given me the chance to see how much of my incoming spam is delivered with TLS, which in turn has sparked some thoughts about the spread of SMTP TLS.

The starting point is that a surprising amount of my incoming spam is actually delivered with TLS; right now about 30% of the successful deliveries have used TLS. This is somewhat more striking than it sounds for two reasons: first, the Go TLS code I'm relying on is incomplete (and thus not all TLS-capable sending MTAs can actually do TLS with it), and second, a certain number of TLS connection attempts fail because the sending MTA is offering an invalid client certificate.

(I also see a fair number of rejected delivery attempts in my SMTP command log that did negotiate TLS, but the stats there are somewhat tangled and I'm not going to try to summarize them.)

While there are some persistent spammers, most of the incoming email is your typical advance fee fraud and phish spam that's sent through various sorts of compromised places. Much of the TLS email I get is this boring sort of spam, somewhat to my surprise. My prejudice is that a fair amount of this spam comes from old and neglected machines, which are exactly the machines that I would expect to be least likely to do TLS.

(Some amount of such spam comes from compromised accounts at places like universities, which can and do happen to even modern and well run MTAs. I'm not surprised when they use TLS.)

What this says to me is that support for initiating TLS is fairly widespread in MTAs, even relatively old MTAs, and fairly well used. This is good news (it's now clear that pervasive encryption of traffic on the Internet is a good thing, even casual opportunistic encryption). I suspect that it's happened because common MTAs have enabled client TLS by default and the reason they've been able to do that is that it basically takes no configuration and almost always works.

(It's clear that at least some client MTAs take note when STARTTLS fails and don't try it again even if the server MTA offers it to them, because I see exactly this pattern in my SMTP logs from some clients.)

PS: you might wonder if persistent spammers use TLS when delivering their spam. I haven't done a systematic measurement for various reasons but on anecdotal spot checks it appears that my collection of them basically doesn't use TLS. This is probably unsurprising since TLS does take some extra work and CPU. I suspect that spammers may start switching if TLS becomes something that spam filtering systems use as a trust signal, just as some of them have started advertising DKIM signatures.

spam/SpamAndTLSSpread written at 23:25:16; Add Comment

I want my signed email to work a lot like SSH does

PGP and similar technologies have been in the news lately, and as a result of this I added the Enigmail extension to my testing Thunderbird instance. Dealing with PGP through Enigmail reminded me of why I'm not fond of PGP. I'm aware that people have all sorts of good reasons and that PGP itself has decent reasons for working the way it does, but for me the real strain point is not the interface but fundamentally how PGP wants me to work. Today I want to talk just about signed email, or rather about how I want to deal with signed email.

To put it simply, I want people's keys for signed email to mostly work like SSH host keys. For most people the core of using SSH is not about specifically extending trust to specific, carefully validated host keys but instead about noticing if things change. In practical use you accept a host's SSH key the first time you're offered it and then SSH will scream loudly and violently if it ever changes. This is weaker than full verification but is far easier to use, and it complicates the job of an active attacker (especially one that wants to get away with it undetected). Similarly, in casual use of signed email I'm not going to bother carefully verifying keys; I'm instead going to trust that the key I fetched the first time for the Ubuntu or Red Hat or whatever security team is in fact their key. If I suddenly start getting alerts about a key mismatch, then I'm going to worry and start digging. A similar thing applies to personal correspondents; for the most part I'm going to passively acquire their keys from keyservers or other methods and, well, that's it.

(I'd also like this to extend to things like DKIM signatures of email, because frankly it would be really great if my email client noticed that this email is not DKIM-signed when all previous email from a given address had been.)

On the other hand, I don't know how much sense it makes to even think about general MUA interfaces for casual, opportunistic signed email. There is a part of me that thinks signed email is a sexy and easy application (which is why people keep doing it) that actually doesn't have much point most of the time. Humans do terribly at checking authentication, which is why we mostly delegate that to computers, yet casual signed email in MUAs is almost entirely human checked. Quick, are you going to notice that the email announcement of a new update from your vendor's security team is not signed? Are you going to even care if the update system itself insists on signed updates downloaded from secure mirrors?

(My answers are probably not and no, respectively.)

For all that it's nice to think about the problem (and to grumble about the annoyances of PGP), a part of me thinks that opportunistic signed email is not so much the wrong problem as an uninteresting problem that protects almost nothing that will ever be attacked.

(This also ties into the problem of false positives in security. The reality is that for casual message signatures, almost all missing or failed signatures are likely to have entirely innocent explanations. Or at least I think that this is the likely explanation today; perhaps mail gets attacked more often than I think on today's Internet.)

tech/MySignedMailDesire written at 01:42:10; Add Comment

2014-09-14

My current hassles with Firefox, Flash, and (HTML5) video

When I've written before about my extensions, I've said that I didn't bother with any sort of Flash blocking because NoScript handled that for me. The reality turns out to be that I was sort of living a charmed life, one that has recently stopped working the way I want it to and has forced me into a series of attempts at workarounds.

How I want Flash and video to work is that no Flash or video content activates automatically (autoplay is evil, among other things) but that I can activate any particular piece of content if and when I want to. Ideally this activation (by default) only lasts while the window is open; if I discard the window and revisit the URL, I don't get an autoplaying video or the like. In particular I want things to work this way on YouTube, which is my single most common source of videos (and also my most common need for JavaScript).

For a long time, things worked this way with just NoScript. Then at some point recently this broke down; if I relied only on NoScript, YouTube videos either would never play or would autoplay the moment the page loaded. If I turned on Firefox's 'ask to activate' for Flash, Firefox enabled and disabled things on a site-wide basis (so the second YouTube video I'd visit would autoplay). I wound up having to add two extensions to stop this:

  • Flashblock is the classic Flash blocker. Unlike Firefox's native 'ask to activate', it acts by default on a per-item basis, so activating one YouTube video I watch doesn't auto-play all future ones I look at. To make Flashblock work well I have disabled NoScript's blocking of Flash content so that I rely entirely on Flashblock; this has had the useful side effect of allowing me to turn on Flash elements on various things besides YouTube.

    But recently YouTube added a special trick to their JavaScript arsenal; if Flash doesn't seem to work but HTML5 video does, they auto-start their HTML5 player instead. For me this includes the case where Flashblock is blocking their Flash, so I had to find some way to deal with that.

  • StopTube stops YouTube from autoplaying HTML5 videos. With both Flashblock and StopTube active, YouTube winds up using Flash (which is blocked and then enabled by Flashblock). I don't consider this ideal as I'd rather use HTML5, but YouTube is what it is. As the name of this addon sort of suggests, StopTube has the drawback that it only stops HTML5 video on YouTube itself. HTML5 videos elsewhere are not blocked by it, including YouTube videos embedded on other people's pages. So far those embedded videos aren't autoplaying for me, but they may in the future. That might force me to stop whitelisting YouTube for JavaScript (at which point I might almost as well turn JavaScript off entirely in my browser).

    (An energetic person might be able to make such an addon starting from StopTube's source code.)

Some experimentation suggests that I might get back to what I want with just NoScript if I turn on NoScript's 'Apply these restrictions to whitelisted sites too' option for embedded content it blocks. But for now I like Flashblock's interface better (and I haven't been forced into this by being unable to block autoplaying HTML5 video).

There are still unfortunate aspects to this setup. One of them is that Firefox doesn't appear to have an 'ask to activate' (or more accurately 'ask to play') option for its HTML5 video support; this forces me to keep NoScript blocking that content instead of being able to use a nicer interface for enabling it if I want to. It honestly surprises me that Firefox doesn't already do this; it's an obvious feature and is only going to be more and more asked for as more people start using auto-playing HTML5 video for ads.

(See also this superuser.com question and its answers.)

web/FirefoxFlashVideoHassles written at 01:15:58; Add Comment

2014-09-13

What can go wrong with polling for writability on blocking sockets

Yesterday I wrote about how our performance problems with amandad were caused by amandad doing IO multiplexing wrong, by only polling for whether it could read from its input file descriptors and assuming it could always write to its network sockets. But let's ask a question: suppose that amandad was also polling for writability on those network sockets. Would it work fine?

The answer is no, not without even more code changes, because amandad's network sockets aren't set to be non-blocking. The problem here is what it really means when poll() reports that something is ready for write (or for that matter, for read). Let me put it this way:

That poll() says a file descriptor is ready for writes doesn't mean that you can write an arbitrary amount of data to it without blocking.

When I put it this way, of course it can't. Can I write a gigabyte to a network socket or a pipe without blocking? Pretty much any kernel is going to say 'hell no'. Network sockets and pipes can never instantly absorb arbitrary amounts of data; there's always a limit somewhere. What poll()'s readiness indicator more or less means is that you can now write some data without blocking. How much data is uncertain.

The importance of non-blocking sockets is due to an API decision that Unix has made. Given that you can't write an arbitrary amount of data to a socket or a pipe without blocking, Unix has decided that by default when you write 'too much' you get blocked instead of getting a short write return (where you try to write N bytes and get told you wrote less than that). In order to not get blocked if you try too large a write, you must explicitly set your file descriptor to non-blocking mode; at this point you will either get a short write or just an error (if you're trying to write and there is no room at all).

(This is a sensible API decision for reasons beyond the scope of this entry. And yes, it's not symmetric with reading from sockets and pipes.)
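
(To make this concrete, here's a small Go sketch that uses the raw syscall interface, which mirrors what a C program like amandad sees. With the descriptor in non-blocking mode a too-large write comes back short or with EAGAIN instead of blocking, and it's up to you to hold on to the rest until poll() says there's room again. The pipe in main() is just a stand-in for a slow network socket.)

  package main

  import (
      "fmt"
      "syscall"
  )

  // writeNonblock puts fd into non-blocking mode and then pushes as much
  // of buf down it as the kernel will take right now. Instead of
  // blocking on a too-large write we get a short count or EAGAIN, and
  // the unwritten remainder is handed back so the caller can retry it
  // when poll() later reports the fd as writable.
  func writeNonblock(fd int, buf []byte) (unwritten []byte, err error) {
      if err := syscall.SetNonblock(fd, true); err != nil {
          return buf, err
      }
      for len(buf) > 0 {
          n, err := syscall.Write(fd, buf)
          if n > 0 {
              buf = buf[n:] // a short write: only part was accepted
          }
          if err == syscall.EAGAIN {
              return buf, nil // no room at all right now; wait for poll()
          }
          if err != nil {
              return buf, err
          }
      }
      return nil, nil
  }

  func main() {
      // A pipe with nobody reading it stands in for a slow network
      // socket; its buffer is finite, so most of a 1 MiB write is
      // handed back to us instead of blocking us.
      fds := make([]int, 2)
      if err := syscall.Pipe(fds); err != nil {
          panic(err)
      }
      rest, err := writeNonblock(fds[1], make([]byte, 1<<20))
      fmt.Printf("unwritten: %d bytes, err: %v\n", len(rest), err)
  }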

So if amandad just polled for writability but changed nothing else in its behavior, it would almost certainly still wind up blocking on writes to network sockets as it tried to stuff too much down them. At most it would wind up blocked somewhat less often because it would at least send some data immediately every time it tried to write to the network.

(The pernicious side of this particular bug is that whether it bites you in any visible way depends on how much network IO you try to do and how fast. If you send to the network (or to pipes) at a sufficiently slow rate, perhaps because your source of data is slow, you won't stall visibly on writes because there's always enough capacity for the data you're sending. Only when your send rates start overwhelming the receiver will you actively block in writes.)

Sidebar: The value of serendipity (even if I was wrong)

Yesterday I mentioned that my realization about the core cause of our amandad problem was sparked by remembering an apparently unrelated thing. As it happens, it was my memory of reading Rusty Russell's POLLOUT doesn't mean write(2) won't block: Part II that started me on this whole chain. A few rusty neurons woke up and said 'wait, poll() and then long write() waits? I was reading about that...' and off I went, even if my initial idea turned out to be wrong about the real cause. Had I not been reading Rusty Russell's blog I probably would have missed noticing the anomaly and as a result wasted a bunch of time at some point trying to figure out what the core problem was.

The write() issue is clearly in the air because Ewen McNeill also pointed it out in a comment on yesterday's entry. This is a good thing; the odd write behavior deserves to be better known so that it doesn't bite people.

programming/PollBlockingWritesBad written at 00:47:49; Add Comment

2014-09-12

How not to do IO multiplexing, as illustrated by Amanda

Every so often I have a belated slow-motion realization about what's probably wrong with an otherwise mysterious problem. Sometimes this is even sparked by remembering an apparently unrelated thing I read in passing. As it happens, that happened the other day.

Let's rewind to this entry, where I wrote about what I'd discovered while looking into our slow Amanda backups. Specifically this was a dump of amandad handling multiple streams of backups at once, which we determined is the source of our slowness. In that entry I wrote in passing:

[Amandad is] also spending much more of its IO wait time writing the data rather than waiting for there to be more input, although the picture here is misleading because it's also making pollsys() calls and I wasn't tracking the time spent waiting in those [...]

This should have set off big alarm bells. If amandad is using poll(), why is it spending any appreciable amount of time waiting for writes to complete? After all the whole purpose of poll() et al is to only be woken when you can do work, so you should spend minimal time blocked in the actual IO functions. The unfortunate answer is that Amanda is doing IO multiplexing wrong, in that I believe it's only using poll() to check for readability on its input FDs, not writability on its output FDs. Instead it tacitly assumes that whenever it has data to read it can immediately write all of this data out with no fuss, muss, or delays.

(Just checking for writability on the network connections wouldn't be quite enough, of course, but that's another entry.)

The problem is that this doesn't necessarily work. You can easily have situations where one TCP stream will accept much more data than another one, or where all (or just most) TCP streams will accept only a modest amount of data at a time; this is especially likely when the TCP streams are connected to different processes on the remote end. If one remote process stalls, its TCP stream can stop accepting much more data, at which point a large write to the stream may stall in turn, which stalls all amandad activity (even IO that could otherwise proceed), which in turn stalls upstream activity that is trying to send data to amandad. What amandad's handling of multiplexing does is to put all of the data streams flowing through it at the mercy of whatever is the slowest write stream at any given point in time. If anything blocks everything can block, and sooner or later we seem to wind up in a situation where anything can and does block. The result is a stuttering stop-start process full of stalls that reduces the overall data flow significantly.

In short, what you don't want in good IO multiplexing is a situation where IO on stream A has to wait because you're stalled doing IO on stream B, and that is just what amandad has arranged here.

Correct multiplexing is complex (even in the case of a single flow) but the core of it is never overcommitting yourself. What amandad should be doing is only writing as much output at a time as any particular connection can take, buffering some data internally, and passing any back pressure up to the things feeding it data (by stopping reading its inputs when it cannot write output and it has too much buffered). This would ensure that the multiplexing does no harm in that it can always proceed if any of the streams can proceed.

(Splitting apart the multiplexing into separate processes (or threads) does the same thing; because each one is only handling a single stream, that stream blocking doesn't block any other stream.)
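
(Here's a sketch of that per-stream approach in Go; it's not what amandad does, just an illustration of the property we want. Each stream gets its own goroutine, and io.Copy only reads more input after the previous write has finished, so a stalled consumer applies back pressure to its own producer and to nothing else. The names and the wiring in main() are purely illustrative.)

  package main

  import (
      "io"
      "log"
      "os"
      "sync"
  )

  // A stream is one independent data flow through the relay, eg one
  // backup stream heading for its own network connection.
  type stream struct {
      name string
      src  io.Reader
      dst  io.Writer
  }

  // relayAll gives every stream its own goroutine. io.Copy only reads
  // more input once the previous write has finished, so a stalled or
  // slow consumer applies back pressure to its own producer and to
  // nothing else; no stream ever waits because some other stream's
  // consumer is slow.
  func relayAll(streams []stream) {
      var wg sync.WaitGroup
      for _, s := range streams {
          wg.Add(1)
          go func(s stream) {
              defer wg.Done()
              if _, err := io.Copy(s.dst, s.src); err != nil {
                  log.Printf("stream %s: %v", s.name, err)
              }
          }(s)
      }
      wg.Wait()
  }

  func main() {
      // Purely illustrative wiring: two independent in-memory pipes
      // relayed to standard output at the same time.
      r1, w1 := io.Pipe()
      r2, w2 := io.Pipe()
      go func() { w1.Write([]byte("stream one data\n")); w1.Close() }()
      go func() { w2.Write([]byte("stream two data\n")); w2.Close() }()
      relayAll([]stream{{"one", r1, os.Stdout}, {"two", r2, os.Stdout}})
  }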

PS: good IO multiplexing also needs to be fair, ie if multiple streams or file descriptors can all do IO at the current time then all of them do some IO, instead of a constant stream of ready IO on one stream starving IO for other streams. Generally the easiest way to do this is to have your code always process all ready file descriptors before returning to poll(). This is also the more CPU-efficient way to do it.
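
(A sketch of such a fair poll() loop in Go, assuming the golang.org/x/sys/unix package on Linux; serviceFd is a placeholder for whatever per-descriptor work you actually do:)

  package main

  import (
      "log"

      "golang.org/x/sys/unix"
  )

  // serviceFd is a placeholder for the real per-descriptor work; here it
  // just drains whatever is currently readable on the descriptor.
  func serviceFd(fd int32) {
      buf := make([]byte, 4096)
      if n, err := unix.Read(int(fd), buf); err == nil && n > 0 {
          log.Printf("fd %d: read %d bytes", fd, n)
      }
  }

  func pollLoop(fds []unix.PollFd) {
      for {
          n, err := unix.Poll(fds, -1)
          if err == unix.EINTR {
              continue // interrupted by a signal; just retry
          }
          if err != nil {
              log.Fatal(err)
          }
          if n == 0 {
              continue
          }
          // Fairness: service every descriptor that is ready before
          // going back to Poll, rather than always starting over at
          // fds[0] and potentially starving the later ones.
          for i := range fds {
              if fds[i].Revents&unix.POLLIN != 0 {
                  serviceFd(fds[i].Fd)
              }
          }
      }
  }

  func main() {
      // Trivial demonstration: poll standard input (fd 0) only.
      pollLoop([]unix.PollFd{{Fd: 0, Events: unix.POLLIN}})
  }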

programming/IOMultiplexingDoneWrong written at 01:10:43; Add Comment

