Wandering Thoughts archives


What should it mean for a system call to time out?

I was just reading Evan Klitzke's Unix System Call Timeouts (via) and among a number of thoughts about it, one of the things that struck me is a simple question. Namely, what should it mean for a Unix system call to time out?

This question may sound pointlessly philosophical, but it's actually very important because what we expect a system call timeout to mean will make a significant difference in how easy it would be to add system calls with timeouts. So let's sketch out two extreme versions. The first extreme version is that if a timeout occurs, the operation done by the system call is entirely abandoned and undone. For example, if you call rename("a", "b") and the operation times out, the kernel guarantees that the file a has not been renamed to b. This is obviously going to be pretty hard, since the kernel may have to reverse partially complete operations. It's also not always possible, because some operations are genuinely irreversible. If you write() data to a pipe and time out partway through doing so (with some but not all data written), you cannot reach into the pipe and 'unwrite' all of the already sent data; after all, some of it may already have been read by a process on the other side of the pipe.
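This write() situation is easy to see from user level. Here's a quick Python sketch (using a non-blocking pipe so the partial write returns instead of blocking; exactly how much gets written is up to the kernel):

```python
import os

# Create a pipe and make the write end non-blocking, so that write()
# returns a short count instead of blocking when the pipe buffer fills.
r, w = os.pipe()
os.set_blocking(w, False)

data = b"x" * (1024 * 1024)  # far more than a pipe can buffer (64 KB on Linux)
written = os.write(w, data)

# Only part of the data made it into the pipe; what was written may
# already be readable (and read) by a process on the other end.
print("asked to write", len(data), "bytes, actually wrote", written)

os.close(r)
os.close(w)
```

Once that first chunk is in the pipe, there's no way to 'unwrite' it; a reader may consume it at any moment.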

The second extreme version is that having a system call time out merely causes your process to stop waiting for it to complete, with no effects on the kernel side of things. Effectively, the system call is shunted to a separate thread of control and continues to run; it may complete some time, or it may error out, but you never have to wait for it to do either. If the system call would normally return a new file descriptor or the like, the new file descriptor will be closed immediately when the system call completes. In practice implementing a strict version of this would also be relatively hard; you'd need an entire infrastructure for transferring system calls to another kernel context (or more likely, transplanting your user-level process to another kernel context, although that has its own issues). This is also at odds with the existing system calls that take timeouts, which generally result in the operation being abandoned part way through with no guarantees either way about its completion.

(For example, if you make a non-blocking connect() call and then use select() to wait for it with a timeout, the kernel does not guarantee that if the timeout fires the connect() will not be completed. You are in fact in a race between your likely close() of the socket and the connection attempt actually completing.)
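You can see the 'stop waiting, change nothing' behavior with today's select(): a timed-out select() leaves the file descriptors exactly as they were. A small Python sketch:

```python
import os
import select

# A pipe with nothing written to it; a read would block indefinitely.
r, w = os.pipe()

# Wait up to 0.1 seconds for r to become readable. The timeout firing
# just means we stopped waiting; nothing about the pipe itself changed.
readable, writable, errored = select.select([r], [], [], 0.1)
print(readable)  # [] -- the timeout expired with nothing to read

# The descriptor is untouched and still works; data written afterward
# shows up normally.
os.write(w, b"hello")
readable, _, _ = select.select([r], [], [], 0.1)
print(os.read(r, 5))  # b'hello'
```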

The easiest thing to implement would probably be a middle version. If a timeout happens, control returns to your user level with a timeout indication, but the operation may be partially complete and it may be either abandoned in the middle of things or completed for you behind your back. This satisfies a desire to be able to bound the time you wait for system calls to complete, but it does leave you with a messy situation where you don't know either what has happened or what will happen when a timeout occurs. If your mkdir() times out, the directory may or may not exist when you look for it, and it may or may not come into existence later on.
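You can approximate this middle version at user level by shunting the operation into a thread and timing out the wait. The delay here is artificial, purely to stand in for a slow kernel-side operation:

```python
import os
import tempfile
import threading
import time

def slow_mkdir(path, delay):
    # Stand-in for a system call that takes a while inside the kernel.
    time.sleep(delay)
    os.mkdir(path)

target = os.path.join(tempfile.mkdtemp(), "newdir")
t = threading.Thread(target=slow_mkdir, args=(target, 0.5), daemon=True)
t.start()

# 'Time out' after 0.1 seconds: we stop waiting, but the operation is
# still in flight and may complete behind our back.
t.join(timeout=0.1)
print("after timeout, exists:", os.path.isdir(target))  # almost certainly False

t.join()  # only so the demonstration is deterministic
print("later, exists:", os.path.isdir(target))          # True
```

This is exactly the messy situation described above: right after the timeout you can't say whether the directory exists, and it may quietly come into existence later.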

(Implementing timeouts in the kernel is difficult for the same reason that asynchronous IO is hard; there is a lot of kernel code that is much simpler if it's written in straight line form, where it doesn't have to worry about abandoning things part way through at essentially any point where it may have to wait for the outside world.)

unix/SystemCallTimeoutMeaning written at 01:03:40


CSS, <pre>, and trailing whitespace lead to browser layout weirdness

Today someone left a comment on this entry of mine about Python which consisted of a block of code. The comment looked normal in my feed reader, but when I looked at it in my browser (for neurotic reasons) I got a surprise; the new comment was forcing my Firefox to display the page really widely, with even regular text out of my viewport. This was very surprising because I theoretically made that impossible years ago, by forcing all <pre> blocks in comments to have a CSS white-space: pre-wrap setting. At first I thought that my CSS had broken at some point, but with Firefox's style debugging tools I could see the actual CSS being applied and it had the proper white-space setting.

More experimentation showed that things were even weirder than it had initially looked. First, the behavior depended on how wide my Firefox window was; if it dropped below the critical width for my responsive design here to show the sidebar as a sidebar, the problem went away. Second, the exact behavior depended on the browser; in Chrome, the overall page gained a horizontal scrollbar but no actual content extended out the right side of the browser's viewport (ie, its visible window area).

(I've fixed how the live version of the page renders, but you can see the original version preserved here. Feel free to play around with your browser's tools to see if you can work out why this is happening, and I'd love to know what other browsers beyond Firefox and Chrome do with it.)

Eventually (and more or less by luck), I stumbled over what was causing this (although I still don't know why). The root cause is that the <pre> block has a huge area of whitespace at the end of almost every line. Although it looks like the widest <pre> line is 76 characters long, all but the last line are actually 135 characters long, padded out with completely ordinary spaces.

The MDN writeup of white-space contains a hint as to why this is happening, when it says that for pre-wrap 'sequences of white space are preserved'. This is what you need in preformatted text for many purposes, but it appears to mean that the really long runs of trailing whitespace in these lines are being treated as single entities that force the content width to be very wide. Firefox doesn't visibly wrap these lines anywhere and has the whitespace forcing the surrounding boxes to be wide, while Chrome merely had it widen the overall page width without expanding the content boxes. My Chrome at the right width will force the longest line or two of the <pre> content to wrap.

My fix for now was to use magic site admin powers to edit the raw comment to trim off the trailing whitespace. In theory one possible CSS-level fix for this is to also set the word-break CSS property to 'break-all' for <pre> elements, which appears to make Firefox willing to break things in the middle of this sort of whitespace. However this also makes Firefox willing to break <pre> elements in the middle of non-whitespace words, which I find both ugly and unreadable. What I really want is a setting for word-break that means 'try not to break words in the middle, but do it if you have to in order to not run over'.
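The trim itself is trivial; the whole edit amounts to something like this:

```python
def strip_trailing_whitespace(text):
    # Remove trailing spaces and tabs from each line, preserving the
    # line structure itself (including any trailing newline).
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))

comment = "x = 1   \ny = 2\t\t\nz = 3"
print(strip_trailing_whitespace(comment))
```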

(Maybe in another ten years CSS will have support for that. Yes, there's overflow-wrap for regular text, in theory, but it doesn't seem to do anything here. Possibly this is because Firefox doesn't feel that the large chunk of whitespace is actually overflowing its containing box but instead it's growing the containing box. CSS makes my head hurt.)

web/CSSPreLayoutTrailingWhitespace written at 01:47:25


Your live web server probably has features you don't know about

As has become traditional, I'll start with my tweet:

TIL that Ubuntu gives you an Apache /cgi-bin/ script alias that maps to /usr/lib/cgi-bin. Do you know what's in that dir on your web server?

Most web servers have configuration files and configuration processes that are sufficiently complicated that almost no one writes configurations for them from scratch. For their own reasons, these servers simply require you to specify too many things: all of the modules you want loaded, all of the decisions about character set mappings, all of the sensible defaults that must actually be explicitly specified in a configuration file somewhere, and so on. Instead, to configure many web servers we start from vendor-supplied configuration files, generally from our OS vendor. In turn the OS vendor's configuration is generally derived from a combination of the standard or example configuration file in the upstream source plus a number of things designed to make an installed web server package work 'sensibly' out of the box.

Very frequently, this configuration contains things that you may not have expected. My discovery today was one of them. From the perspective of Ubuntu (and probably Debian), this configuration makes a certain amount of sense; it creates an out of the box feature that just works and that can be used by other packages that need to provide CGI-BINs that will 'just work' on a stock Ubuntu Apache without further sysadmin intervention. This is the good intention that is generally behind all of these surprises. In practice, though, this makes things work in the simple case at the cost of giving people surprises in the complex one.

(I suspect that Apache is especially prone to this because Apache configuration is so complex and baroque, or at least it is as usually presented. Maybe there is a small, simple hand-written configuration hiding deep inside all of the standard mess.)

I don't have any great fixes for this situation. We're probably never going to hand write our Apache configurations from the ground up so that we know and understand everything in them. This implies that we should at least scan through enabled modules and configuration snippets to see if anything jumps out at us.
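One low-effort way to do that scan is to sweep the configuration tree for directives you care about. A quick sketch (the /etc/apache2 layout and the '.conf' suffix are Ubuntu's conventions; adjust for other systems):

```python
import os

def find_directive(confdir, directive):
    """Walk an Apache configuration directory and report every active
    line mentioning a given directive, so that surprises like a stock
    ScriptAlias stand out."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(confdir):
        for name in filenames:
            if not name.endswith(".conf"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    if directive in line and not line.lstrip().startswith("#"):
                        hits.append((path, lineno, line.strip()))
    return hits

# For example, on an Ubuntu machine:
#   find_directive("/etc/apache2", "ScriptAlias")
```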

This issue is related to but not quite the same as web server configurations that cross over between virtual hosts. In the crossover case we wanted Apache UserDirs, but only on one virtual host instead of all of them; in today's /cgi-bin/ script alias case, we didn't want this on any virtual host.

(Looking back at that entry I see that back then I was already planning to audit all the Apache global settings as a rainy day project. I guess I should move it up the priority list a bit.)

web/WebServerUnanticipatedFeatures written at 00:07:30


Malware is sometimes sent through organized, purchased infrastructure

Every so often, I wonder where malware comes from. Well, in a mechanical sense; does it come from infected machines, or from rented botnets, or what? Today we got a malware attack campaign that gave us a very clear answer: it came from dedicated, custom-built infrastructure.

Between 12:17 and 12:28 today, a cluster of four IP addresses tried to send us 376 email messages. All of them had a HELO and verified host name of confidentialdocumentdelivery.com, and the MAIL FROM was 'service-<tagged-address>@confidentialdocumentdelivery.com'. All of them that did not get rejected for other reasons had a .doc file that Sophos identifies as CXmail/OleDl-V. To make things look more tempting, they all had some common mail headers:

To: <whoever>
Subject: Confidential Documents Delivery
From: "Document Delivery" <service@confidentialdocumentdelivery.com>

(They also appear to have had valid DKIM signatures, just in case you think DKIM signatures on email are any sign of trust.)

At the time, confidentialdocumentdelivery.com was (and is) in the Spamhaus DBL, and all four IPs involved were in the Spamhaus CSS, probably among other blocklists. The four IPs in question are all in AS202053, 'UpCloud Cloud Servers' according to RIPE information. Their DNS PTR records at the time were all 'confidentialdocumentdelivery.com', but they've since been recycled to other PTRs. The domain itself seems to have been registered only today, assuming I believe the whois data.

All of this makes it clear that these weren't infected machines, hijacked machines, or a rented botnet. This was a whole set of carefully built infrastructure; someone figured out and bought a good domain name, rented some VPSes, assigned DNS, configured a whole set of mail sending infrastructure (complete with VERP), and used all of this to deliberately send out malware, probably in large bulk. This was an entire organized campaign on dedicated infrastructure that was put together for this specific purpose.

(The infrastructure may or may not have been custom built. For all I know, there are people who sell spammers the service of 'I will set up your sending infrastructure; you provide the domain name and some VPSes and so on'. And if it was custom built, I suspect that the malware gang responsible for this will reuse much of the software configurations and so on for another malware barrage.)

The thing that puzzles me is why you would go through all of the effort to plan and develop this, execute the plan at good speed and with solid organization (if the domain was only registered today), and yet use malware that Sophos and presumably others could already recognize. According to Sophos's page, recognized versions of this have been around since January, which I suspect is an eternity in terms of malware recognition.

(For the curious, the four IPs are,,, and Out of those two /24s, and are also currently on the Spamhaus CSS.)

spam/MalwareFromPurchasedInfrastructure written at 01:32:21


I wish you could whitelist kernel modules, instead of blacklisting them

There was another Ubuntu kernel security update released today, this time one for CVE-2017-2636. It's a double-free in the N_HDLC line discipline, which can apparently be exploited to escalate privileges (per the news about it). Another double-free issue was also the cause of CVE-2017-6074; based on what Andrey Konovalov said for the latter, these issues are generally exploitable with standard techniques. Both were found with syzkaller, and all of this suggests that we're going to see more such double-free and use after free issues found in the future.

You've probably never heard of the N_HDLC line discipline, which is probably related to HDLC; certainly I hadn't. You may well not have heard of DCCP either. The Linux kernel contains a great deal of things like this, and modern distributions generally build it all as loadable kernel modules, because why not? Modules are basically free, since they just take up some disk space, and building everything avoids issues that have bit people in the past.

Unfortunately, in a world where more and more obscure Linux kernel code is being subjected to more and more attention, modules are no longer free. All of those neglected but loadable modules are now potential security risks, and the available evidence is that every so often one of them is going to explode in your face. So I said on Twitter:

I'm beginning to think that we should explicitly blacklist almost all Ubuntu kernel modules that we don't use. Too many security issues.

(At the time I was busy adding a blacklist entry for the n_hdlc module to deal with CVE-2017-2636. Given that we now blacklist DCCP, n_hdlc, and overlayfs, things are starting to add up.)
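For concreteness, such a blacklist is just a fragment dropped into /etc/modprobe.d, something like the sketch below (as I understand it, 'blacklist' alone only stops automatic loading via module aliases, so the 'install' override is what makes explicit and on-demand loads fail too):

```
# /etc/modprobe.d/blacklist-cves.conf -- the filename is arbitrary.
# 'blacklist' stops alias-based autoloading; 'install <mod> /bin/false'
# makes any attempt to actually load the module fail.
blacklist n_hdlc
install n_hdlc /bin/false
blacklist dccp
install dccp /bin/false
```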

A lot of kernel modules are for hardware that we don't have, which is almost completely harmless since the drivers won't ever be loaded automatically and even if you did manage to load them, they would immediately give up and go away because the hardware they need isn't there. But there are plenty of things like DCCP that will be loaded on demand through the actions of ordinary users, and which are then exposed to be exploited. Today, this is dangerous.

There are two problems with the approach I tweeted. The first is that the resulting blacklist will be very big, since there are a lot of modules (even if one skips device drivers). The second is that new versions of Linux generally keep adding new modules, which you have to hunt down and add to your blacklist. Obviously, what would be better is a whitelist; we'd check over our systems and whitelist only the modules that we needed or expected to need. All other modules would be blocked by default, perhaps with some way to log attempts to load a module so we could find out when one is missing from our whitelist.

(Modern Linux systems load a lot of modules; some of our servers have almost a hundred listed in /proc/modules. But even still, the whitelist would be smaller than any likely blacklist.)

Unfortunately there doesn't seem to be any particular support for this in the Linux kernel module tools. Way back in 2010, Fedora had a planned feature for this and got as far as writing a patch to add this. Unfortunately the Fedora bugzilla entry appears to have last been active in 2012, so this is probably very dead by now (I doubt the patch still applies, for example).
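Given that, about the best available option today is a user-level audit: compare what's actually loaded against your intended whitelist. A sketch (the sample /proc/modules text here is made up for illustration):

```python
def unexpected_modules(proc_modules_text, whitelist):
    # /proc/modules lists one loaded module per line, name first.
    loaded = {line.split()[0]
              for line in proc_modules_text.splitlines() if line.strip()}
    return sorted(loaded - set(whitelist))

sample = """ext4 743424 2 - Live 0x0000000000000000
dccp 81920 0 - Live 0x0000000000000000
n_hdlc 16384 0 - Live 0x0000000000000000"""

print(unexpected_modules(sample, ["ext4"]))  # ['dccp', 'n_hdlc']
```

On a real system you'd feed it open("/proc/modules").read(); of course this only detects modules after they've loaded, which is exactly why real kernel-side support would be better.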

linux/KernelModuleWhitelistWish written at 00:30:10


An AMD Ryzen is unlikely to be my next desktop's CPU

I'm probably going to wind up building a new home machine this year, to replace my current five year old one. One of the reasons I haven't been doing much on this yet is that both Intel and AMD only recently released their latest desktop CPU lines, Kaby Lake and Ryzen respectively. AMD has not been particularly competitive in CPUs for years now, so there's been a lot of hope attached to Ryzen; plenty of people really want AMD to come through this time around so Intel would face some real competition for once.

I have an unusual reason to be interested in Ryzen, which is that I would like ECC memory if possible. For a while, one of the quiet attractions of AMD has been that they're much more generous about supporting ECC in their CPUs and chipsets than Intel, who carefully locks ECC away from their good desktops. If Ryzen was reasonably competitive in CPU performance, had reasonable thermal performance, and supported ECC, it might suddenly be worth overlooking things like single-threaded performance (despite what I've written about that being a priority, because I'm fickle).

The best current information I've absorbed is via Dan McDonald's Twitter, here (original, with more questions and so on) and a qualification here; unfortunately this is then followed up by more confusion. The short form version appears to be that ECC support is theoretically in the Ryzen CPUs and the AM4 chipset, but it is not qualified by AMD at the moment and it may not be enabled and supported by any particular actual AM4 Ryzen motherboard, and there are some indications that ECC may not actually be supported in this generation after all.

The really short form version: I shouldn't hold my breath for actual, buyable AMD Ryzen based desktop motherboards that support ECC RAM (in ECC mode). They may come out and they may even work reliably when they do, but ECC is clearly not a feature that either AMD or the motherboard vendors consider a priority.

With ECC off the table as a motivation, the rest of Ryzen doesn't look particularly compelling. Although AMD has gotten closer this time around, Ryzen doesn't have the raw CPU performance of Intel's best CPUs and perhaps not their good thermal performance either; for raw single CPU performance at reasonable cost and TDP, Intel's i7-7700K is still close to being the champion. Ryzen attempts to make up the deficit by throwing more cores at it, which I'm not excited by, and by being comparatively cheaper, which doesn't motivate me much when I seem to buy roughly one desktop every five years.

(As an additional drawback, current Ryzen CPUs don't have integrated graphics. I hate the idea of venturing into the graphics card swamps just to drive a 4K monitor.)

Still, Ryzen's a pretty good try this time around. If I ran a lot of virtual machines on my (office) desktop machine, Ryzen might be an interesting alternative to look at there, and I hope that there are a bunch of people working on cramming them into inexpensive servers (especially if they can get those servers to support ECC).

tech/AMDRyzenEarlyViews written at 00:56:48


Modern X Windows can be a very complicated environment

I mentioned Corebird, the GTK+ Twitter client, the other day, and was generally positive about it. That was on a logical weekend. The next day I went in to the office, set up Corebird there, and promptly ran into a problem: I couldn't click on links in Tweets, or rather I could but it didn't activate the link (it would often do other things). Corebird wasn't ignoring left mouse clicks in general, it's just that they wouldn't activate links. I had not had this problem at home (or my views would not have been so positive). I use basically the same fvwm-based window manager environment at home and at work, but since Corebird is a GTK+ application and GTK+ applications can be influenced by all sorts of magic settings and (Gnome) settings daemons, I assumed that it was something subtle that was different in my work GTK+/Gnome environment and filed a Fedora bug in vague hopes. To my surprise, it turned out to be not merely specific to fvwm, but specific to one aspect of my particular fvwm mouse configuration.

The full version is in the thread in the fvwm mailing list, but normally when you click and release a button, the X server generates two events, a ButtonPress and then a ButtonRelease. However, if fvwm was configured in a way such that it might need to do something with a left button press, a different set of events was generated:

  • a LeaveNotify with mode NotifyGrab, to tell Corebird that the mouse pointer had been grabbed away from it (by fvwm).
  • an EnterNotify with mode NotifyUngrab, to tell Corebird 'here is your mouse pointer back because the grab has been released' (because fvwm was passing the button press through to Corebird).
  • the ButtonPress for the mouse button.

The root issue appears to be that something in the depths of GTK+ takes the LeaveNotify to mean that the link has lost focus. Since GTK+ doesn't think the link is focused, when it receives the mouse click it doesn't activate the link, but it does take other action, since it apparently still understands that the mouse is being clicked in the text of the GtkLabel involved.

(There's a test program that uses a simple GtkLabel to demonstrate this, see this, and apparently there are other anomalies in GtkLabel's input processing in this area.)

If you think that this all sounds very complex, yes, exactly. It is. X has a complicated event model to start with, and then interactions with the window manager add extra peculiarities on top. The GTK+ libraries are probably strictly speaking in the wrong here, but I also rather suspect that this is a corner case that the GTK+ programmers never imagined, much less encountered. In a complex environment, some possibilities will drop through the cracks.

(If you want to read a high level overview of passive and active (mouse button) grabs, see eg this 2010 writeup by Peter Hutterer. Having read it, I feel like I understand a bit more about what fvwm is doing here.)

By the way, some of this complexity is an artifact of the state of computing when X was created, specifically that both computers and networking were slow. Life would be simpler for everyone if all X events were routed through the window manager and then the window manager passed them on to client programs as appropriate. However, this would require all events to pass through an extra process (and possibly an extra one or two network hops), and in the days when X was young this could have had a real impact on overall responsiveness. So X goes to a great deal of effort to deliver events directly to programs whenever possible while still allowing the window manager to step in.

(My understanding is that in Wayland, the compositor handles all events and passes them to clients as it decides. The Wayland compositor is a lot more than just the equivalent of an X window manager, but it fills that role, and so in Wayland this issue wouldn't come up.)

unix/ModernXCanBeVeryComplex written at 22:39:05


Why I (as a sysadmin) reflexively dislike various remote file access tools for editors

I somewhat recently ran across this irreal.org entry (because it refers to my entry on staying with Emacs for code editing), and in a side note it mentions this:

[Chris] does a lot of sysadmin work and prefers Vim for that (although I think Tramp would go a long way towards meeting the needs that he thinks Vim resolves).

This is partly in reference to my entry on Why vi has become my sysadmin's editor, and at the end of that entry I dismissed remote file access things like Tramp. I think it's worth spending a little bit of time talking about why I reflexively don't like them, at least with my sysadmin hat on.

There are several reasons for this. Let's start with the mechanics of remote access to files. If you're a sysadmin, it's quite common for you to be editing files that require root permissions to write to, and sometimes to even read. This presents two issues for a Tramp-like system. The first is that either you arrange passwordless access to root-privileged files or at some point during this process you provide your root-access password. The first is very alarming and the second requires a great deal of trust in Tramp or other code to securely handle the situation. The additional issue for root access is that best practice today is not to log in or scp in directly as root but instead to log in as yourself and then use su or sudo to gain root access. Perhaps you can make the remote file access system of choice do this, but it's extremely unlikely to be simple because this is not at all a common usage case for them. Almost all of these systems are built by developers to allow them to access their own files remotely; indirect access to privileged contexts is a side feature at best.

(Let's set aside issues of, say, two-factor authentication.)

All of this means that I would have to build access to an extremely sensitive context on uncertain foundations that require me to have a great deal of trust in both the system's security (when it probably wasn't built with high security worries in the first place) and that it will always do exactly and only the right thing, because once I give it root permissions one slip or accident could be extremely destructive.

But wait, there's more. Merely writing sensitive files is a dangerous and somewhat complicated process, one where it's actually not clear what you should always do and some of the ordinary rules don't always apply. For instance, in a sysadmin context if a file has hardlinks, you generally want to overwrite it in place so that those hardlinks stay. And you absolutely have to get the permissions and even ownership correct (yes, sysadmins may use root permissions to write to files that are owned by someone other than root, and that ownership had better stay). Again, it's possible for a remote file access system to get this right (or be capable of being set up that way), but it's probably not something that the system's developers have had as a high priority because it's not a common usage case. And I have to trust that this is all going to work, all the time.
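To make the hardlink point concrete, here's a sketch of overwriting in place. Truncating and rewriting the existing inode keeps hard links, ownership, and permissions attached to the file; the tradeoff is that a crash partway through loses the old contents, which is why the usual developer-oriented 'write a temporary file and rename it into place' dance (which does break hardlinks) exists at all:

```python
import os
import tempfile

def overwrite_in_place(path, data):
    # Reuse the existing inode: truncate it and write the new contents,
    # rather than creating a new file and renaming it over the old one.
    fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

d = tempfile.mkdtemp()
orig = os.path.join(d, "config")
link = os.path.join(d, "config.hardlink")
with open(orig, "w") as f:
    f.write("old contents\n")
os.link(orig, link)

overwrite_in_place(orig, b"new contents\n")

# The hard link still sees the update, because the inode is unchanged.
print(open(link).read())       # new contents
print(os.stat(orig).st_nlink)  # 2
```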

Finally, often editing a file is only part of what I'm doing as root. I hopefully want to commit that file to version control and also perhaps (re)start daemons or run additional commands to make my change take effect and do something. Perhaps a remote file editing system even has support for this, even running as a privileged user through some additional access path, but frankly this is starting to strain my ability to trust this system to get everything right (and actually do this well). Of course I don't have to use the remote access system for this, since I can just get root privileges directly and do all of this by hand, but if I'm going to be setting up a root session anyways to do additional work, why not go the small extra step to run vi in it? That way I know exactly what I'm getting and I don't have to extend a great deal of trust that a great deal of magic will do the right thing and not blow up in my face.

(And if the magic blows up, it's not just my face that's affected.)

Ultimately, directly editing files with vim as root (or the appropriate user) on the target system is straightforward, simple, and basically guaranteed to work. It has very few moving parts and they are mostly simple ones that are amenable to inspection and understanding. All of this is something that sysadmins generally value quite a bit, because we have enough complexity in our jobs as it is.

sysadmin/WhyRemoteFileWriteDislike written at 23:25:24


Should you add MX entries for hosts in your (public) DNS?

For a long time, whenever we added a new server to our public DNS, we also added an MX entry for it (directing inbound email to our general external MX gateway). This was essentially historical habit, and I believe it came about because a very long time ago there were far too many programs that would send out email with From: and even To: addresses of '<user>@<host>.<our domain>'. Adding MX entries made all of that email work.

In the past few years, I have been advocating (mostly successfully) for moving away from this model of automatically adding MX entries, because I've come to believe that in practice it leads to problems over the long term. The basic problem is this: once an email address leaks into public, you're often stuck supporting it for a significant amount of time. Once those <host>.<our-dom> DNS names start showing up in email, they start getting saved in people's address books, wind up in mailing list subscriptions, and so on and so forth. Once that happens, you've created a usage of the name that may vastly outlast the actual machine itself; this means that over time, you may well have to accumulate a whole collection of lingering MX entries for now-obsolete machines that no longer exist.

These days, it's not hard to configure both machines and mail programs to use the canonical '<user>@<our-dom>' addresses that you want and to not generate those problematic '<user>@<host>.<our-dom>' addresses. If programs (or people) do generate such addresses anyways, your next step is to fix them up in your outgoing mail gateway, forcefully rewriting them to your canonical form. Once you get all of this going you no longer need to support the <host>.<our-dom> form of addresses at all in order to make people's email work, and so in my view you're better off arranging for such addresses to never work by omitting MX entries for them (and refusing them on your external mail gateway). That way if people do mistakenly (or deliberately) generate such addresses and let them escape to the outside world, they break immediately and people can immediately fix things. You aren't stuck with addresses that will work for a while and then either impose long-term burdens on you or break someday.
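The rewriting rule itself is simple. Here's a toy Python version, with example.com standing in for your domain; in practice this logic lives in your mail gateway's rewriting configuration, not in a script:

```python
def canonicalize(addr, domain="example.com"):
    # Rewrite '<user>@<host>.<domain>' to the canonical '<user>@<domain>';
    # leave canonical addresses and other people's domains alone.
    local, _, host = addr.partition("@")
    if host == domain or not host:
        return addr
    if host.endswith("." + domain):
        return local + "@" + domain
    return addr

print(canonicalize("cks@apps0.example.com"))  # cks@example.com
```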

The obvious exception here is cases where you actually do want a hostname to work in email and be accepted in general; perhaps you want, say, '<user>@support.<our-dom>' to work, or '<user>@www.<our-dom>', or the like. But then you can actively choose to add an MX entry (and any other special processing you may need), and you can always defer doing this until the actual need materializes instead of doing it in anticipation when you set up the machine's other DNS.

(If you want only some local usernames to be accepted for such hostnames, say only 'support@support.<our-dom>', you'll obviously need to do more work in your email system. We haven't gone to this extent so far; all local addresses are accepted for all hostname and domain name variants that we accept as 'us'.)

sysadmin/AvoidingMXEntriesForHosts written at 23:48:13


Why exposing only blocking APIs is ultimately a bad idea

I recently read Marek's Socket API thoughts, which mulls over a number of issues and ends with the remark:

But nonetheless, I very much like the idea of only blocking API's being exposed to the user.

This is definitely an attractive idea. All of the various attempts at select() style APIs have generally not gone well, high level callbacks give you 'callback hell', and it would be conceptually nice to combine cheap concurrency with purely blocking APIs to have our cake and eat it too. It's no wonder this idea comes up repeatedly and I feel the tug of it myself.

Unfortunately, I've wound up feeling that it's fundamentally a mistake. While superficially attractive, attempting to do this in the real world is going to wind up with an increasingly ugly mess in practice. For the moment let's set aside the issue that cheap concurrency is fundamentally an illusion and assume that we can make the illusion work well enough here. This still leaves us with the select() problem: sooner or later the result of one IO will make you want to stop doing another waiting IO. Or more generally, sooner or later you'll want to stop doing some bit of blocking IO as the result of other events and processing inside your program.

When all IO is blocking, separate IO must be handled by separate threads and thus you need to support external (cross-thread) cancellation of in-flight blocked IO out from underneath a thread. The moment you have this sort of unsynchronized and forced cross-thread interaction, you have a whole collection of thorny concurrency issues that we have historically not been very good at dealing with. It's basically guaranteed that people will write IO handling code with subtle race conditions and unhandled (or mishandled) error conditions, because (as usual) they didn't realize that something was possible or that their code could be trying to do thing X right as thing Y was happening.

(I'm sure that there are API design mistakes that can and will be made here, too, just as there have been a series of API design mistakes around select() and its successors. Even APIs are hard to get completely right in the face of concurrency issues.)

There is no fix for this that I can see for purely blocking APIs. Either you allow external cancellation of blocked IO, which creates the cross-thread problems, or you disallow it and significantly limit your IO model, creating real complications as well as limiting what kind of systems your APIs can support.

(For the people who are about to say 'but Go makes it work', I'm afraid that Go doesn't. It chooses to limit what sort of systems you can build, and I'm not just talking about the memory issues.)

PS: I think it's possible to sort of square the circle here, but the solution must be deeply embedded into the language and its runtime. The basic idea is to create a CSP like environment where waiting for IO to complete is a channel receive or send operation, and may be mixed with other channel operations in a select. Once you have this, you have a relatively clean way to cancel a blocked IO; the thread performing the IO simply uses a multi-select, where one channel is the IO operation and another is the 'abort the operation' channel. This doesn't guarantee that everyone will get it right, but it does at least reduce your problem down to the existing problem of properly handling channel operation ordering and so on. But this is not really a 'only blocking API' as we normally think of it and, as mentioned, it requires very deep support in the language and runtime (since under the hood this has to actually be asynchronous IO and possibly involve multiple threads).
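You can approximate this multi-select today at user level with the old trick of selecting on an extra 'cancel' pipe alongside the real file descriptor (which only works for descriptors that select() can genuinely wait on, a limitation in itself):

```python
import os
import select
import threading

def read_with_cancel(data_fd, cancel_fd):
    # Block until either data arrives or someone pokes the cancel pipe;
    # a user-level analogue of selecting on {IO channel, abort channel}.
    readable, _, _ = select.select([data_fd, cancel_fd], [], [])
    if cancel_fd in readable:
        return None  # cancelled from outside
    return os.read(data_fd, 4096)

data_r, data_w = os.pipe()
cancel_r, cancel_w = os.pipe()

result = []
t = threading.Thread(
    target=lambda: result.append(read_with_cancel(data_r, cancel_r)))
t.start()

os.write(cancel_w, b"x")  # cancel the blocked 'IO' from another thread
t.join()
print(result)  # [None]
```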

This is also going to sometimes be somewhat of a lie, because on many systems there is a certain amount of IO that is genuinely synchronous and can't be interrupted at all, despite you putting it in a multi-channel select statement. Many Unixes don't really support asynchronous reads and writes from files on disk, for example.

programming/PureBlockingAPIsWhyBad written at 23:53:52


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.