Wandering Thoughts

2020-01-28

More badly encoded MIME Content-Disposition headers

One of the things that our system for recording email attachment type information logs is the MIME Content-Disposition header for a MIME part, if it exists. Another thing it logs is the extension of the claimed MIME filename, if this information is part of the MIME headers for that part (people and programs don't always put it in). Under normal circumstances, the filename of a MIME part is given as a 'filename=...' parameter on the Content-Disposition header (although it may also come from a 'name=...' parameter on the Content-Type, for historical reasons (via)).

Now suppose that your attachment has a non-ASCII filename and you want to put it in the MIME headers, which are theoretically ASCII only. How you're supposed to deal with this is somewhat tangled. In theory you're apparently supposed to use RFC 2231, which defines a whole encoding scheme for such parameters. In practice RFC 2231 encoding seems to be pretty rare; instead, the common thing to do, and what most software does, is to use RFC 2047 encoding for just the filename (possibly inside quotes).

(I've seen a message that used both at once, with the name= parameter on Content-Type encoded using RFC 2047 and the entire Content-Disposition header done with RFC 2231. I didn't cross-check to see whether the filenames came out the same.)

Of course, when you run an anti-spam system and turn over rocks by looking at your logs, sometimes you find surprises. When I was looking at our log of attachment information recently, I discovered one with an attachment type that looked like the following:

=?utf-8?q?attachment=3b_filename=3d=22tnt_import_clear?= =?utf-8?q?ance=e2=80=93_consignment_=239066721066-pdf=2eace=22?=

(I've broken this into two pieces for the blog, but it was one originally.)

If we decode this following RFC 2047, we get:

attachment; filename="tnt import clearance– consignment #9066721066-pdf.ace"

This seems to be some piece of malware that's applied RFC 2047 encoded-word syntax to the entire Content-Disposition header, rather than just the filename. Whether any email software will interpret this in a way that's useful for the malware is an open question, but presumably some does, since the malware bothers to do it. Some software certainly won't, and unfortunately that includes our system for rejecting email with bad attachment types at SMTP time.
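If you want to see what a header like this decodes to without working it out by hand, Go's standard mime package will do the RFC 2047 decoding for you. This is just a quick sketch for illustration, not how our mail system actually handles these:

package main

import (
	"fmt"
	"mime"
)

func main() {
	// The two encoded-words from the logged Content-Disposition, put
	// back together the way they appeared in the original header.
	raw := "=?utf-8?q?attachment=3b_filename=3d=22tnt_import_clear?= " +
		"=?utf-8?q?ance=e2=80=93_consignment_=239066721066-pdf=2eace=22?="

	dec := new(mime.WordDecoder)
	decoded, err := dec.DecodeHeader(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(decoded)
}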

(The reason that this came to my attention was that our commercial anti-spam software rejected it as CXmail/MalPE-AW while the unofficial ClamAV signatures we use detected it as 'Sanesecurity Malware 25738 AceHeur Exe'. Since we normally reject email with .ace attachments at SMTP time before we do further anti-virus checking, this would have been rejected immediately if not for its very encoded Content-Disposition, which prevented our current setup from recognizing it.)

It turns out that this isn't even the first time I've spotted and noted these things; back at the end of 2018, I wrote about some odd Content-Dispositions, including ones that were mis-encoded this way. At the time I don't seem to have seen any whose attachment types we'd have rejected, so handling them in our attachment logging software didn't strike me as a high priority.

Since pretty much all of these that we've seen are spam and malware, and this is unambiguously incorrect (and may be intended in part to evade anti-virus systems), I'm a bit tempted to make our external mail gateway reject all email with these badly encoded Content-Dispositions. That would save us from having to deal with various cases of decoding these things and then trying to parse the resulting header ourselves.

spam/EncodedMimeContentDisposition written at 00:19:02

2020-01-26

The real world is mutable (and consequences for system design)

Every so often I see people put together systems on the Internet that are designed to be immutable and permanent (most recently Go). I generally wince, and perhaps sigh to myself, because sooner or later there are going to be problems. The reality of life is that the real world is not immutable. I mean that at two levels. The first is that sometimes people make mistakes and publish things that they very strongly wish and need to change or retract. Pretending that they do not is ignoring reality. Beyond that, things in the real world are almost always mutable and removable because lawyers can show up on your doorstep with a court order to make them so, and the court generally doesn't care about what problems your choice of technology has created for you in complying. If the court says 'stop serving that', you had better do so (or have very good lawyers).

It's my view that designing systems without considering this creates two problems, one obvious and one not obvious. The obvious one is that on the day when the lawyers show up on your front door, you're going to have a problem; unless you enjoy the varied and generally unpleasant consequences of defying a court order, you're going to have to mutate your immutable thing (or perhaps shut it down entirely). If you're having to do this from a cold start, without any advance consideration of the issue, the result may be disruptive (and obviously shutting down entirely is disruptive, even if it's only temporary as you do a very hasty hack rewrite so that you can block certain things or whatever).

The subtle problem is that by creating an immutable system and then leaving it up to the courts to force you to mutate it, you've created a two-tier system. Your system actually supports deletions and perhaps modifications, but only for people who can afford expensive lawyers who can get those court orders that force you to comply. Everyone else is out of luck; for ordinary people, any mistakes they make are not fixable, unlike for the powerful.

(A related problem is that keeping your system as immutable as possible is also a privilege extended more and more to powerful operators of the service. Google can afford to pay expensive lawyers to object to proposed court orders calling for changes in their permanent proxy service; you probably can't.)

As a side note, there's also a moral dimension here, in that we know that people will make these mistakes and will do things that they shouldn't have, that they very much regret, and that sometimes expose them to serious consequences if not corrected (whether personal, professional, or for organizations). If people design a system without an escape hatch (other than what a court will force them to eventually provide), they're telling these people that their suffering is not important enough. Perhaps the designers want to say that. Perhaps they have strong enough reasons for it. But please don't pretend that there will never be bad consequences to real people from these design decisions.

PS: There's also the related but very relevant issue of abuse and malicious actors, leading to attacks such as the one that more or less took down the PGP Web of Trust. Immutability means that any such things that make it into the system past any defenses you have are a problem forever. And 'forever' can be a long time in Internet level systems.

tech/RealWorldIsMutable written at 22:06:11

How big our Prometheus setup is (as of January 2020)

I talked about our setup of Prometheus and Grafana, but what I didn't discuss then is how big it is on various measures: things like how much disk space our Prometheus database takes, how many endpoints we're monitoring, how many metrics we have, how much cardinality is involved, and so on. Today I feel like running down all of those numbers, for various reasons.

We started our production Prometheus setup on November 21st of 2018 and it's been up since then, although the amount of metrics we've collected has varied over time (generally going up). At the moment our metrics database is using 815 GB, including 5.7 GB of WAL. Over roughly 431 days, that's averaged about 1.9 GB a day (and over the past seven days, we seem to be growing at about 1.97 GB a day, so that's more representative of our current growth rate).

At the moment we have 674 different targets that Prometheus scrapes. These range from Blackbox external probes of machines to the Prometheus host agent, so the number of metrics from each target varies considerably. Our major types of targets are Blackbox checks other than pings (260), Blackbox pings (199), and the host agent (108 hosts).

In terms of metrics, Prometheus's status information is currently reporting that we have 1,101 different metrics and 479,161 series in total. Our highest cardinality metrics are the host agent's metrics for systemd unit states (53,470 series) and a local set of metrics for Linux's NFS mountstats that condenses them down to only 27,146 series (if we used the host agent's native support for this information, there would be a lot more). Our highest cardinality label is 'user', which we use both for per-user disk space usage information and for VPN usage (with mostly overlapping user names). Our largest source of series is the host agent, unsurprisingly, with 449,669 of our series coming from it. The second largest is Pushgateway, which is responsible for 16,047 series. If you want to find out this detail for your own Prometheus setup, the query you want is:

sort_desc( count({__name__!=""}) by (job) )

The systemd unit state reporting generates so many series because the host agent produces a separate time series for every unit it reports on, one for each systemd state the unit can be in:

node_systemd_unit_state{ ..., name="cron.service", state="activating"}   0
node_systemd_unit_state{ ..., name="cron.service", state="active"}       1
node_systemd_unit_state{ ..., name="cron.service", state="deactivating"} 0
node_systemd_unit_state{ ..., name="cron.service", state="failed"}       0
node_systemd_unit_state{ ..., name="cron.service", state="inactive"}     0

Five series for each systemd unit adds up fast, even if you only have the host agent look at .service units (normally it looks at more).
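If you're curious how much a single metric like this contributes in your own setup, the same sort of query works; for instance, to count the systemd unit state series in total and broken down per host (using the standard host agent metric and label names):

count(node_systemd_unit_state)
sort_desc( count(node_systemd_unit_state) by (instance) )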

At the moment Prometheus appears to be adding on average about 31,000 samples a second to its database. The Prometheus process is currently reporting about 3.4 GB of resident RAM (on a 32 GB machine), although that undoubtedly fluctuates based on how many people are looking at our Grafana dashboards at any given time, as well as things like WAL compaction. It's using about 10% to 15% of a nominal single CPU (on a four-core machine with HT enabled). Outside of periodic spikes (which are probably for WAL compaction), the server as a whole runs at about 300 KB to 400 KB a second of writes; including all activity, the long term write bandwidth is about 561 KB/s. The incoming network bandwidth over the long term is about 345 KB/s. All of this shows that we're not exactly stressing the machine.

(The machine has 32 GB of RAM not for its ordinary needs but to deal with RAM spikes due to complex ad-hoc queries. I've run the machine out of memory before when it had 16 GB. With 32 GB, we have more headroom and have been able to raise Prometheus query limits so we can support longer time ranges in our dashboards.)

sysadmin/PrometheusOurSize-2020-01 written at 01:41:47

2020-01-25

A network interface losing and regaining signal can have additional effects (in Linux)

My office at work features a dearth of electrical sockets and as a result a profusion of power bars and other means of powering a whole bunch of things from one socket. The other day I needed to reorganize some of the mess, and as part of that I wound up briefly unplugging the power supply for my 8-port Ethernet switch that my office workstation is plugged into. Naturally this meant that the network interface lost signal for a bit (twice, because I wound up shuffling the power connection twice). Nothing on my desktop really noticed, including all of the remote X stuff I do, so I didn't think more about it. However, when I got home, parts of my Wireguard tunnel didn't work. I eventually fixed the problem by restarting the work end of my Wireguard setup, which does a number of things, including turning on IP(v4) forwarding on my workstation's main network interface.

I already knew that deleting and then recreating an interface entirely can have various additional effects (as happens periodically when my PPPoE DSL connection goes away and comes back). However this is a useful reminder to me that simply unplugging a machine from the network and then plugging it in can have some effects too. Unfortunately I'm not sure what the complete list of effects is, which is somewhat of a problem. Clearly it includes resetting IP forwarding, but there may be other things.

(All of this also depends on your system's networking setup. For instance, NetworkManager will deconfigure an interface that goes down, while I believe that without it, the interface's IP address remains set and so on.)

I'm not sure if there's any good way to fix this so that these settings are automatically re-applied when an interface comes up again. Based on this Stackexchange question and answer, the kernel doesn't emit a udev event on a change in network link status (it does emit a netlink event, which is probably how NetworkManager notices these things). Nor is there any sign in the networkd documentation that it supports doing something on link status changes.

(Possibly I need to set 'IgnoreCarrierLoss=true' in my networkd settings for this interface.)
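For reference, that setting goes in the [Network] section of the interface's .network file. A minimal sketch, with a made-up interface name and address, and with the caveat that I haven't verified that this alone preserves things like the forwarding setting:

# a hypothetical /etc/systemd/network/ file for the interface
[Match]
Name=enp5s0

[Network]
Address=192.0.2.10/24
IgnoreCarrierLoss=true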

My unfortunate conclusion here is that if you have a complex networking setup and you lose link carrier on one interface, the simplest way to restore everything may be to reboot the machine. If this is not a good option, you probably should experiment in advance to figure out what you need to do and perhaps how to automate it.

(Another option is to work out what things are cleared or changed in your environment when a network interface loses carrier and then avoid using them. If I turned on IP forwarding globally and then relied on a firewall to block undesired forwarding, my life would probably be simpler.)

linux/InterfaceCarrierLossHasEffects written at 00:24:59

2020-01-24

Go compared to Python for small scale system administration scripts and tools

We write a certain number of scripts and tools around here. I like Go and have used it for a while, we have some tools already written in Go, and while I'm also a long term user of Python I'm on record as being unhappy with various developments around Python 3. Despite all of this, Python is the programming language I default to when I need to do something that's more complicated than a shell script (and not because of our policy on internal tools). Over time I've come to believe that Python has some important pragmatic properties in our sort of relatively small scale environment with generally modest use of local tools, despite Go's collection of appealing properties.

The first useful property Python has is that you can't misplace the source code for your deployed Python programs. Unless you do something very peculiar, what you deploy is the source code (well, a version of it). With Go you deploy a compiled artifact, which means that you may someday have to find the source code and then try to match your compiled binary up against some version of it. Of course your deployed Python program can drift out of sync with the master copy in your version control repository, but sorting that out only requires use of diff.

Closely related to this is that Python code is generally simple for people to modify and re-deploy. Modest Python scripts and tools are likely to be only a single .py file, which you can edit and copy around, and even somewhat bigger ones are likely to just be a directory that can be copied. Deploying Go code requires not just the correct source code but also a Go development environment and the knowledge of how to build Go programs from source. With Python, you can even try things out by just modifying the deployed version in place, then back-port your eventual changes to the official master copy in your version control system.

(These days Go's support for modules makes all of this simpler than it used to be, but there are still important considerations and some potential complexities.)

I further feel that for people who're only mildly familiar with the language, Python code is easier to make minor modifications to. Python's dynamic typing makes for a relatively forgiving environment for many quick things or small changes. Python 3 throws a small spanner into this with its insistence that you absolutely can't mix spaces and tabs (go on, try to explain that one to someone who's just quickly adding a line in a general editor without being familiar with Python). Between the language itself and the need to compile things, Go sets a higher bar for quick changes to fix up some issue.

(As a corollary, I think that Go code in a small environment is much more likely to wind up being 'owned' by only one person, with everyone else relying on them for any changes no matter how small. This is a natural outcome of needing more specialized knowledge to work with the Go code.)

These things aren't issues for Go if you've already made a commitment to it. If you have plenty of larger scale tools written in Go because of its advantages (or you've just standardized on it), you'll already have solved all of these problems; you're keeping track of source code and versions, people know Go and have build environments, everyone can confidently change Go code, and so on.

(And once you're used to the simplicity of copying a single self contained binary artifact around, various aspects of the Python experience of working with lots of modules will irritate you.)

Overall, my view is that Go is a 'go big or go home' language for system administration tools. It works at the large scale quite well, but it doesn't necessarily scale down to occasional use for small things in the way that Python scripts can. Python scripts tolerate casual environments much more readily than Go does. Most smaller scale environments like ours won't be able to commit to Go this way, if only because we simply don't do all that much programming on an ongoing basis.

(This is related to the convenience of people writing commands in Python.)

Sidebar: My view on writing small tools

I also feel that it's simpler and faster to develop relatively small programs in Python than in Go. If I want to process some text in a way that's a bit too complicated for a sensible shell script (for example), Python gives me an environment where it's very easy to put something together and then iterate rapidly on it as I come to understand more about what I want (and the problem). Using Go would put a lot more up-front work between me and running code that solves my problem. All of this makes it more natural (for me) to use Python for small programs, such as parsing program output to generate VPN activity metrics.

(Large programs unquestionably benefit from the discipline that Go requires. It's possible to write clean large Python programs, but it's also easy to let them drift into awkward, 1600 line tangled semi-monstrosities as they grow step by step.)

sysadmin/SysadminGoVsPython written at 01:24:34

2020-01-23

What we've written in Go at work and how it came about (as of January 2020)

In comments on yesterday's entry, Joseph asked about what we're using Go for here at work. As it happens, there's a story or two here, because the starting point is that some years ago we decided to standardize on using only a few languages for our internal tools (later updated to switch to Python 3) and Go was not one of those languages. This wasn't because I didn't like Go (I've used it for a fairly long while); instead it was a tradeoff. Go didn't bring anything strongly useful for what we were doing and it would have been another language and toolchain for my co-workers to learn and deal with.

(Systems and tools we adopt from the outside can be in anything because we don't write and maintain them ourselves; they're mostly black boxes. As it happens, a lot of Prometheus related programs are written in Go, so in that sense we're running and using a fairly significant amount of Go code in production. But it's not our Go code.)

The first thing we wrote in Go was a SSH host key checker that is a core part of our local NFS mount authentication system. In one mode of operation, this checker has two challenges: we want to check a lot of machines at once in as short a time as possible, and doing the SSH handshake involves cryptography and so is somewhat CPU intensive. Since Go has reasonably CPU-efficient native code cryptography and a great story for parallelism, it was clearly the easiest way to write an efficient and fast program to do this. Python with threading or some sort of select() multiplexing would have been more awkward and likely taken significantly more CPU on our fileservers, and I didn't want to think about writing a security sensitive program like this in C with threading and third party SSH libraries.
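To give a rough illustration of why Go is a natural fit here, a minimal sketch of gathering host keys from many machines in parallel might look like the following. This is not our actual checker; the hosts are made up and the error handling is minimal:

package main

import (
	"fmt"
	"net"
	"sync"
	"time"

	"golang.org/x/crypto/ssh"
)

// hostKey connects to addr just far enough to capture the server's host
// key, then aborts the handshake. The captured key can then be compared
// against a known-good list.
func hostKey(addr string) (ssh.PublicKey, error) {
	var captured ssh.PublicKey
	cfg := &ssh.ClientConfig{
		User:    "probe",
		Timeout: 10 * time.Second,
		HostKeyCallback: func(hostname string, remote net.Addr, key ssh.PublicKey) error {
			captured = key
			// Returning an error aborts the connection once we
			// have what we came for.
			return fmt.Errorf("host key captured")
		},
	}
	conn, err := ssh.Dial("tcp", addr, cfg)
	if conn != nil {
		conn.Close()
	}
	if captured == nil {
		return nil, err
	}
	return captured, nil
}

func main() {
	hosts := []string{"host1.example.org:22", "host2.example.org:22"}
	var wg sync.WaitGroup
	for _, h := range hosts {
		wg.Add(1)
		go func(h string) {
			defer wg.Done()
			key, err := hostKey(h)
			if err != nil {
				fmt.Printf("%s: error: %v\n", h, err)
				return
			}
			fmt.Printf("%s: %s\n", h, ssh.FingerprintSHA256(key))
		}(h)
	}
	wg.Wait()
}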

The other two pieces of locally written Go that we're running are both part of our Prometheus metrics system, which makes them somewhat optional; if they don't work, we just miss some secondary metrics, and if they malfunction we can turn them off. One generates SNTP metrics (using code adopted from a SNTP query program I wrote) and the other generates metrics from Linux's NFS mountstats. I consider both of these programs to be interesting stories about some advantages Go has.

The SNTP metrics program is basically a thin wrapper around a third party package for making SNTP queries. It's written in Go because such a package exists in the Go package ecology (and I could find it), and Go makes using third party packages easy (partly because Go compiles to a single binary that includes everything). Had there been an equivalent Python package that I could have easily found and used, the SNTP metrics program could just as well have been in Python (although then we'd have had to deal with the usual hassles of having a Python program with multiple modules).

I wrote the NFS mountstats metrics program in Go because I was worried about the CPU impact of processing all of our voluminous mountstats in Python (my worry may have been misplaced). CPU efficiency is another one of Go's advantages (as seen in our SSH host key checker), although we also got to use a third party package to parse mountstats which made it easier to write.

(The package we use is also likely to be better tested and more carefully written than something I would knock together just for our own use, regardless of the language that I wrote it in.)

I have opinions on Go as compared to Python in our particular unusual environment that don't fit within the margins of this entry. The short version is that I don't feel any urge to write most new things in Go instead of Python. Python remains our routine choice; Go is for things with special needs or where it offers special benefits. Being able to basically wrap someone else's package that does all of the work is a special benefit, as is CPU efficiency and Go's great concurrency story.

sysadmin/OurGoUses-2020-01 written at 00:27:21

2020-01-22

Why I've come to like that Go's type inference is limited

Although Go is a statically typed language, it has some degree of type inference to make your life easier and less bureaucratic. However, this type inference is limited to within a single function, so Go won't do things like infer the return type of your function for you even though it could. When I first started writing Go code (at the time, primarily coming from Python), I found this limitation irritating. The Go compiler could perfectly well see the types involved (and would complain if I got them wrong), so it felt annoying that it made me declare them again. Over time, I've come to appreciate this limitation and find it a good thing.

The obvious problem you avoid by limiting type inference to only within a single function is what I will call 'spooky type errors at a distance'. If return types can be inferred, you can have an entire chain of functions with inferred return types; you call A who calls B who calls C who calls D, and you use the result in some way that requires it to be of a certain type (or compatible with some interface). Now suppose D changes its return type. This change in return type will propagate up through the chain of inferred types until it hits your function and generates a type error where the new inferred type isn't compatible with what you're doing with it any more. D's type change has propagated to cause errors not in C but in you, far away from the change itself.

(One advantage of the type error happening in C is that C is the code that directly deals with D, and so it's the code and the people who are most familiar with what's going on, what they could change to deal with it, and so on. You and your function may have no real understanding of D or even have never heard of it before.)
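To make the shape of this concrete, here's the toy call chain written out in today's Go, where every return type has to be declared:

package main

import "fmt"

// Imagine D's return type changes from int to something else. Because C
// declares its own return type, the compile error shows up right here in
// C, D's direct caller, and what C exposes to B and A only changes if C
// deliberately changes it.
func D() int { return 9000 }
func C() string { return fmt.Sprintf("count=%d", D()) }
func B() string { return C() }
func A() string { return B() }

func main() {
	fmt.Println(A())
}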

Avoiding spooky type errors at a distance also means that you avoid arguments and decisions about where they should be fixed. With specified return types, if D's return type changes, either C must absorb the change or C must change its own API by visibly changing its return type. If return types are inferred, you could maybe fix this anywhere in the call stack, from you on down. Each different fix would probably have different implications, some of them hard to track. With fixed return types, you avoid all that; it's always clear who has to change next and what the likely consequences are.

As a consequence of all of this, the effects of changing a return type are much more visible and obvious. With type inference for return types, you can tell yourself that no one will notice right up until the point that actually someone does. I've done this in my own Python code, when I forgot that some usage far away from the equivalent of function D depended on some property that I was now silently changing.

Since Go is designed in the service of large scale software engineering, I think this is the right trade-off for Go to make. Spooky action at a distance is exactly what you don't want in something designed for large scale software engineering, because that far off thing is probably written and maintained by an entirely different bunch of people from you. Even spooky action at a distance within your own package makes the effects and impact of changes less clear. Being straightforward is an advantage.

(When imagining a hypothetical Go with return type inference, let's assume that you don't allow type inference across package boundaries, because going that far opens a very large can of worms. This would mean that exported names had different type rules than unexported ones.)

programming/GoLimitedTypeInferenceLike written at 01:11:02

2020-01-20

The value of automation having ways to shut it off (a small story)

We have some old donated Dell C6220 blades that we use as SLURM based compute servers. Unfortunately, these machines appear to have some sort of combined hardware and software fault that causes them to lock up periodically under some loads (building Go from source with full tests is especially prone to triggering it). Fortunately these machines support IPMI and so can be remotely power cycled, and a while back we got irritated enough at the lockups that we set up their IPMIs and built a simple cron-based set of scripts to do this for us automatically.

(The scripts take the simple approach of detecting down machines through looking for alerts in our Prometheus system. To avoid getting in our way, they only run outside of working hours; during the working day, if a Dell C6220 blade goes down we have to run the 'power cycle a machine via IPMI' script by hand against the relevant machine. This lets us deliberately shut down machines without having them suddenly restarted on us.)

All of these Dell C6220 blades are located in a secondary machine room that has the special power they need. Unfortunately, this machine room's air conditioner seems to have developed some sort of fault where it just stops working until you turn it off, wait a bit, and turn it back on. Of course this isn't happening during the working day; instead it's happened in the evenings or at night (twice, recently). When this happens and we see the alerts from our monitoring system, we notify the relevant people and then power off all or almost all of the servers in the room, including the Dell C6220 blades.

You can probably see where this is going. Fortunately we thought of the obvious problem here before we started powering down the C6220 blades, so both times we just manually disabled the cron job that auto-restarts them. However, you can probably imagine what sort of problems we might have if we had a more complex and involved system to automatically restart nodes and servers that were 'supposed' to be up; in an unusual emergency situation like this, we could be fighting our own automation if we hadn't thought ahead to build in some sort of shutoff switch.

Or in short, when you automate something, think ahead to how you'll disable the automation if you ever need to. Everything needs an emergency override, even if that's just 'remove the cron job that drives everything'.

It's fine if this emergency stop mechanism is simple and brute force. For example, our simple method of commenting out the cron job is probably good enough for us. We could build a more complex system (possibly with finer-grained controls), but it would require us to remember (or look up) more about how to shut things off.

We could also give the auto-restart system some safety features. An obvious one would be to get the machine room temperature from Prometheus and refuse to start up any of the blade nodes if it's too hot. This is a pretty specific safety check, but we've already had two AC incidents in close succession so we're probably going to have more. A more general safety check would be to refuse to turn on blades if there were too many down, on the grounds that a lot of blades being down is almost certainly not because of the problem that the script was designed to deal with.
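As a sketch of what that first safety check could look like (here in Go, with a made-up Prometheus URL, metric name, and threshold), the restart script could ask Prometheus's query API for the current room temperature and refuse to do anything if it's too high or unavailable:

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strconv"
)

const promURL = "http://prometheus.example.org:9090" // hypothetical server
const tempQuery = `machine_room_temp_celsius{room="secondary"}` // hypothetical metric name
const maxSafeTemp = 30.0 // hypothetical threshold in degrees C

// roomTemp runs an instant query against Prometheus's HTTP API and
// returns the single value it expects to get back.
func roomTemp() (float64, error) {
	resp, err := http.Get(promURL + "/api/v1/query?query=" + url.QueryEscape(tempQuery))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	// Minimal decoding of the standard query API response format.
	var r struct {
		Data struct {
			Result []struct {
				Value [2]interface{} `json:"value"`
			} `json:"result"`
		} `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return 0, err
	}
	if len(r.Data.Result) == 0 {
		return 0, fmt.Errorf("no temperature data")
	}
	v, ok := r.Data.Result[0].Value[1].(string)
	if !ok {
		return 0, fmt.Errorf("unexpected value format")
	}
	return strconv.ParseFloat(v, 64)
}

func main() {
	t, err := roomTemp()
	if err != nil || t > maxSafeTemp {
		fmt.Println("refusing to power cycle blades:", t, err)
		os.Exit(1)
	}
	fmt.Println("machine room temperature is okay:", t)
	// ... go on to the IPMI power cycling ...
}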

sysadmin/AutomationShutoffValue written at 23:54:20

2020-01-19

Python 2, Apache's mod_wsgi, and its future in Linux distributions

Sometimes I have small questions about our future with Python 2, instead of big ones. Our Django web application currently runs in Apache using mod_wsgi, and the last time we attempted a Django upgrade (which is a necessary step in an upgrade to Python 3), it didn't go well. This means that we may wind up caring quite a bit about how long Ubuntu and other Linux distributions will package a version of mod_wsgi that still supports Python 2, instead of just Python 3 (assuming that the Linux distribution even provides Python 2 at all).

Fedora 31 currently still provides a Python 2 sub-package of mod_wsgi, but this should be gone in Fedora 32 since it naturally depends on Python 2 and all such (sub-)packages are supposed to be purged. Debian's 'unstable' also currently seems to have the Python 2 version of mod_wsgi, but it's included in Debian's list of Python 2 related packages to be removed (via), so I suspect it will be gone from the next stable Debian release.

(Debian is also getting rid of Python 2 support for uwsgi, which could be another way of running our WSGI application under Apache.)

What Ubuntu 20.04 will look like is an interesting question. Right now, the in-progress state of Ubuntu 'focal' (what will be 20.04) includes a libapache2-mod-wsgi package using Python 2. However, this package is listed in Ubuntu's list of Python 2 related packages to remove (via). Ubuntu could still remove the package (along with others), or it could now be too close to the release of 20.04 for the removal to be carried through by then.

(I believe that Ubuntu usually freezes their package set a decent amount of time before the actual release in order to allow for testing, and perhaps especially for LTS releases. I may be wrong about this, because the Ubuntu Focal Fossa release schedule lists the Debian import freeze as quite late, at the end of February.)

Even if the Python 2 version of mod_wsgi manages to stay in Ubuntu 20.04 LTS (perhaps along with other Python 2 WSGI gateways), it will definitely be gone by the time of Ubuntu 22.04, which is when we'd normally upgrade the server that currently hosts our Django web app. So by 2022, we need to have some solution for our Python 2 problem with the app, whatever it is.

python/Python2ApacheWsgiFuture written at 22:17:05

Why a network connection becoming writable when it succeeds makes sense

When I talked about how Go deals with canceling network connection attempts, I mentioned that it's common for the underlying operating system to signal you that a TCP connection (or more generally a network connection) has been successfully made by letting it become writable. On the surface this sounds odd, and to some degree it is, but it also falls out of what the operating system knows about a network connection before and after it's made. Also, in practice there is a certain amount of history tied up in this particular interface.

If we start out thinking about being told about events, we can ask what events you would see when a TCP connection finishes the three way handshake and becomes established. The connection is now established (one event), and you can generally now send data to the remote end, but usually there's no data from the remote end to receive so you would not get an event for that. So we would expect a 'connection is established' event and a 'you can send data' event. If we want a more compact encoding of events, it's quite tempting to merge these two together into one event and say that a new TCP connection becoming writable is a sign that its three way handshake has now completed.

(And you certainly wouldn't expect to see a 'you can send data' event before the three way handshake finishes.)

The history is that a lot of the fundamental API of asynchronous network IO comes from BSD Unix and spread from there (even to non-Unix systems, for various reasons). BSD Unix did not use a more complex 'stream of events' API to communicate information from the kernel to your program; instead it used simple and easy to implement kernel APIs (because this was the early 1980s). The BSD Unix API was select(), which passes information back and forth using bitmaps: one bitmap for sending data, one bitmap for receiving data, and one bitmap for 'exceptions' (whatever they are). In this API, the simplest way for the kernel to tell programs that the three way handshake has finished is to set the relevant bit in the 'you can send data' bitmap. The kernel has to set that bit anyway, and setting another bit in the 'exceptions' bitmap as well would be extra work for it (and for programs; in fact some of them just rely on the writability signal, because it's simpler for them).
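For the curious, the classic dance at the system call level looks roughly like this. This is a Linux-flavoured sketch using Go's golang.org/x/sys/unix package; the address is just an example and the error handling is crude:

package main

import (
	"fmt"
	"syscall"

	"golang.org/x/sys/unix"
)

func main() {
	// The classic BSD-style asynchronous connect: start a non-blocking
	// connect(), then wait in select() for the socket to show up in the
	// 'writable' bitmap, which is the signal that the three way
	// handshake has finished one way or the other.
	fd, err := unix.Socket(unix.AF_INET, unix.SOCK_STREAM|unix.SOCK_NONBLOCK, 0)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	// 127.0.0.1:22 is just a stand-in address for the example.
	sa := &unix.SockaddrInet4{Port: 22, Addr: [4]byte{127, 0, 0, 1}}
	if err := unix.Connect(fd, sa); err != nil && err != unix.EINPROGRESS {
		panic(err)
	}

	wfds := &unix.FdSet{}
	wfds.Set(fd)
	if _, err := unix.Select(fd+1, nil, wfds, nil, nil); err != nil {
		panic(err)
	}

	// Writable only means 'the connect attempt is over'; SO_ERROR tells
	// us whether it actually succeeded.
	soerr, err := unix.GetsockoptInt(fd, unix.SOL_SOCKET, unix.SO_ERROR)
	if err != nil {
		panic(err)
	}
	if soerr == 0 {
		fmt.Println("connected")
	} else {
		fmt.Println("connect failed:", syscall.Errno(soerr))
	}
}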

Once you're doing this for TCP connections, it generally makes sense for all connections regardless of type. There are likely to be very few stream connection types where it makes sense to signal that you can now send (more) data partway through the connection being established, and that's the only case where this use of signaling writability gets in the way.

tech/ConnectingAndWritability written at 01:09:43
