Wandering Thoughts archives


Why Let's Encrypt's short certificate lifetimes are a great thing

I recently had a conversation on Twitter about what we care about in TLS certificate sources, and it got me to realize something. I've written before about how our attraction to Let's Encrypt has become all about the great automation, but what I hadn't really thought about back then was how important the short certificate lifetimes are. What got me really thinking about it was a hypothetical; suppose we could get completely automatically issued and renewed free certificates, but they had the typical lifetime of a year or more that most TLS certificates have had to date. Would we be interested? I realized that we would not be, and that we would probably consider the long certificate lifetime to be a drawback, not a feature.

There is a general saying in modern programming to the effect that if you haven't tested it, it doesn't work. In system administration, we tend towards a modified version of that saying; if you haven't tested it recently, it doesn't work. Given our generally changing system environments, the recently is an important qualification; it's too easy for things to get broken by changes around them, so the longer it's been since you tried something, the less confidence you can have in it. The corollary for infrequent certificate renewal is obvious, because even in automated systems things can happen.

With Let's Encrypt, we don't just have automation; the short certificate lifetime ensures that we exercise it frequently. Our client of choice (acmetool) renews certificates when they're 30 days from expiring, so although the official Let's Encrypt lifetime is 90 days, we roll over certificates every sixty days. Having a rollover happen once every two months is great for building and maintaining our confidence in the automation, in a way that wouldn't happen if it was once every six months, once a year, or even less often. If it was that infrequent, we'd probably end up paying attention during certificate rollovers even if we let automation do all of the actual work. With the frequent rollover due to Let's Encrypt's short certificate lifetimes, they've become things we trust enough to ignore.
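
To give a concrete idea of how little machinery is involved, the renewal side is just a periodic job. This is only a sketch of the sort of cron.d entry I mean (the schedule here is made up), not our exact setup:

  # run acmetool's reconcile pass daily; it only renews certificates that
  # are within the renewal window (30 days before expiry for us)
  30 4 * * *  root  acmetool --batch reconcile

The important property is that this is boring; it does nothing on most days and quietly rolls certificates over when they get close enough to expiry.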

(Automatic certificate renewal for long duration certificates is not completely impossible here, because the university central IT has already arranged for free certificates for the university. Right now they're managed through a website and our university-wide authentication system, but in theory there could be automation for at least renewals. Our one remaining non Let's Encrypt certificate was issued through this service as a two year certificate.)

sysadmin/LetsEncryptDurationGood written at 01:24:45


A spammer misses a glorious opportunity

Most of the spam that I collect on the machines that I run my sinkhole SMTP server on is boring spam. Since it's boring, I've tried to block as much of it as possible; still, there are plenty of cases that get through, because that sort of spam can come from all over. Today I got what initially looked like one of those boring spams that sneak through. It appeared in my log like this:

[...] from / <REDACTED@justice.gov.za> to <REDACTED>: [...] helo 'mail3.justice.gov.za' [...]

I saw that and shrugged; clearly it was another forged advance fee fraud spam, just like the ones claiming to be from the FBI. But when I looked at the full metadata of the logged message, I got a surprise. There in the metadata was the resolved, verified DNS name of the sending IP, and it was mail3.justice.gov.za. This wasn't email pretending to be from South Africa's Department of Justice; this actually was email from one of the DoJ's mail servers. The reverse DNS is real and valid, and in fact this IP is one of the four MX servers for justice.gov.za (a second MX server is right beside it in that /24).
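
For what it's worth, the checking here is just a couple of DNS lookups. A sketch, with a placeholder address standing in for the redacted sending IP:

  # reverse DNS of the sending IP (192.0.2.10 is a placeholder)
  dig +short -x 192.0.2.10
  # the domain's advertised MX servers, to compare against
  dig +short MX justice.gov.za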

So why do I call this a spammer missing a glorious opportunity? Well, let me show you the important bits of the spam message itself:

From: REDACTED <REDACTED@justice.gov.za>
To: "info@cc.com" <info@cc.com>

To All,

Today Monday 12th of March 2018. We are shutting down your present web-mail to create space for 2018 Outlook Web Access with a high visual definition and Space.
This service creates more space and easy access to email. Please update your account by clicking on the link below and fill information for Activation.


That's right. Given the golden opportunity of access to the real, legitimate mail servers of the Department of Justice of South Africa (likely via a compromised account), the spammer used it to send not the most genuine looking advance fee fraud you could imagine, but a garden variety, completely untargeted phish spam.

Of course there are decent, boring reasons for this. For a start, the actual IP address source of advance fee fraud spam is completely unimportant, because the recipients who will even think of checking that aren't the kind of people who will fall for the spam in the first place. If anything, advance fee fraud spammers apparently may deliberately make their spam look bad and suspicious, so that anyone who actually answers is highly likely to be gullible enough to go through with the whole thing, instead of wasting their time. If that's so, sending from the real justice.gov.za is, if anything, a thing to avoid.

Still, I wish the spam message had been advance fee fraud. That's the way the universe should be when you get the chance to use justice.gov.za for your spam.

spam/SpammerMissedOpportunity written at 22:54:33

Linux is good at exposing the truth of how motherboards are wired

One of the things I've learned over time, sometimes the hard way, is that Linux and other open source operating systems are brutally honest about how various things on motherboards are actually hooked up. As a result, they are a good way of exposing any, well, let us call them 'convenient inaccuracies' in how motherboard manuals present things. The major source of inaccuracies that I've tended to run across has been SATA port numbering, and on servers we've also had Ethernet port numbering issues.

(Non-servers tend not to have issues with Ethernet port numbering because they have at most one. Servers can have multiple ones, sometimes split between multiple chipsets for extra fun.)

Typical motherboards present a nice clear, coherent picture of their SATA port numbering and how it winds up in physical ports on the motherboard. Take, for example, the Asus Prime X370-Pro, a Ryzen motherboard that I happen to have some recent experience with. The manual for this motherboard, the board itself, and the board's BIOS, will all tell you that it has eight SATA ports, numbered 1 through 8. Each set of ports uses a dual connector and those connectors are in a row, with 1-2 on the bottom running up through 7-8 at the top.

(As usual, the manual doesn't tell you whether the top port or the bottom port in a dual connector is the lower numbered one. It turns out to be the top one. I don't count this as an inaccuracy as everything agrees on it once you can actually check.)

Linux will tell you that this is not accurate. From the bottom up, the ports actually run 1-2, 5-6, 3-4, 7-8; that is, the middle pairs of ports have been flipped (but not the two ports within a pair of ports; the lower numbered one is still on the top connector). This shows up in Linux's /dev/sd* enumeration, the underlying ataN kernel names, and Linux SCSI host names, and all of them are consistent with this reversed numbering. I assume that any open source OS would show the same results, since they're all likely looking directly at what the hardware tells them and ignoring any BIOS tables that might attempt to name various things.
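
If you want to see this mapping on your own hardware, one rough way is to poke around in sysfs. This is only a sketch and it assumes an ordinary AHCI SATA setup where every sdX disk sits behind an ataN port:

  # map each sdX disk to the kernel's ataN port number
  for d in /sys/block/sd*; do
    printf '%s -> %s\n' "${d##*/}" "$(readlink -f "$d/device" | grep -o 'ata[0-9]*')"
  done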

(I don't know if the BIOS exposes its port naming in any OS-visible tables, but it seems at least plausible that the BIOS does. Certainly it seems likely to cause confusion in Windows users if the OS calls the devices one thing and the BIOS calls them another, and BIOS vendors are usually pretty motivated to not confuse Windows users. The motherboard's DMI/SMBIOS data does appear to have some information about the SATA ports, although I don't know if DMI contains enough detail to match up specific SATA ports with their DMI names.)

I have to assume that motherboard makers have good reasons for such weird port numbering issues. Since I have very little knowledge here, all I can do is guess and speculate, and the obvious speculation is wire routing issues that make it easier to flip some things around. Why only the middle two sets of ports would be flipped is a mystery, though.

(This is not the first time I've had to figure out the motherboard SATA port numbering; I think that was one of the issues here, for example, although there is no guarantee that the BIOS mapping matches the mapping on the physical motherboard and in the manual.)

tech/MotherboardWiringLies written at 00:42:29


A bad web scraper operating out of OVH IP address space

I'll start with my tweet:

I've now escalated to blocking entire OVH /16s to deal with the referer-forging web scraper that keeps hitting my techblog from OVH network space; they keep moving around too much for /24s.

I have strong views on forged HTTP referers, largely because I look at my Referer logs regularly and bogus entries destroy the usefulness of those logs. Making my logs noisy or useless is a fast and reliable way to get me to block sources from Wandering Thoughts. This particular web scraper hit a trifecta of things that annoy me about forged referers; the referers were bogus (they were for URLs that don't link to here), they were generated by a robot instead of a person, and they happened at volume.

The specific Referer URLs varied, but when I looked at them they were all for the kind of thing that might plausibly link to here; they were all real sites and often for recent blog entries (for example, one Referer URL used today was this openssl.org entry). Some of the Referers have utm_* query parameters that point to Feedburner, suggesting that they came from mining syndication feeds. This made the forged Referers more irritating, because even in small volume I couldn't dismiss them out of hand as completely implausible.

(Openssl.org is highly unlikely to link to here, but other places used as Referers were more plausible.)

The target URLs here varied, but whatever software is doing this appears to be repeatedly scraping only a few pages instead of trying to spider around Wandering Thoughts. At the moment it appears to mostly be trying to scrape my recent entries, although I haven't done particularly extensive analysis. The claimed user agents vary fairly widely and cover a variety of browsers and especially of operating systems; today a single IP address claimed to be a Mac (running two different OS X versions), a Windows machine with Chrome 49, and a Linux machine (with equally implausible Chrome versions).

The specific IP addresses involved vary but they've all come from various portions of OVH network space. Initially there were few enough /24s involved in each particular OVH area that I blocked them by /24, but that stopped being enough earlier this week (when I made my tweet) and I escalated to blocking entire OVH /16s, which I will continue to do as needed. Although this web scraper operates from multiple IP addresses, they appear to add new subnets only somewhat occasionally; my initial set of /24 blocks lasted for a fair while before they started getting through with new sources. So far this web scraper has not appeared anywhere outside of OVH, and with its Referer forging behavior I would definitely notice if it did.

(I've considered trying to block only OVH requests with Referer headers in order to be a little specific, but doing that with Apache's mod_rewrite appears likely to be annoying and it mostly wouldn't help any actual people, because their web browser would normally send Referer headers too. If there are other legitimate web spiders operating from OVH network space, well, I suggest that they relocate.)
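
For illustration, the sort of mod_rewrite block I was contemplating looks roughly like this. It's an untested sketch that uses Apache 2.4's expression syntax, with a documentation network standing in for a real OVH range:

  RewriteEngine On
  # refuse requests from the (placeholder) network only when they carry a Referer
  RewriteCond %{HTTP_REFERER} !^$
  RewriteCond expr "-R '192.0.2.0/24'"
  RewriteRule ^ - [F]

You can see the problem right in the conditions; any human visitor coming from those networks who followed a link here would also be sending a non-empty Referer and so would be blocked along with the scraper, which is why being more specific this way doesn't buy much.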

I haven't even considered sending any report about this to OVH. Among many other issues, I doubt OVH would consider this a reason to terminate a paying customer (or to pressure a customer to terminate a sub-customer). This web scraper does not appear to be attacking me, merely sending web requests that I happen not to like.

(By 'today' I mean Saturday, which is logical today for me as I write this even if the clock has rolled past midnight.)

Sidebar: Source count information

Today saw 159 requests from 31 different IP addresses spread across 18 different /24s (and 10 different /16s). The most prolific IPs were the following:


None of these seem to be on any prominent DNS blocklists (not that I really know what's a prominent DNS blocklist any more, but they're certainly not on the SBL, unlike some people who keep trying).
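
The counting itself is nothing fancy. A sketch, assuming I've already filtered the scraper's requests into a hypothetical scraper.log with the client IP as the first field:

  # distinct source IPs
  awk '{print $1}' scraper.log | sort -u | wc -l
  # distinct /24s
  awk '{print $1}' scraper.log | awk -F. '{print $1 "." $2 "." $3}' | sort -u | wc -l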

web/OVHBadWebScraper written at 01:49:40


In Fedora, your initramfs contains a copy of your sysctl settings

It all started when I discovered that my office workstation had wound up with its maximum PID value set to a very large number (as mentioned in passing in this entry). I managed to track this down to a sysctl.d file from Fedora's ceph-osd RPM package, which I had installed for reasons that are not entirely clear to me. That was straightforward. So I removed the package, along with all of the other ceph packages, and rebooted for other reasons. To my surprise, this didn't change the setting; I still had a kernel.pid_max value of 4194304. A bunch of head scratching ensued, including extreme measures like downloading and checking the Fedora systemd source. In the end, the culprit turned out to be my initramfs.

In Fedora, dracut copies sysctl.d files into your initramfs when it builds one (generally when you install a kernel update), and there's nothing that forces an update or rebuild of your initramfs when something modifies what sysctl.d files the system has or what they contain. Normally this is relatively harmless; you will have sysctl settings applied in the initramfs and then reapplied when sysctl runs a second time as the system is booting from your root filesystem. If you added new sysctl.d files or settings, they won't be in the initramfs but they'll get set the second time around. If you changed sysctl settings, the initramfs versions of the sysctl.d files will set the old values but then your updated settings will get set the second time around. But if you removed settings, nothing can fix that up; the old initramfs version of your sysctl.d file will apply the setting, and nothing will override it later.

(In Fedora 27's Dracut, this is done by a core systemd related Dracut module in /usr/lib/dracut/modules.d, 00systemd/module-setup.sh.)

It's my view that this behavior is dangerous. As this incident and others have demonstrated, any time that normal system files get copied into initramfs, you have the chance that the live versions will get out of sync with the versions in initramfs and then you can have explosions. The direct consequence of this is that you should strive to put as little in initramfs as possible, in order to minimize the chances of problems and confusion. Putting a frozen copy of sysctl.d files into the initramfs is not doing this. If there are sysctl settings that have to be applied in order to boot the system, they should be in a separate, clearly marked area and only that area should go in the initramfs.

(However, our Ubuntu 16.04 machines don't have sysctl.d files in their initramfs, so this behavior isn't universal and probably isn't required by either systemd or booting in general.)

Since that's not likely to happen any time soon, I guess I'm just going to have to remember to rebuild my initramfs any time I remove a sysctl setting. More broadly, I should probably adopt a habit of preemptively rebuilding my initramfs any time something inexplicable is going on, because that might be where the problem is. Or at least I should check what the initramfs contains, just in case Fedora's dracut setup has decided to capture something.
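
Checking and rebuilding is at least straightforward with dracut's own tools. A sketch for the running kernel, assuming Fedora's usual /boot layout:

  # see which sysctl.d files are frozen into the current initramfs
  lsinitrd /boot/initramfs-$(uname -r).img | grep 'sysctl\.d'
  # rebuild the initramfs for the running kernel
  dracut --force /boot/initramfs-$(uname -r).img $(uname -r)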

(It's my opinion that another sign that this is a bad idea in general is that there's no obvious package to file a bug against. Who is at fault? As far as I know there's no mechanism in RPM to trigger an action when files in a certain path are added, removed, or modified, and anyway you don't necessarily want to rebuild an initramfs by surprise.)

PS: For extra fun you actually have multiple initramfses; you have one per installed kernel. Normally this doesn't matter because you're only using the latest kernel and thus the latest initramfs, but if you have to boot an earlier kernel for some reason the files captured in its initramfs may be even more out of date than you expect.

linux/FedoraInitramfsSysctl written at 23:00:24

Some questions I have about DDR4 RAM speed and latency in underclocked memory

Suppose, not hypothetically, that you're putting together an Intel Core i7 based machine, specifically an i7-8700, and you're not planning to overclock. All Coffee Lake CPUs have an officially supported maximum memory rate of 2666 MHz (regardless of how many DIMMs or what sort of DIMM they are, unlike Ryzens), so normally you'd just buy some suitable DDR4 2666 MHz modules. However, suppose that the place you'd be ordering from is out of stock on the 2666 MHz CL15 modules you'd normally get, but has faster ones, say 3000 MHz CL15, for essentially the same price (and these modules are on the motherboard's qualified memory list).

At this point I have a bunch of questions, because I don't know what you can do if you use these higher speed DDR4-3000 CL15 DIMMs in a system. I can think of a number of cases that might be true:

  • The DIMMs operate as DDR4-2666 CL15 memory. Their faster speed does nothing for you now, although with a future CPU and perhaps a future motherboard they would speed up.

    (Alternately, perhaps underclocking the DIMMs has some advantage, maybe slightly better reliability or slightly lower power and heat.)

    The DIMMs can run at 2666 MHz but at a lower latency, say CL14, since DDR4-3000 CL15 has an absolute time latency of 10.00 ns and 2666 MHz CL14 is over that at 10.5 ns (if I'm doing the math right; there's a worked version of this arithmetic just after this list).

    This might require activating an XMP profile in the BIOS, or it might happen automatically if what matters to this stuff is the absolute time involved, not the nominal CLs. However, according to the Wikipedia entry on CAS latency, synchronous DRAM cares about the clock cycles involved and so CL15 might really be CL15 even when you're underclocking your memory. DDR4 is synchronous DRAM.

  • The DIMMs can run reliably at memory speeds faster than 2666 MHz, perhaps all the way up to their rated 3000 MHz; this doesn't count as CPU overclocking and is fine on the non-overclockable i7-8700.

    (One possibility is that any faster than 2666 MHz memory listed on the motherboard vendor's qualified memory list is qualified at its full speed and can be run reliably at that speed, even on ordinary non-overclockable i7 CPUs. That would be nice, but I'm not sure I believe the PC world is that nice.)

  • The system can be 'overclocked' to run the DIMMs faster than 2666 MHz (but perhaps not all the way to the rated 3000 MHz), even on an i7-8700. However this is actual overclocking of the overall system (despite it being within the DIMMs' speed rating), is not necessarily stable, and the usual caveats apply.

  • You need an overclockable CPU such as an i7-8700K in order to run memory any faster than the officially supported 2666 MHz. You might still be able to run DDR4-3000 CL15 at 2666 MHz CL14 instead of CL15 on a non-overclockable CPU, since the memory frequency is not going up, the memory is just responding faster.
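
As mentioned in the second case above, the latency arithmetic is simple: the absolute CAS latency in nanoseconds is CL times 2000 divided by the transfer rate in MT/s (because the transfer rate is twice the memory clock). A quick sketch to check my numbers:

  awk 'BEGIN {
    printf "DDR4-3000 CL15: %.2f ns\n", 15 * 2000 / 3000   # 10.00 ns
    printf "DDR4-2666 CL14: %.2f ns\n", 14 * 2000 / 2666   # 10.50 ns
    printf "DDR4-2666 CL15: %.2f ns\n", 15 * 2000 / 2666   # 11.25 ns
  }'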

Modern DIMMs apparently generally come with XMP profile(s) (see also the wikichip writeup) that let suitable BIOSes more or less automatically run them at their official rated speed, instead of the official JEDEC DDR4 standard speeds. Interestingly, based on the Wikipedia JEDEC table even DDR4-2666 CL15 is not JEDEC standard; the fastest DDR4-2666 CL the table lists is CL17. This may mean that turning on an XMP profile is required merely to get 2666 MHz CL15 even with plain standard DDR4-2666 CL15 DIMMs. That would be weird, but PCs are full of weird things. One interesting potential consequence of this could be that if you have DDR4-3000 CL15 DIMMs, you can't easily run them at 2666 MHz CL15 instead of 2666 MHz CL17 because the moment you turn on XMP they'll go all the way up to their rated 3000 MHz CL15.

(I learn something new every time I write an entry like this.)

PS: People say that memory speed isn't that important, but I'm not sure I completely believe them and anyway, if I wind up with DIMMs rated for more than 2666 MHz I'd like to know what they're good for (even if the answer is 'nothing except being available now instead of later'). And if one can reliably get somewhat lower latency and faster memory for peanuts, well, it's at least a bit tempting.

tech/DDR4RAMSpeedQuestions written at 01:56:50


Some things I mean when I talk about 'forged HTTP referers'

One of the most reliable and often the fastest ways to get me to block people from Wandering Thoughts is to do something that causes my logs to become noisy or useless. One of those things is persistently making requests with inaccurate Referer headers, because I look at my Referer logs on a regular basis. When I talk about this, I'll often use the term 'forged' here, as in 'forged referers' or 'referer-forging web spider'.

(I've been grumpy about this for a long time.)

I have casually used the term 'inaccurate' up there, as well as the strong term 'forged'. But given that the Referer header is informational, explicitly comes with no guarantees, and is fully under the control of the client, what does that really mean? As I use it, I tend to have one of three different meanings in mind.

First, let's say what an accurate referer header is: it's when the referer header value is an honest and accurate representation of what happened. Namely, a human being was on the URL in the Referer header and clicked on a link that sent them to my page, or on the site if you only put the site in the Referer. A blank Referer header is always acceptable, as are at least some Referer headers that aren't URLs if they honestly represent what a human did to wind up on my page.

An inaccurate Referer in the broad sense is any Referer that isn't accurate. There are at least two ways for it to be inaccurate (even if it is a human action). The lesser inaccuracy is if the source URL contains a link to my page, but it doesn't actually represent how the human wound up on my page, it's just a (random) plausible value. Such referers are inaccurate now but could be accurate in other circumstances. The greater inaccuracy is if the source URL doesn't even link to my page, so it would never be possible for the Referer to be accurate. Completely bogus referers are usually more irritating than semi-bogus referers, although this is partly a taste issue (both are irritating, honestly, but one shows you're at least trying).

(I'd like better terms for these two sorts of referers; 'bogus' and 'plausible' are the best I've come up with so far.)

As noted, I will generally call both of these cases 'forged', not just 'inaccurate'. Due to my view that Referer is a human only header, I use 'forged' for basically all referers that are provided by web spiders and the like. I can imagine circumstances when I'd call Referer headers sent by a robot as merely 'inaccurate', but they'd be pretty far out and I don't think I've ever run into them.

The third case and the strongest sense of 'forged' for me is when the Referer header has clearly been selected because the web spider is up to no good. One form of this is Referer spamming (which seems to have died out these days, thankfully). Another form is when whatever is behind the requests looks like it's deliberately picking Referer values to try to evade any security precautions that might be there. A third form is when your software uses the Referer field to advertise yourself in some way, instead of leaving this to the User-Agent field (which has happened, although I don't think I've seen it recently).

(Checking for appropriate Referer values is a weak security precaution that's easy to bypass and not necessarily a good idea, but like most weak security precautions it does have the virtue of making it pretty clear when people are deliberately trying to get around it.)

PS: Similar things apply when I talk about 'forged' other fields, especially User-Agent. Roughly speaking, I'll definitely call your U-A forged if you aren't human and it misleads about what you are. If you're a real human operating a real browser, I consider it your right to use whatever U-A you want to, including completely misleading ones. Since I'm human and inconsistent, I may still call it 'forged' in casual conversation for convenience.

web/ForgedRefererMyMeanings written at 23:30:55

The lie in Ubuntu source packages (and probably Debian ones as well)

I tweeted:

One of the things that pisses me off about the Debian and Ubuntu source package format is that people clearly do not actually use it to build packages; they use other tools. You can tell because of how things are broken.

(I may have been hasty in tarring Debian with this particular brush but it definitely applies to Ubuntu.)

Several years ago I wrote about one problem with how Debian builds from source packages, which is that it doesn't have a distinction between the package's source tree and the tree that the package is built in and as a result building the package can contaminate the source tree. This is not just a theoretical concern; it's happened to us. In fact it's now happened with both the Ubuntu 14.04 version of the package and then the Ubuntu 16.04 version, which was contaminated in a different way this time.

This problem is not difficult to find or notice. All you have to do is run debuild twice in the package's source tree and the second one will error out. People who are developing and testing package changes should be doing this all the time, as they build and test scratch versions of their package to make sure that it actually has what they want, passes package lint checks, and so on.
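
To be concrete, this is roughly the reproduction I mean, sketched with a placeholder package name (and skipping package signing, since these are scratch builds):

  apt-get source somepackage       # fetch and unpack the source package
  cd somepackage-*/
  debuild -us -uc                  # first build: works
  debuild -us -uc                  # second build: errors out, because the first
                                   # build contaminated the source tree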

Ubuntu didn't find this issue, or if they found it they didn't care enough to fix it. The conclusion is inescapable; the source package and all of the documentation that tells you to use debuild on it is a lie. The nominal source package may contain the source code that went into the binary package (although I'm not sure you can be sure of that), but it's not necessarily an honest representation of how the package is actually built by the people who work on it and as a result building the package with debuild may or may not reproduce the binary package you got from Ubuntu. Certainly you can't reliably use the source package to develop new versions of the binary package; one way or another, you will have to use some sort of hack workaround.

(RPM based distributions should not feel too smug here, because they have their own package building issues and documentation problems.)

I don't build many Ubuntu packages. That I've stumbled over two broken packages, broken in two different ways, out of the few that I've tried to rebuild strongly suggests to me that this is pretty common. I could be unlucky (or lucky), but I think it's more likely that I'm getting a reasonably representative random sample.

PS: If Ubuntu and/or Debian care about this, the solution is obvious, although it will slow things down somewhat. As always, if you really care about something you must test it and if you don't bother to test it when it's demonstrably a problem, you probably don't actually care about it. This is not a difficult test to automate.

(Also, if debuild is not what people should be using to build or rebuild packages these days, various people have at least a documentation problem.)

linux/UbuntuPackageBuildingLie written at 01:43:26


Getting chrony to not try to use IPv6 time sources on Fedora

Ever since I switched over to chrony, one of the quiet little irritations of its setup on my office workstation has been that it tried to use IPv6 time sources alongside the IPv4 ones. It got these time sources from the default Fedora pool I'd left it using alongside our local time sources (because I'm the kind of person who thinks the more time sources the merrier), and at one level looking up IPv6 addresses as well as IPv4 addresses is perfectly sensible. At another level, though, it wasn't, because my office workstation has no IPv6 connectivity and even no IPv6 configuration. All of those IPv6 time sources that chrony was trying to talk to were completely unreachable and would never work. At a minimum they were clutter in 'chronyc sources' output, but probably they were also keeping chrony from picking up some additional IPv4 sources.

I started out by reading the chrony.conf manpage, on the assumption that that would be where you configured this. When I found nothing, I unwisely gave up and grumbled to myself, eventually saying something on Twitter. This caused @rt2800pci1 to suggest using systemd restrictions so that chronyd couldn't even use IPv6. This had some interesting results. On the one hand, chronyd definitely couldn't use IPv6 and it said as much:

chronyd[4097894]: Could not open IPv6 command socket : Address family not supported by protocol

On the other hand, this didn't stop chronyd from trying to use IPv6 addresses as time sources:

chronyd[4097894]: Source 2620:10a:800f::14 replaced with 2620:10a:800f::11

(I don't know why my office workstation has such high PIDs at the moment. Something odd is clearly going on.)

However, this failure caused me to actually read the chronyd manpage, where I finally noticed the -4 command line option, which tells chrony to only use IPv4 addresses for everything. On Fedora, you can configure what options are given to chronyd in /etc/sysconfig/chronyd, which is automatically used by the standard Fedora chronyd.service systemd service for chrony(d). A quick addition and chrony restart, and now it's not trying to use IPv6 and I'm happy.
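
For the record, the actual change is tiny. Sketching from memory, the sysconfig file winds up with this and then chronyd gets restarted:

  # /etc/sysconfig/chronyd
  OPTIONS="-4"

  # and then:
  systemctl restart chronyd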

There are a number of lessons here. One of them is my perpetual one, which is that I should read the manual pages more often (and make sure I read all of them). There was no reason to stop with just the chrony.conf manpage; I simply assumed that not using IPv6 would be configured there if it was configurable at all. I was wrong and I could have had my annoyance fixed quite a while ago if I'd looked harder.

Another one, on the flipside, is that completely disabling IPv6 doesn't necessarily stop modern programs from trying to use it. Perhaps this is a bug on chrony's part, but I suspect that its authors will be uninterested in fixing it. It's likely becoming a de facto standard that Linux systems have IPv6 enabled, even if they don't have it configured and can't reach anything with it. Someday we're going to see daemons that bind themselves only to the IPv6 localhost, not the IPv4 one.

linux/ChronyDisableIPv6 written at 22:28:36


The value locked up in the Unix API makes it pretty durable

Every so often someone proposes or muses about replacing Unix with something more modern and better, or is surprised when new surface OSes (such as ChromeOS) are based on Unix (often Linux, although not always). One reason that this keeps happening and that some form of Unix is probably going to be with us for decades to come is that there is a huge amount of value locked up in the Unix API, and in more ways than are perhaps obvious.

The obvious way that a great deal of value is locked up in the Unix API is the kernels themselves. Whether you look at Linux, FreeBSD, OpenBSD, or even one of the remaining commercial Unixes, all of their kernels represent decades of developer effort. Some of this effort is in the drivers, many of which you could do without in an OS written from scratch for relatively specific hardware, but a decent amount of the effort is in core systems like physical and virtual memory management, process handling, interprocess communication, filesystems and block level IO handling, modern networking, and so on.

However, this is just the tip of the iceberg. The bigger value of the Unix API is in everything that runs on top of it. This comes in at least two parts. The first part is all of the user level components that are involved to boot and run Unix and everything that supports them, especially if you include the core of a graphical environment (such as some form of display server). The second part is all of the stuff that you run on your Unix as its real purpose for existing, whether this is Apache (or some other web server), a database engine, your own custom programs (possibly written in Python or Ruby or whatever), and so on. It's also the support programs for this, which blur the lines between the 'system' and being productive with it; a mailer, a nice shell, an IMAP server, and so on. Then you can add an extra layer of programs used to monitor and diagnose the system and another set of programs if you develop on it or even just edit files. And if you want to use the system as a graphical desktop there is an additional stack of components and programs that all use aspects of the Unix API either directly or indirectly.

All of these programs represent decades or perhaps centuries of accumulated developer effort. Throwing away the Unix API in favour of something else means either doing without these programs, rewriting your own versions from scratch, or porting them and everything they depend on to your new API. Very few people can afford to even think about this, much less undertake it for a large scale environment such as a desktop. Even server environments are relatively complex and multi-layered in practice.

(Worse, some of the Unix API is implicit instead of being explicitly visible in things like system calls. Many programs will expect a 'Unix' to handle process scheduling, memory management, TCP networking, and a number of other things in pretty much the same way that current Unixes do. If your new non-Unix has the necessary system calls but behaves significantly differently here, programs may run but not perform very well, or even malfunction.)

Also, remember that the practical Unix API is a lot more than system calls. Something like Apache or Firefox pretty much requires a large amount of the broad Unix API, not just the core system calls and C library, and as a result you can't get them up on your new system just by implementing a relatively small and confined compatibility layer. (That's been tried in the past and pretty much failed in practice, and is one reason why people almost never write programs to strict POSIX and nothing more.)

(This elaborates on a tweet of mine that has some additional concrete things that you'd be reimplementing in your non-Unix.)

unix/UnixAPIDurableValue written at 18:51:44
