Wandering Thoughts archives


OpenBSD has to be a BSD Unix and you couldn't duplicate it with Linux

OpenBSD has a well deserved reputation for putting security and a clean system (for code, documentation, and so on) first, and everything else second. OpenBSD is of course based on BSD (it's right there in the name) and descends from FreeBSD NetBSD (you can read the history here). But one of the questions you could ask about it is whether it had to be that way, and in particular if you could build something like OpenBSD on top of Linux. I believe that the answer is no.

Linux and the *BSDs have a significantly different model of what they are. BSDs have a 'base system' that provides an integrated and fully operational core Unix, covering the kernel, C library and compiler, and the normal Unix user level programs, all maintained and distributed by the particular BSD. Linux is not a single unit this way, and instead all of the component parts are maintained separately and assembled in various ways by various Linux distributions. Both approaches have their advantages, but one big one for the BSD approach is that it enables global changes.

Making global changes is an important part of what makes OpenBSD's approach to improving security, code maintenance, and so on work. Because it directly maintains everything as a unit, OpenBSD is in a position to introduce new C library or kernel APIs (or change them) and then immediately update all sorts of things in user level programs to use the new API. This takes a certain amount of work, of course, but it's possible to do it at all. And because OpenBSD can do this sort of ambitious global change, it does.

This goes further than just the ability to make global changes, because in theory you can patch in global changes on top of a bunch of separate upstream projects. Because OpenBSD is in control of its entire base system, it's not forced to try to reconcile different development priorities or integrate clashing changes. OpenBSD can decide (and has) that only certain sorts of changes will be accepted into its system at all, no matter what people want. If there are features or entire programs that don't fit into what OpenBSD will accept, they just lose out.

(I suspect that this decision on priorities gives OpenBSD has more leverage to push other people in directions that it wants, because the OpenBSD developers are clearly willing to remove support for something if they feel strongly enough about it. For example, I suspect that their new system call origin verification is going to eventually force Go to make system calls only through OpenBSD's C library, contrary to what Go prefers to do.)

unix/OpenBSDMustBeABSD written at 22:55:27; Add Comment

Filenames and paths should be a unique type and not a form of strings

I recently read John Goerzen's The Fundamental Problem in Python 3, which talks about Python 3's issues in environments where filenames (and other things) are not in a uniform and predictable encoding. As part of this, he says:

[...]. Critically, most of the Python standard library treats a filename as a String – that is, a sequence of valid Unicode code points, which is a subset of the valid POSIX filenames.


From a POSIX standpoint, the correct action would have been to use the bytes type for filenames; this would mandate proper encode/decode calls by the user, but it would have been quite clear. [...]

This is correct only from a POSIX standpoint, and then only sort of (it's correct in traditional Unix filesystems but not necessarily all current ones; some current Unix filesystems can restrict filenames to properly encoded UTF-8). The reality of modern life for a language that wants to work on Windows as well as Unix is that filenames must be presented as a unique type, not any form of strings or bytes.

How filenames and paths are represented depends on the operating system, which means that for portability filenames and paths need to be an opaque type that you have to explicitly insert string-like information into and extract string-like information out of, specifying the encoding if you don't want an opaque byte sequence of unpredictable contents. As with all encoding related operations, this can fail in both directions under some circumstances.

Of course this is not the Python 3 way. The Python 3 way is to pretend that everything is fine and that the world is all UTF-8 and Unicode. This is pretty much the pragmatically correct choice, at least if you want to have Windows as a first class citizen of your world, but it is not really the correct way. As with all aspects of its handling of strings and Unicode, Python 3 chose convenience over reality and correctness, and has been patching up the resulting mess on Unix since its initial release.

If Python was going to do this correctly, Python 3 would have been the time to do it; since it was breaking things in general, it could have introduced a distinct type and required that everything involving file names change to taking and returning that type. But that would have made porting Python 2 code harder and would have made it less likely that Python 3 was accepted by Python programmers, which is probably one reason it wasn't done.

(I don't think it was the only one; early Python 3 shows distinct signs that the Python developers had more or less decided to only support Unix systems where everything was proper UTF-8. This turned out to not be a viable position for them to maintain, so modern Python 3 is somewhat more accommodating of messy reality.)

python/FilenamesUniqueType written at 01:46:30; Add Comment

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.