Wandering Thoughts

2018-09-22

The ed(1) command in the Single Unix Specification (and the SVID)

When I wrote my entry on some differences between various versions of ed(1), I forgot to check the Single Unix Specification to see if it has anything to say about ed. It turns out that it does, and ed is part of the 'Shell & Utilities' section.

SUS ed is mostly the same as FreeBSD ed, which is kind of what I think of as 'the baseline modern ed'. SUS ed requires that s support a bunch of flags for printing the results (only GNU ed documents supporting them all), but it doesn't require a z command (to print paginated output). Interestingly, SUS requires that ed support a prompt and extra help, and that it print warnings if you try to do something that would lose a modified buffer. SUS ed is only required to support SUS Basic Regular Expressions, while all modern eds go at least somewhat beyond this; FreeBSD ed supports '\<' and '\>', for example.

One area of ed's history that I don't know very much about is how it evolved in System III and System V. SCO's website (of all people) has a PDF version of the System V Interface Definition online here, and for as long as it lasts you want to look in volume 2 for the manpage for ed. The SVID ed is mostly the same as FreeBSD ed, and in particular it has '\<' and '\>' in its regular expressions, unlike SUS ed. Its s command doesn't support the l, n, or p flags (required by SUS but not in FreeBSD). It does have prompting and help. I think that ed is the only editor required by the SVID, which may explain why the System V people enhanced it over the V7 and BSD eds; in BSD, the development of vi (and before it ex) probably made ed a relatively secondary editor.

Conveniently, the SUS ed documentation includes a rationale section that includes a discussion of historical differences between BSD and SVID behavior, commands supported, and so on. Of course, 'BSD' in the SUS basically means 'UCB BSD through 4.4 or so', as I don't think the SUS looks at what modern FreeBSD and OpenBSD are doing any more than it looks at Linux or GNU programs.

(My previous entry and this one only look at obvious differences like commands supported. I'm sure that there are plenty of subtle behavior differences between various versions of ed, both old and modern ones; the SUS ed rationale points out some of them, and GNU ed's --traditional switch suggests other points of difference for it.)

EdInSingleUnixSpecAndSVID written at 22:33:42

Some differences between various versions of ed(1)

Today, the versions of ed(1) that people are most likely to encounter and experiment with are GNU ed on Linux, and FreeBSD ed and OpenBSD ed on their respective systems. GNU ed is a new implementation written from scratch, while I believe that the *BSD ed is a descendant from the original V7 ed by way of 4.x BSD ed, such as 4.4BSD ed. However, a great deal of what's said about ed(1), especially in stuff from the 1980s, is basically about the original V7 ed. This sort of matters because these various versions have added and changed some things, so the experience with the original V7 ed is not quite the same as what you'd have today.

Every modern version of ed offers a '-p' flag that provides a command prompt (and a P command to toggle the prompt on and off, with a default prompt of '*'), but this is not in either V7 ed or 4.4BSD ed. This idea appeared in some versions of ed very early; for example, in the early 1980s people at the University of Toronto modified the early versions of ed from V7 and 4.x BSD into 'UofT ed', and one of its standard features was that it basically always had a command prompt.

(As far as I know, UofT ed was strongly focused on interactive use, especially by undergraduate students.)

Line addressing is almost the same in all versions. All modern versions support a bare ';' to mean 'from here to the end of the file', and the *BSD versions support a bare '%' as an alias for a bare ',' (ie, 'the entire file'). Modern versions of ed support more regular expression features (such as character classes), but unsurprisingly haven't changed the basics. GNU ed has the most features here, but OpenBSD and FreeBSD turn out to differ slightly from each other in their choices for some syntax.

GNU ed has the most additions and changes in commands. It's the only ed with a cut buffer for transferring content between files (the x and y commands) and the only one with # as a comment command, and it has various options to print out lines that an s command has acted on.

Compared to modern versions of ed, V7 ed and 4.4BSD ed don't have a version of the e, r, or w commands that run a program instead of using a file (modern 'e !<command>' et al), G or V commands for interactively editing lines that match or don't match a regular expression, the h or H commands to actually explain errors, or the n command to print lines with line numbers. V7 ed doesn't have either z (to print the buffer in pages) or wq, but 4.4BSD ed has both. The V7 s command is quite minimal, with no support for printing out the lines that s operated on; 4.4 BSD may have some support for this, although it's not clearly documented. Both V7 and 4.4BSD only let you substitute the first match or all matches in the line; all modern eds let you substitute the Nth match if you want. According to its manpage, V7 ed would let you do a q or e with a modified buffer without any warning; everything else warns you and requires you to do it again. V7 ed also has a limited u command that only applies to the last line edited; from 4.4BSD onward u was extended to undo all effects of multi-line commands like d and g.
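To make the Nth-match difference concrete, here is a hypothetical Python sketch of the semantics of the modern 's/re/repl/N' form (the function name and the approach are mine, not from any actual ed implementation, which works on its own internal line buffer):

```python
import re

def sub_nth(pattern, repl, line, n):
    # Replace only the Nth match on the line, like the modern ed
    # 's/re/repl/N' form; V7 and 4.4BSD only offered first or all.
    count = 0
    def replace(m):
        nonlocal count
        count += 1
        return repl if count == n else m.group(0)
    return re.sub(pattern, replace, line)

print(sub_nth("a", "X", "banana", 2))  # banXna
print(sub_nth("a", "X", "banana", 1))  # bXnana
```

(If there are fewer than N matches, nothing is replaced, which is also what you'd expect from ed.)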

Update: V7 ed has the E command, whose existence implicitly indicates that ed does warn you on e with a modified buffer; I suspect it also warns you on a q. It's just that V7 doesn't feel that this needs to be mentioned in the manpage. Everyone else documents it explicitly.

In general, modern versions of ed are more friendly and powerful than the original V7 ed and even than 4.4BSD ed, but they probably aren't substantially so. The experience of using ed today is thus pretty close to the original V7 experience, especially if you don't turn on a prompt or extended help information about errors.

I was actually a bit surprised by how relatively little ed has changed since V7, and also how comparatively much has changed since 4.4 BSD. Apparently a number of people in both the *BSDs and GNU really cared about making ed actively friendlier for interactive editing, since prompts and more help only appear after 4.4BSD.

(I suspect that modern versions of ed can edit larger files and ones with longer lines than the original V7 ed, but you probably don't want to do that anyway. V7 ed appears to have had some support for editing files larger than could fit easily in memory, judging by some cautions in its manpage.)

Sidebar: UofT ed

Now that I've carefully read its manpage, UofT ed is an interesting departure from all of this. It has a number of additions to its regular expressions; '\!' matches control characters apart from tabs, '\_' matches a non-empty sequence of tabs and spaces, '\{' and '\}' match the start and end of 'identifiers', '\?' is a shortest-match version of '*', and it has '|' for regular expression alternation, which not even GNU ed has opted to add. In line addressing, UofT ed added '&', which means 'a page of lines'. For commands, it added b which is essentially the modern z, an h command to print help on the last error message or on various topics, an o command to set some options including whether regular expressions are case-independent, and a special z 'zap' command that allows 'interactive' modification of one line. It also supports ! in e, r, and w and allows specifying which match to use in s expressions.

The UofT ed zap command is especially interesting because it's an attempt to provide a visual way of modifying a line without departing from ed's fundamental approach to editing; in other words, it's an attempt to deal with ed's fundamental limitation. I'm going to quote the start of the manpage's documentation to give you the flavour:

The zap command allows one line at a time to be modified according to graphical requests. The line to be modified is typed out, and then the modify request is read from the terminal (even if the z command is in a global command); this is done repeatedly until the modify request is empty. Generally each character in the request specifies how to modify the character immediately above it, in the original line, as described in the following table.

Zap allows you to delete characters, replace them, overwrite or insert characters, or replace the entire rest of the line with something else, and you can iterate this until you're happy with the result. Because I can, here is an authentic example of z in action:

; ed
[...]
*p
abc def ghi
*z
abc def ghi
    ####
abc ghi
^def 
def abc ghi
        fred
def abc fred
    $afunc(a, b)
def afunc(a, b)

*

This is a kind of silly example, but you can see how we can successively modify the line while still being able to see what we're doing. This is an especially clever approach because ed (the program) doesn't have to switch to character at a time input to implement it; ed is still reading a line at a time, but it's created a way for you to use this visually.

(Interested parties are encouraged to implement this for GNU ed, and if you are one I can give you a complete description of the various characters for z. You'd have to pick another command letter, though, since modern eds use z for paginated output of the buffer.)
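As a sketch of what such an implementation might handle, here is a Python reconstruction of the zap semantics that are visible in the transcript above. This covers only the request characters the transcript shows (space to keep, '#' to delete, '^' to insert, '$' to replace the rest of the line, and anything else to overwrite); the real UofT ed table had more, so treat this as my inference, not the actual specification:

```python
def zap(line, request):
    # One round of UofT ed's z ('zap') command, reconstructed from the
    # transcript above. The request is lined up under the original line
    # and each character says what to do to the character above it.
    if not request:
        return line
    result = []
    i = 0
    for j, ch in enumerate(request):
        if ch == ' ':        # keep the character above
            if i < len(line):
                result.append(line[i])
            i += 1
        elif ch == '#':      # delete the character above
            i += 1
        elif ch == '^':      # insert the rest of the request here
            result.append(request[j + 1:])
            result.append(line[i:])
            return ''.join(result)
        elif ch == '$':      # replace the rest of the line
            result.append(request[j + 1:])
            return ''.join(result)
        else:                # overwrite the character above
            result.append(ch)
            i += 1
    result.append(line[i:])
    return ''.join(result)

# Replaying the transcript's session:
line = "abc def ghi"
for req in ["    ####", "^def ", "        fred", "    $afunc(a, b)"]:
    line = zap(line, req)
    print(line)
```

Run this and you get the same sequence of intermediate lines as in the transcript, ending with 'def afunc(a, b)'.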

EdVersionsDifferences written at 20:51:30

2018-09-02

NFS directory reading and directory file type information

NFS has always had an operation to read directories (unsurprisingly called READDIR). In NFS v2, this operation simply returned a list of names (and 'fileids', ie inode numbers). One of the things that NFS v3 introduced was an extended version of this, called READDIRPLUS that returns some additional information along with the directory listing. This new operation was motivated by the observation that NFS clients often immediately followed a READDIR operation by a bunch of additional NFS calls to get additional information on many or all of the names in the directory. In light of the fact that file type information is available in Unix directories at least some of the time (on many Unixes), I found myself wondering if this file type information was sufficient for an NFS server to implement READDIRPLUS, so that such a Unix could satisfy READDIRPLUS requests purely from reading the directory itself.

As far as I can see, unfortunately the answer is that it isn't. Directory file type information only gives you the file type of each name, while NFS v3's READDIRPLUS operation is specified in RFC 1813 as returning full information on each name, what the standard calls a fattr3 (defined on page 22). This is basically the same as what you get from stat(), and this implies that the NFS server has to read each inode to pull up this information. That's kind of a pity, at least for NFS v3, and one of the consequences is that you can't get the same type of high-efficiency file type scanning over NFS v3 as you can locally.

We have historically used NFS v3 only and so I default to mostly looking at it. However, there's also NFS v4 (specified in RFC 7530), and once I looked at it, it turns out to be different in an important way. NFS v4 has only a READDIR operation, but it has been defined to allow the NFS client to specify what attributes of each name it wants to get back. An NFS v4 client can thus opt to ask for only fileids and file type information, which permits an NFS v4 server to satisfy the READDIR request purely from reading the directory, without having to stat() each file in the directory. Even in the possible case where file type information isn't known for all files, the NFS v4 server would only have to stat() some files, not all of them.

With that said, I don't know if NFS v4 clients actually make such limited READDIR requests or if they actually ask NFS v4 servers to give them enough extra information that the server has to stat() everything. Sadly, one thing clients could sensibly want to know to save time is the NFS filehandle of each name, and the filehandle generally requires information that needs a stat().

(Learning this about NFS v4 may make us more interested in trying to use it, assuming that we can make everything work in NFS v4 with traditional 'sec=sys' Unix style 'trust the client's claims about UIDs and GIDs' security.)

NFSReaddirAndDType written at 01:38:19

2018-08-25

The history of file type information being available in Unix directories

The two things that Unix directory entries absolutely have to have are the name of the directory entry and its 'inode', by which we generically mean some stable kernel identifier for the file that will persist if it gets renamed, linked to other directories, and so on. Unsurprisingly, directory entries have had these since the days when you read the raw bytes of directories with read(), and for a long time that was all they had; if you wanted more than the name and the inode number, you had to stat() the file, not just read the directory. Then, well, I'll quote myself from an old entry on a find optimization:

[...], Unix filesystem developers realized that it was very common for programs reading directories to need to know a bit more about directory entries than just their names, especially their file types (find is the obvious case, but also consider things like 'ls -F'). Given that the type of an active inode never changes, it's possible to embed this information straight in the directory entry and then return this to user level, and that's what developers did; on some systems, readdir(3) will now return directory entries with an additional d_type field that has the directory entry's type.
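This is, for example, what Python's os.scandir() builds on; a DirEntry's is_dir() and is_file() can often be answered from the directory entry's d_type alone, without a stat() of the file (Python itself falls back to stat() when the type comes back as DT_UNKNOWN):

```python
import os
import tempfile

# Make a scratch directory with one file and one subdirectory.
scratch = tempfile.mkdtemp()
os.mkdir(os.path.join(scratch, "subdir"))
open(os.path.join(scratch, "afile"), "w").close()

types = {}
with os.scandir(scratch) as entries:
    for entry in entries:
        # With follow_symlinks=False, these checks can be answered from
        # d_type alone on filesystems that supply it; otherwise Python
        # quietly stat()s the entry for you.
        if entry.is_dir(follow_symlinks=False):
            types[entry.name] = "dir"
        elif entry.is_file(follow_symlinks=False):
            types[entry.name] = "file"

print(types)
```

(This is also why 'ls -F' and find can be fast on local filesystems; they never have to touch the inodes of most files.)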

On Twitter, I recently grumbled about Illumos not having this d_type field. The ensuing conversation wound up with me curious about exactly where d_type came from and how far back it went. The answer turns out to be a bit surprising due to there being two sides of d_type.

On the kernel side, d_type appears to have shown up in 4.4 BSD. The 4.4 BSD /usr/src/sys/dirent.h has a struct dirent that has a d_type field, but the field isn't documented in either the comments in the file or in the getdirentries(2) manpage; both of those admit only to the traditional BSD dirent fields. This 4.4 BSD d_type was carried through to things that inherited from 4.4 BSD (Lite), specifically FreeBSD, but it continued to be undocumented for at least a while.

(In FreeBSD, the most convenient history I can find is here, and the d_type field is present in sys/dirent.h as far back as FreeBSD 2.0, which seems to be as far as the repo goes for releases.)

Documentation for d_type appeared in the getdirentries(2) manpage in FreeBSD 2.2.0, where the manpage itself claims to have been updated on May 3rd 1995 (cf). In FreeBSD, this appears to have been part of merging 4.4 BSD 'Lite2', which seems to have been done in 1997. I stumbled over a repo of UCB BSD commit history, and in it the documentation appears in this May 3rd 1995 change, which at least has the same date. It appears that FreeBSD 2.2.0 was released some time in 1997, which is when this would have appeared in an official release.

In Linux, it seems that a dirent structure with a d_type member appeared only just before 2.4.0, which was released at the start of 2001. Linux took this long because the d_type field only appeared in the 64-bit 'large file support' version of the dirent structure, and so was only returned by the new 64-bit getdents64() system call. This would have been a few years after FreeBSD officially documented d_type, and probably many years after it was actually available if you peeked at the structure definition.

(See here for an overview of where to get ancient Linux kernel history from.)

As far as I can tell, d_type is present on Linux, FreeBSD, OpenBSD, NetBSD, Dragonfly BSD, and Darwin (aka MacOS or OS X). It's not present on Solaris and thus Illumos. As far as other commercial Unixes go, you're on your own; all the links to manpages for things like AIX from my old entry on the remaining Unixes appear to have rotted away.

Sidebar: The filesystem also matters on modern Unixes

Even if your Unix supports d_type in directory entries, it doesn't mean that it's supported by the filesystem of any specific directory. As far as I know, every Unix with d_type support has support for it in their normal local filesystems, but it's not guaranteed to be in all filesystems, especially non-Unix ones like FAT32. Your code should always be prepared to deal with a file type of DT_UNKNOWN.

(Filesystems can implement support for file type information in directory entries in a number of different ways. The actual on disk format of directory entries is filesystem specific.)

It's also possible to have things the other way around, where you have a filesystem with support for file type information in directories that's on a Unix that doesn't support it. There are a number of plausible reasons for this to happen, but they're either obvious or beyond the scope of this entry.

DirectoryDTypeHistory written at 00:31:39

2018-08-22

Why ed(1) is not a good editor today

I'll start with my tweet:

Heretical Unix opinion time: ed(1) may be the 'standard Unix editor', but it is not a particularly good editor outside of a limited environment that almost never applies today.

There is a certain portion of Unixdom that really likes ed(1), the 'standard Unix editor'. Having actually used ed for a not insignificant amount of time (although it was the friendlier 'UofT ed' variant), I have some reactions to what I feel is sometimes overzealous praise of it. One of these is what I tweeted.

The fundamental limitation of ed is that it is what I call an indirect manipulation interface, in contrast to the explicit manipulation interfaces of screen editors like vi and graphical editors like sam (which are generally lumped together as 'visual' editors, so called because they actually show you the text you're editing). When you edit text in ed, you have some problems that you don't have in visual editors; you have to maintain in your head the context of what the text looks like (and where you are in it), you have to figure out how to address portions of that text in order to modify them, and finally you have to think about how your edit commands will change the context. Copious use of ed's p command can help with the first problem, but nothing really deals with the other two. In order to use ed, you basically have to simulate parts of ed in your head.

Ed is a great editor in situations where the editor explicitly presenting this context is a very expensive or outright impossible operation. Ed works great on real teletypes, for example, or over extremely slow links where you want to send and receive as little data as possible (and on real teletypes you have some amount of context in the form of an actual printout that you can look back at). Back in the old days of Unix, this described a fairly large number of situations; you had actual teletypes, you had slow dialup links (and later slow, high latency network links), and you had slow and heavily overloaded systems.

However, that's no longer the situation today (at least almost all of the time). Modern systems and links can easily support visual editors that continually show you the context of the text and generally let you more or less directly manipulate it (whether that is through cursoring around it or using a mouse). Such editors are easier and faster to use, and they leave you with more brainpower free to think about things like the program you're writing (which is the important thing).

If you can use a visual editor, ed is not a particularly good editor to use instead; you will probably spend a lot of effort (and some amount of time) on doing by hand something that the visual editor will do for you. If you are very practiced at ed, maybe this partly goes away, but I maintain that you are still working harder than you need to be.

The people who say that ed is a quite powerful editor are correct; ed is quite capable (although sadly limited by only editing a single file). It's just that it's also a pain to use.

(They're also correct that ed is the foundation of many other things in Unix, including sed and vi. But that doesn't mean that the best way to learn or understand those things is to learn and use ed.)

This doesn't make ed a useless, vestigial thing on modern Unix, though. There are uses for ed in non-interactive editing, for example. But on modern Unix, ed is a specialized tool, much like dc. It's worth knowing that ed is there and roughly what it can do, but it's probably not worth learning how to use it before you need it. And you're unlikely to ever be in a situation where it's the best choice for interactive editing (and if you are, something has generally gone wrong).

(But if you enjoy exploring the obscure corners of Unix, sure, go for it. Learn dc too, because it's interesting in its own way and, like ed, it's one of those classical old Unix programs.)

EdNoLongerGoodEditor written at 01:17:53

2018-08-19

Why I'm mostly not interested in exploring new fonts (on Unix)

Every so often, an enthusiasm for some new font or set of fonts goes around the corners of the Internet that I pay attention to. It's especially common for monospaced fonts (programmers seem to have a lot of opinions here), but people get enthused by proportional ones as well. I used to be reasonably interested in this area and customized my fonts and sometimes my font rendering, but these days I find that I generally take at most a cursory look at new fonts and tune out.

The first reason I tune out is limited Unicode glyph coverage. As I found out when I dug into it, xterm doesn't require your primary font to have CJK glyphs, but as far as I know it does require your font to have pretty much everything else, including Cyrillic, Arabic, and Hebrew glyphs, if you're not going to see 'glyph missing' boxes. I don't necessarily look at stuff in these languages in my xterms all that often, but it comes up every so often and I'd like to have things available in any new font, since I pretty much do currently.

(I have to admit that the most common place these character sets come up is when I'm looking at spam messages, email and otherwise.)

For proportional fonts, things are less clear to me. Firefox apparently still uses glyphs from multiple fonts as necessary (based on eg this bug), even with modern XFT/FreeType fonts. I believe that Gnome and probably KDE have some sort of system for this, which sometimes matters for programs like Liferea (syndication feeds) and Corebird (Twitter), but I have no idea how it all works or is controlled. If all of this works, it means that my nominal main proportional font wouldn't need complete glyph coverage, although I might have to do a bunch of configuration tweaking.

(Back in the pre-XFT days, Firefox used to have a quite complex and arcane system for finding glyphs to render, since X fonts often had very small coverage. If things went wrong this could produce rather bizarre results; Firefox was perfectly capable of switching fonts around in the middle of lines or words, even in nominally European text, if it decided that for some reason your primary font didn't have the necessary character and it had to fish it out from some other, very different font. All of this is fortunately behind us and I've mercifully forgotten most of the details.)

All of this matters because most alternate fonts don't have a very wide glyph coverage. This isn't surprising; drawing a lot of glyphs is a lot of work, and creating glyphs for non-Latin characters requires its own set of type design expertise in how those characters are supposed to look. In practice most artisanal new fonts are likely to be pretty much confined to Latin and near-Latin characters.

The bigger reason that I tune out these days is that standard Unix XFT fonts and font display are now pretty good. Back in the old pre-XFT days, X fonts (bitmapped or TrueType) were generally pretty limited and often not of particularly high quality, and rendering decisions could be questionable or oriented at hardware that you didn't have. This could make it very worthwhile to seek out an alternate font, because it could significantly improve your reading experience. I used an alternate bitmapped font in Firefox for years, and even after that stopped being a good idea I switched to an alternate TrueType font (one or the other is visible in this view of my desktop, although I no longer remember if it was the bitmapped 'Garamond' font or TTF Georgia).

The modern XFT world is much different. The current fonts that are mostly used and shipped by major desktop environments and Linux distributions are pretty well designed and have increasingly good glyph coverage (primarily this is DejaVu), and if you don't like them, there are alternates (eg Google's Noto). I've also come around to finding that the rendering decisions are generally good, which is a change from the past for me.

Given that the default fonts are already pretty good, the incremental improvement I might gain through fiddling around with my fonts doesn't appear to be worth the effort and the uncertainty. I could spend a lot of time tinkering, trying out various fonts, and basically wind up with something that was subtly worse because, to be honest, I don't know what I'm actually doing here (and I know enough to know that type design and typography is complex). So the easy path is just to use the defaults.

(This entry is sparked by reading vermaden's FreeBSD Desktop – Part 15 – Configuration – Fonts & Frameworks (via), and thinking about why I had no interest in doing something similar.)

Sidebar: Why Unix XFT fonts and font rendering got good

My impression is that part of it is simply the progress of time creating a slow but steady improvement, but a lot of it is that Unix/Linux font rendering became important enough that people spent money on it. Google obviously spent the money to develop Noto, but they're far from the only people. Sometimes this funding came explicitly, by commissioning font work and so on, and sometimes it came passively, where companies like Red Hat and Canonical hired people and had them spend (work) time on Unix font rendering.

(Android probably helped motivate Google and other parties here.)

MyFontDisinterest written at 01:09:47

2018-08-03

Firefox now implements its remote control partly over D-Bus

On Unix, Firefox has had a long standing feature where you could remote control a running Firefox instance. This has traditionally worked through X properties (with two generations of protocols), which has the nice advantage that it works from remote machines as well as your local one, provided that you're forwarding X. Since I read my mail through exmh that's running on one of our servers, not my desktop, this is pretty useful for me; I can click on a link in mail in exmh, and it opens in my desktop Firefox. However, working through X properties also has the disadvantage that it naturally doesn't work at all on Wayland. Since Wayland is increasingly important, last November or so the Mozilla people fixed this by adding a new D-Bus based protocol (it landed in bug 1360560 and bug 1360566 but has evolved in various ways since then).

On current versions of Firefox, you will find this service on the session bus under the name org.mozilla.firefox.<something>, where the <something> is often 'ZGVmYXVsdA__'. In general this weird thing is the base64 encoded name of your Firefox profile with a few special characters turned into _, and that particular name is, well:

; echo -n default | base64
ZGVmYXVsdA==
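The same computation in Python, along with the character substitution. The exact set of substituted characters is my guess; D-Bus names only allow alphanumerics and '_', so presumably the '=', '/' and '+' characters that base64 can produce all become '_':

```python
import base64

profile = "default"
encoded = base64.b64encode(profile.encode()).decode()
print(encoded)  # ZGVmYXVsdA==

# My guess at the substitution, based on the name Firefox registers:
# turn the non-D-Bus-safe base64 characters into '_'.
dbus_name = encoded.replace("=", "_").replace("/", "_").replace("+", "_")
print("org.mozilla.firefox." + dbus_name)  # org.mozilla.firefox.ZGVmYXVsdA__
```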

Because this directly encodes the profile name in something that you have to get right, the D-Bus based version of Firefox remote control will reliably restrict itself to talking to a running Firefox that's using the same profile; the X properties based version doesn't always (or didn't always, at any rate). You can force a new Firefox to not try to talk to an existing Firefox by using --new-instance, as before.

(One case where you might need this is if you're testing an alternate version of Firefox by manually setting your $HOME to, eg, /tmp/ffox-test.)

It turns out that which protocol Firefox uses when is a bit tangled. If Firefox is built with D-Bus support, a running Firefox on X will be listening for incoming requests using both D-Bus and the X properties based protocol; you can talk to this Firefox with either. In the current Firefox code, if you built with both D-Bus and Wayland support, the client Firefox always uses D-Bus to try to talk to the running 'server' Firefox; it doesn't fall back to X properties if there's no D-Bus available. If you built Firefox without Wayland support, it always uses the X properties based protocol (even if you built with D-Bus, and so the running Firefox is listening there). You can see this sausage being made in StartRemoteClient() here.
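In outline, the client-side choice reads something like this (a Python paraphrase of the behaviour described above, not Mozilla's actual C++; the function and argument names are mine):

```python
def pick_remote_protocol(built_with_dbus, built_with_wayland):
    # Paraphrase of modern Firefox's client-side protocol selection.
    if built_with_dbus and built_with_wayland:
        # Always D-Bus, with no fallback to X properties even if
        # D-Bus turns out to be unavailable at runtime.
        return "dbus"
    # Without Wayland support, always X properties, even if the build
    # has D-Bus and the running Firefox is also listening there.
    return "x-properties"

print(pick_remote_protocol(True, True))    # dbus
print(pick_remote_protocol(True, False))   # x-properties
```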

This logic was introduced in the change for bug 1465371. Before then Firefox tried to use the X properties based remote control if it was running on X, and fell back to the D-Bus protocol otherwise. In thinking about it I've come to believe that the logic here is sound, because in a Wayland session you may have some programs that think they're running in X and then pass this view on to things run from them. D-Bus is more session type agnostic, although it only works on the local machine.

Note that this implies that you can no longer use Firefox itself as a client on a second machine, at least not if your second machine Firefox is a modern one that was built with Wayland support; it'll try to talk D-Bus and fail because your running Firefox isn't on that machine. If you want to remote control Firefox from a second machine, you now want a dedicated client like my ffox-remote program.

(Hopefully Mozilla will leave the X properties based protocol there for many years to come, so my cross-machine remote control will still keep working.)

Sidebar: some D-Bus protocol details

The D-Bus object path is /org/mozilla/firefox/Remote, which has one org.mozilla.firefox method, OpenURL(), all of which you can see by using a D-Bus browsing program such as d-feet. In the Firefox source code, what you want to look at is widget/xremoteclient/DBusRemoteClient.cpp (the client side, ie the firefox command you just ran that is going to pass your URL or whatever to the currently running one) and toolkit/components/remote/nsDBusRemoteService.cpp (the server side, ie the running Firefox).

Despite the fact that D-Feet will tell you that the argument to OpenURL() is a string, in actuality it's an entire command line encoded in the same annoying binary encoding that is used in the current X property based protocol, which you can read a concise description of in nsRemoteService.cpp. Presumably this minimizes code changes, although it's not the most natural D-Bus interface. This encoding does mean that you're going to need some moderately tangled code to remote-control Firefox over D-Bus; you can't fire up just any old D-Bus client program for it.

The client code for this is in toolkit/xre/nsAppRunner.cpp, in the StartRemoteClient() function.

FirefoxDBusRemoteControl written at 23:00:04

2018-07-30

My own configuration files don't have to be dotfiles in $HOME

Back when I started with Unix (a long time ago), programs had a simple approach to where to look for or put little files that they needed; they went into your $HOME as dotfiles, or if the program was going to have a bunch of them it might create a dot-directory for itself. This started with shells (eg $HOME/.profile) and spread steadily from there, especially for early open source programs. When I started writing shell scripts, setup scripts for my X environment, and other bits and pieces that needed configuration files or state files, the natural, automatic thing to do was to imitate this and put my own dotfiles and dot-directories in my $HOME. The entirely unsurprising outcome of this is that my home directories have a lot of dotfiles (some of them very old, which can cause problems). How many is a lot? Well, in my oldest actively used $HOME, I have 380 of them.

(Because dotfiles are normally invisible, it's really easy for them to build up and build up to absurd levels. Not that my $HOME is neat in general, but I have many fewer non-dotfiles cluttering it up.)

Recently it slowly dawned on me that my automatic reflex to put things in $HOME as dotfiles is both not necessary and not really a good idea. It's not necessary because I can make my own code look wherever I want it to, and it's not a good idea because $HOME's dotfiles are a jumbled mess where it's very hard to keep track of things or even to see them. Instead I'm better off if I put my own files in non-dotfile directory hierarchies somewhere else, with sensible names and sensible separation into different subdirectories and all of that.

(I'm not quite sure when and why this started to crystallize for me, but it might have been when I was revising my X resources and X setup stuff on my laptop and realized that there was no particular reason to put them in $HOME/.X<something> the way I had on my regular machines.)

I'm probably not going to rip apart my current $HOME and its collection of dotfiles. Although the idea of a scorched earth campaign is vaguely attractive, it'd be a lot of hassle for no visible change. Instead, I've decided that any time I need to make any substantial change to things that are currently dotfiles, I'll take the opportunity to move them out of $HOME.

(The first thing I did this with was my X resources, which had to change on my home machine due to a new and rather different monitor. Since I was basically gutting them to start with, I decided it made no sense to do it in place in $HOME.)

PS: Modern Unix (mostly Linux) has the XDG Base Directory Specification, which tries to move a lot of things under $HOME/.config, $HOME/.local/share, and $HOME/.cache. In theory I could move my own things under there too. In practice I'm not particularly interested in hiding them away that way; I'd rather put them somewhere more obvious, such as $HOME/share/X11/resources.

MovingOutOfHOME written at 21:36:41

2018-07-16

Why people are probably going to keep using today's Unixes

A while back I wrote about how the value locked up in the Unix API makes it durable. The short version is that there's a huge amount of effort and thus value invested in both the kernels (that provide one level of the Unix API) and in all of the programs and tools and systems that run on top of them, using the Unix APIs. If you start to depart from this API you start to lose access to all of those things.

The flipside of this is why I think people are probably going to keep using current Unixes in the future instead of creating new Unix-like OSes or Unix OSes. To a large extent, the potential value in departing from current Unixes lies in doing things differently at some API level, and once you depart from the API you're fighting the durable power of the Unix API. If you don't depart from the Unix API, it's hard to see much of a point; 'we wrote a different kernel but we still support all of the Unix API' (and variants) don't appear to have all that high a value. You're spending a lot of effort to wind up in essentially the same place.

(There was a day when you could argue that current Unix kernels and systems were fatally flawed and you could make important improvements. Given how well they work today and how much effort they represent, that argument is no longer very convincing. Perhaps we could do better, but can we do lots better, enough to justify the cost?)

In one way this is depressing; it means that the era of many Unixes and many Unix-like OSes flourishing is over. Not only is the cost of departing from Unix too high, but so is the cost of reimplementing it and possibly even keeping up with the leading implementations. The Unixes we have today are likely to be the only Unixes we ever have, and probably not all of them are going to survive over the long term (and that's apart from the commercial ones that are on life support today, like Solaris).

(This isn't really a new observation; Rob Pike basically made it a long time ago in the context of academic systems software research (see the mention in this entry).)

But this doesn't mean that innovation in Unix and the Unix API is dead; it just means that it has to happen in a different way. You can't drive innovation by creating a new Unix or Unix-like, but you can drive innovation by putting something new into a Unix that's popular enough, so it becomes broadly available and people start taking advantage of it (the obvious candidate here is Linux). It's possible that OpenBSD's pledge() will turn out to be such an innovation (whether other Unixes implement it as a system call or as a library function that uses native mechanisms).

(Note that not all attempts to extend or change the practical Unix API turn out to be good ideas over the long term.)

It also doesn't always mean that what we wind up with is really 'Unix' in a conventional sense. One thing that's already happening is that an existing Unix is used as the heart of something that has custom layers wrapped around it. Android, iOS, and macOS are all versions of this; they have a core layer that uses an existing Unix kernel and so on but then a bunch of things specific to themselves on top. These systems have harvested what they find to be the useful value of their Unix and then ignored the rest of it. Of course all of them represent a great deal of effort in their custom components, and they wouldn't have happened if the people involved couldn't extract a lot of value from that additional work.

(This extends my other tweet from the time of the first entry.)

DurableCurrentUnixes written at 23:42:28

2018-06-29

What 'PID rollover' is on Unix systems

On Unix, everything is a process (generally including the threads inside processes, because that makes life simpler), and all processes have a PID (Process ID). In theory, the only special PID is PID 1, which is init; it has various jobs, and if it dies your system often reboots (although that isn't required, even if most Unixes do it). Some Unixes also have a special 'PID 0', which is a master process in the kernel (on Illumos PID 0 is sched, and on FreeBSD it's called [kernel]). PIDs run from PID 1 upward to some maximum PID value, and traditionally they're used strictly sequentially, so PID X is followed by PID X+1 and then PID X+2 (even if some of those processes are very short-lived).

(OpenBSD uses randomized PIDs by default; FreeBSD can turn them on by setting the kern.randompid sysctl, at least according to Internet searches. Normal Linux and Illumos are always sequential.)

Once, a very long time ago, Unix was a small thing and it ran on small, slow machines that liked to use 16-bit integers, ie the DEC PDP-11 series that was the home of Research Unix up through V7. In V7, PIDs were C shorts, which meant that they had a natural maximum value of 32767, and the kernel further constrained their maximum value to be 29,999. What happened when you hit that point? Well, let's just quote from newproc() in slp.c:

   /*
    * First, just locate a slot for a process
    * and copy the useful info from this process into it.
    * The panic "cannot happen" because fork has already
    * checked for the existence of a slot.
    */
retry:
    mpid++;
    if(mpid >= 30000) {
           mpid = 0;
           goto retry;
    }

(The V7 kernel had a lot of gotos.)

This is PID rollover, or rather the code for it.

The magical mpid is a kernel global variable that holds the last PID that was used. When it hits 30,000, it rolls back over to 0, gets incremented to be 1, and then we'll find that PID 1 is in use already and try again (there's another loop for that). Since V7 ran on small systems, there was no chance that you could have 30,000 processes in existence at once; in fact the kernel had a much smaller hardcoded limit called NPROC, which was usually 150 (see param.h).

Ever since V7, most Unix systems have kept the core of this behavior. PIDs have a maximum value, often still 30,000 or so by default, and when your sequential PID reaches that point you go back to starting from 1 or a low number again. This reset is what we mean by PID rollover; like an odometer rolling over, the next PID rolls over from a high value to a low value.

(I believe that it's common for modern Unixes to reset PIDs to something above 1, so that the very low numbered PIDs can't be reused even if there's no process there any more. On Linux, this low point is a hardcoded value of 300.)

Since Unix is no longer running on hardware where you really want to use 16-bit integers, we could have a much larger maximum PID value if we wanted to. In fact I believe that all current Unixes use a C type for PIDs that's at least 32 bits, and perhaps 64 (both in the kernel and in user space). Sticking to signed 32-bit integers but using the full 2^31-1 range would give us enough PIDs that it would take more than 12 days of using a new PID every 500 microseconds before we had a PID rollover. However, Unixes are startlingly conservative so no one goes this high by default, although people have tinkered with the specific numbers.

(FreeBSD PIDs are officially 0 to 99999, per intro(2). For other Unixes, see this SE question and its answers.)

To be fair, one reason to keep PIDs small is that it makes output that includes PIDs shorter and more readable (and it makes it easier to tell PIDs apart). This is both command output, for things like ps and top, and also your logs when they include PIDs (such as syslog). Very few systems can have enough active or zombie processes that they'll have 30,000 or more PIDs in use at the same time, and for the rest of us, having a low maximum PID makes life slightly more friendly. Of course, we don't have to have PID rollover to have low maximum PIDs; we can just have PID randomization. But in theory PID rollover is just as good and it's what Unix has always done (for a certain value of 'Unix' and 'always', given OpenBSD and so on).

In the grand Unix tradition, people say that PID rollover doesn't have issues; it just exposes issues in other code that isn't fully correct. Such code includes anything that uses daemon PID files, code that assumes that PID numbers will always be ascending or that if process B is a descendant of process A, it will have a higher PID, and code that is vulnerable if you can successfully predict the PID of a to-be-created process and grab some resource with that number in it. Concerns like these are at least part of why OpenBSD likes PID randomization.

(See this interesting stackexchange answer about how Unixes behave and when they introduced randomization options.)

PidRollover written at 23:51:18


