Wandering Thoughts archives

2017-04-29

Some versions of sort can easily sort IPv4 addresses into natural order

Every so often I need to deal with a bunch of IPv4 addresses, and it's most convenient (and best) to have them sorted into what I'll call their natural ascending order. Unfortunately for sysadmins, the natural order of IPv4 addresses is not their lexical order (ie what sort will give you), unless you zero-pad all of their octets. In theory you can zero-pad IPv4 addresses if you want, turning 58.172.99.1 into 058.172.099.001, but this form has two flaws: it looks ugly and it doesn't work with a lot of tools.

(Some tools will remove the zero padding, some will interpret zero-padded octets as being in octal instead of decimal, and some will leave the leading zeros on and not work at all; dig -x is one interesting example of the last. In practice, there are much better ways to deal with this problem and people who zero-pad IPv4 addresses need to be politely corrected.)

Fortunately it turns out that you can get many modern versions of sort to sort plain IPv4 addresses in the right order. The trick is to use its -V argument, which is also known as --version-sort in at least GNU coreutils. Interpreting IPv4 addresses as version numbers is basically exactly what we want, because an all-numeric MAJOR.MINOR.PATCH.SUBPATCH version number sorts in exactly the same way that we want an IPv4 A.B.C.D address to sort.
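As a small illustration of the difference, here is a sketch that sorts the same handful of addresses both ways. The addresses and the variable names are made up for the example, and it assumes a sort that supports -V. The plain sort is run with LC_ALL=C so that its lexical order is predictable.

```shell
# Sample addresses, invented for this example.
addrs='58.172.99.1
10.0.0.1
9.255.255.255
192.168.1.10
192.168.1.9'

# Plain lexical sort: 10.0.0.1 lands ahead of 9.255.255.255,
# and 192.168.1.10 lands ahead of 192.168.1.9.
lexical=$(printf '%s\n' "$addrs" | LC_ALL=C sort)

# Version sort: each dot-separated field is compared as a number,
# so 9.x comes before 10.x and .9 comes before .10.
natural=$(printf '%s\n' "$addrs" | sort -V)

printf 'lexical:\n%s\n\nnatural:\n%s\n' "$lexical" "$natural"
```

With -V the list starts at 9.255.255.255 and ends at 192.168.1.10, which is the order you actually want.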

Unfortunately, as far as I know there is no way to sort IPv6 addresses into a natural order using common shell tools. The format of IPv6 addresses is so unusual that I expect we're always going to need a custom program for it, although perhaps someday GNU sort will grow the necessary superintelligence.

This is a specific example of the kind of general thinking that you need in order to best apply Unix shell tools to your problems. It's quite helpful to always be on the lookout for ways that existing features can be reinterpreted (or creatively perverted) in order to work on your problems. Here we've realized that sort's idea of 'version numbers' includes IPv4 addresses, because from the right angle both they and (some) version numbers are just dot-separated sequences of numbers.

PS: with brute force, you can use any version of sort that supports -t and -k to sort IPv4 addresses; you just need the right magic arguments. I'll leave working them out (or doing an Internet search for them) as an exercise for the reader.
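For reference, one set of arguments that should do the job is to split fields on dots and sort each of the four fields numerically, in order. This is a sketch with made-up addresses and variable names, not the only possible answer:

```shell
# Sample addresses, invented for this example.
ips='58.172.99.1
10.0.0.1
9.255.255.255
192.168.1.10
192.168.1.9'

# -t . makes the dot the field separator; each -k N,Nn restricts a key
# to exactly field N and compares it numerically. Keys are tried in
# order, so later octets only matter when the earlier ones tie.
by_keys=$(printf '%s\n' "$ips" | sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n)

printf '%s\n' "$by_keys"
```

The 'N,N' form matters here; a bare '-k 1n' would make the key run from field 1 to the end of the line instead of stopping at the first dot.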

PPS: for the gory details of how GNU sort treats version sorting, see the GNU sort manual's section on details about version sort. Okay, technically it's ls's section on version sorting. Did you know that GNU coreutils ls can sort filenames partially based on version numbers? I didn't until now.

(This is a more verbose version of this tweet of mine, because why should I leave useful stuff just on Twitter.)

Sidebar: Which versions of sort support this

When I started writing this entry, I assumed that sort -V was a GNU coreutils extension and would only be supported by the GNU coreutils version. Unixes with other versions (or with versions that are too old) would be out of luck. This doesn't actually appear to be the case, to my surprise.

Based on the GNU Coreutils NEWS file, it appears that 'sort -V' appeared in GNU coreutils 7.0 or 7.1 (in late 2008 to early 2009). The GNU coreutils sort is used by most Linux distributions, including all of the main ones, and almost anything that's modern enough to be getting security updates should have a version of GNU sort that is recent enough to include this.

Older versions of FreeBSD appear to use an old version of GNU coreutils sort; I have access to a FreeBSD 9.3 machine that reports that /usr/bin/sort is GNU coreutils sort 5.3.0 (from 2004, apparently). Current versions of FreeBSD and OpenBSD have switched to their own version of sort, known as version '2.3-FreeBSD', but this version of sort also supports -V (I think the switch happened in FreeBSD 10, because a FreeBSD 10.3 machine I have access to reports this version). Exactly how -V orders things is probably somewhat different between GNU coreutils sort and FreeBSD/OpenBSD sort, but it doesn't matter for IPv4 addresses.

The Illumos /usr/bin/sort is very old, but I know that OmniOS ships /usr/gnu/bin/sort as standard and really you want /usr/gnu/bin early in your $PATH anyways. Life is too short to deal with ancient Solaris tool versions with ancient limitations.

SortingIPv4Addresses written at 01:26:50

2017-04-08

Wayland is now the future of Unix graphics and GUIs

The big Unix graphics news of the past week is that Ubuntu threw in the towel on their Unity GUI and with it their Mir display server (see the Ars story for more analysis). I say 'Unix' instead of 'Linux' here because I think this is going to have consequences well beyond Linux.

While there was a three-way fight for the future between Wayland, Ubuntu's Mir, and the default of X, it was reasonably likely that support for X was going to remain active in things like Firefox, KDE, and even Gnome. As a practical matter, Mir and Wayland were both going to support X programs, so if you targeted X (possibly as well as Wayland and/or Mir) you could run on everything and people would not be yelling at you and so on. But, well, there isn't a three-way fight any more. There is only X and Wayland now, and that makes Wayland the path forward by default. With only one path forward, the pressure for applications and GUI environments to remain backwards compatible to X is going to be (much) lower. And we already know how the Gnome people feel about major breaking changes; as Gnome 3 taught us, the Gnome developers are perfectly fine with them if they think the gain is reasonable.

In short: running exclusively on Wayland is the future of Gnome and Gnome-based programs, which includes Firefox; I suspect that it's also the future of KDE. It's not an immediate future, but in five years I suspect that it will be at least looming if not arriving. At that point, anyone who is not running Wayland will not be getting modern desktop software and programs and sooner or later won't be getting browser security fixes for what they currently have.

People run desktop software on more Unixes than just Linux. With Gnome and important desktop apps moving to Wayland, those Unixes face a real problem; they can live with old apps, or they can move to Wayland too. FreeBSD is apparently working seriously on Wayland support (cf), and at one point a Dragonfly BSD developer had Wayland running there. OpenBSD? Don't hold your breath. Solaris? That's up to Oracle these days but I don't really expect it; it would be a lot of work and I can't imagine that Oracle has many customers who will pay for it. Illumos? Probably not unless someone gets very energetic.

With that said, old X programs and environments are not going to suddenly go away. Fvwm will be there for years or decades to come, for example, as will xterm and any number of other current X programs and window managers. But people who are stuck in X will also be increasingly stuck in the past, unable to run current versions of more and more programs.

(For some people, this will be just fine. We're probably going to see a fairly strong sorting function among the free Unixes for what sort of person winds up where, which is going to make cultural issues even more fun than usual.)

PS: Some people may sneer at 'desktop software and programs', but this category includes quite a lot of things that are attractive but by and large desktop agnostic, like photography programs, Twitter clients, and syndication feed readers. Most modern graphical programs on Unix are built on top of some mid-level toolkit like GTK+ or Qt, not on basic X stuff, because those mid-level toolkits make it so much faster and easier to put together GUIs. If and when those toolkits become Wayland-only and the latest versions of the programs move to depend on recent versions of the toolkits, the programs become Wayland-only too.

WaylandNowTheFuture written at 00:28:59

2017-04-05

Why the modern chown command uses a colon to separate the user and group

In the beginning, all chown(1) did was change the owner of a file; if you wanted to change a file's group too, you had to use chgrp(1) as well. This is actually more unusual than I realized before I started to write this entry, because even in V7 Unix the chown(2) system call itself could change both user and group, per the V7 chown(2) manpage. Restricting chown(1) to only changing the owner did make the command itself pretty simple, though.

By the time of 4.1c BSD, chown(1) had become chown(8), because, per the manual page:

Only the super-user can change owner,
in order to simplify as yet unimplemented accounting procedures.

(The System V line of Unixes would retain an unrestricted chown(2) system call for some time and thus I believe they kept the chown command in section 1, for general commands anyone could use.)

In 4.3 BSD, someone decided that chown(8) might as well let you change the group at the same time, to match the system call. As the manual page covers, they used this syntax:

/etc/chown [ -f -R ] owner[.group] file ...

That is, to chown a file to user cks, group staff, you did 'chown cks.staff file'.

This augmented version of the chown command was picked up by various Unixes that descended from 4.x BSD, although not immediately (like many things from 4.3 BSD, it took a while to propagate around). Sometimes this was the primary version of chown, found in /usr/bin or the like; sometimes this was a compatibility version, in /usr/ucb (Solaris through fairly late, for example). Depending on how you set up your $PATH on such systems, you could wind up using this version of chown and thus get used to having 'user:group' rejected as an error.

Then, when it came time for POSIX to standardize this, someone woke up and created the modern syntax for changing both owner and group at once. As seen in the Single Unix Specification for chown, this is 'chown owner[:group] file', ie the separator is now a colon. Since POSIX and the SUS normally standardized existing practice (where it actually existed), you might wonder why they changed it. The answer is simple: a colon is not a valid character in a login, while a dot is.

Sure, dots are unusual in Unix logins in most places, but they're legal and they do show up in some environments (and they're legal in group names as well). Colons are outright illegal unless you like explosions, fundamentally because they're the field separator character in /etc/passwd and /etc/group. The SUS manpage actually has an explicit discussion of this in the RATIONALE section, although it doesn't tell you what it means by 'historical implementations'.
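To make the modern syntax concrete, here is a minimal sketch of the colon form in use. It operates on a scratch file and uses your own numeric user and group IDs as stand-ins (an assumption made so that it runs without root and without any particular account existing); a login and group name such as 'cks:staff' would be written the same way.

```shell
# Scratch file to experiment on; we own it, so no root is needed.
f=$(mktemp)

# Stand-ins for a real owner and group: our own numeric IDs.
owner=$(id -u)
group=$(id -g)

# POSIX form: a colon separates owner and group, and both are
# set by the one command.
chown "$owner:$group" "$f"

# The older approach needed a separate chgrp for the group.
chgrp "$group" "$f"

ls -l "$f"
```

The scratch file is left in place so that you can inspect it; remove it with rm when you're done.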

(The SUS manpage also discusses a scenario where using chown and chgrp separately isn't sufficient, and you have to make the change in a single chown() system call.)

PS: Since I think I ran into this dot-versus-colon issue on our old Solaris 10 fileservers, I probably had /usr/ucb before /usr/bin in my $PATH there. I generally prefer UCB versions of things to the stock Solaris versions, but in this case it tripped me up.

PPS: It turns out that GNU chown accepts the dot form as well, provided that it's unambiguous, although this is covered only in the chown info file and is not mentioned in the normal manpage.

ChownColonSeparatorWhy written at 00:47:44



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.