Wandering Thoughts

2017-08-20

The surprising longevity of Unix manpage formatting

As part of writing yesterday's entry, I wound up wanting to read the 4.3 BSD ifconfig manpage, which is online as part of the 4.3 BSD source tree at tuhs.org. More exactly, I wanted to see more or less how it had originally looked in formatted form, because in source form the bit I was interested in wasn't too readable:

.TP 15
.BI netmask " mask"
(Inet only)
Specify how much of the address to reserve for subdividing
networks into sub-networks.
[...]

If I wrote and dealt with Unix manpages more than occasionally, perhaps I could read this off the top of my head, but as it is, I'm not that familiar with the raw *roff format of manpages. So I decided to start with the simplest, most brute force way of seeing a formatted version. I copied the raw text into a file on my Linux machine and then ran 'man -l' on it. What I hoped for was something that wasn't too terribly mangled, so that I could more or less guess at the original formatting. What I got was a manpage that was almost completely intact (and possibly it's completely intact).
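For concreteness, the entire brute force process was just this (the file name here is made up):

; man -l ifconfig-43bsd.8

(man -l formats and displays a local manpage file directly, instead of having man search the system's manpage directories for a name.)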

To me, this is surprising and impressive longevity in manpage formatting. The 4.3 BSD ifconfig manpage dates from 1986, so that's more than 30 years of compatibility, and we can go back even further; it appears that V7 manpages (such as the one for ls) still format fine.

(V6 manpages are where things break, because apparently the man *roff macros were changed significantly between V6 and V7. One of the visible signs of this is that many of the macros were upper-cased; the V6 ls manpage has things like .th instead of the V7 .TH.)

I'm not going to speculate on why the man macros have been so stable for so long, but one thing it suggests to me is that the initial V7 set probably didn't have anything particularly wrong with it. On the other hand, the BSD people did build a completely new set of manpage macros in 4.3 Reno and Net/2, the mdoc macros, which have been carried forward into the current *BSDs.

(For more on this, see both The History of Unix Manpages and Wikipedia's history of manpages.)

ManpageMacroLongevity written at 01:36:44

2017-08-19

Subnets and early Unix implementations of TCP/IP networking

If you've been involved in networking (well, Internet and IP networking at least), you've probably heard and used the terms 'subnet' and 'subnets'. As a term, subnet has a logical and completely sensible definition, and that direct meaning is probably part of why we wound up with the term. If you've been around networking a while, you've probably also heard of 'CIDR' notation for networks, for example 192.168.1.0/24, and you may know that CIDR stands for Classless Inter-Domain Routing. You may even have heard of 'class C' and 'class B' networks, and had people refer to /24 CIDRs and /16 CIDRs as 'class C' and 'class B' respectively.

Back in the early days of IP, the entire IP network address space was statically divided up into a number of chunks of different sizes, and these different sizes were called the class of the particular chunk or network. When I say 'statically divided', I mean that what sort of network you had was determined by what your IP address was. If your IP address was, say, 8.10.20.30, you were in a class A network and everything in 8.*.*.* was (theoretically) on the same network as your machine. You can read the full details and the history on Wikipedia.

At the very beginning of Unix's support for IP networking, in 4.2 BSD, this approach was fine (and anyway it was the standard for how IP address space was divided up, so it was just how you did IP networking at the time). There were Ethernet LANs using IP (RFC 826 on ARP dates from that time), but there weren't many machines on them regardless of what class the network was. As a result of this, 4.2 BSD had no concept of network masks for interfaces. Really, it's right there in the 4.2 BSD ifconfig manpage; in 4.2 BSD, network interfaces were configured purely by specifying the interface's IP address. If the IP address was in a class A network, or a class B network, or a class C one, that was what you got; the 4.2 BSD kernel directly hard-coded how to split an arbitrary IP address into the standard (classful) network and host portions.

(See in_netof and in_lnaof in the 4.2 BSD sys/netinet/in.c.)

Naturally, this didn't last very long. The universities, research organizations, and so on that started using 4.2 BSD were also the places that got large class A (/8) and class B (/16) networks in the early days of the ARPANET, and so pretty soon they had far too many hosts to have them all on a single 10 MBit/sec LAN (especially once they had hosts in several different buildings). As a result, Unix networking (ie BSD networking) gained the concept of a netmask and subnets. The 4.3 BSD ifconfig manpage describes it this way:

netmask mask
(Inet only) Specify how much of the address to reserve for subdividing networks into sub-networks. The mask includes the network part of the local address and the subnet part, which is taken from the host field of the address. [...]

The idea here was that your university might have a class B, but your department would have what we would now call a /24 from that class B. The normal class B netmask is 255.255.0.0, but on your machines you'd set the netmask as 255.255.255.0 so they'd know that addresses outside that were not on the local network and had to be sent through the router instead of ARP'd for.
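To make the mechanics concrete, here's a small C sketch of the comparison the kernel effectively makes when deciding whether to ARP for a destination or send the packet to the router. The addresses are made up; with the classful class B mask the other department's host looks local, while with the subnetted /24 mask it correctly goes via the router:

  #include <stdio.h>
  #include <arpa/inet.h>

  int main(void) {
      in_addr_t self   = inet_addr("128.100.3.30");   /* our address */
      in_addr_t peer   = inet_addr("128.100.7.1");    /* host in another department */
      in_addr_t classb = inet_addr("255.255.0.0");    /* classful class B mask */
      in_addr_t subnet = inet_addr("255.255.255.0");  /* subnetted /24 mask */

      /* A destination is on the local network if the masked (network)
         parts match; otherwise it has to be sent to a router. */
      printf("class B mask: %s\n",
             (self & classb) == (peer & classb) ? "local, ARP for it" : "via the router");
      printf("/24 mask:     %s\n",
             (self & subnet) == (peer & subnet) ? "local, ARP for it" : "via the router");
      return 0;
  }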

However, the bad news is that the non-netmask version of 4.2 BSD IP networking did last long enough to get out into the field, both in real 4.2 BSD machines and in early versions of commercial Unixes like SunOS. Some of these early SunOS workstations and servers were bought by universities with class B networks that were starting to subnet them. This wound up causing the obvious fun problems, where some of your department's machines might not be able to talk to the rest of the university because they were grimly determined that they were on a class B network and so could reach every host in 128.100.*.* on the local LAN.

(They could reach hosts that weren't on 128.100.*.* just fine, at least if you configured the right gateway.)

It turns out that this history is visible in an interesting series of RFCs. RFC 917 from 1984 begins the conversation on subnets, then RFC 925 suggests extending ARP to work across multiple interconnected LANs. Subnets are formalized in RFC 950, "proxy ARP" appears in RFC 1009, and finally RFC 1027 describes how the authors used proxy ARP at the University of Texas at Austin to implement transparent subnet gateways, where hosts on your (sub)net don't have to be aware that they are on a subnet instead of the full class A or class B network that they think they're on. Transparent subnet gateways are also known as 'how you get your 4.2 BSD and SunOS 2.x hosts to talk to the rest of the university'.

(Since IP networking started out by talking about 'networks', not 'subnets', it seems highly likely that our current use of 'subnet' comes from this terminology invention and growth in the early to mid 1980s. I find it interesting that the 1986 4.3 BSD ifconfig manpage is still talking about 'sub-networks' instead of shortening it to 'subnets'.)

SubnetsAndEarlyUnixIPv4 written at 01:31:32

2017-07-17

Why I think Emacs readline bindings work better than Vi ones

I recently saw a discussion about whether people used the Emacs bindings for readline editing or the Vi bindings (primarily in shells, although there are plenty of other places that use readline). The discussion made me realize that I actually had some opinions here, and that my view was that Emacs bindings are better.

(My personal background is that vim has been my primary editor for years now, but I use Emacs bindings in readline and can't imagine switching.)

The Emacs bindings for readline aren't better because Emacs bindings are better in general (I have no opinion on that, for various reasons). Instead, they're better here because the nature of Emacs bindings makes it easy to go back and forth between entering text and editing text, especially without errors. This is because Emacs bindings don't reuse normal characters. Vi gains a certain amount of its power and speed from reusing normal letters for editing commands (especially lower case letters, which are the easiest to type), while Emacs exiles all editing commands to special key sequences. Vi's choice is fine for large scale text editing, where you generally spend substantial blocks of time first entering text and then editing it, but it is not as great if you're constantly going back and forth over short periods of time, which is much more typical of how I do things in a single command line. The vi approach also opens you up to destructive errors if you forget that you're in editing mode. With Emacs bindings there is no such back and forth switching or confusion (well, mostly none, as there are still times when plain letters are special or control and meta characters aren't).

Another way of putting this is that Emacs bindings at least feel like they're optimized for quickly making small edits, while vi ones feel more optimized for longer, larger-scale edits. Since typo-fixes and the like are most of what I do with command line editing, it falls into the 'small edits' camp where Emacs bindings shine.

Sidebar: Let's admit to the practical side too

Readline defaults to Emacs style bindings. If you only use a few readline programs on a few systems, it's probably no big deal to change the bindings (hopefully they all respect $HOME/.inputrc). But I'm a sysadmin, and I routinely use many systems (some of them not configured at all) as many users (me, root, and others). Trying to change all of those readline configurations is simply not feasible, plus some programs use alternate readline libraries that may not have switchable bindings.
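For what it's worth, the change itself is just a one-line setting in ~/.inputrc (and in bash you can also switch on the fly with 'set -o vi'):

set editing-mode vi

The problem isn't making the change once; it's making it everywhere, for every user, on every system.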

In this overall environment, sticking with the default Emacs bindings is far easier and thus I may be justifying to myself why it 'makes sense' to do so. I do think that Emacs bindings make quick edits easier, but to really be sure of that I'd have to switch a frequently used part of my environment to vi bindings for long enough to give it a fair shake, and I haven't ever tried that.

As a related issue, my impression is that Emacs bindings have become the default in basically anything that offers command line editing, even if it's not using readline at all and has reimplemented the idea from scratch. This provides its own momentum for sticking with Emacs bindings, since you're going to run into them sooner or later no matter how you set your shell et al.

EmacsForReadline written at 00:24:26

2017-07-11

The BSD r* commands and the history of privileged TCP ports

Once upon a time, UCB was adding TCP/IP to (BSD) Unix. They had multiple Unix machines, and one obvious thing to want when you have networking and multiple Unix machines is a way to log in and transfer files from one machine to another. Fortunately for the UCB BSD developers, TCP/IP already had well-specified programs (and protocols) to do this, namely telnet and FTP. So all they had to do was implement telnet and FTP clients and servers and they were done, right?

The UCB BSD people did implement telnet and FTP, but they weren't satisfied with just that, apparently because neither was convenient and flexible enough. In particular, telnet and FTP have baked into them the requirement for a password. Obviously you need to ask people to authenticate themselves (with a password) when you're accepting remote connections from who knows where, but the UCB people were just logging in and transferring files and so on between their own local collection of Vaxes. So the BSD people wound up creating a better way, in the form of the r* commands: rlogin, rsh, and rcp. The daemons that implemented these were rlogind and rshd.

(See also rcmd(3).)

The authentication method was pretty simple; it relied on checking /etc/hosts.equiv to see if the client host was trusted in general, or ~/.rhosts to see if you were allowing logins from that particular remote user on the remote host. As part of this, these daemons obviously relied on the client not lying about who the remote user was (and what their login name was). How did they have some assurance about this? The answer is that the BSD developers added a hack to their TCP/UDP implementation, namely a new idea of 'privileged ports'.

Privileged ports were ports under 1024 (aka IPPORT_RESERVED), and the hack was that the kernel only allowed them to be used by UID 0. If you asked for 'give me any port', they were skipped, and if you weren't root and tried to bind(2) to such a port, the kernel rejected you. Rlogind and rshd insisted that client connections come from privileged ports, at which point they knew that it was a root-owned process talking to them from the client and they could trust its claims of what the remote login name was. Ordinary users on the client couldn't make their own connection and claim to be someone else, because they wouldn't be allowed to use a privileged port.

(Sun later reused this port-based authentication mechanism as part of NFS security.)

Based on the Unix versions available on tuhs.org, all of this appears to have been introduced in 4.1c BSD. This is the version that adds IPPORT_RESERVED to netinet/in.h and has the TCP/UDP port binding code check it in in_pcbbind in netinet/in_pcb.c. In case you think the BSD people thought that this was an elegant idea, let me show you the code:

if (lport) {
  u_short aport = htons(lport);
  int wild = 0;

  /* GROSS */
  if (aport < IPPORT_RESERVED && u.u_uid != 0)
    return (EACCES);
  [...]

The BSD people knew this was a hack; they just did it anyway, probably because it was a very handy hack in their trusted local network environment. Unix has quietly inherited it ever since.

(Privileged ports are often called 'reserved ports', as in 'reserved for root only'. Even the 4.1c BSD usage here is inconsistent; the actual #define is IPPORT_RESERVED, but things like the rlogind manpage talk about 'privileged port numbers'. Interestingly, in 4.1c BSD the source code for the r* programs is hanging out in an odd place, in /usr/src/ucb/netser, along with several other things. By the release of 4.2 BSD, they had all been relocated to /usr/src/ucb and /usr/src/etc, where you'd expect.)

PS: Explicitly using a privileged port when connecting to a server is one of the rare cases when you need to call bind() for an outgoing socket, which is usually something you want to avoid.
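To make this concrete, here's a rough sketch of the client side of the dance, more or less what rcmd(3) does internally. The connect_from_resv_port() helper is a hypothetical name of mine, error handling is minimal, and it only works when run as root:

  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <arpa/inet.h>
  #include <string.h>
  #include <unistd.h>

  /* Make an outgoing TCP connection from a reserved (privileged) port. */
  int connect_from_resv_port(const char *dst, unsigned short dport) {
      int s = socket(AF_INET, SOCK_STREAM, 0);
      struct sockaddr_in local, remote;
      int port;

      memset(&local, 0, sizeof(local));
      local.sin_family = AF_INET;
      local.sin_addr.s_addr = htonl(INADDR_ANY);
      /* Walk downward through the reserved ports until bind() succeeds;
         only UID 0 is allowed to bind any of these. */
      for (port = IPPORT_RESERVED - 1; port > IPPORT_RESERVED / 2; port--) {
          local.sin_port = htons(port);
          if (bind(s, (struct sockaddr *)&local, sizeof(local)) == 0)
              break;
      }
      memset(&remote, 0, sizeof(remote));
      remote.sin_family = AF_INET;
      remote.sin_port = htons(dport);
      inet_pton(AF_INET, dst, &remote.sin_addr);
      if (connect(s, (struct sockaddr *)&remote, sizeof(remote)) != 0) {
          close(s);
          return -1;
      }
      return s;
  }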

Sidebar: The BUGS section of rlogind and rshd is wise

In the style of Unix manpages admitting big issues that they don't have any good way of dealing with at the moment, the manpages of both rlogind and rshd end with:

.SH BUGS
The authentication procedure used here assumes the
integrity of each client machine and the connecting
medium.  This is insecure, but is useful in an ``open''
environment.

.PP
A facility to allow all data exchanges to be encrypted
should be present.

These issues would stay for a decade or so, getting slowly more significant over time, until they were finally fixed by SSH in 1995. Well, starting in 1995; the switch wasn't exactly instant, and even now the process of replacing rsh and rlogin is sort of ongoing.

BSDRcmdsAndPrivPorts written at 00:38:16

2017-06-17

One reason you have a mysterious Unix file called 2 (or 1)

Suppose, one day, that you look at the ls of some directory and you notice that you have an odd file called '2' (just the digit). If you look at the contents of this file, it probably has nothing that's particularly odd looking; in fact, it likely looks like plausible output from a command you might have run.

Congratulations, you've almost certainly fallen victim to a simple typo, one that's easy to make in interactive shell usage and in Bourne shell scripts. Here it is:

echo hi  >&2
echo oop >2

The equivalent typo to create a file called 1 is very similar:

might-err 2>&1 | less
might-oop 2>1  | less

(The 1 files created this way are often empty, although not always, since many commands rarely produce anything on standard error.)

In each case, accidentally omitting the '&' in the redirection converts it from redirecting one file descriptor to another (for instance, forcing echo to report something to standard error) into a plain redirect-to-file redirection where the name of the file is your target file descriptor number.

Some of the time you'll notice the problem right away because you don't get output that you expect, but in other cases you may not notice for some time (or ever notice, if this was an interactive command and you just moved on after looking at the output as it was). Probably the easiest version of this typo to miss is in error messages in shell scripts:

if [ ! -f "$SOMETHING" ]; then
  echo "$0: missing file $SOMETHING" 1>2
  echo "$0: aborting" 1>&2
  exit 1
fi

You may never run the script in a way that triggers this error condition, and even if you do you may not realize (or remember) that you're supposed to get two error messages, not just the 'aborting' one.

(After we stumbled over such a file recently, I grep'd all of my scripts for '>2' and '>1'. I was relieved not to find any.)
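This check is easy to reproduce, since correct redirections like '1>&2' and '2>&1' don't contain the plain '>2' or '>1' substrings (adjust the file list to suit):

; grep -nE '>1|>2' *.sh

Anything this turns up deserves a second look; it will also flag legitimate redirections to file names that merely start with a 1 or a 2, but those should be rare.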

(For more fun with redirection in the Bourne shell, see also how to pipe just standard error.)

ShellStderrRedirectionOops written at 23:58:57

2017-06-10

One downside of the traditional style of writing Unix manpages

A while back I wrote about waiting for a specific wall-clock time in Unix, which according to POSIX you can do by using clock_nanosleep with the CLOCK_REALTIME clock and the TIMER_ABSTIME flag. This is fully supported on Linux (cf) and not supported on FreeBSD. But here's a question: is it supported on Illumos-derived systems?

So, let us consult the Illumos clock_nanosleep manpage. This manpage is very much written in the traditional (corporate) style of Unix manpages, high on specification and low on extra frills. This style either invites or actively requires a very close reading, paying very careful attention to both what is said and what is not said. The Illumos manpage does not explicitly say that your sleep immediately ends if the system's wall clock time is adjusted forward far enough; instead it says, well:

If the flag TIMER_ABSTIME is set in the flags argument, the clock_nanosleep() function causes the current thread to be suspended from execution until either the time value of the clock specified by clock_id reaches the absolute time specified by the rqtp argument, [or a signal happens]. [...]

The suspension time caused by this function can be longer than requested because the argument value is rounded up to an integer multiple of the sleep resolution, or because of the scheduling of other activity by the system. [...] The suspension for the absolute clock_nanosleep() function (that is, with the TIMER_ABSTIME flag set) will be in effect at least until the value of the corresponding clock reaches the absolute time specified by rqtp, [except for signals].

On the surface, this certainly describes a fully-featured implementation of clock_nanosleep that behaves the way we want. Unfortunately, if you're a neurotic reader of Unix manpages, all is not so clear. The potential weasel words are 'the suspension ... will be in effect at least until ...'. If you don't shorten CLOCK_REALTIME timeouts when the system clock jumps forward, you are technically having them wait 'at least until' the clock reaches their timeout value, because you sort of gave yourself room to have them wait (significantly) longer. At the same time this is a somewhat perverse reading of the manpage, partly because the first sentence of that paragraph alleges that the system will only delay waking you up because of scheduling, which would disallow this particular perversity.

To add to my uncertainty, let's look at the Illumos timer_settime manpage, which contains the following eyebrow-raising wording:

If the flag TIMER_ABSTIME is set in the argument flags, timer_settime() behaves as if the time until next expiration is set to be equal to the difference between the absolute time specified by the it_value member of value and the current value of the clock associated with timerid. That is, the timer expires when the clock reaches the value specified by the it_value member of value. [...]

These two sentences do not appear to be equivalent for the case of CLOCK_REALTIME clocks. The first describes an algorithm that freezes the time to (next) expiration when timer_settime is called, which is not proper CLOCK_REALTIME behavior, and then the second broadly describes correct CLOCK_REALTIME behavior where the timer expires if the real time clock advances past it for any reason.

With all that said, Illumos probably fully implements CLOCK_REALTIME, with proper handling of the system time being adjusted while you're suspended or have a timer set. But its manpages never come out and say that explicitly, because that's simply not the traditional style of Unix manpages, and the way they're written leaves me with uncertainty. If I cared about this, I would have to write a test program and then run it on a machine where I could set the system time both forward and backward.
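The test program itself would be short; here's a minimal sketch (untested by me on Illumos, and on some systems you may need to link with -lrt). You run it and then jump the system clock forward more than an hour with date(1); on a full CLOCK_REALTIME implementation it should wake up immediately:

  #include <stdio.h>
  #include <string.h>
  #include <time.h>

  int main(void) {
      struct timespec target;
      int err;

      clock_gettime(CLOCK_REALTIME, &target);
      target.tv_sec += 3600;    /* an hour from now, in wall clock time */

      /* Sleep until the absolute CLOCK_REALTIME time in 'target'.
         clock_nanosleep() returns an error number directly, not -1. */
      err = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &target, NULL);
      if (err)
          fprintf(stderr, "clock_nanosleep failed: %s\n", strerror(err));
      else
          printf("woke at (or past) the target time\n");
      return 0;
  }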

This fault is not really with these specific Illumos manpages, although some elements of their wording aren't helping things. This is ultimately a downside to the terse, specification-like traditional style of Unix manpages. Where every word may count and the difference between 'digit' and 'digits' matters, you sooner or later get results like this, situations where you just can't tell.

(Yes, this would be a perverse implementation and a weird way of writing the manpages, but (you might say) perhaps the original Solaris corporate authors really didn't want to admit in plain text that Solaris didn't have a complete implementation of CLOCK_REALTIME.)

Also, I'm sure that different people will read these manpages differently. My reading is unquestionably biased by knowing that clock_nanosleep support is not portable across all Unixes, so I started out wondering if Illumos does support it. If you start reading these manpages with the assumption that of course Illumos supports it, then you get plenty of evidence for that position and all of the wording that I'm jumpy about is obviously me being overly twitchy.

ManpageStyleDownside written at 01:19:53

2017-06-04

Why the popen() API works but more complex versions blow up

Years ago I wrote about a long-standing Unix issue with more sophisticated versions of popen(); my specific example was writing a large amount of stuff to a subprogram through a pipe and then reading its output, where both sides stall trying to write to full pipes. Of course this is not the only way to have this problem bite you, so recently I ran across Andrew Jorgensen's A Tale of Two Pipes (via), where the same problem comes up when a subprogram writes to both standard output and standard error and you consume them one at a time.

Things like Python's subprocess module and many other imitators generally trace their core idea back to the venerable Unix popen(3) library function, which first appeared in V7 Unix. However, popen() itself does not actually have this problem; only more sophisticated and capable interfaces based on it do.

The reason popen() doesn't have the problem is straightforward and points to the core problem with more elaborated versions of the API. popen() doesn't have a problem because it only gives you a single IO stream, either the sub-program's standard input or its standard output. More sophisticated APIs give you multiple streams, and multiple streams are where you get into trouble. You get into trouble because more sophisticated APIs with multiple streams are implicitly pretending that the streams can be dealt with independently and serially, ie that you can fully process one stream before looking at another one at all. As A Tale of Two Pipes makes clear, this is not so. In actuality the streams are inter-dependent and have to be processed together, although Unix pipe buffers can hide this from you for a while.
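As a concrete illustration, here's a minimal C sketch of the classic stall, using cat as the subprogram (no error checking). The parent tries to write all of its input before reading any output; cat eventually blocks writing to the full output pipe and stops reading, the input pipe then fills up as well, and both processes are stuck forever:

  #include <string.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void) {
      int in[2], out[2];    /* in: parent -> child, out: child -> parent */
      char buf[8192];
      int i;

      pipe(in);
      pipe(out);
      if (fork() == 0) {
          dup2(in[0], 0);
          dup2(out[1], 1);
          close(in[0]); close(in[1]); close(out[0]); close(out[1]);
          execlp("cat", "cat", (char *)0);
          _exit(127);
      }
      close(in[0]);
      close(out[1]);

      memset(buf, 'x', sizeof(buf));
      /* Write ~8 MB, far more than the pipe buffers can hold. cat soon
         blocks writing to the full 'out' pipe and stops reading, the
         'in' pipe fills, and then our write() here blocks forever too. */
      for (i = 0; i < 1024; i++)
          write(in[1], buf, sizeof(buf));
      close(in[1]);

      while (read(out[0], buf, sizeof(buf)) > 0)
          ;    /* never reached */
      wait(0);
      return 0;
  }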

Of course you can handle the streams properly yourself, resorting to poll() or some similar measure. But you shouldn't have to remember to do that, partly because as long as you have to take additional complex steps to make things work right, people are going to be forgetting this requirement. In the name of looking simple and generic, these APIs have armed a gun that is pointed straight at your feet. A more honest API would make the inter-dependency clear, perhaps by returning a Subprocess object that you registered callbacks on. Callbacks have a bad reputation but they at least make it clear that things can (and will) happen concurrently, instead of one stream being fully handled before another stream is even touched.

(Go has an interesting approach to the problem that is sort of half solution and half not. In its core os/exec API for this, you provide streams which will be read from or written to asynchronously. However, there are helper methods that give you a more traditional 'here is a stream' interface and with it the traditional problems.)

Sidebar: Why people keep creating these flawed subprogram APIs on Unix

These APIs keep getting created because they're attractive. How the API appears to behave (ie, without the deadlock issues) is how people often want to deal with subprograms. Most of the time you're not interacting with them step by step, sending in some input and collecting some output; instead you're sending in the input, collecting the output, and maybe collecting standard error as well in case something blew up. People don't want to write poll() based loops or callbacks or anything complicated, because concurrency is at least annoying. They just want the simple API to work.

Possibly libraries should make the straightforward user code work by handling all of the polling and so on internally and being willing to buffer unlimited amounts of standard output and standard error. This would probably blow up less often than the current scheme does, and you could provide various options for how much to buffer and how to deal with overflow for advanced users.

PopenAPIWiseLimitation written at 02:26:09

2017-05-05

Digging into BSD's choice of Unix group for new directories and files

I have to eat some humble pie here. In comments on my entry on an interesting chmod failure, Greg A. Woods pointed out that FreeBSD's behavior of creating everything inside a directory with the group of the directory is actually traditional BSD behavior (it dates all the way back to the 1980s), not some odd new invention by FreeBSD. As traditional behavior it makes sense that it's explicitly allowed by the standards, but I've also come to think that it makes sense in context and in general. To see this, we need some background about the problem facing BSD.

In the beginning, two things were true in Unix: there was no mkdir() system call, and processes could only be in one group at a time. With processes being in only one group, the choice of the group for a newly created filesystem object was easy; it was your current group. This was felt to be sufficiently obvious behavior that the V7 creat(2) manpage doesn't even mention it.

(The actual behavior is implemented in the kernel in maknode() in iget.c.)

Now things get interesting. 4.1c BSD seems to be where mkdir(2) is introduced and where creat() stops being a system call and becomes an option to open(2). It's also where processes can be in multiple groups for the first time. The 4.1c BSD open(2) manpage is silent about the group of newly created files, while the mkdir(2) manpage specifically claims that new directories will have your effective group (ie, the V7 behavior). This is actually wrong. In both mkdir() in sys_directory.c and maknode() in ufs_syscalls.c, the group of the newly created object is set to the group of the parent directory. Then finally in the 4.2 BSD mkdir(2) manpage the group of the new directory is correctly documented (the 4.2 BSD open(2) manpage continues to say nothing about this). So BSD's traditional behavior was introduced at the same time as processes being in multiple groups, and we can guess that it was introduced as part of that change.

When your process can only be in a single group, as in V7, it makes perfect sense to create new filesystem objects with that as their group. It's basically the same case as making new filesystem objects be owned by you; just as they get your UID, they also get your GID. When your process can be in multiple groups, things get less clear. A filesystem object can only be in one group, so which of your several groups should a new filesystem object be owned by, and how can you most conveniently change that choice?

One option is to have some notion of a 'primary group' and then provide ways to shuffle around which of your groups is the primary group. One problem with this is that it's awkward and error-prone to work in different areas of the filesystem where you want your new files and directories to be in different groups; every time you cd around, you may have to remember to change your primary group. If you move into a collaborative directory, better shift (in your shell) to that group; cd back to $HOME, or simply want to write a new file in $HOME, and you'd better remember to change back.

Another option is the BSD choice of inheriting the group from context. By far the most common case is that you want your new files and directories to be created in the 'context', ie the group, of the surrounding directory. If you're working in $HOME, this is your primary login group; if you're working in a collaborative area, this is the group being used for collaboration. Arguably it's a feature that you don't even have to be in that group (if directory permissions allow you to make new files). Since you can chgrp directories that you own, this option also gives you a relatively easy and persistent way to change which group is chosen for any particular area.

If you fully embrace the idea of Unix processes being in multiple groups, not just having one primary group and then some number of secondary groups, then the BSD choice makes a lot of sense. And for all of its faults, BSD tended to relatively fully embrace its changes (not totally, perhaps partly because it had backwards compatibility issues to consider). While it leads to some odd issues, such as the one I ran into, pretty much any choice here is going to have some oddities. It's also probably the more usable choice in general if you expect much collaboration between different people (well, different Unix logins), partly because it mostly doesn't require people to remember to do things.

(I know that on our systems, a lot of directories intended for collaborative work tend to end up being setgid specifically to get this behavior.)
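For the record, the recipe involved is short (the directory and group names here are made up):

; mkdir /local/project
; chgrp devgrp /local/project
; chmod g+ws /local/project

With the setgid bit on the directory, new files and subdirectories created inside it inherit its group even on systems that don't have the BSD behavior as their default.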

BSDDirectoryGroupChoice written at 01:00:53

2017-05-03

Sometimes, chmod can fail for interesting reasons

I'll start by presenting this rather interesting and puzzling failure in illustrated form:

; mkdir /tmp/newdir
; chmod g+s /tmp/newdir
chmod: /tmp/newdir: Operation not permitted

How can I not be able to make this chmod change when I just made the directory and I own it? For extra fun, some people on this particular system won't experience this problem, and in fact many of them are the people you might report this problem to, namely the sysadmins.

At first I wondered if this particular /tmp filesystem disallowed setuid and setgid entirely, but it turned out to be not that straightforward:

; ls -ld /tmp/newdir
drwxr-xr-x  2 cks  wheel  512 May  3 00:35 /tmp/newdir

This at least explains why my chmod attempt failed. I'm not in group wheel, and for good reasons you can't make a file setgid to a group that you're not a member of. But how on earth did my newly created directory in /tmp wind up in group wheel, a group I'm not a member of? Well, perhaps someone made /tmp setgid, so all directories created in it inherited its group (presumably group wheel). Let's see:

; ls -ld /tmp
drwxrwxrwt  157 root  wheel  11776 May  3 00:41 /tmp

Although /tmp is indeed group wheel, it has perfectly ordinary permissions (mode 777 and sticky ('t'), so you can only delete or rename your own files). There's no setgid to be seen.

The answer to this mystery is that this is a FreeBSD machine, and on FreeBSD, well, let's quote the mkdir(2) manpage:

The directory's owner ID is set to the process's effective user ID. The directory's group ID is set to that of the parent directory in which it is created.

And also the section of the open(2) manpage that deals with creation of new files:

When a new file is created it is given the group of the directory which contains it.

In other words, on FreeBSD all directories have an implicit setgid bit. Everything created inside them (whether directories or files) inherits the directory's group. Normally this is not a problem and you'll probably never notice, but /tmp (and /var/tmp) are special because they allow everyone to create files and directories in them, and so there are a lot of people making things there who are not a member of the directory's group.

(The sysadmins usually are members of group wheel, though, so things will work for them. This should add extra fun if a user reports the general chmod issue as a problem, since sysadmins can't reproduce it as themselves.)

You might think that this is an obscure issue that no one will ever care about, but actually it caused a Go build failure on FreeBSD for a while. Tracking down the problem took me a while and a bunch of head scratching.

PS: arguably GID 0 should not be group wheel but instead something else that only root is a member of, with wheel as a completely separate group. Having group wheel used both for group ownership and for su access to root is at least confusing.

ChmodInterestingFailure written at 01:39:47

2017-04-29

Some versions of sort can easily sort IPv4 addresses into natural order

Every so often I need to deal with a bunch of IPv4 addresses, and it's most convenient (and best) to have them sorted into what I'll call their natural ascending order. Unfortunately for sysadmins, the natural order of IPv4 addresses is not their lexical order (ie what sort will give you) unless you zero-pad all of their octets. In theory you can zero-pad IPv4 addresses if you want, turning 58.172.99.1 into 058.172.099.001, but this form has two flaws: it looks ugly and it doesn't work with a lot of tools.

(Some tools will remove the zero padding, some will interpret zero-padded octets as being in octal instead of decimal, and some will leave the leading zeros on and not work at all; dig -x is one interesting example of the latter. In practice, there are much better ways to deal with this problem and people who zero-pad IPv4 addresses need to be politely corrected.)

Fortunately it turns out that you can get many modern versions of sort to sort plain IPv4 addresses in the right order. The trick is to use its -V argument, which is also known as --version-sort in at least GNU coreutils. Interpreting IPv4 addresses as version numbers is basically exactly what we want, because an all-numeric MAJOR.MINOR.PATCH.SUBPATCH version number sorts in exactly the same way that we want an IPv4 A.B.C.D address to sort.
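As a quick demonstration of the difference:

; printf '%s\n' 9.10.20.30 58.172.99.1 10.0.0.2 | sort
10.0.0.2
58.172.99.1
9.10.20.30
; printf '%s\n' 9.10.20.30 58.172.99.1 10.0.0.2 | sort -V
9.10.20.30
10.0.0.2
58.172.99.1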

Unfortunately as far as I know there is no way to sort IPv6 addresses into a natural order using common shell tools. The format of IPv6 addresses is so odd and unusual that I expect we're always going to need a custom program for it, although perhaps someday GNU Sort will grow the necessary superintelligence.

This is a specific example of the kind of general thinking that you need in order to best apply Unix shell tools to your problems. It's quite helpful to always be on the lookout for ways that existing features can be reinterpreted (or creatively perverted) in order to work on your problems. Here we've realized that sort's idea of 'version numbers' includes IPv4 addresses, because from the right angle both they and (some) version numbers are just dot-separated sequences of numbers.

PS: with brute force, you can use any version of sort that supports -t and -k to sort IPv4 addresses; you just need the right magic arguments. I'll leave working them out (or doing an Internet search for them) as an exercise for the reader.

PPS: for the gory details of how GNU sort treats version sorting, see the Gnu sort manual's section on details about version sort. Okay, technically it's ls's section on version sorting. Did you know that GNU coreutils ls can sort filenames partially based on version numbers? I didn't until now.

(This is a more verbose version of this tweet of mine, because why should I leave useful stuff just on Twitter.)

Sidebar: Which versions of sort support this

When I started writing this entry, I assumed that sort -V was a GNU coreutils extension and would only be supported by the GNU coreutils version. Unixes with other versions (or with versions that are too old) would be out of luck. This doesn't actually appear to be the case, to my surprise.

Based on the GNU Coreutils NEWS file, it appears that 'sort -V' appeared in GNU coreutils 7.0 or 7.1 (in late 2008 to early 2009). The GNU coreutils sort is used by most Linux distributions, including all of the main ones, and almost anything that's modern enough to be getting security updates should have a version of GNU sort that is recent enough to include this.

Older versions of FreeBSD appear to use an old version of GNU coreutils sort; I have access to a FreeBSD 9.3 machine that reports that /usr/bin/sort is GNU coreutils sort 5.3.0 (from 2004, apparently). Current versions of FreeBSD and OpenBSD have switched to their own version of sort, known as version '2.3-FreeBSD', but this version of sort also supports -V (I think the switch happened in FreeBSD 10, because a FreeBSD 10.3 machine I have access to reports this version). Exactly how -V orders things is probably somewhat different between GNU coreutils sort and FreeBSD/OpenBSD sort, but it doesn't matter for IPv4 addresses.

The Illumos /usr/bin/sort is very old, but I know that OmniOS ships /usr/gnu/bin/sort as standard and really you want /usr/gnu/bin early in your $PATH anyways. Life is too short to deal with ancient Solaris tool versions with ancient limitations.

SortingIPv4Addresses written at 01:26:50
