Wandering Thoughts

2017-06-17

One reason you have a mysterious Unix file called 2 (or 1)

Suppose, one day, that you look at the ls of some directory and you notice that you have an odd file called '2' (just the digit). If you look at the contents of this file, it probably has nothing that's particularly odd looking; in fact, it likely looks like plausible output from a command you might have run.

Congratulations, you've almost certainly fallen victim to a simple typo, one that's easy to make in interactive shell usage and in Bourne shell scripts. Here it is:

echo hi  >&2
echo oop >2

The equivalent typo to create a file called 1 is very similar:

might-err 2>&1 | less
might-oop 2>1  | less

(The 1 files created this way are often empty, although not always, since many commands rarely produce anything on standard error.)

In each case, accidentally omitting the '&' in the redirection converts it from redirecting one file descriptor to another (for instance, forcing echo to report something to standard error) into a plain redirect-to-file redirection where the name of the file is your target file descriptor number.
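To see the effect for yourself without littering a real directory, here is a minimal sketch that commits both typos inside a scratch directory. The helper name make_stray_files and the use of mktemp are assumptions made for the demonstration, not anything from a real script.

```shell
# Hypothetical demo helper: commit both typos inside the directory given as $1.
# The body is a subshell, so the cd does not affect the caller.
make_stray_files() (
    cd "$1" || exit 1
    echo oop >2                      # meant >&2; creates a file named "2"
    echo quiet 2>1 | cat >/dev/null  # meant 2>&1; creates a file named "1"
)

scratch=$(mktemp -d)
make_stray_files "$scratch"
ls -l "$scratch"    # lists both "1" (empty here) and "2" (containing "oop")
```

The '1' file is empty in this run because echo wrote nothing to standard error, which matches the usual case described above.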

Some of the time you'll notice the problem right away because you don't get output that you expect, but in other cases you may not notice for some time (or ever notice, if this was an interactive command and you just moved on after looking at the output as it was). Probably the easiest version of this typo to miss is in error messages in shell scripts:

if [ ! -f "$SOMETHING" ]; then
  echo "$0: missing file $SOMETHING" 1>2
  echo "$0: aborting" 1>&2
  exit 1
fi

You may never run the script in a way that triggers this error condition, and even if you do you may not realize (or remember) that you're supposed to get two error messages, not just the 'aborting' one.
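One way to make this version of the typo harder to commit is to write the redirection exactly once, in a small helper, and call the helper everywhere else. This is only a sketch; the names warn and require_file are made up for illustration.

```shell
# Hypothetical helper: the only place in the script that spells out '>&2'.
warn() {
    printf '%s: %s\n' "$0" "$*" >&2
}

# Hypothetical check mirroring the example above; returns non-zero instead
# of exiting so that the caller decides what to do.
require_file() {
    if [ ! -f "$1" ]; then
        warn "missing file $1"
        warn "aborting"
        return 1
    fi
}
```

A script would then use something like 'require_file "$SOMETHING" || exit 1'. If the redirection inside warn is ever mistyped, every error message goes missing at once, which is a lot more noticeable than losing one of two.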

(After we stumbled over such a file recently, I grep'd all of my scripts for '>2' and '>1'. I was relieved not to find any.)
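A slightly pickier version of that search can be sketched with grep -E. The pattern below is a heuristic of my own choosing, not the exact search described above: it looks for '>1' or '>2' followed by whitespace, a shell separator, or the end of the line, so it skips things like '>20' and never matches the correct '>&2' and '2>&1' forms.

```shell
# Hypothetical helper: print (with line numbers) lines that redirect to a
# file literally named 1 or 2. Expect false positives, for example inside
# arithmetic comparisons; treat the output as candidates to review by hand.
find_digit_redirects() {
    grep -nE '>[12]([[:space:];|)]|$)' "$@"
}
```

Run it as 'find_digit_redirects *.sh' or similar over whatever collection of scripts you have.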

(For more fun with redirection in the Bourne shell, see also how to pipe just standard error.)
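For reference, one common Bourne shell idiom for that is sketched below; it may or may not be the approach the linked entry takes. The function noisy is a stand-in invented for the example.

```shell
# Hypothetical stand-in for a command that writes to both streams.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# The pipe is set up first, then 2>&1 points standard error at the pipe,
# and only then is standard output sent to /dev/null.
noisy 2>&1 >/dev/null | sed 's/^/piped: /'    # prints "piped: to stderr"
```

The order of the two redirections matters; written the other way around, both streams end up in /dev/null and nothing reaches the pipe.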

ShellStderrRedirectionOops written at 23:58:57

2017-06-10

One downside of the traditional style of writing Unix manpages

A while back I wrote about waiting for a specific wall-clock time in Unix, which according to POSIX you can do by using clock_nanosleep with the CLOCK_REALTIME clock and the TIMER_ABSTIME flag. This is fully supported on Linux (cf) and not supported on FreeBSD. But here's a question: is it supported on Illumos-derived systems?

So, let us consult the Illumos clock_nanosleep manpage. This manpage is very much written in the traditional (corporate) style of Unix manpages, high on specification and low on extra frills. This style either invites or actively requires a very close reading, paying very careful attention to both what is said and what is not said. The Illumos manpage does not explicitly say that your sleep immediately ends if the system's wall clock time is adjusted forward far enough; instead it says, well:

If the flag TIMER_ABSTIME is set in the flags argument, the clock_nanosleep() function causes the current thread to be suspended from execution until either the time value of the clock specified by clock_id reaches the absolute time specified by the rqtp argument, [or a signal happens]. [...]

The suspension time caused by this function can be longer than requested because the argument value is rounded up to an integer multiple of the sleep resolution, or because of the scheduling of other activity by the system. [...] The suspension for the absolute clock_nanosleep() function (that is, with the TIMER_ABSTIME flag set) will be in effect at least until the value of the corresponding clock reaches the absolute time specified by rqtp, [except for signals].

On the surface, this certainly describes a fully-featured implementation of clock_nanosleep that behaves the way we want. Unfortunately, if you're a neurotic reader of Unix manpages, all is not so clear. The potential weasel words are 'the suspension ... will be in effect at least until ...'. If you don't shorten CLOCK_REALTIME timeouts when the system clock jumps forward, you are technically having them wait 'at least until' the clock reaches their timeout value, because you sort of gave yourself room to have them wait (significantly) longer. At the same time this is a somewhat perverse reading of the manpage, partly because the first sentence of that paragraph alleges that the system will only delay waking you up because of scheduling, which would disallow this particular perversity.

To add to my uncertainty, let's look at the Illumos timer_settime manpage, which contains the following eyebrow-raising wording:

If the flag TIMER_ABSTIME is set in the argument flags, timer_settime() behaves as if the time until next expiration is set to be equal to the difference between the absolute time specified by the it_value member of value and the current value of the clock associated with timerid. That is, the timer expires when the clock reaches the value specified by the it_value member of value. [...]

These two sentences do not appear to be equivalent for the case of CLOCK_REALTIME clocks. The first describes an algorithm that freezes the time to (next) expiration when timer_settime is called, which is not proper CLOCK_REALTIME behavior, and then the second broadly describes correct CLOCK_REALTIME behavior where the timer expires if the real time clock advances past it for any reason.

With all that said, Illumos probably fully implements CLOCK_REALTIME, with proper handling of the system time being adjusted while you're suspended or have a timer set. But its manpages never come out and say that explicitly, because that's simply not the traditional style of Unix manpages, and the way they're written leaves me with uncertainty. If I cared about this, I would have to write a test program and then run it on a machine where I could set the system time both forward and backward.

This fault is not really with these specific Illumos manpages, although some elements of their wording aren't helping things. This is ultimately a downside to the terse, specification-like traditional style of Unix manpages. Where every word may count and the difference between 'digit' and 'digits' matters, you sooner or later get results like this, situations where you just can't tell.

(Yes, this would be a perverse implementation and a weird way of writing the manpages, but (you might say) perhaps the original Solaris corporate authors really didn't want to admit in plain text that Solaris didn't have a complete implementation of CLOCK_REALTIME.)

Also, I'm sure that different people will read these manpages differently. My reading is unquestionably biased by knowing that clock_nanosleep support is not portable across all Unixes, so I started out wondering if Illumos does support it. If you start reading these manpages with the assumption that of course Illumos supports it, then you get plenty of evidence for that position and all of the wording that I'm jumpy about is obviously me being overly twitchy.

ManpageStyleDownside written at 01:19:53

2017-06-04

Why the popen() API works but more complex versions blow up

Years ago I wrote about a long-standing Unix issue with more sophisticated versions of popen(); my specific example was writing a large amount of stuff to a subprogram through a pipe and then reading its output, where both sides stall trying to write to full pipes. Of course this is not the only way to have this problem bite you, so recently I ran across Andrew Jorgensen's A Tale of Two Pipes (via), where the same problem comes up when a subprogram writes to both standard output and standard error and you consume them one at a time.

Things like Python's subprocess module and many other imitators generally trace their core idea back to the venerable Unix popen(3) library function, which first appeared in V7 Unix. However, popen() itself does not actually have this problem; only more sophisticated and capable interfaces based on it do.

The reason popen() doesn't have the problem is straightforward and points to the core problem with more elaborated versions of the API. popen() doesn't have a problem because it only gives you a single IO stream, either the sub-program's standard input or its standard output. More sophisticated APIs give you multiple streams, and multiple streams are where you get into trouble. You get into trouble because more sophisticated APIs with multiple streams are implicitly pretending that the streams can be dealt with independently and serially, ie that you can fully process one stream before looking at another one at all. As A Tale of Two Pipes makes clear, this is not so. In actuality the streams are inter-dependent and have to be processed together, although Unix pipe buffers can hide this from you for a while.

Of course you can handle the streams properly yourself, resorting to poll() or some similar measure. But you shouldn't have to remember to do that, partly because as long as you have to take additional complex steps to make things work right, people are going to be forgetting this requirement. In the name of looking simple and generic, these APIs have armed a gun that is pointed straight at your feet. A more honest API would make the inter-dependency clear, perhaps by returning a Subprocess object that you registered callbacks on. Callbacks have a bad reputation but they at least make it clear that things can (and will) happen concurrently, instead of one stream being fully handled before another stream is even touched.
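In shell scripts, where poll() isn't on offer, a simpler and different technique sidesteps the problem: leave only one stream on a pipe and send the other to a temporary file, which never fills up the way a pipe buffer does. The sketch below assumes that approach; chatty and capture_both are hypothetical names, and this is not how any particular library does it.

```shell
# Hypothetical chatty command: writes $1 lines to each of stdout and stderr,
# which for large $1 is more than a pipe buffer will hold.
chatty() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "out $i"
        echo "err $i" >&2
        i=$((i + 1))
    done
}

# Run a command, leaving its standard output in $out and its standard error
# in $err. Only stdout goes through a pipe; stderr goes to a temporary file,
# so the command can never block writing to one while we read the other.
capture_both() {
    errfile=$(mktemp) || return 1
    out=$("$@" 2>"$errfile")
    status=$?
    err=$(cat "$errfile")
    rm -f "$errfile"
    return "$status"
}
```

After 'capture_both chatty 20000', both $out and $err hold 20000 lines. The price is a temporary file and not seeing any standard error output until the command has finished.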

(Go has an interesting approach to the problem that is sort of half solution and half not. In its core os/exec API for this, you provide streams which will be read from or written to asynchronously. However there are helper methods that give you a more traditional 'here is a stream' interface and with it the traditional problems.)

Sidebar: Why people keep creating these flawed subprogram APIs on Unix

These APIs keep getting created because they're attractive. How the API appears to behave (ie, without the deadlock issues) is how people often want to deal with subprograms. Most of the time you're not interacting with them step by step, sending in some input and collecting some output; instead you're sending in the input, collecting the output, and maybe collecting standard error as well in case something blew up. People don't want to write poll() based loops or callbacks or anything complicated, because concurrency is at least annoying. They just want the simple API to work.

Possibly libraries should make the straightforward user code work by handling all of the polling and so on internally and being willing to buffer unlimited amounts of standard output and standard error. This would probably blow up less often than the current scheme does, and you could provide various options for how much to buffer and how to deal with overflow for advanced users.

PopenAPIWiseLimitation written at 02:26:09

2017-05-05

Digging into BSD's choice of Unix group for new directories and files

I have to eat some humble pie here. In comments on my entry on an interesting chmod failure, Greg A. Woods pointed out that FreeBSD's behavior of creating everything inside a directory with the group of the directory is actually traditional BSD behavior (it dates all the way back to the 1980s), not some odd new invention by FreeBSD. As traditional behavior it makes sense that it's explicitly allowed by the standards, but I've also come to think that it makes sense in context and in general. To see this, we need some background about the problem facing BSD.

In the beginning, two things were true in Unix: there was no mkdir() system call, and processes could only be in one group at a time. With processes being in only one group, the choice of the group for a newly created filesystem object was easy; it was your current group. This was felt to be sufficiently obvious behavior that the V7 creat(2) manpage doesn't even mention it.

(The actual behavior is implemented in the kernel in maknode() in iget.c.)

Now things get interesting. 4.1c BSD seems to be where mkdir(2) is introduced and where creat() stops being a system call and becomes an option to open(2). It's also where processes can be in multiple groups for the first time. The 4.1c BSD open(2) manpage is silent about the group of newly created files, while the mkdir(2) manpage specifically claims that new directories will have your effective group (ie, the V7 behavior). This is actually wrong. In both mkdir() in sys_directory.c and maknode() in ufs_syscalls.c, the group of the newly created object is set to the group of the parent directory. Then finally in the 4.2 BSD mkdir(2) manpage the group of the new directory is correctly documented (the 4.2 BSD open(2) manpage continues to say nothing about this). So BSD's traditional behavior was introduced at the same time as processes being in multiple groups, and we can guess that it was introduced as part of that change.

When your process can only be in a single group, as in V7, it makes perfect sense to create new filesystem objects with that as their group. It's basically the same case as making new filesystem objects be owned by you; just as they get your UID, they also get your GID. When your process can be in multiple groups, things get less clear. A filesystem object can only be in one group, so which of your several groups should a new filesystem object be owned by, and how can you most conveniently change that choice?

One option is to have some notion of a 'primary group' and then provide ways to shuffle around which of your groups is the primary group. One problem with this is that it's awkward and error-prone to work in different areas of the filesystem where you want your new files and directories to be in different groups; every time you cd around, you may have to remember to change your primary group. If you move into a collaborative directory, better shift (in your shell) to that group; cd back to $HOME, or simply want to write a new file in $HOME, and you'd better remember to change back.

Another option is the BSD choice of inheriting the group from context. By far the most common case is that you want your new files and directories to be created in the 'context', ie the group, of the surrounding directory. If you're working in $HOME, this is your primary login group; if you're working in a collaborative area, this is the group being used for collaboration. Arguably it's a feature that you don't even have to be in that group (if directory permissions allow you to make new files). Since you can chgrp directories that you own, this option also gives you a relatively easy and persistent way to change which group is chosen for any particular area.

If you fully embrace the idea of Unix processes being in multiple groups, not just having one primary group and then some number of secondary groups, then the BSD choice makes a lot of sense. And for all of its faults, BSD tended to relatively fully embrace its changes (not totally, perhaps partly because it had backwards compatibility issues to consider). While it leads to some odd issues, such as the one I ran into, pretty much any choice here is going to have some oddities. It's also probably the more usable choice in general if you expect much collaboration between different people (well, different Unix logins), partly because it mostly doesn't require people to remember to do things.

(I know that on our systems, a lot of directories intended for collaborative work tend to end up being setgid specifically to get this behavior.)
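On systems without the BSD behavior, setting up such a directory can be sketched as follows. The helper name make_shared_dir and its arguments are hypothetical; the group has to be one you are a member of (or you have to be root).

```shell
# Hypothetical helper: create directory $1, put it in group $2, and make it
# setgid so that things created inside it inherit that group.
# chgrp comes first because changing the group can clear setgid on some systems.
make_shared_dir() {
    mkdir -p "$1" && chgrp "$2" "$1" && chmod g+s "$1"
}
```

You would use it as, say, 'make_shared_dir /srv/project projgrp', where both the path and the group name are made-up examples.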

BSDDirectoryGroupChoice written at 01:00:53

2017-05-03

Sometimes, chmod can fail for interesting reasons

I'll start by presenting this rather interesting and puzzling failure in illustrated form:

; mkdir /tmp/newdir
; chmod g+s /tmp/newdir
chmod: /tmp/newdir: Operation not permitted

How can I not be able to make this chmod change when I just made the directory and I own it? For extra fun, some people on this particular system won't experience this problem, and in fact many of them are the people you might report this problem to, namely the sysadmins.

At first I wondered if this particular /tmp filesystem disallowed setuid and setgid entirely, but it turned out to be not that straightforward:

; ls -ld /tmp/newdir
drwxr-xr-x  2 cks  wheel  512 May  3 00:35 /tmp/newdir

This at least explains why my chmod attempt failed. I'm not in group wheel, and for good reasons you can't make a file setgid to a group that you're not a member of. But how on earth did my newly created directory in /tmp wind up in group wheel, a group I'm not a member of? Well, perhaps someone made /tmp setgid, so all directories created in it inherited its group (presumably group wheel). Let's see:

; ls -ld /tmp
drwxrwxrwt  157 root  wheel  11776 May  3 00:41 /tmp

Although /tmp is indeed group wheel, it has perfectly ordinary permissions (mode 777 and sticky ('t'), so you can only delete or rename your own files). There's no setgid to be seen.

The answer to this mystery is that this is a FreeBSD machine, and on FreeBSD, well, let's quote the mkdir(2) manpage:

The directory's owner ID is set to the process's effective user ID. The directory's group ID is set to that of the parent directory in which it is created.

And also the section of the open(2) manpage that deals with creation of new files:

When a new file is created it is given the group of the directory which contains it.

In other words, on FreeBSD all directories have an implicit setgid bit. Everything created inside them (whether directories or files) inherits the directory's group. Normally this is not a problem and you'll probably never notice, but /tmp (and /var/tmp) are special because they allow everyone to create files and directories in them, and so there are a lot of people making things there who are not a member of the directory's group.
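If you want to check how a particular system and directory behave, a small probe along these lines should do. It is a sketch: the function name and the probe directory name are invented, and it uses the fourth field of 'ls -ldn' because the stat command's options differ between GNU and BSD.

```shell
# Hypothetical diagnostic: report the numeric group of directory $1 and of a
# fresh subdirectory created (and then removed) inside it.
probe_group_inheritance() {
    parent=$1
    probe="$parent/grp-probe.$$"
    mkdir "$probe" || return 1
    parent_gid=$(ls -ldn "$parent" | awk '{ print $4 }')
    probe_gid=$(ls -ldn "$probe" | awk '{ print $4 }')
    rmdir "$probe"
    echo "parent=$parent_gid new=$probe_gid"
}
```

Running 'probe_group_inheritance /tmp' and comparing the result against 'id -g' shows whether the new directory took the parent's group or your own. If the two numbers from the probe match and differ from your own GID, the group was inherited.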

(The sysadmins usually are members of group wheel, though, so things will work for them. This should add extra fun if a user reports the general chmod issue as a problem, since sysadmins can't reproduce it as themselves.)

You might think that this is an obscure issue that no one will ever care about, but actually it caused a Go build failure on FreeBSD for a while. Tracking down the problem took me a while and a bunch of head scratching.

PS: arguably GID 0 should not be group wheel but instead something else that only root is a member of and wheel should be a completely separate group. To have group wheel used for group ownership as well as su access to root is at least confusing.

ChmodInterestingFailure written at 01:39:47

2017-04-29

Some versions of sort can easily sort IPv4 addresses into natural order

Every so often I need to deal with a bunch of IPv4 addresses, and it's most convenient (and best) to have them sorted into what I'll call their natural ascending order. Unfortunately for sysadmins, the natural order of IPv4 addresses is not their lexical order (ie what sort will give you), unless you zero-pad all of their octets. In theory you can zero pad IPv4 addresses if you want, turning 58.172.99.1 into 058.172.099.001, but this form has two flaws; it looks ugly and it doesn't work with a lot of tools.

(Some tools will remove the zero padding, some will interpret zero-padded octets as being in octal instead of decimal, and some will leave the leading zeros on and not work at all; dig -x is one interesting example of the latter. In practice, there are much better ways to deal with this problem and people who zero-pad IPv4 addresses need to be politely corrected.)

Fortunately it turns out that you can get many modern versions of sort to sort plain IPv4 addresses in the right order. The trick is to use its -V argument, which is also known as --version-sort in at least GNU coreutils. Interpreting IPv4 addresses as version numbers is basically exactly what we want, because an all-numeric MAJOR.MINOR.PATCH.SUBPATCH version number sorts in exactly the same way that we want an IPv4 A.B.C.D address to sort.
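As a small illustration, assuming a sort that supports -V (see the sidebar for which ones do), here is a made-up set of addresses being sorted. The wrapper name sort_ipv4 is hypothetical.

```shell
# Hypothetical wrapper: sort IPv4 addresses (one per line) into natural order.
sort_ipv4() {
    sort -V "$@"
}

printf '%s\n' 58.172.99.1 10.0.0.2 9.255.0.1 58.172.9.200 | sort_ipv4
# prints, in order: 9.255.0.1 10.0.0.2 58.172.9.200 58.172.99.1
```

A plain lexical sort would instead put 10.0.0.2 first and 58.172.99.1 ahead of 58.172.9.200.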

Unfortunately as far as I know there is no way to sort IPv6 addresses into a natural order using common shell tools. The format of IPv6 addresses is so odd and unusual that I expect we're always going to need a custom program for it, although perhaps someday GNU Sort will grow the necessary superintelligence.

This is a specific example of the kind of general thinking that you need in order to best apply Unix shell tools to your problems. It's quite helpful to always be on the lookout for ways that existing features can be reinterpreted (or creatively perverted) in order to work on your problems. Here we've realized that sort's idea of 'version numbers' includes IPv4 addresses, because from the right angle both they and (some) version numbers are just dot-separated sequences of numbers.

PS: with brute force, you can use any version of sort that supports -t and -k to sort IPv4 addresses; you just need the right magic arguments. I'll leave working them out (or doing an Internet search for them) as an exercise for the reader.
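For readers who would rather skip the exercise, one commonly used set of arguments is sketched below: split on dots and sort each of the four fields numerically in turn. The wrapper name is hypothetical, and it assumes nothing but plain dotted-quad addresses, one per line.

```shell
# Hypothetical wrapper: sort IPv4 addresses using only -t and -k.
# Each -k N,Nn restricts the key to field N alone and compares it numerically.
sort_ipv4_portable() {
    sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n "$@"
}
```

Piping a list of addresses through sort_ipv4_portable should give the same order as sort -V does for them.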

PPS: for the gory details of how GNU sort treats version sorting, see the GNU sort manual's section on details about version sort. Okay, technically it's ls's section on version sorting. Did you know that GNU coreutils ls can sort filenames partially based on version numbers? I didn't until now.

(This is a more verbose version of this tweet of mine, because why should I leave useful stuff just on Twitter.)

Sidebar: Which versions of sort support this

When I started writing this entry, I assumed that sort -V was a GNU coreutils extension and would only be supported by the GNU coreutils version. Unixes with other versions (or with versions that are too old) would be out of luck. This doesn't actually appear to be the case, to my surprise.

Based on the GNU Coreutils NEWS file, it appears that 'sort -V' appeared in GNU coreutils 7.0 or 7.1 (in late 2008 to early 2009). The GNU coreutils sort is used by most Linux distributions, including all of the main ones, and almost anything that's modern enough to be getting security updates should have a version of GNU sort that is recent enough to include this.

Older versions of FreeBSD appear to use an old version of GNU coreutils sort; I have access to a FreeBSD 9.3 machine that reports that /usr/bin/sort is GNU coreutils sort 5.3.0 (from 2004, apparently). Current versions of FreeBSD and OpenBSD have switched to their own version of sort, known as version '2.3-FreeBSD', but this version of sort also supports -V (I think the switch happened in FreeBSD 10, because a FreeBSD 10.3 machine I have access to reports this version). Exactly how -V orders things is probably somewhat different between GNU coreutils sort and FreeBSD/OpenBSD sort, but it doesn't matter for IPv4 addresses.

The Illumos /usr/bin/sort is very old, but I know that OmniOS ships /usr/gnu/bin/sort as standard and really you want /usr/gnu/bin early in your $PATH anyways. Life is too short to deal with ancient Solaris tool versions with ancient limitations.

SortingIPv4Addresses written at 01:26:50

2017-04-08

Wayland is now the future of Unix graphics and GUIs

The big Unix graphics news of the past week is that Ubuntu threw in the towel on their Unity GUI and with it their Mir display server (see the Ars story for more analysis). I say 'Unix' instead of 'Linux' here because I think this is going to have consequences well beyond Linux.

While there was a three-way fight for the future between Wayland, Ubuntu's Mir, and the default of X, it was reasonably likely that support for X was going to remain active in things like Firefox, KDE, and even Gnome. As a practical matter, Mir and Wayland were both going to support X programs, so if you targeted X (possibly as well as Wayland and/or Mir) you could run on everything and people would not be yelling at you and so on. But, well, there isn't a three-way fight any more. There is only X and Wayland now, and that makes Wayland the path forward by default. With only one path forward, the pressure for applications and GUI environments to remain backwards compatible to X is going to be (much) lower. And we already know how the Gnome people feel about major breaking changes; as Gnome 3 taught us, the Gnome developers are perfectly fine with them if they think the gain is reasonable.

In short: running exclusively on Wayland is the future of Gnome and Gnome-based programs, which includes Firefox; I suspect that it's also the future of KDE. It's not an immediate future, but in five years I suspect that it will be at least looming if not arriving. At that point, anyone who is not running Wayland will not be getting modern desktop software and programs and sooner or later won't be getting browser security fixes for what they currently have.

People run desktop software on more Unixes than just Linux. With Gnome and important desktop apps moving to Wayland, those Unixes face a real problem; they can live with old apps, or they can move to Wayland too. FreeBSD is apparently working seriously on Wayland support (cf), and at one point a Dragonfly BSD developer had Wayland running there. OpenBSD? Don't hold your breath. Solaris? That's up to Oracle these days but I don't really expect it; it would be a lot of work and I can't imagine that Oracle has many customers who will pay for it. Illumos? Probably not unless someone gets very energetic.

With that said, old X programs and environments are not going to suddenly go away. Fvwm will be there for years or decades to come, for example, as will xterm and any number of other current X programs and window managers. But people who are stuck in X will also be increasingly stuck in the past, unable to run current versions of more and more programs.

(For some people, this will be just fine. We're probably going to see a fairly strong sorting function among the free Unixes for what sort of person winds up where, which is going to make cultural issues even more fun than usual.)

PS: Some people may sneer at 'desktop software and programs', but this category includes quite a lot of things that are attractive but by and large desktop agnostic, like photography programs, Twitter clients, and syndication feed readers. Most modern graphical programs on Unix are built on top of some mid-level toolkit like GTK+ or Qt, not on basic X stuff, because those mid-level toolkits make it so much faster and easier to put together GUIs. If and when those toolkits become Wayland-only and the latest versions of the programs move to depend on recent versions of the toolkits, the programs become Wayland-only too.

WaylandNowTheFuture written at 00:28:59

2017-04-05

Why the modern chown command uses a colon to separate the user and group

In the beginning, all chown(1) did was change the owner of a file; if you wanted to change a file's group too, you had to use chgrp(1) as well. This is actually more unusual than I realized before I started to write this entry, because even in V7 Unix the chown(2) system call itself could change both user and group, per the V7 chown(2) manpage. Restricting chown(1) to only changing the owner did make the command itself pretty simple, though.

By the time of 4.1c BSD, chown(1) had become chown(8), because, per the manual page:

Only the super-user can change owner,
in order to simplify as yet unimplemented accounting procedures.

(The System V line of Unixes would retain an unrestricted chown(2) system call for some time and thus I believe they kept the chown command in section 1, for general commands anyone could use.)

In 4.3 BSD, someone decided that chown(8) might as well let you change the group at the same time, to match the system call. As the manual page covers, they used this syntax:

/etc/chown [ -f -R ] owner[.group] file ...

That is, to chown a file to user cks, group staff, you did 'chown cks.staff file'.

This augmented version of the chown command was picked up by various Unixes that descended from 4.x BSD, although not immediately (like many things from 4.3 BSD, it took a while to propagate around). Sometimes this was the primary version of chown, found in /usr/bin or the like; sometimes this was a compatibility version, in /usr/ucb (Solaris through fairly late, for example). Depending on how you set up your $PATH on such systems, you could wind up using this version of chown and thus get used to having 'user:group' rejected as an error.

Then, when it came time for POSIX to standardize this, someone woke up and created the modern syntax for changing both owner and group at once. As seen in the Single Unix Specification for chown, this is 'chown owner[:group] file', ie the separator is now a colon. Since POSIX and the SUS normally standardized existing practice (where it actually existed), you might wonder why they changed it. The answer is simple: a colon is not a valid character in a login, while a dot is.

Sure, dots are unusual in Unix logins in most places, but they're legal and they do show up in some environments (and they're legal in group names as well). Colons are outright illegal unless you like explosions, fundamentally because they're the field separator character in /etc/passwd and /etc/group. The SUS manpage actually has an explicit discussion of this in the RATIONALE section, although it doesn't tell you what it means by 'historical implementations'.
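The difference is easy to see if you try to split an owner-and-group argument yourself. This is an illustrative sketch only (the function name is invented and no real chown is implemented this way as far as I know): with a colon the split is unambiguous even for a login like 'first.last', whereas with a dot you cannot tell whether 'first.last' is one login or owner 'first' and group 'last'.

```shell
# Hypothetical parser for an "owner[:group]" argument.
split_owner_group() {
    spec=$1
    case $spec in
        *:*) owner=${spec%%:*}; group=${spec#*:} ;;
        *)   owner=$spec; group= ;;
    esac
    printf 'owner=%s group=%s\n' "$owner" "$group"
}

split_owner_group first.last:staff    # prints "owner=first.last group=staff"
```

Because a colon can never appear in a login or group name, the first colon is always the separator.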

(The SUS manpage also discusses a scenario where using chown and chgrp separately isn't sufficient, and you have to make the change in a single chown() system call.)

PS: Since I think I ran into this dot-versus-colon issue on our old Solaris 10 fileservers, I probably had /usr/ucb before /usr/bin in my $PATH there. I generally prefer UCB versions of things to the stock Solaris versions, but in this case it tripped me up.

PPS: It turns out the GNU chown accepts the dot form as well provided that it's unambiguous, although this is covered only in the chown info file, and is not mentioned in the normal manpage.

ChownColonSeparatorWhy written at 00:47:44

2017-03-13

What should it mean for a system call to time out?

I was just reading Evan Klitzke's Unix System Call Timeouts (via) and among a number of thoughts about it, one of the things that struck me is a simple question. Namely, what should it mean for a Unix system call to time out?

This question may sound pointlessly philosophical, but it's actually very important because what we expect a system call timeout to mean will make a significant difference in how easy it would be to add system calls with timeouts. So let's sketch out two extreme versions. The first extreme version is that if a timeout occurs, the operation done by the system call is entirely abandoned and undone. For example, if you call rename("a", "b") and the operation times out, the kernel guarantees that the file a has not been renamed to b. This is obviously going to be pretty hard, since the kernel may have to reverse partially complete operations. It's also not always possible, because some operations are genuinely irreversible. If you write() data to a pipe and time out partway through doing so (with some but not all data written), you cannot reach into the pipe and 'unwrite' all of the already sent data; after all, some of it may already have been read by a process on the other side of the pipe.

The second extreme version is that having a system call time out merely causes your process to stop waiting for it to complete, with no effects on the kernel side of things. Effectively, the system call is shunted to a separate thread of control and continues to run; it may complete some time, or it may error out, but you never have to wait for it to do either. If the system call would normally return a new file descriptor or the like, the new file descriptor will be closed immediately when the system call completes. In practice implementing a strict version of this would also be relatively hard; you'd need an entire infrastructure for transferring system calls to another kernel context (or more likely, transplanting your user-level process to another kernel context, although that has its own issues). This is also at odds with the existing system calls that take timeouts, which generally result in the operation being abandoned part way through with no guarantees either way about its completion.

(For example, if you make a non-blocking connect() call and then use select() to wait for it with a timeout, the kernel does not guarantee that if the timeout fires the connect() will not be completed. You are in fact in a race between your likely close() of the socket and the connection attempt actually completing.)
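As a rough sketch of that connect() pattern in Python (assuming a Unix system and IPv4 TCP; connect_with_timeout is a name I've made up here, not a standard function):

```python
import errno
import os
import select
import socket

def connect_with_timeout(addr, timeout):
    """Return a connected socket, or None if `timeout` seconds pass first."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(False)
    err = s.connect_ex(addr)  # normally EINPROGRESS for a non-blocking socket
    if err not in (0, errno.EINPROGRESS):
        s.close()
        raise OSError(err, os.strerror(err))
    # The socket becomes writable once the connection attempt has finished,
    # whether it succeeded or failed.
    _, writable, _ = select.select([], [s], [], timeout)
    if not writable:
        # Timed out. Nothing has cancelled the connection attempt, so this
        # close() races with the kernel possibly completing the connection.
        s.close()
        return None
    err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        s.close()
        raise OSError(err, os.strerror(err))
    return s
```

Note that a None return only tells you that you stopped waiting. It says nothing about whether the other end saw a connection come and go.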

The easiest thing to implement would probably be a middle version. If a timeout happens, control returns to your user level with a timeout indication, but the operation may be partially complete and it may be either abandoned in the middle of things or completed for you behind your back. This satisfies a desire to be able to bound the time you wait for system calls to complete, but it does leave you with a messy situation where you don't know either what has happened or what will happen when a timeout occurs. If your mkdir() times out, the directory may or may not exist when you look for it, and it may or may not come into existence later on.
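You can get a feel for what this is like for the caller with a user-level imitation. The sketch below is only an approximation under my own assumptions: it runs the call in a helper thread and stops waiting after a while, which is not how a kernel would implement timeouts, and call_with_timeout is a hypothetical helper, not an existing API.

```python
import threading

def call_with_timeout(func, timeout, *args):
    """Run func(*args) in a helper thread and wait at most `timeout` seconds.

    Returns (finished, outcome). If finished is False, the call is still
    running behind our back and outcome may be filled in later.
    """
    outcome = {}

    def runner():
        try:
            outcome["value"] = func(*args)
        except Exception as exc:
            outcome["error"] = exc

    t = threading.Thread(target=runner, daemon=True)
    t.start()
    t.join(timeout)  # stop waiting after `timeout`; the call itself goes on
    return (not t.is_alive()), outcome
```

If you use this as call_with_timeout(os.mkdir, 0.5, path) and finished comes back False, you are in exactly the messy situation described above: checking for the directory right now tells you nothing about whether it will exist a second from now.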

(Implementing timeouts in the kernel is difficult for the same reason that asynchronous IO is hard; there is a lot of kernel code that is much simpler if it's written in straight line form, where it doesn't have to worry about abandoning things part way through at essentially any point where it may have to wait for the outside world.)

SystemCallTimeoutMeaning written at 01:03:40

2017-03-06

Modern X Windows can be a very complicated environment

I mentioned Corebird, the GTK+ Twitter client, the other day, and was generally positive about it. That was on a logical weekend. The next day I went in to the office, set up Corebird there, and promptly ran into a problem: I couldn't click on links in Tweets, or rather I could but it didn't activate the link (it would often do other things). Corebird wasn't ignoring left mouse clicks in general, it's just that they wouldn't activate links. I had not had this problem at home (or my views would not have been so positive). I use basically the same fvwm-based window manager environment at home and at work, but since Corebird is a GTK+ application and GTK+ applications can be influenced by all sorts of magic settings and (Gnome) setting daemons, I assumed that it was something subtle that was different in my work GTK+/Gnome environment and filed a Fedora bug in vague hopes. To my surprise, it turned out to be not merely specific to fvwm, but specific to one aspect of my particular fvwm mouse configuration.

The full version is in the thread on the fvwm mailing list, but normally when you click and release a button, the X server generates two events, a ButtonPress and then a ButtonRelease. However, if fvwm was configured in a way such that it might need to do something with a left button press, a different set of events was generated for the press:

  • a LeaveNotify with mode NotifyGrab, to tell Corebird that the mouse pointer had been grabbed away from it (by fvwm).
  • an EnterNotify with mode NotifyUngrab, to tell Corebird 'here is your mouse pointer back because the grab has been released' (because fvwm was passing the button press through to Corebird).
  • the ButtonPress for the mouse button.

The root issue appears to be that something in the depths of GTK+ takes the LeaveNotify to mean that the link has lost focus. Since GTK+ doesn't think the link is focused, when it receives the mouse click it doesn't activate the link, but it does take other action, since it apparently still understands that the mouse is being clicked in the text of the GtkLabel involved.

(There's a test program that uses a simple GtkLabel to demonstrate this, see this, and apparently there are other anomalies in GtkLabel's input processing in this area.)

If you think that this all sounds very complex, yes, exactly. It is. X has a complicated event model to start with, and then interactions with the window manager add extra peculiarities on top. The GTK+ libraries are probably strictly speaking in the wrong here, but I also rather suspect that this is a corner case that the GTK+ programmers never imagined, much less encountered. In a complex environment, some possibilities will drop through the cracks.

(If you want to read a high level overview of passive and active (mouse button) grabs, see eg this 2010 writeup by Peter Hutterer. Having read it, I feel like I understand a bit more about what fvwm is doing here.)

By the way, some of this complexity is an artifact of the state of computing when X was created, specifically that both computers and networking were slow. Life would be simpler for everyone if all X events were routed through the window manager and then the window manager passed them on to client programs as appropriate. However, this would require all events to pass through an extra process (and possibly an extra one or two network hops), and in the days when X was young this could have had a real impact on overall responsiveness. So X goes to a great deal of effort to deliver events directly to programs whenever possible while still allowing the window manager to step in.

(My understanding is that in Wayland, the compositor handles all events and passes them to clients as it decides. The Wayland compositor is a lot more than just the equivalent of an X window manager, but it fills that role, and so in Wayland this issue wouldn't come up.)

ModernXCanBeVeryComplex written at 22:39:05
