Wandering Thoughts

2019-10-22

Groups of processes are a frequent and fundamental thing in Unix

Recently, I wrote about a gotcha when catching Control-C in programs that are run from scripts, where things could go wrong because the Control-C was delivered not just to the program but also to the shell script, which wasn't expecting it (while the program was). From the way I wrote that entry (which focused on a gotcha involving this group signalling behavior), you might wind up with the impression that this behavior of Unix signals is a wart in Unix. In fact, it's not; that signals from things like Control-C behave this way is an important part of Unix shell usability.

The core reason for this is that in Unix, it's very common for a group of processes to be one entity as far as you're concerned. Unix likes processes and it likes assembling things out of groups and trees of processes, and so you wind up with what people think of as one entity that is actually composed of multiple processes. When you do things like type a Control-C, you almost always want to operate on the entity as a whole, not any specific process in it, and so Unix supports this by sending terminal signals to its best guess at the group of processes that are one thing.

That sounds pretty abstract, so let's make it concrete. One simple case of a group of processes acting as one entity is the shell pipeline:

$ prog1 <somefile | prog2 | prog3 | prog4

If you type a Control-C, almost everyone wants the entire pipeline to be interrupted and exit. It's not sufficient for the kernel to just signal one process, let it exit, and hope that this causes all of the other ones to hit pipe IO errors, because one of those programs (say prog2) could be engaged in a long, slow computation before it reads or writes to a pipe.

(As a sysadmin, one of my common cases here is 'fgrep some-pattern big-file | tail -10', and then if it takes too long I get impatient and Ctrl-C the whole thing.)
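You can actually see the group that the kernel will signal by looking at process group IDs, since the terminal driver delivers Control-C's SIGINT to the whole foreground process group. A minimal sketch (the PIDs shown are made up):

$ sleep 300 | sleep 301 | sleep 302 &
$ ps -o pid,pgid,comm | grep sleep
 4321  4321 sleep
 4322  4321 sleep
 4323  4321 sleep

All three processes in the pipeline share one process group ID, and you can signal the entire group by hand with a negated PGID, eg 'kill -INT -- -4321', which is more or less what the kernel's tty handling does for you.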

Shell scripts are another obvious case; since the shell is such a relatively limited language, almost all shell scripts run plenty of external programs even when they're not using pipes. That creates at least two processes (the shell script and the external program), and again when you Ctrl-C the command you want both of them to be interrupted.

A final common case for a certain sort of person is running make. Especially for large programs, a make run can create quite deep trees of processes (and go through quite a lot of them). And again, if you Ctrl-C your make, you want everything to be interrupted (and promptly).

(Unix could delegate this responsibility to some single process in this situation, such as the master process for a shell script or make itself. But for much the same reason that basic terminal line editing belongs in the kernel, Unix opts to have the kernel do it.)

ProcessGroupsEverywhere written at 23:39:41

2019-10-03

Making changes to multiple files at once in Vim

We recently finished switching the last of our machines to a different client for Let's Encrypt, and as part of that switch the paths to our TLS certificates had to be updated in all of the configuration files using them. On a lot of our machines there's only a single configuration file, but on some of our Apache servers we have TLS certificate paths in multiple files. This made me quite interested in finding out how to do the same change across multiple files in Vim. It turns out that Vim has supported this for a long time and you can go about it in a variety of ways. Some of these ways expose what I would call a quirk of Vim and other people probably call long-standing design decisions.

Under most circumstances, or more specifically when I'm editing only a moderate number of files, the easiest thing for me to do is to use the very convenient ':all' command to open a window for every buffer, and then use ':windo' to apply a command for every window, eg ':windo %s/.../.../'. Then I'd write out all of the changed buffers with ':wa'.
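Put together, and with hypothetical certificate paths standing in for our real ones, the whole sequence is something like this:

:all
:windo %s#/etc/old-client/certs#/etc/new-client/certs#ge
:wa

(The 'e' flag on the substitution keeps a 'pattern not found' error from aborting the ':windo' partway through if some file doesn't contain the pattern.)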

The Vim tips on Search and replace in multiple buffers and Run a command in multiple buffers also cover ':bufdo', which runs things on all buffers whether or not they're in windows. That was in fact the first thing I tried, except I left off the magic bit to write out changed buffers and Vim promptly stopped after making my change in the first file. This is what I consider a historical quirk, although we're stuck with it.

Vi has long had multiple buffers, but it's always been pretty stubborn about getting you to write out your changes to the current buffer before you moved to another one. It's easy to see the attraction of getting people to do this on the small and relatively heavily loaded machines that vi was originally written on, since a clean buffer is a buffer that you don't have to retain in memory or in a scratch file on disk (and it's also not at risk if either vi or the machine crashes). However, these days it's at odds with how most other multi-file editors approach the problem. Most of them will let you keep any number of modified buffers around without complaint, and merely stop you from quitting without saving them or actively discarding them. Not hassling you all of the time makes these editors a bit easier to use, and Vim is already a bit inconsistent here since windows are allowed to be changed without preventing you from switching away from them.

Given my views here, I probably want to set 'hidden' to on. Unless I'm very confident in my change, I don't want to add '| update' to the ':bufdo' command to immediately write out updates, and as noted 'hidden' being on makes Vim behave more like other editors. The drawbacks that the Vim documentation notes don't apply to me; I never use ':q!' or ':qa!' unless my intention is explicitly to discard all unsaved changes.
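With 'hidden' on, the plain ':bufdo' version of the same change then looks like this (again with hypothetical paths):

:set hidden
:bufdo %s#/etc/old-client/certs#/etc/new-client/certs#ge
:wa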

It's possible to do this in a trickier way with the ':hide' command. Having just experimented with this, I can say that it should be done as:

:hide bufdo %s/.../.../

I don't think there's a perfect way to undo the effects of a multi-file operation that didn't work out as I intended. If all buffers were unchanged before the bufdo or windo, I can use it again to invoke undo in each buffer, with ':bufdo u'. With unchanged buffers, this is harmless if my cross-file operation didn't actually change a particular file. If there are unsaved changes in some buffers, though, this becomes dangerous because the undo in each buffer is blind; it will undo the most recent change whether or not that came from the first 'bufdo'.

(All of this tells me that I should carefully (re)read the Vim buffer FAQ, because how Vim does buffers, files, tabs, and windows is kind of confusing. GNU Emacs is also confusing here in its own way, but at least with it I understand the history.)

On the whole, ':all' and then ':windo ...' is the easier to remember and easier to use option, and it lets me immediately inspect some of the changes across all of the files involved. So it's likely to be what I normally use. It's not as elegant as the various other options and I'm sure that Vim purists will sigh, but I'm very much not a Vim purist.

(This is one of those entries that I write for my own future reference. Someday I'll write a page in CSpace to collect all of the Vim things that I want to remember and keep using regularly, but for now blog entries will have to do.)

VimMultiFileChanges written at 23:54:31

2019-09-21

Why chroot is a security feature for (anonymous) FTP

I recently ran across Is chroot a security feature? (via); following Betteridge's law of headlines, the article's answer is 'no', for good reasons that I will let you read in the article. However, I mildly disagree with the article on a philosophical level for the case of anonymous ftp and things like it. Chroot is a security feature for ftpd because ftpd does something special; anonymous ftp adds an additional security context to your system that wasn't there before.

Before you set up anonymous ftp, your system had the familiar Unix security contexts of user, group, and 'all logins'. Anonymous ftp adds the additional context of 'everyone on the network'. This context is definitely not the same as 'everyone with a login on the system' (it's much broader), and so there's good reasons to want to distinguish between the two. This is especially the case if you allow people to write things through anonymous ftp, since Unixes traditionally have and rely on various generally writable directories (not just /tmp and /var/tmp, but also things like queue submission directories). You almost certainly don't want to open those up to everyone on the network just because you opened them up to everyone on the machine.

(The more your Unix machine is only used by a small group of people and the broader the scope of the network it's on, the more difference there is between these contexts. If you take a small research group's Unix machine and put it on the ARPANET, you have a relatively maximal case.)

Ftpd could implement this additional security context itself, as most web servers do. But as web servers demonstrate, this would be a bunch of code and configuration, and it wouldn't necessarily always work (over the years, various web servers and web environments have had various bugs here). Rolling your own access permission system is a complicated thing. Having the kernel do it for you in a simple and predictable way is much easier, and that way you get chroot.

(Now that I've followed this chain of thought, I don't think it's a coincidence that the first use of chroot() for security seems to have been 4.2 BSD's ftpd.)

ChrootFtpdAndContexts written at 23:58:33

2019-09-10

Catching Control-C and a gotcha with shell scripts

Suppose, not entirely hypothetically, that you have some sort of spiffy program that wants to use Control-C as a key binding to get it to take some action. In Unix, there are two ways of catching Control-C for this sort of thing. First, you can put the terminal into raw mode, where Control-C becomes just another character that you read from the terminal and you can react to it in any way you like. This is very general but it has various drawbacks, like you have to manage the terminal state and you have to be actively reading from the terminal so you can notice when the key is typed. The simpler alternative way of catching Control-C is to set a signal handler for SIGINT and then react when it's invoked. With a signal handler, the kernel's standard tty input handling does all of that hard work for you and you just get the end result in the form of an asynchronous SIGINT signal. It's quite convenient and leaves you with a lot less code and complexity in your spiffy Control-C catching program.

Then some day you run your spiffy program from inside a shell script (perhaps you wanted to add some locking), hit Control-C to signal your program, and suddenly you have a mess (what sort of a mess depends on whether or not your shell does job control). The problem is that when you let the kernel handle Control-C by delivering a SIGINT signal, it doesn't just deliver it to your program; it delivers it to the shell script and in fact any other programs that the shell script is also running (such as a flock command used to add locking). The shell script and these other programs are not expecting to receive SIGINT signals and haven't set up anything special to handle it, so they will get killed.

(Specifically, the kernel will send the SIGINT to all processes in the foreground process group.)

Since your shell was running the shell script as your command and the shell script exited, many shells will decide that your command has finished. This means they'll show you the shell prompt and start interacting with you again. This can leave your spiffy program and your shell fighting over terminal output and perhaps terminal input as well. Even if your shell and your spiffy program don't fight for input and write their output and shell prompt all over each other, generally things don't go well; for example, the rest of your shell script isn't getting run, because the shell script died.

Unfortunately there isn't a good general way around this problem. If you can arrange it, the ideal is for the wrapper shell script to wind up directly exec'ing your spiffy program so there's nothing else a SIGINT will be sent to (and kill). Failing that, you might have to make the wrapper script trap and ignore SIGINT while it's running your program (and to make your program unconditionally install its SIGINT signal handler, even if SIGINT is ignored when the program starts).
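As a concrete sketch of the second approach, with 'spiffy' standing in for your actual program, the wrapper might look like:

#!/bin/sh
# If possible, just end the script with 'exec spiffy "$@"' instead.
# Otherwise, ignore SIGINT in the wrapper while the program runs;
# processes started from here on inherit SIGINT as ignored, so spiffy
# must install its SIGINT handler unconditionally.
trap '' INT
spiffy "$@"
rc=$?
trap - INT
exit $rc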

Speaking from painful personal experience, this is an easy issue to overlook (and a mysterious one to diagnose). And of course everything works when you test your spiffy program by running it directly, because then the only process getting a SIGINT is the one that's prepared for it.

CatchingCtrlCAndScripts written at 20:54:47

2019-08-21

Making sense of OpenBSD 'pfctl -ss' output for firewall state tables

Suppose, not entirely hypothetically, that you have some OpenBSD firewalls and every so often you wind up looking at the state table listing that's produced by 'pfctl -ss'. On first impression, this output looks sort of understandable, with entries like:

all tcp 172.17.110.193:22 <- 128.100.3.X:46392       ESTABLISHED:ESTABLISHED
all tcp 128.100.3.X:46392 -> 172.17.110.193:22       ESTABLISHED:ESTABLISHED

I won't say that appearances are deceptive here, but things are not as straightforward as they look once you start wanting to know what this is really telling you. For instance, there is no documentation on what that 'all' actually means. Since I've been digging into this, here's what I've learned.

The general form of a state table entry as printed by 'pfctl -ss' is:

IFACE PROTO LEFT-ADDR DIR RIGHT-ADDR   LEFT-STATE:RIGHT-STATE

At least for our firewalls, the interface is generally 'all' (apparently because our states are floating states, not bound to a specific interface, which is PF's default 'set state-policy'). The protocol can be any number of things, including tcp, udp, icmp, esp, ospf, and pfsync. For TCP connections, the listed states are the TCP states (and you can get all of the weird and wonderful conditions where the two directions of the connection are in different states, such as half-closed connections). For other protocols there's a smaller list; see the description of 'set timeout' in pf.conf's OPTIONS section for a discussion of most of them. There's also a NO_TRAFFIC state for when no traffic has happened in one direction.

So let's talk about directions, the field I've called DIR above, which will always be either '<-' or '->'; these mean in and out respectively. By that I mean PF_IN and PF_OUT (plus PF_FWD for forwarded packets), not 'inside' and 'outside'. OpenBSD PF doesn't have any notion of inside and outside interfaces, but it does have a notion of incoming traffic and outgoing traffic, and that is what ultimately determines the direction. If a packet is matched or handled during input and that creates a state table entry, that will be an in entry; similarly, matching or passing it during output will create an out entry. Sometimes this is through explicit 'pass in' and 'pass out' rules, but other times you have a bidirectional rule (eg 'match on <IF> ... binat-to ...') and then the direction depends on packet flow.

The first thing to know is that contrary to what I believed when I started writing this entry, all state table entries are created by rules. As far as I can tell, there are no explicit state table entries that get added to handle replies; the existing 'forward' state table entries are just used in reverse to match the return traffic. The reason that state table entries usually come in pairs (at least for us) is that we have both 'pass in' and 'pass out' rules that apply to almost all packets, and so both rules create a corresponding state table entry for a specific connection. An active, permitted connection will thus have two state table entries, one for the 'pass in' rule that allows it in and one for the 'pass out' rule that allows it out.

The meaning of the left and the right address changes depending on the direction. For an out state table entry, the left address is the (packet or connection) source address and the right address is the destination address; for an in state table entry it's reversed, with the left address the destination and the right address the source. The LEFT-STATE and RIGHT-STATE fields are associated with the left and the right addresses respectively, whatever they are, and for paired up state table entries I believe they're always going to be mirrors of each other.

(I believe that the corollary of this is that the NO_TRAFFIC state can only appear on the destination side, ie the side that didn't originate the packet flow. This means that for an out state NO_TRAFFIC will always be the right state, and on an in state it will always be the left one.)

So far I have shown a pair of state table entries from a simple firewall without any sort of NAT'ing going on (which includes 'rdr-to' rules). If you have some sort of NAT in effect, the output changes and generally that change will be asymmetric between the pair of state table entries. Here is an example:

all tcp 128.100.X.X:22 <- 172.17.110.193:58240       ESTABLISHED:ESTABLISHED
all tcp 128.100.3.Y:60689 (172.17.110.193:58240) -> 128.100.X.X:22       ESTABLISHED:ESTABLISHED

This machine has made an outgoing SSH connection that was first matched by a 'pass in' rule and then NAT'd on output. Inbound NAT creates a different set of state table entries:

all tcp 10.X.X.X:22 (128.100.20.X:22) <- 1.2.3.4:52000       ESTABLISHED:ESTABLISHED
all tcp 1.2.3.4:52000 -> 10.X.X.X:22       ESTABLISHED:ESTABLISHED

The rule is that the pre-translation address is in () and the post-translation address is not. On outbound NAT, the pre-translation address is the internal address and the post-translation one is the public IP; on inbound NAT it's the reverse. Notice that this time the NAT was applied on input, not on output, and of course there was a 'pass in' rule that matched.

(If you have binat-to machines they can have both sorts of entries at once, with some connections coming in from outside and some connections going outside from the machine.)

If you do your NAT through bidirectional rules (such as 'match on <IF> ...'), where NAT is applied is determined by what interface you specify in the rule combined with packet flow. This is our case; all of our NAT rules are applied on our perimeter firewall's external interface. If we applied them to the internal interface, we could create situations where the right address had the NAT mapping instead of the left one. The resulting state table entries would look like this (for an inbound connection that was RDR'd):

all tcp 128.100.3.X:25 <- 128.100.A.B:39304       ESTABLISHED:ESTABLISHED
all tcp 128.100.A.B:39304 -> 128.100.3.YYY:25 (128.100.3.X:25)       ESTABLISHED:ESTABLISHED

This still follows the rule that the pre-translation address is in the () and the post-translation address is not.

In general, given only a set of state table entries, you don't know what is internal and what is external. This is true even when NAT is in effect, because you don't necessarily know where NAT is being applied (as shown here; all NAT'd addresses are internal ones, but they show up almost everywhere). If you know certain things about your rules, you can know more from your state table entries (without having to do things like parse IP addresses and match network ranges). Given how and where we apply NAT, it's always going to appear in our left addresses, and if it appears on an in state table entry it's an external machine making an inbound connection instead of an internal machine making an outgoing one.
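Since NAT'd entries are the only ones with a '(...)' field and the direction arrows are distinctive, you can slice up 'pfctl -ss' output with ordinary text tools. A rough sketch:

# count in versus out state table entries
pfctl -ss | grep -c ' <- '
pfctl -ss | grep -c ' -> '
# show only entries with a NAT translation
pfctl -ss | grep '('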

PS: According to the pfctl code, you may sometimes see extra text in the left or right address that looks like '{ <IP address> }'. I believe this appears only if you use af-to to do NAT translation between IPv4 and IPv6 addresses. I'm not sure if it lists the translated address or the original.

PPS: Since I just tested this, the state of an attempted TCP connection in progress to something that isn't responding is SYN_SENT for the source paired with CLOSED for the destination. An attempted TCP connection that has been refused by the destination with a RST has a TIME_WAIT:TIME_WAIT state. Both of these are explicitly set in the relevant pf.c code; see pf_create_state and pf_tcp_track_full (for the RST handling). Probably those are what you'd expect from the TCP state transitions in general.
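If you want to reproduce the first case yourself, something like this works as a sketch; 192.0.2.1 is a reserved documentation address that should never answer:

nc -w 30 192.0.2.1 22 &
pfctl -ss | grep 192.0.2.1

While nc is still trying to connect, you should see a state table entry with SYN_SENT on the source side and CLOSED on the destination side.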

Sidebar: At least three ways to get singleton state table entries

I mentioned that state table entries usually come in pairs. There are at least three exceptions. The first is state table entries for traffic to the firewall itself, including both pings and things like SSH connections; these are accepted in 'pass in' rules but are never sent out to anywhere, so they never get a second entry. The second is traffic that is accepted by 'pass in' rules but then matches some 'block out' rule so that it's not actually sent out. The third and most obvious exception is if you match in one direction with 'no state' but use state in the other, perhaps by accident or omission.

(Blocked traffic tends to have NO_TRAFFIC as the state for one side, but not all NO_TRAFFIC states are because of blocks; sometimes they're just because you're sending traffic to something that doesn't respond.)

I was going to say things about the relative number of in and out states as a consequence and corollary of this, but now that I've looked at our actual data I'm afraid I have no idea what's going on.

(I think that part of it is that for TCP connections, you can have closed down or inactive connections where one state table entry expires before the other. This may apply to non-TCP connections too, but my head hurts. For that matter, I'm not certain that 'pfctl -ss' is guaranteed to report a coherent copy of the state table. Pfctl does get it from the kernel in a single ioctl(), but the kernel may be mutating the table during the process.)

OpenBSDPfctlStates written at 20:52:57

2019-08-07

What has to happen with Unix virtual memory when you have no swap space

Recently, Artem S. Tashkinov wrote on the Linux kernel mailing list about a Linux problem under memory pressure (via, and threaded here). The specific reproduction instructions involved having low RAM, turning off swap space, and then putting the system under load, and when that happened (emphasis mine):

Once you hit a situation when opening a new tab requires more RAM than is currently available, the system will stall hard. You will barely be able to move the mouse pointer. Your disk LED will be flashing incessantly (I'm not entirely sure why). [...]

I'm afraid I have bad news for the people snickering at Linux here; if you're running without swap space, you can probably get any Unix to behave this way under memory pressure. If you can't on your particular Unix, I'd actually say that your Unix is probably not letting you get full use out of your RAM.

To simplify a bit, we can divide pages of user memory up into anonymous pages and file-backed pages. File-backed pages are what they sound like; they come from some specific file on the filesystem that they can be written out to (if they're dirty) or read back in from. Anonymous pages are not backed by a file, so the only place they can be written out to and read back in from is swap space. Anonymous pages mostly come from dynamic memory allocations and from modifying the program's global variables and data; file-backed pages come mostly from mapping files into memory with mmap() and also, crucially, from the code and read-only data of the program.

(A file-backed page can turn into an anonymous page under some circumstances.)

Under normal circumstances, when you have swap space and your system is under memory pressure a Unix kernel will balance evicting anonymous pages out to swap space and evicting file-backed pages back to their source file. However, when you have no swap space, the kernel cannot evict anonymous pages any more; they're stuck in RAM because there's nowhere else to put them. All the kernel can do to reclaim memory is to evict whatever file-backed pages there are, even if these pages are going to be needed again very soon and will just have to be read back in from the filesystem. If RAM keeps getting allocated for anonymous pages, there is less and less RAM left to hold whatever collection of file-backed pages your system needs to do anything useful and your system will spend more and more time thrashing around reading file-backed pages back in (with your disk LED blinking all of the time). Since one of the sources of file-backed pages is the executable code of all of your programs (and most of the shared libraries they use), it's quite possible to get into a situation where your programs can barely run without taking a page fault for another page of code.

(This frantic eviction of file-backed pages can happen even if you have anonymous pages that are being used only very infrequently and so would normally be immediately pushed out to swap space. With no swap space, anonymous pages are stuck in RAM no matter how infrequently they're touched; the only anonymous pages that can be discarded are ones that have never been written to and so are guaranteed to be all zero.)
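On Linux specifically, you can watch this split directly, because /proc/meminfo breaks out anonymous pages and (active and inactive) file-backed pages. A rough sketch:

grep -E '^(AnonPages|Active\(file\)|Inactive\(file\)|SwapTotal):' /proc/meminfo

With no swap, the AnonPages figure is effectively a hard floor on RAM usage; only the file-backed amounts can shrink under memory pressure.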

In the old days, this usually was not very much of an issue because system RAM was generally large compared to the size of programs and thus the amount of file-backed pages that were likely to be in memory. That's no longer the case today; modern large programs such as Firefox and its shared libraries can have significant amounts of file-backed code and data pages (in addition to their often large use of dynamically allocated memory, ie anonymous pages).

In theory, this thrashing can happen in any Unix. To prevent it, your Unix has to decide to deliberately not allow you to allocate more anonymous pages after a certain point, even though it could evict file-backed pages to make room for them. Deciding when to cut your anonymous page allocations off is necessarily a heuristic, and so any Unix that tries to do it is sooner or later going to prevent you from using some of your RAM.

(This is different than the usual issue with overcommitting virtual memory address space because you're not asking for more memory than could theoretically be satisfied. The kernel has to guess how much file-backed memory programs will need in order to perform decently, and it has to do so at the time when you try to allocate anonymous memory since it can't take the memory back later.)

NoSwapConsequence written at 22:26:29

2019-08-05

dup(2) and shared file descriptors

In my entry on how sharing file descriptors with child processes is a clever Unix decision, I said:

This full sharing is probably easier to implement in the kernel than making an independent copy of the file descriptor (unless you also changed how dup() works). [...]

Currently, dup() specifically shares the file offset between the old file descriptor and the new duplicated version. This implies a shared file descriptor state within the kernel for at least the file descriptors in the current process, and along with it some way to keep track of when the last reference to a particular shared state goes away (because only then can the kernel actually close the file and potentially trigger things like pending deletes).
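You can see this shared offset from the shell, because 'exec 4<&3' is more or less a direct dup() of file descriptor 3. A minimal sketch (assuming a shell like Bash, which carefully repositions the file offset after each 'read' on a seekable file):

$ printf 'one\ntwo\n' >afile
$ exec 3<afile
$ exec 4<&3
$ read -r line <&3; echo "$line"
one
$ read -r line <&4; echo "$line"
two

Reading through file descriptor 3 advanced the offset that file descriptor 4 shares, so the second read continues at the second line instead of starting over.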

Once you have to have this shared descriptor state within a single process, it's relatively straightforward to extend this to multiple processes, especially in the kind of plain uniprocessor kernel environment that Unix had for a long time. Basically, instead of having a per-process data structure for shared file descriptor state, you have a single global one, and everyone manipulates entries in it. You need reference counting regardless of whether file descriptor state is shared within a process or across processes.

(Then each process has a mapping from file descriptor number to the shared state. In early Unixes, this was a small fixed size array, the u_ofile array in the user structure. Naturally, early Unixes also had a fixed size array for the actual file structures for open files, as seen in V7's c.c and param.h. You can see V7's shared file structure here.)

PS: The other attraction of this in small kernel environments, as seen in the V7 implementation, is that if file descriptor state is shared across all processes, you need significantly fewer copies of the state for a given file that's passed to children a lot, as is common for standard input, standard output, and standard error.

DupAndSharedFileDescriptors written at 21:15:00

2019-08-03

Sharing file descriptors with child processes is a clever Unix decision

One of the things that happens when a Unix process clones itself and executes another program is that the first process's open file descriptors are shared across into the child (well, apart from the ones that are marked 'close on exec'). This is not just sharing that the new process has the same files or IO streams open, the way that it would have if it open()'d them independently; this shares the actual kernel level file descriptors. This full sharing means that if one process changes the properties of file descriptors, those changes are experienced by the other processes as well.

(This inheritance of file descriptors sometimes has not entirely desirable consequences, as does the fact that file descriptor properties are shared. Running a program that leaves standard input set to O_NONBLOCK is often still a reliable way to get your shell to immediately exit after the program finishes. Many shells reset the TTY properties, but often don't think of O_NONBLOCK.)

This full sharing is probably easier to implement in the kernel than making an independent copy of the file descriptor (unless you also changed how dup() works). But it has another important property that makes it a clever choice for Unix, which is that the file offset is part of what is shared and this means that the following subshell operation can work as intended:

(sed -e 10q -e 's/^/a: /'; sed -e 10q -e 's/^/b: /') <afile

(Let's magically assume that sed doesn't use buffered reads and so will read only exactly ten lines each time. This isn't true in practice.)

If the file offset wasn't shared between all children, it's not clear how this would work. You'd probably have to invent some sort of pipe-like file descriptor that either shared the file offset or was a buffer and didn't support seeking, and then have the shell use it (probably along with some other programs).

Sharing the file offset is also the natural way to handle multiple processes writing standard output (or standard error) to a file, as in the following example:

(program1; program2; program3) >afile

If the file offset wasn't shared, each process would start writing at the start of afile and they'd overwrite each other's results. Again, you'd need some pipe-like trick to make this work.
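You can demonstrate this overwriting from the shell, because '1<>afile' opens afile for writing without truncating it and with its own independent offset of zero. A small sketch:

$ echo aaaaaa >afile
$ echo bb 1<>afile
$ cat afile
bb
aaa

The second echo's new offset started at the beginning of the file, so its output overwrote the start of what was already there instead of continuing after it.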

(Once you have O_APPEND, you can use it for this, but O_APPEND appears to postdate V7 Unix; it's not in the V7 open(2) manpage.)

PS: The implementation of shared file descriptors across processes in old Unixes is much simplified by the fact that they're uniprocessor environments, so the kernel has no need to worry about locking for updating file offsets (or much of anything else to do with them). Only one process can be in the kernel manipulating them at any given time.

SharedFileDescriptorsClever written at 19:39:04

2019-07-28

What I want out of my window manager

One answer to what I want out of my window manager is 'fvwm'. It's my current window manager and I'm not likely to switch to anything else because I'm perfectly satisfied with it. But that's not a good answer, because fvwm has a lot of features and I'm not using them all. As with everyone who uses a highly customizable thing, my important subset of fvwm is probably not quite the same as anyone else's important subset of it.

(I'm thinking about what I want out of my window manager because Wayland is coming someday, and that means I'm almost certainly going to need a new window manager at some time in, say, the next ten years.)

I can't tell for sure what's important to me, because I'm sort of a fish in water when it comes to fvwm and my fvwm configuration; I've been using it exclusively for so long that I'm not certain what I'd really miss if I moved and what's unusual. With that said, I think that the (somewhat) unusual features that I want go like this (on top of a straightforward 'floating layout' window manager):

  • Something like FvwmIconMan, which is central to how I manage terminal windows (which I tend to have a lot of).

  • The ability to iconify windows to icons on the root window and then place those icons in specific locations where they'll stay. I also want to be able to record the location of those icons and reposition them back, because I do that. Putting iconified windows in specific places is how I currently manage my plethora of Firefox windows, including keeping track of what I'm going to read soon. As usual, icons need to have both an icon and a little title string.

    (Perhaps I should figure out a better way to handle Firefox windows, one that involves less clutter. I have some thoughts there, although that's for another entry. But even with Firefox handled, there are various other windows I keep around in iconified form.)

  • Multiple virtual desktops or screens, with some sort of pager to show me a schematic view of what is on what screen or desktop and to let me switch between them by the mouse. I also need key bindings to flip around between screens. It has to be possible to easily move windows (in normal or iconified form) from screen to screen, including from the command line, and I should be able to set it so that some icons or windows are always present (ie, they float from screen to screen).

  • Window title bars that can be either present or absent, because some of my windows have them and some don't. I'd like the ability to customize what buttons a window titlebar has and what they do, but it's not really important; I could live with everything with a titlebar having a standard set.

  • User-defined menus that can be brought up with a wide variety of keys, because I have a lot of menus that are bound to a lot of different keys. My fvwm menus are one of my two major ways of launching programs, and I count on having a lot of different key bindings to make them accessible without having to go through multiple menu levels.

  • User-defined key bindings, including key bindings that still work when the keyboard focus is on a window. Key bindings need to be able to invoke both window manager functions (like raising and lowering windows) and to run user programs, especially dmenu.

  • User-defined bindings for mouse buttons, because I use a bunch of them.

  • Minimal or no clutter apart from things that I specifically want. I don't want the window manager insisting that certain interface elements must exist, such as a taskbar.

  • What fvwm calls 'focus follows mouse', where the keyboard focus is on the last window the mouse was in even if the mouse is then moved out to be over the root window. I don't want click to focus for various reasons and I now find strict mouse focus to be too limiting.

Fvwm allows me great power over customizing the fonts used, the exact width of window borders, and so on, but for the most part it's not something I care deeply about if the window manager does a competent job and makes good choices in general. It's convenient if the window manager has a command interface for testing and applying small configuration changes, like FvwmConsole; restarting the window manager when you have a lot of windows is kind of a pain.

(As you might guess from my priorities, my fvwm configuration file is almost entirely menu configurations, key and mouse button bindings, and making FvwmIconMan and fvwm's pager work right. I have in the past tried tricky things, but at this point I'm no longer really using any of them. All of my vaguely recent changes have been around keyboard bindings for things like moving windows and changing sound volume.)

PS: Command line interface and control of the window manager would be pretty handy. I may not use FvwmCommand very often, but I like that it's there. And I do use the Perl API for my hack.

Sidebar: Fvwm's virtual screens versus virtual desktops

Fvwm has both virtual screens and virtual desktops, and draws a distinction between them that is covered in the relevant section of its manpage. I use fvwm's virtual screens but not its desktops, and in practice I treat every virtual screen as a separate thing. It can sometimes be convenient that a window can spill over from virtual screen to virtual screen, since it often gives me a way of grabbing the corner of an extra-large window. On the other hand, it's also irritating when a window winds up protruding into another virtual screen.

All of this is leading up to saying that I wouldn't particularly object to a window manager that had only what fvwm would call virtual desktops, without virtual screens. This is good because I think that most modern window managers have adopted that model for their virtual things.

WindowManagerWants written at 00:39:19

2019-07-22

Why file and directory operations are synchronous in NFS

One of the things that unpleasantly surprises people about NFS every so often is that file and directory operations like creating a file, renaming it, or removing it are synchronous. This can make operations like unpacking a tar file or doing a VCS clone or checkout be startlingly slow, much slower than they are on a local filesystem. Even removing a directory tree can be drastically slower than it is locally.

(Anything that creates files also suffers from the issue that NFS clients normally force a flush to disk after they finish writing a file.)

In the original NFS, all writes were synchronous. This was quite simple but also quite slow, and for NFS v3, the protocol moved to a more complicated scheme for data writes, where the majority of data writes could be asynchronous but the client could force the server to flush them all to disk every so often. However, even in NFS v3 the protocol more or less requires that directory level operations are synchronous. You might wonder why.

One simple answer is that the Unix API provides no way to report delayed errors for file and directory operations. If you write() data, it is an accepted part of the Unix API that errors stemming from that write may not be reported until much later, such as when you close() the file. This includes not just 'IO error' type errors, but also problems such as 'out of space' or 'disk quota exceeded'; they may only appear and become definite when the system forces the data to be written out. However, there's no equivalent of close() for things like removing files or renaming them, or making directories; the Unix API assumes that these either succeed or fail on the spot.

(Of course, the Unix API doesn't necessarily promise that all errors are reported at close() and that close() flushes your data to disk. But at least close() explicitly provides the API a final opportunity to report that some errors happened somewhere, and thus allows it to not report all errors at write()s.)

This lack in the Unix API means that it's pretty dangerous for a kernel to accept such operations without actually committing them; if something goes wrong, there's no way to report the problem (and often no process left to report them to). It's especially dangerous in a network filesystem, where the server may crash and reboot without programs on the client noticing (there's no Unix API for that either). It would be very disconcerting if you did a VCS checkout, started working, had everything stall for a few minutes (as the server crashed and came back), and then suddenly all of your checkout was different (because the server hadn't committed it).

You could imagine a network filesystem where the filesystem protocol itself said that file and directory operations were asynchronous until explicitly committed, like NFS v3 writes. But since the Unix API has no way to expose this to programs, the client kernel would just wind up making those file and directory operations synchronous again so that it could immediately report any and all errors when you did mkdir(), rename(), unlink(), or whatever. Nor could the client kernel really batch up a bunch of those operations and send them off to the network filesystem server as a single block; instead it would need to send them one by one just to get them registered and get an initial indication of success or failure (partly because programs often do inconvenient things like mkdir() a directory and then immediately start creating further things in it).

Given all of this, it's not surprising that neither the NFS protocol nor common NFS server implementations try to change the situation. With no support from the Unix API, NFS clients will pretty much always send NFS file and directory operations to the server as they happen and need an immediate reply. In order to avoid surprise client-visible rollbacks, NFS servers are then more or less obliged to commit these metadata changes as they come in, before they send back the replies. The net result is a series of synchronous operations; the client kernel has to send the NFS request and wait for the server reply before it returns from the system call, and the server has to commit before it sends out its reply.

(In the traditional Unix way, some kernels and some filesystems do accept file and metadata operations without committing them. This leads to problems. Generally, though, the kernel makes it so that your operations will only fail due to a crash or an actual disk write error, both of which are pretty uncommon, not due to other delayed issues like 'out of disk space' or 'permission denied (when I got around to checking)'.)

NFSSynchronousMetadata written at 21:10:08


