Wandering Thoughts

2019-06-19

How Bash decides it's being invoked through sshd and sources your .bashrc

Under normal circumstances, Bash only sources your .bashrc when it's run as an interactive non-login shell; for example, this is what the Bash manual says about startup files. Well, it is most of what the manual says, because there is an important exception, which the Bash manual describes as 'Invoked by remote shell daemon':

Bash attempts to determine when it is being run with its standard input connected to a network connection, as when executed by the remote shell daemon, usually rshd, or the secure shell daemon sshd. If Bash determines it is being run in this fashion, it reads and executes commands from ~/.bashrc, [...]

(You can tell how old this paragraph of the manual is because of how much prominence it gives to rshd. Also, note that this specific phrasing about standard input presages my discovery of when bash doesn't do this.)

As the result of recent events, I became interested in discovering exactly how Bash decides that it's being run in the form of 'ssh host command' and sources your .bashrc. There turn out to be two parts to this answer, but the summary is that if this is enabled at all, Bash will always source your .bashrc for non-interactive commands if you've logged in to a machine via SSH.

First, this feature may not even be enabled in your version of Bash, because it's a non-default configuration setting (and has been since Bash 2.05a, which is pretty old). Debian and thus Ubuntu turn this feature on, as does Fedora, but the FreeBSD machine I have access to doesn't in the version of Bash that's in its ports. Unsurprisingly, OmniOS doesn't seem to either. If you compile Bash yourself without manually changing the relevant bit of config-top.h, you'll get a version without this.

(Based on some digging, I think that Arch Linux also builds Bash without enabling this, since they don't seem to patch config-top.h. I will leave it to energetic people to check other Linuxes and other *BSDs.)

Second, how it works is actually very simple. In practice, a non-interactive Bash decides that it is being invoked by SSHD if either $SSH_CLIENT or $SSH2_CLIENT is defined in the environment. In a robotic sense this is perfectly correct, since OpenSSH's sshd puts $SSH_CLIENT in the environment when you do 'ssh host command'. In practice it is wrong, because OpenSSH sets $SSH_CLIENT all the time, including for logins. So if you use SSH to log in somewhere, $SSH_CLIENT will be set in your shell environment, and then any non-interactive Bash will decide that it should source ~/.bashrc. This includes, for example, the Bash that is run (as 'bash -c ...') to execute commands when you have a Makefile that has explicitly set 'SHELL=/bin/bash', as Makefiles that are created by the GNU autoconfigure system tend to do.
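
As a rough illustration, the check can be mimicked in shell. This is only a sketch of the logic as described above, not Bash's real code, and the function name is made up for the example; it treats a variable that is merely set (even to an empty string) as enough.

```shell
# Sketch only: succeeds if either variable exists in the shell's environment,
# which is the condition described above for "run by sshd".
# The function name is hypothetical.
run_by_ssh_guess() {
    [ -n "${SSH_CLIENT+set}" ] || [ -n "${SSH2_CLIENT+set}" ]
}
```

Since $SSH_CLIENT is inherited by everything started from an SSH login session, a check like this succeeds in every descendant process, including a 'bash -c ...' started by make.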

As a result, if you have ancient historical things in a .bashrc, for example clearing the screen on exit, then surprise, those things will happen for every command that make runs. This may not make you happy. For situations like Makefiles that explicitly set 'SHELL=/bin/bash', this can happen even if you don't use Bash as your login shell and haven't had anything to do with it for years.

(Of course it also happens if you have perfectly modern things there and expect that they won't get invoked for non-interactive shells, and you do use Bash as your login shell. But if you use Bash as your login shell, you're more likely to notice this issue, because routine ordinary activities like 'ssh host command' or 'rsync host:/something .' are more likely to fail, or at least do additional odd things.)
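
One common defence is to have .bashrc bail out early when the shell isn't interactive. What follows is a minimal sketch of that idea, not something taken from the Bash manual passage above; the helper name is hypothetical, and it takes the option flags as an argument so that it can be tried out in isolation.

```shell
# Hypothetical helper: succeeds if the given option flags (normally "$-")
# include 'i', which Bash sets for interactive shells.
shell_is_interactive() {
    case "$1" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Intended use near the top of ~/.bashrc (left commented out here because
# 'return' is only valid in a sourced file or a function):
# shell_is_interactive "$-" || return
```

With a guard like this, a non-interactive Bash that decides to source .bashrc stops before it reaches anything with visible side effects.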

PS: This October 2001 comment in variables.c sort of suggests why support for this feature is now an opt-in thing.

PPS: If you want to see if your version of Bash has this enabled, the simple way to tell is to run strings on the binary and see if the embedded strings include 'SSH_CLIENT'. Eg:

; cat /etc/fedora-release
Fedora release 29 (Twenty Nine)
; strings -a /usr/bin/bash | fgrep SSH_CLIENT
SSH_CLIENT

So the Fedora 29 version does have this somewhat dubious feature enabled. Perhaps Debian and Fedora feel stuck with it due to very long-standing backwards compatibility, where people would be upset if Bash stopped doing this in some new Debian or Fedora release.

Sidebar: The actual code involved

The code for this can currently be found in run_startup_files in shell.c:

  /* get the rshd/sshd case out of the way first. */
  if (interactive_shell == 0 && no_rc == 0 && login_shell == 0 &&
      act_like_sh == 0 && command_execution_string)
    {
#ifdef SSH_SOURCE_BASHRC
      run_by_ssh = (find_variable ("SSH_CLIENT") != (SHELL_VAR *)0) ||
                   (find_variable ("SSH2_CLIENT") != (SHELL_VAR *)0);
#else
      run_by_ssh = 0;
#endif

[...]

Here we can see that the current Bash source code is entirely aware that no one uses rshd any more, among other things.

BashDetectRemoteInvocation written at 22:50:11

2019-06-02

I haven't customized my Vim setup and I'm not sure I should try to (yet)

I was recently reading At least one Vim trick you might not know (via). In passing, the article divides Vim users (and its tips) into purists, who deliberately use Vim with minimal configuration, and exobrains, who "stuff Vim full of plugins, functions, and homebrew mappings". All of this is to say that currently, as a Vim user I am a non-exobrain; I use Vim with minimal customization (although not none).

This is not because I am a deliberate purist. Instead, it's partly because I've so far perceived the universe of Vim customizations as a daunting and complex place that seems like too much work to explore when my Vim (in its current state) works well enough for me. Well, that's not entirely true. I'm also aware that I could improve my Vim experience with more knowledge and use of Vim's own built in features. Trying to add customizations to Vim when I haven't even mastered its relative basics doesn't seem like a smart idea, and it also seems like I'd make bad decisions about what to customize and how.

(Part of the dauntingness is that in my casual reading, there seem to be several different ways to manage and maintain Vim plugins. I don't know enough to pick the right one, or even evaluate which one is more popular or better.)

There are probably Vim customizations and plugins that could improve various aspects of my Vim experience. But finding them starts with the most difficult part, which is understanding what I actually want from my Vim experience and what sort of additions would clash with it. The way I've traditionally used Vim is that I treat it as a 'transparent' editor, one where my interest is in getting words (and sometimes code) down on the screen. In theory, a good change would be something that increases this transparency, that deals with some aspect of editing that currently breaks me out of the flow and makes me think about mechanics.

(I think that the most obvious candidate for this would be some sort of optional smart indentation for code and annoying things like YAML files. I don't want smart indentation all of the time, but putting the cursor in the right place by default is a great use of a computer, assuming that you can make it work well inside Vim's model.)

Of course the other advantage of mostly avoiding customizing my Vim experience is that it preserves a number of the advantages that make Vim a good sysadmin's editor. I edit files with Vim in a lot of different contexts, and it's useful if these all behave pretty much the same. And of course getting better at core Vim improves things for me in all of these environments, since core Vim is everywhere. Even if I someday start customizing my main personal Vim with extra things to make it nicer, focusing on core Vim until I think I have all of the basics I care about down is more generally useful right now.

(As an illustration of this, one little bit of core Vim that I've been finding more and more convenient as I remember it more is the Ctrl-A and Ctrl-X commands to increment and decrement numbers in the text. This is somewhat a peculiarity of our environment, but it comes up surprisingly often. And this works everywhere.)

PS: Emacs is not entirely simpler than Vim here as far as customization goes, but I have a longer history with customizing Emacs than I do with Vim. And it does seem like Emacs has their package ecology fairly nailed down, based on my investigations from a while back for code editing.

VimMinimalCustomization written at 00:23:58

2019-05-31

Some things about where icons for modern X applications come from

If you have a traditional window manager like fvwm, one of the things it can do is iconify X windows so that they turn into icons on the root window (which would often be called the 'desktop'). Even modern desktop environments that don't iconify programs to the root window (or their desktop) may have per-program icons for running programs in their dock or taskbar. If your window manager or desktop environment can do this, you might reasonably wonder where those icons come from by default.

Although I don't know how it was done in the early days of X, the modern standard for this is part of the Extended Window Manager Hints. In EWMH, applications give the window manager a number of possible icons, generally in different sizes, as ARGB bitmaps (instead of, say, SVG format). The window manager or desktop environment can then pick whichever icon size it likes best, taking into account things like the display resolution and so on, and display it however it wants to (in its original size or scaled up or down).

How this is communicated in specific is through the only good interprocess communication method that X supplies, namely X properties. In the specific case of icons, the _NET_WM_ICON property is what is used, and xprop can display the size information and an ASCII art summary of what each icon looks like. It's also possible to use some additional magic to read out the raw data from _NET_WM_ICON in a useful format; see, for example, this Stackoverflow question and its answers.
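
To make the layout concrete: per EWMH, _NET_WM_ICON is a flat array of 32-bit values in which each icon is stored as a width, a height, and then width times height ARGB pixels, with the next icon following immediately. Here is a minimal sketch of walking such a list to report the icon sizes. It assumes the numbers have already been extracted by some means and are passed as arguments, and the function name is made up.

```shell
# Hypothetical helper: given the raw _NET_WM_ICON values as arguments,
# print the WIDTHxHEIGHT of each icon and skip over its pixel data.
icon_sizes() {
    while [ "$#" -ge 2 ]; do
        w=$1 h=$2
        shift 2
        n=$((w * h))
        # Each icon must be followed by exactly w*h pixel values.
        [ "$#" -ge "$n" ] || { echo "truncated icon data" >&2; return 1; }
        echo "${w}x${h}"
        shift "$n"
    done
}
```

For example, 'icon_sizes 2 1 0 0 1 1 0' describes a 2x1 icon followed by a 1x1 icon, and prints '2x1' and then '1x1'. Real icons have thousands of pixel values each, so this is for illustration rather than speed.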

(One reason to extract all of the different icon sizes for a program is if you want to force your window manager to use a different size of icon than it defaults to. Another is if you want to reuse the icon for another program, again often through window manager settings.)

X programs themselves have to get the data that they put into _NET_WM_ICON from somewhere. Some programs may have explicit PNGs (or whatever) on the filesystem that they read when they start (and thus that you can too), but others often build this into their program binary or compiled data files, which means that you have to go to the source code to pull the files out (and they may not be in a bitmap format like PNG; there are probably programs that start with an SVG and then render it to various sized PNGs).

(As a concrete example, as far as I know Firefox's official icons are in the 'defaultNN.png' files in browser/branding/official. Actual builds may not use all of the sizes available, or at least not put them into _NET_WM_ICON; on Fedora 29, for example, the official Fedora Firefox 66 only offers up to 32x32, which is tragically small on my HiDPI display.)

None of this is necessarily how a modern integrated desktop like Gnome or KDE handles icons for their own programs. There are probably toolkit-specific protocols involved, and I suspect that there is more support and encouragement for SVG icons than there is in EWMH (where there is none).

PS: All of this is going to change drastically in Wayland, since we obviously won't have X properties any more.

(This whole exploration was prompted by a recent question on the FVWM mailing list.)

ModernXAppIcons written at 00:50:28

2019-05-23

I will probably never give my shell dotfiles the major reform they could use

I've been using my current shell for what is now a very long time, and over that time I've always carried forward the core structure of my Unix $HOME, including my shell dotfiles. One of the consequences of this is that my dotfiles have accumulated a great deal of historical cruft. Another is that if I was starting from scratch with a clean slate, I probably wouldn't structure and code my dotfiles the way they currently are; my tastes and views have changed from the long-ago person who first wrote those files and set the basic approach that I've followed since then.

Some people would take a period of vacation or a move to a new system or whatever as a reason to rip everything out and start from scratch (perhaps even with a new shell, one that's more popular, active, and widely available by default). I applaud the energy of these people, but I've come to realize that I no longer have that sort of enthusiasm. These days I'm a pragmatist about any number of things, and one of those things is that I just don't care enough to go in and do anything to perfectly working shell dotfiles.

(I don't even care enough to delete all of the sections for long dead systems and system types. Well, in theory I don't care, but now that I'm writing this entry my fingers have a bit of an itchy urge.)

There is still a bit of me that would like to start all over again from complete scratch, with a bare $HOME and no history. Perhaps someday I will have a completely new system that I want to make my new home and I'll get to do that, but for now that seems unlikely. I do admire the people who run around doing lots of experiments and customizations and so on with their shells, though.

(I theorize that one thing that leads to restarting your $HOME is changing jobs in the commercial world, where in many cases you're not going to export things out of your old job or into your new job. A personal Unix system probably often functions as another 'job' for these purposes. Part of the advantage of being at a university for all of these years, even using different systems, is that I could freely propagate my dotfiles around between different systems, including my home system.)

PS: There are some systems where I don't have my usual shell or dotfiles set up and I just use Bash with an almost stock setup. But these systems are also ones that I don't care about enough or use actively enough to set up what I consider a 'real' environment on. If I did care, I'm not sure whether I'd create a new Bash based setup or just propagate a variant of my usual shell setup. Propagating over my current environment is certainly a fast way to get set up, since everything just drops in in one go.

ShellDotfilesLaziness written at 23:49:32

2019-05-18

Binding keys to actions in xterm, and my bindings as an example

One of the probably lesser known features of xterm is that it has a quite complete but obscure and tangled system of adding custom key bindings. Because xterm is a traditional X program, this is configured through X resources instead of a dotfile somewhere, and because xterm is an old program, the syntax for this is what you could politely call interesting. The full gory details are in the KEY BINDINGS section of the xterm manpage, and you can also see the Arch wiki xterm page's section on this.

(Urxvt aka rxvt-unicode can also do key bindings through X resources, but as a more modern program it has a much simpler X resource syntax for them. See the 'keysym.sym' writeup in the urxvt manpage's section of X resources. For other current Unix terminal emulators, you're on your own, but things like gnome-terminal and konsole are probably configured through a GUI.)

I've given an example of these resources several years ago in my entry on getting xterm and modern X applications to do cut and paste together, but today I want to show and discuss my full set of xterm bindings, which now that I look at them turn out to have some history embedded in them and thus probably some surplus things. Here is the X resource:

 XTerm*VT100.Translations: #override <Key>Prior: scroll-back(1,halfpage) \n\
   <Key>Next: scroll-forw(1, halfpage) \n\
   Shift<Key>BackSpace: string(0x7f) \n\
   Shift<Key>Delete: string(0x7f) \n\
   Meta<Btn2Down>: ignore() \n\
   Ctrl Shift <KeyPress> C: copy-selection(CLIPBOARD) \n\
   Ctrl Shift <KeyPress> V: insert-selection(CLIPBOARD)

A lot of this is relatively obvious; you have some modifiers, the action involved in <..>, and then the key or mouse button name. One tricky bit is that the key name is usually the X keysym, which is usually but not always related to what the key produces and what is printed on it. If in doubt, my tool for finding out what keysym something generates is the venerable xev. I'm honestly not sure what the difference is between <Key> and <KeyPress>, if there is any.

(If your mouse has additional weird buttons, xev is also a good way to find out what X button numbers they generate.)

There's nothing really intricate or clever in my bindings, which is basically how I feel it should be with X terminal emulators if at all possible. Every bit of highly customized behavior you create in your favorite terminal emulator is another obstacle you'll run into if you ever need to use another one temporarily (and if my experience is anything to go by, sooner or later you'll need to for some reason).

As far as the actual bindings go, the Ctrl-Shift-C and Ctrl-Shift-V bindings are from my entry on cut and paste. The bindings for Prior and Next make PgUp and PgDn scroll only half a page at a time, which I prefer because it makes it easier to skim through a scrolled xterm without worrying that I'll miss noticing something at the top or the bottom of some page; I can mostly scan the middle area.

The shift-Delete and shift-Backspace bindings are basically there to make the shifted versions of these characters behave the same as the un-shifted versions, probably because I kept accidentally holding the shift key down just a little bit too long when I was going to delete a character. These particular bindings are quite old and predate my no longer swapping Delete and BackSpace. I'm not sure if they're necessary any more.

Ignoring the middle mouse button when meta is held is also an old binding. The very good reason to do nothing here is that the default key binding for this is to clear xterm's scrollback buffer. I definitely don't want to have that on any key combination that I can hit easily, although these days my window manager normally wouldn't let the mouse click through anyways (I have a window manager level binding for meta middle mouse button).

PS: This exercise also serves to demonstrate the kind of things that can sit quietly in the depths of X resources files and other obscure sources of customizations if you use the same environment for long enough. Which I very definitely have.

XtermKeybinding written at 23:33:42

2019-04-27

Some useful features of (GNU) date for things like time conversion

As part of using shell scripts to generate Prometheus metrics, and also sometimes wanting to interact with Prometheus's API through the command line (cf), I've wound up getting deeper into the date command than I usually do.

(I've done some things with GNU date before, such as using 'date -d', including for working out Linux kernel timestamps, something which is now mostly obsolete since 'dmesg -T' will do that for you.)

If you have a timestamp of seconds since the epoch and want to convert it to a date, the GNU date manpage itself will tell you about the '@<timestamp>' date format that it accepts as input:

date --date @2147483647

If you have something, like Prometheus' web interface, that requires input in UTC time, you can add --utc to see that instead of local time:

date -d @1556398606 --utc

By extension you can convert between local time and UTC time, for instance if you have logs in local time and need to plug their time into Prometheus in UTC. However, there is a gotcha; you may need to explicitly specify your timezone, even if it's not in the normal timestamp. So:

date -d 'Apr 26 08:24:30 EDT' --utc
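
These conversions can be wrapped up in a pair of small functions. This is a sketch that assumes GNU date; the function names are made up, and the output format is just one I picked. I've included the year in the examples, since without one GNU date assumes the current year.

```shell
# Requires GNU date. Convert a time string (with its time zone spelled out)
# or an '@<timestamp>' to UTC. The function name is hypothetical.
to_utc() {
    date --utc -d "$1" '+%Y-%m-%d %H:%M:%S'
}

# The other direction: a time string that is already in UTC, converted to
# seconds since the epoch.
utc_to_epoch() {
    date --utc -d "$1 UTC" +%s
}
```

Here 'to_utc "2019-04-26 08:24:30 EDT"' prints '2019-04-26 12:24:30', 'to_utc @2147483647' prints '2038-01-19 03:14:07', and 'utc_to_epoch "2038-01-19 03:14:07"' gives back 2147483647.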

To get the current time as seconds since the epoch, the format specifier is '%s':

date +%s

If you want to use timestamps from date to compute how long something took (for instance, generating metrics in a shell script), GNU date will give you nanoseconds as well with the '%N' format:

stime="$(date +%s.%N)"
[...]
etime="$(date +%s.%N)"
dur="$(echo "$etime - $stime" | bc)"
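
If you would rather not depend on bc, a variation is to take the timestamps as whole nanoseconds and use the shell's integer arithmetic. This is a sketch that assumes GNU date (for '%N') and a shell with 64-bit arithmetic; the function names are made up.

```shell
# Requires GNU date for %N. Prints the current time as whole nanoseconds
# since the epoch.
now_ns() {
    date +%s%N
}

# Elapsed whole milliseconds between two nanosecond timestamps
# (start first, then end).
elapsed_ms() {
    echo $(( ($2 - $1) / 1000000 ))
}
```

Usage is along the lines of 'start="$(now_ns)"', then the work being timed, then 'elapsed_ms "$start" "$(now_ns)"'. The price is that you only get integer milliseconds instead of a decimal number of seconds.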

The FreeBSD version of date has most of these features if you look through its manpage and the strftime(3) manpage, but it doesn't seem to be able to output the current time in any higher precision than the second. Rough equivalences are:

date -r 2147483647
date -r 1556398606 -u
date +%s

There doesn't seem to be an easy way to convert a time string into UTC time output from what I can see (but perhaps I'm missing something). On the FreeBSD machine I have access to, the following outputs local time, not UTC time:

date -j -f '%b %d %T %Z' 'Apr 26 08:24:30 EDT' -u

You can do it in two steps, by converting the time string to a timestamp and then using 'date -r <..> -u', but that's kind of annoying. You also have to specify the time string's format instead of letting GNU date's superintelligence work it out for you (hopefully correctly). If I had to do this on a FreeBSD machine, I would write some shell scripts, and then shortly they might turn into Perl programs or something so that they'd be smarter and do everything in one step. Also, a Perl, Python, C, or Go program would definitely be required if I needed timestamps with sub-second precision.

(Or one could just compile GNU date.)

(This is the kind of entry where I write things down for my future reference, since I keep digging them out of my older scripts and sometimes taking shortcuts.)

GNUDateUsefulTricks written at 17:19:49

2019-04-19

V7 Unix programs are often not written the way you would expect

Yesterday I wrote that V7 ed read its terminal input in cooked mode a line at a time, which was an efficient, low-CPU design that was important on V7's small and low-power hardware. Then in comments, frankg pointed out that I was wrong about part of that, namely about how ed read its input. Here, straight from the V7 ed source code, is how ed read input from the terminal:

getchr()
{
	[...]
	if (read(0, &c, 1) <= 0)
		return(lastc = EOF);
	lastc = c&0177;
	return(lastc);
}

gettty()
{
	[...]
	while ((c = getchr()) != '\n') {
	[...]
}

(gettty() reads characters from getchr() into a linebuf array until end of line, EOF, or it runs out of space.)

In one way, this is surprising; it's very definitely not how we'd write this today, and if you did, many Unix programmers would immediately tell you that you're being inefficient by making so many calls to read() and you should instead use a buffer, for example through stdio's fgets(). Very few modern Unix programs do character at a time reads from the kernel, partly because on modern machines it's not very efficient.

(It may have been comparatively less inefficient on V7 on the PDP-11, if for example the relative cost of making a system call was lower than it is today. My impression is that this may have been the case.)

V7 had stdio in more or less its modern form, complete with fgets(). V6 had a precursor version of stdio and buffered IO (see eg the manpage for getc()). However, many V7 and V6 programs didn't necessarily use them; instead they used more basic system calls. This is one of the things that often gives the code for early Unix programs (V7 and before) an unusual feel, along with the short variable names and the lack of comments.

The situation with ed is especially interesting, because in V5 Unix, ed appears to have still been written in assembly; see ed1.s, ed2.s, and ed3.s here in 's1' of the V5 sources. In V6, ed was rewritten in C to create ed.c (still in a part of the source tree called 's1'), but it still used the same read() based approach that I think it used in the assembly version.

(I haven't looked forward from V7 to see if later versions were revised to use some form of buffering for terminal input.)

Sidebar: An interesting undocumented ed feature

Reading this section of the source code for ed taught me that it has an interesting, undocumented, and entirely characteristic little behavior. Officially, ed commands that have you enter new text have that new text terminated by a . on a line by itself:

$ ed newfile
a
this is new text that we're adding.
.

This is how the V7 ed manual documents it and how everyone talks about it. But how the actual ed source code implements this on input is, from that gettty() function:

if (linebuf[0]=='.' && linebuf[1]==0)
        return(EOF);
return(0);

In other words, it turns a single line with '.' into an EOF. The consequence of this is that if you type a real EOF at the start of a line, you get the same result, thus saving you one character (you use Control-D instead of '.' plus newline). This is very V7 Unix behavior, including the lack of documentation.
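
The same shape is easy to sketch in shell. This is an illustration of the idea, not a port of ed's code, and the function name is made up: lines are collected until either a lone '.' or a real end of file, and the two cases leave the loop in the same way.

```shell
# Hypothetical illustration: copy input lines to output until a line that
# is just '.' or until EOF, whichever comes first.
read_text() {
    while IFS= read -r line; do
        # A lone '.' is handled exactly as if the input had ended here.
        [ "$line" = "." ] && break
        printf '%s\n' "$line"
    done
}
```

Feeding this 'one', 'two', '.', and then more lines produces the same output as feeding it just 'one' and 'two' followed by EOF.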

This is also a natural behavior in one sense. A proper program has to react to EOF here in some way, and it might as well do so by ending the input mode. It's also natural to go on to try reading from the terminal again for subsequent commands; if this was a real and persistent EOF, for example because the pty closed, you'll just get EOF again and eventually quit. V7 ed is slightly unusual here in that it deliberately converts '.' by itself to EOF, instead of signaling this in a different way, but in a way that's also the simplest approach; if you have to have some signal for each case and you're going to treat them the same, you might as well have the same signal for both cases.

Modern versions of ed appear to faithfully reimplement this convenient behavior, although they don't appear to document it. I haven't checked OpenBSD, but both FreeBSD ed and GNU ed work like this in a quick test. I haven't checked their source code to see if they implement it the same way.

EdV7CodedUnusually written at 23:49:59

2019-04-18

One reason ed(1) was a good editor back in the days of V7 Unix

It is common to describe ed(1) as being line oriented, as opposed to screen oriented editors like vi. This is completely accurate but it is perhaps not a complete enough description for today, because ed is line oriented in a way that is now uncommon. After all, you could say that your shell is line oriented too, and very few people use shells that work and feel the same way ed does.

The surface difference between most people's shells and ed is that most people's shells have some version of cursor based interactive editing. The deeper difference is that this requires the shell to run in character by character TTY input mode, also called raw mode. By contrast, ed runs in what Unix usually calls cooked mode, where it reads whole lines from the kernel and the kernel handles things like backspace. All of ed's commands are designed so that they work in this line focused way (including being terminated by the end of the line), and as a whole ed's interface makes this whole line input approach natural. In fact I think ed makes it so natural that it's hard to think of things as being any other way. Ed was designed for line at a time input, not just to not be screen oriented.

(This was carefully preserved in UofT ed's very clever zap command, which let you modify a line by writing out the modifications on a new line beneath the original.)

This input mode difference is not very important today, but in the days of V7 and serial terminals it made a real difference. In cooked mode, V7 ran very little code when you entered each character; almost everything was deferred until it could be processed in bulk by the kernel, and then handed to ed all in a single line which ed could also process all at once. A version of ed that tried to work in raw mode would have been much more resource intensive, even if it still operated on single lines at a time.

(If you want to imagine such a version of ed, think about how a typical readline-enabled Unix shell can move back and forth through your command history while only displaying a single line. Now augment that sort of interface with a way of issuing vi-like bulk editing commands.)

This is part of why I feel that ed(1) was once a good editor (cf). Ed is carefully adapted for the environment of early Unixes, which ran on small and slow machines with limited memory (which led to ed not holding the file it's editing in memory). Part of that adaptation is being an editor that worked with the system, not against it, and on V7 Unix that meant working in cooked mode instead of raw mode.

(Vi appeared on more powerful, more capable machines; I believe it was first written when BSD Unix was running on Vaxes.)

Update: I'm wrong in part about how V7 ed works; see the comment from frankg. V7 ed runs in cooked mode but it reads input from the kernel a character at a time, instead of in large blocks.

EdDesignedForCookedInput written at 23:25:56

2019-03-13

Peculiarities about Unix's statfs() or statvfs() API

On modern Unixes, the official interface to get information about a filesystem is statvfs(); it's sufficiently official to be in the Single Unix Specification as seen here. On Illumos it's an actual system call, statvfs(2). On many other Unixes (at least Linux, FreeBSD, and OpenBSD), it's a library API on top of a statfs(2) system call (Linux, FreeBSD, OpenBSD). However you call it and however it's implemented, the underlying API of the information that gets returned is a little bit, well, peculiar, as I mentioned yesterday.

(In reality the API is more showing its age than peculiar, because it dates from the days when filesystems were simpler things.)

The first annoyance is that statfs() doesn't return the number of 'files' (inodes) in use on a filesystem. Instead it returns only the total number of inodes in the filesystem and the number of inodes that are free. On the surface this looks okay, and it probably was back in the mists of time when this was introduced. Then we got more advanced filesystems that didn't have a fixed number of inodes; instead, they'd make as many inodes as you needed, provided that you had the disk space. One example of such a filesystem is ZFS, and since we have ZFS fileservers, I've had a certain amount of experience with the results.

ZFS has to answer statfs()'s demands somehow (well, statvfs(), since it originated on Solaris), so it basically makes up a number for the total inodes. This number is based on the amount of (free) space in your ZFS pool or filesystem, so it has some resemblance to reality, but it is not very meaningful and it's almost always very large. Then you can have ZFS filesystems that are completely full and, well, let me show you what happens there:

cks@sanjuan-fs3:~$ df -i /w/220
Filesystem      Inodes IUsed IFree IUse% Mounted on
<...>/w/220        144   144     0  100% /w/220

I suggest that you not try to graph 'free inodes over time' on a ZFS filesystem that is getting full, because it's going to be an alarming looking graph that contains no useful additional information.
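
Since the API only hands back the total and the free count, anything that wants 'inodes in use' has to do the subtraction itself. Here is a minimal sketch; the function names are made up, and the second one assumes GNU coreutils stat, where in 'stat -f' formats '%c' is the total inode count and '%d' is the free inode count.

```shell
# Hypothetical helper: derive the in-use inode count from the two numbers
# that statvfs() actually provides (total first, then free).
inodes_in_use() {
    total=$1 free=$2
    echo $(( total - free ))
}

# The same thing for a real path, assuming GNU coreutils stat.
fs_inodes_in_use() {
    inodes_in_use $(stat -f -c '%c %d' "$1")
}
```

For the full ZFS filesystem above, 'inodes_in_use 144 0' gives 144, matching what df reports as IUsed.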

The next piece of fun in the statvfs() API is how free and used disk space is reported. The 'struct statvfs' has, well, let me quote the Single Unix Specification:

f_bsize    File system block size. 
f_frsize   Fundamental file system block size. 

f_blocks   Total number of blocks on file system
           in units of f_frsize. 

f_bfree    Total number of free blocks. 
f_bavail   Number of free blocks available to 
           non-privileged process. 

When I was an innocent person and first writing code that interacted with statvfs(), I said 'surely f_frsize is always going to be something sensible, like 1 KB or maybe 4 KB'. Silly me. As you can find out using a program like GNU Coreutils stat(1), the actual 'fundamental filesystem block size' can vary significantly among different sorts of filesystems. In particular, ZFS advertises a 'fundamental block size' of 1 MByte, which means that all space usage information in statvfs() for ZFS filesystems has a 1 MByte granularity.

(On our Linux systems, statvfs() reports regular extN filesystems as having a 4 KB fundamental filesystem block size. On a FreeBSD machine I have access to, statvfs() mostly reports 4 KB but also has some filesystems that report 512 bytes. Don't even ask about the 'filesystem block size', it's all over the map.)

Also, notice that once again we have the issue where the amount of space in use must be derived indirectly, since we only get 'total blocks' and two flavors of 'free blocks'. This is probably less important for total disk space, because that's less subject to variation than the total number of inodes possible.
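To make the unit handling concrete, here is a small Python sketch that turns the block counts into bytes. It assumes that all three block counts are in units of f_frsize (which is what the specification intends and what Linux does); the space_usage() name is hypothetical.

```python
import os

def space_usage(path):
    """Return a dict of byte counts derived from statvfs() block counts.

    Assumes f_blocks, f_bfree and f_bavail are all in f_frsize units.
    'used' is not reported directly and is derived as total - free.
    """
    st = os.statvfs(path)
    unit = st.f_frsize                 # the 'fundamental' block size
    total = st.f_blocks * unit
    free = st.f_bfree * unit           # free, including reserved space
    avail = st.f_bavail * unit         # free for non-privileged processes
    return {
        "frsize": unit,
        "bsize": st.f_bsize,
        "total": total,
        "free": free,
        "avail": avail,
        "used": total - free,
    }

if __name__ == "__main__":
    for name, value in space_usage("/").items():
        print("%-7s %d" % (name, value))
```

Every byte figure here is a multiple of f_frsize, so on a filesystem that reports a 1 MByte fundamental block size, nothing finer than that will ever show up.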

StatfsPeculiarities written at 23:46:13

2019-03-07

Exploring the mild oddity that Unix pipes are buffered

One of the things that blogging is good for is teaching me that what I think is common knowledge actually isn't. Specifically, when I wrote about a surprisingly arcane little Unix shell pipeline example, I assumed that it was common knowledge that Unix pipes are buffered by the kernel, in addition to any buffering that programs writing to pipes may do. In fact the buffering is somewhat interesting, and in a way it's interesting that pipes are buffered at all.

How much kernel buffering there is varies from Unix to Unix. 4 KB used to be the traditional size (it was the size on V7, for example, per the V7 pipe(2) manpage), but modern Unixes often have much bigger limits, and if I'm reading it right POSIX only requires a minimum of 512 bytes. But this isn't just a simple buffer, because the kernel also guarantees that if you write PIPE_BUF bytes or less to a pipe, your write is atomic and will never be interleaved with other writes from other processes.

(The normal situation on modern Linux is a 64 KB buffer; see the discussion in the Linux pipe(7) manpage. The atomicity of pipe writes goes back to early Unix and is required by POSIX, and I think POSIX also requires that there be an actual kernel buffer if you read the write() specification very carefully.)
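If you want to see what your own system does, you can measure it. The following Python sketch asks for PIPE_BUF through fpathconf() and then finds the kernel buffer size empirically by filling a pipe that nobody is reading from, with the write end set non-blocking. The pipe_limits() name is made up for this example, and the 512-byte chunk size is just a choice that is guaranteed to be no larger than PIPE_BUF.

```python
import os

def pipe_limits():
    """Return (PIPE_BUF, measured buffer capacity in bytes) for a new pipe."""
    r, w = os.pipe()
    try:
        pipe_buf = os.fpathconf(w, "PC_PIPE_BUF")
        # With a non-blocking write end, a full pipe raises
        # BlockingIOError instead of putting us to sleep.
        os.set_blocking(w, False)
        chunk = b"x" * 512
        capacity = 0
        while True:
            try:
                capacity += os.write(w, chunk)
            except BlockingIOError:
                break
        return pipe_buf, capacity
    finally:
        os.close(r)
        os.close(w)

if __name__ == "__main__":
    pipe_buf, capacity = pipe_limits()
    print("PIPE_BUF:", pipe_buf)
    print("measured pipe capacity:", capacity)
```

Nothing ever reads from the pipe here; everything written is sitting in kernel memory until the descriptors are closed.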

On the one hand this kernel buffering and the buffering behavior make perfect sense and they're definitely useful. On the other hand it's also at least a little bit unusual. Pipes are a unidirectional communication channel and it's pretty common to have unbuffered channels where a writer blocks until there's a reader (Go channels work this way by default, for example). In addition, having pipes buffered in the kernel commits the kernel to providing a certain amount of kernel memory once a pipe is created, even if it's never read from. As long as the read end of the pipe is open, the kernel has to hold on to anything it allowed to be written into the pipe buffer.

(However, if you write() more than PIPE_BUF bytes to a pipe at once, I believe that the kernel is free to pause your process without accepting any data into its internal buffer at all, as opposed to having to copy PIPE_BUF worth of it in. Note that blocking large pipe writes by default is a sensible decision.)

Part of pipes being buffered is likely to be due to how Unix evolved and what early Unix machines looked like. Specifically, V7 and earlier Unixes ran on single processor machines with relatively little memory and without complex and capable MMUs (Unix support for paged virtual memory post-dates V7, and I think wasn't really available on the PDP-11 line anyway). On top of making the implementation simpler, using a kernel buffer and allowing processes to write to it before there is a reader means that a process that only needs to write a small amount of data to a pipe may be able to exit entirely before the next process runs, freeing up system RAM. If writer processes always blocked until someone did a read(), you'd have to keep them around until that happened.

(In fact, a waiting process might use more than 4 KB of kernel memory just for various data structures associated with it. Just from a kernel memory perspective you're better off accepting a small write buffer and letting the process go on to exit.)
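The 'write a little and exit' pattern is easy to demonstrate on a modern Unix (this says nothing about what V7 itself did). In this Python sketch, a child process writes a short message into a pipe and exits, and the parent deliberately waits for the child to be completely gone before it reads anything. The function name and the message are arbitrary choices of mine.

```python
import os

def write_then_exit_demo(message=b"hello from a departed writer\n"):
    """Fork a writer that exits before the reader ever calls read().

    Returns (child exit code, data read from the pipe).
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write a small message into the pipe and exit at once.
        code = 1
        try:
            os.close(r)
            os.write(w, message)
            code = 0
        finally:
            os._exit(code)
    # Parent: close our copy of the write end, then wait for the child
    # to exit completely before reading anything.
    os.close(w)
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    data = os.read(r, len(message))
    os.close(r)
    return exit_code, data

if __name__ == "__main__":
    print(write_then_exit_demo())
```

If pipes were unbuffered, the child's write() could not complete until the parent's read(), and waiting for the child first would deadlock.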

PS: This may be a bit of a just-so story. I haven't inspected the V7 kernel scheduler to see if it actually let processes that did a write() into a pipe with a waiting reader go on to potentially exit, or if it immediately suspended them to switch to the reader (or just to another ready-to-run process, if any).

BufferedPipes written at 22:43:42
