2011-11-29
The alternate screen terminal emulator plague
There is one bit of behavior in modern X environments that drives me
up the wall: the use of 'alternate screens' in terminal emulators (and
in programs). That's probably obscure to most people, so let me put it
the other way; what drives me up the wall is when I edit a file in vi
or view it in a pager like less, quit, and all of the text that I
was just looking at instantly disappears in favour of what was on the
screen before I started whatever it was.
(Having this happen in less is especially infuriating, since less's
entire purpose is showing me information. But the moment I quit, I can't
have it.)
What is happening is not the fault of vi, less, and so
on, or at least not exactly. Unix systems have terminfo, which is essentially a big
database of terminals and escape sequences that programs can send to
make them do things; two of the defined terminfo capabilities, smcup and
rmcup (ti and te in termcap terms), are the escape sequences that you are
supposed to send when your full screen program starts and stops. Many
terminal emulators (and a certain number of
real terminals) support an alternate screen, and the people who wrote
terminfo database entries for them decided that full screen programs
should always use this alternate screen, so they put the escape sequences
for 'switch to alternate screen' and 'switch to main screen' into the
initialization and de-initialization sequences. When programs like
vi, emacs, and less dutifully send the escape sequences that the
terminfo database tells them to, they shoot your foot off.
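For the curious, here is a minimal sketch in C of the lookup involved. It uses the ncurses terminfo API to fetch the smcup and rmcup capabilities for whatever $TERM is set to, which is more or less what a full screen program does under the hood; the messages it prints are just my own glosses.

  /* Minimal sketch: ask terminfo for the smcup/rmcup capabilities.
   * Compile with -lncurses (or -ltinfo on some systems). */
  #include <stdio.h>
  #include <curses.h>
  #include <term.h>

  int main(void)
  {
      int err;
      char *smcup, *rmcup;

      /* Load the terminfo entry for $TERM; 1 is stdout. */
      if (setupterm(NULL, 1, &err) != OK) {
          fprintf(stderr, "cannot set up terminfo for $TERM (code %d)\n", err);
          return 1;
      }

      /* tigetstr() returns 0 for an absent capability and (char *)-1
       * if the name is not a string capability at all. */
      smcup = tigetstr("smcup");
      rmcup = tigetstr("rmcup");

      if (smcup == NULL || smcup == (char *)-1 ||
          rmcup == NULL || rmcup == (char *)-1)
          printf("no smcup/rmcup: full screen programs should leave their output behind\n");
      else
          printf("smcup/rmcup defined: whether they switch screens depends on their contents\n");
      return 0;
  }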
My personal opinion is that this is an unfortunate example of the slow
fossilization of X terminal emulators. xterm probably does this
because some real terminal had its terminfo set up this way in the
depths of time, and everyone else has probably slavishly copied how
xterm works. No one has stopped to ask if the end result makes sense
and is usable, because if they had a great many people would have told
them that it doesn't and isn't.
(Xterm gets partial credit because it has a way to turn this off, so
at least the xterm people recognize that it's a terrible idea even
if they feel constrained by backwards compatibility to not fix the
default.)
Unfortunately there is no general fix for this. Some programs can be told not to send the terminfo initialization and de-initialization strings; some terminal emulators can be told to ignore them. Sadly, sending the strings and paying attention to them is the default behavior, so you wind up fixing a lot of programs on a lot of systems, one by one.
(For extra fun, some Unixes do things right to start with. For example,
Solaris has never put the alternate screen escape sequences into their
terminfo entries for xterm.)
Sidebar: the quick fixes
If you use xterm, set the XTerm*titeInhibit resource to true
to make it ignore the alternate screen escape sequences. As usual,
gnome-terminal has no way of controlling this.
For less, do 'export LESS=X' or otherwise give it the -X switch.
For vim, add 'set t_ti= t_te=' to your .vimrc.
2011-11-27
Why processing things in inode order is a good idea
In a note on yesterday's entry on readdir()'s ordering, a commentator wrote (in part):
Note many utils use FTS which sorts directory entries by inode [...]
It may not be obvious why this is a good thing to do, so let me take a shot at it.
Suppose that you want to do something that either looks at or
touches the inodes of the files in a directory; perhaps your ls or
find needs to stat() them, or your chmod needs to change their
permissions. What is the right order to process the files in?
As always, modern disks are seek limited. You can't do anything to change how many seeks it takes to read the directory (or in general how fast it happens), because you don't control anything about the order that you get directory entries in; as discussed last entry, the kernel returns directory entries in whatever it wants to. But you can control what order the kernel reads inodes. So we want to ask for inodes in whatever order minimizes seeks.
In general, you don't know exactly how a filesystem organizes where inodes go on the disk and they are usually not all in one contiguous area but scattered in various spots over the disk. However, filesystems have historically put inodes on the disk in increasing order of inode number; you can be pretty certain that inode X+1000 is in a block that is after the block that inode X is in (at least as far as logical block numbers go). Asking for inodes in increasing numerical order thus at least means that the disk only seeks forward (and probably the minimum distance possible) and it maximizes the chances that the kernel will be able to read several inodes of interest in one block. Asking for inodes in any other order increases the chances that the kernel will have to seek back and forth over the disk to give them to you.
(There are some filesystems where this is no longer true, primarily filesystems (such as ZFS) which never rewrite things in place. That means that every time an inode is modified it has to be written to a new place on disk, which means that the (fixed) inode number of a file has no bearing on where on disk the inode has wound up.)
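To make this concrete, here is a minimal sketch in C of the approach, roughly what fts()-based programs get for free: read all of a directory's entries, sort them by inode number, and only then stat() everything. The details (fixed-size name buffers and so on) are purely for illustration.

  /* Minimal sketch: stat() a directory's files in inode order. */
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <dirent.h>
  #include <stdio.h>
  #include <stdlib.h>

  struct ent {
      char name[256];
      ino_t ino;
  };

  static int by_ino(const void *a, const void *b)
  {
      ino_t ia = ((const struct ent *)a)->ino;
      ino_t ib = ((const struct ent *)b)->ino;
      return (ia > ib) - (ia < ib);
  }

  int main(int argc, char **argv)
  {
      const char *dir = argc > 1 ? argv[1] : ".";
      DIR *d = opendir(dir);
      struct dirent *de;
      struct ent *ents = NULL;
      size_t n = 0, cap = 0;

      if (d == NULL) {
          perror(dir);
          return 1;
      }

      /* Pass 1: collect entries in whatever order readdir() gives us. */
      while ((de = readdir(d)) != NULL) {
          if (n == cap) {
              cap = cap ? cap * 2 : 64;
              ents = realloc(ents, cap * sizeof(*ents));
              if (ents == NULL) {
                  perror("realloc");
                  return 1;
              }
          }
          snprintf(ents[n].name, sizeof(ents[n].name), "%s", de->d_name);
          ents[n].ino = de->d_ino;
          n++;
      }
      closedir(d);

      /* Pass 2: sort by inode number, then stat() in that order so the
       * kernel reads inodes in (roughly) on-disk order. */
      qsort(ents, n, sizeof(*ents), by_ino);
      for (size_t i = 0; i < n; i++) {
          struct stat st;
          char path[4096];
          snprintf(path, sizeof(path), "%s/%s", dir, ents[i].name);
          if (stat(path, &st) == 0)
              printf("%10llu %s\n", (unsigned long long)st.st_ino, ents[i].name);
      }
      free(ents);
      return 0;
  }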
2011-11-26
About the order that readdir() returns entries in
In a mostly unrelated article (seen via Planet Sysadmin) I recently noticed the following:
readdir(3) just returns the directory entries in the order they are linked together, which is also not related to inode numbering but as best as I can tell is from outer leaf inwards (since the most recently created file is listed first).
On most systems, readdir(3) is a modestly warmed over
version of the underlying 'read directory entries'
system call, and returns directory entries in the same order that the
system call does. In theory a Unix kernel can return directory entries
in whatever order it wants (including, say, sorted alphabetically). In
practice kernels almost always give you directory entries in what I will
call 'traversal order', whatever the natural order is for entries in the
on-disk data structures that represent a directory.
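If you want to see this for yourself, here is a minimal sketch in C that just loops over readdir() and prints each entry's inode number and name in exactly the order they come back; point it at a directory and compare the output with 'ls -i' to see that the order is neither alphabetical nor by inode.

  /* Minimal sketch: print a directory's entries in readdir() order. */
  #include <dirent.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      DIR *d = opendir(argc > 1 ? argv[1] : ".");
      struct dirent *de;

      if (d == NULL) {
          perror("opendir");
          return 1;
      }
      while ((de = readdir(d)) != NULL)
          printf("%10llu %s\n", (unsigned long long)de->d_ino, de->d_name);
      closedir(d);
      return 0;
  }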
For a very long time, Unix directories on disk were simple linear arrays (first with fixed-size entries and then with variable sized ones when BSD introduced long filenames). When a new entry was added, it was generally put in the first spot where there was room; at the end if none of the directory's entries had been deleted, and perhaps earlier if a filename of a suitable length had been deleted earlier. The kernel read directories in forward linear order, starting at the first block of the directory and going up, and so returned entries in this order.
(In the original simple Unix filesystems, inode numbers were also allocated in a straightforward 'first free number' order, so the order of directory entry creation could correspond quite well to inode order. The Berkeley FFS changed this somewhat by allocating inodes in a more scattered manner.)
Modern Unix systems commonly use some sort of non-linear directories under at least some circumstances (a linear data structure may still be more efficient for small directories); generally these are some variant of balanced trees. The natural traversal order is tree order, but what that is is very implementation dependent. I believe it's common to hash filenames and then insert entries into the tree in hash order, but hashes (and thus the hash order) vary tremendously between filesystems, and I'm sure that somewhere there is a filesystem that doesn't hash names and just inserts them straight in some order.
(Because this is a per-filesystem thing, it follows that the traversal order can be different for different directories on the same system even if they have the same entries created in the same order, either because the directories are using different filesystem types or just because some parameters were set differently on the different filesystems.)
2011-11-17
The drawback of modern X font handling
In some ways, font handling in modern versions of X is quite nice. We have a decent number of good, high quality fonts in modern scalable formats like TrueType, and one can use fonts from Windows and Mac OS X if one wants to. Thanks to various reforms in font handling, specifying fonts is generally easier and more flexible (hands up everyone who ever tried to generate or read an XLFD string for a particular font), and you can install personal fonts without huge contortions. But it does have one drawback, at least for someone like me.
In the old days of X font handling, the X server did all of the work. X
clients simply told the server to render some text in a particular
font; it was the server itself that was responsible for generating
the font bitmaps and drawing them (sometimes the X server delegated
generating font bitmaps to a separate program, such as xfs, the X font
server). This meant that you only had to tune fonts in one place and
your tuning applied to every X client that you ran, no matter what they
were or where they were running. Or to put it another way, I could carefully
select an xterm font (and size) that I really liked and it would stick
everywhere.
(The fly in this 'all in the server' ointment was default X application resources, but you could fix that with some more work.)
In the new world of X fonts, fonts are rendered separately by each X client (using various layers of font selection and rendering) and sent to the server as precomputed bitmaps. If all of your clients are running on the same machine and using the same set of font libraries, the result is the same as in the old world. But if some of your clients are running on different machines and displaying remotely (or some of your local clients have decided to use their own copies of libraries), they can render the same nominal font quite differently. This is especially so if you use generic font names like 'monospace' or 'serif', because what actual fonts those generic names map to is system-specific; one machine may very well map 'monospace' to 'DejaVu Sans Mono', while another maps it to 'Liberation Mono'.
(The corollary to this is that font availability is also a per-machine
thing. If you install a new font you like onto your local workstation,
an xterm or Firefox or whatever running from a remote server cannot
use it.)
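If you're curious what a given machine actually maps 'monospace' to, here is a minimal sketch using the fontconfig API (it's essentially what 'fc-match monospace' does); run it, or fc-match itself, on two different machines and you can watch them resolve the same generic name to different fonts. The sketch.c name in the compile comment is just whatever you call the file.

  /* Minimal sketch: resolve the generic name 'monospace' via fontconfig.
   * Compile with: cc sketch.c $(pkg-config --cflags --libs fontconfig) */
  #include <stdio.h>
  #include <fontconfig/fontconfig.h>

  int main(void)
  {
      FcInit();

      FcPattern *pat = FcNameParse((const FcChar8 *)"monospace");
      FcConfigSubstitute(NULL, pat, FcMatchPattern);
      FcDefaultSubstitute(pat);

      FcResult result;
      FcPattern *match = FcFontMatch(NULL, pat, &result);
      if (match != NULL) {
          FcChar8 *family = NULL;
          if (FcPatternGetString(match, FC_FAMILY, 0, &family) == FcResultMatch)
              printf("monospace -> %s\n", (const char *)family);
          FcPatternDestroy(match);
      }
      FcPatternDestroy(pat);
      FcFini();
      return 0;
  }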
In the new world, what you see for something like 'DejaVu Sans Mono 10' depends on the specific version of the font each system has installed, what exact rendering library version each system is using, and what rendering settings each system is using for things like subpixel anti-aliasing. This drastically complicates efforts to, say, pick a single modern font for all of your terminal windows.
(I'm aware that the modern answer to this drawback is that I should run all of my X programs locally and just use ssh. This is what you could politely call a fail.)
Sidebar: a concrete example
Both of the following images are xterm using DejaVu Sans Mono 10,
displaying on a Fedora 15 machine's X server. One of the xterms is
running locally on the Fedora 15 machine; the other is running on a
32-bit Ubuntu 10.04 machine.

One of these I rather like, one of these I can't stand.
(Part of the difference is clearly in different settings for subpixel anti-aliasing; the Ubuntu 10.04 version has colour fringes that the Fedora 15 version does not. But I don't think the difference in line width that makes the 10.04 version visibly blacker is due to that.)
2011-11-08
Files and fundamental filesystem activities (on Unix)
Back in a discussion of filesystem deduplication I said that writing blocks is a fundamental filesystem activity while writing files is not. On the surface this sounds like a strange thing, so today I'm going to defend it.
At one level, it's clear how writing blocks is a fundamental filesystem activity. Filesystems allocate disk space in blocks and pretty much only write blocks; if you try to write less than a block, the filesystem actually usually does a 'read modify write' cycle. This wasn't historically forced by physical disk constraints; until recently, disks used smaller physical blocks than the filesystem block size, so the filesystem could have done sub-block writes if it wanted to. Filesystems just don't, by and large.
What's not clear is why writing files is not. To see why, let's ask a
question: what does it mean to write a file, and when are you done? In
the simple case the answer is that you write all of the data in the
file in sequential order, and then close the file descriptor. This
probably describes a huge amount of the file writes done on a typical
Unix system, and it's certainly what most people think of, since this
describes things like saving a file in an editor or writing out an image
in your image editor. But there's a lot of files on Unix that aren't
'written' this way. Databases (SQLite included) are the classic case,
but there are other examples; even
rsync may 'write' files in non-sequential chunks in some situations.
Some of these cases may not close the file for days or weeks, although
they may go idle for significant amounts of time.
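To make the non-sequential case concrete, here is a minimal sketch in C of 'writing a file' the way a database might: a handful of pwrite()s at scattered offsets, in no particular order, with nothing to mark when (or whether) the program will ever write the rest. The file name and offsets here are made up purely for illustration.

  /* Minimal sketch: non-sequential 'writing' of a file with pwrite(). */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
      int fd = open("scratch.dat", O_CREAT | O_WRONLY, 0644);
      const char *chunk = "some data\n";

      if (fd < 0) {
          perror("open scratch.dat");
          return 1;
      }

      /* Three writes, nowhere near each other and not in ascending
       * order.  Nothing here says that the file is now 'done'; a real
       * program might keep the fd open and come back hours later. */
      if (pwrite(fd, chunk, strlen(chunk), 4096 * 100) < 0 ||
          pwrite(fd, chunk, strlen(chunk), 0) < 0 ||
          pwrite(fd, chunk, strlen(chunk), 4096 * 37) < 0)
          perror("pwrite");

      close(fd);
      return 0;
  }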
The result is that writing files is a diffuse activity while writing blocks is a very sharp one. You can clearly write to a file in a way that touches only a small portion of the file, and if the end of writing a file is when you close it you can write files very slowly, with huge gaps between your actual IO. And the system makes all of these patterns relatively efficient, unlike partial-block writes.
This causes problems for a number of things that want to react when a file is written. File level deduplication is one example; another is real time virus scanners, even with system support to hook events.
(The more I think about it, the more I think that this is not just a Unix thing. Although I may have blinkered vision due to Unix, it's hard to see a viable API that could make writing files a fundamental activity. There are many situations where you just can't pregenerate all of the file before writing it even if you're writing things sequentially, plus there's random write IO to consider unless you make that an entirely separate 'database' API.)