Wandering Thoughts archives


The pending delete problem for Unix filesystems

Unix has a number of somewhat annoying filesystem semantics that tend to irritate designers and implementors of filesystems. One of the famous ones is that you can delete a file without losing access to it. On at least some OSes, if your program open()s a file and then tries to delete it, either the deletion fails with 'file is in use' or you immediately lose access to the file; further attempts to read or write it will fail with some error. On Unix your program retains access to the deleted file and can even pass this access to other processes in various ways. Only when the last process using the file closes it will the file actually get deleted.

This 'use after deletion' presents Unix and filesystem designers with the problem of how you keep track of this in the kernel. The historical and generic kernel approach is to keep both a link count and a reference count for each active inode; an inode is only marked as unused and the filesystem told to free its space when both counts go to zero. Deleting a file via unlink() just lowers the link count (and removes a directory entry); closing open file descriptors is what lowers the reference count. This historical approach ignored the possibility of the system crashing while an inode had become unreachable through the filesystem and was only being kept alive by its reference count; if this happened the inode became a zombie, marked as active on disk but not referred to by anything. To fix it you had to run a filesystem checker, which would find such no-link inodes and actually deallocate them.
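
As a toy model of the two-count rule (deliberately simplified, with none of the locking or on-disk details a real kernel needs):

```go
package main

import "fmt"

// inode models the two counts the kernel keeps for each active inode.
type inode struct {
	links int // directory entries pointing at the inode
	refs  int // kernel references, e.g. open file descriptors
	freed bool
}

// unlink removes a directory entry. The inode is only deallocated
// once no links and no references remain.
func (ino *inode) unlink() { ino.links--; ino.maybeFree() }

// close drops one kernel reference, as when the last process
// holding the file open closes its file descriptor.
func (ino *inode) close() { ino.refs--; ino.maybeFree() }

func (ino *inode) maybeFree() {
	if ino.links == 0 && ino.refs == 0 {
		ino.freed = true
	}
}

func main() {
	// One directory entry, one process with the file open.
	ino := &inode{links: 1, refs: 1}
	ino.unlink()           // file deleted, but still open
	fmt.Println(ino.freed) // false: the open descriptor keeps it alive
	ino.close()            // last close is what actually deallocates
	fmt.Println(ino.freed) // true
}
```

A crash between the unlink() and the final close() is exactly the zombie case: on disk the inode is still marked active, but nothing points at it any more.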

(When Sun introduced NFS they were forced to deviate slightly from this model, but that's an explanation for another time.)

Obviously this is not suitable for any sort of journaling or 'always consistent' filesystem that wants to avoid the need for a fsck after unclean shutdowns. All such filesystems must keep track of such 'deleted but not deallocated' files on disk using some mechanism (and the kernel has to support telling filesystems about such inodes). When the filesystem is unmounted in an orderly way, these deleted files will probably get deallocated. If the system crashes, part of bringing the filesystem up on boot will be to apply all of the pending deallocations.

Some filesystems will do this as part of their regular journal; you journal, say, 'file has gone to 0 reference count', and then you know to do the deallocation on journal replay. Some filesystems may record this information separately, especially if they have some sort of 'delayed asynchronous deallocation' support for file deletions in general.
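
A sketch of the replay idea in Go (purely illustrative; real filesystems record this in their own journal formats):

```go
package main

import "fmt"

// orphanRecord is a hypothetical journal record noting that an inode
// reached zero link count while still held open by a reference.
type orphanRecord struct {
	inodeNum int
}

// replayOrphans is what a journaling filesystem conceptually does at
// mount time after a crash: every inode recorded as 'deleted but not
// deallocated' gets its space freed now, because the in-kernel
// references that were keeping it alive died with the crash.
func replayOrphans(journal []orphanRecord, free func(int)) {
	for _, rec := range journal {
		free(rec.inodeNum)
	}
}

func main() {
	journal := []orphanRecord{{inodeNum: 1042}, {inodeNum: 2077}}
	replayOrphans(journal, func(ino int) {
		fmt.Printf("deallocating orphaned inode %d\n", ino)
	})
}
```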

(Asynchronous deallocation is popular because it means your process can unlink() a big file without having to stall while the kernel frantically runs around finding all of the file's data blocks and then marking them all as free. Given that finding out what a file's data blocks are often requires reading things from disk, such deallocations can be relatively slow under disk IO load (even if you don't have other issues there).)

PS: It follows that a failure to correctly record pending deallocations or properly replay them is one way to quietly lose disk space on such a journaling filesystem. Spotting and fixing this is one of the things that you need a filesystem consistency checker for (whether it's a separate program or embedded into the filesystem itself).

unix/UnixPendingDeleteProblem written at 01:02:45; Add Comment


In Go, you need to always make sure that your goroutines will finish

Yesterday I described an approach to writing lexers in Go that pushed the actual lexing into a separate goroutine, so that it could run as straight-line code that simply consumed input and produced a stream of tokens (which were sent to a channel). Effectively we're using a goroutine to implement what would be a generator in some other languages. But because we're using goroutines and channels, there's something important we need to do: we need to make sure the lexer goroutine is run to completion, so that the goroutine will actually finish.

Right now you may be saying 'well of course the lexer will always be run to the end of the input, that's what the parser does'. But not so fast; what happens if the parser runs into a parse error because of a syntax error or the like? The natural thing to do in the parser is to immediately error out without looking at any further tokens from the lexer, which means that the actual lexer goroutine will stall as it sits there trying to send the next token into its communication channel, a channel that will never be read from because it's been abandoned by the parser.

The answer here is that the parser must do something to explicitly run the lexer to completion or otherwise cause it to exit, even if the tokens the lexer is producing will never be used. In some environments having the lexer process all of the remaining input is okay because it will always be small (and thus fast), but if you're lexing large bodies of text you'll want to arrange some sort of explicit termination signal via another channel or something.
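
One common shape for that explicit termination signal is a second 'done' channel that the lexer selects on whenever it sends a token. A minimal sketch (not Rob Pike's actual code):

```go
package main

import "fmt"

type token struct{ text string }

// lex runs as a goroutine, sending tokens until its input is exhausted
// or the parser signals abandonment via done. Without the done channel,
// a parser that stops reading would leave this goroutine blocked on the
// channel send forever.
func lex(input []string, tokens chan<- token, done <-chan struct{}) {
	defer close(tokens)
	for _, word := range input {
		select {
		case tokens <- token{word}:
		case <-done:
			return // parser gave up; exit instead of stalling
		}
	}
}

func main() {
	tokens := make(chan token)
	done := make(chan struct{})
	go lex([]string{"a", "(", "b", ")", "oops", "c"}, tokens, done)

	for tok := range tokens {
		if tok.text == "oops" {
			// Simulated parse error: tell the lexer to stop
			// rather than silently abandoning its channel.
			close(done)
			break
		}
		fmt.Println("parsed:", tok.text)
	}
}
```

The select means the lexer can always make progress: either the parser takes the token, or the parser has closed done and the lexer returns immediately.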

This is an important way in which goroutines and channels aren't a perfect imitation of generators. In typical languages with generators, abandoning a generator results in it getting cleaned up via garbage collection; you can just walk away without doing anything special. In Go with goroutines, this isn't the case; you need to consider goroutine termination conditions and generally make sure it always happens.

You might think that this is a silly bug and of course anyone who uses goroutines like this will handle it as a matter of course. If so, I regret to inform you that I didn't come up with this realization on my own; instead Rob Pike taught it to me with his bugfix to Go's standard text/template module. If Rob Pike can initially overlook this issue in his own code in the standard library, anyone can.

programming/GoAlwaysDrainGoroutines written at 00:11:59; Add Comment


Go goroutines as a way to capture and hold state

The traditional annoyance when writing lexers is that lexers have internal state (at least their position in the stream of text), but wind up returning tokens to the parser at basically random points in their execution. This means holding the state somewhere and writing the typical start/stop style of code that you find at the bottom of a pile of subroutine calls; your 'get next token' entry point gets called, you run around a bunch of code, you save all your state, and you return the token. Manual state saving and this stuttering style of code execution doesn't lend itself to clear logic.

Some languages have ways around this structure. In languages with generators, your lexer can be a generator that yields tokens. In lazy evaluation languages your lexer turns into a stream transformation from raw text to tokens (and the runtime keeps this memory and execution efficient, only turning the crank when it needs the next token).

In Rob Pike's presentation on lexing in Go, he puts the lexer code itself into its own little goroutine. It produces tokens by sending them to a channel; your parser (running separately) obtains tokens by reading the channel. There are two ways I could put what Rob Pike's done here. The first is to say that you can use goroutines to create generators, with a channel send and receive taking the place of a yield operation. The second is that goroutines can be used to capture and hold state. Just as with ordinary threads, goroutines turn asynchronous code with explicitly captured state into synchronous code with implicitly captured state and thus simplify code.
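
A minimal sketch of the goroutine-as-generator pattern (illustrative, not taken from Rob Pike's lexer):

```go
package main

import "fmt"

// integers acts like a generator: the goroutine's loop counter is
// state held implicitly in its own stack, and each channel send plays
// the role of a yield operation.
func integers(n int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := 0; i < n; i++ {
			ch <- i // 'yield i'
		}
	}()
	return ch
}

func main() {
	for v := range integers(3) {
		fmt.Println(v)
	}
	// prints 0, 1, 2
}
```

Note that this sketch is only safe because the consumer always drains the channel; an abandoned generator goroutine would block forever on its send, which is exactly the termination problem discussed elsewhere on this page.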

(I suppose another way of putting it is that goroutines can be used for coroutines, although this feels kind of obvious to say.)

I suspect that this use for goroutines is not new for many people (and it's certainly implicit in Rob Pike's presentation), but I'm the kind of person who sometimes only catches on to things slowly. I've read so much about goroutines for concurrency and parallelism that the nature of what Rob Pike (and even I) were doing here didn't really sink in until now.

(I think it's possible to go too far overboard here; not everything needs to be a coroutine or works best that way. When I started with my project I thought I would have a whole pipeline of goroutines; in the end it turned out that having none was the right choice.)

programming/GoroutinesAsStateCapture written at 02:19:32; Add Comment


It's time to stop coddling software that can't handle HTTPS URLs

A couple of years ago I moved my personal website from plain HTTP to using HTTPS. When I did that, one of the lessons I learned was that there were a certain number of syndication feed fetchers that didn't support HTTPS requests at all. My solution at the time was to sigh and add some bits to my lighttpd configuration so they'd be allowed to still fetch the HTTP version of my syndication feeds. Now I'm in the process of moving this blog from HTTP to HTTPS and so I've been considering what I'll do about issues like this for here. This time around my decision is that I'm not going to create any special rules; anything fetching syndication feeds or web pages from here that can't do HTTPS (or follow redirections) is flat out of luck.

There are some pragmatic reasons for this, but ultimately it comes down to that I think it's now clearly time that we stopped accepting and coddling software that can only deal with HTTP URLs. The inevitable changes of the Internet have rendered such software broken. It's clear that HTTPS is increasingly the future of web activity and also clear that a decent number of sites will be moving to it via HTTP to HTTPS redirection. Software that cannot cope with both of these is decaying; the more sites that do this, the more pragmatically broken the software is.

I'm not going to say that you should never give in and accommodate decaying, broken software; if nothing else, I certainly have made some accommodations myself. But when I do that, I do it on a case by case basis and only when I've decided that it's sufficiently important; I don't do it generally. Coddling broken software in general only prolongs the pain, not just for you but for everyone. In this case, the more we accommodate HTTP only software the more traffic remains HTTP (and subject to snooping and alteration) instead of moving to HTTPS. HTTPS is not ideal, but it's clear that an HTTPS network is an improvement over the HTTP one we have today in practice.

This is likely going to hurt me somewhat (and already has, as some Planets (also) that carry Wandering Thoughts apparently haven't coped with this). But even apart from the pragmatic impossibility of trying to pick through all of the requests to here to see which aren't successfully transitioning to HTTPS, I'm currently just not willing to coddle such bad software any more. It's 2015. You'd better be ready for the HTTPS transition because it's coming whether you like it or not.

The reason I feel like this now when I didn't originally is pretty simple: more time has passed. The whole situation with HTTP and HTTPS on the Internet has evolved significantly since 2013, and there is now real and steadily increasing momentum behind the HTTPS shift. What was kind of wild eyed and unreasonable in 2013 is increasingly mainstream.

web/NoMoreHTTPOnlySoftware written at 00:01:55; Add Comment


The problem with proportional fonts for editing code and related things

One of the eternal attractive ideas for programmers, sysadmins, and other people who normally spend a lot of time working with monospaced fonts in editors, terminal emulators, and so on is the idea of switching to proportional fonts. I've certainly considered it myself (there are various editors and so on that will do this) but I've consistently rejected trying to make the switch.

The big problem is text alignment, specifically what I'll call 'interior' text alignment. Having things line up vertically is quite important for readability and I'm not willing to do without it. At one level there's no problem; automatically lining up leading whitespace is a solved issue, and other things you can align by hand. At another level there's a big problem, because I need to interact with an outside world that uses monospace fonts; the stuff I carefully line up in my proportional fonts editor needs to look okay for them and the stuff that they carefully line up in monospace fonts needs to look okay for me. And automatically detecting and aligning things based on implied columns is a hard problem.

(I used to use a manual page reader that used proportional fonts by default. It made some effort to align things but not enough, and on some manual pages the results came out really terrible. This experience has convinced me that proportional fonts with bad alignment are significantly worse than monospaced fonts.)

This is probably not an insoluble problem. But it means that simply writing an editor that uses proportional fonts is the easy part; even properly indenting leading whitespace is the easy part. In turn this means that you need a very smart editor to make using proportional fonts a really nice experience, especially if you routinely interact with code from outside your own sphere. Really smart editors are rare and relatively prickly and opinionated; if you don't like their interface and behavior, well, you're stuck. You're also stuck if you're strongly attached to editors that don't have this kind of smarts.

(The same logic holds for things like terminal programs but even more so. A really smart terminal program that used proportional fonts would have to detect column alignment in output basically on the fly and adjust things.)

So I like the idea of using proportional fonts for this sort of stuff in theory, but I'm pretty sure that in practice I'm never going to find an environment that fully supports it that works for me.

(For those people who wonder why you'd want to consider this idea at all: proportional fonts are usually more readable and nicer than monospaced fonts. This entry is basically all plain text, so you can actually look at the DWikiText monospaced source for it against the web browser version. At least for me, the web browser's proportional font version looks much better.)

tech/ProportionalFontProblem written at 01:24:21; Add Comment


Our mail submission system winds up handling two sorts of senders

Yesterday I mentioned that while in theory our mail submission system could use sender verification to check whether a MAIL FROM address at an outside domain was valid, I didn't feel this was worth it. One of the reasons I feel this way is that I don't think this check will fail very often for most outside domains, and to explain that I need to talk about how we have two sorts of senders: real people and machines.

Real people are, well, real people with a MUA who are sending email out through us. My view is that when real people send out email using outside domains in their From: address, it's extremely likely that this address will be correct; if it's not correct, the person is probably going to either notice it or get told by people they are trying to talk to through some out of band mechanism. Unless you're very oblivious and closed off, you're just not going to spend very long with your MUA misconfigured this way. On top of that, real people have to explicitly configure their address in their MUA, which means there is a whole class of problems that get avoided.

Machines are servers and desktops and everything we have sitting around on our network that might want to send status email, report in to its administrator, spew out error reports to warn people of stuff, and so on. Email from these machines is essentially unidirectional (it goes out from the machine but not back), may not be particularly frequent, and is often more or less automatically configured. All of this makes it very easy for machines to wind up with bad or bogus MAIL FROMs. Often you have to go out of your way during machine setup in order to not get this result.

(For instance, many machines will take their default domain for MAIL FROMs from DNS PTR results, which malfunctions in the presence of internal private zones.)

Most broken machine origin addresses are easily recognized, because they involve certain characteristic mistakes (eg using DNS PTR results as your origin domain). Many of these addresses cannot be definitively failed with sender verification because, for example, the machine doesn't even run a SMTP listener that you can talk to.

You can mostly use sender verification for addresses from real people, but even ignoring the other issues there's little point because they'll almost never fail. Real people will almost always be using sender addresses from outside domains, not from internal hostnames.

sysadmin/MailSubmissionTwoSenders written at 01:45:03; Add Comment


What addresses we accept and reject during mail submission

Like many places, our mail setup includes a dedicated mail submission machine (or two). I mentioned yesterday that this submission machine refuses some MAIL FROM addresses, so today I want to talk about what we accept and refuse during mail submission and why.

When we were designing our mail submission configuration many years ago, our starting point was that we didn't expect clients to deal very well if the submission server gave them a failure response. What you'd like is for the MUA to notice the error, report it, give you a chance to re-edit the email addresses involved, and so on and so forth. What we actually expected would happen would be some combination of lost email, partially delivered email (if some RCPT TOs failed but others succeeded), and awkward interfaces for dealing with failed email sending. So a big guiding decision was that our mail submission machine should accept the email if at all possible, even if we knew that it would partially or completely fail delivery. It was better to accept the email and send a bounce rather than count on all of the MUAs that our users use to get it right.

(Some but not all RCPT TO addresses failing during SMTP is a somewhat challenging problem for any MUA to deal with. How do you present this to the user, and what do you want to do when the user corrects the addresses? For example, if the user corrects the addresses and resends, should it be resent to all addresses or just the corrected ones? There's all sorts of UI issues involved.)

Given that our recovery method for bad destination addresses is sending a bounce, we need to have what at least looks like a valid MAIL FROM to send the bounce back to; if we don't we can't send bounces, so we're better off rejecting during SMTP and hoping that the MUA will do something sensible. For email addresses in outside domains, the practical best we can do is verify that the domain exists. For email addresses in our own domain, we can check that the local part is valid (using our list of valid local parts), so we do.
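
A sketch of the 'does this outside domain even exist' check, using Go's resolver (the function name and the fallback-to-address-record policy are my choices for illustration, not our actual Exim configuration):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// domainAccepts reports whether the domain of a MAIL FROM address looks
// capable of receiving a bounce: it has MX records or, failing that, an
// A/AAAA record (the classic implicit-MX fallback).
func domainAccepts(addr string) bool {
	at := strings.LastIndex(addr, "@")
	if at < 0 {
		return false // no domain part at all
	}
	domain := addr[at+1:]
	if mxs, err := net.LookupMX(domain); err == nil && len(mxs) > 0 {
		return true
	}
	if ips, err := net.LookupIP(domain); err == nil && len(ips) > 0 {
		return true
	}
	return false
}

func main() {
	fmt.Println(domainAccepts("user@example.com"))
}
```

This only establishes that a bounce has somewhere to go; it says nothing about whether the local part is real, which is why checking our own domain's local parts against a known list is the stronger test.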

(We also do some basic safety checks for certain sorts of bad characters and bad character sequences in MAIL FROM and RCPT TO addresses. These probably go beyond what the RFCs require and may not be doing anything useful these days; we basically inherited them from the stock Ubuntu configuration of close to a decade ago.)

We allow people to use MAIL FROM addresses that are not in our domain in part because some people in the department have a real need to do this as part of their work. In general we log enough source information that if anyone abuses this we can find them and deal with them.

(You might say 'but what about spammers compromising accounts and sending spam through you with forged origin addresses?' My answer is that that's a feature.)

PS: In theory checking outside domain MAIL FROM addresses is one place where sender verification has a real justification, and you can even legitimately use the null sender address for it. In practice there are all sorts of failure modes that seem likely to cause heartburn and it's just not worth it in my opinion.

sysadmin/MailSubmissionAcceptReject written at 00:58:08; Add Comment


Sometimes it's useful to have brute force handy: an amusing IPMI bug

Once upon a time we had gotten in some new servers. These servers had an IPMI and the IPMI could be configured to send out email alerts if something happened, like a fan stopping or a power supply losing power. Getting such alerts (where possible) seemed like a good idea, so I dutifully configured this in the IPMI's web interface. Sensibly, the IPMI needed me to set the origin address for the email, so I set it to sm-ipmi@<us> (and then made sure there was an sm-ipmi alias, so our mail submission machine would accept the email).

Of course, configurations are never quite done until they're tested. So I poked the IPMI to send me some test email. No email arrived. When I went off to our mail submission machine to look at its logs, I got rather a surprise; the logs said the machine had dutifully rejected a message that claimed a MAIL FROM address of =sm-ipmi@<us>.

While the insides of an IPMI's embedded software are inscrutable (at least to lazy sysadmins who are not security researchers), this smells like some form of classic data storage mismatch bug. The web interface thinks the email address should be stored with an '=' in front, maybe as an 'X=Y' thing, whereas whatever is actually using the address either has an off by one character parsing bug or doesn't want the extra leading = that the web interface is adding when it stores it.

There are probably a bunch of ways we could have dealt with this. As it happens our mail system is flexible enough to let us do the brute force approach: we just defined an alias called '=sm-ipmi'. Our mail system is willing to accept an '=' in local parts, even at the start, so that's all it took to make everything happy. It looks a little bit peculiar in the actual email messages, but that's just a detail.

A more picky email system would have given us more heartburn here. In a way we got quite lucky that none of the many levels of checks and guards we have choked on this. Our alias generation system was willing to see '=' as a valid character, even at the start; the basic syntax checks we do on MAIL FROM didn't block a = at the start; Exim itself accepts such a MAIL FROM local part and can successfully match it against things. I've used mail systems in the past that were much more strict about this sort of stuff and they'd almost certainly have rejected such an address out of hand or at least given us a lot of trouble over it.

(I don't even know if such an address is RFC compliant.)

The whole situation amuses me. The IPMI has a crazy, silly bug that should never have slipped through development and testing, and we're dealing with it by basically ignoring it. We can do that because our mail system is itself willing to accept a rather crazy local part as actually existing and being valid, which is kind of impressive considering how many different moving parts are involved.

PS: I call this the brute force solution because 'make an alias with a funny character in it' is more brute force than, say, 'figure out how to use sender address rewriting to strip the leading = that the IPMI is erroneously putting in there'.

PPS: Of course, some day maybe we'll update the IPMI firmware and suddenly find the notification mail not going through because the IPMI developers noticed the bug and fixed it. I suppose I should add the 'sm-ipmi' alias back in, just in case.

sysadmin/IPMIEmailBug written at 02:18:46; Add Comment


Why keeping output to 80 columns (or less) is still sensible

When I talked about how monitoring tools should report timestamps and other identifying information, I mentioned that I felt that keeping output to 80 columns or less was still a good idea even if it meant sometimes optionally omitting timestamps. So let's talk about that, since it's basically received wisdom these days that the 80 column limit is old fashioned, outdated, and unnecessary.

I think that there are still several reasons that short output is sensible, especially at 80 columns or less. First, 80 columns is still the default terminal window size in many environments; if you make a new one and do nothing special, 80 columns is what you get by default (often 80 by 24). This isn't just on Unix systems; I believe that eg Windows often defaults to this size for both SSH client windows and its own command line windows. This means that if your line spills over 80 columns, many people have to take an extra step to get readable results (by widening their default sized window) and they may mangle some existing output for the purposes of eg cut and paste (since many terminal windows still don't re-flow lines when the window widens or narrows).

Next, there's an increasingly popular class (or classes) of device with relatively constrained screen size, namely smartphones and small tablets. Even a large tablet might only be 80 columns wide in vertical orientation. Screen space is precious on those devices and there's often nothing the person using the device can really do to get any more of it. And yes, people are doing an increasing amount of work from such devices, especially in surprise situations where a tablet might be the best (or only) thing you have with you. Making command output useful in such situations is an increasingly good idea.

Finally, overall screen real estate can be a precious resource even on large-screen devices because you can have a lot of things competing for space. And there are still lots of situations where you don't necessarily need timestamps and they'll just add clutter to output that you're actively scanning. I won't pretend that my situation is an ordinary one; there are plenty of times where you're basically just glancing at the instantaneous figures every so often or looking at recent past or the like.

(As far as screen space goes, often my screen winds up completely covered in status monitoring windows when I'm troubleshooting something complicated. Partly this is because it's often not clear what statistic will be interesting so I want to watch them all. Of course what this really means is that we should finally build that OS level stats gathering system I keep writing about. Then we'd always be collecting everything and I wouldn't have to worry about maybe missing something interesting.)

tech/StickingTo80Columns written at 23:53:37; Add Comment

Unix's pipeline problem (okay, its problem with file redirection too)

In a comment on yesterday's entry, Mihai Cilidariu sensibly suggested that I not add timestamp support to my tools but instead outsource this to a separate program in a pipeline. In the process I would get general support for this and complete flexibility in the timestamp format. This is clearly and definitely the right Unix way to do this.

Unfortunately it's not a good way in practice, because of a fundamental pragmatic problem Unix has with pipelines. This is our old friend block buffering versus line buffering. A long time ago, Unix decided that many commands should change their behavior in the name of efficiency; if they wrote lines of output to a terminal you'd get each line as it was written, but if they wrote lines to anything else you'd only get it in blocks.

This is a big problem here because obviously a pipeline like 'monitor | timestamp' basically requires the monitor process to produce output a line at a time in order to be useful; otherwise you'd get large blocks of lines that all had the same timestamp because they were written to the timestamp process in a block. The sudden conversion from line buffered to block buffered can also affect other sorts of pipeline usage.

It's certainly possible to create programs that don't have this problem, ones that always write a line at a time (or explicitly flush after every block of lines in a single report). But it is not the default, which means that if you write a program without thinking about it or being aware of the issue at all you wind up with a program that has this problem. In turn people like me can't assume that a random program we want to add timestamps to will do the right thing in a pipeline (or keep doing it).

(Sometimes the buffering can be an accidental property of how a program was implemented. If you first write a simple shell script that runs external commands and then rewrite it as a much better and more efficient Perl script, well, you've probably just added block buffering without realizing it.)

In the end, what all of this really does is chip away quietly at the Unix ideal that you can do everything with pipelines and that pipelining is the right way to do lots of stuff. Instead pipelining becomes mostly something that you do for bulk processing. If you use pipelines outside of bulk processing, sometimes it works, sometimes you need to remember odd workarounds so that it's mostly okay, and sometimes it doesn't do what you want at all. And unless you know Unix programming, why things are failing is pretty opaque (which doesn't encourage you to try doing things via pipelines).

(This is equally a potential problem with redirecting program output to files, but it usually hits most acutely with pipelines.)

unix/PipelineProblem written at 02:27:41; Add Comment
