2011-08-17
MH is an iceberg
From a comment on here:
MH of course is the example in the KornShell command and programming book as something that can be re-written as a series of shell scripts.
My immediate reaction is 'only as a very unsatisfying, slow, and incomplete version'. The Korn shell people shouldn't feel bad, though; lots of people have looked at MH and had that reaction, and some of them have even given it a try. Unfortunately it's much harder than it looks.
For all that I harsh on MH, one of the things that it does really well is hide a great deal of complexity behind a simple, easy to use exterior. It often manages to do this so seamlessly that you aren't even aware that what you're doing involves complexity. The end result is that what MH does feels simple and trivial when it's anything but.
For example, let's take the issue of what message or messages an MH command should operate on. One of the MH features is something called 'sequences'; a sequence is a possibly disjoint set of messages in a folder. Sequences are named, and you can tell an MH command to operate on a sequence by giving it the name of the sequence. A number of things in MH are implemented as sequences; for example, 'cur' (the current message) and 'unseen' (the unread messages in this folder) are both sequences.
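To make sequences a bit more concrete: MH (nmh, at least) keeps a folder's public sequences in a .mh_sequences file in the folder's directory, one 'name: ranges' line per sequence, such as 'unseen: 3 5-9 12'. Here is a minimal Python sketch, assuming only that simple one-line-per-sequence format, that turns such lines into sets of message numbers and writes a set back out in compact range form. It ignores private sequences and the other details the real code handles; parse_sequences and format_sequence are my own hypothetical names.

```python
def parse_sequences(text):
    """Parse lines like 'unseen: 3 5-9 12' into {name: set of message numbers}."""
    seqs = {}
    for line in text.splitlines():
        name, sep, body = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        members = set()
        for item in body.split():
            if "-" in item:
                # A range such as 5-9 covers every number from 5 through 9.
                lo, hi = item.split("-", 1)
                members.update(range(int(lo), int(hi) + 1))
            else:
                members.add(int(item))
        seqs[name.strip()] = members
    return seqs


def format_sequence(members):
    """Write a set of message numbers back out in compact range form."""
    nums = sorted(members)
    out = []
    i = 0
    while i < len(nums):
        # Extend j for as long as the numbers stay consecutive.
        j = i
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        out.append(str(nums[i]) if i == j else f"{nums[i]}-{nums[j]}")
        i = j + 1
    return " ".join(out)
```

Storing sequences as ranges keeps the file small even when 'unseen' covers thousands of messages, but every command that wants to know what 'unseen' means has to expand and re-compress them.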
You can support sequences in shell scripts, if you want to; I can think
of at least one plausible brute force representation. But it's not going
to be trivial or really fast because shell scripts simply don't have any
natural support for ordered sets, especially large ones. Or you could
skip supporting sequences and implement things like 'cur' and 'unseen'
with purpose-specific techniques, although this makes your system less
powerful and less like real MH. Since either way this is a significant
chunk of code, you're going to want to put it in some form of a shell
script library (either as a separate command that MH commands run to
find what to operate on, or as code that is directly pulled into scripts
with '.'); neither option is svelte and fast.
(In short, suddenly 'show' has stopped being a little shell script
that wraps 'less <filename>' with some handwaving to find out the
filename. Finding that filename, or filenames, is not so simple.)
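Here is a hedged sketch of just the 'find the filename' part, restricted to a hypothetical subset of MH message specifications: a number, a range like 2-6, all, first, last, next, prev, and sequence names such as cur or unseen. It assumes the folder's existing message numbers and its sequences have already been read in somehow; real MH supports considerably more syntax (such as last:5) and has far more error handling than this.

```python
import os


def resolve(spec, existing, sequences):
    """Turn one message spec into a sorted list of existing message numbers.

    existing:  sorted list of message numbers present in the folder
    sequences: dict of sequence name (such as 'cur' or 'unseen') -> set of numbers
    """
    if not existing:
        raise ValueError("no messages in folder")
    if spec == "all":
        return list(existing)
    if spec == "first":
        return [existing[0]]
    if spec == "last":
        return [existing[-1]]
    if spec in ("next", "prev"):
        (cur,) = sequences["cur"]  # assumes 'cur' names exactly one message
        if spec == "next":
            picked = [n for n in existing if n > cur][:1]
        else:
            picked = [n for n in existing if n < cur][-1:]
    elif spec.isdigit():
        picked = [int(spec)] if int(spec) in existing else []
    elif spec in sequences:
        # Sequences can still mention messages that have since been deleted.
        picked = sorted(sequences[spec] & set(existing))
    elif "-" in spec:
        lo, hi = (int(part) for part in spec.split("-", 1))
        # Ranges silently skip numbers that have no message file.
        picked = [n for n in existing if lo <= n <= hi]
    else:
        raise ValueError(f"bad message list {spec!r}")
    if not picked:
        raise ValueError(f"no messages match {spec!r}")
    return picked


def message_paths(folder_dir, spec, existing, sequences):
    """Map a spec to file paths; in MH, message N is simply the file named N."""
    return [os.path.join(folder_dir, str(n)) for n in resolve(spec, existing, sequences)]
```

Even this cut-down version needs the folder listing, the sequence data, and several special cases before 'show' can open anything.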
This is just one example. Things like this are all over MH, as invisible support for common operations that people who use MH don't even think about. As a result, any shell script reimplementation of MH would be incomplete, hugely complex, and terribly slow (with the complexity and the slowness growing with how complete it is; the more incomplete, the simpler and faster). The only way to get a simple shell script version of MH is to throw out almost all of what makes MH decent to use in the first place.
The same more or less holds true of reimplementations of MH in serious
languages. Any reasonably full featured MH clone is going to have much
of the same work happening behind the scenes, which makes it nowhere
near simple to write. You need quite a lot of code to get to the point
where 'scan unseen' or 'repl 10' or 'refile +pending' just work
right (and that's ignoring the people who've customized their behavior).
The end result is that there are a lot of partially started incomplete reimplementations of MH, some basic reimplementations of parts of MH that make their author happy but probably not anyone else, and no general simpler reimplementation of MH that I know of.
Sidebar: the other problem with MH reimplementations
MH has a lot of baroque functionality. Everyone who uses MH (and knows about this functionality) agrees that most of it can be dropped as stupid and unnecessary. Unfortunately, everyone who uses MH has a somewhat different idea of which specific bits are absolutely useful and important and have to be kept, and which bits are stupid and unnecessary and should be left out of any reimplementation. Name your favorite obscure MH feature and there are people who are very attached to it and other people who never touch it.
I am not an exception in this. I have my own set of crazy MH bits that I use, and I would be pretty much completely uninterested in an MH reimplementation that left them out entirely and simply hard-coded a single 'common' behavior.
2011-08-15
A bit about what life was like on Unix before shared libraries
If you look at it right, many MH commands are a relatively
thin veneer over a large pool of shared functionality. For example,
when you run show to display the current message there is a bunch
of infrastructure that turns the concept 'the current message' into a
filename for show to open, and of course this infrastructure is common
across all MH commands that can work on 'the current message'. Similar
bits of infrastructure, large and small, exist across a lot of what MH
does (for example, clearly you want each MH command to accept the same
arguments for specifying messages to operate on).
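As an illustration of the sort of shared code involved, here is a Python sketch of a common argument splitter that every command could use: +name picks a folder, -name is a switch, and anything else is a message specification, with a per-command default such as cur when none is given. The function and its parameters are hypothetical, and real MH does more; for example, it accepts unambiguous abbreviations of switch names.

```python
def parse_mh_args(argv, default_msgs=("cur",), switches_with_arg=()):
    """Split MH-style arguments into (folder, message specs, switches)."""
    folder = None
    msgs = []
    switches = {}
    args = iter(argv)
    for arg in args:
        if arg.startswith("+"):
            # '+inbox' names a folder; this sketch allows only one.
            if folder is not None:
                raise ValueError("only one folder at a time")
            folder = arg[1:]
        elif arg.startswith("-"):
            name = arg[1:]
            if name in switches_with_arg:
                # Switches like '-form' take the following word as their value.
                value = next(args, None)
                if value is None:
                    raise ValueError(f"missing argument to -{name}")
                switches[name] = value
            else:
                switches[name] = True
        else:
            msgs.append(arg)
    # Commands like 'show' fall back to a default such as the current message.
    return folder, msgs or list(default_msgs), switches
```

Each command would pass in its own default message list and the switches it knows about, and everything else stays common.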
MH is written in C. The obvious way to implement all of this shared infrastructure in C is to put it in a library (or several libraries) and then have every MH command link to the infrastructure libraries that it needs (which is usually most of them). Since most of the functionality of MH has been factored out into these libraries, most of its code is library code. All of this is fine and works great on any system with shared libraries; the library code lives in shared libraries and the commands are tiny executables that use the shared libraries.
But Unix did not always have shared libraries, and people have written systems like this on pre-shared-library Unixes (in fact MH itself predates shared libraries). And back in those days, you had a problem; the total size of your system's executables was huge, because each executable was statically linked against these libraries and mostly consisted of duplicated code (wasting both disk and memory space at a time when both were precious).
There was no good solution to this, merely various unpleasant
workarounds. One of them was used by the PBM system (at least as I remember it). Since
the executable was the unit of sharing in a pre-shared-library world,
PBM could be built so that it merged many of its separate commands
into a single executable; the front end code in the executable figured
out which command to run by inspecting argv[0]. My memory is that
this did not involve refactoring the code, although it did involve
contortions in the build process.
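The argv[0] dispatch itself is the easy part; the pain was in the build. Here is a minimal Python sketch of the dispatch side, with two hypothetical stand-in commands (convert and scale are made-up names, not actual PBM programs).

```python
import os
import sys


# Two hypothetical stand-in commands; originally each would be a separate program.
def convert(args):
    return "convert: " + " ".join(args)


def scale(args):
    return "scale: " + " ".join(args)


COMMANDS = {"convert": convert, "scale": scale}


def lookup(argv0):
    """Find the command for the name this executable was invoked under."""
    return COMMANDS.get(os.path.basename(argv0))


def main(argv):
    """Dispatch on argv[0]; returns a process exit status."""
    command = lookup(argv[0])
    if command is None:
        names = ", ".join(sorted(COMMANDS))
        sys.stderr.write(f"{os.path.basename(argv[0])}: unknown command name (link me as one of: {names})\n")
        return 1
    print(command(argv[1:]))
    return 0
```

To install this you would put down one copy, hard-link or symlink it under each command name, and have the entry point call sys.exit(main(sys.argv)); every command name is then the same file on disk.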
(MH itself simply shrugged and used more disk space, perhaps partly
because it was already using argv[0] for its own purposes.)
Disclaimer: I may be misremembering which package worked this way. I know that at least one well regarded Unix package did.
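As for MH's own use of argv[0]: MH commands look up an entry named after the program in your MH profile (~/.mh_profile) and use it to supply default switches, which is why linking a command under a new name can give you a different set of defaults. Here is a rough Python sketch of that lookup. It assumes a simplified profile with one 'Component: value' line per entry and no continuation lines, and it puts the default switches before the command-line ones so that the latter can override them; parse_profile and effective_args are hypothetical names.

```python
import os
import shlex


def parse_profile(text):
    """Parse simplified 'Component: value' profile lines into a dict."""
    profile = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Skip blank lines, lines without a colon, and (in this sketch) continuation lines.
        if sep and name.strip() and not line[0].isspace():
            profile[name.strip()] = value.strip()
    return profile


def effective_args(argv, profile):
    """Prepend the profile's default switches for the name we were run as."""
    name = os.path.basename(argv[0])
    defaults = shlex.split(profile.get(name, ""))
    return defaults + list(argv[1:])
```

With a profile entry 'lscan: -form scan.long -reverse' and a link from scan to lscan, running lscan gets those switches while plain scan keeps its own.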
2011-08-14
The tragedy of MH
At its fundamental core, MH is one of the most Unix-y mail programs going, because its two core ideas are that you should put each message in a separate file and that each mail command should be a separate Unix command. Ignoring practical considerations for a moment, these ideas are utterly the right way to do a mailer on Unix because both of them let you leverage other Unix tools and shell scripting. Also, since MH simply uses Unix directories to represent mail folders it has a natural way of allowing and representing nested folders.
Unfortunately, the actual implementation of these ideas is rather terribly non-Unixy. Some of this is (or was) forced by implementation issues, but much of it is by choice. For example, MH implements a completely baroque string processing language that's used to format information from messages in various situations (for example, scanning a mail folder or replying to a message). The net effect is that actually using MH commands does not feel very Unix-y, apart from the fact that you're doing so in your shell; the arguments are odd, the configuration process is odd, and so on.
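To give a flavour of what the format language does, here is a deliberately tiny Python sketch that expands only two kinds of escapes loosely modelled on mh-format: %{name} for a header's value and %(msg) for the message number, each with an optional field width. The real language also has conditionals, many more functions, and its own rules for field widths; toy_format and its width handling are my own assumptions, not MH's.

```python
import re

# %N{name} or %N(function), where the width N is optional.
ESCAPE = re.compile(r"%(\d+)?(?:\{([\w-]+)\}|\((\w+)\))")


def toy_format(fmt, msgnum, headers):
    """Expand %{header} and %(msg) escapes in fmt for one message."""
    lowered = {k.lower(): v for k, v in headers.items()}

    def expand(m):
        width, component, function = m.groups()
        if component is not None:
            # Header names are case-insensitive; a missing header expands to ''.
            text = lowered.get(component.lower(), "")
            if width:
                text = text[:int(width)].ljust(int(width))  # pad or truncate
        elif function == "msg":
            text = str(msgnum)
            if width:
                text = text.rjust(int(width))  # right-align numbers
        else:
            raise ValueError(f"toy formatter doesn't know function {function!r}")
        return text

    return ESCAPE.sub(expand, fmt)
```

Even this much is a small language with its own syntax, which is the point: scan output goes through an interpreter rather than through something like awk or printf.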
(The source code is also rather non-Unixy, including things like its own weird config and build system.)
The net result feels like its creators had a couple of brilliant moments of Unix insight and then promptly forgot it all in favour of a typical plodding big complex system implementation, complete with the kitchen sink. And I've always thought that that is the tragedy of MH; that such a Unix-y core idea has wound up wrapped in a very non-Unixy everything else.
(A disclaimer: I like MH and have for a long time; it's one of those rare Unix programs where I had a visceral 'this is the right answer' reaction from the moment that I first saw it. I've been using MH for more than 20 years now. But I am not blind to its flaws.)