2018-09-21
Why I mostly don't use ed(1) for non-interactive edits in scripts
One of the things that is frequently said about ed(1) is that it
remains useful for non-interactive modifications to files, for
example as part of shell scripts. I even mentioned this as a good
use of ed today in my entry on why ed is not a good (interactive)
editor today, and I stand by that. But, well, there is a problem
with using ed this way, and that problem is why I only very rarely
actually use ed for scripted modifications to files.
The fundamental problem is that non-interactive editing with ed has
no error handling. This is perfectly reasonable, because ed was
originally written for interactive editing and in interactive
editing the human behind the keyboard does the error handling, but
when you apply this model to non-interactive editing it means that
your stream of ed commands is essentially flying blind. If the
input file is in the state that you expected it to be, all will go
well. If there is something different about the input file, so that
your line numbers are off, or a '/search/' address doesn't match
what you expect (or perhaps at all), or any number of other things
go wrong, then you can get a mess, sometimes a rapidly escalating
one, and then you will get to the end of your ed commands and 'w'
the resulting mess into your target file.
As a result of this, among other issues, ed tends to be my last
resort for non-interactive edits in scripts. I would much rather
use sed or something else that is genuinely focused on stream
editing if I can, or put together some code in a language where I
can include explicit error checking so I'll handle the situation
where my input file is not actually the way I thought it was going
to be.
(If I did this very often, I would probably dust off my Perl.)
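As a rough illustration of what 'explicit error checking' could look
like, here is a minimal Python sketch. The names EditError and
replace_unique_line are hypothetical ones made up for this example,
and the rule it enforces (exactly one line must match) is just one
plausible check, not the only sensible one.

```python
import re


class EditError(Exception):
    """Raised when the input is not in the state the edit expects."""


def replace_unique_line(lines, pattern, replacement):
    """Replace the single line matching pattern, or refuse to edit.

    Unlike a blind '/search/' address, this raises EditError if the
    pattern matches no lines or more than one line, so the caller can
    abort before anything is written back.
    """
    regex = re.compile(pattern)
    hits = [i for i, line in enumerate(lines) if regex.search(line)]
    if len(hits) != 1:
        raise EditError(
            f"expected exactly 1 line matching {pattern!r}, found {len(hits)}"
        )
    result = list(lines)  # work on a copy; the input is left untouched
    result[hits[0]] = replacement
    return result
```

A script built around this would catch EditError, print the message
to standard error, and exit with a non-zero status without ever
writing the target file.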
If I were creating an ideal version of ed for non-interactive
editing, I would definitely have it include some form of conditionals
and an 'abort with a non-zero exit status if ...' command. Perhaps
you'd want to model a lot of this on what sed does here with
command blocks, b, t (and T in GNU sed), and so on, but I
can't help but think that there has to be a more readable and clear
version with things like relatively explicit if conditions.
(I have a long standing sed script that uses some clever tricks
with b and the pattern space and so on. I wrote it in sed to
deliberately explore these features and it works, but it's basically
a stunt and I would probably be better off if I rewrote the script
in a language where the actual logic was not hiding in the middle
of a Turing tarpit.)
PS: One place this comes up, or rather came up years ago and got
dealt with then, is in what diff format people use for patch.
In theory you can use ed scripts; in practice, everyone considers
those to be too prone to problems and uses other formats. These
days, about the only thing I think ed format diffs are used for is
if you want to see a very compact version of the changes. Even then
I'm not convinced by their merits against 'diff -u0', although
we still use ed format diffs in our worklogs out of long standing
habit.
Sidebar: Where you definitely need ed instead of sed
The obvious case is if you want to move text around (or copy it),
especially if you need to move text backwards (to earlier in the
file). As a stream editor, sed can change lines and it can move
text to later in the file if you work very hard at it, but it can
never move text backward. I think it's also easier to delete a
variable range of lines in ed, for example 'everything from a start
line up to but not including an end marker'.
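If you give up on stream editing and hold the whole file in memory
as a list of lines, both of these operations become short. The
following Python sketch is only an illustration: the function names
are hypothetical, the markers are assumed to match entire lines
exactly, and indexes are zero-based.

```python
def delete_until_marker(lines, start_marker, end_marker):
    """Delete from the start line up to but not including the end marker.

    list.index() raises ValueError if either marker is missing (or if
    the end marker does not appear after the start line), so a file in
    an unexpected state is reported instead of silently mangled.
    """
    start = lines.index(start_marker)
    end = lines.index(end_marker, start + 1)
    return lines[:start] + lines[end:]


def move_block_earlier(lines, start, end, dest):
    """Move lines[start:end] backward so the block begins at index dest."""
    if not (0 <= dest <= start < end <= len(lines)):
        raise ValueError("invalid block or destination")
    block = lines[start:end]
    return lines[:dest] + block + lines[dest:start] + lines[end:]
```

Both functions return a new list and leave their input alone, which
makes it easy to bail out on an error without having changed
anything.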
Ed will also do in-place editing without the need to write to a
temporary file and then shuffle the temporary file into place. I'm
neutral on whether this is a feature or not, and you can certainly
get ed to write your results to a new file if you want to.
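For comparison, the 'write to a temporary file and then shuffle it
into place' approach might look something like this in Python. This
is a sketch under assumptions: write_via_temp_file is a hypothetical
name, and the code does not try to preserve the original file's
permissions or ownership, which a real script would probably want to
do.

```python
import os
import tempfile


def write_via_temp_file(path, text):
    """Write text to a temporary file, then rename it over path.

    The temporary file is created in the same directory as the target
    so that the final rename stays on one filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".edit-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        os.replace(tmp_path, path)  # rename the new file over the old one
    except BaseException:
        # Something went wrong; do not leave the temporary file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The target file is only replaced after the new contents have been
written out in full, so a failure partway through leaves the original
alone.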
2018-09-12
A surprise discovery about procmail (and wondering about what next)
I've been using procmail for a very long time now, and over that
time I generally haven't paid much attention to the program itself.
It was there in the operating systems I used, it worked, and so
everything was fine; it was just sort of there, like cat. Thus,
I was rather surprised to stumble over the 2010 LWN article "Reports
of procmail's death are not terribly exaggerated" (found via a chain
of links from Planet Debian), which covers how procmail development
and maintenance had stopped. Things don't exactly seem to have gotten
more lively since 2010 (for example, the procmail domain seems to
have mostly vanished, and then there's the message from Philip
Guenther that's linked to from the Wikipedia page). This raises a
number of
questions.
The obvious question is whether this even matters (as LWN notes in the original article). Procmail still works fine, and just as importantly, it's still being packaged by Debian, Ubuntu, and so on. There are outstanding Debian bugs, but Debian appears to also be fixing issues in their patches (and there's a 2017 patch in there, so it's not all old stuff). While we have quite a few users that depend a lot on procmail and we'd thus have real problems if, say, Ubuntu stopped packaging it, this doesn't appear likely to happen any time soon.
(Actually, if Ubuntu dropped procmail our answer would likely be to start building the package ourselves. It's not like it changes much.)
But, well, procmail is sort of Internet software, and I've said before that Internet software decays if not actively maintained. Knowing that procmail is only sort of being looked after does make me a little bit uncomfortable. However, this raises the question of what alternatives I (and we) would have for equivalent mail filtering systems. Many people seem to use Sieve, but I believe that has to be integrated into your MTA instead of run through a program in the way that procmail operates, and I don't think it can run external programs (which is important for some people). The closest thing to procmail that I've read about is maildrop, but it's slightly more limited than procmail in several spots and I'm not sure it could fully cover the various ways people here use procmail for spam filtering and running spam filters.
Exim itself has its own filtering system. Exim filters are more powerful than Exim-based Sieve filters (they can deliver to external programs, for example) but of course they require Exim specifically and couldn't be moved to another mailer. They're still not quite as capable as procmail; specifically, Exim filters can't directly write to MH format directories (which matters to me because of how I now do a bunch of mail filtering).
We've historically declined to enable either Sieve based filtering or Exim's own filtering in our mail system on the grounds that we wanted to preserve our freedom to change mailers. In light of what I've now learned about procmail, I'm wondering if that's still the right choice. We also don't currently have maildrop installed on our central mail machine (where people already run procmail); perhaps we should change that as well, to give people the option (even if they most likely won't take it).
PS: A quick check suggests that we have around 195 people or so who
are using procmail (in that they have it set up in their .forward),
which is actually more than I expected. Not all of them are
necessarily using our mail system much any more, though.
2018-09-06
Our future IPv6 access control problems due to non-DHCP6 machines
Back almost two years ago, I wrote about how I suspected a lot of IPv6 hosts wouldn't have reverse DNS because they would be using stateless address autoconfiguration (SLAAC), where they essentially assign themselves one or more random IPv6 addresses when they show up on your network. For us, this presents a problem much larger than just DNS, because control over what hosts DHCP will give addresses to (and what addresses it will assign) is how we force machines to be registered on our laptop network and our wireless network before we give them network access.
The specific driver of IPv6 SLAAC is Android devices, which don't do DHCP6 at all; unfortunately this also includes ChromeOS, which means Chromebooks. But once you enable SLAAC on your network, any number of things may decide to grab themselves SLAAC addresses and then use them, even if they also do DHCP6 and so get whatever address you give them there (this is the iOS behavior I observed a couple of years ago; I don't know how Windows, macOS, and so on behave here). If the IPv6 address and routing they get via DHCP6 doesn't seem to work, I suspect that quite a lot of devices will be perfectly happy to route via their SLAAC address and route, and if that doesn't work, well, the Android and ChromeOS devices aren't getting on the Internet.
There are a number of approaches I can think of. One possible brute force answer is to simply not do SLAAC, only DHCP6 and (IPv4) DHCP. This would mean that SLAAC-only devices would only get IPv4 addresses, but that's not likely to be a practical problem for a long time to come. I think this is our most likely short term answer, because it's the easiest approach and we can always get more complicated later. The other brute force approach is some sort of MAC filtering on our firewalls, but we use OpenBSD and my understanding is that there are a number of issues around MAC filtering in OpenBSD PF.
The officially approved answer is probably to move to IEEE 802.1X on our networks that require this sort of access control. This is infeasible for multiple reasons, including that I believe it would require a wholesale replacement of our network switches on the affected networks. For extra bonus points we don't even run much of the infrastructure that provides our wireless network, which is one of the networks we need this access control on (this is not as crazy as it sounds, but that's another entry).
All of this is yet another reason why any migration to IPv6 will be neither fast nor easy for us, and thus why we still haven't done more than vaguely look in the direction of IPv6. Someday, maybe, when IPv6 appears to actually be important for something.
(And when we do start doing IPv6, it's highly likely to start out being only for a few servers with static IP addresses. Extending it to people's own 'client' devices is likely to be one of the last things we get around to.)
(I was reminded of all of this today by cweiske's question on my old entry.)