How and why the new iptables -w option is such a terrible fumble

I wrote recently about the relatively new -w option for iptables
and how it will make things blow up. Unfortunately
for Linux sysadmins everywhere, exactly how the iptables people
introduced this option is a case study in how not to make changes
like this; it is essentially backwards from what you want to do.
They could probably have made the situation worse than it is now,
but it would take some ingenuity.

Perhaps it is not obvious why iptables -w is so terrible (I mean,
clearly it wasn't obvious to the iptables developers). To start
seeing where they went so wrong, let's ask a simple question: how
do you write a script (or a program) that will run on both a system
without this change and a system with it?

You can't just use -w on all your iptables commands, because the old
version of iptables doesn't support the option; if you add it blindly,
every command will fail. You can't not use -w on systems that support
it, because omitting -w will make random iptables commands that you're
running fail under some circumstances (as we've seen); in practice -w
is a mandatory iptables option on systems that support it unless you
have a relatively unusual system.

So the answer is 'you must probe for whether or not -w is supported
on this version of iptables'. Which cuts to the root of the problem:

Introducing -w this way created a flag day for all uses of iptables.

Before the flag day, you could not use -w. After the flag day, you
must use -w. Or at least, you must use -w if you want your iptables
commands to be reliable all the time under all circumstances,
including odd ones.
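
To make the probing concrete, here is a minimal sketch of what a
script ends up doing. It assumes that `iptables --help` lists a
`--wait` option on versions that understand -w and does not on older
ones; that is an assumption about the help output, not something
guaranteed here. The function names are made up for illustration.

```python
import subprocess


def help_mentions_wait(help_text):
    """Return True if the help text appears to list a --wait option.

    Assumption: versions of iptables that understand -w show it as
    '--wait' somewhere in their --help output.
    """
    return any("--wait" in line.split() for line in help_text.splitlines())


def probe_wait_support(iptables="iptables"):
    """Run '<iptables> --help' once and guess whether -w is available."""
    try:
        result = subprocess.run(
            [iptables, "--help"], capture_output=True, text=True, check=False
        )
    except OSError:
        # No such binary (or it cannot be run): treat as 'no -w'.
        return False
    return help_mentions_wait(result.stdout + result.stderr)


def iptables_argv(rule_args, wait_supported, iptables="iptables"):
    """Build the argv for one call, adding -w only where it is understood."""
    prefix = [iptables, "-w"] if wait_supported else [iptables]
    return prefix + list(rule_args)
```

A script would call probe_wait_support() once at startup and then
build every command through iptables_argv(), which is exactly the
kind of boilerplate a flag day pushes onto everyone.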

That's the next failing: the flag day introduction of -w created a
situation where most or all uses of plain iptables on modern systems
are subtly buggy and dangerous. They aren't obviously broken so that
they fail all or most of the time; instead they now have a race
condition. Race conditions are hard to run into (or find deliberately)
and hard to diagnose, making them one of the most pernicious classes
of bugs. We can see that this is the case because there are still
buggy uses of iptables on Fedora.

The final failing is that the iptables developers made this use a
single global lock. This maximizes the chance that iptables
commands will collide with each other, even if they happen to be
doing two completely unrelated things that would not interfere with
each other in the least. Are you setting up IPv6 blocks in parallel
with querying IPv4 ones? Tough luck, iptables will save you from
yourself by making things fail.

All of this is a completely unforced set of errors on the part of
the iptables developers. Faced with the underlying bug that two
simultaneous iptables commands could interfere with each other in
some situations, they could have solved the issue by serializing all
iptables commands by default (ie, the equivalent of '-w'). This would
have solved the problem without breaking all current uses of plain
iptables. People who wanted their commands to fail instead of wait
could have had a new 'fail immediately' option.
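
As an illustration of that 'wait by default, fail only on request'
design, here is a small sketch using an advisory file lock. It is not
how iptables implements its locking; the lock path is supplied by the
caller and the function names are hypothetical.

```python
import fcntl
import os


def acquire_lock(path, fail_immediately=False):
    """Take an exclusive lock on path and return the open descriptor.

    By default this blocks until the lock is free, so callers are
    serialized without having to ask for it. Only callers that pass
    fail_immediately=True get an error (BlockingIOError) when someone
    else holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    flags = fcntl.LOCK_EX
    if fail_immediately:
        flags |= fcntl.LOCK_NB
    try:
        fcntl.flock(fd, flags)
    except OSError:
        os.close(fd)
        raise
    return fd


def release_lock(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

With this shape, existing callers that know nothing about the lock
keep working and simply wait, while the new behavior is opt-in.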

(I've written before about the related issue of how to deprecate
things. Arguably this actually is the same issue, since in practice
the iptables developers have deprecated use of iptables without -w.)

Sidebar: A bonus additional issue (fortunately rare)

If you happen to be running multiple iptables commands in parallel
with -w and one stream of them is sufficiently unlucky that it
waits for long enough, it will print to standard error a message
like this:

Another app is currently holding the xtables lock; waiting for it to exit...

(The iptables developers have varied this message repeatedly as they've fiddled with various micro-issues around the implementation of locking, so different versions of different distributions will have somewhat different messages.)

This is not quite the total failure that printing new warning messages by default is, since you have to give a new command line option to produce this behavior. Still, it's not very helpful and of course it's not documented and it's generally hard to hit this, so you can easily write programs that don't expect this and will blow up in various ways if it ever happens.
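
One defensive pattern is to decide success from the exit status
rather than from whether standard error is empty. A minimal sketch,
assuming the command still exits with status 0 when it eventually
succeeds after waiting (the run_checked name is hypothetical):

```python
import subprocess
import sys


def run_checked(argv):
    """Run a command and return (succeeded, stdout).

    Success is decided by the exit status alone. Anything printed on
    standard error, such as a lock-wait notice, is passed through for
    logs instead of being treated as a failure.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    if result.stderr:
        sys.stderr.write(result.stderr)
    return result.returncode == 0, result.stdout
```

Since the exact wording of the message varies between versions,
matching on its text is likely to be more fragile than ignoring it.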