2016-10-10
How and why the new iptables -w option is such a terrible fumble
I wrote recently about the relatively new -w option for iptables and how it will make things blow up. Unfortunately for Linux sysadmins everywhere, exactly how the iptables people introduced this option is a case study in how not to make changes like this; it is essentially backwards from what you want to do. They could probably have made the situation worse than it is now, but it would take some ingenuity.
Perhaps it is not obvious why iptables -w is so terrible (I mean, clearly it wasn't obvious to the iptables developers). To start seeing where they went so wrong, let's ask a simple question: how do you write a script (or a program) that will run on both a system without this change and a system with it?
You can't just use -w on all your iptables commands, because the old version of iptables doesn't support the option; if you add it blindly, every command will fail. You can't not use -w on systems that support it, because omitting -w will make random iptables commands that you're running fail under some circumstances (as we've seen); in practice -w is a mandatory iptables option on systems that support it unless you have a relatively unusual system.
So the answer is 'you must probe for whether or not -w is supported on this version of iptables'. Which cuts to the root of the problem:
Introducing -w this way created a flag day for all uses of iptables.
Before the flag day, you could not use -w. After the flag day, you must use -w. Or at least, you must use -w if you want your iptables commands to be reliable all the time under all circumstances, including odd ones.
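The probing that this forces on every script might look something like the following Python sketch. It rests on an assumption that is not guaranteed anywhere: that a version of iptables which supports -w mentions the long form, --wait, in its --help output. The function names are made up for illustration.

```python
import subprocess


def supports_wait(help_text):
    """Guess -w support from the text of 'iptables --help'.

    Assumption: versions that have -w list its long form, --wait,
    somewhere in their help output.
    """
    return "--wait" in help_text


def iptables_command(args, wait_supported):
    """Build an iptables argv, adding -w only when it is supported."""
    prefix = ["iptables", "-w"] if wait_supported else ["iptables"]
    return prefix + list(args)


def probe_iptables():
    """Run 'iptables --help' once and report whether -w looks supported."""
    try:
        result = subprocess.run(
            ["iptables", "--help"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        # No iptables binary that we can run at all.
        return False
    return supports_wait(result.stdout)
```

A script would call probe_iptables() once at startup and then build every command through iptables_command(), which is exactly the sort of extra machinery that a compatible change would not have required.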
That's the next failing: the flag day introduction of -w created a situation where most or all uses of plain iptables on modern systems are subtly buggy and dangerous. They aren't obviously broken so that they fail all or most of the time; instead they now have a race condition. Race conditions are hard to run into (or find deliberately) and hard to diagnose, making them one of the most pernicious classes of bugs. We can see that this is the case because there are still buggy uses of iptables on Fedora.
The final failing is that the iptables developers made this use a single global lock. This maximizes the chance that iptables commands will collide with each other, even if they happen to be doing two completely unrelated things that would not interfere with each other in the least. Are you setting up IPv6 blocks in parallel with querying IPv4 ones? Tough luck, iptables will save you from yourself by making things fail.
All of this is a completely unforced set of errors on the part of the iptables developers. Faced with the underlying bug that two simultaneous iptables commands could interfere with each other in some situations, they could have solved the issue by serializing all iptables commands by default (ie, the equivalent of '-w'). This would have solved the problem without breaking all current uses of plain iptables. People who wanted their commands to fail instead of wait could have had a new 'fail immediately' option.
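To make the 'wait by default, fail only on request' design concrete, here is a minimal Python sketch of a lock with those semantics. It is purely illustrative and is not a description of how iptables implements its lock; the tool_lock name and the fail_immediately parameter are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def tool_lock(path, fail_immediately=False):
    """Hold an exclusive lock on 'path' for the duration of the block.

    By default this waits for the current holder to finish, so old
    callers keep working. With fail_immediately=True it raises
    BlockingIOError instead of waiting.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        flags = fcntl.LOCK_EX
        if fail_immediately:
            flags |= fcntl.LOCK_NB
        fcntl.flock(fd, flags)
        yield
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

With this arrangement, code that knows nothing about the lock gets the safe behavior, and only callers who explicitly ask for it have to handle a lock failure.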
(I've written before about the related issue of how to deprecate things. Arguably this actually is the same issue, since in practice the iptables developers have deprecated use of iptables without -w.)
Sidebar: A bonus additional issue (fortunately rare)
If you happen to be running multiple iptables commands in parallel with -w and one stream of them is sufficiently unlucky that it waits for long enough, it will print to standard error a message like this:
Another app is currently holding the xtables lock; waiting for it to exit...
(The iptables developers have varied this message repeatedly as they've fiddled with various micro-issues around the implementation of locking, so different versions of different distributions will have somewhat different messages.)
This is not quite the total failure that printing new warning messages by default is, since you have to give a new command line option to produce this behavior. Still, it's not very helpful and of course it's not documented and it's generally hard to hit this, so you can easily write programs that don't expect this and will blow up in various ways if it ever happens.
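A program that treats any standard error output as a failure could defend itself along these lines. This is only a sketch, and it assumes that every variant of the message contains the phrase 'xtables lock', which is a guess based on the one example above; the function name is made up.

```python
def split_lock_notices(stderr_text):
    """Separate xtables lock-wait notices from the rest of stderr.

    Assumption: every variant of the notice contains 'xtables lock'.
    Returns a (notices, other_lines) pair of lists.
    """
    notices, other = [], []
    for line in stderr_text.splitlines():
        if "xtables lock" in line:
            notices.append(line)
        else:
            other.append(line)
    return notices, other
```

The caller would then treat only the second list as real error output.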
The modern web is an unpredictable and strange place to develop for
Our local support site used to be not all that attractive and also not entirely well organized. Ultimately these descended from the same root cause; that iteration of the site started out life as a wiki (with a default wiki skin), then was converted to plain HTML via brute force when the wiki blew up in our faces. Recently we replaced it with a much nicer version that has a much more streamlined modern design.
As part of that more modern design, it has a menubar along the top with drop-down (sub-)menus for many of the primary options. These drop-downs are done via CSS, specifically with a :hover based style. When I tried the new site out on my desktop it all looked great, but as I was playing around with it a dim light went off in the back of my mind; I had a memory that hover events aren't supported on touch-based systems, for obvious reasons. So I went off to my work iPad Mini (then running iOS 9), fired up Safari, and lo and behold those nice drop-down menus were completely non-functional. You couldn't see them at all; if you touched the primary menu, Safari followed the link. We hacked around with a few options but decided to take the simple approach of ensuring that all of the sub-menu links were accessible off the target page of the primary menu.
So far this was exactly what we'd expected. Then one of my co-workers reported that this didn't happen on her iPhone, and it emerged that she used the iOS version of Chrome instead of Safari. I promptly installed that on my iPad Mini, and indeed I saw the same Chrome behavior she did; touching or tapping the primary menu didn't follow the link, it caused the dropdown to appear. Well, okay, that wasn't too strange and it sort of made sense that different browsers might do things slightly differently here (perhaps even deliberately).
(Note that this is slightly weird on iOS because on iOS all browsers use the same underlying engine. So Safari and Chrome are using the same core engine here but are making it behave somewhat differently. The Brave browser has Chrome's behavior.)
Now things get weird. I recently got a latest-generation iPhone; naturally I wound up browsing our (new) support site in it, on Safari, and I tapped one of those primary menus. Imagine my surprise when I got a drop-down submenu instead of having Safari follow the primary menu link. I went back to the iPad Mini, made sure it was updated to the same version of iOS, and tried again. And the behavior wasn't the same as on the iPhone. On the iPad Mini, touching the primary menu followed the link. On the iPhone, touching the primary menu dropped down the sub-menu.
(On the iPhone, I can double-tap the primary menu to follow the link.)
What I took away from this experience is that developing for the modern web is stranger and more unpredictable than I can imagine. I would have never guessed that two iOS devices, running the same iOS version and using the same system browser, would have two different behaviors.
(One implication is that testing things on an iPad Mini is not a complete stand-in for testing things on an iPhone and vice versa. This is unfortunate; if nothing else, it makes testing more time-consuming.)