Wandering Thoughts archives


How and why the new iptables -w option is such a terrible fumble

I wrote recently about the relatively new -w option for iptables and how it will make things blow up. Unfortunately for Linux sysadmins everywhere, exactly how the iptables people introduced this option is a case study in how not to make changes like this; it is essentially backwards from what you want to do. They could probably have made the situation worse than it is now, but it would have taken some ingenuity.

Perhaps it is not obvious why iptables -w is so terrible (I mean, clearly it wasn't obvious to the iptables developers). To start seeing where they went so wrong, let's ask a simple question: how do you write a script (or a program) that will run on both a system without this change and a system with it?

You can't just use -w on all your iptables commands, because the old version of iptables doesn't support the option; if you add it blindly, every command will fail. You can't not use -w on systems that support it, because omitting -w will make random iptables commands that you're running fail under some circumstances (as we've seen); in practice -w is a mandatory iptables option on systems that support it unless you have a relatively unusual system.

So the answer is 'you must probe for whether or not -w is supported on this version of iptables'. Which cuts to the root of the problem:

Introducing -w this way created a flag day for all uses of iptables.

Before the flag day, you could not use -w. After the flag day, you must use -w. Or at least, you must use -w if you want your iptables commands to be reliable all the time under all circumstances, including odd ones.
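To make the probing concrete, here is a minimal Python sketch of what a script that has to run on both sides of the flag day ends up doing. It assumes that a version of iptables with -w mentions '--wait' in its --help output, which is a heuristic rather than a guarantee; the function names are made up for illustration.

```python
import subprocess


def supports_wait(help_text):
    """Guess whether this iptables understands -w from its help text.

    Assumption: versions that have the option list '--wait' in --help.
    """
    return "--wait" in help_text


def read_iptables_help(binary="iptables"):
    """Return the combined stdout and stderr of '<binary> --help'."""
    result = subprocess.run(
        [binary, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def iptables_command(args, wait_supported, binary="iptables"):
    """Build an argv list, adding -w only where it is supported."""
    prefix = [binary, "-w"] if wait_supported else [binary]
    return prefix + list(args)
```

A script would call supports_wait(read_iptables_help()) once at startup and then pass the result to iptables_command() for every command it runs. That every such script needs this dance at all is the cost of the flag day.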

That's the next failing: the flag day introduction of -w created a situation where most or all uses of plain iptables on modern systems are subtly buggy and dangerous. They aren't obviously broken so that they fail all or most of the time; instead they now have a race condition. Race conditions are hard to run into (or find deliberately) and hard to diagnose, making them one of the most pernicious classes of bugs. We can see that this is the case because there are still buggy uses of iptables on Fedora.

The final failing is that the iptables developers made this use a single global lock. This maximizes the chance that iptables commands will collide with each other, even if they happen to be doing two completely unrelated things that would not interfere with each other in the least. Are you setting up IPv6 blocks in parallel with querying IPv4 ones? Tough luck, iptables will save you from yourself by making things fail.

All of this is a completely unforced set of errors on the part of the iptables developers. Faced with the underlying bug that two simultaneous iptables commands could interfere with each other in some situations, they could have solved the issue by serializing all iptables commands by default (ie, the equivalent of '-w'). This would have solved the problem without breaking all current uses of plain iptables. People who wanted their commands to fail instead of wait could have had a new 'fail immediately' option.
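As an illustration of the 'wait by default, fail only on request' design, here is a small Python sketch built on flock(). It is not how iptables implements its locking; the lock file path is whatever the caller passes in, and acquire_lock and its fail_immediately parameter are hypothetical names.

```python
import fcntl


def acquire_lock(path, fail_immediately=False):
    """Take an exclusive lock on path and return the open lock file.

    By default this blocks until the lock is free, which serializes
    callers without any of them having to ask for it. With
    fail_immediately=True it raises BlockingIOError instead of waiting
    if something else holds the lock.
    """
    lock_file = open(path, "w")
    flags = fcntl.LOCK_EX
    if fail_immediately:
        flags |= fcntl.LOCK_NB
    try:
        fcntl.flock(lock_file, flags)
    except OSError:
        # Don't leak the file descriptor if we could not get the lock.
        lock_file.close()
        raise
    return lock_file
```

Closing the returned file releases the lock. With this shape, existing callers that pass no extra arguments get the safe behavior, and only the people who specifically want a failure have to change anything.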

(I've written before about the related issue of how to deprecate things. Arguably this actually is the same issue, since in practice the iptables developers have deprecated use of iptables without -w.)

Sidebar: A bonus additional issue (fortunately rare)

If you happen to be running multiple iptables commands in parallel with -w and one stream of them is sufficiently unlucky that it waits for long enough, it will print to standard error a message like this:

Another app is currently holding the xtables lock; waiting for it to exit...

(The iptables developers have varied this message repeatedly as they've fiddled with various micro-issues around the implementation of locking, so different versions of different distributions will have somewhat different messages.)

This is not quite the total failure that printing new warning messages by default is, since you have to give a new command line option to produce this behavior. Still, it's not very helpful and of course it's not documented and it's generally hard to hit this, so you can easily write programs that don't expect this and will blow up in various ways if it ever happens.
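A program that treats any standard error output as a failure could defend itself with something like the following Python sketch. It assumes the waiting notice always contains the phrase 'xtables lock', which holds for the message above but is only a guess for other versions given how much the wording has varied; split_lock_notices is a made-up helper name.

```python
def split_lock_notices(stderr_text, marker="xtables lock"):
    """Separate lock-wait notices from the rest of iptables' stderr.

    Returns (notices, other): lines containing marker, and all
    remaining lines, each in their original order.
    """
    notices = []
    other = []
    for line in stderr_text.splitlines():
        if marker in line:
            notices.append(line)
        else:
            other.append(line)
    return notices, other
```

The caller would then decide success or failure from the exit status and the 'other' lines only, and perhaps log the notices.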

linux/IptablesWOptionFumbles written at 23:03:07

The modern web is an unpredictable and strange place to develop for

Our local support site used to be not all that attractive and also not entirely well organized. Ultimately these descended from the same root cause; that iteration of the site started out life as a wiki (with a default wiki skin), then was converted to plain HTML via brute force when the wiki blew up in our faces. Recently we replaced it with a much nicer version that has a much more streamlined modern design.

As part of that more modern design, it has a menubar along the top with drop-down (sub-)menus for many of the primary options. These drop-downs are done via CSS, specifically with a :hover based style. When I tried the new site out on my desktop it all looked great, but as I was playing around with it a dim light went off in the back of my mind; I had a memory that hover events aren't supported on touch-based systems, for obvious reasons. So I went off to my work iPad Mini (then running iOS 9), fired up Safari, and lo and behold those nice drop-down menus were completely non-functional. You couldn't see them at all; if you touched the primary menu, Safari followed the link. We hacked around with a few options but decided to take the simple approach of ensuring that all of the sub-menu links were accessible from the target page of the primary menu.

So far this was exactly what we'd expected. Then one of my co-workers reported that this didn't happen on her iPhone, and it emerged that she used the iOS version of Chrome instead of Safari. I promptly installed that on my iPad Mini, and indeed I saw the same Chrome behavior she did; touching or tapping the primary menu didn't follow the link, it caused the dropdown to appear. Well, okay, that wasn't too strange and it sort of made sense that different browsers might do things slightly differently here (perhaps even deliberately).

(Note that this is slightly weird on iOS because on iOS all browsers use the same underlying engine. So Safari and Chrome are using the same core engine here but are making it behave somewhat differently. The Brave browser has Chrome's behavior.)

Now things get weird. I recently got a latest-generation iPhone; naturally I wound up browsing our (new) support site in it, on Safari, and I tapped one of those primary menus. Imagine my surprise when I got a drop-down submenu instead of having Safari follow the primary menu link. I went back to the iPad Mini, made sure it was updated to the same version of iOS, and tried again. And the behavior wasn't the same as on the iPhone. On the iPad Mini, touching the primary menu followed the link. On the iPhone, touching the primary menu dropped down the sub-menu.

(On the iPhone, I can double-tap the primary menu to follow the link.)

What I took away from this experience is that developing for the modern web is stranger and more unpredictable than I can imagine. I would never have guessed that two iOS devices, running the same iOS version and using the same system browser, would have two different behaviors.

(One implication is that testing things on an iPad Mini is not a complete stand-in for testing things on an iPhone, and vice versa. This is unfortunate; if nothing else, it makes testing more time-consuming.)

web/ModernWebIOSDifference written at 01:03:13
