What is and isn't a bug in software

March 24, 2015

In response to my entry on how systemd is not fully SysV init compatible because it pays attention to LSB dependency comments when SysV init does not, Ben Cotton wrote in a comment:

I'd argue that "But I was depending on that bug!" is generally a poor justification for not fixing a bug.

I strongly disagree with this view at two levels.

The first level is simple: this is not a bug in the first place. Specifically, it's not an omission or a bug that System V init doesn't pay attention to LSB comments; it's how SysV init behaves and has behaved from the start. SysV init runs things in the order they are in the rcN.d directory and that is it. In a SysV init world you are perfectly entitled to put whatever you want to into your script comments, make symlinks by hand, and expect SysV init to run them in the order of your symlinks. Anything that does not do this is not fully SysV init compatible. As a direct consequence of this, people who put incorrect information into the comments of their init scripts were not 'relying on a bug' (and their init scripts did not have a bug; at most they had a mistake in the form of an inaccurate comment).

(People put lots of mistakes and inaccuracies into comments, because the comments do not matter in SysV init; very little does.)
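As a rough illustration of how little SysV init looks at, here is a minimal Python sketch of the ordering rule described above. It assumes the conventional rc behavior (K* links are run with 'stop' first, then S* links with 'start', each group in lexical order of the link name). The function name and the example link names are made up for illustration, and nothing in it ever opens a script to read its comments.

```python
def sysv_run_order(rc_entries):
    """Return (link_name, action) pairs in the order a SysV-style rc
    script would run them, based purely on the symlink names.

    Assumption: K* links are stopped first, then S* links are started,
    each group sorted lexically. Anything else in the directory is ignored.
    """
    kills = sorted(name for name in rc_entries if name.startswith("K"))
    starts = sorted(name for name in rc_entries if name.startswith("S"))
    return [(name, "stop") for name in kills] + [(name, "start") for name in starts]


# Hypothetical contents of an rcN.d directory, in no particular order.
entries = ["S20network", "S10syslog", "K05oldservice", "S99local", "README"]
for link, action in sysv_run_order(entries):
    print(link, action)
```

The only inputs are the link names, so whatever the comments inside the scripts claim about dependencies has no way to influence the result.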

The second level is both more philosophical and more pragmatic and is about backwards compatibility. In practice, what is and is not a bug is defined by what your program accepts. The more that people do something and your program accepts it, the more that thing is not a bug. It is instead 'how your program works'. This is the imperative of actually using a program, because to use a program people must conform to what the program does and does not do. It does not matter whether or not you ever intended your program to behave that way; that it behaves the way it does creates a hard reality on the ground. That you left it alone over time increases the strength of that reality.

If you go back later and say 'well, this is a bug so I'm fixing it', you must face up to a fundamental fact: you are changing the behavior of your program in a way that will hurt people. It does not matter to people why you are doing this; you can say that you are doing it because the old behavior was a mistake, because the old behavior was a bug, because the new behavior is better, because the new behavior is needed for future improvements, or whatever. People do not care. You have broken backwards compatibility and you are making people do work, possibly pointless work (for them).

To say 'well, the old behavior was a bug and you should not have counted on it and it serves you right' is robot logic, not human logic.

This robot logic is of course extremely attractive to programmers, because we like fixing what are to us bugs. But regardless of how we feel about them, these are not necessarily bugs to the people who use our programs; they are instead how the program works today. When we change that, well, we change how our programs work. We should own up to that and we should make sure that the gain from that change is worth the pain it will cause people, not hide behind the excuse of 'well, we're fixing a bug here'.

(This shows up all over. See, for example, the increasingly aggressive optimizations of C compilers that periodically break code, sometimes in very dangerous ways, and how users of those compilers react to this. 'The standard allows us to do this, your code is a bug' is an excuse loved by compiler writers and basically no one else.)


Comments on this page:

I should have included the second part of my thought. Failure to adhere to a standard while on the surface making use of it is a bug. It's not a SysV init bug, but a bug in the particular init script. Why write the information at all if it's not going to be used, and especially if it could cause unexpected behavior? Any init script that includes dependency information in that format (as opposed to a textual listing for human reference) is effectively lying.

I agree with you in the general sense that at a certain critical mass (some product of time and usage) bugs become expected behavior. The hard part about that is knowing where the line should be drawn. "Robot logic" is certainly easier in this case because it's less subjective.

In any event, I can't blame systemd for this. The script provides information, and you can't blame systemd for it being incomplete. The change in behavior is obviously undesirable. Is it better to ignore the additional information for SysV init scripts even if that means scripts that have complete information can't take advantage of it?

By Frank Ch. Eigler at 2015-03-24 09:24:19:

"the increasingly aggressive optimizations of C compilers that periodically break code"

At least the compiler writers provide a knob for these standards-incompliant programs to keep working, namely -O0.

Why write the information at all if it's not going to be used, and especially if it could cause unexpected behavior?

Why not? It doesn’t matter what it says (under the old interface contract). It’s not lying. It’s just whistling to itself. It can whistle any tune it likes because nothing it says or doesn’t say makes any difference.

If you change the interface contract so that now it does make a difference, and then everything breaks, it’s you that caused the breakage.

Any and all metadata is always substantially incomplete and significantly incorrect unless it changes the behaviour of the system in some way. This is simply reality: meaningless metadata is mostly worthless, because in absence of any impact, there is no corrective pressure on it. There are countless proverbs that boil down to this basic principle (e.g. “code comments rot”).

So just because a format has room for recording metadata about X does not mean you can assume that the metadata will be correct. At the very best that is lazy. You are looking to move the system from a state of meaningless metadata about X to one of meaningful metadata about X. That means you have to provide a way to make that transition, and it is you who has to provide that.

That means you have to provide a way to make that transition, and it is you who has to provide that.

I take it from your comments over the past two blog posts that you'd prefer that systemd had no support for SysV-style init scripts instead? That's a reasonable position to take, though I suspect it would cause just as much pain as the current implementation, if not more. The other option, which maybe is what you're advocating, is that systemd ignore the dependency information in the same way SysV did.

What it comes down to is that systemd is not SysV. I can't speak for the developers because I'm not them and I haven't even followed the mailing lists. I suspect they included support for init scripts in order to speed adoption and make life a little easier on the end user. But it's not fair to expect systemd to know if the metadata is valid or not. That's on the person who wrote the metadata.

As Chris said in a follow-up post, it's not realistic to expect people to have the LSB memorized and always comply perfectly. Copy/paste is a thing that happens frequently for better and for worse. But if the person who wrote the metadata isn't responsible for it, who is?

I take it from your comments over the past two blog posts that you'd prefer that systemd had no support for SysV-style init scripts instead? […] The other option, which maybe is what you're advocating, is that systemd ignore the dependency information in the same way SysV did.

I thought it was reasonably obvious what position I take, but maybe not, since you missed both tries. So to be explicit, the position I argue is that systemd may well choose to be capable of executing SysV init scripts in parallel, but ought not do so by default. It might elect to emit a message that it’s operating in non-concurrent mode and that this might change at some point in the future.

It should also try its best to offer diagnostics of some kind, e.g. a message that script X would have been started in parallel with script Y if concurrency were enabled, and/or which dependencies prevented it from being started earlier. Some sort of dry run mode for the SysV init launcher should also be available. That way, users get a chance to audit and fix the metadata before it breaks everything.
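The kind of diagnostic suggested here could be sketched roughly as follows. This is not something systemd is known to provide; the function names, and the idea of a dict mapping each script to the scripts it declares as required, are assumptions made purely for illustration (real LSB headers are more involved). The sketch lists pairs of scripts that SysV init would run strictly one after the other but whose declared dependencies do not force that order, which are the pairs that could behave differently once the metadata starts to matter.

```python
def declared_prerequisites(name, required_start):
    """All scripts that `name` depends on, directly or transitively,
    according to the (possibly wrong) declared metadata."""
    seen = set()
    stack = list(required_start.get(name, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:  # also guards against dependency cycles
            seen.add(dep)
            stack.extend(required_start.get(dep, ()))
    return seen


def unordered_pairs(symlink_order, required_start):
    """Pairs (earlier, later) that run in sequence under SysV init but
    that the declared metadata does not order relative to each other."""
    pairs = []
    for i, earlier in enumerate(symlink_order):
        for later in symlink_order[i + 1:]:
            if earlier not in declared_prerequisites(later, required_start):
                pairs.append((earlier, later))
    return pairs


# Hypothetical example: nfs really needs network, but its header only
# mentions syslog, so nothing in the metadata keeps network ahead of nfs.
order = ["syslog", "network", "nfs"]
declared = {"network": {"syslog"}, "nfs": {"syslog"}}
for earlier, later in unordered_pairs(order, declared):
    print(f"{later} is not declared to depend on {earlier}")
```

A dry run along these lines changes nothing about how the scripts are started; it only gives people a list to audit before concurrency is switched on.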

It’s a transition. You give users the tools to manage the transition. I’m not against the transition (nor for it, for that matter). You just don’t simply declare the world changed by fiat from one moment to the next and let the users pick up the pieces.

The tragicomedy of the situation is how modest the required effort would be.

But it's not fair to expect systemd to know if the metadata is valid or not. That's on the person who wrote the metadata.

It’s completely fair to expect systemd to presume that all existing metadata is utter garbage. Which at least some of it is guaranteed to be, because nothing needed it to be correct before. And for that reason there is also nothing that’s on the person who wrote the metadata: it simply didn’t matter.

As Chris said in a follow-up post, it's not realistic to expect people to have the LSB memorized and always comply perfectly. Copy/paste is a thing that happens frequently for better and for worse. But if the person who wrote the metadata isn't responsible for it, who is?

The person who edits or adds the metadata after the metadata came to matter, of course. Who else?

(On further reflection, it may or may not work for concurrent mode to be a global switch. It may have to be a per-script setting, or something in-between. The scenario I am thinking of is vendor vs site init scripts, but auditing sets of init scripts piecemeal may be a requirement also. I can’t be bothered with that level of detail here though.)

By Nadya at 2015-03-26 14:46:18:

This can be a dangerous way to reason. Many bug-fixes will change someone's workflow or make them do work. Where do you draw the line?

Good illustration of silliness (albeit taken to an absurd level): https://xkcd.com/1172/

Fear of relying on your judgement and sloppy slippery slope thinking do not an argument make. You draw the line where the line is. That’s what thinking in terms of interface contracts helps you do (along with other ways of reasoning about it).

If you think I have been saying that nobody’s workflow should be broken by anything ever or that nobody should ever be made to do any work, you have not been paying attention; in that case please re-read what I wrote with more care. If you have any specific questions I will be glad to answer them.

By nobody at 2015-04-01 15:21:43:

Sysvinit compatibility IMPLIES behaving like sysvinit IMPLIES discarding what sysvinit discards. Any divergence is a bug in systemd's advertised compatibility, if there has been one.

And this is the end of the discussion, until we switch to an "enough hipsters at enough CLI prompts" software development methodology, that is.

Written on 24 March 2015.

Last modified: Tue Mar 24 01:46:45 2015