Wandering Thoughts archives

2005-08-30

The Version Control System dependency problem

One of the aphorisms of HCI design is that if users keep making the same error with a program, it's the program, not the users, that's actually wrong. By this standard, almost all version control systems really need an HCI makeover, because there is one classic mistake that users keep making over and over.

The mistake: make a change, then make an unrelated change, then make a third unrelated change (perhaps fixing three different and independent bugs in entirely different files). You get a version diagram that looks like:

Original -> B -> C -> D

Now try to pass just the third change to someone, using the version control system. Almost all version control systems will refuse, saying that the C to D change depends on C (which depends on B), and you can't pass D without its dependencies.

(The 'proper' way to do this is to put each change on a separate branch (or the equivalent), then merge them all together to get something you can test.)

But these are unrelated changes and the VCS is wrong, because most VCSes have adopted an extremely simplistic idea of dependency: if C comes after B in the same branch, it depends on B. In turn, this simplistic idea clashes with how normal people try to use VCSes before they get inoculated with the version control religion.
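
To make that simplistic rule concrete, here is a minimal sketch in Python of ancestry-as-dependency. The model is invented for illustration; no real VCS stores history quite this way:

    # A minimal model of ancestry-based 'dependencies': each commit knows
    # only its parent, so the only thing a VCS with this rule can compute
    # for D is the full chain of ancestors, related changes or not.
    class Commit:
        def __init__(self, name, parent=None):
            self.name = name
            self.parent = parent

    def dependencies(commit):
        """Everything a commit 'depends on' under the simplistic rule:
        all of its ancestors."""
        deps = []
        cur = commit.parent
        while cur is not None:
            deps.append(cur.name)
            cur = cur.parent
        return deps

    original = Commit("Original")
    b = Commit("B", original)
    c = Commit("C", b)
    d = Commit("D", c)

    # Passing just D drags in C and B, even though the changes are unrelated.
    print(dependencies(d))   # ['C', 'B', 'Original']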

And this is the VCS dependency problem: VCSes are too strict about 'dependencies', so much so that it gets in people's way. Clinging to mathematical purity may have a clean intellectual appeal, but it comes at the expense of practical usability.

In practice it also comes at the cost of damaging the integrity of the codebase's history. People will make these kinds of mistakes and have to fix them; the only question is whether they will be able to do so inside the VCS or whether they will resort to exporting and importing patches and the like (losing some of the development history in the process).

VCSDependencyProblem written at 03:21:13; Add Comment

2005-08-26

Two faces of RSS

I recently came to a realization that RSS (and syndication in general) has two faces:

  1. notification of new information
  2. rearranging information from feeds

This neatly explains something that's been puzzling me for a while: why I find RSS sometimes very useful and sometimes very irritating.

Notification works great, except for web sites that already update at least as frequently as I read them. (For example, Slashdot, which I only browse once or twice a day. If my RSS reader were to tell me that Slashdot had no new articles today, I would assume Slashdot's RSS feed was broken.)

Rearranging information from feeds is so-so. The clear win is having my RSS reader keep track of what I haven't read yet. While you can argue that this (and rearranging unread blog entries from oldest to newest) is just compensating for traditional blog formats being braindamaged, having it fixed is still useful.
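
As a sketch of what that tracking amounts to, here is a minimal polling loop in Python using the feedparser library (the feed URL and the state file name are invented for the example):

    # A minimal sketch of tracking unread entries: poll a feed and report
    # only the entries we haven't seen before. feedparser is a real
    # library; the feed URL and the state file here are invented.
    import feedparser

    FEED_URL = "http://example.com/atom.xml"   # hypothetical feed
    SEEN_FILE = "seen-ids.txt"                 # hypothetical state file

    def load_seen():
        try:
            with open(SEEN_FILE) as f:
                return set(line.strip() for line in f)
        except FileNotFoundError:
            return set()

    seen = load_seen()
    for entry in feedparser.parse(FEED_URL).entries:
        # Fall back to the link when a feed doesn't provide entry ids.
        eid = entry.get("id") or entry.get("link")
        if eid and eid not in seen:
            print("unread:", entry.get("title", "(untitled)"))
            seen.add(eid)
    with open(SEEN_FILE, "w") as f:
        f.writelines(i + "\n" for i in sorted(seen))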

Displaying entries differently from how my browser would is hit and miss. When a website does stupid HTML tricks, my RSS reader ignoring most of them is a nice win, although the real fix is for the website to have a better design. When the website has a good design, simplifying the entries is at best neutral and goes downhill rapidly from there.

Worse, RSS readers are at the mercy of the websites generating the feeds. If the formatting of the entries in the feed is broken in various ways (some described in AnnoyingRSSFeedTricks), there is nothing the RSS reader can do and the feed version is always going to be inferior.

Most RSS readers don't seem to do much other information rearrangement, although they could and should (as mentioned in RSSAndUsenet). Some information rearrangement is done by feed aggregators, such as Planet Python.

Slashdot is probably the worst case for me in RSS feeds:

  • it updates more often than I want to browse it.
  • I have no problem keeping track of what I've already read.
  • it looks good in my browser.
  • it looks at best so-so in my RSS reader.

The first three are the important ones. That they also apply to a number of other websites that I often wind up reading neatly explains why I am unenthused about their syndication feeds.

On the other side, the less frequently and the more irregularly a website updates, the more useful its RSS feed is to me. Just getting pinged with the information that an update exists is a huge win (since it means I will actually read the new stuff); any bad presentation in my RSS reader is a minor side issue.

TwoFacesOfRSS written at 01:05:09; Add Comment

2005-08-25

Explaining rubber duck debugging

Rubber duck debugging is the process of explaining your problem out loud to a suitable object (a rubber ducky or another person, take your pick); the alternate version is to write down your problem, for example in a grumbling blog entry or an email. This may sound very peculiar if you've never done it, but it works surprisingly well.

(Obligatory attribution dammit to the Centresource.com blog entry on it.)

As it happens, my leading explanation for why this works so well has already been written up here, so I will just quote it:

[...] Because speaking out loud forces you to marshal your thoughts, which in turn highlights any contradictions or missed steps that you hadn't noticed while everything was just swirling around inside your head.

My experience is that it's very easy to handwave things and lie to yourself in your thoughts, partly because we allow a kind of mental shorthand that lets fuzz sneak in. When we speak out loud or write something down there is no such shorthand and it is much easier to see that we have (or haven't) covered something.

The other useful thing that rubber duck debugging can do is expose just how ugly and stupid a particular idea or bit of code or interface is. This followup to the original message has a nice story of it in action.

Several times I've started to document a bad idea, realized how embarrassed I would be to have people reading about it, and been driven to fix it. It's especially useful for stupid limitations or awkward system administration rain dances. (This does require you to have a sense of shame or a pride in your work, but I hope you have that already.)

WhyTalkToTheDuck written at 02:32:33; Add Comment

2005-08-19

The March of the Cheap

Once upon a time, a collection of companies ruled the workstation market, selling what were then called '1M^3' machines: machines with one MIPS of processing power (a million instructions per second), one megabyte of RAM, and one million (black and white) pixels. With computing so clearly heading their way, companies like Apollo, DEC, IBM, HP, SGI, and Sun had futures so bright they had to wear shades.

Computers have long since surpassed the '1M^3' level of performance, yet of the companies I just listed, only Sun still really sells Unix workstations. One can point to mundane reasons for this, but there is a common decision all of them made:

All of them abandoned the low end.

(There were always solid commercial reasons for doing so, that boiled down to 'it's too hard to compete with a flood of cheap PClones'.)

The problem with abandoning the low end is that the march of computing progress (Moore's Law and all) means that the low end keeps moving upwards. As it moves upwards, more and more computing becomes low end computing, and the 'high end' keeps shrinking. As these companies abandoned the low end, they were left chasing a shrinking and ever more competitive market, with the sort of results you'd expect.

Competing at the low end of the hardware market isn't easy. There's no guarantee that any of these companies could have succeeded at it. But by giving up on the low end, they guaranteed their slow diminishment and effectively cut their own throats.

The peculiar case of Sun

I was going to subtitle this '(why I think Sun is doomed)', but friends have told me that these days Sun may be price competitive for small servers and for what 'workstations' have turned into. (Naturally, these are based on commodity PC hardware.)

The question of whether Sun will stay with Solaris is an interesting one. It boils down to whether they can afford to continue paying for Solaris development, which comes down to how much of a premium Sun's marketing people can convince customers to pay for Solaris. Since Sun spent years successfully persuading people to buy expensive under-performing workstations, I suspect that Sun's marketing department is pretty good.

MarchOfTheCheap written at 01:33:44; Add Comment

2005-08-17

Remember to think about the scale of things

One of the famous computer programming quotes is 'premature optimization is the root of all evil' (C.A.R. Hoare quoted by Donald Knuth; attribution dammit (tm)).

A related issue is 'think about the scale of what you're planning'. A recent LiveJournal story provides a lovely example of this. To quote from it:

An increasing number of companies (large and small) are really insistent that we ping them with all blog updates, for reasons I won't rant about.

LiveJournal gets 3 or more public posts a second. That's a third of a second per post that has to include all DNS lookups, connection setup, sending the HTTP or SOAP or XML or whatever the ping format is, and connection teardown. (Apparently none of the companies gave LiveJournal a streaming interface, where LJ could open a connection once and then feed in results.)
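
To put rough numbers on that budget, here is some back-of-envelope Python; the per-step costs are assumed, plausible-magnitude figures, not measurements:

    # Back-of-envelope arithmetic for the ping budget. All of the
    # per-step costs are assumptions for illustration, not measurements.
    posts_per_second = 3
    budget_ms = 1000.0 / posts_per_second   # ~333 ms available per post

    dns_lookup_ms = 50    # assumed: resolve the ping service's name
    tcp_setup_ms = 80     # assumed: connection setup round trips
    send_ping_ms = 100    # assumed: send the HTTP/SOAP/XML ping, get a reply
    teardown_ms = 40      # assumed: connection teardown

    per_ping_ms = dns_lookup_ms + tcp_setup_ms + send_ping_ms + teardown_ms
    print("budget per post: %.0f ms, cost per ping: %d ms"
          % (budget_ms, per_ping_ms))
    # budget per post: 333 ms, cost per ping: 270 ms

One retry, one slow receiver, or a second company to ping, and the sender falls behind; a persistent streaming connection pays the setup costs once instead of on every post.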

The LiveJournal people gave a couple of these companies what they asked for. None of them could keep up.

The companies probably don't have bad or buggy software. I'm sure it works fine for the current set of blogs that ping them, and even has room for future growth. They just didn't think about the scale of what they were asking for from LiveJournal, and it probably didn't even occur to them to think about it.

Of course that's part of the problem of scale: it rarely occurs to people to think about it. In particular, people almost never think about radical changes of scale, whether up or down. This can lead to perfectly good solutions that don't fit the scale of the problem, or (as in this case) perfectly good solutions that don't quite handle the scale of a new problem.

When I start thinking about a system, I've found it useful to think about the scale of things as well as the problem itself. Sometimes this means I have to do more work; not infrequently it means I can do less, thereby avoiding premature optimization and evil, and bringing me back to the quote up at the top.

Sidenote: the full optimization quote

A Google search wound up here, which cites the full quote as:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

An MSDN web page gives the source as:

Quoted in Donald E. Knuth, Literate Programming (Stanford, California: Center for the Study of Language and Information, 1992), 276.

ThinkAboutScale written at 00:56:50; Add Comment

Annoying RSS Feed Tricks

The RSS feed tricks that are really annoying me right now are all the different ways people have invented to serve partial entry content. Almost all of them are bad, plus the basic idea is bad too.

Serving partial entries implies that the blog authors don't expect their readers to be interested in most of their words (otherwise, why make readers go through extra effort to read them?). The only good reasons for this that I can think of offhand are very long entries or entries on a huge variety of topics. (Given the blogs I read, I can discount vulgar commercial motives.)

(My feed reader makes it very easy to skip the rest of an entry if I decide it's not interesting. If yours doesn't, find better software.)

The best excuse for this, and the best version of it I've seen, is the BBC news site. They at least have the excuse that they cover everything from soccer scores to earthquakes in Japan. They also go to the actual effort of publishing single-sentence summaries of each news story (plus the headline).

Everyone else has far less excuse and devotes far less effort to it. The result is, unsurprisingly, far less usable and far more annoying. Bad ways include:

  • serving an article abstract for a feed that's only about one thing. If I am interested enough to subscribe to the feed, I am interested enough to read more than your abstracts.
  • truncating the entry after the first sentence or paragraph, which may not serve all that well as a summary and/or teaser.
  • just truncating the entry after a certain number of words. You get bonus points for not explicitly noting the truncation, or marking it with something that can be at the end of your short posts too, like '...'.

The third method produces the worst results and is naturally the most common technique (perhaps because the other two take effort, instead of trivial code). I suppose I should be thankful that I've yet to see anyone truncating entries after so many characters, gleefully slicing words in half with their sharp ginsu code.
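
For illustration, the trivial version of the third method is about this much Python (a hypothetical sketch, not anyone's actual feed generator):

    # The 'truncate after N words' trick in all its trivial glory; this
    # is a hypothetical sketch, not anyone's actual feed-generation code.
    def truncate_entry(text, max_words=40):
        words = text.split()
        if len(words) <= max_words:
            return text
        return " ".join(words[:max_words]) + " ..."

    # The ambiguity in action: a short post that legitimately ends in
    # '...' is indistinguishable from a truncated long one.
    print(truncate_entry("A short post that just trails off ..."))
    print(truncate_entry("many words " * 100))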

If your blog truncates entries in your syndication feed, for the love of the gods please take a look at how the feed looks in a feed reader. Then ask yourself if the result is either appealing or useful.

(I do not object to cutting off what are essentially footnotes from RSS entries; I sometimes do it too.)

Updated, August 25th: Another stupid entry truncation trick is to have just the title and link, with no entry text at all; bonus points are awarded for unhelpful titles. I had mercifully forgotten about this one until a feed doing it popped up with a new posting on one of the Planet feed aggregators that I read.

AnnoyingRSSFeedTricks written at 00:22:50; Add Comment

2005-08-10

Why open source needs distributed version control

Centralized versus distributed version control is one of the big discussion topics in the field. Being distributed complicates the software, and CVS and Subversion (the two version control systems most widely used for open source) are both centralized. Recently, Ian Bicking has written some interesting articles on the issue, in Centralized vs Decentralized and its part 2.

I believe that open source development needs distributed version control. The argument why this is necessary is a bit too long for a comment on Ian Bicking's articles (and besides, this way I have a real editor), so I'm putting it here.

Unless you have a small or really peculiar open source project, you can't give everyone who wants to do non-trivial development core commit access. However, you still want all those people to be using version control, and it needs to let them share their work with collaborators, testers, critics, and so on.

If you do let them piggyback off your project's core version control environment, what they need translates to publicly visible branches (ideally with access control under their control). To use the version control system for their work, a developer obtains a new branch and gets to commit on the branch. (If you don't let them piggyback off the core, you get a de facto 'distributed' (also anarchic and uncontrolled) version control 'system'.)

As Ian Bicking has noted, there is no technical reason that a centralized version control system can't support this; it's just that nothing does today.

However, this still leaves at least two big questions: who can create branches, and when do branches get deleted? (You probably don't want to answer 'anyone' and 'never'.)

These are not technical questions; these are questions of policy and thus of politics. This means that someone or some group has to play some form of central gatekeeper (and the people involved need to be respected, i.e. good developers). As it involves politics, this is not going to be a light, pleasant job. Also, the more developers you get, the more branches and branch issues the central gatekeeper has to deal with, the more grief these people catch, the less time they have to do actual development, and so on.

With a distributed version control system, branching is distributed. A developer that wants to branch just does it locally. If they want to share it, they do; if they want to give someone access, they do; if they want to keep the branch around or kill it, they do. All the management overhead of branches falls on the heads of the people actually doing the branching; none of it falls on anyone not involved.
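
As a toy model of why the management overhead stays local, consider branch heads as purely local state (an invented structure; no real DVCS is this simple):

    # A toy model of distributed branching: each repository keeps its own
    # branch heads, so creating or deleting a branch touches only local
    # state. The structure is invented; no real DVCS is this simple.
    class Repo:
        def __init__(self):
            self.branches = {"trunk": "rev0"}   # branch name -> head revision

        def branch(self, name, from_branch="trunk"):
            # Purely local: no server round trip, no gatekeeper to ask.
            self.branches[name] = self.branches[from_branch]

        def delete_branch(self, name):
            # Also purely local: when a branch dies is the owner's business.
            del self.branches[name]

    alice = Repo()                # Alice's private repository
    alice.branch("fix-parser")    # nobody else is involved or affected
    alice.delete_branch("fix-parser")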

Thus: a distributed system scales branching excellently as you add more developers, unlike a centralized one.

This is why I believe open source fundamentally needs distributed version control systems.

(And looking at the length and the somewhat rambling nature of this, it's a good thing I didn't try to make it a comment in Ian Bicking's blog.)

Sidebar: centralized resource requirements

Centralized version control systems also have the disadvantage that they centralize resource requirements, because everyone has to talk to the master server to do anything (at least anything that involves writes). This also makes the central server a crucial resource; if it's not there, no one can do any version control work.

This means that the resources required to run a central server acceptably keep rising as the number of developers rises. If your project gets popular and you acquire hundreds of new developers eager to help, you may not be able to accommodate them (at least right away). Oops.

I don't consider this a real issue, since so far it's always been possible to buy beefier servers. But I do think it's something to keep in mind as one downside of the centralized approach. (On the other hand, centralization has its advantages, like wider visibility of branches and development; indeed, some projects based on distributed version control, like the Linux kernel, create their own central point so that people can find branches in one spot.)

WhyDistributedVersionControl written at 01:55:56; Add Comment

2005-08-08

Security is a pain

Every so often, people in my line of work are surprised when users and other people don't take security seriously. However, we really shouldn't be, for a simple reason: security is a pain.

Almost always, computer security means extra work that you have to do and things that get in your way; hoops that you have to jump through before you get to what you really want to do. Is it any surprise that people don't like it and avoid it when they can? (Especially when bad consequences for lax security are so rare.)

We can preach all the homilies we want to about the virtuousness of security and how people should care and do it; they will work about as well as homilies on any subject ever work with real people. Which is to say, not very well at all. If any of this surprises us, it is because we haven't been paying attention.

(Perhaps not paying attention to real human nature; perhaps not paying attention to how much of a pain computer security is for ordinary people.)

There are only four ways out of this that I can see:

  1. Make computer security less of a pain.
  2. Have the risks of not going through the pain rise dramatically.
  3. Beat people until they do the security despite its pain.
  4. Hope for a miracle.

Unfortunately, many 'security initiatives' seem to consist of some mixture of #3 and #4 (often heavy on the #3). Since no one likes being beaten (or threatened with it), the actual results are usually less than entirely satisfactory and often have undesirable long-term consequences.

(As for #4 alone, to quote someone: 'hope is not a plan'.)

No one likes #2, but lots of people think it is going to happen someday. So far, any tendencies in that direction have produced reactions that, while slow, are good enough to keep the pain down. In a sense this is unsurprising; predators usually don't want to destroy their prey population.

The only truly proven and successful way of increasing computer security is #1. Unfortunately it often runs into problems:

  • it's hard.
  • to do it is to admit that your previous security precautions were too onerous, something that can be hard for people to do.
  • it can be messy; computer science people never like mess.
  • the compromises often involved are anathema to some security cultures.

These problems can be overcome. But it takes work, and to do that work people need to be persuaded that making security less painful is the way to go. And a lot of people are in denial about that.

Please don't be one of them.

See also: Computer Security in the Real World

While this rant has been bubbling in my head for some time, its timing and some of its substance are strongly inspired by the start of Computer Security in the Real World, by Butler W. Lampson. For flavour, here's the opening paragraph of the abstract:

After thirty years of work on computer security, why are almost all the systems in service today extremely vulnerable to attack? The main reason is that security is expensive to set up and a nuisance to run, so people judge from experience how little of it they can get away with. Since there's been little damage, people decide that they don't need much security. In addition, setting it up is so complicated that it's hardly ever done right. While we await a catastrophe, simpler setup is the most important step toward better security.

What he said.

SecurityPain written at 01:23:43; Add Comment

2005-08-05

Perimeter firewalls and universities

The University of Toronto doesn't have a firewall. There, I've said it and you can all gasp in horror: how could any organization on the Internet in this day and age fail to have such a basic thing as a firewall between them and the nasty net?

Because it doesn't help all that much; because it has the wrong threat model. A perimeter firewall protects you from evil people out there on the net, but does nothing to protect you from evil people inside, on your intranet.

A decent-sized university is overrun with students. Some of those students may be malicious, and many of them are going to be careless with their student logins. (These days, many of them may have compromised laptops or, in residences, compromised desktops.)

Compound the situation with unsecured network drops and plugs, and ad-hoc wireless networks set up by departments, workgroups, and professors. Compound the situation again because in practice the general public can wander pretty freely through any place where ordinary students are found.

So any serious attacker, and many casual attackers (who are likely to pick on something on the network they're already around, because oh, it's there), are going to have on-campus network access, bypassing a theoretical perimeter firewall entirely since they are already inside.

Once I've secured my machines against on-campus attackers, a perimeter firewall isn't doing me any good. (It's probably getting in the way by blocking access to new and novel things I actually want to let the Internet at.)

In this sort of situation a perimeter firewall may even do active harm. If people naively believe that the firewall is protecting them, they may slack off on the security of their machines, leaving them more exposed to 'internal' attacks than before. (Since people are fundamentally lazy about security, this is actually quite likely.)

Two sidebars:

Firewalls between a department, a workgroup, or a cluster of servers and the rest of the university can sometimes make sense, depending on how isolated the work of the machines or people is. (The UofT's administrative servers are behind a very restrictive and carefully constructed firewall. Some of them are on networks not even routed from the Internet.)

People who run student labs have an even harder job, since malicious people can almost certainly get a legitimate-looking login just from things like shoulder surfing.

UniversityFirewalls written at 01:56:19; Add Comment

