2008-12-23
You need major advantages to really move issues
Here is something that is intuitively clear to me but I am finding hard to explain coherently:
In order to really push an issue, to really get people to care about it, you need to have a major advantage to offer in it.
Here is an example of what I mean:
Imagine that you want to improve boot times, and in fact you see fast boot times as both important for usability and a competitive advantage. However, most people don't care about boot times right now, or really notice them. If you improve boot times by 1%, you don't have very much; such a small improvement isn't going to really be noticed by most people and it's not much of an advantage. But if you improve boot times by 50%, even in a restricted environment, suddenly all sorts of people are going to be energized and paying attention.
By delivering a major reduction in boot times, you will have created at least the potential for a real competitive advantage, and by doing that you will have made all sorts of people care about boot times. Some of them will go off and do things that improve the overall state of the art in boot times, which will generalize your work and probably help improve it more. Others will just improve their boot times, which validates your position that boot times are important and increases your competitive advantage (provided that you can stay ahead). And since better boot times are an overall good thing, you will have improved everyone's life.
(Your competitive advantage is based both on how much of an improvement you can deliver and on how many people care. More people emphasizing boot times means that more people will care about it.)
But, as mentioned, none of these things happen if you don't have that major advantage to start off with. If you have only a small advantage, you cannot push your issue very effectively; it will stay unimportant and under-developed.
As I suggested in the example, often pushing an issue is important for more than just having a meaningful competitive advantage. The default situation in many areas is that people could make all sorts of incremental improvements but the demand for them isn't there, or there isn't enough of it. Unless you are very big you cannot change this on your own, because your own demand is not big enough to move the market by itself or even really get the market to listen. Pushing the issue is necessary to increase the demand in the market so that it will start delivering general improvements.
2008-12-14
Feed aggregators should fail gracefully
I don't just mean that they should fail gracefully when the feeds aren't well formed (although that's an important part). There are all sorts of troublesome things that feeds can do; they can be longer than you expected, for example, or they can be sent to you very slowly. In all of these cases, feed aggregators should try to fail gracefully, to extract as much information from the feed as possible unless it is utterly clear that something horrible has gone wrong and you cannot trust the feed at all.
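To make the sort of limits I'm talking about concrete, here is a rough sketch in Python (the size cap, timeout, and chunk size are all made up for illustration, not taken from any real aggregator) of fetching a feed while refusing to read too much or wait forever:

    import socket
    import urllib.request

    MAX_FEED_BYTES = 512 * 1024   # made-up per-feed size limit
    FETCH_TIMEOUT = 30            # seconds; how long we'll wait on a slow server

    def fetch_feed(url):
        """Fetch up to MAX_FEED_BYTES of a feed, giving up on slow servers.

        Returns whatever bytes were received before a limit was hit; the
        caller should try to extract as much as it can from the result.
        """
        data = b''
        try:
            with urllib.request.urlopen(url, timeout=FETCH_TIMEOUT) as resp:
                while len(data) < MAX_FEED_BYTES:
                    chunk = resp.read(64 * 1024)
                    if not chunk:
                        break
                    data += chunk
        except socket.timeout:
            pass    # the server is too slow; keep whatever we already have
        return data[:MAX_FEED_BYTES]

Any time a fetch like this stops early, you are left holding a partial feed, which is exactly the situation where graceful failure matters.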
In general all feed readers should do this (or at least all feed readers that are not deliberately being strict to make a point), but I think that it is especially important for feed aggregators to do this. Feed aggregators often have a lot of users behind them, any problems are more likely to be invisible to those users (aggregators are traditionally a lot less transparent than desktop clients, which usually give you some error indications), and even once a user detects that there are problems, they are generally powerless to change settings to fix the situation. The result is that feed aggregators are effectively holding their users hostage to their decisions in a greater way than desktop clients are, and so I maintain that they should be more careful.
(I also think that it is more likely for feed aggregators to have various sorts of limits than desktop clients, simply because an aggregator is probably dealing with more data and more feeds in a situation with less resources.)
This makes me feel that feed aggregators should use a stream-oriented parser. Such parsers are less strictly XML-correct (an XML error near the end of the stream won't be detected until you reach it, by which point you have already processed the earlier data), but they are likely to be much better at extracting information from feeds that are incomplete (either naturally or because your aggregator is only willing to look at so much data) or otherwise problematic.
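Here is a rough sketch of what I mean, using Python's incremental ElementTree parser and assuming an Atom feed (this is an illustration, not code from any real aggregator); it pulls out whatever complete entries it can from feed data that may have been cut off or damaged partway through:

    import xml.etree.ElementTree as ET

    ATOM_ENTRY = '{http://www.w3.org/2005/Atom}entry'

    def salvage_entries(data):
        """Return whatever complete Atom <entry> elements can be pulled out
        of feed data that may be truncated or not well-formed."""
        parser = ET.XMLPullParser(events=('end',))
        entries = []
        try:
            parser.feed(data)
            parser.close()      # raises ParseError if the data was cut off
        except ET.ParseError:
            pass                # keep going with what was parsed so far
        try:
            for _event, elem in parser.read_events():
                if elem.tag == ATOM_ENTRY:
                    entries.append(elem)
        except ET.ParseError:
            pass                # a mid-stream error; keep the earlier entries
        return entries

A strict all-or-nothing parse of the same truncated data would give you nothing at all.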
(I admit that my ox is gored on this issue, as the LiveJournal feed of WanderingThoughts once again ran afoul of LiveJournal's arbitrary feed size limit, cutting off updates for about a month until one of my readers left a comment about it here.)
2008-12-11
Why syndication feed readers (and web browsers) should fail gracefully
There are two schools of thought about dealing with errors in pseudo-XML formats like syndication feeds and XHTML: strict failure (the Firefox model, where the user gets a big BZZT dialog) and graceful failure (the Universal Feed Parser model, where you try to get as much as possible out of things). Which school is better is one of those frequently argued things on the Internet.
(Yes, yes, theoretically both syndication feeds and 'XHTML' are real XML and there is only one valid failure mode. This is not true in the real world.)
But it's recently occurred to me that there is a really simple summary of why I think graceful failure is the correct answer. It is this observation:
Strict failure punishes the reader for the sins of the site author.
(And yes, preventing people from reading something that they've expressed an interest in is punishing them.)
Much as the browser is the wrong place to warn about HTML errors, feed readers are the wrong place to complain about invalid feeds. Unless the author reads their feed, those big BZZT dialogs are being shoved in the faces of people who had nothing to do with the problem, which means that you are punishing the wrong people. And my belief is that punishing the wrong people is pretty much always the wrong thing to do.
(The same logic applies directly to browsers dealing with XHTML.)
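To illustrate what the graceful school looks like in practice, here is a rough sketch using the Universal Feed Parser (the Python feedparser module); the two helper functions are hypothetical stand-ins, not anything from a real feed reader:

    import feedparser

    def read_feed(url):
        d = feedparser.parse(url)
        if d.bozo:
            # The feed has problems; d.bozo_exception says what they are.
            # Note the problem somewhere the site author might see it ...
            note_feed_problem(url, d.bozo_exception)   # hypothetical helper
        # ... but still show the reader whatever could be extracted.
        for entry in d.entries:
            display_entry(entry)                       # hypothetical helper

The error doesn't vanish; it just gets reported to someone who might be able to do something about it instead of being shoved in the reader's face.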
2008-12-04
The rewriting problem on ZFS and other 'log structured' filesystems
When they are planning how to organize their IO, programs (and designers) normally assume that rewriting (overwriting) an existing, already written file is the fastest sort of write operation possible and will not change the existing file layout (for example, if it has been created to be sequential). This is because all of the data blocks have already been allocated and all of the metadata has been set up; if you write in block-aligned units, pretty much all the operating system has to do is shovel data into disk sectors.
(None of this applies if you do something that makes the operating system discard the existing data block allocations, for example truncating the file before starting to rewrite it.)
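As a rough sketch of the assumption (in Python, with a made-up 4 KB block size; nothing here is specific to any real program), a block-aligned in-place rewrite looks something like this, and on a conventional update-in-place filesystem it reuses the file's existing blocks and layout:

    import os

    BLOCKSIZE = 4096    # assumed filesystem block size, purely for illustration

    def rewrite_block(path, blockno, data):
        """Overwrite one already-allocated block of an existing file in place."""
        assert len(data) == BLOCKSIZE
        # Deliberately no O_TRUNC: we want to keep the existing allocation.
        fd = os.open(path, os.O_WRONLY)
        try:
            os.pwrite(fd, data, blockno * BLOCKSIZE)
        finally:
            os.close(fd)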
Filesystems like ZFS break this assumption, because one of their fundamental principles is that they never overwrite existing blocks (this gives them great resilience, enables cheap snapshots, and so on). On what I am inaccurately calling a 'log structured' filesystem, rewriting a block requires allocating a new block and hooking it into the file's metadata, which is at least as expensive and slow as writing it in the first place. As a side effect of allocating new blocks, it will change the file layout in somewhat complicated ways depending on exactly how you rewrite and how much you rewrite.
If you are rewriting randomly there are two major scenarios, depending on whether you're going to do random reads or sequential reads from the rewritten file; call these the database case and the BitTorrent case. The BitTorrent case is horrible, with slower than expected rewrites followed by read speeds that will probably be at least an order of magnitude slower than expected. The database case is just slower rewrites than you expected, with access times no worse than before (since we assume random access already), but remember that even databases periodically do sequential table scans (probably especially for backups).
(If you preallocate a file through whatever special OS interface is available for this, it's possible that a log structured filesystem could still preserve a sequential layout while letting you do random writes. I don't know if filesystems are this smart yet, and specifically I don't know if ZFS even has interfaces for this. And this only works if you are rewriting each block only once, so it doesn't help the general database case.)
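For what it's worth, the kind of interface I mean is something like posix_fallocate(). Here is a rough sketch in Python (the file size is made up, and whether any copy-on-write filesystem actually uses this to preserve a sequential layout across later rewrites is exactly the open question above):

    import os

    BLOCKSIZE = 4096
    NBLOCKS = 1024      # preallocate 4 MB, purely for illustration

    def preallocate(path):
        """Ask the OS to reserve space for a file up front via posix_fallocate()."""
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
        try:
            os.posix_fallocate(fd, 0, BLOCKSIZE * NBLOCKS)
        finally:
            os.close(fd)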