2008-04-30
Why you can't stop 'abuse' of file sharing services
I hope that everyone will agree that filtering 'banned' content (whatever that is in your local jurisdiction) is in practice impossible. Users are happy to rename files and otherwise obscure them in order to evade machine inspection, and no one can afford to hand audit everything that is being shared.
(Arguing that people must hand audit things anyways basically amounts to arguing that no one can run a file sharing service.)
This means that the only thing you can really do to deter this banned content is to make your service 'too unattractive' for it. But because you can't tell good content from banned content, this means making your service too unattractive for sharing things freely in general, which dooms your service to at best obscurity and at worst complete failure (regardless of whether it is commercial or just some free software that you want to see widely used). This is not exactly an attractive proposition, to put it one way.
(There are all sorts of ways to make your service unattractive: restricting what sorts of content you'll accept, making it awkward to upload or download, insisting on detailed registration for people who want to upload stuff so that you can trace 'abusers' precisely, and so on.)
(Demanding that people take steps to deter sharing of banned content anyways is tantamount to demanding that new file sharing services cripple themselves right from the start, right when they most need to get people interested.)
The inevitable result is that every file sharing service that is actually useful has, and will always have, 'banned' content; the more useful the service is, the more banned content it will have. The only way to have no banned content is to have an all but useless service, or one that is extremely restricted in scope.
2008-04-25
BitTorrent trackers are not innocent bystanders
Every so often, I see someone put forth the view that BitTorrent trackers aren't at all responsible for what they help distribute because they operate without actually knowing anything about the torrents they coordinate. In practice this is wrong.
People say this because in theory a tracker (or at least an unencrypted one) doesn't have to know anything more about torrents than their SHA1 hashes, or more technically some shared key of the right size that all the clients agree on; to the tracker it is an opaque handle. Thus you could run an 'open tracker', one that lets anyone use it as the tracker for any torrent, and such a mode is (or was) supported by the very basic tracker included in the original BitTorrent codebase.
(Using the SHA1 hash of the bencoded 'info' dictionary from the torrent metainfo file has the helpful property that the chances of two torrents having the same shared key are very, very low, and you don't need to put an additional piece of information in the metainfo file.)
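As a concrete illustration, here is a minimal sketch in Python of how that opaque handle (the 'info hash') can be computed from a .torrent file. It assumes a well-formed file, and it works by hashing the exact on-disk bytes of the bencoded 'info' value, since bencoding is the encoding .torrent files use:

    import hashlib

    def bdecode(data, i=0):
        # Decode one bencoded value starting at byte offset i;
        # returns (value, offset just past the value).
        c = data[i:i+1]
        if c == b'i':                         # integer: i<digits>e
            end = data.index(b'e', i)
            return int(data[i+1:end]), end + 1
        if c == b'l':                         # list: l<values>e
            i, items = i + 1, []
            while data[i:i+1] != b'e':
                v, i = bdecode(data, i)
                items.append(v)
            return items, i + 1
        if c == b'd':                         # dict: d<key><value>...e
            i, d = i + 1, {}
            while data[i:i+1] != b'e':
                k, i = bdecode(data, i)
                d[k], i = bdecode(data, i)
            return d, i + 1
        colon = data.index(b':', i)           # string: <length>:<bytes>
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start+length], start + length

    def info_hash(torrent):
        # The info hash is the SHA1 of the raw bencoded bytes of the
        # top-level 'info' value, so track that value's byte span.
        assert torrent[:1] == b'd', "not a bencoded dictionary"
        i = 1
        while torrent[i:i+1] != b'e':
            key, i = bdecode(torrent, i)
            start = i
            _, i = bdecode(torrent, i)
            if key == b'info':
                return hashlib.sha1(torrent[start:i]).hexdigest()
        raise ValueError("no 'info' dictionary in metainfo")

Feeding this the contents of a .torrent file, as in info_hash(open('foo.torrent', 'rb').read()), produces the hex form of the 20-byte handle that clients send to the tracker; note that the tracker itself never needs any of the decoded contents.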
In practice, almost all trackers will only track torrents that have been registered with them in advance, which means that almost no tracker operator can really disclaim all knowledge about what they are being used to coordinate; they are all filtering to at least some degree.
(Open trackers are so rare partly because running an actively used tracker can require a significant amount of network bandwidth, due to tens of thousands of machines sending you status updates and requesting lists of peers.)
2008-04-20
What FAQs are
Here's a question: which type of documentation is the ever-popular FAQ format?
My answer is that in practice, FAQs are a collection of tutorials; they tell you how to do various specific frequently asked operations. This has important implications if your primary documentation format is FAQs, because FAQs effectively have drawbacks from both sides; they're not as well organized or as comprehensive as reference documents, and they don't give you an overall view the way a single unified tutorial does.
(That they are a collection implies that FAQs live and die on their organization, ie how easy it is for people to find the specific question that they need.)
If FAQs are currently your primary form of documentation, I think that the most important second form to have is reference documentation, because it is the most efficient way to cover the remaining information and probably also the easiest for people to use.
(By 'efficient' I mean 'most results for the least amount of documentation writing'. Complete tutorial documentation is almost always longer than equally complete reference documentation.)
As an aside, FAQs as your primary documentation may seem silly, but I think it's a natural thing if you're too pressed for time to write much documentation. FAQs make it easy to create something useful right away and then grow it piece by piece, they give you a natural starting point, and it's less intimidating to sit down to write some FAQs than to write either full tutorial or full reference documentation.
2008-04-18
The two (at least) forms of documentation
The problem with GNU Texinfo is not that it is a bad format for documentation as such. The problem is what sort of documentation gets written in GNU Texinfo format.
To simplify, there are at least two forms of documentation: tutorials and references. Where I draw the line between the two is that tutorial documentation is intended to be read from start to finish, while reference documentation is intended to be skimmed and skipped through.
Writing something as a tutorial has at least three effects that are relevant here:
- it is perfectly sensible to scatter information about something
across several places; it is more important to have the entire work
in a logical sequence when read as a whole than to have all of the
information about something in a single place.
- it's also perfectly sensible to break information up into small
chunks, each covering one thing or one concept, and present them
independently; it gives readers natural pause spots and avoids
confronting them with an endless wall of text.
- it is not important to have a clear, easily visible organization, as long as everything makes sense and flows logically as you read through the whole thing.
None of these are good things for reference documentation, where you want everything in one spot, clearly organized even if you don't know the subject, and easy to skim and page through as a single unit.
The general problem with focusing on tutorial documentation over reference documentation is that tutorial documentation is used infrequently, to learn the system and to get large-scale refreshers on it, while reference documentation is used much more frequently, when you forget an option or a feature or whatever.
(Tutorials also fall down if your audience is not really interested in your subject matter, because they demand too large an investment of time before you get much out of them. With reference documentation people have at least some chance of getting a specific question answered quickly, before they just give up.)
So the problem with GNU Texinfo is that people pretty much only use it to write tutorial documentation, and many of its features are oriented towards this; if you write in Texinfo format, there are lots of things pushing you towards tutorials as the most natural outcome. (It doesn't help that the FSF is strongly against manual pages; culture matters.)
2008-04-16
My secret mouse fear
One of my little computer terrors is that someday I won't be able to find good quality plain three-button mice any more, because everyone will have stopped making them. In fact it may already be too late to get decent USB three-button mice, which worries me; although I have a stash of PS/2 ones, I may someday have to migrate to a USB-only machine.
(Searching the web is basically no help here, and since I have a reasonably large stash of suitable PS/2 mice I haven't had to go out into the computer store wilds to find out.)
What I would really like is something that I am not sure is even made: a mouse with three regular buttons on top and then a scroll wheel somewhere on the side. This would handily deal with my major objection to the scroll wheel, that it gets in the way of easy and fluid use of the middle mouse button.
2008-04-08
An alternate take on availability numbers
The usual way of presenting availability numbers is to cover how much (or how little) downtime you have per year. As another way of understanding them, let's turn that around and ask how much less downtime you have as you move from one availability level to a higher one.
| 90% to 99% | about two hours less a day, or almost 33 days less a year |
| 99% to 99.9% | three days less in a year |
| 99.9% to 99.99% | just under eight hours less in a year |
| 99.99% to 99.999% | 47 minutes less in a year |
| 99.999% to 99.9999% | just under five minutes less in a year |
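(If you want to check or extend the table, here's a quick sketch of the underlying arithmetic in Python, assuming a 365-day year:)

    # Downtime per year at each availability level, and the saving
    # from stepping up to the next level.
    YEAR_MINUTES = 365 * 24 * 60

    levels = [90, 99, 99.9, 99.99, 99.999, 99.9999]
    downtime = [(100 - lvl) / 100 * YEAR_MINUTES for lvl in levels]
    for a, b, da, db in zip(levels, levels[1:], downtime, downtime[1:]):
        print(f"{a}% to {b}%: {da - db:,.1f} fewer minutes down a year")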
Looking at the numbers this way makes it clear why so few people pay for very high availability; objectively, you don't get that much for your large quantities of money. Towards the higher end of the chart, you have to be asking how much money you are making (or losing) per minute in order for an availability bump to make any sense.
(Conversely, you get a lot of value at the low end of the availability numbers, assuming that you are willing to ignore exceptional disasters that require very costly things like replicated data centers.)
2008-04-05
Why people are accepting bad uptimes from Internet applications
Recently (for my version of recently), some pundits have asked why people are willing to accept the sometimes less than stellar uptimes that they get from your typical Internet service. (Okay, the original article applied the question to various other services too.)
My view is that it's pretty simple and really comes down to two things:
- people are not willing to abandon flawed services, because flawed services are generally better than nothing.
- people are also not willing to pay the price of carrier-grade uptimes, because most of the time such a high uptime doesn't really matter to them.
The second is because once you achieve a basic level of reliability, most of the time people either don't notice a downtime (because they're not using the service at the time) or don't care that much when they do notice (because it's not important enough to them).
Without either government regulation or enough people being willing to give up entirely on merely ordinarily reliable Internet services, there is not enough pressure on service providers to force them to improve things. And neither seems very likely to happen.
(Well, okay, there is one more source of pressure: a competitor introducing a service that's as good, more or less as cheap, and significantly more reliable, so that you can't beat them on features or price and have to compete on reliability.)
(You might think that people could sell higher reliability for a higher price, but experience seems to show that such things are niche products at best.)