2008-03-28
The stages of BitTorrent encryption
Recently it's been mentioned in the news that versions of BitTorrent have gotten more encryption. Because this sort of thing interests me (partly due to my long-standing interest in BitTorrent), I've decided to summarize the various current sorts of BitTorrent encryption.
BitTorrent encryption efforts have been designed to hide BitTorrent, so they have acted to make it less recognizable.
Traffic shapers started out recognizing (and choking) the distinct client to client communication, so people obfuscated and encrypted it. A unique session key is arranged between each pair of clients using Diffie-Hellman key exchange initiated by the connecting client; trackers are not involved except to optionally track clients that offer or require encrypted connections.
(The specifics of the DH key exchange seem resistant to being recognized by traffic shapers.)
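(To make the key exchange concrete, here is a minimal sketch of a Diffie-Hellman exchange between two clients. The prime `P`, the generator `G`, and the step of hashing the shared value with SHA1 to get a key are all my own toy choices for illustration, not the parameters or key derivation that BitTorrent clients actually use.)

```python
import hashlib
import secrets

# Toy public parameters, picked only for illustration (an assumption, not
# what any real client uses). 2**127 - 1 is a Mersenne prime.
P = 2**127 - 1
G = 3


def make_keypair():
    """Return (private, public) for one side of the exchange."""
    private = secrets.randbelow(P - 3) + 2   # somewhere in [2, P-2]
    public = pow(G, private, P)
    return private, public


def session_key(my_private, their_public):
    """Turn the shared Diffie-Hellman value into a fixed-size key.

    Hashing with SHA1 here is an illustrative choice, not a real protocol's
    key derivation.
    """
    shared = pow(their_public, my_private, P)
    return hashlib.sha1(shared.to_bytes(16, "big")).digest()


# The connecting client initiates; each side only ever sends its public value.
a_priv, a_pub = make_keypair()
b_priv, b_pub = make_keypair()

key_a = session_key(a_priv, b_pub)   # computed by the connecting client
key_b = session_key(b_priv, a_pub)   # computed by the client it connected to
```

Both sides wind up with the same key without it ever crossing the wire, and because each pair of clients picks fresh private values, each connection gets its own key with no help from the tracker.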
This still left the communication between client and tracker completely unencrypted, including the peer list that the tracker gives to clients, and some traffic shapers exploited this to determine peer IPs and block communication with them. The new encryption obfuscates this list by essentially encrypting it with a secret shared between the tracker and the clients.
(The shared secret is reasonably clever: it is the SHA1 hash of the torrent meta-information, which is now carefully not passed around between client and tracker. Not negotiating a per-client secret significantly lowers CPU overhead for very active trackers, since they can encrypt blocks of peer IPs once and then just hand them out.)
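(Here is a rough sketch of the idea of obfuscating a peer list with the SHA1 of the torrent meta-information. The `keystream()` and `obfuscate()` functions are hypothetical stand-ins that I made up for illustration; they are not the cipher the real extension uses, and the meta-information bytes are invented. The 6-bytes-per-peer packing follows the usual compact peer list layout.)

```python
import hashlib


def infohash(info_bytes):
    """SHA1 of the torrent meta-information; this is the shared secret."""
    return hashlib.sha1(info_bytes).digest()


def keystream(secret, length):
    """Hypothetical keystream: SHA1(secret + counter) blocks, concatenated."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha1(secret + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def obfuscate(secret, data):
    """XOR data with the keystream; applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(secret, len(data))))


def pack_peers(peers):
    """Pack (ip, port) pairs into 6 bytes each (4 for the IP, 2 for the port)."""
    out = b""
    for ip, port in peers:
        out += bytes(int(part) for part in ip.split(".")) + port.to_bytes(2, "big")
    return out


# Invented meta-information, purely for the example.
secret = infohash(b"d4:name7:example12:piece lengthi262144ee")
packed = pack_peers([("192.0.2.10", 6881), ("198.51.100.7", 51413)])

# The tracker can compute this once and hand the same bytes to every client.
blob = obfuscate(secret, packed)

# Any client that already has the torrent can recover the list.
recovered = obfuscate(secret, blob)
```

Because the secret is per-torrent instead of per-client, `blob` does not depend on who is asking, which is where the CPU savings for busy trackers come from.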
This still leaves the overall client to tracker communication in the clear, which means that a traffic shaper could exploit this to choke the bandwidth of a known client that started sending and receiving streams of random data with various IP addresses. Thus, the likely next stage in BitTorrent encryption is to encrypt all of it; since the client talks to the tracker with HTTP, the easiest way would be to use ordinary SSL (although this would require more tracker CPU power).
2008-03-21
Journaling filesystems and the fsync() problem
Consider your ordinary journaling filesystem. For simplicity and reliability you have a single, global log in which you put transactions for all of your filesystem activity, instead of anything more complicated. One useful consequence of this global log is that you have now created a filesystem-wide global order of all filesystem events (sometimes called a 'total order'), which will be preserved even if you crash and restart.
(You implicitly had a total order before, but it didn't necessarily survive crashes.)
This sounds great until someone does an fsync() to ensure that changes
to their particular file are fully stable. That you have a global log
means that changes to their file are intermixed with other changes; your
log's total order means that you have to commit everything up to the
last modification point of their file, regardless of what any particular
change modifies.
On a sufficiently busy system, almost all of the changes in the journal
log will not be to the file being fsync()'d. Flushing and committing
all of these unrelated changes is overhead that just serves to slow down
the fsync(), sometimes by quite a lot.
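As a toy illustration of the overhead (this is a simulation I made up, not how any real filesystem structures its journal; `GlobalJournal` and the file names are hypothetical), consider a single ordered log where fsync() has to commit everything up to the file's last change:

```python
class GlobalJournal:
    """A toy single, totally ordered log shared by every file."""

    def __init__(self):
        self.log = []        # (filename, change) entries in global order
        self.committed = 0   # how many log entries are already stable

    def write(self, filename, change):
        self.log.append((filename, change))

    def fsync(self, filename):
        """Commit everything up to the last change to filename.

        Returns (entries_flushed, entries_for_this_file).
        """
        last = None
        # Search the uncommitted tail of the log, newest entry first.
        for i in range(len(self.log) - 1, self.committed - 1, -1):
            if self.log[i][0] == filename:
                last = i
                break
        if last is None:
            return 0, 0      # nothing uncommitted for this file
        flushed = self.log[self.committed:last + 1]
        mine = sum(1 for name, _ in flushed if name == filename)
        self.committed = last + 1
        return len(flushed), mine


journal = GlobalJournal()
journal.write("mail.db", "append record")
for i in range(98):
    journal.write(f"other{i}", "update")   # unrelated activity
journal.write("mail.db", "update index")

flushed, mine = journal.fsync("mail.db")
```

Here `flushed` is 100 and `mine` is 2: to make two changes stable, the fsync() has to wait for 98 changes that it has no interest in.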
You can get around this, but it generally requires a significantly more
complicated filesystem and journal design, which may or may not be
considered worth it in general. (Not that many applications actually use
fsync(), and many of them are not all that speed sensitive. On the
other hand, the exceptions tend to be pretty important.)
2008-03-19
The problem of charging for things (well, one of them)
People who want to charge relatively low fees for things often make the argument that people will pay because the charge is such a small amount of money that it is easily afforded, and people get great value for it. One of the problems of charging for things is that it doesn't work this way.
When you charge, you're not competing to be a great value, you're competing to be the most attractive thing for people to do with that money. Want to get two dollars from someone? You'd better be more interesting than a cup of coffee to that person.
(Disclaimer: I don't know what a cup of coffee actually costs.)
There are two additional things worth noting:
- enthusiasts and entrepreneurs are especially at risk of this, because of course they are convinced that their system is attractive.
- many of the things organizations want to charge their users for are not attractive; instead they are effectively taxes. No wonder users revolt.
(Note that it doesn't matter that the things you get are a benefit. What matters is if people actually want them, or simply have to have them.)
It further strikes me that one way to make something attractive is to make it remove an irritating limitation. Of course, you have to tread carefully to make sure that users will perceive the limitation as genuinely necessary, instead of something that exists mostly to extract money from them.
(This thought has been indirectly sparked by the recent LiveJournal drama about eliminating new ad-free free accounts.)
2008-03-14
Why I like definite answers to support issues
One of the things that irritates me about commercial support is the difficulty of getting a definite answer from them if they do not have a canned solution to my issue. If they cannot give me great news, getting any news from them is often like pulling teeth (or slower), and even when they're willing to talk to me they tend to mumble a lot.
I want answers for a simple reason: it lets me resolve the situation one way or another. In this, I would rather have a definite no than a dragged-out maybe; in the former case, at least I can make firm decisions and sensible plans. Without a definite answer, it is very hard to give up and let things drop; after all, the answer might be 'yes' tomorrow, if only I'd waited.
The result is that vendor mumbling turns into local paralysis, unless we are forced by outside events to establish a deadline (eg, 'must have something operational on date X'). We don't like the paralysis, but it is very hard to fight without that outside deadline, partly because we know that any deadline we pick for making a definite decision is pretty much arbitrary.
(No doubt this serves the vendor's purposes; after all, in a year the horse might sing.)
2008-03-09
The difference between a SAN and a cluster filesystem
Both SANs and cluster filesystems have multiple machines talking to multiple shared disks that all of them can see. The difference is that SANs are designed for a single machine to talk to a given disk at a time, while cluster filesystems allow multiple machines to talk to the disk at once.
Which raises the big question: why do you need cluster filesystems at all? Why can't multiple systems share a single disk without doing anything special?
There are two problems with shared disks: caches and coordinating updates. These days, pretty much all filesystems cache bits of the on-disk filesystem in memory, ranging from file data to parts of the filesystem metadata like directories. None of this caching works very well if there is something else changing the data on the disk, because the system has no idea that it is serving stale data from cache instead of throwing it out and fetching the current data again.
In theory you could get around that by doing no caching (although the performance loss would probably be pretty impressive). However, this still leaves you with the problem of coordinating several systems that are all trying to update the filesystem at the same time. Without some sort of locking, you are going to wind up with a pretty scrambled filesystem in short order, as systems gleefully allocate the same data block to several files, overwrite each other's directory updates, and so on.
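To see how the double allocation happens even with no caching at all, here is a small made-up simulation (the `SharedDisk` and `Host` classes are hypothetical, and a real filesystem's allocator is far more involved). Each host reads the free map straight from the disk, but nothing stops another host from doing the same thing between the read and the write:

```python
class SharedDisk:
    """A toy shared disk: a free-block map plus who has claimed each block."""

    def __init__(self, nblocks):
        self.free = set(range(nblocks))
        self.owners = {}   # block number -> list of (host, filename) claims


class Host:
    """A machine using the disk with no caching and no locking."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.choice = None

    def pick_block(self):
        """Step 1: read the free map from the disk and choose a block."""
        self.choice = min(self.disk.free)
        return self.choice

    def claim_block(self, filename):
        """Step 2: write the allocation back to the disk."""
        self.disk.free.discard(self.choice)
        self.disk.owners.setdefault(self.choice, []).append((self.name, filename))


disk = SharedDisk(8)
a = Host("a", disk)
b = Host("b", disk)

# An unlucky but entirely possible interleaving of the two hosts:
a.pick_block()
b.pick_block()        # still sees the block as free; a has not written yet
a.claim_block("report.txt")
b.claim_block("notes.txt")

doubly_allocated = {blk: who for blk, who in disk.owners.items() if len(who) > 1}
```

After this, `doubly_allocated` shows block 0 belonging to both files. A cluster filesystem's locking exists to make the pick and the claim a single step that only one host can be in the middle of at a time.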
Further, there's nothing that the SAN storage can do to fix either issue because both problems happen well out of its sphere of operations. Without cooperation from the systems talking to it, the most it can do to help is to enforce exclusive access to disks. (Of course, exclusive access to disks is exactly what you don't want if you really do have a cluster filesystem, or even some sorts of failover depending on how exactly the exclusive access is implemented.)