2007-12-31
There are really two GPL v2 licenses
While there is only one GPL v2 license, there are effectively two versions of the GPL v2 that programs are licensed under: programs licensed under the GPL v2 (only), and programs licensed under 'GPL v2 or (at your option) any later version', depending on the license verbiage that the program itself uses. The FSF encourages the 'GPL v2 or any later version' one, but some people have not been so open-ended.
When there was only one GPL license in common circulation, this wasn't important, but now that the FSF has come out with version 3 of the GPL this difference is starting to matter. Because the GPL v2 and GPL v3 are incompatible, only code licensed under 'GPL v2 or later' can be mingled with GPL v3 code.
(The simple rule of mingling code is that you must satisfy all of the licenses at once in order to be able to distribute the result. Although straight GPL v2 and GPL v3 cannot both be satisfied at once, you can make things work with 'GPL v2 or later' code by taking the option to distribute it under the terms of the GPL v3. Note that this doesn't actually change the original code to GPL v3; it still remains 'GPL v2 or later'.)
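As a toy illustration of the 'satisfy all of the licenses at once' rule, you can model each piece of code as the set of licenses it may be distributed under and intersect the sets. This is only a sketch of the reasoning, not legal advice; the license labels and the `usable_licenses` helper are made up for this example, and 'or later' is modelled as just v2 plus v3.

```python
# Toy model (hypothetical names): each component is the set of licenses
# that it may be distributed under.
GPL2 = "GPL-2.0"
GPL3 = "GPL-3.0"

gpl2_only = {GPL2}
gpl2_or_later = {GPL2, GPL3}  # 'or any later version', modelled as v2 plus v3
gpl3_only = {GPL3}


def usable_licenses(*components):
    """Return the licenses that satisfy every component at once."""
    common = set(components[0])
    for terms in components[1:]:
        common &= set(terms)
    return common


# Straight GPL v2 and GPL v3 have nothing in common, so nothing can ship.
print(usable_licenses(gpl2_only, gpl3_only))
# 'GPL v2 or later' code can take the v3 option when mixed with GPL v3 code.
print(usable_licenses(gpl2_or_later, gpl3_only))
```

Note that the intersection describes the terms for the combined result; the `gpl2_or_later` set itself is unchanged, just as the original code remains 'GPL v2 or later'.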
Part of the confusion is a terminology issue, because we tend to use 'license' for two different things, namely the licenses themselves and the terms that a given program is distributable under. When the two aren't the same, as in the case of the different 'program licenses' (to coin an awkward phrase) for the GPL v2, things can become ambiguous unless one is careful.
2007-12-28
Please don't make me pick an account name
I just got through registering with yet another website. It was a pain, because they made me pick an account name.
There are two problems with account names. First, they matter because they're public identities; they say something about you to onlookers, so you care what your account name is. Second, at least for me, they're hard to come up with.
My first and second choices are almost always taken; three letter names usually go fast, if the system even allows them, as do common first names. I feel that my last name, while pretty unique, is too long to make a good account name, and I've never come up with a distinctive online handle.
(I sometimes envy people with short last names, especially if they have a first initial that goes well with it. Not that last names are perfect; among other things, they tie your real world identity to the account, and not everyone is willing to do that. And people's last names change every so often.)
The result is that any time I'm asked to come up with an account name I get to flail around trying to come up with something that is both untaken and something that I'm willing to be known as. Often I can't change my mind later and rename myself, so I'd better be really confident about whatever I pick (which is hard when I'm making one up on the spot). Unless the service is very compelling, this can be enough to make me go away, because it is simply too much annoyance.
I suspect that I am not alone in this, so I have a simple plea to people: please don't make new users pick an 'account name' unless you really truly need such an identifier. If at all possible, let people supply just a name or an email address. And understand that if you ask people to make up account names, you are asking for something hard, especially if you do not let people change them afterwards.
2007-12-20
Virtualization does not eliminate security concerns
Here is something that has struck me recently: virtualization and abstraction cannot eliminate security concerns; they can only move them from one place to another. In other words, virtualization by itself doesn't do anything to prevent security bugs; it just means that they happen in a different place.
(By virtualization I mean more than hardware and OS virtualization, I also include things like the JVM.)
The advantage of virtualization is that it moves the problem inwards, towards the center of the security onion, where fewer people have to get it right and it makes sense to devote much more effort to security. The disadvantage of virtualization is that abstractions are usually more general, which means that they are bigger and more complex, and size and complexity are both bad for security.
(The other disadvantage is that security bugs in the virtualization are much more dangerous and much more valuable to attackers, because they may compromise a whole bunch of people at once.)
In the face of this, views on abstraction are partly a matter of perspective. With a local view of your system, you can have less exposure to security issues from not having to trust large abstractions. But if you have a global view, if your goal is to not have any security issues in any of your systems, you are less exposed with abstractions because they reduce the overall amount of security sensitive stuff across all of your systems; without the central abstractions, everyone has to get it right all of the time, which is a very difficult challenge.
2007-12-01
BitTorrent's protocol is not designed to hide
Every so often, I will hear someone say that Bram Cohen clearly wrote BitTorrent to facilitate piracy (despite any of his claims to the contrary) because it was deliberately designed to frustrate attempts to monitor its traffic. This claim irritates me partly because it is clearly wrong, almost blatantly so.
(Disclaimer: I am talking here about classic BitTorrent, as it was before ISPs started whacking things with hammers and people started reacting.)
There are two important things in a BitTorrent transfer: the peers, the collection of machines exchanging pieces of the file, and the tracker, a machine that tells peers (and would be peers) about each other. Your client joins the swarm by registering itself with the tracker, asks the tracker for a list of IP addresses of other peers, and then talks to them directly to exchange pieces of the file; every so often it sends a status update to the tracker.
(This is classic BitTorrent, where torrents had only a single tracker. Since this made the tracker a single point of failure, people soon extended the .torrent metainfo file format to allow for multiple trackers, and these days there are 'trackerless' versions of the protocol.)
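To make the 'registering with the tracker' step concrete, here is a minimal sketch of how a client might build the HTTP announce request. The query parameter names are the standard ones from the classic tracker protocol; the tracker URL, the peer ID, and the `announce_url` helper are made up for illustration.

```python
from urllib.parse import urlencode


def announce_url(tracker, info_hash, peer_id, port, left, event="started"):
    """Build a classic tracker announce URL (a sketch, not a full client)."""
    params = {
        "info_hash": info_hash,  # 20 raw bytes identifying the torrent
        "peer_id": peer_id,      # 20 bytes identifying this client
        "port": port,            # where other peers can reach us
        "uploaded": 0,
        "downloaded": 0,
        "left": left,            # bytes we still need
    }
    if event:                    # periodic status updates omit 'event'
        params["event"] = event
    # urlencode percent-escapes the raw bytes for us.
    return tracker + "?" + urlencode(params)


# Hypothetical values, for illustration only.
url = announce_url(
    "http://tracker.example/announce",
    info_hash=bytes(range(20)),
    peer_id=b"-XX0001-" + b"0" * 12,
    port=6881,
    left=1048576,
)
print(url)
```

The tracker's reply is a bencoded dictionary that includes a list of peers, which the client then contacts directly.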
The peer to peer protocol is distinct and easily identified and decoded, and it often uses a relatively narrow range of destination ports (TCP 6881 and up). While the peer to tracker protocol is HTTP, the contents of the requests and replies are quite distinct and should easily be identified by any competent traffic inspection system.
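As an illustration of how easy the classic peer protocol is to spot, every connection opens with a fixed 68-byte handshake: a length byte of 19, the string 'BitTorrent protocol', 8 reserved bytes, the 20-byte torrent hash, and the 20-byte peer ID. A minimal sketch of recognizing it (the `parse_handshake` name is mine, and this covers only the classic unencrypted protocol):

```python
PSTR = b"BitTorrent protocol"
HANDSHAKE_LEN = 1 + len(PSTR) + 8 + 20 + 20  # 68 bytes in total


def parse_handshake(data):
    """Return the handshake fields, or None if this isn't a BitTorrent handshake."""
    if len(data) < HANDSHAKE_LEN:
        return None
    if data[0] != len(PSTR) or data[1:20] != PSTR:
        return None
    return {
        "reserved": data[20:28],
        "info_hash": data[28:48],
        "peer_id": data[48:68],
    }


# Build a sample handshake with made-up values and decode it again.
sample = bytes([19]) + PSTR + bytes(8) + b"H" * 20 + b"P" * 20
print(parse_handshake(sample))
print(parse_handshake(b"GET / HTTP/1.0\r\n\r\n" + b" " * 60))
```

Anything watching the first bytes of a TCP connection can apply this check without knowing anything else about the transfer.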
Sometimes people say that BitTorrent is hiding things in one of two ways: it limits the amount of information you can find out about peers, and it limits the amount of information you can find out about a random torrent that some people are exchanging. Both are somewhat misleading charges.
While there is no direct way to get a list of all of the peers in a swarm, you can get relatively close by joining the swarm and then repeatedly asking the tracker for peers. The tracker does have a limit of how many peers it will give out at once, but this is self defense; consider what would happen to its bandwidth if a few badly coded or greedy clients joined a popular swarm and started asking for a list of a few thousand peers. (The tracker also doesn't try to keep track of what peers it's already told you about, so you get a random subset each time.)
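Here is a small simulation of why repeated asking gets you close to the full peer list. It does not talk to a real tracker; `make_fake_tracker` stands in for one by handing back a random subset each time, and the swarm size and the per-reply limit of 50 are numbers chosen for illustration.

```python
import random


def make_fake_tracker(swarm, per_reply, rng):
    """Stand-in for a tracker: each request gets a random subset of the swarm."""
    def ask():
        return rng.sample(swarm, min(per_reply, len(swarm)))
    return ask


def enumerate_swarm(ask_tracker, rounds):
    """Ask repeatedly and accumulate every peer address seen so far."""
    seen = set()
    for _ in range(rounds):
        seen.update(ask_tracker())
    return seen


rng = random.Random(0)
swarm = ["10.0.%d.%d" % (i // 256, i % 256) for i in range(1000)]
ask = make_fake_tracker(swarm, per_reply=50, rng=rng)

for rounds in (1, 20, 200):
    seen = enumerate_swarm(ask, rounds)
    print(rounds, "requests ->", len(seen), "of", len(swarm), "peers seen")
```

Because each reply is an independent random sample, the number of unseen peers shrinks quickly as the requests pile up, although a real swarm also changes underneath you as peers join and leave.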
While it's true that you can't find out the names of the files being transferred in the torrent, this is because the protocols identify torrents using the SHA1 hash of the torrent meta-information instead of passing around the (much larger) meta-information itself.
(However, the protocol has enough information that a passive eavesdropper can reassemble a complete copy of the data in the correct order.)
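To show where that SHA1 hash comes from: it is computed over the bencoded 'info' dictionary of the metainfo file, which is why file names never need to cross the wire. Below is a minimal sketch with a tiny bencoder; the `bencode` and `info_hash` helpers and the sample dictionary are mine, and a real client would hash the exact bytes found in the .torrent file rather than re-encoding them.

```python
import hashlib


def bencode(value):
    """Encode ints, bytes, str, lists, and dicts in bencode form."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def info_hash(info):
    """The 20-byte identifier that trackers and peers use for a torrent."""
    return hashlib.sha1(bencode(info)).digest()


# A made-up single-file torrent: four 256 KB pieces, dummy piece hashes.
info = {
    "name": "example.iso",
    "length": 1048576,
    "piece length": 262144,
    "pieces": bytes(80),
}
print(info_hash(info).hex())
```

The file name is inside the hashed data, so anyone who has the metainfo file can map a hash back to a name, but the hash alone does not reveal it.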
Not worrying about distributing the meta-information itself makes BitTorrent different from many other P2P protocols, but it also simplifies its job tremendously. Much like web servers worry about serving pages and leave indexing to search engines, BitTorrent concentrates on efficiently distributing a specific blob of data to peers and leaves the rest of the job to someone else. Among other things, this makes it more flexible.
Hopefully all this has demonstrated how absurd it is to claim that BitTorrent was deliberately designed to hide things. About the only thing it could do to be more obvious (without using more bandwidth or trying to require objectionable non-technical things of trackers) would be to have a registered port for trackers instead of using HTTP.
Sidebar: why requiring metainfo availability is bad
You could try to get around the SHA1 hash issue by requiring that trackers always have the metainfo file for each torrent they serve and be willing to give it out. The problem is that this sets you up for an inevitable clash with private and access-restricted torrents. If trackers must give out metainfo files for their torrents to random third parties, then you cannot have a genuinely private torrent; if you can have private torrents, there is no guarantee that trackers will give nosy third parties metainfo files any more, and you might as well not pretend.
In addition, this complicates trackers significantly, because now they are required to implement a relatively full HTTP server environment and use it to serve files. A standards-compliant HTTP/1.0 server is not trivial, and let's not even think about HTTP/1.1.
(Trackers often do display informational pages, but this is not required. You can implement a perfectly conformant tracker that only answers the announce URL and only handles a very limited subset of HTTP.)