BitTorrent's file fragmentation problem

November 19, 2008

When BitTorrent receives a file, it gets the various chunks out of order, generally in a completely random one. This presents the client with the problem of putting them in order and in place.

I believe that historically there have been three approaches to deal with this. The very first BitTorrent clients did it the simple way: they put every received block in its correct place by seeking to that spot and writing the block out. The problem with this was horrible file fragmentation (resulting in terrible sequential read performance); because the blocks were written in random order they were generally allocated randomly around the disk, instead of nicely sequential.

Next came the approach of always growing the file in order, and reordering blocks inside the file. When the client got block N, it initially wrote it at the current end of file; when the file grew to be more than N blocks long, block N finally could be swapped into its correct location in exchange for whatever was already there. This avoids file fragmentation (the client is always expanding the file sequentially), but at at the cost of an increasing amount of file IO to shuffle blocks into their correct places.

(This file IO does not matter all that much for typical clients, which have much more disk bandwidth than network bandwidth, but it can be a significant issue if you are running BitTorrent on fast networks, especially since disks are seek limited.)

The final approach is for the client to pre-write the file (with empty contents) before it starts receiving anything, and then to directly write received blocks into their correct locations. Pre-writing the file forces sequential allocation (possibly better than growing the file does), and rewriting parts of it later generally doesn't change this. The cost of this approach is a potentially significant startup delay, as the client writes what may be several gigabytes to disk.

(Note that many of these sequential allocation assumptions break down if you are using a log-structured filesystem such as ZFS. Copying the file again after you've received it may be the only good solution.)

I wish I could tell you that BitTorrent has solved this problem, but as far as I know it hasn't; you just get to pick which drawback you want. I believe that most BitTorrent clients today default to the second approach but give you an option to do the third.


Comments on this page:

From 193.203.82.194 at 2008-11-19 07:23:32:

There's another option: preallocation of files. Which is basically reserving contiguous space for the file in a smart way without actually doing all the I/O to write out zeroes. XFS has supported this forever, and support for preallocation has been added to ext4 as well.

See e.g. http://article.gmane.org/gmane.comp.file-systems.ext4/558 for a description of how XFS does it.

Azureus has been able to preallocate files in an XFS-specific way (it calls the xfs_io program from memory) for quite a while. There's also a filesystem-independent interface (fallocate) which has been wired up fairly recently.

Transmission seems to have support for fallocate: http://trac.transmissionbt.com/ticket/849

Written on 19 November 2008.
« Where vi runs into its limits
A growing realization about tcpdump and reading IP traffic »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Nov 19 01:28:08 2008
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.