2013-04-03

How to make sysadmins unhappy with your project's downloads

I tweeted:

Dear every project that doesn't have an URL for their tarballs that is easily wget'able: ha ha. Very funny. Please stop. #sysadmin

Let me expand on this a bit. First, I'll give a pass to everyone who has access-restricted downloads; there is no good way to make them easily fetched. This is for everyone else, all of the various projects that have public downloads.

Here is the thing: sysadmins are not necessarily browsing your website on the machine where they actually want the source code. In fact it's almost certain that they aren't, since very few sysadmins run Firefox or Chrome on their servers. What sysadmins want to do is use 'Copy Link Location' on the (nominal) URL of your project's distribution tarball, open a connection to the server, type 'wget <pasted URL>' on it, and wind up with a sensibly named tarball (or zip file or whatever) of your source afterwards.
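
To put it concretely, the whole thing should be as simple as this (with a made-up URL standing in for your project's real one):

wget https://www.example.org/releases/frobnicator-1.2.3.tar.gz

Afterwards there is a frobnicator-1.2.3.tar.gz sitting in the current directory on the server, ready to be unpacked, with no options needed and no renaming required.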

There are at least two ways that this goes wrong. Sadly I am going to have to pick on the Django web framework for the first one, because it inspired my tweet. The download URL for Django 1.5.1 is:

https://www.djangoproject.com/download/1.5.1/tarball/

If you feed this URL to wget, you do not get something called 'Django-1.5.1.tar.gz' but instead a file called 'index.html' (which is the gzip'd tarball that you want, just with the wrong name). This is because wget operates in a very straightforward way; it puts whatever it fetches in a file named after the last component of the URL (or 'index.html' if the last component looks like a directory, as it does here). Wget does have an option to change this, --content-disposition, but I had to look it up in the manpage. Sysadmins do not appreciate being forced to look up (and then type) long options to wget to get your tarballs.
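
To make the difference concrete, the plain fetch and the workaround look like this (the second form only helps if the server actually sends a Content-Disposition header with the right filename in it):

wget https://www.djangoproject.com/download/1.5.1/tarball/
wget --content-disposition https://www.djangoproject.com/download/1.5.1/tarball/

The first command leaves you with 'index.html'; the second should leave you with something like 'Django-1.5.1.tar.gz'. But you had to know about the option and type it.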

The fix for this is straightforward: your download URL should have a last component that is the name of the distribution tarball or applicable file. Then wget will do the right thing.
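
For instance, the Django URL above would have made wget perfectly happy if it had looked something like this (a hypothetical restructuring, not a URL that actually exists):

https://www.djangoproject.com/download/1.5.1/Django-1.5.1.tar.gz

It doesn't matter whether that's a real file sitting on disk or something generated on the fly by a download handler; what matters is that the last component of the URL is the filename you want people to wind up with.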

(Github does a variant of this. The stated URL for a zipfile of a repo is something like <user>/<project>/archive/master.zip, but the fetched file is supposed to be called <project>-master.zip. Browsers that pay attention to the HTTP Content-Disposition header will save it under that file name; wget will at least use master.zip.)
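
(In concrete terms, and assuming Github still behaves the way it did when I looked:

wget https://github.com/<user>/<project>/archive/master.zip
wget --content-disposition https://github.com/<user>/<project>/archive/master.zip

The first saves 'master.zip'; the second should save '<project>-master.zip', if the Content-Disposition header is there as described.)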

The other really bad thing you can do is what Sourceforge at least used to do. The nominal 'download' links on Sourceforge projects didn't go directly to the files (despite appearing as if they did); instead they went to an interstitial HTML page that told you about mirrors and automatically started a download (I assume through the use of an HTML '<meta http-equiv="refresh" ...>' in the page). This is of course completely impossible to feed to wget, which doesn't interpret this HTML <meta> tag at all. You should not do this sort of trickery; your download links should actually be links to the files, not to any sort of interstitial experience. If you need to make people go through a mirror, do that with an HTTP redirect and put an explanation about it on your download page.
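
The redirect approach might look roughly like this (a hypothetical exchange, with most of the headers elided):

GET /downloads/frobnicator-1.2.3.tar.gz HTTP/1.1
Host: www.example.org

HTTP/1.1 302 Found
Location: http://mirror.example.net/pub/frobnicator/frobnicator-1.2.3.tar.gz

wget follows the redirect on its own and by default names the saved file after the URL you gave it, so the sysadmin still winds up with frobnicator-1.2.3.tar.gz and never has to care that a mirror was involved.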

programming/WgetableDownloads written at 00:28:48

