How to make sysadmins unhappy with your project's downloads
Dear every project that doesn't have an URL for their tarballs that is easily wget'able: ha ha. Very funny. Please stop. #sysadmin
Let me expand on this a bit. First, I'll give a pass to everyone who has access-restricted downloads; there is no good way to make them easily fetched. This is for everyone else, all of the various projects that have public downloads.
Here is the thing: sysadmins are not necessarily browsing your website
on the machine where they actually want the source code. In fact it's
almost certain that they aren't, since very few sysadmins run Firefox or
Chrome on their servers. What sysadmins want to do is use 'Copy Link
Location' on the (nominal) URL of your project's distribution tarball,
open a connection to the server, type 'wget <pasted URL>' on it, and
wind up with a sensibly named tarball (or zip file or whatever) of your
source afterwards.
There are at least two ways that this goes wrong. Sadly I am going to have to pick on the Django web framework for the first one, because it inspired my tweet. The download URL for Django 1.5.1 is:
https://www.djangoproject.com/download/1.5.1/tarball/
If you feed this URL to wget, you do not get something called
'Django-1.5.1.tar.gz' but instead a file called 'index.html' (which is
the gzip'd tarball that you want, just with the wrong name). This is
because wget operates in a very straightforward way; it puts whatever
it fetches in a file named after the last component (or index.html if
the last component looked like a directory, as it did here). Wget does
have an option to change this, --content-disposition, but I had to
look it up in the manpage. Sysadmins do not appreciate being forced to
look up (and then type) long options to wget to get your tarballs.
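To make the naming rule concrete, here is a rough Python sketch of it. This is a simplified model of what I described above, not wget's actual code; it ignores query strings and other edge cases, and the function name and the second URL in the usage note are made up for illustration.

```python
from urllib.parse import urlsplit
import posixpath


def wget_default_name(url):
    """Approximate the local file name wget picks for a URL by default.

    Simplified model: take the last component of the URL path, and fall
    back to index.html when the path ends in a slash (looks like a
    directory).
    """
    path = urlsplit(url).path
    name = posixpath.basename(path)
    return name or "index.html"
```

Under this model the Django URL above comes out as 'index.html', while a hypothetical URL such as https://example.org/dist/Django-1.5.1.tar.gz comes out as 'Django-1.5.1.tar.gz', which is the name a sysadmin wants.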
The fix for this is straightforward: your download URL should have
a last component that is the name of the distribution tarball or
applicable file. Then wget will do the right thing.
(Github does a variant of this. The stated URLs of a zipfile of
a repo are things like <user>/<project>/archive/master.zip, but
the fetched file is supposed to be called <project>-master.zip.
Browsers that pay attention to the HTTP Content-Disposition
header will save it under that file name; wget will at least
use master.zip.)
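For illustration, this is roughly how a client that honors Content-Disposition could pull the suggested file name out of the header. It is a minimal sketch using Python's standard email parser, not how any particular browser or wget does it, and the header value in the usage note is an invented example rather than a captured response.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() reads the 'filename' parameter of Content-Disposition.
    return msg.get_filename()
```

Given a value like 'attachment; filename=project-master.zip' this returns 'project-master.zip'; a client that ignores the header is left with whatever the URL ends in.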
The other really bad thing you can do is what Sourceforge at least used
to do. The nominal 'download' links on Sourceforge projects didn't go
directly to the files (despite appearing as if they did); instead they
went to an interstitial HTML page that told you about mirrors and
automatically started a download (I assume through the use of an HTML
'<meta http-equiv="refresh" ...>' in the page). This is of course
completely impossible to feed to wget, which doesn't interpret this
HTML <meta> tag at all. You should not do this sort of trickery; your
download links should actually be links to the files, not to any sort of
interstitial experience. If you need to make people go through a mirror,
do that with an HTTP redirect and put an explanation about it on your
download page.
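As a sketch of the redirect approach, here is a tiny handler built on Python's standard http.server. Everything specific in it is an assumption for illustration: the mirror base URL is hypothetical, the handler and function names are mine, and a real site would do this in its actual web server or framework. The point is only that the download URL answers with an HTTP redirect to the file, not with an HTML page.

```python
from http.server import BaseHTTPRequestHandler
import posixpath

# Hypothetical mirror location, for illustration only.
MIRROR_BASE = "https://mirror.example.org/pub"


def redirect_target(request_path):
    """Map a requested download path to the same file name on the mirror."""
    filename = posixpath.basename(request_path)
    return f"{MIRROR_BASE}/{filename}"


class DownloadRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Send an HTTP redirect (302) that points straight at the file,
        # so command line clients can follow it without parsing HTML.
        self.send_response(302)
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()
```

To try it, pass DownloadRedirectHandler to http.server.HTTPServer and fetch a path that ends in the tarball's name. Keeping the file name as the last component of the original URL still matters here, since by default wget names the local file after the URL you gave it rather than after the redirect target.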