In shell programming, I should be more willing to write custom tools
One of the very strong temptations in Unix shell programming is to use and abuse existing programs in order to get things done, rather than going to the hassle of writing your own custom tool to do just what you want. I don't want to say that this is wrong, exactly, but it does have its limits; in a variant of the general shell programming Turing tar pit, you can spend a lot of time banging your head against those limits or you can just write something that is specific to your problem and so does what you want. I have a bias against writing my own custom tools, for various reasons, but this bias is probably too strong.
All of that sounds really abstract, so let me get concrete about the case that sparked this thought. I have a shell script that decides what to do with URLs that I click on in my Twitter client, which is not as simple as 'hand them to my browser' for various reasons. As part of this script I want to reach through the HTTP redirections imposed by the various levels of URL shorteners that people use on Twitter.
If you want to get HTTP redirections on a generic Unix system with existing tools, the best way I know of to do this is to abuse curl along with some other things:

    curl -siI "$URL" | grep -i '^location:' | awk '{print $2}' | tr -d '\r'
Put plainly, this is a hack. We aren't actually getting the redirection as such; we're getting curl to make a request that should only have headers, dumping the headers, and then trying to pick out the HTTP redirection header. We aren't verifying that we actually got an HTTP redirect status code, and I think the server could do wacky things with the Location: header as well; we certainly aren't verifying that the server only gave us headers. Bits of this incantation evolved over time as I ran into limitations in it; both the case-independent grep and the entire tr were later additions to cope with unusual servers. The final nail here is that curl on Fedora 21 has problems talking to CloudFlare HTTPS sites, and that affects some specialized URL shorteners I want to strip redirections from.
(You might think that servers will never include content bodies with HEAD replies, but from personal experience I can say that a very similar mistake is quite easy to make in a custom framework.)
The right solution here is to stop torturing curl and to get or write a specialized tool to do the job I want. This tool would specifically check that we got an HTTP redirection and then output only the target URL from the redirect. Any language with a modern HTTP framework should make this easy and fast to write; I'd probably use Go just because.
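To make that concrete, here's a minimal sketch of what such a tool might look like in Go. The program (I'll call it redirtarget) is my illustration, not an existing tool, and a production version would want at least a request timeout:

    // redirtarget prints the redirect target of a URL and fails if the
    // response isn't actually an HTTP redirect. The name and details are
    // illustrative; a real version would want a request timeout and so on.
    package main

    import (
        "fmt"
        "net/http"
        "os"
    )

    func main() {
        if len(os.Args) != 2 {
            fmt.Fprintln(os.Stderr, "usage: redirtarget URL")
            os.Exit(2)
        }
        client := &http.Client{
            // Hand back the first response instead of following redirects.
            CheckRedirect: func(req *http.Request, via []*http.Request) error {
                return http.ErrUseLastResponse
            },
        }
        resp, err := client.Head(os.Args[1])
        if err != nil {
            fmt.Fprintf(os.Stderr, "redirtarget: %v\n", err)
            os.Exit(1)
        }
        defer resp.Body.Close()
        switch resp.StatusCode {
        // Accept only genuine HTTP redirect status codes.
        case http.StatusMovedPermanently, http.StatusFound, http.StatusSeeOther,
            http.StatusTemporaryRedirect, http.StatusPermanentRedirect:
            loc, err := resp.Location()
            if err != nil {
                fmt.Fprintln(os.Stderr, "redirtarget: redirect with no usable Location: header")
                os.Exit(1)
            }
            fmt.Println(loc.String())
        default:
            fmt.Fprintf(os.Stderr, "redirtarget: not a redirect: %s\n", resp.Status)
            os.Exit(1)
        }
    }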
(In theory I could also use a more focused 'make HTTP request and extract specific header <X>' tool to do this job. I don't know if any exist.)
Why didn't I write a custom tool when I started, or at least when I started running into issues with curl? Because each time it seemed like less work to use existing tools and hack things up a bit instead of going all the way to writing my own. That's one of the temptations of a Turing tar pit; every step into the tar can feel small and oh so reasonable, and only at the end do you realize that you're well and truly mired.
(Yes, there are drawbacks to writing custom tools instead of bending standard ones to your needs. That's something for another entry, though.)
PS: writing custom tools that do exactly what you want and what your script needs also has the side effect of making your scripts clearer, because there is less code that's just there to wrap and manipulate the core standard tool. Three of the four commands in that 'get a redirection' pipeline are just there to fix up curl's output, after all.
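To see the difference in a script, compare the original pipeline with what the hypothetical redirtarget sketched above would reduce it to:

    # before: one command to ask, three to clean up the answer
    target="$(curl -siI "$URL" | grep -i '^location:' | awk '{print $2}' | tr -d '\r')"
    # after: one command that does exactly the job
    target="$(redirtarget "$URL")"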