2015-02-23
In shell programming, I should be more willing to write custom tools
One of the very strong temptations in Unix shell programming is to use and abuse existing programs in order to get things done, rather than going to the hassle of writing your own custom tool to do just what you want. I don't want to say that this is wrong, exactly, but it does have its limits; in a variant of the general shell programming Turing tar pit, you can spend a lot of time banging your head against those limits or you can just write something that is specific to your problem and so does what you want. I have a bias against writing my own custom tools, for various reasons, but this bias is probably too strong.
All of that sounds really abstract, so let me get concrete about the case that sparked this thought. I have a shell script that decides what to do with URLs that I click on in my Twitter client, which is not as simple as 'hand them to my browser' for various reasons. As part of this script I want to reach through the HTTP redirections imposed by the various levels of URL shorteners that people use on Twitter.
If you want to extract HTTP redirections on a generic Unix system with existing tools, the best way I know of is to abuse curl along with a few other programs:
curl -siI "$URL" | grep -i '^location:' | awk '{print $2}' | tr -d '\r'
Put plainly, this is a hack. We aren't actually getting the redirection as such; we're getting curl to make a HEAD request, which should return only headers, dumping those headers, and then trying to pick out the HTTP redirection header. We aren't verifying that we actually got an HTTP redirect status code, I think that the server could do wacky things with the Location: header as well, and we certainly aren't verifying that the server only gave us headers. Bits of this incantation evolved over time as I ran into limitations in it; both the case-independent grep and the entire tr were later additions to cope with unusual servers. The final nail here is that curl on Fedora 21 has problems talking to CloudFlare HTTPS sites, and that affects some specialized URL shorteners I want to strip redirections from.
(You might think that servers will never include content bodies with HEAD replies, but from personal experience I can say that a very similar mistake is quite easy to make in a custom framework.)
The right solution here is to stop torturing curl and to get or write a specialized tool to do the job I want. This tool would specifically check that we got an HTTP redirect and then output only the target URL from the redirect. Any language with a modern HTTP client library should make this easy and fast to write; I'd probably use Go just because.
(In theory I could also use a more focused 'make HTTP request and extract specific header <X>' tool to do this job. I don't know if any exist.)
Why didn't I write a custom tool when I started, or at least when I started running into issues with curl? Because each time it seemed like less work to use existing tools and hack things up a bit instead of going all the way to writing my own. That's one of the temptations of a Turing tar pit; every step into the tar can feel small and oh so reasonable, and only at the end do you realize that you're well and truly mired.
(Yes, there are drawbacks to writing custom tools instead of bending standard ones to your needs. That's something for another entry, though.)
PS: writing custom tools that do exactly what you want and what your script needs also has the side effect of making your scripts clearer, because there is less code that's just there to wrap and manipulate the core standard tool. Three of the four commands in that 'get a redirection' pipeline are just there to fix up curl's output, after all.
2015-02-17
Your example code should work and be error-free
This is one of those cases where I'm undoubtedly preaching to the choir, but one thing that sends me into a moderate rage is articles about programming that helpfully include sample code as illustrations but then have errors in the sample code. The worst errors are subtle errors, things where the code almost works but occasionally is going to blow up.
Actually, let me clarify that. It's common to omit error checking in sample code, and sometimes the demands of space and comprehension mean that you can't really put in all of the code to handle the full complexity of a situation with all of its corner cases. But if you do this you should also add a note about it, especially about unhandled corner cases. Omitting error checks (especially without a note) is more forgivable in a language with exceptions, since the code will at least malfunction obviously.
Perhaps it is not obvious to my readers why this is a bad idea. The answer is simple: sooner or later someone is going to either copy your sample code into their program more or less as is or use it as a guideline to write their own. After all, you've already given them the algorithm and the structure of what they want to do; why shouldn't they copy your literal code rather than rewrite it from scratch based on their understanding of your article? If your code is good, it's actually less error-prone to just copy it. Of course, the very people who need your article and are going to copy your code are the people who are worst equipped to spot the errors and omitted corner cases lurking in it.
The more subtle problem is that anyone who does know enough to spot errors in your sample code is going to immediately distrust your entire article. If you have buggy code in your examples, what else have you screwed up? Everything you're saying is suspect, especially if the flaw is relatively fundamental, like a concurrency race, as opposed to a relatively simple mistake.
(Since I've put code examples into entries on Wandering Thoughts, this is of course kind of throwing stones in what may well be a glass house. I know I've had my share of errors that commenters have pointed out, although I do try to run everything I put in an entry in the exact form it appears here.)
(This entry is sadly brought to you by my irritation with this article on some Go concurrency patterns. It contains potentially interesting stuff but also code with obvious, classic, and un-commented-on concurrency races, so I can't trust it at all. (Well, hopefully the concurrency races are as obvious as I think they are.))