Wandering Thoughts archives

2015-02-23

In shell programming, I should be more willing to write custom tools

One of the very strong temptations in Unix shell programming is to use and abuse existing programs in order to get things done, rather than going to the hassle of writing your own custom tool to do just what you want. I don't want to say that this is wrong, exactly, but it does have its limits; in a variant of the general shell programming Turing tar pit, you can spend a lot of time banging your head against those limits or you can just write something that is specific to your problem and so does what you want. I have a bias against writing my own custom tools, for various reasons, but this bias is probably too strong.

All of that sounds really abstract, so let me get concrete about the case that sparked this thought. I have a shell script that decides what to do with URLs that I click on in my Twitter client, which is not as simple as 'hand them to my browser' for various reasons. As part of this script I want to reach through the HTTP redirections imposed by the various levels of URL shorteners that people use on Twitter.

If you want to get HTTP redirections on a generic Unix system with existing tools, the best way I know of to do this is to abuse curl along with some other things:

curl -siI "$URL" | grep -i '^location:' | awk '{print $2}' | tr -d '\r'

Put plainly, this is a hack. We aren't actually getting the redirection as such; we're getting curl to make a request that should only have headers, dumping the headers, and then trying to pick out the HTTP redirection header. We aren't verifying that we actually got an HTTP redirect status code, I think that the server could do wacky things with the Location: header as well, and we certainly aren't verifying that the server only gave us headers. Bits of this incantation evolved over time as I ran into limitations in it; both the case-independent grep and the entire tr were later additions to cope with unusual servers. The final nail here is that curl on Fedora 21 has problems talking to CloudFlare HTTPS sites, and that affects some specialized URL shorteners I want to strip redirections from.

(You might think that servers will never include content bodies with HEAD replies, but from personal experience I can say that a very similar mistake is quite easy to make in a custom framework.)

The right solution here is to stop torturing curl and to get or write a specialized tool to do the job I want. This tool would specifically check that we got an HTTP redirection and then output only the target URL from the redirect. Any language with a modern HTTP framework should make this easy and fast to write; I'd probably use Go just because.

(In theory I could also use a more focused 'make HTTP request and extract specific header <X>' tool to do this job. I don't know if any exist.)

Why didn't I write a custom tool when I started, or at least when I started running into issues with curl? Because each time it seemed like less work to use existing tools and hack things up a bit instead of going all the way to writing my own. That's one of the temptations of a Turing tar pit; every step into the tar can feel small and oh so reasonable, and only at the end do you realize that you're well and truly mired.

(Yes, there are drawbacks to writing custom tools instead of bending standard ones to your needs. That's something for another entry, though.)

PS: writing custom tools that do exactly what you want and what your script needs also has the side effect of making your scripts clearer, because there is less code that's just there to wrap and manipulate the core standard tool. Three of the four commands in that 'get a redirection' pipeline are just there to fix up curl's output, after all.

WriteCustomToolsForScripts written at 01:52:07

2015-02-17

Your example code should work and be error-free

This is one of those cases where I'm undoubtedly preaching to the choir, but one thing that sends me into a moderate rage is articles about programming that helpfully include sample code as illustrations but then have errors in the sample code. The worst errors are subtle errors, things where the code almost works but occasionally is going to blow up.

Actually, let me clarify that. It's common to omit error checking in sample code and sometimes the demands of space and comprehension mean that you can't really put in all of the code to handle the full complexity of a situation with all of its corner cases. But if you do this you should also add a note about it, especially about unhandled corner cases. Omitting error checks (especially without a note) is more forgivable in a language with exceptions, since the code will at least malfunction obviously.

Perhaps it is not obvious to my readers why this is a bad idea. The answer is simple: sooner or later someone is going to either copy your sample code into their program more or less as is or use it as a guideline to write their own. After all, you've already given them the algorithm and the structure of what they want to do; why shouldn't they copy your literal code rather than rewrite from scratch based on their understanding of your article? If your code is good, it's actually less error-prone to just copy it. Of course the exact people who need your article and are going to copy your code are the people who are the worst equipped to spot the errors and omitted corner cases lurking in it.

The more subtle problem is that anyone who does know enough to spot errors in your sample code is going to immediately distrust your entire article. If you have buggy code in your examples, what else have you screwed up? Everything you're saying is suspect, especially if the flaw is relatively fundamental, like a concurrency race, as opposed to a relatively simple mistake.

(Since I've put code examples into entries on Wandering Thoughts, this is of course kind of throwing stones in what may well be a glass house. I know I've had my share of errors that commenters have pointed out, although I do try to run everything I put in an entry in the exact form it appears here.)

(This entry is sadly brought to you by my irritation with this article on some Go concurrency patterns. It contains potentially interesting stuff but also code with obvious, classic, and un-commented-on concurrency races, so I can't trust it at all. (Well, hopefully the concurrency races are as obvious as I think they are.))

ExamplesShouldWork written at 01:57:15

