Trailing text, a subtle gotcha with Go's fmt.Sscanf

December 8, 2016

I've written some Go code that needed to do some simple scanf-like parsing of strings. When I did this, I peered at the fmt package documentation for the Sscanf function and confidently wrote something like the following code:

_, e := fmt.Sscanf(input, "%d:%d", &hr, &min)
if e != nil {
   return ..., e
}

This code has a bug, or perhaps some people would call it an unintended feature (for me it's definitely a bug). Namely, if you feed this code the input string '10:15 this is trailing text', you will not get an error message. Your code will parse the 10:15 part of the input string and silently ignore the rest, or more exactly Sscanf() will.
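To see this for yourself, here is a minimal sketch of a complete program. The parseLoose function is a name I've made up for illustration; it just wraps the Sscanf call from above.

```go
package main

import "fmt"

// parseLoose is a hypothetical helper that wraps the Sscanf call
// from the snippet above.
func parseLoose(input string) (hr, min int, err error) {
	_, err = fmt.Sscanf(input, "%d:%d", &hr, &min)
	return hr, min, err
}

func main() {
	hr, min, err := parseLoose("10:15 this is trailing text")
	// Sscanf stops once the format is satisfied, so the trailing
	// text is not reported and err is nil.
	fmt.Println(hr, min, err) // prints "10 15 <nil>"
}
```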

At this point you might wonder how to either force Sscanf to produce an error on trailing text or detect that you have trailing text. As far as I can tell there is no straightforward way, but there are two options depending on how paranoid you want to be (and where you get your input string from). The simple option is to add an explicit newline to your format string:

_, e := fmt.Sscanf(input, "%d:%d\n", &hr, &min)

This will parse an input string of '10:15' (with no trailing newline) without raising an error, and will detect most cases of trailing input by raising an error. It won't detect the relatively perverse case of something such as '10:15\n and more', because the '\n' in the input matches the expected newline and then Sscanf stops looking.

(At the moment you can stack more than one \n on the end of your format string and still parse a plain '10:15', so you can add some more caution and/or paranoia if you want. Sufficiently perverse input can always get past you, though, because as far as I can see there is no way to tell Sscanf that what you really mean is an EOF.)
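As a rough sketch of how the newline version behaves on the three inputs discussed above (parseNewline is a hypothetical helper name, and this reflects the behavior of the fmt package as I understand it, which may not be guaranteed):

```go
package main

import "fmt"

// parseNewline is a hypothetical helper; the only change from the
// original call is the trailing "\n" in the format string.
func parseNewline(input string) (hr, min int, err error) {
	_, err = fmt.Sscanf(input, "%d:%d\n", &hr, &min)
	return hr, min, err
}

func main() {
	inputs := []string{
		"10:15",                       // accepted: end of input matches the newline
		"10:15 this is trailing text", // rejected with an error
		"10:15\n and more",            // accepted: the perverse case slips through
	}
	for _, in := range inputs {
		hr, min, err := parseNewline(in)
		fmt.Printf("%q -> %d %d err=%v\n", in, hr, min, err)
	}
}
```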

The complicated hack is to add an extra string match to your format string and look at how many items were successfully parsed:

n, _ := fmt.Sscanf(input, "%d:%d%s", &hr, &min, &junk)
if n != 2 {
   return ..., errors.New("Bad input")
}

Among other drawbacks, we have to ignore the error that Sscanf returns; it doesn't tell us whether or not the input was good, and when it has an error value it may be meaningless for our caller.
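Put together as a self-contained sketch, the count-based check might look like this. The parseWithJunk name and the error text are placeholders of my own, not anything from the fmt package.

```go
package main

import (
	"errors"
	"fmt"
)

// parseWithJunk is a hypothetical helper. It asks Sscanf for a third
// item and treats the input as good only if exactly two were parsed.
func parseWithJunk(input string) (hr, min int, err error) {
	var junk string
	// The error is deliberately ignored: for good input Sscanf reports
	// an error (it ran out of input for %s), and for bad input with
	// trailing text it reports none.
	n, _ := fmt.Sscanf(input, "%d:%d%s", &hr, &min, &junk)
	if n != 2 {
		return 0, 0, errors.New("bad input")
	}
	return hr, min, nil
}

func main() {
	for _, in := range []string{"10:15", "10:15 this is trailing text", "10"} {
		hr, min, err := parseWithJunk(in)
		fmt.Printf("%q -> %d %d err=%v\n", in, hr, min, err)
	}
}
```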

My suspicion is that in cases like this I am probably pushing Sscanf too far and it's actually the wrong tool for the job. In most cases the right answer is probably matching things with regular expressions so that I can directly say what I mean. Or, in this case, just using time.ParseInLocation even though it's less convenient and I'd have to do a bunch of manipulation on the result.

(Regular expressions are probably slower than Sscanf and I'd have to use strconv to turn the results into numbers, but my code here is not exactly performance critical.)
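For comparison, here is a sketch of what the regular expression version could look like. The pattern, the timeRE variable, and the parseRegexp name are all my own assumptions about what "say what I mean" would be here; the anchors are what reject trailing text.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// timeRE is an assumed pattern: digits, a colon, digits, and nothing
// else. Without the multi-line flag, ^ and $ match only at the very
// start and end of the text.
var timeRE = regexp.MustCompile(`^(\d+):(\d+)$`)

// parseRegexp is a hypothetical helper that rejects any input that is
// not exactly of the form NN:NN.
func parseRegexp(input string) (hr, min int, err error) {
	m := timeRE.FindStringSubmatch(input)
	if m == nil {
		return 0, 0, fmt.Errorf("bad time %q", input)
	}
	// Atoi can still fail, for example on a number too large for an int.
	if hr, err = strconv.Atoi(m[1]); err != nil {
		return 0, 0, err
	}
	if min, err = strconv.Atoi(m[2]); err != nil {
		return 0, 0, err
	}
	return hr, min, nil
}

func main() {
	for _, in := range []string{"10:15", "10:15 this is trailing text", "10:15\n and more"} {
		hr, min, err := parseRegexp(in)
		fmt.Printf("%q -> %d %d err=%v\n", in, hr, min, err)
	}
}
```

This does no range checking on the hour and minute values; that would still have to be done separately.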

Comments on this page:

I'm confused about why you consider "10:15\n and more" to be perverse input. Shouldn't the "and more" be left in the buffer to be used as later input data? For example, if your script had a y/n question after the time input, then "10:15\ny" would be valid input (and potentially useful for scripting).

By cks at 2016-12-08 12:23:03:

The simple answer is that when you're calling fmt.Sscanf() in particular, there is no buffer. All you have is the input string, and Sscanf doesn't tell you where it stopped parsing so you can use the rest of the string for something; effectively Sscanf consumes the entire string. If someone stuffs newlines in that string when you aren't expecting them to be there, as you aren't if you're using the ending newline to create an error on trailing text, that's perverse input.

(In my specific case, the input string is the (Unix) command line arguments to a program. Embedding a newline in command line arguments is definitely being perverse; many scripts and commands will malfunction in various fun ways if you do it.)

Written on 08 December 2016.

Last modified: Thu Dec 8 00:37:48 2016