November 17, 2011

I recently read Evan Martin's entry on quoting and escaping, which brought to mind one of those classic C and Unix programming mistakes that hopefully people don't make very much any more.

Suppose that ustr is a string that comes from user input. The classic form of this particular quoting bug is to write printf(ustr) (or perhaps fprintf(out, ustr)); a much less obvious version of this is syslog(pri, ustr).

(Some people might think that the printf() version is silly since there are nominally much more efficient ways to print plain strings, but in practice plenty of people find it much more efficient on programmer time to use printf() or fprintf() to output everything.)
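For reference, one of those other ways is fputs(), which writes its string argument as-is and never interprets it as a format. This is only a minimal sketch; print_plain() is a name I made up for illustration, not a standard function.

```c
#include <stdio.h>

/* Hypothetical helper: writes s to out exactly as given.
   fputs() takes a plain string, not a format, so any '%' in s
   is just another character. Returns 0 on success, -1 on error. */
int print_plain(FILE *out, const char *s)
{
    return fputs(s, out) == EOF ? -1 : 0;
}
```

Calling print_plain(stdout, ustr) is safe no matter what ustr contains, because there is no format for the user to influence.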

The problem with all of these is that the first argument to printf() and syslog() is not a plain string to be printed; it is a format. Like many quoting bugs, this goes undetected for much of the time because a true plain string is also a valid format that formats to itself. However, if someone supplies a 'plain string' that includes printf formatting directives, things rapidly go off the rails; if you are lucky the program crashes right away so you can immediately notice the problem.
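To make the 'format, not plain string' point concrete, here is a small sketch that sticks to constant formats so that it is safe to run. The function names are made up for illustration. The first call works only because its format happens to contain no '%'; the second shows the same argument position being interpreted as directives.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s contains a '%', which means it
   would NOT simply format to itself if used as a printf format. */
int has_format_directive(const char *s)
{
    return strchr(s, '%') != NULL;
}

/* A plain string used as a format: it formats to itself. */
int demo_plain(char *buf, size_t n)
{
    return snprintf(buf, n, "disk is full");
}

/* The same argument position, now with directives: %d consumes an
   argument and %% produces a single '%'. */
int demo_directive(char *buf, size_t n)
{
    return snprintf(buf, n, "disk is %d%% full", 93);
}

/* The buggy pattern would be snprintf(buf, n, ustr) with ustr taken
   from the user; if ustr contained "%d" or "%s", snprintf would go
   looking for arguments that were never passed. */
```

With a user-supplied ustr you have no control over which of these two cases you are in, which is the whole problem.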

(If you are sufficiently unlucky, this is an exploitable security vulnerability. And yes, people have written code like this that made it into important programs.)

The right solution is of course to quote the user-supplied string. The simple way to do this is to supply your own simple format string: syslog(pri, "%s", ustr) (or the equivalent with printf() et al). Of course at this point you might want to think about other bad characters that could appear in ustr and how you want to display the message to make it clear that the string comes from the user, not from your program's internals.
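A rough sketch of what this can look like is below. The helper names, the 256-byte scratch buffer, the 'user input:' prefix and the choice to turn non-printable bytes into '?' are all my own assumptions for illustration; what you actually want to do with odd characters depends on where the message is going.

```c
#include <ctype.h>
#include <stdio.h>
#include <syslog.h>

/* The basic fix: the format is a constant and ustr is only ever
   data for %s. */
void log_user_string(int pri, const char *ustr)
{
    syslog(pri, "%s", ustr);
}

/* Hypothetical helper: builds a display version of ustr in buf.
   Non-printable bytes become '?' and the result is wrapped in quotes
   so it is visibly user data. Input beyond 255 bytes is truncated.
   Returns what snprintf() returns. */
int quote_user_string(char *buf, size_t n, const char *ustr)
{
    char clean[256];
    size_t i;

    for (i = 0; ustr[i] != '\0' && i < sizeof(clean) - 1; i++)
        clean[i] = isprint((unsigned char)ustr[i]) ? ustr[i] : '?';
    clean[i] = '\0';

    return snprintf(buf, n, "user input: \"%s\"", clean);
}
```

Here a hostile ustr such as "%s%n" is printed literally, because it is never in the format position.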

This bug can happen with any function that takes a format string as an argument and in any language, not just C. C is just a more dangerous language to have it happen in because C generally cannot check that the arguments actually passed to a variable-argument function like printf() match what the format asks for.
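Your own functions are not immune, since anything that passes a format through to the printf() family inherits the problem. As a sketch, here is a made-up log_msg() wrapper. The format attribute is a GCC extension (Clang also accepts it), not standard C; it asks the compiler to check calls to log_msg() the same way it checks printf(), and with GCC's -Wformat-security a call like log_msg(buf, n, ustr) should then draw a warning.

```c
#include <stdarg.h>
#include <stdio.h>

/* Only use the attribute where the compiler is known to support it. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_LIKE(fmt_idx, arg_idx)
#endif

/* Hypothetical wrapper: argument 3 is the format, the variable
   arguments start at argument 4. */
int log_msg(char *buf, size_t n, const char *fmt, ...) PRINTF_LIKE(3, 4);

int log_msg(char *buf, size_t n, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(buf, n, fmt, ap);  /* fmt is interpreted here */
    va_end(ap);
    return len;
}
```

The safe way to call it with user input is the same as before: log_msg(buf, sizeof buf, "%s", ustr).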

(I have opinions on how languages can avoid this entire class of bugs, but that's something for another entry.)

(Perhaps this is not strictly a quoting bug. It is in my mind, but I may have a somewhat odd view of what constitutes one.)

Comments on this page:

From at 2011-11-17 18:54:46:

What other ways are there in C to print strings?

From at 2011-11-17 19:57:07:

GCC can warn if the format string isn't a constant, which it almost always should be ... so it's pretty easy to avoid now.

Written on 17 November 2011.
Last modified: Thu Nov 17 00:39:57 2011
