== FastCGI's encoding mistakes and how not to design a wire protocol

One of the things that [[FastCGI http://www.fastcgi.com/]] needs to transfer between the web server and the FastCGI server is a bunch of name/value pairs; this is used, for example, to transfer the traditional CGI environment variables for a request to the FastCGI server. FastCGI being FastCGI, it has defined a binary record for this; each name/value pair is encoded as the length of the name, the length of the value, and then the name and value as bytes (with no terminating zero). In order to reduce the overhead for short names and/or values, FastCGI allows the length of either to be encoded in either a short one-byte form or a four-byte form. To tell them apart the protocol uses the high bit of the first byte (which is the high byte) of each length; if the bit is set, it is a four-byte length (and the high bit is masked out when the length is computed). Unset, and it is a one-byte length. This allows FastCGI to save up to six bytes per name/value pair, reducing the protocol overhead from eight bytes to two and, as the specification is careful to note, allows a FastCGI application to immediately allocate a correctly sized buffer for the pair.

(All of this is in [[the FastCGI specification, section 3.4 http://www.fastcgi.com/drupal/node/6?q=node/22#S3.4]].)

It is genuinely hard to count all of the terrible mistakes in this wire encoding.

* this significantly complicates decoding and encoding name/value pairs.
* this multi-way encoding optimizes a non-problem. FastCGI is used either on the local machine (over Unix domain sockets or connections to localhost) or at worst over local gigabit network links. Neither environment is short on bandwidth compared to web server request rates. (The FastCGI protocol specs make high-minded appeals to the X Windows protocol, completely ignoring the vast difference in request rates between an X server and a web server.)
* the four-byte length encoding is completely over-engineered, since it can encode names or values that are up to 2 *gigabytes* in size. Web servers do not get and do not accept header names or header values that are anywhere near that large. (The HTTP 1.1 specification does not appear to restrict the maximum length of header names or header values, so if you care about utter spec compliance even a limit of 2 GBytes is too low.)
* in order to 'immediately allocate' a correctly sized buffer for the name and value, you must make four separate reads (conceptually): one for the first byte of the name length, possibly another one to get the next three bytes, one for the first byte of the value length, and finally possibly another one to get the next three bytes. (Nor can you read a bunch of bytes into a buffer and then start sorting out the mess; values are allowed to be empty, so the shortest possible name/value record is only three bytes long.) Only after all this can you efficiently make that one memory allocation. (The small decoding sketch at the end of this entry makes the dance concrete.)
* in practice you never send just one name/value pair; you send a bunch of them at once. If FastCGI wanted to reduce the overhead of sending name/value pairs, it should have created a single record type that covered a *sequence* of name/value pairs.
* all of this efficiency ignores the fact that this bit of FastCGI is a stream protocol layered over a (multiplexed) message protocol, so you are not reading straight from the raw network into your single memory allocation. You are already copying data around from hither to yon (from the low-level message to your stream of parameter data) and you may already be doing fragment reassembly in arbitrary places.
* still on the efficiency front, since name/value data is not terminated with null bytes, many C-based environments will need to copy it in order to create C-style null-terminated strings.

Essentially, everything that FastCGI has done here is a micro-optimization that misses the forest for the trees. Not only is it optimizing things that don't matter, it is the protocol equivalent of carefully polishing your bubble sort routine when you ought to be using quicksort. If you want to optimize a wire protocol in this way, you must take a whole-stack view; you must look at the flow of data through the whole decoding process (from the point where you get some bytes from the operating system on up), not just at one small portion of it, and you must look at the protocol overhead of an entire transaction (taking all layers into account).

Of course, all of that is a lot less showy than coming up with a complex encoding scheme for lengths that saves six bytes.
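(As a concrete illustration of the decoding dance, here is a minimal sketch in Python of reading one name/value pair the way section 3.4 describes. The function names and the sample pair are purely my own illustration, not code from the spec or any real FastCGI library. Each length needs one read for its first byte and possibly another read for the remaining three bytes, and only after up to four such reads can you allocate those 'correctly sized' buffers.)

    # Sketch of FastCGI section 3.4 name/value pair decoding (illustrative only).
    import io
    import struct

    def read_length(stream):
        """Read one FastCGI length: 1 byte, or 4 bytes if the high bit is set."""
        first = stream.read(1)[0]
        if first & 0x80 == 0:
            return first                  # short form: 0..127
        rest = stream.read(3)
        # long form: 31-bit big-endian value with the flag bit masked out
        return struct.unpack(">I", bytes([first & 0x7f]) + rest)[0]

    def read_pair(stream):
        """Read a single name/value pair: two lengths, then the raw bytes."""
        nlen = read_length(stream)
        vlen = read_length(stream)
        return stream.read(nlen), stream.read(vlen)

    # A hypothetical encoded pair: short name length, long (four-byte) value length.
    encoded = bytes([11]) + struct.pack(">I", 10 | 0x80000000)
    encoded += b"SCRIPT_NAME" + b"/cgi-bin/x"
    print(read_pair(io.BytesIO(encoded)))  # (b'SCRIPT_NAME', b'/cgi-bin/x')

(Even this toy version has to special-case the two length forms and make several reads before it can touch the actual name and value bytes, which is exactly the decoding complication the first bullet above is complaining about.)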