One of the things that FastCGI needs to
transfer between the web server and the FastCGI server are a bunch of
name/value pairs; this is used, for example, to transfer the traditional
CGI environment variables for a request to the FastCGI server. FastCGI
being FastCGI, it has defined a binary record for this; each name/value
pair is encoded as the length of the name, the length of the value, and
then the name and value as bytes (with no terminating zero).
In order to reduce the overhead for short names and/or values, FastCGI
allows the length of either to be encoded in either a short one byte
form or a four-byte form. To tell them apart the protocol uses the high
bit of the first byte (which is the high byte) of each length; if the
bit is set, it is a four-byte length (and the high bit is masked out
when the length is computed). Unset, and it is a one-byte length. This
allows FastCGI to save up to six bytes per name/value pair, reducing the
protocol overhead from eight bytes to two and, as the specification is
careful to note, allows a FastCGI application to immediately allocate
a correctly sized buffer for the pair.
(All of this is in the FastCGI specification, section 3.4.)
It is genuinely hard to count all of the terrible mistakes in this
- this significantly complicates decoding and encoding name/value
- this multi-way encoding optimizes a non-problem. FastCGI is used
either on the local machine (over Unix domain sockets or connections
to localhost) or at the worst over local gigabit network links.
Neither environment is short on bandwidth compared to web server
(The FastCGI protocol specs make high-minded appeals to the X
Windows protocol, completely ignoring the vast difference in
request rates between an X server and a web server.)
- the four-byte length encoding is completely over-engineered,
since it can encode names or values that are up to 2 gigabytes
in size. Web servers do not get and do not accept header names
or header values that are anywhere near that large.
(The HTTP 1.1 specification does not appear to restrict the maximum
length of header names or header values, so if you care about
utter spec compliance even a limit of 2 GBytes is too low.)
- in order to 'immediately allocate' a correct sized buffer for
the name and value, you must make four separate reads (conceptually):
one for the first byte of the name length, possibly another one
to get the next three bytes, one for the first byte of the value
length, and finally possibly another one to get the next three bytes.
(Nor can you read a bunch of bytes into a buffer and then start
sorting out the mess. Values are allowed to be empty, so the shortest
possible name/value record is only three bytes long.)
After this you can efficiently make that one memory allocation.
- in practice you never send just one name/value pair; you send a
bunch of them at once. If FastCGI wanted to reduce the overhead
of sending name/value pairs, it should create a single record
that covered a sequence of name/value pairs.
- All of this efficiency ignores the fact that this bit of FastCGI
is a stream protocol layered over a (multiplexed) message protocol,
so you are not reading straight from the raw network into your
single memory allocation. You are already copying data around
from hither to yon (from the low-level message to your stream of
parameter data) and you may already be doing fragment reassembly
in arbitrary places.
- still on the efficiency front, since name/value data is not terminated
with null bytes many C-based environments will need to copy it
in order to create C style null-terminated strings.
Essentially, everything that FastCGI has done here is a
micro-optimization that misses the forest for the trees. Not only is it
optimizing things that don't matter, it is the protocol equivalent of
carefully polishing your bubble sort routine when you ought to be using
quicksort. If you want to optimize a wire protocol in this way, you
must take a whole stack view; you must look at the flow of data through
the whole decoding process (from the point where you get some bytes from
the operating system on up), not just at one small portion of it, and
you must look at the protocol overhead of an entire transaction (taking
all layers into account).
Of course, all of that is a lot less showy than coming up with a complex
encoding scheme for lengths that saves six bytes.