Wandering Thoughts archives

2025-01-26

Some learning experiences with HTTP cookies in practice

Suppose, not hypothetically, that you have a dynamic web site that makes minor use of HTTP cookies in a way that varies the output, and also this site has a caching layer. Naturally you need your caching layer to only serve 'standard' requests from cache, not requests that should get something non-standard. One obvious and simple approach is to skip your cache layer for any request that has a HTTP cookie. If you (I) do this, I have bad news about HTTP requests in practice, at least for syndication feed fetchers.

(One thing you might do with HTTP cookies is deliberately bypass your own cache, for example to ensure that someone who posts a new comment can immediately see their own comment, even if an older version of the page is in the cache.)

The thing about HTTP cookies is that the HTTP client can send you anything it likes as a HTTP cookie and unfortunately some clients will. For example, one feed reader's fetcher (feeder.co's) deliberately attempts to bypass Varnish caches by sending a cookie with all fetch requests. If the presence of any HTTP cookie causes you to skip your own cache (and other things you do that use the same logic), well, feeder.co is bypassing your caching layer too. Another thing that happens is that some syndication feed fetching clients appear to sometimes leak unrelated cookies into their HTTP requests.

(And of course if your software is hosted alongside other software that might set unrestricted cookies for the entire website, those cookies may leak into requests made to your software. For feed fetching specifically, this is probably most likely in feed readers that are browser addons.)

The other little gotcha is that you shouldn't rely on merely the presence or absence of a 'Cookie:' header in the request to tell you if the request has cookies, because a certain number of HTTP clients appear to send a blank Cookie: header (ie, just 'Cookie:'). You might be doing this directly in a CGI by checking for the presence of $HTTP_COOKIE, or you might be doing this indirectly by parsing any Cookie: header in the request into a 'Cookies' object of some sort (even if the value is blank), in which case you'll wind up with an empty Cookies object.
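As a minimal sketch of the difference, here is a CGI-style check in Python that treats a blank Cookie: header the same as a missing one. The function name is made up for illustration, and it assumes the usual CGI convention that the header shows up as HTTP_COOKIE in the environment.

```python
import os


def has_nonblank_cookie_header(environ=None):
    """Return True only if HTTP_COOKIE exists and is not blank.

    A bare 'Cookie:' header typically arrives as an empty (or
    whitespace-only) HTTP_COOKIE, so merely testing for the variable's
    presence would wrongly report that the request has cookies.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("HTTP_COOKIE")
    return raw is not None and raw.strip() != ""
```

This only tells you that the client sent something in the header, not that what it sent has anything to do with you.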

(You can also receive cookies with a blank value in a Cookie: header, eg 'JSESSIONID=', which appears to be a deliberate decision by the software involved, apparently to deal with a bad feed source.)

If you actually care about all of this, as I do now that I've discovered it all, you'll want to specifically check for the presence of your own cookies and ignore any other cookies you see, as well as a blank 'Cookie:' HTTP header. Doing extra special things if you see a 'bypass_varnish=1' cookie is up to you.
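One way to do this, sketched in Python, is to pick out only the cookies you set yourself and make the caching decision based on those. Everything named here is hypothetical: 'site_login' stands in for whatever cookie your own software sets, and the hand-rolled parsing is deliberately forgiving rather than a full implementation of the cookie syntax. Treating a blank value for your own cookie as 'not set' is a choice I'm assuming here, not a requirement.

```python
# Hypothetical: the names of the cookies that this site itself sets.
OUR_COOKIE_NAMES = frozenset({"site_login"})


def our_cookies(raw_header):
    """Extract only our own non-blank cookies from a raw Cookie: value.

    Unknown cookies, blank values, a blank header, and unparseable
    fragments are all silently ignored.
    """
    found = {}
    for part in (raw_header or "").split(";"):
        name, sep, value = part.partition("=")
        name = name.strip()
        value = value.strip()
        if sep and value and name in OUR_COOKIE_NAMES:
            found[name] = value
    return found


def can_serve_from_cache(raw_header):
    """A request is 'standard' if it carries none of our own cookies."""
    return not our_cookies(raw_header)
```

With this, a request with 'bypass_varnish=1', 'JSESSIONID=', or a blank Cookie: header still counts as a standard request and can be answered from the cache, while a request with a real 'site_login' value skips it.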

(In theory I knew that the HTTP Cookie: header was untrusted client data, and that it sometimes even contained bad garbage (which got noted every so often in my logs). In practice I didn't think about the implications of that for some of my own code until now.)

web/HTTPCookiePracticalSurprises written at 22:29:22;

