2011-11-21
A cheap caching trick with a preforking server in Python
When the load here climbs, DWiki (the software behind this blog) transmogrifies itself into an SCGI based preforking server. I'm always looking for cheap ways to speed DWiki up for Slashdot style load surges (however unlikely it is that I'll ever need such tuning), and it recently occurred to me that there was an obvious way to exploit a preforking server: cache rendered pages in memory in each preforked process. Well, not even rendered pages; the simplest way to implement this is to cache your response objects.
(DWiki already has various layers of caching, but its page cache is disk based. A separate cache has various advantages (such as cache sharing between preforked instances), and being disk based means that you don't have to worry about memory exhaustion, only disk space; however, both aspects slow the cache down.)
A simple brute force in-memory cache like this has a number of attractions. Caching ready-to-use response objects (combined with simple time-based invalidation) means that this cache is about as fast as your application will ever go. It's quite simple to add, especially if your application already has the concept of a flexible processing pipeline; you can just add a request-stealing step early on and cache the response objects that you're already bubbling up through the pipeline. Assuming that your processes exit after handling some moderate number of requests, a per-process cache creates a natural limit on any inadvertent cache leaks, memory usage, and cache expiry and invalidation issues; after not too long the entire process goes away, caches and all.
(You can also size the cache quite low; you might make it one tenth or one fifth the number of requests that a single process will serve before exiting. A large cache is obviously relatively pointless; as the cache size rises, the number of cache hits that the 'tail' of the cache can ever have drops.)
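To make this concrete, here is a minimal sketch of the sort of per-process cache I mean. The specifics (the TTL, the size cap, and the generate callback) are hypothetical stand-ins for whatever your application actually has; the core is just a dict with time-based invalidation and crude eviction.

    import time

    CACHE_TTL = 20        # seconds before a cached response goes stale
    CACHE_MAXSIZE = 100   # deliberately small relative to requests per process

    _cache = {}           # per-process; every preforked child has its own

    def cached_response(key, generate):
        # Return a cached response for key, regenerating it if it is
        # missing or older than CACHE_TTL. generate() is the normal
        # (slow) pipeline that produces a finished, immutable response.
        now = time.time()
        hit = _cache.get(key)
        if hit is not None:
            stamp, resp = hit
            if now - stamp < CACHE_TTL:
                return resp
        resp = generate()
        if len(_cache) >= CACHE_MAXSIZE:
            # Crude eviction: drop the oldest entry. With processes
            # that exit after N requests anyway, sophistication here
            # buys you very little.
            oldest = min(_cache, key=lambda k: _cache[k][0])
            del _cache[oldest]
        _cache[key] = (now, resp)
        return resp

In a pipeline-style application, the request-stealing step early in the pipeline would then be something like 'cached_response(request.url, lambda: rest_of_pipeline(request))', where both names are made up for illustration.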
Adding such an in-memory cache to the preforking version of DWiki did expose one assumption that I was making. For this cache to work, response objects have to be immutable after they are finished being generated. It turned out that DWiki's code for conditional GET cheated by directly mutating response objects; when I added response object caching, this resulted in a very odd series of HTTP responses that were half conditional GET replies and half regular replies. I had a certain amount of head-scratching confusion until I worked out what was going on and why, for example, I was seeing 304 responses with large response bodies.
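As a hypothetical illustration (this is not DWiki's actual code, and all the names are invented), the dangerous pattern is mutating the cached response in place to answer a conditional GET; the safe pattern is answering with a modified copy and leaving the cached object untouched:

    import copy

    class Response:
        # Minimal stand-in for a finished, cacheable response object.
        def __init__(self, code, headers, data):
            self.code, self.headers, self.data = code, headers, data

    def conditional_get(resp, if_none_match):
        # resp may be shared through the cache, so never mutate it.
        if if_none_match != resp.headers.get("ETag"):
            return resp               # validator mismatch: full reply
        # Wrong with caching (mutating the shared object in place):
        #   resp.code = 304; resp.data = ""
        # The next cache hit would then serve this half-mutated reply.
        notmod = copy.copy(resp)      # shallow copy; we only rebind attributes
        notmod.code = 304
        notmod.data = ""
        return notmod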
Mouse scroll wheels versus buttons that you actually want to use
I've written before that I don't want a mouse where the middle button is a scroll wheel. At the time I was brief about why, but today I'm going to expand on that and explain why you can't combine the two functions in something where both work as well as they would independently. In short: if you have a scroll wheel that is also a mouse button, using one or the other necessarily has to kind of suck.
The core problem is one of activation pressure. A good mouse button takes relatively little finger pressure to activate. You don't want too little finger pressure, because then the button gets activated accidentally, but the more pressure it takes, the more fatiguing the button is to use (and, to some extent, the slower and more awkward). However, a low activation pressure for the button function of a scroll wheel is in strong conflict with using the scroll wheel as a scroll wheel.
To use a scroll wheel comfortably, you need to be able to rest your finger on the wheel without having anything happen. This means that scrolling itself needs to require some pressure, especially if you want to scroll in perceptible discrete steps (perceptible steps imply that there is extra resistance between steps, so that it is easy to stop after a step). This pressure is not exerted directly downwards, but it does involve a certain amount of downforce, especially if the user is a bit sloppy or hurried. To make the scroll wheel work well, this downforce must not activate the button function; otherwise, using the scroll wheel to scroll would come with bonus random button clicks, which is sure to irritate you.
Ergo you have a conflict. You can either have a nice-to-use button paired with a hair-trigger, hard-to-use scroll wheel, or a scroll wheel that works well paired with a button function that takes significant extra force to use. For relatively obvious reasons, every 'scroll wheel middle button' mouse I've seen takes the second approach.
(If you have a scroll wheel mouse, you can try this yourself; feel out the amount of force that's required to activate each button. With every scroll wheel mouse I've ever used, the left and right buttons take relatively light force while the middle scroll wheel requires perceptibly more.)
I kind of like the scroll wheel, but I use the middle mouse button all the time, almost literally. This makes the common scroll wheel tradeoff a very bad thing for me; it is exactly reversed from my usage pattern.
(There are also issues with finger positioning, where the scroll wheel is too far back for the tip of my finger to naturally rest on top of it. This might just be me and it might go away if I used scroll wheels more, which may come to pass.)
The likely cause of my IPSec dropped packet mystery
I believe that I've identified the cause of my mysterious dropped GRE tunnel packets that showed up in recent kernels. The short description of the cause is recursive routing leading to a path MTU collapse.
Explaining this is going to take some verbiage. Back when I set up my GRE tunnel, I wrote:
My current trick is routing the subnet that the target of the tunnel is on over the tunnel itself, which makes my head hurt.
Let me make this concrete. The GRE tunnel target is 128.100.3.58, and as part of my dual identity routing I have a route:
ip route add 128.100.3.0/24 dev extun
Let us call the tunnel target T, my machine's inside address I, and my machine's outside address O (because all of these are much shorter and clearer than writing out the IP addresses in full). In an environment with policy based routing it's possible to see how all of this works; because the tunnel is explicitly specified as being from O to T, it is forced to ignore the route to T's subnet that would normally send the GRE-encapsulated traffic to T back over the tunnel. This still works even if you talk directly to T without specifying a source address; your plain TCP connection will be routed over the GRE tunnel (and get the source address of I), and then the encapsulated version will be routed over the regular connection since it now comes from O.
(It's possible that the kernel is smart enough to do this even without policy based routing, but I haven't tested that.)
Because the GRE tunnel is an encapsulation over my regular link, it has a lower MTU than the regular link. This means that traffic going from I to T has a lower (path) MTU than traffic going from O to T.
In old kernels, all of this worked fine, and in particular the kernel kept the path MTUs of the two versions of traffic to T separate. In recent kernels, this appears to have changed; it looks like there is only a single path MTU for T, regardless of the path to it. The consequence is that when I start a TCP conversation with T over the GRE tunnel, the path MTU to T almost immediately collapses down to 552 octets (the default minimum path MTU). I assume that this is happening due to a recursive series of path MTU reductions; first the GRE tunnel reduces the normal MTU to T down to the GRE tunnel's MTU, then the encapsulation code notices that the GRE tunnel's MTU doesn't fit in the MTU to T and chops it in turn, and things repeat until the kernel won't let the MTU go any lower.
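Some toy arithmetic makes the collapse concrete. This is a model of the behaviour as I understand it, not kernel code; the 24 octet figure is the standard GRE-over-IPv4 encapsulation overhead (a 20 octet outer IP header plus a 4 octet GRE header), and the 1500 starting point is an assumption about the underlying link.

    GRE_OVERHEAD = 24   # outer IPv4 header (20) + GRE header (4)
    MIN_PMTU = 552      # the kernel's default minimum path MTU

    def collapse(link_mtu):
        # Repeatedly re-derive a single shared path MTU through the
        # tunnel encapsulation until it hits the kernel's floor.
        pmtu = link_mtu
        steps = [pmtu]
        while pmtu > MIN_PMTU:
            pmtu = max(pmtu - GRE_OVERHEAD, MIN_PMTU)
            steps.append(pmtu)
        return steps

    print(collapse(1500))   # [1500, 1476, 1452, ..., 564, 552]

With only one shared path MTU for T, each round of encapsulation shaves another 24 octets off until the kernel refuses to go any lower.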
(There appears to be a minimum MTU for the GRE tunnel that is over 552 octets. Once the MTU to T shrinks too far and I try to talk to anything over the GRE tunnel, I see a series of locally generated rejections of the form 'ICMP 128.100.3.51 unreachable - need to frag (mtu 478), length 556'. Another diagnostic is that the transmit error count shown by 'ip -s -s link show dev extun' keeps counting up.)
One can see some of this by inspecting the routing cache with 'ip route show table cache'. However, flushing the cache (with 'ip route flush table cache') does not help; it seems that in current kernels this routing cache is not the real, fully authoritative source of this information. (I am not up enough on Linux networking to understand what is going on here.)
This problem can be avoided to a certain extent by creating a host route for T that sends traffic for it explicitly over the underlying link, not the GRE tunnel. However, you will still provoke this problem if you force traffic for T to go over the GRE tunnel (for example, by specifying a source IP of I so that policy based routing kicks in); the host route just avoids accidents.
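To make the workaround concrete, such a host route would look something like the following (the gateway is a placeholder; the right next hop depends entirely on your local network setup):

ip route add 128.100.3.58/32 via <your-gateway>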
(Much of my understanding of what's going on has been developed through interacting with Eric Dumazet on the Linux kernel netdev mailing list, and in skimming netdev in general. Without Eric's questions in response to my initial bug report, I would never have been able to work out what's going on.)
Sidebar: useful sysctls and other things
There are two potentially useful sysctls in /proc/sys/net/ipv4/route. min_pmtu sets the minimum path MTU and is normally 552. mtu_expires sets how long (in seconds) learned path MTUs stick around for and is normally ten minutes; I believe that setting it to a low value does not expire already-learned path MTUs. There is a seductive looking flush sysctl entry in the same directory, but I was unable to get it to do anything useful in testing; whatever it's flushing is not what is grimly holding on to a bad path MTU.
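Since these sysctls are just files under /proc, a script can read them trivially; a minimal sketch:

    def route_sysctl(name):
        # Read one sysctl, e.g. 'min_pmtu', from /proc/sys/net/ipv4/route.
        with open("/proc/sys/net/ipv4/route/" + name) as f:
            return f.read().strip()

    print(route_sysctl("min_pmtu"))      # normally 552
    print(route_sysctl("mtu_expires"))   # normally 600 (ten minutes)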