2013-03-25
Rethinking avoiding Apache
Somewhat recently I wrote about when I'd use a web server other than Apache (despite Apache's temptations). I've recently discovered that I need to change those opinions somewhat; Apache turns out to be much more usable in a resource-constrained situation than I expected.
One of my recent hobbies has been testing DWiki in a low-memory virtual machine (as I mentioned once in passing). I did my primary testing using nginx because it had an SCGI gateway, but with that working I decided on a whim to see how Apache plus mod_wsgi would do in the same small VM. To be honest, I expected Apache to explode spectacularly under any sort of real concurrent connection load, driving the virtual machine into the ground in the process.
To my total surprise, this did not happen. Not at all. Instead a more or less stock Ubuntu 12.04 Apache plus mod_wsgi setup handily dealt with all of the load I could throw at it. In my limited testing it was actually slightly faster on average than my nginx setup, dealt better with really extreme numbers of concurrent connections, and still left the machine with free memory. It was also easier to manage than my nginx lashup, which needed a separate system to run and restart the SCGI-based WSGI server that nginx talked to.
Part of this seems to be that Ubuntu 12.04 has sensible (ie small) Apache configuration settings. Another part is that mod_wsgi totally isolates the WSGI serving into separate processes (although they are still Apache processes). But regardless of all of this the whole setup just works and does so in an environment where I had previously expected Apache to be completely unsuitable. I am metaphorically eating my hat right about now.
(If I ever do deploy DWiki into such an environment, Apache plus mod_wsgi is now going to be my first choice. Not for performance, I doubt there's any meaningful practical difference, but because it's easier to manage because everything is in one spot and mod_wsgi has good support for easy code reloads.)
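For the record, the mod_wsgi side of such a setup doesn't take much configuration. A daemon-mode sketch looks something like this (all of the paths, names, and process counts here are invented for illustration, not taken from my actual setup):

```apache
# Run the WSGI app in its own small pool of daemon processes,
# separate from the regular Apache worker processes.
WSGIDaemonProcess dwiki processes=2 threads=4
WSGIScriptAlias /dwiki /srv/dwiki/dwiki.wsgi process-group=dwiki application-group=%{GLOBAL}
```

Daemon mode is also what makes the code reloads easy: mod_wsgi restarts the daemon processes when the WSGI script file is touched, with no Apache restart needed.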
Sidebar: a caution about my performance results
Siege, the load tester I was using, reports only the average request time (plus the maximum and minimum); it doesn't provide any information about the distribution. It's possible that the distribution of response times is worse with Apache and the average is masking this. To do real testing I'd need to find a more thorough HTTP load tester (well, one with better stats reporting).
2013-03-20
Don't use ab for your web server stress tests (I like siege instead)
Like many other people, I sort of automatically reach for the venerable
ab Apache program when I want to do some sort of a web server stress
test. I've heard that it has flaws and it's not the best program out
there, but surely it's good enough for the basics, right?
Well, no, as I found out recently. I don't know exactly why or what's
going on, but ab's concurrency option simply doesn't work; you get
nowhere near as much concurrency as you asked for, despite what ab
claims. Due to my concurrency misunderstanding
I got to see this first hand and very vividly. When I ran 'ab -c N'
against a test DWiki setup, nowhere near as many worker processes got
started and used as there should have been (I believe I asked for 50
concurrent requests and saw only 4 worker processes running, which is
very wrong). So my message is simple: do not use ab to test anything
you care about. The fact that it's conveniently there does not make it
worth using unless you are very sure that it is not quietly doing
something odd on you.
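If you want to double-check a load tester's concurrency claims yourself, one approach is to instrument a server and measure how many requests actually overlap. Here is a self-contained sketch of the idea (everything in it is illustrative, not my actual test setup): it stands up a throwaway threaded HTTP server, fires off N requests at once, and reports the peak number of simultaneously active requests the server saw.

```python
# Sketch: measure how much concurrency a client actually achieves,
# instead of trusting what the load tester claims. Names and numbers
# here are invented for illustration.
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

peak = 0
active = 0
lock = threading.Lock()

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        global peak, active
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.2)  # hold the request open so requests can overlap
        with lock:
            active -= 1
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # keep the output quiet
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

# Fire off 10 requests at once, the way 'ab -c 10' is supposed to.
clients = [threading.Thread(target=lambda: urllib.request.urlopen(url).read())
           for _ in range(10)]
for t in clients:
    t.start()
for t in clients:
    t.join()
server.shutdown()

print("peak concurrency seen by the server:", peak)
```

With genuinely concurrent clients the peak should come out at or near 10. When a tool claims '-c 50' but a measurement like this (or a count of busy worker processes) shows only a handful of overlapping requests, the tool is not doing what it says, which is exactly what I saw with ab.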
On the other hand I can attest that siege works. When I asked it to make N
concurrent requests, well, my worker process count shot right up to
what it should have been (in the case of high concurrency, every worker
process that I allowed). Siege is also capable of hammering on a fast
web server so rapidly that it exhausts your machine's normal range of
28,000 or so local TCP ports. On the one hand this is vaguely annoying.
On the other hand I can only describe it as a good problem to have,
since it means you are serving requests considerably faster than old
sockets can expire out of TIME_WAIT.
(Siege is not perfect and I have not conducted either an exhaustive test of web server stress testers or a careful validation of the numbers it reports. Plus, if you really care about this you will want not just averages for things like response speeds but also 90th and 99th percentiles and distributions and so on. You may also want a more sophisticated model than just concurrent connections, one that more closely models the real world behavior of people.)
(This elaborates on a tweet I made a while ago.)
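To make the averages-versus-percentiles point concrete, here is a tiny sketch with invented numbers: two response-time distributions with identical means but very different tails, which averages alone cannot tell apart.

```python
# Sketch: why averages can mask a bad tail. All numbers are invented.
# Two servers with the same mean response time, different distributions.
steady = [0.10] * 100                 # every request takes 100 ms
spiky = [0.05] * 90 + [0.55] * 10     # mostly fast, with a slow tail

def percentile(samples, p):
    """Nearest-rank percentile; good enough for a sketch."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1,
                   int(round(p / 100.0 * len(ordered))) - 1))
    return ordered[k]

for name, data in (("steady", steady), ("spiky", spiky)):
    mean = sum(data) / len(data)
    print("%s: mean=%.3f p90=%.2f p99=%.2f"
          % (name, mean, percentile(data, 90), percentile(data, 99)))
```

Both distributions average 100 ms, but the spiky one has a 99th percentile more than five times worse; a load tester that only reports averages would call these two servers equally fast.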
2013-03-14
What I want out of a web-based syndication feed reader
In light of Google Reader's impending shutdown I've started thinking about what I'd want out of any replacement to it that I switch to. I don't use Google Reader as my primary feed reader (that has always been Liferea); instead, my use is for three somewhat contradictory things:
- feeds that I want to be able to browse from more than one place.
- casual reading feeds, where Google Reader's slow expiry of old unread entries is a feature.
- feeds that I don't want to get lost in the black hole that my Liferea feeds have turned into.
(Unless I really care about a feed, adding it to Liferea usually ensures that I then ignore it; I just have too many things in there. I should probably remove most of my current Liferea feeds but I can't summon the willpower and I can't quite abandon the idea that I'll read those worthwhile entries someday.)
This leads me to think that a number of features are important to me (besides just being web-based in some way, even self-hosted):
- not a 'river of news' interface where all entries from all feeds
are dumped on me at once. A Planet-style interface may work for many
people but it doesn't work for my casual reading; I need to be able
to pick and choose what I'm going to read at any given point.
- a notion of unread and read entries where I don't have to read a
feed in any specific order; I can skip around, read some entries,
and leave others for later (even leave entire feeds for later).
- unread entries need to expire after a while. Ideally not too
quickly; after a month, say.
- meaningful visibility of entry contents while I'm browsing things
(ie the way Google Reader does it). I don't want to see little
snapshots of web pages or anything like that, I want to see some
(or all) of the text of an entry.
- efficient use of space that does not slice things up into a
squeezed multi-column layout. I read one entry at a time; I
do not need to see two or three columns of them on the screen,
forcing the one I want to read into a tiny skinny box.
(I think I've seen this sort of bad layout called a newspaper-like layout, presumably because of a newspaper's multiple columns.)
I'm relatively indifferent to whether the feed reader presents entries as simple, readable text (as Google Reader and Liferea do) or makes some attempt to make entries look the way they do on the real site (as some other web-based feed readers apparently do). Terrible formatting will just cause me to unsubscribe from a feed, which should be no major loss given what I'm theoretically using this for (mostly).
Unfortunately all of this is a sufficiently complex set of wishes that it implies a web application instead of just a website (although I'm willing to self-host the web app if I can).
(In theory I'd also be happy with a good graphical feed reader program that synced things between multiple machines using some backend. In practice I'm not sure there's any such program whose interface I'd like and that runs on Fedora.)