Wandering Thoughts archives

2006-09-07

Industrial strength Python

It's not uncommon for people to harsh on Python and similar languages as being 'scripting languages' that aren't suitable for serious jobs because they're not fast enough, use too much memory, or the like. I beg to differ, because I have a great counter-example.

We use an SMTP frontend daemon to check connections before we bother to start a real SMTP conversation. This program does our greylisting, our DNS blocklist checking, connection count limiting, and a number of other checks. And it's written in Python, for various reasons.

Last week it handled 1.4 million connections, with over 700,000 coming in one day. For those 24 hours, it was handling just over 8 connections a second. Yes, it used a bit of CPU time to do this (it seems to have averaged a bit under 0.06 CPU seconds per connection), but modern machines generally have a lot of CPU to spare.
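As a rough check on the arithmetic, here is a small sketch that turns the busy day's figures into a rate and a CPU share. The function names are made up for illustration, and the inputs are the rounded numbers from above (the real count was "over" 700,000 and the real CPU cost "a bit under" 0.06 seconds, averaged over the week), so treat the results as ballpark only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def connections_per_second(connections, seconds=SECONDS_PER_DAY):
    """Average connection rate over the given interval."""
    return connections / seconds


def cpu_fraction(rate, cpu_seconds_per_connection):
    """Fraction of a single CPU consumed at the given connection rate."""
    return rate * cpu_seconds_per_connection


# Rounded figures from the entry; the real ones are slightly different.
busy_day_rate = connections_per_second(700_000)
print(f"{busy_day_rate:.1f} connections/second")            # 8.1 connections/second
print(f"{cpu_fraction(busy_day_rate, 0.06):.0%} of one CPU")  # 49% of one CPU
```

So on the busiest day the frontend was, at most, using roughly half of one CPU.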

Nor did it gobble memory to do this; at the end of the week, the process was using 20 megabytes of virtual memory, with an 11 megabyte resident set size. This is up from its starting size, which is around 8 megabytes with 5.5 megabytes of RSS, but the frontend is remembering the first and last connection times of most every IP address that ever talked to it; last week it was tracking almost 53,000 of them. A version of the frontend written in C (or maybe Java) would probably use less memory. But the Python version's memory usage is not over the top or excessive for a modern machine, and it's not leaking.
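To give a feel for the kind of per-IP bookkeeping involved, here is a minimal sketch of a table that remembers first and last connection times. This is not the frontend's actual code; the `ConnectionTimes` class and its method names are hypothetical, and a real version would also need to deal with things like expiring old entries.

```python
import time


class ConnectionTimes:
    """Remember the first and last connection time for each IP address."""

    def __init__(self):
        # Maps IP address string -> (first_seen, last_seen) timestamps.
        self._seen = {}

    def record(self, ip, now=None):
        """Note a connection from ip at time now (default: current time)."""
        if now is None:
            now = time.time()
        # Keep the original first-seen time if we already know this IP.
        first, _ = self._seen.get(ip, (now, now))
        self._seen[ip] = (first, now)

    def lookup(self, ip):
        """Return (first_seen, last_seen) for ip, or None if never seen."""
        return self._seen.get(ip)

    def __len__(self):
        return len(self._seen)


times = ConnectionTimes()
times.record("192.0.2.1", now=100.0)
times.record("192.0.2.1", now=250.0)
print(times.lookup("192.0.2.1"))  # (100.0, 250.0)
```

One small tuple of two floats per IP address is cheap enough that tens of thousands of entries fit comfortably in a few megabytes.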

I won't claim that writing industrial strength Python like this is completely easy; you do have to pay attention to detail and watch out for various things, and I certainly got my hands dirty poking around down in the depths of Python's object management in the process of making sure that I was using as little memory as possible. But it's not hugely difficult, and a lot of it is common sense.
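One example of the sort of object-level detail that can matter is `__slots__`, which stops Python from giving every instance its own attribute dictionary. I am not claiming this is what the frontend does; the sketch below, with its made-up `PlainRecord` and `SlottedRecord` classes, is only an illustration of measuring that kind of difference on CPython.

```python
import sys


class PlainRecord:
    """Ordinary class: each instance carries a per-instance __dict__."""

    def __init__(self, first, last):
        self.first = first
        self.last = last


class SlottedRecord:
    """Same fields, but __slots__ removes the per-instance __dict__."""

    __slots__ = ("first", "last")

    def __init__(self, first, last):
        self.first = first
        self.last = last


def instance_bytes(obj):
    """Shallow size of obj plus its attribute dict, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


plain = PlainRecord(1.0, 2.0)
slotted = SlottedRecord(1.0, 2.0)
# Exact numbers vary by Python version and platform.
print(instance_bytes(plain), instance_bytes(slotted))
```

The exact sizes depend on the interpreter, which is part of why you have to go and look instead of guessing.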

(And to a fair extent you're going to have to do this no matter what language you use; industrial strength programs require attention to details, period. Different languages just require you to pay attention to different bits.)

python/IndustrialPython written at 22:29:05

Some wise words from Henry Spencer on backups

Henry Spencer recently wrote some very useful words of advice on backups on a local sysadmin mailing list. They struck me as the sort of things that are useful enough to share more widely, so with Henry's permission I'm putting his message here. (I thought about running just part of his email, but the more I read it, the more I wanted people to see all of it, so I'm just going to put up the whole thing.)

So, in Henry Spencer's own words:

...So please don't be put off doing a simple thing that will produce significant benefit in most cases, such as storing backups in the next building, just because there exist some "movie plot" scenarios in which this would not be good enough.

I concur. (And I speak as one of the few people on this list who's been running machines on campus long enough to remember the Sandford Fleming fire.) Remember also two things:

(1) A disaster big enough to wipe out both your building and the next building over is likely to have repercussions severe enough to make the up-to-dateness of your offsite backups somewhat secondary.

(2) A wonderful offsite-backup plan which is so inconvenient that it is followed only fitfully is worse than none at all.

There is something to be said for doing an occasional very-offsite backup. But for the weeklies and monthlies, above all you want a plan which is practical enough and convenient enough that you will FOLLOW IT consistently, month after month after month. Hauling a pile of media to and from a remote location gets tedious quickly.

Bear in mind, too, that by a corollary of Murphy's Law, the time when a backup will be most needed will be when the relevant sysadmin is out of town. You want an offsite-backup location that your assistant (etc.) can get access to when necessary; the top shelf of your hall closet is out. If your offsite backups are stored in the next building by informal arrangement between you and the sysadmin there, make sure that other people in both places know about it. You may want to have a formal authorizing letter ("Joe Blow and his staff from Dept. XYZ are authorized to remove or exchange the tapes on the bottom shelf of storage cabinet 3 at any time") on file in case everybody technical at the far end is away.

The one halfway-plausible accident that just might manage to affect two adjacent buildings is a fire. Not because the fire is likely to spread to the second building, but because water and smoke don't necessarily respect building boundaries. (When Sandford Fleming burned down, the firemen spent six hours pouring water in from all sides... and at least one adjacent building was closed due to flooding; indeed, there was flooding as far away as Queen's Park subway station.) Smoke in particular can get into places you'd never think it would reach -- closed drawers, etc. -- and the soot it leaves can be quite corrosive.

There is one simple step you can take that will make your offsite backups much less vulnerable to such indirect hazards: bag them in airtight zip-lock bags. In fact, this is worth doing for the most recent set of on-site backups too -- a serious fire anywhere in your building can expose your computing facility to water and smoke even if the fire never gets anywhere near it.

The hazards of smoke and soot are something I hadn't previously thought of, and the zip-lock bag trick strikes me as both very clever and nicely simple. (I have a weakness for simple, low-tech solutions to problems.)

(PS: for University of Toronto people who stumble over this entry and want to be on the local sysadmins mailing list, you can get on by sending email to ut-admins-request at the domain utcc.utoronto.ca.)

sysadmin/SpencerOnBackups written at 12:46:23


