2006-09-17
How to convert a time string in GMT to seconds since the epoch
Recently, I've run into a situation where I have an ASCII string representing a time, expressed in UTC/GMT, and I want to convert it to the standard Unix representation of seconds since the epoch. It turns out that there are some interesting issues involved (well, in my opinion).
Doing this for a time string in local time is a snap: time.strptime()
will convert the ASCII time string to Python's version of C's struct
tm, and then time.mktime() will turn it into seconds.
However, time.mktime() specifically always uses local
time for the conversion; there is no option to have
it use GMT. To use GMT, you need to use calendar.timegm()
instead.
(To the documentation's credit, there is a mention of this at the bottom of the time module's documentation, although it is somewhat obscure and easy to miss; I did miss it the first time around.)
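The difference can be sketched like this (the timestamp and its format string here are my own illustration, not from any particular program):

```python
import calendar
import time

# A sample timestamp that we know represents UTC/GMT.
stamp = "2006-09-17 12:00:00"
parsed = time.strptime(stamp, "%Y-%m-%d %H:%M:%S")

# Wrong (unless your local timezone happens to be UTC):
# mktime() always interprets the struct_time as local time.
local_secs = time.mktime(parsed)

# Right: timegm() interprets the struct_time as UTC/GMT.
utc_secs = calendar.timegm(parsed)
```

The two results differ by exactly your timezone's offset from UTC (at that moment), which makes this an easy bug to miss if you develop in a timezone close to UTC.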
timegm() is somewhat out of place in the calendar module; its
documentation even says that it's an 'unrelated but handy function'.
So why isn't it in the time module instead, where it logically
belongs? It's probably because the CPython interpreter makes it hard
to have a module that integrates C and Python code.
The timegm() function is a simple bit of Python. However, the time
module is pretty much a thin wrapper around the C library's time
functions, and so is written in C. A version of timegm() redone in C
would be annoyingly more complex, and the existing Python version cannot
easily be shoved into a C module. You can shim Python functions into
existing modules, but the trick is getting it to happen automatically
when people do 'import time' (without going the route of the socket
module, where the real socket module is actually something called
_socket).
(While it would also be out of place in a wrapper module, this isn't something programmers using the module should really have to care about. What's implemented by the C library and what's implemented in C by the Python runtime is an implementation detail.)
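To illustrate what the socket-style split would involve: the C code would move to a module called _time, and a thin pure-Python time module would do 'from _time import *' and then define the Python-level extras on top. Here is a sketch of just such a Python-level timegm() (hypothetical code: CPython's time module is not actually organized this way, and the real calendar.timegm() differs in its details):

```python
# In the hypothetical split, this function would live in a pure-Python
# 'time' module that starts with:  from _time import *
import datetime

def timegm(t):
    """Inverse of gmtime(): convert a UTC struct_time (or any 9-tuple
    with the same layout) to seconds since the epoch."""
    # Days between the date part and the epoch, via proleptic ordinals.
    days = (datetime.date(t[0], t[1], t[2]).toordinal()
            - datetime.date(1970, 1, 1).toordinal())
    # Fold in the time-of-day fields.
    return ((days * 24 + t[3]) * 60 + t[4]) * 60 + t[5]
```

As the sketch shows, the function itself is trivial in Python; the obstacle is purely the packaging question of how to bolt it onto a C module at import time.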
2006-09-11
Python's extra-clever help() function
I recently used 'help()' on one of my own internal modules, mostly as
a quick way to browse the entire function/class collection in one go.
Now, I'm kind of a slacker when it comes to using docstrings (for some
reason I prefer comments before the function), so I didn't expect to
see very much in the help() output.
Much to my surprise I saw things like:
hostFromEnv(env)
    # Extract the host from the
    # REQUEST_URI, if it is present.
The amount of introspective magic required to do this turns out to be pretty impressive; for extra points, it's pretty much all done in Python.
The help() function is a shim created in site.py
as an instance of a class. The function wraps pydoc.help(),
which winds up calling on inspect.getcomments()
for the heavy lifting. This works by finding and re-reading the source
file itself, starting from the co_filename and co_firstlineno
attributes that the bytecode compiler glues onto code objects.
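You can see the machinery at work directly with inspect.getcomments(). Since it needs a real source file on disk to scan, this sketch writes a tiny module to a temporary file first; hostFromEnv here is a stand-in reconstructed from the help() output above:

```python
import importlib.util
import inspect
import os
import tempfile

# A tiny module using a comment-above-the-def style of documentation.
src = (
    "# Extract the host from the REQUEST_URI, if it is present.\n"
    "def hostFromEnv(env):\n"
    "    return env.get('REQUEST_URI')\n"
)

# inspect needs an actual file to re-read, so write one.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(src)
    path = f.name

spec = importlib.util.spec_from_file_location("demo_mod", path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# getcomments() follows co_filename/co_firstlineno back to the source
# file and returns the comment block sitting above the def.
comment = inspect.getcomments(mod.hostFromEnv)
os.unlink(path)
```

The returned string is the comment text itself, hash marks and all, which is what pydoc then displays in place of a missing docstring.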
(I should probably get over my reluctance to use docstrings, especially if I'm putting the exact same information in the comments. At the moment I like the visual look of comments better; docstrings make me feel that the documentation is quietly disappearing into the rest of the code.)
2006-09-07
Industrial strength Python
It's not uncommon to have people harsh on Python and similar languages as being 'scripting languages', not suitable for serious jobs because they're not fast enough or they use too much memory, or the like. I have to beg to differ, because I have a great counter-example.
We use an SMTP frontend daemon to check connections before we bother to start a real SMTP conversation. This program does our greylisting, our DNS blocklist checking, connection count limiting, and a number of other checks. And it's written in Python, for various reasons.
Last week it handled 1.4 million connections, with over 700,000 coming in one day. For those 24 hours, it was handling just over 8 connections a second. Yes, it used a bit of CPU time to do this (it seems to have averaged a bit under 0.06 CPU seconds per connection), but modern machines generally have a lot of CPU to spare.
Nor did it gobble memory to do this; at the end of the week, the process was using 20 megabytes of virtual memory, with an 11 megabyte resident set size. This is up from its starting size, which is around 8 megabytes with 5.5 megabytes of RSS, but the frontend is remembering the first and last connection times of most every IP address that ever talked to it; last week it was tracking almost 53,000 of them. A version of the frontend written in C (or maybe Java) would probably use less memory. But the Python version's memory usage is not over the top or excessive for a modern machine, and it's not leaking.
I won't claim that writing industrial strength Python like this is completely easy; you do have to pay attention to detail and watch out for various things, and I certainly got my hands dirty poking around down in the depths of Python's object management in the process of making sure that I was using as little memory as possible. But it's not hugely difficult, and a lot of it is common sense.
(And to a fair extent you're going to have to do this no matter what language you use; industrial strength programs require attention to details, period. Different languages just require you to pay attention to different bits.)