2010-03-26
Tkinter sometimes has a busy-wait main loop (more or less)
One of my always-running programs is a little Tkinter-based Python
app. For a while now I've noticed that it was accumulating a surprising
amount of CPU time and would sometimes show up in top as active,
despite me not doing anything with it. Today I finally got around to
figuring out why, and the answer is that Tkinter's main loop effectively
busy-waits in some Python environments.
The standard way for Tkinter-based programs to operate is to make all of
the Tkinter calls to set up your windows and then call root.mainloop()
(where root is the object you got from calling Tk()). If your version
of CPython is built with thread support (the common case) and your version
of Tcl is built without threads enabled (this depends), the inner core
of mainloop() looks more or less like this:
    while keep_running:
        Tcl_DoOneEvent(TCL_DONT_WAIT);
        if no event processed:
            sleep(20 msec);
        check_for_signals();
(One reason for this loop is to handle Unix signals promptly; the full code also has some thread-related locking stuff. See Tkapp_MainLoop() in Modules/_tkinter.c in the CPython source for the gory details.)
The net effect is that when your Tkinter-based program is sitting idle, it wakes up every 20 milliseconds (about 50 times a second) to spin around doing nothing; over time this can add up to visible and even significant CPU usage.
(In theory the sleep interval can be increased. In practice you can't turn it up without lowering the responsiveness of your application, because your program won't process new events until it wakes up from the sleep, and it's going to wind up in the sleep fairly often. If you really want to touch this, see Tkinter._tkinter.setbusywaitinterval().)
My solution was to replace my use of root.mainloop() with the
following code:
    global exit_mainloop
    while not exit_mainloop:
        root.dooneevent(0)
In Tkinter-related code where I was previously calling .quit() on
Tkinter objects to get the application to quit, I instead set the
exit_mainloop global to 1; this is more or less what .quit() does
anyways. This is probably somewhat less efficient if your application is
active all the time (since you now go through (more) interpreted Python
code for every event), but is much more efficient if your application
spends most of its time idle; strace now shows my program sitting
there doing nothing instead of constantly twitching around in system
calls.
The one caution with this approach is that .mainloop() also exits if
there are no Tk main windows left. If this matters for your application,
you'll need to keep track of this yourself and set exit_mainloop
appropriately.
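Putting the pieces above together, here is a minimal sketch of the replacement loop, including one way of handling the 'no main windows left' case. It assumes a single main window and is written so that it works with any object that has the same methods as a Tk root. The EventLoop class and its method names are invented for illustration; only dooneevent(), protocol() and destroy() are real Tkinter calls.

```python
class EventLoop:
    """Blocking stand-in for root.mainloop(); the names are illustrative."""

    def __init__(self, root):
        self.root = root
        self.exit_mainloop = False
        # Assumption: there is only one main window. Closing it through
        # the window manager should end the loop, as .mainloop() would.
        root.protocol("WM_DELETE_WINDOW", self.close_window)

    def quit(self):
        # Call this wherever the program used to call .quit().
        self.exit_mainloop = True

    def close_window(self):
        # Set the flag first so the loop never runs against a dead window.
        self.quit()
        self.root.destroy()

    def run(self):
        while not self.exit_mainloop:
            # With flags of 0 this waits inside Tcl for the next event
            # instead of polling and sleeping.
            self.root.dooneevent(0)
```

In a real program this would be used as loop = EventLoop(root) right after creating root with Tk(), with loop.run() taking the place of root.mainloop(). A program with several top-level windows would need its own bookkeeping to decide when the last one has gone away.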
Sidebar: how to see if your Tcl is built with threads enabled
Run tclsh and give it the command 'parray tcl_platform'. If it
has a threaded entry, your platform built Tcl with --enable-threads
(this information is from here). It appears
that Fedora 11 and FreeBSD build Tcl without threads while Debian, Ubuntu
and RHEL 5 build it with threads.
(I have no idea why Fedora and RHEL are different here.)
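The same check can probably be made from inside Python, by asking a Tcl interpreter whether the threaded entry exists. This is only a sketch: tcl_is_threaded() is a made-up helper name, and it assumes it is handed an object with an eval() method that runs a Tcl command and returns the result as a string.

```python
def tcl_is_threaded(interp):
    """Report whether tcl_platform has a 'threaded' entry.

    interp is assumed to be anything with an eval() method that runs a
    Tcl command and returns its result as a string.
    """
    # 'info exists' is a standard Tcl command; it returns "1" or "0".
    return interp.eval("info exists tcl_platform(threaded)") == "1"
```

With a modern Python the interpreter object could come from tkinter.Tcl(), which creates a Tcl interpreter without opening any windows; the answer is for the Tcl that your Python's Tkinter is using, which is what matters here.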
2010-03-22
Why I exploit Python to shim modules for testing purposes
A while back I wrote about how I monkey-patch modules for testing, but I didn't really write about why I prefer to do my testing this way instead of, say, using dependency injection in the code I'm going to test.
My ultimate reason for doing it this way is that I would rather have clean code and ugly tests than dirty code and cleaner tests. 'Testable' code that has everything it talks to injected into it as a parameter is generally ugly; specifically, it is artificially contorted from the natural way it would be written purely so that it can be tested. In some languages you don't have any choice, because you have no way of changing the behavior of low-level code. In Python you do, so I cheerfully take advantage of this to make my code clean and natural.
I maintain that clean, natural code really does matter. Such code is
easier and simpler to understand, and you can rely on all of your
accumulated knowledge of Python and its standard library of modules to
understand what it's doing and how it's doing things. You don't have to
double-check to make sure that self.socketSvcs.lookupName() really is
socket.gethostbyname() (or perhaps socket.getaddrinfo() these days),
or remember how this program's mutation of it is supposed to behave.
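To make the contrast concrete, here is a small sketch of the style I mean. The code under test calls socket.gethostbyname() directly, and the test side temporarily swaps in a fake. Both host_address() and with_fake_resolver() are hypothetical names made up for this example, not code from a real program.

```python
import socket

def host_address(name):
    """Written the natural way: it calls the socket module directly."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None

def with_fake_resolver(fake, func, *args):
    """Run func(*args) with socket.gethostbyname shimmed, then restore it."""
    real = socket.gethostbyname
    socket.gethostbyname = fake
    try:
        return func(*args)
    finally:
        # Always put the real function back, even if the test blows up.
        socket.gethostbyname = real
```

The ugliness all lives in with_fake_resolver(), which only the tests ever see; host_address() reads exactly the way it would if no tests existed.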
(Besides, this way I don't have to test the code that creates all of the necessary 'with everything set up to run for real' object instances. Although I suppose those are really tested via end to end functionality tests, instead of mocked-out unit tests.)
2010-03-08
Exceptions versus error return values
Python has two ways of signalling that a function has failed: you can raise an exception or return a special error value of some sort. I use both techniques in different circumstances; since I've recently been writing some Python code, I've been thinking about exactly what those circumstances are, as far as I can tell.
(Self-analysis is tricky given that I don't particularly think through the choice when I'm making it; I handle errors however seems right for the function I'm writing at the time.)
Generally, I tend to use error return values if I expect failure to be routine, especially if there is a natural return value that is easy for callers to use. One example is getting a list of IPv4 and IPv6 addresses for a host: it's routine to look up nonexistent names (or at least names with no IP addresses), and returning an empty list is an easy return value for callers to use (since in many cases they will just iterate through the list of IPs anyways).
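A rough sketch of that address lookup might look like the following. The function name host_addresses() is invented for the example, and restricting the lookup to SOCK_STREAM is just an assumption to avoid getting each address back several times.

```python
import socket

def host_addresses(name):
    """Return the IPv4 and IPv6 addresses for name, or [] if there are none."""
    try:
        infos = socket.getaddrinfo(name, None, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        # A failed lookup is routine, so it is not an exception to callers.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); the
    # address string is the first element of sockaddr.
    return [info[4][0] for info in infos]
```

A caller can then write 'for ip in host_addresses(name):' and never notice the failure case at all.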
I use exceptions if I expect failure to be rare, especially if there is nothing that the direct caller of a function is going to do to handle the problem. If the only thing that I'll do on failure is abort the program with a pretty error message, there's no need to complicate all of the code between the program's main routine and the failing function with code to check for and immediately return the error. (The obvious exception is if there is cleanup work to be done on the way out, but I've come up with ways to handle that, similar to phase tracking.)
I'm pretty sure that I'd use exceptions even for common failures if they had to be handled by someone other than the function's direct caller; I don't like cluttering functions up with a bunch of 'if error: return error' code.
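As an illustration of the exception side, here is a sketch where the failure happens two levels down and only the main routine does anything about it. Everything in it (ConfigError, parse_port(), load_settings(), the 'myprog' name) is hypothetical and exists only for this example.

```python
import sys

class ConfigError(Exception):
    """Hypothetical error for a failure nothing below main() can fix."""

def parse_port(text):
    if not text.isdigit():
        raise ConfigError("bad port number: %r" % text)
    return int(text)

def load_settings(lines):
    # No 'if error: return error' here; a failure passes straight through.
    return [parse_port(line.strip()) for line in lines]

def main(lines):
    try:
        ports = load_settings(lines)
    except ConfigError as e:
        # The only handling anywhere: a pretty message and a failure status.
        sys.stderr.write("myprog: %s\n" % e)
        return 1
    print(ports)
    return 0
```

The intermediate function, load_settings(), stays free of error-handling code, which is the whole point.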
This view is not the common Python one. As we can see from the standard library, the Pythonic way uses exceptions a lot more often than I do.
(I'd argue that this is a sensible tradeoff for a library, too. The advantage of exceptions is that they are unambiguous signals of failures that you can't possibly confuse with valid return values, and they force people using your library to explicitly deal with errors.)