What I mostly care about for speeding up our Python programs
There are any number of efforts and technologies around these days that try to speed up Python, starting with the obvious PyPy and going on to things like Cython and grumpy. Every so often I think about trying to apply one of them to the Python code I deal with, and after doing this a few times (and even running some experiments with PyPy a while back) I've come to a view of what's important to me in this area.
What I've come around to caring about most is reducing the overall resource usage of short-running programs that mostly use the Python standard library and additional pure-Python modules. By 'resource usage' I mean a combination of both CPU usage and memory usage; in our situation it's not exactly great if I make a program run twice as fast but use four times as much memory. In fact for some programs I probably care more about memory usage than CPU, because in practice our Python-based milter system probably spends most of its time waiting for our commercial anti-spam system to digest the email message and give it a verdict.
(Meanwhile, our attachment logger program is probably very close to being CPU bound. Yes, it has to read things off disk, but in most cases those files have just been written to disk so they're going to be in the OS's disk cache.)
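As a rough illustration of what measuring 'resource usage' for a short-running program could look like, here is a minimal sketch that runs a command once and reports its CPU time and peak memory. The `measure` helper is my own hypothetical name, and it assumes a Unix system where the standard `resource` module is available; the units of `ru_maxrss` vary by platform (kilobytes on Linux, bytes on macOS).

```python
import resource
import subprocess
import sys


def measure(argv):
    """Run argv to completion and return (cpu_seconds, peak_rss).

    Hypothetical helper: it reads the accumulated resource usage of
    this process's waited-for children before and after the run.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # User plus system CPU time consumed by the child we just ran.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)

    # ru_maxrss is a high-water mark over all children waited for so far,
    # so it only describes this run if this run was the largest child.
    return cpu, after.ru_maxrss


cpu, peak = measure([sys.executable, "-c", "sum(range(10**6))"])
print(f"CPU seconds: {cpu:.3f}, peak RSS: {peak}")
```

Running the same command under two different interpreters or builds and comparing both numbers is the kind of check I have in mind; a speedup that comes with a much larger peak RSS is not an obvious win.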
I'm also interested in making DWiki (the code behind Wandering Thoughts) faster, but again I actually want it to be less resource-intensive on the systems it runs on, which includes its memory usage too. And while DWiki can run in a somewhat long-running mode, most of the time it runs as a short-lived CGI that just serves a single request. DWiki's long-running daemon mode also has some features that might make it play badly with PyPy, for example that it's a preforking network server and thus that PyPy is probably going to wind up doing a lot of duplicate JIT translation.
I think that all of this biases me towards up-front approaches like Cython and grumpy over on-the-fly ones such as PyPy. Up-front translation is probably going to work better for short-running programs (partly because I pay the translation overhead only once, and in advance), and the results are at least reasonably testable; I can build a translated version and see in advance whether the result is probably worth it. I think this is a pity because PyPy is likely to be both the easiest to use and the most powerful accelerator, but it's not really aimed at my use case.
(PyPy's choice here is perfectly sensible; bigger, long-running programs that are actively CPU intensive for significant periods of time are where there's the most payoff for speeding things up.)
PS: With all of this said, if I were serious here I would build the latest version of PyPy by hand and actually test it. My last look and the views I formed back then were enough years ago that I'm sure PyPy has changed significantly since then.
Why modules raising core exceptions mostly hurts, not helps, your users
A while back I wrote an entry about how modules should never raise core Python exceptions. Recently via my Referer logs I found out that some people aren't convinced by my entry, so I feel like taking another run at this topic, this time approaching it from the perspective of someone using your module.
If I'm invoking some function or method from your module and want to trap errors, I need to write code like this:
    import yourmod

    def fred():
        try:
            res = yourmod.some_thing(10, 20)
        except SOMETHING as e:
            ...
In order to fill in that SOMETHING with the right exception, I need to consult your module's documentation. Given that I have to look this up, reusing a general exception saves me essentially nothing; at most I type a little less, and typing RuntimeError instead of a module-specific name is not exactly a compelling savings. If I just want to catch your explicitly raised errors, using RuntimeError (or a subclass of it) is not saving me any real effort.
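To make the alternative concrete, here is a minimal sketch of a module that raises its own exception and of a caller that catches only that. Everything in it is hypothetical: `YourModError` is an invented name, the body of `some_thing` is made up for illustration, and the 'module' is defined inline rather than imported so the example is self-contained.

```python
# Stand-in for the contents of a hypothetical yourmod.py.
class YourModError(Exception):
    """Base class for errors that this module raises deliberately."""


def some_thing(a, b):
    # An explicitly raised, documented error for a bad argument.
    if b == 0:
        raise YourModError("b must not be zero")
    return a // b


# The caller's side: catch only what the module documents.
def fred():
    try:
        return some_thing(10, 0)
    except YourModError as e:
        return f"yourmod failed: {e}"


print(fred())
```

The narrow except clause means that a genuine bug, such as a TypeError from passing a string as `b`, still propagates instead of being mistaken for a documented failure.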
In practice, only catching explicitly raised errors is almost always what people using your module want to do, because of the various dangers of over-broad tries that I mentioned in my original entry and elsewhere. And if I really do want to catch all errors that come out of your code, I can already do that explicitly:
    try:
        res = yourmod.some_thing(10, 20)
    except Exception as e:
        ...
Notice that raising RuntimeError instead of your own error doesn't actually help me here. If I want to catch all possible errors that can happen during your module's execution, I need to go much broader than that.
(There are valid cases for doing this broad catching, generally in top-level code that wants to ensure that no uncaught exceptions ever surface to the user.)
Which brings me around to the one case where it is sensible to raise standard errors, which is when you're writing code that stands in for standard Python code that raises these errors. This is the one case where using a standard error saves me from looking things up; in fact, using a standard error is generally essential. If you're writing a class that will get used instead of a standard dictionary, raising KeyError and so on is absolutely essential, because that makes your objects transparent substitutes for real dictionaries.
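As a sketch of what standing in for standard Python code can look like, here is a small read-only mapping that raises KeyError for missing keys. The `LowerCaseDict` class and the `lookup` function are hypothetical names invented for this example; the point is only that code written against real dictionaries keeps working.

```python
from collections.abc import Mapping


class LowerCaseDict(Mapping):
    """Hypothetical read-only mapping with case-insensitive string keys."""

    def __init__(self, data):
        self._data = {k.lower(): v for k, v in data.items()}

    def __getitem__(self, key):
        try:
            return self._data[key.lower()]
        except KeyError:
            # Raise the same error a real dict would, with the caller's key.
            raise KeyError(key) from None

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def lookup(mapping, key, default="unknown"):
    # Existing code written with real dictionaries in mind.
    try:
        return mapping[key]
    except KeyError:
        return default


headers = LowerCaseDict({"Content-Type": "text/plain"})
print(lookup(headers, "CONTENT-TYPE"))
print(lookup(headers, "X-Missing"))
```

The inherited `get()` and `in` support from `Mapping` also rely on `__getitem__` raising KeyError, so a module-specific error here would break them.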
(This is in a sense a matter of API compatibility, where the exceptions that get raised are part of the API. Sometimes this can be explicit, as in the case of KeyError, but sometimes this is more implicit, in cases where raising and catching errors is more
Sidebar: my view on subclassing RuntimeError
I don't think you should do this because I don't think it's useful. If someone is catching RuntimeError today they probably intend to catch significant internal issues inside Python, not errors that your module happens to raise. Sweeping your module's errors into their except clauses is probably not helpful and may even be harmful.
For better or worse, I think that there is (or should be) a strong separation between 'internal' Python errors raised by the CPython interpreter and core Python modules, and module-specific errors raised by modules (yours, third-party modules, and non-core modules in the standard library). These two sorts of errors don't really live in the same taxonomy and so putting one under the other is generally not going to help anyone.
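A small sketch of the sweeping effect, under invented names: `SubclassedError`, `SeparateError`, and `run_guarded` are all hypothetical, and `run_guarded` stands in for existing caller code that was written to catch interpreter-level problems (RecursionError, for instance, is a RuntimeError subclass in Python 3).

```python
class SubclassedError(RuntimeError):
    """Hypothetical module error placed under RuntimeError."""


class SeparateError(Exception):
    """Hypothetical module error kept in its own hierarchy."""


def run_guarded(func):
    # Existing code that means to handle internal Python problems.
    try:
        func()
    except RuntimeError:
        return "treated as an internal Python problem"
    return "ok"


def fails_subclassed():
    raise SubclassedError("bad input file")


def fails_separate():
    raise SeparateError("bad input file")


# The module's error is silently swept into the RuntimeError handler.
print(run_guarded(fails_subclassed))

# The separate error is not caught there and must be handled on purpose.
try:
    run_guarded(fails_separate)
except SeparateError as e:
    print(f"module error surfaced on its own: {e}")
```

In the first call the handler has no way to tell that it just swallowed a module-specific failure, which is the kind of confusion I would rather avoid.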