Wandering Thoughts archives

2017-04-17

What I mostly care about for speeding up our Python programs

There are any number of efforts and technologies around these days that try to speed up Python, starting with the obvious PyPy and going on to things like Cython and grumpy. Every so often I think about trying to apply one of them to the Python code I deal with, and after doing this a few times (and even running some experiments with PyPy, years ago) I've come to a view of what's important to me in this area.

(This has been more and more on my mind because these days we run at least one Python program for every incoming email from the outside world. Sometimes we run more than that.)

What I've come around to caring about most is reducing the overall resource usage of short-running programs that mostly use the Python standard library and additional pure-Python modules. By 'resource usage' I mean a combination of both CPU usage and memory usage; in our situation it's not exactly great if I make a program run twice as fast but use four times as much memory. In fact for some programs I probably care more about memory usage than CPU, because in practice our Python-based milter system probably spends most of its time waiting for our commercial anti-spam system to digest the email message and give it a verdict.

(Meanwhile, our attachment logger program is probably very close to being CPU bound. Yes, it has to read things off disk, but in most cases those files have just been written to disk so they're going to be in the OS's disk cache.)
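To put rough numbers on 'resource usage' for a short-running program, one option on Unix is to run it as a child process and ask the kernel what it consumed. The following is only a minimal sketch under that assumption; the `measure()` helper and the trivial `-c "pass"` command are invented for illustration and are not any of the programs discussed here.

```python
import os
import sys


def measure(argv):
    """Run argv to completion and report what the child process consumed.

    Unix only: relies on os.spawnv() and os.wait4().
    """
    pid = os.spawnv(os.P_NOWAIT, argv[0], argv)
    # wait4() returns the rusage of this specific child, not of all children.
    _, status, usage = os.wait4(pid, 0)
    return {
        "exit_status": os.WEXITSTATUS(status) if os.WIFEXITED(status) else None,
        # User plus system CPU time, in seconds.
        "cpu_seconds": usage.ru_utime + usage.ru_stime,
        # Peak resident set size: kilobytes on Linux, bytes on macOS.
        "max_rss": usage.ru_maxrss,
    }


# Hypothetical usage: measure a do-nothing interpreter start-up.
baseline = measure([sys.executable, "-c", "pass"])
print(baseline)
```

Running the same command under two different interpreters or builds and comparing both numbers gives a crude view of whether a speedup was paid for in memory.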

I'm also interested in making DWiki (the code behind Wandering Thoughts) faster, but again what I actually want is for it to be less resource-intensive on the systems it runs on, and that includes its memory usage. And while DWiki can run in a somewhat long-running mode, most of the time it runs as a short-lived CGI that just serves a single request. DWiki's long-running daemon mode also has some features that might make it play badly with PyPy; for example, it's a preforking network server, so PyPy is probably going to wind up doing a lot of duplicate JIT translation.

I think that all of this biases me towards up-front approaches like Cython and grumpy over on-the-fly ones such as PyPy. Up-front translation is probably going to work better for short-running programs (partly because I pay the translation overhead only once, and in advance), and the results are at least reasonably testable; I can build a translated version and see in advance whether the result is probably worth it. I think this is a pity because PyPy is likely to be both the easiest to use and the most powerful accelerator, but it's not really aimed at my use case.

(PyPy's choice here is perfectly sensible; bigger, long-running programs that are actively CPU intensive for significant periods of time are where there's the most payoff for speeding things up.)

PS: With all of this said, if I were serious here I would build the latest version of PyPy by hand and actually test it. My last look and the views I formed back then were enough years ago that I'm sure PyPy has changed significantly since then.

FasterPythonInterests written at 02:05:16

2017-04-03

Why modules raising core exceptions mostly hurts, not helps, your users

A while back I wrote an entry about how modules should never raise core Python exceptions. Recently via my Referer logs I found out that some people aren't convinced by my entry, so I feel like taking another run at this topic, this time approaching it from the perspective of someone using your module.

If I'm invoking some function or method from your module and want to trap errors, I need to write code like this:

import yourmod
def fred():
  try:
    res = yourmod.some_thing(10, 20)
  except SOMETHING as e:
    ...

In order to fill in that SOMETHING with the right exception, I need to consult your module's documentation. Given that I have to look this up, reusing a general exception saves me essentially nothing; at most I type a little less, and yourmod.Error versus RuntimeError is not exactly a compelling savings. If I just want to catch your explicitly raised errors, using RuntimeError (or a subclass of it) is not saving me any real effort.
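As a minimal sketch of what filling in SOMETHING looks like when a module has its own error class: everything down through `some_thing()` below stands in for the contents of the hypothetical `yourmod`, and the `RangeError` case is invented purely for illustration.

```python
# Stand-in for the hypothetical yourmod module.
class Error(Exception):
    """Base class for every error that yourmod raises deliberately."""


class RangeError(Error):
    """Invented example error: the arguments are in the wrong order."""


def some_thing(low, high):
    if low > high:
        raise RangeError(f"{low} is greater than {high}")
    return high - low


# The caller's side: one documented name covers all deliberate errors.
def fred(low, high):
    try:
        return some_thing(low, high)
    except Error as e:
        print(f"yourmod refused: {e}")
        return None
```

The caller still has to look up the name `Error` in the documentation, but it would have had to look up RuntimeError just the same.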

In practice, only catching explicitly raised errors is almost always what people using your module want to do, because of the various dangers of over-broad tries that I mentioned in my original entry and elsewhere. And if I really do want to catch all errors that come out of your code, I can already do that explicitly:

try:
  res = yourmod.some_thing(10, 20)
except Exception as e:
  ...

Notice that raising RuntimeError instead of your own error doesn't actually help me here. If I want to catch all possible errors that can happen during your module's execution, I need to go much broader than merely RuntimeError.
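A small sketch of why RuntimeError is not broad enough for this, assuming a hypothetical `some_thing()` that raises RuntimeError on purpose but also contains an ordinary bug:

```python
def some_thing(a, b):
    # Hypothetical module function that reuses a core exception.
    if a > b:
        raise RuntimeError("a must not exceed b")
    return b // a  # latent bug: ZeroDivisionError when a == 0


def narrow(a, b):
    """Catches only what the module raises on purpose."""
    try:
        return some_thing(a, b)
    except RuntimeError as e:
        return f"handled: {e}"


def broad(a, b):
    """Catches everything that can come out of the call."""
    try:
        return some_thing(a, b)
    except Exception as e:
        return f"handled: {type(e).__name__}"
```

Here `narrow(0, 20)` lets the ZeroDivisionError escape, because ZeroDivisionError is not a subclass of RuntimeError, while `broad(0, 20)` catches it. Catching 'everything' needs `except Exception` no matter which class the module chose for its deliberate errors.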

(There are valid cases for doing this broad catching, generally in top-level code that wants to ensure that no uncaught exceptions ever surface to the user.)

Which brings me around to the one case where it is sensible to raise standard errors, which is when you're writing code that stands in for standard Python code that raises these errors. This is the one case where using a standard error saves me from looking things up; in fact, using a standard error is generally essential. If you're writing a class that will get used instead of a standard dictionary, raising KeyError and so on is absolutely essential, because that makes your objects transparent substitutes for real dictionaries.

(This is in a sense a matter of API compatibility, where the exceptions that get raised are part of the API. Sometimes this can be explicit, as in the case of KeyError, but sometimes this is more implicit, in cases where raising and catching errors is less common.)
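To illustrate the dictionary case, here is a minimal sketch of a read-only, case-insensitive mapping. The `CaselessDict` class and the `count_missing()` helper are invented for this example. Because `__getitem__` raises KeyError, the `get()` and `in` support inherited from `collections.abc.Mapping` work, and so does existing caller code that was written against plain dictionaries.

```python
from collections.abc import Mapping


class CaselessDict(Mapping):
    """Hypothetical read-only mapping that ignores the case of string keys."""

    def __init__(self, items=()):
        self._data = {k.lower(): v for k, v in dict(items).items()}

    def __getitem__(self, key):
        try:
            return self._data[key.lower()]
        except KeyError:
            # Raise the same exception a real dict would, with the caller's key.
            raise KeyError(key) from None

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def count_missing(mapping, keys):
    """Caller code written with plain dicts in mind."""
    missing = 0
    for key in keys:
        try:
            mapping[key]
        except KeyError:
            missing += 1
    return missing


headers = CaselessDict({"Subject": "hello", "From": "someone"})
print(headers["SUBJECT"], headers.get("to"), "to" in headers)
print(count_missing(headers, ["subject", "to", "cc"]))
```

If `__getitem__` raised a module-specific error instead, `headers.get("to")` and `"to" in headers` would blow up rather than quietly reporting a missing key.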

Sidebar: my view on subclassing RuntimeError

I don't think you should do this because I don't think it's useful. If someone is catching RuntimeError today they probably intend to catch significant internal issues inside Python, not errors that your module happens to raise. Sweeping your module's errors into their except clauses is probably not helpful and may even be harmful.

For better or worse, I think that there is (or should be) a strong separation between 'internal' Python errors raised by the CPython interpreter and core Python modules, and module-specific errors raised by modules (yours, third party modules, and non-core modules in the standard library). These two sorts of errors don't really live in the same taxonomy and so putting one under the other is generally not going to help anyone.
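A minimal sketch of the sweeping-in problem, with invented names throughout: `run_guarded()` is caller code that catches RuntimeError with interpreter-level trouble in mind (RecursionError is a subclass of RuntimeError in Python 3.5 and later), and the two error classes show what happens to a module's errors depending on where they sit in the hierarchy.

```python
class SweptInError(RuntimeError):
    """Hypothetical module error that subclasses a core exception."""


class SeparateError(Exception):
    """Hypothetical module error with its own place in the hierarchy."""


def run_guarded(func):
    """Caller code meant to cope only with interpreter-level trouble."""
    try:
        return func()
    except RuntimeError:
        return "interpreter trouble"


def recurse():
    return recurse()  # eventually raises RecursionError


def raise_swept():
    raise SweptInError("bad input")


def raise_separate():
    raise SeparateError("bad input")
```

Both `run_guarded(recurse)` and `run_guarded(raise_swept)` come back as "interpreter trouble", so the module's ordinary error is silently mistaken for an internal problem. `run_guarded(raise_separate)` lets SeparateError propagate to whoever is actually prepared to handle it.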

RaisingCoreExceptionsNotHelpful written at 00:44:11


