2014-04-27
Thoughts about Python classes as structures and optimization
I recently watched yet another video of a talk on getting good
performance out of Python. One of the things it talked about was
the standard issue of 'dictionary abuse', in this case in the context
of creating structures. If you want a collection of data, the
equivalent of a C struct, things that speed up Python will do much
better if you say what you mean by representing it as a class:
    class AStruct(object):
        def __init__(self, a, b, c):
            self.a = a
            self.b = b
            self.c = c
Even though Python is a dynamic language and AStruct instances
could in theory be rearranged in many ways, in practice they generally
aren't and when they aren't we know a lot of ways to speed them up
and make them use minimal amounts of memory. If you instead just
throw them into a dictionary, much less optimization is (currently)
done.
(I suspect that many of these dynamic language optimizations could be applied to dictionary usage as well, it's just that people are hoping to avoid it for various reasons.)
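For contrast, the dictionary version of the same structure might look like this (a sketch; the make_astruct name is my invention):

```python
# The 'dictionary abuse' version of AStruct: the same three fields,
# but nothing tells the runtime that these keys are fixed, so much
# less optimization is (currently) applied to it.
def make_astruct(a, b, c):
    return {'a': a, 'b': b, 'c': c}

d = make_astruct(1, 2, 3)
total = d['a'] + d['b'] + d['c']
```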
My problem with this is that even small bits of extra typing tempt
me into unwise ways to reduce it. In this early example I both
skipped having an __init__ function (directly assigning attributes
on new instances instead) and wrote a generic function to do the
assignment (this has a better version). This is all well and good
in ordinary CPython,
but now I have to wonder how far one can go before the various
optimizers and JIT engines will throw up their hands and give up
on clever things.
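The generic-function trick looks roughly like this (populate is an illustrative name, not the actual code from that entry):

```python
# A generic initializer that assigns positional values to named
# attributes, avoiding a per-class __init__. This is exactly the
# sort of dynamic trick that optimizers may give up on.
def populate(obj, names, *values):
    for name, value in zip(names, values):
        setattr(obj, name, value)

class AStruct(object):
    pass

s = AStruct()
populate(s, ('a', 'b', 'c'), 1, 2, 3)
```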
(I suspect that the straightforward __init__ version is easiest
for optimizers to handle, partly because it's a common pattern that
attributes aren't added to an instance after __init__ finishes.)
It's tempting to ask for standard library support for simple
structures in the form of something that makes them easy to declare.
You could do something like 'AStruct = structs.create('a', 'b',
'c')' and then everything would work as expected (and optimizers
would have a good hook to latch on to). Unfortunately such a function
is hard to create today in Python, especially in a form that
optimizers like PyPy are likely to recognize and accelerate.
This is probably too petty and limited a wish.
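As a sketch of the awkwardness, here is one way such a factory could be written today with type(); the code is my illustration, and nothing about the result tells a JIT that instances have a fixed layout:

```python
def create(*fields):
    def __init__(self, *values):
        if len(values) != len(fields):
            raise TypeError("expected %d arguments, got %d"
                            % (len(fields), len(values)))
        for name, value in zip(fields, values):
            setattr(self, name, value)
    # type() builds the class dynamically; the field names are only
    # visible to us, not to the optimizer.
    return type('struct', (object,), {'__init__': __init__})

AStruct = create('a', 'b', 'c')
st = AStruct(1, 2, 3)
```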
PS: of course the simplest and easiest to optimize version today
is just a class that has a __slots__ declaration and no __init__.
PyPy et al are guaranteed that no other attributes will ever be set
on instances, so they can pack things as densely as they want.
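Concretely, that version is just:

```python
class AStruct(object):
    # __slots__ fixes the attribute set; instances get no __dict__,
    # so the fields can be stored in fixed, densely packed slots.
    __slots__ = ('a', 'b', 'c')

s = AStruct()
s.a, s.b, s.c = 1, 2, 3

# Any other attribute is refused outright:
try:
    s.d = 4
except AttributeError:
    pass
```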
2014-04-14
My reactions to Python's warnings module
A commentator on my entry on the warnings problem pointed out the existence of the warnings module as a possible solution to my issue. I've now played around with it and I don't think it fits my needs here, for two somewhat related reasons.
The first reason is that it simply makes me nervous to use or even take over the same infrastructure that Python itself uses for things like deprecation warnings. Warnings produced about Python code and warnings that my code produces are completely separate things and I don't like mingling them together, partly because they have significantly different needs.
The second reason is that the default formatting that the warnings module uses is completely wrong for the 'warnings produced from my program' case. I want my program warnings to produce standard Unix format (warning) messages and to, for example, not include the Python code snippet that generated them. Based on playing around with the warnings module briefly it's fairly clear that I would have to significantly reformat standard warnings to do what I want. At that point I'm not getting much out of the warnings module itself.
All of this is a sign of a fundamental decision in the warnings module: the warnings module is only designed to produce warnings about Python code. This core design purpose is reflected in many ways throughout the module, such as in the various sorts of filtering it offers and how you can't actually change the output format as far as I can see. I think that this makes it a bad fit for anything except that core purpose.
In short, if I want to log warnings I'm better off using general logging and general log filtering to control what warnings get printed. What features I want there are another entry.
2014-04-13
A problem: handling warnings generated at low levels in your code
Python has a well-honed approach for handling errors that happen at a low level in your code; you raise a specific exception and let it bubble up through your program. There's even a pattern for adding more context as you go up through the call stack, where you catch the exception, add more context to it (through one of various ways), and then propagate the exception onwards.
(You can also use things like phase tracking to make error messages more specific. And you may want to catch and re-raise exceptions for other reasons, such as wrapping foreign exceptions.)
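That catch-and-add-context pattern looks something like this (load_config and parse are illustrative names):

```python
def parse(text):
    # Stand-in low-level parser that knows nothing about files.
    if not text.strip():
        raise ValueError("empty input")
    return text.split()

def load_config(fname):
    # Intermediate layer: catch the low-level error, add the file
    # name as context, and propagate it onwards.
    try:
        with open(fname) as f:
            return parse(f.read())
    except ValueError as e:
        raise ValueError("%s: bad configuration: %s" % (fname, e))
```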
All of this is great when it's an error. But what about warnings? I recently ran into a case where I wanted to 'raise' (in the abstract) a warning at a very low level in my code, and that left me completely stymied about what the best way to do it was. The disconnect between errors and warnings is that in most cases errors immediately stop further processing while warnings don't, so you can't deal with warnings by raising an exception; you need to somehow both 'raise' the warning and continue further processing.
I can think of several ways of handling this, all of which I've sort of used in code in the past:
- Explicitly return warnings as part of the function's output. This
is the most straightforward but also sprays warnings through your
APIs, which can be a problem if you belatedly find that you need to
add warnings to existing code.
- Have functions accumulate warnings on some global or relatively
global object (perhaps hidden through 'record a warning' function
calls). Then at the end of processing, high-level code will go
through the accumulated warnings and do whatever is desired with
them.
- Log the warnings immediately through a general logging system that you're using for all program messages (ranging from simple to very complex). This has the benefit that both warnings and errors will be produced in the correct order.
The second and third approaches have the problem that it's hard for intermediate layers to add context to warning messages; they'll wind up wanting or needing to pass the context down to the low level routines that generate the warnings. The third approach can have the general options problem when it comes to controlling what warnings are and aren't produced, or you can try to control this by having the high level code configure the logging system to discard some messages.
I don't have any answers here, but I can't help thinking that I'm missing a way of doing this that would make it all easy. Probably logging is the best general approach for this and I should just give in, learn a Python logging system, and use it for everything in the future.
(In the incident that sparked this entry, I wound up punting and just
printing out a message with sys.stderr.write() because I wasn't in a
mood to significantly restructure the code just because I now wanted to
emit a warning.)