2013-12-31
Reversing my view on Python 3 for new general code: avoid it
Just about exactly two years ago I wrote Python3NewCode, in which I waved my hands a bunch and then said:
Ignoring [my handwaved issues] as ultimately unimportant, I don't think there's any reason not to write new, non-sysadmin code in Python 3.
I take all of that back. In retrospect I was being too nice to Python 3 back then and I was wrong to do so. Here is my new view: you should avoid Python 3 even for new code because there is very little to gain from writing in Python 3 and significant downsides to doing so.
(Part of those downsides is that the things that I so blithely handwaved away did not in fact go away and remain as real issues today, two years after I wrote that entry.)
The spark for this reassessment is twofold. First, I have not actually written any of my new Python code in Python 3 (for all sorts of reasons I'm not going to belabor). Second, Alex Gaynor recently wrote 'About Python 3' and this got me thinking about the whole situation and my feelings.
The big problem with Python 3, per Gaynor's article, is that the Python 3 ecosystem is a ghost town. Regardless of whether or not you have Python 3 available on any particular system, the reality is that almost no one is writing Python 3 code. The practical Python ecosystem, the one where people will answer your questions and develop interesting new modules and interesting Python things, is the Python 2 ecosystem. Useful deployment options are in practice Python 2 ones. If you choose to write in Python 2, you get to take advantage of all of this. If you write in Python 3, not so much. In exchange for giving up all of this you get very little. Most people will find no killer, must-have feature in Python 3 to compensate for the risks and problems they are taking on by using it.
(There are some modules that are only available for Python 3. My impression is that for the most part they come from the core Python developers, precisely because all outside people who are developing modules understand that most of the actual Python programming happens in Python 2.)
Given the complete shambles of the Python 2 to Python 3 transition and the resulting uncertainty about what's going to happen in the longer term, I can't recommend starting even greenfield development in Python 3 unless you have some quite strong reason for it (ie, something important that you can do in Python 3 but not in Python 2). Certainly I reverse my previous position; there's no strong reason to write new code in Python 3 and some good reasons to avoid doing so. Python 2 is here and good today. Even today, Python 3 is still a 'someday maybe in the future' thing.
(At this point I'm not sure if a genuine Python 2 to Python 3 transition will ever happen. The really pessimistic future is that Python 2 shambles on as an increasingly creaky zombie for the next decade, Python 3 effectively fails and becomes irrelevant, and as a result Python itself is abandoned for other languages.)
Link: Alex Gaynor's 'About Python 3'
Alex Gaynor just wrote About Python 3, which is not a bright and happy assessment of the state of Python 3. He says many things that I agree with wholeheartedly, from a position of authority and with good writing. He also crystallizes a number of things for me, such as the following:
Since the time of the Python 3.1 it's been regularly said that the new features and additions the standard library would act as carrots to motivate people to upgrade. Don't get me wrong, Python 3.3 has some really cool stuff in it. But 99% of everybody can't actually use it, so when we tell them "that's better in Python 3", we're really telling them "Fuck You", because nothing is getting fixed for them.
Yes. This. Wholeheartedly this. Every Python 3 only feature or module or improvement might as well be on the far side of the moon as far as it goes for me using it for anything meaningful.
And what he says at the end, too. Everything that the core Python developers are currently doing is completely irrelevant to what I do with Python and will probably be for at least five more years and perhaps as much as a decade. At this point we are living on different planets.
By the way, significant problems surfacing with Python 2 and not getting fixed would not get me to migrate to Python 3. I cannot migrate to Python 3 at this point because it is simply not present on platforms that I use. Very soon my best alternative to Python 2 will probably be Go, because at least I'll be able to compile static binaries for just about everything I care about and push them to the target machines.
(Using Go will suck compared to using Python for the problems that I use Python for, but it will suck less than building and installing my own version of Python 3.)
This is a drum that I have been banging on for some time so of course I'm happy to see it getting attention from people with influence, instead of just my lone voice in a corner. I'd like to think that people like Alex Gaynor speaking up will create actual change but I don't expect that to happen at this point. The core Python developers have to be very invested in their vision of Python 3 and its transition by now; a significant reversal would be very surprising because people almost never reverse such deeply held positions regardless of what they are.
2013-12-12
Some observations from playing with PyPy on DWiki
DWiki is the software behind Wandering Thoughts. It makes a convenient test platform for me to experiment with PyPy because it's probably the most CPU-intensive Python code I have anything to do with and also the potentially longest-running program I have, which turns out to be very important for PyPy performance. In the process of doing this today I've wound up with some observations.
(All of these are against PyPy 2.1.0 on a 64-bit Fedora 19 machine.)
My first discovery was that it can be relatively hard to make an already relatively optimized program descend into true CPU crunching of the sort that PyPy theoretically accelerates drastically. DWiki has significant amounts of caching that try to avoid (theoretically) expensive operations like turning DWikiText into HTML, and in normal operation these caches are hit all of the time. PyPy doesn't seem to be able to do anything too impressive with what's left.
(In reading PyPy performance documentation I see that I'm probably also getting hit by bad PyPy performance on cPickle, as DWiki's caches are pickle-based.)
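To make what I mean by 'pickle-based caches' concrete, here is a minimal hypothetical sketch of the general shape of such a cache; this is not DWiki's actual code and the names are made up. The relevant part is that every cache hit goes through a cPickle.load(), which is exactly the operation that's reportedly slow on PyPy:

# A hypothetical sketch of a pickle-based disk cache, not DWiki's
# actual code. It assumes keys are filesystem-safe strings and that
# cachedir already exists.
import cPickle
import os

class DiskCache(object):
    def __init__(self, cachedir):
        self.cachedir = cachedir

    def _path(self, key):
        return os.path.join(self.cachedir, key)

    def get(self, key):
        # Every cache hit is a cPickle.load(); returns None on a miss
        # or on a damaged cache file.
        try:
            with open(self._path(key), "rb") as fp:
                return cPickle.load(fp)
        except (IOError, EOFError, cPickle.UnpicklingError):
            return None

    def put(self, key, value):
        with open(self._path(key), "wb") as fp:
            cPickle.dump(value, fp, cPickle.HIGHEST_PROTOCOL)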
When I bypassed some of this caching so that my Python code was doing a lot more work, I got confirmation of what I already sort of knew: PyPy required a lot of warmup before it performed well. And by 'performed well' I mean 'ran at least as fast as CPython'. In my code on a very low level operation (simply converting DWikiText to HTML, without any caches), PyPy needed hundreds of repeats of warmup before it crossed over to being faster than CPython. This general issue is common for tracing JITs, but I didn't expect it to be so large for PyPy. CPython has flat performance, of course. The good news is that on this low level task PyPy does eventually wind up faster than CPython (although it's hard to say how much faster; my test framework may over-specialize the generated code at present).
(This warmup issue has significant implications for preforking network servers. You likely need to have any given worker process handle quite a lot of requests before PyPy is at all worth it, and that raises concerns with slow memory leaks and so on.)
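If you want to see this warmup effect for yourself, here is a crude sketch of the kind of measurement I mean; render_page and some_page are hypothetical stand-ins for whatever CPU-heavy operation and input you care about:

# A rough harness for watching JIT warmup: time successive batches of
# calls and watch the per-call time. On a tracing JIT like PyPy it
# should drop over the first batches; on CPython it stays flat.
import time

def time_batches(func, arg, batches=20, batch_size=50):
    for i in range(batches):
        start = time.time()
        for _ in range(batch_size):
            func(arg)
        elapsed = time.time() - start
        print "batch %2d: %.4f ms/call" % (i, elapsed * 1000.0 / batch_size)

# Hypothetical usage, eg DWikiText-to-HTML conversion with the caches
# bypassed:
#   time_batches(render_page, some_page)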
So far I have only talked about CPU usage and haven't mentioned memory usage. There's a good reason for that: for DWiki, PyPy's memory usage is through the roof. My test setup consistently has CPython at around 13 MB of active RAM (specifically RSS). PyPy doing the same thing takes anywhere from 70 MB to 130 MB depending on exactly what I'm testing. In many situations today this is a relative killer (especially again if you're dealing with a preforking network server, since PyPy memory usage seems to grow over time and that implies every child worker process will have its own copy).
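If you want to watch RSS from inside a running program on Linux, one quick way is to read it out of /proc/self/status. This is a small sketch, not what my actual test setup does:

# Read this process's resident set size by parsing /proc/self/status;
# the kernel reports VmRSS in kB. Linux-specific, obviously.
def current_rss_kb():
    with open("/proc/self/status") as fp:
        for line in fp:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None

print "current RSS: %s kB" % current_rss_kb()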
My overall observation from this is unsurprising and unexciting, namely that PyPy is not a drop-in magic way of speeding up my code. It may work, it may not work, it may require code changes to work well (and right now the tools for finding what code to change are a bit lacking), and I won't know until I try. Unfortunately all of this uncertainty reduces my interest in PyPy.
(I have seen PyPy clearly beat CPython, so it can definitely happen.)
2013-12-10
My current view of PyPy
In a comment on my entry about Go as a 'Python with performance' for me, I was asked about my views on using PyPy for this. I flailed around a bit in my reply there and since then I've been thinking about it more, so it's time to go on at more length.
The simple version is that today I think of PyPy as perhaps a way to make some Python programs go faster but not as a way to write fast Python programs. If I had an existing Python program that fit what I think of as the PyPy profile (long-running, generally does basic operations, and I'm indifferent to memory usage) and I absolutely needed it to go faster, I'd consider feeding it to PyPy to see what happened. If it speeds up without exploding the memory usage, I've won and I can stop. If that doesn't work, well, time for other measures. However, PyPy is too unpredictable for me to be able to write Python code that I can count on it speeding up dramatically, especially if I also want to control the memory usage and so on.
There are other pragmatic issues with using it. For a start, the version of PyPy available to me through distribution packages varies widely from system to system here, and with that variance I can expect an equally large performance variance. The current version of PyPy is 2.2.1, while Fedora 19 has 2.1.0 and Ubuntu 12.04 LTS is back at 1.8. Beyond that, a certain number of interesting Python environments just don't work with PyPy; for example, I can't use PyPy to speed up parts of a Django app deployed through mod_wsgi (not that the app is likely to have a performance bottleneck anyway; that's just an illustration).
There are also two serious problems with PyPy today that make it far less interesting for me (at least as of the Fedora 19 version, PyPy 2.1.0). The first is what I alluded to above; PyPy has a significant startup delay before it starts speeding up your program and thus doesn't really speed up short-running things. I'm pretty sure that if I had a Python program that ran in a second, PyPy wouldn't speed it up very much. The second is that PyPy quietly explodes on common Python idioms under some circumstances.
For an example that I have personally run into, consider:
data = open("/some/file", "r").read()
This is a not uncommon Python idiom to casually read in a file. If you try this in a PyPy-run program in any sort of situation where you do this repeatedly, you'll probably wind up with a 'too many open files' error before too long. In straight (C)Python the open file is immediately garbage collected at the end of the .read(); in PyPy, it seems to hang around (presumably until a full garbage collection run) and with it the open file descriptor. Boom.
Yes, yes, you say, this is bad style. The reality is that this 'bad style' is common in Python, as are other examples where code assumes that dropped or out-of-scope objects will be immediately garbage collected. I don't want to spend my time troubleshooting mysterious problems in otherwise reliable long-running Python programs that only appear when I run them under PyPy. Not running them under PyPy is by far the easier solution, even if it costs me performance.
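For the record, the version of this idiom that behaves deterministically under both CPython and PyPy is the explicit one:

# The with block closes the file (and its file descriptor) the moment
# the block exits, instead of relying on the garbage collector to get
# around to it.
with open("/some/file", "r") as fp:
    data = fp.read()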
(In my opinion non-deterministic garbage collection is actually a serious problem, but that's another entry.)