Python 3 has very little benefit for ordinary Python programmers
Sometimes an incompatible transition is strongly justified. In some cases the old code and the old ways were actively dangerous to people because they were too easy to misuse (or were actually basically impossible to use safely); in other cases the baggage of the old was making it essentially impossible to add important new features that people actively wanted.
The Python 3 transition is not one of these. It was almost entirely about removing warts in the language, and here is the thing: ordinary programmers don't really care about language warts. Every language has some warts and in practice those warts rarely get in the way of doing work in the language; people work around them if necessary and often don't even notice them. Removing these warts from Python was (as far as I can tell) not required to make other progress in the language or the standard library. They were just things about the language that irritated the core Python developers.
(Hence, among other things, the comparison of Python 3 to XHTML.)
The big exception to this is also the most prominent and consequential change in Python 3, that of making strings into Unicode by default. But as Python 2's separate 'unicode' string type shows, Unicode support itself did not require an incompatible new language; what Python 3 changed was the default, and Python 2.7 already let you opt part of the way in with 'from __future__ import unicode_literals'.
(In fact anything that is now covered by a 'from __future__' import is pretty much by definition something that could have been phased into Python 2 rather than delivered only through a clean break.)
Note that this is not the same thing as saying that Python 3 has not brought new and worthwhile things to Python programmers. It certainly has. But as far as I can tell the reason they are only in Python 3 is a choice on the part of the Python developers, not a requirement.
(This idea is not unique to me by any means and I've touched on it in passing before, but today I want to state it explicitly.)
The core issue in the Python 3 transition (from my view)
In response to my entry about how Python 3 has always made me kind of angry, a commentator asked an interesting question.
As I read it, this question contains a hidden assumption: that you are going to make an abrupt and thus incompatible transition. I don't think that there's any good way to do this in a language and I don't know of any languages that have managed it gracefully once they got a significant number of users. An incompatible transition by definition creates not one language but two closely related languages, possibly somewhat translatable from one to the other.
(It's theoretically possible to successfully do a transition like this; what you need is a tool that mechanically rewrites old code to be new, working code. Go actually has such a thing for many language and library changes. 2to3 was not such a tool.)
Such transitions are almost always the result of choices, or really one choice, that of the developers choosing to walk away from the old code. If you refuse to do any work to have a graceful transition then of course you don't get one. This is more or less what I remember the Python developers doing, at least initially; Python 2.7 people had a few limited bones thrown to them but not really all that many. In theory I think it would have been perfectly possible to do a much more graceful transition between Python 2 and Python 3. It just would have taken more time and required adding more things to Python 2 (and probably to its standard library).
(For 'code' one should really read 'language design'. I don't think that the actual CPython code base underwent any particularly major upheavals and rewrites between Python 2.7 and Python 3, and all of the issues that the Python developers say prompted Python 3 were about historical warts in the language.)
There's more I could say on this but in the end none of it matters. The Python developers made a decision that they were not interested in doing further work on Python 2.7 and users of Python could more or less lump it. If the developers are not interested in a graceful transition, you do not get one.
The consequences of importing a module twice
Back when I wrote about Python's relative import problem, I mentioned that only actually importing a module once can be important due to Python's semantics. Today I feel like discussing what those consequences are and how much they can matter.
The straightforward thing that goes wrong if you manage to import a module twice (under two different names) is that any code in the module gets run twice, not once. Modules that run active code on import assume that this code is only going to be run once; running it again may result in various sorts of malfunctions.
At one level, modules that run code on import are relatively rare because people understand it's bad form for a simple import to have big side effects. At another level, various frameworks like Django effectively run code on module import in order to handle things like setting up models and view forms and so on; it's just that this code isn't directly visible in your module because it's hiding in framework metaclasses.

But this issue is a signpost to the really big thing: function and class definitions are executable statements that are run at import time. The net effect is that when you import a module a second time, the new import gets a completely distinct set of functions, classes, exceptions, sentinel objects, and so on. They look identical to the versions from the first import, but as far as Python is concerned they are completely distinct; 'except' clauses, isinstance() checks, and 'is' comparisons involving one import's objects will simply not match objects that came from the other import.
(This is the same effect that you get when you use reload() on a module: the reloaded module gets fresh function and class objects, and any references to the old ones that other code is still holding on to no longer match them.)
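To make the distinct-objects effect concrete, here is a small self-contained sketch; the module source, the file name, and the 'AppError' class are all made up for the example. It loads the same file under two different module names and shows that an exception class from one copy doesn't catch exceptions raised through the other:

import importlib.util
import os
import tempfile
import textwrap

# A made-up module that defines an exception class at import time.
SRC = textwrap.dedent("""\
    class AppError(Exception):
        pass

    def fail():
        raise AppError("boom")
""")

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "errmod.py")
    with open(path, "w") as f:
        f.write(SRC)

    def load_as(name):
        # Load the file as a module under the given (dotted) name.
        spec = importlib.util.spec_from_file_location(name, path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        return mod

    m1 = load_as("errmod")
    m2 = load_as("pkg.errmod")    # the same file, imported a second time

    print(m1.AppError is m2.AppError)    # False: two distinct classes
    try:
        m1.fail()
    except m2.AppError:
        print("caught by the other copy")           # never happens
    except m1.AppError:
        print("only the matching copy catches it")  # this is what you get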
However, my guess is that this generally won't matter. Most Python code uses duck typing and the two distinct classes are identical as far as that goes. Use of things like specific exceptions, sentinel values, and imported classes is probably going to be confined to the modules that directly imported the dual-imported module and thus mostly hidden from the outside world (for example, it's usually considered bad manners to leak exceptions from a module that you imported into the outside world). In many cases even the objects from the imported module are going to be significantly confined to the importing module.
(One potentially bad thing is that if the module has an internal cache of some sort, you will get two copies of the cache and thus perhaps twice the memory use.)
Python's relative import problem
Back in an earlier entry I bemoaned the fact that Python's syntax for relative imports ('from . import modname' and so on) only works for code that is actually part of a package; a program you run directly can't use it, even if it lives right next to the modules it wants.
Unfortunately for me, I suspect that this restriction is not arbitrary. The problem that Python is probably worrying about is importing the same submodule twice under different names. The official Python semantics are that there is only one copy of a particular (sub)module and its module level code is run only once, even if the module is imported multiple times; imports after the first one simply return a cached reference.
(These semantics are important in a number of situations that may not be obvious, due to Python's execution model.)
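To make the caching semantics concrete, here is a tiny sketch (using the standard json module purely as a convenient example) showing that a second import of the same name just hands back the cached module object from sys.modules:

import sys

import json
first = json

import json           # importing the same name again
second = json

print(first is second)                 # True: the very same module object
print(sys.modules['json'] is first)    # True: it came straight from the cache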
However, Python has opted to do this based on the apparent (full) module name, not based on (say) remembering the file that a particular module was loaded from and not reloading the file. When you do a relative import inside a module, Python knows the full name of the new submodule you're importing (because it knows the full, module-included name of the code doing the relative import). When you do a relative import outside a module, Python has no such knowledge but it knows that in theory this code is part of a module. This opens up the possibility of double-importing a submodule (once under its full name and once under whatever magic name you make up for a non-module relative import). Python opts to be safe and block this by refusing to do a relative import unless it can (reliably) work out the absolute name.
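As a small illustration of that refusal, here is roughly what happens under Python 3 if a file that isn't part of any package tries a relative import; the 'helpers' sibling module here is hypothetical, and the point is that the import fails outright instead of Python guessing at a name:

# notapackage.py, run directly as 'python notapackage.py'
try:
    from . import helpers    # hypothetical sibling module
except ImportError as exc:
    # e.g. "attempted relative import with no known parent package"
    print(exc)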
(There are still plenty of ways to import a module twice but they all require you to actively do something bad, like add both a directory and one of its subdirectories to your Python path. Sadly this is quite easy because Python will automatically add things to the Python path for you under some common circumstances.)
My sysadmin view of Python virtualenvs
It all started with a tweet from Matt Simmons about what can go wrong with virtualenvs.
There are certainly many things that can go wrong with virtualenvs, but there are also many things that can go wrong with servers and OS packages (as I tweeted, you can have an obscure one-off server just as easily as you can have an obscure one-off virtualenv). My views on this are that there are both drawbacks and advantages to virtualenvs and to lesser solutions (like installing your own copies of packages outside of the system Python area).
There are three drawbacks of virtualenvs and similar setups. First and foremost, you (the person building the virtualenv) have just become not a sysadmin but an OS distribution vendor in that it is now your job to track security issues and bugs in everything in use in the virtualenv, from the version of Python on up. If you are not plugged into all of these, Matt Simmons is correct and your virtualenv may be a ticking time bomb of security issues.
The second drawback is common to anything that installs packages outside of the standard packaging system; it is the lack of system-wide visibility into what packages (and what versions of them) are installed and in use on the system. If someone hears that there is an important issue with version X of package Y, having a horde of virtualenvs means that there is no simple way to answer the question of 'are we running that?' Related to this is the issue that you can't just update everyone at once by installing a system package update.
(It follows from these two issues that developers absolutely cannot just bundle up a virtualenv, throw it over the wall to operations, and then forget about it. If you do that you're begging for bad problems down the line.)
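One way to claw back some of that visibility is to inventory your virtualenvs yourself. The following is only a sketch and the /srv/venvs layout is an assumption about where the virtualenvs live; it walks each one's site-packages and prints the installed distributions so that 'are we running version X of package Y?' becomes a grep:

import glob
import importlib.metadata as md    # needs Python 3.8 or later

# Assumed layout: each virtualenv lives under /srv/venvs/<name>/
for sp in sorted(glob.glob("/srv/venvs/*/lib/python*/site-packages")):
    print(sp)
    for dist in md.distributions(path=[sp]):
        print("  %s==%s" % (dist.metadata["Name"], dist.version))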
The final issue is that if you depend on virtualenvs you may run into problems integrating your software into environments that basically must use the system version of Python. One example is if you develop in a virtualenv and then decide that you want to deploy with Apache's mod_wsgi (perhaps because it is unexpectedly good). Presumably if you start down the virtualenv path you've already thought about this.
Set against this are two significant advantages. The first advantage is that you get the version of everything that you want without having to fight against the system package management system (which leads to serious problems). This is especially useful if you're using one of the OS distributions with long term support, which in practice means that they have obsolete versions of pretty much everything. The second advantage is that you are not at risk of a package update from your OS distribution blowing up your applications. How much of a real risk you consider this depends on how much trust you place in your OS distribution vendor and what sort of changes they tend to make. Some OSes will happily do major package version changes as the 'simplest' way to fix security issues (or just because a new major version came out and should be compatible); some are much more conservative. With virtualenvs you're isolated from this and you can also take a selective, per-application approach to updates, where some applications are okay with the new version (or are sufficiently unimportant that you'll take the risk) and other applications need to be handled very carefully with a lot of testing.
(I haven't used a full-blown virtualenv, but our single Django app uses a private version of Django because the version of Ubuntu LTS we originally deployed it on had a too-old system version. And yes, tracking Django security updates and so on is kind of a pain.)
Python's data structures problem
Python has a problem with data structures. Well, actually it has two, which are closely related to each other.
The first problem is what I illustrated yesterday, namely that there is generally no point in building nice sophisticated data structures in your Python code because they won't perform very well. Priority heaps, linked lists, all sorts of trees (B+, red-black, splay, etc), they're all nifty and nice but generally there's no point in even trying. Unless things are relatively unusual in your problem you'll be just as fast (if not faster) and write less code by just using and abusing Python's native data types. So what if dictionaries and lists (and a few other things that have been implemented at the C level) aren't quite an exact fit for your problem? They're the best you're going to get.
(I've written about this before, although that was more the general version instead of a Python-focused one.)
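As a small illustration of leaning on the built-in machinery instead of rolling your own structure, a priority queue in ordinary Python code is usually just a plain list plus the standard heapq module (which is C-accelerated in CPython), not a hand-built heap class:

import heapq

pq = []                                  # an ordinary list is the heap storage
heapq.heappush(pq, (3, "low priority"))
heapq.heappush(pq, (1, "urgent"))
heapq.heappush(pq, (2, "normal"))

while pq:
    prio, item = heapq.heappop(pq)       # always yields the smallest priority first
    print(prio, item)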
In theory it might make sense to implement your own data structures anyway because they can efficiently support unusual operations that are important to you. In practice my impression is that the performance difference is generally assumed to be large enough that people don't bother doing this unless simpler and more brute force versions are clearly inadequate.
The second problem is that this isn't really true. Data structures implemented in Python code under CPython are slow but other Python implementations can and do make them fast, sometimes even faster than a similar data structure kludged together with native types. But almost everyone writes for CPython and so they're not going to create these alternate data structures that (eg) PyPy could make fast. In fact sometimes they may kludge up data structures that PyPy et al have a very hard time making go fast; they're fine for CPython but pessimal for other environments.
My view is that this matters if we want Python to ever get fast, because getting fast is going to call for data structures that are amenable to optimization instead of what I've seen called hash-map addiction. But I have no (feasible) ideas for how to fix things and I'm pretty sure that asking people to code data structures in CPython isn't feasible until there's a benefit to it even in CPython.
(This is in part a slow reaction to Jason Moiron's What's Going On.)
Classic linked lists versus Python's list (array) type
For reasons beyond the margins of this entry, let's consider a classic linked list implemented in Python. Because I feel like a traditionalist today we'll build it out of Lisp-style cons cells, using about the most minimal and lightweight implementation we can manage:
class Cons(object):
    __slots__ = ('car', 'cdr')

    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr

    def __str__(self):
        return '(%s, %s)' % (self.car, self.cdr)
Now let's ask a question: how does the memory use and performance of this compare to just using a Python list (which is not a linked list but instead an array)? I'm going to look purely at building a 1,000 element list element-by-element and I'm going to allow each implementation to append in whatever order is fastest for it. The code:
from itertools import repeat

def native(size):
    # Build an ordinary Python list by appending at the end.
    l = []
    for _ in repeat(None, size):
        l.append(0)
    return l

def conslst(size):
    # Build a Cons-based list by prepending at the front.
    h = Cons(0, None)
    for _ in repeat(None, size):
        h = Cons(0, h)
    return h
On a 32-bit machine the 1,000 element native list takes 4,512 bytes. A Cons cell takes 28 bytes (not counting the size of whatever its car points to), so a 1,000 element Cons-based list takes 28,000 bytes just for the cells, a bit over six times as much memory as the native list.
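If you want to reproduce the memory numbers, sys.getsizeof on the Cons class and native() function from above is enough for a rough check; the exact figures depend on whether you have a 32-bit or 64-bit build and on the CPython version, so treat this as a sketch rather than canonical numbers:

import sys

cell = Cons(0, None)
print(sys.getsizeof(cell))    # per-cell size (28 bytes on the 32-bit build here), not counting what it points to

lst = native(1000)
print(sys.getsizeof(lst))     # the list object plus its array of element pointers, not the elements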
As for timings, the Cons-based list construction for a thousand elements is about a factor of five worse than Python native lists on my test machine (if I have GC running). Creating the Cons objects appears to be very cheap and what matters for the runtime is all of the manipulation that goes on around them. Creating shorter lists is somewhat better, creating longer ones is worse.
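The timing comparison is equally easy to redo with the standard timeit module; this is just a sketch of one way to measure it (assuming native() and conslst() are defined in the main script), and the exact factor will vary with your machine and Python version:

import timeit

setup = "from __main__ import native, conslst"
print(timeit.timeit("native(1000)", setup=setup, number=1000))
print(timeit.timeit("conslst(1000)", setup=setup, number=1000))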
(Since I checked just to be sure, I can tell you that a version of Cons without __slots__ does noticeably worse; every instance then carries a __dict__, which pushes the per-cell memory use up significantly.)
Right now some of my readers are rolling their eyes and telling me that of course the Cons-based list loses here, because Python's native list type is implemented in C while Cons cells are ordinary Python objects, so this was never going to be a close contest. That's true, and it's exactly the point: in CPython, a hand-built data structure generally can't compete with the built-in types.
Sidebar: how to make the