2012-06-29
The magnitude of the migration to Python 3, illustrated
I was just reading Nick Coghlan's Python 3 Q & A and ran across this:
Support in enterprise Linux distributions is also a key point for uptake of Python 3. Canonical have already shipped a supported version (Python 3.2 in Ubuntu 12.04 LTS) with a stated goal of eliminating Python 2 from the live install CD for 12.10. A Python 3 stack has existed in Fedora since Fedora 13 and has been growing over time, but Red Hat has not made any public statements regarding the possible inclusion of that stack in a future version of RHEL.
To give some other perspectives on the transition, I'll note that Ubuntu already has a tentative plan to move their Python 2 stack into the community supported universe repositories and only officially support Python 3 for their 14.04 release.
(It should be noted here that based on the release schedule, Ubuntu 14.04 is currently scheduled for April 2014, with a feature freeze probably happening around six months earlier.)
I'm afraid that my reaction to this involves a certain amount of grim laughter. To explain why, let's talk about the magnitude of the effort involved in making this sort of transition for a Linux distribution; in particular, I am going to look at Ubuntu 12.04 and Fedora 17.
On an Ubuntu 12.04 machine more or less configured as a desktop and with a decent package selection installed, there are around 240 executables installed that use Python 2, from around 90 different Ubuntu packages, with over a hundred programs that are directly run by people (ie are in /bin, /sbin, /usr/bin, or /usr/sbin). Now some of these are part of Python or otherwise tied to it but there are a fair number that are not, including some large and important packages like Mercurial.
On my Fedora 17 workstation (which has quite a number of packages) there are around 310 executables installed that use Python 2, from over 130 packages; over 200 programs are in /usr/bin or /usr/sbin. Again there are large and significant packages involved, including a fair number of important system management packages (especially yum, in many ways the core package management system for Fedora).
But the bad news is not done yet. On both Ubuntu 12.04 and Fedora 17, there are no non-Python packages that use Python 3. Zero. Zip. None. The only Python programs that use Python 3 come from Python 3 itself. And if you want another bit of bad news, neither Fedora 17 nor Ubuntu 12.04 even install Python 3 by default. New, stock installed systems are Python 2 only. This is not a migration that is in progress; this is a migration that hasn't even started yet.
(By the way, as far as I am concerned this means that Ubuntu 12.04 can't be fairly described as 'shipping Python 3', merely as having it available.)
Sidebar: why not installing Python 3 by default matters
Imagine a new Python user on a Fedora or Ubuntu system. The most convenient version of Python for them to start using is one that's already on their system (especially the one that's called 'python'). The more you need to know and the more you need to do to use another version, the less likely you are to use it or try it out. Right now, as a new Python user you have to go out of your way to know about Python 3, install Python 3, and then use it. By contrast you can use Python 2 by just typing 'python'.
(Among other things this affects people who are casually curious about how things are in Python 3. Casual curiosity doesn't survive work.)
Or in short: installing Python 3 by default makes it enticing. Not installing Python 3 by default makes it unenticing.
More about my issues with DTrace's language
In his comment on my entry about why we haven't taken to DTrace, Brendan Gregg wrote in part:
It's been mentioned a few times, but I suspect it would be possible to create a higher level language ("D++") that speaks to libdtrace. An advantage of D being low level is that the user is conscious of how the system is actually getting traced, in the same way that C is low level. [...]
I don't think that this is the case. In fact I think it works the other way around; I doubt very much that people can go from D's limitations to any real understanding of how the system is traced, but if you know how DTrace is implemented you can see the bones of this implementation underneath some of D's oddities.
To start with, I will agree that making some things clear is useful and even important. For example, I think that access to kernel data and variables should look different than access to user level data (and in a way that makes access to user level data look more expensive). What I object to is things that D makes pointlessly difficult, things where it doesn't support the obvious simple way of doing whatever and forces you to be indirect. The shining example of this is conditionals. D does not have any form of an if statement that you can use in the action that fires for a particular probe; however, probes themselves can be conditional, based on an expression. So you're left to fake an if by writing your entire probe action twice (and yes, I've done this in DTrace code).
The story I remember hearing about why this limitation exists is that the DTrace implementation doesn't want to be dynamically allocating output buffer space as a probe action executes; it wants to allocate the space once, before the probe's action starts. Well, fine, but if this is the reason you can deal with it in if-using D code by allocating the maximum amount of space the code might need if it followed the most pessimistic, space consuming path through the conditionals. Alternately, you could transform ifs by automatically creating multiple specific probes with probe conditions. Forcing DTrace users to duplicate their code in order to do this by hand is perverse, or at least an excessive focus on being literal about how DTrace's internals behave.
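To make the 'transform ifs into multiple probes' idea concrete, here is a toy sketch in Python that emits the D text you would otherwise write by hand. The helper split_if() is hypothetical and is not part of DTrace or of any real tool; it only handles a single top-level if/else, and it assumes the 'then' body doesn't change anything that the condition depends on (otherwise the negated predicate on the second clause could also come out true).

```python
def split_if(probe, cond, then_body, else_body=None):
    """Rewrite 'if (cond) then_body else else_body' for one probe as
    separate D clauses guarded by predicates.

    Hypothetical helper for illustration; bodies are lists of D
    statements given as strings.
    """
    def clause(predicate, body):
        lines = "\n".join("    " + stmt for stmt in body)
        return "%s\n/%s/\n{\n%s\n}\n" % (probe, predicate, lines)

    out = clause(cond, then_body)
    if else_body:
        # The else branch becomes a second clause on the same probe,
        # guarded by the negated condition.
        out += "\n" + clause("!(%s)" % cond, else_body)
    return out


if __name__ == "__main__":
    print(split_if("syscall::read:entry", "arg2 > 1024",
                   ["@big = count();"],
                   ["@small = count();"]))
```

The output is two clauses for syscall::read:entry, one with the predicate /arg2 > 1024/ and one with /!(arg2 > 1024)/, which is exactly the sort of duplication that D currently makes you type out yourself.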
(You can argue that it saves users from themselves under some circumstances, for example if a rare condition requires a bunch more buffer space than the common ones. But this is an optimization and generally a premature one.)
Now, this story is clearly not the complete explanation given that DTrace has plenty of things that certainly look like they create variable sized output (including an outright ternary ?: operator).
This pretty much illustrates my point, in that running into this D
constraint hasn't made me any better informed than before about how the
system is actually traced. It's still a black box, it's just a more
frustrating black box.