Some notes on lifting Python 2 code into Python 3 code
We have a set of Python programs that are the core of our ZFS spares handling system. The production versions are written in Python 2 and run on OmniOS on our ZFS fileservers, but we're moving to ZFS-based Linux fileservers, so this code needed a tune-up to cope with the change in environment. As part of our decision to use Python 3 for future tools, I decided to change this code over to Python 3 (partly because I needed to write some completely new Python code to handle Linux device names).
This is not a rewrite or even a port; instead, let's call it lifting
code from Python 2 up to Python 3. Mechanically what I did is similar
to the first time I did this sort of shift, which is that I changed
the '#!/usr/bin/python' at the start of the programs to
'#!/usr/bin/python3' and then worked to fix everything that Python 3
complained about. For this code, there have only been a few
significant things so far:
- changing all tabs to spaces, which I did with expand (and I think I overdid it, since I didn't use '...').
- changing print statements to print() calls. I learned the hard way to not overlook bare 'print' statements.
- converting 'except CLS, VAR:' statements to the modern form, as this code was old enough to have a number of my old Python 2 code habits.
- dealing with .sort()s that used comparison functions and figuring out how to creatively generate sort keys that gave the same results. This opened my mind up a bit, although there are still nuances that using sort keys can't easily capture.
- list()-ifying most calls of adict.keys(), because that particular assumption was all over my code. There were a couple of cases where perhaps I could have deferred the list-ification to later (if at all), but this 'lifting' is intended to be brute force.
  (I didn't list-ify cases where I was clearly immediately iterating, such as 'for ... in d.keys()' or 'avar = [x for ... in d.keys()]'. But any time I assigned .keys() to a name or returned it, it got list-ified.)
- replacing use of optparse with argparse. This wasn't strictly necessary (Python 3 still has optparse), but argparse is the future so I figured I'd fix things while I was working on the code anyway.
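As a rough illustration of a few of these mechanical changes, here is a minimal Python 3 sketch. The names in it (pools, load_pool) are made up for the example and don't come from the real spares code.

```python
# Hypothetical stand-ins; none of these names come from the real tools.
pools = {"tank": 3, "fs1": 1, "fs2": 2}

def load_pool(name):
    # Python 2 allowed 'except KeyError, e:'; Python 3 requires 'as'.
    try:
        return pools[name]
    except KeyError as e:
        return "no such pool: %s" % e

# In Python 3, .keys() returns a view object, not a list. Code that
# sorts the result in place or indexes into it needs a real list.
names = list(pools.keys())
names.sort()

# Iterating directly over .keys() needs no list-ification.
total = sum(pools[n] for n in pools.keys())

# A bare 'print' is a silent no-op in Python 3: it evaluates to the
# print function object and throws the result away.
print      # prints nothing, unlike Python 2's blank line
print()    # the Python 3 way to print a blank line
```

The bare 'print' case is the sneaky one, because Python 3 doesn't complain about it at all; the program simply stops producing the blank line.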
Although these tools do have a certain amount of IO, I could get away with relying on Python 3's default character set conversion rules; in practice they should only ever be dealing with ASCII input and output, and if they aren't something has probably gone terribly wrong (eg our ZFS status reporting program has decided to start spraying out binary garbage). This is fairly typical of internal-use system tools but not necessarily of other things, which can expose interesting character set conversion questions.
(My somewhat uninformed view is that character set conversion issues are where moving from Python 2 to Python 3 gets exciting. If you can mostly ignore them, as I could here, you have a much easier time. If you have to consider them, it's probably going to be more porting than just casually lifting the code into Python 3.)
For the most part this 2-to-3 lifting went well and was straightforward. It would have gone better if I had meaningful tests for this code, but I've always had problems writing tests for command line programs (and some of this code is unusually complex to test). I used pyflakes to try to help find Python 3 issues that I'd overlooked; it found some issues but not all of them, and it at least feels less thorough than pychecker used to be. What I would really like is something that's designed to look for lingering Python 2-isms that either definitely don't work in Python 3 or that might be signs of problems, but I suspect that no such tool exists.
(I tried pylint very briefly, but stopped when it had an explosion of gripes with no obvious way to turn off most of them. I don't care about style 'issues' in this code; I want to know about actual problems.)
I'm a bit concerned that there are lingering problems in the code,
but this is basically the tradeoff I get to make for taking the
approach of 'lifting' instead of 'porting'. Lifting is less work
if everything is straightforward and goes well, but it's not as
thorough as carefully reading through everything and porting it
piece by carefully considered piece (or using tests on everything).
I had to stumble over a few .sort()s with comparison functions and
.keys(), especially early on, which has made me
conscious that there could be other 2-to-3 issues I just haven't
hit in my test usage of the programs. That's one reason I'd like a
scanner; it would know what to look for (probably better than I do
right now) and as a program, it would look in all of the code's
PS: I remember having a so-so experience with 2to3 many years in
the past, but writing this entry got me to see what it did to the
Python 2 versions. For the most part it was an okay starting point,
but it didn't even flag uses of .sort() with a comparison function
and it did significant overkill on list-ifying .keys().
Still, reading its proposed diffs just now was interesting. Probably
not interesting enough to get me to use it in the future, though.
When I'll probably be able to use Python assignment expressions
The big recent Python news is that assignment expressions have been accepted for Python 3.8. This was apparently so contentious and charged a process that in its wake Guido van Rossum has stepped down as Python's BDFL. I don't have any strong feelings on assignment expressions for reasons beyond the scope of this entry, but today I want to think about how soon I could possibly use them in my Python code, and then how soon I could safely use them (ie how soon they will be everywhere I care about). The answers turn out to be surprising, at least to me (although probably not to experienced Python hands).
The nominal Python 3.8 release schedule is set out in PEP 569. According to it, Python 3.8 is planned to be released in October of 2019; however, there are some signs that the Python people want to move faster on this (see this LWN article). If Python sticks to the original timing, Python 3.8 might make Ubuntu 20.04 LTS (released in April 2020 but frozen before then) and would probably make the next Fedora release if Fedora keeps to their current schedule and does a release in May of 2020. So at this point it looks like the earliest I'd be able to use assignment expressions is in about two years. If Python moves up the 3.8 release schedule significantly, it might make one Fedora release earlier (the fall 2019 release), making that about a year and a half before I could think about it.
There are many versions of 'can safely use' for me, but I'll pick the one for work. There 'safely use' means that they're supported by the oldest Ubuntu LTS release I need to run the Python code on. We're deploying long-lived Ubuntu 18.04 machines now that will only be updated starting in 2022, so if Python 3.8 makes Ubuntu 20.04 that will be when I can probably start thinking about it, because everything will be 20.04 or later. That's actually a pretty short time to safe use as these things go, but that's a coincidence due to the release timing of Python 3.8 and Ubuntu LTS versions. If Python 3.8 misses Ubuntu 20.04 LTS, I'd have to wait another two years (to 2024) unless I only cared about running my code on Ubuntu 22.04.
Of course, I'm projecting things four to six years into the future and that's dangerous at the best of times. We've already seen that Python may change its release timing, and who knows about both Ubuntu and Fedora.
(It seems a reasonably safe guess that I'll still be using Fedora on my desktops over that time period, and pretty safe that we'll still be using Ubuntu LTS at work, but things could happen there too.)
The reason that all of this was surprising to me was that I assumed Python 3.8 was further along in its development if controversial and much argued over change proposals were getting accepted for it. I guess the arguments started well before Python 3.7 was released, which makes sense given the 3.7 release schedule; 3.7 was frozen at the end of January, so everyone could start arguing about 3.8 no later than then.
(The official PEP has an initial date of the end of February, but I've heard it was in development and being discussed before then, just not formalized yet as a PEP.)
PS: If Debian keeps to their usual release schedule, it looks like Python 3.8 on its original schedule would miss the next Debian stable version (Debian 10). It would probably miss it even on an aggressive release schedule that saw Python 3.8 come out only a year after 3.7, since 3.7 was released only a few weeks ago.
Remembering that Python lists can use tuples as the sort keys
I was recently moving some old Python 2 code to Python 3 (due to
a recent decision). This particular code is sufficiently old that it
has (or had) a number of my old Python code habits, and in particular
it made repeated use of list .sort() with comparison functions.
Python 3 doesn't support this; instead you have to tell .sort() what
key to use to sort the list.
For a lot of the code the conversion was straightforward and obvious
because it was just using a field from the object as the sort key.
Then I hit a comparison function that looked like this:
    def _pricmp(a, b):
        apri = a.prio or sys.maxint
        bpri = b.prio or sys.maxint
        if apri != bpri:
            return cmp(apri, bpri)
        return cmp(a.totbytes, b.totbytes)
I stared at this with a sinking feeling, because this comparison function wasn't just picking a field, it was expressing logic. Losing complex comparison logic is a long standing concern of mine, so I was worried that I'd finally run into a situation where I would be forced into unpleasant hacks.
Then I remembered something obvious: Python supports sorting on
tuples, not just single objects. Sorting on tuples compares the
two tuples field by field, so you can easily implement the same
sort of tie-breaking secondary comparison that I was doing in
_pricmp. So I wrote a simple function to generate the tuple
of key fields:
    def _prikey(a):
        # sys.maxint doesn't exist in Python 3; sys.maxsize is the
        # closest equivalent.
        apri = a.prio or sys.maxsize
        return (apri, a.totbytes)
Unsurprisingly, this just worked (including the tie-breaking, which actually comes up fairly often in this particular comparison). It's probably even somewhat clearer, and it certainly avoids some potential comparison function mistakes.
(It's also shorter, but that's not necessarily a good thing.)
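To check that a tuple key really does reproduce the old comparison function's ordering, here is a small self-contained sketch. The Dev record type and its sample data are invented for the example. Python 2's cmp() is spelled out by hand, and sys.maxsize stands in for sys.maxint (which Python 3 doesn't have). functools.cmp_to_key is used only to run the old-style comparison for the cross-check.

```python
import sys
from collections import namedtuple
from functools import cmp_to_key

# Hypothetical record type standing in for the real objects.
Dev = namedtuple("Dev", ["name", "prio", "totbytes"])

def _pricmp(a, b):
    # The old comparison logic, with cmp(x, y) written as (x > y) - (x < y).
    apri = a.prio or sys.maxsize
    bpri = b.prio or sys.maxsize
    if apri != bpri:
        return (apri > bpri) - (apri < bpri)
    return (a.totbytes > b.totbytes) - (a.totbytes < b.totbytes)

def _prikey(a):
    # Tuples compare field by field, so totbytes only matters on a tie.
    return (a.prio or sys.maxsize, a.totbytes)

devs = [
    Dev("d1", 2, 500),
    Dev("d2", None, 100),   # no priority, so it sorts last
    Dev("d3", 1, 900),
    Dev("d4", 2, 200),      # ties with d1 on prio, wins on totbytes
]

by_key = sorted(devs, key=_prikey)
by_cmp = sorted(devs, key=cmp_to_key(_pricmp))
order = [d.name for d in by_key]   # ['d3', 'd4', 'd1', 'd2']
```

Both sorts produce the same order here. cmp_to_key is also a possible stopgap for comparison functions that don't reduce neatly to a key, at the cost of keeping the old-style function around.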
PS: Python has supported sorting tuples for a long time but I don't
usually think about it, so things had to swirl around in my head
for a bit before the light dawned about how to solve my issue.
There's a certain mental shift that you need to go from 'the key=
function retrieves the key field' to 'the key= function creates
the sort key, but it's usually a plain field value'.
Revising my view on Python 3 for new code again: you should use it
Almost five years ago, I wrote Reversing my view on Python 3 for new general code: avoid it (which Pete Zaitcev recently reminded me about). I have now reversed my views once again and now I feel that you should definitely use Python 3 for new code. There are three reasons for this, two positive and one negative.
The first positive reason is that the current Python 3 ecosystem is generally vibrant and alive, unlike the state (almost) five years ago. With Python 3 having become a success some time ago, people have been writing Python 3 things and porting things to Python 3 for some time now. For that matter, an increasing number of interesting things are Python 3 only. So today you're pretty unlikely to suffer from ecosystem issues in Python 3; if anything, it's likely that the Python 3 ecosystem is healthier than Python 2's. Certainly if you like interesting new packages that are exploring new ideas and new APIs, you want to be using Python 3.
The second positive reason is that I've come around to feeling that Python 3 has genuine attractions and interesting things, both new language features and improvements in the standard library. This was attractive back in 2016 and it's slowly gotten more so since then. Sometimes Python 3 has even sped itself up (well, CPython, which is what we mostly think of as 'Python'). I suspect that the improvements aren't revolutionary for most people, but they are nice. Also, as I've found out myself, writing Python 3 code is generally not much different than writing Python 2 code, and I certainly haven't found it more annoying.
The negative reason is that time is running out for Python 2 (and even I can see that). We're less than two years away from the official End of Life of Python 2 from the core developers and we're seeing developments like an increasing number of Linux distributions at least trying to either drop or reduce support for Python 2 by then, as LWN has covered (I've got my own views and hopes). The attempts to move away from having Python 2 around or supporting it are likely to ramp up significantly over the next year and a half, both in OS distributions and in major Python projects that still support it (such as Django, where 1.11 is the last version that supports Python 2). If you're going to write new Python 2 code now, you're increasingly going to be staring this abyss in the face unless you're only using systems and projects that you already know will be supporting Python 2 past its official EOL, possibly well past based on your needs.
(This looming abyss is one reason that the Python 3 ecosystem is probably already healthier than the Python 2 one and it's only going to increase as January 1st 2020 looms up on us. One Python version has a future, one doesn't, and you can guess where people are going to increasingly focus.)
I still feel that Python 3's Unicode handling and its interactions with Unix have warts, but I'm also a pragmatist. Those warts lurk in dark corners and most of the time, most of us will never run into them. If your systems are well behaved your code is not going to run into non-UTF-8 command line arguments or filenames or the like, just like most of the time our shell scripts don't run into filenames with newlines in them. More generally, forced character set conversion into and out of Unicode almost always works on modern systems, because modern systems almost always use and have valid UTF-8. The result is that you can write a lot of perfectly functional Python code that basically ignores the issues and assumes you'll never hit a Unicode decoding or encoding error. I certainly have (and it's running fine for us).
The time to be compatible with both Python 2 and Python 3 is past
Overall, the biggest source of issues was not the py3 model, but trying to make the code compatible. I'm not going to do that again if I can help it: either py2 or py3, but not both.
For all that I've had plenty of issues with Python 3, I wholeheartedly agree with Pete Zaitcev's view here; it's time to abandon compatibility with Python 2, especially for programs instead of packages, unless you have a compelling reason otherwise. If you want to move code to Python 3, just do that, don't try to make your code work on both. A clean break will make your life better.
Back in the old days, when Python 3 was just starting to spread, it made sense to be 2/3 cross compatible even if it was a bit of a pain and added odd contortions to your code; not everyone even had decent versions of Python 3 (to the extent that they even existed in the beginning) and there were all sorts of other roadblocks and considerations. But those days are long over. Python 3 is both more capable and more pervasive and most of all it's succeeded, and at this point we're less than two years from the official end of life of Python 2. It's time to put Python 2 out to pasture and move onward, instead of making life hard on ourselves.
(Sometimes you can make code trivially or even accidentally cross compatible and if this happens, sure, keep things that way. What I'm talking about is going to extra effort and adding extra contortions to your code to accommodate both Python 2 and Python 3 people.)
If you want to move a program to Python 3, the modern state of things is that pretty much anyone who wants to use it should be able to do so. If they can't do so because they're on a system that is so old it doesn't have a decent version of Python 3, they've got bigger problems than just your program; sooner or later they're going to have to get a capable Python 3, probably sooner. For packages, well, we're less than two years from Python 2 EOL so anyone who is stuck with Python 2 only packages has a problem that goes well beyond being unable to use your new Python 3 only version.
(If they just haven't gotten around to moving their code to Python 3, perhaps your package will be just the push they need. But probably not; I suspect that a lot of people with Python 2 programs and systems have basically frozen them at this point.)
If you have to run your code on a system or in an environment without a good Python 3, that's one thing. If you're being paid to make it work on both versions, for whatever reasons, well, you're being paid for it. But otherwise? If you're going to change code to run on Python 3, it's time to let Python 2 go, and I say that as someone who still is unhappy about how the whole Python 2/3 transition was done (or is still being done).
PS: As far as Python 2 code goes, if you have existing code and you want or need to keep it running on Python 2, don't bother trying to make it also run on Python 3; wait until you can make a clean break with Python 2. In my view the same is true for new Python 2 code, but if you're writing new Python 2 code at this point you know your own situation best; it may be that your new code will have to live on past your transition from 2 to 3 and making it 3-compatible from the start will be better and less work than porting it at some point.
Python modules use operator overloading in two different ways
In Python (as elsewhere), there are at least two different things that people use operator overloading for. That there's more than one thing makes a difference because some patterns of designing how operator overloading works aren't sufficiently general to handle both things; if you want to serve both groups, you need to design a more general mechanism than you might expect, one that delegates more power to objects.
The first use of operator overloading is to extend operators so that they work (in the traditional ways) on objects that they wouldn't normally work on. The classical examples of this are complex numbers and rational numbers (both of which Python has in the standard library), and in general various sorts of things built with numbers and numeric representations. However you can go beyond this, to objects that aren't strictly numeric but which can use at least some of the traditional numeric operators in ways that still obey the usual rules of arithmetic and make sense. Python sets implement some numeric operations in ways that continue to make sense and are unsurprising.
The second use is to simply hijack the operations in order to do
something convenient for your objects with a handy symbol for it.
Sometimes these operations are vaguely related to their numeric
equivalents (such as string multiplication, where "a" * 4 gets
"aaaa"), but sometimes they have nothing to do with it. The
classic example of the latter is the string % operator, which has
nothing at all to do with arithmetic but instead formats a string
using % formatting codes. Using the % operator for this is
certainly convenient and it has a certain mnemonic value and neatness
factor, but it definitely has nothing to do with %'s normal use.
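The two uses can be put side by side with nothing but built-in types and the standard library. This is only a small illustration; the values are arbitrary.

```python
from fractions import Fraction

# First use: operators keep their usual arithmetic meaning on new types.
five_sixths = Fraction(1, 2) + Fraction(1, 3)
z = complex(1, 2) * complex(3, 4)

# Sets borrow some operators in ways that still behave sensibly.
common = {1, 2, 3} & {2, 3, 4}
either = {1, 2} | {2, 3}
only_left = {1, 2, 3} - {2}

# Second use: the symbol is just a convenient spelling for something
# that isn't arithmetic.
repeated = "a" * 4
formatted = "%s has %d spares" % ("tank", 2)
```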
Now, let us consider the case of Python not allowing you to overload boolean AND and OR. In a comment on that entry, Aneurin Price said:
I'm not at all convinced by this argument. My expectation for this hypothetical __band__ is that it would be called after evaluating a and finding it truthy, at which point b is evaluated either way. [...]
This is definitely true if you think of operator overloading as only for the first case. But, unfortunately for the design of overloading AND and OR, this is not all that people would like to use it for. My understanding is that ORMs such as Django's and SQLAlchemy would like to intercept AND and OR in order to build up complicated conditional SQL queries with, essentially, a DSL based on Python expressions. In this DSL, they would like to be able to write something like:
Q.descfield.startswith("Who") or Q.descfield.startswith("What")
This wouldn't evaluate or produce any sort of truth value; instead it
would produce an object representing a pending SQL query with a
clause that encoded this OR condition. Later you'd execute the SQL query
to produce the actual results.
If operator overloading for AND and OR paid any attention to the nominal truth value of the left expression, there is no way to make this work. Instead, allowing general overloading of AND and OR requires allowing the left side expression to hijack the process before then. In general, operator overloading that allows for this sort of usage needs to allow for this sort of early hijacking; fortunately this is generally easy for arithmetic operators.
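Here is a minimal sketch of the problem, using a made-up Cond class rather than any real ORM's API. It shows what Python's 'or' does today with two condition objects, next to the overloadable '|' operator, which does get to see both sides. The string-built SQL is purely for illustration; real code should never assemble SQL this way.

```python
class Cond:
    """A hypothetical pending query condition, not any real ORM's API."""

    def __init__(self, sql):
        self.sql = sql

    def __or__(self, other):
        # '|' can be overloaded, so it can build a combined condition.
        return Cond("(%s OR %s)" % (self.sql, other.sql))

def startswith(field, prefix):
    # Illustration only: real code must not build SQL by string formatting.
    return Cond("%s LIKE '%s%%'" % (field, prefix))

a = startswith("descfield", "Who")
b = startswith("descfield", "What")

# 'or' can't be intercepted. Since a is an ordinary (truthy) object,
# Python returns a itself and the second condition is silently lost.
via_or = a or b

# '|' calls a.__or__(b) and yields a new object describing both sides.
via_pipe = a | b
```

As far as I know, this is why query-building libraries tend to use '|' and '&' for this sort of thing. Those operators hand both operands to the object without consulting anyone's truth value first.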
(I'm not sure Python has truly general support for mixing unusual numerical types together, but then such general support is probably very hard to implement. I think you want to be able to express a compatibility table, where each type can say that its overloads handle certain other types or types that have certain properties or something. Otherwise getting your rational number type to interact well with my Point type gets really complicated really fast, if not impossible.)
One reason why Python doesn't let you overload the boolean AND and OR operations
Recently I read Kurt Rose's DISappearing and (via
Planet Python), where Kurt noted that Python doesn't have
methods that let you override boolean 'and' and 'or' operations on
your class objects. As it happens, there's a really good reason for
this, which is that Python would require a new fundamental data type
in order to make it really work.
Boolean 'and' and 'or' have the extremely valuable property of
short-circuiting evaluation, where if you write, say, 'a() and b()'
and a() evaluates to false, Python will not even call b().
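Short-circuiting is easy to see by recording which functions actually get called. This is a small sketch with made-up functions a() and b():

```python
calls = []

def a():
    calls.append("a")
    return False

def b():
    calls.append("b")
    return True

result_and = a() and b()   # a() is false, so b() is never called
after_and = list(calls)

calls.clear()
result_or = b() or a()     # b() is true, so a() is never called
after_or = list(calls)
```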
Let's imagine a hypothetical world in which Python allows you to
do this overriding and the boolean operators still preserve this
short circuiting. As usual, if you write 'a and b', this will (at
least some of the time) translate into a call to the override method
on a, let's call it __band__, and the __band__ method will
receive an additional argument that represents the right hand side:
    class AClass:
        def __band__(self, right):
            ....
Now here is the big question: what's the type of right in this
method? For ordinary operators, right is the value we get from
evaluating the right hand side expression; if you write 'a & b()',
this is roughly the same as a.__and__(b()). However this can't be the
case for __band__, because that would mean no more short-circuiting;
if a had a __band__ method, writing 'a and b()' would call b() all of
the time. To preserve short-circuiting, right has to be some type that
represents the right hand side expression in an un-evaluated form.
However, Python has no such type today. Closures sort of come close,
but they create additional effects and do things like appear in Python
exception backtraces. This means that adding override methods for
boolean operations would require either discarding short-circuiting (and
having right be the evaluation result) or figuring out and introducing
a new, relatively complex type in Python just to support this.
(Continuations are sort of what you'd need but I think they're not quite what you want, or at least you need a continuation that captures only the right side expression.)
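To make the 'un-evaluated right hand side' idea concrete, here is a rough sketch that fakes it with an explicit lambda. The Lazy class and its band() method are hypothetical. They are an ordinary class and an ordinary method, not a real special method, and this isn't a proposal for how Python should do it.

```python
class Lazy:
    """Hypothetical class showing what a __band__-like hook would need."""

    def __init__(self, value):
        self.value = value

    def band(self, right_thunk):
        # right_thunk is a zero-argument callable standing in for the
        # un-evaluated right hand side; it is called only if needed.
        if not self.value:
            return self
        return right_thunk()

evaluated = []

def b():
    evaluated.append("b")
    return "right side"

# The caller has to wrap the right side in a lambda by hand, which is
# the extra machinery Python itself would have to supply.
r1 = Lazy(0).band(lambda: b())   # b() is not called
r2 = Lazy(1).band(lambda: b())   # b() is called once
```

This keeps short-circuiting, but all band() can do with its argument is call it or not. It can't look inside the right hand side expression at all.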
The other problem of such a
right type is that you'd want to be
able to peer inside it relatively easily. After all, the entire
purpose of implementing your own
__band__ method is so that you
can do something different from a plain boolean
and when the right
hand side is some special thing. If all you're going to do is:
    def __band__(self, right):
        if not bool(self):
            return False
        else:
            return right.eval()
then there's not really any point in having a
__band__ at all,
especially given the general complexity involved in Python as a
whole.
(This is of course not necessarily the only reason for Python to fence off boolean operations as things that you absolutely can't override. You can certainly argue that they should be inviolate and not subject to clever redefinitions simply as a matter of principle.)
I'm hoping that RHEL 8's decision on Python 2 isn't Ubuntu 20.04's decision
I recently wrote about whether Ubuntu 20.04 will include Python 2, and threw in a sidebar about it in RHEL 8. Thanks to comments from Twirrim and Seth (on this entry), I then found out that Red Hat has recently announced that they won't be including Python 2 in Red Hat Enterprise Linux 8. This is in a way very useful to know, because we'd like to build some systems with RHEL 8 in the not too distant future if we can and those systems will need to run some Python-based system management tools. Since RHEL 8 won't include Python 2, I'd better start thinking about how to make these tools Python 3.
However, despite the news about Python 2 in RHEL 8, I remain reasonably optimistic that Python 2 will be in Ubuntu 20.04 (which would be very convenient for us, due to the relatively short time to 20.04). This is because I think there are a number of significant differences between the situation Red Hat finds themselves in with RHEL 8 and the situation Ubuntu will be in with Ubuntu 20.04.
Red Hat ships only a limited number of carefully curated packages in 'Red Hat Enterprise Linux' and then strongly supports them for a quite long time (so far for ten years, so RHEL 8 is expected to be supported through at least 2028). Red Hat is clearly willing to either change or remove packages that would normally depend on Python 2, and they have the manpower (and small package set) to make this feasible and presumably not too disruptive to what RHEL users expect (ie, not removing too many packages).
By contrast, Ubuntu has a shorter support period (20.04 will be supported only to early 2025), ships significantly more packages even in their nominally fully supported package set, supports them to a lesser extent, relies much more on upstream Debian packaging efforts, and has an escape hatch in the form of the officially less supported and much larger 'universe' package set. I'm not sure how Debian is doing in their efforts to push Python 2 out, but my impression is that it hasn't been going very fast (as with basically all large scale changes of this nature in Debian). All of this makes it both less of a burden for Ubuntu 20.04 to include Python 2 and probably more disruptive to not do so (with more excluded packages and also surprised users). As a result, I expect Ubuntu 20.04 to include Python 2 at least in their broad 'universe' package set.
(Red Hat doesn't have a formal equivalent of the Ubuntu 'universe' package set, but RHEL does have a rough functional equivalent in EPEL. It's possible that Python 2 for RHEL 8 could wind up being packaged in EPEL, at least for a while.)
PS: It'll be interesting to see if there's a /usr/bin/python in
RHEL 8 when it comes out, or if there's only a /usr/bin/python3. I
think my personal preference is for there to be no /usr/bin/python,
but that's biased by having multiple systems and wanting things
that expect Python 2 to immediately fail on RHEL 8 with a clear
error rather than exploding mysteriously.
Sidebar: My guess at Ubuntu's path to removing Python 2
Ubuntu doesn't just do LTS releases every two years; they also do regular releases every six months. These releases are both an early signal of what will be in a future LTS release and a chance for Ubuntu to start making gradual changes. Since completely dropping Python 2 from one release to another would be quite disruptive, what I expect Ubuntu to do instead is first move it (and all of the packages that depend on it) to the 'universe' package set. This would effectively start the clock running on its actual removal in some later release, and also give people like me some advance warning about it.
(I believe that packages can be moved this way without causing heartburn to people upgrading from one release to the next, but I may be wrong.)
Our real problem with a removal of Python 2 is likely to be our users
In my recent entry on whether Ubuntu 20.04 LTS will include Python 2, I mentioned that this mattered because we have various system management tools written in Python 2, so if Python 2 was not going to be in 20.04, we'd need to start porting them so they'd be ready in time. Unfortunately, this need to port our system tools is probably not going to be the most painful part of the day that Ubuntu ships without Python 2. Instead, the real problem is our users. More specifically, the problem is all of the Python (2) programs that our users will have written over the years and still use and need.
Well, let me rephrase that. I shouldn't say 'users'; I should say 'the graduate students and professors of the department' (and also researchers, postdocs, undergrads doing research work with professors, visiting collaborators, and so on). Unusually for the modern world, we provide general multiuser computing to all of these people, and so these people log on to those Ubuntu-based servers and do whatever they want (more or less) with the things that Ubuntu provides. Some of these people write Python programs, and some of them are probably Python 2 programs. When Python 2 goes away, those programs are going to break.
(They will also probably break if /usr/bin/python turns into
Python 3, which is one reason I hope Ubuntu doesn't do that any
time soon. There being no /usr/bin/python is less confusing and
easier to explain to angry users than 'python is now incompatible
with what it was a week ago, sorry about that'.)
A few of these people are probably avid Python users and already know about Python 3. Of course, these people are probably already writing everything in Python 3, so they're unaffected by this. Many more of these people probably don't know about Python 3 for various reasons, including that their real work is writing a thesis or doing research, not knowing about developments in the programming language that they're working in. To add to the difficulty, we don't even know who they are (and I'm not sure how we'd find out, unless there is some very lightweight and non-intrusive way of instrumenting our systems to gather data when Python 2 gets run).
(Since we can't currently give our users any definitive information on when they won't have Python 2, it's also not very useful to reach out to them right now. Many of our users would rightfully consider rewriting things from Python 2 to Python 3 to be a distraction from their thesis or their research, and for that matter they may only need their Python 2 programs for a relatively limited time.)
The basically inevitable result of this is that we're likely to be forced to install Python 2 for backward compatibility for at least one LTS generation after Ubuntu drops it officially. Hopefully there will be people packaging Python 2.x as a PPA, which is the most convenient option on Ubuntu. The possible exception for this would be if Ubuntu gave everyone a significant amount of advance warning, for example if they announced before 20.04 that it would be the last release that included any version of Python 2 in the normal Ubuntu repositories. Then we could at least start trying to reach users, not that I expect us to be very successful at it.
PS: We're definitely not going to ever change Python 2 to
print a warning about the situation (for many reasons including
that warnings are often fatal errors in practice), and I'm pretty sure we'd never alter
it to syslog things when it starts. Any method of acquiring information
about when Python 2 gets run needs to be entirely external.
The interesting question of whether Ubuntu 20.04 LTS will include Python 2
It's 2018, which means that 2020's end of Python 2 support is only two years away. Two years seems like a long time, but it's not really, especially if you're not a full time developer or Python person, which is our situation. One of the questions about what we have to do about our current set of Python programs boils down to the question of whether Ubuntu's very likely April 2020 Long Term Support release (Ubuntu 20.04) will include Python 2.
So far, Ubuntu has done LTS releases every two years in April: 10.04, 12.04, 14.04, 16.04, and now the impending 18.04. If they follow this pattern, they will release the next LTS in April of 2020, after Python 2's end of life (which the Python people say is January 1st 2020), and if we follow our usual practices, we'll begin using Ubuntu 20.04 on some systems that summer and autumn. These systems will need to run our Python system management tools, which means that if Ubuntu 20.04 doesn't include Python 2, we need to have our tools running on Python 3 before then.
(Of course it might be a good idea to port our tools to Python 3 now, but there's a difference between being prepared and being forced. This is especially important when you have to prioritize various things you could be working on, which is generally our situation.)
Since Ubuntu 20.04 will be released after the Python 2 cutoff date, in theory it could drop Python 2 on the grounds that it's no longer supported by the upstream developers. However, in practice there are two issues. First, it seems very likely that Python 2 will be supported by other people if problems emerge, because there are other long term Linux distributions that are already committed to supporting systems with Python 2 past 2020 (for example, Red Hat Enterprise Linux 7, which will be supported through 2024, and then there's Ubuntu 18.04 itself, which will be supported through 2023). Second, it's not clear that all packages that currently use Python 2 will be updated to Python 3 in time for 2020. Ubuntu could choose to throw Python 2 out anyway and to hell with any packages that this forces out, but that might not be very popular with people.
The current state of Ubuntu 18.04 is that Python 2.7 will be available in the 'main' package repository, directly supported by Ubuntu. One possible option for 20.04 is that Python 2.7 would be available but would be demoted to the community supported 'universe' package repository, which theoretically gives you lower expectations of bug and security fixes. This would give Ubuntu an option to shrug their shoulders if some serious issue comes up after 2020 and no one steps forward to fix it.
Probably the safest option for us is to begin moving our tools to Python 3, but likely not until 2019. If we started now, I'd have to make them compatible with Ubuntu 14.04's Python 3.4.3; if I wait until we've migrated all of our 14.04 machines to 18.04, I get to base everything on Ubuntu 16.04's somewhat more recent 3.5.2.
(Using 3.5 as the base could be potentially important, since the 3.5 changes brought in formatting for bytes and better handling of character encoding for sys.stdout, both of which might be handy for our sysadmin-focused uses of Python.)
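To make the bytes formatting point concrete, here is a small sketch of the sort of thing that works on Python 3.5 but not on 3.4. The pool and device names are invented for illustration and aren't taken from our actual tools.

```python
# Illustrative values only; the pool and device names are made up.
pool = b"tank"
device = b"sdb"
errors = 3

# %-formatting directly on bytes (PEP 461) works from Python 3.5 onward;
# on Python 3.4 this line raises TypeError.
line = b"%s %s errors=%d\n" % (pool, device, errors)
```

This matters when a tool passes through the output of other programs as raw bytes and wants to build its own messages without decoding everything to str first.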
Sidebar: Red Hat Enterprise Linux 8 and Python 2
Unlike Ubuntu, Red Hat hasn't officially announced any timing or formal plans for RHEL 8. However, a new version of RHEL is due (based on RH's traditional timing) and there are some signs that one is in preparation, probably for release this summer. I can't imagine such a version not including Python 2, which means that Red Hat would likely be committed to supporting it through 2028.
This isn't necessarily a big burden, because it's my opinion that we're unlikely to find any serious issues in Python 2.7 after 2020. This is especially so if people like Red Hat make a concerted effort to find any remaining 2.7 problems before the official end of support, for example by extensively running fuzzing tools against 2.7 or by paying for some security auditing of Python's SSL code (or doing it themselves).