Wandering Thoughts

2018-01-09

Differences between keywords and constants in Python

Yesterday I wrote about the challenges of having true constants in Python and said that there are other mechanisms that achieve basically the same results if all we care about are a few builtin values like True, False, and None. The most straightforward way is what was done with None in Python 2 and eventually done with True and False in Python 3, which is to make them into keywords. This raises the obvious question, namely why the Python people waited until Python 3 to make this change. One way of starting to answer this is to ask what the difference is (or would be) between Python keywords and hypothetical true constants (or just the ordinary 'constants' Python 2 has today for True and False).

If you look in Python's language documentation in the keywords section, you sort of get an answer:

The following identifiers are used as reserved words, or keywords of the language, and cannot be used as ordinary identifiers. [...]

(Emphasis mine.)

A keyword cannot be used as an identifier in any context, not merely as a variable (whether global to a module or even local to a function). If you try to define the following class in Python 3, you'll get a syntax error:

class example:
  def __init__(self):
    self.True = 10

If you try harder, you can nominally create the instance attribute (either by directly setting it in self.__dict__ or by naming it in __slots__), but then you have no way of getting access to it as an attribute, since writing obj.True in any context gets you a syntax error.

(By extension, you can't have a method called True or False either.)
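Here is a small sketch of this in Python 3. The Example class and the compiles() helper are names I've made up for illustration; I go through compile() so that the syntax errors show up as ordinary exceptions instead of stopping the whole file from loading.

```python
import keyword


def compiles(source):
    """Report whether a piece of source code gets past Python's parser."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


class Example:
    pass


obj = Example()
# Going through __dict__ sidesteps the parser, so this is allowed.
obj.__dict__["True"] = 10

# Ordinary attribute and method syntax is rejected before any code runs.
attr_syntax_ok = compiles("obj.True = 10")
method_syntax_ok = compiles("class C:\n    def True(self):\n        pass\n")
# A name that is not a keyword is fine.
plain_name_ok = compiles("obj.true = 10")

# getattr() takes the name as a string, so it is not attribute syntax.
via_getattr = getattr(obj, "True")
```

The value is still reachable through getattr(), but that is a function call on a string, not the obj.True form that normal code would write.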

Our hypothetical true constants would not be so restricted. A constant would be unchangeable in its namespace, but it certainly wouldn't block the use of its name as an identifier in general in other contexts. You probably shouldn't give user-created names that much power (and the idea is a bad fit for Python's semantics anyway, with no obvious way to implement it).

Given this, we can look at some issues with making True and False into keywords in Python 2.

To start with, it's unlikely that someone was using either True or False as the name of a field in a class but it's not impossible. If they were and if some version of Python 2 made True and False into keywords, that code would immediately fail to even start running. Although I don't know for sure, I suspect that Python 2 had no infrastructure that would have let it report deprecation warnings in advance for this, so it probably would have been an abrupt change.

However, this is a pretty esoteric reason and there's a much more pragmatic one, illustrated by the example that Giedrius Statkevičius reported at the end of his article. The pymodbus module defined True and False in its __init__.py, not because it was worried about other people overriding them, but because at one point it wanted to support older Python versions while still using them:

# Define True and False if we don't have them (2.3.2)
try:
    True, False
except NameError:
    True, False = 1, 0

(A later version changed the values to be the result of boolean comparisons.)

If True and False had been created as keywords, there would be no way to use them and be backwards compatible with versions of Python 2 before they were defined. If they're keywords, merely writing a line that says 'True = (1 == 1)' is a syntax error when the module is imported or otherwise used, even if the line is never executed. You have no good way to define your own versions of them in Python versions where they're not supported (technically there is one way, but let's not go there), which means that you can't use them at all until you're willing to completely abandon support for those older Python versions. Forcing people to make this choice right up front is not a good way to get new features used; in fact, it's a great way for a story to spread through the community of 'oh, you can't use True and False because ...'. This is counterproductive, to put it one way.
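Python 3 lets us see this effect directly, since True and False are keywords there. As a sketch, the pymodbus-style fallback can be fed to compile() next to an identical fallback that uses two ordinary names (Yes and No are placeholders I've picked, not anything from pymodbus):

```python
FALLBACK_WITH_KEYWORDS = """\
try:
    True, False
except NameError:
    True, False = 1, 0
"""

FALLBACK_WITH_PLAIN_NAMES = """\
try:
    Yes, No
except NameError:
    Yes, No = 1, 0
"""


def try_compile(source):
    """Return (code, None) on success or (None, error) on a syntax error."""
    try:
        return compile(source, "<sketch>", "exec"), None
    except SyntaxError as exc:
        return None, exc


# The keyword version never gets as far as running the try block.
keyword_code, keyword_error = try_compile(FALLBACK_WITH_KEYWORDS)

# The plain-name version compiles, and running it takes the except branch.
plain_code, plain_error = try_compile(FALLBACK_WITH_PLAIN_NAMES)
namespace = {}
exec(plain_code, namespace)
```

The assignment sits in an except branch that would never run on a modern Python, but the parser rejects the whole file anyway.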

Python 3 can make this sort of change because Python 3 was already making incompatible changes; in fact, making incompatible changes is its entire point. Python 2 was not in a good position to do it. Thus, I suspect that this is the major reason that Python 2 didn't make True and False into keywords but instead just put them into the builtins namespace as values.

KeywordsVsConstants written at 01:25:32

2018-01-07

The challenges of having true constants in Python

In his article What is Actually True and False in Python?, Giedrius Statkevičius takes Python to task for not having genuine constants in the language and for not making True and False into such constants, instead leaving them as changeable names in the builtins module. This provides a convenient starting point for a discussion of why having true constants in Python is a surprisingly difficult thing.

One of the big divisions between programming languages is what variable names mean. Many languages are what I will call storage languages, where a variable is a label for a piece of storage that you put things in (perhaps literally a chunk of RAM, as in C and Go, or something more abstracted, as in Perl). Python is not a storage language; instead it's what I'll call a binding language, where variables are bindings (references) to anonymous things.

In a storage language, variable assignment is copying data from one storage location to another; when you write 'a = 42' the language will copy the representation of 42 into the storage location for a. In a binding language, variable assignment is establishing a reference; when you write 'a = 42', the language makes a a reference to the 42 object (this can lead to fun errors).
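A minimal illustration of the binding behaviour, using a list so that the sharing is visible (the variable names are arbitrary):

```python
a = [1, 2, 3]
b = a              # binds b to the same list object; nothing is copied
b.append(4)        # mutates the one shared object
shared = a is b    # both names refer to that object

b = [0]            # rebinding b makes it refer to a new object
a_after_rebind = list(a)   # a still refers to the original list
```

This is the sort of thing behind the 'fun errors'; the append through b is visible through a, while rebinding b leaves a alone.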

One result of that is that constants are different between the two sorts of languages. In a storage language what it means to make something a simple constant is relatively straightforward; it's a label that doesn't allow you to change the contents of its storage location. In a binding language, a constant must be defined differently; it must be something that doesn't allow you to change its name binding. Once you set 'const a = 42', a will always refer to the 42 object and you can't rebind it.

In Python, what names are bound to is not a property of the name, it is instead a property of the namespace they are in (which is part of why del needs to be built into the language). This means that in order for Python to have true constants, the various namespaces in Python would need to support names that cannot be re-bound once created with some initial value. This is certainly possible, but it's not a single change because there are at least three different ways of storing variables in Python (in actual dicts, local variables in functions, and __slots__ instance variables) and obviously all of them would need this.
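To give a feel for what 'a namespace that refuses to rebind some names' might look like, here is a rough sketch for the dict case only. ConstNamespace and make_const() are entirely hypothetical names, and I pass the namespace to exec() as the locals mapping because that is where exec() is documented to accept any mapping. Function locals and __slots__ would each need their own, different mechanism.

```python
class ConstNamespace(dict):
    """Hypothetical dict-based namespace that refuses to rebind marked names."""

    def __init__(self, *args, **kwargs):
        self._consts = set()
        super().__init__(*args, **kwargs)

    def make_const(self, name):
        if name not in self:
            raise KeyError(name)
        self._consts.add(name)

    def __setitem__(self, name, value):
        if name in self._consts:
            raise TypeError("cannot rebind constant %r" % (name,))
        super().__setitem__(name, value)

    def __delitem__(self, name):
        if name in self._consts:
            raise TypeError("cannot delete constant %r" % (name,))
        super().__delitem__(name)


ns = ConstNamespace(a=42)
ns.make_const("a")

# Ordinary names still work as usual.
exec("b = a + 1", {}, ns)

# Rebinding the constant is refused.
try:
    exec("a = 10", {}, ns)
    rebound = True
except TypeError:
    rebound = False

# The protection is only social; dict's own method bypasses the check.
dict.__setitem__(ns, "c", "bypassed the check")
```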

You also need some way to support reloading modules, because this normally just runs all of the new module's code in the existing namespace. People will be unhappy if they can't change the value of a module level constant by reloading the module with a new version, or even convert a constant into an ordinary variable (and they'd be unhappier if they can't reload modules with constants at all).

Because the namespace of builtins is special, it would probably not be all that difficult to support true constants purely for it. In theory this would give you constants for True and False, but in practice people can and normally will create module-level versions of those constants with different values. In fact this is a general issue for any builtin constants; if they're supposed to genuinely be constants, you probably don't want to let people shadow them at the module level (at least). This requires more magic for all of the various ways of writing names to module level globals.

One more complication is that Python likes to implement this sort of thing with a general feature, instead of specific and narrowly tailored code. Probably the most obvious general way of supporting constants would be to support properties at the module level, not just in classes (although this doesn't solve the shadowing problem for builtin constants and you'd need an escape for reloading modules). However, there are probably a bunch of semantic side effects and questions if you did this, in addition to likely performance impacts.

(Any general feature for this is going to lead to a bunch of side effects and questions, because that's what general features do; they have far-reaching general effects.)
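As an illustration of the sort of semantic questions involved, here is a sketch that fakes a module-level property in Python 3 by using a subclass of types.ModuleType. ConstModule and LIMIT are made-up names, and this is an experiment, not a proposal for how Python should do it.

```python
import types


class ConstModule(types.ModuleType):
    """Hypothetical module type with a read-only LIMIT attribute."""

    @property
    def LIMIT(self):
        return 42


mod = ConstModule("demo")

# From the outside, attribute assignment is refused by the property.
try:
    mod.LIMIT = 1
    outside_rebind_worked = True
except AttributeError:
    outside_rebind_worked = False

# Code running inside the module uses the module's dict directly,
# so it never consults the property at all.
exec("LIMIT = 1\nseen_inside = LIMIT\n", mod.__dict__)
```

After this, code inside the module sees LIMIT as 1 while everyone who writes mod.LIMIT still sees 42, which is exactly the kind of side effect I mean.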

There's also a philosophical question of whether Python should even have true user-defined constants. Python is generally very much on the side that you can monkey-patch things if you really want to; any protections against doing so are usually at least partially social, in that you can bypass them if you try hard. Genuinely read-only names at the module level seem a violation of that, and there are other mechanisms if all we really care about are a few builtin values like True, False, and None.

(Why Python 2 didn't use such mechanisms to make True and False into 'constants' is another entry.)

Sidebar: Straightforward constants versus full constants

So far I've been pretending that it's sufficient to stop the name binding from changing in order to have a constant (or the storage location for storage languages). As Python people know full well, this is not enough because objects can mutate themselves if you ask them to (after all, this is the difference between a list and a tuple).

Suppose that Python had a magic const statement that made something a constant from that point onward:

alist = [1, 2, 3]
const alist

Clearly this must cause 'alist = 10' to be an error. But does it stop 'alist.pop()', and if so how (especially if we want it to work on arbitrary user-provided objects of random classes)?

One plausible answer is that const should simply fail on objects that can't be dictionary keys, on the grounds that this is as close as Python gets to 'this object is immutable'. People who want to do things like make a dict into a constant are doing something peculiar and can write a custom subclass to arrange all of the necessary details.

(Or they can just make their subclass lie about their suitability as dictionary keys, but then that's on them.)
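The check that such a const would make can be sketched with hash(). The function name and the LyingList class are my own inventions for illustration:

```python
def acceptable_for_const(obj):
    """Sketch of the 'can it be a dictionary key' test via hash()."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class LyingList(list):
    """A mutable list subclass that claims to be usable as a dict key."""

    def __hash__(self):
        return id(self)


tuple_ok = acceptable_for_const((1, 2, 3))
list_ok = acceptable_for_const([1, 2, 3])
dict_ok = acceptable_for_const({})
# A tuple is only hashable if everything inside it is.
nested_ok = acceptable_for_const((1, [2, 3]))
liar_ok = acceptable_for_const(LyingList([1, 2, 3]))
```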

ChallengesOfConstants written at 18:38:03

2018-01-06

What's happening when you change True and False in Python 2

Today I read Giedrius Statkevičius' What is Actually True and False in Python? (via), which talks about the history of how True and False weren't fixed constants until Python 3 and thus how you can change them in Python 2. But what does it really mean to do this? So let's dive right into the details in an interactive Python 2 session.

As seen in Statkevičius' article, reversing True and False is pretty straightforward:

>>> int(True)
1
>>> True, False = False, True
>>> int(True)
0

Does this change what boolean comparisons actually return, though?

>>> int((0 == 0) == True)
0
>>> (0 == 0) == True
False
>>> (0 == 0) == False
True
>>> (0 == 0) is False
True

It doesn't, and this is our first clue to what is going on. We haven't changed the Python interpreter's view of what True and False are, or the actual bool objects that are True and False; we've simply changed what the names True and False refer to. Basically we've done 'fred, barney = False, True' but (re)using names that code expects to have a certain meaning. Our subsequent code is using our redefined True and False names because Python looks up what names mean dynamically, as the code runs, so if you rebind a name that rebinding takes immediate effect.

This is also why the truth values being printed are correct; the bool objects themselves are printing out their truth value, and since that truth value hasn't changed we get the results we expect:

>>> True, False
(False, True)

But what names have we changed?

>>> (0 == 0) is __builtins__.True
True
>>> True is __builtins__.False
True
>>> globals()["True"]
False

This tells us the answer, which is that we've added True and False global variables in our module's namespace by copying False and True values from the global builtins. This means that our redefined True and False are only visible in our own namespace. Code in other modules will be unaffected, as we've only shadowed the builtin names inside our own module.

(An interactive Python session has its own little module-level namespace.)
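Python 3 won't let us rebind True, but the same shadowing can be sketched with any other builtin name. Here I use len as a stand-in and a plain dict (which I've called session) as the pretend module namespace:

```python
import builtins

# A stand-in for a module's global namespace.
session = {"__builtins__": builtins}

exec("before = len('abc')", session)

# This creates a module-level 'len' that shadows the builtin one.
exec("len = lambda seq: 0", session)
exec("shadowed = len('abc')", session)

# Deleting the module-level name uncovers the builtin again.
del session["len"]
exec("after = len('abc')", session)

# The builtin itself was never touched.
builtin_result = builtins.len("abc")
```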

To see that this is true, we need a tst helper module with a single function:

def istrue(val):
    if val == True:
        print "Yes"
    else:
        print "No"

Then:

>>> import tst
>>> tst.istrue(True)
No
>>> tst.istrue(0 == 0)
Yes

But we don't have to restrict ourselves to just our own module. So let's redefine the builtin versions instead, which will have a global effect. First, let's clear out our 'module' versions of those names:

>>> del True; del False

Then redefine them globally:

>>> __builtins__.True, __builtins__.False = (0 == 1), (0 == 0)
>>> (0 == 0) is True
False

We can verify that these are no longer in our own namespace:

>>> globals()["True"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'True'

We reuse our helper module to show that we've now made a global change:

>>> tst.istrue(0 == 0)
No

But of course:

>>> tst.istrue(True)
Yes

Changing __builtins__.True has changed the True that all modules see, unless they deliberately shadow the builtin True with their own module-level True. Unlike before, True now means the same thing in our interactive session and in the tst module.

Since modules are mutable, we can actually fix tst.istrue from the outside:

>>> tst.True = (0 == 0)
>>> tst.istrue(0 == 0)
Yes
>>> tst.True
True

Now the tst module has its own module-global True name with the correct value and tst.istrue works correctly again. However, we're back to a difference in what True means in different modules:

>>> tst.istrue(True)
No
>>> False is tst.True
True

(Since our interactive session's 'module' has no name binding for False, it uses the binding in the builtins, which we made point to the True boolean object. However tst has its own name binding for True, which also points to the True boolean object. Hence our False is tst's True. Yes, this gets confusing fast.)
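The same layering of builtins and module globals can be sketched in Python 3 with a harmless made-up name instead of True. Here sketch_flag and the tst_sketch module are hypothetical, and the builtin is removed again at the end so nothing leaks:

```python
import builtins
import types

# A throwaway module with one function that looks up a bare name.
tst = types.ModuleType("tst_sketch")
exec("def current():\n    return sketch_flag\n", tst.__dict__)

# Adding a name to builtins makes it visible from every module.
builtins.sketch_flag = "builtins"
try:
    seen_globally = tst.current()

    # Modules are mutable, so we can give tst its own shadowing name.
    tst.sketch_flag = "module"
    seen_after_shadow = tst.current()

    # Our own namespace has no such name, so we still see the builtin.
    seen_here = sketch_flag
finally:
    del builtins.sketch_flag
```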

As noted in Statkevičius' article, Python only ever has two bool objects, one True and one False. These objects are immutable (and known by the CPython interpreter), and so we can't change the actual truth value of comparisons, what gets printed by the bool objects, and so on. All we can do is change what the names True and False mean at various levels; in a function (not shown here), for an entire module, or globally through the builtins.

(Technically there's a few more namespaces we could fiddle with.)

As a side note, we can't subclass bool to make a thing that is considered a boolean yet has different behavior. If we try it, CPython 2 tells us:

TypeError: Error when calling the metaclass bases
    type 'bool' is not an acceptable base type

This is an explicitly coded restriction; the C-level bool type doesn't allow itself to be subclassed.

(Technically it's coded by omitting a 'this can be a base type' flag from the C-level type flags for the bool type, but close enough. There are a number of built-in CPython types that can't be subclassed because they omit this flag.)
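The restriction is still there in CPython 3 and can be poked at from Python. In this sketch the helper names are mine, and the flag value is my assumption about CPython's Py_TPFLAGS_BASETYPE bit (it is an implementation detail, not a language guarantee):

```python
def can_subclass(base):
    """Try to create a trivial subclass of base."""
    try:
        type("Sub", (base,), {})
    except TypeError:
        return False
    return True


# Assumption: CPython defines Py_TPFLAGS_BASETYPE as bit 10 of tp_flags.
BASETYPE_FLAG = 1 << 10


def has_basetype_flag(tp):
    """Check the CPython-specific type flags exposed as __flags__."""
    return bool(tp.__flags__ & BASETYPE_FLAG)


bool_subclassable = can_subclass(bool)
int_subclassable = can_subclass(int)
none_subclassable = can_subclass(type(None))
bool_has_flag = has_basetype_flag(bool)
int_has_flag = has_basetype_flag(int)
```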

We can change the True and False names to point to non-bool objects if we want. If you take this far enough, you can arrange to get interesting errors and perhaps spectacular explosions:

>>> __builtins__.False = set("a joke")
>>> (0 != 0) == False
False
>>> d = {}
>>> d[False] = False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'

For maximum fun, arrange for True and False to be objects that are deliberately uncomparable and can't be converted to booleans (in Python 2, this requires raising an error in your __eq__ and __nonzero__ methods).

(I've used False here because many objects in Python 2 are considered to be boolean True. In fact, by default almost all objects are; you have to go out of your way to make something False.)
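A sketch of such a deliberately hostile object, written for Python 3 where the truth-value hook is spelled __bool__ instead of __nonzero__. The class and helper names are made up:

```python
class Hostile:
    """An object that refuses to be compared or used as a truth value."""

    def __eq__(self, other):
        raise TypeError("refusing to be compared")

    def __bool__(self):
        raise TypeError("refusing to be a truth value")


def raises_type_error(func):
    """Run func() and report whether it raised TypeError."""
    try:
        func()
    except TypeError:
        return True
    return False


thing = Hostile()
compare_fails = raises_type_error(lambda: thing == 1)
reflected_compare_fails = raises_type_error(lambda: 1 == thing)
truth_fails = raises_type_error(lambda: bool(thing))
conditional_fails = raises_type_error(lambda: "yes" if thing else "no")
```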

ChangingTrueDetails written at 20:46:42

2017-12-29

To get much faster, an implementation of Python must do less work

Python, like many other dynamically typed languages, is both flexible and what I'll call mutable. Python's dynamic typing and the power it gives you over objects means that apparently simple actions can unpredictably do complex things.

As an example of what I mean by this, consider the following code:

def afunc(dct, strct, thing2):
  loc = dct["athing"] + thing2
  return loc + strct.attr

It's possible and perhaps even very likely that this Python code does something very straightforward, where dct is a plain dictionary, strct is a plain object with straightforward instance attributes, and all the values are basic built-in Python types (ideally the same type, such as integers) with straightforward definitions of addition. But it's also possible that dct and strct are objects with complex behavior, dct["athing"] winds up returning another complex object with custom addition behavior, and thing2 is another complex object with its own behavior that will come into effect when the 'addition' starts happening. In addition, all of this can change over time; afunc() can be called with different sorts of arguments, and even for the same arguments, their behavior can be mutated between calls and even during a call.
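To make the two extremes concrete, here is a sketch that calls the same afunc() both ways. Everything other than afunc() itself (the Logging* and Tagged classes and the log list) is invented for illustration:

```python
def afunc(dct, strct, thing2):
    loc = dct["athing"] + thing2
    return loc + strct.attr


log = []


class PlainStruct:
    def __init__(self, attr):
        self.attr = attr


class Tagged:
    """A value with its own idea of what addition means."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        log.append("add")
        other_value = other.value if isinstance(other, Tagged) else other
        return Tagged(self.value + other_value)

    __radd__ = __add__


class LoggingDict(dict):
    def __getitem__(self, key):
        log.append("getitem")
        return super().__getitem__(key)


class LoggingStruct:
    @property
    def attr(self):
        log.append("attr")
        return Tagged(100)


# The straightforward case: plain dict, plain object, plain integers.
plain = afunc({"athing": 1}, PlainStruct(3), 2)

# The same source code, now running custom code at every step.
fancy = afunc(LoggingDict(athing=Tagged(1)), LoggingStruct(), 2)
```

An implementation has to be prepared for the second call every time it runs the first.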

A straightforward implementation of Python is going to go through checking for all of these possibilities every time through, and it's going to generate real Python objects for everything, probably including intermediate forms and values. Even when strct is really just a plain object that has some fields but no methods or other behavior, a Python implementation is probably going to use a generic, broad implementation (even __slots__ is fairly general here; a lot of things still happen to look up slot values). All of this is work, and all of this work takes time. Even if some of this work is done inefficiently today in any particular implementation, there is a limit to how much it can be improved and sped up.

This leads to a straightforward conclusion: to really get faster, a Python implementation must do less work. It must recognize cases where all of this flexibility and mutability is not being used and then skip it for more minimal implementations that do less.

The ultimate version of this is Python recognizing, for example, when it is only working with plain integers in code like this:

def func2(upto, alist):
  mpos = -1
  for i in range(upto):
    if (alist[i] % upto) == 0:
      mpos = i
  return mpos

If upto and alist have suitable values, this can turn into pretty fast code. But it can become fast code only when Python can do almost none of the work that it normally would; no iterator created by range() and then traversed by the for loop, no Python integer objects created for i and the % operation, no complex lookup procedure for alist (just a memory dereference at an offset, with the bounds of alist checked once), the % operation being the integer modulo operation, and so on. The most efficient possible implementations of all of those general operations cannot come close to the performance of not doing them at all.

(This is true of more or less all dynamic languages. Implementation tweaks can speed them up to some degree, but to get significant speed improvements they must do less work. In JIT environments, this often goes by the term 'specialization'.)
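One way to see the general operations that func2() goes through is to hand it objects that count them. This is only a sketch; CountingList, CountingInt and the calls dictionary are my own instrumentation, and they count just two of the many dispatch points involved.

```python
calls = {"getitem": 0, "mod": 0}


class CountingList(list):
    def __getitem__(self, index):
        calls["getitem"] += 1
        return super().__getitem__(index)


class CountingInt(int):
    def __mod__(self, other):
        calls["mod"] += 1
        return int.__mod__(self, other)


def func2(upto, alist):
    mpos = -1
    for i in range(upto):
        if (alist[i] % upto) == 0:
            mpos = i
    return mpos


data = CountingList(CountingInt(v) for v in [3, 8, 5, 12, 7])
result = func2(4, data)
```

Each trip around the loop makes one dynamic lookup for the indexing and one for the % operation, and a specialized version would have to be sure that neither could do anything surprising before it skipped them.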

FasterPythonMustDoLess written at 01:37:56

2017-12-14

How Python makes it hard to write well structured little utilities

I'll start with the tweets, where I sort of hijacked something glyph said with my own grump:

@glyph: Reminder: even ridiculous genius galaxy brain distributed systems space alien scientists can't figure out how to make and ship a fucking basic python executable. Not only do we need to make this easy we need an AGGRESSIVE marketing push once actually viable usable tools exist. <link>

@thatcks: As a sysadmin, it’s a subtle disincentive to writing well structured Python utilities. The moment I split my code into modules, my life gets more annoying.

The zen of Python strongly encourages using namespaces, for good reasons. There's a number of sources of namespaces (classes, for example), but (Python) modules are one big one. Modules are especially useful in their natural state because they also split up your code between multiple files, leaving each file smaller, generally more self-contained, and hopefully simpler. With an 'everything in one file' collection of code, it's a little too easy to have it turn mushy and fuzzy on you, even if in theory it has classes and so on.

This works fine for reasonably sized projects, like Django web apps, where you almost certainly have a multi-stage deployment process and multiple artifacts involved anyway (this is certainly the case for our one Django app). But speaking from personal experience, it rapidly gets awkward if you're a sysadmin writing small utility programs. The canonical ideal running form of a small utility program is a single self-contained artifact that will operate from any directory; if you need it somewhere, you copy the one file and you're done.

(The 'can be put anywhere' issue is important in practice, and if you use modules Python can make it annoying because of the search path issue.)

One part of this awkwardness is my long standing reluctance to use third-party modules. When I've sometimes given in on that, it's been for modules that were already packaged for the only OS where I intended to use the program, and the program only ever made sense to run on a few machines.

But another part of it is that I basically don't modularize the code I write for my modest utilities, even when it might make sense to break it up into separate little chunks. This came into clear view for me recently when I wound up writing the same program in Python and then Go (for local reasons). The Python version is my typical all in one file small utility program, but the Go version wound up split into seven little files, which I think made each small concern easier for me to follow even if there's more Go code in total.

(With that said, the Go experience here has significant warts too. My code may be split into multiple files, but it's all in the same Go package and thus the same namespace, and there's cross-contamination between those files.)

I would like to modularize my Python code here; I think the result would be better structured and it would force me to be more disciplined about cross-dependencies between bits of the code that really should be logically separate. But the bureaucracy required to push the result out to everywhere we need it (or where I might someday want it) means that I don't seriously consider it until my programs get moderately substantial.

I've vaguely considered using zip archives, but for me it's a bridge too far. It's not just that this requires a 'compilation' step (and seems likely to slow down startup even more, when it's already too slow). It's also that, for me, packing a Python program up in a binary format loses some of the important sysadmin benefits of using Python. You can't just look at a zip-packaged Python program to scan how it works, look for configuration variables, read the comment that tells you where the master copy is, or read documentation at its start; you have to unpack your artifact first. A zip archive packed Python utility is less of a shell script and more of a compiled binary.

(It also seems likely that packing our Python utility programs up in zip files would irritate my co-workers more than just throwing all the code into one plain-text file. Code in a single file is just potentially messy and less clear (and I can try to mitigate that); a zip archive is literally unreadable as is no matter what I do.)
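For concreteness, here is roughly what the zip archive route looks like with the standard library's zipapp module in Python 3. The myutil program, its two files, and the build_demo_archive() helper are all made up for this sketch:

```python
import pathlib
import tempfile
import zipapp
import zipfile


def build_demo_archive(workdir):
    """Create a tiny two-file program and pack it into a .pyz archive."""
    src = pathlib.Path(workdir) / "myutil"
    src.mkdir()
    (src / "helpers.py").write_text("def greet():\n    return 'hello'\n")
    (src / "__main__.py").write_text(
        "import helpers\nprint(helpers.greet())\n"
    )
    target = pathlib.Path(workdir) / "myutil.pyz"
    # This is the 'compilation' step.
    zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")
    return target


with tempfile.TemporaryDirectory() as tmp:
    archive = build_demo_archive(tmp)
    # Only the shebang line at the very start is plain text.
    first_line = archive.read_bytes().splitlines()[0]
    # Reading the actual source means going through a zip tool.
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())
        main_source = zf.read("__main__.py").decode()
```

The result is a single file that can be copied anywhere, but looking at the code inside it takes zipfile or unzip, not a pager.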

UtilityModularityProblem written at 17:45:09

2017-11-27

Code stability in my one Django web application

We have one Django web application, a system for automating the handling of much of our new Unix account requests. It was started in early 2011 (using Django 1.2) and I did a retrospective at the end of 2014 where I called it a faithful web app, one that had just kept on quietly working without problems. That's continued through to today; the app needs no routine attention, although every so often I tweak it to better handle an obscure situation.

One of the interesting aspects of that quiet stability is the relative stability of the application's Python code over those nearly six years so far. There are web frameworks where in six years you'd need to significantly rework and restructure your code to deal with changing APIs and approaches. For us, Django hasn't been one of them. Although we're not quite current on Django versions, we're not that far back, yet much of the code is basically the same (or literally the same) as it started out all those years ago. I'm pretty sure that almost all of our model and view code is untouched over that time, and I think a lot of our templates are untouched or only minorly changed.

However, this is not a complete picture of code churn in our app, because there have been Django changes over that time in areas such as routing, command argument processing, template processing, and project structure. These changes have forced code changes in the areas of our app that deal with such things (and the change in project structure eventually forced a massive renaming of files when we went to Django 1.9). While this sounds kind of bad, I've wound up considering all of them to be relatively peripheral. In a way, all of the code involved is plumbing and glue. None of it really touches the heart of our web application, which (for us) lives mostly in the models and views and somewhat in the core logic of the templates. Django has been very good about keeping that core code from needing any substantive changes. We still validate form submissions and generate views and process model data in basically the same way we did in 2011, and all of that is what I think of as the hard stuff.

(Although I haven't measured, I also think it's most of the app's code by line count.)

This code stability is one reason why Django upgrades have been somewhat painful but not deeply painful. If we'd needed major code restructuring, well, I'd probably have done it eventually because we might have had no choice, but we'd have likely updated Django versions more sporadically than we have so far.

PS: Although Django is going from version 1.11 to version 2.0 in the next release, the Django people say that this shouldn't be any more of an upgrade than usual. And speaking of that, I should get working on updating us to 1.11, since security updates for 1.10 will end soon (if they haven't already).

DjangoAppCodeStability written at 23:13:02

2017-11-05

How collections.defaultdict is good for your memory usage

There is a classical pattern in code that uses entries in dictionaries to accumulate data. In the simplest form, it looks like this:

e = dct.get(ky, None)
if e is None:
    e = []
    dct[ky] = e

# now we work on e without
# caring if it's new or old

There is an obvious variation of this that gets rid of the whole bureaucracy involving the if:

e = dct.setdefault(ky, [])
# work on e

On the surface, this looks very much like what you get with collections.defaultdict. At this level you might reasonably think that defaultdict is just a convenience, giving you a slightly shorter and nicer way to write this code so you don't have to do either the if or use .setdefault() instead of just doing a simple dct[ky]. However, there's an important way that both defaultdict and the if-based version are better than the .setdefault() version.

To see it, let's change what the individual elements are:

e = dct.setdefault(ky, ExpensiveItem())
....

When I write things this way, the problem may jump out right away. The issue with this version is that we always create a new ExpensiveItem object regardless of whether ky is already in dct. If ky is not in dct, we use the new object and all is good, but if there already is one, we throw away the new object we created. If we're dealing with a lot of keys that already exist, this is a lot of objects being created and then immediately thrown away. Both the if-based version and defaultdict avoid this problem because they only create a new object if and when they actually need it, and a defaultdict version is just as short as the .setdefault() version.

(The other subtle advantage of defaultdict is that you specify the default item only once, when you create the dictionary, instead of having to duplicate it in every section of code where you need to do this update-or-add pattern.)
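The difference is easy to measure with a small sketch. The ExpensiveItem here is a stand-in that just counts how many times it has been created, and the two function names and the sample keys are mine:

```python
from collections import defaultdict


class ExpensiveItem:
    """Stand-in for a costly object; counts every instance created."""

    created = 0

    def __init__(self):
        type(self).created += 1
        self.values = []


def count_with_setdefault(keys):
    ExpensiveItem.created = 0
    dct = {}
    for ky in keys:
        # The argument is evaluated before setdefault() is even called.
        e = dct.setdefault(ky, ExpensiveItem())
        e.values.append(ky)
    return ExpensiveItem.created


def count_with_defaultdict(keys):
    ExpensiveItem.created = 0
    dct = defaultdict(ExpensiveItem)
    for ky in keys:
        # The factory only runs when ky is missing.
        dct[ky].values.append(ky)
    return ExpensiveItem.created


keys = ["a", "b", "a", "a", "b"]
setdefault_created = count_with_setdefault(keys)
defaultdict_created = count_with_defaultdict(keys)
```

With five updates spread over two distinct keys, the setdefault() version creates five objects and the defaultdict version creates two.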

On the one hand, this advantage of defaultdict feels obvious once I write it out like this. On the other hand, Python doesn't really encourage people to think about how often objects are created and other aspects of memory churn. Also, even if you know about the issue (as I generally do), it's tempting to go with the setdefault() version instead of the if version just because it's shorter and you probably aren't dealing with enough objects for this to matter. Using collections.defaultdict lets you have your cake and eat it too; you get short code and memory efficiency.

DefaultdictAndMemoryChurn written at 23:56:04

2017-10-18

I still like Python and often reach for it by default

Various local events recently made me think a bit about the future of Python at work. We're in a situation where a number of our existing tools will likely get drastically revised or entirely thrown away and replaced, and that raises local issues with Python 3 as well as questions of whether I should argue for changing our list of standard languages. I have some technical views on the answer, but thinking through this has made me realize something on a more personal level. Namely, I still like Python and it's my go-to default language for a number of things.

I'm probably always going to be a little bit grumpy about the whole transition toward Python 3, but that in no way erases the good parts of Python. Despite the baggage around it, Python 3 has its own good side and I remain reasonably enthused about it. Writing modest little programs in Python has never been a burden; the hard parts are never from Python, they're from figuring out things like data representation and that's the same challenge in any language. In the meantime, Python's various good attributes make it pretty plastic and easily molded as I'm shaping and re-shaping my code as I figure out more of how I want to do things.

(In other words, experimenting with my code is generally reasonably easy. When I may completely change how I approach a problem between my first draft and my second attempt, this is quite handy.)

Also, Python makes it very easy to do string-bashing and to combine it with basic Unix things. This describes a lot of what I do, which means that Python is a low-overhead way of writing something that is much like a shell script but that's more structured, better organized, and expresses its logic more clearly and directly (because it's not caught up in the Turing tarpit of Bourne shell).

(This sort of 'better shell script' need comes up surprisingly often.)

My tentative conclusion about what this means for me is that I should embrace Python 3, specifically I should embrace it for new work. Despite potential qualms for some things, new programs that I write should be in Python 3 unless there's a strong reason they can't be (such as having to run on a platform with an inadequate or missing Python 3). The nominal end of life for Python 2 is not all that far off, and if I'm continuing with Python in general (and I am), then I should be carrying around as little Python 2 code as possible.

IStillLikePython written at 02:58:38

2017-10-03

Some thoughts on having both Python 2 and 3 programs

Earlier, I wrote about my qualms about using Python 3 in (work) projects in light of the extra burden it might put on my co-workers if they had to work on the code. One possible answer here is that it's possible both to use Python 3 features in Python 2 and to write code that naturally runs unmodified under both versions (as I did without explicitly trying to). This is true, but there's a catch and that catch matters in this situation.

The compatibility between Python 2 and Python 3 is not symmetric. If you write natural Python 3 code, it can often run under Python 2, sometimes with __future__ imports. However, if you write natural Python 2 code it will not run under Python 3, unless your code completely avoids at least print as a statement and mixing tabs and spaces. A Python 3 programmer who knows very little about Python 2 and who simply writes natural code can produce a program that runs unaltered under Python 2 and can probably modify a Python 2 program without having it blow up in their face. But a Python 2 programmer who tries to work on a Python 3 program is quite possibly going to have things explode. They could get lucky, but all it takes is one print statement and Python 3 is complaining. This is true even if the original Python 3 code is careful to be Python 2 compatible (it uses appropriate __future__ imports and so on).

Since there are Python 3 features that are simply not available in Python 2 even with __future__ imports, a Python 3 programmer can still wind up blowing up a Python 2 program. But as someone who's now written both Python 2 and Python 3 code (including some that wound up being valid Python 2 code too), my feeling is that you have to go at least a bit out of your way in straightforward code to wind up doing this. By contrast, it's very easy for a Python 2 programmer to use Python 2-only things in code, partly because one of them (print statements) is a long-standing standard Python 2 idiom. A Python 2 programmer is relatively unlikely to produce code that also runs on Python 3 unless they explicitly try to (which requires a number of things, including awareness that there is even a Python 3).

So if you have part-time Python 3 programmers and some Python 2 programs, you'll probably be fine (and you can increase the odds by putting __future__ imports into the Python 2 programs in advance, so they're fully ready for Python 3 idioms like print() as a function). If you have part-time Python 2 programmers and some Python 3 programs, you're probably going to have to keep an eye on things; people may get surprises every so often. Unfortunately there's nothing you can really do to make the Python 3 code able to deal with Python 2 idioms like print statements.
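As a minimal sketch of putting the __future__ import in ahead of time, here is a small piece of code that should behave the same under Python 2.7 and Python 3. The report() function and its arguments are made up for illustration; only the __future__ import and print()'s file= argument are real Python features.

```python
from __future__ import print_function  # has to be at the top of the file

import sys


def report(name, count):
    """Hypothetical helper: log to stderr and return a formatted summary."""
    # With the __future__ import, this is a function call in Python 2 as well,
    # so multiple arguments and file= work the same way in both versions.
    print("processed", name, count, file=sys.stderr)
    return "%s: %d" % (name, count)
```

Without the import, Python 2 rejects the file= keyword as a syntax error, which is the sort of surprise the import is there to prevent.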

(In the long run it seems clear that everyone is going to have to learn about Python 3, but that's another issue and problem. I suspect that many places are implicitly deferring it until they have no choice. I look forward to an increasing number of 'what to know about Python 3 for Python 2 programmers' articles as we approach 2020 and the theoretical end of Python 2 support.)

MixingPython2And3Programs written at 00:19:38

2017-09-21

My potential qualms about using Python 3 in projects

I wrote recently about why I didn't use the attrs module; the short version is that it would have forced my co-workers to learn about it in order to work on my code. Talking about this brings up a potentially awkward issue, namely Python 3. Just like the attrs module, working with Python 3 code involves learning some new things and dealing with some additional concerns. In light of this, is using Python 3 in code for work something that's justified?

This issue is relevant to me because I actually have Python 3 code these days. For one program, I had a concrete and useful reason to use Python 3 and doing so has probably had real benefits for our handling of incoming email. But for other code I've simply written it in Python 3 because I'm still kind of enthused about it and everyone (still) does say it's the right thing to do. And there's no chance that we'll be able to forget about Python 2, since almost all of our existing Python code uses Python 2 and isn't going to change.

However, my tentative view is that using Python 3 is a very different situation than the attrs module. To put it one way, it's quite possible to work with Python 3 without noticing. At a superficial level and for straightforward code, about the only difference between Python 3 and Python 2 is print("foo") versus 'print "foo"'. Although I've said nasty things about Python 3's automatic string conversions in the past, they do have the useful property that things basically just work in a properly formed UTF-8 environment, and most of the time that's what we have for sysadmin tools.

(Yes, this isn't robust against nasty input, and some tools are exposed to that. But many of our tools only process configuration files that we've created ourselves, which means that any problems are our own fault.)

Given that you can do a great deal of work on an existing piece of Python code without caring whether it's Python 2 or Python 3, the cost of using Python 3 instead of Python 2 is much lower than, for example, the cost of using the attrs module. Code that uses attrs is basically magic if you don't know attrs; code in Python 3 is just a tiny bit odd looking and it may blow up somewhat mysteriously if you do one of two innocent-seeming things.

(The two things are adding a print statement and using tabs in the indentation of a new or changed line. In theory the latter might not happen; in practice, most Python 3 code will be indented with spaces.)
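To make the two failure modes concrete, here is a small sketch that feeds source snippets to the built-in compile() and reports which error, if any, comes back. The snippets and the compile_error() helper are hypothetical stand-ins for an 'innocent' edit to a Python 3 program; the results described are what Python 3 does, while Python 2 accepts both problem snippets.

```python
# Hypothetical snippets standing in for small edits to an existing program.
PRINT_FUNCTION = 'print("foo")\n'
PRINT_STATEMENT = 'print "foo"\n'
# First body line is indented with eight spaces, the second with a tab.
MIXED_INDENT = "if True:\n        x = 1\n\ty = 2\n"


def compile_error(source):
    """Return the name of the error raised when compiling source, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        # TabError and IndentationError are subclasses of SyntaxError.
        return type(err).__name__
    return None


for label, source in [("print function", PRINT_FUNCTION),
                      ("print statement", PRINT_STATEMENT),
                      ("mixed indentation", MIXED_INDENT)]:
    print(label, "->", compile_error(source))
```

Under Python 3 the print statement comes back as a SyntaxError and the mixed indentation as a TabError; the function-call form of print compiles cleanly in both versions.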

In situations where using Python 3 allows some clear benefit, such as using a better version of an existing module, I think using Python 3 is pretty easily defensible; the cost is very likely to be low and there is a real gain. In situations where I've just used Python 3 because I thought it was neat and it's the future, well, at least the costs are very low (and I can argue that this code is ready for a hypothetical future where Python 2 isn't supported any more and we want to migrate away from it).

Sidebar: Sometimes the same code works in both Pythons

I wrote my latest Python code as a Python 3 program from the start. Somewhat to my surprise, it runs unmodified under Python 2.7.12 even though I made no attempt to make it do so. Some of this is simply luck, because it turns out that I was only ever invoking print() with a single argument. In Python 2, print("fred") is seen as 'print ("fred")', which is just 'print "fred"', which works fine. Had I tried to print() multiple arguments, things would have gone wrong, since Python 2 would have treated them as a tuple and printed that instead.

(I have only single-argument print()s because I habitually format my output with % if I'm printing out multiple things. There are times when I'll deviate from this, but it's not common.)
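The habit of formatting with % first can be sketched like this. The format_line() function and its sample values are invented for illustration and are not taken from the actual program.

```python
def format_line(host, count):
    """Hypothetical helper: build the whole output line with % formatting."""
    return "%s: %d messages" % (host, count)


# print() only ever sees one argument here. Python 3 treats this as a function
# call; Python 2 treats it as a print statement followed by a parenthesized
# expression. Both produce the same text.
print(format_line("mailhost", 3))

# By contrast, print("mailhost", 3) is a two-argument call in Python 3, but in
# Python 2 (without the __future__ import) it prints the tuple ('mailhost', 3).
```

Keeping the formatting in a separate function also means the output can be checked without capturing standard output.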

Python3LearningQualms written at 01:35:57
