Wandering Thoughts

2018-04-12

I'm hoping that RHEL 8's decision on Python 2 isn't Ubuntu 20.04's decision

I recently wrote about whether Ubuntu 20.04 will include Python 2, and threw in a sidebar about it in RHEL 8. Thanks to comments from Twirrim and Seth (on this entry), I then found out that Red Hat has recently announced that they won't be including Python 2 in Red Hat Enterprise Linux 8. This is in a way very useful to know, because we'd like to build some systems with RHEL 8 in the not too distant future if we can, and those systems will need to run some Python-based system management tools. Since RHEL 8 won't include Python 2, I'd better start thinking about how to make these tools Python 3.

However, despite the news about Python 2 in RHEL 8, I remain reasonably optimistic that Python 2 will be in Ubuntu 20.04 (which would be very convenient for us, due to the relatively short time to 20.04). This is because I think there are a number of significant differences between the situation Red Hat finds themselves in with RHEL 8 and the situation Ubuntu will be in with Ubuntu 20.04.

Red Hat ships only a limited number of carefully curated packages in 'Red Hat Enterprise Linux' and then strongly supports them for quite a long time (so far for ten years, so RHEL 8 is expected to be supported through at least 2028). Red Hat is clearly willing to either change or remove packages that would normally depend on Python 2, and they have the manpower (and small package set) to make this feasible and presumably not too disruptive to what RHEL users expect (ie, not removing too many packages).

By contrast, Ubuntu has a shorter support period (20.04 will be supported only to early 2025), ships significantly more packages even in their nominally fully supported package set, supports them to a lesser extent, relies much more on upstream Debian packaging efforts, and has an escape hatch in the form of the officially less supported and much larger 'universe' package set. I'm not sure how Debian is doing in their efforts to push Python 2 out, but my impression is that it hasn't been going very fast (as with basically all large scale changes of this nature in Debian). All of this makes it both less of a burden for Ubuntu 20.04 to include Python 2 and probably more disruptive to not do so (with more excluded packages and also surprised users). As a result, I expect Ubuntu 20.04 to include Python 2 at least in their broad 'universe' package set.

(Red Hat doesn't have a formal equivalent of the Ubuntu 'universe' package set, but RHEL does have a rough functional equivalent in EPEL. It's possible that Python 2 for RHEL 8 could wind up being packaged in EPEL, at least for a while.)

PS: It'll be interesting to see if there's a /usr/bin/python on RHEL 8 when it comes out, or if there's only a python3. I think my personal preference is for there to be no /usr/bin/python, but that's biased by having multiple systems and wanting things that expect Python 2 to immediately fail on RHEL 8 with a clear error rather than exploding mysteriously.

Sidebar: My guess at Ubuntu's path to removing Python 2

Ubuntu doesn't just do LTS releases every two years; they also do regular releases every six months. These releases are both an early signal of what will be in a future LTS release and a chance for Ubuntu to start making gradual changes. Since completely dropping Python 2 from one release to another would be quite disruptive, what I expect Ubuntu to do instead is first move it (and all of the packages that depend on it) to the 'universe' package set. This would effectively start the clock running on its actual removal in some later release, and also give people like me some advance warning about it.

(I believe that packages can be moved this way without causing heartburn to people upgrading from one release to the next, but I may be wrong.)

Python2RHEL8VsUbuntu2004 written at 02:16:55

2018-04-10

Our real problem with a removal of Python 2 is likely to be our users

In my recent entry on whether Ubuntu 20.04 LTS will include Python 2, I mentioned that this mattered because we have various system management tools written in Python 2, so if Python 2 was not going to be in 20.04, we'd need to start porting them so they'd be ready in time. Unfortunately, this need to port our system tools is probably not going to be the most painful part of the day that Ubuntu ships without Python 2. Instead, the real problem is our users. More specifically, the problem is all of the Python (2) programs that our users will have written over the years and still use and need.

Well, let me rephrase that. I shouldn't say 'users'; I should say 'the graduate students and professors of the department' (and also researchers, postdocs, undergrads doing research work with professors, visiting collaborators, and so on). Unusually for the modern world, we provide general multiuser computing to all of these people, and so these people log on to those Ubuntu-based servers and do whatever they want (more or less) with the things that Ubuntu provides. Some of these people write Python programs, and some of those programs are probably Python 2 programs. When Python 2 goes away, those programs are going to break.

(They will also probably break if /usr/bin/python turns into Python 3, which is one reason I hope Ubuntu doesn't do that any time soon. There being no /usr/bin/python is less confusing and easier to explain to angry users than 'python is now incompatible with what it was a week ago, sorry about that'.)

A few of these people are probably avid Python users and already know about Python 3. Of course, these people are probably already writing everything in Python 3, so they're unaffected by this. Many more of these people probably don't know about Python 3 for various reasons, including that their real work is writing a thesis or doing research, not knowing about developments in the programming language that they're working in. To add to the difficulty, we don't even know who they are (and I'm not sure how we'd find out, unless there is some very lightweight and non-intrusive way of instrumenting our systems to gather data when Python 2 gets run).

(Since we can't currently give our users any definitive information on when they won't have Python 2, it's also not very useful to reach out to them right now. Many of our users would rightfully consider rewriting things from Python 2 to Python 3 to be a distraction from their thesis or their research, and for that matter they may only need their Python 2 programs for a relatively limited time.)

The basically inevitable result of this is that we're likely to be forced to install Python 2 for backward compatibility for at least one LTS generation after Ubuntu drops it officially. Hopefully there will be people packaging Python 2.x as a PPA, which is the most convenient option on Ubuntu. The possible exception for this would be if Ubuntu gave everyone a significant amount of advance warning, for example if they announced before 20.04 that it would be the last release that included any version of Python 2 in the normal Ubuntu repositories. Then we could at least start trying to reach users, not that I expect us to be very successful at it.

(Because of the department's support model for users and other factors, most people will be on their own for dealing with things, too.)

PS: We're definitely not going to ever change /usr/bin/python to print a warning about the situation (for many reasons including that warnings are often fatal errors in practice), and I'm pretty sure we'd never alter it to syslog things when it starts. Any method of acquiring information about when Python 2 gets run needs to be entirely external.
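
As an illustration of the sort of entirely external measurement I have in mind, here is a minimal sketch (untested, and the 'python2' name matching is an assumption) that scans /proc for processes currently running a Python 2 binary. Run regularly from cron as root, it would build up a rough picture of who still uses Python 2, although short-lived processes will be missed between runs:

#!/usr/bin/python3
# Print the login names of users currently running a Python 2
# binary, based on each process's /proc/<pid>/exe symlink.
import os
import pwd

users = set()
for pid in os.listdir('/proc'):
    if not pid.isdigit():
        continue
    try:
        exe = os.readlink('/proc/%s/exe' % pid)
        uid = os.stat('/proc/%s' % pid).st_uid
    except OSError:
        continue  # the process exited, or we can't inspect it
    if os.path.basename(exe).startswith('python2'):
        users.add(pwd.getpwuid(uid).pw_name)

for user in sorted(users):
    print(user)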

Python2AndOurUsers written at 20:32:25

2018-04-08

The interesting question of whether Ubuntu 20.04 LTS will include Python 2

It's 2018, which means that 2020's end of Python 2 support is only two years away. Two years seems like a long time, but it's not really, especially if you're not a full time developer or Python person, which is our situation. One of the questions about what we have to do with our current set of Python programs boils down to whether Ubuntu's very likely April 2020 Long Term Support release (Ubuntu 20.04) will include Python 2.

So far, Ubuntu has done LTS releases every two years in April; 10.04, 12.04, 14.04, 16.04, and now the impending 18.04. If they follow this pattern, they will release the next LTS in April of 2020, after Python 2's end of life (which the Python people say is January 1st 2020), and if we follow our usual practices, we'll begin using Ubuntu 20.04 on some systems that summer and autumn. These systems will need to run our Python system management tools, which means that if Ubuntu 20.04 doesn't include Python 2, we need to have our tools running on Python 3 before then.

(Of course it might be a good idea to port our tools to Python 3 now, but there's a difference between being prepared and being forced. This is especially important when you have to prioritize various things you could be working on, which is generally our situation.)

Since Ubuntu 20.04 will be released after the Python 2 cutoff date, in theory it could drop Python 2 on the grounds that it's no longer supported by the upstream developers. However, in practice there are two issues. First, it seems very likely that Python 2 will be supported by other people if problems emerge, because there are other long term Linux distributions that are already committed to supporting systems with Python 2 past 2020 (for example, Red Hat Enterprise Linux 7, which will be supported through 2024, and then there's Ubuntu 18.04 itself, which will be supported through 2023). Second, it's not clear that all packages that currently use Python 2 will be updated to Python 3 in time for 2020 (see eg). Ubuntu could choose to throw Python 2 out anyway and to hell with any packages that this forces out, but that might not be very popular with people.

The current state of Ubuntu 18.04 is that Python 2.7 will be available in the 'main' package repository, directly supported by Ubuntu. One possible option for 20.04 is that Python 2.7 would be available but would be demoted to the community supported 'universe' package repository, which theoretically gives you lower expectations of bug and security fixes. This would give Ubuntu an option to shrug their shoulders if some serious issue comes up after 2020 and no one steps forward to fix it.

Probably the safest option for us is to begin moving our tools to Python 3, but likely not until 2019. If we started now, I'd have to make them compatible with Ubuntu 14.04's Python 3.4.3; if I wait until we've migrated all of our 14.04 machines to 18.04, I get to base everything on Ubuntu 16.04's somewhat more recent 3.5.2.

(Using 3.5 as the base could be potentially important, since the 3.5 changes brought in formatting for bytes and better handling for character encoding issues with sys.stdin and sys.stdout, both of which might be handy for our sysadmin-focused uses of Python.)
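
As a concrete illustration of the bytes formatting, 3.5 is the first version with PEP 461's %-interpolation for bytes, which matters when you're assembling log or protocol data as bytes (the values here are made up):

>>> b'%s: %d bytes' % (b'/var/tmp', 4096)
b'/var/tmp: 4096 bytes'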

Sidebar: Red Hat Enterprise Linux 8 and Python 2

Unlike Ubuntu, Red Hat hasn't officially announced any timing or formal plans for RHEL 8. However, a new version of RHEL is due (based on RH's traditional timing) and there are some signs that one is in preparation, probably for release this summer. I can't imagine such a version not including Python 2, which means that Red Hat would likely be committed to supporting it through 2028.

This isn't necessarily a big burden, because it's my opinion that we're unlikely to find any serious issues in Python 2.7 after 2020. This is especially so if people like Red Hat make a concerted effort to find any remaining 2.7 problems before the official end of support, for example by extensively running fuzzing tools against 2.7 or by paying for some security auditing of Python's SSL code (or doing it themselves).

Python2AndLTSLinuxes written at 22:06:50

2018-03-22

Why seeing what current attributes a Python object has is hard

Back when I wrote some notes on __slots__ and class hierarchies, I said in passing that there was no simple way to see what attributes an object currently has (I was sort of talking about objects that use __slots__, but it's actually more general). Today, for reasons beyond the scope of this entry, I feel like talking about why things work out this way.

To see where things get tricky, I'll start out by talking about where they're simple. If what we have is some basic struct object and we want to see what fields it has, the most straightforward approach is to look at its __dict__. We can get the same result indirectly by taking the dir() of the object and subtracting the dir() of its class:

>>> class A:
...   def __init__(self):
...      self.a = 10
...      self.b = 20
... 
>>> a = A()
>>> set(dir(a)) - set(dir(a.__class__))
{'b', 'a'}

(This falls out of the definition of dir(), but note that this only works on simple objects that don't do a variety of things.)

The first problem is that neither version of this approach works for instances of classes that use __slots__. Such objects have no __dict__, and if you look at dir() it will tell you that they have no attributes of their own:

>>> class B:
...   __slots__ = ('a', 'b')
...   def __init__(self):
...      self.a = 10
...
>>> b = B()
>>> set(dir(b)) - set(dir(b.__class__))
set()

This follows straightforwardly from how __slots__ are defined, particularly this bit:

  • __slots__ are implemented at the class level by creating descriptors (Implementing Descriptors) for each variable name. [...]

Descriptors are attributes on the class, not on instances of the class, although they create behavior in those instances. As we can see in dir(), the class itself has a and b attributes:

>>> B.a
<member 'a' of 'B' objects>

(In CPython, these are member_descriptor objects.)

For an instance of a __slots__ using class, we still have a somewhat workable definition of what attributes it has. For each __slots__ attribute, an instance has the attribute if hasattr() is true for it, which means that you can access it. Here our b instance of B has an a attribute but doesn't have a b attribute. You can at least write code that mechanically checks this, although it's a bit harder than it looks.

(One part is that you need the union of __slots__ on all base classes.)
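
Here's a minimal sketch of that mechanical check (the function name is mine). It has to walk the MRO to merge __slots__ from all base classes, and cope with __slots__ being allowed to be a single string:

def slot_attrs(obj):
    # Merge __slots__ across every class in the MRO; each class
    # only declares its own additions.
    names = set()
    for klass in type(obj).__mro__:
        slots = getattr(klass, '__slots__', ())
        if isinstance(slots, str):
            # __slots__ may be a single string naming one slot.
            slots = (slots,)
        names.update(slots)
    # The instance 'has' a slot attribute if accessing it works,
    # ie if hasattr() is true for it.
    return set(n for n in names if hasattr(obj, n))

With the B class above, slot_attrs(b) is {'a'}, matching the hasattr() based definition.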

However, we've now arrived at the tricky bit. Suppose that we have a general property on a class under the name par. When should we say that instances of this class have a par attribute? In one sense, instances never will, because at the mechanical level par will always be a class attribute and will never appear in an instance __dict__. In another sense, we could reasonably say that instances have a par attribute when hasattr() is true for it, ie when accessing inst.par won't raise AttributeError; this is the same definition as we used for __slots__ attributes. Or we might want to be more general and say that an attribute only 'exists' for our purposes when accessing it doesn't raise any errors, not just AttributeError (after all, this is when we can use the attribute). But what if this property actually computes the value for par on the fly from somewhere, in effect turning an attribute into a method; do we say that par is still an attribute of the instance, even though it doesn't really act like an attribute any more?

Python has a lot of ways to attach sophisticated behavior to instances of classes that's triggered when you try to access an attribute in some way. Once we have such sophisticated behavior in action, there's no clear or universal definition of when an instance 'has' an attribute and it becomes a matter of interpretation and opinion. This is one deep cause of why there's no simple way to see what attributes an object currently has; once we get past the simple cases, it's not even clear what the question means.

(Even if we come up with a meaning for ourselves, classes that define __getattr__ or __getattribute__ make it basically impossible to see what attribute names we want to check, as the dir() documentation gently notes. There are many complications here.)

Sidebar: The pragmatic answer

The pragmatic answer is that if it's sensible to ask this question about an object at all, we can get pretty much the answer we want by looking at the object's __dict__ (if it has one), then adding the merged __slots__ names for which hasattr() reports true.
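
In code, a minimal version of this pragmatic answer might look like the following (reusing slot_attrs() from the earlier sketch):

def current_attrs(obj):
    # Instance __dict__ keys (if the object has a __dict__) plus
    # whichever merged __slots__ attributes are actually set.
    attrs = set(getattr(obj, '__dict__', ()))
    attrs.update(slot_attrs(obj))
    return attrs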

That this answer blows up on things like proxy objects suggests that perhaps it's not a question we should be asking in the first place, at least not outside of limited and specialized situations.

(In other words, it's possible to get entirely too entranced with the theory of Python and neglect its practical applications. I'm guilty of this from time to time.)

KnowingObjectAttrsHard written at 00:14:03

2018-03-20

Python and the 'bags of unstructured data' approach

These days I write code in both Go and Python, which sometimes gives me interesting new perspectives on each language as I shift back and forth. I was recently hacking on a Python program to mutate it into what I wanted, and as I did so what struck me is how Python's dynamic typing and everything around it enabled a specific approach that I'll call the 'bag of data' approach.

The base code I was starting with parses Linux's /proc/self/mountstats to get at all of the NFS statistics found there. All of the data fields in these statistics have defined meanings, meanings that this Python code knew, so it could have opted to use some kind of structures for them with actual named fields (perhaps using namedtuple). However, it didn't. Instead the code dumps everything into a small collection of dicts using various named keys, then yanks bits back out again as it needs them (and knows what structure each key's data will have).

This 'bag of data' approach only works in a dynamic language like Python, because there's no structure or typing to what goes where. A given key may give you a string, a number, a list of either, a sub-dict, or whatever. On the one hand this is harder to follow than something with fixed, named fields. On the other hand it's marvelously flexible and easy to manipulate and transform, especially in bulk. In theory you could do the same sort of thing with named fields (in Python), but in practice it's just easier to write code when you're dealing with dictionary keys and values, because getting lists of them, accessing arbitrary ones, and doing indirection is really simple. With a dict that's a bag of data, it's natural to write code like this:

# 'ops' holds the list of NFS RPC operation names; each operation's
# entry is a list of its numeric statistics fields.
datas = [self.__rpc_data[x] for x in self.__rpc_data['ops']]
# Sum corresponding statistics fields across all operations.
sumrpc = [sum(x) for x in zip(*datas)]

This isn't necessarily the best way to do things in final code, once you're sure you know what you need, but in the mean time this plasticity makes it very easy to experiment by transforming and mutating and remixing various pieces of data in the bag in convenient and quick to write ways. When 'sum fields together across all of the different NFS RPC operations' is a two liner, you're much more likely to try it if you think something interesting might result.

(One way that this is potentially flawed is that not all statistics fields in NFS RPC operations may make sense to sum together. But that's up to you to keep track of and sort out, because that's the tradeoff you get here.)

There are other nice things you can do with the bag of data approach. For example, it gives you a relatively natural way to deal with data that isn't always there. Python has lots of operations for checking if keys are in dicts, getting the value of a key with a default value if it's not there, and so on. You can build equivalents of all of these for named fields, but it's more work and isn't likely to feel as natural as 'if key in databag: ...'.
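
As a made-up illustration of those operations (these field names are invented, not from the real mountstats code):

bag = {'ops': ['READ', 'WRITE']}
retrans = bag.get('retrans', 0)           # default when the key is absent
if 'errors' in bag:
    print(bag['errors'])
bag.setdefault('counts', {})['READ'] = 1  # create the sub-dict on demand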

Another thing I've done in code is to successively refine my bag of data in multiple passes. My bag of data starts out with only very basic raw fields, then I generate some calculated fields and add them to the bag, then another pass derives an additional set of fields, and so on. Again, you can do this pattern with named fields, but it isn't a natural fit; probably the right way to do it with structured data is a series of different structures, perhaps embedding the previous ones. Pragmatically it's easier to write passes that simply add and update dictionary keys, though, partly because the knowledge of what fields exist when and where can be very localized.

(This is especially the case if what you're dealing with is a tree of data and you may want to run a pass over each node in the tree, where different nodes may be of different types. When everything is a dictionary it's easy to write generic code that only acts on certain things; life gets messier if you must carefully sniff out if the object you have is the right type of object for your pass to even start looking at.)
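
A minimal sketch of this multi-pass refinement, again with invented field names; each pass just adds keys to the bag, and later passes can assume that earlier ones have run:

def add_totals(bag):
    # First pass: derive a total from the raw per-operation counts.
    bag['total'] = sum(bag['count-' + op] for op in bag['ops'])

def add_rates(bag, duration):
    # Second pass: depends on the field the first pass added.
    bag['rate'] = bag['total'] / duration

bag = {'ops': ['READ', 'WRITE'], 'count-READ': 100, 'count-WRITE': 50}
add_totals(bag)
add_rates(bag, 10.0)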

BagsOfData written at 00:13:27

2018-02-28

Using Python 3 for example code here on Wandering Thoughts

When I write about Python here, I often wind up having some example Python code, such as the subCls example in my recent entry about subclassing a __slots__ class. Mostly this Python code has been Python 2 by default, with Python 3 as the exception. When I started writing, Python 3 wasn't even released; then it wasn't really something you wanted to use; and then I was grumpy about it so I deliberately continued to use Python 2 for examples here, just as I continued to write programs in it (for good reasons). Sometimes I explicitly mentioned that my examples were in Python 2, but sometimes not, and that too was a bit of my grumpiness in action.

(There was also the small fact that I'm far more familiar with Python 2 than Python 3, so writing Python 2 code is what happens if I don't actively think about it.)

However, things change. Over the past few years I've basically made my peace with Python 3 and these days I'm trying to write new code in Python 3. Although writing my example code here in Python 2 is close to being a reflex, it's one that I want to consciously break. Going forward from now, I'm going to write sample code in Python 3 by default and only use Python 2 if there is some special reason for it (and then mention explicitly that the example is Python 2 instead of 3). This is a small gesture, but I figure it's about time, and it's also probably what more and more readers are just going to expect.

(It looks like I've been doing this inconsistently for a while, or at least testing some of my examples in Python 3 too, eg, and also increasingly linking to the Python 3 version of Python documentation instead of the Python 2 version.)

Actually doing this is going to take me some work and attention. Since I write Python 2 code by reflex, I'm going to have to double-check my examples to make sure that they're valid Python 3 (and that they behave the same way in Python 3). Some of the time this will mean actually testing even small fragments instead of relying on my Python (2) knowledge to write from memory. Also, when I'm checking Python's behavior for something (or prototyping some code), I'll have to remember to run python3 instead of just python or I'll accidentally wind up testing the wrong Python.

(When I wrote my recent entry I was quietly careful to make the example code Python 3 code by including a super() and then using the no-argument version, which is Python 3 only.)

(I'm writing this entry partly to put a marker in the ground for myself, so that I won't be tempted to let a Python 2 example slide just because I'm feeling lazy and I don't want to work out and verify the Python 3 version.)

Python3ForExamples written at 02:24:08

2018-02-25

What Python does when you subclass a __slots__ class is the right answer

Recently on Twitter @simonw sort of asked this question:

Trying to figure out if it's possible to create an immutable Python class who's subclasses inherit its immutability - using __slots__ on the parent class prevents attrs being set, but any child class still gets a mutable __dict__

To rephrase this, if you subclass a class that uses __slots__, by default you can freely set arbitrary attributes on instances of your subclass. Python's behavior here surprises people every so often (me included); it seems to strike a fair number of people as intuitively obvious that __slots__ should be sticky, so that if you subclass a __slots__ class, your subclass would also be a __slots__ class. In light of this, we can ask whether Python has made the right decision here or if this is basically a bug.
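
To make the default behavior concrete, here's a quick demonstration (the class names are invented):

>>> class Fixed:
...     __slots__ = ('a',)
...
>>> class Child(Fixed):
...     pass
...
>>> Child().b = 20        # fine; Child instances have a __dict__
>>> Fixed().b = 20
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Fixed' object has no attribute 'b'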

My answer is that this is a feature and Python has made the right choice. Let's consider the problems if things worked the other way around and __slots__ was sticky to subclasses. The obvious problem that would come up is that any subclass would have to remember to declare any new instance attributes it used in a __slots__ of its own. In other words, if you wrote this, it would fail:

class subCls(slotsCls):
   def __init__(self, a, b, c):
      super().__init__(a)
      self.b = b
      self.c = c
   [...]

In practice, I don't think this is a big issue by itself. Almost all Python code sets all instance variables in __init__, which means that you'd find out about the problem the moment you created an instance of the subclass, for example in your tests. Even if you only create some instance variables in __init__ and defer others to later, a single new variable would be enough to trigger the usual failure. This means you're extremely likely to trip over this right away, either in tests or in real code; you'll almost never have mysterious deferred failures.

However, it points toward the real problem, which is that classes couldn't switch to using __slots__ without breaking subclasses. Effectively, that you weren't using __slots__ would become part of your class API. With Python as it is today, using or not using __slots__ is an implementation decision that's local to your class; you can switch back and forth without affecting anyone else (unless outside people try to set their own attributes on your instances, but that's not a good idea anyway). If the __slots__ nature was inherited and you switched to using __slots__ for your own reasons, all subclasses would break just like my subCls example above, including completely independent people outside your codebase who are monkey subclassing you in order to change some of your behavior.

(You could sort of work around this with the right magic, but then you'd lose some of the memory use benefits of switching to __slots__.)

Given the long term impact of making __slots__ sticky, I think that Python made the right decision to not do so. A Python where __slots__ was sticky would be a more annoying one with more breakage as people evolved classes (and also people feeling more constrained with how they could evolve their classes).

(There would also be technical issues with a sticky __slots__ with CPython today, but those could probably be worked around.)

SlotsSubclassSurpriseRight written at 01:34:35

2018-01-09

Differences between keywords and constants in Python

Yesterday I wrote about the challenges of having true constants in Python and said that there are other mechanisms that achieve basically the same results if all we care about are a few builtin values like True, False, and None. The most straightforward way is what was done with None in Python 2 and eventually done with True and False in Python 3, which is to make them into keywords. This raises the obvious question, namely why the Python people waited until Python 3 to make this change. One way of starting to answer this is to ask what the difference is (or would be) between Python keywords and hypothetical true constants (or just the ordinary 'constants' Python 2 has today for True and False).

If you look in Python's language documentation in the keywords section, you sort of get an answer:

The following identifiers are used as reserved words, or keywords of the language, and cannot be used as ordinary identifiers. [...]

(Emphasis mine.)

A keyword cannot be used as an identifier in any context, not merely as a variable (whether global to a module or even local to a function). If you try to define the following class in Python 3, you'll get a syntax error:

class example:
  def __init__(self):
    self.True = 10

If you try harder, you can nominally create the instance attribute (either by directly setting it in self.__dict__ or by naming it in __slots__), but then you have no way of getting access to it as an attribute, since writing obj.True in any context gets you a syntax error.
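
A quick Python 3 illustration of this; getattr() with a string is the one escape hatch, but that isn't attribute access syntax:

>>> class example:
...   def __init__(self):
...     self.__dict__['True'] = 10
...
>>> e = example()
>>> e.True
  File "<stdin>", line 1
    e.True
         ^
SyntaxError: invalid syntax
>>> getattr(e, 'True')
10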

(By extension, you can't have a method called True or False either.)

Our hypothetical true constants would not be so restricted. A constant would be unchangeable in its namespace, but it certainly wouldn't block the use of its name as an identifier in general in other contexts. You probably shouldn't give user-created names that much power (and the idea is a bad fit for Python's semantics anyway, with no obvious way to implement it).

Given this, we can look at some issues with making True and False into keywords in Python 2.

To start with, it's unlikely that someone was using either True or False as the name of a field in a class but it's not impossible. If they were and if some version of Python 2 made True and False into keywords, that code would immediately fail to even start running. Although I don't know for sure, I suspect that Python 2 had no infrastructure that would have let it report deprecation warnings in advance for this, so it probably would have been an abrupt change.

However, this is a pretty esoteric reason and there's a much more pragmatic one, illustrated by the example that Giedrius Statkevičius reported at the end of his article. The pymodbus module defined True and False in its __init__.py, not because it was worried about other people overriding them, but because at one point it wanted to support older Python versions while still using them:

# Define True and False if we don't have them (2.3.2)
try:
    True, False
except NameError:
    True, False = 1, 0

(A later version changed the values to be the result of boolean comparisons.)

If True and False had been created as keywords, there would be no way to use them and be backwards compatible with versions of Python 2 before they were defined. If they're keywords, merely writing a line that says 'True = (1 == 1)' is a syntax error when the module is imported or otherwise used, even if the line is never executed. You have no good way to define your own versions of them in Python versions where they're not supported (technically there is one way, but let's not go there), which means that you can't use them at all until you're willing to completely abandon support for those older Python versions. Forcing people to make this choice right up front is not a good way to get new features used; in fact, it's a great way for a story to spread through the community of 'oh, you can't use True and False because ...'. This is counterproductive, to put it one way.

Python 3 can make this sort of change because Python 3 was already making incompatible changes; in fact, making incompatible changes is its entire point. Python 2 was not in a good position to do it. Thus, I suspect that this is the major reason that Python 2 didn't make True and False into keywords but instead just put them into the builtins namespace as values.

KeywordsVsConstants written at 01:25:32

2018-01-07

The challenges of having true constants in Python

In his article What is Actually True and False in Python?, Giedrius Statkevičius takes Python to task for not having genuine constants in the language and making True and False into such constants, instead leaving them as changeable names in the builtins module. This provides a convenient starting point for a discussion of why having true constants in Python is a surprisingly difficult thing.

One of the big divisions between programming languages is what variable names mean. Many languages are what I will call storage languages, where a variable is a label for a piece of storage that you put things in (perhaps literally a chunk of RAM, as in C and Go, or something more abstracted, as in Perl). Python is not a storage language; instead it's what I'll call a binding language, where variables are bindings (references) to anonymous things.

In a storage language, variable assignment is copying data from one storage location to another; when you write 'a = 42' the language will copy the representation of 42 into the storage location for a. In a binding language, variable assignment is establishing a reference; when you write 'a = 42', the language makes a a reference to the 42 object (this can lead to fun errors).
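
The classic fun error in a binding language is aliasing, where two names end up bound to the same mutable object:

>>> a = [1, 2, 3]
>>> b = a          # b and a now refer to the same list object
>>> b.append(4)
>>> a
[1, 2, 3, 4]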

One result of that is that constants are different between the two sorts of languages. In a storage language what it means to make something a simple constant is relatively straightforward; it's a label that doesn't allow you to change the contents of its storage location. In a binding language, a constant must be defined differently; it must be something that doesn't allow you to change its name binding. Once you set 'const a = 42', a will always refer to the 42 object and you can't rebind it.

In Python, what names are bound to is not a property of the name, it is instead a property of the namespace they are in (which is part of why del needs to be a builtin). This means that in order for Python to have true constants, the various namespaces in Python would need to support names that cannot be re-bound once created with some initial value. This is certainly possible, but it's not a single change because there are at least three different ways of storing variables in Python (in actual dicts, local variables in functions, and __slots__ instance variables) and obviously all of them would need this.

You also need some way to support reloading modules, because this normally just runs all of the new module's code in the existing namespace. People will be unhappy if they can't change the value of a module level constant by reloading the module with a new version, or even convert a constant into an ordinary variable (and they'd be unhappier if they can't reload modules with constants at all).

Because the namespace of builtins is special, it would probably not be all that difficult to support true constants purely for it. In theory this would give you constants for True and False, but in practice people can and normally will create module-level versions of those constants with different values. In fact this is a general issue for any builtin constants; if they're supposed to genuinely be constants, you probably don't want to let people shadow them at the module level (at least). This requires more magic for all of the various ways of writing names to module level globals.

One more complication is that Python likes to implement this sort of thing with a general feature, instead of specific and narrowly tailored code. Probably the most obvious general way of supporting constants would be to support properties at the module level, not just in classes (although this doesn't solve the shadowing problem for builtin constants and you'd need an escape for reloading modules). However, there are probably a bunch of semantic side effects and questions if you did this, in addition to likely performance impacts.

(Any general feature for this is going to lead to a bunch of side effects and questions, because that's what general features do; they have far-reaching general effects.)

There's also a philosophical question of whether Python should even have true user-defined constants. Python is generally very much on the side that you can monkey-patch things if you really want to; any protections against doing so are usually at least partially social, in that you can bypass them if you try hard. Genuinely read-only names at the module level seem a violation of that, and there are other mechanisms if all we really care about are a few builtin values like True, False, and None.

(Why Python 2 didn't use such mechanisms to make True and False into 'constants' is another entry.)

Sidebar: Straightforward constants versus full constants

So far I've been pretending that it's sufficient to stop the name binding from changing in order to have a constant (or the storage location for storage languages). As Python people know full well, this is not enough because objects can mutate themselves if you ask them to (after all, this is the difference between a list and a tuple).

Suppose that Python had a magic const statement that made something a constant from that point onward:

alist = [1, 2, 3]
const alist

Clearly this must cause 'alist = 10' to be an error. But does it stop 'alist.pop()', and if so how (especially if we want it to work on arbitrary user-provided objects of random classes)?

One plausible answer is that const should simply fail on objects that can't be dictionary keys, on the grounds that this is as close as Python gets to 'this object is immutable'. People who want to do things like make a dict into a constant are doing something peculiar and can write a custom subclass to arrange all of the necessary details.

(Or they can just make their subclass lie about their suitability as dictionary keys, but then that's on them.)

ChallengesOfConstants written at 18:38:03

2018-01-06

What's happening when you change True and False in Python 2

Today I read Giedrius Statkevičius' What is Actually True and False in Python? (via), which talks about the history of how True and False aren't fixed constants until Python 3 and thus how you can change them in Python 2. But what does it really mean to do this? Let's dive right into the details in an interactive Python 2 session.

As seen in Statkevičius' article, reversing True and False is pretty straightforward:

>>> int(True)
1
>>> True, False = False, True
>>> int(True)
0

Does this change what boolean comparisons actually return, though?

>>> int((0 == 0) == True)
0
>>> (0 == 0) == True
False
>>> (0 == 0) == False
True
>>> (0 == 0) is False
True

It doesn't, and this is our first clue to what is going on. We haven't changed the Python interpreter's view of what True and False are, or the actual bool objects that are True and False; we've simply changed what the names True and False refer to. Basically we've done 'fred, barney = False, True' but (re)using names that code expects to have a certain meaning. Our subsequent code is using our redefined True and False names because Python looks up what names mean dynamically, as the code runs, so if you rebind a name that rebinding takes immediate effect.

This is also why the truth values being printed are correct; the bool objects themselves are printing out their truth value, and since that truth value hasn't changed we get the results we expect:

>>> True, False
(False, True)

But what names have we changed?

>>> (0 == 0) is __builtins__.True
True
>>> True is __builtins__.False
True
>>> globals()["True"]
False

This tells us the answer, which is that we've added True and False global variables in our module's namespace by copying False and True values from the global builtins. This means that our redefined True and False are only visible in our own namespace. Code in other modules will be unaffected, as we've only shadowed the builtin names inside our own module.

(An interactive Python session has its own little module-level namespace.)

To see that this is true, we need a tst helper module with a single function:

def istrue(val):
    if val == True:
        print "Yes"
    else:
        print "No"

Then:

>>> import tst
>>> tst.istrue(True)
No
>>> tst.istrue(0 == 0)
Yes

But we don't have to restrict ourselves to just our own module. So let's redefine the builtin versions instead, which will have a global effect. First, let's clear out our 'module' versions of those names:

>>> del True; del False

Then redefine them globally:

>>> __builtins__.True, __builtins__.False = (0 == 1), (0 == 0)
>>> (0 == 0) is True
False

We can verify that these are no longer in our own namespace:

>>> globals()["True"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'True'

We reuse our helper module to show that we've now made a global change:

>>> tst.istrue(0 == 0)
No

But of course:

>>> tst.istrue(True)
Yes

Changing __builtins__.True has changed the True that all modules see, unless they deliberately shadow the builtin True with their own module-level True. Unlike before, True now means the same thing in our interactive session and in the tst module.

Since modules are mutable, we can actually fix tst.istrue from the outside:

>>> tst.True = (0 == 0)
>>> tst.istrue(0 == 0)
Yes
>>> tst.True
True

Now the tst module has its own module-global True name with the correct value and tst.istrue works correctly again. However, we're back to a difference in what True means in different modules:

>>> tst.istrue(True)
No
>>> False is tst.True
True

(Since our interactive session's 'module' has no name binding for False, it uses the binding in the builtins, which we made point to the True boolean object. However tst has its own name binding for True, which also points to the True boolean object. Hence our False is tst's True. Yes, this gets confusing fast.)

As noted in Statkevičius' article, Python only ever has two bool objects, one True and one False. These objects are immutable (and known by the CPython interpreter), and so we can't change the actual truth value of comparisons, what gets printed by the bool objects, and so on. All we can do is change what the names True and False mean at various levels; in a function (not shown here), for an entire module, or globally through the builtins.

(Technically there's a few more namespaces we could fiddle with.)

As a side note, we can't subclass bool to make a thing that is considered a boolean yet has different behavior. If we try it, CPython 2 tells us:

TypeError: Error when calling the metaclass bases
    type 'bool' is not an acceptable base type

This is an explicitly coded restriction; the C-level bool type doesn't allow itself to be subclassed.

(Technically it's coded by omitting a 'this can be a base type' flag from the C-level type flags for the bool type, but close enough. There are a number of built-in CPython types that can't be subclassed because they omit this flag.)

We can change the True and False names to point to non-bool objects if we want. If you take this far enough, you can arrange to get interesting errors and perhaps spectacular explosions:

>>> __builtins__.False = set("a joke")
>>> (0 != 0) == False
False
>>> d = {}
>>> d[False] = False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'

For maximum fun, arrange for True and False to be objects that are deliberately uncomparable and can't be converted to booleans (in Python 2, this requires raising an error in your __eq__ and __nonzero__ methods).
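
A sketch of such a deliberately hostile object (Python 2; the class name is mine):

class Unusable(object):
    def __eq__(self, other):
        raise RuntimeError("no comparing this to things")
    def __nonzero__(self):
        raise RuntimeError("no truth value for you")

Bind __builtins__.True or __builtins__.False to an instance of this and all sorts of innocent code will start dying with RuntimeErrors.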

(I've used False here because many objects in Python 2 are considered to be boolean True. In fact, by default almost all objects are; you have to go out of your way to make something False.)

ChangingTrueDetails written at 20:46:42
