Wandering Thoughts

2018-05-10

Python modules use operator overloading in two different ways

In Python (as elsewhere), there are at least two different things that people use operator overloading for. That there's more than one thing makes a difference because some patterns of designing how operator overloading works aren't sufficiently general to handle both things; if you want to serve both groups, you need to design a more general mechanism than you might expect, one that delegates more power to objects.

The first use of operator overloading is to extend operators so that they work (in the traditional ways) on objects that they wouldn't normally work on. The classical examples of this are complex numbers and rational numbers (both of which Python has in the standard library), and in general various sorts of things built with numbers and numeric representations. However you can go beyond this, to objects that aren't strictly numeric but which can use at least some of the traditional numeric operators in ways that still obey the usual rules of arithmetic and make sense. Python sets implement some numeric operations in ways that continue to make sense and are unsurprising.
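As a small illustration of this first kind of overloading, here is a sketch using the standard library's fractions module, built-in complex numbers, and sets. The specific values are made up for the example.

```python
from fractions import Fraction

# Rational numbers: the operators behave like ordinary arithmetic.
half = Fraction(1, 2)
third = Fraction(1, 3)
total = half + third
product = half * third

# Complex numbers are built in and follow the usual rules too.
z = (1 + 2j) * (3 - 1j)

# Sets reuse some of the operators for the analogous set operations.
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}
both = evens & small          # intersection
either = evens | small        # union
only_evens = evens - small    # difference
```

None of these results should surprise anyone who knows what the operators mean for plain numbers.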

The second use is to simply hijack the operations in order to do something convenient for your objects with a handy symbol for it. Sometimes these operations are vaguely related to their numeric equivalents (such as string multiplication, where "a" * 4 gets you "aaaa"), but sometimes they have nothing to do with it. The classic example of the latter is the string % operator, which has nothing at all to do with arithmetic but instead formats a string using % formatting codes. Using the % operator for this is certainly convenient and it has a certain mnemonic value and neatness factor, but it definitely has nothing to do with %'s normal use in arithmetic.
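To make the contrast concrete, here is a short sketch of the two string cases next to the arithmetic meaning of %. The strings and numbers are arbitrary.

```python
# String "multiplication" is loosely related to repeated addition.
repeated = "a" * 4

# String % has nothing to do with remainders; it does printf-style formatting.
message = "%s has %d items" % ("cart", 3)

# For comparison, the arithmetic meaning of % on integers.
remainder = 17 % 5
```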

Now, let us consider the case of Python not allowing you to overload boolean AND and OR. In a comment on that entry, Aneurin Price said:

I'm not at all convinced by this argument. My expectation for this hypothetical __band__ is that it would be called after evaluating a and finding it truthy, at which point b is evaluated either way. [...]

This is definitely true if you think of operator overloading as only for the first case. But, unfortunately for the design of overloading AND and OR, this is not all that people would like to use it for. My understanding is that ORMs such as Django's and SQLAlchemy would like to intercept AND and OR in order to build up complicated conditional SQL queries with, essentially, a DSL based on Python expressions. In this DSL, they would like to be able to write something like:

Q.descfield.startswith("Who") or Q.descfield.startswith("What")

This wouldn't evaluate or produce any sort of truth value; instead it would produce an object representing a pending SQL query with a WHERE clause that encoded this OR condition. Later you'd execute the SQL query to produce the actual results.
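Since 'or' can't be intercepted, the usual fallback (as far as I know) is to overload the bitwise | and & operators instead. The following is only a minimal sketch of the idea; Cond is a hypothetical class I made up for illustration, not the API of any real ORM.

```python
class Cond:
    """Hypothetical stand-in for an ORM condition object (not a real ORM API)."""

    def __init__(self, sql):
        self.sql = sql

    def __or__(self, other):
        # Build a new pending condition instead of computing a truth value.
        return Cond("(%s OR %s)" % (self.sql, other.sql))

    def __and__(self, other):
        return Cond("(%s AND %s)" % (self.sql, other.sql))


who = Cond("descfield LIKE 'Who%'")
what = Cond("descfield LIKE 'What%'")

# The bitwise operator can be overloaded, so this builds a combined condition.
combined = who | what

# Boolean 'or' cannot be intercepted: 'who' is truthy, so 'or' simply returns
# it and the second condition is silently dropped.
dropped = who or what
```

The last line is the trap: it runs without error but quietly produces the wrong query object.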

If operator overloading for AND and OR paid any attention to the nominal truth value of the left expression, there would be no way to make this work. Instead, allowing general overloading of AND and OR requires allowing the left side expression to hijack the process before then. In general, operator overloading that allows for this sort of usage needs to allow for this sort of early hijacking; fortunately this is generally easy for arithmetic operators.

(I'm not sure Python has truly general support for mixing unusual numerical types together, but then such general support is probably very hard to implement. I think you want to be able to express a compatibility table, where each type can say that its overloads handle certain other types or types that have certain properties or something. Otherwise getting your rational number type to interact well with my Point type gets really complicated really fast, if not impossible.)
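What Python does offer here is that an operator method can return NotImplemented to decline an operand, at which point Python tries the reflected method on the other operand. Here is a sketch of how that plays out between Fraction and a Point type; Point is a hypothetical class written just for this example.

```python
from fractions import Fraction


class Point:
    """Hypothetical 2D point type, used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __mul__(self, other):
        # Only claim to handle plain numbers and Fractions; decline the rest.
        if isinstance(other, (int, float, Fraction)):
            return Point(self.x * other, self.y * other)
        return NotImplemented

    # Used for 'number * point' once the number's own __mul__ has declined.
    __rmul__ = __mul__


p = Point(2, 4) * Fraction(1, 2)
q = Fraction(1, 2) * Point(2, 4)
```

This is a pairwise arrangement rather than a real compatibility table. Each type has to list what it accepts, and two types that have never heard of each other simply produce a TypeError.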

TwoSortsOfOverloading written at 00:28:02

2018-05-07

One reason why Python doesn't let you overload the boolean AND and OR operations

Recently I read Kurt Rose's "DISappearing and" (via Planet Python), where Kurt noted that Python doesn't have __...__ methods that let you override boolean and and or operations on your class objects. As it happens, there's a really good reason for this, which is that Python would require a new fundamental data type in order to make it really work.

Boolean and and or have the extremely valuable property of short-circuiting evaluation, where if you write, say, 'a() and b()' and a() evaluates to false, Python will not even call b(). Let's imagine a hypothetical world in which Python allows you to do this overriding and the boolean operators still preserve this short circuiting. As usual, if you write 'a and b', this will (at least some of the time) translate into a call to the override method on a, let's call it __band__, and the __band__ method will receive an additional argument that represents the right hand side:

class AClass:
  def __band__(self, right):
    ...

Now here is the big question: what's the type of right in this method?

In binary __and__, right is the value we get from evaluating the right hand side expression; if you write 'a & b()', this is roughly the same as a.__and__(b()). However this can't be the case for __band__, because that would mean no more short-circuiting; if a had a __band__ method, writing a and b() would call b() all of the time. To preserve short-circuiting, right has to be some type that represents the right hand side expression in an un-evaluated form.
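The difference in evaluation is easy to see by recording which functions get called. This is a small sketch; a() and b() are just stand-ins that log their own calls.

```python
calls = []


def a():
    calls.append("a")
    return False


def b():
    calls.append("b")
    return True


# Boolean 'and' short-circuits: a() is false, so b() is never called.
result_bool = a() and b()
calls_after_and = list(calls)

# Binary '&' evaluates both operands first, then applies the operator.
calls.clear()
result_bit = a() & b()
calls_after_amp = list(calls)
```

After this runs, calls_after_and holds only "a" while calls_after_amp holds both "a" and "b".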

However, Python has no such type today. Closures sort of come close, but they create additional effects and do things like appear in Python exception backtraces. This means that adding override methods for boolean operations would require either discarding short-circuiting (and making right be the evaluation result) or figuring out and introducing a new, relatively complex type in Python just to support this.
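To show what the closure approximation looks like in practice, here is a sketch where the caller wraps the right side in a lambda by hand. The Lazy class and its band() method are hypothetical stand-ins for a __band__ that doesn't exist.

```python
class Lazy:
    """Hypothetical class; band() stands in for the imaginary __band__."""

    def __init__(self, value):
        self.value = value

    def band(self, right):
        # 'right' is a zero-argument callable wrapping the unevaluated
        # right-hand expression.
        if not self.value:
            return self.value
        return right()


evaluated = []


def expensive():
    evaluated.append(True)
    return "right side"


# Short-circuits: the lambda is never called because the left side is false.
short = Lazy(0).band(lambda: expensive())

# The left side is true, so the right side is evaluated exactly once.
full = Lazy(1).band(lambda: expensive())
```

This preserves short-circuiting, but band() can only call the closure; it can't look inside it to see what expression it would evaluate.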

(Continuations are sort of what you'd need but I think they're not quite what you want, or at least you need a continuation that captures only the right side expression.)

The other problem of such a right type is that you'd want to be able to peer inside it relatively easily. After all, the entire purpose of implementing your own __band__ method is so that you can do something different from a plain boolean and when the right hand side is some special thing. If all you're going to do is:

def __band__(self, right):
  if not bool(self):
    return False
  else:
    return right.eval()

then there's not really any point in having a __band__ at all, especially given the general complexity involved in Python as a whole.

(This is of course not necessarily the only reason for Python to fence off boolean operations as things that you absolutely can't override. You can certainly argue that they should be inviolate and not subject to clever redefinitions simply as a matter of principle.)

WhyNoAndOverloading written at 00:33:01

2018-04-12

I'm hoping that RHEL 8's decision on Python 2 isn't Ubuntu 20.04's decision

I recently wrote about whether Ubuntu 20.04 will include Python 2, and threw in a sidebar about it in RHEL 8. Thanks to comments from Twirrim and Seth (on this entry), I then found out that Red Hat has recently announced that they won't be including Python 2 in Red Hat Enterprise Linux 8. This is in a way very useful to know, because we'd like to build some systems with RHEL 8 in the not too distant future if we can and those systems will need to run some Python-based system management tools. Since RHEL 8 won't include Python 2, I'd better start thinking about how to make these tools Python 3.

However, despite the news about Python 2 in RHEL 8, I remain reasonably optimistic that Python 2 will be in Ubuntu 20.04 (which would be very convenient for us, due to the relatively short time to 20.04). This is because I think there are a number of significant differences between the situation Red Hat finds themselves in with RHEL 8 and the situation Ubuntu will be in with Ubuntu 20.04.

Red Hat ships only a limited number of carefully curated packages in 'Red Hat Enterprise Linux' and then strongly supports them for a quite long time (so far for ten years, so RHEL 8 is expected to be supported through at least 2028). Red Hat is clearly willing to either change or remove packages that would normally depend on Python 2, and they have the manpower (and small package set) to make this feasible and presumably not too disruptive to what RHEL users expect (ie, not removing too many packages).

By contrast, Ubuntu has a shorter support period (20.04 will be supported only to early 2025), ships significantly more packages even in their nominally fully supported package set, supports them to a lesser extent, relies much more on upstream Debian packaging efforts, and has an escape hatch in the form of the officially less supported and much larger 'universe' package set. I'm not sure how Debian is doing in their efforts to push Python 2 out, but my impression is that it hasn't been going very fast (as with basically all large scale changes of this nature in Debian). All of this makes it both less of a burden for Ubuntu 20.04 to include Python 2 and probably more disruptive to not do so (with more excluded packages and also surprised users). As a result, I expect Ubuntu 20.04 to include Python 2 at least in their broad 'universe' package set.

(Red Hat doesn't have a formal equivalent of the Ubuntu 'universe' package set, but RHEL does have a rough functional equivalent in EPEL. It's possible that Python 2 for RHEL 8 could wind up being packaged in EPEL, at least for a while.)

PS: It'll be interesting to see if there's a /usr/bin/python on RHEL 8 when it comes out, or if there's only a python3. I think my personal preference is for there to be no /usr/bin/python, but that's biased by having multiple systems and wanting things that expect Python 2 to immediately fail on RHEL 8 with a clear error rather than exploding mysteriously.

Sidebar: My guess at Ubuntu's path to removing Python 2

Ubuntu doesn't just do LTS releases every two years; they also do regular releases every six months. These releases are both an early signal of what will be in a future LTS release and a chance for Ubuntu to start making gradual changes. Since completely dropping Python 2 from one release to another would be quite disruptive, what I expect Ubuntu to do instead is first move it (and all of the packages that depend on it) to the 'universe' package set. This would effectively start the clock running on its actual removal in some later release, and also give people like me some advance warning about it.

(I believe that packages can be moved this way without causing heartburn to people upgrading from one release to the next, but I may be wrong.)

Python2RHEL8VsUbuntu2004 written at 02:16:55

2018-04-10

Our real problem with a removal of Python 2 is likely to be our users

In my recent entry on whether Ubuntu 20.04 LTS will include Python 2, I mentioned that this mattered because we have various system management tools written in Python 2, so if Python 2 was not going to be in 20.04, we'd need to start porting them so they'd be ready in time. Unfortunately, this need to port our system tools is probably not going to be the most painful part of the day that Ubuntu ships without Python 2. Instead, the real problem is our users. More specifically, the problem is all of the Python (2) programs that our users will have written over the years and still use and need.

Well, let me rephrase that. I shouldn't say 'users'; I should say 'the graduate students and professors of the department' (and also researchers, postdocs, undergrads doing research work with professors, visiting collaborators, and so on). Unusually for the modern world, we provide general multiuser computing to all of these people, and so these people log on to those Ubuntu-based servers and do whatever they want (more or less) with the things that Ubuntu provides. Some of these people write Python programs, and some of them are probably Python 2 programs. When Python 2 goes away, those programs are going to break.

(They will also probably break if /usr/bin/python turns into Python 3, which is one reason I hope Ubuntu doesn't do that any time soon. There being no /usr/bin/python is less confusing and easier to explain to angry users than 'python is now incompatible with what it was a week ago, sorry about that'.)

A few of these people are probably avid Python users and already know about Python 3. Of course, these people are probably already writing everything in Python 3, so they're unaffected by this. Many more of these people probably don't know about Python 3 for various reasons, including that their real work is writing a thesis or doing research, not knowing about developments in the programming language that they're working in. To add to the difficulty, we don't even know who they are (and I'm not sure how we'd find out, unless there is some very lightweight and non-intrusive way of instrumenting our systems to gather data when Python 2 gets run).

(Since we can't currently give our users any definitive information on when they won't have Python 2, it's also not very useful to reach out to them right now. Many of our users would rightfully consider rewriting things from Python 2 to Python 3 to be a distraction from their thesis or their research, and for that matter they may only need their Python 2 programs for a relatively limited time.)

The basically inevitable result of this is that we're likely to be forced to install Python 2 for backward compatibility for at least one LTS generation after Ubuntu drops it officially. Hopefully there will be people packaging Python 2.x as a PPA, which is the most convenient option on Ubuntu. The possible exception for this would be if Ubuntu gave everyone a significant amount of advance warning, for example if they announced before 20.04 that it would be the last release that included any version of Python 2 in the normal Ubuntu repositories. Then we could at least start trying to reach users, not that I expect us to be very successful at it.

(Because of the department's support model for users and other factors, most people will be on their own for dealing with things, too.)

PS: We're definitely not going to ever change /usr/bin/python to print a warning about the situation (for many reasons including that warnings are often fatal errors in practice), and I'm pretty sure we'd never alter it to syslog things when it starts. Any method of acquiring information about when Python 2 gets run needs to be entirely external.

Python2AndOurUsers written at 20:32:25

2018-04-08

The interesting question of whether Ubuntu 20.04 LTS will include Python 2

It's 2018, which means that 2020's end of Python 2 support is only two years away. Two years seems like a long time, but it's not really, especially if you're not a full time developer or Python person, which is our situation. One of the questions about what we have to do about our current set of Python programs boils down to the question of whether Ubuntu's very likely April 2020 Long Term Support release (Ubuntu 20.04) will include Python 2.

So far, Ubuntu has done LTS releases every two years in April: 10.04, 12.04, 14.04, 16.04, and now the impending 18.04. If they follow this pattern, they will release the next LTS in April of 2020, after Python 2's end of life (which the Python people say is January 1st 2020), and if we follow our usual practices, we'll begin using Ubuntu 20.04 on some systems that summer and autumn. These systems will need to run our Python system management tools, which means that if Ubuntu 20.04 doesn't include Python 2, we need to have our tools running on Python 3 before then.

(Of course it might be a good idea to port our tools to Python 3 now, but there's a difference between being prepared and being forced. This is especially important when you have to prioritize various things you could be working on, which is generally our situation.)

Since Ubuntu 20.04 will be released after the Python 2 cutoff date, in theory it could drop Python 2 on the grounds that it's no longer supported by the upstream developers. However, in practice there are two issues. First, it seems very likely that Python 2 will be supported by other people if problems emerge, because there are other long term Linux distributions that are already committed to supporting systems with Python 2 past 2020 (for example, Red Hat Enterprise Linux 7, which will be supported through 2024, and then there's Ubuntu 18.04 itself, which will be supported through 2023). Second, it's not clear that all packages that currently use Python 2 will be updated to Python 3 in time for 2020 (see eg). Ubuntu could choose to throw Python 2 out anyway and to hell with any packages that this forces out, but that might not be very popular with people.

The current state of Ubuntu 18.04 is that Python 2.7 will be available in the 'main' package repository, directly supported by Ubuntu. One possible option for 20.04 is that Python 2.7 would be available but would be demoted to the community supported 'universe' package repository, which theoretically gives you lower expectations of bug and security fixes. This would give Ubuntu an option to shrug their shoulders if some serious issue comes up after 2020 and no one steps forward to fix it.

Probably the safest option for us is to begin moving our tools to Python 3, but likely not until 2019. If we started now, I'd have to make them compatible with Ubuntu 14.04's Python 3.4.3; if I wait until we've migrated all of our 14.04 machines to 18.04, I get to base everything on Ubuntu 16.04's somewhat more recent 3.5.2.

(Using 3.5 as the base could be potentially important, since the 3.5 changes brought in formatting for bytes and better handling for character encoding issues with sys.stdin and sys.stdout, both of which might be handy for our sysadmin-focused uses of Python.)
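As a quick illustration of the bytes formatting that arrived in Python 3.5, here is a one-line sketch. The field name and count are made up.

```python
# printf-style formatting applied directly to bytes (Python 3.5 and later).
line = b"%s: %d ops" % (b"READ", 42)
```

On Python 3.4 the same expression raises a TypeError, since bytes didn't support % there.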

Sidebar: Red Hat Enterprise Linux 8 and Python 2

Unlike Ubuntu, Red Hat hasn't officially announced any timing or formal plans for RHEL 8. However, a new version of RHEL is due (based on RH's traditional timing) and there are some signs that one is in preparation, probably for release this summer. I can't imagine such a version not including Python 2, which means that Red Hat would likely be committed to supporting it through 2028.

This isn't necessarily a big burden, because it's my opinion that we're unlikely to find any serious issues in Python 2.7 after 2020. This is especially so if people like Red Hat make a concerted effort to find any remaining 2.7 problems before the official end of support, for example by extensively running fuzzing tools against 2.7 or by paying for some security auditing of Python's SSL code (or doing it themselves).

Python2AndLTSLinuxes written at 22:06:50

2018-03-22

Why seeing what current attributes a Python object has is hard

Back when I wrote some notes on __slots__ and class hierarchies, I said in passing that there was no simple way to see what attributes an object currently has (I was sort of talking about objects that use __slots__, but it's actually more general). Today, for reasons beyond the scope of this entry, I feel like talking about why things work out this way.

To see where things get tricky, I'll start out by talking about where they're simple. If what we have is some basic struct object and we want to see what fields it has, the most straightforward approach is to look at its __dict__. We can get the same result indirectly by taking the dir() of the object and subtracting the dir() of its class:

>>> class A:
...   def __init__(self):
...      self.a = 10
...      self.b = 20
... 
>>> a = A()
>>> set(dir(a)) - set(dir(a.__class__))
{'b', 'a'}

(This falls out of the definition of dir(), but note that this only works on simple objects that don't do a variety of things.)

The first problem is that neither version of this approach works for instances of classes that use __slots__. Such objects have no __dict__, and if you look at dir() it will tell you that they have no attributes of their own:

>>> class B:
...   __slots__ = ('a', 'b')
...   def __init__(self):
...      self.a = 10
...
>>> b = B()
>>> set(dir(b)) - set(dir(b.__class__))
set()

This follows straightforwardly from how __slots__ are defined, particularly this bit:

  • __slots__ are implemented at the class level by creating descriptors (Implementing Descriptors) for each variable name. [...]

Descriptors are attributes on the class, not on instances of the class, although they create behavior in those instances. As we can see in dir(), the class itself has a and b attributes:

>>> B.a
<member 'a' of 'B' objects>

(In CPython, these are member_descriptor objects.)

For an instance of a class that uses __slots__, we still have a somewhat workable definition of what attributes it has. For each __slots__ attribute, an instance has the attribute if hasattr() is true for it, which means that you can access it. Here our b instance of B has an a attribute but doesn't have a b attribute. You can at least write code that mechanically checks this, although it's a bit harder than it looks.

(One part is that you need the union of __slots__ on all base classes.)
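Here is a rough sketch of what such a mechanical check might look like. The helper names slot_names() and current_slot_attrs() are my own invention, and the sketch ignores complications such as name-mangled private slots and '__dict__' or '__weakref__' appearing in __slots__.

```python
def slot_names(cls):
    """Collect __slots__ names from cls and all of its base classes."""
    names = []
    for klass in cls.__mro__:
        slots = klass.__dict__.get("__slots__", ())
        # __slots__ may be a single string rather than a sequence of names.
        if isinstance(slots, str):
            slots = (slots,)
        for name in slots:
            if name not in names:
                names.append(name)
    return names


def current_slot_attrs(obj):
    """Return the slot names that currently have a value on obj."""
    return [n for n in slot_names(type(obj)) if hasattr(obj, n)]


class B:
    __slots__ = ("a", "b")

    def __init__(self):
        self.a = 10


class C(B):
    __slots__ = ("c",)


c = C()
c.c = 30
```

With these definitions, current_slot_attrs(B()) reports only 'a', and current_slot_attrs(c) reports 'c' and 'a' but not the unset 'b'.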

However, we've now arrived at the tricky bit. Suppose that we have a general property on a class under the name par. When should we say that instances of this class have a par attribute? In one sense, instances never will, because at the mechanical level par will always be a class attribute and will never appear in an instance __dict__. In another sense, we could reasonably say that instances have a par attribute when hasattr() is true for it, ie when accessing inst.par won't raise AttributeError; this is the same definition as we used for __slots__ attributes. Or we might want to be more general and say that an attribute only 'exists' for our purposes when accessing it doesn't raise any errors, not just AttributeError (after all, this is when we can use the attribute). But what if this property actually computes the value for par on the fly from somewhere, in effect turning an attribute into a method; do we say that par is still an attribute of the instance, even though it doesn't really act like an attribute any more?
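A small sketch may make the ambiguity more concrete. Holder is a hypothetical class whose par property raises AttributeError until a value has been stored, so hasattr() changes its answer over time even though par never shows up in the instance __dict__.

```python
class Holder:
    def __init__(self):
        self._par = None

    @property
    def par(self):
        # Pretend the attribute is absent until a value has been stored.
        if self._par is None:
            raise AttributeError("par is not set yet")
        return self._par

    @par.setter
    def par(self, value):
        self._par = value


h = Holder()
in_dict_before = "par" in vars(h)
has_before = hasattr(h, "par")

h.par = 5
in_dict_after = "par" in vars(h)
has_after = hasattr(h, "par")
```

Here hasattr() goes from false to true, while the __dict__ check says 'no' both times. Which of these counts as the instance 'having' par is exactly the matter of interpretation.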

Python has a lot of ways to attach sophisticated behavior to instances of classes that's triggered when you try to access an attribute in some way. Once we have such sophisticated behavior in action, there's no clear or universal definition of when an instance 'has' an attribute and it becomes a matter of interpretation and opinion. This is one deep cause of why there's no simple way to see what attributes an object currently has; once we get past the simple cases, it's not even clear what the question means.

(Even if we come up with a meaning for ourselves, classes that define __getattr__ or __getattribute__ make it basically impossible to see what attribute names we want to check, as the dir() documentation gently notes. There are many complications here.)
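As a sketch of the __getattr__ problem, consider a minimal forwarding proxy. Proxy is a hypothetical class for illustration only.

```python
class Proxy:
    """Hypothetical proxy that forwards unknown attributes to a wrapped object."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        return getattr(self._target, name)


p = Proxy("some text")
works = p.upper()
listed = "upper" in dir(p)
```

The attribute access works, but dir() has no idea that 'upper' is available, so there's no list of names to even start checking.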

Sidebar: The pragmatic answer

The pragmatic answer is that if it's sensible to ask this question about an object at all, we can get pretty much the answer we want by looking at the object's __dict__ (if it has one), then adding the merged __slots__ names for which hasattr() reports true.
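One way this pragmatic answer might be written down is the following sketch. The function name current_attrs() and the Base and Sub classes are made up for the example, and the function is only meant for ordinary objects.

```python
def current_attrs(obj):
    """Best-effort set of attribute names an ordinary object currently has."""
    # Start with whatever is in the instance __dict__, if there is one.
    names = set(getattr(obj, "__dict__", {}))
    # Add every slot name, from any class in the MRO, that has a value.
    for klass in type(obj).__mro__:
        slots = klass.__dict__.get("__slots__", ())
        if isinstance(slots, str):
            slots = (slots,)
        for name in slots:
            if name in ("__dict__", "__weakref__"):
                continue
            if hasattr(obj, name):
                names.add(name)
    return names


class Base:
    __slots__ = ("a", "b")

    def __init__(self):
        self.a = 1


class Sub(Base):
    def __init__(self):
        super().__init__()
        self.c = 3
```

For a Sub() instance this reports 'a' (a filled slot) and 'c' (from __dict__), and leaves out the unset 'b'.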

That this answer blows up on things like proxy objects suggests that perhaps it's not a question we should be asking in the first place, at least not outside of limited and specialized situations.

(In other words, it's possible to get entirely too entranced with the theory of Python and neglect its practical applications. I'm guilty of this from time to time.)

KnowingObjectAttrsHard written at 00:14:03

2018-03-20

Python and the 'bags of unstructured data' approach

These days I write code in both Go and Python, which sometimes gives me interesting new perspectives on each language as I shift back and forth. I was recently hacking on a Python program to mutate it into what I wanted, and as I did so what struck me is how Python's dynamic typing and everything around it enabled a specific approach that I'll call the 'bag of data' approach.

The base code I was starting with parses Linux's /proc/self/mountstats to get at all of the NFS statistics found there. All of the data fields in these statistics have defined meanings, meanings that this Python code knew, so it could have opted to use some kind of structures for them with actual named fields (perhaps using namedtuple). However, it didn't. Instead the code dumps everything into a small collection of dicts using various named keys, then yanks bits back out again as it needs them (and knows what structure each key's data will have).

This 'bag of data' approach only works in a dynamic language like Python, because there's no structure or typing to what goes where. A given key may give you a string, a number, a list of either, a sub-dict, or whatever. On the one hand this is harder to follow than something with fixed, named fields. On the other hand it's marvelously flexible and easy to manipulate and transform, especially in bulk. In theory you could do the same sort of thing with named fields (in Python), but in practice it is just easier to write code when you're dealing with dictionary keys and values, because getting lists of them and accessing arbitrary ones and doing indirection is really simple. With a dict that's a bag of data, it's natural to write code like this:

datas = [self.__rpc_data[x] for x in self.__rpc_data['ops']]
sumrpc = [sum(x) for x in zip(*datas)]

This isn't necessarily the best way to do things in final code, once you're sure you know what you need, but in the meantime this plasticity makes it very easy to experiment by transforming and mutating and remixing various pieces of data in the bag in convenient and quick to write ways. When 'sum fields together across all of the different NFS RPC operations' is a two liner, you're much more likely to try it if you think something interesting might result.
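To show what that two liner does, here is a self-contained sketch with made-up sample data. The operation names and numbers are invented, and the real mountstats data has many more fields per operation.

```python
# Made-up sample data standing in for the parsed NFS RPC statistics.
rpc_data = {
    "ops": ["READ", "WRITE", "GETATTR"],
    "READ": [10, 2, 300],
    "WRITE": [5, 1, 120],
    "GETATTR": [40, 0, 15],
}

# One list of per-operation counters for each operation named in 'ops'.
datas = [rpc_data[op] for op in rpc_data["ops"]]

# zip(*datas) groups the first fields together, then the second, and so on.
sumrpc = [sum(fields) for fields in zip(*datas)]
```

With this data, sumrpc comes out as [55, 3, 435], the column-wise totals across all three operations.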

(One way that this is potentially flawed is that not all statistics fields in NFS RPC operations may make sense to sum together. But that's up to you to keep track of and sort out, because that's the tradeoff you get here.)

There are other nice things you can do with the bag of data approach. For example, it gives you a relatively natural way to deal with data that isn't always there. Python has lots of operations for checking if keys are in dicts, getting the value of a key with a default value if it's not there, and so on. You can build equivalents of all of these for named fields, but it's more work and isn't likely to feel as natural as 'if key in databag: ...'.
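A few of the dict operations I have in mind, as a short sketch. The keys and values are invented for the example.

```python
databag = {"ops": 12, "retrans": 0}

# Membership test for a field that may or may not be present.
has_timeouts = "timeouts" in databag

# Fetch with a default when the key is missing.
timeouts = databag.get("timeouts", 0)

# Create a missing entry on first use, then update it.
databag.setdefault("errors", []).append("stale handle")
```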

Another thing I've done in code is to successively refine my bag of data in multiple passes. My bag of data starts out with only very basic raw fields, then I generate some calculated fields and add them to the bag, then another pass derives an additional set of fields, and so on. Again, you can do this pattern with named fields, but it isn't a natural fit; probably the right way to do it with structured data is a series of different structures, perhaps embedding the previous ones. Pragmatically it's easier to write passes that simply add and update dictionary keys, though, partly because the knowledge of what fields exist when and where can be very localized.
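The multi-pass pattern can be sketched like this. The pass functions and field names are hypothetical, not taken from any real program of mine.

```python
def add_rates(bag):
    # First pass: derive per-second rates from the raw counters.
    bag["ops_per_sec"] = bag["ops"] / bag["seconds"]
    bag["bytes_per_sec"] = bag["bytes"] / bag["seconds"]


def add_averages(bag):
    # Second pass: a derived average that only makes sense if there were
    # any operations at all.
    if bag["ops"]:
        bag["bytes_per_op"] = bag["bytes"] / bag["ops"]


# The bag starts out with only the raw fields.
bag = {"ops": 50, "bytes": 4000, "seconds": 10}

# Each pass adds its own keys and needs to know nothing about the others.
for refine in (add_rates, add_averages):
    refine(bag)
```

Each pass only has to know about the keys it reads and the keys it adds, which is the localized knowledge I mean.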

(This is especially the case if what you're dealing with is a tree of data and you may want to run a pass over each node in the tree, where different nodes may be of different types. When everything is a dictionary it's easy to write generic code that only acts on certain things; life gets messier if you must carefully sniff out if the object you have is the right type of object for your pass to even start looking at.)

BagsOfData written at 00:13:27

2018-02-28

Using Python 3 for example code here on Wandering Thoughts

When I write about Python here, I often wind up having some example Python code, such as the subCls example in my recent entry about subclassing a __slots__ class. Mostly this Python code has been Python 2 by default, with Python 3 as the exception. When I started writing, Python 3 wasn't even released; then it wasn't really something you wanted to use; and then I was grumpy about it so I deliberately continued to use Python 2 for examples here, just as I continued to write programs in it (for good reasons). Sometimes I explicitly mentioned that my examples were in Python 2, but sometimes not, and that too was a bit of my grumpiness in action.

(There was also the small fact that I'm far more familiar with Python 2 than Python 3, so writing Python 2 code is what happens if I don't actively think about it.)

However, things change. Over the past few years I've basically made my peace with Python 3 and these days I'm trying to write new code in Python 3. Although writing my example code here in Python 2 is close to being a reflex, it's one that I want to consciously break. Going forward from now, I'm going to write sample code in Python 3 by default and only use Python 2 if there is some special reason for it (and then mention explicitly that the example is Python 2 instead of 3). This is a small gesture, but I figure it's about time, and it's also probably what more and more readers are just going to expect.

(It looks like I've been doing this inconsistently for a while, or at least testing some of my examples in Python 3 too, eg, and also increasingly linking to the Python 3 version of Python documentation instead of the Python 2 version.)

Actually doing this is going to take me some work and attention. Since I write Python 2 code by reflex, I'm going to have to double-check my examples to make sure that they're valid Python 3 (and that they behave the same way in Python 3). Some of the time this will mean actually testing even small fragments instead of relying on my Python (2) knowledge to write from memory. Also, when I'm checking Python's behavior for something (or prototyping some code), I'll have to remember to run python3 instead of just python or I'll accidentally wind up testing the wrong Python.

(When I wrote my recent entry I was quietly careful to make the example code Python 3 code by including a super() and then using the no-argument version, which is Python 3 only.)

(I'm writing this entry partly to put a marker in the ground for myself, so that I won't be tempted to let a Python 2 example slide just because I'm feeling lazy and I don't want to work out and verify the Python 3 version.)

Python3ForExamples written at 02:24:08

2018-02-25

What Python does when you subclass a __slots__ class is the right answer

Recently on Twitter @simonw sort of asked this question:

Trying to figure out if it's possible to create an immutable Python class who's subclasses inherit its immutability - using __slots__ on the parent class prevents attrs being set, but any child class still gets a mutable __dict__

To rephrase this, if you subclass a class that uses __slots__, by default you can freely set arbitrary attributes on instances of your subclass. Python's behavior here surprises people every so often (me included); it seems to strike a fair number of people as intuitively obvious that __slots__ should be sticky, so that if you subclass a __slots__ class, you yourself would also be a __slots__ class. In light of this, we can ask whether Python has made the right decision here or if this is basically a bug.
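The behavior is easy to demonstrate with a sketch. The class bodies here are made up, although the names match the subCls example used later in this entry.

```python
class slotsCls:
    __slots__ = ("a",)

    def __init__(self, a):
        self.a = a


class subCls(slotsCls):
    # No __slots__ here, so instances get an ordinary __dict__.
    pass


parent = slotsCls(1)
try:
    parent.extra = "nope"
    parent_ok = True
except AttributeError:
    parent_ok = False

child = subCls(1)
child.extra = "fine"
```

Setting an arbitrary attribute fails on the parent instance but works on the subclass instance, where it lands in child.__dict__.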

My answer is that this is a feature and Python has made the right choice. Let's consider the problems if things worked the other way around and __slots__ was sticky to subclasses. The obvious problem that would come up is that any subclass would have to remember to declare any new instance attributes it used in a __slots__ of its own. In other words, if you wrote this, it would fail:

class subCls(slotsCls):
   def __init__(self, a, b, c):
      super().__init__(a)
      self.b = b
      self.c = c
   [...]

In practice, I don't think this is a big issue by itself. Almost all Python code sets all instance variables in __init__, which means that you'd find out about the problem the moment you created an instance of the subclass, for example in your tests. Even if you only create some instance variables in __init__ and defer others to later, a single new variable would be enough to trigger the usual failure. This means you're extremely likely to trip over this right away, either in tests or in real code; you'll almost never have mysterious deferred failures.

However, it points toward the real problem, which is that classes couldn't switch to using __slots__ without breaking subclasses. Effectively, the fact that you weren't using __slots__ would become part of your class's API. With Python as it is today, using or not using __slots__ is an implementation decision that's local to your class; you can switch back and forth without affecting anyone else (unless outside people try to set their own attributes on your instances, but that's not a good idea anyway). If the __slots__ nature was inherited and you switched to using __slots__ for your own reasons, all subclasses would break just like my subCls example above, including the subclasses of completely independent people outside your codebase who are monkey subclassing you in order to change some of your behavior.
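That using __slots__ is currently a local implementation decision can be checked with a small sketch. DictBase and SlotsBase here are hypothetical stand-ins for "before" and "after" versions of the same base class, and make_sub builds a subclass shaped like subCls above; none of these names come from real code.

```python
class DictBase:
    # Hypothetical "before" version: an ordinary class with a __dict__.
    def __init__(self, a):
        self.a = a


class SlotsBase:
    # Hypothetical "after" version: same interface, now using __slots__.
    __slots__ = ("a",)

    def __init__(self, a):
        self.a = a


def make_sub(base):
    """Build the same subclass on top of whichever base is passed in."""
    class Sub(base):
        def __init__(self, a, b, c):
            super().__init__(a)
            # These are new attributes that the base knows nothing about.
            self.b = b
            self.c = c
    return Sub


results = []
for base in (DictBase, SlotsBase):
    obj = make_sub(base)(1, 2, 3)
    results.append((base.__name__, obj.a, obj.b, obj.c))

print(results)  # prints: [('DictBase', 1, 2, 3), ('SlotsBase', 1, 2, 3)]
```

The subclass works unchanged on both bases, which is exactly what would stop being true if __slots__ were sticky.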

(You could sort of work around this with the right magic, but then you'd lose some of the memory use benefits of switching to __slots__.)

Given the long term impact of making __slots__ sticky, I think that Python made the right decision to not do so. A Python where __slots__ was sticky would be a more annoying one with more breakage as people evolved classes (and also people feeling more constrained with how they could evolve their classes).

(There would also be technical issues with a sticky __slots__ with CPython today, but those could probably be worked around.)

SlotsSubclassSurpriseRight written at 01:34:35

2018-01-09

Differences between keywords and constants in Python

Yesterday I wrote about the challenges of having true constants in Python and said that there are other mechanisms that achieve basically the same results if all we care about are a few builtin values like True, False, and None. The most straightforward way is what was done with None in Python 2 and eventually done with True and False in Python 3, which is to make them into keywords. This raises the obvious question, namely why the Python people waited until Python 3 to make this change. One way of starting to answer this is to ask what the difference is (or would be) between Python keywords and hypothetical true constants (or just the ordinary 'constants' Python 2 has today for True and False).

If you look in Python's language documentation in the keywords section, you sort of get an answer:

The following identifiers are used as reserved words, or keywords of the language, and cannot be used as ordinary identifiers. [...]

(Emphasis mine.)

A keyword cannot be used as an identifier in any context, not merely as a variable (whether global to a module or even local to a function). If you try to define the following class in Python 3, you'll get a syntax error:

class example:
  def __init__(self):
    self.True = 10

If you try harder, you can nominally create the instance attribute (either by directly setting it in self.__dict__ or by naming it in __slots__), but then you have no way of getting access to it as an attribute, since writing obj.True in any context gets you a syntax error.

(By extension, you can't have a method called True or False either.)
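A minimal sketch of these restrictions, assuming Python 3. The Example class and the is_syntax_error helper are invented for illustration. Using compile() lets us check for syntax errors without the example file itself failing to load. Note that getattr() with a string name can still reach the attribute; it's only the obj.True attribute syntax that's unavailable.

```python
import keyword


def is_syntax_error(source):
    """Return True if compiling source raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


class Example:
    pass


obj = Example()
# Going around the attribute syntax works: names in __dict__ are just strings.
setattr(obj, "True", 10)
stored = obj.__dict__["True"]
fetched = getattr(obj, "True")

# Attribute syntax and method definitions are rejected at compile time.
assign_fails = is_syntax_error("obj.True = 10")
read_fails = is_syntax_error("x = obj.True")
method_fails = is_syntax_error("class C:\n    def True(self):\n        pass\n")

are_keywords = [keyword.iskeyword(name) for name in ("True", "False", "None")]
print(stored, fetched, assign_fails, read_fails, method_fails, are_keywords)
```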

Our hypothetical true constants would not be so restricted. A constant would be unchangeable in its namespace, but it certainly wouldn't block the use of its name as an identifier in general in other contexts. You probably shouldn't give user-created names that much power (and the idea is a bad fit for Python's semantics anyway, with no obvious way to implement it).

Given this, we can look at some issues with making True and False into keywords in Python 2.

To start with, it's unlikely that someone was using either True or False as the name of a field in a class, but it's not impossible. If they were and if some version of Python 2 made True and False into keywords, that code would immediately fail to even start running. Although I don't know for sure, I suspect that Python 2 had no infrastructure that would have let it report deprecation warnings in advance for this, so it probably would have been an abrupt change.

However, this is a pretty esoteric reason and there's a much more pragmatic one, illustrated by the example that Giedrius Statkevičius reported at the end of his article. The pymodbus module defined True and False in its __init__.py, not because it was worried about other people overriding them, but because at one point it wanted to support older Python versions while still using them:

# Define True and False if we don't have them (2.3.2)
try:
    True, False
except NameError:
    True, False = 1, 0

(A later version changed the values to be the result of boolean comparisons.)

If True and False had been created as keywords, there would be no way to use them and be backwards compatible with versions of Python 2 before they were defined. If they're keywords, merely writing a line that says 'True = (1 == 1)' is a syntax error when the module is imported or otherwise used, even if the line is never executed. You have no good way to define your own versions of them in Python versions where they're not supported (technically there is one way, but let's not go there), which means that you can't use them at all until you're willing to completely abandon support for those older Python versions. Forcing people to make this choice right up front is not a good way to get new features used; in fact, it's a great way for a story to spread through the community of 'oh, you can't use True and False because ...'. This is counterproductive, to put it one way.
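The "even if the line is never executed" part can be checked in Python 3, where True really is a keyword. This is a small sketch; the compiles helper and the Truthy name are made up for the comparison. The assignment sits under an 'if 0:' so it could never run, yet the source is still rejected when it's compiled.

```python
def compiles(source):
    """Return True if source compiles, False if it is a syntax error."""
    try:
        compile(source, "<compat>", "exec")
    except SyntaxError:
        return False
    return True


# The assignment is unreachable, but it still has to get past the compiler.
keyword_version = (
    "if 0:\n"
    "    True = (1 == 1)\n"
)
# The same shape with an ordinary (hypothetical) name is fine.
ordinary_version = (
    "if 0:\n"
    "    Truthy = (1 == 1)\n"
)

keyword_ok = compiles(keyword_version)
ordinary_ok = compiles(ordinary_version)
print(keyword_ok, ordinary_ok)  # prints: False True
```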

Python 3 can make this sort of change because Python 3 was already making incompatible changes; in fact, making incompatible changes is its entire point. Python 2 was not in a good position to do it. Thus, I suspect that this is the major reason that Python 2 didn't make True and False into keywords but instead just put them into the builtins namespace as values.

KeywordsVsConstants written at 01:25:32
