Wandering Thoughts

2018-09-17

Python 3 supports not churning memory on IO

I am probably late to this particular party, just as I am late to many Python 3 things, but today (in the course of research for another entry) I discovered the pleasant fact that Python 3 now supports read and write IO to and from appropriate pre-created byte buffers. This is supported at the low level and also at the high level with file objects (as covered in the io module).

In Python 2, one of the drawbacks of Python for relatively high performance IO-related code was that reading data always required allocating a new string to hold it, and changing what you were writing also required new strings (you could write the same byte string over and over again without memory allocation, although not necessarily a Unicode string). Python 3's introduction of mutable bytestring objects (aka 'read-write bytes-like objects') means that we can bypass both issues now. With reading data, you can read data into an existing mutable bytearray (or a suitable memoryview), or a set of them. For writing data, you can write a mutable bytestring and then mutate it in place to write different data a second time. This probably doesn't help much if you're generating entirely new data (unless you can do it piece by piece), but is great if you only need to change a bit of the data to write a new chunk of stuff.
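
For example, at the file object level you can reuse a single buffer across reads with readinto() (a minimal sketch; the file name and handle() are placeholders):

buf = bytearray(64 * 1024)            # allocated once, reused for every read
with open("datafile", "rb") as f:     # 'datafile' is a placeholder
    while True:
        n = f.readinto(buf)           # fills buf in place; no new bytes object
        if n == 0:
            break                     # EOF
        handle(memoryview(buf)[:n])   # handle() is a placeholder; only the
                                      # first n bytes of buf are valid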

One obvious question here is how you limit how much data you read. Python modules in the standard library appear to have taken two different approaches to this. The os module and the io module use the total size of the pre-allocated buffer or buffers you've provided as the only limit. The socket module defaults to the size of the buffer you provide, but allows you to further limit the amount of data read to below that. This initially struck me as odd, but then I realized that network protocols often have situations where you know you want only a few more bytes in order to complete some element of a protocol. Limiting the amount of data read below the native buffer size means that you can have a single maximum-sized buffer while still doing short reads if you only want the next N bytes.
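
For example, with the socket module (a minimal sketch; the address is made up):

import socket

buf = bytearray(4096)
sock = socket.create_connection(("localhost", 9999))   # made-up address
n = sock.recv_into(buf)      # reads up to len(buf) bytes
n = sock.recv_into(buf, 4)   # reads at most 4 bytes into the same big buffer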

(If I'm understanding things right, you could do this with a memoryview of explicitly limited size. But this would still require a new memoryview object, and they actually take up a not tiny amount of space; sys.getsizeof() on a 64-bit Linux machine says they're 192 bytes each. A bytearray's fixed size is actually smaller, apparently coming in at 56 bytes for an empty one and 58 bytes for one with a single byte in it.)
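
Those numbers come from an interpreter session along these lines (the exact sizes will vary with the CPython version and platform):

>>> import sys
>>> sys.getsizeof(memoryview(bytearray(200)))
192
>>> sys.getsizeof(bytearray())
56
>>> sys.getsizeof(bytearray(b'x'))
58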

Sidebar: Subset memoryviews

Suppose you have a big bytearray object, and you want a memoryview of the first N bytes of it. As far as I can see, you actually need to make two memoryviews:

>>> b = bytearray(200)
>>> b[0:4]
bytearray(b'\x00\x00\x00\x00')
>>> m = memoryview(b)
>>> ms = m[0:30]
>>> ms[0:4] = b'1234'
>>> b[0:4]
bytearray(b'1234')

It is tempting to do 'memoryview(b[0:30])', but that creates a copy of the bytearray that you then get a memoryview of, so your change doesn't actually change the original bytearray (and you're churning memory). Of course if you intend to do this regularly, you'd create the initial memoryview up front and keep it around for the lifetime of the bytearray itself.

I'm a little bit surprised that memoryview objects don't have support for creating subset views from the start, although I'm sure there are good reasons for it.

Python3MutableBufferIO written at 23:32:23

2018-09-16

CPython has a fairly strongly predictable runtime, which can be handy

I recently needed a program to test and explore some Linux NFS client behavior (namely, our recent NFS issue). Because this behavior depended on what user-level operations the kernel saw, I needed to be very specific about what system calls my test setup made, in what order, and so on. I also wanted something that I could rapidly put together and easily revise and alter for experiments, to see just what sequence of (system call) operations were necessary to cause our issues. In a way the obvious language to write this in would be C, but instead I immediately turned to Python.

Beyond the speed of writing things in Python, the obvious advantage of Python here is that the os module provides more or less direct access to all of the system calls I wanted (ultimately mixed with the fcntl module in order to get flock()). Although Python normally works with file objects, which are abstracted, the os module gives you almost raw access to Unix file descriptors and the common operations on them, which map closely to system calls.

That latter bit is important, and leads to the subtle thing. Although the os module's documentation doesn't quite promise it directly, the operations it exposes translate almost directly into Unix system calls, and CPython's interpreter runtime doesn't alter them or add others intermixed into them (well, not others related to the files and so on that you're working on; it may do operations like request more memory, although probably not for simple test code). This means that you can write a fair amount of code using the os module (and fcntl, and a few others) that deals with raw Unix file descriptors (fds) and be pretty confident that Python is doing exactly what you asked it to and nothing else.
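
As a concrete illustration, a test program fragment like this maps nearly one-to-one onto system calls (the path is made up, and the exact syscalls can vary a bit by platform):

import os, fcntl

fd = os.open("/nfs/test/lockfile", os.O_RDWR | os.O_CREAT, 0o644)  # open(2)
fcntl.flock(fd, fcntl.LOCK_EX)     # flock(2)
os.pwrite(fd, b"hello", 0)         # pwrite(2)
os.fsync(fd)                       # fsync(2)
fcntl.flock(fd, fcntl.LOCK_UN)     # flock(2)
os.close(fd)                       # close(2)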

This is something you get with C, of course, but it's not something you can always say about other language runtimes. For test programs like what I needed, it can be a quite handy sort of behavior. I already knew CPython behaved like this from previous work, which is why I was willing to immediately turn to it for my test program here.

(If you're sufficiently cautious, you'll want to verify the behavior with a system call tracer, such as strace on Linux. If you do, it becomes very useful that the CPython runtime makes relatively few system calls that you didn't ask it to make, so it's easy to find and follow the system calls produced by your test code. Again, some language runtime environments are different here; they may have a churn of their own system calls that are used to maintain background activities, which clutter up strace output and so on.)

CPythonPredictableSyscalls written at 01:03:59

2018-08-24

Incremental development in Python versus actual tests

After writing yesterday's entry on how I need to add tests to our Django app, I've been thinking about why it doesn't have them already. One way to describe the situation is that I didn't bother to write any tests when I wrote the app, but another view is that I didn't write tests because I didn't need to. So let me explain what I mean by that.

When I ran into the startup overhead of small Python programs, my eventual solution was to write a second implementation in Go, which was kind of an odd experience (as noted). One of the interesting differences between the two versions is that the Go version has a fair number of tests and the Python one doesn't have any. There are a number of reasons for this, but one of them is that in Go, tests are often how you interact with your code. I don't mean that philosophically; I mean that concretely.

In Python, if you've written some code and you want to try it out to see if it at least sort of works, you fire up the Python interpreter, do 'import whatever' (even for your main program), and start poking away. In Go, you have no REPL, so often the easiest way to poke at some new code is to open up your editor, write some minimal code that you officially call a 'test', and run 'go test' to invoke it (and everything else you have as a test). This is more work than running an interactive Python session and it's much slower to iterate on 'what happens if ...' questions about the code, but it has the quiet advantage that it's naturally persistent (since it's already in a file).

This is the sense in which I didn't need tests to write our Django app. As I was coding, I could use the Python REPL and then later Django's ready-to-go framework itself to see if my code worked. I didn't have to actually write tests in order to test my code, not in the way that you can really need to in Go. In Python, incremental development can easily be done with all of your 'tests' being ad hoc manual work that isn't captured or repeatable.

(Even in Go, my testing often trails off dramatically the moment I have enough code written that I can start running a command to exercise things. In the Go version of my Python program, basically all of the tests are for low-level things and I 'tested' to see if higher level things worked by running the program.)

PS: Django helps this incremental development along by making it easy to connect bits of your code up to a 'user interface' in the form of a web page. You need somewhat more than a function call in the REPL but not much more, and then you can use './manage.py runserver ...' to give you a URL you can use to poke your code, both to see things rendered and to test form handling. And sometimes you can check various pieces of code out just through the Django admin interface.

PPS: Of course it's better to do incremental development by writing actual tests. But it takes longer, especially if you don't already know how to test things in the framework you're using, as I didn't when I was putting the app together (cf).

PythonREPLAndTests written at 01:11:54

2018-08-23

It's time for me to buckle down and add tests to our Django web app

Our Django web app is the Python 2 code that I'm most concerned about in a world where people are trying to get rid of Python 2, for two reasons. First, not only do we have to worry about Python 2 itself remaining available, but the Django developers have been quite explicit that Django 1.11 is the last version to support Python 2 and that they will stop supporting it in 2020. We probably don't want to be using an unsupported web framework. Second, it's probably the program that's most exposed to character set conversion issues, simply because that seems to be in the nature of things that deal with the web, databases, and so on. In short, we've got to convert it to Python 3 sometime, probably relatively soon, and it's likely to be more challenging than other conversions we've done.

One of the things that would make a Python 3 conversion less challenging is if we had (automated) tests for the app, ideally fairly comprehensive ones. Having solid tests for your code is best practices for a Python 3 conversion for good reasons, and they'd also probably help with things like our Django upgrade pains. Unfortunately we've never had them, which was something I regretted in 2014 and hasn't gotten any better since then, because there's never been a time when adding tests was either a high enough priority or something that I was at all enthused about doing.

(One of my reasons for not feeling enthusiastic is that I suspect that trying to test the current code would lead me to have to do significant revisions on it in order to make it decently testable.)

Looking at our situation, I've wound up feeling that it's time for this to change. Our path forward with the Django app should start with adding tests, which will make both Python 3 and future Django upgrades (including to 1.11 itself) less risky, less work, and less tedious (since right now I do all testing by hand).

(Hopefully adding tests will have other benefits for future development and so on, but some of these are contingent on additional factors beyond the scope of this entry.)

Unfortunately, adding tests to this code is likely to feel like make-work to me, and in a sense it is; the code already works (yes, as far as we know), so all that tests do is confirm that it does. I have no features to add, so I can't add tests to cover the new features as I add them; instead, this is going to have to be just grinding out tests for existing code. Still, I think it needs to be done, and the first step for doing it is for me to learn how to test Django code, starting by reading the documentation.
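
When I do, a first test can be quite small. A minimal sketch of the shape of a Django test (the class name is hypothetical and the URL is a stand-in, not our app's actual code):

from django.test import TestCase

class FrontPageTest(TestCase):
    def test_front_page_renders(self):
        # '/' stands in for one of the app's real URLs
        resp = self.client.get("/")
        self.assertEqual(resp.status_code, 200)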

(This entry is one of the ones that I'm writing in large part as a marker in the ground for myself, to make it more likely that I'll actually carry through on things. This doesn't always work; I still haven't actually studied dpkg and apt, despite declaring I was going to two years ago and having various tabs open on documentation since then. I've even read bits of the documentation from time to time, and then all of the stuff I've read quietly falls out of my mind again. The useful bits of dpkg I've picked up since then have come despite this documentation, not because of it. Generally they come from me having some problem and stumbling over a way to fix it. Unfortunately, our problems with our Django app, while real, are also big and diffuse and not immediate, so it's easy to put them off.)

DjangoWeNeedTests written at 01:38:11

2018-07-23

Some notes on lifting Python 2 code into Python 3 code

We have a set of Python programs that are the core of our ZFS spares handling system. The production versions are written in Python 2 and run on OmniOS on our ZFS fileservers, but we're moving to ZFS-based Linux fileservers, so this code needed a tune-up to cope with the change in environment. As part of our decision to use Python 3 for future tools, I decided to change this code over to Python 3 (partly because I needed to write some completely new Python code to handle Linux device names).

This is not a rewrite or even a port; instead, let's call it lifting code from Python 2 up to Python 3. Mechanically what I did is similar to the first time I did this sort of shift, which is that I changed the '#!/usr/bin/python' at the start of the programs to '#!/usr/bin/python3' and then worked to fix everything that Python 3 complained about. For this code, there have only been a few significant things so far:

  • changing all tabs to spaces, which I did with expand (and I think I overdid it, since I didn't use 'expand -i').

  • changing print statements into print() calls. I learned the hard way to not overlook bare 'print' statements; in Python 2 that produces a newline, while in Python 3 it's still valid but does nothing.

  • converting 'except CLS, VAR:' statements to the modern 'except CLS as VAR:' form, as this code was old enough to have a number of my old Python 2 code habits (several of these mechanical changes are sketched in code after this list).

  • taking .sort()s that used comparison functions and figuring out how to creatively generate sort keys that gave the same results. This opened my mind up a bit, although there are still nuances that using sort keys can't easily capture.

  • immediately list()-ifying most calls of adict.keys(), because that particular assumption was all over my code. There were a couple of cases that perhaps I could have deferred the list-ification to later (if at all), but this 'lifting' is intended to be brute force.

    (I didn't list-ify cases where I was clearly immediately iterating, such as 'for ... in d.keys()' or 'avar = [x for ... in d.keys()]'. But any time I assigned .keys() to a name or returned it, it got list-ified.)

  • replacing use of optparse with argparse. This wasn't strictly necessary (Python 3 still has optparse), but argparse is the future so I figured I'd fix things while I was working on the code anyway.
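
To make the mechanical flavor of this concrete, here is a small sketch of what several of those changes look like in their Python 3 form (the names and the file are illustrative, not our actual code):

import sys

adict = {"b": 2, "a": 1}

try:
    fp = open("/no/such/file")
except EnvironmentError as e:    # was: 'except EnvironmentError, e:'
    print("open failed:", e, file=sys.stderr)

print()                          # a bare 'print' statement must become 'print()'

keys = list(adict.keys())        # .keys() is now a view, so list-ify it
keys.sort(key=str.lower)         # was: keys.sort(some_cmp_func)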

Although these tools do have a certain amount of IO, I could get away with relying on Python 3's default character set conversion rules; in practice they should only ever be dealing with ASCII input and output, and if they aren't something has probably gone terribly wrong (eg our ZFS status reporting program has decided to start spraying out binary garbage). This is fairly typical of internal-use system tools but not necessarily of other things, which can expose interesting character set conversion questions.

(My somewhat uninformed view is that character set conversion issues are where moving from Python 2 to Python 3 gets exciting. If you can mostly ignore them, as I could here, you have a much easier time. If you have to consider them, it's probably going to be more porting than just casually lifting the code into Python 3.)

For the most part this 2-to-3 lifting went well and was straightforward. It would have gone better if I had meaningful tests for this code, but I've always had problems writing tests for command line programs (and some of this code is unusually complex to test). I used pyflakes to try to help find Python 3 issues that I'd overlooked; it found some issues but not all of them, and it at least feels less thorough than pychecker used to be. What I would really like is something that's designed to look for lingering Python 2-isms that either definitely don't work in Python 3 or that might be signs of problems, but I suspect that no such tool exists.

(I tried pylint very briefly, but stopped when it had an explosion of gripes with no obvious way to turn off most of them. I don't care about style 'issues' in this code; I want to know about actual problems.)

I'm a bit concerned that there are lingering problems in the code, but this is basically the tradeoff I get to make for taking the approach of 'lifting' instead of 'porting'. Lifting is less work if everything is straightforward and goes well, but it's not as thorough as carefully reading through everything and porting it piece by carefully considered piece (or using tests on everything). I had to stumble over a few .sort()s with comparison functions and un-listified .keys(), especially early on, which has made me conscious that there could be other 2-to-3 issues I just haven't hit in my test usage of the programs. That's one reason I'd like a scanner; it would know what to look for (probably better than I do right now) and as a program, it would look in all of the code's corners.

PS: I remember having a so-so experience with 2to3 many years in the past, but writing this entry got me to see what it did to the Python 2 versions. For the most part it was an okay starting point, but it didn't even flag uses of .sort() with a comparison function and it did significant overkill on list-ifying adict.keys(). Still, reading its proposed diffs just now was interesting. Probably not interesting enough to get me to use it in the future, though.

LiftingPython2ToPython3 written at 23:56:44

2018-07-15

When I'll probably be able to use Python assignment expressions

The big recent Python news is that assignment expressions have been accepted for Python 3.8. This was apparently so contentious and charged a process that in its wake Guido van Rossum has stepped down as Python's BDFL. I don't have any strong feelings on assignment expressions for reasons beyond the scope of this entry, but today I want to think about how soon I could possibly use them in my Python code, and then how soon I could safely use them (ie how soon they will be everywhere I care about). The answers turn out to be surprising, at least to me (they're probably not to experienced Python hands).
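
For concreteness, here is what the new syntax looks like (a trivial sketch with placeholder names):

# Python 3.8+ only: bind and test in a single expression
while (chunk := stream.read(8192)):   # 'stream' is a placeholder file object
    handle(chunk)                     # handle() is a placeholder too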

The nominal Python 3.8 release schedule is set out in PEP 569. According to it, Python 3.8 is planned to be released in October of 2019; however, there's some signs that the Python people want to move faster on this (see this LWN article). If Python sticks to the original timing, Python 3.8 might make Ubuntu 20.04 LTS (released in April 2020 but frozen before then) and would probably make the next Fedora release if Fedora keeps to their current schedule and does a release in May of 2020. So at this point it looks like the earliest I'd be able to use assignment expressions is in about two years. If Python moves up the 3.8 release schedule significantly, it might make one Fedora release earlier (the fall 2019 release), making that about a year and a half before I could think about it.

There are many versions of 'can safely use' for me, but I'll pick the one for work. There, 'safely use' means that they're supported by the oldest Ubuntu LTS release I need to run the Python code on. We're deploying long-lived Ubuntu 18.04 machines now that will only be updated starting in 2022, so if Python 3.8 makes Ubuntu 20.04 that will be when I can probably start thinking about it, because everything will be 2020 or later. That's actually a pretty short wait for safe use as these things go, but that's a coincidence due to the release timing of Python 3.8 and Ubuntu LTS versions. If Python 3.8 misses Ubuntu 20.04 LTS, I'd have to wait another two years (to 2024) unless I only cared about running my code on Ubuntu 22.04.

Of course, I'm projecting things four to six years into the future and that's dangerous at the best of times. We've already seen that Python may change its release timing, and who knows about both Ubuntu and Fedora.

(It seems a reasonably safe guess that I'll still be using Fedora on my desktops over that time period, and pretty safe that we'll still be using Ubuntu LTS at work, but things could happen there too.)

The reason that all of this was surprising to me was that I assumed Python 3.8 was further along in its development if controversial and much argued over change proposals were getting accepted for it. I guess the arguments started well before Python 3.7 was released, which makes sense given the 3.7 release schedule; 3.7 was frozen at the end of January, so everyone could start arguing about 3.8 no later than then.

(The official PEP has an initial date of the end of February, but I've heard it was in development and being discussed before then, just not formalized yet as a PEP.)

PS: If Debian keeps to their usual release schedule, it looks like Python 3.8 on its original schedule would miss the next Debian stable version (Debian 10). It would probably miss it even on an aggressive release schedule that saw Python 3.8 come out only a year after 3.7, since 3.7 was released only a few weeks ago.

AssignmentExpressionsWhen written at 23:35:50

2018-07-10

Remembering that Python lists can use tuples as the sort keys

I was recently moving some old Python 2 code to Python 3 (due to a recent decision). This particular code is sufficiently old that it has (or had) a number of my old Python code habits, and in particular it made repeated use of list .sort() with comparison functions. Python 3 doesn't support this; instead you have to tell .sort() what key to use to sort the list. For a lot of the code the conversion was straightforward and obvious because it was just using a field from the object as the sort key. Then I hit a comparison function that looked like this:

def _pricmp(a, b):
  apri = a.prio or sys.maxint
  bpri = b.prio or sys.maxint
  if apri != bpri:
      return cmp(apri, bpri)
  return cmp(a.totbytes, b.totbytes)

I stared at this with a sinking feeling, because this comparison function wasn't just picking a field, it was expressing logic. Losing complex comparison logic is a long standing concern of mine, so I was worried that I'd finally run into a situation where I would be forced into unpleasant hacks.

Then I remembered something obvious: Python supports sorting on tuples, not just single objects. Sorting on tuples compares the two tuples field by field, so you can easily implement the same sort of tie-breaking secondary comparison that I was doing in _pricmp. So I wrote a simple function to generate the tuple of key fields:

def _prikey(a):
  # sys.maxint no longer exists in Python 3; sys.maxsize is the closest stand-in
  apri = a.prio or sys.maxsize
  return (apri, a.totbytes)
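
(The call site then changes from something like 'objs.sort(_pricmp)' to 'objs.sort(key=_prikey)'.)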

Unsurprisingly, this just worked (including the tie-breaking, which actually comes up fairly often in this particular comparison). It's probably even somewhat clearer, and it certainly avoids some potential comparison function mistakes.

(It's also shorter, but that's not necessarily a good thing.)

PS: Python has supported sorting tuples for a long time but I don't usually think about it, so things had to swirl around in my head for a bit before the light dawned about how to solve my issue. There's a certain mental shift you need to make to go from 'the key= function retrieves the key field' to 'the key= function creates the sort key, but it's usually a plain field value'.

SortTakesTupleKeys written at 00:43:25

2018-06-20

Revising my view on Python 3 for new code again: you should use it

Almost five years ago, I wrote Reversing my view on Python 3 for new general code: avoid it (which Pete Zaitcev recently reminded me about). I have now reversed my views once again: I feel that you should definitely use Python 3 for new code. There are three reasons for this, two positive and one negative.

The first positive reason is that the current Python 3 ecosystem is generally vibrant and alive, unlike the state (almost) five years ago. With Python 3 having become a success some time ago, people have been writing Python 3 things and porting things to Python 3 for some time now. For that matter, an increasing number of interesting things are Python 3 only. So today you're pretty unlikely to suffer from ecosystem issues in Python 3; if anything, it's likely that the Python 3 ecosystem is healthier than Python 2's. Certainly if you like interesting new packages that are exploring new ideas and new APIs, you want to be using Python 3.

The second positive reason is that I've come around to feeling that Python 3 has genuine attractions and interesting things, both new language features and improvements in the standard library. This was attractive back in 2016 and it's slowly gotten more so since then. Sometimes Python 3 has even sped itself up (well, CPython, which is what we mostly think of as 'Python'). I suspect that the improvements aren't revolutionary for most people, but they are nice. Also, as I've found out myself, writing Python 3 code is generally not much different than writing Python 2 code, and I certainly haven't found it more annoying.

The negative reason is that time is running out for Python 2 (and even I can see that). We're less than two years away from the official End of Life of Python 2 from the core developers and we're seeing developments like an increasing number of Linux distributions at least trying to either drop or reduce support for Python 2 by then, as LWN has covered (I've got my own views and hopes). The attempts to move away from having Python 2 around or supporting it are likely to ramp up significantly over the next year and a half, both in OS distributions and in major Python projects that still support it (such as Django, where 1.11 is the last version that supports Python 2). If you're going to write new Python 2 code now, you're increasingly going to be staring this abyss in the face unless you're only using systems and projects that you already know will be supporting Python 2 past its official EOL, possibly well past based on your needs.

(This looming abyss is one reason that the Python 3 ecosystem is probably already healthier than the Python 2 one and it's only going to increase as January 1st 2020 looms up on us. One Python version has a future, one doesn't, and you can guess where people are going to increasingly focus.)

I still feel that Python 3's Unicode handling and its interactions with Unix have warts, but I'm also a pragmatist. Those warts lurk in dark corners and most of the time, most of us will never run into them. If your systems are well behaved, your code is not going to run into non-UTF-8 command line arguments or filenames or the like, just like most of the time our shell scripts don't run into filenames with newlines in them. More generally, forced character set conversion into and out of Unicode almost always works, because modern systems almost always have valid UTF-8. The result is that you can write a lot of perfectly functional Python code that basically ignores the issues and assumes you'll never hit a Unicode decoding or encoding error. I certainly have (and it's running fine for us).

Python3NewCodeIII written at 23:21:09

The time to be compatible with both Python 2 and Python 3 is past

Pete Zaitcev recently made his program Slasti run under Python 3. In his LJ entry, he said this about the work:

Overall, the biggest source of issues was not the py3 model, but trying to make the code compatible. I'm not going to do that again if I can help it: either py2 or py3, but not both.

For all that I've had plenty of issues with Python 3, I wholeheartedly agree with Pete Zaitcev's view here; it's time to abandon compatibility with Python 2, especially for programs instead of packages, unless you have a compelling reason otherwise. If you want to move code to Python 3, just do that, don't try to make your code work on both. A clean break will make your life better.

Back in the old days, when Python 3 was just starting to spread, it made sense to be 2/3 cross compatible even if it was a bit of a pain and added odd contortions to your code; not everyone even had decent versions of Python 3 (to the extent that they even existed in the beginning) and there were all sorts of other roadblocks and considerations. But those days are long over. Python 3 is both more capable and more pervasive and most of all it's succeeded, and at this point we're less than two years from the official end of life of Python 2. It's time to put Python 2 out to pasture and move onward, instead of making life hard on ourselves.

(Sometimes you can make code trivially or even accidentally cross compatible and if this happens, sure, keep things that way. What I'm talking about is going to extra effort and adding extra contortions to your code to accommodate both Python 2 and Python 3 people.)

If you want to move a program to Python 3, the modern state of things is that pretty much anyone who wants to use it should be able to do so. If they can't do so because they're on a system that is so old it doesn't have a decent version of Python 3, they've got bigger problems than just your program; sooner or later they're going to have to get a capable Python 3, probably sooner. For packages, well, we're less than two years from Python 2 EOL so anyone who is stuck with Python 2 only packages has a problem that goes well beyond being unable to use your new Python 3 only version.

(If they just haven't gotten around to moving their code to Python 3, perhaps your package will be just the push they need. But probably not; I suspect that a lot of people with Python 2 programs and systems have basically frozen them at this point.)

If you have to run your code on a system or in an environment without a good Python 3, that's one thing. If you're being paid to make it work on both versions, for whatever reasons, well, you're being paid for it. But otherwise? If you're going to change code to run on Python 3, it's time to let Python 2 go, and I say that as someone who still is unhappy about how the whole Python 2/3 transition was done (or is still being done).

PS: As far as Python 2 code goes, if you have existing code and you want or need to keep it running on Python 2, don't bother trying to make it also run on Python 3; wait until you can make a clean break with Python 2. In my view the same is true for new Python 2 code, but if you're writing new Python 2 code at this point you know your own situation best; it may be that your new code will have to live on past your transition from 2 to 3 and making it 3-compatible from the start will be better and less work than porting it at some point.

AbandonPython2Versions written at 01:17:18

2018-05-10

Python modules use operator overloading in two different ways

In Python (as elsewhere), there are at least two different things that people use operator overloading for. That there's more than one thing makes a difference, because some patterns for designing how operator overloading works aren't sufficiently general to handle both things; if you want to serve both groups, you need to design a more general mechanism than you might expect, one that delegates more power to objects.

The first use of operator overloading is to extend operators so that they work (in the traditional ways) on objects that they wouldn't normally work on. The classical examples of this are complex numbers and rational numbers (both of which Python has in the standard library), and in general various sorts of things built with numbers and numeric representations. However, you can go beyond this, to objects that aren't strictly numeric but which can use at least some of the traditional numeric operators in ways that still obey the usual rules of arithmetic and make sense. Python sets implement some numeric operations in ways that continue to make sense and are unsurprising.

The second use is to simply hijack the operations in order to do something convenient for your objects with a handy symbol for it. Sometimes these operations are vaguely related to their numeric equivalents (such as string multiplication, where "a" * 4 gets you "aaaa"), but sometimes they have nothing to do with it. The classic example of the latter is the string % operator, which has nothing at all to do with arithmetic but instead formats a string using % formatting codes. Using the % operator for this is certainly convenient and it has a certain mnemonic value and neatness factor, but it definitely has nothing to do with %'s normal use in arithmetic.
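
A small sketch of the two styles side by side (Vec is a made-up class):

class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y

    # first use: '+' keeps its ordinary arithmetic meaning for a new type
    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)

v = Vec(1, 2) + Vec(3, 4)        # a Vec with x=4, y=6

# second use: '%' hijacked for something entirely non-arithmetic
s = "%s: %d" % ("total", 42)     # 'total: 42'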

Now, let us consider the case of Python not allowing you to overload boolean AND and OR. In a comment on that entry, Aneurin Price said:

I'm not at all convinced by this argument. My expectation for this hypothetical __band__ is that it would be called after evaluating a and finding it truthy, at which point b is evaluated either way. [...]

This is definitely true if you think of operator overloading as only for the first case. But, unfortunately for the design of overloading AND and OR, this is not all that people would like to use it for. My understanding is that ORMs such as Django's and SQLAlchemy would like to intercept AND and OR in order to build up complicated conditional SQL queries with, essentially, a DSL based on Python expressions. In this DSL, they would like to be able to write something like:

Q.descfield.startswith("Who") or Q.descfield.startswith("What")

This wouldn't evaluate or produce any sort of truth value; instead it would produce an object representing a pending SQL query with a WHERE clause that encoded this OR condition. Later you'd execute the SQL query to produce the actual results.

If operator overloading for AND and OR paid any attention to the nominal truth value of the left expression, there is no way to make this work. Instead, allowing general overloading of AND and OR requires allowing the left side expression to hijack the process before then. In general, operator overloading that allows for this sort of usage needs to allow for this sort of early hijacking; fortunately this is generally easy for arithmetic operators.
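
You can see the shape of this in what ORMs actually do today: since plain 'and' and 'or' can't be overloaded, they overload the bitwise '&' and '|' operators instead, which dispatch straight to __and__ and __or__ without consulting anyone's truth value. A toy sketch (this Q class is made up and far simpler than Django's real one):

class Q:
    def __init__(self, expr):
        self.expr = expr

    def __or__(self, other):
        # called unconditionally; self's truthiness is never consulted
        return Q("(%s OR %s)" % (self.expr, other.expr))

    def __and__(self, other):
        return Q("(%s AND %s)" % (self.expr, other.expr))

q = Q("descfield LIKE 'Who%'") | Q("descfield LIKE 'What%'")
print(q.expr)    # (descfield LIKE 'Who%' OR descfield LIKE 'What%')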

(I'm not sure Python has truly general support for mixing unusual numerical types together, but then such general support is probably very hard to implement. I think you want to be able to express a compatibility table, where each type can say that its overloads handle certain other types or types that have certain properties or something. Otherwise getting your rational number type to interact well with my Point type gets really complicated really fast, if not impossible.)

TwoSortsOfOverloading written at 00:28:02
