2024-08-26
What's going on with 'quit' in an interactive CPython session (as of 3.12)
We've probably all been there at some time or another:
$ python
[...]
>>> quit
Use quit() or Ctrl-D (i.e. EOF) to exit
It's an infamous and frustrating 'error' message and we've probably all seen it (there's a similar one for 'exit'). Today I was reminded of this CPython behavior by a Fediverse conversation and as I was thinking about it, the penny belatedly dropped on what is going on here in CPython.
Let's start with this:
>>> type(quit)
<class '_sitebuiltins.Quitter'>
In CPython 3.12 and earlier, the CPython interactive interpreter evaluates Python statements; as far as I know, it has little to no special handling of what you type to it, it just evaluates things and then prints the result under appropriate circumstances. So 'quit' is not special syntax recognized by the interpreter, but instead a Python object. The message being printed is not special handling but instead a standard CPython interpreter feature, helpfully printing the representation of objects, which the _sitebuiltins.Quitter class has customized to print this message. You can see all of this in Lib/_sitebuiltins.py, along with classes used for some other, related things.
(Then the 'quit' and 'exit' instances are created and wired up in Lib/site.py, along with a number of other things.)
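The mechanism can be sketched with a simplified stand-in for _sitebuiltins.Quitter (the real class also does things like closing stdin before raising SystemExit; this version is purely an illustration):

```python
class Quitter:
    """Simplified sketch of _sitebuiltins.Quitter; not the real code."""

    def __init__(self, name, eof):
        self.name = name
        self.eof = eof

    def __repr__(self):
        # The interactive interpreter prints the repr() of whatever
        # expression you enter, so typing 'quit' alone prints this.
        return 'Use %s() or %s to exit' % (self.name, self.eof)

    def __call__(self, code=None):
        # Actually calling quit() exits the interpreter.
        raise SystemExit(code)

quit = Quitter('quit', 'Ctrl-D (i.e. EOF)')
message = repr(quit)
```

Entering 'quit' in the REPL just evaluates the name and prints that repr(); entering 'quit()' calls the object and raises SystemExit.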
This is changing in Python 3.13 (via), which defaults to using a new interactive shell, which I believe is called 'pyrepl' (see Lib/_pyrepl). Pyrepl has specific support for commands like 'quit', although this support actually reuses the _sitebuiltins code (see REPL_COMMANDS in Lib/_pyrepl/simple_interact.py). Basically, pyrepl knows to call some objects instead of printing their repr() if they're entered alone on a line, so entering 'quit' winds up being the same as if you'd typed 'quit()'.
2024-07-31
We may want /usr/bin/python to be Python 3 sooner than I expected
For historical reasons, we still have a '/usr/bin/python' that is Python 2 on our Ubuntu 22.04 machines. Yes, we know, Python 2 isn't supported any more, but our users have had more than a decade where /usr/bin/python was Python 2 and while Ubuntu continued to ship a Python 2, we didn't feel like breaking their '#!/usr/bin/python' lines in scripts by either removing /usr/bin/python or making it Python 3. That option ran out in Ubuntu 24.04, which doesn't ship any Python 2 packages and so provides no native way to have a Python 2 /usr/bin/python (you can make a symlink to your own version of Python 2, if you really insist). In my entry on the state of Python in Ubuntu 24.04, I speculated that we might wind up with /usr/bin/python existing and being Python 3 in Ubuntu 26.04. With more time and more water under the bridge, I think we're fairly likely to do that or even move faster, partly because there are forces pushing reasonably strongly in that direction.
One of the things that I've been doing is watching for things running '/usr/bin/python' on our current login servers, because those things are going to break when we start upgrading them from Ubuntu 22.04 to Ubuntu 24.04 and I'd like to warn people in advance. In doing this, I found a number of people who seemed to every now and then run '/usr/bin/python' in a VSCode environment. Now, I rather doubt that people who are using VSCode are writing Python 2 programs here in 2024. Instead, I think it's much more likely that something in their VSCode environment is invoking '/usr/bin/python' and expecting to get Python 3.
Here in 2024, I suspect that this is a perfectly reasonable expectation and almost always works out for whatever VSCode related bit is doing it. Python 2 has been unsupported for four years and probably almost all Linux systems with a /usr/bin/python have it being Python 3 (this has been the situation on Fedora Linux for some time, for example). I also suspect that most Linux systems do have a /usr/bin/python. We are the weird outliers, and as weird outliers we can expect things to not work; at first a few things, and then more things as the assumption that '/usr/bin/python' is the way you get Python 3 becomes embedded in more and more software.
(I suspect that VSCode is not the only thing doing this on our systems, merely the one that's most visible to me right now.)
Having written this entry, I'm now reconsidering our schedule. As far as I can tell, we have low usage of /usr/bin/python today, although my checks aren't necessarily comprehensive, which means that relatively few people will be affected by a change to what it is. So rather than waiting until Ubuntu 26.04 or later to make /usr/bin/python be Python 3, perhaps we should wait only six months or so after we roll out Ubuntu 24.04 before switching from having no /usr/bin/python (and any remaining people having their scripts fail to run) to having it be Python 3. The result would probably be better for both people and programs.
PS: The simple answer to why not immediately switch /usr/bin/python to Python 3 when we move to Ubuntu 24.04 is that the error messages people will get for /usr/bin/python being missing are likely to be clearer than the ones they would get from running Python 2 code under Python 3.
2024-06-16
Understanding a Python closure oddity
Recently, Glyph pointed out a Python oddity on the Fediverse and I had to stare at it for a bit to understand what was going on, partly because my mind is partly thinking in Go these days, and Go has a different issue in similar code. So let's start with the code:
def loop():
    for number in range(10):
        def closure():
            return number
        yield closure

eagerly = [each() for each in loop()]
lazily = [each() for each in list(loop())]
The oddity is that 'eagerly' and 'lazily' wind up different, and why.
The first thing that is going on in this Python code is that while 'number' is only used in the for loop, it is an ordinary function local variable. We could set it before the loop and look at it after the loop if we wanted to, and if we did, it would be '9' at the end of the for loop. The consequence and the corollary is that every closure returned in the 'for' loop is using the same 'number' local variable.
(In some languages and in some circumstances, each closure would close over a different instance of 'number'; see for example this Go 1.22 change.)
Since all of the closures are using the same 'number' local variable, what matters for what value they return is when they are called. When you call any of them, it will return the value of 'number' that is in effect in the 'loop' function as of that moment. And if you call any of them after the 'loop' function has finished, 'number' has the value of '9'.
This also means that if you call a single 'each' function more than once, the value it returns can be different. For example:
>>> g = loop()
>>> each0 = g.__next__()
>>> each0()
0
>>> each1 = g.__next__()
>>> each0()
1
(What the 'loop()' call actually returns is a generator. I'm directly calling its magic method to be explicit, rather than using the more general next().)
And in a way this is the difference between 'eagerly' and 'lazily'. For 'eagerly', the list comprehension iterates through the results of 'loop()' and immediately calls each version of 'each' that it obtains, which gets the value of 'number' that is in effect right then. For 'lazily', the 'list(loop())' first collects all of the 'each' closures, which ends the 'for' loop in the 'loop' function and means 'number' is now '9', and then calls all of the 'each' closures, which all return the final value of 'number'.
The 'eagerly' and 'lazily' names may be a bit confusing (they were to me). What they refer to is whether we eagerly or lazily call each closure as it is returned by 'loop()'. In 'eagerly', we call the closures immediately; in 'lazily', we call them only later, after the 'for' loop is done and 'number' has taken on its final value. As Glyph said on the Fediverse, there is another level of eagerness or laziness, which is how aggressively we iterate the generator from 'loop()', and this is actually backward from the names; in 'eagerly' we lazily iterate the generator, while in 'lazily' we eagerly iterate the generator (that's what the 'list()' does).
(I'm writing this entry partly for myself, because someday I may run into an issue like this in my own Python code. If you only use a generator with code patterns like the 'eagerly' case, an issue like this could lurk undetected for some time.)
2024-05-29
PyPy has been quietly working for me for several years now
A number of years ago I switched to installing various Python programs through pipx so that each of them got their own automatically managed virtual environment, rather than me having to wrestle with various issues from alternate approaches. On our Ubuntu servers, it wound up being simpler to do this using my own version of PyPy instead of Ubuntu's CPython, for various reasons. I've been operating this way for long enough that I didn't really remember how long.
Recently we got our first cloud server, and I wound up installing our cloud provider's basic CLI tool. This CLI tool has a number of official ways of installing it, but when the dust settled I discovered it was a Python package (with a bunch of additional complicated dependencies) and this package is available on PyPI. So I decided to see if 'pipx install <x>' would work, which it did. Only much later did it occur to me that this very large Python and stuff tool was running happily under PyPy, because this is the default if I just 'pipx install' something.
As it turns out, everything I have installed through pipx on our servers is currently installed using PyPy instead of CPython, and all of it works fine. I've been running all sorts of code with PyPy for years without noticing anything different. There is definitely code that will notice (I used to have some), but either I haven't encountered any of it yet or significant packages are now routinely tested under PyPy and hardened against things like deferred garbage collection of open files.
(Some current Python idioms, such as the 'with' statement, avoid this sort of problem, because they explicitly close files and otherwise release resources as you're done with them.)
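In sketch form, the contrast looks like this (function names are mine, purely for illustration):

```python
import os
import tempfile

def gc_write(path, data):
    # Relies on garbage collection to close the file. CPython's
    # reference counting usually closes it as soon as 'f' goes away;
    # PyPy's deferred GC may leave it open (and unflushed) for a while.
    f = open(path, 'w')
    f.write(data)

def with_write(path, data):
    # The 'with' statement closes the file deterministically on exit,
    # regardless of which Python implementation is running.
    with open(path, 'w') as f:
        f.write(data)

demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with_write(demo_path, 'hello')
with open(demo_path) as f:
    readback = f.read()
```

Under PyPy, code in the gc_write() style is exactly the sort of thing that used to break for me.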
In a way there's nothing remarkable about this. PyPy's goal is to be a replacement for CPython that simply works while generally being faster. In another way, it's nice to see that PyPy has been basically completely successful in this for me, to the extent that I can forget that my pipx-installed things are all running under PyPy and that a big cloud vendor thing just worked.
2024-04-30
The state of Python in Ubuntu 24.04 LTS
Ubuntu 24.04 LTS has just been released and as usual it's on our minds, although not as much so as Ubuntu 22.04 was. So once again I feel like doing a quick review of the state of Python in 24.04, as I did for 22.04. Since Fedora 40 has also just been released I'm going to throw that in too.
The big change between 22.04 and 24.04 for us is that 24.04 has entirely dropped Python 2 packages. There is no CPython 2, which has been unsupported by the main Python developers for years, but there's also no Python 2 version of PyPy, which is supported upstream and will be for a long time (cf). At the moment, the Python 2 binary .debs from Ubuntu 22.04 LTS still install and work well enough for us on Ubuntu 24.04, but the writing is on the wall there. In Ubuntu 26.04 we will likely have to compile our own Python from source (and not the .deb sources, which don't seem to readily rebuild on 24.04). It's possible that someone has a PPA with CPython 2 for 24.04; I haven't looked.
(Yes, we still care about Python 2 because we have system management scripts that have been there for fifteen years and which are written in Python 2.)
In Ubuntu 22.04, /usr/bin/python was an optional symbolic link that could point to either Python 2 or Python 3. In 24.04 it is still an optional symbolic link, but now your only option is Python 3. We've opted to have no /usr/bin/python in our 24.04 installation, so that any of our people who are still using '#!/usr/bin/python' in scripts will have them clearly break. It's possible that in a few years (for Ubuntu 26.04 LTS, if we use it) we'll start having a /usr/bin/python that points to Python 3 (or Ubuntu will make it a mandatory part of their Python 3 package). If nothing else, that would be convenient for interactive use.
Ubuntu 24.04 has Python 3.12.3, which was released this past April 9th; this is really fast work to get it into 24.04 (although since Canonical will be supporting 24.04 for up to five years, they have a bit of a motivation to start with the latest). Perhaps unsurprisingly, Fedora 40 is a bit further behind, with Python 3.12.2. Both Ubuntu 24.04 and Fedora 40 have PyPy 7.3.15. Ubuntu 24.04 only has the Python 3.9 version of PyPy 3; Fedora has both the 3.9 and 3.10 versions.
Both Ubuntu 24.04 and Fedora 40 have pipx available as a standard package. Fedora 40 has version 1.5.0; Ubuntu 24.04 is on 1.4.3. The pipx changelog suggests that this isn't a critical difference, and I'm not certain I'd notice any difference in practice.
I suspect that Fedora won't keep its minimal CPython 2 package around forever, although I don't know what their removal schedule is. Hopefully they will keep the Python 2 version of PyPy around for at least as long as the upstream PyPy supports it. Fedora has more freedom here than Ubuntu does, since a given Fedora release only has to be supported for a year or so, instead of Ubuntu 24.04 LTS's five years (or more, if you pay for extended support from Canonical).
PS: Ubuntu 24.04 has Django version 4.2.11, the latest version of the 4.2 series, which is a sensible choice since the Django 4.2 series is one of the Django project's LTS releases and so will be supported upstream until April 2026, saving Canonical some work (cf).
2024-04-12
Please don't try to hot-reload changed Python files too often
There is a person running a Python program on one of our servers, which is something that people do regularly. As far as I can tell, this person's Python program is using some Python framework that supports on the fly reloading (often called hot-reloading) of changed Python code for at least some of the loaded code, and perhaps much or all of it. Naturally, in order to see if you need to hot-reload any code, you need to check whether a bunch of files have changed (at least in our environment, some environments may be able to do this slightly better). This person's Python code is otherwise almost always idle.
The particular Python code involved has decided to check for a need to hot-reload code once every second. In our NFS fileserver environment, this has caused one particular fileserver to see a constant load of about 1100 NFS RPC operations a second, purely from the Python hot-reload code rechecking what appears to be a pile of things every second. These checks are also not cheap on the machine where the code is running; this particular process routinely uses about 7% to 8% of one CPU as it's sitting there otherwise idle.
(There was a time when you didn't necessarily care about CPU usage on otherwise idle machines. In these days of containerization and packing multiple services on one machine and renting the smallest and thus cheapest VPS you can get away with, there may be no such thing as a genuinely idle machine, and all CPU usage is coming from somewhere.)
To be fair, it's possible that the program is being run in some sort of development mode, where fast hot-reload can be potentially important. But people do run 'development mode' in more or less production, and it's possible to detect that. It would be nice if hot-reload code made some efforts to detect that, and perhaps also some efforts to detect when things were completely idle and there had been no detected changes for a long time and it should dial back the frequency of hot-reload checks. But I'm probably tilting at windmills.
(I also think that you should provide some sort of option to set the hot-reload frequency, because people are going to want to do this sooner or later. You should do this even if you only try to do hot reloading in development mode, because sooner or later people are going to run your development mode in pseudo-production because that's the easiest way for them.)
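As a sketch of what I mean (the class name, intervals, and backoff policy are all my invention, not any real framework's API), an mtime poller can back off while nothing changes and snap back to fast checks on activity:

```python
import os

class ReloadPoller:
    """Illustrative mtime polling with exponential backoff while idle."""

    def __init__(self, paths, min_interval=1.0, max_interval=60.0, backoff=2.0):
        self.paths = list(paths)
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.backoff = backoff
        self.interval = min_interval   # seconds to sleep before the next check
        self.mtimes = {}

    def check(self):
        """Return the set of changed paths, adjusting the next poll interval."""
        changed = set()
        for p in self.paths:
            try:
                m = os.stat(p).st_mtime
            except OSError:
                continue    # file vanished or is unreadable; skip this round
            if self.mtimes.get(p) != m:
                self.mtimes[p] = m
                changed.add(p)
        if changed:
            # Activity: go back to checking quickly.
            self.interval = self.min_interval
        else:
            # Idle: slow down, capped at max_interval.
            self.interval = min(self.interval * self.backoff, self.max_interval)
        return changed
```

The reload loop would then sleep for poller.interval between check() calls, so an idle program settles down to one stat() sweep a minute instead of one a second.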
PS: These days this also applies to true development mode usage of things. People can easily step away from their development environment for meetings or whatever, and they may well be running it on their laptop, where they would like you to not burn up their battery constantly. Just because someone has a development mode environment running doesn't mean they're actively using it right now.
2024-03-24
Platform peculiarities and Python (with an example)
I have a long standing little Python tool to turn IP addresses into verified hostnames and report what's wrong if it can't do this (doing verified reverse DNS lookups is somewhat complicated). Recently I discovered that socket.gethostbyaddr() on my Linux machines was only returning a single name for an IP address that was associated with more than one. A Fediverse thread revealed that this reproduced for some people, but not for everyone, and that it also happened in other programs.
The Python socket.gethostbyaddr() documentation doesn't discuss specific limitations like this, but the overall socket documentation does say that the module is basically a layer over the platform's C library APIs. However, it doesn't document exactly what APIs are used, and in this case it matters. Glibc on Linux says that gethostbyaddr() is deprecated in favour of getnameinfo(), so a C program like CPython might reasonably use either to implement its gethostbyaddr(). The C gethostbyaddr() supports returning multiple names (at least in theory), but getnameinfo() specifically does not; it only ever returns a single name.
In practice, the current CPython on Linux will normally use gethostbyaddr_r() (see Modules/socketmodule.c's socket_gethostbyaddr()). This means that CPython isn't restricted to returning a single name and is instead inheriting whatever peculiarities glibc has (or another libc, for people on Linux distributions that use an alternative libc). On glibc, it appears that this behavior depends on what NSS modules you're using, with the default glibc 'dns' NSS module not seeming to normally return multiple names this way, even for glibc APIs where this is possible.
Given all of this, it's not surprising that the CPython documentation doesn't say anything specific. There's not very much specific it can say, since the behavior varies in so many peculiar ways (and has probably changed over time). However, this does illustrate that platform peculiarities are visible through CPython APIs, for better or worse (and, like me, you may not even be aware of those peculiarities until you encounter them). If you want something that is certain to bypass platform peculiarities, you probably need to do it yourself (in this case, probably with dnspython).
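As a small illustration of the API involved (the helper function is mine; socket.gethostbyaddr() is the real call), this is how you'd collect whatever names the platform reports, keeping in mind that on many Linux systems it will be at most one name no matter how many PTR records exist:

```python
import socket

def reverse_names(addr):
    """Return every name socket.gethostbyaddr() reports for addr.

    Depending on the platform's C library and NSS configuration, this
    may be a single name even when the address has multiple PTR records.
    """
    try:
        primary, aliases, _addrs = socket.gethostbyaddr(addr)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses
        # in Python 3; treat 'no reverse mapping' as an empty result.
        return []
    return [primary] + aliases
```

If you genuinely need all the PTR records, doing the DNS query yourself (for instance with dnspython) bypasses the libc layer entirely.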
(The Go documentation for a similar function does specifically say that if it uses the C library it returns at most one result, but that's because the Go authors know their function calls getnameinfo() and, as mentioned, that can only return one name (at most).)
2024-02-04
I switched to explicit imports of things in our Django application
When I wrote our Django application it was a long time ago, I didn't know Django, and I was sort of in a hurry, so I used what I believe was the style at the time for Django of often doing broad imports of things from both Django modules and especially the application's other modules:
from django.conf.urls import *
from accounts.models import *
This wasn't universal; even at the time it was apparently partly the style to import only specific things from Django modules, and I followed that style in our code.
However, when I moved the application to Python 3 I also switched all of these over to specific imports. This wasn't required by Django (or by Python 3); instead, I did it because it made my editor complain less. Specifically it made Flycheck in GNU Emacs complain less (in my setup). I decided to do this change because I wanted to use Flycheck's list of issues to check for other, more serious issues, and because Flycheck specifically listed all of the missing or unknown imports. Because Flycheck listed them for me, I could readily write down everything it was reporting and see the errors vanish. When I had everything necessary imported, Flycheck was nicely quiet (about that).
Some of the import lines wound up being rather long (as you can imagine, the application's views.py uses a lot of things from our models.py). Even still, this is probably better for a future version of me who has to look at this code later. Some of what comes from the application models is obvious (like core object types), but not all of it; I was using some imported functions as well, and now the imports explicitly list where they come from. And for Django modules, now I have a list of what I'm using from them (often not much), so if things change in a future Django version (such as the move from django.conf.urls to django.urls), I'll be better placed to track down the new locations and names.
In theory I could have made this change at any time. In practice, I only made it once I'd configured GNU Emacs for good Python editing and learned about Flycheck's ability to show me the full error list. Before then all of the pieces were too spread apart and too awkward for me to reach for.
(Of course, this isn't the first time that my available tools have influenced how I programmed in a way that I noticed.)
2024-02-03
Solving one of our Django problems in a sideways, brute force way
A few years ago I wrote about an issue with propagating some errors in our Django application. We have two sources of truth for user authorization, one outside of Django (in Unix group membership that was used by Apache HTTP Basic Authentication), and one inside Django in a 'users' table; these two can become desynchronized, with someone in the Unix group but not in the application's users table. The application's 'retrieve a user record' function either returns the user record or raises an Http404 exception that Django automatically handles, which means that someone who hasn't been added to the user table will get 404 results for every URL, which isn't very friendly. I wanted to handle this by finding a good way to render a different error page in this case, either by customizing what the 'Http404' error page contained or by raising a different error.
All of this is solving the problem in the obvious way and also a cool thing to (try to) do in Django. Who doesn't want to write Python code that handles exceptional cases by, well, raising exceptions and then having them magically caught and turn into different rendered pages? But Django doesn't particularly support this, although I might have been able to add something by writing an application specific piece of Django middleware that worked by catching our custom 'no such user' exception and rendering an appropriate template as the response. However, this would have been my first piece of middleware, so I held off trying anything here until we updated to a modern version of Django (partly in the hopes it might have a solution).
Then, recently a simpler but rather less cool option to deal with this whole issue occurred to me. We have a Django management command that checks our database for consistency in various ways (for example, unused records of certain types, or people in the application's users table who no longer exist), which we run every night (from cron). Although it was a bit of a violation of 'separation of concerns', I could have that command know about the Unix group(s) that let people through Apache, and then have it check that all of the group members were in the Django user table. If people were omitted, we'd get a report. This is pretty brute force and there's nothing that guarantees that the command's list of groups stays in synchronization with our Apache configuration, but it works.
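The core of the check can be sketched like this (the function, group names, and how the Django usernames are obtained are illustrative, not our actual management command):

```python
import grp

def missing_django_users(unix_groups, django_usernames):
    """Return Unix group members who aren't in the Django users table."""
    members = set()
    for g in unix_groups:
        try:
            members.update(grp.getgrnam(g).gr_mem)
        except KeyError:
            continue   # group doesn't exist on this machine
    return sorted(members - set(django_usernames))
```

A nightly cron job that emails the result when it's non-empty is all the 'error handling' this approach needs.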
It's also a better experience for people than the cool way I was previously considering, because it lets us proactively fix the problem before people encounter it, instead of only reactively fixing it after someone runs into this and reports the issue to us. Generally, we'll add someone to the Unix group, forget to add them to Django, and then get email about it the next day before they'll ever try to use the application, letting us transparently fix our own mistake.
(This feels related to something I realized very early about not trying to do everything through Django's admin interface.)
2024-02-01
Our Django application is now using Python 3 and a modern Django
We have a long standing Django web application to handle the process of people requesting Unix accounts here and having the official sponsor of their account approve it. For a long time, this web app was stuck on Python 2 and Django 1.10 after a failed attempt to upgrade to Django 1.11 in 2019. Our reliance on Python 2 was obviously a problem, and with the not so far off end of life of Ubuntu 20.04 it was getting more acute (we use Apache's mod_wsgi, and Ubuntu 22.04 and later don't have a Python 2 version of that for obvious reasons). Recently I decided I had to slog through the process of moving to Python 3 and a modern Django (one that is actually supported) and it was better to start early. To my pleasant surprise the process of bringing it up under Python 3 and Django 4.2 was much less work than I expected, and recently we migrated the production version. At this point it's been running long enough (and has done enough) that I'm calling this upgrade a success.
There are a number of reasons for this smooth and rapid sailing. For a start, it turns out that my 2019 work to bring the app up under Python 3 covered most of the work necessary, although not all of it. Our previous problems with CSRF and Apache HTTP Basic Authentication have either been sidestepped by Django changes since 1.11 or perhaps mitigated by Django configuration changes based on a greater understanding of this area that I worked out two years ago. And despite some grumpy things I've said about Django in the past, our application needed very few changes to go from Django 1.10 to Django 4.2.
(Most of the Django changes seem to have been moving from 'load staticfiles' to 'load static' in templates, and replacing use of django.conf.urls.url() with django.urls.re_path(), although we could probably do our URL mapping better if we wanted to. There are other minor changes, like importing functions from different places, changing request.POST.has_key(X) to X in request.POST, and defining DEFAULT_AUTO_FIELD in our settings.)
Having this migration done and working takes a real load off of my mind for the obvious reasons; neither Python 2 nor Django 1.10 are what we should really be using today, even if they work, and now we're free to upgrade the server hosting this web application beyond Ubuntu 20.04. I'm also glad that it took relatively little work now.
(Probably this will make me more willing to keep up to date with Django versions in the future. We're not on Django 5.0 because it requires a more recent version of Python 3 than Ubuntu 20.04 has, but that will probably change this summer or fall as we start upgrades to Ubuntu 24.04.)