The convenience (for me) of people writing commands in Python
The other day I was exploring Certbot,
which is more or less the standard and 'as official as it ever gets'
client for Let's Encrypt, and it did something that I objected to.
Certbot is a very big program with a great many commands, modes,
options, settings, and so on, and this was the kind of thing where
I wasn't completely confident there even was a way to disable it.
However, sometimes I'm a system programmer and the particular thing
had printed a distinctive message. So, off to the source code I went
with grep (okay, ripgrep)
to find the message string and work backward from there.
Conveniently, Certbot is written in Python, which has two advantages here. The first advantage is that I actually know Python, which makes it easier to follow any logic I need to follow. The second is that Python programs intrinsically come with their source code, just as the standard library does. Certbot is open source and I was installing Ubuntu's official package for it, which gave me at least two ways of getting the source code, but there's nothing like not even having to go to the effort.
(And then there's WebAssembly.)
Another cultural aspect of this is that a lot of commands written
in Python are written in relatively straightforward ways that are
easy to follow; you can usually
grep through the code for what
function something is in, then what calls that function, and so on
and so forth. This is not a given and it's quite possible to create
hard to follow tangles of magic (I've sort of done this in the
past) or a tower of classes inside
classes that are called through hard to follow patterns of delegation,
object instantiation, and so on. But it's at least unusual, especially
in relatively straightforward commands and in code bases that aren't too large.
PS: Certbot is on the edge of 'large' here, but for what I was looking for it was still functions calling functions.
PPS: That installing a Python thing gives you a bunch of .py files
on your filesystem is not a completely sure thing. I believe that
there are Python package and module distribution formats that don't
unpack the individual .py files but leave them all bundled up, although the
current Wheel format is apparently purely for distribution, not
running in place.
I am out of touch with the state of Python package distribution,
so I don't know how this goes if you install things yourself.
Conditional expressions in any form are an attractive thing
In a recent entry I mentioned in passing
that I once had relatively strong feelings about Python's
'A if COND else B' conditional expressions but those
had probably faded away. In a comment, Twirrim said:
I've started to use the A if FOO else BAR syntax, much to my surprise. In general, I hate it.
One increasingly common pattern in the code I write:
logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
(or variations thereof, if I'm using a CLI framework like click)
Yes, very much this. One of my feelings about almost any form of ternary operator or conditional expression is that having it at all is so attractive that people will use almost any syntax that you come up with, regardless of what they feel about the syntax. Condensing a multi-line set of statements down to a single expression is sufficiently compelling that people will put up with a great deal to get it. I'll go so far as to say that people will willingly make their code less readable to get it.
There are ways around needing a conditional expression in situations like this, and I have probably adopted some of them in my code; for example, I might initialize a global 'log level' variable or setting based on things like the verbosity level the user has set on the command line. Whether or not this is a good thing is probably in the eye of the beholder, and I'm sure that some people will say that the best code is the one that spells it out explicitly (perhaps in a function that you call to determine the log level).
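The spelled-out form might look something like this (decide_loglevel is an invented name for illustration):

```python
import logging

def decide_loglevel(verbose):
    # The explicit, multi-line equivalent of the conditional
    # expression 'logging.DEBUG if verbose else logging.INFO'.
    if verbose:
        return logging.DEBUG
    return logging.INFO

logging.basicConfig(level=decide_loglevel(False))
```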
(In my view, the obvious corollary of how attractive conditional expressions are is that it's important to give them good syntax. Unlike other language constructs, where sufficiently annoying syntax may lead to them not being used, conditional expressions will likely get used no matter what. If your syntax is less than ideal, it'll still be all over code bases sooner or later.)
Sidebar: My use of conditional expressions has now surprised me
In my first entry, I claimed that I hadn't used conditional expressions yet. That was based on grep'ing an assortment of code that I had on hand, but it turns out that search wasn't complete enough. More extensive searching turned up at least two places (and now a third). First, our Django app has one usage, which Mercurial tells me dates from 2013. Of course, Python's conditional expressions are also very old; they first appeared in Python 2.5, which was released in September of 2006.
More embarrassingly, the source code for DWiki turns out to have several uses, and some of these are reasonably complex, where I wrote things like:
avar = b if b else c.thing if c else None
I'm not sure if this nested code is a good idea, especially without ()'s to make ordering clear, but for what it's worth I can sort of read it out even now, several years after I wrote it.
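For the record, nested conditional expressions group to the right. Here's a runnable sketch of the same shape (pick() and its stand-in values are invented, since the real b and c aren't shown here):

```python
class Thing:
    thing = "from c"

def pick(b, c):
    # The same shape as 'b if b else c.thing if c else None',
    # with parentheses making the grouping explicit.
    return b if b else (c.thing if c else None)

print(pick("b-value", None))
print(pick("", Thing()))
print(pick("", None))
```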
(And I also used it in one place in a recent little utility program. As you might guess from my difficulties here, our Python code is scattered all over.)
Why I no longer have strong feelings about changes to Python
A while back I wrote about when I'll probably be able to use the
(then) very contentious Python assignment expressions and said in passing that I didn't have
any strong opinions on them. There was probably a time when I would
have felt differently; for example, I used to have relatively
strong feelings about '
A if CONDITION else B' conditional expressions and I'm not sure I do any more, although I don't
seem to have used them yet in any recent code.
(Since I had to look up the syntax just now, that's probably partly because I just didn't remember how to write them.)
There are two ways of putting why I no longer have strong opinions here. The first is that I no longer really care what Python looks like. The second is that I have lost my mental picture of Python; what was once clear enough for me to have opinions on how things looked has dissolved into a muddle. These two reasons are related, of course, with each contributing to the other. Since Python has turned into a complex muddle of a language, with more features and syntax than I can keep track of at my current relatively low and infrequent usage of Python, one more piece of syntax makes little difference to me and I can no longer have any opinions on conceptual unity, Pythonicness, or the like.
Will I ever use assignment expressions even once they're available to me? Probably not, honestly, unless I find some code where they would make things much clearer and I remember that they exist. In practice, a fair number of new Python 3 features have not been compelling enough to get me to even think about using them (or, to put it more bluntly, to remember them in any detail).
(There is also the issue that a lot of my current Python code is
written for work, and there I run into the same issue as my
pragmatic problem with using the
attrs module in work code. In practice this concern is probably
overblown. Among other things, I suspect my current Python code is
less readable than I think.)
PS: My detachment from modern Python 3 is not exactly a strength and perhaps someday I should reset it, perhaps by going through the current Python 3 tutorial. Sooner or later I should learn how to write modern Python 3 code, however that looks and whatever modern idioms the community has settled on as good practices.
Some thoughts on Red Hat Enterprise 8 including Python 2 and what it means
Red Hat Enterprise 8 was released the other day, and now Red Hat
has published an article on Python 2 (and 3) in RHEL 8.
The short version is that they aren't providing a package called
'python' but instead two packages called 'python3' and 'python2'
(or two 'application streams'
for Python 2 and 3, which come with additional packages). Although
it's not entirely clear, Red Hat is apparently not going to have a
/usr/bin/python symlink by default, leaving it up to you to set one up
through their alternatives system. Red Hat is recommending that you
explicitly use 'python2' or 'python3' as the name of the script
interpreter in '#!' lines, instead of relying on a plain 'python'.
(The presence of a '
python2' binary name is not new in RHEL 8;
one was present at least as far back as RHEL 6. Also, this may or
may not backtrack on things Red Hat said a year ago.)
In a way, the big news here is that RHEL 8 includes Python 2 at all as an official package, since RHEL 8 will be supported for probably somewhere around a decade (and they'd previously sort of suggested that they weren't going to). Unless Red Hat officially abandons doing any updates for Python 2 at some point, this means that they'll be supporting it (at least as far as fixing any security issues that are discovered) for much of that decade, and since their work here is open source, other people can take advantage of it. I suspect that Red Hat is not entirely happy with this, but I also suspect that they felt they had no choice for various reasons.
(I rather expect Python 2 to not be included in a future Red Hat Enterprise 9, which might be released somewhere around 2023 or 2024 based on past history. Unless Red Hat gets a lot of push back from customers, I suspect that RHEL 8 will be the only dual-Python RHEL release.)
I suspect that this makes it somewhat more likely than it already was that Ubuntu 20.04 LTS will include Python 2. At the moment, Python 2 is still part of the bleeding edge Ubuntu rolling version and apparently still in the 'main' package repository. That could change before 20.04 LTS freezes and branches, but Ubuntu is running out of time to do that and, more importantly, they're running out of pre-LTS releases to do it in; there would normally only be 19.10, due out in October. Since RHEL 8 includes Python 2, including Python 2 in Ubuntu is safer in that Ubuntu can probably rely on copying Red Hat's fixes, if any are needed.
(Also, per this 2018 LWN article, Debian will be shipping Python 2 with their next distribution, which they're in the process of trying to release at the moment. I believe that Debian wants to strip out Python 2 after that, but I wouldn't necessarily expect fast movement on that, and Ubuntu probably won't be more aggressive than Debian here.)
None of this means that people using Python 2 are completely safe. For a start, Python based packages and systems have been moving away from supporting Python 2 for some time. For an example that's relevant to us, the last Django version that supports Python 2 is 1.11, which itself will only be supported until April 2020 (cf). Unless we want to count on Ubuntu 18.04's packaging of Django (and we don't), the presence of Python 2 in Ubuntu 20.04 will be not too relevant for our Django web application. These days, we also install some popular Python packages for GPU computation and so on, and they're very likely to be Python 3 only soon if they aren't already (I haven't checked the current state of things like Tensorflow). And even if Ubuntu 20.04 includes Python 2, Ubuntu 22.04 might not, and that's not all that far away.
I also suspect that even when Python 2 is available in some form,
more future distributions will follow RHEL 8's model and try not
to provide a
/usr/bin/python that points to it, especially on
completely new installs (which is our usual case). We can try to fight this,
but I suspect that we're better off changing our Python (2) programs
to use '#!/usr/bin/python2'. Our users may force our hands, though,
if they object strongly enough to there not being a '/usr/bin/python'.
(Slowly making that change may give us a chance to inventory just how many Python programs we actually have rattling around this place. The answer is probably 'more than I thought we did', since we've been writing various random things in Python for quite a while now.)
Various aspects of Python made debugging my
tarfile problem unusual
I was recently thinking about what I like when I use Python, and
in the process I wound up reflecting about how working out that
tarfile module is too generous about what is a tar file was made different and easier by various
aspects of Python. I'm not going to say that I couldn't have
worked out a similar problem in, say, Go,
but if I had, I think it would have been a relatively different experience.
One aspect of CPython specifically is that a lot of the standard
library is written in Python and so intrinsically has its source
code available even on a standard Python install (because the source
code is what CPython will run). You don't have to try to install
debugging symbols or fetch a source package; I could just go find
tarfile.py and read it immediately. This reduced friction is part
of what made me actually go digging in the first place, because it
wasn't that much work to take a quick peek to see if I could figure
out what was going on (then things snowballed from there).
Once I was poking at the
tarfile module, another useful Python
peculiarity became important. Python lets you use (or abuse) the
import path to provide your own versions of modules from the standard
library, preempting the stock version. I could copy my program to
a scratch directory, copy the tarfile.py from the Python distribution
to the same directory, and start adding diagnostic hacks to my copy,
all without having to change the 'import tarfile' in my own program
to another name or another path, the way I would have had to in
some other languages.
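As a sketch of the mechanism (everything here is illustrative, using a trivial stub instead of a real hacked copy): a tarfile.py that comes earlier on the import path, for example via PYTHONPATH, shadows the standard library's version.

```python
import os
import subprocess
import sys
import tempfile

# Create a scratch directory containing a stub tarfile.py.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "tarfile.py"), "w") as f:
    f.write("HACKED = True\n")

# A child interpreter with the scratch directory on PYTHONPATH
# imports our stub instead of the standard library's tarfile.
res = subprocess.run(
    [sys.executable, "-c", "import tarfile; print(tarfile.HACKED)"],
    env=dict(os.environ, PYTHONPATH=scratch),
    capture_output=True, text=True,
)
print(res.stdout.strip())
```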
(This was useful for more than using a hacked tarfile.py to
diagnose things. It also meant that when I thought I had a
workaround in my own code, I could rename my copy of tarfile.py and have
my program instantly revert to using the stock Python tarfile
module, so I could verify that my fix wasn't being influenced by
my modifications.)
Everyone cites Python's interactive interpreter and the ease of examining objects in it as great advantages, and I'm not going to argue; certainly I've used it for lots of exploration. Once I had things narrowed down to what I thought was the cause, the interactive interpreter was the fastest place to get to running code and so the best environment to quickly try out my guesses. In other languages I might have to fire up an editor to write a program or at least some tests, or craft a carefully built input file for my program.
(Technically it also sort of made for a pretty minimal reproduction case in my eventual bug report, because I implicitly assumed I didn't need to write up anything more than what would be needed to duplicate it inside an interactive interpreter.)
The cycle of editing
tarfile.py and re-running my program to test
and explore the module's behavior was probably not any faster in
Python than it might have been in a non-interpreted language, but
it felt different. The code I was editing was what was actually
running a few moments later, not something that was going to be
transformed through a build process. And for some reason, Python
code often feels more mutable to me than code in other languages
(perhaps because I perceive it as having less bureaucracy, due to
dynamic typing and the ability to easily print out random things
and so on).
Overall, I think the whole experience felt more lightweight and casual in Python than it would have in many other languages I'm familiar with. I was basically bashing things together and seeing how far I could get with relatively little effort, and the answer turned out to be all the way to a standard library bug.
Python's tarfile module is too generous about what is considered a tar file
The Python standard library's
tarfile module has a
function that tells you whether or not some file is a tar file, or
at least is a tar file that the module can read. As is not too silly
in Python, it operates by attempting to open the file with
tarfile.open(); if the open() succeeds, clearly this is a good tarfile.
Unfortunately, through what is perhaps a bug, this fails to report any errors on various sorts of things that are not actually tar files. On a Unix system, the very easiest and simplest reproduction of this problem is:
>>> import tarfile
>>> tarfile.open("/dev/zero", "r:")
This raises no exception and gives you back a TarFile object that will report that you have an empty tar file.
(If you leave off the '
r:', this hangs, ultimately because the
lzma module will
happily read forever from a stream of zero bytes. Unless you tell
it otherwise, the tarfile module normally tries a sequence of
decompressors on your potential tarfile, including lzma for xz compression.)
One specific form of thing that will cause this issue is any nominal
'tar file' that starts with 512 bytes of zero bytes (after any
decompression is applied). Since this applies to /dev/zero, we
have our handy and obviously incorrect reproduction case. There may
be other initial 512-byte blocks that will cause this; I have not
investigated the code deeply, partly because it is tangled.
I suspect that this is a bug in the
TarFile.next function, which
looks like it is missing an '
elif self.offset == 0:' clause (see
the block of code starting around here). But
whether or not this issue is a bug and will be fixed in a future
version of Python 3, it is very widespread in existing versions of
Python that are out there in the field, and so any code that cares
about this (which we have some of) needs to
cope with it.
My current hack workaround is to check whether or not the members
list on the returned TarFile object is empty. This is not a documented
attribute, but it's unlikely to change and it works today (and feels
slightly less sleazy than checking whether next() returns None).
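A sketch of that sort of check (looks_like_real_tar is my invented name; it uses the documented getmembers() to force a full scan rather than peeking at the undocumented attribute, but the idea is the same):

```python
import io
import tarfile

def looks_like_real_tar(fileobj):
    # Open without decompression guessing ("r:"); treat both a read
    # error and an archive with no members at all as "not a tar file".
    try:
        tf = tarfile.open(fileobj=fileobj, mode="r:")
        return len(tf.getmembers()) > 0
    except tarfile.TarError:
        return False

# A real single-member tar archive, built in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    data = b"hello"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))
buf.seek(0)

print(looks_like_real_tar(buf))                       # a real tar file
print(looks_like_real_tar(io.BytesIO(b"\0" * 1024)))  # only zero blocks
```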
(For reasons beyond the scope of this entry, I have decided to slog through the effort of finding how to submit Python bug reports, unearthing my login from the last time I threw a bug report into their issue tracker, and filing a version of this as issue 36596.)
Going from a bound instance method to its class instance in Python
In response to yesterday's entry on how I feel callable classes are better than closures, a commentator suggested:
If you need something callable, why not use a bound method? They have a reference to the parent too.
This raises a question: how easy and reliable is it to go from a bound method on an instance to the instance itself?
In both Python 2 and Python 3, a bound method is an instance of a
special type (how this happens is described in my entry on how
functions become bound methods). Although
the Python 3 documentation is not explicit about it, this type is
what is described in the "Instance methods" section of the Python
3 data model.
This description of the (bound) method type officially documents
the __self__ attribute, which is a reference to the original
instance that the bound method is derived from. So the answer is
that given an object
x that is passed to you as a bound method,
you can recover the actual instance as
x.__self__ and then
inspect it from there.
(In Python 2.7, there is also the
im_self attribute, which
contains the same information.)
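A quick illustration, with an invented Greeter class:

```python
import types

class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello, " + self.name

g = Greeter("fred")
bm = g.greet              # a bound method object

# The bound method type, and the road back to the instance.
print(isinstance(bm, types.MethodType))  # True
print(bm.__self__ is g)                  # True
print(bm.__func__ is Greeter.greet)      # True
```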
If you want your code to check if it has a bound method, you can
use isinstance() with types.MethodType. This name can also be used
to track the type down and inspect it, but doing so
really won't tell you much; you're better off reading the "Instance
methods" section of the data model.
I'm not sure how I feel about relying on this. On the one hand, it
is officially documented and it works the same in Python 3 and
Python 2 (ignoring Python 2's
im_self and the possibility of
unbound methods on Python 2). On the other hand, this is a dunder
attribute, and using those generally feels somewhat like I'm peeking
into implementation details. I don't know if the Python developers
consider this a stable API or something that very definitely isn't
guaranteed over the long term.
(If nothing else, now I know a little bit more about Python than I did before I decided to look this up. I was actually expecting the answer to be more obscure than it turned out to be.)
Callable class instances versus closures in Python
At first, like every operator overload, this seems like a nifty idea. And then, like most operator overload cases, we need to ask: why? Why is this better than a named method?
I wholeheartedly agree with this, and in the beginning I agreed
with the whole article. But then I began thinking about my usage
of __call__ and something that the article advocated as a
replacement, and found that I partially disagree with it. To quote the article:
If something really is nothing more than a function call with some extra arguments, then either a closure or a partial would be appropriate.
(By 'partial', the article means the use of functools.partial
to construct a partially applied function.)
My view is that if you have to provide something that's callable,
a callable class is better than a closure because it's more
amenable to inspection. A class instance is a clear thing; you
can easily see what it is, what it's doing, and inspect the state
of instances (especially if you remember to give your class a
__repr__). You can
even easily give them (and their methods) docstrings, so that
help() provides helpful information about them.
None of this is true of closures (unless you go well out of your way) and only a bit of it is true of partially applied functions. Even if you go out of your way to provide a docstring for your closure function, the whole assemblage is basically an opaque blob. A partially applied function is somewhat better because the resulting object exposes some information, but it's still not as open and transparent as an object.
This becomes especially important if your callable thing is going to be called repeatedly and hold internal state. It's far easier to make this internal state visible, potentially modifiable, and above all debuggable if you're using an object than if you try to wrap all of this up inside a function (or a closure) that manipulates its internal variables. Python objects are designed to be transparent (at least by default), as peculiar as this sounds in general.
(After all, one of the usual stated purposes of objects is to encapsulate things away from the outside world.)
Callable classes are unquestionably more verbose than closures, partially applied functions, or even lambdas, and sometimes this is annoying. But I think you should use them for anything that is not trivial by itself, and maybe even for small things depending on how long the resulting callable entities are going to live and how far away they are going to propagate in your program. The result is likely to be more maintainable and more debuggable.
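To make the contrast concrete, here is an invented Counter example; the callable class's state is right there to inspect, while the closure's state hides inside cells:

```python
class Counter:
    """Callable that remembers how many times it's been called."""

    def __init__(self):
        self.count = 0

    def __call__(self):
        self.count += 1
        return self.count

    def __repr__(self):
        return "Counter(count={})".format(self.count)

def make_counter():
    # Closure version of the same thing.
    count = 0
    def counter():
        nonlocal count
        count += 1
        return count
    return counter

c1 = Counter()
c2 = make_counter()
c1(); c1(); c2(); c2()

print(c1)        # state is visible in the repr
print(c1.count)  # directly inspectable (and fixable in a debugger)
# c2's count is buried in c2.__closure__[0].cell_contents
```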
PS: This somewhat biases me toward providing things with the entire
instance and using
__call__ over providing a method on the
instance. If you're trying to debug something, it's harder to go
from a method to inspecting the instance it comes from. Providing
just a method is probably okay if the use is 'close' to the class
definition (eg, in the same file or the same module), because then
you can look back and forth easily. Providing the full instance is
what I'd do if I was passing the callable thing around to another
module or returning it as part of my public API.
Using default function arguments to avoid creating a class
Recently I was writing some Python code to print out Prometheus metrics about whether or not we could log in to an IMAP server. As an end to end test, this is something that can fail for a wide assortment of reasons; we can fail to connect to the IMAP server, experience a TLS error during TLS session negotiation, have the server's TLS certificate fail to validate, there could be an IMAP protocol problem, or the server could reject our login attempt. If we fail, we would like to know why for diagnostic purposes (especially, some sorts of failures are more important than others in this test). In the Prometheus world, this is traditionally done by emitting a separate metric for every different thing that can fail.
In my code, the metrics are all prepared by a single function that gets called at various points. It looks something like this:
def logingauges(host, ok, ...):
    [...]

def logincheck(host, user, pw):
    try:
        c = ssl.create_default_context()
        m = imaplib.IMAP4_SSL(host=host, ssl_context=c)
    except ssl.CertificateError:
        return logingauges(host, 0, ...)
    except [...]
    [...]
    try:
        r = m.login(user, pw)
        [...]
    except imaplib.IMAP4.error:
        return logingauges(host, 0, ...)
    except [...]

    # success, finally.
    return logingauges(host, 1, ...)
When I first started writing this code, I only distinguished a
couple of different reasons that we could fail so I passed the state
of those reasons directly as additional parameters to logingauges().
As the number of failure reasons rose, this got both unwieldy and
annoying, partly because adding a new failure reason required going
through all existing calls to
logingauges() to add a new parameter
to each of them.
So I gave up. I turned all of the failure reasons into keyword arguments that defaulted to 0:
def logingauges(host, ok, connerr=0, loginerr=0,
                certerr=0, sslerr=0, imaperr=0):
    [...]
Now to call
logingauges() on failure I only needed to supply an
argument for the specific failure:
return logingauges(host, 0, sslerr=1)
Adding a new failure reason became much more localized; I only had
to add a new gauge metric to
logingauges(), with a new keyword
argument, and then call it from the right place.
This strikes me as pretty much a hack. The proper way is probably
to create a class to hold all of this status information as attributes
on instances, create an instance of it at the start of logincheck(),
manipulate the attributes as appropriate, and return the instance
when done. The class can even have a
to_gauges() function that
generates all of the actual metrics from its current values.
(In Python 3.7, I would use a dataclass, but this has to run on Ubuntu 18.04 with Python 3.6.7, so it needs to be a boring old class.)
However, not only do I already have the version that uses default function arguments, but the class based version would require a bunch more code and bureaucracy for what is basically a simple situation in a small program. I like doing things the right way, but I'm not sure I like it that much. As it stands, the default function arguments approach is pleasantly minimal and low overhead.
(Or maybe this is considered an appropriate use of default function
arguments in Python these days. Arguments with default values are
often used to set default initial values for instance attributes,
and that is kind of what I'm doing here. One version of the class
based approach could actually look the same; instead of calling a
function, I'd return a just-created instance of my class.)
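As a sketch of what that class-based version might look like (LoginStatus, to_gauges(), and the metric names are all invented for illustration):

```python
class LoginStatus:
    """Holds the outcome of one IMAP login check as attributes."""

    FAILURES = ("connerr", "loginerr", "certerr", "sslerr", "imaperr")

    def __init__(self, host, ok, **failures):
        self.host = host
        self.ok = ok
        # Every failure reason defaults to 0, as with the keyword
        # arguments in the function version.
        for name in self.FAILURES:
            setattr(self, name, failures.pop(name, 0))
        if failures:
            raise TypeError("unknown failure(s): %s" % sorted(failures))

    def to_gauges(self):
        # Yields (metric name, value) pairs for every gauge at once,
        # so adding a new failure reason is a purely local change.
        yield ("imap_login_ok", self.ok)
        for name in self.FAILURES:
            yield ("imap_login_" + name, getattr(self, name))

# On a TLS failure, the caller would 'return LoginStatus(host, 0, sslerr=1)'.
status = LoginStatus("mail.example.org", 0, sslerr=1)
print(dict(status.to_gauges()))
```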
(This is only somewhat similar to using default function arguments to merge several APIs together. Here it would be a real stretch to say that there are multiple APIs, one for each failure reason.)
The cliffs in the way of adding tests to our Django web app
Back in August of last year, I wrote that it was time for me to start adding tests to our Django web app. Since then, the number of tests I have added is zero, and in fact the amount of work that I have done on our Django web app's code is also essentially zero (partly because it hasn't needed any modifications). Part of the reason for that is that adding tests feels like make-work, even though I know perfectly well that it's not really, but another part of it is that I'm staring at two reasonably substantial cliffs in my way.
Put simply, in order to add tests that I actually want to keep, I need to learn how to write Django tests and then I need to figure out what we want to test in our Django web app (and how). Learning how to write tests means reading through the Django documentation on this, both the quick tutorial and the real documentation. Unfortunately I think that I need to read all of the documentation before I start writing any tests, and possibly even plan to throw away the first round of tests as a learning experience. Testing a Django app is not as simple as testing standalone code; there is a test database you need to construct, an internal HTTP client so that you can write end to end tests, and so on. This is complicated by the fact that by now I've forgotten a lot of my general Django knowledge and I know it, so to some extent I'm going to have to re-learn Django (and re-learn our web app's code too).
(It's possible that I can find some quick-start tests I can write more or less in isolation. There are probably some stand-alone functions that I can poke at, and perhaps even stand-alone model behavior that doesn't depend on the database having a set of interlinked base data.)
Once I sort of know how to write Django tests, I need to figure out what tests to write and how much of them. There are two general answers here that I already know; we need tests that will let us eventually move to Python 3 with some confidence that the app won't blow up, and I'd like tests that will do at least basic checks that everything is fine when we move from Django version to Django version. Tests for a Python 3 migration should probably concentrate on the points where data moves in and out of our app, following the same model I used when I thought about DWiki's Python 3 Unicode issues. Django version upgrade tests should probably start by focusing on end to end testing (eg, 'can we submit a new account request through the mock HTTP client and have it show up').
All of this adds up to a significant amount of time and work to invest before we start to see real benefits from it. As a result I've kept putting it off and finding higher priority work to do (or at least more interesting work). And I'm pretty sure I need to find a substantial chunk of time in order to get anywhere with this. To put it one way, the Django testing documentation is not something that I want to try to understand in fifteen minute blocks.
PS: It turns out that our app actually has one tiny little test that I must have added years ago as a first step. It's actually surprisingly heartening to find it there and still passing.
(As before, I'm writing this partly to push myself toward doing it. We now have less than a year to the nominal end of Python 2, which is not much time with everything going on.)
Sidebar: Our database testing issue
My impression is that a decent amount of Django apps can be tested with basically empty databases, perhaps putting in a few objects. Our app doesn't work that way; its operation sits on top of a bunch of interlinked data on things like who can sponsor accounts, how those accounts should be created, and so on. Without that data, the app does nothing (in fact it will probably fail spectacularly, since it assumes that various queries will always return some data). That means we need an entire set of at least minimal data in our test database in order to test anything much. So I need to learn all about that up front, more or less right away.