Wandering Thoughts

2019-06-24

The convenience (for me) of people writing commands in Python

The other day I was exploring Certbot, which is more or less the standard and 'as official as it ever gets' client for Let's Encrypt, and it did something that I objected to. Certbot is a very big program with a great many commands, modes, options, settings, and so on, and this was the kind of thing where I wasn't completely confident there even was a way to disable it. However, sometimes I'm a system programmer and the particular thing had printed a distinctive message. So, off to the source code I went with grep (okay, ripgrep), to find the message string and work backward from there.

Conveniently, Certbot is written in Python, which has two advantages here. The first advantage is that I actually know Python, which makes it easier to follow any logic I need to follow. The second is that Python programs intrinsically come with their source code, just as the standard library does. Certbot is open source and I was installing Ubuntu's official package for it, which gave me at least two ways of getting the source code, but there's nothing like not even having to go to the effort.

I was going to say that this is intrinsic in Python being an interpreted language, but that's not necessarily the case; instead, it's that way for both cultural and technical reasons. Java is generally interpreted by the JVM, but what the JVM interprets is Java bytecode, not Java source code. Javascript is interpreted from its source code, but people who ship Javascript often ship it in minified and compacted form for performance reasons, and in practice this makes it unreadable (at least for my purposes).

(And then there's WebAssembly.)

Another cultural aspect of this is that a lot of commands written in Python are written in relatively straightforward ways that are easy to follow; you can usually grep through the code for what function something is in, then what calls that function, and so on and so forth. This is not a given and it's quite possible to create hard to follow tangles of magic (I've sort of done this in the past) or a tower of classes inside classes that are called through hard to follow patterns of delegation, object instantiation, and so on. But it's at least unusual, especially in relatively straightforward commands and in code bases that aren't too large.

(Things that are part of a framework may be a different story.)

PS: Certbot is on the edge of 'large' here, but for what I was looking for it was still functions calling functions.

PPS: That installing a Python thing gives you a bunch of .py files on your filesystem is not a completely sure thing. I believe that there are Python package and module distribution formats that don't unpack the .py files but leave them all bundled up, although the current Wheel format is apparently purely for distribution, not running in place. I am out of touch with the state of Python package distribution, so I don't know how this goes if you install things yourself.

SourceCodeIncluded written at 21:46:25

2019-05-30

Conditional expressions in any form are an attractive thing

In a recent entry I mentioned in passing that I once had relatively strong feelings about Python's 'A if COND else B' conditional expressions but that those feelings had probably faded away. In a comment, Twirrim said:

I've started to use the A if FOO else BAR syntax, much to my surprise. In general, I hate it.

One increasingly common pattern in the code I write:

logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)

(or variations thereof, if I'm using a CLI framework like click)

Yes, very much this. One of my feelings about almost any form of ternary operator or conditional expression is that having it at all is so attractive that people will use almost any syntax that you come up with, regardless of what they feel about the syntax. Condensing a multi-line set of statements down to a single expression is sufficiently compelling that people will put up with a great deal to get it. I'll go so far as to say that people will willingly make their code less readable to get it.

There are ways around needing a conditional expression in situations like this, and I have probably adopted some of them in my code; for example, I might initialize a global 'log level' variable or setting based on things like the verbosity level the user has set on the command line. Whether or not this is a good thing is probably in the eye of the beholder, and I'm sure that some people will say that the best code is the one that spells it out explicitly (perhaps in a function that you call to determine the log level).
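
As a sketch of that global 'log level' approach, using the same logging setup as the quoted comment (the option name here is just for illustration):

import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="store_true")
args = parser.parse_args()

# Spell the choice out as statements instead of a conditional expression.
loglevel = logging.INFO
if args.verbose:
    loglevel = logging.DEBUG
logging.basicConfig(level=loglevel)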

(In my view, the obvious corollary of how attractive conditional expressions are is that it's important to give them good syntax. Unlike other language constructs, where sufficiently annoying syntax may lead to them not being used, conditional expressions will likely get used no matter what. If your syntax is less than ideal, it'll still be all over code bases sooner or later.)

Sidebar: My use of conditional expressions has now surprised me

In my first entry, I claimed that I hadn't used conditional expressions yet. That was based on grep'ing an assortment of code that I had on hand, but it turns out that my search wasn't complete enough. More extensive searching turned up at least two places (and now a third). First, our Django app has one usage, which Mercurial tells me dates from 2013. Of course, Python's conditional expressions are also very old; they first appeared in Python 2.5, which was released in September of 2006.

More embarrassingly, the source code for DWiki turns out to have several uses, and some of these are reasonably complex, where I wrote things like:

avar = b if b else c.thing if c else None

I'm not sure if this nested code is a good idea, especially without ()'s to make ordering clear, but for what it's worth I can sort of read it out even now, several years after I wrote it.
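
For what it's worth, Python's conditional expressions associate to the right, so with explicit ()'s that line is equivalent to:

avar = b if b else (c.thing if c else None)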

(And I also used it in one place in a recent little utility program. As you might guess from my difficulties here, our Python code is scattered all over.)

ConditionalExpressionAttraction written at 00:25:49

2019-05-26

Why I no longer have strong feelings about changes to Python

A while back I wrote about when I'll probably be able to use the (then) very contentious Python assignment expressions and said in passing that I didn't have any strong opinions on them. There was probably a time when I would have felt differently; for example, I used to have relatively strong feelings about 'A if CONDITION else B' conditional expressions and I'm not sure I do any more, although I don't seem to have used them yet in any recent code.

(Since I had to look up the syntax just now, that's probably partly because I just didn't remember how to write them.)

There are two ways of putting why I no longer have strong opinions here. The first is that I no longer really care what Python looks like. The second is that I have lost my mental picture of Python; what was once clear enough for me to have opinions on how things looked has dissolved into a muddle. These two reasons are related, of course, with each contributing to the other. Since Python has turned into a complex muddle of a language, with more features and syntax than I can keep track of at my current relatively low and infrequent usage of Python, one more piece of syntax makes little difference to me and I can no longer have any opinions on conceptual unity, Pythonicness, or the like.

Will I ever use assignment expressions even once they're available to me? Probably not, honestly, unless I find some code where they would make things much clearer and I remember that they exist. In practice, a fair number of new Python 3 features have not been compelling enough to get me to even think about using them (or, to put it more bluntly, to remember them in any detail).

(There is also the issue that a lot of my current Python code is written for work, and there I run into the same issue as my pragmatic problem with using the attrs module in work code. In practice this concern is probably overblown. Among other things, I suspect my current Python code is less readable than I think.)

PS: My detachment from modern Python 3 is not exactly a strength and perhaps someday I should reset it, perhaps by going through the current Python 3 tutorial. Sooner or later I should learn how to write modern Python 3 code, however that looks and whatever modern idioms the community has settled on as good practices.

NoMoreStrongFeelings written at 21:45:26

2019-05-10

Some thoughts on Red Hat Enterprise 8 including Python 2 and what it means

Red Hat Enterprise 8 was released the other day, and now Red Hat has published an article on Python 2 (and 3) in RHEL 8 (via). The short version is that they aren't providing a package called 'python' but instead two packages called 'python3' and 'python2' (or two 'application streams' for Python 2 and 3, which come with additional packages). Although it's not entirely clear, Red Hat is apparently not going to have a /usr/bin/python symlink by default, leaving it up to you to set one up through their alternatives system. Red Hat is recommending that you explicitly use 'python2' or 'python3' as the name of the script interpreter in '#!' lines, instead of relying on just the 'python' name.

(The presence of a 'python2' binary name is not new in RHEL 8; one was present at least as far back as RHEL 6. Also, this may or may not backtrack on things Red Hat said a year ago.)

In a way, the big news here is that RHEL 8 includes Python 2 at all as an official package, since RHEL 8 will be supported for probably somewhere around a decade (and they'd previously sort of suggested that they weren't going to). Unless Red Hat officially abandons doing any updates for Python 2 at some point, this means that they'll be supporting it (at least as far as fixing any security issues that are discovered) for much of that decade, and since their work here is open source, other people can take advantage of it. I suspect that Red Hat is not entirely happy with this, but I also suspect that they felt they had no choice for various reasons.

(I rather expect Python 2 to not be included in a future Red Hat Enterprise 9, which might be released somewhere around 2023 or 2024 based on past history. Unless Red Hat gets a lot of push back from customers, I suspect that RHEL 8 will be the only dual-Python RHEL release.)

I suspect that this makes it somewhat more likely than it already was that Ubuntu 20.04 LTS will include Python 2. At the moment, Python 2 is part of the bleeding edge Ubuntu rolling version and is still apparently part of the 'main' package repository. That could change before 20.04 LTS freezes and branches, but Ubuntu is running out of time to do that and, more importantly, they're running out of pre-LTS releases to do it in; there would normally only be 19.10, due out in October. Since RHEL 8 includes Python 2, including Python 2 in Ubuntu is safer in that Ubuntu can probably rely on copying Red Hat's fixes, if any are needed.

(Also, per this 2018 LWN article, Debian will be shipping Python 2 with their next distribution, which they're in the process of trying to release at the moment. I believe that Debian wants to strip out Python 2 after that, but I wouldn't necessarily expect fast movement on that, and Ubuntu probably won't be more aggressive than Debian here.)

None of this means that people using Python 2 are completely safe. For a start, Python based packages and systems have been moving away from supporting Python 2 for some time. For an example that's relevant to us, the last Django version that supports Python 2 is 1.11, which itself will only be supported until April 2020 (cf). Unless we want to count on Ubuntu 18.04's packaging of Django (and we don't), the presence of Python 2 in Ubuntu 20.04 won't be all that relevant for our Django web application. These days, we also install some popular Python packages for GPU computation and so on, and they're very likely to be Python 3 only soon if they aren't already (I haven't checked the current state of things like Tensorflow). And even if Ubuntu 20.04 includes Python 2, Ubuntu 22.04 might not, and that's not all that far away.

I also suspect that even when Python 2 is available in some form, more future distributions will follow RHEL 8's model and try not to provide a /usr/bin/python that points to it, especially on completely new installs (which is our usual case). We can try to fight this, but I suspect that we're better off changing our Python (2) programs to use '#!/usr/bin/python2'. Our users may force our hands, though, if they object strongly enough to there not being a 'python'.

(Slowly making that change may give us a chance to inventory just how many Python programs we actually have rattling around this place. The answer is probably 'more than I thought we did', since we've been writing various random things in Python for quite a while now.)

Python2AndRHEL8 written at 23:44:23

2019-04-26

Various aspects of Python made debugging my tarfile problem unusual

I was recently thinking about what I like when I use Python, and in the process I wound up reflecting about how working out that the tarfile module is too generous about what is a tar file was made different and easier by various aspects of Python. I'm not going to say that I couldn't have worked out a similar problem in, say, Go, but if I had, I think it would have been a relatively different experience.

One aspect of CPython specifically is that a lot of the standard library is written in Python and so intrinsically has its source code available even on a standard Python install (because the source code is what CPython will run). You don't have to try to install debugging symbols or fetch a source package; I could just go find tarfile.py and read it immediately. This reduced friction is part of what made me actually go digging in the first place, because it wasn't that much work to take a quick peek to see if I could figure out what was going on (then things snowballed from there).
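
For instance, finding where tarfile.py lives is a one-liner in the interactive interpreter (the exact path will vary with your Python version and install):

>>> import tarfile
>>> tarfile.__file__
'/usr/lib/python3.6/tarfile.py'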

Once I was poking at the tarfile module, another useful Python peculiarity became important. Python lets you use (or abuse) the import path to provide your own versions of modules from the standard library, preempting the stock version. I could copy my program to a scratch directory, copy the tarfile.py from the Python distribution to the same directory, and start adding print statements and so on to understand the flow of execution through the module's code. I didn't have to change the 'import tarfile' in my own program to another name or another path, the way I would have had to in some other languages.

(This was useful for more than using a hacked tarfile.py for diagnosing things. It also meant that when I thought I had a workaround in my own code, I could rename my tarfile.py and have my program instantly revert to using the stock Python tarfile module, so I could verify that my fix wasn't being influenced by my tarfile.py hacks.)
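
A minimal sketch of this shadowing: when you run a script, Python puts the script's own directory at the front of the import path, so a tarfile.py sitting next to your program wins over the standard library's version.

import tarfile

# With a copied tarfile.py next to this script, the import above picks
# up the local copy; printing __file__ confirms which one you got.
print(tarfile.__file__)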

Everyone cites Python's interactive interpreter and the ease of examining objects in it as great advantages, and I'm not going to argue; certainly I've used it for lots of exploration. Once I had things narrowed down to what I thought was the cause, the interactive interpreter was the fastest place to get to running code and so the best environment to quickly try out my guesses. In other languages I might have to fire up an editor to write a program or at least some tests, or craft a carefully built input file for my program.

(Technically it also sort of made for a pretty minimal reproduction case in my eventual bug report, because I implicitly assumed I didn't need to write up anything more than what would be needed to duplicate it inside an interactive interpreter.)

The cycle of editing tarfile.py and re-running my program to test and explore the module's behavior was probably not any faster in Python than it might have been in a non-interpreted language, but it felt different. The code I was editing was what was actually running a few moments later, not something that was going to be transformed through a build process. And for some reason, Python code often feels more mutable to me than code in other languages (perhaps because I perceive it as having less bureaucracy, due to dynamic typing and the ability to easily print out random things and so on).

Overall, I think the whole experience felt more lightweight and casual in Python than it would have in many other languages I'm familiar with. I was basically bashing things together and seeing how far I could get with relatively little effort, and the answer turned out to be all the way to a standard library bug.

DebuggingTarfileThoughts written at 00:23:10

2019-04-10

The tarfile module is too generous about what is considered a tar file

The Python standard library's tarfile module has a tarfile.is_tarfile function that tells you whether or not some file is a tar file, or at least is a tar file that the module can read. As is not too silly in Python, it operates by attempting to open the file with tarfile.open; if open() succeeds, clearly this is a good tarfile.

Unfortunately, through what is perhaps a bug, this fails to report any errors on various sorts of things that are not actually tar files. On a Unix system, the very easiest and simplest reproduction of this problem is:

>>> import tarfile
>>> tarfile.open("/dev/zero", "r:")

This raises no exception and gives you back a TarFile object that will report that you have an empty tar file.

(If you leave off the 'r:', this hangs, ultimately because the lzma module will happily read forever from a stream of zero bytes. Unless you tell it otherwise, the tarfile module normally tries a sequence of decompressors on your potential tarfile, including lzma for .xz files.)

One specific form of thing that will cause this issue is any nominal 'tar file' that starts with 512 bytes of zero bytes (after any decompression is applied). Since this applies to /dev/zero, we have our handy and obviously incorrect reproduction case. There may be other initial 512-byte blocks that will cause this; I have not investigated the code deeply, partly because it is tangled.

I suspect that this is a bug in the TarFile.next function, which looks like it is missing an 'elif self.offset == 0:' clause (see the block of code starting around here). But whether or not this issue is a bug and will be fixed in a future version of Python 3, it is very widespread in existing versions of Python that are out there in the field, and so any code that cares about this (which we have some of) needs to cope with it.

My current hack workaround is to check whether or not the .members list on the returned TarFile object is empty. This is not a documented attribute, but it's unlikely to change and it works today (and feels slightly less sleazy than checking whether .firstmember is None).
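
As a sketch, the workaround wrapped up in a helper function (the function name is mine, and as noted, .members is not a documented attribute):

import tarfile

def really_a_tarfile(fname):
    # Work around tarfile.open() succeeding on non-tar input by
    # checking the undocumented .members list, which is non-empty
    # after opening any tar file with at least one member.
    try:
        tf = tarfile.open(fname, "r:")
    except tarfile.TarError:
        return False
    return len(tf.members) > 0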

(For reasons beyond the scope of this entry, I have decided to slog through the effort of finding how to submit Python bug reports, unearthing my login from the last time I threw a bug report into their issue tracker, and filing a version of this as issue 36596.)

TarfileTooGenerous written at 22:12:58

2019-03-17

Going from a bound instance method to its class instance in Python

In response to yesterday's entry on how I feel callable classes are better than closures, a commentator suggested:

If you need something callable, why not use a bound method? They have a reference to the parent too.

This raises a question: how easy and reliable is it to go from a bound method on an instance to the instance itself?

In both Python 2 and Python 3, a bound method is an instance of a special type (how this happens is described in my entry on how functions become bound methods). Although the Python 3 documentation is not explicit about it, this type is what is described in the "Instance methods" section of the Python 3 data model. This description of the (bound) method type officially documents the __self__ attribute, which is a reference to the original instance that the bound method is derived from. So the answer is that given an object x that is passed to you as a bound method, you can recover the actual instance as x.__self__ and then inspect it from there.

(In Python 2.7, there is also the im_self attribute, which contains the same information.)

If you want your code to check if it has a bound method, you can use isinstance() with types.MethodType. This name for the type can also be used to check its help(), which really won't tell you much; you're better off reading the "Instance methods" section of the data model.
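
A quick illustration of both halves of this (the class here is invented for the example):

import types

class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
bm = g.greet                             # a bound method
print(isinstance(bm, types.MethodType))  # True
print(bm.__self__ is g)                  # True; we recovered the instance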

I'm not sure how I feel about relying on this. On the one hand, it is officially documented and it works the same in Python 3 and Python 2 (ignoring Python 2's im_self and the possibility of unbound methods on Python 2). On the other hand, this is a __ attribute, and using those generally feels somewhat like I'm peeking into implementation details. I don't know if the Python developers consider this a stable API or something that very definitely isn't guaranteed over the long term.

(If nothing else, now I know a little bit more about Python than I did before I decided to look this up. I was actually expecting the answer to be more obscure than it turned out to be.)

BoundMethodToInstance written at 23:57:39

2019-03-16

Callable class instances versus closures in Python

Recently I read Don't Make It Callable (via), which advocates avoiding having your class instances be callable (by defining __call__ on your classes). Let me quote its fundamental thesis on using __call__:

At first, like every operator overload, this seems like a nifty idea. And then, like most operator overload cases, we need to ask: why? Why is this better than a named method?

I wholeheartedly agree with this, and in the beginning I agreed with the whole article. But then I began thinking about my usage of __call__ and something that the article advocated as a replacement, and found that I partially disagree with it. To quote it again:

If something really is nothing more than a function call with some extra arguments, then either a closure or a partial would be appropriate.

(By 'partial', the article means the use of functools.partial to construct a partially applied function.)

My view is that if you have to provide something that's callable, a callable class is better than a closure because it's more amenable to inspection. A class instance is a clear thing; you can easily see what it is, what it's doing, and inspect the state of instances (especially if you remember to give your class a useful __str__ or __repr__). You can even easily give them (and their methods) docstrings, so that help() provides helpful information about them.

None of this is true of closures (unless you go well out of your way) and only a bit of it is true of partially applied functions. Even if you go out of your way to provide a docstring for your closure function, the whole assemblage is basically an opaque blob. A partially applied function is somewhat better because the resulting object exposes some information, but it's still not as open and transparent as an object.
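
To make the difference concrete (the class and the closure here are invented for illustration):

class Adder:
    "Add a fixed increment to whatever we're called with."
    def __init__(self, n):
        self.n = n
    def __call__(self, x):
        return x + self.n
    def __repr__(self):
        return "Adder(n=%r)" % self.n

def make_adder(n):
    def adder(x):
        return x + n
    return adder

print(Adder(3))       # Adder(n=3); the state is right there
print(make_adder(3))  # <function make_adder.<locals>.adder at 0x...>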

This becomes especially important if your callable thing is going to be called repeatedly and hold internal state. It's far easier to make this internal state visible, potentially modifiable, and above all debuggable if you're using an object than if you try to wrap all of this up inside a function (or a closure) that manipulates its internal variables. Python objects are designed to be transparent (at least by default), as peculiar as this sounds in general.

(After all, one of the usual stated purposes of objects is to encapsulate things away from the outside world.)

Callable classes are unquestionably more verbose than closures, partially applied functions, or even lambdas, and sometimes this is annoying. But I think you should use them for anything that is not trivial by itself, and maybe even for small things depending on how long the resulting callable entities are going to live and how far away they are going to propagate in your program. The result is likely to be more maintainable and more debuggable.

PS: This somewhat biases me toward providing things with the entire instance and using __call__ over providing a method on the instance. If you're trying to debug something, it's harder to go from a method to inspecting the instance it comes from. Providing just a method is probably okay if the use is 'close' to the class definition (eg, in the same file or the same module), because then you can look back and forth easily. Providing the full instance is what I'd do if I was passing the callable thing around to another module or returning it as part of my public API.

CallableClassVsClosure written at 22:59:13

2019-02-22

Using default function arguments to avoid creating a class

Recently I was writing some Python code to print out Prometheus metrics about whether or not we could log in to an IMAP server. As an end to end test, this is something that can fail for a wide assortment of reasons; we can fail to connect to the IMAP server, experience a TLS error during TLS session negotiation, have the server's TLS certificate fail to validate, have an IMAP protocol problem, or have the server reject our login attempt. If we fail, we would like to know why for diagnostic purposes (especially since some sorts of failures are more important than others in this test). In the Prometheus world, this is traditionally done by emitting a separate metric for every different thing that can fail.

In my code, the metrics are all prepared by a single function that gets called at various points. It looks something like this:

def logingauges(host, ok, ...):
  [...]

def logincheck(host, user, pw):
  try:
    c = ssl.create_default_context()
    m = imaplib.IMAP4_SSL(host=host, ssl_context=c)
  except ssl.CertificateError:
    return logingauges(host, 0, ...)
  except [...]
  [...]

  try:
    r = m.login(user, pw)
    [...]
  except imaplib.IMAP4.error:
    return logingauges(host, 0, ...)
  except [...]

  # success, finally.
  return logingauges(host, 1, ...)

When I first started writing this code, I only distinguished a couple of different reasons that we could fail so I passed the state of those reasons directly as additional parameters to logingauges(). As the number of failure reasons rose, this got both unwieldy and annoying, partly because adding a new failure reason required going through all existing calls to logingauges() to add a new parameter to each of them.

So I gave up. I turned all of the failure reasons into keyword arguments that defaulted to 0:

def logingauges(host, ok,
                connerr=0, loginerr=0, certerr=0,
                sslerr=0, imaperr=0):
  [...]

Now to call logingauges() on failure I only needed to supply an argument for the specific failure:

  return logingauges(host, 0, sslerr=1)

Adding a new failure reason became much more localized; I only had to add a new gauge metric to logingauges(), with a new keyword argument, and then call it from the right place.

This strikes me as pretty much a hack. The proper way is probably to create a class to hold all of this status information as attributes on instances, create an instance of it at the start of logincheck(), manipulate the attributes as appropriate, and return the instance when done. The class can even have a to_gauges() function that generates all of the actual metrics from its current values.

(In Python 3.7, I would use a dataclass, but this has to run on Ubuntu 18.04 with Python 3.6.7, so it needs to be a boring old class.)
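
A minimal sketch of that class based version (IMAPStatus and to_gauges() are the hypothetical names from above; the actual metric generation is elided):

class IMAPStatus:
    def __init__(self, host):
        self.host = host
        self.ok = 0
        self.connerr = 0
        self.loginerr = 0
        self.certerr = 0
        self.sslerr = 0
        self.imaperr = 0

    def to_gauges(self):
        # Generate all of the actual gauge metrics from the current
        # attribute values; the details don't matter here.
        ...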

However, not only do I already have the version that uses default function arguments, but the class based version would require a bunch more code and bureaucracy for what is basically a simple situation in a small program. I like doing things the right way, but I'm not sure I like it that much. As it stands, the default function arguments approach is pleasantly minimal and low overhead.

(Or maybe this is considered an appropriate use of default function arguments in Python these days. Arguments with default values are often used to set default initial values for instance attributes, and that is kind of what I'm doing here. One version of the class based approach could actually look the same; instead of calling a function, I'd return a just-created instance of my IMAPStatus class.)

(This is only somewhat similar to using default function arguments to merge several APIs together. Here it would be a real stretch to say that there are multiple APIs, one for each failure reason.)

DefaultArgumentsAvoidClass written at 22:27:44

2019-02-20

The cliffs in the way of adding tests to our Django web app

Back in August of last year, I wrote that it was time for me to start adding tests to our Django web app. Since then, the number of tests I have added is zero, and in fact the amount of work that I have done on our Django web app's code is also essentially zero (partly because it hasn't needed any modifications). Part of the reason for that is that adding tests feels like make-work, even though I know perfectly well that it's not really, but another part of it is that I'm staring at two reasonably substantial cliffs in my way.

Put simply, in order to add tests that I actually want to keep, I need to learn how to write Django tests and then I need to figure out what we want to test in our Django web app (and how). Learning how to write tests means reading through the Django documentation on this, both the quick tutorial and the real documentation. Unfortunately I think that I need to read all of the documentation before I start writing any tests, and possibly even plan to throw away the first round of tests as a learning experience. Testing a Django app is not as simple as testing standalone code; there is a test database you need to construct, an internal HTTP client so that you can write end to end tests, and so on. This is complicated by the fact that by now I've forgotten a lot of my general Django knowledge and I know it, so to some extent I'm going to have to re-learn Django (and re-learn our web app's code too).

(It's possible that I can find some quick-start tests I can write more or less in isolation. There are probably some stand-alone functions that I can poke at, and perhaps even stand-alone model behavior that doesn't depend on the database having a set of interlinked base data.)

Once I sort of know how to write Django tests, I need to figure out what tests to write and how much of them. There are two general answers here that I already know; we need tests that will let us eventually move to Python 3 with some confidence that the app won't blow up, and I'd like tests that will do at least basic checks that everything is fine when we move from Django version to Django version. Tests for a Python 3 migration should probably concentrate on the points where data moves in and out of our app, following the same model I used when I thought about DWiki's Python 3 Unicode issues. Django version upgrade tests should probably start by focusing on end to end testing (eg, 'can we submit a new account request through the mock HTTP client and have it show up').
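
As a sketch of what the end to end side looks like with Django's built-in test client (the URL and class name here are invented; our real test would POST the actual account request form):

from django.test import TestCase

class AccountRequestTests(TestCase):
    def test_request_page_renders(self):
        # The test client speaks HTTP to the app without a real server.
        resp = self.client.get("/request/")
        self.assertEqual(resp.status_code, 200)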

All of this adds up to a significant amount of time and work to invest before we start to see real benefits from it. As a result I've kept putting it off and finding higher priority work to do (or at least more interesting work). And I'm pretty sure I need to find a substantial chunk of time in order to get anywhere with this. To put it one way, the Django testing documentation is not something that I want to try to understand in fifteen minute blocks.

PS: It turns out that our app actually has one tiny little test that I must have added years ago as a first step. It's actually surprisingly heartening to find it there and still passing.

(As before, I'm writing this partly to push myself toward doing it. We now have less than a year to the nominal end of Python 2, which is not much time with everything going on.)

Sidebar: Our database testing issue

My impression is that a decent amount of Django apps can be tested with basically empty databases, perhaps putting in a few objects. Our app doesn't work that way; its operation sits on top of a bunch of interlinked data on things like who can sponsor accounts, how those accounts should be created, and so on. Without that data, the app does nothing (in fact it will probably fail spectacularly, since it assumes that various queries will always return some data). That means we need an entire set of at least minimal data in our test database in order to test anything much. So I need to learn all about that up front, more or less right away.

DjangoMyTestingCliffs written at 00:20:30
