Wandering Thoughts

2019-08-17

A situation where Python has undefined values

In most of Python, either a name has a value or it doesn't exist and attempts to access it will fail with some variation of 'that's not defined'. You get NameError for globals and AttributeError for attributes of objects, classes, and interestingly also for modules. Similarly, accessing a nonexistent key in a dictionary gets you a KeyError, also saying that 'this doesn't exist'.

(This means that code inside a module gets a different error for a nonexistent module variable than code outside it. I think this is just an artifact of how the name is accessed.)
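To make the difference concrete, here is a small sketch that triggers each of these errors. The module name demo_mod and the name missing_name are made up for illustration; nothing here comes from a real program.

```python
import types

# A throwaway module object; 'demo_mod' and 'missing_name' are hypothetical.
demo_mod = types.ModuleType("demo_mod")
exec("def get():\n    return missing_name\n", demo_mod.__dict__)

def error_name(func):
    """Call func() and return the name of the exception class it raised."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None

inside = error_name(demo_mod.get)                     # code inside the module
outside = error_name(lambda: demo_mod.missing_name)   # code outside the module
in_dict = error_name(lambda: {}["missing_name"])      # dictionary lookup

print(inside, outside, in_dict)
```

The function defined inside the module looks the name up as a global, while the outside access is an ordinary attribute lookup on the module object.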

But local variables in functions are different and special:

>>> def afunc():
...   print(a)
...   a = 10
... 
>>> afunc()
[...]
UnboundLocalError: local variable 'a' referenced before assignment

When we do the print(), the name a exists as a local variable (at least in some sense), but its value is undefined (and an error) instead of being, say, None. If a was not even a local variable, we should get either some variant of 'name not defined' or we'd access a global a if it existed.

(I say that a exists in some sense because it doesn't fully exist; for example, it is not in the dictionary that locals() will return.)
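A small sketch of both points, assuming Python 3. The module-level a and the names seen and err are added here purely for illustration.

```python
a = "global a"

def afunc():
    seen = dict(locals())      # snapshot taken before 'a' is bound
    try:
        print(a)               # 'a' is local here, so the global is never consulted
    except UnboundLocalError as exc:
        err = exc
    a = 10
    return seen, err

seen, err = afunc()
print("a" in seen)                   # False: not in locals() yet
print(isinstance(err, NameError))    # True: UnboundLocalError subclasses NameError
```

Even though a global a exists, the function never sees it, because the later assignment makes a local for the entire function body.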

At one level this is a straightforward consequence of how local variables are implemented in CPython. All references to local variables within a function use the same fast access method, whether or not a value has been bound to the local variable. When no value has been set, you get an error.
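One way to look at this is through the function's code object. This sketch assumes CPython 3; the exact opcode names vary between versions (some releases use a checking variant of the fast-load opcode), so it only looks for the LOAD_FAST prefix instead of one specific name.

```python
import dis

def afunc():
    print(a)
    a = 10

# The compiler has already decided that 'a' is a local variable.
local_names = afunc.__code__.co_varnames

# Names that are read through the "fast locals" opcodes.
fast_loads = [ins.argval for ins in dis.get_instructions(afunc)
              if ins.opname.startswith("LOAD_FAST")]

print(local_names)
print(fast_loads)
```

The name is fixed as a local at compile time, which is why the read in print(a) goes through the local variable machinery even though nothing has been stored there yet.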

At another level, this is a sensible language design decision regardless of the specifics of the implementation. Python has decided that it has lexically scoped local variables, and this opens up the possibility of accessing a local variable before it's had a value set (unlike globals and attributes). When this happens, you have three choices; you can invent an arbitrary 'unset' value, such as None, you can generate a 'name does not exist' error, or you can generate a unique error. Python doesn't have zero values in the way that a language like Go does (fundamentally because the meaning of variables is different in the two languages), so the first choice would be unusual. The second choice would be a confusing pretense, because the name actually does exist and is in fact blocking you from accessing a global version of the name. That leaves the third choice of a unique error, which is at least clear even if it's unusual.

(This sprang from a Twitter thread.)

UndefinedLocalVariables written at 23:31:36

2019-07-17

Django 1.11 has a bug that causes intermittent CSRF validation failures

Over on Twitter, I said:

People say that Django version upgrades are easy and reliable. That is why our web app, moved from 1.10 to 1.11, is now throwing CSRF errors on *a single form* but only when 'DEBUG=False' which, you know, doesn't help debug the issue.

Last week I updated our Django web application from Django 1.10.7 to 1.11.22. Today, one of its users reported that when they tried to submit a form, the application reported:

Forbidden (403)
CSRF verification failed. Request aborted.

More information is available with DEBUG=True.

At first I expected this to be a simple case of Django's CSRF browser cookie expiring or getting blocked. However, the person reproduced the issue, and then I reproduced the issue too, except that when I switched the live web app over to 'DEBUG=True', it didn't happen, and then sometimes it didn't happen even when debugging was off.

(Our application is infrequently used, so it's not surprising that this issue didn't surface (or didn't get reported) for a week.)

There are a number of reports of similar things on the Internet, for example here, here, here, and especially Django ticket #28488. Unfortunately not only was ticket 28488 theoretically fixed years ago, but it doesn't match what I see in Firefox's Network pane; there are no 404 HTTP requests served by our Django app, just regular successful ones.

(One of these reports hints that maybe the issue involves using both sessions and CSRF cookies, which we do because sessions are a requirement for HTTP Basic Authentication, or at least they were at one point.)

The most popular workaround appears to be to stop Django from doing CSRF checks, often by setting CSRF_TRUSTED_ORIGINS to some value. My workaround for now is to revert to Django 1.10.7; it may not be supported, but it actually works reliably for us, unlike Django 1.11. I am not sure that we will ever try 1.11 again; an intermittent failure that only happens in production is a really bad thing and not something I am very enthused about risking.

(I'm not particularly happy about this state of affairs and I have low expectations for the Django people fixing this issue in the remaining lifetime of 1.11, since this has clearly been happening with 1.11 for some time. Since I'm not willing to run 1.11 in production to test and try things for the Django people, it doesn't seem particularly useful to even try to report a bug.)

Django111CSRFFailures written at 21:30:52

2019-07-10

I brought our Django app up using Python 3 and it mostly just worked

I have been worrying for some time about the need to eventually get our Django web application running under Python 3; most recently I wrote about being realistic about our future plans, which mostly amounted to not doing anything until we had to. Well, guess what happened since then.

For reasons beyond the scope of this entry, last Friday I ended up working on moving our app from Django 1.10.7 to 1.11.x, which was enlivened by the usual problem. After I had it working under 1.11.22, I decided to try running it (in development mode, not in production) using Python 3 instead of Python 2, since Django 1.11.22 is itself fully compatible with Python 3. To my surprise, it took only a little bit of cleanup and additional changes beyond basic modernization to get it running, and the result is so far fully compatible with Python 2 as well (I committed the changes as part of the 1.11 move, and since Monday they're running in production).

I don't think this is particularly due to anything I've done in our app's code; instead, I think it's mostly due to the work that Django has done to make everything work more or less transparently. As the intermediate layer between your app and the web (and the database), Django is already the place that has to worry about character set conversion issues, so it can spare you from most of those. And generally that's the big difference between Python 2 and Python 3.

(The other difference is the print statement versus 'print()', but you can make Python 2.7 work in the same way as Python 3 with 'from __future__ import print_function', which is what I did.)

I haven't thoroughly tested our web app under Python 3, of course, but I did test a number of the basics and everything looks good. I'm fairly confident that there are no major issues left, only relatively small corner cases (and then the lurking issue of how well the Python 3 version of mod_wsgi works and if there are any traps there). I'm still planning to keep us on Python 2 and Django 1.11 through at least the end of this year, but if we needed to I could probably switch over to a current Django and Python 3 with not very much additional work (and most of the work would be updating to a new version of Django).

There was one interesting and amusing change I had to make, which is that I had to add a bunch of __str__ methods to various Django models that previously only had __unicode__ methods. When building HTML for things like form <select> fields, Django string-izes the names of model instances to determine what to put in here, but in Python 2 it actually generates the Unicode version and so ends up invoking __unicode__, while in Python 3 str is Unicode already and so Django was using __str__, which didn't exist. This is an interesting little incompatibility.
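The shape of the fix, sketched on a plain class instead of a real Django model. AccountRequest and its login field are hypothetical, and a real model would subclass models.Model and have proper fields.

```python
class AccountRequest(object):
    def __init__(self, login):
        self.login = login

    def __str__(self):
        # Python 3 uses this whenever something calls str() on the instance.
        return self.login

    # Python 2's unicode() prefers __unicode__ when it exists, so keep that
    # name pointing at the same code.
    __unicode__ = __str__

label = str(AccountRequest("cks"))
print(label)
```

Aliasing the two names only works cleanly when the text is plain ASCII; anything more would need separate handling on Python 2.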

Sidebar: The specific changes I needed to make

I'm going to write these down partly because I want a coherent record, and partly because some of them are interesting.

  • When generating a random key to embed in a URL, read from /dev/urandom using binary mode instead of text mode and switch from an ad-hoc implementation of base64.urlsafe_b64encode to using the real thing. I don't know why I didn't use the base64 module in the first place; perhaps I just didn't look for it, since I already knew about Python 2's special purpose encodings.

  • Add __str__ methods to various Django model classes that previously only had __unicode__ ones.

  • Switch from print statements to print() as a function in some administrative tools the app has. The main app code doesn't use print, but some of the administrative commands report diagnostics and so on.

  • Fix mismatched tabs versus spaces indentation, which snuck in because my usual editor for Python used to use all-tabs and now uses all-spaces. At some point I should mass-convert all of the existing code files to use all-spaces, perhaps with four-space indentation.

  • Change a bunch of old style exception syntax, 'except Thing, e:', to 'except Thing as e:'. I wound up finding all of these with grep.

  • Fix one instance of sorting a dictionary's .keys(), since in Python 3 this returns a dictionary view instead of a list, and a view has no .sort() method.
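Two of these changes are easy to sketch in a form that runs on both Python 2.7 and Python 3. The dictionary contents and the parse_port function are invented for illustration and are not from the app.

```python
counts = {"carol": 3, "alice": 1, "bob": 2}

# Python 2 style was: keys = counts.keys(); keys.sort()
# sorted() accepts any iterable, so it works the same on both versions.
keys = sorted(counts)

def parse_port(text):
    try:
        return int(text)
    except ValueError as e:      # old style was: except ValueError, e:
        return "bad port: %s" % e

print(keys)
print(parse_port("80"))
print(parse_port("eighty"))
```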

Many of these changes were good ideas in general, and none of them are ones that I find objectionable. Certainly switching to just using base64.urlsafe_b64encode makes the code better (and it makes me feel silly for not using it to start with).
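A minimal sketch of the random key generation after the change, assuming a Unix system with /dev/urandom. The function name random_url_key and the default of 18 bytes are my inventions here, not what the app actually uses.

```python
import base64

def random_url_key(nbytes=18, path="/dev/urandom"):
    # Binary mode matters: on Python 3, text mode would try to decode the
    # random bytes as characters.
    with open(path, "rb") as f:
        raw = f.read(nbytes)
    # urlsafe_b64encode returns bytes; decode them to get text for a URL.
    return base64.urlsafe_b64encode(raw).decode("ascii")

key = random_url_key()
print(len(key))
```

Picking a byte count that is a multiple of three keeps the '=' padding characters out of the result.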

DjangoAppPython3Surprise written at 21:46:22

2019-07-04

Django's goals are probably not our goals for our web application

Django bills itself as "the web framework for perfectionists with deadlines". As a logical part of that, Django is always working to improve itself, as are probably almost all frameworks. For people with actively developed applications (perfectionists or otherwise), this is fine. They are working on their app anyway, constantly making other changes and improvements and adjustments, so time and Django updates will deliver a continuous stream of improvements (along with a certain amount of changes they have to make to keep up, but again they're already making changes).

This does not describe our goals or what we do with our web application. What we want is to write our app, reach a point where it's essentially complete (which we pretty much achieved a while ago), and then touch it only on the rare occasions when there are changes in the requirements. Django provides what we need in terms of features (and someone has to write that code), but it doesn't and never will provide the stability that we also want. Neither sharks nor frameworks for perfectionists ever stand still.

This creates an awkward mismatch between what Django wants us to do and what we want to do, one that I have unfortunately spent years not realizing and understanding. In particular, from our perspective the work of keeping up with Django's changes and evolution is almost pure overhead. Our web application is running fine as it is, but every so often we need to go change it in order to nominally have security fixes available, and in completely unsurprising news I'm not very enthusiastic or active about doing this (not any more, at least; I was in the beginning). The latest change we need is an especially large amount of work, as we will have to move from Python 2 to Python 3.

(We don't need bug fixes because we aren't running into bugs. If we were, we probably would have to work around them anyway rather than wait for a new Django release.)

I don't know what the solution is, or even if there is a solution (especially at this point, with our application already written for Django). I expect that other frameworks (in any language) would have the same bias towards evolution and change that Django does; most users of them, especially big active ones, are likely people who have applications that are being actively developed on a regular basis. I suspect that 'web frameworks for people who want to write their app and then walk away from it' is not a very big niche, and it's not likely to be very satisfying for open source developers to work on.

(Among other structural issues, as a developer you don't get to do anything. You write your framework, fix the bugs, and then people like me want you to stop improving things.)

PS: I don't think this necessarily means that we made a bad choice when we picked Django way back when, because I'm not sure there was a better choice to be made. Writing our web app was clearly the right choice (it has saved us so much time and effort over the years), and using a framework made that feasible.

DjangoGoalsNotOurGoals written at 21:30:02

2019-06-29

Being realistic about what we're going to do with our Django app

One of our biggest problem points for moving away from Python 2 is our Django app, which handles all of the workflow when people request new accounts. Back in last August I wrote about how it needed tests, and then in February I wrote about that again, and now it is almost July and guess what, our app still has no tests. There is a pattern here, and given that pattern I think it's time for me to get realistic about what we're going to do with our app in the next few years and how that's going to work. Being realistic doesn't leave me with pleasant answers, but at least I can try to be honest with myself for once instead of pretending.

(The problem with pretending is that I wind up not preparing for what actually happens.)

Our app is currently running on an Ubuntu 18.04 machine under Python 2 and mod_wsgi. This combination can keep running until early 2023 and we're going to do that unless there is a critical reason not to do so. By mid 2022 we should know whether or not Ubuntu 22.04 LTS will allow us to keep on running the Python 2 version with mod_wsgi; if it can, we will quite likely continue on with that until mid 2026 makes this issue something we can't ignore any more. At this point, keeping the app Python 2 until Ubuntu 18.04 support runs out is basic realism; it seems pretty unlikely that I will get around to porting the app to Python 3 in the remaining five months or so of 2019.

(We could probably switch to CentOS 8 for even longer support of Python 2, but this particular app is not worth going to that much effort and annoyance.)

At this point everyone notes that the last version of Django that supports Python 2 is 1.11, and support for 1.11 runs out at the end of this year. This is a good argument in theory, but in practice we are already running on an unsupported Django version, as we are back at Django 1.10.7 at the moment (as we have been since 2017 because Django updates are a pain at the best of times). Running an unsupported version of Django is nothing new for us; instead, it's unfortunately become the default state of affairs. I want to try to update the application to Django 1.11 at some point for various hand waving reasons, which hopefully won't be too much work. Possibly this means that we should switch to using the Ubuntu 18.04 packaged version of Django 1.11, even though I didn't think that was a good idea last November. If we're going to run an unsupported Django, it might as well be a version that someone might be keeping an eye on.

Does this present a security risk? Somewhat, but my view is that it's a relatively low one. Almost all of the web app is locked away behind Apache's HTTP basic authentication and restricted to a small number of trusted users only (and the Django admin interface is even more restricted). The exposed app surface is relatively low and relatively simple; we have a couple of basic forms and that's it (and one endpoint for AJAX that gives a yes/no answer to whether or not something is an available Unix login). Also, nothing permanent is done automatically by the app; a human is always in the loop before an account is actually created.

(It's possible that a Django vulnerability could be leveraged to attack other web things through our app, through CSRF or the like. But that would be a pretty targeted attack against the department by someone who would have to know a fair bit about how the app works, who uses it, and what else they interact with that can be attacked. Obviously the catastrophic scenario would be a remote code execution flaw that could be exploited through a basic URL view or form submission, but that seems unlikely.)

Wanting to write Django tests doesn't seem to have done much good, so my alternate plan for a Python 3 port is simply to try running our web app under Python 3, probably with Django 1.11 to keep things simple. If and when I find code that should be modernized anyway or changes that still keep things compatible with Python 2, I can fix them in the production codebase to make it more and more ready for Python 3. My hope is that a great deal of this can be done with clean changes that do not have to be conditional on Python 2 versus Python 3 but are simply good ideas in general. I also hope that the simplicity of our application, combined with Django handling a lot of stuff for us behind the scenes, will mean that running it under Python 3 mostly just works. We won't have the assurance that tests would give us, but in practice I can manually exercise things and declare the result good enough.

One big issue for Python 3 code is character set conversion and especially points where Python 3's automatic conversions can fail on you. For this, we're going to punt. I'm not going to try to harden the application to deal with character set decoding problems with the few data files that it reads; in our environment we can guarantee that they're always ASCII and so will always decode correctly. Similarly, we're always going to encode to the system default of UTF-8 when writing out files, which means that it too always works. Hopefully this means that I can ignore almost all of those issues in the Python 3 version of the app, which is what the Python 2 version is already doing.
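If one did want these assumptions written down in the code instead of left to the defaults, a sketch might look like the following. The helper names and the logins.txt file are hypothetical; io.open is used because it accepts an encoding on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile

def read_ascii_lines(path):
    # Raises UnicodeDecodeError if the file contains any non-ASCII byte.
    with io.open(path, "r", encoding="ascii") as f:
        return [line.rstrip("\n") for line in f]

def write_utf8(path, text):
    # UTF-8 can encode any text, so this never fails on the encoding itself.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "logins.txt")
write_utf8(path, u"cks\nalice\n")
lines = read_ascii_lines(path)
print(lines)
```

With files that really are always ASCII, this behaves the same as relying on the defaults; the difference only shows up when the guarantee is broken.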

(There are some places where I will want to require ASCII, but they're already points where I should be doing that, like the Unix login name that people choose, and so I should add these checks to the current version of the application.)
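A check like that can be quite small. This is a Python 3 sketch with a made-up function name, and it only tests for ASCII; it does not try to enforce any other rules about what a valid Unix login looks like.

```python
def is_ascii_login(login):
    """Return True if the text can be encoded as pure ASCII."""
    try:
        login.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

print(is_ascii_login("cks"))
print(is_ascii_login("ck\u015b"))
```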

This will probably leave the Python 3 version of the application vulnerable to throwing exceptions if people put in weird characters in forms or do other things, but if that happens we actually don't care too much. The app is not used much (people don't request accounts all that often), and it's not too critical an issue if the app's not working for a few days while we fix the code to be more defensive or de-mangle things from its tiny little database.

(The app's database is so small that if we have to, we can dump it to plain text, edit the plain text, and recreate a new db from that. It is, naturally, a SQLite database.)
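The dump, edit, and recreate cycle can be sketched with the standard sqlite3 module. The requests table and its contents are invented for illustration (the real schema is whatever Django created), and in-memory databases stand in for the real database file.

```python
import sqlite3

# A stand-in database with one mangled value in it.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE requests (login TEXT, name TEXT)")
src.execute("INSERT INTO requests VALUES (?, ?)", ("cks", "Chr\u00efs"))
src.commit()

# Dump to plain SQL text, 'edit' the text, and load it into a fresh database.
dump = "\n".join(src.iterdump())
fixed = dump.replace("Chr\u00efs", "Chris")

dst = sqlite3.connect(":memory:")
dst.executescript(fixed)
rows = dst.execute("SELECT login, name FROM requests").fetchall()
print(rows)
```

In practice the editing step would be done by hand in an editor on the dumped text instead of with a string replace.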

All of this is setting a relatively low quality standard for the eventual Python 3 version, but at this point that's realism. The app is neither a high enough priority nor interesting enough for us to do it any better, not unless I suddenly get a vast gulf of free time with nothing else to work on.

PS: Facing up to reality here has also made me realize some things about Django and us, but that's for another entry.

DjangoAppBeingRealistic written at 20:45:06

2019-06-24

The convenience (for me) of people writing commands in Python

The other day I was exploring Certbot, which is more or less the standard and 'as official as it ever gets' client for Let's Encrypt, and it did something that I objected to. Certbot is a very big program with a great many commands, modes, options, settings, and so on, and this was the kind of thing where I wasn't completely confident there even was a way to disable it. However, sometimes I'm a system programmer and the particular thing had printed a distinctive message. So, off to the source code I went with grep (okay, ripgrep), to find the message string and work backward from there.

Conveniently, Certbot is written in Python, which has two advantages here. The first advantage is that I actually know Python, which makes it easier to follow any logic I need to follow. The second is that Python programs intrinsically come with their source code, just as the standard library does. Certbot is open source and I was installing Ubuntu's official package for it, which gave me at least two ways of getting the source code, but there's nothing like not even having to go to the effort.

I was going to say that this is intrinsic in Python being an interpreted language, but that's not necessarily the case; instead, it's that way for both cultural and technical reasons. Java is generally interpreted by the JVM, but that's Java bytecode, not Java source code. Javascript is interpreted from its source code, but people who ship Javascript often ship it in minimized and compacted form for performance reasons, and in practice this makes it unreadable (at least for my purposes).

(And then there's WebAssembly.)

Another cultural aspect of this is that a lot of commands written in Python are written in relatively straightforward ways that are easy to follow; you can usually grep through the code for what function something is in, then what calls that function, and so on and so forth. This is not a given and it's quite possible to create hard to follow tangles of magic (I've sort of done this in the past) or a tower of classes inside classes that are called through hard to follow patterns of delegation, object instantiation, and so on. But it's at least unusual, especially in relatively straightforward commands and in code bases that aren't too large.

(Things that are part of a framework may be a different story.)

PS: Certbot is on the edge of 'large' here, but for what I was looking for it was still functions calling functions.

PPS: That installing a Python thing gives you a bunch of .py files on your filesystem is not a completely sure thing. I believe that there are Python package and module distribution formats that don't unpack the .py files but leave them all bundled up, although the current Wheel format is apparently purely for distribution, not running in place. I am out of touch with the state of Python package distribution, so I don't know how this goes if you install things yourself.

SourceCodeIncluded written at 21:46:25

2019-05-30

Conditional expressions in any form are an attractive thing

In a recent entry I mentioned in passing that I once had relatively strong feelings about Python's 'A if COND else B' conditional expressions but those had probably faded away. In a comment, Twirrim said:

I've started to use the A if FOO else BAR syntax, much to my surprise. In general, I hate it.

One increasingly common pattern in the code I write:

logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)

(or variations thereof, if I'm using a CLI framework like click)

Yes, very much this. One of my feelings about almost any form of ternary operator or conditional expression is that having it at all is so attractive that people will use almost any syntax that you come up with, regardless of what they feel about the syntax. Condensing a multi-line set of statements down to a single expression is sufficiently compelling that people will put up with a great deal to get it. I'll go so far as to say that people will willingly make their code less readable to get it.

There are ways around needing a conditional expression in situations like this, and I have probably adopted some of them in my code; for example, I might initialize a global 'log level' variable or setting based on things like the verbosity level the user has set on the command line. Whether or not this is a good thing is probably in the eye of the beholder, and I'm sure that some people will say that the best code is the one that spells it out explicitly (perhaps in a function that you call to determine the log level).
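As one possible version of the 'spell it out in a function' approach, here is a sketch using argparse. The -v/--verbose option mirrors args.verbose from the comment; the log_level function name is mine.

```python
import argparse
import logging

def log_level(args):
    """Decide the logging level from the parsed command line arguments."""
    if args.verbose:
        return logging.DEBUG
    return logging.INFO

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="store_true")

# A fixed argument list is used here so the sketch runs without a real command line.
args = parser.parse_args(["-v"])
logging.basicConfig(level=log_level(args))
```

This is four lines instead of one expression, which is more or less the trade-off that makes the conditional expression so tempting.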

(In my view, the obvious corollary of how attractive conditional expressions are is that it's important to give them good syntax. Unlike other language constructs, where sufficiently annoying syntax may lead to them not being used, conditional expressions will likely get used no matter what. If your syntax is less than ideal, it'll still be all over code bases sooner or later.)

Sidebar: My use of conditional expressions has now surprised me

In my first entry, I claimed that I hadn't used conditional expressions yet. That was based on grep'ing an assortment of code that I had on hand, but it turns out that I wasn't complete enough. More extensive searching turned up at least two places (and now a third). First, our Django app has one usage, which Mercurial tells me dates from 2013. Of course, Python's conditional expressions are also very old; they first appeared in Python 2.5, which was released in September of 2006.

More embarrassingly, the source code for DWiki turns out to have several uses, and some of these are reasonably complex, where I wrote things like:

avar = b if b else c.thing if c else None

I'm not sure if this nested code is a good idea, especially without ()'s to make ordering clear, but for what it's worth I can sort of read it out even now, several years after I wrote it.
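For the record, Python groups a chain like this from the right, so the second conditional expression is the 'else' branch of the first. A small sketch, where the class C and the function names are stand-ins for illustration:

```python
class C(object):
    thing = "thing"

def pick(b, c):
    return b if b else c.thing if c else None

def pick_explicit(b, c):
    # The same expression with the implicit grouping written out.
    return b if b else (c.thing if c else None)

cases = [("b", None), ("", C()), ("", None), ("b", C())]
results = [pick(b, c) for b, c in cases]
print(results)
```

The first case also shows that c.thing is never evaluated when b is true, which is why passing None for c is safe there.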

(And I also used it in one place in a recent little utility program. As you might guess from my difficulties here, our Python code is scattered all over.)

ConditionalExpressionAttraction written at 00:25:49

2019-05-26

Why I no longer have strong feelings about changes to Python

A while back I wrote about when I'll probably be able to use the (then) very contentious Python assignment expressions and said in passing that I didn't have any strong opinions on them. There was probably a time when I would have felt differently; for example, I used to have relatively strong feelings about 'A if CONDITION else B' conditional expressions and I'm not sure I do any more, although I don't seem to have used them yet in any recent code.

(Since I had to look up the syntax just now, that's probably partly because I just didn't remember how to write them.)

There are two ways of putting why I no longer have strong opinions here. The first is that I no longer really care what Python looks like. The second is that I have lost my mental picture of Python; what was once clear enough for me to have opinions on how things looked has dissolved into a muddle. These two reasons are related, of course, with each contributing to the other. Since Python has turned into a complex muddle of a language, with more features and syntax than I can keep track of at my current relatively low and infrequent usage of Python, one more piece of syntax makes little difference to me and I can no longer have any opinions on conceptual unity, Pythonicness, or the like.

Will I ever use assignment expressions even once they're available to me? Probably not, honestly, unless I find some code where they would make things much clearer and I remember that they exist. In practice, a fair number of new Python 3 features have not been compelling enough to get me to even think about using them (or, to put it more bluntly, to remember them in any detail).

(There is also the issue that a lot of my current Python code is written for work, and there I run into the same issue as my pragmatic problem with using the attrs module in work code. In practice this concern is probably overblown. Among other things, I suspect my current Python code is less readable than I think.)

PS: My detachment from modern Python 3 is not exactly a strength and perhaps someday I should reset it, perhaps by going through the current Python 3 tutorial. Sooner or later I should learn how to write modern Python 3 code, however that looks and whatever modern idioms the community has settled on as good practices.

NoMoreStrongFeelings written at 21:45:26

2019-05-10

Some thoughts on Red Hat Enterprise 8 including Python 2 and what it means

Red Hat Enterprise 8 was released the other day, and now Red Hat has published an article on Python 2 (and 3) in RHEL 8 (via). The short version is that they aren't providing a package called 'python' but instead two packages called 'python3' and 'python2' (or two 'application streams' for Python 2 and 3, which come with additional packages). Although it's not entirely clear, Red Hat is apparently not going to have a /usr/bin/python symlink by default, leaving it up to you to set one up through their alternatives system. Red Hat is recommending that you explicitly use 'python2' or 'python3' as the name of the script interpreter in '#!' lines, instead of relying on just the 'python' name.

(The presence of a 'python2' binary name is not new in RHEL 8; one was present at least as far back as RHEL 6. Also, this may or may not backtrack on things Red Hat said a year ago.)

In a way, the big news here is that RHEL 8 includes Python 2 at all as an official package, since RHEL 8 will be supported for probably somewhere around a decade (and they'd previously sort of suggested that they weren't going to). Unless Red Hat officially abandons doing any updates for Python 2 at some point, this means that they'll be supporting it (at least as far as fixing any security issues that are discovered) for much of that decade, and since their work here is open source, other people can take advantage of it. I suspect that Red Hat is not entirely happy with this, but I also suspect that they felt they had no choice for various reasons.

(I rather expect Python 2 to not be included in a future Red Hat Enterprise 9, which might be released somewhere around 2023 or 2024 based on past history. Unless Red Hat gets a lot of push back from customers, I suspect that RHEL 8 will be the only dual-Python RHEL release.)

I suspect that this makes it somewhat more likely than it already was that Ubuntu 20.04 LTS will include Python 2. At the moment, Python 2 is part of the bleeding edge Ubuntu rolling version and is still apparently part of the 'main' package repository. That could change before 20.04 LTS freezes and branches, but Ubuntu is running out of time to do that and, more importantly, they're running out of pre-LTS releases to do it in; there would normally only be 19.10, due out in October. Since RHEL 8 includes Python 2, including Python 2 in Ubuntu is safer in that Ubuntu can probably rely on copying Red Hat's fixes, if any are needed.

(Also, per this 2018 LWN article, Debian will be shipping Python 2 with their next distribution, which they're in the process of trying to release at the moment. I believe that Debian wants to strip out Python 2 after that, but I wouldn't necessarily expect fast movement on that, and Ubuntu probably won't be more aggressive than Debian here.)

None of this means that people using Python 2 are completely safe. For a start, Python based packages and systems have been moving away from supporting Python 2 for some time. For an example that's relevant to us, the last Django version that supports Python 2 is 1.11, which itself will only be supported until April 2020 (cf). Unless we want to count on Ubuntu 18.04's packaging of Django (and we don't), the presence of Python 2 in Ubuntu 20.04 will not be too relevant for our Django web application. These days, we also install some popular Python packages for GPU computation and so on, and they're very likely to be Python 3 only soon if they aren't already (I haven't checked the current state of things like Tensorflow). And even if Ubuntu 20.04 includes Python 2, Ubuntu 22.04 might not, and that's not all that far away.

I also suspect that even when Python 2 is available in some form, more future distributions will follow RHEL 8's model and try not to provide a /usr/bin/python that points to it, especially on completely new installs (which is our usual case). We can try to fight this, but I suspect that we're better off changing our Python (2) programs to use '#!/usr/bin/python2'. Our users may force our hands, though, if they object strongly enough to there not being a 'python'.

(Slowly making that change may give us a chance to inventory just how many Python programs we actually have rattling around this place. The answer is probably 'more than I thought we did', since we've been writing various random things in Python for quite a while now.)
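As a rough sketch of what such an inventory could look like, here is a small Python 3 scan for scripts whose '#!' line names a bare 'python' with no version. The function name and the regular expression are my own illustration, not something we actually run, and it only looks at the first line of each file, so it will miss Python programs that are started in some other way.

```python
import os
import re

# Matches an interpreter whose basename is exactly "python": it accepts
# "/usr/bin/python" and "env python", but not "python2" or "python3.8".
BARE_PYTHON = re.compile(r"(?:^|[/\s!])python(?=\s|$)")

def unversioned_python_scripts(root):
    """Yield paths under root whose '#!' line names a bare 'python'."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    first = f.readline(256)
            except OSError:
                continue  # unreadable files are skipped, not fatal
            if not first.startswith(b"#!"):
                continue
            # latin-1 never fails to decode, which is all we need here.
            if BARE_PYTHON.search(first.decode("latin-1")):
                yield path
```

Something like 'for p in unversioned_python_scripts("/some/dir"): print(p)' would then list the candidates for a '#!/usr/bin/python2' conversion.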

Python2AndRHEL8 written at 23:44:23

2019-04-26

Various aspects of Python made debugging my tarfile problem unusual

I was recently thinking about what I like when I use Python, and in the process I wound up reflecting about how working out that the tarfile module is too generous about what is a tar file was made different and easier by various aspects of Python. I'm not going to say that I couldn't have worked out a similar problem in, say, Go, but if I had, I think it would have been a relatively different experience.

One aspect of CPython specifically is that a lot of the standard library is written in Python and so intrinsically has its source code available even on a standard Python install (because the source code is what CPython will run). You don't have to try to install debugging symbols or fetch a source package; I could just go find tarfile.py and read it immediately. This reduced friction is part of what made me actually go digging in the first place, because it wasn't that much work to take a quick peek to see if I could figure out what was going on (then things snowballed from there).
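If you don't know offhand where a standard library module lives, Python will tell you. This is a minimal sketch using inspect.getsourcefile(); the helper name is just for illustration, and it assumes an ordinary install where the .py files are present on disk.

```python
import inspect
import tarfile

def stdlib_source_path(module):
    """Return the path of the Python source file for a module.

    This only works for modules written in Python; modules built into
    the interpreter (such as sys) have no source file to report.
    """
    return inspect.getsourcefile(module)

print(stdlib_source_path(tarfile))
```

The printed path is the file to open in an editor (or to copy somewhere for experiments).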

Once I was poking at the tarfile module, another useful Python peculiarity became important. Python lets you use (or abuse) the import path to provide your own versions of modules from the standard library, preempting the stock version. I could copy my program to a scratch directory, copy the tarfile.py from the Python distribution to the same directory, and start adding print statements and so on to understand the flow of execution through the module's code. I didn't have to change the 'import tarfile' in my own program to another name or another path, the way I would have had to in some other languages.

(This was useful for more than using a hacked tarfile.py for diagnosing things. It also meant that when I thought I had a workaround in my own code, I could rename my tarfile.py and have my program instantly revert to using the stock Python tarfile module, so I could verify that my fix wasn't being influenced by my tarfile.py hacks.)
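Here is a small self-contained sketch of that effect. It builds a throwaway directory with a tiny program in it, optionally drops a stand-in tarfile.py next to it, and reports which file 'import tarfile' actually loaded. The function and file names are made up for the demonstration. It assumes the usual behavior where the script's own directory comes first on the import path; Python 3.11 and later can turn that off with -P or PYTHONSAFEPATH, so the sketch clears that variable for the child process.

```python
import os
import subprocess
import sys
import tempfile

def which_tarfile_is_imported(shadow):
    """Return (loaded_path, scratch_dir) for a script that imports tarfile.

    If shadow is true, a local tarfile.py is placed beside the script.
    """
    with tempfile.TemporaryDirectory() as tmp:
        scratch = os.path.realpath(tmp)
        prog = os.path.join(scratch, "prog.py")
        with open(prog, "w") as f:
            f.write("import tarfile\nprint(tarfile.__file__)\n")
        if shadow:
            # Stand-in for a copy of the stock module with prints added.
            with open(os.path.join(scratch, "tarfile.py"), "w") as f:
                f.write("# hacked-up local copy would go here\n")
        env = dict(os.environ)
        env.pop("PYTHONSAFEPATH", None)
        result = subprocess.run(
            [sys.executable, prog],
            capture_output=True, text=True, check=True, env=env,
        )
        loaded = os.path.realpath(result.stdout.strip())
        return loaded, scratch
```

With shadow set, the loaded path is inside the scratch directory; without it, the same unchanged program falls back to the stock module, which is the 'rename my tarfile.py' trick in miniature.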

Everyone cites Python's interactive interpreter and the ease of examining objects in it as great advantages, and I'm not going to argue; certainly I've used it for lots of exploration. Once I had things narrowed down to what I thought was the cause, the interactive interpreter was the fastest place to get to running code and so the best environment to quickly try out my guesses. In other languages I might have to fire up an editor to write a program or at least some tests, or craft a carefully built input file for my program.

(Technically it also sort of made for a pretty minimal reproduction case in my eventual bug report, because I implicitly assumed I didn't need to write up anything more than what would be needed to duplicate it inside an interactive interpreter.)
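To give a flavor of the sort of quick experiment I mean, here is a sketch of checking what tarfile.open() will and won't accept, using in-memory data so that no input files need to be crafted. This is not the case from my bug report; the helper names and the sample data are invented for illustration.

```python
import io
import tarfile

def opens_as_tar(data):
    """Report whether tarfile.open() accepts these bytes as an archive."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data)) as tf:
            tf.getmembers()  # force the whole archive to be scanned
        return True
    except tarfile.ReadError:
        return False

def make_small_tar():
    """Build a one-member uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        payload = b"hello\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

print(opens_as_tar(make_small_tar()))
print(opens_as_tar(b"this is plain text\n" * 10))
```

In the interactive interpreter you would type the calls directly and vary the bytes by hand, which is where the speed comes from.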

The cycle of editing tarfile.py and re-running my program to test and explore the module's behavior was probably not any faster in Python than it might have been in a non-interpreted language, but it felt different. The code I was editing was what was actually running a few moments later, not something that was going to be transformed through a build process. And for some reason, Python code often feels more mutable to me than code in other languages (perhaps because I perceive it as having less bureaucracy, due to dynamic typing and the ability to easily print out random things and so on).

Overall, I think the whole experience felt more lightweight and casual in Python than it would have in many other languages I'm familiar with. I was basically bashing things together and seeing how far I could get with relatively little effort, and the answer turned out to be all the way to a standard library bug.

DebuggingTarfileThoughts written at 00:23:10
