Various aspects of Python made debugging my tarfile problem unusual

April 26, 2019

I was recently thinking about what I like when I use Python, and in the process I wound up reflecting about how working out that the tarfile module is too generous about what is a tar file was made different and easier by various aspects of Python. I'm not going to say that I couldn't have worked out a similar problem in, say, Go, but if I had, I think it would have been a relatively different experience.

One aspect of CPython specifically is that a lot of the standard library is written in Python and so intrinsically has its source code available even on a standard Python install (because the source code is what CPython will run). You don't have to try to install debugging symbols or fetch a source package; I could just go find and read it immediately. This reduced friction is part of what made me actually go digging in the first place, because it wasn't that much work to take a quick peek to see if I could figure out what was going on (then things snowballed from there).

Once I was poking at the tarfile module, another useful Python peculiarity became important. Python lets you use (or abuse) the import path to provide your own versions of modules from the standard library, preempting the stock version. I could copy my program to a scratch directory, copy the from Python distribution to the same directory, and start adding print statements and so on to understand the flow of execution through the module's code. I didn't have to change the 'import tarfile' in my own program to another name or another path, the way I would have had to in some other languages.

(This was useful for more than using a hacked for diagnosing things. It also meant that when I thought I had a workaround in my own code, I could rename my and have my program instantly revert to using the stock Python tarfile module, so I could verify that my fix wasn't being influenced by my hacks.)

Everyone cites Python's interactive interpreter and the ease of examining objects in it as great advantages, and I'm not going to argue; certainly I've used it for lots of exploration. Once I had things narrowed down to what I thought was the cause, the interactive interpreter was the fastest place to get to running code and so the best environment to quickly try out my guesses. In other languages I might have to fire up an editor to write a program or at least some tests, or craft a carefully built input file for my program.

(Technically it also sort of made for a pretty minimal reproduction case in my eventual bug report, because I implicitly assumed I didn't need to write up anything more than what would be needed to duplicate it inside an interactive interpreter.)

The cycle of editing and re-running my program to test and explore the module's behavior was probably not any faster in Python than it might have been in a non-interpreted language, but it felt different. The code I was editing was what was actually running a few moments later, not something that was going to be transformed through a build process. And for some reason, Python code often feels more mutable to me than code in other languages (perhaps because I percieve it as having less bureaucracy, due to dynamic typing and the ability to easily print out random things and so on).

Overall, I think the whole experience felt more lightweight and casual in Python than it would have in many other languages I'm familiar with. I was basically bashing things together and seeing how far I could get with relatively little effort, and the answer turned out to be all the way to a standard library bug.

Written on 26 April 2019.
« How we're making updated versions of a file rapidly visible on our Linux NFS clients
Brief notes on making Prometheus instant queries with curl »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Fri Apr 26 00:23:10 2019
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.