In Python, using the logging package is part of your API, or should be

November 4, 2020

We have a Python program for logging email attachment type information. As part of doing this, it wants to peer inside various sorts of archive types to see what's inside of them, because malware puts bad stuff there. One of the Python modules we use for this is the Ubuntu packaged version of libarchive-c, which is a Python API for libarchive. Our program prints out information in a very specific output format, which our Exim configuration then reads and makes use of.

Very recently, I was looking at our logs for an email message and noticed that it had a very unusual status report. Normal status reports look like this:

1kX88D-0004Mb-PR attachment application/zip; MIME file ext: .zip; zip exts: .iso

This message's status report was:

Pathname cannot be converted from UTF-16BE to current locale.

That's not a message that our program emits. It's instead a warning message from the C libarchive library. However, it is not printed out directly by the C code; instead this report is passed up as an additional warning attached to the results of library calls. It is libarchive-c that is deciding to print it out, in a general FFI support function. More specifically, libarchive-c is deciding to 'log' it through the Python logging package; the default logging environment then prints it out to standard error.

(Our program does not otherwise use logging, and I had no idea it was in use until I tried to track this down.)

A program's output is often part of its API in practice. When code does things that in default conditions produces output, this alters the API of the program it is in. This should not be done casually. If warning information should be exposed, then it should be surfaced through an actual API (an accessible one), not thrown out randomly. If your code does use logging, this should be part of its documented API, not stuffed away in a corner as an implementation detail, because people will quite reasonably want to know this (so they can configure logging in general) and may want to turn it off.

In a related issue, notice that libarchive-c constructs the logger it will use at import time (here), before your Python code normally will have had a chance to configure logging, and will even use it at import time under some circumstances (here and here), as it is dynamically building some bindings. I suspect that it is far from alone as far as constructing and even using its logger at import time goes.

(It's natural to configure logging as part of program startup, in a main() function or something descending from it, not at program load time before you start doing imports. This is especially the case since how you do logging in a program may depend on command line arguments or other configuration information.)

(This is the background for this tweet of mine.)

Written on 04 November 2020.
« You shouldn't use the Linux dump program any more (on extN filesystems)
It's possible that the real size of different SSDs is now consistent »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Nov 4 23:45:30 2020
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.