Wandering Thoughts archives

2020-11-04

In Python, using the logging package is part of your API, or should be

We have a Python program for logging email attachment type information. As part of this, it peers inside various types of archives to see what they contain, because malware puts bad stuff there. One of the Python modules we use for this is the Ubuntu packaged version of libarchive-c, which is a Python API for libarchive. Our program prints out information in a very specific output format, which our Exim configuration then reads and makes use of.

Very recently, I was looking at our logs for an email message and noticed that it had a very unusual status report. Normal status reports look like this:

1kX88D-0004Mb-PR attachment application/zip; MIME file ext: .zip; zip exts: .iso

This message's status report was:

Pathname cannot be converted from UTF-16BE to current locale.

That's not a message that our program emits. It's instead a warning message from the C libarchive library. However, it is not printed out directly by the C code; instead this report is passed up as an additional warning attached to the results of library calls. It is libarchive-c that is deciding to print it out, in a general FFI support function. More specifically, libarchive-c is deciding to 'log' it through the Python logging package; the default logging environment then prints it out to standard error.

(Our program does not otherwise use logging, and I had no idea it was in use until I tried to track this down.)

A program's output is often part of its API in practice. When code does things that under default conditions produce output, this alters the API of the program it is in. This should not be done casually. If warning information should be exposed, then it should be surfaced through an actual API (an accessible one), not thrown out randomly. If your code does use logging, this should be part of its documented API, not stuffed away in a corner as an implementation detail, because people will quite reasonably want to know this (so they can configure logging in general) and may want to turn it off.
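Once you know that a module logs, turning it off from the program's side can be sketched like this. The logger name "hypothetical_library" is an assumption for illustration; the real name has to be dug out of the module's documentation or source, which is exactly why it ought to be documented.

```python
import contextlib
import io
import logging


def silence_library_logger(name):
    """Keep a library's log messages away from every handler.

    The NullHandler counts as 'a handler was found', so the last-resort
    stderr handler is not used, and propagate = False stops messages
    from travelling up to the root logger.
    """
    logger = logging.getLogger(name)
    logger.addHandler(logging.NullHandler())
    logger.propagate = False
    return logger


silence_library_logger("hypothetical_library")

# Demonstration: neither the named logger nor its children reach stderr.
captured = io.StringIO()
with contextlib.redirect_stderr(captured):
    logging.getLogger("hypothetical_library").warning("this goes nowhere")
    logging.getLogger("hypothetical_library.ffi").warning("neither does this")
silenced_output = captured.getvalue()
```

This silences everything from that logger, including messages you might actually want, so it is a blunt instrument rather than a substitute for the library exposing warnings through a real API.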

In a related issue, notice that libarchive-c constructs the logger it will use at import time, before your Python code will normally have had a chance to configure logging, and will even use it at import time under some circumstances, as it is dynamically building some bindings. I suspect that it is far from alone in constructing and even using its logger at import time.

(It's natural to configure logging as part of program startup, in a main() function or something descending from it, not at program load time before you start doing imports. This is especially the case since how you do logging in a program may depend on command line arguments or other configuration information.)
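A rough sketch of that startup pattern follows. The --log-file and --quiet options and the "hypothetical_library" logger name are invented for illustration. A logger created at import time still follows configuration applied later, because handlers are looked up when a message is emitted; what you can't get back is anything the library already logged during import, before main() ran.

```python
import argparse
import logging
import sys

# Stand-in for a library that creates its logger when it is imported.
library_logger = logging.getLogger("hypothetical_library")


def configure_logging(args):
    """Set up logging based on (hypothetical) command line options."""
    level = logging.ERROR if args.quiet else logging.WARNING
    # force=True (Python 3.8+) replaces any handlers already on the root logger.
    if args.log_file:
        logging.basicConfig(filename=args.log_file, level=level, force=True)
    else:
        logging.basicConfig(stream=sys.stderr, level=level, force=True)


def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--log-file", default=None)
    parser.add_argument("--quiet", action="store_true")
    args = parser.parse_args(argv)

    configure_logging(args)
    # From here on the library's logger follows our configuration, even
    # though it was created before configure_logging() ran.
    library_logger.warning("emitted after configuration")


if __name__ == "__main__":
    main(sys.argv[1:])
```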

(This is the background for this tweet of mine.)

python/LoggingPackageAndYourAPI written at 23:45:30

You shouldn't use the Linux dump program any more (on extN filesystems)

When I upgraded my office workstation to Fedora 32, one of the things that happened is that Amanda backups of its root filesystem stopped working. The specific complaint from Amanda was a report of:

no size line match in /usr/lib64/amanda/rundump (xfsdump) output

This happened because of Fedora bug 1830320, adequately summarized as "ext4 filesystem dumped with xfsdump instead of dump". The cause of this is that Fedora 32's Amanda RPMs are built without the venerable dump program and so do not try to use it. Instead, if you tell Amanda to back up a filesystem using the abstract program "DUMP", Amanda always uses xfsdump regardless of what the filesystem type is, and naturally xfsdump fails on extN filesystems.

I have historically used various versions of the Unix *dump family of programs because I felt that a filesystem specific tool was generally going to do the best job of fully backing up your filesystem, complete with whatever peculiar things it had (starting with holes in your files). ZFS has no zfsdump (although I wish that it did), so most of my workstation's filesystems are backed up with tar, but my root filesystem is an extN one and I used dump. Well, I used to use dump.

At first I was irritated with Fedora packaging and planned to say grumpy things about it. But then I read more, and discovered that this Amanda change is actually a good idea, because using Linux dump isn't a good idea any more. The full story is in Fedora bug 1884602, but the short version is that dump hasn't been updated to properly handle modern versions of extN filesystems and won't be, because it's unmaintained. To quote the bug:

Looking at the code it is very much outdated and will not support current ext4 features, in some cases leading to corrupted files without dump/restore even noticing any problems.

Fedora is currently planning to keep the restore program around so that you can restore any dump archives you have, which I fully support (especially since the Linux restore is actually pretty good at supporting various old dump formats from other systems, which can be handy).

I have some reflexes around using 'dump | restore' pipelines to copy extN filesystems around, which I now need to change. Probably tar is better than rsync for this particular purpose.
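For illustration, a 'tar | tar' replacement for such a pipeline might be sketched as below. This assumes GNU tar is installed as "tar"; the function name copy_tree_with_tar is made up, and the option set is only a starting point. In particular it does not ask tar to carry extended attributes, ACLs, or SELinux labels, which you would want to check for a real root filesystem copy.

```python
import subprocess


def copy_tree_with_tar(src_dir, dst_dir):
    """Copy the contents of src_dir into dst_dir with a 'tar | tar' pipeline.

    Assumes GNU tar. --one-file-system keeps the copy from crossing into
    other mounted filesystems, and --sparse asks tar to store files with
    holes efficiently.
    """
    create = subprocess.Popen(
        ["tar", "-C", src_dir, "--one-file-system", "--sparse", "-cf", "-", "."],
        stdout=subprocess.PIPE,
    )
    extract = subprocess.Popen(
        ["tar", "-C", dst_dir, "-xpf", "-"],
        stdin=create.stdout,
    )
    # Drop our copy of the pipe so the creating tar gets SIGPIPE if the
    # extracting tar exits early.
    create.stdout.close()
    extract_status = extract.wait()
    create_status = create.wait()
    if create_status != 0 or extract_status != 0:
        raise RuntimeError(
            f"tar pipeline failed: create={create_status}, extract={extract_status}"
        )
```

Unlike dump, this works on the mounted filesystem's files rather than on the underlying device, so it needs the source to be mounted and readable.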

(I'll miss dump a bit, but a backup program that can silently produce corrupted backups is not a feature.)

PS: dump is a completely different thing than dumpe2fs; the former makes backups and the latter tells you details about your extN filesystem. Dumpe2fs is part of e2fsprogs and naturally remains under active development as part of extN development.

linux/ExtNDumpDeprecated written at 00:30:29

