2020-11-04
In Python, using the logging package is part of your API, or should be
We have a Python program for logging email attachment type information. As part of doing this, it wants to peer inside various sorts of archive types to see what's inside of them, because malware puts bad stuff there. One of the Python modules we use for this is the Ubuntu packaged version of libarchive-c, which is a Python API for libarchive. Our program prints out information in a very specific output format, which our Exim configuration then reads and makes use of.
Very recently, I was looking at our logs for an email message and noticed that it had a very unusual status report. Normal status reports look like this:
1kX88D-0004Mb-PR attachment application/zip; MIME file ext: .zip; zip exts: .iso
This message's status report was:
Pathname cannot be converted from UTF-16BE to current locale.
That's not a message that our program emits. It's instead a warning
message from the C libarchive library. However, it is not printed
out directly by the C code; instead this report is passed up as an
additional warning attached to the results of library calls. It is
libarchive-c that is deciding to print it out, in a general
FFI support function.
More specifically, libarchive-c is deciding to 'log' it through the
Python logging package; the default logging environment then prints
it out to standard error.
(Our program does not otherwise use logging, and I had no idea it
was in use until I tried to track this down.)
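The default behavior can be reproduced in a few lines. This is a minimal sketch and not libarchive-c's actual code; the logger name example_library and the function library_call are hypothetical stand-ins. It assumes nothing in the program has configured logging, in which case Python's logging module falls back to its "handler of last resort", which writes WARNING and higher messages to standard error with no formatting at all.

```python
import contextlib
import io
import logging

# Hypothetical stand-in for a library that creates its logger at import time.
logger = logging.getLogger("example_library")


def library_call():
    """Pretend library function that logs a non-fatal problem."""
    logger.warning("Pathname cannot be converted from UTF-16BE to current locale.")
    return "result"


# Capture standard error so the effect is visible as a string. In a real
# program this text would land wherever the program's stderr goes.
captured = io.StringIO()
with contextlib.redirect_stderr(captured):
    result = library_call()

stderr_text = captured.getvalue()
```

With no logging configuration in place, stderr_text ends up holding the bare warning message, with no timestamp, level, or logger name to hint at where it came from.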
A program's output is often part of its API in practice. When code
does things that produce output under default conditions, this alters
the API of the program it is in. This should not be done casually.
If warning information should be exposed, then it should be surfaced
through an actual API (an accessible one), not thrown out randomly.
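As one illustration of what "an actual API" could look like, a call can hand back its warnings together with its results and leave it to the caller to decide what to do with them. This is only a sketch under stated assumptions: ListingResult and list_member_names are hypothetical names, the UTF-8 decoding is a stand-in for whatever can go wrong, and none of this reflects how libarchive-c is actually structured.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ListingResult:
    """Hypothetical result type: the data plus any non-fatal warnings."""
    names: List[str]
    warnings: List[str] = field(default_factory=list)


def list_member_names(raw_names):
    """Decode raw member names, collecting problems instead of printing them."""
    result = ListingResult(names=[])
    for raw in raw_names:
        try:
            result.names.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # The caller can inspect, log, or ignore this as it sees fit.
            result.warnings.append(f"cannot decode member name {raw!r}")
    return result


listing = list_member_names([b"setup.iso", b"\xff\xfe"])
```

Here nothing is written to standard output or standard error; the undecodable name shows up only in listing.warnings, so the program's output format stays under the program's control.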
If your code does use logging, this should be part of its documented
API, not stuffed away in a corner as an implementation detail,
because people will quite reasonably want to know this (so they can
configure logging in general) and may want to turn it off.
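Turning a library's logging off is straightforward once you know the name of the logger it uses, which is exactly the piece of information that needs to be documented. The following is a minimal sketch; the logger name noisy_library and the helper silence_library_logger are hypothetical, and the real name has to come from the library's documentation or source.

```python
import contextlib
import io
import logging

# Hypothetical logger name; the real one has to be looked up in the
# library's source or documentation.
LIBRARY_LOGGER_NAME = "noisy_library"


def silence_library_logger(name):
    """Stop a library's logger from emitting anything anywhere."""
    lib_logger = logging.getLogger(name)
    # A NullHandler discards records, and having any handler at all keeps
    # the handler of last resort from printing to stderr.
    lib_logger.addHandler(logging.NullHandler())
    # Do not pass records up to the root logger's handlers either.
    lib_logger.propagate = False
    return lib_logger


silence_library_logger(LIBRARY_LOGGER_NAME)

# Simulate the library logging a warning and check that stderr stays empty.
captured = io.StringIO()
with contextlib.redirect_stderr(captured):
    logging.getLogger(LIBRARY_LOGGER_NAME).warning("some library warning")

silenced_output = captured.getvalue()
```

This works even when the library created its logger at import time, because logging.getLogger() returns the same logger object for the same name every time it is called.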
In a related issue, notice that libarchive-c constructs the logger
it will use at import time, before your Python code normally will
have had a chance to configure logging, and will even use it at
import time under some circumstances, as it is dynamically building
some bindings. I suspect that it is far from alone as far as
constructing and even using its logger at import time goes.
(It's natural to configure logging as part of program startup, in a
main() function or something descending from it, not at program load
time before you start doing imports. This is especially the case
since how you do logging in a program may depend on command line
arguments or other configuration information.)
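A minimal sketch of this startup pattern follows. The program name and the --verbose and --log-file options are hypothetical, and force=True requires Python 3.8 or later. Merely creating a logger at import time is harmless here, because a logger looks up its handlers when a message is emitted; it is messages actually logged during import that escape this configuration.

```python
import argparse
import logging

# Created at import time, which is harmless: a logger looks up its
# handlers when a message is emitted, not when the logger is created.
logger = logging.getLogger("example_program")


def main(argv=None):
    parser = argparse.ArgumentParser(description="hypothetical example program")
    parser.add_argument("--verbose", action="store_true",
                        help="log at DEBUG level")
    parser.add_argument("--log-file", default=None,
                        help="log to this file instead of standard error")
    args = parser.parse_args(argv)

    # Logging is configured only now, once the command line is known.
    level = logging.DEBUG if args.verbose else logging.WARNING
    # force=True (Python 3.8+) replaces any handlers already attached to
    # the root logger, for instance by code that ran during imports.
    logging.basicConfig(filename=args.log_file, level=level,
                        format="%(name)s: %(levelname)s: %(message)s",
                        force=True)
    logger.debug("logging configured")
    return 0
```

A real program would typically call this as sys.exit(main()) under an if __name__ == "__main__" guard, so that importing the module never configures logging as a side effect.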
(This is the background for this tweet of mine.)
You shouldn't use the Linux dump program any more (on extN filesystems)
When I upgraded my office workstation to Fedora 32, one of the things that happened is that Amanda backups of its root filesystem stopped working. The specific complaint from Amanda was a report of:
no size line match in /usr/lib64/amanda/rundump (xfsdump) output
This happened because of Fedora bug 1830320, adequately
summarized as "ext4 filesystem dumped with xfsdump instead of dump".
The cause of this is that Fedora 32's Amanda RPMs are built without
the venerable dump program and so do not try to use it. Instead, if
you tell Amanda to back up a filesystem using the abstract program
"DUMP", Amanda always uses xfsdump regardless of what the filesystem
type is, and naturally xfsdump fails on extN filesystems.
I have historically used various versions of the Unix *dump family
of programs because I felt that a filesystem specific tool was
generally going to do the best job of fully backing up your
filesystem, complete with whatever peculiar things it had (starting
with holes in your files). ZFS has no zfsdump (although I wish that
it did), so most of my workstation's filesystems are backed up with
tar, but my root filesystem is an extN one and I used dump. Well, I
used to use dump.
At first I was irritated with Fedora packaging and planned to say
grumpy things about it. But then I read more, and discovered that
this Amanda change is actually a good idea, because using Linux dump
isn't a good idea any more. The full story is in Fedora bug 1884602,
but the short version is that dump hasn't been updated to properly
handle modern versions of extN filesystems and won't be, because
it's unmaintained. To quote the bug:
Looking at the code it is very much outdated and will not support current ext4 features, in some cases leading to corrupted files without dump/restore even noticing any problems.
Fedora is currently planning to keep the restore program around so
that you can restore any dump archives you have, which I fully
support (especially since the Linux restore is actually pretty good
at supporting various old dump formats from other systems, which can
be handy).
I have some reflexes around using 'dump | restore' pipelines to
copy extN filesystems around, which I now need to change. Probably
tar is better than rsync for this particular purpose.
(I'll miss dump a bit, but a backup program that can silently produce
corrupted backups is not a feature.)
PS: dump is a completely different thing than dumpe2fs; the former
makes backups and the latter tells you details about your extN
filesystem. Dumpe2fs is part of e2fsprogs and naturally remains under
active development as part of extN development.