An illustration of why running code during import is a bad idea (and how it happens anyway)

October 29, 2020

It's a piece of received wisdom in Python programming that while you can make your module run code when it's import'd, you normally shouldn't. Importing a module is supposed to be both fast and predictable, doing as little as possible. But this rule is not always followed, and when it's not followed you can get bad results:

If you've remotely logged in to a Fedora machine (and have no console session there) and the python3-keyring package is installed, 'python3 -c "import keyring"' takes 25 seconds or so as the module tries to talk to keyrings on import and waits for some long timeouts. Nice work.

(The keyring module (also) provides "an easy way to access the system keyring service".)

On the one hand this provides yet another poster child of why running code on import is very bad, since merely importing a module should clearly not stop your Python program for 25 seconds. On the other hand, I think that this case makes an interesting illustration of how it is possible to drift into this state through a reasonably sensible API choice.

Keyring has a notion of backends, which actually talk to the various different system keyring services. To use keyring, you need to pick a backend to use and initialize it, and by 'you' we mean 'keyring', because people calling keyring just want to use a generic API without having to care what backend is in use on this system. So when you import the keyring module, core.py picks and initializes a backend during the import:

# init the _keyring_backend
init_backend()

Automatically selecting and initializing a backend on import means that keyring's API is ready for callers to use right away without any further work. This is a friendly API, but it assumes that everyone who imports keyring will go on to use it. While this sounds reasonable, a Python program may only need to talk to the keyring for some operations under some circumstances, and may mostly never use it. One such program is pip, which needs the keyring only rarely but imports it all of the time.
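To make that cost concrete, here is a minimal sketch that writes out a throwaway module shaped like the two lines above and times importing it. The module name eager_keyring_demo, its init_backend() body, and the 0.3 second sleep (standing in for probing keyring services) are all invented for illustration; nothing here touches a real keyring.

```python
import importlib
import sys
import tempfile
import textwrap
import time
from pathlib import Path

# Source of a hypothetical module shaped like keyring's core.py: picking a
# backend happens as a side effect of importing the module.
MODULE_SOURCE = textwrap.dedent('''
    import time

    _keyring_backend = None
    INIT_CALLS = 0

    def init_backend():
        global _keyring_backend, INIT_CALLS
        INIT_CALLS += 1
        time.sleep(0.3)  # stand-in for probing system keyring services
        _keyring_backend = "demo-backend"

    # init the _keyring_backend
    init_backend()
''')


def import_eager_module(name="eager_keyring_demo"):
    """Write the demo module to a temp dir, import it, and time the import."""
    tmpdir = tempfile.mkdtemp()  # left behind afterwards, for simplicity
    Path(tmpdir, name + ".py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    try:
        start = time.perf_counter()
        module = importlib.import_module(name)  # runs init_backend()
        elapsed = time.perf_counter() - start
    finally:
        sys.path.remove(tmpdir)
    return module, elapsed


if __name__ == "__main__":
    module, elapsed = import_eager_module()
    # Nothing in the module was called, yet the probe cost is already paid.
    print(f"backend={module._keyring_backend!r} import took {elapsed:.2f}s")
```

The importing program never calls anything in the module, yet it has already paid for backend selection. With a real probe stuck on a long DBus timeout, the same structure produces the 25 second stall.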

(Unconditional imports are the obvious and Pythonic thing to do. People look at you funny if your program does 'import' in a function or a class, and it's harder to use the result.)
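One standard-library way to keep an unconditional top-level import while deferring the module's own code is importlib.util.LazyLoader. The sketch below follows the lazy-import recipe in the importlib documentation; the module lazy_keyring_demo and the environment variable it sets as a "my body ran" marker are hypothetical, and nothing here suggests that keyring or pip work this way.

```python
import importlib
import importlib.util
import os
import sys
import tempfile
from pathlib import Path

# A hypothetical module whose body leaves a marker when it actually runs.
MODULE_SOURCE = (
    "import os\n"
    "os.environ['LAZY_KEYRING_DEMO_RAN'] = 'yes'\n"
    "BACKEND = 'demo-backend'\n"
)


def lazy_import(name):
    """Lazy-import recipe from the importlib docs: defer running the body."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy module; body not run yet
    return module


def demo(name="lazy_keyring_demo"):
    """Return (body ran after import?, body ran after first use?, BACKEND)."""
    tmpdir = tempfile.mkdtemp()
    Path(tmpdir, name + ".py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    os.environ.pop("LAZY_KEYRING_DEMO_RAN", None)
    try:
        module = lazy_import(name)
        ran_before = "LAZY_KEYRING_DEMO_RAN" in os.environ
        backend = module.BACKEND  # first attribute access runs the body
        ran_after = "LAZY_KEYRING_DEMO_RAN" in os.environ
    finally:
        sys.path.remove(tmpdir)
    return ran_before, ran_after, backend
```

Calling demo() should return (False, True, 'demo-backend'): the module body has not run right after the import, and runs on the first attribute access. The trade-off is that any error or stall in the module now surfaces at that first use, far from the import line.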

However, selecting the backend on import has a drawback, at least on Linux, which is that keyring has to figure out which system keyring services are actually active right now, because in the Linux way there's more than one of them (keyring supports SecretStorage and direct use of KWallet, plus third-party plugins). Since keyring has decided to choose the backend it will use at import time, it has to determine which of its supported system keyring services are active at import time.
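The selection step can be sketched roughly as "probe every candidate, keep the ones whose service answers, take the highest priority". This is loosely modeled on the idea of keyring ranking backends by priority, but the Backend class, choose_backend(), and the priority numbers below are hypothetical. The point is that choosing at import time means running every probe at import time.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Backend:
    """A hypothetical keyring backend: a name, a priority, and a probe."""
    name: str
    priority: float
    is_active: Callable[[], bool]  # "can I reach this service right now?"


def choose_backend(candidates: Sequence[Backend]) -> Optional[Backend]:
    """Return the highest-priority backend whose service probe succeeds.

    Every candidate is probed, so the total cost is the sum of all probes;
    one slow probe makes the whole selection slow.
    """
    viable = []
    for backend in candidates:
        try:
            if backend.is_active():
                viable.append(backend)
        except Exception:
            # A probe that errors out just means "not available here".
            continue
    return max(viable, key=lambda b: b.priority, default=None)


if __name__ == "__main__":
    chosen = choose_backend([
        Backend("secretstorage", 5, lambda: True),
        Backend("kwallet", 5.1, lambda: False),  # service not running
        Backend("plaintext-fallback", 0.5, lambda: True),
    ])
    print(chosen.name if chosen else "no backend")
```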

Some of keyring's backends determine whether or not the corresponding system service is active by trying to make a DBus connection to the service. For instance, you can see this in the kwallet backend code, which attempts to get the DBus object /modules/kwalletd5 from org.kde.kwalletd5. Under the right (or the wrong) circumstances, such as the remote login above, this DBus action doesn't fail right away; it fails only after a long timeout, and now you have a 25 second import delay.
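If a probe has to happen at all, one way to cap the damage is to give it a deadline. The sketch below wraps an arbitrary probe callable in a daemon thread and treats "no answer within timeout seconds" as "service not active". probe_with_timeout() is a hypothetical helper, not part of keyring; a real DBus client library may offer its own timeout setting, which would be preferable.

```python
import threading
import time
from typing import Callable


def probe_with_timeout(probe: Callable[[], bool], timeout: float) -> bool:
    """Run probe() in a daemon thread; treat no answer in time as inactive.

    The probe itself is not cancelled (Python threads can't be killed), it
    is only abandoned; daemon=True keeps it from holding up interpreter exit.
    """
    result = {}

    def runner():
        try:
            result["ok"] = bool(probe())
        except Exception:
            result["ok"] = False  # an erroring probe counts as inactive

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        return False  # timed out; treat the service as unavailable
    return result.get("ok", False)


if __name__ == "__main__":
    # Hypothetical stand-in for a DBus call that hangs for a long time.
    def slow_probe():
        time.sleep(25)
        return True

    start = time.perf_counter()
    print(probe_with_timeout(slow_probe, timeout=0.5))
    print(f"gave up after {time.perf_counter() - start:.1f}s")
```

The timeout is a judgment call: too short, and a slow but working keyring service gets wrongly skipped. The abandoned probe thread also keeps running in the background until its call returns.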

This import delay isn't a simple case where the keyring module is running a bunch of heavyweight code. Instead keyring is doing a potentially dangerous operation by talking to an outside service during import. It's not necessarily obvious that this is happening, because you need to understand both what happens in a specific backend and what's done at import time (and in isolation each piece sounds sensible). And a lot of the time, talking to the outside service will either work fine and be swift, or will fail immediately.


Comments on this page:

It seems to me that the obvious thing to do in a case like keyring would be to provide an "initialize_api" function, or something like it, that does the work that is currently being done at module level on import. Then someone who wanted to make sure that initialization was done at import time would just call the function right after importing, while someone who might not use the API at all could wait until they were going to use it to call the function (or the API functions could call it themselves on first use if it wasn't called already--which requires storing some state in the module to track API initialization, but any module that's running code at module level anyway shouldn't have a problem with that).

Written on 29 October 2020.


Last modified: Thu Oct 29 00:54:21 2020