Why your main program should be importable

September 7, 2008

When I first started coding in Python, I didn't know what I was doing. So I structured my Python programs the way I would write Bourne shell scripts or Perl programs, writing functions as necessary and useful but otherwise putting all of the logic and code in the program's file outside of functions (in what I now call 'module scope').

This is a perfectly rational structure for Python programs, and it even works; my programs ran fine and were perfectly functional. But it was also a bad mistake, as I slowly discovered later: what you really want to do is put all of your code in functions (and then start one of them with a little magic).

The problem that makes it a mistake is that a program written this way cannot be imported as if it was just another Python module; if you try, the program's code immediately starts running and explosive things start happening. There are at least two reasons why this is unfortunate:

  • various useful tools like pychecker rely on importing your code in order to pick through it. This is arguably a mistake on pychecker's part and they should be using a more robust mechanism, but it's how they work right now, so if you want to use them (and pychecker is usually quite useful) you have to live with it.

    (Discovering pychecker and trying to use it on my programs was how I began to realize the mistake I'd made.)

  • being able to import your main program gives you a handy method of testing bits of it from an interactive interpreter.

    To make this really work you need to code your program so that it calls sys.exit() as little as possible. If a function runs into a fatal error it should not do the usual 'call die() with an error message' thing; instead, it should raise an exception. Only the very top of the program should catch those exceptions and wind up calling sys.exit().

    (And if you don't like phase tracking, catching and wrapping exceptions can give you a nice method to add context to the error message that you'll wind up reporting.)
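A minimal sketch of this structure (all names, messages, and behavior here are made up for illustration): deep code raises exceptions, and only the very top level turns them into an exit. The usual `if __name__ == "__main__":` stanza, shown later, would be the only thing that calls main().

```python
import sys

class MyError(Exception):
    """A fatal application error with a user-facing message."""

def process(path):
    # Raise instead of calling sys.exit(), so this stays callable
    # from an interactive interpreter (or a test).
    if not path:
        raise MyError("no path given")
    return "processed " + path

def main(args):
    try:
        print(process(args[0] if args else ""))
    except MyError as e:
        # The one and only place that turns errors into a process exit.
        sys.exit("error: " + str(e))
```
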

I'm sure that this is strongly suggested somewhere in the Python documentation and the smart people were aware of it from the start, but I missed it (to my regret with those early programs).

Oh yes, the magic you need to make your top level function start running when your program is actually run (instead of being imported) is:

if __name__ == "__main__":
    ... run code here ...

At the module scope, __name__ is normally the name of your module (well, the name it is being imported under). When Python is running your code because the file was handed directly to the interpreter, Python sets __name__ to "__main__" instead.

Sidebar: my current program structure

The program structure that I have wound up adopting for my own programs looks something like this:

import sys
def process(...):
    .....

def main(args):
    .....
    try:
        process(...)
    except EnvironmentError as e:
        die("OS problem: " + str(e))
    except MyError as e:
        die(str(e))
    ....

if __name__ == "__main__":
    main(sys.argv[1:])

The main() function parses the arguments, loads configuration files, and so on, and then calls process() with whatever arguments are appropriate for the program; process() actually starts to do work. To put it one way, main() does all the stuff that only has to be done when the program is being run as an actual program.
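A runnable illustration of the payoff (the module name and its contents are made up): a program written in this structure can be imported and its pieces called individually, and importing it runs no program logic because everything is behind the __main__ guard.

```python
import importlib.util
import os
import tempfile

# A tiny program in the structure described above, written to a
# temporary file so we can import it as a module.
SOURCE = '''\
import sys

def process(x):
    return x * 2

def main(args):
    print(process(int(args[0])))

if __name__ == "__main__":
    main(sys.argv[1:])
'''

fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write(SOURCE)

spec = importlib.util.spec_from_file_location("demoprog", path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)  # importing runs no program logic
result = mod.process(21)      # the pieces are individually testable
os.unlink(path)
```

This is exactly what you do by hand at an interactive interpreter when you want to poke at part of a program: import it and call the function directly.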


Comments on this page:

From 194.8.197.204 at 2008-09-07 09:38:46:

FWIW, the exact same approach can be used in Perl, where the condition that signals top-level scope is (quite logically, actually) “not defined caller”. Generally I put the main program code in a main function and then put “main(@ARGV) if not defined caller;” at the very bottom of a script.

Scripts that grow more complex generally benefit from being cast in terms of App::Cmd.

Aristotle Pagaltzis

From 76.15.184.226 at 2008-09-07 11:35:20:

You forgot the most important reason for an importable main, and that's unit testing.

Python's my favorite language these days, but the lack of static typing and the fact that lines that aren't executed aren't checked mean that you can get nasty surprises a long way down the road if your code takes a path it never has before.

It's so fast to develop in Python that you should re-invest some of the time you save into writing unit tests. Even a trivial one that simply runs through every line of code and ignores the results will catch about half of your latent errors, for an investment of about 5 minutes.
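Such a trivial smoke test might look like this (process() here is a made-up stand-in for a real program's importable pieces; in practice you'd import your program's module and call its functions):

```python
import io
import unittest

def process(x):
    # Stand-in for a function imported from the program under test.
    return x + 1

class SmokeTest(unittest.TestCase):
    # Even a test that only runs the code and ignores the results
    # catches NameErrors and typos on lines that never executed before.
    def test_process_runs(self):
        process(1)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SmokeTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```
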

-- TomRitchford

From 76.90.7.98 at 2008-09-07 13:47:22:

Can't you just touch a blank __init__.py and make the folder the file lives in a module?

Voila, instant importability for all your old scripts.

From 150.101.214.82 at 2008-09-07 19:52:26:

Can't you just touch a blank __init__.py and make the folder the file lives in a module?

Wrong terminology: An '__init__.py' module makes a directory into a package.

Voila, instant importability for all your old scripts.

The point wasn't "can the import statement find this module". The point was to write one's program modules such that, when imported, they don't have side-effects.

Remember that a module, when first imported, will be executed. All statements at the module scope will be run in sequence. If one's program is written naively, that will cause unwanted effects like inputs and outputs, or interaction with other subsystems, which are not desired when importing the program as a module.

If one's program is written correctly, importing it should be fine, because those module-scope statements will do nothing but import other modules, create objects, classes, functions, and name bindings. The "do your stuff" parts should be available, but not triggered except by a minimal 'if __name__ == "__main__":' stanza.
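For instance (a made-up sketch of the contrast):

```python
# Naive: work happens at module scope, so merely importing the
# file would run it:
#
#     data = open("input.txt").read()   # side effect on import!
#
# Correct: module scope only defines names; the work is gated
# behind the __main__ check.

def transform(data):
    # Pure function: safe to import and call from anywhere.
    return data.upper()

def main():
    print(transform("some input"))

if __name__ == "__main__":
    main()
```
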

From 150.101.214.82 at 2008-09-07 19:55:51:

main(sys.argv[1:])

Why throw away the first argument before the 'main' function gets it? It's often useful to know the program name with which the program was invoked, and that handling belongs in the 'main' function.

By cks at 2008-09-07 22:00:49:

Most of the time, the only use my programs have for the actual name the program was invoked under is putting it in error messages. For that I think of the name of the program as part of the environment, and have the low-level routines just use sys.argv[0] directly.

(After all, this is not C; we have not lost argv[0] if we do not pass it around.)

If the main() routine actually did different things when invoked under different names, then the program name would be part of the parameters and I would pass the full sys.argv to main(). But most of the time it is not, and chopping it out simplifies various bits of logic in main().
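A sketch of the convention being described (die() is the name used in the article; the message format is made up): the low-level routine reads sys.argv[0] directly instead of having the program name threaded through every call.

```python
import sys

def die(msg):
    # The program name is part of the environment, so just read it
    # from sys.argv[0] here rather than passing it around.
    sys.exit("%s: %s" % (sys.argv[0], msg))
```
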

By Dan.Astoorian at 2008-09-08 12:10:37:

Having main() sometimes take sys.argv but other times take sys.argv[1:] seems as though it could cause confusion, especially if you want to take an existing program that didn't examine argv[0] and convert it to one which does. Personally, I'd rather see an idiom like:

  def main(args=[], cmdname="(none)"):
      ...

  if __name__ == "__main__":
      main(sys.argv[1:], sys.argv[0])

--Dan

From 79.220.204.248 at 2008-09-08 18:50:14:

I'm sure that this is strongly suggested somewhere in the Python documentation and the smart people were aware of it from the start, but I missed it

I am also interested in where the smart people got it from. How would you know about it, other than by discovering it through reading on the web?

By cks at 2008-09-08 22:19:29:

I haven't yet written a (Python) command that did different things depending on what name you invoke it as, so I've opted to be lazy in my usual pattern for a main() or equivalent. I would rather consider it a sort of general pattern than pass an extra argument that everything I've written so far would ignore.

By Dan.Astoorian at 2008-09-09 09:57:13:

My point was that if you did, it would seem cleaner to make the command name a separate argument than to have your args list grow an extra value at the front and have to change getopt(args, ...) to getopt(args[1:], ...) in the body of main().

--Dan

From 75.162.153.62 at 2008-09-29 00:48:08:

I wrote a blog entry about best practices for python scripts (as in a script, not a big program, etc), a while back [1].

I've since made it into a cheat sheet [2].

It's really surprising how something simple ends up becoming more complicated and quite widespread while using pretty bad coding practices. Yeah, 800 lines of code not in a function is pretty hard to test ;)

I've since written a program that spits out the boilerplate found in the cheatsheet, but am currently sitting on it. I should probably release it, just to continue to help people understand good practices....

  1. - http://panela.blog-city.com/boilerplate_for_maintainable_distributible_testable_python.htm
  2. - http://panela.blog-city.com/more_beginner_python_cheatsheets.htm