Wandering Thoughts archives

2016-03-30

I've now used Python's argparse module and I like it

The argparse module is the Python 2.7+ replacement for the old optparse module, which itself was basically an extension of the basic getopt module. A number of years ago, when I could barely start using argparse, I took a look at argparse's documentation and wound up with rather negative feelings about it. Having now written a program or two that uses argparse, I'm going to take those old views back.

I don't yet have any opinion on argparse as more than an optparse replacement for putting together quick argument handling for simple commands, but there are a number of things that I like about it for that role. In no particular order:

  • argparse doesn't produce warnings from pychecker. I know, this is a petty thing, but it's still nice to be able to just run 'pychecker program.py' without having to carefully guard 'import optparse' with various magic bits of code.

  • It's nice to be able to skip setting a default value for boolean flags with a store_true or store_false action. One less bit of code noise.

  • argparse gives you a simple way to define conflicting options. It isn't all that general but just having it there means that my programs have somewhat better option error checking. If I had to do it by hand, I might be tempted to not bother.

    (Because of the lack of generality, argparse doesn't give you top notch handling of conflicting arguments; if you want to do a really good job in even moderately complicated situations, you'll have to at least partially roll your own. But argparse is good enough for handling obvious cases in a simple program that you don't expect to be misused except by accident.)

  • It's conveniently lazy to let argparse handle positional arguments too. You can just tell it that there must be exactly N, or at least one, or whatever, and then continue onwards knowing that argparse will take care of all of the error checking and problem reporting and so on. If it gets to your code, you have at least the right number of arguments and you can pull them off the Namespace object it returns.

    (If you want to go a little bit crazy you can do a bunch of argument type validation as argparse processes the arguments. I'm not convinced that this is worth it for simple programs.)

The result of all of this is to reduce the amount of more or less boilerplate code that a simple argparse-using program needs to contain. Today I wrote one where the main function reduced down to:

def main():
   p = setup_args()
   opts = p.parse_args()
   for grname in opts.group:
      process(grname, opts)

All of the 'must have at least one positional argument' and 'some options obviously conflict' and so on error handling was entirely done for me in the depths of parse_args, so my code here didn't even have to think about it.

(I've historically shoved all of the argument parser setup off into a separate function. It's sufficiently verbose that I prefer to keep it out of the way of the actual logic in my main() function; otherwise it can be too hard to see the logic forest for the argument setup trees. With a separate setup_args() function, I can just skip over it entirely when reading the code later.)
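As an illustration only, here is a minimal sketch of what a setup_args() along these lines might look like. The options (-v, -n, -f) and the help text are hypothetical, since the post doesn't show its real setup_args(). The sketch uses store_true flags with no explicit defaults, a mutually exclusive group for the conflicting options, and nargs="+" to require at least one positional argument that lands in opts.group.

```python
import argparse


def setup_args():
    # Hypothetical options: the post doesn't show its real setup_args().
    p = argparse.ArgumentParser(description="Process one or more groups.")
    # store_true flags default to False, so no explicit default= is needed.
    p.add_argument("-v", "--verbose", action="store_true",
                   help="report what is being done")
    # argparse itself rejects command lines that use both of these.
    mode = p.add_mutually_exclusive_group()
    mode.add_argument("-n", "--dry-run", action="store_true",
                      help="only show what would be done")
    mode.add_argument("-f", "--force", action="store_true",
                      help="act even when it looks risky")
    # nargs="+" demands at least one positional argument; the values
    # arrive as a list in opts.group.
    p.add_argument("group", nargs="+", help="group(s) to process")
    return p
```

With this parser, a command line such as '-n -f staff' or one with no groups at all never reaches main(). parse_args() prints a usage message and an error to stderr, then exits with status 2.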

ArgparseBriefPraise written at 23:41:54

2016-03-16

How 'from module import ...' is not doing what you may expect

There are a number of reasons to avoid things like 'from module import *'; for instance, it can be confusing later on and you can import more than you expect. But if you're doing this in the context of, say, just splitting a big source file apart it's tempting to say that these are not really problems. You're not going to be confused about where things come from because you're only importing everything from your own source files (and you're not even thinking of them as modules), and it's perfectly okay for there to be namespace contamination because that's kind of the point. But even then there are traps, because 'from module import ...' is not really doing what you might think it's doing.

There are two possible misconceptions here. If you're doing 'from module import *' within your own code, often what you want is for there to be one conjoined namespace where everything lives, both stuff from the other 'module' (really just a file) and stuff from your 'module' (the current file). If you're doing 'from module import A', it's easy (and tempting) to think that when you write plain A in your code, Python is basically automatically rewriting it to really be 'module.A' for you. Neither is what is actually going on in Python, although things can often look like it.

What a 'from module' import really does is it copies things from one module namespace into another. More specifically it copies the current bindings of names. You can think of 'from module import *' as doing something roughly like this:

import module
_this = globals()
for n in dir(module):
    _this[n] = getattr(module, n)

del _this
del module

(This code does not avoid internal names, doesn't respect __all__, and so on. It's a conceptual illustration.)
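For a slightly closer approximation, the sketch below honours __all__ when a module defines it and otherwise skips names that start with an underscore, which is what the real 'from module import *' does. It still ignores packages and submodules, and star_import() is a hypothetical helper name. The key point is unchanged: it copies the current bindings into the target namespace.

```python
import types


def star_import(module, namespace):
    """Roughly emulate 'from module import *' into the dict `namespace`."""
    names = getattr(module, "__all__", None)
    if names is None:
        # No __all__: take every public (non-underscore) name.
        names = [n for n in dir(module) if not n.startswith("_")]
    for n in names:
        # Copy the *current* binding; later rebinding in `module` won't show up.
        namespace[n] = getattr(module, n)


demo = types.ModuleType("demo")
demo.visible = 1
demo._hidden = 2
ns = {}
star_import(demo, ns)
print(ns)  # {'visible': 1}
```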

There are still two completely separate module namespaces, yours and the namespace of module; you have just copied a bunch of things from the module namespace into yours under the same name (or just some things, if you're doing 'from module import A, B'). Functions and classes from module are still using their module namespace, even if a reference to some or all of them has been copied into your module.

(As a corollary to this, things from module mostly can't refer to anything from your module namespace. This is easy to see since you can't have circular imports; if you're importing module to get at its namespace, it can't be importing you to get at yours. (Yes, there are odd ways around this.))

One reason why this matters is that if functions or classes from module update stuff in their module namespace, you may or may not pick it up in your own module. For example, consider the following code in some other module:

gvar = 10
def setit(newval):
  global gvar
  gvar = newval

The gvar that you see in your own module will forever be '10', no matter what calls to setit() have been made. However, code in the other module will see a different value for gvar.

Not all sorts of updates will do this, of course. If gvar is a dictionary and code just adds, changes, and deletes keys in it, everyone will see the same gvar. The illusion of a shared namespace can hold up, but it is ultimately only an illusion and it can be fragile. (And unless you already know Python well, it isn't necessarily easy to see where and when it's going to break down.)
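Here is a small self-contained demonstration of both cases. To avoid needing a second file, it builds a stand-in for the other module in memory with types.ModuleType and registers it in sys.modules; the module name 'othermod' and the addkey() function are hypothetical. Rebinding gvar inside othermod isn't visible through our copied name, while mutating the shared dictionary is.

```python
import sys
import types

# Build a stand-in for "some other module" in memory.
othermod = types.ModuleType("othermod")
exec(
    "gvar = 10\n"
    "gdict = {}\n"
    "def setit(newval):\n"
    "    global gvar\n"
    "    gvar = newval\n"
    "def addkey(k, v):\n"
    "    gdict[k] = v\n",
    othermod.__dict__,
)
sys.modules["othermod"] = othermod

# Copies the current bindings of these names into our namespace.
from othermod import gvar, gdict, setit, addkey

setit(20)        # rebinds othermod.gvar, not our copy
addkey("a", 1)   # mutates the one dict object both namespaces share

print(gvar)           # 10: our binding still refers to the old object
print(othermod.gvar)  # 20: the other module sees the new value
print(gdict)          # {'a': 1}: same dict in both namespaces
```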

Sidebar: An additional bit of possible weirdness

There are some situations where a module's namespace is more or less overwritten wholesale; the obvious case is reload() of the module. If you reload() a module that has been the subject of 'from module import ...', all of those bare imports are now broken, or at least not updated themselves. You can get into very odd situations this way (especially considering what reloading a module really does).
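A quick sketch of the reload() case follows, using Python 3's importlib.reload() (the Python 2 equivalent is the builtin reload()). It writes a tiny hypothetical module, 'reloadme', to a temporary directory so that it has a real source file to reload. Reloading re-runs the module's code, which rebinds reloadme.items to a fresh list, while the name we got through 'from reloadme import items' keeps pointing at the old one.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny hypothetical module into a temporary, importable directory.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "reloadme.py"), "w") as f:
    f.write("items = []\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import reloadme
from reloadme import items      # copy the current binding of 'items'

reloadme.items.append("x")      # same list object as our 'items'
print(items is reloadme.items)  # True

importlib.reload(reloadme)      # re-runs the source, rebinding reloadme.items
print(items is reloadme.items)  # False: our copy is the old list
print(items, reloadme.items)    # ['x'] []
```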

FromImportBindingIssue written at 23:59:45

I wish I could split up code more easily in Python

This really starts with some tweets:

This Python program has grown to almost 1500 lines. I think I need an intervention, or better data structures, or something.
I also wish it was easier and more convenient to split up a Python program across multiple source files (it's one way Go wins).

The best way to split up a big program is to genuinely modularize it. In other words, find separate pieces of functionality that can be cleanly extracted and turn them into Python modules, in separate files. There are still issues with your main program actually finding the modules, but this can be worked around (even though it is and remains annoying).

However, this assumes that you have a modular structure to start with, with things sensibly separated. If your program started off as a little 200 line thing and then grew step by step into a 1500 line monster (especially iteratively), you may not necessarily have this. That's where Python makes things a little bit awkward. Splitting things up into separate files fundamentally puts them in separate modules and thus separate namespaces; in order to do it, you need to be able to pull your code apart in this way. If your code isn't in this state already you have some degree of rewriting ahead of you, and in the meantime you have a 1500 line Python file.

(In theory you can do 'from modname import *'. In practice this is only faking a single namespace and the fakery can break down in various ways.)

Go may be less elegant here (and Go certainly makes it harder to have separate namespaces), but you can slice a big source file up into several separate ones while keeping them all co-mingled as one module, all using bits and pieces from each other. Sometimes this is more convenient and expedient, even if it may be uglier.

With that said, Python has excellent reasons to require every separate file to be a separate module. To summarize very quickly, it's tied to how you don't just load a file of Python source code, you run it (with things like function and class definitions actually being executable statements, and possibly other interesting things happening). This is a straightforward model that's quite appropriate for an interpreted language, but it imposes certain constraints.
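To make 'you run it' concrete, here is a sketch that loads some hypothetical module source by creating a module object and exec()ing the code in its namespace. This roughly approximates what import does, but it is not the import machinery itself. Every top-level statement runs in order, and a def that is never reached never creates its function.

```python
import types

# Hypothetical module source.
SOURCE = """
events = ["module body started"]

def helper():
    return "helper called"

events.append("def helper has run")

if False:
    def never_defined():
        pass
"""

# Make a module object, then run the source with its namespace as globals.
demo = types.ModuleType("demo")
exec(compile(SOURCE, "<demo>", "exec"), demo.__dict__)

print(demo.events)                     # ['module body started', 'def helper has run']
print(hasattr(demo, "helper"))         # True
print(hasattr(demo, "never_defined"))  # False: that def never executed
```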

SplittingProgramProblems written at 01:43:45

2016-03-04

Some notes on supporting readline (tab) completion in your Python program

Adding basic readline-style line editing to a Python program that reads input from the user is very simple; as the readline module documentation says, simply importing the module activates this without you having to call anything. However, adding completion is less well documented, so here are some notes about it.

First, you need both to set up a readline completion binding and to register a completion function. The easiest way to get a completion binding is just to set it up explicitly:

readline.parse_and_bind("tab: complete")

You may also want to change the delimiter characters with readline.set_completer_delims. In my own code, I reduced the delimiters to space, tab, and newline. Note that if you have possible completions that include delimiter characters, nothing complains and things sort of work, but not entirely.

So, now we get to completion functions. Readline needs a completion function, and it's easiest to show you how a simple one works:

import readline

comps = ["abc", "abdef", "charlie", "horse"]

def complete(text, state):
   # generate candidate completion list; every string starts with "",
   # so an empty text matches everything
   matches = [x for x in comps if x.startswith(text)]

   # return the state'th match, or None once we've run out
   if state >= len(matches):
      return None
   else:
      return matches[state]

readline.set_completer(complete)

You are passed the current 'word' being completed and a 'state', which is a 0-based index. Your completion function's job is to return the state'th completion for the current word, or something other than a string (such as None) once you've run out of completions; you'll be called with ever-increasing state values until you declare 'no more'. The list of completions that you return does not have to be in alphabetical order, but it really should be in a stable order for any particular input word; otherwise things will probably get confused.

By the way, readline will completely swallow any exceptions raised by your complete() function. The only symptom of major errors may be that you get fewer completions than you expect, or none at all.

Of course it's common to want to be a little smarter about possible completions based on the context. For instance, you might be completing a command line where the first word is a command and then following words are various sorts of arguments, and it'd be nice not to offer as completions things that would actually be errors when entered. To do this, you often want to know what is before the current word being completed:

def get_cur_before():
   idx = readline.get_begidx()
   full = readline.get_line_buffer()
   return full[:idx]

Because words being completed stop at delimiter characters, anything in this before-the-word text is what readline considers a full word (or words). Otherwise, it would be part of the word currently being completed on. If you want to know what the first complete word of the line is, you can thus do something like:

   pref = get_cur_before()
   n = pref.split()
   cmd = n[0] if len(n) > 0 else ""

You can then use cmd to decide what set of completions to use. Other options are possible with the use of various additional readline functions, but this is all I've needed to use so far for the completions in my code.
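Here is one way to sketch the cmd-based selection. The command table and candidates() are hypothetical. candidates() takes the before-the-word text (what get_cur_before() returns) and the word being completed, and it returns the list that a complete(text, state) function would index into with state. Keeping it separate from readline makes it easy to test. A real version would probably want different choices for each argument position, not one list per command.

```python
# Hypothetical command table: each command maps to the arguments
# that make sense after it.
COMMANDS = {
    "show": ["groups", "users"],
    "delete": ["group", "user"],
    "quit": [],
}


def candidates(before, text):
    """Return completions for `text` given the line text before it."""
    words = before.split()
    cmd = words[0] if len(words) > 0 else ""
    if cmd == "":
        pool = sorted(COMMANDS)       # completing the command name itself
    else:
        pool = COMMANDS.get(cmd, [])  # unknown command: offer nothing
    return [w for w in pool if w.startswith(text)]


print(candidates("", "sh"))      # ['show']
print(candidates("show ", "g"))  # ['groups']
print(candidates("bogus ", ""))  # []
```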

Given that your complete() function is being called repeatedly every time the user hits TAB, and that it does all of this examination and selection and matching every time it's called, you might worry about performance here; it sure seems like there's a lot of duplicate work being done here. The good news is that modern computers are very fast, so you probably aren't going to notice this. If you do worry about this, what the rlcompleter module does is that it generates the list of matches when state is 0 (and caches it), and uses the already-cached list whenever state is non-zero. You can probably count on this to keep working in the future.
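A sketch of that rlcompleter-style caching follows, written as a hypothetical CachingCompleter class. It computes the matches only when state is 0 and serves later states from the cache. The drive() helper is also hypothetical and only approximates readline's calling pattern (state 0, 1, 2, ... until it gets None), so the completer can be exercised without a terminal. With real readline you would register it with readline.set_completer(c.complete).

```python
class CachingCompleter:
    """A readline completer that builds its match list once per word."""

    def __init__(self, words):
        self.words = list(words)
        self.matches = []
        self.rebuilds = 0  # how many times the match list was computed

    def complete(self, text, state):
        if state == 0:
            # First call for this word: do the (possibly expensive) work once.
            self.matches = [w for w in self.words if w.startswith(text)]
            self.rebuilds += 1
        if state < len(self.matches):
            return self.matches[state]
        return None  # no more completions


def drive(complete, text):
    """Call complete() roughly as readline does: state 0, 1, ... until None."""
    results = []
    state = 0
    while True:
        match = complete(text, state)
        if match is None:
            return results
        results.append(match)
        state += 1


c = CachingCompleter(["abc", "abdef", "charlie", "horse"])
print(drive(c.complete, "ab"))  # ['abc', 'abdef']
print(c.rebuilds)               # 1
```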

Speaking from personal experience, it was not all that much work to add readline completion to my program once I worked out what I actually needed to do, and having readline tab completion available is surprisingly fun.

(And of course it's handy. There's a reason everyone loves tab-completing things when they can.)

PS: remember to turn off readline completion if and when it's no longer applicable, such as when you're getting other input from the user (perhaps a yes/no approval). Otherwise things can get at least puzzling. This can be done with readline.set_completer(None).
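One way to make turning completion off and back on harder to forget is a small context manager. The sketch below is a hypothetical helper, no_completion(). It saves the current completer with readline.get_completer(), sets it to None for the duration of the block, and restores it afterwards even if an exception occurs. The import is guarded because the readline module isn't available on every platform (Windows, for example).

```python
import contextlib

try:
    import readline
except ImportError:  # e.g. Windows, where the module isn't available
    readline = None


@contextlib.contextmanager
def no_completion():
    """Temporarily disable readline completion, restoring it afterwards."""
    if readline is None:
        yield
        return
    saved = readline.get_completer()
    readline.set_completer(None)
    try:
        yield
    finally:
        readline.set_completer(saved)
```

You would then wrap the yes/no prompt in 'with no_completion():' around the input() call.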

ReadlineCompletionNotes written at 01:14:10



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.