2013-03-22
The problem with trying to make everything into a Python module
One of the reasons for Django's unpleasant project restructuring is that they want your website directory (ie the directory that your project sits in) to be a module that can be imported. This in fact seems to be somewhat of a general trend; all sorts of things rather want you to have not just a collection of files in a directory but an actual module. I wish they'd stop. Modules are not the be all and end all in Python, at least not as currently implemented, and not everything needs or wants to be a module.
The general reason for making things into modules is namespaces for
imports. If you're sitting in your project's directory and do 'import
fred', in theory this is ambiguous; you might mean your fred.py
or you might mean some global fred module installed in Python. The
absolute form of 'import mystuff.fred' is more or less unambiguous.
(This preference for modules also goes with the fact that the relative
import syntax, 'from . import fred', is only valid in an actual
module. I think that this is a terrible mistake, but no one asked me for
my opinion.)
I have no problem with modules as such. The problem I have is how you
get a directory to be a module, namely that you add the directory's
parent to the Python search path (in one of a number of ways), and
then the directory becomes a module (or technically I think a package)
called its directory name. This is bad in at least two ways. It tightly
couples together the directory name and the module name and it also
makes everything else in the directory's parent available as a potential
module. What both of these have in common is undesired name collisions.
For example, you cannot be working on two versions of a 'fred' module
that are sitting in a directory as, say, src/fred-1 and src/fred-2,
not unless you want to have a src/fred symlink that you keep changing
back and forth.
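To make both problems concrete, here is a minimal sketch using a throwaway directory. The 'src', 'fred', and 'bob' names are stand-ins for illustration, not anything from a real project.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway 'src' directory holding two unrelated packages.
src = tempfile.mkdtemp()
for pkg in ("fred", "bob"):
    os.mkdir(os.path.join(src, pkg))
    with open(os.path.join(src, pkg, "__init__.py"), "w") as f:
        f.write("NAME = %r\n" % pkg)

# Making 'fred' importable means putting its parent on the search path...
sys.path.insert(0, src)
importlib.invalidate_caches()

fred = importlib.import_module("fred")
# ...and that exposes every sibling directory as a potential module too.
bob = importlib.import_module("bob")

# The module name is simply whatever the directory happens to be called.
fred_dirname = os.path.basename(os.path.dirname(fred.__file__))
print(fred.__name__, fred_dirname, bob.__name__)
```

Nothing here asked for 'bob' to be importable; it came along simply because it lives next to 'fred'.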
(The natural structure seems to be to isolate each module in its own
artificial parent directory (eg src/fred-1/fred) or to ignore the
whole issue, put everything in src/, and assume you will never have
any collisions or be developing a new version of fred that you don't
want src/bob getting when it does an 'import fred'.)
What would make this situation okay is a simple way to tell Python 'directory X is module Y', where 'X' might be '.' (the current directory). This should be available both on the Python command line and from inside Python code. Sadly I don't expect this to arrive any time soon.
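(For what it's worth, in Python versions much newer than this entry, 3.5 and later, the standard importlib machinery can approximate the 'from inside Python code' half of this. What follows is only a sketch under assumptions: the directory must contain an __init__.py, and load_dir_as_module and the 'fred-2' layout are hypothetical names made up for illustration. It does nothing for the command line case.)

```python
import importlib.util
import os
import sys
import tempfile


def load_dir_as_module(directory, name):
    """Load the package in `directory` under the module name `name`.

    Hypothetical helper: the directory's own name is ignored entirely.
    """
    init_py = os.path.join(directory, "__init__.py")
    spec = importlib.util.spec_from_file_location(
        name, init_py, submodule_search_locations=[directory])
    module = importlib.util.module_from_spec(spec)
    # Register before executing so relative imports inside it resolve.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


# Demo: a directory called 'fred-2' gets loaded as the module 'fred'.
demo_dir = os.path.join(tempfile.mkdtemp(), "fred-2")
os.mkdir(demo_dir)
with open(os.path.join(demo_dir, "__init__.py"), "w") as f:
    f.write("from . import helper\nVERSION = 2\n")
with open(os.path.join(demo_dir, "helper.py"), "w") as f:
    f.write("def greet():\n    return 'hello from fred-2'\n")

fred = load_dir_as_module(demo_dir, "fred")
print(fred.VERSION, fred.helper.greet())
```

The parent directory never goes on the search path here, so nothing else sitting next to 'fred-2' becomes importable.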
(This stuff irritates me for reasons that are hard to pin down. Partly
it just feels wrong (eg '/src' or wherever isn't a directory of
modules, so why am I telling Python that it is?).)
2013-03-17
Argument validation using functions
There's a pattern (or perhaps an anti-pattern) that I keep inventing in my programs. I start out with a bunch of commands (or macros or template text renderers or the like) that can take arguments (registered somehow), and I have all of the functions do their own argument count validation. But this is repetitive, so I start having the central dispatching code do some checks on the argument count. But there are always special cases (one command might take exactly N arguments, another takes M to N, another takes at least N but maybe more, and so on), so pretty soon I start trying to encode all of this in increasingly baroque special meanings for various sorts of argument counts ('if it's negative, it means...').
In thinking about this recently (as part of some DWiki changes I'm thinking about) I've realized another approach, hopefully a better one. Instead of trying yet another crazy encoding scheme, I can use functions to validate the argument count. Instead of registering the argument count, register a function that validates the argument count. These functions (or callable objects) will of course be created by argument count validation factories, so I will write code like:
register("fred", fredfunc, noMoreThan(3))
register("brad", bradfunc, betweenCnt(2, 4))
register("barney", barnfunc, anyOf(0, 1, 3, 5))
The great attraction of this approach to me is that it completely decentralizes the encoding scheme for argument validation (and with it all of the complexity of argument validation). The central dispatch function simply calls the validation function and doesn't care any further; the whole huge variety of possible argument rules is delegated to the code that creates any particular validation function. I can have any sort of validation ranging from very generic to completely custom, whatever makes the most sense, and none of the complexity of that shows up outside of code that actually uses it.
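A minimal sketch of how this might fit together follows. The factory names come from the registrations above; everything else (the _commands table, the dispatch function, the stand-in command functions, and the choice to have validators return True or False) is an assumption made for illustration.

```python
# Validator factories: each returns a function that takes an argument
# count and returns True if that count is acceptable.
def noMoreThan(n):
    return lambda count: count <= n

def betweenCnt(lo, hi):
    return lambda count: lo <= count <= hi

def anyOf(*counts):
    allowed = frozenset(counts)
    return lambda count: count in allowed

# Hypothetical registry mapping command name -> (function, validator).
_commands = {}

def register(name, func, validator):
    _commands[name] = (func, validator)

def dispatch(name, *args):
    """Central dispatch: ask the validator, then call the command."""
    func, validator = _commands[name]
    if not validator(len(args)):
        raise ValueError("%s: bad argument count %d" % (name, len(args)))
    return func(*args)

# Stand-in command functions, purely for the example.
def fredfunc(*args):
    return ("fred", args)

def bradfunc(*args):
    return ("brad", args)

def barnfunc(*args):
    return ("barney", args)

register("fred", fredfunc, noMoreThan(3))
register("brad", bradfunc, betweenCnt(2, 4))
register("barney", barnfunc, anyOf(0, 1, 3, 5))
```

Note that dispatch() has no idea what any particular validator checks; it only looks at the answer.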
This is also completely expandable. New forms of argument validation just need new functions; they don't need any changes in the central dispatch system to understand and handle yet another special case. This is an attractive property for me since I never know just what sort of arguments I'm going to need until I actually write a particular command (or whatever) handler.
Obviously, this can be extended to also validate various properties of the arguments (for example, you might know that the first argument of a particular command has to be a file). When you reach this sort of extended argument validation I start to think that you want something like an ArgValidator class which you instantiate and then start adding restrictions to (otherwise you have a rapidly exploding number of combinations of various options; basically you want some way of easily composing separate restrictions together instead of having to hard code them).
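As a rough sketch of what such a class could look like (the method names count() and arg() and the overall shape are my guesses for illustration, not a worked-out design):

```python
import os.path


class ArgValidator:
    """Collects independent restrictions and checks them all at once."""

    def __init__(self):
        self._checks = []

    def count(self, count_ok):
        # count_ok is a function taking the number of arguments.
        self._checks.append(lambda args: count_ok(len(args)))
        return self

    def arg(self, index, predicate):
        # Restrict one argument; an absent optional argument passes,
        # since the count restriction decides whether it must be there.
        self._checks.append(
            lambda args: index >= len(args) or predicate(args[index]))
        return self

    def __call__(self, args):
        return all(check(args) for check in self._checks)


# One or two arguments, and the first has to be an existing file.
show_validator = (ArgValidator()
                  .count(lambda n: 1 <= n <= 2)
                  .arg(0, os.path.isfile))
```

Each restriction is added separately, so new kinds of restrictions are new methods (or just new predicates) instead of new hard-coded combinations.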