2011-03-31
A slightly unobvious trap with 'from module import *'
If you are already being lazy, it is easy to
drift into the habit of doing 'from yourmodule import *' in your
own code, especially in situations where you really are going to use
everything from your module in the code that you're writing (for
example, importing your Django app's models into your view code for
it). As it happens, there is a little trap for you waiting here.
When they write 'from <X> import *', most people are probably thinking
that it imports everything they've defined in module X into their
current module. But this is not quite what it does. What it actually
does is that it imports everything in module X's namespace into your
namespace. Now, most of what's in module X's namespace is what you
defined in it. But the namespace also includes whatever you imported in
module X; all of those things have now been silently imported into
your namespace too.
(Yes, yes, __all__ can cut this off. I don't think people use
__all__ very much for internal modules and internal imports like
this, unless they are the sort of person who eschews 'from <X> import
*' in the first place.)
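Here is a small sketch of the leak, using throwaway in-memory modules so
that it runs without any files on disk. The module names 'modx' and
'modx_all' and the make_module() and star_import() helpers are made up
for this illustration.

```python
import sys
import types

def make_module(name, source):
    """Create an in-memory module from source text and register it."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod
    return mod

def star_import(name):
    """Return the names that 'from <name> import *' adds to a fresh namespace."""
    ns = {}
    exec("from %s import *" % name, ns)
    # exec() adds __builtins__ to ns; filter it out.
    return sorted(n for n in ns if not n.startswith("__"))

# modx defines helper() but also imports socket for its own use.
make_module("modx", "import socket\n\ndef helper():\n    return 'defined in modx'\n")
leaked = star_import("modx")

# The same module with __all__ set only exports what it lists.
make_module("modx_all", "import socket\n__all__ = ['helper']\n\ndef helper():\n    return 'x'\n")
limited = star_import("modx_all")

print(leaked)    # ['helper', 'socket']
print(limited)   # ['helper']
```

The first result shows socket coming along for the ride even though the
importing code never asked for it.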
At this point it's quite easy to just start using some of these silently
imported things in your code without realizing that you haven't
explicitly imported them. When dealing with a bunch of directly imported
things (eg, 'from what.util import SomeThing') and a bunch of your own
code, it's easy to lose track of what you have and haven't imported yet.
When you write the code and it works, you're of course going to assume
that you explicitly imported whatever you're using, because that's the
only way the code could work, right?
Now the problem is that your code in module Y is quietly depending
on the inner workings of module X. If you revise module X so that it
no longer needs the SomeThing utility bit and so remove the import,
X will still work but suddenly this other module Y breaks. Oops.
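A sketch of that breakage, again with in-memory modules. Here 'modx' and
'mody' stand in for X and Y, math.sqrt stands in for SomeThing, and
make_module() is a helper invented for this example.

```python
import sys
import types

def make_module(name, source):
    """Create an in-memory module from source text and register it."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod
    return mod

# Module Y never imports sqrt itself; it only gets it through X.
Y_SOURCE = "from modx import *\n\ndef use():\n    return sqrt(16)\n"

# First version of X: it needs sqrt, so it imports it.
make_module("modx", "from math import sqrt\n")
mody = make_module("mody", Y_SOURCE)
before = mody.use()

# Revised X: it no longer needs sqrt, so the import is gone.
make_module("modx", "")
mody = make_module("mody", Y_SOURCE)
try:
    mody.use()
    broke = False
except NameError:
    broke = True

print(before, broke)   # 4.0 True
```

Nothing in Y changed between the two runs; only X's private imports did.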
(The extreme case of this is forgetting to import an entire module.
Then you might wind up with all sorts of code in module Y making free
use of, say, 'socket.<whatever>', despite there being nary an 'import
socket' to be seen. This is the sort of thing that leaves you staring
at the revision history in your version control system and wondering how
the code ever worked in the first place.)
A more perverse version of this has intermediate steps; module X is imported wholesale into module Y, which is in turn imported wholesale into module Z, which is where you accidentally use SomeThing without importing it. After this happens, there's a whole raft of changes to either X or Y that can break module Z.
(I'm sure that your imagination can come up with even more extreme and odd scenarios.)
2011-03-28
Why you should avoid 'from module import whatever'
Every so often I get to re-learn vaguely painful lessons.
For whatever reason, Django has a distinct coding style, as expressed
in things like their tutorial documentation. When I wrote my Django
application recently I generally followed this
style, partly because I took bits and pieces straight from the tutorial
(because it was the easy approach). One part of the Django style is a
heavy use of 'from module import SomeThing', or even 'from module
import *', and I copied this for my own code. Everything worked fine
and it certainly was convenient.
Then somewhat later I went back to the code and found myself staring at
a section of it, wondering just where a particular function came from.
Fortunately, context and naming made it fairly obvious that it wasn't
a standard Django function, but I had to do a file search to determine
whether it was from my views.py or my models.py.
(And I had it easy, since I only had two files that it could have come from.)
In a nutshell, this is why namespace-less imports are bad: they hide
where names come from, stripping them of context. Context is a good
thing, because us fallible humans can only hold so many things in our
head; the more explicit context, the less we have to keep track of
ourselves. Even partial module names help (where you do 'from module
import dog' and then use 'dog.SomeThing'); if nothing else, it tells
you that this particular name doesn't come from the current file and it
gives you an idea of where to start looking.
(In the best case the partial module name is both unique and distinctive, so it generally gives you the full context on the spot.)
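As a small illustration of the difference at the point of use, here are
both styles side by side with a standard library module (urllib is just
a convenient example of my choosing, not anything from my Django code):

```python
from urllib import parse          # partial module name
from urllib.parse import quote    # bare name

# The call site says where quote() lives.
qualified = parse.quote("a b")

# The call site gives no hint; you have to go find the import.
bare = quote("a b")

print(qualified, bare)   # a%20b a%20b
```

Both calls do the same thing, but only the first one tells a later
reader where to start looking.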
Some people will object that specifying even a partial module name results in too much typing. My response is that this just means you need shorter names.
(Yes, coming up with good short names is hard. No one said API design was easy.)
PS: I don't particularly fault Django for this particular element of their style; it's consistent and fits their overall goals and does save a certain amount of more or less make-work.
2011-03-22
How to add and use additional fields on Django model formsets
Suppose that you have a model formset and for some reason you want to add an additional field to each individual form (or perhaps you want to reinterpret a model field into something that is more user-friendly), a field that is of course not in your model schema.
At one level, adding custom fields or custom field handling to a model formset is relatively simple once you know what to do. At another level, the question is how to get access to information about the field. The normal way of dealing with a model formset is:
if formset.is_valid():
    instances = formset.save(commit=False)
    for thing in instances:
        ....
The problem is that thing is a model instance, not a form, and our new
field appears only in the form; since it is not in the model schema,
Django cannot copy it to the model instance it derives from the form
information. When you're using a model form or model formset, the only
thing that Django does with fields that are not in the model is validate
them and then (effectively) throw them away.
(This can sometimes be useful, for example an 'I agree to these terms' checkbox. If you make this a boolean field and require it, the field and thus the form will not validate until it is ticked.)
If we want access to non-model fields in a model formset, we need to
directly iterate the forms of the formset instead of just iterating the
model instances. These are available through formset.forms but this
has all of the forms in the formset, including ones that haven't been
modified or used; we need to exclude them. The way to do this is:
if formset.is_valid():
    for form in formset.forms:
        if not form.has_changed():
            continue
        thing = form.save(commit=False)
        ... process ...
You now have direct access to both the form itself and the corresponding model instance, so you can check the form for your extra fields and do whatever processing you need.
Note that you do not have to specifically check that the form itself is valid. Because the entire formset validated, we know that any changed form is itself valid. Unchanged forms may well not be valid if this was a formset for entering new data, since they will still be blank.
(This applies to Django 1.2.5, and my disclaimer is that I am new to Django so this may well not be considered the Django-correct way to do this particular thing.)
PS: as far as I can see, you do not want to use formset.cleaned_data
here. Although it exists, it's the basic form data with no clear way to
turn it into a model instance and it still includes all of the unchanged
or blank forms in the formset.
Sidebar: the actual problem in concrete
What I've written here sounds very abstract and you might be wondering why anyone would want to do something like this, so let's make it concrete with my Django application, our account request management system.
Account requests normally have to be approved by their sponsor; however, staff can approve requests on behalf of the sponsor. Staff can also enter a bunch of new account requests, which are normally not pre-approved and need the sponsor's approval. Suppose that we want to add a checkbox to the form to say 'mark this request as approved when it gets created', to save staff from the annoyance of creating a bunch of new requests and then immediately going off to approve them all.
This checkbox is not a model schema field directly (although ticking it results in a different value for a schema field). I suppose that with a lot of effort we could create some sort of custom widget mapping that turns the 'status of request' model schema field (normally a three way choice) into a boolean tickbox (unticked makes the status 'Pending', ticked makes it 'Approved'), but I rather think that my approach here is simpler.
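A rough sketch of how the processing loop might look for this case. The
field name 'approve_now' and the 'status' attribute are hypothetical, and
save_changed_forms() is a name I made up. It uses only formset.forms,
form.has_changed(), form.save(commit=False) and form.cleaned_data, and it
is meant to be called after formset.is_valid() has succeeded. The Fake*
classes are not Django; they are minimal stand-ins so that the loop can
be run outside of a Django project.

```python
def save_changed_forms(formset):
    """Save only the forms that were filled in or edited, applying the
    (hypothetical) non-model 'approve_now' checkbox from each form."""
    saved = []
    for form in formset.forms:
        if not form.has_changed():
            continue                      # skip blank or untouched forms
        thing = form.save(commit=False)   # model instance, not yet saved
        if form.cleaned_data.get("approve_now"):
            thing.status = "Approved"
        thing.save()
        saved.append(thing)
    return saved

# --- Stand-ins that only mimic the attributes used above ---
class FakeRequest:
    def __init__(self, login):
        self.login, self.status, self.saved = login, "Pending", False
    def save(self):
        self.saved = True

class FakeForm:
    def __init__(self, cleaned_data):
        self.cleaned_data = cleaned_data
    def has_changed(self):
        return bool(self.cleaned_data)
    def save(self, commit=True):
        return FakeRequest(self.cleaned_data["login"])

class FakeFormSet:
    def __init__(self, forms):
        self.forms = forms

formset = FakeFormSet([
    FakeForm({"login": "alice", "approve_now": True}),
    FakeForm({"login": "bob", "approve_now": False}),
    FakeForm({}),                         # an unused extra form
])
result = [(r.login, r.status) for r in save_changed_forms(formset)]
print(result)   # [('alice', 'Approved'), ('bob', 'Pending')]
```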
Some notes on doing things with Django model formsets
Django's model formsets are not well documented, at least not in the
Django documentation I've found on their website. Oh, the API docs say
more or less what parameters things like modelformset_factory()
take, but they won't tell you how you should use them. In particular the
documentation I've seen doesn't say very much about how to customize
what appears in your form elements and so on.
So here is what I know:
The form argument to modelformset_factory() is used to
construct the class for individual form elements. It should inherit from
forms.ModelForm like regular customized forms, but unlike regular
forms it should not have an internal Meta class; the Meta class
(or its equivalent) will be added by the model formset construction
process. Customized form classes can alter the default look and
behavior of schema fields by defining form fields as usual, and they
can also define validation and cleaning functions. Since form field
validation is more powerful than schema field validation, you may want
to override fields to, eg, make them into forms.RegexField fields with
appropriate regular expressions. Or just to improve the labels and
error messages.
(Yes, the need for this is a pain in the rear. If you want user friendly validation and error messages, you can wind up overriding nearly the entire set of model fields. Of course this pain exists for ordinary model forms as well.)
The formset argument to modelformset_factory() is used to
construct the class for the overall formset. It should inherit from
BaseModelFormSet (from django.forms.models). What I have used this
for is a clean() method that makes sure that no two newly-created
account requests have the same login. I believe that any clean()
function you use should start out by calling the superclass clean().
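
The core of such a check can be sketched in plain Python. The helper
name duplicate_logins() and the 'login' key are assumptions for this
illustration. In a real formset clean() you would presumably build the
list from each form's cleaned_data (for form in self.forms) and raise
forms.ValidationError if the helper returns anything.

```python
from collections import Counter

def duplicate_logins(cleaned_rows):
    """Return the sorted logins that appear in more than one row.

    cleaned_rows is a list of dicts shaped like form.cleaned_data;
    blank forms are expected to show up as empty dicts and are ignored.
    """
    counts = Counter(row["login"] for row in cleaned_rows if row.get("login"))
    return sorted(login for login, n in counts.items() if n > 1)

rows = [{"login": "alice"}, {"login": "bob"}, {"login": "alice"}, {}]
dups = duplicate_logins(rows)
print(dups)   # ['alice']
```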
The fields argument to modelformset_factory() is a list (in
the broad sense) of what additional fields from the model should be
included in the individual forms. Similarly, the exclude argument is
the list of what additional fields should be excluded. Note that this
is additional fields; if you have a custom form class, any fields it
defines explicitly are always included. You do not need to list them in
fields, and you cannot make them go away by listing them in exclude.
If you need to include custom fields only some of the time, you will
need multiple form classes. Yes, this is annoying, especially if you
have a lot of variants (and there may be a better way that involves more
magic).
(You can sort of see the implementation showing through here.)
For future reference (given that Django changes over time), this is all applicable to Django 1.2.5.
Sidebar: how I find out what fields have changed in edited forms
In a regular form (even a model form) you can inspect
form.changed_data to see what fields have been edited. This is
awkward to do in a modelformset, because you do not have convenient
access to the individual forms that have been changed. How I get
around this is the following, somewhat hacky code:
if formset.is_valid():
    instances = formset.save(commit=False)
    cdict = dict(formset.changed_objects)
    for thing in instances:
        changed = cdict[thing]
        ....
(In my application I need to take special action when various fields are modified, plus I like having audit records that say what fields were edited.)
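To show what the dict() call above is doing, here is a sketch with
stand-in data. It assumes that formset.changed_objects is a list of
(instance, list of changed field names) pairs, which is how it appears to
behave for me; the Request class and the field names are invented. As far
as I can tell newly created objects are not listed in changed_objects, so
this sketch uses cdict.get() rather than indexing in case your formset
also has new entries.

```python
class Request:
    """Hypothetical stand-in for a model instance (hashable by default)."""
    def __init__(self, login):
        self.login = login

alice, bob, carol = Request("alice"), Request("bob"), Request("carol")

# Assumed shape of formset.changed_objects after formset.save().
changed_objects = [(alice, ["status"]), (bob, ["login", "email"])]

# Map each instance to the list of field names that were edited.
cdict = dict(changed_objects)

# carol plays the part of a newly created object with no entry.
audit = ["%s: %s" % (obj.login, ", ".join(cdict.get(obj, [])))
         for obj in (alice, bob, carol)]
print(audit)   # ['alice: status', 'bob: login, email', 'carol: ']
```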