Wandering Thoughts

2020-10-26

Fifteen years of DWiki, the Python engine of Wandering Thoughts

DWiki, the wiki engine that underlies Wandering Thoughts (this blog), is fifteen years old. That makes it my oldest Python program that's in active, regular, and even somewhat demanding use (we serve up a bunch of requests a day, although on a typical day they're mostly from syndication feed fetchers and bots). As is usual for my long-lived Python programs, DWiki's not in any sort of active development, as you can see in its github repo, although I did add an important feature just last year (that's another story, though).

DWiki has undergone a long process of sporadic development, where I've added important features slowly over time (including performance improvements). This sporadic development generally means that I come back to DWiki's code each time having forgotten many of the details, and have to recover them. Unfortunately this isn't as easy as I'd like, and it's definitely complicated by historical decisions that seemed right at the time but which have wound up creating some very tangled and unclear objects that sit at the core of various important processes.

(I try to add comments for what I've worked out when I revisit code. They're probably not always successful at helping future me the next time through.)

DWiki itself has been extremely stable in operation and has essentially never blown up or hit an unhandled exception that wasn't caused by a very recent code change of mine. This stability is part of why I can ignore DWiki's code for long lengths of time. However, DWiki operates in an environment where DWiki processes are either transient or restarted on a regular basis; if it were a persistent daemon, more problems might have come up (or I might have been forced to pay more attention to reference leaks and similar issues).

Given that it's a Unix based project started in 2005, Python was an excellent choice out of the options available at the time. Using Python has given DWiki long life, great stability in the language (since I started just as Python 2 was reaching stability and slowing down), good enough performance, and a degree of freedom and flexibility in coding that was probably invaluable as I was ignorantly fumbling my way through the problem space. Even today I'm not convinced that another language would make DWiki better or easier to write, and most of the other options might make it harder to operate in practice.

(To put it one way, the messy state of DWiki's code is not really because of the language it's written in.)

Several parts of Python's standard library have been very useful in making DWiki perform better without too much work, especially pickle. The various pickle modules make it essentially trivial to serialize an object to disk and then reload it later, in another process, which is at the core of DWiki's caching strategies. That you can pickle arbitrary objects inside your program without having to make many changes to them has let me easily add pickle based disk caches to various things without too much effort.
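
A pickle based disk cache of this sort can be quite small. Here's a minimal sketch of the idea (the cache file name is made up, and real code wants more validation and invalidation logic than this):

import os
import pickle
import tempfile

CACHEFILE = "/var/tmp/dwiki-cache.pickle"  # hypothetical location

def save_cache(obj):
  # Write to a temporary file and rename it into place, so that a
  # crashed process can't leave behind a half-written cache file.
  fd, tmpname = tempfile.mkstemp(dir=os.path.dirname(CACHEFILE))
  with os.fdopen(fd, "wb") as fp:
    pickle.dump(obj, fp, pickle.HIGHEST_PROTOCOL)
  os.rename(tmpname, CACHEFILE)

def load_cache():
  # A missing or unreadable cache isn't an error; we just regenerate.
  try:
    with open(CACHEFILE, "rb") as fp:
      return pickle.load(fp)
  except (EnvironmentError, pickle.PickleError):
    return None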

At the same time, the very strong performance split in CPython between things implemented in C and things implemented in Python has definitely affected how DWiki is coded, not necessarily for the better. This is particularly obvious in the parsing of DWikiText, which is almost entirely done with complex regular expressions (some of them generated by code) because that's by far the fastest way to do it in CPython. The result is somewhat fragile in the face of potential changes to DWikiText and definitely hard for me to follow when I come back to it.

(With that said, I feel that parsing all wikitext dialects is a hard problem and a high performance parser is probably going to be tricky to write and follow regardless of the implementation language.)
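
To illustrate the general style without reproducing DWiki's actual code, here's a tiny sketch of regular expression based wikitext handling (the pattern and the markup it handles are simplified inventions):

import re

# Building patterns as strings means code can generate and combine them.
# This one matches *emphasized* text that doesn't cross a newline.
EMPH_PAT = r"\*([^*\n]+)\*"
emph_re = re.compile(EMPH_PAT)

def render_emph(text):
  # A single pass with a compiled regexp runs in C, which is far faster
  # in CPython than scanning the text character by character in Python.
  return emph_re.sub(r"<em>\1</em>", text)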

DWiki is currently written in Python 2, but will probably eventually be ported to Python 3. I have no particular plans for when I'll try to do that for various reasons, although one of the places where I run a DWiki instance will probably drop Python 2 sooner or later and force my hand. Right now I would be happy to leave DWiki as a Python 2 program forever; Python 3 is nicer, but since I'm not changing DWiki much anyway I'll probably never use many of those nicer things in it.

DWikiFifteenYears written at 00:14:28

2020-10-19

What versions of PyPy I can use (October 2020 edition)

We have to do something about our Python 2 programs, especially on our Ubuntu machines where Python 2 may be gone entirely in Ubuntu 22.04 (which is now less than two years away, and we move slowly). One way to keep Python 2 programs running is with PyPy, and PyPy can potentially also be used to speed up Python 3 programs (well, long running ones). Thinking about various things to do with PyPy makes me curious as to what versions of it I can use on the various systems I care about.

On Fedora 32, PyPy is at 7.3.1 (Fedora 31 is at 7.1.1, but I should upgrade all my machines). Both Python 3 and Python 2 versions are available. On Ubuntu, PyPy is at version 7.3.1 on 20.04, 5.10.0 on 18.04, and 5.1.2 on 16.04. Ubuntu 16.04 is pretty much irrelevant at this point, since it goes out of support in less than six months. Only Ubuntu 20.04 has a 'pypy3' package; 16.04 and 18.04 don't.

PyPy 7.3.1 (on my Fedora machines) reports that its Python 2 is 2.7.13 and its Python 3 version is 3.6.9, which could be worse. The Ubuntu 18.04 PyPy 5.10.0 also reports that it's Python 2.7.13; the Ubuntu 16.04 PyPy 5.1.2 says it's Python 2.7.10. This is somewhat behind everyone's actual version of Python 2.7, where Ubuntu 16.04 has Python 2.7.12, 18.04 has 2.7.17, and 20.04 has 2.7.18rc1 (Fedora 32 has the release version of 2.7.18). In practice probably no one cares that much about sub-versions of Python 2.7; there hasn't exactly been lots of change there.
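
(If you want to check a PyPy yourself, both version numbers are visible from inside the interpreter; sys.pypy_version_info only exists on PyPy, so this little check is also safe to run on CPython:)

import sys
import platform

print(platform.python_version())  # the Python version being implemented
if hasattr(sys, "pypy_version_info"):
  v = sys.pypy_version_info
  print("PyPy " + ".".join(str(x) for x in v[:3]))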

(For my own reference if nothing else, see new features added to Python 2.7 maintenance releases. I'm not sure if this completely covers changes to the standard library, but I think the Python people have been trying to keep those down in 2.7 for the obvious reasons.)

I'm somewhat surprised that Ubuntu 18.04 has no Python 3 version of PyPy, because according to the PyPy 5.10 release notes it supported both. PyPy 5.10 only supported Python 3.5, but then Ubuntu 18.04 itself has Python 3.6.9, so the Python 3 versions aren't all that far apart. Perhaps Ubuntu thought that the Python 3 support in PyPy wasn't quite ready to be shipped as part of a LTS release, or perhaps it just wasn't on their radar then.

We'd likely see speed improvements in newer PyPy releases, but for just running undemanding Python 2 programs I think any of these versions of PyPy are probably good enough. If speed matters we probably want to be using PyPy on Ubuntu 20.04.

(We have one or two Python programs that can do enough work that they might benefit from PyPy. Unfortunately they're Python 3 programs that run on machines that will probably not be upgraded to 20.04 for various reasons.)

MyPyPyVersions2020-10 written at 00:52:53

2020-09-19

Python virtual environments transparently add themselves to sys.path

I have recently been exploring some aspects of Python's virtual environments, so I thought I had a reasonable understanding of how they worked. In the HN discussion of my entry on installing modules to a custom location, I saw someone say that you could just run the virtual environment's python binary (ie, <directory>/bin/python, which is normally a symlink to the system Python) without 'activating' the virtual environment and it would still find the modules that you'd installed in the venv. This surprised me, because I had expected that you'd have to set some sort of environment variable to get Python to add your arbitrary non-standard location to sys.path. However, testing it showed that it really works, which is both convenient and magical.

(For me as a sysadmin, the important thing is that we can run programs that use a virtual environment without having to set any environment variables beforehand. As long as we use the venv's Python, everything works, presumably even for things run from Unix cron or started as daemons. And we can arrange to always do that by setting '#!...' paths appropriately.)
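
It's simple to see this in action. Put the following in a file and run it with '<venvdir>/bin/python', with nothing activated and no environment variables set, and the venv's site-packages will show up in sys.path:

import sys

print(sys.prefix)       # points into the venv
print(sys.base_prefix)  # points at the underlying Python installation
print([d for d in sys.path if d.endswith("site-packages")])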

At first I guessed that this was something that Python did in general if it found an appropriate directory tree around where you were running the python executable from. However, it turns out not to be quite this general. Although Python has a quite intricate process on Unix for finding its standard library and site-packages (see the comment at the start of Modules/getpath.c), it doesn't go so far as to add random trees just because they sort of look right. Instead, there is a special feature for Python virtual environments that looks for pyvenv.cfg and uses it to trigger some additional things. To quote from the source code comment:

Search for an "pyvenv.cfg" environment configuration file, first in the executable's directory and then in the parent directory. If found, open it for use when searching for prefixes.

(I haven't attempted to trace through the tangled code to determine exactly how this results in your venv's site-packages getting added to sys.path.)

Venv normally writes pyvenv.cfg to the root of your virtual environment directory tree (ie, in the parent of the executable's directory). For me the contents appear to be pretty generic; there's no mention of the virtual environment's location, and copying the pyvenv.cfg to the root of an artificially created minimal tree does cause its site-packages directory to get added to sys.path.
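
To give a concrete idea of how generic it is, the entire pyvenv.cfg of a stock venv is just a few 'key = value' lines, along these lines (the home path and version naturally vary):

home = /usr/bin
include-system-site-packages = false
version = 3.7.9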

Since all of this is undocumented, it's probably best to consider it special private stuff that's only for the use of the standard venv system. If you want something like a virtual environment, complete with its own site-packages that will be automatically picked up when you run the Python from that directory hierarchy, just create a real virtual environment. They're pretty lightweight.

(Well, for a value of 'pretty lightweight' that amounts to 947 files, directories, and symbolic links and 12 Mbytes of disk space. Almost all of these are from installing pip into your new virtual environment; it drops to 13 files if you use --without-pip.)

VenvsAndSysPath written at 00:28:57

2020-09-17

Python 3 venvs don't normally really embed their own copy of Python (on Unix)

Python 3 has standard support for virtual environments. The documentation describes them in general as:

[...] a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages.

As a system administrator, a 'self-contained directory tree' that has a particular version of Python is a scary thing to read about, because it implies that the person responsible for keeping that version of Python up to date on security patches, bug fixes, and so on is me, not my Unix distribution. I don't want to have to keep up with Python in that way; I want to delegate it to the fine people at Debian, Canonical, Red Hat, or whoever.

(I also don't want to have to keep careful track of all of the virtual environments we might be running so that I can remember to hunt all of them down to update them when a new version of Python is released.)

Fortunately it turns out that the venv system doesn't actually do this (based on my testing on Fedora 31 with Python 3.7.9, and also a bit on Ubuntu 18.04). Venv does create a <dir>/bin/python for you, but under normal circumstances this is a symlink to whatever version of Python you ran the venv module with. On Linux, by default this will be the system installed version of Python, which means that a normal system package update of it will automatically update all your venvs too.

(As usual, currently running processes will not magically be upgraded; you'll need to restart them.)
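
(You can check what Python a venv is really using from inside the venv itself; sys.executable is the venv's bin/python and realpath() reveals the symlink's target:)

import os
import sys

print(sys.executable)                    # eg <dir>/bin/python
print(os.path.realpath(sys.executable))  # eg /usr/bin/python3.7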

This does however mean that you can shoot yourself in the foot by moving a venv around or by upgrading the system significantly enough. The directory tree created contains directories that include the minor version of Python, such as the site-packages directory (normally found as <dir>/lib/python3.<X>/site-packages). If you upgrade the system Python to a new minor version (perhaps by doing a Linux distribution version upgrade, or by replacing the server with a new server running a more current version), or you move the venv between systems with different Python minor versions, your venv probably won't work because it's looking in the wrong place.

(For instance, Ubuntu 18.04 LTS has Python 3.6.9, Fedora 31 has Python 3.7.9, and Ubuntu 20.04 LTS has Python 3.8.2. I deal with all three at work.)

You can easily fix this with 'python3 -m venv --upgrade <DIR>', but you have to remember that you need to do this. The good news is that whatever is trying to use the venv is probably going to fail immediately, so you'll know right away that you need it.

PS: One way to 'move' a venv between systems this way is to have an environment with multiple versions of a Linux distribution (as we do), and to build venvs on NFS filesystems that are mounted everywhere.

VenvsAndPythonBinary written at 23:28:26

2020-09-16

How I think I want to drop modern Python packages into a single program

For reasons beyond the scope of this blog entry, I'm considering augmenting our Python program that logs email attachment information for Exim to use oletools to peer inside MS Office files for indications of bad things. Oletools is not packaged by Ubuntu as far as I can see, and in any case it would be an older version, so we would need to add the oletools Python packages ourselves.

The official oletools install instructions talk about using either pip or setup.py. As a general rule, we're very strongly against installing anything system-wide except through Ubuntu's own package management system, and the environment our Python program runs in doesn't really have a home directory to use pip's --user option, so the obvious and simple pip invocations are out. I've used a setup.py approach to install a large Python package into a specific directory hierarchy in the past (Django), and it was a big pain, so I'd like not to do it again.

(Nor do we want to learn about how to build and maintain Python virtual environments, and then convert how we run this Python program to use one.)

After some looking at pip's help output I found the 'pip install --target <directory>' option and tested it a bit. This appears to do more or less what I want, in that it installs oletools and all of its dependencies into the target directory. The target directory is also littered with various metadata, so we probably don't want to make it where the program's normal source code lives. This means we'll need to arrange to run the program so that $PYTHONPATH is set to the target directory, but that's a solvable problem.

(This 'pip install' invocation does write some additional pip metadata to your $HOME. Fortunately it actually does respect the value of the $HOME environment variable, so I can point that at a junk directory and then delete it afterward. Or I can make $HOME point to my target directory so everything is in one place.)
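
(An alternative to setting $PYTHONPATH is to have the program extend sys.path itself at startup. A sketch, with a made-up target directory:)

import sys

# Hypothetical directory populated with:
#   pip install --target /opt/attachcheck/pylib oletools
sys.path.insert(0, "/opt/attachcheck/pylib")

from oletools import olevba  # one of the oletools modules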

All of this is not quite as neat and simple as dropping an oletools directory tree in the program's directory, in the way that I could deal with needing the rarfile module, but then again oletools has a bunch of dependencies and pip handles them all for me. I could manually copy them all into place, but that would actually create a sufficiently cluttered program directory that I prefer a separate directory even if it needs a $PYTHONPATH step.

(Some people will say that setting $PYTHONPATH means that I should go all the way to a virtual environment, but that would be a lot more to learn and it would be more opaque. But looking into this a bit did lead to me learning that Python 3 now has standard support for virtual environments.)

PipDropInInstall written at 23:54:46

2020-08-22

Some bits on making Python's argparse module work like Unix usually does

I recently discovered that argparse allows you to abbreviate long options, and then Chris Wellons wrote about Conventions for (Unix) Command Line Options, which included a criticism of argparse. I'm not going to write about how to make argparse behave today, because I haven't explored that in full; instead, this is some quick notes from the documentation and my past experiences.

First, both Wellons and I had a bad reaction to argparse accepting abbreviated options. However, based on the documentation you probably have to accept it, because of an important postscript note:

Changed in version 3.8: In previous versions, allow_abbrev [being False] also disabled grouping of short flags such as -vv to mean -v -v.

Almost no one has Python 3.8, which means that the cure here is worse than the disease. Not accepting grouped short flags is much worse than accepting abbreviated long flags, so until Python 3.8+ is pervasive we're stuck with the latter.
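
(When you can assume Python 3.8 or later, turning abbreviations off is a one-liner:)

import argparse

# Safe on 3.8+; on earlier versions this also disables grouped
# short flags like '-vv'.
parser = argparse.ArgumentParser(allow_abbrev=False)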

As it implicitly documents (in the form of its example), argparse allows the non-traditional style of options being intermixed with non-option arguments on the command line, instead of requiring all options to be before non-options. There is no way to control this. Argparse does accept '--' to terminate the option list (with some caveats), after which things that look like options are non-option arguments.

In general, using the nargs argument for add_argument() is neither necessary nor useful for options (and has issues when used with non-options). Setting things like 'type=str' or 'action="append"' causes the argument parser to do the right thing; similarly, it does the right thing when the action doesn't consume any argument value (this behavior is documented in a postscript of the nargs section). As Wellons noted, argparse can fall down badly if you attempt to create an option that takes an optional value. Fortunately, I don't think you should do that and should stick to options either always taking a value or never doing so. Argparse's own examples use 'nargs="?"' for non-option arguments that are in fact optional.
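
To make that concrete, here's a small sketch of the styles I mean (the option names are made up):

import argparse

parser = argparse.ArgumentParser()
# Options either never take a value or always take exactly one;
# no nargs is needed for any of them.
parser.add_argument("-v", "--verbose", action="count", default=0)
parser.add_argument("-D", "--dead-disks", action="append")
# nargs is for non-option arguments, eg an optional trailing one.
parser.add_argument("host", nargs="?", default=None)
opts = parser.parse_args()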

Argparse makes a weird attempt to handle command line arguments that are negative numbers, as documented in Arguments containing -. This isn't how traditional Unix commands behave with such arguments, where a leading '-' is a leading '-' no matter what, with no attempts to guess at what should happen. This behavior is not currently optional and I don't think there's a really good way to trick argparse into not doing it.

(Actually reading much of the argparse documentation has already taught me useful things I didn't know, such as how the dest argument is optional. I'm not sure I'd want to ever leave it out, though; explicit is better than implicit, and using 'dest' leaves a visible reminder in the code of what attribute holds the result.)

ArgparseSomeUnixNotes written at 23:59:39

2020-08-03

The issue of how to propagate some errors in our Django web app

Much of what our Django application to handle (Unix) account requests does is only available to special people such as professors, who can actually approve account requests instead of just making them. Following our usual practice, we protect the management section of the web app with Apache HTTP Basic Authentication, where only people in designated Unix groups (such as the 'sponsors' group) have access. However, the Django application also has a 'users' table, and in order to have access to the application you have to be in it (as well as be in a Unix group that's allowed access). Normally we maintain these two things in sync; when we add someone to the relevant Unix group, we also add them as a user (and set the type of user they are, and set up other necessary data for people who can sponsor accounts). But sometimes we overlook this extra step, so people wind up permitted by Apache but not in the 'users' table. If they actually try to use our web application, this causes it to stop with an error.

(This 'users' table is part of the application's own set of tables and models, not the normal Django authentication system's User table. Possibly I should have handled this differently so that it's more integrated with normal Django stuff, but when I started this web application I was new to Django and keeping things completely separate was much easier.)

Right now, a bunch of our views look like this at the start:

def approve(request):
  urec = get_user(request)
  if urec.is_staff():
    ...
  ...

You'll note that there's no error handling. This is because get_user() does the brute force simple thing (with some error checks removed):

def get_user(request):
  user = request.META['REMOTE_USER']
  try:
    return models.User.objects.get(login=user)
  except models.User.DoesNotExist:
    ... log a message ...
    raise django.http.Http404

This is simple and reliable, but it has a downside, which is that people who run into this mistake of ours get the same HTTP 404 error page that they'd get if they were trying to go to a URL that genuinely didn't exist in our application. This is at best uninformative and at worst confusing, and I'd like to do better. Unfortunately I'm not sure what the best way to do it is.

My first attempt was to raise Django's Http404 error with a specific message string and then try to make our template for the application's 404 error page check for that string and generate a different set of messages. That failed, because as far as I can see either Django drops the message string at some point in its processing or doesn't pass it to your custom template as a template variable.

I can see three alternate approaches, none of which I'm persuaded by. The simple but unappealing option is to change get_user() to return an error in this situation. This would require a boilerplate change at every place it's called to check the error and handle it by generating a standard 'we screwed up' response page, which makes the code feel like Go instead of Python. But at least how things worked would be obvious (and if I returned None, I could make failures to handle this case relatively obvious).

The more complicated but less code approach is to raise a custom error and wrap every function that calls get_user() with a decorator that catches the error to generate and return the standard explanation page. I would have to decorate every view function that directly or indirectly calls get_user() (and remember to add this if I added new functions), and decorators are sort of advanced Python magic that aren't necessarily either clear or straightforward for people to follow.
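
If I went this way, the shape would be something like the following sketch (the error class, the decorator name, and the template are all hypothetical):

import functools

from django.shortcuts import render

class NoDbUser(Exception):
  # Hypothetical error for get_user() to raise instead of Http404.
  pass

def needs_dbuser(view):
  # Used as '@needs_dbuser' on every view that calls get_user().
  @functools.wraps(view)
  def wrapper(request, *args, **kwargs):
    try:
      return view(request, *args, **kwargs)
    except NoDbUser:
      # A hypothetical 'we forgot to add you' page.
      return render(request, "accounts/nodbuser.html", status=403)
  return wrapper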

I suspect that the way I'm supposed to do this in Django is with some form of middleware. If I kept to much of the current approach, I could do this with a middleware that just used process_exception(), but that doesn't seem the most idiomatic way. Since this is a common processing step for anything protected behind HTTP Basic Authentication, it feels like the middleware should do the user lookup itself and attach it to the request somehow, with the actual views not even calling get_user(). But I don't know how to attach arbitrary data to Django's request objects, and anyway that involves even more Django magic than middleware that catches a custom exception (and the magic is less clear, since the view functions would just access data without any obvious reason for it to be there).
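
For concreteness, a minimal process_exception() middleware along these lines might look like the following (reusing the hypothetical NoDbUser error and template from the decorator sketch):

from django.shortcuts import render

class NoDbUserMiddleware:
  def __init__(self, get_response):
    self.get_response = get_response

  def __call__(self, request):
    return self.get_response(request)

  def process_exception(self, request, exception):
    if isinstance(exception, NoDbUser):
      return render(request, "accounts/nodbuser.html", status=403)
    # Returning None lets all other exceptions propagate as usual.
    return None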

(I care about the amount of magic involved in any solution because my co-workers aren't particularly familiar with Django and even I only touch the code every once in a while. Possibly this means I should just use the explicit error checking version, even if it makes me twitch.)

DjangoErrorPropagationIssue written at 00:16:30

2020-07-20

An exploration of why Python doesn't require a 'main' function

Many languages start running your program by calling a function of yours that must have a specific name. In C (and many C derived languages), this is just called main(); in Go, it's main.main() (the main() function in the main package). Python famously doesn't require any such function, and won't automatically call a function called main() even if you create it. Recently I read Why doesn’t Python have a main function? (via), which puts forward one explanation for why this is so. However, I have a somewhat different way of explaining this situation.

The core reason that Python doesn't require a main() function is a combination of its execution model (specifically for what happens when you import something) and that under normal circumstances you start Python programs by (implicitly) importing a single file of Python code. So let's look at each of these parts.

In many languages things like functions, classes, and so on are created (defined) by the interpreter or compiler as it parses the source file. In Python, this is not quite the case; instead, def and class are executable statements, and they define classes and functions when they execute (among other things, this is part of why metaclasses work). When Python imports something, it simply executes everything in the file (or more generally, in whatever is being imported). When what's executed is def and class statements, you get functions and classes. When what's executed is regular code, you get more complicated things happening, including conditional imports or calling functions on the fly under the right conditions. Or you can write an entire program that just runs inline, as the file is imported.

(This has some interesting consequences, including what reloading a Python module really does.)
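
One small illustration of def as an executable statement (DEBUG is just a made-up module level flag):

DEBUG = True

if DEBUG:
  def log(msg):
    print("debug:", msg)
else:
  def log(msg):
    pass

# Which log() exists depends on what executed at import time.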

However, Python is not quite as unique here as it might look. Many languages have some facility to run arbitrary code early on as the program is 'loading', before the program starts normal execution (Go has init() functions, for example). Where Python is different from these languages is that Python normally starts a program by loading and executing a specific single file. Because Python is only executing a single file, it's unambiguous what code is run in what order and it's straightforward for the code in that file to control what happens. In a sense, rather than picking an arbitrarily named function for where execution (nominally) starts, Python is able to sneakily pick an arbitrarily named file by having you provide it.

(Compiled languages traditionally have a model where code from a bunch of separate files is all sort of piled up together. In Python, you can't really aggregate multiple files together into a shared namespace this way; one way or another, you have to import them and everything starts from some initial file.)

Where this nice model breaks down and needs a workaround is if you run a package with 'python -m ...', where Python doesn't really have a single file that you're executing (or it'd have to make __init__.py serve double duty). As covered in the official documentation's __main__ — Top-level script environment (via), Python adopts the arbitrary convention of loading a __main__.py file from your package and declaring it more or less the point where execution starts.

(Under at least some situations, your package's __init__.py may also be executed.)

PS: contrary to the original article's views, I strongly suggest that you have a main() function, because there are significant benefits to keeping your program importable.
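
(The usual idiom for this is small and conventional:)

def main():
  # ... the program's actual work goes here ...
  pass

if __name__ == "__main__":
  main()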

WhyNoMainFunction written at 23:27:55

2020-07-14

Today I learned that Python's argparse module allows you to abbreviate long command line options

Argparse is the standard Python library for Unix style command line argument handling; I use it in all of my Python programs these days. As part of this it supports the now standard long options, so you can have options like '--dead-disks' instead of just '-D' (there are only so many single characters to go around, and they can be hard to remember). Today I learned that argparse accepts abbreviations for these long options, provided that they're unambiguous. If you have a long option '--dead-disks' and no other long option that starts with 'd', argparse will accept all of '--dead-disk', '--dead', and even '--d' as meaning '--dead-disks'.
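
This is easy to demonstrate, since parse_args() will take an explicit argument list:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dead-disks", action="append")
print(parser.parse_args(["--dead", "sdc"]))
# -> Namespace(dead_disks=['sdc'])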

This is clearly documented in the argparse documentation if you bother to read it all (I never did), in Argument abbreviations (prefix matching). You can turn it off when constructing your ArgumentParser by setting the allow_abbrev keyword argument to False. Unfortunately I don't think there's anything visible in a program's --help output that will tell you whether or not it accepts abbreviated options; you have to either read the Python code or just try it with something harmless.

(It appears that optparse, argparse's predecessor, always allowed abbreviations, with no way to turn it off; I'm basing this on a little mention of abbreviated long options in this section. This makes argparse a clear improvement here, since at least you can turn abbreviated options off if you want to.)

With my user hat on, I think this is a fine little feature. Long options are often, well, long, and if you can abbreviate them you might as well, so people don't have to type as much (at least the people who don't have command line completion for options for your programs).

With my sysadmin hat on, I'm worried about the implications of abbreviated options accidentally getting embedded into other scripts, crontab entries, and so on. For instance, if the real option is '--dead-disks' but it's usually used with a single disk name, it would be easy to misremember it as '--dead-disk' and embed that mistake in a script. Although this works today, it risks confusion in people who later read the script (including your future self). With heavily abbreviated options, evolving the program to add more options risks making a previously unambiguous and working abbreviation ambiguous and broken. If you add a new '--deadline' argument and scripts were using '--dead' as an abbreviation for '--dead-disks', suddenly they're going to start getting errors.

(You can think of this as a version of Postel's law, in which case The Harmful Consequences of the Robustness Principle sort of applies.)

Given this concern, it's tempting to update at least some of our sysadmin tools to disable abbreviated command line options and perhaps to make it my default argparse setting in future tools.

ArgparseAbbreviatedOptions written at 23:08:14

2020-06-29

Adapting our Django web app to changing requirements by not doing much

We have a Django web application to handle (Unix) account requests, which is now nine years old. I've called this utility code, but I mentioned recently that over that time there have been some changes in how graduate students are handled, changes that required updates to the application. Except not very much change was necessary, in some ways, and in other ways the changes are hacks. So here are some stories of those changes.

When we (I) initially wrote the web application, our model of how new graduate students got Unix accounts was straightforward. All graduate students were doing a thesis (either a Masters or a PhD) and so all of them have a supervising professor. As a long standing matter of policy, that supervisor was their account sponsor, and so approved their account request. Professors can also sponsor accounts for other people associated with them, such as postdocs.

(This model already has a little glitch; some students are co-supervised by more than one professor. Our system requires one to be picked as the account sponsor, instead of somehow recording them as co-sponsored, which has various consequences that no one has complained about so far.)

The first change that showed up was that the department developed a new graduate program, the Master of Science in Applied Computing. Graduate students in the MScAC program don't write a thesis and as a result they don't have a supervising professor. As it happened, we already had a model for solving this, because Unix accounts for administrative and technical staff are not sponsored by professors either; they have special non-professor sponsors. So we added another such special sponsor for MScAC students. This was not sufficient by itself, because the account request system sometimes emails new graduate students and the way those messages were written assumed that the student's sponsor was supervising them.

Rather than develop a general solution to this, we took the brute force solution of an '{% if ...}' condition in the relevant Django template. Because of how our data is set up, this condition has to reach through several foreign keys and uses a fixed text match against a magic name, instead of checking any sort of flag or status marker (because no such status marker was in the original data model). Fortunately the name it matches against is not exposed to people, because the official name for the program has actually changed over time but our internal name has never been updated (partly because it was burned into the text template). This is a hack, but it works.
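
The shape of the hack is roughly the following, although the real chain of foreign keys and the real internal name are different (everything here is hypothetical):

{% if acctreq.sponsor.person.name == "MScAC Graduate Student" %}
[... MScAC specific wording about the graduate administrator ...]
{% else %}
[... the usual 'your supervisor will ...' wording ...]
{% endif %}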

The second change is that while all graduate students must eventually get a specific supervisor, not all of them have one initially when they arrive. In particular, there is one research group that accepts most new graduate students collectively and then sorts out who will supervise them later, once the graduate students know more about the group and their own interests. In the past, this had been solved artificially by assigning nominal sponsors immediately even if they weren't going to be the student's supervisor, but eventually the group got tired of this and asked us to do better. The solution here was similar to the MScAC program (and staff accounts); we invented a synthetic 'supervisor' for them, with a suitable generic name. Unlike with the MScAC program, we didn't customize the Django templates for this new situation, and unfortunately the result does look a little ugly and awkward.

(This is where a general solution would have been useful. If we were templating this from a database table or the like, we could have just added a new entry for this general research group case. Adding another Django '{% if ...}' to the template would have made it too tangled, so we didn't.)

I don't think we did anything clever in our Django application's code or its data model. A lot of the changes we were able to make were inherent in having a system that was driven by database tables and being able to add relatively arbitrary things to those tables (with some hacks involved). Where our changes start breaking down is exactly where the limitations of that start appearing, such as multiple cases in templates when we didn't design that into the database.

(Could we have added it later? Perhaps. But I've always been too nervous about database migrations to modify our original database tables, partly because I've never done one with Django. This is a silly fear and in some ways it's holding back the evolution of our web application.)

PS: You might think that properly dealing with the co-supervision situation would make the research group situation easy to deal with, by just having new graduate students 'co-sponsored' by the entire research group. It's actually not clear if this is the right answer, because the situations are somewhat different on the Unix side. When you actively work with a supervisor, you normally get added to their Unix group so you can access group-specific things (if there are any), so for co-supervisors you should really get added to the Unix groups for both supervisors. However, it's not clear if people collectively sponsored by a research group should be added to every professor's Unix group in the same way. This implies that the Django application should know the difference between the two cases so that it can signal our Unix account creation process to treat them differently.

Sidebar: Our name hack for account sponsors

When someone goes to our web page to request an account, they have to choose their sponsor from a big <select> list of them. The list is sorted on the sponsor's last name, to make it easier to find. The idea of 'first name' and 'last name' is somewhat tricky (as is their order), and automatically picking them out from a text string is even harder. So we deal with the problem the other way around. Our Django data model has a 'first name' and a 'last name' field, but what they really mean is 'optional first part of the name' and 'last part of the name (that will determine the sort order)'.

As part of this, the synthetic account sponsors generally don't have a 'first name', because we want them to sort in order based on the full description (such as 'MScAC Graduate Student', which sorts in M not G or S).

(Sorting on 'last name' is somewhat arbitrary, but part of it is that we expect people requesting accounts to be more familiar with the last name of their sponsor than the first name.)
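
In Django model terms there's nothing special going on; the trick is entirely in how the two fields are interpreted. A hypothetical sketch (not our actual model):

from django.db import models

class Person(models.Model):
  # Really 'optional first part of the name'.
  first_name = models.CharField(max_length=50, blank=True)
  # Really 'the part of the name we sort on'.
  last_name = models.CharField(max_length=50)

  class Meta:
    ordering = ["last_name"]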

DjangoAppAdaptations written at 01:09:47
