Wandering Thoughts

2024-02-04

I switched to explicit imports of things in our Django application

I wrote our Django application a long time ago, when I didn't know Django and was sort of in a hurry, so I used what I believe was the Django style of the time, which often meant broad imports of things from both Django modules and especially the application's other modules:

from django.conf.urls import *
from accounts.models import *

This wasn't universal; even at the time it was apparently partly the style to import only specific things from Django modules, and I followed that style in parts of our code.

However, when I moved the application to Python 3 I also switched all of these over to specific imports. This wasn't required by Django (or by Python 3); instead, I did it because it made my editor complain less. Specifically, it made Flycheck in GNU Emacs complain less (in my setup). I decided to make this change because I wanted to use Flycheck's list of issues to check for other, more serious problems, and because Flycheck specifically listed all of the missing or unknown imports. Since Flycheck listed them for me, I could readily add imports for everything it reported and watch the errors vanish. When I had everything necessary imported, Flycheck was nicely quiet (about that).

Some of the import lines wound up being rather long (as you can imagine, the application's views.py uses a lot of things from our models.py). Even so, this is probably better for a future version of me who has to look at this code later. Some of what comes from the application's models is obvious (like core object types), but not all of it; I was using some imported functions as well, and now the imports explicitly list where they come from. And for Django modules, I now have a list of what I'm using from them (often not much), so if things change in a future Django version (such as the move from django.conf.urls to django.urls), I'll be better placed to track down the new locations and names.
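As a concrete illustration, the switch looks something like this (the imported names here are invented for the example, not our actual ones):

# Before: a broad import pulls in an unknown set of names.
from accounts.models import *

# After: every name is spelled out, so my editor and a future me
# can both see exactly where things come from.
from accounts.models import (Account, Request,
                             get_sponsor, gen_random_hash)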

In theory I could have made this change at any time. In practice, I only made it once I'd configured GNU Emacs for good Python editing and learned about Flycheck's ability to show me the full error list. Before then, all of the pieces were too spread apart and too awkward for me to reach for.

(Of course, this isn't the first time that my available tools have influenced how I programmed in a way that I noticed.)

DjangoExplicitImportsSwitch written at 21:50:01; Add Comment

2024-02-03

Solving one of our Django problems in a sideways, brute force way

A few years ago I wrote about an issue with propagating some errors in our Django application. We have two sources of truth for user authorization: one outside of Django (Unix group membership, used by Apache HTTP Basic Authentication), and one inside Django, in a 'users' table. The two can become desynchronized, with someone in the Unix group but not in the application's users table. The application's 'retrieve a user record' function either returns the user record or raises an Http404 exception that Django automatically handles, which means that someone who hasn't been added to the users table gets 404 results for every URL, which isn't very friendly. I wanted to handle this by finding a good way to render a different error page in this case, either by customizing what the Http404 error page contained or by raising a different error.

All of this is solving the problem in the obvious way, and it would also be a cool thing to (try to) do in Django. Who doesn't want to write Python code that handles exceptional cases by, well, raising exceptions and then having them magically caught and turned into different rendered pages? But Django doesn't particularly support this, although I might have been able to add something by writing an application-specific piece of Django middleware that caught our custom 'no such user' exception and rendered an appropriate template as the response. However, this would have been my first piece of middleware, so I held off trying anything here until we updated to a modern version of Django (partly in the hope that it might have a solution).
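For illustration, such middleware might have looked something like this minimal sketch, where the NoSuchUserError exception, the template name, and the status code are all hypothetical stand-ins:

from django.shortcuts import render

# Hypothetical exception that our user-lookup function would raise
# instead of Http404.
class NoSuchUserError(Exception):
    pass

class NoSuchUserMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        return self.get_response(request)

    # Django calls process_exception() when a view raises, which
    # would let us turn NoSuchUserError into a friendlier page.
    def process_exception(self, request, exception):
        if isinstance(exception, NoSuchUserError):
            return render(request, "app/no-such-user.html", status=403)
        return None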

Then, recently, a simpler but rather less cool option for dealing with this whole issue occurred to me. We have a Django management command that checks our database for consistency in various ways (for example, unused records of certain types, or people in the application's users table who no longer exist), which we run every night from cron. Although it was a bit of a violation of 'separation of concerns', I could have that command know about the Unix group(s) that let people through Apache, and then have it check that all of the group members were in the Django users table. If people were omitted, we'd get a report. This is pretty brute force, and nothing guarantees that the command's list of groups stays in synchronization with our Apache configuration, but it works.
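The core of such a check is short. A rough sketch of the idea (the Account model, its login field, and the group name are all hypothetical):

import grp
from accounts.models import Account   # hypothetical users model

# Hypothetical list of the Unix group(s) our Apache configuration
# lets through; this has to be kept in sync with Apache by hand.
APACHE_AUTH_GROUPS = ["appusers"]

def check_unix_groups():
    # Logins the Django side knows about; 'Account' and its 'login'
    # field stand in for our real users model.
    known = set(Account.objects.values_list("login", flat=True))
    for gname in APACHE_AUTH_GROUPS:
        for member in grp.getgrnam(gname).gr_mem:
            if member not in known:
                print("in Unix group %s but not in Django: %s"
                      % (gname, member))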

It's also a better experience for people than the cool way I was previously considering, because it lets us proactively fix the problem before people encounter it, instead of only reactively fixing it after someone runs into it and reports the issue to us. Generally, we'll add someone to the Unix group, forget to add them to Django, and then get email about it the next day, before they ever try to use the application, letting us transparently fix our own mistake.

(This feels related to something I realized very early about not trying to do everything through Django's admin interface.)

DjangoSolvingProblemSideways written at 21:44:04; Add Comment

2024-02-01

Our Django application is now using Python 3 and a modern Django

We have a long standing Django web application that handles the process of people requesting Unix accounts here and having the official sponsor of their account approve it. For a long time, this web app was stuck on Python 2 and Django 1.10 after a failed attempt to upgrade to Django 1.11 in 2019. Our reliance on Python 2 was obviously a problem, and with the not-so-far-off end of life of Ubuntu 20.04 it was getting more acute (we use Apache's mod_wsgi, and Ubuntu 22.04 and later don't have a Python 2 version of that, for obvious reasons). Recently I decided that I had to slog through the process of moving to Python 3 and a modern Django (one that is actually supported), and that it was better to start early. To my pleasant surprise, the process of bringing it up under Python 3 and Django 4.2 was much less work than I expected, and recently we migrated the production version. At this point it's been running long enough (and has done enough) that I'm calling this upgrade a success.

There are a number of reasons for this smooth and rapid sailing. For a start, it turns out that my 2019 work to bring the app up under Python 3 covered most of the work necessary, although not all of it. Our previous problems with CSRF and Apache HTTP Basic Authentication have either been sidestepped by Django changes since 1.11 or perhaps mitigated by Django configuration changes based on a greater understanding of this area that I worked out two years ago. And despite some grumpy things I've said about Django in the past, our application needed very few changes to go from Django 1.10 to Django 4.2.

(Most of the Django changes seem to have been moving from 'load staticfiles' to 'load static' in templates, and replacing use of django.conf.urls.url() with django.urls.re_path(), although we could probably do our URL mapping better if we wanted to. There are other minor changes, like importing functions from different places, changing request.POST.has_key(X) to X in request.POST, and defining DEFAULT_AUTO_FIELD in our settings.)
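For example, the URL mapping change is close to mechanical; something like this, with an invented pattern and view:

# Django 1.10 era:
from django.conf.urls import url
from accounts import views

urlpatterns = [
    url(r'^accounts/$', views.accountlist),
]

# Django 4.2: the same regular expression, just a new import and name.
from django.urls import re_path

urlpatterns = [
    re_path(r'^accounts/$', views.accountlist),
]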

Having this migration done and working takes a real load off of my mind, for the obvious reasons; neither Python 2 nor Django 1.10 is what we should really be using today, even if they work, and now we're free to upgrade the server hosting this web application beyond Ubuntu 20.04. I'm also glad that it took relatively little work now.

(Probably this will make me more willing to keep up to date with Django versions in the future. We're not on Django 5.0 because it requires a more recent version of Python 3 than Ubuntu 20.04 has, but that will probably change this summer or fall as we start upgrades to Ubuntu 24.04.)

DjangoAppNowPython3 written at 23:06:25; Add Comment

2024-01-30

Putting a Python executable in venvs is probably a necessary thing

When I wrote about getting the Python LSP server working with venvs in a brute force way, Ian Z aka nobrowser commented (and I'm going to quote rather than paraphrase):

I'd say that venvs themselves are "aesthetically displeasing". After all, having a separate Python executable for every project differs from having a separate LSP in degree only.

On Unix, this separate executable is normally only a symbolic link (although other platforms may differ), and the venv will normally have its own copy of pip, setuptools, and some other things, which can amount to 20+ Mbytes even on Linux. However, when I thought about it, I don't think there's any good option other than for the venv to have its own (nominal) copy of Python. The core problem is that venvs are very convenient when they're more or less transparently activated.

A Python venv is marked by a special file in its root directory, pyvenv.cfg. There are two ways that Python could plausibly decide when to automatically activate a venv without you having to set any environment variables: it can look for this marker near the Python executable you ran (which is what it does today), or it could look at your current directory, traversing up the filesystem to see if it finds a pyvenv.cfg (in much the same way that version control systems look for their special .git or .hg directory to mark the repository root).
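A typical pyvenv.cfg is tiny. The exact keys, paths, and version vary with how and where the venv was created, but on a Linux system it looks something like:

home = /usr/bin
include-system-site-packages = false
version = 3.10.6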

The problem with automatically activating a venv based on what you find in the current directory and its parents is that it makes Python programs (and the Python interpreter) behave differently depending on where you are when you run them, including random system utilities that just happen to be written in Python. If the program requires any packages beyond the standard library, it may well fail outright because those packages aren't installed in the venv, and if they are installed in the venv they may not be the version the program needs or expects. This isn't a particularly good experience and I'm pretty confident that people would be very unhappy if this was what Python did with venvs.

The other option is to not automatically activate venvs at all and to always require you to set environment variables (or the local equivalent). The problem with this is that it's a terrible experience for actually using venvs to, for example, deploy programs as encapsulated entities. You can't just ship the venv and have people run programs that have been installed into its bin/ subdirectory; now they need cover scripts to set the venv environment variables (which might be automatically generated by pip or whatever, but still).

So on the whole embedding the Python interpreter seems the best choice to me. That creates a clear logic to which venv is automatically activated, if any, that can be predicted by people; it's the venv whose Python you're running. Of course I wish it didn't take all of that disk space for extra copies of pip and setuptools, but you can't have everything.

VenvsAndEmbeddedPython written at 21:28:13; Add Comment

2024-01-27

Getting the Python LSP server working with venvs the brute force way

Recently I wound up doing some Django work using a Python venv, since this is the easiest way to get a self-contained Python environment that has some version of Django (or other applications) installed. However, one part of the experience was a little bit less than ideal. I normally write Python using GNU Emacs and the Python LSP server, and this environment was complaining about being unable to find Django modules to do code intelligence things with them. A little thought told me why; GNU Emacs was running my regular LSP server, which is installed through pipx into its own venv, and that venv didn't have Django installed (of course).

As far as I know, the Python LSP server doesn't have any specific support for recognizing and dealing with venvs, and in general this is a bit difficult (the version of Python being used by a venv may not even match the version that pylsp is running with; in fact this was my case, since my installed pylsp was using pypy but the Django venv was using the system Python 3). Rather than investigate this deeply, I decided to solve the problem with brute force, which is to say that I installed the Python LSP server (with the right set of plugins) into the venv along with everything else, and then ran that instance of GNU Emacs with its $PATH set to use the venv's bin/ directory and pick up everything there, including its Python 3 and python-lsp-server.

This is a little bit aesthetically displeasing for at least two reasons. First, the Python LSP server plus its plugins and their dependencies isn't a small thing, and anyway it's not a runtime dependency of the application; it's there purely for development convenience. Second, the usual style of using GNU Emacs is to start it once and then reuse that single Emacs instance for everything, which naturally gives that Emacs instance a single $PATH and makes it want to use a single version of python-lsp-server. I'm okay with deviating from this bit of Emacs practice, but other people may be less happy.

(A hack that deals with the second issue would be a 'pylsp' cover script that hunts through the directory tree to see if you're running it from inside a venv and if that venv has its own 'pylsp' binary; if both are true, you run that pylsp instead of your regular system-wide one. I may write this hack someday, partly so that I can stop having to remember to add the venv to my $PATH any time I want to fire up Emacs on the code I'm working on in the venv.)
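If I ever write that hack, I'd guess the core of it would look something like this (an untested sketch; the fallback path is a placeholder, and a real script would need to make sure it doesn't re-run itself):

#!/usr/bin/env python3
# Hypothetical 'pylsp' cover script: walk up from the current
# directory looking for a venv that has its own pylsp binary.
import os
import sys

d = os.getcwd()
while d != "/":
    pylsp = os.path.join(d, "bin", "pylsp")
    if os.path.exists(os.path.join(d, "pyvenv.cfg")) and \
       os.path.exists(pylsp):
        os.execv(pylsp, [pylsp] + sys.argv[1:])
    d = os.path.dirname(d)

# No venv found; fall back to the regular system-wide pylsp
# (the path here is a made-up placeholder).
os.execv("/usr/local/bin/pylsp", ["pylsp"] + sys.argv[1:])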

PythonVenvAndLSPServer written at 21:50:05; Add Comment

2024-01-19

A Django gotcha with Python 3 and the encoding of CharFields

We have a long standing Django web application, which has been static for a long time in Python 2. I've recently re-started working on moving it to Python 3 (an initial round of work was done some time ago), and in the process of this I ran into a surprising issue involving text encoding and database text fields (a 'CharField' in Django terminology).

As part of one database model, we have a random key, which for various reasons is represented in the database model as a modest size string:

class Request(models.Model):
  [...]
  access_hash = models.CharField(..., default=gen_random_hash)

(We use SQLite as our database, which may be relevant here.)

The actual access hash is a 64-bit random value read from /dev/urandom. We could represent this in a variety of ways; for instance, I could have just treated it as a 64-bit unsigned decimal number in string form, or a 64-bit (unsigned) hex number. But for no particularly strong reason, long ago I decided to base64 encode the raw random value. Omitting error checking, the existing version of this is:

import base64

def gen_random_hash():
  fp = open("/dev/urandom", "rb")
  c = fp.read(8)
  fp.close()
  # Trim off trailing awkward '=' character
  return base64.urlsafe_b64encode(c)[:-1]

(The use of "rb" as the open mode stems from my first round of updates for Python 3.)

When I ran our web application under Python 3 in testing mode and looked at the uses of this access hash, I discovered that the URLs we were generating for it in email included a couple of '&#x27;' in them. Inspecting the database table entry itself in the Django admin interface showed that the actual value of the access hash was, for example (and this is literal):

b'm5AWGlUSR1c'

(0x27 is ', so I was getting "b&#x27;m5AWGlUSR1c&#x27;" in the URL when it was written out for email in HTML. Once I looked, I could see that giveaway leading 'b' too.)

There are two things going on here. The first is that in Python 3, base64.urlsafe_b64encode operates on bytes (which we're giving it, since we read /dev/urandom in binary mode, making c a bytes object) and returns a bytes object, not a string. The second is that when we ask Django to store a bytes object in a CharField (possibly only as the result of a callable default value), Django string-izes it, yielding the b'...' form as the stored value.
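Both halves are easy to see in an interactive session; working backwards from the stored value above, the raw random bytes would have been:

>>> import base64
>>> c = b'\x9b\x90\x16\x1aU\x12GW'
>>> base64.urlsafe_b64encode(c)[:-1]
b'm5AWGlUSR1c'
>>> str(base64.urlsafe_b64encode(c)[:-1])
"b'm5AWGlUSR1c'"

The last line is the stringization that wound up stored in the CharField.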

At one level this is reasonable; Django is doing its best to store some random Python object into a string field, probably by just doing a str() on it. At another level, I wish Django would specifically refuse to do this conversion for bytes objects, because one Python 3 issue is definitely bytes/str confusions and this specific representation conversion is almost certainly a bad idea, unlike str()'ing things in general. Raising an exception by default would be much more useful.

The solution is to explicitly convert to a Unicode str and specify a suitable character set encoding, which for base64 can be 'ascii':

return str(base64.urlsafe_b64encode(c)[:-1], "ascii")

This causes the CharField values to look like they should, which means that URLs using the access hash no longer have '&#x27;' in them.

Hopefully there aren't any other cases of this lurking in our Django web application, but I suppose I should do some more testing and examine the database for alarming characters (which is relatively readily done with the management dumpdata command).

DjangoPython3FieldEncodingGotcha written at 22:36:51; Add Comment

2023-11-18

Using argparse in my Python programs encourages me to add options to them

Today, for reasons outside the scope of this entry, I updated one of my very old little Python utilities to add some features, moving it to Python 3 in the process. This program was so old that it was using getopt, so as part of updating it I switched it over to argparse, which is what I use in all of my modern programs. The change was positive in a variety of ways, but one thing it did was immediately cause me to add some more command line options. This isn't due to anything specific to this program, because I've had much the same experience over and over again; my argparse-based programs have more options (and often better structured ones).

In thinking about it, I believe there are a couple of reasons for this. First, argparse makes it easy to add the basic support for an option in a single place, in an ArgumentParser.add_argument() call. Unlike with getopt's much more primitive argument handling, I don't have to modify code in several places just to have my program accept the option, handle it to set or record some value, and include it in usage and help. Even if the option does nothing, this is an easy first step.

Second, that argparse generates an 'args' (or 'options') object with all of the parsed information on it often makes it easy to expose new options to the code that needs to look at them. My usual pattern with argparse is to pass the 'opts' object you get from ArgumentParser.parse_args() down to functions that get called to do the work, rather than break out specific flags, options, and so on separately. This means that a new option is usually pervasively available through the code and doesn't have to be passed in specifically; I can just check 'opts.myNewOption' wherever I want to use it. By contrast, getopt didn't create a natural options object so I tended to pass around options separately (or worse, set global variables because it was the easy lazy way).
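In concrete terms, the pattern looks something like this (the option and function names are invented for the example):

import argparse

def process(fname, opts):
    # 'opts' came along for the ride, so a newly added option is
    # available here without changing any function signatures.
    if opts.verbose:
        print("processing", fname)

def main():
    p = argparse.ArgumentParser()
    p.add_argument("-v", "--verbose", action="store_true",
                   help="report what we're doing")
    p.add_argument("files", nargs="+", help="files to process")
    opts = p.parse_args()
    for fname in opts.files:
        process(fname, opts)

main()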

This doesn't always work; sometimes I need the new option in a place where I haven't already passed in the opts object. But it works a lot of the time, and when it works it makes it that much easier to code up the use of the new option.

(This also means it's easy to reverse out of an option I decide that I don't want after all. I can delete the creation of it and the use of it without having to go through changing function arguments around. Similarly it's easy to change how an option works because that too doesn't touch many places in the code.)

ArgparseEncouragesOptions written at 22:33:51; Add Comment

2023-11-17

My first Django database migration worked fine

We have a long standing Django application, using SQLite as the database layer because that's easy and all we need at our tiny scale. For as long as we've had the application I've not so much as breathed on its database schema (which is to say its model), because the thought of trying to do any sort of database migration was a bit scary. For reasons outside the scope of this entry, we recently decided that it was time to add some fields to our application model, so I got to (or had to) try out Django's support for more or less automatic database migrations. The whole experience was pretty painless in my simple case, although I had some learning experiences.

The basics of Django migrations are well covered by Django's documentation. For straightforward changes, you change your model(s) and then run the 'makemigrations' Django subcommand, supplying the name you want for your migration; Django will write a new Python file to do the work. Once you've made the migration you can then apply it with the 'migrate' subcommand, and in at least some cases un-apply it again. Our changes added simple fields that could be empty, which is about the simplest case you can get; I don't know how Django handles more complicated cases, for example introducing a mandatory non-null field.
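In command form the whole cycle is short; a sketch, with a made-up app label and migration names:

python manage.py makemigrations --name add_request_fields accounts
python manage.py migrate
# To un-apply, migrate the app back to the previous migration:
python manage.py migrate accounts 0004_previous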

One learning experience is that Django will want to create additional new migrations if you tinker with the properties of your (new) model fields after you've created (and probably applied) the initial migration in your development environment. For me, this happened relatively naturally as I was writing Django code to use these new fields now that we had them. You probably don't want to wind up with a whole series of iterating migrations, so you're going to want to somehow squash these down to a single final migration for the production change. Since we use SQLite, I wound up just repeatedly removing the migration file and reverting my development database to start again, rather than tinkering around with un-applying the migration and trying to get Django to rewrite it.

(Now that I'm reading the documentation all the way through, there's a section on Squashing migrations, but it seems complex and not necessarily quite right for this case.)

While it's not strictly a migration issue as such, one thing that I initially forgot to do when I added new model fields was to also update the settings for our Django admin interface to display them. Partly this happened because it's been so long since I touched this code that I'd somewhat forgotten how it all worked until I looked at the admin interface, didn't see these fields in the particular model object, and started digging.

Although the Django migration documentation contains scary warnings about doing migrations with a SQLite database, I had no problem doing it in production with our generally barely used web application (we normally measure activity in 'hits per month'). Given how Django does database migrations on SQLite, a sufficiently active web application might have to shut down during the migration so that the database could be fiddled with without problems. In general, shutting down will avoid a situation where your running code is incompatible with the state of your database, which can definitely happen even in simple changes.

(Some people probably deploy such a database migration in multiple steps, but with a tiny application we did it the single step way, deploying a new version of the code that used the new fields at the same time as we performed the database migration on the production database.)

My overall experience is quite positive; changing our model was nowhere near as scary or involved as I expected it to be. I suspect I'm going to be much more willing to make model changes in the future, although I still don't think of it as a casual thing.

DjangoDBMigrationWentOkay written at 23:12:51; Add Comment

2023-10-11

The wisdom of being selective about python-lsp-server plugins

As far as I know, the most common Python server for the Language Server Protocol is still python-lsp-server (pylsp), although there are now some alternatives (see eg the Emacs lsp-mode page on this for all languages). Pylsp can use a number of plugins (what it calls 'providers') that provide additional checks, warnings, and so on; some are integrated into the pip installation process if you ask for them, and some of them require injecting third party packages. One of these integrated ones is support for pycodestyle.

Although I may have started out not including pycodestyle in my pylsp setup, at some point I started adding it; perhaps I thought it wouldn't do any harm, and it didn't seem to produce obnoxious warnings. Then I tried another Emacs LSP implementation and discovered that I actually hated pycodestyle's complaints about style. The reason I hadn't noticed before is that lsp-mode appears to ask pylsp to disable it (the diagnostics you get from an LSP server depend both on your editor's setup and on the LSP server).

Although it would be nice if the LSP support in all editors made it straightforward to have an LSP server disable things, the larger lesson for me is that I should stop hitting myself in the face. If I don't want pycodestyle's copious, picky complaints about exactly how I format my Python code, I shouldn't include it in my pylsp setup in the first place. The corollary is that before I include a linter or a checker in my pylsp setup, it might be wise to try it out separately to see if I like it (before or after customization, as might be necessary for something like ruff). Testing a checker or linter as a standalone thing doesn't guarantee I'll get exactly the same results in pylsp (or any other LSP server), but it's at least something.

(Thus, I think you shouldn't install 'python-lsp-server[all]', because you may be getting a number of surprises. Even if you like what you get now, if you reinstall this later, pylsp may have changed its idea of what 'all' should include in the mean time.)
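With pipx, the selective version of this looks something like the following, where pylsp-mypy is merely an example of a third-party plugin you might have vetted and decided you want:

# Install the bare LSP server rather than 'python-lsp-server[all]'.
pipx install python-lsp-server
# Then inject only the plugins you've actually tried and liked.
pipx inject python-lsp-server pylsp-mypy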

This also suggests that I should be minimal in my pylsp configuration, just in case. Since I'm going to see the LSP warnings all of the time, more or less, I should stick to things that provide information I almost always want. I can always run additional checkers by hand or by semi-automating them in my editor or even a Makefile.

(I continue to think that setting up Python LSP support in GNU Emacs is reasonably worth it. In Emacs 29.1 you don't even need any third party packages, since 29.1 includes Eglot.)

PS: It's nice to see that Python has picked up a surprising number of linters and checkers while I wasn't looking. A part of me is still way back in the Python 2 era where my option was basically pychecker. And pipx makes it easy to try them out.

PylspBeSelectiveOnPlugins written at 22:37:53; Add Comment

2023-09-15

Ensuring that my URL server and client programs exit after problems

I recently wrote about my new simple system to open URLs on my desktop from remote machines, where a Python client (on the remote machine) listens on a Unix domain socket for URLs that programs (like mail clients) want opened, and reports these URLs to the server on my desktop, which passes them to my browser. The server and client communicate over SSH; the server starts by SSH'ing to the remote machine and running the client. On my desktop, I run the server in a terminal window, because that's the easy approach.

Whenever I have a pair of communicating programs like this, one of my concerns is making sure that each end notices when the other goes away or the communication channel breaks, and cleans itself up. If the SSH connection is broken or the remote client exits for some reason, I don't want the server to hang around looking like it's still alive and functioning; similarly, if the server exits or the SSH connection is broken, I want the remote client to exit immediately, rather than hang around claiming to other parties that it can accept URLs and pass them to my desktop to be opened in a browser.

On the server this is relatively simple. I started with my standard stanza for Python programs that I want to die when there are problems:

import signal

signal.signal(signal.SIGINT, signal.SIG_DFL)
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
signal.signal(signal.SIGHUP, signal.SIG_DFL)

If I was being serious I should check to see what SIGINT was initially set to, but this is a casual program, so I'll never run it with SIGINT deliberately masked. Setting SIGHUP isn't necessary today, but I didn't remember that until I checked, and Python could change it someday.

Since all the server does is read from the SSH connection to the client, I can detect both client exit and SSH connection problems by looking for end of file, which is signalled by an empty read result:

def process(host: str) -> None:
  pd = remoteprocess(host)
  assert(pd.stdout)
  while True:
    line = pd.stdout.readline()
    if not line:
      break
    [...]

As far as I know, our SSH configurations use TCP keepalives, so if the connection between my machine and the server is broken, both ends will eventually notice.
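For reference, the OpenSSH knobs involved look like this in ssh_config form (plausible settings, not necessarily our actual configuration):

Host *
    # TCP-level keepalives; this is OpenSSH's default.
    TCPKeepAlive yes
    # Optional SSH-protocol-level probes of the other end.
    ServerAliveInterval 60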

Arranging for the remote client to exit at appropriate points is a bit harder and involves a hack. The client's sign that the server has gone away is that the SSH connection gets closed, and one sign of that is that the client's standard input gets closed. However, the client is normally parked in socket.accept() waiting for new connections over its Unix socket, not trying to read from the SSH connection. Rather than write more complicated Python code to try to listen for both a new socket connection and end of file on standard input (for example using select), I decided to use a second thread and brute force. The second thread tries to read from standard input and forces the entire program to exit if it sees end of file:

def reader() -> None:
  while True:
    try:
      s = sys.stdin.readline()
      if not s:
        os._exit(0)
    except EnvironmentError:
      os._exit(0)

[...]
def main() -> None:
  [the same signal setup as above]

  t = threading.Thread(target=reader, daemon=True)
  t.start()

  [rest of code]

In theory the server is not supposed to send anything to the client, but in practice I decided that I would rather have the client exit only on an explicit end of file indication. The use of os._exit() is a bit brute force, but at this point I want all of the client to exit immediately.

This threading approach is brute force but also quite easy, so I'm glad I opted for it rather than complicating my life a lot with select and similar options. These days maybe the proper easy way to do this sort of thing is asyncio with streams, but I haven't written any asyncio code.
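For what it's worth, I'd guess the skeleton of an asyncio version would look something like this (an untested sketch; the socket path is a placeholder and the connection handler is elided):

import asyncio, os, sys

# Exit the whole program when standard input hits end of file.
async def watch_stdin():
    loop = asyncio.get_running_loop()
    reader = asyncio.StreamReader()
    await loop.connect_read_pipe(
        lambda: asyncio.StreamReaderProtocol(reader), sys.stdin)
    while True:
        if not await reader.readline():
            os._exit(0)

async def handle_conn(reader, writer):
    ...  # read URLs from the socket, write them to standard output

async def main():
    # Keep a reference so the watcher task isn't garbage collected.
    stdin_task = asyncio.create_task(watch_stdin())
    server = await asyncio.start_unix_server(
        handle_conn, path="/tmp/url-socket")
    async with server:
        await server.serve_forever()

asyncio.run(main())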

(I may take this as a challenge and rewrite the client as a proper asyncio based program, just to see how difficult it is.)

All of this appears to work in casual testing. If I Ctrl-C the server in my terminal window, the remote client dutifully exits. If I manually kill the remote client, my local server exits. I haven't simulated having the network connection stop working and having SSH recognize this, but my network connections don't get broken very often (and if my network isn't working, I won't be logged in to work and trying to open URLs on my home desktop).

URLServerInsuringExits written at 22:04:18; Add Comment
