Wandering Thoughts

2019-10-16

Some magical weirdness in Django's HTML form classes

In Django, HTML forms are defined in what seems to be the conventional approach; each form is a Python class and fields are variables defined in the class definition. This looks like this, to crib a skeleton from our Django web application:

class RequestForm(forms.Form):
  login = forms.CharField([...])
  name  = forms.CharField([...])
  email = forms.EmailField([...])
  aup_agree = forms.BooleanField([...])

You instantiate an instance of RequestForm either with initial data (when you're displaying the form for the first time) or with data from the POST or GET form, and then call various standard Django API functions on it and so on. It's all covered in the Django documentation.

What I didn't remember until I started looking just now is that this Python class definition we wrote out is kind of a lie. Django model classes mostly work like they look, so for example the model version of a Request has a Request.login attribute just as its class definition does. Forms are significantly different. Although we set up what looks like class attributes here, our actual RequestForm class and class instances do not have, say, a RequestForm.login attribute. All of the form fields we seemed to define here get swept up and put in the .fields dictionary of each form instance (by way of the class's base_fields).

At one level this is documented and probably the safest option, given that the data ultimately comes from an untrusted source (ie, an HTTP request). It also means that you mostly can't accidentally use a form instance as a model instance (for example, by passing the wrong thing to some function); if you try, it will blow up with attribute errors.

(The 'mostly' is because you can create a login attribute on a RequestForm instance if you write to it, so if a function that writes to fields of a model instance is handed a form instance by accident, it may at least half work.)
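Django does this sweeping with a metaclass. The following is a much-simplified, pure-Python sketch of the idea, not Django's actual code; Field and DeclarativeMeta are hypothetical stand-ins. As far as I know, real Django forms keep the collected fields in the class's base_fields and give each instance its own deep copy of them as .fields, which is what the sketch imitates.

```python
import copy


class Field:
    """Hypothetical stand-in for a Django form field."""

    def __init__(self, label=""):
        self.label = label


class DeclarativeMeta(type):
    """Move Field class attributes out of the class namespace into base_fields."""

    def __new__(mcs, name, bases, attrs):
        # Pull every Field out of the class body, keeping definition order.
        declared = {}
        for key, value in list(attrs.items()):
            if isinstance(value, Field):
                declared[key] = attrs.pop(key)
        cls = super().__new__(mcs, name, bases, attrs)
        # Start from any fields inherited from parent forms, then add ours.
        base_fields = {}
        for base in reversed(cls.__mro__[1:]):
            base_fields.update(getattr(base, "base_fields", {}))
        base_fields.update(declared)
        cls.base_fields = base_fields
        return cls


class Form(metaclass=DeclarativeMeta):
    def __init__(self, data=None):
        self.data = data or {}
        # Each instance gets its own copy of the fields, not class attributes.
        self.fields = copy.deepcopy(self.base_fields)


class RequestForm(Form):
    login = Field("Login")
    name = Field("Name")


form = RequestForm()
print(hasattr(RequestForm, "login"), hasattr(form, "login"))  # False False
print(list(form.fields))  # ['login', 'name']
```

Writing form.login = ... afterward simply creates an ordinary instance attribute, which is the 'mostly' above.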

At another level, this is another way that Django's classes are non-Python magic. What look like class attributes aren't even properties; they've just vanished. Conventional Python knowledge is not as much use for dealing with Django as it looks, and you have to know the API (or look it up) even for things that look like they should be basic and straightforward.

(I don't have an opinion any more about whether the tradeoffs here are worth it. Our Django app just works, which is what we really care about.)

DjangoFormClassMagic written at 21:54:06

2019-09-07

CentOS 7 and Python 3

Over on Twitter, I said:

Today I was unpleasantly reminded that CentOS 7 (still) doesn't ship with any version of Python 3 available. You have to add the EPEL repositories to get Python 3.6.

This came up because of a combination of two things. The first is that we need to set up CentOS 7 to host a piece of commercial software, because CentOS 7 is the most recent Linux release it supports. The second is that an increasing number of our local management tools are now in Python 3 and for various reasons, this particular CentOS 7 machine needs to run them (or at least wants to) when our existing CentOS 7 machines haven't. The result was that when I set up various pieces of our standard environment on a newly installed CentOS 7 virtual machine, they failed to run because there was no /usr/bin/python3.

At one level this is easily fixed. Adding the EPEL repositories is a straightforward 'yum install epel-release', and after that installing Python 3.6 is 'yum install python36'. You don't get a pip3 with this and I'm not sure how to change that, but for our purposes pip3 isn't necessary; we don't install packages system-wide through pip except under exceptional circumstances.

(The current exceptional circumstance is Tensorflow on our GPU compute servers. These run Ubuntu 18.04, where pip3 is available more or less as standard. If we had general-use CentOS 7 machines it would be an issue, because pip3 is necessary for personal installs of things like the Python LSP server.)

Even having Python 3.6 instead of 3.7 isn't particularly bad right now; our Ubuntu 16.04 machines have Python 3.5.2 and even our 18.04 ones only have 3.6.8. Even not considering CentOS 7, it will be years before we can safely move any of our code past 3.6.8, since some of our 18.04 machines will not be upgraded to 20.04 next year and will probably stay on 18.04 until early 2023 when support starts to run out. This is surprisingly close to the CentOS 7 likely end of life in mid 2024 (which is much closer than I thought before I started writing this entry), so it seems like CentOS 7 only having Python 3.6 is not going to hold our code back very much, if at all.
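One way to make a '3.5 and up' floor explicit is a small startup check, so that a tool run on an older Python fails with a clear message instead of something confusing later. This is a minimal sketch; MIN_PYTHON and python_ok() are hypothetical, and the check has to live in a file that the older Python can still parse (newer syntax elsewhere in the same file would fail before the check runs).

```python
import sys

# Hypothetical floor: the oldest Python 3 the code still has to run on.
MIN_PYTHON = (3, 5)


def python_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) Python is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); sys.version_info supports slicing.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_ok():
        sys.exit("need Python %d.%d or later, have %s"
                 % (MIN_PYTHON + (sys.version.split()[0],)))
```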

(Hopefully by 2023 either EPEL will have a more recent version of Python 3 available on CentOS 7 or this commercial software will finally support CentOS 8. I can't blame them for not supporting RHEL 8 just yet, since it's only been out for a relatively short length of time.)

PS: I don't know what the difference is between the epel-release repositories you get by doing it this way and the epel-release-latest repositories you get from following the instructions in the EPEL wiki. The latter repos still don't seem to have Python 3.7, so I'm not worrying about it; I'm not very picky about the specific version of Python 3.6 I get, especially since our code has to run on 3.5 anyway.

Python3AndCentOS7 written at 23:24:39

2019-09-04

If you use the rarfile module, make sure you're using version 3.0 (or later)

We have a Python program to log mail attachment information for Exim. One of the things it does as part of looking at attachments is try to look inside various sorts of archives to find out what sort of things are in there (occasionally the answer is interesting). Back in the summer of 2016, I added the ability to look inside RAR archives using the rarfile module, in addition to ZIPs and tar files (for which I was using the standard library's modules). At the time our mail machines were running a mixture of Ubuntu 12.04 and Ubuntu 14.04, neither of which had the rarfile module available pre-packaged, but the recently released Ubuntu 16.04 packaged rarfile 2.7. Since the module itself is a single pure Python file, I just copied the 16.04 package's rarfile.py into our program's local source and left things there.

(I believe that rarfile 2.8 had been out for about a month at that point, but it didn't seem worth deviating from the Ubuntu version. At that point I was hoping to switch to the official Ubuntu package when we upgraded all of the mail machines to Ubuntu 16.04, so we could theoretically let Ubuntu worry about its version.)

Over time (starting no later than the fall of 2017), we noticed a slowly increasing number of MIME attachments with .rar extensions that we couldn't get the RAR archive contents for. Often our libmagic-based content sniffing (using the magic module) would say that these really were RAR archives (or at least what it thought were RAR archives), and frequently our commercial anti-spam system would detect malware in them. Recently this reached a tipping point (cf) where I decided to see if updating the rarfile module to the current version would improve the situation and let us look into more RAR archives.

The answer is yes. It turns out that there is a 'new' RAR archive format called RAR5, and rarfile added support for this format in version 3.0 (which was released at the end of 2016); before then, rarfile only supported the RAR3 format. Unsurprisingly, over time more and more RAR archives have been created using RAR5 format instead of RAR3 (although use of RAR3 is still surprisingly frequent in email attachments we get). To be able to read as many RAR archives as possible, you want rarfile 3.0 or later so it supports both RAR3 and RAR5 formats. Right now the 'or later' clause is not really important, since 3.0 is the latest released version.
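The two formats can be told apart by their leading marker bytes, which is roughly how a content sniffer distinguishes them. Here is a minimal sketch of that check; rar_format() is a hypothetical helper, not part of rarfile's API, and because it only looks at the very start of the data it will miss self-extracting archives that have an executable stub in front.

```python
# Marker bytes at the start of RAR 1.5 to 4.x ("RAR3") archives.
RAR3_SIGNATURE = b"Rar!\x1a\x07\x00"
# RAR5 archives use a one-byte-longer marker that differs in its seventh byte.
RAR5_SIGNATURE = b"Rar!\x1a\x07\x01\x00"


def rar_format(data):
    """Return 'RAR3', 'RAR5', or None based on the leading bytes of `data`."""
    if data.startswith(RAR5_SIGNATURE):
        return "RAR5"
    if data.startswith(RAR3_SIGNATURE):
        return "RAR3"
    return None
```

In use you would read the first eight bytes of an attachment in binary mode and pass them in.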

(WinRAR started supporting RAR5 in late 2013, but my impression is that there are a lot of third party tools and third party RAR code out there. Apparently a fair amount of it has been slow to implement RAR5 or at least to default to it for new archives, much like the rarfile module.)

The rarfile module doesn't move very fast and it kept working for us in general, which is a large part of why I let it just sit there (and had we updated the mail machines to Ubuntu 18.04 and switched to the Ubuntu packaged version, we'd have automatically fixed the problem, as Ubuntu 18.04 packages 3.0). But it's an interesting experience in quietly outdated dependencies, where a more recent version would have improved our experience (and without us having to do anything).

Locking or otherwise freezing dependencies is a very common way to get stability and guarantee reproducible deployments, and that's very popular with a lot of people (me included). But what happened to us is the drawback of that stability, especially for those programs and apps that are complete and which thus have no natural ongoing changes that provide a push to at least check the state of dependencies.
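One cheap way to get back some of that push for a vendored single-file module is to have the program compare the module's version against a known minimum when it starts. This is a minimal sketch, assuming the module sets a purely numeric dotted __version__ string (many single-file modules do, but check yours); outdated() and version_tuple() are hypothetical helpers.

```python
def version_tuple(version):
    """Turn a purely numeric dotted version like '2.7' or '3.0' into (2, 7) or (3, 0)."""
    return tuple(int(part) for part in version.split("."))


def outdated(module, minimum):
    """Return a complaint if `module` is older than `minimum` (a tuple), else None."""
    version = getattr(module, "__version__", None)
    if version is None:
        return "%s has no __version__; cannot check it" % module.__name__
    if version_tuple(version) < minimum:
        return "%s %s is older than %s" % (
            module.__name__, version, ".".join(str(n) for n in minimum))
    return None
```

For example, outdated(rarfile, (3, 0)) would complain about the 2.7 copy we had been carrying around.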

PS: Updating to use rarfile 3.0 required no changes in our program, although we only use a very small portion of the module's capabilities. As far as I can tell, our code doesn't even notice whether the RAR archive is in RAR3 or RAR5 format.

UpdatingToRarfile30 written at 22:29:31

2019-08-30

How I'm dealing with my Python indentation problem in GNU Emacs

The current (cultural) standard for indentation in Python is four space indent levels and indenting only with spaces, never tabs; this is what GNU Emacs' python mode defaults to and what YAPF and other code formatters use. Our new and updated Python 3 code is written in this official standard, as is some relatively recent Python 2 code. However, I spent a very long time writing Python code using 8-space indent levels and tab-based indentation, which means that I have a great deal of existing Python code in this style, including almost all of our existing Python 2 code at work and all of DWiki. For various reasons I don't want to reformat or reindent all of this code, so I want to work on existing code in its current style, whatever that is. Because Python 3 doesn't like it when you mix spaces and tabs, this should include the use of tabs in indentation.
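Concretely, Python 3 rejects a block whose indentation only lines up if a tab is exactly eight columns wide, where Python 2 by default quietly treated a tab as eight columns (unless run with -t or -tt). A small sketch showing the difference with compile(); the example strings and indentation_ok() are just for illustration.

```python
# Same block, once indented consistently with tabs and once mixing a tab
# with eight spaces (equal only if a tab is exactly eight columns wide).
ALL_TABS = "if True:\n\tx = 1\n\ty = 2\n"
TAB_THEN_SPACES = "if True:\n\tx = 1\n        y = 2\n"


def indentation_ok(source):
    """Return False if Python 3 rejects `source` for mixing tabs and spaces."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return False
    return True


print(indentation_ok(ALL_TABS), indentation_ok(TAB_THEN_SPACES))  # True False
```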

My existing .emacs settings for this across various different systems were basically an inconsistent mess. On my desktop I was reflexively clinging to my old indentation style with various python mode settings; on our Ubuntu login servers, I'd stopped overriding the python mode defaults due to shifting toward the standard style, but that left me with the tab problem. Today, as part of dealing with my .emacs in general, I decided that I wanted to have the same .emacs everywhere, and that drove me to actively work out a solution.

First, I realized that if I was willing to really commit to shifting my indentation style to the standard one, the only real problem I had was with tabs. GNU Emacs's python mode will automatically detect the current indentation level for existing Python code, and for new files I'll use 4-space indents with spaces no matter what style the other files in the project use. For tabs, I want to continue using tabs if and only if the file is already in my old 8-space tab based indentation style, so the only problem is detecting this.

As far as I can tell there are no existing GNU Emacs features or functions to do this, so I wrote some ELisp to be run as a python-mode hook (which means it happens on a per-file basis). I won't claim it's very good ELisp, but here it is:

(defun cks/leading-tabs-p ()
  "Detect if the current buffer has a line with leading tab(s)."
  (save-excursion
    (save-restriction
      (widen)
      (goto-char (point-min))
      (if (re-search-forward "^\t+" nil t)
          t
        nil))))

(add-hook 'python-mode-hook
          (lambda ()
            (if (and (= python-indent-offset 8) (cks/leading-tabs-p))
                (setq indent-tabs-mode t))))

The detection of tab-indented lines here is highly imperfect and can be fooled by all sorts of things, but for my purposes it's good enough; misfires are unlikely in practice. I'm not sure I even have any Python code that uses 8-space indentation but without tabs.

(The start of cks/leading-tabs-p is copied directly from the python-mode function that scans the buffer to determine the indentation level it currently uses. The function naming is superstition based on what I've seen around the Internet.)
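The same crude heuristic is easy to reproduce outside Emacs, for example to survey which existing files are in which style. A minimal Python sketch; indent_style() and the indentstyle.py name are hypothetical, it only checks for leading tabs (not the 8-space indent level), and like cks/leading-tabs-p it can be fooled by tabs inside multi-line strings.

```python
import re
import sys

# Like the Emacs regexp "^\t+": one or more tabs at the start of any line.
LEADING_TABS = re.compile(r"^\t+", re.MULTILINE)


def indent_style(text):
    """Return 'tabs' if any line starts with a tab, else 'spaces'."""
    return "tabs" if LEADING_TABS.search(text) else "spaces"


if __name__ == "__main__":
    # Usage: python3 indentstyle.py FILE...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            print(path, indent_style(f.read()))
```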

I also decided to write some ELisp functions to toggle back and forth between the modern style and my old style and to report the indentation state of a buffer:

(defun cks/python-toggle ()
  "Toggle between old-style Python 2 and modern Python 3 settings."
  (interactive)
  (if (= python-indent-offset 8)
      (progn (setq indent-tabs-mode nil) (setq python-indent-offset 4)
             (message "Set to modern Python 3 (4-level spaces)"))
    (progn (setq indent-tabs-mode t) (setq python-indent-offset 8)
           (message "Set to ancient Python 2 with tabs"))))

(defun cks/rep-python ()
  "Report the Python indentation status of the current buffer."
  (interactive)
  (message "Python indentation is %d-space indents with %s %s" python-indent-offset
           (if (eq indent-tabs-mode t) "tabs" "spaces only")
           (cond ((and (= python-indent-offset 4) (eq indent-tabs-mode nil))
                  "(Python 3 standard)")
                 ((and (= python-indent-offset 8) (eq indent-tabs-mode t))
                  "(my Python 2 style)")
                 (t "(something weird)"))))

It's deliberate that after cks/python-toggle, I'm in one or the other of my standard indentation styles, even if the buffer started out in some weird style.

PS: Both python-indent-offset and indent-tabs-mode are buffer-local variables by the time I get my hands on them, so I can just directly use setq and so on. There may be a better way to do this these days, but my ELisp knowledge is old and rusty.

EmacsPythonIndentation written at 00:13:43

2019-08-18

Early notes on using LSP-based editing in GNU Emacs for Python

The two languages that I most use GNU Emacs for these days are Go and Python. After I got LSP-based editing working for Go, I decided to take a run at getting it to work for Python as well. Python is one of the languages that lsp-mode supports, through pyls, so I was hoping that it would be an install and go experience. The reality was not quite so smooth and I've wound up with some open questions and uncertainties.

As I usually do with pip-based install instructions, I opted to use 'pip install --user', which puts the resulting programs in ~/.local/bin. Since this isn't on my regular $PATH, I had to arrange for GNU Emacs to be able to see the pyls program before lsp-mode could do anything. Once it did, warnings popped up all over the Python code that I tried it out on, because I'd installed it as 'python-language-server[all]', which installs all linters and checkers. I must regretfully report that my code is not at all clean to all of them; for example, I frequently use short variable names that are all in lower case. After poking at this a bit I decided that I didn't want any linters right now. Some of the linters apparently could be disabled by 'pip uninstall', but others have standard Ubuntu versions and it's not clear how to tell lsp-mode to tell pyls to turn them off, and anyway some of them may be used to detect outright syntax errors, which I would like flagged.

Talking of syntax errors brings up the next issue, which is Python 2 versus Python 3. While we're moving towards Python 3, we still have plenty of Python 2 code, and so I would like an LSP-based setup that works smoothly with both. Unfortunately, as far as I can see pyls is at least partially specific to the version of Python you install it for. I actually used pip3 to install the Python 3 versions of things (since that's our future and seems the right choice if I have to pick one). This still seems to at least partially work for some test Python 2 code, in that simple navigation works, but various syntax warnings and so on appear and there may be other LSP things that don't work.

(As far as I can tell, pyls has no particular provisions for picking Python versions, which is not surprising. Some things I've read suggest that most people who have to deal with this use per-project virtualenvs, and Python 2 projects would then have the Python 2 version of pyls installed in their virtualenv. Manually starting GNU Emacs with a $PATH that finds the Python 2 version of pyls first does seem to work right, and I may be able to partially automate this with a frontend script for pyls that tries to figure out which Python version is more likely for the current context.)
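Such a frontend might start with a guess based on a file's #! line. This is only a sketch of that heuristic, under the assumption that a bare 'python' in a #! line means Python 2 (as it did on Ubuntu at the time); guess_python_major() is hypothetical, and a real frontend would still have to decide which file or project to look at and then exec the matching pyls.

```python
import re

# Matches 'python', 'python2', 'python3', 'python3.6', etc. in a #! line.
_PYTHON_IN_SHEBANG = re.compile(r"python(\d)?")


def guess_python_major(source_text, default=3):
    """Guess 2 or 3 from the #! line of `source_text`; fall back to `default`."""
    first_line = source_text.split("\n", 1)[0]
    if not first_line.startswith("#!"):
        return default
    match = _PYTHON_IN_SHEBANG.search(first_line)
    if match is None:
        return default
    # Assumption: a bare 'python' meant Python 2 on the systems in question.
    return int(match.group(1)) if match.group(1) else 2
```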

All of this makes me fairly uncertain about whether lsp-mode is currently worth it for my Python programming. It does give me nice things like completions, but it's probably not going to be a set and forget thing the way it is for Go. Probably I'm going to be shaving more yaks before I have clear answers.

(There are various writeups on the net of using Python with lsp-mode but they seem to mostly come from people who already know a lot of Emacs, which is not me these days. Reading them and flailing away at my .emacs has been a humbling experience.)

PS: As usual, writing this entry pushed me to go further, try more things, and do more experimentation than I had at the start, which is a good thing.

PythonEmacsLSPNotes written at 22:56:47

2019-08-17

A situation where Python has undefined values

In most of Python, either a name has a value or it doesn't exist and attempts to access it will fail with some variation of 'that's not defined'. You get NameError for globals and AttributeError for attributes of objects, classes, and interestingly also for modules. Similarly, accessing a nonexistent key in a dictionary gets you a KeyError, also saying that 'this doesn't exist'.

(This means that code inside a module gets a different error for a nonexistent module variable than code outside it. I think this is just an artifact of how the name is accessed.)

But local variables in functions are different and special:

>>> def afunc():
...   print(a)
...   a = 10
... 
>>> afunc()
[...]
UnboundLocalError: local variable 'a' referenced before assignment

When we do the print(), the name a exists as a local variable (at least in some sense), but its value is undefined (and an error) instead of being, say, None. If a was not even a local variable, we would get either some variant of 'name not defined' or we'd access a global a if it existed.

(I say that a exists in some sense because it doesn't fully exist; for example, it is not in the dictionary that locals() will return.)

At one level this is a straightforward consequence of how local variables are implemented in CPython. All references to local variables within a function use the same fast access method, whether or not a value has been bound to the local variable. When no value has been set, you get an error.
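You can see both sides of this from Python itself: the compiler has already recorded a among the function's local variable names, but until a value is bound it doesn't show up in locals() and reading it raises UnboundLocalError (a subclass of NameError). A small demonstration; probe() is just an illustrative function.

```python
a = "the global a"  # a module-level name that the local a will shadow


def probe():
    seen_before = "a" in locals()  # has 'a' got a value yet?
    try:
        value = a  # refers to the *local* a, which is still unbound
    except UnboundLocalError as err:
        error = err
    a = 10  # this assignment is what makes 'a' local to the whole function
    return seen_before, error


print("a" in probe.__code__.co_varnames)  # True: 'a' is a known local name
seen_before, error = probe()
print(seen_before, type(error).__name__)  # False UnboundLocalError
```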

At another level, this is a sensible language design decision regardless of the specifics of the implementation. Python has decided that it has lexically scoped local variables, and this opens up the possibility of accessing a local variable before it's had a value set (unlike globals and attributes). When this happens, you have three choices; you can invent an arbitrary 'unset' value, such as None, you can generate a 'name does not exist' error, or you can generate a unique error. Python doesn't have zero values in the way that a language like Go does (fundamentally because the meaning of variables is different in the two languages), so the first choice would be unusual. The second choice would be a confusing pretense, because the name actually does exist and is in fact blocking you from accessing a global version of the name. That leaves the third choice of a unique error, which is at least clear even if it's unusual.

(This sprang from a Twitter thread.)

UndefinedLocalVariables written at 23:31:36

2019-07-17

Django 1.11 has a bug that causes intermittent CSRF validation failures

Over on Twitter, I said:

People say that Django version upgrades are easy and reliable. That is why our web app, moved from 1.10 to 1.11, is now throwing CSRF errors on *a single form* but only when 'DEBUG=False' which, you know, doesn't help debug the issue.

Last week I updated our Django web application from Django 1.10.7 to 1.11.22. Today, one of its users reported that when they tried to submit a form, the application reported:

Forbidden (403)
CSRF verification failed. Request aborted.

More information is available with DEBUG=True.

At first I expected this to be a simple case of Django's CSRF browser cookie expiring or getting blocked. However, the person reproduced the issue, and then I reproduced the issue too, except that when I switched the live web app over to 'DEBUG=True', it didn't happen, and then sometimes it didn't happen even when debugging was off.

(Our application is infrequently used, so it's not surprising that this issue didn't surface (or didn't get reported) for a week.)

There are a number of reports of similar things on the Internet, for example here, here, here, and especially Django ticket #28488. Unfortunately not only was ticket 28488 theoretically fixed years ago, but it doesn't match what I see in Firefox's Network pane; there are no 404 HTTP requests served by our Django app, just regular successful ones.

(Here hints that maybe the issue involves using both sessions and CSRF cookies, which we do because sessions are a requirement for HTTP Basic Authentication, or at least they were at one point.)

The most popular workaround appears to be to stop Django from doing CSRF checks, often by setting CSRF_TRUSTED_ORIGINS to some value. My workaround for now is to revert back to Django 1.10.7; it may not be supported, but it actually works reliably for us, unlike Django 1.11. I am not sure that we will ever try 1.11 again; an intermittent failure that only happens in production is a really bad thing and not something I am very enthused about risking.

(I'm not particularly happy about this state of affairs and I have low expectations for the Django people fixing this issue in the remaining lifetime of 1.11, since this has clearly been happening with 1.11 for some time. Since I'm not willing to run 1.11 in production to test and try things for the Django people, it doesn't seem particularly useful to even try to report a bug.)

Django111CSRFFailures written at 21:30:52

2019-07-10

I brought our Django app up using Python 3 and it mostly just worked

I have been worrying for some time about the need to eventually get our Django web application running under Python 3; most recently I wrote about being realistic about our future plans, which mostly amounted to not doing anything until we had to. Well, guess what happened since then.

For reasons beyond the scope of this entry, last Friday I ended up working on moving our app from Django 1.10.7 to 1.11.x, which was enlivened by the usual problem. After I had it working under 1.11.22, I decided to try running it (in development mode, not in production) using Python 3 instead of Python 2, since Django 1.11.22 is itself fully compatible with Python 3. To my surprise, it took only a little bit of cleanup and additional changes beyond basic modernization to get it running, and the result is so far fully compatible with Python 2 as well (I committed the changes as part of the 1.11 move, and since Monday they're running in production).

I don't think this is particularly due to anything I've done in our app's code; instead, I think it's mostly due to the work that Django has done to make everything work more or less transparently. As the intermediate layer between your app and the web (and the database), Django is already the place that has to worry about character set conversion issues, so it can spare you from most of those. And generally that's the big difference between Python 2 and Python 3.

(The other difference is the print statement versus 'print()', but you can make Python 2.7 work in the same way as Python 3 with 'from __future__ import print_function', which is what I did.)

I haven't thoroughly tested our web app under Python 3, of course, but I did test a number of the basics and everything looks good. I'm fairly confident that there are no major issues left, only relatively small corner cases (and then the lurking issue of how well the Python 3 version of mod_wsgi works and if there are any traps there). I'm still planning to keep us on Python 2 and Django 1.11 through at least the end of this year, but if we needed to I could probably switch over to a current Django and Python 3 with not very much additional work (and most of the work would be updating to a new version of Django).

There was one interesting and amusing change I had to make, which is that I had to add a bunch of __str__ methods to various Django models that previously only had __unicode__ methods. When building HTML for things like form <select> fields, Django string-izes the names of model instances to determine what to put in here, but in Python 2 it actually generates the Unicode version and so ends up invoking __unicode__, while in Python 3 str is Unicode already and so Django was using __str__, which didn't exist. This is an interesting little incompatibility.
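A common way to cover both is to keep the text-producing logic in __unicode__ and have __str__ reuse it: directly on Python 3, and encoded to bytes on Python 2. This is a minimal sketch on a plain class standing in for a Django model; Account is hypothetical. (Django 1.11 also ships a python_2_unicode_compatible decorator in django.utils.encoding for this situation.)

```python
import sys


class Account(object):
    """Hypothetical stand-in for a Django model with a text representation."""

    def __init__(self, login):
        self.login = login

    def __unicode__(self):
        # What Django on Python 2 asks for when it needs text.
        return u"%s" % self.login

    if sys.version_info[0] >= 3:
        def __str__(self):
            # On Python 3, str is already Unicode, so reuse __unicode__.
            return self.__unicode__()
    else:
        def __str__(self):
            # On Python 2, __str__ must return bytes.
            return self.__unicode__().encode("utf-8")


print(str(Account("cks")))  # cks
```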

Sidebar: The specific changes I needed to make

I'm going to write these down partly because I want a coherent record, and partly because some of them are interesting.

  • When generating a random key to embed in a URL, read from /dev/urandom using binary mode instead of text mode and switch from an ad-hoc implementation of base64.urlsafe_b64encode to using the real thing. I don't know why I didn't use the base64 module in the first place; perhaps I just didn't look for it, since I already knew about Python 2's special purpose encodings.

  • Add __str__ methods to various Django model classes that previously only had __unicode__ ones.

  • Switch from print statements to print() as a function in some administrative tools the app has. The main app code doesn't use print, but some of the administrative commands report diagnostics and so on.

  • Fix mismatched tabs versus spaces indentation, which snuck in because my usual editor for Python used to use all-tabs and now uses all-spaces. At some point I should mass-convert all of the existing code files to use all-spaces, perhaps with four-space indentation.

  • Change a bunch of old style exception syntax, 'except Thing, e:', to 'except Thing as e:'. I wound up finding all of these with grep.

  • Fix one instance of sorting a dictionary's .keys(), since in Python 3 .keys() returns a view object (which has no .sort() method) instead of a list.
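The first and last of these changes look roughly like the following. This is a sketch rather than our actual code; make_url_key(), its 15-byte default, and sorted_keys() are hypothetical, and sorted() behaves the same way on Python 2 and 3.

```python
import base64


def make_url_key(nbytes=15):
    """Return a random, URL-safe token built from `nbytes` of /dev/urandom."""
    # Binary mode: on Python 3, text mode would try (and usually fail)
    # to decode the random bytes as text.
    with open("/dev/urandom", "rb") as f:
        raw = f.read(nbytes)
    # urlsafe_b64encode uses '-' and '_' instead of '+' and '/', and returns bytes.
    return base64.urlsafe_b64encode(raw).decode("ascii")


def sorted_keys(d):
    """Sorted list of a dict's keys; d.keys().sort() fails on Python 3."""
    return sorted(d)


print(len(make_url_key()))  # 20: 15 bytes base64-encode to 20 characters
```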

Many of these changes were good ideas in general, and none of them are ones that I find objectionable. Certainly switching to just using base64.urlsafe_b64encode makes the code better (and it makes me feel silly for not using it to start with).

DjangoAppPython3Surprise written at 21:46:22

2019-07-04

Django's goals are probably not our goals for our web application

Django bills itself as "the web framework for perfectionists with deadlines". As a logical part of that, Django is always working to improve itself, as are probably almost all frameworks. For people with actively developed applications (perfectionists or otherwise), this is fine. They are working on their app anyway, constantly making other changes and improvements and adjustments, so time and Django updates will deliver a continuing stream of improvements (along with a certain amount of changes they have to make to keep up, but again they're already making changes).

This does not describe our goals or what we do with our web application. What we want is to write our app, reach a point where it's essentially complete (which we pretty much achieved a while ago), and then touch it only on the rare occasions when there are changes in the requirements. Django provides what we need in terms of features (and someone has to write that code), but it doesn't and never will provide the stability that we also want. Neither sharks nor frameworks for perfectionists ever stand still.

This creates an awkward mismatch between what Django wants us to do and what we want to do, one that I have unfortunately spent years not realizing and understanding. In particular, from our perspective the work of keeping up with Django's changes and evolution is almost pure overhead. Our web application is running fine as it is, but every so often we need to go change it in order to nominally have security fixes available, and in completely unsurprising news I'm not very enthusiastic or active about doing this (not any more, at least; I was in the beginning). The latest change we need is an especially large amount of work, as we will have to move from Python 2 to Python 3.

(We don't need bug fixes because we aren't running into bugs. If we were, we probably would have to work around them anyway rather than wait for a new Django release.)

I don't know what the solution is, or even if there is a solution (especially at this point, with our application already written for Django). I expect that other frameworks (in any language) would have the same bias towards evolution and change that Django does; most users of them, especially big active ones, are likely people who have applications that are being actively developed on a regular basis. I suspect that 'web frameworks for people who want to write their app and then walk away from it' is not a very big niche, and it's not likely to be very satisfying for open source developers to work on.

(Among other structural issues, as a developer you don't get to do anything. You write your framework, fix the bugs, and then people like me want you to stop improving things.)

PS: I don't think this necessarily means that we made a bad choice when we picked Django way back when, because I'm not sure there was a better choice to be made. Writing our web app was clearly the right choice (it has saved us so much time and effort over the years), and using a framework made that feasible.

DjangoGoalsNotOurGoals written at 21:30:02

2019-06-29

Being realistic about what we're going to do with our Django app

One of our biggest problem points for moving away from Python 2 is our Django app, which handles all of the workflow when people request new accounts. Back last August I wrote about how it needed tests, and then in February I wrote about that again, and now it is almost July and guess what, our app still has no tests. There is a pattern here, and given that pattern I think it's time for me to get realistic about what we're going to do with our app in the next few years and how that's going to work. Being realistic doesn't leave me with pleasant answers, but at least I can try to be honest with myself for once instead of pretending.

(The problem with pretending is that I wind up not preparing for what actually happens.)

Our app is currently running on an Ubuntu 18.04 machine under Python 2 and mod_wsgi. This combination can keep running until early 2023 and we're going to do that unless there is a critical reason not to do so. By mid 2022 we should know whether or not Ubuntu 22.04 LTS will allow us to keep on running the Python 2 version with mod_wsgi; if it can, we will quite likely continue on with that until mid 2026 makes this issue something we can't ignore any more. At this point, keeping the app Python 2 until Ubuntu 18.04 support runs out is basic realism; it seems pretty unlikely that I will get around to porting the app to Python 3 in the remaining five months or so of 2019.

(We could probably switch to CentOS 8 for even longer support of Python 2, but this particular app is not worth going to that much effort and annoyance.)

At this point everyone notes that the last version of Django that supports Python 2 is 1.11, and support for 1.11 runs out at the end of this year. This is a good argument in theory, but in practice we are already running an unsupported Django version: we are still on Django 1.10.7 (and have been since 2017, because Django updates are a pain at the best of times). Running an unsupported version of Django is nothing new for us; instead, it has unfortunately become the default state of affairs. I want to try to update the application to Django 1.11 at some point for various hand-waving reasons; hopefully that won't be too much work. Possibly this means that we should switch to the Ubuntu 18.04 packaged version of Django 1.11, even though I didn't think that was a good idea last November. If we're going to run an unsupported Django, it might as well be a version that someone might be keeping an eye on.

Does this present a security risk? Somewhat, but my view is that it's a relatively low one. Almost all of the web app is locked away behind Apache's HTTP basic authentication and restricted to a small number of trusted users (and the Django admin interface is even more restricted). The exposed app surface is relatively small and simple; we have a couple of basic forms and that's it (plus one AJAX endpoint that gives a yes/no answer to whether or not something is an available Unix login). Also, the app does nothing permanent automatically; a human is always in the loop before an account is actually created.

(It's possible that a Django vulnerability could be leveraged to attack other web things through our app, through CSRF or the like. But that would be a pretty targeted attack against the department by someone who would have to know a fair bit about how the app works, who uses it, and what else they interact with that can be attacked. Obviously the catastrophic scenario would be a remote code execution flaw that could be exploited through a basic URL view or form submission, but that seems unlikely.)

Wanting to write Django tests doesn't seem to have done much good, so my alternate plan for a Python 3 port is simply to try running our web app under Python 3, probably with Django 1.11 to keep things simple. Whenever I find code that should be modernized anyway, or changes that still keep things compatible with Python 2, I can make them in the production codebase so that it becomes steadily more ready for Python 3. My hope is that a great deal of this can be done with clean changes that don't have to be conditional on Python 2 versus Python 3 but are simply good ideas in general. I also hope that the simplicity of our application, combined with Django handling a lot of things for us behind the scenes, means that running it under Python 3 will mostly just work. We won't have the assurance that tests would give us, but in practice I can manually exercise things and declare the result good enough.
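The sort of clean change meant here is code that runs unchanged under both Python 2.7 and Python 3. The following is a small hypothetical sketch (the function names and data are made up for illustration, not taken from the app) of a few common ones: the __future__ imports, dict.items() instead of dict.iteritems(), and the 'except ... as' syntax.

```python
# Hypothetical sketch: code written so it runs unchanged on Python 2.7 and 3.
# On Python 3 these __future__ imports are harmless no-ops; on Python 2 they
# turn on the Python 3 behaviour. They must come before any other code.
from __future__ import absolute_import, division, print_function


def describe_pending(pending):
    """Summarize a hypothetical {login: days_waiting} dict of pending requests."""
    lines = []
    # dict.items() works on both versions; dict.iteritems() is Python 2 only.
    for login, days in sorted(pending.items()):
        # Thanks to the division import, '/' is true division on Python 2 too,
        # so 3 / 7 is about 0.43 on both versions instead of 0 on Python 2.
        weeks = days / 7
        lines.append("{0}: waiting {1:.1f} weeks".format(login, weeks))
    return lines


def parse_days(text):
    """Parse a day count, returning None for junk input."""
    try:
        return int(text)
    except ValueError as exc:  # 'except ValueError, exc' is Python 2-only syntax
        print("ignoring bad day count {0!r}: {1}".format(text, exc))
        return None


for line in describe_pending({"bob": 14, "alice": 3}):
    print(line)
```

None of these changes need a version check, which is what makes them attractive to apply to the Python 2 production code ahead of any real port.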

One big issue for Python 3 code is character set conversion, and especially the points where Python 3's automatic conversions can fail on you. For this, we're going to punt. I'm not going to try to harden the application against character set decoding problems in the few data files that it reads; in our environment we can guarantee that they're always ASCII and so will always decode correctly. Similarly, we're always going to encode to the system default of UTF-8 when writing out files, which means that writing will always work too. Hopefully this means that I can ignore almost all of those issues in the Python 3 version of the app, which is what the Python 2 version is already doing.
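One way to make these assumptions explicit, rather than depending on the locale's defaults, is to name the encodings when opening files. This is a minimal sketch assuming the data files really are pure ASCII; the function names are hypothetical. io.open() exists in Python 2.6+ and in Python 3 (where it is the builtin open()), so the same code runs on both, although on Python 2 the text passed to it must be a unicode string.

```python
import io


def read_ascii_lines(path):
    """Read a data file that we assume is guaranteed to be pure ASCII.

    Decoding strictly as ASCII means a stray non-ASCII byte raises
    UnicodeDecodeError immediately, instead of the result silently
    depending on whatever the locale's default encoding happens to be.
    """
    with io.open(path, "r", encoding="ascii") as f:
        return [line.rstrip("\n") for line in f]


def write_utf8(path, text):
    """Write text out as UTF-8 explicitly rather than trusting the locale."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

If the ASCII guarantee is ever broken, the failure is a loud exception at read time, which fits the "we'll fix it when it happens" approach.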

(There are some places where I will want to require ASCII, but they're already points where I should be doing that, like the Unix login name that people choose, and so I should add these checks to the current version of the application.)
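A check like this can be written to behave the same under Python 2 and Python 3. The login policy below (a lowercase ASCII letter followed by up to 15 lowercase ASCII letters or digits) is an assumption for illustration, not the app's actual rules, and check_login is a hypothetical name.

```python
import re

# Hypothetical login policy, assumed for illustration only.
# The explicit [a-z0-9] ranges only ever match ASCII, unlike \w or \d in
# Python 3 str patterns, which also accept non-ASCII letters and digits.
# \Z anchors at the true end of the string; '$' would also accept a
# trailing newline such as "bob\n".
LOGIN_RE = re.compile(r"[a-z][a-z0-9]{0,15}\Z")


def check_login(name):
    """Return name unchanged, or raise ValueError if it breaks the policy."""
    if not LOGIN_RE.match(name):
        raise ValueError("invalid login name: %r" % (name,))
    return name
```

In a Django form this would naturally be called from a clean_login() method, raising forms.ValidationError instead of ValueError.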

This will probably leave the Python 3 version of the application vulnerable to throwing exceptions if people put weird characters into forms or do other odd things, but if that happens we don't actually care too much. The app is not used much (people don't request accounts all that often), and it's not a critical issue if the app isn't working for a few days while we fix the code to be more defensive or de-mangle things in its tiny little database.

(The app's database is so small that if we have to, we can dump it to plain text, edit the plain text, and recreate a new db from that. It is, naturally, a SQLite database.)
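Python's standard sqlite3 module can do this dump, edit, and reload round trip directly (the sqlite3 command-line shell's .dump command does the same job). A minimal sketch, with hypothetical function names and file paths:

```python
import sqlite3


def dump_to_sql(db_path):
    """Return the whole database as SQL text (schema plus INSERT statements)."""
    conn = sqlite3.connect(db_path)
    try:
        # iterdump() yields SQL statements much like the shell's .dump
        return "\n".join(conn.iterdump())
    finally:
        conn.close()


def load_from_sql(sql_text, new_db_path):
    """Create a fresh database from (possibly hand-edited) SQL text."""
    conn = sqlite3.connect(new_db_path)
    try:
        # executescript() runs the multi-statement dump in one call
        conn.executescript(sql_text)
    finally:
        conn.close()
```

In between the two calls, the SQL text can be saved to a file and edited by hand before being loaded into a new database file.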

All of this is setting a relatively low quality standard for the eventual Python 3 version, but at this point that's realism. The app is neither a high enough priority nor interesting enough for us to do it any better, not unless I suddenly get a vast gulf of free time with nothing else to work on.

PS: Facing up to reality here has also made me realize some things about Django and us, but that's for another entry.

DjangoAppBeingRealistic written at 20:45:06

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.