Wandering Thoughts

2017-03-18

Part of why Python 3.5's await and async have some odd usage restrictions

Python 3.5 added a new system for coroutines and asynchronous programming, based around new async and await keywords (which have the technical details written up at length in PEP 492). Roughly speaking, in terms of coroutines implemented with yield from, await replaces 'yield from' (and is more powerful). So what's async for? Well, it marks a function that can use await. If you use await outside an async function, you'll get a syntax error. Functions marked async have some odd restrictions, too, such as that you can't use yield or yield from in them.

When I described doing coroutines with yield from here, I noted that it was potentially error prone because in order to make everything work you had to have an unbroken chain of yield from from top to bottom. Break the chain or use yield instead of yield from, and things wouldn't work. And because both yield from and yield are used for regular generators as well as coroutines, it's possible to slip up in various ways. Well, when you introduce new syntax you can fix issues like that, and that's part of why async and await have their odd rules.

A function marked async is a (native) coroutine. await can only be applied to coroutines, which means that you can't accidentally treat a generator like a coroutine the way you can with yield from. Simplifying slightly, coroutines can only be invoked through await; you can't just call one or use it as a generator, for example as 'for something in coroutine(...):'. As part of not being generators, coroutines can't use 'yield' or 'yield from'.

(And there's only await, so you avoid the whole 'yield' versus 'yield from' confusion.)
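
As a quick interactive illustration of those restrictions (the function name here is made up and the object address is elided; Python will also eventually warn you that the coroutine was never awaited):

>>> async def acoro():
...    return 10
...
>>> acoro()
<coroutine object acoro at 0x...>
>>> for i in acoro():
...    print(i)
...
Traceback (most recent call last):
  ...
TypeError: 'coroutine' object is not iterable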

In other words, coroutines can only be invoked from coroutines and they must be invoked using the exact mechanism that makes coroutines work (and that mechanism isn't and can't be used for or by anything else). The entire system is designed so that you're more or less forced to create that unbroken chain of awaits that makes it all go. Although Python itself won't error out at import time if you try to call an async function without await (it just won't work at runtime), there are probably Python static checkers that look for this. And in general it's an easy rule to keep track of; if it's async, you have to await it, and this status is marked right there in the function definition.

(Unfortunately it's not in the type of the function, which means that you can't tell by just importing the module interactively and then doing 'type(mod.func)'.)
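
You can see this interactively (the function names here are made up); type() reports both as plain functions, and you have to reach for something like inspect.iscoroutinefunction() to tell them apart:

>>> async def afunc():
...    return 10
...
>>> def func():
...    return 10
...
>>> type(afunc), type(func)
(<class 'function'>, <class 'function'>)
>>> import inspect
>>> inspect.iscoroutinefunction(afunc)
True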

Sidebar: The other reason you can only use await in async functions

Before Python 3.5, the following was completely valid code:

def somefunc(a1, b2):
   ...
   await = interval(a1, 10)
   otherfunc(b2, await)
   ...

In other words, await was not a reserved keyword and so could be legally used as the name of a local variable, or for that matter a function argument or a global.

Had Python 3.5 made await a keyword in all contexts, all such code would immediately have broken. That's not acceptable for a minor release, so Python needed some sort of workaround. So it's not that you can't use await outside of functions marked async; it's that it's not a keyword outside of async functions. Since it's not a keyword, writing something like 'await func(arg)' is a syntax error, just as 'abcdef func(arg)' would be.
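
For example, on Python 3.5 this function is rejected with a plain syntax error, exactly as it would be if you wrote abcdef instead of await (somefunc() and its arguments are made up for illustration; the precise error message varies between Python versions):

def somefunc(func, arg):
   # SyntaxError on Python 3.5 and 3.6: 'await' here is just an
   # ordinary name, and 'name expression' is not valid syntax.
   await func(arg)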

The same is true of async, by the way:

def somefunc(a, b, async = False):
  if b == 10:
     async = True
  ....

Thus why it's a syntax error to use 'async for' or 'async with' outside of an async function; outside of such functions async isn't even a keyword so 'async for' is treated the same as 'abcdef for'.

(I'm sure this makes Python's parser that much more fun.)

AsyncAwaitRestrictionsWhy written at 01:11:54; Add Comment

2017-03-16

How we can use yield from to implement coroutines

Given my new understanding of generator functions and yield from, we can now see how to use yield from to implement coroutines and an event loop. Consider a three level stack of functions, where on the top layer you have an event loop, in the middle you have the processing code you write, and on the bottom are event functions like wait_read() or sleep().

Let's start with an example processing function or two:

def countdown(n):
   while n:
      print("T-minus", n)
      n -= 1
      yield from sleep(1)

def launch(what, nsecs):
   print("counting down for", what)
   yield from countdown(nsecs)
   print("launching", what)

To start a launch, we call something like 'coro.start(launch("fred", 10))', which looks a bit peculiar since it sort of seems like coro.start() should get control only after the launch. However, we already know that calling a generator function doesn't do exactly what it looks like. What coro.start() gets when we do this is an unstarted generator object (which handily encapsulates those arguments to launch(), so we don't have to do it by hand).

When the coroutine scheduler starts the launch() generator object, we wind up with a chain of yield froms that bottoms out at sleep(). What sleep() yields is passed back up to the coroutine scheduler and the entire call chain is suspended; this is no different than what I did by calling .send() by hand yesterday. The value sleep() yields up to the scheduler is an object (call it an event object) that tells the coroutine scheduler under what conditions this coroutine should be resumed. When the scheduler reaches the point that the coroutine should be run again, the scheduler will once again call .send(), which will resume execution in sleep(), which will then return back to countdown(), and so on. The scheduler may use this .send() to pass information back to sleep(), such as how long it took before the coroutine was restarted.
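
To make this concrete, here is a toy sketch of such a scheduler and a sleep() that works with the launch() and countdown() functions above. The Scheduler class, its method names, and the 'event object' format are all made up for illustration, and a real event loop would not simply block in time.sleep(); this only shows the plumbing of yield, yield from, and .send().

import time

def sleep(nsecs):
   # Yield an 'event object' up to the scheduler and hand back
   # whatever the scheduler eventually .send()s to us.
   resumed_at = yield ("sleep", nsecs)
   return resumed_at

class Scheduler:
   def __init__(self):
      self.ready = []

   def start(self, coro):
      # An unstarted generator object plus the value to .send() it.
      self.ready.append((coro, None))

   def run(self):
      while self.ready:
         coro, value = self.ready.pop(0)
         try:
            event = coro.send(value)
         except StopIteration:
            continue
         kind, nsecs = event
         if kind == "sleep":
            # A real scheduler would go do other work here instead
            # of blocking everything.
            time.sleep(nsecs)
            self.ready.append((coro, time.time()))

coro = Scheduler()
coro.start(launch("fred", 3))
coro.run()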

Here yield and yield from are being used for two things. First, they create a communication channel between the coroutine scheduler and the low-level event functions like sleep(). Our launch() and countdown() functions are oblivious to this since they don't touch either the value sleep() yields up to the scheduler or the value that the scheduler injects to sleep() with .send(). Second, the chain of yield from and the final yield neatly suspend the entire call stack.

In order for this to work reliably, there are two rules that our user-written processing functions have to follow. First, they must never accidentally attempt to do anything with the sleep() generator object beyond passing it along. It is okay, if a little unclear, for a non-generator function to call sleep() and return the result:

def sleep_minutes(n):
   return sleep(n * 60)

def long_countdown(n):
   while n:
      print("T-minus", n, "minutes")
      yield from sleep_minutes(1)
      n -= 1

This is ultimately because 'yield from func()' is equivalent to 't = func(); yield from t'. We don't care how the generator object got to us so that we can yield from it; we just care that it did.

However, at no stage in our processing functions can we attempt to look at the results of iterating sleep()'s generator object, either directly or indirectly by writing, say, 'for i in countdown(10):'. This rules out certain patterns for writing processing functions, for instance this one:

def label_each_sec(label, n):
   for _ in tick_once_per_sec(n):
      print(label)

This leads to the second rule, which is that we must have an unbroken chain of yield froms from the top to the bottom of our processing functions, right down to where you use an event function such as sleep(). Each function must 'call' the next using the 'yield from func()' idiom. In effect we don't have calls from one processing function to another; instead we're passing control from one function to the next. In my example, launch() passes control to countdown() until the countdown expires (and countdown() passes control to sleep()). If we actually call a processing function normally or accidentally use 'yield' instead of 'yield from', the entire collection explodes into various sorts of errors without getting off the launch pad and you will not go to space today.

As you might imagine, this is a little bit open to errors. Under normal circumstances you'll catch the errors fairly fast (when your main code doesn't work). However, since these errors can only be caught at runtime, when the code path missing its yield from is actually reached, you may have mistakes that lurk in rarely executed code paths. Perhaps you have a rarely invoked last moment launch abort:

def launch(what, nsecs):
   print("counting down for", what)
   yield from countdown(nsecs)
   if launch_abort:
      print("Aborting launch! Clear the launch pad for", what)
      yield sleep(1)
      print("Flooding fire suppression ...")
   else:
      print("launching", what)

It might be a while before you discovered that mistake (I'm doing a certain amount of hand-waving about early aborts in countdown()).

(See also my somewhat related attempt at understanding this sort of thing in a Javascript context in Understanding how generators help asynchronous programming. Note that you can't use my particular approach from that entry in Python with 'yield from' for reasons beyond the scope of this entry.)

CoroutinesWithYieldFrom written at 00:32:50; Add Comment

2017-03-15

Sorting out Python generator functions and yield from in my head

Through a chain of reading, I wound up at How the heck does async/await work in Python 3.5? (via the Trio tutorial). As has happened before when I started reading about Python 3's new async and await stuff, my head started hurting when I hit the somewhat breezy discussion of yield from, and I felt the need to slow down and try to solidly understand this, which I hadn't really done before.

Generator functions are functions that contain a yield statement:

def fred(a):
   r = yield 10
   print("fred got:", r)
   yield a

A straightforward generator function is there to produce a whole series of values without having to ever materialize all of them at once in a list or the like.

Calling a generator function does not return its result. Instead, it returns a generator object, which is a form of iterator:

>>> fred(10)
<generator object fred at 0x7f52a75ea1a8>

This generator object is in part a closure that captures the argument fred() was called with (and in general will preserve fred()'s state while it is repeatedly iterated). Note that fred()'s code doesn't start executing until you try to get the first value from the iterator.

One common pattern with a stack of generator functions (including when you need to modify or filter part of another generator's results) was that you would have one generator function that wanted to call another one for a while. In the beginning this was done with explicit for loops and the like, but then Python added yield from. yield from takes a generator or iterator and exhausts it for you, repeatedly yielding each result.

def barney(a):
   yield from fred(a)

(You can intermix yield and yield from and use both of them more than once in a function.)

Because generator functions actually return generators, not any sort of result, 'yield from func()' is essentially syntactic sugar for calling the function, getting a generator object back, and then calling yield from on the generator object. There is no special magic involved in that:

def barney(a):
   gen = fred(a)
   yield from gen

Because generator objects are ordinary objects, they can be returned through functions that are not generators themselves, provided that intermediate functions don't really attempt to manipulate them and simply return them as-is:

def jim(a):
   return fred(a)

def bob(a):
   yield from jim(a)

(If jim() actually iterated through fred()'s results, things would quietly go sideways in ways that might or might not be visible.)

When yield started out, it was a statement; however, that got revised so that it was an expression and could thus have a value, as we see in fred() where the value of one yield is assigned to r and then used later. You (the holder of the generator object) inject that value by calling .send() on the generator:

>>> g = fred(2)
>>> _ = g.send(None); _ = g.send("a")
fred got: a

(The first .send() starts the generator running and must be made with a None argument.)

As part of adding yield from, Python arranged it so that if you have a stack of yield from invocations and you call .send() on the outer generator object, the value you send does not go to the outer generator object; instead it goes all the way down to the eventual generator object that is doing a yield instead of a yield from.

def level3(a):
   # three levels of yield from
   # and we pass through a normal
   # function too
   yield from bob(a)

>>> g = level3(10)
>>> _ = g.send(None); _ = g.send("down there")
fred got: down there

This means that if you have a stack of functions that all relay things back up using 'yield from', you have a direct path from your top level code (here that's our interactive code where we called level3()) all the way down to the core generator function at the bottom of the call stack (here, the fred() function). You and it can communicate with each other through the values it yields and the values you send() to it without any function in the middle having to understand anything about this; it's entirely transparent to them.

(Don't accidentally write 'yield' instead of 'yield from', though. The good news about that mistake is that you'll catch it fast.)

Hopefully writing this has anchored yield from's full behavior and the logic behind it sufficiently solidly in my head that it will actually stick this time around.

Sidebar: yield from versus yield of a generator

Suppose that we have a little mistake:

def barney2(a):
   yield fred(a)

What happens? Basically what you'd expect:

>>> list(barney(20))
fred got: None
[10, 20]
>>> list(barney2(20))
[<generator object fred at 0x7f0fdf4b9258>]

When we used yield instead of yield from, we returned a value instead of iterating through the generator. The value here is what we get as the result of calling fred(), which is a generator object.

By the way, a corollary to strings being iterable is that accidentally calling 'yield from' instead of 'yield' on a string won't fail the way that eg 'yield from 10' does but will instead give you a sequence of single characters. You'll probably notice that error fairly fast, though.
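
A quick sketch of both cases (chars() is a made-up example function):

def chars(a):
   yield from a

>>> list(chars("abc"))
['a', 'b', 'c']
>>> list(chars(10))
Traceback (most recent call last):
  ...
TypeError: 'int' object is not iterable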

This behavior of yield from is pretty much a feature, because it means that you can yield from another function without having to care about whether it's an actual generator function or it merely returns an iterable object of some sort; either will work.

YieldFromAndGeneratorFunctions written at 01:05:27; Add Comment

2017-02-26

In Python, strings are infinitely recursively iterable

Yesterday I posed the side question of what happened when you called the following code with flatten2(["abcdef",]):

def flatten2(inlst):
    olst = []
    for i in inlst:
        try:
            it = iter(i)
        except TypeError:
            it = None
        if it is None:
            olst.append(i)
        else:
            olst.extend(flatten2(i))
    return olst

The intent of this code is to recursively flatten iterable things. What I had expected to get when I called it with a string was ['a', 'b', 'c', 'd', 'e', 'f'], ie that it flattened the string instead of leaving it alone (because strings are iterable). What I actually got (and what you can get) is a RecursionError of 'maximum recursion depth exceeded'. So why did this recurse endlessly?

The answer is that strings are not just iterable, they are infinitely recursively iterable. With normal iterable containers, iterating the container yields non-iterable items unless you've explicitly put in iterable ones (such as a list inside another list); assuming that you have not cleverly embedded a cycle in your container, our recursive flattening will eventually bottom out and finish. This is not the case for Python strings. When you iterate a multi-character string like "abcdef", you get a sequence of single-character strings, "a" "b" "c" "d" "e" "f". However, these are still strings and so they are still iterable; when you iterate the "a" string, you get back another single-character string "a", which is also iterable. And so flatten2() chases down an infinite recursion of trying to unpack single-character strings into non-iterable items, which it will never succeed at.
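
You can see this directly in the interpreter:

>>> list("ab")
['a', 'b']
>>> list("a")
['a']
>>> list(list("a")[0])
['a']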

Fixing this without side effects is a bit annoying. It's not enough to immediately check that inlst is length 1 and just return it if so, because this fails on flatten2([[1,]]) (and flatten2(["abcdef",]) too, for that matter). I think you have to check explicitly for length 1 strings (and bytes) and just return them, which of course leaves you exposed to any other infinitely recursively iterable types (should there be any others out there).
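
A sketch of that fix might look like the following (flatten2b() is just a made-up name for the modified version; it will still recurse forever on any other infinitely recursively iterable type):

def flatten2b(inlst):
    olst = []
    for i in inlst:
        # Length 1 strings and bytes are treated as indivisible so
        # that we bottom out instead of recursing forever.
        if isinstance(i, (str, bytes)) and len(i) == 1:
            olst.append(i)
            continue
        try:
            it = iter(i)
        except TypeError:
            it = None
        if it is None:
            olst.append(i)
        else:
            olst.extend(flatten2b(i))
    return olst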

(I was about to gripe about repr()'s representation of single character strings but then I tested and it uses single quotes for all strings, not just single-character ones. I need to remember that in Python there is no 'character' type that's different from strings, unlike in other languages such as Go, and so single quotes still mean 'string'.)

Sidebar: It's trivially possible to create containers with cycles

>>> a = [1, 2, 3]
>>> a.append(a)
>>> print(a)
[1, 2, 3, [...]]

More involved versions are left as an exercise for the reader.

Without contemplating it too hard, I think that you can only create cycles with mutable containers, and then only with some of them. Sets are mutable, for example, but they can only contain hashable items, which are generally immutable.

StringsRecursivelyIterable written at 19:32:55; Add Comment

How recursively flattening a list raises a Python type question

Today I wound up reading Why it's hard for programmers to write a program to flatten a list? (via), where the quiz challenge put forward is to turn an input like [1,[2,3], [4, [5,6]]] into [1,2,3,4,5,6]. My immediate reaction was that I'd do this in Python rather than in any statically typed language I know, because all of them make the input type here hard to represent. But then I realized that doing this in Python raises another type-related question.

If we stick exactly to the specification (and directly implement it), the result is pretty simple and straightforward:

def flatten(inlst):
    olst = []
    for i in inlst:
        if isinstance(i, int):
            olst.append(i)
        elif isinstance(i, list):
            olst.extend(flatten(i))
        else:
            raise ValueError("invalid element in list")
    return olst

(You can optimize this by having a _flatten internal function that gets passed the output list, so you don't have to keep building lists and then merging them into other lists as you work down and then back up the recursion stack. Also, I'm explicitly opting to return an actual list instead of making this a (recursive) generator.)
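
That optimization might look something like this sketch, with a hypothetical _flatten() helper that appends into a single output list:

def flatten_opt(inlst):
    olst = []
    _flatten(inlst, olst)
    return olst

def _flatten(inlst, olst):
    for i in inlst:
        if isinstance(i, int):
            olst.append(i)
        elif isinstance(i, list):
            _flatten(i, olst)
        else:
            raise ValueError("invalid element in list")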

However, flatten() is not very Pythonic because it is so very narrowly typed. We can relax it slightly by checking for isinstance(i, (int, float)), but even then most people would say that flatten() should definitely accept tuples in place of lists and probably even sets.

If we're thinking about being Pythonic and general, the obvious thing to do is check if the object is iterable. So we write some simple and general code:

def flatten2(inlst):
    olst = []
    for i in inlst:
        try:
            it = iter(i)
        except TypeError:
            it = None
        if it is None:
            olst.append(i)
        else:
            olst.extend(flatten2(i))
    return olst

This should flatten any type (or mixture of types) that contains elements, as denoted by the fact that it's iterable. It looks good and passes initial tests. Then some joker calls our code with flatten2(["abcdef",]) and suddenly we have a problem. Then another joker calls our code with flatten2([somedict,]) and files a bug that our code only flattens the keys of their dictionary, not the keys and values.

(As an exercise, can you predict in advance, without trying it, what our problem is with flatten2(["abcdef",]), and why it happens? I got this wrong when I was writing and testing this code in Python 3 and had to scratch my head for a bit before the penny dropped.)

The problem here is that 'is iterable' is not exactly what we want. Some things, such as strings, are iterable but should probably be treated as indivisible by flatten2(). Other things, such as dicts, are iterable but the default iteration result does not fully represent their contents. Really, not only is Python lacking a simple condition for what we want, it's arguably not clear just what we want to do if we're generalizing flatten() (and what making it 'Pythonic' really means).

One valid answer is that we will explicitly check for container types that are close enough to what we want, and otherwise mostly return things as-is. Here we would write a version of flatten() that looked like this:

def flatten3(inlst):
    olst = []
    for i in inlst:
        if isinstance(i, (list, tuple, set)):
            olst.extend(flatten3(i))
        elif isinstance(i, dict):
            raise ValueError("dict not valid in list")
        else:
            olst.append(i)
    return olst

We could treat dicts as single elements and just return them, but that is probably not what the caller intended. Still, this check feels dubious, which is a warning sign.

At a minimum, it would be nice to have a Python abstract type or trait that represented 'this is a container object and iterating it returns a full copy of its contents'; you could call this the property of being list-like. This would be true for lists, tuples, and sets, but false for dicts, which would give us a starting point. It would also be true for strings, but you can't win them all; when dealing with iterable things, we'll probably always have to special-case strings.
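
In the absence of such a trait, about the best we can do is a hand-rolled approximation; is_listlike() here is a hypothetical helper, and its exclusion list is exactly the part that can go stale over time:

import collections.abc

def is_listlike(obj):
    return (isinstance(obj, collections.abc.Iterable)
            and not isinstance(obj, (str, bytes, bytearray, dict)))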

(I'd go so far as arguing that making strings iterable by default was a Python mistake. It's one of those neat features that winds up getting in the way in practice.)

I don't have an answer here, by the way. If I was in this situation I might either write and carefully document a version of flatten2() (specifying 'recursively flattens any iterable thing using its default iterator; this will probably not do what you want for dicts'), or go with some version of flatten3() that specifically restricted iteration to things that I felt were sufficiently list-like.

(I'd worry about missing some new popular type over time, though. Ten years ago I might not have put set in the list, and who knows what I'm missing today that's going to be popular in Python in the future. Queues? Trees? Efficient numerical arrays?)

FlattenTypeQuestion written at 02:09:00; Add Comment

2017-02-10

Python won't (and can't) import native modules from zip archives

I've written before about running Python programs from zip files, and in general you can package up Python modules in zip files. Recently I was grumbling on Twitter about the hassles of copying multi-file Python things around, and Jouni Seppänen mentioned using zip files for this but wasn't sure whether they supported native modules (that is, compiled code, as opposed to Python code). It turns out that the answer is straightforward and unambiguous; Python doesn't support importing native modules from zip files. This is covered in the zipimport module:

Any files may be present in the ZIP archive, but only files .py and .pyc are available for import. ZIP import of dynamic modules (.pyd, .so) is disallowed. [...]

(The Python 2.7 documentation says the same thing. If you're like me and had never heard of .pyd files before, they're basically Windows DLLs.)

I don't know about the Windows side of things, but on Unix (with .so shared objects), this is not an arbitrary restriction Python has imposed, which is what the term 'disallowed' might lead you to think. Instead it's more or less inherent in the underlying API that Python is using. Python loads native modules using the Unix dlopen() function, and the dlopen() manpage is specific about its API:

void *dlopen(const char *filename, int flags);

Which is to say that dlopen() takes a (Unix) filename as the dynamic object to load. In order to call dlopen(), you must have an actual file on disk (or at least on something that can be mmap()'d). You can't just hand dlopen() a chunk of memory, for example a file that you read out of a zip file on the fly, and say 'treat this as a shared object'.

(dlopen() relies on mmap() for relatively solid reasons due to how regular shared libraries are loaded in order to share memory between processes. Plus, wanting to turn a block of memory into a loaded shared library is a pretty uncommon thing; mapping existing files on disk is the common case. There are probably potential users other than Python, but I doubt there are very many.)

In theory perhaps Python could extract the .so file from the zip file, write it to disk in some defined temporary location, and dlopen() that scratch file. There are any number of potential issues with that (including fun security ones), but what's certain is that it would be a lot of work. Python declines to do that work for you and to handle all of the possible special cases and so on; if you need to deploy a native module in a bundle like this, you'll have to arrange to extract it yourself. Among other things, that puts the responsibility for all of the special cases on your shoulders, not on Python's.
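
A Python 3 sketch of what 'arranging to extract it yourself' might look like is below. The archive member and module names are hypothetical, and a real version would have to think much harder about where the scratch file goes, its permissions, and cleanup on errors:

import importlib.util
import os, tempfile, zipfile

def load_so_from_zip(zippath, member, modname):
    with zipfile.ZipFile(zippath) as zf:
        data = zf.read(member)
    fd, tmppath = tempfile.mkstemp(suffix=".so")
    try:
        os.write(fd, data)
        os.close(fd)
        # This is the point where dlopen() finally gets a real file.
        spec = importlib.util.spec_from_file_location(modname, tmppath)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        return mod
    finally:
        os.unlink(tmppath)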

ZipimportAndNativeModules written at 01:40:40; Add Comment

2017-01-25

I think docstrings in Python are for everything, not just public things

In my entry on my mixed feelings about docstrings, I mentioned that my most recent Python code was all comments and no docstrings, but also that it was a program instead of a module. So here's a question: should that matter for whether or not you use docstrings? The obvious related question is whether docstrings are for documenting everything in a module, or only the things people are supposed to use.

My best current answer is that if you're going to use docstrings at all, I think that they should be for everything. It's not that docstrings are how you make public documentation, things that you intend people to use help() on and so on; instead, it's that docstrings are simply the Pythonic way to write function, method, class, and module documentation, whether those are intended for public use or not. Signaling which things are public versus private is best done through other mechanisms (such as a module level __all__ with public exports).
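
For instance, a module written this way might look roughly like the following (everything here is a made-up example):

"""Utilities for frobbing widgets."""

__all__ = ["frob"]

def frob(widget):
    """Frob widget and return the frobbed version."""
    return _canonicalize(widget)

def _canonicalize(widget):
    """Internal helper: put widget into canonical frobbing form."""
    return widget.strip().lower()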

(At this point I will pause to admit that I took a quick look through the standard library and it doesn't seem to be entirely consistent on this. Most things that appear to be internal functions and that have documentation at all seem to use docstrings, but some of them have one or two line comments at the start where docstrings would normally go.)

Of course there are some things that you can't document with docstrings; constants and variables can't have docstrings attached, so you need to either use comments or cover them in the module level documentation.

(Here I'm talking about 'what this function does and how you call it' sort of documentation. Other sorts of commentary about the code and how things operate still go in comments because there isn't any other place where they belong, so you'll find that sort of comment scattered through the standard library as necessary.)

I also have no idea if this is a widespread view in the Python community. I have the impression that it is and the Django source code seems to use docstrings to document functions and so on, but it could be that people now see comments as the way to go outside of public stuff that should show up in help().

(I'll probably try to shift how I document things to docstrings for the next piece of Python code I write, assuming I remember this by then and don't give up in annoyance.)

DocstringsForEverything written at 01:32:47; Add Comment

2017-01-23

My still-mixed feelings about Python's docstrings

I've written before about why I'm not really fond of docstrings in Python. I continue to feel that way, but at the same time I have to acknowledge that the Python world has made its decision; although help() et al have some heroic hacks to deal with documentation comments, the only fully supported in-Python way to document your module for other people is with docstrings.

This doesn't mean that docstrings are the best way to document your modules. Python's own module documentation, for example for the os module, is much more extensive than what 'help()' on the module will give you. The in-Python version is basically a moderate summary of the full documentation. But writing this sort of documentation is a lot of work, and if you're only going to write documentation for a public module in one place and once, it's clear that it has to be docstrings.

I tend to write docstring based documentation only erratically for non-public code. Documentation in comments simply looks better and 'more right' to me (and I find it less distracting when it's placed before a declaration, instead of between a declaration and the code as a docstring needs to be). I assume that anyone who needs to work with the code is going to have to read it, at which point they're going to see the comments. However this is partly because we don't have a stand-alone collection of Python modules that are reused across disparate programs, which means that I'm basically never in a situation where I just want to know what a module does so I can use it in a new program.

I definitely have mixed feelings about how Python has chosen to explicitly expose docstrings as attributes on functions, classes, and so on. On the one hand, it makes the simple version of help() be pretty straightforward and non-magical; you basically combine dir() and looking at everything's __doc__. On the other hand, that __doc__ is exposed to Python code has tempted me into abusing it for passing around other function-related information that I wanted my code to have access to. And of course the real version of help() is rather more complicated.

(In theory this also decouples where docstrings live from help()'s implementation. For example, Python 3.x could decide to turn properly formatted comments that are right before declarations into docstrings if the declaration didn't itself already have a docstring.)

My aesthetic feelings about how docstrings look are probably significantly cultural, in that most of the programming languages I've used and been exposed to used explicit comments and not docstrings and so I'm not used to them. I don't know if docstrings read more naturally to people who 'grew up' in Python and similar languages, but I suspect that they may well do so. Perhaps I should make an effort to write more docstrings in future Python code, instead of reflexively documenting everything in comments.

(Not that I write much Python code these days. But my most recent Python code was once again all comments and no docstrings, although it's a program and not a module.)

DocstringsMixedFeelings written at 01:12:52; Add Comment

2016-12-25

What can be going on with your custom management commands in Django 1.10

If you update a relatively old Django project from 1.9 to 1.10 or later and you have added your own custom management commands that take positional arguments, it's possible that those commands will abruptly stop working (well, stop accepting positional arguments). Also, although it is not explicitly documented, the *args parameter of your Command's handle() function is now often completely meaningless and you won't see anything passed in it (although when this happens is fairly obscure).

If you were still using the optparse based approach to argument parsing in your custom management commands, this is sort of expected; the 1.10 release notes mention, in the large 'features removed in 1.10' section, that 'support for optparse is dropped for custom management commands' (and you'll have been getting deprecation warnings about that in 1.9). However this happens even if you had already switched over to using argparse based argument parsing (ie, your custom management command class has an add_arguments function). Your code worked fine in Django 1.9 and failed in Django 1.10, despite no deprecation warning.

So, here is what is going on. In Django 1.9 and earlier, code in django.core.management.base's BaseCommand class hierarchy silently introspected your command class to see if it had an args member. If it did and you were already using argparse, BaseCommand silently added an extra args argument to your argument parser that swept up all positional arguments:

if not self.use_argparse:
   ... old optparse code ...
else:
   ....
   if self.args:
      # Keep compatibility and always accept
      # positional arguments, like optparse when
      # args is set
      parser.add_argument('args', nargs='*')

Later, other code in BaseCommand would silently take the value of the args argument from argparse's results and turn it into the args parameter for your handle() function (removing it from the argparse results in the process):

# Move positional args out of options to
# mimic legacy optparse
args = cmd_options.pop('args', ())

Note that this specifically happened when you had switched over to using argparse. That this catch-all argument was added (without real documentation) meant that in Django 1.9, if you still had an args member, you could not add your own argument to collect some or all of the positional arguments because it would clash with the automatically added argument here. And really confusing things might happen if you called some argparse argument 'args', because it would be stolen as the args parameter for your handle() function.

In Django 1.10, the first chunk of this code was changed so that an extra args argument is no longer automatically added to your argument parser if you still had an args member in your command class. Now such custom management commands could not get their extra positional arguments (and I believe would fail with a 'cannot parse command line arguments' error if you tried to supply some). However, watch out, because the second chunk of code is still there. If you have an argparse argument called 'args', Django will silently remove it from the argparse result and pass it to your handle() command as the args parameter.

(Of course this is kind of a feature. As it stands now, you can just add your own copy of the old Django 1.9 parser.add_argument call and nothing else in your code has to change. But it smells like a hack to me and I wouldn't be surprised if Django made this magic behavior disappear at some point.)
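
As a concrete sketch, a Django 1.10 era custom management command that keeps its positional arguments might look like this (the command itself and what it does are made up for illustration):

from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Frob the named things"

    def add_arguments(self, parser):
        # Recreate what Django 1.9 used to add automatically when the
        # command class had an 'args' attribute.
        parser.add_argument('args', nargs='*')

    def handle(self, *args, **options):
        # Because the argparse argument is literally called 'args',
        # BaseCommand pops it out of the options and passes it to us
        # here as the positional args.
        for thing in args:
            self.stdout.write("frobbing %s" % thing)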

In my opinion, this automatic addition of an args argument should have been explicitly deprecated, meaning both a deprecation warning in Django 1.9 and an explicit mention in Django 1.10's documentation. Probably the magic behavior of converting an args argparse argument into the handle() args parameter should have been deprecated at the same time, but I don't have strong opinions there.

(This is the kind of entry I write because we stumbled over this and then I went and dug the details out of the Django revision history, so I'm certainly not going to waste all that work.)

DjangoMgmtCommandArgProblem written at 01:59:52; Add Comment

2016-12-12

Some of my views on Naftali Harris's 'Python 2.8'

Naftali Harris recently wrote Why I'm Making Python 2.8, which served to also announce his 'Python 2.8' project (which has subsequently been renamed until the dust settles). His goals are straightforward:

[...] because I want to give all the people who use Python 2 access to the Python 3 language features, which I think are actually pretty cool. [...]

His opinions on the whole situation and evolutionary process of Python 3 are a fairly good match with mine, including on the relative (non-)gains to be had from porting perfectly good working code to Python 3. And many of the Python 3 features that his Python 2.8 adopts are quite nice in quiet ways (no-argument super(), for example).

So I ought to really like the idea of this sort of Python 2.8, and to a certain degree I do. But at the same time I don't feel all that interested in it or compelled by it, which has surprised me. After thinking about it for a while I think I understand why I feel this way and thus I have some views.

If you have a Python 2 code base that you're actively developing, changing, and evolving, then the new features are attractive. As you work on the code you'll get a chance to start using them and they'll make your life better, and it might even make sense to do a global refactoring to switch to nicer versions of things across your entire code base. Some of the new features will enable you to do more in your Python 2 code than you could before.

(I'm magically assuming that there are no support or other issues with Harris's version, that everyone loves it and adopts it in place of Python 2.7 and it gets solid support.)

But not all Python 2 code bases are like that. Some are basically static now; they just work and no one is touching them apart from bug fixes (hopefully minor) and other necessary but minimal updates. For these code bases, the big concern with Python 2 is simply whether it will continue to work and work well. You're very unlikely to go through such static code to rewrite bits of it to use new features adopted from Python 3, even if that would make the code clearer, because the current code works even if it's not as great as it could be.

If you're writing new greenfield code from scratch, even Harris concedes that Python 3 is a better language than Python 2 (and I'd maintain better than even his 'Python 2.8'). So you should really use Python 3 for that; it's better than even Harris's work in the best case for the latter, and there's no existing code to hold you back. I'm reasonably positive about Python 3 for new things after recent experiences.

The reason why I feel only lukewarm about Harris's work is simply that I don't really have any Python 2 code bases that I'm still actively developing. The closest that I come to one is DWiki, and that has only one real feature I expect to ever add to the current code (some form of good tag support, which I haven't done anything on for years). Everything else is essentially frozen, including my other substantial Python program.

(Oh, we have one Django application, but the Python 2 versus 3 fate of that is tied very closely to Django's support or lack thereof for Python 2.)

PS: Regardless of what the official schedule is, I believe that Python 2.7 will stick around well after 2020 and is unlikely to have problems. But if 'Python 2.8' extends that lifetime, it'll save me some amount of hassle and so I'm all for it. And if it wants to improve Python 2.7's performance too, sure, I'll take that as well.

Python28Feelings written at 23:20:10; Add Comment
