Wandering Thoughts archives

2017-02-26

In Python, strings are infinitely recursively iterable

Yesterday I posed the side question of what happened when you called the following code with flatten2(["abcdef",]):

def flatten2(inlst):
    olst = []
    for i in inlst:
        try:
            it = iter(i)
        except TypeError:
            it = None
        if it is None:
            olst.append(i)
        else:
            olst.extend(flatten2(i))
    return olst

The intent of this code is to recursively flatten iterable things. What I had expected to get when I called it with a string was ['a', 'b', 'c', 'd', 'e', 'f'], ie that it flattened the string instead of leaving it alone (because strings are iterable). What I actually got (and what you can get) is a RecursionError of 'maximum recursion depth exceeded'. So why did this recurse endlessly?

The answer is that strings are not just iterable, they are infinitely recursively iterable. With normal iterable containers, iterating the container yields non-iterable items unless you've explicitly put in iterable ones (such as a list inside another list); assuming that you have not cleverly embedded a cycle in your container, our recursive flattening will eventually bottom out and finish. This is not the case for Python strings. When you iterate a multi-character string like "abcdef", you get a sequence of single-character strings, "a" "b" "c" "d" "e" "f". However, these are still strings and so they are still iterable; when you iterate the "a" string, you get back another single-character string "a", which is also iterable. And so flatten2() chases down an infinite recursion of trying to unpack single-character strings into non-iterable items, which it will never succeed at.

Fixing this without side effects is a bit annoying. It's not enough to immediately check that inlst is length 1 and just return it if so, because this fails on flatten2([[1,]]) (and flatten2(["abcdef",]) too, for that matter). I think you have to check explicitly for length 1 strings (and bytes) and just return them, which of course leaves you exposed to any other infinitely recursively iterable types (should there be any other out there).

(I was about to gripe about repr()'s representation of single character strings but then I tested and it uses single quotes for all strings, not just single-character ones. I need to remember that in Python there is no 'character' type that's different from strings, unlike in other languages such as Go, and so single quotes still mean 'string'.)

Sidebar: It's trivially possible to create containers with cycles

>>> a = [1, 2, 3]
>>> a.append(a)
>>> print(a)
[1, 2, 3, [...]]

More involved versions are left as an exercise for the reader.

Without contemplating it too hard, I think that you can only create cycles with mutable containers, and then only with some of them. Sets are mutable, for example, but they can only contain hashable items, which are generally immutable.

StringsRecursivelyIterable written at 19:32:55; Add Comment

How recursively flattening a list raises a Python type question

Today I wound up reading Why it's hard for programmers to write a program to flatten a list? (via), where the quiz challenge put forward is to turn an input like [1,[2,3], [4, [5,6]]] into [1,2,3,4,5,6]. My immediate reaction was that I'd do this in Python rather than in any statically typed language I know, because all of them make the input type here hard to represent. But then I realized that doing this in Python raises another type-related question.

If we stick exactly to the specification (and directly implement it), the result is pretty simple and straightforward:

def flatten(inlst):
    olst = []
    for i in inlst:
        if isinstance(i, int):
            olst.append(i)
        elif isinstance(i, list):
            olst.extend(flatten(i))
        else:
            raise ValueError("invalid element in list")
    return olst

(You can optimize this by having a _flatten internal function that gets passed the output list, so you don't have to keep building lists and then merging them into other lists as you work down and then back up the recursion stack. Also, I'm explicitly opting to return an actual list instead of making this a (recursive) generator.)

However, this code is not very Pythonic because it is so very narrowly typed. We can relax it slightly by checking for isinstance(i, (int, float)), but even then most people would say that flatten() should definitely accept tuples in place of lists and probably even sets.

If we're thinking about being Pythonic and general, the obvious thing to do is check if the object is iterable. So we write some simple and general code:

def flatten2(inlst):
    olst = []
    for i in inlst:
        try:
            it = iter(i)
        except TypeError:
            it = None
        if it is None:
            olst.append(i)
        else:
            olst.extend(flatten2(i))
    return olst

This should flatten any type (or mixture of types) that contains elements, as denoted by the fact that it's iterable. It looks good and passes initial tests. Then some joker calls our code with flatten2(["abcdef",]) and suddenly we have a problem. Then another joker calls our code with flatten2([somedict,]) and files a bug that our code only flattens the keys of their dictionary, not the keys and values.

(As an exercise, can you predict in advance, without trying it, what our problem is with flatten2(["abcdef",]), and why it happens? I got this wrong when I was writing and testing this code in Python 3 and had to scratch my head for a bit before the penny dropped.)

The problem here is that 'is iterable' is not exactly what we want. Some things, such as strings, are iterable but should probably be treated as indivisible by flatten2(). Other things, such as dicts, are iterable but the default iteration result does not fully represent their contents. Really, not only is Python lacking a simple condition for what we want, it's arguably not clear just what we want to do if we're generalizing flatten() (and what making it 'Pythonic' really means).

One valid answer is that we will explicitly check for container types that are close enough to what we want, and otherwise mostly return things as-is. Here we would write a version of flatten() that looked like this:

def flatten3(inlst):
    olst = []
    for i in inlst:
        if isinstance(i, (list, tuple, set)):
            olst.extend(flatten3(i))
        elif isinstance(i, dict):
            raise ValueError("dict not valid in list")
        else:
            olst.append(i)
    return olst

We could treat dicts as single elements and just return them, but that is probably not what the caller intended. Still, this check feels dubious, which is a warning sign.

As a minimum, it would be nice to have a Python abstract type or trait that represented 'this is a container object and iterating it returns a full copy of its contents'; you could call this the property of being list-like. This would be true for lists, tuples, and sets, but false for dicts, which would give us a starting point. It would also be true for strings, but you can't win them all; when dealing with iterable things, we'll probably always have to special-case strings.

(I'd go so far as arguing that making strings iterable by default was a Python mistake. It's one of those neat features that winds up getting in the way in practice.)

I don't have an answer here, by the way. If I was in this situation I might either write and carefully document a version of flatten2() (specifying 'recursively flattens any iterable thing using its default iterator; this will probably not do what you want for dicts'), or go with some version of flatten3() that specifically restricted iteration to things that I felt were sufficiently list-like.

(I'd worry about missing some new popular type over time, though. Ten years ago I might not have put set in the list, and who knows what I'm missing today that's going to be popular in Python in the future. Queues? Trees? Efficient numerical arrays?)

FlattenTypeQuestion written at 02:09:00; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.