In Python, strings are infinitely recursively iterable

February 26, 2017

Yesterday I posed the side question of what happened when you called the following code with flatten2(["abcdef",]):

def flatten2(inlst):
    olst = []
    for i in inlst:
            it = iter(i)
        except TypeError:
            it = None
        if it is None:
    return olst

The intent of this code is to recursively flatten iterable things. What I had expected to get when I called it with a string was ['a', 'b', 'c', 'd', 'e', 'f'], ie that it flattened the string instead of leaving it alone (because strings are iterable). What I actually got (and what you can get) is a RecursionError of 'maximum recursion depth exceeded'. So why did this recurse endlessly?

The answer is that strings are not just iterable, they are infinitely recursively iterable. With normal iterable containers, iterating the container yields non-iterable items unless you've explicitly put in iterable ones (such as a list inside another list); assuming that you have not cleverly embedded a cycle in your container, our recursive flattening will eventually bottom out and finish. This is not the case for Python strings. When you iterate a multi-character string like "abcdef", you get a sequence of single-character strings, "a" "b" "c" "d" "e" "f". However, these are still strings and so they are still iterable; when you iterate the "a" string, you get back another single-character string "a", which is also iterable. And so flatten2() chases down an infinite recursion of trying to unpack single-character strings into non-iterable items, which it will never succeed at.

Fixing this without side effects is a bit annoying. It's not enough to immediately check that inlst is length 1 and just return it if so, because this fails on flatten2([[1,]]) (and flatten2(["abcdef",]) too, for that matter). I think you have to check explicitly for length 1 strings (and bytes) and just return them, which of course leaves you exposed to any other infinitely recursively iterable types (should there be any other out there).

(I was about to gripe about repr()'s representation of single character strings but then I tested and it uses single quotes for all strings, not just single-character ones. I need to remember that in Python there is no 'character' type that's different from strings, unlike in other languages such as Go, and so single quotes still mean 'string'.)

Sidebar: It's trivially possible to create containers with cycles

>>> a = [1, 2, 3]
>>> a.append(a)
>>> print(a)
[1, 2, 3, [...]]

More involved versions are left as an exercise for the reader.

Without contemplating it too hard, I think that you can only create cycles with mutable containers, and then only with some of them. Sets are mutable, for example, but they can only contain hashable items, which are generally immutable.

Written on 26 February 2017.
« How recursively flattening a list raises a Python type question
The conflict between wildcard TLS certificates and Certificate Transparency »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Feb 26 19:32:55 2017
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.