## How recursively flattening a list raises a Python type question

February 26, 2017

Today I wound up reading Why it's hard for programmers to write a program to flatten a list? (via), where the quiz challenge put forward is to turn an input like `[1,[2,3], [4, [5,6]]]` into `[1,2,3,4,5,6]`. My immediate reaction was that I'd do this in Python rather than in any statically typed language I know, because all of them make the input type here hard to represent. But then I realized that doing this in Python raises another type-related question.

If we stick exactly to the specification (and directly implement it), the result is pretty simple and straightforward:

```def flatten(inlst):
olst = []
for i in inlst:
if isinstance(i, int):
olst.append(i)
elif isinstance(i, list):
olst.extend(flatten(i))
else:
raise ValueError("invalid element in list")
return olst
```

(You can optimize this by having a `_flatten` internal function that gets passed the output list, so you don't have to keep building lists and then merging them into other lists as you work down and then back up the recursion stack. Also, I'm explicitly opting to return an actual list instead of making this a (recursive) generator.)

However, this code is not very Pythonic because it is so very narrowly typed. We can relax it slightly by checking for ```isinstance(i, (int, float))```, but even then most people would say that `flatten()` should definitely accept tuples in place of lists and probably even sets.

If we're thinking about being Pythonic and general, the obvious thing to do is check if the object is iterable. So we write some simple and general code:

```def flatten2(inlst):
olst = []
for i in inlst:
try:
it = iter(i)
except TypeError:
it = None
if it is None:
olst.append(i)
else:
olst.extend(flatten2(i))
return olst
```

This should flatten any type (or mixture of types) that contains elements, as denoted by the fact that it's iterable. It looks good and passes initial tests. Then some joker calls our code with `flatten2(["abcdef",])` and suddenly we have a problem. Then another joker calls our code with `flatten2([somedict,])` and files a bug that our code only flattens the keys of their dictionary, not the keys and values.

(As an exercise, can you predict in advance, without trying it, what our problem is with `flatten2(["abcdef",])`, and why it happens? I got this wrong when I was writing and testing this code in Python 3 and had to scratch my head for a bit before the penny dropped.)

The problem here is that 'is iterable' is not exactly what we want. Some things, such as strings, are iterable but should probably be treated as indivisible by `flatten2()`. Other things, such as dicts, are iterable but the default iteration result does not fully represent their contents. Really, not only is Python lacking a simple condition for what we want, it's arguably not clear just what we want to do if we're generalizing `flatten()` (and what making it 'Pythonic' really means).

One valid answer is that we will explicitly check for container types that are close enough to what we want, and otherwise mostly return things as-is. Here we would write a version of `flatten()` that looked like this:

```def flatten3(inlst):
olst = []
for i in inlst:
if isinstance(i, (list, tuple, set)):
olst.extend(flatten3(i))
elif isinstance(i, dict):
raise ValueError("dict not valid in list")
else:
olst.append(i)
return olst
```

We could treat dicts as single elements and just return them, but that is probably not what the caller intended. Still, this check feels dubious, which is a warning sign.

As a minimum, it would be nice to have a Python abstract type or trait that represented 'this is a container object and iterating it returns a full copy of its contents'; you could call this the property of being list-like. This would be true for lists, tuples, and sets, but false for dicts, which would give us a starting point. It would also be true for strings, but you can't win them all; when dealing with iterable things, we'll probably always have to special-case strings.

(I'd go so far as arguing that making strings iterable by default was a Python mistake. It's one of those neat features that winds up getting in the way in practice.)

I don't have an answer here, by the way. If I was in this situation I might either write and carefully document a version of `flatten2()` (specifying 'recursively flattens any iterable thing using its default iterator; this will probably not do what you want for dicts'), or go with some version of `flatten3()` that specifically restricted iteration to things that I felt were sufficiently list-like.

(I'd worry about missing some new popular type over time, though. Ten years ago I might not have put `set` in the list, and who knows what I'm missing today that's going to be popular in Python in the future. Queues? Trees? Efficient numerical arrays?)

By yggdrasil at 2017-02-26 03:59:09:

collections.Sequence?

>>> isinstance({}, collections.Sequence)
False

I was rather surprised not seeing the collections module mentioned at all in the post.

By cks at 2017-02-26 15:19:07:

Sets aren't `collections.Sequence`s (because they're not reversible) and I think we should flatten sets. I took a look at the stuff in `collections.abc` but couldn't find anything that seemed quite on, so I omitted a discussion of it (which may have been a mistake).

The other issue I see with using `collections.abc` classes for `isinstance()` checks is that I suspect a lot of user-written classes do not have all of the methods necessary to be considered a `Collection` or `Sequence` (or a `Mapping`, so one can detect and exclude that). This would give false negatives for classes that you do want to be flattenable.

If we ignore that, checking for things that are `Collection`s but not `Mapping`s is probably the closest we can come. We know mappings have key/value pairs, so we know it's at least not clear whether iterating them will give the full contents.

By Ewen McNeill at 2017-02-26 15:36:13:

Having run into this myself a few times recently (for basically the same reason: trying to normalise externally loaded data which may be a bare string or a list), I tend to agree that strings probably should not have been automatically iterable in the way that other sequences are. It's definitely useful to be able to iterate over the characters in a string -- but it feels like most of the time I end up doing it by accident and having to debug it, then code around it.

Maybe iteration over the elements of a string should require a `str.items(`) or `str.elem(`) or `str.elements(`) which returns a very thin proxy to be the iterator?

Ewen

PS: I too ended up with `isinstance(foo, str`) and/or `isinstance(foo, list`) as a quick detection work around, after also quickly looking through collections to find the right supertype that matched `list`, etc, but not `str`. Fortunately due to the way the data was coming in I could be fairly sure it was either `list` or `str`.

By Ewen McNeill at 2017-02-26 21:08:55:

FTR, this question also gets asked on Stack Exchange fairly often, of which this question and this question seem to be some of the better answers. Among other things they suggest using `basestring` to check for string-like types... but as answer to the second question point out, that's non-portable over the Python 2/Python 3 chasm :-( (It was at about this "non-portable over Python 2/Python 3 stage that I gave up and hard coded the type checks, because I was trying to target Python 2.6 - 3.5 or at least Python 2.7 - 3.5, and could make reasonable assumptions about the few types I could expect.)

At a quick glance it appears few, if any of them, are prepared to handle a StringsRecursivelyIterable sidebar (self-containing container) case...

Ewen

From 172.93.51.212 at 2017-03-11 08:18:24:
```   def flatten(*iterable):
"""hopefully equivalent to python2's compiler.ast.flatten"""
# thanks http://stackoverflow.com/a/16176779
import collections
from itertools import chain
for el in chain(*iterable):
if isinstance(el, collections.Iterable) and not isinstance(el, str):
for subel in flatten(el):
yield subel
else:
yield el
```
Written on 26 February 2017.