How recursively flattening a list raises a Python type question
Today I wound up reading Why it's hard for programmers to write
a program to flatten a list?
(via),
where the quiz challenge put forward is to turn an input like
[1,[2,3], [4, [5,6]]]
into [1,2,3,4,5,6]
. My immediate reaction
was that I'd do this in Python rather than in any statically typed
language I know, because all of them make the input type here hard
to represent. But then I realized that doing this in Python raises
another type-related question.
If we stick exactly to the specification (and directly implement it), the result is pretty simple and straightforward:
def flatten(inlst): olst = [] for i in inlst: if isinstance(i, int): olst.append(i) elif isinstance(i, list): olst.extend(flatten(i)) else: raise ValueError("invalid element in list") return olst
(You can optimize this by having a _flatten
internal function
that gets passed the output list, so you don't have to keep building
lists and then merging them into other lists as you work down and
then back up the recursion stack. Also, I'm explicitly opting to
return an actual list instead of making this a (recursive) generator.)
However, this code is not very Pythonic because it is so very
narrowly typed. We can relax it slightly by checking for isinstance(i,
(int, float))
, but even then most people would say that flatten()
should definitely accept tuples in place of lists and probably even
sets.
If we're thinking about being Pythonic and general, the obvious thing to do is check if the object is iterable. So we write some simple and general code:
def flatten2(inlst): olst = [] for i in inlst: try: it = iter(i) except TypeError: it = None if it is None: olst.append(i) else: olst.extend(flatten2(i)) return olst
This should flatten any type (or mixture of types) that contains
elements, as denoted by the fact that it's iterable. It looks good
and passes initial tests.
Then some joker calls our code with flatten2(["abcdef",])
and
suddenly we have a problem. Then another joker calls our code
with flatten2([somedict,])
and files a bug that our code only
flattens the keys of their dictionary, not the keys and values.
(As an exercise, can you predict in advance, without trying it,
what our problem is with flatten2(["abcdef",])
, and why it
happens? I got this wrong when I was writing and testing this
code in Python 3 and had to scratch my head for a bit before the
penny dropped.)
The problem here is that 'is iterable' is not exactly what we want.
Some things, such as strings, are iterable but should probably be
treated as indivisible by flatten2()
. Other things, such as dicts,
are iterable but the default iteration result does not fully represent
their contents. Really, not only is Python lacking a simple condition
for what we want, it's arguably not clear just what we want to do
if we're generalizing flatten()
(and what making it 'Pythonic'
really means).
One valid answer is that we will explicitly check for container
types that are close enough to what we want, and otherwise mostly
return things as-is. Here we would write a version of flatten()
that looked like this:
def flatten3(inlst): olst = [] for i in inlst: if isinstance(i, (list, tuple, set)): olst.extend(flatten3(i)) elif isinstance(i, dict): raise ValueError("dict not valid in list") else: olst.append(i) return olst
We could treat dicts as single elements and just return them, but that is probably not what the caller intended. Still, this check feels dubious, which is a warning sign.
As a minimum, it would be nice to have a Python abstract type or trait that represented 'this is a container object and iterating it returns a full copy of its contents'; you could call this the property of being list-like. This would be true for lists, tuples, and sets, but false for dicts, which would give us a starting point. It would also be true for strings, but you can't win them all; when dealing with iterable things, we'll probably always have to special-case strings.
(I'd go so far as arguing that making strings iterable by default was a Python mistake. It's one of those neat features that winds up getting in the way in practice.)
I don't have an answer here, by the way. If I was in this situation
I might either write and carefully document a version of flatten2()
(specifying 'recursively flattens any iterable thing using its
default iterator; this will probably not do what you want for
dicts'), or go with some version of flatten3()
that specifically
restricted iteration to things that I felt were sufficiently
list-like.
(I'd worry about missing some new popular type over time, though.
Ten years ago I might not have put set
in the list, and who knows
what I'm missing today that's going to be popular in Python in the
future. Queues? Trees? Efficient numerical arrays?)
Comments on this page:
|
|