Single-level list flattening in Python

July 13, 2010

I have been writing a bunch of Python code lately which involves turning lists of lists into plain lists. As it happens Python does not have a flatten() built-in, but people have come up with any number of ingenious ways to do this, some of which I actually like.

First, a stupid non-way that is worth writing down for posterity:

reduce(lambda x,y: x.extend(y), somelists, [])

This version is a classic mistake (one I made today, in fact). Your reward for trying this is the cryptic error "AttributeError: 'NoneType' object has no attribute 'extend'". What this really means is that you've forgotten that .extend() doesn't return anything (ie, it returns None), and the function that reduce() uses must return the accumulator variable. You can fix this by picking an equivalent list operation that does return a result:

reduce(lambda x,y: x+y, somelists, [])

This is both ugly and inefficient, although the inefficiency shouldn't bother you for small lists-of-lists. It is also equivalent to the simpler and slightly faster version:

sum(somelists, [])

All addition-based flatteners have the drawback that they only work on actual lists. If you have a list of tuples, or just a list of some sequence type that is not a list, you lose; list addition works only on two lists.
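To illustrate with a small made-up list of tuples:

sometuples = [(1, 2), (3, 4)]
sum(sometuples, [])

This immediately blows up with a TypeError complaining that you can only concatenate a list (not a tuple) to a list.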

The best solution is a rare good use of a multi-level list comprehension:

[item for sublist in somelists for item in sublist]

This will work on any sequence type and for any mixture of sequence types (or iterables); you can have an iterable that returns a mixture of lists, tuples, and other iterables and everything will work out and you will get a list at the end.
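As a quick illustration with a made-up mixture of sequence types:

somemix = [[1, 2], (3, 4), xrange(5, 7)]
[item for sublist in somemix for item in sublist]

This gives you the plain list [1, 2, 3, 4, 5, 6].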

The multi-level list comprehension approach is both the fastest and the most general, but it's kind of verbose and hard to follow if you're not completely up on list comprehensions. I want to like the sum() approach better, but it too is the kind of thing that makes me go 'oh, how clever', which is not necessarily a good idea in general code. All things considered, I'll probably use either the list comprehension approach or just the plain for loop one in future code.

PS: this is the kind of entry that I write in an attempt to make it all stick in my head. And in a demonstration that I repeat myself, the example I used in MultilevelListComps was for list flattening (although back in 2005 it appears that lists did not have .extend(), or at least it slipped my mind).

(I mined here and here for this information.)

Sidebar: the boring plain approach

The most verbose but straightforward approach is of course the classic for loop:

accum = []
for e in somelists:
    accum.extend(e)

Just as with the list comprehension version, this works with any sequence types or iterable types; it doesn't require lists. One advantage of this approach is that anyone (your future self included) can easily recognize what the code is doing.


Comments on this page:

From 87.194.56.231 at 2010-07-13 05:31:13:

Because I am often thick, I think I find this easier to read than the multi-level list comprehension:

import itertools
list(itertools.chain(*somelists))

David B.

From 87.194.56.231 at 2010-07-13 06:34:03:

Faster too for very large lists:

   $ python -mtimeit -s 'l = list(range(200) for x in range(999))' '[i for sublist in l for i in sublist]'
   100 loops, best of 3: 9.37 msec per loop
   $ python -mtimeit -s 'import itertools; l = list(range(200) for x in range(999))' 'list(itertools.chain(*l))'
   100 loops, best of 3: 5.12 msec per loop

Tests were done with Python 2.6.5 on Mac OS X 10.6.3.

David B.

From 65.172.155.230 at 2010-07-15 11:21:28:

FWIW simplest is fastest:

 % python -mtimeit -s 'l = list(range(200) for x in range(999))' '[i for sublist in l for i in sublist]'
 100 loops, best of 3: 13.2 msec per loop
 % python -mtimeit -s 'import itertools; l = list(range(200) for x in range(999))' 'list(itertools.chain(*l))'
 100 loops, best of 3: 4.12 msec per loop
 % python -mtimeit -s 'l = list(range(200) for x in range(999))' 'x=[]; [x.extend(sublist) for sublist in l]'
 1000 loops, best of 3: 1.28 msec per loop

...the last should be the normal for loop, but I can't write that in a way python will accept on one line (stupid invisible syntax).

I've also seen the same thing with .setdefault() ... it seems like it should be faster, and one of those things that python people will know about. But doing the test explicitly is slightly faster, and everyone will understand it.

From 199.255.189.186 at 2011-06-30 16:58:27:

I believe the last poster's analysis is incorrect. The x.extend() method does not iterate over the final list returning values; it only builds a list. The other methods will return all the values from the nested lists.

$ python -mtimeit -s 'l = list(range(200) for x in range(999))' 'x=[]; [x.extend(sublist) for sublist in l]'
1000 loops, best of 3: 817 usec per loop

$ python -mtimeit -s 'l = list(range(200) for x in range(999))' 'x=[]; [x.extend(sublist) for sublist in l]' 'for a in x: pass'
100 loops, best of 3: 4.2 msec per loop

$ python -mtimeit -s 'import itertools; l = list(range(200) for x in range(999))' 'list(itertools.chain(*l))'
100 loops, best of 3: 2.34 msec per loop

So if you want the items from the list, itertools is still faster. And the normal for loop:

$ python -mtimeit -s 'l = list(range(200) for x in range(999))' 'for i in l:' ' for j in i: pass'
100 loops, best of 3: 3.44 msec per loop
