Tuesday, February 16, 2010

Python: Iterating over a list vs. iterating over dict.items()

While cleaning up some code today, I was wondering: how efficient is Python's dict.items()? If we're usually iterating over a list of items but don't particularly care about their ordering, is it worth it to use a dict instead so we can benefit from an occasional key-based lookup? Here's the code:
# Build a long list and copy each member
a=range(30000)
[x for x in a]

# Build a list, put each member into a dict,
# and copy each member of the dict
a=range(30000)
b=dict((x,x) for x in a)
c=b.values()
[x for x in c]
Iteration over the dict's items is just a little bit slower:
$ timeit.py -s 'a=range(30000)' '[x for x in a]'
1000 loops, best of 3: 1.35 msec per loop
$ timeit.py -s 'a=range(30000)'\
            -s 'b=dict((x,x) for x in a)'\
            -s 'c=b.values()'\
               '[x for x in c]'
1000 loops, best of 3: 1.37 msec per loop
But obviously lookups in the dict will be much faster. There's extra memory usage too, of course. Still, not a bad deal. Of course, you don't want to put dict_name.values() inside the loop or the extra lookup on each pass will make performance much worse:
$ timeit.py -s 'a=range(30000)'\
            -s 'b=dict((x,x) for x in a)'\
               '[x for x in b.values()]'
100 loops, best of 3: 2.5 msec per loop

1 comment:

  1. Your interpretation here is a bit flawed; possible you're using py3k and didn't state it, but c=b.values() means it's a list. Meaning you're testing the same pathways- iterating over a list. Background noise/cache perturbation at best accounts for the difference, although I'd bet on noise personally.

    For the second test, probably helps to be clear that 1) it's a method invocation, 2) full list creation, 3) full dict iteration, for accomplishing it. So yeah, it would definitely be slower ;)

    ReplyDelete