Oh Python

Suppose you have a list of objects and need to iterate over it two consecutive items at a time.

An old Stack Overflow question on this leads to a quote from the documentation that reads:

This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n).

So the solution would be:

k = [1, 2, 3, 4, 5, 6]
list(zip(*[iter(k)] * 2))
# [(1, 2), (3, 4), (5, 6)]

That is cryptic! Let’s break it down to understand why this works.

  1. Let’s start with the innermost bit. iter simply returns an iterator object. For lists we would normally just write for x in alist to iterate over the list, but under the hood the loop creates an iterator and fetches each item from it with a next call.
>>> iter(k)
<list_iterator object at 0x7fcf654c9f28>
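  1. Next we consider [iter(k)] * 2. The multiplication does not copy the iterator: it builds a two-element list in which both entries refer to the same iterator object, which is why the output shows the same address twice.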
>>> [iter(k)] * 2
[<list_iterator object at 0x7fcf654c9f28>, <list_iterator object at 0x7fcf654c9f28>]
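  1. The star operator * then unpacks the list into positional arguments to a function, which is zip in this case. The unpacking applies to the whole expression [iter(k)] * 2, so the call amounts to zip(it, it) for a single iterator it. zip is a handy tool for merging several iterables together.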
>>> zip(*[iter(k)] * 2)
<zip object at 0x7fcf654de808>
  1. Finally, the list constructor consumes the zip object to build the entire list, giving us the desired output.
>>> list(zip(*[iter(k)] * 2))
[(1, 2), (3, 4), (5, 6)]
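
To make the four steps concrete, here is a minimal sketch that does by hand roughly what zip does when it is given two references to one iterator. The function name pairs_by_hand is made up for illustration and is not part of any library.

```python
def pairs_by_hand(items):
    """Group items two at a time by pulling from one shared iterator."""
    it = iter(items)           # a single iterator over the data
    refs = [it] * 2            # two references to that same iterator
    assert refs[0] is refs[1]  # same object, not two copies

    result = []
    while True:
        try:
            # Like zip, ask each "argument" for its next value in turn.
            # Both arguments are the same iterator, so each call advances it.
            first = next(refs[0])
            second = next(refs[1])
        except StopIteration:
            break              # a leftover odd item is dropped, as zip does
        result.append((first, second))
    return result


k = [1, 2, 3, 4, 5, 6]
print(pairs_by_hand(k))  # [(1, 2), (3, 4), (5, 6)]
```

The two next calls per round are the whole trick: the pairing comes from the iterator being advanced twice, not from anything zip knows about grouping.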

What’s strange about all this is that it depends on subtle behaviours of the underlying functions. For example, suppose that instead of zip(*[iter(k)] * 2) you wrote zip(*[iter(k), iter(k)]). You would end up with a different result, because the two iterators are independent and each starts from the beginning of the list. The idiom depends on both entries being the same iterator! Each time zip asks either entry for a value, it calls next on that one shared iterator, which therefore advances by two items for every output tuple.
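
A quick side-by-side comparison shows the difference. This is just a sketch; the variable names shared and separate are mine.

```python
k = [1, 2, 3, 4, 5, 6]

# One iterator referenced twice: every next() call advances the same cursor.
shared = list(zip(*[iter(k)] * 2))

# Two independent iterators: each one starts at the beginning of k.
separate = list(zip(*[iter(k), iter(k)]))

print(shared)    # [(1, 2), (3, 4), (5, 6)]
print(separate)  # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
```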

Show, don’t tell

I’d hate to encounter snippets like this in the wild, as they place significant cognitive load on anyone trying to read them. Strangely, the idiom was included in the official 2.x documentation for zip, and it still appears in the 3.x documentation.
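
For comparison, one possible way to spell out the same intent more explicitly is sketched below. The helper name in_pairs is hypothetical, and it assumes the input is a sequence that supports len and indexing (such as a list), which the zip idiom does not require.

```python
def in_pairs(items):
    """Yield consecutive, non-overlapping pairs from a sequence.

    Hypothetical helper for illustration. A trailing unpaired item is
    dropped, matching the behaviour of the zip idiom.
    """
    for start in range(0, len(items) - 1, 2):
        yield items[start], items[start + 1]


k = [1, 2, 3, 4, 5, 6]
print(list(in_pairs(k)))  # [(1, 2), (3, 4), (5, 6)]
```

It is longer than the one-liner, but the name and the loop say what is going on without requiring the reader to know how zip consumes its arguments.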

Rahul Nair
Research Staff Member