python – Generator expressions vs. list comprehensions

The Question :

433 people think this question is useful

When should you use generator expressions and when should you use list comprehensions in Python?

# Generator expression
(x*2 for x in range(256))

# List comprehension
[x*2 for x in range(256)]

The Question Comments :
  • could [exp for x in iter] just be sugar for list((exp for x in iter)) ? or is there an execution difference ?
  • it think I had a relevant question, so when using yield can we use just the generator expression from a function or we have to use yield for a function to return generator object ?
  • @b0fh Very late answer to your comment: in Python2 there is a tiny difference, the loop variable will leak out of a list comprehension, while a generator expression will not leak. Compare X = [x**2 for x in range(5)]; print x with Y = list(y**2 for y in range(5)); print y, the second will give an error. In Python3, a list comprehension is indeed the syntactic sugar for a generator expression fed to list() as you expected, so the loop variable will no longer leak out.
  • I’d suggest reading PEP 0289. Summed up by “This PEP introduces generator expressions as a high performance, memory efficient generalization of list comprehensions and generators”. It also has useful examples of when to use them.
  • @icc97 I’m also eight years late to the party, and the PEP link was perfect. Thanks for making that easy to find!

The Answer 1

297 people think this answer is useful

John’s answer is good (that list comprehensions are better when you want to iterate over something multiple times). However, it’s also worth noting that you should use a list if you want to use any of the list methods. For example, the following code won’t work:

def gen():
    return (something for something in get_some_stuff())

print gen()[:2]     # generators don't support indexing or slicing
print [5,6] + gen() # generators can't be added to lists

Basically, use a generator expression if all you’re doing is iterating once. If you want to store and use the generated results, then you’re probably better off with a list comprehension.

Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.

The Answer 2

193 people think this answer is useful

Iterating over the generator expression or the list comprehension will do the same thing. However, the list comprehension will create the entire list in memory first while the generator expression will create the items on the fly, so you are able to use it for very large (and also infinite!) sequences.

The Answer 3

108 people think this answer is useful

Use list comprehensions when the result needs to be iterated over multiple times, or where speed is paramount. Use generator expressions where the range is large or infinite.

See Generator expressions and list comprehensions for more info.

The Answer 4

62 people think this answer is useful

The important point is that the list comprehension creates a new list. The generator creates a an iterable object that will “filter” the source material on-the-fly as you consume the bits.

Imagine you have a 2TB log file called “hugefile.txt”, and you want the content and length for all the lines that start with the word “ENTRY”.

So you try starting out by writing a list comprehension:

logfile = open("hugefile.txt","r")
entry_lines = [(line,len(line)) for line in logfile if line.startswith("ENTRY")]

This slurps up the whole file, processes each line, and stores the matching lines in your array. This array could therefore contain up to 2TB of content. That’s a lot of RAM, and probably not practical for your purposes.

So instead we can use a generator to apply a “filter” to our content. No data is actually read until we start iterating over the result.

logfile = open("hugefile.txt","r")
entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))

Not even a single line has been read from our file yet. In fact, say we want to filter our result even further:

long_entries = ((line,length) for (line,length) in entry_lines if length > 80)

Still nothing has been read, but we’ve specified now two generators that will act on our data as we wish.

Lets write out our filtered lines to another file:

outfile = open("filtered.txt","a")
for entry,length in long_entries:
    outfile.write(entry)

Now we read the input file. As our for loop continues to request additional lines, the long_entries generator demands lines from the entry_lines generator, returning only those whose length is greater than 80 characters. And in turn, the entry_lines generator requests lines (filtered as indicated) from the logfile iterator, which in turn reads the file.

So instead of “pushing” data to your output function in the form of a fully-populated list, you’re giving the output function a way to “pull” data only when its needed. This is in our case much more efficient, but not quite as flexible. Generators are one way, one pass; the data from the log file we’ve read gets immediately discarded, so we can’t go back to a previous line. On the other hand, we don’t have to worry about keeping data around once we’re done with it.

The Answer 5

48 people think this answer is useful

The benefit of a generator expression is that it uses less memory since it doesn’t build the whole list at once. Generator expressions are best used when the list is an intermediary, such as summing the results, or creating a dict out of the results.

For example:

sum(x*2 for x in xrange(256))

dict( (k, some_func(k)) for k in some_list_of_keys )

The advantage there is that the list isn’t completely generated, and thus little memory is used (and should also be faster)

You should, though, use list comprehensions when the desired final product is a list. You are not going to save any memeory using generator expressions, since you want the generated list. You also get the benefit of being able to use any of the list functions like sorted or reversed.

For example:

reversed( [x*2 for x in xrange(256)] )

The Answer 6

16 people think this answer is useful

When creating a generator from a mutable object (like a list) be aware that the generator will get evaluated on the state of the list at time of using the generator, not at time of the creation of the generator:

>>> mylist = ["a", "b", "c"]
>>> gen = (elem + "1" for elem in mylist)
>>> mylist.clear()
>>> for x in gen: print (x)
# nothing

If there is any chance of your list getting modified (or a mutable object inside that list) but you need the state at creation of the generator you need to use a list comprehension instead.

The Answer 7

4 people think this answer is useful

Sometimes you can get away with the tee function from itertools, it returns multiple iterators for the same generator that can be used independently.

The Answer 8

4 people think this answer is useful

I’m using the Hadoop Mincemeat module. I think this is a great example to take a note of:

import mincemeat

def mapfn(k,v):
    for w in v:
        yield 'sum',w
        #yield 'count',1


def reducefn(k,v): 
    r1=sum(v)
    r2=len(v)
    print r2
    m=r1/r2
    std=0
    for i in range(r2):
       std+=pow(abs(v[i]-m),2)  
    res=pow((std/r2),0.5)
    return r1,r2,res

Here the generator gets numbers out of a text file (as big as 15GB) and applies simple math on those numbers using Hadoop’s map-reduce. If I had not used the yield function, but instead a list comprehension, it would have taken a much longer time calculating the sums and average (not to mention the space complexity).

Hadoop is a great example for using all the advantages of Generators.

The Answer 9

0 people think this answer is useful

Python 3.7:

List comprehensions are faster.

enter image description here

Generators are more memory efficient. enter image description here

As all others have said, if you’re looking to scale infinite data, you’ll need a generator eventually. For relatively static small and medium-sized jobs where speed is necessary, a list comprehension is best.

Add a Comment