# python – Remove all the elements that occur in one list from another

## The Question :

403 people think this question is useful

Let’s say I have two lists, l1 and l2. I want to perform l1 - l2, which returns all elements of l1 not in l2.

I can think of a naive loop approach to doing this, but that is going to be really inefficient. What is a pythonic and efficient way of doing this?

As an example, if I have l1 = [1,2,6,8] and l2 = [2,3,5,8], l1 - l2 should return [1,6]

• Just a tip: PEP8 states that lowercase “L” should not be used because it looks too much like a 1.
• I agree. I read this whole question and the answers wondering why people kept using eleven and twelve. It was only when I read @spelchekr’s comment that it made sense.
• Possible duplicate of dropping rows from dataframe based on a “not in” condition
• @JimG. A dataframe and a list are not the same thing.

546 people think this answer is useful

Python has a language feature called List Comprehensions that is perfectly suited to making this sort of thing extremely easy. The following statement does exactly what you want and stores the result in l3:

l3 = [x for x in l1 if x not in l2]



l3 will contain [1, 6].
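As a small illustration (the sample values below are made up, not taken from the question), the comprehension keeps the order of l1 and any repeated elements. It also works for unhashable items such as nested lists, because it only relies on equality checks. Each `x not in l2` test scans l2, so the cost grows roughly with len(l1) * len(l2).

```python
# Illustrative values only; not taken from the question.
l1 = [3, 1, 3, 2, 6, 8, 6]
l2 = [2, 3, 5, 8]

# Order and duplicates of l1 survive; each membership test scans l2.
l3 = [x for x in l1 if x not in l2]
print(l3)  # [1, 6, 6]

# Unhashable elements are fine, since only == comparisons are needed.
rows = [[1, 2], [3, 4], [5, 6]]
drop = [[3, 4]]
kept = [r for r in rows if r not in drop]
print(kept)  # [[1, 2], [5, 6]]
```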

169 people think this answer is useful

One way is to use sets:

>>> set([1,2,6,8]) - set([2,3,5,8])
{1, 6}



Note, however, that sets do not preserve the order of elements and discard duplicates. The elements also need to be hashable. If these restrictions are tolerable, this is often the simplest and fastest option.
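A short sketch of those restrictions, using made-up sample values: the duplicated element collapses to a single entry, and trying to build a set from unhashable elements (lists here) raises a TypeError. The flag name `unhashable_failed` is only there for the demonstration.

```python
# Illustrative values only; note the duplicated 6 in l1.
l1 = [6, 1, 6, 8, 2]
l2 = [2, 3, 5, 8]

diff = set(l1) - set(l2)
print(diff == {1, 6})  # True: only one 6 remains, and order is not guaranteed

# Unhashable elements (such as lists) cannot be placed in a set at all.
nested = [[1, 2], [3, 4]]
unhashable_failed = False
try:
    set(nested)
except TypeError as exc:
    unhashable_failed = True
    print("TypeError:", exc)
```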

44 people think this answer is useful

As an alternative, you can use filter with a lambda expression to get the desired result. For example:

>>> l1 = [1,2,6,8]
>>> l2 = set([2,3,5,8])

#     v  filter returns an iterator object. Here I'm converting
#     v  it to a list in order to display the resulting value
>>> list(filter(lambda x: x not in l2, l1))
[1, 6]



Performance Comparison

This compares the performance of all the answers given here. As expected, Arkku’s set-based operation is the fastest.

PS: a set does not maintain order and removes duplicate elements from the list. Hence, do not use set difference if you need to keep either of these.
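For anyone who wants to reproduce a comparison like this locally, here is a rough sketch using the standard timeit module. The list sizes, the repeat count, and the function names are my own assumptions, not the setup used for the comparison above, and the timings will vary by machine.

```python
import timeit

# Assumed sizes for illustration; adjust to match your own data.
l1 = list(range(3000))
l2 = list(range(2000))
l2_set = set(l2)

def list_lookup():
    # Membership test against a list: scans l2 for every element of l1.
    return [x for x in l1 if x not in l2]

def set_lookup():
    # Membership test against a set: keeps order and duplicates of l1.
    return [x for x in l1 if x not in l2_set]

def filter_lookup():
    # Same idea expressed with filter and a lambda.
    return list(filter(lambda x: x not in l2_set, l1))

def set_difference():
    # Plain set difference: loses order and duplicates.
    return set(l1) - set(l2)

for fn in (list_lookup, set_lookup, filter_lookup, set_difference):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__:15s} {seconds:.4f} s for 3 runs")
```

Only the last variant returns a set; the other three return a list in the original order of l1.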

32 people think this answer is useful

Expanding on Donut’s answer and the other answers here, you can get even better results by using a generator expression instead of a list comprehension, and by using a set data structure (since the in operator is O(n) on a list but O(1) on average on a set).

So here’s a function that would work for you:

def filter_list(full_list, excludes):
    # Build the set once so each membership test is a fast hash lookup
    s = set(excludes)
    return (x for x in full_list if x not in s)



The result will be an iterable that lazily yields the filtered elements. If you need a real list object (e.g. if you need to call len() on the result), then you can easily build a list like so:

filtered_list = list(filter_list(full_list, excludes))
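To make the laziness concrete, here is a small sketch that reuses filter_list from above. The inputs are my own examples: because nothing is computed until it is requested, the function can even be fed an endless source such as itertools.count. It also shows that the returned generator can only be consumed once.

```python
from itertools import count, islice

def filter_list(full_list, excludes):
    # Build the set once so each membership test is a fast hash lookup
    s = set(excludes)
    return (x for x in full_list if x not in s)

# An endless input works, because elements are produced on demand.
lazy = filter_list(count(1), [2, 4, 6])
first_five = list(islice(lazy, 5))
print(first_five)  # [1, 3, 5, 7, 8]

# A generator is exhausted after one pass.
gen = filter_list([1, 2, 6, 8], [2, 3, 5, 8])
first_pass = list(gen)
second_pass = list(gen)
print(first_pass, second_pass)  # [1, 6] []
```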



29 people think this answer is useful

Use the Python set type. That would be the most Pythonic. 🙂

Also, since it’s native, it should be the most optimized method too.

See:

http://docs.python.org/library/stdtypes.html#set

http://docs.python.org/library/sets.htm (for older python)

# Using Python 2.7 set literal format.
# Otherwise, use: l1 = set([1,2,6,8])
#
l1 = {1,2,6,8}
l2 = {2,3,5,8}
l3 = l1 - l2
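If a list is needed at the end, the set result can be converted back. A minimal sketch, reusing the question's values: set.difference() accepts any iterable, so only one side has to be turned into a set, and sorted() gives a list in ascending order (which is not necessarily the original order of l1).

```python
l1 = [1, 2, 6, 8]
l2 = [2, 3, 5, 8]

# difference() takes any iterable, so l2 can stay a list.
diff = set(l1).difference(l2)

# sorted() returns a new list in ascending order.
l3 = sorted(diff)
print(l3)  # [1, 6]
```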



12 people think this answer is useful

Use a set comprehension {x for x in l2} or set(l2) to build a set, then use a list comprehension to build the resulting list:

l2set = set(l2)
l3 = [x for x in l1 if x not in l2set]



benchmark test code:

import time

l1 = list(range(1000*10 * 3))
l2 = list(range(1000*10 * 2))

l2set = {x for x in l2}

tic = time.time()
l3 = [x for x in l1 if x not in l2set]
toc = time.time()
diffset = toc-tic
print(diffset)

tic = time.time()
l3 = [x for x in l1 if x not in l2]
toc = time.time()
difflist = toc-tic
print(difflist)

print("speedup %fx"%(difflist/diffset))



benchmark test result:

0.0015058517456054688
3.968189239501953
speedup 2635.179227x



7 people think this answer is useful

Alternate Solution :

from functools import reduce  # reduce is no longer a builtin on Python 3
list(reduce(lambda x, y: filter(lambda z: z != y, x), [2, 3, 5, 8], [1, 2, 6, 8]))



0 people think this answer is useful

# Sets versus list comprehension benchmark on Python 3.8

tl;dr: Use Arkku’s set solution; it’s even faster than the comparison suggested!

## Checking existing files against a list

In my example I found it to be 40 times (!) faster to use Arkku’s set solution than the pythonic list comprehension for a real world application of checking existing filenames against a list.

### List comprehension:

%%time
import glob
import os
existing = [int(os.path.basename(x).split(".")[0]) for x in glob.glob("*.txt")]
wanted = list(range(1, 100000))
[i for i in wanted if i not in existing]



Wall time: 28.2 s

### Sets

%%time
import glob
import os
existing = [int(os.path.basename(x).split(".")[0]) for x in glob.glob("*.txt")]
wanted = list(range(1, 100000))
set(wanted) - set(existing)



Wall time: 689 ms
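Note that the set difference above returns an unordered set. If the wanted IDs should come back in their original order, one option is to turn only the existing IDs into a set and keep the list comprehension. The sketch below is self-contained and uses a temporary directory with a few made-up files; the helper name `missing_ids` and the file names are hypothetical, not part of the measurements above.

```python
import os
import tempfile

def missing_ids(directory, wanted):
    """Return the wanted IDs that have no '<id>.txt' file, in the order given."""
    existing = set()
    for name in os.listdir(directory):
        stem, ext = os.path.splitext(name)
        if ext == ".txt" and stem.isdigit():
            existing.add(int(stem))
    # Set lookup keeps this fast; iterating over wanted keeps its order.
    return [i for i in wanted if i not in existing]

with tempfile.TemporaryDirectory() as tmp:
    # Create a few sample files: 1.txt, 3.txt, 4.txt
    for i in (1, 3, 4):
        open(os.path.join(tmp, f"{i}.txt"), "w").close()
    result = missing_ids(tmp, range(1, 7))
    print(result)  # [2, 5, 6]
```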
