## The Question

*525 people think this question is useful*

How can I achieve the equivalents of SQL’s `IN` and `NOT IN`?

I have a list with the required values.
Here’s the scenario:

```python
df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
countries_to_keep = ['UK', 'China']
# pseudo-code:
df[df['country'] not in countries_to_keep]
```

My current way of doing this is as follows:

```python
import pandas as pd

df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
df2 = pd.DataFrame({'country': ['UK', 'China'], 'matched': True})

# IN
df.merge(df2, how='inner', on='country')

# NOT IN
not_in = df.merge(df2, how='left', on='country')
not_in = not_in[pd.isnull(not_in['matched'])]
```

But this seems like a horrible kludge. Can anyone improve on it?
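For what it’s worth, the merge-based approach can be tightened slightly with `merge`’s `indicator=True` option, which avoids the dummy `matched` column entirely (a sketch, not a full answer):

```python
import pandas as pd

df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
df2 = pd.DataFrame({'country': ['UK', 'China']})

# indicator=True adds a '_merge' column recording where each row came from
merged = df.merge(df2, how='left', on='country', indicator=True)

# NOT IN: keep only rows that appeared in the left frame alone
not_in = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(not_in['country'].tolist())  # ['US', 'Germany']
```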


## The Answer 1

*968 people think this answer is useful*

You can use `pd.Series.isin`.

For “IN” use: `something.isin(somewhere)`

For “NOT IN” use: `~something.isin(somewhere)`

As a worked example:

```python
>>> import pandas as pd
>>> df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
>>> countries_to_keep = ['UK', 'China']
>>> df
   country
0       US
1       UK
2  Germany
3    China
>>> df.country.isin(countries_to_keep)
0    False
1     True
2    False
3     True
Name: country, dtype: bool
>>> df[df.country.isin(countries_to_keep)]
  country
1      UK
3   China
>>> df[~df.country.isin(countries_to_keep)]
   country
0       US
2  Germany
```

## The Answer 2

*84 people think this answer is useful*

Alternative solution that uses the `.query()` method:

```python
In [5]: df.query("countries in @countries")
Out[5]:
  countries
1        UK
3     China

In [6]: df.query("countries not in @countries")
Out[6]:
  countries
0        US
2   Germany
```
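For a self-contained version of the above (note this answer names the column `countries`, unlike the question’s `country`; the `@` prefix tells `query` to look up a local Python variable rather than a column):

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# `@countries` refers to the local list, not the column of the same name
kept = df.query("countries in @countries")
dropped = df.query("countries not in @countries")
print(kept['countries'].tolist())     # ['UK', 'China']
print(dropped['countries'].tolist())  # ['US', 'Germany']
```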

## The Answer 3

*53 people think this answer is useful*

## How to implement ‘in’ and ‘not in’ for a pandas DataFrame?

Pandas offers two methods: `Series.isin` and `DataFrame.isin` for Series and DataFrames, respectively.

## Filter DataFrame Based on ONE Column (also applies to Series)

The most common scenario is applying an `isin` condition on a specific column to filter rows in a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', np.nan, 'China']})
df
  countries
0        US
1        UK
2   Germany
3       NaN
4     China

c1 = ['UK', 'China']             # list
c2 = {'Germany'}                 # set
c3 = pd.Series(['China', 'US'])  # Series
c4 = np.array(['US', 'UK'])      # array
```

`Series.isin` accepts various types as inputs. The following are all valid ways of getting what you want:

```python
df['countries'].isin(c1)
0    False
1     True
2    False
3    False
4     True
Name: countries, dtype: bool

# `in` operation
df[df['countries'].isin(c1)]
  countries
1        UK
4     China

# `not in` operation
df[~df['countries'].isin(c1)]
  countries
0        US
2   Germany
3       NaN
```

```python
# Filter with `set` (tuples work too)
df[df['countries'].isin(c2)]
  countries
2   Germany

# Filter with another Series
df[df['countries'].isin(c3)]
  countries
0        US
4     China

# Filter with array
df[df['countries'].isin(c4)]
  countries
0        US
1        UK
```

## Filter on MANY Columns

Sometimes, you will want to apply an ‘in’ membership check with some search terms over multiple columns:

```python
df2 = pd.DataFrame({
    'A': ['x', 'y', 'z', 'q'], 'B': ['w', 'a', np.nan, 'x'], 'C': np.arange(4)})
df2
   A    B  C
0  x    w  0
1  y    a  1
2  z  NaN  2
3  q    x  3

c1 = ['x', 'w', 'p']
```

To apply the `isin` condition to both columns “A” and “B”, use `DataFrame.isin`:

```python
df2[['A', 'B']].isin(c1)
       A      B
0   True   True
1  False  False
2  False  False
3  False   True
```

From this, **to retain rows where at least one column is `True`**, we can use `any` along the first axis:

```python
df2[['A', 'B']].isin(c1).any(axis=1)
0     True
1    False
2    False
3     True
dtype: bool

df2[df2[['A', 'B']].isin(c1).any(axis=1)]
   A  B  C
0  x  w  0
3  q  x  3
```

Note that if you want to search every column, you’d just omit the column selection step and do

```python
df2.isin(c1).any(axis=1)
```

Similarly, **to retain rows where ALL columns are `True`**, use `all` in the same manner as before.

```python
df2[df2[['A', 'B']].isin(c1).all(axis=1)]
   A  B  C
0  x  w  0
```
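As a related option not shown above, `DataFrame.isin` also accepts a dict mapping column names to their own sets of search terms, for when each column needs different values (a sketch using the same `df2`):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'A': ['x', 'y', 'z', 'q'], 'B': ['w', 'a', np.nan, 'x'], 'C': np.arange(4)})

# With a dict, each column is checked only against its own list
mask = df2[['A', 'B']].isin({'A': ['x', 'q'], 'B': ['w']})
print(mask['A'].tolist())  # [True, False, False, True]
print(mask['B'].tolist())  # [True, False, False, False]
```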

## Notable Mentions: `numpy.isin`, `query`, list comprehensions (string data)

In addition to the methods described above, you can also use the numpy equivalent: `numpy.isin`.

```python
# `in` operation
df[np.isin(df['countries'], c1)]
  countries
1        UK
4     China

# `not in` operation
df[np.isin(df['countries'], c1, invert=True)]
  countries
0        US
2   Germany
3       NaN
```

Why is it worth considering? NumPy functions are usually a bit faster than their pandas equivalents because of lower overhead. Since this is an elementwise operation that does not depend on index alignment, there are very few situations where this method is not an appropriate replacement for pandas’ `isin`.

Pandas routines are usually iterative when working with strings, because string operations are hard to vectorise. There is a lot of evidence to suggest that list comprehensions will be faster here, so we resort to an `in` check instead.

```python
c1_set = set(c1)  # Using `in` with sets is a constant-time operation...
# This doesn't matter for pandas because the implementation differs.

# `in` operation
df[[x in c1_set for x in df['countries']]]
  countries
1        UK
4     China

# `not in` operation
df[[x not in c1_set for x in df['countries']]]
  countries
0        US
2   Germany
3       NaN
```

It is a lot more unwieldy to specify, however, so don’t use it unless you know what you’re doing.

Lastly, there’s also `DataFrame.query`, which has been covered in this answer. numexpr FTW!
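As a quick illustration, `query` also covers the multi-column case from earlier (a sketch using the same `df2` and `c1` as above):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'A': ['x', 'y', 'z', 'q'], 'B': ['w', 'a', np.nan, 'x'], 'C': np.arange(4)})
c1 = ['x', 'w', 'p']

# 'in' over either column, mirroring the any(axis=1) example
result = df2.query("A in @c1 or B in @c1")
print(result['A'].tolist())  # ['x', 'q']
```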

## The Answer 4

*15 people think this answer is useful*

I usually do generic filtering over rows like this:

```python
criterion = lambda row: row['countries'] not in countries
not_in = df[df.apply(criterion, axis=1)]
```
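This works, but row-wise `apply` is slow on large frames; for this particular criterion, the vectorised `isin` from the accepted answer gives the same result (a sketch, assuming a `countries` column and list as in the other answers):

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# Row-wise apply, as above
criterion = lambda row: row['countries'] not in countries
not_in_apply = df[df.apply(criterion, axis=1)]

# Vectorised equivalent
not_in_isin = df[~df['countries'].isin(countries)]

print(not_in_apply.equals(not_in_isin))  # True
```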

## The Answer 5

*10 people think this answer is useful*

Collating possible solutions from the answers:

For IN: `df[df['A'].isin([3, 6])]`

For NOT IN:

`df[-df["A"].isin([3, 6])]` (note: unary `-` on a boolean Series raises a `TypeError` with modern NumPy; prefer `~`)

`df[~df["A"].isin([3, 6])]`

`df[df["A"].isin([3, 6]) == False]`

`df[np.logical_not(df["A"].isin([3, 6]))]`
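A quick sanity check that the `~`, `== False`, and `np.logical_not` spellings select the same rows (a sketch with a hypothetical column `A`):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 6]})

r1 = df[~df["A"].isin([3, 6])]
r2 = df[df["A"].isin([3, 6]) == False]
r3 = df[np.logical_not(df["A"].isin([3, 6]))]

print(r1.equals(r2) and r2.equals(r3))  # True
print(r1['A'].tolist())                 # [1, 5]
```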

## The Answer 6

*8 people think this answer is useful*

I wanted to filter out `dfbc` rows that had a `BUSINESS_ID` that was also in the `BUSINESS_ID` of `dfProfilesBusIds`:

```python
dfbc = dfbc[~dfbc['BUSINESS_ID'].isin(dfProfilesBusIds['BUSINESS_ID'])]
```

## The Answer 7

*4 people think this answer is useful*

```python
df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']
```

**implement in**:

```python
df[df.countries.isin(countries)]
```

**implement not in**, as `in` of the remaining countries:

```python
df[df.countries.isin([x for x in np.unique(df.countries) if x not in countries])]
```

## The Answer 8

*0 people think this answer is useful*

A trick if you want to keep the order of the list:

```python
df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
countries_to_keep = ['Germany', 'US']

ind = [df.index[df['country'] == i].tolist() for i in countries_to_keep]
flat_ind = [item for sublist in ind for item in sublist]
df.reindex(flat_ind)
   country
2  Germany
0       US
```
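An alternative sketch that also preserves the list order, assuming every value in `countries_to_keep` occurs exactly once in the column:

```python
import pandas as pd

df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
countries_to_keep = ['Germany', 'US']

# .loc on an indexed column returns rows in the order the labels are listed
result = df.set_index('country').loc[countries_to_keep].reset_index()
print(result['country'].tolist())  # ['Germany', 'US']
```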