## The Question

*461 people think this question is useful*

Having an issue filtering my result dataframe with an `or` condition. I want my result `df` to extract all column `var` values that are above 0.25 or below -0.25. The logic below gives me an ambiguous truth value; however, it works when I split this filtering into two separate operations. What is happening here? I'm not sure where to use the suggested `a.empty(), a.bool(), a.item(), a.any() or a.all()`.

result = result[(result['var']>0.25) or (result['var']<-0.25)]


## The Answer 1

*690 people think this answer is useful*

The `or` and `and` Python statements require truth values. For `pandas` these are considered ambiguous, so you should use the “bitwise” `|` (or) or `&` (and) operations:

result = result[(result['var']>0.25) | (result['var']<-0.25)]

These are overloaded for these kinds of data structures to yield the element-wise `or` (or `and`).

Just to add some more explanation to this statement:

The exception is thrown when you want to get the `bool` of a `pandas.Series`:

>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What you hit was a place where the operator **implicitly** converted the operands to `bool` (you used `or`, but it also happens for `and`, `if` and `while`):

>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Besides these 4 statements, there are several Python functions that hide some `bool` calls (like `any`, `all`, `filter`, …). These are normally not problematic with `pandas.Series`, but for completeness I wanted to mention them.

In your case the exception isn’t really helpful, because it doesn’t mention the **right alternatives**. For `and` and `or` you can use (if you want element-wise comparisons):

`numpy.logical_or`:

>>> import numpy as np
>>> np.logical_or(x, y)

or simply the `|` operator:

>>> x | y

`numpy.logical_and`:

>>> np.logical_and(x, y)

or simply the `&` operator:

>>> x & y
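Putting the four element-wise alternatives together in a minimal, self-contained sketch (`x` and `y` here are small made-up example Series, not the question's data):

```python
import numpy as np
import pandas as pd

x = pd.Series([True, False, True])
y = pd.Series([False, False, True])

# Element-wise OR: the numpy function and the overloaded operator agree.
print(np.logical_or(x, y).tolist())   # [True, False, True]
print((x | y).tolist())               # [True, False, True]

# Element-wise AND.
print(np.logical_and(x, y).tolist())  # [False, False, True]
print((x & y).tolist())               # [False, False, True]
```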

If you’re using the operators, make sure you set your parentheses correctly, because of operator precedence.
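To sketch the precedence pitfall (with a made-up Series): `|` and `&` bind tighter than comparisons, so without parentheses Python first tries to evaluate `0.25 | s` rather than the comparisons.

```python
import pandas as pd

s = pd.Series([0.5, -0.5, 0.1])

# Correct: parentheses force the comparisons to happen first.
print(s[(s > 0.25) | (s < -0.25)].tolist())  # [0.5, -0.5]

# Incorrect: `s > 0.25 | s < -0.25` parses as a chained comparison
# around `0.25 | s`, which fails for a float Series.
try:
    s[s > 0.25 | s < -0.25]
except (TypeError, ValueError) as exc:
    print('raised:', type(exc).__name__)
```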

There are several logical numpy functions which *should* work on `pandas.Series`.

The alternatives mentioned in the exception are more suited if you encountered it when doing `if` or `while`. I’ll shortly explain each of these:

If you want to check if your Series is **empty**:

>>> x = pd.Series([])
>>> x.empty
True
>>> x = pd.Series([1])
>>> x.empty
False

Python normally interprets the length of containers (like `list`, `tuple`, …) as a truth value if they have no explicit boolean interpretation. So if you want the Python-like check, you could do `if x.size` or `if not x.empty` instead of `if x`.
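A small runnable sketch of those explicit checks (the Series is made up):

```python
import pandas as pd

x = pd.Series([1, 2, 3])

# Instead of `if x:` (which raises ValueError), test emptiness explicitly.
if not x.empty:
    print('series has rows')
if x.size:
    print('size is non-zero')
```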

If your `Series` contains **one and only one** boolean value:

>>> x = pd.Series([100])
>>> (x > 50).bool()
True
>>> (x < 50).bool()
False

If you want to check the **first and only item** of your Series (like `.bool()`, but it works even for non-boolean contents):

>>> x = pd.Series([100])
>>> x.item()
100

If you want to check if **all** or **any** item is not-zero, not-empty or not-False:

>>> x = pd.Series([0, 1, 2])
>>> x.all() # because one element is zero
False
>>> x.any() # because one (or more) elements are non-zero
True

## The Answer 2

*50 people think this answer is useful*

For boolean logic, use `&` and `|`.

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))
>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

To see what is happening, you get a column of booleans for each comparison, e.g.

df.C > 0.25
0     True
1    False
2    False
3     True
4     True
Name: C, dtype: bool

When you have multiple criteria, you will get multiple columns of booleans returned. This is why the join logic is ambiguous. Using `and` or `or` treats each column as a whole, so you first need to reduce each column to a single boolean value, for example to see if any value or all values in the column are True.

# Any value in either column is True?
(df.C > 0.25).any() or (df.C < -0.25).any()
True
# All values in either column is True?
(df.C > 0.25).all() or (df.C < -0.25).all()
False

One convoluted way to achieve the same thing is to zip all of these columns together, and perform the appropriate logic.

>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

For more details, refer to Boolean Indexing in the docs.

## The Answer 3

*28 people think this answer is useful*

Pandas uses the bitwise operators `&` and `|`, and each condition should be wrapped in `()`.

For example, the following works:

data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]

But the same query without proper brackets does not:

data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]
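To sketch why the bracket-less version fails: `&` binds tighter than `>=`, so Python first computes the bitwise expression `2005 & data['year']`, and the remainder becomes a chained comparison that needs the truth value of a Series (the `data` frame here is made up for illustration):

```python
import pandas as pd

data = pd.DataFrame({'year': [2003, 2007, 2012]})

# Bracketed: two boolean Series combined element-wise.
good = data[(data['year'] >= 2005) & (data['year'] <= 2010)]
print(good['year'].tolist())  # [2007]

# Unbracketed: `2005 & data['year']` evaluates first (bitwise AND on
# integers), then the chained comparison needs bool(Series) -> ValueError.
try:
    data[(data['year'] >= 2005 & data['year'] <= 2010)]
except ValueError:
    print('raised ValueError')
```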

## The Answer 4

*10 people think this answer is useful*

Alternatively, you could use the `operator` module. More detailed information is in the Python docs:

import operator
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))
>>> df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

## The Answer 5

*3 people think this answer is useful*

This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases: using the `query` method:

result = result.query("(var > 0.25) or (var < -0.25)")

See also http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query.

(Some tests with a dataframe I’m currently working with suggest that this method is a bit slower than using the bitwise operators on series of booleans: 2 ms vs. 870 µs)
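As a quick sanity check (with made-up data reusing the random seed from the answers above), `query` with plain `or` selects the same rows as the bitwise-operator version:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
result = pd.DataFrame({'var': np.random.randn(5)})

# `query` parses the expression itself, so plain `and`/`or` are allowed.
via_query = result.query("(var > 0.25) or (var < -0.25)")
via_operators = result[(result['var'] > 0.25) | (result['var'] < -0.25)]
print(via_query.equals(via_operators))  # True
```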

**A word of warning**: at least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named `WT_38hph_IP_2`, `WT_38hph_input_2` and `log2(WT_38hph_IP_2/WT_38hph_input_2)`, and wanted to perform the following query: `"(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"`

I obtained the following exception cascade:

`KeyError: 'log2'`

`UndefinedVariableError: name 'log2' is not defined`

`ValueError: "log2" is not a supported function`

I guess this happened because the query parser was trying to make something from the first two columns instead of identifying the expression with the name of the third column.

A possible workaround is proposed here.
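One simple workaround (not necessarily the one linked above) is to rename the expression-like column to a plain identifier before calling `query`; the data below is made up to mirror the column names:

```python
import pandas as pd

# Hypothetical data mirroring the column names from the warning above.
df = pd.DataFrame({
    'WT_38hph_IP_2': [25, 10],
    'log2(WT_38hph_IP_2/WT_38hph_input_2)': [1.5, 0.2],
})

# Rename so the query parser sees a plain identifier, not `log2(...)`.
df = df.rename(columns={'log2(WT_38hph_IP_2/WT_38hph_input_2)': 'log2_ratio'})
hits = df.query("(log2_ratio > 1) and (WT_38hph_IP_2 > 20)")
print(len(hits))  # 1
```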

## The Answer 6

*1 people think this answer is useful*

I encountered the same error and was stalled with a PySpark dataframe for a few days. *I was able to resolve it by filling NA values with 0*, since I was comparing integer values from two fields.
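A sketch of that fix in pandas terms (the answer used PySpark; the column names and values here are made up): fill missing values with 0 before the comparison so no NA reaches the boolean logic.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'f1': [1, np.nan, 3], 'f2': [1, 2, np.nan]})

# Fill NA with 0 before comparing the two integer-like fields.
df = df.fillna(0)
matches = df[(df['f1'] > 0) & (df['f2'] > 0)]
print(len(matches))  # 1
```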

## The Answer 7

*1 people think this answer is useful*

In pandas you need to use the bitwise operators `|` instead of `or` and `&` instead of `and`; you can’t simply use the boolean statements from Python.

For more complex filtering, create a `mask` and apply it to the dataframe: put your whole query in the mask, then apply it.

For example:

mask = (df["col1"] >= df["col2"]) & (df["col1"] <= df["col2"])
df_new = df[mask]

## The Answer 8

*0 people think this answer is useful*

One minor thing, which wasted my time: put the conditions (when comparing using `==` or `!=`) in parentheses; failing to do so also raises this exception.

This will work:

df[(some condition) conditional-operator (some condition)]

This will not:

df[some condition conditional-operator some condition]

## The Answer 9

*0 people think this answer is useful*

I’ll try to give a benchmark of the three most common ways (also mentioned above):

from timeit import repeat

setup = """
import numpy as np
import random
x = np.linspace(0, 100)
lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
"""
stmts = 'x[(x > lb) * (x <= ub)]', 'x[(x > lb) & (x <= ub)]', 'x[np.logical_and(x > lb, x <= ub)]'

for _ in range(3):
    for stmt in stmts:
        t = min(repeat(stmt, setup, number=100_000))
        print('%.4f' % t, stmt)
    print()

result:

0.4808 x[(x > lb) * (x <= ub)]
0.4726 x[(x > lb) & (x <= ub)]
0.4904 x[np.logical_and(x > lb, x <= ub)]
0.4725 x[(x > lb) * (x <= ub)]
0.4806 x[(x > lb) & (x <= ub)]
0.5002 x[np.logical_and(x > lb, x <= ub)]
0.4781 x[(x > lb) * (x <= ub)]
0.4336 x[(x > lb) & (x <= ub)]
0.4974 x[np.logical_and(x > lb, x <= ub)]

But `*` is not supported on a pandas Series, and a NumPy array is faster than a pandas DataFrame (the DataFrame version is around 1000 times slower; note that `number` drops from 100_000 to 100 below):

from timeit import repeat

setup = """
import numpy as np
import random
import pandas as pd
x = pd.DataFrame(np.linspace(0, 100))
lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
"""
stmts = 'x[(x > lb) & (x <= ub)]', 'x[np.logical_and(x > lb, x <= ub)]'

for _ in range(3):
    for stmt in stmts:
        t = min(repeat(stmt, setup, number=100))
        print('%.4f' % t, stmt)
    print()

result:

0.1964 x[(x > lb) & (x <= ub)]
0.1992 x[np.logical_and(x > lb, x <= ub)]
0.2018 x[(x > lb) & (x <= ub)]
0.1838 x[np.logical_and(x > lb, x <= ub)]
0.1871 x[(x > lb) & (x <= ub)]
0.1883 x[np.logical_and(x > lb, x <= ub)]

Note: adding the single line `x = x.to_numpy()` costs about 20 µs.

For those who prefer `%timeit`:

import numpy as np
import random
import pandas as pd

lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
x = pd.DataFrame(np.linspace(0, 100))

def asterik(x):
    x = x.to_numpy()
    return x[(x > lb) * (x <= ub)]

def and_symbol(x):
    x = x.to_numpy()
    return x[(x > lb) & (x <= ub)]

def numpy_logical(x):
    x = x.to_numpy()
    return x[np.logical_and(x > lb, x <= ub)]

for i in range(3):
    %timeit asterik(x)
    %timeit and_symbol(x)
    %timeit numpy_logical(x)
    print('\n')

result:

23 µs ± 3.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
35.6 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
31.3 µs ± 8.9 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
21.4 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
21.9 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
21.7 µs ± 500 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
25.1 µs ± 3.71 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
36.8 µs ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
28.2 µs ± 5.97 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)