## The Question :

*529 people think this question is useful*

In Python Pandas, what’s the best way to check whether a DataFrame has one (or more) NaN values?

I know about the function `pd.isnull`, but this returns a DataFrame of booleans for each element. The related post I found doesn't exactly answer my question either.


## The Answer 1

*639 people think this answer is useful*

jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience summing flat arrays is (strangely) faster than counting. This code seems faster:

```python
df.isnull().values.any()
```

```python
import numpy as np
import pandas as pd
import perfplot


def setup(n):
    df = pd.DataFrame(np.random.randn(n))
    df[df > 0.9] = np.nan
    return df


def isnull_any(df):
    return df.isnull().any()


def isnull_values_sum(df):
    return df.isnull().values.sum() > 0


def isnull_sum(df):
    return df.isnull().sum() > 0


def isnull_values_any(df):
    return df.isnull().values.any()


perfplot.save(
    "out.png",
    setup=setup,
    kernels=[isnull_any, isnull_values_sum, isnull_sum, isnull_values_any],
    n_range=[2 ** k for k in range(25)],
)
```

`df.isnull().sum().sum()` is a bit slower but, of course, carries additional information: the number of `NaNs`.
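If perfplot isn't available, a rough comparison of the same kernels can be sketched with the standard library's `timeit`. This is a minimal sketch under assumptions that are not from the benchmark above: a single 200,000 × 3 frame with roughly 18% of values set to NaN, and best-of-5 timing. It first checks that the scalar variants agree; no timings are claimed here, so run it on your own machine.

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary test data (assumption): 200k x 3 floats, ~18% set to NaN
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((200_000, 3)))
df[df > 0.9] = np.nan

# Each kernel reduces the whole frame to a single boolean
kernels = {
    "values.any()": lambda: df.isnull().values.any(),
    "values.sum() > 0": lambda: df.isnull().values.sum() > 0,
    "any().any()": lambda: df.isnull().any().any(),
    "sum().sum() > 0": lambda: df.isnull().sum().sum() > 0,
}

# Sanity check: all variants must give the same answer
results = {name: bool(fn()) for name, fn in kernels.items()}
assert len(set(results.values())) == 1

for name, fn in kernels.items():
    # Best of 5 repeats of 3 calls each, to reduce noise
    best = min(timeit.repeat(fn, number=3, repeat=5)) / 3
    print(f"{name:>18}: {best * 1e3:.2f} ms per call")
```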

## The Answer 2

*188 people think this answer is useful*

You have a couple of options.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 6))
# Make a few areas have NaN values
df.iloc[1:3, 1] = np.nan
df.iloc[5, 3] = np.nan
df.iloc[7:9, 5] = np.nan
```

Now the data frame looks something like this:

```
          0         1         2         3         4         5
0  0.520113  0.884000  1.260966 -0.236597  0.312972 -0.196281
1 -0.837552       NaN  0.143017  0.862355  0.346550  0.842952
2 -0.452595       NaN -0.420790  0.456215  1.203459  0.527425
3  0.317503 -0.917042  1.780938 -1.584102  0.432745  0.389797
4 -0.722852  1.704820 -0.113821 -1.466458  0.083002  0.011722
5 -0.622851 -0.251935 -1.498837       NaN  1.098323  0.273814
6  0.329585  0.075312 -0.690209 -3.807924  0.489317 -0.841368
7 -1.123433 -1.187496  1.868894 -2.046456 -0.949718       NaN
8  1.133880 -0.110447  0.050385 -1.158387  0.188222       NaN
9 -0.513741  1.196259  0.704537  0.982395 -0.585040 -1.693810
```

**Option 1**: `df.isnull().any().any()` returns a boolean value.

You already know `isnull()`, which returns a DataFrame like this:

```
       0      1      2      3      4      5
0  False  False  False  False  False  False
1  False   True  False  False  False  False
2  False   True  False  False  False  False
3  False  False  False  False  False  False
4  False  False  False  False  False  False
5  False  False  False   True  False  False
6  False  False  False  False  False  False
7  False  False  False  False  False   True
8  False  False  False  False  False   True
9  False  False  False  False  False  False
```

If you make it `df.isnull().any()`, you can find just the columns that have `NaN` values:

```
0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool
```

One more `.any()` will tell you if any of the above are `True`:

```python
>>> df.isnull().any().any()
True
```

**Option 2**: `df.isnull().sum().sum()` returns an integer: the total number of `NaN` values.

This works the same way as `.any().any()`: it first counts the `NaN` values in each column, then sums those per-column counts:

```python
>>> df.isnull().sum()
0    0
1    2
2    0
3    1
4    0
5    2
dtype: int64
```

Finally, to get the total number of NaN values in the DataFrame:

```python
>>> df.isnull().sum().sum()
5
```
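The same boolean mask can also tell you *where* the NaNs are. A minimal sketch using `np.argwhere` on the underlying array, rebuilding the frame from above (the random values don't matter, only the NaN positions do). These are positional indices; they coincide with the labels here only because the frame has a default integer index and integer column labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 6))
df.iloc[1:3, 1] = np.nan
df.iloc[5, 3] = np.nan
df.iloc[7:9, 5] = np.nan

# (row, column) positions of every NaN, in row-major order
positions = np.argwhere(df.isnull().to_numpy())
print(positions.tolist())
# [[1, 1], [2, 1], [5, 3], [7, 5], [8, 5]]
```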

## The Answer 3

*69 people think this answer is useful*

To find out which rows have NaNs in a specific column:

```python
nan_rows = df[df['name column'].isnull()]
```
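A small worked example, with a hypothetical `score` column standing in for `'name column'` and made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data; "score" plays the role of 'name column'
df = pd.DataFrame({"name": ["a", "b", "c", "d"],
                   "score": [1.0, np.nan, 3.0, np.nan]})

# Boolean mask on one column selects the rows where that column is NaN
nan_rows = df[df["score"].isnull()]
print(nan_rows.index.tolist())  # [1, 3]

# The complement (rows where the column is present) uses notnull()
ok_rows = df[df["score"].notnull()]
```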

## The Answer 4

*54 people think this answer is useful*

If you need to know how many rows there are with “one or more `NaN`s”:

```python
df.isnull().T.any().T.sum()
```

Or if you need to pull out these rows and examine them:

```python
nan_rows = df[df.isnull().T.any()]
```
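Why the transposes work: `df.isnull().T.any()` reduces over the columns of the original frame, giving one boolean per row, and the trailing `.T` on the resulting Series is a no-op (a Series' transpose is itself). A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, np.nan, 3, np.nan],
                   "B": [np.nan, 5, 6, np.nan],
                   "C": [7, 8, 9, 10]})  # made-up data

row_has_nan = df.isnull().T.any()     # one boolean per row label
n_rows_with_nan = row_has_nan.sum()   # rows with at least one NaN
print(n_rows_with_nan)                # 3

nan_rows = df[row_has_nan]            # rows 0, 1 and 3
```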

## The Answer 5

*41 people think this answer is useful*

`df.isnull().any().any()` should do it.

## The Answer 6

*20 people think this answer is useful*

Adding to Hobs's brilliant answer (I am very new to Python and Pandas, so please point out if I am wrong):

To find out which rows have NaNs:

```python
nan_rows = df[df.isnull().any(axis=1)]
```

This performs the same operation without the need for transposing: specifying `axis=1` makes `any()` check whether `True` is present in each row.
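A quick check, on made-up data, that the `axis=1` form selects the same rows as the transpose-based version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, np.nan, 3], "B": [4, 5, np.nan]})  # made-up data

via_axis = df[df.isnull().any(axis=1)]   # reduce across columns, per row
via_transpose = df[df.isnull().T.any()]  # same reduction, via transpose

print(via_axis.equals(via_transpose))    # True
print(via_axis.index.tolist())           # [1, 2]
```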

## The Answer 7

*17 people think this answer is useful*

# Super Simple Syntax: `df.isna().any(axis=None)`

Starting from v0.23.2, you can use `DataFrame.isna` + `DataFrame.any(axis=None)`, where `axis=None` specifies logical reduction over the entire DataFrame.

```python
import numpy as np
import pandas as pd

# Setup
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 4, 5]})
df
     A    B
0  1.0  NaN
1  2.0  4.0
2  NaN  5.0

df.isna()
       A      B
0  False   True
1  False  False
2   True  False

df.isna().any(axis=None)
# True
```

# Useful Alternatives

`numpy.isnan`

Another performant option if you're running older versions of pandas.

```python
np.isnan(df.values)
array([[False,  True],
       [False, False],
       [ True, False]])

np.isnan(df.values).any()
# True
```

Alternatively, check the sum:

```python
np.isnan(df.values).sum()
# 2
np.isnan(df.values).sum() > 0
# True
```
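One caveat with `np.isnan`: it only accepts numeric input, so `np.isnan(df.values)` raises `TypeError` once the frame has an object (e.g. string) column. A minimal sketch of two workarounds on a made-up frame: restrict the check to numeric columns, or use `pd.isna` on the array, which also handles object arrays and treats `None` as missing.

```python
import numpy as np
import pandas as pd

# Made-up frame: the numeric column is complete, the string column has a None
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "C": ["x", None, "z"]})

try:
    np.isnan(df.values)  # object array: not supported by np.isnan
except TypeError:
    print("np.isnan cannot handle the object column")

# Workaround 1: only look at numeric columns (misses the None in 'C')
numeric_has_nan = np.isnan(df.select_dtypes(include="number").to_numpy()).any()

# Workaround 2: pd.isna works on object arrays as well
any_missing = pd.isna(df.to_numpy()).any()

print(numeric_has_nan, any_missing)  # False True
```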

`Series.hasnans`

You can also check `Series.hasnans` one column at a time. For example, to check if a single column has NaNs:

```python
df['A'].hasnans
# True
```

And to check if *any* column has NaNs, you can use a generator expression with `any` (which short-circuits):

```python
any(df[c].hasnans for c in df)
# True
```

This is actually *very* fast.
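The same short-circuiting idea can also report *which* column is the first to contain a NaN. A small sketch on the setup frame (re-created so it runs on its own); `next` with a default of `None` covers the case where no column has NaNs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 4, 5]})

# Stops at the first column whose hasnans property is True
first_nan_col = next((c for c in df if df[c].hasnans), None)
print(first_nan_col)  # A

# Agrees with the whole-frame check
print(any(df[c].hasnans for c in df) == df.isna().any(axis=None))  # True
```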

## The Answer 8

*10 people think this answer is useful*

Since no one has mentioned it, there is also an attribute called `hasnans`.

`df[i].hasnans` evaluates to `True` if one or more of the values in the pandas Series is NaN, and to `False` if not. Note that it is a property, not a function, so it is used without parentheses.

(pandas versions 0.19.2 and 0.20.2)

## The Answer 9

*10 people think this answer is useful*

Let `df` be the name of the pandas DataFrame, and let any value that is `numpy.nan` be a null value.

If you want to see which columns have nulls and which do not (just True and False):

```python
df.isnull().any()
```

If you want to see only the columns that have nulls:

```python
df.loc[:, df.isnull().any()].columns
```

If you want to see the count of nulls in every column:

```python
df.isna().sum()
```

If you want to see the percentage of nulls in every column:

```python
df.isna().sum() / len(df) * 100
```

If you want to see the percentage of nulls only in the columns that have nulls:

```python
df.loc[:, df.isnull().any()].isnull().sum() / len(df) * 100
```
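The count and percentage steps can be combined into one summary table. A minimal sketch on made-up data; it relies on the mean of a boolean column being the fraction of `True` values, so `isna().mean() * 100` equals `isna().sum() / len(df) * 100`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, np.nan, 3, 4],
                   "B": [np.nan, np.nan, 7, 8],
                   "C": [9, 10, 11, 12]})  # made-up data

summary = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "pct_missing": df.isna().mean() * 100,
})

# Keep only the columns that actually have nulls, worst first
summary = summary[summary["n_missing"] > 0].sort_values("pct_missing", ascending=False)
print(summary)
```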

EDIT 1:
If you want to see where your data is missing visually:

```python
import missingno

missingdata_df = df.columns[df.isnull().any()].tolist()
missingno.matrix(df[missingdata_df])
```

## The Answer 10

*7 people think this answer is useful*

Since `pandas` has to find this out for `DataFrame.dropna()`, I took a look to see how they implement it and discovered that they made use of `DataFrame.count()`, which counts all non-null values in the `DataFrame`. Cf. the pandas source code. I haven't benchmarked this technique, but I figure the authors of the library are likely to have made a wise choice for how to do it.
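The `count()` idea in practice: `df.count()` gives the number of non-null values per column, so any column whose count is below `len(df)` contains at least one NaN. A minimal sketch on made-up data (no claim here about how it compares in speed with the other answers):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, 5, 6]})  # made-up data

non_null = df.count()                 # non-null values per column
has_nan_per_col = non_null < len(df)  # True where a column is missing values
has_any_nan = bool(has_nan_per_col.any())
n_nan_total = int(df.size - non_null.sum())

print(has_any_nan, n_nan_total)  # True 1
```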

## The Answer 11

*6 people think this answer is useful*

```python
df.isnull().sum()
```

This gives you the count of NaN values in each column of the DataFrame.

## The Answer 12

*4 people think this answer is useful*

I've been using the following, which casts the value to a string and checks for `'nan'`:

```python
str(df.at[index, 'column']) == 'nan'
```

This allows me to check a specific value in a series, rather than just whether a NaN is contained somewhere within the series.
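A sketch of a sturdier scalar check, assuming the same `df.at[index, 'column']` access on made-up data: `pd.isna` returns a plain boolean for a single value and also recognizes `None`, `pd.NaT`, and `pd.NA` (pandas 1.0+), which the string comparison misses because they don't stringify to `'nan'`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column": [1.0, np.nan, 3.0]})  # made-up data

# Scalar check on a single cell; pd.isna returns a bool here
print(pd.isna(df.at[1, "column"]))  # True
print(pd.isna(df.at[0, "column"]))  # False

# Missing-value markers that the str(...) == 'nan' trick does not catch
for value in (None, pd.NaT, pd.NA):
    print(repr(str(value)), pd.isna(value))
# 'None' True
# 'NaT' True
# '<NA>' True
```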

## The Answer 13

*3 people think this answer is useful*

Just use `math.isnan(x)`, which returns `True` if `x` is a NaN (not a number), and `False` otherwise.
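`math.isnan` works on one value at a time, so for a pandas object you apply it per element. A minimal sketch on a made-up float Series; note that `math.isnan(None)` raises `TypeError`, so this approach only suits purely numeric data.

```python
import math

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])  # made-up data

print(math.isnan(s[1]))               # True, single scalar
print(any(math.isnan(x) for x in s))  # True, stops at the first NaN

try:
    math.isnan(None)
except TypeError:
    print("math.isnan needs a number")
```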

## The Answer 14

*3 people think this answer is useful*

Here is another interesting way of finding nulls and replacing them with a calculated value:

```python
>>> import numpy as np
>>> import pandas as pd
>>> # Creating the DataFrame
>>> testdf = pd.DataFrame({'Monthly': [10, 20, 30, 40, 50],
...                        'Tenure': [1, 2, 3, 4, 5],
...                        'Yearly': [10, 40, np.nan, np.nan, 250]})
>>> testdf
   Monthly  Tenure  Yearly
0       10       1    10.0
1       20       2    40.0
2       30       3     NaN
3       40       4     NaN
4       50       5   250.0
>>> # Identifying the rows with empty columns
>>> nan_rows = testdf[testdf['Yearly'].isnull()]
>>> nan_rows
   Monthly  Tenure  Yearly
2       30       3     NaN
3       40       4     NaN
>>> # Getting the row labels into a list
>>> index = list(nan_rows.index)
>>> index
[2, 3]
>>> # Replacing null values with a calculated value (.loc avoids chained assignment)
>>> for i in index:
...     testdf.loc[i, 'Yearly'] = testdf.loc[i, 'Monthly'] * testdf.loc[i, 'Tenure']
...
>>> testdf
   Monthly  Tenure  Yearly
0       10       1    10.0
1       20       2    40.0
2       30       3    90.0
3       40       4   160.0
4       50       5   250.0
```
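The same fill can be done without the loop. A sketch assuming the same `testdf` columns: `Series.fillna` accepts a Series and fills each missing entry from the value with the same index label, so only the NaN rows receive the computed `Monthly * Tenure` value.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({'Monthly': [10, 20, 30, 40, 50],
                       'Tenure': [1, 2, 3, 4, 5],
                       'Yearly': [10, 40, np.nan, np.nan, 250]})

# Fill only the missing Yearly values, aligned on the row index
testdf['Yearly'] = testdf['Yearly'].fillna(testdf['Monthly'] * testdf['Tenure'])
print(testdf['Yearly'].tolist())  # [10.0, 40.0, 90.0, 160.0, 250.0]
```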

## The Answer 15

*2 people think this answer is useful*

The best would be to use:

```python
df.isna().any().any()
```

Here is why: `isnull()` is just an alias of `isna()`, so the two are identical.

This is even faster than the accepted answer and works on any DataFrame.

## The Answer 16

*2 people think this answer is useful*

We can see the null values present in the dataset by generating a heatmap with seaborn's `heatmap`:

```python
import pandas as pd
import seaborn as sns

dataset = pd.read_csv('train.csv')
sns.heatmap(dataset.isnull(), cbar=False)
```

## The Answer 17

*1 people think this answer is useful*

Or you can use `.info()` on the DataFrame, such as:

```python
df.info(show_counts=True)  # this keyword was null_counts before pandas 1.2
```

which prints the number of non-null rows in each column, such as:

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3276314 entries, 0 to 3276313
Data columns (total 10 columns):
n_matches           3276314 non-null int64
avg_pic_distance    3276314 non-null float64
```

## The Answer 18

*1 people think this answer is useful*

```python
import missingno as msno

msno.matrix(df)  # just to visualize where values are missing
```

## The Answer 19

*0 people think this answer is useful*

```python
df.apply(axis=0, func=lambda x: any(pd.isnull(x)))
```

This checks, for each column, whether it contains NaN or not.
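For reference, the `apply` version gives the same per-column result as the vectorized `df.isnull().any()`; a quick check on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, np.nan], "B": [3, 4]})  # made-up data

via_apply = df.apply(axis=0, func=lambda x: any(pd.isnull(x)))
via_vectorized = df.isnull().any()

print(via_apply.tolist(), via_vectorized.tolist())  # [True, False] [True, False]
```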

## The Answer 20

*0 people think this answer is useful*

To check for NaN values in Python 3:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
print(s.hasnans)
```

The output will be:

```
False
```

## The Answer 21

*-1 people think this answer is useful*

You could not only check whether any NaN exists but also get the percentage of NaNs in each column using the following:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [6, np.nan, 8, 9, 10]})
df
   col1  col2
0     1   6.0
1     2   NaN
2     3   8.0
3     4   9.0
4     5  10.0

df.isnull().sum() / len(df)
col1    0.0
col2    0.2
dtype: float64
```

## The Answer 22

*-2 people think this answer is useful*

Depending on the type of data you're dealing with, you could also just get the value counts of each column while performing your EDA, by setting `dropna` to `False`:

```python
for col in df:
    print(df[col].value_counts(dropna=False))
```

This works well for categorical variables, not so much when you have many unique values.
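What `dropna=False` changes, shown on a made-up categorical-style column: missing values get their own entry in the counts instead of being silently dropped.

```python
import numpy as np
import pandas as pd

s = pd.Series(["red", "blue", "red", np.nan])  # made-up data

print(s.value_counts())              # NaN is not counted
print(s.value_counts(dropna=False))  # includes a NaN entry

counts = s.value_counts(dropna=False)
n_missing = int(counts[counts.index.isna()].sum())
print(n_missing)  # 1
```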