In Pandas, missing data is usually represented by None or NaN. NaN stands for Not a Number and is one of the most common ways to represent a missing value. NaN values can cause problems during data analysis, so it is often useful to drop the rows that contain them.
There are various methods to remove a row with None or NaN values in a Pandas dataframe. This tutorial will introduce you to the detailed steps for different approaches to drop rows with NaN values.
How To Drop Rows With NaN Values In Pandas DataFrame
With dropna()
df.dropna()
is the syntax to drop rows with NaN or None values. Before starting, let’s create a quick DataFrame to work with:
import pandas as pd
df = pd.DataFrame({'values_1': ['700','ABC','500','XYZ','1200'],
'values_2': ['DDD','150','350','400','5000']
})
print (df)
Here is the dataframe you receive:
values_1 values_2
0 700 DDD
1 ABC 150
2 500 350
3 XYZ 400
4 1200 5000
In this example, both columns hold strings: some of them are numeric and some are not. Use the to_numeric
function with errors='coerce' to convert the values into floats.
The three non-numeric values (ABC, XYZ, and DDD) cannot be converted, so they become NaN:
values_1 values_2
0 700.0 NaN
1 NaN 150.0
2 500.0 350.0
3 NaN 400.0
4 1200.0 5000.0
The dropna()
function returns a new DataFrame by default and leaves the original unchanged. So either assign the result back to df, or pass inplace=True
to modify the existing DataFrame directly.
import pandas as pd
df = pd.DataFrame({'values_1': ['700','ABC','500','XYZ','1200'],
'values_2': ['DDD','150','350','400','5000']
})
df = df.apply (pd.to_numeric, errors='coerce')
df = df.dropna()
print (df)
Run this code and only the two rows without NaN values remain in the DataFrame:
values_1 values_2
2 500.0 350.0
4 1200.0 5000.0
These rows keep their original index labels, 2 and 4, so the index is no longer sequential. You can reset it to start from 0 with the df.reset_index(drop=True)
command.
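The two points above, that dropna() returns a copy unless inplace=True is passed and that the surviving rows keep their old labels, can be checked with a short sketch. It reuses the example data from this section; the variable name cleaned is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']})
df = df.apply(pd.to_numeric, errors='coerce')

# dropna() returns a new DataFrame; df itself still has all 5 rows
cleaned = df.dropna()
print(len(df), len(cleaned))

# The surviving rows keep their original labels (2 and 4)
print(list(cleaned.index))

# reset_index(drop=True) renumbers from 0 and discards the old labels
cleaned = cleaned.reset_index(drop=True)
print(list(cleaned.index))

# inplace=True modifies df directly and returns None
df.dropna(inplace=True)
print(len(df))
```

Assigning the result back (df = df.dropna()) and passing inplace=True leave df in the same state; which one to use is mostly a matter of style.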
The axis parameter controls what gets dropped. axis=0 is the default and removes rows that contain NaN values, so dropna(axis=0) behaves the same as dropna(). dropna(axis=1) removes the columns that contain NaN values instead.
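As a rough illustration of the difference between the two axis values, here is a minimal sketch. The small DataFrame and its column names a and b are made up for this example and are not part of the tutorial data.

```python
import pandas as pd

# Hypothetical data: only column 'b' contains a NaN
df = pd.DataFrame({'a': [1.0, 2.0, 3.0],
                   'b': [4.0, float('nan'), 6.0]})

# axis=0 (the default) drops the row that contains the NaN
rows_dropped = df.dropna(axis=0)
print(rows_dropped)

# axis=1 drops the column that contains the NaN
cols_dropped = df.dropna(axis=1)
print(cols_dropped)
```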
With notna()
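Use the notna()
function as a filter to keep only the rows whose value in a given column is not NaN. In the line below, 'C' is a placeholder for the name of the column you want to check: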
df = df[df['C'].notna()]
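Applied to the example data from the dropna() section, the filter might look like the sketch below. It checks values_1 only; that column choice and the variable name filtered are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'values_1': ['700', 'ABC', '500', 'XYZ', '1200'],
                   'values_2': ['DDD', '150', '350', '400', '5000']})
df = df.apply(pd.to_numeric, errors='coerce')

# Keep the rows where values_1 is not NaN; values_2 is not checked
filtered = df[df['values_1'].notna()]
print(filtered)
```

Note that row 0 survives even though its values_2 entry is NaN, because the filter looks at a single column. This is the main difference from calling dropna() with no arguments.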
how Parameter
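The how parameter decides how many missing values a row must contain before it is removed. It accepts 'any' or 'all'. With the default, how='any', a row is removed if at least one of its values is NaN or None.
With how='all',
a row is removed only if every value in it is NaN. The example below uses a different DataFrame, with the columns Courses, Fee, Duration, and Discount: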
# Drop rows in which all values are NaN
df2 = df.dropna(how='all')
print(df2)
Output:
Courses Fee Duration Discount
0 Spark 20000.0 30days 1000.0
1 Java NaN NaN NaN
2 Hadoop 26000.0 35days 2500.0
3 Python 24000.0 40days NaN
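The DataFrame behind this output is not shown, so the sketch below rebuilds one that is consistent with it. The data, and in particular the fully empty last row, is an assumption made for illustration. It compares how='any' with how='all'.

```python
import pandas as pd

# Assumed data, modelled on the output above; the last row is entirely missing
df = pd.DataFrame({
    'Courses': ['Spark', 'Java', 'Hadoop', 'Python', None],
    'Fee': [20000.0, None, 26000.0, 24000.0, None],
    'Duration': ['30days', None, '35days', '40days', None],
    'Discount': [1000.0, None, 2500.0, None, None],
})

# how='any' (the default): drop a row if at least one value is missing
any_dropped = df.dropna(how='any')
print(any_dropped)

# how='all': drop a row only if every value is missing
all_dropped = df.dropna(how='all')
print(all_dropped)
```

With this data, how='all' removes only the last row. The Java row stays because its Courses value is present, which matches the output shown above.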
Subset Parameter
Sometimes you only want to drop the rows that have NaN values in particular columns and ignore missing values elsewhere. You can do this with the subset parameter, which takes a list of column labels:
# Drop rows that have NaN values in the selected columns
df2 = df.dropna(subset=['Courses', 'Fee'])
print(df2)
Output:
Courses Fee Duration Discount
0 Spark 20000.0 30days 1000.0
2 Hadoop 26000.0 35days 2500.0
3 Python 24000.0 40days NaN
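For a version you can run end to end, the sketch below rebuilds a DataFrame consistent with the output above. The input data is an assumption, since the original DataFrame is not shown.

```python
import pandas as pd

# Assumed data, modelled on the output above
df = pd.DataFrame({
    'Courses': ['Spark', 'Java', 'Hadoop', 'Python'],
    'Fee': [20000.0, None, 26000.0, 24000.0],
    'Duration': ['30days', None, '35days', '40days'],
    'Discount': [1000.0, None, 2500.0, None],
})

# Only Courses and Fee are checked; NaN in other columns is ignored
df2 = df.dropna(subset=['Courses', 'Fee'])
print(df2)
```

The Java row is dropped because its Fee is missing, while the Python row is kept even though its Discount is NaN, since Discount is not in the subset.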
Conclusion
This article covers several ways to drop rows with NaN values in a Pandas DataFrame: dropna() with its axis, how, and subset parameters, and filtering with notna(). Choose the approach that suits your purpose and dataset.