Depending on the needs of your pandas DataFrame analysis, you may need to remove certain rows. The examples below show several ways to drop rows from a pandas DataFrame.
Pandas Drop Rows From DataFrame
As with columns, you can use the DataFrame.drop() method to remove one or multiple rows from a DataFrame. It lets you drop rows by specifying their index labels and the corresponding axis. Dropping rows from a DataFrame with a MultiIndex is also supported (you can remove labels from a specific level).
The full syntax of this method:
DataFrame.drop(labels, axis, index, columns, level, inplace, errors)
- labels: this parameter specifies the labels of rows you want to drop. It can be a single label or a list-like object. Remember that tuples will be treated as a single label, not list-like. Its default value is None.
- axis: this is used to tell drop() you want to remove labels from ‘columns’ (or 1) or ‘index’ (or 0). Its default value is 0, meaning you don’t need to adjust the parameter when dropping rows from a DataFrame.
- index and columns: alternatives to specifying labels together with axis. Their default values are None. ‘index=labels’ is equivalent to ‘labels, axis=0’, and ‘columns=labels’ is equivalent to ‘labels, axis=1’.
- level: use this parameter when dropping rows from a DataFrame with a MultiIndex. It names the index level (by position or by name) from which the labels should be removed.
- inplace: by default (False), drop() leaves the current DataFrame intact, so you have to assign the return value if you want to keep the result. With the inplace parameter set to True, drop() modifies the DataFrame in place and returns None.
- errors: this parameter indicates whether drop() should ‘raise’ a KeyError when a label is not found (the default) or ‘ignore’ the missing label and drop only the labels that exist.
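The level parameter and the tuple rule for labels are easier to see on a small MultiIndex. The following is a minimal sketch; the `players` frame and its two index levels are made up for illustration and are not part of the CSV used later.

```python
import pandas as pd

# Hypothetical two-level index (Team, Position); not taken from the article's CSV.
idx = pd.MultiIndex.from_tuples(
    [("BAL", "Catcher"), ("BAL", "Shortstop"), ("BOS", "Catcher")],
    names=["Team", "Position"],
)
players = pd.DataFrame({"Height": [74, 70, 69]}, index=idx)

# Remove every row whose "Position" level equals "Catcher".
no_catchers = players.drop("Catcher", level="Position")

# A tuple is treated as one label: it addresses a single (Team, Position) row.
no_bal_catcher = players.drop(("BAL", "Catcher"))

print(no_catchers)
print(no_bal_catcher)
```

Passing the level by name rather than by position keeps the call readable if the order of the index levels changes.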
The drop() method returns a new DataFrame, or None if the inplace parameter has been set to True.
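A short sketch of how these parameters interact, assuming a small made-up frame `df_demo` with the default integer index (it is only an illustration, not the article's data):

```python
import pandas as pd

df_demo = pd.DataFrame({"Weight": [180, 215, 190]})  # default index labels 0, 1, 2

# Two equivalent ways to drop the row labelled 1.
a = df_demo.drop(1, axis=0)
b = df_demo.drop(index=1)
print(a.equals(b))

# inplace=True modifies the object it is called on and returns None.
copy = df_demo.copy()
result = copy.drop(index=1, inplace=True)
print(result, list(copy.index))

# A missing label raises KeyError by default; errors="ignore" skips it.
try:
    df_demo.drop(index=99)
    raised = False
except KeyError:
    raised = True
unchanged = df_demo.drop(index=99, errors="ignore")
print(raised, unchanged.equals(df_demo))
```

Because the inplace variant returns None, writing `df = df.drop(..., inplace=True)` would replace the DataFrame with None, so use either assignment or inplace, not both.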
We are going to use a CSV file of MLB players (mlb_players.csv) to illustrate some examples. You will need to import the data into a DataFrame first:
>>> import pandas as pd
>>> df = pd.read_csv('mlb_players.csv')
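If you don't have the CSV file at hand, the examples can still be followed with a stand-in DataFrame. The sketch below builds one from the five rows that appear in the outputs in this article; it assumes those rows are all you need and makes no claim about the rest of the file.

```python
import pandas as pd

# Stand-in for mlb_players.csv: only the five rows shown in this article's outputs.
df = pd.DataFrame(
    {
        "Name": ["Adam Donachie", "Paul Konerko", "Orlando Cabrera",
                 "Dustin Pedroia", "Andy Marte"],
        "Team": ["BAL", "CWS", "ANA", "BOS", "CLE"],
        "Position": ["Catcher", "First Baseman", "Shortstop",
                     "Second Baseman", "Third Baseman"],
        "Height": [74, 74, 70, 69, 73],
        "Weight": [180, 215, 190, 180, 185],
        "Age": [22.99, 30.99, 32.33, 23.54, 23.36],
    }
)

print(df.shape)
```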
To drop the second row, pass its index label. With the default integer index, labels start at 0, so the second row has the label 1:
>>> df.drop([1])
              Name  Team        Position  Height  Weight    Age
0    Adam Donachie   BAL         Catcher      74     180  22.99
2  Orlando Cabrera   ANA       Shortstop      70     190  32.33
3   Dustin Pedroia   BOS  Second Baseman      69     180  23.54
4       Andy Marte   CLE   Third Baseman      73     185  23.36
This has an equivalent expression:
>>> df.drop(df.index[1])
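The two expressions are equivalent only while labels and positions coincide, as they do with the default index. A minimal sketch, using a hypothetical frame `scores` whose labels are 10, 20 and 30 (not from the CSV), shows how they diverge otherwise:

```python
import pandas as pd

# Hypothetical frame whose index labels do not match row positions.
scores = pd.DataFrame({"Weight": [180, 215, 190]}, index=[10, 20, 30])

# scores.index[1] looks up the label at position 1 (here 20),
# so this drops the second row regardless of how it is labelled.
by_position = scores.drop(scores.index[1])

# drop([1]) looks for a row labelled 1, which does not exist in this frame.
try:
    scores.drop([1])
    label_found = True
except KeyError:
    label_found = False

print(list(by_position.index), label_found)
```

In other words, drop() always works on labels; going through df.index[...] is what turns a position into a label.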
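If you want to modify the DataFrame itself instead of getting a new one, enable the inplace parameter. Note that the examples after this one assume the original, unmodified df, because the label 1 no longer exists once this call has run: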
>>> df.drop([1], inplace=True)
To drop multiple rows, pass a list of labels. Both calls below give the same result:
>>> df.drop([1, 3])
>>> df.drop([df.index[1], df.index[3]])
              Name  Team       Position  Height  Weight    Age
0    Adam Donachie   BAL        Catcher      74     180  22.99
2  Orlando Cabrera   ANA      Shortstop      70     190  32.33
4       Andy Marte   CLE  Third Baseman      73     185  23.36
You can also use slicing to drop a range of rows:
>>> df.drop(df.index[1:4])
            Name  Team       Position  Height  Weight    Age
0  Adam Donachie   BAL        Catcher      74     180  22.99
4     Andy Marte   CLE  Third Baseman      73     185  23.36
You can also use iloc[] to keep only the rows up to a given position, which has the same effect as dropping everything after it (similar to column selection). This property selects rows by their integer positions. A colon separates the start and end positions of a slice, and the end position is excluded.
For instance, to drop everything after the second row:
>>> df.iloc[:2]
            Name  Team       Position  Height  Weight    Age
0  Adam Donachie   BAL        Catcher      74     180  22.99
1   Paul Konerko   CWS  First Baseman      74     215  30.99
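Strictly speaking, iloc[] selects the rows to keep rather than dropping anything. The sketch below, on a made-up single-column frame `df_demo`, compares it with the drop() call that removes the same rows:

```python
import pandas as pd

# Made-up frame with the default index 0..4, used only for illustration.
df_demo = pd.DataFrame({"Weight": [180, 215, 190, 180, 185]})

# Keep positions 0 and 1 (the end position 2 is excluded).
kept = df_demo.iloc[:2]

# Drop the labels found from position 2 onward.
dropped = df_demo.drop(df_demo.index[2:])

print(kept.equals(dropped))
```

Both forms return a new object and leave `df_demo` unchanged.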
Rows in a DataFrame can also be dropped based on certain conditions, such as values of specific columns. Let’s say you want to remove players 30 years old or older:
>>> df.drop(df[df['Age'] >= 30].index)
             Name  Team        Position  Height  Weight    Age
0   Adam Donachie   BAL         Catcher      74     180  22.99
3  Dustin Pedroia   BOS  Second Baseman      69     180  23.54
4      Andy Marte   CLE   Third Baseman      73     185  23.36
In this expression, the df['Age'] >= 30 part builds a boolean condition that is True for the rows to remove, df[...] selects those rows, and .index returns their labels, which are then passed to drop().
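The same result can usually be obtained by keeping the rows that do not match the condition. This is a sketch on a three-row frame `ages` built from values shown in this article, not a replacement for the drop() form:

```python
import pandas as pd

# Three rows taken from the outputs above, reduced to two columns.
ages = pd.DataFrame(
    {
        "Name": ["Adam Donachie", "Paul Konerko", "Orlando Cabrera"],
        "Age": [22.99, 30.99, 32.33],
    }
)

too_old = ages["Age"] >= 30                # boolean Series, True where the row should go
via_drop = ages.drop(ages[too_old].index)  # drop by the labels of the matching rows
via_mask = ages[~too_old]                  # keep the rows where the condition is False

print(via_drop.equals(via_mask))
print(list(via_mask["Name"]))
```

The two forms agree when index labels are unique. With duplicate labels, drop() removes every row sharing a dropped label, so the mask form is the safer choice there.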
Conclusion
In pandas, you can drop rows from a DataFrame with the built-in drop() method. It supports the removal of one or multiple rows by their index labels, and it can be combined with conditions on column values.