. Advertisement .
..3..
. Advertisement .
..4..
To select rows from Pandas DataFrame, you can use common built-in tools for indexing and selection in this module. Find out who they are and how you can make use of them.
Select Rows From Pandas DataFrame
First, we will import data from a CSV file called mlb_players.csv by using read_csv(). It contains information about twenty players playing in Major League Baseball (MLB), including their name, team, position, height, weight, and age.
import pandas as pd
df = pd.read_csv('mlb_players.csv')
df
......... ADVERTISEMENT .........
..8..
Square Brackets
Let’s say you want to print only rows hosting players from team Colorado Rockies (COL). You can use the square brackets and the column name, then compare it to the record you want to match.
df[df['Team'] == 'COL']
......... ADVERTISEMENT .........
..8..
The statement above compares every value in the column ‘Team’ and prints out every row that has a matching value with ‘COL’. Pandas allows you to put this select condition in an object and invoke it from the DataFrame itself.
cond = df['Team'] == 'COL'
df[cond]
The first statement returns a Pandas Series featuring boolean values for all the rows. It will check the condition and yield the corresponding value. If the entry at the column matches the condition, it is set to True, and False otherwise. You can check this Series yourself.
>>> type(cond)
<class 'pandas.core.series.Series'>
>> cond
......... ADVERTISEMENT .........
..8..
A Series of boolean values is actually one of the methods of selecting rows and columns in Pandas. In this case, we only generate it on the fly by using the equal operator in Python.
A boolean array can also be created and used for row selection. You can achieve this by adding the property values in the condition.
cond = df['Team'].values == 'COL'
df[cond]
The function type() can verify the type of the condition object for you:
>>> type(cond)
<class 'numpy.ndarray'>
>> cond
......... ADVERTISEMENT .........
..8..
As you can see, the conditional statement with the property values creates a numpy array. Pandas can use this boolean array to select only rows that match your requirements.
You can expand the capabilities of this method by using other operators, depending on your needs. For instance, this statement returns players who are an outfielder and weigh at least 180 pounds.
cond = (df['Position'] == 'Outfielder') & (df['Weight'] >= 180)
df[cond]
......... ADVERTISEMENT .........
..8..
Another example of compound comparison is to use the | symbol as an OR operator. This is how you can filter players who are either at least 30 years old or shorter than 73 inches.
cond = (df['Age'] >= 30) | (df['Height'] < 73)
df[cond]
......... ADVERTISEMENT .........
..8..
On the other hand, this statement removes every player from the Pittsburgh Pirates and Colorado Rockies from the result.
cond = (df['Team'] != 'PIT') & (df['Team'] != 'COL')
df[cond]
......... ADVERTISEMENT .........
..8..
The Properties loc And iloc
loc and iloc are label-based and position-based properties, and they provide another method for row selection based on the entry value.
You can replace the statements above with this property to achieve the same thing:
cond = df['Team'].values == 'COL'
df.loc[cond]
......... ADVERTISEMENT .........
..8..
Remember that loc can also accept a Pandas Series as an argument. You can remove the property values, and it will still return the same result.
Similarly, you can use other comparisons with the property with little change. You can learn more selection with loc here.
cond = df['Team'] == 'COL'
df.loc[cond]
cond = (df['Position'] == 'Outfielder') & (df['Weight'] >= 180)
df.loc[cond]
cond = (df['Age'] >= 30) | (df['Height'] < 73)
df.loc[cond]
cond = (df['Team'] != 'PIT') & (df['Team'] != 'COL')
df.loc[cond]
However, since the property iloc is position-based, it only accepts integers as arguments. You will need to convert the Series or boolean array object into an array of indexes of the matching rows. This can be done with the function flatnonzero() from numpy.
cond = df['Team'] == 'COL'
df.iloc[np.flatnonzero(cond)]
cond = (df['Position'] == 'Outfielder') & (df['Weight'] >= 180)
df.iloc[np.flatnonzero(cond)]
cond = (df['Age'] >= 30) | (df['Height'] < 73)
df.iloc[np.flatnonzero(cond)]
cond = (df['Team'] != 'PIT') & (df['Team'] != 'COL')
df.iloc[np.flatnonzero(cond)]
Conclusion
You can use the square brackets and properties to select rows from Pandas DataFrame. They can accept a Series or array of boolean values to match rows based on your conditions.
Leave a comment