The Pandas groupby() method is simple to use: split the data into groups based on the values in one or more columns, then apply an aggregation such as max, min, mean, or a combination of these to each group.
This tutorial focuses on passing two columns to DataFrame.groupby() to split a DataFrame into groups.
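As a minimal sketch of the split-then-aggregate idea, the snippet below groups a small DataFrame by two columns and computes several statistics of Age for each group. The data mirrors the sample used later in this tutorial; the output column names (min_age, max_age, mean_age) are illustrative choices, not anything Pandas requires.

```python
import pandas as pd

# Same sample data as the rest of the tutorial
data = pd.DataFrame({
    'Name': ["Emma", "Travis", "Anish", "Jennifer", "Bob", "Luna"],
    'Gender': ["Female", "Male", "Female", "Male", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [23, 25, 30, 34, 27, 25]
})

# Named aggregation: each keyword is an output column,
# each value is an (input column, aggregation function) pair
summary = data.groupby(['Gender', 'Employed']).agg(
    min_age=('Age', 'min'),
    max_age=('Age', 'max'),
    mean_age=('Age', 'mean'),
)
print(summary)
```

The result is indexed by the (Gender, Employed) pairs found in the data, with one row per group.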
Groupby Two Columns In Pandas
Here is the sample DataFrame:
import pandas as pd

# Sample data: six people with gender, employment status, and age
data = pd.DataFrame({
    'Name': ["Emma", "Travis", "Anish", "Jennifer", "Bob", "Luna"],
    'Gender': ["Female", "Male", "Female", "Male", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [23, 25, 30, 34, 27, 25]
})
print(data)
Output:
       Name  Gender Employed  Age
0      Emma  Female      Yes   23
1    Travis    Male       No   25
2     Anish  Female      Yes   30
3  Jennifer    Male       No   34
4       Bob  Female      Yes   27
5      Luna    Male       No   25
Pandas Groupby Multiple Columns
import pandas as pd

data = pd.DataFrame({
    'Name': ["Emma", "Travis", "Anish", "Jennifer", "Bob", "Luna"],
    'Gender': ["Female", "Male", "Female", "Male", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [23, 25, 30, 34, 27, 25]
})
print(data)
print("")
print("Groups in DataFrame:")

# Group by two columns; each group key is a (Gender, Employed) tuple
groups = data.groupby(['Gender', 'Employed'])
for group_key, group_df in groups:
    # group_df holds the rows that belong to this key
    print(group_df)
    print("")
Output:
       Name  Gender Employed  Age
0      Emma  Female      Yes   23
1    Travis    Male       No   25
2     Anish  Female      Yes   30
3  Jennifer    Male       No   34
4       Bob  Female      Yes   27
5      Luna    Male       No   25

Groups in DataFrame:
    Name  Gender Employed  Age
0   Emma  Female      Yes   23
2  Anish  Female      Yes   30
4    Bob  Female      Yes   27

       Name Gender Employed  Age
1    Travis   Male       No   25
3  Jennifer   Male       No   34
5      Luna   Male       No   25
The code splits the DataFrame into groups, one for each combination of Gender and Employed values that occurs in the data. With this sample data only two combinations occur, (Female, Yes) and (Male, No), so two groups are printed. All rows that share the same Gender and Employed values end up in the same group. To visualize the results and spot outliers, you can use the Pandas DataFrame.plot() method. Seaborn produces more polished charts, but a simple bar graph is often enough to analyze the data.
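One possible way to get such a bar graph is sketched below: aggregate first, then plot the aggregated Series. The helper plot_mean_age and the file name mean_age.png are hypothetical names made up for this illustration, and the plotting step assumes matplotlib is installed.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Emma", "Travis", "Anish", "Jennifer", "Bob", "Luna"],
    'Gender': ["Female", "Male", "Female", "Male", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [23, 25, 30, 34, 27, 25]
})

# Mean age per (Gender, Employed) group; a Series with a two-level index
mean_age = data.groupby(['Gender', 'Employed'])['Age'].mean()
print(mean_age)


def plot_mean_age(series, path="mean_age.png"):
    """Hypothetical helper: draw one bar per group and save the chart."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no display needed
    ax = series.plot(kind="bar", rot=0)
    ax.set_ylabel("Mean age")
    ax.figure.tight_layout()
    ax.figure.savefig(path)


# plot_mean_age(mean_age)  # uncomment to write mean_age.png
```

Plotting the aggregated values rather than the raw rows keeps the chart to one bar per group, which is usually what a quick comparison needs.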
Count The Number Of Rows
The object returned by DataFrame.groupby() has a size() method that counts the rows in each group:
import pandas as pd

data = pd.DataFrame({
    'Name': ["Emma", "Travis", "Anish", "Jennifer", "Bob", "Luna"],
    'Gender': ["Female", "Male", "Female", "Male", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [23, 25, 30, 34, 27, 25]
})
print(data)
print("")
print("Count of Each group:")

# size() returns a Series of row counts; reset_index() turns it into a
# DataFrame and names the count column "Count"
grouped_df = data.groupby(['Gender', 'Employed']).size().reset_index(name="Count")
print(grouped_df)
Output:
       Name  Gender Employed  Age
0      Emma  Female      Yes   23
1    Travis    Male       No   25
2     Anish  Female      Yes   30
3  Jennifer    Male       No   34
4       Bob  Female      Yes   27
5      Luna    Male       No   25

Count of Each group:
   Gender Employed  Count
0  Female      Yes      3
1    Male       No      3
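For plain string columns like these, size() only reports combinations that actually occur in the data. If a table covering every Gender/Employed pairing is more convenient, one option is to pivot the counts with unstack(); the following is a sketch of that idea, with illustrative variable names.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Emma", "Travis", "Anish", "Jennifer", "Bob", "Luna"],
    'Gender': ["Female", "Male", "Female", "Male", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [23, 25, 30, 34, 27, 25]
})

# Row counts per (Gender, Employed) pair that occurs in the data
counts = data.groupby(['Gender', 'Employed']).size()

# Move the Employed level into columns; missing pairs are filled with 0
table = counts.unstack(fill_value=0)
print(table)
```

Here Gender becomes the row index and Employed the columns, so pairings with no rows show up as 0 instead of being left out.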
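You can also chain a second groupby() and max() to find the largest group size for each value of Employed:
Code:
# Continues from the previous example and reuses the data DataFrame.
# level=1 refers to the second grouping key, Employed.
groups = data.groupby(['Gender', 'Employed']).size().groupby(level=1)
print(groups.max())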
Output:
Employed
No     3
Yes    3
dtype: int64
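max() can also be applied to a data column rather than to the group sizes. The sketch below reuses the tutorial's sample data to find the highest Age in each group, then looks up the matching rows; idxmax() is used here as one possible way to do that lookup, and the variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Emma", "Travis", "Anish", "Jennifer", "Bob", "Luna"],
    'Gender': ["Female", "Male", "Female", "Male", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [23, 25, 30, 34, 27, 25]
})

grouped_age = data.groupby(['Gender', 'Employed'])['Age']

# Highest age within each (Gender, Employed) group
oldest_age = grouped_age.max()
print(oldest_age)

# idxmax() gives the row label of the first maximum in each group;
# data.loc[...] then retrieves the full rows for those labels
oldest_rows = data.loc[grouped_age.idxmax()]
print(oldest_rows)
```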
Conclusion
This article has explained how to use the DataFrame.groupby() method to group a Pandas DataFrame by two columns, iterate over the resulting groups, and count the rows in each one.