This tutorial shows you how to use pandas groupby() and sum() together without a headache.
Let’s enter the fray!
Example For Dataframes
We’ll begin, as usual, by importing the pandas library and creating a simple DataFrame that we’ll use throughout this tutorial.
# Create a pandas DataFrame.
import pandas as pd
englishteaching = ({
'Teachers':["Simmon","Pinmark","Hanson","Putin","Plex","Hanson","Simmon","Putin"],
'Fee' :[32000,45000,33000,44000,36000,45000,35000,42000],
'Prolongation':['50days','30days','40days','35days','35days','60days','50days','55days'],
'Discount':[2000,1300,2000,1100,2500,2300,1200,1500]
})
df = pd.DataFrame(englishteaching, columns=['Teachers','Fee','Prolongation','Discount'])
print(df)
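Because sum() treats number columns and text columns differently, it can help to check which columns are numeric before grouping. The sketch below rebuilds the same sample DataFrame and uses DataFrame.dtypes and select_dtypes(); the variable name numeric_cols is our own, chosen for illustration.

```python
import pandas as pd

# Recreate the sample DataFrame from this tutorial.
df = pd.DataFrame({
    'Teachers': ["Simmon", "Pinmark", "Hanson", "Putin", "Plex", "Hanson", "Simmon", "Putin"],
    'Fee': [32000, 45000, 33000, 44000, 36000, 45000, 35000, 42000],
    'Prolongation': ['50days', '30days', '40days', '35days', '35days', '60days', '50days', '55days'],
    'Discount': [2000, 1300, 2000, 1100, 2500, 2300, 1200, 1500],
})

# Show each column's dtype: Fee and Discount are integers, Teachers and Prolongation hold text.
print(df.dtypes)

# select_dtypes(include='number') keeps only the numeric columns.
# numeric_cols is an illustrative variable name.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)  # ['Fee', 'Discount']
```

Only Fee and Discount can be meaningfully totaled; Prolongation stores values such as "50days" as text.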
Running this code produces the following output:
Teachers Fee Prolongation Discount
0 Simmon 32000 50days 2000
1 Pinmark 45000 30days 1300
2 Hanson 33000 40days 2000
3 Putin 44000 35days 1100
4 Plex 36000 35days 2500
5 Hanson 45000 60days 2300
6 Simmon 35000 50days 1200
7 Putin 42000 55days 1500
Pandas groupby() & sum() by Column Name
Use the groupby() method in pandas to collect rows that share a value into groups, so that you can apply aggregate functions to each group. It returns a DataFrameGroupBy object that provides aggregate methods such as sum(), mean(), and similar ones.
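For instance, df.groupby(['Teachers']).sum() groups the rows by the Teachers column and totals the columns for each teacher. In pandas 2.0 and later, pass numeric_only=True to sum() to restrict the totals to numeric columns; otherwise text columns such as Prolongation are concatenated instead of skipped.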
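To see the DataFrameGroupBy object and a few of its other aggregate methods in action, here is a minimal sketch on the same sample data. It uses numeric_only=True, which we assume you want so that Prolongation is left out; the variable names grouped, totals, averages, and fee_stats are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Teachers': ["Simmon", "Pinmark", "Hanson", "Putin", "Plex", "Hanson", "Simmon", "Putin"],
    'Fee': [32000, 45000, 33000, 44000, 36000, 45000, 35000, 42000],
    'Prolongation': ['50days', '30days', '40days', '35days', '35days', '60days', '50days', '55days'],
    'Discount': [2000, 1300, 2000, 1100, 2500, 2300, 1200, 1500],
})

# groupby() returns an intermediate DataFrameGroupBy object;
# the aggregation only runs when a method such as sum() is called.
grouped = df.groupby('Teachers')  # illustrative variable name
print(type(grouped).__name__)  # DataFrameGroupBy

# numeric_only=True limits the aggregation to the numeric columns (Fee, Discount).
totals = grouped.sum(numeric_only=True)
averages = grouped.mean(numeric_only=True)

# agg() applies several aggregate functions to one column in a single call.
fee_stats = grouped['Fee'].agg(['sum', 'mean', 'count'])
print(fee_stats)
```

Reusing one grouped object like this avoids repeating the groupby() call for each aggregate.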
Keep in mind that the group key you use becomes the index of the resulting DataFrame. To keep it as a regular column instead, pass the as_index=False argument to groupby().
Running the code:
# Use groupby() to compute the sum of the numeric columns for each teacher
df2 = df.groupby('Teachers').sum(numeric_only=True)
print(df2)
Output:
            Fee  Discount
Teachers
Hanson    78000      4300
Pinmark   45000      1300
Plex      36000      2500
Putin     86000      2600
Simmon    67000      3200
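The as_index=False option mentioned above keeps Teachers as an ordinary column with a default 0..n-1 index. The sketch below shows it next to a common alternative, calling reset_index() after the aggregation; both are standard pandas calls, and the names df2 and df3 are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Teachers': ["Simmon", "Pinmark", "Hanson", "Putin", "Plex", "Hanson", "Simmon", "Putin"],
    'Fee': [32000, 45000, 33000, 44000, 36000, 45000, 35000, 42000],
    'Prolongation': ['50days', '30days', '40days', '35days', '35days', '60days', '50days', '55days'],
    'Discount': [2000, 1300, 2000, 1100, 2500, 2300, 1200, 1500],
})

# as_index=False keeps 'Teachers' as a regular column instead of the index.
df2 = df.groupby('Teachers', as_index=False).sum(numeric_only=True)
print(df2)

# Alternative: aggregate first, then move the index back into a column.
df3 = df.groupby('Teachers').sum(numeric_only=True).reset_index()
print(df3)
```

Either version gives you a flat table that is easy to merge with other data or write to a file.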
You can also select the specific column you want to sum. The example below sums only the Fee column.
Running the code:
# Use GroupBy() & compute sum on specific column
df2 = df.groupby('Teachers')['Fee'].sum()
print(df2)
Output:
Teachers
Hanson     78000
Pinmark    45000
Plex       36000
Putin      86000
Simmon     67000
Name: Fee, dtype: int64
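Selecting one column with single brackets, as above, returns a Series. If you want totals for several specific columns, a list of names in double brackets returns a DataFrame instead. A minimal sketch (the variable names are our own):

```python
import pandas as pd

df = pd.DataFrame({
    'Teachers': ["Simmon", "Pinmark", "Hanson", "Putin", "Plex", "Hanson", "Simmon", "Putin"],
    'Fee': [32000, 45000, 33000, 44000, 36000, 45000, 35000, 42000],
    'Prolongation': ['50days', '30days', '40days', '35days', '35days', '60days', '50days', '55days'],
    'Discount': [2000, 1300, 2000, 1100, 2500, 2300, 1200, 1500],
})

# Single brackets select one column, so the result is a Series.
fee_totals = df.groupby('Teachers')['Fee'].sum()
print(type(fee_totals).__name__)  # Series

# Double brackets (a list of names) select several columns, so the result is a DataFrame.
# Only numeric columns are selected here, so numeric_only is not needed.
both_totals = df.groupby('Teachers')[['Fee', 'Discount']].sum()
print(type(both_totals).__name__)  # DataFrame
print(both_totals)
```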
Pandas groupby() & sum() on Multiple Columns
To group by multiple columns and calculate a total for each combination of values, pass a list of the column names to groupby().
For example, df.groupby(['Teachers','Prolongation'])['Fee'].sum() groups the rows by the Teachers and Prolongation columns and then sums Fee for each combination.
Running the code:
# Use groupby() on multiple columns and sum the Fee column
df2 = df.groupby(['Teachers','Prolongation'])['Fee'].sum()
print(df2)
Output:
Teachers  Prolongation
Hanson    40days          33000
          60days          45000
Pinmark   30days          45000
Plex      35days          36000
Putin     35days          44000
          55days          42000
Simmon    50days          67000
Name: Fee, dtype: int64
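Grouping by two columns produces a Series with a two-level MultiIndex, which is not always convenient to work with. The sketch below shows two standard ways to reshape it: reset_index() for a flat table, and unstack() to spread Prolongation across columns. The fill_value=0 choice for missing teacher/duration pairs is our assumption; use whatever default suits your data.

```python
import pandas as pd

df = pd.DataFrame({
    'Teachers': ["Simmon", "Pinmark", "Hanson", "Putin", "Plex", "Hanson", "Simmon", "Putin"],
    'Fee': [32000, 45000, 33000, 44000, 36000, 45000, 35000, 42000],
    'Prolongation': ['50days', '30days', '40days', '35days', '35days', '60days', '50days', '55days'],
    'Discount': [2000, 1300, 2000, 1100, 2500, 2300, 1200, 1500],
})

# Group by two keys; the result is a Series indexed by (Teachers, Prolongation).
fee_by_pair = df.groupby(['Teachers', 'Prolongation'])['Fee'].sum()
print(fee_by_pair.index.nlevels)  # 2

# reset_index() turns both index levels back into regular columns.
flat = fee_by_pair.reset_index()
print(flat)

# unstack() pivots the inner level (Prolongation) into columns.
# fill_value=0 (our assumption) fills combinations that never occur.
wide = fee_by_pair.unstack(fill_value=0)
print(wide)
```

The flat form suits further filtering or merging, while the wide form is easier to read as a teacher-by-duration grid.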
The Bottom Line
Depending on your coding style, one of these approaches should suit your pandas groupby() and sum() operations. Don’t hesitate any longer: give it a shot, and let us know how it goes in the comments. Also, check out our article on how to apply a function to a column in a pandas DataFrame if you want to learn more.