A GroupBy operation combines three steps: splitting the object into groups, applying a function to each group, and combining the results. You can use this technique to organize large amounts of data and run computations on each group. Want to learn how to use GroupBy on an index in Pandas? You are in the right place.
Pass the name of the DataFrame's index as an argument to groupby() to group rows by the index. DataFrame.groupby() accepts a string or a list of strings naming the index levels or columns to group by.
We will show you how to use GroupBy on one or multiple index levels, and how to group by a combination of an index level and a column. Each step comes with an example and its output.
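The split, apply, and combine steps can be sketched by hand to show roughly what groupby() does for you. This is only an illustration on a small made-up frame (the three rows and the variable names are assumptions, not part of the examples that follow); it is not how pandas implements grouping internally.

```python
import pandas as pd

# Hypothetical three-row frame with a named index
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Spark"], "Discount": [1000, 2300, 2000]}
).set_index("Courses")

# Split: one sub-DataFrame per distinct value of the Courses index
groups = {name: part for name, part in df.groupby("Courses")}

# Apply: compute a total for each group
totals = {name: int(part["Discount"].sum()) for name, part in groups.items()}

# Combine: assemble the per-group results into one Series
manual = pd.Series(totals, name="Discount")
print(manual)

# The one-liner that performs all three steps
print(df.groupby("Courses")["Discount"].sum())
```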
GroupBy On Index In Pandas Examples
Here are some quick examples of GroupBy on index fields.
# Below are quick examples
# (the technologies dict is defined in the next section)
# Create DataFrame
df = pd.DataFrame(technologies)
# Set Index to DataFrame
df.set_index(['Courses','Fee'], inplace=True)
print(df)
# Groupby Index
# numeric_only=True skips the string column Duration, so only Discount is summed
result = df.groupby('Courses').sum(numeric_only=True)
print(result)
# Groupby Multiple Index
result = df.groupby(['Courses','Fee']).sum(numeric_only=True)
print(result)
# Groupby Column & Index
result = df.groupby(['Courses','Duration']).sum()
print(result)
First, create a DataFrame from a dict so that the examples above have data to work with.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","PySpark","Spark","Spark"],
'Fee' :[20000,25000,26000,22000,25000,20000,35000],
'Duration':['30day','40days','35days','40days','60days','60days','70days'],
'Discount':[1000,2300,1200,2500,2000,2000,3000]
}
df = pd.DataFrame(technologies)
df.set_index(['Courses','Fee'], inplace=True)
print(df)
Output:
              Duration  Discount
Courses Fee
Spark   20000    30day      1000
PySpark 25000   40days      2300
Hadoop  26000   35days      1200
Python  22000   40days      2500
PySpark 25000   60days      2000
Spark   20000   60days      2000
        35000   70days      3000
In the example above, we use DataFrame.set_index() to turn two columns, Courses and Fee, into the index. The examples below group rows by these two index levels.
GroupBy Index In Pandas Example
The DataFrame.groupby() function takes a string or a list of index or column names to group by. Keep in mind that the index must have a name for this to work. If it does not have one yet, assign it with DataFrame.index.name = 'index-name'.
# Groupby Index
# numeric_only=True skips the string column Duration
result = df.groupby('Courses').sum(numeric_only=True)
print(result)
Output:
         Discount
Courses
Hadoop       1200
PySpark      4300
Python       2500
Spark        6000
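If the index has no name, it has to be named before it can be passed to groupby() as a string. A minimal sketch, assuming a small made-up frame with an unnamed index (the data here is hypothetical and separate from the technologies example):

```python
import pandas as pd

# Hypothetical frame whose index has no name yet
df = pd.DataFrame(
    {"Discount": [1000, 2300, 2000]},
    index=["Spark", "PySpark", "Spark"],
)
print(df.index.name)  # None

# Give the index a name, then group by that name
df.index.name = "Courses"
result = df.groupby("Courses").sum()
print(result)
```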
Group By Multiple Index Pandas
To group by several index levels at once, pass a list of index names. Here is an example that groups by the Courses and Fee index levels.
# Groupby Multiple Index
# numeric_only=True skips the string column Duration
result = df.groupby(['Courses','Fee']).sum(numeric_only=True)
print(result)
Output:
               Discount
Courses Fee
Hadoop  26000      1200
PySpark 25000      4300
Python  22000      2500
Spark   20000      3000
        35000      3000
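As an alternative to passing index names as keys, groupby() also has a level parameter that refers to index levels by name or by position. The sketch below is an addition to the article's approach, not a replacement for it; it selects the Discount column explicitly so the result does not depend on how string columns are summed.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "PySpark", "Spark", "Spark"],
    'Fee': [20000, 25000, 26000, 22000, 25000, 20000, 35000],
    'Duration': ['30day', '40days', '35days', '40days', '60days', '60days', '70days'],
    'Discount': [1000, 2300, 1200, 2500, 2000, 2000, 3000],
}
df = pd.DataFrame(technologies).set_index(['Courses', 'Fee'])

# Group by index names passed as keys (the approach used in this article)
by_name = df.groupby(['Courses', 'Fee'])['Discount'].sum()

# Group by index levels, referenced by name
by_level = df.groupby(level=['Courses', 'Fee'])['Discount'].sum()

# Group by index levels, referenced by position
by_position = df.groupby(level=[0, 1])['Discount'].sum()

print(by_level)
```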
Use Both Index And Column In GroupBy Pandas
Sometimes you need to group by an index level and a column at the same time. groupby() accepts both in the same list, so you can mix index names and column names.
# Groupby Column & Index
result = df.groupby(['Courses','Duration']).sum()
print(result)
Output (in this example Courses is an index level and Duration is a regular column; Duration holds strings, so rows are grouped by their exact text):
                  Discount
Courses Duration
Hadoop  35days        1200
PySpark 40days        2300
        60days        2000
Python  40days        2500
Spark   30day         1000
        60days        2000
        70days        3000
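When an index level and a column could be confused, or you simply want to be explicit, the index level can be wrapped in pd.Grouper(level=...) and combined with a column name. This is a sketch of that alternative spelling, assuming the same technologies data; it should give the same groups as the plain list of names used above.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "PySpark", "Spark", "Spark"],
    'Fee': [20000, 25000, 26000, 22000, 25000, 20000, 35000],
    'Duration': ['30day', '40days', '35days', '40days', '60days', '60days', '70days'],
    'Discount': [1000, 2300, 1200, 2500, 2000, 2000, 3000],
}
df = pd.DataFrame(technologies).set_index(['Courses', 'Fee'])

# Explicit form: Courses is named as an index level, Duration as a column
mixed = df.groupby([pd.Grouper(level='Courses'), 'Duration'])['Discount'].sum()

# Plain form used in the article: pandas resolves each name itself
plain = df.groupby(['Courses', 'Duration'])['Discount'].sum()

print(mixed)
```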
Complete Example
Looking for a complete example of grouping indexes in Pandas? Here is one that you can use as a reference.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","PySpark","Spark","Spark"],
'Fee' :[20000,25000,26000,22000,25000,20000,35000],
'Duration':['30day','40days','35days','40days','60days','60days','70days'],
'Discount':[1000,2300,1200,2500,2000,2000,3000]
}
df = pd.DataFrame(technologies)
df.set_index(['Courses','Fee'], inplace=True)
print(df)
# Groupby Index
# numeric_only=True skips the string column Duration
result = df.groupby('Courses').sum(numeric_only=True)
print(result)
# Groupby Multiple Index
result = df.groupby(['Courses','Fee']).sum(numeric_only=True)
print(result)
# Groupby Column & Index
result = df.groupby(['Courses','Duration']).sum()
print(result)
Conclusion
We have shown different ways to use GroupBy on an index in Pandas. You now know how to use the groupby() function to group by one or several index levels, and how to apply it to a combination of an index level and a column. Hopefully the examples make these techniques easier to learn. Keep practicing, and you will quickly become proficient in all of the above methods of grouping by index in Pandas.