Read on to find out how to get the average (mean) of columns in a pandas DataFrame. A few parameters let you customize the calculation, and the result can give you essential insights into your data.
Get Column Average Or Mean In Pandas DataFrame
The method mean() can be used to get the mean of the values of one or multiple columns (or rows) in a DataFrame.
Syntax (pandas 1.x): DataFrame.mean(axis=0, skipna=True, level=None, numeric_only=None, **kwargs)
Parameters:
- axis: the main parameter of mean(). It indicates whether to compute the mean down the columns or across the rows. Use 0 (or 'index') to get the mean of each column. This is the default. Use 1 (or 'columns') to get the mean of each row.
- skipna: by default (skipna=True), mean() excludes NA/null values when calculating means. Set it to False to include them. Any column (or row) that contains a missing value will then have a mean of NaN.
- level: if the axis is a MultiIndex, this computes the mean along a particular index level and collapses the result into a Series. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0. Use groupby(level=...).mean() instead.
- numeric_only: in pandas 1.x the default (None) makes mean() try every column first and then fall back to the numeric ones only. Setting it to True restricts the calculation to numeric data, including int, float, and boolean columns. Since pandas 2.0 the default is False.
- **kwargs: additional keyword arguments to be passed to the function.
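To make axis, skipna and the level replacement concrete, here is a minimal sketch on a tiny hand-made DataFrame. The column names, values and the Team/Player MultiIndex are hypothetical, not taken from the article's CSV files. The last part uses groupby(level=...) rather than the removed level argument, so it should run on both pandas 1.x and 2.x.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the article's homes.csv) with one missing value
df = pd.DataFrame({
    "Sell": [142.0, 175.0, np.nan],
    "List": [160.0, 180.0, 132.0],
})

# axis=0 (default): one mean per column; the NaN in "Sell" is skipped
col_means = df.mean()
print(col_means)  # Sell 158.5, List 157.333333

# axis=1: one mean per row (NaN skipped in the last row)
row_means = df.mean(axis=1)
print(row_means)  # 151.0, 177.5, 132.0

# skipna=False: a column containing NaN now gets a NaN mean
strict_means = df.mean(skipna=False)
print(strict_means)  # Sell NaN, List 157.333333

# MultiIndex example (hypothetical teams/players); groupby(level=...)
# replaces the old level= argument of mean()
idx = pd.MultiIndex.from_tuples(
    [("BAL", "A"), ("BAL", "B"), ("BOS", "C")], names=["Team", "Player"]
)
mi = pd.DataFrame({"Age": [22.0, 30.0, 28.0]}, index=idx)
team_means = mi.groupby(level="Team").mean()
print(team_means)  # BAL 26.0, BOS 28.0
```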
We will use two different DataFrames to show the capabilities of this method: one dataset of home sales and one of MLB players.
import pandas as pd
df1 = pd.read_csv('homes.csv')
df2 = pd.read_csv('mlb_players.csv')
Every parameter above is optional, so you can call the method on any DataFrame without passing any arguments. When invoked like this, mean() tries to return the mean of every column.
>>> df1.mean()
Sell 166.8
List 175.2
Rooms 7.7
Beds 3.9
Baths 2.0
Age 25.6
Taxes 3456.1
dtype: float64
>>> df2.mean()
<stdin>:1: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
Height 72.000
Weight 190.000
Age 26.642
dtype: float64
The first DataFrame has only numeric data, such as listing prices, selling prices, and numbers of rooms. The method mean() returns a Series containing the mean of each column.
On the other hand, there are string values in the second DataFrame, including names, team names, and positions of players.
As mentioned, by default mean() tries to calculate the means of those columns as well. These strings can't be converted to valid numeric data, so pandas 1.x eventually drops those columns and emits a warning message. The means of the numeric columns are still evaluated as usual. In pandas 2.0 and later, where numeric_only defaults to False, the same call raises a TypeError instead.
To avoid this warning, pass numeric_only=True so that only numeric columns are used:
>>> df2.mean(numeric_only=True)
Height 72.000
Weight 190.000
Age 26.642
dtype: float64
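The warning text itself suggests another route: select only valid columns before calling the reduction. A minimal sketch of that uses select_dtypes. The small players DataFrame below is a hypothetical stand-in for mlb_players.csv, with made-up names and values. One caveat: select_dtypes(include="number") does not pick up boolean columns, whereas numeric_only=True does include them.

```python
import pandas as pd

# Hypothetical stand-in for mlb_players.csv: string and numeric columns mixed
players = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Team": ["BAL", "CWS"],
    "Height": [74, 72],
    "Weight": [180, 200],
    "Age": [23.0, 31.0],
})

# Keep only numeric columns first, then average them. This works the same
# whether numeric_only defaults to None (pandas 1.x) or False (pandas 2.x).
numeric_means = players.select_dtypes(include="number").mean()
print(numeric_means)  # Height 73.0, Weight 190.0, Age 27.0
```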
If you set numeric_only to False and force mean() to calculate the means of string columns, it raises an error rather than just showing a warning:
>>> df2.mean(numeric_only=False)
TypeError: Could not convert ['Adam DonachiePaul KonerkoOrlando CabreraDustin PedroiaAndy Marte'] to numeric
If you only want to get the mean of certain columns, invoke the method mean() on a subset instead of the entire DataFrame.
You can get the mean of one or multiple columns by using their labels.
>>> df1['Sell'].mean()
166.8
>>> df1.loc[:,"Sell"].mean()
166.8
>>> df1[['Sell', 'List']].mean()
Sell 166.8
List 175.2
dtype: float64
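Combining a column subset with axis=1 gives the average of certain columns for each row instead of for each column. This sketch stores that average in a new column. The rows and the "Avg" column name are hypothetical and only mimic the Sell/List columns of homes.csv.

```python
import pandas as pd

# Hypothetical rows in the style of homes.csv
homes = pd.DataFrame({
    "Sell": [142, 175, 129],
    "List": [160, 180, 132],
})

# Per-row mean of the two selected columns, stored as a new column
homes["Avg"] = homes[["Sell", "List"]].mean(axis=1)

# Single-bracket selection returns a Series, so mean() gives a scalar
sell_mean = homes["Sell"].mean()

print(homes)
print(sell_mean)
```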
Conclusion
Knowing how to get the column average or mean in a pandas DataFrame with mean() is important for any data analyst. You can invoke this method on the entire DataFrame or only on the columns of your choice.