You can easily sum the columns of a Pandas DataFrame with the built-in functions of this module. The examples below show how.
Pandas Sum DataFrame Columns
Use the function pandas.DataFrame.sum() to find the sum of one or multiple columns in a DataFrame. The method can operate along either axis, so it can also add up the values in each row.
Syntax: DataFrame.sum(axis, skipna, level, numeric_only, min_count, **kwargs)
Parameters:
- axis: the axis along which the function is applied. The default value is 'index' or 0, meaning sum() adds up the values of all rows in each column. Set it to 'columns' or 1 to find the sum of each row instead.
- skipna: a boolean parameter that determines whether to exclude null or N/A values when computing the result. The default value is True.
- level: in a multi-index DataFrame, this parameter tells sum() to add up values along a particular level. Its default value is None, and you can assign a level name or number. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0, where groupby(level=...).sum() takes its place.
- numeric_only: a boolean parameter that determines whether to include only numeric (float, int, and boolean) columns. In older pandas versions the default value is None, which tells sum() to try every column first and fall back to numeric columns only if that fails. Since pandas 2.0 the default value is False.
- min_count: the required number of non-N/A values. The default value is 0. When fewer than min_count valid values are present, the function sum() returns an N/A value.
- **kwargs: other keyword arguments that you want to pass to the function.
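To make the axis, skipna, and min_count parameters more concrete, here is a minimal sketch. The DataFrame and its column names are made up for illustration and are not taken from the datasets used later in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not taken from homes.csv or mlb_players.csv.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [10.0, np.nan, np.nan],
})

col_sums = df.sum()                 # axis=0 (default): one sum per column, NaN skipped
row_sums = df.sum(axis=1)           # axis=1: one sum per row
strict = df.sum(skipna=False)       # a single NaN in a column makes its sum NaN
at_least_two = df.sum(min_count=2)  # fewer than 2 non-NA values gives NaN

print(col_sums, row_sums, strict, at_least_two, sep="\n\n")
```

Note that a row or column made up entirely of N/A values sums to 0 with the default min_count=0. Raise min_count if you would rather get an N/A value in that case.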
Remember that pandas.DataFrame.sum() typically returns a pandas Series. But when a level is specified for a multi-index DataFrame, a DataFrame is returned.
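If you are on pandas 2.0 or later, sum() no longer accepts the level parameter. A rough equivalent is to group by the index level first and then sum. The sketch below assumes a small, made-up multi-index DataFrame, and the team and player labels are hypothetical.

```python
import pandas as pd

# Hypothetical multi-index data; the labels and weights are made up.
index = pd.MultiIndex.from_tuples(
    [("BAL", "p1"), ("BAL", "p2"), ("CWS", "p3")],
    names=["Team", "Player"],
)
df = pd.DataFrame({"Weight": [180, 215, 188]}, index=index)

# Group rows by the "Team" level, then sum within each group.
# The result is a DataFrame with one row per team.
per_team = df.groupby(level="Team").sum()
print(per_team)
```

As with the old level parameter, the result here is a DataFrame rather than a Series.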
To demonstrate the capabilities of the function pandas.DataFrame.sum(), we are going to use two DataFrames as examples. One of them stores basic information about several home sales, and the other is a database of some Major League Baseball (MLB) players.
We need to import the existing CSV files into DataFrames first:
import pandas as pd
df1 = pd.read_csv('homes.csv')
df2 = pd.read_csv('mlb_players.csv')
When you invoke the function sum() on the entire first DataFrame like this, it will return the sum of every column.
>>> df1.sum()
Sell 1668
List 1752
Rooms 77
Beds 39
Baths 20
Age 256
Taxes 34561
dtype: int64
Remember that the return object is a Series.
>>> type(df1.sum())
<class 'pandas.core.series.Series'>
As every column in the first dataset is numerical, there should be no unexpected behavior when calling sum() on all of them. This is not the case for the second DataFrame, which has some string columns.
By default, the function still tries to calculate the sum of every column. For string columns, this means concatenating the strings, and the returned values are not only unnecessary but also hard to read:
>>> df2.sum()
Name Adam DonachiePaul KonerkoOrlando CabreraDustin...
Team BALCWSANABOSCLE
Position CatcherFirst BasemanShortstopSecond BasemanThi...
Height 286.0
Weight 950
Age 133.21
dtype: object
To ignore non-numerical columns, you can use the numeric_only parameter:
>>> df2.sum(numeric_only=True)
Height 286.00
Weight 950.00
Age 133.21
dtype: float64
The output now covers only the numerical columns, which is most likely what you want.
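Another way to get the same result is to filter the numeric columns yourself with select_dtypes() before calling sum(). The snippet below is only a sketch on a tiny, made-up stand-in for the players table, with hypothetical names and values.

```python
import pandas as pd

# Hypothetical stand-in for the MLB data; names and numbers are made up.
players = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Height": [74.0, 72.0],
    "Weight": [180, 215],
})

# Let sum() drop the non-numeric columns.
numeric_sums = players.sum(numeric_only=True)

# Or keep only the numeric columns explicitly, then sum them.
same_sums = players.select_dtypes(include="number").sum()

print(numeric_sums)
print(same_sums)
```

The second form can be handy when you want to reuse the numeric-only subset for other calculations as well.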
What if we want to calculate the sum of one or some columns, not all of them? You can select those columns by using square brackets or the loc and iloc properties.
>>> df1['Sell'].sum()
1668
>>> df1.loc[:, 'Sell'].sum()
1668
>>> df1.iloc[:,0].sum()
1668
The function sum() is now invoked over only a subset of your original DataFrame, which contains only one column. The iloc property also allows you to invoke the function sum() over a range of columns:
>>> df1.iloc[:,0:3].sum()
Sell 1668
List 1752
Rooms 77
dtype: int64
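You can also pick columns by label instead of by position. The sketch below uses a small, made-up DataFrame that borrows the column names from the home sales data, but its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the home sales data; the values are made up.
homes = pd.DataFrame({
    "Sell": [142, 175, 129],
    "List": [160, 180, 132],
    "Rooms": [10, 8, 6],
})

# A list of labels selects columns that are not next to each other.
picked = homes[["Sell", "Rooms"]].sum()

# loc accepts a label slice, which includes both endpoints.
sliced = homes.loc[:, "Sell":"List"].sum()

# Summing the resulting Series again gives a single grand total.
grand_total = homes[["Sell", "List"]].sum().sum()

print(picked, sliced, grand_total, sep="\n\n")
```

Keep in mind that a label slice with loc includes its end label, while a positional slice with iloc excludes its end position.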
Conclusion
To sum the columns of a Pandas DataFrame, you can use the function pandas.DataFrame.sum(). You can call it on a single column, on several columns, or on every column of a DataFrame.