This tutorial shows how to concatenate two DataFrames in pandas. Combining them into a single DataFrame gives you more control over your data analysis.
Pandas Concat Two DataFrames
The function pandas.concat() concatenates several pandas objects, including two or more DataFrames. It joins them along one axis and can apply set logic (union or intersection of the labels) on the other axes.
Syntax: pandas.concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
Important parameters you need to be aware of:
- objs: this should be a sequence or mapping of DataFrames. The function also accepts Series objects. Any None object in it is dropped silently. However, when every object is None, the function raises a ValueError.
- axis: this parameter determines the axis along which the concatenation runs. The default value is 'index'/0, which stacks rows. Set it to 'columns'/1 to place the objects side by side.
- ignore_index: it controls whether the result keeps the existing index values along the concatenation axis. If True, the function concat() discards them and labels that axis 0, 1, …, n - 1. You should set it to True when the index along the concatenation axis doesn't carry any meaningful information. Keep in mind that this parameter only concerns the concatenation axis. The labels on the other axes are kept intact.
- join: it determines how to handle labels on the other axis that don't appear in every object. The default is 'outer', meaning the function concat() keeps all of them (the union). With 'inner', it keeps only the labels shared by every object (the intersection).
- sort: when join='outer', this boolean parameter determines whether the non-concatenation axis should be sorted. It has no effect when join='inner'. The default is False (will not sort).
- verify_integrity: when True, it checks whether the new concatenation axis contains duplicate labels and raises a ValueError if it does. The default is False.
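The parameters above can be tried out on small throwaway data. The following is only a minimal sketch: the frames left and right and their values are made up for illustration and are not part of this tutorial's CSV files.

```python
import pandas as pd

# Two tiny, hypothetical frames (not the tutorial's data).
left = pd.DataFrame({"b": [1, 2], "a": [3, 4]})                 # index 0, 1
right = pd.DataFrame({"c": [5, 6], "a": [7, 8]}, index=[1, 2])  # index 1, 2

# axis=0 (default): stack rows; join='outer' keeps the union of columns.
rows_outer = pd.concat([left, right])

# sort=True sorts the non-concatenation axis (here, the column labels).
rows_sorted = pd.concat([left, right], sort=True)

# join='inner' keeps only the columns present in every frame.
rows_inner = pd.concat([left, right], join="inner")

# axis=1: place objects side by side, aligned on the row index.
side_by_side = pd.concat([left, right[["c"]]], axis=1)

# verify_integrity=True raises ValueError when the concatenation axis
# would contain duplicate labels (label 1 appears in both frames).
try:
    pd.concat([left, right], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True

print(list(rows_sorted.columns))  # ['a', 'b', 'c']
print(list(rows_inner.columns))   # ['a']
print(side_by_side.shape)         # (3, 3)
print(overlap_detected)           # True
```

Wrapping the verify_integrity call in try/except is just one way to surface the duplicate-label check without stopping the script.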
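The function pandas.concat() returns a DataFrame when it concatenates along the columns or when at least one DataFrame is among the inputs. When every input is a Series and the concatenation runs along the index, it returns a Series.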
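You can check the return type yourself with a few small objects. This is a sketch with made-up Series and DataFrame values, not data from this tutorial.

```python
import pandas as pd

# Hypothetical inputs for illustration only.
s1 = pd.Series([1, 2], name="x")
s2 = pd.Series([3, 4], name="y")
df = pd.DataFrame({"x": [5, 6]})

only_series = pd.concat([s1, s2])                # all Series, along the index
series_as_columns = pd.concat([s1, s2], axis=1)  # along the columns
mixed = pd.concat([df, s1])                      # at least one DataFrame

print(type(only_series).__name__)        # Series
print(type(series_as_columns).__name__)  # DataFrame
print(type(mixed).__name__)              # DataFrame
```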
We use three similar DataFrames to show you how to use this function to manipulate your data. These DataFrames provide information on several home sales. We will need to import them from CSV files first:
import pandas as pd
df1 = pd.read_csv('homes-1.csv')
df2 = pd.read_csv('homes-2.csv')
df3 = pd.read_csv('homes-3.csv')
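If you don't have these CSV files at hand, you can build equivalent DataFrames directly in code. The sketch below assumes each file holds exactly the rows that appear in the outputs printed later in this tutorial. The actual file contents aren't shown here, so treat this as a stand-in.

```python
import pandas as pd

columns = ["Sell", "List", "Rooms", "Beds", "Baths", "Age", "Taxes"]

# Rows copied from the outputs printed in this tutorial.
df1 = pd.DataFrame(
    [
        [142, 160, 10, 5, 3, 60, 3167],
        [175, 180, 8, 4, 1, 12, 4033],
        [129, 132, 6, 3, 1, 41, 1471],
        [138, 140, 7, 3, 1, 22, 3204],
        [232, 240, 8, 4, 3, 5, 3613],
    ],
    columns=columns,
)
df2 = pd.DataFrame(
    [
        [135, 140, 7, 4, 3, 9, 3028],
        [150, 160, 8, 4, 3, 18, 3131],
        [207, 225, 8, 4, 2, 16, 5158],
        [271, 285, 10, 5, 2, 30, 5702],
        [89, 90, 5, 3, 1, 43, 2054],
    ],
    columns=columns,
)
# The third DataFrame has no Taxes column.
df3 = pd.DataFrame(
    [
        [166, 170, 9, 4, 2, 37],
        [136, 140, 7, 3, 1, 22],
        [148, 160, 7, 3, 2, 13],
        [151, 153, 8, 4, 2, 24],
        [180, 190, 9, 4, 2, 10],
    ],
    columns=columns[:-1],
)

print(df1.shape, df2.shape, df3.shape)  # (5, 7) (5, 7) (5, 6)
```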
To concatenate two DataFrames, pass them to the function concat() in a list:
>>> pd.concat([df1, df2])
   Sell  List  Rooms  Beds  Baths  Age  Taxes
0   142   160     10     5      3   60   3167
1   175   180      8     4      1   12   4033
2   129   132      6     3      1   41   1471
3   138   140      7     3      1   22   3204
4   232   240      8     4      3    5   3613
0   135   140      7     4      3    9   3028
1   150   160      8     4      3   18   3131
2   207   225      8     4      2   16   5158
3   271   285     10     5      2   30   5702
4    89    90      5     3      1   43   2054
As you can see, the returned DataFrame contains every row of the original DataFrames. The only oddity is that the index labels 0 to 4 appear twice in the result.
The reason is that ignore_index defaults to False, so the function concat() reuses the index labels of the original DataFrames. If you want a fresh index in the final result, pass ignore_index=True:
>>> pd.concat([df1, df2], ignore_index=True)
Sell List Rooms Beds Baths Age Taxes
0 …
1 …
2 …
3 …
4 …
5 …
6 …
7 …
8 …
9 …
Each row now has a new index label, assigned automatically from 0 to 9.
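To see why repeated index labels matter, compare label lookups with and without ignore_index. The sketch below uses two small hypothetical frames, a and b, that reuse the first two rows and columns of the outputs above.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (only two rows and two columns).
a = pd.DataFrame({"Sell": [142, 175], "List": [160, 180]})
b = pd.DataFrame({"Sell": [135, 150], "List": [140, 160]})

kept = pd.concat([a, b])                      # original labels reused
fresh = pd.concat([a, b], ignore_index=True)  # labels renumbered

print(kept.index.tolist())   # [0, 1, 0, 1]
print(fresh.index.tolist())  # [0, 1, 2, 3]

# With repeated labels, a label lookup returns every matching row.
print(len(kept.loc[0]))      # 2

# After ignore_index=True, each label identifies exactly one row.
print(fresh.loc[0, "Sell"])  # 142
```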
If two DataFrames don't have identical columns (such as df1 and df3), the function concat() keeps every column that appears in either of them by default and fills the missing values with NaN.
>>> pd.concat([df1, df3], ignore_index=True)
   Sell  List  Rooms  Beds  Baths  Age   Taxes
0   142   160     10     5      3   60  3167.0
1   175   180      8     4      1   12  4033.0
2   129   132      6     3      1   41  1471.0
3   138   140      7     3      1   22  3204.0
4   232   240      8     4      3    5  3613.0
5   166   170      9     4      2   37     NaN
6   136   140      7     3      1   22     NaN
7   148   160      7     3      2   13     NaN
8   151   153      8     4      2   24     NaN
9   180   190      9     4      2   10     NaN
As you can see, rows 5-9 come from the third DataFrame, which has no Taxes column. The function concat() fills those cells with NaN, which is also why the Taxes values are now shown as floating-point numbers. To drop this column instead, you can assign the value 'inner' to the parameter join.
>>> pd.concat([df1, df3], ignore_index=True, join='inner')
   Sell  List  Rooms  Beds  Baths  Age
0   142   160     10     5      3   60
1   175   180      8     4      1   12
2   129   132      6     3      1   41
3   138   140      7     3      1   22
4   232   240      8     4      3    5
5   166   170      9     4      2   37
6   136   140      7     3      1   22
7   148   160      7     3      2   13
8   151   153      8     4      2   24
9   180   190      9     4      2   10
The final result now has no Taxes column, as that column isn't present in every DataFrame.
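The difference between the two join modes can also be checked programmatically. This is a minimal sketch using two small hypothetical frames, with_taxes and without_taxes, that mimic df1 and df3 with fewer rows and columns.

```python
import pandas as pd

# Hypothetical, shortened stand-ins for df1 and df3.
with_taxes = pd.DataFrame({"Sell": [142, 175], "Taxes": [3167, 4033]})
without_taxes = pd.DataFrame({"Sell": [166, 136]})

frames = [with_taxes, without_taxes]
outer = pd.concat(frames, ignore_index=True)                # default join='outer'
inner = pd.concat(frames, ignore_index=True, join="inner")

# 'outer' keeps the union of columns and fills the gaps with NaN,
# which turns the integer Taxes column into floats.
print(list(outer.columns))               # ['Sell', 'Taxes']
print(outer["Taxes"].dtype)              # float64
print(int(outer["Taxes"].isna().sum()))  # 2

# 'inner' keeps only the columns shared by every frame.
print(list(inner.columns))               # ['Sell']
```

If you need the Taxes column and want to avoid NaN, you would have to supply the missing values yourself before or after concatenating. The concat() function doesn't invent them.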
Conclusion
You can concatenate two DataFrames in pandas with the function concat(). With many parameters for fine control over the operation, it can combine two or more DataFrames into a unified dataset.