Selecting columns by name or index from a pandas DataFrame should not be difficult. This tutorial covers the common ways of doing so, so you can get the data you need.
Pandas Select Columns by Name or Index
You can select columns from a pandas DataFrame by using square brackets [] (for simple selections) or properties like loc and iloc (for more advanced selections).
In this guide, we are going to use a DataFrame created from a CSV file. It represents ten home sale statistics, such as the selling price, asking price, number of rooms, and taxes.
You can import the CSV with the read_csv() function from the pandas module.
>>> import pandas as pd
>>> sales = pd.read_csv("homes.csv")
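If you do not have homes.csv at hand, you can follow along with a small stand-in built in memory. The sketch below is only an approximation: it assumes just the two columns used in this guide, Sell and List, and copies the first four rows from the outputs shown here. The real file has more rows and columns.

```python
import io

import pandas as pd

# Stand-in for homes.csv (an assumption, not the real file): only the
# Sell and List columns, with the first four rows printed in this guide.
csv_text = """Sell,List
142,160.0
175,180.0
129,132.0
138,140.0
"""

# read_csv() accepts a file-like object as well as a path
sales = pd.read_csv(io.StringIO(csv_text))

print(sales.columns.tolist())
print(sales.dtypes)
```

Every command in the rest of this guide that touches only Sell and List should also run against this stand-in, although the printed rows will be limited to the four above.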
Using Square Brackets
What if you are interested in the listing prices of every home? In that case you should select the entire column with the label “List”, a task that can easily be done with square brackets.
>>> sales["List"]
0 160.0
1 180.0
2 132.0
3 140.0
Name: List, dtype: float64
The returned object is a Series, which is also the data type of columns in a pandas DataFrame. You can verify this manually:
>>> type(sales["List"])
<class 'pandas.core.series.Series'>
If you want to get a DataFrame instead, put the labels inside two sets of square brackets:
>>> type(sales[["List"]])
<class 'pandas.core.frame.DataFrame'>
Using the same method, you can select multiple columns from a pandas DataFrame by providing a list of labels you want to extract:
>>> sales[["Sell", "List"]]
Sell List
0 142 160.0
1 175 180.0
2 129 132.0
3 138 140.0
Since the returned object is now a 2-dimensional subset of the original DataFrame, it is also a DataFrame:
>>> type(sales[["Sell", "List"]])
<class 'pandas.core.frame.DataFrame'>
You can also select certain columns while filtering specific rows based on those columns at the same time. For example, this compound command keeps only the rows whose List value is greater than 150, removing every row with a value less than or equal to 150:
>>> sales[sales["List"] > 150]["List"]
0 160.0
1 180.0
4 240.0
Name: List, dtype: float64
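The compound command above does three things in one line. A minimal sketch that splits it into separate steps may make this easier to see. The DataFrame here is a stand-in that holds only the first four rows shown in this guide, and the names mask, filtered, and expensive are made up for illustration.

```python
import pandas as pd

# Stand-in data (an assumption): rows 0-3 as printed earlier in this guide
sales = pd.DataFrame(
    {"Sell": [142, 175, 129, 138], "List": [160.0, 180.0, 132.0, 140.0]}
)

mask = sales["List"] > 150      # boolean Series: one True/False per row
filtered = sales[mask]          # DataFrame with only the rows where mask is True
expensive = filtered["List"]    # Series: the List column of those rows

print(mask.tolist())       # [True, True, False, False]
print(expensive.tolist())  # [160.0, 180.0]
```

The filtered result keeps the original row labels, which is why the output above jumps from 1 to 4.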
Using DataFrame.loc[]
The property DataFrame.loc allows you to select columns (and rows as well if you wish to) by labels or boolean arrays.
The simplest use case is to access a single column by providing one label:
>>> sales.loc[:, "List"]
0 160.0
1 180.0
2 132.0
Name: List, dtype: float64
The colon (:) argument above indicates that you want to extract every row. In order to select multiple columns from a DataFrame, you will need to provide their labels in a list:
>>> sales.loc[:, ["Sell", "List"]]
Sell List
0 142 160.0
1 175 180.0
2 129 132.0
Like the square brackets, loc also returns a Series or DataFrame depending on whether you select one or multiple columns. You can filter rows with loc as well:
>>> sales[sales.loc[:, "List"] > 150].loc[:, "List"]
0 160.0
1 180.0
4 240.0
Name: List, dtype: float64
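Since loc takes a row selector and a column selector together, the same filter can also be written as a single loc call. The sketch below compares both forms on a stand-in DataFrame (an assumption that holds only the first four rows shown in this guide). The variable names are illustrative.

```python
import pandas as pd

# Stand-in data (an assumption): rows 0-3 as printed earlier in this guide
sales = pd.DataFrame(
    {"Sell": [142, 175, 129, 138], "List": [160.0, 180.0, 132.0, 140.0]}
)

# Chained form used above: filter the rows first, then pick the column
chained = sales[sales.loc[:, "List"] > 150].loc[:, "List"]

# Single call: boolean mask as the row selector, label as the column selector
single = sales.loc[sales["List"] > 150, "List"]

# Passing a list of labels instead returns a DataFrame
both_columns = sales.loc[sales["List"] > 150, ["Sell", "List"]]

print(single.equals(chained))  # True
print(both_columns.shape)      # (2, 2)
```

The single call is usually easier to read, and it is also the form to use if you later want to assign new values to the selection.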
Using DataFrame.iloc[]
While loc selects columns by labels, the property iloc lets you pick a subset of a DataFrame based on integer positions, which start at 0.
>>> sales.iloc[:, [1]]
List
0 160.0
1 180.0
2 132.0
The 1 argument here isn’t a label but the index of the List column. You can add other indices to the list to select more columns:
>>> sales.iloc[:, [0, 1]]
Sell List
0 142 160.0
1 175 180.0
2 129 132.0
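Remember that because the position is passed inside a list here, iloc[] returns a DataFrame, not a Series, even though just one column is selected (a bare integer, as in iloc[:, 1], would return a Series instead):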
>>> type(sales.iloc[:, [1]])
<class 'pandas.core.frame.DataFrame'>
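The return type of iloc depends on what you pass as the column selector. Here is a minimal sketch on a stand-in DataFrame (an assumption that holds only the first four rows shown in this guide, with Sell at position 0 and List at position 1). It also uses columns.get_loc() to look up a position from a label.

```python
import pandas as pd

# Stand-in data (an assumption): rows 0-3 as printed earlier in this guide
sales = pd.DataFrame(
    {"Sell": [142, 175, 129, 138], "List": [160.0, 180.0, 132.0, 140.0]}
)

as_series = sales.iloc[:, 1]     # bare integer -> Series
as_frame = sales.iloc[:, [1]]    # list of integers -> DataFrame
first_two = sales.iloc[:, 0:2]   # slice (end is exclusive) -> DataFrame

# Find the integer position of a column from its label
list_pos = sales.columns.get_loc("List")

print(type(as_series).__name__)    # Series
print(type(as_frame).__name__)     # DataFrame
print(first_two.columns.tolist())  # ['Sell', 'List']
print(list_pos)                    # 1
```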
Conclusion
Like converting columns to lists, selecting columns by name or index in pandas is an easy task with many options. You can use square brackets or the more advanced properties loc[] and iloc[], all of which return a Series or DataFrame object.