You may already know how to slice lists in Python. That makes slicing columns in a Pandas DataFrame easier to learn, since the two work in similar ways. The examples below show how it is done.
How To Slice Columns In Pandas DataFrame
To slice columns, you first need to grasp the basics of how Pandas indexes its DataFrames. The library supports three methods of indexing in multi-axis datasets, allowing for flexible subset selections. This guide covers the two that are used for slicing columns: .loc[] and .iloc[].
DataFrame.loc[]
This property accesses a group of rows and columns by label. It accepts a single label, a list of labels, a slice of labels, or a boolean array.
We are going to use a DataFrame containing information about several websites as an example.
import pandas as pd
headers = ['Site', 'Ranking', 'Type', 'Top Language']
data = [
    ['ITTutoria', 1, 'Tutorials', 'Python'],
    ['Stack Overflow', 2, 'Q&A', 'JavaScript'],
    ['Quora', 3, 'Q&A', None],
    ['Reddit', 4, 'Forum', None]
]
df = pd.DataFrame(data, columns=headers)
df
Output:
             Site  Ranking       Type  Top Language
0       ITTutoria        1  Tutorials        Python
1  Stack Overflow        2        Q&A    JavaScript
2           Quora        3        Q&A          None
3          Reddit        4      Forum          None
The simplest use of .loc[] is to select a single label. For instance, this statement returns the column ‘Site’ from the DataFrame above:
>>> df.loc[:, 'Site']
0         ITTutoria
1    Stack Overflow
2             Quora
3            Reddit
We need to put the column name after the comma because the property .loc[] can deal with rows as well: it interprets anything before the comma as a request for rows and anything after it as a request for columns. The lone colon before the comma means "every row".
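The description of .loc[] above also mentions lists of labels and boolean arrays. As a minimal sketch on the same example DataFrame (the variable names by_list, mask, and by_mask are only illustrative), both go in the column position after the comma:

```python
import pandas as pd

headers = ['Site', 'Ranking', 'Type', 'Top Language']
data = [
    ['ITTutoria', 1, 'Tutorials', 'Python'],
    ['Stack Overflow', 2, 'Q&A', 'JavaScript'],
    ['Quora', 3, 'Q&A', None],
    ['Reddit', 4, 'Forum', None],
]
df = pd.DataFrame(data, columns=headers)

# A list of labels picks exactly those columns, in the order given.
by_list = df.loc[:, ['Site', 'Type']]

# A boolean list with one entry per column keeps the columns marked True.
mask = [True, False, False, True]
by_mask = df.loc[:, mask]

print(list(by_list.columns))  # ['Site', 'Type']
print(list(by_mask.columns))  # ['Site', 'Top Language']
```

Neither form is a slice, but they are handy when the columns you want are not next to each other.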
Additionally, this property also accepts a slice, which is the application you are looking for. To slice with .loc, provide the labels of the start and stop columns, separated by a colon. The property .loc will return every column between them, including those two columns.
For example, this statement selects columns from ‘Ranking’ to ‘Top Language’:
>>> df.loc[:, 'Ranking':'Top Language']
   Ranking       Type  Top Language
0        1  Tutorials        Python
1        2        Q&A    JavaScript
2        3        Q&A          None
3        4      Forum          None
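The inclusive stop is the main difference from slicing a Python list, so it is worth seeing side by side. A small sketch, assuming the same example data (list_slice and loc_slice are illustrative names):

```python
import pandas as pd

headers = ['Site', 'Ranking', 'Type', 'Top Language']
data = [
    ['ITTutoria', 1, 'Tutorials', 'Python'],
    ['Stack Overflow', 2, 'Q&A', 'JavaScript'],
    ['Quora', 3, 'Q&A', None],
    ['Reddit', 4, 'Forum', None],
]
df = pd.DataFrame(data, columns=headers)

start = headers.index('Ranking')  # position 1
stop = headers.index('Type')      # position 2

# Python list slicing stops before the stop position.
list_slice = headers[start:stop]

# .loc slicing includes the stop label.
loc_slice = list(df.loc[:, 'Ranking':'Type'].columns)

print(list_slice)  # ['Ranking']
print(loc_slice)   # ['Ranking', 'Type']
```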
Leaving out one side of the colon lets you slice from the start to a certain column, or from that column to the end of the DataFrame:
>>> df.loc[:, :'Ranking']
             Site  Ranking
0       ITTutoria        1
1  Stack Overflow        2
2           Quora        3
3          Reddit        4
>>> df.loc[:, 'Ranking':]
   Ranking       Type  Top Language
0        1  Tutorials        Python
1        2        Q&A    JavaScript
2        3        Q&A          None
3        4      Forum          None
The examples above show how the colon marks the range of columns to keep. You can even add a step when slicing columns, similar to how you can slice a Python list:
>>> df.loc[:, 'Site':'Top Language':2]
             Site       Type
0       ITTutoria  Tutorials
1  Stack Overflow        Q&A
2           Quora        Q&A
3          Reddit      Forum
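The statement above tells the property .loc to select every other column, starting with ‘Site’ and going no further than ‘Top Language’. The column ‘Top Language’ itself is left out because the step of 2 jumps from ‘Type’ past the end of the range.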
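Which columns a stepped slice keeps depends on where it starts. As a rough illustration on the same example data (from_site and from_ranking are illustrative names), shifting the start label by one column changes the result:

```python
import pandas as pd

headers = ['Site', 'Ranking', 'Type', 'Top Language']
data = [
    ['ITTutoria', 1, 'Tutorials', 'Python'],
    ['Stack Overflow', 2, 'Q&A', 'JavaScript'],
    ['Quora', 3, 'Q&A', None],
    ['Reddit', 4, 'Forum', None],
]
df = pd.DataFrame(data, columns=headers)

# Every second column, counting from 'Site'.
from_site = df.loc[:, 'Site':'Top Language':2]

# Every second column, counting from 'Ranking' to the end.
from_ranking = df.loc[:, 'Ranking'::2]

print(list(from_site.columns))     # ['Site', 'Type']
print(list(from_ranking.columns))  # ['Ranking', 'Top Language']
```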
If you want to pick only certain rows when slicing columns, you can pass them straight into the property .loc before the comma. Row slices are label-based and inclusive too, so this statement selects only the name and ranking of the sites in the rows labeled 1 and 2, that is, the second and third sites:
>>> df.loc[1:2, 'Site':'Ranking']
             Site  Ranking
1  Stack Overflow        2
2           Quora        3
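Since .loc reads the row part as labels and includes both ends, the row slice decides which sites appear. A minimal sketch, assuming the default row labels 0 to 3 of the example DataFrame (first_two and second_and_third are illustrative names):

```python
import pandas as pd

headers = ['Site', 'Ranking', 'Type', 'Top Language']
data = [
    ['ITTutoria', 1, 'Tutorials', 'Python'],
    ['Stack Overflow', 2, 'Q&A', 'JavaScript'],
    ['Quora', 3, 'Q&A', None],
    ['Reddit', 4, 'Forum', None],
]
df = pd.DataFrame(data, columns=headers)

# Row labels 0 and 1: the first two sites.
first_two = df.loc[0:1, 'Site':'Ranking']

# Row labels 1 and 2: the second and third sites.
second_and_third = df.loc[1:2, 'Site':'Ranking']

print(first_two['Site'].tolist())         # ['ITTutoria', 'Stack Overflow']
print(second_and_third['Site'].tolist())  # ['Stack Overflow', 'Quora']
```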
DataFrame.iloc[]
Like .loc, .iloc is also a property of DataFrames in Pandas. But while .loc is label-based, .iloc selects data by position. You will need to use integer positions with this property.
This statement returns only the columns from ‘Site’ to ‘Type’, which sit at positions 0 to 2:
>>> df.iloc[:, 0:3]
             Site  Ranking       Type
0       ITTutoria        1  Tutorials
1  Stack Overflow        2        Q&A
2           Quora        3        Q&A
3          Reddit        4      Forum
Because DataFrames in Pandas are zero-indexed, the first column has the index 0, the second 1, and so on. Like .loc[], the first part of the property indicates row selections. To select every row, we use a lone colon before the comma.
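If you are unsure which position a column has, you can list the positions next to the labels. This is only a quick sketch on the example data, and positions and first_three are illustrative names:

```python
import pandas as pd

headers = ['Site', 'Ranking', 'Type', 'Top Language']
data = [
    ['ITTutoria', 1, 'Tutorials', 'Python'],
    ['Stack Overflow', 2, 'Q&A', 'JavaScript'],
    ['Quora', 3, 'Q&A', None],
    ['Reddit', 4, 'Forum', None],
]
df = pd.DataFrame(data, columns=headers)

# Map each column label to its zero-based position.
positions = {label: position for position, label in enumerate(df.columns)}
print(positions)
# {'Site': 0, 'Ranking': 1, 'Type': 2, 'Top Language': 3}

# Positions 0, 1 and 2 are the columns that df.iloc[:, 0:3] returns.
first_three = list(df.iloc[:, 0:3].columns)
print(first_three)  # ['Site', 'Ranking', 'Type']
```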
You can also add a step between columns:
>>> df.iloc[:, 0:3:2]
             Site       Type
0       ITTutoria  Tutorials
1  Stack Overflow        Q&A
2           Quora        Q&A
3          Reddit      Forum
Or leave out the start or the stop so that the slice begins at the first column or runs to the final one:
>>> df.iloc[:, :2]
             Site  Ranking
0       ITTutoria        1
1  Stack Overflow        2
2           Quora        3
3          Reddit        4
>>> df.iloc[:, 1:]
   Ranking       Type  Top Language
0        1  Tutorials        Python
1        2        Q&A    JavaScript
2        3        Q&A          None
3        4      Forum          None
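Remember that, unlike loc, iloc follows the same rule as Python lists: the slice stops just before the specified stop position instead of including it. That is why df.iloc[:, :2] returns the columns at positions 0 and 1 only.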
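To see the two rules line up, you can translate labels into positions and add 1 to the stop. The sketch below uses Index.get_loc for the translation and assumes the column labels are unique, as they are in the example data (by_label and by_position are illustrative names):

```python
import pandas as pd

headers = ['Site', 'Ranking', 'Type', 'Top Language']
data = [
    ['ITTutoria', 1, 'Tutorials', 'Python'],
    ['Stack Overflow', 2, 'Q&A', 'JavaScript'],
    ['Quora', 3, 'Q&A', None],
    ['Reddit', 4, 'Forum', None],
]
df = pd.DataFrame(data, columns=headers)

# Look up the integer position of each label.
start = df.columns.get_loc('Ranking')      # 1
stop = df.columns.get_loc('Top Language')  # 3

# .loc includes the stop label ...
by_label = df.loc[:, 'Ranking':'Top Language']

# ... so the matching .iloc slice needs stop + 1.
by_position = df.iloc[:, start:stop + 1]

print(list(by_label.columns))     # ['Ranking', 'Type', 'Top Language']
print(list(by_position.columns))  # ['Ranking', 'Type', 'Top Language']
```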
Conclusion
You can slice columns in a Pandas DataFrame with two properties: loc and iloc. They use the labels and the positions of the columns, respectively, to determine the start and end points of the slice. You can look more into data selection in Pandas with this guide.