When you work with real datasets, which are often very large, you may need the column names to carry out specific operations. Getting a list of all column names in a Pandas DataFrame makes that job much easier.
This guide will show you how to execute this process with two different approaches. We recommend you try both of them to see how they perform and which one is more suitable for your needs.
Example
For this example, we create a DataFrame with three columns, as shown below:
import pandas as pd
data = {'Name': ['Bill', 'Maria', 'David', 'James', 'Mary'],
        'Age': [32, 45, 27, 59, 37],
        'Country': ['Spain', 'Canada', 'Brazil', 'UK', 'France']
        }
df = pd.DataFrame(data)
print(df)
When you run the code above, the following DataFrame with three columns is printed:
    Name  Age Country
0   Bill   32   Spain
1  Maria   45  Canada
2  David   27  Brazil
3  James   59      UK
4   Mary   37  France
Using list(df) To Get The List Of All Column Names In A Pandas DataFrame
To use the first approach, add my_list = list(df) to the code:
import pandas as pd
data = {'Name': ['Bill', 'Maria', 'David', 'James', 'Mary'],
        'Age': [32, 45, 27, 59, 37],
        'Country': ['Spain', 'Canada', 'Brazil', 'UK', 'France']
        }
df = pd.DataFrame(data)
my_list = list(df)
print(my_list)
The list now contains the three column names:
['Name', 'Age', 'Country']
You can also confirm that the result is a list by adding print(type(my_list)) as the final line of the code:
import pandas as pd
data = {'Name': ['Bill', 'Maria', 'David', 'James', 'Mary'],
        'Age': [32, 45, 27, 59, 37],
        'Country': ['Spain', 'Canada', 'Brazil', 'UK', 'France']
        }
df = pd.DataFrame(data)
my_list = list(df)
print(my_list)
print(type(my_list))
Then, you will be able to verify that you have a list:
['Name', 'Age', 'Country']
<class 'list'>
Using df.columns.values.tolist() To Get The List Of All Column Names In A Pandas DataFrame
To use the second approach, change the assignment in the code to my_list = df.columns.values.tolist():
import pandas as pd
data = {'Name': ['Bill', 'Maria', 'David', 'James', 'Mary'],
        'Age': [32, 45, 27, 59, 37],
        'Country': ['Spain', 'Canada', 'Brazil', 'UK', 'France']
        }
df = pd.DataFrame(data)
my_list = df.columns.values.tolist()
print(my_list)
print(type(my_list))
You will now receive the list that includes the column names:
['Name', 'Age', 'Country']
<class 'list'>
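As a quick sanity check, the short sketch below confirms that both approaches return the same plain Python list for the example DataFrame. It only uses the calls already shown in this guide; the variable names from_list and from_values are ours and purely illustrative.

```python
import pandas as pd

data = {'Name': ['Bill', 'Maria', 'David', 'James', 'Mary'],
        'Age': [32, 45, 27, 59, 37],
        'Country': ['Spain', 'Canada', 'Brazil', 'UK', 'France']}
df = pd.DataFrame(data)

# Approach 1: iterating over a DataFrame yields its column labels
from_list = list(df)

# Approach 2: convert the underlying array of column labels to a list
from_values = df.columns.values.tolist()

# Both results are ordinary lists holding the same labels in the same order
print(from_list == from_values)
print(type(from_list), type(from_values))
```

Since the results match, the choice between the two comes down to readability and speed rather than output.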
Which Method Should You Pick?
Depending on your requirements, you might need to use the quicker method. So, which one is it?
Let’s use the default_timer function from the timeit module to measure how long each option takes to execute!
First, we measure the time of the my_list = list(df) approach.
from timeit import default_timer
import pandas as pd
data = {'Name': ['Bill', 'Maria', 'David', 'James', 'Mary'],
        'Age': [32, 45, 27, 59, 37],
        'Country': ['Spain', 'Canada', 'Brazil', 'UK', 'France']
        }
df = pd.DataFrame(data)
beginning = default_timer()
my_list = list(df)
ending = default_timer()
print((ending - beginning) * 1000)
Below is the measured execution time, in milliseconds:
0.011199999999988997
To obtain a better idea of the execution time, you might want to execute the code several times.
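One way to repeat the measurement, sketched here as a suggestion rather than a definitive benchmark, is timeit.repeat from the same standard-library module. The numbers of calls and trials (10,000 and 5) and the names trials and best_ms are arbitrary choices for illustration.

```python
import timeit
import pandas as pd

data = {'Name': ['Bill', 'Maria', 'David', 'James', 'Mary'],
        'Age': [32, 45, 27, 59, 37],
        'Country': ['Spain', 'Canada', 'Brazil', 'UK', 'France']}
df = pd.DataFrame(data)

# Run list(df) 10,000 times per trial, for 5 trials.
# Each entry in `trials` is the total time of one trial, in seconds.
trials = timeit.repeat(lambda: list(df), number=10_000, repeat=5)

# Take the fastest trial and convert it to average milliseconds per call
best_ms = min(trials) / 10_000 * 1000
print(best_ms)
```

Taking the minimum across trials reduces the influence of background activity on the machine, which a single start/stop measurement cannot filter out.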
Next, we measure the time of the my_list = df.columns.values.tolist() approach.
from timeit import default_timer
import pandas as pd
data = {'Name': ['Bill', 'Maria', 'David', 'James', 'Mary'],
        'Age': [32, 45, 27, 59, 37],
        'Country': ['Spain', 'Canada', 'Brazil', 'UK', 'France']
        }
df = pd.DataFrame(data)
beginning = default_timer()
my_list = df.columns.values.tolist()
ending = default_timer()
print((ending - beginning) * 1000)
In this single run, the second approach was faster than the first one (time in milliseconds):
0.005499999999991623
Keep in mind that the time to run these approaches may change depending on the Python/Pandas version and your device.
Thus, it would be best to test on your device and then draw a conclusion for the most accurate result.
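If you want to run that test yourself, the sketch below compares both approaches side by side. It assumes a hypothetical wide DataFrame, wide_df, with 1,000 made-up column names, and a small helper, best_ms, that we define here for illustration; neither comes from the example above, and the results will differ from machine to machine.

```python
import timeit
import pandas as pd

# Hypothetical wide DataFrame: 5 rows and 1,000 columns named col_0 ... col_999
columns = [f'col_{i}' for i in range(1000)]
wide_df = pd.DataFrame(0, index=range(5), columns=columns)

def best_ms(func, number=1000, repeat=5):
    """Return the best average time per call of func, in milliseconds."""
    trials = timeit.repeat(func, number=number, repeat=repeat)
    return min(trials) / number * 1000

time_list = best_ms(lambda: list(wide_df))
time_values = best_ms(lambda: wide_df.columns.values.tolist())

print(f'list(df): {time_list:.6f} ms per call')
print(f'df.columns.values.tolist(): {time_values:.6f} ms per call')
```

Swapping wide_df for your own DataFrame gives a comparison that reflects your actual data and environment.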
Conclusion
After reading this guide, you now know how to get a list of all column names in a Pandas DataFrame. Both approaches can help you a lot when dealing with large datasets, making your job more bearable.
Notice that the execution time of these two methods differs. We suggest you give both of them a try, then pick the one that fits your needs best. Once you have the list of all the columns you need, you may want to get the number of rows in the DataFrame as well. Check out our thorough guide on getting the number of rows now.