You can use the DataFrame.replace() method to replace values in a pandas DataFrame. This tutorial shows how to apply it in several scenarios.
How To Replace Values in Pandas DataFrame
DataFrame.replace() has the following syntax:
DataFrame.replace(to_replace, value, inplace, limit, regex, method)
The method searches the DataFrame for the values given in to_replace and substitutes the data given in value. Unlike the label-based and position-based indexers such as .loc and .iloc, this approach doesn’t require you to specify exactly which entries to update.
In addition to those two main arguments, you may need other parameters:
- inplace: this boolean parameter controls whether the method modifies the original DataFrame directly. Its default value is False, meaning the replacement happens on a copy and the original DataFrame is left unchanged.
- regex: this boolean parameter determines whether the method should interpret the given to_replace and value arguments as regular expressions.
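With the default inplace=False, the method returns a new DataFrame that holds the replaced values. With inplace=True, it modifies the original DataFrame instead; in pandas releases up to 2.x the call then returns None, so don't rely on its return value.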
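The difference between the two modes is easier to see on a tiny example. This is a minimal sketch: the DataFrame, its column names, and the variable names are made up for illustration and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the inplace parameter.
original = pd.DataFrame({
    "status": ["ok", "n/a", "ok"],
    "note": ["n/a", "late", "n/a"],
})

# Default (inplace=False): a new DataFrame is returned,
# and `original` keeps its values.
replaced = original.replace("n/a", "missing")

# inplace=True: the DataFrame the method is called on is modified.
# The return value is ignored here on purpose.
modified = original.copy()
modified.replace("n/a", "missing", inplace=True)

print(original)
print(replaced)
print(modified)
```

Assigning the returned DataFrame to a name, as in the first call, is usually the easier style to follow, because the original data stays available for comparison.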
To demonstrate the capabilities of replace(), we will use data imported from a CSV file. It contains information about some Major League Baseball players, such as their name, team, and position.
import pandas as pd
df = pd.read_csv('mlb_players.csv')
df
Suppose the data you received contains errors and you need to correct them. Instead of modifying the source file, you can process the DataFrame directly with replace().
For instance, what if a team name was entered as ‘LA’ and should be ‘LAD’? A single replace() call that maps ‘LA’ to ‘LAD’ does the trick.
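A call of that kind might look like the sketch below. The contents of mlb_players.csv are not shown here, so the DataFrame, the column names ("Name", "Team", "Position"), and the player names are placeholders.

```python
import pandas as pd

# Placeholder data standing in for the CSV file; the real columns may differ.
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Team": ["LA", "LAA", "NYY"],
    "Position": ["Pitcher", "Catcher", "Shortstop"],
})

# Replace every cell whose value is exactly 'LA' with 'LAD'.
# Without regex=True, only whole-cell matches are replaced,
# so 'LAA' is left alone.
fixed = players.replace("LA", "LAD")
print(fixed)
```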
Some datasets aren’t that simple. The same value may appear in several columns, which can complicate the replacement.
Consider a DataFrame with two columns, ‘Quora’ and ‘ITTutoria’, whose entries are the same apart from the index positions they occupy.
If you call replace() on the whole DataFrame as above, you can get unexpected results. For instance, let’s say you want to replace ‘Python’ in the column ‘Quora’ with ‘Ruby’. A DataFrame-wide call will also replace the same entry in the column ‘ITTutoria’, which isn’t the desired result.
In this case, you need to apply replace() only to the column ‘Quora’, not to the entire DataFrame.
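# Assign the result back to the column. Calling replace(..., inplace=True) on
# df['Quora'] may act on a temporary copy and leave df unchanged.
df['Quora'] = df['Quora'].replace('Python', 'Ruby')
df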
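To make the contrast concrete, here is a small sketch that compares a DataFrame-wide call with a column-restricted one. The three-row DataFrame is a hypothetical stand-in for the tutorial's data, and the nested-dictionary form at the end is an alternative spelling added here, not something the tutorial itself uses.

```python
import pandas as pd

# Hypothetical stand-in for the two-column DataFrame.
df = pd.DataFrame({
    "Quora": ["Python", "C", "C++"],
    "ITTutoria": ["C++", "Python", "C"],
})

# Calling replace() on the whole DataFrame changes both columns.
everywhere = df.replace("Python", "Ruby")

# Restricting the call to one column and assigning the result back
# changes only that column.
only_quora = df.copy()
only_quora["Quora"] = only_quora["Quora"].replace("Python", "Ruby")

# A nested dict {column: {old: new}} expresses the same restriction
# in a single DataFrame-level call.
nested = df.replace({"Quora": {"Python": "Ruby"}})

print(everywhere)
print(only_quora)
```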
You can also restrict the replacement to one or more rows. This statement processes only the final three rows (positions 4 to 6) and replaces ‘C’ with ‘Rust’, leaving the entry in the second row intact.
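# Assign the result back to the same rows. Calling replace(..., inplace=True)
# on df.iloc[4:7] would act on a copy of the slice.
df.iloc[4:7] = df.iloc[4:7].replace('C', 'Rust')
df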
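The row-restricted replacement can be tried on a small example. The seven-row DataFrame below is invented for illustration; it only mirrors the layout described above, with ‘C’ in the second row and again among the final three rows.

```python
import pandas as pd

# Hypothetical seven-row DataFrame; the values are placeholders.
df = pd.DataFrame({
    "Quora": ["Python", "C", "Java", "Go", "C", "C++", "C"],
    "ITTutoria": ["C", "Python", "Go", "Java", "C++", "C", "Python"],
})

# Replace 'C' only in the rows at positions 4, 5 and 6,
# then write the result back to those same rows.
df.iloc[4:7] = df.iloc[4:7].replace("C", "Rust")
print(df)
```

Rows outside the slice, including the ‘C’ in the second row, keep their original values.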
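Thanks to regular expressions, you can replace every record that matches a certain pattern at the same time. For example, use this statement if you want to replace both ‘C’ and ‘C++’ with ‘Rust’ in the DataFrame above. Keep in mind that the pattern ^C.*$ matches any string that starts with ‘C’, so it would also replace other such entries if the data contained them: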
df.replace(to_replace=r'^C.*$', value='Rust', regex=True)
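The sketch below shows how broadly that pattern matches and how a tighter one can be written. The DataFrame and the extra ‘CSS’ entry are hypothetical, added only to show the difference; the narrower pattern is one possible choice, not the tutorial's.

```python
import pandas as pd

# Hypothetical data; 'CSS' is included to show over-matching.
langs = pd.DataFrame({"Quora": ["Python", "C", "C++", "CSS"]})

# '^C.*$' matches every string that starts with 'C'.
broad = langs.replace(to_replace=r"^C.*$", value="Rust", regex=True)

# '^C(\+\+)?$' matches only 'C' and 'C++'.
narrow = langs.replace(to_replace=r"^C(\+\+)?$", value="Rust", regex=True)

print(broad)
print(narrow)
```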
When no single regular expression fits the values you want to match, you can provide a list of entries to replace instead.
df.replace(['C', 'C++'], 'Rust')
For this DataFrame, the statement above produces the same result as the one with the regular expression.
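That equivalence can be checked directly with DataFrame.equals(). The sketch assumes a small made-up DataFrame in which ‘C’ and ‘C++’ are the only entries starting with ‘C’; with other data the two calls could differ.

```python
import pandas as pd

# Hypothetical data where only 'C' and 'C++' start with 'C'.
langs = pd.DataFrame({
    "Quora": ["Python", "C", "C++"],
    "ITTutoria": ["C++", "Java", "C"],
})

# Replace by an explicit list of values.
by_list = langs.replace(["C", "C++"], "Rust")

# Replace by regular expression.
by_regex = langs.replace(to_replace=r"^C.*$", value="Rust", regex=True)

# True when both results hold the same values.
same = by_list.equals(by_regex)
print(same)
```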
You can replace values in a pandas DataFrame with the DataFrame.replace() method. It can be applied to the whole DataFrame or just a subset of it. In addition to string literals, the method also accepts regular expressions as its search pattern.