You can use the method DataFrame.replace() to replace values in a Pandas DataFrame. This tutorial shows how to use this method in various scenarios.
How To Replace Values in Pandas DataFrame
DataFrame.replace()
The method DataFrame.replace() has the following syntax:
DataFrame.replace(to_replace, value, inplace, limit, regex, method)
Basically, it searches for to_replace and replaces every match with the data given in value. Unlike the label-based and position-based indexers .loc and .iloc, this approach doesn't require you to know exactly where the entries you want to update are located.
In addition to those two arguments, you may need to use other parameters:
- inplace: this boolean parameter controls whether the method modifies the original DataFrame directly. Its default value is False, meaning the original is left untouched and the replacement happens in a new DataFrame.
- regex: this boolean parameter determines whether the method should interpret the given to_replace and value arguments as regular expressions.
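By default, the method returns a new DataFrame with the replacements applied. With inplace=True, the original DataFrame is modified instead, and in pandas 1.x and 2.x the call returns None, so don't rely on the return value in that case.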
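The effect of inplace can be checked with a small sketch. The one-column DataFrame here is made up for illustration and is not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the inplace parameter.
df = pd.DataFrame({"team": ["LA", "NYY", "LA"]})

# Default (inplace=False): a new DataFrame comes back, df itself is unchanged.
replaced = df.replace("LA", "LAD")

# inplace=True: the object the method is called on is modified directly.
modified = df.copy()
modified.replace("LA", "LAD", inplace=True)

print(df["team"].tolist())
print(replaced["team"].tolist())
print(modified["team"].tolist())
```

Assigning the returned DataFrame to a name, as in the first call, tends to be easier to follow than relying on in-place modification.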
Examples
To demonstrate the capabilities of the method replace(), we will use data imported from a CSV file. It contains information about some Major League Baseball players, such as their name, team, and position.
import pandas as pd
df = pd.read_csv('mlb_players.csv')
df
Suppose the dataset you receive contains errors and needs to be corrected. Instead of modifying the source file, you can fix the DataFrame directly with replace().
For instance, what if the team column wrongly contains 'LA' where it should say 'LAD'? This simple statement does the trick. Note that it checks every column of the DataFrame, not only the team column.
df.replace('LA', 'LAD')
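Since the CSV file itself is not reproduced here, the following sketch uses a small stand-in DataFrame. The column names and rows are assumptions, not the real file contents. It shows that, without regex=True, replace() only changes cells whose entire value equals 'LA'.

```python
import pandas as pd

# Hypothetical stand-in for mlb_players.csv; names and values are made up.
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Team": ["LA", "LAA", "NYY"],
})

# Only cells that are exactly 'LA' are replaced; 'LAA' is left alone.
fixed = players.replace("LA", "LAD")
print(fixed)
```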
Not every dataset is that simple. The same value may appear in several columns, which can complicate the replacement.
The DataFrame below has two columns that share most of their entries, but at different positions.
data = {
    'ITTutoria': ['Python', 'Spark', 'HTML', 'Scala', 'C', 'JavaScript', 'Java'],
    'Quora': ['Java', 'C', 'HTML', 'Python', 'Spark', 'Scala', 'R'],
}
df = pd.DataFrame(data, columns=['ITTutoria', 'Quora'])
If you call replace() on the whole DataFrame as above, the result may not be what you expect. For instance, let's say you want to replace 'Python' in the column 'Quora' with 'Ruby'. The statement df.replace('Python', 'Ruby') will also replace the same entry in the column 'ITTutoria', which isn't the desired result.
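This side effect can be checked with a short sketch that rebuilds the two-column DataFrame and applies the replacement to the whole frame:

```python
import pandas as pd

data = {
    "ITTutoria": ["Python", "Spark", "HTML", "Scala", "C", "JavaScript", "Java"],
    "Quora": ["Java", "C", "HTML", "Python", "Spark", "Scala", "R"],
}
df = pd.DataFrame(data)

# Replacing on the whole DataFrame touches every column that contains 'Python'.
whole = df.replace("Python", "Ruby")
print(whole)
```

Both the first row of 'ITTutoria' and the fourth row of 'Quora' now read 'Ruby'.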
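In this case, you need to apply the method replace() only to the column 'Quora', not the entire DataFrame. Assign the result back to the column; calling replace() with inplace=True on df['Quora'] triggers a chained-assignment warning in recent pandas 2.x releases and does not update df when Copy-on-Write is enabled.
df['Quora'] = df['Quora'].replace('Python', 'Ruby')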
df
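As an alternative, replace() also accepts a nested dictionary of the form {column: {old: new}}, which limits the replacement to the named column without selecting it first. A minimal sketch on the same data:

```python
import pandas as pd

data = {
    "ITTutoria": ["Python", "Spark", "HTML", "Scala", "C", "JavaScript", "Java"],
    "Quora": ["Java", "C", "HTML", "Python", "Spark", "Scala", "R"],
}
df = pd.DataFrame(data)

# The outer key names the column, the inner dict maps old values to new ones.
scoped = df.replace({"Quora": {"Python": "Ruby"}})
print(scoped)
```

This form returns a new DataFrame, so the 'Python' entry in 'ITTutoria' stays as it is and the original df is not modified.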
You can also do so with one or multiple rows. This statement processes only the final three rows and replaces 'C' with 'Rust', leaving the 'C' entry in the second row intact. Because a row slice is a separate object, the result is assigned back to the same rows rather than replaced in place.
df.iloc[4:7] = df.iloc[4:7].replace('C', 'Rust')
df
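The following sketch rebuilds the DataFrame and restricts the replacement to the last three rows. It assigns the result back to the slice, since calling replace() with inplace=True on a slice may not propagate to the original DataFrame.

```python
import pandas as pd

data = {
    "ITTutoria": ["Python", "Spark", "HTML", "Scala", "C", "JavaScript", "Java"],
    "Quora": ["Java", "C", "HTML", "Python", "Spark", "Scala", "R"],
}
df = pd.DataFrame(data)

# Rows at positions 4, 5 and 6 are processed; earlier rows are not touched.
df.iloc[4:7] = df.iloc[4:7].replace("C", "Rust")
print(df)
```

The 'C' in row 4 of 'ITTutoria' becomes 'Rust', while the 'C' in row 1 of 'Quora' is outside the slice and stays unchanged.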
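Thanks to regular expressions, you can replace different records that match a certain pattern at the same time. For example, the statement below replaces every entry that starts with 'C' with 'Rust', which covers both 'C' and 'C++' in a DataFrame like the one above. Keep in mind that this pattern also matches any other entry beginning with 'C'.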
df.replace(to_replace=r'^C.*$', value='Rust', regex=True)
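To see how far the pattern reaches, here is a small sketch on a made-up column (the values are illustrative, not from the tutorial's data). It compares the broad pattern with a narrower one, `^C(\+\+)?$`, which is an assumption about what a stricter match for only 'C' and 'C++' could look like.

```python
import pandas as pd

# Hypothetical values chosen to show the difference between the two patterns.
langs = pd.DataFrame({"lang": ["C", "C++", "CSS", "Scala"]})

# Broad: anything that starts with 'C' is replaced, including 'CSS'.
broad = langs.replace(to_replace=r"^C.*$", value="Rust", regex=True)

# Narrow: only the exact strings 'C' and 'C++' are replaced.
narrow = langs.replace(to_replace=r"^C(\+\+)?$", value="Rust", regex=True)

print(broad["lang"].tolist())
print(narrow["lang"].tolist())
```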
When no single regular expression fits the values you want to match, you can provide a list of entries that need to be replaced.
df.replace(['C', 'C++'], 'Rust')
The statement above produces the same result as the one with the regular expression, as long as no other entries in the DataFrame begin with 'C'.
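A quick way to check that equivalence on the two-column DataFrame used earlier is to compare both results with DataFrame.equals():

```python
import pandas as pd

data = {
    "ITTutoria": ["Python", "Spark", "HTML", "Scala", "C", "JavaScript", "Java"],
    "Quora": ["Java", "C", "HTML", "Python", "Spark", "Scala", "R"],
}
df = pd.DataFrame(data)

# List form: each listed value is matched as a whole cell.
by_list = df.replace(["C", "C++"], "Rust")

# Regex form: every entry starting with 'C' is matched.
by_regex = df.replace(to_replace=r"^C.*$", value="Rust", regex=True)

print(by_list.equals(by_regex))
```

The two results agree here only because 'C' is the sole entry starting with that letter; with entries such as 'CSS' in the data they would differ.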
Conclusion
You can replace values in a Pandas DataFrame with the method DataFrame.replace(). It can be applied to the whole DataFrame or just a subset of it. In addition to string literals, the method also accepts regular expressions as its search pattern.