Python for Beginners: Pandas Replace Value in a Dataframe

Post content copied from Planet Python.


Pandas dataframes are used to manipulate tabular data in Python. Sometimes, while manipulating the data, we need to replace certain values in the pandas dataframe. In this article, we will discuss different ways to replace a value in a pandas dataframe. 

The replace() Method

To replace one or more values in a pandas dataframe, you can use the replace() method. It has the following syntax.

DataFrame.replace(to_replace=None, value=_NoDefault.no_default, *, inplace=False, limit=None, regex=False, method=_NoDefault.no_default)

Here, 

  • The to_replace parameter takes a string, regex, list, dictionary, series, integer, or floating-point number as its input argument.
    • If the input given to the to_replace parameter is a string, integer, floating-point number, or regex, the values matching the input are replaced by the input given to the value parameter.
  • If we pass a list of strings, numeric values, or regexes to the to_replace parameter, it works in two ways.
    • If the input given to the value parameter is a single value, all the elements of the list passed to the to_replace parameter are replaced by that value.
    • If the input given to the value parameter is a list, the lists given to the to_replace parameter and the value parameter must have equal length. Each value in the to_replace list is replaced by the value at the corresponding position in the value list.
  • If the input given to the to_replace parameter is a Python dictionary, it works in two ways.
    • If the value parameter is not given, each key of the dictionary is replaced with its associated value.
    • If the value parameter is given, the keys of the dictionary should be column names, and the associated values are the values to be replaced in those columns by the input given to the value parameter.
  • By default, the replace() method returns a new dataframe. If you want to modify the original dataframe, you can set the inplace parameter to True.
  • When to_replace is a scalar or a list and no value is passed, the replace() method works like the pandas fillna() method. In this case, the matched values are filled using the method specified in the method parameter. You can specify ‘pad’ or ‘ffill’ for forward fill and ‘bfill’ for backward fill.
  • The limit parameter sets the maximum number of consecutive values to fill when the replace() method works like the fillna() method. Recent pandas releases deprecate this fill behavior together with the method and limit parameters, so check the documentation for your installed version before relying on them.
  • The regex parameter is used to specify whether to interpret to_replace and/or value as regular expressions. If this is True, then to_replace must be a string. Alternatively, regex itself can be a regular expression or a list, dict, or array of regular expressions, in which case to_replace must be None.
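
The different forms of the to_replace and value parameters described above can be sketched as follows. This is a minimal illustration, not part of the original examples: the small dataframe and the variable names are made up for demonstration.

```python
import pandas as pd

# Hypothetical two-column dataframe used only for this illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# Single value: every 2 in the dataframe becomes 20.
single = df.replace(2, 20)

# List with a single replacement value: 1 and 4 both become 0.
list_to_scalar = df.replace([1, 4], 0)

# Two lists of equal length: 1 -> 10 and 4 -> 40, matched by position.
list_to_list = df.replace([1, 4], [10, 40])

# Dictionary without a value argument: each key is replaced by its value.
mapping = df.replace({3: 30})

# Dictionary with a value argument: keys are column names,
# so only the 2 in column "A" is replaced.
per_column = df.replace({"A": 2}, 20)

print(per_column)
```

Each call returns a new dataframe, so df itself keeps its original values throughout.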

After execution, the replace() method returns a new dataframe if the inplace parameter is set to False. Otherwise, it returns None. If invoked on a pandas series, the replace() method returns a series.
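
As a quick check of the return types described here, the following sketch calls replace() on a dataframe and on one of its columns. The data and variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical marks used only for this illustration.
df = pd.DataFrame({"Maths": [100, 75], "Physics": [87, 100]})

# Called on a dataframe, replace() returns a dataframe.
new_df = df.replace(100, 0)

# Called on a single column (a series), replace() returns a series.
new_series = df["Maths"].replace(100, 0)

print(type(new_df).__name__)
print(type(new_series).__name__)

# With the default inplace=False, the original dataframe is left unchanged.
print(df["Maths"].tolist())
```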

Replace Value in a Series in Python

To replace a value in a series, we will pass the value to be replaced and the new value to the replace() method as shown in the following example.

import pandas as pd
# Build a series from a list of numbers.
numbers=[3,23,100,14,16,100,45,65]
series=pd.Series(numbers)
print("The series is:")
print(series)
# Replace every occurrence of 100 with the string "Max".
newSeries=series.replace(100,"Max")
print("The updated series is:")
print(newSeries)

Output:

The series is:
0      3
1     23
2    100
3     14
4     16
5    100
6     45
7     65
dtype: int64
The updated series is:
0      3
1     23
2    Max
3     14
4     16
5    Max
6     45
7     65
dtype: object

In this example, we first created a series from a Python list. Then, we invoked the replace() method on the series with 100 as its first input argument and the string "Max" as the second input argument. After execution, the replace() method returns a new series in which each instance of 100 is replaced with "Max". Because the new series holds both integers and strings, its dtype changes from int64 to object.
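
If several values in a series need different replacements, a dictionary can be passed instead of two separate arguments. The sketch below reuses the numbers from the example above. The "Min" label is an assumption added purely for illustration.

```python
import pandas as pd

numbers = [3, 23, 100, 14, 16, 100, 45, 65]
series = pd.Series(numbers)

# Each dictionary key is replaced by its associated value.
# "Min" is a hypothetical label chosen for this illustration.
labelled = series.replace({100: "Max", 3: "Min"})

print(labelled)
```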

Pandas Replace Single Value in the Entire Dataframe

To replace a value in a pandas dataframe, we will invoke the replace() method on the dataframe. Here, we will pass the value that needs to be replaced as the first input argument and the new value as the second input argument to the replace() method, as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":87, "Chemistry": 82},
        {"Roll":2,"Maths":75, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":87, "Physics":84, "Chemistry": 76},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":87, "Chemistry": 84},
        {"Roll":6,"Maths":79, "Physics":75, "Chemistry": 72}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
newDf=df.replace(100,"Max")
print("The updated dataframe is:")
print(newDf)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       87         82
1     2     75      100         90
2     3     87       84         76
3     4    100      100         90
4     5     90       87         84
5     6     79       75         72
The updated dataframe is:
   Roll Maths Physics  Chemistry
0     1   Max      87         82
1     2    75     Max         90
2     3    87      84         76
3     4   Max     Max         90
4     5    90      87         84
5     6    79      75         72

In the above example, we first converted a list of dictionaries to a dataframe. Then, we invoked the replace() method on the dataframe with 100 as its first input argument and "Max" as the second input argument. After execution, the replace() method returns a new dataframe in which each instance of 100 is replaced with "Max". The original dataframe remains unchanged.

Replace Value in a Single Column in a Dataframe

Instead of replacing a value in the entire dataframe, you can also replace a value in a single column of a pandas dataframe.

To replace a value in a specific column, we will invoke the replace() method on the column instead of the entire dataframe. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":87, "Chemistry": 82},
        {"Roll":2,"Maths":75, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":87, "Physics":84, "Chemistry": 76},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":87, "Chemistry": 84},
        {"Roll":6,"Maths":79, "Physics":75, "Chemistry": 72}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Maths"]=df["Maths"].replace(100,"Max")
print("The updated dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       87         82
1     2     75      100         90
2     3     87       84         76
3     4    100      100         90
4     5     90       87         84
5     6     79       75         72
The updated dataframe is:
   Roll Maths  Physics  Chemistry
0     1   Max       87         82
1     2    75      100         90
2     3    87       84         76
3     4   Max      100         90
4     5    90       87         84
5     6    79       75         72

In the above example, we have invoked the replace() method on a column of the dataframe. After execution, the replace() method returns a new series object. We then assign the returned series back to the existing column in the dataframe.
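
An alternative that avoids reassigning the column is to pass a nested dictionary to replace() on the whole dataframe, where the outer key is the column name and the inner dictionary maps old values to new ones. The following is a minimal sketch with a shortened, made-up version of the marks table.

```python
import pandas as pd

# Shortened, hypothetical version of the marks table.
df = pd.DataFrame({"Roll": [1, 2, 3],
                   "Maths": [100, 75, 100],
                   "Physics": [87, 100, 84]})

# Look only in the "Maths" column for 100 and replace it with "Max".
new_df = df.replace({"Maths": {100: "Max"}})

print(new_df)
```

The 100 in the "Physics" column is left as it is, because the nested dictionary restricts the replacement to "Maths".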

Pandas Replace Different Value in Each Column

If you want to replace different values in different columns with a single final value, you can pass a dictionary to the replace() method as the first input argument.

Here, the dictionary should contain the column names as its keys and the values that need to be replaced in the columns as the corresponding values of the keys in the dictionary. You can specify the replacement value as the second input argument to the replace() method. After execution, you will get the desired output as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":87, "Chemistry": 82},
        {"Roll":2,"Maths":75, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":87, "Physics":84, "Chemistry": 76},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":87, "Chemistry": 84},
        {"Roll":6,"Maths":79, "Physics":75, "Chemistry": 72}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
newDf=df.replace({"Maths":100,"Physics":100, "Chemistry":90},"Max")
print("The updated dataframe is:")
print(newDf)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       87         82
1     2     75      100         90
2     3     87       84         76
3     4    100      100         90
4     5     90       87         84
5     6     79       75         72
The updated dataframe is:
   Roll Maths Physics Chemistry
0     1   Max      87        82
1     2    75     Max       Max
2     3    87      84        76
3     4   Max     Max       Max
4     5    90      87        84
5     6    79      75        72

In the original dataframe, the column "Chemistry" has 90 as its highest value. So, if we only replace 100 with "Max", the rows that have the maximum marks in Chemistry are not marked.

To specify the value to replace in each column, we passed a Python dictionary to the replace() method as its first input argument and the string "Max" as the second input argument. The dictionary contains the column names as its keys and the maximum value in each column as the associated values. Hence, the replace() method replaces the value 100 in the columns "Maths" and "Physics". In the column "Chemistry", it replaces the value 90 with "Max", as specified in the dictionary.
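
Instead of typing the maximum of each column by hand, the dictionary can also be built from the data. The sketch below is one possible way to do this, not the approach used in the example above. The names subjects and max_per_column are assumptions introduced for illustration.

```python
import pandas as pd

myDicts = [{"Roll": 1, "Maths": 100, "Physics": 87, "Chemistry": 82},
           {"Roll": 2, "Maths": 75, "Physics": 100, "Chemistry": 90},
           {"Roll": 3, "Maths": 87, "Physics": 84, "Chemistry": 76},
           {"Roll": 4, "Maths": 100, "Physics": 100, "Chemistry": 90},
           {"Roll": 5, "Maths": 90, "Physics": 87, "Chemistry": 84},
           {"Roll": 6, "Maths": 79, "Physics": 75, "Chemistry": 72}]
df = pd.DataFrame(myDicts)

# Columns whose maximum should be marked ("Roll" is left out on purpose).
subjects = ["Maths", "Physics", "Chemistry"]

# Map each column name to its highest value, converted to a plain int.
max_per_column = {col: int(df[col].max()) for col in subjects}

# Replace the per-column maximum with the string "Max".
newDf = df.replace(max_per_column, "Max")

print(max_per_column)
print(newDf)
```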

Replace Value Inplace in a Pandas Dataframe

In the above examples, the replace() method returns a new dataframe or series after execution. If you want to modify the existing series or dataframe after using the replace() method, you can set the inplace parameter to True. After this, the original series or dataframe will be modified. You can observe this in the following example.

import pandas as pd
# Build a series from a list of numbers.
numbers=[3,23,100,14,16,100,45,65]
series=pd.Series(numbers)
print("The series is:")
print(series)
# Replace every occurrence of 100 with "Max" in the original series.
series.replace(100,"Max",inplace=True)
print("The updated series is:")
print(series)

Output:

The series is:
0      3
1     23
2    100
3     14
4     16
5    100
6     45
7     65
dtype: int64
The updated series is:
0      3
1     23
2    Max
3     14
4     16
5    Max
6     45
7     65
dtype: object

In this example, we have set the inplace parameter to True in the replace() method. Hence, the replace() method modifies the original series instead of returning a new series.

In a similar manner, you can replace a value in a pandas dataframe inplace as shown in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":87, "Chemistry": 82},
        {"Roll":2,"Maths":75, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":87, "Physics":84, "Chemistry": 76},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":87, "Chemistry": 84},
        {"Roll":6,"Maths":79, "Physics":75, "Chemistry": 72}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df.replace({"Maths":100,"Physics":100, "Chemistry":90},"Max",inplace=True)
print("The updated dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       87         82
1     2     75      100         90
2     3     87       84         76
3     4    100      100         90
4     5     90       87         84
5     6     79       75         72
The updated dataframe is:
   Roll Maths Physics Chemistry
0     1   Max      87        82
1     2    75     Max       Max
2     3    87      84        76
3     4   Max     Max       Max
4     5    90      87        84
5     6    79      75        72
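
Setting inplace to True and reassigning the returned dataframe lead to the same result. The following sketch compares the two styles on a small, made-up dataframe. The names df_a and df_b are illustrative assumptions.

```python
import pandas as pd

# Two identical, hypothetical dataframes to compare both styles.
df_a = pd.DataFrame({"Maths": [100, 75, 87], "Physics": [87, 100, 84]})
df_b = df_a.copy()

# Style 1: modify the existing dataframe in place.
df_a.replace(100, "Max", inplace=True)

# Style 2: keep the default inplace=False and reassign the result.
df_b = df_b.replace(100, "Max")

# Both dataframes now hold the same values.
print(df_a.equals(df_b))
```

Reassigning the result is often easier to read in longer scripts, because each line then shows which object receives the updated values.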

Conclusion

In this article, we have discussed different ways to replace a value in a pandas dataframe and series. We also discussed how to replace different values in different columns by a single value.

To learn more about Python programming, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!

The post Pandas Replace Value in a Dataframe appeared first on PythonForBeginners.com.


January 04, 2023 at 07:30PM