Skip to content

replace Method

The replace() method substitutes values in a DataFrame or Series. It is more general than fillna() and can replace any value, not just NaN.

Mental Model

replace() is find-and-replace for DataFrames. Unlike fillna which only targets NaN, replace swaps any value for any other value. Pass a dict of {old: new} pairs, a list of values to replace, or even a regex pattern. It is the right tool for cleaning up sentinel values like -999, "N/A", or typos.

Basic Usage

Replace specific values with new values.

1. Replace NaN

```python import pandas as pd import numpy as np

url = "https://raw.githubusercontent.com/codebasics/py/master/pandas/5_handling_missing_data_fillna_dropna_interpolate/weather_data.csv" df = pd.read_csv(url, index_col='day', parse_dates=True) print(df)

dg = df.replace(to_replace=np.nan, value=0) print(dg) ```

2. Replace Single Value

python df.replace(to_replace=-999, value=np.nan)

3. Shorthand Syntax

python df.replace(-999, np.nan) # Positional arguments

Dictionary Replacement

Use dictionaries for multiple replacements.

1. Simple Dictionary

python df.replace({'old_value': 'new_value'})

2. Multiple Replacements

python df.replace({ -999: np.nan, -1: 0, 'N/A': np.nan })

3. Column-specific

python df.replace({ 'column_A': {0: 100}, 'column_B': {'x': 'y'} })

List Replacement

Replace multiple values with a single value.

1. List to Scalar

python df.replace([np.nan, -999, 'NA'], 0)

2. List to List

python df.replace( to_replace=['low', 'medium', 'high'], value=[1, 2, 3] )

3. Order Matters

```python

Lists must have same length for one-to-one mapping

```

Regex Replacement

Use regular expressions for pattern matching.

1. Enable Regex

python df['text'].replace( to_replace=r'^ba.$', value='new', regex=True )

2. Pattern Replacement

```python

Replace all strings starting with 'test'

df.replace(r'^test.*', 'replaced', regex=True) ```

3. Capture Groups

python df['col'].replace( r'(\d+)-(\d+)', r'\2-\1', regex=True ) # Swap groups

Comparison with fillna

When to use replace vs fillna.

1. fillna for NaN Only

python df.fillna(0) # Only replaces NaN

2. replace for Any Value

python df.replace(-999, np.nan) # Replaces any value df.replace(np.nan, 0) # Also replaces NaN

3. Use Case Guidance

```python

Use fillna when specifically handling missing values

Use replace when substituting specific values

```

Practical Examples

Common replacement scenarios.

1. Clean Sentinel Values

```python

Replace common missing value indicators

df.replace([-999, -1, 'NA', 'N/A', ''], np.nan) ```

2. Standardize Categories

python df['status'].replace({ 'Y': 'Yes', 'y': 'Yes', 'YES': 'Yes', 'N': 'No', 'n': 'No', 'NO': 'No' })

3. Fix Data Entry Errors

python df['country'].replace({ 'USA': 'United States', 'U.S.A.': 'United States', 'US': 'United States' })

Method Parameters

Additional parameters for replace.

1. inplace

python df.replace(-999, np.nan, inplace=True)

2. limit

python df.replace(-999, np.nan, limit=10) # Max 10 replacements

3. method (Deprecated)

```python

method parameter was used for forward/backward fill

Now deprecated; use fillna instead

```


Exercises

Exercise 1. Create a DataFrame where -999 is used as a sentinel for missing data. Use .replace(-999, np.nan) to convert these to proper NaN values. Then count the resulting NaN values.

Solution to Exercise 1

Replace sentinel values with NaN.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'temp': [21, -999, 25, -999],
    'humidity': [65, 68, -999, 75]
})
df = df.replace(-999, np.nan)
print(df)
print(f"Total NaN: {df.isna().sum().sum()}")

Exercise 2. Create a Series with inconsistent string values (e.g., 'yes', 'Yes', 'YES', 'no', 'No'). Use .replace() with a dictionary to standardize them to 'Yes' and 'No'.

Solution to Exercise 2

Standardize inconsistent string values.

import pandas as pd

s = pd.Series(['yes', 'Yes', 'YES', 'no', 'No', 'NO'])
s = s.replace({'yes': 'Yes', 'YES': 'Yes', 'no': 'No', 'NO': 'No'})
print(s)
print(s.value_counts())

Exercise 3. Create a DataFrame and use .replace() with regex=True to replace all strings matching a pattern (e.g., replace any value starting with 'old_' with 'new_').

Solution to Exercise 3

Use regex replacement to transform matching values.

import pandas as pd

df = pd.DataFrame({
    'code': ['old_abc', 'old_def', 'new_ghi', 'old_jkl']
})
df['code'] = df['code'].replace(r'^old_', 'new_', regex=True)
print(df)