
assign Method

The assign() method adds new columns to a DataFrame, or replaces existing ones, and returns a new DataFrame. The original DataFrame is left unchanged.

Basic Usage

Pass each new column as a keyword argument: the keyword becomes the column name and the value becomes its contents.

1. Single Column

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 60000, 70000]
})

df = df.assign(bonus=5000)
print(df)
      name  salary  bonus
0    Alice   50000   5000
1      Bob   60000   5000
2  Charlie   70000   5000

2. Multiple Columns

df = df.assign(
    bonus=5000,
    tax_rate=0.25
)
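
As a minimal sketch of the kinds of values a keyword can take, the example below mixes a scalar, a list, and a Series in one call. The `level` column and the per-row tax rates are made up for illustration and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 60000, 70000]
})

out = df.assign(
    bonus=5000,                              # scalar: broadcast to every row
    level=['junior', 'mid', 'senior'],       # list: length must match the row count
    tax_rate=pd.Series([0.20, 0.25, 0.30]),  # Series: aligned on the index
)
print(out)
```

A Series is matched to rows by index label rather than by position, so a Series with a different index can produce NaN values.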

3. Returns New DataFrame

# assign() never modifies df; bind the result to a name to keep it
new_df = df.assign(bonus=5000)
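
The following self-contained check illustrates the point, starting from a fresh DataFrame that has no bonus column yet.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 60000, 70000]
})

new_df = df.assign(bonus=5000)

print(list(df.columns))      # ['name', 'salary']
print(list(new_df.columns))  # ['name', 'salary', 'bonus']
print(new_df is df)          # False
```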

Computed Columns

Create columns based on existing data.

1. From Other Columns

df = df.assign(
    total_comp=df['salary'] + 5000
)

2. Using Lambda

df = df.assign(
    total_comp=lambda x: x['salary'] + 5000
)

3. Multiple Computed

df = df.assign(
    bonus=lambda x: x['salary'] * 0.1,
    tax=lambda x: x['salary'] * 0.25,
    net=lambda x: x['salary'] - x['salary'] * 0.25
)

Lambda Advantage

A callable passed to assign() receives the DataFrame as it stands at that point in the call, including any columns added by earlier keyword arguments.

1. Chain Dependencies

df = df.assign(
    bonus=lambda x: x['salary'] * 0.1,
    total=lambda x: x['salary'] + x['bonus']  # Uses bonus just created
)

2. Order Matters

# Works: keyword arguments are applied in order, so bonus exists when the total lambda runs
df.assign(
    bonus=df['salary'] * 0.1,
    total=lambda x: x['salary'] + x['bonus']
)

3. Without Lambda Issue

# Raises KeyError if df has no bonus column: df['bonus'] is looked up on the original df before assign() runs
# df.assign(
#     bonus=df['salary'] * 0.1,
#     total=df['salary'] + df['bonus']  # Error!
# )
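
To make the difference concrete, here is a small sketch that runs both variants on a DataFrame with no bonus column. The `failed` flag and the `fixed` name are only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({'salary': [50000, 60000, 70000]})

# Plain expressions are evaluated before assign() is called,
# so df['bonus'] is looked up on the original frame and is missing.
try:
    df.assign(
        bonus=df['salary'] * 0.1,
        total=df['salary'] + df['bonus'],
    )
    failed = False
except KeyError:
    failed = True

# A callable is evaluated inside assign(), after bonus has been added.
fixed = df.assign(
    bonus=df['salary'] * 0.1,
    total=lambda x: x['salary'] + x['bonus'],
)

print(failed)  # True
print(fixed)
```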

LeetCode Example: Restaurant Growth

Compute a 7-day rolling sum of amount and the corresponding daily average, then store both with a single assign() call.

1. Sample Data

df = pd.DataFrame({
    'visited_on': pd.date_range('2024-07-15', periods=7),
    'amount': [30.0, 20.0, 40.0, 10.0, 50.0, 20.0, 60.0]
})
df = df.set_index('visited_on')

2. Rolling Calculation

rolling_sum = df['amount'].rolling('7D').sum()

3. Assign Multiple Columns

df = df.assign(
    amount=rolling_sum,
    # Divides by 7 on every row, so the first six rows are not true window means
    average_amount=(rolling_sum / 7).round(2)
)
print(df)
            amount  average_amount
visited_on                        
2024-07-15    30.0            4.29
2024-07-16    50.0            7.14
2024-07-17    90.0           12.86
2024-07-18   100.0           14.29
2024-07-19   150.0           21.43
2024-07-20   170.0           24.29
2024-07-21   230.0           32.86
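
The same table can probably be built without the intermediate rolling_sum variable by passing callables. This is a sketch of one way to do it, not necessarily the approach intended for the LeetCode problem. The second callable sees the amount column already replaced by the rolling sum.

```python
import pandas as pd

df = pd.DataFrame({
    'visited_on': pd.date_range('2024-07-15', periods=7),
    'amount': [30.0, 20.0, 40.0, 10.0, 50.0, 20.0, 60.0]
}).set_index('visited_on')

result = df.assign(
    # Overwrite amount with its 7-day rolling sum
    amount=lambda x: x['amount'].rolling('7D').sum(),
    # x['amount'] here is the rolling sum assigned on the line above
    average_amount=lambda x: (x['amount'] / 7).round(2),
)
print(result)
```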

Method Chaining

Because assign() returns a new DataFrame, it fits naturally into method chains.

1. Chain Operations

result = (
    df
    .assign(bonus=lambda x: x['salary'] * 0.1)
    .assign(total=lambda x: x['salary'] + x['bonus'])
    .query('total > 60000')
)

2. Multiple assigns

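# Assumes df has a datetime column named 'date'
result = (
    df
    .assign(year=lambda x: x['date'].dt.year)
    .assign(month=lambda x: x['date'].dt.month)
    .groupby(['year', 'month'])
    .sum(numeric_only=True)  # skip 'date' and other non-numeric columns
)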

3. With Other Methods

result = (
    df
    .dropna()
    .assign(calculated=lambda x: x['a'] + x['b'])
    .sort_values('calculated')
)
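
The chains above assume columns such as date, a, and b that the earlier sample data does not have. As a runnable sketch, the example below uses a small hypothetical `sales` frame (the `date` and `amount` column names are assumptions) to derive year and month and then aggregate by them.

```python
import pandas as pd

# Hypothetical sales data, made up for this example
sales = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-05', '2024-01-20', '2024-02-03']),
    'amount': [100, 150, 200],
})

monthly = (
    sales
    .assign(
        year=lambda x: x['date'].dt.year,
        month=lambda x: x['date'].dt.month,
    )
    .groupby(['year', 'month'])['amount']  # aggregate only the amount column
    .sum()
)
print(monthly)
```

Selecting the amount column before summing avoids having to aggregate the date column at all.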

Overwriting Columns

Passing the name of an existing column replaces that column in the returned DataFrame.

1. Replace Column

df = df.assign(salary=df['salary'] * 1.1)  # 10% raise

2. Transform Column

df = df.assign(name=df['name'].str.upper())

3. Multiple Transforms

df = df.assign(
    salary=lambda x: x['salary'] * 1.1,
    name=lambda x: x['name'].str.title()
)
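
A short sketch of what overwriting does and does not change: the replaced column keeps its position, and the original DataFrame keeps its old values. The lowercase sample names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob'],
    'salary': [50000, 60000]
})

updated = df.assign(name=lambda x: x['name'].str.title())

print(list(updated.columns))     # ['name', 'salary']
print(updated['name'].tolist())  # ['Alice', 'Bob']
print(df['name'].tolist())       # ['alice', 'bob']
```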

assign vs Direct Assignment

Compare assign to direct column assignment.

1. Direct Assignment

df['bonus'] = 5000  # Modifies df in place

2. assign Method

df = df.assign(bonus=5000)  # Returns new DataFrame

3. When to Use Each

# Direct: quick in-place edits to a DataFrame you own
# assign: method chains and a functional style that leaves the input untouched
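
The sketch below puts the two styles side by side. It also shows one practical difference: assign() takes column names as keyword arguments, so a name that is not a valid Python identifier (here the made-up `total comp`) has to be passed through dictionary unpacking.

```python
import pandas as pd

df = pd.DataFrame({'salary': [50000, 60000]})

# assign: df is untouched, the new column lives on the returned frame
with_bonus = df.assign(bonus=5000)
columns_after_assign = list(df.columns)  # ['salary']

# Direct assignment: df itself gains the column
df['bonus'] = 5000
columns_after_direct = list(df.columns)  # ['salary', 'bonus']

# Column names with spaces need ** unpacking in assign()
wide = df.assign(**{'total comp': lambda x: x['salary'] + x['bonus']})
print(wide)
```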