assign Method
The assign() method adds new columns to a DataFrame, returning a new DataFrame with the additions.
Mental Model
assign() is the functional way to add columns: it returns a new DataFrame instead of modifying the original. This makes it perfect for method chaining -- you can pipe assign calls together to build up derived columns step by step without side effects.
Basic Usage
Pass each new column as a keyword argument: the keyword becomes the column name and the value becomes its contents.
1. Single Column
```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 60000, 70000]
})

df = df.assign(bonus=5000)
print(df)
```
```
      name  salary  bonus
0    Alice   50000   5000
1      Bob   60000   5000
2  Charlie   70000   5000
```
2. Multiple Columns
```python
df = df.assign(
    bonus=5000,
    tax_rate=0.25
)
```
3. Returns New DataFrame
```python
# Original unchanged unless reassigned
new_df = df.assign(bonus=5000)
```
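One way to confirm this behavior is to compare the column lists before and after the call. The sketch below rebuilds the sample data under its own name (`staff`, chosen here purely for illustration) so that it runs without depending on the earlier state of `df`.

```python
import pandas as pd

# Hypothetical name `staff`: a fresh copy of the sample data
staff = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 60000, 70000]
})

with_bonus = staff.assign(bonus=5000)

print(list(staff.columns))       # ['name', 'salary']
print(list(with_bonus.columns))  # ['name', 'salary', 'bonus']
print(with_bonus is staff)       # False
```

Because `staff` was never reassigned, it still has only its two original columns.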
Computed Columns
Create columns based on existing data.
1. From Other Columns
```python
df = df.assign(
    total_comp=df['salary'] + 5000
)
```
2. Using Lambda
```python
df = df.assign(
    total_comp=lambda x: x['salary'] + 5000
)
```
3. Multiple Computed
```python
df = df.assign(
    bonus=lambda x: x['salary'] * 0.1,
    tax=lambda x: x['salary'] * 0.25,
    net=lambda x: x['salary'] - x['salary'] * 0.25
)
```
Lambda Advantage
A lambda passed to assign() receives the DataFrame as it stands at that point in the call, including any columns created by earlier keyword arguments.
1. Chain Dependencies
```python
df = df.assign(
    bonus=lambda x: x['salary'] * 0.1,
    total=lambda x: x['salary'] + x['bonus']  # Uses bonus just created
)
```
2. Order Matters
```python
# This works because bonus is created first
df.assign(
    bonus=df['salary'] * 0.1,
    total=lambda x: x['salary'] + x['bonus']
)
```
3. Without Lambda Issue
```python
# This fails if bonus column doesn't exist yet
# df.assign(
#     bonus=df['salary'] * 0.1,
#     total=df['salary'] + df['bonus']  # Error!
# )
```
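The failure is easier to see on a frame that has no `bonus` column at all. The following is a minimal sketch using a fresh DataFrame (the name `staff` is an assumption for illustration): the non-lambda expression is evaluated before `assign()` even runs, so the lookup of `staff['bonus']` raises `KeyError`, while the lambda version defers the lookup until the column exists.

```python
import pandas as pd

# Fresh frame without a 'bonus' column (hypothetical name `staff`)
staff = pd.DataFrame({'name': ['Alice', 'Bob'], 'salary': [50000, 60000]})

try:
    staff.assign(
        bonus=staff['salary'] * 0.1,
        total=staff['salary'] + staff['bonus']  # evaluated before assign() is called
    )
    outcome = 'no error'
except KeyError as exc:
    outcome = f'KeyError: {exc}'

print(outcome)

# The lambda version looks up 'bonus' only after it has been created
fixed = staff.assign(
    bonus=lambda x: x['salary'] * 0.1,
    total=lambda x: x['salary'] + x['bonus']
)
print(fixed[['name', 'total']])
```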
LeetCode Example: Restaurant Growth
Calculate rolling sums and averages.
1. Sample Data
```python
df = pd.DataFrame({
    'visited_on': pd.date_range('2024-07-15', periods=7),
    'amount': [30.0, 20.0, 40.0, 10.0, 50.0, 20.0, 60.0]
})
df = df.set_index('visited_on')
```
2. Rolling Calculation
```python
rolling_sum = df['amount'].rolling('7D').sum()
```
3. Assign Multiple Columns
```python
df = df.assign(
    amount=rolling_sum,
    average_amount=round(rolling_sum / 7, 2)
)
print(df)
```
```
            amount  average_amount
visited_on
2024-07-15    30.0            4.29
2024-07-16    50.0            7.14
2024-07-17    90.0           12.86
2024-07-18   100.0           14.29
2024-07-19   150.0           21.43
2024-07-20   170.0           24.29
2024-07-21   230.0           32.86
```
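Note that the first six rows are sums over partial windows, since an offset window such as `'7D'` defaults to `min_periods=1`. If only complete seven-day windows are wanted, one possible approach is to filter on the index after the `assign` call. The sketch below is an illustration under that assumption, not a reference solution; the names `visits`, `first_full_day`, and `weekly` are made up for the example.

```python
import pandas as pd

visits = pd.DataFrame({
    'visited_on': pd.date_range('2024-07-15', periods=7),
    'amount': [30.0, 20.0, 40.0, 10.0, 50.0, 20.0, 60.0]
}).set_index('visited_on')

# First date on which a full 7-day window of history is available
first_full_day = visits.index.min() + pd.Timedelta(days=6)

weekly = (
    visits
    .assign(
        amount=lambda x: x['amount'].rolling('7D').sum(),
        # x['amount'] already holds the rolling sum at this point
        average_amount=lambda x: (x['amount'] / 7).round(2)
    )
    .loc[lambda x: x.index >= first_full_day]
)
print(weekly)
```

With this sample data only 2024-07-21 has a full window, so `weekly` contains a single row.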
Method Chaining
assign works well in method chains.
1. Chain Operations
```python
result = (
    df
    .assign(bonus=lambda x: x['salary'] * 0.1)
    .assign(total=lambda x: x['salary'] + x['bonus'])
    .query('total > 60000')
)
```
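To make the chain concrete, here is a self-contained run on the three-row salary data from the start of the page (rebuilt under the illustrative name `staff`). It shows that intermediate frames need no names and that the starting frame is left untouched.

```python
import pandas as pd

staff = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 60000, 70000]
})

result = (
    staff
    .assign(bonus=lambda x: x['salary'] * 0.1)
    .assign(total=lambda x: x['salary'] + x['bonus'])
    .query('total > 60000')
)

print(result['name'].tolist())  # ['Bob', 'Charlie']
print(list(staff.columns))      # ['name', 'salary']
```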
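2. Multiple assigns
```python
# Assumes df has a datetime64 'date' column
result = (
    df
    .assign(year=lambda x: x['date'].dt.year)
    .assign(month=lambda x: x['date'].dt.month)
    .groupby(['year', 'month'])
    .sum(numeric_only=True)  # skip 'date' itself, which cannot be summed
)
```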
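The pattern above can be tried on a small frame. The data and the column names `date` and `amount` below are hypothetical, chosen only to make the sketch runnable; selecting the `amount` column before summing keeps the datetime column out of the aggregation.

```python
import pandas as pd

# Hypothetical sales data for illustration
sales = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-05', '2024-01-20', '2024-02-03']),
    'amount': [100, 150, 200]
})

monthly = (
    sales
    .assign(year=lambda x: x['date'].dt.year)
    .assign(month=lambda x: x['date'].dt.month)
    .groupby(['year', 'month'])['amount']
    .sum()
)
print(monthly)
```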
3. With Other Methods
```python
result = (
    df
    .dropna()
    .assign(calculated=lambda x: x['a'] + x['b'])
    .sort_values('calculated')
)
```
Overwriting Columns
If a keyword matches an existing column name, assign replaces that column in the returned DataFrame.
1. Replace Column
```python
df = df.assign(salary=df['salary'] * 1.1)  # 10% raise
```
2. Transform Column
```python
df = df.assign(name=df['name'].str.upper())
```
3. Multiple Transforms
```python
df = df.assign(
    salary=lambda x: x['salary'] * 1.1,
    name=lambda x: x['name'].str.title()
)
```
vs Direct Assignment
Compare assign to direct column assignment.
1. Direct Assignment
```python
df['bonus'] = 5000  # Modifies df in place
```
2. assign Method
```python
df = df.assign(bonus=5000)  # Returns new DataFrame
```
3. When to Use Each
```python
# Direct: quick modifications
# assign: method chaining, functional style
```
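The difference matters most when a DataFrame is passed into a function. The sketch below uses two hypothetical helpers, `add_bonus_in_place` and `add_bonus_pure`, to illustrate that direct assignment changes the caller's object while `assign` does not.

```python
import pandas as pd

def add_bonus_in_place(frame):
    # Direct assignment mutates the DataFrame the caller passed in
    frame['bonus'] = 5000
    return frame

def add_bonus_pure(frame):
    # assign() builds a new DataFrame and leaves the argument alone
    return frame.assign(bonus=5000)

a = pd.DataFrame({'salary': [50000, 60000]})
b = pd.DataFrame({'salary': [50000, 60000]})

add_bonus_in_place(a)
b_with_bonus = add_bonus_pure(b)

print('bonus' in a.columns)             # True: the caller's frame was changed
print('bonus' in b.columns)             # False: the caller's frame is intact
print('bonus' in b_with_bonus.columns)  # True
```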
Exercises
Exercise 1.
Create a DataFrame with a 'price' column. Use .assign() to add a 'tax' column (10% of price) and a 'total' column (price + tax) in a single call.
Solution to Exercise 1
Add multiple computed columns with assign.
```python
import pandas as pd

df = pd.DataFrame({'product': ['A', 'B', 'C'], 'price': [10.0, 25.0, 15.0]})

result = df.assign(
    tax=lambda x: x['price'] * 0.10,
    total=lambda x: x['price'] + x['tax']  # price + tax, using the column created above
)
print(result)
```
Exercise 2.
Use .assign() with a lambda that references a previously created column in the same call. For example, create 'bonus' as 10% of salary, then 'total_pay' as salary + bonus, all in one .assign() call.
Solution to Exercise 2
Reference columns created within the same assign call.
```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'salary': [50000, 60000]})

result = df.assign(
    bonus=lambda x: x['salary'] * 0.10,
    total_pay=lambda x: x['salary'] + x['bonus']  # references bonus created above
)
print(result)
```
Exercise 3.
Create a DataFrame and use .assign() to overwrite an existing column (e.g., round a float column to 2 decimal places). Verify that the original DataFrame is unchanged and only the returned DataFrame has the modification.
Solution to Exercise 3
Overwrite a column via assign without modifying the original.
```python
import pandas as pd

df = pd.DataFrame({'value': [3.14159, 2.71828, 1.41421]})

result = df.assign(value=lambda x: x['value'].round(2))
print("Original:", df['value'].tolist())
print("Assigned:", result['value'].tolist())
```