transform Method¶
The transform() method applies a function to each group and returns a result with the same shape as the original DataFrame.
Mental Model
transform is the shape-preserving counterpart to agg. While agg collapses each group to one row, transform broadcasts the group result back to every row of that group. This is how you create columns like "each value minus its group mean" -- the group statistic is repeated to match the original row count.
Basic Concept¶
Transform preserves the original index.
1. Difference from agg¶
```python import pandas as pd
df = pd.DataFrame({ 'group': ['A', 'A', 'B', 'B'], 'value': [1, 2, 3, 4] })
agg: returns one value per group¶
df.groupby('group')['value'].mean()
group¶
A 1.5¶
B 3.5¶
transform: returns same-length Series¶
df.groupby('group')['value'].transform('mean')
0 1.5¶
1 1.5¶
2 3.5¶
3 3.5¶
```
2. Same Shape Output¶
Transform output aligns with original DataFrame.
3. Broadcast Group Values¶
Each row gets its group's aggregated value.
LeetCode Example: First Activity Date¶
Find each player's first login date.
1. Sample Data¶
python
activity = pd.DataFrame({
'player_id': [1, 1, 2, 2],
'event_date': pd.to_datetime([
'2024-01-01', '2024-01-02',
'2024-01-03', '2024-01-04'
])
})
2. Transform with min¶
python
activity["first"] = activity.groupby("player_id")["event_date"].transform('min')
print(activity)
player_id event_date first
0 1 2024-01-01 2024-01-01
1 1 2024-01-02 2024-01-01
2 2 2024-01-03 2024-01-03
3 2 2024-01-04 2024-01-03
3. Each Row Gets Group Min¶
Every row for player 1 shows their first date.
Common Use Cases¶
Typical transform applications.
1. Group Normalization¶
python
df['normalized'] = df.groupby('group')['value'].transform(
lambda x: (x - x.mean()) / x.std()
)
2. Percent of Group Total¶
python
df['pct_of_group'] = df.groupby('group')['value'].transform(
lambda x: x / x.sum()
)
3. Rank Within Group¶
python
df['group_rank'] = df.groupby('group')['value'].transform('rank')
transform vs apply¶
Key differences between methods.
1. Output Shape¶
```python
transform: must return same shape¶
apply: can return any shape¶
```
2. Function Requirements¶
```python
transform: function must return same-length Series¶
apply: more flexible¶
```
3. Performance¶
```python
transform: often faster for built-in functions¶
apply: more flexible but potentially slower¶
```
Multiple Columns¶
Transform multiple columns at once.
1. Same Function¶
python
df[['value1', 'value2']] = df.groupby('group')[['value1', 'value2']].transform('mean')
2. Different Functions¶
```python
Use apply for different functions per column¶
```
3. Preserving Original¶
```python
Create new columns instead of overwriting¶
df['value_mean'] = df.groupby('group')['value'].transform('mean') ```
Exercises¶
Exercise 1.
Given a DataFrame with 'department' and 'salary' columns, use groupby().transform('mean') to add a new column 'dept_avg_salary' that shows each department's average salary on every row.
Solution to Exercise 1
Use transform('mean') to broadcast the group mean to every row.
import pandas as pd
df = pd.DataFrame({
'department': ['IT', 'IT', 'HR', 'HR', 'IT'],
'salary': [70000, 65000, 50000, 55000, 72000]
})
df['dept_avg_salary'] = df.groupby('department')['salary'].transform('mean')
print(df)
Exercise 2.
Use transform to normalize values within each group: for each group, compute (x - mean) / std. Add the result as a 'normalized' column.
Solution to Exercise 2
Apply a lambda that computes z-scores within each group.
import pandas as pd
df = pd.DataFrame({
'group': ['A', 'A', 'A', 'B', 'B', 'B'],
'value': [10, 20, 30, 100, 200, 300]
})
df['normalized'] = df.groupby('group')['value'].transform(
lambda x: (x - x.mean()) / x.std()
)
print(df)
Exercise 3.
Use transform('sum') to compute each row's value as a percentage of its group total. Add a column 'pct_of_group' that shows what fraction each row contributes to its group's total.
Solution to Exercise 3
Divide each value by the group sum using transform.
import pandas as pd
df = pd.DataFrame({
'region': ['East', 'East', 'West', 'West'],
'sales': [100, 300, 200, 200]
})
df['pct_of_group'] = df['sales'] / df.groupby('region')['sales'].transform('sum')
print(df)