transform Method

The transform() method applies a function to each group and returns a result with the same length and index as the original object.

Mental Model

transform is the shape-preserving counterpart to agg. While agg collapses each group to one row, transform broadcasts the group result back to every row of that group. This is how you create columns like "each value minus its group mean": the group statistic is repeated to match the original row count.
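As a minimal sketch of the "each value minus its group mean" idea, the snippet below uses a made-up `scores` frame; the column names `team` and `points` are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
scores = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "points": [1, 2, 3, 4],
})

# Broadcast each team's mean back onto that team's rows.
team_mean = scores.groupby("team")["points"].transform("mean")

# team_mean shares the original index, so the subtraction lines up row by row.
scores["points_centered"] = scores["points"] - team_mean
print(scores)
```

The intermediate `team_mean` has four rows, one per input row, which is what makes the plain subtraction possible.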

Basic Concept

Transform preserves the original index.

1. Difference from agg

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'value': [1, 2, 3, 4]
})

# agg: returns one value per group
df.groupby('group')['value'].mean()
# group
# A    1.5
# B    3.5

# transform: returns same-length Series
df.groupby('group')['value'].transform('mean')
# 0    1.5
# 1    1.5
# 2    3.5
# 3    3.5
```

2. Same Shape Output

The transform output has the same index as the original DataFrame, so it can be assigned directly as a new column.

3. Broadcast Group Values

Each row gets its group's aggregated value.
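The alignment is by index label, not by position. A small sketch with a deliberately unsorted, non-default index can make this visible; the `sales` frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data with a non-default, unsorted index.
sales = pd.DataFrame(
    {"store": ["x", "y", "x", "y"], "units": [10, 1, 30, 3]},
    index=[103, 101, 104, 102],
)

store_total = sales.groupby("store")["units"].transform("sum")

# The result carries the same index labels, in the same order, as the input.
print(store_total.index.equals(sales.index))
print(store_total)
```

Because the labels match, assigning `store_total` back as a column puts each group total on the correct rows.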

LeetCode Example: First Activity Date

Find each player's first login date.

1. Sample Data

```python
activity = pd.DataFrame({
    'player_id': [1, 1, 2, 2],
    'event_date': pd.to_datetime([
        '2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04'
    ])
})
```

2. Transform with min

```python
activity["first"] = activity.groupby("player_id")["event_date"].transform('min')
print(activity)
```

```
   player_id event_date      first
0          1 2024-01-01 2024-01-01
1          1 2024-01-02 2024-01-01
2          2 2024-01-03 2024-01-03
3          2 2024-01-04 2024-01-03
```

3. Each Row Gets Group Min

Every row for player 1 shows 2024-01-01, and every row for player 2 shows 2024-01-03.
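One way to finish the task, sketched here as an assumption about the expected answer format, is to compare each row's date with the broadcast minimum and keep only the matching rows. The variable names `first` and `first_rows` are illustrative.

```python
import pandas as pd

activity = pd.DataFrame({
    "player_id": [1, 1, 2, 2],
    "event_date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]
    ),
})

# Earliest date per player, repeated on every row of that player.
first = activity.groupby("player_id")["event_date"].transform("min")

# Boolean mask: True only on the row(s) holding each player's earliest date.
first_rows = activity[activity["event_date"] == first]
print(first_rows)
```

If a player logged in more than once on the first date, this mask keeps all of those rows.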

Common Use Cases

Typical transform applications.

1. Group Normalization

```python
df['normalized'] = df.groupby('group')['value'].transform(
    lambda x: (x - x.mean()) / x.std()
)
```

2. Percent of Group Total

```python
df['pct_of_group'] = df.groupby('group')['value'].transform(
    lambda x: x / x.sum()
)
```

3. Rank Within Group

```python
df['group_rank'] = df.groupby('group')['value'].transform('rank')
```
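The three use cases can be run together on the small four-row frame from earlier. This is a sketch rather than a recommended layout: the `result` name is an assumption, and the percent column is written as a division by `transform('sum')` instead of a lambda, which gives the same values.

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "A", "B", "B"], "value": [1, 2, 3, 4]})
grouped = df.groupby("group")["value"]

result = df.assign(
    # z-score within each group (x.std() is the sample standard deviation)
    normalized=grouped.transform(lambda x: (x - x.mean()) / x.std()),
    # share of the group total, using the built-in 'sum'
    pct_of_group=df["value"] / grouped.transform("sum"),
    # rank within each group, 1.0 being the smallest value
    group_rank=grouped.transform("rank"),
)
print(result)
```

Note that a group with a single row has an undefined sample standard deviation, so its normalized value comes out as NaN.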

transform vs apply

Key differences between methods.

1. Output Shape

```python
# transform: must return same shape
# apply: can return any shape
```
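The difference in output shape can be checked directly. The sketch below reuses the four-row example frame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "A", "B", "B"], "value": [1, 2, 3, 4]})
grouped = df.groupby("group")["value"]

# transform: one output row per input row.
same_length = grouped.transform("max")

# apply with a reducing function: one output row per group.
per_group = grouped.apply(lambda x: x.max())

# apply with a function that returns fewer rows than it received.
top_one = grouped.apply(lambda x: x.nlargest(1))

print(len(same_length), len(per_group), len(top_one))
```

Only `same_length` can be assigned straight back to `df` as a column, because only it shares the original index.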

2. Function Requirements

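```python
# transform: function must return a result the same length as the group,
#            or a scalar that pandas broadcasts to the group's length
# apply: more flexible; the function may return a scalar, Series, or DataFrame
```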
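A short sketch of the two return styles that transform accepts, again on the four-row example frame (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "A", "B", "B"], "value": [1, 2, 3, 4]})
grouped = df.groupby("group")["value"]

# The function returns a Series as long as the group: used as-is.
centered = grouped.transform(lambda x: x - x.mean())

# The function returns a scalar: repeated on every row of the group.
value_range = grouped.transform(lambda x: x.max() - x.min())

print(centered.tolist())
print(value_range.tolist())
```

Both results have four rows, matching the input.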

3. Performance

```python
# transform: often faster when given a built-in function name such as 'mean'
# apply: more flexible but potentially slower
```
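To measure the difference on your own machine, a rough sketch like the following can help. The data is randomly generated, the sizes are arbitrary assumptions, and no timings are quoted because they depend on hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 50,000 rows spread over up to 1,000 groups.
rng = np.random.default_rng(0)
big = pd.DataFrame({
    "group": rng.integers(0, 1_000, size=50_000),
    "value": rng.random(50_000),
})
grouped = big.groupby("group")["value"]

by_name = grouped.transform("mean")
by_lambda = grouped.transform(lambda x: x.mean())

# Both spellings agree up to floating-point rounding.
print(np.allclose(by_name, by_lambda))

# Compare wall-clock time for three runs of each spelling.
print("by name:  ", timeit.timeit(lambda: grouped.transform("mean"), number=3))
print("by lambda:", timeit.timeit(
    lambda: grouped.transform(lambda x: x.mean()), number=3))
```

The lambda version calls Python code once per group, so the gap tends to grow with the number of groups.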

Multiple Columns

Transform multiple columns at once.

1. Same Function

```python
df[['value1', 'value2']] = df.groupby('group')[['value1', 'value2']].transform('mean')
```

2. Different Functions

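```python
# transform applies a single function to all selected columns.
# For different functions per column, call transform once per column, or use apply.
```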
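One simple way to get a different function per column while keeping the original row count is to call transform once per column. This is a sketch of that workaround, not the only option; the frame and the output column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame with two value columns.
df = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "value1": [1, 2, 3, 4],
    "value2": [10, 20, 30, 40],
})
grouped = df.groupby("group")

# One transform call per column, each with its own function.
result = df.assign(
    value1_mean=grouped["value1"].transform("mean"),
    value2_max=grouped["value2"].transform("max"),
)
print(result)
```

Each call returns a Series aligned to the original index, so the columns can be combined freely.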

3. Preserving Original

```python
# Create new columns instead of overwriting
df['value_mean'] = df.groupby('group')['value'].transform('mean')
```


Exercises

Exercise 1. Given a DataFrame with 'department' and 'salary' columns, use groupby().transform('mean') to add a new column 'dept_avg_salary' that shows each department's average salary on every row.

Solution to Exercise 1

Use transform('mean') to broadcast the group mean to every row.

import pandas as pd

df = pd.DataFrame({
    'department': ['IT', 'IT', 'HR', 'HR', 'IT'],
    'salary': [70000, 65000, 50000, 55000, 72000]
})
df['dept_avg_salary'] = df.groupby('department')['salary'].transform('mean')
print(df)

Exercise 2. Use transform to normalize values within each group: for each group, compute (x - mean) / std. Add the result as a 'normalized' column.

Solution to Exercise 2

Apply a lambda that computes z-scores within each group.

import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value': [10, 20, 30, 100, 200, 300]
})
df['normalized'] = df.groupby('group')['value'].transform(
    lambda x: (x - x.mean()) / x.std()
)
print(df)

Exercise 3. Use transform('sum') to compute each row's value as a fraction of its group total. Add a column 'pct_of_group' that shows what share each row contributes to its group's total.

Solution to Exercise 3

Divide each value by the group sum using transform.

import pandas as pd

df = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West'],
    'sales': [100, 300, 200, 200]
})
df['pct_of_group'] = df['sales'] / df.groupby('region')['sales'].transform('sum')
print(df)