transform Method

The transform() method applies a function to each group and returns a result with the same length and index as the original data, rather than one row per group.

Basic Concept

Transform preserves the original index.

1. Difference from agg

import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'value': [1, 2, 3, 4]
})

# agg: returns one value per group
df.groupby('group')['value'].mean()
# group
# A    1.5
# B    3.5

# transform: returns same-length Series
df.groupby('group')['value'].transform('mean')
# 0    1.5
# 1    1.5
# 2    3.5
# 3    3.5

2. Same Shape Output

Because the output of transform shares the original DataFrame's index, it can be assigned directly as a new column.

3. Broadcast Group Values

When the function is an aggregation such as 'mean', each row receives its group's aggregated value.
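The broadcasting described above can be sketched by comparing each row with its own group's mean. This is a minimal illustration that reuses the sample df from earlier; the column name diff_from_mean is an assumption chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'value': [1, 2, 3, 4]
})

# Broadcast each group's mean back onto every row of that group
group_mean = df.groupby('group')['value'].transform('mean')

# The broadcast Series shares df's index, so row-wise arithmetic lines up
# 'diff_from_mean' is a hypothetical column name used only for illustration
df['diff_from_mean'] = df['value'] - group_mean
print(df['diff_from_mean'].tolist())
# [-0.5, 0.5, -0.5, 0.5]
```

Doing the same with agg would require merging the per-group result back onto the rows first.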

LeetCode Example: First Activity Date

Find each player's first login date.

1. Sample Data

activity = pd.DataFrame({
    'player_id': [1, 1, 2, 2],
    'event_date': pd.to_datetime([
        '2024-01-01', '2024-01-02',
        '2024-01-03', '2024-01-04'
    ])
})

2. Transform with min

activity["first"] = activity.groupby("player_id")["event_date"].transform('min')
print(activity)
#    player_id event_date      first
# 0          1 2024-01-01 2024-01-01
# 1          1 2024-01-02 2024-01-01
# 2          2 2024-01-03 2024-01-03
# 3          2 2024-01-04 2024-01-03

3. Each Row Gets Group Min

Every row for player 1 shows 2024-01-01, and every row for player 2 shows 2024-01-03.
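One possible follow-up, sketched here as an assumption about how the broadcast minimum might be used, is to keep only the rows where a player's event date equals their first date. The variable names first and first_logins are hypothetical.

```python
import pandas as pd

activity = pd.DataFrame({
    'player_id': [1, 1, 2, 2],
    'event_date': pd.to_datetime([
        '2024-01-01', '2024-01-02',
        '2024-01-03', '2024-01-04'
    ])
})

# Earliest event_date per player, repeated on every row of that player
first = activity.groupby('player_id')['event_date'].transform('min')

# Boolean mask: True only where the row is the player's first login
first_logins = activity[activity['event_date'] == first]
print(first_logins.index.tolist())
# [0, 2]
```

The comparison works without a merge because the transformed Series has the same index as activity.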

Common Use Cases

Typical transform applications.

1. Group Normalization

# z-score within each group
# x.std() is NaN for a single-row group, so those rows become NaN
df['normalized'] = df.groupby('group')['value'].transform(
    lambda x: (x - x.mean()) / x.std()
)

2. Percent of Group Total

# Fraction of the group total (multiply by 100 for a percentage)
df['pct_of_group'] = df.groupby('group')['value'].transform(
    lambda x: x / x.sum()
)

3. Rank Within Group

df['group_rank'] = df.groupby('group')['value'].transform('rank')

transform vs apply

Key differences between methods.

1. Output Shape

# transform: result always has the same length and index as the input
# apply: result can be a scalar, Series, or DataFrame per group

2. Function Requirements

# transform: function must return a same-length Series, or a scalar that is broadcast to the group
# apply: function may return a result of any length
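The contrast can be seen by passing the same scalar-returning function to both methods. This is a small sketch on the sample df from earlier; the variable names spread_per_row and spread_per_group are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'value': [1, 2, 3, 4]
})

grouped = df.groupby('group')['value']

# transform: the scalar returned for each group is broadcast to its rows
spread_per_row = grouped.transform(lambda x: x.max() - x.min())
print(len(spread_per_row))   # 4, one entry per original row

# apply: the same function yields one entry per group
spread_per_group = grouped.apply(lambda x: x.max() - x.min())
print(len(spread_per_group))  # 2, one entry per group
```

Only the transform result can be assigned straight back as a column of df, because only it is indexed like df.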

3. Performance

# transform: often faster for built-in functions
# apply: more flexible but potentially slower

Multiple Columns

Transform multiple columns at once.

1. Same Function

# Assumes df has numeric columns 'value1' and 'value2'; this overwrites them
df[['value1', 'value2']] = df.groupby('group')[['value1', 'value2']].transform('mean')

2. Different Functions

# Use apply for different functions per column
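An alternative to apply, shown here only as a sketch, is to call transform once per column with the function that column needs. The DataFrame below and the output column names value1_mean and value2_max are assumptions made for this example.

```python
import pandas as pd

# Hypothetical data with the two value columns named in this section
df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'value1': [1, 2, 3, 4],
    'value2': [10, 20, 30, 40]
})

grouped = df.groupby('group')

# One transform call per column, each with its own function
value1_mean = grouped['value1'].transform('mean')
value2_max = grouped['value2'].transform('max')

# Write to new columns so the original values are preserved
df['value1_mean'] = value1_mean
df['value2_max'] = value2_max
print(df['value1_mean'].tolist())  # [1.5, 1.5, 3.5, 3.5]
print(df['value2_max'].tolist())   # [20, 20, 40, 40]
```

Each call keeps transform's index alignment, so the results can be assigned independently.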

3. Preserving Original

# Create new columns instead of overwriting
df['value_mean'] = df.groupby('group')['value'].transform('mean')