Skip to content

agg with Multiple Functions

The agg() method excels at applying multiple aggregation functions simultaneously, producing comprehensive summary statistics.

Multiple Functions per Column

Apply several functions to each column.

1. List of Functions

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50]
})

result = df.agg(['mean', 'std', 'min', 'max'])
print(result)
             A          B
mean  3.000000  30.000000
std   1.581139  15.811388
min   1.000000  10.000000
max   5.000000  50.000000

2. With Custom Functions

def range_func(x):
    return x.max() - x.min()

result = df.agg(['mean', 'std', range_func])
print(result)

3. Lambda Functions

result = df.agg([
    'mean',
    'std',
    lambda x: x.max() - x.min()
])

LeetCode Example: Monthly Transactions

Aggregate multiple metrics by group.

1. Sample Data

transactions = pd.DataFrame({
    'month': ['2023-01', '2023-01', '2023-02', '2023-02', '2023-03'],
    'country': ['USA', 'USA', 'UK', 'UK', 'Canada'],
    'id': [1, 2, 3, 4, 5],
    'state': [1, 0, 1, 1, 0],  # 1=approved, 0=declined
    'amount': [50, 20, 70, 30, 40],
    'approved_amount': [50, 0, 70, 30, 0]
})

2. Multi-metric Aggregation

aggregated = transactions.groupby(
    ['month', 'country'],
    dropna=False
).agg({
    'id': 'count',
    'state': 'sum',
    'amount': 'sum',
    'approved_amount': 'sum'
}).reset_index()

print(aggregated)

3. Renaming Columns

aggregated.columns = [
    'month', 'country', 'trans_count',
    'approved_count', 'total_amount', 'approved_total'
]

LeetCode Example: Employee Reports

Count and average with named aggregations.

1. Sample Data

employees = pd.DataFrame({
    'employee_id': [1, 2, 3, 4, 5],
    'reports_to': [0, 1, 1, 2, 2],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [45, 30, 28, 35, 40]
})

2. Named Aggregations

result = employees.groupby('reports_to', as_index=False).agg(
    reports_count=('employee_id', 'size'),
    average_age=('age', 'mean')
)
print(result)

3. Result

   reports_to  reports_count  average_age
0           0              1         45.0
1           1              2         29.0
2           2              2         37.5

Hierarchical Column Output

Multiple functions create MultiIndex columns.

1. MultiIndex Columns

result = df.agg({
    'A': ['mean', 'std'],
    'B': ['min', 'max']
})
print(result)
print(result.columns)

2. Flatten Columns

result.columns = ['_'.join(col) for col in result.columns]

3. Access Specific Results

result['A']['mean']  # Access mean of column A
result[('A', 'mean')]  # Alternative syntax

Combining with GroupBy

agg is most powerful with groupby.

1. Grouped Aggregation

df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B'],
    'Value': [1, 2, 3, 4],
    'Other': [10, 20, 30, 40]
})

df.groupby('Group').agg({
    'Value': ['sum', 'mean'],
    'Other': ['min', 'max']
})

2. Different Functions per Column

df.groupby('Group').agg({
    'Value': 'sum',
    'Other': 'mean'
})

3. Named Aggregations

df.groupby('Group').agg(
    value_total=('Value', 'sum'),
    value_avg=('Value', 'mean'),
    other_range=('Other', lambda x: x.max() - x.min())
)

Best Practices

Guidelines for effective aggregation.

1. Use Named Aggregations

# Clear and readable
df.groupby('Group').agg(
    total=('Value', 'sum'),
    average=('Value', 'mean')
)

2. Reset Index When Needed

result = df.groupby('Group').agg('sum').reset_index()

3. Handle Missing Values

df.groupby('Group', dropna=False).agg('mean')