agg with Multiple Functions¶
The agg() method excels at applying multiple aggregation functions simultaneously, producing comprehensive summary statistics.
Multiple Functions per Column¶
Apply several functions to each column.
1. List of Functions¶
import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50]
})
result = df.agg(['mean', 'std', 'min', 'max'])
print(result)
A B
mean 3.000000 30.000000
std 1.581139 15.811388
min 1.000000 10.000000
max 5.000000 50.000000
2. With Custom Functions¶
def range_func(x):
return x.max() - x.min()
result = df.agg(['mean', 'std', range_func])
print(result)
3. Lambda Functions¶
result = df.agg([
'mean',
'std',
lambda x: x.max() - x.min()
])
LeetCode Example: Monthly Transactions¶
Aggregate multiple metrics by group.
1. Sample Data¶
transactions = pd.DataFrame({
'month': ['2023-01', '2023-01', '2023-02', '2023-02', '2023-03'],
'country': ['USA', 'USA', 'UK', 'UK', 'Canada'],
'id': [1, 2, 3, 4, 5],
'state': [1, 0, 1, 1, 0], # 1=approved, 0=declined
'amount': [50, 20, 70, 30, 40],
'approved_amount': [50, 0, 70, 30, 0]
})
2. Multi-metric Aggregation¶
aggregated = transactions.groupby(
['month', 'country'],
dropna=False
).agg({
'id': 'count',
'state': 'sum',
'amount': 'sum',
'approved_amount': 'sum'
}).reset_index()
print(aggregated)
3. Renaming Columns¶
aggregated.columns = [
'month', 'country', 'trans_count',
'approved_count', 'total_amount', 'approved_total'
]
LeetCode Example: Employee Reports¶
Count and average with named aggregations.
1. Sample Data¶
employees = pd.DataFrame({
'employee_id': [1, 2, 3, 4, 5],
'reports_to': [0, 1, 1, 2, 2],
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'age': [45, 30, 28, 35, 40]
})
2. Named Aggregations¶
result = employees.groupby('reports_to', as_index=False).agg(
reports_count=('employee_id', 'size'),
average_age=('age', 'mean')
)
print(result)
3. Result¶
reports_to reports_count average_age
0 0 1 45.0
1 1 2 29.0
2 2 2 37.5
Hierarchical Column Output¶
Multiple functions create MultiIndex columns.
1. MultiIndex Columns¶
result = df.agg({
'A': ['mean', 'std'],
'B': ['min', 'max']
})
print(result)
print(result.columns)
2. Flatten Columns¶
result.columns = ['_'.join(col) for col in result.columns]
3. Access Specific Results¶
result['A']['mean'] # Access mean of column A
result[('A', 'mean')] # Alternative syntax
Combining with GroupBy¶
agg is most powerful with groupby.
1. Grouped Aggregation¶
df = pd.DataFrame({
'Group': ['A', 'A', 'B', 'B'],
'Value': [1, 2, 3, 4],
'Other': [10, 20, 30, 40]
})
df.groupby('Group').agg({
'Value': ['sum', 'mean'],
'Other': ['min', 'max']
})
2. Different Functions per Column¶
df.groupby('Group').agg({
'Value': 'sum',
'Other': 'mean'
})
3. Named Aggregations¶
df.groupby('Group').agg(
value_total=('Value', 'sum'),
value_avg=('Value', 'mean'),
other_range=('Other', lambda x: x.max() - x.min())
)
Best Practices¶
Guidelines for effective aggregation.
1. Use Named Aggregations¶
# Clear and readable
df.groupby('Group').agg(
total=('Value', 'sum'),
average=('Value', 'mean')
)
2. Reset Index When Needed¶
result = df.groupby('Group').agg('sum').reset_index()
3. Handle Missing Values¶
df.groupby('Group', dropna=False).agg('mean')