agg Method¶
The agg() method (short for aggregate) applies one or more aggregation functions to columns. It provides flexible control over which functions to apply.
Mental Model
While mean() or sum() apply a single reduction, agg() is the Swiss-army knife: pass it a function name, a list, or a dict and it routes the right aggregation to the right column. Think of it as a dispatch table that maps columns to reduction functions.
Basic Usage¶
Apply aggregation functions to a DataFrame.
1. Single Function¶
```python import pandas as pd import yfinance as yf
ticker = "WMT" df = yf.Ticker(ticker).history(start="2020-01-30", end="2022-12-31") df = df[["Open", "Close"]].pct_change() print(df.head())
dg = df.agg('std') print(dg) ```
Open 0.016482
Close 0.017234
dtype: float64
2. Multiple Functions¶
python
dg = df.agg(['std', 'max', 'min'])
print(dg)
Open Close
std 0.016482 0.017234
max 0.109375 0.119760
min -0.082456 -0.097674
3. Custom Function¶
```python def max_minus_min(x): return x.max() - x.min()
dg = df.agg(['std', 'max', 'min', max_minus_min]) print(dg) ```
Column-specific Aggregations¶
Apply different functions to different columns.
1. Dictionary Specification¶
python
dg = df.agg({
"Open": ['std'],
"Close": ['std', max_minus_min]
})
print(dg)
Open Close
std 0.016482 0.017234
max_minus_min NaN 0.217434
2. Named Aggregations¶
python
df.agg(
open_std=('Open', 'std'),
close_mean=('Close', 'mean'),
close_max=('Close', 'max')
)
3. Multiple Functions per Column¶
python
df.agg({
'Open': ['mean', 'std', 'min', 'max'],
'Close': ['mean', 'std']
})
LeetCode Example: Trips Analysis¶
Aggregate trip data by date.
1. Sample Data¶
python
merged_data = pd.DataFrame({
'request_at': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
'valid_request': [1, 1, 1, 1],
'effective_cancellation': [0, 1, 0, 0]
})
2. GroupBy with agg¶
python
result = merged_data.groupby('request_at').agg({
'valid_request': 'sum',
'effective_cancellation': 'sum'
})
print(result)
3. Result¶
valid_request effective_cancellation
request_at
2024-01-01 2 1
2024-01-02 2 0
String Function Names¶
Common aggregation function strings.
1. Statistical Functions¶
```python
'mean', 'sum', 'min', 'max', 'std', 'var', 'sem'¶
'median', 'first', 'last', 'count', 'nunique'¶
```
2. Example Usage¶
python
df.agg(['mean', 'std', 'min', 'max'])
3. Alias Functions¶
```python
'prod' - product¶
'size' - length including NaN¶
'count' - length excluding NaN¶
```
agg vs aggregate¶
The methods are identical.
1. Alias¶
python
df.agg('mean') # Shorthand
df.aggregate('mean') # Full name
2. Same Functionality¶
Both accept the same parameters and return identical results.
3. Prefer agg¶
The agg shorthand is more common in practice.
Returning DataFrames¶
Control output structure.
1. Single Function Returns Series¶
python
result = df.agg('mean')
print(type(result)) # Series
2. Multiple Functions Return DataFrame¶
python
result = df.agg(['mean', 'std'])
print(type(result)) # DataFrame
3. Dict Returns DataFrame¶
python
result = df.agg({'A': 'mean', 'B': 'std'})
print(type(result)) # Series (single value per column)
Exercises¶
Exercise 1.
Create a DataFrame with columns 'revenue' and 'cost'. Use .agg() with a list of functions ['sum', 'mean', 'max'] to compute multiple aggregations on the 'revenue' column in a single call.
Solution to Exercise 1
Pass a list of function names to .agg().
import pandas as pd
df = pd.DataFrame({
'revenue': [100, 200, 300, 150],
'cost': [80, 150, 200, 100]
})
result = df['revenue'].agg(['sum', 'mean', 'max'])
print(result)
Exercise 2.
Use .agg() with a dictionary to apply different functions to different columns: compute the 'sum' of 'quantity' and the 'mean' of 'price' in a DataFrame with those two columns.
Solution to Exercise 2
Pass a dictionary mapping column names to functions.
import pandas as pd
df = pd.DataFrame({
'quantity': [10, 20, 30, 40],
'price': [5.0, 7.5, 3.0, 6.0]
})
result = df.agg({'quantity': 'sum', 'price': 'mean'})
print(result)
Exercise 3.
Write a custom aggregation function that computes the range (max minus min) of a column. Apply it to a DataFrame using .agg() alongside built-in aggregation functions 'mean' and 'std'.
Solution to Exercise 3
Define a custom range function and mix with built-in functions.
import pandas as pd
def value_range(x):
return x.max() - x.min()
df = pd.DataFrame({
'values': [10, 25, 15, 40, 30]
})
result = df['values'].agg(['mean', 'std', value_range])
print(result)