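agg Method
The agg() method (short for aggregate) applies one or more aggregation functions to the columns of a DataFrame, reducing each column to one value per function. Functions can be given as strings, callables, a list of either, or a dict that maps column labels to functions.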
Basic Usage
Apply aggregation functions to a DataFrame.
1. Single Function
import pandas as pd
import yfinance as yf
ticker = "WMT"
df = yf.Ticker(ticker).history(start="2020-01-30", end="2022-12-31")
df = df[["Open", "Close"]].pct_change()
print(df.head())
dg = df.agg('std')
print(dg)
Open 0.016482
Close 0.017234
dtype: float64
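The example above needs a network download. The offline sketch below uses made-up prices in place of the yfinance data (an assumption, not real WMT quotes). It shows why the NaN that pct_change() puts in the first row does not break agg('std'): the statistic skips NaN by default, so it matches std() computed after dropping that row.

```python
import numpy as np
import pandas as pd

# Hypothetical prices standing in for the yfinance download
prices = pd.DataFrame(
    {"Open": [100.0, 102.0, 101.0, 104.0], "Close": [101.0, 103.0, 100.0, 105.0]}
)

returns = prices.pct_change()  # first row is NaN: there is no previous price
print(returns)

# agg('std') skips NaN, so it equals std() on the rows that remain after dropna()
dg = returns.agg("std")
print(dg)

manual = returns.dropna().std()
print(np.allclose(dg, manual))  # True
```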
2. Multiple Functions
dg = df.agg(['std', 'max', 'min'])
print(dg)
Open Close
std 0.016482 0.017234
max 0.109375 0.119760
min -0.082456 -0.097674
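With a list of functions, the result has one row per function, in the order given, and one column per original column. A small sketch with made-up returns (hypothetical values) shows how to pull out one statistic or flip the table so each column becomes a row:

```python
import pandas as pd

# Hypothetical daily returns
returns = pd.DataFrame({"Open": [0.01, -0.02, 0.03], "Close": [0.02, -0.01, 0.04]})

dg = returns.agg(["std", "max", "min"])
print(list(dg.index))  # ['std', 'max', 'min']

# One statistic across all columns
print(dg.loc["max"])

# Transpose: one row per original column, one column per statistic
print(dg.T)
```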
3. Custom Function
def max_minus_min(x):
    # x is one column (a Series); return its range as a single number
    return x.max() - x.min()
dg = df.agg(['std', 'max', 'min', max_minus_min])
print(dg)
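When a callable appears in the list, its `__name__` becomes the row label, so a named function such as max_minus_min yields a readable label while a lambda yields `'<lambda>'`. The sketch below uses made-up returns (hypothetical values) and adds one lambda purely to show the labelling:

```python
import pandas as pd

# Hypothetical daily returns
returns = pd.DataFrame({"Open": [0.01, -0.02, 0.03], "Close": [0.02, -0.01, 0.04]})

def max_minus_min(x):
    # x is one column (a Series); return its range as a single number
    return x.max() - x.min()

dg = returns.agg(["max", "min", max_minus_min, lambda x: x.abs().mean()])
print(dg)

# Row labels come from the function names; the lambda is labelled '<lambda>'
print(list(dg.index))
```

Because every lambda is labelled `'<lambda>'`, passing two of them produces duplicate row labels; named functions avoid that.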
Column-specific Aggregations
Apply different functions to different columns.
1. Dictionary Specification
dg = df.agg({
"Open": ['std'],
"Close": ['std', max_minus_min]
})
print(dg)
Open Close
std 0.016482 0.017234
max_minus_min NaN 0.217434
2. Named Aggregations
Each keyword argument maps an output label to a (column, function) pair.
df.agg(
open_std=('Open', 'std'),
close_mean=('Close', 'mean'),
close_max=('Close', 'max')
)
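On a plain DataFrame (unlike groupby().agg()), named aggregation does not produce flat named columns: each keyword becomes a row label, the original columns are kept, and cells a keyword did not target are NaN. This sketch assumes pandas 1.1 or later and uses made-up returns (hypothetical values):

```python
import pandas as pd

# Hypothetical daily returns
returns = pd.DataFrame({"Open": [0.01, -0.02, 0.03], "Close": [0.02, -0.01, 0.04]})

dg = returns.agg(
    open_std=("Open", "std"),
    close_mean=("Close", "mean"),
    close_max=("Close", "max"),
)
# Rows: open_std, close_mean, close_max; columns: Open, Close.
# open_std has no Close value and the close_* rows have no Open value (NaN).
print(dg)
```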
3. Multiple Functions per Column
df.agg({
'Open': ['mean', 'std', 'min', 'max'],
'Close': ['mean', 'std']
})
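With a dict of lists, the result's rows are the union of every requested function, so a column that did not ask for a function gets NaN in that row. A sketch with made-up returns (hypothetical values); dropna() on one column recovers only the statistics that column requested:

```python
import pandas as pd

# Hypothetical daily returns
returns = pd.DataFrame({"Open": [0.01, -0.02, 0.03], "Close": [0.02, -0.01, 0.04]})

dg = returns.agg({
    "Open": ["mean", "std", "min", "max"],
    "Close": ["mean", "std"],
})
print(dg)  # Close is NaN in the min and max rows

# Keep only the statistics requested for Close
print(dg["Close"].dropna())
```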
LeetCode Example: Trips Analysis
Aggregate trip data by date.
1. Sample Data
merged_data = pd.DataFrame({
'request_at': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
'valid_request': [1, 1, 1, 1],
'effective_cancellation': [0, 1, 0, 0]
})
2. GroupBy with agg
result = merged_data.groupby('request_at').agg({
'valid_request': 'sum',
'effective_cancellation': 'sum'
})
print(result)
3. Result
valid_request effective_cancellation
request_at
2024-01-01 2 1
2024-01-02 2 0
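The per-day sums feed directly into a cancellation rate (cancellations divided by valid requests). The sketch below uses groupby named aggregation, which on a GroupBy does produce flat, renamed columns. The names requests, cancellations, and cancellation_rate are hypothetical, and it assumes merged_data has already had any row filtering (for example, excluding banned users or restricting dates) applied upstream:

```python
import pandas as pd

merged_data = pd.DataFrame({
    "request_at": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "valid_request": [1, 1, 1, 1],
    "effective_cancellation": [0, 1, 0, 0],
})

# Named aggregation on a GroupBy: one flat column per keyword
daily = merged_data.groupby("request_at").agg(
    requests=("valid_request", "sum"),
    cancellations=("effective_cancellation", "sum"),
)
# Ratio per day, rounded to two decimals
daily["cancellation_rate"] = (daily["cancellations"] / daily["requests"]).round(2)
print(daily)
```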
String Function Names
Common aggregation function strings.
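1. Statistical Functions
# 'mean', 'sum', 'min', 'max', 'std', 'var', 'sem'
# 'median', 'count', 'nunique'
# 'first', 'last' - use with groupby().agg(); on a plain DataFrame these
#   names resolve to the offset-based DataFrame.first/last and raise an error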
2. Example Usage
df.agg(['mean', 'std', 'min', 'max'])
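A string name is looked up as the method of the same name, so df.agg('median') gives the same numbers as df.median(). A quick check on a small made-up DataFrame (hypothetical values):

```python
import numpy as np
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"A": [1.0, 2.0, 2.0, 5.0], "B": [3.0, 3.0, 4.0, 8.0]})

dg = df.agg(["mean", "median", "var", "sem", "nunique"])
print(dg)

# Each row matches the DataFrame method with the same name
print(np.allclose(dg.loc["median"], df.median()))  # True
print(np.allclose(dg.loc["sem"], df.sem()))        # True
```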
3. Other Functions
# 'prod' - product of the values
# 'size' - number of rows, including NaN
# 'count' - number of non-NaN values
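The difference between 'size' and 'count' is easiest to see per group, where both are standard aggregation names. A sketch on made-up data containing NaN (hypothetical values):

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "val": [1.0, np.nan, 3.0, np.nan, np.nan],
})

# 'size' counts every row in the group; 'count' counts only non-NaN values;
# 'prod' skips NaN when multiplying
dg = df.groupby("key")["val"].agg(["size", "count", "prod"])
print(dg)  # a: size 2, count 1, prod 1.0; b: size 3, count 1, prod 3.0
```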
agg vs aggregate
agg is an alias: both names refer to the same method.
1. Alias
df.agg('mean') # Shorthand
df.aggregate('mean') # Full name
2. Same Functionality
Both accept the same parameters and return identical results.
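In current pandas versions the alias is a plain class attribute (agg = aggregate), which can be checked directly. A short sketch on a made-up DataFrame:

```python
import pandas as pd

# Both names point to the same function object on the class
print(pd.DataFrame.agg is pd.DataFrame.aggregate)  # True
print(pd.Series.agg is pd.Series.aggregate)        # True

# Hypothetical data: the results are therefore identical
df = pd.DataFrame({"A": [1, 2, 3]})
print(df.agg("mean").equals(df.aggregate("mean")))  # True
```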
3. Prefer agg
The agg shorthand is more common in practice.
Return Types
The type of the result depends on how the functions are specified.
1. Single Function Returns Series
result = df.agg('mean')
print(type(result)) # Series
2. Multiple Functions Return DataFrame
result = df.agg(['mean', 'std'])
print(type(result)) # DataFrame
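3. Dict Return Type Depends on the Values
result = df.agg({'Open': 'mean', 'Close': 'std'})
print(type(result))  # Series (one scalar per column)
result = df.agg({'Open': ['mean'], 'Close': ['std']})
print(type(result))  # DataFrame (lists give one row per function)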
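All four cases can be checked offline on a small made-up DataFrame (hypothetical values) with the same Open/Close columns. Wrapping even a single function in a list is enough to get a DataFrame back:

```python
import pandas as pd

# Hypothetical daily returns
df = pd.DataFrame({"Open": [0.01, -0.02, 0.03], "Close": [0.02, -0.01, 0.04]})

print(type(df.agg("mean")).__name__)                                # Series
print(type(df.agg(["mean"])).__name__)                              # DataFrame
print(type(df.agg({"Open": "mean", "Close": "std"})).__name__)      # Series
print(type(df.agg({"Open": ["mean"], "Close": ["std"]})).__name__)  # DataFrame
```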