Window with GroupBy¶
Combine window functions with groupby to apply rolling, expanding, or EWM calculations within groups.
Mental Model
groupby(...).rolling(N) applies the rolling window independently within each group. This is essential for panel data: computing a 20-day moving average per stock, not across stocks. Use transform with a lambda for clean integration back into the original DataFrame without index headaches.
Rolling Within Groups¶
Apply rolling calculations per group.
1. Basic Pattern¶
```python import pandas as pd
df = pd.DataFrame({ 'Ticker': ['AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT', 'MSFT'], 'Date': pd.date_range('2024-01-01', periods=3).tolist() * 2, 'Close': [150, 152, 151, 350, 355, 353] })
df['5D_MA'] = df.groupby('Ticker')['Close'].rolling(window=2).mean().reset_index(0, drop=True) print(df) ```
2. Using transform¶
python
df['MA'] = df.groupby('Ticker')['Close'].transform(
lambda x: x.rolling(2).mean()
)
3. Preserves Original Index¶
The transform approach maintains alignment with original DataFrame.
Expanding Within Groups¶
Cumulative calculations per group.
1. Cumulative Mean per Group¶
python
df['Cum_Mean'] = df.groupby('Ticker')['Close'].transform(
lambda x: x.expanding().mean()
)
2. Running Max per Group¶
python
df['Running_Max'] = df.groupby('Ticker')['Close'].transform(
lambda x: x.expanding().max()
)
3. Since-Inception Return per Asset¶
python
df['Since_Start'] = df.groupby('Ticker')['Close'].transform(
lambda x: x / x.iloc[0] - 1
)
EWM Within Groups¶
Exponentially weighted calculations per group.
1. EWMA per Asset¶
python
df['EWMA'] = df.groupby('Ticker')['Close'].transform(
lambda x: x.ewm(span=10).mean()
)
2. EWM Volatility per Asset¶
python
returns = df.groupby('Ticker')['Close'].pct_change()
df['EWM_Vol'] = returns.groupby(df['Ticker']).transform(
lambda x: x.ewm(span=20).std()
)
3. Independent Decay per Group¶
Each group's EWM is calculated independently.
Practical Example¶
Multi-asset portfolio analysis.
1. Sample Data¶
```python import numpy as np
np.random.seed(42) dates = pd.date_range('2024-01-01', periods=50) tickers = ['AAPL', 'MSFT', 'GOOGL']
data = [] for ticker in tickers: prices = 100 + np.cumsum(np.random.randn(50)) for date, price in zip(dates, prices): data.append({'Ticker': ticker, 'Date': date, 'Close': price})
df = pd.DataFrame(data) ```
2. Add Rolling Stats per Asset¶
```python df['MA20'] = df.groupby('Ticker')['Close'].transform( lambda x: x.rolling(20).mean() )
df['Vol20'] = df.groupby('Ticker')['Close'].transform( lambda x: x.pct_change().rolling(20).std() ) ```
3. Cross-sectional Rank¶
```python
Rank within each date¶
df['Rank'] = df.groupby('Date')['Close'].rank() ```
Performance Tip¶
Efficient grouped window operations.
1. Single transform Call¶
```python
Faster: single grouped operation¶
df['MA'] = df.groupby('Ticker')['Close'].transform( lambda x: x.rolling(20).mean() ) ```
2. Avoid Looping¶
```python
Slower: don't loop over groups¶
for ticker in df['Ticker'].unique():¶
...¶
```
3. Use Built-in When Possible¶
GroupBy rolling is optimized internally.
Exercises¶
Exercise 1. Write code that computes a rolling mean within each group using df.groupby('group').rolling(5).mean().
Solution to Exercise 1
```python import pandas as pd import numpy as np
Solution for the specific exercise¶
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```
Exercise 2. Explain why combining groupby() with rolling() is useful. Give a real-world example.
Solution to Exercise 2
See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.
Exercise 3. Write code that computes an expanding sum for each group separately.
Solution to Exercise 3
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```
Exercise 4. Create a DataFrame with panel data (multiple entities over time) and compute rolling z-scores within each entity.
Solution to Exercise 4
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```