Window with GroupBy¶

Combine window functions with groupby to apply rolling, expanding, or EWM calculations within groups.

Mental Model

groupby(...).rolling(N) applies the rolling window independently within each group. This is essential for panel data: computing a 20-day moving average per stock, not across stocks. Use transform with a lambda for clean integration back into the original DataFrame without index headaches.

Rolling Within Groups¶

Apply rolling calculations per group.

1. Basic Pattern¶

```python import pandas as pd

df = pd.DataFrame({ 'Ticker': ['AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT', 'MSFT'], 'Date': pd.date_range('2024-01-01', periods=3).tolist() * 2, 'Close': [150, 152, 151, 350, 355, 353] })

df['5D_MA'] = df.groupby('Ticker')['Close'].rolling(window=2).mean().reset_index(0, drop=True) print(df) ```

2. Using transform¶

python df['MA'] = df.groupby('Ticker')['Close'].transform( lambda x: x.rolling(2).mean() )

3. Preserves Original Index¶

The transform approach maintains alignment with original DataFrame.

Expanding Within Groups¶

Cumulative calculations per group.

1. Cumulative Mean per Group¶

python df['Cum_Mean'] = df.groupby('Ticker')['Close'].transform( lambda x: x.expanding().mean() )

2. Running Max per Group¶

python df['Running_Max'] = df.groupby('Ticker')['Close'].transform( lambda x: x.expanding().max() )

3. Since-Inception Return per Asset¶

python df['Since_Start'] = df.groupby('Ticker')['Close'].transform( lambda x: x / x.iloc[0] - 1 )

EWM Within Groups¶

Exponentially weighted calculations per group.

1. EWMA per Asset¶

python df['EWMA'] = df.groupby('Ticker')['Close'].transform( lambda x: x.ewm(span=10).mean() )

2. EWM Volatility per Asset¶

python returns = df.groupby('Ticker')['Close'].pct_change() df['EWM_Vol'] = returns.groupby(df['Ticker']).transform( lambda x: x.ewm(span=20).std() )

3. Independent Decay per Group¶

Each group's EWM is calculated independently.

Practical Example¶

Multi-asset portfolio analysis.

1. Sample Data¶

```python import numpy as np

np.random.seed(42) dates = pd.date_range('2024-01-01', periods=50) tickers = ['AAPL', 'MSFT', 'GOOGL']

data = [] for ticker in tickers: prices = 100 + np.cumsum(np.random.randn(50)) for date, price in zip(dates, prices): data.append({'Ticker': ticker, 'Date': date, 'Close': price})

df = pd.DataFrame(data) ```

2. Add Rolling Stats per Asset¶

```python df['MA20'] = df.groupby('Ticker')['Close'].transform( lambda x: x.rolling(20).mean() )

df['Vol20'] = df.groupby('Ticker')['Close'].transform( lambda x: x.pct_change().rolling(20).std() ) ```

3. Cross-sectional Rank¶

```python

Rank within each date¶

df['Rank'] = df.groupby('Date')['Close'].rank() ```

Performance Tip¶

Efficient grouped window operations.

1. Single transform Call¶

```python

Faster: single grouped operation¶

df['MA'] = df.groupby('Ticker')['Close'].transform( lambda x: x.rolling(20).mean() ) ```

2. Avoid Looping¶

```python

Slower: don't loop over groups¶

for ticker in df['Ticker'].unique():¶

...¶

```

3. Use Built-in When Possible¶

GroupBy rolling is optimized internally.

Exercises¶

Exercise 1. Write code that computes a rolling mean within each group using df.groupby('group').rolling(5).mean().

Solution to Exercise 1

```python import pandas as pd import numpy as np

Solution for the specific exercise¶

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```

Exercise 2. Explain why combining groupby() with rolling() is useful. Give a real-world example.

Solution to Exercise 2

See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.

Exercise 3. Write code that computes an expanding sum for each group separately.

Solution to Exercise 3

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```

Exercise 4. Create a DataFrame with panel data (multiple entities over time) and compute rolling z-scores within each entity.

Solution to Exercise 4

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```