Panel Aggregations¶
Panel data enables both entity-wise and time-wise aggregations. This document covers common aggregation patterns.
Mental Model
Panel aggregation has two natural directions: aggregate across time (one result per entity) or across entities (one result per time point). Use groupby(level='ticker') for entity-wise and groupby(level='date') for time-wise summaries. The level parameter in groupby replaces column names with index level names.
Setup¶
```python import pandas as pd import numpy as np
Create sample panel¶
tickers = ['AAPL', 'MSFT', 'GOOGL'] dates = pd.date_range('2024-01-01', periods=10) index = pd.MultiIndex.from_product([tickers, dates], names=['ticker', 'date'])
np.random.seed(42) df = pd.DataFrame({ 'return': np.random.randn(30) * 0.02, 'volume': np.random.randint(1000, 10000, 30) }, index=index) ```
Entity-Wise Aggregation (Across Time)¶
Aggregate each entity's time series.
Using groupby(level=)¶
```python
Mean return per ticker (across all dates)¶
mean_by_ticker = df.groupby(level='ticker').mean() print("Mean by ticker:") print(mean_by_ticker) ```
Mean by ticker:
return volume
ticker
AAPL 0.005123 5234.500000
MSFT 0.002891 4891.300000
GOOGL -0.001234 5567.200000
Various Aggregations per Entity¶
```python
Multiple statistics per ticker¶
entity_stats = df.groupby(level='ticker').agg({ 'return': ['mean', 'std', 'min', 'max'], 'volume': ['mean', 'sum'] }) print("Entity statistics:") print(entity_stats) ```
Rolling Operations per Entity¶
```python
5-day rolling mean return per ticker¶
rolling_mean = df.groupby(level='ticker')['return'].rolling(window=5).mean() print("Rolling mean per entity:") print(rolling_mean) ```
Time-Wise Aggregation (Cross-Sectional)¶
Aggregate across entities at each time point.
Using groupby(level=)¶
```python
Mean return per date (across all tickers)¶
mean_by_date = df.groupby(level='date').mean() print("Mean by date (cross-sectional):") print(mean_by_date) ```
Mean by date (cross-sectional):
return volume
date
2024-01-01 0.003456 4567.333333
2024-01-02 -0.001234 5234.000000
...
Cross-Sectional Statistics¶
```python
Cross-sectional stats each day¶
cross_sectional = df.groupby(level='date').agg({ 'return': ['mean', 'std', 'min', 'max', 'count'] }) print("Cross-sectional statistics:") print(cross_sectional) ```
Using .xs for Aggregation¶
```python
Get cross-section, then aggregate¶
jan1_returns = df.xs('2024-01-01', level='date')['return'] print(f"Jan 1 mean return: {jan1_returns.mean():.4f}") print(f"Jan 1 return std: {jan1_returns.std():.4f}") ```
Transform: Broadcast Aggregation¶
Keep original shape while adding aggregated values.
```python
Add entity mean as new column¶
df['entity_mean'] = df.groupby(level='ticker')['return'].transform('mean')
Add cross-sectional mean¶
df['time_mean'] = df.groupby(level='date')['return'].transform('mean')
Demeaned return (relative to entity mean)¶
df['return_demeaned'] = df['return'] - df['entity_mean']
print(df.head(10)) ```
Entity-Specific Time Series Operations¶
```python
Cumulative return per entity¶
df['cum_return'] = df.groupby(level='ticker')['return'].cumsum()
Lagged return per entity¶
df['return_lag1'] = df.groupby(level='ticker')['return'].shift(1)
Return momentum (current vs lagged)¶
df['momentum'] = df['return'] - df['return_lag1']
print(df[['return', 'cum_return', 'return_lag1', 'momentum']].head(15)) ```
Wide-Format Aggregations¶
Convert to wide format for cross-entity analysis.
```python
Reshape to wide format¶
returns_wide = df['return'].unstack('ticker') print("Wide format returns:") print(returns_wide.head()) print()
Correlation matrix (needs wide format)¶
correlation = returns_wide.corr() print("Return correlations:") print(correlation) print()
Covariance matrix¶
covariance = returns_wide.cov() print("Return covariances:") print(covariance) ```
Market-Wide Statistics¶
```python
Equal-weighted market return¶
market_return = df.groupby(level='date')['return'].mean() print("Market return (equal-weighted):") print(market_return)
Market volume¶
total_volume = df.groupby(level='date')['volume'].sum() print("\nTotal market volume:") print(total_volume) ```
Advanced: CAPM-Style Analysis¶
```python
Add market return to panel¶
market = df.groupby(level='date')['return'].mean() df['market_return'] = df.index.get_level_values('date').map(market)
Excess return (vs market)¶
df['excess_return'] = df['return'] - df['market_return']
print(df[['return', 'market_return', 'excess_return']].head(10)) ```
Beta Estimation per Entity¶
```python
Simple beta calculation per ticker¶
def calc_beta(group): cov = group['return'].cov(group['market_return']) var = group['market_return'].var() return cov / var if var != 0 else np.nan
betas = df.groupby(level='ticker').apply(calc_beta) print("Betas:") print(betas) ```
Ranking Within Cross-Sections¶
```python
Rank returns each day¶
df['return_rank'] = df.groupby(level='date')['return'].rank(ascending=False) print(df[['return', 'return_rank']].head(15)) ```
Summary Statistics¶
```python def panel_summary(df, value_col='return'): """Generate comprehensive panel summary.""" return pd.DataFrame({ 'Overall Mean': [df[value_col].mean()], 'Overall Std': [df[value_col].std()], 'Entity Count': [df.index.get_level_values(0).nunique()], 'Time Count': [df.index.get_level_values(1).nunique()], 'Total Obs': [len(df)], 'Cross-Sectional Std (avg)': [ df.groupby(level='date')[value_col].std().mean() ], 'Time-Series Std (avg)': [ df.groupby(level='ticker')[value_col].std().mean() ] })
summary = panel_summary(df) print(summary.T) ```
Aggregation Reference¶
| Goal | Method |
|---|---|
| Mean per entity | df.groupby(level='ticker').mean() |
| Mean per time | df.groupby(level='date').mean() |
| Rolling per entity | df.groupby(level='ticker').rolling(n).mean() |
| Broadcast entity mean | df.groupby(level='ticker').transform('mean') |
| Cumsum per entity | df.groupby(level='ticker').cumsum() |
| Lag per entity | df.groupby(level='ticker').shift(1) |
| Rank per time | df.groupby(level='date').rank() |
| Correlation matrix | df['col'].unstack().corr() |
Exercises¶
Exercise 1. Write code that computes the mean across entities (panel groups) using groupby().mean() on a MultiIndex DataFrame.
Solution to Exercise 1
```python import pandas as pd import numpy as np
Solution for the specific exercise¶
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```
Exercise 2. Explain the difference between aggregating across time (within-entity) and across entities (within-time) in panel data.
Solution to Exercise 2
See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.
Exercise 3. Write code that computes both the within-entity mean and the between-entity mean for a panel dataset.
Solution to Exercise 3
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```
Exercise 4. Create panel data and compute the rolling mean for each entity separately using groupby().rolling().
Solution to Exercise 4
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```