Panel Aggregations¶
Panel data enables both entity-wise and time-wise aggregations. This document covers common aggregation patterns.
Setup¶
import pandas as pd
import numpy as np
# Create sample panel
tickers = ['AAPL', 'MSFT', 'GOOGL']
dates = pd.date_range('2024-01-01', periods=10)
index = pd.MultiIndex.from_product([tickers, dates], names=['ticker', 'date'])
np.random.seed(42)
df = pd.DataFrame({
'return': np.random.randn(30) * 0.02,
'volume': np.random.randint(1000, 10000, 30)
}, index=index)
Entity-Wise Aggregation (Across Time)¶
Aggregate each entity's time series.
Using groupby(level=)¶
# Mean return per ticker (across all dates)
mean_by_ticker = df.groupby(level='ticker').mean()
print("Mean by ticker:")
print(mean_by_ticker)
Mean by ticker:
return volume
ticker
AAPL 0.005123 5234.500000
MSFT 0.002891 4891.300000
GOOGL -0.001234 5567.200000
Various Aggregations per Entity¶
# Multiple statistics per ticker
entity_stats = df.groupby(level='ticker').agg({
'return': ['mean', 'std', 'min', 'max'],
'volume': ['mean', 'sum']
})
print("Entity statistics:")
print(entity_stats)
Rolling Operations per Entity¶
# 5-day rolling mean return per ticker
rolling_mean = df.groupby(level='ticker')['return'].rolling(window=5).mean()
print("Rolling mean per entity:")
print(rolling_mean)
Time-Wise Aggregation (Cross-Sectional)¶
Aggregate across entities at each time point.
Using groupby(level=)¶
# Mean return per date (across all tickers)
mean_by_date = df.groupby(level='date').mean()
print("Mean by date (cross-sectional):")
print(mean_by_date)
Mean by date (cross-sectional):
return volume
date
2024-01-01 0.003456 4567.333333
2024-01-02 -0.001234 5234.000000
...
Cross-Sectional Statistics¶
# Cross-sectional stats each day
cross_sectional = df.groupby(level='date').agg({
'return': ['mean', 'std', 'min', 'max', 'count']
})
print("Cross-sectional statistics:")
print(cross_sectional)
Using .xs for Aggregation¶
# Get cross-section, then aggregate
jan1_returns = df.xs('2024-01-01', level='date')['return']
print(f"Jan 1 mean return: {jan1_returns.mean():.4f}")
print(f"Jan 1 return std: {jan1_returns.std():.4f}")
Transform: Broadcast Aggregation¶
Keep original shape while adding aggregated values.
# Add entity mean as new column
df['entity_mean'] = df.groupby(level='ticker')['return'].transform('mean')
# Add cross-sectional mean
df['time_mean'] = df.groupby(level='date')['return'].transform('mean')
# Demeaned return (relative to entity mean)
df['return_demeaned'] = df['return'] - df['entity_mean']
print(df.head(10))
Entity-Specific Time Series Operations¶
# Cumulative return per entity
df['cum_return'] = df.groupby(level='ticker')['return'].cumsum()
# Lagged return per entity
df['return_lag1'] = df.groupby(level='ticker')['return'].shift(1)
# Return momentum (current vs lagged)
df['momentum'] = df['return'] - df['return_lag1']
print(df[['return', 'cum_return', 'return_lag1', 'momentum']].head(15))
Wide-Format Aggregations¶
Convert to wide format for cross-entity analysis.
# Reshape to wide format
returns_wide = df['return'].unstack('ticker')
print("Wide format returns:")
print(returns_wide.head())
print()
# Correlation matrix (needs wide format)
correlation = returns_wide.corr()
print("Return correlations:")
print(correlation)
print()
# Covariance matrix
covariance = returns_wide.cov()
print("Return covariances:")
print(covariance)
Market-Wide Statistics¶
# Equal-weighted market return
market_return = df.groupby(level='date')['return'].mean()
print("Market return (equal-weighted):")
print(market_return)
# Market volume
total_volume = df.groupby(level='date')['volume'].sum()
print("\nTotal market volume:")
print(total_volume)
Advanced: CAPM-Style Analysis¶
# Add market return to panel
market = df.groupby(level='date')['return'].mean()
df['market_return'] = df.index.get_level_values('date').map(market)
# Excess return (vs market)
df['excess_return'] = df['return'] - df['market_return']
print(df[['return', 'market_return', 'excess_return']].head(10))
Beta Estimation per Entity¶
# Simple beta calculation per ticker
def calc_beta(group):
cov = group['return'].cov(group['market_return'])
var = group['market_return'].var()
return cov / var if var != 0 else np.nan
betas = df.groupby(level='ticker').apply(calc_beta)
print("Betas:")
print(betas)
Ranking Within Cross-Sections¶
# Rank returns each day
df['return_rank'] = df.groupby(level='date')['return'].rank(ascending=False)
print(df[['return', 'return_rank']].head(15))
Summary Statistics¶
def panel_summary(df, value_col='return'):
"""Generate comprehensive panel summary."""
return pd.DataFrame({
'Overall Mean': [df[value_col].mean()],
'Overall Std': [df[value_col].std()],
'Entity Count': [df.index.get_level_values(0).nunique()],
'Time Count': [df.index.get_level_values(1).nunique()],
'Total Obs': [len(df)],
'Cross-Sectional Std (avg)': [
df.groupby(level='date')[value_col].std().mean()
],
'Time-Series Std (avg)': [
df.groupby(level='ticker')[value_col].std().mean()
]
})
summary = panel_summary(df)
print(summary.T)
Aggregation Reference¶
| Goal | Method |
|---|---|
| Mean per entity | df.groupby(level='ticker').mean() |
| Mean per time | df.groupby(level='date').mean() |
| Rolling per entity | df.groupby(level='ticker').rolling(n).mean() |
| Broadcast entity mean | df.groupby(level='ticker').transform('mean') |
| Cumsum per entity | df.groupby(level='ticker').cumsum() |
| Lag per entity | df.groupby(level='ticker').shift(1) |
| Rank per time | df.groupby(level='date').rank() |
| Correlation matrix | df['col'].unstack().corr() |