Panel Aggregations¶

Panel data enables both entity-wise and time-wise aggregations. This document covers common aggregation patterns.

Mental Model

Panel aggregation has two natural directions: aggregate across time (one result per entity) or across entities (one result per time point). Use groupby(level='ticker') for entity-wise and groupby(level='date') for time-wise summaries. The level parameter in groupby replaces column names with index level names.

Setup¶

```python import pandas as pd import numpy as np

Create sample panel¶

tickers = ['AAPL', 'MSFT', 'GOOGL'] dates = pd.date_range('2024-01-01', periods=10) index = pd.MultiIndex.from_product([tickers, dates], names=['ticker', 'date'])

np.random.seed(42) df = pd.DataFrame({ 'return': np.random.randn(30) * 0.02, 'volume': np.random.randint(1000, 10000, 30) }, index=index) ```

Entity-Wise Aggregation (Across Time)¶

Aggregate each entity's time series.

Using groupby(level=)¶

```python

Mean return per ticker (across all dates)¶

mean_by_ticker = df.groupby(level='ticker').mean() print("Mean by ticker:") print(mean_by_ticker) ```

Mean by ticker: return volume ticker AAPL 0.005123 5234.500000 MSFT 0.002891 4891.300000 GOOGL -0.001234 5567.200000

Various Aggregations per Entity¶

```python

Multiple statistics per ticker¶

entity_stats = df.groupby(level='ticker').agg({ 'return': ['mean', 'std', 'min', 'max'], 'volume': ['mean', 'sum'] }) print("Entity statistics:") print(entity_stats) ```

Rolling Operations per Entity¶

```python

5-day rolling mean return per ticker¶

rolling_mean = df.groupby(level='ticker')['return'].rolling(window=5).mean() print("Rolling mean per entity:") print(rolling_mean) ```

Time-Wise Aggregation (Cross-Sectional)¶

Aggregate across entities at each time point.

Using groupby(level=)¶

```python

Mean return per date (across all tickers)¶

mean_by_date = df.groupby(level='date').mean() print("Mean by date (cross-sectional):") print(mean_by_date) ```

Mean by date (cross-sectional): return volume date 2024-01-01 0.003456 4567.333333 2024-01-02 -0.001234 5234.000000 ...

Cross-Sectional Statistics¶

```python

Cross-sectional stats each day¶

cross_sectional = df.groupby(level='date').agg({ 'return': ['mean', 'std', 'min', 'max', 'count'] }) print("Cross-sectional statistics:") print(cross_sectional) ```

Using .xs for Aggregation¶

```python

Get cross-section, then aggregate¶

jan1_returns = df.xs('2024-01-01', level='date')['return'] print(f"Jan 1 mean return: {jan1_returns.mean():.4f}") print(f"Jan 1 return std: {jan1_returns.std():.4f}") ```

Transform: Broadcast Aggregation¶

Keep original shape while adding aggregated values.

```python

Add entity mean as new column¶

df['entity_mean'] = df.groupby(level='ticker')['return'].transform('mean')

Add cross-sectional mean¶

df['time_mean'] = df.groupby(level='date')['return'].transform('mean')

Demeaned return (relative to entity mean)¶

df['return_demeaned'] = df['return'] - df['entity_mean']

print(df.head(10)) ```

Entity-Specific Time Series Operations¶

```python

Cumulative return per entity¶

df['cum_return'] = df.groupby(level='ticker')['return'].cumsum()

Lagged return per entity¶

df['return_lag1'] = df.groupby(level='ticker')['return'].shift(1)

Return momentum (current vs lagged)¶

df['momentum'] = df['return'] - df['return_lag1']

print(df[['return', 'cum_return', 'return_lag1', 'momentum']].head(15)) ```

Wide-Format Aggregations¶

Convert to wide format for cross-entity analysis.

```python

Reshape to wide format¶

returns_wide = df['return'].unstack('ticker') print("Wide format returns:") print(returns_wide.head()) print()

Correlation matrix (needs wide format)¶

correlation = returns_wide.corr() print("Return correlations:") print(correlation) print()

Covariance matrix¶

covariance = returns_wide.cov() print("Return covariances:") print(covariance) ```

Market-Wide Statistics¶

```python

Equal-weighted market return¶

market_return = df.groupby(level='date')['return'].mean() print("Market return (equal-weighted):") print(market_return)

Market volume¶

total_volume = df.groupby(level='date')['volume'].sum() print("\nTotal market volume:") print(total_volume) ```

Advanced: CAPM-Style Analysis¶

```python

Add market return to panel¶

market = df.groupby(level='date')['return'].mean() df['market_return'] = df.index.get_level_values('date').map(market)

Excess return (vs market)¶

df['excess_return'] = df['return'] - df['market_return']

print(df[['return', 'market_return', 'excess_return']].head(10)) ```

Beta Estimation per Entity¶

```python

Simple beta calculation per ticker¶

def calc_beta(group): cov = group['return'].cov(group['market_return']) var = group['market_return'].var() return cov / var if var != 0 else np.nan

betas = df.groupby(level='ticker').apply(calc_beta) print("Betas:") print(betas) ```

Ranking Within Cross-Sections¶

```python

Rank returns each day¶

df['return_rank'] = df.groupby(level='date')['return'].rank(ascending=False) print(df[['return', 'return_rank']].head(15)) ```

Summary Statistics¶

```python def panel_summary(df, value_col='return'): """Generate comprehensive panel summary.""" return pd.DataFrame({ 'Overall Mean': [df[value_col].mean()], 'Overall Std': [df[value_col].std()], 'Entity Count': [df.index.get_level_values(0).nunique()], 'Time Count': [df.index.get_level_values(1).nunique()], 'Total Obs': [len(df)], 'Cross-Sectional Std (avg)': [ df.groupby(level='date')[value_col].std().mean() ], 'Time-Series Std (avg)': [ df.groupby(level='ticker')[value_col].std().mean() ] })

summary = panel_summary(df) print(summary.T) ```

Aggregation Reference¶

Goal	Method
Mean per entity	`df.groupby(level='ticker').mean()`
Mean per time	`df.groupby(level='date').mean()`
Rolling per entity	`df.groupby(level='ticker').rolling(n).mean()`
Broadcast entity mean	`df.groupby(level='ticker').transform('mean')`
Cumsum per entity	`df.groupby(level='ticker').cumsum()`
Lag per entity	`df.groupby(level='ticker').shift(1)`
Rank per time	`df.groupby(level='date').rank()`
Correlation matrix	`df['col'].unstack().corr()`

Exercises¶

Exercise 1. Write code that computes the mean across entities (panel groups) using groupby().mean() on a MultiIndex DataFrame.

Solution to Exercise 1

```python import pandas as pd import numpy as np

Solution for the specific exercise¶

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```

Exercise 2. Explain the difference between aggregating across time (within-entity) and across entities (within-time) in panel data.

Solution to Exercise 2

See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.

Exercise 3. Write code that computes both the within-entity mean and the between-entity mean for a panel dataset.

Solution to Exercise 3

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```

Exercise 4. Create panel data and compute the rolling mean for each entity separately using groupby().rolling().

Solution to Exercise 4

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```