Reshaping Panel Data¶

Panel data often needs reshaping between long and wide formats for different analyses. This document covers common reshaping operations.

Mental Model

Long format has one row per (entity, time) observation -- best for storage and groupby. Wide format spreads entities or time periods across columns -- best for correlation analysis and plotting. unstack goes long-to-wide, stack goes wide-to-long, and pivot/melt offer column-based alternatives. Choose the shape that fits your next operation.

Long vs Wide Format¶

Long Format (Standard Panel)¶

Each row is one observation (entity-time)
Best for storage and filtering
Natural for groupby operations

Wide Format (Cross-Sectional)¶

Rows are time periods, columns are entities
Best for correlation/covariance analysis
Natural for matrix operations

Setup¶

```python import pandas as pd import numpy as np

Create sample panel data¶

tickers = ['AAPL', 'MSFT', 'GOOGL'] dates = pd.date_range('2024-01-01', periods=5) index = pd.MultiIndex.from_product([tickers, dates], names=['ticker', 'date'])

np.random.seed(42) panel_long = pd.DataFrame({ 'return': np.random.randn(15) * 0.02, 'volume': np.random.randint(1000, 10000, 15) }, index=index)

print("Long format:") print(panel_long) ```

Long to Wide: unstack()¶

Single Variable¶

```python

Returns in wide format¶

returns_wide = panel_long['return'].unstack('ticker') print("Returns (wide format):") print(returns_wide) ```

ticker AAPL MSFT GOOGL date 2024-01-01 0.009934 -0.003129 -0.018867 2024-01-02 -0.002765 0.015335 0.029389 2024-01-03 0.012936 0.000296 0.003031 2024-01-04 0.030486 -0.002013 -0.009299 2024-01-05 -0.004675 0.002826 -0.007218

Multiple Variables¶

```python

Both return and volume in wide format¶

panel_wide = panel_long.unstack('ticker') print("Panel (wide format):") print(panel_wide) ```

return volume ticker AAPL MSFT GOOGL AAPL MSFT GOOGL date 2024-01-01 0.009934 -0.003129 -0.018867 5765 3109 3046 2024-01-02 -0.002765 0.015335 0.029389 6274 2239 3856 ...

Unstack by Time Instead¶

```python

Tickers as rows, dates as columns¶

by_date = panel_long['return'].unstack('date') print("By date (wide format):") print(by_date) ```

Wide to Long: stack()¶

```python

Convert back to long format¶

returns_long = returns_wide.stack() returns_long.name = 'return' print("Back to long format:") print(returns_long) ```

Swap Level Order¶

```python

Change from (date, ticker) to (ticker, date)¶

returns_long_reordered = returns_long.swaplevel().sort_index() print("Reordered (ticker first):") print(returns_long_reordered) ```

reset_index() for Flat Format¶

```python

Completely flat DataFrame¶

flat = panel_long.reset_index() print("Flat format:") print(flat) ```

ticker date return volume 0 AAPL 2024-01-01 0.009934 5765 1 AAPL 2024-01-02 -0.002765 6274 2 AAPL 2024-01-03 0.012936 2627 ...

pivot() for Reshaping¶

Alternative to unstack when working with flat data:

```python

From flat format¶

flat = panel_long.reset_index()

Pivot to wide¶

wide = flat.pivot(index='date', columns='ticker', values='return') print("Pivoted:") print(wide) ```

melt() for Unpivoting¶

```python

From wide format back to long¶

melted = returns_wide.reset_index().melt( id_vars='date', var_name='ticker', value_name='return' ) print("Melted:") print(melted.head(10)) ```

Practical Use Cases¶

Correlation Analysis (Requires Wide)¶

```python

Need wide format for correlation¶

returns_wide = panel_long['return'].unstack('ticker')

Calculate correlation matrix¶

corr_matrix = returns_wide.corr() print("Correlation matrix:") print(corr_matrix) ```

Covariance Matrix¶

```python

Annualized covariance¶

cov_matrix = returns_wide.cov() * 252 print("Annualized covariance matrix:") print(cov_matrix) ```

Portfolio Analysis¶

```python

Define weights¶

weights = np.array([0.4, 0.35, 0.25]) # AAPL, GOOGL, MSFT order

Portfolio return (requires wide format)¶

portfolio_return = returns_wide.dot(weights) print("Portfolio return:") print(portfolio_return)

Portfolio variance¶

portfolio_var = weights @ cov_matrix @ weights print(f"\nPortfolio variance: {portfolio_var:.6f}") ```

Rolling Correlation¶

```python

Rolling 20-day correlation (requires wide)¶

rolling_corr = returns_wide['AAPL'].rolling(3).corr(returns_wide['MSFT']) print("Rolling correlation AAPL-MSFT:") print(rolling_corr) ```

Cross-Sectional Regression¶

```python

At each time, regress returns on a factor¶

Wide format makes this natural¶

Add a "factor" column¶

returns_wide['factor'] = np.random.randn(len(returns_wide)) * 0.01

For each date, calculate betas¶

from scipy import stats

betas = {} for ticker in ['AAPL', 'MSFT', 'GOOGL']: slope, , , _, _ = stats.linregress( returns_wide['factor'], returns_wide[ticker] ) betas[ticker] = slope

print("Factor betas:") print(pd.Series(betas)) ```

Three-Level Panel Reshaping¶

```python

Create 3-level panel: sector, ticker, date¶

sectors = {'Tech': ['AAPL', 'MSFT'], 'Finance': ['JPM', 'BAC']} dates = pd.date_range('2024-01-01', periods=3)

data = [] for sector, tickers in sectors.items(): for ticker in tickers: for date in dates: data.append({ 'sector': sector, 'ticker': ticker, 'date': date, 'return': np.random.randn() * 0.02 })

df = pd.DataFrame(data).set_index(['sector', 'ticker', 'date']) print("3-level panel:") print(df) print()

Reshape: dates as columns¶

wide_by_date = df['return'].unstack('date') print("Unstacked by date:") print(wide_by_date) print()

Reshape: tickers as columns¶

wide_by_ticker = df['return'].unstack('ticker') print("Unstacked by ticker:") print(wide_by_ticker) ```

Summary of Reshaping Operations¶

From	To	Method
Long (MultiIndex)	Wide (columns = entities)	`.unstack('ticker')`
Wide	Long (MultiIndex)	`.stack()`
Long (MultiIndex)	Flat DataFrame	`.reset_index()`
Flat	Long (MultiIndex)	`.set_index([...])`
Flat	Wide	`.pivot(index, columns, values)`
Wide	Flat	`.melt(id_vars, var_name, value_name)`

Best Practices¶

Use long format for storage - more flexible, handles missing data
Convert to wide for matrix operations - correlation, portfolio math
Name index levels - makes reshaping code clearer
Sort after reshaping - ensures consistent ordering
Check for NaN after unstack - unbalanced panels create missing values

Exercises¶

Exercise 1. Write code that converts panel data from long format to wide format using pivot() or unstack().

Solution to Exercise 1

```python import pandas as pd import numpy as np

Solution for the specific exercise¶

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```

Exercise 2. Explain the relationship between stack(), unstack(), melt(), and pivot() in the context of reshaping panel data.

Solution to Exercise 2

See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.

Exercise 3. Write code that uses unstack() to convert a MultiIndex DataFrame to a wide-format DataFrame.

Solution to Exercise 3

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```

Exercise 4. Create wide-format panel data and convert it to long format using melt() or stack().

Solution to Exercise 4

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```