Skip to content

Introduction to Panel Data

Panel data (also called longitudinal data) combines cross-sectional and time-series dimensions. Each observation is identified by two keys: an entity (individual, firm, country) and a time point.

What is Panel Data?

Panel data has two dimensions: - Cross-sectional: Different entities (stocks, companies, individuals) - Time-series: Repeated observations over time

┌─────────────────────────────────────────────────┐
│                  Panel Data                      │
├─────────────────────────────────────────────────┤
│    Entity (i)     Time (t)        Value         │
│    ──────────     ────────        ─────         │
│    AAPL           2024-01-01      \$150.00       │
│    AAPL           2024-01-02      \$151.50       │
│    AAPL           2024-01-03      \$149.80       │
│    MSFT           2024-01-01      \$300.00       │
│    MSFT           2024-01-02      \$302.00       │
│    MSFT           2024-01-03      \$301.50       │
│    GOOGL          2024-01-01      \$140.00       │
│    GOOGL          2024-01-02      \$141.20       │
│    GOOGL          2024-01-03      \$142.00       │
└─────────────────────────────────────────────────┘

Panel Data vs Other Data Types

Data Type Entities Time Points Example
Cross-sectional Multiple Single Survey at one point
Time-series Single Multiple One stock's history
Panel Multiple Multiple Multiple stocks over time

Why Use Panel Data?

1. More Information

Panel data contains more observations than cross-sectional or time-series alone:

Cross-sectional: 100 stocks × 1 day = 100 observations
Time-series: 1 stock × 252 days = 252 observations
Panel: 100 stocks × 252 days = 25,200 observations

2. Control for Unobserved Heterogeneity

Panel data allows fixed effects models that control for entity-specific factors: - Stock-specific characteristics (management quality, brand value) - Time-specific effects (market conditions, regulations)

3. Study Dynamics

Track how entities change over time.

Examples of Panel Data

Domain Entities Time Variables
Finance Stocks Days Returns, Volume
Economics Countries Years GDP, Inflation
Healthcare Patients Visits Vitals, Tests
Education Students Grades Scores, Attendance

Balanced vs Unbalanced Panels

Balanced Panel

Every entity has observations for every time period.

Unbalanced Panel

Some entity-time combinations are missing (common in real data).

Long vs Wide Format

Long Format (Standard)

ticker  date        return
AAPL    2024-01-01  0.01
AAPL    2024-01-02  0.02
MSFT    2024-01-01  0.015

Wide Format

date        AAPL   MSFT
2024-01-01  0.01   0.015
2024-01-02  0.02   0.018

Panel Data in Pandas

Pandas handles panel data using MultiIndex:

import pandas as pd
import numpy as np

tickers = ['AAPL', 'MSFT', 'GOOGL']
dates = pd.date_range('2024-01-01', periods=5)

index = pd.MultiIndex.from_product(
    [tickers, dates], 
    names=['ticker', 'date']
)

returns = pd.Series(np.random.randn(15) * 0.02, index=index, name='return')
print(returns)