stack Method¶
The stack() method pivots columns into rows, moving the innermost column level to become the innermost row index level. This is useful for converting wide-format data to long-format.
Basic Concept¶
Before stack(): After stack():
A B
0 1.0 2.0 0 A 1.0
1 3.0 4.0 B 2.0
1 A 3.0
B 4.0
Basic Usage¶
import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': [1.0, 3.0],
'B': [2.0, 4.0]
})
print("Original DataFrame:")
print(df)
print()
stacked = df.stack()
print("After stack():")
print(stacked)
print(f"\nType: {type(stacked)}")
Original DataFrame:
A B
0 1.0 2.0
1 3.0 4.0
After stack():
0 A 1.0
B 2.0
1 A 3.0
B 4.0
dtype: float64
Type: <class 'pandas.core.series.Series'>
Result is a MultiIndex Series¶
After stacking, the result is a Series with a MultiIndex:
stacked = df.stack()
print(f"Index levels: {stacked.index.names}")
print(f"Index:\n{stacked.index}")
Index levels: [None, None]
Index:
MultiIndex([(0, 'A'),
(0, 'B'),
(1, 'A'),
(1, 'B')],
)
With Named Index and Columns¶
df = pd.DataFrame(
{'A': [1, 3], 'B': [2, 4]},
index=['row1', 'row2']
)
df.columns.name = 'letter'
df.index.name = 'row'
print("Original:")
print(df)
print()
stacked = df.stack()
print("Stacked:")
print(stacked)
Original:
letter A B
row
row1 1 2
row2 3 4
Stacked:
row letter
row1 A 1
B 2
row2 A 3
B 4
dtype: int64
Handling Missing Values¶
By default, stack() drops rows with NaN values:
df = pd.DataFrame({
'A': [1.0, np.nan],
'B': [2.0, 4.0]
})
print("Original:")
print(df)
print()
# Default: dropna=True (drops NaN)
print("stack() [default]:")
print(df.stack())
print()
# Keep NaN values
print("stack(dropna=False):")
print(df.stack(dropna=False))
Original:
A B
0 1.0 2.0
1 NaN 4.0
stack() [default]:
0 A 1.0
B 2.0
1 B 4.0
dtype: float64
stack(dropna=False):
0 A 1.0
B 2.0
1 A NaN
B 4.0
dtype: float64
Multi-Level Columns¶
When columns have multiple levels, stack() operates on the innermost level:
# Create DataFrame with MultiIndex columns
columns = pd.MultiIndex.from_tuples([
('price', 'AAPL'), ('price', 'MSFT'),
('volume', 'AAPL'), ('volume', 'MSFT')
])
df = pd.DataFrame(
[[150, 300, 1000, 2000],
[151, 301, 1100, 2100]],
index=['day1', 'day2'],
columns=columns
)
print("Original (multi-level columns):")
print(df)
print()
# Stack innermost level (ticker)
stacked = df.stack()
print("After stack():")
print(stacked)
Original (multi-level columns):
price volume
AAPL MSFT AAPL MSFT
day1 150 300 1000 2000
day2 151 301 1100 2100
After stack():
price volume
day1 AAPL 150 1000
MSFT 300 2000
day2 AAPL 151 1100
MSFT 301 2100
Specifying Level to Stack¶
# Stack the outer level instead
stacked_outer = df.stack(level=0)
print("Stack outer level (level=0):")
print(stacked_outer)
Stack outer level (level=0):
AAPL MSFT
day1 price 150 300
volume 1000 2000
day2 price 151 301
volume 1100 2100
Practical Examples¶
Converting Wide Price Data to Long Format¶
# Wide format: each ticker is a column
prices_wide = pd.DataFrame({
'AAPL': [150, 151, 152],
'MSFT': [300, 301, 302],
'GOOGL': [140, 141, 142]
}, index=pd.date_range('2024-01-01', periods=3))
prices_wide.columns.name = 'ticker'
prices_wide.index.name = 'date'
print("Wide format:")
print(prices_wide)
print()
# Convert to long format
prices_long = prices_wide.stack()
prices_long.name = 'price'
print("Long format:")
print(prices_long)
Wide format:
ticker AAPL MSFT GOOGL
date
2024-01-01 150 300 140
2024-01-02 151 301 141
2024-01-03 152 302 142
Long format:
date ticker
2024-01-01 AAPL 150
MSFT 300
GOOGL 140
2024-01-02 AAPL 151
MSFT 301
GOOGL 141
2024-01-03 AAPL 152
MSFT 302
GOOGL 142
Name: price, dtype: int64
Reset to DataFrame¶
# Convert stacked Series back to DataFrame
prices_df = prices_long.reset_index()
prices_df.columns = ['date', 'ticker', 'price']
print(prices_df)
date ticker price
0 2024-01-01 AAPL 150
1 2024-01-01 MSFT 300
2 2024-01-01 GOOGL 140
3 2024-01-02 AAPL 151
...
Financial Data Reshaping¶
# OHLC data in wide format
ohlc_wide = pd.DataFrame({
('AAPL', 'open'): [149, 150],
('AAPL', 'close'): [150, 151],
('MSFT', 'open'): [299, 300],
('MSFT', 'close'): [300, 301],
}, index=['day1', 'day2'])
ohlc_wide.columns = pd.MultiIndex.from_tuples(ohlc_wide.columns)
print("Wide OHLC:")
print(ohlc_wide)
print()
# Stack to get ticker as row index
stacked = ohlc_wide.stack(level=0)
print("Stacked by ticker:")
print(stacked)
Stack vs Melt¶
Both convert wide to long format, but differently:
| Aspect | stack() | melt() |
|---|---|---|
| Input | DataFrame | DataFrame |
| Output | Series (usually) | DataFrame |
| Index | Uses existing index | Resets index |
| Column handling | Moves to index | Moves to column |
| Multi-level | Handles naturally | Requires flatten first |
df = pd.DataFrame({
'A': [1, 2],
'B': [3, 4]
}, index=['x', 'y'])
# stack: columns become inner index level
stacked = df.stack()
print("stack():")
print(stacked)
print()
# melt: columns become a column value
melted = df.reset_index().melt(id_vars='index')
print("melt():")
print(melted)
Summary¶
| Parameter | Description | Default |
|---|---|---|
level |
Which level to stack | -1 (innermost) |
dropna |
Drop rows with missing values | True |
Key Points:
- stack() pivots columns to rows
- Result is typically a Series with MultiIndex
- Operates on innermost column level by default
- Inverse operation is unstack()