stack Method¶
The stack() method pivots columns into rows, moving the innermost column level to become the innermost row index level. This is useful for converting wide-format data to long-format.
Mental Model
stack takes column headers and pushes them down into the row index, creating a MultiIndex. The result is longer and narrower -- fewer columns, more rows. It is the index-based counterpart of melt, and its inverse is unstack.
Basic Concept¶
Before stack(): After stack():
A B
0 1.0 2.0 0 A 1.0
1 3.0 4.0 B 2.0
1 A 3.0
B 4.0
Basic Usage¶
```python import pandas as pd import numpy as np
df = pd.DataFrame({ 'A': [1.0, 3.0], 'B': [2.0, 4.0] }) print("Original DataFrame:") print(df) print()
stacked = df.stack() print("After stack():") print(stacked) print(f"\nType: {type(stacked)}") ```
``` Original DataFrame: A B 0 1.0 2.0 1 3.0 4.0
After stack(): 0 A 1.0 B 2.0 1 A 3.0 B 4.0 dtype: float64
Type:
Result is a MultiIndex Series¶
After stacking, the result is a Series with a MultiIndex:
python
stacked = df.stack()
print(f"Index levels: {stacked.index.names}")
print(f"Index:\n{stacked.index}")
Index levels: [None, None]
Index:
MultiIndex([(0, 'A'),
(0, 'B'),
(1, 'A'),
(1, 'B')],
)
With Named Index and Columns¶
```python df = pd.DataFrame( {'A': [1, 3], 'B': [2, 4]}, index=['row1', 'row2'] ) df.columns.name = 'letter' df.index.name = 'row'
print("Original:") print(df) print()
stacked = df.stack() print("Stacked:") print(stacked) ```
```
Original:
letter A B
row
row1 1 2
row2 3 4
Stacked: row letter row1 A 1 B 2 row2 A 3 B 4 dtype: int64 ```
Handling Missing Values¶
By default, stack() drops rows with NaN values:
```python df = pd.DataFrame({ 'A': [1.0, np.nan], 'B': [2.0, 4.0] }) print("Original:") print(df) print()
Default: dropna=True (drops NaN)¶
print("stack() [default]:") print(df.stack()) print()
Keep NaN values¶
print("stack(dropna=False):") print(df.stack(dropna=False)) ```
``` Original: A B 0 1.0 2.0 1 NaN 4.0
stack() [default]: 0 A 1.0 B 2.0 1 B 4.0 dtype: float64
stack(dropna=False): 0 A 1.0 B 2.0 1 A NaN B 4.0 dtype: float64 ```
Multi-Level Columns¶
When columns have multiple levels, stack() operates on the innermost level:
```python
Create DataFrame with MultiIndex columns¶
columns = pd.MultiIndex.from_tuples([ ('price', 'AAPL'), ('price', 'MSFT'), ('volume', 'AAPL'), ('volume', 'MSFT') ]) df = pd.DataFrame( [[150, 300, 1000, 2000], [151, 301, 1100, 2100]], index=['day1', 'day2'], columns=columns )
print("Original (multi-level columns):") print(df) print()
Stack innermost level (ticker)¶
stacked = df.stack() print("After stack():") print(stacked) ```
```
Original (multi-level columns):
price volume
AAPL MSFT AAPL MSFT
day1 150 300 1000 2000
day2 151 301 1100 2100
After stack(): price volume day1 AAPL 150 1000 MSFT 300 2000 day2 AAPL 151 1100 MSFT 301 2100 ```
Specifying Level to Stack¶
```python
Stack the outer level instead¶
stacked_outer = df.stack(level=0) print("Stack outer level (level=0):") print(stacked_outer) ```
Stack outer level (level=0):
AAPL MSFT
day1 price 150 300
volume 1000 2000
day2 price 151 301
volume 1100 2100
Practical Examples¶
Converting Wide Price Data to Long Format¶
```python
Wide format: each ticker is a column¶
prices_wide = pd.DataFrame({ 'AAPL': [150, 151, 152], 'MSFT': [300, 301, 302], 'GOOGL': [140, 141, 142] }, index=pd.date_range('2024-01-01', periods=3)) prices_wide.columns.name = 'ticker' prices_wide.index.name = 'date'
print("Wide format:") print(prices_wide) print()
Convert to long format¶
prices_long = prices_wide.stack() prices_long.name = 'price' print("Long format:") print(prices_long) ```
```
Wide format:
ticker AAPL MSFT GOOGL
date
2024-01-01 150 300 140
2024-01-02 151 301 141
2024-01-03 152 302 142
Long format: date ticker 2024-01-01 AAPL 150 MSFT 300 GOOGL 140 2024-01-02 AAPL 151 MSFT 301 GOOGL 141 2024-01-03 AAPL 152 MSFT 302 GOOGL 142 Name: price, dtype: int64 ```
Reset to DataFrame¶
```python
Convert stacked Series back to DataFrame¶
prices_df = prices_long.reset_index() prices_df.columns = ['date', 'ticker', 'price'] print(prices_df) ```
date ticker price
0 2024-01-01 AAPL 150
1 2024-01-01 MSFT 300
2 2024-01-01 GOOGL 140
3 2024-01-02 AAPL 151
...
Financial Data Reshaping¶
```python
OHLC data in wide format¶
ohlc_wide = pd.DataFrame({ ('AAPL', 'open'): [149, 150], ('AAPL', 'close'): [150, 151], ('MSFT', 'open'): [299, 300], ('MSFT', 'close'): [300, 301], }, index=['day1', 'day2']) ohlc_wide.columns = pd.MultiIndex.from_tuples(ohlc_wide.columns)
print("Wide OHLC:") print(ohlc_wide) print()
Stack to get ticker as row index¶
stacked = ohlc_wide.stack(level=0) print("Stacked by ticker:") print(stacked) ```
Stack vs Melt¶
Both convert wide to long format, but differently:
| Aspect | stack() | melt() |
|---|---|---|
| Input | DataFrame | DataFrame |
| Output | Series (usually) | DataFrame |
| Index | Uses existing index | Resets index |
| Column handling | Moves to index | Moves to column |
| Multi-level | Handles naturally | Requires flatten first |
```python df = pd.DataFrame({ 'A': [1, 2], 'B': [3, 4] }, index=['x', 'y'])
stack: columns become inner index level¶
stacked = df.stack() print("stack():") print(stacked) print()
melt: columns become a column value¶
melted = df.reset_index().melt(id_vars='index') print("melt():") print(melted) ```
Summary¶
| Parameter | Description | Default |
|---|---|---|
level |
Which level to stack | -1 (innermost) |
dropna |
Drop rows with missing values | True |
Key Points:
stack()pivots columns to rows- Result is typically a Series with MultiIndex
- Operates on innermost column level by default
- Inverse operation is
unstack()
Exercises¶
Exercise 1. Write code that uses .stack() on a DataFrame to move column labels into the index, creating a MultiIndex Series.
Solution to Exercise 1
```python import pandas as pd import numpy as np
Solution for the specific exercise¶
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```
Exercise 2. Explain the relationship between .stack() and .unstack(). What does each do?
Solution to Exercise 2
See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.
Exercise 3. Create a DataFrame with MultiIndex columns and use .stack(level=0) and .stack(level=1) to show the difference.
Solution to Exercise 3
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```
Exercise 4. Write code showing how .stack() handles NaN values. What does dropna=False do?
Solution to Exercise 4
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```