unstack Method¶
The unstack() method pivots rows into columns, moving an index level to become a column level. This is useful for converting long-format data to wide-format.
Mental Model
unstack pulls an index level out of the row axis and spreads its unique values across as new column headers. The result is wider and shorter -- more columns, fewer rows. It is the inverse of stack and the index-based counterpart of pivot.
Basic Concept¶
Before unstack(): After unstack():
A B
0 A 1.0 0 1.0 2.0
B 2.0 1 3.0 4.0
1 A 3.0
B 4.0
Basic Usage¶
```python import pandas as pd import numpy as np
Create a Series with MultiIndex¶
index = pd.MultiIndex.from_tuples([ (0, 'A'), (0, 'B'), (1, 'A'), (1, 'B') ]) s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)
print("Original Series:") print(s) print()
unstacked = s.unstack() print("After unstack():") print(unstacked) ```
``` Original Series: 0 A 1.0 B 2.0 1 A 3.0 B 4.0 dtype: float64
After unstack(): A B 0 1.0 2.0 1 3.0 4.0 ```
Specifying Level to Unstack¶
By default, unstack() operates on the innermost level (-1):
```python
Create MultiIndex with named levels¶
index = pd.MultiIndex.from_tuples([ ('row1', 'A'), ('row1', 'B'), ('row2', 'A'), ('row2', 'B') ], names=['row', 'col'])
s = pd.Series([1, 2, 3, 4], index=index)
print("Original:") print(s) print()
Unstack inner level (default)¶
print("unstack() [default, level=-1]:") print(s.unstack()) print()
Unstack outer level¶
print("unstack(level=0):") print(s.unstack(level=0)) ```
``` Original: row col row1 A 1 B 2 row2 A 3 B 4 dtype: int64
unstack() [default, level=-1]:
col A B
row
row1 1 2
row2 3 4
unstack(level=0):
row row1 row2
col
A 1 3
B 2 4
```
Unstack by Level Name¶
```python
Unstack by name instead of position¶
print(s.unstack(level='col')) print(s.unstack(level='row')) ```
Handling Missing Values¶
unstack() introduces NaN when combinations don't exist:
```python
Incomplete MultiIndex¶
index = pd.MultiIndex.from_tuples([ ('A', 1), ('A', 2), ('B', 1) # Missing ('B', 2) ]) s = pd.Series([10, 20, 30], index=index)
print("Original (missing B-2):") print(s) print()
print("After unstack():") print(s.unstack()) ```
``` Original (missing B-2): A 1 10 2 20 B 1 30 dtype: int64
After unstack(): 1 2 A 10.0 20.0 B 30.0 NaN ```
Fill Missing Values¶
```python
Fill NaN with a specific value¶
print(s.unstack(fill_value=0)) ```
1 2
A 10 20
B 30 0
DataFrame unstack¶
Works on DataFrame index levels:
```python
DataFrame with MultiIndex¶
index = pd.MultiIndex.from_tuples([ ('AAPL', 'day1'), ('AAPL', 'day2'), ('MSFT', 'day1'), ('MSFT', 'day2') ], names=['ticker', 'date'])
df = pd.DataFrame({ 'price': [150, 151, 300, 301], 'volume': [1000, 1100, 2000, 2100] }, index=index)
print("Original:") print(df) print()
Unstack the date level¶
print("unstack('date'):") print(df.unstack('date')) ```
```
Original:
price volume
ticker date
AAPL day1 150 1000
day2 151 1100
MSFT day1 300 2000
day2 301 2100
unstack('date'):
price volume
date day1 day2 day1 day2
ticker
AAPL 150 151 1000 1100
MSFT 300 301 2000 2100
```
Practical Examples¶
Panel Data Reshaping¶
```python
Long format panel data¶
np.random.seed(42) tickers = ['AAPL', 'MSFT', 'GOOGL'] dates = pd.date_range('2024-01-01', periods=5)
Create MultiIndex panel¶
index = pd.MultiIndex.from_product([tickers, dates], names=['ticker', 'date']) returns = pd.Series(np.random.randn(15) * 0.02, index=index, name='return')
print("Long format (panel):") print(returns.head(10)) print()
Unstack to wide format (each ticker as column)¶
returns_wide = returns.unstack('ticker') print("Wide format:") print(returns_wide) ```
Cross-Sectional Analysis¶
```python
Wide format is useful for correlation analysis¶
correlation = returns_wide.corr() print("Correlation matrix:") print(correlation) ```
Time Series Pivot¶
```python
Data by sector and date¶
index = pd.MultiIndex.from_product([ ['Tech', 'Finance', 'Health'], pd.date_range('2024-01-01', periods=3) ], names=['sector', 'date'])
df = pd.DataFrame({ 'returns': np.random.randn(9) * 0.02, 'volume': np.random.randint(100, 1000, 9) }, index=index)
print("Original:") print(df) print()
Pivot: dates as columns, sectors as rows¶
returns_pivot = df['returns'].unstack('date') print("Returns pivoted:") print(returns_pivot) ```
Multiple Unstacks¶
You can unstack multiple times:
```python
Three-level MultiIndex¶
index = pd.MultiIndex.from_product([ ['2024', '2025'], ['Q1', 'Q2'], ['AAPL', 'MSFT'] ], names=['year', 'quarter', 'ticker'])
s = pd.Series(range(8), index=index) print("Original (3 levels):") print(s) print()
Unstack once¶
print("After one unstack():") print(s.unstack()) print()
Unstack twice¶
print("After two unstacks:") print(s.unstack().unstack()) ```
unstack vs pivot¶
Both reshape data, but differently:
| Aspect | unstack() | pivot() |
|---|---|---|
| Input | Series/DataFrame with MultiIndex | DataFrame with columns |
| Source | Index levels | Column values |
| Operation | Index level → columns | Column → columns |
```python
Using unstack (from MultiIndex)¶
s = pd.Series([1, 2, 3, 4], index=pd.MultiIndex.from_product([['A', 'B'], ['x', 'y']])) print("unstack():") print(s.unstack()) print()
Using pivot (from columns)¶
df = pd.DataFrame({ 'row': ['A', 'A', 'B', 'B'], 'col': ['x', 'y', 'x', 'y'], 'value': [1, 2, 3, 4] }) print("pivot():") print(df.pivot(index='row', columns='col', values='value')) ```
stack and unstack are Inverses¶
```python df = pd.DataFrame({ 'A': [1, 2], 'B': [3, 4] }, index=['x', 'y'])
print("Original:") print(df) print()
Stack then unstack returns original¶
stacked = df.stack() unstacked = stacked.unstack() print("After stack().unstack():") print(unstacked) ```
Summary¶
| Parameter | Description | Default |
|---|---|---|
level |
Which level to unstack | -1 (innermost) |
fill_value |
Value for missing combinations | NaN |
Key Points:
unstack()pivots index level to columns- Inverse of
stack() - Creates NaN for missing combinations
- Use
fill_valueto handle missing data - Essential for converting long to wide format
Exercises¶
Exercise 1. Write code that creates a Series with a MultiIndex and uses .unstack() to convert it to a DataFrame.
Solution to Exercise 1
```python import pandas as pd import numpy as np
Solution for the specific exercise¶
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```
Exercise 2. Explain what .unstack() does. Which index level is moved to columns by default?
Solution to Exercise 2
See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.
Exercise 3. Create a MultiIndex DataFrame and use .unstack(fill_value=0) to fill NaN values created by the reshape.
Solution to Exercise 3
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```
Exercise 4. Write code that unstacks a grouped aggregation result (e.g., df.groupby(['a', 'b']).mean().unstack()).
Solution to Exercise 4
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```