pivot Method
The pivot() method reshapes data from long format to wide format, spreading values across columns.
Mental Model
pivot takes three column names (index, columns, values) and reshapes the table. The unique values of the "columns" column become the new column headers, the unique values of the "index" column become the row labels, and the "values" column fills the cells. It is a pure reshape with no aggregation, so every (index, columns) pair must be unique. If duplicates exist, use pivot_table instead.
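For a single values column, this mental model can be made concrete. Moving index and columns into a MultiIndex and then unstacking the columns level gives the same table. The sketch below checks that equivalence on the sample data used in Basic Usage. It illustrates why pivot cannot aggregate: each (index, columns) pair must map to exactly one cell. It is not meant as documentation of pandas' internal code path.

```python
import pandas as pd

# Same long-format readings as in Basic Usage; every (date, city) pair is unique
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72],
})

# The built-in reshape
wide = df.pivot(index='date', columns='city', values='temperature')

# The same reshape spelled out: put (date, city) into a MultiIndex,
# then move the 'city' level from the rows into the columns
manual = df.set_index(['date', 'city'])['temperature'].unstack('city')

print(wide.equals(manual))
```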
Basic Usage
Pivot a DataFrame.
1. Long to Wide
```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72]
})
print("Long format:")
print(df)

wide = df.pivot(index='date', columns='city', values='temperature')
print("\nWide format:")
print(wide)
```

```
Long format:
         date city  temperature
0  2024-01-01   NY           30
1  2024-01-01   LA           70
2  2024-01-02   NY           32
3  2024-01-02   LA           72

Wide format:
city        LA  NY
date
2024-01-01  70  30
2024-01-02  72  32
```
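2. Parameters
```python
# index: column whose values become the row index
#        (optional; defaults to the DataFrame's existing index)
# columns: column whose values become the column headers (required)
# values: column containing the data values that fill the cells
#         (optional; defaults to all remaining columns)
```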
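When values is omitted, pandas spreads every column that is not used as index or columns. The result then has two-level (MultiIndex) column labels: the original column name on the outer level and the pivoted value on the inner one. A minimal sketch follows. The humidity column is a hypothetical addition for illustration only.

```python
import pandas as pd

# Sample readings plus a hypothetical 'humidity' column
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72],
    'humidity': [60, 40, 65, 45],
})

# values omitted: both 'temperature' and 'humidity' are pivoted
wide = df.pivot(index='date', columns='city')

# Columns are now (original column, city) pairs
print(wide.columns.tolist())
print(wide[('humidity', 'NY')])
```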
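3. Result Structure
Each unique value in 'city' becomes a column, and each unique 'date' becomes a row. Each cell holds the single 'temperature' recorded for that (date, city) pair. The new column labels come out sorted.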
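The structure of the pivoted frame can be inspected directly. This sketch rebuilds the Basic Usage example and prints its axes, their names, and one cell.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72],
})
wide = df.pivot(index='date', columns='city', values='temperature')

# Unique 'city' values become the (sorted) column labels
print(wide.columns.tolist())  # ['LA', 'NY']
# Unique 'date' values become the row labels
print(wide.index.tolist())    # ['2024-01-01', '2024-01-02']
# Both axes keep the original column names as their names
print(wide.columns.name, wide.index.name)  # city date
# Each cell holds the one matching 'temperature' reading
print(wide.loc['2024-01-02', 'NY'])  # 32
```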
LeetCode Example: Department Table
Reshape department revenue by month.
1. Sample Data
```python
department = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb'],
    'revenue': [100, 150, 200, 80, 120]
})
```
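2. Pivot Transform
```python
bymonth = department.pivot(
    index='id',
    columns='month',
    values='revenue'
)
print(bymonth)
```

```
month    Feb    Jan    Mar
id
1      150.0  100.0  200.0
2      120.0   80.0    NaN
```

The (2, Mar) pair has no row, so its cell is NaN. That missing value is why the revenue values print as floats.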
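pivot sorts the new column labels, so the month names come out alphabetically (Feb, Jan, Mar) rather than in calendar order. One way to restore calendar order is to reindex the columns. The sketch below uses a hand-written month list, which assumes only Jan to Mar occur. Any listed month that is absent from the data would appear as an all-NaN column.

```python
import pandas as pd

department = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb'],
    'revenue': [100, 150, 200, 80, 120]
})
bymonth = department.pivot(index='id', columns='month', values='revenue')

# Hypothetical calendar order covering the months in this sample
month_order = ['Jan', 'Feb', 'Mar']
bymonth = bymonth.reindex(columns=month_order)
print(bymonth)
```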
3. Handle Missing
```python
# Treat a missing (id, month) pair as zero revenue
bymonth = bymonth.fillna(0)
```
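After fillna(0), the table still holds floats (for example 0.0 and 200.0). If whole-number revenue is expected, one option is to cast once the gaps are filled. The sketch below assumes every value fits in a plain integer type.

```python
import pandas as pd

department = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb'],
    'revenue': [100, 150, 200, 80, 120]
})
bymonth = department.pivot(index='id', columns='month', values='revenue')

# Fill the missing (2, Mar) cell first, then cast the floats back to integers
bymonth = bymonth.fillna(0).astype(int)
print(bymonth)
print(bymonth.dtypes)
```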
Limitations
pivot has strict requirements.
1. No Duplicate Entries
```python
import pandas as pd

# pivot fails if an (index, columns) combination appears more than once
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01'],
    'city': ['NY', 'NY'],  # Duplicate!
    'temp': [30, 31]
})

df.pivot(index='date', columns='city', values='temp')  # raises ValueError
```
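Duplicates can be detected before calling pivot, so that the failure does not surface midway through a pipeline. The sketch below uses DataFrame.duplicated on the pair of columns that will become index and columns. The can_pivot flag is a hypothetical name introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01'],
    'city': ['NY', 'NY'],
    'temp': [30, 31],
})

# keep=False flags every row that shares its (date, city) pair with another row
dupes = df.duplicated(subset=['date', 'city'], keep=False)
print(df[dupes])

# pivot is only safe when no (date, city) pair repeats
can_pivot = not dupes.any()
print("safe to pivot:", can_pivot)
```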
2. Use pivot_table for Duplicates
```python
# pivot_table handles duplicates by aggregating them
df.pivot_table(index='date', columns='city', values='temp', aggfunc='mean')
```
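Conceptually, pivot_table first aggregates each (index, columns) group and then reshapes the result. The sketch below reruns the duplicate example and compares it with an explicit groupby followed by unstack. Treat the groupby version as an illustration of that idea.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01'],
    'city': ['NY', 'NY'],
    'temp': [30, 31],
})

# The two NY readings for 2024-01-01 are averaged into one cell
table = df.pivot_table(index='date', columns='city', values='temp', aggfunc='mean')
print(table)

# The same idea spelled out: aggregate each group, then reshape
grouped = df.groupby(['date', 'city'])['temp'].mean().unstack('city')
print(grouped)
```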
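3. Single Value Required
Each (index, columns) pair may appear at most once. A pair that never appears is not an error: its cell is filled with NaN, as with Mar for id 2 in the department example.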
Financial Example
Pivot stock price data.
1. Sample Data
```python
prices = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'ticker': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'close': [150, 350, 152, 355]
})
```
2. Pivot to Wide
```python
price_matrix = prices.pivot(
    index='date',
    columns='ticker',
    values='close'
)
```
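The sample keeps dates as strings. Time-series work such as resampling or date-range slicing usually works better with real dates. Parsing them before pivoting makes the pivoted index a DatetimeIndex. A sketch of that extra step:

```python
import pandas as pd

prices = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'ticker': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'close': [150, 350, 152, 355]
})

# Parse the date strings so the pivoted index becomes a DatetimeIndex
prices['date'] = pd.to_datetime(prices['date'])
price_matrix = prices.pivot(index='date', columns='ticker', values='close')

print(type(price_matrix.index).__name__)
print(price_matrix)
```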
3. Use for Analysis
```python
# Calculate correlation between the tickers
price_matrix.corr()

# Calculate period-over-period returns
price_matrix.pct_change()
```
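With only two dates in the sample, these calls are illustrative only: any two price series that both rise over two points have a correlation of exactly 1.0. Correlations are more commonly computed on returns than on raw price levels. The sketch below chains the two calls that way. Because it needs more than two rows, it uses a few hypothetical closing prices in the same long layout.

```python
import pandas as pd

# Hypothetical closing prices over four days
prices = pd.DataFrame({
    'date': (['2024-01-01'] * 2 + ['2024-01-02'] * 2
             + ['2024-01-03'] * 2 + ['2024-01-04'] * 2),
    'ticker': ['AAPL', 'MSFT'] * 4,
    'close': [150, 350, 152, 355, 151, 352, 155, 360],
})
price_matrix = prices.pivot(index='date', columns='ticker', values='close')

# Day-over-day returns; the first row has no prior day, so drop it
returns = price_matrix.pct_change().dropna()

# Correlation of returns between the tickers
print(returns.corr())
```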
reset_index After Pivot
Flatten the result.
1. Index as Column
```python
# df is the long-format temperature frame from Basic Usage
result = df.pivot(index='date', columns='city', values='temperature')
result = result.reset_index()
```
2. Remove Column Name
```python
# reset_index keeps the 'city' name on the columns axis; clear it
result.columns.name = None
```
3. Rename Columns
```python
# Positional rename: assumes the columns are date, LA, NY in that order
result.columns = ['date', 'los_angeles', 'new_york']
```
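The three steps can also be chained. Renaming with a dict maps each old label to its new name, so it does not depend on the column order that pivot happened to produce. By contrast, positional assignment silently mislabels columns if that order changes. A sketch using the Basic Usage data:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72],
})

flat = (
    df.pivot(index='date', columns='city', values='temperature')
      .reset_index()              # 'date' back to a regular column
      .rename_axis(columns=None)  # drop the leftover 'city' axis name
      .rename(columns={'LA': 'los_angeles', 'NY': 'new_york'})  # by label, not position
)
print(flat)
```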
Exercises
Exercise 1. Create a DataFrame with columns ['date', 'city', 'temperature'] and use .pivot() to reshape it so each city becomes a column.
Solution to Exercise 1
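```python
import pandas as pd

# Long format: one row per (date, city) reading
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72]
})

# Each city becomes its own column
wide = df.pivot(index='date', columns='city', values='temperature')
print(wide)
```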
Exercise 2. Explain the three required parameters of .pivot(): index, columns, and values.
Solution to Exercise 2
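index names the column whose unique values become the row labels of the result. columns names the column whose unique values become the new column headers. values names the column whose entries fill the cells, one entry per (index, columns) pair. In pandas, columns is required, while index and values are optional. Without index, the existing row index is used. Without values, every remaining column is pivoted, which produces two-level column labels.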
Exercise 3. Write code showing that .pivot() raises an error when there are duplicate entries for the same index-column combination.
Solution to Exercise 3
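```python
import pandas as pd

# The (date, city) pair ('2024-01-01', 'NY') appears twice
df = pd.DataFrame({'date': ['2024-01-01', '2024-01-01'],
                   'city': ['NY', 'NY'],
                   'temp': [30, 31]})

try:
    df.pivot(index='date', columns='city', values='temp')
except ValueError as err:
    print("pivot failed:", err)
```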
Exercise 4. Create a pivoted DataFrame and use .melt() to convert it back to the original long format.
Solution to Exercise 4
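```python
import pandas as pd

df = pd.DataFrame({'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
                   'city': ['NY', 'LA', 'NY', 'LA'],
                   'temperature': [30, 70, 32, 72]})
wide = df.pivot(index='date', columns='city', values='temperature')

# Bring 'date' back as a column, then stack the city columns into rows
long_again = wide.reset_index().melt(id_vars='date', var_name='city',
                                     value_name='temperature')
print(long_again)
```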