
pivot Method

The pivot() method reshapes data from long format to wide format, spreading values across columns.

Mental Model

pivot takes three columns -- index, columns, values -- and reshapes the table so that unique values of the "columns" column become actual column headers. It is a pure reshape with no aggregation, so it requires unique (index, columns) pairs. If duplicates exist, use pivot_table instead.
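One way to see the "pure reshape" point is to compare pivot with set_index followed by unstack, which moves an index level into the columns without aggregating anything. The sketch below uses a made-up `long_df` purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
long_df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72],
})

via_pivot = long_df.pivot(index='date', columns='city', values='temperature')

# Same reshape by hand: index on (date, city), then move 'city' to the columns
via_unstack = long_df.set_index(['date', 'city'])['temperature'].unstack('city')

print(via_pivot.equals(via_unstack))
```

Because no aggregation step is involved, neither form can decide what to do when the same (date, city) pair appears twice, which is why duplicates raise an error.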

Basic Usage

Pivot a DataFrame.

1. Long to Wide

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72]
})
print("Long format:")
print(df)

wide = df.pivot(index='date', columns='city', values='temperature')
print("\nWide format:")
print(wide)
```

```
Long format:
         date city  temperature
0  2024-01-01   NY           30
1  2024-01-01   LA           70
2  2024-01-02   NY           32
3  2024-01-02   LA           72

Wide format:
city        LA  NY
date
2024-01-01  70  30
2024-01-02  72  32
```

2. Parameters

```python
# index: column to become the row index
# columns: column whose values become column headers
# values: column containing the data values
```
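As a small illustration of how the parameters interact, the sketch below leaves out `values`. In that case pandas pivots every remaining column and returns hierarchical (MultiIndex) columns. The `humidity` column is invented for this example.

```python
import pandas as pd

# Hypothetical data with two value columns; 'humidity' is made up for illustration
readings = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72],
    'humidity': [55, 40, 60, 42],
})

# No 'values' argument: all remaining columns are pivoted
wide_all = readings.pivot(index='date', columns='city')

print(type(wide_all.columns).__name__)
# Select one measurement to get an ordinary wide table
print(wide_all['humidity'])
```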

3. Result Structure

Each unique value in 'city' becomes a column.
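To make the result structure concrete, the following sketch inspects the axes of the pivoted frame. It rebuilds the Basic Usage data so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72],
})
wide = df.pivot(index='date', columns='city', values='temperature')

print(wide.columns.tolist())  # ['LA', 'NY'] (unique cities, sorted)
print(wide.columns.name)      # city
print(wide.index.name)        # date
print(wide.shape)             # (2, 2)
```

The names `city` and `date` stay attached to the axes, which is why they appear in the printed header of the wide table.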

LeetCode Example: Department Table

Reshape department revenue by month.

1. Sample Data

```python
department = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb'],
    'revenue': [100, 150, 200, 80, 120]
})
```

2. Pivot Transform

```python
bymonth = department.pivot(
    index='id',
    columns='month',
    values='revenue'
)
print(bymonth)
```

```
month  Feb  Jan    Mar
id
1      150  100  200.0
2      120   80    NaN
```

3. Handle Missing

```python
bymonth = bymonth.fillna(0)
```
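A column that contained NaN stays floating-point after filling, and pivot orders the month columns alphabetically rather than by calendar. The sketch below shows one possible cleanup. It assumes that a missing month really means zero revenue, which may not hold for every dataset.

```python
import pandas as pd

department = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb'],
    'revenue': [100, 150, 200, 80, 120],
})
bymonth = department.pivot(index='id', columns='month', values='revenue')

# Fill gaps with 0 (an assumption), restore integer dtype,
# and put the months in calendar order
tidy = (
    bymonth.fillna(0)
           .astype(int)
           .reindex(columns=['Jan', 'Feb', 'Mar'])
)
print(tidy)
```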

Limitations

pivot has strict requirements.

1. No Duplicate Entries

```python
# pivot fails if an index-column combination has duplicates
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01'],
    'city': ['NY', 'NY'],  # Duplicate!
    'temp': [30, 31]
})

df.pivot(index='date', columns='city', values='temp')  # Raises ValueError
```

2. Use pivot_table for Duplicates

```python
# pivot_table handles duplicates with aggregation
df.pivot_table(index='date', columns='city', values='temp', aggfunc='mean')
```

3. Single Value Required

Each index-column pair may appear at most once in the input. Pairs that never appear are allowed and show up as NaN in the result.
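One way to guard against the error is to look for repeated pairs before pivoting. The sketch below is a possible pattern rather than a prescribed one; the sample rows and the choice of `mean` as the aggregation are assumptions.

```python
import pandas as pd

# Hypothetical data where (2024-01-01, NY) appears twice
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02'],
    'city': ['NY', 'NY', 'LA'],
    'temp': [30, 31, 70],
})

# keep=False marks every row of a repeated (date, city) pair
dupes = df[df.duplicated(subset=['date', 'city'], keep=False)]
print(dupes)

if dupes.empty:
    wide = df.pivot(index='date', columns='city', values='temp')
else:
    # Duplicates present: aggregate explicitly instead of pivoting
    wide = df.pivot_table(index='date', columns='city', values='temp', aggfunc='mean')
print(wide)
```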

Financial Example

Pivot stock price data.

1. Sample Data

```python
prices = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'ticker': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'close': [150, 350, 152, 355]
})
```

2. Pivot to Wide

```python
price_matrix = prices.pivot(
    index='date',
    columns='ticker',
    values='close'
)
```

3. Use for Analysis

```python
# Calculate correlation
price_matrix.corr()

# Calculate returns
price_matrix.pct_change()
```
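As a worked check of the returns step, this sketch rebuilds the sample prices and computes the day-over-day change per ticker. The call to `dropna` is an addition for readability.

```python
import pandas as pd

prices = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'ticker': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'close': [150, 350, 152, 355],
})
price_matrix = prices.pivot(index='date', columns='ticker', values='close')

# pct_change compares each row with the previous one, column by column;
# the first row has no predecessor, so it is NaN and dropped here
returns = price_matrix.pct_change().dropna()
print(returns.round(4))
# AAPL: (152 - 150) / 150 ≈ 0.0133, MSFT: (355 - 350) / 350 ≈ 0.0143
```

With only two dates the sample yields a single row of returns, so a meaningful correlation needs a longer price history.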

reset_index After Pivot

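After a pivot, the index column lives in the row index and the column axis carries a name. The steps below turn the result back into an ordinary flat table.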

1. Index as Column

```python
# df is the long-format DataFrame from Basic Usage (date, city, temperature)
result = df.pivot(index='date', columns='city', values='temperature')
result = result.reset_index()
```

2. Remove Column Name

```python
result.columns.name = None
```

3. Rename Columns

```python
result.columns = ['date', 'los_angeles', 'new_york']
```
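Assigning a list to `columns` depends on the column order. As an alternative sketch, the same three steps can be chained with a name-based mapping, so the rename does not rely on position. The data is rebuilt here so the example runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72],
})

result = (
    df.pivot(index='date', columns='city', values='temperature')
      .reset_index()                 # 'date' becomes a regular column
      .rename_axis(None, axis=1)     # drop the 'city' label on the column axis
      .rename(columns={'LA': 'los_angeles', 'NY': 'new_york'})
)
print(result.columns.tolist())  # ['date', 'los_angeles', 'new_york']
```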


Exercises

Exercise 1. Create a DataFrame with columns ['date', 'city', 'temperature'] and use .pivot() to reshape it so each city becomes a column.

Solution to Exercise 1

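```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72]
})

# Each unique city becomes its own column
wide = df.pivot(index='date', columns='city', values='temperature')
print(wide)
```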


Exercise 2. Explain the three main parameters of .pivot(): index, columns, and values.

Solution to Exercise 2

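index names the column whose values become the row labels. columns names the column whose unique values become the new column headers. values names the column whose entries fill the cells. Each (index, columns) pair must be unique because pivot does not aggregate. Strictly speaking, only columns is mandatory: if index is omitted the existing index is used, and if values is omitted all remaining columns are pivoted, which produces hierarchical columns.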


Exercise 3. Write code showing that .pivot() raises an error when there are duplicate entries for the same index-column combination.

Solution to Exercise 3

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01'],
    'city': ['NY', 'NY'],  # same (date, city) pair twice
    'temp': [30, 31]
})

try:
    df.pivot(index='date', columns='city', values='temp')
except ValueError as e:
    print(f"ValueError: {e}")
```


Exercise 4. Create a pivoted DataFrame and use .melt() to convert it back to the original long format.

Solution to Exercise 4

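```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'temperature': [30, 70, 32, 72]
})
wide = df.pivot(index='date', columns='city', values='temperature')

# reset_index exposes 'date' as a column so melt can keep it as an identifier
long = wide.reset_index().melt(id_vars='date', var_name='city', value_name='temperature')
# Same rows as df, though possibly in a different order
print(long.sort_values(['date', 'city']).reset_index(drop=True))
```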