pivot vs melt¶

pivot and melt are inverse operations for reshaping DataFrames between wide and long formats.

Conceptual Relationship¶

pivot and melt transform data in opposite directions.

1. melt: Wide to Long¶

import pandas as pd

# Wide format
wide = pd.DataFrame({
    'month': ['Jan', 'Feb'],
    'New York': [5, 3],
    'Los Angeles': [15, 17]
})
print("Wide format:")
print(wide)

  month  New York  Los Angeles
0   Jan         5           15
1   Feb         3           17

2. Apply melt¶

long = pd.melt(
    wide,
    id_vars=['month'],
    var_name='city',
    value_name='temperature'
)
print("Long format (after melt):")
print(long)

  month         city  temperature
0   Jan     New York            5
1   Feb     New York            3
2   Jan  Los Angeles           15
3   Feb  Los Angeles           17

3. Apply pivot¶

back_to_wide = long.pivot(
    index='month',
    columns='city',
    values='temperature'
)
print("Back to wide (after pivot):")
print(back_to_wide)

city   Los Angeles  New York
month                       
Feb             17         3
Jan             15         5

When to Use Each¶

Guidelines for choosing between pivot and melt.

1. Use melt When¶

# Converting columns to rows
# Preparing data for visualization (seaborn, plotly)
# Normalizing for database storage
# Input for groupby operations

2. Use pivot When¶

# Creating summary tables
# Converting rows to columns
# Preparing data for comparison
# Creating cross-tabulation

3. Format Characteristics¶

Format	Rows	Columns	Use Case
Wide	Few	Many	Display, comparison
Long	Many	Few	Analysis, storage

Complete Round-trip¶

Transform and reverse without data loss.

1. Start with Wide¶

original = pd.DataFrame({
    'product': ['A', 'B'],
    'Q1': [100, 150],
    'Q2': [200, 250]
})

2. Melt to Long¶

long_form = original.melt(
    id_vars=['product'],
    var_name='quarter',
    value_name='sales'
)

3. Pivot Back to Wide¶

wide_again = long_form.pivot(
    index='product',
    columns='quarter',
    values='sales'
).reset_index()

wide_again.columns.name = None  # Remove column name

Practical Example¶

Temperature data transformation.

1. Original Wide Data¶

temps = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02'],
    'city_A': [20, 22],
    'city_B': [15, 17],
    'city_C': [25, 27]
})

2. Melt for Analysis¶

temps_long = temps.melt(
    id_vars=['date'],
    var_name='city',
    value_name='temperature'
)

# Now can easily compute:
temps_long.groupby('city')['temperature'].mean()

3. Pivot for Display¶

temps_wide = temps_long.pivot(
    index='date',
    columns='city',
    values='temperature'
)
# Good for side-by-side comparison

Key Differences¶

Summary of differences.

1. Direction¶

# melt: columns → rows (wide to long)
# pivot: rows → columns (long to wide)

2. Data Volume¶

# melt: increases row count
# pivot: decreases row count (typically)

3. Required Parameters¶

# melt: id_vars (optional), var_name, value_name
# pivot: index, columns, values

Runnable Example: `pivot_reshape_tutorial.py`¶

"""
Pandas Tutorial: Pivot Tables and Reshaping Data.

Covers pivot, pivot_table, melt, stack, and unstack.
"""

import pandas as pd
import numpy as np

# =============================================================================
# Main
# =============================================================================

if __name__ == "__main__":

    print("="*70)
    print("PIVOT TABLES AND RESHAPING")
    print("="*70)

    # Create sample data
    np.random.seed(42)
    data = pd.DataFrame({
        'Date': pd.date_range('2024-01-01', periods=12, freq='M'),
        'Product': ['A', 'B'] * 6,
        'Region': ['North', 'North', 'South', 'South'] * 3,
        'Sales': np.random.randint(100, 1000, 12)
    })

    print("\nOriginal Data:")
    print(data)

    # Pivot
    print("\n1. Pivot (reshape from long to wide):")
    pivoted = data.pivot(index='Date', columns='Product', values='Sales')
    print(pivoted.head())

    # Pivot table (with aggregation)
    print("\n2. Pivot Table with aggregation:")
    pivot_table = data.pivot_table(values='Sales', 
                                    index='Product', 
                                    columns='Region',
                                    aggfunc='mean')
    print(pivot_table)

    # Multiple aggregation functions
    print("\n3. Pivot table with multiple functions:")
    pivot_multi = data.pivot_table(values='Sales',
                                   index='Product',
                                   columns='Region',
                                   aggfunc=['sum', 'mean', 'count'])
    print(pivot_multi)

    # Melt (reshape from wide to long)
    wide_data = pd.DataFrame({
        'ID': [1, 2, 3],
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Math': [85, 92, 78],
        'Science': [88, 95, 82],
        'English': [90, 87, 85]
    })

    print("\n4. Wide format data:")
    print(wide_data)

    print("\n5. Melt (reshape to long format):")
    melted = pd.melt(wide_data, 
                     id_vars=['ID', 'Name'],
                     value_vars=['Math', 'Science', 'English'],
                     var_name='Subject',
                     value_name='Score')
    print(melted)

    # Stack and Unstack
    df_multi = pd.DataFrame({
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]
    }, index=['X', 'Y', 'Z'])

    print("\n6. Original DataFrame:")
    print(df_multi)

    print("\n7. Stack (columns to rows):")
    stacked = df_multi.stack()
    print(stacked)

    print("\n8. Unstack (rows to columns):")
    unstacked = stacked.unstack()
    print(unstacked)

    # Cross-tabulation
    df_survey = pd.DataFrame({
        'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
        'Age_Group': ['Young', 'Young', 'Old', 'Old', 'Young', 'Old'],
        'Response': ['Yes', 'No', 'Yes', 'Yes', 'No', 'Yes']
    })

    print("\n9. Cross-tabulation:")
    crosstab = pd.crosstab(df_survey['Gender'], 
                           df_survey['Response'],
                           margins=True)
    print(crosstab)

    print("\nKEY TAKEAWAYS:")
    print("- pivot(): Reshape data (needs unique index/column combinations)")
    print("- pivot_table(): Pivot with aggregation")
    print("- melt(): Convert wide to long format")
    print("- stack(): Pivot columns to row index")
    print("- unstack(): Pivot row index to columns")
    print("- crosstab(): Compute frequency table")