pivot vs melt¶
pivot and melt are inverse operations for reshaping DataFrames between wide and long formats.
Conceptual Relationship¶
pivot and melt transform data in opposite directions.
1. melt: Wide to Long¶
import pandas as pd
# Wide format
wide = pd.DataFrame({
'month': ['Jan', 'Feb'],
'New York': [5, 3],
'Los Angeles': [15, 17]
})
print("Wide format:")
print(wide)
month New York Los Angeles
0 Jan 5 15
1 Feb 3 17
2. Apply melt¶
long = pd.melt(
wide,
id_vars=['month'],
var_name='city',
value_name='temperature'
)
print("Long format (after melt):")
print(long)
month city temperature
0 Jan New York 5
1 Feb New York 3
2 Jan Los Angeles 15
3 Feb Los Angeles 17
3. Apply pivot¶
back_to_wide = long.pivot(
index='month',
columns='city',
values='temperature'
)
print("Back to wide (after pivot):")
print(back_to_wide)
city Los Angeles New York
month
Feb 17 3
Jan 15 5
When to Use Each¶
Guidelines for choosing between pivot and melt.
1. Use melt When¶
# Converting columns to rows
# Preparing data for visualization (seaborn, plotly)
# Normalizing for database storage
# Input for groupby operations
2. Use pivot When¶
# Creating summary tables
# Converting rows to columns
# Preparing data for comparison
# Creating cross-tabulation
3. Format Characteristics¶
| Format | Rows | Columns | Use Case |
|---|---|---|---|
| Wide | Few | Many | Display, comparison |
| Long | Many | Few | Analysis, storage |
Complete Round-trip¶
Transform and reverse without data loss.
1. Start with Wide¶
original = pd.DataFrame({
'product': ['A', 'B'],
'Q1': [100, 150],
'Q2': [200, 250]
})
2. Melt to Long¶
long_form = original.melt(
id_vars=['product'],
var_name='quarter',
value_name='sales'
)
3. Pivot Back to Wide¶
wide_again = long_form.pivot(
index='product',
columns='quarter',
values='sales'
).reset_index()
wide_again.columns.name = None # Remove column name
Practical Example¶
Temperature data transformation.
1. Original Wide Data¶
temps = pd.DataFrame({
'date': ['2024-01-01', '2024-01-02'],
'city_A': [20, 22],
'city_B': [15, 17],
'city_C': [25, 27]
})
2. Melt for Analysis¶
temps_long = temps.melt(
id_vars=['date'],
var_name='city',
value_name='temperature'
)
# Now can easily compute:
temps_long.groupby('city')['temperature'].mean()
3. Pivot for Display¶
temps_wide = temps_long.pivot(
index='date',
columns='city',
values='temperature'
)
# Good for side-by-side comparison
Key Differences¶
Summary of differences.
1. Direction¶
# melt: columns → rows (wide to long)
# pivot: rows → columns (long to wide)
2. Data Volume¶
# melt: increases row count
# pivot: decreases row count (typically)
3. Required Parameters¶
# melt: id_vars (optional), var_name, value_name
# pivot: index, columns, values
Runnable Example: pivot_reshape_tutorial.py¶
"""
Pandas Tutorial: Pivot Tables and Reshaping Data.
Covers pivot, pivot_table, melt, stack, and unstack.
"""
import pandas as pd
import numpy as np
# =============================================================================
# Main
# =============================================================================
if __name__ == "__main__":
print("="*70)
print("PIVOT TABLES AND RESHAPING")
print("="*70)
# Create sample data
np.random.seed(42)
data = pd.DataFrame({
'Date': pd.date_range('2024-01-01', periods=12, freq='M'),
'Product': ['A', 'B'] * 6,
'Region': ['North', 'North', 'South', 'South'] * 3,
'Sales': np.random.randint(100, 1000, 12)
})
print("\nOriginal Data:")
print(data)
# Pivot
print("\n1. Pivot (reshape from long to wide):")
pivoted = data.pivot(index='Date', columns='Product', values='Sales')
print(pivoted.head())
# Pivot table (with aggregation)
print("\n2. Pivot Table with aggregation:")
pivot_table = data.pivot_table(values='Sales',
index='Product',
columns='Region',
aggfunc='mean')
print(pivot_table)
# Multiple aggregation functions
print("\n3. Pivot table with multiple functions:")
pivot_multi = data.pivot_table(values='Sales',
index='Product',
columns='Region',
aggfunc=['sum', 'mean', 'count'])
print(pivot_multi)
# Melt (reshape from wide to long)
wide_data = pd.DataFrame({
'ID': [1, 2, 3],
'Name': ['Alice', 'Bob', 'Charlie'],
'Math': [85, 92, 78],
'Science': [88, 95, 82],
'English': [90, 87, 85]
})
print("\n4. Wide format data:")
print(wide_data)
print("\n5. Melt (reshape to long format):")
melted = pd.melt(wide_data,
id_vars=['ID', 'Name'],
value_vars=['Math', 'Science', 'English'],
var_name='Subject',
value_name='Score')
print(melted)
# Stack and Unstack
df_multi = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}, index=['X', 'Y', 'Z'])
print("\n6. Original DataFrame:")
print(df_multi)
print("\n7. Stack (columns to rows):")
stacked = df_multi.stack()
print(stacked)
print("\n8. Unstack (rows to columns):")
unstacked = stacked.unstack()
print(unstacked)
# Cross-tabulation
df_survey = pd.DataFrame({
'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
'Age_Group': ['Young', 'Young', 'Old', 'Old', 'Young', 'Old'],
'Response': ['Yes', 'No', 'Yes', 'Yes', 'No', 'Yes']
})
print("\n9. Cross-tabulation:")
crosstab = pd.crosstab(df_survey['Gender'],
df_survey['Response'],
margins=True)
print(crosstab)
print("\nKEY TAKEAWAYS:")
print("- pivot(): Reshape data (needs unique index/column combinations)")
print("- pivot_table(): Pivot with aggregation")
print("- melt(): Convert wide to long format")
print("- stack(): Pivot columns to row index")
print("- unstack(): Pivot row index to columns")
print("- crosstab(): Compute frequency table")