pivot vs pivot_table¶

Both methods reshape data from long to wide format, but they have important differences in functionality and use cases.

Key Differences¶

Summary of differences between pivot and pivot_table.

1. Duplicate Handling¶

import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-01'],
    'city': ['NY', 'NY', 'LA'],  # NY appears twice
    'temp': [30, 32, 70]
})

# pivot: fails with duplicates
# df.pivot(index='date', columns='city', values='temp')  # ValueError!

# pivot_table: aggregates duplicates
df.pivot_table(index='date', columns='city', values='temp', aggfunc='mean')

2. Aggregation Support¶

# pivot: no aggregation
# pivot_table: aggfunc parameter for aggregation

3. Default Behavior¶

# pivot: requires explicit values parameter
# pivot_table: can infer values from numeric columns

Comparison Table¶

Side-by-side feature comparison.

1. Features¶

Feature	pivot	pivot_table
Duplicates	Error	Aggregates
aggfunc	No	Yes
margins	No	Yes
fill_value	No	Yes
Multiple values	Manual	Automatic

2. Syntax¶

# pivot
df.pivot(index='row', columns='col', values='val')

# pivot_table
df.pivot_table(index='row', columns='col', values='val', aggfunc='mean')

3. Requirements¶

# pivot: unique index-column combinations
# pivot_table: any data (aggregates if needed)

When to Use Each¶

Guidelines for choosing.

1. Use pivot When¶

# Data has unique index-column pairs
# No aggregation needed
# Simple reshape operation
df.pivot(index='date', columns='ticker', values='price')

2. Use pivot_table When¶

# Data may have duplicates
# Aggregation is needed
# Need margins or fill_value
df.pivot_table(index='region', columns='product', values='sales', aggfunc='sum')

3. Default Choice¶

When in doubt, use pivot_table—it handles all cases.

Example Comparison¶

Same data, different methods.

1. Unique Data¶

# Both work the same
df_unique = pd.DataFrame({
    'date': ['Jan', 'Jan', 'Feb', 'Feb'],
    'city': ['NY', 'LA', 'NY', 'LA'],
    'sales': [100, 200, 150, 250]
})

# pivot
df_unique.pivot(index='date', columns='city', values='sales')

# pivot_table (same result)
df_unique.pivot_table(index='date', columns='city', values='sales')

2. Duplicate Data¶

df_dup = pd.DataFrame({
    'date': ['Jan', 'Jan', 'Jan'],
    'city': ['NY', 'NY', 'LA'],  # NY duplicated
    'sales': [100, 120, 200]
})

# pivot: fails
# df_dup.pivot(...)  # ValueError

# pivot_table: aggregates
df_dup.pivot_table(index='date', columns='city', values='sales', aggfunc='mean')
# NY gets (100+120)/2 = 110

3. Multiple Aggregations¶

# Only pivot_table supports this
df.pivot_table(
    index='date',
    columns='city',
    values='sales',
    aggfunc=['sum', 'mean', 'count']
)

Performance¶

Considerations for large datasets.

1. Simple Cases¶

# pivot is slightly faster for simple unique data

2. Complex Cases¶

# pivot_table handles complexity better

3. Recommendation¶

Use the appropriate method for your data structure.