pivot vs pivot_table¶
Both methods reshape data from long to wide format, but they have important differences in functionality and use cases.
Key Differences¶
Summary of differences between pivot and pivot_table.
1. Duplicate Handling¶
import pandas as pd
df = pd.DataFrame({
'date': ['2024-01-01', '2024-01-01', '2024-01-01'],
'city': ['NY', 'NY', 'LA'], # NY appears twice
'temp': [30, 32, 70]
})
# pivot: fails with duplicates
# df.pivot(index='date', columns='city', values='temp') # ValueError!
# pivot_table: aggregates duplicates
df.pivot_table(index='date', columns='city', values='temp', aggfunc='mean')
2. Aggregation Support¶
# pivot: no aggregation
# pivot_table: aggfunc parameter for aggregation
3. Default Behavior¶
# pivot: requires explicit values parameter
# pivot_table: can infer values from numeric columns
Comparison Table¶
Side-by-side feature comparison.
1. Features¶
| Feature | pivot | pivot_table |
|---|---|---|
| Duplicates | Error | Aggregates |
| aggfunc | No | Yes |
| margins | No | Yes |
| fill_value | No | Yes |
| Multiple values | Manual | Automatic |
2. Syntax¶
# pivot
df.pivot(index='row', columns='col', values='val')
# pivot_table
df.pivot_table(index='row', columns='col', values='val', aggfunc='mean')
3. Requirements¶
# pivot: unique index-column combinations
# pivot_table: any data (aggregates if needed)
When to Use Each¶
Guidelines for choosing.
1. Use pivot When¶
# Data has unique index-column pairs
# No aggregation needed
# Simple reshape operation
df.pivot(index='date', columns='ticker', values='price')
2. Use pivot_table When¶
# Data may have duplicates
# Aggregation is needed
# Need margins or fill_value
df.pivot_table(index='region', columns='product', values='sales', aggfunc='sum')
3. Default Choice¶
When in doubt, use pivot_tableāit handles all cases.
Example Comparison¶
Same data, different methods.
1. Unique Data¶
# Both work the same
df_unique = pd.DataFrame({
'date': ['Jan', 'Jan', 'Feb', 'Feb'],
'city': ['NY', 'LA', 'NY', 'LA'],
'sales': [100, 200, 150, 250]
})
# pivot
df_unique.pivot(index='date', columns='city', values='sales')
# pivot_table (same result)
df_unique.pivot_table(index='date', columns='city', values='sales')
2. Duplicate Data¶
df_dup = pd.DataFrame({
'date': ['Jan', 'Jan', 'Jan'],
'city': ['NY', 'NY', 'LA'], # NY duplicated
'sales': [100, 120, 200]
})
# pivot: fails
# df_dup.pivot(...) # ValueError
# pivot_table: aggregates
df_dup.pivot_table(index='date', columns='city', values='sales', aggfunc='mean')
# NY gets (100+120)/2 = 110
3. Multiple Aggregations¶
# Only pivot_table supports this
df.pivot_table(
index='date',
columns='city',
values='sales',
aggfunc=['sum', 'mean', 'count']
)
Performance¶
Considerations for large datasets.
1. Simple Cases¶
# pivot is slightly faster for simple unique data
2. Complex Cases¶
# pivot_table handles complexity better
3. Recommendation¶
Use the appropriate method for your data structure.