pivot vs pivot_table¶
Both methods reshape data from long to wide format, but they have important differences in functionality and use cases.
Mental Model
pivot is a strict reshape -- it moves values into a grid and fails if any cell would receive more than one value. pivot_table is a reshape plus aggregation -- it handles duplicates by applying aggfunc and also supports margins for row/column totals. When in doubt, use pivot_table because it is a strict superset of pivot.
Key Differences¶
Summary of differences between pivot and pivot_table.
1. Duplicate Handling¶
```python import pandas as pd
df = pd.DataFrame({ 'date': ['2024-01-01', '2024-01-01', '2024-01-01'], 'city': ['NY', 'NY', 'LA'], # NY appears twice 'temp': [30, 32, 70] })
pivot: fails with duplicates¶
df.pivot(index='date', columns='city', values='temp') # ValueError!¶
pivot_table: aggregates duplicates¶
df.pivot_table(index='date', columns='city', values='temp', aggfunc='mean') ```
2. Aggregation Support¶
```python
pivot: no aggregation¶
pivot_table: aggfunc parameter for aggregation¶
```
3. Default Behavior¶
```python
pivot: requires explicit values parameter¶
pivot_table: can infer values from numeric columns¶
```
Comparison Table¶
Side-by-side feature comparison.
1. Features¶
| Feature | pivot | pivot_table |
|---|---|---|
| Duplicates | Error | Aggregates |
| aggfunc | No | Yes |
| margins | No | Yes |
| fill_value | No | Yes |
| Multiple values | Manual | Automatic |
2. Syntax¶
```python
pivot¶
df.pivot(index='row', columns='col', values='val')
pivot_table¶
df.pivot_table(index='row', columns='col', values='val', aggfunc='mean') ```
3. Requirements¶
```python
pivot: unique index-column combinations¶
pivot_table: any data (aggregates if needed)¶
```
When to Use Each¶
Guidelines for choosing.
1. Use pivot When¶
```python
Data has unique index-column pairs¶
No aggregation needed¶
Simple reshape operation¶
df.pivot(index='date', columns='ticker', values='price') ```
2. Use pivot_table When¶
```python
Data may have duplicates¶
Aggregation is needed¶
Need margins or fill_value¶
df.pivot_table(index='region', columns='product', values='sales', aggfunc='sum') ```
3. Default Choice¶
When in doubt, use pivot_table—it handles all cases.
Example Comparison¶
Same data, different methods.
1. Unique Data¶
```python
Both work the same¶
df_unique = pd.DataFrame({ 'date': ['Jan', 'Jan', 'Feb', 'Feb'], 'city': ['NY', 'LA', 'NY', 'LA'], 'sales': [100, 200, 150, 250] })
pivot¶
df_unique.pivot(index='date', columns='city', values='sales')
pivot_table (same result)¶
df_unique.pivot_table(index='date', columns='city', values='sales') ```
2. Duplicate Data¶
```python df_dup = pd.DataFrame({ 'date': ['Jan', 'Jan', 'Jan'], 'city': ['NY', 'NY', 'LA'], # NY duplicated 'sales': [100, 120, 200] })
pivot: fails¶
df_dup.pivot(...) # ValueError¶
pivot_table: aggregates¶
df_dup.pivot_table(index='date', columns='city', values='sales', aggfunc='mean')
NY gets (100+120)/2 = 110¶
```
3. Multiple Aggregations¶
```python
Only pivot_table supports this¶
df.pivot_table( index='date', columns='city', values='sales', aggfunc=['sum', 'mean', 'count'] ) ```
Performance¶
Considerations for large datasets.
1. Simple Cases¶
```python
pivot is slightly faster for simple unique data¶
```
2. Complex Cases¶
```python
pivot_table handles complexity better¶
```
3. Recommendation¶
Use the appropriate method for your data structure.
Exercises¶
Exercise 1. Explain when to use .pivot() vs .pivot_table(). Give a concrete example for each.
Solution to Exercise 1
```python import pandas as pd import numpy as np
Solution for the specific exercise¶
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```
Exercise 2. Write code that demonstrates that .pivot_table() can handle duplicate entries (by aggregating) while .pivot() cannot.
Solution to Exercise 2
See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.
Exercise 3. Create the same output using both .pivot() and .pivot_table() on data without duplicates.
Solution to Exercise 3
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```
Exercise 4. Write code that uses .pivot_table() with a custom aggregation function passed to aggfunc.
Solution to Exercise 4
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```