pivot_table Method¶
The pivot_table() method creates a spreadsheet-style pivot table with aggregation support, handling duplicate entries and providing more flexibility than pivot().
Mental Model
pivot_table is pivot with a built-in groupby. When multiple rows share the same (index, columns) pair, pivot_table aggregates them using aggfunc (default: mean). Think of it as the pandas equivalent of an Excel pivot table -- specify rows, columns, values, and an aggregation function to get a summary cross-tabulation.
Basic Usage¶
Create a pivot table with aggregation.
1. Simple Pivot Table¶
```python import pandas as pd import numpy as np
df = pd.DataFrame({ 'date': ['5/1/2017', '5/1/2017', '5/2/2017', '5/2/2017', '5/1/2017', '5/2/2017'], 'city': ['new york', 'new york', 'new york', 'new york', 'mumbai', 'mumbai'], 'temperature': [65, 61, 70, 72, 75, 81], 'humidity': [56, 54, 60, 62, 80, 55] }) print(df.head()) ```
2. Create Pivot Table¶
python
dg = df.pivot_table(
index="city",
columns="date",
aggfunc=np.mean
)
print(dg)
humidity temperature
date 5/1/2017 5/2/2017 5/1/2017 5/2/2017
city
mumbai 80.0 55.0 75.0 81.0
new york 55.0 61.0 63.0 71.0
3. Automatic Aggregation¶
pivot_table aggregates duplicate index-column pairs.
pivot_table vs pivot¶
Key differences between the methods.
1. Duplicates¶
```python
pivot: fails with duplicate entries¶
pivot_table: aggregates duplicates¶
```
2. Aggregation Function¶
```python
pivot: no aggregation¶
pivot_table: aggfunc parameter (default='mean')¶
```
3. Multiple Values¶
```python
pivot_table can show multiple columns automatically¶
```
aggfunc Parameter¶
Specify how to aggregate values.
1. Built-in Functions¶
python
df.pivot_table(index='city', columns='date', values='temperature', aggfunc='sum')
df.pivot_table(index='city', columns='date', values='temperature', aggfunc='count')
df.pivot_table(index='city', columns='date', values='temperature', aggfunc='max')
2. NumPy Functions¶
python
df.pivot_table(index='city', columns='date', values='temperature', aggfunc=np.mean)
df.pivot_table(index='city', columns='date', values='temperature', aggfunc=np.std)
3. Multiple Functions¶
python
df.pivot_table(
index='city',
columns='date',
values='temperature',
aggfunc=['mean', 'std', 'count']
)
values Parameter¶
Specify which columns to aggregate.
1. Single Column¶
python
df.pivot_table(
index='city',
columns='date',
values='temperature'
)
2. Multiple Columns¶
python
df.pivot_table(
index='city',
columns='date',
values=['temperature', 'humidity']
)
3. All Numeric (Default)¶
```python
If values not specified, all numeric columns are used¶
df.pivot_table(index='city', columns='date') ```
fill_value Parameter¶
Replace missing values.
1. Fill with Zero¶
python
df.pivot_table(
index='city',
columns='date',
values='temperature',
fill_value=0
)
2. Fill with NaN (Default)¶
Missing combinations show NaN.
3. Custom Fill¶
python
df.pivot_table(..., fill_value=-999)
Practical Example¶
Sales transactions analysis.
1. Sample Data¶
python
sales = pd.DataFrame({
'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
'Product': ['A', 'A', 'B', 'B', 'C', 'C'],
'Sales': [100, 150, 200, 300, 250, 180],
'Profit': [20, 30, 50, 80, 60, 40]
})
2. Create Summary¶
python
pivot_sales = sales.pivot_table(
values='Sales',
index='Region',
columns='Product',
aggfunc='sum',
fill_value=0
)
print(pivot_sales)
3. Add Margins¶
python
pivot_sales = sales.pivot_table(
values='Sales',
index='Region',
columns='Product',
aggfunc='sum',
margins=True,
margins_name='Total'
)
Exercises¶
Exercise 1. Write code that creates a pivot table with aggfunc='mean' to compute the average value for each group combination.
Solution to Exercise 1
```python import pandas as pd import numpy as np
Solution for the specific exercise¶
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```
Exercise 2. Explain the difference between .pivot() and .pivot_table(). When must you use pivot_table?
Solution to Exercise 2
See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.
Exercise 3. Create a pivot table with multiple aggregation functions using aggfunc=['mean', 'sum', 'count'].
Solution to Exercise 3
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```
Exercise 4. Write code that creates a pivot table with margins=True to add row and column totals.
Solution to Exercise 4
```python import pandas as pd import numpy as np
np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```