pivot_table Method¶
The pivot_table() method creates a spreadsheet-style pivot table with aggregation support, handling duplicate entries and providing more flexibility than pivot().
Basic Usage¶
Create a pivot table with aggregation.
1. Simple Pivot Table¶
import pandas as pd
import numpy as np
df = pd.DataFrame({
'date': ['5/1/2017', '5/1/2017', '5/2/2017', '5/2/2017',
'5/1/2017', '5/2/2017'],
'city': ['new york', 'new york', 'new york', 'new york',
'mumbai', 'mumbai'],
'temperature': [65, 61, 70, 72, 75, 81],
'humidity': [56, 54, 60, 62, 80, 55]
})
print(df.head())
2. Create Pivot Table¶
dg = df.pivot_table(
index="city",
columns="date",
aggfunc=np.mean
)
print(dg)
humidity temperature
date 5/1/2017 5/2/2017 5/1/2017 5/2/2017
city
mumbai 80.0 55.0 75.0 81.0
new york 55.0 61.0 63.0 71.0
3. Automatic Aggregation¶
pivot_table aggregates duplicate index-column pairs.
pivot_table vs pivot¶
Key differences between the methods.
1. Duplicates¶
# pivot: fails with duplicate entries
# pivot_table: aggregates duplicates
2. Aggregation Function¶
# pivot: no aggregation
# pivot_table: aggfunc parameter (default='mean')
3. Multiple Values¶
# pivot_table can show multiple columns automatically
aggfunc Parameter¶
Specify how to aggregate values.
1. Built-in Functions¶
df.pivot_table(index='city', columns='date', values='temperature', aggfunc='sum')
df.pivot_table(index='city', columns='date', values='temperature', aggfunc='count')
df.pivot_table(index='city', columns='date', values='temperature', aggfunc='max')
2. NumPy Functions¶
df.pivot_table(index='city', columns='date', values='temperature', aggfunc=np.mean)
df.pivot_table(index='city', columns='date', values='temperature', aggfunc=np.std)
3. Multiple Functions¶
df.pivot_table(
index='city',
columns='date',
values='temperature',
aggfunc=['mean', 'std', 'count']
)
values Parameter¶
Specify which columns to aggregate.
1. Single Column¶
df.pivot_table(
index='city',
columns='date',
values='temperature'
)
2. Multiple Columns¶
df.pivot_table(
index='city',
columns='date',
values=['temperature', 'humidity']
)
3. All Numeric (Default)¶
# If values not specified, all numeric columns are used
df.pivot_table(index='city', columns='date')
fill_value Parameter¶
Replace missing values.
1. Fill with Zero¶
df.pivot_table(
index='city',
columns='date',
values='temperature',
fill_value=0
)
2. Fill with NaN (Default)¶
Missing combinations show NaN.
3. Custom Fill¶
df.pivot_table(..., fill_value=-999)
Practical Example¶
Sales transactions analysis.
1. Sample Data¶
sales = pd.DataFrame({
'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
'Product': ['A', 'A', 'B', 'B', 'C', 'C'],
'Sales': [100, 150, 200, 300, 250, 180],
'Profit': [20, 30, 50, 80, 60, 40]
})
2. Create Summary¶
pivot_sales = sales.pivot_table(
values='Sales',
index='Region',
columns='Product',
aggfunc='sum',
fill_value=0
)
print(pivot_sales)
3. Add Margins¶
pivot_sales = sales.pivot_table(
values='Sales',
index='Region',
columns='Product',
aggfunc='sum',
margins=True,
margins_name='Total'
)