pivot_table Method¶

The pivot_table() method creates a spreadsheet-style pivot table with aggregation support, handling duplicate entries and providing more flexibility than pivot().

Basic Usage¶

Create a pivot table with aggregation.

1. Simple Pivot Table¶

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'date': ['5/1/2017', '5/1/2017', '5/2/2017', '5/2/2017',
             '5/1/2017', '5/2/2017'],
    'city': ['new york', 'new york', 'new york', 'new york',
             'mumbai', 'mumbai'],
    'temperature': [65, 61, 70, 72, 75, 81],
    'humidity': [56, 54, 60, 62, 80, 55]
})
print(df.head())

2. Create Pivot Table¶

dg = df.pivot_table(
    index="city",
    columns="date",
    aggfunc=np.mean
)
print(dg)

         humidity          temperature         
date     5/1/2017 5/2/2017    5/1/2017 5/2/2017
city                                           
mumbai       80.0     55.0        75.0     81.0
new york     55.0     61.0        63.0     71.0

3. Automatic Aggregation¶

pivot_table aggregates duplicate index-column pairs.

pivot_table vs pivot¶

Key differences between the methods.

1. Duplicates¶

# pivot: fails with duplicate entries
# pivot_table: aggregates duplicates

2. Aggregation Function¶

# pivot: no aggregation
# pivot_table: aggfunc parameter (default='mean')

3. Multiple Values¶

# pivot_table can show multiple columns automatically

aggfunc Parameter¶

Specify how to aggregate values.

1. Built-in Functions¶

df.pivot_table(index='city', columns='date', values='temperature', aggfunc='sum')
df.pivot_table(index='city', columns='date', values='temperature', aggfunc='count')
df.pivot_table(index='city', columns='date', values='temperature', aggfunc='max')

2. NumPy Functions¶

df.pivot_table(index='city', columns='date', values='temperature', aggfunc=np.mean)
df.pivot_table(index='city', columns='date', values='temperature', aggfunc=np.std)

3. Multiple Functions¶

df.pivot_table(
    index='city',
    columns='date',
    values='temperature',
    aggfunc=['mean', 'std', 'count']
)

values Parameter¶

Specify which columns to aggregate.

1. Single Column¶

df.pivot_table(
    index='city',
    columns='date',
    values='temperature'
)

2. Multiple Columns¶

df.pivot_table(
    index='city',
    columns='date',
    values=['temperature', 'humidity']
)

3. All Numeric (Default)¶

# If values not specified, all numeric columns are used
df.pivot_table(index='city', columns='date')

fill_value Parameter¶

Replace missing values.

1. Fill with Zero¶

df.pivot_table(
    index='city',
    columns='date',
    values='temperature',
    fill_value=0
)

2. Fill with NaN (Default)¶

Missing combinations show NaN.

3. Custom Fill¶

df.pivot_table(..., fill_value=-999)

Practical Example¶

Sales transactions analysis.

1. Sample Data¶

sales = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
    'Product': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Sales': [100, 150, 200, 300, 250, 180],
    'Profit': [20, 30, 50, 80, 60, 40]
})

2. Create Summary¶

pivot_sales = sales.pivot_table(
    values='Sales',
    index='Region',
    columns='Product',
    aggfunc='sum',
    fill_value=0
)
print(pivot_sales)

3. Add Margins¶

pivot_sales = sales.pivot_table(
    values='Sales',
    index='Region',
    columns='Product',
    aggfunc='sum',
    margins=True,
    margins_name='Total'
)