pivot_table Method¶

The pivot_table() method creates a spreadsheet-style pivot table with aggregation support, handling duplicate entries and providing more flexibility than pivot().

Mental Model

pivot_table is pivot with a built-in groupby. When multiple rows share the same (index, columns) pair, pivot_table aggregates them using aggfunc (default: mean). Think of it as the pandas equivalent of an Excel pivot table -- specify rows, columns, values, and an aggregation function to get a summary cross-tabulation.

Basic Usage¶

Create a pivot table with aggregation.

1. Simple Pivot Table¶

```python import pandas as pd import numpy as np

df = pd.DataFrame({ 'date': ['5/1/2017', '5/1/2017', '5/2/2017', '5/2/2017', '5/1/2017', '5/2/2017'], 'city': ['new york', 'new york', 'new york', 'new york', 'mumbai', 'mumbai'], 'temperature': [65, 61, 70, 72, 75, 81], 'humidity': [56, 54, 60, 62, 80, 55] }) print(df.head()) ```

2. Create Pivot Table¶

python dg = df.pivot_table( index="city", columns="date", aggfunc=np.mean ) print(dg)

humidity temperature date 5/1/2017 5/2/2017 5/1/2017 5/2/2017 city mumbai 80.0 55.0 75.0 81.0 new york 55.0 61.0 63.0 71.0

3. Automatic Aggregation¶

pivot_table aggregates duplicate index-column pairs.

pivot_table vs pivot¶

Key differences between the methods.

1. Duplicates¶

```python

pivot: fails with duplicate entries¶

pivot_table: aggregates duplicates¶

```

2. Aggregation Function¶

```python

pivot: no aggregation¶

pivot_table: aggfunc parameter (default='mean')¶

```

3. Multiple Values¶

```python

pivot_table can show multiple columns automatically¶

```

aggfunc Parameter¶

Specify how to aggregate values.

1. Built-in Functions¶

python df.pivot_table(index='city', columns='date', values='temperature', aggfunc='sum') df.pivot_table(index='city', columns='date', values='temperature', aggfunc='count') df.pivot_table(index='city', columns='date', values='temperature', aggfunc='max')

2. NumPy Functions¶

python df.pivot_table(index='city', columns='date', values='temperature', aggfunc=np.mean) df.pivot_table(index='city', columns='date', values='temperature', aggfunc=np.std)

3. Multiple Functions¶

python df.pivot_table( index='city', columns='date', values='temperature', aggfunc=['mean', 'std', 'count'] )

values Parameter¶

Specify which columns to aggregate.

1. Single Column¶

python df.pivot_table( index='city', columns='date', values='temperature' )

2. Multiple Columns¶

python df.pivot_table( index='city', columns='date', values=['temperature', 'humidity'] )

3. All Numeric (Default)¶

```python

If values not specified, all numeric columns are used¶

df.pivot_table(index='city', columns='date') ```

fill_value Parameter¶

Replace missing values.

1. Fill with Zero¶

python df.pivot_table( index='city', columns='date', values='temperature', fill_value=0 )

2. Fill with NaN (Default)¶

Missing combinations show NaN.

3. Custom Fill¶

python df.pivot_table(..., fill_value=-999)

Practical Example¶

Sales transactions analysis.

1. Sample Data¶

python sales = pd.DataFrame({ 'Region': ['North', 'South', 'East', 'West', 'North', 'South'], 'Product': ['A', 'A', 'B', 'B', 'C', 'C'], 'Sales': [100, 150, 200, 300, 250, 180], 'Profit': [20, 30, 50, 80, 60, 40] })

2. Create Summary¶

python pivot_sales = sales.pivot_table( values='Sales', index='Region', columns='Product', aggfunc='sum', fill_value=0 ) print(pivot_sales)

3. Add Margins¶

python pivot_sales = sales.pivot_table( values='Sales', index='Region', columns='Product', aggfunc='sum', margins=True, margins_name='Total' )

Exercises¶

Exercise 1. Write code that creates a pivot table with aggfunc='mean' to compute the average value for each group combination.

Solution to Exercise 1

```python import pandas as pd import numpy as np

Solution for the specific exercise¶

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(10), 'B': np.random.randn(10)}) print(df.head()) ```

Exercise 2. Explain the difference between .pivot() and .pivot_table(). When must you use pivot_table?

Solution to Exercise 2

See the main content for the detailed explanation. The key concept involves understanding the Pandas API and its behavior for this specific operation.

Exercise 3. Create a pivot table with multiple aggregation functions using aggfunc=['mean', 'sum', 'count'].

Solution to Exercise 3

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(20), 'B': np.random.randn(20)}) result = df.describe() print(result) ```

Exercise 4. Write code that creates a pivot table with margins=True to add row and column totals.

Solution to Exercise 4

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'A': np.random.randn(50), 'group': np.random.choice(['X', 'Y'], 50)}) result = df.groupby('group').mean() print(result) ```