Skip to content

Multi-Level Grouping

Group by multiple columns to create hierarchical aggregations.

Basic Multi-Level

Group by multiple columns.

1. Two-Level Grouping

import pandas as pd

df = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [100, 150, 200, 250]
})

result = df.groupby(['region', 'product'])['sales'].sum()
print(result)
region  product
East    A          100
        B          150
West    A          200
        B          250
Name: sales, dtype: int64

2. MultiIndex Result

The result has a hierarchical index.

3. Access Levels

result['East']      # All products in East
result['East', 'A']  # Specific combination

reset_index

Flatten the hierarchical result.

1. Convert to DataFrame

result = df.groupby(['region', 'product'])['sales'].sum().reset_index()
print(result)
  region product  sales
0   East       A    100
1   East       B    150
2   West       A    200
3   West       B    250

2. as_index=False

df.groupby(['region', 'product'], as_index=False)['sales'].sum()

3. Equivalent Results

Both approaches produce a flat DataFrame.

Aggregations with Multiple Columns

Apply aggregations to grouped data.

1. Single Aggregation

df.groupby(['region', 'product']).agg({'sales': 'sum'})

2. Multiple Aggregations

df.groupby(['region', 'product']).agg({
    'sales': ['sum', 'mean', 'count']
})

3. Named Aggregations

df.groupby(['region', 'product']).agg(
    total_sales=('sales', 'sum'),
    avg_sales=('sales', 'mean')
)

unstack Method

Reshape grouped results.

1. Basic unstack

result = df.groupby(['region', 'product'])['sales'].sum().unstack()
print(result)
product    A    B
region           
East     100  150
West     200  250

2. Pivot-like Output

unstack moves inner index level to columns.

3. Fill Missing

result.unstack(fill_value=0)

Practical Example

Financial sector analysis.

1. Sample Data

df = pd.DataFrame({
    'sector': ['Tech', 'Tech', 'Finance', 'Finance'],
    'ticker': ['AAPL', 'MSFT', 'JPM', 'GS'],
    'per': [25.5, 30.2, 12.1, 10.8],
    'market_cap': [2800, 2400, 450, 120]
})

2. Grouped Statistics

stats = df.groupby('sector').agg({
    'per': ['mean', 'min', 'max'],
    'market_cap': 'sum'
})
print(stats)

3. Reset and Flatten

stats.columns = ['_'.join(col) for col in stats.columns]
stats = stats.reset_index()