Skip to content

Multi-Level Grouping

Group by multiple columns to create hierarchical aggregations.

Mental Model

Grouping by multiple columns creates a cross-tabulation: each unique combination of values forms one group. The result has a MultiIndex with one level per grouping column. Think of it as nested folders -- region, then city -- where each leaf folder contains the rows for that specific combination.

Basic Multi-Level

Group by multiple columns.

1. Two-Level Grouping

```python import pandas as pd

df = pd.DataFrame({ 'region': ['East', 'East', 'West', 'West'], 'product': ['A', 'B', 'A', 'B'], 'sales': [100, 150, 200, 250] })

result = df.groupby(['region', 'product'])['sales'].sum() print(result) ```

region product East A 100 B 150 West A 200 B 250 Name: sales, dtype: int64

2. MultiIndex Result

The result has a hierarchical index.

3. Access Levels

python result['East'] # All products in East result['East', 'A'] # Specific combination

reset_index

Flatten the hierarchical result.

1. Convert to DataFrame

python result = df.groupby(['region', 'product'])['sales'].sum().reset_index() print(result)

region product sales 0 East A 100 1 East B 150 2 West A 200 3 West B 250

2. as_index=False

python df.groupby(['region', 'product'], as_index=False)['sales'].sum()

3. Equivalent Results

Both approaches produce a flat DataFrame.

Aggregations with Multiple Columns

Apply aggregations to grouped data.

1. Single Aggregation

python df.groupby(['region', 'product']).agg({'sales': 'sum'})

2. Multiple Aggregations

python df.groupby(['region', 'product']).agg({ 'sales': ['sum', 'mean', 'count'] })

3. Named Aggregations

python df.groupby(['region', 'product']).agg( total_sales=('sales', 'sum'), avg_sales=('sales', 'mean') )

unstack Method

Reshape grouped results.

1. Basic unstack

python result = df.groupby(['region', 'product'])['sales'].sum().unstack() print(result)

product A B region East 100 150 West 200 250

2. Pivot-like Output

unstack moves inner index level to columns.

3. Fill Missing

python result.unstack(fill_value=0)

Practical Example

Financial sector analysis.

1. Sample Data

python df = pd.DataFrame({ 'sector': ['Tech', 'Tech', 'Finance', 'Finance'], 'ticker': ['AAPL', 'MSFT', 'JPM', 'GS'], 'per': [25.5, 30.2, 12.1, 10.8], 'market_cap': [2800, 2400, 450, 120] })

2. Grouped Statistics

python stats = df.groupby('sector').agg({ 'per': ['mean', 'min', 'max'], 'market_cap': 'sum' }) print(stats)

3. Reset and Flatten

python stats.columns = ['_'.join(col) for col in stats.columns] stats = stats.reset_index()


Exercises

Exercise 1. Group a DataFrame by ['region', 'product'] and compute the sum of 'sales'. Then use .unstack() to reshape the result into a pivot-like table with regions as rows and products as columns.

Solution to Exercise 1

Group, aggregate, and unstack for a pivot-like view.

import pandas as pd

df = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [100, 150, 200, 250]
})
result = df.groupby(['region', 'product'])['sales'].sum().unstack()
print(result)

Exercise 2. Apply named aggregations with multi-level grouping: group by ['sector', 'ticker'] and compute total_volume=('volume', 'sum') and avg_price=('price', 'mean'). Flatten the column index and reset the row index.

Solution to Exercise 2

Named aggregations with multi-level grouping.

import pandas as pd

df = pd.DataFrame({
    'sector': ['Tech', 'Tech', 'Finance', 'Finance'],
    'ticker': ['AAPL', 'MSFT', 'JPM', 'GS'],
    'volume': [1000, 800, 500, 300],
    'price': [150.0, 350.0, 180.0, 380.0]
})
result = df.groupby(['sector', 'ticker']).agg(
    total_volume=('volume', 'sum'),
    avg_price=('price', 'mean')
).reset_index()
print(result)

Exercise 3. Group by two columns and access a specific sub-group using bracket indexing on the resulting MultiIndex Series: result['East'] to get all products in the East region, and result['East', 'A'] for a specific combination.

Solution to Exercise 3

Access sub-groups in a MultiIndex result.

import pandas as pd

df = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [100, 150, 200, 250]
})
result = df.groupby(['region', 'product'])['sales'].sum()
print("All East:", result['East'])
print("East-A:", result['East', 'A'])