Multi-Level Grouping¶

Group by multiple columns to create hierarchical aggregations.

Mental Model

Grouping by multiple columns creates a cross-tabulation: each unique combination of values forms one group. The result has a MultiIndex with one level per grouping column. Think of it as nested folders -- region, then city -- where each leaf folder contains the rows for that specific combination.

Basic Multi-Level¶

Group by multiple columns.

1. Two-Level Grouping¶

```python import pandas as pd

df = pd.DataFrame({ 'region': ['East', 'East', 'West', 'West'], 'product': ['A', 'B', 'A', 'B'], 'sales': [100, 150, 200, 250] })

result = df.groupby(['region', 'product'])['sales'].sum() print(result) ```

region product East A 100 B 150 West A 200 B 250 Name: sales, dtype: int64

2. MultiIndex Result¶

The result has a hierarchical index.

3. Access Levels¶

python result['East'] # All products in East result['East', 'A'] # Specific combination

reset_index¶

Flatten the hierarchical result.

1. Convert to DataFrame¶

python result = df.groupby(['region', 'product'])['sales'].sum().reset_index() print(result)

region product sales 0 East A 100 1 East B 150 2 West A 200 3 West B 250

2. as_index=False¶

python df.groupby(['region', 'product'], as_index=False)['sales'].sum()

3. Equivalent Results¶

Both approaches produce a flat DataFrame.

Aggregations with Multiple Columns¶

Apply aggregations to grouped data.

1. Single Aggregation¶

python df.groupby(['region', 'product']).agg({'sales': 'sum'})

2. Multiple Aggregations¶

python df.groupby(['region', 'product']).agg({ 'sales': ['sum', 'mean', 'count'] })

3. Named Aggregations¶

python df.groupby(['region', 'product']).agg( total_sales=('sales', 'sum'), avg_sales=('sales', 'mean') )

unstack Method¶

Reshape grouped results.

1. Basic unstack¶

python result = df.groupby(['region', 'product'])['sales'].sum().unstack() print(result)

product A B region East 100 150 West 200 250

2. Pivot-like Output¶

unstack moves inner index level to columns.

3. Fill Missing¶

python result.unstack(fill_value=0)

Practical Example¶

Financial sector analysis.

1. Sample Data¶

python df = pd.DataFrame({ 'sector': ['Tech', 'Tech', 'Finance', 'Finance'], 'ticker': ['AAPL', 'MSFT', 'JPM', 'GS'], 'per': [25.5, 30.2, 12.1, 10.8], 'market_cap': [2800, 2400, 450, 120] })

2. Grouped Statistics¶

python stats = df.groupby('sector').agg({ 'per': ['mean', 'min', 'max'], 'market_cap': 'sum' }) print(stats)

3. Reset and Flatten¶

python stats.columns = ['_'.join(col) for col in stats.columns] stats = stats.reset_index()

Exercises¶

Exercise 1. Group a DataFrame by ['region', 'product'] and compute the sum of 'sales'. Then use .unstack() to reshape the result into a pivot-like table with regions as rows and products as columns.

Solution to Exercise 1

Group, aggregate, and unstack for a pivot-like view.

import pandas as pd

df = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [100, 150, 200, 250]
})
result = df.groupby(['region', 'product'])['sales'].sum().unstack()
print(result)

Exercise 2. Apply named aggregations with multi-level grouping: group by ['sector', 'ticker'] and compute total_volume=('volume', 'sum') and avg_price=('price', 'mean'). Flatten the column index and reset the row index.

Solution to Exercise 2

Named aggregations with multi-level grouping.

import pandas as pd

df = pd.DataFrame({
    'sector': ['Tech', 'Tech', 'Finance', 'Finance'],
    'ticker': ['AAPL', 'MSFT', 'JPM', 'GS'],
    'volume': [1000, 800, 500, 300],
    'price': [150.0, 350.0, 180.0, 380.0]
})
result = df.groupby(['sector', 'ticker']).agg(
    total_volume=('volume', 'sum'),
    avg_price=('price', 'mean')
).reset_index()
print(result)

Exercise 3. Group by two columns and access a specific sub-group using bracket indexing on the resulting MultiIndex Series: result['East'] to get all products in the East region, and result['East', 'A'] for a specific combination.

Solution to Exercise 3

Access sub-groups in a MultiIndex result.

import pandas as pd

df = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [100, 150, 200, 250]
})
result = df.groupby(['region', 'product'])['sales'].sum()
print("All East:", result['East'])
print("East-A:", result['East', 'A'])