
MultiIndex Hierarchy

Mental Model

A MultiIndex is a tree encoded as tuples. Each "row label" is actually a tuple of values from multiple levels, like ('US', 'NY'). Internally, pandas stores each level as a separate array and uses integer codes to map rows to level values, making the structure memory-efficient even with millions of rows.
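To make the level/code storage concrete, the following minimal sketch inspects the `levels` and `codes` attributes of a small index. The labels and level names (`country`, `region`) are illustrative assumptions, not part of any real dataset.

```python
import pandas as pd

# Illustrative labels; any hashable values would do.
index = pd.MultiIndex.from_tuples(
    [('US', 'NY'), ('US', 'CA'), ('UK', 'London')],
    names=['country', 'region'],
)

# Each level stores its distinct values only once.
level_values = [level.tolist() for level in index.levels]

# Codes are integer positions into the matching level, one per row.
level_codes = [codes.tolist() for codes in index.codes]

print(level_values)
print(level_codes)
print(index[0])  # A single row label is a plain tuple
```

Repeated labels such as `'US'` cost one small integer per row rather than one string per row, which is where the memory savings come from.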

Hierarchical Indexing

1. Definition

MultiIndex enables multiple index levels:

```python
import pandas as pd

arrays = [
    ['A', 'A', 'B', 'B'],
    [1, 2, 1, 2],
]
index = pd.MultiIndex.from_arrays(arrays, names=['letter', 'number'])
s = pd.Series([10, 20, 30, 40], index=index)
```

2. From Tuples

```python
tuples = [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
index = pd.MultiIndex.from_tuples(tuples)
```

3. From Product

```python
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]])
```

Accessing Data

1. Level Selection

```python
s['A']     # All entries with level 0 = 'A'
s['A', 1]  # Specific entry
```
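The two selections return different kinds of objects, which is easy to miss. The sketch below rebuilds the same illustrative Series so it can run on its own, then checks what each selection returns.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], [1, 2, 1, 2]],
    names=['letter', 'number'],
)
s = pd.Series([10, 20, 30, 40], index=index)

# Partial key: the matched level is dropped, leaving a Series
# indexed by the remaining 'number' level.
subset = s['A']

# Full key: one value per level, so the result is a scalar.
value = s['A', 1]

print(subset.index.name, subset.index.tolist())
print(value)
```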

2. Cross-section

```python
df = s.unstack()  # Pivot the innermost level ('number') into columns
# number   1   2
# letter
# A       10  20
# B       30  40
```
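A cross-section proper is taken with `.xs()`, which selects one value at a given level and drops that level from the result. A minimal sketch, rebuilding the same illustrative Series so the block is self-contained:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], [1, 2, 1, 2]],
    names=['letter', 'number'],
)
s = pd.Series([10, 20, 30, 40], index=index)

# Select every row whose 'number' level equals 1.
# The 'number' level is dropped, so the result is indexed by 'letter'.
by_number = s.xs(1, level='number')

print(by_number)
```

Unlike `s['A']`, which can only match from the outermost level inward, `.xs()` can target an inner level by name or position.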

3. Slicing

```python
s.loc[('A', 1):('B', 1)]  # Range (both endpoints included)
```
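Tuple-range slicing like this relies on the index being lexicographically sorted. The sketch below starts from a deliberately unsorted Series (the row order is an assumption chosen for illustration), sorts it with `sort_index()`, and then takes the same range.

```python
import pandas as pd

# Same labels as before, but in scrambled order.
index = pd.MultiIndex.from_tuples(
    [('B', 1), ('A', 2), ('A', 1), ('B', 2)],
    names=['letter', 'number'],
)
s = pd.Series([30, 20, 10, 40], index=index)

print(s.index.is_monotonic_increasing)  # Unsorted at this point

# Sort first so that label ranges are well defined.
ordered = s.sort_index()

# .loc slices by label and includes both endpoints.
window = ordered.loc[('A', 1):('B', 1)]
print(window)
```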

DataFrame with MultiIndex

1. Rows and Columns

```python
df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    index=pd.MultiIndex.from_tuples(
        [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y')]
    ),
    columns=['col1', 'col2'],
)
```

2. Selecting

```python
df.loc['A']         # All rows where first level = 'A'
df.loc[('A', 'x')]  # Specific row
```
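As with a Series, the shape of the result depends on how much of the key is supplied. The following sketch rebuilds the same small DataFrame and shows three cases: a partial row key, a full row key, and a full row key combined with a column label.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    index=pd.MultiIndex.from_tuples(
        [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y')]
    ),
    columns=['col1', 'col2'],
)

outer = df.loc['A']               # DataFrame indexed by the inner level
row = df.loc[('A', 'x')]          # Series: one row, indexed by column name
cell = df.loc[('A', 'x'), 'col1'] # Scalar: one row and one column

print(outer)
print(row)
print(cell)
```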

3. Stacking

```python
stacked = df.stack()            # Move column labels into a new innermost index level
unstacked = stacked.unstack()   # Move the innermost index level back into columns
```
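A quick way to see what stacking does is to count index levels before and after. This sketch rebuilds the same small DataFrame and checks that `unstack()` reverses `stack()` for this simple case; it is an illustration, not a general guarantee for frames with missing values or duplicate labels.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    index=pd.MultiIndex.from_tuples(
        [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y')]
    ),
    columns=['col1', 'col2'],
)

# Stacking turns the two columns into a third index level,
# so the result is a Series with a three-level MultiIndex.
stacked = df.stack()
print(stacked.index.nlevels)

# A full key now needs three parts: both row levels plus the column label.
print(stacked.loc[('A', 'x', 'col2')])

# Unstacking moves the innermost level back into columns.
unstacked = stacked.unstack()
print(unstacked)
```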


Exercises

Exercise 1. Create a DataFrame with a MultiIndex (2 levels) using pd.MultiIndex.from_tuples(). Access data at each level using .loc[].

Solution to Exercise 1

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y')],
    names=['group', 'item'],
)
df = pd.DataFrame({'score': [85, 92, 78, 95]}, index=index)

print(df.loc['A'])                   # Outer level: all rows under 'A'
print(df.loc[('A', 'y')])            # Both levels: a single row
print(df.loc[(slice(None), 'x'), :]) # Inner level: all rows with item == 'x'
```


Exercise 2. Explain the difference between pd.MultiIndex.from_tuples(), from_arrays(), and from_product(). Give an example of each.

Solution to Exercise 2

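All three constructors build the same kind of object and differ only in the shape of their input. `from_tuples()` takes a list of complete row labels, one tuple per row. `from_arrays()` takes one array per level, all of equal length, so that position i across the arrays forms the i-th label. `from_product()` takes one iterable of values per level and generates every combination (the Cartesian product), which is convenient when the index is a full grid. The "From Tuples", "Definition", and "From Product" sections in the main content show one example of each.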
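As a minimal sketch, the three constructors can be given inputs that describe the same four labels, and the resulting indexes compared. The variable names are illustrative.

```python
import pandas as pd

# One tuple per row.
from_tuples = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
)

# One array per level; position i across the arrays forms row i.
from_arrays = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
)

# One iterable per level; every combination is generated.
from_product = pd.MultiIndex.from_product([['A', 'B'], [1, 2]])

same = from_tuples.equals(from_arrays) and from_arrays.equals(from_product)
print(same)
```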


Exercise 3. Write code that creates a DataFrame with a MultiIndex and uses .xs() to select a cross-section at a specific level.

Solution to Exercise 3

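```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['A', 'B'], ['x', 'y']], names=['group', 'item']
)
df = pd.DataFrame({'score': [85, 92, 78, 95]}, index=index)

# Cross-section on the outer level
print(df.xs('A', level='group'))

# Cross-section on the inner level
print(df.xs('x', level='item'))
```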


Exercise 4. Create a MultiIndex DataFrame and use reset_index() to flatten it back to a regular DataFrame. Then use set_index() to recreate the MultiIndex.

Solution to Exercise 4

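```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['A', 'B'], [1, 2]], names=['letter', 'number']
)
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

flat = df.reset_index()                          # Levels become ordinary columns
restored = flat.set_index(['letter', 'number'])  # Rebuild the MultiIndex
print(flat)
print(restored)
```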