Index Objects

Index objects provide axis labels for pandas data structures. Understanding indexes is fundamental to effective pandas usage.

Mental Model

An Index is an immutable array of labels attached to an axis. It serves two roles: it is the lookup key for label-based selection (.loc), and it is the alignment mechanism that makes arithmetic between mismatched Series "just work." Most label-aware pandas operations, such as selection, arithmetic, and joins, consult the Index first.
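As a minimal sketch of the two roles described above (the `prices` and `discount` Series and their values are illustrative, not part of the original material):

```python
import pandas as pd

prices = pd.Series([1.5, 2.0, 3.25], index=['apple', 'banana', 'cherry'])

# Role 1: the Index is the lookup key for label-based selection.
banana_price = prices.loc['banana']
banana_pos = prices.index.get_loc('banana')  # integer position of the label

# Role 2: the Index drives alignment. Labels are matched, not positions,
# so the different order and the missing 'banana' label are handled for us.
discount = pd.Series([0.5, 0.25], index=['cherry', 'apple'])
discounted = prices - discount

print(banana_price, banana_pos)
print(discounted)
```

The label `'banana'` has no partner in `discount`, so its result is `NaN` rather than an error.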

Index Purpose

An Index holds the labels for one axis and cannot be changed in place.

1. Label Container

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c', 'd'])
s = pd.Series([10, 20, 30, 40], index=idx)
print(s)
```

```
a    10
b    20
c    30
d    40
dtype: int64
```

2. Immutability

Indexes cannot be modified in place:

```python
idx = pd.Index(['a', 'b', 'c'])

idx[0] = 'x'  # TypeError: Index does not support mutable operations
```
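Because an Index cannot be edited in place, changing labels means building a new Index. The sketch below shows one possible way to do that with `delete` and `insert`, and then swaps the new Index onto a Series; the variable names are illustrative.

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c'])

# Both methods return a new Index; the original is left untouched.
new_idx = idx.delete(0).insert(0, 'x')

# Assigning to .index replaces the whole Index object on the Series,
# which is allowed because no existing Index is modified.
s = pd.Series([1, 2, 3], index=idx)
s.index = new_idx

print(list(idx))      # original labels
print(list(s.index))  # labels now attached to the Series
```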

3. Index Types

```python
pd.Index             # Generic index
pd.RangeIndex        # Memory-efficient integer range
pd.DatetimeIndex     # Datetime labels
pd.MultiIndex        # Hierarchical index
pd.CategoricalIndex  # Categorical data
```
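To make the list above concrete, here is a small sketch that constructs one instance of each specialized type. The sample labels and dates are arbitrary choices for illustration.

```python
import pandas as pd

range_idx = pd.RangeIndex(start=0, stop=3)
date_idx = pd.date_range('2024-01-01', periods=3, freq='D')
multi_idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)])
cat_idx = pd.CategoricalIndex(['low', 'high', 'low'])

for idx in (range_idx, date_idx, multi_idx, cat_idx):
    # Every specialized type is a subclass of pd.Index.
    print(type(idx).__name__, isinstance(idx, pd.Index))
```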

Index Operations

An Index supports set-like operations, and the same logic underlies how pandas aligns data.

1. Set Union

```python
idx1 = pd.Index(['a', 'b', 'c'])
idx2 = pd.Index(['b', 'c', 'd'])

print(idx1.union(idx2))
# Index(['a', 'b', 'c', 'd'], dtype='object')
```

2. Set Intersection

```python
print(idx1.intersection(idx2))
# Index(['b', 'c'], dtype='object')
```

3. Set Difference

```python
print(idx1.difference(idx2))
# Index(['a'], dtype='object')
```
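The link between these set operations and alignment can be checked directly. In the sketch below (the Series values are made up for illustration), the result of an addition carries the union of the two indexes, and the labels with non-missing results are exactly the intersection.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

total = s1 + s2

# The result is labelled with the union of both indexes.
same_as_union = total.index.equals(s1.index.union(s2.index))

# Only labels present on both sides produce a real number.
same_as_intersection = total.dropna().index.equals(s1.index.intersection(s2.index))

print(same_as_union, same_as_intersection)
```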

Automatic Alignment

pandas automatically aligns data based on index labels during operations.

1. Series Alignment

```python
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])
result = s1 + s2
print(result)
```

```
a     NaN
b    12.0
c    23.0
dtype: float64
```

2. DataFrame Alignment

```python
df1 = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
df2 = pd.DataFrame({'A': [10, 20]}, index=['y', 'z'])
print(df1 + df2)
```
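DataFrames align on both rows and columns. The sketch below reuses `df1` and `df2` from above and adds a hypothetical `df3` with a different column name to show the column side of alignment.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
df2 = pd.DataFrame({'A': [10, 20]}, index=['y', 'z'])

# Row alignment: only 'y' exists in both frames, so only 'y' gets a value
# (2 + 10); 'x' and 'z' become NaN.
rows = df1 + df2

# Column alignment: df3 (hypothetical) shares no column with df1,
# so every cell of the result is NaN.
df3 = pd.DataFrame({'B': [100]}, index=['y'])
cols = df1 + df3

print(rows)
print(cols)
```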

3. Fill Value

```python
result = s1.add(s2, fill_value=0)
```
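A short comparison, using the same `s1` and `s2` as above, shows what `fill_value` changes: a label missing from one side is treated as the fill value instead of producing `NaN`.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])

plain = s1 + s2                    # 'a' has no partner, so it becomes NaN
filled = s1.add(s2, fill_value=0)  # 'a' is computed as 1 + 0

print(plain.isna().sum())  # number of missing results without fill_value
print(filled.tolist())     # [1.0, 12.0, 23.0]
```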

Reindexing

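Use reindex to conform a Series or DataFrame to a new set of labels. It returns a new object: existing labels keep their values, and labels that were not in the original index are filled with NaN by default.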

1. Basic Reindexing

```python
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s_new = s.reindex(['a', 'b', 'c', 'd'])
print(s_new)
```

```
a    1.0
b    2.0
c    3.0
d    NaN
dtype: float64
```

2. Fill Missing Values

```python
s_new = s.reindex(['a', 'b', 'c', 'd'], fill_value=0)
```

3. Forward Fill

```python
s_new = s.reindex(['a', 'b', 'c', 'd'], method='ffill')
```
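Forward fill is easier to see with numeric labels. The pandas documentation notes that `method` applies only when the existing index is monotonically increasing or decreasing. The sketch below uses a hypothetical `readings` Series to show how each new label picks up the value of the nearest existing label at or before it.

```python
import pandas as pd

readings = pd.Series([1, 2, 3], index=[0, 10, 20])

# Each new label takes the value of the closest existing label at or before it.
filled = readings.reindex([0, 5, 10, 15, 25], method='ffill')
print(filled.tolist())  # [1, 1, 2, 2, 3]

# A label that precedes every existing label has nothing to copy from.
early = readings.reindex([-5, 0], method='ffill')
print(early.tolist())  # [nan, 1.0]

# The index must be sorted for method= to apply.
print(readings.index.is_monotonic_increasing)  # True
```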

RangeIndex

RangeIndex is the default integer index, optimized for memory efficiency.

1. Automatic Creation

```python
s = pd.Series([10, 20, 30])
print(s.index)
# RangeIndex(start=0, stop=3, step=1)
```

2. Reset Index

```python
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
df_reset = df.reset_index(drop=True)
print(df_reset.index)
# RangeIndex(start=0, stop=3, step=1)
```

3. Memory Efficiency

RangeIndex stores only start, stop, and step, not individual values, so its memory footprint does not grow with its length.
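One way to observe the difference is to compare `nbytes` for a RangeIndex against an index built from a materialized NumPy array of the same length. This is a rough sketch; the exact byte counts depend on the platform and pandas version, so none are shown here.

```python
import numpy as np
import pandas as pd

n = 1_000_000

lazy = pd.RangeIndex(n)                # stores only start, stop, step
materialized = pd.Index(np.arange(n))  # stores every integer explicitly

print(type(lazy).__name__, lazy.nbytes)
print(type(materialized).__name__, materialized.nbytes)
```

Both indexes hold the same labels, so they compare equal with `lazy.equals(materialized)` even though their storage differs.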


Exercises

Exercise 1. Create a pd.Index from a list of strings. Demonstrate that the Index is immutable by attempting to modify an element (it should raise a TypeError).

Solution to Exercise 1

Verify Index immutability.

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c'])
print(idx)
try:
    idx[0] = 'z'
except TypeError as e:
    print(f"TypeError: {e}")
```

Exercise 2. Create two Index objects with overlapping values. Use .intersection(), .union(), and .difference() to perform set operations on them.

Solution to Exercise 2

Perform set operations on Index objects.

```python
import pandas as pd

idx1 = pd.Index([1, 2, 3, 4])
idx2 = pd.Index([3, 4, 5, 6])
print("Intersection:", idx1.intersection(idx2).tolist())
print("Union:", idx1.union(idx2).tolist())
print("Difference:", idx1.difference(idx2).tolist())
```

Exercise 3. Create a DataFrame and use .set_index() to set a column as the index. Then use .reset_index() to move the index back to a column. Verify the DataFrame is the same as the original.

Solution to Exercise 3

Round-trip set_index and reset_index.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'val': [10, 20, 30]})
df_indexed = df.set_index('id')
print("Indexed:\n", df_indexed)
df_reset = df_indexed.reset_index()
print("Reset:\n", df_reset)
assert df.equals(df_reset)
```