Index Objects
Index objects provide axis labels for pandas data structures. Understanding indexes is fundamental to effective pandas usage.
Mental Model
An Index is an immutable array of labels attached to an axis. It serves two roles: it is the lookup key for label-based selection (loc), and it is the alignment mechanism that makes arithmetic between mismatched Series "just work." Label-based selection, alignment, and reindexing all start by consulting the Index.
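Both roles can be seen in a few lines. The following is a minimal sketch; the `prices` and `discount` Series and their labels are made up for illustration.

```python
import pandas as pd

# Illustrative data: the labels and values are arbitrary.
prices = pd.Series([1.5, 2.0, 3.25], index=['apple', 'banana', 'cherry'])

# Role 1: the Index maps a label to a position, which label-based lookup uses.
position = prices.index.get_loc('banana')
value = prices.loc['banana']
print(position, value)

# Role 2: arithmetic matches values by label, not by position.
discount = pd.Series([0.5, 0.25], index=['cherry', 'apple'])
net = prices - discount
print(net)
```

Here `banana` has no counterpart in `discount`, so its entry in `net` is NaN, while `apple` and `cherry` are matched by label even though the two Series list them in a different order.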
Index Purpose
Index serves as an immutable container for axis labels.
1. Label Container
```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c', 'd'])
s = pd.Series([10, 20, 30, 40], index=idx)
print(s)
```
```
a    10
b    20
c    30
d    40
dtype: int64
```
2. Immutability
Indexes cannot be modified in place:
```python
idx = pd.Index(['a', 'b', 'c'])

# idx[0] = 'x'  # TypeError: Index does not support mutable operations
```
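Because individual labels cannot be reassigned, changing a label means producing a new Index. The sketch below shows two common ways to do that; the variable names are illustrative and neither approach is presented as the preferred one.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=pd.Index(['a', 'b', 'c']))

# Option 1: build a new Index and assign it to the whole axis.
new_labels = s.index.tolist()
new_labels[0] = 'x'
s_assigned = s.copy()
s_assigned.index = pd.Index(new_labels)

# Option 2: rename returns a new object with remapped labels.
s_renamed = s.rename(index={'a': 'x'})

print(s_assigned.index.tolist())
print(s_renamed.index.tolist())
print(s.index.tolist())  # the original Series keeps its labels
```

In both cases the original Index object is never mutated; it is replaced as a whole.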
3. Index Types
```python
pd.Index             # Generic index
pd.RangeIndex        # Memory-efficient integer range
pd.DatetimeIndex     # Datetime labels
pd.MultiIndex        # Hierarchical index
pd.CategoricalIndex  # Categorical data
```
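As a rough illustration, each specialized type can be constructed directly or through a helper such as `pd.date_range`. The labels and dates below are arbitrary examples.

```python
import pandas as pd

range_idx = pd.RangeIndex(start=0, stop=3)
date_idx = pd.date_range('2024-01-01', periods=3, freq='D')
multi_idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)])
cat_idx = pd.CategoricalIndex(['low', 'high', 'low'])

for idx in (range_idx, date_idx, multi_idx, cat_idx):
    # Every specialized type is still an Index.
    print(type(idx).__name__, isinstance(idx, pd.Index))
```

Since all of these are subclasses of `pd.Index`, the immutability and set operations described in this section apply to each of them.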
Index Operations
Index supports set-like operations for data alignment.
1. Set Union
```python
idx1 = pd.Index(['a', 'b', 'c'])
idx2 = pd.Index(['b', 'c', 'd'])

print(idx1.union(idx2))
# Index(['a', 'b', 'c', 'd'], dtype='object')
```
2. Set Intersection
```python
print(idx1.intersection(idx2))
# Index(['b', 'c'], dtype='object')
```
3. Set Difference
```python
print(idx1.difference(idx2))
# Index(['a'], dtype='object')
```
Automatic Alignment
pandas automatically aligns data based on index labels during operations.
1. Series Alignment
```python
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])
result = s1 + s2
print(result)
```
```
a     NaN
b    12.0
c    23.0
dtype: float64
```
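The link between alignment and the set operations can be checked directly. This sketch reuses the same two Series and compares the result's index with the union of the operand indexes.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])

result = s1 + s2

# The result is labeled by the union of the two indexes.
same_labels = result.index.equals(s1.index.union(s2.index))
print(same_labels)

# Labels missing from either operand produce NaN.
missing = result.isna().tolist()
print(missing)
```

The NaN at label `a` is also why the result is float64 even though both inputs hold integers.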
2. DataFrame Alignment
```python
df1 = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
df2 = pd.DataFrame({'A': [10, 20]}, index=['y', 'z'])
print(df1 + df2)
```
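For DataFrames, alignment applies to both the row index and the columns. The sketch below repeats the addition above and then adds a hypothetical third frame, `df3`, whose only column does not appear in `df1`.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
df2 = pd.DataFrame({'A': [10, 20]}, index=['y', 'z'])

# Rows are aligned by label; only 'y' exists in both frames.
total = df1 + df2
print(total)

# Columns are aligned the same way. df3 is a hypothetical frame that
# shares row labels with df1 but has a different column.
df3 = pd.DataFrame({'B': [100, 200]}, index=['x', 'y'])
mixed = df1 + df3
print(mixed.columns.tolist())
print(mixed.isna().all().all())
```

In `mixed`, no column exists in both operands, so every cell is NaN even though the row labels match.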
3. Fill Value
```python
result = s1.add(s2, fill_value=0)
```
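To make the effect of `fill_value` concrete, the following sketch compares plain addition with `add(..., fill_value=0)` on the same two Series used earlier in this section.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])

plain = s1 + s2                     # 'a' has no partner, so it becomes NaN
filled = s1.add(s2, fill_value=0)   # the missing partner is treated as 0

print(plain.tolist())
print(filled.tolist())
```

With `fill_value=0`, label `a` keeps its value from `s1` instead of turning into NaN.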
Reindexing
Conform a Series or DataFrame to a new set of labels with reindex. Existing labels keep their values; labels not present in the original get missing values.
1. Basic Reindexing
```python
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s_new = s.reindex(['a', 'b', 'c', 'd'])
print(s_new)
```
```
a    1.0
b    2.0
c    3.0
d    NaN
dtype: float64
```
2. Fill Missing Values
```python
s_new = s.reindex(['a', 'b', 'c', 'd'], fill_value=0)
```
3. Forward Fill
```python
s_new = s.reindex(['a', 'b', 'c', 'd'], method='ffill')
```
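The two fill strategies give different values for the new label. This is a small sketch comparing them side by side; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
new_labels = ['a', 'b', 'c', 'd']

# fill_value inserts a constant for labels that did not exist before.
filled = s.reindex(new_labels, fill_value=0)

# method='ffill' carries the value of the last preceding label forward.
ffilled = s.reindex(new_labels, method='ffill')

print(filled.tolist())
print(ffilled.tolist())
```

Note that `method='ffill'` only works when the original index is monotonically increasing or decreasing, as it is here; otherwise pandas raises a ValueError.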
RangeIndex
Default integer index optimized for memory efficiency.
1. Automatic Creation
```python
s = pd.Series([10, 20, 30])
print(s.index)
# RangeIndex(start=0, stop=3, step=1)
```
2. Reset Index
```python
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
df_reset = df.reset_index(drop=True)
print(df_reset.index)
# RangeIndex(start=0, stop=3, step=1)
```
3. Memory Efficiency
RangeIndex stores only start, stop, and step, not individual values.
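One way to observe the difference is to compare `memory_usage()` for a RangeIndex and for an Index holding the same labels as a materialized integer array. The size `n` below is an arbitrary choice, and the exact byte counts depend on the platform and pandas version, so none are shown.

```python
import numpy as np
import pandas as pd

n = 1_000_000  # arbitrary size chosen for illustration
range_idx = pd.RangeIndex(n)
materialized = pd.Index(np.arange(n))  # same labels stored as an integer array

print(range_idx.memory_usage())
print(materialized.memory_usage())

# Both indexes contain the same labels.
print(range_idx.equals(materialized))
```

The materialized index grows with `n`, while the RangeIndex footprint stays small because only start, stop, and step are kept.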
Exercises
Exercise 1.
Create a pd.Index from a list of strings. Demonstrate that the Index is immutable by attempting to modify an element (it should raise a TypeError).
Solution to Exercise 1
Verify Index immutability.
```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c'])
print(idx)

try:
    idx[0] = 'z'  # item assignment is not supported on an Index
except TypeError as e:
    print(f"TypeError: {e}")
```
Exercise 2.
Create two Index objects with overlapping values. Use .intersection(), .union(), and .difference() to perform set operations on them.
Solution to Exercise 2
Perform set operations on Index objects.
```python
import pandas as pd

idx1 = pd.Index([1, 2, 3, 4])
idx2 = pd.Index([3, 4, 5, 6])

print("Intersection:", idx1.intersection(idx2).tolist())
print("Union:", idx1.union(idx2).tolist())
print("Difference:", idx1.difference(idx2).tolist())
```
Exercise 3.
Create a DataFrame and use .set_index() to set a column as the index. Then use .reset_index() to move the index back to a column. Verify the DataFrame is the same as the original.
Solution to Exercise 3
Round-trip set_index and reset_index.
```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'val': [10, 20, 30]})

df_indexed = df.set_index('id')
print("Indexed:\n", df_indexed)

df_reset = df_indexed.reset_index()
print("Reset:\n", df_reset)

# The round trip restores the original columns and default index.
assert df.equals(df_reset)
```