Indexing and Selection¶
Indexing is one of pandas' most powerful features. Understanding the difference between label-based and position-based selection is essential.
Mental Model
Pandas has two addressing systems: loc uses label names (like looking up a word in a dictionary), and iloc uses integer positions (like accessing an array by index). Mixing them up is the most common pandas mistake. When in doubt, loc for labels, iloc for integers.
Label-based Selection¶
The loc accessor selects by labels.
1. Single Row by Label¶
```python import pandas as pd
df = pd.DataFrame({ "price": [100, 101, 102], "volume": [10, 12, 9] }, index=['day1', 'day2', 'day3'])
print(df.loc['day1']) ```
price 100
volume 10
Name: day1, dtype: int64
2. Row and Column Selection¶
python
df.loc['day1', 'price'] # Single value
df.loc['day1':'day2', 'price'] # Slice (inclusive)
df.loc[:, 'price'] # All rows, one column
3. Multiple Labels¶
python
df.loc[['day1', 'day3'], ['price', 'volume']]
Position-based Selection¶
The iloc accessor selects by integer positions.
1. Single Row by Position¶
python
df.iloc[0] # First row
2. Row and Column by Position¶
python
df.iloc[0, 0] # First row, first column
df.iloc[0:2, 0] # First two rows, first column
df.iloc[:, 0] # All rows, first column
3. Integer Slicing¶
python
df.iloc[0:2] # First two rows (exclusive end)
df.iloc[-1] # Last row
Boolean Indexing¶
Select rows based on conditions.
1. Single Condition¶
python
df[df["price"] > 100]
2. Multiple Conditions¶
python
df[(df["price"] > 100) & (df["volume"] > 10)]
df[(df["price"] > 102) | (df["volume"] < 10)]
3. Using Query¶
python
df.query("price > 100 and volume > 10")
Chained Indexing¶
Avoid chained indexing to prevent unexpected behavior.
1. Problematic Pattern¶
```python
This can cause SettingWithCopyWarning¶
df[df["price"] > 100]["volume"] = 20 ```
2. Correct Approach¶
python
df.loc[df["price"] > 100, "volume"] = 20
3. Copy vs View¶
Chained indexing may return a view or copy unpredictably.
Setting Values¶
Use loc and iloc for assignment.
1. Single Value¶
python
df.loc['day1', 'price'] = 105
2. Multiple Values¶
python
df.loc['day1', ['price', 'volume']] = [105, 15]
3. Conditional Assignment¶
python
df.loc[df['price'] > 100, 'flag'] = True
Best Practices¶
Follow these guidelines for clean indexing code.
1. Explicit Selection¶
Always use loc or iloc explicitly:
```python
Preferred¶
df.loc[0] # If 0 is a label df.iloc[0] # If 0 is a position
Avoid¶
df[0] # Ambiguous ```
2. Avoid Mixed Indexing¶
```python
Don't mix labels and positions¶
Use loc for labels, iloc for positions¶
```
3. Check Index Type¶
python
print(df.index) # Understand your index
print(type(df.index)) # Know the index type
Exercises¶
Exercise 1.
Create a DataFrame with a string index. Use .loc[] to select a single row, a range of rows (slice), and specific rows and columns simultaneously.
Solution to Exercise 1
Use loc for label-based selection.
import pandas as pd
df = pd.DataFrame(
{'A': [10, 20, 30, 40], 'B': [50, 60, 70, 80]},
index=['w', 'x', 'y', 'z']
)
print("Single row:\n", df.loc['x'])
print("\nSlice:\n", df.loc['x':'z'])
print("\nRows and cols:\n", df.loc[['w', 'z'], ['A']])
Exercise 2.
Create a DataFrame with a default numeric index. Use .iloc[] to select the first 3 rows, every other row, and the last row. Verify each result.
Solution to Exercise 2
Use iloc for position-based selection.
import pandas as pd
df = pd.DataFrame({'val': [10, 20, 30, 40, 50]})
print("First 3:\n", df.iloc[:3])
print("\nEvery other:\n", df.iloc[::2])
print("\nLast row:\n", df.iloc[-1])
Exercise 3.
Create a DataFrame and demonstrate the difference between df['col'] (column access), df.loc[row_label] (row by label), and df.iloc[row_pos] (row by position). Show that df.loc['label', 'col'] accesses a scalar value.
Solution to Exercise 3
Demonstrate different selection methods.
import pandas as pd
df = pd.DataFrame(
{'price': [100, 200, 300], 'qty': [5, 3, 7]},
index=['a', 'b', 'c']
)
print("Column:", type(df['price']))
print("Row by label:", type(df.loc['a']))
print("Row by position:", type(df.iloc[0]))
print("Scalar:", df.loc['b', 'price'])