Indexing and Selection¶
Indexing is central to working with pandas. The key distinction is between label-based selection (loc) and position-based selection (iloc); confusing the two is a common source of bugs.
Label-based Selection¶
The loc accessor selects rows and columns by their index and column labels.
1. Single Row by Label¶
import pandas as pd
df = pd.DataFrame({
"price": [100, 101, 102],
"volume": [10, 12, 9]
}, index=['day1', 'day2', 'day3'])
print(df.loc['day1'])
price     100
volume     10
Name: day1, dtype: int64
2. Row and Column Selection¶
df.loc['day1', 'price'] # Single value
df.loc['day1':'day2', 'price'] # Slice (inclusive)
df.loc[:, 'price'] # All rows, one column
3. Multiple Labels¶
df.loc[['day1', 'day3'], ['price', 'volume']]
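The shape of the result depends on whether loc receives a single label or a list of labels. The following is a minimal sketch using the same df as above; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [100, 101, 102], "volume": [10, 12, 9]},
    index=["day1", "day2", "day3"],
)

# A single label returns the row as a Series
row_as_series = df.loc["day1"]

# A list of labels returns a DataFrame, even when the list has one element
row_as_frame = df.loc[["day1"]]

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
```

Wrapping the label in a list is a simple way to keep a one-row result two-dimensional when later code expects a DataFrame.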
Position-based Selection¶
The iloc accessor selects by zero-based integer position, regardless of the index labels.
1. Single Row by Position¶
df.iloc[0] # First row
2. Row and Column by Position¶
df.iloc[0, 0] # First row, first column
df.iloc[0:2, 0] # First two rows, first column
df.iloc[:, 0] # All rows, first column
3. Integer Slicing¶
df.iloc[0:2] # First two rows (exclusive end)
df.iloc[-1] # Last row
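Label slices with loc include the end label, while position slices with iloc exclude the end position, as in ordinary Python slicing. The sketch below rebuilds the same df to compare the two; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [100, 101, 102], "volume": [10, 12, 9]},
    index=["day1", "day2", "day3"],
)

# Label slice: both 'day1' and 'day2' are included
by_label = df.loc["day1":"day2", "price"]

# Position slice: positions 0 and 1 are included, position 2 is not
by_position = df.iloc[0:2, 0]

# Negative positions count from the end, as with Python lists
last_two = df.iloc[-2:]

print(by_label.equals(by_position))
print(last_two.index.tolist())
```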
Boolean Indexing¶
Select rows with a boolean mask built from one or more conditions.
1. Single Condition¶
df[df["price"] > 100]
2. Multiple Conditions¶
df[(df["price"] > 100) & (df["volume"] > 10)] # AND: both conditions must hold
df[(df["price"] > 102) | (df["volume"] < 10)] # OR: either condition may hold
3. Using Query¶
df.query("price > 100 and volume > 10")
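A query string and an explicit boolean mask should select the same rows. The sketch below compares the two on the same df and also uses the @ prefix to refer to a local Python variable inside the query; min_price is a hypothetical name introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [100, 101, 102], "volume": [10, 12, 9]},
    index=["day1", "day2", "day3"],
)

# Explicit mask: each condition needs its own parentheses because
# & binds more tightly than the comparison operators
via_mask = df[(df["price"] > 100) & (df["volume"] > 10)]

# Equivalent query string; 'and' is allowed inside the expression
via_query = df.query("price > 100 and volume > 10")

# @name refers to a variable from the surrounding Python scope
min_price = 100  # hypothetical threshold
via_variable = df.query("price > @min_price and volume > 10")

print(via_query.index.tolist())
```

Outside of query, combining Series conditions with the Python keywords and/or raises a ValueError, which is why the mask form uses & and |.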
Chained Indexing¶
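Avoid chained indexing (two indexing operations back to back), especially in assignments, because the write may land on a temporary copy instead of the original DataFrame.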
1. Problematic Pattern¶
# Writes to a temporary copy, so df is not updated; pandas may emit SettingWithCopyWarning
df[df["price"] > 100]["volume"] = 20
2. Correct Approach¶
df.loc[df["price"] > 100, "volume"] = 20
3. Copy vs View¶
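With chained indexing, the first step may return either a view or a copy, depending on the operation and the pandas version, so an assignment through the result is not guaranteed to reach the original DataFrame.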
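The difference can be checked directly. In the sketch below, the chained assignment goes through a boolean-mask selection, which produces a copy, so df is left unchanged, whereas the single loc call updates it. Warnings are silenced only to keep the output clean; the exact warning class depends on the pandas version.

```python
import warnings

import pandas as pd

df = pd.DataFrame(
    {"price": [100, 101, 102], "volume": [10, 12, 9]},
    index=["day1", "day2", "day3"],
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning
    # Two steps: select rows (a copy), then assign into that copy
    df[df["price"] > 100]["volume"] = 20
after_chained = df["volume"].tolist()

# One step: selection and assignment happen on df itself
df.loc[df["price"] > 100, "volume"] = 20
after_loc = df["volume"].tolist()

print(after_chained)  # [10, 12, 9]
print(after_loc)      # [10, 20, 20]
```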
Setting Values¶
Use loc and iloc for assignment so that selection and assignment happen in a single operation.
1. Single Value¶
df.loc['day1', 'price'] = 105
2. Multiple Values¶
df.loc['day1', ['price', 'volume']] = [105, 15]
3. Conditional Assignment¶
df.loc[df['price'] > 100, 'flag'] = True # Creates 'flag' if missing; rows that do not match are left as NaN
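When the target column does not exist yet, a conditional assignment fills only the matching rows. The sketch below shows the resulting missing values and, as one possible alternative, assigns the mask itself to get a fully populated boolean column; flag_bool is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [100, 101, 102], "volume": [10, 12, 9]},
    index=["day1", "day2", "day3"],
)

# Only rows where price > 100 receive True; day1 is left missing
df.loc[df["price"] > 100, "flag"] = True

# Assigning the mask directly gives True/False for every row
df["flag_bool"] = df["price"] > 100

print(df["flag"].isna().tolist())
print(df["flag_bool"].tolist())
```

Which form is preferable depends on whether a missing value or an explicit False better represents the rows that do not match.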
Best Practices¶
Follow these guidelines for clean indexing code.
1. Explicit Selection¶
Always use loc or iloc explicitly:
# Preferred
df.loc[0] # If 0 is a label
df.iloc[0] # If 0 is a position
# Avoid
df[0] # Looks up a column labeled 0, not a row; easy to misread
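The distinction matters most when the index itself holds integers. The sketch below uses a small hypothetical frame, df_int, whose integer labels are deliberately in reverse order so that label 0 and position 0 refer to different rows.

```python
import pandas as pd

# Hypothetical frame: integer labels that do not match positions
df_int = pd.DataFrame({"price": [100, 101, 102]}, index=[2, 1, 0])

by_label = df_int.loc[0]      # row whose label is 0 (the last row)
by_position = df_int.iloc[0]  # row at position 0 (the first row)

print(by_label["price"])     # 102
print(by_position["price"])  # 100

# Plain brackets look for a column labeled 0, which does not exist here
try:
    df_int[0]
    bracket_failed = False
except KeyError:
    bracket_failed = True
print(bracket_failed)  # True
```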
2. Avoid Mixed Indexing¶
# Don't mix labels and positions
# Use loc for labels, iloc for positions
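When a row is known by position and a column by label (or the reverse), one option is to convert one side so that a single accessor handles both. The sketch below shows two such conversions using df.columns.get_loc and df.index; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [100, 101, 102], "volume": [10, 12, 9]},
    index=["day1", "day2", "day3"],
)

# Convert the column label to a position, then use iloc throughout
price_pos = df.columns.get_loc("price")
first_price_iloc = df.iloc[0, price_pos]

# Or convert the row position to a label, then use loc throughout
first_label = df.index[0]
first_price_loc = df.loc[first_label, "price"]

print(first_price_iloc, first_price_loc)
```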
3. Check Index Type¶
print(df.index) # Understand your index
print(type(df.index)) # Know the index type
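Beyond the index type, a couple of index properties affect what loc returns. The sketch below checks uniqueness and ordering on the same df, and uses a small hypothetical frame, dup, to show that a duplicated label makes loc return several rows instead of one.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [100, 101, 102], "volume": [10, 12, 9]},
    index=["day1", "day2", "day3"],
)

print(df.index.is_unique)                # are all labels distinct?
print(df.index.is_monotonic_increasing)  # are labels sorted ascending?

# Hypothetical frame with a repeated label
dup = pd.DataFrame({"price": [1, 2]}, index=["a", "a"])
dup_rows = dup.loc["a"]  # a DataFrame with both matching rows

print(type(dup_rows).__name__, len(dup_rows))
```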