Skip to content

Boolean Indexing

Filter DataFrame rows using boolean conditions and masks.

Basic Comparison

Filter using comparison operators.

1. Single Condition

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
})

# Filter age > 30
result = df[df['age'] > 30]
print(result)
      name  age  salary
2  Charlie   35   70000
3    David   40   80000

2. Boolean Mask

mask = df['age'] > 30
print(mask)
0    False
1    False
2     True
3     True
Name: age, dtype: bool

3. Apply Mask

result = df[mask]

Multiple Conditions

Combine conditions with logical operators.

1. AND Condition (&)

# Age > 30 AND salary > 60000
result = df[(df['age'] > 30) & (df['salary'] > 60000)]

2. OR Condition (|)

# Age > 35 OR salary < 55000
result = df[(df['age'] > 35) | (df['salary'] < 55000)]

3. NOT Condition (~)

# NOT age > 30
result = df[~(df['age'] > 30)]

Parentheses Required

Always use parentheses with multiple conditions.

1. Correct Syntax

# Correct - parentheses around each condition
df[(df['age'] > 30) & (df['salary'] > 60000)]

2. Incorrect Syntax

# Wrong - will raise error
# df[df['age'] > 30 & df['salary'] > 60000]

3. Operator Precedence

& has higher precedence than comparison operators.

loc with Boolean

Use loc for boolean indexing with column selection.

1. Filter and Select

result = df.loc[df['age'] > 30, ['name', 'salary']]
print(result)
      name  salary
2  Charlie   70000
3    David   80000

2. Modify Values

df.loc[df['age'] > 30, 'bonus'] = 5000

3. Multiple Conditions

df.loc[(df['age'] > 30) & (df['salary'] > 60000), 'level'] = 'Senior'

LeetCode Example: Employees Earning More Than Manager

Self-merge and filter comparison.

1. Sample Data

employee = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['John', 'Doe', 'Jane', 'Smith'],
    'salary': [50000, 40000, 60000, 30000],
    'managerId': [None, 1, 1, 2]
})

2. Self Merge

merged = pd.merge(
    employee, employee,
    left_on='managerId', right_on='id',
    suffixes=('_emp', '_mgr')
)

3. Filter Comparison

higher_salary = merged[merged['salary_emp'] > merged['salary_mgr']]
result = higher_salary[['name_emp']].rename(columns={'name_emp': 'Employee'})

LeetCode Example: Not Boring Movies

Filter with multiple conditions.

1. Sample Data

cinema = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'movie': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'],
    'description': ['thrilling', 'boring', 'exciting', 'boring doc', 'great'],
    'rating': [4.5, 3.0, 4.7, 2.1, 4.9]
})

2. Odd ID and Not Boring

filtered = cinema[
    (cinema['id'] % 2 != 0) & 
    (~cinema['description'].str.contains('boring', case=False))
]

3. Result

print(filtered)

Missing Value Handling

Filter based on null values.

1. Filter Non-null

result = df[df['salary'].notna()]

2. Filter Null

result = df[df['salary'].isna()]

3. Any Row with Missing

# Rows with any missing value
result = df[df.isna().any(axis=1)]

Date Filtering

Filter by date conditions.

1. Date Range

df['date'] = pd.to_datetime(df['date'])

result = df[
    (df['date'] >= '2019-06-28') & 
    (df['date'] <= '2019-07-27')
]

2. Year Filter

result = df[df['date'].dt.year == 2020]

3. Month Filter

result = df[df['date'].dt.month.isin([1, 2, 3])]  # Q1