Boolean Indexing¶

Filter DataFrame rows using boolean conditions and masks.

Basic Comparison¶

Filter using comparison operators.

1. Single Condition¶

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'salary': [50000, 60000, 70000, 80000]
})

# Filter age > 30
result = df[df['age'] > 30]
print(result)

      name  age  salary
2  Charlie   35   70000
3    David   40   80000

2. Boolean Mask¶

mask = df['age'] > 30
print(mask)

0    False
1    False
2     True
3     True
Name: age, dtype: bool

3. Apply Mask¶

result = df[mask]

Multiple Conditions¶

Combine conditions with logical operators.

1. AND Condition (&)¶

# Age > 30 AND salary > 60000
result = df[(df['age'] > 30) & (df['salary'] > 60000)]

2. OR Condition (|)¶

# Age > 35 OR salary < 55000
result = df[(df['age'] > 35) | (df['salary'] < 55000)]

3. NOT Condition (~)¶

# NOT age > 30
result = df[~(df['age'] > 30)]

Parentheses Required¶

Always use parentheses with multiple conditions.

1. Correct Syntax¶

# Correct - parentheses around each condition
df[(df['age'] > 30) & (df['salary'] > 60000)]

2. Incorrect Syntax¶

# Wrong - will raise error
# df[df['age'] > 30 & df['salary'] > 60000]

3. Operator Precedence¶

& has higher precedence than comparison operators.

loc with Boolean¶

Use loc for boolean indexing with column selection.

1. Filter and Select¶

result = df.loc[df['age'] > 30, ['name', 'salary']]
print(result)

      name  salary
2  Charlie   70000
3    David   80000

2. Modify Values¶

df.loc[df['age'] > 30, 'bonus'] = 5000

3. Multiple Conditions¶

df.loc[(df['age'] > 30) & (df['salary'] > 60000), 'level'] = 'Senior'

LeetCode Example: Employees Earning More Than Manager¶

Self-merge and filter comparison.

1. Sample Data¶

employee = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['John', 'Doe', 'Jane', 'Smith'],
    'salary': [50000, 40000, 60000, 30000],
    'managerId': [None, 1, 1, 2]
})

2. Self Merge¶

merged = pd.merge(
    employee, employee,
    left_on='managerId', right_on='id',
    suffixes=('_emp', '_mgr')
)

3. Filter Comparison¶

higher_salary = merged[merged['salary_emp'] > merged['salary_mgr']]
result = higher_salary[['name_emp']].rename(columns={'name_emp': 'Employee'})

LeetCode Example: Not Boring Movies¶

Filter with multiple conditions.

1. Sample Data¶

cinema = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'movie': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'],
    'description': ['thrilling', 'boring', 'exciting', 'boring doc', 'great'],
    'rating': [4.5, 3.0, 4.7, 2.1, 4.9]
})

2. Odd ID and Not Boring¶

filtered = cinema[
    (cinema['id'] % 2 != 0) & 
    (~cinema['description'].str.contains('boring', case=False))
]

3. Result¶

print(filtered)

Missing Value Handling¶

Filter based on null values.

1. Filter Non-null¶

result = df[df['salary'].notna()]

2. Filter Null¶

result = df[df['salary'].isna()]

3. Any Row with Missing¶

# Rows with any missing value
result = df[df.isna().any(axis=1)]

Date Filtering¶

Filter by date conditions.

1. Date Range¶

df['date'] = pd.to_datetime(df['date'])

result = df[
    (df['date'] >= '2019-06-28') & 
    (df['date'] <= '2019-07-27')
]

2. Year Filter¶

result = df[df['date'].dt.year == 2020]

3. Month Filter¶

result = df[df['date'].dt.month.isin([1, 2, 3])]  # Q1