query Method¶

The query() method filters DataFrames using a string expression, providing cleaner syntax for complex conditions.

Mental Model

query() lets you write filter conditions as a readable string instead of bracket-heavy boolean expressions. df.query("age > 30 and city == 'NY'") replaces df[(df['age'] > 30) & (df['city'] == 'NY')]. The result is identical, but the string form is closer to how you think about the condition in plain English.

Basic Usage¶

Filter with string expressions.

1. Simple Condition¶

```python import pandas as pd

df = pd.DataFrame({ 'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 35, 40], 'Salary': [50000, 60000, 70000, 80000] })

Equivalent operations¶

result1 = df[df['Age'] < 35] # Boolean indexing result2 = df.query('Age < 35') # Query method

print(result2) ```

Name Age Salary 0 Alice 25 50000 1 Bob 30 60000

2. Cleaner Syntax¶

Query avoids repeated DataFrame references.

3. Column Names¶

Use column names directly in the query string.

Multiple Conditions¶

Combine conditions with and/or.

1. AND Condition¶

```python

Boolean indexing¶

df[(df['Age'] > 25) & (df['Salary'] > 50000)]

Query method¶

df.query('Age > 25 and Salary > 50000') ```

2. OR Condition¶

python df.query('Age > 35 or Salary < 55000')

3. Complex Conditions¶

python df.query('(Age > 30 and Salary > 60000) or Name == "Alice"')

Column Comparisons¶

Compare columns against each other.

1. Column vs Column¶

```python df = pd.DataFrame({ 'Subscribers': [760, 366, 1660, 171], 'Age': [40, 50, 25, 35] })

Subscribers > 10 * Age¶

df.query('Subscribers > 10 * Age') ```

2. Arithmetic Expressions¶

python df.query('Salary / 1000 > Age')

3. Multiple Columns¶

python df.query('High > Open and Close > Open')

External Variables¶

Reference variables with @ prefix.

1. Variable Reference¶

```python limit = 500

Boolean indexing¶

df[df['Subscribers'] > limit]

Query method¶

df.query('Subscribers > @limit') ```

2. Multiple Variables¶

python min_age = 25 max_age = 40 df.query('@min_age <= Age <= @max_age')

3. List Variables¶

python valid_names = ['Alice', 'Bob'] df.query('Name in @valid_names')

String Operations¶

Query with string conditions.

1. Equality¶

python df.query('Name == "Alice"')

2. In List¶

python df.query('Name in ["Alice", "Bob"]')

3. Not In¶

python df.query('Name not in ["Charlie", "David"]')

Detecting NaN¶

Special handling for missing values.

1. NaN Detection¶

```python

NaN != NaN is True¶

df.query('Subscribers != Subscribers') # Finds NaN rows ```

2. Not NaN¶

python df.query('Subscribers == Subscribers') # Non-NaN rows

3. Alternative¶

```python

Better to avoid spaces in column names¶

df.columns = df.columns.str.replace(' ', '_') ```

3. Alternative¶

python df[df['First Name'] == 'Alice'] # Boolean indexing works

Performance¶

When to use query.

1. Large DataFrames¶

Query can be faster for large DataFrames.

2. Complex Conditions¶

Query is cleaner for multiple conditions.

3. Simple Filters¶

Boolean indexing is fine for simple filters.

Query vs Boolean Indexing¶

Comparison of approaches.

1. Query Advantages¶

Cleaner syntax for complex conditions
No repeated DataFrame name
More readable

2. Boolean Indexing Advantages¶

More flexible
Works with any expression
No string parsing

3. Choose Based On¶

Readability needs
Condition complexity
Personal preference

Exercises¶

Exercise 1. Create a DataFrame with columns 'product', 'price', and 'quantity'. Use .query() to find products where price > 20 and quantity < 100. Compare the readability with equivalent boolean indexing.

Solution to Exercise 1

Use .query() with a readable string expression.

import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Gizmo', 'Tool'],
    'price': [15, 25, 30, 10],
    'quantity': [200, 50, 80, 150]
})
result = df.query('price > 20 and quantity < 100')
print(result)

Exercise 2. Use .query() with the @ syntax to reference a Python variable. Given a variable min_salary = 60000, write a query that filters employees with salary >= @min_salary.

Solution to Exercise 2

Reference external variables with @ inside query strings.

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'salary': [55000, 65000, 72000]
})
min_salary = 60000
result = df.query('salary >= @min_salary')
print(result)

Exercise 3. Use .query() with string methods by filtering a DataFrame where the 'name' column starts with the letter 'A'. Use the backtick syntax if the column name contains spaces.

Solution to Exercise 3

Use string methods inside .query().

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Anna', 'Charlie'],
    'score': [85, 90, 78, 92]
})
result = df.query('name.str.startswith("A")', engine='python')
print(result)

query Method¶

Basic Usage¶

1. Simple Condition¶

Equivalent operations¶

2. Cleaner Syntax¶

3. Column Names¶

Multiple Conditions¶

1. AND Condition¶

Boolean indexing¶

Query method¶

2. OR Condition¶

3. Complex Conditions¶

Column Comparisons¶

1. Column vs Column¶

Subscribers > 10 * Age¶

2. Arithmetic Expressions¶

3. Multiple Columns¶

External Variables¶

1. Variable Reference¶

Boolean indexing¶

Query method¶

2. Multiple Variables¶

3. List Variables¶

String Operations¶

1. Equality¶

2. In List¶

3. Not In¶

Detecting NaN¶

1. NaN Detection¶

NaN != NaN is True¶

2. Not NaN¶

3. Alternative¶

More readable alternatives¶

Date Queries¶

1. Date Range¶

2. Year Filter¶

3. Month Filter¶

LeetCode Example: Sales Analysis¶

1. Sample Data¶

2. Filter Q1 2019¶

3. Exclude Q1 2019¶

Spaces in Column Names¶

1. Backticks¶

2. Rename Columns¶

Better to avoid spaces in column names¶

3. Alternative¶

Performance¶

1. Large DataFrames¶

2. Complex Conditions¶

3. Simple Filters¶

Query vs Boolean Indexing¶

1. Query Advantages¶

2. Boolean Indexing Advantages¶

3. Choose Based On¶

Exercises¶