query Method¶
The query() method filters DataFrames using a string expression, providing cleaner syntax for complex conditions.
Mental Model
query() lets you write filter conditions as a readable string instead of bracket-heavy boolean expressions. df.query("age > 30 and city == 'NY'") replaces df[(df['age'] > 30) & (df['city'] == 'NY')]. The result is identical, but the string form is closer to how you think about the condition in plain English.
Basic Usage¶
Filter with string expressions.
1. Simple Condition¶
```python import pandas as pd
df = pd.DataFrame({ 'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 35, 40], 'Salary': [50000, 60000, 70000, 80000] })
Equivalent operations¶
result1 = df[df['Age'] < 35] # Boolean indexing result2 = df.query('Age < 35') # Query method
print(result2) ```
Name Age Salary
0 Alice 25 50000
1 Bob 30 60000
2. Cleaner Syntax¶
Query avoids repeated DataFrame references.
3. Column Names¶
Use column names directly in the query string.
Multiple Conditions¶
Combine conditions with and/or.
1. AND Condition¶
```python
Boolean indexing¶
df[(df['Age'] > 25) & (df['Salary'] > 50000)]
Query method¶
df.query('Age > 25 and Salary > 50000') ```
2. OR Condition¶
python
df.query('Age > 35 or Salary < 55000')
3. Complex Conditions¶
python
df.query('(Age > 30 and Salary > 60000) or Name == "Alice"')
Column Comparisons¶
Compare columns against each other.
1. Column vs Column¶
```python df = pd.DataFrame({ 'Subscribers': [760, 366, 1660, 171], 'Age': [40, 50, 25, 35] })
Subscribers > 10 * Age¶
df.query('Subscribers > 10 * Age') ```
2. Arithmetic Expressions¶
python
df.query('Salary / 1000 > Age')
3. Multiple Columns¶
python
df.query('High > Open and Close > Open')
External Variables¶
Reference variables with @ prefix.
1. Variable Reference¶
```python limit = 500
Boolean indexing¶
df[df['Subscribers'] > limit]
Query method¶
df.query('Subscribers > @limit') ```
2. Multiple Variables¶
python
min_age = 25
max_age = 40
df.query('@min_age <= Age <= @max_age')
3. List Variables¶
python
valid_names = ['Alice', 'Bob']
df.query('Name in @valid_names')
String Operations¶
Query with string conditions.
1. Equality¶
python
df.query('Name == "Alice"')
2. In List¶
python
df.query('Name in ["Alice", "Bob"]')
3. Not In¶
python
df.query('Name not in ["Charlie", "David"]')
Detecting NaN¶
Special handling for missing values.
1. NaN Detection¶
```python
NaN != NaN is True¶
df.query('Subscribers != Subscribers') # Finds NaN rows ```
2. Not NaN¶
python
df.query('Subscribers == Subscribers') # Non-NaN rows
3. Alternative¶
```python
More readable alternatives¶
df[df['Subscribers'].isna()] df[df['Subscribers'].notna()] ```
Date Queries¶
Filter by date ranges.
1. Date Range¶
python
df.query('request_at >= "2013-10-01" and request_at <= "2013-10-03"')
2. Year Filter¶
python
df.query('sale_date.dt.year == 2019')
3. Month Filter¶
python
df.query('sale_date.dt.month in [1, 2, 3]')
LeetCode Example: Sales Analysis¶
Filter sales by date.
1. Sample Data¶
python
sales = pd.DataFrame({
'sale_date': pd.to_datetime(['2019-01-15', '2019-04-10', '2018-12-25']),
'product_id': [1, 2, 3]
})
2. Filter Q1 2019¶
python
q1_sales = sales.query('sale_date.dt.year == 2019 and sale_date.dt.month in [1, 2, 3]')
3. Exclude Q1 2019¶
python
not_q1 = sales.query('sale_date.dt.year != 2019 or sale_date.dt.month not in [1, 2, 3]')
Spaces in Column Names¶
Handle column names with spaces.
1. Backticks¶
python
df = pd.DataFrame({'First Name': ['Alice', 'Bob']})
df.query('`First Name` == "Alice"')
2. Rename Columns¶
```python
Better to avoid spaces in column names¶
df.columns = df.columns.str.replace(' ', '_') ```
3. Alternative¶
python
df[df['First Name'] == 'Alice'] # Boolean indexing works
Performance¶
When to use query.
1. Large DataFrames¶
Query can be faster for large DataFrames.
2. Complex Conditions¶
Query is cleaner for multiple conditions.
3. Simple Filters¶
Boolean indexing is fine for simple filters.
Query vs Boolean Indexing¶
Comparison of approaches.
1. Query Advantages¶
- Cleaner syntax for complex conditions
- No repeated DataFrame name
- More readable
2. Boolean Indexing Advantages¶
- More flexible
- Works with any expression
- No string parsing
3. Choose Based On¶
- Readability needs
- Condition complexity
- Personal preference
Exercises¶
Exercise 1.
Create a DataFrame with columns 'product', 'price', and 'quantity'. Use .query() to find products where price > 20 and quantity < 100. Compare the readability with equivalent boolean indexing.
Solution to Exercise 1
Use .query() with a readable string expression.
import pandas as pd
df = pd.DataFrame({
'product': ['Widget', 'Gadget', 'Gizmo', 'Tool'],
'price': [15, 25, 30, 10],
'quantity': [200, 50, 80, 150]
})
result = df.query('price > 20 and quantity < 100')
print(result)
Exercise 2.
Use .query() with the @ syntax to reference a Python variable. Given a variable min_salary = 60000, write a query that filters employees with salary >= @min_salary.
Solution to Exercise 2
Reference external variables with @ inside query strings.
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Carol'],
'salary': [55000, 65000, 72000]
})
min_salary = 60000
result = df.query('salary >= @min_salary')
print(result)
Exercise 3.
Use .query() with string methods by filtering a DataFrame where the 'name' column starts with the letter 'A'. Use the backtick syntax if the column name contains spaces.
Solution to Exercise 3
Use string methods inside .query().
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Anna', 'Charlie'],
'score': [85, 90, 78, 92]
})
result = df.query('name.str.startswith("A")', engine='python')
print(result)