isin Method¶
The isin() method filters rows where column values are in a specified list or set.
Basic Usage¶
Check membership in a list.
1. Simple isin¶
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Charlie', 'David'],
'city': ['NY', 'LA', 'SF', 'NY']
})
# Filter where city is in list
result = df[df['city'].isin(['NY', 'SF'])]
print(result)
name city
0 Alice NY
2 Charlie SF
3 David NY
2. Boolean Result¶
mask = df['city'].isin(['NY', 'SF'])
print(mask)
0 True
1 False
2 True
3 True
Name: city, dtype: bool
3. Apply Filter¶
result = df[mask]
NOT isin¶
Filter values NOT in list.
1. Negate with ~¶
# Cities NOT in list
result = df[~df['city'].isin(['NY', 'SF'])]
print(result)
name city
1 Bob LA
2. Common Pattern¶
# Find customers who never ordered
customers[~customers['id'].isin(orders['customerId'])]
3. Exclude Values¶
# Remove specific categories
df[~df['category'].isin(['Deprecated', 'Deleted'])]
LeetCode Example: Customers Who Never Order¶
Find customers not in orders.
1. Sample Data¶
customers = pd.DataFrame({
'id': [1, 2, 3, 4],
'name': ['Alice', 'Bob', 'Charlie', 'David']
})
orders = pd.DataFrame({
'customerId': [1, 3, 1]
})
2. Filter Not In¶
result = customers[~customers['id'].isin(orders['customerId'])]
print(result)
id name
1 2 Bob
3 4 David
3. Select Column¶
result = result[['name']].rename(columns={'name': 'Customers'})
isin with Series¶
Use another column or Series as the list.
1. Column as List¶
valid_ids = orders['customerId'].unique()
result = customers[customers['id'].isin(valid_ids)]
2. From Another DataFrame¶
# Products sold in Q1
q1_products = q1_sales['product_id']
products_sold = products[products['id'].isin(q1_products)]
3. Dynamic Lists¶
# Filter based on computed values
top_cities = df.groupby('city')['sales'].sum().nlargest(5).index
top_city_data = df[df['city'].isin(top_cities)]
isin with Dictionary¶
Check multiple columns simultaneously.
1. Dict Syntax¶
df = pd.DataFrame({
'A': [1, 2, 3],
'B': ['x', 'y', 'z']
})
# Check if values are in respective lists
mask = df.isin({'A': [1, 3], 'B': ['x', 'z']})
print(mask)
A B
0 True True
1 False False
2 True True
2. Row Filter¶
# Rows where ALL columns match
result = df[mask.all(axis=1)]
# Rows where ANY column matches
result = df[mask.any(axis=1)]
3. Use Case¶
# Valid combinations
valid = {
'department': ['Sales', 'Marketing'],
'level': ['Senior', 'Manager']
}
filtered = df[df.isin(valid).all(axis=1)]
Numeric isin¶
Filter numeric values.
1. Integer List¶
df = pd.DataFrame({'id': [1, 2, 3, 4, 5]})
result = df[df['id'].isin([1, 3, 5])]
2. Float Values¶
# Be careful with float precision
df[df['value'].isin([1.0, 2.0, 3.0])]
3. Range Alternative¶
# For ranges, use comparison operators
df[(df['id'] >= 1) & (df['id'] <= 5)]
Date isin¶
Filter specific dates.
1. Date List¶
dates = pd.to_datetime(['2024-01-01', '2024-01-15', '2024-02-01'])
result = df[df['date'].isin(dates)]
2. Month Filter¶
# Q1 months
result = df[df['date'].dt.month.isin([1, 2, 3])]
3. Year Filter¶
result = df[df['date'].dt.year.isin([2023, 2024])]
Performance¶
isin optimization.
1. Set for Large Lists¶
# Convert to set for faster lookup
large_list = set(range(10000))
df[df['id'].isin(large_list)]
2. vs Multiple OR¶
# isin is faster than multiple OR conditions
# Good
df[df['city'].isin(['NY', 'LA', 'SF'])]
# Slower
df[(df['city'] == 'NY') | (df['city'] == 'LA') | (df['city'] == 'SF')]
3. Merge Alternative¶
# For very large lookups, merge can be faster
lookup = pd.DataFrame({'city': ['NY', 'LA', 'SF']})
result = df.merge(lookup, on='city')