isin Method¶

The isin() method filters rows where column values are in a specified list or set.

Basic Usage¶

Check membership in a list.

1. Simple isin¶

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'city': ['NY', 'LA', 'SF', 'NY']
})

# Filter where city is in list
result = df[df['city'].isin(['NY', 'SF'])]
print(result)

      name city
0    Alice   NY
2  Charlie   SF
3    David   NY

2. Boolean Result¶

mask = df['city'].isin(['NY', 'SF'])
print(mask)

0     True
1    False
2     True
3     True
Name: city, dtype: bool

3. Apply Filter¶

result = df[mask]

NOT isin¶

Filter values NOT in list.

1. Negate with ~¶

# Cities NOT in list
result = df[~df['city'].isin(['NY', 'SF'])]
print(result)

  name city
1  Bob   LA

2. Common Pattern¶

# Find customers who never ordered
customers[~customers['id'].isin(orders['customerId'])]

3. Exclude Values¶

# Remove specific categories
df[~df['category'].isin(['Deprecated', 'Deleted'])]

LeetCode Example: Customers Who Never Order¶

Find customers not in orders.

1. Sample Data¶

customers = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Charlie', 'David']
})

orders = pd.DataFrame({
    'customerId': [1, 3, 1]
})

2. Filter Not In¶

result = customers[~customers['id'].isin(orders['customerId'])]
print(result)

   id   name
1   2    Bob
3   4  David

3. Select Column¶

result = result[['name']].rename(columns={'name': 'Customers'})

isin with Series¶

Use another column or Series as the list.

1. Column as List¶

valid_ids = orders['customerId'].unique()
result = customers[customers['id'].isin(valid_ids)]

2. From Another DataFrame¶

# Products sold in Q1
q1_products = q1_sales['product_id']
products_sold = products[products['id'].isin(q1_products)]

3. Dynamic Lists¶

# Filter based on computed values
top_cities = df.groupby('city')['sales'].sum().nlargest(5).index
top_city_data = df[df['city'].isin(top_cities)]

isin with Dictionary¶

Check multiple columns simultaneously.

1. Dict Syntax¶

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['x', 'y', 'z']
})

# Check if values are in respective lists
mask = df.isin({'A': [1, 3], 'B': ['x', 'z']})
print(mask)

       A      B
0   True   True
1  False  False
2   True   True

2. Row Filter¶

# Rows where ALL columns match
result = df[mask.all(axis=1)]

# Rows where ANY column matches
result = df[mask.any(axis=1)]

3. Use Case¶

# Valid combinations
valid = {
    'department': ['Sales', 'Marketing'],
    'level': ['Senior', 'Manager']
}
filtered = df[df.isin(valid).all(axis=1)]

Numeric isin¶

Filter numeric values.

1. Integer List¶

df = pd.DataFrame({'id': [1, 2, 3, 4, 5]})
result = df[df['id'].isin([1, 3, 5])]

2. Float Values¶

# Be careful with float precision
df[df['value'].isin([1.0, 2.0, 3.0])]

3. Range Alternative¶

# For ranges, use comparison operators
df[(df['id'] >= 1) & (df['id'] <= 5)]

Date isin¶

Filter specific dates.

1. Date List¶

dates = pd.to_datetime(['2024-01-01', '2024-01-15', '2024-02-01'])
result = df[df['date'].isin(dates)]

2. Month Filter¶

# Q1 months
result = df[df['date'].dt.month.isin([1, 2, 3])]

3. Year Filter¶

result = df[df['date'].dt.year.isin([2023, 2024])]

Performance¶

isin optimization.

1. Set for Large Lists¶

# Convert to set for faster lookup
large_list = set(range(10000))
df[df['id'].isin(large_list)]

2. vs Multiple OR¶

# isin is faster than multiple OR conditions
# Good
df[df['city'].isin(['NY', 'LA', 'SF'])]

# Slower
df[(df['city'] == 'NY') | (df['city'] == 'LA') | (df['city'] == 'SF')]

3. Merge Alternative¶

# For very large lookups, merge can be faster
lookup = pd.DataFrame({'city': ['NY', 'LA', 'SF']})
result = df.merge(lookup, on='city')