isin Method¶
The isin() method filters rows where column values are in a specified list or set.
Mental Model
isin() is the pandas equivalent of SQL's IN clause. It checks each value against a set of allowed values and returns a boolean mask. Instead of chaining (x == 'a') | (x == 'b') | (x == 'c'), write x.isin(['a', 'b', 'c']) -- cleaner, faster, and scales to any number of values.
Basic Usage¶
Check membership in a list.
1. Simple isin¶
```python import pandas as pd
df = pd.DataFrame({ 'name': ['Alice', 'Bob', 'Charlie', 'David'], 'city': ['NY', 'LA', 'SF', 'NY'] })
Filter where city is in list¶
result = df[df['city'].isin(['NY', 'SF'])] print(result) ```
name city
0 Alice NY
2 Charlie SF
3 David NY
2. Boolean Result¶
python
mask = df['city'].isin(['NY', 'SF'])
print(mask)
0 True
1 False
2 True
3 True
Name: city, dtype: bool
3. Apply Filter¶
python
result = df[mask]
NOT isin¶
Filter values NOT in list.
1. Negate with ~¶
```python
Cities NOT in list¶
result = df[~df['city'].isin(['NY', 'SF'])] print(result) ```
name city
1 Bob LA
2. Common Pattern¶
```python
Find customers who never ordered¶
customers[~customers['id'].isin(orders['customerId'])] ```
3. Exclude Values¶
```python
Remove specific categories¶
df[~df['category'].isin(['Deprecated', 'Deleted'])] ```
LeetCode Example: Customers Who Never Order¶
Find customers not in orders.
1. Sample Data¶
```python customers = pd.DataFrame({ 'id': [1, 2, 3, 4], 'name': ['Alice', 'Bob', 'Charlie', 'David'] })
orders = pd.DataFrame({ 'customerId': [1, 3, 1] }) ```
2. Filter Not In¶
python
result = customers[~customers['id'].isin(orders['customerId'])]
print(result)
id name
1 2 Bob
3 4 David
3. Select Column¶
python
result = result[['name']].rename(columns={'name': 'Customers'})
isin with Series¶
Use another column or Series as the list.
1. Column as List¶
python
valid_ids = orders['customerId'].unique()
result = customers[customers['id'].isin(valid_ids)]
2. From Another DataFrame¶
```python
Products sold in Q1¶
q1_products = q1_sales['product_id'] products_sold = products[products['id'].isin(q1_products)] ```
3. Dynamic Lists¶
```python
Filter based on computed values¶
top_cities = df.groupby('city')['sales'].sum().nlargest(5).index top_city_data = df[df['city'].isin(top_cities)] ```
isin with Dictionary¶
Check multiple columns simultaneously.
1. Dict Syntax¶
```python df = pd.DataFrame({ 'A': [1, 2, 3], 'B': ['x', 'y', 'z'] })
Check if values are in respective lists¶
mask = df.isin({'A': [1, 3], 'B': ['x', 'z']}) print(mask) ```
A B
0 True True
1 False False
2 True True
2. Row Filter¶
```python
Rows where ALL columns match¶
result = df[mask.all(axis=1)]
Rows where ANY column matches¶
result = df[mask.any(axis=1)] ```
3. Use Case¶
```python
Valid combinations¶
valid = { 'department': ['Sales', 'Marketing'], 'level': ['Senior', 'Manager'] } filtered = df[df.isin(valid).all(axis=1)] ```
Numeric isin¶
Filter numeric values.
1. Integer List¶
python
df = pd.DataFrame({'id': [1, 2, 3, 4, 5]})
result = df[df['id'].isin([1, 3, 5])]
2. Float Values¶
```python
Be careful with float precision¶
df[df['value'].isin([1.0, 2.0, 3.0])] ```
3. Range Alternative¶
```python
For ranges, use comparison operators¶
df[(df['id'] >= 1) & (df['id'] <= 5)] ```
Date isin¶
Filter specific dates.
1. Date List¶
python
dates = pd.to_datetime(['2024-01-01', '2024-01-15', '2024-02-01'])
result = df[df['date'].isin(dates)]
2. Month Filter¶
```python
Q1 months¶
result = df[df['date'].dt.month.isin([1, 2, 3])] ```
3. Year Filter¶
python
result = df[df['date'].dt.year.isin([2023, 2024])]
Performance¶
isin optimization.
1. Set for Large Lists¶
```python
Convert to set for faster lookup¶
large_list = set(range(10000)) df[df['id'].isin(large_list)] ```
2. vs Multiple OR¶
```python
isin is faster than multiple OR conditions¶
Good¶
df[df['city'].isin(['NY', 'LA', 'SF'])]
Slower¶
df[(df['city'] == 'NY') | (df['city'] == 'LA') | (df['city'] == 'SF')] ```
3. Merge Alternative¶
```python
For very large lookups, merge can be faster¶
lookup = pd.DataFrame({'city': ['NY', 'LA', 'SF']}) result = df.merge(lookup, on='city') ```
Exercises¶
Exercise 1.
Create a DataFrame of orders with a 'status' column. Use .isin() to filter rows where the status is either 'shipped' or 'delivered', excluding 'pending' and 'cancelled' orders.
Solution to Exercise 1
Pass a list of target values to .isin().
import pandas as pd
df = pd.DataFrame({
'order_id': [1, 2, 3, 4, 5],
'status': ['shipped', 'pending', 'delivered', 'cancelled', 'shipped']
})
active = df[df['status'].isin(['shipped', 'delivered'])]
print(active)
Exercise 2.
Use the negation of .isin() (with ~) to find all employees whose department is NOT in the list ['HR', 'Finance']. Print the filtered result.
Solution to Exercise 2
Negate the isin mask with ~.
import pandas as pd
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Carol', 'Dave'],
'department': ['IT', 'HR', 'Finance', 'IT']
})
result = df[~df['department'].isin(['HR', 'Finance'])]
print(result)
Exercise 3.
Given a DataFrame with 'ticker' and 'sector' columns, use .isin() on two columns simultaneously: filter rows where the ticker is in a watchlist AND the sector is in an approved sectors list.
Solution to Exercise 3
Combine two .isin() conditions with &.
import pandas as pd
df = pd.DataFrame({
'ticker': ['AAPL', 'JPM', 'MSFT', 'XOM', 'GOOGL'],
'sector': ['Tech', 'Finance', 'Tech', 'Energy', 'Tech']
})
watchlist = ['AAPL', 'MSFT', 'GOOGL']
approved = ['Tech', 'Energy']
result = df[df['ticker'].isin(watchlist) & df['sector'].isin(approved)]
print(result)