Skip to content

isin Method

The isin() method filters rows where column values are in a specified list or set.

Mental Model

isin() is the pandas equivalent of SQL's IN clause. It checks each value against a set of allowed values and returns a boolean mask. Instead of chaining (x == 'a') | (x == 'b') | (x == 'c'), write x.isin(['a', 'b', 'c']) -- cleaner, faster, and scales to any number of values.

Basic Usage

Check membership in a list.

1. Simple isin

```python import pandas as pd

df = pd.DataFrame({ 'name': ['Alice', 'Bob', 'Charlie', 'David'], 'city': ['NY', 'LA', 'SF', 'NY'] })

Filter where city is in list

result = df[df['city'].isin(['NY', 'SF'])] print(result) ```

name city 0 Alice NY 2 Charlie SF 3 David NY

2. Boolean Result

python mask = df['city'].isin(['NY', 'SF']) print(mask)

0 True 1 False 2 True 3 True Name: city, dtype: bool

3. Apply Filter

python result = df[mask]

NOT isin

Filter values NOT in list.

1. Negate with ~

```python

Cities NOT in list

result = df[~df['city'].isin(['NY', 'SF'])] print(result) ```

name city 1 Bob LA

2. Common Pattern

```python

Find customers who never ordered

customers[~customers['id'].isin(orders['customerId'])] ```

3. Exclude Values

```python

Remove specific categories

df[~df['category'].isin(['Deprecated', 'Deleted'])] ```

LeetCode Example: Customers Who Never Order

Find customers not in orders.

1. Sample Data

```python customers = pd.DataFrame({ 'id': [1, 2, 3, 4], 'name': ['Alice', 'Bob', 'Charlie', 'David'] })

orders = pd.DataFrame({ 'customerId': [1, 3, 1] }) ```

2. Filter Not In

python result = customers[~customers['id'].isin(orders['customerId'])] print(result)

id name 1 2 Bob 3 4 David

3. Select Column

python result = result[['name']].rename(columns={'name': 'Customers'})

isin with Series

Use another column or Series as the list.

1. Column as List

python valid_ids = orders['customerId'].unique() result = customers[customers['id'].isin(valid_ids)]

2. From Another DataFrame

```python

Products sold in Q1

q1_products = q1_sales['product_id'] products_sold = products[products['id'].isin(q1_products)] ```

3. Dynamic Lists

```python

Filter based on computed values

top_cities = df.groupby('city')['sales'].sum().nlargest(5).index top_city_data = df[df['city'].isin(top_cities)] ```

isin with Dictionary

Check multiple columns simultaneously.

1. Dict Syntax

```python df = pd.DataFrame({ 'A': [1, 2, 3], 'B': ['x', 'y', 'z'] })

Check if values are in respective lists

mask = df.isin({'A': [1, 3], 'B': ['x', 'z']}) print(mask) ```

A B 0 True True 1 False False 2 True True

2. Row Filter

```python

Rows where ALL columns match

result = df[mask.all(axis=1)]

Rows where ANY column matches

result = df[mask.any(axis=1)] ```

3. Use Case

```python

Valid combinations

valid = { 'department': ['Sales', 'Marketing'], 'level': ['Senior', 'Manager'] } filtered = df[df.isin(valid).all(axis=1)] ```

Numeric isin

Filter numeric values.

1. Integer List

python df = pd.DataFrame({'id': [1, 2, 3, 4, 5]}) result = df[df['id'].isin([1, 3, 5])]

2. Float Values

```python

Be careful with float precision

df[df['value'].isin([1.0, 2.0, 3.0])] ```

3. Range Alternative

```python

For ranges, use comparison operators

df[(df['id'] >= 1) & (df['id'] <= 5)] ```

Date isin

Filter specific dates.

1. Date List

python dates = pd.to_datetime(['2024-01-01', '2024-01-15', '2024-02-01']) result = df[df['date'].isin(dates)]

2. Month Filter

```python

Q1 months

result = df[df['date'].dt.month.isin([1, 2, 3])] ```

3. Year Filter

python result = df[df['date'].dt.year.isin([2023, 2024])]

Performance

isin optimization.

1. Set for Large Lists

```python

Convert to set for faster lookup

large_list = set(range(10000)) df[df['id'].isin(large_list)] ```

2. vs Multiple OR

```python

isin is faster than multiple OR conditions

Good

df[df['city'].isin(['NY', 'LA', 'SF'])]

Slower

df[(df['city'] == 'NY') | (df['city'] == 'LA') | (df['city'] == 'SF')] ```

3. Merge Alternative

```python

For very large lookups, merge can be faster

lookup = pd.DataFrame({'city': ['NY', 'LA', 'SF']}) result = df.merge(lookup, on='city') ```


Exercises

Exercise 1. Create a DataFrame of orders with a 'status' column. Use .isin() to filter rows where the status is either 'shipped' or 'delivered', excluding 'pending' and 'cancelled' orders.

Solution to Exercise 1

Pass a list of target values to .isin().

import pandas as pd

df = pd.DataFrame({
    'order_id': [1, 2, 3, 4, 5],
    'status': ['shipped', 'pending', 'delivered', 'cancelled', 'shipped']
})
active = df[df['status'].isin(['shipped', 'delivered'])]
print(active)

Exercise 2. Use the negation of .isin() (with ~) to find all employees whose department is NOT in the list ['HR', 'Finance']. Print the filtered result.

Solution to Exercise 2

Negate the isin mask with ~.

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave'],
    'department': ['IT', 'HR', 'Finance', 'IT']
})
result = df[~df['department'].isin(['HR', 'Finance'])]
print(result)

Exercise 3. Given a DataFrame with 'ticker' and 'sector' columns, use .isin() on two columns simultaneously: filter rows where the ticker is in a watchlist AND the sector is in an approved sectors list.

Solution to Exercise 3

Combine two .isin() conditions with &.

import pandas as pd

df = pd.DataFrame({
    'ticker': ['AAPL', 'JPM', 'MSFT', 'XOM', 'GOOGL'],
    'sector': ['Tech', 'Finance', 'Tech', 'Energy', 'Tech']
})
watchlist = ['AAPL', 'MSFT', 'GOOGL']
approved = ['Tech', 'Energy']
result = df[df['ticker'].isin(watchlist) & df['sector'].isin(approved)]
print(result)