drop_duplicates Method¶
The drop_duplicates() method removes duplicate rows from a DataFrame and, by default, returns the result as a new DataFrame.
Basic Usage¶
Called with no arguments, it compares rows across all columns.
1. Remove All Duplicates¶
import pandas as pd
df = pd.DataFrame({
    'A': [1, 1, 2, 2, 3],
    'B': ['a', 'a', 'b', 'b', 'c']
})
result = df.drop_duplicates()
print(result)
   A  B
0  1  a
2  2  b
4  3  c
2. Keep First (Default)¶
By default (keep='first'), the first occurrence of each duplicated row is kept and every later occurrence is removed.
3. Returns New DataFrame¶
# Original unchanged
new_df = df.drop_duplicates()
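A minimal sketch of the two points above, reusing the same sample frame: the call leaves the original untouched, and the surviving rows are the first occurrences, so their original index labels are preserved.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 2, 2, 3],
    'B': ['a', 'a', 'b', 'b', 'c']
})

new_df = df.drop_duplicates()

# The original still has all five rows; the result has three.
print(len(df), len(new_df))

# First occurrences survive, so the labels 0, 2 and 4 remain.
print(new_df.index.tolist())
```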
subset Parameter¶
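Restrict the duplicate check to specific columns. Rows that match on those columns count as duplicates even if their other columns differ.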
1. Single Column¶
df = pd.DataFrame({
    'id': [1, 2, 1, 3],
    'name': ['Alice', 'Bob', 'Alice', 'Charlie']
})
result = df.drop_duplicates(subset='id')
print(result)
   id     name
0   1    Alice
1   2      Bob
3   3  Charlie
2. Multiple Columns¶
result = df.drop_duplicates(subset=['id', 'name'])
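With the sample data above, both subset choices happen to give the same result. The sketch below uses a made-up frame (the `records` name and its values are illustrative, not from the original) where the two differ: a row is dropped only when it matches an earlier row on every column listed in subset.

```python
import pandas as pd

# Illustrative data: id 1 appears three times, with two different names.
records = pd.DataFrame({
    'id': [1, 1, 1, 2],
    'name': ['Alice', 'Alice', 'Alicia', 'Bob']
})

# Comparing on 'id' alone keeps one row per id.
by_id = records.drop_duplicates(subset='id')

# Comparing on the (id, name) pair keeps the 'Alicia' row as well.
by_pair = records.drop_duplicates(subset=['id', 'name'])

print(by_id.index.tolist())
print(by_pair.index.tolist())
```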
3. All Columns (Default)¶
# subset=None checks all columns
result = df.drop_duplicates() # Uses all columns
keep Parameter¶
Control which duplicate to keep.
1. Keep First (Default)¶
df.drop_duplicates(keep='first')
# Keeps first occurrence
2. Keep Last¶
df.drop_duplicates(keep='last')
# Keeps last occurrence
3. Keep None (Remove All)¶
df.drop_duplicates(keep=False)
# Drops every row that has a duplicate; only rows that occur exactly once remain
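To compare the three settings side by side, here is a small sketch that applies each one to the frame from the Basic Usage section and records which index labels survive.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 2, 2, 3],
    'B': ['a', 'a', 'b', 'b', 'c']
})

# Index labels that survive under each keep setting.
kept_first = df.drop_duplicates(keep='first').index.tolist()
kept_last = df.drop_duplicates(keep='last').index.tolist()
kept_none = df.drop_duplicates(keep=False).index.tolist()

print(kept_first)  # first row of each duplicate group, plus the unique row
print(kept_last)   # last row of each duplicate group, plus the unique row
print(kept_none)   # only the row that was never duplicated
```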
LeetCode Example: Delete Duplicate Emails¶
Keep first occurrence by email.
1. Sample Data¶
person = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['a@example.com', 'b@example.com', 'a@example.com']
})
2. Remove Duplicates¶
person.drop_duplicates(subset='email', inplace=True)
print(person)
   id          email
0   1  a@example.com
1   2  b@example.com
3. Sorted First¶
# Sort to control which row is kept
person = person.sort_values('id')
person.drop_duplicates(subset='email', keep='first', inplace=True)
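The sample data is already ordered by id, so sorting changes nothing there. As a sketch of why the sort matters, the frame below (with rows deliberately out of order, an assumption made for illustration) shows that the row kept for a repeated email depends on row order.

```python
import pandas as pd

# Same shape as the sample data, but the rows arrive out of id order.
person = pd.DataFrame({
    'id': [3, 1, 2],
    'email': ['a@example.com', 'a@example.com', 'b@example.com']
})

# Without sorting, the first row encountered wins (id 3 for a@example.com).
unsorted_ids = person.drop_duplicates(subset='email')['id'].tolist()

# Sorting by id first makes the smallest id win for each email.
sorted_ids = (
    person.sort_values('id')
          .drop_duplicates(subset='email', keep='first')['id']
          .tolist()
)

print(unsorted_ids)
print(sorted_ids)
```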
LeetCode Example: Second Highest Salary¶
Get unique sorted values.
1. Unique Salaries¶
employee = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'salary': [100, 200, 200, 300]
})
unique_salaries = employee['salary'].drop_duplicates()
2. Sort Descending¶
sorted_salaries = unique_salaries.sort_values(ascending=False)
print(sorted_salaries)
3    300
1    200
0    100
3. Get Second Highest¶
if len(sorted_salaries) >= 2:
    second_highest = sorted_salaries.iloc[1]
else:
    second_highest = None
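The three steps can be combined into one helper. This is only a sketch: the function name `second_highest_salary` is hypothetical, and it assumes the input has a `salary` column as in the sample data.

```python
import pandas as pd

def second_highest_salary(employee):
    """Return the second-highest distinct salary, or None if there is none."""
    # Series.drop_duplicates removes repeated salaries before ranking.
    distinct = employee['salary'].drop_duplicates().sort_values(ascending=False)
    if len(distinct) >= 2:
        return distinct.iloc[1]
    return None

employee = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'salary': [100, 200, 200, 300]
})
second = second_highest_salary(employee)

# With a single distinct salary there is no second-highest value.
flat = pd.DataFrame({'id': [1, 2], 'salary': [100, 100]})
missing = second_highest_salary(flat)

print(second, missing)
```

Dropping duplicates before sorting is what keeps the repeated 200 from being reported as both the highest-but-one and the third value.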
LeetCode Example: Consecutive Numbers¶
Filter for values that appear at least three times in a row, then drop duplicates so each value is reported once.
1. Find Consecutive¶
logs = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'num': [1, 1, 1, 2, 2, 2]
})
2. Filter and Drop¶
# Keep rows whose num equals the num of both preceding rows
consecutive = logs[
    (logs['num'] == logs['num'].shift(1)) &
    (logs['num'] == logs['num'].shift(2))
]
# Drop duplicate numbers
result = consecutive.drop_duplicates('num')
3. Unique Values Only¶
unique_nums = result[['num']].rename(columns={'num': 'ConsecutiveNums'})
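Put together, the steps above can be run end to end as follows. The sketch assumes the rows are already ordered by id with no gaps, since shift() compares neighbouring rows by position rather than by id value.

```python
import pandas as pd

logs = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'num': [1, 1, 1, 2, 2, 2]
})

# A row qualifies when it matches the two rows directly before it.
consecutive = logs[
    (logs['num'] == logs['num'].shift(1)) &
    (logs['num'] == logs['num'].shift(2))
]

# A longer run would qualify several times, so keep each number once.
result = consecutive.drop_duplicates(subset='num')
unique_nums = result[['num']].rename(columns={'num': 'ConsecutiveNums'})

print(unique_nums['ConsecutiveNums'].tolist())
```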
LeetCode Example: Investments in 2016¶
Use keep=False to drop every row whose (lat, lon) pair is shared with another row.
1. Sample Data¶
insurance = pd.DataFrame({
    'pid': [1, 2, 3, 4],
    'lat': [10.0, 10.0, 20.0, 20.0],
    'lon': [5.0, 5.0, 15.0, 25.0],
    'tiv_2016': [100, 200, 300, 400]
})
2. Remove All Duplicates¶
# Keep only rows whose lat/lon combination occurs exactly once
unique_locations = insurance.drop_duplicates(
    subset=['lat', 'lon'],
    keep=False
)
3. Result¶
print(unique_locations)
   pid   lat   lon  tiv_2016
2    3  20.0  15.0       300
3    4  20.0  25.0       400
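The same selection can also be written as a boolean mask with DataFrame.duplicated(), which is handy when the mask needs to be combined with other conditions. A sketch of that alternative, using the sample data above:

```python
import pandas as pd

insurance = pd.DataFrame({
    'pid': [1, 2, 3, 4],
    'lat': [10.0, 10.0, 20.0, 20.0],
    'lon': [5.0, 5.0, 15.0, 25.0],
    'tiv_2016': [100, 200, 300, 400]
})

# True for every row whose (lat, lon) pair appears more than once.
shared_location = insurance.duplicated(subset=['lat', 'lon'], keep=False)

# Inverting the mask selects the rows with a location of their own.
via_mask = insurance[~shared_location]
via_drop = insurance.drop_duplicates(subset=['lat', 'lon'], keep=False)

print(via_mask['pid'].tolist())
```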
inplace Parameter¶
Modify DataFrame directly.
1. Without inplace¶
result = df.drop_duplicates()
# df unchanged
2. With inplace¶
df.drop_duplicates(inplace=True)
# df modified directly
3. Reassignment Preferred¶
df = df.drop_duplicates()
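One practical reason to prefer reassignment: with inplace=True the call returns None, so assigning or chaining its result is a common mistake. A short sketch of that behaviour (the `returned` variable is only there to show the return value):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 2, 2, 3],
    'B': ['a', 'a', 'b', 'b', 'c']
})

# The frame is modified in place and nothing is returned.
returned = df.drop_duplicates(inplace=True)

print(returned)
print(len(df))
```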
ignore_index Parameter¶
Reset index after dropping.
1. Keep Original Index¶
result = df.drop_duplicates()
# Keeps original index values
2. Reset Index¶
result = df.drop_duplicates(ignore_index=True)
# Index is 0, 1, 2, ...
3. Equivalent To¶
result = df.drop_duplicates().reset_index(drop=True)
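As a quick check of that equivalence, the sketch below builds the result both ways on the sample frame from Basic Usage and compares them.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 2, 2, 3],
    'B': ['a', 'a', 'b', 'b', 'c']
})

with_flag = df.drop_duplicates(ignore_index=True)
with_reset = df.drop_duplicates().reset_index(drop=True)

# Both results are relabelled 0, 1, 2 and hold the same rows.
print(with_flag.index.tolist())
print(with_flag.equals(with_reset))
```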