drop Method

The drop() method returns a new DataFrame with the specified rows or columns removed. Rows and columns are identified by label, not by position.

Drop Columns

Remove columns by name.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

result = df.drop('B', axis=1)
print(result)

result = df.drop(['B', 'C'], axis=1)

result = df.drop(columns=['B', 'C'])
# More explicit than axis=1
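
The three column-dropping spellings above can be checked against each other. This is a small sketch that reuses the same example frame; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Three spellings of the same operation: drop columns 'B' and 'C'
by_axis_number = df.drop(['B', 'C'], axis=1)
by_axis_name = df.drop(['B', 'C'], axis='columns')
by_keyword = df.drop(columns=['B', 'C'])

# Each call returns a new DataFrame holding only column 'A'
print(by_keyword.columns.tolist())  # ['A']
print(by_axis_number.equals(by_keyword) and by_axis_name.equals(by_keyword))  # True

# The original DataFrame still has all three columns
print(df.columns.tolist())  # ['A', 'B', 'C']
```

The columns= keyword tends to read best because the call states what is being dropped without the reader having to remember which axis number means columns.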

Remove rows by index label.

result = df.drop(0)  # axis=0 is default
print(result)

   A  B  C
1  2  5  8
2  3  6  9

result = df.drop([0, 2])

result = df.drop(index=[0, 2])
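
With the default 0..n-1 index, labels and positions happen to coincide, which can hide the fact that drop() works on labels. The sketch below uses a hypothetical frame with string labels to make the distinction visible.

```python
import pandas as pd

# Hypothetical frame with string labels instead of the default integer index
scores = pd.DataFrame(
    {'score': [90, 85, 70]},
    index=['alice', 'bob', 'carol']
)

# drop() matches index labels
without_bob = scores.drop(index='bob')
print(without_bob.index.tolist())  # ['alice', 'carol']

# To drop by position, look up the label at that position first
without_first = scores.drop(index=scores.index[0])
print(without_first.index.tolist())  # ['bob', 'carol']
```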

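Modify the DataFrame in place with inplace=True. By default, drop() returns a new DataFrame and leaves the original unchanged.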

result = df.drop(columns=['B'])
# df is unchanged; result has column 'B' removed

df.drop(columns=['B'], inplace=True)
# df modified directly, returns None

# Alternative to inplace=True (preferred): reassign the result
df = df.drop(columns=['B'])
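
The difference between the two styles shows up in the return value. A minimal sketch on a throwaway two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Default behaviour: a new DataFrame is returned and df keeps column 'B'
result = df.drop(columns=['B'])
print(result.columns.tolist())  # ['A']
print(df.columns.tolist())      # ['A', 'B']

# inplace=True mutates df and returns None
returned = df.drop(columns=['B'], inplace=True)
print(returned)                 # None
print(df.columns.tolist())      # ['A']
```

Because the in-place call returns None, it cannot be used inside a method chain, which is one practical reason to prefer reassignment.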

Drop unnecessary columns.

tree = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'p_id': [None, 1, 1, 2, 2],
    'type': ['Root', 'Inner', 'Inner', 'Leaf', 'Leaf']
})

result = tree.drop(columns='p_id')
print(result)

   id   type
0   1   Root
1   2  Inner
2   3  Inner
3   4   Leaf
4   5   Leaf

result = tree.drop('p_id', axis=1)

Drop the id column from rides.

rides = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'user_id': [1, 1, 2, 3],
    'distance': [10, 15, 20, 25]
})

result = rides.drop('id', axis=1)

print(result)

   user_id  distance
0        1        10
1        1        15
2        2        20
3        3        25

Remove rows based on conditions.

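# For missing values, use dropna instead
df = df.dropna()

# Often simpler: filter to keep the rows you want
# ('status' and 'value' are example column names)
df = df[df['status'] != 'Deleted']

# Or collect the index labels of the matching rows, then drop them
to_drop = df[df['value'] < 0].index
df = df.drop(to_drop)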
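
As a self-contained illustration of the filter and drop-by-index approaches, the sketch below builds a hypothetical orders frame (the column names and values are made up) and confirms both routes give the same rows.

```python
import pandas as pd

# Hypothetical data; 'status' and 'value' are illustrative column names
orders = pd.DataFrame({
    'status': ['Active', 'Deleted', 'Active', 'Active'],
    'value': [10, 20, -5, 30]
})

# Option 1: keep the rows that do not match the condition
kept_by_filter = orders[orders['value'] >= 0]

# Option 2: collect the labels of the matching rows, then drop them
to_drop = orders[orders['value'] < 0].index
kept_by_drop = orders.drop(index=to_drop)

print(kept_by_drop.index.tolist())          # [0, 1, 3]
print(kept_by_filter.equals(kept_by_drop))  # True
```

The two options agree here because the index labels are unique and value has no missing entries. With duplicate labels, drop() removes every row sharing a dropped label, and with NaN in value the row fails both comparisons, so the filter removes it while the drop keeps it.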

Handle missing labels.

# Raises KeyError if column doesn't exist
df.drop(columns=['NonExistent'])  # Error!

df.drop(columns=['NonExistent'], errors='ignore')
# No error; returns a copy of df with nothing removed

# Drop if exists
columns_to_drop = ['B', 'NonExistent']
df.drop(columns=columns_to_drop, errors='ignore')
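
Both behaviours can be seen side by side in a short sketch. The frame is a throwaway example, and the raised flag exists only to make the outcome printable.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Default errors='raise': a missing label raises KeyError
try:
    df.drop(columns=['NonExistent'])
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# errors='ignore': existing labels are dropped, missing ones are skipped
result = df.drop(columns=['B', 'NonExistent'], errors='ignore')
print(result.columns.tolist())  # ['A']
```

Note that errors='ignore' also hides typos in column names, so it is probably best reserved for cases where a column is genuinely optional.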

Use drop_duplicates for duplicate removal.

# drop: remove by label
df.drop(index=[0, 1])

# drop_duplicates: remove duplicate rows
df.drop_duplicates()

# drop: known indices/columns
# drop_duplicates: based on values

Refer to drop_duplicates.md for duplicate removal.
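
To make the contrast concrete, here is a small sketch on a made-up frame in which the first two rows hold identical values.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['x', 'x', 'y'],
    'value': [1, 1, 2]
})

# drop: remove the row labelled 2, whatever it contains
by_label = df.drop(index=[2])
print(by_label.index.tolist())  # [0, 1]

# drop_duplicates: remove rows whose values repeat an earlier row
by_value = df.drop_duplicates()
print(by_value.index.tolist())  # [0, 2]
```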

Use drop in method-chaining pipelines.

result = (
    df
    .drop(columns=['temp_col'])
    .drop(index=[0])
    .reset_index(drop=True)
)

result = (
    df
    .assign(calculated=lambda d: d['a'] + d['b'])
    .drop(columns=['a', 'b'])
    .rename(columns={'calculated': 'sum'})
)

result = (
    raw_df
    .dropna()
    .drop(columns=['unnecessary_col'])
    .reset_index(drop=True)
)
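
The chains above assume frames that are not defined on this page. The following sketch is a runnable variant on hypothetical data; raw_df's contents and the total column name are illustrative only.

```python
import pandas as pd

# Hypothetical raw data; column names and values are illustrative
raw_df = pd.DataFrame({
    'a': [1, 2, None, 4],
    'b': [10, 20, 30, 40],
    'unnecessary_col': ['w', 'x', 'y', 'z']
})

result = (
    raw_df
    .dropna()                                      # remove the row with a missing 'a'
    .assign(total=lambda d: d['a'] + d['b'])       # compute on the intermediate frame
    .drop(columns=['a', 'b', 'unnecessary_col'])   # keep only the derived column
    .reset_index(drop=True)                        # renumber rows 0..n-1
)
print(result['total'].tolist())  # [11.0, 22.0, 44.0]
```

The lambda in assign() receives the frame as it exists at that point in the chain, so the calculation does not depend on the name of the starting variable.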