Skip to content

dropna Method

The dropna() method removes rows or columns containing missing values. It is useful when missing data cannot be reliably imputed.

Mental Model

dropna() is the "delete the problem" strategy for missing data. It removes entire rows (or columns) that contain NaN. It is safe when missing values are few and random, but dangerous when data is missing systematically -- dropping those rows could introduce bias. Always check how much data you are losing before committing.

Basic Usage

Drop rows with any missing values.

1. Drop Rows

```python import pandas as pd import numpy as np

url = "https://raw.githubusercontent.com/codebasics/py/master/pandas/5_handling_missing_data_fillna_dropna_interpolate/weather_data.csv" df = pd.read_csv(url, index_col='day', parse_dates=True) print(df)

dg = df.dropna() print(dg) ```

2. Drop Columns

python df.dropna(axis=1) # Drop columns with any NaN

3. Return Copy

```python

dropna returns a new DataFrame

dg = df.dropna()

Original df is unchanged

```

LeetCode Example: Student Names

Drop students with missing names.

1. Problem Data

python students = pd.DataFrame({ 'id': [1, 2, 3, 4, 5], 'name': ['Alice', None, 'Bob', None, 'Charlie'], 'grade': ['A', 'B', 'B+', 'A-', 'C'] })

2. Drop Missing Names

python result = students.dropna(subset=['name']) print(result)

3. Result

id name grade 0 1 Alice A 2 3 Bob B+ 4 5 Charlie C

LeetCode Example: Employee Data

Drop employees with any missing values.

1. Problem Data

python filtered_employees = pd.DataFrame({ 'employee_id': [1, 2, 3, 4, 5], 'manager_id': [2.0, None, None, 2.0, 3.0], 'salary': [25000, 35000, 28000, None, 32000] })

2. Drop All NaN Rows

python cleaned_employees = filtered_employees.dropna() print(cleaned_employees)

3. Result

Only rows with complete data remain:

employee_id manager_id salary 0 1 2.0 25000 4 5 3.0 32000

Practical Considerations

When to use dropna vs fillna.

1. Use dropna When

  • Missing data is random and limited
  • Filling would introduce bias
  • Sufficient data remains after dropping

2. Avoid dropna When

  • Missing data is systematic
  • Dropping loses too much information
  • Missing values can be reasonably estimated

3. Check Impact

python print(f"Before: {len(df)} rows") print(f"After: {len(df.dropna())} rows") print(f"Dropped: {len(df) - len(df.dropna())} rows")


Exercises

Exercise 1. Create a DataFrame with NaN values in different positions. Use .dropna() to drop rows with any missing values. Compare the number of rows before and after.

Solution to Exercise 1

Drop rows with any NaN and compare counts.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3, 4],
    'B': [5, 6, np.nan, 8],
    'C': [9, 10, 11, 12]
})
print(f"Before: {len(df)} rows")
result = df.dropna()
print(f"After: {len(result)} rows")

Exercise 2. Create a DataFrame where one column is entirely NaN. Use .dropna(axis=1) to drop columns with any missing values. Verify the all-NaN column is removed.

Solution to Exercise 2

Drop all-NaN columns with axis=1.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [np.nan, np.nan, np.nan],
    'C': [4, np.nan, 6]
})
result = df.dropna(axis=1)
print(result.columns.tolist())
assert 'B' not in result.columns

Exercise 3. Create a DataFrame and use .dropna(subset=['col_name']) to drop rows only where a specific column has missing values, leaving other columns' NaN values intact.

Solution to Exercise 3

Drop rows based on a specific column's missing values.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['Alice', None, 'Carol', 'Dave'],
    'score': [90, 85, np.nan, 88]
})
result = df.dropna(subset=['name'])
print(result)
# Row with None name is dropped, but NaN in score is kept