dropna Method

The dropna() method removes rows or columns containing missing values. It is useful when missing data cannot be reliably imputed.

Mental Model

dropna() is the "delete the problem" strategy for missing data. It removes entire rows (or columns) that contain NaN. It is safe when missing values are few and random, but dangerous when data is missing systematically -- dropping those rows could introduce bias. Always check how much data you are losing before committing.
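As a rough illustration of the bias risk, the sketch below uses a made-up survey in which only the highest earners left `age` blank. The `survey` frame and its numbers are hypothetical and chosen only to make the effect visible. Dropping the incomplete rows shifts the average income sharply.

```python
import numpy as np
import pandas as pd

# Hypothetical survey: the two highest earners did not report their age,
# so the missingness is systematic rather than random.
survey = pd.DataFrame({
    "income": [30_000, 35_000, 40_000, 150_000, 200_000],
    "age": [25, 31, 47, np.nan, np.nan],
})

mean_all = survey["income"].mean()                   # every row
mean_after_drop = survey.dropna()["income"].mean()   # complete rows only

print(f"Mean income, all rows:     {mean_all:.0f}")
print(f"Mean income, after dropna: {mean_after_drop:.0f}")
```

Here the mean falls from 91000 to 35000 because the dropped rows were not a random sample of the data.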

Basic Usage

By default, dropna() drops every row that contains at least one missing value. Pass axis=1 to drop columns instead.

1. Drop Rows

```python
import pandas as pd
import numpy as np

url = "https://raw.githubusercontent.com/codebasics/py/master/pandas/5_handling_missing_data_fillna_dropna_interpolate/weather_data.csv"
df = pd.read_csv(url, index_col='day', parse_dates=True)
print(df)

dg = df.dropna()
print(dg)
```

2. Drop Columns

```python
df.dropna(axis=1)  # Drop columns with any NaN
```
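To see the column-wise behavior without downloading the weather file, here is a minimal sketch on a small hand-made frame. The `toy` frame and its column names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame: two columns contain a missing value.
toy = pd.DataFrame({
    "temperature": [32.0, np.nan, 28.0],
    "windspeed": [6.0, 9.0, 7.0],
    "event": ["Rain", None, "Snow"],
})

# axis=1 drops every column that has at least one missing value.
cols_kept = toy.dropna(axis=1)
print(cols_kept.columns.tolist())

# axis="columns" is an equivalent, more readable spelling.
same_result = toy.dropna(axis="columns")
print(same_result.equals(cols_kept))
```

Only `windspeed` survives, since it is the only column with no missing entries.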

3. Return Copy

```python
# dropna returns a new DataFrame
dg = df.dropna()

# Original df is unchanged
```
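The copy semantics can be checked directly. The following sketch, which uses a hypothetical `raw` frame, shows that the original keeps its rows until the name is rebound to the result.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only the first row is complete.
raw = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

cleaned = raw.dropna()          # new DataFrame; raw is not modified
rows_in_raw = len(raw)
rows_in_cleaned = len(cleaned)
print(rows_in_raw, rows_in_cleaned)

# To keep only the cleaned rows under the same name, rebind it.
raw = raw.dropna()
print(len(raw))
```

dropna() also accepts inplace=True, but rebinding the name as above is usually easier to follow.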

LeetCode Example: Student Names

Drop students with missing names.

1. Problem Data

```python
students = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', None, 'Bob', None, 'Charlie'],
    'grade': ['A', 'B', 'B+', 'A-', 'C']
})
```

2. Drop Missing Names

```python
result = students.dropna(subset=['name'])
print(result)
```

3. Result

```
   id     name grade
0   1    Alice     A
2   3      Bob    B+
4   5  Charlie     C
```

LeetCode Example: Employee Data

Drop employees with any missing values.

1. Problem Data

```python
filtered_employees = pd.DataFrame({
    'employee_id': [1, 2, 3, 4, 5],
    'manager_id': [2.0, None, None, 2.0, 3.0],
    'salary': [25000, 35000, 28000, None, 32000]
})
```

2. Drop All NaN Rows

```python
cleaned_employees = filtered_employees.dropna()
print(cleaned_employees)
```

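3. Result

Only rows with complete data remain. The salary values print as floats because the None in that column is stored as NaN, which makes the column float64:

```
   employee_id  manager_id   salary
0            1         2.0  25000.0
4            5         3.0  32000.0
```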
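Because None is stored as NaN, pandas holds manager_id and salary as float64, and they stay float64 after dropna(). If integer columns are wanted once the incomplete rows are gone, one possible follow-up is an explicit cast. This is a sketch, and the `employees` and `restored` names are illustrative.

```python
import pandas as pd

employees = pd.DataFrame({
    "employee_id": [1, 2, 3, 4, 5],
    "manager_id": [2.0, None, None, 2.0, 3.0],
    "salary": [25000, 35000, 28000, None, 32000],
})

cleaned = employees.dropna()
print(cleaned.dtypes)  # manager_id and salary are float64

# With no NaN left, the float columns can be cast back to integers.
restored = cleaned.astype({"manager_id": "int64", "salary": "int64"})
print(restored)
```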

Practical Considerations

Whether to use dropna() or fillna() depends on how much data is missing and why it is missing.

1. Use dropna When

Missing data is random and limited
Filling would introduce bias
Sufficient data remains after dropping

2. Avoid dropna When

Missing data is systematic
Dropping loses too much information
Missing values can be reasonably estimated
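The trade-off can be made concrete with a small comparison. The sketch below uses a hypothetical `readings` frame and fills the gap with the column mean. Mean imputation is only one assumed choice of estimate, not a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings with one missing temperature.
readings = pd.DataFrame({
    "sensor": ["a", "b", "c", "d"],
    "temp": [20.0, np.nan, 22.0, 24.0],
})

# Option 1: drop the incomplete row (loses sensor "b" entirely).
dropped = readings.dropna(subset=["temp"])

# Option 2: keep every row and fill the gap with an estimate.
filled = readings.fillna({"temp": readings["temp"].mean()})

print(dropped)
print(filled)
```

Dropping keeps only observed values but loses a row. Filling keeps the row but inserts a value that was never measured.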

3. Check Impact

```python
n_before = len(df)
n_after = len(df.dropna())
print(f"Before: {n_before} rows")
print(f"After: {n_after} rows")
print(f"Dropped: {n_before - n_after} rows")
```
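Row counts show how much is lost, but not which columns are responsible. A minimal sketch of a per-column check follows, using a hypothetical `sample` frame in place of the weather data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df.
sample = pd.DataFrame({
    "A": [1, np.nan, 3, 4],
    "B": [5, 6, np.nan, 8],
    "C": [9, 10, 11, 12],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Share of rows that a plain dropna() would remove.
rows_lost = len(sample) - len(sample.dropna())
fraction_lost = rows_lost / len(sample)
print(f"dropna() would remove {fraction_lost:.0%} of rows")
```

If one column accounts for most of the missing values, dropping rows with subset= on the columns that matter, or dropping that column, may preserve more data than a blanket dropna().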

Exercises

Exercise 1. Create a DataFrame with NaN values in different positions. Use .dropna() to drop rows with any missing values. Compare the number of rows before and after.

Solution to Exercise 1

Drop rows with any NaN and compare counts.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3, 4],
    'B': [5, 6, np.nan, 8],
    'C': [9, 10, 11, 12]
})
print(f"Before: {len(df)} rows")
result = df.dropna()
print(f"After: {len(result)} rows")
```

Exercise 2. Create a DataFrame where one column is entirely NaN. Use .dropna(axis=1) to drop columns with any missing values. Verify the all-NaN column is removed.

Solution to Exercise 2

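With axis=1, dropna() drops every column that contains at least one NaN. The all-NaN column B is removed, and so is C, which has a single NaN.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [np.nan, np.nan, np.nan],
    'C': [4, np.nan, 6]
})
result = df.dropna(axis=1)
print(result.columns.tolist())  # only 'A' has no missing values
assert 'B' not in result.columns
```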

Exercise 3. Create a DataFrame and use .dropna(subset=['col_name']) to drop rows only where a specific column has missing values, leaving other columns' NaN values intact.

Solution to Exercise 3

Drop rows based on a specific column's missing values.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['Alice', None, 'Carol', 'Dave'],
    'score': [90, 85, np.nan, 88]
})
result = df.dropna(subset=['name'])
print(result)
# Row with None name is dropped, but NaN in score is kept
```