dropna Keywords

The dropna() method accepts several keyword arguments to control which rows or columns are dropped.

Mental Model

The keywords fine-tune which rows survive. how='any' drops a row if even one cell is NaN; how='all' drops only if every cell is NaN. subset limits the NaN check to specific columns. thresh sets a minimum count of non-NaN values required to keep the row. Together they let you express nuanced "how much missing is too much" policies.
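The four policies can be compared side by side on one small frame. This is a minimal sketch on made-up data (the column names and values are assumptions, chosen so that each keyword keeps a different set of rows):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is complete, row 1 is entirely NaN,
# row 2 has only B, row 3 has A and C but not B
df = pd.DataFrame({
    'A': [1.0, np.nan, np.nan, 4.0],
    'B': [5.0, np.nan, 7.0, np.nan],
    'C': [9.0, np.nan, np.nan, 10.0],
})

any_rows = df.dropna(how='any')        # keeps only fully populated rows
all_rows = df.dropna(how='all')        # drops only the all-NaN row
subset_rows = df.dropna(subset=['B'])  # checks column B only
thresh_rows = df.dropna(thresh=2)      # needs at least 2 non-NaN cells

for name, frame in [('any', any_rows), ('all', all_rows),
                    ('subset', subset_rows), ('thresh', thresh_rows)]:
    print(name, list(frame.index))
```

On this data the surviving row labels are `[0]`, `[0, 2, 3]`, `[0, 2]`, and `[0, 3]` respectively.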

how Keyword

Specify when to drop a row or column.

1. how='any' (Default)

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [4, np.nan, np.nan],
    'C': [7, 8, 9]
})

# Drops a row if ANY value is NaN
df.dropna(how='any')
```

2. how='all'

```python
url = "https://raw.githubusercontent.com/codebasics/py/master/pandas/5_handling_missing_data_fillna_dropna_interpolate/weather_data.csv"
df = pd.read_csv(url, index_col='day', parse_dates=True)

dg = df.dropna(how='all')
print(dg)
```

Only drops rows where ALL values are NaN.

3. Comparison

```python
# how='any': Drop if at least one NaN
# how='all': Drop only if entire row is NaN
```

subset Keyword

Specify columns to consider for NaN detection.

1. Single Column

```python
students = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', None, 'Bob'],
    'grade': ['A', 'B', None]
})

# Only checks the 'name' column
students.dropna(subset=['name'])
```

2. Multiple Columns

```python
# Drops a row if NaN appears in name OR grade
students.dropna(subset=['name', 'grade'])
```

3. Selective Cleaning

```python
# Keep rows even if other columns have NaN
# Only require specific columns to be non-null
df.dropna(subset=['critical_column'])
```
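A self-contained version of selective cleaning, as a sketch on hypothetical order data (`order_id`, `coupon`, and `notes` are invented names; `order_id` stands in for the critical column). It also shows that a label missing from the frame raises `KeyError` instead of being silently ignored:

```python
import numpy as np
import pandas as pd

# Hypothetical data: only 'order_id' is treated as required
orders = pd.DataFrame({
    'order_id': [101, np.nan, 103],
    'coupon': [np.nan, 'SAVE10', np.nan],
    'notes': [np.nan, np.nan, 'gift'],
})

# Rows 0 and 2 survive despite NaN in the optional columns
kept = orders.dropna(subset=['order_id'])
print(kept)

# A subset label that does not exist in the frame raises KeyError
try:
    orders.dropna(subset=['missing_column'])
    subset_error = False
except KeyError as exc:
    subset_error = True
    print('KeyError:', exc)
```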

thresh Keyword

Require a minimum number of non-NaN values to keep a row or column.

1. Basic Usage

```python
url = "https://raw.githubusercontent.com/codebasics/py/master/pandas/5_handling_missing_data_fillna_dropna_interpolate/weather_data.csv"
df = pd.read_csv(url, index_col='day', parse_dates=True)

dg = df.dropna(thresh=2)
print(dg)
```

Keeps rows with at least 2 non-NaN values.

2. Calculate Threshold

```python
import math

# Keep rows with at least 50% non-null values.
# math.ceil rounds up, so an odd column count does not relax the requirement.
threshold = math.ceil(len(df.columns) * 0.5)
df.dropna(thresh=threshold)
```
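Rounding decides what "at least 50%" means when the column count is odd. A minimal sketch on hypothetical three-column data, comparing `int()` (rounds down) with `math.ceil` (rounds up):

```python
import math

import numpy as np
import pandas as pd

# Hypothetical frame with 3 columns: rows have 3, 1, and 0 non-null values
df = pd.DataFrame({
    'A': [1, np.nan, np.nan],
    'B': [2, 2, np.nan],
    'C': [3, np.nan, np.nan],
})

min_fraction = 0.5
floor_thresh = int(len(df.columns) * min_fraction)       # int(1.5) == 1
ceil_thresh = math.ceil(len(df.columns) * min_fraction)  # ceil(1.5) == 2

loose = df.dropna(thresh=floor_thresh)   # a row with 1 of 3 values passes
strict = df.dropna(thresh=ceil_thresh)   # a row needs 2 of 3 values

print(list(loose.index), list(strict.index))
```

With the rounded-down threshold, row 1 is kept even though only a third of its cells are filled; rounding up enforces the stated 50%.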

3. Cannot Combine with how

```python
# thresh cannot be used together with the how parameter
df.dropna(how='any', thresh=2)  # Raises TypeError in recent pandas versions
```
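The restriction can be observed directly. This sketch assumes a recent pandas release, where passing both keywords raises `TypeError`; older releases may accept the call, so the `try` block handles both outcomes:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has 2 non-null values, row 1 has 1
df = pd.DataFrame({'A': [1, np.nan], 'B': [np.nan, np.nan], 'C': [3, 4]})

try:
    df.dropna(how='any', thresh=2)
    combined_ok = True    # some older pandas releases may accept the call
except TypeError as exc:
    combined_ok = False
    print('TypeError:', exc)

# Express the policy with a single keyword instead
result = df.dropna(thresh=2)
print(result)
```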

axis Keyword

Drop rows or columns.

1. axis=0 (Default)

```python
df.dropna(axis=0)  # Drop rows
df.dropna()        # Same as axis=0
```

2. axis=1

```python
df.dropna(axis=1)  # Drop columns with NaN
```

3. Column Cleaning

```python
import math

# Remove columns with more than 50% missing values,
# i.e. keep columns where at least half of the rows are non-null
threshold = math.ceil(len(df) * 0.5)
df.dropna(axis=1, thresh=threshold)
```
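To see column cleaning on concrete data, here is a small sketch with invented column names. With `axis=1`, `thresh` counts non-null values per column, so the threshold is derived from the number of rows:

```python
import math

import numpy as np
import pandas as pd

# Hypothetical frame with 4 rows; columns hold 3, 2, and 1 non-null values
df = pd.DataFrame({
    'mostly_full': [1, 2, 3, np.nan],
    'half_full': [1, np.nan, 3, np.nan],
    'mostly_empty': [np.nan, np.nan, np.nan, 4],
})

min_non_null = math.ceil(len(df) * 0.5)  # 4 rows -> at least 2 values
cleaned = df.dropna(axis=1, thresh=min_non_null)

print(list(cleaned.columns))
```

Only `mostly_empty` falls below the threshold and is removed.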

Combined Keywords

Use multiple keywords for precise control.

1. Subset with Threshold

```python
# Keep rows with at least 2 non-null values
# in the specified columns
df.dropna(subset=['col1', 'col2', 'col3'], thresh=2)
```
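When `subset` and `thresh` are combined, only the subset columns count toward the threshold. A minimal sketch on hypothetical data (the `other` column is an invented extra that is always filled):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'other' is never NaN but is outside the subset
df = pd.DataFrame({
    'col1': [1, np.nan, np.nan],
    'col2': [2, 2, np.nan],
    'col3': [np.nan, 3, 3],
    'other': ['x', 'y', 'z'],
})

# Row 2 has only one non-null value among col1..col3, so it is dropped
with_subset = df.dropna(subset=['col1', 'col2', 'col3'], thresh=2)

# Without subset, 'other' also counts and row 2 reaches the threshold
without_subset = df.dropna(thresh=2)

print(list(with_subset.index), list(without_subset.index))
```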

2. Axis with how

```python
# Drop columns where all values are NaN
df.dropna(axis=1, how='all')
```

3. Practical Pipeline

```python
df_clean = (
    df
    .dropna(how='all')           # Remove empty rows
    .dropna(axis=1, how='all')   # Remove empty columns
    .dropna(subset=['key_col'])  # Require key column
)
```
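The pipeline can be traced step by step on a small frame. This is a sketch on made-up data; `value` and `unused` are hypothetical column names added alongside `key_col`:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: row 3 is empty, 'unused' is an empty column,
# and row 1 lacks the key
raw = pd.DataFrame({
    'key_col': ['a', np.nan, 'c', np.nan],
    'value': [1.0, 2.0, np.nan, np.nan],
    'unused': [np.nan, np.nan, np.nan, np.nan],
})

df_clean = (
    raw
    .dropna(how='all')           # drops row 3
    .dropna(axis=1, how='all')   # drops 'unused'
    .dropna(subset=['key_col'])  # drops row 1
)

print(df_clean)
```

Rows 0 and 2 remain with the columns `key_col` and `value`; row 2 keeps its NaN in `value` because only the key column is required.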

inplace Keyword

Modify the DataFrame in place instead of returning a new one.

1. Without inplace

```python
dg = df.dropna()  # Returns a new DataFrame; df is unchanged
```

2. With inplace

```python
df.dropna(inplace=True)  # Modifies df directly
```

3. Prefer Reassignment

```python
df = df.dropna()  # More explicit than inplace
```
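The difference between the two styles shows up when calls are chained. A minimal sketch on hypothetical data; the remark about the return value of `inplace=True` reflects pandas releases up to 2.x and is an assumption for other versions:

```python
import numpy as np
import pandas as pd

# Hypothetical data: only row 0 is complete
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, np.nan]})

# inplace=True mutates the object it is called on
working = df.copy()
returned = working.dropna(inplace=True)
# In pandas releases up to 2.x, `returned` is None, so writing
# `df = df.dropna(inplace=True)` would discard the data.

# Reassignment returns a DataFrame, so further methods can be chained
clean = df.dropna().reset_index(drop=True)

print(len(df), len(working), len(clean))
```

The original `df` still has all three rows, while `working` and `clean` each hold the single complete row.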


Exercises

Exercise 1. Create a DataFrame where one row is entirely NaN. Use dropna(how='all') to drop only that row. Verify that rows with partial NaN values are kept.

Solution to Exercise 1

Drop only rows where all values are NaN.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6],
    'C': [7, np.nan, 9]
})
result = df.dropna(how='all')
print(result)
print(f"Rows kept: {len(result)}")
```

Exercise 2. Create a DataFrame and use dropna(thresh=2) to keep only rows that have at least 2 non-null values. Count how many rows are dropped.

Solution to Exercise 2

Use thresh to require a minimum number of non-null values.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, np.nan, 4],
    'B': [np.nan, np.nan, 3, 4],
    'C': [1, np.nan, np.nan, 4]
})
before = len(df)
result = df.dropna(thresh=2)
print(result)
print(f"Dropped: {before - len(result)} rows")
```

Exercise 3. Create a DataFrame with 4 columns and use dropna(subset=['col1', 'col2']) to drop rows only when col1 or col2 has NaN, ignoring NaN in other columns.

Solution to Exercise 3

Drop based on specific column subset.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col1': [1, np.nan, 3, 4],
    'col2': [5, 6, np.nan, 8],
    'col3': [np.nan, np.nan, np.nan, 12],
    'col4': [13, 14, 15, 16]
})
result = df.dropna(subset=['col1', 'col2'])
print(result)
# NaN in col3 does not trigger row removal
```