dropna Keywords¶

The dropna() method accepts several keyword arguments to control which rows or columns are dropped.

how Keyword¶

Specify when to drop a row or column.

1. how='any' (Default)¶

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [4, np.nan, np.nan],
    'C': [7, 8, 9]
})

df.dropna(how='any')
# Drops row if ANY value is NaN

2. how='all'¶

url = "https://raw.githubusercontent.com/codebasics/py/master/pandas/5_handling_missing_data_fillna_dropna_interpolate/weather_data.csv"
df = pd.read_csv(url, index_col='day', parse_dates=True)

dg = df.dropna(how='all')
print(dg)

Only drops rows where ALL values are NaN.

3. Comparison¶

# how='any': Drop if at least one NaN
# how='all': Drop only if entire row is NaN

subset Keyword¶

Specify columns to consider for NaN detection.

1. Single Column¶

students = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', None, 'Bob'],
    'grade': ['A', 'B', None]
})

students.dropna(subset=['name'])
# Only checks 'name' column

2. Multiple Columns¶

students.dropna(subset=['name', 'grade'])
# Drops if NaN in name OR grade

3. Selective Cleaning¶

# Keep rows even if other columns have NaN
# Only require specific columns to be non-null
df.dropna(subset=['critical_column'])

thresh Keyword¶

Require minimum number of non-NaN values.

1. Basic Usage¶

url = "https://raw.githubusercontent.com/codebasics/py/master/pandas/5_handling_missing_data_fillna_dropna_interpolate/weather_data.csv"
df = pd.read_csv(url, index_col='day', parse_dates=True)

dg = df.dropna(thresh=2)
print(dg)

Keeps rows with at least 2 non-NaN values.

2. Calculate Threshold¶

# Keep rows with at least 50% non-null values
threshold = int(len(df.columns) * 0.5)
df.dropna(thresh=threshold)

3. Cannot Combine with how¶

# thresh cannot be used with how parameter
# df.dropna(how='any', thresh=2)  # Error

axis Keyword¶

Drop rows or columns.

1. axis=0 (Default)¶

df.dropna(axis=0)  # Drop rows
df.dropna()        # Same as axis=0

2. axis=1¶

df.dropna(axis=1)  # Drop columns with NaN

3. Column Cleaning¶

# Remove columns with more than 50% missing
threshold = int(len(df) * 0.5)
df.dropna(axis=1, thresh=threshold)

Combined Keywords¶

Use multiple keywords for precise control.

1. Subset with Threshold¶

# Keep rows with at least 2 non-null values
# in the specified columns
df.dropna(subset=['col1', 'col2', 'col3'], thresh=2)

2. Axis with how¶

# Drop columns where all values are NaN
df.dropna(axis=1, how='all')

3. Practical Pipeline¶

df_clean = (df
    .dropna(how='all')           # Remove empty rows
    .dropna(axis=1, how='all')   # Remove empty columns
    .dropna(subset=['key_col'])  # Require key column
)

inplace Keyword¶

Modify DataFrame in place.

1. Without inplace¶

dg = df.dropna()  # Returns new DataFrame

2. With inplace¶

df.dropna(inplace=True)  # Modifies df directly

3. Prefer Reassignment¶

df = df.dropna()  # More explicit than inplace