dropna Keywords¶
The dropna() method accepts several keyword arguments to control which rows or columns are dropped.
how Keyword¶
Specify how many missing values a row or column must contain before it is dropped.
1. how='any' (Default)¶
import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': [1, np.nan, 3],
'B': [4, np.nan, np.nan],
'C': [7, 8, 9]
})
df.dropna(how='any')
# Drops a row if ANY of its values is NaN; here only the first row (index 0) is kept
2. how='all'¶
url = "https://raw.githubusercontent.com/codebasics/py/master/pandas/5_handling_missing_data_fillna_dropna_interpolate/weather_data.csv"
df = pd.read_csv(url, index_col='day', parse_dates=True)
dg = df.dropna(how='all')
print(dg)
Only drops rows where ALL values are NaN.
3. Comparison¶
# how='any': Drop if at least one NaN
# how='all': Drop only if entire row is NaN
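To make the difference concrete, here is a minimal sketch on a small, made-up frame (the name `demo` and its values are illustrative assumptions, not part of the weather dataset). Row 0 is complete, row 1 is partly missing, and row 2 is entirely missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 complete, row 1 partly missing, row 2 entirely missing
demo = pd.DataFrame({
    'A': [1.0, np.nan, np.nan],
    'B': [4.0, 5.0, np.nan],
})

any_result = demo.dropna(how='any')  # drops rows 1 and 2
all_result = demo.dropna(how='all')  # drops only row 2

print(any_result.index.tolist())  # [0]
print(all_result.index.tolist())  # [0, 1]
```

In this sketch `how='any'` is the stricter filter: every row that `how='any'` keeps is also kept by `how='all'`.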
subset Keyword¶
Specify columns to consider for NaN detection.
1. Single Column¶
students = pd.DataFrame({
'id': [1, 2, 3],
'name': ['Alice', None, 'Bob'],
'grade': ['A', 'B', None]
})
students.dropna(subset=['name'])
# Only checks 'name' column
2. Multiple Columns¶
students.dropna(subset=['name', 'grade'])
# Drops if NaN in name OR grade
3. Selective Cleaning¶
# Keep rows even if other columns have NaN
# Only require specific columns to be non-null
df.dropna(subset=['critical_column'])
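As a quick check of the `subset` behaviour, the sketch below reuses the `students` frame from this section and inspects which `id` values survive each call. The variable names `by_name` and `by_both` are assumptions introduced for illustration.

```python
import pandas as pd

students = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', None, 'Bob'],
    'grade': ['A', 'B', None],
})

# Only 'name' is checked, so the missing grade for id 3 is ignored
by_name = students.dropna(subset=['name'])

# Both columns are checked; a missing value in either one drops the row
by_both = students.dropna(subset=['name', 'grade'])

print(by_name['id'].tolist())  # [1, 3]
print(by_both['id'].tolist())  # [1]
```

Columns outside `subset` are never inspected, which is what makes this useful for selective cleaning.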
thresh Keyword¶
Require a minimum number of non-NaN values for a row or column to be kept.
1. Basic Usage¶
url = "https://raw.githubusercontent.com/codebasics/py/master/pandas/5_handling_missing_data_fillna_dropna_interpolate/weather_data.csv"
df = pd.read_csv(url, index_col='day', parse_dates=True)
dg = df.dropna(thresh=2)
print(dg)
Keeps rows with at least 2 non-NaN values.
2. Calculate Threshold¶
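# Keep rows with at least 50% non-null values
import math
# Round up: int() would truncate and allow rows below 50% when the column count is odd
threshold = math.ceil(len(df.columns) * 0.5)
df.dropna(thresh=threshold)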
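The percentage-based threshold can be sketched on a small, made-up frame with three columns (the names `demo`, `min_non_null`, and `kept` are illustrative assumptions). With an odd column count the rounding direction matters: half of 3 is 1.5, so rounding up requires 2 non-null values, while truncating with `int()` would require only 1.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical data: rows have 3, 2, and 1 non-null values respectively
demo = pd.DataFrame({
    'a': [1.0, np.nan, np.nan],
    'b': [2.0, 2.0, np.nan],
    'c': [3.0, 3.0, 3.0],
})

# At least 50% of 3 columns means at least 2 non-null values
min_non_null = math.ceil(len(demo.columns) * 0.5)

kept = demo.dropna(thresh=min_non_null)

print(min_non_null)         # 2
print(kept.index.tolist())  # [0, 1]
```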
3. Cannot Combine with how¶
# thresh cannot be combined with the how parameter (pandas 1.5 and later)
# df.dropna(how='any', thresh=2) # Raises TypeError
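The restriction can be observed directly. This is a minimal sketch that assumes a reasonably recent pandas release (1.5 or later), where passing both arguments raises `TypeError`; older releases may behave differently. The frame `demo` and the flag `raised` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only needed so that dropna has something to work on
demo = pd.DataFrame({'A': [1.0, np.nan], 'B': [np.nan, np.nan]})

try:
    demo.dropna(how='any', thresh=1)
    raised = False
except TypeError as exc:
    # Recent pandas versions reject the combination outright
    raised = True
    print(f"TypeError: {exc}")

print(raised)
```

If both behaviours are needed, one option is to chain two separate `dropna()` calls, one with `thresh` and one with `how`.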
axis Keyword¶
Choose whether rows or columns are dropped.
1. axis=0 (Default)¶
df.dropna(axis=0) # Drop rows
df.dropna() # Same as axis=0
2. axis=1¶
df.dropna(axis=1) # Drop columns with NaN
3. Column Cleaning¶
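# Remove columns with more than 50% missing values
# thresh counts non-null values, so keep columns that are at least half filled
import math
threshold = math.ceil(len(df) * 0.5)  # Round up; int() would truncate for odd row counts
df.dropna(axis=1, thresh=threshold)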
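Here is a small sketch of column cleaning on made-up data (the frame `demo` and its column names are assumptions chosen to describe their contents). With `axis=1`, `thresh` is compared against the number of non-null values in each column, so the row count is the right base for the percentage.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical data: 4 rows, columns with 1, 2, and 4 non-null values
demo = pd.DataFrame({
    'mostly_missing': [np.nan, np.nan, np.nan, 1.0],
    'half_missing': [1.0, np.nan, 3.0, np.nan],
    'complete': [1, 2, 3, 4],
})

# A column must have at least half of its 4 values present to be kept
min_non_null = math.ceil(len(demo) * 0.5)

cleaned = demo.dropna(axis=1, thresh=min_non_null)

print(cleaned.columns.tolist())  # ['half_missing', 'complete']
```

A column that is exactly 50% missing is kept here, because it does not have *more than* 50% missing.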
Combined Keywords¶
Use multiple keywords for precise control.
1. Subset with Threshold¶
# Keep rows with at least 2 non-null values
# in the specified columns
df.dropna(subset=['col1', 'col2', 'col3'], thresh=2)
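When `subset` and `thresh` are combined, only the listed columns count toward the threshold. The sketch below uses a made-up frame with an extra `note` column (a hypothetical name) that is always filled, to show that it does not help a row reach the threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'note' is always present but is not part of the subset
demo = pd.DataFrame({
    'col1': [1.0, np.nan, np.nan],
    'col2': [2.0, 2.0, np.nan],
    'col3': [3.0, np.nan, 3.0],
    'note': ['x', 'y', 'z'],
})

# Rows 1 and 2 each have only one non-null value among col1..col3
with_subset = demo.dropna(subset=['col1', 'col2', 'col3'], thresh=2)

# Without subset, 'note' also counts, so every row reaches 2 non-null values
without_subset = demo.dropna(thresh=2)

print(with_subset.index.tolist())     # [0]
print(without_subset.index.tolist())  # [0, 1, 2]
```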
2. Axis with how¶
# Drop columns where all values are NaN
df.dropna(axis=1, how='all')
3. Practical Pipeline¶
df_clean = (
    df
    .dropna(how='all')           # Remove empty rows
    .dropna(axis=1, how='all')   # Remove empty columns
    .dropna(subset=['key_col'])  # Require key column
)
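The pipeline can be traced step by step on a small, made-up frame. The frame `demo` and the columns `value` and `empty` are assumptions for illustration; `key_col` is the placeholder name used in this section.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 3 is entirely missing, column 'empty' is entirely missing
demo = pd.DataFrame({
    'key_col': ['a', None, 'c', None],
    'value': [1.0, 2.0, np.nan, np.nan],
    'empty': [np.nan, np.nan, np.nan, np.nan],
})

df_clean = (
    demo
    .dropna(how='all')           # removes row 3 (no values at all)
    .dropna(axis=1, how='all')   # removes the 'empty' column
    .dropna(subset=['key_col'])  # removes row 1 (missing key)
)

print(df_clean.index.tolist())    # [0, 2]
print(df_clean.columns.tolist())  # ['key_col', 'value']
```

Row 2 survives even though `value` is missing there, because the last step only requires `key_col`.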
inplace Keyword¶
Modify the DataFrame in place instead of returning a new one.
1. Without inplace¶
dg = df.dropna() # Returns new DataFrame
2. With inplace¶
df.dropna(inplace=True) # Modifies df directly and returns None
3. Prefer Reassignment¶
df = df.dropna() # More explicit than inplace
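One practical reason to prefer reassignment is that `dropna(inplace=True)` returns `None`, so assigning its result loses the data. A minimal sketch on made-up data (`demo`, `copy_result`, and `returned` are hypothetical names):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has a missing value in 'A'
demo = pd.DataFrame({'A': [1.0, np.nan], 'B': [3.0, 4.0]})

# Without inplace: a new DataFrame is returned and demo is left untouched
copy_result = demo.dropna()
rows_before = len(demo)

# With inplace: demo itself is modified and the call returns None
returned = demo.dropna(inplace=True)
rows_after = len(demo)

print(rows_before, rows_after)  # 2 1
print(returned)                 # None
```

Writing `demo = demo.dropna(inplace=True)` would therefore replace the frame with `None`, which is a common source of confusing errors later in a script.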