Median Absolute Deviation (MAD)¶
Overview¶
The Median Absolute Deviation (MAD) is a robust measure of statistical dispersion that measures the spread of data around the median. Unlike variance and standard deviation, MAD is resistant to outliers, making it an ideal complement to the median for describing skewed or contaminated datasets.
Definition¶
The MAD is computed in three steps:
- Find the median \(M = \text{median}(x_1, x_2, \ldots, x_n)\)
- Compute the absolute deviations: \(d_i = |x_i - M|\) for each observation
- Find the median of these deviations: \(\text{MAD} = \text{median}(d_1, d_2, \ldots, d_n)\)
Standardization Constant¶
To make MAD directly comparable to standard deviation (particularly for normally distributed data), multiply by a standardization constant:
The constant 0.6745 is the 75th percentile of the standard normal distribution, chosen so that for normally distributed data, standardized MAD ≈ standard deviation.
Example: U.S. State Population¶
Using state population data, compute MAD and compare to standard deviation:
import pandas as pd
from statsmodels import robust
# Load state data
state = pd.read_csv('state.csv')
# Standard deviation (sensitive to outliers)
std_dev = state['Population'].std()
print(f"Standard Deviation: {std_dev:,.0f}")
# MAD using statsmodels
mad = robust.scale.mad(state['Population'])
print(f"MAD (standardized): {mad:,.0f}")
# Manual calculation
median_pop = state['Population'].median()
abs_deviations = abs(state['Population'] - median_pop)
mad_manual = abs_deviations.median()
mad_standardized = mad_manual / 0.6744897501960817
print(f"MAD (manual calc): {mad_standardized:,.0f}")
Output:
Standard Deviation: 6,848,235
MAD (standardized): 3,849,876
MAD (manual calc): 3,849,876
California's extreme population (37M vs. a median of 4.4M) heavily influences the standard deviation, pulling it upward. The MAD, based on deviations from the median, is less affected by this outlier.
Why MAD is Robust¶
Consider the effect of outliers on these two measures:
import pandas as pd
import numpy as np
from statsmodels import robust
# Original state population data
state = pd.read_csv('state.csv')
original_std = state['Population'].std()
original_mad = robust.scale.mad(state['Population'])
# Introduce extreme outliers
population_with_outliers = pd.concat([
state['Population'],
pd.Series([100_000_000, 150_000_000]) # Two fictional giant states
])
outlier_std = population_with_outliers.std()
outlier_mad = robust.scale.mad(population_with_outliers)
print("Impact of Outliers:")
print(f" Std Dev: {original_std:,.0f} → {outlier_std:,.0f} ({100 * (outlier_std - original_std) / original_std:.1f}% increase)")
print(f" MAD: {original_mad:,.0f} → {outlier_mad:,.0f} ({100 * (outlier_mad - original_mad) / original_mad:.1f}% increase)")
Adding two extreme outliers dramatically increases standard deviation but barely affects MAD. This demonstrates MAD's robustness.
Robustness Properties¶
MAD is a robust statistic with:
- Breakdown point: Up to 50% of data can be arbitrarily contaminated before MAD becomes unreliable, compared to 0% for standard deviation.
- Influence function: Bounded—one extreme outlier has limited effect.
- Efficiency: For normally distributed data, MAD is about 64% as efficient as standard deviation. This efficiency loss is small, given MAD's massive robustness gain.
Comparison: Standard Deviation vs. MAD¶
| Characteristic | Standard Deviation | MAD |
|---|---|---|
| Sensitivity to outliers | High | Low |
| Uses all data points | Yes | Yes |
| Breakdown point | 0% | 50% |
| Computational complexity | \(O(n)\) | \(O(n \log n)\) (due to sorting) |
| Interpretability | Familiar to most analysts | Less familiar |
| Efficiency (normal data) | 100% | 64% |
When to Use MAD¶
Skewed distributions: Income, wealth, or other right-skewed financial data Outlier-prone datasets: Sensor measurements, astronomical observations Robust estimation: When you cannot trust all data points equally Non-normal data: Heavy-tailed or multimodal distributions
Practical Example: Financial Returns¶
For stock market analysis, MAD can be more representative than standard deviation:
import pandas as pd
from statsmodels import robust
# Hypothetical daily stock returns
returns = pd.Series([0.01, 0.02, -0.01, 0.015, -0.005, 0.03, -0.02,
0.01, -0.01, 0.005, -0.015, 0.02, -0.50]) # One crash day
print(f"Standard Deviation: {returns.std():.4f}")
print(f"MAD (standardized): {robust.scale.mad(returns):.4f}")
# The crash day (-0.50) inflates std dev much more than MAD
The single crash day (-0.50) vastly increases standard deviation, which might overstate typical daily volatility. MAD provides a clearer picture of routine variation.
Computing MAD in Python¶
Using statsmodels (recommended)¶
from statsmodels import robust
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5, 100]) # Last value is an outlier
mad = robust.scale.mad(data)
print(f"MAD: {mad:.2f}")
Manual Calculation¶
import pandas as pd
import numpy as np
data = pd.Series([1, 2, 3, 4, 5, 100])
median = data.median()
abs_dev = abs(data - median)
mad = abs_dev.median()
mad_standardized = mad / 0.6744897501960817 # Standardize for normal data
print(f"MAD (standardized): {mad_standardized:.2f}")
Summary¶
The Median Absolute Deviation is a powerful tool for measuring data spread in the presence of outliers. By basing dispersion on deviations from the median (itself robust), MAD achieves a level of stability that variance and standard deviation cannot match. For any analysis involving skewed data, outliers, or non-normal distributions, pairing the median with MAD provides a more trustworthy summary than the mean with standard deviation.