PP Plots
A probability-probability (PP) plot compares the empirical cumulative distribution function of a sample against a theoretical CDF by plotting their values at each data point. While QQ plots compare quantiles and are more sensitive to tail deviations, PP plots compare cumulative probabilities and are more sensitive to deviations near the center of the distribution. This page explains how PP plots are constructed, how to interpret common patterns, and how to build them with SciPy.
Construction
Given an ordered sample \(x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}\) and a candidate distribution with CDF \(F\), a PP plot displays the points
\[\left(\hat{F}_n(x_{(i)}),\; F(x_{(i)})\right), \quad i = 1, \ldots, n.\]
The horizontal axis shows the empirical cumulative probability \(\hat{F}_n(x_{(i)}) = i/n\), and the vertical axis shows the theoretical cumulative probability \(F(x_{(i)})\). If the data come from the distribution \(F\), the points cluster along the identity line \(y = x\).
Plotting Position Adjustment
A common refinement replaces \(i/n\) with a plotting position such as \((i - 0.5)/n\) or \(i/(n+1)\) to avoid probabilities of exactly 0 and 1 at the boundaries.
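As a minimal sketch of the construction, the snippet below computes both coordinates for a five-point sample under each plotting position. The sample values and the standard normal candidate are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Small hand-picked sample, purely illustrative (an assumption, not real data).
sample = np.array([-1.2, -0.4, 0.1, 0.7, 1.5])
x_sorted = np.sort(sample)
n = len(x_sorted)
i = np.arange(1, n + 1)

# Three common choices for the empirical (horizontal) coordinate.
positions = {
    "i/n": i / n,
    "(i - 0.5)/n": (i - 0.5) / n,
    "i/(n + 1)": i / (n + 1),
}

# Theoretical (vertical) coordinate, assuming a standard normal candidate.
theoretical = stats.norm.cdf(x_sorted)

for name, p in positions.items():
    print(f"{name:>12}: {np.round(p, 3)}")
print(f"{'F(x_(i))':>12}: {np.round(theoretical, 3)}")
```

Only the unadjusted \(i/n\) reaches exactly 1 at the largest observation; the two adjusted positions stay strictly inside \((0, 1)\).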
Building a PP Plot
SciPy does not provide a dedicated PP plot function, but constructing one requires only the CDF and sorted data.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)
data = np.random.normal(loc=5, scale=2, size=200)

# Fit a normal distribution by maximum likelihood
mu, sigma = stats.norm.fit(data)

# PP plot coordinates
data_sorted = np.sort(data)
n = len(data_sorted)
empirical_cdf = (np.arange(1, n + 1) - 0.5) / n  # plotting position (i - 0.5)/n
theoretical_cdf = stats.norm.cdf(data_sorted, loc=mu, scale=sigma)

plt.figure(figsize=(6, 6))
plt.scatter(empirical_cdf, theoretical_cdf, s=10, alpha=0.6)
plt.plot([0, 1], [0, 1], 'r--', label='Identity line')
plt.xlabel('Empirical Cumulative Probability')
plt.ylabel('Theoretical Cumulative Probability')
plt.title('PP Plot (Normal Fit)')
plt.legend()
# Square axes; adjustable='box' keeps the [0, 1] limits set below intact
plt.gca().set_aspect('equal', adjustable='box')
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.show()
```
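Interpreting PP Plots
Deviations from the identity line indicate specific types of misfit between the data and the candidate distribution. The directions below assume the empirical CDF is on the horizontal axis and the theoretical CDF is on the vertical axis; swapping the axes reverses each pattern.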
| Pattern | Interpretation |
|---|---|
| Points follow \(y = x\) | Good fit |
| S-shaped curve (below then above) | Data are more spread out (heavier tails) than the theoretical distribution |
| Inverted S-shape (above then below) | Data are less spread out (lighter tails) than the theoretical distribution |
| Points consistently above the line | Theoretical distribution is shifted left relative to the data |
| Points consistently below the line | Theoretical distribution is shifted right relative to the data |
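Because the direction of each pattern depends on which CDF is on which axis, a quick simulation is one way to check it. The sketch below puts the empirical CDF on the horizontal axis and deliberately uses mismatched candidates. The sample size, seed, and parameter values are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
emp = (np.arange(1, n + 1) - 0.5) / n  # empirical (horizontal) coordinate

# Case 1: data centered at 1, so the candidate N(0, 1) sits to the left of the data.
shifted = np.sort(rng.normal(loc=1.0, scale=1.0, size=n))
theo_left = stats.norm.cdf(shifted, loc=0.0, scale=1.0)
frac_above = np.mean(theo_left > emp)

# Case 2: data with scale 2, so the candidate N(0, 1) is too narrow for the data.
wide = np.sort(rng.normal(loc=0.0, scale=2.0, size=n))
theo_narrow = stats.norm.cdf(wide, loc=0.0, scale=1.0)
diff = theo_narrow - emp  # signed vertical distance from the identity line
lower_half, upper_half = diff[: n // 2], diff[n // 2 :]

print(f"Shift case: fraction of points above the line = {frac_above:.2f}")
print(f"Scale case: mean deviation, lower half = {lower_half.mean():+.3f}")
print(f"Scale case: mean deviation, upper half = {upper_half.mean():+.3f}")
```

In the shift case nearly all points lie above the line. In the scale case the points run below the line in the lower half and above it in the upper half, which is the "below then above" shape.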
PP Plot vs QQ Plot
PP plots and QQ plots emphasize different aspects of distributional fit.
| Feature | PP Plot | QQ Plot |
|---|---|---|
| Axes | Cumulative probabilities (0 to 1) | Quantile values (data scale) |
| Sensitivity | Center of distribution | Tails of distribution |
| Scale invariance | Yes (probabilities are unitless) | No (quantiles are in data units) |
| Best for | Detecting location/shape differences | Detecting tail heaviness, skewness |
```python
# Reuses data, empirical_cdf, and theoretical_cdf from the normal example above
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# PP plot
axes[0].scatter(empirical_cdf, theoretical_cdf, s=10, alpha=0.6)
axes[0].plot([0, 1], [0, 1], 'r--')
axes[0].set_xlabel('Empirical CDF')
axes[0].set_ylabel('Theoretical CDF')
axes[0].set_title('PP Plot')
axes[0].set_aspect('equal')

# QQ plot (probplot draws the ordered data against normal quantiles)
stats.probplot(data, dist="norm", plot=axes[1])
axes[1].set_title('QQ Plot')

plt.tight_layout()
plt.show()
```
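The scale-invariance row of the table can be checked numerically. The following sketch refits a normal distribution after an assumed change of units (multiply by 10, add 3) and compares the theoretical PP coordinates. The helper `pp_theoretical` is a hypothetical name introduced only for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=5, scale=2, size=200)
rescaled = 10.0 * sample + 3.0  # hypothetical change of units (positive scale)

def pp_theoretical(values):
    """Fit a normal distribution and return its CDF at the sorted values."""
    mu, sigma = stats.norm.fit(values)
    return stats.norm.cdf(np.sort(values), loc=mu, scale=sigma)

# The empirical coordinates depend only on ranks, so only the theoretical
# coordinates need to be compared.
same_pp = np.allclose(pp_theoretical(sample), pp_theoretical(rescaled))
print("PP coordinates unchanged after rescaling:", same_pp)
```

The invariance holds here because the normal family is a location-scale family and the parameters are refitted after rescaling. A QQ plot of the same two samples would show different axis values.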
PP Plot for Non-Normal Distributions
PP plots work with any continuous distribution that has a CDF available in SciPy.
```python
# Assumes np, plt, and stats are imported as in the normal example above

# Generate exponential data
data_exp = stats.expon.rvs(scale=3, size=300, random_state=42)

# Fit an exponential distribution (both loc and scale are estimated)
loc_fit, scale_fit = stats.expon.fit(data_exp)

# PP plot coordinates against the exponential fit
sorted_exp = np.sort(data_exp)
n = len(sorted_exp)
emp_cdf = (np.arange(1, n + 1) - 0.5) / n
theo_cdf = stats.expon.cdf(sorted_exp, loc=loc_fit, scale=scale_fit)

plt.figure(figsize=(6, 6))
plt.scatter(emp_cdf, theo_cdf, s=10, alpha=0.6)
plt.plot([0, 1], [0, 1], 'r--', label='Identity line')
plt.xlabel('Empirical CDF')
plt.ylabel('Exponential CDF')
plt.title('PP Plot (Exponential Fit)')
plt.legend()
# Square axes; adjustable='box' keeps the [0, 1] limits set below intact
plt.gca().set_aspect('equal', adjustable='box')
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.show()
```
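Since the same steps repeat for every distribution, they can be wrapped in a small helper. The function below is a sketch, not a SciPy API: `pp_points` is a hypothetical name, and it assumes the candidate is passed as a frozen SciPy distribution (any object with a `cdf` method would work). The gamma example and its parameters are likewise illustrative.

```python
import numpy as np
from scipy import stats

def pp_points(data, dist):
    """Return (empirical, theoretical) PP coordinates.

    `dist` is assumed to be a frozen SciPy distribution, e.g. stats.norm(0, 1).
    The empirical coordinate uses the plotting position (i - 0.5)/n.
    """
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    empirical = (np.arange(1, n + 1) - 0.5) / n
    theoretical = dist.cdf(x)
    return empirical, theoretical

# Usage: gamma-distributed data against a fitted gamma distribution.
data_gamma = stats.gamma.rvs(a=2.0, scale=1.5, size=300, random_state=0)
a_fit, loc_fit, scale_fit = stats.gamma.fit(data_gamma, floc=0)  # loc fixed at 0
emp, theo = pp_points(data_gamma, stats.gamma(a_fit, loc=loc_fit, scale=scale_fit))

print("Largest vertical gap from the identity line:", np.abs(theo - emp).max())
```

The returned arrays can be passed directly to `plt.scatter` in the same way as in the examples above.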
Summary
PP plots compare empirical and theoretical cumulative probabilities, providing a diagnostic that is most sensitive to deviations near the center of the distribution. Their bounded \([0, 1] \times [0, 1]\) domain makes them scale-invariant and visually consistent across different datasets. While QQ plots are generally preferred for detecting tail behavior, PP plots complement them by highlighting location shifts and central shape mismatches. Using both together gives a more complete picture of distributional fit than either plot alone.