Skip to content

PP Plots

A probability-probability (PP) plot compares the empirical cumulative distribution function of a sample against a theoretical CDF by plotting their values at each data point. While QQ plots compare quantiles and are more sensitive to tail deviations, PP plots compare cumulative probabilities and are more sensitive to deviations near the center of the distribution. This page explains how PP plots are constructed, how to interpret common patterns, and how to build them with SciPy.


Construction

Given an ordered sample \(x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}\) and a candidate distribution with CDF \(F\), a PP plot displays the points

\[ \left(\frac{i}{n},\; F(x_{(i)})\right), \quad i = 1, 2, \ldots, n \]

The horizontal axis shows the empirical cumulative probability \(\hat{F}_n(x_{(i)}) = i/n\), and the vertical axis shows the theoretical cumulative probability \(F(x_{(i)})\). If the data come from the distribution \(F\), the points cluster along the identity line \(y = x\).

Plotting Position Adjustment

A common refinement replaces \(i/n\) with a plotting position such as \((i - 0.5)/n\) or \(i/(n+1)\) to avoid probabilities of exactly 0 and 1 at the boundaries.


Building a PP Plot

SciPy does not provide a dedicated PP plot function, but constructing one requires only the CDF and sorted data.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)
data = np.random.normal(loc=5, scale=2, size=200)

# Fit normal distribution
mu, sigma = stats.norm.fit(data)

# PP plot
data_sorted = np.sort(data)
n = len(data_sorted)
empirical_cdf = (np.arange(1, n + 1) - 0.5) / n  # Plotting position
theoretical_cdf = stats.norm.cdf(data_sorted, loc=mu, scale=sigma)

plt.figure(figsize=(6, 6))
plt.scatter(empirical_cdf, theoretical_cdf, s=10, alpha=0.6)
plt.plot([0, 1], [0, 1], 'r--', label='Identity line')
plt.xlabel('Empirical Cumulative Probability')
plt.ylabel('Theoretical Cumulative Probability')
plt.title('PP Plot (Normal Fit)')
plt.legend()
plt.axis('equal')
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.show()

Interpreting PP Plots

Deviations from the identity line indicate specific types of misfit between the data and the candidate distribution.

Pattern Interpretation
Points follow \(y = x\) Good fit
S-shaped curve (below then above) Data have lighter tails than the theoretical distribution
Inverted S-shape (above then below) Data have heavier tails than the theoretical distribution
Points consistently above the line Theoretical distribution is shifted right relative to the data
Points consistently below the line Theoretical distribution is shifted left relative to the data

PP Plot vs QQ Plot

PP plots and QQ plots emphasize different aspects of distributional fit.

Feature PP Plot QQ Plot
Axes Cumulative probabilities (0 to 1) Quantile values (data scale)
Sensitivity Center of distribution Tails of distribution
Scale invariance Yes (probabilities are unitless) No (quantiles are in data units)
Best for Detecting location/shape differences Detecting tail heaviness, skewness
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# PP plot
axes[0].scatter(empirical_cdf, theoretical_cdf, s=10, alpha=0.6)
axes[0].plot([0, 1], [0, 1], 'r--')
axes[0].set_xlabel('Empirical CDF')
axes[0].set_ylabel('Theoretical CDF')
axes[0].set_title('PP Plot')
axes[0].set_aspect('equal')

# QQ plot
stats.probplot(data, dist="norm", plot=axes[1])
axes[1].set_title('QQ Plot')

plt.tight_layout()
plt.show()

PP Plot for Non-Normal Distributions

PP plots work with any continuous distribution that has a CDF available in SciPy.

# Generate exponential data
data_exp = stats.expon.rvs(scale=3, size=300, random_state=42)

# PP plot against exponential fit
loc_fit, scale_fit = stats.expon.fit(data_exp)
sorted_exp = np.sort(data_exp)
n = len(sorted_exp)
emp_cdf = (np.arange(1, n + 1) - 0.5) / n
theo_cdf = stats.expon.cdf(sorted_exp, loc=loc_fit, scale=scale_fit)

plt.figure(figsize=(6, 6))
plt.scatter(emp_cdf, theo_cdf, s=10, alpha=0.6)
plt.plot([0, 1], [0, 1], 'r--', label='Identity line')
plt.xlabel('Empirical CDF')
plt.ylabel('Exponential CDF')
plt.title('PP Plot (Exponential Fit)')
plt.legend()
plt.axis('equal')
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.show()

Summary

PP plots compare empirical and theoretical cumulative probabilities, providing a diagnostic that is most sensitive to deviations near the center of the distribution. Their bounded \([0, 1] \times [0, 1]\) domain makes them scale-invariant and visually consistent across different datasets. While QQ plots are generally preferred for detecting tail behavior, PP plots complement them by highlighting location shifts and central shape mismatches. Using both together gives a complete picture of distributional fit.