PP Plots

A probability-probability (PP) plot compares the empirical cumulative distribution function of a sample against a theoretical CDF by plotting their values at each data point. While QQ plots compare quantiles and are more sensitive to tail deviations, PP plots compare cumulative probabilities and are more sensitive to deviations near the center of the distribution. This page explains how PP plots are constructed, how to interpret common patterns, and how to build them with SciPy.

Construction

Given an ordered sample \(x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}\) and a candidate distribution with CDF \(F\), a PP plot displays the points

\[ \left(\frac{i}{n},\; F(x_{(i)})\right), \quad i = 1, 2, \ldots, n \]

The horizontal axis shows the empirical cumulative probability \(\hat{F}_n(x_{(i)}) = i/n\), and the vertical axis shows the theoretical cumulative probability \(F(x_{(i)})\). If the data come from the distribution \(F\), the points cluster along the identity line \(y = x\).
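As a small worked instance of the construction above, the sketch below computes the PP coordinates for a five-point sample against a standard normal CDF. The sample values and the choice of \(F\) are made up for illustration and are not taken from any real dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical five-point sample, compared against a standard normal CDF
sample = np.array([0.3, -1.2, 0.8, -0.1, 1.9])

x_sorted = np.sort(sample)                 # x_(1) <= ... <= x_(n)
n = len(x_sorted)
empirical = np.arange(1, n + 1) / n        # horizontal coordinate: i / n
theoretical = stats.norm.cdf(x_sorted)     # vertical coordinate: F(x_(i))

# Each printed pair is one point of the PP plot
for p_emp, p_theo in zip(empirical, theoretical):
    print(f"({p_emp:.2f}, {p_theo:.3f})")
```

Note that the last empirical coordinate is exactly 1, which is the boundary behavior that plotting positions are meant to avoid.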

Plotting Position Adjustment

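A common refinement replaces \(i/n\) with a plotting position such as \((i - 0.5)/n\) or \(i/(n+1)\). With \(i/n\), the largest observation is assigned a probability of exactly 1 while the smallest is assigned \(1/n\); the adjusted positions stay strictly inside \((0, 1)\) and are symmetric about 0.5.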
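The three choices mentioned here can be compared side by side. The following is a minimal sketch; the function name `plotting_positions` and the rule labels `"ecdf"`, `"midpoint"`, and `"mean"` are hypothetical names chosen for this example, not part of SciPy.

```python
import numpy as np

def plotting_positions(n, rule="midpoint"):
    """Return n plotting positions under one of three illustrative rules."""
    i = np.arange(1, n + 1)
    if rule == "ecdf":        # i / n, reaches exactly 1 at i = n
        return i / n
    if rule == "midpoint":    # (i - 0.5) / n
        return (i - 0.5) / n
    if rule == "mean":        # i / (n + 1)
        return i / (n + 1)
    raise ValueError(f"unknown rule: {rule}")

for rule in ("ecdf", "midpoint", "mean"):
    p = plotting_positions(5, rule)
    print(f"{rule:9s} min={p.min():.3f} max={p.max():.3f}")
```

For moderate or large \(n\) the three rules differ only slightly, so the choice mostly affects the first and last few points.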

Building a PP Plot

SciPy does not provide a dedicated PP plot function, but constructing one requires only the CDF and sorted data.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)
data = np.random.normal(loc=5, scale=2, size=200)

# Fit normal distribution
mu, sigma = stats.norm.fit(data)

# PP plot
data_sorted = np.sort(data)
n = len(data_sorted)
empirical_cdf = (np.arange(1, n + 1) - 0.5) / n  # Plotting position
theoretical_cdf = stats.norm.cdf(data_sorted, loc=mu, scale=sigma)

plt.figure(figsize=(6, 6))
plt.scatter(empirical_cdf, theoretical_cdf, s=10, alpha=0.6)
plt.plot([0, 1], [0, 1], 'r--', label='Identity line')
plt.xlabel('Empirical Cumulative Probability')
plt.ylabel('Theoretical Cumulative Probability')
plt.title('PP Plot (Normal Fit)')
plt.legend()
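plt.xlim(0, 1)
plt.ylim(0, 1)
# Square axes without letting Matplotlib rescale the limits set above
plt.gca().set_aspect('equal', adjustable='box')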
plt.show()

Interpreting PP Plots

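Deviations from the identity line indicate specific types of misfit between the data and the candidate distribution. The patterns below assume the axis convention used on this page: empirical probabilities on the horizontal axis and theoretical probabilities on the vertical axis. Swapping the axes reverses each above/below reading.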

Pattern	Interpretation
Points follow \(y = x\)	Good fit
S-shaped curve (below then above)	Data have lighter tails than the theoretical distribution
Inverted S-shape (above then below)	Data have heavier tails than the theoretical distribution
Points consistently above the line	Theoretical distribution is shifted left relative to the data (\(F(x_{(i)}) > i/n\) throughout)
Points consistently below the line	Theoretical distribution is shifted right relative to the data (\(F(x_{(i)}) < i/n\) throughout)
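The direction of the location-shift patterns can be checked numerically. The sketch below simulates normal data with mean 0 and evaluates two deliberately mis-specified candidates; the shift sizes of \(\pm 1\) and the seed are arbitrary choices for illustration. It uses the same convention as the rest of the page, with empirical probabilities on the horizontal axis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)    # simulated sample, true mean 0

x_sorted = np.sort(data)
n = len(x_sorted)
empirical = (np.arange(1, n + 1) - 0.5) / n         # horizontal axis

# Candidate distributions with a deliberately wrong mean (hypothetical shifts)
theo_right = stats.norm.cdf(x_sorted, loc=1.0, scale=1.0)    # candidate shifted right
theo_left = stats.norm.cdf(x_sorted, loc=-1.0, scale=1.0)    # candidate shifted left

# Average signed vertical distance from the identity line y = x
gap_right = np.mean(theo_right - empirical)   # negative: points fall below the line
gap_left = np.mean(theo_left - empirical)     # positive: points fall above the line
print("candidate shifted right:", gap_right)
print("candidate shifted left: ", gap_left)
```

A candidate that sits to the right of the data assigns each observation a smaller cumulative probability than the sample does, so the points fall below the identity line; a candidate to the left does the opposite.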

PP Plot vs QQ Plot

PP plots and QQ plots emphasize different aspects of distributional fit.

Feature	PP Plot	QQ Plot
Axes	Cumulative probabilities (0 to 1)	Quantile values (data scale)
Sensitivity	Center of distribution	Tails of distribution
Scale invariance	Yes (probabilities are unitless)	No (quantiles are in data units)
Best for	Detecting location/shape differences	Detecting tail heaviness, skewness

# Reuses data, empirical_cdf, and theoretical_cdf from the normal-fit example above
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# PP plot
axes[0].scatter(empirical_cdf, theoretical_cdf, s=10, alpha=0.6)
axes[0].plot([0, 1], [0, 1], 'r--')
axes[0].set_xlabel('Empirical CDF')
axes[0].set_ylabel('Theoretical CDF')
axes[0].set_title('PP Plot')
axes[0].set_aspect('equal')

# QQ plot
stats.probplot(data, dist="norm", plot=axes[1])
axes[1].set_title('QQ Plot')

plt.tight_layout()
plt.show()
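The scale-invariance row of the comparison table can also be verified directly. The sketch below refits a normal distribution after multiplying the data by 1000, as a change of units would, and checks that the theoretical PP coordinates are unchanged. The helper name `pp_theoretical` and the simulated data are assumptions made for this example.

```python
import numpy as np
from scipy import stats

def pp_theoretical(sample):
    """Theoretical CDF values of a fitted normal, evaluated at the sorted sample."""
    mu, sigma = stats.norm.fit(sample)
    return stats.norm.cdf(np.sort(sample), loc=mu, scale=sigma)

rng = np.random.default_rng(1)
x_units = rng.normal(loc=5, scale=2, size=200)    # simulated measurements
x_milli = x_units * 1000                          # same data in units 1000x smaller

# The empirical axis depends only on n, so only the theoretical axis needs checking
same = np.allclose(pp_theoretical(x_units), pp_theoretical(x_milli))
print("PP coordinates unchanged by rescaling:", same)
```

The corresponding QQ plot would keep its shape, but its axis values would change by the same factor of 1000.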

PP Plot for Non-Normal Distributions

PP plots work with any continuous distribution that has a CDF available in SciPy.

# Generate exponential data
data_exp = stats.expon.rvs(scale=3, size=300, random_state=42)

# PP plot against exponential fit
loc_fit, scale_fit = stats.expon.fit(data_exp)
sorted_exp = np.sort(data_exp)
n = len(sorted_exp)
emp_cdf = (np.arange(1, n + 1) - 0.5) / n
theo_cdf = stats.expon.cdf(sorted_exp, loc=loc_fit, scale=scale_fit)

plt.figure(figsize=(6, 6))
plt.scatter(emp_cdf, theo_cdf, s=10, alpha=0.6)
plt.plot([0, 1], [0, 1], 'r--', label='Identity line')
plt.xlabel('Empirical CDF')
plt.ylabel('Exponential CDF')
plt.title('PP Plot (Exponential Fit)')
plt.legend()
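plt.xlim(0, 1)
plt.ylim(0, 1)
# Square axes without letting Matplotlib rescale the limits set above
plt.gca().set_aspect('equal', adjustable='box')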
plt.show()
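Since the normal and exponential examples follow the same steps, they can be folded into one helper. The following is a sketch only: `pp_points` is a hypothetical function, not a SciPy API, and it assumes that `dist` is a `scipy.stats` continuous distribution whose `fit` method returns parameters in the order accepted by `cdf` (shape parameters, then `loc`, then `scale`).

```python
import numpy as np
from scipy import stats

def pp_points(sample, dist, **fit_kwargs):
    """Fit `dist` to `sample` and return (empirical, theoretical) PP coordinates."""
    x_sorted = np.sort(np.asarray(sample, dtype=float))
    n = len(x_sorted)
    params = dist.fit(x_sorted, **fit_kwargs)       # shape(s), loc, scale
    empirical = (np.arange(1, n + 1) - 0.5) / n     # plotting position (i - 0.5) / n
    theoretical = dist.cdf(x_sorted, *params)
    return empirical, theoretical

# Simulated gamma data; floc=0 fixes the location parameter at zero during fitting
sample = stats.gamma.rvs(a=2.0, scale=1.5, size=300, random_state=0)
emp, theo = pp_points(sample, stats.gamma, floc=0)
print("largest vertical gap from the identity line:", np.max(np.abs(theo - emp)))
```

The returned arrays can be passed straight to `plt.scatter` as in the examples above. Fixing `loc` is optional; it is a common choice for distributions supported on the positive half-line.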

Summary

PP plots compare empirical and theoretical cumulative probabilities, giving a diagnostic that is most sensitive to deviations near the center of the distribution. Because both axes are probabilities, every PP plot occupies the unit square \([0, 1] \times [0, 1]\) regardless of the units or scale of the data, which keeps plots visually comparable across datasets. QQ plots are generally preferred for detecting tail behavior; PP plots complement them by highlighting location shifts and central shape mismatches. Using both together gives a more complete picture of distributional fit than either plot alone.