Skip to content

Normality Visualization

Many statistical methods -- \(t\)-tests, ANOVA, linear regression -- assume that data or residuals follow a normal distribution. While formal tests like Shapiro-Wilk provide a binary decision, visual methods reveal the nature and severity of departures from normality. A histogram might show skewness, a QQ plot might show heavy tails, and these patterns inform which remedies (transformations, nonparametric alternatives) are appropriate. This page demonstrates the primary visual techniques for assessing normality.

Mental Model

A normality check is like fitting a template over your data. The histogram reveals gross shape mismatches (skewness, bimodality), while the QQ plot magnifies subtle tail deviations that the histogram hides. Use both together: the histogram for the big picture, the QQ plot for the fine details.


Histogram with Normal Overlay

The most intuitive check overlays the fitted normal density \(f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}\) on a normalized histogram of the data.

```python import numpy as np import matplotlib.pyplot as plt from scipy import stats

np.random.seed(42) data = np.random.normal(loc=50, scale=10, size=300)

Fit normal distribution

mu, sigma = stats.norm.fit(data) x = np.linspace(data.min() - 5, data.max() + 5, 200) pdf = stats.norm.pdf(x, mu, sigma)

plt.hist(data, bins=25, density=True, alpha=0.6, label='Data') plt.plot(x, pdf, 'r-', linewidth=2, label=f'Normal(\(\\mu\)={mu:.1f}, \(\\sigma\)={sigma:.1f})') plt.xlabel('Value') plt.ylabel('Density') plt.title('Histogram with Fitted Normal Density') plt.legend() plt.show() ```

Interpreting the Histogram

Look for symmetry around the peak, bell-shaped curvature, and tails that taper smoothly. Skewness appears as asymmetry; heavy tails appear as excess density far from the center.


Normal Probability Plot (QQ Plot)

The normal QQ plot plots sample quantiles against theoretical normal quantiles. If the data are normally distributed, the points follow a straight line. scipy.stats.probplot computes both the quantiles and a fitted line.

python fig, ax = plt.subplots() stats.probplot(data, dist="norm", plot=ax) ax.set_title('Normal QQ Plot') plt.show()

Common deviation patterns:

Pattern Interpretation
S-shaped curve Heavy tails (leptokurtic)
Inverted S-shape Light tails (platykurtic)
Points curve upward at both ends Right skew
Points curve downward at both ends Left skew
Points follow the line Approximately normal

ECDF vs Normal CDF

Comparing the empirical CDF against the theoretical normal CDF provides a non-binned view of normality. The maximum vertical gap between the two curves is the Kolmogorov-Smirnov statistic.

```python data_sorted = np.sort(data) ecdf = np.arange(1, len(data) + 1) / len(data) cdf_normal = stats.norm.cdf(data_sorted, loc=mu, scale=sigma)

plt.step(data_sorted, ecdf, where='post', label='Empirical CDF') plt.plot(data_sorted, cdf_normal, 'r-', label='Normal CDF') plt.xlabel('Value') plt.ylabel('Cumulative Probability') plt.title('ECDF vs Normal CDF') plt.legend() plt.show()

Quantify the maximum deviation

ks_stat, p_value = stats.kstest(data, 'norm', args=(mu, sigma)) print(f"KS statistic: {ks_stat:.4f}, p-value: {p_value:.4f}") ```


Box Plot

A box plot summarizes the distribution through its quartiles and flags observations beyond the whiskers as potential outliers. For normal data, the box is approximately symmetric around the median, and few points appear beyond the whiskers.

```python fig, axes = plt.subplots(1, 2, figsize=(10, 4))

Normal data

axes[0].boxplot(data, vert=True) axes[0].set_title('Normal Data')

Skewed data for contrast

skewed = stats.lognorm.rvs(s=0.5, size=300, random_state=42) axes[1].boxplot(skewed, vert=True) axes[1].set_title('Right-Skewed Data')

plt.tight_layout() plt.show() ```


Skewness and Kurtosis Indicators

Quantitative measures of shape complement the visual assessment. For a normal distribution, skewness is \(0\) and excess kurtosis is \(0\).

```python skew = stats.skew(data) kurt = stats.kurtosis(data) # Excess kurtosis (Fisher) print(f"Skewness: {skew:.3f} (normal = 0)") print(f"Excess kurtosis: {kurt:.3f} (normal = 0)")

D'Agostino-Pearson test combines skewness and kurtosis

stat, p_value = stats.normaltest(data) print(f"D'Agostino-Pearson: statistic={stat:.3f}, p={p_value:.4f}") ```

Kurtosis Convention

scipy.stats.kurtosis returns excess kurtosis by default (Fisher's definition), so the expected value for a normal distribution is \(0\), not \(3\). Set fisher=False to get the Pearson definition where the normal value is \(3\).


Comparing Normal vs Non-Normal Data

Viewing multiple diagnostic plots side by side clarifies how departures from normality manifest across different visualization methods.

```python fig, axes = plt.subplots(2, 3, figsize=(14, 8))

for col, (label, sample) in enumerate([ ('Normal', np.random.normal(0, 1, 500)), ('Right-skewed', stats.lognorm.rvs(s=0.8, size=500, random_state=1)), ('Heavy-tailed', stats.t.rvs(df=3, size=500, random_state=2)), ]): # Histogram axes[0, col].hist(sample, bins=30, density=True, alpha=0.6) mu_fit, sigma_fit = stats.norm.fit(sample) x = np.linspace(sample.min(), sample.max(), 100) axes[0, col].plot(x, stats.norm.pdf(x, mu_fit, sigma_fit), 'r-') axes[0, col].set_title(f'{label} - Histogram')

# QQ plot
stats.probplot(sample, dist="norm", plot=axes[1, col])
axes[1, col].set_title(f'{label} - QQ Plot')

plt.tight_layout() plt.show() ```


Summary

Visual normality assessment reveals not just whether data are non-normal but how they deviate. Histograms show overall shape, QQ plots diagnose tail behavior and skewness with high sensitivity, and ECDF comparisons provide a complete non-binned view. Box plots highlight asymmetry and outliers at a glance, while skewness and kurtosis values quantify what the plots reveal. Using these methods together gives a thorough picture that guides the choice between parametric methods that assume normality and robust or nonparametric alternatives that do not.


Exercises

Exercise 1. Write code that creates a figure with three diagnostic plots for normality: histogram with normal overlay, Q-Q plot, and box plot.

Solution to Exercise 1

```python import numpy as np from scipy import stats

np.random.seed(42) data = np.random.randn(100) print(f'Mean: {data.mean():.4f}') print(f'Std: {data.std():.4f}') ```


Exercise 2. Explain three visual indicators of non-normality in a Q-Q plot (heavy tails, skewness, multimodality).

Solution to Exercise 2

See the main content for the detailed explanation. The key concept involves understanding the statistical method and its assumptions.


Exercise 3. Write code that generates data from a heavy-tailed distribution (e.g., t-distribution with 3 df) and shows its departure from normality visually.

Solution to Exercise 3

```python import numpy as np from scipy import stats import matplotlib.pyplot as plt

np.random.seed(42) data = np.random.randn(1000) fig, ax = plt.subplots() ax.hist(data, bins=30, density=True, alpha=0.7) ax.set_title('Distribution') plt.show() ```


Exercise 4. Create a function that takes a data array and produces a 4-panel normality diagnostic figure.

Solution to Exercise 4

```python import numpy as np from scipy import stats

np.random.seed(42) data = np.random.randn(500) result = stats.describe(data) print(result) ```