Basic Visualization with Matplotlib¶

Overview¶

Matplotlib is Python's foundational plotting library. It produces publication-quality static figures in formats such as PNG, PDF, and SVG, and it integrates tightly with NumPy and pandas. This section covers the core plotting patterns used throughout the book for exploratory data analysis and statistical visualization.

import matplotlib.pyplot as plt
import numpy as np

The Figure–Axes Model¶

Every Matplotlib plot lives inside a Figure, which contains one or more Axes (individual plots). The recommended way to create figures is with plt.subplots(), which returns the Figure and its Axes in a single call.

fig, ax = plt.subplots()            # single plot
fig, axes = plt.subplots(1, 2)      # 1 row, 2 columns
fig, axes = plt.subplots(2, 3, figsize=(12, 6))  # 2×3 grid

Axes vs Axis

In Matplotlib terminology, an Axes object is an entire plot (with its own title, labels, and data). An axis (lowercase) refers to the x-axis or y-axis within that plot.
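
The type of the second value returned by plt.subplots() depends on the grid requested, which is a common source of indexing errors. The sketch below uses illustrative variable names (row, grid, cell) that do not appear elsewhere in this section, and shows the three cases along with the squeeze=False option, which always yields a 2-D array.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig1, ax = plt.subplots()                        # one Axes object, not an array
fig2, row = plt.subplots(1, 2)                   # 1-D array of 2 Axes
fig3, grid = plt.subplots(2, 3)                  # 2-D array, indexed grid[r, c]
fig4, cell = plt.subplots(1, 1, squeeze=False)   # always 2-D, here 1×1

print(isinstance(ax, Axes))   # True
print(row.shape)              # (2,)
print(grid.shape)             # (2, 3)
print(cell.shape)             # (1, 1)

# .flat visits every Axes in the grid, whatever its shape
for i, a in enumerate(grid.flat):
    a.set_title(f"panel {i}")
```

Iterating with .flat (or passing squeeze=False) keeps plotting loops independent of the grid dimensions.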

Line Plots¶

Line plots are the most basic visualization type and are commonly used for time series and function curves.

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", linestyle="-", color="b", label="primes")

ax.set_title("Simple Line Plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.legend()
plt.show()

Plotting Mathematical Functions¶

x = np.linspace(-2 * np.pi, 2 * np.pi, 201)

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")

# Custom x-ticks with π labels
ax.set_xticks([-2*np.pi, -np.pi, 0, np.pi, 2*np.pi])
ax.set_xticklabels([r"$-2\pi$", r"$-\pi$", "0", r"$\pi$", r"$2\pi$"])

# Hide the top/right spines and move the bottom/left spines to the origin
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_position("zero")
ax.spines["left"].set_position("zero")

ax.legend()
plt.show()

Scatter Plots¶

Scatter plots visualize the relationship between two continuous variables—central to correlation analysis and regression diagnostics.

rng = np.random.default_rng(42)
x = rng.normal(0, 1, 100)
y = 2 * x + rng.normal(0, 0.5, 100)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6, edgecolors="k", linewidths=0.5)
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("Scatter Plot")
plt.show()
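
Since scatter plots usually accompany a correlation or regression analysis, it can help to draw the fitted line on the same Axes. The following is a minimal sketch that assumes a straight-line relationship; it uses np.corrcoef for the Pearson correlation and np.polyfit for a degree-1 least-squares fit, and the names slope, intercept, and x_line are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(0, 1, 100)
y = 2 * x + rng.normal(0, 0.5, 100)

# Pearson correlation and a degree-1 least-squares fit
r = np.corrcoef(x, y)[0, 1]
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6, edgecolors="k", linewidths=0.5)

# Two end points are enough to draw a straight line
x_line = np.array([x.min(), x.max()])
ax.plot(x_line, slope * x_line + intercept, color="coral",
        label=f"fit: y = {slope:.2f}x + {intercept:.2f} (r = {r:.2f})")

ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.legend()
# plt.show()  # uncomment to display interactively
```

With data simulated as above, the fitted slope should land near the true value of 2, up to sampling noise.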

Histograms¶

Histograms display the distribution of a single variable—the visual counterpart to density estimation.

data = rng.normal(loc=50, scale=10, size=500)

fig, ax = plt.subplots()
ax.hist(data, bins=25, edgecolor="black", alpha=0.7)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram")
plt.show()

Normalized Histogram with Density Overlay¶

from scipy.stats import norm

fig, ax = plt.subplots()
ax.hist(data, bins=30, density=True, alpha=0.6, edgecolor="black", label="data")

# Overlay theoretical density
x_grid = np.linspace(data.min(), data.max(), 200)
ax.plot(x_grid, norm.pdf(x_grid, loc=50, scale=10), "r-", lw=2, label="N(50, 10²)")

ax.set_xlabel("Value")
ax.set_ylabel("Density")
ax.legend()
plt.show()
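
With density=True the bar heights are no longer counts: each height is the bin count divided by (sample size × bin width), so the bar areas sum to 1 and the histogram becomes comparable to a probability density. The sketch below checks both statements numerically; the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=10, size=500)

fig, ax = plt.subplots()
# ax.hist returns the bar heights, the bin edges, and the drawn patches
heights, edges, _ = ax.hist(data, bins=30, density=True,
                            alpha=0.6, edgecolor="black")

widths = np.diff(edges)
area = np.sum(heights * widths)
print(np.isclose(area, 1.0))  # True

# The same heights, rebuilt from raw counts
counts, _ = np.histogram(data, bins=edges)
print(np.allclose(heights, counts / (len(data) * widths)))  # True
```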

Bar Charts¶

Bar charts compare categorical quantities—useful for frequency tables and group comparisons.

categories = ["A", "B", "C", "D"]
values = [23, 45, 12, 37]

fig, ax = plt.subplots()
ax.bar(categories, values, color="steelblue", edgecolor="black")
ax.set_xlabel("Category")
ax.set_ylabel("Count")
ax.set_title("Bar Chart")
plt.show()

Box Plots¶

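Box plots show the median and quartiles (Q1, Q3) of each group. By default, Matplotlib draws the whiskers to the most extreme data points within 1.5 × IQR of the box and plots anything beyond them as individual outlier points, so the whisker ends equal the minimum and maximum only when no outliers are flagged.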

groups = [rng.normal(0, 1, 100),
          rng.normal(1, 1.5, 100),
          rng.normal(-0.5, 0.8, 100)]

fig, ax = plt.subplots()
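ax.boxplot(groups)
# Label the boxes through the tick labels. The boxplot `labels` keyword was
# renamed to `tick_labels` in Matplotlib 3.9; this form works before and after.
ax.set_xticklabels(["Group A", "Group B", "Group C"])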
ax.set_ylabel("Value")
ax.set_title("Box Plot Comparison")
plt.show()
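
To make the whisker and outlier rule concrete, the quantities behind a box plot can be computed by hand. The helper below, tukey_summary, is a hypothetical name introduced for illustration; it is a sketch that mirrors the default whis=1.5 rule using np.percentile, not Matplotlib's internal code.

```python
import numpy as np

def tukey_summary(values, whis=1.5):
    """Quartiles, whisker ends, and outliers for a 1-D sample (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers end at the most extreme observations inside the fences
    inside = values[(values >= lo_fence) & (values <= hi_fence)]
    outliers = values[(values < lo_fence) | (values > hi_fence)]
    return {"q1": q1, "median": median, "q3": q3,
            "whisker_lo": inside.min(), "whisker_hi": inside.max(),
            "outliers": np.sort(outliers)}

summary = tukey_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary["q1"], summary["median"], summary["q3"])  # 3.25 5.5 7.75
print(summary["whisker_lo"], summary["whisker_hi"])     # 1.0 9.0
print(summary["outliers"])                              # [100.]
```

Here the maximum (100) lies beyond the upper fence of 14.5, so the upper whisker stops at 9 and 100 would be drawn as a separate point.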

Subplots¶

Subplots allow multiple plots to share a single figure, enabling side-by-side comparisons.

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Plot 1: Histogram
axes[0].hist(rng.normal(0, 1, 500), bins=25, edgecolor="black")
axes[0].set_title("Histogram")

# Plot 2: Scatter
x = rng.normal(0, 1, 100)
axes[1].scatter(x, x + rng.normal(0, 0.3, 100), alpha=0.6)
axes[1].set_title("Scatter")

# Plot 3: Line
t = np.linspace(0, 4 * np.pi, 200)
axes[2].plot(t, np.sin(t))
axes[2].set_title("Sine Wave")

fig.suptitle("Three Subplots", fontsize=14)
fig.tight_layout()
plt.show()
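
Side-by-side comparisons are easier to read when the panels use the same scale. One way to get that, sketched below with simulated data, is the sharex and sharey arguments of plt.subplots(); the sample names narrow and wide are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
narrow = rng.normal(0, 1, 500)
wide = rng.normal(0, 3, 500)

# Shared axes keep both panels on identical x and y limits
fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharex=True, sharey=True)
axes[0].hist(narrow, bins=25, edgecolor="black")
axes[0].set_title("sd = 1")
axes[1].hist(wide, bins=25, edgecolor="black")
axes[1].set_title("sd = 3")
fig.tight_layout()

print(axes[0].get_xlim() == axes[1].get_xlim())  # True
print(axes[0].get_ylim() == axes[1].get_ylim())  # True
# plt.show()  # uncomment to display interactively
```

Without sharing, each panel autoscales independently and the two distributions can look misleadingly similar in spread.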

Customization Reference¶

Colors, Markers, and Line Styles¶

# Named colors: "steelblue", "coral", "seagreen", "slategray"
# Hex colors:   "#1f77b4"
# Markers:      "o", "s", "^", "D", "x", "+"
# Line styles:  "-", "--", "-.", ":"
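
These options can be passed either as keywords or packed into a short format string. The sketch below draws one line each way and reads the properties back from the returned Line2D objects; the data values are arbitrary.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

fig, ax = plt.subplots()
x = [0, 1, 2, 3]

# Keyword form: each property is spelled out
(kw_line,) = ax.plot(x, [0, 1, 4, 9],
                     color="steelblue", marker="o", linestyle="--")

# Format-string form: marker "s", line style ":", color "r" in one string
(fmt_line,) = ax.plot(x, [0, 1, 2, 3], "s:r")

print(kw_line.get_marker(), kw_line.get_linestyle())    # o --
print(fmt_line.get_marker(), fmt_line.get_linestyle())  # s :
print(to_hex(fmt_line.get_color()))                     # #ff0000
```

The keyword form is usually easier to read in longer scripts, and it is the only option for named or hex colors, which the format string does not accept.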

Labels, Titles, and Legends¶

ax.set_title("Title", fontsize=14)
ax.set_xlabel("X Label", fontsize=12)
ax.set_ylabel("Y Label", fontsize=12)
ax.legend(loc="upper right", fontsize=10)
ax.set_xlim(0, 10)
ax.set_ylim(-1, 1)

Grid and Ticks¶

ax.grid(True, alpha=0.3)
ax.tick_params(axis="both", labelsize=10)

Plotting Directly from pandas¶

pandas DataFrames and Series have a built-in .plot() method that wraps Matplotlib, making quick exploratory plots convenient.

import pandas as pd

df = pd.DataFrame({
    "A": rng.normal(0, 1, 200),
    "B": rng.normal(1, 2, 200)
})

# Histogram of all columns
df.plot.hist(bins=30, alpha=0.5, edgecolor="black")

# Scatter plot
df.plot.scatter(x="A", y="B", alpha=0.5)

# Box plot
df.plot.box()

plt.show()
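
Called as above, each pandas plotting call opens its own figure. To place the plots on Axes created beforehand, the pandas plotting methods accept an ax argument and return the Axes they drew on. A minimal sketch, assuming the same two-column DataFrame:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "A": rng.normal(0, 1, 200),
    "B": rng.normal(1, 2, 200),
})

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Each call draws on the Axes passed through `ax`
hist_ax = df.plot.hist(bins=30, alpha=0.5, edgecolor="black", ax=axes[0])
df.plot.scatter(x="A", y="B", alpha=0.5, ax=axes[1])
df.plot.box(ax=axes[2])

# The returned object is an ordinary Axes, so Matplotlib methods apply
hist_ax.set_title("Histogram")
axes[1].set_title("Scatter")
axes[2].set_title("Box")
fig.tight_layout()
# plt.show()  # uncomment to display interactively
```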

Saving Figures¶

fig.savefig("figure.png", dpi=150, bbox_inches="tight")
fig.savefig("figure.pdf", bbox_inches="tight")
fig.savefig("figure.svg", bbox_inches="tight")

The bbox_inches="tight" argument trims excess whitespace around the figure.
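
When a filename is given, savefig picks the output format from the extension. When writing to a file-like object there is no extension, so the format must be passed explicitly. The sketch below writes to in-memory buffers (useful for web responses or tests) and checks the result; the buffer names are illustrative.

```python
import io
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# No filename, so the format has to be stated
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png", dpi=150, bbox_inches="tight")

svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg", bbox_inches="tight")

# Every PNG file starts with the same 8-byte signature
print(png_buffer.getvalue()[:8] == b"\x89PNG\r\n\x1a\n")  # True
print(b"<svg" in svg_buffer.getvalue())                   # True

plt.close(fig)  # release the figure once it has been saved
```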

Statistical Plot Recipes¶

The following patterns recur throughout the book.

Empirical CDF¶

def plot_ecdf(data, ax, **kwargs):
    """Plot the empirical CDF of a 1-D array."""
    sorted_data = np.sort(data)
    ecdf = np.arange(1, len(sorted_data) + 1) / len(sorted_data)
    ax.step(sorted_data, ecdf, where="post", **kwargs)
    ax.set_ylabel("ECDF")

fig, ax = plt.subplots()
sample = rng.normal(0, 1, 300)
plot_ecdf(sample, ax, label="sample")
ax.legend()
plt.show()
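
The where="post" setting reflects the definition of the ECDF: at any point x it equals the fraction of observations less than or equal to x, so the curve jumps at each observation and stays flat until the next one. The helper ecdf_at below is a hypothetical name used only for this sketch; it evaluates the same function at arbitrary points with np.searchsorted.

```python
import numpy as np

def ecdf_at(data, x):
    """Fraction of observations <= x (illustrative helper)."""
    sorted_data = np.sort(np.asarray(data))
    # side="right" counts the values that are <= x
    return np.searchsorted(sorted_data, x, side="right") / len(sorted_data)

sample = [3, 1, 2, 2]
print(ecdf_at(sample, 0))         # 0.0
print(ecdf_at(sample, 2))         # 0.75
print(ecdf_at(sample, 3))         # 1.0
print(ecdf_at(sample, [1, 2.5]))  # [0.25 0.75]
```

The tied value 2 contributes a jump of 2/4, which is why the ECDF moves from 0.25 straight to 0.75 there.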

Q-Q Plot (Manual)¶

from scipy.stats import norm

sample = np.sort(rng.normal(0, 1, 200))
n = len(sample)
# Plotting positions (i - 0.5) / n, so the probabilities adapt to the sample size
theoretical = norm.ppf((np.arange(1, n + 1) - 0.5) / n)

fig, ax = plt.subplots()
ax.scatter(theoretical, sample, s=10, alpha=0.6)
lims = [min(theoretical.min(), sample.min()), max(theoretical.max(), sample.max())]
ax.plot(lims, lims, "r--", lw=1)
ax.set_xlabel("Theoretical Quantiles")
ax.set_ylabel("Sample Quantiles")
ax.set_title("Q-Q Plot")
plt.show()
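
SciPy provides scipy.stats.probplot, which computes the same kind of quantile pairs and also returns a least-squares line through them. The sketch below is an alternative to the manual version rather than a drop-in replacement: probplot chooses its own plotting positions, so the theoretical quantiles can differ slightly from a hand-rolled calculation.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(0, 1, 200)

# Returns the quantile pairs and the fitted line (slope, intercept, r)
(theoretical, ordered), (slope, intercept, r) = stats.probplot(sample, dist="norm")

fig, ax = plt.subplots()
ax.scatter(theoretical, ordered, s=10, alpha=0.6)
ax.plot(theoretical, slope * theoretical + intercept, "r--", lw=1)
ax.set_xlabel("Theoretical Quantiles")
ax.set_ylabel("Sample Quantiles")
ax.set_title(f"Q-Q Plot (r = {r:.3f})")
# plt.show()  # uncomment to display interactively
```

For a sample that is close to normal, the points fall near the fitted line and r is close to 1.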

Confidence Interval Visualization¶

means = [2.3, 3.1, 4.5]
ci_lower = [1.8, 2.5, 3.9]
ci_upper = [2.8, 3.7, 5.1]
labels = ["A", "B", "C"]

fig, ax = plt.subplots()
y_pos = range(len(means))
ax.errorbar(means, y_pos,
            xerr=[[m - lo for m, lo in zip(means, ci_lower)],
                  [hi - m for m, hi in zip(means, ci_upper)]],
            fmt="o", capsize=4)
ax.set_yticks(y_pos)
ax.set_yticklabels(labels)
ax.set_xlabel("Estimate")
ax.set_title("Confidence Intervals")
plt.show()
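
The estimates and interval bounds above are hard-coded. In practice they come from data. The following sketch computes a t-based confidence interval for a mean, assuming roughly normal observations; mean_ci is a hypothetical helper introduced for illustration, and the small sample is made up.

```python
import numpy as np
from scipy import stats

def mean_ci(sample, level=0.95):
    """Return (mean, lower, upper) for a t-based CI of the mean (illustrative helper)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    mean = sample.mean()
    sem = sample.std(ddof=1) / np.sqrt(n)            # standard error of the mean
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 1)  # two-sided critical value
    return mean, mean - t_crit * sem, mean + t_crit * sem

mean, lower, upper = mean_ci([2, 4, 4, 4, 5, 5, 7, 9])
print(f"{mean:.2f} [{lower:.2f}, {upper:.2f}]")  # 5.00 [3.21, 6.79]

# errorbar expects distances from the point, not the interval endpoints
xerr = [[mean - lower], [upper - mean]]
```

The xerr list has the same two-row layout (lower distances, then upper distances) that ax.errorbar takes for asymmetric intervals.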

Summary¶

| Plot Type | When to Use | Key Function |
|---|---|---|
| Line plot | Trends, time series, function curves | `ax.plot()` |
| Scatter plot | Bivariate relationships | `ax.scatter()` |
| Histogram | Distribution of a single variable | `ax.hist()` |
| Bar chart | Categorical comparisons | `ax.bar()` |
| Box plot | Five-number summary and outliers | `ax.boxplot()` |
| Subplots | Side-by-side comparisons | `plt.subplots(nrows, ncols)` |
| ECDF | Non-parametric distribution view | Custom `step` function |
| Q-Q plot | Normality assessment | `scatter` + reference line |