Histogram and Density Plots
Graphical methods offer a visual way to assess whether a dataset follows a normal distribution. They are not formal statistical tests, but they show the shape of the distribution (skewness, heavy tails, multiple peaks) in a way that a single test statistic cannot.
Overview
A histogram is a graphical representation of a dataset's distribution. It divides the range of the data into bins and shows how many data points fall into each bin. When the data is normally distributed, the histogram should approximate the familiar bell-shaped curve. A density plot conveys the same information as a smooth curve, usually a kernel density estimate (KDE), instead of discrete bars.
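Both ideas can be computed directly, without plotting. The sketch below is illustrative: `np.histogram(..., density=True)` returns bar heights scaled so that the bars' total area is 1 (which puts the histogram on the same scale as a PDF), and `scipy.stats.gaussian_kde` produces the smooth curve a density plot draws. The sample, the 20 bins, and the evaluation grid are assumptions chosen for this example, not part of the original code.

```python
import numpy as np
from scipy import stats

# Illustrative sample; any 1-D numeric array works
rng = np.random.default_rng(0)
data = rng.normal(loc=0, scale=1, size=1000)

# Histogram: split the range into 20 equal-width bins and count points per bin.
# With density=True each height is count / (n * bin_width), so the bars'
# total area is 1 and the histogram is comparable to a PDF.
heights, edges = np.histogram(data, bins=20, density=True)
widths = np.diff(edges)
hist_area = np.sum(heights * widths)

# Density plot: a Gaussian kernel density estimate (SciPy's default
# bandwidth rule is Scott's rule), evaluated on a fine grid
kde = stats.gaussian_kde(data)
grid = np.linspace(data.min() - 1, data.max() + 1, 500)
kde_values = kde(grid)

# Approximate the area under the KDE curve with a Riemann sum
kde_area = np.sum(kde_values) * (grid[1] - grid[0])

print(f"histogram area: {hist_area:.3f}")
print(f"KDE area (approx.): {kde_area:.3f}")
```

To draw the density plot itself, the same `grid` and `kde_values` arrays can be passed to `ax.plot` on top of an `ax.hist(..., density=True)` call.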
Normal Samples with Normal PDF
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def plot_histogram_with_density(data, figsize=(12, 3)):
    """
    Plot a density-normalized histogram of the data with a normal PDF
    overlay whose mean and standard deviation are estimated from the data.

    If the data is approximately normal, the histogram bars should
    follow the dashed red curve.

    Parameters:
    - data (array-like): The input dataset to plot.
    - figsize (tuple): The size of the plot (width, height).

    Returns:
    - None: Displays the plot.
    """
    # Accept any array-like input (lists, Series, arrays)
    data = np.asarray(data, dtype=float)

    # Create the figure and axis
    fig, ax = plt.subplots(figsize=figsize)

    # Histogram normalized to a density (total bar area = 1),
    # so it is on the same scale as a PDF
    _, bins, _ = ax.hist(data, bins=20, density=True, alpha=0.5, label="Data Histogram")

    # Fit a normal distribution with the sample mean and standard deviation,
    # then evaluate its PDF on a fine grid so the curve is smooth
    mu = data.mean()
    sigma = data.std()
    x = np.linspace(bins[0], bins[-1], 200)
    pdf = stats.norm(loc=mu, scale=sigma).pdf(x)
    ax.plot(x, pdf, "--r", label="Normal PDF")

    # Customize the appearance: remove top and right spines
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

    # Set plot title and labels
    ax.set_title('Histogram with Density Plot')
    ax.set_xlabel('Value')
    ax.set_ylabel('Density')
    ax.legend()
    plt.show()


if __name__ == "__main__":
    np.random.seed(0)
    sample_data = np.random.normal(loc=0, scale=1, size=1000)
    plot_histogram_with_density(sample_data)
```
When the data is drawn from a normal distribution, the histogram closely matches the overlaid normal PDF curve.
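How closely the histogram matches the curve can also be summarized with a single number. The sketch below defines a hypothetical helper, `max_pdf_gap` (not part of the original code), that fits the normal PDF the same way as `plot_histogram_with_density` and returns the largest absolute difference between the histogram heights and the PDF at the bin centers. It is a rough descriptive measure, not a statistical test, and its value depends on the sample size and the number of bins.

```python
import numpy as np
from scipy import stats


def max_pdf_gap(data, bins=20):
    """Hypothetical helper: largest absolute difference between the
    density-normalized histogram and a normal PDF fitted to the data,
    with the PDF evaluated at each bin center."""
    data = np.asarray(data, dtype=float)
    heights, edges = np.histogram(data, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    # Same fit as plot_histogram_with_density: sample mean and std
    pdf = stats.norm(loc=data.mean(), scale=data.std()).pdf(centers)
    return np.max(np.abs(heights - pdf))


rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0, scale=1, size=1000)
print(f"normal sample gap: {max_pdf_gap(normal_sample):.3f}")
```

For a normal sample the gap reflects only sampling noise in the bar heights; for clearly non-normal data it is noticeably larger.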
Exponential Samples with Normal PDF
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def plot_histogram_with_density(data, figsize=(12, 3)):
    """
    Plot histogram with a fitted normal PDF overlay.
    """
    data = np.asarray(data, dtype=float)
    fig, ax = plt.subplots(figsize=figsize)
    # Density-normalized histogram (total bar area = 1)
    _, bins, _ = ax.hist(data, bins=20, density=True, alpha=0.5, label="Data Histogram")
    # Normal PDF with the sample mean and std, evaluated on a fine grid
    mu = data.mean()
    sigma = data.std()
    x = np.linspace(bins[0], bins[-1], 200)
    pdf = stats.norm(loc=mu, scale=sigma).pdf(x)
    ax.plot(x, pdf, "--r", label="Normal PDF")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.set_title('Histogram with Density Plot')
    ax.set_xlabel('Value')
    ax.set_ylabel('Density')
    ax.legend()
    plt.show()


if __name__ == "__main__":
    np.random.seed(0)
    sample_data = np.random.exponential(scale=1, size=1000)
    plot_histogram_with_density(sample_data)
```
For exponential data, the histogram is strongly right-skewed and clearly does not match the symmetric normal PDF curve.
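The right skew visible in the histogram can be checked numerically with the sample skewness, `scipy.stats.skew`. A normal distribution has skewness 0, and every exponential distribution has skewness 2 regardless of its scale, so the two samples should land near those values. The samples below are illustrative and drawn with NumPy's `default_rng` rather than the legacy `np.random.seed` used above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0, scale=1, size=1000)
exponential_sample = rng.exponential(scale=1, size=1000)

# Sample skewness: about 0 for symmetric data, positive for a long right tail.
# The theoretical skewness of an exponential distribution is 2.
normal_skew = stats.skew(normal_sample)
exponential_skew = stats.skew(exponential_sample)

print(f"normal skewness:      {normal_skew:.2f}")
print(f"exponential skewness: {exponential_skew:.2f}")
```

Sample skewness is itself noisy, especially for long-tailed data, so values a few tenths away from 2 are expected for the exponential sample.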
Chi-Square Samples with Normal PDF
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def plot_histogram_with_density(data, figsize=(12, 3)):
    """
    Plot histogram with a fitted normal PDF overlay.
    """
    data = np.asarray(data, dtype=float)
    fig, ax = plt.subplots(figsize=figsize)
    # Density-normalized histogram (total bar area = 1)
    _, bins, _ = ax.hist(data, bins=20, density=True, alpha=0.5, label="Data Histogram")
    # Normal PDF with the sample mean and std, evaluated on a fine grid
    mu = data.mean()
    sigma = data.std()
    x = np.linspace(bins[0], bins[-1], 200)
    pdf = stats.norm(loc=mu, scale=sigma).pdf(x)
    ax.plot(x, pdf, "--r", label="Normal PDF")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.set_title('Histogram with Density Plot')
    ax.set_xlabel('Value')
    ax.set_ylabel('Density')
    ax.legend()
    plt.show()


if __name__ == "__main__":
    np.random.seed(0)
    sample_data = np.random.chisquare(df=10, size=1000)
    plot_histogram_with_density(sample_data)
```
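Chi-square data with 10 degrees of freedom is moderately right-skewed (theoretical skewness √(8/10) ≈ 0.89). The fitted normal PDF gives a rough but imperfect fit: it peaks to the right of the histogram's mode and underestimates the longer right tail. When visual inspection is this ambiguous, formal normality tests are a useful complement to the plot.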
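A minimal sketch of such formal tests, using two tests available in SciPy: the Shapiro-Wilk test (`scipy.stats.shapiro`) and the D'Agostino-Pearson test (`scipy.stats.normaltest`), which combines sample skewness and kurtosis. Both take normality as the null hypothesis, so a small p-value is evidence against normality, while a large one does not prove the data is normal. The samples are illustrative and mirror the three examples above; the p-value for the normal sample varies with the random seed, so no particular value should be expected for it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = {
    "normal": rng.normal(loc=0, scale=1, size=1000),
    "exponential": rng.exponential(scale=1, size=1000),
    "chi-square (df=10)": rng.chisquare(df=10, size=1000),
}

results = {}
for name, sample in samples.items():
    # Shapiro-Wilk: null hypothesis is that the sample is drawn from a normal distribution
    shapiro_p = stats.shapiro(sample).pvalue
    # D'Agostino-Pearson: combines skewness and kurtosis into a single statistic
    dagostino_p = stats.normaltest(sample).pvalue
    results[name] = (shapiro_p, dagostino_p)
    print(f"{name:20s} Shapiro-Wilk p = {shapiro_p:.3g}, "
          f"D'Agostino-Pearson p = {dagostino_p:.3g}")
```

With 1000 observations both tests have enough power to reject normality for the moderately skewed chi-square sample, even though its histogram looks only mildly off. Keep in mind that with very large samples these tests also flag departures too small to matter in practice, which is one reason to read them together with the plots.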