Histogram Keywords
The ax.hist() method accepts various keyword arguments to customize histogram appearance and behavior.
Mental Model
The three most important histogram keywords are bins (how many divisions), density (counts vs. probability density), and histtype (bar style). bins controls resolution, density=True normalizes area to 1 for comparison with PDFs, and histtype='step' draws outlines instead of filled bars for overlaying multiple distributions.
Parameter Priority
Not all histogram parameters are equally important:
- bins → controls structure (most important; few bins = smooth, many = noisy)
- density → controls interpretation (counts vs. probability density)
- histtype / styling → controls visual presentation (least important)
Bins = resolution of the distribution. This is the same resolution tradeoff that appears in 2D density plots (gridsize, bandwidth): too few bins hides real structure, too many creates noise.
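The resolution tradeoff can be made concrete with numbers. The following is a minimal sketch using np.histogram (the function ax.hist() relies on for binning); the sample size, seed, and the three bin counts are arbitrary choices for illustration, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen arbitrarily
data = rng.standard_normal(10_000)

summary = {}
for n_bins in (5, 50, 2_000):           # illustrative bin counts
    counts, edges = np.histogram(data, bins=n_bins)
    summary[n_bins] = {
        "bin_width": float(edges[1] - edges[0]),
        "empty_bins": int((counts == 0).sum()),
        "total": int(counts.sum()),
    }
    print(n_bins, summary[n_bins])
```

With very few bins every bin is well populated but the shape is coarse; with very many bins the width shrinks until some bins are empty, which is the noise the text refers to.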
Decision Guide — When to Use Which Option
| Situation | Parameter | Setting |
|---|---|---|
| Comparing distributions with different sample sizes | density | True |
| Overlaying multiple histograms | histtype | 'step' or 'stepfilled' with alpha |
| Skewed data with long tails | bins | Custom edges (e.g., log-spaced) |
| Side-by-side group comparison | multiple datasets | Pass list to ax.hist() |
| Need exact bin boundaries | bins | Explicit sequence [0, 10, 20, ...] |
| Want to see cumulative distribution | cumulative | True |
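As one illustration of the "skewed data" row, the sketch below compares equal-width edges with log-spaced edges built by np.geomspace. The lognormal sample, its parameters, and the choice of 20 bins are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
# Strictly positive, heavily right-skewed sample (parameters are illustrative)
skewed = rng.lognormal(mean=0.0, sigma=1.5, size=5_000)

# 21 edges -> 20 bins in both cases
linear_edges = np.linspace(skewed.min(), skewed.max(), 21)
log_edges = np.geomspace(skewed.min(), skewed.max(), 21)

linear_counts, _ = np.histogram(skewed, bins=linear_edges)
log_counts, _ = np.histogram(skewed, bins=log_edges)

print("share of samples in the first linear bin:", linear_counts[0] / skewed.size)
print("largest share in any log-spaced bin:", log_counts.max() / skewed.size)
```

The same edges can be passed to ax.hist(skewed, bins=log_edges), typically together with ax.set_xscale('log') so the bins appear equally wide on screen.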
bins
The bins parameter controls how data is grouped. It accepts either an integer (number of bins) or a sequence (explicit bin edges).
bins as int
When bins is an integer, matplotlib automatically calculates bin edges spanning the data range.
```python
import matplotlib.pyplot as plt
import numpy as np


def main():
    # data generation
    n_samples = 10_000
    data = np.random.randn(n_samples)

    # plot histogram with integer bins
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.hist(data, bins=100, alpha=0.2)  # <--- int used
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    plt.tight_layout()
    plt.show()


if __name__ == "__main__":
    main()
```
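To see what "automatically calculates bin edges" produces, it can help to inspect the values ax.hist() returns. This is a small sketch; the seed, sample size, and bins=20 are arbitrary, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use an interactive one
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
data = rng.standard_normal(1_000)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(data, bins=20)

print(counts.shape)   # (20,)  one count per bin
print(edges.shape)    # (21,)  one more edge than bins
print(len(patches))   # 20     one Rectangle per bin for the default histtype
print(np.isclose(edges[0], data.min()), np.isclose(edges[-1], data.max()))
plt.close(fig)
```

The first and last edges coincide with the minimum and maximum of the data, which is what "spanning the data range" means in practice.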
bins as sequence
When bins is a sequence, it defines the exact positions of bin edges, providing precise control over binning.
```python
import matplotlib.pyplot as plt
import numpy as np


def main():
    # data generation
    n_samples = 10_000
    data = np.random.randn(n_samples)

    # plot histogram with sequence bins
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.hist(data, bins=np.linspace(-4, 4, 100), alpha=0.2)  # <--- sequence used
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    plt.tight_layout()
    plt.show()


if __name__ == "__main__":
    main()
```
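Two details of explicit edges are easy to miss: a sequence of N edges defines N - 1 bins, and samples outside the first and last edge are not counted. The sketch below checks both with np.histogram; the injected out-of-range values are made up purely to trigger the second effect.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.standard_normal(10_000)
data[:5] = [6.0, 7.5, -9.0, 12.0, -6.5]  # hypothetical outliers beyond [-4, 4]

edges_99 = np.linspace(-4, 4, 100)    # 100 edges -> 99 bins
counts_99, _ = np.histogram(data, bins=edges_99)

edges_100 = np.linspace(-4, 4, 101)   # 101 edges -> 100 bins
counts_100, _ = np.histogram(data, bins=edges_100)

print(len(counts_99), len(counts_100))  # 99 100
dropped = data.size - counts_99.sum()
print("samples outside the edges:", dropped)
```

If exactly 100 bins are wanted over a fixed range, np.linspace(lo, hi, 101) is the form to use.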
density
The density parameter controls whether the histogram shows counts or probability density.
- density=False (default): y-axis shows raw counts
- density=True: y-axis shows probability density (area under histogram equals 1)
```python
import matplotlib.pyplot as plt
import numpy as np


def main():
    # data generation
    n_samples = 10_000
    data = np.random.randn(n_samples)

    # plot histogram in density scale
    fig, ax = plt.subplots()
    ax.hist(data, bins=100, density=True, alpha=0.2)  # <--- density=True
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    plt.show()


if __name__ == "__main__":
    main()
```
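The "area equals 1" property can be verified numerically. In this sketch (seed and bin count are arbitrary), each bar height is multiplied by its bin width and the products are summed; the heights themselves are densities, so they do not generally sum to 1.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.standard_normal(10_000)

counts, edges = np.histogram(data, bins=100)
heights, _ = np.histogram(data, bins=100, density=True)
widths = np.diff(edges)

area = float((heights * widths).sum())
print(area)           # 1.0 up to floating-point rounding
print(heights.sum())  # generally not 1: heights are densities, not probabilities

# Density is simply count / (total count * bin width)
manual = counts / (counts.sum() * widths)
print(np.allclose(heights, manual))
```

Because the normalization divides by bin width, a density histogram can be compared directly with a PDF curve regardless of sample size or bin count.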
histtype
The histtype parameter controls the visual style of the histogram.
| Value | Description |
|---|---|
| 'bar' | Traditional bar-style histogram (default) |
| 'barstacked' | Bar-style with stacked data |
| 'step' | Unfilled line histogram |
| 'stepfilled' | Filled line histogram |
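A common use of the step styles is overlaying two samples on shared bin edges. The following sketch assumes two made-up normal samples of different sizes, so density=True is used to put them on the same scale; the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use an interactive one
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
a = rng.normal(0.0, 1.0, 2_000)   # illustrative samples with different sizes
b = rng.normal(1.5, 1.0, 500)

shared_edges = np.linspace(-4, 6, 41)  # same 40 bins for both histograms

fig, ax = plt.subplots(figsize=(8, 3))
n_a, _, _ = ax.hist(a, bins=shared_edges, density=True,
                    histtype='step', label='a (n=2000)')
n_b, _, _ = ax.hist(b, bins=shared_edges, density=True,
                    histtype='stepfilled', alpha=0.3, label='b (n=500)')
ax.legend()
# plt.show()  # uncomment when running with an interactive backend
plt.close(fig)
```

Sharing the edges keeps the two outlines aligned bin for bin, which makes differences in shape easier to read than with independently chosen bins.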
Example: step histogram with PMF overlay
```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Set the seed for reproducibility
np.random.seed(42)

# Parameters for the binomial distribution
num_trials = 10
success_probability = 0.6

# Number of random samples to generate
num_samples = 100

# Generate random samples from the binomial distribution
samples = stats.binom(num_trials, success_probability).rvs(num_samples)

# Possible outcomes from 0 successes to 'num_trials' successes
outcomes = np.arange(num_trials + 1)

# Probability mass function values for each outcome
pmf_values = stats.binom(num_trials, success_probability).pmf(outcomes)

# Set up the plot
fig, ax = plt.subplots(figsize=(12, 3))

# Bar plot to show the probability mass function
ax.bar(outcomes, pmf_values, alpha=0.2, color='red', label='Binomial PMF')

# Histogram of the sampled data with step style
ax.hist(samples, bins=np.arange(num_trials + 2) - 0.5, density=True,
        histtype='step', label='Sampled Data Histogram')

# Adding labels and legend
ax.set_xlabel('Number of Successes')
ax.set_ylabel('Probability')
ax.legend()

plt.show()
```
Exercises
Exercise 1. Write code demonstrating the histtype parameter by creating a 2x2 subplot grid showing the same data with histtype='bar', 'barstacked', 'step', and 'stepfilled'.
Solution to Exercise 1
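```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
# Two datasets so that 'barstacked' visibly differs from 'bar'
data = [np.random.normal(5, 2, 1000), np.random.normal(8, 1.5, 1000)]

histtypes = ['bar', 'barstacked', 'step', 'stepfilled']
fig, axes = plt.subplots(2, 2, figsize=(10, 6))

# One panel per histtype, same data and bins in each
for ax, histtype in zip(axes.flat, histtypes):
    ax.hist(data, bins=30, histtype=histtype, alpha=0.7, label=['A', 'B'])
    ax.set_title(f"histtype='{histtype}'")
    ax.legend()

plt.tight_layout()
plt.show()
```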
Exercise 2. Explain the difference between bins=30 (integer) and bins='auto' in ax.hist(). What algorithm does 'auto' use?
Solution to Exercise 2
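bins=30 fixes the number of equal-width bins at 30, regardless of the data. bins='auto' lets NumPy choose the number of bins from the data: matplotlib forwards the string to numpy.histogram_bin_edges, where 'auto' takes the smaller of the bin widths proposed by the Sturges and Freedman-Diaconis estimators (equivalently, the larger of the two bin counts). Sturges' rule depends only on the sample size, while Freedman-Diaconis uses the interquartile range and so adapts to the spread of the data. To see how many bins were chosen, inspect the return value: ax.hist() returns (n, bins, patches), where for k bins n has shape (k,) containing the count or density in each bin, bins has shape (k + 1,) containing the bin edges (one more than the number of bins), and patches holds k Rectangle objects.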
Exercise 3. Create a histogram with custom bin edges using bins=[0, 1, 2, 5, 10, 20] (non-uniform spacing). Explain when non-uniform bins are useful.
Solution to Exercise 3
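```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
exp_data = np.random.exponential(3, 1000)

edges = [0, 1, 2, 5, 10, 20]  # non-uniform bin edges

fig, ax = plt.subplots(figsize=(8, 4))
# density=True divides each count by its bin width,
# so narrow and wide bins remain comparable
ax.hist(exp_data, bins=edges, density=True, alpha=0.7,
        color='coral', edgecolor='black')
ax.set_xlabel('Value')
ax.set_ylabel('Density')
ax.set_title('Non-uniform Bins')
plt.show()
```

Non-uniform bins are useful for skewed data: narrow bins resolve the dense region near zero, while wide bins pool the sparse tail so it is not dominated by noise. Values above the last edge (20 here) fall outside the bins and are not counted.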
Exercise 4. Write code that creates a cumulative histogram using cumulative=True and overlays the theoretical CDF for the same distribution.
Solution to Exercise 4
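```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

np.random.seed(42)
data = np.random.normal(0, 1, 1000)

fig, ax = plt.subplots()
# Empirical CDF: cumulative histogram normalized to end at 1
ax.hist(data, bins=30, density=True, cumulative=True, alpha=0.5,
        color='blue', label='Cumulative histogram')

# Theoretical CDF of N(0, 1) over the observed range
x = np.linspace(data.min(), data.max(), 200)
ax.plot(x, stats.norm.cdf(x), color='red', label='N(0, 1) CDF')
ax.legend()
ax.set_title('Cumulative Histogram vs. Theoretical CDF')
plt.show()
```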