Basic Box Plot¶
Box plots (box-and-whisker plots) visualize the distribution of data through quartiles, providing a compact summary of central tendency, spread, and outliers.
Mental Model
A box plot compresses an entire distribution into five numbers: minimum, Q1, median, Q3, and maximum. The box spans the interquartile range (middle 50% of data), the line inside marks the median, and whiskers extend to the extremes. Points beyond the whiskers are outliers, making them easy to spot at a glance.
Distribution Visualization Model
All distribution plots answer the same question at different levels of detail:
| Level of detail | Plot type | What it shows |
|---|---|---|
| Summary (5 numbers) | Box plot | Center, spread, outliers |
| Full shape | Histogram / Violin | Density, skewness, multimodality |
| Exact points | Scatter / Strip | Individual observations |
Box plots are powerful for comparison across groups, but they compress data — a bimodal distribution and a unimodal one with the same quartiles produce identical box plots. When shape matters, pair with a histogram or violin.
Decision guide:
- Compare groups compactly → box plot
- Reveal distributional shape → histogram or violin
- Show raw data alongside summary → combine (box + scatter, violin + box)
Different plots reveal different aspects of the same distribution — combining them gives the reader both the quick summary and the full picture.
Single Data Set¶
The simplest box plot displays one distribution using ax.boxplot().
1. Import and Setup¶
python
import matplotlib.pyplot as plt
import numpy as np
2. Generate Data¶
python
np.random.seed(42)
data = np.random.normal(100, 15, 200)
3. Create Box Plot¶
python
fig, ax = plt.subplots()
ax.boxplot(data)
ax.set_ylabel('Value')
ax.set_title('Basic Box Plot')
plt.show()
Multiple Data Sets¶
Compare multiple distributions side by side by passing a list of arrays.
1. Prepare Multiple Arrays¶
python
np.random.seed(42)
data1 = np.random.normal(100, 10, 200)
data2 = np.random.normal(90, 20, 200)
data3 = np.random.normal(110, 15, 200)
2. Pass as List¶
python
fig, ax = plt.subplots()
ax.boxplot([data1, data2, data3])
ax.set_xticklabels(['Group A', 'Group B', 'Group C'])
ax.set_ylabel('Value')
ax.set_title('Comparing Distributions')
plt.show()
3. Interpret Results¶
Each box represents one distribution. Boxes at different heights indicate different medians. Wider boxes (taller IQR) indicate greater variability.
Method Signature¶
The ax.boxplot() method accepts various input formats.
1. Single Array¶
python
ax.boxplot(data) # One box
2. List of Arrays¶
python
ax.boxplot([data1, data2, data3]) # Multiple boxes
3. 2D Array¶
python
data_2d = np.random.randn(100, 4)
ax.boxplot(data_2d) # Each column becomes a box
Exercises¶
Exercise 1. Generate three datasets from different distributions: normal (mean=0, std=1), uniform (low=-2, high=2), and exponential (scale=1) with 200 samples each. Create side-by-side box plots for all three and label each with the distribution name.
Solution to Exercise 1
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
normal_data = np.random.normal(0, 1, 200)
uniform_data = np.random.uniform(-2, 2, 200)
exponential_data = np.random.exponential(1, 200)
fig, ax = plt.subplots(figsize=(8, 5))
ax.boxplot([normal_data, uniform_data, exponential_data],
labels=['Normal(0,1)', 'Uniform(-2,2)', 'Exponential(1)'])
ax.set_ylabel('Value')
ax.set_title('Distribution Comparison')
plt.show()
Exercise 2.
Create a horizontal box plot of test scores for four classes. Generate data with np.random.normal using means [70, 75, 80, 85] and standard deviation 10 for each class (100 students each). Add class labels and a vertical line at the overall mean.
Solution to Exercise 2
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
means = [70, 75, 80, 85]
data = [np.random.normal(m, 10, 100) for m in means]
overall_mean = np.mean([d.mean() for d in data])
fig, ax = plt.subplots(figsize=(8, 5))
ax.boxplot(data, vert=False, labels=['Class A', 'Class B', 'Class C', 'Class D'])
ax.axvline(x=overall_mean, color='red', linestyle='--', label=f'Overall Mean = {overall_mean:.1f}')
ax.set_xlabel('Test Score')
ax.set_title('Test Scores by Class')
ax.legend()
plt.show()
Exercise 3.
Create a box plot comparing the distributions of sin(x), cos(x), and tan(x) evaluated at 1000 random points uniformly distributed in \([0, 2\pi]\). Clip the tan(x) values to \([-10, 10]\). Show notched box plots to indicate confidence intervals around the median.
Solution to Exercise 3
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
x = np.random.uniform(0, 2 * np.pi, 1000)
sin_vals = np.sin(x)
cos_vals = np.cos(x)
tan_vals = np.clip(np.tan(x), -10, 10)
fig, ax = plt.subplots(figsize=(8, 5))
ax.boxplot([sin_vals, cos_vals, tan_vals],
labels=['sin(x)', 'cos(x)', 'tan(x)'],
notch=True)
ax.set_ylabel('Value')
ax.set_title('Trig Function Distributions (Notched)')
plt.show()