Skip to content

Box Plot Anatomy

Understanding the visual components of a box plot is essential for proper interpretation of distributional data.

Mental Model

Read a box plot from inside out: the median line shows center, the box shows where the middle half of data lives (IQR), the whiskers reach to the farthest non-outlier points, and individual dots beyond the whiskers flag unusual values. Each component maps to a specific statistical summary, so one glance tells you center, spread, and skewness.

Visual Cues for Quick Reading

You can extract key distributional features at a glance:

  • Long whisker on one side → skewed distribution (tail extends that direction)
  • Many outlier dots → heavy tails or data quality issues
  • Wide box → high variability (large IQR)
  • Narrow box → tightly concentrated data
  • Median line off-center in box → skewness (median closer to Q1 = right-skewed)

Visual Components

A box plot consists of five primary visual elements that summarize the distribution.

1. The Box

The rectangular box spans from the first quartile (Q1, 25th percentile) to the third quartile (Q3, 75th percentile). This range is called the Interquartile Range (IQR).

python IQR = Q3 - Q1

2. The Median Line

The horizontal line inside the box represents the median (Q2, 50th percentile). Its position within the box indicates skewness.

3. The Whiskers

Vertical lines extend from the box to show the range of non-outlier data. By default, whiskers extend to 1.5 × IQR from the box edges.

python lower_whisker = Q1 - 1.5 * IQR upper_whisker = Q3 + 1.5 * IQR

4. The Fliers (Outliers)

Points beyond the whiskers are plotted individually as outliers (fliers). These represent extreme values in the distribution.

5. The Caps

Short horizontal lines at whisker ends mark the extent of non-outlier data.

Statistical Interpretation

The box plot encodes the five-number summary plus outlier detection.

1. Five-Number Summary

```python import numpy as np

data = np.random.normal(100, 15, 200)

minimum = np.min(data) q1 = np.percentile(data, 25) median = np.percentile(data, 50) q3 = np.percentile(data, 75) maximum = np.max(data) ```

2. Spread Indicators

The box width (IQR) shows the middle 50% of data. Tall boxes indicate high variability; short boxes indicate consistency.

3. Skewness Detection

When the median line is not centered in the box, the distribution is skewed. Median closer to Q1 indicates right skew; closer to Q3 indicates left skew.

Comparison with Histogram

Box plots and histograms both show distributions but emphasize different aspects.

1. Box Plot Strengths

Box plots excel at comparing multiple distributions side-by-side, identifying outliers, and showing quartile information compactly.

2. Histogram Strengths

Histograms reveal the shape of the distribution, multimodality, and density patterns that box plots cannot show.

3. Combined View

```python import matplotlib.pyplot as plt import numpy as np

np.random.seed(42) data = np.random.normal(100, 15, 200)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.hist(data, bins=20, edgecolor='black', alpha=0.7) ax1.set_title('Histogram')

ax2.boxplot(data) ax2.set_title('Box Plot')

plt.tight_layout() plt.show() ```


Exercises

Exercise 1. Create a box plot from 100 samples of a normal distribution and manually annotate the five key components (minimum, Q1, median, Q3, maximum) using ax.annotate with arrows pointing to each part. Print the actual values of Q1, median, Q3, and the IQR.

Solution to Exercise 1
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = np.random.randn(100)

q1 = np.percentile(data, 25)
median = np.median(data)
q3 = np.percentile(data, 75)
iqr = q3 - q1
whisker_low = max(data.min(), q1 - 1.5 * iqr)
whisker_high = min(data.max(), q3 + 1.5 * iqr)

print(f"Q1={q1:.3f}, Median={median:.3f}, Q3={q3:.3f}, IQR={iqr:.3f}")

fig, ax = plt.subplots(figsize=(6, 8))
ax.boxplot(data)

annotations = [
    (whisker_low, 'Min (whisker)'),
    (q1, 'Q1 (25th percentile)'),
    (median, 'Median'),
    (q3, 'Q3 (75th percentile)'),
    (whisker_high, 'Max (whisker)'),
]

for val, label in annotations:
    ax.annotate(f'{label}: {val:.2f}', xy=(1, val), xytext=(1.4, val),
                fontsize=8, arrowprops=dict(arrowstyle='->', color='red'))

ax.set_title('Box Plot Anatomy')
ax.set_xlim(0.5, 2.5)
plt.show()

Exercise 2. Generate data with known outliers: 100 samples from N(0, 1) plus 5 manually added outliers at values [-4, -3.5, 3.5, 4, 5]. Create a box plot and use the returned dictionary to access the outlier points (fliers). Print the number of detected outliers and their values.

Solution to Exercise 2
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = np.concatenate([np.random.randn(100), [-4, -3.5, 3.5, 4, 5]])

fig, ax = plt.subplots(figsize=(6, 6))
bp = ax.boxplot(data)

fliers = bp['fliers'][0]
outlier_values = fliers.get_ydata()
print(f"Number of outliers: {len(outlier_values)}")
print(f"Outlier values: {outlier_values}")

ax.set_title(f'Box Plot with {len(outlier_values)} Outliers')
plt.show()

Exercise 3. Create a side-by-side comparison showing the same data with whis=1.5 (default) and whis=3.0. Use 500 samples from a standard normal distribution. Annotate the whisker extents on each plot and explain how changing whis affects outlier detection.

Solution to Exercise 3
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = np.random.randn(500)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))

bp1 = ax1.boxplot(data, whis=1.5)
n_outliers_1 = len(bp1['fliers'][0].get_ydata())
ax1.set_title(f'whis=1.5 (default)\n{n_outliers_1} outliers')

bp2 = ax2.boxplot(data, whis=3.0)
n_outliers_2 = len(bp2['fliers'][0].get_ydata())
ax2.set_title(f'whis=3.0\n{n_outliers_2} outliers')

plt.suptitle('Effect of whis Parameter on Outlier Detection')
plt.tight_layout()
plt.show()