Skip to content

Boxplots and Their Interpretation

Overview

A boxplot summarizes the distribution by showing the median, quartiles, and potential outliers. While not explicitly designed for normality testing, boxplots can hint at skewness and whether the data is symmetric, which are characteristics of normally distributed data.

The boxplot in a normally distributed dataset will be symmetric, and the whiskers (indicating the data range) will be roughly the same length on both sides.

Box Plot with Normal Distribution

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

def plot_horizontal_boxplot(data, figsize=(12, 1)):
    """
    Generates a horizontal boxplot for the given data and customizes the appearance
    by removing unnecessary spines.

    Parameters:
    - data (array-like): The input dataset to plot.
    - figsize (tuple): The size of the plot (width, height).

    Returns:
    - None: Displays the horizontal boxplot.
    """
    warnings.simplefilter(action='ignore', category=FutureWarning)

    fig, ax = plt.subplots(figsize=figsize)
    sns.boxplot(data=data, orient='h', ax=ax)

    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)

    ax.set_title('Horizontal Boxplot')
    plt.show()

if __name__ == "__main__":
    np.random.seed(0)
    sample_data = np.random.normal(loc=0, scale=1, size=1000)
    plot_horizontal_boxplot(sample_data)

For normally distributed data, the boxplot is symmetric: the median line is centered in the box, and the whiskers extend approximately equally on both sides.

Box Plot with Exponential Distribution

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

def plot_horizontal_boxplot(data, figsize=(12, 1)):
    """
    Generates a horizontal boxplot for the given data.
    """
    warnings.simplefilter(action='ignore', category=FutureWarning)

    fig, ax = plt.subplots(figsize=figsize)
    sns.boxplot(data=data, orient='h', ax=ax)

    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)

    ax.set_title('Horizontal Boxplot')
    plt.show()

if __name__ == "__main__":
    np.random.seed(0)
    sample_data = np.random.exponential(scale=1, size=1000)
    plot_horizontal_boxplot(sample_data)

For exponential data, the boxplot is clearly asymmetric: the right whisker extends much farther than the left, and multiple outliers appear on the right side, indicating strong positive skew.

Box Plot with Chi-Square Distribution

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

def plot_horizontal_boxplot(data, figsize=(12, 1)):
    """
    Generates a horizontal boxplot for the given data.
    """
    warnings.simplefilter(action='ignore', category=FutureWarning)

    fig, ax = plt.subplots(figsize=figsize)
    sns.boxplot(data=data, orient='h', ax=ax)

    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)

    ax.set_title('Horizontal Boxplot')
    plt.show()

if __name__ == "__main__":
    np.random.seed(0)
    sample_data = np.random.chisquare(df=10, size=1000)
    plot_horizontal_boxplot(sample_data)

Chi-square data with 10 degrees of freedom shows moderate right-skew in the boxplot: the median is shifted left within the box, and the right whisker is longer than the left.

Limitations of Graphical Methods

While graphical methods are helpful for visually assessing normality, they are subjective and rely on interpretation. Small deviations from normality might not be noticeable, and different users may interpret the same plot differently. Moreover, graphical methods are less effective for small sample sizes, where the variability in the data can obscure patterns.