Standard Error¶

Overview¶

References: YouTube — Standard Error | Blog — SD vs SE

The standard error (SE) quantifies how much a sample statistic varies from sample to sample. It is the standard deviation of the sampling distribution of that statistic.

Standard Deviation vs Standard Error¶

Standard Deviation (SD)¶

The SD measures how spread out individual observations are around the population mean:

\[ \text{SD} = \sqrt{\text{Var}(X)} \]

Standard Error (SE)¶

The SE measures how spread out a sample statistic is around the true parameter:

\[ \text{SE} = \sqrt{\text{Var}(\hat{\theta}(X_1, \dots, X_n))} \]

Key Distinction¶

Standard deviation measures the spread of individual data points around the mean.

\[ \text{SD} = \sqrt{\text{Var}(X)} \]

Standard error measures the spread of sample statistics (e.g., means) around the population parameter.

\[ \text{SE} = \sqrt{\text{Var}(\hat{\theta}(X_1, \dots, X_n))} \]

	Standard Deviation	Standard Error
Measures	Spread of individual data	Spread of a sample statistic
Depends on	Population variability	Population variability and sample size
Formula (for \(\bar{X}\))	\(\sigma\)	\(\sigma / \sqrt{n}\)
Decreases with \(n\)?	No	Yes

The Standardization Pattern¶

Reference: Khan Academy — Standard Error of the Mean

A unifying pattern in inferential statistics:

\[ \begin{array}{lllllll} \displaystyle \frac{\text{unbiased\_estimator} - \text{parameter}}{\text{standard\_error}} &=& \displaystyle \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} &\approx& \displaystyle \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}} &\approx& z \;\text{ or }\; t_{n-1} \\[16pt] \displaystyle \frac{\text{unbiased\_estimator} - \text{parameter}}{\text{standard\_error}} &=& \displaystyle \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} &\approx& \displaystyle \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}} &\approx& z \end{array} \]

Example: Running Out of Water¶

Reference: Khan Academy — Sampling Distribution Example Problem

Problem. On average, a male drinks 2 liters of water when active outdoors, with a standard deviation of 0.7 liters. For a full-day nature trip of 50 men, we will bring 110 liters of water along. Determine the probability of running out of water during the trip.

Solution. Let \(X_i\) be the water consumption of the \(i\)-th person. Assuming independence, by the CLT the sample mean \(\bar{X}\) is approximately normally distributed with mean 2 and standard deviation \(0.7/\sqrt{50} \approx 0.0990\).

\[ \begin{array}{lll} \displaystyle P\!\left(\bar{X} > \frac{110}{50}\right) &=& \displaystyle P\!\left(\frac{\bar{X} - 2}{0.0990} > \frac{2.2 - 2}{0.0990}\right) \\[12pt] &\approx& \displaystyle P(Z > 2.020) \\[8pt] &\approx& 0.0217 \end{array} \]

Python: Standard Error of X-bar¶

Standalone Version¶

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)

def main():
    X_bar = []
    for _ in range(10_000):
        x = np.random.uniform(size=(5,))
        x_bar = x.mean()
        X_bar.append(x_bar)

    average = np.array(X_bar).mean()  # very good estimate of mu
    standard_error = np.array(X_bar).std()

    print(f'(Estimated) Mean of X_bar : {average:.4}')
    print(f'Standard Error   of X_bar : {standard_error:.4}')

    fig, ax = plt.subplots(figsize=(12, 3))

    ax.set_title("Sampling Distribution of X_bar", fontsize=20)

    ax.hist(X_bar, bins=100, density=True, alpha=0.3)
    ax.vlines(average, ymin=0, ymax=5, alpha=1.0, color='k', ls='-', lw=5)
    ax.vlines(average + standard_error, ymin=0, ymax=5, alpha=0.7, color='k', ls='--')
    ax.vlines(average - standard_error, ymin=0, ymax=5, alpha=0.7, color='k', ls='--')

    arrowprops = dict(arrowstyle='<->', color='k', linewidth=3, mutation_scale=20)
    ax.annotate(text='',
                xy=(average, 5),
                xytext=(average + standard_error, 5),
                arrowprops=arrowprops)
    ax.annotate(text='Standard Error',
                xy=(average, 5.5),
                xytext=(average, 5.5),
                fontsize=15)

    ax.set_xlim(0.0, 1.0)
    ax.set_ylim(-0.1, 6)

    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)

    plt.show()

if __name__ == "__main__":
    main()

Modular Version: `global_name_space.py`¶

import argparse
import numpy as np

parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--seed', type=int, default=1, metavar='S',
                    help='random seed (default: 1)')
ARGS = parser.parse_args()

np.random.seed(ARGS.seed)

Modular Version: `standard_error_of_x_bar.py`¶

import matplotlib.pyplot as plt
import numpy as np

from global_name_space import ARGS

def main():
    X_bar = []
    for _ in range(10_000):
        x = np.random.uniform(size=(5,))
        x_bar = x.mean()
        X_bar.append(x_bar)

    average = np.array(X_bar).mean()
    standard_error = np.array(X_bar).std()

    print(f'(Estimated) Mean of X_bar : {average:.4}')
    print(f'Standard Error   of X_bar : {standard_error:.4}')

    fig, ax = plt.subplots(figsize=(12, 3))

    ax.set_title("Sampling Distribution of X_bar", fontsize=20)

    ax.hist(X_bar, bins=100, density=True, alpha=0.3)
    ax.vlines(average, ymin=0, ymax=5, alpha=1.0, color='k', ls='-', lw=5)
    ax.vlines(average + standard_error, ymin=0, ymax=5, alpha=0.7, color='k', ls='--')
    ax.vlines(average - standard_error, ymin=0, ymax=5, alpha=0.7, color='k', ls='--')

    arrowprops = dict(arrowstyle='<->', color='k', linewidth=3, mutation_scale=20)
    ax.annotate(text='',
                xy=(average, 5),
                xytext=(average + standard_error, 5),
                arrowprops=arrowprops)
    ax.annotate(text='Standard Error',
                xy=(average, 5.5),
                xytext=(average, 5.5),
                fontsize=15)

    ax.set_xlim(0.0, 1.0)
    ax.set_ylim(-0.1, 6)

    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)

    plt.show()

if __name__ == "__main__":
    main()

Python: Standard Error of S-squared¶

`standard_error_of_s_square.py`¶

import matplotlib.pyplot as plt
import numpy as np

from global_name_space import ARGS

def main():
    S_square = []
    for _ in range(10_000):
        x = np.random.uniform(size=(5,))
        sigma = x.std()
        S_square.append(sigma**2)

    average = np.array(S_square).mean()
    standard_error = np.array(S_square).std()

    print(f'(Estimated) Mean of S^2 : {average:.4}')
    print(f'Standard Error   of S^2 : {standard_error:.4}')

    fig, ax = plt.subplots(figsize=(12, 3))

    ax.set_title("Sampling Distribution of S^2", fontsize=20)

    ax.hist(S_square, bins=100, density=True, alpha=0.3)
    ax.vlines(average, ymin=0, ymax=12, alpha=1.0, color='k', ls='-', lw=5)
    ax.vlines(average + standard_error, ymin=0, ymax=12, alpha=0.7, color='k', ls='--')
    ax.vlines(average - standard_error, ymin=0, ymax=12, alpha=0.7, color='k', ls='--')

    arrowprops = dict(arrowstyle='<->', color='k', linewidth=3, mutation_scale=20)
    ax.annotate(text='',
                xy=(average, 12),
                xytext=(average + standard_error, 12),
                arrowprops=arrowprops)
    ax.annotate(text='Standard Error',
                xy=(average, 13),
                xytext=(average, 13),
                fontsize=15)

    ax.set_xlim(0.0, 0.2)
    ax.set_ylim(-0.1, 15)

    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)

    plt.show()

if __name__ == "__main__":
    main()