Standard Error
Overview
References: YouTube — Standard Error | Blog — SD vs SE
The standard error (SE) quantifies how much a sample statistic varies from sample to sample. It is the standard deviation of the sampling distribution of that statistic.
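Standard Deviation vs Standard Error
Standard Deviation (SD)
The SD measures how spread out individual observations are around the population mean \(\mu\):
\[ \sigma = \sqrt{\mathbb{E}\left[(X - \mu)^2\right]} \]
Standard Error (SE)
The SE measures how spread out a sample statistic \(\hat{\theta}\) is around its expected value, which for an unbiased statistic is the true parameter \(\theta\):
\[ \mathrm{SE}(\hat{\theta}) = \sqrt{\operatorname{Var}(\hat{\theta})}, \qquad \text{e.g.} \quad \mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}} \]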
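In practice \(\sigma\) is rarely known, so the SE of the mean is usually estimated by plugging the sample SD \(s\) into \(\sigma/\sqrt{n}\). The sketch below shows both quantities computed from a single sample; the helper `sd_and_se` and the normal data are hypothetical, chosen only for illustration.

```python
import numpy as np


def sd_and_se(x):
    """Return (sample SD, estimated SE of the sample mean) for a 1-D sample.

    The SD uses the n - 1 divisor (ddof=1); the SE is the plug-in estimate
    s / sqrt(n), since sigma is usually unknown in practice.
    """
    x = np.asarray(x, dtype=float)
    s = x.std(ddof=1)            # spread of individual observations
    se = s / np.sqrt(x.size)     # estimated spread of the sample mean
    return s, se


# Hypothetical data: 100 draws from N(10, 2^2), for illustration only.
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=100)
s, se = sd_and_se(sample)
print(f"SD = {s:.3f}, estimated SE of the mean = {se:.3f}")
```

With \(n = 100\) the estimated SE is exactly one tenth of the sample SD, which is the \(1/\sqrt{n}\) factor at work.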
Key Distinction
- Standard deviation measures the spread of individual data points around the mean.
- Standard error measures the spread of sample statistics (e.g., means) around the population parameter.
| | Standard Deviation | Standard Error |
|---|---|---|
| Measures | Spread of individual data points | Spread of a sample statistic |
| Depends on | Population variability | Population variability and sample size |
| Formula (for \(\bar{X}\)) | \(\sigma\) | \(\sigma / \sqrt{n}\) |
| Decreases as \(n\) grows? | No | Yes |
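The last row of the table can be checked by simulation. This is a minimal sketch assuming a hypothetical \(N(0, 1)\) population: for each sample size it reports the average SD within a sample (which stays near \(\sigma\), slightly below it for small \(n\) because \(s\) is a biased estimate of \(\sigma\)) and the spread of the sample means (which tracks \(\sigma/\sqrt{n}\)).

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 1.0     # population SD of the hypothetical N(0, 1) population
reps = 20_000   # number of simulated samples per sample size

results = {}
for n in (5, 20, 80):
    samples = rng.normal(0.0, sigma, size=(reps, n))  # one sample per row
    avg_sd = samples.std(axis=1, ddof=1).mean()       # typical SD of one sample
    sim_se = samples.mean(axis=1).std()               # spread of the sample means
    results[n] = (avg_sd, sim_se)
    print(f"n={n:3d}  avg sample SD={avg_sd:.3f}  "
          f"simulated SE={sim_se:.3f}  sigma/sqrt(n)={sigma / np.sqrt(n):.3f}")
```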
The Standardization Pattern
Reference: Khan Academy — Standard Error of the Mean
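A unifying pattern in inferential statistics: standardize a statistic by subtracting the parameter it estimates and dividing by its standard error,
\[ Z = \frac{\text{statistic} - \text{parameter}}{\mathrm{SE}(\text{statistic})}, \qquad \text{e.g.} \quad Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}. \]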
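A minimal sketch of the pattern, assuming samples of size 5 from Uniform(0, 1) (so \(\mu = 1/2\) and \(\sigma^2 = 1/12\)). After dividing by the exact SE \(\sigma/\sqrt{n}\), the simulated sample means should have mean close to 0 and SD close to 1. The helper name `standardize` is hypothetical.

```python
import numpy as np


def standardize(statistic, parameter, standard_error):
    """Generic pattern: (statistic - parameter) / SE."""
    return (statistic - parameter) / standard_error


rng = np.random.default_rng(1)
n, reps = 5, 50_000
mu, sigma = 0.5, np.sqrt(1 / 12)   # mean and SD of Uniform(0, 1)

# Sample means of `reps` independent samples of size n.
x_bar = rng.uniform(size=(reps, n)).mean(axis=1)
z = standardize(x_bar, mu, sigma / np.sqrt(n))

print(f"mean of z = {z.mean():.3f}, SD of z = {z.std():.3f}")
```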
Example: Running Out of Water
Reference: Khan Academy — Sampling Distribution Example Problem
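Problem. On average, a man drinks 2 liters of water when active outdoors, with a standard deviation of 0.7 liters. For a full-day nature trip with 50 men, we will bring 110 liters of water. Determine the probability of running out of water during the trip.
Solution. Let \(X_i\) be the water consumption of the \(i\)-th person. Assuming independence, by the CLT the sample mean \(\bar{X}\) is approximately normally distributed with mean 2 and standard error \(0.7/\sqrt{50} \approx 0.0990\). We run out of water when total consumption exceeds 110 liters, that is, when \(\bar{X} > 110/50 = 2.2\). Standardizing,
\[ P(\bar{X} > 2.2) = P\left(Z > \frac{2.2 - 2}{0.0990}\right) \approx P(Z > 2.02) \approx 0.0217, \]
so there is roughly a 2% chance of running out of water.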
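The same arithmetic can be reproduced with the standard library alone. This sketch computes the normal upper tail through `math.erfc`, using \(P(Z > z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})\); the helper name `normal_upper_tail` is our own.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


mu, sigma, n = 2.0, 0.7, 50   # mean and SD per person (liters), group size
supply = 110.0                # liters carried on the trip

se = sigma / math.sqrt(n)     # standard error of the mean consumption
threshold = supply / n        # running out <=> X_bar > 2.2
z = (threshold - mu) / se
p = normal_upper_tail(z)
print(f"SE = {se:.4f}, z = {z:.3f}, P(run out) = {p:.4f}")
```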
Python: Standard Error of X-bar
Standalone Version
```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)


def main():
    # Draw 10,000 samples of size 5 from Uniform(0, 1) and record each sample mean.
    X_bar = []
    for _ in range(10_000):
        x = np.random.uniform(size=(5,))
        x_bar = x.mean()
        X_bar.append(x_bar)

    average = np.array(X_bar).mean()         # very good estimate of mu
    standard_error = np.array(X_bar).std()   # SD of the sampling distribution = SE
    print(f'(Estimated) Mean of X_bar : {average:.4}')
    print(f'Standard Error of X_bar : {standard_error:.4}')

    fig, ax = plt.subplots(figsize=(12, 3))
    ax.set_title("Sampling Distribution of X_bar", fontsize=20)
    ax.hist(X_bar, bins=100, density=True, alpha=0.3)
    # Solid line at the mean, dashed lines one standard error on either side.
    ax.vlines(average, ymin=0, ymax=5, alpha=1.0, color='k', ls='-', lw=5)
    ax.vlines(average + standard_error, ymin=0, ymax=5, alpha=0.7, color='k', ls='--')
    ax.vlines(average - standard_error, ymin=0, ymax=5, alpha=0.7, color='k', ls='--')

    # Double-headed arrow spanning one standard error, with a label above it.
    arrowprops = dict(arrowstyle='<->', color='k', linewidth=3, mutation_scale=20)
    ax.annotate(text='',
                xy=(average, 5),
                xytext=(average + standard_error, 5),
                arrowprops=arrowprops)
    ax.annotate(text='Standard Error',
                xy=(average, 5.5),
                xytext=(average, 5.5),
                fontsize=15)

    ax.set_xlim(0.0, 1.0)
    ax.set_ylim(-0.1, 6)
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    plt.show()


if __name__ == "__main__":
    main()
```
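The simulated SE can be checked against the theory: Uniform(0, 1) has \(\sigma^2 = 1/12\), so \(\mathrm{SE}(\bar{X}) = \sqrt{1/12}/\sqrt{5} \approx 0.1291\). This sketch replaces the loop with a vectorized draw and uses NumPy's `default_rng` instead of the legacy global seed, so its individual numbers will differ slightly from the script above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 10_000

# Each row is one sample of size n from Uniform(0, 1); take the row means.
x_bar = rng.uniform(size=(reps, n)).mean(axis=1)

simulated_se = x_bar.std()
theoretical_se = np.sqrt(1 / 12) / np.sqrt(n)   # sigma / sqrt(n) with sigma^2 = 1/12

print(f"simulated SE   : {simulated_se:.4f}")
print(f"theoretical SE : {theoretical_se:.4f}")
```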
Modular Version: global_name_space.py
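```python
import argparse

import numpy as np

# Shared command-line options. Importing this module parses them and seeds
# NumPy's global random number generator as a side effect.
parser = argparse.ArgumentParser(description='Standard error simulations')
parser.add_argument('--seed', type=int, default=1, metavar='S',
                    help='random seed (default: 1)')
ARGS = parser.parse_args()

np.random.seed(ARGS.seed)
```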
Modular Version: standard_error_of_x_bar.py
```python
import matplotlib.pyplot as plt
import numpy as np

from global_name_space import ARGS  # noqa: F401 (importing seeds NumPy's RNG)


def main():
    # Draw 10,000 samples of size 5 from Uniform(0, 1) and record each sample mean.
    X_bar = []
    for _ in range(10_000):
        x = np.random.uniform(size=(5,))
        x_bar = x.mean()
        X_bar.append(x_bar)

    average = np.array(X_bar).mean()         # estimate of mu
    standard_error = np.array(X_bar).std()   # SD of the sampling distribution = SE
    print(f'(Estimated) Mean of X_bar : {average:.4}')
    print(f'Standard Error of X_bar : {standard_error:.4}')

    fig, ax = plt.subplots(figsize=(12, 3))
    ax.set_title("Sampling Distribution of X_bar", fontsize=20)
    ax.hist(X_bar, bins=100, density=True, alpha=0.3)
    # Solid line at the mean, dashed lines one standard error on either side.
    ax.vlines(average, ymin=0, ymax=5, alpha=1.0, color='k', ls='-', lw=5)
    ax.vlines(average + standard_error, ymin=0, ymax=5, alpha=0.7, color='k', ls='--')
    ax.vlines(average - standard_error, ymin=0, ymax=5, alpha=0.7, color='k', ls='--')

    # Double-headed arrow spanning one standard error, with a label above it.
    arrowprops = dict(arrowstyle='<->', color='k', linewidth=3, mutation_scale=20)
    ax.annotate(text='',
                xy=(average, 5),
                xytext=(average + standard_error, 5),
                arrowprops=arrowprops)
    ax.annotate(text='Standard Error',
                xy=(average, 5.5),
                xytext=(average, 5.5),
                fontsize=15)

    ax.set_xlim(0.0, 1.0)
    ax.set_ylim(-0.1, 6)
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    plt.show()


if __name__ == "__main__":
    main()
```
Python: Standard Error of S-squared
standard_error_of_s_square.py
```python
import matplotlib.pyplot as plt
import numpy as np

from global_name_space import ARGS  # noqa: F401 (importing seeds NumPy's RNG)


def main():
    # Draw 10,000 samples of size 5 from Uniform(0, 1) and record each sample
    # variance S^2. ddof=1 gives the unbiased n - 1 divisor; x.std()**2 would
    # use the biased n divisor and underestimate sigma^2 = 1/12 by a factor 4/5.
    S_square = []
    for _ in range(10_000):
        x = np.random.uniform(size=(5,))
        s_square = x.var(ddof=1)
        S_square.append(s_square)

    average = np.array(S_square).mean()         # estimate of sigma^2 = 1/12
    standard_error = np.array(S_square).std()   # SD of the sampling distribution = SE
    print(f'(Estimated) Mean of S^2 : {average:.4}')
    print(f'Standard Error of S^2 : {standard_error:.4}')

    fig, ax = plt.subplots(figsize=(12, 3))
    ax.set_title("Sampling Distribution of S^2", fontsize=20)
    ax.hist(S_square, bins=100, density=True, alpha=0.3)
    # Solid line at the mean, dashed lines one standard error on either side.
    ax.vlines(average, ymin=0, ymax=12, alpha=1.0, color='k', ls='-', lw=5)
    ax.vlines(average + standard_error, ymin=0, ymax=12, alpha=0.7, color='k', ls='--')
    ax.vlines(average - standard_error, ymin=0, ymax=12, alpha=0.7, color='k', ls='--')

    # Double-headed arrow spanning one standard error, with a label above it.
    arrowprops = dict(arrowstyle='<->', color='k', linewidth=3, mutation_scale=20)
    ax.annotate(text='',
                xy=(average, 12),
                xytext=(average + standard_error, 12),
                arrowprops=arrowprops)
    ax.annotate(text='Standard Error',
                xy=(average, 13),
                xytext=(average, 13),
                fontsize=15)

    # With n = 5 points in [0, 1], the unbiased S^2 can be at most 0.3.
    ax.set_xlim(0.0, 0.3)
    ax.set_ylim(-0.1, 15)
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    plt.show()


if __name__ == "__main__":
    main()
```
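Unlike \(\bar{X}\), the SE of \(S^2\) depends on the fourth central moment \(\mu_4\) of the population: for the unbiased sample variance, \(\operatorname{Var}(S^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right)\). The sketch below compares this formula with a vectorized simulation, assuming the same Uniform(0, 1) population (\(\sigma^2 = 1/12\), \(\mu_4 = 1/80\)) and \(n = 5\); it uses NumPy's `default_rng`, so its numbers will differ slightly from the script above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000

# Unbiased sample variance (divisor n - 1) of each simulated sample.
s_square = rng.uniform(size=(reps, n)).var(axis=1, ddof=1)

# Population moments of Uniform(0, 1).
sigma2 = 1 / 12   # variance
mu4 = 1 / 80      # fourth central moment

# Var(S^2) = (mu4 - (n - 3) / (n - 1) * sigma^4) / n
theoretical_se = np.sqrt((mu4 - (n - 3) / (n - 1) * sigma2**2) / n)

print(f"mean of S^2    : {s_square.mean():.4f}  (sigma^2 = {sigma2:.4f})")
print(f"simulated SE   : {s_square.std():.4f}")
print(f"theoretical SE : {theoretical_se:.4f}")
```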