Expectation and Linearity¶
Overview¶
The expected value (or expectation) of a random variable is its long-run average value over many repetitions of an experiment. It provides a single number summarizing the "center" of a distribution. The linearity of expectation is one of the most powerful and widely used properties in all of probability.
Definition¶
Discrete Random Variables¶
For a discrete random variable \(X\) with PMF \(p_X(x_i) = P(X = x_i)\):
\[E[X] = \sum_i x_i \, p_X(x_i)\]
In the brick metaphor: \(E[X]\) is the center of mass of the bricks placed along the real line.
Continuous Random Variables¶
For a continuous random variable \(X\) with PDF \(f(x)\):
\[E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx\]
The Law of the Unconscious Statistician (LOTUS)¶
To compute the expected value of a function \(g(X)\) without first finding the distribution of \(g(X)\):
\[E[g(X)] = \sum_i g(x_i) \, p_X(x_i) \quad \text{(discrete)}, \qquad E[g(X)] = \int_{-\infty}^{\infty} g(x) \, f(x) \, dx \quad \text{(continuous)}\]
This avoids the often tedious step of deriving the distribution of \(g(X)\).
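As a quick illustration, here is a minimal sketch (a hypothetical example, not from the text above) that computes \(E[X^2]\) for a fair die directly via LOTUS, without ever deriving the PMF of \(X^2\):

import numpy as np

# LOTUS: E[g(X)] = sum of g(x) * p(x), applied to g(x) = x^2 for a fair die
values = np.arange(1, 7)          # faces 1..6
probs = np.ones(6) / 6            # uniform PMF
e_x_squared = np.sum(values**2 * probs)
print(f"E[X^2] via LOTUS = {e_x_squared:.4f}")   # 91/6 ≈ 15.1667

# Sanity check by simulation
rolls = np.random.default_rng(0).integers(1, 7, size=100_000)
print(f"Simulated E[X^2] = {np.mean(rolls**2):.4f}")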
Linearity of Expectation¶
For any random variables \(X\) and \(Y\) (not necessarily independent) and constants \(a, b, c\):
\[E[aX + bY + c] = aE[X] + bE[Y] + c\]
This extends to any finite sum:
\[E\left[\sum_{i=1}^n a_i X_i\right] = \sum_{i=1}^n a_i E[X_i]\]
Key insight: Linearity holds regardless of whether the random variables are independent or dependent. This makes it an exceptionally powerful tool.
Properties of Expectation¶
- Constant: \(E[c] = c\)
- Scaling: \(E[aX] = aE[X]\)
- Additivity: \(E[X + Y] = E[X] + E[Y]\)
- Monotonicity: If \(X \leq Y\) always, then \(E[X] \leq E[Y]\)
- Product (independent only): If \(X \perp\!\!\!\perp Y\), then \(E[XY] = E[X] \cdot E[Y]\)
Note that the product rule requires independence; the other four properties do not. The sketch below checks both cases numerically.
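A minimal sketch (hypothetical values, for illustration only) contrasting the product rule for independent uniforms with a dependent pair where it fails:

import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Independent case: X, Y ~ Uniform(0, 1) drawn separately
X = rng.random(n)
Y = rng.random(n)
print(f"Independent: E[XY] = {(X * Y).mean():.4f}, E[X]E[Y] = {X.mean() * Y.mean():.4f}")
# Both ≈ 0.25

# Dependent case: pair X with itself, so E[X·X] = E[X²] = 1/3 ≠ (1/2)²
print(f"Dependent:   E[X·X] = {(X * X).mean():.4f}, E[X]E[X] = {X.mean() ** 2:.4f}")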
Examples¶
Example: Expected Value of a Fair Die¶
Each face \(k \in \{1, \dots, 6\}\) has probability \(\frac{1}{6}\), so
\[E[X] = \sum_{k=1}^{6} k \cdot \frac{1}{6} = \frac{21}{6} = 3.5\]
Example: Expected Number of Heads in n Coin Flips¶
Let \(X_i = 1\) if flip \(i\) is heads, 0 otherwise. Then \(X = \sum_{i=1}^n X_i\) counts the total heads. By linearity:
\[E[X] = \sum_{i=1}^n E[X_i] = \sum_{i=1}^n P(\text{flip } i \text{ is heads}) = np\]
For a fair coin with \(n = 100\): \(E[X] = 50\).
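A quick simulation (a sketch with the assumed parameters \(n = 100\), \(p = 0.5\)) confirming this:

import numpy as np

rng = np.random.default_rng(42)
# 100,000 experiments of n = 100 fair coin flips each
flips = rng.random((100_000, 100)) < 0.5
heads_per_experiment = flips.sum(axis=1)
print(f"Simulated E[X] = {heads_per_experiment.mean():.2f}  (theory: n·p = 50)")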
Example: Coupon Collector Problem¶
There are \(n\) distinct coupons. Each purchase gives a uniformly random coupon. Let \(T\) be the total purchases needed to collect all \(n\) coupons.
Divide the process into phases: phase \(i\) begins when you have \(i-1\) distinct coupons and ends when you get the \(i\)-th new one. In phase \(i\), each purchase has probability \(\frac{n - i + 1}{n}\) of being new, so the number of purchases in phase \(i\) is geometric with mean \(\frac{n}{n - i + 1}\).
By linearity:
\[E[T] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{k=1}^{n} \frac{1}{k} = n H_n\]
where \(H_n\) is the \(n\)-th harmonic number, so \(E[T] \approx n(\ln n + \gamma)\) with \(\gamma \approx 0.5772\).
For \(n = 50\) types: \(E[T] = 50 \cdot H_{50} \approx 225\) purchases (the simulation in the Python Exploration below confirms this).
Example: Continuous — Exponential Distribution¶
For \(X \sim \text{Exponential}(\lambda)\) with PDF \(f(x) = \lambda e^{-\lambda x}\) for \(x \geq 0\), integrating by parts gives:
\[E[X] = \int_0^\infty x \, \lambda e^{-\lambda x} \, dx = \left[-x e^{-\lambda x}\right]_0^\infty + \int_0^\infty e^{-\lambda x} \, dx = \frac{1}{\lambda}\]
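A quick numerical check (a sketch assuming \(\lambda = 2\); note that NumPy parameterizes the exponential by its scale \(1/\lambda\), not by the rate):

import numpy as np

lam = 2.0
rng = np.random.default_rng(42)
samples = rng.exponential(scale=1 / lam, size=1_000_000)
print(f"Simulated E[X] = {samples.mean():.4f}  (theory: 1/λ = {1 / lam:.4f})")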
Python Exploration¶
import numpy as np
# Expected value of a fair die
values = np.arange(1, 7)
probs = np.ones(6) / 6
expected = np.sum(values * probs)
print(f"E[fair die] = {expected:.4f}")
# Simulation
np.random.seed(42)
rolls = np.random.randint(1, 7, size=100_000)
print(f"Simulated mean = {rolls.mean():.4f}")
import numpy as np

def coupon_collector_simulation(n_coupons, n_trials=10_000):
    """Simulate the coupon collector problem."""
    np.random.seed(42)
    totals = []
    for _ in range(n_trials):
        collected = set()
        count = 0
        while len(collected) < n_coupons:
            collected.add(np.random.randint(0, n_coupons))
            count += 1
        totals.append(count)
    simulated = np.mean(totals)
    # Theoretical value: E[T] = n · H_n, where H_n is the n-th harmonic number
    H_n = sum(1 / k for k in range(1, n_coupons + 1))
    theoretical = n_coupons * H_n
    print(f"n = {n_coupons}")
    print(f"Simulated E[T] = {simulated:.1f}")
    print(f"Theoretical E[T] = n·Hₙ = {theoretical:.1f}")

coupon_collector_simulation(50)
import numpy as np

def linearity_demonstration():
    """Demonstrate linearity of expectation with dependent variables."""
    np.random.seed(42)
    n_sim = 100_000
    # X ~ Uniform(0,1), Y = X^2 (clearly dependent on X)
    X = np.random.rand(n_sim)
    Y = X ** 2
    print("X and Y = X² are dependent, but linearity still holds:")
    print(f"E[X] = {X.mean():.4f} (theoretical: 0.5)")
    print(f"E[Y] = {Y.mean():.4f} (theoretical: 0.3333)")
    print(f"E[X + Y] = {(X + Y).mean():.4f}")
    print(f"E[X] + E[Y] = {X.mean() + Y.mean():.4f}")

linearity_demonstration()
Key Takeaways¶
- The expected value \(E[X]\) is the probability-weighted average of all possible values.
- LOTUS lets us compute \(E[g(X)]\) directly from the distribution of \(X\).
- Linearity of expectation always holds, even for dependent variables—it is one of the most useful tools in probability.
- The product rule \(E[XY] = E[X]E[Y]\) requires independence; linearity does not.