Sequences, Limits, and Asymptotics
Sequences and their limiting behavior are the mathematical backbone of statistical inference. The Law of Large Numbers, the Central Limit Theorem, and the consistency of estimators are all statements about limits of sequences of random variables. This section reviews the deterministic foundations; the probabilistic extensions appear in Chapter 3.
Sequences
Definition
A sequence is an ordered list of real numbers indexed by the natural numbers:
\[
(a_n)_{n=1}^{\infty} = a_1, a_2, a_3, \dots
\]
Equivalently, a sequence is a function \(a: \mathbb{N} \to \mathbb{R}\).
Examples:
- \(a_n = 1/n\): the harmonic sequence \(1, 1/2, 1/3, \dots\)
- \(a_n = (-1)^n / n\): an alternating sequence \(-1, 1/2, -1/3, \dots\)
- \(a_n = (1 + 1/n)^n\): converges to \(e \approx 2.718\)
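The three examples above can be tabulated numerically. The following is a minimal sketch in Python; the helper name `terms` and the chosen indices are illustrative assumptions, not part of the text.

```python
import math

def terms(a, indices):
    """Evaluate the sequence a(n) at each index in `indices`."""
    return [a(n) for n in indices]

# The three example sequences, written as functions of n.
harmonic = lambda n: 1 / n
alternating = lambda n: (-1) ** n / n
compound = lambda n: (1 + 1 / n) ** n

indices = [1, 10, 100, 10_000]
for name, a in [("1/n", harmonic),
                ("(-1)^n/n", alternating),
                ("(1+1/n)^n", compound)]:
    print(name, terms(a, indices))

# Distance between (1 + 1/n)^n and e at n = 10_000.
gap = math.e - compound(10_000)
print("e - a_10000 =", gap)
```

The first two sequences shrink toward zero, while the third approaches \(e\) from below; the remaining gap at \(n = 10{,}000\) is small but still positive.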
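Monotone and Bounded Sequences
- \((a_n)\) is increasing (non-decreasing) if \(a_n \leq a_{n+1}\) for all \(n\), and decreasing (non-increasing) if \(a_n \geq a_{n+1}\) for all \(n\). A sequence that is either increasing or decreasing is called monotone.
- \((a_n)\) is bounded above if there exists \(M\) such that \(a_n \leq M\) for all \(n\), and bounded below if there exists \(m\) such that \(a_n \geq m\) for all \(n\). It is bounded if both hold.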
Monotone Convergence Theorem
Every bounded monotone sequence converges. This theorem is used implicitly whenever we assert that a non-decreasing sequence of probabilities or expectations has a limit.
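Limits of Sequences
Definition (epsilon-N Definition)
A sequence \((a_n)\) converges to a limit \(L \in \mathbb{R}\), written \(\lim_{n \to \infty} a_n = L\) or \(a_n \to L\), if:
\[
\forall\, \varepsilon > 0,\ \exists\, N \in \mathbb{N} \text{ such that } n > N \implies \lvert a_n - L \rvert < \varepsilon.
\]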
If no such \(L\) exists, the sequence diverges.
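To make the roles of \(\varepsilon\) and \(N\) concrete, the sketch below scans a finite range of indices and reports the last index at which the sequence is still at least \(\varepsilon\) away from a candidate limit. The function name `find_N` and the cutoff `n_max` are assumptions introduced for illustration; a finite scan of this kind can suggest a suitable \(N\) but cannot prove convergence.

```python
def find_N(a, L, eps, n_max=100_000):
    """Return the smallest N with |a(n) - L| < eps for all N < n <= n_max.

    Only indices up to n_max are inspected, so this is a numerical
    illustration of the definition, not a proof of convergence.
    """
    last_violation = 0
    for n in range(1, n_max + 1):
        if abs(a(n) - L) >= eps:
            last_violation = n  # the tolerance is not yet met at index n
    return last_violation

# a_n = 1/n with candidate limit L = 0 and tolerance eps = 0.01.
N = find_N(lambda n: 1 / n, 0.0, 0.01)
print("N =", N)
```

For \(a_n = 1/n\) the condition \(1/n < \varepsilon\) holds exactly when \(n > 1/\varepsilon\), so the scan should agree with the hand calculation \(N = \lfloor 1/\varepsilon \rfloor\). Halving \(\varepsilon\) roughly doubles the required \(N\).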
Limit Laws
If \(a_n \to L\) and \(b_n \to M\), then:
| Rule | Statement |
|---|---|
| Sum | \(a_n + b_n \to L + M\) |
| Product | \(a_n \cdot b_n \to L \cdot M\) |
| Quotient | \(a_n / b_n \to L / M\) (provided \(M \neq 0\)) |
| Scalar multiple | \(c \cdot a_n \to c \cdot L\) |
| Power | \(a_n^k \to L^k\) for fixed \(k \in \mathbb{N}\) |
Squeeze Theorem
If \(a_n \leq c_n \leq b_n\) for all \(n\) and \(a_n \to L\) and \(b_n \to L\), then \(c_n \to L\).
This technique is used to establish bounds on tail probabilities and approximation errors.
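As a short worked instance of the theorem, consider \(c_n = \sin(n)/n\). This sequence is chosen here for illustration and does not come from the text. Since \(\lvert \sin n \rvert \leq 1\), it is trapped between two sequences that both tend to zero:

```latex
-\frac{1}{n} \;\leq\; \frac{\sin n}{n} \;\leq\; \frac{1}{n}
\quad \text{for all } n \geq 1,
\qquad
-\frac{1}{n} \to 0, \quad \frac{1}{n} \to 0
\quad \Longrightarrow \quad
\frac{\sin n}{n} \to 0 .
```

The same pattern, bounding a quantity of interest between two simpler quantities with a common limit, is what makes the theorem useful for error bounds.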
Series
A series is the sequence of partial sums of a sequence \((a_n)\):
\[
S_N = \sum_{n=1}^{N} a_n .
\]
The series converges if \(\lim_{N \to \infty} S_N\) exists and is finite.
Key Series
| Series | Convergence | Limit |
|---|---|---|
| Geometric: \(\sum_{n=0}^{\infty} r^n\) | \(\lvert r \rvert < 1\) | \(\dfrac{1}{1-r}\) |
| Harmonic: \(\sum_{n=1}^{\infty} \frac{1}{n}\) | Diverges | — |
| \(p\)-series: \(\sum_{n=1}^{\infty} \frac{1}{n^p}\) | \(p > 1\) | Finite (depends on \(p\)) |
| Exponential: \(\sum_{n=0}^{\infty} \frac{x^n}{n!}\) | All \(x \in \mathbb{R}\) | \(e^x\) |
The geometric series identity is used to derive the PMF normalizations for geometric and negative binomial distributions. The exponential series underpins the Poisson distribution and moment generating functions.
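The entries in the table can be checked numerically by computing partial sums. This is a minimal sketch using only the standard library; the helper `partial_sum`, the values of `r` and `x`, and the truncation points are illustrative choices.

```python
import math

def partial_sum(term, start, N):
    """Return term(start) + term(start + 1) + ... + term(N)."""
    return sum(term(n) for n in range(start, N + 1))

# Geometric series with r = 0.5; the table gives the limit 1 / (1 - r).
r = 0.5
geometric = partial_sum(lambda n: r ** n, 0, 60)

# Exponential series at x = 2; the table gives the limit e^x.
x = 2.0
exponential = partial_sum(lambda n: x ** n / math.factorial(n), 0, 30)

# Harmonic series: partial sums keep growing, roughly like ln N.
harmonic = partial_sum(lambda n: 1 / n, 1, 100_000)

# p-series with p = 2; its known limit is pi^2 / 6.
p_two = partial_sum(lambda n: 1 / n ** 2, 1, 100_000)

print(geometric, 1 / (1 - r))
print(exponential, math.exp(x))
print(harmonic, math.log(100_000))
print(p_two, math.pi ** 2 / 6)
```

The geometric and exponential sums settle almost immediately, the \(p = 2\) sum converges slowly, and the harmonic sum continues to grow without bound, only very slowly.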
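Limits of Functions
Definition
Let \(f\) be defined near \(a\), though not necessarily at \(a\) itself. We write \(\lim_{x \to a} f(x) = L\) if:
\[
\forall\, \varepsilon > 0,\ \exists\, \delta > 0 \text{ such that } 0 < \lvert x - a \rvert < \delta \implies \lvert f(x) - L \rvert < \varepsilon.
\]
Continuity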
A function \(f\) is continuous at \(a\) if \(\lim_{x \to a} f(x) = f(a)\).
\(f\) is continuous on an interval if it is continuous at every point in that interval.
Continuous functions preserve limits: if \(a_n \to L\) and \(f\) is continuous at \(L\), then \(f(a_n) \to f(L)\). The probabilistic analog of this property, the continuous mapping theorem, is central to deriving the asymptotic distributions of estimators.
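A quick numerical check of limit preservation, offered as a sketch: take \(a_n = (1 + 1/n)^n \to e\) and the continuous function \(f = \ln\), so that \(f(a_n)\) should approach \(\ln e = 1\). The choice of sequence and function is an illustrative assumption.

```python
import math

def a(n):
    """The sequence a_n = (1 + 1/n)^n, which converges to e."""
    return (1 + 1 / n) ** n

# Apply the continuous function ln to the sequence at a few indices.
log_values = [math.log(a(n)) for n in (10, 1_000, 100_000)]
print(log_values)  # values should approach ln(e) = 1
```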
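Differentiation Essentials
Derivative
The derivative of \(f\) at \(x\) is
\[
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h},
\]
provided the limit exists.
Key Rules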
| Rule | Formula |
|---|---|
| Power | \((x^n)' = n x^{n-1}\) |
| Exponential | \((e^x)' = e^x\) |
| Logarithm | \((\ln x)' = 1/x\) |
| Chain | \((f \circ g)'(x) = f'(g(x)) \cdot g'(x)\) |
| Product | \((fg)' = f'g + fg'\) |
| Quotient | \((f/g)' = (f'g - fg')/g^2\) |
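Each rule in the table can be sanity-checked against a finite-difference approximation of the derivative. The sketch below uses a central difference; the helper `central_diff`, the step size, the evaluation point `x0`, and the test functions are all illustrative assumptions.

```python
import math

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) by the central difference (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.5

# Each entry pairs a function with its derivative at x0 obtained from the rule.
checks = {
    "power":       (lambda x: x ** 3,            3 * x0 ** 2),
    "exponential": (math.exp,                    math.exp(x0)),
    "logarithm":   (math.log,                    1 / x0),
    "chain":       (lambda x: math.exp(x ** 2),  math.exp(x0 ** 2) * 2 * x0),
    "product":     (lambda x: x ** 2 * math.log(x),
                    2 * x0 * math.log(x0) + x0),
    "quotient":    (lambda x: math.log(x) / x,
                    (1 - math.log(x0)) / x0 ** 2),
}

errors = {name: abs(central_diff(f, x0) - exact)
          for name, (f, exact) in checks.items()}
for name, err in errors.items():
    print(f"{name:12s} abs error = {err:.2e}")
```

Agreement to several decimal places does not prove a rule, but a large discrepancy is a reliable sign of an algebra mistake, which is why this kind of check is common when hand-deriving score functions.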
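Partial Derivatives
For \(f: \mathbb{R}^n \to \mathbb{R}\), the partial derivative with respect to \(x_i\) is
\[
\frac{\partial f}{\partial x_i}(x_1, \dots, x_n) = \lim_{h \to 0} \frac{f(x_1, \dots, x_i + h, \dots, x_n) - f(x_1, \dots, x_n)}{h}.
\]
The gradient is the vector of all partial derivatives:
\[
\nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right).
\]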
Gradients are used extensively in maximum likelihood estimation and gradient-based optimization (Chapters 6, 7, 13, 14).
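The gradient can be approximated one coordinate at a time by perturbing a single input and holding the others fixed. This is a minimal sketch; the function `numerical_gradient` and the example \(f(x, y) = x^2 y + e^y\) are hypothetical, chosen only because the exact gradient \((2xy,\; x^2 + e^y)\) is easy to verify by hand.

```python
import math

def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at `point` by central differences."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h      # perturb only coordinate i
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def f(v):
    """Hypothetical example: f(x, y) = x^2 * y + exp(y)."""
    x, y = v
    return x ** 2 * y + math.exp(y)

point = [1.0, 2.0]
approx = numerical_gradient(f, point)
exact = [2 * point[0] * point[1], point[0] ** 2 + math.exp(point[1])]
print(approx, exact)
```

In practice, gradient-based optimizers rely on analytic or automatic differentiation; a finite-difference gradient like this one is mainly useful for checking those derivatives.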
Integration Essentials
Definite Integral
The definite integral
\[
\int_a^b f(x)\, dx
\]
represents the signed area between the graph of \(f\) and the \(x\)-axis from \(a\) to \(b\). In probability, it equals \(P(a \leq X \leq b)\) when \(f\) is the probability density function of \(X\).
Fundamental Theorem of Calculus
If \(f\) is continuous on \([a, b]\) and \(F'(x) = f(x)\) there, then
\[
\int_a^b f(x)\, dx = F(b) - F(a).
\]
Key Integrals
| Integral | Result | Statistical Use |
|---|---|---|
| \(\int_0^{\infty} e^{-\lambda x}\, dx\) | \(1/\lambda\) | Exponential distribution |
| \(\int_{-\infty}^{\infty} e^{-x^2/2}\, dx\) | \(\sqrt{2\pi}\) | Normal distribution normalizing constant |
| \(\int_0^{\infty} x^{n-1} e^{-x}\, dx\) | \(\Gamma(n) = (n-1)!\) | Gamma function |
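The three integrals in the table can be approximated with a simple quadrature rule. The sketch below uses the midpoint rule and truncates each infinite range at a point where the integrand is negligible; the helper `midpoint_rule`, the truncation limits, and the values \(\lambda = 2\) and \(n = 5\) are illustrative assumptions.

```python
import math

def midpoint_rule(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] using n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Exponential: integral of exp(-lam * x) over [0, inf) should be 1 / lam.
lam = 2.0
exp_integral = midpoint_rule(lambda x: math.exp(-lam * x), 0.0, 40.0)

# Gaussian: integral of exp(-x^2 / 2) over the real line should be sqrt(2 * pi).
gauss_integral = midpoint_rule(lambda x: math.exp(-x * x / 2), -10.0, 10.0)

# Gamma with n = 5: integral of x^4 * exp(-x) over [0, inf) should be 4! = 24.
gamma_integral = midpoint_rule(lambda x: x ** 4 * math.exp(-x), 0.0, 60.0)

print(exp_integral, 1 / lam)
print(gauss_integral, math.sqrt(2 * math.pi))
print(gamma_integral, math.factorial(4))
```

Truncating the range is safe here only because each integrand decays exponentially; for heavy-tailed densities the choice of cutoff would need more care.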
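Taylor Series and Approximations
Taylor Expansion
The Taylor series of an infinitely differentiable function \(f\) about \(x = a\) is
\[
f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!} (x - a)^k,
\]
valid wherever the series converges to \(f(x)\). Truncating the sum at \(k = m\) gives the Taylor polynomial of order \(m\), which approximates \(f\) near \(a\).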
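Important Expansions
Standard expansions about \(a = 0\):
\[
e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \quad (x \in \mathbb{R})
\]
\[
\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \quad (-1 < x \leq 1)
\]
\[
\frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots \quad (\lvert x \rvert < 1)
\]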
Taylor approximations are the workhorse behind the delta method, the derivation of the Central Limit Theorem via moment generating functions, and asymptotic expansions of test statistics.
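To see how quickly a Taylor approximation improves with order, the sketch below compares truncations of the exponential series (the Taylor series of \(e^x\) about \(0\)) with `math.exp`. The helper `taylor_exp` and the evaluation point \(x = 0.5\) are illustrative choices.

```python
import math

def taylor_exp(x, order):
    """Taylor polynomial of e^x about 0, truncated after the x^order term."""
    return sum(x ** k / math.factorial(k) for k in range(order + 1))

x = 0.5
# Absolute approximation error for orders 1 through 5.
errors = [abs(math.exp(x) - taylor_exp(x, m)) for m in range(1, 6)]
for m, err in zip(range(1, 6), errors):
    print(f"order {m}: abs error = {err:.2e}")
```

Each additional term shrinks the error by a factor of roughly \(x/(m+1)\), which is why low-order expansions are often adequate when the argument is small, as it is in the delta method, where deviations are of order \(1/\sqrt{n}\).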
Asymptotic Notation
Asymptotic notation describes the growth rate of functions and sequences, which is essential for characterizing how fast estimators converge.
Big-O and Little-o
| Notation | Definition | Intuition |
|---|---|---|
| \(f(n) = O(g(n))\) | \(\exists\, C, N\) s.t. \(\lvert f(n) \rvert \leq C \lvert g(n) \rvert\) for \(n > N\) | \(f\) grows no faster than \(g\) |
| \(f(n) = o(g(n))\) | \(\lim_{n \to \infty} f(n)/g(n) = 0\) | \(f\) grows strictly slower than \(g\) |
| \(f(n) \sim g(n)\) | \(\lim_{n \to \infty} f(n)/g(n) = 1\) | \(f\) and \(g\) are asymptotically equivalent |
Examples:
- \(n^2 + 3n = O(n^2)\)
- \(1/n = o(1)\) — converges to zero.
- \(n! \sim \sqrt{2\pi n}\,(n/e)^n\) — Stirling's approximation.
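Asymptotic equivalence is a statement about a ratio tending to one, which is easy to inspect numerically. The following sketch computes the ratio of \(n!\) to Stirling's approximation for a few values of \(n\); the function name `stirling` and the chosen values of \(n\) are illustrative.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2 * pi * n) * (n / e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

# Ratio n! / stirling(n); asymptotic equivalence means this tends to 1.
ratios = {n: math.factorial(n) / stirling(n) for n in (1, 5, 10, 50, 100)}
for n, ratio in ratios.items():
    print(n, ratio)
```

Note that the ratio tends to one even though the absolute difference between \(n!\) and its approximation grows; the relation \(\sim\) controls relative error, not absolute error.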
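Convergence Rates
In statistics, for an estimator \(\hat{\theta}_n\) of a parameter \(\theta\), we often write:
\[
\hat{\theta}_n - \theta = O_p(n^{-1/2}).
\]
This means the estimation error shrinks at rate \(1/\sqrt{n}\), which is the standard rate for many estimators (e.g., the sample mean). The subscript \(p\) indicates that the bound holds in probability: the rescaled error \(\sqrt{n}\,(\hat{\theta}_n - \theta)\) is bounded in probability, a notion formalized in Chapter 3.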
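The \(1/\sqrt{n}\) rate can be observed in a small simulation. The sketch below draws repeated samples from a Uniform(0, 1) distribution, whose mean is \(0.5\), and averages the absolute error of the sample mean; the function `mean_abs_error`, the distribution, the number of replications, and the seed are all assumptions made for illustration.

```python
import math
import random

def mean_abs_error(n, reps=200, seed=0):
    """Average |sample mean - 0.5| over `reps` Uniform(0, 1) samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    mu = 0.5
    total = 0.0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        total += abs(xbar - mu)
    return total / reps

# If the error is of order 1/sqrt(n), then sqrt(n) * error should stay
# roughly constant as n grows.
scaled = {n: math.sqrt(n) * mean_abs_error(n) for n in (100, 1_000, 10_000)}
for n, value in scaled.items():
    print(n, value)
```

The raw error falls by about a factor of ten when \(n\) grows by a factor of one hundred, while the rescaled values stay near a common constant. That stability after multiplying by \(\sqrt{n}\) is the behavior the rate statement describes.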
Modes of Convergence (Preview)
The following modes of convergence for sequences of random variables are developed fully in Chapter 3 but previewed here for context:
| Mode | Notation | Informal Meaning |
|---|---|---|
| Almost sure | \(X_n \xrightarrow{\text{a.s.}} X\) | \(X_n(\omega) \to X(\omega)\) for almost every outcome |
| In probability | \(X_n \xrightarrow{p} X\) | \(P(\lvert X_n - X \rvert > \varepsilon) \to 0\) |
| In distribution | \(X_n \xrightarrow{d} X\) | CDFs converge: \(F_{X_n}(x) \to F_X(x)\) at every continuity point \(x\) of \(F_X\) |
| In \(L^p\) | \(X_n \xrightarrow{L^p} X\) | \(E[\lvert X_n - X \rvert^p] \to 0\) |
Hierarchy: a.s. \(\Rightarrow\) in probability \(\Rightarrow\) in distribution; convergence in \(L^p\) (for \(p \geq 1\)) also implies convergence in probability. The Law of Large Numbers is a statement about convergence in probability (weak law) or almost surely (strong law), while the Central Limit Theorem is a statement about convergence in distribution.
Summary
| Concept | Where It Appears |
|---|---|
| Limits of sequences | Consistency of estimators, LLN |
| Series convergence | PMF normalization, MGFs |
| Continuity and continuous mapping | Asymptotic distributions of transformed estimators |
| Derivatives and gradients | MLE, score functions, optimization |
| Integration | CDF/PDF relationship, expected values |
| Taylor series | Delta method, CLT derivation, asymptotic expansions |
| Big-\(O\) / little-\(o\) | Convergence rates of estimators |