Sequences, Limits, and Asymptotics

Sequences and their limiting behavior are the mathematical backbone of statistical inference. The Law of Large Numbers, the Central Limit Theorem, and the consistency of estimators are all statements about limits of sequences of random variables. This section reviews the deterministic foundations; the probabilistic extensions appear in Chapter 3.

Sequences

Definition

A sequence is an ordered list of real numbers indexed by the natural numbers:

\[ (a_n)_{n=1}^{\infty} = a_1,\, a_2,\, a_3,\, \dots \]

Equivalently, a sequence is a function \(a: \mathbb{N} \to \mathbb{R}\).

Examples:

  • \(a_n = 1/n\): the harmonic sequence \(1, 1/2, 1/3, \dots\)
  • \(a_n = (-1)^n / n\): an alternating sequence \(-1, 1/2, -1/3, \dots\)
  • \(a_n = (1 + 1/n)^n\): converges to \(e \approx 2.718\)
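
A quick numerical check of these limits, as a minimal pure-Python sketch:

```python
import math

# Print early and late terms of each example sequence to see where they head.
for n in [1, 10, 100, 10_000]:
    harmonic = 1 / n                 # -> 0
    alternating = (-1) ** n / n      # -> 0, with alternating sign
    euler = (1 + 1 / n) ** n         # -> e
    print(f"n={n:>6}: 1/n={harmonic:.5f}  (-1)^n/n={alternating:+.5f}  "
          f"(1+1/n)^n={euler:.5f}")

print("e =", math.e)  # 2.718281828...
```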

Monotone and Bounded Sequences

  • \((a_n)\) is increasing if \(a_n \leq a_{n+1}\) for all \(n\), and decreasing if \(a_n \geq a_{n+1}\).
  • \((a_n)\) is bounded above if there exists \(M\) such that \(a_n \leq M\) for all \(n\); bounded below similarly.

Monotone Convergence Theorem

Every bounded monotone sequence converges. This theorem is used implicitly whenever we assert that a non-decreasing sequence of probabilities or expectations has a limit.
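
For instance, the partial sums \(S_N = \sum_{n=1}^{N} 1/n^2\) are increasing in \(N\) and bounded above (by \(2\), say), so the theorem guarantees they converge; the limit happens to be \(\pi^2/6\). A minimal numerical sketch:

```python
import math

# S_N = sum_{n=1}^N 1/n^2 is increasing and bounded above, so the
# Monotone Convergence Theorem guarantees a limit: pi^2 / 6.
s = 0.0
for n in range(1, 100_001):
    s += 1 / n**2

print(s, "vs", math.pi**2 / 6)  # ~1.6449241 vs ~1.6449341
```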

Limits of Sequences

Definition (epsilon-N Definition)

A sequence \((a_n)\) converges to a limit \(L \in \mathbb{R}\), written \(\lim_{n \to \infty} a_n = L\) or \(a_n \to L\), if:

\[ \forall\, \varepsilon > 0,\;\; \exists\, N \in \mathbb{N} \text{ such that } n > N \implies |a_n - L| < \varepsilon \]

If no such \(L\) exists, the sequence diverges.
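
The definition is concrete enough to check directly. For \(a_n = 1/n\) and \(L = 0\), the condition \(|1/n| < \varepsilon\) holds exactly when \(n > 1/\varepsilon\), so \(N = \lceil 1/\varepsilon \rceil\) works. A small sketch that verifies this choice of \(N\):

```python
import math

def find_N(eps: float) -> int:
    """An N such that |1/n - 0| < eps for every n > N."""
    return math.ceil(1 / eps)

for eps in [0.1, 0.01, 0.001]:
    N = find_N(eps)
    # Spot-check the definition on a window of indices beyond N.
    assert all(abs(1 / n - 0) < eps for n in range(N + 1, N + 1_000))
    print(f"eps={eps}: N={N}")
```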

Limit Laws

If \(a_n \to L\) and \(b_n \to M\), then:

| Rule | Statement |
| --- | --- |
| Sum | \(a_n + b_n \to L + M\) |
| Product | \(a_n \cdot b_n \to L \cdot M\) |
| Quotient | \(a_n / b_n \to L / M\) (provided \(M \neq 0\)) |
| Scalar multiple | \(c \cdot a_n \to c \cdot L\) |
| Power | \(a_n^k \to L^k\) for fixed \(k \in \mathbb{N}\) |

Squeeze Theorem

If \(a_n \leq c_n \leq b_n\) for all \(n\) and \(a_n \to L\) and \(b_n \to L\), then \(c_n \to L\).

This technique is used to establish bounds on tail probabilities and approximation errors.
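
For example, \(-1/n \leq \sin(n)/n \leq 1/n\), and both bounds tend to \(0\), so \(\sin(n)/n \to 0\). A quick numerical illustration:

```python
import math

# sin(n)/n is squeezed between -1/n and 1/n, which both tend to 0.
for n in [1, 10, 100, 10_000]:
    lower, middle, upper = -1 / n, math.sin(n) / n, 1 / n
    assert lower <= middle <= upper
    print(f"n={n:>6}: {lower:+.6f} <= {middle:+.6f} <= {upper:+.6f}")
```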

Series

A series is the sequence of partial sums of a sequence:

\[ S_N = \sum_{n=1}^{N} a_n \]

The series converges if \(\lim_{N \to \infty} S_N\) exists and is finite.

Key Series

| Series | Convergence | Limit |
| --- | --- | --- |
| Geometric: \(\sum_{n=0}^{\infty} r^n\) | \(\lvert r \rvert < 1\) | \(\dfrac{1}{1-r}\) |
| Harmonic: \(\sum_{n=1}^{\infty} \frac{1}{n}\) | Diverges | — |
| \(p\)-series: \(\sum_{n=1}^{\infty} \frac{1}{n^p}\) | \(p > 1\) | Finite (equals \(\zeta(p)\)) |
| Exponential: \(\sum_{n=0}^{\infty} \frac{x^n}{n!}\) | All \(x \in \mathbb{R}\) | \(e^x\) |

The geometric series identity is used to derive the PMF normalizations for geometric and negative binomial distributions. The exponential series underpins the Poisson distribution and moment generating functions.
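
A minimal sketch comparing partial sums against the closed forms in the table above:

```python
import math

r, x = 0.5, 2.0

# Geometric series: partial sums approach 1 / (1 - r) for |r| < 1.
geometric = sum(r**n for n in range(50))
# Exponential series: partial sums approach e^x for any real x.
exponential = sum(x**n / math.factorial(n) for n in range(50))

print(geometric, "vs", 1 / (1 - r))    # ~2.0
print(exponential, "vs", math.exp(x))  # ~7.389056
```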

Limits of Functions

Definition

\[ \lim_{x \to a} f(x) = L \quad \Longleftrightarrow \quad \forall\, \varepsilon > 0,\; \exists\, \delta > 0 \text{ s.t. } 0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon \]

Continuity

A function \(f\) is continuous at \(a\) if \(\lim_{x \to a} f(x) = f(a)\).

\(f\) is continuous on an interval if it is continuous at every point in that interval.

Continuous functions preserve limits: if \(a_n \to L\) and \(f\) is continuous at \(L\), then \(f(a_n) \to f(L)\). This continuous mapping theorem has a probabilistic analog that is central to deriving the asymptotic distributions of estimators.
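
As a toy illustration, take \(a_n = 1 + 1/n \to 1\) and the continuous function \(f = \exp\); then \(f(a_n) \to e\):

```python
import math

# exp is continuous at 1, so exp(1 + 1/n) -> exp(1) = e as n grows.
for n in [1, 10, 1_000, 100_000]:
    print(f"n={n:>6}: exp(1 + 1/n) = {math.exp(1 + 1 / n):.6f}")

print("exp(1) =", math.e)
```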

Differentiation Essentials

Derivative

\[ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \]
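
The difference quotient can be evaluated numerically for small \(h\); a central difference \((f(x+h) - f(x-h))/(2h)\) is usually more accurate than the one-sided quotient. A minimal sketch using \(f = \sin\), whose derivative is \(\cos\):

```python
import math

f, x = math.sin, 1.0
exact = math.cos(x)  # (sin x)' = cos x

for h in [1e-1, 1e-3, 1e-5]:
    forward = (f(x + h) - f(x)) / h            # one-sided difference quotient
    central = (f(x + h) - f(x - h)) / (2 * h)  # central difference
    print(f"h={h:.0e}: forward error={abs(forward - exact):.2e}, "
          f"central error={abs(central - exact):.2e}")
```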

Key Rules

| Rule | Formula |
| --- | --- |
| Power | \((x^n)' = n x^{n-1}\) |
| Exponential | \((e^x)' = e^x\) |
| Logarithm | \((\ln x)' = 1/x\) |
| Chain | \((f \circ g)'(x) = f'(g(x)) \cdot g'(x)\) |
| Product | \((fg)' = f'g + fg'\) |
| Quotient | \((f/g)' = (f'g - fg')/g^2\) |

Partial Derivatives

For \(f: \mathbb{R}^n \to \mathbb{R}\), the partial derivative with respect to \(x_i\) is

\[ \frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \dots, x_i + h, \dots, x_n) - f(x_1, \dots, x_n)}{h} \]

The gradient is the vector of all partial derivatives:

\[ \nabla f = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right) \]

Gradients are used extensively in maximum likelihood estimation and gradient-based optimization (Chapters 6, 7, 13, 14).
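
A numerical gradient can be sketched with central differences (in practice one would use automatic differentiation; the `grad` helper below is illustrative only). For \(f(x, y) = x^2 + 3xy\), the exact gradient is \((2x + 3y,\; 3x)\):

```python
def grad(f, point, h=1e-6):
    """Approximate the gradient of f at `point` by central differences."""
    g = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        g.append((f(up) - f(down)) / (2 * h))
    return g

f = lambda p: p[0] ** 2 + 3 * p[0] * p[1]
print(grad(f, [1.0, 2.0]))  # ~[8.0, 3.0]; exact: (2*1 + 3*2, 3*1)
```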

Integration Essentials

Definite Integral

\[ \int_a^b f(x)\, dx \]

represents the signed area under \(f\) from \(a\) to \(b\). For probability, this computes \(P(a \leq X \leq b)\) when \(f\) is a probability density function.

Fundamental Theorem of Calculus

If \(F'(x) = f(x)\), then

\[ \int_a^b f(x)\, dx = F(b) - F(a) \]
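
A quick numerical sanity check: approximate \(\int_0^1 e^x\, dx\) with the trapezoid rule and compare it with \(F(1) - F(0) = e - 1\) from the theorem (the `trapezoid` helper below is a hand-rolled sketch, not a library routine):

```python
import math

def trapezoid(f, a, b, n=10_000):
    """Composite trapezoid-rule approximation of the integral of f on [a, b]."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)

print(trapezoid(math.exp, 0.0, 1.0), "vs", math.e - 1)  # both ~1.718282
```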

Key Integrals

| Integral | Result | Statistical Use |
| --- | --- | --- |
| \(\int_0^{\infty} e^{-\lambda x}\, dx\) | \(1/\lambda\) | Exponential distribution |
| \(\int_{-\infty}^{\infty} e^{-x^2/2}\, dx\) | \(\sqrt{2\pi}\) | Normal distribution normalizing constant |
| \(\int_0^{\infty} x^{n-1} e^{-x}\, dx\) | \(\Gamma(n) = (n-1)!\) | Gamma function |

Taylor Series and Approximations

Taylor Expansion

The Taylor series of \(f\) about \(x = a\) is

\[ f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x - a)^k = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots \]

Important Expansions

\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \]
\[ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \qquad (-1 < x \leq 1) \]
\[ (1 + x)^\alpha = 1 + \alpha x + \frac{\alpha(\alpha-1)}{2!}x^2 + \cdots \qquad (|x| < 1,\; \alpha \in \mathbb{R}) \]

Taylor approximations are the workhorse behind the delta method, the derivation of the Central Limit Theorem via moment generating functions, and asymptotic expansions of test statistics.
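
A minimal sketch of how quickly the truncated expansion of \(e^x\) about \(0\) closes in on the true value:

```python
import math

x = 1.5
partial = 0.0
# Add Taylor terms x^k / k! one at a time and watch the error shrink.
for k in range(8):
    partial += x**k / math.factorial(k)
    print(f"order {k}: approx={partial:.6f}, "
          f"error={abs(partial - math.exp(x)):.2e}")
```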

Asymptotic Notation

Asymptotic notation describes the growth rate of functions and sequences, which is essential for characterizing how fast estimators converge.

Big-O and Little-o

| Notation | Definition | Intuition |
| --- | --- | --- |
| \(f(n) = O(g(n))\) | \(\exists\, C, N\) s.t. \(\lvert f(n) \rvert \leq C \lvert g(n) \rvert\) for \(n > N\) | \(f\) grows no faster than \(g\) |
| \(f(n) = o(g(n))\) | \(\lim_{n \to \infty} f(n)/g(n) = 0\) | \(f\) grows strictly slower than \(g\) |
| \(f(n) \sim g(n)\) | \(\lim_{n \to \infty} f(n)/g(n) = 1\) | \(f\) and \(g\) are asymptotically equivalent |

Examples:

  • \(n^2 + 3n = O(n^2)\)
  • \(1/n = o(1)\) — converges to zero.
  • \(n! \sim \sqrt{2\pi n}\,(n/e)^n\) — Stirling's approximation.
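
A quick numerical look at the asymptotic equivalence in Stirling's approximation, where the ratio should drift toward \(1\):

```python
import math

# n! / (sqrt(2*pi*n) * (n/e)^n) -> 1 as n grows (Stirling's approximation).
for n in [5, 10, 20, 50]:
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(f"n={n:>3}: n!/Stirling = {math.factorial(n) / stirling:.6f}")
```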

Convergence Rates

In statistics, we often write:

\[ \hat{\theta}_n - \theta = O_p(n^{-1/2}) \]

This means the estimation error shrinks at rate \(1/\sqrt{n}\), the standard rate for many estimators (e.g., the sample mean). The subscript \(p\) marks the stochastic version of big-\(O\): \(\sqrt{n}(\hat{\theta}_n - \theta)\) remains bounded in probability, a notion formalized in Chapter 3.
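
A simulation sketch of the root-\(n\) rate using only the standard library: for sample means of \(N(0, 1)\) draws, \(|\bar{X}_n|\) shrinks like \(1/\sqrt{n}\), so \(\sqrt{n}\,|\bar{X}_n|\) stays bounded rather than vanishing or blowing up:

```python
import random
import statistics

random.seed(0)  # reproducible draws
for n in [100, 1_000, 10_000, 100_000]:
    xbar = statistics.fmean(random.gauss(0, 1) for _ in range(n))
    print(f"n={n:>6}: |xbar|={abs(xbar):.5f}, "
          f"sqrt(n)*|xbar|={n**0.5 * abs(xbar):.3f}")
```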

Modes of Convergence (Preview)

The following modes of convergence for sequences of random variables are developed fully in Chapter 3 but previewed here for context:

| Mode | Notation | Informal Meaning |
| --- | --- | --- |
| Almost sure | \(X_n \xrightarrow{\text{a.s.}} X\) | \(X_n(\omega) \to X(\omega)\) for almost every outcome |
| In probability | \(X_n \xrightarrow{p} X\) | \(P(\lvert X_n - X \rvert > \varepsilon) \to 0\) |
| In distribution | \(X_n \xrightarrow{d} X\) | CDFs converge: \(F_{X_n}(x) \to F_X(x)\) at continuity points of \(F_X\) |
| In \(L^p\) | \(X_n \xrightarrow{L^p} X\) | \(E[\lvert X_n - X \rvert^p] \to 0\) |

Hierarchy: a.s. \(\Rightarrow\) in probability \(\Rightarrow\) in distribution. The Law of Large Numbers is a statement about convergence in probability (or a.s.), while the Central Limit Theorem is a statement about convergence in distribution.

Summary

| Concept | Where It Appears |
| --- | --- |
| Limits of sequences | Consistency of estimators, LLN |
| Series convergence | PMF normalization, MGFs |
| Continuity and continuous mapping | Asymptotic distributions of transformed estimators |
| Derivatives and gradients | MLE, score functions, optimization |
| Integration | CDF/PDF relationship, expected values |
| Taylor series | Delta method, CLT derivation, asymptotic expansions |
| Big-\(O\) / little-\(o\) | Convergence rates of estimators |