Skip to content

Geometric and Negative Binomial Distributions

Overview

The geometric distribution models the number of trials until the first success, while the negative binomial distribution generalizes this to the number of trials until the \(r\)-th success. Both arise naturally in sequential experiments with independent Bernoulli trials.


Geometric Distribution

Definition

If independent Bernoulli trials with success probability \(p\) are performed until the first success, then the number of trials \(X\) follows a geometric distribution:

\[ X \sim \text{Geometric}(p), \qquad P(X = k) = (1 - p)^{k-1} p, \quad k = 1, 2, 3, \ldots \]

The PMF captures that the first \(k-1\) trials must be failures and the \(k\)-th trial must be a success.

Alternative parameterization: Some texts define \(Y\) as the number of failures before the first success, so \(Y = X - 1\) and \(P(Y = k) = (1-p)^k p\) for \(k = 0, 1, 2, \ldots\)

Verifying the PMF Sums to 1

\[ \sum_{k=1}^{\infty} (1-p)^{k-1} p = p \sum_{j=0}^{\infty} (1-p)^j = p \cdot \frac{1}{1 - (1-p)} = 1 \]

using the geometric series formula with ratio \(|1-p| < 1\).

Properties

\[ \begin{aligned} E[X] &= \frac{1}{p} \\[4pt] \text{Var}(X) &= \frac{1 - p}{p^2} \end{aligned} \]

Derivation of Mean

\[ E[X] = \sum_{k=1}^{\infty} k(1-p)^{k-1} p = p \cdot \frac{d}{dq}\left[\sum_{k=0}^{\infty} q^k \right]_{q=1-p} \!\!\!\!= p \cdot \frac{1}{(1-q)^2}\bigg|_{q=1-p} = \frac{1}{p} \]

Derivation of Variance

Using \(E[X(X-1)] = \sum_{k=2}^{\infty} k(k-1)(1-p)^{k-1}p = \frac{2(1-p)}{p^2}\):

\[ E[X^2] = E[X(X-1)] + E[X] = \frac{2(1-p)}{p^2} + \frac{1}{p} \]
\[ \text{Var}(X) = E[X^2] - (E[X])^2 = \frac{2(1-p)}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{1-p}{p^2} \]

Memoryless Property

The geometric distribution is the only discrete distribution with the memoryless property:

\[ P(X > s + t \mid X > s) = P(X > t) \quad \text{for all } s, t \geq 0 \]

Proof

\[ P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{(1-p)^{s+t}}{(1-p)^s} = (1-p)^t = P(X > t) \]

Interpretation: Given that you have already waited \(s\) trials without success, the probability of waiting at least \(t\) more trials is the same as starting fresh. Past failures carry no information about future success.


Negative Binomial Distribution

Definition

The number of trials \(Y\) needed to achieve \(r\) successes in independent Bernoulli trials follows a negative binomial distribution:

\[ Y \sim \text{NegBin}(r, p), \qquad P(Y = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, r+2, \ldots \]

The binomial coefficient \(\binom{k-1}{r-1}\) counts the ways to place \(r-1\) successes among the first \(k-1\) trials (the \(k\)-th trial is necessarily a success).

Note: When \(r = 1\), the negative binomial reduces to the geometric distribution.

Properties

\[ \begin{aligned} E[Y] &= \frac{r}{p} \\[4pt] \text{Var}(Y) &= \frac{r(1-p)}{p^2} \end{aligned} \]

Derivation via Sum of Geometrics

If \(X_1, X_2, \ldots, X_r\) are independent \(\text{Geometric}(p)\) random variables, then \(Y = \sum_{i=1}^r X_i \sim \text{NegBin}(r, p)\). Therefore:

\[ E[Y] = \sum_{i=1}^r E[X_i] = \frac{r}{p}, \qquad \text{Var}(Y) = \sum_{i=1}^r \text{Var}(X_i) = \frac{r(1-p)}{p^2} \]

Worked Example

Problem: A trader's strategy has a 30% win rate on each independent trade. What is the expected number of trades to achieve the first win? What is the probability that the first win occurs on the 5th trade?

Solution:

\[ E[X] = \frac{1}{0.3} \approx 3.33 \text{ trades} \]
\[ P(X = 5) = (1 - 0.3)^{5-1} \cdot 0.3 = (0.7)^4 \cdot 0.3 = 0.2401 \cdot 0.3 = 0.0720 \]

Python: PMF, CDF, and Sampling

Geometric Distribution

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

p = 0.3
x = np.arange(1, 20)

fig, ax = plt.subplots(figsize=(12, 3))
ax.bar(x - 0.15, stats.geom(p).pmf(x), width=0.3, label='PMF', alpha=0.7)
ax.bar(x + 0.15, stats.geom(p).cdf(x), width=0.3, label='CDF', alpha=0.7)
ax.set_xlabel('k (number of trials)')
ax.set_xticks(x)
ax.spines[['top', 'right']].set_visible(False)
ax.legend()
plt.show()

Negative Binomial Distribution

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# scipy parameterizes by number of failures: nbinom(r, p) gives P(Y=k) for k failures
r, p = 5, 0.4
x = np.arange(0, 30)

fig, ax = plt.subplots(figsize=(12, 3))
ax.bar(x, stats.nbinom(r, p).pmf(x), alpha=0.7, label=f'NegBin(r={r}, p={p})')
ax.set_xlabel('k (number of failures before r-th success)')
ax.spines[['top', 'right']].set_visible(False)
ax.legend()
plt.show()

Verifying the Memoryless Property

import numpy as np
from scipy import stats

np.random.seed(42)
p = 0.3
samples = stats.geom(p).rvs(1_000_000)

s = 3
# P(X > s + t | X > s) vs P(X > t)
for t in [1, 3, 5]:
    conditional = np.mean(samples[samples > s] > s + t)
    unconditional = np.mean(samples > t)
    print(f"P(X>{s}+{t}|X>{s}) = {conditional:.4f},  P(X>{t}) = {unconditional:.4f}")

Comparing Parameters

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

fig, ax = plt.subplots(figsize=(12, 3))
for p in [0.2, 0.4, 0.6]:
    x = np.arange(1, 25)
    ax.plot(x, stats.geom(p).pmf(x), 'o-', label=f'Geometric(p={p})', markersize=4)
ax.spines[['top', 'right']].set_visible(False)
ax.set_xlabel('k')
ax.legend()
plt.show()

Relationship to Other Distributions

\[ \begin{aligned} \text{Geometric}(p) &= \text{NegBin}(1, p) \\[4pt] \text{NegBin}(r, p) &= \sum_{i=1}^r \text{Geometric}_i(p) \quad \text{(independent sum)} \\[4pt] \text{Geometric} &\leftrightarrow \text{Exponential} \quad \text{(discrete vs continuous memoryless)} \end{aligned} \]

Key Takeaways

  • The geometric distribution models waiting time to the first success and is the only discrete memoryless distribution.
  • The negative binomial generalizes the geometric to count trials until the \(r\)-th success.
  • Both distributions arise from sequences of independent Bernoulli trials.
  • The mean \(1/p\) of the geometric distribution has an intuitive interpretation: lower success probability means longer expected wait.
  • The geometric distribution is the discrete analogue of the exponential distribution, sharing the memoryless property.