Skip to content

CI for p

One-Sample Proportion Confidence Interval

In many statistical problems, we are interested in estimating a population proportion \(p\) — the fraction of individuals in a population that have a certain characteristic. For example, the proportion of voters who support a particular candidate or the proportion of defective items in a batch.

Formula (Wald z-Interval)

The general form of a confidence interval for a population proportion \(p\) is

\[ \hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]

where

  • \(\hat{p}\) is the sample proportion,
  • \(\alpha\) is the significance level (\(\text{significance level} = 1 - \text{confidence level}\)),
  • \(z_{\alpha/2}\) is the critical value from the standard normal distribution, satisfying \(P(Z > z_{\alpha/2}) = \alpha/2\),
  • \(n\) is the sample size,
  • \(\sqrt{\hat{p}(1 - \hat{p})/n}\) is the standard error of the sample proportion.

Conditions for Validity

\[ \hat{p}\pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\text{if}\quad \begin{cases} n\hat{p}\ge 10 \text{ and } n(1-\hat{p})\ge 10 \text{ (CLT)} \\ n \ge 30 \text{ (LLN)} \\ n \le 0.1N \text{ (IID)} \end{cases} \]

The sample size \(n\) must be large enough so that the sampling distribution of the proportion is approximately normal.

Python Code

import numpy as np
import scipy.stats as stats

n = 200          # sample size
x = 120          # number of successes
confidence_level = 0.95

p_hat = x / n
z_critical = stats.norm.ppf(1 - (1 - confidence_level) / 2)
standard_error = np.sqrt((p_hat * (1 - p_hat)) / n)
margin_of_error = z_critical * standard_error
confidence_interval = (p_hat - margin_of_error, p_hat + margin_of_error)

print(f"{confidence_interval = }")

Alternatives to the Wald z-Interval

The Wald interval is named after Abraham Wald, who formalized this type of interval as a normal approximation. However, the Wald interval performs poorly when \(n\) is small, \(\hat{p}\) is near 0 or 1, or \(n\hat{p}\) or \(n(1-\hat{p}) < 10\). In these cases, the coverage probability can be much lower than the nominal level.

Interval Formula Type Uses \(z\)? Works Well When Comments
Wald (z) \(\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}\) Yes Large \(n\) Simple, but inaccurate for small samples
Wilson score Derived from inverting z-test Yes Small–large \(n\) Much better coverage
Agresti–Coull Adjusted Wald (add pseudo-observations) Yes Small–medium \(n\) Easy fix, near Wilson performance
Clopper–Pearson Based on binomial No Small \(n\) Conservative but exact

Wilson Score Interval

The Wilson interval comes from inverting the z-test for proportions:

\[ \frac{(\hat{p} - p)^2}{p(1-p)/n} = z_{\alpha/2}^2 \]

Solving for \(p\) gives:

\[ \text{CI} = \frac{ \hat{p} + \frac{z^2}{2n} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}} }{ 1 + \frac{z^2}{n} } \]

The center of the interval is not \(\hat{p}\), but a shrunken value toward 0.5:

\[ \tilde{p} = \frac{\hat{p} + \frac{z^2}{2n}}{1 + \frac{z^2}{n}} \]

The interval stays within \([0, 1]\) and performs much better for small or skewed samples, with nearly nominal coverage even for \(n < 30\).

Agresti–Coull Interval

Agresti and Coull observed that the Wilson formula can be approximated simply by adding "pseudo-observations." For a 95% confidence level (\(z = 1.96 \approx 2\)):

  • Add 2 successes and 2 failures → effectively 4 extra observations.
  • Use adjusted counts: \(n' = n + 4\), \(x' = x + 2\), \(\tilde{p} = x'/n'\).
  • Compute a Wald-style interval using the adjusted proportion:
\[ \tilde{p} \pm z_{\alpha/2} \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n'}} \]

Coverage is very close to Wilson; easy to explain and compute by hand. Becomes identical to Wilson when \(n\) is large.

Comparison Summary

Method Centered at Adjustment Performance
Wald (z) \(\hat{p}\) None Poor for small/edge cases
Wilson score Weighted avg of \(\hat{p}\) and 0.5 Shifts center & width Excellent
Agresti–Coull \((x+2)/(n+4)\) Adds pseudo-data Nearly as good as Wilson

Examples

Example 1: 95% CI for Proportion of Voter Support

A random sample of 200 voters is taken, and 120 say they support a particular candidate. Construct a 95% CI for the true proportion.

Solution.

\[ \hat{p} = \frac{120}{200} = 0.60 \]

For 95% confidence, \(z_{\alpha/2} \approx 1.96\).

\[ \text{SE} = \sqrt{\frac{0.60 \times 0.40}{200}} = \sqrt{0.0012} \approx 0.03464 \]
\[ \text{ME} = 1.96 \times 0.03464 \approx 0.0679 \]
\[ \boxed{(0.5321,\ 0.6679)} \]

We are 95% confident that the true proportion of voters who support the candidate is between 0.5321 and 0.6679.

Example 2: Sample Size for School Funding Survey

Della wants a margin of error smaller than \(\pm 2\%\) at 95% confidence for a proportion. What minimum sample size is needed?

Solution. The worst-case standard error occurs at \(\hat{p} = 0.5\) (maximizing \(\hat{p}(1-\hat{p})\)).

import scipy.stats as stats
import numpy as np

confidence_level = 0.95
alpha = 1 - confidence_level
z_star = stats.norm().ppf(1 - alpha / 2)
margin_of_error_max = 0.02
p_max = 0.5

n = 1
while True:
    n += 1
    me = z_star * np.sqrt(p_max * (1 - p_max) / n)
    if me <= margin_of_error_max:
        break
print(f"{n = }")  # n = 2401

Example 3: Female Artist's Songs (99% CI)

Della has over 500 songs. She randomly selects 50 songs and finds 20 are by a female artist. Construct a 99% CI.

Solution.

import numpy as np
from scipy import stats

confidence_level = 0.99
alpha = 1 - confidence_level
p_hat = 20 / 50
n = 50

z_star = stats.norm().ppf(1 - alpha / 2)
margin_of_error = z_star * np.sqrt(p_hat * (1 - p_hat) / n)
print(f"{p_hat} ± {margin_of_error:.3f}")
# 0.4 ± 0.178

The 99% CI is approximately \((0.222, 0.578)\).


Exercises

Exercise: Confidence Interval for Pass Rate

A random sample of 100 vehicles is selected, and 74 pass the inspection. Construct a 95% confidence interval for the pass rate.

Solution.

\[ \hat{p} = \frac{74}{100} = 0.74, \qquad \text{SE} = \sqrt{\frac{0.74 \times 0.26}{100}} \approx 0.04386 \]
\[ \text{ME} = 1.96 \times 0.04386 \approx 0.086 \]
\[ \boxed{(0.6540,\ 0.8260)} \]

We are 95% confident that the true pass rate lies between 65.40% and 82.60%.

Exercise: 90% CI for Population Proportion

In a survey of 200 people, 120 prefer product A over product B. Construct a 90% confidence interval.

Solution.

\[ \hat{p} = 0.60, \qquad z_{0.05} \approx 1.645 \]
\[ \text{ME} = 1.645 \times \sqrt{\frac{0.60 \times 0.40}{200}} \approx 1.645 \times 0.03464 \approx 0.057 \]
\[ \boxed{(0.543,\ 0.657)} \]