Kendall's Tau¶
While Pearson and Spearman correlation measure linear and monotonic association respectively, Kendall's Tau offers a different perspective: it measures association in terms of the probability that two randomly chosen observations are concordant. This pairwise comparison approach produces a statistic with attractive theoretical properties, including more reliable p-values for small samples and a natural probabilistic interpretation.
Concordant and Discordant Pairs¶
Given \(n\) paired observations \((x_1, y_1), \ldots, (x_n, y_n)\), consider any pair of observations \((x_i, y_i)\) and \((x_j, y_j)\) with \(i < j\):
- The pair is concordant if \((x_i - x_j)\) and \((y_i - y_j)\) have the same sign, meaning both variables move in the same direction
- The pair is discordant if \((x_i - x_j)\) and \((y_i - y_j)\) have opposite signs
- The pair is tied if \(x_i = x_j\) or \(y_i = y_j\)
There are \(\binom{n}{2} = \frac{n(n-1)}{2}\) total pairs.
Tau-a¶
The simplest form, Kendall's Tau-a, is defined as
where \(C\) is the number of concordant pairs and \(D\) is the number of discordant pairs. Tau-a does not adjust for ties, so it can fail to reach \(\pm 1\) even for perfectly monotonic data when ties are present.
Tau-b¶
Kendall's Tau-b adjusts for ties and is defined as
where:
- \(n_0 = \binom{n}{2}\) is the total number of pairs
- \(n_1 = \sum_k \binom{t_k}{2}\) counts tied pairs in \(X\) (with \(t_k\) the size of the \(k\)-th group of ties)
- \(n_2 = \sum_l \binom{u_l}{2}\) counts tied pairs in \(Y\) (with \(u_l\) the size of the \(l\)-th group of ties)
When there are no ties, Tau-b reduces to Tau-a.
SciPy Default
The function scipy.stats.kendalltau computes Tau-b by default. This is the appropriate choice for most practical applications, since real data frequently contains ties.
Properties¶
Kendall's Tau shares some properties with other correlation measures while having distinctive characteristics:
- Range: \(-1 \leq \tau \leq 1\), with \(\tau = 1\) indicating perfect concordance (all pairs agree) and \(\tau = -1\) indicating perfect discordance
- Probabilistic interpretation: \(\tau = P(\text{concordant}) - P(\text{discordant})\) for a randomly chosen pair
- Relationship to Spearman: In general \(|\tau| \leq |r_s|\), so Kendall values are typically smaller in magnitude than Spearman values for the same data
- Symmetry: \(\tau(X, Y) = \tau(Y, X)\)
Computing Kendall's Tau with SciPy¶
import numpy as np
from scipy import stats
# Generate monotonically related data with some noise
np.random.seed(42)
n = 100
x = np.random.uniform(0, 10, n)
y = 2 * x + np.random.normal(0, 2, n)
# Compute Kendall's Tau-b
tau, p_value = stats.kendalltau(x, y)
print(f"Kendall tau-b = {tau:.4f}, p-value = {p_value:.2e}")
# Compare with Spearman
rho, _ = stats.spearmanr(x, y)
print(f"Spearman rho = {rho:.4f}")
print(f"|tau| <= |rho|: {abs(tau) <= abs(rho)}")
The p-value tests the null hypothesis \(H_0\!: \tau = 0\) (no association) against the two-sided alternative.
When to Prefer Kendall's Tau
Kendall's Tau is preferred over Spearman in several situations: (1) small sample sizes, where its p-values are more reliable; (2) when a probabilistic interpretation is desired; and (3) when working with ordinal data that contains many ties, since Tau-b handles ties explicitly.
Summary¶
Kendall's Tau measures association by counting concordant and discordant pairs among all observation pairs. Tau-a uses the simple proportion, while Tau-b adjusts for ties in the denominator. The statistic ranges from \(-1\) to \(1\), has a direct probabilistic interpretation, and is available through scipy.stats.kendalltau, which computes Tau-b by default.