Decision Threshold Tuning¶
Overview¶
In binary classification, logistic regression produces predicted probabilities in \([0,1]\). To make a hard prediction (positive or negative), we must choose a decision threshold. The default threshold of 0.5 is not always optimal—the right choice depends on the costs of false positives and false negatives.
Adjusting the threshold directly affects the precision-recall tradeoff and confusion matrix, allowing you to align the classifier with the specific needs of your application.
Default Threshold: 0.5¶
By convention, most binary classifiers use a threshold of 0.5:
Under this rule, an observation is classified as positive if the predicted probability exceeds 50%.
When is 0.5 Appropriate?¶
The 0.5 threshold is optimal when: - The cost of false positives equals the cost of false negatives - The classes are balanced (similar prevalence) - You have no prior reason to favor one error type over the other
In many real applications, these conditions do not hold.
Adjusting the Threshold¶
Lower Threshold (e.g., 0.2)¶
When you lower the threshold, the classifier becomes more lenient—it predicts "positive" more often.
Effect on confusion matrix: - Recall (sensitivity) increases: catches more true positives - Specificity decreases: more false positives - Precision decreases: fewer predicted positives are actually correct
Use when: - The cost of missing a positive (false negative) is high - Medical screening: minimize missed disease cases - Fraud detection: catch more fraudulent transactions - Loan default prediction: identify as many risky borrowers as possible
Higher Threshold (e.g., 0.8)¶
When you raise the threshold, the classifier becomes more conservative—it predicts "positive" only when very confident.
Effect on confusion matrix: - Specificity (true negative rate) increases: fewer false alarms - Recall decreases: more false negatives - Precision increases: most predicted positives are correct
Use when: - The cost of a false positive is high - Email spam filtering: avoid filtering legitimate emails - Credit approval: approve only very safe borrowers - Recommender systems: only recommend when very confident
Example: Loan Default Prediction¶
Model Predictions¶
Consider a logistic regression model trained on 1,000 loan observations. The distribution of predicted probabilities is:
- 400 loans have predicted probability < 0.2 (very likely to repay)
- 300 loans have predicted probability 0.2-0.5 (likely to repay)
- 200 loans have predicted probability 0.5-0.8 (likely to default)
- 100 loans have predicted probability > 0.8 (very likely to default)
Confusion Matrices at Different Thresholds¶
Threshold = 0.5 (balanced):
Predicted
No Default Default
Actual No Default 280 120
Actual Default 70 530
- Sensitivity (recall): 530/(530+70) = 0.883
- Specificity: 280/(280+120) = 0.700
- Precision: 530/(530+120) = 0.815
Threshold = 0.2 (lenient):
Predicted
No Default Default
Actual No Default 150 250
Actual Default 10 590
- Sensitivity: 590/(590+10) = 0.983 (catch almost all defaults)
- Specificity: 150/(150+250) = 0.375 (many false alarms)
- Precision: 590/(590+250) = 0.702 (less reliable)
Threshold = 0.8 (conservative):
Predicted
No Default Default
Actual No Default 350 50
Actual Default 250 350
- Sensitivity: 350/(350+250) = 0.583 (miss more defaults)
- Specificity: 350/(350+50) = 0.875 (fewer false alarms)
- Precision: 350/(350+50) = 0.875 (very reliable)
The Precision-Recall Tradeoff¶
As you lower the threshold: - Recall (sensitivity) ↑: Identify more positives - Precision ↓: More false positives dilute the positive predictions
As you raise the threshold: - Precision ↑: Fewer false positives - Recall ↓: Miss more true positives
This fundamental tradeoff is visualized in the precision-recall curve, which plots precision (y-axis) against recall (x-axis) as the threshold varies from 1 to 0.
Selecting the Optimal Threshold¶
Cost-Sensitive Approach¶
If you know the cost of each error type, you can compute the optimal threshold analytically. Let:
- \(C_{FP}\) = cost of a false positive
- \(C_{FN}\) = cost of a false negative
The optimal threshold is approximately:
For example, if a missed default costs $1,000 and a false alarm costs $50:
This high threshold reflects the asymmetric costs.
Performance Metrics¶
Common methods to select a threshold:
| Metric | Optimal When |
|---|---|
| F1 Score | \(F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\) |
| Youden's J | \(J = \text{Sensitivity} + \text{Specificity} - 1\) |
| Precision-Recall Curve | Look for "elbow" or domain-specific target |
| ROC Curve | Youden's J maximizes TPR - FPR |
Domain-Specific Targets¶
In practice, domain knowledge often guides threshold selection:
- Medical screening: High recall (catch disease early), tolerate false alarms
- Fraud detection: High precision (avoid hassling customers), moderate recall
- Loan approval: High precision (avoid defaults), lower recall (accept some good borrowers)
Implementation¶
Python Example¶
from sklearn.metrics import (confusion_matrix, precision_score,
recall_score, f1_score)
# Assume model trained and y_prob contains predicted probabilities
thresholds = [0.2, 0.3, 0.5, 0.7, 0.8]
for t in thresholds:
y_pred = (y_prob >= t).astype(int)
cm = confusion_matrix(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"Threshold {t}: Precision={precision:.3f}, "
f"Recall={recall:.3f}, F1={f1:.3f}")
ROC Curve and Youden's J¶
from sklearn.metrics import roc_curve, roc_auc_score
# Compute ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
# Find optimal threshold by Youden's J
j_scores = tpr - fpr
optimal_idx = np.argmax(j_scores)
optimal_threshold = thresholds[optimal_idx]
print(f"Optimal threshold (Youden's J): {optimal_threshold:.3f}")
Key Properties of Threshold Tuning¶
| Aspect | Impact |
|---|---|
| Computation | Fast: just change classification rule, no retraining |
| Visualization | ROC and PR curves show all threshold choices |
| Interpretability | Directly controls false positive / false negative rate |
| Limitation | Cannot improve AUC (overall ranking), only adjust for specific operating point |
Application to Linear Discriminant Analysis (LDA)¶
The threshold concept extends beyond logistic regression to any probabilistic classifier. For example, Linear Discriminant Analysis (LDA) also produces class probabilities \(P(Y=1|\mathbf{x})\), and you can tune its threshold in the same way.
LDA classifier with threshold 0.5 (default):
Changing to threshold 0.2 makes LDA more lenient, increasing recall at the cost of precision, just as with logistic regression.
Summary¶
- Default threshold 0.5 is only optimal when error costs are equal and classes are balanced
- Lower thresholds (0.2-0.3) increase recall but decrease precision: use when missing positives is costly
- Higher thresholds (0.7-0.8) increase precision but decrease recall: use when false positives are costly
- ROC and PR curves visualize the full tradeoff across thresholds
- Youden's J and F1 score provide automatic selection methods
- Threshold tuning is fast and requires no retraining, making it practical for operational adjustments