# Dimensionality Reduction: PCR and PLS
## Overview
When the number of predictors \(p\) is large (especially when \(p > n\), i.e., more predictors than observations), or when severe multicollinearity exists among predictors, coefficient shrinkage methods (Ridge, Lasso) can become less effective. Dimensionality reduction methods provide an alternative: instead of shrinking coefficients, they construct a smaller number of latent variables (principal components or PLS components) and regress the response on these.
## When to Use Dimensionality Reduction
Dimensionality reduction is valuable when:
- \(p\) is very large — Many more predictors than observations (\(p > n\) or \(p >> n\))
- Severe multicollinearity — Predictors are highly correlated, making OLS estimates unstable
- Prediction is the goal — Interpretability of individual coefficients is less critical
- Computational efficiency — Reducing dimensionality improves computational speed
- Data comes from dense measurements — e.g., spectroscopy or sensor arrays, where many highly correlated channels measure overlapping signal plus noise
## Key Methods
### Principal Components Regression (PCR)
**Unsupervised dimensionality reduction:** PCR constructs principal components by maximizing variance in the predictor space \(X\), then regresses the response on these components.
**Algorithm:**
- Standardize predictors
- Perform PCA to extract principal components
- Select optimal number of components via cross-validation
- Regress response on selected components
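A minimal sketch of these four steps with scikit-learn's `Pipeline`. The synthetic dataset and the fixed choice of 5 components below are illustrative assumptions; in practice the component count is chosen by cross-validation, as shown later on this page.

```python
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for a real (n x p) predictor matrix and response
X, y = make_regression(n_samples=100, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

# PCR = standardize -> PCA -> OLS on the retained components
pcr = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),    # illustrative; tune via cross-validation
    ("ols", LinearRegression()),
])
pcr.fit(X, y)

# Share of X-variance captured by each retained component
print(pcr.named_steps["pca"].explained_variance_ratio_)
```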
**Advantages:**
- Eliminates multicollinearity (components are uncorrelated)
- Handles \(p > n\) settings naturally
- Simple, well-understood method
- PCA components have clear interpretation in terms of variance
**Disadvantages:**
- Unsupervised: PCA maximizes \(X\) variance, not covariance with \(y\)
- May require many components to capture response variation
- Loss of original feature interpretability
- Components are linear combinations of all predictors
**When to use:** When predictor variance aligns with response variation, or as a baseline for comparison.
### Partial Least Squares (PLS)
**Supervised dimensionality reduction:** PLS constructs components by maximizing the covariance between predictors and response, balancing explanation of variance in \(X\) with prediction of \(y\).
**Algorithm:**
- Standardize predictors and response
- Iteratively construct components that maximize covariance of \(X\) with \(y\)
- Select optimal number of components via cross-validation
- Regress response on selected components
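A corresponding sketch for these steps, on the same kind of illustrative synthetic data; scikit-learn's `PLSRegression` performs the centering and scaling internally (`scale=True` by default).

```python
from sklearn.datasets import make_regression
from sklearn.cross_decomposition import PLSRegression

# Synthetic stand-in for a real (n x p) predictor matrix and response
X, y = make_regression(n_samples=100, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

# PLS components maximize covariance between X scores and y
pls = PLSRegression(n_components=5)   # illustrative; tune via cross-validation
pls.fit(X, y)

# Weights defining the first PLS component (one entry per predictor)
print(pls.x_weights_[:, 0])
```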
**Advantages:**
- Supervised: Components are chosen to predict response
- Often requires fewer components than PCR (more efficient)
- Excellent for high-dimensional prediction problems
- Components respect covariance structure between \(X\) and \(y\)
- Originated in chemometrics; proven in practice
**Disadvantages:**
- Components still lack direct interpretability
- Theory less developed than for OLS (fewer asymptotic results)
- Standardization required and can affect results
- Must tune number of components
**When to use:** When prediction accuracy is the goal and \(p\) is large; typically outperforms PCR.
## Comparison: PCR vs PLS
| Aspect | PCR | PLS |
|---|---|---|
| Objective | Maximize variance in \(X\) | Maximize covariance of \(X\) and \(y\) |
| Component selection | Unsupervised | Supervised |
| Typical # components | Many (to capture \(y\) variation) | Fewer (aligned with \(y\)) |
| Computational complexity | ≈ O(np·min(n,p)) for the SVD | ≈ O(np) per extracted component |
| When it excels | Variance in \(X\) important | Predicting \(y\) is goal |
| Interpretability | Low (components mix all predictors) | Low (components mix all predictors) |
**Rule of thumb:** In practice, PLS often outperforms PCR because its supervised component selection aligns better with the prediction objective.
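The sketch below illustrates this rule of thumb on synthetic data (the dataset and CV settings are assumptions for illustration, not output from the accompanying examples file): at each matched component count, the cross-validated MSE of PCR and PLS can be compared directly.

```python
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

def cv_mse(model):
    """10-fold cross-validated mean squared error."""
    return -cross_val_score(model, X, y, cv=10,
                            scoring="neg_mean_squared_error").mean()

for k in (1, 2, 5, 10):
    pcr = Pipeline([("scale", StandardScaler()),
                    ("pca", PCA(n_components=k)),
                    ("ols", LinearRegression())])
    pls = PLSRegression(n_components=k)
    print(f"k={k:2d}  PCR CV MSE={cv_mse(pcr):9.1f}  PLS CV MSE={cv_mse(pls):9.1f}")
```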
## Comparison with Shrinkage Methods
| Method | Type | Sparsity | Interpretability | High-dim Capability |
|---|---|---|---|---|
| Ridge | Shrinkage | Dense | High | Good (when \(p >> n\)) |
| Lasso | Shrinkage | Sparse | High | Good (automatic selection) |
| Elastic Net | Shrinkage | Sparse | High | Good (balanced) |
| PCR | Dimension reduction | N/A | Low | Excellent |
| PLS | Dimension reduction | N/A | Low | Excellent |
**Decision tree:**
- Need feature selection? → Lasso or Elastic Net
- Want all features + stability? → Ridge
- \(p\) very large or \(p > n\)? → PCR or PLS
- Prediction focus, high-dimensionality? → PLS
- Need interpretability? → Ridge or Lasso
## Contents
- Principal Components Regression (PCR) — Unsupervised approach using PCA for dimensionality reduction
- Partial Least Squares (PLS) — Supervised approach maximizing covariance with response
## Code Examples
See `code/pcr_pls_examples.py` for:
- Full implementation of PCR with cross-validation
- Full implementation of PLS with cross-validation
- Comprehensive model comparison
- Visualizations (scree plots, CV curves, predictions)
- Real-world housing price prediction example
## Practical Guidance
### When to Use PCR or PLS
**Choose PCR/PLS if:**
- \(p > n\) (more predictors than observations)
- Multicollinearity is severe
- \(p\) is large (50+ predictors) even if \(p < n\)
- You have spectroscopic or sensor data
- Prediction accuracy is paramount
- Interpretability of individual coefficients is not critical
**Choose Ridge/Lasso if:**
- Feature selection is important
- You need to understand which predictors matter
- \(p\) is moderate (< 50) and \(p < n\)
- Interpretability is crucial
- You can afford the computational cost of fitting models across a grid of penalty values
### Cross-Validation Strategy
Both PCR and PLS require selecting the number of components via cross-validation:
```python
from sklearn.model_selection import cross_val_score
from sklearn.cross_decomposition import PLSRegression

# Test different numbers of components (PLS shown; a PCA + OLS pipeline gives the PCR analogue)
n_samples, n_predictors = X_scaled.shape
for n_components in range(1, min(n_samples, n_predictors) + 1):
    model = PLSRegression(n_components=n_components)
    cv_scores = cross_val_score(model, X_scaled, y, cv=10,
                                scoring='neg_mean_squared_error')
    mse = -cv_scores.mean()   # mean CV error for this number of components
```
Always use a held-out test set to report final performance; don't trust CV error alone.
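A sketch of that workflow (the synthetic data and the choice of PLS as the example model are assumptions): tune the number of components with `GridSearchCV` on the training split only, then report MSE once on the held-out test set.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Component count is tuned by CV on the training data only
search = GridSearchCV(PLSRegression(),
                      {"n_components": list(range(1, 16))},
                      cv=10, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

# Final performance is reported on the untouched test set
test_mse = mean_squared_error(y_test, search.predict(X_test))
print("Best n_components:", search.best_params_["n_components"],
      "| Test MSE:", round(test_mse, 2))
```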
## Further Reading
### Key References
- Hastie, Tibshirani, Wainwright (2015). Statistical Learning with Sparsity — Chapter on dimensionality reduction
- James, Witten, Hastie & Tibshirani (2013). An Introduction to Statistical Learning (ISLR), Chapter 6 — Linear Model Selection and Regularization
- Geladi & Kowalski (1986). Partial least-squares regression: a tutorial, Analytica Chimica Acta — classic chemometrics tutorial
### Related Topics
- Regularization — Ridge, Lasso, Elastic Net (shrinkage alternatives)
- Splines and GAMs — Non-linear modeling with smoothness penalties (Chapter 7)
- Model Selection — Cross-validation, information criteria
## Summary
Dimensionality reduction methods (PCR and PLS) offer powerful alternatives to shrinkage when:
- The predictor space is very high-dimensional (\(p > n\) or even \(p >> n\))
- Multicollinearity is severe across many predictors
- Prediction accuracy is prioritized over interpretability