Least Squares Connection¶

When np.polyfit fits a polynomial to data, it solves a least squares problem under the hood. Understanding this connection reveals why polynomial fitting works, when it fails, and how to control the numerical behavior. The same least squares machinery underlies linear regression, spline fitting, and many other approximation techniques.

Mental Model

Polynomial fitting is really matrix algebra in disguise. You build a Vandermonde matrix from your x-values, then solve an overdetermined system in the least-squares sense. The coefficients that minimize the sum of squared residuals are the same ones that np.polyfit returns -- understanding this connection lets you diagnose ill-conditioning and choose the right degree.

Big Picture

All polynomial fitting reduces to one equation: \(V \mathbf{c} \approx \mathbf{y}\), where \(V\) is the Vandermonde matrix built from x-values and \(\mathbf{c}\) are the unknown coefficients. Everything else — normal equations, QR, SVD, scaling — is about how we solve that system stably.

Unified Approximation Map

Every topic in this section connects through a single pipeline:

text Data (x, y) │ ▼ Vandermonde Matrix V │ ▼ Least Squares: V c ≈ y │ ├──→ polyfit / poly1d (legacy API, power basis) │ ├──→ Polynomial.fit (modern API, domain scaling) │ │ │ ▼ │ Polynomial class (coefficients + domain + window) │ └──→ Problems │ ├── Ill-conditioning → fix with domain mapping (Polynomial.fit) └── Oscillation → fix with splines (scipy.interpolate)

At the deepest level, this is the same \(A\mathbf{x} \approx \mathbf{b}\) that appears in linear regression, signal processing (FFT is a linear transform), and every other least-squares problem in the book.

The Fitting Problem¶

Given data points \((x_i, y_i)\) for \(i = 1, \ldots, m\), find the polynomial of degree \(n\) that best fits the data.

1. Polynomial Model¶

A polynomial of degree \(n\) has \(n + 1\) coefficients:

\[ p(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n \]

2. Residuals¶

The residual at each data point measures the fitting error:

\[ r_i = y_i - p(x_i) \]

3. Least Squares Objective¶

Find coefficients \(c_0, c_1, \ldots, c_n\) that minimize the sum of squared residuals:

\[ \min_{c_0, \ldots, c_n} \sum_{i=1}^{m} \left( y_i - \sum_{j=0}^{n} c_j x_i^j \right)^2 \]

The Vandermonde Matrix¶

The fitting problem becomes a linear system through the Vandermonde matrix.

1. Matrix Formulation¶

Stacking all data points into a matrix equation:

\[ \underbrace{\begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^n \end{pmatrix}}_{V} \underbrace{\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{pmatrix}}_{\mathbf{c}} = \underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}}_{\mathbf{y}} \]

2. Building the Vandermonde in NumPy¶

```python import numpy as np

def main(): x = np.array([0, 1, 2, 3, 4], dtype=float) deg = 2

# np.vander builds the Vandermonde matrix
V = np.vander(x, N=deg + 1, increasing=True)
print("Vandermonde matrix:")
print(V)

if name == "main": main() ```

Output:

Vandermonde matrix: [[ 1. 0. 0.] [ 1. 1. 1.] [ 1. 2. 4.] [ 1. 3. 9.] [ 1. 4. 16.]]

3. Overdetermined System¶

When \(m > n + 1\) (more data points than coefficients), the system \(V \mathbf{c} = \mathbf{y}\) has no exact solution. The least squares solution minimizes \(\|V \mathbf{c} - \mathbf{y}\|_2^2\).

Normal Equations¶

The classical approach to solving least squares.

1. Derivation¶

Setting the gradient to zero yields the normal equations:

\[ V^T V \, \mathbf{c} = V^T \mathbf{y} \]

2. Direct Solution¶

```python import numpy as np

def main(): x = np.array([0, 1, 2, 3, 4], dtype=float) y = np.array([1, 3, 7, 13, 21], dtype=float) deg = 2

V = np.vander(x, N=deg + 1, increasing=True)
# Normal equations: (V^T V) c = V^T y
c = np.linalg.solve(V.T @ V, V.T @ y)
print("Coefficients (ascending):", c)

if name == "main": main() ```

Output:

Coefficients (ascending): [1. 1. 1.]

3. Numerical Issues¶

The normal equations square the condition number of \(V\). If \(V\) has condition number \(\kappa\), then \(V^T V\) has condition number \(\kappa^2\), amplifying numerical errors. For this reason, np.polyfit does not use normal equations directly.

QR Decomposition Approach¶

The method np.polyfit actually uses for better numerical stability.

1. QR Factorization¶

Decompose \(V = QR\) where \(Q\) is orthogonal and \(R\) is upper triangular:

\[ V \mathbf{c} = \mathbf{y} \implies QR \mathbf{c} = \mathbf{y} \implies R \mathbf{c} = Q^T \mathbf{y} \]

2. Implementation¶

```python import numpy as np

def main(): x = np.array([0, 1, 2, 3, 4], dtype=float) y = np.array([1, 3, 7, 13, 21], dtype=float) deg = 2

V = np.vander(x, N=deg + 1, increasing=True)
Q, R = np.linalg.qr(V)
c = np.linalg.solve(R, Q.T @ y)
print("Coefficients (QR):", c)

if name == "main": main() ```

3. Why QR is Better¶

The condition number of \(R\) equals the condition number of \(V\) (not its square), providing much better numerical stability for high-degree polynomials or poorly spaced data points.

np.lstsq — Direct Least Squares¶

NumPy provides np.linalg.lstsq for solving least squares problems directly.

1. Basic Usage¶

```python import numpy as np

def main(): x = np.array([0, 1, 2, 3, 4], dtype=float) y = np.array([1, 3, 7, 13, 21], dtype=float) deg = 2

V = np.vander(x, N=deg + 1, increasing=True)
c, residuals, rank, sv = np.linalg.lstsq(V, y, rcond=None)
print("Coefficients:", c)
print("Residuals:", residuals)
print("Rank:", rank)
print("Singular values:", sv)

if name == "main": main() ```

Output:

Coefficients: [1. 1. 1.] Residuals: [] Rank: 3 Singular values: [18.76166304 2.07972297 0.26600784]

2. Residuals Interpretation¶

The residuals array contains \(\|V \mathbf{c} - \mathbf{y}\|_2^2\). When the fit is exact (as with degree = number of points - 1), the residuals array is empty.

3. Comparison with polyfit¶

```python import numpy as np

def main(): x = np.array([1, 2, 3, 4, 5], dtype=float) y = np.array([2.1, 3.9, 6.2, 7.8, 10.1], dtype=float)

# polyfit (descending order)
coeffs_polyfit = np.polyfit(x, y, 2)

# lstsq (ascending order with Vandermonde)
V = np.vander(x, N=3, increasing=True)
coeffs_lstsq, _, _, _ = np.linalg.lstsq(V, y, rcond=None)

print("polyfit (descending):", coeffs_polyfit)
print("lstsq   (ascending):", coeffs_lstsq)
print("Match:", np.allclose(coeffs_polyfit, coeffs_lstsq[::-1]))

if name == "main": main() ```

Conditioning and Stability¶

When polynomial fitting fails numerically.

1. Ill-Conditioned Vandermonde¶

```python import numpy as np

def main(): # Wide-range x values create ill-conditioned Vandermonde x = np.array([1, 10, 100, 1000, 10000], dtype=float) y = np.array([1, 2, 3, 4, 5], dtype=float)

V = np.vander(x, N=3, increasing=True)
cond = np.linalg.cond(V)
print(f"Condition number: {cond:.2e}")

if name == "main": main() ```

2. Centering and Scaling¶

Improve conditioning by normalizing the x values:

```python import numpy as np

def main(): x = np.array([1, 10, 100, 1000, 10000], dtype=float) y = np.array([1, 2, 3, 4, 5], dtype=float)

# Center and scale
x_mean = x.mean()
x_std = x.std()
x_norm = (x - x_mean) / x_std

V_raw = np.vander(x, N=3, increasing=True)
V_norm = np.vander(x_norm, N=3, increasing=True)

print(f"Raw condition:        {np.linalg.cond(V_raw):.2e}")
print(f"Normalized condition: {np.linalg.cond(V_norm):.2e}")

if name == "main": main() ```

3. np.polynomial Uses This Automatically¶

The modern np.polynomial.Polynomial.fit method automatically centers and scales internally, which is one reason it is numerically superior to np.polyfit.

Weighted Least Squares¶

Assign different importance to different data points.

1. Weight Parameter in polyfit¶

```python import numpy as np

def main(): x = np.array([0, 1, 2, 3, 4], dtype=float) y = np.array([1.0, 2.8, 5.1, 7.0, 20.0]) # last point is outlier

# Unweighted fit
c_unweighted = np.polyfit(x, y, 1)

# Downweight the outlier
w = np.array([1, 1, 1, 1, 0.1])
c_weighted = np.polyfit(x, y, 1, w=w)

print(f"Unweighted: y = {c_unweighted[0]:.2f}x + {c_unweighted[1]:.2f}")
print(f"Weighted:   y = {c_weighted[0]:.2f}x + {c_weighted[1]:.2f}")

if name == "main": main() ```

2. Mathematical Formulation¶

Weighted least squares minimizes:

\[ \sum_{i=1}^{m} w_i \left( y_i - p(x_i) \right)^2 \]

This is equivalent to solving \(W V \mathbf{c} = W \mathbf{y}\) where \(W = \text{diag}(\sqrt{w_1}, \ldots, \sqrt{w_m})\).

3. Use Cases¶

Weights are useful when data points have different measurement uncertainties. Set \(w_i = 1/\sigma_i^2\) where \(\sigma_i\) is the standard deviation of the measurement at \(x_i\).

Summary¶

Method	Function	Stability	Notes
Normal equations	`np.linalg.solve(V.T @ V, V.T @ y)`	Poor	Condition number squared
QR decomposition	`np.linalg.qr` + solve	Good	Used by `np.polyfit`
SVD-based lstsq	`np.linalg.lstsq`	Best	Returns rank and singular values
Modern API	`Polynomial.fit`	Best	Auto-centering and scaling

The least squares framework unifies all polynomial fitting: np.polyfit builds a Vandermonde matrix and solves via QR, np.linalg.lstsq uses SVD for maximum stability, and np.polynomial.Polynomial.fit adds automatic conditioning improvements.

Exercises¶

Exercise 1. Build the Vandermonde matrix for x = [0, 1, 2, 3, 4] with degree 3 using np.vander(x, N=4, increasing=True). Solve the least-squares problem V @ c = y for y = [1, 2, 9, 28, 65] using both the normal equations and np.linalg.lstsq. Verify both methods produce the same coefficients.

Solution to Exercise 1

import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = np.array([1, 2, 9, 28, 65], dtype=float)
V = np.vander(x, N=4, increasing=True)

# Normal equations
c_normal = np.linalg.solve(V.T @ V, V.T @ y)

# lstsq
c_lstsq, _, _, _ = np.linalg.lstsq(V, y, rcond=None)

print(f"Normal equations: {c_normal}")
print(f"lstsq:           {c_lstsq}")
print(f"Match: {np.allclose(c_normal, c_lstsq)}")

Exercise 2. Fit a degree-1 polynomial (line) to x = [0, 1, 2, 3, 4], y = [1.0, 2.8, 5.1, 7.0, 20.0] with and without weights. Use w = [1, 1, 1, 1, 0.1] to downweight the outlier at x=4. Print the slope and intercept for both fits and confirm the weighted fit is closer to the true underlying line y = 2x + 1.

Solution to Exercise 2

import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = np.array([1.0, 2.8, 5.1, 7.0, 20.0])

# Unweighted
m, b = np.polyfit(x, y, 1)
print(f"Unweighted: y = {m:.2f}x + {b:.2f}")

# Weighted
w = np.array([1, 1, 1, 1, 0.1])
m_w, b_w = np.polyfit(x, y, 1, w=w)
print(f"Weighted:   y = {m_w:.2f}x + {b_w:.2f}")

# True line is y = 2x + 1
print(f"Weighted slope closer to 2: {abs(m_w - 2) < abs(m - 2)}")

Exercise 3. Compute the condition number of the Vandermonde matrix for x = np.linspace(0, 100, 20) at degrees 5, 10, and 15. Then center and scale x to x_norm = (x - x.mean()) / x.std() and recompute the condition numbers. Print the improvement factor at each degree.

Solution to Exercise 3

import numpy as np

x = np.linspace(0, 100, 20)
x_norm = (x - x.mean()) / x.std()

for deg in [5, 10, 15]:
    V_raw = np.vander(x, N=deg+1, increasing=True)
    V_norm = np.vander(x_norm, N=deg+1, increasing=True)
    cond_raw = np.linalg.cond(V_raw)
    cond_norm = np.linalg.cond(V_norm)
    print(f"Degree {deg:2d}: raw cond={cond_raw:.2e}, "
          f"norm cond={cond_norm:.2e}, "
          f"improvement={cond_raw/cond_norm:.0f}x")