Linear Algebra Notation and Conventions
Linear algebra is the language of multivariate statistics. Regression, principal component analysis, ANOVA decompositions, and multivariate distributions all rely on matrix and vector operations. This section establishes the notation used throughout the book and reviews the essential results.
Vectors
Notation
Vectors are denoted by lowercase bold letters. Unless stated otherwise, all vectors are column vectors:
\[\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n\]
A row vector is written as the transpose \(\mathbf{x}^T = (x_1, x_2, \dots, x_n)\).
Basic Operations
| Operation | Notation | Result |
|---|---|---|
| Scalar multiplication | \(c\mathbf{x}\) | \((cx_1, \dots, cx_n)^T\) |
| Addition | \(\mathbf{x} + \mathbf{y}\) | \((x_1 + y_1, \dots, x_n + y_n)^T\) |
| Dot (inner) product | \(\mathbf{x}^T \mathbf{y}\) | \(\sum_{i=1}^n x_i y_i \in \mathbb{R}\) |
| Euclidean norm | \(\lVert \mathbf{x} \rVert\) | \(\sqrt{\mathbf{x}^T \mathbf{x}}\) |
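The operations in the table map directly onto array operations. The following is a minimal sketch in Python with NumPy; the library choice and the example values are assumptions made here for illustration and are not part of the book's notation.

```python
import numpy as np

# Two example vectors in R^3 (values chosen only for illustration).
x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, -1.0])
c = 2.0

scaled = c * x            # scalar multiplication: (c x_1, ..., c x_n)
total = x + y             # elementwise addition
dot = x @ y               # inner product x^T y
norm = np.sqrt(x @ x)     # Euclidean norm; np.linalg.norm(x) gives the same value

print(scaled, total, dot, norm)
```

One-dimensional NumPy arrays carry no row or column orientation, so `x @ y` plays the role of \(\mathbf{x}^T \mathbf{y}\) without an explicit transpose.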
Statistical Interpretation
In data analysis, an observation vector \(\mathbf{x}_i \in \mathbb{R}^p\) represents the \(p\) measured features of the \(i\)-th observation. Stacking \(n\) such observations row-wise produces the design matrix \(\mathbf{X} \in \mathbb{R}^{n \times p}\).
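As a small illustration of the row-wise stacking, the sketch below builds a design matrix from three made-up observations with \(p = 2\) features each. The numbers are hypothetical and NumPy is an assumed tool, not something the text prescribes.

```python
import numpy as np

# Three hypothetical observations, each with p = 2 measured features.
observations = [
    np.array([5.1, 3.5]),
    np.array([4.9, 3.0]),
    np.array([6.2, 2.9]),
]

# Stack row-wise: row i of X is the transposed observation vector x_i^T.
X = np.vstack(observations)
n, p = X.shape

print(X.shape)
```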
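Matrices
Notation
Matrices are denoted by uppercase bold letters:
\[\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n}\]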
The element in row \(i\), column \(j\) is \(a_{ij}\) or \([\mathbf{A}]_{ij}\).
Special Matrices
| Matrix | Notation | Definition |
|---|---|---|
| Identity | \(\mathbf{I}_n\) | \([\mathbf{I}_n]_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}\) |
| Zero matrix | \(\mathbf{0}\) | All entries zero |
| Diagonal | \(\text{diag}(d_1, \dots, d_n)\) | \(a_{ii} = d_i\) and \(a_{ij} = 0\) for \(i \neq j\) |
| Symmetric | \(\mathbf{A} = \mathbf{A}^T\) | \(a_{ij} = a_{ji}\) |
| Ones vector | \(\mathbf{1}_n\) | \((1, 1, \dots, 1)^T \in \mathbb{R}^n\) |
Matrix Operations
Multiplication
For \(\mathbf{A} \in \mathbb{R}^{m \times k}\) and \(\mathbf{B} \in \mathbb{R}^{k \times n}\):
\[[\mathbf{AB}]_{ij} = \sum_{l=1}^{k} a_{il} b_{lj}\]
The product \(\mathbf{AB}\) is an \(m \times n\) matrix. Matrix multiplication is not commutative in general: even when both \(\mathbf{AB}\) and \(\mathbf{BA}\) are defined, they usually differ.
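A short numerical check can make the shape rule and the lack of commutativity concrete. This is only a sketch with NumPy and arbitrarily chosen matrices.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
B = np.array([[1.0, 0.0],
              [3.0, 1.0]])

AB = A @ B
BA = B @ A

# Entry (0, 1) of AB from the entrywise definition: sum over l of a_{0l} * b_{l1}.
entry_01 = sum(A[0, l] * B[l, 1] for l in range(A.shape[1]))

# Shapes: (2 x 3) times (3 x 4) gives (2 x 4); the reverse product is not defined.
C = np.ones((2, 3))
D = np.ones((3, 4))
CD_shape = (C @ D).shape

print(AB)
print(BA)
```

Here both products exist because the matrices are square, yet `AB` and `BA` are different matrices.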
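Transpose
The transpose of \(\mathbf{A} \in \mathbb{R}^{m \times n}\) is the matrix \(\mathbf{A}^T \in \mathbb{R}^{n \times m}\) with entries
\[[\mathbf{A}^T]_{ij} = a_{ji}\]
Properties: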
- \((\mathbf{A}^T)^T = \mathbf{A}\)
- \((\mathbf{AB})^T = \mathbf{B}^T \mathbf{A}^T\)
- \((\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T\)
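The transpose rules can be spot-checked numerically. The sketch below uses random rectangular matrices (a fixed seed is assumed for reproducibility) so that the order reversal in the product rule is visible in the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the example is reproducible
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(2, 3))

double_transpose_ok = np.array_equal(A.T.T, A)
product_rule_ok = np.allclose((A @ B).T, B.T @ A.T)   # note the reversed order
sum_rule_ok = np.allclose((A + C).T, A.T + C.T)

print(double_transpose_ok, product_rule_ok, sum_rule_ok)
```

With these shapes, `A.T @ B.T` would not even be defined (a 3 x 2 matrix times a 4 x 3 matrix), which is one way to remember that the factors must swap.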
Trace
For a square matrix \(\mathbf{A} \in \mathbb{R}^{n \times n}\), the trace is the sum of the diagonal entries:
\[\text{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii}\]
Properties:
- \(\text{tr}(\mathbf{A} + \mathbf{B}) = \text{tr}(\mathbf{A}) + \text{tr}(\mathbf{B})\)
- \(\text{tr}(\mathbf{AB}) = \text{tr}(\mathbf{BA})\) (cyclic property)
- \(\text{tr}(c\mathbf{A}) = c\,\text{tr}(\mathbf{A})\)
The trace appears in expressions for the sum of squared residuals and in the expected value of quadratic forms.
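The sketch below illustrates the cyclic property with non-square factors and shows how a sum of squares can be written as a trace. The residual vector `e` is a hypothetical example, and NumPy is an assumed tool.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 2))

# Cyclic property: AB is 2 x 2 and BA is 3 x 3, yet the traces agree.
tr_AB = np.trace(A @ B)
tr_BA = np.trace(B @ A)

# A sum of squares written as a trace: e^T e = tr(e e^T).
e = np.array([0.5, -1.0, 2.0])        # hypothetical residual vector
ss_direct = e @ e
ss_trace = np.trace(np.outer(e, e))

print(tr_AB, tr_BA, ss_direct, ss_trace)
```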
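Determinant
The determinant of a square matrix, written \(\det(\mathbf{A})\) or \(|\mathbf{A}|\), is a scalar that is nonzero exactly when \(\mathbf{A}\) is invertible.
For a \(2 \times 2\) matrix:
\[\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc\]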
Properties:
- \(\det(\mathbf{AB}) = \det(\mathbf{A})\det(\mathbf{B})\)
- \(\det(\mathbf{A}^T) = \det(\mathbf{A})\)
- \(\det(c\mathbf{A}) = c^n \det(\mathbf{A})\) for \(\mathbf{A} \in \mathbb{R}^{n \times n}\)
- \(\mathbf{A}\) is invertible iff \(\det(\mathbf{A}) \neq 0\)
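These properties can be verified on small examples. The matrices below are arbitrary choices made for this sketch; `S` is constructed to be singular so that its determinant is zero.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[0.0, 1.0],
              [4.0, 2.0]])
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])    # second row is twice the first, so S is singular
c, n = 3.0, A.shape[0]

det_A = np.linalg.det(A)      # ad - bc = 2*3 - 1*1
det_B = np.linalg.det(B)      # ad - bc = 0*2 - 1*4
product_ok = np.isclose(np.linalg.det(A @ B), det_A * det_B)
transpose_ok = np.isclose(np.linalg.det(A.T), det_A)
scaling_ok = np.isclose(np.linalg.det(c * A), c**n * det_A)
det_S = np.linalg.det(S)      # ad - bc = 1*4 - 2*2

print(det_A, det_B, det_S)
```

In floating point, an exactly singular matrix can yield a tiny nonzero determinant, so comparisons against zero are usually made with a tolerance.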
The determinant appears in the density of the multivariate normal distribution: \(f(\mathbf{x}) \propto |\boldsymbol{\Sigma}|^{-1/2} \exp\!\bigl(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\bigr)\).
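To see where the determinant enters in practice, here is a hedged sketch of the multivariate normal log-density. It assumes the standard normalizing constant \((2\pi)^{-p/2}\), which the proportionality above leaves out, and the function name `mvn_logpdf` is hypothetical. The log-determinant is computed with `np.linalg.slogdet` to avoid overflow or underflow of \(|\boldsymbol{\Sigma}|\) itself.

```python
import numpy as np

def mvn_logpdf(x, mu, Sigma):
    """Log-density of N(mu, Sigma) at x; Sigma is assumed positive definite."""
    p = mu.shape[0]
    diff = x - mu
    sign, logdet = np.linalg.slogdet(Sigma)       # sign and log|det(Sigma)|
    if sign <= 0:
        raise ValueError("Sigma must have a positive determinant")
    quad = diff @ np.linalg.solve(Sigma, diff)    # (x - mu)^T Sigma^{-1} (x - mu)
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + quad)

mu = np.zeros(2)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
value_at_mean = mvn_logpdf(mu, mu, Sigma)
value_standard = mvn_logpdf(np.zeros(2), np.zeros(2), np.eye(2))

print(value_at_mean, value_standard)
```

The sign check is only a cheap guard: a positive determinant is necessary for positive definiteness but not sufficient.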
Inverse
The inverse of a square matrix \(\mathbf{A}\) (when it exists) satisfies:
\[\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}\]
Properties:
- \((\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}\)
- \((\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T\)
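A brief numerical illustration follows, using arbitrary \(2 \times 2\) matrices. It also shows the common practice of solving a linear system directly rather than forming the inverse; that recommendation is general numerical advice, not a statement from this text.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])
b = np.array([1.0, 2.0])

A_inv = np.linalg.inv(A)
identity_ok = (np.allclose(A @ A_inv, np.eye(2))
               and np.allclose(A_inv @ A, np.eye(2)))
product_ok = np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ A_inv)
transpose_ok = np.allclose(np.linalg.inv(A.T), A_inv.T)

# To solve A z = b, a linear solve is usually preferred to computing A^{-1} b.
z = np.linalg.solve(A, b)

print(z)
```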
Rank and Linear Independence
- A set of vectors \(\{\mathbf{v}_1, \dots, \mathbf{v}_k\}\) is linearly independent if no vector can be written as a linear combination of the others.
- The column rank of \(\mathbf{A}\) is the maximum number of linearly independent columns. For any matrix, column rank equals row rank, so we simply say rank.
- \(\mathbf{A} \in \mathbb{R}^{n \times n}\) is full rank iff \(\text{rank}(\mathbf{A}) = n\) iff \(\mathbf{A}\) is invertible.
In regression, the design matrix \(\mathbf{X}\) must have full column rank for the OLS estimator \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\) to exist. Multicollinearity (Chapter 12) is precisely the situation where \(\mathbf{X}\) is close to rank-deficient.
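The following sketch contrasts a full-column-rank design matrix with an exactly rank-deficient one and a nearly collinear one. The simulated data and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Intercept plus two features: three linearly independent columns.
X_full = np.column_stack([np.ones(n), x1, x2])

# Add a column that is an exact linear combination of existing columns.
X_deficient = np.column_stack([X_full, x1 + x2])

rank_full = np.linalg.matrix_rank(X_full)
rank_deficient = np.linalg.matrix_rank(X_deficient)

# A nearly collinear column keeps full column rank but inflates the condition number.
X_near = np.column_stack([X_full, x1 + x2 + 1e-6 * rng.normal(size=n)])
cond_full = np.linalg.cond(X_full)
cond_near = np.linalg.cond(X_near)

print(rank_full, rank_deficient, cond_full, cond_near)
```

A large condition number is one common numerical symptom of multicollinearity: \(\mathbf{X}^T\mathbf{X}\) is technically invertible, but small changes in the data can move the estimates substantially.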
Eigenvalues and Eigenvectors
For a square matrix \(\mathbf{A} \in \mathbb{R}^{n \times n}\), a scalar \(\lambda\) and nonzero vector \(\mathbf{v}\) satisfying
\[\mathbf{A}\mathbf{v} = \lambda\mathbf{v}\]
are an eigenvalue–eigenvector pair.
Properties for Symmetric Matrices
When \(\mathbf{A} = \mathbf{A}^T\) (which includes covariance matrices):
- All eigenvalues are real.
- Eigenvectors corresponding to distinct eigenvalues are orthogonal.
- \(\mathbf{A}\) admits the spectral decomposition: \(\mathbf{A} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T\), where \(\mathbf{Q}\) is orthogonal and \(\boldsymbol{\Lambda} = \text{diag}(\lambda_1, \dots, \lambda_n)\).
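The spectral decomposition can be computed and checked as follows. This sketch uses `np.linalg.eigh`, NumPy's routine for symmetric (Hermitian) matrices; the example matrix is an arbitrary choice.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # symmetric; its eigenvalues are 1 and 3

# eigh returns real eigenvalues in ascending order and
# orthonormal eigenvectors as the columns of Q.
lam, Q = np.linalg.eigh(A)
Lambda = np.diag(lam)

reconstruction_ok = np.allclose(Q @ Lambda @ Q.T, A)        # A = Q Lambda Q^T
orthogonal_ok = np.allclose(Q.T @ Q, np.eye(2))             # Q^T Q = I
eigpair_ok = np.allclose(A @ Q[:, 0], lam[0] * Q[:, 0])     # A v = lambda v

print(lam)
```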
Connection to Statistics
- \(\text{tr}(\mathbf{A}) = \sum_i \lambda_i\) and \(\det(\mathbf{A}) = \prod_i \lambda_i\).
- A symmetric matrix is positive definite (\(\mathbf{A} \succ 0\)) iff all eigenvalues are strictly positive. Covariance matrices are positive semi-definite.
- Principal Component Analysis rotates data into the eigenvector basis of the sample covariance matrix.
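A compact sketch of these connections is given below: it checks the trace and determinant identities on a sample covariance matrix and then performs the PCA rotation. The simulated data and the mixing matrix are hypothetical, and this is a bare-bones illustration rather than a full PCA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: n = 200 observations of p = 3 correlated features.
Z = rng.normal(size=(200, 3))
X = Z @ np.array([[2.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.5, 0.2, 0.3]])

S = np.cov(X, rowvar=False)          # p x p sample covariance matrix
lam, Q = np.linalg.eigh(S)           # eigenvalues in ascending order
order = np.argsort(lam)[::-1]        # reorder so the largest variance comes first
lam, Q = lam[order], Q[:, order]

trace_ok = np.isclose(np.trace(S), lam.sum())
det_ok = np.isclose(np.linalg.det(S), lam.prod())

# PCA scores: centered data expressed in the eigenvector basis.
scores = (X - X.mean(axis=0)) @ Q
scores_cov = np.cov(scores, rowvar=False)    # approximately diag(lam)

print(lam)
```

The covariance of the scores is diagonal, with the eigenvalues on the diagonal: the rotated coordinates are uncorrelated and their variances are the \(\lambda_i\).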
Positive Definite Matrices
A symmetric matrix \(\mathbf{A}\) is:
| Type | Condition | Eigenvalues |
|---|---|---|
| Positive definite (\(\mathbf{A} \succ 0\)) | \(\mathbf{x}^T \mathbf{A} \mathbf{x} > 0\) for all \(\mathbf{x} \neq \mathbf{0}\) | All \(\lambda_i > 0\) |
| Positive semi-definite (\(\mathbf{A} \succeq 0\)) | \(\mathbf{x}^T \mathbf{A} \mathbf{x} \geq 0\) for all \(\mathbf{x}\) | All \(\lambda_i \geq 0\) |
Covariance matrices \(\boldsymbol{\Sigma}\) are always positive semi-definite. If no feature is an exact affine function of the others (a linear combination plus a constant), which in particular rules out constant features, then \(\boldsymbol{\Sigma}\) is positive definite and therefore invertible.
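One common numerical test for positive definiteness is to attempt a Cholesky factorization, which exists exactly for symmetric positive definite matrices. The helper name `is_positive_definite` below is hypothetical, and the simulated data are for illustration only.

```python
import numpy as np

def is_positive_definite(A):
    """Return True if the (assumed symmetric) matrix A has a Cholesky factorization."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

pd_matrix = np.array([[2.0, 1.0],
                      [1.0, 2.0]])       # eigenvalues 1 and 3
indefinite = np.array([[1.0, 2.0],
                       [2.0, 1.0]])      # eigenvalues 3 and -1

# Sample covariance when the third feature is the sum of the first two.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))
X = np.column_stack([Z, Z[:, 0] + Z[:, 1]])
Sigma_hat = np.cov(X, rowvar=False)
smallest_eig = np.linalg.eigvalsh(Sigma_hat).min()   # zero up to rounding error

print(is_positive_definite(pd_matrix), is_positive_definite(indefinite), smallest_eig)
```

Note that `np.linalg.cholesky` does not itself verify symmetry, so the check is meaningful only for matrices already known to be symmetric. For a matrix on the boundary, such as `Sigma_hat` here, rounding error can push the result either way, so inspecting the smallest eigenvalue against a tolerance is more informative.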
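Quadratic Forms
A quadratic form associated with a symmetric matrix \(\mathbf{A} \in \mathbb{R}^{n \times n}\) is
\[\mathbf{x}^T \mathbf{A} \mathbf{x} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j\]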
Quadratic forms appear frequently in statistics:
- Sum of squared residuals: \(\text{SSR} = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})\)
- Mahalanobis distance: \((\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\)
- Chi-square statistics: sums of squared standardized residuals
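The sketch below evaluates two of these quadratic forms numerically: a squared Mahalanobis distance and a generic \(\mathbf{x}^T \mathbf{A} \mathbf{x}\), the latter also as an explicit double sum. All numbers are made up for illustration.

```python
import numpy as np

mu = np.array([1.0, 2.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])
x = np.array([3.0, 3.0])

diff = x - mu                                  # equals (2, 1)
# Squared Mahalanobis distance, using a solve instead of an explicit inverse.
d2 = diff @ np.linalg.solve(Sigma, diff)       # 2^2 / 4 + 1^2 / 1

# Generic quadratic form x^T A x and the equivalent double sum of a_ij x_i x_j.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
q_matrix = x @ A @ x
q_sum = sum(A[i, j] * x[i] * x[j] for i in range(2) for j in range(2))

print(d2, q_matrix, q_sum)
```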
Projection Matrices
The orthogonal projection onto the column space of \(\mathbf{X}\) (assuming full column rank) is
\[\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\]
This is the hat matrix in regression, so named because \(\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}\).
Properties:
- \(\mathbf{H}^2 = \mathbf{H}\) (idempotent)
- \(\mathbf{H}^T = \mathbf{H}\) (symmetric)
- \(\text{tr}(\mathbf{H}) = p\), the number of columns of \(\mathbf{X}\) (number of parameters)
- \(\mathbf{I} - \mathbf{H}\) projects onto the orthogonal complement (residual space)
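The listed properties can be confirmed on a small simulated design matrix. This sketch forms the \(n \times n\) hat matrix explicitly, which is fine for illustration with small \(n\) but is an assumption of convenience rather than a recommended computational strategy.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# H = X (X^T X)^{-1} X^T, computed with a solve rather than an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H                     # projection onto the residual space

idempotent_ok = np.allclose(H @ H, H)
symmetric_ok = np.allclose(H, H.T)
trace_H = np.trace(H)                 # equals p up to rounding
y_hat = H @ y                         # fitted values
residuals = M @ y
orthogonal_ok = np.allclose(X.T @ residuals, 0.0)   # residuals are orthogonal to col(X)

print(trace_H)
```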
Matrix Calculus (Quick Reference)
The following derivative identities are used in deriving estimators:
| Expression | Derivative w.r.t. \(\mathbf{x}\) |
|---|---|
| \(\mathbf{a}^T \mathbf{x}\) | \(\mathbf{a}\) |
| \(\mathbf{x}^T \mathbf{A} \mathbf{x}\) | \((\mathbf{A} + \mathbf{A}^T)\mathbf{x}\); if \(\mathbf{A}\) symmetric: \(2\mathbf{A}\mathbf{x}\) |
| \(\mathbf{x}^T \mathbf{x}\) | \(2\mathbf{x}\) |
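A finite-difference check is a simple way to gain confidence in identities like these. The helper `numerical_gradient` below is a hypothetical utility written for this sketch; it compares central-difference gradients against the closed forms in the table, using a deliberately non-symmetric \(\mathbf{A}\).

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation to the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))       # deliberately not symmetric
a = rng.normal(size=3)
x = rng.normal(size=3)

linear_ok = np.allclose(numerical_gradient(lambda v: a @ v, x), a, atol=1e-6)
quadratic_ok = np.allclose(numerical_gradient(lambda v: v @ A @ v, x),
                           (A + A.T) @ x, atol=1e-6)
norm_ok = np.allclose(numerical_gradient(lambda v: v @ v, x), 2.0 * x, atol=1e-6)

print(linear_ok, quadratic_ok, norm_ok)
```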
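Deriving OLS: Minimize \((\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\) with respect to \(\boldsymbol{\beta}\). Expanding the product, differentiating with the identities above, and setting the result to zero gives
\[\frac{\partial}{\partial \boldsymbol{\beta}} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0} \quad\Longrightarrow\quad \mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{y} \quad\Longrightarrow\quad \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]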
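As a numerical companion to this minimization, the sketch below solves the normal equations \(\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}\) on simulated data and cross-checks the result against NumPy's least-squares routine. The data-generating coefficients in `beta_true` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])          # hypothetical coefficients
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The gradient -2 X^T (y - X beta) vanishes at the minimizer.
gradient = -2.0 * X.T @ (y - X @ beta_hat)

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
```

Solving the normal equations mirrors the algebra directly; `np.linalg.lstsq` works from \(\mathbf{X}\) itself and tends to be the more robust choice when \(\mathbf{X}\) is ill-conditioned.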
Notational Conventions Used in This Book
| Symbol | Meaning |
|---|---|
| \(\mathbf{x}, \mathbf{y}, \boldsymbol{\beta}\) | Column vectors (bold lowercase) |
| \(\mathbf{X}, \mathbf{A}, \boldsymbol{\Sigma}\) | Matrices (bold uppercase) |
| \(x_i\), \(a_{ij}\) | Scalar entries (plain lowercase) |
| \(\mathbf{I}_n\) | \(n \times n\) identity matrix |
| \(\mathbf{1}_n\) | \(n\)-vector of ones |
| \(\mathbf{0}\) | Zero vector or matrix (size from context) |
| \(\mathbf{A}^T\) | Transpose |
| \(\mathbf{A}^{-1}\) | Inverse |
| \(\text{tr}(\mathbf{A})\) | Trace |
| \(\det(\mathbf{A})\) or \(\lvert\mathbf{A}\rvert\) | Determinant |
| \(\text{diag}(\cdot)\) | Diagonal matrix |
| \(\lVert \mathbf{x} \rVert\) | Euclidean (\(\ell_2\)) norm, unless otherwise specified |
Summary
| Concept | Where It Appears |
|---|---|
| Vectors and dot products | Feature vectors, inner products in regression |
| Matrix multiplication | Design matrix operations, covariance computation |
| Inverse | OLS formula \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\) |
| Eigendecomposition | PCA, spectral properties of covariance matrices |
| Positive definiteness | Covariance matrices, quadratic form positivity |
| Projection matrices | Hat matrix, residual decomposition in regression |
| Matrix calculus | Deriving MLE and least-squares estimators |