Linear Algebra Notation and Conventions

Linear algebra is the language of multivariate statistics. Regression, principal component analysis, ANOVA decompositions, and multivariate distributions all rely on matrix and vector operations. This section establishes the notation used throughout the book and reviews the essential results.

Vectors

Notation

Vectors are denoted by lowercase bold letters. Unless stated otherwise, all vectors are column vectors:

\[ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n \]

A row vector is written as the transpose \(\mathbf{x}^T = (x_1, x_2, \dots, x_n)\).

Basic Operations

| Operation | Notation | Result |
|---|---|---|
| Scalar multiplication | \(c\mathbf{x}\) | \((cx_1, \dots, cx_n)^T\) |
| Addition | \(\mathbf{x} + \mathbf{y}\) | \((x_1 + y_1, \dots, x_n + y_n)^T\) |
| Dot (inner) product | \(\mathbf{x}^T \mathbf{y}\) | \(\sum_{i=1}^n x_i y_i \in \mathbb{R}\) |
| Euclidean norm | \(\lVert \mathbf{x} \rVert\) | \(\sqrt{\mathbf{x}^T \mathbf{x}}\) |
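
As a minimal sketch, the four operations can be tried out numerically. The example assumes NumPy (the text does not prescribe a library), and the vectors `x` and `y` are illustrative values only.

```python
import numpy as np

# Illustrative vectors in R^3 (values chosen for the example only).
x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, -1.0])

scaled = 2.0 * x          # scalar multiplication c*x
total = x + y             # elementwise addition x + y
dot = x @ y               # inner product x^T y = sum_i x_i y_i
norm = np.sqrt(x @ x)     # Euclidean norm sqrt(x^T x)

print(scaled, total, dot, norm)
```

NumPy's one-dimensional arrays do not distinguish row from column vectors, so `x @ y` plays the role of \(\mathbf{x}^T \mathbf{y}\) here.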

Statistical Interpretation

In data analysis, an observation vector \(\mathbf{x}_i \in \mathbb{R}^p\) holds the \(p\) measured features of the \(i\)-th observation. Stacking the \(n\) transposed observations \(\mathbf{x}_1^T, \dots, \mathbf{x}_n^T\) as rows produces the design matrix \(\mathbf{X} \in \mathbb{R}^{n \times p}\).
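
A small sketch of this stacking, assuming NumPy; the three observations and their feature values are hypothetical.

```python
import numpy as np

# Three hypothetical observations, each with p = 2 features.
x1 = np.array([5.1, 3.5])
x2 = np.array([4.9, 3.0])
x3 = np.array([6.2, 2.9])

# Each observation becomes one row of X, so X has shape (n, p) = (3, 2).
X = np.vstack([x1, x2, x3])
print(X.shape)
```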

Matrices

Notation

Matrices are denoted by uppercase bold letters:

\[ \mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n} \]

The element in row \(i\), column \(j\) is \(a_{ij}\) or \([\mathbf{A}]_{ij}\).

Special Matrices

| Matrix | Notation | Definition |
|---|---|---|
| Identity | \(\mathbf{I}_n\) | \([\mathbf{I}]_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}\) |
| Zero matrix | \(\mathbf{0}\) | All entries zero |
| Diagonal | \(\text{diag}(d_1, \dots, d_n)\) | \(a_{ii} = d_i\) and \(a_{ij} = 0\) for \(i \neq j\) |
| Symmetric | \(\mathbf{A} = \mathbf{A}^T\) | \(a_{ij} = a_{ji}\) |
| Ones vector | \(\mathbf{1}_n\) | \((1, 1, \dots, 1)^T \in \mathbb{R}^n\) |

Matrix Operations

Multiplication

For \(\mathbf{A} \in \mathbb{R}^{m \times k}\) and \(\mathbf{B} \in \mathbb{R}^{k \times n}\):

\[ [\mathbf{AB}]_{ij} = \sum_{\ell=1}^{k} a_{i\ell}\, b_{\ell j} \]

The product \(\mathbf{AB}\) is an \(m \times n\) matrix. Matrix multiplication is not commutative: even when both \(\mathbf{AB}\) and \(\mathbf{BA}\) are defined and have the same shape, in general \(\mathbf{AB} \neq \mathbf{BA}\).
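
The summation formula and the failure of commutativity can be checked on a small case. This is a sketch assuming NumPy, with two illustrative \(2 \times 2\) matrices.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

AB = A @ B   # matrix product, not the elementwise product A * B
BA = B @ A

# Entry (i, j) of AB computed directly from the summation formula
# (indices are zero-based in Python).
i, j = 0, 1
entry = sum(A[i, l] * B[l, j] for l in range(A.shape[1]))

print(AB)
print(BA)
print(entry == AB[i, j])
```

Here the two products differ, which is the typical situation for arbitrary square matrices.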

Transpose

\[ [\mathbf{A}^T]_{ij} = a_{ji} \]

Properties:

  • \((\mathbf{A}^T)^T = \mathbf{A}\)
  • \((\mathbf{AB})^T = \mathbf{B}^T \mathbf{A}^T\)
  • \((\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T\)
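
A quick numerical check of the three transpose properties, assuming NumPy. The random matrices are illustrative; the shapes are chosen so that every product and sum is defined.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the example is reproducible
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(2, 3))      # same shape as A, so A + C is defined

double_transpose = np.allclose(A.T.T, A)
product_rule = np.allclose((A @ B).T, B.T @ A.T)   # note the reversed order
sum_rule = np.allclose((A + C).T, A.T + C.T)

print(double_transpose, product_rule, sum_rule)
```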

Trace

For a square matrix \(\mathbf{A} \in \mathbb{R}^{n \times n}\):

\[ \text{tr}(\mathbf{A}) = \sum_{i=1}^n a_{ii} \]

Properties:

  • \(\text{tr}(\mathbf{A} + \mathbf{B}) = \text{tr}(\mathbf{A}) + \text{tr}(\mathbf{B})\)
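  • \(\text{tr}(\mathbf{AB}) = \text{tr}(\mathbf{BA})\) (cyclic property; holds whenever both products are defined, even if \(\mathbf{A}\) and \(\mathbf{B}\) are not square)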
  • \(\text{tr}(c\mathbf{A}) = c\,\text{tr}(\mathbf{A})\)

The trace appears in expressions for the sum of squared residuals and in the expected value of quadratic forms.
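
The trace properties can be verified numerically. This is a sketch assuming NumPy; the matrices are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
c = 2.5

additive = np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
cyclic = np.isclose(np.trace(A @ B), np.trace(B @ A))
scaling = np.isclose(np.trace(c * A), c * np.trace(A))

# The cyclic property also holds for rectangular factors as long as both
# products are defined: C is 2x4 and D is 4x2, so CD is 2x2 and DC is 4x4.
C = rng.normal(size=(2, 4))
D = rng.normal(size=(4, 2))
cyclic_rect = np.isclose(np.trace(C @ D), np.trace(D @ C))

print(additive, cyclic, scaling, cyclic_rect)
```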

Determinant

The determinant of a square matrix \(\mathbf{A}\), written \(\det(\mathbf{A})\) or \(|\mathbf{A}|\), is a scalar that is nonzero exactly when \(\mathbf{A}\) is invertible.

For a \(2 \times 2\) matrix:

\[ \det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc \]

Properties:

  • \(\det(\mathbf{AB}) = \det(\mathbf{A})\det(\mathbf{B})\)
  • \(\det(\mathbf{A}^T) = \det(\mathbf{A})\)
  • \(\det(c\mathbf{A}) = c^n \det(\mathbf{A})\) for \(\mathbf{A} \in \mathbb{R}^{n \times n}\)
  • \(\mathbf{A}\) is invertible iff \(\det(\mathbf{A}) \neq 0\)

The determinant appears in the density of the multivariate normal distribution: \(f(\mathbf{x}) \propto |\boldsymbol{\Sigma}|^{-1/2} \exp\!\bigl(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\bigr)\).
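
The sketch below checks the determinant properties on small matrices and evaluates the unnormalized normal density shown above. It assumes NumPy; `mvn_kernel` is a hypothetical helper name, and it omits the normalizing constant because the text states the density only up to proportionality.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 4.0],
              [0.0, 2.0]])
c, n = 3.0, 2

det_A = np.linalg.det(A)   # ad - bc = 2*3 - 1*1 = 5
det_B = np.linalg.det(B)   # 1*2 - 4*0 = 2

product_rule = np.isclose(np.linalg.det(A @ B), det_A * det_B)
transpose_rule = np.isclose(np.linalg.det(A.T), det_A)
scaling_rule = np.isclose(np.linalg.det(c * A), c**n * det_A)

def mvn_kernel(x, mu, Sigma):
    """Unnormalized multivariate normal density |Sigma|^(-1/2) exp(-q/2)."""
    diff = x - mu
    # q = (x - mu)^T Sigma^{-1} (x - mu), via a linear solve.
    q = diff @ np.linalg.solve(Sigma, diff)
    return np.linalg.det(Sigma) ** -0.5 * np.exp(-0.5 * q)

mu = np.zeros(2)
peak = mvn_kernel(mu, mu, A)   # at x = mu the exponent is zero

print(product_rule, transpose_rule, scaling_rule, peak)
```

Here `A` doubles as a covariance matrix because it is symmetric with positive eigenvalues.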

Inverse

The inverse of a square matrix \(\mathbf{A}\) (when it exists) satisfies:

\[ \mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I} \]

Properties:

  • \((\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}\)
  • \((\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T\)
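
A short numerical check of the defining identity and both properties, assuming NumPy and two illustrative invertible matrices.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # det = 10, so A is invertible
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])   # det = 1, so B is invertible

A_inv = np.linalg.inv(A)
B_inv = np.linalg.inv(B)

identity_check = (np.allclose(A_inv @ A, np.eye(2))
                  and np.allclose(A @ A_inv, np.eye(2)))
product_rule = np.allclose(np.linalg.inv(A @ B), B_inv @ A_inv)  # reversed order
transpose_rule = np.allclose(np.linalg.inv(A.T), A_inv.T)

print(identity_check, product_rule, transpose_rule)
```

In numerical work it is usually preferable to solve a linear system with `np.linalg.solve` rather than form the inverse explicitly; the explicit inverse is used here only to mirror the formulas.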

Rank and Linear Independence

  • A set of vectors \(\{\mathbf{v}_1, \dots, \mathbf{v}_k\}\) is linearly independent if no vector can be written as a linear combination of the others.
  • The column rank of \(\mathbf{A}\) is the maximum number of linearly independent columns. For any matrix, column rank equals row rank, so we simply say rank.
  • \(\mathbf{A} \in \mathbb{R}^{n \times n}\) is full rank iff \(\text{rank}(\mathbf{A}) = n\) iff \(\mathbf{A}\) is invertible.

In regression, the design matrix \(\mathbf{X}\) must have full column rank for the OLS estimator \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\) to exist. Multicollinearity (Chapter 12) is precisely the situation where \(\mathbf{X}\) is close to rank-deficient.
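
To illustrate rank deficiency, the sketch below builds a hypothetical design matrix whose third column is the sum of the first two, then drops the redundant column. It assumes NumPy; `matrix_rank` determines rank numerically from singular values.

```python
import numpy as np

# Hypothetical design matrix: column 3 = column 1 + column 2,
# so the columns are linearly dependent.
X_bad = np.array([[1.0, 2.0, 3.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 5.0, 6.0],
                  [1.0, 1.0, 2.0]])
X_ok = X_bad[:, :2]   # dropping the redundant column restores full column rank

rank_bad = np.linalg.matrix_rank(X_bad)   # 2, fewer than the 3 columns
rank_ok = np.linalg.matrix_rank(X_ok)     # 2, equal to the 2 columns

print(rank_bad, rank_ok)
```

With `X_bad`, the matrix \(\mathbf{X}^T\mathbf{X}\) is singular, so the OLS formula cannot be evaluated; with `X_ok` it can.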

Eigenvalues and Eigenvectors

For a square matrix \(\mathbf{A} \in \mathbb{R}^{n \times n}\), a scalar \(\lambda\) and nonzero vector \(\mathbf{v}\) satisfying

\[ \mathbf{A}\mathbf{v} = \lambda \mathbf{v} \]

are an eigenvalue–eigenvector pair.

Properties for Symmetric Matrices

When \(\mathbf{A} = \mathbf{A}^T\) (which includes covariance matrices):

  • All eigenvalues are real.
  • Eigenvectors corresponding to distinct eigenvalues are orthogonal.
  • \(\mathbf{A}\) admits the spectral decomposition: \(\mathbf{A} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T\), where \(\mathbf{Q}\) is orthogonal and \(\boldsymbol{\Lambda} = \text{diag}(\lambda_1, \dots, \lambda_n)\).
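
A minimal sketch of the spectral decomposition for a small symmetric matrix, assuming NumPy. `np.linalg.eigh` is the routine intended for symmetric (Hermitian) input.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns real eigenvalues in ascending order and orthonormal
# eigenvectors as the columns of Q.
eigvals, Q = np.linalg.eigh(A)
Lam = np.diag(eigvals)

reconstructed = Q @ Lam @ Q.T                       # should equal A
orthogonal = np.allclose(Q.T @ Q, np.eye(2))        # Q^T Q = I
pair_ok = np.allclose(A @ Q[:, 0], eigvals[0] * Q[:, 0])   # A v = lambda v

print(eigvals, orthogonal, pair_ok)
```

For this matrix the eigenvalues are 1 and 3.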

Connection to Statistics

  • \(\text{tr}(\mathbf{A}) = \sum_i \lambda_i\) and \(\det(\mathbf{A}) = \prod_i \lambda_i\).
  • A symmetric matrix is positive definite (\(\mathbf{A} \succ 0\)) iff all eigenvalues are strictly positive. Covariance matrices are positive semi-definite.
  • Principal Component Analysis rotates data into the eigenvector basis of the sample covariance matrix.
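
The following sketch ties these three points together on simulated data. It assumes NumPy; the data-generating matrix and the sample size are hypothetical, and this is an illustration of the eigenvector rotation rather than a full PCA implementation.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical data: n = 200 observations of p = 3 correlated features.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.3, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

Xc = X - X.mean(axis=0)              # center each column
S = Xc.T @ Xc / (X.shape[0] - 1)     # sample covariance matrix (p x p)

eigvals, Q = np.linalg.eigh(S)       # ascending eigenvalues, orthonormal Q
order = np.argsort(eigvals)[::-1]    # largest variance first
eigvals, Q = eigvals[order], Q[:, order]

scores = Xc @ Q                      # data expressed in the eigenvector basis
S_scores = scores.T @ scores / (X.shape[0] - 1)

trace_ok = np.isclose(np.trace(S), eigvals.sum())
det_ok = np.isclose(np.linalg.det(S), eigvals.prod())
diagonal_ok = np.allclose(S_scores, np.diag(eigvals))

print(trace_ok, det_ok, diagonal_ok)
```

After the rotation the components are uncorrelated, and their variances are the eigenvalues of the sample covariance matrix.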

Positive Definite Matrices

A symmetric matrix \(\mathbf{A}\) is:

| Type | Condition | Eigenvalues |
|---|---|---|
| Positive definite (\(\mathbf{A} \succ 0\)) | \(\mathbf{x}^T \mathbf{A} \mathbf{x} > 0\) for all \(\mathbf{x} \neq \mathbf{0}\) | All \(\lambda_i > 0\) |
| Positive semi-definite (\(\mathbf{A} \succeq 0\)) | \(\mathbf{x}^T \mathbf{A} \mathbf{x} \geq 0\) for all \(\mathbf{x}\) | All \(\lambda_i \geq 0\) |

Covariance matrices \(\boldsymbol{\Sigma}\) are always positive semi-definite. If no nontrivial linear combination of the features is constant (equivalently, no feature is an exact affine function of the others), \(\boldsymbol{\Sigma}\) is positive definite and therefore invertible.
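
The sketch below contrasts the two cases on simulated data. It assumes NumPy; `is_positive_definite` is a hypothetical helper, and its tolerance `tol` is an arbitrary choice for absorbing floating-point error.

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """Hypothetical helper: True if symmetric A has all eigenvalues > tol."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

rng = np.random.default_rng(7)
Z = rng.normal(size=(100, 2))   # two features with no exact dependence

# Add a third feature that is an exact linear combination of the first two.
X_dep = np.column_stack([Z, Z[:, 0] + Z[:, 1]])

S_indep = np.cov(Z, rowvar=False)      # 2 x 2 sample covariance
S_dep = np.cov(X_dep, rowvar=False)    # 3 x 3, rank 2

print(is_positive_definite(S_indep))   # positive definite
print(is_positive_definite(S_dep))     # only positive semi-definite
print(np.linalg.matrix_rank(S_dep))
```

The second covariance matrix has a (numerically) zero eigenvalue, so it is positive semi-definite but not invertible.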

Quadratic Forms

A quadratic form associated with a symmetric matrix \(\mathbf{A}\) is

\[ Q(\mathbf{x}) = \mathbf{x}^T \mathbf{A} \mathbf{x} = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j \]

Quadratic forms appear frequently in statistics:

  • Sum of squared residuals: \(\text{SSR} = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})\), a quadratic form with \(\mathbf{A} = \mathbf{I}\)
  • Squared Mahalanobis distance: \((\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\)
  • Chi-square statistics: sums of squared standardized residuals
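
A small worked sketch, assuming NumPy: it evaluates a quadratic form both as a matrix product and as the double sum, then computes a squared Mahalanobis distance. All numerical values are illustrative.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # symmetric
x = np.array([2.0, 1.0])

q_matrix = x @ A @ x         # x^T A x
q_sum = sum(A[i, j] * x[i] * x[j]
            for i in range(2) for j in range(2))   # double-sum form

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])
diff = x - mu
# Solve Sigma z = diff rather than forming Sigma^{-1} explicitly.
d2 = diff @ np.linalg.solve(Sigma, diff)   # squared Mahalanobis distance

print(q_matrix, q_sum, d2)
```

With a diagonal \(\boldsymbol{\Sigma}\), the squared Mahalanobis distance reduces to a sum of squared standardized deviations, here \(2^2/4 + 1^2/1 = 2\).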

Projection Matrices

The orthogonal projection onto the column space of \(\mathbf{X}\) (assuming full column rank) is

\[ \mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \]

This is the hat matrix in regression, so named because \(\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}\).

Properties:

  • \(\mathbf{H}^2 = \mathbf{H}\) (idempotent)
  • \(\mathbf{H}^T = \mathbf{H}\) (symmetric)
  • \(\text{tr}(\mathbf{H}) = p\), the number of columns of \(\mathbf{X} \in \mathbb{R}^{n \times p}\) (the number of regression parameters)
  • \(\mathbf{I} - \mathbf{H}\) projects onto the orthogonal complement (residual space)
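
These properties can be checked on a small simulated design matrix. The sketch assumes NumPy; the dimensions `n` and `p` and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 8, 3
# Intercept column plus two random features; full column rank in practice.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# H = X (X^T X)^{-1} X^T, formed with a linear solve instead of an inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)

y_hat = H @ y                  # fitted values
resid = (np.eye(n) - H) @ y    # residuals, in the orthogonal complement

idempotent = np.allclose(H @ H, H)
symmetric = np.allclose(H, H.T)
trace_is_p = np.isclose(np.trace(H), p)
orthogonal = np.allclose(X.T @ resid, 0.0)   # residuals orthogonal to col(X)

print(idempotent, symmetric, trace_is_p, orthogonal)
```

Forming the \(n \times n\) matrix `H` explicitly is fine for illustration but is rarely done for large \(n\).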

Matrix Calculus (Quick Reference)

The following derivative identities are used in deriving estimators:

| Expression | Derivative w.r.t. \(\mathbf{x}\) |
|---|---|
| \(\mathbf{a}^T \mathbf{x}\) | \(\mathbf{a}\) |
| \(\mathbf{x}^T \mathbf{A} \mathbf{x}\) | \((\mathbf{A} + \mathbf{A}^T)\mathbf{x}\); if \(\mathbf{A}\) symmetric: \(2\mathbf{A}\mathbf{x}\) |
| \(\mathbf{x}^T \mathbf{x}\) | \(2\mathbf{x}\) |
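
One way to build confidence in these identities is to compare them with finite differences. The sketch below assumes NumPy; `numerical_gradient` is a hypothetical helper using central differences, and the step size `h` is an arbitrary choice.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation to the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

rng = np.random.default_rng(5)
a = rng.normal(size=3)
A = rng.normal(size=(3, 3))   # deliberately not symmetric
x = rng.normal(size=3)

g_lin = numerical_gradient(lambda v: a @ v, x)        # expect a
g_quad = numerical_gradient(lambda v: v @ A @ v, x)   # expect (A + A^T) x
g_sq = numerical_gradient(lambda v: v @ v, x)         # expect 2 x

print(np.allclose(g_lin, a, atol=1e-5))
print(np.allclose(g_quad, (A + A.T) @ x, atol=1e-5))
print(np.allclose(g_sq, 2 * x, atol=1e-5))
```

Using a non-symmetric `A` shows why the general formula involves \(\mathbf{A} + \mathbf{A}^T\) rather than \(2\mathbf{A}\).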

Deriving OLS: Minimize \((\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\) with respect to \(\boldsymbol{\beta}\):

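\[ \frac{\partial}{\partial \boldsymbol{\beta}}\bigl[\mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\bigr] = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0} \]

Rearranging gives the normal equations \(\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}\). When \(\mathbf{X}\) has full column rank, \(\mathbf{X}^T\mathbf{X}\) is invertible and positive definite, so the stationary point is the unique minimizer:

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \]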
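
As a sketch of this derivation in code, the example below solves for \(\hat{\boldsymbol{\beta}}\) on simulated data and confirms that the gradient vanishes there. It assumes NumPy; the coefficients in `beta_true`, the noise level, and the sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])   # hypothetical coefficients
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Solve (X^T X) beta = X^T y rather than inverting X^T X explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The gradient -2 X^T y + 2 X^T X beta vanishes at the solution.
grad = -2 * X.T @ y + 2 * X.T @ X @ beta_hat

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(np.allclose(grad, 0.0, atol=1e-8))
print(np.allclose(beta_hat, beta_lstsq))
```

The closed form is convenient for derivations; in practice a routine such as `np.linalg.lstsq` is generally more robust when \(\mathbf{X}\) is close to rank-deficient.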

Notational Conventions Used in This Book

| Symbol | Meaning |
|---|---|
| \(\mathbf{x}, \mathbf{y}, \boldsymbol{\beta}\) | Column vectors (bold lowercase) |
| \(\mathbf{X}, \mathbf{A}, \boldsymbol{\Sigma}\) | Matrices (bold uppercase) |
| \(x_i\), \(a_{ij}\) | Scalar entries (plain lowercase) |
| \(\mathbf{I}_n\) | \(n \times n\) identity matrix |
| \(\mathbf{1}_n\) | \(n\)-vector of ones |
| \(\mathbf{0}\) | Zero vector or matrix (size from context) |
| \(\mathbf{A}^T\) | Transpose |
| \(\mathbf{A}^{-1}\) | Inverse |
| \(\text{tr}(\mathbf{A})\) | Trace |
| \(\det(\mathbf{A})\) or \(\lvert\mathbf{A}\rvert\) | Determinant |
| \(\text{diag}(\cdot)\) | Diagonal matrix |
| \(\lVert \mathbf{x} \rVert\) | Euclidean (\(\ell_2\)) norm, unless otherwise specified |

Summary

| Concept | Where It Appears |
|---|---|
| Vectors and dot products | Feature vectors, inner products in regression |
| Matrix multiplication | Design matrix operations, covariance computation |
| Inverse | OLS formula \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\) |
| Eigendecomposition | PCA, spectral properties of covariance matrices |
| Positive definiteness | Covariance matrices, quadratic form positivity |
| Projection matrices | Hat matrix, residual decomposition in regression |
| Matrix calculus | Deriving MLE and least-squares estimators |