Performance Tips¶
Best practices for sparse matrix performance.
Mental Model
Sparse performance boils down to three rules: convert to the right format once up front, never access individual elements in a loop, and avoid operations that destroy sparsity. Most performance problems come from repeatedly converting formats inside loops or from treating sparse matrices like dense NumPy arrays with element-wise access.
1. Convert Once¶
```python from scipy import sparse
Bad: repeated conversion¶
for i in range(100): csr = coo.tocsr() result = csr @ x
Good: convert once¶
csr = coo.tocsr() for i in range(100): result = csr @ x ```
2. Use Right Format¶
| Operation | Best Format |
|---|---|
| A @ x | CSR |
| A[:, j] | CSC |
| A[i, :] | CSR |
| Build | LIL, COO |
3. Avoid Element Access¶
```python
Bad: element-by-element¶
for i in range(n): for j in range(n): A[i, j] = func(i, j)
Good: build with COO¶
rows, cols, data = [], [], [] for i in range(n): for j in range(n): if (val := func(i, j)) != 0: rows.append(i) cols.append(j) data.append(val) A = sparse.coo_matrix((data, (rows, cols))) ```
4. Preallocate When Possible¶
Use lil_matrix for incremental builds, then convert.
5. Monitor Fill-in¶
Matrix products can dramatically increase density.
Exercises¶
Exercise 1.
Create a \(500 \times 500\) sparse matrix using COO format (density 0.02, np.random.seed(9)). Measure the time for 200 matrix-vector products when: (a) converting COO to CSR once and reusing, versus (b) converting COO to CSR in every iteration. Print both times and the speedup factor.
Solution to Exercise 1
import numpy as np
from scipy import sparse
import time
np.random.seed(9)
coo = sparse.random(500, 500, density=0.02, format='coo')
x = np.random.randn(500)
# (a) Convert once
csr = coo.tocsr()
start = time.perf_counter()
for _ in range(200):
y = csr @ x
t_once = time.perf_counter() - start
# (b) Convert every time
start = time.perf_counter()
for _ in range(200):
csr_temp = coo.tocsr()
y = csr_temp @ x
t_every = time.perf_counter() - start
print(f"Convert once: {t_once:.4f} sec")
print(f"Convert every time: {t_every:.4f} sec")
print(f"Speedup: {t_every / t_once:.1f}x")
Exercise 2.
Compare two approaches for building a \(1000 \times 1000\) tridiagonal matrix: (a) element-by-element insertion in a CSR matrix, (b) building with sparse.diags. Time both methods and print the results. Explain why approach (b) is faster.
Solution to Exercise 2
import numpy as np
from scipy import sparse
import time
n = 1000
# (a) Element-by-element with LIL (CSR is worse for insertion)
start = time.perf_counter()
lil = sparse.lil_matrix((n, n))
for i in range(n):
lil[i, i] = 2
if i > 0:
lil[i, i - 1] = -1
if i < n - 1:
lil[i, i + 1] = -1
A1 = lil.tocsr()
t_loop = time.perf_counter() - start
# (b) Using sparse.diags
start = time.perf_counter()
A2 = sparse.diags([-1, 2, -1], [-1, 0, 1],
shape=(n, n), format='csr')
t_diags = time.perf_counter() - start
print(f"Element-by-element: {t_loop:.4f} sec")
print(f"sparse.diags: {t_diags:.4f} sec")
print(f"Speedup: {t_loop / t_diags:.1f}x")
Exercise 3.
Create a \(2000 \times 2000\) sparse matrix with density 0.005 (use np.random.seed(1)). Compute \(A^T A\) (a common operation in least squares). Measure the time and print the density of both \(A\) and \(A^T A\). Suggest when it would be better to use the normal equations versus QR factorization based on the fill-in observed.
Solution to Exercise 3
import numpy as np
from scipy import sparse
import time
np.random.seed(1)
n = 2000
A = sparse.random(n, n, density=0.005, format='csr')
start = time.perf_counter()
ATA = A.T @ A
elapsed = time.perf_counter() - start
print(f"A density: {A.nnz / n**2:.4%}")
print(f"A^T A density: {ATA.nnz / n**2:.4%}")
print(f"Time for A^T A: {elapsed:.4f} sec")
print(f"Fill-in factor: {ATA.nnz / A.nnz:.1f}x")