Skip to content

Format Selection Guide

Choose the right sparse format for your use case.

Mental Model

Choosing a sparse format is like choosing the right data structure: COO and LIL are for building, CSR is for row-wise operations and matrix-vector products, CSC is for column-wise access and direct solvers, and DIA is for banded matrices. Pick the format that matches your dominant operation, and convert only once.

Decision Tree

1. Building a Matrix?

  • Use LIL or COO for construction
  • Convert to CSR/CSC before computation

2. Row Operations?

  • Use CSR for row slicing and matvec

3. Column Operations?

  • Use CSC for column slicing

4. Banded Matrix?

  • Use DIA for diagonal/banded structures

5. Solving Linear Systems?

  • Use CSC for direct solvers
  • Use CSR for iterative solvers

Quick Reference

Task Recommended Format
Build incrementally LIL, COO
Matrix-vector product CSR
Row slicing CSR
Column slicing CSC
Direct solve (splu) CSC
Iterative solve (cg) CSR
Banded matrices DIA
Format conversion COO

Workflow

```python from scipy import sparse

def main(): # 1. Build with LIL lil = sparse.lil_matrix((1000, 1000)) # ... add entries ...

# 2. Convert to CSR for computation
csr = lil.tocsr()

# 3. Use CSR for all operations
# ... matrix operations ...

if name == "main": main() ```


Exercises

Exercise 1. You have a banded matrix with 5 diagonals. Create it using sparse.diags in DIA format, then convert it to CSR for matrix-vector products and to CSC for a direct solve. Print the format type at each stage and verify the dense representations are identical.

Solution to Exercise 1
import numpy as np
from scipy import sparse

n = 50
dia = sparse.diags([1, -2, 6, -2, 1], [-2, -1, 0, 1, 2],
                    shape=(n, n), format='dia')
print(f"DIA format: {type(dia)}")

csr = dia.tocsr()
print(f"CSR format: {type(csr)}")

csc = dia.tocsc()
print(f"CSC format: {type(csc)}")

assert np.allclose(dia.toarray(), csr.toarray())
assert np.allclose(dia.toarray(), csc.toarray())
print("All formats produce identical dense arrays.")

Exercise 2. Build a \(200 \times 200\) sparse matrix incrementally: for each row \(i\), set \(A[i, i] = 10\) and \(A[i, j] = -1\) for \(j\) in a random subset of 3 other columns (use np.random.seed(5)). Use LIL format for construction, then convert to CSR. Print the total nonzeros and verify the matrix is diagonally dominant.

Solution to Exercise 2
import numpy as np
from scipy import sparse

np.random.seed(5)
n = 200
lil = sparse.lil_matrix((n, n))
for i in range(n):
    lil[i, i] = 10
    others = np.random.choice([j for j in range(n) if j != i],
                               size=3, replace=False)
    for j in others:
        lil[i, j] = -1

csr = lil.tocsr()
print(f"Nonzeros: {csr.nnz}")

# Check diagonal dominance
dense = csr.toarray()
diag_dominant = True
for i in range(n):
    if abs(dense[i, i]) <= np.sum(np.abs(dense[i, :])) - abs(dense[i, i]):
        diag_dominant = False
        break
print(f"Diagonally dominant: {diag_dominant}")

Exercise 3. Create a \(1000 \times 1000\) sparse matrix in COO format from random triplets (use np.random.seed(0), 5000 entries). Convert to CSR and CSC. Measure the time for 100 row slices A[i, :] using CSR versus CSC, and 100 column slices A[:, j] using CSR versus CSC. Print the times to demonstrate which format is faster for each operation.

Solution to Exercise 3
import numpy as np
from scipy import sparse
import time

np.random.seed(0)
n = 1000
rows = np.random.randint(0, n, 5000)
cols = np.random.randint(0, n, 5000)
data = np.random.randn(5000)
coo = sparse.coo_matrix((data, (rows, cols)), shape=(n, n))

csr = coo.tocsr()
csc = coo.tocsc()

# Row slicing
start = time.perf_counter()
for i in range(100):
    _ = csr[i, :]
t_row_csr = time.perf_counter() - start

start = time.perf_counter()
for i in range(100):
    _ = csc[i, :]
t_row_csc = time.perf_counter() - start

# Column slicing
start = time.perf_counter()
for j in range(100):
    _ = csr[:, j]
t_col_csr = time.perf_counter() - start

start = time.perf_counter()
for j in range(100):
    _ = csc[:, j]
t_col_csc = time.perf_counter() - start

print(f"Row slice:  CSR={t_row_csr:.4f}s, CSC={t_row_csc:.4f}s")
print(f"Col slice:  CSR={t_col_csr:.4f}s, CSC={t_col_csc:.4f}s")