Skip to content

Storage Formats

SciPy provides multiple sparse matrix formats optimized for different operations.

Mental Model

A sparse format answers: "How do I store only the nonzero entries and still know where they belong?" Each format encodes position differently -- CSR groups by row, CSC by column, COO stores explicit (row, col, value) triplets. The storage scheme determines which operations are fast and which are slow, so understanding the internal arrays is key to predicting performance.

CSR (Compressed Sparse Row)

1. Structure

```python import numpy as np from scipy import sparse

def main(): A = np.array([[1, 0, 2], [0, 0, 3], [4, 5, 6]])

csr = sparse.csr_matrix(A)

print("Dense matrix:")
print(A)
print()
print("CSR representation:")
print(f"data:    {csr.data}")      # Nonzero values
print(f"indices: {csr.indices}")   # Column indices
print(f"indptr:  {csr.indptr}")    # Row pointers

if name == "main": main() ```

2. Best For

  • Fast row slicing
  • Fast matrix-vector products
  • Efficient arithmetic operations

3. Row Access

```python import numpy as np from scipy import sparse

def main(): A = sparse.random(1000, 1000, density=0.01, format='csr')

# Fast row slicing
row_5 = A[5, :]
rows_10_20 = A[10:20, :]

print(f"Row 5 nonzeros: {row_5.nnz}")
print(f"Rows 10-20 shape: {rows_10_20.shape}")

if name == "main": main() ```

CSC (Compressed Sparse Column)

1. Structure

```python import numpy as np from scipy import sparse

def main(): A = np.array([[1, 0, 2], [0, 0, 3], [4, 5, 6]])

csc = sparse.csc_matrix(A)

print("CSC representation:")
print(f"data:    {csc.data}")      # Nonzero values
print(f"indices: {csc.indices}")   # Row indices
print(f"indptr:  {csc.indptr}")    # Column pointers

if name == "main": main() ```

2. Best For

  • Fast column slicing
  • Efficient column operations
  • Some sparse solvers

3. Column Access

```python import numpy as np from scipy import sparse

def main(): A = sparse.random(1000, 1000, density=0.01, format='csc')

# Fast column slicing
col_5 = A[:, 5]
cols_10_20 = A[:, 10:20]

print(f"Column 5 nonzeros: {col_5.nnz}")
print(f"Columns 10-20 shape: {cols_10_20.shape}")

if name == "main": main() ```

COO (Coordinate)

1. Structure

```python import numpy as np from scipy import sparse

def main(): A = np.array([[1, 0, 2], [0, 0, 3], [4, 5, 6]])

coo = sparse.coo_matrix(A)

print("COO representation:")
print(f"data: {coo.data}")
print(f"row:  {coo.row}")
print(f"col:  {coo.col}")

if name == "main": main() ```

2. Best For

  • Fast construction
  • Converting between formats
  • Adding entries incrementally

3. Building Matrices

```python import numpy as np from scipy import sparse

def main(): # Build from triplets row = np.array([0, 0, 1, 2, 2, 2]) col = np.array([0, 2, 2, 0, 1, 2]) data = np.array([1, 2, 3, 4, 5, 6])

coo = sparse.coo_matrix((data, (row, col)), shape=(3, 3))

print("Built from triplets:")
print(coo.toarray())

if name == "main": main() ```

LIL (List of Lists)

1. Structure

```python import numpy as np from scipy import sparse

def main(): lil = sparse.lil_matrix((3, 3)) lil[0, 0] = 1 lil[0, 2] = 2 lil[1, 2] = 3 lil[2, :] = [4, 5, 6]

print("LIL matrix:")
print(lil.toarray())
print()
print("Internal rows:", lil.rows)
print("Internal data:", lil.data)

if name == "main": main() ```

2. Best For

  • Incremental construction
  • Modifying sparsity pattern
  • Building row by row

DIA (Diagonal)

1. Structure

```python import numpy as np from scipy import sparse

def main(): # Tridiagonal matrix diags = [[-1, -1, -1, -1], [2, 2, 2, 2, 2], [-1, -1, -1, -1]] offsets = [-1, 0, 1]

dia = sparse.diags(diags, offsets, shape=(5, 5), format='dia')

print("Diagonal matrix:")
print(dia.toarray())

if name == "main": main() ```

2. Best For

  • Banded matrices
  • PDE discretizations
  • Memory-efficient for diagonals

Format Comparison

1. Summary Table

Format Construction Arithmetic Row Slice Col Slice
CSR Slow Fast Fast Slow
CSC Slow Fast Slow Fast
COO Fast Slow Slow Slow
LIL Fast Slow Fast Slow
DIA Fast Fast Slow Slow

2. Typical Workflow

```python import numpy as np from scipy import sparse

def main(): n = 1000

# 1. Build with LIL or COO
lil = sparse.lil_matrix((n, n))
for i in range(n):
    lil[i, i] = 2
    if i > 0:
        lil[i, i-1] = -1
    if i < n-1:
        lil[i, i+1] = -1

# 2. Convert to CSR for computation
csr = lil.tocsr()

# 3. Use for matrix operations
x = np.ones(n)
y = csr @ x

print(f"Built {n}x{n} matrix")
print(f"Nonzeros: {csr.nnz}")

if name == "main": main() ```

Summary

Format Use Case
CSR General computation, row operations
CSC Column operations, some solvers
COO Building matrices, format conversion
LIL Incremental construction
DIA Banded/diagonal matrices

Exercises

Exercise 1. Create the same \(4 \times 4\) matrix \(A = \begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 4 & 5 \\ 6 & 0 & 0 & 7 \end{pmatrix}\) in CSR, CSC, and COO formats. For each format, print the internal storage arrays (e.g., data, indices, indptr for CSR; data, row, col for COO). Verify all three produce the same dense array.

Solution to Exercise 1
import numpy as np
from scipy import sparse

A = np.array([[1, 0, 2, 0],
               [0, 3, 0, 0],
               [0, 0, 4, 5],
               [6, 0, 0, 7]])

csr = sparse.csr_matrix(A)
csc = sparse.csc_matrix(A)
coo = sparse.coo_matrix(A)

print("CSR: data={}, indices={}, indptr={}".format(
    csr.data, csr.indices, csr.indptr))
print("CSC: data={}, indices={}, indptr={}".format(
    csc.data, csc.indices, csc.indptr))
print("COO: data={}, row={}, col={}".format(
    coo.data, coo.row, coo.col))

assert np.allclose(csr.toarray(), csc.toarray())
assert np.allclose(csr.toarray(), coo.toarray())
print("All formats match.")

Exercise 2. Build a \(6 \times 6\) sparse matrix using LIL format where entry \((i, j)\) equals \(10i + j\) for positions on the main diagonal and the two adjacent diagonals. Convert to CSR and DIA formats. Compare the number of stored elements in each format.

Solution to Exercise 2
import numpy as np
from scipy import sparse

n = 6
lil = sparse.lil_matrix((n, n))
for i in range(n):
    for j in range(max(0, i - 1), min(n, i + 2)):
        lil[i, j] = 10 * i + j

csr = lil.tocsr()
dia = lil.todia()

print("Matrix:")
print(csr.toarray())
print(f"CSR stored elements: {csr.nnz}")
print(f"DIA stored elements: {dia.data.size}")

Exercise 3. Create a \(1000 \times 1000\) sparse random matrix with density 0.01 (use np.random.seed(11)). Convert it to all five formats (CSR, CSC, COO, LIL, DIA). For each format, measure the memory used by the internal arrays and print a comparison table. Which format uses the least memory for this matrix?

Solution to Exercise 3
import numpy as np
from scipy import sparse

np.random.seed(11)
A = sparse.random(1000, 1000, density=0.01, format='coo')

formats = {
    'CSR': A.tocsr(),
    'CSC': A.tocsc(),
    'COO': A,
    'LIL': A.tolil(),
}

print(f"{'Format':<6} {'Memory (KB)':>12}")
print("-" * 20)
for name, mat in formats.items():
    if name == 'CSR' or name == 'CSC':
        mem = mat.data.nbytes + mat.indices.nbytes + mat.indptr.nbytes
    elif name == 'COO':
        mem = mat.data.nbytes + mat.row.nbytes + mat.col.nbytes
    elif name == 'LIL':
        mem = sum(sum(len(r) for r in mat.rows) * 8,
                  sum(len(d) for d in mat.data) * 8)
    print(f"{name:<6} {mem / 1024:>12.1f}")