line_profiler

line_profiler is a third-party package that times code line by line, showing which lines of a function consume the most time. It is not part of the standard library and must be installed separately, for example with pip.

Mental Model

While cProfile tells you which function is slow, line_profiler tells you which line inside that function is slow. It is the microscope you reach for after the telescope has pointed you to the right function. Decorate the suspect function with @profile, run with kernprof, and read the per-line time percentages.

Installation and Setup

```bash
pip install line_profiler
```

Using line_profiler

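Decorate each function you want to profile with @profile and run the script with kernprof, which injects profile into Python's builtins for the duration of the run, so no import is needed: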

```python
# example.py

@profile
def process_data(n):
    result = 0
    for i in range(n):
        result += i ** 2  # Expensive operation

    total = sum(range(n))  # Another loop
    return result, total

if __name__ == "__main__":
    process_data(10000)
```

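Run the script under the profiler. The -l flag turns on line-by-line profiling and -v prints the results as soon as the script finishes. kernprof also saves the raw results to example.py.lprof, which can be viewed later with python -m line_profiler example.py.lprof.

```bash
kernprof -l -v example.py
```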
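Because profile exists only while kernprof is driving the run, a script decorated with @profile raises NameError under plain python. One common workaround, sketched here on the assumption that an explicit fallback inside the script is acceptable, is to substitute a do-nothing decorator when kernprof is absent:

```python
# example.py, runnable both with and without kernprof
import builtins

# kernprof -l adds `profile` to builtins; fall back to a do-nothing
# decorator so that `python example.py` still works.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def process_data(n):
    result = 0
    for i in range(n):
        result += i ** 2
    total = sum(range(n))
    return result, total


if __name__ == "__main__":
    print(process_data(10000))
```

With this in place the same file can be run normally during development and under kernprof when timing is needed.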

Output Interpretation

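Hits is the number of times a line executed, Time is the total time spent on that line in timer units (the full report states the unit in its header), Per Hit is the average time per execution, and % Time is the line's share of the function's total time.

```text
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           @profile
     2                                           def process_data(n):
     3         1            2      2.0      0.0      result = 0
     4     10001          456      0.0      5.2      for i in range(n):
     5     10000         8234      0.8     93.4          result += i ** 2
     6         1          122    122.0      1.4      total = sum(range(n))
     7         1            1      1.0      0.0      return result, total
```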
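The Per Hit and % Time columns are derived from Hits and Time, so they can be recomputed as a sanity check. A minimal sketch, using the Hits and Time values from the sample output; the variable names are illustrative only:

```python
# (line number, hits, time) rows copied from the sample output
rows = [
    (3, 1, 2),
    (4, 10001, 456),
    (5, 10000, 8234),
    (6, 1, 122),
    (7, 1, 1),
]

# Total time spent inside the function, in timer units
total_time = sum(time for _, _, time in rows)

derived = {}
for lineno, hits, time in rows:
    per_hit = time / hits               # average cost of one execution of the line
    percent = 100 * time / total_time   # share of the function's total time
    derived[lineno] = (round(per_hit, 1), round(percent, 1))
    print(f"line {lineno}: per hit {per_hit:.1f}, % time {percent:.1f}")
```

Reading the two columns together matters: the loop body on line 5 dominates because it runs 10,000 times, not because one execution is slow, while line 6 is the costliest single execution but runs only once.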

Advanced Features

```python
# Profiling without the @profile decorator
from line_profiler import LineProfiler

def expensive_function():
    data = []
    for i in range(100):
        data.append(i ** 2)
    return data

profiler = LineProfiler()
profiler.add_function(expensive_function)  # register the function to time
profiler.enable()                          # start collecting timings
expensive_function()
profiler.disable()                         # stop collecting
profiler.print_stats()
```
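A LineProfiler instance can also be called on a function to wrap it, and print_stats accepts a stream argument, which is handy for capturing the report as a string instead of printing it. A minimal sketch of both; the find_spec guard is an addition made here so that the script degrades gracefully when line_profiler is not installed:

```python
import importlib.util
import io


def expensive_function():
    data = []
    for i in range(100):
        data.append(i ** 2)
    return data


report = ""
if importlib.util.find_spec("line_profiler") is None:
    print("line_profiler is not installed; nothing to profile")
else:
    from line_profiler import LineProfiler

    profiler = LineProfiler()
    wrapped = profiler(expensive_function)  # calling the profiler wraps the function
    wrapped()                               # timings are collected only during this call

    buffer = io.StringIO()
    profiler.print_stats(stream=buffer)     # write the report to the buffer, not stdout
    report = buffer.getvalue()
    print(report)
```

Wrapping avoids the explicit enable and disable calls, since the wrapper switches the profiler on and off around each call.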

Combining with cProfile

For a complete picture, use both tools:

cProfile identifies which functions are slow
line_profiler identifies which lines within those functions are slow
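The first half of that workflow can be sketched with the standard library alone. The functions parse, transform, and main below are hypothetical stand-ins for real application code; the point is only to show cProfile ranking functions so that one of them can be handed to line_profiler afterwards:

```python
import cProfile
import io
import pstats


def parse(n):
    # Cheap step: build the input
    return list(range(n))


def transform(values):
    # Deliberately heavy step: a Python-level loop
    result = 0
    for v in values:
        result += v ** 2
    return result


def main():
    return transform(parse(200_000))


# Step 1: cProfile measures time per function
profiler = cProfile.Profile()
answer = profiler.runcall(main)

buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)

# Step 2: once the report singles out a function (transform here), register
# only that function with LineProfiler to see which of its lines is slow.
```

Profiling a single function line by line keeps the overhead and the amount of output small, which is why the function-level pass comes first.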

Exercises

Exercise 1. Write a function process_data(n) that (a) creates a list of n random floats, (b) sorts the list, (c) computes the sum, and (d) finds the median by indexing. Use LineProfiler programmatically (without the @profile decorator) to profile it, print the stats, and identify which line takes the most time.

Solution to Exercise 1

```python
from line_profiler import LineProfiler
import random

def process_data(n):
    data = [random.random() for _ in range(n)]
    data.sort()
    total = sum(data)
    median = data[n // 2]
    return total, median

profiler = LineProfiler()
profiler.add_function(process_data)
profiler.enable()
process_data(500_000)
profiler.disable()
profiler.print_stats()
```

Exercise 2. Create two versions of a function that counts word frequencies in a large string: one using a manual dictionary loop and one using collections.Counter. Profile both with LineProfiler and compare which lines are hotspots in each version.

Solution to Exercise 2

```python
from line_profiler import LineProfiler
from collections import Counter

text = " ".join(["word"] * 100_000 + ["hello"] * 50_000)

def count_manual(text):
    freq = {}
    for word in text.split():
        if word in freq:
            freq[word] += 1
        else:
            freq[word] = 1
    return freq

def count_counter(text):
    return Counter(text.split())

for func in [count_manual, count_counter]:
    lp = LineProfiler()
    lp.add_function(func)
    lp.enable()
    func(text)
    lp.disable()
    lp.print_stats()
```

Exercise 3. Write a matrix multiplication function matmul(A, B) using nested loops. Profile it with LineProfiler on two 100x100 matrices. Identify the innermost loop line and compute what percentage of total time it consumes.

Solution to Exercise 3

```python
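import random

from line_profiler import LineProfiler

def matmul(A, B):
    """Multiply A (n x k) by B (k x m) using plain nested loops."""
    n = len(A)
    m = len(B[0])
    k = len(B)
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0
            for p in range(k):
                total += A[i][p] * B[p][j]  # innermost line: runs n * m * k times
            C[i][j] = total
    return C

# Two random 100x100 input matrices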
n = 100
A = [[random.random() for _ in range(n)] for _ in range(n)]
B = [[random.random() for _ in range(n)] for _ in range(n)]

lp = LineProfiler()
lp.add_function(matmul)
lp.enable()
matmul(A, B)
lp.disable()
lp.print_stats()
```
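To compute the percentage rather than read it off the printed table, the raw numbers are available from lp.get_stats().timings, a dictionary keyed by (filename, first line number, function name) whose values are lists of (line number, hits, time) tuples. The helper below is a sketch; hottest_line is a hypothetical name, and the sample dictionary holds made-up numbers purely to show the shape of the data:

```python
def hottest_line(timings):
    """Return {func_name: (lineno, percent)} for each function's costliest line.

    `timings` has the shape of LineProfiler.get_stats().timings:
    {(filename, first_lineno, func_name): [(lineno, hits, time), ...]}
    """
    result = {}
    for (_, _, func_name), rows in timings.items():
        total = sum(time for _, _, time in rows)
        if total == 0:
            continue  # function was registered but never ran
        lineno, _, time = max(rows, key=lambda row: row[2])
        result[func_name] = (lineno, 100 * time / total)
    return result


# Made-up numbers for illustration only, not measurements. With the
# solution above you would pass lp.get_stats().timings instead.
sample = {
    ("matmul.py", 3, "matmul"): [
        (8, 100, 50),
        (9, 10_000, 4_000),
        (11, 1_000_000, 450_000),
        (12, 1_000_000, 540_000),
        (13, 10_000, 5_950),
    ]
}
print(hottest_line(sample))
```

For the exercise, call hottest_line(lp.get_stats().timings) after matmul(A, B) has run and match the reported line number against the source to confirm that it is the innermost statement.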