IEEE 754 Standard¶
The IEEE 754 floating-point standard defines how computers represent real numbers in binary. Understanding this representation explains why certain decimal values cannot be stored exactly and why floating-point arithmetic behaves unexpectedly.
Binary Scientific Notation¶
Floating-point numbers use binary scientific notation, analogous to decimal scientific notation.
1. Decimal vs Binary¶
In decimal, we write \(1234.5 = 1.2345 \times 10^3\). Binary follows the same pattern.
# Decimal scientific notation
print(f"{1234.5:.4e}") # 1.2345e+03
# Binary representation of 5.75
# 5.75 = 4 + 1 + 0.5 + 0.25 = 2^2 + 2^0 + 2^-1 + 2^-2
# Binary: 101.11 = 1.0111 × 2^2
print(5.75) # 5.75
# Verify
print(4 + 1 + 0.5 + 0.25) # 5.75
2. Floating-Point Formula¶
IEEE 754 encodes numbers as:
Where: - \(s\) is the sign bit (0 = positive, 1 = negative) - \(m\) is the mantissa (fractional part after the implicit 1) - \(e\) is the biased exponent - bias = 1023 for double precision, 127 for single precision
import struct
def float_to_binary(x):
"""Show IEEE 754 binary representation."""
# Pack as double, unpack as 8 bytes
packed = struct.pack('>d', x)
# Convert to binary string
bits = ''.join(f'{byte:08b}' for byte in packed)
sign = bits[0]
exponent = bits[1:12]
mantissa = bits[12:]
return sign, exponent, mantissa
# Example: 5.75
s, e, m = float_to_binary(5.75)
print(f"Sign: {s}")
print(f"Exponent: {e} = {int(e, 2)} (biased)")
print(f"Mantissa: {m[:20]}...")
# Decode
bias = 1023
exp_val = int(e, 2) - bias
print(f"\nActual exponent: {int(e, 2)} - {bias} = {exp_val}")
Double Precision Format¶
Python's float uses IEEE 754 double precision (64 bits).
1. Bit Layout¶
The 64-bit double precision format allocates bits as follows:
| Component | Bits | Range |
|---|---|---|
| Sign | 1 bit | 0 or 1 |
| Exponent | 11 bits | 0–2047 (biased) |
| Mantissa | 52 bits | Fraction after implicit 1 |
import sys
# Python float info
print(f"Max float: {sys.float_info.max:.6e}")
print(f"Min positive: {sys.float_info.min:.6e}")
print(f"Mantissa digits: {sys.float_info.mant_dig}")
print(f"Max exponent: {sys.float_info.max_exp}")
print(f"Min exponent: {sys.float_info.min_exp}")
2. Precision Limits¶
52 mantissa bits provide approximately 15–17 significant decimal digits.
# 52 bits of precision
# 2^52 ≈ 4.5 × 10^15
# Numbers up to 2^52 are exact integers
print(2**52) # 4503599627370496 (exact)
print(2**52 + 1) # 4503599627370497 (exact)
print(float(2**52)) # 4503599627370496.0 (exact)
# Beyond 2^53, not all integers are representable
print(2**53) # 9007199254740992
print(2**53 + 1) # 9007199254740993
print(float(2**53 + 1))# 9007199254740992.0 (lost precision!)
3. Exponent Range¶
The 11-bit exponent allows magnitudes from \(10^{-308}\) to \(10^{308}\).
import sys
# Exponent limits
print(f"Largest: {sys.float_info.max:.6e}") # ~1.8e+308
print(f"Smallest: {sys.float_info.min:.6e}") # ~2.2e-308
# Overflow to infinity
print(1e308 * 10) # inf
# Underflow to zero (gradual)
print(1e-323) # 1e-323 (subnormal)
print(1e-324) # 0.0 (underflow)
Single Precision¶
NumPy provides single precision (32-bit) floats.
1. Bit Layout¶
Single precision uses fewer bits:
| Component | Bits | Range |
|---|---|---|
| Sign | 1 bit | 0 or 1 |
| Exponent | 8 bits | 0–255 (biased) |
| Mantissa | 23 bits | Fraction after implicit 1 |
import numpy as np
# Single precision info
info = np.finfo(np.float32)
print(f"Max: {info.max:.6e}")
print(f"Min: {info.min:.6e}")
print(f"Precision: {info.precision} decimal digits")
print(f"Epsilon: {info.eps:.6e}")
2. Double vs Single¶
Compare precision between formats.
import numpy as np
# Same value in different precisions
val = 1.23456789012345
d = np.float64(val)
s = np.float32(val)
print(f"Original: {val}")
print(f"float64: {d}")
print(f"float32: {s}") # Lost digits!
# Precision difference
print(f"\nfloat64 epsilon: {np.finfo(np.float64).eps:.2e}")
print(f"float32 epsilon: {np.finfo(np.float32).eps:.2e}")
Decimal Fractions Problem¶
Many common decimal fractions have infinite binary representations.
1. Why 0.1 Is Inexact¶
The decimal 0.1 cannot be represented exactly in binary.
The binary expansion repeats infinitely.
# 0.1 is not exactly representable
print(f"{0.1:.20f}") # 0.10000000000000000555...
# Show the actual stored value
from decimal import Decimal
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
# Compare with true 0.1
true_01 = Decimal('0.1')
stored_01 = Decimal(0.1)
print(f"Error: {stored_01 - true_01:.2e}")
2. Exact Representations¶
Numbers that are sums of powers of 2 are exact.
# Powers of 2 are exact
print(f"{0.5:.20f}") # 0.50000000000000000000 (exact: 2^-1)
print(f"{0.25:.20f}") # 0.25000000000000000000 (exact: 2^-2)
print(f"{0.125:.20f}") # 0.12500000000000000000 (exact: 2^-3)
# Sums of powers of 2 are exact
print(f"{0.75:.20f}") # 0.75000000000000000000 (2^-1 + 2^-2)
# But 0.1, 0.2, 0.3 are not
print(f"{0.1:.20f}") # Error in last digits
print(f"{0.2:.20f}") # Error in last digits
print(f"{0.3:.20f}") # Error in last digits
3. Common Inexact Values¶
| Decimal | Binary (approximate) | Exact? |
|---|---|---|
| 0.5 | 0.1 | ✓ |
| 0.25 | 0.01 | ✓ |
| 0.1 | 0.0001100110011... | ✗ |
| 0.2 | 0.0011001100110... | ✗ |
| 0.3 | 0.0100110011001... | ✗ |
# Quick test for exact representation
def is_exact_binary(x, tol=1e-16):
"""Check if x can be represented exactly."""
from decimal import Decimal
stored = Decimal(x)
intended = Decimal(str(x))
return abs(stored - intended) < Decimal(str(tol))
print(f"0.5 exact: {is_exact_binary(0.5)}") # True
print(f"0.1 exact: {is_exact_binary(0.1)}") # False
print(f"0.125 exact: {is_exact_binary(0.125)}") # True
Rounding Modes¶
IEEE 754 defines rounding modes for operations.
1. Round to Nearest Even¶
Default mode—rounds to nearest, ties go to even.
# Python's round() uses banker's rounding (round half to even)
print(round(0.5)) # 0 (tie goes to even)
print(round(1.5)) # 2 (tie goes to even)
print(round(2.5)) # 2 (tie goes to even)
print(round(3.5)) # 4 (tie goes to even)
# Contrast with away-from-zero rounding
import math
print(math.floor(2.5 + 0.5)) # 3 (traditional rounding)
2. Directed Rounding¶
Other rounding directions available via math module.
import math
x = 2.7
print(f"floor({x}): {math.floor(x)}") # 2 (toward -∞)
print(f"ceil({x}): {math.ceil(x)}") # 3 (toward +∞)
print(f"trunc({x}): {math.trunc(x)}") # 2 (toward 0)
# Negative numbers show the difference
y = -2.7
print(f"\nfloor({y}): {math.floor(y)}") # -3
print(f"ceil({y}): {math.ceil(y)}") # -2
print(f"trunc({y}): {math.trunc(y)}") # -2
Practical Implications¶
Understanding IEEE 754 helps write correct numerical code.
1. Integer Range in Floats¶
Floats can store integers exactly up to \(2^{53}\).
# Safe integer range for floats
MAX_SAFE_INT = 2**53
print(f"Max safe integer: {MAX_SAFE_INT:,}")
# Within range: exact
a = float(MAX_SAFE_INT - 1)
print(a == MAX_SAFE_INT - 1) # True
# Beyond range: may lose precision
b = float(MAX_SAFE_INT + 1)
print(b == MAX_SAFE_INT + 1) # False!
print(f"Stored: {b:.0f}, Expected: {MAX_SAFE_INT + 1}")
2. Format String Precision¶
Avoid false precision in output.
x = 1/3
# Too many digits suggests false precision
print(f"{x:.20f}") # 0.33333333333333331483
# Match actual precision (~15-17 digits)
print(f"{x:.15f}") # 0.333333333333333
print(f"{x:.6f}") # 0.333333 (reasonable for display)
# General rule: 15-16 significant digits max
import sys
print(f"Reliable digits: {sys.float_info.dig}") # 15
3. Hexadecimal Float Literal¶
Python supports exact hexadecimal float representation.
# Hexadecimal float literals (exact representation)
x = 0x1.999999999999ap-4 # Exact representation of ~0.1
print(x) # 0.1
# Get hex representation of any float
print((0.1).hex()) # 0x1.999999999999ap-4
print((0.5).hex()) # 0x1.0000000000000p-1
# Round-trip without precision loss
original = 3.141592653589793
hex_rep = original.hex()
recovered = float.fromhex(hex_rep)
print(original == recovered) # True