Binary Files¶
Binary files store raw bytes rather than human-readable text. Working with binary mode is essential for images, audio, executables, and custom data formats.
Mental Model
Text mode ('r'/'w') gives you strings with automatic encoding/decoding and newline translation. Binary mode ('rb'/'wb') gives you raw bytes with no translation at all. If the file is not meant for humans to read (images, audio, serialized data), always use binary mode to avoid corruption from encoding conversions.
Opening Binary Files¶
Use 'b' mode flag to read and write files as raw bytes.
1. Mode Flags¶
Combine 'b' with read, write, or append modes.
```python
Read binary¶
with open("image.png", "rb") as f: data = f.read()
Write binary¶
with open("output.bin", "wb") as f: f.write(data)
Append binary¶
with open("log.bin", "ab") as f: f.write(b"\x00\x01\x02")
Read and write binary¶
with open("data.bin", "r+b") as f: content = f.read() f.seek(0) f.write(modified) ```
2. Bytes vs Strings¶
Binary mode works with bytes, not str.
```python
Text mode returns str¶
with open("text.txt", "r") as f:
data = f.read()
print(type(data)) #
Binary mode returns bytes¶
with open("text.txt", "rb") as f:
data = f.read()
print(type(data)) #
3. No Encoding¶
Binary mode bypasses text encoding entirely.
```python
Text mode uses encoding¶
with open("file.txt", "r", encoding="utf-8") as f: text = f.read()
Binary mode: raw bytes, no encoding¶
with open("file.txt", "rb") as f: raw = f.read() text = raw.decode("utf-8") # Manual decode ```
Reading Binary Data¶
Methods for reading raw bytes from files.
1. Read Entire File¶
Load complete file contents into memory.
```python with open("photo.jpg", "rb") as f: image_data = f.read()
print(len(image_data)) # File size in bytes print(image_data[:10]) # First 10 bytes ```
2. Read Fixed Chunks¶
Read specific number of bytes at a time.
```python with open("large_file.bin", "rb") as f: # Read first 1024 bytes header = f.read(1024)
# Read next chunk
chunk = f.read(4096)
# Empty bytes means EOF
while chunk := f.read(4096):
process(chunk)
```
3. Read Into Buffer¶
Use readinto() for memory-efficient reading.
```python buffer = bytearray(4096)
with open("data.bin", "rb") as f: # Read into existing buffer bytes_read = f.readinto(buffer) print(f"Read {bytes_read} bytes")
# Process buffer[:bytes_read]
```
Writing Binary Data¶
Methods for writing raw bytes to files.
1. Write Bytes¶
Write bytes objects directly to file.
```python data = b"\x89PNG\r\n\x1a\n" # PNG header
with open("header.bin", "wb") as f: f.write(data)
Write bytearray¶
buffer = bytearray([0, 1, 2, 3, 4]) with open("buffer.bin", "wb") as f: f.write(buffer) ```
2. Write Multiple Chunks¶
Write data in segments for large files.
```python chunks = [b"chunk1", b"chunk2", b"chunk3"]
with open("output.bin", "wb") as f: for chunk in chunks: f.write(chunk)
Using writelines (no separator added)¶
with open("output.bin", "wb") as f: f.writelines(chunks) ```
3. Buffered Writing¶
Control write buffering behavior.
```python
Unbuffered (immediate writes)¶
with open("log.bin", "wb", buffering=0) as f: f.write(b"immediate")
Line buffered (not for binary)¶
buffering=1 only works for text mode¶
Custom buffer size¶
with open("data.bin", "wb", buffering=8192) as f: f.write(b"buffered") ```
File Position¶
Navigate within binary files using seek and tell.
1. Current Position¶
Use tell() to get current byte position.
```python with open("data.bin", "rb") as f: print(f.tell()) # 0 (start)
f.read(10)
print(f.tell()) # 10
f.read(5)
print(f.tell()) # 15
```
2. Seek Absolute¶
Move to specific byte position with seek().
```python with open("data.bin", "rb") as f: f.seek(100) # Go to byte 100 chunk = f.read(50) # Read bytes 100-149
f.seek(0) # Back to start
header = f.read(10)
```
3. Seek Relative¶
Use whence parameter for relative seeking.
```python import os
with open("data.bin", "rb") as f: # From start (default, whence=0) f.seek(10, os.SEEK_SET)
# From current position (whence=1)
f.seek(5, os.SEEK_CUR) # Now at 15
# From end (whence=2)
f.seek(-10, os.SEEK_END) # 10 bytes before end
```
Struct Module¶
Pack and unpack binary data with defined formats.
1. Basic Packing¶
Convert Python values to bytes.
```python import struct
Pack integer and float¶
data = struct.pack("if", 42, 3.14) print(data) # b'*\x00\x00\x00\xc3\xf5H@' print(len(data)) # 8 bytes
Format characters: i=int, f=float, d=double¶
h=short, b=byte, s=string¶
```
2. Basic Unpacking¶
Convert bytes back to Python values.
```python import struct
data = b'*\x00\x00\x00\xc3\xf5H@'
Unpack to tuple¶
values = struct.unpack("if", data) print(values) # (42, 3.140000104904175)
Unpack single value¶
num = struct.unpack("i", data[:4])[0] print(num) # 42 ```
3. Byte Order¶
Specify endianness in format string.
```python import struct
num = 0x12345678
Native byte order (system-dependent)¶
native = struct.pack("I", num)
Little-endian¶
little = struct.pack("<I", num) print(little.hex()) # 78563412
Big-endian (network order)¶
big = struct.pack(">I", num) print(big.hex()) # 12345678 ```
Common Patterns¶
Practical binary file operations.
1. File Header Reading¶
Parse structured file headers.
```python import struct
def read_bmp_header(filename): """Read BMP image header.""" with open(filename, "rb") as f: # BMP signature sig = f.read(2) if sig != b"BM": raise ValueError("Not a BMP file")
# File size, reserved, data offset
size, _, _, offset = struct.unpack("<IHHI", f.read(12))
return {"size": size, "offset": offset}
header = read_bmp_header("image.bmp")¶
```
2. Copy Binary File¶
Efficiently copy large binary files.
```python def copy_binary(src, dst, chunk_size=8192): """Copy binary file in chunks.""" with open(src, "rb") as fin: with open(dst, "wb") as fout: while chunk := fin.read(chunk_size): fout.write(chunk)
copy_binary("source.bin", "dest.bin") ```
3. Modify In Place¶
Update specific bytes within a file.
```python def patch_byte(filename, offset, value): """Change single byte at offset.""" with open(filename, "r+b") as f: f.seek(offset) f.write(bytes([value]))
patch_byte("data.bin", 100, 0xFF)¶
```
Exercises¶
Exercise 1.
Write a script that creates a binary file containing the bytes b'\x00\x01\x02\x03', then reads it back and prints each byte as a hexadecimal value.
Solution to Exercise 1
```python
# Write binary file
with open("/tmp/test.bin", "wb") as f:
f.write(b'\x00\x01\x02\x03')
# Read and print hex values
with open("/tmp/test.bin", "rb") as f:
data = f.read()
for byte in data:
print(f"0x{byte:02x}", end=" ")
# 0x00 0x01 0x02 0x03
```
Binary mode ("rb", "wb") reads/writes raw bytes without text encoding.
Exercise 2.
Write a function copy_file(src, dst) that copies a binary file from src to dst by reading and writing in 4096-byte chunks. Use "rb" and "wb" modes.
Solution to Exercise 2
```python
def copy_file(src, dst, chunk_size=4096):
with open(src, "rb") as fin, open(dst, "wb") as fout:
while True:
chunk = fin.read(chunk_size)
if not chunk:
break
fout.write(chunk)
```
Reading in chunks avoids loading the entire file into memory, making this suitable for large files.
Exercise 3.
Use the struct module to write two integers (42 and 100) to a binary file in little-endian format, then read them back and print them.
Solution to Exercise 3
```python
import struct
# Write
with open("/tmp/ints.bin", "wb") as f:
f.write(struct.pack("<ii", 42, 100))
# Read
with open("/tmp/ints.bin", "rb") as f:
data = f.read()
a, b = struct.unpack("<ii", data)
print(a, b) # 42 100
```
"<ii" means little-endian (<) with two signed integers (i). struct.pack converts to bytes and struct.unpack converts back.