Array I/O: save, load, savez¶
NumPy provides efficient binary formats for saving and loading arrays. These are faster and more compact than text formats like CSV.
import numpy as np
Why Use NumPy's Binary Format?¶
| Format | Speed | Size | Preserves dtype | Multiple arrays |
|---|---|---|---|---|
.npy |
Fast | Compact | Yes | No |
.npz |
Fast | Compact | Yes | Yes |
| CSV | Slow | Large | No | No |
| Pickle | Medium | Medium | Yes | Yes |
# Comparison: 1 million floats
arr = np.random.randn(1_000_000)
# Binary: ~8 MB, loads in milliseconds
np.save('data.npy', arr)
# CSV: ~25 MB, loads in seconds
np.savetxt('data.csv', arr)
np.save() — Save Single Array¶
Save one array to a .npy file:
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Save to file
np.save('my_array.npy', arr)
# File extension .npy is added automatically if missing
np.save('my_array', arr) # Creates my_array.npy
Parameters¶
np.save(
file, # Filename or file object
arr, # Array to save
allow_pickle=True, # Allow pickling objects
fix_imports=True # Python 2/3 compatibility
)
np.load() — Load Array¶
Load arrays from .npy or .npz files:
# Load single array
arr = np.load('my_array.npy')
print(arr)
# [[1 2 3]
# [4 5 6]]
# dtype is preserved
print(arr.dtype) # int64
Security Warning¶
# allow_pickle=False is safer for untrusted files
arr = np.load('untrusted.npy', allow_pickle=False)
# Default changed in NumPy 1.16.3 for security
# Pickle can execute arbitrary code!
np.savez() — Save Multiple Arrays¶
Save multiple arrays to a single .npz file (uncompressed):
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = np.array([[1, 2], [3, 4]])
# Save with automatic names (arr_0, arr_1, arr_2)
np.savez('arrays.npz', x, y, z)
# Save with custom names (recommended)
np.savez('arrays.npz', x_data=x, y_data=y, matrix=z)
Loading .npz Files¶
# Load returns NpzFile object (dict-like)
data = np.load('arrays.npz')
# Access by name
print(data['x_data']) # [1 2 3]
print(data['y_data']) # [4 5 6]
print(data['matrix']) # [[1 2] [3 4]]
# List available arrays
print(data.files) # ['x_data', 'y_data', 'matrix']
# Close when done (or use context manager)
data.close()
Context Manager (Recommended)¶
# Automatically closes file
with np.load('arrays.npz') as data:
x = data['x_data']
y = data['y_data']
print(x + y) # [5 7 9]
np.savez_compressed() — Compressed Archive¶
Same as savez() but with compression:
large_array = np.random.randn(1000, 1000)
# Uncompressed: ~8 MB
np.savez('uncompressed.npz', data=large_array)
# Compressed: ~6 MB (varies by data)
np.savez_compressed('compressed.npz', data=large_array)
When to Compress¶
| Scenario | Recommendation |
|---|---|
| Large arrays, infrequent access | Compress |
| Small arrays | Don't compress (overhead) |
| Frequent loading | Don't compress (slower) |
| Limited disk space | Compress |
| Random/incompressible data | Don't compress (no benefit) |
Practical Examples¶
Save Model Weights¶
# Save neural network weights
weights = {
'layer1': np.random.randn(784, 256),
'layer2': np.random.randn(256, 128),
'layer3': np.random.randn(128, 10),
'biases1': np.zeros(256),
'biases2': np.zeros(128),
'biases3': np.zeros(10),
}
np.savez_compressed('model_weights.npz', **weights)
# Load weights
with np.load('model_weights.npz') as data:
w1 = data['layer1']
b1 = data['biases1']
Checkpoint Training Progress¶
def save_checkpoint(epoch, weights, optimizer_state, loss_history):
np.savez(
f'checkpoint_epoch_{epoch}.npz',
weights=weights,
optimizer_state=optimizer_state,
loss_history=np.array(loss_history),
epoch=np.array(epoch)
)
def load_checkpoint(filename):
with np.load(filename, allow_pickle=True) as data:
return {
'weights': data['weights'],
'optimizer_state': data['optimizer_state'],
'loss_history': data['loss_history'].tolist(),
'epoch': int(data['epoch'])
}
Save Preprocessed Data¶
# Preprocess once, save for reuse
def preprocess_and_save(raw_data_path, output_path):
raw = np.loadtxt(raw_data_path, delimiter=',')
# Normalize
mean = raw.mean(axis=0)
std = raw.std(axis=0)
normalized = (raw - mean) / std
# Save data and parameters
np.savez(
output_path,
data=normalized,
mean=mean,
std=std
)
# Load preprocessed data
with np.load('preprocessed.npz') as f:
data = f['data']
mean = f['mean']
std = f['std']
Text Alternatives¶
For human-readable or interoperability needs:
np.savetxt() / np.loadtxt()¶
arr = np.array([[1.5, 2.5], [3.5, 4.5]])
# Save as text
np.savetxt('data.csv', arr, delimiter=',', header='col1,col2')
# Load from text
loaded = np.loadtxt('data.csv', delimiter=',')
np.genfromtxt() — Handle Missing Values¶
# More flexible than loadtxt
data = np.genfromtxt(
'data.csv',
delimiter=',',
missing_values='NA',
filling_values=0.0
)
Memory-Mapped Files¶
For arrays too large to fit in memory:
# Create memory-mapped file
large = np.memmap('large_array.dat', dtype='float64',
mode='w+', shape=(10000, 10000))
large[:] = np.random.randn(10000, 10000)
large.flush() # Write to disk
# Load as memory-mapped (doesn't load into RAM)
mapped = np.memmap('large_array.dat', dtype='float64',
mode='r', shape=(10000, 10000))
print(mapped[0, 0]) # Access without loading entire array
Common Issues¶
Issue 1: File Not Found¶
# Always use raw strings or forward slashes for paths
np.save(r'C:\data\array.npy', arr) # Raw string
np.save('C:/data/array.npy', arr) # Forward slashes
Issue 2: Pickle Security¶
# Untrusted .npy files can contain pickled objects
# Always use allow_pickle=False for untrusted sources
try:
arr = np.load('untrusted.npy', allow_pickle=False)
except ValueError:
print("File contains pickled objects - potentially unsafe!")
Issue 3: Version Compatibility¶
# Old NumPy versions may not read new files
# Check NumPy version if sharing files
print(np.__version__)
Summary¶
| Function | Purpose | File Type |
|---|---|---|
np.save() |
Save single array | .npy |
np.load() |
Load .npy or .npz |
Both |
np.savez() |
Save multiple arrays | .npz |
np.savez_compressed() |
Save compressed | .npz |
np.savetxt() |
Save as text | .csv, .txt |
np.loadtxt() |
Load from text | .csv, .txt |
np.memmap() |
Memory-mapped I/O | .dat |
Key Takeaways:
- Use
.npyfor single arrays,.npzfor multiple - Binary format is faster and preserves dtypes
- Use
savez_compressed()for large arrays with limited disk space - Use context manager (
with) when loading.npzfiles - Set
allow_pickle=Falsefor untrusted files - Use memory mapping for arrays too large for RAM