Structured Arrays¶
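Structured arrays let you store heterogeneous data types in a single array, with each element acting like a row in a database table or spreadsheet. They are closely related to record arrays (np.recarray), which add attribute-style access to the same kind of data.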
import numpy as np
What are Structured Arrays?¶
Regular NumPy arrays hold homogeneous data (every element has the same type). Structured arrays hold records made up of multiple named fields, and each field can have a different type:
# Regular array: all floats
regular = np.array([1.0, 2.0, 3.0])
# Structured array: mixed types
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('score', 'f8')])
structured = np.array([
    ('Alice', 25, 95.5),
    ('Bob', 30, 87.3),
    ('Charlie', 22, 91.0)
], dtype=dt)
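To make the difference concrete, the following minimal sketch re-creates the array above and inspects its layout. The byte counts assume NumPy's default packed layout, in which 'U10' occupies 4 bytes per character.

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('score', 'f8')])
structured = np.array([
    ('Alice', 25, 95.5),
    ('Bob', 30, 87.3),
    ('Charlie', 22, 91.0),
], dtype=dt)

print(structured.shape)           # (3,): one dimension, one element per record
print(structured.dtype.names)     # ('name', 'age', 'score')
print(structured.dtype.itemsize)  # 52 = 40 (U10) + 4 (i4) + 8 (f8)
```

Note that the fields are not a second axis: the array is one-dimensional, and the field structure lives entirely in the dtype.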
Creating Structured Arrays¶
Method 1: dtype with List of Tuples¶
# Define dtype: (field_name, data_type)
dt = np.dtype([
    ('name', 'U20'),    # Unicode string, max 20 chars
    ('age', 'i4'),      # 32-bit integer
    ('salary', 'f8'),   # 64-bit float
    ('active', '?')     # Boolean
])
# Create array
employees = np.array([
    ('Alice', 30, 75000.0, True),
    ('Bob', 25, 65000.0, True),
    ('Charlie', 35, 85000.0, False)
], dtype=dt)
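When the records are not known up front, one possible alternative is to preallocate the array and fill it one field at a time. This sketch reuses the employee dtype from above; the preallocate-then-fill workflow is an illustration, not something the original example prescribes.

```python
import numpy as np

dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('salary', 'f8'), ('active', '?')])

# Preallocate three zero-initialised records ('' / 0 / 0.0 / False)
employees = np.zeros(3, dtype=dt)

# Fill the array column by column
employees['name'] = ['Alice', 'Bob', 'Charlie']
employees['age'] = [30, 25, 35]
employees['salary'] = [75000.0, 65000.0, 85000.0]
employees['active'] = [True, True, False]

print(employees['name'])
print(employees['active'])
```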
Method 2: Dictionary Format¶
dt = np.dtype({
    'names': ['x', 'y', 'z'],
    'formats': ['f8', 'f8', 'f8']
})
points = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)], dtype=dt)
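The dictionary form also accepts optional 'offsets' and 'itemsize' keys, which is one way to describe a record layout that contains padding. The sketch below uses hypothetical field names (flag, value) and an 8-byte-aligned layout chosen purely for illustration.

```python
import numpy as np

# Packed layout: the fields sit back to back (1 + 8 bytes)
packed = np.dtype([('flag', 'u1'), ('value', 'f8')])
print(packed.itemsize)            # 9

# Explicit layout: 'value' starts at byte 8 and each record spans 16 bytes
padded = np.dtype({
    'names': ['flag', 'value'],
    'formats': ['u1', 'f8'],
    'offsets': [0, 8],
    'itemsize': 16,
})
print(padded.itemsize)            # 16
print(padded.fields['value'][1])  # 8, the byte offset of 'value'
```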
Method 3: String Format¶
# Comma-separated type strings
dt = np.dtype('U10, i4, f8') # Fields get default names: f0, f1, f2
data = np.array([('Alice', 25, 95.5)], dtype=dt)
print(data['f0']) # ['Alice'] (a one-element array, not a bare string)
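The default names f0, f1, f2 are rarely descriptive. One option, sketched here, is to assign a new tuple of names to the dtype's names attribute before creating the array; the names chosen (name, age, score) are only illustrative.

```python
import numpy as np

dt = np.dtype('U10, i4, f8')
print(dt.names)                       # ('f0', 'f1', 'f2')

# Rename all fields at once; the tuple must have one entry per field
dt.names = ('name', 'age', 'score')

data = np.array([('Alice', 25, 95.5)], dtype=dt)
print(data['name'])                   # ['Alice']
```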
Data Type Codes¶
| Code | Type | Example |
|---|---|---|
| 'i4' | 32-bit int | np.int32 |
| 'i8' | 64-bit int | np.int64 |
| 'f4' | 32-bit float | np.float32 |
| 'f8' | 64-bit float | np.float64 |
| 'U10' | Unicode string (10 chars) | |
| 'S10' | Byte string (10 bytes) | |
| '?' | Boolean | np.bool_ |
| 'c16' | Complex 128 | np.complex128 |
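The codes in the table can be checked directly against the corresponding NumPy types. This short sketch compares a few of them and prints the storage size of the two string codes.

```python
import numpy as np

# Each code string resolves to the same dtype as the named NumPy type
print(np.dtype('i4') == np.int32)        # True
print(np.dtype('f8') == np.float64)      # True
print(np.dtype('?') == np.bool_)         # True
print(np.dtype('c16') == np.complex128)  # True

# String codes carry a length: 4 bytes per character for 'U', 1 for 'S'
print(np.dtype('U10').itemsize)          # 40
print(np.dtype('S10').itemsize)          # 10
```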
Accessing Data¶
By Field Name¶
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('score', 'f8')])
students = np.array([
    ('Alice', 20, 95.5),
    ('Bob', 22, 87.3),
    ('Charlie', 21, 91.0)
], dtype=dt)
# Access entire column
print(students['name']) # ['Alice' 'Bob' 'Charlie']
print(students['age']) # [20 22 21]
print(students['score']) # [95.5 87.3 91. ]
# Access single record
print(students[0]) # ('Alice', 20, 95.5)
print(students[0]['name']) # Alice
By Index¶
# First record
print(students[0]) # ('Alice', 20, 95.5)
# Slice
print(students[:2]) # First two records
# Boolean indexing
adults = students[students['age'] >= 21]
print(adults['name']) # ['Bob' 'Charlie']
Multiple Fields¶
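# Select multiple fields (on NumPy 1.16+ this returns a structured view, not a copy)
subset = students[['name', 'score']]
# The view keeps the original byte offsets, so the skipped 'age' bytes still count toward itemsize
print(subset.dtype) # {'names': ['name', 'score'], 'formats': ['<U10', '<f8'], 'offsets': [0, 44], 'itemsize': 52}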
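On NumPy 1.16 and later, indexing with a list of field names returns a view that still strides over the full records. If a compact copy is wanted, one option is repack_fields from numpy.lib.recfunctions, sketched here on a two-record version of the students array.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('score', 'f8')])
students = np.array([('Alice', 20, 95.5), ('Bob', 22, 87.3)], dtype=dt)

subset = students[['name', 'score']]   # view; still spans the full 52-byte records
packed = rfn.repack_fields(subset)     # copy with the unused 'age' bytes dropped

print(subset.dtype.itemsize)   # 52 on NumPy 1.16+
print(packed.dtype.itemsize)   # 48 = 40 (U10) + 8 (f8)
print(packed['name'])          # ['Alice' 'Bob']
```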
Modifying Data¶
# Modify single field
students['score'][0] = 98.0
# Modify entire record
students[1] = ('Robert', 23, 90.0)
# Modify column
students['age'] = students['age'] + 1 # Everyone ages by 1
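These assignments work because indexing by a single field name returns a view onto the array's memory rather than a copy. The sketch below, which re-creates the students array, illustrates the consequence: writes through the view change the original, and an explicit copy() is needed for an independent snapshot.

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('score', 'f8')])
students = np.array([
    ('Alice', 20, 95.5),
    ('Bob', 22, 87.3),
    ('Charlie', 21, 91.0),
], dtype=dt)

ages = students['age']                   # view onto the 'age' column
print(np.shares_memory(ages, students))  # True

ages[0] = 99                             # writes through to the original array
print(students['age'][0])                # 99

snapshot = students['score'].copy()      # independent copy
students['score'][0] = 0.0
print(snapshot[0])                       # 95.5, unaffected by the later write
```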
Nested Structures¶
# Nested dtype
address_dt = np.dtype([('city', 'U20'), ('zip', 'U10')])
person_dt = np.dtype([
    ('name', 'U20'),
    ('address', address_dt)
])
people = np.array([
    ('Alice', ('New York', '10001')),
    ('Bob', ('Los Angeles', '90001'))
], dtype=person_dt)
# Access nested fields
print(people['address']['city']) # ['New York' 'Los Angeles']
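Nested fields can be inspected and modified with the same chained indexing. The following sketch re-creates the people array and shows how the inner dtype can be reached and how a write to a nested field lands in the parent array; the replacement zip code is made up for the example.

```python
import numpy as np

address_dt = np.dtype([('city', 'U20'), ('zip', 'U10')])
person_dt = np.dtype([('name', 'U20'), ('address', address_dt)])
people = np.array([
    ('Alice', ('New York', '10001')),
    ('Bob', ('Los Angeles', '90001')),
], dtype=person_dt)

# Indexing a dtype by field name returns the nested dtype
print(person_dt['address'].names)    # ('city', 'zip')

# Chained field access yields views, so the assignment updates `people`
people['address']['zip'][0] = '10002'
print(people[0]['address']['zip'])   # 10002
```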
Array Fields¶
# Field that is itself an array
dt = np.dtype([
    ('name', 'U10'),
    ('grades', 'f8', (3,)) # Array of 3 floats
])
students = np.array([
    ('Alice', [95, 87, 92]),
    ('Bob', [88, 91, 85])
], dtype=dt)
print(students['grades'])
# [[95. 87. 92.]
# [88. 91. 85.]]
print(students[0]['grades'].mean()) # 91.333... (274 / 3)
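Because students['grades'] is an ordinary 2-D array of shape (2, 3), per-record statistics can be computed in one vectorised call instead of looping over records. A minimal sketch using the same data:

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('grades', 'f8', (3,))])
students = np.array([
    ('Alice', [95, 87, 92]),
    ('Bob', [88, 91, 85]),
], dtype=dt)

print(students['grades'].shape)           # (2, 3): one row of grades per student

# Average each student's grades along the grade axis
means = students['grades'].mean(axis=1)

# Name of the student with the highest average
best = students['name'][np.argmax(means)]
print(best)                               # Alice
```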
Record Arrays (recarray)¶
Record arrays allow attribute-style access:
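# Build a structured array with name/age/score fields, then view it as a recarray
students = np.array([
    ('Alice', 20, 95.5),
    ('Bob', 22, 87.3)
], dtype=[('name', 'U10'), ('age', 'i4'), ('score', 'f8')])
rec = students.view(np.recarray)
# Attribute access (instead of indexing)
print(rec.name) # ['Alice' 'Bob']
print(rec.age) # [20 22]
print(rec[0].name) # Alice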
# Create recarray directly
rec = np.rec.array([
    ('Alice', 25, 95.5),
    ('Bob', 30, 87.3)
], dtype=[('name', 'U10'), ('age', 'i4'), ('score', 'f8')])
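One caveat with attribute access: when a field shares its name with an existing ndarray attribute or method, the attribute lookup finds the ndarray member first. The sketch below uses a hypothetical layout with a field called 'size' to show the effect; index-style access still reaches the field.

```python
import numpy as np

# Hypothetical layout in which one field is named 'size',
# which is also an ndarray attribute
boxes = np.rec.array(
    [(10, 'small'), (50, 'large')],
    dtype=[('size', 'i4'), ('label', 'U5')],
)

print(boxes.label)     # ['small' 'large']: no clash, attribute access works
print(boxes.size)      # 2: the ndarray attribute (element count), not the field
print(boxes['size'])   # [10 50]: indexing always reaches the field
```

For this reason, index-style access is the safer choice in code that does not control the field names.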
Practical Examples¶
CSV-like Data¶
# Define a dtype that mirrors the columns of a CSV file
dt = np.dtype([
    ('id', 'i4'),
    ('product', 'U30'),
    ('price', 'f8'),
    ('quantity', 'i4')
])
# Rows as they would appear after parsing the file
inventory = np.array([
    (1, 'Widget', 9.99, 100),
    (2, 'Gadget', 24.99, 50),
    (3, 'Gizmo', 14.99, 75)
], dtype=dt)
# Calculate total value
total_value = (inventory['price'] * inventory['quantity']).sum()
print(f"Total inventory value: ${total_value:.2f}")
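To read such data from actual CSV text rather than a hand-written list, np.genfromtxt accepts a structured dtype. This sketch uses an in-memory string as a stand-in for a file; the CSV contents and the header row are assumptions made for the example.

```python
import io
import numpy as np

dt = np.dtype([('id', 'i4'), ('product', 'U30'), ('price', 'f8'), ('quantity', 'i4')])

# In-memory stand-in for a CSV file (contents are illustrative)
csv_text = (
    "id,product,price,quantity\n"
    "1,Widget,9.99,100\n"
    "2,Gadget,24.99,50\n"
    "3,Gizmo,14.99,75\n"
)

inventory = np.genfromtxt(
    io.StringIO(csv_text),
    dtype=dt,
    delimiter=',',
    skip_header=1,       # skip the column-name row
    encoding='utf-8',
)

total_value = (inventory['price'] * inventory['quantity']).sum()
print(f"Total inventory value: ${total_value:.2f}")
```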
Sorting Structured Arrays¶
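# Records with name/age/score fields, as in "Accessing Data"
students = np.array(
    [('Alice', 20, 95.5), ('Bob', 22, 87.3), ('Charlie', 21, 91.0)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('score', 'f8')])
# Sort by single field (ascending)
sorted_by_score = np.sort(students, order='score')
# Sort by multiple fields: by age first, ties broken by score
sorted_multi = np.sort(students, order=['age', 'score'])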
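The order parameter always sorts in ascending order. For a descending sort, one common workaround is to argsort the field and reverse the resulting index array, as in this sketch with a three-record students array.

```python
import numpy as np

students = np.array(
    [('Alice', 20, 95.5), ('Bob', 22, 87.3), ('Charlie', 21, 91.0)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('score', 'f8')],
)

# Indices that sort 'score' ascending, reversed to get descending order
desc_idx = np.argsort(students['score'])[::-1]
ranked = students[desc_idx]

print(ranked['name'])    # ['Alice' 'Charlie' 'Bob']
print(ranked['score'])   # [95.5 91.  87.3]
```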
Filtering¶
# Assumes students has name, age, and score fields, as in "Accessing Data"
# Boolean filtering
high_scorers = students[students['score'] > 90]
young_students = students[students['age'] < 22]
# Combined conditions
filtered = students[(students['age'] >= 20) & (students['score'] > 85)]
When to Use Structured Arrays¶
Use Structured Arrays When:¶
- Working with large datasets in pure NumPy
- Need memory-efficient storage
- Interfacing with C/binary data formats
- Simple tabular operations without pandas overhead
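As an illustration of the binary-format case, a structured dtype can describe a packed record layout so that raw bytes are interpreted in place. The sketch below builds the bytes with the struct module; the layout (a little-endian int32 id followed by a float64 value) is hypothetical.

```python
import struct
import numpy as np

# Hypothetical binary layout: little-endian int32 id, then a float64 value
record_format = '<id'    # '<' means little-endian with no padding
raw = b''.join(
    struct.pack(record_format, i, v)
    for i, v in [(1, 0.5), (2, 1.5), (3, 2.5)]
)

dt = np.dtype([('id', '<i4'), ('value', '<f8')])
assert dt.itemsize == struct.calcsize(record_format)   # both are 12 bytes

records = np.frombuffer(raw, dtype=dt)   # read-only view of the bytes, no copy
print(records['id'])      # [1 2 3]
print(records['value'])   # [0.5 1.5 2.5]
```

For layouts that include compiler padding, the dtype would need explicit offsets or align=True to match.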
Use Pandas Instead When:¶
- Need advanced data manipulation
- Working with time series
- Need missing value handling (NaN)
- Complex groupby/merge operations
Summary¶
| Task | Code |
|---|---|
| Create dtype | np.dtype([('name', 'U10'), ('age', 'i4')]) |
| Create array | np.array([('Alice', 25)], dtype=dt) |
| Access field | arr['name'] |
| Access record | arr[0] |
| Access nested | arr['address']['city'] |
| Sort by field | np.sort(arr, order='age') |
| Filter | arr[arr['age'] > 20] |
| To recarray | arr.view(np.recarray) |
Key Takeaways:
- Structured arrays store heterogeneous data with named fields
- Access fields by name: arr['fieldname']
- Use dtype codes: 'i4' (int32), 'f8' (float64), 'U10' (string)
- Record arrays add attribute-style access: arr.name
- Can nest structures and include array fields
- Good for memory-efficient tabular data without pandas
- Use the order parameter for sorting by fields