
Dtype Basics

The dtype attribute specifies how array bytes are interpreted.

Mental Model

A NumPy array is a flat buffer of bytes; the dtype is the lens that tells NumPy how to interpret each chunk of those bytes. Choosing a dtype is like choosing a unit of measurement: float64 gives you precision, uint8 saves memory, and a poor choice can silently corrupt your numbers through overflow or rounding.
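A minimal sketch of this mental model: the same four bytes viewed through different dtypes with ndarray.view, which reinterprets the buffer without copying it. The byte values are arbitrary illustrations, and the explicit little-endian codes '<f4' and '<i2' are used so the result does not depend on the machine's byte order.

```python
import numpy as np

# Four raw bytes (values chosen only for illustration)
raw = np.array([0, 0, 128, 63], dtype=np.uint8)

# Reinterpret the same buffer; no data is copied
as_float = raw.view('<f4')   # one little-endian 4-byte float
as_int16 = raw.view('<i2')   # two little-endian 2-byte integers

print(raw.nbytes, as_float.nbytes, as_int16.nbytes)  # same 4 bytes each time
print(as_float)
print(as_int16)
```

Only the lens changes: the buffer holds 4 bytes in every case, but it reads as one float32 or as two int16 values.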

Dtype Reference

NumPy supports many data types with different precision and range.

| Data type | Description |
|---|---|
| bool_ | Boolean (True or False) stored as a byte |
| int_ | Default integer type (int64 on most 64-bit platforms; int32 on Windows before NumPy 2.0) |
| int8 | Byte (-128 to 127) |
| int16 | Integer (-32768 to 32767) |
| int32 | Integer (-2147483648 to 2147483647) |
| int64 | Integer (-9223372036854775808 to 9223372036854775807) |
| uint8 | Unsigned integer (0 to 255) |
| uint16 | Unsigned integer (0 to 65535) |
| uint32 | Unsigned integer (0 to 4294967295) |
| uint64 | Unsigned integer (0 to 18446744073709551615) |
| float16 | Half precision (sign bit, 5-bit exponent, 10-bit mantissa) |
| float32 | Single precision (sign bit, 8-bit exponent, 23-bit mantissa) |
| float64 | Double precision (sign bit, 11-bit exponent, 52-bit mantissa) |
| complex64 | Complex number stored as two 32-bit floats (real and imaginary parts) |
| complex128 | Complex number stored as two 64-bit floats (real and imaginary parts) |
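Rather than memorizing these ranges, you can query them. A short sketch using np.iinfo for integer limits and np.finfo for floating-point layout (nexp is the exponent width and nmant the mantissa width in bits); the dictionary names are just for illustration.

```python
import numpy as np

int_types = (np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint64)
float_types = (np.float16, np.float32, np.float64)

# (min, max) for each integer dtype, as Python ints
int_ranges = {np.dtype(t).name: (np.iinfo(t).min, np.iinfo(t).max) for t in int_types}

# (exponent bits, mantissa bits) for each float dtype
float_layout = {np.dtype(t).name: (np.finfo(t).nexp, np.finfo(t).nmant) for t in float_types}

for name, (lo, hi) in int_ranges.items():
    print(f"{name:8} {lo} to {hi}")
for name, (nexp, nmant) in float_layout.items():
    print(f"{name:8} {nexp}-bit exponent, {nmant}-bit mantissa")
```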

Checking Dtype

The dtype attribute reveals an array's data type.

1. Basic Examples

```python
import numpy as np

def main():
    x = np.array([1, 2, 3])                   # integers -> default integer dtype
    y = np.array([1, 2, 3], dtype='uint8')    # explicit dtype
    z = np.array([1, 2, 3], dtype='float32')  # explicit dtype
    w = np.array([1., 2, 3])                  # one float literal -> float64
    print(x.dtype)
    print(y.dtype)
    print(z.dtype)
    print(w.dtype)

if __name__ == "__main__":
    main()
```

Output (on a typical 64-bit Linux or macOS system):

int64
uint8
float32
float64

2. Float Inference

Including a single float literal such as 1. makes NumPy infer float64 for the whole array, because every element of an array must share one dtype.
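The same rule generalizes: NumPy picks one dtype that can hold every element, moving up from bool to integer to float to complex. A small sketch (the sample lists are illustrative only):

```python
import numpy as np

# Each list is converted with no explicit dtype, so NumPy infers one
samples = {
    "bools": [True, False],
    "ints": [1, 2, 3],
    "ints + one float": [1, 2, 3.5],
    "ints + one complex": [1, 2, 3j],
    "bool + int": [True, 2],
}

for label, values in samples.items():
    print(f"{label:20} -> {np.array(values).dtype}")
```

The integer case prints the platform's default integer dtype, so it may differ from int64 on some systems.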

Default Dtypes

Some functions have specific default dtypes.

1. zeros and ones

```python
import numpy as np

def main():
    a = np.zeros((2, 3))
    b = np.ones((2, 3))
    print(f"{a.dtype = }")   # the "=" specifier needs Python 3.8+
    print(f"{b.dtype = }")

if __name__ == "__main__":
    main()
```

Output:

a.dtype = dtype('float64')
b.dtype = dtype('float64')

2. float64 Default

np.zeros and np.ones default to float64, not integers.
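When you want something other than float64, pass dtype explicitly. A brief sketch of the common options; the array names are illustrative:

```python
import numpy as np

# Ask for an integer or boolean dtype explicitly
counts = np.zeros((2, 3), dtype=np.int32)
mask = np.ones(4, dtype=bool)

# The *_like variants inherit the dtype of their template array
pixels = np.array([[0, 255]], dtype=np.uint8)
blank = np.zeros_like(pixels)

# np.full infers its dtype from the fill value when dtype is omitted
sevens = np.full(3, 7)

print(counts.dtype, mask.dtype, blank.dtype, sevens.dtype)
```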

MNIST Example

Image datasets commonly use uint8 for efficiency.

1. Loading MNIST

```python
import numpy as np
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
from torchvision.datasets import MNIST

def main():
    train_dataset = MNIST(root='data/', train=True,
                          transform=transforms.ToTensor(), download=True)

    # .data holds the raw images as a (60000, 28, 28) uint8 tensor;
    # ToTensor only applies when indexing train_dataset[i]
    fig, ax = plt.subplots(figsize=(9, 6))
    fig.suptitle(f'{train_dataset.data.dtype = }', fontsize=15)

    # Tile the first 150 digits into a 10 x 15 grid, keeping uint8 pixels
    img = np.empty((28 * 10, 28 * 15), dtype=np.uint8)
    for i in range(10):
        for j in range(15):
            img[i*28:(i+1)*28, j*28:(j+1)*28] = train_dataset.data[i*15+j].numpy()
    ax.imshow(img, cmap='binary')
    ax.axis('off')
    plt.show()

if __name__ == "__main__":
    main()
```

2. Why uint8

8-bit unsigned integers cover exactly 0 to 255, the 256 intensity levels of an 8-bit grayscale pixel, at 1 byte per pixel instead of 8 bytes for float64.
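A minimal sketch of the trade-off, using a synthetic 28 x 28 image (a repeating 0 to 255 ramp stands in for real MNIST pixels): storing the image as uint8 versus float64, and rescaling to float32 in [0, 1] for model input, which is similar to the rescaling torchvision's ToTensor performs.

```python
import numpy as np

# Synthetic grayscale image: values 0..255 repeated, reshaped to 28 x 28
img_u8 = (np.arange(28 * 28) % 256).astype(np.uint8).reshape(28, 28)

# The same pixels stored as float64 take 8x the memory
img_f64 = img_u8.astype(np.float64)
print(img_u8.nbytes, img_f64.nbytes)

# Rescale to [0, 1] float32, a common input format for neural networks
img_f32 = img_u8.astype(np.float32) / 255.0
print(img_f32.dtype, img_f32.min(), img_f32.max())
```

Keeping pixels as uint8 on disk and in memory, and converting only the batch you feed to a model, keeps memory use low.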

Framework Comparison

Different frameworks have different default integer and floating-point types, and different rules for mixing them.

1. NumPy Default

```python
import numpy as np

a = np.array([1, 2, 3])    # int64
b = np.array([1., 2, 3])   # float64
c = a + b                  # promoted to float64
print(c)                   # [2. 4. 6.]
```

2. PyTorch Default

```python
import torch

a = torch.tensor([1, 2, 3])    # int64 (PyTorch's default integer dtype)
b = torch.tensor([1., 2, 3])   # float32 (PyTorch's default float dtype)

c = a + b        # promoted automatically, no error
print(c.dtype)   # torch.float32
```

3. TensorFlow Default

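```python
import tensorflow as tf

a = tf.constant([1, 2, 3])    # int32
b = tf.constant([1., 2, 3])   # float32

# c = a + b   # raises an error: TensorFlow does not promote int32 to float32
c = tf.cast(a, tf.float32) + b   # explicit conversion works
```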

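NumPy and PyTorch promote mixed integer and float operands automatically (NumPy to float64, PyTorch to float32); TensorFlow raises an error and requires an explicit conversion such as tf.cast.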
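NumPy's promotion rules can be queried directly with np.promote_types, which returns the smallest dtype to which both inputs can be safely cast. A short sketch over a few illustrative pairs:

```python
import numpy as np

pairs = [
    ("uint8", "int8"),      # needs a signed type wider than 8 bits
    ("int16", "float32"),   # float32 holds every int16 exactly
    ("int32", "float32"),   # float32 cannot hold every int32
    ("int64", "float32"),
    ("float32", "float64"),
]

promoted = {(left, right): np.promote_types(left, right) for left, right in pairs}
for (left, right), result in promoted.items():
    print(f"{left} + {right} -> {result}")
```

Note that int32 with float32 promotes to float64, not float32, because a 24-bit float32 significand cannot represent every 32-bit integer.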

Common Dtype Pitfalls

Overflow and Precision

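uint8 wraparound: arithmetic on uint8 arrays wraps modulo 256 without any warning, so 250 + 10 becomes 4 and 5 - 10 becomes 251. Casting out-of-range integers with .astype(np.uint8) wraps the same way (256 becomes 0, -1 becomes 255), whereas NumPy 2.0 and later raise an OverflowError for an out-of-range Python int passed directly, as in np.uint8(256). Wraparound is a common source of bugs in image-processing pipelines.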
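A minimal sketch of the problem and one common workaround: widen to a larger signed type (int16 here, an arbitrary but sufficient choice), clip into 0-255, then convert back. The pixel values are illustrative.

```python
import numpy as np

pixels = np.array([250, 5], dtype=np.uint8)

# Naive uint8 arithmetic wraps modulo 256
brighter = pixels + 10   # 260 wraps to 4
darker = pixels - 10     # -5 wraps to 251

# Safer: compute in int16, clip to the valid range, then cast back
safe_brighter = np.clip(pixels.astype(np.int16) + 10, 0, 255).astype(np.uint8)
safe_darker = np.clip(pixels.astype(np.int16) - 10, 0, 255).astype(np.uint8)

print(brighter, darker)
print(safe_brighter, safe_darker)
```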

float32 precision loss: float32 has ~7 decimal digits of precision. In iterative algorithms (gradient descent, cumulative sums), rounding errors accumulate. Use float64 when precision matters more than memory.
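Two small illustrations, offered as a sketch: float32 has a 24-bit significand, so above 2**24 it cannot represent every integer, and a sequential running sum (np.cumsum) of many small values drifts much further from the true total in float32 than in float64. The array length and the value 0.1 are arbitrary choices.

```python
import numpy as np

# Above 2**24, adding 1 to a float32 is rounded away
big = np.float32(2**24)
rounded_away = big + np.float32(1) == big
print(rounded_away)

# Accumulate one million copies of 0.1; the exact total would be 100000
x32 = np.full(1_000_000, 0.1, dtype=np.float32)
x64 = np.full(1_000_000, 0.1, dtype=np.float64)
total32 = np.cumsum(x32)[-1]   # cumsum adds sequentially, like a training loop
total64 = np.cumsum(x64)[-1]
print(total32, total64)
```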

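Integer overflow: signed integers wrap too. np.array([127], dtype=np.int8) + 1 silently gives [-128]; the scalar expression np.int8(127) + np.int8(1) also wraps to -128 but emits a RuntimeWarning about overflow.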
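Because array arithmetic gives no warning, one way to detect overflow is to repeat the computation in a wider dtype and compare against the target type's limits from np.iinfo. A minimal sketch, assuming int64 is wide enough to hold the exact results (true for sums of int8 values); the input values are illustrative.

```python
import numpy as np

a = np.array([100, 27, -120], dtype=np.int8)
b = np.array([27, 101, -10], dtype=np.int8)

wrapped = a + b                                   # int8 result, wraps silently
exact = a.astype(np.int64) + b.astype(np.int64)   # true sums

# Flag results that do not fit in int8
info = np.iinfo(np.int8)
overflowed = (exact < info.min) | (exact > info.max)
print(wrapped, exact, overflowed)
```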

| Dtype | Bytes per element | Use when | Avoid when |
|---|---|---|---|
| float32 | 4 | GPU training, images | High-precision numerics |
| float64 | 8 | Scientific computing | Memory-constrained workloads |
| uint8 | 1 | Image pixels (0-255) | Arithmetic that may exceed 0-255 |
| int64 | 8 | General integers | Memory-critical large arrays |
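The bytes-per-element column translates directly into memory use. A quick sketch for a one-million-element array (the size is arbitrary):

```python
import numpy as np

n = 1_000_000
sizes = {name: np.zeros(n, dtype=name).nbytes for name in ("float32", "float64", "uint8", "int64")}

for name, nbytes in sizes.items():
    print(f"{name:8} {np.dtype(name).itemsize} B/element  {nbytes / 1e6:.0f} MB total")
```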


Notebook Examples

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b   # vector addition

print(a.dtype, b.dtype, c.dtype)
```

```python
import numpy as np

a = np.array([1, 2, 3.])
b = np.array([4, 5, 6])
c = a + b   # vector addition

print(a.dtype, b.dtype, c.dtype)
```

```python
import torch

x = torch.tensor([1, 2, 3])
print(x, x.dtype)
```

```python
import torch

x = torch.tensor([1, 2, 3.])
print(x, x.dtype)
```

```python
import torch
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

# Transform: convert images to tensor
transform = transforms.ToTensor()

# Download MNIST (transform=None, so each item is a raw PIL image)
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=None)

# Get first 64 images; each dataset item is an (image, label) pair
images = [mnist[i][0] for i in range(64)]

# Plot
fig, axes = plt.subplots(8, 8, figsize=(8, 8))

for i, ax in enumerate(axes.flat):
    # With transform=transform, images[i] is a (1, 28, 28) tensor, so squeeze it:
    # ax.imshow(images[i].squeeze(), cmap="binary")
    ax.imshow(images[i], cmap="binary")
    ax.axis("off")

plt.tight_layout()
plt.show()
```

```python
from torchvision import datasets
import numpy as np

# Download MNIST (no transform → raw PIL images)
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=None)

# Get one image
img, label = mnist[0]

# Convert to NumPy
img_np = np.array(img)

# Check dtype
print("Type:", type(img))
print("NumPy dtype:", img_np.dtype)
print("Min/Max:", img_np.min(), img_np.max())
```


Exercises

Exercise 1. Create a NumPy array from the Python list [1, 2.5, 3, 4.0] and print its dtype. Then create the same array with an explicit dtype=np.int32 and print the resulting values to observe the truncation behavior.

Solution to Exercise 1
import numpy as np

# Default dtype inference
a = np.array([1, 2.5, 3, 4.0])
print(a.dtype)   # float64 (because 2.5 and 4.0 are floats)
print(a)          # [1.  2.5 3.  4. ]

# Explicit int32 dtype: floats are truncated toward zero
b = np.array([1, 2.5, 3, 4.0], dtype=np.int32)
print(b.dtype)   # int32
print(b)          # [1 2 3 4]  (2.5 truncated to 2)

Exercise 2. Given an array a = np.array([100, 200, 300], dtype=np.int16), check whether the dtype is an integer kind using the .kind attribute. Then print the itemsize and verify that the total memory (nbytes) equals len(a) * itemsize.

Solution to Exercise 2
import numpy as np

a = np.array([100, 200, 300], dtype=np.int16)

# Check integer kind
print(a.dtype.kind)      # 'i' (signed integer)
print(a.dtype.itemsize)  # 2 (bytes per element)

# Verify total memory
print(a.nbytes)                        # 6
print(len(a) * a.dtype.itemsize)       # 6
print(a.nbytes == len(a) * a.dtype.itemsize)  # True

Exercise 3. Create two arrays: x = np.array([1, 2, 3], dtype=np.float32) and y = np.array([4, 5, 6], dtype=np.float64). Compute z = x + y and print the dtype of z. Explain why NumPy chose that dtype by referencing the type promotion rules.

Solution to Exercise 3
import numpy as np

x = np.array([1, 2, 3], dtype=np.float32)
y = np.array([4, 5, 6], dtype=np.float64)
z = x + y
print(z.dtype)  # float64

# Explanation: NumPy promotes to the higher-precision type.
# float32 + float64 -> float64, following the rule that
# the result dtype is the smallest type that can safely
# represent both operands.