Dtype Basics

The dtype attribute specifies how array bytes are interpreted.

Mental Model

A NumPy array is a flat buffer of bytes; the dtype is the lens that tells NumPy how to interpret each fixed-size chunk of those bytes. Choosing a dtype is like choosing a unit of measurement: float64 buys precision, uint8 saves memory, and a poorly chosen dtype can silently corrupt your numbers through overflow or rounding.
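To make the "lens" idea concrete, here is a minimal sketch that reinterprets the same four bytes under different dtypes with `ndarray.view`. The little-endian dtype strings (`'<u2'`, `'<u4'`) are chosen here only so the result does not depend on the machine's byte order.

```python
import numpy as np

# Four bytes in memory: 0x01 0x02 0x03 0x04
raw = np.array([1, 2, 3, 4], dtype=np.uint8)

# Same buffer, read as two little-endian 16-bit unsigned integers
as_u16 = raw.view('<u2')

# Same buffer, read as one little-endian 32-bit unsigned integer
as_u32 = raw.view('<u4')

print(raw.tobytes())   # b'\x01\x02\x03\x04'
print(as_u16)          # [ 513 1027]  (1 + 2*256, 3 + 4*256)
print(as_u32)          # [67305985]   (0x04030201)
print(raw.nbytes, as_u16.nbytes, as_u32.nbytes)  # 4 4 4
```

No bytes are copied or changed by `view`; only the interpretation differs.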

Dtype Reference

NumPy supports many data types with different precision and range.

| Data type | Description |
| --- | --- |
| `bool_` | Boolean (True or False) stored as a byte |
| `int_` | Default integer type (platform-dependent: usually int64, int32 on some platforms) |
| `int8` | Byte (-128 to 127) |
| `int16` | Integer (-32768 to 32767) |
| `int32` | Integer (-2147483648 to 2147483647) |
| `int64` | Integer (-9223372036854775808 to 9223372036854775807) |
| `uint8` | Unsigned integer (0 to 255) |
| `uint16` | Unsigned integer (0 to 65535) |
| `uint32` | Unsigned integer (0 to 4294967295) |
| `uint64` | Unsigned integer (0 to 18446744073709551615) |
| `float16` | Half precision (sign, 5-bit exp, 10-bit mantissa) |
| `float32` | Single precision (sign, 8-bit exp, 23-bit mantissa) |
| `float64` | Double precision (sign, 11-bit exp, 52-bit mantissa) |
| `complex64` | Complex with two 32-bit floats |
| `complex128` | Complex with two 64-bit floats |
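The ranges and bit layouts in the table do not need to be memorized. As a small sketch, `np.iinfo` and `np.finfo` report them at runtime; the particular selection of types below is only an illustration.

```python
import numpy as np

# Integer limits, as listed in the table
for int_type in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(int_type)
    print(f"{int_type.__name__:>7}: {info.min} to {info.max}")

# Floating-point layout: exponent bits and mantissa bits
for float_type in (np.float16, np.float32, np.float64):
    info = np.finfo(float_type)
    print(f"{float_type.__name__:>7}: {info.nexp}-bit exp, {info.nmant}-bit mantissa")
```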

Checking Dtype

The dtype attribute reveals an array's data type.

1. Basic Examples

```python
import numpy as np


def main():
    x = np.array([1, 2, 3])                   # integers only: default integer dtype
    y = np.array([1, 2, 3], dtype='uint8')    # explicit dtype
    z = np.array([1, 2, 3], dtype='float32')  # explicit dtype
    w = np.array([1., 2, 3])                  # one float literal: float64
    print(x.dtype)
    print(y.dtype)
    print(z.dtype)
    print(w.dtype)


if __name__ == "__main__":
    main()
```

Output:

int64
uint8
float32
float64

2. Float Inference

A single float literal such as 1. is enough: NumPy infers float64 for the whole array.
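The same inference applies to other mixtures of Python values. The following sketch shows which dtype NumPy picks when a list mixes booleans, integers, floats, and complex numbers; the variable names are illustrative only.

```python
import numpy as np

ints_only = np.array([1, 2, 3])         # default integer dtype
with_float = np.array([1, 2, 3.])       # one float promotes everything to float64
with_complex = np.array([1, 2.0, 3j])   # one complex promotes everything to complex128
bool_and_int = np.array([True, 2])      # booleans are promoted to the integer dtype

for arr in (ints_only, with_float, with_complex, bool_and_int):
    print(arr.dtype, arr)
```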

Default Dtypes

Array-creation functions that receive no input data fall back to a fixed default dtype.

1. zeros and ones

```python
import numpy as np


def main():
    a = np.zeros((2, 3))
    b = np.ones((2, 3))
    print(f"{a.dtype = }")
    print(f"{b.dtype = }")


if __name__ == "__main__":
    main()
```

Output:

a.dtype = dtype('float64')
b.dtype = dtype('float64')

2. float64 Default

np.zeros and np.ones default to float64, not to an integer type. Pass dtype= to override the default.
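As a brief sketch of how the defaults can be overridden or inherited, the block below passes an explicit `dtype` and compares a few other creation functions. The choice of functions is illustrative, not exhaustive.

```python
import numpy as np

default_zeros = np.zeros((2, 3))                  # float64 by default
int_zeros = np.zeros((2, 3), dtype=np.int32)      # explicit override

from_int_range = np.arange(3)                     # integer arguments: integer dtype
from_float_range = np.arange(3.0)                 # float argument: float64
evenly_spaced = np.linspace(0, 1, 5)              # float64

# *_like functions copy the dtype of their template array
template = np.array([1, 2], dtype=np.uint8)
like_template = np.zeros_like(template)           # uint8

print(default_zeros.dtype, int_zeros.dtype)
print(from_int_range.dtype, from_float_range.dtype, evenly_spaced.dtype)
print(like_template.dtype)
```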

MNIST Example

Image datasets commonly use uint8 for efficiency.

1. Loading MNIST

```python
import numpy as np
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
from torchvision.datasets import MNIST


def main():
    train_dataset = MNIST(root='data/', train=True,
                          transform=transforms.ToTensor(), download=True)

    fig, ax = plt.subplots(figsize=(9, 6))
    fig.suptitle(f'{train_dataset.data.dtype = }', fontsize=15)

    # Tile the first 150 digits into a 10 x 15 grid
    img = np.empty((28 * 10, 28 * 15))
    for i in range(10):
        for j in range(15):
            img[i*28:(i+1)*28, j*28:(j+1)*28] = train_dataset.data[i*15+j]
    ax.imshow(img, cmap='binary')
    ax.axis('off')
    plt.show()


if __name__ == "__main__":
    main()
```

2. Why uint8

8-bit unsigned integers (0-255) represent 8-bit grayscale pixel intensities exactly, at one byte per pixel.
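To put a number on the saving, here is a rough sketch of the memory needed to hold the MNIST training images (60,000 images of 28 x 28 pixels) under three dtypes. It only multiplies element counts by `itemsize`, so no large array is allocated.

```python
import numpy as np

# MNIST training set: 60,000 grayscale images, 28 x 28 pixels each
n_values = 60_000 * 28 * 28

# Bytes required to store every pixel under each dtype
sizes = {name: n_values * np.dtype(name).itemsize
         for name in ("uint8", "float32", "float64")}

for name, n_bytes in sizes.items():
    print(f"{name:>7}: {n_bytes / 1e6:.1f} MB")
```

Storing the pixels as float64 takes eight times the memory of uint8 without adding any information to the raw data.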

Framework Comparison

Frameworks differ in their default integer and float dtypes, and in how they handle mixed-dtype arithmetic.

1. NumPy Default

```python
import numpy as np

a = np.array([1, 2, 3])   # int64
b = np.array([1., 2, 3])  # float64
c = a + b                 # promoted to float64
print(c)
```

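2. PyTorch Default

```python
import torch

a = torch.tensor([1, 2, 3])   # int64
b = torch.tensor([1., 2, 3])  # float32

c = a + b  # promoted to float32 in current PyTorch releases
# An in-place a += b fails: the float result cannot be stored in an int64 tensor
```

3. TensorFlow Default

```python
import tensorflow as tf

a = tf.constant([1, 2, 3])   # int32
b = tf.constant([1., 2, 3])  # float32

c = a + b  # Error: dtypes int32 and float32 do not match
c = tf.cast(a, tf.float32) + b  # explicit cast works
```

NumPy and PyTorch promote mixed dtypes automatically (to float64 and float32 respectively in the examples above); TensorFlow requires an explicit conversion such as tf.cast.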
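NumPy's promotion rules can be inspected without building arrays. The sketch below uses `np.result_type` to show which dtype a mixed operation would produce, and `astype` to take control of the result with an explicit cast. It covers NumPy only; the other frameworks have their own casting functions.

```python
import numpy as np

# Which dtype would a mixed operation produce?
print(np.result_type(np.int64, np.float64))  # float64
print(np.result_type(np.int32, np.float32))  # float64: float32 cannot hold every int32 exactly
print(np.result_type(np.int16, np.float32))  # float32: every int16 fits in float32

a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([1., 2., 3.], dtype=np.float32)

implicit = a + b                      # NumPy promotes to float64
explicit = a.astype(np.float32) + b   # explicit cast keeps the result in float32

print(implicit.dtype, explicit.dtype)
```

The explicit cast is worth the extra typing when memory matters, since automatic promotion can double the size of the result.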

Common Dtype Pitfalls

Overflow and Precision

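uint8 wraparound: array arithmetic that leaves the range 0--255 wraps around silently. np.array([255], dtype=np.uint8) + 1 gives [0], and np.array([0], dtype=np.uint8) - 1 gives [255]. Constructing a scalar from an out-of-range Python integer behaves differently: np.uint8(256) raises OverflowError in NumPy 2.x, whereas older releases wrapped it to 0 (later with a deprecation warning). Wraparound is one of the most common bugs in image processing pipelines.

float32 precision loss: float32 carries about 7 significant decimal digits, float64 about 16. In iterative algorithms (gradient descent, cumulative sums), rounding errors accumulate. Use float64 when precision matters more than memory.

Integer overflow: signed types wrap as well. np.array([127], dtype=np.int8) + np.int8(1) gives [-128] with no warning; the scalar expression np.int8(127) + np.int8(1) also returns -128 but emits a RuntimeWarning.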
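The wraparound can be reproduced with array arithmetic, and one common way around it is to widen the dtype before the operation and clip afterwards. The following is a minimal sketch of that pattern; the brightness offset of 10 is an arbitrary value chosen for illustration.

```python
import numpy as np

pixels = np.array([250, 5], dtype=np.uint8)

# Naive brightening: 250 + 10 = 260 does not fit in uint8 and wraps to 4
wrapped = pixels + np.uint8(10)

# Safer: widen to int16, add, clip to the valid range, then narrow again
saturated = np.clip(pixels.astype(np.int16) + 10, 0, 255).astype(np.uint8)

print(wrapped)    # [ 4 15]
print(saturated)  # [255  15]
```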

| Dtype | Bytes | Use when | Avoid when |
| --- | --- | --- | --- |
| `float32` | 4 | GPU training, images | High-precision numerics |
| `float64` | 8 | Scientific computing | Memory-constrained workloads |
| `uint8` | 1 | Image pixels (0--255) | Arithmetic that may exceed 0--255 |
| `int64` | 8 | General integers | Memory-critical large arrays |
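The float32 limit is easy to hit in practice. As a small sketch: float32 has a 24-bit significand, so above 2**24 it can no longer represent every integer, and adding 1 has no effect. The same addition in float64 behaves as expected.

```python
import numpy as np

big32 = np.float32(16_777_216)        # 2**24
stuck = big32 + np.float32(1)         # 16,777,217 is not representable in float32

big64 = np.float64(16_777_216) + np.float64(1)

print(stuck == big32)                 # True: the increment was lost
print(big64)                          # 16777217.0

# Machine epsilon: spacing between 1.0 and the next representable value
print(np.finfo(np.float32).eps)       # 2**-23, about 1.19e-07
print(np.finfo(np.float64).eps)       # 2**-52, about 2.22e-16
```

A running sum kept in float32 runs into the same effect once the total grows large relative to each increment.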

Notebook Examples

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b  # vector addition

print(a.dtype, b.dtype, c.dtype)
```

```python
import numpy as np

a = np.array([1, 2, 3.])
b = np.array([4, 5, 6])
c = a + b  # vector addition

print(a.dtype, b.dtype, c.dtype)
```

```python
import torch

x = torch.tensor([1, 2, 3])
print(x, x.dtype)
```

```python
import torch

x = torch.tensor([1, 2, 3.])
print(x, x.dtype)
```

```python
import torch
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

# Transform: convert images to tensor
transform = transforms.ToTensor()

# Download MNIST
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=None)

# Get first 64 images
images = [mnist[i][0] for i in range(64)]  # (image, label)

# Plot
fig, axes = plt.subplots(8, 8, figsize=(8, 8))

for i, ax in enumerate(axes.flat):
    # ax.imshow(images[i].squeeze(), cmap="binary")
    ax.imshow(images[i], cmap="binary")
    ax.axis("off")

plt.tight_layout()
plt.show()
```

```python
from torchvision import datasets
import numpy as np

# Download MNIST (no transform → raw PIL images)
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=None)

# Get one image
img, label = mnist[0]

# Convert to NumPy
img_np = np.array(img)

# Check dtype
print("Type:", type(img))
print("NumPy dtype:", img_np.dtype)
print("Min/Max:", img_np.min(), img_np.max())
```

Exercises

Exercise 1. Create a NumPy array from the Python list [1, 2.5, 3, 4.0] and print its dtype. Then create the same array with an explicit dtype=np.int32 and print the resulting values to observe the truncation behavior.

Solution to Exercise 1

import numpy as np

# Default dtype inference
a = np.array([1, 2.5, 3, 4.0])
print(a.dtype)   # float64 (because 2.5 and 4.0 are floats)
print(a)          # [1.  2.5 3.  4. ]

# Explicit int32 dtype — floats are truncated
b = np.array([1, 2.5, 3, 4.0], dtype=np.int32)
print(b.dtype)   # int32
print(b)          # [1 2 3 4]  (2.5 truncated to 2)

Exercise 2. Given an array a = np.array([100, 200, 300], dtype=np.int16), check whether the dtype is an integer kind using the .kind attribute. Then print the itemsize and verify that the total memory (nbytes) equals len(a) * itemsize.

Solution to Exercise 2

import numpy as np

a = np.array([100, 200, 300], dtype=np.int16)

# Check integer kind
print(a.dtype.kind)      # 'i' (signed integer)
print(a.dtype.itemsize)  # 2 (bytes per element)

# Verify total memory
print(a.nbytes)                        # 6
print(len(a) * a.dtype.itemsize)       # 6
print(a.nbytes == len(a) * a.dtype.itemsize)  # True

Exercise 3. Create two arrays: x = np.array([1, 2, 3], dtype=np.float32) and y = np.array([4, 5, 6], dtype=np.float64). Compute z = x + y and print the dtype of z. Explain why NumPy chose that dtype by referencing the type promotion rules.

Solution to Exercise 3

import numpy as np

x = np.array([1, 2, 3], dtype=np.float32)
y = np.array([4, 5, 6], dtype=np.float64)
z = x + y
print(z.dtype)  # float64

# Explanation: NumPy promotes to the higher-precision type.
# float32 + float64 -> float64, following the rule that
# the result dtype is the smallest type that can safely
# represent both operands.