[SOLVED] Saving a numpy array in binary does not improve disk usage compared to uint8

Issue

I’m saving numpy arrays while trying to use as little disk space as possible.
Along the way I realized that saving a boolean numpy array does not improve disk usage compared to a uint8 array.
Is there a reason for that or am I doing something wrong here?

Here is a minimal example:

import sys
import numpy as np

rand_array = np.random.randint(0, 2, size=(100, 100), dtype=np.uint8)  # create a random dual state numpy array

array_uint8 = rand_array * 255  # array, type uint8

array_bool = np.array(rand_array, dtype=bool)  # array, type bool

print(f"size array uint8 {sys.getsizeof(array_uint8)}")
# ==> size array uint8 10120
print(f"size array bool {sys.getsizeof(array_bool)}")
# ==> size array bool 10120

np.save("array_uint8", array_uint8, allow_pickle=False, fix_imports=False)
# size in fs: 10128
np.save("array_bool", array_bool, allow_pickle=False, fix_imports=False)
# size in fs: 10128

Solution

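The uint8 and bool data types both occupy one byte per element, so arrays of the same shape always take up the same amount of memory. The same holds on disk: np.save writes the raw element bytes after a short header, which is why both files come to 10128 bytes (10000 bytes of data plus a 128-byte header). To reduce the footprint, pack the boolean values as bits into a uint8 array with numpy.packbits. This stores eight values per byte, so the packed array is roughly eight times smaller, and numpy.unpackbits reverses the operation (see the NumPy documentation for numpy.packbits).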
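A minimal sketch of the packing approach, assuming the original shape is stored separately so the array can be rebuilt. The helper names pack_bool, unpack_bool and saved_size are made up for this illustration and are not part of NumPy. The sketch serializes into an in-memory buffer rather than a file, only to measure the size.

```python
import io

import numpy as np


def pack_bool(arr):
    """Pack a boolean array into bits; return the packed bytes and the original shape."""
    # axis=None flattens the array first; the last byte is zero-padded if needed.
    return np.packbits(arr, axis=None), arr.shape


def unpack_bool(packed, shape):
    """Rebuild the boolean array from the packed bytes and the stored shape."""
    count = int(np.prod(shape))
    bits = np.unpackbits(packed)[:count]  # drop any padding bits at the end
    return bits.astype(bool).reshape(shape)


def saved_size(arr):
    """Number of bytes np.save would write for arr (header included)."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return len(buf.getvalue())


rng = np.random.default_rng(0)
array_bool = rng.integers(0, 2, size=(100, 100)).astype(bool)

packed, shape = pack_bool(array_bool)
restored = unpack_bool(packed, shape)

print(array_bool.nbytes)  # 10000 bytes: one byte per element
print(packed.nbytes)      # 1250 bytes: eight elements per byte
print(saved_size(array_bool), saved_size(packed))
print(np.array_equal(restored, array_bool))  # True
```

The shape has to travel with the packed data, since the packed array is one-dimensional and may carry padding bits in its final byte.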

Answered By – theo

Answer Checked By – Mary Flores (BugsFixing Volunteer)
