## Issue

I am trying to do simple calculations on 4D image arrays (timeseries included), but the broadcasting eat up a lot of RAM compared to the initialized arrays, and I have already tried to read others with the some what similar problems. E.g.

Memory growth with broadcast operations in NumPy

Here a comment from "rth" says

*"The broadcasting does not make additional memory allocations for the initial arrays"*

and the accepted comment from "Warren Weckesser" show the problem with the askers use og newaxis that create an extra array that is allocated.

I tried doing what Warren showed, but I still get alot of RAM eaten up and I cannot figure out why. Right now I have implemented rths chunk calculation method with good results as such, but it still buggers me why the direct numpy calculations blow up in RAM usage.

**Here is an example of what i do**

I initialize the array that I will add the data to and create the random raw images of uint16 as it is coming from 16bit TIFF files with 12bit RAW image data. And I keep the rest float32 to save RAM. The last precision is not that important

```
import numpy as np
imagearraydensity = np.ones((512, 1024, 250, 20),dtype=np.float32)
imagearrayraw = np.random.randint(4095,size=(512, 1024, 250, 20),dtype=np.uint16)
```

I have two arrays of linear constants calculated before hand, here just random numbers

```
acons = np.random.random((512, 1024)).astype(np.float32)
bcons = np.random.random((512, 1024)).astype(np.float32)
```

Then the calculation

```
np.divide(np.subtract(imagearrayraw, bcons[:, :, np.newaxis, np.newaxis], dtype=np.float32),
acons[:, :, np.newaxis, np.newaxis],
imagearraydensity, # Output array position
dtype=np.float32)
```

My code with real data end up using ~ 36 GB RAM on my system with ~27 GB when done with the calculation.

Are there anything I can do to reduce the RAM usage from the direct broadcasting or is the best method the chunk based way as I have already implemented?

## Solution

First of all, the two input arrays contains 2_621_440_000 items resulting in 15 GiB of RAM. The temporary array generated by `np.subtract`

contains the same number of element resulting in 10 GiB of RAM. The output array is overwritten so it should not take more RAM. This means Numpy should already take 25 GiB of RAM.

The thing is Numpy does not know how to subtract a `uint16`

array from a `float32`

one. Thus it applies semantic rules that cause the ** uint16 array to be casted to a float32** one. This means it creates a new 10 GiB temporary array hence the ~36 GiB of RAM taken. Note that neither Numpy nor Python are designed to minimize the memory footprint.

That being said, there are several solutions to address this issue. One solution is to split the input data in chunks so that the temporary array take less space. This is quite cumbersome to do tough. Another theoretical solution would be to cast the array yourself so to write it in `imagearrayraw`

, but Numpy does not provide an `out`

parameter for `astype`

. Another solution is to use Numba so to perform the cast on the fly (which is much more efficient).

Here is the chunk-based solution:

```
for i in range(imagearraydensity.shape[0]):
imagearraydensity[i,...] = imagearrayraw[i,...].astype(np.float32)
np.subtract(imagearraydensity, bcons[:, :, np.newaxis, np.newaxis], out=imagearraydensity, dtype=np.float32)
np.divide(imagearraydensity, acons[:, :, np.newaxis, np.newaxis], out=imagearraydensity, dtype=np.float32)
```

This solution takes 15 GiB on my machine.

Answered By – Jérôme Richard

Answer Checked By – David Marino (BugsFixing Volunteer)