## Issue

I have three 1D arrays: `T` with 100k elements, and `f` and `df` with 200 elements each:

```
T = [T0, T1, ..., T99999]    # 100,000 elements
f = [f0, f1, ..., f199]      # 200 elements
df = [df0, df1, ..., df199]  # 200 elements
```

For each pair of elements from `f` and `df`, I have to evaluate a function such as the following:

```
P = T*f + T**2 *df
```

My first instinct was to use NumPy's `outer` to evaluate the function for each combination of **f** and **df**:

```
P1 = np.outer(f,T)
P2 = np.outer(df,T**2)
P = np.add.outer(P1, P2)
```

However, this runs out of RAM and fails with the following error:

```
Unable to allocate 2.23 PiB for an array with shape (200, 100000, 200, 100000) and data type float64
```

Is there a good way that I can calculate this?

My attempt using for loops:

```
n = 100
f_range = 5e-7
df_range = 1.5e-15
fsrc = np.arange(f - n * f_range, f + n * f_range, f_range)        # array of 200
dfsrc = np.arange(df - n * df_range, df + n * df_range, df_range)  # array of 200
dfnus = pd.DataFrame(fsrc)
numf = dfnus.shape[0]
dfnudots = pd.DataFrame(dfsrc)
numfdot = dfnudots.shape[0]
test2D = np.zeros([numf, numfdot])
for indexf, f in enumerate(fsrc):
    for indexfd, fd in enumerate(dfsrc):
        a = make_phase(T, f, fd)       # --> just a function that performs T*f + T**2 * df
        zlauf2d = z_n(a, n=1, norm=1)  # ---> another function that reduces this 1D "a" to a value
        test2D[indexf, indexfd] = np.copy(zlauf2d)  # ---> stored in 2D so I can make a contour plot at the end
```

Now my `test2D` has the shape `(200, 200)`, which is what I want. However, the for loop takes ages, and I want to reduce the two for loops to at most one.

## Solution

Using broadcasting:

```
P1 = (f[:, np.newaxis] * T).sum(axis=-1)
P2 = (df[:, np.newaxis] * T**2).sum(axis=-1)
P = P1[:, np.newaxis] + P2
```
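To make the shapes concrete, here is a small self-contained check (using random stand-ins for `T`, `f`, and `df`, with small sizes so the loop comparison is fast) showing that the broadcast-and-sum result matches the explicit double loop:

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random(1000)   # stand-in for the 100k-element array
f = rng.random(5)      # small stand-ins so the loop check is quick
df = rng.random(5)

P1 = (f[:, np.newaxis] * T).sum(axis=-1)      # shape (5,)
P2 = (df[:, np.newaxis] * T**2).sum(axis=-1)  # shape (5,)
P = P1[:, np.newaxis] + P2                    # shape (5, 5)

# Cross-check every entry against the explicit double loop.
for i, fi in enumerate(f):
    for j, dfj in enumerate(df):
        assert np.isclose(P[i, j], (fi * T + dfj * T**2).sum())
```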

Alternatively, using outer:

```
P1 = (np.outer(f, T)).sum(axis=-1)
P2 = (np.outer(df, T**2)).sum(axis=-1)
P = P1[..., np.newaxis] + P2
```

This produces an array of shape `(f.size, df.size) == (200, 200)`.

Generally speaking, if the *final* output array size is very large, one can either:

- Reduce the size of the datatypes. One way is to change the datatype of the arrays used to calculate the final output via `P1.astype(np.float32)`. Alternatively, some operations allow one to pass in `dtype=np.float32` as a parameter.
- Chunk the computation and work with smaller subsections of the result.
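As a sketch of the chunking idea (using random stand-ins for `T`, `f`, and `df`, and an assumed chunk size of 20 rows), one can process a few rows of `f` at a time so that only a `(chunk, 200, len(T))` intermediate is ever in memory instead of the full `(200, 200, len(T))` array:

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random(1000)   # smaller stand-in for the 100k-element array
f = rng.random(200)
df = rng.random(200)

chunk = 20  # rows of f processed per iteration (tune to fit in RAM)
z = np.empty((f.size, df.size))
for start in range(0, f.size, chunk):
    stop = start + chunk
    # (chunk, 1, len(T)) broadcast with (1, 200, len(T)) -> (chunk, 200, len(T))
    a = (f[start:stop, None, None] * T
         + df[None, :, None] * T**2)
    z[start:stop] = np.abs(a).sum(axis=-1)  # L1 norm along the last axis

print(z.shape)  # (200, 200)
```

The result is identical to computing the full 3D array at once; only the peak memory differs by a factor of `f.size / chunk`.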

Based on the most recent edit, compute an array `a` with shape `(200, 200, 100000)`. Then take its element-wise norm along the last axis to produce an array `z` with shape `(200, 200)`.

```
a = (
f[:, np.newaxis, np.newaxis] * T
+ df[np.newaxis, :, np.newaxis] * T**2
)
# L1 norm along last axis.
z = np.abs(a).sum(axis=-1)
```
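Note that at full size this intermediate `a` holds 200 × 200 × 100,000 float64 values, roughly 30 GiB, so the chunking approach mentioned above may still be needed in practice. At a reduced scale, a self-contained cross-check (with random stand-ins for `T`, `f`, and `df`) confirms the broadcast version matches the original double loop:

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random(1000)   # stand-in; the full 100k-element T makes `a` ~30 GiB
f = rng.random(8)
df = rng.random(8)

a = (
    f[:, np.newaxis, np.newaxis] * T
    + df[np.newaxis, :, np.newaxis] * T**2
)
z = np.abs(a).sum(axis=-1)  # L1 norm along last axis

# Cross-check every entry against the original double loop.
for i in range(f.size):
    for j in range(df.size):
        assert np.isclose(z[i, j], np.abs(f[i] * T + df[j] * T**2).sum())
```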

This produces an array of shape `(f.size, df.size) == (200, 200)`.

Answered By – Mateen Ulhaq

Answer Checked By – Pedro (BugsFixing Volunteer)