[SOLVED] Outer product of large vectors exceeds memory

Issue

I have three 1D vectors. Let’s say T with 100k element array, f and df each with 200 element array:

T = [T0, T1, ..., T100k]
f = [f0, f1, ..., f200]
df = [df0, df1, ..., df200]

For each element array, I have to calculate a function such as the following:

P =  T*f + T**2 *df

My first instinct was to use the NumPy outer to find the function with each combination of f and df

P1 = np.outer(f,T)
P2 = np.outer(df,T**2)
P = np.add.outer(P1, P2)

However, in this case, I am facing the ram issue and receiving the following error:

Unable to allocate 2.23 PiB for an array with shape (200, 100000, 200,
100000) and data type float64

Is there a good way that I can calculate this?

My attempt using for loops

n=100
f_range = 5e-7
df_range = 1.5e-15

fsrc = np.arange(f - n * f_range, f + n * f_range, f_range) #array of 200
dfsrc = np.arange(df - n * df_range, df + n * df_range, df_range) #array of 200

dfnus=pd.DataFrame(fsrc)
numf=dfnus.shape[0]

dfnudots=pd.DataFrame(dfsrc)
numfdot=dfnudots.shape[0]

test2D = np.zeros([numf,(numfdot)])

for indexf, f in enumerate(fsrc):
    
    for indexfd, fd in enumerate(dfsrc):
        
        a=make_phase(T,f,fd) #--> this is just a function that performs T*f + T**2 *df
        zlauf2d=z_n(a, n=1, norm=1) #---> And this is just another function that takes this 1D "a" and gives another 1D element array
        test2D[indexf, indexfd]=np.copy(zlauf2d) #---> I do this so I could make a contour plot at the end. It just copys the same thing to 2D
        

Now my test2D has the shape of (200,200). This is what I want, however the floor loop is taking ages and I want somehow reduce two for loop to at least one.

Solution

Using broadcasting:

P1 = (f[:, np.newaxis] * T).sum(axis=-1)
P2 = (df[:, np.newaxis] * T**2).sum(axis=-1)
P = P1[:, np.newaxis] + P2

Alternatively, using outer:

P1 = (np.outer(f, T)).sum(axis=-1)
P2 = (np.outer(df, T**2)).sum(axis=-1)
P = P1[..., np.newaxis] + P2

This produces an array of shape (f.size, df.size) == (200, 200).


Generally speaking, if the final output array size is very large, one can either:

  1. Reduce the size of the datatypes. One way is to change the datatypes of the arrays used to calculate the final output via P1.astype(np.float32). Alternatively, some operations allow one to pass in a dtype=np.float32 as a parameter.
  2. Chunk the computation and work with smaller subsections of the result.

Based on the most recent edit, compute an array a with shape (200, 200, 100000). Then, take its element-wise norm along the last axis to produce an array z with shape (200, 200).

a = (
    f[:, np.newaxis, np.newaxis] * T
    + df[np.newaxis, :, np.newaxis] * T**2
)

# L1 norm along last axis.
z = np.abs(a).sum(axis=-1)

This produces an array of shape (f.size, df.size) == (200, 200).

Answered By – Mateen Ulhaq

Answer Checked By – Pedro (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published. Required fields are marked *