[SOLVED] Python memory management when creating variables that go into functions

Issue

I have two arrays A and B of shape (10000, 100, 100) (very large). I need to perform a series of operations on them and pass the results to other functions. My question is: how can I minimize memory usage? Let me give a specific example.

import numpy as np

A = np.random.rand(10000, 100, 100)
B = np.random.rand(10000, 100, 100)

def ave_l2_error(diffs):
    for err in diffs:
        print(np.mean(err))

def ave_l1_error(diffs):
    for err in diffs:
        print(np.mean(err))

#Is there a difference in terms of memory usage between doing this:
L2 = [np.power(A-B, 2)]
L1 = [np.abs(A-B)]
ave_l2_error(L2)
ave_l1_error(L1)

#vs this:
ave_l2_error([np.power(A-B, 2)])
ave_l1_error([np.abs(A-B)])

I would think the first case uses more memory because it keeps L1 and L2 around. This reddit thread discusses renaming variables, but my situation is slightly different (or maybe not). Would the garbage collector detect that L1 and L2 are no longer used and free them? And does it make a difference if the code is run in IPython (instead of as a script), where one still has access to the variables afterwards?

Solution

In the first version, the arrays created by np.power() and np.abs() stay in memory for as long as the names L2 and L1 refer to them — by default, until the script ends — because those references prevent them from becoming garbage.

In the second version, the arrays can be garbage collected as soon as the functions return: the only references to them were the argument lists and the function parameters, which go away when the function exits. (In CPython, reference counting frees them immediately at that point.) So this version uses less memory overall.
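You can observe this lifetime difference directly. Below is a minimal sketch using plain Python objects and the standard-library weakref module instead of the large NumPy arrays (the class and function names are made up for illustration); a weak reference lets us check whether the object is still alive without keeping it alive ourselves:

```python
import gc
import weakref

class Big:
    """Stand-in for a large temporary object (e.g. the result of A - B)."""
    pass

def consume(x):
    # Return a weak reference so the caller can observe x's lifetime
    # without the weakref itself keeping x alive.
    return weakref.ref(x)

# Case 1: anonymous temporary -- only the parameter references it,
# so it becomes garbage as soon as consume() returns.
ref_anon = consume(Big())
gc.collect()
print(ref_anon() is None)   # True: the temporary was collected

# Case 2: a named variable keeps the object alive after the call.
obj = Big()
ref_named = consume(obj)
gc.collect()
print(ref_named() is None)  # False: 'obj' still references it
```

The same reasoning applies to the NumPy arrays in the question: `ave_l2_error([np.power(A-B, 2)])` leaves no name bound to the big intermediate array once the call returns.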

You can make the first version behave like the second by deleting the variables (del L2) or rebinding them (L2 = None) once the function calls are done.

Answered By – Barmar

Answer Checked By – Katrina (BugsFixing Volunteer)
