# [SOLVED] Is there any way to speed up the computation time of calling a function with multiple time in python?

## Issue

``````    import numpy as np
import matplotlib.pyplot as plt
from numpy import random
import time
from collections import Counter

def simulation(N):
I = 10
success = 0
M = 100
for i in range(I):
s = allocate(N,M)
M -= s
success += s
return success

def allocate(N,M):
count = Counter(random.randint(N,size = M))
success = sum(j for v,j in count.items() if j == 1)
return success

if __name__ == "__main__":
start = time.perf_counter()
SAMPLE_SIZE = 100000
N = np.linspace(5,45,41).astype(int)
Ps = []
for n in N:
ps = []
for _ in range(SAMPLE_SIZE):
ps.append(simulation(n)/100)
result = np.average(np.array(ps))
Ps.append(result)

elapsed = (time.perf_counter() - start)
print("Time used:",elapsed)
plt.scatter(N,Ps)
plt.show()
``````

Here is my situation. The ultimate goal is to set SAMPLE_SIZE to 10^7. However, when I set it to 10^5, it already requires about 1000sec to run it. Is there any way to make it more efficient and faster? Thanks for giving me suggestions.

## Solution

First of all, the implementation of `allocate` is not very efficient: you can use vectorized Numpy function to do that:

``````def allocate(N, M):
success = np.count_nonzero(np.bincount(random.randint(N, size=M)) == 1)
return success
``````

The thing is most of the time comes from the overhead of Numpy functions performing some checks and create some temporary arrays. You can use Numba to fix this problem:

``````import numba as nb

@nb.njit('int_(int_, int_)')
def allocate(N,M):
tmp = np.zeros(N, np.int_)
for i in range(M):
rnd = np.random.randint(0, N)
tmp[rnd] += 1
count = 0
for i in range(N):
count += tmp[i] == 1
return count
``````

Then, you can speed up the code a bit further by using the Numba decorator `@nb.njit('int_(int_)')` to the `simulation` function so to avoid the overhead of calling Numba functions from the CPython interpreter.

Finally, you can speed up the main loop by running it in parallel with Numba (and also avoid the use of slow lists). You can also recycle the `tmp` array so not to cause too many allocations (that are expensive and do not scale with the number of cores). Here is the resulting final code:

``````import numpy as np
import matplotlib.pyplot as plt
import time
import numba as nb

# Recycle the `tmp` buffer not to do many allocations
@nb.njit('int_(int_, int_, int_[::1])')
def allocate(N, M, tmp):
tmp.fill(0)
for i in range(M):
rnd = np.random.randint(0, N)
tmp[rnd] += 1
count = 0
for i in range(N):
count += tmp[i] == 1
return count

@nb.njit('int_(int_)')
def simulation(N):
I = 10
success = 0
M = 100
tmp = np.zeros(N, np.int_) # Preallocated buffer
for i in range(I):
s = allocate(N, M, tmp)
M -= s
success += s
return success

@nb.njit('float64(int_, int_)', parallel=True)
def compute_ps_avg(n, sample_size):
ps = np.zeros(sample_size, dtype=np.float64)
for i in nb.prange(sample_size):
ps[i] = simulation(n) / 100.0
# Note that np.average is not yet supported by Numba
return np.mean(ps)

if __name__ == "__main__":
start = time.perf_counter()
SAMPLE_SIZE = 100_000
N = np.linspace(5,45,41).astype(int)
Ps = [compute_ps_avg(n, SAMPLE_SIZE) for n in N]
elapsed = (time.perf_counter() - start)
print("Time used:",elapsed)
plt.scatter(N,Ps)
plt.show()
``````

Here are performance results on my 10-core machine:

``````Initial code:          670.6 s
Optimized Numba code:    3.9 s
``````

The resulting code is 172 times faster.

More than 80% of the time is spent in the generation of random numbers. Thus, if you want the code to be faster, one solution is to speed up the generation of random number using a SIMD-optimized random number generator. Unfortunately, AFAIK, this is not possible to (efficiently) achieve this in Python. You certainly need to use a native language like C or C++ to do that.