I am trying to speed up code that uses NumPy's where() function. There are two calls to where(), each returning an array of the indices where its condition evaluates to True. The two index arrays are then compared for overlap with NumPy's intersect1d() function, and the length of the intersection is returned.
import numpy as np

def find_match(x, y, z):
    A = np.where(x == z)
    B = np.where(y == z)
    #A = True
    #B = True
    return len(np.intersect1d(A, B))

N = np.power(10, 8)
M = 10
X = np.random.randint(M, size=N)
Y = np.random.randint(M, size=N)
Z = np.random.randint(M, size=N)
#print(X,Y,Z)
print(find_match(X, Y, Z))
This code takes about 8 seconds on my laptop. If I replace both np.where() calls with A = True and B = True, it takes about 5 seconds. If I replace only one of the np.where() calls, it takes about 6 seconds.
Scaling up by switching to N = np.power(10, 9), the code takes 87 seconds. Replacing both np.where() statements brings that down to 51 seconds. Replacing just one of the np.where() calls takes about 61 seconds.
How can I merge the two np.where() statements in a way that speeds up the code?
This is already an improved version of the code where the speed was increased ~4x by replacing a slower lookup with for-loops. Multiprocessing will be used at a higher level in this code, so I can’t apply it also here.
For the record, the actual data are strings, so doing integer math won’t be helpful.
Python 3.9.1 (default, Jan 8 2021, 17:17:43)
[Clang 12.0.0 (clang-1188.8.131.52)] on darwin
>>> import numpy
>>> print(numpy.__version__)
1.19.5
Does this help?
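def find_match2(x, y, z):
    # np.nonzero returns a tuple of index arrays (one per dimension),
    # so take the first array before counting its entries.
    return len(np.nonzero(np.logical_and(x == z, y == z))[0])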
In : print(find_match(X,Y,Z))
1000896
In : print(find_match2(X,Y,Z))
1000896
In : %timeit find_match(X,Y,Z)
2.37 s ± 70.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In : %timeit find_match2(X,Y,Z)
272 ms ± 9.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Note: I called np.random.seed(210) before creating the arrays for the sake of reproducibility.
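Since only the number of matches is needed, a possible variation is to skip building the index array and count the True entries of the combined mask directly. The sketch below assumes that approach. The helper name count_matches and the tiny string arrays are made up for illustration, and this variant was not part of the timings above.

```python
import numpy as np

def count_matches(x, y, z):
    # Hypothetical helper: combine both element-wise comparisons into
    # one boolean mask, then count how many entries are True.
    mask = (x == z) & (y == z)
    return int(np.count_nonzero(mask))

# Toy string arrays (illustrative only); the comparison is element-wise,
# so no integer math is involved.
x = np.array(["a", "b", "c", "a"])
y = np.array(["a", "c", "c", "b"])
z = np.array(["a", "b", "c", "c"])

print(count_matches(x, y, z))
```

Positions 0 and 2 are the only ones where both x and y agree with z, so the count for these toy arrays is 2.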
Answered By – Tonechas
Answer Checked By – Robin (BugsFixing Admin)