[SOLVED] NumPy: For every element in one array, find the index in another array

Issue

I have two 1D arrays, x & y, one smaller than the other. I’m trying to find the index of every element of y in x.

I’ve found two naive ways to do this: the first is slow, and the second is memory-intensive.

The slow way

indices = []
for iy in y:
    # index of the first occurrence of iy in x (raises IndexError if iy is not in x)
    indices.append(np.where(x == iy)[0][0])

The memory hog

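# build two (len(y), len(x)) matrices so that row j compares y[j] with every x[i]
xe = np.outer([1,]*len(y), x)
ye = np.outer(y, [1,]*len(x))
# rows are positions in y (discarded), columns are the matching positions in x
junk, indices = np.where(np.equal(xe, ye))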

Is there a faster or less memory-intensive approach? Ideally the search would take advantage of the fact that we are searching for many things in a list rather than just one, which should make it more amenable to parallelization.
Bonus points if you don’t assume that every element of y is actually in x.

Solution

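As Joe Kington said, searchsorted() can search for elements very quickly. It requires sorted input, so sort x with argsort() first and map the results back to the original positions. To deal with elements that are not in x, compare the search result against the original y and create a masked array: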

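import numpy as np
x = np.array([3,5,7,1,9,8,6,6])
y = np.array([2,1,5,10,100,6])

# sort x, remembering each sorted element's original position
index = np.argsort(x)
sorted_x = x[index]
# position in sorted_x where each element of y would be inserted
sorted_index = np.searchsorted(sorted_x, y)

# map back to positions in x; mode="clip" keeps out-of-range insert positions valid
yindex = np.take(index, sorted_index, mode="clip")
# mask the entries where the candidate in x does not actually equal y
mask = x[yindex] != y

result = np.ma.array(yindex, mask=mask)
print(result)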

The result is (masked entries mark the elements of y that are not in x):

[-- 3 1 -- -- 6]
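
If a plain integer array is easier to work with than a masked array, the same idea can be wrapped in a small helper. This is only a sketch: the function name find_indices and the missing sentinel are hypothetical and not part of the original answer. It also assumes that a stable sort is wanted, so that duplicates in x (such as the two 6s) resolve to their first occurrence.

```python
import numpy as np

def find_indices(x, y, missing=-1):
    """For each element of y, return the index of its first occurrence
    in x, or `missing` if it does not occur. (Hypothetical helper.)"""
    x = np.asarray(x)
    y = np.asarray(y)
    if x.size == 0:
        # nothing can be found in an empty array
        return np.full(y.shape, missing, dtype=np.intp)
    # a stable sort keeps equal values in their original order,
    # so the leftmost match in sorted_x is the first occurrence in x
    order = np.argsort(x, kind="stable")
    sorted_x = x[order]
    # insertion positions; clip so values larger than max(x) stay in range
    pos = np.clip(np.searchsorted(sorted_x, y), 0, len(x) - 1)
    candidates = order[pos]
    # a candidate is a real match only if the values are equal
    found = x[candidates] == y
    return np.where(found, candidates, missing)

x = np.array([3, 5, 7, 1, 9, 8, 6, 6])
y = np.array([2, 1, 5, 10, 100, 6])
print(find_indices(x, y).tolist())  # [-1, 3, 1, -1, -1, 6]
```

The sort costs O(n log n) once, and each lookup is then a binary search, so neither the Python loop nor the len(x) by len(y) matrix from the question is needed.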

Answered By – HYRY

Answer Checked By – Robin (BugsFixing Admin)
