[SOLVED] Efficient way of selecting elements of an array based on values from another array in Python?

Issue

I have two arrays of equal length, e.g. one of labels and one of distances:

labels = array([3, 1, 0, 1, 3, 2, 3, 2, 1, 1, 3, 1, 2, 1, 3, 2, 2, 3, 3, 3, 2, 3,
        0, 3, 3, 2, 3, 2, 3, 2,...])

distances = array([2.32284095, 0.36254613, 0.95734965, 0.35429638, 2.79098656,
        5.45921793, 2.63795657, 1.34516461, 1.34028463, 1.10808795,
        1.60549826, 1.42531201, 1.16280383, 1.22517273, 4.48511033,
        0.71543217, 0.98840598,...]) 

What I want to do is group the values in distances into N arrays, where N is the number of unique label values (in this case N = 4). All values with label = 3 go in one array, those with label = 2 in another, and so on.

I can think of a simple brute-force approach with loops and if-conditions, but it will be very slow on large arrays. I suspect there is a better way using a native list comprehension, numpy, or something else; I'm just not sure what. What would be the best, most efficient approaches?

"Brute force" example for reference (note that len(labels) == len(distances)):

import numpy as np

all_distance_arrays = []
for label in np.unique(labels):
    # Collect every distance whose label matches the current one.
    sorted_distances = []
    for index in range(len(labels)):
        if label == labels[index]:
            sorted_distances.append(distances[index])
    all_distance_arrays.append(sorted_distances)

Solution

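A simple list comprehension that builds one boolean mask per unique label is concise and, because each comparison and selection is vectorized in numpy, typically much faster than the nested Python loops: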

groups = [distances[labels == i] for i in np.unique(labels)]

Output (for the first 17 elements shown in the question):

>>> groups
[array([0.95734965]),
 array([0.36254613, 0.35429638, 1.34028463, 1.10808795, 1.42531201,
        1.22517273]),
 array([5.45921793, 1.34516461, 1.16280383, 0.71543217, 0.98840598]),
 array([2.32284095, 2.79098656, 2.63795657, 1.60549826, 4.48511033])]
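The list comprehension scans labels once per unique value, which is fine for a handful of labels. If the number of distinct labels grows large, one possible alternative (not part of the original answer) is to sort once and split at the group boundaries. The sketch below assumes only the first 17 values visible in the question, since both arrays are truncated there; the names order, sorted_labels, unique_labels and starts are illustrative.

```python
import numpy as np

# Only the first 17 values visible in the question are used here (assumption:
# the real arrays are longer, but their remaining values are not shown).
labels = np.array([3, 1, 0, 1, 3, 2, 3, 2, 1, 1, 3, 1, 2, 1, 3, 2, 2])
distances = np.array([2.32284095, 0.36254613, 0.95734965, 0.35429638, 2.79098656,
                      5.45921793, 2.63795657, 1.34516461, 1.34028463, 1.10808795,
                      1.60549826, 1.42531201, 1.16280383, 1.22517273, 4.48511033,
                      0.71543217, 0.98840598])

# A stable sort keeps the original order of the distances within each label.
order = np.argsort(labels, kind="stable")
sorted_labels = labels[order]

# In a sorted array, the first occurrence of each label is where its group starts.
unique_labels, starts = np.unique(sorted_labels, return_index=True)

# Split the reordered distances at every group start except the first (index 0).
groups = np.split(distances[order], starts[1:])

# Pair each label with its group of distances.
for label, group in zip(unique_labels, groups):
    print(label, group)
```

On this input the groups should match what the boolean-mask version produces, in the same label order. Whether the sort-based version is actually faster depends on the array size and the number of distinct labels, so it is worth timing both on real data.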

Answered By – richardec

Answer Checked By – Dawn Plyler (BugsFixing Volunteer)
