## Issue

I have an array with several fields, which I want to be sorted with respect to 2 of them. One of these fields is binary, e.g.:

```
size = 100000
data = np.empty(
shape=2 * size,
dtype=[('class', int),
('value', int),]
)
data['class'][:size] = 0
data['value'][:size] = (np.random.normal(size=size) * 10).astype(int)
data['class'][size:] = 1
data['value'][size:] = (np.random.normal(size=size, loc=0.5) * 10).astype(int)
np.random.shuffle(data)
```

I need the result to be sorted with respect to `value`

, and for same values `class=0`

should go first. Doing it like so **(a)**:

```
idx = np.argsort(data, order=['value', 'class'])
data_sorted = data[idx]
```

seems to be an order of magnitude slower compared to sorting just `data['value']`

. Is there a way to improve the speed, given that there are only two classes?

By experimenting randomly I noticed that an approach like this **(b)**:

```
idx = np.argsort(data['value'])
data_sorted = data[idx]
idx = np.argsort(data_sorted, order=['value', 'class'], kind='mergesort')
data_sorted = data_sorted[idx]
```

takes ~20% less time than **(a)**. Changing field datatypes seem to also have some effect – floats instead of ints seem to be slightly faster.

## Solution

The simplest way to do this is using the `order`

parameter of `sort`

```
sort(data, order=['value', 'class'])
```

However, this takes 121 ms to run on my computer, while `data['class']`

and `data['value']`

take only 2.44 and 5.06 ms respectively. Interestingly, `sort(data, order='class')`

takes 135 ms again, suggesting the problem is with sorting structured arrays.

So, the approach you’ve taken of sorting each field using `argsort`

then indexing the final array seems to be on the right track. However, you need to sort each field individually,

```
idx=argsort(data['class'])
data_sorted = data[idx][argsort(data['value'][idx], kind='stable')]
```

This runs in 43.9 ms.

You can get a very slight speedup by removing one temporary array from indexing

```
idx = argsort(data['class'])
tmp = data[idx]
data_sorted = tmp[argsort(tmp['value'], kind='stable')]
```

Which runs in 40.8 ms. Not great, but it is a workaround if performance is critical.

This seems to be a known problem:

sorting numpy structured and record arrays is very slow

**Edit**

The sourcecode for the comparisons used in sort can be seen at https://github.com/numpy/numpy/blob/dea85807c258ded3f75528cce2a444468de93bc1/numpy/core/src/multiarray/arraytypes.c.src .

The numeric types are much, much simpler. Still, that large of a difference in performance is surprising.

Answered By – user2699

Answer Checked By – Mary Flores (BugsFixing Volunteer)