[SOLVED] Is there a faster way to get the index of the maximum value and keep count?

Issue

Is there a faster way to get the count of each team having the highest score?

Input

``````# Random scores per player per round (each player corresponds to an integer from 0 - 100)
scores_per_round = np.random.rand(10_000,100)

# 100,000 random teams of 8
teams = np.array([random.sample(list(range(100)), 8) for _ in range(100_000)])
``````

Desired Output

``````# Count of top scores per team key being the index of the team in the teams array and the value being the amount of wins.
{
0: 20,
1: 12,
...
}
``````

Currently I loop through the rounds, add up each teams score and then grab the maximum values index using `np.argmax` and store the count in a dictionary.

``````import random
from collections import defaultdict

win_count = defaultdict(int)

# Random scores
scores_per_round = np.random.rand(10_000,100)

# 100,000 random teams of 8
teams = np.array([random.sample(list(range(100)), 8) for _ in range(100_000)])

# Loop through and keep track of teams wins
for round in range(10_000):
win_count[np.argmax(np.sum(np.take(scores_per_round[round], teams), axis=1))] += 1
``````

Solution

The initial code is slow because it allocates pretty-big temporary arrays. Iterating over them 10_000 is expensive because the RAM or the last-level cache are relatively slow (compared to the L1 cache or registers). Using Numba can fix this problem by computing the arrays on-the-fly in a more cache-friendly way.

Here is a simple parallel implementation:

``````import numpy as np
import numba as nb

@nb.njit('int32[::1](float64[:,::1], int32[:,::1])', parallel=True, fastmath=True)
def computeTeamWins(scores_per_round, teams):
roundCount = scores_per_round.shape[0]
result = np.empty(roundCount, dtype=np.int32)
n, m = teams.shape
assert m == 8 # See the comment below

for r in nb.prange(roundCount):
iMax, sMax = -1, -1.0
for i in range(n):
s = 0.0
# Faster if the size is known as the loop can be unrolled
for j in range(8):
s += scores_per_round[r, teams[i, j]]
if s > sMax:
iMax, sMax = i, s
result[r] = iMax

return result

win_count = defaultdict(int)
for v in computeTeamWins(scores_per_round, teams):
win_count[v] += 1
``````

On my 6-core machine, it takes 0.8 second while the initial code takes 54.3 seconds. This means the Numba implementation is about 68 times faster. If you cast `teams` to an `np.uint8` array, then the computation takes only 0.57 seconds resulting in a 95 times faster execution (because of the caches). Note that this means the maximum integer value is bounded to 255 (included). Note that the final loop takes only few milliseconds.

Answered By – Jérôme Richard

Answer Checked By – Marie Seifert (BugsFixing Admin)