[SOLVED] Group numpy array elements without for-loop

Issue

After doing some text processing, I’ve got a list of tokens and a list of sentence indices, one for each token. Now I’d like to reassemble the tokens into sentences. I’ve used NumPy, but I feel like there’s a better, faster, more NumPy-idiomatic way to do this without a for loop. There could be many more than two sentences in the future.

``````
import numpy as np

all_tokens = np.array(['I', 'spent', 'a', 'lot', 'of', 'time', ',', 'money', ',', 'and', 'effort', 'childproofing', 'my', 'house', '.', 'However', ',', 'the', 'kids', 'still', 'get', 'in', '.'])
sent_ids = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1])

new_sents = []
for unique_sent_id in np.unique(sent_ids):
    # Select the tokens belonging to this sentence with a boolean mask
    sent_tokens = all_tokens[sent_ids == unique_sent_id].tolist()
    new_sents.append(' '.join(sent_tokens))
``````

Result: ["I spent a lot of time , money , and effort childproofing my house .", "However , the kids still get in ."]

Solution

Assuming `sent_ids` is sorted, you can find the positions where the sentence id changes and then split the tokens at those positions:

``````
list(map(" ".join, np.split(all_tokens, np.flatnonzero(np.diff(sent_ids) != 0)+1)))
# ['I spent a lot of time , money , and effort childproofing my house .', 'However , the kids still get in .']
``````
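The one-liner relies on `sent_ids` being sorted. If that is not guaranteed, one possible extension is to sort first and then apply the same split. The sketch below spells out the intermediate steps; `group_tokens` is a hypothetical helper name used only for illustration, and the small sample arrays are made up for the demo. It assumes a stable sort is acceptable, so that tokens keep their original order within each sentence.

```python
import numpy as np

def group_tokens(tokens, ids):
    """Join tokens that share an id into one string per id, in ascending id order.

    Hypothetical helper (not part of NumPy); works for sorted or unsorted ids.
    """
    tokens = np.asarray(tokens)
    ids = np.asarray(ids)
    # A stable sort brings equal ids together while keeping the
    # original token order inside each group.
    order = np.argsort(ids, kind="stable")
    sorted_ids = ids[order]
    # Positions where the id differs from its predecessor: each one
    # is the start index of a new group.
    boundaries = np.flatnonzero(np.diff(sorted_ids)) + 1
    # Split the reordered tokens at those positions and join each piece.
    return [" ".join(group) for group in np.split(tokens[order], boundaries)]

# Made-up sample with interleaved sentence ids
tokens = np.array(['the', 'I', 'kids', 'tried', 'won', '.', '.'])
ids = np.array([1, 0, 1, 0, 1, 0, 1])
print(group_tokens(tokens, ids))
# ['I tried .', 'the kids won .']
```

When the ids are already sorted, the `argsort` step leaves the order unchanged, so the result matches the one-liner above.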