# [SOLVED] Efficient selection of values in numpy

## Issue

I’m trying to find elements of one DataFrame (`df_other`) which match a column in another DataFrame (`df`). In other words, I’d like to know where the values in `df['a']` match the values in `df_other['a']` for each row in `df['a']`.

An example may make the expected result clearer:

``````>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> df = pd.DataFrame({'a': ['x', 'y', 'z']})
>>> df
   a
0  x
1  y
2  z
>>> df_other = pd.DataFrame({'a': ['x', 'x', 'y', 'z', 'z2'], 'c': [1, 2, 3, 4, 5]})
>>> df_other
    a  c
0   x  1
1   x  2
2   y  3
3   z  4
4  z2  5
>>>
>>>
>>> u = df_other['c'].unique()
>>> u
array([1, 2, 3, 4, 5])
>>> bm = np.ones((len(df), len(u)), dtype=bool)
>>> bm
array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])
``````

This should yield a bitmap of

``````[
[1, 1, 0, 0, 0], # df['a'][0] == 'x' matches df_other rows 0 and 1 (c = 1, 2)
[0, 0, 1, 0, 0], # df['a'][1] == 'y' matches df_other row 2 (c = 3)
[0, 0, 0, 1, 0], # df['a'][2] == 'z' matches df_other row 3 (c = 4)
]
``````

I’m looking for a fast numpy implementation that avoids iterating over every row of `df`, which is what my current solution does:

``````>>> df_other['a'] == df.loc[0, 'a']
0     True
1     True
2    False
3    False
4    False
Name: a, dtype: bool
>>>
>>>
>>> df_other['a'] == df.loc[1, 'a']
0    False
1    False
2     True
3    False
4    False
Name: a, dtype: bool
>>> df_other['a'] == df.loc[2, 'a']
0    False
1    False
2    False
3     True
4    False
Name: a, dtype: bool
``````

Note: in the actual production code there are many more column conditions (`(df['a'] == df_other['a']) & (df['b'] == df_other['b']) & ...`), but the number of conditions is generally smaller than the number of rows in `df`, so I wouldn’t mind a solution that loops over the conditions (and sets the corresponding values in `bm` to `False`).

Also, the bitmap should have the shape `(len(df), len(df_other['c'].unique()))`.

## Solution

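numpy broadcasting handles this without an explicit loop. Comparing the `(5,)` array taken from `df_other` with the `(3, 1)` array from `df` gives a `(3, 5)` boolean array, one row per row of `df` and one column per row of `df_other`: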

``````bm = df_other.values[:, 0] == df.values
``````

Output:

``````>>> bm
array([[ True,  True, False, False, False],
       [False, False,  True, False, False],
       [False, False, False,  True, False]])
``````
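To see why the comparison broadcasts the way it does, it may help to spell out the shapes. The sketch below rebuilds the frames from the question and selects column `'a'` by name with `to_numpy()` plus an explicit new axis. That spelling is my own substitution for `.values[:, 0]`, chosen for readability; the names `left` and `right` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Example frames copied from the question.
df = pd.DataFrame({'a': ['x', 'y', 'z']})
df_other = pd.DataFrame({'a': ['x', 'x', 'y', 'z', 'z2'], 'c': [1, 2, 3, 4, 5]})

# Column vector: one row per row of df.
left = df['a'].to_numpy()[:, None]   # shape (3, 1)
# Flat vector: one entry per row of df_other.
right = df_other['a'].to_numpy()     # shape (5,)

# Broadcasting stretches (3, 1) against (5,) to compare every pair.
bm = left == right

print(bm.shape)  # (3, 5)
```

Selecting the column by name also keeps the comparison independent of the column order in `df_other`, which `.values[:, 0]` relies on.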

If you need it as ints:

``````>>> bm.astype(int)
array([[1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0]])
``````
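For the multi-column case mentioned in the question's note, one possible approach is to loop over the conditions and AND each broadcast comparison into the bitmap. This is only a sketch under assumptions: column `'b'` and its values are invented here for illustration, and `cols` is a hypothetical list of the columns to match on.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'b' is made up to show a second condition.
df = pd.DataFrame({'a': ['x', 'y', 'z'], 'b': [1, 2, 3]})
df_other = pd.DataFrame({
    'a': ['x', 'x', 'y', 'z', 'z2'],
    'b': [1, 9, 2, 3, 3],
    'c': [1, 2, 3, 4, 5],
})

cols = ['a', 'b']  # columns that must all match (assumed names)

# Start with everything True, then clear entries that fail any condition.
bm = np.ones((len(df), len(df_other)), dtype=bool)
for col in cols:
    # (len(df), 1) compared with (len(df_other),) broadcasts to the full grid.
    bm &= df[col].to_numpy()[:, None] == df_other[col].to_numpy()

print(bm.astype(int))
```

The loop runs once per condition rather than once per row, so its cost in Python-level iterations stays small as long as there are few conditions. Keep in mind that each comparison allocates a `len(df) * len(df_other)` boolean array, which can become large for big frames.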