# [SOLVED] row-wise calculation of cosine similarity in pandas without looping

## Issue

I have a pandas dataframe df with many rows. For each row, I want to calculate the cosinus similarity between the row’s columns A (first vector) and the row’s columns B (second vector). At the end, I aim to get a vector with one cosine similarity value for each row. I have found a solution but it seems to me like it could be done much faster without this loop. May anyone give me some feedback on this code?
Thank you very much!

``````
for row in np.unique(df.index):
cos_sim[row]=scipy.spatial.distance.cosine(df[df.index==row][columnsA],
df[df.index==row][columnsB])

df['cos_sim']=cos_sim
``````

Here comes some sample data:

``````df = pd.DataFrame({'featureA1': [2, 4, 1, 4],

'featureA2': [2, 4, 1, 4],

'featureB1': [10, 2, 1, 8]},

'featureB2': [10, 2, 1, 8]},

index=['Pit', 'Mat', 'Tim', 'Sam'])

columnsA=['featureA1', 'featureA2']
columnsB=['featureB1', 'featureB2']
``````

This is my desired output (cosine similarity for Pit, Mat, Tim and Sam):

``````cos_sim=[1, 1, 1, 1]
``````

I am already receiving this output with my method, but I am sure the code could be improved from a performance perspective

## Solution

several things you can improve on ðŸ™‚

1. Take a look at the `DataFrame.apply` function. pandas already offers you looping “under the hood”.
``````df['cos_sim'] = df.apply(lambda _df: scipy.spatial.distance.cosine(_df[columnsA], _df[columnsB])
``````

or something similar should be more performant

1. Also take a look at `DataFrame.loc`
``````df[df.index==row][columnsA]
``````

and

``````df.loc[row,columnsA]
``````

should be equivalent

1. If you really have to iterate over the dataframe (should be avoided again due to performance penalties and it is more difficult to read and understand), pandas gives you a generator for the rows (and id)
``````for index, row in df.iterrows():
scipy.spatial.distance.cosine(row[columnsA], row[columnsB])
``````
1. Finally as mentioned above to get better answers on stackoverflow, always provide a concrete example where the problem is reproducible. Otherwise it is much harder to interpret the question correctly and to test a solution.