## Issue

I have a pandas DataFrame `df` with many rows. For each row, I want to calculate the cosine similarity between the row's columns A (first vector) and the row's columns B (second vector). In the end, I want a vector with one cosine similarity value per row. I have found a solution, but I suspect it could be done much faster without the loop. Could anyone give me some feedback on this code?

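```
import numpy as np
import pandas as pd
import scipy.spatial.distance

cos_sim = pd.Series(index=df.index, dtype=float)
for row in np.unique(df.index):
    # cosine() returns the cosine *distance*; .iloc[0] turns the 1-row frame into a 1-D Series
    cos_sim[row] = 1 - scipy.spatial.distance.cosine(df[df.index == row][columnsA].iloc[0],
                                                     df[df.index == row][columnsB].iloc[0])
df['cos_sim'] = cos_sim
```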

Here is some sample data:

```
import pandas as pd

df = pd.DataFrame({'featureA1': [2, 4, 1, 4],
                   'featureA2': [2, 4, 1, 4],
                   'featureB1': [10, 2, 1, 8],
                   'featureB2': [10, 2, 1, 8]},
                  index=['Pit', 'Mat', 'Tim', 'Sam'])
columnsA = ['featureA1', 'featureA2']
columnsB = ['featureB1', 'featureB2']
```

This is my desired output (cosine similarity for Pit, Mat, Tim and Sam):

```
cos_sim=[1, 1, 1, 1]
```
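
As a quick check of the desired output: the cosine similarity of two vectors a and b is their dot product divided by the product of their norms. For Pit's row, a = (2, 2) and b = (10, 10):

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{2 \cdot 10 + 2 \cdot 10}{\sqrt{2^2 + 2^2}\,\sqrt{10^2 + 10^2}}
  = \frac{40}{\sqrt{8}\,\sqrt{200}}
  = \frac{40}{40}
  = 1
```

In every sample row, B is a positive multiple of A, so all four similarities are 1. `scipy.spatial.distance.cosine` returns the cosine distance, 1 minus this value, so on its own it would give 0 for these rows.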

I already get this output with my method, but I'm sure the code could be made faster.

## Solution

Several things you can improve on 🙂

- Take a look at the `DataFrame.apply` function. With `axis=1` it calls a function once per row, so pandas does the looping for you "under the hood".

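```
import scipy.spatial.distance

# axis=1 passes each row as a Series; cosine() returns a distance, so subtract it from 1
df['cos_sim'] = df.apply(lambda row: 1 - scipy.spatial.distance.cosine(row[columnsA], row[columnsB]), axis=1)
```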

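or something similar. This should be faster than the original loop, mainly because it no longer builds a boolean mask over the whole index for every row. `apply` itself is still a Python-level loop, though.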

- Also take a look at `DataFrame.loc`:

```
df[df.index==row][columnsA]
```

and

```
df.loc[row,columnsA]
```

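select the same values. However, `.loc` does a single label lookup instead of comparing every index entry. As long as the index labels are unique, it also returns a 1-D Series rather than a 1-row DataFrame, and a 1-D vector is what `scipy.spatial.distance.cosine` expects (recent SciPy versions raise an error for 2-D input).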
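
A small check of that difference, using the sample `df` and `columnsA` from the question (rebuilt here so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame({'featureA1': [2, 4, 1, 4],
                   'featureA2': [2, 4, 1, 4],
                   'featureB1': [10, 2, 1, 8],
                   'featureB2': [10, 2, 1, 8]},
                  index=['Pit', 'Mat', 'Tim', 'Sam'])
columnsA = ['featureA1', 'featureA2']

masked = df[df.index == 'Pit'][columnsA]  # boolean mask -> 1-row DataFrame
located = df.loc['Pit', columnsA]         # label lookup -> Series (index is unique)

print(type(masked).__name__, masked.shape)    # DataFrame (1, 2)
print(type(located).__name__, located.shape)  # Series (2,)
```

Both hold the same numbers. Only the shape differs, and that is why the masked version needs `.iloc[0]` (or `.squeeze()`) before being passed to scipy.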

- If you really have to iterate over the DataFrame (which, again, should be avoided: it is slow and harder to read and understand), `DataFrame.iterrows` gives you a generator of `(index, row)` pairs:

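```
import pandas as pd
import scipy.spatial.distance
cos_sim = {}
for index, row in df.iterrows():
    cos_sim[index] = 1 - scipy.spatial.distance.cosine(row[columnsA], row[columnsB])
df['cos_sim'] = pd.Series(cos_sim)
```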

- Finally, to get better answers on Stack Overflow, always provide a concrete example in which the problem is reproducible. Otherwise, it is much harder to interpret the question correctly and to test a solution.
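
If the goal is to drop the Python-level loop entirely, as the question asks, you can compute the row-wise dot products and norms for all rows at once with NumPy. The sketch below is not one of the suggestions above. It assumes that `columnsA` and `columnsB` have the same length and pair up position by position, and that no row vector is all zeros (an all-zero row would make the division yield `nan`).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'featureA1': [2, 4, 1, 4],
                   'featureA2': [2, 4, 1, 4],
                   'featureB1': [10, 2, 1, 8],
                   'featureB2': [10, 2, 1, 8]},
                  index=['Pit', 'Mat', 'Tim', 'Sam'])
columnsA = ['featureA1', 'featureA2']
columnsB = ['featureB1', 'featureB2']

# One (n_rows, n_features) array per vector group; columns are paired by position
a = df[columnsA].to_numpy(dtype=float)
b = df[columnsB].to_numpy(dtype=float)

# Row-wise dot product divided by the product of the row-wise Euclidean norms
dot = (a * b).sum(axis=1)
norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
df['cos_sim'] = dot / norms
print(df['cos_sim'])
```

For the sample data every value comes out as 1, up to floating-point rounding. The result is positional, in the DataFrame's own row order, so it can be assigned straight back as a column without the label bookkeeping the loop needed.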

