## Issue

I have a list of arrays and I want to calculate the cosine similarity for each pair of arrays in the list.

My full list comprises 20 arrays, each 3 x 25000. A small selection is below:

```
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity,cosine_distances
C = np.array([[-127, -108, -290],
[-123, -83, -333],
[-126, -69, -354],
[-146, -211, -241],
[-151, -209, -253],
[-157, -200, -254]])
D = np.array([[-129, -146, -231],
[-127, -148, -238],
[-132, -157, -231],
[ -93, -355, -112],
[ -95, -325, -137],
[ -99, -282, -163]])
E = np.array(([[-141, -133, -200],
[-132, -123, -202],
[-119, -117, -204],
[-107, -210, -228],
[-101, -194, -243],
[-105, -175, -244]]))
ArrayList = (C,D,E)
```

My first problem is that I am getting a pairwise result for each row of each array, whereas what I want is a single result that treats each array as a whole.

For example, I tried:

```
scores = cosine_similarity(C,D)
scores
array([[0.98078461, 0.98258287, 0.97458466, 0.643815 , 0.71118811,
0.7929595 ],
[0.95226207, 0.95528395, 0.9428837 , 0.55905221, 0.63291722,
0.7240552 ],
[0.9363733 , 0.93972303, 0.9255921 , 0.51752531, 0.59402196,
0.68918496],
[0.98998438, 0.98903931, 0.99377116, 0.85494921, 0.8979725 ,
0.9449272 ],
[0.99335622, 0.99255262, 0.99635952, 0.84106771, 0.88619755,
0.93616556],
[0.9955969 , 0.99463213, 0.99794805, 0.82706302, 0.8738389 ,
0.92640196]])
```

What I am expecting is a single value such as 0.989… (this is a made-up number).

The next challenge is how to iterate over the arrays in my list to get a pairwise result for each pair of arrays, something like this:

```
      C     D     E
C   1.0  0.97  0.95
D  0.97   1.0  0.96
E  0.95  0.95   1.0
```

As a beginner to Python, I am not sure how to proceed. Any help is appreciated.

## Solution

If I understand correctly, you want the cosine similarity obtained by treating each matrix as a `1 x n` vector. The easiest approach, in my opinion, is to implement the cosine similarity in vectorized form with NumPy functions. As a reminder, given two 1D vectors `x` and `y`, the cosine similarity is given by:

```
cosine_similarity = x.dot(y) / (np.linalg.norm(x, 2) * np.linalg.norm(y, 2))
```
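As a quick illustration of the formula above, here is a minimal sketch for two 1D vectors. The helper name `cosine_sim_1d` and the sample vectors are made up for this example and are not part of the original answer.

```python
import numpy as np

def cosine_sim_1d(x, y):
    """Cosine similarity of two arrays, each treated as one flat vector."""
    # ravel() flattens any shape to 1D, so a whole matrix can be passed in too
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    return x.dot(y) / (np.linalg.norm(x, 2) * np.linalg.norm(y, 2))

# Vectors pointing in the same direction give a similarity of 1
same_direction = cosine_sim_1d([1, 2, 3], [2, 4, 6])

# Perpendicular vectors give a similarity of 0
orthogonal = cosine_sim_1d([1, 0], [0, 1])

print(same_direction, orthogonal)
```

Because the inputs are flattened first, calling this helper on two matrices of the same shape returns one number for the pair rather than a row-by-row matrix.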

To do this with the three matrices, we first flatten each one into a 1D representation and stack them together:

```
matrices_1d = np.vstack((C.reshape(1, -1), D.reshape(1, -1), E.reshape(1, -1)))
```

Now that we have the vector representation of each matrix, we can compute the L2 norm of each row using `np.linalg.norm` (see the NumPy documentation for this function) as follows:

```
norm_vec = np.linalg.norm(matrices_1d, ord=2, axis=1)
```

And finally, we can compute the cosine similarities as follows:

```
cos_sim = matrices_1d.dot(matrices_1d.T) / np.outer(norm_vec, norm_vec)
# array([[1. , 0.9126993 , 0.9699609 ],
# [0.9126993 , 1. , 0.93485159],
# [0.9699609 , 0.93485159, 1. ]])
```

Note that, as a sanity check, the diagonal values are 1, since the cosine similarity of a vector with itself is 1.
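The same steps can be wrapped up for a list of any length, such as the 20 arrays mentioned in the question, and labeled the way the question's desired table is. This is only a sketch: the function name `pairwise_cosine` is hypothetical, the input arrays are random placeholders rather than the real data, and it assumes every array in the list has the same shape.

```python
import numpy as np
import pandas as pd

def pairwise_cosine(arrays, labels):
    """Return a labeled matrix of cosine similarities between whole arrays."""
    # Flatten each array to one row; all arrays must have the same number of elements
    flat = np.vstack([np.asarray(a, dtype=float).reshape(1, -1) for a in arrays])
    # L2 norm of each flattened array
    norms = np.linalg.norm(flat, ord=2, axis=1)
    # All dot products at once, divided by the matching products of norms
    sim = flat.dot(flat.T) / np.outer(norms, norms)
    return pd.DataFrame(sim, index=labels, columns=labels)

# Placeholder data with the same shape as the sample arrays in the question
rng = np.random.default_rng(0)
arrays = [rng.integers(-400, -50, size=(6, 3)) for _ in range(3)]

table = pairwise_cosine(arrays, ["C", "D", "E"])
print(table.round(3))
```

With the real data, passing `ArrayList` and a matching list of names should give the labeled table directly, with no explicit loop over pairs.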

The cosine distance is defined as `1 - cos_sim` and is easy to compute once you have the similarity.
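For instance, the distance matrix and one value per unordered pair could be obtained as in the sketch below. The similarity values are copied from the output shown earlier in this answer; the `labels` list and the `pair_distances` dictionary are names introduced only for this illustration.

```python
import numpy as np
from itertools import combinations

labels = ["C", "D", "E"]

# Similarity matrix copied from the output shown earlier in this answer
cos_sim = np.array([[1.0, 0.9126993, 0.9699609],
                    [0.9126993, 1.0, 0.93485159],
                    [0.9699609, 0.93485159, 1.0]])

# Cosine distance is one minus cosine similarity, element by element
cos_dist = 1.0 - cos_sim

# The matrix is symmetric, so keep one entry per unordered pair of arrays
pair_distances = {
    (labels[i], labels[j]): cos_dist[i, j]
    for i, j in combinations(range(len(labels)), 2)
}

for pair, dist in pair_distances.items():
    print(pair, round(dist, 4))
```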

Answered By – Tomer Geva

Answer Checked By – Dawn Plyler (BugsFixing Volunteer)