Cosine Similarity
What is Cosine Similarity?
Cosine Similarity measures how similar two vectors are by taking the cosine of the angle
between them. It depends on the direction of the vectors, not on their length.
1. If two vectors point in the same direction, cosine similarity = 1
2. If two vectors are orthogonal (90° apart), for example documents that share no terms, cosine similarity = 0
3. If they are opposite, cosine similarity = –1
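Formula
\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}
Here A \cdot B is the dot product of the two vectors, and \|A\| and \|B\| are their magnitudes (Euclidean norms).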
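Cosine similarity is conventionally computed as the dot product of the two vectors divided by the product of their magnitudes. A minimal sketch of that computation in plain Python follows. The function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions made for illustration, not part of the original notes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat a zero vector as an error, since the angle is undefined.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

same = cosine_similarity([1, 2], [2, 4])        # same direction
orthogonal = cosine_similarity([1, 0], [0, 1])  # 90 degrees apart
opposite = cosine_similarity([1, 2], [-1, -2])  # opposite direction
print(same, orthogonal, opposite)
```

Up to floating-point rounding, the three values correspond to the three cases listed above: close to 1, 0, and -1.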
Why Use Cosine Similarity?
Cosine similarity ignores vector magnitude and compares only direction. That makes it a good fit
for text documents, where the length (number of words) varies but we care about how similar the content is.
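To illustrate the length point, the sketch below compares a short document with a copy of itself repeated three times. The word counts differ, but the direction of the count vector does not, so the similarity stays at 1. The helper names (`to_counts`, `cosine_similarity`) and the whitespace tokenization are assumptions chosen for this illustration.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

short_doc = "I love machine learning"
long_doc = " ".join([short_doc] * 3)  # same content, three times as many words

vocab = sorted(set(short_doc.split()))

def to_counts(text):
    # Assumption: naive whitespace tokenization, case-sensitive.
    counts = Counter(text.split())
    return [counts[term] for term in vocab]

short_vec = to_counts(short_doc)  # every term appears once
long_vec = to_counts(long_doc)    # every term appears three times
similarity = cosine_similarity(short_vec, long_vec)
print(short_vec, long_vec, similarity)
```

A distance measure that is sensitive to magnitude, such as Euclidean distance, would report these two documents as different even though their content is identical.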
Example
Input documents
Let’s say we have two short documents:
Doc1: "I love machine learning"
Doc2: "I love deep learning"
Vocabulary
Vocabulary = [I, love, machine, deep, learning]
Index: [0, 1, 2, 3, 4]
Convert to vectors (using Bag of Words)
Term       Doc1   Doc2
I          1      1
love       1      1
machine    1      0
deep       0      1
learning   1      1
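The table above can be reproduced with a few lines of Python. This is a sketch only: the `bag_of_words` helper is a hypothetical name, the vocabulary order is hard-coded to match the index shown above, and tokenization is a plain whitespace split.

```python
from collections import Counter

# Vocabulary in the same order as the index above.
vocabulary = ["I", "love", "machine", "deep", "learning"]

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(text.split())  # assumption: whitespace tokenization
    return [counts[term] for term in vocabulary]

doc1_vector = bag_of_words("I love machine learning", vocabulary)
doc2_vector = bag_of_words("I love deep learning", vocabulary)

print(doc1_vector)  # [1, 1, 1, 0, 1]
print(doc2_vector)  # [1, 1, 0, 1, 1]
```

Real text usually needs lowercasing and punctuation handling before counting. Those steps are left out here to keep the mapping to the table obvious.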
Apply the formula
Doc1 vector = [1, 1, 1, 0, 1]
Doc2 vector = [1, 1, 0, 1, 1]
Dot product:
1×1 + 1×1 + 1×0 + 0×1 + 1×1 = 3
Magnitude of Doc1:
\sqrt{1^2 + 1^2 + 1^2 + 0^2 + 1^2} = \sqrt{4} = 2
Magnitude of Doc2:
\sqrt{1^2 + 1^2 + 0^2 + 1^2 + 1^2} = \sqrt{4} = 2
Cosine Similarity:
\frac{3}{2 \times 2} = \frac{3}{4} = 0.75
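Result: 0.75. The documents are fairly similar.
Because bag-of-words vectors contain only non-negative counts, the score for text documents ranges from 0 to 1.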
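The worked example can be checked step by step in Python. The variable names below are chosen for illustration, and the two vectors are the Doc1 and Doc2 vectors from the example.

```python
import math

doc1 = [1, 1, 1, 0, 1]
doc2 = [1, 1, 0, 1, 1]

# Dot product: multiply matching positions and add the results.
dot_product = sum(x * y for x, y in zip(doc1, doc2))

# Magnitude: square root of the sum of squared entries.
magnitude_doc1 = math.sqrt(sum(x * x for x in doc1))
magnitude_doc2 = math.sqrt(sum(y * y for y in doc2))

similarity = dot_product / (magnitude_doc1 * magnitude_doc2)

print(dot_product)                     # 3
print(magnitude_doc1, magnitude_doc2)  # 2.0 2.0
print(similarity)                      # 0.75
```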