
Chapter 8.

Collaborative Filtering
COMP3278 Introduction to Database Management Systems
Department of Computer Science, The University of Hong Kong
Slides prepared by - Dr. Chui Chun Kit, for students in COMP3278
For other uses, please email : [email protected]
Real Life Example

If I browse a Spider-Man Blu-ray disc on Amazon…
Real Life Example

Amazon generates recommendations for me, which seem to be more than just a random selection.

We can gather visitors' ratings on items (1-5 stars), and use the ratings of other visitors to predict your interests ☺!
Making Prediction

We are going to predict Bob's rating(s). We call him the Active user.

         Item 1   Item 2   Item 3   Item 4
Bob        4        ?        5        5
Alice      4        2        1        -
Peter      3        -        2        4
Kit        4        4        -        -
Jolly      2        1        3        5

Considering the above ratings given by the site users, what is the predicted rating of Bob for item 2?
Collaborative Filtering (CF)

Common insight: personal tastes are correlated. If Alice and Bob both like X and Alice likes Y, then Bob is more likely to like Y.

There are many established approaches to do the prediction:
Memory-based CF
  - User-based
  - Item-based
Model-based CF
  - Clustering
  - Bayesian belief networks
  - Latent semantic analysis

Let us focus on the memory-based CF techniques that use a "user-based" approach as an easy start.
Section 1

Memory-based Approach
Towards a more intelligent web system

Department of Computer Science, The University of Hong Kong


Slides prepared by - Dr. Chui Chun Kit, for students in COMP3278
For other uses, please email : [email protected]
Memory-based CF

Two core steps
1. Finding users who are similar to the active user.
2. Producing a prediction for the active user based on the ratings of the "similar" users.

Two questions
1. Measuring Similarities - How to represent the similarity between users?
2. Calculating Prediction - How to calculate the predicted value?
1. Measuring similarity

There are various methods to represent the similarity of two users in memory-based CF:
- Vector cosine-based similarity
- Correlation-based similarity
Vector-cosine similarity

[Scatter plot: each user plotted with Rating of Transformer III on the x-axis and Rating of Iron Man II on the y-axis; Bob and Kit coincide.]

         Transformer III   Iron Man II
Bob            4               5
Alice          4               1
Peter          3               2
Jolly          2               3
Kit            4               5
Ben            2               2.5

Think about it…
We would like to have a similarity value that reflects things like:
- As Bob and Kit have exactly the same ratings, they should have very high similarity.
- Scaling: As the scale of the ratings that Bob and Ben gave is the same, i.e., 4:5 (0.8) and 2:2.5 (0.8), they should have high similarity.
How will you model the users (represent them mathematically)?
Vector-cosine similarity

(Same ratings and scatter plot as above.)

User modeling: Each user is modeled as a "vector" of ratings.
Similarity measure: the inner angle formed by the two users' vectors.
Rationale - If the inner angle between the vectors of two users is smaller, then they are more similar.
Vector-cosine similarity

For easy interpretation, we scale the similarity value to range from 1 (similar) to 0 (not similar). How can we do it?

[Diagram: the vector u of user u, with the rating of user u on item x and the rating of user u on item y as the two axes, and Θ the angle between two user vectors.]

Answer: We take the cosine of the angle Θ. Then a value of 1 means the most similar, and 0 means not similar.

Θ is the angle that represents the similarity of two users u and v. A larger Θ means that the two users u and v are less similar, and vice versa. Can you derive the formula for calculating cos Θ?
Vector-cosine similarity

Target: We would like to express cos θ in terms of r_u,x, r_v,x, r_u,y and r_v,y so that we can substitute the raw data into the formula to calculate the similarity.
Angle sum and difference

Step 1. Consider a unit circle so that the lengths of vectors u and v are both 1.
Step 2. Draw a line between the points (r_u,x, r_u,y) and (r_v,x, r_v,y); call the distance between the two points d.
Step 3. What is the value of d?

Distance formula (https://en.wikipedia.org/wiki/Distance)
d = sqrt[ (r_u,x - r_v,x)^2 + (r_u,y - r_v,y)^2 ]
Angle sum and difference

On the unit circle, (r_u,x, r_u,y) = (cos θ_u, sin θ_u) and (r_v,x, r_v,y) = (cos θ_v, sin θ_v), so

d = sqrt[ (r_u,x - r_v,x)^2 + (r_u,y - r_v,y)^2 ]
  = sqrt[ (cos θ_u - cos θ_v)^2 + (sin θ_u - sin θ_v)^2 ]
Angle sum and difference

d = sqrt[ (r_u,x - r_v,x)^2 + (r_u,y - r_v,y)^2 ]
  = sqrt[ (cos θ_u - cos θ_v)^2 + (sin θ_u - sin θ_v)^2 ]
  = sqrt[ cos^2 θ_u - 2 cos θ_u cos θ_v + cos^2 θ_v + sin^2 θ_u - 2 sin θ_u sin θ_v + sin^2 θ_v ]
  = sqrt[ -2 cos θ_u cos θ_v - 2 sin θ_u sin θ_v + 1 + 1 ]

using the Pythagorean identity sin^2 θ + cos^2 θ = 1
(https://en.wikipedia.org/wiki/List_of_trigonometric_identities)
Angle sum and difference

d = sqrt[ -2 cos θ_u cos θ_v - 2 sin θ_u sin θ_v + 2 ]   …(i)

Law of cosines (https://en.wikipedia.org/wiki/Law_of_cosines)
d^2 = 1^2 + 1^2 - 2(1)(1) cos θ
    = 2 - 2 cos θ   …(ii)

Combining (i) and (ii):
2 - 2 cos θ = -2 cos θ_u cos θ_v - 2 sin θ_u sin θ_v + 2
cos θ = cos θ_u cos θ_v + sin θ_u sin θ_v
Vector-cosine similarity

In general the vectors are not of unit length, so we substitute cos θ_u = r_u,x / |u|, sin θ_u = r_u,y / |u| (and similarly for v), which gives

cos θ = (r_u,x r_v,x + r_u,y r_v,y) / ( |u| × |v| )
Vector-cosine similarity

|u| is the magnitude of u, i.e., the length of u. To obtain it, we apply the Pythagorean theorem:
|u| = sqrt( r_u,x^2 + r_u,y^2 )
Vector-cosine similarity

|v| is the magnitude of v, i.e., the length of v. To obtain it, we apply the Pythagorean theorem:
|v| = sqrt( r_v,x^2 + r_v,y^2 )
Vector-cosine similarity

sim(u, v) = cos θ = (r_u,x r_v,x + r_u,y r_v,y) / [ sqrt(r_u,x^2 + r_u,y^2) × sqrt(r_v,x^2 + r_v,y^2) ]

The numerator is the dot-product of the two vectors u and v.

Note: this is just the proof for the case of 2 dimensions (two items)! For n items the same formula applies component-wise: sim(u, v) = (Σ_i r_u,i r_v,i) / [ sqrt(Σ_i r_u,i^2) × sqrt(Σ_i r_v,i^2) ].
Example

         Transformer III   Iron Man II
Bob            4               5
Alice          4               0
Peter          3               2
Jolly          0               3
Kit            4               5
Ben            2               2.5

Similarity between Bob and Kit
= (4 × 4 + 5 × 5) / [ sqrt(4^2 + 5^2) × sqrt(4^2 + 5^2) ]
= (16 + 25) / [ sqrt(41) × sqrt(41) ]
= 1

Since the similarity between Bob and Kit calculated in this way is 1, they are "the same" in this similarity model.
Vector-cosine similarity

Similarity between Bob and Ben
= (4 × 2 + 5 × 2.5) / [ sqrt(4^2 + 5^2) × sqrt(2^2 + 2.5^2) ]
= (8 + 12.5) / [ sqrt(41) × sqrt(10.25) ]
= 20.5 / 20.5
= 1

Bob and Ben are "the same" in this similarity model. This shows that the model handles different scales of ratings by different users.
Vector-cosine similarity

Similarity between Jolly and Alice
= (0 × 4 + 3 × 0) / [ sqrt(0^2 + 3^2) × sqrt(4^2 + 0^2) ]
= 0

Since the similarity between Jolly and Alice calculated in this way is 0, they are not similar.
Example

Similarity between Ben and Alice
= (2 × 4 + 2.5 × 0) / [ sqrt(2^2 + 2.5^2) × sqrt(4^2 + 0^2) ]
= 8 / [ sqrt(10.25) × sqrt(16) ]
= 0.62
Example

Similarity between Peter and Alice
= (3 × 4 + 2 × 0) / [ sqrt(3^2 + 2^2) × sqrt(4^2 + 0^2) ]
= 12 / [ sqrt(13) × sqrt(16) ]
= 0.83

Comparing Ben and Peter with Alice, Peter (0.83) is more similar to Alice than Ben (0.62).

Summary: sim(Bob, Kit) = 1, sim(Bob, Ben) = 1, sim(Jolly, Alice) = 0, sim(Ben, Alice) = 0.62, sim(Peter, Alice) = 0.83.
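To make the arithmetic above concrete, here is a minimal Python sketch of the vector-cosine similarity; the function name cosine_sim and the small ratings dictionary are illustrative, not part of the slides, and unrated items are filled with 0 as in the example.

```python
import math

def cosine_sim(u, v):
    """Vector-cosine similarity between two rating vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Ratings on (Transformer III, Iron Man II); 0 stands for "no rating" as in the example.
ratings = {"Bob": [4, 5], "Alice": [4, 0], "Peter": [3, 2],
           "Jolly": [0, 3], "Kit": [4, 5], "Ben": [2, 2.5]}

print(cosine_sim(ratings["Bob"], ratings["Kit"]))      # 1.0
print(cosine_sim(ratings["Bob"], ratings["Ben"]))      # 1.0  (same 4:5 scale)
print(cosine_sim(ratings["Peter"], ratings["Alice"]))  # ≈ 0.83
```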
Correlation-based similarity

Some users may have their ratings shifted by a certain amount.
E.g., Bob and Jolly's ratings are in fact very similar, except that Jolly's rating is always Bob's rating minus 2 (shifting).

Cosine similarity does not reflect negative correlation.
E.g., Bob and Peter's ratings are negatively correlated, in the sense that when Bob's rating is low, Peter's rating is high, and vice versa.

         Transformer III   Iron Man II
Bob            4               5
Alice          4               1
Peter          3               2
Jolly          2               3

Think about it… How can we tackle the shifting problem?
Correlation-based similarity

          x    y    Average(x,y)   x - Average   y - Average
Bob       4    5        4.5           -0.5           0.5
Alice     4    1        2.5            1.5          -1.5
Peter     3    2        2.5            0.5          -0.5
Jolly     2    3        2.5           -0.5           0.5

Idea: Instead of working on the absolute ratings, let's look at the variation w.r.t. the average rating of a user.
Bob's ratings on x and y are -0.5 and +0.5 from his average; Jolly's ratings on x and y are -0.5 and +0.5 from her average.
Bob and Jolly are exactly the same in this sense.
Correlation-based similarity

Each user u is now modeled as a vector of ratings relative to his/her own average rating: the component for item x is r_u,x - r̄_u, and the component for item y is r_u,y - r̄_u.

We use the cosine of the angle between these mean-centered vectors to model the correlation-based similarity between two users.

Before we derive the formula for calculating the similarity, let us understand the meaning of the similarity value. What does it represent?
Correlation-based similarity

          x    y    z    Average(x,y,z)   x - Avg   y - Avg   z - Avg
Bob       4    5    6          5            -1         0         1
Alice     4    1    4          3             1        -2         1
Peter     3    2    1          2             1         0        -1
Jolly     2    3    4          3            -1         0         1

Now let's look at data with 3 films (3 dimensions) for easier explanation.
Take Bob and Jolly as an example. What will be the correlation?
Since their two mean-centered vectors overlap, the angle is 0 and cos(0) is 1.
What does a correlation value of 1 imply?
Correlation-based similarity

It means Bob and Jolly's ratings (minus their corresponding averages) follow a perfect positive correlation!
If Bob's rating (relative to his average rating) is higher, Jolly's rating (relative to her average rating) is also higher.
Correlation-based similarity

Now let's look at Bob and Peter's similarity. What will be the correlation?
Since their two mean-centered vectors point in exactly opposite directions, the angle is 180° and cos(180°) is -1.
What does a correlation value of -1 imply?
Correlation-based similarity

It means Bob and Peter's ratings (minus their corresponding averages) form a perfect negative correlation!
If Bob's rating (relative to his average rating) is higher, Peter's rating (relative to his average rating) is lower, and vice versa.
Correlation-based similarity

Now take Bob and Alice's similarity as an example. The inner angle is 90 degrees (we will show how to calculate it shortly), and cos(90°) = 0.
This reflects that Bob and Alice's ratings do not show any linear correlation.
Correlation-based similarity

Note that the correlation reflects the extent to which two variables are linearly related with each other.

[Figure: scatter plots illustrating perfect positive correlation (like Bob and Jolly), perfect negative correlation (like Bob and Peter), and no linear correlation (like Bob and Alice).]

The correlation value is not the slope of the relationship: lines with different slopes are all perfect correlations, except the horizontal case, whose correlation is undefined because there is no variation w.r.t. the mean value in the y-dimension.
Correlation-based similarity

Now let's derive the formula of the similarity measure. We reuse the proof for the vector-cosine similarity, but the vectors change from u and v to the mean-centered vectors (with components r_u,i - r̄_u and r_v,i - r̄_v):

sim(u, v) = Σ_{i∈I} (r_u,i - r̄_u)(r_v,i - r̄_v) / [ sqrt(Σ_{i∈I} (r_u,i - r̄_u)^2) × sqrt(Σ_{i∈I} (r_v,i - r̄_v)^2) ]

Note that I is the set of co-rated items of users u and v (so that we won't have null values fed into this formula). The averages r̄_u and r̄_v are taken over the co-rated items.
Correlation-based similarity

This correlation is actually called the Pearson coefficient, a well-known measure of correlation between two random variables (u and v).

In collaborative filtering, we can use the Pearson coefficient as the weight in the prediction model.
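A minimal Python sketch of the Pearson (correlation-based) similarity over co-rated items, assuming ratings are stored as item-to-rating dictionaries; the function name pearson_sim and the data layout are illustrative.

```python
import math

def pearson_sim(ratings_u, ratings_v):
    """Pearson correlation between two users over their co-rated items.

    ratings_u, ratings_v: dict mapping item -> rating.
    Returns 0 if the correlation is undefined (fewer than 2 co-rated items
    or no variation around the mean).
    """
    common = sorted(set(ratings_u) & set(ratings_v))     # co-rated items I
    if len(common) < 2:
        return 0.0
    avg_u = sum(ratings_u[i] for i in common) / len(common)
    avg_v = sum(ratings_v[i] for i in common) / len(common)
    du = [ratings_u[i] - avg_u for i in common]
    dv = [ratings_v[i] - avg_v for i in common]
    denom = math.sqrt(sum(x * x for x in du)) * math.sqrt(sum(x * x for x in dv))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(du, dv)) / denom

bob   = {"Transformer III": 4, "Iron Man II": 5, "Spiderman": 5}
jolly = {"Transformer III": 2, "Ant-Man": 1, "Iron Man II": 3, "Spiderman": 5}
alice = {"Transformer III": 4, "Ant-Man": 2, "Iron Man II": 1}

print(pearson_sim(bob, jolly))   # ≈ 0.756 (as in the worked example later)
print(pearson_sim(bob, alice))   # -1.0
```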
2. Computing prediction

Target: To calculate a prediction for user a's rating on item i (an item which user a has not rated).

Prediction = (average rating that user a gave to other items) + (weighted average of all other users' ratings on item i, relative to the corresponding user's average rating).
2. Computing prediction

P_a,i = r̄_a + [ Σ_{u∈U} (?) × (r_u,i - r̄_u) ] / (?)

where U is the set of other users who rated item i, r̄_a is the average rating that user a gave, and (r_u,i - r̄_u) is the relative rating of each other user u on item i. What should the weighting (?) and the normalizing denominator (?) be?
2. Computing prediction

P_a,i = r̄_a + [ Σ_{u∈U} w_a,u (r_u,i - r̄_u) ] / Σ_{u∈U} |w_a,u|

The weighting w_a,u is the similarity of that user u with user a, and the denominator is the sum of weightings over all other users who rated item i.

Why multiply by the weighting for each user in the numerator?
Insight: If the weighting with user u is 1 (perfect positive correlation), and user u rates item i as +3 from his average, then we predict user a will also rate +3 from a's average.
Insight: If the weighting with user u is -1 (perfect negative correlation), and user u rates item i as +3 from his average, then we predict user a will rate -3 from a's average.
2. Computing prediction

P_a,i = r̄_a + [ Σ_{u∈U} w_a,u (r_u,i - r̄_u) ] / Σ_{u∈U} |w_a,u|

The relative rating (r_u,i - r̄_u) is user u's rating on item i minus that user's average rating over the items he/she rated (this handles shifting).
2. Computing prediction

Let the similarity between two users a and u be w_a,u. Note that w_a,u can be the cosine similarity or the Pearson correlation coefficient.
2. Computing prediction

Since the weighting can be a negative value (e.g., -1 for perfect negative correlation), we take the absolute value of each weight when computing the sum of weights in the denominator.
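Putting the formula together, below is a minimal Python sketch of the user-based prediction step. It assumes each other user's average is taken over his/her items other than the target item (as in the Revision example later in this section; the Bob/Ant-Man example instead centers each user on the items co-rated with the active user). The function name predict_rating is illustrative, and it can be paired with a similarity function such as the pearson_sim sketch above.

```python
def predict_rating(active, others, item, sim):
    """Predict the active user's rating on `item` with user-based memory CF.

    active: dict item -> rating for the active user (item not yet rated).
    others: dict user -> (dict item -> rating) for the other users.
    sim:    similarity function taking two rating dicts, e.g. pearson_sim.
    """
    avg_active = sum(active.values()) / len(active)
    num = den = 0.0
    for ratings_u in others.values():
        if item not in ratings_u or len(ratings_u) < 2:
            continue                              # only users who rated the item contribute
        w = sim(active, ratings_u)                # weight w_a,u
        # user u's average over his/her other rated items (handles shifting)
        avg_u = sum(r for i, r in ratings_u.items() if i != item) / (len(ratings_u) - 1)
        num += w * (ratings_u[item] - avg_u)
        den += abs(w)
    return avg_active if den == 0 else avg_active + num / den
```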
Example

              1. Transformer III   2. Ant-Man   3. Iron Man II   4. Spiderman
1. Bob                4                 ?              5               5
2. Alice              4                 2              1               -
3. Peter              3                 -              2               4
4. Kit                4                 4              -               -
5. Jolly              2                 1              3               5

What is the predicted rating of Bob on "Ant-Man"?
Users who have rated Ant-Man are "Alice", "Kit" and "Jolly" (i.e., U = {2, 4, 5}).
What are the weights between Bob & Alice (w1,2), Bob & Kit (w1,4), and Bob & Jolly (w1,5) respectively?
(Assume that we use Pearson correlation.)
w1,2

Co-rated items: 1. Transformer III and 3. Iron Man II

           Film 1   Film 3   Average(1,3)   Film 1 - Avg   Film 3 - Avg
1. Bob        4        5          4.5           -0.5            0.5
2. Alice      4        1          2.5            1.5           -1.5

Since the two mean-centered vectors point in opposite directions, their angle is 180° and cos(180°) = -1.
Therefore w1,2 is -1.
w1,4

Co-rated item: 1. Transformer III only, so no correlation can be derived.

           Film 1   Average(1)   Film 1 - Avg
1. Bob        4          4             0
4. Kit        4          4             0

Since Bob and Kit have no (defined) correlation, the weighting w1,4 is taken as 0.
w1,5

I is the set of co-rated items; for Bob and Jolly the co-rated items are 1. Transformer III, 3. Iron Man II and 4. Spiderman.
The averages r̄_Bob and r̄_Jolly are the average scores the users gave to these co-rated items.

             1. Transformer III   3. Iron Man II   4. Spiderman
1. Bob               4                  5                5
5. Jolly             2                  3                5

The Pearson correlation is 0.756, which means a positive correlation (if Jolly's rating is relatively larger, then Bob's rating should also be relatively larger, and vice versa).


What is P1,2?

              1. Transformer III   2. Ant-Man   3. Iron Man II   4. Spiderman
1. Bob                4                 ?              5               5
2. Alice              4                 2              1               -
3. Peter              3                 -              2               4
4. Kit                4                 4              -               -
5. Jolly              2                 1              3               5

P1,2 = r̄_Bob + [ w1,2 (r_Alice,2 - r̄_Alice) + w1,4 (r_Kit,2 - r̄_Kit) + w1,5 (r_Jolly,2 - r̄_Jolly) ] / ( |w1,2| + |w1,4| + |w1,5| )
     = 4.67 + [ (-1)(2 - 2.5) + 0 × (4 - 4) + 0.756 × (1 - 3.33) ] / ( 1 + 0 + 0.756 )
     = 4.67 + ( 0.5 - 1.76 ) / 1.756
     ≈ 3.95

(Here each user's average is taken over the items co-rated with Bob, as in the weight calculations above.)

Done! Therefore, the predicted rating of Bob on "Ant-Man" is 3.95.
Revision

Determine the rating of Alice on Item5. Use correlation-based similarity in the prediction model.
(Recall that I is the set of co-rated items of users u and v, so that we won't have null values fed into the formula.)

        Item1  Item2  Item3  Item4  Item5  Average of 1,2,3,4   Item1-Avg  Item2-Avg  Item3-Avg  Item4-Avg   Similarity with Alice
Alice     5      3      4      4      ?           4                +1        -1          0          0              /
User1     3      1      2      3      3          2.25             +0.75     -1.25      -0.25      +0.75           0.85
User2     4      3      4      3      5          3.5              +0.5      -0.5       +0.5       -0.5            0.7
User3     3      3      1      5      4           3                 0         0         -2         +2
User4     1      5      5      2      1          3.25             -2.25     +1.75      +1.75      -1.25

wAlice,User1 = [1×0.75 + (-1)×(-1.25) + 0×(-0.25) + 0×0.75] / [ sqrt(1^2 + 1^2 + 0^2 + 0^2) × sqrt(0.75^2 + (-1.25)^2 + (-0.25)^2 + 0.75^2) ]
             = (0.75 + 1.25) / [ sqrt(2) × sqrt(2.75) ]
             = 0.85

wAlice,User2 = [1×0.5 + (-1)×(-0.5) + 0×0.5 + 0×(-0.5)] / [ sqrt(1^2 + 1^2 + 0^2 + 0^2) × sqrt(0.5^2 + (-0.5)^2 + 0.5^2 + (-0.5)^2) ]
             = (0.5 + 0.5) / [ sqrt(2) × sqrt(1) ]
             = 0.7
Revision

wAlice,User3 = [1×0 + (-1)×0 + 0×(-2) + 0×2] / [ sqrt(1^2 + 1^2 + 0^2 + 0^2) × sqrt(0^2 + 0^2 + (-2)^2 + 2^2) ]
             = 0 / [ sqrt(2) × sqrt(8) ]
             = 0

wAlice,User4 = [1×(-2.25) + (-1)×1.75 + 0×1.75 + 0×(-1.25)] / [ sqrt(1^2 + 1^2 + 0^2 + 0^2) × sqrt((-2.25)^2 + 1.75^2 + 1.75^2 + (-1.25)^2) ]
             = (-2.25 - 1.75) / [ sqrt(2) × sqrt(12.75) ]
             = -0.79
Revision

Summing over each user who rated Item5:

PAlice,Item5 = 4 + [ (3 - 2.25) wAlice,User1 + (5 - 3.5) wAlice,User2 + (4 - 3) wAlice,User3 + (1 - 3.25) wAlice,User4 ]
                 / ( |wAlice,User1| + |wAlice,User2| + |wAlice,User3| + |wAlice,User4| )

             = 4 + [ (0.75)(0.85) + (1.5)(0.7) + (1)(0) + (-2.25)(-0.79) ] / ( |0.85| + |0.7| + |0| + |-0.79| )

             = 4 + 3.47 / 2.34

             ≈ 5.48
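A short, self-contained numpy sketch that reproduces the Revision example end-to-end (mean-centering over Items 1-4, Pearson weights, then the weighted prediction); variable names are illustrative.

```python
import numpy as np

# Rows: Alice, User1..User4; columns: Item1..Item4 (Item5 handled separately).
R = np.array([[5., 3., 4., 4.],
              [3., 1., 2., 3.],
              [4., 3., 4., 3.],
              [3., 3., 1., 5.],
              [1., 5., 5., 2.]])
item5 = np.array([np.nan, 3., 5., 4., 1.])   # Alice's Item5 rating is unknown

avg = R.mean(axis=1)                          # averages over Items 1-4
D = R - avg[:, None]                          # mean-centered deviations

# Pearson weight of Alice (row 0) with each other user.
w = D[1:] @ D[0] / (np.linalg.norm(D[1:], axis=1) * np.linalg.norm(D[0]))
print(np.round(w, 2))                         # [ 0.85  0.71  0.   -0.79]

pred = avg[0] + w @ (item5[1:] - avg[1:]) / np.abs(w).sum()
print(round(pred, 2))                         # ≈ 5.48
```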
Item based similarity

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Transposed (item vectors):
        Alice  User1  User2  User3  User4
Item1     5      3      4      3      1
Item2     3      1      3      3      5
Item3     4      2      4      1      5
Item4     4      3      3      5      2
Item5     ?      3      5      4      1

Basic idea: Use the similarity between items (not users) to make predictions.
Each item is a vector of ratings on that item by different users (similar to the user-based method, where each user is a vector of ratings on items).
Question: Look for items that are similar to Item5, instead of looking for users that are similar to Alice (i.e., calculate item-item similarity). Take Alice's ratings on other similar items to predict her rating for Item5.
Similarity measure

        Alice  User1  User2  User3  User4
Item1     5      3      4      3      1
Item2     3      1      3      3      5
Item3     4      2      4      1      5
Item4     4      3      3      5      2
Item5     ?      3      5      4      1

Cosine-based similarity
Also known as vector-based similarity, this formulation views two items and their ratings as vectors, and defines the similarity between them as the angle between these vectors.

Pearson (correlation)-based similarity
This similarity measure is based on how much the ratings by common users for a pair of items deviate from the average ratings for those items. (Is it meaningful?)
Similarity measure

Adjusted cosine similarity
This similarity measure is a modified form of vector-based similarity that takes into account the fact that different users have different rating schemes; some users might rate items highly in general, while others might give items lower ratings as a preference.
To remove this drawback from vector-based similarity, we subtract each user's average rating (not the average rating of each item) from that user's rating for the pair of items in question.
Item based similarity

Adjusted ratings (each user's rating minus that user's average rating: User1 avg = 2.4, User2 avg = 3.8, User3 avg = 3.2, User4 avg = 2.8):

        User1  User2  User3  User4   Similarity with Item5
Item1   +0.6   +0.2   -0.2   -1.8           0.8
Item2   -1.4   -0.8   -0.2   +2.2          -0.91
Item3   -0.4   +0.2   -2.2   +2.2
Item4   +0.6   -0.8   +1.8   -0.8
Item5   +0.6   +1.2   +0.8   -1.8            /

U: set of users who have rated both items i and j.

sim(Item1, Item5) = [0.6×0.6 + 0.2×1.2 + (-0.2)×0.8 + (-1.8)×(-1.8)] / [ sqrt(0.6^2 + 0.2^2 + (-0.2)^2 + (-1.8)^2) × sqrt(0.6^2 + 1.2^2 + 0.8^2 + (-1.8)^2) ]
                  = 0.8
In general, if a user likes Item1, it is quite likely that he/she would also like Item5, and vice versa.

sim(Item2, Item5) = [(-1.4)×0.6 + (-0.8)×1.2 + (-0.2)×0.8 + 2.2×(-1.8)] / [ sqrt((-1.4)^2 + (-0.8)^2 + (-0.2)^2 + 2.2^2) × sqrt(0.6^2 + 1.2^2 + 0.8^2 + (-1.8)^2) ]
                  = -0.91
In general, if a user likes Item2, it is quite likely that he/she would not like Item5, and vice versa.
Item based similarity

sim(Item3, Item5) = [(-0.4)×0.6 + 0.2×1.2 + (-2.2)×0.8 + 2.2×(-1.8)] / [ sqrt((-0.4)^2 + 0.2^2 + (-2.2)^2 + 2.2^2) × sqrt(0.6^2 + 1.2^2 + 0.8^2 + (-1.8)^2) ]
                  = -0.76
In general, if a user likes Item3, maybe he/she would not like Item5, and vice versa.

sim(Item4, Item5) = [0.6×0.6 + (-0.8)×1.2 + 1.8×0.8 + (-0.8)×(-1.8)] / [ sqrt(0.6^2 + (-0.8)^2 + 1.8^2 + (-0.8)^2) × sqrt(0.6^2 + 1.2^2 + 0.8^2 + (-1.8)^2) ]
                  = 0.43
In general, if a user likes Item4, he/she would probably also like Item5, and vice versa.

Similarities with Item5: Item1 = 0.8, Item2 = -0.91, Item3 = -0.76, Item4 = 0.43.
Item based similarity

Alice's average rating over Items 1-4 = (5 + 3 + 4 + 4) / 4 = 4.
Item1 is rated +1 from Alice's average rating, Item2 is rated -1, and Item3 and Item4 are rated +0.

A prediction:
pred(Alice, Item5) = 4 + [ sim(1,5)×(+1) + sim(2,5)×(-1) + sim(3,5)×0 + sim(4,5)×0 ] / [ |sim(1,5)| + |sim(2,5)| + |sim(3,5)| + |sim(4,5)| ]
                   = 4 + [ 0.8×(+1) + (-0.91)×(-1) + (-0.76)×0 + 0.43×0 ] / [ 0.8 + 0.91 + 0.76 + 0.43 ]
                   = 4 + 0.59
                   = 4.59
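A self-contained numpy sketch of the item-based approach with adjusted cosine similarity, reproducing the numbers above; variable names are illustrative.

```python
import numpy as np

# Rows: Item1..Item5; columns: User1..User4 (Alice is excluded from the
# similarity computation because she has not rated Item5).
R = np.array([[3., 4., 3., 1.],
              [1., 3., 3., 5.],
              [2., 4., 1., 5.],
              [3., 3., 5., 2.],
              [3., 5., 4., 1.]])
user_avg = R.mean(axis=0)            # [2.4, 3.8, 3.2, 2.8]
D = R - user_avg                     # adjusted (user-mean-centered) ratings

# Adjusted cosine similarity of Items 1-4 with Item5.
sims = D[:4] @ D[4] / (np.linalg.norm(D[:4], axis=1) * np.linalg.norm(D[4]))
print(np.round(sims, 2))             # [ 0.8  -0.91 -0.76  0.43]

alice = np.array([5., 3., 4., 4.])   # Alice's ratings on Items 1-4
pred = alice.mean() + sims @ (alice - alice.mean()) / np.abs(sims).sum()
print(round(pred, 2))                # ≈ 4.59
```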
Item-based CF on binary data

We do not always have ratings. When users provide only binary data (e.g., whether the item was purchased or not, or whether the user likes an item or not), how can we apply collaborative filtering?

Original data (users' purchased items):      Item-based CF (binary item vectors):
        Item1  Item2  Item3                          User1  User2  User3
User1     ✓             ✓                    Item1     1      0      0
User2            ✓      ✓                    Item2     0      1      1
User3            ✓                           Item3     1      1      0

Item-item pairwise similarity, using cosine similarity (Why?):
sim(Item1, Item2) = (1×0 + 0×1 + 0×1) / [ sqrt(1^2 + 0^2 + 0^2) × sqrt(0^2 + 1^2 + 1^2) ] = 0
sim(Item1, Item3) = (1×1 + 0×1 + 0×0) / [ sqrt(1^2 + 0^2 + 0^2) × sqrt(1^2 + 1^2 + 0^2) ] = 0.71
sim(Item2, Item3) = (0×1 + 1×1 + 1×0) / [ sqrt(0^2 + 1^2 + 1^2) × sqrt(1^2 + 1^2 + 0^2) ] = 0.5
Preprocessing

[Small ratings table over User1-User4: Item1 is rated by two users (3, 1); Item2 by two users (2, 4); Item3 by one user (5); Item4 by two users (5, 2); Item5 by one user (5); Item3 and Item4 share no common raters.]

Item similarities are more stable than user similarities. This motivates a pre-processing approach:
Offline computation: Calculate all pair-wise item similarities in advance (up to n^2 pair-wise similarities to be memorized in theory, where n is the number of items).
In practice, the memory requirement is significantly lower, because there are many item pairs with no co-ratings (e.g., Item3 and Item4).
Online computation: To compute the predicted value of a target item for an active user, we only need to retrieve the similarity values between the target item and the items that have been purchased by the active user.
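A small sketch of the offline pre-processing step, assuming ratings (or purchases) are kept as item-to-{user: value} dictionaries and that similarities are stored only for item pairs with at least one co-rating; the function names are illustrative.

```python
from itertools import combinations
import math

def cosine(vec_a, vec_b):
    """Cosine similarity of two item vectors stored as {user: rating} dicts."""
    common = set(vec_a) & set(vec_b)
    if not common:
        return None                          # no co-ratings: do not store this pair
    dot = sum(vec_a[u] * vec_b[u] for u in common)
    na = math.sqrt(sum(v * v for v in vec_a.values()))
    nb = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (na * nb)

def precompute_item_sims(items):
    """Offline step: all item-item similarities, skipping pairs with no co-ratings."""
    sims = {}
    for i, j in combinations(items, 2):
        s = cosine(items[i], items[j])
        if s is not None:
            sims[(i, j)] = sims[(j, i)] = s
    return sims

# Binary purchase data from the previous slide.
items = {"Item1": {"User1": 1}, "Item2": {"User2": 1, "User3": 1},
         "Item3": {"User1": 1, "User2": 1}}
print(precompute_item_sims(items))           # Item1-Item3 ≈ 0.71, Item2-Item3 = 0.5
```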
Reference

A Survey of Collaborative Filtering Techniques, by Xiaoyuan Su and Taghi M. Khoshgoftaar.

Wikipedia - Pearson product-moment correlation coefficient
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
Section 2

Model-based Approach

Department of Computer Science, The University of Hong Kong
Slides prepared by - Dr. Chui Chun Kit, for students in COMP3278
For other uses, please email : [email protected]

Collaborative Filtering (CF)

Common insight: personal tastes are correlated. If Alice and Bob both like X and Alice likes Y, then Bob is more likely to like Y.

Memory-based CF
  - User-based
  - Item-based
Model-based CF
  - Clustering
  - Bayesian belief networks
  - Latent semantic analysis

Now we have the idea of how recommendation can be done through the memory-based approach. Let us focus on the model-based CF techniques that use a "latent semantic analysis" approach in the next section.
Section 2a.

Motivation

Feature space

[Scatter plot: each user's Rating of Transformer (x-axis) vs Rating of Toy story (y-axis); the points lie on a straight line.]

         Transformer   Toy story
Bob           4            4
Peter         3            3
Alice         2            2
Jolly         1            1

Bob likes Transformer and he likes Toy story too. Jolly doesn't like Transformer and she doesn't like Toy story either.

Think about it:
When we look at the users' ratings on two similar items, we usually observe some "pattern" in the plot.
What is the underlying reason for that?
Feature space

Idea: Both Transformer and Toy story are "similar" in terms of their "computer animation" feature. When users rate the two movies, they are actually rating the "computer animation" feature.

There should be an underlying representation of the response of each user to certain features.

User      Feature 1 (Computer animation)   Feature 2   …
Jolly                 …
Peter                 …
Alice                 …
Bob                   …

Bob loves computer animation; that's why he rates both Transformer and Toy story high. Jolly doesn't like computer animation; that's why she rates both Transformer and Toy story low.
Feature space

There should also be an underlying representation of the amount of each feature present in each item.

Item                       Feature 1 (Computer animation)   Feature 2   …
Transformer                            …
Toy story                              …
The Wolf of Wall Street                …

Toy story and Transformer are movies with the same "amount" of the computer animation feature. Users would probably rate the two movies similarly on the "computer animation" feature.
Feature space

Instead of comparing users based on raw rating data, ideally we hope to find similar users based on the underlying features.

Challenge: But there are no features given in the data! We do not even know what the features are or how many features exist. How can we find the features and the representation of user preferences for those features?
A different "angle"

Viewing data from a different "angle".
Insight: When the ratings exhibit the greatest variation along an "angle" (direction), that direction should represent an underlying feature.

[Plot: the same Transformer / Toy story ratings; along the diagonal direction the users' ratings exhibit the greatest variation, i.e., the users' differences are captured.]
A different "angle"

[Plot: along the direction perpendicular to the diagonal, the users' ratings do not exhibit any variation, i.e., the users' ratings are all the same on this feature.]
Multiple features

When a user gives a rating to an item, the rating may represent his/her preferences on multiple features.
The line that minimizes the sum of the shortest (perpendicular) distances of the points to the line (the best-fit line) represents one "angle" of viewing the data with the greatest variance. Another "angle" perpendicular to the first one captures the remaining variation of the data.
Collaborative Filtering (CF)

Through statistical analysis of users' ratings, we seek to generate a statistical breakdown of features without ever having to specify what those features represent.
We would like to generate the features' weights in items, and the users' interest in the features.
A standard technique that we can use is called Singular Value Decomposition (SVD).
Before we delve into the details of SVD, let's have some revision of basic vector and matrix manipulations.
Section 2b.

Basics of vectors and matrices
A recap of fundamental mathematics

Reference: Singular Value Decomposition Tutorial by Kirk Baker

Vector length

The length of a vector v, denoted |v|, is found by squaring each component, adding them all together, and taking the square root of the sum.
E.g., v = [4, 3], |v| = sqrt(4^2 + 3^2) = 5.

Generalization for higher dimensional cases: if v = [4, 11, 8, 10], then |v| = sqrt(4^2 + 11^2 + 8^2 + 10^2) = sqrt(301).
Scalar multiplication

Scalar multiplication of a vector stretches the vector without changing its direction.
E.g., v = [1, 2]; 2 * v = [2, 4]; 2.5 * v = [2.5, 5].

Generalization for higher dimensional cases: if v = [3, 6, 8, 4], then 1.5 ∗ v = 1.5 ∗ [3, 6, 8, 4] = [4.5, 9, 12, 6].
Dot product

The dot product of two vectors x and y is found by multiplying each component in x by the component of y in the same dimension and adding them all together to yield a scalar value.

Generalization for higher dimensional cases: if x = [x1, …, xn] and y = [y1, …, yn], then x ⋅ y = x1 y1 + x2 y2 + … + xn yn.
Dot product

Geometrically, the dot product is the product of the lengths of the two vectors and the cosine of the angle between them:
A ⋅ B = |A| |B| cos θ
The scalar projection of a vector A in the direction of a vector B (denoted A_B) is given by |A| cos θ (similarly, the scalar projection of B in the direction of A is |B| cos θ).
The dot product is thus characterized geometrically by the scalar multiplication of A_B and |B| (i.e., |B| × A_B), or of B_A and |A| (i.e., |A| × B_A).
Orthogonal vectors

Two vectors are orthogonal to each other if their dot product equals zero.
In two-dimensional space this is equivalent to saying that the vectors are perpendicular, i.e., the angle between them is a 90° angle.
E.g., A = [1, 4] and B = [4, -1]: A ∙ B = 1×4 + 4×(-1) = 0, or equivalently A ∙ B = |A||B| cos 90° = 0.

Generalization for higher dimensional cases: [2, 1, -2, 4] and [3, -6, 4, 2] are orthogonal because their dot product equals 0.
Normal vectors

A normal vector (or unit vector) is a vector of length 1.
Normalization: Any vector with an initial length > 0 can be normalized by dividing each component by the vector's length.
E.g., v = [4, 3] has length 5; we can normalize it to obtain v' = [4/5, 3/5].
The normalized vector v' is just a scalar multiple of v, i.e., they have the same direction and differ only in vector length.
Orthonormal vectors

Normal vectors that are orthogonal to each other are said to be orthonormal.
E.g., [3/5, 4/5] and [-4/5, 3/5] are orthonormal because each has length 1 and their dot product is (3/5)(-4/5) + (4/5)(3/5) = 0.
Matrix terminology

Square matrix - A matrix is said to be square if it has the same number of rows as columns.
A diagonal matrix A is a matrix where all the entries a_ij are 0 when i ≠ j. In other words, the only nonzero values run along the main diagonal from the upper left corner to the lower right corner.

A square matrix:          A diagonal matrix:
1,1  1,2  1,3  1,4         3  0  0  0
2,1  2,2  2,3  2,4         0  5  0  0
3,1  3,2  3,3  3,4         0  0  4  0
4,1  4,2  4,3  4,4         0  0  0  2
Transpose

The transpose of a matrix is created by converting its rows into columns; i.e., row 1 becomes column 1, row 2 becomes column 2, etc. The transpose of a matrix is indicated with a superscript T.

A = 1 2 3 4        AT = 1 5
    5 6 7 8             2 6
                        3 7
                        4 8
Matrix multiplication

The entries of AB are determined by taking the dot product of each row of A with each column of B.

A = 2 1 4      B =  3  2      AB = 9  16
    1 5 2          -1  4           0  26
                    1  2

ab1,1 = [2 1 4] ⋅ [3 -1 1] = 2×3 + 1×(-1) + 4×1 = 9
ab1,2 = [2 1 4] ⋅ [2 4 2]  = 2×2 + 1×4 + 4×2 = 16
ab2,1 = [1 5 2] ⋅ [3 -1 1] = 1×3 + 5×(-1) + 2×1 = 0
ab2,2 = [1 5 2] ⋅ [2 4 2]  = 1×2 + 5×4 + 2×2 = 26

Therefore the resulting matrix has the same number of rows as A, and the same number of columns as B.
Identity matrix

The identity matrix is a square matrix with entries on the diagonal equal to 1 and all other entries equal to 0. The identity matrix is denoted as I.
The identity matrix behaves like the number 1 in ordinary multiplication.

I = 1 0 0      A = 2 4 6      AI = 2 4 6
    0 1 0          1 3 5           1 3 5
    0 0 1

ai1,1 = [2 4 6] ⋅ [1 0 0] = 2×1 + 4×0 + 6×0 = 2
ai1,2 = [2 4 6] ⋅ [0 1 0] = 2×0 + 4×1 + 6×0 = 4
Orthogonal matrix

An orthogonal matrix is a square matrix whose columns and rows are orthonormal vectors. This implies that a matrix A is orthogonal if ATA = I.

A = 1   0     0         ATA = 1   0    0      1   0     0       1 0 0
    0  3/5  -4/5              0  3/5  4/5     0  3/5  -4/5   =  0 1 0
    0  4/5   3/5              0 -4/5  3/5     0  4/5   3/5      0 0 1

(ATA)2,2 = 0×0 + (3/5)(3/5) + (4/5)(4/5) = 1, which implies that column 2 of A is a normal vector (its length is 1).
(ATA)3,3 = 0×0 + (-4/5)(-4/5) + (3/5)(3/5) = 1, which implies that column 3 of A is a normal vector (its length is 1).
(ATA)2,3 = 0×0 + (3/5)(-4/5) + (4/5)(3/5) = 0, which implies that columns 2 and 3 of A are orthogonal (their dot product is 0).
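A quick numpy check of the matrix facts above (multiplication, transpose, and the orthogonality test ATA = I); purely illustrative.

```python
import numpy as np

A = np.array([[2, 1, 4],
              [1, 5, 2]])
B = np.array([[ 3, 2],
              [-1, 4],
              [ 1, 2]])
print(A @ B)                             # [[ 9 16] [ 0 26]]
print(A.T)                               # transpose: rows become columns

Q = np.array([[1, 0,    0  ],
              [0, 3/5, -4/5],
              [0, 4/5,  3/5]])
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q is an orthogonal matrix
```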
Section 2c.

Singular value decomposition

Reference: The Effects of Singular Value Decomposition on Collaborative Filtering by Michael H. Pryor.

Singular value decomposition

The user rating vectors can be represented as an m × n matrix A, with m users and n items.

          Toy story   Transformer
Jolly         1            1
Alice         2            2
Peter         3            3
Bob           4            4

Through singular value decomposition, this matrix A can be factored into three matrices: A = U S VT,
where UTU = I and VTV = I (i.e., U and V are orthogonal matrices), and S is a diagonal matrix which contains the singular values of A, with S1,1 ≥ S2,2 ≥ S3,3 ≥ …

Singular value decomposition

(Existence) Every matrix A has an SVD.
(Uniqueness) The singular values in the diagonal of S are uniquely determined.
How the U, S, VT matrices are computed and the proofs of the above properties are omitted in this chapter. We can treat SVD as a black-box tool / library in collaborative filtering.
Please refer to the reference "Singular Value Decomposition Tutorial" for the details of the computation.
JAVA library reference: http://ejml.org/, API: http://ejml.org/javadoc/
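A minimal numpy sketch that treats SVD as a black box, using the 4 × 2 rating matrix from the next slide; note that numpy may return U and V with some column signs flipped relative to the slides, which does not affect the reconstruction.

```python
import numpy as np

A = np.array([[1., 1.],
              [2., 2.],
              [3., 3.],
              [4., 4.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 3))    # [7.746 0.   ]  singular values
print(np.round(U, 3))    # first column ≈ ±[0.183, 0.365, 0.548, 0.730]
print(np.round(Vt, 3))   # rows ≈ ±[0.707, 0.707] and ±[-0.707, 0.707]

# Reconstruct A = U S VT.
print(np.allclose(U @ np.diag(s) @ Vt, A))   # True
```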
Example

A = 1 1     U = 0.183  -0.969     S = 7.746  0     VT =  0.707  0.707
    2 2         0.365   0.025         0      0          -0.707  0.707
    3 3         0.548   0.241
    4 4         0.730   0.049

[Plot: Jolly, Alice, Peter and Bob lie on the diagonal in the (Rating of Transformer, Rating of Toy story) plane.]

          Toy story   Transformer
Jolly         1            1
Alice         2            2
Peter         3            3
Bob           4            4

Property of SVD: UT U = I

UT U = 0.183  0.365  0.548  0.730     0.183  -0.969   =  1  0
      -0.969  0.025  0.241  0.049     0.365   0.025      0  1
                                      0.548   0.241
                                      0.730   0.049

Therefore the column vectors of U are normal vectors and are orthogonal to each other (orthonormal).
Example

Property of SVD: VT V = I

VT V = 0.707  0.707      0.707  -0.707   =  1  0
      -0.707  0.707      0.707   0.707      0  1

Therefore the column vectors of V are normal vectors and are orthogonal to each other (orthonormal).
A statistical breakdown

A = U S VT (matrices as above).

We can recover A by multiplying U, S and VT. What is the implication of this property? Let's look at how each value in A is recovered in U S VT, say A1,1 in this example.

Multiplying U by S first, the entry of U S in row 1, column 1 is U1,1 S1,1 + U1,2 S2,1 (for k = 1, 2 and l = 1, where k and l are the row and column numbers in S).
A statistical breakdown

Similarly, the entry of U S in row 1, column 2 is U1,1 S1,2 + U1,2 S2,2 (for k = 1, 2 and l = 2). Then
A1,1 = (U1,1 S1,1 + U1,2 S2,1) VT1,1 + (U1,1 S1,2 + U1,2 S2,2) VT2,1
A statistical breakdown

Note that since S is a diagonal matrix, all Si,j with i ≠ j are 0. Therefore we can simplify the calculation and remove all the terms containing Si,j with i ≠ j:
A1,1 = U1,1 S1,1 VT1,1 + U1,2 S2,2 VT2,1
A statistical breakdown

A1,1 = U1,1 S1,1 VT1,1 + U1,2 S2,2 VT2,1 = 0.183 × 7.746 × 0.707 + (-0.969) × 0 × (-0.707) = 1
A2,1 = U2,1 S1,1 VT1,1 + U2,2 S2,2 VT2,1 = 0.365 × 7.746 × 0.707 + 0.025 × 0 × (-0.707) = 2
A statistical breakdown

V is simply the transpose of VT, so the cell (VT)l,j is the same as Vj,l. In general,

A_i,j = Σ_k U_i,k S_k,k V_j,k

Due to the nature of S, every entry Si,j with i ≠ j is 0, so every term in the summation involving such an entry can be ignored.
A1,1 = U1,1 S1,1 VT1,1 + U1,2 S2,2 VT2,1 = 0.183 × 7.746 × 0.707 + (-0.969) × 0 × (-0.707) = 1
A statistical breakdown

A4,2 = U4,1 S1,1 V2,1 + U4,2 S2,2 V2,2
     = 0.730 × 7.746 × 0.707 + 0.049 × 0 × 0.707
     ≈ 4

This summation shows that the rating Ai,j can be constructed as a sum over the features of the product of:
- user i's interest in each feature (U),
- the importance of each feature (S),
- the amount of each feature in item j (V).
Feature 1 Feature 2 95
SVD and Collaborative filtering
A U S VT
1 1 0.183 -0.969 7.746 0 0.707 0.707
2 2 0.365 0.025
3 3
= 0 0 -0.707 0.707
0.548 0.241
4 4 0.730 0.049
V
0.707 -0.707
0.707 0.707

U is representative of the response of each user to


certain features. (4 users, 2 features in this example)

V is representative of the amount of each feature


present in each item. (2 items, 2 features in this example)
S is a matrix related to the feature importance in overall
determination of the rating. (2 features in this example, with the first one very
96
important, the 2nd one is not important at all “0”)
Example 1

U1,1 S1,1 is the response of user 1 on feature 1 multiplied by the importance of feature 1 (i.e., 0.183 × 7.746 = 1.42). This is equal to user 1's value in the projected space (i.e., sqrt(1^2 + 1^2) = 1.42).
0.707 is actually cos 45°. Thus multiplying by it gives the projection onto item 1, i.e., the raw rating that user 1 gives to item 1: 1.42 × cos 45° = 1.

A1,1 = U1,1 S1,1 VT1,1 + U1,2 S2,2 VT2,1 = 0.183 × 7.746 × 0.707 + (-0.969) × 0 × (-0.707) = 1
Example 2

A = 1 1     U = 0.153   0        S = 9.220  0     VT =  0.707  0.707
    2 2         0.307   0            0      1          -0.707  0.707
    3 3         0.460   0
    4 4         0.614   0        V:  Item 1: 0.707  -0.707
    2 3         0.383   0.707        Item 2: 0.707   0.707
    3 2         0.383  -0.707

A5,1 = U5,1 S1,1 V1,1 + U5,2 S2,2 V1,2
     = 0.383 × 9.220 × 0.707 + 0.707 × 1 × (-0.707)
     = 2.4966 - 0.499849
     ≈ 2

A5,2 = U5,1 S1,1 V2,1 + U5,2 S2,2 V2,2
     = 0.383 × 9.220 × 0.707 + 0.707 × 1 × 0.707
     = 2.4966 + 0.499849
     ≈ 3

Observation: Both items 1 and 2 have equal weight on feature 1 (both are 0.707). I.e., if user 5 considered only feature 1, he would rate 2.4966 on both items 1 and 2.
Example 2

Observation: Item 1 has a negative value on feature 2 (V1,2 = -0.707). Therefore feature 2 brings a negative effect (-0.4998) to the rating that user 5 gives to item 1, and a positive effect (+0.4998) to the rating on item 2:

A5,1 = U5,1 S1,1 V1,1 + U5,2 S2,2 V1,2 = 2.4966 - 0.499849 ≈ 2
A5,2 = U5,1 S1,1 V2,1 + U5,2 S2,2 V2,2 = 2.4966 + 0.499849 ≈ 3
Section 2d.

Generating Prediction

Reference: The Effects of Singular Value Decomposition on Collaborative Filtering by Michael H. Pryor.

How to do prediction?

Once the SVD is computed and the U, S, V matrices are known, it is possible to predict ratings for items users have not rated.

User Preferences
1. Russell - Interested in: Science and Nature; Short download time: Very important; Good Layout & Graphic: Important
2. John - Interested in: Coding; Short download time: Not important; Good Layout & Graphic: Not important
3. Davis - Interested in: Science and Nature; Short download time: Important; Good Layout & Graphic: Very Important
4. Kit - Interested in: Coding; Short download time: Very important; Good Layout & Graphic: Not Important

Item Features
1. Blu-ray movie - The Blue Planet: About: Science and Nature; Download time: Long; Layout & Graphic: Excellent
2. "Intro to JAVA programming": About: Programming; Download time: Very Fast; Layout & Graphic: Fair
3. Ebook - "Problem solving with C++": About: Programming; Download time: Extremely Fast; Layout & Graphic: Poor
4. DVD movie - The Monkey kingdom: About: Science and Nature; Download time: Fair; Layout & Graphic: Fair
How to do prediction?

If User 4 rates items 1 and 2 as 2 and 7 respectively, can you predict User 4's ratings on items 3 and 4?

         Item 1   Item 2   Item 3   Item 4
User 1     5        4        2        6
User 2     3        7        5        2
User 3     6        4        1        4
User 4     2        7        ?        ?
How to do prediction?

The SVD of the known ratings (Users 1-3 on items 1-4) is A = U S VT:

A = 5 4 2 6     U = -0.6     0.4124  -0.6855     S = 14.489  0      0      0
    3 7 5 2         -0.5811 -0.8136   0.01923        0       4.932  0      0
    6 4 1 4         -0.5498  0.4099   0.7278         0       0      1.655  0

VT = -0.5551  -0.5982  -0.3213  -0.4805       V = -0.5551   0.4218   0.6023  -0.3889
      0.4218  -0.4878  -0.5744   0.5041           -0.5982  -0.4878   0.1835   0.6088
      0.6023   0.1835  -0.3306  -0.7031           -0.3213  -0.5744  -0.3306  -0.6764
     -0.3889   0.6088  -0.6764   0.1437           -0.4805   0.5041  -0.7031   0.1437

Note: We only have the rating matrix; what we would like to do is to use SVD to generate a statistical breakdown of the underlying components of the ratings.
Question: How can we use the U, S, V matrices to do prediction for User 4?
How to do prediction?

The features are ordered by decreasing importance (the diagonal of S).

A1,1 = 5 = U1,1 S1,1 V1,1 + U1,2 S2,2 V1,2 + U1,3 S3,3 V1,3
         = (-0.6) × 14.489 × (-0.5551) + 0.4124 × 4.932 × 0.4218 + (-0.6855) × 1.655 × 0.6023
         = 4.826 + 0.858 + (-0.683)
         ≈ 5

Note: The singular values in S drop off fairly quickly. In other words, feature one, described in the S matrix by "14.489", is a fairly important feature in determining the final ratings.
Consider feature 1 only

Since User 4 rates item 1 as 2, A4,1 equals 2. Ignoring the less important features:

A4,1 = 2 = U4,1 S1,1 V1,1 + U4,2 S2,2 V1,2 + U4,3 S3,3 V1,3
       2 ≈ U4,1 S1,1 V1,1
       2 = U4,1 × 14.489 × (-0.5551)
       U4,1 = -0.24867

This is User 4's response on feature 1 (predicted). If User 4 rates item 1 as 2, based on the current information, we can generate an initial prediction of U4,1 by ignoring the less important features.
Prediction with U4,1

With the prediction of U4,1 (User 4's response on feature 1), we can predict User 4's ratings on item 2, item 3 and item 4 by considering his response on feature 1 only:

A4,2 = U4,1 S1,1 V2,1 = -0.24867 × 14.489 × (-0.5982) = 2.155
A4,3 = U4,1 S1,1 V3,1 = -0.24867 × 14.489 × (-0.3213) = 1.158
A4,4 = U4,1 S1,1 V4,1 = -0.24867 × 14.489 × (-0.4805) = 1.731
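A self-contained numpy sketch of this single-rating fold-in step; again, numpy's sign conventions for U and V may differ from the slides, but the resulting predictions are the same.

```python
import numpy as np

# Known ratings of Users 1-3 on items 1-4 (from the slides).
A = np.array([[5., 4., 2., 6.],
              [3., 7., 5., 2.],
              [6., 4., 1., 4.]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T                                   # rows of V = items, columns = features

# User 4 rated item 1 as 2; estimate the response on feature 1 only.
u41 = 2.0 / (s[0] * V[0, 0])               # ≈ -0.249 (up to sign convention)

# Predict User 4's ratings on all items using feature 1 only.
print(np.round(u41 * s[0] * V[:, 0], 3))   # ≈ [2.    2.155 1.158 1.731]
```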
Consider features 1 and 2

Now suppose User 4 also rates item 2 as 7 (as it is just the item he is interested in). We make use of this new information to solve for U4,1 and U4,2:

A4,1 = 2 = U4,1 S1,1 V1,1 + U4,2 S2,2 V1,2
         = U4,1 × 14.489 × (-0.5551) + U4,2 × 4.932 × 0.4218
       2 = U4,1 × (-8.0428439) + U4,2 × 2.0803176   …(i)

A4,2 = 7 = U4,1 S1,1 V2,1 + U4,2 S2,2 V2,2
         = U4,1 × 14.489 × (-0.5982) + U4,2 × 4.932 × (-0.4878)
       7 = U4,1 × (-8.6673198) + U4,2 × (-2.4058296)   …(ii)
Consider features 1 and 2

We can solve (i) and (ii) for U4,1 and U4,2:

2 = U4,1 × (-8.0428439) + U4,2 × 2.0803176    …(i)
7 = U4,1 × (-8.6673198) + U4,2 × (-2.4058296) …(ii)

From (ii): U4,1 = (7 + 2.4058296 U4,2) / (-8.6673198)   …(iii)
Substitute (iii) into (i): 2 = ((7 + 2.4058296 U4,2) / (-8.6673198)) × (-8.0428439) + U4,2 × 2.0803176
                           2 = (7 + 2.4058296 U4,2) × 0.927951 + U4,2 × 2.0803176
                           2 = 6.495654 + 2.2325 U4,2 + 2.0803176 U4,2
                           U4,2 = -1.0424   …(iv)
Substitute (iv) into (iii): U4,1 = -0.51829
Prediction with U4,1 and U4,2

With U4,1 = -0.51829 and U4,2 = -1.0424, we can recalculate the predictions. The prediction now considers features 1 and 2:

A4,3 = U4,1 S1,1 V3,1 + U4,2 S2,2 V3,2
     = (-0.51829) × 14.489 × (-0.3213) + (-1.0424) × 4.932 × (-0.5744)
     = 2.4128 + 2.9531
     = 5.3659
Prediction with U4,1 and U4,2

A4,4 = U4,1 S1,1 V4,1 + U4,2 S2,2 V4,2
     = (-0.51829) × 14.489 × (-0.4805) + (-1.0424) × 4.932 × 0.5041
     = 3.60832 - 2.59164
     = 1.0167
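A self-contained numpy sketch of the two-rating case: the 2 × 2 system (i)-(ii) is solved with np.linalg.solve, and items 3 and 4 are then predicted from features 1 and 2; names are illustrative, and column signs may differ from the slides without affecting the predictions.

```python
import numpy as np

A = np.array([[5., 4., 2., 6.],
              [3., 7., 5., 2.],
              [6., 4., 1., 4.]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# User 4 rated item 1 as 2 and item 2 as 7; keep features 1 and 2 only.
known_items = [0, 1]
r_known = np.array([2., 7.])
M = V[known_items, :2] * s[:2]      # coefficient matrix of the 2x2 system
u4 = np.linalg.solve(M, r_known)    # ≈ [-0.518, -1.042] (up to sign convention)

pred = (V[:, :2] * s[:2]) @ u4      # predicted ratings on all four items
print(np.round(pred, 2))            # ≈ [2.   7.   5.37 1.02]
```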
Section 2e.

New ways to compare users

Reference: The Effects of Singular Value Decomposition on Collaborative Filtering by Michael H. Pryor.
Compare users by feature

SVD analysis also offers an interesting way to compare users and find similar users. Instead of explicitly comparing users based on their ratings, comparing users based on their feature weights may be more accurate.

A = 5 4 2 6      U = 0.367  0.495  0.651 -0.195      S = 22.691  0      0      0
    3 7 5 2          0.385 -0.511  0.400  0.031          0       6.074  0      0
    6 4 1 4          0.344  0.444 -0.341 -0.325          0       0      2.380  0
    3 6 4 1          0.325 -0.456 -0.021  0.026          0       0      0      1.596
    5 7 3 2          0.403 -0.215 -0.415 -0.546
    6 6 4 4          0.449  0.061  0.056  0.308
    6 4 3 3          0.360  0.195 -0.353  0.680

VT = 0.568  0.636  0.370  0.368      V = 0.568  0.444 -0.635  0.277
     0.444 -0.481 -0.455  0.604          0.636 -0.481 -0.029 -0.602
    -0.635 -0.029  0.334  0.696          0.370 -0.455  0.334  0.738
     0.277 -0.602  0.738 -0.127          0.368  0.604  0.696 -0.127
Compare users by feature

The matrix U S is the user feature-response matrix (U) multiplied by the feature-importance matrix (S). U S can also be obtained by multiplying A with V:

A = U S VT
A V = U S VT V
A V = U S (VT V)
A V = U S I
A V = U S

This matrix is each user's response weighted by feature importance.

Users' ratings on features (weighted), obtained by AV or US:
 8.332   3.010   1.553  -0.309
 8.742  -3.102   0.954   0.053
 7.794   2.701  -0.808  -0.516
 7.368  -2.770  -0.047   0.044
 9.138  -1.304  -0.984  -0.869
10.176   0.374   0.136   0.494
 8.166   1.187  -0.836   1.087
Compare users by feature

By simply using normal cosine distance metrics on the users' feature weights instead of the explicit ratings, we may be able to find a better set of similar users.

Step 1. A new user enters the following ratings: 5 6 3 3.
Can you determine which users are similar to the new user in the feature space?
We need to determine the new user's ratings in the (weighted) feature space for comparison.
Compare users by feature

Step 2. We place the new user's ratings into the feature space. This can be obtained by multiplying [5 6 3 3] with V (since AV = US):

[5 6 3 3] V = [8.870  -0.219  -0.259  -0.394]
Compare users by feature

Step 3. Apply cosine similarity to the users' ratings in the (weighted) feature space.

Cosine similarity of the new user with User 1:
Dot product of the two user vectors
= (8.870)(8.332) + (-0.219)(3.010) + (-0.259)(1.553) + (-0.394)(-0.309)
= 72.965169
Length of User 1's vector = sqrt(8.332^2 + 3.010^2 + 1.553^2 + (-0.309)^2) = 8.9994229815
Length of the new user's vector = sqrt(8.870^2 + (-0.219)^2 + (-0.259)^2 + (-0.394)^2) = 8.88522245079

Cosine similarity between the new user and User 1
= 72.965169 / [ (8.88522245079) × (8.9994229815) ] = 0.912499474169
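A self-contained numpy sketch of the three steps: compute the SVD, project the existing users and the new user into the weighted feature space via AV (= US), and compare them with cosine similarity; names are illustrative, and SVD sign conventions do not affect the similarities.

```python
import numpy as np

A = np.array([[5., 4., 2., 6.],
              [3., 7., 5., 2.],
              [6., 4., 1., 4.],
              [3., 6., 4., 1.],
              [5., 7., 3., 2.],
              [6., 6., 4., 4.],
              [6., 4., 3., 3.]])
_, _, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

US = A @ V                                   # users' weighted feature responses (= U S)
new_user = np.array([5., 6., 3., 3.]) @ V    # project the new user's ratings

sims = US @ new_user / (np.linalg.norm(US, axis=1) * np.linalg.norm(new_user))
print(np.round(sims, 3))                     # similarity with each existing user; User 1 ≈ 0.912
```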
Reference

The Effects of Singular Value Decomposition on Collaborative Filtering, by Michael H. Pryor.
www.cs.dartmouth.edu/reports/TR98-338.pdf

Singular Value Decomposition Tutorial, by Kirk Baker.
http://www.ling.ohio-state.edu/~kbaker/pubs/Singular_Value_Decomposition_Tutorial.pdf

Extra readings
- A Survey of Collaborative Filtering Techniques (published in Advances in Artificial Intelligence, 2009, by Xiaoyuan Su, et al.)
- Application of Dimensionality Reduction in Recommender System -- A Case Study (published in WEBKDD 2000 by Badrul M. Sarwar, et al.)
- Collaborative Tag Recommendations (published in Data Analysis, Machine Learning and Applications, 2008, by Leandro Balby Marinho, et al.)
Chapter 8.

End
COMP3278 Introduction to
Database Management Systems
Department of Computer Science, The University of Hong Kong
Slides prepared by - Dr. Chui Chun Kit, for students in COMP3278
For other uses, please email : [email protected]
