Lesson 9 - Recommender Systems
Learning Objectives
Collaborative Filtering
Apriori Algorithm
Personalized Recommendations:
The most relevant item is
recommended based on the user
profile and contextual parameters
Paradigms of Recommendation Systems (Contd.)
Collaborative: "Recommends what's popular among your peers"
Content-based: "Displays items similar to what you have liked"
Knowledge-based: "Recommends what fits, based on your needs"
Hybrid: Combinations of various inputs and/or composition of different mechanisms
Recommender Systems
Topic 2: Collaborative Filtering
Collaborative Filtering
Problem
Determine whether Alice will like Item5, which she has not yet seen or rated
Measuring User Similarity: Pearson Correlation
\[
sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \, \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}
\]
Where,
a, b : users
r_{a,p} : rating of user a for item p
\bar{r}_a : average rating of user a
P : set of items rated by both a and b
Similarity values range from −1 to 1
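The Pearson formula can be tried out on the Alice example from this lesson. The following is a minimal sketch, not the course's lab code: the function name `pearson` and the dictionary layout are illustrative assumptions, and each user's mean is taken over the four items that both users have rated (Item1 to Item4).

```python
from math import sqrt

# Ratings for Item1..Item4 from the Alice example (Item5 is left out
# because Alice has not rated it).
ratings = {
    "Alice": [5, 3, 4, 4],
    "User1": [3, 1, 2, 3],
    "User2": [4, 3, 4, 3],
    "User3": [3, 3, 1, 5],
    "User4": [1, 5, 5, 2],
}

def pearson(ra, rb):
    """Pearson correlation over the co-rated items of two users."""
    mean_a = sum(ra) / len(ra)
    mean_b = sum(rb) / len(rb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    den = (sqrt(sum((x - mean_a) ** 2 for x in ra))
           * sqrt(sum((y - mean_b) ** 2 for y in rb)))
    # Assumption: treat a user with constant ratings as "no similarity"
    return num / den if den else 0.0

similarities = {
    user: pearson(ratings["Alice"], r)
    for user, r in ratings.items() if user != "Alice"
}
for user, s in similarities.items():
    print(f"sim(Alice, {user}) = {s:.2f}")
```

Under these assumptions User1 comes out as Alice's closest peer and User4 as negatively correlated, so a user-based prediction for Item5 would lean mostly on User1 and User2.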
Item-Based Collaborative Filtering
Uses the similarity between items (and not users) to make predictions
Example:
▪ Look for items that are similar to Item5
▪ Take Alice's ratings for these items to predict the rating for Item5
        Item1  Item2  Item3  Item4  Item5
Alice   5      3      4      4      ?
User1   3      1      2      3      3
User2   4      3      4      3      5
User3   3      3      1      5      4
User4   1      5      5      2      1
The Cosine and Adjusted Cosine Similarity Measures
Cosine Similarity
• Produces better results in item-to-item filtering
• Ratings are seen as vectors in n-dimensional space
• Similarity is calculated based on the angle between the vectors
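\[
sim(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \, |\vec{b}|}
\]
Adjusted Cosine Similarity
• Takes the average rating of each user into account
\[
sim(a, b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_u)^2} \, \sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_u)^2}}
\]
Where U is the set of users who have rated both items a and b, and \bar{r}_u is the average rating of user u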
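To see how the two measures differ, here is a minimal sketch that compares Item5 with Item1 using the ratings of User1 to User4 from the Alice example. The helper names (`column`, `cosine`, `adjusted_cosine`) are illustrative, and each user's mean is assumed to be taken over all five items that user has rated.

```python
from math import sqrt

# Ratings of User1..User4 for Item1..Item5 (rows = users), taken from the
# Alice example. Alice is excluded because she has not rated Item5.
R = [
    [3, 1, 2, 3, 3],  # User1
    [4, 3, 4, 3, 5],  # User2
    [3, 3, 1, 5, 4],  # User3
    [1, 5, 5, 2, 1],  # User4
]

def column(matrix, j):
    """Return the rating vector of item j (one entry per user)."""
    return [row[j] for row in matrix]

def cosine(a, b):
    """Cosine of the angle between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def adjusted_cosine(matrix, i, j):
    """Cosine similarity after subtracting each user's mean rating."""
    centered = [[r - sum(row) / len(row) for r in row] for row in matrix]
    return cosine(column(centered, i), column(centered, j))

cos_5_1 = cosine(column(R, 4), column(R, 0))   # Item5 vs Item1
adj_5_1 = adjusted_cosine(R, 4, 0)
print(f"cosine = {cos_5_1:.2f}, adjusted cosine = {adj_5_1:.2f}")
```

Plain cosine is close to 1 here because all ratings are positive. Subtracting the user means removes that bias, which is why the adjusted value is noticeably lower.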
Recommender Systems
Topic 3: Association Rule Mining
Association Rule: Basic Concepts
Performance Measures
Confidence
• Indicates how often the items X and Y occur together, given the number of times X occurs
• Confidence = (No. of times X and Y occur together) / (Total occurrences of X) = Pr(Y | X)
P1 : A, B, C
P2 : A, C, D
P3 : B, C, D
P4 : A, D, E
P5 : B, C, E
Here,
▪ A, B, C, D, E are items in a store, I = {A, B, C, D, E}
▪ Set of all transactions P = {P1, P2, P3, P4, P5}
▪ Each transaction is a set of items, Pi ⊆ I
Association Rule: Example (Contd.)
Suppose you derive the following association rules from the transaction database given above:
A → D
C → A
A → C
B and C → D
Calculating the support, confidence, and lift of these rules gives the following table (lift of X → Y = confidence of X → Y divided by the support of Y):
Rule          Support  Confidence  Lift
A → D         2/5      2/3         10/9
C → A         2/5      2/4         5/6
A → C         2/5      2/3         5/6
B and C → D   1/5      1/3         5/9
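The three measures can be recomputed directly from the five transactions P1 to P5. This is a minimal sketch with illustrative helper names (`support`, `rule_metrics`). It assumes the standard definitions: support is the fraction of transactions containing the itemset, confidence is support(X and Y) / support(X), and lift is confidence / support(Y).

```python
from fractions import Fraction

transactions = [
    {"A", "B", "C"},  # P1
    {"A", "C", "D"},  # P2
    {"B", "C", "D"},  # P3
    {"A", "D", "E"},  # P4
    {"B", "C", "E"},  # P5
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def rule_metrics(lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs."""
    sup = support(lhs | rhs)
    conf = sup / support(lhs)
    lift = conf / support(rhs)
    return sup, conf, lift

rules = [({"A"}, {"D"}), ({"C"}, {"A"}), ({"A"}, {"C"}), ({"B", "C"}, {"D"})]
for lhs, rhs in rules:
    sup, conf, lift = rule_metrics(lhs, rhs)
    print(sorted(lhs), "->", sorted(rhs),
          f"support={sup} confidence={conf} lift={lift}")
```

A lift above 1 (as for A → D) suggests that X and Y occur together more often than they would if they were independent. A lift below 1 suggests the opposite.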
Association Rule Generation: Apriori Algorithm
Every subset of a frequent itemset must also be a frequent itemset
TID   Items
100   1, 3, 4
200   2, 3, 5
300   1, 2, 3, 5
400   2, 5
500   1, 3, 5
Apriori Algorithm: Step 01
A list of single-item itemsets is made and their support values are calculated:
CI1 (candidate itemsets)
Itemset  Support
{1}      3
{2}      3
{3}      4
{4}      1
{5}      4
FI1 (frequent itemsets)
Itemset  Support
{1}      3
{2}      3
{3}      4
{5}      4
Since the support threshold is 2, any itemset with support less than 2 is omitted.
Apriori Algorithm: Step 02
CI2
Itemset  Support
{1,2}    1
{1,3}    3
{1,5}    2
{2,3}    2
{2,5}    3
{3,5}    3
FI2
Itemset  Support
{1,3}    3
{1,5}    2
{2,3}    2
{2,5}    3
{3,5}    3
Apriori Algorithm: Step 03
CI3
Itemset
{1,2,3}
{1,2,5}
{1,3,5}
{2,3,5}
Apriori Algorithm: Step 04
Split each candidate itemset in CI3 into its two-item subsets and check whether all of them appear in FI2. A candidate with a subset that is not in FI2 cannot be frequent and is dropped.
CI3
Itemset   Two-item subsets       All in FI2?
{1,2,3}   {1,2}, {1,3}, {2,3}    No
{1,2,5}   {1,2}, {1,5}, {2,5}    No
{1,3,5}   {1,5}, {1,3}, {3,5}    Yes
{2,3,5}   {2,3}, {2,5}, {3,5}    Yes
FI2
Itemset  Support
{1,3}    3
{1,5}    2
{2,3}    2
{2,5}    3
{3,5}    3
Apriori Algorithm: Step 05
FI3
Itemset  Support
{1,3,5}  2
{2,3,5}  2
CI4
Itemset    Support
{1,2,3,5}  1
Since the support of the only itemset in CI4 is less than 2, you stop and return to the previous list of frequent itemsets, FI3.
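Steps 01 to 05 can be sketched in a few lines of Python. This is an illustrative implementation, not the course's lab code: the names `support_count`, `apriori`, and `MIN_SUPPORT` are assumptions, and support is an absolute transaction count, as in the walkthrough.

```python
from itertools import combinations

transactions = {
    100: {1, 3, 4},
    200: {2, 3, 5},
    300: {1, 2, 3, 5},
    400: {2, 5},
    500: {1, 3, 5},
}
MIN_SUPPORT = 2  # absolute count, as in the walkthrough

def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)

def apriori():
    """Return {k: {frequent k-itemset: support}} for every level k."""
    items = sorted({i for t in transactions.values() for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent_by_size = {}
    k = 1
    while candidates:
        counted = {c: support_count(c) for c in candidates}
        frequent = {c: s for c, s in counted.items() if s >= MIN_SUPPORT}
        if not frequent:
            break
        frequent_by_size[k] = frequent
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return frequent_by_size

frequent_by_size = apriori()
for size, itemsets in frequent_by_size.items():
    print(size, {tuple(sorted(c)): s for c, s in itemsets.items()})
```

One difference from the walkthrough: this sketch drops {1,2,3,5} in the prune step, because its subset {1,2,3} is not frequent, instead of counting its support first. The resulting frequent itemsets are the same.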
Apriori Algorithm: Step 06
Now, you will proceed with subset creation with the obtained list of
frequent itemsets:
Itemset Support
{1,3,5} 2
{2,3,5} 2
Using this, you will generate all non-empty subsets for each frequent itemset:
▪ For I = {1,3,5}, subsets are {1,3}, {1,5}, {3,5}, {1}, {3}, {5}
▪ For I = {2,3,5}, subsets are {2,3}, {2,5}, {3,5}, {2}, {3}, {5}
For set {1,3,5}:
▪ Rule 4: {1} → ({1,3,5} - {1}) means 1 → 3 and 5
Confidence = support(1,3,5)/support(1) = 2/3 = 66.66% > 60%
Rule 4 is selected
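The same confidence check can be applied to every non-empty proper subset of a frequent itemset. The sketch below is illustrative: the names `rules_from` and `MIN_CONFIDENCE` are assumptions, and the rules are generated in a different order from the rule numbers used on the slides.

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
MIN_CONFIDENCE = 0.6  # the 60% threshold used in the walkthrough

def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rules_from(itemset):
    """List (lhs, rhs, confidence, selected) for each rule lhs -> itemset - lhs."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            rhs = itemset - lhs
            confidence = support_count(itemset) / support_count(lhs)
            rules.append((set(lhs), set(rhs), confidence,
                          confidence >= MIN_CONFIDENCE))
    return rules

rules = rules_from({1, 3, 5})
for lhs, rhs, confidence, selected in rules:
    status = "selected" if selected else "rejected"
    print(f"{sorted(lhs)} -> {sorted(rhs)}: {confidence:.2%} {status}")
```

For {1,3,5}, the rule {1} → {3,5} passes the threshold with confidence 2/3, matching the worked rule above. Rules whose left-hand side has support 4, such as {3} → {1,5}, reach only 50% and are rejected.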
Unassisted Practice
Collaborative Filtering
Duration: 15 mins.
Problem Statement: Consider the ratings dataset below, containing the data on: UserID, MovieID, Rating and
Timestamp. Each line of this file represents one rating of one movie by one user, and has the following format:
UserID::MovieID::Rating::Timestamp
Ratings are made on a 5 star scale with half star increments.
UserID: represents ID of the user
MovieID: represents ID of the movie
Timestamp: represents seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.
Note: This practice is not graded. It is only intended for you to apply the knowledge you have gained to solve real-
world problems.
Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that are
generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the
respective fields, and click Login.
Step 01
Code
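Declare the number of users and movies, then create a 75/25 train/test split
Code
from sklearn.model_selection import train_test_split

# df is assumed to hold the ratings with columns user_id, movie_id and rating
n_users = df.user_id.unique().shape[0]
n_movies = df.movie_id.unique().shape[0]
train_data, test_data = train_test_split(df, test_size=0.25)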
Step 03
Populate the train matrix (user_id x movie_id) with ratings such that
[user_id index, movie_id index] = given rating
Code
Populate the test matrix (user_id x movie_id) with ratings such that
[user_id index, movie_id index] = given rating
Code
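The code for the two matrix-population steps is not reproduced in this text. The following is a minimal sketch of the idea, using a small hypothetical list of (user_id, movie_id, rating) rows in place of train_data. It assumes the IDs are 1-based and contiguous; if they are not, they would first need to be mapped to 0-based indices. The helper name `to_matrix` is illustrative.

```python
import numpy as np

# Hypothetical (user_id, movie_id, rating) rows standing in for train_data;
# IDs are assumed to be 1-based and contiguous.
train_rows = [(1, 1, 4.0), (1, 3, 2.5), (2, 2, 5.0), (3, 1, 3.0), (3, 3, 4.5)]
n_users, n_movies = 3, 3

def to_matrix(rows, n_users, n_movies):
    """Build a user x movie matrix; unrated cells stay at 0."""
    matrix = np.zeros((n_users, n_movies))
    for user_id, movie_id, rating in rows:
        matrix[user_id - 1, movie_id - 1] = rating
    return matrix

train_data_matrix = to_matrix(train_rows, n_users, n_movies)
print(train_data_matrix)
```

The test matrix would be built the same way from the test rows, so that both matrices share the same shape.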
Code
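import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

# pairwise_distances returns cosine distance, so convert it to similarity
movie_similarity = 1 - pairwise_distances(train_data_matrix.T, metric='cosine')
# Predicted rating = similarity-weighted average of each user's ratings
movie_pred = train_data_matrix.dot(movie_similarity) / np.array(
    [np.abs(movie_similarity).sum(axis=1)])
movie_pred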
Key Takeaways
a. The items recommended don’t have good attributes or keywords to describe them.
c. Most users have rated a core set of popular items, though they have different tastes on that set.
The correct answer is a. The items recommended don’t have good attributes or keywords to describe them.
User-user collaborative filtering is strongly preferable when the recommended items don’t have good attributes or keywords to describe them, or when the number of items is small.
Knowledge Check 2
Which of the following is not a requirement for a successful user-user collaborative filtering system?
a. Users' tastes must either be individually stable or, if they change, change in sync with other users' tastes.
b. The domain in which you are performing collaborative filtering is scoped such that people who agree within one part of that domain generally agree within other parts of the domain.
d. Users mostly have similar tastes on a set of popular items, though they may have individually different tastes on unpopular items.
The correct answer is d. Users mostly have similar tastes on a set of popular items, though they may have individually
different tastes on unpopular items.
If the users mostly have similar tastes on a set of popular items, it calls for hybrid or item-based collaborative filtering.
Lesson-End Project Duration: 20 mins.
Problem Statement: BookRent is the largest online and offline book rental chain in India. The company charges a fixed fee and a rental fee per book per month, so it earns more when more books are rented. Since most users are returning books without renting new ones, the company wants to increase its revenue and profit.
Objective: As an ML expert, you have to build a recommendation engine so that users get book recommendations based on the behaviour of similar users. This will help ensure that users rent books that match their individual tastes.
Access: Click the Labs tab in the left side panel of the LMS. Copy or note the username and password that are
generated. Click the Launch Lab button. On the page that appears, enter the username and password in the
respective fields and click Login.