
Machine Learning

Lesson 9: Recommender Systems

© Simplilearn. All rights reserved.


Concepts Covered

Theory of Recommender Systems

Collaborative Filtering

User-Based Nearest Neighbor

Item-Based Nearest Neighbor

Cosine and Adjusted Cosine Similarity

Association Rule Mining

Apriori Algorithm
Learning Objectives

By the end of this lesson, you will be able to:

Build a recommender model using Python

Understand the mechanism of association rule mining

Demonstrate the Apriori algorithm


Recommender Systems
Topic 1: Introduction
Recommender Systems

A recommender system is an information filtering technique that provides users with recommendations for items they might be interested in.
Recommender Systems: Solution

Recommender systems act as a solution for your day-to-day choices:

Which websites will you find interesting?
Which digital camera should you buy?
Which degree and university are best for your future?
Which book should you buy for your next vacation?
Which is the best investment for supporting the education of your children?
What is the best holiday for you and your family?
Recommender Systems: Example
Success of Recommendation Systems

Prediction Perspective
• Predicts to what degree a user likes an item
• The most popular evaluation scenario in research

Interaction Perspective
• Gives users a good feeling
• Educates users about the product domain

Conversion Perspective
• Commercial situations
• Increases hit, clickthrough, and lookers-to-bookers rates

Retrieval Perspective
• Reduces search costs
• Provides correct proposals

Recommendation Perspective
• Identifies items from the long tail that users did not know existed

Different Aspects
• Success depends on domain and purpose
• No holistic evaluation scenario exists
Paradigms of Recommendation Systems

Recommender systems reduce information overload by estimating relevance.
Paradigms of Recommendation Systems (Contd.)

Personalized Recommendations:
The most relevant item is
recommended based on the user
profile and contextual parameters
Paradigms of Recommendation Systems (Contd.)

Collaborative:
“Recommends what's
popular among your
peers”
Paradigms of Recommendation Systems (Contd.)

Content-based: “Displays
items similar to what you
have liked”
Paradigms of Recommendation Systems (Contd.)

Knowledge-based:
“Recommends what fits,
based on your needs”
Paradigms of Recommendation Systems (Contd.)

Hybrid: Combinations of
various inputs and/or
composition of different
mechanisms
Recommender Systems
Topic 2: Collaborative Filtering
Collaborative Filtering

Collaborative filtering matches people with similar interests as a basis for recommendations. There are two main approaches:

User-Based Nearest Neighbor
Item-Based Nearest Neighbor
User-Based Nearest Neighbor

Consider a database of ratings by the current user, Alice, and four other users:

Item1 Item2 Item3 Item4 Item5


Alice 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 3 3 1 5 4
User4 1 5 5 2 1

Problem

Determine whether Alice will like Item 5, which she has not yet seen or rated.
Measuring User Similarity: Pearson Correlation

A measure of how strong a relationship is between two variables

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)\,(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\;\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}

Where,
𝑎, 𝑏 : users
𝑟𝑎,𝑝 : rating of user 𝑎 for item 𝑝
𝑃 : set of items, rated both by 𝑎 and 𝑏
Possible similarity values range from −1 to 1
Measuring User Similarity: Pearson Correlation

A common prediction function:

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3      sim = 0.85
User2     4      3      4      3      5      sim = 0.70
User3     3      3      1      5      4      sim = 0.00
User4     1      5      5      2      1      sim = -0.79

Making Predictions:

pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}
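As a sketch (not part of the original slides), the similarity values and the prediction above can be reproduced with NumPy. The neighborhood N is assumed here to be the two most similar users, User1 and User2:

```python
import numpy as np

# Ratings on Items 1-4 (the items co-rated with Alice); Item5 held separately.
co_rated = {
    "Alice": [5, 3, 4, 4],
    "User1": [3, 1, 2, 3],
    "User2": [4, 3, 4, 3],
    "User3": [3, 3, 1, 5],
    "User4": [1, 5, 5, 2],
}
item5 = {"User1": 3, "User2": 5, "User3": 4, "User4": 1}

def pearson(a, b):
    """Pearson correlation between two rating vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    da, db = a - a.mean(), b - b.mean()
    return (da @ db) / (np.linalg.norm(da) * np.linalg.norm(db))

sims = {u: pearson(co_rated["Alice"], r)
        for u, r in co_rated.items() if u != "Alice"}
# e.g. sims["User1"] is roughly 0.85 and sims["User4"] roughly -0.79

# Predict Alice's rating for Item5 from the two nearest neighbors.
# Each neighbor's mean is taken over all five of their ratings.
neighbors = ["User1", "User2"]
alice_mean = np.mean(co_rated["Alice"])
num = sum(sims[u] * (item5[u] - np.mean(co_rated[u] + [item5[u]]))
          for u in neighbors)
den = sum(sims[u] for u in neighbors)
pred = alice_mean + num / den
```

With this convention the prediction for Alice on Item5 comes out at roughly 4.87, so Item5 is likely to be liked.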
Item-Based Nearest Neighbor

Uses the similarity between items (and not users) to make predictions.

Example:
▪ Look for items that are similar to Item5
▪ Use Alice's ratings for these items to predict her rating for Item5

Item1 Item2 Item3 Item4 Item5

Alice 5 3 4 4 ?

User1 3 1 2 3 3

User2 4 3 4 3 5

User3 3 3 1 5 4

User4 1 5 5 2 1
The Cosine and Adjusted Cosine Similarity Measures

Cosine Similarity
• Produces better results in item-to-item filtering
• Ratings are seen as vectors in n-dimensional space
• Similarity is calculated based on the angle between the vectors

sim(a, b) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \, |\vec{b}|}

Adjusted Cosine Similarity

• Takes average user ratings into account


• Transforms the original ratings

sim(a, b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_u)\,(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_u)^2}\;\sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_u)^2}}
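As an illustrative sketch, here is how both measures could be computed for two item columns of the earlier rating matrix (Item1 vs. Item5, over the four users whose Item5 ratings are known):

```python
import numpy as np

# Rows: User1..User4; columns: Item1..Item5 (Alice is excluded because
# her Item5 rating is unknown).
R = np.array([
    [3, 1, 2, 3, 3],
    [4, 3, 4, 3, 5],
    [3, 3, 1, 5, 4],
    [1, 5, 5, 2, 1],
], dtype=float)

def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def adjusted_cosine(R, i, j):
    # Subtract each user's mean rating before comparing the two item columns.
    centered = R - R.mean(axis=1, keepdims=True)
    return cosine(centered[:, i], centered[:, j])

cos_sim = cosine(R[:, 0], R[:, 4])   # Item1 vs Item5, raw ratings
adj_sim = adjusted_cosine(R, 0, 4)   # Item1 vs Item5, mean-centered
```

Raw cosine similarity tends toward 1 because all ratings are positive; mean-centering (adjusted cosine) removes each user's rating bias and yields a more discriminative value.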
Recommender Systems
Topic 3: Association Rule Mining
Association Rule: Basic Concepts

Given a database of transactions, where each transaction is a list of items, association rule mining finds all rules that correlate the presence of one set of items with the presence of another set of items.

It identifies frequent patterns and is most commonly used for market basket analysis.
Association Rule: Performance Measures

Support
• The fraction of transactions that contain both items X and Y
• Support = (No. of transactions containing both X and Y) / (Total no. of transactions) = P(X ∩ Y)

Confidence
• Indicates how often X and Y occur together, given the number of times X occurs
• Confidence = (No. of transactions containing both X and Y) / (No. of transactions containing X) = P(Y | X)

Lift
• Indicates the strength of a rule over the random co-occurrence of X and Y
• Lift = (No. of transactions containing both X and Y) / (No. of transactions containing X × No. of transactions containing Y)
Association Rule: Example

Suppose there are five transactions P1, P2, P3, P4, P5 as given below:

P1: A, B, C
P2: A, C, D
P3: B, C, D
P4: A, D, E
P5: B, C, E

Here:
▪ A, B, C, D, E are items in a store, I = {A, B, C, D, E}
▪ The set of all transactions is P = {P1, P2, P3, P4, P5}
▪ Each transaction is a subset of the items, Pi ⊆ I
Association Rule: Example (Contd.)
Consider some association rules made from the transaction database:

A → D
C → A
A → C
B and C → D

Calculating support, confidence, and lift values (with the definitions above) results in the following matrix:

Rule           Support   Confidence   Lift
A → D           2/5         2/3       2/9
C → A           2/5         2/4       1/6
A → C           2/5         2/3       1/6
B and C → D     1/5         1/3       1/9
Association Rule Generation: Apriori Algorithm

Uses frequent itemsets to generate association rules.

▪ A frequent itemset is a set of items whose support value exceeds a threshold value
▪ Apriori property: any subset of a frequent itemset must also be a frequent itemset
Apriori Algorithm: Example

Consider the following transaction dataset:

TID    Items
100    1, 3, 4
200    2, 3, 5
300    1, 2, 3, 5
400    2, 5
500    1, 3, 5

Minimum support count (threshold value) = 2
Apriori Algorithm: Step 01

A list of one-item candidate itemsets (CI1) is made and their support values are calculated; the itemsets that meet the minimum support form the frequent set FI1:

CI1                     FI1
Itemset   Support       Itemset   Support
{1}       3             {1}       3
{2}       3             {2}       3
{3}       4             {3}       4
{4}       1             {5}       4
{5}       4

Since the threshold value is 2, any itemset with support less than 2 is omitted, so {4} is dropped.
Apriori Algorithm: Step 02

The length of the itemsets is extended by 1 (k = k + 1):

CI2                     FI2
Itemset   Support       Itemset   Support
{1,2}     1             {1,3}     3
{1,3}     3             {1,5}     2
{1,5}     2             {2,3}     2
{2,3}     2             {2,5}     3
{2,5}     3             {3,5}     3
{3,5}     3

{1,2} has support 1, which is below the threshold, so it is omitted from FI2.
Apriori Algorithm: Step 03

The length of the itemsets is extended again. All three-item combinations of the itemsets in FI2 are used:

CI3
Itemset
{1,2,3}
{1,2,5}
{1,3,5}
{2,3,5}
Apriori Algorithm: Step 04

Check the subsets of each candidate itemset: a candidate can be frequent only if all of its two-item subsets already appear in FI2.

CI3
Itemset    Two-item subsets       In FI2?
{1,2,3}    {1,2}, {1,3}, {2,3}    No ({1,2} is not frequent)
{1,2,5}    {1,2}, {1,5}, {2,5}    No ({1,2} is not frequent)
{1,3,5}    {1,3}, {1,5}, {3,5}    Yes
{2,3,5}    {2,3}, {2,5}, {3,5}    Yes

FI2
Itemset   Support
{1,3}     3
{1,5}     2
{2,3}     2
{2,5}     3
{3,5}     3
Apriori Algorithm: Step 05

Using the surviving itemsets of CI3, the frequent set FI3 and a new candidate set CI4 are created:

FI3                     CI4
Itemset   Support       Itemset     Support
{1,3,5}   2             {1,2,3,5}   1
{2,3,5}   2

Since the support of the CI4 itemset is less than 2, you stop and return to the previous frequent itemsets, FI3.
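The join, prune, and count steps above can be sketched as a compact Apriori implementation (a minimal sketch, not production code):

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
min_support = 2

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

# CI1 / FI1: frequent single items
items = {i for t in transactions for i in t}
level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
frequent_levels = [level]

k = 1
while level:
    # Join step: build (k+1)-item candidates from the frequent k-itemsets
    candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
    # Prune step: every k-subset of a candidate must itself be frequent
    candidates = {c for c in candidates
                  if all(frozenset(s) in level for s in combinations(c, k))}
    level = {c for c in candidates if support(c) >= min_support}
    if level:
        frequent_levels.append(level)
    k += 1

# frequent_levels[2] holds {1,3,5} and {2,3,5}, matching FI3 above
```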
Apriori Algorithm: Step 06

Now, you will proceed with subset creation using the obtained list of frequent itemsets:

Itemset   Support
{1,3,5}   2
{2,3,5}   2

Minimum confidence value = 60%

Using this, you will generate all non-empty proper subsets for each frequent itemset:
▪ For I = {1,3,5}, the subsets are {1,3}, {1,5}, {3,5}, {1}, {3}, {5}
▪ For I = {2,3,5}, the subsets are {2,3}, {2,5}, {3,5}, {2}, {3}, {5}

For every subset S of I, output the rule:
▪ S → (I − S) (S recommends I − S)
▪ if support(I) / support(S) >= min_conf
Apriori Algorithm: Rule Selection

Based on the minimum confidence threshold, only some rules are selected.

For set {1,3,5}:

▪ Rule 1: {1,3} → ({1,3,5} − {1,3}), i.e., 1 and 3 → 5
  Confidence = support(1,3,5) / support(1,3) = 2/3 = 66.66% > 60%
  Rule 1 is selected

▪ Rule 2: {1,5} → ({1,3,5} − {1,5}), i.e., 1 and 5 → 3
  Confidence = support(1,3,5) / support(1,5) = 2/2 = 100% > 60%
  Rule 2 is selected

▪ Rule 3: {3,5} → ({1,3,5} − {3,5}), i.e., 3 and 5 → 1
  Confidence = support(1,3,5) / support(3,5) = 2/3 = 66.66% > 60%
  Rule 3 is selected
Apriori Algorithm: Rule Selection (Contd.)

For set {1,3,5} (contd.):

▪ Rule 4: {1} → ({1,3,5} − {1}), i.e., 1 → 3 and 5
  Confidence = support(1,3,5) / support(1) = 2/3 = 66.66% > 60%
  Rule 4 is selected

▪ Rule 5: {3} → ({1,3,5} − {3}), i.e., 3 → 1 and 5
  Confidence = support(1,3,5) / support(3) = 2/4 = 50% < 60%
  Rule 5 is rejected

▪ Rule 6: {5} → ({1,3,5} − {5}), i.e., 5 → 1 and 3
  Confidence = support(1,3,5) / support(5) = 2/4 = 50% < 60%
  Rule 6 is rejected
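The rule selection above can be sketched programmatically: for each frequent itemset I and each non-empty proper subset S, the rule S → (I − S) is kept when support(I) / support(S) meets the 60% minimum confidence. Note that this sketch also evaluates the rules from {2,3,5}, which the slides do not walk through:

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
min_conf = 0.60
frequent_itemsets = [frozenset({1, 3, 5}), frozenset({2, 3, 5})]  # FI3 from above

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

rules = []
for I in frequent_itemsets:
    for r in range(1, len(I)):                      # proper subset sizes
        for S in map(frozenset, combinations(sorted(I), r)):
            conf = support(I) / support(S)
            if conf >= min_conf:
                rules.append((set(S), set(I - S), round(conf, 2)))

# Rules 1-4 for {1,3,5} are selected; {3} -> {1,5} and {5} -> {1,3} are rejected.
```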
Assisted Practice
Collaborative Filtering Duration: 20 mins.

Problem Statement: Consider the ratings dataset below, containing the data on: UserID, MovieID,
Rating and Timestamp. Each line of this file represents one rating of one movie by one user, and has
the following format: UserID::MovieID::Rating::Timestamp
Ratings are made on a 5-star scale with half-star increments.
UserID: represents ID of the user
MovieID: represents ID of the movie
Timestamp: represents seconds from midnight Coordinated Universal Time (UTC) of January 1, 1970.

Objective: Build a user-movie recommendation model.

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and
password that are generated. Click on the Launch Lab button. On the page that appears, enter the
username and password in the respective fields, and click Login.
Unassisted Practice
Collaborative Filtering Duration: 15 mins.

Problem Statement: Consider the ratings dataset below, containing the data on: UserID, MovieID, Rating and
Timestamp. Each line of this file represents one rating of one movie by one user, and has the following format:
UserID::MovieID::Rating::Timestamp
Ratings are made on a 5-star scale with half-star increments.
UserID: represents ID of the user
MovieID: represents ID of the movie
Timestamp: represents seconds from midnight Coordinated Universal Time (UTC) of January 1, 1970.

Objective: Build a movie-movie recommendation model.

Note: This practice is not graded. It is only intended for you to apply the knowledge you have gained to solve real-
world problems.

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that are
generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the
respective fields, and click Login.
Step 01

Load the ‘Ratings’ movie dataset into pandas with labels

Code

import pandas as pd

df = pd.read_csv('Recommend.csv',
                 names=['user_id', 'movie_id', 'rating', 'timestamp'])
df
Step 02

Create a 75/25 train-test split and record the number of users and movies

Code

from sklearn.model_selection import train_test_split

n_users = df.user_id.unique().shape[0]
n_movies = df.movie_id.unique().shape[0]
train_data, test_data = train_test_split(df, test_size=0.25)
Step 03

Populate the train matrix (user_id x movie_id) with ratings such that
[user_id index, movie_id index] = given rating

Code

import numpy as np

# Assumes user and movie IDs are contiguous integers starting at 1.
train_data_matrix = np.zeros((n_users, n_movies))
for line in train_data.itertuples():
    # [user_id index, movie_id index] = given rating
    train_data_matrix[line[1]-1, line[2]-1] = line[3]
train_data_matrix
Step 04

Populate the test matrix (user_id x movie_id) with ratings such that
[user_id index, movie_id index] = given rating

Code

test_data_matrix = np.zeros((n_users, n_movies))
for line in test_data.itertuples():
    # [user_id index, movie_id index] = given rating
    test_data_matrix[line[1]-1, line[2]-1] = line[3]
test_data_matrix
Step 05

Create cosine similarity matrices for movies and predict a movie-movie


recommendation model

Code

from sklearn.metrics import pairwise_distances

# pairwise_distances returns cosine *distances*, so convert to similarities.
movie_similarity = 1 - pairwise_distances(train_data_matrix.T, metric='cosine')
movie_pred = train_data_matrix.dot(movie_similarity) / \
             np.array([np.abs(movie_similarity).sum(axis=1)])
movie_pred
Key Takeaways

Now, you are able to:

Build a recommender model using Python

Understand the mechanism of association rule mining

Demonstrate the Apriori algorithm


Knowledge
Check



Knowledge
Check Which of the following would most indicate a situation where user-user collaborative
filtering would be strongly preferable to content-based filtering?
1

a. The items recommended don’t have good attributes or keywords to describe them.

b. Only implicit ratings are available.

Most users have rated a core set of popular items, though they have different tastes on that
c. set.

d. There are lots of items to recommend and relatively few users.


Knowledge
Check Which of the following would most indicate a situation where user-user collaborative
filtering would be strongly preferable to content-based filtering?
1

a. The items recommended don’t have good attributes or keywords to describe them.

b. Only implicit ratings are available.

Most users have rated a core set of popular items, though they have different tastes
c. on that set.

d. There are lots of items to recommend and relatively few users.

The correct answer is a. The items recommended don’t have good attributes or keywords to describe them.
User-user collaborative filtering is strongly preferable when the items recommended don’t have good attributes or
keywords to describe them, or when there are relatively few items.
Knowledge
Check Which of the following is not a requirement for a successful user-user collaborative
filtering system?
2

a. Users' tastes must either be stable or changing; if changing, they change in sync
with other users' tastes.

b. The domain in which you are performing collaborative filtering is scoped such that people
who agree within one part of that domain generally agree within other parts of the domain.

c. Past agreement between users is predictive of future agreement.

d. Users mostly have similar tastes on a set of popular items, though they may have
individually different tastes on unpopular items.
Knowledge
Check Which of the following is not a requirement for a successful user-user collaborative
filtering system?
2

a. Users' tastes must either be stable or changing; if changing, they change in sync
with other users' tastes.

b. The domain in which you are performing collaborative filtering is scoped such that people
who agree within one part of that domain generally agree within other parts of the domain.

c. Past agreement between users is predictive of future agreement.

d. Users mostly have similar tastes on a set of popular items, though they may have
individually different tastes on unpopular items.

The correct answer is d. Users mostly have similar tastes on a set of popular items, though they may have individually
different tastes on unpopular items.
If the users mostly have similar tastes on a set of popular items, it calls for hybrid or item-based collaborative filtering.
Lesson-End Project Duration: 20 mins.

Problem Statement: BookRent is the largest online and offline book rental chain in India. The company
charges a fixed fee plus a rental fee for each book per month, so the company earns more from rented books.
Since most users return their books rather than keeping rentals, the company wants to increase its revenue
and profit.

Objective: As an ML expert, you have to model a recommendation engine so that users get recommendations
of books based on the behaviour of similar users. This will ensure that users rent books based on their
individual tastes.

Access: Click the Labs tab in the left side panel of the LMS. Copy or note the username and password that are
generated. Click the Launch Lab button. On the page that appears, enter the username and password in the
respective fields and click Login.
Thank You
