CONTENT-BASED RECOMMENDER SYSTEMS
Systems implementing a content-based recommendation approach analyze a set of documents
and/or descriptions of items previously rated by a user, and build a model or profile of user
interests based on the features of the objects rated by that user.
The profile is a structured representation of user interests, which is used to recommend new
interesting items. The recommendation process basically consists of matching the attributes
of the user profile against the attributes of a content object. The result is a relevance
judgment that represents the user's level of interest in that object.
If a profile accurately reflects user preferences, it is of tremendous advantage for the
effectiveness of an information access process. For instance, it could be used to filter search
results by deciding whether a user is interested in a specific Web page or not and, in the
negative case, preventing it from being displayed.
High Level Architecture of Content-based Systems
Content-based Information Filtering (IF) systems need proper techniques for representing the
items and producing the user profile, and some strategies for comparing the user profile with
the item representation. The high-level architecture of a content-based recommender system
is depicted in the following figure. The recommendation process is performed in three steps,
each of which is handled by a separate component:
CONTENT ANALYZER – When information has no structure (e.g., text), some kind of
preprocessing step is needed to extract structured relevant information. The main responsibility
of the component is to represent the content of items (e.g. documents, Web pages, news,
product descriptions, etc.) coming from information sources in a form suitable for the next
processing steps. Data items are analyzed by feature extraction techniques in order to shift
item representation from the original information space to the target one (e.g., Web pages
represented as keyword vectors). This representation is the input to the PROFILE LEARNER
and FILTERING COMPONENT;
PROFILE LEARNER – This module collects data representative of the user preferences
and tries to generalize this data, in order to construct the user profile. Usually, the
generalization strategy is realized through machine learning techniques, which are able to
infer a model of user interests starting from items liked or disliked in the past. For instance,
the PROFILE LEARNER of a Web page recommender can implement a relevance feedback
method in which the learning technique combines vectors of positive and negative examples
into a prototype vector representing the user profile. Training examples are Web pages on
which a positive or negative feedback has been provided by the user;
FILTERING COMPONENT – This module exploits the user profile to suggest relevant
items by matching the profile representation against that of items to be recommended. The
result is a binary or continuous relevance judgment (computed using some similarity
metrics), the latter case resulting in a ranked list of potentially interesting items. In the above
mentioned example, the matching is realized by computing the cosine similarity between the
prototype vector and the item vectors.
The first step of the recommendation process is performed by the CONTENT ANALYZER,
which usually borrows techniques from Information Retrieval systems. Item descriptions
coming from the Information Source are processed by the CONTENT ANALYZER, which
extracts features (keywords, n-grams, concepts, . . .) from unstructured text to produce a
structured item representation, stored in the repository Represented Items.
In order to construct and update the profile of the active user ua (the user for whom
recommendations must be provided), her reactions to items are collected in some way and
recorded in the repository Feedback. These reactions, called annotations or feedback,
together with the related item descriptions, are exploited during the process of learning a
model useful to predict the actual relevance of newly presented items. Users can also
explicitly define their areas of interest as an initial profile without providing any feedback.
Advantages and Drawbacks of Content-based Filtering
Advantages :
The adoption of the content-based recommendation paradigm has several advantages when
compared to the collaborative one:
USER INDEPENDENCE - Content-based recommenders exploit solely ratings provided by
the active user to build her own profile. Instead, collaborative filtering methods need ratings
from other users in order to find the “nearest neighbours” of the active user, i.e., users that
have similar tastes since they rated the same items similarly. Then, only the items that are
most liked by the neighbours of the active user will be recommended;
TRANSPARENCY - Explanations on how the recommender system works can be provided
by explicitly listing content features or descriptions that caused an item to occur in the list of
recommendations. Those features are indicators to consult in order to decide whether to trust
a recommendation. Conversely, collaborative systems are black boxes since the only
explanation for an item recommendation is that unknown users with similar tastes liked that
item;
NEW ITEM - Content-based recommenders are capable of recommending items not yet
rated by any user. As a consequence, they do not suffer from the first-rater problem, which
affects collaborative recommenders that rely solely on users' preferences to make
recommendations: until a new item has been rated by a substantial number of users, a
collaborative system is not able to recommend it.
Drawbacks:
LIMITED CONTENT ANALYSIS - Content-based techniques have a natural limit in the
number and type of features that can be associated, automatically or manually, with the items
they recommend. Domain knowledge is often needed; for example, for movie
recommendations the system needs to know the actors and directors, and sometimes domain
ontologies are also needed.
No content-based recommendation system can provide suitable suggestions if the analysed
content does not contain enough information to discriminate items the user likes from items
the user does not like.
Some representations capture only certain aspects of the content, but there are many others
that would influence a user’s experience.
For instance, often there is not enough information in the word frequency to model the user
interests in jokes or poems, while techniques for affective computing would be most
appropriate. Again, for Web pages, feature extraction techniques from text completely ignore
aesthetic qualities and additional multimedia information.
To sum up, neither automatic nor manual assignment of features to items may be sufficient
to define the distinguishing aspects of items that turn out to be necessary for the elicitation of
user interests.
OVER-SPECIALIZATION - Content-based recommenders have no inherent method for
finding something unexpected. The system suggests items whose scores are high when
matched against the user profile, hence the user is going to be recommended items similar to
those already rated. This drawback is also called serendipity problem to highlight the
tendency of the content-based systems to produce recommendations with a limited degree of
novelty. To give an example, when a user has only rated movies directed by Stanley Kubrick,
she will be recommended only that kind of movie. A “perfect” content-based technique
would rarely find anything novel, limiting the range of applications for which it would be
useful.
NEW USER - Enough ratings have to be collected before a content-based recommender
system can really understand user preferences and provide accurate recommendations.
Therefore, when few ratings are available, as for a new user, the system will not be able to
provide reliable recommendations.
CONTENT REPRESENTATION AND CONTENT SIMILARITY
The simplest way to describe catalogue items is to maintain an explicit list of features for
each item (also often called attributes, characteristics, or item profiles).
Book Knowledge Base
Vector-Space Model :
Term Frequency and Inverse Document Frequency:
• Simple keyword representation has its problems
• in particular when automatically extracted:
• not every word has similar importance
• longer documents have a higher chance to have an overlap with the
user profile
• Standard measure: TF-IDF
• Encodes text documents in multi-dimensional Euclidean space
• weighted term vector
• TF: measures how often a term appears in a document (term density)
• assuming that important terms appear more often
• normalization has to be done in order to take document length into
account
• IDF: Aims to reduce the weight of terms that appear in all documents
TF-IDF calculation:
Term frequency describes how often a certain term appears in a document (assuming that
important words appear more often).
We search for the normalized term frequency value TF(i, j) of keyword i in document j. Let
freq(i, j) be the absolute number of occurrences of i in j. Given a keyword i, let
OtherKeywords(i, j) denote the set of the other keywords appearing in j, and compute the
maximum frequency maxOthers(i, j) as max(freq(z, j)), z ∈ OtherKeywords(i, j). Finally,
calculate TF(i, j) as

TF(i, j) = freq(i, j) / maxOthers(i, j)
Inverse document frequency is the second measure that is combined with term frequency. It
aims at reducing the weight of keywords that appear very often in all documents. The idea is
that those generally frequent words are not very helpful to discriminate among documents,
and more weight should therefore be given to words that appear in only a few documents.
Let N be the number of all recommendable documents and n(i) be the number of documents
from N in which keyword i appears. The inverse document frequency for i is typically
calculated as

IDF(i) = log(N / n(i))
The combined TF-IDF weight for a keyword i in document j is computed as the product of
these two measures:

TF-IDF(i, j) = TF(i, j) · IDF(i)
In the TF-IDF model, the document is, therefore, represented not as a vector of Boolean
values for each keyword but as a vector of the computed TF-IDF weights.
Example TF-IDF representation:
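The example table itself is not reproduced here. As a stand-in, the following minimal Python
sketch computes the normalized TF, the IDF, and the combined TF-IDF weight exactly as
defined above; the three-document corpus and its keywords are purely illustrative.

import math
from collections import Counter

# Hypothetical mini-corpus: each document is a list of extracted keywords.
docs = {
    "doc1": ["recommender", "systems", "use", "content", "features"],
    "doc2": ["collaborative", "filtering", "uses", "ratings"],
    "doc3": ["content", "features", "describe", "items", "content"],
}

def tf(keyword, doc):
    # Normalized term frequency: freq(i, j) / maxOthers(i, j)
    counts = Counter(doc)
    others = [counts[z] for z in counts if z != keyword]
    return counts[keyword] / (max(others) if others else 1)

def idf(keyword, all_docs):
    # Inverse document frequency: log(N / n(i))
    n_i = sum(1 for d in all_docs.values() if keyword in d)
    return math.log(len(all_docs) / n_i) if n_i else 0.0

def tf_idf(keyword, doc_id, all_docs):
    return tf(keyword, all_docs[doc_id]) * idf(keyword, all_docs)

print(round(tf_idf("content", "doc3", docs), 3))   # 2.0 * log(3/2) ~= 0.811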
Improving the vector space model:
i. Stop words and stemming:
A straightforward method is to remove so-called stop words. In the English
language these are, for instance, prepositions and articles such as “a”, “the”, or
“on”, which can be removed from the document vectors because they will appear
in nearly all documents.
Another commonly used technique is called stemming or conflation, which aims
to replace variants of the same word by their common stem (root word). The word
“stemming” would, for instance, be replaced by “stem”, “went” by “go”, and so
forth (a small preprocessing sketch follows this list).
ii. Size cutoffs:
Another straightforward method to reduce the size of the document representation
and hopefully remove “noise” from the data is to use only the n most informative
words.
iii. Phrases:
A further possible improvement with respect to representation accuracy is to use
“phrases as terms”, which are more descriptive for a text than single words alone.
Phrases, or composed words such as “United Nations”, can be encoded as
additional dimensions in the vector space.
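The following is a minimal sketch of the stop-word removal and stemming step from item i
above. The tiny stop-word list and the naive suffix-stripping rules are illustrative only; a real
system would use a complete stop-word list and a proper stemmer such as the Porter stemmer.

# Illustrative stop-word list; real lists contain a few hundred words.
STOP_WORDS = {"a", "an", "the", "on", "in", "of", "and", "to"}

def naive_stem(word):
    # Crude stand-in for a real stemmer: strips a few common suffixes.
    for suffix in ("ming", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def preprocess(text):
    tokens = [t.lower() for t in text.split()]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]

print(preprocess("The stemming of words on a page"))   # ['stem', 'word', 'page']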
Limitations:
The described approach of extracting and weighting individual keywords from
the text has another important limitation: it does not take into account the context
of the keyword and, in some cases, may not capture the “meaning” of the
description correctly.
SIMILARITY-BASED RETRIEVAL
Whereas the item selection problem in collaborative filtering can be described as “recommend
items that similar users liked”, content-based recommendation is commonly described as
“recommend items that are similar to those the user liked in the past”.
i. Nearest neighbors:
The prediction for a not-yet-seen item d is based on letting the k most similar items for
which a rating exists “vote” for d. If, for instance, four out of k = 5 of the most similar
items were liked by the current user, the system may guess that the chance that d will also
be liked is relatively high. Besides varying the neighborhood size k, several other
variations are possible, such as binarization of ratings, using a minimum similarity
threshold, or weighting of the votes based on the degree of similarity.
The kNN method has been implemented, for example, as part of a multi-strategy user-profiling
technique in which the system maintains profiles of short-term (ephemeral) and long-term
interests. The short-term profile, as described earlier, allows the system to provide the user
with information on topics of recent interest. The long-term model collects information over a
longer period of time (e.g., several months) and also seeks to identify the most informative
words in the documents by determining the terms that consistently receive high TF-IDF scores
in a larger document collection.
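A minimal sketch of this kNN voting scheme, assuming the rated items and the not-yet-seen
item are represented as sparse TF-IDF keyword vectors stored in Python dicts and compared
with cosine similarity; the data structures and function names are illustrative, not taken from a
particular library.

import math

def cosine(a, b):
    # Cosine similarity between two sparse keyword-weight vectors (dicts).
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_like_probability(unseen, rated_items, k=5):
    # rated_items: list of (vector, liked) pairs rated by the current user.
    # The k most similar rated items "vote"; the result is the fraction of
    # "like" votes, i.e., the estimated chance that the unseen item is liked.
    if not rated_items:
        return 0.0
    neighbours = sorted(rated_items,
                        key=lambda pair: cosine(unseen, pair[0]),
                        reverse=True)[:k]
    return sum(1 for _, liked in neighbours if liked) / len(neighbours)

Variations such as binarized ratings, a minimum similarity threshold, or similarity-weighted
votes can be realized by filtering or reweighting the neighbours list.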
ii. Relevance feedback – Rocchio’s method:
Another method that is based on the vector-space model and was developed in the context
of the pioneering information retrieval (IR) system in the late 1960s is Rocchio’s relevance
feedback method.
The relevance feedback loop used in this method helps the system improve and
automatically extend the query as follows. The main idea is to first split the already rated
documents into two groups, D+ and D−, of liked (interesting/relevant) and disliked
documents and to calculate a prototype (or average) vector for each of these categories. This
prototype can also be seen as a sort of centroid of a cluster for the relevant and nonrelevant
document sets. The current query Qi, which is represented as a multidimensional term
vector just like the documents, is then repeatedly refined to Qi+1 by a weighted addition
of the prototype vector of the relevant documents and a weighted subtraction of the vector
representing the nonrelevant documents. As an effect, the query vector should consistently
move toward the set of relevant documents, as depicted schematically in the following
figure.
The proposed formula for computing the modified query Qi+1 from Qi is defined as
follows:

Qi+1 = α · Qi + β · (1/|D+|) · Σ_{d+ ∈ D+} d+ − γ · (1/|D−|) · Σ_{d− ∈ D−} d−
The variables α, β, and γ are used to fine-tune the behavior of the “move” toward the more
relevant documents. The value of α describes how strongly the last (or original) query should
be weighted, and β and γ correspondingly capture how strongly positive and negative
feedback should be taken into account in the improvement step.
Average vectors for relevant and nonrelevant documents.
Relevance feedback: after feedback, the original query is moved toward the cluster of the
relevant documents.
Overall, the relevance feedback retrieval method and its variations are used in many
application domains. It has been shown that the method, despite its simplicity, can lead to
good retrieval improvements in real-world settings.
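The refinement step above can be sketched in a few lines of Python. Vectors are represented
as dicts of term weights; the default values for α, β, and γ below are common illustrative
choices rather than values prescribed by the method.

def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    # One refinement step: Q_{i+1} = alpha*Q_i + beta*centroid(D+) - gamma*centroid(D-)
    def centroid(docs):
        c = {}
        for d in docs:
            for term, weight in d.items():
                c[term] = c.get(term, 0.0) + weight / len(docs)
        return c
    pos, neg = centroid(relevant), centroid(nonrelevant)
    terms = set(query) | set(pos) | set(neg)
    return {t: alpha * query.get(t, 0.0)
               + beta * pos.get(t, 0.0)
               - gamma * neg.get(t, 0.0)
            for t in terms}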
Other text classification methods
Another way of deciding whether or not a document will be of interest to a user is to view the
problem as a classification task, in which the possible classes are “like” and “dislike”. Once
the content-based recommendation task has been formulated as a classification problem,
various standard (supervised) machine learning techniques can, in principle, be applied such
that an intelligent system can automatically decide whether a user will be interested in a
certain document. Supervised learning means that the algorithm relies on the existence
of training data, in our case a set of (manually labelled) document-class pairs.
i. Probabilistic methods :
The most prominent classification methods developed in early text classification systems are
probabilistic ones. These approaches are based on the naive Bayes assumption of conditional
independence (with respect to term occurrences) and have also been successfully deployed in
content-based recommenders.
Classification based on Boolean feature vector
The basic formula to compute the posterior probability for document classification is Bayes’
theorem:

P(Label | X) = P(X | Label) · P(Label) / P(X)

where, under the conditional independence assumption, the likelihood factorizes into a
product over the individual features: P(X | Label) = ∏ P(xi | Label).
Calculation:
To determine the correct class, we can compute the class-conditional probabilities for the
feature vector X of Document 6 again as follows:
P(X|Label=1) = P(recommender=1|Label=1) ×
P(intelligent=1|Label=1) ×
P(learning=0|Label=1) × P(school=0|Label=1)
= 3/3 × 2/3 × 1/3 × 2/3
≈ 0.148
The same can be done for the case Label = 0.
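The following short Python sketch reproduces this calculation. The four class-conditional
probabilities are taken directly from the example above (the underlying training table is not
reproduced in this text), and the product reflects the conditional independence assumption.

# Class-conditional probabilities for the feature vector X of Document 6,
# as given in the worked example (training table not reproduced here).
factors = [3/3,   # P(recommender=1 | Label=1)
           2/3,   # P(intelligent=1 | Label=1)
           1/3,   # P(learning=0   | Label=1)
           2/3]   # P(school=0     | Label=1)

p_x_given_like = 1.0
for p in factors:          # naive Bayes: the likelihood factorizes
    p_x_given_like *= p

print(round(p_x_given_like, 3))   # 0.148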
There are three advantages of this Bayesian classifier:
i. good accuracy;
ii. the components of the classifier can be easily updated when new data are available;
iii. the learning time complexity remains linear in the number of examples.
Other linear classifiers and machine learning
When viewing the content-based recommendation problem as a classification problem,
various other machine learning techniques can be employed. At a more abstract level, most
learning methods aim to find coefficients of a linear model to discriminate between relevant
and nonrelevant documents.
The following figure sketches the basic idea in a simplified setting in which the available
documents are characterized by only two dimensions. If there are only two dimensions, the
classifier can be represented by a line. The idea can, however, also easily be generalized to
the multidimensional space in which a two-class classifier then corresponds to a hyperplane
that represents the decision boundary.
A linear classifier in two-dimensional space.
In two-dimensional space, the line that we search for has the form w1x1 + w2x2 = b, where x1
and x2 correspond to the vector representation of a document (using, e.g., TF-IDF weights)
and w1, w2, and b are the parameters to be learned.
The classification of an individual document is based on checking whether, for a certain
document, w1x1 + w2x2 > b, which can be done very efficiently. In n-dimensional space, a
generalized equation using weight and feature vectors instead of only two values is used, so
the classification function becomes

w · x > b

where w is the learned weight vector and x is the feature vector of the document.
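A minimal sketch of this decision rule, with illustrative (hand-picked rather than learned)
parameter values:

import numpy as np

def classify(w, b, x):
    # Linear decision rule: "relevant" iff w . x > b, the n-dimensional
    # generalization of w1*x1 + w2*x2 > b.
    return np.dot(w, x) > b

w = np.array([0.8, 0.3])          # term weights (illustrative, not learned here)
b = 0.5                           # decision threshold (illustrative)
doc = np.array([0.9, 0.1])        # TF-IDF weights of a document
print(classify(w, b, doc))        # True -> document predicted as relevant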
COMPARE AND CONTRAST BETWEEN COLLABORATIVE AND
CONTENT-BASED RECOMMENDER SYSTEMS
Basis of Recommendation: Collaborative filtering is based on user-user or item-item
similarities computed from ratings/behavior; content-based filtering is based on item features
and user preferences.
Requires Item Metadata: Collaborative filtering does not; content-based filtering does.
User Profile: In collaborative filtering the profile is implicitly learned from user behavior
(e.g., rating history); in content-based filtering it is explicitly created from the features of
items the user interacted with.
Cold Start Problem (New User): Collaborative filtering finds it difficult to recommend for
new users; for content-based filtering the problem is less severe if item features are known.
Cold Start Problem (New Item): Collaborative filtering cannot recommend new items with no
interaction data; content-based filtering can, as long as item features are available.
Sparsity Issue: Collaborative filtering often suffers from sparse user-item matrices;
content-based filtering is less affected, as recommendations are based on content.
Scalability: Collaborative filtering may not scale well for large datasets; content-based
filtering is easier to scale using item features.
Personalization: Collaborative filtering is highly personalized through peer user behavior;
content-based filtering is highly personalized through the individual user's preferences.
Serendipity: Collaborative filtering is high, as it can recommend unexpected but relevant
items; content-based filtering may be lower, as it tends to recommend similar items.
Explainability: Collaborative filtering is hard to explain ("people like you liked this");
content-based filtering is easier to explain ("recommended because it has features you liked").
Example Use Case: Collaborative filtering powers MovieLens/Netflix recommendations based
on other users' ratings; content-based filtering powers Amazon's item suggestions based on
product features you viewed or liked.
KNOWLEDGE BASED RECOMMENDER SYSTEMS
Knowledge-based recommender systems help us to tackle the challenges imposed by both
collaborative and content-based recommender systems. The advantage of these systems is
that no ramp-up problems exist, because no rating data are needed for the calculation of
recommendations. Recommendations are calculated independently of individual user ratings:
either in the form of similarities between customer requirements and items or on the basis of
explicit recommendation rules.
Two basic types of knowledge-based recommender systems are constraint-based and case-
based systems. Both approaches are similar in terms of the recommendation process: the user
must specify the requirements, and the system tries to identify a solution. If no solution can
be found, the user must change the requirements. The system may also provide explanations
for the recommended items. These recommenders, however, differ in the way they use the
provided knowledge: case-based recommenders focus on the retrieval of similar items on the
basis of different types of similarity measures, whereas constraint-based recommenders rely
on an explicitly defined set of recommendation rules. In constraint-based systems, the set of
recommended items is determined by, for instance, searching for a set of items that fulfil the
recommendation rules. Case-based systems, on the other hand, use similarity metrics to
retrieve items that are similar (within a predefined threshold) to the specified customer
requirements.
Knowledge representation and reasoning
Knowledge-based systems rely on detailed knowledge about item characteristics. An example
for knowledge representation of a given product is given in the following table:
Example product assortment: digital cameras
The recommendation problem consists of selecting items from this catalog that match the
user’s needs, preferences, or hard requirements. The user’s requirements can, for instance, be
expressed in terms of desired values or value ranges for an item feature, such as “the price
should be lower than 300 €” or in terms of desired functionality, such as “the camera should
be suited for sports photography”.
CONSTRAINT BASED APPROACH
A classical constraint satisfaction problem (CSP) can be described by a tuple
(V, D, C) where
V is a set of variables,
D is a set of finite domains for these variables, and
C is a set of constraints that describes the combinations of values the variables
can simultaneously take.
A solution to a CSP corresponds to an assignment of a value to each variable
in V in a way that all constraints are satisfied.
Constraint-based recommender systems can build on this formalism and exploit a
recommender knowledge base that typically includes two different sets of variables (V = VC ∪
VPROD), one describing potential customer requirements and the other describing product
properties. Three different sets of constraints (C = CR ∪ CF ∪ CPROD) define which items
should be recommended to a customer in which situation. Examples of such variables and
constraints for a digital camera recommender are shown in the following table:
Example recommendation task (VC, VPROD, CR, CF, CPROD, REQ) and the corresponding
recommendation result (RES)
Customer properties (VC) describe the possible customer requirements. The customer
property max-price denotes the maximum price acceptable for the customer, the property
usage denotes the planned usage of photos (print versus digital organization), and
photography denotes the predominant type of photos to be taken; categories are, for example,
sports or portrait photos.
Product properties (VPROD) describe the properties of products in an assortment; for example,
mpix denotes possible resolutions of a digital camera.
Compatibility constraints (CR) define allowed instantiations of customer properties – for
example, if large-size photoprints are required, the maximal accepted price must be higher
than 200.
Filter conditions (CF) define under which conditions which products should be selected – in
other words, filter conditions define the relationships between customer properties and
product properties. An example filter condition is large-size photoprints require resolutions
greater than 5 mpix.
Product constraints (CPROD) define the currently available product assortment.
An example constraint defining such a product assortment is depicted in the above table.
Each conjunction in this constraint completely defines a product (item) – all product
properties have a defined value.
The task of identifying a set of products matching a customer’s wishes and needs is denoted
as a recommendation task. The customer requirements REQ can be encoded as unary
constraints over the variables in VC and VPROD – for example, max-price = 300.
Formally, each solution to the CSP (V = VC ∪ VPROD , D, C = CR ∪ CF ∪ CPROD ∪ REQ)
corresponds to a consistent recommendation.
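The following minimal Python sketch illustrates such a recommendation task. The product
data, the requirement values, and the single filter condition (large-size photoprints require
more than 5 megapixels) are illustrative, and the constraint check is simplified to direct
filtering rather than a general CSP solver.

# Hypothetical digital-camera assortment (the product constraints CPROD).
products = [
    {"id": "p1", "price": 148, "mpix": 8.0, "opt-zoom": "4x", "waterproof": False},
    {"id": "p2", "price": 182, "mpix": 8.0, "opt-zoom": "5x", "waterproof": True},
    {"id": "p3", "price": 278, "mpix": 9.1, "opt-zoom": "10x", "waterproof": False},
]

# Customer requirements REQ, encoded as values for customer properties.
requirements = {"max-price": 200, "usage": "large-print"}

def satisfies(product, req):
    # Requirement on the price (unary constraint on a product property).
    if product["price"] > req["max-price"]:
        return False
    # Filter condition CF: large-size photoprints require more than 5 mpix.
    if req.get("usage") == "large-print" and product["mpix"] <= 5.0:
        return False
    return True

print([p["id"] for p in products if satisfies(p, requirements)])
# -> ['p1', 'p2'], the consistent recommendations RES for this REQ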
Cases and similarities
In case-based recommendation approaches, items are retrieved using similarity measures that
describe to which extent item properties match some given user’s requirements. The
similarity of an item p to the requirements r ∈ REQ is often defined as shown in the
following formula:

sim(p, REQ) = Σ_{r ∈ REQ} wr · sim(p, r) / Σ_{r ∈ REQ} wr

In this context, sim(p, r) expresses for each item attribute value φr(p) its distance to the
customer requirement r ∈ REQ – for example, φmpix(p1) = 8.0. Furthermore, wr is the
importance weight for requirement r.
In real-world scenarios, there are properties a customer would like to maximize – for
example, the resolution of a digital camera. There are also properties that customers want to
minimize – for example, the price of a digital camera or the risk level of a financial service.
In the first case we are talking about “more-is-better” (MIB) properties; in the second case
the corresponding properties are denoted with “less-is-better” (LIB).
To take those basic properties into account in our similarity calculations,we introduce the
following formulae for calculating local similarities.
First, in the case of MIB properties, the local similarity between p and r is calculated as
follows:

sim(p, r) = (φr(p) − min(r)) / (max(r) − min(r))

The local similarity between p and r in the case of LIB properties is calculated as follows:

sim(p, r) = (max(r) − φr(p)) / (max(r) − min(r))

Here min(r) and max(r) denote the minimum and maximum possible values of the attribute
referred to by requirement r.
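A small Python sketch of these similarity calculations; the attribute ranges, the importance
weights, and the camera values are illustrative and would normally come from the product
catalog and the customer's stated requirements.

def local_sim(value, lo, hi, mib=True):
    # Local similarity w.r.t. the attribute's value range [lo, hi];
    # mib=True means "more is better", otherwise "less is better".
    if hi == lo:
        return 1.0
    return (value - lo) / (hi - lo) if mib else (hi - value) / (hi - lo)

def similarity(product, weights, ranges, mib_flags):
    # Weighted overall similarity: sum(w_r * sim(p, r)) / sum(w_r)
    num = den = 0.0
    for attr, w in weights.items():
        lo, hi = ranges[attr]
        num += w * local_sim(product[attr], lo, hi, mib_flags[attr])
        den += w
    return num / den

camera = {"mpix": 8.0, "price": 278}          # illustrative item p
print(round(similarity(camera,
                       weights={"mpix": 2.0, "price": 1.0},
                       ranges={"mpix": (5.0, 12.0), "price": (100, 500)},
                       mib_flags={"mpix": True, "price": False}), 3))
# -> 0.471: resolution treated as MIB, price as LIB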
Defaults
Proposing default values. Defaults are an important means to support customers in the
requirements specification process, especially in situations in which they are unsure about
which option to select or simply do not know technical details. Defaults can support
customers in choosing a reasonable alternative (an alternative that realistically fits the current
preferences). For example, if a customer is interested in printing large-format pictures from
digital images, the camera should support a resolution of more than 5.0 megapixels (default).
The negative side of the coin is that defaults can also be abused to manipulate consumers to
choose certain options. For example, users can be stimulated to buy a park distance control
functionality in a car by presenting the corresponding default value.
Defaults can be specified in various ways:
• Static defaults: In this case, one default is specified per customer property – for example,
default(usage) = large-print, because typically users want to generate posters from high-
quality pictures.
• Dependent defaults: In this case a default is defined for different combinations of potential
customer requirements – for example, default(usage=small-print, max-price) = 300.
• Derived defaults: Whereas the first two default types are strictly based on a declarative
approach, this third type exploits existing interaction logs for the automated derivation of
default values.
Interacting with constraint-based recommenders
In our example, a given set of requirements REQ = {r1 : price <= 150, r2 :
opt-zoom = 5x, r3 : sound = yes, r4 : waterproof = yes} cannot be fulfilled by any of the
products in P = {p1, p2, p3, p4, p5, p6, p7, p8} because
σ[price<=150, opt-zoom=5x, sound=yes, waterproof=yes](P) = ∅.
Calculating diagnoses for unsatisfiable requirements
In the context of our problem setting, a diagnosis is a minimal set of user requirements whose
repair (adaptation) will allow the retrieval of a recommendation.
Given P = {p1, p2, . . . , pn} and REQ = {r1, r2, . . . , rm} where σ[REQ](P) = ∅, a
knowledge-based recommender system would calculate a set of diagnoses Δ = {d1, d2, . . . ,
dk} where σ[REQ−di](P) ≠ ∅ for all di ∈ Δ. A diagnosis is a minimal set of elements {r1, r2,
. . . , rk} = d ⊆ REQ that have to be repaired in order to restore consistency with the given
product assortment, so that at least one solution can be found: σ[REQ−d](P) ≠ ∅. Following
the basic principles of model-based diagnosis (MBD), the calculation of diagnoses di ∈ Δ is
based on the determination and resolution of conflict sets. A conflict set CS (Junker 2004) is
defined as a subset {r1, r2, . . . , rl} ⊆ REQ such that σ[CS](P) = ∅. A conflict set CS is
minimal if and only if (iff) there does not exist a CS' with CS' ⊂ CS.
Calculating conflict sets. A recent and general method for the calculation of conflict sets is
QuickXPlain, an algorithm that calculates one conflict set at a time for a given set of
constraints. Its divide-and-conquer strategy helps to significantly accelerate the performance
compared with other approaches (for details see, e.g., Junker 2004).
QuickXPlain has two input parameters: first, P is the given product assortment
P = {p1, p2, . . . , pm}; second, REQ = {r1, r2, . . . , rn} is the set of requirements analyzed by
the conflict detection algorithm.
QuickXPlain is based on a recursive divide-and-conquer strategy that divides the set of
requirements into the subsets REQ1 and REQ2. If both subsets contain about 50 percent of
the requirements (the splitting factor is n/2), all the requirements contained in REQ2 can be
deleted (ignored) after a single consistency check if σ[REQ1](P) = ∅. The splitting factor of
n/2 is generally recommended; however, other factors can be defined. In the best case (e.g.,
all elements of the conflict belong to subset REQ1) the algorithm requires log2(n/u) + 2u
consistency checks; in the worst case, the number of consistency checks is 2u(log2(n/u) + 1),
where u is the number of elements contained in the conflict set.
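The algorithm listing itself is not reproduced in this text. The following Python sketch shows
the recursive divide-and-conquer idea on a toy example: requirements are (label, predicate)
pairs, a set of requirements is consistent if at least one product satisfies all of them, and the
two-camera assortment is purely illustrative.

def consistent(requirements, products):
    # sigma[requirements](P) != empty set
    return any(all(pred(p) for _, pred in requirements) for p in products)

def quickxplain(requirements, products):
    # Returns one minimal conflict set, or [] if REQ is satisfiable.
    if not requirements or consistent(requirements, products):
        return []
    return _qx([], False, requirements, products)

def _qx(background, delta_added, reqs, products):
    if delta_added and not consistent(background, products):
        return []
    if len(reqs) == 1:
        return list(reqs)
    k = len(reqs) // 2                     # splitting factor n/2
    r1, r2 = reqs[:k], reqs[k:]
    cs2 = _qx(background + r1, bool(r1), r2, products)
    cs1 = _qx(background + cs2, bool(cs2), r1, products)
    return cs1 + cs2

products = [{"price": 148, "opt-zoom": "4x", "waterproof": False},
            {"price": 182, "opt-zoom": "5x", "waterproof": True}]
reqs = [("price<=150",  lambda p: p["price"] <= 150),
        ("opt-zoom=5x", lambda p: p["opt-zoom"] == "5x"),
        ("waterproof",  lambda p: p["waterproof"])]

print([label for label, _ in quickxplain(reqs, products)])
# -> ['price<=150', 'opt-zoom=5x'], one minimal conflict set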