Chapter 4
Semantics-Aware Content-Based
Recommender Systems
Marco de Gemmis, Pasquale Lops, Cataldo Musto, Fedelucio Narducci,
and Giovanni Semeraro
4.1 Introduction
Content-based recommender systems (CBRSs) rely on item and user descriptions
(content) to build item representations and user profiles to suggest items similar
to those a target user already liked in the past. The basic process of producing
content-based recommendations consists in matching up the attributes of the target
user profile, in which preferences and interests are stored, with the attributes of the
items. The result is a relevance score that predicts the target user’s level of interest
in those items. Usually, attributes for describing an item are features extracted from
metadata associated with that item, or textual features extracted directly from the item
description. The content extracted from metadata is often too short and not sufficient
to correctly define the user interests, while the use of textual features involves
a number of complications when learning a user profile due to natural language
ambiguity. Polysemy, synonymy, multi-word expressions, and named entity recognition
and disambiguation are inherent problems of traditional keyword-based profiles,
which cannot go beyond lexical and syntactic structures to infer the user's interest
in topics.
The ever-increasing interest in semantic technologies and the availability of
several open knowledge sources, such as Wikipedia, DBpedia, Freebase, and
BabelNet, have fueled recent progress in the field of CBRSs. Novel research works
have introduced semantic techniques that shift from a keyword-based to a concept-
based representation of items and user profiles. These observations motivate the
integration of proper techniques for deep content analytics borrowed
from Natural Language Processing (NLP) and Semantic Technologies, which is one
of the most innovative lines of research in semantic recommender systems [61].
We roughly classify semantic techniques into top-down and bottom-up
approaches. Top-down approaches rely on the integration of external knowledge,
such as machine-readable dictionaries, taxonomies (or IS-A hierarchies), thesauri or
ontologies (with or without value restrictions and logical constraints), for annotating
items and representing user profiles in order to capture the semantics of the target
user's information needs. The main motivation behind top-down approaches is the
challenge of providing recommender systems with the linguistic and common-sense
knowledge, as well as the cultural background, that characterize the human ability
to interpret documents expressed in natural language and to reason on their meaning.
On the other hand, bottom-up approaches exploit the so-called geometric
metaphor of meaning to represent complex syntagmatic and paradigmatic relations
between words in high-dimensional vector spaces. According to this metaphor, each
word (and each document as well) can be represented as a point in a vector space.
The peculiarity of these models is that the representation is learned by analyzing
the context in which the word is used, in such a way that terms (or documents) similar
to each other are close in the space. For this reason, bottom-up approaches are also
called distributional models. One of the great virtues of these approaches is that they
are able to induce the semantics of terms by analyzing their use in large corpora
of textual documents using unsupervised mechanisms, as evidenced by the recent
advances of machine translation techniques [52, 83].
This chapter describes a variety of semantic approaches, both top-down and
bottom-up, and shows how to leverage them to build a new generation of semantic
CBRSs that we call semantics-aware content-based recommender systems.
4.2 Overview of Content-Based Recommender Systems
This section provides an overview of the basic principles for building CBRSs
and of the main techniques for representing items, learning user profiles, and providing
recommendations. The most important limitations of CBRSs are also discussed,
while the semantic techniques useful to tackle those limitations are introduced in
the next sections.
The high level architecture of a content-based recommender system is depicted
in Fig. 4.1. The recommendation process is performed in three steps, each of which
is handled by a separate component:
• CONTENT ANALYZER—When information has no structure (e.g. text), some
kind of pre-processing step is needed to extract structured relevant information.
The main responsibility of the component is to represent the content of items
(e.g. documents, Web pages, news, product descriptions, etc.) coming from
information sources in a form suitable for the next processing steps.
Fig. 4.1 High level architecture of a content-based recommender
Data items
are analyzed by feature extraction techniques in order to shift item representation
from the original information space to the target one (e.g. Web pages represented
as keyword vectors). This representation is the input to the PROFILE LEARNER
and FILTERING COMPONENT;
• PROFILE LEARNER—This module collects data representative of the user
preferences and tries to generalize this data, in order to construct the user
profile. Usually, the generalization strategy is realized through machine learning
techniques [86], which are able to infer a model of user interests starting from
items liked or disliked in the past. For instance, the PROFILE LEARNER of a Web
page recommender can implement a relevance feedback method [113] in which
the learning technique combines vectors of positive and negative examples into a
prototype vector representing the user profile. Training examples are Web pages
on which a positive or negative feedback has been provided by the user;
• FILTERING COMPONENT—This module exploits the user profile to suggest
relevant items by matching the profile representation against that of items to
be recommended. The result is a binary or continuous relevance judgment
(computed using some similarity metrics [57]), the latter case resulting in a
ranked list of potentially interesting items. In the above-mentioned example, the
matching is realized by computing the cosine similarity between the prototype
vector and the item vectors, as sketched below.
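The following minimal Python sketch is a purely illustrative outline of how the three components could interact; the class names, the naive tokenization, and the toy data are assumptions introduced only for this example, not part of any existing system.

```python
# Minimal, purely illustrative sketch of the three-component architecture.
# All class names and the toy data are assumptions, not an existing library API.
from collections import Counter
from math import sqrt


class ContentAnalyzer:
    """Turns raw item text into a structured (bag-of-words) representation."""
    def analyze(self, text):
        tokens = text.lower().split()          # naive tokenization, no stemming
        return Counter(tokens)                 # term -> frequency


class ProfileLearner:
    """Builds a prototype profile by accumulating liked/disliked item vectors."""
    def learn(self, rated_items):
        profile = Counter()
        for vector, rating in rated_items:     # rating: +1 (like) / -1 (dislike)
            for term, freq in vector.items():
                profile[term] += rating * freq
        return profile


class FilteringComponent:
    """Ranks new items by cosine similarity with the user profile."""
    def recommend(self, profile, new_items, top_n=2):
        scored = [(self._cosine(profile, v), name) for name, v in new_items]
        return sorted(scored, reverse=True)[:top_n]

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0


# Toy usage: two rated items train the profile, two new items are ranked.
analyzer = ContentAnalyzer()
rated = [(analyzer.analyze("space opera with epic battles"), +1),
         (analyzer.analyze("romantic comedy in paris"), -1)]
profile = ProfileLearner().learn(rated)
new_items = [("item1", analyzer.analyze("epic space battles saga")),
             ("item2", analyzer.analyze("comedy about paris lovers"))]
print(FilteringComponent().recommend(profile, new_items))
```

In a real system, the CONTENT ANALYZER would apply the feature extraction and weighting techniques described in the next sections rather than raw term counts.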
The first step of the recommendation process is the one performed by the
CONTENT ANALYZER, which usually borrows techniques from Information Retrieval
systems [6, 118]. Item descriptions coming from the Information Source are processed
by the CONTENT ANALYZER, which extracts features (keywords, n-grams, concepts,
. . . ) from unstructured text to produce a structured item representation, stored in the
repository Represented Items.
In order to construct and update the profile of the active user $u_a$ (the user for
whom recommendations must be provided), her reactions to items are collected in
some way and recorded in the repository Feedback. These reactions, called
annotations [51] or feedback, together with the related item descriptions, are exploited
during the process of learning a model useful to predict the actual relevance of
newly presented items. Users can also explicitly define their areas of interest as an
initial profile without providing any feedback. Typically, it is possible to distinguish
between two kinds of relevance feedback: positive information (inferring features
liked by the user) and negative information (i.e., inferring features the user is
not interested in [58]). Two different techniques can be adopted for recording
the user's feedback. When a system requires the user to explicitly evaluate items, this
technique is usually referred to as “explicit feedback”; the other technique, called
“implicit feedback”, does not require any active user involvement, in the sense
that feedback is derived from monitoring and analyzing the user's activities. Explicit
evaluations indicate how relevant or interesting an item is to the user [111]. Explicit
feedback has the advantage of simplicity, albeit the adoption of numeric/symbolic
scales increases the cognitive load on the user and may not be adequate for capturing
the user's feelings about items. Implicit feedback methods are based on assigning a
relevance score to specific user actions on an item, such as saving, discarding,
printing, bookmarking, etc. The main advantage is that they do not require direct
user involvement, even though bias is likely to occur, e.g., due to interruptions such
as a phone call while reading.
In order to build the profile of the active user $u_a$, the training set $TR_a$ for $u_a$ must
be defined. $TR_a$ is a set of pairs $\langle I_k, r_k \rangle$, where $r_k$ is the rating provided by $u_a$ on the
item representation $I_k$. Given a set of item representations labeled with ratings, the
PROFILE LEARNER applies supervised learning algorithms to generate a predictive
model—the user profile—which is usually stored in a profile repository for later
use by the FILTERING COMPONENT. After the user profile has been learned, the
FILTERING COMPONENT predicts whether a new item is likely to be of interest
for the active user, by comparing features in the item representation to those in the
representation of user preferences (stored in the user profile).
User tastes usually change over time; therefore, up-to-date information must be
maintained and provided to the PROFILE LEARNER in order to automatically
update the user profile. Further feedback is gathered on generated recommendations
by letting users state their satisfaction or dissatisfaction with items in the
recommendation list $L_a$. After
gathering that feedback, the learning process is performed again on the new training
set, and the resulting profile is adapted to the updated user interests. The iteration
of the feedback-learning cycle over time enables the system to take into account the
dynamic nature of user preferences.
4.2.1 Keyword-Based Vector Space Model
Most content-based recommender systems use relatively simple retrieval models,
such as keyword matching or the Vector Space Model (VSM). VSM is a spatial
representation of text documents. In that model, each document is represented by a
vector in an n-dimensional space, where each dimension corresponds to a term from
the overall vocabulary of a given document collection.
Formally, every document is represented as a vector of term weights, where
each weight indicates the degree of association between the document and the
term. Let $D = \{d_1, d_2, \ldots, d_N\}$ denote a set of documents, or corpus, and let
$T = \{t_1, t_2, \ldots, t_n\}$ be the dictionary, that is to say the set of words in the corpus.
$T$ is obtained by applying some standard natural language processing operations,
such as tokenization, stopwords removal, and stemming [6]. Each document $d_j$ is
represented as a vector in an $n$-dimensional vector space, so $\vec{d_j} = \langle w_{1j}, w_{2j}, \ldots, w_{nj} \rangle$,
where $w_{kj}$ is the weight for term $t_k$ in document $d_j$.
Document representation in the VSM raises two issues: weighting the terms and
measuring the feature vector similarity. The most commonly used term weighting
scheme, TF-IDF (Term Frequency-Inverse Document Frequency) weighting, is
based on empirical observations regarding text [117]:
• rare terms are not less relevant than frequent terms (IDF assumption);
• multiple occurrences of a term in a document are not less relevant than single
occurrences (TF assumption);
• long documents are not preferred to short documents (normalization
assumption).
In other words, terms that occur frequently in one document (TF=term-
frequency), but rarely in the rest of the corpus (IDF=inverse-document-frequency),
are more likely to be relevant to the topic of the document. In addition, normalizing
the resulting weight vectors prevents longer documents from having a better chance
of retrieval. These assumptions are well exemplified by the TF-IDF function:
$$\text{TF-IDF}(t_k, d_j) = \underbrace{\text{TF}(t_k, d_j)}_{\text{TF}} \cdot \underbrace{\log \frac{N}{n_k}}_{\text{IDF}} \tag{4.1}$$
where $N$ denotes the number of documents in the corpus, and $n_k$ denotes the number
of documents in the collection in which the term $t_k$ occurs at least once.
$$\text{TF}(t_k, d_j) = \frac{f_{k,j}}{\max_z f_{z,j}} \tag{4.2}$$
where the maximum is computed over the frequencies $f_{z,j}$ of all terms $t_z$ that
occur in document $d_j$. In order for the weights to fall in the $[0,1]$ interval and for
the documents to be represented by vectors of equal length, the weights obtained by
Eq. (4.1) are usually normalized by cosine normalization:
$$w_{k,j} = \frac{\text{TF-IDF}(t_k, d_j)}{\sqrt{\sum_{s=1}^{|T|} \text{TF-IDF}(t_s, d_j)^2}} \tag{4.3}$$
which enforces the normalization assumption.
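As a concrete illustration of Eqs. (4.1)–(4.3), the following Python sketch computes cosine-normalized TF-IDF weights over a tiny invented corpus; the corpus and the function names are assumptions made only for this example.

```python
# Illustrative TF-IDF weighting following Eqs. (4.1)-(4.3).
# The toy corpus and the function names are assumptions made for this sketch.
from math import log, sqrt

corpus = [
    ["the", "matrix", "science", "fiction", "movie"],
    ["romantic", "movie", "set", "in", "paris"],
    ["science", "fiction", "saga", "about", "space"],
]

N = len(corpus)
vocabulary = sorted({t for doc in corpus for t in doc})
# n_k: number of documents in which term t_k occurs at least once
doc_freq = {t: sum(1 for doc in corpus if t in doc) for t in vocabulary}

def tf(term, doc):
    # Eq. (4.2): term frequency normalized by the most frequent term in the document
    counts = {t: doc.count(t) for t in set(doc)}
    return counts.get(term, 0) / max(counts.values())

def tf_idf(term, doc):
    # Eq. (4.1): TF times log(N / n_k)
    return tf(term, doc) * log(N / doc_freq[term])

def weights(doc):
    # Eq. (4.3): cosine-normalized TF-IDF vector over the whole vocabulary
    raw = [tf_idf(t, doc) for t in vocabulary]
    norm = sqrt(sum(w * w for w in raw))
    return [w / norm if norm else 0.0 for w in raw]

for i, doc in enumerate(corpus):
    print(f"d{i + 1}", [round(w, 2) for w in weights(doc)])
```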
As stated earlier, a similarity measure is required to determine the closeness
between two documents. Many similarity measures have been derived to describe
the proximity of two vectors; among those measures, cosine similarity is the most
widely used:
$$sim(d_i, d_j) = \frac{\sum_k w_{ki} \cdot w_{kj}}{\sqrt{\sum_k w_{ki}^2} \cdot \sqrt{\sum_k w_{kj}^2}} \tag{4.4}$$
In content-based recommender systems relying on VSM, both user profiles and
items are represented as weighted term vectors. Predictions of a user’s interest in a
particular item can be derived by computing the cosine similarity.
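A minimal sketch of this matching step is given below; the profile and item vectors are assumed to be TF-IDF weights over a shared, hypothetical vocabulary, and Eq. (4.4) is used to rank the items.

```python
# Illustrative ranking of items by cosine similarity with a user profile, Eq. (4.4).
# The profile and item vectors are assumed TF-IDF weights over a shared vocabulary.
from math import sqrt

def cosine(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = sqrt(sum(a * a for a in v1)) * sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0

# Hypothetical vectors over the vocabulary [science, fiction, romance, paris, space]
user_profile = [0.8, 0.7, 0.1, 0.0, 0.5]
items = {
    "item_a": [0.6, 0.6, 0.0, 0.0, 0.5],   # science-fiction item
    "item_b": [0.0, 0.1, 0.9, 0.7, 0.0],   # romance item
}

ranking = sorted(items, key=lambda i: cosine(user_profile, items[i]), reverse=True)
print([(i, round(cosine(user_profile, items[i]), 2)) for i in ranking])
```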
4.2.2 Methods for Learning User Profiles
Machine learning techniques generally used in the task of inducing content-based
profiles are well suited for text categorization [119]. In a machine learning
approach to text categorization, an inductive process automatically builds a text
classifier from a set of training documents, i.e. documents labeled with the
categories they belong to.
The problem of learning user profiles can be cast as a binary text categorization
task: each document has to be classified as interesting or not with respect to the
user preferences. Therefore, the set of categories is $C = \{c_+, c_-\}$, where $c_+$ is
the positive class (user-likes) and $c_-$ the negative one (user-dislikes). Classifiers
can also be adopted with a set of categories that is not binary. Besides the use
of classifiers, other machine learning algorithms, such as linear regression, can be
adopted to predict numerical ratings. The most used learning algorithms in content-
based recommender systems are based on probabilistic methods, relevance feedback
and k-nearest neighbors [6].
4.2.2.1 Probabilistic Methods
Naïve Bayes is a probabilistic approach to inductive learning, and belongs to the
general class of Bayesian classifiers. These approaches generate a probabilistic
model based on previously observed data. The model estimates the a posteriori
probability, $P(c|d)$, of document $d$ belonging to class $c$. This estimation is based
on the a priori probability, $P(c)$, the probability of observing a document in class
$c$, $P(d|c)$, the probability of observing the document $d$ given $c$, and $P(d)$, the
probability of observing the instance $d$. Using these probabilities, the Bayes theorem
is applied to calculate $P(c|d)$:
$$P(c|d) = \frac{P(c)\,P(d|c)}{P(d)} \tag{4.5}$$
To classify the document $d$, the class with the highest probability is chosen:
$$c = \operatorname*{argmax}_{c_j} \frac{P(c_j)\,P(d|c_j)}{P(d)}$$
$P(d)$ is generally omitted, as it is equal for all $c_j$. As we do not know the values
of $P(d|c)$ and $P(c)$, we estimate them by observing the training data. However,
estimating $P(d|c)$ in this way is problematic, as it is very unlikely to see the
same document more than once: the observed data is generally not enough to
be able to generate good probabilities. The naïve Bayes classifier overcomes this
problem by simplifying the model through the independence assumption: all the
words or tokens in the observed document d are conditionally independent of each
other given the class. Individual probabilities for the words in a document are
estimated one by one, rather than for the complete document as a whole. The conditional
independence assumption is clearly violated in real-world data, however, despite
these violations, empirically the naïve Bayes classifier does a good job in classifying
text documents [12, 70].
There are two commonly used working models of the naïve Bayes classifier,
the multivariate Bernoulli event model and the multinomial event model [77]. Both
models treat a document as a vector of values over the corpus vocabulary, V, where
each entry in the vector represents whether a word occurred in the document, hence
both models lose information about word order. The multivariate Bernoulli event
model encodes each word as a binary attribute, i.e., whether a word appeared or not,
while the multinomial event model counts how many times the word appeared in
the document. Empirically, the multinomial naïve Bayes formulation was shown to
outperform the multivariate Bernoulli model. This effect is particularly noticeable
for large vocabularies [77]. The way the multinomial event model uses its document
vector to calculate $P(c_j|d_i)$ is as follows:
$$P(c_j|d_i) = P(c_j) \prod_{t_k \in V_{d_i}} P(t_k|c_j)^{N(d_i, t_k)} \tag{4.6}$$
where $N(d_i, t_k)$ is defined as the number of times the word or token $t_k$ appears in
document $d_i$. Notice that, rather than taking the product over all the words in the
corpus vocabulary $V$, only the subset $V_{d_i}$ of the vocabulary, containing the words
that appear in document $d_i$, is used. A key step in implementing naïve Bayes
is estimating the word probabilities $P(t_k|c_j)$. To make the probability estimates
more robust with respect to infrequently encountered words, a smoothing method
is used to modify the probabilities that would have been obtained by simple event
counting. One important effect of smoothing is that it avoids assigning probability
values equal to zero to words not occurring in the training data for a particular
class. A rather simple smoothing method relies on the common Laplace estimates
(i.e., adding one to all the word counts for a class). A more interesting method is
Witten-Bell [129].
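To make Eq. (4.6) and Laplace smoothing concrete, the following self-contained Python sketch classifies a document as user-likes or user-dislikes; the training data are invented, and log-probabilities are used to avoid numerical underflow, a standard implementation detail not discussed above.

```python
# Illustrative multinomial naive Bayes with Laplace (add-one) smoothing, Eq. (4.6).
# Training data are invented; log-probabilities avoid numeric underflow.
from collections import Counter, defaultdict
from math import log

training = [
    ("space opera with epic space battles", "likes"),
    ("deep space exploration documentary", "likes"),
    ("romantic comedy set in paris", "dislikes"),
    ("tearful romance about lost love", "dislikes"),
]

# Estimate P(c) from class frequencies and collect per-class word counts
class_docs = Counter(c for _, c in training)
word_counts = defaultdict(Counter)
for text, c in training:
    word_counts[c].update(text.split())

vocabulary = {w for counts in word_counts.values() for w in counts}

def predict(text):
    scores = {}
    for c in class_docs:
        score = log(class_docs[c] / len(training))   # log P(c)
        total = sum(word_counts[c].values())
        for w in text.split():
            if w not in vocabulary:
                continue  # ignore out-of-vocabulary words
            # Laplace-smoothed estimate of P(t_k | c_j)
            p = (word_counts[c][w] + 1) / (total + len(vocabulary))
            score += log(p)   # repeated words contribute the exponent N(d_i, t_k)
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("epic space battles saga"))   # expected: likes
print(predict("romance in paris"))          # expected: dislikes
```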
Although the performance of naïve Bayes is not as good as that of some other
statistical learning methods, such as nearest-neighbor classifiers or support vector
machines, it has been shown that it can perform surprisingly well in classification
tasks where the computed probability is not important [40]. Another advantage of the
naïve Bayes approach is that it is very efficient and easy to implement compared to
other learning methods.
4.2.2.2 Relevance Feedback
Relevance feedback is a technique adopted in Information Retrieval that helps users
to incrementally refine queries based on previous search results. It consists of the
users feeding back into the system decisions on the relevance of retrieved documents
with respect to their information needs.
Relevance feedback and its adaptation to text categorization, the well-known
Rocchio’s formula [113], are commonly adopted by content-based recommender
systems. The general principle is to let users rate documents suggested by the
recommender system with respect to their information need. This form of feedback
can subsequently be used to incrementally refine the user profile or to train the
learning algorithm that infers the user profile as a classifier. Some linear classifiers
consist of an explicit profile (or prototypical document) of the category [119].
Rocchio's method is used for inducing linear, profile-style classifiers. This algorithm
represents documents as vectors, so that documents with similar content have similar
vectors. Each component of such a vector corresponds to a term in the document,
typically a word. The weight of each component is computed using the TF-IDF
term weighting scheme. Learning is achieved by combining document vectors (of
positive and negative examples) into a prototype vector for each class in the set
of classes $C$. To classify a new document $d$, the similarity between each prototype
vector and the document vector representing $d$ is computed for each class (for example,
by using the cosine similarity measure); then $d$ is assigned to the class whose prototype
vector has the highest similarity value.
More formally, Rocchio's method computes a classifier $\vec{c_i} = \langle \omega_{1i}, \ldots, \omega_{|T|i} \rangle$ for
the category $c_i$ ($T$ is the vocabulary, that is, the set of distinct terms in the training
set) by means of the formula:
$$\omega_{ki} = \beta \cdot \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|} - \gamma \cdot \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|} \tag{4.7}$$
where $w_{kj}$ is the TF-IDF weight of the term $t_k$ in document $d_j$, $POS_i$ and $NEG_i$ are
the sets of positive and negative examples in the training set for the specific class $c_i$,
and $\beta$ and $\gamma$ are control parameters that allow setting the relative importance of all positive
and negative examples. To assign a class $\tilde{c}$ to a document $d_j$, the similarity between
each prototype vector $\vec{c_i}$ and the document vector $\vec{d_j}$ is computed, and $\tilde{c}$ will be the
$c_i$ with the highest value of similarity. The Rocchio-based classification approach
does not have any theoretic underpinning, and there are no guarantees on performance
or convergence [108].
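The sketch below shows how the prototype vector of Eq. (4.7) could be computed for a single class from pre-computed TF-IDF vectors, and how a new document could then be scored against it with cosine similarity; the beta and gamma values and the toy vectors are arbitrary assumptions.

```python
# Illustrative Rocchio prototype computation, Eq. (4.7), for one class.
# Vectors are assumed pre-computed TF-IDF weights; beta/gamma values are arbitrary.
from math import sqrt

def rocchio_prototype(pos, neg, beta=16.0, gamma=4.0):
    dim = len(pos[0])
    proto = [0.0] * dim
    for v in pos:                      # contribution of positive examples
        for k in range(dim):
            proto[k] += beta * v[k] / len(pos)
    for v in neg:                      # contribution of negative examples
        for k in range(dim):
            proto[k] -= gamma * v[k] / len(neg)
    return proto

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy TF-IDF vectors of liked (POS) and disliked (NEG) documents
POS = [[0.7, 0.5, 0.0, 0.1], [0.6, 0.6, 0.1, 0.0]]
NEG = [[0.0, 0.1, 0.8, 0.6]]

prototype = rocchio_prototype(POS, NEG)
new_doc = [0.5, 0.4, 0.1, 0.0]
print("similarity to the 'likes' prototype:", round(cosine(prototype, new_doc), 2))
```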
4.2.2.3 Nearest Neighbors
Nearest neighbor algorithms, also called lazy learners, simply store training data in
memory, and classify a new unseen item by comparing it to all stored items by using
a similarity function. The “nearest neighbor” or the “k-nearest neighbors” items are
determined, and the class label for the unclassified item is derived from the class
labels of the nearest neighbors. A similarity function is needed; for example, the
cosine similarity measure is adopted when items are represented using the VSM.
Nearest neighbor algorithms are quite effective, albeit their most important drawback
is inefficiency at classification time: since they do not have a true training phase,
they defer all the computation to the moment a new item has to be classified.
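A minimal k-nearest-neighbors sketch is shown below; the stored vectors, their labels, the value of k, and the majority-vote rule are toy assumptions for illustration.

```python
# Illustrative k-nearest-neighbors classification with cosine similarity.
# Training vectors, labels and k are toy assumptions for this sketch.
from collections import Counter
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# "Lazy learner": training simply stores the labeled vectors in memory
training = [
    ([0.7, 0.5, 0.0, 0.1], "likes"),
    ([0.6, 0.6, 0.1, 0.0], "likes"),
    ([0.0, 0.1, 0.8, 0.6], "dislikes"),
    ([0.1, 0.0, 0.7, 0.7], "dislikes"),
]

def knn_classify(item, k=3):
    # Rank stored examples by similarity to the unseen item
    neighbors = sorted(training, key=lambda ex: cosine(item, ex[0]), reverse=True)[:k]
    # Majority vote over the labels of the k nearest neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_classify([0.5, 0.4, 0.1, 0.0]))   # expected: likes
```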
4.2.3 Advantages and Drawbacks of Content-Based Filtering
The adoption of the content-based recommendation paradigm has several advan-
tages when compared to the collaborative one:
• USER INDEPENDENCE—Content-based recommenders exploit solely ratings
provided by the active user to build her own profile. Instead, collaborative
filtering methods need ratings from other users in order to find the “nearest
neighbors” of the active user, i.e., users that have similar tastes since they
rated the same items similarly. Then, only the items that are most liked by the
neighbors of the active user will be recommended;
• TRANSPARENCY—Explanations on how the recommender system works can be
provided by explicitly listing content features or descriptions that caused an item
to occur in the list of recommendations. Those features are indicators to consult
in order to decide whether to trust a recommendation. Conversely, collaborative
systems are black boxes since the only explanation for an item recommendation
is that unknown users with similar tastes liked that item;
• NEW ITEM—Content-based recommenders are capable of recommending items
not yet rated by any user. As a consequence, they do not suffer from the first-rater
problem, which affects collaborative recommenders that rely solely on users'
preferences to make recommendations: until a new item is rated by a substantial
number of users, a collaborative system is not able to recommend it.
Nonetheless, content-based systems have several shortcomings:
• LIMITED CONTENT ANALYSIS—Content-based techniques have a natural limit
in the number and type of features that are associated, whether automatically
or manually, with the objects they recommend. Domain knowledge is often
required, e.g., for movie recommendations the system needs to know the actors
and directors, and sometimes domain ontologies are also needed. No content-
based recommendation system can provide suitable suggestions if the analyzed
content does not contain enough information to discriminate items the user
likes from items the user does not like. Some representations capture only
certain aspects of the content, but there are many others that would influence
a user's experience. For instance, often there is not enough information in the
word frequency to model the user's interest in jokes or poems, and techniques
from affective computing would be more appropriate. Again, for Web pages,
feature extraction techniques from text completely ignore aesthetic qualities
and additional multimedia information. Furthermore, CBRSs based on a string
matching approach suffer from problems of:
– POLYSEMY, the presence of multiple meanings for one word;
– SYNONYMY, multiple words with the same meaning;
– MULTI-WORD EXPRESSIONS, the difficulty of assigning the correct properties to
a sequence of two or more words whose properties are not predictable from
the properties of the individual words;
– ENTITY IDENTIFICATION or NAMED ENTITY RECOGNITION, the difficulty of
locating and classifying elements in text into pre-defined categories such as the
names of persons, organizations, locations, expressions of times, quantities,
monetary values, etc.
– ENTITY LINKING or NAMED ENTITY DISAMBIGUATION, the difficulty of
determining the identity (often called the reference) of entities mentioned in
text.
• OVER-SPECIALIZATION—Content-based recommenders have no inherent
method for finding something unexpected. The system suggests items whose
scores are high when matched against the user profile, hence the user is going
to be recommended items similar to those already rated. This drawback is also
called the lack-of-serendipity problem, to highlight the tendency of content-based
systems to produce recommendations with a limited degree of novelty. To give
an example, when a user has only rated movies directed by Stanley Kubrick,
she will be recommended just that kind of movie. A “perfect” content-based
technique would rarely find anything novel, limiting the range of applications for
which it would be useful.
• NEW USER—Enough ratings have to be collected before a content-based rec-
ommender system can really understand user preferences and provide accurate
recommendations. Therefore, when few ratings are available, as for a new user,
the system will not be able to provide reliable recommendations.