A Survey On Knowledge Graph-Based Recommender Systems

This document presents a systematic survey of knowledge graph-based recommender systems, highlighting their ability to enhance user experience by addressing challenges like data sparsity and cold start issues. The authors analyze various algorithms that utilize knowledge graphs for accurate and explainable recommendations, as well as the datasets used in this research area. Additionally, the paper outlines potential future research directions in the field of knowledge graph-based recommendations.

A Survey on Knowledge Graph-Based Recommender Systems
Qingyu Guo, Fuzhen Zhuang, Chuan Qin, Hengshu Zhu, Senior Member, IEEE, Xing Xie, Senior
Member, IEEE, Hui Xiong, Fellow, IEEE, and Qing He

Abstract—To solve the information explosion problem and enhance user experience in various online
applications, recommender systems have been developed to model users’ preferences. Although
numerous efforts have been made toward more personalized recommendations, recommender systems
still suffer from several challenges, such as data sparsity and cold start. In recent years,
generating recommendations with the knowledge graph as side information has attracted
considerable interest. Such an approach can not only alleviate the abovementioned issues for a
more accurate recommendation, but also provide explanations for recommended items. In this paper,
we conduct a systematic survey of knowledge graph-based recommender systems. We collect recently
published papers in this field and summarize them from two perspectives. On the one hand, we
investigate the proposed algorithms by focusing on how the papers utilize the knowledge graph for
accurate and explainable recommendation. On the other hand, we introduce datasets used in these
works. Finally, we propose several potential research directions in this field.

arXiv:2003.00911v1 [cs.IR] 28 Feb 2020

Index Terms—Knowledge Graph, Recommender System, Explainable Recommendation.

• Qingyu Guo, Fuzhen Zhuang, Qing He are with Key Lab of Intelligent Information Processing of
Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
Qingyu Guo is also with Hong Kong University of Science and Technology, Clearwater Bay, Kowloon,
Hong Kong. Fuzhen Zhuang and Qing He are also with University of Chinese Academy of Sciences,
Beijing 100049, China. E-mail: [email protected], {zhuangfuzhen, heqing}@ict.ac.cn
• Chuan Qin is with University of Science and Technology of China and Baidu Talent Intelligence
Center, Baidu Inc. E-mail: [email protected]
• Hengshu Zhu and Hui Xiong are with Baidu Talent Intelligence Center, Baidu Inc. Hui Xiong is
also with Business Intelligence Lab, Baidu Research. E-mail: {zhuhengshu, xionghui}@gmail.com
• Xing Xie is with Microsoft Research Asia, Beijing, China. E-mail: [email protected]
• Fuzhen Zhuang is the corresponding author.

1 INTRODUCTION
With the rapid development of the internet, the volume of data has grown exponentially. Because
of the overload of information, it is difficult for users to pick out what interests them among
a large number of choices. To improve the user experience, recommender systems have been applied
for scenarios such as music recommendation [1], movie recommendation [2], and online shopping [3].

The recommendation algorithm is the core element of recommender systems, which are mainly
categorized into collaborative filtering (CF)-based recommender systems, content-based
recommender systems, and hybrid recommender systems [4]. CF-based recommendation models user
preference based on the similarity of users or items from the interaction data, while
content-based recommendation utilizes the item’s content features. CF-based recommender systems
have been widely applied because they are effective at capturing the user preference and can be
easily implemented in multiple scenarios, without the feature extraction effort required by
content-based recommender systems [5], [6]. However, CF-based recommendation suffers from the
data sparsity and cold start problems [6]. To address these issues, hybrid recommender systems
have been proposed to unify the interaction-level similarity and content-level similarity. In
this process, multiple types of side information have been explored, such as item attributes [7],
[8], item reviews [9], [10], and users’ social networks [11], [12].

In recent years, introducing a knowledge graph (KG) into the recommender system as side
information has attracted the attention of researchers. A KG is a heterogeneous graph, where
nodes function as entities, and edges represent relations between entities. Items and their
attributes can be mapped into the KG to understand the mutual relations between items [2].
Moreover, users and user side information can also be integrated into the KG, so that relations
between users and items, as well as the user preference, can be captured more accurately [13].
Figure 1 is an example of KG-based recommendation, where the movies “Avatar” and “Blood Diamond”
are recommended to Bob. This KG contains users, movies, actors, directors, and genres as
entities, while interaction, belonging, acting, directing, and friendship are relations between
entities. With the KG, movies and users are connected with different latent relations, which
helps to improve the precision of recommendation. Another benefit of KG-based recommender
systems is the explainability of recommendation results [14]. In the same example, reasons for
recommending these two movies to Bob can be known by following the relation sequences in the
user-item graph. For instance, one reason for recommending “Avatar” is that “Avatar” is the same
genre as “Interstellar”, which was watched by Bob before. Recently, multiple KGs have been
proposed, such as Freebase [15], DBpedia [16], YAGO [17], and Google’s Knowledge Graph [18],
which makes it convenient to build KGs for recommendation.
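
The explanation for recommending “Avatar” to Bob can be sketched as a path search over
(head, relation, tail) triples. The snippet below is a minimal illustration, not a method from
any surveyed paper: only the facts that Bob watched “Interstellar” and that “Interstellar” and
“Avatar” share a genre come from the text, while the relation names, the genre label, the
`^-1` marker for inverse edges, and the helper names `build_adjacency` and `explain` are
assumptions made for this example.

```python
from collections import deque

# Toy triples (head, relation, tail). Relation names and the genre label are
# illustrative assumptions; only the two facts themselves come from the text.
TRIPLES = [
    ("Bob", "watched", "Interstellar"),
    ("Interstellar", "genre", "Science fiction"),
    ("Avatar", "genre", "Science fiction"),
]


def build_adjacency(triples):
    """Index triples in both directions so a path may follow inverse relations."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((rel, tail))
        adj.setdefault(tail, []).append((rel + "^-1", head))
    return adj


def explain(adj, user, item, max_hops=3):
    """Breadth-first search for a shortest relation path from user to item."""
    queue = deque([(user, [])])
    seen = {user}
    while queue:
        node, path = queue.popleft()
        if node == item:
            return path
        if len(path) == max_hops:
            continue  # do not expand beyond the hop budget
        for rel, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None  # no path within max_hops


path = explain(build_adjacency(TRIPLES), "Bob", "Avatar")
for head, rel, tail in path:
    print(f"{head} --{rel}--> {tail}")
```

Each step of the returned path is itself a triple, so the path reads directly as the reason for
the recommendation. Breadth-first search is used here only because it returns a shortest path;
the surveyed methods score or learn over such paths rather than enumerating them this way.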
[Figure 1: a knowledge graph connecting the users Bob and Alice, the movies Interstellar,
Inception, Titanic, Avatar, and Blood Diamond, the actor Leonardo DiCaprio, the director James
Cameron, and the genre Science fiction through watched, friend, genre, include, acted, and
directed relations. Avatar and Blood Diamond are marked as recommended movies.]
Fig. 1. An illustration of KG-based recommendation.

This survey aims to provide a comprehensive review of the literature utilizing KGs as side
information in recommender systems. Throughout our investigation, we discover that existing
KG-based recommender systems apply KGs in three ways: the embedding-based method, the path-based
method, and the unified method. We illustrate the similarities and differences between these
methods in detail. Besides the more accurate recommendation, another benefit of KG-based
recommendation is the interpretability. We discuss how different works utilize the KG for
explainable recommendation. In addition, based on our survey, we find that KGs serve as side
information in multiple scenarios, including the recommendation for movies, books, news,
products, points of interest (POIs), music, and social platforms. We gather recent works,
categorize them by the application, and collect datasets evaluated in these works.

The organization of this survey is as follows: in Section 2, we introduce the foundations of KGs
and recommender systems; in Section 3, we present notations and concepts used in this paper; in
Section 4 and Section 5, we review KG-based recommender systems from the aspect of approaches and
evaluated datasets, respectively; in Section 6, we provide some potential research directions in
this field; finally, we conclude this survey in Section 7.

2 RELATED WORK
This section introduces the fundamental knowledge and summarizes related work in the domain of
KG-based recommendation, including KGs and recommender systems.

2.1 Knowledge Graphs
The KG is a practical approach to represent large-scale information from multiple domains [19].
A common way to describe a KG is to follow the Resource Description Framework (RDF) standard
[20], in which nodes represent entities, while edges in the graph function as relations between
entities. Each edge is represented in the form of a triple (head entity, relation, tail entity),
also known as a fact in the graph, implying the specific relationship between the head entity
and tail entity. For example, (Donald Trump, president of, America) indicates the fact that
Donald Trump is the president of America. A KG is a heterogeneous network since it contains
multiple types of nodes and relations in the graph. Such a graph has strong representation
ability since multiple attributes of an entity can be obtained by following different edges in
the graph, and high-level relations of entities can be discovered through these relational
links. The concept of the knowledge graph was developed in the 1980s [21], when KGs were
integrated into the framework of expert systems for medical and social sciences. Later, the
application was broadened to linguistic and logical domains. In 2012, Google introduced the KG
into the framework of search, for a better understanding of the query and to make search results
more user-friendly [18]. To date, KGs have been created and applied in multiple scenarios,
including search engines, recommender systems, question answering systems [22], relation
detection [23], etc.

We list some popular KGs in Table 1. Based on the scope of the knowledge covered, these KGs can
be divided into two classes. The first group are cross-domain KGs, such as Freebase [15],
DBpedia [16], YAGO [17], and NELL [24], while the second are domain-specific KGs, like Bio2RDF
[25]. Six of the cross-domain KGs are utilized in recommender systems in this survey, and we
briefly introduce them as follows: Freebase [15] was launched in 2007 by Metaweb and was acquired
by Google in 2010. By 2015, it contained more than 3 billion facts and almost 50 million entities
[26]. Though it is a cross-domain KG, around 77% of its information is in the domain of media
[27]. Currently, the data is available at Google’s Data Dumps [28]. DBpedia [16] is an open
community project, which was started by researchers from the Free University of Berlin and
Leipzig University, in cooperation with OpenLink Software. The first version was released in 2007
and is updated yearly. The main knowledge is extracted from different language versions of
Wikipedia, and DBpedia combines them in a large-scale graph structure. YAGO [17] (Yet Another
Great Ontology) was introduced by the Max Planck Institute in 2007. It contains more than 5
million facts, such as people, locations, and organizations. It automatically extracts knowledge
from Wikipedia and other sources, including WordNet [29] and GeoNames [30], and unifies it into
an RDF graph. Satori [31] is a KG proposed by Microsoft. Similarly to Google’s Knowledge Graph,
which empowers the Google search engine, Satori has been integrated into the search engine Bing.
Though publicly accessible documents about the Satori KG are limited, it is known that Satori
consisted of 300 million entities and 800 million relations in 2012 [32]. CN-DBPedia [33] is the
largest Chinese KG. Published by Fudan University in 2015, it has over 16 million entities and
over 220 million relations. It automatically extracts knowledge from Baidu Baike, Hudong Baike,
and Chinese Wikipedia, then integrates them into a Chinese database. The system is updated
continuously with little human effort needed.

2.2 Recommender Systems
Recommender systems have been applied in many domains, including movies [2], [44], music [1],
[45], POIs [46], [47], news [14], [48], education [49], [50], etc.

The recommendation task is to recommend one or a series of unobserved items to a given user, and
it can be formulated into the following steps. First, the system learns a representation
$\mathbf{u}_i$ and $\mathbf{v}_j$ for the given user $u_i$ and an item $v_j$. Then, it learns a
scoring function $f : \mathbf{u}_i \times \mathbf{v}_j \to \hat{y}_{i,j}$, which models the
preference of $u_i$ for $v_j$. Finally, the
recommendation can be generated by sorting the preference scores for items. To learn the
user/item representation and the scoring function, there are three main approaches, as described
below.
• Collaborative Filtering. CF assumes that users may be interested in items selected by people
who share similar interaction records with them. The interaction can either be explicit
interaction [51], [52], like ratings, or implicit interaction [53], [54], such as click and view.
To implement CF-based recommendation, interaction data from multiple users and items are
required, which further forms the user-item interaction matrix. The CF-based approach contains
two main techniques, memory-based CF and model-based CF [5]. In detail, memory-based CF first
learns the user-user similarity from the user-item interaction data. Then, unobserved items are
recommended to a given user based on the interaction records of people similar to the specific
user. Alternatively, some models learn the similarity among items, and recommend similar items
for a user based on the user’s purchase history. The model-based CF approach attempts to
alleviate the sparsity issue by building an inference model. One common implementation is the
latent factor model [55], [56], which extracts the latent representation of the user and item
from the high dimensional user-item interaction matrix, and then computes the similarity between
the user and item with the inner product or other methods.
• Content-based Filtering. Compared with the CF-based model, which learns the representation of
user and item from global user-item interaction data, content-based methods depict the user and
item from the content of items. The assumption of content-based filtering is that users may be
interested in items that are similar to their past interacted items. The item representation is
obtained by extracting attributes from the item’s auxiliary information, including texts,
images, etc., while the user representation is based on the features of personal interacted
items. The procedure of comparing candidate items with the user profile is essentially matching
them with the user’s previous records. Therefore, this approach tends to recommend items that are
similar to items liked by a user in the past [57].
• Hybrid Method. The hybrid method leverages multiple recommendation techniques in order to
overcome the limitation of using only one type of method. One major issue of CF-based
recommendation is the sparsity of user-item interaction data, which makes it difficult to find
similar items or users from the perspective of interaction. A special case of this issue is the
cold-start problem, which means that recommendation for a new user or item is difficult, since
the user-user and item-item similarity cannot be determined without any interaction records. By
incorporating content information of users and items, also known as user side information and
item side information, into the CF-based framework, better recommendation performance can be
achieved [6]. Some commonly used item side information includes item attributes [7], [8], [58],
like brands and categories; item multimedia information, like textual description [59], image
features [60], audio signals [61], and video features [62]; and item reviews [9], [10]. Common
options for user side information involve the user’s demographic information [63], including
occupation, gender, and hobbies; and the user network [11], [12]. In this survey, KG-based
recommender systems leverage the KG as the side information, combining the CF-based technique
for more accurate recommendation.

TABLE 1
A collection of commonly used knowledge graphs.

KG Name                        | Domain Type       | Main Knowledge Source
YAGO [17]                      | Cross-Domain      | Wikipedia [34]
Freebase [15]                  | Cross-Domain      | Wikipedia, NNDB [35], FMD [36], MusicBrainz [37]
DBpedia [16]                   | Cross-Domain      | Wikipedia
Satori [31]                    | Cross-Domain      | Web data
CN-DBPedia [33]                | Cross-Domain      | Baidu Baike [38], Hudong Baike [39], Wikipedia (Chinese)
NELL [24]                      | Cross-Domain      | Web data
Wikidata [40]                  | Cross-Domain      | Wikipedia, Freebase
Google’s Knowledge Graph [18]  | Cross-Domain      | Web data
Facebook’s Entities Graph [41] | Cross-Domain      | Wikipedia, Facebook data [42]
Bio2RDF [25]                   | Biological Domain | Public bioinformatics databases, NCBI’s databases
KnowLife [43]                  | Biomedical Domain | Scientific literature, Web portals

3 OVERVIEW
Before delving into the state-of-the-art approaches exploiting KGs as side information for
recommendation, we first present notations and concepts used in the paper to eliminate
misunderstanding. For convenience, we list some symbols and their descriptions in Table 2.
• Heterogeneous Information Network (HIN). A HIN is a directed graph $G = (V, E)$ with an entity
type mapping function $\phi : V \to \mathcal{A}$ and a link type mapping function
$\psi : E \to \mathcal{R}$. Each entity $v \in V$ belongs to an entity type
$\phi(v) \in \mathcal{A}$, and each link $e \in E$ belongs to a relation type
$\psi(e) \in \mathcal{R}$. In addition, the number of entity types $|\mathcal{A}| > 1$ and/or the
number of relation types $|\mathcal{R}| > 1$.
• Knowledge Graph (KG). A KG $\mathcal{G}_{know} = (V, E)$ is a directed graph whose nodes are
entities and edges are subject-property-object triple facts. Each edge of the form (head entity,
relation, tail entity) (denoted as $\langle e_h, r, e_t \rangle$) indicates a relationship of $r$
from entity $e_h$ to entity $e_t$. It can be regarded as an instance of a HIN.
• Meta-path. A meta-path $P = A_0 \xrightarrow{R_1} A_1 \xrightarrow{R_2} \cdots \xrightarrow{R_k} A_k$
is a path defined on the graph of network schema $G_T = (\mathcal{A}, \mathcal{R})$, which
defines a new composite relation $R_1 R_2 \cdots R_k$ between type $A_0$ and $A_k$, where
$A_i \in \mathcal{A}$ and $R_i \in \mathcal{R}$ for $i = 0, \cdots, k$. It is a relation sequence
connecting object pairs in a HIN, which can be used to extract connectivity features in the
graph.
• Meta-graph. Similar to a meta-path, a meta-graph is another meta-structure that connects two
entities in a HIN. The difference is that a meta-path only defines one relation sequence, while
a meta-graph is a combination of different meta-paths [64]. Compared with a meta-path, a
meta-graph can contain more expressive structural information between entities in the graph.
• Knowledge Graph Embedding (KGE). KGE is to embed a KG $\mathcal{G}_{know} = (V, E)$ into a low
dimensional space [65]. After the embedding procedure, each graph component, including the
entity and the relation, is represented with a $d$-dimensional vector. The low dimensional
embedding still preserves the inherent property of the graph, which can be quantified by
semantic meaning or high-order proximity in the graph.
• User Feedback. With $m$ users $U = \{u_1, \cdots, u_m\}$ and $n$ items
$V = \{v_1, \cdots, v_n\}$, we define the binary user feedback matrix
$R \in \mathbb{R}^{m \times n}$ as follows:
$$R_{ij} = \begin{cases} 1, & \text{if } (u_i, v_j) \text{ interaction is observed;} \\ 0, & \text{otherwise.} \end{cases}$$
Note that a value of 1 for $R_{ij}$ indicates there is an implicit interaction between user
$u_i$ and item $v_j$, such as behaviors of clicking, watching, browsing, etc. Such an implicit
interaction does not necessarily imply $u_i$’s preference over $v_j$. Unless otherwise stated,
the user feedback used in this paper means the implicit feedback. However, in some specific
scenarios, explicit feedback to show the user’s preference can also be available. For example,
in movie recommendation, a user explicitly rates a movie in the score range of one to five. Some
papers have extracted the data of score ratings of five to indicate the user’s preference in
such a case.
• H-hop Neighbor. Nodes in the graph can be connected with a multi-hop relation path:
$e_0 \xrightarrow{r_1} e_1 \xrightarrow{r_2} \cdots \xrightarrow{r_H} e_H$. In this case, $e_H$
is the $H$-hop neighbor of $e_0$, which can be represented as $e_H \in \mathcal{N}_{e_0}^{H}$.
Note that $\mathcal{N}_{e_0}^{0}$ is $e_0$ itself.
• Relevant Entity. Given the interaction matrix $R$ and the knowledge graph
$\mathcal{G}_{know}$, the set of $k$-hop relevant entities for user $u$ can be represented as
$$\mathcal{E}_u^k = \left\{ e_t \mid (e_h, r, e_t) \in \mathcal{G} \text{ and } e_h \in \mathcal{E}_u^{k-1} \right\}, \quad k = 1, 2, \cdots, H,$$
where $\mathcal{E}_u^0 = \{ v \mid R_{uv} = 1 \}$ is the set of the user’s historical interacted
items.
• User Ripple Set. The ripple set of a user is defined as the knowledge triples with the head
entities being $(k-1)$-hop relevant entities $\mathcal{E}_u^{k-1}$,
$$\mathcal{S}_u^k = \left\{ (e_h, r, e_t) \mid (e_h, r, e_t) \in \mathcal{G} \text{ and } e_h \in \mathcal{E}_u^{k-1} \right\}, \quad k = 1, 2, \cdots, H.$$
• Entity Ripple Set. The ripple set of an entity $e \in \mathcal{G}$ is defined as
$$\mathcal{S}_e^k = \left\{ (e_h, r, e_t) \mid (e_h, r, e_t) \in \mathcal{G} \text{ and } e_h \in \mathcal{N}_e^{k-1} \right\}, \quad k = 1, 2, \cdots, H.$$

TABLE 2
Notations used in this paper.

Notations                                          | Descriptions
$u_i$                                              | User $i$
$v_j$                                              | Item $j$
$e_k$                                              | Entity $k$ in the knowledge graph
$r_k$                                              | Relation between two entities $(e_i, e_j)$ in the knowledge graph
$\hat{y}_{i,j}$                                    | Predicted user $u_i$’s preference for item $v_j$
$\mathbf{u}_i \in \mathbb{R}^{d \times 1}$         | Latent vector of user $u_i$
$\mathbf{v}_j \in \mathbb{R}^{d \times 1}$         | Latent vector of item $v_j$
$\mathbf{e}_k \in \mathbb{R}^{d \times 1}$         | Latent vector of entity $e_k$ in the KG
$\mathbf{r}_k \in \mathbb{R}^{d \times 1}$         | Latent vector of relation $r_k$ in the KG
$U = \{u_1, u_2, \cdots, u_m\}$                    | User set
$V = \{v_1, v_2, \cdots, v_n\}$                    | Item set
$\mathbf{U} \in \mathbb{R}^{d \times m}$           | Latent vector of the user set
$\mathbf{V} \in \mathbb{R}^{d \times n}$           | Latent vector of the item set
$R \in \mathbb{R}^{m \times n}$                    | User-item interaction matrix
$p_k$                                              | One path $k$ to connect two entities $(e_i, e_j)$ in the knowledge graph
$\mathcal{P}(e_i, e_j) = \{p_1, p_2, \cdots, p_s\}$ | Path set between entity pair $(e_i, e_j)$
$\Phi$                                             | Nonlinear transformation
$\odot$                                            | Element-wise product
$\oplus$                                           | Vector concatenation operation

4 METHODS OF RECOMMENDER SYSTEMS WITH KNOWLEDGE GRAPHS
In this section, we collect papers related to KG-based recommender systems. Based on how these
works utilize the KG information, we group them into three categories: embedding-based methods,
path-based methods, and unified methods. We will introduce how different methods
leverage KGs to improve the recommendation results. To facilitate readers checking the
literature, we summarize and organize these papers in Table 3, which lists their publication
information, the approach to utilize a KG for recommendation, and the techniques adopted in
these works.

Explainable recommendation has been another hot research topic in recent years. It is helpful
for users to adopt the suggestions generated by the recommender system if appropriate
explanations are provided to them [97]. Compared with traditional recommender systems, KG-based
recommendation makes the reasoning process available. In this section, we will also show how
different works leverage KGs for explainable recommendation.

4.1 Embedding-based Methods
The embedding-based methods generally use the information from the KG directly to enrich the
representation of items or users. In order to exploit the KG information, knowledge graph
embedding (KGE) algorithms need to be applied to encode the KG into low-rank embedding. KGE
algorithms can be divided into two classes [98]: translation distance models, such as TransE
[99], TransH [100], TransR [101], TransD [102], etc., and semantic matching models, such as
DistMult [103].

Based on whether users are included in the KG, embedding-based methods can be divided into two
classes. In the first type of method, KGs are constructed with items and their related
attributes, which are extracted from the dataset or external knowledge bases. We name such a
graph the item graph. Note that users are not included in such an item graph. Papers following
this strategy leverage the knowledge graph embedding (KGE) algorithms to encode the graph for a
more comprehensive representation of items, and then integrate the item side information into
the recommendation framework. The general idea can be illustrated as follows. The latent vector
$\mathbf{v}_j$ of each item $v_j$ is obtained by aggregating information from multiple sources,
such as the KG, the user-item interaction matrix, the item’s content, and the item’s attributes.
The latent vector $\mathbf{u}_i$ of each user $u_i$ can either be extracted from the user-item
interaction matrix, or be the combination of the interacted items’ embeddings. Then, the
probability of $u_i$ selecting $v_j$ can be calculated with
$$\hat{y}_{i,j} = f(\mathbf{u}_i, \mathbf{v}_j), \tag{1}$$
where $f(\cdot)$ refers to a function to map the embedding of the user and item into a preference
score, which can be the inner product, DNN, etc. In the recommendation stage, results will be
generated in descending order of the preference score $\hat{y}_{i,j}$.

For instance, Zhang et al. [2] proposed CKE, which unifies various types of side information in
the CF framework. They fed the item’s structural knowledge (item’s attributes represented with
knowledge graph) and content (textual and visual) knowledge into a knowledge base embedding
module. The latent vector of the item’s structural knowledge $\mathbf{x}_j$ is encoded with the
TransR algorithm, while the textual feature $\mathbf{z}_{t,j}$ and the visual feature
$\mathbf{z}_{v,j}$ are extracted with the autoencoder architecture. Then these representations
are aggregated along with the offset vector $\boldsymbol{\eta}_j$ extracted from the user-item
interaction matrix. The final representation of each item $v_j$ can be written as
$$\mathbf{v}_j = \boldsymbol{\eta}_j + \mathbf{x}_j + \mathbf{z}_{t,j} + \mathbf{z}_{v,j}. \tag{2}$$
After obtaining the latent vector $\mathbf{u}_i$ of the user $u_i$, the preference score
$\hat{y}_{i,j}$ is obtained via the inner product $\mathbf{u}_i^T \mathbf{v}_j$. Finally, in the
prediction stage, items are recommended to $u_i$ by the following ranking criteria:
$$v_{j_1} > v_{j_2} > \cdots > v_{j_n} \rightarrow \mathbf{u}_i^T \mathbf{v}_{j_1} > \mathbf{u}_i^T \mathbf{v}_{j_2} > \cdots > \mathbf{u}_i^T \mathbf{v}_{j_n}. \tag{3}$$
Experiments show that incorporating structural knowledge can boost the performance of
recommendation.

Wang et al. [48] proposed DKN for news recommendation. It models the news by combining the
textual embedding of sentences learned with Kim CNN [104] and the knowledge-level embedding of
entities in news content via TransD. With the incorporation of a KG for entities, high-level
semantic relations of news can be depicted in the final embedding $\mathbf{v}_j$ of news $v_j$.
In order to capture the user’s dynamic interest in news, the representation of $u_i$ is learned
by aggregating the embedding of historical clicked news $\{v_1, v_2, \cdots, v_{N_i}\}$ with an
attention mechanism. The attention weight for each news $v_k$ ($k = 1, 2, \cdots, N_i$) in the
clicked news set is calculated via
$$s_{v_k, v_j} = \frac{\exp\left(g(\mathbf{v}_k, \mathbf{v}_j)\right)}{\sum_{k=1}^{N_i} \exp\left(g(\mathbf{v}_k, \mathbf{v}_j)\right)}, \tag{4}$$
where $g(\cdot)$ is a DNN layer and $v_j$ is the candidate news. Then, the final user embedding
$\mathbf{u}_i$ is calculated via the weighted sum of clicked news embeddings:
$$\mathbf{u}_i = \sum_{k=1}^{N_i} s_{v_k, v_j} \mathbf{v}_k. \tag{5}$$
Finally, the user’s preference for candidate news $v_j$ can be calculated with Equation 1, where
$f(\cdot)$ is a DNN layer. Huang et al. [44] proposed the KSR framework for sequential
recommendation. KSR uses a GRU network with a knowledge-enhanced key-value memory network (KV-MN)
to model comprehensive user preference from the sequential interaction. The GRU network captures
the user’s sequential preference, while the KV-MN module utilizes knowledge base information
(learned with TransE) to model the user’s attribute-level preference. In this way, fine-grained
user preference can be captured for recommendation. In detail, at time $t$, the latent vector of
$u_i$ is represented as $\mathbf{u}_i^t = \mathbf{h}_i^t \oplus \mathbf{m}_i^t$, where
$\mathbf{h}_i^t$ and $\mathbf{m}_i^t$ stand for the representation of the user’s
interaction-level preference and attribute-level preference, respectively. The latent vector of
$v_j$ is represented as $\mathbf{v}_j = \mathbf{q}_j \oplus \mathbf{e}_j \cdot \mathbf{u}_i^t$,
where $\mathbf{q}_j$ is the item embedding in the GRU network, and $\mathbf{e}_j$ is the item
embedding in the KG. After transforming $\mathbf{u}_i^t$ and $\mathbf{v}_j$ to the same
dimension, the user’s preference for items is ranked with the score obtained from Equation 1,
where $f(\cdot)$ is the inner product.

The other type of embedding-based method directly builds a user-item graph, where users, items,
and their related attributes function as nodes. In the user-item graph, both attribute-level
relations (brand, category, etc.) and user-related relations (co-buy, co-view, etc.) serve as
edges. After obtaining the embeddings of entities in the graph, the user’s preference can be
calculated with Equation 1, or by further
considering the relation embedding in the graph via
$$\hat{y}_{i,j} = f(\mathbf{u}_i, \mathbf{v}_j, \mathbf{r}), \tag{6}$$
where $f(\cdot)$ maps the user representation $\mathbf{u}_i$, the item representation
$\mathbf{v}_j$, as well as the relation embedding $\mathbf{r}$ into a scalar.

TABLE 3
Table of collected papers. In the table, ‘Emb.’ stands for embedding-based method, ‘Uni.’ stands
for unified method, ‘Att.’ stands for attention mechanism, ‘RL’ stands for reinforcement
learning, ‘AE’ stands for autoencoder, and ‘MF’ stands for matrix factorization.
Original column groups: KG Usage Type (Emb., Path, Uni.) and Framework (CNN, RNN, Att., GNN, GAN,
RL, AE, MF). The per-method marks in these columns are missing from this copy.

Method             | Venue       | Year
CKE [2]            | KDD         | 2016
entity2rec [66]    | RecSys      | 2017
ECFKG [67]         | Algorithms  | 2018
SHINE [68]         | WSDM        | 2018
DKN [48]           | WWW         | 2018
KSR [44]           | SIGIR       | 2018
CFKG [13]          | SIGIR       | 2018
KTGAN [69]         | ICDM        | 2018
KTUP [70]          | WWW         | 2019
MKR [45]           | WWW         | 2019
DKFM [71]          | WWW         | 2019
SED [72]           | WWW         | 2019
RCF [73]           | SIGIR       | 2019
BEM [74]           | CIKM        | 2019
Hete-MF [75]       | IJCAI       | 2013
HeteRec [76]       | RecSys      | 2013
HeteRec p [77]     | WSDM        | 2014
Hete-CF [78]       | ICDM        | 2014
SemRec [79]        | CIKM        | 2015
ProPPR [80]        | RecSys      | 2016
FMG [3]            | KDD         | 2017
MCRec [1]          | KDD         | 2018
RKGE [81]          | RecSys      | 2018
HERec [82]         | TKDE        | 2019
KPRN [83]          | AAAI        | 2019
RuleRec [84]       | WWW         | 2019
PGPR [85]          | SIGIR       | 2019
EIUM [86]          | MM          | 2019
Ekar [87]          | arXiv       | 2019
RippleNet [14]     | CIKM        | 2018
RippleNet-agg [88] | TOIS        | 2019
KGCN [89]          | WWW         | 2019
KGAT [90]          | KDD         | 2019
KGCN-LS [91]       | KDD         | 2019
AKUPM [92]         | KDD         | 2019
KNI [93]           | KDD         | 2019
IntentGC [94]      | KDD         | 2019
RCoLM [95]         | IEEE Access | 2019
AKGE [96]          | arXiv       | 2019

Zhang et al. [13] proposed CFKG, which constructs a user-item KG. In this user-item graph, user
behaviors (purchase, mention) are regarded as one relation type between entities, and multiple
types of item side information (review, brand, category, bought-together, etc.) are included. To
learn the embedding of entities and relations in the graph, the model defines a metric function
$d(\cdot)$ to measure the distance between two entities according to a given relation. In the
recommendation phase, the system will rank candidate items $j$ in ascending order of the distance
between $u_i$ and $v_j$,
$$d(\mathbf{u}_i + \mathbf{r}_{buy}, \mathbf{v}_j), \tag{7}$$
where $\mathbf{r}_{buy}$ is the learned embedding for the relation type ‘buy’. A smaller distance
between $u_i$ and $v_j$ measured by the ‘buy’ relation corresponds to a higher preference score
$\hat{y}_{i,j}$.

Wang et al. [68] proposed SHINE, which takes the celebrity recommendation task as the sentiment
link prediction task between entities in the graph. In detail, SHINE builds a sentiment network
$G_s$ for users and targets (celebrities), and utilizes their social network $G_r$ and profile
information network $G_p$ as side information. These three
7

networks are embedded with the auto-encoder technique, and are then aggregated as the representation of the user and target. Finally, the recommendation can be generated by following Equation 1, where $f(\cdot)$ is a DNN layer. Dadoun et al. [71] proposed DKFM for POI recommendation. DKFM applies TransE over a city KG to enrich the representation of the destination, which shows improvement in the performance of POI recommendation.

Previous works generally utilize the raw latent vector of structural knowledge learned with the KGE technique directly for recommendation. Recently, some papers have tried to improve the recommendation performance by refining the learned entity/relation representation. For instance, Yang et al. [69] introduced a GAN-based model, KTGAN, for movie recommendation. In the first phase, KTGAN learns the knowledge embedding $v_j^k$ for movie $v_j$ by applying the Metapath2Vec model [105] on the movie's KG, and the tag embedding $v_j^t$ with the Word2Vec model [106] on the movie's attributes. The initial latent vector of movie $v_j$ is represented as $v_j^{initial} = v_j^k \oplus v_j^t$. Similarly, the initial latent vector of user $u_i$ is represented as $u_i^{initial} = u_i^k \oplus u_i^t$, where $u_i^k$ is the average of knowledge embeddings of $u_i$'s favored movies, and $u_i^t$ is $u_i$'s tag embedding. Then, a generator $G$ and a discriminator $D$ are proposed to refine the initial representations of users and items. The generator $G$ tries to generate relevant (favorite) movies for user $u_i$ according to the score function $p_\theta(v_j | u_i, r)$, where $r$ denotes the relevance between $u_i$ and $v_j$. During the training process, $G$ aims to let $p_\theta(v_j | u_i, r)$ approximate $u_i$'s true favorite movie distribution $p_{true}(v_j | u_i, r)$, so that $G$ can select relevant user-movie pairs. The discriminator $D$ is a binary classifier that distinguishes relevant user-movie pairs from irrelevant pairs according to the learned score function $f_\phi(u_i, v_j)$. The objective function of the GAN module is written as

$$\mathcal{L} = \min_{\theta} \max_{\phi} \sum_{i=1}^{M} \left\{ \mathbb{E}_{v_j \sim p_{true}(v_j | u_i, r)} \left[ \log P(v_j | u_i) \right] + \mathbb{E}_{v_j \sim p_{\theta}(v_j | u_i, r)} \left[ \log \left( 1 - P(v_j | u_i) \right) \right] \right\}, \tag{8}$$

where $P(v_j | u_i) = \frac{1}{1 + \exp(-f_\phi(u_i, v_j))}$ stands for the probability of movie $v_j$ being preferred by user $u_i$. After the adversarial training, optimal representations of $u_i$ and $v_j$ are learned and movies can be ranked with $G$'s score function $p_\theta(v_j | u_i, r)$. Later, Ye et al. [74] proposed BEM, which uses two types of graphs for items, the knowledge-related graph (containing item attribute information, like brand, category, etc.) and the behavior graph (containing item interaction-related information, including co-buy, co-rate, co-add to cart) for recommendation. BEM first learns the initial embeddings from the knowledge-related graph and the behavior graph with the TransE model and a GNN-based model, respectively. Then, BEM applies a Bayesian framework to refine these two types of embeddings mutually. Recommendations can be generated by finding the closest items to the interacted items in the behavior graph, which are measured by the relation of 'co-buy' or 'co-click'.

Another trend is to adopt the strategy of multi-task learning, to jointly learn the recommendation task with the guidance of the KG-related task. Generally, in the recommendation task, a function $f(u_i, v_j)$ is learned from the user-item interaction matrix to contrast the observed interaction pair $(u_i, v_j)$ and unobserved interaction pair $(u_i, v_j')$; in the KG-related task, another function $g(e_h, r, e_t)$ is learned to determine whether $(e_h, r, e_t)$ is a valid triplet in the KG. These two parts are connected with the following objective function,

$$\mathcal{L} = \mathcal{L}_{rec} + \lambda \mathcal{L}_{KG}, \tag{9}$$

where $\mathcal{L}_{rec}$ is the loss function for recommendation, $\mathcal{L}_{KG}$ is the loss function for the KG-related task, and $\lambda$ is the hyper-parameter to balance the two tasks. A general motivation for multi-task learning is that item embeddings in the recommendation module share features with the associated entity embeddings in the KG.

Cao et al. [70] proposed KTUP to jointly learn the task of recommendation and knowledge graph completion. In the recommendation module, the loss function is defined as

$$\mathcal{L}_{rec} = \sum_{(u, v, v') \in \mathcal{R}} -\log \sigma \left[ f(u, v', p') - f(u, v, p) \right], \tag{10}$$

where $(u, v)$ is the observed user-item pair in the user-item interaction matrix ($R_{uv} = 1$); $(u, v')$ denotes the unobserved user-item pair ($R_{uv'} = 0$); $p$ denotes the latent vector of the user's preference for the given item; $f(\cdot)$ is the proposed translation-based model, TUP, to model the correctness of such a user-item pair; and $\sigma$ is the sigmoid function. For the KG completion module, a hinge loss is adopted,

$$\mathcal{L}_{KG} = \sum_{(e_h, r, e_t) \in \mathcal{G}} \sum_{(e_h', r', e_t') \in \mathcal{G}^-} \left[ g(e_h, r, e_t) + \gamma - g(e_h', r', e_t') \right]_+, \tag{11}$$

where $\mathcal{G}^-$ is constructed by replacing $e_h$ or $e_t$ in the valid triplet $(e_h, r, e_t) \in \mathcal{G}$; $g(\cdot)$ is the TransH model, and a lower $g(e_h, r, e_t)$ value indicates a higher correctness of such a triplet; $[\cdot]_+ \triangleq \max(0, \cdot)$; and $\gamma$ is the margin between correct triplets and incorrect triplets. The recommendation module is to mine the preference relation between user $u$ and item $v$, while the knowledge completion task is to mine the relation among items in the KG. The bridge between these two modules is that items can be aligned with corresponding entities in the KG, and the user's preference is related to relations among entities in the KG. Hence, embeddings of items and preferences can be enriched by transferring knowledge of entities, relations and preference in each module under the framework of KTUP. Meanwhile, Wang et al. [45] proposed MKR, which consists of a recommendation module and a KGE module. The former learns latent representations for users and items, while the latter learns representations for item-associated entities with the semantic matching KGE model. These two parts are connected with a cross & compress unit to transfer knowledge and share regularization of items in the recommendation module and entities in the KG. Xin et al. [73] proposed RCF, which introduces a hierarchical description of items, including both the relation type embedding and the relation value embedding. RCF utilizes the DistMult model for KGE to preserve the relational structure between items. Then, it models the user's type-level preference and value-level preference separately with the attention mechanism. With the joint training of the recommendation module and the KG relation modeling module, decent recommendations can be made.
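
As a concrete illustration of the multi-task objective in Equations 9 to 11, the following minimal NumPy sketch combines a pairwise recommendation loss with a margin-based KG loss. It is not the implementation of KTUP or of any cited model: a plain squared translation distance stands in for both the TUP scoring function $f(\cdot)$ and the TransH scoring function $g(\cdot)$, each valid triplet is paired with a single corrupted triplet instead of the full double sum of Equation 11, and all function names and default values (`margin`, `lam`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def translation_distance(head, rel, tail):
    """Squared L2 distance ||head + rel - tail||^2; lower means a better match.

    Assumption: this simple distance replaces the TUP / TransH scorers.
    """
    return np.sum((head + rel - tail) ** 2, axis=-1)


def rec_loss(u, p_pos, v_pos, p_neg, v_neg):
    """Pairwise loss in the form of Equation 10, summed over sampled triples."""
    f_pos = translation_distance(u, p_pos, v_pos)  # observed pair (u, v)
    f_neg = translation_distance(u, p_neg, v_neg)  # unobserved pair (u, v')
    return np.sum(-np.log(sigmoid(f_neg - f_pos)))


def kg_loss(h, r, t, h_neg, r_neg, t_neg, margin=1.0):
    """Hinge loss in the form of Equation 11, one corrupted triplet per valid one."""
    g_pos = translation_distance(h, r, t)
    g_neg = translation_distance(h_neg, r_neg, t_neg)
    return np.sum(np.maximum(0.0, g_pos + margin - g_neg))


def joint_loss(l_rec, l_kg, lam=0.5):
    """Equation 9: L = L_rec + lambda * L_KG."""
    return l_rec + lam * l_kg
```

In a full model, the two losses would be minimized jointly over shared item and entity embeddings with a gradient-based optimizer; the sketch only evaluates the objective for given embeddings.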

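The translation-based ranking of Equation 7 can be sketched in the same spirit. The snippet below assumes that $d(\cdot)$ is the Euclidean distance, which the description of CFKG above does not specify, and the function name `rank_by_translation` and its arguments are illustrative only.

```python
import numpy as np


def rank_by_translation(user_vec, rel_vec, item_matrix):
    """Rank candidate items by ascending d(u + r, v), as in Equation 7.

    Assumption: d is the Euclidean distance. Returns the item indices
    ordered from most to least preferred, together with the distances.
    """
    target = user_vec + rel_vec  # translated user embedding u + r_buy
    distances = np.linalg.norm(item_matrix - target, axis=1)
    order = np.argsort(distances, kind="stable")
    return order, distances
```

Items whose embeddings lie closest to the translated user embedding appear first in `order`, which matches the statement that a smaller distance corresponds to a higher preference score.
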
Summary for Embedding-based Methods. Most embedding-based methods [2], [44], [45], [48], [69], [70], [72], [73], [74] build KGs with multiple types of item side information to enrich the representation of items, and such information can be used to model the user representation more precisely. Some models [13], [66], [67], [68], [71] build user-item graphs by introducing users into the graph, which can directly model the user preference. Entity embedding is the core of embedding-based methods, and some papers refine the embedding with GAN [69] or BEM [74] for better recommendation. Embedding-based methods leverage the information in the graph structure intrinsically. Papers [45], [70], [73] apply the strategy of multi-task learning to jointly train the recommendation module along with the graph-related task to improve the quality of recommendation.

4.2 Path-based Methods

Path-based methods build a user-item graph and leverage the connectivity patterns of the entities in the graph for recommendation. Path-based methods have been developed since 2013, and traditional papers refer to this type of method as recommendation in the HIN. In general, these models take advantage of the connectivity similarity of users and/or items to enhance the recommendation. To measure the connectivity similarity between entities in the graph, PathSim [107] is commonly used. It is defined as

$$s_{x,y} = \frac{2 \times |\{p_{x \rightsquigarrow y} : p_{x \rightsquigarrow y} \in \mathcal{P}\}|}{|\{p_{x \rightsquigarrow x} : p_{x \rightsquigarrow x} \in \mathcal{P}\}| + |\{p_{y \rightsquigarrow y} : p_{y \rightsquigarrow y} \in \mathcal{P}\}|}, \tag{12}$$

where $p_{m \rightsquigarrow n}$ is a path between the entities $m$ and $n$.

One type of path-based method leverages semantic similarities of entities in different meta-paths as the graph regularization to refine the representation of users and items in the HIN. Then, $u_i$'s preference for $v_j$ can be predicted by following Equation 1, where $f(\cdot)$ refers to the inner product. Three types of entity similarities are commonly utilized,

• User-User Similarity: the objective function for this term is

$$\min_{U, \Theta} \sum_{l=1}^{L} \theta_l \sum_{i=1}^{m} \sum_{j=1}^{m} s_{i,j}^{l} \| u_i - u_j \|_F^2, \tag{13}$$

where $\| \cdot \|_F$ denotes the matrix Frobenius norm, $\Theta = [\theta_1, \theta_2, \cdots, \theta_L]$ denotes the weight for each meta-path, $U = [u_1, u_2, \cdots, u_m]$ denotes latent vectors of all users, and $s_{i,j}^{l}$ denotes the similarity score of users $i$ and $j$ in meta-path $l$. The user-user similarity forces the embeddings of users to be close in the latent space if users share high meta-path-based similarity.

• Item-Item Similarity: the objective function for this term is

$$\min_{V, \Theta} \sum_{l=1}^{L} \theta_l \sum_{i=1}^{n} \sum_{j=1}^{n} s_{i,j}^{l} \| v_i - v_j \|_F^2, \tag{14}$$

where $V = [v_1, v_2, \cdots, v_n]$ denotes latent vectors of all items. Similar to the user-user similarity, the low-rank representations of items should be close if their meta-path-based similarity is high.

• User-Item Similarity: the objective function for this term is

$$\min_{U, V, \Theta} \sum_{l=1}^{L} \theta_l \sum_{i=1}^{m} \sum_{j=1}^{n} \left( u_i^T v_j - s_{i,j}^{l} \right)^2. \tag{15}$$

The user-item similarity term will force the latent vectors of users and items to be close to each other if their meta-path-based similarity is high.

Yu et al. [75] proposed Hete-MF, which extracts $L$ different meta-paths and calculates item-item similarity in each path. The item-item regularization is integrated with the weighted non-negative matrix factorization method [108] to refine the low-rank representation of users and items for better recommendation. Later, Luo et al. [78] proposed Hete-CF to find the user's affinity to unrated items by taking the user-user similarity, item-item similarity, and user-item similarity together as regularization terms. Therefore, Hete-CF outperforms the Hete-MF model.

Yu et al. [76] proposed HeteRec, which leverages the meta-path similarities to enrich the user-item interaction matrix $R$, so that more comprehensive representations of users and items can be extracted. HeteRec first defines $L$ different types of meta-paths that connect users and items in the HIN. The item-item similarity in each path is measured with PathSim [107], which further forms $L$ item-item similarity matrices $S^{(l)} \in \mathbb{R}^{n \times n}$, where $l = 1, 2, \cdots, L$. Next, $L$ diffused user preference matrices $\tilde{R}^{(l)}$ are calculated via the equation $\tilde{R}^{(l)} = R S^{(l)}$. Then $L$ refined latent vectors of users and items in different meta-paths can be obtained by applying the non-negative matrix factorization technique [109] on these diffused user preference matrices,

$$\left( \hat{U}^{(l)}, \hat{V}^{(l)} \right) = \operatorname{argmin}_{U, V} \left\| \tilde{R}^{(l)} - U^T V \right\|_F^2 \quad \text{s.t. } U \geq 0, V \geq 0. \tag{16}$$

Finally, the recommendation can be generated by combining the user's preference on each path, with the scoring function

$$\hat{y}_{i,j} = \sum_{l=1}^{L} \theta_l \cdot \hat{u}_i^{(l)T} \hat{v}_j^{(l)}, \tag{17}$$

where $\theta_l$ is the weight for the user-item latent vector pair in the $l$-th path.

Later, Yu et al. [77] proposed HeteRec-p, which further considers that the importance of different meta-paths should vary for different users. HeteRec-p first clusters users based on their past behaviors into $c$ groups and generates personalized recommendations with the clustering information, instead of applying a global preference model. The modified scoring function becomes

$$\hat{y}_{i,j} = \sum_{k=1}^{c} \operatorname{sim}(C_k, u_i) \sum_{l=1}^{L} \theta_l^k \cdot \hat{u}_i^{(l)T} \hat{v}_j^{(l)}, \tag{18}$$

where $\operatorname{sim}(C_k, u_i)$ denotes the cosine similarity between user $u_i$ and the target user group $C_k$, and $\theta_l^k$ denotes the importance of meta-path $l$ for the user group $k$.

To overcome the limitation of the meta-path's representation ability, Zhao et al. [3] designed FMG by replacing the meta-path with the meta-graph. As a meta-graph contains richer connectivity information than a meta-path, FMG can capture the similarity between entities more accurately. Then, the model utilizes matrix factorization (MF) to generate the latent vectors for both users and items in each meta-graph. Next, the factorization machine (FM) is applied
to fuse the features of users and items across different meta-graphs for computing the preference score $\hat{y}_{i,j}$. The FM considers the interaction of entities along different meta-graphs, which can further exploit connectivity patterns.

The above-mentioned path-based methods only utilize the data of the user's favored interacted items. Shi et al. [79] proposed SemRec, which considers the user's interactions with both favored and hated past items. This framework utilizes a weighted HIN and weighted meta-paths to integrate attribute values in the link. By modeling both positive and negative preference patterns, more accurate item relations and user similarity can be depicted via these paths to propagate the real user preference.

Another disadvantage of previous methods is the tedious requirement of tuning hyper-parameters, for example, the number of selected meta-paths. To lighten the burden, Ma et al. [84] proposed RuleRec to learn relations between associated items (co-buy, co-view, etc.) by exploiting the item's connectivity in an external KG. RuleRec jointly trains a rule learning module and an item recommendation module. The rule learning module first links items with associated entities in an external KG. Next, it summarizes explainable rules, which are in the form of meta-paths in the KG. The corresponding weight for each rule is further learned. Then, the item recommendation module integrates the learned rules and rule weights with the user purchase history to generate recommendations with the MF technique. Since the rules and rule weights are explicit, this model makes the recommendation process explainable.

Recently, some frameworks have been proposed to learn the explicit embedding of paths that connect user-item pairs in order to directly model the user-item relations. Assume there are $K$ paths that connect $u_i$ and $v_j$ in the KG, and the embedding of path $p$ is represented as $h_p$. Then, the final representation of the interaction between $u_i$ and $v_j$ can be obtained via

$$h = g(h_p), \quad p = 1, 2, \cdots, K, \tag{19}$$

where $g(\cdot)$ is the function to summarize the information from each path embedding, which can be a max-pooling operation or a weighted sum operation. Then, $u_i$'s preference for $v_j$ can be modeled via

$$\hat{y}_{i,j} = f(u_i, v_j, h), \tag{20}$$

where $f(\cdot)$ is the function to map the representation of the interaction between the user-item pair as well as the embedding of the user-item pair to a preference score. A common selection for $f(\cdot)$ is a fully-connected layer.

For instance, Hu et al. [1] proposed MCRec, which learns the explicit representations of meta-paths to depict the interaction context of user-item pairs. For each $u_i$ and $v_j$, MCRec first uses a lookup layer to embed the user-item pair. Next, it defines $L$ meta-paths that connect $u_i$ and $v_j$ and samples $K$ path instances for each meta-path. These path instances are embedded with a CNN to obtain the representation of each path instance $h_p$. Then, meta-path embeddings are calculated by applying the max-pooling operation on embeddings of path instances that belong to each type of meta-path. These meta-path embeddings are aggregated to obtain the final interaction embedding $h$ via an attention mechanism. The representations of the user and item also get updated via the attention mechanism with the final interaction embedding $h$. Finally, the preference score is calculated via Equation 20, where $f(\cdot)$ is an MLP layer. Sun et al. [81] proposed a recurrent knowledge graph embedding (RKGE) approach that mines the path relation between user $u_i$ and item $v_j$ automatically, without manually defining meta-paths. Specifically, RKGE first enumerates user-to-item paths $\mathcal{P}(u_i, v_j)$ that connect $u_i$ and $v_j$ with different semantic relations under a sequence length constraint. Then, each path constructed by the entity embedding sequence is fed into a recurrent network to encode the entire path. Next, following Equation 19, the final hidden states $h_p$ of all these paths are aggregated via the average-pooling operation to model the semantic relation $h$ between $u_i$ and $v_j$. Finally, the preference of $u_i$ for $v_j$ is estimated with $h$, and Equation 20 becomes $\hat{y}_{i,j} = f(h)$, where $f(\cdot)$ is a fully-connected layer. By leveraging the information of semantic paths between entity pairs, a better representation for $u_i$ and $v_j$ is obtained and further integrated into the recommendation generation. Similarly, Wang et al. [83] proposed a knowledge-aware path recurrent network (KPRN) solution. KPRN constructs the extracted path sequence with both the entity embedding and the relation embedding. These paths are encoded with an LSTM layer and the preference of $u_i$ for $v_j$ in each path is predicted through fully-connected layers. By aggregating the score in each path via a weighted pooling layer, the final estimation of preference can be used for recommendation.

Huang et al. [86] designed EIUM, which captures users' dynamic interests for sequential recommendation. The recommendation module follows the scheme in Equations 19 and 20. First, each path connecting the user-item pair is encoded and aggregated to obtain the interaction embedding $h$ of the user-item pair $(u_i, v_j)$. The dynamic preference embedding $p$ is further obtained by applying the attention mechanism on the interaction sequence. The preference score can be modeled via $\hat{y}_{i,j} = f(h, p)$. Besides the path-based recommendation module, EIUM further integrates a multi-modal fusion constraint module. This module introduces the KG structural constraint into the framework,

$$\begin{aligned} c2c &: e_h^{f_c} + r \approx e_t^{f_c}, & s2s &: e_h^{f_s} + r \approx e_t^{f_s}, \\ c2s &: e_h^{f_c} + r \approx e_t^{f_s}, & s2c &: e_h^{f_s} + r \approx e_t^{f_c}, \end{aligned} \tag{21}$$

where $(e_h, r, e_t) \in \mathcal{G}$, $f_c$ denotes the content feature (textual, visual), and $f_s$ denotes the structural feature. The loss function of this module is

$$\mathcal{L}_{KG} = \mathcal{L}_{c2c} + \mathcal{L}_{s2s} + \mathcal{L}_{c2s} + \mathcal{L}_{s2c} = \frac{1}{4} \sum_{i} \| h + r - t \|, \quad i \in \{c2c, s2s, c2s, s2c\}. \tag{22}$$

This term can refine features of entities under the structural constraint of the KG. In this way, more accurate recommendations can be generated.

Recently, Xian et al. [85] proposed Policy-Guided Path Reasoning (PGPR) to use reinforcement learning (RL) to search for reasonable paths between user-item pairs. They formulated the recommendation problem as a Markov decision process to find a reasonable path connecting the user-item pair in the KG. They trained an agent to sample
paths between users and items by carefully designing the path searching algorithm, the transition strategy, terminal conditions, and RL rewards. In the prediction phase, PGPR can generate recommended items for users with specific paths to interpret the reasoning process. Later, Song et al. [87] proposed a similar model, EKar*, which adopts the RL technique in generating recommendations as well.

Summary for Path-based Methods. Path-based methods generate recommendations based on user-item graphs, and such methods have also been called HIN-based recommendation in the past. Traditional path-based methods [3], [75], [76], [77], [78], [79], [82] generally integrate MF with extracted meta-paths in HINs. These methods utilize path connectivity to regularize or enrich the user and/or item representation. The disadvantage of these methods is that they commonly need domain knowledge to define the type and number of meta-paths. RuleRec [84] tries to overcome the limitation by exploiting rules in an external KG in an automatic fashion. With the development of deep learning techniques, different models [1], [81], [83], [85], [86], [87] have been proposed to encode the path embedding explicitly. Recommendations can be generated with the path embeddings, or by discovering the most salient paths that connect user-item pairs.

Path-based methods naturally bring interpretability into the recommendation process. For traditional path-based methods, the motivation is to match the similarity of the item or user on the meta-path level. The recommendation results can find a reference from the pre-defined meta-paths. RuleRec utilizes an external KG to generate rules for recommendation. Since the rule and corresponding weight are explicit, the reason for recommendation is also available to users. More recent works take advantage of deep learning models to mine salient paths for a user-item pair automatically, which reflects the recommendation process in the graph.

4.3 Unified Methods

As discussed in Section 4.1 and Section 4.2, embedding-based methods leverage the semantic representation of users/items in the KG for recommendation, while path-based methods use the semantic connectivity information, and both approaches utilize only one aspect of information in the graph. To fully exploit the information in the KG for better recommendations, unified methods which integrate both the semantic representation of entities and relations, and the connectivity information have been proposed. The unified method is based on the idea of embedding propagation. These methods refine the entity representation with the guidance of the connective structure in the KG. After obtaining the enriched representations of user $u_i$ and/or the potential item $v_j$, the user's preference can be predicted with Equation 1.

The first group of works refines the user's representation from their interaction history. These works first extract multi-hop ripple sets $\mathcal{S}_{u_i}^{k}$ ($k = 1, 2, \cdots, H$) (defined in Section 3), where $\mathcal{S}_{u_i}^{1}$ is the triple set $(e_h, r, e_t)$ in the graph with the head entities being the user $u_i$'s engaged items. The general idea of this method is to learn the user embedding by utilizing the embeddings of past interacted items as well as multi-hop neighbors of these interacted items. The process of learning user representation $u_i$ can be written in a general form as

$$u_i = g_u \left( \left\{ \mathcal{S}_{u_i}^{k} \right\}_{k=1}^{H} \right), \tag{23}$$

where $g_u(\cdot)$ is a function to concatenate embeddings of multi-hop entities with bias. Since the propagation starts from the user's engaged items, this process can be regarded as propagating the user's preference in the graph.

Wang et al. [14] proposed RippleNet, which is the first work to introduce the concept of preference propagation. Specifically, RippleNet first assigns entities in the KG with initial embeddings. Then it samples ripple sets $\mathcal{S}_{u_i}^{k}$ ($k = 1, 2, \cdots, H$) from the KG. To refine the user representation, the aggregation process can be illustrated as follows. Starting from $\mathcal{S}_{u_i}^{1}$, every head entity interacts with the embedding of the candidate item $v_j$ in turn via

$$p_i = \frac{\exp \left( v_j^T R_i e_{h_i} \right)}{\sum_{(e_{h_k}, r_k, e_{t_k}) \in \mathcal{S}_{u_i}^{1}} \exp \left( v_j^T R_k e_{h_k} \right)}, \tag{24}$$

where $R_i \in \mathbb{R}^{d \times d}$ represents the embedding of relation $r_i$, and $e_{h_i} \in \mathbb{R}^{d}$ is the embedding of the head entity in the ripple set. During this process, the similarities of the candidate item $v_j$ and head entities are calculated in the relation space. Then, the user's 1-order response of historical interaction can be calculated via

$$o_{u_i}^{1} = \sum_{(e_{h_i}, r_i, e_{t_i}) \in \mathcal{S}_{u_i}^{1}} p_i e_{t_i}, \tag{25}$$

where $e_{t_i}$ represents the embedding of the tail entity in the ripple set. The user's $h$-order ($h = 2, 3, \cdots, H$) response $o_{u_i}^{h}$ can be obtained by replacing $v_j$ with the $(h-1)$-order response $o_{u_i}^{h-1}$ in Equation 24, then interacting with head entities in the $h$-hop ripple set $\mathcal{S}_{u_i}^{h}$ iteratively. The final representation of $u_i$ can be obtained with the equation $u_i = o_{u_i}^{1} + o_{u_i}^{2} + \cdots + o_{u_i}^{H}$. Finally, the preference score can be generated with

$$\hat{y}_{i,j} = \sigma \left( u_i^T v_j \right), \tag{26}$$

where $\sigma(x)$ is the sigmoid function. In this way, RippleNet propagates the user's preference from historical interests along the path in the KG.

Similar to RippleNet, Tang et al. [92] proposed AKUPM, which models users with their click history. AKUPM first applies TransR for the entity representation. During each propagation process, AKUPM learns the relations between entities with a self-attention layer and propagates the user's preference toward different entities with bias. Finally, embeddings from different-order neighbors of interacted items are aggregated with the self-attention mechanism to obtain the final user representation. Later, Li et al. [95] extended AKUPM and designed RCoLM. RCoLM jointly trains the KG completion module and the recommendation module, where AKUPM serves as the backbone. With the assumption that an item should have the same latent representation in the two modules, RCoLM unifies the two modules and facilitates their mutual enhancement. Thus, RCoLM outperforms
the AKUPM model.

The second group of works focuses on refining the item representation $v_j$ by aggregating embeddings of an item's multi-hop neighbors $\mathcal{N}_v^k$ ($k = 1, 2, \cdots, H$). A general description for this process is

$$v_j = g_v \left( \left\{ \mathcal{S}_{v_j}^{k} \right\}_{k=1}^{H} \right), \tag{27}$$

where $\mathcal{S}_{v_j}^{k}$ is the ripple set of candidate item $v_j$, and $g_v(\cdot)$ is the function to concatenate embeddings of multi-hop neighbors. There are two steps to concatenate the embeddings of multi-hop neighbors. The first step is to learn a representation of candidate item $v_j$'s $k$-hop neighbors,

$$e_{\mathcal{S}_{v_j}^{k}} = \sum_{(e_h, r, e_t) \in \mathcal{S}_{v_j}^{k}} \alpha_{(e_h, r, e_t)} e_t, \tag{28}$$

where $\alpha_{(e_h, r, e_t)}$ denotes the importance of different neighbors. Then for $e_h \in \mathcal{S}_{v_j}^{k}$, the representation can be updated by

$$e_h = \operatorname{agg} \left( e_h, e_{\mathcal{S}_{v_j}^{k}} \right), \tag{29}$$

where $\operatorname{agg}$ is the aggregation operator. During this process, the information of $k$-hop neighbors is aggregated with that of $(k-1)$-hop neighbors. Four types of aggregators are commonly used:

• Sum Aggregator. The sum aggregator sums two representations, followed by a nonlinear transformation.

$$\operatorname{agg}_{sum} = \Phi \left( W \cdot \left( e_h + e_{\mathcal{S}_{v_j}^{k}} \right) + b \right). \tag{30}$$

• Concat Aggregator. The concat aggregator concatenates two representations, then applies a nonlinear transformation.

$$\operatorname{agg}_{concat} = \Phi \left( W \cdot \left( e_h \oplus e_{\mathcal{S}_{v_j}^{k}} \right) + b \right). \tag{31}$$

• Neighbor Aggregator. The neighbor aggregator directly replaces the representation of an entity with representations from neighbors.

$$\operatorname{agg}_{neighbor} = \Phi \left( W \cdot e_{\mathcal{S}_{v_j}^{k}} + b \right). \tag{32}$$

• Bi-Interaction Aggregator. The bi-interaction aggregator considers both the sum and the element-wise product relations between entities. The second term allows more information to be passed from similar entities.

$$\operatorname{agg}_{Bi\text{-}Interaction} = \Phi \left( W \cdot \left( e_h + e_{\mathcal{S}_{v_j}^{k}} \right) + b \right) + \Phi \left( W \cdot \left( e_h \odot e_{\mathcal{S}_{v_j}^{k}} \right) + b \right). \tag{33}$$

Wang et al. [89] proposed KGCN, which models the final representation of a candidate item $v_j$ by aggregating the embedding of entities in the KG from distant neighbors of $v_j$ to $v_j$ itself. KGCN first samples neighbors of the candidate item $v_j$ in the KG, and it iteratively samples a fixed number of neighbors for each entity. Starting from the $H$-hop neighbors, it updates the representation of inner entities by replacing $k = H, H-1, \cdots, 1$ iteratively in Equation 29. During the aggregation process, the information of multi-hop neighbors can be propagated to the candidate item $v_j$ inwardly. After this feature propagation process, the final representation of item $v_j$ is a mixture of its initial representation and information from multi-hop neighbors. RippleNet and KGCN are two similar frameworks: the former models users by propagating the user's preference from historical interests outwardly, while the latter learns item representations from distant neighbors inwardly. Moreover, KGCN leverages the idea of GCN by sampling a fixed number of neighbors as the receptive field, which makes the learning process highly efficient and scalable. Recently, Wang et al. [91] proposed a follow-up approach, KGCN-LS, which further adds a label smoothness (LS) mechanism on the KGCN model. The LS mechanism takes the information of user interaction and propagates the user interaction labels on the KG, which is able to guide the learning process and obtain a comprehensive representation for the candidate item $v_j$.

RippleNet and its extension focus on using the embedding propagation mechanism on the item KG. Recently, some papers have explored the propagation mechanism in the user-item graph. Wang et al. [90] proposed KGAT, which directly models the high-order relations between users and items with embedding propagation. KGAT first applies TransR to obtain the initial representation for entities. Then, it runs the entity propagation from the entity itself outwardly. During the outward propagation process, information from the entity $e_i$ interacts with the multi-hop neighbors iteratively. Equation 29 can be modified as

$$e_i^{k+1} = \operatorname{agg} \left( e_i^{k}, e_{\mathcal{S}_{e_i}^{k+1}} \right), \quad k = 0, 1, \cdots, H-1, \tag{34}$$

where $e_i^{0}$ represents the initial representation of the entity, and $e_i^{k}$ contains the connectivity information from $k$-hop neighbors. These $H$ embeddings $e_i^{k}$ are aggregated with bias to form the final representation $e_i^{*}$. In this way, both the user representation and the item representation can be enriched with corresponding neighbors. The user preference is modeled via $\hat{y}_{u,v} = e_u^{*T} e_v^{*}$, where $e_u^{*}$ and $e_v^{*}$ stand for the final representation of the user $u$ and item $v$, respectively.

Qu et al. [93] proposed KNI, which further considers the interaction between item-side neighbors and user-side neighbors, so that the refinement processes of user embeddings and item embeddings are not separated. Zhao et al. [94] proposed IntentGC, which exploits rich user-related behaviors in the graph for better recommendation. They also designed a faster graph convolutional network to guarantee the scalability of IntentGC. Recently, Sha et al. [96] proposed AKGE, which learns the representation of user $u_i$ and candidate item $v_j$ by propagating information in a subgraph of this user-item pair. AKGE first pre-trains the embeddings of entities in the graph with TransR, then samples several paths connecting $u_i$ and $v_j$ based on the pairwise distance in these paths, which forms a subgraph for $u_i$ and $v_j$. Next, AKGE uses an attention-based GNN in this subgraph to propagate the information from neighbors for the final representation of this user-item pair. The construction of the subgraph filters out less related entities in the graph, facilitating the mining of high-order user-item relations for recommendation.

Summary for Unified Methods. Unified methods benefit from both the semantic embedding of the KG and semantic
path patterns. These methods leverage the idea of embedding propagation to refine the representation of the item or user with multi-hop neighbors in the KG. These works generally adopt a GNN-based architecture that naturally fits the process of embedding propagation, and such methods have been a new research trend since RippleNet [14] was proposed in 2018. Unified methods inherit interpretability from path-based methods. The propagation process can be treated as discovering the user's preference patterns in the KG, which is similar to finding connectivity patterns in path-based methods.

4.4 Summary

Embedding-based methods preprocess the KG, either item graph or user-item graph, with KGE methods to obtain the embedding of entities and relations, which is further integrated into the recommendation framework. However, the informative connectivity patterns in the graph are ignored in this approach and few works can provide the recommendation results with reasons. Path-based methods utilize the user-item graph to discover path-level similarity for items, either by predefining meta-paths or mining connective patterns automatically. The path-based approach can also provide users with an explanation for the result. A recent research trend is to unify the embedding-based method and the path-based method to fully exploit information from both sides. Moreover, unified methods also have the ability to explain the recommendation process.

5 DATASETS OF RECOMMENDER SYSTEMS WITH KNOWLEDGE GRAPH

Besides the benefit of accuracy and interpretability, another advantage of KG-based recommendation is that this type of side information can be naturally incorporated into recommender systems for different applications. To show the effectiveness of the KG as side information, KG-based recommender systems have been evaluated on datasets under different scenarios. In this section, we categorize these works based on the dataset and illustrate the difference among these scenarios. The contributions of this section are two-fold. First, we provide an overview of datasets used un-

is crawled from Douban [112], a popular Chinese social media network. The dataset includes the social relation among users and the attributes of users and movies.

There are different ways to construct the movie-related KG for recommendation. Some papers [2], [14], [44], [45], [69], [70], [73], [88], [89], [91], [92], [93], [95] construct the movie-centric item graph to enrich the information of movies by extracting movies and related attributes from Satori, DBpedia, Freebase, CN-DBPedia, or IMDB [113]. In this way, movies are connected via attributes, including genres, countries, actors, directors, etc. This item graph serves as side information to facilitate the collaborative filtering module. Another approach is to directly take the user's rating as one type of relation and introduce the user to the graph. Some papers [1], [79], [82] build the user-item graph by directly leveraging the interaction data and attributes of movies inside the MovieLens dataset or the DoubanMovie dataset, while others [66], [75], [76], [77], [80], [81], [83], [86], [87], [96] still utilize an external database to enrich the movie-side information.

• Book. Book recommendation is another popular task. There are five commonly used datasets: Book-Crossing [114], Amazon-Book [115], DoubanBook, DBbook2014, and IntentBooks [116]. Book-Crossing, DBbook2014, IntentBooks, and Amazon-Book contain binary feedback between users and books, and the KG for each dataset is built by mapping books to corresponding entities in Satori [2], [14], [45], [88], [89], [91], [92], [93], [95], DBpedia [70], [87], or Freebase [44], [90], [93]. The DoubanBook dataset is crawled from Douban [117], and contains both the user-item interaction data and book attributes, such as information about the author, publisher, and the year of publication. This work [82] builds the user-item graph by utilizing this knowledge in the DoubanBook dataset without the assistance of an external KG.

• Music. Last.FM [118] is the most popular dataset for music recommendation. The dataset contains information about users and their music listening records from the Last.fm online music system [119]. Some papers [44], [45], [89], [90], [91] construct the item graph by extracting music-related subgraphs from Freebase or Satori. Some papers [87], [96] build the user-item graph with knowledge from Freebase or Satori, while this paper [1] builds the user-item graph from
der various scenarios. Second, we illustrate how knowledge the Last.FM dataset directly. Another popular dataset is the
graphs are constructed for different recommendation tasks. KKBox dataset, which was released by the WSDM Cup 2018
This section can help researchers find suitable datasets to Challenge [120]. This dataset contains both the user-item
test their recommender systems. interaction data and the description of the music. Paper [73]
We group KG based recommender systems according to builds the item graph and [83] builds the user-item graph
the datasets which are summarized in Table 4. Generally, from this dataset without leveraging any external databases.
these works can be categorized into seven application sce- • Product. The most popular dataset for the product rec-
narios and we will illustrate how different works construct ommendation task is the Amazon Product dataset [115].
the KG with each dataset. This dataset includes multiple types of item and user infor-
• Movie. In this task, the recommender system needs to mation, such as interaction records, user reviews, product
infer the user’s preference based on movies watched in categories, product descriptions, and user behaviors. These
the past. Two datasets are most commonly used: Movie- works [3], [13], [67], [85], [94] build a user-item graph
Lens [110] and DoubanMovie. MovieLens maintains a set with this dataset alone, and [84] build the item graph by
of datasets collected from the MovieLens website [111], enriching the item information with the external Freebase
among which three stable benchmark datasets with differ- database. There are also some papers [74], [94] use the data
ent rating numbers, MovieLens-100K, MovieLens-1M, and provided by Alibaba Taobao.
MovieLens-20M are most commonly used. Each dataset con- • POI. Point of Interest (POI) recommendation is the rec-
tains ratings, the movie’s attributes and tags. DoubanMovie ommendation of new businesses and activities (restaurants,

TABLE 4
A collection of datasets for different application scenarios and corresponding papers.

Scenario          Dataset               Paper
Movie             MovieLens-100K        [1], [73], [75], [76], [77], [80]
                  MovieLens-1M          [2], [14], [44], [45], [66], [70], [81], [83], [87], [92], [93], [95], [96]
                  MovieLens-20M         [44], [86], [88], [89], [91], [93]
                  DoubanMovie           [69], [79], [82]
Book              DBbook2014            [70], [87]
                  Book-Crossing         [14], [45], [88], [89], [91], [92], [93], [95]
                  Amazon-Book           [44], [90], [93]
                  IntentBooks           [2]
                  DoubanBook            [82]
News              Bing-News             [14], [45], [48], [88]
Product           Amazon Product data   [3], [13], [67], [84], [85], [94]
                  Alibaba Taobao        [74], [94]
POI               Yelp challenge        [1], [3], [76], [77], [79], [80], [81], [82], [90], [96]
                  Dianping-Food         [91]
                  CEM                   [71]
Music             Last.FM               [1], [44], [45], [87], [89], [90], [91], [96]
                  KKBox                 [73], [83]
Social Platform   Weibo                 [68]
                  DBLP                  [78]
                  MeetUp                [78]

museums, parks, cities, etc.) to users based on their historical check-in data. The most popular dataset is the Yelp Challenge [121], which contains the information of businesses, users, check-ins, and reviews. The papers [1], [3], [76], [77], [79], [80], [81], [82], [96] build a user-item graph with the data of check-ins, reviews, and the attributes in the dataset, while [90] constructs the item graph. Paper [71] utilizes the CEM dataset (an Amadeus database containing bookings from over a dozen airlines) to recommend the next trip. Another work [91] uses the Dianping-Food dataset, which is provided by Dianping.com [122], for restaurant recommendation.

• News. News recommendation is challenging [48] because the news itself is time-sensitive, and the content is highly condensed, which requires commonsense to understand. Moreover, people are topic-sensitive in choosing news to read and may prefer news from various domains. Traditional news recommendation models fail to discover the high-level connections among news items. Therefore, KGs are introduced into this scenario [14], [45], [48], [88] to find the logical relations between different news items and improve the precision of recommendation. The most popular dataset is Bing-News, collected from server logs of Bing News [123], which contains the user click information, news title, etc. To build a KG for news recommendation, the first step is to extract entities in the title. Then, subgraphs are constructed by extracting neighbors of these entities in Satori.

• Social Platform. This task is to recommend people or meetings of potential interest to users in the community. One application is to recommend unfollowed celebrities to users on the social platform Weibo [124] with the collected Weibo tweets data [68]. Besides the user-item graph that represents sentiment links between users and celebrities, an item graph with knowledge extracted from Satori is built to enrich the information of celebrities. Another application is to recommend offline meetings for users on a social website, MeetUp [125], with data on that platform. The last application lies in the academic domain: recommending conferences to researchers with the DBLP data [126].

6 FUTURE DIRECTIONS

In the above sections, we have demonstrated the advantage of KG-based recommender systems from the aspects of more accurate recommendation and explainability. Although many novel models have been proposed to utilize the KG as side information for recommendation, some further opportunities still exist. In this section, we outline and discuss some prospective research directions.

• Dynamic Recommendation. Although KG-based recommender systems with GNN or GCN architectures have achieved good performance, the training process is time-consuming. Thus, such models can be regarded as static preference recommendation. However, in some scenarios, such as online shopping, news recommendation, Twitter, and forums, a user’s interest can be influenced by social events or friends very quickly. In this case, recommendation with static preference modeling may not be enough to understand real-time interests. In order to capture dynamic preference, leveraging the dynamic graph network can be a solution. Recently, Song et al. [127] designed a dynamic-graph-attention network to capture the user’s rapidly changing interests by incorporating long-term and short-term interests from friends. It is natural to integrate other types of side information and build a KG for dynamic recommendation by following such an approach.

• Multi-task Learning. KG-based recommender systems can be naturally regarded as link prediction in the graph. Therefore, considering the nature of the KG has the potential to improve the performance of graph-based recommendation. For example, there may exist missing facts in the KG,

which leads to missing relations or entities. However, the user’s preference may be ignored because these facts are missing, which can deteriorate the recommendation results. The works [70], [95] have shown that it is effective to jointly train the KG completion module and the recommendation module for better recommendation. Other works have utilized multi-task learning by jointly training the recommendation module with the KGE task [45] and the item relation regulation task [73]. It would be interesting to exploit transferring knowledge from other KG-related tasks, such as entity classification and resolution, for better recommendation performance.

• Cross-Domain Recommendation. Recently, works on cross-domain recommendation have appeared. The motivation is that interaction data is not equal across domains. For example, on the Amazon platform, book ratings are denser than ratings in other domains. With the transfer learning technique, interaction data from the source domain with relatively rich data can be shared for better recommendation in the target domains. Zhang et al. [128] proposed a matrix-based method for cross-domain recommendation. Later, Zhao et al. [129] introduced PPGN, which puts users and products from different domains in one graph and leverages the user-item interaction graph for cross-domain recommendation. Although PPGN significantly outperforms SOTA methods, the user-item graph contains only interaction relations and does not consider other relationships among users and items. It could be promising to follow the works in this survey by incorporating different types of user and item side information in the user-item interaction graph for better cross-domain recommendation performance.

• Knowledge Enhanced Language Representation. To improve the performance of various natural language processing tasks, there is a trend to integrate external knowledge into the language representation model. The knowledge representation and the text representation can be refined mutually. For example, Chen et al. [130] proposed STCKA for short text classification, which utilizes the prior knowledge from KGs, such as YAGO, to enrich the semantic representation of short texts. Zhang et al. [131] proposed ERNIE, which incorporates knowledge from Wikidata to enhance the language representation, and such an approach has proven to be effective in the task of relation classification. Although the DKN model [48] utilizes both the text embedding and the entity embedding in the news, these two types of embeddings are simply concatenated to obtain the final representation of news, instead of considering the information fusion between the two vectors. Therefore, it is promising to apply the strategy of knowledge-enhanced text representation in the news recommendation task and other text-based recommendation tasks for better representation learning and more accurate recommendation results.

• Knowledge Graph Embedding Method. There are two types of KGE methods, translation distance models and semantic matching models, based on the different constraints. In this survey, these two types of KGE methods are used in all three kinds of KG-based recommender systems and recommendation tasks. However, there is no comprehensive work to suggest under which circumstances, including data sources, recommendation scenarios, and model architectures, a specific KGE method should be adopted. Therefore, another research direction lies in comparing the advantages of different KGE methods under various conditions.

• User Side Information. Currently, most KG-based recommender systems build the graph by incorporating item side information, while few models consider user side information. However, user side information, such as the user network and users’ demographic information, can also be naturally integrated into the framework of current KG-based recommender systems. Recently, Fan et al. [132] used the GNN to represent a user-user social network and a user-item interaction graph separately, which outperforms traditional CF-based recommender systems with user social information. A recent paper in our survey [96] integrated user relations into the graph and showed the effectiveness of this strategy. Therefore, considering user side information in the KG could be another research direction.

7 CONCLUSION

In this survey paper, we investigate KG-based recommender systems and summarize the recent efforts in this domain. This survey illustrates how different approaches utilize the KG as side information to improve the recommendation result as well as provide interpretability in the recommendation process. Moreover, an introduction to datasets used in different scenarios is provided. Finally, future research directions are identified, hoping to promote development in this field. KG-based recommender systems are promising for accurate recommendation and explainable recommendation, benefitting from the fruitful information contained in the KGs. We hope this survey paper can help readers better understand work in this area.

ACKNOWLEDGMENTS

The research work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1004300, the National Natural Science Foundation of China under Grant Nos. U1836206, U1811461, and 61773361, and the Project of Youth Innovation Promotion Association CAS under Grant No. 2017146.

REFERENCES

[1] B. Hu, C. Shi, W. X. Zhao, and P. S. Yu, “Leveraging meta-path based context for top-n recommendation with a neural co-attention model,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018, pp. 1531–1540.
[2] F. Zhang, N. J. Yuan, D. Lian, X. Xie, and W.-Y. Ma, “Collaborative knowledge base embedding for recommender systems,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: ACM, 2016, pp. 353–362.
[3] H. Zhao, Q. Yao, J. Li, Y. Song, and D. L. Lee, “Meta-graph based recommendation fusion over heterogeneous information networks,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 635–644.
[4] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” IEEE transactions on knowledge and data engineering, vol. 17, no. 6, pp. 734–749, 2005.
[5] X. Su and T. M. Khoshgoftaar, “A survey of collaborative filtering techniques,” Advances in artificial intelligence, vol. 2009, 2009.

[6] Z. Sun, Q. Guo, J. Yang, H. Fang, G. Guo, J. Zhang, and R. Burke, “Research commentary on recommendations with side information: A survey and research directions,” Electronic Commerce Research and Applications, vol. 37, p. 100879, 2019.
[7] S. Sen, J. Vig, and J. Riedl, “Tagommenders: connecting users to items through tags,” in Proceedings of the 18th international conference on World wide web. ACM, 2009, pp. 671–680.
[8] Y. Zhen, W.-J. Li, and D.-Y. Yeung, “Tagicofi: tag informed collaborative filtering,” in Proceedings of the third ACM conference on Recommender systems. ACM, 2009, pp. 69–76.
[9] L. Zheng, V. Noroozi, and P. S. Yu, “Joint deep modeling of users and items using reviews for recommendation,” in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 2017, pp. 425–434.
[10] Y. Xu, Y. Yang, J. Han, E. Wang, F. Zhuang, and H. Xiong, “Exploiting the sentimental bias between ratings and reviews for enhancing recommendation,” in 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 2018, pp. 1356–1361.
[11] P. Massa and P. Avesani, “Trust-aware recommender systems,” in Proceedings of the 2007 ACM conference on Recommender systems. ACM, 2007, pp. 17–24.
[12] M. Jamali and M. Ester, “Trustwalker: a random walk model for combining trust-based and item-based recommendation,” in Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009, pp. 397–406.
[13] Y. Zhang, Q. Ai, X. Chen, and P. Wang, “Learning over knowledge-base embeddings for recommendation,” arXiv preprint arXiv:1803.06540, 2018.
[14] H. Wang, F. Zhang, J. Wang, M. Zhao, W. Li, X. Xie, and M. Guo, “Ripplenet: Propagating user preferences on the knowledge graph for recommender systems,” in Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 2018, pp. 417–426.
[15] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, “Freebase: a collaboratively created graph database for structuring human knowledge,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008, pp. 1247–1250.
[16] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. Van Kleef, S. Auer et al., “Dbpedia–a large-scale, multilingual knowledge base extracted from wikipedia,” Semantic Web, vol. 6, no. 2, pp. 167–195, 2015.
[17] F. M. Suchanek, G. Kasneci, and G. Weikum, “Yago: a core of semantic knowledge,” in Proceedings of the 16th international conference on World Wide Web. ACM, 2007, pp. 697–706.
[18] A. Singhal, “Introducing the knowledge graph: things, not strings,” 2012, https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html.
[19] L. Ehrlinger and W. Wöß, “Towards a definition of knowledge graphs.” SEMANTiCS (Posters, Demos, SuCCESS), vol. 48, 2016.
[20] J. M. Gomez-Perez, J. Z. Pan, G. Vetere, and H. Wu, “Enterprise knowledge graph: An introduction,” in Exploiting linked data and knowledge graphs in large organisations. Springer, 2017, pp. 1–14.
[21] S. Nurdiati and C. Hoede, “25 years development of knowledge graph theory: the results and the challenge,” Memorandum, vol. 1876, 2008.
[22] X. Huang, J. Zhang, D. Li, and P. Li, “Knowledge graph embedding based question answering,” in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. ACM, 2019, pp. 105–113.
[23] D. Hakkani-Tür, A. Celikyilmaz, L. Heck, G. Tur, and G. Zweig, “Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding,” in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
[24] A. Carlson, J. Betteridge, R. C. Wang, E. R. Hruschka Jr, and T. M. Mitchell, “Coupled semi-supervised learning for information extraction,” in Proceedings of the third ACM international conference on Web search and data mining. ACM, 2010, pp. 101–110.
[25] F. Belleau, M.-A. Nolin, N. Tourigny, P. Rigault, and J. Morissette, “Bio2rdf: towards a mashup to build bioinformatics knowledge systems,” Journal of biomedical informatics, vol. 41, no. 5, pp. 706–716, 2008.
[26] T. Pellissier Tanon, D. Vrandečić, S. Schaffert, T. Steiner, and L. Pintscher, “From freebase to wikidata: The great migration,” in Proceedings of the 25th international conference on world wide web. International World Wide Web Conferences Steering Committee, 2016, pp. 1419–1428.
[27] M. Färber and A. Rettinger, “Which knowledge graph is best for me?” arXiv preprint arXiv:1809.11099, 2018.
[28] Google, “Freebase data dumps,” 2013, https://developers.google.com/freebase/data.
[29] G. A. Miller, WordNet: An electronic lexical database. MIT press, 1998.
[30] “The geonames geographical database,” 2006, http://www.geonames.org/.
[31] R. Qian, “Understand your world with bing,” Bing search blog, Mar, 2013.
[32] H. Paulheim, “Knowledge graph refinement: A survey of approaches and evaluation methods,” Semantic web, vol. 8, no. 3, pp. 489–508, 2017.
[33] B. Xu, Y. Xu, J. Liang, C. Xie, B. Liang, W. Cui, and Y. Xiao, “Cn-dbpedia: A never-ending chinese knowledge extraction system,” in International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Springer, 2017, pp. 428–438.
[34] “Wikipedia,” 2001, http://www.wikipedia.org/.
[35] “Nndb,” 2007, https://www.nndb.com/.
[36] “The fashion model directory,” 2000, http://www.fashionmodeldirectory.com/.
[37] “Musicbrainz,” 2000, https://musicbrainz.org/.
[38] “Baidu baike,” 2006, https://baike.baidu.com/.
[39] “Hudong baike,” 2005, http://www.baike.com/.
[40] “Wikidata,” 2012, http://www.wikidata.org/.
[41] “Under the hood: The entities graph,” 2013, https://www.facebook.com/notes/facebookengineering/under-the-hood-the-entitiesgraph/10151490531588920/.
[42] “Facebook,” 2004, https://www.facebook.com/.
[43] P. Ernst, C. Meng, A. Siu, and G. Weikum, “Knowlife: a knowledge graph for health and life sciences,” in 2014 IEEE 30th International Conference on Data Engineering. IEEE, 2014, pp. 1254–1257.
[44] J. Huang, W. X. Zhao, H. Dou, J.-R. Wen, and E. Y. Chang, “Improving sequential recommendation with knowledge-enhanced memory networks,” in The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 2018, pp. 505–514.
[45] H. Wang, F. Zhang, M. Zhao, W. Li, X. Xie, and M. Guo, “Multi-task feature learning for knowledge graph enhanced recommendation,” in The World Wide Web Conference, ser. WWW ’19. New York, NY, USA: ACM, 2019, pp. 2000–2010.
[46] D. Xi, F. Zhuang, Y. Liu, J. Gu, H. Xiong, and Q. He, “Modelling of bi-directional spatio-temporal dependence and users dynamic preferences for missing poi check-in identification,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 5458–5465.
[47] P. Zhao, H. Zhu, Y. Liu, J. Xu, Z. Li, F. Zhuang, V. S. Sheng, and X. Zhou, “Where to go next: A spatio-temporal gated network for next poi recommendation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 5877–5884.
[48] H. Wang, F. Zhang, X. Xie, and M. Guo, “Dkn: Deep knowledge-aware network for news recommendation,” in Proceedings of the 2018 World Wide Web Conference, ser. WWW ’18. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee, 2018, pp. 1835–1844.
[49] Z. Huang, Q. Liu, C. Zhai, Y. Yin, E. Chen, W. Gao, and G. Hu, “Exploring multi-objective exercise recommendations in online education systems,” in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1261–1270.
[50] C. Qin, H. Zhu, C. Zhu, T. Xu, F. Zhuang, C. Ma, J. Zhang, and H. Xiong, “Duerquiz: A personalized question recommender system for intelligent job interview,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2019, pp. 2165–2173.
[51] X. Amatriain, J. M. Pujol, and N. Oliver, “I like it... i like it not: Evaluating user ratings noise in recommender systems,” in International Conference on User Modeling, Adaptation, and Personalization. Springer, 2009, pp. 247–258.
[52] G. Jawaheer, M. Szomszor, and P. Kostkova, “Comparison of implicit and explicit feedback from an online music recommendation service,” in proceedings of the 1st international workshop on

information heterogeneity and fusion in recommender systems, 2010, pp. 47–51.
[53] Y. Hu, Y. Koren, and C. Volinsky, “Collaborative filtering for implicit feedback datasets,” in 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008, pp. 263–272.
[54] C. Wang, H. Zhu, C. Zhu, C. Qin, and H. Xiong, “Setrank: A setwise bayesian approach for collaborative ranking from implicit feedback,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
[55] R. Salakhutdinov and A. Mnih, “Bayesian probabilistic matrix factorization using markov chain monte carlo,” in Proceedings of the 25th international conference on Machine learning, 2008, pp. 880–887.
[56] F. Zhuang, Z. Zhang, M. Qian, C. Shi, X. Xie, and Q. He, “Representation learning via dual-autoencoder for recommendation,” Neural Networks, vol. 90, pp. 83–89, 2017.
[57] P. Lops, M. De Gemmis, and G. Semeraro, “Content-based recommender systems: State of the art and trends,” in Recommender systems handbook. Springer, 2011, pp. 73–105.
[58] C.-N. Ziegler, G. Lausen, and L. Schmidt-Thieme, “Taxonomy-driven computation of product recommendations,” in Proceedings of the thirteenth ACM international conference on Information and knowledge management. ACM, 2004, pp. 406–415.
[59] J. Han, L. Zheng, Y. Xu, B. Zhang, F. Zhuang, S. Y. Philip, and W. Zuo, “Adaptive deep modeling of users and items using side information for recommendation,” IEEE transactions on neural networks and learning systems, 2019.
[60] W.-T. Chu and Y.-L. Tsai, “A hybrid recommendation system considering visual information for predicting favorite restaurants,” World Wide Web, vol. 20, no. 6, pp. 1313–1331, 2017.
[61] D. Liang, M. Zhan, and D. P. Ellis, “Content-aware collaborative music recommendation using pre-trained neural networks.” in ISMIR, 2015, pp. 295–301.
[62] J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T.-S. Chua, “Attentive collaborative filtering: Multimedia recommendation with item-and component-level attention,” in Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 2017, pp. 335–344.
[63] Z. Gantner, L. Drumond, C. Freudenthaler, S. Rendle, and L. Schmidt-Thieme, “Learning attribute-to-feature mappings for cold-start recommendations.” in ICDM, vol. 10. Citeseer, 2010, pp. 176–185.
[64] Y. Fang, W. Lin, V. W. Zheng, M. Wu, K. C.-C. Chang, and X.-L. Li, “Semantic proximity search on graphs with metagraph-based learning,” in 2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, 2016, pp. 277–288.
[65] H. Cai, V. W. Zheng, and K. C.-C. Chang, “A comprehensive survey of graph embedding: Problems, techniques, and applications,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 9, pp. 1616–1637, 2018.
[66] E. Palumbo, G. Rizzo, and R. Troncy, “Entity2rec: Learning user-item relatedness from knowledge graphs for top-n item recommendation,” in Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 2017, pp. 32–36.
[67] Q. Ai, V. Azizi, X. Chen, and Y. Zhang, “Learning heterogeneous knowledge base embeddings for explainable recommendation,” Algorithms, vol. 11, no. 9, p. 137, 2018.
[68] H. Wang, F. Zhang, M. Hou, X. Xie, M. Guo, and Q. Liu, “Shine: Signed heterogeneous information network embedding for sentiment link prediction,” in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 2018, pp. 592–600.
[69] D. Yang, Z. Guo, Z. Wang, J. Jiang, Y. Xiao, and W. Wang, “A knowledge-enhanced deep recommendation framework incorporating gan-based models,” IEEE International Conference on Data Mining (ICDM), pp. 1368–1373, 2018.
[70] Y. Cao, X. Wang, X. He, Z. Hu, and T.-S. Chua, “Unifying knowledge graph learning and recommendation: Towards a better understanding of user preferences,” in The World Wide Web Conference, ser. WWW ’19. New York, NY, USA: ACM, 2019, pp. 151–161.
[71] A. Dadoun, R. Troncy, O. Ratier, and R. Petitti, “Location embeddings for next trip recommendation,” in Companion Proceedings of The 2019 World Wide Web Conference. ACM, 2019, pp. 896–903.
[72] K. Joseph and H. Jiang, “Content based news recommendation via shortest entity distance over knowledge graphs,” in Companion Proceedings of The 2019 World Wide Web Conference. ACM, 2019, pp. 690–699.
[73] X. Xin, X. He, Y. Zhang, Y. Zhang, and J. Jose, “Relational collaborative filtering: Modeling multiple item relations for recommendation,” in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR’19. New York, NY, USA: ACM, 2019, pp. 125–134.
[74] Y. Ye, X. Wang, J. Yao, K. Jia, J. Zhou, Y. Xiao, and H. Yang, “Bayes embedding (bem): Refining representation by integrating knowledge graphs and behavior-specific networks,” in Proceedings of the 28th ACM International Conference on Information and Knowledge Management. ACM, 2019, pp. 679–688.
[75] X. Yu, X. Ren, Q. Gu, Y. Sun, and J. Han, “Collaborative filtering with entity similarity regularization in heterogeneous information networks,” IJCAI HINA, vol. 27, 2013.
[76] X. Yu, X. Ren, Y. Sun, B. Sturt, U. Khandelwal, Q. Gu, B. Norick, and J. Han, “Recommendation in heterogeneous information networks with implicit user feedback,” in Proceedings of the 7th ACM conference on Recommender systems. ACM, 2013, pp. 347–350.
[77] X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, and J. Han, “Personalized entity recommendation: A heterogeneous information network approach,” in Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 2014, pp. 283–292.
[78] C. Luo, W. Pang, Z. Wang, and C. Lin, “Hete-cf: Social-based collaborative filtering recommendation using heterogeneous relations,” in 2014 IEEE International Conference on Data Mining. IEEE, 2014, pp. 917–922.
[79] C. Shi, Z. Zhang, P. Luo, P. S. Yu, Y. Yue, and B. Wu, “Semantic path based personalized recommendation on weighted heterogeneous information networks,” in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 2015, pp. 453–462.
[80] R. Catherine and W. Cohen, “Personalized recommendations using knowledge graphs: A probabilistic logic programming approach,” in Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016, pp. 325–332.
[81] Z. Sun, J. Yang, J. Zhang, A. Bozzon, L.-K. Huang, and C. Xu, “Recurrent knowledge graph embedding for effective recommendation,” in Proceedings of the 12th ACM Conference on Recommender Systems, ser. RecSys ’18. New York, NY, USA: ACM, 2018, pp. 297–305.
[82] C. Shi, B. Hu, W. X. Zhao, and S. Y. Philip, “Heterogeneous information network embedding for recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 2, pp. 357–370, 2018.
[83] X. Wang, D. Wang, C. Xu, X. He, Y. Cao, and T.-S. Chua, “Explainable reasoning over knowledge graphs for recommendation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 5329–5336.
[84] W. Ma, M. Zhang, Y. Cao, W. Jin, C. Wang, Y. Liu, S. Ma, and X. Ren, “Jointly learning explainable rules for recommendation with knowledge graph,” in The World Wide Web Conference. ACM, 2019, pp. 1210–1221.
[85] Y. Xian, Z. Fu, S. Muthukrishnan, G. de Melo, and Y. Zhang, “Reinforcement knowledge graph reasoning for explainable recommendation,” arXiv preprint arXiv:1906.05237, 2019.
[86] X. Huang, Q. Fang, S. Qian, J. Sang, Y. Li, and C. Xu, “Explainable interaction-driven user modeling over knowledge graph for sequential recommendation,” in Proceedings of the 27th ACM International Conference on Multimedia. ACM, 2019, pp. 548–556.
[87] W. Song, Z. Duan, Z. Yang, H. Zhu, M. Zhang, and J. Tang, “Explainable knowledge graph-based recommendation via deep reinforcement learning,” arXiv preprint arXiv:1906.09506, 2019.
[88] H. Wang, F. Zhang, J. Wang, M. Zhao, W. Li, X. Xie, and M. Guo, “Exploring high-order user preference on the knowledge graph for recommender systems,” ACM Transactions on Information Systems (TOIS), vol. 37, no. 3, p. 32, 2019.
[89] H. Wang, M. Zhao, X. Xie, W. Li, and M. Guo, “Knowledge graph convolutional networks for recommender systems,” in The World Wide Web Conference, ser. WWW ’19. New York, NY, USA: ACM, 2019, pp. 3307–3313.
[90] X. Wang, X. He, Y. Cao, M. Liu, and T.-S. Chua, “Kgat: Knowledge graph attention network for recommendation,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge
17

Discovery & Data Mining, ser. KDD ’19. New York, NY, USA: [114] “Book-crossing dataset,” 2004, http://www2.informatik.
ACM, 2019, pp. 950–958. uni-freiburg.de/∼cziegler/BX/.
[91] H. Wang, F. Zhang, M. Zhang, J. Leskovec, M. Zhao, W. Li, [115] J. McAuley, C. Targett, Q. Shi, and A. Van Den Hengel, “Image-
and Z. Wang, “Knowledge-aware graph neural networks with based recommendations on styles and substitutes,” in Proceedings
label smoothness regularization for recommender systems,” in of the 38th International ACM SIGIR Conference on Research and
Proceedings of the 25th ACM SIGKDD International Conference on Development in Information Retrieval, 2015, pp. 43–52.
Knowledge Discovery & Data Mining, ser. KDD ’19. New York, [116] A. Uyar and F. M. Aliyu, “Evaluating search features of google
NY, USA: ACM, 2019, pp. 968–977. knowledge graph and bing satori: entity types, list searches and
[92] X. Tang, T. Wang, H. Yang, and H. Song, “Akupm: Attention- query interfaces,” Online Information Review, vol. 39, no. 2, pp.
enhanced knowledge-aware user preference model for recom- 197–213, 2015.
mendation,” in Proceedings of the 25th ACM SIGKDD International [117] “Douban book,” 2005, http://book.douban.com/.
Conference on Knowledge Discovery & Data Mining. ACM, 2019, [118] M. Schedl, “The lfm-1b dataset for music retrieval and rec-
pp. 1891–1899. ommendation,” in Proceedings of the 2016 ACM on International
[93] Y. Qu, T. Bai, W. Zhang, J. Nie, and J. Tang, “An end-to-end Conference on Multimedia Retrieval, 2016, pp. 103–110.
neighborhood-based interaction model forknowledge-enhanced [119] “Last.fm online music system,” 2002, http://www.last.fm/.
recommendation,” arXiv preprint arXiv:1908.04032, 2019. [120] “Kkbox dataset,” 2018, https://wsdm-cup-2018.kkbox.events/.
[94] J. Zhao, Z. Zhou, Z. Guan, W. Zhao, W. Ning, G. Qiu, and [121] “Yelp challenge dataset,” 2013, https://www.yelp.com/dataset/
X. He, “Intentgc: a scalable graph convolution framework fusing challenge/.
heterogeneous information for recommendation,” in Proceedings [122] “Dianping.com,” 2009, https://www.dianping.com/.
of the 25th ACM SIGKDD International Conference on Knowledge [123] “Bing news,” 2009, https://www.bing.com/news.
Discovery & Data Mining. ACM, 2019, pp. 2347–2357. [124] “Sina weibo,” 2009, http://weibo.com.
[95] Q. Li, X. Tang, T. Wang, H. Yang, and H. Song, “Unifying task- [125] “Meetup,” 2002, http://www.meetup.com/.
oriented knowledge graph learning and recommendation,” IEEE [126] “Dblp dataset,” 2013, https://dblp.uni-trier.de/xml/.
Access, vol. 7, pp. 115 816–115 828, 2019. [127] W. Song, Z. Xiao, Y. Wang, L. Charlin, M. Zhang, and J. Tang,
“Session-based social recommendation via dynamic graph atten-
[96] X. Sha, Z. Sun, and J. Zhang, “Attentive knowledge graph
tion networks,” in Proceedings of the Twelfth ACM International
embedding for personalized recommendation,” arXiv preprint
Conference on Web Search and Data Mining. ACM, 2019, pp. 555–
arXiv:1910.08288, 2019.
563.
[97] Y. Zhang and X. Chen, “Explainable recommendation: A survey [128] Q. Zhang, J. Lu, D. Wu, and G. Zhang, “A cross-domain rec-
and new perspectives,” arXiv preprint arXiv:1804.11192, 2018. ommender system with kernel-induced knowledge transfer for
[98] Q. Wang, Z. Mao, B. Wang, and L. Guo, “Knowledge graph overlapping entities,” IEEE transactions on neural networks and
embedding: A survey of approaches and applications,” IEEE learning systems, 2018.
Transactions on Knowledge and Data Engineering, vol. 29, no. 12, [129] C. Zhao, C. Li, and C. Fu, “Cross-domain recommendation via
pp. 2724–2743, 2017. preference propagation graphnet,” in Proceedings of the 28th ACM
[99] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and International Conference on Information and Knowledge Management.
O. Yakhnenko, “Translating embeddings for modeling multi- ACM, 2019, pp. 2165–2168.
relational data,” in Advances in neural information processing sys- [130] J. Chen, Y. Hu, J. Liu, Y. Xiao, and H. Jiang, “Deep short text
tems, 2013, pp. 2787–2795. classification with knowledge powered attention,” vol. 33, no. 01,
[100] Z. Wang, J. Zhang, J. Feng, and Z. Chen, “Knowledge graph pp. 6252–6259, 2019.
embedding by translating on hyperplanes,” in Twenty-Eighth [131] Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu, “Ernie:
AAAI conference on artificial intelligence, 2014. Enhanced language representation with informative entities,” pp.
[101] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu, “Learning entity 1441–1451, 2019.
and relation embeddings for knowledge graph completion,” in [132] W. Fan, Y. Ma, Q. Li, Y. He, E. Zhao, J. Tang, and D. Yin, “Graph
Twenty-ninth AAAI conference on artificial intelligence, 2015. neural networks for social recommendation,” in The World Wide
[102] G. Ji, S. He, L. Xu, K. Liu, and J. Zhao, “Knowledge graph em- Web Conference. ACM, 2019, pp. 417–426.
bedding via dynamic mapping matrix,” in Proceedings of the 53rd
Annual Meeting of the Association for Computational Linguistics and
the 7th International Joint Conference on Natural Language Processing
(Volume 1: Long Papers), 2015, pp. 687–696.
[103] B. Yang, W.-t. Yih, X. He, J. Gao, and L. Deng, “Embedding
entities and relations for learning and inference in knowledge
bases,” arXiv preprint arXiv:1412.6575, 2014.
[104] Y. Kim, “Convolutional neural networks for sentence classifica-
tion,” arXiv preprint arXiv:1408.5882, 2014.
[105] Y. Dong, N. V. Chawla, and A. Swami, “metapath2vec: Scalable
representation learning for heterogeneous networks,” in Proceed-
ings of the 23rd ACM SIGKDD international conference on knowledge
discovery and data mining, 2017, pp. 135–144.
[106] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient esti-
mation of word representations in vector space,” arXiv preprint
arXiv:1301.3781, 2013.
[107] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, “Pathsim: Meta
path-based top-k similarity search in heterogeneous information
networks,” Proceedings of the VLDB Endowment, vol. 4, no. 11, pp.
992–1003, 2011.
[108] S. Zhang, W. Wang, J. Ford, and F. Makedon, “Learning from
incomplete ratings using non-negative matrix factorization,” in
Proceedings of the 2006 SIAM international conference on data mining.
SIAM, 2006, pp. 549–553.
[109] C. H. Ding, T. Li, and M. I. Jordan, “Convex and semi-
nonnegative matrix factorizations,” IEEE transactions on pattern
analysis and machine intelligence, vol. 32, no. 1, pp. 45–55, 2008.
[110] “Movielens dataset,” 1997, https://grouplens.org/datasets/
movielens/.
[111] “Movielens website,” 1997, https://movielens.org/.
[112] “Douban movie,” 2005, http://movie.douban.com/.
[113] “Imdb,” 1990, https://www.imdb.com/.
