
Statistical Relational Learning with Formal Ontologies


Achim Rettinger¹, Matthias Nickles², and Volker Tresp³

¹ Technische Universität München, Germany, [email protected]
² University of Bath, United Kingdom, [email protected]
³ Siemens AG, CT, IC, Learning Systems, Germany, [email protected]
Abstract. We propose a learning approach for integrating formal knowledge into statistical inference by exploiting ontologies as a semantically rich and fully formal representation of prior knowledge. The logical constraints deduced from ontologies can be utilized to enhance and control the learning task by enforcing description logic satisfiability in a latent multi-relational graphical model. To demonstrate the feasibility of our approach we provide experiments using real world social network data in the form of a SHOIN(D) ontology. The results illustrate two main practical advancements: First, entities and entity relationships can be analyzed via the latent model structure. Second, enforcing the ontological constraints guarantees that the learned model does not predict inconsistent relations. In our experiments, this leads to an improved predictive performance.
1 Introduction
This paper focuses on the combination of statistical machine learning with ontologies specified by formal logics. In contrast to existing approaches to the use of constraints in machine learning (ML) and data mining, we exploit a semantically rich and fully formal representation of hard constraints which govern and support the stochastic learning task. Technically, this is achieved by combining the Infinite Hidden Relational Model (IHRM) approach to Statistical Relational Learning (SRL) with inference guided by the constraints implied by a Description Logic (DL) ontology as used on the Semantic Web (SW). In this way, our approach supports a tight integration of formal background knowledge, resulting in an Infinite Hidden Semantic Model (IHSM). The term "Semantic" in IHSM stands for this integration of meaningful, symbolic knowledge which enables the use of deductive reasoning.

Benefits of the presented approach are (1) the analysis of known entity classes of individuals by means of clustering and (2) the completion of the knowledge base (KB) with uncertain predictions about unknown relations while considering constraints as background knowledge for the machine learning process. Thus, it is guaranteed that the learned model does not violate ontological constraints, and the predictive performance can be improved. While there is some research on data mining for the SW, like instance-based learning and classification of individuals, considering hard constraints specified in the ontology during machine learning has hardly been tried so far, or only in quite restricted and semi-formal ways (see Sec. 6 for related work). Even though we use a social network OWL DL ontology and settle on SRL as an apparently natural counterpart for logical constraints, our general approach is in no way restricted to DL or SRL and could easily be adapted to other formal and learning frameworks.
To provide an intuitive understanding of the presented approach we will use a simple example throughout this paper to illustrate the application of constraints in our learning setting: Consider a social network where, amongst other things, the ages of persons and the schools they are attending are partially known. In addition, an ontology designer has specified that persons younger than six years are not allowed to attend a school. All this prior knowledge is provided in a formal ontology, and the ultimate task is to predict unknown elements of this network.
The remainder of this paper is structured as follows: In Sec. 2 we specify an ontology in OWL DL that defines the taxonomy, relational structure and constraints. Next, we show how to infer a relational model from the ontology and transfer the relational model into an IHSM (Sec. 3). Then, we learn the parameters of this infinite model in an unsupervised manner while taking the constraints into account (Sec. 4). In Sec. 5 the IHSM is evaluated empirically using a complex dataset from the Semantic Web. Finally, we discuss related work in Sec. 6 and conclude in Sec. 7.
2 Formal Framework
Our approach requires the specification of formal background knowledge and formal constraints for the learning process. We do so by letting the user of the proposed machine learning algorithm specify a formal ontology, or use an existing ontology, e.g. from the SW. In computer science, an ontology is the formal representation of the concepts of a certain domain and their relations. In the context of the (Semantic) Web, and thus also in our approach, such an ontology is typically given as a so-called TBox and an ABox, each of which consists of a number of description logic formulas. The TBox comprises conceptual knowledge (i.e., knowledge about classes and their relations), whereas the ABox comprises knowledge about the instances of these classes. In our context, the given ontology, and also all knowledge which is a logical consequence of it, constitutes the hard knowledge for the learning process (i.e., knowledge which cannot be overwritten during learning), as described later.

However, our approach is not restricted to ontologies, but works in principle with all sorts of formal knowledge bases. We are using ontologies mainly because there is an obvious relatedness of clustering and ontological classification, and because formal languages, reasoners, editors and other tools and frameworks for ontologies on the SW are standardized and widely available. Consequently, we use a description logic for our examples. This is not only because ontologies and other formal knowledge on the Web (which is our application here) are usually represented using DLs, but also because the standard DL which we use is a decidable fragment of first-order logic (FOL) for which highly optimized reasoners exist.
We settle on the SHOIN(D) [1] description logic, because entailment in the current Semantic Web standard ontology language OWL DL can be reduced to SHOIN(D) knowledge base satisfiability. We could likewise work with OWL DL syntax directly, but that wouldn't have any technical advantages and would just reduce the readability of our examples. Our approach requires that the satisfiability or consistency of ontologies can be checked, which is a standard operation of most automated reasoning software for the SW. Being able to check satisfiability means that the reasoner can check whether a given KB (ontology) has a model. On the syntactic level, satisfiability corresponds to consistency, i.e., there are no sentences in the given ontology which contradict each other. The following specifies the syntax of SHOIN(D). Due to lack of space, please refer to [1] for a detailed account of this language.
C → A | ¬C | C₁ ⊓ C₂ | C₁ ⊔ C₂ | ∃R.C | ∀R.C | ≥n S | ≤n S | {a₁, ..., aₙ} | ≥n T | ≤n T | ∃T₁, ..., Tₙ.D | ∀T₁, ..., Tₙ.D
D → d | {c₁, ..., cₙ}
Here, C denotes concepts, A atomic concepts, R abstract roles or inverse roles of abstract roles (R⁻), S abstract simple roles, the Tᵢ concrete roles, d a concrete domain predicate, and the aᵢ / cᵢ abstract / concrete individuals.
A SHOIN(D) ontology or knowledge base is then a non-empty, finite set of TBox axioms and ABox assertions C₁ ⊑ C₂ (inclusion of concepts), Trans(R) (transitivity), R₁ ⊑ R₂ and T₁ ⊑ T₂ (role inclusion for abstract respectively concrete roles), C(a) (concept assertion), R(a, b) (role assertion), a = b (equality of individuals), and a ≠ b (inequality of individuals). Concept equality is denoted as C₁ ≡ C₂, which is just an abbreviation for mutual inclusion, i.e., C₁ ⊑ C₂ and C₂ ⊑ C₁. Defining a semantics of SHOIN(D) is not required within the scope of this work; the canonical semantics which we assume can be found, e.g., in [1].
2.1 Constraints
Constraints in the sense of this work are actually just formal statements. Our approach is expected to work with all kinds of logical frameworks which allow for satisfiability (or consistency) checks over some given set of logical sentences, for example an ontology. This set of given statements is denoted as the KB in the further course of this paper. Formally, we define a set of constraints C to be the deductive closure Θ(KB) of a given knowledge base KB, with Θ(KB) = {c | KB ⊨ c}. The deductive closure contains not only explicitly given knowledge (the knowledge base KB), but also all logical sentences which can be derived from the KB via deductive reasoning. E.g., if the KB contained the sentences a and a → b, the deductive closure would also contain b.
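To make the notion of deductive closure concrete, here is a minimal, purely illustrative Python sketch (not part of the system described in this paper) that computes the closure of a propositional rule base under modus ponens; the string-based representation is a toy stand-in for full DL reasoning:

```python
# Illustrative toy example (not the paper's DL machinery): deductive closure
# of a propositional KB under modus ponens. Facts are strings, rules are
# (premise, conclusion) pairs representing implications such as "a -> b".
def deductive_closure(facts, rules):
    closure = set(facts)
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for premise, conclusion in rules:
            if premise in closure and conclusion not in closure:
                closure.add(conclusion)  # derived sentence joins the closure
                changed = True
    return closure

# KB = {a, a -> b}: the closure additionally contains b.
print(deductive_closure({"a"}, [("a", "b")]))  # -> {'a', 'b'}
```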
The application-specific constraint set which we use as an OWL DL ontology is similar to the well-known Friend-Of-A-Friend (FOAF) social network schema, together with additional constraints which will be introduced later. The following ontology SN comprises only a fragment of the full FOAF-like ontology we have used (with DOB meaning date of birth and hasBD meaning has birthday):
Person ⊑ Agent    knows⁻ ⊑ knows    ∃knows.⊤ ⊑ Person
⊤ ⊑ ∀knows.Person    ∃hasBD.⊤ ⊑ Person    ⊤ ⊑ ∀hasBD.DOB
⊤ ⊑ ≤1 hasBD    ⊤ ⊑ ≥1 hasBD    ∃yearValue.⊤ ⊑ DOB
⊤ ⊑ ∀yearValue.gYear    ⊤ ⊑ ≤1 yearValue    ⊤ ⊑ ∀attends.School
These axioms mainly express certain properties of binary relations (so-called roles) between classes. For example, ⊤ ⊑ ∀attends.School specifies that in our example ontology the range (target set) of the role attends is School.
In addition to these, we provide the machine learning algorithm with an ABox which models an incomplete social network. The later machine learning task consists essentially of an (uncertain) completion of this given network fragment. An example of such additional individuals-governing constraints A is: tim : Person, tina : Person, tom : Person; (tina, tim) : knows, (tina, tom) : knows.
Note that these relationships among persons cannot be weakened or overwritten by the learning process, even if they contradict observed data. They need to be provided manually by the KB engineer. As further constraints, we assume some specific properties G of the analyzed social network. The following set of axioms expresses that no one who is younger than six years goes to school. Here, UnderSixYearsOld is the class which contains persons with an age of less than six years (calculated from the given dates of birth):

Pupil ⊑ Person    Pupil ⊑ ¬UnderSixYearsOld    Pupil ≡ ∃attends.School

The complete set of given formal and definite knowledge for our running example is then C = Θ(SN ∪ A ∪ G).
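To illustrate how such a constraint set can be checked mechanically, the following sketch encodes the school-attendance constraint with the owlready2 Python library and lets a DL reasoner detect the inconsistency. This is our illustration only (the implementation evaluated in Sec. 5.1 uses Jena and Pellet), and the ontology IRI is hypothetical:

```python
from owlready2 import (Thing, ObjectProperty, AllDisjoint, get_ontology,
                       sync_reasoner, OwlReadyInconsistentOntologyError)

onto = get_ontology("http://example.org/sn.owl")  # hypothetical IRI

with onto:
    class Person(Thing): pass
    class School(Thing): pass
    class UnderSixYearsOld(Person): pass
    class attends(ObjectProperty):
        domain = [Person]
        range = [School]
    class Pupil(Person):
        equivalent_to = [attends.some(School)]  # Pupil = attends some School
    AllDisjoint([Pupil, UnderSixYearsOld])      # Pupil disjoint with UnderSixYearsOld

    school = School("school1")
    tim = UnderSixYearsOld("tim")
    tim.attends = [school]  # violates G: tim is inferred to be a Pupil

try:
    sync_reasoner()  # invokes an external DL reasoner (HermiT by default)
except OwlReadyInconsistentOntologyError:
    print("KB unsatisfiable: an under-six person attends a school.")
```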
Example Data: The set of data used as examples for the learning task takes the form of ABox assertions. But in contrast to the ABox knowledge in set A above, an example here might turn out to be wrong. We also do not demand that examples are mutually consistent, or consistent with the ontology. In order to maintain compatibility with the expected input format for relational learning, we restrict the syntax of examples to the following two description logic formula patterns:

instance : category
(instance_a, instance_b) : role
Here, roles correspond to binary relations. The set of all example data given as logical formulas is denoted as D.
3 Infinite Hidden Semantic Models
The proposed Infinite Hidden Semantic Model (IHSM) is a machine learning algorithm from the area of SRL [2]. The novelty is its additional ability to exploit formal ontologies as prior knowledge given as a set of logical formulas. In our case, the constraints are provided as a SHOIN(D) ontology with a TBox and an ABox as described in the previous section. In traditional ML, prior knowledge is specified only by the likelihood model and the prior distributions, by parameters of the learning algorithm, or by the selection of features.

In this section, we first show how the ontology from Sec. 2 defines a Relational Model (RM) which is the basis for an Infinite Hidden Relational Model (IHRM). Then, the IHSM is generated by constraining the IHRM appropriately.
3.1 Relational Models
Fig. 1. Partial sociogram of the LJ-FOAF-domain: the individuals tim, tina, tom : Person and usa, uk : Location, connected by knows, residence and hasImage relations.
First, an abstract RM of the concepts and roles defined in our social network ontology is created. Based on the TBox axioms given by the ontology we can create a simple sociogram as depicted in Fig. 1. A sociogram consists of three different elements: concept individuals (individuals that are instances of a concept, e.g. tim : Person), attribute instances (relations between a concept and a literal, e.g. tina : hasImage), and role instances (relations between concepts, e.g. (tina, tim) : knows). Please note that many TBox elements first need to be deduced from the ontology, so that all individuals can be assigned to their most specific concepts. This process is known as realization in DL reasoning. Fig. 2 shows the full RM we use for the experiments in Sec. 5.
3.2 Infinite Hidden Relational Models
Following [3] and [4] we extend the RM to a Hidden Relational Model (HRM) by assigning a hidden variable, denoted as $Z_i^c$, to each individual i of concept c, with current state k. Given that the hidden variables have discrete probability distributions, they can intuitively be interpreted as clusters Z where similar individuals of the same concept c (in our case similar persons, locations, schools, ...) are grouped in one specific component k. These assignments of latent states specify the component an individual is assigned to.

Fig. 2. Relational Model of the LJ-FOAF-domain: the concepts Person, Date, Image, OnlineChatAccount, Location, #BlogPosts and School, connected by the roles knows, dateOfBirth, has, holds, residence, attends, posted and located.
The resulting HRM of the sociogram shown in Fig. 1 is depicted in Fig. 3. Following the idea of hidden variables in Hidden Markov Models (HMMs) or Markov Random Fields, those additional variables can be thought of as unknown properties (roles or attributes) of the attached concept. We assume that all attributes of a concept depend only on its hidden variable and that roles depend on the two hidden variables of the two concepts involved. This implies that if the hidden variables were known, attributes and roles could be predicted independently. In addition, the hidden variables in the IHSM incorporate restrictions in the form of constraints imposed by the ontology (see Sec. 3.3).
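Under these conditional independence assumptions the joint distribution of an HRM factorizes over entity classes c and role classes b. The following display is our sketch of the standard IHRM factorization in the notation used in this section, not a formula quoted from [3] or [4]:

$$P(\mathbf{Z}, \mathbf{A}, \mathbf{R}) = \prod_c \prod_i P(Z_i^c)\, P(A_i^c \mid Z_i^c) \cdot \prod_b \prod_{(i,j)} P(R_{i,j}^b \mid Z_i^c, Z_j^{c'})$$

where b ranges over the role classes and (i, j) over the pairs of individuals connected by role b.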
Fig. 3. Hidden relational model of the sociogram defined in Fig. 1: the sociogram of Fig. 1 augmented with one hidden variable per individual.
Considering the HRM model shown in Fig. 3, information can now propagate via those interconnected hidden variables Z. E.g., if we want to predict whether tom with hidden state $Z_3^1$ might know tina ($Z_2^1$) we need to consider a new relationship $R^{3,2}$. Intuitively, the probability is computed based on (i) the attributes $A_3^1$ and $A_1^1$ of the latent states of the immediately related persons $Z_3^1$ and $Z_2^1$; (ii) the known relations associated with the persons of interest, namely the role knows and the residence roles $R^{2,1}$, $R^{3,1}$ and $R^{3,2}$; (iii) higher-order information indirectly transferred via the hidden variables $Z_3^1$ and $Z_2^1$. In summary, by introducing hidden variables, information can distribute globally in the HRM. This reduces the need for extensive structural learning, which is known to be difficult.
Fig. 4. Parameters of an IHRM: entity, attribute and relation parameters.
Critical parameters in the HRM are the numbers of states of the various latent variables, which might have to be tuned as part of a complex optimization routine. A solution here is offered by the IHRM, which was introduced by [4] and [3]. In the IHRM, a hidden variable has a potentially infinite number of states, and an estimate of the optimal number of states is determined as part of the inference process.

Finally, we need to define the remaining variables, their probability distributions and model parameters.⁴ The most important parameters in our case are shown in Fig. 4. The state k of $Z_i^c$ specifies the cluster assignment of the concept (aka entity class) c. K denotes the number of clusters in Z. Z is sampled from a multinomial distribution with parameter vector $\pi = (\pi_1, \ldots, \pi_K)$, which specifies the probability of a concept belonging to a component, i.e. $P(Z_i = k) = \pi_k$. $\pi$ is referred to as the mixing weights and is drawn according to a truncated stick breaking construction with a hyperparameter $\alpha_0$. $\alpha_0$ is referred to as a concentration parameter in Dirichlet Process (DP) mixture modeling and acts as a tuning parameter that influences K. K is also limited by a truncation parameter that specifies the maximum number of components per cluster for each entity class.
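As a quick illustration of the truncated stick breaking construction, the mixing weights π can be drawn with a few lines of NumPy. This is a generic sketch of the standard construction, with the concentration α₀ and the truncation level K as its only parameters:

```python
import numpy as np

def stick_breaking(alpha0, K, rng=np.random.default_rng()):
    """Draw truncated stick-breaking weights pi_1..pi_K with concentration alpha0."""
    v = rng.beta(1.0, alpha0, size=K)  # stick fractions v_k ~ Beta(1, alpha0)
    v[-1] = 1.0                        # truncation: the last stick takes the remainder
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining               # pi_k = v_k * prod_{k'<k} (1 - v_{k'})

pi = stick_breaking(alpha0=5.0, K=20)
print(pi.sum())  # sums to 1: a valid distribution over components
```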
Attributes $A^c$ are generated from a Bernoulli distribution with parameters $\theta_k$. For each component there is an infinite number of mixture components $\theta_k$. Each individual in component k inherits the mixture component, thus we have: $P(A_i = s \mid Z_i = k, \theta) = \theta_{k,s}$. These mixture components are independently drawn from a prior $G_0$. The base distributions $G_0^c$ and $G_0^b$ are conjugate priors with hyperparameters $\beta^c$ and $\beta^b$.
The truth values for a role $R^{i,j}$ involving two persons i and j are sampled from a binomial distribution with parameter $\phi_{k,\ell}$, where k and ℓ denote the cluster assignments of person i and person j, respectively. $\phi^b_{k,\ell}$ is the correlation mixture component indexed by the potentially infinite hidden states k for $c_i$ and ℓ for $c_j$, where $c_i$ and $c_j$ are the indexes of the individuals involved in the relationship class b. Again, $G_0^b$ is the Dirichlet Process base distribution of a role b. If an individual i is assigned to a component k, i.e. $Z_i = k$, the individual inherits not only $\theta_k$, but also $\phi_{k,\ell}$, ℓ = 1, ..., K.
⁴ Please note that we cannot cover the technical details of the IHRM here and refer the reader to [4] and [3] for a more detailed introduction.
3.3 Infinite Hidden Semantic Models
The IHSM is based on the idea that formal constraints can be imposed on the correlation mixture components $\phi_{k,\ell}$ and thus restrict the possible truth values for the roles $R^{i,j}$. This, amongst other things, imposes constraints on the structure of the underlying ground network or, more specifically in our application, the structure of the sociogram. Recall the simple example from Sec. 1: According to it, a person i known to be younger than six years old must not attend any school j. The IHSM will extract this information from the ontology and set the correlation mixture component $\phi_{k,\ell}$ to 0 at entries representing such relations from Person component k to School component ℓ. Here, k and ℓ denote the components that person i and school j are assigned to. This eliminates inconsistent structural connections from the underlying ground network. More generally, all connections $R^{i,j}$ between two components k and ℓ of which inconsistent individuals i and j are (partial) members are considered void.

However, this redirection of relations by the latent variables allows the IHSM not only to restrict possible connections in the ground network but makes this restriction influence the likelihood model itself. By restricting φ, θ is affected as well. Ultimately, the cluster assignments Z are influenced, and information can globally propagate through the network and influence all π, θ and φ (see Sec. 3.2).

While this section focused on a conceptual description of the IHSM, the algorithm will be specified in detail in the next section, before Sec. 5 presents experimental results.
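The core constraining operation can be pictured as masking entries of the correlation mixture matrix. The following sketch is our illustration, with a hypothetical is_inconsistent oracle standing in for the reasoner-based consistency check detailed in Sec. 4:

```python
import numpy as np

def constrain_phi(phi, person_components, school_components, is_inconsistent):
    """Set phi[k, l] = 0 for all component pairs (k, l) whose member individuals
    would yield a DL-inconsistent role assertion (e.g. an under-six attending
    school). is_inconsistent(k, l) is a stand-in for the reasoner-based check."""
    phi = phi.copy()
    for k in person_components:
        for l in school_components:
            if is_inconsistent(k, l):
                phi[k, l] = 0.0  # relations from component k to l are void
    return phi
```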
4 Learning, Constraining and Predictions
The key inferential problem in the IHSM is to compute the joint posterior dis-
tribution of unobservable variables given the data. In addition, we need to avoid
inconsistent correlation mixture components during learning. As computation
of the joint posterior is analytically intractable, approximate inference methods
need to be considered to solve the problem. We use the blocked Gibbs sampling
(GS) with truncated stick breaking representation [5] a Markov chain Monte
Carlo method to approximate the posterior .
Let D be the set of all available observations (observed example data, each represented as a logical formula as defined in Sec. 2.1), and let Agents = Agentᴵ be the set of all instances of category Agent under interpretation I, that is, informally, all persons who contribute to the social network. At each iteration, we first update the hidden variables conditioned on the parameters sampled in the last iteration, and then update the parameters conditioned on the hidden variables. So, for each entity class:

1. Update the hidden variable $Z_i^c$ for each $e_i^c$ (a code sketch of steps 1 and 2 follows after this algorithm): assign it to component k with probability proportional to

$$\pi_k^{c(t)} \; P(A_i^c \mid Z_i^{c(t+1)} = k, \theta^{c(t)}) \; \prod_{b'} P(R_{i,j}^{b'} \mid Z_i^{c(t+1)} = k, Z_j^{c_j(t)}, \phi^{b'(t)})$$
2. Update $\pi^{c(t+1)}$ as follows:
   (a) Sample $v_k^{c(t+1)}$ from $\mathrm{Beta}(\lambda_{k,1}^{c(t+1)}, \lambda_{k,2}^{c(t+1)})$ for $k = 1, \ldots, K^c - 1$, with

$$\lambda_{k,1}^{c(t+1)} = 1 + \sum_{i=1}^{N^c} \delta_k(Z_i^{c(t+1)}), \qquad \lambda_{k,2}^{c(t+1)} = \alpha_0^c + \sum_{k'=k+1}^{K^c} \sum_{i=1}^{N^c} \delta_{k'}(Z_i^{c(t+1)}),$$

   and set $v_{K^c}^{c(t+1)} = 1$. Here $\delta_k(Z_i^{c(t+1)})$ equals 1 if $Z_i^{c(t+1)} = k$ and 0 otherwise.
   (b) Compute $\pi^{c(t+1)}$ as $\pi_1^{c(t+1)} = v_1^{c(t+1)}$ and

$$\pi_k^{c(t+1)} = v_k^{c(t+1)} \prod_{k'=1}^{k-1} \left(1 - v_{k'}^{c(t+1)}\right), \quad k > 1.$$
3. Update θ:

$$\theta_k^{c(t+1)} \propto P(\theta \mid A^c, Z^{c(t+1)}, G_0^c)$$
4. Constrain φ to satisfiable relations:
For entity cluster k, let $F^k_{ext} = F^k \cup \{(e_m, e_n) : r \mid e_m, e_n \in Agents, r \in R, m \neq n\}$ be the set of those logical formulas in the example data set which represent some relation (role) r between two different individuals (persons) $e_m$ and $e_n$, where person $e_m$ is already assigned to component k and $e_n$ is assigned to a component ℓ. To keep the notation compact, we spell out role instances $(e_1, e_2) : r$ only asymmetrically (i.e., we omit $(e_2, e_1) : r$ if we have covered the case $(e_1, e_2) : r$). Let $F^k \subseteq D$ be the set of all example formulas which have already been used to learn component k so far, that is, the subset of the data D which has been used for forming that cluster until now. Let furthermore $\Phi(e, D)$ be the set of all sampled formulas in D in which the person e appears, i.e., $f \in \Phi(e, D)$ iff $f \in D$ and $f = e : c$ or $f = (e, e_x) : r$ for some $c \in C$, $e_x \in Agents$ and $r \in R$. We write $\kappa(e, j)$ to express that a certain entity e has already been assigned to a certain component j. The following steps are now used in order to check whether component k is usable w.r.t. the given set of logical constraints C:

(a) Identify the largest subset $F^k_{clean}$ of formulas within $F^k_{ext}$ which is consistent with C and the set of example data about person $e_i^c$:

$$F^k_{clean} \in 2^{F^k_{ext}}, \quad \exists \mathcal{I}: \mathcal{I} \models F^k_{clean} \cup \Phi(e_i^c, D) \cup C,$$
$$\nexists F \in 2^{F^k_{ext}}, \exists \mathcal{I}: \mathcal{I} \models F \cup \Phi(e_i^c, D) \cup C \ \text{ with } F \supset F^k_{clean}$$

($\mathcal{I} \models X$ expresses that the set of logical sentences X is satisfiable, $\mathcal{I}$ being an interpretation).

(b) Verify whether $F^k_{clean}$, the formulas which have been used to learn related other clusters, $\Phi(e_i^c, D)$ and the constraints are consistent in sum if we replace in $F^k_{clean}$ the names of all persons which are assigned to components other than k with the name of person $e_i^c$.
Let $F^k_{upd} = \{(e_i^c, e_m) : r \mid (e_n, e_m) : r \in F^k_{ext}\}$ be the latter set of formulas. Furthermore, let $F^k_{rel} = \bigcup_{j \neq k, \, \kappa(e_m, k), \, (e_m, e_n):r \in F_j} F_j$ be the set of all formulas in all components other than k which relate to component k via role formulas. The overall consistency check for component k yields a positive result iff

$$\exists \mathcal{I}: \mathcal{I} \models \Phi(e_i^c, D) \cup F^k_{upd} \cup F^k_{rel} \cup C \cup F^k_{clean}, \quad F^k_{clean} \neq \emptyset.$$

Where the consistency check described above yielded a positive result, update

$$\phi_{k,\ell}^{b(t+1)} \propto P(\phi \mid R^b, Z^{(t+1)}, G_0^b).$$
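For steps 1 and 2 referenced above, a compact NumPy sketch may help to make the updates concrete. It is our illustration under the notation above; attr_lik and rel_lik are assumed to hold the precomputed attribute and role likelihood factors per component:

```python
import numpy as np

rng = np.random.default_rng()

def resample_assignment(pi, attr_lik, rel_lik):
    """Step 1: sample a new component for one individual.
    pi:       current mixing weights, shape (K,)
    attr_lik: P(attributes | Z = k) for each k, shape (K,)
    rel_lik:  product over observed roles of P(R | Z = k, partner cluster), shape (K,)
    """
    w = pi * attr_lik * rel_lik  # unnormalized posterior over components
    return rng.choice(len(w), p=w / w.sum())

def update_pi(z, K, alpha0):
    """Step 2: conjugate stick-breaking update of the mixing weights."""
    counts = np.bincount(z, minlength=K)           # individuals per component
    tail = np.cumsum(counts[::-1])[::-1] - counts  # sum of counts for k' > k
    v = rng.beta(1.0 + counts, alpha0 + tail)      # v_k ~ Beta(lambda_k1, lambda_k2)
    v[-1] = 1.0                                    # truncation: v_{K^c} = 1
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
```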
After the GS procedure reaches stationarity, the role of interest is approximated by looking at the sampled values. Here, we only mention the simple case where the predictive distribution of the existence of a relation $R^{i,j}$ between two known individuals i, j is approximated by $\phi^b_{i^*, j^*}$, where $i^*$ and $j^*$ denote the cluster assignments of the objects i and j, respectively.
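In code, this prediction amounts to averaging the recorded φ samples at the entry indexed by the two cluster assignments. A minimal sketch, assuming phi_samples holds the φ matrices recorded after burn-in:

```python
import numpy as np

def predict_relation(phi_samples, z_i, z_j):
    """Approximate P(R_ij = 1) by phi[z_i, z_j], averaged over Gibbs samples."""
    return float(np.mean([phi[z_i, z_j] for phi in phi_samples]))
```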
5 Experiments
The increasing popularity of social networking services like MySpace and Facebook has fostered research on social network analysis in recent years. The immense number of user profiles demands automated and intelligent data management capabilities, e.g. formal ontologies. While data mining techniques can handle large amounts of simple facts, little effort has been made to exploit the semantic information inherent in social networks and user profiles. There is almost no work on statistical relational learning with formal ontologies in general, and with SW data in particular. The lack of experiments on large and complex real world ontologies is due not only to the absence of algorithms but also to missing suitable datasets. In this section we present both a large and complex SW dataset and the methodology for applying IHSM in practice. Ultimately, we evaluate our approach by presenting the results of an empirical comparison of IHSM and IHRM in this domain.
5.1 Data and Methodology
As mentioned before, our core ontology is based on Friend of a Friend (FOAF) data. The purpose of the FOAF project is to create a web of machine-readable pages describing people, the links between them and the things they create and do. The FOAF ontology is defined using OWL DL/RDF(S) and formally specified in the FOAF Vocabulary Specification 0.91⁵. In addition, we make use of further concepts and roles which are available in the data (see Sec. 2.1). We gathered our FOAF dataset from user profiles of the community website LiveJournal.com⁶ (this specific ontology will be called LJ-FOAF from now on). All extracted concepts and roles are shown in Fig. 2. Tab. 1 lists the number of different individuals (left column) and their known instantiated roles (middle column). Please note that Date and #BlogPosts are reduced to a small number of discrete states. As expected for a social network, knows is the primary source of information. This real world data set offers both a sufficiently large set of individuals for inductive learning and a formal ontology specified in RDFS and OWL. However, while LJ-FOAF offers a taxonomy, there are no complex constraints given. Thus, to demonstrate the full potential of IHSM, we additionally added constraints that are not given in the original ontology (see Sec. 2.1).
Concept             #Individuals   Role           #Instances   #Comp. IHRM   #Comp. IHSM
Location            200            residence      514          18            17
School              747            attends        963          36            48
OnlineChatAccount   5              holdsAccount   427          4             4
Person              638            knows          8069         38            45
                                   hasImage       574
Date                4              dateOfBirth    194          4             2
#BlogPosts          5              posted         629          4             4

Table 1. No. of individuals, no. of instantiated roles and final number of components.
To implement all features of IHSM we made use of additional open source software packages: the Semantic Web framework Jena⁷ is used to load, store and query the ontology, and Pellet⁸ provides the OWL DL reasoning capabilities. This outlines the workflow: First, the TBox axioms are designed and loaded into Jena. Next, all ABox assertions are added and loaded into Jena. Then, using the taxonomy information from the ontology and the ABox assertions, we extract the RM as described in Sec. 3.1. This RM is transferred into an IHSM by adding hidden variables and parameters accordingly. Finally, the parameters are learned from the data, while constraints are constantly checked as shown in Sec. 4.
In our experiments the standard setting for the truncation parameter was #Individuals/10 for entity classes with over 100 instances and #Individuals for entity classes with fewer individuals. The standard number of Gibbs sampler iterations is 100. We did not engage in extensive parameter tuning because the purpose of this evaluation is to examine the influence of the constraints, not optimal predictive performance. Thus, we fixed $\alpha_0 = 5$ for every entity class and $\alpha_0 = 20$ for every relationship class.
⁵ http://xmlns.com/foaf/spec/
⁶ http://www.livejournal.com/bots/
⁷ http://jena.sourceforge.net/
⁸ http://pellet.owldl.com/
5.2 Results
We will now report our results on learning and constraining with the LJ-FOAF
data set.
Computational Complexity: The additional consistency check for every individual per iteration made training slower by approximately a factor of 6 when performed with Jena and Pellet. After implementing a non-generic constraining module optimized for the simple example introduced in Sec. 1 we could reduce the additional computation considerably. A comparison between IHSM and IHRM for different truncation parameter settings is given in Fig. 5. Obviously, there is almost no computational overhead in the latter case.
Fig. 5. Running time in seconds of IHRM and IHSM for three truncation parameter settings.
Fig. 6. Convergence of the two largest components: number of individuals in the largest and 2nd largest components of cluster Person over Gibbs sampling iterations, for IHRM and IHSM.
Evaluating the convergence of the cluster sizes is another interesting aspect of the comparison of IHSM and IHRM. Fig. 6 shows the number of individuals in the two largest components of the entity cluster Z^Person plotted over Gibbs sampler iterations for one exemplary training run. Apparently, the constraining does not affect the convergence speed, which is desirable.
Cluster Analysis: An interesting outcome of the comparison of IHRM and IHSM is the number of components per hidden variable after convergence (see Table 1, right columns). In both cases, compared to the initialization, Gibbs sampling converged to a much smaller number of components. Most of the individuals were assigned to a few distinct components, leaving most of the remaining components almost empty. There is a noticeable difference between IHRM and IHSM concerning the concepts School and Person, which needed more components after training with IHSM (see Table 1). A closer analysis of the components revealed that IHSM generated additional components for inconsistent individuals, because both concepts are affected by constraints. However, the last concept affected by the constraints (Date) has fewer components. Here, IHSM divided more generally into the age groups "too young" and "old enough", which also reflects the constraints. This demonstrates that the restriction of roles does influence the states of the latent variables.
Fig. 7 compares the learned parameter φ_attends of IHRM to the one learned by IHSM. A brighter cell indicates stronger relations between two components. Although hard to generalize, a cell with 50% gray might indicate that no significant probabilistic dependencies for individuals in this component are found in the data. The most obvious results are the rows with black cells only, which represent Person components that have no relation to any school. In fact, all of those components contained at least one person that conflicted with the ontology by having a specified age under six. This proves that one of the main goals of IHSM is achieved, namely the exploitation of constraints provided by the ontology. Note that the learned clusters can also be used to extract symbolic, uncertain knowledge and feed it back into the ontology. This is a promising direction for future research.
Fig. 7. Correlation mixture component φ_attends for each combination of components Z^Person and Z^School. Left: without constraining (IHRM). Right: with constraining (IHSM).
Predictive Performance: Given LJ-FOAF data for social network analysis one could, for instance, want to predict who knows whom, either because this information is unknown or because the system wants to recommend new friendships. Other relations that could be interesting to predict, in case they are unknown, are the school someone attends/attended or the place they live/lived. Furthermore, one could want to predict unspecified attributes of certain persons, like their age. The purpose of this section is not to show superior predictive performance of IHRM compared to other multi-relational learning algorithms; this has been evaluated before, e.g. in [3]. Here, we want to show the influence of constraining on the predictive performance of IHSM compared to IHRM.
We ran a 5-fold cross validation to evaluate the predictions of different relationship classes. Specifically, the non-zero entries of the relationship matrix to be predicted were randomly split into 5 parts. Each part was once used for testing while the remaining parts were used for training. The entries of each testing part were set to zero (unknown) for training and to their actual value of 1 for testing. Each fold was trained with 100 iterations of the Gibbs sampler, where 50 iterations are discarded as the burn-in period. After this, the learned parameters are recorded every fifth iteration. In the end we use the 10 recorded parameter sets to predict the unknown relationship values, average over them and calculate the area under the ROC curve (AUC) as our evaluation measure⁹. Finally, we average over the 5 folds and calculate the 95% confidence interval.

⁹ Please note that SW data has no negative samples, because zero entries do not represent negative relations but unknown ones (open world assumption). Still, the AUC is appropriate because it has been shown to be a useful measure for probabilistic predictions of binary classification on imbalanced data sets.
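The evaluation loop can be sketched as follows. This is our illustration with scikit-learn, where train_ihsm and predict_scores are hypothetical stand-ins for the learning and prediction procedures of Sec. 4:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

def cross_validate_auc(entries, labels, train_ihsm, predict_scores, folds=5):
    """5-fold CV over relation entries; returns mean AUC and a 95% confidence bound."""
    aucs = []
    for train_idx, test_idx in KFold(n_splits=folds, shuffle=True).split(entries):
        model = train_ihsm(entries[train_idx])             # test entries hidden in training
        scores = predict_scores(model, entries[test_idx])  # predicted relation probabilities
        aucs.append(roc_auc_score(labels[test_idx], scores))
    aucs = np.asarray(aucs)
    return aucs.mean(), 1.96 * aucs.std(ddof=1) / np.sqrt(folds)
```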
Role    attends         dateOfBirth     knows
IHRM    0.577 (0.013)   0.548 (0.018)   0.813 (0.005)
IHSM    0.608 (0.017)   0.561 (0.011)   0.824 (0.002)

Table 2. Predictive performance for different LJ-FOAF roles: AUC and 95% confidence intervals.
The obvious roles to evaluate are attends and dateOfBirth. Both are constrained by the ontology, so IHSM should have an advantage over IHRM because it cannot predict any false positives. The results in Table 2 confirm this expectation: in both cases IHSM outperformed IHRM. A less obvious outcome is the influence of the constraining on a relationship that is not directly constrained by the ontology, like knows. Still, in our experiments IHSM showed a slight advantage over IHRM. Thus, there seems to be a positive influence of the background knowledge, although many users specify an incorrect age. However, the opposite may occur likewise: if the given constraints conflict with the empirical evidence, there could even be a decrease in predictive performance. It is the ontology designer's choice whether to enforce a constraint that conflicts with the observed evidence.

Considering the numerous ongoing efforts concerning ontology learning for the Semantic Web, more data sets with complex ontologies should become available in the near future. Thus, we expect to achieve more definite results for IHSM in those domains.
6 Related Work
Very generally speaking, our proposed method aims at combining machine learning with formal logic. So far, machine learning has mainly been approached either with statistical methods, or with approaches which aim at the inductive learning of formal knowledge from examples which are also provided using formal logic. The most important direction in this respect is Inductive Logic Programming (ILP) [6]. Probabilistic and Stochastic Logic Programming (SLP) (e.g., [7]) are a family of ILP-based approaches which are capable of learning stochastically
weighted logical formulas (the weights of formulas, respectively). In contrast to
that, our approach learns probability distributions with the help of a given, formal theory which acts as a set of hard constraints. To the best of our knowledge, this direction is new. What (S)ILP and our approach have in common is that our method also uses examples formalized in a logic language as data. There are also some approaches which build upon other types of uncertain logic, for example [8]. Although (S)ILP and SRL are conceptually very closely related and often subsumed under the general term "relational learning", SRL is still rarely integrated with formal logic or ontologies as prior knowledge. [9] use ontologies in a similar model, but only use taxonomic information as additional "soft" knowledge (i.e., knowledge which can be overwritten during the learning process) in the form of features for learning. They do not restrict their results using formal hard constraints. One exception are Markov Logic Networks [10], which combine first-order logic and Markov networks and learn the weights of formulas.
Surprisingly, there are also hardly any applications of (pure) SRL algorithms to (SW) ontologies. The few examples, e.g. [11], [12], do not consider formal constraints. There are various approaches to the learning of categories in formal ontologies from given instance data and/or similar categories (e.g., [13]). However, these approaches do not allow for the statistical learning of relations in the sense of SRL, and their aims are all in all more related to those of ILP than to our learning goals. Although there are applications of SRL to social networks, such as [14], none of those approaches uses a formal ontology or any other kind of formal knowledge. Furthermore, the social networks examined in these works are mostly significantly less complex with regard to the underlying relational model.

The use of hard constraints for clustering tasks in purely statistical approaches to learning, as opposed to the ubiquitous use of "soft" prior knowledge, has been approached in, e.g., [15]. A common characteristic of these approaches is that they work with a relatively narrow, semi-formal notion of constraints and do not relate constraints to relational learning. In contrast to these efforts, our approach allows for rich constraints which take the form of an OWL DL knowledge base (with much higher expressivity). The notion of forbidden pairings of data points ("cannot-link" constraints [15]) is replaced with the more general notion of logical (un)satisfiability w.r.t. formal background knowledge.
7 Conclusions and Future Work
In the presented approach, we explored the integration of formal ontological prior knowledge into machine learning tasks. We introduced IHSM and provided empirical evidence that hard constraints can not only improve the predictive performance on unknown roles which are directly affected by the constraints, but also on unconstrained roles, via IHSM's latent variables.

In general, we hope to see more work on inductive learning with SW ontologies and, on the other hand, complex Semantic Web ontologies that can be supplemented with uncertain evidence. For IHSM in particular, future work will concern a detailed theoretical analysis of the effect of constraining on clusters. Refining the ontology by extracting formal knowledge from the latent model structure is another promising research direction. As mentioned before, we intend to obtain additional experimental evidence concerning computational complexity and predictive performance as soon as more suitable ontologies become available. We expect that the increased research on semantic technologies will soon result in suitable formal ontologies that contain both complex consistency reasoning tasks and large sets of instances.
References
1. Horrocks, I., Patel-Schneider, P.F.: Reducing OWL entailment to description logic satisfiability. Journal of Web Semantics, Springer (2003) 17–29
2. Getoor, L., Taskar, B., eds.: Introduction to Statistical Relational Learning. The MIT Press (2007)
3. Xu, Z., Tresp, V., Yu, K., Kriegel, H.P.: Infinite hidden relational models. In: Proceedings of the 22nd International Conference on Uncertainty in Artificial Intelligence (UAI 2006). (2006)
4. Kemp, C., Tenenbaum, J.B., Griffiths, T.L., Yamada, T., Ueda, N.: Learning systems of concepts with an infinite relational model. In: Proc. 21st Conference on Artificial Intelligence. (2006)
5. Ishwaran, H., James, L.: Gibbs sampling methods for stick breaking priors. Journal of the American Statistical Association 96(453) (2001) 161–173
6. Lisi, F.A., Esposito, F.: On ontologies as prior conceptual knowledge in inductive logic programming. In: ECML PKDD Workshop: Prior Conceptual Knowledge in Machine Learning and Knowledge Discovery (PriCKL'07). (2007)
7. De Raedt, L., Kersting, K.: Probabilistic logic learning. SIGKDD Explorations Newsletter 5(1) (2003) 31–48
8. Carbonetto, P., Kisynski, J., de Freitas, N., Poole, D.: Nonparametric Bayesian logic. In: Proc. 21st UAI. (2005)
9. Reckow, S., Tresp, V.: Integrating ontological prior knowledge into relational learning. In: NIPS 2008 Workshop: Structured Input – Structured Output. (2008)
10. Richardson, M., Domingos, P.: Markov logic networks. Machine Learning 62(1–2) (2006) 107–136
11. Kiefer, C., Bernstein, A., Locher, A.: Adding data mining support to SPARQL via statistical relational learning methods. In: Proceedings of the 5th European Semantic Web Conference (ESWC). Volume 5021 of Lecture Notes in Computer Science, Springer-Verlag Berlin Heidelberg (2008) 478–492
12. Fanizzi, N., d'Amato, C., Esposito, F.: Induction of classifiers through non-parametric methods for approximate classification and retrieval with ontologies. International Journal of Semantic Computing 2(3) (2008) 403–423
13. Fanizzi, N., d'Amato, C., Esposito, F.: A multi-relational hierarchical clustering method for Datalog knowledge bases. In: Foundations of Intelligent Systems, Springer Berlin/Heidelberg (2008)
14. Xu, Z., Tresp, V., Rettinger, A., Kersting, K.: Social network mining with nonparametric relational models. In: Advances in Social Network Mining and Analysis – the Second SNA-KDD Workshop at KDD 2008. (2008)
15. Davidson, I., Ravi, S.S.: The complexity of non-hierarchical clustering with instance and cluster level constraints. Data Mining and Knowledge Discovery 14(1) (2007) 25–61
