A Modularization-Based Approach To Finding All Justifications For OWL DL Entailments
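negation            $\neg C$           $\Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$
conjunction         $C \sqcap D$       $C^{\mathcal{I}} \cap D^{\mathcal{I}}$
exists restriction  $\exists r.C$      $\{x \in \Delta^{\mathcal{I}} \mid \exists y \in \Delta^{\mathcal{I}} : (x, y) \in r^{\mathcal{I}} \wedge y \in C^{\mathcal{I}}\}$
inverse role        $r^-$              $\{(x, y) \in \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}} \mid (y, x) \in r^{\mathcal{I}}\}$
role hierarchy      $r \sqsubseteq s$  $r^{\mathcal{I}} \subseteq s^{\mathcal{I}}$
transitivity        $\mathrm{Trans}(r)$  $(x, y), (y, z) \in r^{\mathcal{I}}$ implies $(x, z) \in r^{\mathcal{I}}$
GCI                 $C \sqsubseteq D$  $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$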
axioms are precisely those axioms in the module [3]. In order to exploit
modularity in black-box axiom pinpointing, Baader and Suntisrivaraporn showed that
the reachability-based module [16] covers all justifications for an entailment of
interest in $\mathcal{EL}^+$ [3].
In the present paper, we combine the relevance-based techniques developed
in [9] and the modularization-based techniques in [3] to effectively enhance the
HST pinpointing algorithm. Since the results in [3] are w.r.t. reachability-based
modules for $\mathcal{EL}^+$, we need to adopt the locality-based module [6] for $\mathcal{SHOIQ}$.
Our main contributions in the present paper are twofold. In theory, we show that
the minimal locality-based module is a subsumption module (first defined in [3]),
i.e., it covers all justifications. As a consequence, it suffices to focus on axioms
in the module when finding all justifications and when testing subsumption. In
practice, we have implemented the approach using KAON2 as the black-box
reasoner and evaluated it on realistic ontologies. Our empirical results demonstrate
an improvement of several orders of magnitude in the efficiency and scalability of
finding all justifications. The results thus render the black-box approach feasible
for application-scale OWL DL ontologies.
2 Preliminaries
In this section, we give formal definitions for $\mathcal{SHOIQ}$ ontologies, justifications
and locality-based modules. Then, we introduce selection functions and the HST
pinpointing algorithm.
Description logic and justifications
To make the paper self-contained, we first introduce the Description Logic (DL)
$\mathcal{SHOIQ}$ [7] which is the underpinning DL formalism of the Web Ontology
Language (OWL DL and OWL Lite).
4 B. Suntisrivaraporn et al.
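Starting with disjoint sets of concept names CN, role names RN and individuals
Ind, a $\mathcal{SHOIQ}$-role is either a role name $r \in \mathrm{RN}$ or an inverse role $r^-$ with
$r \in \mathrm{RN}$. We denote by Rol the set of all $\mathcal{SHOIQ}$-roles. $\mathcal{SHOIQ}$-concepts can
be built using the constructors shown in the upper part of Table 1, where
$a \in \mathrm{Ind}$, $r, s \in \mathrm{Rol}$ with $s$ a simple role¹, $n$ is a positive integer, $A \in \mathrm{CN}$, and $C, D$
are $\mathcal{SHOIQ}$-concepts.² We use the standard abbreviations: $\bot$ stands for $\neg\top$;
$C \sqcup D$ stands for $\neg(\neg C \sqcap \neg D)$; $\forall r.C$ stands for $\neg(\exists r.\neg C)$; and $\leq n\, s.C$ stands
for $\neg(\geq (n + 1)\, s.C)$. We denote by Con the set of all $\mathcal{SHOIQ}$-concepts.
A $\mathcal{SHOIQ}$ ontology $\mathcal{O}$ is a finite set of role hierarchy axioms $r \sqsubseteq s$, transitivity
axioms $\mathrm{Trans}(r)$, and general concept inclusion axioms (GCIs) $C \sqsubseteq D$ with
$r, s \in \mathrm{Rol}$ and $C, D \in \mathrm{Con}$.³ We write $\mathrm{CN}(\mathcal{O})$, $\mathrm{RN}(\mathcal{O})$ and $\mathrm{Ind}(\mathcal{O})$ to denote,
respectively, the set of concept names, role names and individuals occurring in
the ontology $\mathcal{O}$, and $\mathrm{Sig}(\mathcal{O})$ to denote the signature of $\mathcal{O}$, i.e.,
$\mathrm{CN}(\mathcal{O}) \cup \mathrm{RN}(\mathcal{O}) \cup \mathrm{Ind}(\mathcal{O})$. Similarly, $\mathrm{Sig}(r)$, $\mathrm{Sig}(C)$ and $\mathrm{Sig}(\alpha)$ are used to denote the
signature of a role, a concept and an axiom, respectively.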
The DL semantics is defined by means of interpretations $\mathcal{I}$ with a non-empty
domain $\Delta^{\mathcal{I}}$ and a function $\cdot^{\mathcal{I}}$ that maps each concept $C \in \mathrm{Con}$ to a subset of
the domain and each role $r \in \mathrm{Rol}$ to a binary relation over the domain. An
interpretation $\mathcal{I}$ is a model of an ontology $\mathcal{O}$ ($\mathcal{I} \models \mathcal{O}$) if the conditions given
in the semantics column of Table 1 are satisfied. The main types of entailments
are concept satisfiability: $C$ is satisfiable w.r.t. $\mathcal{O}$ if there exists a model $\mathcal{I}$ of
$\mathcal{O}$ such that $C^{\mathcal{I}} \neq \emptyset$; and concept subsumption: $C$ is subsumed by $D$ w.r.t. $\mathcal{O}$
(written $\mathcal{O} \models C \sqsubseteq D$ or $C \sqsubseteq_{\mathcal{O}} D$) if, for every model $\mathcal{I}$ of $\mathcal{O}$, $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$. Without
loss of generality, we restrict attention to concept subsumption in what follows.
Considering the example ontology depicted in Figure 1, all DL reasoners are able
to detect that the subsumption $\mathcal{O}_{ex} \models \alpha = (\text{Endocarditis} \sqsubseteq \text{HeartDisease})$ holds.
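Definition 1 (Justification). Let $\mathcal{O}$ be a $\mathcal{SHOIQ}$ ontology with an entailment
$\alpha$ (i.e., $\mathcal{O} \models \alpha$). A subset $J \subseteq \mathcal{O}$ is a justification for $\alpha$ in $\mathcal{O}$ if $J \models \alpha$ and, for
every $J' \subset J$, $J' \not\models \alpha$.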
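Justifications for an entailment need not be unique. Moreover, given an ontology
and an entailment, the number of justifications may be exponential in the size
of the ontology. For the small example ontology $\mathcal{O}_{ex}$ (see Figure 1), it is not
difficult to infer that there are precisely two justifications for $\alpha$: one consisting
of the axioms $\alpha_2, \alpha_3, \alpha_6, \alpha_7, \alpha_8, \alpha_9$ and $\alpha_{10}$, and the other of
$\alpha_2, \alpha_4, \alpha_6, \alpha_7, \alpha_8, \alpha_9$ and $\alpha_{10}$ (numbering as in Figure 1).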
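Definition 1 can be made concrete with a brute-force enumeration. The following Python sketch is an illustration only and not the method of this paper: it assumes a toy ontology of atomic inclusions $A \sqsubseteq B$ encoded as pairs, and a hypothetical `entails` function that decides subsumption by graph reachability instead of calling a DL reasoner. All names in it are illustrative.

```python
from itertools import combinations

def entails(axioms, sub, sup):
    """Toy entailment check (assumption): each axiom is a pair (A, B) read as
    A ⊑ B between concept names; sub ⊑ sup holds iff sup is reachable."""
    reached, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in axioms:
            if lhs == current and rhs not in reached:
                reached.add(rhs)
                frontier.append(rhs)
    return sup in reached

def all_justifications(ontology, sub, sup):
    """Enumerate subsets by increasing size and keep those that entail
    sub ⊑ sup and contain no previously found (smaller) justification."""
    axioms = sorted(ontology)
    found = []
    for size in range(len(axioms) + 1):
        for subset in combinations(axioms, size):
            candidate = frozenset(subset)
            if any(j <= candidate for j in found):
                continue  # a smaller entailing subset exists, so not minimal
            if entails(candidate, sub, sup):
                found.append(candidate)
    return found

# Hypothetical toy ontology with two independent derivations of A ⊑ E.
TOY_ONTOLOGY = {("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"),
                ("D", "E"), ("X", "Y")}
JUSTS = all_justifications(TOY_ONTOLOGY, "A", "E")
```

The enumeration inspects up to $2^{|\mathcal{O}|}$ subsets, which is exactly why it is only viable on toy inputs and why pruning the ontology before pinpointing matters.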
Modularization
We now introduce the notions of syntactic locality and locality-based module,
which were first introduced in [6]. Syntactic locality is used to define the
notion of module for a signature, i.e., a subset of the ontology that preserves the
meaning of names in the signature.
¹ A simple role is neither transitive nor a superrole of a transitive role.
² Concepts and roles in DL correspond to classes and properties in OWL, respectively.
³ A concept definition $A \equiv C$ is an abbreviation of two GCIs $A \sqsubseteq C$ and $C \sqsubseteq A$,
while ABox assertions $C(a)$ and $r(a, b)$ can be expressed as the GCIs $\{a\} \sqsubseteq C$ and
$\{a\} \sqsubseteq \exists r.\{b\}$, respectively.
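$\alpha_1$:     Pericardium $\sqsubseteq$ Tissue $\sqcap$ $\exists$part-of.Heart
$\alpha_2$:     Endocardium $\sqsubseteq$ Tissue $\sqcap$ $\exists$part-of.HeartValve $\sqcap$ $\exists$part-of.HeartWall
$\alpha_3$:     HeartValve $\sqsubseteq$ BodyValve $\sqcap$ $\exists$part-of.Heart
$\alpha_4$:     HeartWall $\sqsubseteq$ BodyWall $\sqcap$ $\exists$part-of.Heart
$\alpha_5$:     Pericarditis $\sqsubseteq$ Inflammation $\sqcap$ $\exists$has-loc.Pericardium
$\alpha_6$:     Endocarditis $\sqsubseteq$ Inflammation $\sqcap$ $\exists$has-loc.Endocardium
$\alpha_7$:     Inflammation $\sqsubseteq$ Disease $\sqcap$ $\exists$acts-on.Tissue
$\alpha_8$:     Disease $\sqcap$ $\exists$has-loc.Heart $\sqsubseteq$ HeartDisease
$\alpha_9$:     part-of $\sqsubseteq$ has-loc
$\alpha_{10}$:  Trans(has-loc)
Fig. 1. An example ontology $\mathcal{O}_{ex}$; the minimal locality-based module $\mathcal{O}^{loc}_{\text{Endocarditis}}$; and
the justifications for Endocarditis $\sqsubseteq_{\mathcal{O}}$ HeartDisease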
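Definition 2 (Syntactic locality for $\mathcal{SHOIQ}$). Let $S$ be a signature. The
following grammar recursively defines two sets of concepts $\mathrm{Con}^{\bot}(S)$ and $\mathrm{Con}^{\top}(S)$
for a signature $S$:
$\mathrm{Con}^{\bot}(S) ::= A^{\bot} \mid (\neg C^{\top}) \mid (C \sqcap C^{\bot}) \mid (\exists r^{\bot}.C) \mid (\exists r.C^{\bot}) \mid (\geq n\, r^{\bot}.C) \mid (\geq n\, r.C^{\bot})$
$\mathrm{Con}^{\top}(S) ::= (\neg C^{\bot}) \mid (C_1^{\top} \sqcap C_2^{\top})$
where $A^{\bot} \notin S$ is a concept name, $C$ is a $\mathcal{SHOIQ}$-concept, $C^{\bot} \in \mathrm{Con}^{\bot}(S)$,
$C_i^{\top} \in \mathrm{Con}^{\top}(S)$ (for $i = 1, 2$), and $\mathrm{Sig}(r^{\bot}) \not\subseteq S$.
An axiom $\alpha$ is syntactically local w.r.t. $S$ if it is of one of the following
forms: (i) $r^{\bot} \sqsubseteq r$, (ii) $\mathrm{Trans}(r^{\bot})$, (iii) $C^{\bot} \sqsubseteq C$ or (iv) $C \sqsubseteq C^{\top}$. [...] is
syntactically local w.r.t. $S \cup \mathrm{Sig}(\mathcal{O}')$ [...]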
[...] For each node $n$ in an HST, let $H(n)$ be the set of edge labels on the
path from the root of the HST to $n$. Then the label for $n$ is any set $s \in S$ such
that $s \cap H(n) = \emptyset$, if such a set exists. Suppose $s$ is the label of a node $n$, then
for each $\sigma \in s$, $n$ has a successor $n_\sigma$ [...]
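[...] $(C^{\bot})^{\mathcal{I}} = \emptyset$ for every concept $C^{\bot} \in \mathrm{Con}^{\bot}(S)$, and
$(C^{\top})^{\mathcal{I}} = \Delta^{\mathcal{I}}$ for every concept $C^{\top} \in \mathrm{Con}^{\top}(S)$.
The proof is an easy induction on the structure of the concepts $C^{\bot}$ and $C^{\top}$.
Intuitively, every concept in $\mathrm{Con}^{\bot}(S)$ ($\mathrm{Con}^{\top}(S)$ [...]) [...] $C^{\bot} \sqsubseteq C$ and
$C \sqsubseteq C^{\top}$ [...] $\mathcal{S}' = \mathcal{S} \cap \mathcal{O}^{loc}_A$.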
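Proof. We show the contraposition by assuming that $A \not\sqsubseteq_{\mathcal{S}'} B$ and then
demonstrating that $A \not\sqsubseteq_{\mathcal{S}} B$. Since $A \not\sqsubseteq_{\mathcal{S}'} B$, there must be a model $\mathcal{I}'$ of $\mathcal{S}'$ and
an individual $w \in \Delta^{\mathcal{I}'}$ such that $w \in A^{\mathcal{I}'} \setminus B^{\mathcal{I}'}$. We construct $\mathcal{I}$ from $\mathcal{I}'$
by setting $x^{\mathcal{I}} := \emptyset$ for all symbols (role or concept names)
$x \in \mathrm{Sig}(\mathcal{O}) \setminus \mathrm{Sig}(\mathcal{O}^{loc}_A)$. Obviously, $w \in A^{\mathcal{I}}$ since $\mathcal{I}$ does not change the
interpretation of $A \in \mathrm{Sig}(\mathcal{O}^{loc}_A)$. There are two possibilities for $B$: either $B^{\mathcal{I}} = B^{\mathcal{I}'}$ or
$B^{\mathcal{I}} = \emptyset$. In either case, we have that $w \notin B^{\mathcal{I}}$.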
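It remains to show that $\mathcal{I}$ is a model of $\mathcal{S}$, i.e., satisfies every axiom
$\alpha = (\alpha_L \sqsubseteq \alpha_R)$ in $\mathcal{S}$. We make a case distinction as follows:
- $\alpha \in \mathcal{O}^{loc}_A$. It follows that $\alpha \in \mathcal{S}'$, and thus $\mathcal{I}' \models \alpha$. By construction, both $\mathcal{I}$
  and $\mathcal{I}'$ [...]
- [...] $\not\subseteq S$ and thus $r^{\bot} \in \mathrm{Sig}(\mathcal{O}) \setminus \mathrm{Sig}(\mathcal{O}^{loc}_A)$. By construction of $\mathcal{I}$,
  $(r^{\bot})^{\mathcal{I}} = \emptyset$. Otherwise, $r^{\bot}$ is an inverse role $s^-$. Then, $s \in \mathrm{Sig}(r^{\bot})$ and
  $\mathrm{Sig}(r^{\bot}) \not\subseteq S$. It follows that $s \in \mathrm{Sig}(\mathcal{O}) \setminus \mathrm{Sig}(\mathcal{O}^{loc}_A)$, and thus
  $(r^{\bot})^{\mathcal{I}} = s^{\mathcal{I}} = \emptyset$. In both cases, $\mathcal{I} \models \alpha$ as required.
- $\alpha = \mathrm{Trans}(r^{\bot})$ [...]
- $\alpha = (C^{\bot} \sqsubseteq C)$. By Proposition 1, $(C^{\bot})^{\mathcal{I}} = \emptyset$. Hence, $\mathcal{I} \models \alpha$.
- $\alpha = (C \sqsubseteq C^{\top})$. By Proposition 1, $(C^{\top})^{\mathcal{I}} = \Delta^{\mathcal{I}}$. Hence, $\mathcal{I} \models \alpha$.
Since $\mathcal{I}$ is a model of $\mathcal{S}$ such that $w \in A^{\mathcal{I}} \setminus B^{\mathcal{I}}$, we have $A \not\sqsubseteq_{\mathcal{S}} B$, contradicting
the premise of the lemma.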
Now, we are ready to establish the required property of the modules:
Theorem 1 ($\mathcal{O}^{loc}_A$ is a strong subsumption module). Let $\mathcal{O}$ be a $\mathcal{SHOIQ}$
ontology and $A$ a concept name. Then $\mathcal{O}^{loc}_A$ is a strong subsumption module for
$A$ in $\mathcal{O}$.
Proof. The fact that $\mathcal{O}^{loc}_A$ is a subsumption module has been shown in [4]. It
remains to show that it is strong, i.e., every justification $J \subseteq \mathcal{O}$ for $A \sqsubseteq_{\mathcal{O}} B$ is
contained in $\mathcal{O}^{loc}_A$, for every concept name $B \in \mathrm{CN}(\mathcal{O})$.
Assume to the contrary that there is a concept name $B$ and a justification
$J$ for $A \sqsubseteq_{\mathcal{O}} B$ that is not contained in $\mathcal{O}^{loc}_A$. By Lemma 1, the strict subset
$J' = J \cap \mathcal{O}^{loc}_A$ of $J$ is such that $A \sqsubseteq_{J'} B$. Obviously, $J$ is not minimal and hence
cannot be a justification for $A \sqsubseteq_{\mathcal{O}} B$, contradicting the initial assumption.
Intuitively, the (minimal) locality-based module for $S = \{A\}$ in a $\mathcal{SHOIQ}$-ontology
$\mathcal{O}$ contains all the relevant axioms for any subsumption $\alpha = (A \sqsubseteq_{\mathcal{O}} B)$,
in the sense that all axioms responsible for $\alpha$ are included. In other words, in
order to find all justifications for a certain entailment in an OWL ontology,
it is sufficient to consider only axioms in the locality-based module. Since the
minimal locality-based modules are relatively small (see, e.g., [6,16]), our
modularization-based approach proves promising. The empirical results on
real-life ontologies are described in Section 5.
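To illustrate why restricting attention to the module loses no justification, the following Python sketch extracts a locality-based module by a fixpoint iteration. It is a simplified illustration under stated assumptions: the ontology contains only atomic inclusions $A \sqsubseteq B$ encoded as pairs, for which form (iii) of Definition 2 reduces to "the left-hand side is not in the signature", and the function names `is_local` and `extract_module` are hypothetical rather than taken from an existing implementation.

```python
def is_local(axiom, signature):
    """Syntactic locality restricted to atomic inclusions (assumption):
    A ⊑ B is local w.r.t. S iff A is not in S, since an atomic right-hand
    side can never belong to Con^⊤(S)."""
    lhs, _rhs = axiom
    return lhs not in signature

def extract_module(ontology, signature):
    """Add non-local axioms until every remaining axiom is local w.r.t.
    the signature extended by the symbols of the module."""
    module = set()
    sig = set(signature)
    changed = True
    while changed:
        changed = False
        for axiom in ontology:
            if axiom in module or is_local(axiom, sig):
                continue
            module.add(axiom)
            sig |= set(axiom)  # both concept names of the axiom join the signature
            changed = True
    return module

# Hypothetical toy ontology: two derivations of A ⊑ E plus an unrelated axiom.
TOY_ONTOLOGY = {("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"),
                ("D", "E"), ("X", "Y")}
MODULE_FOR_A = extract_module(TOY_ONTOLOGY, {"A"})
```

On this toy input the module for $\{A\}$ keeps the five axioms that take part in some derivation of $A \sqsubseteq E$ and drops the unrelated axiom $X \sqsubseteq Y$, so both justifications of $A \sqsubseteq E$ lie inside the module, as Theorem 1 predicts for the general case.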
4 Our Modularization-Based Algorithm
In this section, we propose a new algorithm for finding all justifications based
on the relevance-based algorithm and the module extraction algorithm.
Before we describe our algorithm, we need to recap the relevance-based algorithm
given in [9].
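Algorithm 1. REL_ALL_JUSTS($A \sqsubseteq B$, $\mathcal{O}$, $s$)
Data: An ontology $\mathcal{O}$, a subsumption $A \sqsubseteq B$ and a selection function $s$.
Result: All justifications $\mathcal{J}$
begin
    Globals: $\mathcal{J} \leftarrow \emptyset$;
    $\mathcal{O}' \leftarrow HS \leftarrow HS_{local} \leftarrow \emptyset$; $k \leftarrow 1$;
    $S \leftarrow s(\mathcal{O}, A \sqsubseteq B, k)$;
    while $S \neq \emptyset$ do
        $\mathcal{O}' \leftarrow \mathcal{O}' \cup S$;
        if $HS_{local} \neq \emptyset$ then
            for $P \in HS_{local}$ do    /* Get global hitting sets */
                if $\mathcal{O} \setminus P \not\models A \sqsubseteq B$ then
                    $HS \leftarrow HS \cup \{P\}$;
            $HS_{local} \leftarrow HS_{local} \setminus HS$;
            if ($HS_{local} = \emptyset$) then
                return $\mathcal{J}$    /* Early termination */;
            $HS_{temp} \leftarrow HS_{local}$;
            for $P \in HS_{temp}$ do    /* Expand hitting set tree */
                $(\mathcal{J}', HS'_{local}) \leftarrow$ EXPAND_HST($A \sqsubseteq B$, $\mathcal{O}' \setminus P$);
                $\mathcal{J} \leftarrow \mathcal{J} \cup \mathcal{J}'$;
                $HS_{local} \leftarrow HS_{local} \cup \{P \cup P' \mid P' \in HS'_{local}\} \setminus \{P\}$;
        else if $\mathcal{O}' \models A \sqsubseteq B$ then
            $(\mathcal{J}, HS_{local}) \leftarrow$ EXPAND_HST($A \sqsubseteq B$, $\mathcal{O}'$);
        $k \leftarrow k + 1$;
        $S \leftarrow s(\mathcal{O}, A \sqsubseteq B, k)$;
    return $\mathcal{J}$
end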
The relevance-based algorithm (Algorithm 1) receives an ontology $\mathcal{O}$, a
subsumption $A \sqsubseteq B$ of $\mathcal{O}$ and a selection function $s$, and outputs the set of all
justifications $\mathcal{J}$. We sketch the basic idea of the algorithm and refer to [9] for
details. First of all, we find the first $k$ such that $A \sqsubseteq B$ is
inferred by the $k$-relevant subset $\mathcal{O}'$ [...]
and a set of local hitting sets, where a local hitting set is a hitting set for all
justifications in the selected sub-ontology, i.e., $\mathcal{O}'$ [...]
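$\alpha_{1i}$: $A_{1i} \sqsubseteq P_{1i} \sqcap Q_{1i} \sqcap Z$,   $\alpha_{2i}$: $P_{1i} \sqsubseteq A_{2i} \sqcap Z$,   $\alpha_{3i}$: $Q_{1i} \sqsubseteq A_{2i} \sqcap Z$
$\alpha_{4i}$: $A_{2i} \sqsubseteq P_{2i} \sqcap Q_{2i} \sqcap Z$,   $\alpha_{5i}$: $P_{2i} \sqsubseteq A_{3i} \sqcap Z$,   $\alpha_{6i}$: $Q_{2i} \sqsubseteq A_{3i} \sqcap Z$,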
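Algorithm 3. MODULE_ALL_JUSTS($A \sqsubseteq B$, $\mathcal{O}$)
Data: An ontology $\mathcal{O}$ and a subsumption $A \sqsubseteq B$
Result: All justifications $\mathcal{J}$
begin
    $\mathcal{O}^{loc}_A \leftarrow$ EXTRACT_MODULE($\mathcal{O}$, $A$)
    return REL_ALL_JUSTS($A \sqsubseteq B$, $\mathcal{O}^{loc}_A$, $s_{rel}$)
end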
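[Figure 2: hitting set tree whose nodes are labeled with the justifications
$\{\alpha_{11}, \alpha_{21}, \alpha_{41}, \alpha_{51}\}$, $\{\alpha_{11}, \alpha_{31}, \alpha_{41}, \alpha_{51}\}$, $\{\alpha_{11}, \alpha_{31}, \alpha_{41}, \alpha_{61}\}$ and
$\{\alpha_{11}, \alpha_{21}, \alpha_{41}, \alpha_{61}\}$, and whose edges are labeled with the individual axioms of these sets.]
Fig. 2. Finding all justifications by HST algorithm on the locality-based module. Each
rectangle represents a justification, and the bold rectangle indicates a justification
reuse. [...] means early path termination, while [...]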
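[...] $= \mathcal{O}^{loc}_A$ [...] $\alpha_{11}$ [...] $\alpha_{21}$ still entails $\alpha$, and thus another justification can be
computed by calling SINGLE_JUST($\alpha$, $\mathcal{O}$ [...] $\alpha_{11}, \alpha_{21}, \alpha_{41}, \alpha_{61}$ [...].
Observe that the node following the branch $\alpha_{51}$ is a result
of the optimization justification reuse.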
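The hitting set tree search with early path termination and justification reuse can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the implementation evaluated in this paper: the black-box reasoner is replaced by a toy reachability check over atomic inclusions, `single_just` is a naive contraction procedure, and every name in the block is hypothetical.

```python
def entails(axioms, sub, sup):
    """Toy stand-in for the black-box reasoner (assumption): axioms are
    pairs (A, B) read as A ⊑ B; entailment is reachability."""
    reached, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in axioms:
            if lhs == current and rhs not in reached:
                reached.add(rhs)
                frontier.append(rhs)
    return sup in reached

def single_just(axioms, sub, sup):
    """Shrink an entailing axiom set to one justification by dropping every
    axiom whose removal preserves the entailment."""
    just = set(axioms)
    for axiom in sorted(axioms):
        if entails(just - {axiom}, sub, sup):
            just.discard(axiom)
    return frozenset(just)

def all_justs_hst(ontology, sub, sup):
    """Explore the hitting set tree; each stack entry is H(n), the set of
    axioms removed on the path from the root to node n."""
    ontology = frozenset(ontology)
    justs, closed, explored = [], [], set()
    stack = [frozenset()]
    while stack:
        removed = stack.pop()
        # Early path termination: path already explored, or it extends a
        # removal set that is known to destroy the entailment.
        if removed in explored or any(h <= removed for h in closed):
            continue
        explored.add(removed)
        # Justification reuse: a known justification untouched by the
        # removed axioms labels this node without calling the reasoner.
        label = next((j for j in justs if not (j & removed)), None)
        if label is None:
            rest = ontology - removed
            if not entails(rest, sub, sup):
                closed.append(removed)  # removed is a hitting set
                continue
            label = single_just(rest, sub, sup)
            justs.append(label)
        for axiom in label:
            stack.append(removed | {axiom})
    return justs

TOY_ONTOLOGY = {("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"),
                ("D", "E"), ("X", "Y")}
HST_JUSTS = all_justs_hst(TOY_ONTOLOGY, "A", "E")
```

Passing the extracted module instead of the whole ontology to such a search shrinks every reasoner call, which is the effect the modularization-based algorithm exploits.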
Table 2. Benchmark ontologies and their characteristics
Ontology   Axioms   Concepts   Roles   Module size (average)   Module size (maximum)   Extraction time (sec)
Galen       4 529      2 748     413                      75                     530                       6
Go         28 897     20 465       1                      16                     125                      40
Nci        46 940     27 652      70                      29                     436                      65
5 Empirical Results
Our algorithm has been realized by using KAON2⁴ as the black-box reasoner. Of
course, the method (like other black-box approaches) can be applied to any other
reasoner, e.g., RacerPro⁵ and FaCT++⁶. To fairly compare with the pinpointing
algorithm in [10], we re-implemented it with the KAON2 API (henceforth referred
to as the ALL_JUSTS algorithm). The experiments have been performed on a Linux
server with an Intel(R) Xeon(TM) 3.2GHz CPU running Sun's Java 1.5.0 with
2GB of allotted heap space.
Benchmark ontologies used in our experiments are the Galen Medical Knowledge
Base⁷, the Gene Ontology (Go)⁸ and the US National Cancer Institute
thesaurus (Nci)⁹. The three biomedical ontologies are well-known to both the
life science and Semantic Web communities since they are employed in real-world
applications and often used as benchmarks for testing DL reasoners. Both Go
and Nci are formulated in the lightweight DL $\mathcal{EL}$, while Galen uses the
expressivity of the more complex DL $\mathcal{SHF}$. Some information concerning the size and
characteristics of the benchmark ontologies is given in the left part of Table 2.
Modularization reveals structures and dependencies of concepts in the ontologies
as argued in [4,16]. We extract the (minimal) locality-based module for $S = \{A\}$
in $\mathcal{O}$, for every benchmark ontology $\mathcal{O}$ and each concept name $A \in \mathrm{CN}(\mathcal{O})$. The
size of the modules and the time required to extract them are shown in the last
three columns of Table 2. Observe that the modules in Galen are larger than
those in the other two ontologies although the ontology itself is smaller. This
suggests that Galen is more complex in the sense that more axioms in it are
non-local (thus relevant) according to Definition 2.
In the experiments, we consider three concept names in $\mathrm{CN}(\mathcal{O})$ for each benchmark
ontology $\mathcal{O}$ such that one of them has the largest locality-based module¹⁰. For the
sake of brevity, we denote by $\mathrm{subs}(\mathcal{O})$ the set of all tested subsumptions $A \sqsubseteq B$
in $\mathcal{O}$, with $A$ one of the three concept names mentioned above and $B$ an inferred
subsumer of $A$. For each $\mathcal{O}$ of our benchmark ontologies, we compute all
justifications for $\alpha$ in $\mathcal{O}$, where $\alpha \in \mathrm{subs}(\mathcal{O})$. In order to compare with the other existing
approaches, we perform the following for each $\alpha$ and $\mathcal{O}$ to compute all justifications:
1. ALL_JUSTS($\alpha$, $\mathcal{O}$) (i.e., the algorithm in [10]);
2. REL_ALL_JUSTS($\alpha$, $\mathcal{O}$, $s_{rel}$);
3. MODULE_ALL_JUSTS($\alpha$, $\mathcal{O}$).
⁴ http://kaon2.semanticweb.org/
⁵ http://www.racer-systems.com/
⁶ http://owl.man.ac.uk/factplusplus/
⁷ http://www.openclinical.org/prj_galen.html
⁸ http://www.geneontology.org
⁹ http://www.mindswap.org/2003/CancerOntology/nciOntology.owl
¹⁰ The concept name with the largest module is hand-picked in order to cover hard cases in
our experiments, while the other two are randomly selected.
The justification results by MODULE_ALL_JUSTS are shown in Table 3, where
the marked ontologies, Go and Nci, are those for which some run does not terminate
within the two-hour time-out. Precisely, there are three subsumptions in Go and one in Nci
for which the computation took more than two hours. The statistics given on
the right-hand side of the table do not take these subsumptions into account.
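Table 3. Justification results using the modularization-based approach
Ontology   Subsumptions $|\mathrm{subs}(\mathcal{O})|$   Justifications (average)   Justifications (maximum)   Justification size (average)   Justification size (maximum)
Galen      69                                       1.5                        4                          9.7                            24
Go*        53                                       3.2                        11                         5.3                            9
Nci*       23                                       1.6                        8                          5.4                            9
* Some run did not terminate within the two-hour time-out.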
To visualize the time performance of the three algorithms, we randomly
selected two subsumptions $\alpha_1$ and $\alpha_2$ from $\mathrm{subs}(\mathcal{O})$ for each ontology $\mathcal{O}$ and
compared the computation time required by the three algorithms. These
subsumptions are as follows:
Galen:$\alpha_1$   AcuteErosionOfStomach $\sqsubseteq$ GastricPathology
Galen:$\alpha_2$   AppendicularArtery $\sqsubseteq$ PhysicalStructure
Go:$\alpha_1$      GO_0000024 $\sqsubseteq$ GO_0007582
Go:$\alpha_2$      GO_0000027 $\sqsubseteq$ GO_0044238
Nci:$\alpha_1$     CD97_Antigen $\sqsubseteq$ Protein
Nci:$\alpha_2$     APC_8024 $\sqsubseteq$ Drugs_and_Chemicals
The chart in Figure 3 depicts the overall computation time required for each
algorithm to find all justifications for each tested subsumption. Unlike the time
results reported in [10], which excluded the time for satisfiability checking, we
report here the overall computation time, i.e., the total time of the algorithm
including the time needed by the black-box reasoner for the standard reasoning tasks.
Observe that both ALL_JUSTS and REL_ALL_JUSTS did not yield results within
the time-out of two hours on three out of six tested subsumptions (marked by
TO on the chart). Comparing these two algorithms (without modularization),
REL_ALL_JUSTS performs noticeably better than ALL_JUSTS in most cases. For
instance, on the subsumptions Galen:$\alpha_2$ and Nci:$\alpha_2$, REL_ALL_JUSTS
outperforms ALL_JUSTS by about 10 and 20 minutes, respectively. On the subsumption
Go:$\alpha_2$, both algorithms show a similar performance, i.e., the time difference is less
than a minute. More explanations on the comparison between these two
algorithms can be found in [9].
[Figure 3: bar chart of the computation time (sec, logarithmic scale from 0.01 to 10000) of
ALL_JUSTS, REL_ALL_JUSTS and MODULE_ALL_JUSTS on the subsumptions Galen:X1, Galen:X2,
GO:X1, GO:X2, NCI:X1 and NCI:X2, whose module sizes are 293, 133, 25, 26, 436 and 9,
respectively. The chart also lists the number of justifications and the average justification
size per subsumption; TO marks a time-out.]
Fig. 3. The time performance of three algorithms for finding all justifications
Interestingly, MODULE_ALL_JUSTS outperforms all the other algorithms on
all subsumptions, and the improvement is substantial in every case shown in
the chart. This empirically confirms our initial conjecture that, given
the strongness property (in the sense of Definition 3) and the small size (see
Table 2 and [6,16]) of locality-based modules, our optimization should be highly
effective. As an example, MODULE_ALL_JUSTS took only 0.6 seconds to find
all the justifications for Nci:$\alpha_2$, while REL_ALL_JUSTS needed 3 242 seconds. In
this case, the locality-based module for APC_8024 in Nci consists of 9 axioms,
whereas the whole ontology has 46 940 axioms (see Table 2). Although
the selection function used in REL_ALL_JUSTS also prunes the search space by
considering only k-directly relevant axioms (see Definition 7) when the HST
algorithm is executed, several irrelevant axioms (in the sense of syntactic locality)
are still considered.
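The size of this gap can be checked with a short calculation. The Python sketch below only restates figures reported in this section (0.6 and 3 242 seconds for Nci:$\alpha_2$, a module of 9 axioms, and 46 940 axioms in Nci according to Table 2); the function name `speedup_summary` is a hypothetical helper introduced for illustration.

```python
import math

def speedup_summary(slow_seconds, fast_seconds, module_axioms, ontology_axioms):
    """Relate two running times and the module size to the ontology size."""
    speedup = slow_seconds / fast_seconds
    return {
        "speedup": speedup,                             # ratio of running times
        "orders_of_magnitude": math.log10(speedup),     # decimal orders gained
        "module_fraction": module_axioms / ontology_axioms,
    }

# REL_ALL_JUSTS vs. MODULE_ALL_JUSTS on Nci:alpha_2, values as reported above.
NCI_ALPHA2 = speedup_summary(3242.0, 0.6, 9, 46940)
```

For this subsumption the ratio is roughly 5 400, i.e., between three and four decimal orders of magnitude, while the module contains about 0.02% of the axioms of Nci.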
6 Conclusion
In this paper, we proposed a novel approach for finding all justifications for an
entailment in OWL DL. The approach is based on the computation of minimal
locality-based modules. We first showed that locality-based modules always cover
all axioms in all justifications and exploited this property to limit the search
space when finding all justifications. Then, we presented a modularization-based
pinpointing algorithm that is based on relevance-based techniques and a hitting
set tree algorithm. Finally, we reported on several promising empirical results
that demonstrate an improvement of several orders of magnitude in efficiency and
scalability of finding all justifications in OWL DL ontologies. Our work is based
on locality-based modules. As future work, we shall investigate different kinds
of modules and selection functions that hopefully produce even more relevant
axioms for pinpointing.
Acknowledgements. This work was partially supported by the DFG project
under grant BA1122/11-1 and the EU under the IST project NeOn (IST-2006-
027595) http://www.neon-project.org.
References
1. Baader, F., Peñaloza, R.: Axiom pinpointing in general tableaux. In: Olivetti, N.
(ed.) TABLEAUX 2007. LNCS, vol. 4548, pp. 11–27. Springer, Heidelberg (2007)
2. Baader, F., Peñaloza, R., Suntisrivaraporn, B.: Pinpointing in the description logic
$\mathcal{EL}^+$. In: Hertzberg, J., Beetz, M., Englert, R. (eds.) KI 2007. LNCS, vol. 4667,
pp. 52–67. Springer, Heidelberg (2007)
3. Baader, F., Suntisrivaraporn, B.: Debugging SNOMED CT using axiom pinpointing
in the description logic $\mathcal{EL}^+$. In: Proceedings of KR-MED 2008: Representing
and Sharing Knowledge Using SNOMED (2008)
4. Grau, B.C., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: Theory
and practice. J. of Artificial Intelligence Research (JAIR) 31, 273–318 (2008)
5. Cuenca Grau, B., Halaschek-Wiener, C., Kazakov, Y.: History matters: Incremental
ontology reasoning using modules. In: Aberer, K., Choi, K.-S., Noy, N., Allemang,
D., Lee, K.-I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi,
R., Schreiber, G., Cudré-Mauroux, P. (eds.) ASWC 2007 and ISWC 2007. LNCS,
vol. 4825, pp. 183–196. Springer, Heidelberg (2007)
6. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Just the right amount:
Extracting modules from ontologies. In: Proc. of WWW 2007, Banff, Canada, pp.
717–726. ACM, New York (2007)
7. Horrocks, I., Sattler, U.: A tableaux decision procedure for $\mathcal{SHOIQ}$. In: Proc. of
IJCAI 2005, pp. 448–453 (2005)
8. Huang, Z., van Harmelen, F., ten Teije, A.: Reasoning with inconsistent ontologies.
In: Proc. of IJCAI 2005, pp. 254–259 (2005)
9. Ji, Q., Qi, G., Haase, P.: A relevance-based algorithm for finding justifications of
DL entailments. Technical report, University of Karlsruhe (2008),
http://www.aifb.uni-karlsruhe.de/WBS/gqi/papers/RelAlg.pdf
10. Kalyanpur, A., Parsia, B., Horridge, M., Sirin, E.: Finding all justifications of
OWL DL entailments. In: Aberer, K., Choi, K.-S., Noy, N., Allemang, D., Lee,
K.-I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber,
G., Cudré-Mauroux, P. (eds.) ASWC 2007 and ISWC 2007. LNCS, vol. 4825, pp.
267–280. Springer, Heidelberg (2007)
11. Kalyanpur, A., Parsia, B., Sirin, E., Hendler, J.: Debugging unsatisfiable classes in
OWL ontologies. Journal of Web Semantics 3(4), 268–293 (2005)
12. Meyer, T., Lee, K., Booth, R.: Knowledge integration for description logics. In:
Proc. of AAAI 2005, pp. 645–650. AAAI Press, Menlo Park (2005)
13. Reiter, R.: A theory of diagnosis from first principles. Artificial Intelligence 32(1),
57–95 (1987)
14. Schlobach, S., Cornet, R.: Non-standard reasoning services for the debugging of
description logic terminologies. In: Proc. of IJCAI 2003, pp. 355–362 (2003)
15. Schlobach, S., Huang, Z., Cornet, R., van Harmelen, F.: Debugging incoherent
terminologies. J. Autom. Reasoning 39(3), 317–349 (2007)
16. Suntisrivaraporn, B.: Module extraction and incremental classification: A pragmatic
approach for $\mathcal{EL}^+$ ontologies. In: Bechhofer, S., Hauswirth, M., Hoffmann,
J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 230–244. Springer,
Heidelberg (2008)