Organization and Structure of Information Using Semantic Web Technologies
Introduction
Today's web has millions of pages that are dynamically generated from content
stored in databases. This not only makes managing a large site easier, but is
necessary for fully functioning ecommerce and other large, interactive websites.
These local databases, in one sense, are not full participants in the web. Though
they present normal looking HTML pages, the databases themselves are not
interconnected in any way. Organization X has basically no way of using or
understanding Organization Y's data. If these two want to share or merge
information, the database integration would be a fairly significant undertaking. It
would also be a one time solution. If Organization Z entered the picture, a new
merging effort would have to be undertaken.
As the web stands, this has not been a significant problem. By design, the web has
been a vehicle for conveying information in a human readable form – computers
had no need to understand the content. As dynamic sources of information have
become omnipresent on the web, the World Wide Web Consortium has
undertaken efforts to make information machine readable. This technology,
collectively called the Semantic Web, allows computers to understand and
communicate with one another. For site designers, this means data from other
sites can be accessed and presented on your own website, and your own public
data can be made easily accessible to anyone. It follows that just as web pages are
currently hyperlinked, data can also be linked to form a second web behind the
scenes, allowing full across-the-web integration of data.
The interconnected nature of the web does not extend to backend
databases or media either. It is usually not possible for a web designer to use
information from an external database to drive their own site. The databases are
not publicly accessible for queries, nor is the underlying organization of the
database apparent.
The Semantic Web is a vision for the future of the World Wide Web that will give
meaning to all of this data, as well as making it publicly accessible to anyone who
is interested. While some web sites and designers will want to keep their backend
data proprietary, many will find it in their interest, for design and public interest,
to use semantic encodings.
This chapter will introduce the semantic web, explain how to organize content for
use on the semantic web, and show several examples of how it can be used.
Throughout the discussion, we will describe how the technologies affect the
human factors in web design and use.
Motivations
The “formal” models of the domain enabled by the ontologies provide a number
of new capabilities, but also require extra work with respect to entering the
metadata appropriately, developing the vocabularies, etc. To justify the significant
added effort required for good encoding of the semantics behind a given
application, users should understand some of the benefits that will be available,
and doors that are opened. There are many places where the semantic web can
improve the way things are done on the web now, and add new capabilities
beyond what is available now on the web. The following sections enumerate some
of the visions for the Semantic Web as put forth by the World Wide Web
Consortium’s Web Ontology Working Group in a document outlining use cases
and requirements for ontologies on the Web (Heflin, 2003).
Web portals
A Web portal is a web site that provides information content on a topic. While the
term has become common for full-web search engines such as Google, portals in
the traditional sense are also domain specific pages that do not necessarily have a
search feature. The goal is to provide users with a centralized place to find links,
newsgroups, and resources on a topic.
For portals to work well, they need to be good sources of information to
encourage the community to participate in maintaining and updating their content.
To create a semantic web portal, where information is well annotated and
maintained in a semantic web format, the same is true. Users need some
motivation to do the markup that makes the site work.
The vision for semantic web portals is to not only make them available as online
web pages, but also to integrate them into tools. On web pages, users can find
resources based on their semantic markup. To encourage users to create their own
metadata, tool integration of portal features is key. For example, if a scientist
authoring a paper or web page uses a particular term from an online ontology, the
semantic web portal feature should return other sources with similar markup.
Results would certainly include related web pages. They would also provide
links to images, video, audio files, or datasets whose content is described by the
same term. By providing this sort of useful information and resources, which
could not be found with a standard text-based keyword search, the portal
encourages users to mark up their documents so that they may take advantage of
it.
What allows this system to work more fully is the integration of the markup
process with the portal. The portal provides the most advantage to users while
they are creating their own semantic web documents. Thus, after providing
information to the user, the portal itself is extended when the new markup is
published. This interactive cycle means that semantic web portals will reach out
to incorporate external resources, as well as creating a dense web of semantically
interlinked documents.
Multimedia collections
An ontology-based web site allows users to search and navigate using specific,
ontologically defined terms. This will make documents easier to find, and cross-
references easier to track down. Later on, this chapter will discuss one website
using semantic markup as its foundation.
Design documentation
Documentation of systems is often very complex. Large sets of documents with
overlapping scopes have several presentation challenges. Since documents are
generally grouped thematically, it is not unusual for several sets of documentation
to address different aspects of the same sub problem. For a client who is trying to
find data on the sub problem alone, there is sometimes no choice but to navigate
through several sets of complex documents. Even when the desired information is
contained in one set, the level of detail can often be overwhelming.
Troubleshooting problems on a website, for example, usually demands a less
detailed analysis from the user when compared to the system administrator.
Web services are sets of functions that can be executed over the web. When
services are semantically marked up, they become available for agents to find,
compose, and execute in conjunction with data also found on the semantic web.
Already, there are hundreds of web services, and a fast growing number of agents
and tools (Sirin et al., 2002) that can work with them.
Ubiquitous computing
Ubiquitous computing describes a movement from hard-wired personal
computing devices, to embedding devices in the environment and making them
available to any other wireless device. For these systems to work effectively, each
device needs to make itself known to the environment and advertise what types of
inputs it requires and what it is able to output. When agents are introduced to the
system, needing to configure a collection of services and devices to accomplish a
goal, it is important to have the ability to reason over the descriptions of the
devices and their capabilities.
On the semantic web, all of this information and more would be available for
computers to understand. A number of research efforts have explored the
representation of ontological information on the Web (see references 15,20,17,
16). A language called DAML+OIL was released in March 2001 as the result of a
joint committee of US and European researchers working together to develop a de
facto standard. In November of 2001, the W3C created the Web Ontology
Working Group to develop a recommendation based on DAML+OIL. The
resulting language, OWL, is emerging as the standard language to use for these
applications, and a set of tools for OWL is being produced as part of the W3C
process and under both US and EU funding. OWL is based on the Resource
Description Framework (RDF) and its extension, RDF Schema.
Using OWL, users can encode the knowledge from the webpage, and point to
knowledge stored on other sites. To understand how this is done, it is necessary to
have a general understanding of how Semantic Web markup works.
With OWL, users define classes, much like classes in a programming language.
These can be sub-classed and instantiated. Properties allow users to define
attributes of classes. In the example above, a "Photo" class would be useful.
Properties of the Photo class may include the URL of the photo on the web, the
date it was taken, the location, references to the people and objects in the picture,
as well as what event is taking place. To describe a particular photo, users would
create instances of the Photo class, and then fill in values for the Photo's
properties. In a simple table format, the data may look like this:
Photo
Name: ParisPhoto1
URL: http://www.example.com/photo1.jpg
Date Taken: June 26, 2001
Location: Parc Du Champ De Mars, Paris, France
Person in Photo: John Doe
Person in Photo: Joe Blog
Object in Photo: Eiffel Tower
Since each resource has a unique name, it allows authors to make reference to
definitions elsewhere. In our ontology above, the author can make definitions of
the two travelers, John Doe and Joe Blog:
Person
Name: JohnDoe
First Name: John
Last Name: Doe
Age: 23
Person
Name: JoeBlog
First Name: Joe
Last Name: Blog
Age: 24
Then, in the properties of the photo, these definitions can be referenced. Instead of
having just the string "John Doe", the computer will know that the people in the
photo are the same ones defined in the ontology, with all of their properties.
Object in Photo:
http://www.example.com/parisHistoryOntology.owl#EiffelTower
The benefit of this linking is similar to why links are used in HTML documents. If
a web page mentions a book, a link to its listing on Amazon offers many benefits.
Thorough information about the book may not belong on a page that just
mentions it in passing, or an author may not want to retype all of the text that is
nicely presented elsewhere. A link passes users off to another site, and in the
process provides them with more data. References on the Semantic Web are even
better at this. Though the travelers in this example may not know much about the
Eiffel Tower, the authors of the Paris History Ontology may have included all
sorts of interesting data about the history, location, construction, and architecture
of the Eiffel Tower in their definition. By making reference to that definition in
the description of the trip, the computer understands that the Eiffel Tower in the
photo has all of the properties described in the History Ontology. This means,
among other things, that agents and reasoners can connect the properties defined
in the external file to our file.
Subject: http://www.example.com/parisTrip.owl#JohnDoe
Predicate: http://www.example.com/parisTrip.owl#age
Value: 23
Subject: http://www.example.com/parisTrip.owl#JohnDoe
Predicate: http://www.example.com/parisTrip.owl#firstName
Value: John
In the two examples above, the predicates relate the subject to a string value. It is
also possible to relate two resources through a predicate. For example
Subject: http://www.example.com/parisTrip.owl#ParisPhoto1
Predicate: http://www.example.com/parisTrip.owl#objectInPhoto
Object: http://www.example.com/parisHistoryOntology.owl#EiffelTower
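The subject, predicate, and object structure above can be held in a program as simple tuples. The following is a minimal sketch, assuming Python; the class names Resource, Literal, and Triple are illustrative choices and not part of any RDF library, and datatypes are ignored.

```python
from typing import NamedTuple, Union

TRIP = "http://www.example.com/parisTrip.owl#"
HISTORY = "http://www.example.com/parisHistoryOntology.owl#"


class Resource(NamedTuple):
    """A node identified by a URI (illustrative helper type)."""
    uri: str


class Literal(NamedTuple):
    """A plain string value; datatypes are ignored in this sketch."""
    value: str


class Triple(NamedTuple):
    subject: Resource
    predicate: Resource
    object: Union[Resource, Literal]


triples = [
    Triple(Resource(TRIP + "JohnDoe"), Resource(TRIP + "age"), Literal("23")),
    Triple(Resource(TRIP + "JohnDoe"), Resource(TRIP + "firstName"), Literal("John")),
    Triple(Resource(TRIP + "ParisPhoto1"), Resource(TRIP + "objectInPhoto"),
           Resource(HISTORY + "EiffelTower")),
]

# Statements whose object is another resource link two nodes together;
# statements whose object is a literal attach a value to a single node.
links = [t for t in triples if isinstance(t.object, Resource)]
values = [t for t in triples if isinstance(t.object, Literal)]

for t in links:
    print(t.subject.uri, "->", t.object.uri)
```

Keeping resources and literals as distinct types makes it easy to tell which statements connect to definitions in other documents.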
1 Actually, we slightly simplify the treatment of datatypes, as the details are not
relevant to this chapter. Readers interested in the full details of the RDF encoding
are directed to Resource Description Framework (RDF),
http://www.w3.org/RDF/.
Each of these triples forms a small graph with two nodes, representing the subject
and object, connected by an edge representing the predicate. The information for
John Doe is represented in a graph as shown below:
Taking all of the descriptors from the previous section and encoding them as triples
will produce a much more complex graph:
Figure 2: The graph of triples from the Paris example
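To illustrate how triples from separate documents join into one graph, here is a rough Python sketch. The locatedIn property and the Paris resource in the history ontology are hypothetical, invented only for the example; the helper names build_graph and reachable are likewise illustrative.

```python
from collections import defaultdict

TRIP = "http://www.example.com/parisTrip.owl#"
HISTORY = "http://www.example.com/parisHistoryOntology.owl#"

# Triples from the trip document, written as (subject, predicate, object).
trip_triples = [
    (TRIP + "ParisPhoto1", TRIP + "personInPhoto", TRIP + "JohnDoe"),
    (TRIP + "ParisPhoto1", TRIP + "objectInPhoto", HISTORY + "EiffelTower"),
    (TRIP + "JohnDoe", TRIP + "firstName", "John"),
]

# A hypothetical statement from the external ontology (property name assumed).
history_triples = [
    (HISTORY + "EiffelTower", HISTORY + "locatedIn", HISTORY + "Paris"),
]


def build_graph(*triple_sets):
    """Merge any number of triple lists into one adjacency map."""
    graph = defaultdict(list)
    for triples in triple_sets:
        for subject, predicate, obj in triples:
            graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Collect every node that can be reached by following edges from start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


graph = build_graph(trip_triples, history_triples)
nodes = reachable(graph, TRIP + "ParisPhoto1")
```

Because both documents name the Eiffel Tower with the same URI, merging them requires no extra work: the photo becomes connected to whatever the external ontology says about the tower.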
As documents are linked together, joining terms, these graphs grow to be large,
complex interconnected webs. To create them, we need languages that support the
creation of these relationships. Though there are many languages that can be used
on the semantic web, the most popular is RDF, with OWL emerging as a new,
more powerful extension.
Document Skeleton
There are several flavors of RDF, but the version this chapter will focus on is
RDF/XML, which is RDF based on XML syntax. The skeleton of an RDF
document is as follows:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
</rdf:RDF>
The tag structure is inherited from XML. The RDF tag begins and ends the
document, indicating that the rest of the document will be encoded as RDF. Inside
the rdf:RDF tag is an XML namespace declaration, represented as an xmlns
attribute of the rdf:RDF start-tag. Namespaces are convenient abbreviations for
full URIs. This declaration specifies that tags prefixed with rdf: are part of the
namespace described in the document at http://www.w3.org/1999/02/22-rdf-
syntax-ns#.
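The way a prefix stands in for a full URI can be seen by parsing the skeleton with an XML parser. This is a small sketch using Python's standard xml.etree.ElementTree module, which is a generic XML parser and not an RDF tool; it only shows the namespace expansion.

```python
import xml.etree.ElementTree as ET

skeleton = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
</rdf:RDF>"""

root = ET.fromstring(skeleton)

# ElementTree reports each tag as "{namespace-uri}local-name",
# so the rdf: prefix has already been replaced by the URI it abbreviates.
expanded_tag = root.tag

# Dropping the braces gives the full URI that "rdf:RDF" stands for.
full_uri = expanded_tag.replace("{", "").replace("}", "")
print(full_uri)
```

The prefix itself carries no meaning; any other prefix bound to the same URI would produce the same expanded name.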
Defining Vocabularies
RDF provides a way to create instances and associate descriptive properties with
each. RDF does not, however, provide syntax for defining Classes, Properties, and
describing how they relate to one another. To do that, authors use RDF Schema
(RDFS). RDF Schema uses RDF as a base to specify a set of pre-defined RDF
resources and properties that allow users to define Classes and restrict Properties.
The RDFS vocabulary is defined in a namespace identified by the URI reference
"http://www.w3.org/2000/01/rdf-schema#", and commonly uses the prefix "rdfs:".
This namespace is added to the rdf:RDF tag.
Describing Classes
Classes are the main way to describe types of things we are interested in. Classes
are general categories that can later be instantiated. In the previous example, we
want to create a class that can be used to describe photographs. The syntax to
create a class is written:
<rdfs:Class rdf:ID="Photo"/>
The beginning of the tag "rdfs:Class" says that we are creating a Class of things.
The second part, "rdf:ID" is used to assign a unique name to the resource in the
document. Names always need to be enclosed in quotes, and class names are
usually written with the first letter capitalized, though this is not required. Like all
XML tags, the rdfs:Class tag must be closed, and this is accomplished with the "/"
at the end.
Classes can also be subclassed. For example, if an ontology exists that defines a
class called "Image", we could indicate that our Photo class is a subclass of that.
<rdfs:Class rdf:ID="Photo">
<rdfs:subClassOf rdf:resource=
"http://example.com/mediaOntology.rdf#Image"/>
</rdfs:Class>
The rdfs:subClassOf tag indicates that the class we are defining will be a subclass
of the resource indicated by the rdf:resource attribute. The value for rdf:resource
should be the URI of another class. Subclasses are transitive. Thus, if X is a
subclass of Y, and Y is a subclass of Z, then X is also a subclass of Z. Classes
may also be subclasses of multiple classes. This is accomplished by simply
adding more rdfs:subClassOf statements.
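The transitivity of subclassing, together with multiple parents, can be sketched as a small graph walk in Python. The Souvenir and Document classes below are hypothetical and exist only to give the example a second parent and a second level.

```python
MEDIA = "http://example.com/mediaOntology.rdf#"

# Direct rdfs:subClassOf statements, keyed by the subclass.
sub_class_of = {
    "#Photo": {MEDIA + "Image", "#Souvenir"},   # "#Souvenir" is hypothetical
    MEDIA + "Image": {MEDIA + "Document"},      # "Document" is hypothetical
}


def superclasses(cls, sub_class_of):
    """Return every direct and inferred superclass of cls."""
    found = set()
    pending = list(sub_class_of.get(cls, ()))
    while pending:
        parent = pending.pop()
        if parent not in found:
            found.add(parent)
            # Follow the parent's own subClassOf statements (transitivity).
            pending.extend(sub_class_of.get(parent, ()))
    return found


photo_supers = superclasses("#Photo", sub_class_of)
print(sorted(photo_supers))
```

Photo is never stated to be a subclass of Document, yet the walk finds it through Image, which is the inference a reasoner would draw.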
Describing Properties
Properties are used to describe attributes. By default, Properties are not attached
to any particular Class; that is, if a Property is declared, it can be used with
instances of any class. Using elements of RDFS, Properties can be restricted in
several ways.
<rdf:Property rdf:ID="objectInPhoto"/>
This creates a Property called "objectInPhoto" which can be attached to any class.
To limit the domain of the property, so it can only be used to describe instances of
the Photo class, we can add a domain restriction:
<rdf:Property rdf:ID="objectInPhoto">
<rdfs:domain rdf:resource="#Photo"/>
</rdf:Property>
Here, we use the rdfs:domain tag that limits which class the Property can be used
to describe. Here, the rdf:resource is used the same way as in the subclass
restriction above, but we have used a local resource. Since the Photo class is
declared in the same namespace (the same file, in this case) as the
"objectInPhoto" property, we can abbreviate the resource reference to just the
name.
Properties can also be declared as sub-properties of other Properties using
rdfs:subPropertyOf, and sub-properties inherit any restrictions of their parent
Properties. If personInPhoto is declared as a sub-property of objectInPhoto, then
since objectInPhoto has a domain restriction to Photo, personInPhoto has the same
restriction. We can also add restrictions. In addition to the domain restriction,
which limits which classes the property can be used to describe, we can add range
restrictions, which limit what types of values the property can accept. For the
personInPhoto Property, we should restrict the value to be an instance of the
Person class. Ranges are restricted in the same way as domains:
<rdf:Property rdf:ID="personInPhoto">
<rdfs:range rdf:resource="#Person"/>
</rdf:Property>
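As a rough illustration of domain and range restrictions, including inheritance from a parent property, consider the Python sketch below. It assumes personInPhoto is a sub-property of objectInPhoto, and it treats the restrictions as checks, following this chapter's description; RDFS reasoners typically use domain and range to infer types rather than to reject data. The function names and the Monument class are illustrative.

```python
# Restrictions stated directly on each property.
DOMAIN = {"objectInPhoto": "Photo"}
RANGE = {"personInPhoto": "Person"}

# Assumed sub-property relationship.
SUB_PROPERTY_OF = {"personInPhoto": "objectInPhoto"}


def inherited(prop, table):
    """Look up a restriction on prop, falling back to its parent properties."""
    while prop is not None:
        if prop in table:
            return table[prop]
        prop = SUB_PROPERTY_OF.get(prop)
    return None


def violations(subject_type, prop, object_type):
    """List the ways a statement conflicts with the declared restrictions."""
    problems = []
    domain = inherited(prop, DOMAIN)
    if domain is not None and subject_type != domain:
        problems.append(f"{prop} expects a {domain} subject, got {subject_type}")
    value_type = inherited(prop, RANGE)
    if value_type is not None and object_type != value_type:
        problems.append(f"{prop} expects a {value_type} value, got {object_type}")
    return problems


print(violations("Photo", "personInPhoto", "Person"))    # no problems
print(violations("Person", "personInPhoto", "Person"))   # wrong subject type
```

The second call fails on the domain even though personInPhoto declares none itself, because the restriction comes from objectInPhoto.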
Creating Instances
Once this structure is set up, our instances can be defined. Consider the earlier
description of a person that we wrote as plain text:
Person
Name: JoeBlog
First Name: Joe
Last Name: Blog
Age: 24
Here, JoeBlog is the subject, and is an instance of the class Person. There are also
Properties for age, first name, and last name. Assuming we have defined the
Person class and its corresponding properties, we can create the Joe Blog
instance:
<Person rdf:ID="JoeBlog">
<firstName>Joe</firstName>
<lastName>Blog</lastName>
<age>24</age>
</Person>
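To see how this markup corresponds to triples, the instance can be read back with a generic XML parser. The sketch below uses Python's standard library rather than an RDF toolkit, handles only this simple nesting pattern, and assumes the document lives at http://www.example.com/parisTrip.owl with that URI as the default namespace.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://www.example.com/parisTrip.owl"  # assumed document URI

document = (
    '<rdf:RDF xmlns:rdf="' + RDF_NS + '" xmlns="' + BASE + '#">\n'
    '  <Person rdf:ID="JoeBlog">\n'
    '    <firstName>Joe</firstName>\n'
    '    <lastName>Blog</lastName>\n'
    '    <age>24</age>\n'
    '  </Person>\n'
    '</rdf:RDF>'
)

ID_ATTRIBUTE = "{" + RDF_NS + "}ID"


def uri(tag):
    """Turn ElementTree's '{namespace}local' tag into a plain URI."""
    return tag[1:].replace("}", "", 1)


triples = []
for node in ET.fromstring(document):
    # rdf:ID names the resource relative to the document's own URI.
    subject = BASE + "#" + node.get(ID_ATTRIBUTE)
    # The element name gives the class the instance belongs to.
    triples.append((subject, RDF_NS + "type", uri(node.tag)))
    # Each child element is one property with a literal value.
    for prop in node:
        triples.append((subject, uri(prop.tag), prop.text))

for triple in triples:
    print(triple)
```

The four resulting statements are the type of JoeBlog plus one triple for each of the three properties.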
In the simplest case, the classes and properties we are using are declared in the
same namespace as where our instances are being defined. If that is not the case,
we use namespace prefixes, just as we used with rdf: and rdfs:. For example, if
there is a property defined in an external file, we can add a prefix of our choosing
to the rdf tag:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:edu="http://example.com/education.rdf#">
OWL
The OWL (Web Ontology Language) is a vocabulary extension of RDFS that
adds the expressivity needed to define classes and their relationships more fully.
Since OWL is built on RDF, any RDF graph forms a valid OWL ontology.
However, OWL adds semantics and vocabulary to RDF, and RDFS, giving it
more power to express complex relationships.
OWL introduces many new features over what is available in RDF and RDFS.
They include, among others, relations between classes (e.g. disjointness),
cardinality of properties (e.g. "exactly one"), equality, characteristics of properties
(e.g. symmetry), and enumerated classes. Since OWL is based in the knowledge
engineering tradition, expressive power and computational tractability were major
concerns in the drafting of the language. Features of OWL are well documented
online (McGuinness, van Harmelen, 2003), and an overview is given here. Since
OWL is based on RDF, the syntax is basically the same. OWL uses Class and
Property definitions and restrictions from RDF Schema. It also adds the following
syntactic elements:
<owl:AllDifferent>
<owl:distinctMembers rdf:parseType="Collection">
<Opera rdf:about="#Don_Giovanni"/>
<Opera rdf:about="#Nozze_di_Figaro"/>
<Opera rdf:about="#Cosi_fan_tutte"/>
<Opera rdf:about="#Tosca"/>
<Opera rdf:about="#Turandot"/>
<Opera rdf:about="#Salome"/>
</owl:distinctMembers>
</owl:AllDifferent>
Property Characteristics:
• inverseOf – This indicates inverse properties. For example,
"picturedInPhoto" for a Person would be the inverseOf the "personInPhoto"
property for Photos.
• TransitiveProperty – Transitive properties state that if A relates to B with a
transitive property, and B relates to C with the same transitive property,
then A relates to C through that property.
• SymmetricProperty – Symmetric properties state that if A has a symmetric
relationship with B, then B has that relationship with A. For example, a
"knows" property could be considered symmetric, since if A knows B, then
B should also know A.
• FunctionalProperty - If a property is a FunctionalProperty, then it has no
more than one value for each individual. "Age" could be considered a
functional property, since no individual has more than one age.
• InverseFunctionalProperty – Inverse functional properties are formally
properties such that their inverse property is a functional property. More
clearly, inverse functional properties are unique identifiers.
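The inferences licensed by inverse, symmetric, and transitive properties can be sketched with a small fixed-point loop in Python. This is only an illustration of the idea, not how an OWL reasoner is implemented; the locatedIn property and the infer function are hypothetical.

```python
def infer(triples, inverse_of=None, symmetric=(), transitive=()):
    """Repeatedly apply the property characteristics until nothing new appears."""
    inverses = dict(inverse_of or {})
    # inverseOf works in both directions.
    inverses.update({v: k for k, v in (inverse_of or {}).items()})
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in inverses:
                new.add((o, inverses[p], s))
            if p in symmetric:
                new.add((o, p, s))
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new


stated = {
    ("ParisPhoto1", "personInPhoto", "JohnDoe"),
    ("JohnDoe", "knows", "JoeBlog"),
    ("EiffelTower", "locatedIn", "Paris"),   # hypothetical property
    ("Paris", "locatedIn", "France"),
}

inferred = infer(
    stated,
    inverse_of={"personInPhoto": "picturedInPhoto"},
    symmetric={"knows"},
    transitive={"locatedIn"},
)

for fact in sorted(inferred - stated):
    print(fact)
```

Each characteristic adds exactly one new statement here: John Doe is pictured in the photo, Joe Blog knows John Doe, and the Eiffel Tower is located in France.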
Class Intersection:
2 Both of these tools were developed in our lab and are available for download at
http://www.mindswap.org/.
Instance Creation
Users will often want to create Semantic Web markup for individual web pages,
photos, or concepts, rather than making a mass conversion of existing data. One
of several tools available to assist the user in creating instances is the RDF
Instance Creator (RIC) (Golbeck et al., 2002). The tool lets users import existing
ontologies, choose a class from those available, and then create an instance by
simply filling in a form.
When a class is selected, the user is presented with a workspace that lists all of the
known properties of that class. In the screen shot shown in Figure 5, the user is
creating an instance of the class "Athlete". The known properties of Athlete, such
as "weight", "eyeColor", and "height" are shown in the workspace, and the user
can enter the values. Though these first properties just take strings as values, RIC
also allows the user to link objects. The "plays" property shown below, for
example, requires an instance of the "Sport" class as its value. The user can either
create a new instance of a sport to act as the object in the triple, or an existing
instance can be linked in.
RIC also facilitates the extension of existing ontologies. Users may add a property
to any existing class, and the RDF for the new property is stored in the local
output file. Users have the capability to add new classes, as well. These may be
independent classes, or subclass any classes that have been imported from other
ontologies. For users who are new to the semantic web with limited understanding
of the underlying languages, a lightweight tool like RIC can hide most of the ugly
details, and jumpstart the instance authoring process.
Most people are not ontological engineers, domain experts, or logicians, or even
programmers, so it's unlikely that they will be able to read, sort through, and grasp
how to apply large ontologies, much less construct their own. Aside from the
difficulty of learning how to model content in a reasonably correct and formal
way, current Web focused knowledge engineering tends to involve either an
interruption of normal workflow and techniques (e.g., switching to an RDF editor
to create RDF content which is then linked to an HTML page (McGuinness, van
Harmelen, 2003), (Bechhofer, Ng), (Staab et al., 2002)) or a wholesale
abandonment of prior practice. While there are many tools for easing ontology
creation and knowledge acquisition, few focus on how normal Web authors work.
Most tools are geared only toward ontology development (Musen et al., 2002).
This forces the author into a two-step situation where either the author must first
create the content and then annotate, or create all of the content in a knowledge
creation context and then render it to HTML in some fashion.
SMORE (Semantic Markup, Ontology and RDF Editor) is a tool whose design is
driven by the idea that much Semantic Web based knowledge acquisition will
look more like Web page authoring than traditional knowledge engineering. It
blurs the line between normal content creation and Semantic annotation, but
SMORE also supports ad hoc ontology use, modification, combination, and
extension.
SMORE lets users add triples to a document that describe a particular photo as a
whole. One of its interesting features also allows sub-image annotation. Using
standard drawing-like tools (squares, circles, polygons, etc.), the user delineates a
region of a photo. The user then can represent facts about that region. One crucial
fact that the user can assert about these regions is what they depict. Subsequent
annotations can then be about the depicted object. For example, in the screen shot
below, the photo depicts Bonnie, an orangutan housed at the National Zoo. One of
Bonnie’s identifying features is a bulbous forehead, and in this markup, the
feature is mapped from the overall photo, semantically described, and noted in
connection with other info about Bonnie.
Figure 4: The SMORE interface, showing the sub-image markup feature
On a web site generated using Semantic Web technology, as with any good
dynamically generated website using traditional database methods, the average
user does not see anything different from a hard-coded HTML site. This means that
users who are viewing the page do not need to even be aware of the underlying
technology, and the usability of the website is not affected.
The real human factors change arises for the web site managers. Instead of
potentially complicated software with a centralized and engineered database,
information for a Semantic Web based site can be distributed across the web, and
automatically incorporated as dynamic content. For example, in a current database
backed dynamic website, a website that presents the day’s headlines would
potentially have to collect stories and news from a variety of wire services,
convert each source of that data into the database format, and then load it into the
database before it can appear on the page. In a world using Semantic Web
technology, each wire service would maintain its news headlines as RDF or OWL
documents that would be available on the web. To display this information, the
centralized news service would only need to do a one time description of how the
ontology used for marking up the news of each wire service maps to the ontology
or formatting used for the website. Because the wire services automatically update
their news, the centralized site would merely have to retrieve the latest RDF or
OWL document from each service and use the pre-defined mappings to present
that data on a page. By allowing each source to maintain their own data, the
central site that presents that data is freed from maintaining a central database,
updating that database, and worrying about consistency between central news
service and wire services.
One may argue that a system of automated retrieval and conversion of data in the
traditional database model is quite similar to the scenario described above. An
even clearer benefit can be seen in the case where a wire service may change the
format of their news. In a system of automated conversion, a human user would
have to manually update the code that does the conversion. A change as simple as
swapping the position of article date and article author, or changing the name of a
field, say from “Author Name” to “Byline” could break a converter. Conversely,
in the Semantic Web model, the ontology dictates the structure of data. As long as
a revised ontology is based on the original version, the system will continue to
function. This gives the maintainer the freedom to update mappings leisurely,
since breakage is less likely to occur.
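The one-time mapping between a wire service's vocabulary and the site's own can be pictured as a lookup from one property URI to another. The following Python sketch is an assumption-laden illustration: both namespaces, all property names, and the sample story are invented.

```python
WIRE = "http://wire.example.com/news#"    # hypothetical wire service ontology
SITE = "http://site.example.com/terms#"   # hypothetical site vocabulary

# The one-time description of how wire service terms map to site terms.
PROPERTY_MAP = {
    WIRE + "headline": SITE + "title",
    WIRE + "byline": SITE + "author",
    WIRE + "date": SITE + "published",
}


def translate(triples, property_map):
    """Rewrite mapped predicates and skip statements the site has no use for."""
    return {
        (subject, property_map[predicate], obj)
        for subject, predicate, obj in triples
        if predicate in property_map
    }


story = WIRE + "story42"
feed = [
    (story, WIRE + "date", "2003-02-03"),
    (story, WIRE + "byline", "Jane Roe"),
    (story, WIRE + "headline", "Example headline"),
    (story, WIRE + "wordCount", "512"),  # a property the site does not map
]

# The same statements in a different order, as if the service reorganized its file.
reordered_feed = list(reversed(feed))

site_data = translate(feed, PROPERTY_MAP)
assert site_data == translate(reordered_feed, PROPERTY_MAP)
```

Since each value is identified by its property rather than by its position in the file, reordering the statements or adding unmapped properties leaves the translated data unchanged.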
One final benefit, before explaining how to implement such a site, is that of the
standard format of data. Because all Semantic Web data is based on standards, it
is easy to connect information maintained by separate sources. If a wire service
connects an author to each article, and a separate service maintains information
about authors, Semantic Web technology makes it possible to automatically
connect the two because of the shared data format. Thus, not only is a hyperlinked
web of connected pages built, but a secondary web of connected data emerges.
Users could read an article from a wire service, click on the author's name to get
some biographical information about them from a second service, and then
perhaps click on their hometown to find out more about the place with
information provided by yet another service. In some cases, where the ontologies
of each service interlink themselves, this is a trivial task for the website manager.
Even in more complex cases, the burden on the manager is much lighter because
the common format makes it easy to link data from distributed
sources.
Implementation Details
Mindswap is the Maryland Information and Network Dynamics Lab Semantic
Web Agents Project, based at the University of Maryland, College Park. The
website http://owl.mindswap.org was created to showcase the tools and
technologies developed in the lab, and to become the first website generated using
Semantic Web technology exclusively.
RDF and OWL are used to store all of the local information for the site. A
collection of ontologies describes any domain relevant concepts, such as people,
news items, downloads, and paper references. The site is divided into categories,
and instance data is presented on each page.
Data Storage
Data exists in two places simultaneously. RDF and OWL files that contain the
data are available on the web server and available for download, allowing
interchange with other sites. RDF is also stored in a backend database, and is
manipulated and accessed via the Redland application Framework.
Using a database of RDF as the backend for the web site raises the question “why
not just use a standard database?” The answer is that to do so would require
building an extensive hierarchy anyway, and would not be portable to other sites.
With an RDF base, stored in a more traditional database, all of the site's data can
be accessible to anyone on the web. No capabilities are taken away with this
approach, since the RDF can easily be edited or changed, and the backend
database can be updated in real time.
The Redland framework not only mirrors RDF found on the owl.mindswap.org
site – it can also import data from other web servers. This allows querying based
on ontologies created by other organizations. Any user can submit the URIs of
their RDF and OWL data through a form on the site. That data is immediately
added to the database, and will appear on any pages that use the same semantics.
If any external pages are changed, users can request an update via the website, so
even the external data in the database is kept consistent with the files.
Generating HTML
HTML web pages are generated from the database. For example, one of the
Mindswap ontologies defines a class called "Swapper", which is used to refer to
any members or affiliates of the lab. Subclasses of "Swapper" include "Graduate
Students", "Faculty", and "Alumni" to name a few. The "People" page on the
website queries through Redland for all subclasses of “Swapper”, and retrieves all
instances of each of those subclasses. A nested list in HTML is then used to
represent the hierarchy of types of “Swappers” and information about each
individual.
Figure 4: The People page from http://owl.mindswap.org
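The query-then-render step for a page like this could look roughly like the Python sketch below. It stands in for the real site only loosely: the data is held in plain lists instead of being queried through Redland, and the two people are invented placeholders.

```python
from html import escape

# (subclass, parent class) pairs, as a query for subclasses might return them.
sub_class_of = [
    ("Graduate Students", "Swapper"),
    ("Faculty", "Swapper"),
    ("Alumni", "Swapper"),
]

# (instance, class) pairs; both names are hypothetical placeholders.
instance_of = [
    ("Alice Example", "Faculty"),
    ("Bob Example", "Graduate Students"),
]


def people_page(root):
    """Render the subclasses of root and their instances as a nested HTML list."""
    lines = ["<ul>"]
    for cls in sorted(c for c, parent in sub_class_of if parent == root):
        lines.append(f"  <li>{escape(cls)}")
        members = sorted(name for name, c in instance_of if c == cls)
        if members:
            lines.append("    <ul>")
            for name in members:
                lines.append(f"      <li>{escape(name)}</li>")
            lines.append("    </ul>")
        lines.append("  </li>")
    lines.append("</ul>")
    return "\n".join(lines)


page = people_page("Swapper")
print(page)
```

Adding a new subclass or a new person to the data changes the page on the next render without any edits to the HTML-generating code.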
Because instances are interlinked, users of the site are not restricted to viewing
information from a specific category. A common example of this is finding items
created by a particular person. Any RDF instance generated for the Mindswap site
can have a “creator” property, which will be an instance of a “Swapper.” This
makes it easy to find and list all RDF entries created by a particular person,
including news items, papers, and software.
It is not uncommon to find HTML web pages that present data in a structured,
well-formatted layout. Tables of data certainly fall into this category, as do
many auto-generated pages, such as eBay listings or Amazon.com products. If
data is available in this format, a tool can "scrape" it from the page and output
corresponding RDF.
<b>John Doe</b>
<br><img
src="http://www.example.com/images/johndoe.jpg">
<br>
Mr. Doe can be emailed at <i>[email protected]</i>.
By specifying the three points above, the software can extract a simple table of
data from a page. The screen scraper also has the capability to crawl over a
number of pages. This means that even if a server generates a different page for
each person, each page can be scraped and the data can be aggregated. Once a
table of data has been collected, the user can specify how each column should be
translated into RDF. Columns may be turned into class names, instances of
existing classes, or attached to an instance as values for pre-defined properties.
The situation is similar for spreadsheets and simple databases. Since these types
of files are often highly structured in a straightforward way, the step of "scraping"
is unnecessary, and direct conversion to RDF is fairly simple.
ConvertToRDF uses a simple text file stating that each row corresponds to a
Person (as defined in an existing ontology), and that each column corresponds to a
particular, pre-defined property of Person. With these converter tools, it is trivial
to produce thousands of RDF triples in minutes. Depending on the detail within a
given database, this may result in fairly rich data models with minimal effort from
the user.
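The row-and-column mapping described above can be illustrated with a short conversion routine. This sketch does not reproduce ConvertToRDF or its mapping file format: the example.org class and property URIs, the column names, and the sample rows are all assumptions, and the mapping is held in a Python dictionary rather than a text file. Output is written as N-Triples for simplicity.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping: every row is a Person, every column a property of Person.
ROW_CLASS = "http://example.org/onto#Person"
COLUMN_PROPERTIES = {
    "name": "http://example.org/onto#name",
    "email": "http://example.org/onto#email",
}

# Hypothetical spreadsheet contents, exported as CSV.
SPREADSHEET = "name,email\nJohn Doe,jdoe@example.org\nJane Roe,jroe@example.org\n"


def literal(value):
    """Quote a plain string literal (handles backslashes and quotes only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rows_to_ntriples(csv_text, base="http://example.org/people/"):
    """Emit one rdf:type triple per row and one property triple per mapped column."""
    lines = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        subject = f"<{base}row{index}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{ROW_CLASS}> .")
        for column, prop in COLUMN_PROPERTIES.items():
            lines.append(f"{subject} <{prop}> {literal(row[column])} .")
    return lines


ntriples = rows_to_ntriples(SPREADSHEET)
print("\n".join(ntriples))
```

Each row yields one triple per mapped column plus one type triple, so a spreadsheet with a thousand rows and ten columns would produce eleven thousand triples from a mapping only a few lines long.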
References
[1] Golbeck, J., Grove, M., Parsia, B., Kalyanpur, A., Hendler, J. (2002). New
Tools for the Semantic Web. In Proceedings of the 13th International
Conference on Knowledge Engineering and Knowledge Management
(EKAW02). Siguenza, Spain.
[2] NCI Center for Bioinformatics caCORE:
http://ncicb.nci.nih.gov/NCICB/core
[3] McGuinness, D., van Harmelen, F. (2003) Web Ontology Language
(OWL): Overview. http://www.w3.org/TR/owl-features/
[4] Heflin, J. (2003). Web Ontology Language (OWL) Use Cases and
Requirements. http://www.w3.org/TR/2003/WD-webont-req-20030203/.
[5] Payne, T., Singh, R., & Sycara, K. (2002). Calendar Agents on the
Semantic Web. IEEE Intelligent Systems, Vol. 17 (3), 84-86.
[6] Sirin, E., Hendler, J., Parsia, B. (2002). Semi-automatic Composition of
Web Services using Semantic Descriptions. Web Services: Modeling,
Architecture and Infrastructure Workshop at ICEIS2003.
[7] Golbeck, J., Parsia, B., Hendler, J. (2003). Trust Networks on the
Semantic Web. Proceedings of Cooperative Intelligent Agents 2003.
Helsinki, Finland.
[8] van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D., Patel-
Schneider, P., Stein, L. (2003). Web Ontology Language (OWL)
Reference Version 1.0 W3C Working Draft 21 February 2003.
http://www.w3.org/TR/owl-ref/
[9] Evett, M., Hendler, J., & Spector, L. (1994). Parallel Knowledge
Representation on the Connection Machine, Journal of Parallel and
Distributed Computing, 22, 168-184.
[10] Musen, M., Fergerson, R., Grosso, W., Noy, N., Crubezy, M., & Gennari,
J. (2002). Component-Based Support for Building Knowledge-Acquisition
Systems. Conference on Intelligent Information Processing (IIP 2000) of
the International Federation for Information Processing World Computer
Congress (WCC 2000). Beijing.
[11] Bechhofer, G. & Ng, G. OilED. http://img.cs.man.ac.uk/oil/.
[12] Staab, S., Sure, Y., Erdmann, M., Wenke, D., Angele, J., Studer, R.
(2002). OntoEdit: Collaborative Ontology Development for the Semantic
Web. Proceedings of the first International Semantic Web Conference
2002 (ISWC 2002). Sardinia, Italia.
[13] Jan Winkler. RDFedt. http://www.jan-winkler.de/dev/e_rdfe.htm.
[14] The DAML+OIL Language.
http://www.daml.org/2001/03/daml+oil-index.html
[15] SHOE (Simple HTML Ontology Extensions).
http://www.cs.umd.edu/projects/plus/SHOE/.
[16] The OIL Language. http://www.ontoknowledge.org/oil/.
[17] Extensible Markup Language (XML). http://www.w3.org/XML/.
[18] Resource Description Framework (RDF). http://www.w3.org/RDF/.
[19] RDF Schema. http://www.w3.org/TR/rdf-schema/.
[20] Ontobroker. http://ontobroker.aifb.uni-karlsruhe.de/index_ob.html.
[21] McGuinness, D., van Harmelen, F. OWL Web Ontology Language.
http://www.w3.org/TR/owl-features/
[22] Mutton, P. & Golbeck, J. (2003). Visualizing Ontologies and Metadata on
the Semantic Web. Proceedings of Information Visualization 2003.
London.