Integration Biomedical 4 12

Uploaded by

sarahpuree1
Copyright
© © All Rights Reserved
Integration of biomedical data using semantic web technologies
Sarah Taleb Altaimani 2400664
Supervised by: Bassma Alsulami

Abstract— Biology research today relies heavily on the availability and efficient use of information. Different biological data sources are frequently merged to gain new understanding. Because Semantic Web technologies provide a common foundation that enables data sharing and reuse across applications, they can be used to manage publicly available biological data. Applying these technologies in the life sciences is nevertheless extremely difficult because of certain features of biological data; these difficulties are illustrated through a biological data integration use case. Current life science research depends on understanding systems that vary widely in size and distribution, so it is imperative that biomedical knowledge produced by many sources be integrated. Life science databases are becoming more numerous, larger, and more complex. The need for an approach to integrating their data is therefore a significant problem for researchers in genetics, metabolism, clinical medicine, and drug discovery. The diversity and volume of the available data, inconsistencies between sources, the independence of sources and their varying capabilities, and the absence of standards for such integration are among the obstacles that still need to be addressed to achieve coherent data integration. Linking sources in a way that allows computer processes to navigate and interpret them intelligently will be crucial. Semantic Web technologies are one viable method for combining several data sources, as they provide a framework for dealing with the problems raised above. This paper provides an overview of biological data integration using Semantic Web technologies, including existing standards, specifications, techniques, issues, and strategies.

Keywords: Ontology, Biomedical Data, Data Integration, Semantic Web

I. INTRODUCTION

Data integration is the process of merging data from several sources to give the user a unified, logical view of the data. However, certain obstacles must be overcome before many heterogeneous sources can be integrated. In computer science, these incompatibilities fall into four categories:
• System heterogeneity arises from different operating systems and hardware platforms.
• Structural heterogeneity arises because data sources use different data models or data structures.
• Syntactic heterogeneity refers to variations in the formats used to represent data.
• Semantic heterogeneity arises from divergent interpretations of the meaning of resources.

The Semantic Web aims to give existing documents and data a clear semantic meaning by adding structured meta-information. This semantic extension facilitates the integration of Web-based information and automates its processing by enabling machines to process the meaning of data, though not of human-language text or speech. The fundamental idea of the Semantic Web is to define and characterize relationships between resources on the World Wide Web by attaching machine-readable metadata to them.

Semantic Web technology can then exploit this metadata. The Semantic Web is an extension of the current Web rather than a new network. Its technologies are organized in layers, each building on the capabilities of the layers beneath it through a hierarchy of standards and approaches.

The Semantic Web has two aspects. Whereas the original Web was concerned mainly with exchanging documents, the Semantic Web is concerned, first, with common formats for combining and integrating data drawn from many sources and, second, with a language for recording how data relate to real-world objects. This allows a person or a machine to start in one database and then move through an unending set of databases that are connected not by wires but by being about the same thing. By providing existing documents and data with organized meta-information, the Semantic Web approach can help resolve heterogeneity in data integration. Semantics, which gives meaning to a term or concept, is a crucial component of information integration.

Semantics can establish that two concepts with different names and forms (synonyms) are equivalent, or that two concepts with the same name and form (homonyms) are different; it can therefore resolve synonym and homonym conflicts between sources. Semantics also describes the relationships between concepts. This enables a thorough description of the available data, shows how concepts interact, and allows inferences to be drawn. Ontologies that capture such semantics are
described using Semantic Web technologies. Making the most of ontologies requires linking data to its semantic content; in other words, the data must be annotated with ontology terms. However, such data often come in a variety of forms (websites, text files, relational databases, etc.). This issue can be resolved by adding metadata, but the metadata must be machine-readable and standardized to be usable. The Extensible Markup Language (XML) is the foundation for the metadata that Semantic Web technology provides [1].

II. LITERATURE REVIEW

Maria-Esther Vidal et al. [2] proposed a knowledge-driven framework that integrates big biomedical data into a knowledge graph. Because the integrated data are organized and semantically described, new patterns and relationships can be explored and discovered. The framework builds a knowledge graph from large data sources in various formats, such as structured data, scientific publications, images, and clinical notes, from which previously unknown patterns and relationships can be uncovered. The framework has four primary components: knowledge management and discovery, knowledge extraction, knowledge graph creation, and data access control and privacy. In this way, a variety of data sources can be integrated and described in a knowledge graph, on which management and analysis are then carried out. Preliminary results suggest that the framework can scale to very large knowledge graphs while coping with the inherent characteristics of biomedical data. More significantly, the results show that the knowledge represented in the graph can reveal patterns that pave the way for profiling lung cancer patients.

Chuming Chen et al. [3] observed that the scholarly literature on COVID-19 grew rapidly as a result of the global response to the pandemic. Understanding the origin, diagnosis, and treatment of COVID-19 requires extracting information from the biomedical literature and combining it with relevant data from curated biological databases. The authors codified this knowledge in a standardized, computable COVID-19 Knowledge Graph (KG) by integrating knowledge mined from the literature with iTextMine, PubTator, and SemRep with relevant biological databases, using the Semantic Web technology RDF. They created a knowledge portal with browsing and searching interfaces and published the COVID-19 KG through a SPARQL endpoint to support federated queries on the Semantic Web. In addition, they provided a RESTful API for programmatic access and made RDF dumps available for download.

Electronic health records (EHRs) support efficient clinical information management in any healthcare institution by providing a thorough, longitudinal electronic record of all events and data related to a person's health, from birth to death. Each piece of medical data has its own schema, structure, standard, format, coding system, level of abstraction, and semantics. Medical data is also rapidly growing, heterogeneous, distributed, and unstructured. Medical staff need to query the distributed EHR systems transparently through a single language. Data integration is essential for sharing information, retrieving patient histories, and formulating queries. Semantic interoperability allows many healthcare systems to use and exchange clinical data meaningfully. Doctors frequently put ambiguous questions to EHR systems and expect answers drawn from distributed systems.

Ebtsam Adel et al. [4] proposed a unified semantic interoperability framework for distributed EHRs based on fuzzy ontologies. The framework has three main layers. The lowest layer (local ontology creation) holds the diverse EHR data, which differ in database schemas, standards, terminologies, locations, formats, and purposes. The sources may include EHR standards, spreadsheet files, XML files, archetype definition language (ADL) files, and various databases (such as MySQL, SQL Server, DB2, Access, and Oracle) with heterogeneous schemas. A mediator suited to each kind of input (such as DB2OWL, X2OWL, or ADL2OntoModule) converts these inputs into crisp (non-fuzzy) ontologies. In the middle layer (global ontology building), the local ontologies are mapped to a crisp global ontology using mapping algorithms or human experts assisted by common terminology vocabularies. This global reference ontology, which unifies and aggregates all local ontologies, describes all of the data. A unified fuzzy ontology is then built from this crisp ontology. Through the user interface, which forms the third layer, a doctor or other specialist can pose natural-language or semantic queries against the global reference alone. The fuzzy ontology is more dynamic and helps interpret complex medical questions expressed in natural language. The result is a robust, global semantic interoperability approach for integrating diverse healthcare systems, based on fuzzy ontology semantics. Compared with frameworks that rely solely on crisp ontologies, it offers several advantages: (1) it advances the goal of complete semantic interoperability of heterogeneous EHRs; (2) it supports plug and play, allowing any system, whatever its structure, to be transparently integrated with existing systems without disrupting the current working environment; and (3) it is modular and extensible because it is based on ontologies and terminologies, so its functionality can be expanded uniformly.

Mara Abel et al. [5] proposed a system architecture for integration whose main components can be reused in more than one field of application. The system provided integration in both the biomedical and the geological domains. Its central components perform integration according to the field of application, which is defined and linked to the system through a domain ontology and semantic descriptions.
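Several of the works above, notably [3], publish integrated knowledge as RDF and query it with SPARQL. As a rough, hypothetical illustration of why shared identifiers matter, the Python sketch below models two invented sources as sets of subject-predicate-object triples that use the same URI for a gene, merges them by set union, and answers a SPARQL-like basic graph pattern with a naive matcher. The http://example.org/ namespace, the predicates, and the match() helper are assumptions made for this example; they are not the API of any RDF library or of the systems reviewed.

```python
# Minimal sketch, not a real RDF store: triples are plain Python tuples and
# every URI below is hypothetical (example.org namespace).
EX = "http://example.org/"

# Source 1: a hypothetical gene database.
gene_db = {
    (EX + "gene/TP53", EX + "symbol", "TP53"),
    (EX + "gene/TP53", EX + "encodes", EX + "protein/p53"),
}

# Source 2: hypothetical facts mined from the literature. It reuses the same
# gene URI, which is what lets the two sources be linked without extra mapping.
literature_db = {
    (EX + "gene/TP53", EX + "mentionedIn", EX + "paper/1"),
    (EX + "paper/1", EX + "topic", "lung cancer"),
}

# Merging RDF graphs that share URIs amounts to taking the union of their triples.
graph = gene_db | literature_db


def match(graph, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern.

    Terms beginning with '?' are variables, as in SPARQL; all other terms
    must match exactly. Patterns are joined with a naive nested loop.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind a new variable, or check consistency with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(graph, rest, candidate)


# SPARQL-like query: topics of papers that mention the gene whose symbol is TP53.
query = [
    ("?g", EX + "symbol", "TP53"),
    ("?g", EX + "mentionedIn", "?paper"),
    ("?paper", EX + "topic", "?topic"),
]
topics = [b["?topic"] for b in match(graph, query)]
print(topics)  # ['lung cancer']
```

In SPARQL the same request would read roughly SELECT ?topic WHERE { ?g ex:symbol "TP53" . ?g ex:mentionedIn ?paper . ?paper ex:topic ?topic }, with ex: bound to the hypothetical namespace by a PREFIX declaration; a real system would send it to an RDF store or SPARQL endpoint rather than scan a Python set.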
III. MAIN DISCUSSION

Biomedical ontologies are a key component of data integration, supporting both the storage of integrated data and reasoning over it.

An ontology is a controlled vocabulary that attempts to capture the knowledge of a specific field. Because it allows various sources to be transformed into a common format and vocabulary, it provides the standardization that storage-based approaches require. Ontologies can be divided into four categories:
• Top-level ontologies: describe very broad concepts that are not specific to any one problem or field and can be widely applied across fields.
• Top-domain ontologies: contain the fundamental concepts of a field, for instance, an organism or a cell in the biological domain. They act as a bridge between a domain ontology and the top-level ontology.
• Domain ontologies: cover only a specific area, since they contain only domain-specific concepts.
• Local ontologies: describe the semantics of a single information resource.

Semantic data integration is made possible by the capacity of ontologies to provide a map of concepts and relationships. Here, ontologies characterize the semantics of the data sources to make their content explicit. Data from multiple sources can then be mapped through a very fine-grained integration process, whether the resources provide structured or unstructured data. Ontology-based approaches to data integration typically follow a three-tier design in which a semantic layer acts as a mediator between the presentation layer and the physical layer. This semantic mediator uses mapping models to translate queries into execution plans, which gives transparent access to multiple data sources through a single query language such as SPARQL. Ontologies are used at the mediator layer because they provide a uniform language for data integration, in which each concept has a unique name, related properties, and well-defined synonyms. An ontology is not a static structure: it can grow over time and connect to other ontologies. There are three ontology-based approaches to data integration:
• Single ontology approach: All sources are integrated through a single global ontology to which every information source is connected. The global ontology may itself combine several specialized ontologies. This approach requires data sources with comparable granularity and domain views. One drawback is that adding new data sources may require significant changes to the ontology.
• Multiple ontologies approach: Each source is described by its own local ontology. Because there is no common vocabulary, mappings between ontologies are required. This approach makes it easy to add new data sources and their local ontologies; however, the lack of a common vocabulary can make the mappings between ontologies quite difficult to define.
• Hybrid approach: This combines the two approaches above. As in the multiple ontologies approach, each source is described by a local ontology. However, the local ontologies are built from a common global vocabulary, which avoids the drawbacks of that approach and makes them comparable. The shared vocabulary contains the basic terms of a domain, may itself be an ontology, and allows queries to be expressed in common terms. Mappings are needed only between each local ontology and the shared global vocabulary, not between local ontologies, so existing mappings do not have to change when a new source is added.

The scientific community has access to a wealth of biomedical data on the Internet, much of it stored in numerous databases whose content varies with the kind of biological data they offer. Computational analysis of biological data frequently requires several datasets. At present, data sources are often integrated by hand, which is extremely time-consuming; what is needed are integrated datasets with rich, flexible, and efficient interfaces.

Heterogeneous database integration issues
• Technical heterogeneity results from differing file formats, query languages, access protocols, and other factors.
• Data model heterogeneity results when the same data are stored using different models.
• Semantic heterogeneity arises when separate databases holding different but related data are combined, for instance, when a database of genes is merged with a database of proteins: because genes give rise to gene products, the two databases are related.

Facilitating database integration and resolving such heterogeneity are two of the primary problems the Semantic Web aims to address [6].

Problems in integrating data with Semantic Web technologies:

Uniform naming
Each source names entities in its own way, which is one of the challenges data integration faces; mappings between names are therefore needed to integrate these resources. One strategy is to establish an official naming committee that maintains the definitive list of names, as the HUGO Gene Nomenclature Committee does for gene names and symbols (short-form abbreviations). In practice, however, this strategy rarely works because of the changing nature of biological research. Another strategy is to develop globally unique identifiers for biological entities. URIs can serve this purpose by identifying each resource unambiguously, which is crucial for the use of Semantic Web technologies [7].

Extraction of semantic information from existing knowledge
To make full use of Semantic Web technologies, it would be advantageous to extract semantic information from existing sources automatically or semi-automatically. Developing techniques that support this process is therefore a significant challenge. In Semantic Web-based data integration, such techniques would support two primary tasks:
1. Annotating sources with existing ontologies: information is extracted from a data source and attached, automatically or semi-automatically, to an existing ontology.
2. Creating ontologies: data related to a particular field are gathered from various sources, and the collected domain knowledge is used to construct an ontology automatically or semi-automatically.
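The first task above can be illustrated with the simplest form of dictionary-based annotation. The Python sketch below assumes a tiny hypothetical ontology with invented EX: identifiers; it matches each concept's label and synonyms against free text and reports which concepts were found, preferring longer matches over shorter overlapping ones. It only sketches the idea: practical annotators add tokenization, normalization, and disambiguation, and nothing here reflects a specific tool.

```python
import re

# Hypothetical mini-ontology: concept ID -> preferred label and synonyms.
# The EX: identifiers are invented for this example.
ONTOLOGY = {
    "EX:0001": {"label": "tumor protein p53", "synonyms": ["TP53", "p53"]},
    "EX:0002": {"label": "lung cancer", "synonyms": ["lung carcinoma"]},
}


def build_lexicon(ontology):
    """Map every lower-cased label and synonym to its concept ID."""
    lexicon = {}
    for concept_id, entry in ontology.items():
        for term in [entry["label"], *entry["synonyms"]]:
            lexicon[term.lower()] = concept_id
    return lexicon


def annotate(text, ontology):
    """Return (matched text, concept ID) pairs in order of appearance.

    Longer terms are tried first, and a shorter term is skipped if it overlaps
    a span already annotated (so 'tumor protein p53' wins over 'p53').
    """
    lexicon = build_lexicon(ontology)
    found = []   # (start offset, matched text, concept ID)
    taken = []   # character spans already annotated
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = m.span()
            if any(start < e and s < end for s, e in taken):
                continue
            taken.append((start, end))
            found.append((start, m.group(0), lexicon[term]))
    return [(matched, cid) for _, matched, cid in sorted(found)]


print(annotate("Mutations in TP53 are frequent in lung carcinoma.", ONTOLOGY))
# [('TP53', 'EX:0001'), ('lung carcinoma', 'EX:0002')]
```

Because synonyms resolve to the same concept ID, text that says "TP53" and text that says "tumor protein p53" end up annotated with one identifier, which is what later allows the annotated sources to be linked.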
Development, upkeep, and quality of ontologies
Ontologies must be created, overseen, and supported by dedicated communities of practice. Moreover, an ontology is a "living structure": new information can cause its concepts to evolve continuously, so concepts may be added, modified, replaced, or removed. As a result, ontologies are never finished and require ongoing maintenance. Ontology quality assurance (QA) is another issue.
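Part of ontology QA can be automated. As one hypothetical example, the Python sketch below runs two structural checks on an is-a hierarchy stored as a mapping from each concept to its direct parents: it lists parents that are referenced but never defined, and it uses depth-first search to detect an is-a cycle, which would make a concept its own ancestor. The hierarchy and concept names are invented for illustration; full QA would also involve logical consistency checking, for example with an OWL reasoner, and expert review.

```python
# Hypothetical is-a hierarchy: concept -> list of its direct parent concepts.
HIERARCHY = {
    "Entity": [],
    "Organism": ["Entity"],
    "Cell": ["Entity"],
    "Neuron": ["Cell"],
    "Gene": ["Entity"],
}


def undefined_parents(hierarchy):
    """Return parents that are referenced but never defined as concepts."""
    return sorted({parent
                   for parents in hierarchy.values()
                   for parent in parents
                   if parent not in hierarchy})


def find_cycle(hierarchy):
    """Return one is-a cycle as a list of concepts, or None if there is none.

    Depth-first search: a concept reached again while it is still on the
    current path means some concept has become its own ancestor.
    """
    ON_PATH, DONE = 1, 2
    state = {}
    path = []

    def visit(concept):
        state[concept] = ON_PATH
        path.append(concept)
        for parent in hierarchy.get(concept, []):
            if state.get(parent) == ON_PATH:
                return path[path.index(parent):] + [parent]
            if parent not in state:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[concept] = DONE
        return None

    for concept in hierarchy:
        if concept not in state:
            cycle = visit(concept)
            if cycle:
                return cycle
    return None


# A deliberately broken copy: Entity now points down to Neuron, and Virus
# refers to a Pathogen concept that does not exist.
broken = dict(HIERARCHY, Entity=["Neuron"], Virus=["Pathogen"])
print(find_cycle(HIERARCHY))      # None
print(find_cycle(broken))         # ['Entity', 'Neuron', 'Cell', 'Entity']
print(undefined_parents(broken))  # ['Pathogen']
```

Checks like these are cheap enough to run every time the ontology is edited, which suits the "living structure" described above.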
Ontologies should meet the following design and quality criteria:
▪ Clarity: Definitions should be objective and convey the intended meaning clearly.
▪ Extendibility: Adding new information to an ontology should take little work and should not invalidate it.
▪ Minimal encoding bias: Terms should not be specified in terms of any particular symbol-level encoding.
▪ Minimal ontological commitment: An ontology should describe the domain it represents with the fewest concepts and relationships feasible.
▪ Coherence: The content of the ontology must be consistent; in other words, inferences drawn from it should never contradict its definitions [8].

Current biology research depends critically on the availability and effective use of information, and new understanding frequently requires combining data from multiple biological sources. The management of distributed biological data can benefit from Semantic Web technologies, which offer a common framework that allows data to be shared and reused across applications.

One application of Semantic Web technologies to biological science is Thea-online. It uses established Semantic Web standards (URIs, RDF, OWL, and SPARQL) to combine, query, and display data from multiple sources [9].

The website is available at http://bioinfo.unice.fr:8080/thea-online/. Entering search terms in a simple text box produces a synthetic report of everything known about a gene or gene product. The interface has been designed to make searching as easy as possible: users need not specify the database from which a query identifier originates or format their queries in any particular way. The text field can contain a range of names, symbols, aliases, or identifiers [10].

Figure 1: Thea-online

IV. CONCLUSIONS

Biological research requires the analysis of large volumes of data, which often involves exploring and correlating data from many different sources. However, as biological databases keep growing in size and number, finding relevant resources and making the right links between their contents becomes increasingly difficult. Biologists today do have access to portals that provide a unified view of many different data sources, but the integration of each data source into these aggregated databases is custom-designed, programmed, and optimized by expert programmers.

This study examined the significance of data integration and the difficulty of integrating biomedical data, and described Semantic Web technologies, their use in integrating biomedical data, the main issues and potential solutions, and a use case of the technology.

The Semantic Web is on the verge of revolutionizing access to information. By extending the existing Web with reasoning capabilities, it will enable the automatic integration and combination of data derived from diverse sources. From the user's point of view, the underlying technical solutions should be completely transparent. Moreover, the adoption of Semantic Web technologies and languages will enable access to an almost unlimited number of data sources.
REFERENCES

[1] R. Kienast and C. Baumgartner. [Online]. Available: https://cdn.intechopen.com/pdfs/22489/InTech-Semantic_data_integration_on_biomedical_data_using_semantic_web_technologies.pdf (accessed Oct. 20, 2024).

[2] M.-E. Vidal, K. M. Endris, S. Jozashoori, F. Karim, and G. Palma, “Semantic Data Integration of Big Biomedical Data for Supporting Personalised Medicine,” Studies in Computational Intelligence, pp. 25–56, Jan. 2019, doi: https://doi.org/10.1007/978-3

[3] C. Chen, K. E. Ross, S. Gavali, J. E. Cowart, and C. H. Wu, “COVID-19 Knowledge Graph from semantic integration of biomedical literature and databases,” Bioinformatics, vol. 37, no. 23, pp. 4597–4598, Oct. 2021, doi: https://doi.org/10

[4] E. Adel, S. El-Sappagh, S. Barakat, and M. Elmogy, “A unified fuzzy ontology for distributed electronic health record semantic interoperability,” Academic Press, 2018, pp. 353–395, doi: https://doi.org/10.1016/B978-0-12-815370-3.00014-1.

[5] T. De, M. Abel, and F. García-Sánchez, “Using Semantic Web Services to Integrate Data and Processes from Different Web Portals,” vol. 302, Aug. 2009. [Online]. Available: https://www.researchgate.net/publication/228908373_Using_Semantic_Web_Services_to_Integrate

[6] M. Karami and A. Rahimi, “Semantic Web Technologies for Sharing Clinical Information in Health Care Systems,” Acta Informatica Medica, vol. 27, no. 1, p. 4, 2019, doi: https://doi.org/10.5455/aim.2019.27.4-7.

[7] D. Ostrowski, N. Rychtyckyj, P. MacNeille, and M. Kim, “Integration of Big Data Using Semantic Web Technologies,” IEEE Xplore, Feb. 1, 2016. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/7439370 (accessed Mar. 23, 2021).

[8] J. Ahmed and M. Ahmed, “Big data and semantic web, challenges and opportunities a survey,” International Journal of Engineering & Technology, vol. 7, no. 4.5, p. 631, Sep. 2018, doi: https://doi.org/10.14419/ijet.v7i4.5.21174.

[9] C. Pasquier, “Applying Semantic Web technologies to biological data integration and visualization,” Hal.science, pp. 131–51, 2024. [Online]. Available: https://hal.science/hal-01151507.

[10] C. Pasquier, “Biological data integration using Semantic Web technologies,” Biochimie, vol. 90, no. 4, pp. 584–594, Apr. 2008, doi: https://doi.org/10.1016/j.biochi.2008.02.007.
