Data Integration

What is Data Integration?


Data integration is a data preparation technique used in
data mining that combines data from several heterogeneous
sources into a coherent store, maintaining a unified view of
the data.

Those sources may include multiple data cubes, databases,
or flat files.
Data Integration Approaches
A data integration system is formally defined as a
triple <G, S, M> where:

● G - stands for the global schema,
● S - stands for the heterogeneous set of source schemas,
● M - stands for the mapping between queries over the source
and global schemas.
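As a rough sketch of the <G, S, M> triple (all schema and source names here are hypothetical), a global-as-view mapping can be modeled as a dictionary from global attributes to per-source attributes, with queries over G rewritten into each source's vocabulary:

```python
# Hypothetical sketch of a <G, S, M> triple for two customer sources.
G = ["customer_id", "name"]                      # global schema G
S = {"crm": ["cust_no", "full_name"],            # source schemas S
     "billing": ["client_id", "client_name"]}
M = {"crm":     {"customer_id": "cust_no",   "name": "full_name"},
     "billing": {"customer_id": "client_id", "name": "client_name"}}

def rewrite(query_attrs, source):
    """Rewrite a projection over G into the vocabulary of one source."""
    return [M[source][a] for a in query_attrs]

print(rewrite(["customer_id", "name"], "billing"))  # -> ['client_id', 'client_name']
```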
Two Main Approaches to Data Integration

● Tight Coupling
● Loose Coupling
Tight Coupling
● Here, a data warehouse is treated as an information
retrieval component.
● In this coupling, data is combined from different sources
into a single physical location through the process of
ETL – Extraction, Transformation, and Loading.
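A minimal sketch of tight coupling via ETL, using hypothetical in-memory "sources" in place of real databases:

```python
# Minimal ETL sketch for tight coupling; the "sources" are hypothetical
# in-memory lists standing in for real databases.
source_a = [{"cust_no": 1, "full_name": "Ada"}]
source_b = [{"client_id": 2, "client_name": "Bob"}]
warehouse = []  # the single physical location

def extract():
    return source_a, source_b

def transform(a_rows, b_rows):
    # Normalize both source schemas into one warehouse schema.
    out = [{"customer_id": r["cust_no"], "name": r["full_name"]} for r in a_rows]
    out += [{"customer_id": r["client_id"], "name": r["client_name"]} for r in b_rows]
    return out

def load(rows):
    warehouse.extend(rows)

load(transform(*extract()))
print(warehouse)
```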
Loose Coupling
● Here, an interface is provided that takes the query from
the user, transforms it in a way the source database can
understand, and then sends the query directly to the
source databases to obtain the result.
● The data remains only in the actual source databases.
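A minimal sketch of loose coupling (hypothetical source and attribute names): a mediator rewrites the user's query for each source and fetches results on demand, so the data never leaves the sources:

```python
# Loose-coupling sketch: a mediator translates the user's query for each
# source and fetches results on demand; data stays in the sources.
sources = {
    "crm":     {"rows": [{"cust_no": 1, "full_name": "Ada"}],
                "map": {"customer_id": "cust_no", "name": "full_name"}},
    "billing": {"rows": [{"client_id": 2, "client_name": "Bob"}],
                "map": {"customer_id": "client_id", "name": "client_name"}},
}

def query(attrs):
    results = []
    for src in sources.values():
        translated = [src["map"][a] for a in attrs]   # transform the query
        for row in src["rows"]:                       # send it to the source
            results.append({a: row[t] for a, t in zip(attrs, translated)})
    return results

print(query(["customer_id", "name"]))
```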
Issues in Data Integration

● Entity Identification Problem
● Redundancy and Correlation Analysis
● Tuple Duplication
● Data Conflict Detection and Resolution
Entity Identification Problem
Since records are obtained from heterogeneous sources, the
same real-world entity may appear in several of them, and
such entries become redundant once the sources are
integrated.

For example, an entity in one data source may carry a
customer_id attribute while the corresponding entity in
another source carries a customer_number. Analyzing such
metadata prevents errors during schema integration.
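As an illustration (column names hypothetical), attribute-name metadata can give a first hint that customer_id in one source and customer_number in another describe the same thing; real schema matchers also compare data types, value distributions, and constraints:

```python
# Hypothetical column metadata from two sources; difflib gives a crude
# name-similarity signal for matching attributes across schemas.
import difflib

source_a_cols = ["customer_id", "cust_name", "addr"]
source_b_cols = ["customer_number", "client_name", "address"]

for col in source_a_cols:
    match = difflib.get_close_matches(col, source_b_cols, n=1, cutoff=0.4)
    print(col, "->", match[0] if match else "no match")
```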
Redundancy and Correlation Analysis
Data that are no longer required are referred to as
redundant data. Redundancy may also arise when an attribute
can be derived from another attribute in the data set.

Inconsistencies in attribute naming further increase the
level of redundancy. Attributes are therefore examined for
interdependence through correlation analysis, thereby
discovering the relationships between them.
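One standard interdependence check for numeric attributes is the Pearson correlation coefficient; the sketch below (pure Python, hypothetical data) flags an attribute that is exactly derivable from another:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric attributes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# price_cents is exactly derivable from price, so the correlation is ~1.0
# and the attribute can be flagged as redundant.
price = [10.0, 20.0, 30.0]
price_cents = [1000.0, 2000.0, 3000.0]
print(round(pearson(price, price_cents), 6))  # -> 1.0
```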
Tuple Duplication
In addition to redundancy, data integration must also handle
duplicate tuples.

Duplicate tuples may appear in the integrated data if a
denormalized table was used as a source for data
integration.
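A minimal deduplication sketch (rows hypothetical): treat each tuple's sorted items as a key and keep only the first occurrence:

```python
# Dedup sketch: collapse duplicate tuples that a denormalized source produced.
rows = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 1, "name": "Ada"},   # duplicate from a denormalized join
    {"customer_id": 2, "name": "Bob"},
]

seen, unique = set(), []
for row in rows:
    key = tuple(sorted(row.items()))     # hashable identity for the tuple
    if key not in seen:
        seen.add(key)
        unique.append(row)

print(len(unique))  # -> 2
```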
Data Conflict Detection and Resolution
A data conflict arises when combining records from several
sources: attribute values for the same real-world entity may
differ from one data set to another.

The disparity is often because the values are represented,
scaled, or encoded differently in the different data sets
(for example, weight stored in kilograms in one source and
pounds in another).
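A small sketch of one common conflict, differing units between sources (values and field names hypothetical): normalizing both to a single unit shows the values actually agree:

```python
# Conflict sketch: the same attribute arrives in different units from two
# hypothetical sources (kilograms vs. pounds).
from_a = {"item": "X", "weight_kg": 2.0}
from_b = {"item": "X", "weight_lb": 4.409}

def to_kg(row):
    """Resolve the representation conflict by converting to one unit."""
    if "weight_kg" in row:
        return row["weight_kg"]
    return row["weight_lb"] * 0.45359237  # lb -> kg

wa, wb = to_kg(from_a), to_kg(from_b)
print(abs(wa - wb) < 0.01)  # -> True: values agree once units are reconciled
```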
Data Integration Techniques

● Manual Integration
● Middleware Integration
● Application-Based Integration
● Uniform Access Integration
● Data Warehousing
Manual Integration
This method avoids automation: the data analyst personally
collects, cleans, and integrates the data to produce
meaningful information. The strategy is suitable for a small
organization with a limited data set, but because the entire
process must be done manually, it becomes time-consuming for
huge, sophisticated, and recurring integrations.
Middleware Integration
The middleware software is used to take data from many
sources, normalize it, and store it in the resulting data
set. When an enterprise needs to integrate data from legacy
systems to modern systems, this technique is used.
Middleware software acts as a translator between legacy and
advanced systems; think of it as an adapter that connects
two systems with different interfaces. It is only applicable
to certain systems.
Application-based integration
It is using software applications to extract, transform, and
load data from disparate sources. This strategy saves time
and effort, but it is a little more complicated because
building such an application necessitates technical
understanding. This strategy saves time and effort, but it
is a little more complicated because building such an
application necessitates technical understanding.
Uniform Access Integration
This method combines data from disparate sources. However,
the data's position is not altered; the data stays in its
original location. This
technique merely generates a unified view of the integrated
data. The integrated data does not need to be stored
separately because the end-user only sees the integrated
view.
Data Warehousing
This technique is loosely related to the uniform access
integration technique, but here the unified view is stored
in a separate location, which enables the data analyst to
handle more sophisticated queries. Although it is a
promising solution, the unified view is a physical copy of
the data and therefore incurs separate storage and
maintenance costs.
The End
