Data Integration

What is Data Integration?


Data integration is a data preparation technique used in
data mining that combines data from several heterogeneous
sources into a coherent store, maintaining a unified view of
the data.

Those sources may include multiple data cubes, databases,
or flat files.
Data Integration Approaches
A data integration system is formally defined as a
triple <G, S, M> where:

● G - stands for the global schema,
● S - stands for the heterogeneous set of source schemas,
● M - stands for the mapping between queries over the source
and global schemas.
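As a rough sketch of the <G, S, M> triple (all schema and source names here are hypothetical), a global-as-view mapping can be modeled as a dictionary from global attributes to per-source attributes, with queries over G rewritten into each source's vocabulary:

```python
# Hypothetical sketch of a <G, S, M> triple for two customer sources.
G = ["customer_id", "name"]                      # global schema G
S = {"crm": ["cust_no", "full_name"],            # source schemas S
     "billing": ["client_id", "client_name"]}
M = {"crm":     {"customer_id": "cust_no",   "name": "full_name"},
     "billing": {"customer_id": "client_id", "name": "client_name"}}

def rewrite(query_attrs, source):
    """Rewrite a projection over G into the vocabulary of one source."""
    return [M[source][a] for a in query_attrs]

print(rewrite(["customer_id", "name"], "billing"))  # -> ['client_id', 'client_name']
```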
Two Main Approaches to Data Integration

● Tight Coupling
● Loose Coupling
Tight Coupling
● Here, a data warehouse is treated as an information
retrieval component.
● In this coupling, data is combined from different sources
into a single physical location through the process of
ETL – Extraction, Transformation, and Loading.
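A minimal sketch of tight coupling via ETL, using hypothetical in-memory "sources" in place of real databases:

```python
# Minimal ETL sketch for tight coupling; the "sources" are hypothetical
# in-memory lists standing in for real databases.
source_a = [{"cust_no": 1, "full_name": "Ada"}]
source_b = [{"client_id": 2, "client_name": "Bob"}]
warehouse = []  # the single physical location

def extract():
    return source_a, source_b

def transform(a_rows, b_rows):
    # Normalize both source schemas into one warehouse schema.
    out = [{"customer_id": r["cust_no"], "name": r["full_name"]} for r in a_rows]
    out += [{"customer_id": r["client_id"], "name": r["client_name"]} for r in b_rows]
    return out

def load(rows):
    warehouse.extend(rows)

load(transform(*extract()))
print(warehouse)
```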
Loose Coupling
● Here, an interface is provided that takes the query from
the user, transforms it in a way the source database can
understand, and then sends the query directly to the
source databases to obtain the result.
● The data remains only in the actual source databases.
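A minimal sketch of loose coupling (hypothetical source and attribute names): a mediator rewrites the user's query for each source and fetches results on demand, so the data never leaves the sources:

```python
# Loose-coupling sketch: a mediator translates the user's query for each
# source and fetches results on demand; data stays in the sources.
sources = {
    "crm":     {"rows": [{"cust_no": 1, "full_name": "Ada"}],
                "map": {"customer_id": "cust_no", "name": "full_name"}},
    "billing": {"rows": [{"client_id": 2, "client_name": "Bob"}],
                "map": {"customer_id": "client_id", "name": "client_name"}},
}

def query(attrs):
    results = []
    for src in sources.values():
        translated = [src["map"][a] for a in attrs]   # transform the query
        for row in src["rows"]:                       # send it to the source
            results.append({a: row[t] for a, t in zip(attrs, translated)})
    return results

print(query(["customer_id", "name"]))
```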
Issues in Data Integration

● Entity Identification Problem
● Redundancy and Correlation Analysis
● Tuple Duplication
● Data Conflict Detection and Resolution
Entity Identification Problem
Since records are obtained from heterogeneous sources, the
same real-world entity may appear in several of them, and
such entries become redundant once the sources are
integrated.

For example, an entity in one data source may carry a
customer_id attribute while the corresponding entity in
another source carries a customer_number. Analyzing such
metadata prevents errors during schema integration.
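As an illustration (column names hypothetical), attribute-name metadata can give a first hint that customer_id in one source and customer_number in another describe the same thing; real schema matchers also compare data types, value distributions, and constraints:

```python
# Hypothetical column metadata from two sources; difflib gives a crude
# name-similarity signal for matching attributes across schemas.
import difflib

source_a_cols = ["customer_id", "cust_name", "addr"]
source_b_cols = ["customer_number", "client_name", "address"]

for col in source_a_cols:
    match = difflib.get_close_matches(col, source_b_cols, n=1, cutoff=0.4)
    print(col, "->", match[0] if match else "no match")
```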
Redundancy and Correlation Analysis
Data that are no longer required are referred to as
redundant data. Redundancy may also arise when an attribute
can be derived from another attribute in the data set.

Inconsistencies in attribute naming further increase the
level of redundancy. Attributes are therefore examined for
interdependence through correlation analysis, thereby
discovering the relationships between them.
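One standard interdependence check for numeric attributes is the Pearson correlation coefficient; the sketch below (pure Python, hypothetical data) flags an attribute that is exactly derivable from another:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric attributes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# price_cents is exactly derivable from price, so the correlation is ~1.0
# and the attribute can be flagged as redundant.
price = [10.0, 20.0, 30.0]
price_cents = [1000.0, 2000.0, 3000.0]
print(round(pearson(price, price_cents), 6))  # -> 1.0
```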
Tuple Duplication
In addition to redundancy, data integration must also handle
duplicate tuples.

Duplicate tuples may appear in the integrated data if a
denormalized table was used as a source for data
integration.
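A minimal deduplication sketch (rows hypothetical): treat each tuple's sorted items as a key and keep only the first occurrence:

```python
# Dedup sketch: collapse duplicate tuples that a denormalized source produced.
rows = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 1, "name": "Ada"},   # duplicate from a denormalized join
    {"customer_id": 2, "name": "Bob"},
]

seen, unique = set(), []
for row in rows:
    key = tuple(sorted(row.items()))     # hashable identity for the tuple
    if key not in seen:
        seen.add(key)
        unique.append(row)

print(len(unique))  # -> 2
```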
Data Conflict Detection and Resolution
A data conflict arises when combining records from several
sources: attribute values for the same real-world entity may
differ from one data set to another.

The disparity is often because the values are represented,
scaled, or encoded differently in the different data sets
(for example, weight stored in kilograms in one source and
pounds in another).
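A small sketch of one common conflict, differing units between sources (values and field names hypothetical): normalizing both to a single unit shows the values actually agree:

```python
# Conflict sketch: the same attribute arrives in different units from two
# hypothetical sources (kilograms vs. pounds).
from_a = {"item": "X", "weight_kg": 2.0}
from_b = {"item": "X", "weight_lb": 4.409}

def to_kg(row):
    """Resolve the representation conflict by converting to one unit."""
    if "weight_kg" in row:
        return row["weight_kg"]
    return row["weight_lb"] * 0.45359237  # lb -> kg

wa, wb = to_kg(from_a), to_kg(from_b)
print(abs(wa - wb) < 0.01)  # -> True: values agree once units are reconciled
```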
Data Integration Techniques

● Manual Integration
● Middleware Integration
● Application-Based Integration
● Uniform Access Integration
● Data Warehousing
Manual Integration
This method avoids automation: the data analyst personally
collects, cleans, and integrates the data to produce
meaningful information. The strategy is suitable for a small
organization with a limited data set, but because the entire
process must be done manually, it becomes time-consuming for
huge, sophisticated, and recurring integrations.
Middleware Integration
The middleware software is used to take data from many
sources, normalize it, and store it in the resulting data
set. When an enterprise needs to integrate data from legacy
systems to modern systems, this technique is used.
Middleware software acts as a translator between legacy and
advanced systems; think of it as an adapter that connects
two systems with different interfaces. It is only applicable
to certain systems.
Application-based integration
It is using software applications to extract, transform, and
load data from disparate sources. This strategy saves time
and effort, but it is a little more complicated because
building such an application necessitates technical
understanding. This strategy saves time and effort, but it
is a little more complicated because building such an
application necessitates technical understanding.
Uniform Access Integration
This method combines data from disparate sources. However,
the data's position is not altered; the data stays in its
original location. This
technique merely generates a unified view of the integrated
data. The integrated data does not need to be stored
separately because the end-user only sees the integrated
view.
Data Warehousing
This technique is loosely related to the uniform access
integration technique, but here the unified view is stored
in a separate location, which enables the data analyst to
handle more sophisticated queries. Although it is a
promising solution, the unified view is a physical copy of
the data and therefore incurs separate storage and
maintenance costs.
The End
