Data Quality For Data Lakes - Reference Architecture

This document discusses steps to enhance data quality in a data lake through an iterative process. It recommends taking a collaborative approach across various data users such as data engineers, data scientists and data analysts. The key steps include profiling data to understand anomalies, building rules to validate data quality, measuring initial metrics to establish a baseline, standardizing data through dictionaries, cleansing data using rules, handling exceptions, and measuring final metrics to establish trust in the published data.

Uploaded by

eslam.aeroman

Data Quality for Data Lakes

Reference Architecture

Avoid creating a data swamp by taking logical steps to enhance data quality in the data lake. The iterative process ensures gradual improvement in the quality of data during data engineering. A collaborative approach across data users such as data engineers, data scientists, and data analysts is key to success.

[Figure: Informatica Data Quality reference architecture. Data from streaming sources (machine, IoT, apps, log files, social, mobile), on-premises systems (mainframe, application servers, databases, data warehouse, documents, Hadoop), and SaaS applications (ERP, CRM) is ingested into the data lake. Steps 1-3 (Profile, Build Rules, Measure Initial) sit in the Landing Zone, steps 4-6 (Set Dictionaries, Cleanse and Harmonize, Handle Exceptions) in the Enrichment Zone, and steps 7-8 (Measure Final, Publish Certified Data) in the Enterprise Zone.]

1. Profile helps understand data anomalies and discover data patterns.

2. Build Rules to validate whether data is fit for business needs.

3. Measure Initial KPIs to establish a baseline on the quality of data and establish historical trends.

4. Set Dictionaries to help standardize data across multiple systems.

5. Cleanse Data using business rules to help improve analytics and reduce time spent on data remediation.

6. Handle Exceptions as part of your daily load process. Automate correction of data as much as possible and involve data owners.

7. Measure Final KPIs at the consumption layer to establish trust in the data being published for consumption.

8. Certified Data is the process of validating that the data is ready for business consumption; it also provides a mechanism to provision it.
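
The eight steps above can be illustrated with a minimal sketch in plain Python. This is not Informatica Data Quality's API; it only mirrors the flow of the steps. The sample records, the `COUNTRY_DICT` dictionary, the two rules, and every function name here are hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical records as they might arrive in the landing zone.
RAW = [
    {"id": 1, "country": "USA", "email": "ann@example.com"},
    {"id": 2, "country": "U.S.", "email": "bob@example"},
    {"id": 3, "country": "Germany", "email": None},
    {"id": 4, "country": "deutschland", "email": "eva@example.com"},
]

# Step 4: a dictionary that maps source spellings to one standard code.
COUNTRY_DICT = {"usa": "US", "u.s.": "US", "germany": "DE", "deutschland": "DE"}
STANDARD_COUNTRIES = set(COUNTRY_DICT.values())


def profile(rows, column):
    """Step 1: count nulls and distinct values to expose anomalies and patterns."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    return {"nulls": len(values) - len(present), "distinct": Counter(present)}


def valid_email(row):
    """Step 2 rule: the email has a local part and a dotted domain."""
    email = row["email"]
    if email is None:
        return False
    local, _, domain = email.partition("@")
    return bool(local) and "." in domain


def valid_country(row):
    """Step 2 rule: the country is already a standard code."""
    return row["country"] in STANDARD_COUNTRIES


RULES = {"email_valid": valid_email, "country_standard": valid_country}


def measure(rows):
    """Steps 3 and 7: the KPI is the share of rows that pass each rule."""
    return {name: sum(rule(r) for r in rows) / len(rows) for name, rule in RULES.items()}


def cleanse(rows):
    """Step 5: standardize country values through the dictionary."""
    cleaned = []
    for row in rows:
        key = row["country"].strip().lower()
        cleaned.append({**row, "country": COUNTRY_DICT.get(key, row["country"])})
    return cleaned


def split_exceptions(rows):
    """Steps 6 and 8: route failing rows to an exception queue, certify the rest."""
    certified, exceptions = [], []
    for row in rows:
        failed = [name for name, rule in RULES.items() if not rule(row)]
        if failed:
            exceptions.append({**row, "failed_rules": failed})
        else:
            certified.append(row)
    return certified, exceptions


baseline = measure(RAW)                      # step 3: initial KPIs
cleaned = cleanse(RAW)                       # steps 4 and 5
certified, exceptions = split_exceptions(cleaned)  # steps 6 and 8
final = measure(cleaned)                     # step 7: final KPIs
print(profile(RAW, "country"))
print(baseline, final)
print(len(certified), "certified;", len(exceptions), "sent to data owners")
```

Comparing `baseline` with `final` is what turns the process into an iterative one: a rule whose pass rate does not improve after cleansing points to a missing dictionary entry or to exceptions that need a data owner rather than automation.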
