Data Quality for Data Stewards

This document is an overview of Module 10, which covers enforcing data quality with Data Quality Services (DQS) in SQL Server. It introduces data quality, then shows how to use DQS to cleanse and match data. The lessons cover creating a DQS knowledge base, cleansing data by mapping columns to domains, and matching data by defining matching policies and mapping columns. The module also includes demonstrations and labs on cleansing and deduplicating data with a DQS project.

Uploaded by

Richie Poo


Module 10

Enforcing Data Quality

Module Overview

• Introduction to Data Quality
• Using Data Quality Services to Cleanse Data
• Using Data Quality Services to Match Data
Lesson 1: Introduction to Data Quality

• What Is Data Quality and Why Do You Need It?
• Data Quality Services Overview
• What Is a Knowledge Base?
• What Is a Domain?
• What Is a Reference Data Service?
• Creating a Knowledge Base
• Demonstration: Creating a Knowledge Base
What Is Data Quality and Why Do You Need It?

• Business decisions should be made on trusted data
• Data quality issues in sources can be propagated into the data warehouse:
  • Invalid data values
  • Inconsistencies
  • Duplicate business entities
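
The three kinds of issue listed above can be illustrated with a minimal Python sketch. It does not use DQS. The rows, column names, list of valid values, and synonym table are all hypothetical and exist only to make each category concrete.

```python
from collections import Counter

# Hypothetical staged customer rows (illustrative data, not from the course labs).
rows = [
    {"id": 1, "email": "ann@example.com", "country": "United States", "gender": "F"},
    {"id": 2, "email": "bob@example.com", "country": "USA", "gender": "M"},
    {"id": 3, "email": "ann@example.com", "country": "United States", "gender": "F"},
    {"id": 4, "email": "cy@example.com", "country": "Canada", "gender": "X1"},
]

VALID_GENDERS = {"F", "M"}                    # assumed list of valid values
COUNTRY_SYNONYMS = {"USA": "United States"}   # assumed synonym -> preferred value


def invalid_values(rows):
    """Ids of rows whose gender is not in the list of valid values."""
    return [r["id"] for r in rows if r["gender"] not in VALID_GENDERS]


def inconsistencies(rows):
    """Ids of rows that use a synonym instead of the preferred country name."""
    return [r["id"] for r in rows if r["country"] in COUNTRY_SYNONYMS]


def duplicate_entities(rows):
    """Ids of rows that share an email address with another row."""
    counts = Counter(r["email"] for r in rows)
    return [r["id"] for r in rows if counts[r["email"]] > 1]


print(invalid_values(rows), inconsistencies(rows), duplicate_entities(rows))
```

Here row 4 carries an invalid value, row 2 is inconsistent with the other United States rows, and rows 1 and 3 look like the same business entity.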
Data Quality Services Overview

• DQS is a knowledge-based solution for:
  • Data cleansing
  • Data matching
• DQS components:
  • Server
  • Client
  • Data cleansing SSIS transformation
What Is a Knowledge Base?

• Repository of knowledge about data:
  • Domains define values and rules for each field
  • Matching policies define rules for identifying duplicate records
• Determine data for a DQS knowledge base:
  • Analyze source databases and data warehouses for inconsistencies, inaccuracies, and incompleteness
  • Audit website and software forms used for data entry to find free-form fields prone to creating low-quality data
  • Look at dependent reporting systems and find incorrect results
What Is a Domain?

• Domains:
  • Are specific to a data field
  • Contain the rules for the data
  • Can be individual or composite
What Is a Reference Data Service?

• The Azure Marketplace hosts specialist data cleansing providers, where you can:
  • Set up an account
  • Subscribe to a reference service
  • Map your domain to the reference service
Creating a Knowledge Base

• Creating a knowledge base is an iterative process:
  1. Knowledge discovery
  2. Domain management
Demonstration: Creating a Knowledge Base

In this demonstration, you will see how to:

• Create a knowledge base
• Perform knowledge discovery
• Perform domain management
Lesson 2: Using Data Quality Services to Cleanse Data

• Creating a Data Cleansing Project
• Viewing Cleansed Data
• Demonstration: Cleansing Data
• Using the Data Cleansing Data Flow Transformation
Creating a Data Cleansing Project

1. Select a knowledge base
2. Map columns to domains
3. Review suggestions and corrections
4. Export results
Viewing Cleansed Data

• Output: the values for all fields after data cleansing
• Source: the original value for fields that were mapped to domains and cleansed
• Reason: the reason the output value was selected by the cleansing operation
• Confidence: an indication of the confidence Data Quality Services estimates for corrected values
• Status: the status of the output column (correct or corrected)
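
To show how the five output columns relate to each other, here is a small Python sketch that cleanses one value against a domain. It is not the DQS engine. The domain contents, the reason strings, the confidence numbers, and the "Unrecognized" status are assumptions made for illustration; only "Correct" and "Corrected" come from the list above.

```python
# Hypothetical domain for a Country/Region field.
DOMAIN = {
    "valid": {"United States", "Canada"},
    "synonyms": {"USA": "United States", "US": "United States"},
}


def cleanse(value, domain):
    """Return Source, Output, Reason, Confidence and Status for one value."""
    if value in domain["valid"]:
        return {"Source": value, "Output": value,
                "Reason": "Domain value",            # placeholder reason text
                "Confidence": 1.0, "Status": "Correct"}
    if value in domain["synonyms"]:
        return {"Source": value, "Output": domain["synonyms"][value],
                "Reason": "Replaced by synonym",     # placeholder reason text
                "Confidence": 1.0, "Status": "Corrected"}
    # Values the domain does not know are passed through unchanged.
    return {"Source": value, "Output": value,
            "Reason": "Not in domain",               # placeholder reason text
            "Confidence": 0.0, "Status": "Unrecognized"}


for raw in ["Canada", "USA", "Untied States"]:
    print(cleanse(raw, DOMAIN))
```

The Source column keeps the original value, so a data steward can compare it with Output and use Reason and Confidence to decide whether to accept the change.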
Demonstration: Cleansing Data

In this demonstration, you will see how to:

• Create a data cleansing project
• View cleansed data
Using the Data Cleansing Data Flow Transformation

• Input data to be cleansed
• Select knowledge base and map columns to domains
• Output cleansed columns
Lab A: Cleansing Data

• Exercise 1: Creating a DQS Knowledge Base
• Exercise 2: Using a DQS Project to Cleanse Data
• Exercise 3: Using DQS in an SSIS Package

Logon Information
Virtual machine: 20767C-MIA-SQL
User name: ADVENTUREWORKS\Student
Password: Pa55w.rd

Estimated Time: 30 minutes.

Lab Scenario

You have created an ETL solution for the Adventure Works data warehouse, and invited some data stewards to validate the process before putting it into production.
The data stewards have noticed some data quality issues in the staged customer data, and have asked you to provide a way for them to cleanse data, so that the data warehouse is based on consistent and reliable data. The data stewards have given you an Excel workbook containing some examples of the issues found in the data.
Lab Review

Having completed this lab, you will now be able to:

• Create a DQS knowledge base
• Use DQS to cleanse data
• Incorporate data cleansing into an SSIS data flow
Lesson 3: Using Data Quality Services to Match Data

• Creating a Matching Policy
• Creating a Data Matching Project
• Viewing Data Matching Results
• Demonstration: Matching Data
Creating a Matching Policy

• Define matching rules for business entities
• Rules match entities based on domains:
  • Similarity: similar or exact match
  • Weight: percentage to apply if match succeeds
  • Prerequisite: mandatory domain match for rule to succeed
• If the combined weight of all matches meets or exceeds the rule's minimum matching score, the entities are duplicates
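
The scoring logic described above can be sketched in Python as follows. This is a simplified model, not the algorithm DQS actually uses: it assumes a weight is added in full whenever a domain matches, it uses the standard library's `difflib.SequenceMatcher` with an arbitrary 0.8 cut-off to stand in for "similar", and the rule, domain names, and records are hypothetical.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # assumption: cut-off for treating two values as "similar"


def domain_matches(a, b, similarity):
    """Compare two values using an exact or a similar match."""
    if similarity == "exact":
        return a == b
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= SIMILARITY_THRESHOLD


def rule_score(rec_a, rec_b, rule):
    """Return the combined weight, or None if a prerequisite domain fails."""
    score = 0
    for domain, spec in rule["domains"].items():
        matched = domain_matches(rec_a[domain], rec_b[domain], spec["similarity"])
        if spec.get("prerequisite"):
            if not matched:
                return None   # mandatory match failed, so the rule fails
            continue          # prerequisites carry no weight in this sketch
        if matched:
            score += spec["weight"]
    return score


def is_duplicate(rec_a, rec_b, rule):
    """Entities are duplicates if the score reaches the rule's minimum."""
    score = rule_score(rec_a, rec_b, rule)
    return score is not None and score >= rule["min_score"]


# Hypothetical rule and records.
RULE = {
    "min_score": 80,
    "domains": {
        "Country": {"similarity": "exact", "prerequisite": True},
        "Email": {"similarity": "exact", "weight": 60},
        "Name": {"similarity": "similar", "weight": 40},
    },
}
a = {"Country": "US", "Email": "j@example.com", "Name": "Jon Smith"}
b = {"Country": "US", "Email": "j@example.com", "Name": "John Smith"}
c = {"Country": "CA", "Email": "j@example.com", "Name": "Jon Smith"}

print(is_duplicate(a, b, RULE), is_duplicate(a, c, RULE))
```

Records a and b pass the prerequisite and match on both weighted domains, so they count as duplicates. Record c fails the Country prerequisite, so the rule fails regardless of the other domains.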
Creating a Data Matching Project

1. Select a knowledge base
2. Map columns to domains
3. Review match clusters
4. Export matches and survivors
   • Select survivorship rule:
     • Pivot record
     • Most complete and longest record
     • Most complete record
     • Longest record
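
The four survivorship rules can be compared with a short Python sketch. It rests on assumptions that the slide does not state: "complete" is taken to mean the number of non-empty fields, "longest" the total number of characters, and the combined rule ranks by completeness first and length second. The cluster and the rule names used as strings are hypothetical.

```python
def completeness(record):
    """Number of non-empty fields (assumed meaning of 'complete')."""
    return sum(1 for v in record["fields"].values() if v)


def length(record):
    """Total characters across all fields (assumed meaning of 'longest')."""
    return sum(len(v) for v in record["fields"].values())


def pick_survivor(cluster, rule):
    """Choose the record that survives from a cluster of matched records."""
    if rule == "pivot":
        return next(r for r in cluster if r["pivot"])
    if rule == "most_complete":
        return max(cluster, key=completeness)
    if rule == "longest":
        return max(cluster, key=length)
    if rule == "most_complete_and_longest":
        # Assumption: completeness first, length as the tie-breaker.
        return max(cluster, key=lambda r: (completeness(r), length(r)))
    raise ValueError(f"unknown survivorship rule: {rule}")


# Hypothetical cluster of three matched records.
cluster = [
    {"id": 1, "pivot": True,
     "fields": {"FirstName": "Jon", "City": "", "AddressLine1": "1 Main St"}},
    {"id": 2, "pivot": False,
     "fields": {"FirstName": "Jonathan", "City": "", "AddressLine1": "1 Main Street"}},
    {"id": 3, "pivot": False,
     "fields": {"FirstName": "Jon", "City": "Rome", "AddressLine1": "1 Main"}},
]

for rule in ["pivot", "most_complete", "longest", "most_complete_and_longest"]:
    print(rule, pick_survivor(cluster, rule)["id"])
```

With this data the rules pick different survivors: record 1 is the pivot, record 3 has the most non-empty fields, and record 2 has the most characters, which is why the choice of rule matters when exporting.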
Viewing Data Matching Results

• Cluster ID: identifier for a cluster of matched records
• Record ID: identifier for a matched record
• Matching Rule: the rule that produced the match
• Score: combined weighting of match criteria
• Pivot Mark: a matched record arbitrarily chosen by Data Quality Services as the pivot record for a cluster
Demonstration: Matching Data

In this demonstration, you will see how to:

• Create a matching policy
• Create a data matching project
• View data matching results
Lab B: Deduplicating Data

• Exercise 1: Creating a Matching Policy
• Exercise 2: Using a DQS Project to Match Data

Logon Information
Virtual machine: 20767C-MIA-SQL
User name: ADVENTUREWORKS\Student
Password: Pa55w.rd

Estimated Time: 30 minutes

Lab Scenario

You have created a DQS knowledge base and used it to cleanse customer data. However, data stewards are concerned that the staged customer data might include duplicate entries. For records to be considered a match, the following criteria must be true:
• The Country/Region column must be an exact match.
• A total matching score of 80 or higher must be achieved, based on the following weightings:
  o An exact match of the Gender column has a weighting of 10.
  o An exact match of the City column has a weighting of 20.
  o An exact match of the EmailAddress column has a weighting of 30.
  o A similar FirstName column value has a weighting of 10.
  o A similar LastName column value has a weighting of 10.
  o A similar AddressLine1 column value has a weighting of 20.
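
As a worked check of the arithmetic in this scenario, the following Python sketch lists which combinations of matching columns reach the score of 80. It assumes a simplified model in which each weighting is added in full when its column matches, and that the Country/Region prerequisite has already been satisfied. The function name is illustrative.

```python
from itertools import combinations

# Weightings from the lab scenario.
WEIGHTS = {
    "Gender": 10,
    "City": 20,
    "EmailAddress": 30,
    "FirstName": 10,
    "LastName": 10,
    "AddressLine1": 20,
}
MIN_SCORE = 80


def qualifying_combinations(weights, min_score):
    """Return every set of matched columns whose combined weight reaches min_score."""
    columns = list(weights)
    result = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            if sum(weights[c] for c in combo) >= min_score:
                result.append(combo)
    return result


print("Total weight:", sum(WEIGHTS.values()))
for combo in qualifying_combinations(WEIGHTS, MIN_SCORE):
    print(sum(WEIGHTS[c] for c in combo), combo)
```

The weightings add up to 100, so at most 20 points of weight can be missing. Under this model EmailAddress, with a weighting of 30, has to match in every qualifying combination.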
Lab Review

Having completed this lab, you will now be able to:

• Add a matching policy to a DQS knowledge base
• Use DQS to match data
Module Review and Takeaways

• Review Question(s)
