ETL Testing Topics

The document discusses Extract, Transform, Load (ETL) testing concepts. It covers what ETL is, the ETL process, who is involved in ETL testing, why ETL testing is performed, how ETL testing works, common ETL testing tasks such as data extraction, transformation, loading and validation, ETL testing categories, tools used for ETL testing, and the importance of testing the ETL process to ensure accurate data warehousing and reporting. It also briefly discusses SQL concepts, advantages of manual testing, data warehouse testing, and the ETL testing process.

ETL Testing Concepts:

 ETL Testing - Introduction
 ETL Testing - Tasks
 ETL vs Database Testing
 ETL Testing - Categories
 ETL Testing - Challenges
 ETL - Tester's Roles
 ETL Testing - Techniques
 ETL Testing - Process
 ETL Testing - Scenarios (Test Cases)
 ETL Testing - Performance
 ETL Testing - Scalability
 ETL Testing - Data Accuracy
 ETL Testing - Metadata
 ETL Testing - Data Transformations
 ETL Testing - Data Quality
 ETL Testing - Data Completeness
 ETL Testing - Backup Recovery
 ETL Testing - Automation (Informatica, ICEDQ)
SQL Concepts:

 SQL - Overview
 SQL - RDBMS Concepts
 SQL - Databases
 SQL - Syntax
 SQL - Data Types
 SQL - Operators
 SQL - Expressions
 SQL - Create Database
 SQL - Drop Database
 SQL - Select Database
 SQL - Create Table
 SQL - Drop Table
 SQL - Insert Query
 SQL - Select Query
 SQL - Where Clause
 SQL - AND & OR Clauses
 SQL - Update Query
 SQL - Delete Query
 SQL - Like Clause
 SQL - Top Clause
 SQL - Order By
 SQL - Group By
 SQL - Distinct Keyword
 SQL - Sorting Results

 Advanced SQL

 SQL - Constraints
 SQL - Using Joins
 SQL - Unions Clause
 SQL - NULL Values
 SQL - Alias Syntax
 SQL - Indexes
 SQL - Alter Command
 SQL - Truncate Table
 SQL - Using Views
 SQL - Having Clause
 SQL - Transactions
 SQL - Wildcards
 SQL - Date Functions
 SQL - Temporary Tables
 SQL - Clone Tables
 SQL - Sub Queries
 SQL - Using Sequences
 SQL - Handling Duplicates

Advantages of Manual Testing

1. No Environment Limitations

(UFT/QTP doesn't support the Linux operating environment; Selenium doesn't support desktop application test automation.)

 Programming knowledge is not required.

4. Recommendable for Usability Testing.

(Only human users can judge the look & feel aspects; we cannot check the user-friendliness of the system (AUT) using test tools.)

5. Manual testing allows for human observation, which may be more useful for finding potential defects.

(In manual testing, the user/tester interacts more with the AUT, so it is easy to find defects, and the tester can provide suggestions to the development team.)

Advantages and Disadvantages of Manual Testing


2. Disadvantages of Manual Testing

1. Manual testing requires more time, more resources, or sometimes both.

(Covering all areas of the application requires more tests; creating all possible test cases and executing them takes more time. With test automation, the test tool can execute tests quickly.)

2. Less accuracy.

3. Performance testing is impractical in manual testing.

4. Comparing a large amount of data is impractical.

(Comparing two databases that have thousands of records is impractical manually, but it is very easy with test automation.)

5. Processing change requests during software maintenance takes more time.

6. Batch testing is possible, but human user interaction is mandatory for each test execution.

7. Differences in GUI object sizes, color combinations, etc. are not easy to find in manual testing.

8. The scope of a manual test case is very limited; an automated test can have a wider scope.

(In manual testing, test case scope is limited because the tester can concentrate on only one or two verification points at a time. With test automation, the test tool can check multiple verification points at a time.)

9. Executing the same tests again and again is time-consuming as well as tedious.

(Sometimes we need to execute the same tests using multiple sets of test data, and each test iteration requires user interaction. In test automation, using a test data file (a text file, Excel file, or database file) we can easily conduct data-driven testing.)

10. For every release you must rerun the same set of tests, which can be tiresome.

(We need to execute sanity and regression test cases on every modified build, which takes more time. In test automation, once we create the tests, the tool can execute them multiple times quickly.)
Data Warehouse Testing
Data Warehouse Testing is a testing method in which the data inside a data warehouse is tested for integrity, reliability,
accuracy and consistency in order to comply with the company’s data framework. The main purpose of data warehouse
testing is to ensure that the integrated data inside the data warehouse is reliable enough for a company to make
decisions on.

Enterprise resource planning (ERP) refers to a type of software that organizations use to manage day-to-day business
activities such as accounting, procurement, project management, risk management and compliance, and supply chain
operations.
Customer relationship management is a process in which a business or other organization administers its interactions with
customers, typically using data analysis to study large amounts of information.

A data mart is a simple form of a data warehouse that is focused on a single subject or line of business, such as sales,
finance, or marketing. 

ETL Testing Process


Like other testing processes, ETL testing also goes through different phases. The different phases of the ETL testing process are as follows:
CONTENTS
 What does ETL stand for?
 Why perform an ETL?
 What is a Data Warehouse?
 What is Big Data?
 Why deal with all this data?
 What is Business Intelligence software?
 Typical Data Architecture
 How does the ETL process work?
 Who is involved in the ETL process?
 Why perform ETL Testing?
 How does ETL Testing work?
 What is Sampling?
 What is a Minus Query?
 Tools, Utilities and Frameworks
 Commercial Software
 The Bottom Line

What does ETL stand for?

ETL = Extract, Transform, Load

According to Wikipedia, "Extract, Transform, Load (ETL) is the general procedure of copying data from one or more data sources into a destination system which represents the data differently from the source(s). The ETL process is often used in data warehousing."
 Data extraction involves extracting data from homogeneous or heterogeneous sources
 Data transformation processes data by cleaning and transforming them into a proper storage format/structure
for the purposes of querying and analysis
 Data loading describes the insertion of data into the target data store, data mart, data lake or data warehouse.
Who is involved in the ETL process?

There are at least 4 roles involved. They are: 

Data Analyst: Creates data requirements (source-to-target map or mapping doc) 

Data Architect: Models and builds data store (Big Data lake, Data Warehouse, Data Mart, etc.) 

ETL Developer: Transforms and loads data from sources to target data stores 

ETL Tester: Validates the data, based on mappings, as it moves and transforms from sources to targets

The image on the right shows the intertwined roles, tasks and timelines for performing ETL Testing with the sampling
method. 

Why perform ETL Testing?


Bad data caused by defects in the ETL process can cause data problems in reporting that can result in poor strategic
decision-making.

ETL Testing – Tasks to be Performed

Here is a list of the common tasks involved in ETL Testing −

 Understand the data to be used for reporting

 Source to target mapping

 Data checks on source data

 Packages and schema validation


 Data verification in the target system

 Verification of data transformation calculations and aggregation rules

 Sample data comparison between the source and the target system

 Data integrity and quality checks in the target system

 Performance testing on data

 Both ETL testing and database testing involve data validation, but they are not the same. ETL testing is normally
performed on data in a data warehouse system, whereas database testing is commonly performed on
transactional systems where the data comes from different applications into the transactional database.
 Here, we have highlighted the major differences between ETL testing and Database testing.

ETL testing is categorized based on the objectives of testing and reporting. Testing categories vary as per organization standards and also depend on client requirements. Generally, ETL testing is categorized based on the following points −

 Source to Target Count Testing − It involves matching of count of records in the source and the target
systems.
 Source to Target Data Testing − It involves data validation between the source and the target systems.
It also involves duplicate data check in the target system.
 Data Mapping or Transformation Testing − It confirms the mapping of objects in the source and the
target systems. It also involves checking the functionality of data in the target system.
 End-User Testing − It involves generating reports for end-users to verify whether the data in the reports is as per expectation. It involves finding deviations in reports and cross-checking the data in the target system for report validation.
 Retesting − It involves fixing the bugs and defects in data in the target system and running the reports
again for data validation.
 System Integration Testing − It involves testing all the individual systems, and later combine the results
to find if there are any deviations. There are three approaches that can be used to perform this: top-
down, bottom-up, and hybrid.
Based on the structure of a Data Warehouse system, ETL testing (irrespective of the tool that is used) can be divided into
the following categories −

New DW System Testing

In this type of testing, a new DW system is built and verified. Data inputs are taken from customers/end-users as well as from different data sources, and a new data warehouse is created. Later, the data is verified in the new system with the help of ETL tools.

Migration Testing

In migration testing, customers have an existing Data Warehouse and ETL, but they look for a new ETL tool to improve the
efficiency. It involves migration of data from the existing system using a new ETL tool.

Change Testing

In change testing, new data is added from different data sources to an existing system. Customers can also change the
existing rules for ETL or a new rule can also be added.

Report Testing

Report testing involves creating reports for data validation. Reports are the final output of any DW system. Reports are
tested based on their layout, data in the report, and calculated values.

ETL Testing

ETL testing involves the following operations −


 Validation of data movement from the source to the target system.
 Verification of data count in the source and the target system.
 Verifying data extraction, transformation as per requirement and expectation.
 Verifying if table relations − joins and keys − are preserved during the transformation.
Common ETL testing tools include QuerySurge, Informatica, etc.

Database Testing

Database testing stresses more on data accuracy, correctness of data and valid values. It involves the following operations

 Verifying if primary and foreign keys are maintained.


 Verifying if the columns in a table have valid data values.
 Verifying data accuracy in columns. Example − Number of months column shouldn’t have a value
greater than 12.
 Verifying missing data in columns. Check if there are null columns which actually should have a valid
value.
Common database testing tools include Selenium, QTP, etc.
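
As a minimal sketch of how such database checks might be written in plain SQL (the Customer, Orders and Subscription tables and their columns are assumptions for illustration, not part of any particular schema):

-- Referential integrity: find Orders rows whose Cust_Id has no matching Customer row (orphan foreign keys)
SELECT o.Order_Id, o.Cust_Id
FROM Orders o
LEFT JOIN Customer c ON c.Cust_Id = o.Cust_Id
WHERE c.Cust_Id IS NULL;

-- Valid values: a "number of months" column should never be outside 1..12
SELECT *
FROM Subscription
WHERE Num_Months < 1 OR Num_Months > 12;

-- Missing data: columns that should always hold a value must not be NULL
SELECT *
FROM Customer
WHERE Cust_NAME IS NULL;

Any rows returned by these queries indicate defects to be reported.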

The following table captures the key features of Database and ETL testing and their comparison −

Function | Database Testing | ETL Testing

Primary Goal | Data validation and integration | Data extraction, transformation and loading for BI reporting

Applicable System | Transactional system where the business flow occurs | System containing historical data, not in a business flow environment

Common tools | QTP, Selenium, etc. | QuerySurge, Informatica, etc.

Business Need | It is used to integrate data from multiple applications; severe impact. | It is used for analytical reporting, information and forecasting.

Database Type | It is normally used in OLTP systems | It is applied to OLAP systems

Data Type | Normalized data with more joins | De-normalized data with fewer joins, more indexes and aggregations
ETL Testing – Challenges

ETL testing is different from database testing or any other conventional testing. One may have to face different types of challenges while performing ETL testing. A few common challenges are listed here −

 Data loss during the ETL process.


 Incorrect, incomplete or duplicate data.
 DW system contains historical data, so the data volume is too large and extremely complex to perform
ETL testing in the target system.
 ETL testers are normally not provided with access to see job schedules in the ETL tool. They hardly have
access to BI Reporting tools to see the final layout of reports and data inside the reports.
 Tough to generate and build test cases, as data volume is too high and complex.
 ETL testers normally don’t have an idea of end-user report requirements and business flow of the
information.
 ETL testing involves various complex SQL concepts for data validation in the target system.
 Sometimes the testers are not provided with the source-to-target mapping information.
 An unstable testing environment delays the development and testing process.
ETL Testing – Techniques
It is important that you define the correct ETL testing technique before starting the testing process. You should obtain acceptance from all the stakeholders and ensure that a correct technique is selected to perform ETL testing. This technique should be well known to the testing team, and they should be aware of the steps involved in the testing process.
There are various types of testing techniques that can be used. In this chapter, we will discuss the testing techniques in brief.

Production Validation Testing

To perform analytical reporting and analysis, the data in your production systems should be correct. This testing is done on the data that is moved to the production system. It involves data validation in the production system and comparing it with the source data.

Source-to-target Count Testing

This type of testing is done when the tester has less time to perform the testing operation. It involves checking the count of data in the source and the target systems. It doesn't involve checking the values of the data in the target system, nor whether the data is in ascending or descending order after the data has been mapped.
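
For example, a count check can be written as two simple queries whose results are then compared (source_db and target_dw are placeholder schema names used only for this sketch):

-- Record count in the source system
SELECT COUNT(*) AS src_count FROM source_db.Customer;

-- Record count in the target system; the two numbers should match
SELECT COUNT(*) AS tgt_count FROM target_dw.Dim_Customer;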

Source-to-target Data Testing

In this type of testing, a tester validates data values from the source to the target system. It checks the data values in the
source system and the corresponding values in the target system after transformation. This type of testing is time-consuming
and is normally performed in financial and banking projects.

Data Integration / Threshold Value Validation Testing

In this type of testing, a tester validates the range of data. All the threshold values in the target system are checked to confirm they are as per the expected result. It also involves integration of data in the target system from multiple source systems after transformation and loading.
Example − Age attribute shouldn’t have a value greater than 100. In the date column DD/MM/YY, the month field
shouldn’t have a value greater than 12.

Application Migration Testing

Application migration testing is normally performed automatically when you move from an old application to a new application system. This testing saves a lot of time. It checks whether the data extracted from the old application is the same as the data in the new application system.

Data Check and Constraint Testing

It includes performing various checks such as data type check, data length check, and index check. Here a test engineer checks the following constraints − Primary Key, Foreign Key, NOT NULL, NULL, and UNIQUE.

Duplicate Data Check Testing

This testing involves checking for duplicate data in the target system. When there is a huge amount of data in the target
system, it is possible that there is duplicate data in the production system that may result in incorrect data in Analytical
Reports.
Duplicate values can be checked with a SQL statement like −

SELECT Cust_Id, Cust_NAME, Quantity, COUNT(*)
FROM Customer
GROUP BY Cust_Id, Cust_NAME, Quantity
HAVING COUNT(*) > 1;
Duplicate data appears in the target system due to the following reasons −

 If no primary key is defined, then duplicate values may occur.

 Due to incorrect mapping or environmental issues.


 Manual errors while transferring data from the source to the target system.

Data Transformation Testing

Data transformation testing is not performed by running a single SQL statement. It is time-consuming and involves running
multiple SQL queries for each row to verify the transformation rules. The tester needs to run SQL queries for each row and
then compare the output with the target data.
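
As an illustration, suppose the mapping document says the target Full_Name is built by concatenating the source First_Name and Last_Name (this rule, and the table names, are assumptions made only for this sketch). The transformation can then be checked by rebuilding the expected value from the source and comparing it with what was loaded:

-- Rows where the loaded value does not match the expected transformation result
SELECT s.Cust_Id,
       s.First_Name || ' ' || s.Last_Name AS expected_full_name,
       t.Full_Name AS loaded_full_name
FROM source_db.Customer s
JOIN target_dw.Dim_Customer t ON t.Cust_Id = s.Cust_Id
WHERE t.Full_Name <> s.First_Name || ' ' || s.Last_Name;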

Data Quality Testing

Data quality testing involves performing number check, date check, precision check, etc. A tester performs Syntax Test to
report invalid characters, incorrect upper/lower case order, etc. and Reference Tests to check if the data is according to the
data model.
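
A minimal sketch of a syntax test in SQL (the column names, the allowed character set and the upper-case rule are assumptions for illustration):

-- Invalid characters: names should contain only letters, spaces, hyphens and apostrophes
-- (the bracketed character-class pattern is SQL Server LIKE syntax; other databases would use a REGEXP-style predicate)
SELECT Cust_Id, Cust_NAME
FROM target_dw.Dim_Customer
WHERE Cust_NAME LIKE '%[^A-Za-z ''-]%';

-- Incorrect case: country codes are expected to be stored in upper case
SELECT Cust_Id, Country_Code
FROM target_dw.Dim_Customer
WHERE Country_Code <> UPPER(Country_Code);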

Incremental Testing

Incremental testing is performed to verify if Insert and Update statements are executed as per the expected result. This
testing is performed step-by-step with old and new data.
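
A hedged sketch of an incremental check, assuming the source and target tables carry audit columns such as Created_Date, Modified_Date and Update_Date (these column names and the load-window date are assumptions):

-- New source rows from the latest load window that are missing from the target (insert not applied)
SELECT s.Cust_Id
FROM source_db.Customer s
LEFT JOIN target_dw.Dim_Customer t ON t.Cust_Id = s.Cust_Id
WHERE s.Created_Date >= DATE '2024-01-01'
  AND t.Cust_Id IS NULL;

-- Source rows changed in the load window whose updates were not applied in the target
SELECT s.Cust_Id
FROM source_db.Customer s
JOIN target_dw.Dim_Customer t ON t.Cust_Id = s.Cust_Id
WHERE s.Modified_Date >= DATE '2024-01-01'
  AND t.Update_Date < s.Modified_Date;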

Regression Testing

When we make changes to data transformation and aggregation rules to add new functionality, re-running the existing tests to check that these changes have not introduced new errors is called Regression Testing. The defects in data that appear during regression testing are called regressions.

Retesting

When you re-run the failed tests after the defects have been fixed, it is called retesting.

System Integration Testing

System integration testing involves testing the components of a system individually and later integrating the modules. There
are three ways a system integration can be done: top-down, bottom-up, and hybrid.

Navigation Testing

Navigation testing is also known as testing the front-end of the system. It involves testing from the end-user's point of view by checking all aspects of the front-end report, including data in the various fields, calculations and aggregates, etc.

ETL Testing – Scenarios

Test Scenario: Structure Validation

 It involves validating the source and the target table structure as per the mapping document.
 Data types should be validated in the source and the target systems.
 The length of data types in the source and the target system should be the same.
 Data field types and their format should be the same in the source and the target system.
 Validating the column names in the target system.

Test Scenario: Validating Mapping Document

 It involves validating the mapping document to ensure all the information has been provided. The mapping document should have a change log, and maintain data types, length, transformation rules, etc.

Test Scenario: Validate Constraints

 It involves validating the constraints and ensuring that they are applied on the expected tables.

Test Scenario: Data Consistency Check

 It involves checking the misuse of integrity constraints like Foreign Key.
 The length and data type of an attribute may vary in different tables, though their definition remains the same at the semantic layer.

Test Scenario: Data Completeness Validation

 It involves checking that all the data is loaded into the target system from the source system.
 Counting the number of records in the source and the target systems.
 Boundary value analysis.
 Validating the unique values of primary keys.

Test Scenario: Data Correctness Validation

 It involves validating the values of data in the target system.
 Misspelled or inaccurate data is found in the table.
 Null or non-unique data is stored when integrity constraints are disabled at the time of import.

Test Scenario: Data Transform Validation

 It involves creating a spreadsheet of scenarios for input values and expected results and then validating with end-users.
 Validating the parent-child relationship in the data by creating scenarios.
 Using data profiling to compare the range of values in each field.
 Validating whether the data types in the warehouse are the same as mentioned in the data model.

Test Scenario: Data Quality Validation

 It involves performing number check, date check, precision check, data check, null check, etc.
 Example − the date format should be the same for all the values.

Test Scenario: Null Validation

 It involves checking the null values where Not Null is mentioned for that field.

Test Scenario: Duplicate Validation

 It involves validating duplicate values in the target system when data is coming from multiple columns from the source system.
 Validating primary keys and other columns for any duplicate values as per the business requirement.

Test Scenario: Date Validation Check

 Validating the date field for the various actions performed in the ETL process.
 Common test cases to perform date validation −
 From_Date should not be greater than To_Date.
 The format of date values should be proper.
 Date values should not have any junk values or null values.

Test Scenario: Full Data Validation (Minus Query)

 It involves validating the full data set in the source and the target tables by using a minus query.
 You need to perform both source minus target and target minus source.
 If the minus query returns a value, that should be considered as mismatching rows.
 You need to match the rows in source and target using the Intersect statement.
 The count returned by Intersect should match the individual counts of the source and target tables.
 If the minus query returns no rows and the count of the intersect is less than the source count or the target table count, then the table holds duplicate rows.

Test Scenario: Other Test Scenarios

 Other test scenarios can be to verify that the extraction process did not extract duplicate data from the source system.
 The testing team will maintain a list of SQL statements that are run to validate that no duplicate data has been extracted from the source systems.

Test Scenario: Data Cleaning

 Unwanted data should be removed before loading the data to the staging area.


How does ETL Testing work?

ETL Testing is a way to perform validation of the data as it moves from one data store to another. The ETL Tester uses
a Mapping Document (if one exists), which is also known as a source-to-target map. This is the critical element required to
efficiently plan the target Data Stores. It also defines the Extract, Transform and Load (ETL) process.

The intention is to capture business rules, data flow mapping and data movement requirements. The Mapping Document
specifies source input definition, target/output details and business & data transformation rules.

The typical process for ETL Testing is as follows:

1) Review the Schema and Business Rules / Mappings

Schemas are ways in which data is organized within a database or data warehouse.

Business Rules are also known as Mappings or Source-to-Target mappings and are typically found in a Mapping Document.
The mapping tables in the document are the requirements or rules for extracting, transforming (if at all) and loading (ETL)
data from the source database and files into the target data warehouse or big data store. Specifically, the mapping fields
show:
 Table names, field names, data types and length of both source and target fields
 How source tables / files should be joined in the new target data set
 Any transformation logic that will be applied
 Any business rules that will be applied

2) Create Test Cases

Each mapping will typically have its own test case. A test case will typically have two sets of SQL queries (or HQL for Hadoop). One query will extract data from the sources (flat files, databases, XML, web services, etc.) and the other query will extract data from the target (data warehouses or big data stores).
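
As an illustration of such a query pair (the schema and table names are assumptions for this sketch): one query runs against the source, the other against the target, and the two result sets are then compared row by row.

-- Query 1: extract from the source system
SELECT Cust_Id, First_Name, Last_Name, Status_Code
FROM source_db.Customer
ORDER BY Cust_Id;

-- Query 2: extract the corresponding rows from the target data warehouse
SELECT Cust_Id, First_Name, Last_Name, Status_Code
FROM target_dw.Dim_Customer
ORDER BY Cust_Id;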


Extract data from source data stores (left); extract data from the target Data Warehouse (right).
3) Execute Tests, Export Results

These tests are typically executed using a SQL editor such as Toad, SQuirrel, DBeaver, or any other favorite editor. The test
results from the 2 queries are saved into 2 Excel spreadsheets.

4) Compare Results

Compare all result sets in the source spreadsheet with the target spreadsheet by visual inspection (also known as "Stare & Compare"). There will be a lot of scrolling to the right to compare dozens, if not hundreds, of columns, and a lot of scrolling down to compare tens of thousands or even millions of rows.

There are 4 different methods for performing ETL testing: 


 Sampling
 Minus Queries
 Using home grown tools, utilities and frameworks
 Using commercial software like QuerySurge

What is a Minus Query?

A Minus Query is a query that uses the MINUS operator in SQL to subtract one result set from another result set to
evaluate the result set difference. If there is no difference, there is no remaining result set. If there is a difference, the
resulting rows will be displayed.
The way to test using Minus Queries is to perform source-minus-target and target-minus-source queries for all data, making
sure the extraction process did not provide duplicate data in the source and all unnecessary columns are removed before
loading the data for validation.
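
A minimal sketch of the two directional minus queries (MINUS is Oracle/Teradata syntax; databases such as SQL Server and PostgreSQL use EXCEPT instead; the table names are illustrative):

-- Source-minus-target: rows present in the source but missing or different in the target
SELECT Cust_Id, First_Name, Last_Name FROM source_db.Customer
MINUS
SELECT Cust_Id, First_Name, Last_Name FROM target_dw.Dim_Customer;

-- Target-minus-source: rows present in the target but not in the source
SELECT Cust_Id, First_Name, Last_Name FROM target_dw.Dim_Customer
MINUS
SELECT Cust_Id, First_Name, Last_Name FROM source_db.Customer;

If both queries return no rows, the two data sets match; any rows returned are mismatches.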

Some issues we found are: 


 Minus Queries are processed on either the source or the target database, which can draw significantly on
database resources (CPU, memory, and hard drive read/write)
 In the standard minus query implementation, minus queries need to be executed twice (Source-to-Target and
Target-to-Source). This doubles execution time and resource utilization 
 If directional minus queries are combined via a Union (a union reduces the number of queries executed by half),
information about which side the extra rows are on can be lost 
 Result sets may be inaccurate when duplicate rows of data exist (the minus query may only return 1 row even if
there are duplicates) 

Commercial ETL Testing solutions pull data from data sources and from data targets and quickly compare them.

The ETL testing process mimics the ETL development process by testing data from point-to-point along the data lifecycle.
During transformation, data goes through the following sub-processes

 Cleaning (removing or correcting erroneous or inconsistent data to improve data quality)
 Filtering (selection of relevant rows or columns)
 Joining (linking relevant data from multiple sources)
 Sorting (sorting into the desired order)
 Splitting (splitting data into columns)
 Deduplication (finding and removing duplicate records)
 Summarization (data is collected and stored in a summarized format, for example total sales in a particular year)
 Data validation (rejecting data that is missing a required value or does not match the predefined format)
 Derivation (business rules are applied to the data and checked for validity; if found incorrect, the data is returned to the source)

Once the data is transformed by applying all the above methods, it becomes consistent and ready to load.
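
As a small sketch of two of these sub-processes expressed in SQL (the staging table and column names are assumptions): summarization with GROUP BY, and deduplication by keeping one row per business key:

-- Summarization: total sales per year
SELECT Sales_Year, SUM(Sale_Amount) AS total_sales
FROM staging.Sales
GROUP BY Sales_Year;

-- Deduplication: keep only the latest row per Cust_Id (assumes window functions are available)
SELECT Cust_Id, Cust_NAME, Load_Date
FROM (
    SELECT Cust_Id, Cust_NAME, Load_Date,
           ROW_NUMBER() OVER (PARTITION BY Cust_Id ORDER BY Load_Date DESC) AS rn
    FROM staging.Customer
) ranked
WHERE rn = 1;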

Types of ETL Testing

Production Validation Testing
Also called "table balancing" or "production reconciliation", this type of ETL testing is done on data as it is being moved into production systems. To support your business decisions, the data in your production systems has to be in the correct order. Informatica Data Validation provides ETL testing automation and management capabilities to ensure that production systems are not compromised.

Source to Target Testing (Validation Testing)
This type of testing is carried out to validate whether the data values after transformation are the expected data values.

Application Upgrades
This type of ETL testing can be generated automatically, saving substantial test development time. It checks whether the data extracted from an older application or repository is exactly the same as the data in the new repository or application.

Metadata Testing
Metadata testing includes data type check, data length check and index/constraint check.

Data Completeness Testing
Data completeness testing is done to verify that all the expected data is loaded into the target from the source. Some of the tests are: compare and validate counts, aggregates and actual data between the source and the target for columns with simple transformation or no transformation.

Data Accuracy Testing
This testing is done to ensure that the data is accurately loaded and transformed as expected.

Data Transformation Testing
Testing data transformation is done because in many cases it cannot be achieved by writing one source SQL query and comparing the output with the target. Multiple SQL queries may need to be run for each row to verify the transformation rules.

Data Quality Testing
Data quality tests include syntax and reference tests. Data quality testing is done in order to avoid any error due to an invalid date or order number during the business process.
Syntax tests report dirty data based on invalid characters, character patterns, incorrect upper or lower case order, etc.
Reference tests check the data according to the data model. For example: Customer ID.
Data quality testing includes number check, date check, precision check, data check, null check, etc.

Incremental ETL Testing
This testing is done to check the data integrity of old and new data when new data is added. Incremental testing verifies that inserts and updates are processed as expected during the incremental ETL process.

GUI/Navigation Testing
This testing is done to check the navigation or GUI aspects of the front-end reports.

How to Create an ETL Test Case


ETL testing is a concept which can be applied to different tools and databases in the information management industry. The objective of ETL testing is to ensure that the data that has been loaded from a source to a destination after business transformation is accurate. It also involves the verification of data at the various intermediate stages used between source and destination.

While performing ETL testing, two documents that will always be used by an ETL tester are:

1. ETL mapping sheets: An ETL mapping sheet contains all the information about the source and destination tables, including each and every column and their look-ups in reference tables. ETL testers need to be comfortable with SQL queries, as ETL testing may involve writing big queries with multiple joins to validate data at any stage of ETL (see the sketch after this list). ETL mapping sheets provide significant help while writing queries for data verification.
2. DB Schema of Source, Target: It should be kept handy to verify any detail in the mapping sheets.
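
A hedged sketch of the kind of multi-join validation query a mapping sheet might drive (the reference-table look-up and all names here are assumptions for illustration):

-- Validate that the target dimension carries the country name looked up from the reference table, as defined in the mapping sheet
SELECT s.Cust_Id,
       r.Country_Name AS expected_country,
       t.Country_Name AS loaded_country
FROM source_db.Customer s
JOIN source_db.Ref_Country r ON r.Country_Code = s.Country_Code
JOIN target_dw.Dim_Customer t ON t.Cust_Id = s.Cust_Id
WHERE t.Country_Name <> r.Country_Name;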

ETL Test Scenarios and Test Cases

Test Scenario: Mapping Doc Validation

1. Verify the mapping doc to check whether the corresponding ETL information is provided. A change log should be maintained in every mapping doc.

Test Scenario: Validation (Structure)

1. Validate the source and target table structure against the corresponding mapping doc.
2. Source data type and target data type should be the same.
3. Length of data types in both source and target should be equal.
4. Verify that data field types and formats are specified.
5. Source data type length should not be less than the target data type length.
6. Validate the names of columns in the table against the mapping doc.

Test Scenario: Constraint Validation

1. Ensure the constraints are defined for the specific table as expected.

Test Scenario: Data Consistency Issues

1. The data type and length for a particular attribute may vary in files or tables though the semantic definition is the same.
2. Misuse of integrity constraints.

Test Scenario: Completeness Issues

1. Ensure that all expected data is loaded into the target table.
2. Compare record counts between source and target.
3. Check for any rejected records.
4. Check that data is not truncated in the columns of the target tables.
5. Check boundary value analysis.
6. Compare the unique values of key fields between the data loaded to the warehouse and the source data.

Test Scenario: Correctness Issues

1. Data that is misspelled or inaccurately recorded.
2. Null, non-unique or out-of-range data.

Test Scenario: Transformation

1. Validate that the transformation rules have been applied correctly (see Data Transformation Testing above).

Test Scenario: Data Quality

1. Number check: numbers need to be checked and validated.
2. Date check: dates have to follow the date format, and it should be the same across all records.
3. Precision check.
4. Data check.
5. Null check.

Test Scenario: Null Validation

1. Verify the null values where "Not Null" is specified for a specific column.

Test Scenario: Duplicate Check

1. Validate that the unique key, primary key and any other column that should be unique as per the business requirements have no duplicate rows.
2. Check whether any duplicate values exist in any column which is extracted from multiple columns in the source and combined into one column.
3. As per the client requirements, ensure that there are no duplicates in a combination of multiple columns within the target.

Test Scenario: Date Validation

Date values are used in many areas of ETL development:
1. To know the row creation date.
2. To identify active records as per the ETL development perspective.
3. To identify active records as per the business requirements perspective.
4. Sometimes the updates and inserts are generated based on the date values.

Test Scenario: Complete Data Validation

1. To validate the complete data set in the source and target tables, a minus query is the best solution.
2. We need to run source minus target and target minus source.
3. If the minus query returns any rows, those should be considered as mismatching rows.
4. We need to match the rows between source and target using the Intersect statement.
5. The count returned by Intersect should match the individual counts of the source and target tables.
6. If the minus query returns no rows and the count of the intersect is less than the source count or the target table count, then we can consider that duplicate rows exist.

Test Scenario: Data Cleanness

1. Unnecessary columns should be deleted before loading into the staging area.

Types of ETL Bugs

User interface bugs / cosmetic bugs
 Related to the GUI of the application
 Font style, font size, colors, alignment, spelling mistakes, navigation and so on

Boundary Value Analysis (BVA) related bugs
 Minimum and maximum values

Input/Output bugs
 Valid values not accepted
 Invalid values accepted

Calculation bugs
 Mathematical errors
 Final output is wrong

Load Condition bugs
 Does not allow multiple users
 Does not allow customer-expected load

Race Condition bugs
 System crash & hang
 System cannot run on client platforms

Version control bugs
 No logo matching
 No version information available

H/W bugs
 Device is not responding to the application

Best Practices for ETL Testing

1. Make sure data is transformed correctly.
2. Projected data should be loaded into the data warehouse without any data loss or truncation.
3. Ensure that the ETL application appropriately rejects invalid data, replaces it with default values where applicable, and reports it.
4. Ensure that the data is loaded into the data warehouse within the prescribed and expected time frames to confirm scalability and performance.
5. All methods should have appropriate unit tests regardless of visibility.
6. All unit tests should use appropriate coverage techniques to measure their effectiveness.
7. Strive for one assertion per test case.
8. Create unit tests that target exceptions.
 
 
At the time of mapping from the source table to the target table, if the transformation is not as per the mapping condition, then the Test Engineer raises bugs.

Regression Testing:
Code modification to fix a bug or to implement new functionality may introduce new errors. These introduced errors are called regressions. Identifying the regression effect is called regression testing.

Retesting:
Re-executing the failed test cases after fixing the bug.

System Integration Testing:
After the completion of the programming process, the developer can integrate the modules. There are three models: a) Top Down, b) Bottom Up, c) Hybrid.
*******************************************************************************
What is a secondary index? What are its uses?

A secondary index is an alternate path to the data. Secondary indexes are used to improve performance by allowing the user to avoid scanning the entire table during a query. A secondary index is like a primary index in that it allows the user to locate rows; unlike a primary index, it has no influence on the way rows are distributed among AMPs. Secondary indexes are optional and can be created and dropped dynamically. Secondary indexes require separate subtables, which require extra I/O to maintain the indexes. Compared to primary indexes, secondary indexes allow access to information in a table by alternate, less frequently used paths. Teradata automatically creates a secondary index subtable. The subtable will contain:

 Secondary index value
 Secondary index row id
 Primary index row id

When a user writes an SQL query that has a secondary index (SI) in the WHERE clause, the parsing engine (PE) will hash the secondary index value. The output is the row hash of the SI. The PE creates a request containing the row hash and gives the request to the message passing layer (which includes the BYNET software and network). The message passing layer uses a portion of the row hash to point to a bucket in the hash map. That bucket contains an AMP number to which the PE's request will be sent. The AMP gets the request and accesses the secondary index subtable pertaining to the requested SI information. The AMP will check to see if the row hash exists in the subtable and double-check the subtable row with the actual secondary index value. Then, the AMP will create a request containing the primary index row id and send it back to the message passing layer.