ETL Testing Topics
SQL - Overview
SQL - RDBMS Concepts
SQL - Databases
SQL - Syntax
SQL - Data Types
SQL - Operators
SQL - Expressions
SQL - Create Database
SQL - Drop Database
SQL - Select Database
SQL - Create Table
SQL - Drop Table
SQL - Insert Query
SQL - Select Query
SQL - Where Clause
SQL - AND & OR Clauses
SQL - Update Query
SQL - Delete Query
SQL - Like Clause
SQL - Top Clause
SQL - Order By
SQL - Group By
SQL - Distinct Keyword
SQL - Sorting Results
Advanced SQL
SQL - Constraints
SQL - Using Joins
SQL - Unions Clause
SQL - NULL Values
SQL - Alias Syntax
SQL - Indexes
SQL - Alter Command
SQL - Truncate Table
SQL - Using Views
SQL - Having Clause
SQL - Transactions
SQL - Wildcards
SQL - Date Functions
SQL - Temporary Tables
SQL - Clone Tables
SQL - Sub Queries
SQL - Using Sequences
SQL - Handling Duplicates
No Environment Limitations
UFT/QTP does not support the Linux operating environment.
(Only human users can judge the look and feel aspects; we cannot check the user-friendliness of the system (AUT) using test
tools.)
5. Manual testing allows for human observation, which may be more useful in finding potential defects.
(In manual testing, the user/tester interacts more with the AUT, so it is easier to find defects, and the tester can provide
suggestions to the development team.)
1. Manual testing requires more time or more resources, and sometimes both.
(Covering all areas of the application requires more tests; creating all possible test cases and executing them takes more
time. With test automation, the test tool can execute tests quickly.)
2. Less accuracy.
(Comparing two databases that have thousands of records is impractical manually, but it is very easy with test automation.)
6. Batch testing is possible, but for each test execution, human user interaction is mandatory.
7. Differences in GUI object size, color combinations, etc. are not easy to find in manual testing.
8. The scope of a manual test case is very limited; with an automated test, the scope is wider.
(In manual testing, test case scope is limited because the tester/user can concentrate on only one or two verification
points. In test automation, the test tool (a tool is also software) can concentrate on multiple verification points at a
time.)
9. Executing the same tests again and again is time-consuming as well as tedious.
(Sometimes we need to execute the same tests using multiple sets of test data, and for each test iteration user interaction is
mandatory. In test automation, using a test data file (a text file, Excel file, or database file), we can easily
conduct data-driven testing.)
10. For every release you must rerun the same set of tests, which can be tiresome.
(We need to execute sanity test cases and regression test cases on every modified build, which takes more time. In
automated testing / test automation, once we create the tests, the tool can execute them multiple times quickly.)
Data Warehouse Testing
Data Warehouse Testing is a testing method in which the data inside a data warehouse is tested for integrity, reliability,
accuracy and consistency in order to comply with the company’s data framework. The main purpose of data warehouse
testing is to ensure that the integrated data inside the data warehouse is reliable enough for a company to make
decisions on.
Enterprise resource planning (ERP) refers to a type of software that organizations use to manage day-to-day business
activities such as accounting, procurement, project management, risk management and compliance, and supply chain
operations.
Customer relationship management is a process in which a business or other organization administers its interactions with
customers, typically using data analysis to study large amounts of information.
A data mart is a simple form of a data warehouse that is focused on a single subject or line of business, such as sales,
finance, or marketing.
According to Wikipedia, “Extract, Transform, Load (ETL) is the general procedure of copying data from one or more data
sources into a destination system which represents the data differently from the source(s). The ETL process is often used in
data warehousing.”
Data extraction involves extracting data from homogeneous or heterogeneous sources
Data transformation processes data by cleaning and transforming them into a proper storage format/structure
for the purposes of querying and analysis
Data loading describes the insertion of data into the target data store, data mart, data lake or data warehouse.
Who is involved in the ETL process?
Data Architect: Models and builds data store (Big Data lake, Data Warehouse, Data Mart, etc.)
ETL Developer: Transforms and loads data from sources to target data stores
ETL Tester: Validates the data, based on mappings, as it moves and transforms from sources to targets
The accompanying diagram shows the intertwined roles, tasks and timelines for performing ETL testing with the sampling
method.
Sample data comparison between the source and the target system
Both ETL testing and database testing involve data validation, but they are not the same. ETL testing is normally
performed on data in a data warehouse system, whereas database testing is commonly performed on
transactional systems where the data comes from different applications into the transactional database.
Here, we have highlighted the major differences between ETL testing and Database testing.
ETL testing categorization is done based on the objectives of testing and reporting. Testing categories vary as per
organization standards and also depend on client requirements. Generally, ETL testing is categorized based on the
following points −
Source to Target Count Testing − It involves matching of count of records in the source and the target
systems.
Source to Target Data Testing − It involves data validation between the source and the target systems.
It also involves duplicate data check in the target system.
Data Mapping or Transformation Testing − It confirms the mapping of objects in the source and the
target systems. It also involves checking the functionality of data in the target system.
End-User Testing − It involves generating reports for end-users to verify that the data in the reports is as
expected. It involves finding deviations in reports and cross-checking the data in the target system for
report validation.
Retesting − It involves fixing the bugs and defects in data in the target system and running the reports
again for data validation.
System Integration Testing − It involves testing all the individual systems, and later combine the results
to find if there are any deviations. There are three approaches that can be used to perform this: top-
down, bottom-up, and hybrid.
Based on the structure of a Data Warehouse system, ETL testing (irrespective of the tool that is used) can be divided into
the following categories −
New Data Warehouse Testing
In this type of testing, a new DW system is built and verified. Data inputs are taken from customers/end-users as well as
from different data sources, and a new data warehouse is created. Later, the data is verified in the new system with the
help of ETL tools.
Migration Testing
In migration testing, customers have an existing data warehouse and ETL process, but they look for a new ETL tool to improve
efficiency. It involves migrating data from the existing system using the new ETL tool.
Change Testing
In change testing, new data is added from different data sources to an existing system. Customers can also change the
existing ETL rules, or new rules can be added.
Report Testing
Report testing involves creating reports for data validation. Reports are the final output of any DW system. Reports are
tested based on their layout, data in the report, and calculated values.
ETL Testing vs. Database Testing
Database testing stresses more on data accuracy, correctness of data and valid values.
The following table captures the key features of database testing and ETL testing and compares them −

Primary Goal
Database Testing: data validation and integration.
ETL Testing: data extraction, transformation and loading for BI reporting.

Applicable System
Database Testing: transactional system where the business flow occurs.
ETL Testing: system containing historical data, not in a business-flow environment.
ETL testing is different from database testing or any other conventional testing, and one may have to face different types of
challenges while performing it.
Production Validation Testing
To perform analytical reporting and analysis, the data in your production system should be correct. This testing is done on
the data that is moved to the production system. It involves data validation in the production system and comparing it with
the source data.
Source to Target Count Testing
This type of testing is done when the tester has limited time to perform the testing operation. It involves checking the count
of data in the source and the target systems. It does not involve checking the values of the data in the target system, nor
whether the data is in ascending or descending order after the mapping.
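As a minimal sketch of such a count check, the two queries below compare record counts; the table names source_orders and
target_orders are hypothetical placeholders, not names taken from this document.

-- Run against the source system
SELECT COUNT(*) AS source_count FROM source_orders;

-- Run against the target data warehouse
SELECT COUNT(*) AS target_count FROM target_orders;

-- The two counts are then compared manually or by a test script;
-- a mismatch indicates missing or extra records in the target.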
Source to Target Data Testing
In this type of testing, a tester validates data values from the source to the target system. It checks the data values in the
source system and the corresponding values in the target system after transformation. This type of testing is time-consuming
and is normally performed in financial and banking projects.
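A minimal sketch of a value-level comparison, assuming both data sets can be queried from one place (for example via a linked
server or after staging both extracts); the tables source_customer and target_customer and the columns cust_id and
annual_income are hypothetical.

-- Rows whose values differ between source and target
SELECT s.cust_id,
       s.annual_income AS source_income,
       t.annual_income AS target_income
FROM   source_customer s
JOIN   target_customer t ON t.cust_id = s.cust_id
WHERE  s.annual_income <> t.annual_income;
-- If a transformation rule applies (e.g. a currency conversion), it is applied to the
-- source expression before comparing.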
Data Integration / Threshold Value Validation
In this type of testing, a tester validates the range of data. All the threshold values in the target system are checked to
confirm that they match the expected result. It also involves integration of data into the target system from multiple source
systems after transformation and loading.
Example − The Age attribute should not have a value greater than 100. In a date column in DD/MM/YY format, the month field
should not have a value greater than 12.
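A minimal sketch of a threshold check for the Age example above; the table target_customer and the columns cust_id and age
are hypothetical placeholders.

-- Age must not be greater than 100 (and not negative)
SELECT cust_id, age
FROM   target_customer
WHERE  age > 100 OR age < 0;
-- Any rows returned violate the expected threshold. A similar query on the month part
-- of a text date column would cover the DD/MM/YY rule.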
Application Migration Testing
Application migration testing is normally performed automatically when you move from an old application to a new
application system. This testing saves a lot of time. It checks whether the data extracted from the old application is the same
as the data in the new application system.
Data Check and Constraint Check
It includes performing various checks such as data type check, data length check, and index check. Here, a test engineer
covers the following scenarios − Primary Key, Foreign Key, NOT NULL, NULL, and UNIQUE.
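One way to script these data type, length and nullability checks, assuming the database exposes the standard
INFORMATION_SCHEMA views (as MySQL, SQL Server and PostgreSQL do); the table name target_customer is a placeholder.

SELECT column_name,
       data_type,
       character_maximum_length,
       is_nullable
FROM   information_schema.columns
WHERE  table_name = 'target_customer'
ORDER  BY ordinal_position;
-- The result is compared against the data types, lengths and NULL/NOT NULL rules
-- defined in the mapping document.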
Duplicate Data Check
This testing involves checking for duplicate data in the target system. When there is a huge amount of data in the target
system, it is possible that there is duplicate data in the production system, and this may result in incorrect data in analytical
reports.
Duplicate values can be checked with SQL statement like −
SELECT Cust_Id, Cust_NAME, Quantity, COUNT(*)
FROM Customer
GROUP BY Cust_Id, Cust_NAME, Quantity
HAVING COUNT(*) > 1;
Duplicate data can appear in the target system for several reasons, for example when no primary key is defined on the target
table or when the same records are loaded more than once.
Data Transformation Testing
Data transformation testing is not performed by running a single SQL statement. It is time-consuming and involves running
multiple SQL queries for each row to verify the transformation rules. The tester needs to run SQL queries for each row and
then compare the output with the target data.
Data Quality Testing
Data quality testing involves performing number checks, date checks, precision checks, etc. A tester performs syntax tests to
report invalid characters, incorrect upper/lower case order, etc., and reference tests to check whether the data conforms to
the data model.
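A reference test can often be written as an anti-join against the reference (dimension) table; a minimal sketch with the
hypothetical tables target_sales and dim_customer and the key column cust_id.

-- Sales rows whose customer ID does not exist in the customer reference table
SELECT f.order_id, f.cust_id
FROM   target_sales f
LEFT   JOIN dim_customer d ON d.cust_id = f.cust_id
WHERE  d.cust_id IS NULL;
-- Any rows returned violate the data model and are reported as reference errors.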
Incremental Testing
Incremental testing is performed to verify if Insert and Update statements are executed as per the expected result. This
testing is performed step-by-step with old and new data.
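A minimal sketch of an incremental check, assuming the source rows carry a last_updated timestamp and the finish time of the
previous load is known; all table and column names here are illustrative.

-- Source rows changed since the previous load that are missing or stale in the target
SELECT s.order_id
FROM   source_orders s
LEFT   JOIN target_orders t ON t.order_id = s.order_id
WHERE  s.last_updated > TIMESTAMP '2024-01-01 00:00:00'   -- watermark of the previous load
  AND (t.order_id IS NULL OR t.last_updated < s.last_updated);
-- Rows returned were inserted or updated in the source but not processed correctly
-- by the incremental load.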
Regression Testing
When changes are made to the data transformation and aggregation rules to add new functionality, the existing tests are run
again to help the tester find new errors; this is called regression testing. Bugs in the data that turn up during regression
testing are called regressions.
Retesting
When you run the tests after fixing the code, it is called retesting.
System Integration Testing
System integration testing involves testing the components of a system individually and later integrating the modules. There
are three ways system integration can be performed: top-down, bottom-up, and hybrid.
Navigation Testing
Navigation testing is also known as testing the front end of the system. It involves testing from the end-user's point of view by
checking all aspects of the front-end report, including data in the various fields, calculations, aggregates, etc.
ETL Test Scenarios and Test Cases

Structure Validation
1. Validate the source and the target table structure as per the mapping document.
2. Data types should be validated in the source and the target systems.
3. The length of data types in the source and the target systems should be the same.
4. Data field types and their formats should be the same in the source and the target systems.
5. Validate the column names in the target system.

Validating the Mapping Document
It involves validating the mapping document to ensure all the information has been provided. The mapping document should
have a change log and maintain data types, lengths, transformation rules, etc.

Validate Constraints
It involves validating the constraints and ensuring that they are applied on the expected tables.

Data Completeness Validation
1. Check that all the data is loaded into the target system from the source system.
2. Count the number of records in the source and the target systems.
3. Boundary value analysis.
4. Validate the unique values of primary keys.

Data Transform Validation
1. Create a spreadsheet of scenarios for input values and expected results, and then validate it with end-users.
2. Validate the parent-child relationships in the data by creating scenarios.
3. Use data profiling to compare the range of values in each field.
4. Validate whether the data types in the warehouse are the same as mentioned in the data model.

Data Quality Validation
It involves performing number checks, date checks, precision checks, data checks, null checks, etc.
Example − the date format should be the same for all the values.

Null Validation
It involves checking for null values where Not Null is specified for that field (a query sketch follows this table).

Duplicate Validation
1. Validate duplicate values in the target system when data is coming from multiple columns in the source system.
2. Validate primary keys and other columns for any duplicate values as per the business requirement.

Date Validation Check
It involves validating the date fields for the various actions performed in the ETL process.

Other Test Scenarios
1. Validate the full data set in the source and the target tables by using a minus query.
2. Verify that the extraction process did not extract duplicate data from the source system.
3. The testing team will maintain a list of SQL statements that are run to validate that no duplicate data has been extracted
from the source systems.

Data Cleaning
Unwanted data should be removed before loading the data into the staging area.
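As referenced under Null Validation above, a minimal sketch of such a check; the table target_customer and the column email
(declared Not Null in the mapping document) are hypothetical.

SELECT COUNT(*) AS null_violations
FROM   target_customer
WHERE  email IS NULL;
-- The mapping document marks this field as Not Null, so any count above zero is a defect.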
ETL Testing is a way to perform validation of the data as it moves from one data store to another. The ETL Tester uses
a Mapping Document (if one exists), which is also known as a source-to-target map. This is the critical element required to
efficiently plan the target Data Stores. It also defines the Extract, Transform and Load (ETL) process.
The intention is to capture business rules, data flow mapping and data movement requirements. The Mapping Document
specifies source input definition, target/output details and business & data transformation rules.
Schemas are ways in which data is organized within a database or data warehouse.
Business Rules are also known as Mappings or Source-to-Target mappings and are typically found in a Mapping Document.
The mapping tables in the document are the requirements or rules for extracting, transforming (if at all) and loading (ETL)
data from the source database and files into the target data warehouse or big data store. Specifically, the mapping fields
show:
Table names, field names, data types and length of both source and target fields
How source tables / files should be joined in the new target data set
Any transformation logic that will be applied
Any business rules that will be applied
2) Create Test Cases
Each mapping will typically have its own test case. A test case will typically have two sets of SQL queries (or HQL for
Hadoop): one query extracts data from the sources (flat files, databases, XML, web services, etc.) and the other extracts
data from the target (data warehouse or big data store).
Extract data from source data stores (Left) Extract data from target Data Warehouse (Right)
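A minimal sketch of such a query pair; the source tables crm_customer and crm_order, the target table fact_customer_order,
and the join come from an assumed mapping and are not taken from this document.

-- Source-side query, run against the source database
SELECT c.cust_id, c.cust_name, o.order_id, o.order_amount
FROM   crm_customer c
JOIN   crm_order o ON o.cust_id = c.cust_id;

-- Target-side query, run against the data warehouse
SELECT cust_id, cust_name, order_id, order_amount
FROM   fact_customer_order;

-- The two result sets are exported and compared column by column and row by row.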
3) Execute Tests, Export Results
These tests are typically executed using a SQL editor such as Toad, SQuirreL, DBeaver, or any other favorite editor. The test
results from the two queries are saved into two Excel spreadsheets.
4) Compare Results
Compare all result sets in the source spreadsheet with the target spreadsheet by eye (also known as “Stare & Compare”).
There will be a lot of scrolling to the right to compare dozens, if not hundreds, of columns, and a lot of scrolling down to
compare tens of thousands or even millions of rows.
A Minus Query is a query that uses the MINUS operator in SQL to subtract one result set from another result set to
evaluate the result set difference. If there is no difference, there is no remaining result set. If there is a difference, the
resulting rows will be displayed.
The way to test using minus queries is to perform source-minus-target and target-minus-source queries for all data, making
sure the extraction process did not produce duplicate data and that all unnecessary columns are removed before loading the
data for validation.
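A minimal sketch of both directions, using the hypothetical tables source_customer and target_customer with identical column
lists; Oracle uses the MINUS keyword, while most other databases use the equivalent EXCEPT operator.

-- Source minus target: rows in the source that are missing or different in the target
SELECT cust_id, cust_name, city FROM source_customer
MINUS
SELECT cust_id, cust_name, city FROM target_customer;

-- Target minus source: rows in the target that do not exist in the source
SELECT cust_id, cust_name, city FROM target_customer
MINUS
SELECT cust_id, cust_name, city FROM source_customer;

-- Both queries returning zero rows indicates that the two data sets match.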
Commercial ETL Testing solutions pull data from data sources and from data targets and quickly compare them.
The ETL testing process mimics the ETL development process by testing data from point-to-point along the data lifecycle.
During transformation, the data goes through a number of sub-processes. Once the data has been transformed, it becomes
consistent and ready to load.
Types of Testing and the Testing Process

Production Validation Testing − “Table balancing” or “production reconciliation”: this type of ETL testing is done on data as it
is being moved into production systems. To support your business decisions, the data in your production systems has to be in
the correct order. Informatica Data Validation Option provides ETL testing automation and management capabilities to ensure
that production systems are not compromised by the data.

Application Upgrades − This type of ETL testing verifies whether the data extracted from an older application or repository is
exactly the same as the data in the new application or repository. Such testing can be generated automatically, saving
substantial test development time.

Metadata Testing − Metadata testing includes data type checks, data length checks and index/constraint checks.

Data Completeness Testing − Data completeness testing is done to verify that all the expected data is loaded into the target
from the source. Some of the tests that can be run are comparing and validating counts, aggregates and actual data between
the source and target for columns with simple transformation or no transformation.

Data Accuracy Testing − This testing is done to ensure that the data is accurately loaded and transformed as expected.

Data Transformation Testing − Testing data transformation is done because in many cases it cannot be achieved by writing one
source SQL query and comparing the output with the target. Multiple SQL queries may need to be run for each row to verify
the transformation rules.

Data Quality Testing − Data quality tests include syntax and reference tests. Data quality testing is done to avoid errors due to
dates or order numbers during business processes. Syntax tests report dirty data based on invalid characters, character
patterns, incorrect upper or lower case order, etc. Reference tests check the data against the data model; for example,
Customer ID. Data quality testing also includes number checks, date checks, precision checks, data checks, null checks, etc.

Incremental ETL Testing − This testing is done to check the data integrity of old and new data when new data is added.
Incremental testing verifies that inserts and updates are processed as expected during the incremental ETL process.

GUI/Navigation Testing − This testing is done to check the navigation or GUI aspects of the front-end reports.
While performing ETL testing, two documents will always be used by an ETL tester:
1. ETL mapping sheets: An ETL mapping sheet contains all the information about the source and destination
tables, including each and every column and their look-ups in reference tables. ETL testers need
to be comfortable with SQL queries, as ETL testing may involve writing big queries with multiple
joins to validate data at any stage of ETL. ETL mapping sheets provide significant help while
writing queries for data verification.
2. DB schema of the source and the target: It should be kept handy to verify any detail in the mapping sheets.
Test Scenarios and Test Cases

Mapping Doc Validation
Verify whether the corresponding ETL information is provided in the mapping doc. A change log should be maintained in every
mapping doc.

Validation
1. Validate the source and target table structure against the corresponding mapping doc.
2. Source data type and target data type should be the same.
3. The length of data types in both source and target should be equal.
4. Verify that data field types and formats are specified.
5. The source data type length should not be less than the target data type length.
6. Validate the names of columns in the table against the mapping doc.

Constraint Validation
Ensure the constraints are defined for the specific table as expected.

Data Consistency Issues
1. The data type and length for a particular attribute may vary in files or tables even though the semantic definition is the same.
2. Misuse of integrity constraints.

Transformation

Null Validate
Verify the null values where “Not Null” is specified for a specific column.

Duplicate Check
1. Validate that the unique key, the primary key and any other column are unique as per the business requirements and that
there are no duplicate rows.
2. Check whether any duplicate values exist in any column which is extracted from multiple columns in the source and
combined into one column.
3. As per the client requirements, ensure that there are no duplicates in combinations of multiple columns within the target.

Complete Data Validation
1. To validate the complete data set in the source and target tables, a minus query is the best solution.
2. We need to run source-minus-target and target-minus-source queries.
3. If the minus query returns any rows, those should be considered mismatching rows.
4. Matching rows between source and target need to be checked using the intersect statement.
5. The count returned by intersect should match the individual counts of the source and target tables.
6. If the minus query returns rows and the intersect count is less than the source count or the target table count, then we can
consider that duplicate rows exist. (A sketch of the intersect check follows this table.)

Data Cleanness
Unnecessary columns should be deleted before loading into the staging area.
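A minimal sketch of the intersect-based count check from the Complete Data Validation items above, again with the
hypothetical source_customer and target_customer tables; INTERSECT is standard SQL but is not supported by every database.

-- Count of rows that are identical in both tables
SELECT COUNT(*) AS matching_rows
FROM (
    SELECT cust_id, cust_name, city FROM source_customer
    INTERSECT
    SELECT cust_id, cust_name, city FROM target_customer
) matched;

-- Compare matching_rows with the individual counts:
SELECT COUNT(*) FROM source_customer;
SELECT COUNT(*) FROM target_customer;
-- If matching_rows is lower than either count, some rows are mismatched or duplicated.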
Calculation Bugs − Mathematical errors; the final output is wrong.
Version Control Bugs − No logo matching; no version information available.