ETL Testing Concepts
What is ETL Testing?
ETL stands for Extract, Transform, and Load. ETL testing is the process of validating and
verifying the correctness and completeness of data as it moves through the ETL process.
When do we need ETL Testing?
• ETL is commonly associated with Data Warehousing projects, but in reality any form of
bulk data movement from a source to a target can be considered ETL.
• Large enterprises often need to move application data from one system to another
for data integration or data migration purposes. ETL testing is a data-centric testing process
that validates that the data has been transformed and loaded into the target as expected.
Types of ETL Testing
• Metadata Testing: The purpose of Metadata Testing is to verify that the table definitions
conform to the data model and application design specifications.
• Data Type Check: Verify that the table and column data type definitions conform to the data
model design specifications.
Example: The data model specifies a column as NUMBER, but the database column is defined as
STRING (or VARCHAR).
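A data type check compares the declared type of each column with the data model. The sketch below is a minimal illustration assuming a SQLite target, so it runs with only Python's standard library; `EXPECTED_TYPES` is a hypothetical stand-in for the data model specification and `customer_dim` is a hypothetical table. On other databases the declared types would typically be read from `information_schema.columns` rather than `PRAGMA table_info`.

```python
import sqlite3

# Hypothetical data model specification: column name -> expected declared type.
EXPECTED_TYPES = {"customer_id": "NUMBER", "first_name": "VARCHAR(100)", "balance": "NUMBER"}


def actual_types(conn: sqlite3.Connection, table: str) -> dict[str, str]:
    """Return column name -> declared type, as SQLite reports it."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {name: col_type.upper() for _, name, col_type, *_ in rows}


def type_mismatches(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """List columns whose declared type differs from the specification."""
    issues = []
    for column, expected_type in expected.items():
        found = actual.get(column)
        if found is None:
            issues.append(f"{column}: missing from table")
        elif found != expected_type:
            issues.append(f"{column}: expected {expected_type}, found {found}")
    return issues


conn = sqlite3.connect(":memory:")
# 'balance' is declared as VARCHAR instead of NUMBER, mirroring the example above.
conn.execute(
    "CREATE TABLE customer_dim (customer_id NUMBER, first_name VARCHAR(100), balance VARCHAR(20))"
)
mismatches = type_mismatches(EXPECTED_TYPES, actual_types(conn, "customer_dim"))
for issue in mismatches:
    print(issue)
```

SQLite keeps the declared type text exactly as written in the DDL, which is why a plain string comparison works here.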
Data Completeness Testing
The purpose of Data Completeness testing is to verify that all the expected data is loaded into
the target from the source. Tests that can be run include comparing and validating counts,
aggregates (min, max, sum, avg), and actual data between the source and target.
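The aggregate and row-level comparisons described above can be sketched with SQLite so that both sides live in one database. The `customer` and `customer_dim` tables and the `balance` column are hypothetical; in a real project the source and target usually sit in different systems, so each query would run against its own connection. `EXCEPT` returns rows present in the first query but not in the second, so running it in both directions finds rows missing from the target and rows that should not be there.

```python
import sqlite3

AGG_SQL = "SELECT COUNT(*), MIN(balance), MAX(balance), SUM(balance), AVG(balance) FROM {table}"


def aggregates(conn: sqlite3.Connection, table: str) -> tuple:
    """Row count plus min/max/sum/avg of the (hypothetical) balance column."""
    return conn.execute(AGG_SQL.format(table=table)).fetchone()


def row_differences(conn: sqlite3.Connection) -> tuple[list, list]:
    """Rows missing from the target, and target rows with no matching source row."""
    missing = conn.execute(
        "SELECT id, balance FROM customer EXCEPT SELECT id, balance FROM customer_dim"
    ).fetchall()
    extra = conn.execute(
        "SELECT id, balance FROM customer_dim EXCEPT SELECT id, balance FROM customer"
    ).fetchall()
    return missing, extra


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, balance INTEGER)")
conn.execute("CREATE TABLE customer_dim (id INTEGER, balance INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, 100), (2, 250), (3, 400)])
# Customer 3 was dropped during the load.
conn.executemany("INSERT INTO customer_dim VALUES (?, ?)", [(1, 100), (2, 250)])

src, tgt = aggregates(conn, "customer"), aggregates(conn, "customer_dim")
print("aggregates match" if src == tgt else f"aggregate mismatch: {src} vs {tgt}")
missing, extra = row_differences(conn)
print("missing from target:", missing, "unexpected in target:", extra)
```

`AVG` returns a floating-point value, so for real data a small tolerance may be preferable to exact equality.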
Record Count Validation
Compare the record count of the primary source table with that of the target table, and check
for any rejected records.
Example: A simple comparison of record counts between the source and target tables:
Source Query: SELECT count(*) src_count FROM customer
Target Query: SELECT count(*) tgt_count FROM customer_dim
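The source and target count queries above can be wrapped in a small reconciliation check. This sketch assumes both tables are reachable through Python DB-API connections (in-memory SQLite here so it is self-contained). `compare_counts` is a hypothetical helper, and its `rejected` argument is an assumption about how the ETL job reports rejects: the check passes when every source row is either loaded or rejected.

```python
import sqlite3


def compare_counts(src_conn, tgt_conn, rejected: int = 0) -> tuple[int, int, bool]:
    """Run the source and target count queries and reconcile them.

    `rejected` is the number of records the ETL job reported as rejected
    (an assumed input); source count should equal target count plus rejects.
    """
    src_count = src_conn.execute("SELECT count(*) src_count FROM customer").fetchone()[0]
    tgt_count = tgt_conn.execute("SELECT count(*) tgt_count FROM customer_dim").fetchone()[0]
    return src_count, tgt_count, src_count == tgt_count + rejected


src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customer (id INTEGER)")
tgt.execute("CREATE TABLE customer_dim (id INTEGER)")
src.executemany("INSERT INTO customer VALUES (?)", [(i,) for i in range(10)])
tgt.executemany("INSERT INTO customer_dim VALUES (?)", [(i,) for i in range(8)])

print(compare_counts(src, tgt))              # (10, 8, False): two rows unaccounted for
print(compare_counts(src, tgt, rejected=2))  # (10, 8, True): reconciles once rejects are counted
```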
Metadata Naming Standards Check
Verify that the names of database objects such as tables, columns, and indexes follow
the naming standards.
Example: The naming standard for fact tables is to end with ‘_F’, but some fact
table names end with ‘_FACT’.
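A naming-standard check can be expressed as a pattern match over the database catalog. This sketch assumes SQLite, where table names are listed in `sqlite_master`, and assumes for illustration that fact tables can be recognised by a name ending in `_F` or `_FACT`; the rule it enforces is the one from the example, that fact table names must end with `_F`.

```python
import re
import sqlite3

# Assumed heuristic: a table is treated as a fact table if its name ends in _F or _FACT.
FACT_TABLE = re.compile(r"_(F|FACT)$", re.IGNORECASE)
# Naming standard from the example: fact table names must end with '_F'.
REQUIRED_SUFFIX = "_F"


def naming_violations(conn: sqlite3.Connection) -> list[str]:
    """Return fact tables whose names do not end with the required suffix."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return [name for name in names
            if FACT_TABLE.search(name) and not name.upper().endswith(REQUIRED_SUFFIX)]


conn = sqlite3.connect(":memory:")
for ddl in ("CREATE TABLE SALES_F (id INTEGER)",
            "CREATE TABLE RETURNS_FACT (id INTEGER)",
            "CREATE TABLE CUSTOMER_DIM (id INTEGER)"):
    conn.execute(ddl)
print(naming_violations(conn))
```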
Metadata Check Across Environments
Compare table and column metadata across environments to ensure that changes have been
migrated appropriately.
Example: A new column added to the SALES fact table was not migrated from the
Development environment to the Test environment, resulting in ETL failures.
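Comparing metadata across environments amounts to diffing the column lists that each environment's catalog reports. In this sketch two in-memory SQLite databases stand in for the Development and Test environments (an assumption for self-containment), and the `SALES_F` table and its `discount_amt` column are hypothetical. It reports columns present in one environment but not the other, plus columns whose declared types differ.

```python
import sqlite3


def column_metadata(conn: sqlite3.Connection, table: str) -> dict[str, str]:
    """Column name -> declared type for one table in one environment."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2].upper() for row in conn.execute(f"PRAGMA table_info({table})")}


def diff_environments(dev: dict[str, str], test: dict[str, str]) -> dict[str, list]:
    """Describe how the Test copy of a table differs from the Development copy."""
    return {
        "missing_in_test": sorted(dev.keys() - test.keys()),
        "extra_in_test": sorted(test.keys() - dev.keys()),
        "type_changed": sorted(c for c in dev.keys() & test.keys() if dev[c] != test[c]),
    }


dev_env = sqlite3.connect(":memory:")
test_env = sqlite3.connect(":memory:")
dev_env.execute("CREATE TABLE SALES_F (sale_id INTEGER, amount NUMERIC, discount_amt NUMERIC)")
# The new discount_amt column was never migrated to Test.
test_env.execute("CREATE TABLE SALES_F (sale_id INTEGER, amount NUMERIC)")

report = diff_environments(column_metadata(dev_env, "SALES_F"),
                           column_metadata(test_env, "SALES_F"))
print(report)
```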
Data Length Check
Verify that the lengths of database columns match the data model design specifications.
Example: The data model specifies a length of 100 for the ‘first_name’ column, but the
corresponding database table column is only 80 characters long.
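A length check compares the declared size of each character column with the data model. SQLite, used here only so the sketch runs without a database server, keeps the declared type text such as `VARCHAR(80)` but does not enforce the length, so the sketch parses the number out of `PRAGMA table_info`. On databases that provide `information_schema`, the same value is usually available as `information_schema.columns.character_maximum_length`. `SPEC_LENGTHS` is a hypothetical stand-in for the data model.

```python
import re
import sqlite3

# Hypothetical data model specification: column -> expected maximum length.
SPEC_LENGTHS = {"first_name": 100, "last_name": 100}

# Matches a single length in parentheses, e.g. the 80 in VARCHAR(80).
LENGTH_PATTERN = re.compile(r"\((\d+)\)")


def declared_lengths(conn: sqlite3.Connection, table: str) -> dict[str, int]:
    """Column -> declared length, for columns whose type carries one."""
    lengths = {}
    for row in conn.execute(f"PRAGMA table_info({table})"):
        match = LENGTH_PATTERN.search(row[2])
        if match:
            lengths[row[1]] = int(match.group(1))
    return lengths


def length_mismatches(conn: sqlite3.Connection, table: str) -> dict[str, tuple]:
    """Column -> (specified length, declared length) wherever the two disagree."""
    actual = declared_lengths(conn, table)
    return {col: (spec, actual.get(col)) for col, spec in SPEC_LENGTHS.items()
            if actual.get(col) != spec}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim (first_name VARCHAR(80), last_name VARCHAR(100))")
print(length_mismatches(conn, "customer_dim"))
```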
Column Data Profile Validation
1. Compare the unique values in a column between the source and target.
2. Compare the max, min, avg, max length, and min length values of columns, depending on the data
type.
3. Compare the number of null values in a column between the source and target.
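The profile comparisons in the list above can be computed with one query per table and compared side by side. This sketch assumes SQLite and hypothetical `customer` / `customer_dim` tables with a text column `city`. `COUNT(*) - COUNT(col)` counts nulls because `COUNT(col)` skips them, and `LENGTH` gives the character length of text values; for a numeric column, `MIN`, `MAX`, and `AVG` of the values themselves would take the place of the length statistics.

```python
import sqlite3

PROFILE_SQL = """
SELECT COUNT(DISTINCT {col}),          -- unique non-null values
       COUNT(*) - COUNT({col}),        -- null values
       MIN(LENGTH({col})),             -- min length
       MAX(LENGTH({col}))              -- max length
FROM {table}
"""


def profile(conn: sqlite3.Connection, table: str, col: str) -> dict[str, int]:
    """Compute the distinct/null/length profile of one column."""
    distinct, nulls, min_len, max_len = conn.execute(
        PROFILE_SQL.format(table=table, col=col)).fetchone()
    return {"distinct": distinct, "nulls": nulls, "min_len": min_len, "max_len": max_len}


def compare_profiles(src: dict, tgt: dict) -> dict[str, tuple]:
    """Metric -> (source value, target value) for every metric that differs."""
    return {k: (src[k], tgt[k]) for k in src if src[k] != tgt[k]}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (city TEXT)")
conn.execute("CREATE TABLE customer_dim (city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)",
                 [("Austin",), ("Boston",), (None,), ("Chicago",)])
# The load truncated 'Chicago' and turned the NULL into an empty string.
conn.executemany("INSERT INTO customer_dim VALUES (?)",
                 [("Austin",), ("Boston",), ("",), ("Chica",)])

diffs = compare_profiles(profile(conn, "customer", "city"), profile(conn, "customer_dim", "city"))
print(diffs)
```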
Challenges in ETL Testing
1. ETL testing involves comparing large volumes of data, typically millions of records.
2. The data that needs to be tested resides in heterogeneous data sources (e.g., databases, flat
files).
3. Data is often transformed, which may require complex SQL queries to compare the
data.
4. ETL testing depends heavily on the availability of test data that covers different test
scenarios.
Index / Constraint Check
Verify that proper constraints and indexes are defined on the database tables as per the
design specifications:
1. Verify that the columns that cannot be null have the ‘NOT NULL’ constraint.
2. Verify that the unique key and foreign key columns are indexed as per the requirements.
3. Verify that the table is named according to the table naming convention.
Example 1: A column was defined as ‘NOT NULL’, but per the design it is optional.
Example 2: Foreign key constraints were not defined on the database table, resulting in
orphan records in the child table.
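The constraint and index checks above can be read from the database catalog. This sketch assumes SQLite, where `PRAGMA table_info`, `PRAGMA index_list`, `PRAGMA index_info`, and `PRAGMA foreign_key_list` expose the relevant metadata; `DESIGN` and the `customer` and `orders` tables are hypothetical. It also shows an orphan-record query for the situation in Example 2: a `LEFT JOIN` from the child table to the parent that keeps only child rows with no matching parent.

```python
import sqlite3


def not_null_columns(conn, table):
    """Columns declared NOT NULL (table_info rows: cid, name, type, notnull, dflt_value, pk)."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info('{table}')") if row[3]}


def indexed_columns(conn, table):
    """Columns covered by any index (index_list rows carry the index name at position 1)."""
    cols = set()
    for index in conn.execute(f"PRAGMA index_list('{table}')").fetchall():
        # index_info rows: (seqno, cid, name)
        cols.update(row[2] for row in conn.execute(f"PRAGMA index_info('{index[1]}')"))
    return cols


def foreign_key_columns(conn, table):
    """Child columns with a declared foreign key (position 3 of foreign_key_list is 'from')."""
    return {row[3] for row in conn.execute(f"PRAGMA foreign_key_list('{table}')")}


# Hypothetical design specification for the child table.
DESIGN = {"not_null": {"customer_id"}, "indexed": {"customer_id"}, "foreign_key": {"customer_id"}}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, amount NUMERIC);
CREATE INDEX idx_orders_customer ON orders (customer_id);
INSERT INTO customer VALUES (1, 'Ann');
INSERT INTO orders VALUES (10, 1, 50), (11, 2, 75);
""")

# Design requirements the actual table does not meet.
gaps = {
    "not_null": DESIGN["not_null"] - not_null_columns(conn, "orders"),
    "indexed": DESIGN["indexed"] - indexed_columns(conn, "orders"),
    "foreign_key": DESIGN["foreign_key"] - foreign_key_columns(conn, "orders"),
}
# With no foreign key declared, nothing stopped order 11 from referencing customer 2.
orphans = conn.execute("""
    SELECT o.order_id, o.customer_id
    FROM orders o LEFT JOIN customer c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()
print(gaps)
print(orphans)
```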