Data Processing
Lecture no 11
Lecture Outline
• Data Validation
• Types of data validation
• Importance of data validation
• Why perform data validation
• Major difference between verification and validation
Lecture Objective
Students will gain a comprehensive understanding of data validation, the different types of data validation, and
why it is important to validate data.
They will also learn the key difference between verification and validation.
Data Validation
Data validation deals with making sure the data is valid (clean, correct and useful). Data validation procedures use
data validation rules (or check routines) to ensure the validity (mostly correctness and meaningfulness) of data.
It also ensures the validity of input data to maintain the security of the system. These rules can be implemented
automatically through data dictionaries. Data validation can also be implemented by declaring data integrity
rules or through procedures that enforce business rules (especially in business applications).
Any type of data handling task, whether it is gathering data, analyzing it, or structuring it for presentation, must
include data validation to ensure accurate results.
The simplest form of validation is checking the input to make sure it is made up of characters from the “valid”
set. For example, a validation process for a telephone directory application should validate the input telephone
numbers to make sure that they contain only digits, plus/minus symbols and brackets (and nothing else).
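The telephone number example can be sketched in a few lines of Python. This is only an illustration: the function name and the decision to reject empty input are assumptions, not part of the lecture.

```python
# Characters allowed in a telephone number field, as described above:
# digits, plus/minus symbols and brackets (and nothing else).
ALLOWED_PHONE_CHARS = set("0123456789+-()")


def has_only_valid_chars(phone: str) -> bool:
    """Return True if phone is non-empty and uses only allowed characters."""
    # Assumption: an empty string is treated as invalid input.
    return bool(phone) and all(ch in ALLOWED_PHONE_CHARS for ch in phone)


print(has_only_valid_chars("+92(51)123-4567"))  # True
print(has_only_valid_chars("051-CALL-NOW"))     # False, contains letters
```

A check like this only confirms the character set. It does not confirm that the number exists or has the right length.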
Types of Data Validation
There are many types of data validation. Most data validation procedures will perform one or more of these checks
to ensure that the data is correct before storing it in the database.
Common types of data validation checks include:
1) Data Type Check
A data type check confirms that the data entered has the correct data type. For example, a field might only accept
numeric data. If this is the case, then any data containing other characters such as letters or special symbols should
be rejected by the system.
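A minimal sketch of a data type check, assuming that "numeric" here means a non-negative whole number typed as ASCII digits. The function name is hypothetical; a real system might also need to accept signs or decimal points.

```python
def is_numeric_field(value: str) -> bool:
    """Return True if value consists only of the ASCII digits 0-9."""
    # isascii() excludes digits from other scripts; isdigit() rejects
    # letters, symbols and the empty string.
    return value.isascii() and value.isdigit()


print(is_numeric_field("2024"))   # True
print(is_numeric_field("20a4"))   # False, contains a letter
print(is_numeric_field("20.4"))   # False, contains a special symbol
```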
2) Code Check
A Code Check ensures that a field is chosen from a valid list of values or that certain formatting rules are followed.
For example, the validity of a postal code is easy to verify by comparing it to a list of valid codes. Other items,
such as country codes and NAICS industry codes, can be approached in the same way.
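A code check can be sketched as a lookup against a list of accepted values. The small set of country codes below is an illustrative subset chosen for this example, and the function name is hypothetical; a real application would load the full list from a reference table.

```python
# Illustrative subset of two-letter country codes (assumption: a real
# system would load the complete list from a maintained source).
VALID_COUNTRY_CODES = {"PK", "US", "GB", "DE", "JP"}


def is_valid_country_code(code: str) -> bool:
    """Return True if code appears in the list of valid codes."""
    # Normalise spacing and case before the lookup.
    return code.strip().upper() in VALID_COUNTRY_CODES


print(is_valid_country_code("pk"))  # True
print(is_valid_country_code("XX"))  # False, not in the list
```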
3) Range Check
A Range Check will determine whether the input data falls within a given range. Latitude and longitude, for
example, are frequently used in geographic data. Latitude should be between -90 and 90, and longitude should be
between -180 and 180. Any values outside of this range are considered invalid.
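The latitude and longitude rule translates directly into a pair of comparisons. The sketch below assumes the limits are inclusive; the function name is hypothetical.

```python
def is_valid_coordinate(latitude: float, longitude: float) -> bool:
    """Return True if both values fall inside their allowed ranges."""
    # Assumption: the boundary values themselves (for example 90) are valid.
    return -90 <= latitude <= 90 and -180 <= longitude <= 180


print(is_valid_coordinate(33.68, 73.05))  # True
print(is_valid_coordinate(95.0, 10.0))    # False, latitude out of range
```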
4) Format Check
Many data types follow a certain predefined format. A common use case is date columns that are stored in a fixed
format like “YYYY-MM-DD” or “DD-MM-YYYY.” A data validation procedure that ensures dates are in the proper
format helps maintain consistency across data and through time.
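A format check for the "YYYY-MM-DD" layout might look like the sketch below. It first checks the shape of the text with a regular expression and then asks the standard library to parse it, so impossible dates are also rejected. The function name is an assumption made for this example.

```python
import re
from datetime import datetime

# Exactly four digits, two digits, two digits, separated by hyphens.
DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(text: str) -> bool:
    """Return True if text is a real calendar date written as YYYY-MM-DD."""
    if not DATE_SHAPE.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False  # right shape, but not a real date
    return True


print(is_iso_date("2024-02-29"))  # True, 2024 is a leap year
print(is_iso_date("29-02-2024"))  # False, wrong field order
```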
5) Consistency Check
A consistency check is a type of logical check that confirms the data has been entered in a logically consistent way. An
example is checking if the delivery date is after the shipping date for a parcel.
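The parcel example can be sketched as a comparison between two fields of the same record. The function name is hypothetical, and the sketch assumes delivery must be strictly later than shipping; if same-day delivery is possible, `>=` would be the appropriate comparison.

```python
from datetime import date


def is_delivery_consistent(shipping_date: date, delivery_date: date) -> bool:
    """Return True if the parcel was delivered after it was shipped."""
    # Assumption: a delivery on the shipping day itself is not allowed.
    return delivery_date > shipping_date


print(is_delivery_consistent(date(2024, 3, 1), date(2024, 3, 4)))  # True
print(is_delivery_consistent(date(2024, 3, 4), date(2024, 3, 1)))  # False
```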
6) Uniqueness Check
Some data like IDs or e-mail addresses are unique by nature. A database should likely have unique entries on these
fields. A uniqueness check ensures that an item is not entered multiple times into a database.
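In a database, uniqueness is usually enforced by the database itself (for example with a unique constraint). The sketch below only illustrates the idea on a plain list of e-mail addresses before loading. The function name and the choice to ignore case and surrounding spaces are assumptions made for this example.

```python
def find_duplicates(values: list[str]) -> set[str]:
    """Return the normalised values that occur more than once."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for value in values:
        # Assumption: addresses that differ only in case or in
        # surrounding spaces count as the same entry.
        key = value.strip().lower()
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


emails = ["a@example.com", "b@example.com", "A@example.com "]
print(find_duplicates(emails))  # {'a@example.com'}
```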
Importance of data validation
• Analysts can limit the quantity of inaccurate data in their warehouse by validating their data.
• Validating the accuracy, clarity, and specificity of data is necessary to mitigate project problems. Without
validating data, you risk making decisions based on inaccurate, unrepresentative data.
• Data validation is used in the ETL (Extract, Transform, and Load) process and in data warehousing. It allows
an analyst to better understand the scope of data conflicts.
• It is also important to test the data model. If the data model is set up and structured correctly, you can use
data files in different programs and applications.
Benefits of data validation
Data validation ensures that the data collected is accurate and of high quality. It also makes sure that the
data collected from different sources meets business requirements. Some benefits of data validation are:
• It is cost-effective: it saves time and money by making sure that the datasets collected and
used in processing are clean and accurate.
• It is easy to integrate and is compatible with most processes.
• It ensures that the data collected from different sources (structured or unstructured) meets the business
requirement by creating a standard database and cleaning dataset information.
• With increased data accuracy, it ensures increased profitability and reduced loss in the long run.
• It also supports better decision-making and strategy, and helps in meeting market goals.
Key Difference Between Verification and Validation
Data Verification | Data Validation
Makes sure the data entered is the same as the source. | Makes sure the data entered meets the specified criteria.
Takes place when using existing data. | Takes place when creating or adding new data.
Example: when a user enters an email address to reset a password and presses submit, the address is matched against the system repository. | Example: when a user enters data to log in or fill in a form, its syntax is checked to determine whether it is correct or incorrect.
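The two examples in the comparison can be contrasted in a short sketch. Everything named here is hypothetical: the pattern is a deliberately loose syntax rule, and the small set stands in for the system repository of registered addresses.

```python
import re

# Deliberately loose syntax rule (assumption): text@text.text, no spaces.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical stand-in for the system repository.
REGISTERED_EMAILS = {"student@example.com"}


def validate_email(email: str) -> bool:
    """Validation: does the input meet the specified syntax criteria?"""
    return EMAIL_PATTERN.fullmatch(email) is not None


def verify_email(email: str) -> bool:
    """Verification: does the input match the data already stored?"""
    return email.lower() in REGISTERED_EMAILS


print(validate_email("new.user@example.com"))  # True, syntax is correct
print(verify_email("new.user@example.com"))    # False, not in the repository
```

An address can therefore be valid without being verified: it is well formed, but it does not match any stored record.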
Review Questions
1. What are the types of data validation?
2. Write down some advantages of data validation.
3. What is the difference between data verification and validation?