What is Data Validation
Data validation is the process of checking the accuracy and quality of collected
data before it is used. Any data handling task, whether gathering data, analyzing
it, or structuring it for presentation, must include data validation to ensure
accurate results. Because validation takes time, it can be tempting to skip it.
However, it is an essential step toward getting the best possible results.
The data validation process has gained significant importance in organizations
that collect, process, and analyze data. It is considered the foundation of
efficient data management because it ensures that analytics are based on
meaningful and valid datasets.
Validation scripts
A validation script is an attribute of a validation rule asset. It contains the
validation logic, which expresses a single condition to evaluate assets.
The outcome of the evaluation is binary:
valid: The validated asset meets the condition.
invalid: The validated asset does not meet the condition.
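As a rough analogy (not the actual validation-script syntax), a validation script behaves like a function that takes one asset and returns a single boolean. The sketch below assumes an asset is represented as a Python dictionary of attribute names to values. The attribute name `description` and the functions `validate` and `outcome` are hypothetical.

```python
from typing import Any, Mapping


def validate(asset: Mapping[str, Any]) -> bool:
    """Hypothetical single-condition rule: the asset needs a non-empty description."""
    description = asset.get("description")
    # The outcome is binary: True means valid, False means invalid.
    return isinstance(description, str) and description.strip() != ""


def outcome(asset: Mapping[str, Any]) -> str:
    """Translate the boolean result into the two outcomes described above."""
    return "valid" if validate(asset) else "invalid"


print(outcome({"description": "Customer master data"}))  # valid
print(outcome({"description": "   "}))                     # invalid
```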
Within the validation script, you can use the following features:
Aggregate functions
Validation functions
Multi-line boolean expressions
Collections
Closures
Aggregate functions
Aggregate functions allow you to extract a value from a collection. For more
information, see Collections.
They are not validation functions themselves, but they make it easier to retrieve
data that you want to refer to in the rest of your validation script.
Function          Result
max(Collection)   The maximum of the given collection of values.
min(Collection)   The minimum of the given collection of values.
avg(Collection)   The average of the given collection of number values.
sum(Collection)   The sum of the given collection of number values.
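The four aggregates can be illustrated with Python built-ins. This sketch only shows what each aggregate returns. It does not use validation-script syntax, and the sample values and the thresholds in the final condition are made up.

```python
from statistics import mean

# Hypothetical collection of numeric attribute values retrieved from assets.
row_counts = [120, 95, 310, 42]

largest = max(row_counts)    # maximum of the collection
smallest = min(row_counts)   # minimum of the collection
average = mean(row_counts)   # average of numeric values
total = sum(row_counts)      # sum of numeric values

# Aggregates are not checks themselves; they feed a condition.
is_valid = smallest > 0 and average < 1000

print(largest, smallest, average, total, is_valid)  # 310 42 141.75 567 True
```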
Validation functions
To make validation scripts more readable, many packaged validation functions are
available. The built-in validation functions not only perform the validation but
also produce meaningful error messages. For more advanced functions, you need a
closure. For more information, see the following topics:
General functions
Error messages
String functions
Number functions
Date conversion functions
Date functions
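The packaged functions pair each check with an error message. The following Python sketch illustrates that idea only. The `ValidationResult` type and the `check_max_length` helper are hypothetical stand-ins, not the packaged functions themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResult:
    valid: bool
    message: Optional[str] = None  # set only when the check fails


def check_max_length(value: str, limit: int, field: str) -> ValidationResult:
    """Hypothetical string function: passes if value has at most `limit` characters."""
    if len(value) <= limit:
        return ValidationResult(True)
    # A meaningful error message states what failed and why.
    return ValidationResult(
        False, f"{field} has {len(value)} characters; the maximum is {limit}."
    )


print(check_max_length("Sales", 10, "Name"))
print(check_max_length("Quarterly revenue report", 10, "Name"))
```

Returning the message together with the boolean keeps the check and its explanation in one place, which is the benefit the built-in functions are described as providing.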
Multi-line boolean expressions
The validation script of a validation rule may consist of multiple boolean
expressions. To make them easier to read, you can use dedicated constructs.
Using these constructs makes your rule a little more verbose, but in most cases
more readable.
Constructs
Expression   Purpose
allOf        All of the conditions must be valid (= AND).
anyOf        One or more of the conditions must be valid (= OR).
condition    Captures the result of a boolean expression so that it can be reused in other constructs.
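In Python terms, the three constructs correspond roughly to `all()`, `any()`, and a named boolean variable. This is an analogy only, not the construct syntax. The asset attributes `status`, `owner`, and `steward` and the function `rule` are hypothetical.

```python
from typing import Any, Mapping


def rule(asset: Mapping[str, Any]) -> bool:
    # condition: capture a boolean result once so it can be reused below.
    is_approved = asset.get("status") == "Approved"

    # anyOf: at least one responsible party must be set (= OR).
    has_responsible = any([
        asset.get("owner") is not None,
        asset.get("steward") is not None,
    ])

    # allOf: every condition must hold (= AND).
    return all([is_approved, has_responsible])


print(rule({"status": "Approved", "owner": None, "steward": "jdoe"}))  # True
print(rule({"status": "Draft", "owner": "jdoe", "steward": None}))     # False
```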
Collections
A collection is a list of data. You usually create one when retrieving attributes
or relations.
You can refer to a specific element in a collection in several ways.
Note: If a reference points to a value that doesn't exist, the rule crashes. For
example, retrieving the fifth value of a list that contains only four values causes
a crash.
Reference        Result
mylist.first()   The first value in the list mylist.
mylist.last()    The last value in the list mylist.
mylist[1]        The second element of the list mylist. The number is the index of the value in the list.
                 Note: The first element has index zero. As a consequence, mylist.first() is equivalent to mylist[0].
mylist.get(1)    The second element of the list mylist. This is very similar to the example above.
mylist?.get(1)   The second element of the list mylist. However, this syntax is null-safe: ?. is the null-safe operator. If the reference is not present, the result is null, whereas the expression without the question mark fails. You cannot combine ?. with the [ ... ] notation.
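Python has no `?.` operator, so the sketch below mimics the references in the table with plain list operations and a hypothetical `safe_get` helper. The helper returns `None` when the list itself is missing, which is an assumption about what "not present" means here. The sketch also reproduces the crash described in the note: indexing past the end raises an `IndexError`. It illustrates the described behavior and is not how the validation engine works.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def safe_get(items: Optional[Sequence[T]], index: int) -> Optional[T]:
    """Hypothetical stand-in for mylist?.get(index): None if the list is missing."""
    if items is None:
        return None
    return items[index]  # still raises IndexError if the index is out of range


mylist = ["a", "b", "c", "d"]

print(mylist[0])          # like mylist.first(): indexes start at zero
print(mylist[-1])         # like mylist.last()
print(mylist[1])          # like mylist[1] or mylist.get(1): the second element
print(safe_get(None, 1))  # None instead of a crash

try:
    fifth = mylist[4]     # fifth value of a four-element list
except IndexError as err:
    print("crash:", err)
```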
Types of Data Validation
TYPE CHECK
A type check confirms that each value has the expected data type. Data comes in
different types. One type is numerical data, such as years, ages, grades, or postal
codes (where they are purely numeric). Although all of these are numbers, they can
be either integers or floats. For example, a year can’t be 2010.14 because years
must be integers. Grades, on the other hand, can be either an integer (99) or a
float (90.5). Another type is text data, such as names, addresses, or email addresses.
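A type check can be sketched in Python with `isinstance`. The expected types follow the examples above (years as integers, grades as integers or floats, names as text). The `EXPECTED_TYPES` schema and `type_check` function are hypothetical. Because `bool` is a subclass of `int` in Python, the sketch rejects booleans explicitly.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema: field name -> accepted types.
EXPECTED_TYPES: Dict[str, Tuple[type, ...]] = {
    "year": (int,),
    "grade": (int, float),
    "name": (str,),
}


def type_check(record: Dict[str, Any]) -> List[str]:
    """Return a list of type errors; an empty list means the record passes."""
    errors = []
    for field, accepted in EXPECTED_TYPES.items():
        value = record.get(field)  # a missing field yields None and fails
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, accepted):
            errors.append(f"{field}: unexpected type {type(value).__name__}")
    return errors


print(type_check({"year": 2010, "grade": 90.5, "name": "Ada"}))   # []
print(type_check({"year": 2010.14, "grade": 99, "name": "Ada"}))  # ['year: unexpected type float']
```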
FORMAT CHECK
Format checking validates the data’s structure. For example, birthdays have a
specific format, such as YYYY-MM-DD. Having the data in this format is essential
for the project’s next steps, so checking that your data has the correct structure
is vital. Before you validate the data structure, define the correct structure
clearly so that the validation process is consistent and straightforward.
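A format check for the YYYY-MM-DD layout can be sketched with a regular expression. It checks structure only: four digits, a hyphen, two digits, a hyphen, and two digits. The `re.ASCII` flag limits `\d` to the digits 0 to 9. A value such as 1990-13-06 still passes, because this check does not ask whether month 13 exists. The names `DATE_PATTERN` and `format_check` are hypothetical.

```python
import re

# Structure only: four digits, hyphen, two digits, hyphen, two digits.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)


def format_check(value: str) -> bool:
    """True if value has the YYYY-MM-DD structure (not necessarily a real date)."""
    return DATE_PATTERN.fullmatch(value) is not None


print(format_check("1990-06-13"))  # True
print(format_check("13/06/1990"))  # False
print(format_check("1990-13-06"))  # True: well formed, even though there is no month 13
```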
CORRECTNESS CHECK
Sometimes data is in the correct format but still contains an invalid value. For
example, a birthday entry may be 1990-13-06. Although the format is valid, there’s
no month 13. This step of validation ensures that your values are logical and
meaningful. Another example is checking whether a postal code or a phone number is
valid. This is sometimes referred to as a range check.
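A correctness check goes a step further and asks whether the value is meaningful. One way to sketch this in Python is to parse the string with `datetime.strptime`, which raises `ValueError` for impossible dates such as month 13, and then apply a simple range check. The bounds used here (no birthdays before 1900 or in the future) and the `correctness_check` name are arbitrary assumptions for illustration.

```python
from datetime import date, datetime


def correctness_check(value: str, earliest: date = date(1900, 1, 1)) -> bool:
    """True if value is a real calendar date within a plausible birthday range."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        # Raised for impossible values such as month 13 or February 30.
        return False
    # Range check: hypothetical bounds for a birthday.
    return earliest <= parsed <= date.today()


print(correctness_check("1990-06-13"))  # True
print(correctness_check("1990-13-06"))  # False: there is no month 13
print(correctness_check("1850-01-01"))  # False: outside the assumed range
```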