30-Apr-24
Accounting Information Systems
Fifteenth Edition, Global Edition
Chapter 6
Transforming Data
• Copyright © 2021 Pearson Education Ltd.
Learning Objectives
• Describe the principles of data structuring related to
data aggregation, data joining, and data pivoting.
• Describe data parsing, data concatenation, cryptic data
values, misfielded data values, data formatting, and data
consistency and how they relate to data standardization.
• Describe how to diagnose and fix the data cleaning errors
related to data duplication, data filtering, data contradiction
errors, data threshold violations, violated attribute
dependencies, and data entry errors.
• List and describe four different techniques to perform data
validation.
Table 6.1 Attributes of High-Quality Data
Data Structuring (1 of 2)
• Data structuring is the process of changing the
organization and relationships among data fields to
prepare the data for analysis.
• Extracted data often needs to be structured in a manner
that will enable analysis. This can entail
– aggregating the data at different levels of detail
– joining different data together, and/or
– pivoting the data
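As a minimal sketch of aggregating data at different levels of detail, the snippet below subtotals hypothetical sales records by region and also computes a grand total; the field names and values are illustrative, not taken from the chapter's data:

```python
from collections import defaultdict

# Hypothetical sales records; field names and values are illustrative.
sales = [
    {"region": "East", "month": "Jan", "amount": 100.0},
    {"region": "East", "month": "Feb", "amount": 150.0},
    {"region": "West", "month": "Jan", "amount": 200.0},
]

# Aggregate at two levels of detail: a subtotal per region and a grand total.
by_region = defaultdict(float)
for row in sales:
    by_region[row["region"]] += row["amount"]

grand_total = sum(row["amount"] for row in sales)
```

The same pattern extends to any grouping level (month, region-and-month, and so on) by changing the dictionary key.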
Data Structuring (2 of 2)
• Aggregate data is the presentation of data in a
summarized form.
• Data joining is the process of combining different data
sources.
• Data pivoting is rotating data from rows to columns.
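A minimal sketch of pivoting: the snippet below rotates long-format rows (vendor, product category, cost) into a wide layout in which each product category becomes a column keyed under its vendor. The values are illustrative:

```python
# Pivot long-format rows (vendor, category, cost) into a wide layout in
# which each product category becomes a column. Values are illustrative.
rows = [
    ("B&D", 1, 15982.00),
    ("B&D", 2, 2529.00),
    ("Honeywell", 1, 43282.53),
]

pivot = {}
for vendor, category, cost in rows:
    pivot.setdefault(vendor, {})[category] = cost
```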
Figure 6.4 Examples of Different Levels of Aggregating Data
Figure 6.5 Pivoting S&S Data
Figure 6.6 Pivoting Figure 6.5 Data
VendorName         ProdCat  TotalCosts
B&D                1        $15,982.00
B&D                2        $2,529.00
Black and Decker   1        $2,220.06
Black and Decker   2        $568.00
Black and Decker   3        $13,024.57
Calphalon          1        $19,509.75
Honeywell          2        $5,516.90
Honeywell          1        $43,282.53
Oster              1        $28,020.11
Panasonic          1        $15,765.12
Panasonic          2        $5,693.50
Data Standardization (1 of 3)
• Data standardization is the process of standardizing the
structure and meaning of each data element so it can be
analyzed and used in decision making.
– It is particularly important when merging data from
several sources.
– It may involve changing data to a common format, data
type, or coding scheme.
– It encompasses ensuring the information is contained
in the correct field and the fields are organized in a
useful manner.
Data Standardization (2 of 3)
• Data parsing involves separating data from a single field
into multiple fields.
– It is often an iterative process that relies heavily on
pattern recognition.
• Data concatenation is the combining of data from two or
more fields into a single field.
– It is often used to create a unique identifier for a row.
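Both operations can be sketched in a few lines. Below, a single "Last, First" name field is parsed into two fields, and an invoice number and line number are concatenated into a unique row identifier; the field names and values are hypothetical:

```python
# Parsing: split a single "Last, First" name field into two fields.
full_name = "Ashton, Scott"   # illustrative value
last, first = [part.strip() for part in full_name.split(",")]

# Concatenation: combine invoice and line numbers into a unique row key.
invoice_no, line_no = "10231", "03"   # illustrative values
row_id = f"{invoice_no}-{line_no}"
```

Real name data is messier (middle names, suffixes, missing commas), which is why parsing is usually iterative and pattern-driven, as noted above.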
Figure 6.7 Data Parsing Example
Figure 6.8 Data Concatenation Example
Data Standardization (3 of 3)
• Cryptic data values are data items that have no meaning
without understanding a coding scheme.
– When a field contains only two different responses,
typically 0 or 1, this field is called a dummy variable or
dichotomous variable.
• Misfielded data values are data values that are correctly
formatted but not listed in the correct field.
• Data consistency is the principle that every value in a
field should be stored in the same way.
Data Cleaning (1 of 3)
• Data cleaning is the process of updating data to be
consistent, accurate, and complete.
– Dirty data is data that is inconsistent, inaccurate, or
incomplete.
– To be useful, dirty data must be cleaned.
• Data de-duplication is the process of analyzing data and,
when two or more records contain identical information,
removing the redundant copies so only one remains.
• Data filtering is the process of removing records or fields
of information from a data source.
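The two steps above can be sketched together: drop exact duplicate records (keeping the first copy) and then filter out records below a cutoff. The records and the $100 cutoff are illustrative:

```python
# De-duplication: drop exact duplicate records, keeping the first copy.
# Filtering: then remove records below a cutoff. Values are illustrative.
records = [("INV-1", 500.0), ("INV-2", 25.0), ("INV-1", 500.0)]

deduped = list(dict.fromkeys(records))            # preserves first-seen order
filtered = [r for r in deduped if r[1] >= 100.0]  # keep invoices of $100+
```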
Data Cleaning (2 of 3)
• Data imputation is the process of replacing a null or
missing value with a substituted value.
– It only works with numeric data.
• Data contradiction errors are errors that exist when the
same entity is described in two conflicting ways.
– Contradiction errors need to be investigated and
resolved appropriately.
• Data threshold violations are data errors that occur when
a data value falls outside an allowable level.
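A minimal sketch combining the two ideas above, under illustrative assumptions: missing (None) values in a numeric field are imputed with the mean of the observed values, and any value outside an allowable 0-80 range for weekly hours is flagged as a threshold violation:

```python
from statistics import mean

# Mean imputation for missing (None) values, followed by a threshold check
# (here: weekly hours must fall between 0 and 80). Values are illustrative.
hours = [40, None, 38, 45, 120]
observed = [h for h in hours if h is not None]
imputed = [h if h is not None else mean(observed) for h in hours]

violations = [h for h in imputed if not 0 <= h <= 80]
```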
Data Cleaning (3 of 3)
• Violated attribute dependencies are errors that occur
when a secondary attribute in a row of data does not
match the primary attribute.
• Data entry errors are all types of errors that come from
inputting data incorrectly.
– They often occur in human data entry and can also be
introduced by the computer system.
– They may be indistinguishable from data formatting
and data consistency errors in an output data file.
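A violated attribute dependency can be diagnosed with a lookup: if one attribute (here, city) should determine another (state), any row where the two disagree is flagged for review. The mapping and rows below are hypothetical:

```python
# Violated attribute dependency check: a city should determine its state,
# so any row whose state disagrees with the lookup is flagged for review.
# The mapping and rows are hypothetical.
city_to_state = {"Dayton": "OH", "Boise": "ID"}
rows = [
    {"city": "Dayton", "state": "OH"},
    {"city": "Boise", "state": "OH"},   # dependency violated
]

violations = [r for r in rows if city_to_state.get(r["city"]) != r["state"]]
```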
Data Validation (1 of 2)
• Data validation is the process of analyzing data to make
certain the data has the properties of high-quality data.
– It is both a formal and informal process.
– It is an important precursor to data cleaning.
– The techniques used to validate data can be thought of
as a continuum from simple to complex.
Data Validation (2 of 2)
• Visual inspection is the process of examining data using
human vision to see if there are problems.
• Basic statistical tests can be performed to validate the
data.
• Auditing a sample is one of the best techniques for
assuring data quality.
• Advanced testing techniques are possible with a deeper
understanding of the content of data.
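As a minimal sketch of the basic statistical tests mentioned above, the snippet summarizes a field's null count, minimum, and maximum, any of which can quickly expose missing or out-of-range values. The data is illustrative:

```python
from statistics import mean

# A basic statistical validation pass: null count, minimum, maximum, and
# mean can quickly expose missing or out-of-range values. Data illustrative.
amounts = [19.99, 250.00, None, 4.50, 99999.00]
observed = [a for a in amounts if a is not None]

summary = {
    "n_null": amounts.count(None),
    "min": min(observed),
    "max": max(observed),
    "mean": round(mean(observed), 2),
}
```

Here the suspiciously large maximum would prompt a closer look, which is exactly the kind of lead a validation pass is meant to surface.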
Key Terms
• Data structuring
• Aggregate data
• Data pivoting
• Data standardization
• Data parsing
• Data concatenation
• Cryptic data values
• Dummy variable or dichotomous variable
• Misfielded data values
• Data consistency
• Dirty data
• Data cleaning
• Data de-duplication
• Data filtering
• Data imputation
• Data contradiction errors
• Data threshold violations
• Violated attribute dependencies
• Data entry errors
• Data validation
• Visual inspection