30-Apr-24

Accounting Information Systems


Fifteenth Edition, Global Edition

Chapter 6
Transforming Data

• Copyright © 2021 Pearson Education Ltd.

Learning Objectives
• Describe the principles of data structuring related to
aggregating data, data joining, and data pivoting.
• Describe data parsing, data concatenation, cryptic data
values, misfielded data values, data formatting, and data
consistency and how they relate to data standardization.
• Describe how to diagnose and fix the data cleaning errors
related to data duplication, data filtering, data contradiction
errors, data threshold violations, violated attribute
dependencies, and data entry errors.
• List and describe four different techniques to perform data
validation.

Table 6.1 Attributes of High-Quality Data

Data Structuring (1 of 2)
• Data structuring is the process of changing the
organization and relationships among data fields to
prepare the data for analysis.
• Extracted data often needs to be structured in a manner
that will enable analysis. This can entail
– aggregating the data at different levels of detail
– joining different data together, and/or
– pivoting the data

Data Structuring (2 of 2)
• Aggregate data is the presentation of data in a
summarized form.
• Data joining is the process of combining different data
sources.
• Data pivoting is rotating data from rows to columns.
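The three structuring operations can be sketched in plain Python; the vendors, categories, and cost figures below are made up for illustration:

```python
from collections import defaultdict

# Sample purchase records from one source (illustrative values)
purchases = [
    {"vendor": "B&D", "category": 1, "cost": 100.0},
    {"vendor": "B&D", "category": 2, "cost": 50.0},
    {"vendor": "Oster", "category": 1, "cost": 75.0},
]
# A second source to join against (vendor code -> full name)
vendors = {"B&D": "Black & Decker", "Oster": "Oster"}

# Aggregating: summarize cost at the vendor level of detail
totals = defaultdict(float)
for p in purchases:
    totals[p["vendor"]] += p["cost"]

# Joining: combine the two sources by looking up the full vendor name
joined = [{**p, "full_name": vendors[p["vendor"]]} for p in purchases]

# Pivoting: rotate category values from rows into columns per vendor
pivot = defaultdict(dict)
for p in purchases:
    pivot[p["vendor"]][p["category"]] = p["cost"]
```

In a spreadsheet or pandas the same three steps would be a SUM-by-group, a lookup/merge, and a pivot table.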

Figure 6.4 Examples of Different Levels of Aggregating Data

Figure 6.5 Pivoting S&S Data

Figure 6.6 Pivoting Figure 6.5 Data


VendorName         ProdCat  TotalCosts
B&D                      1  $15,982.00
B&D                      2   $2,529.00
Black and Decker         1   $2,220.06
Black and Decker         2     $568.00
Black and Decker         3  $13,024.57
Calphalon                1  $19,509.75
Honeywell                2   $5,516.90
Honeywell                1  $43,282.53
Oster                    1  $28,020.11
Panasonic                1  $15,765.12
Panasonic                2   $5,693.50

Data Standardization (1 of 3)
• Data standardization is the process of standardizing the
structure and meaning of each data element so it can be
analyzed and used in decision making.
– It is particularly important when merging data from
several sources.
– It may involve changing data to a common format, data
type, or coding scheme.
– It encompasses ensuring the information is contained
in the correct field and the fields are organized in a
useful manner.
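Converting to a common format can be sketched as follows; the two incoming date formats and the choice of ISO 8601 as the common format are assumptions for illustration:

```python
from datetime import datetime

# Dates arriving from two sources in different formats (hypothetical)
raw_dates = ["30-Apr-24", "2024/05/01"]
KNOWN_FORMATS = ["%d-%b-%y", "%Y/%m/%d"]

def standardize(value):
    # Try each known source format and emit ISO 8601 as the common format
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value}")

clean = [standardize(d) for d in raw_dates]
```

The same try-each-known-format pattern applies to standardizing currencies, units, or coding schemes when merging sources.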

Data Standardization (2 of 3)
• Data parsing involves separating data from a single field
into multiple fields.
– It is often an iterative process that relies heavily on
pattern recognition.
• Data concatenation is the combining of data from two or
more fields into a single field.
– It is often used to create a unique identifier for a row.
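Both operations can be sketched in a few lines; the location field and the choice of identifier components are made up for illustration:

```python
# Parsing: separate a single "City, State" field into two fields
rows = [{"location": "Dallas, TX"}, {"location": "Provo, UT"}]
for r in rows:
    city, state = r.pop("location").split(", ")
    r["city"], r["state"] = city, state

# Concatenation: combine fields into a single unique row identifier
for r in rows:
    r["row_id"] = f'{r["state"]}-{r["city"]}'
```

Real parsing is usually messier (inconsistent delimiters, missing pieces), which is why the slides call it an iterative, pattern-recognition process.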

Figure 6.7 Data Parsing Example

Figure 6.8 Data Concatenation Example

Data Standardization (3 of 3)
• Cryptic data values are data items that have no meaning
without understanding a coding scheme.
– When a field contains only two different responses,
typically 0 or 1, this field is called a dummy variable or
dichotomous variable.
• Misfielded data values are data values that are correctly
formatted but not listed in the correct field.
• Data consistency is the principle that every value in a
field should be stored in the same way.
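Decoding cryptic values and a dummy variable can be sketched as below; the coding scheme and field names are hypothetical:

```python
# Cryptic codes mean nothing without the coding scheme (scheme is made up)
category_codes = {1: "Power Tools", 2: "Kitchen", 3: "Home"}
records = [{"prod_cat": 1, "taxable": 1}, {"prod_cat": 3, "taxable": 0}]

for r in records:
    # Decode the cryptic value using the coding scheme
    r["prod_cat_name"] = category_codes[r["prod_cat"]]
    # The 0/1 dummy (dichotomous) variable becomes an explicit True/False
    r["taxable"] = bool(r["taxable"])
```

Documenting the coding scheme alongside the data is what keeps such fields usable by other analysts.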

Data Cleaning (1 of 3)
• Data cleaning is the process of updating data to be
consistent, accurate, and complete.
– Dirty data is data that is inconsistent, inaccurate, or
incomplete.
– To be useful, dirty data must be cleaned.
• Data de-duplication is the process of analyzing data to
find two or more records that contain identical
information and removing the redundant copies.
• Data filtering is the process of removing records or fields
of information from a data source.
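Both cleaning steps can be sketched as follows; the records and the filtering criterion are made up for illustration:

```python
records = [
    {"id": 1, "vendor": "Oster"},
    {"id": 1, "vendor": "Oster"},   # exact duplicate of the first record
    {"id": 2, "vendor": "B&D"},
]

# De-duplication: keep one copy of each identical record
seen, deduped = set(), []
for r in records:
    key = tuple(sorted(r.items()))  # hashable fingerprint of the whole record
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Filtering: remove records out of scope for the analysis (criterion assumed)
oster_only = [r for r in deduped if r["vendor"] == "Oster"]
```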

Data Cleaning (2 of 3)
• Data imputation is the process of replacing a null or
missing value with a substituted value.
– Common techniques, such as substituting the mean or
median, work only with numeric data.
• Data contradiction errors are errors that exist when the
same entity is described in two conflicting ways.
– Contradiction errors need to be investigated and
resolved appropriately.
• Data threshold violations are data errors that occur when
a data value falls outside an allowable level.
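Mean imputation and a threshold check can be sketched together; the cost values and the allowable limit are made up for illustration:

```python
costs = [10.0, None, 30.0, 500.0]

# Imputation: replace a missing value with the mean of the known values
known = [c for c in costs if c is not None]
mean = sum(known) / len(known)
imputed = [mean if c is None else c for c in costs]

# Threshold violation: flag values outside the allowable level (limit assumed)
LIMIT = 100.0
violations = [c for c in imputed if c > LIMIT]
```

Flagged values are not automatically deleted; like contradiction errors, they need to be investigated and resolved appropriately.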

Data Cleaning (3 of 3)
• Violated attribute dependencies are errors that occur
when a secondary attribute in a row of data does not
match the primary attribute.
• Data entry errors are all types of errors that come from
inputting data incorrectly.
– They often occur in human data entry and can also be
introduced by the computer system.
– They may be indistinguishable from data formatting
and data consistency errors in an output data file.
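A violated attribute dependency can be detected by checking the secondary attribute against the primary one; the ZIP-prefix-to-state table here is illustrative and far from complete:

```python
# Secondary attribute (state) must agree with the primary attribute (ZIP code)
zip_to_state = {"75": "TX", "84": "UT"}  # illustrative prefix table
rows = [
    {"zip": "75201", "state": "TX"},
    {"zip": "84604", "state": "TX"},   # violated dependency: 84xxx ZIPs are Utah
]
violations = [r for r in rows
              if zip_to_state.get(r["zip"][:2]) != r["state"]]
```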

Data Validation (1 of 2)
• Data validation is the process of analyzing data to make
certain the data has the properties of high-quality data.
– It is both a formal and informal process.
– It is an important precursor to data cleaning.
– The techniques used to validate data can be thought of
as a continuum from simple to complex.

Data Validation (2 of 2)
• Visual inspection is the process of examining data using
human vision to see if there are problems.
• Basic statistical tests can be performed to validate the
data.
• Auditing a sample of the data is one of the best
techniques for assuring data quality.
• Advanced testing techniques are possible with a deeper
understanding of the content of data.
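A basic statistical validation pass can be sketched as a quick profile of a field; the amounts below are made up for illustration:

```python
# Basic statistical tests: record count, missing values, and range of a field
amounts = [15982.00, 2529.00, None, 19509.75]
present = [a for a in amounts if a is not None]

stats = {
    "count": len(amounts),
    "missing": len(amounts) - len(present),
    "min": min(present),
    "max": max(present),
}
# An unexpected missing count or an out-of-range min/max flags data to investigate
```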

Key Terms
• Data structuring
• Aggregate data
• Data pivoting
• Data standardization
• Data parsing
• Data concatenation
• Cryptic data values
• Dummy variable or dichotomous variable
• Misfielded data values
• Data consistency
• Dirty data
• Data cleaning
• Data de-duplication
• Data filtering
• Data imputation
• Data contradiction errors
• Data threshold violations
• Violated attribute dependencies
• Data entry errors
• Data validation
• Visual inspection