Session 2 - Data Structure, Data Management and Data Quality

The document outlines a syllabus for a business analytics course focused on marketing, detailing problem-solving frameworks using data, data wrangling, analysis, and visualization techniques. It includes practice cases involving Amazon and Netflix to apply structured problem-solving approaches to real-world challenges. Additionally, it covers data types, data quality, and data cleaning processes essential for effective data analysis.

Uploaded by

Vũ Ngọc Uyên Phương

BUSINESS ANALYTICS IN MARKETING

GEMS UPDATE
TEAMS

SYLLABUS

Problem-solving using data
• Problem statement & goal setting
• Data analytics roadmap
• Analytics tools

Data wrangling & analysis
• Data wrangling
• Data analysis
• RFM analysis exercise

Visualization & storytelling
• Graphs, charts & dashboards
• Story-telling
• Common pitfalls

Expert sharing

HOW TO SOLVE PROBLEMS USING DATA?

Step 1: Problem statement
Step 2: Data wrangling
Step 3: Data analysis
Step 4: Data visualization
Step 5: Communication

DATA WRANGLING & ANALYSIS

HOW TO SOLVE PROBLEMS USING DATA?

Step 2: DATA WRANGLING

• Data discovery
• Data structuring
• Data cleaning
• Data enriching
• Data validating
• Data publishing

HOW TO SOLVE PROBLEMS USING DATA?

Step 2: DATA WRANGLING

DATA DISCOVERY
What does this dataset mean?

HOW TO SOLVE PROBLEMS USING DATA?

Step 2: DATA WRANGLING

DATA STRUCTURING
A dataset is a collection of values. Every value belongs to a variable and an observation.

Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.

A variable contains all values that measure the same underlying attribute across units. An observation contains all values measured on the same unit across attributes.

PRACTICE: PROBLEM-SOLVE
HOW TO SOLVE PROBLEMS USING DATA?

GROUP DISCUSSION
Time: 30’
Present: 5’

Objective:
Today, you’ll practice applying structured problem-solving frameworks to tackle a real-world business challenge. Your goal is to identify, break down, and prioritize problems, and then develop data-driven insights and actionable recommendations.

Your Task:
Pick 1 case and work through this challenge using a structured problem-solving approach.

Step 1: Define the Problem
Draft a clear and concise problem statement (1-2 sentences).

Step 2: Break It Down Using the MECE Framework
Decompose the problem into Mutually Exclusive, Collectively Exhaustive sub-problems.
Think in logical buckets: hiring process, onboarding, compensation, work conditions, management, etc.

Step 3: Make Assumptions & Identify Data Needs
Since detailed data isn’t provided, make reasonable assumptions (e.g., cost per hire, training time, exit interviews).
List the data you would ideally collect to analyze each sub-problem.

Step 4: Prioritize Issues
Use impact vs. ease or cost vs. urgency matrices to determine which problems to tackle first.
Identify the top 1–2 root causes worth solving.

Step 5: Develop Insights & Recommendations
What might be driving the high attrition?
Propose data-driven solutions based on your assumptions and analysis.

Step 6: Present with the STAR Framework (without laptop or supporting materials)
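
The impact vs. ease matrix from Step 4 can be sketched in a few lines of Python. The sub-problems, the 1-5 scores and the cut-off of 3 are hypothetical placeholders chosen for illustration, not data from either case; the sketch only shows how two scores place each issue in a quadrant.

```python
# Hypothetical sub-problems scored 1-5 by the team (assumed values, not case data).
issues = {
    "onboarding": {"impact": 5, "ease": 4},
    "compensation": {"impact": 4, "ease": 2},
    "work conditions": {"impact": 2, "ease": 4},
    "hiring process": {"impact": 2, "ease": 1},
}

THRESHOLD = 3  # assumed cut-off between "low" and "high"


def quadrant(impact: int, ease: int) -> str:
    """Place an issue in one of the four cells of an impact vs. ease matrix."""
    if impact >= THRESHOLD and ease >= THRESHOLD:
        return "quick win"       # high impact, easy: tackle first
    if impact >= THRESHOLD:
        return "major project"   # high impact, hard: plan carefully
    if ease >= THRESHOLD:
        return "fill-in"         # low impact, easy: do if time allows
    return "deprioritize"        # low impact, hard


matrix = {name: quadrant(s["impact"], s["ease"]) for name, s in issues.items()}

# Rank by impact first, then ease, to shortlist the top 1-2 issues.
ranked = sorted(issues, key=lambda n: (issues[n]["impact"], issues[n]["ease"]), reverse=True)
top_two = ranked[:2]

for name in ranked:
    print(f"{name}: {matrix[name]}")
```

The quadrant labels are one common naming convention; a cost vs. urgency matrix would follow the same pattern with different axes.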
PRACTICE: CASE NO.1

Context

Amazon is a multinational technology company based in Seattle, Washington, United
States. It is one of the largest online retailers in the world, selling a wide variety of
products, including electronics, books, clothing, and household items. Amazon is among
the top 5 most valuable companies in terms of market capitalization (Jan 2023).

In 2021, only a third of Amazon’s new hires stayed with the company for more than 90
days before quitting, being fired, or getting laid off.

An investigation from the New York Times found that, among hourly employees,
Amazon’s turnover was approximately 150 percent annually.

Those numbers indicate that Amazon is having serious issues retaining employees.
Amazon estimated that its attrition rate costs it almost $8 billion a year across its global
consumer field operations team.

• Clark M. Leaked documents show just how fast employees are leaving Amazon [Internet]. The Verge. 2022 [cited 2023 Feb 19].
Available from: https://www.theverge.com/2022/10/17/23409920/amazon-third-hires-attrition-cost-workforce
• Villegas A, Beachy S. Inside Amazon’s Employment Machine [Internet]. The New York Times. 2021 [cited 2023 Feb 19].
Available from: https://www.nytimes.com/interactive/2021/06/15/us/amazon-workers.html
PRACTICE: CASE NO.2

Context

Netflix is an American streaming service that provides a wide range of TV shows, movies,
documentaries, and other forms of entertainment to subscribers. It was founded in 1997
originally as a DVD-by-mail service before transitioning to a streaming service in 2007.

During the three-month period ending June 30, 2022, Netflix reported a loss of 970,000
subscribers. This is the largest quarterly loss in the company’s history.

Previously, in April, the company reported that it had lost 200,000 subscribers in the first
quarter of 2022, its first big loss in over a decade.

Netflix’s stock fell approximately 70% from the beginning of the year to July 2022.
Its market valuation decreased from $300 billion to under $90 billion in less than a year.

• Forristal L. Netflix loses 970,000 subscribers, its largest quarterly loss ever [Internet]. TechCrunch. 2022 [cited 2023 Feb 19].
Available from: https://techcrunch.com/2022/07/19/netflix-loses-970000-subscribers-its-largest-quarterly-loss-ever/
DATA STRUCTURE

DATA STRUCTURING
A dataset is a collection of values. Every value belongs to a variable and an observation.

Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.

A variable contains all values that measure the same underlying attribute across units. An observation contains all values measured on the same unit across attributes.

DATA STRUCTURE

Transforming given data to the standard tabular format (variables - observations - values).
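
As a minimal sketch of that transformation, the Python below reshapes a small "wide" table, where each month is its own column, into the standard format with one variable per column and one observation per row. The store names, months and sales figures are made up for illustration.

```python
# A "wide" table: the months are values of a variable (month) but appear as column names.
# All names and numbers are hypothetical.
wide = [
    {"store": "A", "jan": 100, "feb": 120},
    {"store": "B", "jan": 80, "feb": 95},
]


def to_long(rows, id_key, var_name, value_name):
    """Turn every non-id column into its own (variable, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue
            long_rows.append({id_key: row[id_key], var_name: column, value_name: value})
    return long_rows


tidy = to_long(wide, id_key="store", var_name="month", value_name="sales")

for row in tidy:
    print(row)
```

After reshaping, each row is one observation (one store in one month) and each column is one variable (store, month, sales).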
DATA STRUCTURE

EXTRA INFO: RELATIONAL DATABASE

A relational database is a collection of information that organizes data in predefined relationships, where data is stored in one or more tables (or "relations") of columns and rows.

Relationships are logical connections between different tables, established on the basis of interaction among these tables.

A database schema comprises all relationships and defines how data is organized within a relational database.

What is a relational database (RDBMS)? [Internet]. Google. Google; [cited 2023 Feb 19]. Available from: https://cloud.google.com/learn/what-is-a-relational-database#
Brazilian e-commerce public dataset by Olist [Internet]. Kaggle. Olist; 2021 [cited 2023 Feb 19]. Available from: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
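
To make the idea of tables linked by relationships concrete, here is a small sketch using Python's built-in sqlite3 module. The two tables, their columns and the rows are hypothetical; they are not the schema of the Olist dataset cited above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Schema: two tables (relations) and one relationship between them.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
           amount REAL NOT NULL
       )"""
)

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Hanoi"), (2, "Da Nang")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)],
)

# The relationship lets one query combine both tables.
totals = conn.execute(
    """SELECT c.city, SUM(o.amount)
       FROM orders AS o
       JOIN customers AS c ON c.customer_id = o.customer_id
       GROUP BY c.city
       ORDER BY c.city"""
).fetchall()
print(totals)

# An order that points at a non-existent customer violates the relationship.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan order rejected:", rejected)
```

The `REFERENCES` clause is what turns two separate tables into related ones: the database itself refuses an order whose customer does not exist.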
THE MOST COMMON DATA TYPES?

Data types

Step 2 DATA ANALYSIS: DATA TYPES IN EXCEL

There are 4 main data types for data wrangling and analysis using Excel:

● TEXT: A, B, C; apple, Banana; Who?; “10”, “2.1”; “TRUE”; “”
● NUMBER: 1, 2, 3; 1.2, 1.999; -1, -0.9; *date, *time, *duration
● BOOLEAN: TRUE, FALSE
● ERROR: #DIV/0!, #N/A, #NAME?, #NULL!, #NUM!, #REF!, #VALUE!, etc.
Data types

Step 2 DATA ANALYSIS: DATA TYPES IN EXCEL

WARNING: In Excel, the data type (what it is) and data format (how we see it) of one value might be vastly different:

DATA TYPE: TEXT
DATA FORMAT:
● Character: A, B, C
● Special character: !@#$%^&*()
● Text: apple, Banana, ORANGE
● Numbers as text: “0”, “1.1”

DATA TYPE: NUMBER
DATA FORMAT:
● Number: 0, 1.2, -1, -3.5
● Percentage: 12%, 1.5%, -3%
● Accounting; currency: (3), 4; 5000đ, $5.00
● Date; datetime: Feb 19th, 2023; 2023-02-19 17:00:00
● Duration: 3:20:00
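
The same distinction exists outside Excel. The sketch below uses plain Python to show one stored value displayed in several formats, and why numbers stored as text behave differently from real numbers. The sample values are illustrative only.

```python
from datetime import date

# One stored value (data type: number), several ways to display it (data format).
value = 0.5
as_number = f"{value:.2f}"     # plain number
as_percent = f"{value:.0%}"    # percentage
as_currency = f"${value:.2f}"  # currency
print(as_number, as_percent, as_currency)

# A date is also a single stored value that can be shown in different formats.
day = date(2023, 2, 19)
print(day.isoformat(), "|", day.strftime("%d/%m/%Y"))

# Numbers stored as text look like numbers but do not behave like them.
text_values = ["10", "9", "2.1"]
joined = text_values[0] + text_values[2]                     # text is concatenated, not added
sorted_as_text = sorted(text_values)                         # compared character by character
sorted_as_numbers = sorted(float(v) for v in text_values)    # compared by magnitude
print(joined, sorted_as_text, sorted_as_numbers)
```

Converting text to a numeric type before sorting or summing is what removes the mismatch between what the value is and how it looks.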
Data types

EXTRA INFO: DATA TYPES IN DATABASE

A structured database has many data types, and values are heavily validated against them. PostgreSQL has some data types similar to Excel's, including:

● TEXT: TEXT, CHAR, VARCHAR
● NUMBER: FLOAT, INTEGER
● TEMPORAL: DATE, TIME, TIMESTAMP (called DATETIME in some other databases), INTERVAL
● BOOLEAN: TRUE, FALSE
● NULL
DATA QUALITY

Data quality

Korolov M. 6 dimensions of data quality boost data performance [Internet]. TechTarget. 2022 [cited 2023 Feb 19].
Available from: https://www.techtarget.com/searchdatamanagement/tip/6-dimensions-of-data-quality-boost-data-performance
DATA CLEANING

Data cleaning

DATA STRUCTURING
Create unique ids for the observations.
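
One simple way to do this, sketched in Python under the assumption that the rows have no natural key, is to number the observations in order. The field name `obs_id`, the `OBS-` prefix and the sample rows are hypothetical.

```python
# Hypothetical observations without any identifying column.
rows = [
    {"customer": "An", "amount": 120},
    {"customer": "Binh", "amount": 80},
    {"customer": "An", "amount": 45},
]

# Prepend a zero-padded sequential id so every row can be referenced unambiguously.
with_ids = [{"obs_id": f"OBS-{i:04d}", **row} for i, row in enumerate(rows, start=1)]

ids = [row["obs_id"] for row in with_ids]

for row in with_ids:
    print(row)
```

A sequential id is enough when the data lives in one file; data merged from several sources may need a composite or globally unique key instead.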
Data cleaning

DATA CLEANING / DATA ENRICHING
Missing values:
● Drop the observation altogether.
● Impute the values using the mean, median or max value (for continuous values) or most frequent value (for categorical values) depending on the situation.
● Ignore the values of those variables only.
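
The three options above can be sketched with Python's standard `statistics` module. The `ages` and `cities` columns are hypothetical, with `None` marking a missing value; which option fits depends on the situation, as the slide says.

```python
from statistics import mean, median, mode

# Hypothetical observations; None marks a missing value.
ages = [25, 31, None, 40, 31, None]
cities = ["Hanoi", None, "Hue", "Hanoi", None, "Hanoi"]


def impute(values, fill):
    """Replace every None with one fill value computed from the observed values."""
    observed = [v for v in values if v is not None]
    replacement = fill(observed)
    return [replacement if v is None else v for v in values]


# Option 1: drop the observation altogether (keep rows with no missing field).
complete_rows = [(a, c) for a, c in zip(ages, cities) if a is not None and c is not None]

# Option 2: impute. Median (or mean) for a continuous variable,
# most frequent value for a categorical one.
ages_filled = impute(ages, median)
cities_filled = impute(cities, mode)

# Option 3: ignore the missing values only for the variable being analysed.
mean_age = mean(a for a in ages if a is not None)

print(complete_rows)
print(ages_filled)
print(cities_filled)
print(mean_age)
```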
Data cleaning

DATA CLEANING / DATA ENRICHING
Invalid value:
● Drop the observation altogether.
● Correct the value using most reasonable methods.
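
A short Python sketch of both options follows. The rule that an age must be a whole number between 0 and 120 is an assumption made for this example, as are the sample values; a real dataset would need its own definition of "invalid".

```python
def clean_age(raw):
    """Return a corrected age, or None when the value cannot be repaired."""
    try:
        age = int(str(raw).strip())   # corrects values such as " 35 " or "41"
    except ValueError:
        return None                   # not a number at all, e.g. "unknown"
    return age if 0 <= age <= 120 else None  # out-of-range values stay invalid


# Hypothetical raw column mixing valid, repairable and unrepairable values.
raw_ages = [29, " 35 ", "41", -4, 250, "unknown"]

cleaned = [clean_age(r) for r in raw_ages]
kept = [a for a in cleaned if a is not None]  # drop observations that could not be corrected

print(cleaned)
print(kept)
```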
Data cleaning

DATA CLEANING / DATA ENRICHING
Remove, impute or correct missing values and invalid values.
Data cleaning

DATA VALIDATING
Value inconsistency: lengths
Data type inconsistency: text (“20”) vs. number (19)

Validate and ensure the correct data types, data homogeneity and constraints.
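
As an illustration, the Python sketch below checks the two kinds of inconsistency named on this slide. The records and the rule that an id must be exactly 4 characters long are hypothetical; real constraints come from the dataset's own definition.

```python
# Hypothetical records containing the two problems from the slide.
records = [
    {"id": "C001", "age": 19},
    {"id": "C002", "age": "20"},   # data type inconsistency: text instead of number
    {"id": "C03", "age": 35},      # value inconsistency: id has a different length
]

ID_LENGTH = 4  # assumed constraint for this sketch


def validate(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    if not isinstance(record["age"], int):
        problems.append("age is not a number")
    if len(record["id"]) != ID_LENGTH:
        problems.append("id has the wrong length")
    return problems


report = {r["id"]: validate(r) for r in records}
valid_ids = [rid for rid, problems in report.items() if not problems]

for rid, problems in report.items():
    print(rid, problems or "ok")
```

Reporting every problem per record, rather than stopping at the first, makes it easier to decide afterwards whether to correct or drop the observation.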
Data cleaning

DATA PUBLISHING
Store and manage data in a suitable format and system, then deliver and distribute it to end users through platforms and tools.
Data cleaning

DATA STRUCTURING
Remove duplicates.
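
A minimal Python sketch of duplicate removal, assuming that a duplicate means a row whose every field matches an earlier row. The sample rows are hypothetical.

```python
# Hypothetical rows; the third repeats the first exactly.
rows = [
    {"order_id": 1, "customer": "An", "amount": 120},
    {"order_id": 2, "customer": "Binh", "amount": 80},
    {"order_id": 1, "customer": "An", "amount": 120},
]

seen = set()
deduplicated = []
for row in rows:
    key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
    if key not in seen:
        seen.add(key)
        deduplicated.append(row)      # keep only the first occurrence

print(deduplicated)
```

If duplicates should be judged on a subset of fields (for example only `order_id`), the key would be built from those fields alone.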
MID-TERM BRIEFING

Instruction & Dataset:
BA_S3(2024-2025)_Assignment_Mid-term
● Work in teams
● Submit on eLearning by Mon, 16th Jun 2025

PRACTICE: DATA CLEANING
Mid-term dataset
