BUSINESS ANALYTICS IN MARKETING
GEMS UPDATE
TEAMS
SYLLABUS
PROBLEM-SOLVING USING DATA
• Problem statement & goal setting
• Data analytics roadmap
• Analytics tools
DATA WRANGLING & ANALYSIS
• Data wrangling
• Data analysis
• RFM analysis exercise
VISUALIZATION & STORYTELLING
• Graphs, charts & dashboards
• Storytelling
• Common pitfalls
EXPERT SHARING
HOW TO SOLVE PROBLEMS USING DATA?
Step 1: Problem statement
Step 2: Data wrangling
Step 3: Data analysis
Step 4: Data visualization
Step 5: Communication
DATA WRANGLING & ANALYSIS
Step 2: DATA WRANGLING
• Data discovery
• Data structuring
• Data cleaning
• Data enriching
• Data validating
• Data publishing
DATA DISCOVERY
What does this dataset mean?
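A first pass at discovery is usually a quick profile of the table. The minimal sketch below, using only the Python standard library, counts rows and columns and reports missing values, distinct values and a few sample values per column. The sample data and the `profile` helper are hypothetical; in pandas, `DataFrame.info()` and `DataFrame.describe()` give a similar first look.

```python
import csv
import io

# Hypothetical sample data; in practice you would open your own CSV file.
RAW = """order_id,customer_id,order_date,amount
1,C01,2023-02-19,120.5
2,C02,2023-02-19,
3,C01,2023-02-20,80
"""

def profile(rows):
    """Summarise each column: number of missing values, distinct values, a sample."""
    columns = rows[0].keys() if rows else []
    summary = {}
    for col in columns:
        values = [r[col] for r in rows]
        non_empty = [v for v in values if v != ""]  # csv gives "" for empty cells
        summary[col] = {
            "missing": len(values) - len(non_empty),
            "distinct": len(set(non_empty)),
            "sample": non_empty[:3],
        }
    return summary

rows = list(csv.DictReader(io.StringIO(RAW)))
print(f"{len(rows)} rows, {len(rows[0])} columns")
for col, stats in profile(rows).items():
    print(col, stats)
```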
DATA STRUCTURING
A dataset is a collection of values. Every value belongs to a variable and an observation.
• Each variable forms a column.
• Each observation forms a row.
• Each type of observational unit forms a table.
A variable contains all values that measure the same underlying attribute across units. An observation contains all values measured on the same unit across attributes.
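The rule "each type of observational unit forms a table" can be illustrated with a minimal Python sketch. It assumes a hypothetical flat export in which customer attributes are repeated on every order row, and splits it into a customers table and an orders table linked by `customer_id`. The data, column names and the `split_units` helper are illustrative only.

```python
# Hypothetical flat table mixing two observational units:
# customer attributes (city) are repeated on every order row.
flat = [
    {"order_id": 1, "customer_id": "C01", "customer_city": "Hanoi", "amount": 120.5},
    {"order_id": 2, "customer_id": "C02", "customer_city": "Da Nang", "amount": 60.0},
    {"order_id": 3, "customer_id": "C01", "customer_city": "Hanoi", "amount": 80.0},
]

def split_units(rows):
    """Return one table per observational unit: (customers, orders)."""
    customers = {}
    orders = []
    for r in rows:
        # One row per customer: keyed on customer_id, so repeated customers collapse.
        customers[r["customer_id"]] = {"customer_id": r["customer_id"], "city": r["customer_city"]}
        # One row per order: keep order-level variables plus the linking key.
        orders.append({"order_id": r["order_id"], "customer_id": r["customer_id"], "amount": r["amount"]})
    return list(customers.values()), orders

customers, orders = split_units(flat)
print(customers)
print(orders)
```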
PRACTICE: PROBLEM-SOLVING
GROUP DISCUSSION
Objective:
Today, you’ll practice applying structured problem-solving frameworks to tackle a real-world business
challenge. Your goal is to identify, break down, and prioritize problems, and then develop data-driven
insights and actionable recommendations.
Your Task:
Pick one case and work through it using a structured problem-solving approach.
Time: 30’ to work, 5’ to present.
Step 1: Define the Problem
Draft a clear and concise problem statement (1–2 sentences).
Step 2: Break It Down Using the MECE Framework
Decompose the problem into Mutually Exclusive, Collectively Exhaustive sub-problems.
Think in logical buckets: hiring process, onboarding, compensation, work conditions, management, etc.
Step 3: Make Assumptions & Identify Data Needs
Since detailed data isn’t provided, make reasonable assumptions (e.g., cost per hire, training time, exit
interviews).
List the data you would ideally collect to analyze each sub-problem.
Step 4: Prioritize Issues
Use impact vs. ease or cost vs. urgency matrices to determine which problems to tackle first.
Identify the top 1–2 root causes worth solving.
Step 5: Develop Insights & Recommendations
What might be driving the problem (e.g., high attrition in Case 1, subscriber losses in Case 2)?
Propose data-driven solutions based on your assumptions and analysis.
Step 6: Present Using the STAR Framework (Situation, Task, Action, Result), without laptops or supporting materials
PRACTICE: CASE NO.1
Context
Amazon is a multinational technology company based in Seattle, Washington, United
States. It is one of the largest online retailers in the world, selling a wide variety of
products, including electronics, books, clothing, and household items. Amazon is among
the top 5 most valuable companies in terms of market capitalization (Jan 2023).
In 2021, only a third of Amazon’s new hires stayed with the company for more than 90
days before quitting, being fired, or getting laid off.
An investigation by The New York Times found that Amazon’s annual turnover among hourly
employees was approximately 150 percent.
These numbers indicate that Amazon has serious problems retaining employees.
Amazon estimated that its attrition rate costs it almost $8 billion a year across its global
consumer field operations team.
• Clark M. Leaked documents show just how fast employees are leaving Amazon [Internet]. The Verge. 2022 [cited 2023Feb19].
Available from: https://www.theverge.com/2022/10/17/23409920/amazon-third-hires-attrition-cost-workforce
• Villegas A, Beachy S. Inside Amazon’s Employment Machine [Internet]. The New York Times. 2021 [cited 2023Feb19].
Available from: https://www.nytimes.com/interactive/2021/06/15/us/amazon-workers.html
PRACTICE: CASE NO.2
Context
Netflix is an American streaming service that provides a wide range of TV shows, movies,
documentaries, and other forms of entertainment to subscribers. It was founded in 1997
as a DVD-by-mail service and transitioned to streaming in 2007.
During the three-month period ending June 30, 2022, Netflix reported a loss of 970,000
subscribers, the largest quarterly loss in the company’s history.
Previously, in April, the company had reported losing 200,000 subscribers in the first
quarter of 2022, its first big subscriber loss in over a decade.
Netflix’s stock fell approximately 70% from the beginning of the year to July 2022. Its
market valuation dropped from $300 billion to under $90 billion in less than a year.
• Forristal L. Netflix loses 970,000 subscribers, its largest quarterly loss ever [Internet]. TechCrunch. 2022 [cited 2023Feb19].
Available from: https://techcrunch.com/2022/07/19/netflix-loses-970000-subscribers-its-largest-quarterly-loss-ever/
DATA STRUCTURE
Data structuring: transforming the given data into the standard tabular format (variables, observations, values).
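A common structuring task is turning a "wide" table, where one variable is hidden in the column headers, into the standard variables-observations-values layout. The minimal sketch below assumes a hypothetical table of monthly sales per store with one column per month; the `wide_to_long` helper and its parameter names are illustrative, not taken from any course material.

```python
# Hypothetical wide table: the variable "month" is spread across column headers.
wide = [
    {"store": "A", "Jan": 100, "Feb": 120},
    {"store": "B", "Jan": 90, "Feb": 95},
]

def wide_to_long(rows, id_col, var_name, value_name):
    """Melt every non-id column into one (id, variable, value) row."""
    long_rows = []
    for r in rows:
        for col, val in r.items():
            if col == id_col:
                continue  # the identifier stays as its own column
            long_rows.append({id_col: r[id_col], var_name: col, value_name: val})
    return long_rows

tidy = wide_to_long(wide, "store", "month", "sales")
for row in tidy:
    print(row)
```

If you work in pandas, `pandas.melt(df, id_vars=["store"], var_name="month", value_name="sales")` performs the same reshape.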
EXTRA INFO: RELATIONAL DATABASE
A relational database is a collection of information that organizes data in predefined relationships, storing the data in one or more tables (or "relations") of columns and rows.
Relationships are logical connections between tables, typically established through shared key columns (a primary key in one table referenced by a foreign key in another).
A database schema comprises all of these relationships and defines how data is organized within a relational database.
What is a relational database (RDBMS)? [Internet]. Google. Google; [cited 2023Feb19]. Available from: https://cloud.google.com/learn/what-is-a-relational-database#
Brazilian e-commerce public dataset by Olist [Internet]. Kaggle. Olist; 2021 [cited 2023Feb19]. Available from: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
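To see a relationship in action, here is a minimal sketch using Python's built-in `sqlite3` module: two tables linked by a primary key / foreign key pair, queried with a `JOIN`. The tables are loosely inspired by the customers and orders tables in the Olist dataset cited above, but the columns and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript("""
CREATE TABLE customers (
    customer_id TEXT PRIMARY KEY,
    city        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id TEXT REFERENCES customers(customer_id),  -- the relationship
    amount      REAL
);
INSERT INTO customers VALUES ('C01', 'Hanoi'), ('C02', 'Da Nang');
INSERT INTO orders VALUES (1, 'C01', 120.5), (2, 'C02', 60.0), (3, 'C01', 80.0);
""")

# Follow the relationship from orders to customers to get revenue per city.
query = """
SELECT c.city, COUNT(o.order_id) AS n_orders, SUM(o.amount) AS revenue
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
GROUP BY c.city
ORDER BY c.city
"""
for row in conn.execute(query):
    print(row)
```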
THE MOST COMMON DATA TYPES
Data types
Step 2 DATA analysis: DATA TYPES IN EXCEL
There are four main data types for data wrangling and analysis in Excel:
TEXT: A, B, C; apple, Banana; Who?; “10”, “2.1”; “TRUE”; “” (empty text)
NUMBER: 1, 2, 3; 1.2, 1.999; -1, -0.9; *date, *time, *duration
BOOLEAN: TRUE, FALSE
ERROR: #DIV/0!, #N/A, #NAME?, #NULL!, #NUM!, #REF!, #VALUE!, etc.
*Dates, times and durations are stored as numbers and only displayed in a date or time format.
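The four categories can be mimicked in a small Python sketch, which makes the key trap visible: a quoted "10" or "TRUE" is text, not a number or a boolean. This is only a model, under the assumption that numbers arrive as Python `int`/`float`, booleans as `bool`, and errors as their code strings; the `excel_type` helper is hypothetical.

```python
# Classic Excel error values; newer Excel versions add more (e.g. #SPILL!).
EXCEL_ERRORS = {"#DIV/0!", "#N/A", "#NAME?", "#NULL!", "#NUM!", "#REF!", "#VALUE!"}

def excel_type(value):
    """Classify a Python value into one of Excel's four main data types."""
    if isinstance(value, bool):  # check bool first: in Python, bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "NUMBER"
    if isinstance(value, str) and value in EXCEL_ERRORS:
        return "ERROR"
    return "TEXT"  # everything else, including "10", "TRUE" and ""

for v in ["apple", "10", 10, 1.999, True, "TRUE", "#N/A", ""]:
    print(repr(v), "->", excel_type(v))
```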
WARNING: In Excel, the data type of a value (what it is) and its data format (how we see it) can be very different:
DATA TYPE → DATA FORMATS
TEXT
● Character: A, B, C
● Special character: !@#$%^&*()
● Text: apple, Banana, ORANGE
● Numbers as text: “0”, “1.1”
NUMBER
● Number: 0, 1.2, -1, -3.5
● Percentage: 12%, 1.5%, -3%
● Accounting; currency: (3), 4; 5000đ, $5.00
● Date; datetime: Feb 19th, 2023; 2023-02-19 17:00:00
● Duration: 3:20:00
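The gap between type and format is easiest to see by decoding the stored numbers. In Excel's default 1900 date system, a date is a count of days and a time or duration is a fraction of a day, so "Feb 19th, 2023" is the number 44976 shown in a date format. The minimal Python sketch below assumes the 1900 date system (the Mac 1904 system uses a different epoch) and dates after 1 March 1900; the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# For dates after 1 March 1900, Excel's 1900 date system behaves as if
# day 0 were 30 December 1899.
EXCEL_EPOCH = datetime(1899, 12, 30)

def serial_to_datetime(serial):
    """Convert an Excel serial number (days + fraction of a day) to a datetime."""
    seconds = round(serial * 86400)  # round to whole seconds to avoid float noise
    return EXCEL_EPOCH + timedelta(seconds=seconds)

def serial_to_duration(serial):
    """Interpret an Excel serial number as a duration (fraction of a day)."""
    return timedelta(seconds=round(serial * 86400))

print(serial_to_datetime(44976))            # number behind "Feb 19th, 2023"
print(serial_to_datetime(44976 + 17 / 24))  # number behind "2023-02-19 17:00:00"
print(serial_to_duration(200 / 1440))       # number behind "3:20:00" (200 minutes)
print(f"{0.12:.0%}")                        # the number 0.12 in percentage format
```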
EXTRA INFO: DATA TYPES IN DATABASE
A structured database supports many data types, and values are strictly validated against them. PostgreSQL has several data types similar to Excel's, including:
TEXT: TEXT, CHAR, VARCHAR
NUMBER: FLOAT, INTEGER
TEMPORAL: DATE, TIME, TIMESTAMP, INTERVAL
BOOLEAN: TRUE, FALSE
NULL (missing value)
DATA QUALITY
Korolov M. 6 dimensions of Data Quality Boost Data Performance [Internet]. TechTarget. 2022 [cited 2023Feb19].
Available from: https://www.techtarget.com/searchdatamanagement/tip/6-dimensions-of-data-quality-boost-data-performance
DATA CLEANING
DATA STRUCTURING: Create unique IDs for the observations.
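A minimal Python sketch of this step, assuming hypothetical customer records that arrive without any identifier: each observation gets a sequential, zero-padded ID as its first column. The `add_ids` helper and the `OBS` prefix are illustrative choices.

```python
# Hypothetical records without an identifier column.
records = [
    {"name": "An", "city": "Hanoi"},
    {"name": "Binh", "city": "Hue"},
    {"name": "Chi", "city": "Da Nang"},
]

def add_ids(rows, prefix="OBS"):
    """Return copies of the rows with a sequential, zero-padded `id` as the first column."""
    return [{"id": f"{prefix}{i:04d}", **row} for i, row in enumerate(rows, start=1)]

with_ids = add_ids(records)
for r in with_ids:
    print(r)
```

Sequential IDs are only reproducible if the row order is fixed; when that cannot be guaranteed, random identifiers such as `uuid.uuid4()` from Python's `uuid` module are an alternative.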
DATA CLEANING & DATA ENRICHING
Missing values:
● Drop the observation altogether.
● Impute the values using the mean, median or max value (for continuous variables) or the most frequent value (for categorical variables), depending on the situation.
● Ignore the missing values only when analyzing the affected variables, keeping the rest of the observation.
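The three options for missing values can be sketched in plain Python with the standard `statistics` module. The sample data (`None` marks a missing value) and the helper names are hypothetical; in pandas the same ideas map roughly to `dropna` and `fillna`.

```python
from statistics import mean, median, mode

# Hypothetical customer data; None marks a missing value.
rows = [
    {"age": 25, "segment": "Gold"},
    {"age": None, "segment": "Silver"},
    {"age": 40, "segment": None},
    {"age": 31, "segment": "Gold"},
]

def drop_missing(rows, cols):
    """Option 1: drop any observation with a missing value in `cols`."""
    return [r for r in rows if all(r[c] is not None for c in cols)]

def impute(rows, col, strategy):
    """Option 2: fill missing values of `col` with a statistic of the observed values.

    Use "mean", "median" or "max" for continuous variables and "mode"
    (most frequent value) for categorical ones.
    """
    observed = [r[col] for r in rows if r[col] is not None]
    fill = {"mean": mean, "median": median, "max": max, "mode": mode}[strategy](observed)
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]

def mean_ignoring_missing(rows, col):
    """Option 3: keep every observation and skip missing values only for this variable."""
    return mean(r[col] for r in rows if r[col] is not None)

print(drop_missing(rows, ["age", "segment"]))
print(impute(rows, "age", "median"))
print(impute(rows, "segment", "mode"))
print(mean_ignoring_missing(rows, "age"))
```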
Invalid values:
● Drop the observation altogether.
● Correct the value using the most reasonable method.
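A minimal Python sketch of both options, using hypothetical order data: an impossible quantity cannot be recovered, so that observation is dropped, while payment labels with stray spaces, odd capitalization or a known typo are corrected. The valid values and the correction map are assumptions made for illustration.

```python
orders = [
    {"order_id": 1, "quantity": 2, "payment": "card"},
    {"order_id": 2, "quantity": -3, "payment": "cash"},   # impossible quantity
    {"order_id": 3, "quantity": 1, "payment": " Card "},  # stray spaces, wrong case
    {"order_id": 4, "quantity": 5, "payment": "crd"},     # typo
]

VALID_PAYMENTS = {"card", "cash"}
PAYMENT_FIXES = {"crd": "card"}  # hypothetical map of known misspellings

def clean_invalid(rows):
    """Drop observations that cannot be repaired; correct those that can."""
    cleaned = []
    for r in rows:
        if r["quantity"] < 1:
            continue  # drop: no reasonable way to recover the true quantity
        payment = r["payment"].strip().lower()         # correct formatting
        payment = PAYMENT_FIXES.get(payment, payment)  # correct known typos
        if payment not in VALID_PAYMENTS:
            continue  # drop: still invalid after correction
        cleaned.append({**r, "payment": payment})
    return cleaned

for r in clean_invalid(orders):
    print(r)
```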
DATA CLEANING & DATA ENRICHING: Remove, impute or correct missing values and invalid values.
DATA VALIDATING
Validate and ensure correct data types, data homogeneity and constraints. Common problems:
● Value inconsistency: lengths
● Data type inconsistency: text (“20”) vs. number (19)
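Validation can be expressed as a small schema: an expected type and a constraint per column. The minimal Python sketch below coerces each value to its expected type, so text "20" and number 19 end up as the same type, and reports rows that break a constraint. The schema, the sample rows and the `validate` helper are hypothetical.

```python
rows = [
    {"order_id": "1", "quantity": "20", "unit_price": 5.0},  # numbers stored as text
    {"order_id": 2, "quantity": 19, "unit_price": 7.5},
    {"order_id": 3, "quantity": 0, "unit_price": 2.0},       # breaks a constraint
]

# Hypothetical schema: expected type and constraint for each column.
SCHEMA = {
    "order_id": (int, lambda v: v > 0),
    "quantity": (int, lambda v: v >= 1),
    "unit_price": (float, lambda v: v >= 0),
}

def validate(rows, schema):
    """Coerce each column to one type and check its constraint.

    Returns (valid_rows, errors); each error is (row_index, column, message).
    """
    valid, errors = [], []
    for i, row in enumerate(rows):
        new_row, ok = {}, True
        for col, (typ, check) in schema.items():
            try:
                value = typ(row[col])  # e.g. text "20" -> integer 20
            except (KeyError, TypeError, ValueError) as exc:
                errors.append((i, col, f"cannot convert: {exc!r}"))
                ok = False
                continue
            if not check(value):
                errors.append((i, col, f"constraint failed: {value!r}"))
                ok = False
            new_row[col] = value
        if ok:
            valid.append(new_row)
    return valid, errors

valid, errors = validate(rows, SCHEMA)
print(valid)
print(errors)
```

Note that `int()` silently truncates floats such as 5.7, so a stricter check may be needed when whole numbers are required.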
DATA PUBLISHING
Store and manage the data in a suitable format and system, then deliver and distribute it to end users through platforms and tools.
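The simplest form of publishing is exporting the cleaned table to a shared file format such as CSV. The minimal Python sketch below writes a list of records with a header row using the standard `csv` module; the data and the `publish_csv` helper are hypothetical, and real publishing often targets a database or BI platform instead.

```python
import csv
import io

def publish_csv(rows, fileobj):
    """Write a list of dicts to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

clean_orders = [
    {"order_id": 1, "customer_id": "C01", "amount": 120.5},
    {"order_id": 2, "customer_id": "C02", "amount": 60.0},
]

# In-memory buffer for illustration; for a real file you would use
# open("clean_orders.csv", "w", newline="", encoding="utf-8").
buffer = io.StringIO()
publish_csv(clean_orders, buffer)
print(buffer.getvalue())
```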
DATA STRUCTURING: Remove duplicates.
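A minimal Python sketch of duplicate removal, keeping the first occurrence. By default a row is a duplicate only if all columns match; passing `subset` treats rows as duplicates when just those columns match (here, the same `order_id`). The data and the `drop_duplicates` helper are hypothetical; pandas offers `DataFrame.drop_duplicates(subset=..., keep="first")` for the same task.

```python
def drop_duplicates(rows, subset=None):
    """Keep the first occurrence of each row.

    If `subset` is given, rows count as duplicates when those columns match;
    otherwise all columns must match.
    """
    seen = set()
    unique = []
    for r in rows:
        key = tuple(r[c] for c in subset) if subset else tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

orders = [
    {"order_id": 1, "customer_id": "C01", "amount": 120.5},
    {"order_id": 1, "customer_id": "C01", "amount": 120.5},  # exact duplicate
    {"order_id": 2, "customer_id": "C02", "amount": 60.0},
    {"order_id": 2, "customer_id": "C02", "amount": 65.0},   # same order_id, different amount
]

print(len(drop_duplicates(orders)))                       # exact duplicates only
print(len(drop_duplicates(orders, subset=["order_id"])))  # one row per order_id
```

Which copy to keep when duplicates disagree (as with order 2) is a judgment call; keeping the first is only one possible rule.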
MID-TERM BRIEFING
Instruction & Dataset:
BA_S3(2024-2025)_Assignment_Mid-term
● Work in teams
● Submit on eLearning by Mon, 16th Jun 2025
PRACTICE: DATA CLEANING ON THE MID-TERM DATASET