Session 2 - Data Structure, Data Management and Data Quality

The document outlines a syllabus for a business analytics course focused on marketing, detailing problem-solving frameworks using data, data wrangling, analysis, and visualization techniques. It includes practice cases involving Amazon and Netflix to apply structured problem-solving approaches to real-world challenges. Additionally, it covers data types, data quality, and data cleaning processes essential for effective data analysis.
BUSINESS ANALYTICS IN MARKETING

GEMS UPDATE

TEAMS

SYLLABUS

Problem-solving using data
• Problem statement & goal setting
• Data analytics roadmap
• Analytics tools exercise

Data wrangling & analysis
• Data wrangling
• Data analysis
• RFM analysis

Visualization & storytelling
• Graphs, charts & dashboards
• Story-telling
• Common pitfalls

Expert sharing

HOW TO SOLVE PROBLEMS USING DATA?

Step 1: Problem statement
Step 2: Data wrangling
Step 3: Data analysis
Step 4: Data visualization
Step 5: Communication

DATA WRANGLING & ANALYSIS

HOW TO SOLVE PROBLEMS USING DATA?

Step 2: DATA WRANGLING

• Data discovery
• Data structuring
• Data cleaning
• Data enriching
• Data validating
• Data publishing

HOW TO SOLVE PROBLEMS USING DATA?

Step 2: DATA WRANGLING

DATA DISCOVERY: What does this dataset mean?

HOW TO SOLVE PROBLEMS USING DATA?

Step 2: DATA WRANGLING

DATA STRUCTURING

A dataset is a collection of values. Every value belongs to a variable and an observation.

Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.

A variable contains all values that measure the same underlying attribute across units. An observation contains all values measured on the same unit across attributes.

PRACTICE: PROBLEM-SOLVE

HOW TO SOLVE PROBLEMS USING DATA?

GROUP DISCUSSION
Time: 30’
Present: 5’

Objective:
Today, you’ll practice applying structured problem-solving frameworks to tackle a real-world business challenge. Your goal is to identify, break down, and prioritize problems, and then develop data-driven insights and actionable recommendations.

Your Task:
Pick 1 case and work through this challenge using a structured problem-solving approach.

Step 1: Define the Problem
Draft a clear and concise problem statement (1-2 sentences).
Step 2: Break It Down Using the MECE Framework
Decompose the problem into Mutually Exclusive, Collectively Exhaustive sub-problems.
Think in logical buckets: hiring process, onboarding, compensation, work conditions, management, etc.
Step 3: Make Assumptions & Identify Data Needs
Since detailed data isn’t provided, make reasonable assumptions (e.g., cost per hire, training time, exit interviews).
List the data you would ideally collect to analyze each sub-problem.
Step 4: Prioritize Issues
Use impact vs. ease or cost vs. urgency matrices to determine which problems to tackle first.
Identify the top 1–2 root causes worth solving.
Step 5: Develop Insights & Recommendations
What might be driving the high attrition?
Propose data-driven solutions based on your assumptions and analysis.
Step 6: Present with the STAR Framework – without laptop/supporting materials
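
The impact vs. ease matrix in Step 4 can be sketched in a few lines of Python. This is only an illustration: the sub-problem names, the 1 to 5 scores, the threshold of 3, and the quadrant labels are all hypothetical assumptions, not values from the cases.

```python
# Hypothetical sub-problems with assumed scores (1 = low, 5 = high).
issues = [
    {"name": "onboarding", "impact": 4, "ease": 4},
    {"name": "compensation", "impact": 5, "ease": 2},
    {"name": "work conditions", "impact": 3, "ease": 1},
    {"name": "hiring process", "impact": 2, "ease": 4},
]


def quadrant(issue, threshold=3):
    """Place an issue in one cell of a 2x2 impact-vs-ease matrix."""
    high_impact = issue["impact"] >= threshold
    easy = issue["ease"] >= threshold
    if high_impact and easy:
        return "quick win"
    if high_impact:
        return "major project"
    if easy:
        return "fill-in"
    return "deprioritize"


# Rank by impact first; ties are broken by ease.
ranked = sorted(issues, key=lambda i: (i["impact"], i["ease"]), reverse=True)
for issue in ranked:
    print(f'{issue["name"]:<16} {quadrant(issue)}')

# The top 1-2 candidates to investigate as root causes.
top_two = [issue["name"] for issue in ranked[:2]]
print(top_two)
```

Ranking by impact before ease is one possible choice; a team that cares more about speed could swap the sort key.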
PRACTICE: CASE NO.1

Context

Amazon is a multinational technology company based in Seattle, Washington, United States. It is one of the largest online retailers in the world, selling a wide variety of products, including electronics, books, clothing, and household items. Amazon is among the top 5 most valuable companies in terms of market capitalization (Jan 2023).

In 2021, only a third of Amazon’s new hires stayed with the company for more than 90 days before quitting, being fired, or getting laid off.

An investigation from the New York Times found that, among hourly employees, Amazon’s turnover was approximately 150 percent annually.

Those numbers indicate that Amazon is having serious issues retaining employees. Amazon estimated that its attrition costs it almost $8 billion a year across its global consumer field operations team.

• Clark M. Leaked documents show just how fast employees are leaving Amazon [Internet]. The Verge. 2022 [cited 2023Feb19].
Available from: https://www.theverge.com/2022/10/17/23409920/amazon-third-hires-attrition-cost-workforce
• Villegas A, Beachy S. Inside Amazon’s Employment Machine [Internet]. The New York Times. 2021 [cited 2023Feb19].
Available from: https://www.nytimes.com/interactive/2021/06/15/us/amazon-workers.html
PRACTICE: CASE NO.2

Context

Netflix is an American streaming service that provides a wide range of TV shows, movies,
documentaries, and other forms of entertainment to subscribers. It was founded in 1997
originally as a DVD-by-mail service before transitioning to a streaming service in 2007.

During the three-month period ending June 30, 2022, Netflix reported a loss of 970,000 subscribers. This is the largest quarterly loss in the company’s history.

Previously, in April, the company reported that it had lost 200,000 subscribers in the first quarter of 2022, the first big loss in over a decade.

Netflix’s stock declined approximately 70% from the beginning of the year to July 2022. Its market valuation decreased from $300 billion to under $90 billion in less than a year.

• Forristal L. Netflix loses 970,000 subscribers, its largest quarterly loss ever [Internet]. TechCrunch. 2022 [cited 2023Feb19].
Available from: https://techcrunch.com/2022/07/19/netflix-loses-970000-subscribers-its-largest-quarterly-loss-ever/
DATA STRUCTURE

DATA STRUCTURE

Data structuring means transforming the given data into the standard tabular format (variables, observations, values).
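
As a rough illustration of that transformation, the sketch below reshapes a "wide" table into one row per observation using only the Python standard library. The store and month data, and the helper name `to_long`, are hypothetical and exist only for this example.

```python
# Hypothetical "wide" table: one row per store, one column per month.
# The month names are really values of a variable, not variables themselves.
wide = [
    {"store": "A", "jan": 10, "feb": 12},
    {"store": "B", "jan": 7, "feb": 9},
]


def to_long(rows, id_key, var_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue
            long_rows.append(
                {id_key: row[id_key], var_name: column, value_name: value}
            )
    return long_rows


# Each row is now one observation: one store in one month.
tidy = to_long(wide, id_key="store", var_name="month", value_name="sales")
for row in tidy:
    print(row)
```

After reshaping, `store`, `month` and `sales` are each a column, and every store-month pair is a row.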
DATA STRUCTURE

EXTRA INFO: RELATIONAL DATABASE

A relational database is a collection of information that organizes data in predefined relationships, where data is stored in one or more tables (or "relations") of columns and rows.

A relationship is a logical connection between different tables, established on the basis of interaction among these tables.

A database schema comprises all relationships and defines how data is organized within a relational database.

What is a relational database (RDBMS)? [Internet]. Google. Google; [cited 2023Feb19]. Available from: https://cloud.google.com/learn/what-is-a-relational-database#
Brazilian e-commerce public dataset by Olist [Internet]. Kaggle. Olist; 2021 [cited 2023Feb19]. Available from: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
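
A minimal sketch of two related tables, using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables, their columns and their rows are hypothetical; they are not the actual schema of the Olist dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces relationships only when asked

# Schema: two tables linked by customer_id (a one-to-many relationship).
conn.executescript("""
CREATE TABLE customers (
    customer_id TEXT PRIMARY KEY,
    city        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")

conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("c1", "Hanoi"), ("c2", "Hue")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("o1", "c1", 20.0), ("o2", "c1", 5.5), ("o3", "c2", 12.0)],
)

# The relationship lets us combine both tables in one query.
rows = conn.execute("""
    SELECT c.city, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
print(rows)

# An order that points at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES ('o4', 'missing', 1.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)
```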
THE MOST COMMON DATA TYPES?

Data types

Step 2 DATA ANALYSIS: DATA TYPES IN EXCEL

There are 4 main data types for data wrangling and analysis using Excel:

TEXT: A, B, C; apple, Banana; Who?; “10”, “2.1”; “TRUE”; “”
NUMBER: 1, 2, 3; 1.2, 1.999; -1, -0.9; *date, *time, *duration
BOOLEAN: TRUE, FALSE
ERROR: #DIV/0!, #N/A, #NAME?, #NULL!, #NUM!, #REF!, #VALUE! etc.

Data types

Step 2 DATA ANALYSIS: DATA TYPES IN EXCEL

WARNING: In Excel, the data type (what it is) and the data format (how we see it) of one value might be vastly different:

DATA TYPE: TEXT
DATA FORMAT:
● Character: A, B, C
● Special character: !@#$%^&*()
● Text: apple, Banana, ORANGE
● Numbers as text: “0”, “1.1”

DATA TYPE: NUMBER
DATA FORMAT:
● Number: 0, 1.2, -1, -3.5
● Percentage: 12%, 1.5%, -3%
● Accounting; currency: (3), 4; 5000đ, $5.00
● Date; datetime: Feb 19th, 2023; 2023-02-19 17:00:00
● Duration: 3:20:00
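
To make the type versus format distinction concrete, the sketch below computes the number Excel stores behind a date, assuming Excel's default 1900 date system. The helper name `to_excel_serial` is hypothetical, and the conversion shown holds for dates from 1 March 1900 onward.

```python
from datetime import datetime

# Conventional day zero for Excel's 1900 date system (assumption: default workbook settings).
EXCEL_EPOCH = datetime(1899, 12, 30)


def to_excel_serial(moment):
    """Return the number Excel stores for a datetime: days since the epoch."""
    delta = moment - EXCEL_EPOCH
    return delta.days + delta.seconds / 86400


stored_date = to_excel_serial(datetime(2023, 2, 19))          # displayed as "Feb 19th, 2023"
stored_datetime = to_excel_serial(datetime(2023, 2, 19, 17))  # displayed as "2023-02-19 17:00:00"
stored_percent = 0.12                                         # displayed as "12%"

# A number stored as text does not behave like a number until it is converted.
as_text = "1.1"
as_number = float(as_text)

print(stored_date)                 # the date is a plain number underneath
print(round(stored_datetime, 4))   # the time of day is the fractional part
print(f"{stored_percent:.0%}")     # same value, percentage format
print(as_number + 1)
```

The same stored number can therefore be shown as a date, a datetime or a plain number, depending only on the cell format.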
Data types

EXTRA INFO: DATA TYPES IN DATABASE

A structured database has many data types, and they are strictly validated. PostgreSQL has some data types similar to Excel's, including:

TEXT: TEXT, CHAR, VARCHAR
NUMBER: FLOAT, INTEGER
TEMPORAL: DATE, TIME, TIMESTAMP, INTERVAL (DATETIME in some other databases, such as MySQL)
BOOLEAN: TRUE, FALSE
NULL

DATA QUALITY

Data quality

Korolov, M. (2022) 6 dimensions of Data Quality Boost Data Performance, TechTarget. TechTarget.
Available at: https://www.techtarget.com/searchdatamanagement/tip/6-dimensions-of-data-quality-boost-data-performance (Accessed: February 19, 2023).
DATA CLEANING

Data cleaning

DATA STRUCTURING: Create unique ids for the observations.
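
One simple way to create such ids is a running counter, sketched below. The observations and the `OBS0001` id pattern are hypothetical choices for illustration.

```python
from itertools import count

# Hypothetical observations that arrive without an identifier.
observations = [
    {"customer": "An", "amount": 20},
    {"customer": "Binh", "amount": 35},
    {"customer": "An", "amount": 20},  # same values, but a separate observation
]

# Assign a sequential, zero-padded id to every row.
next_id = count(start=1)
for row in observations:
    row["obs_id"] = f"OBS{next(next_id):04d}"

ids = [row["obs_id"] for row in observations]
print(ids)
```

With an id in place, two rows with identical values can still be told apart and referenced individually.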
Data cleaning

DATA CLEANING, DATA ENRICHING

Missing values:
● Drop the observation altogether.
● Impute the values using the mean, median or max value (for continuous values) or the most frequent value (for categorical values), depending on the situation.
● Ignore the missing values of those variables only (keep the rest of the observation).
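
The three options above can be sketched with the standard library's `statistics` module. The rows are hypothetical, and the choice of median for `age` and most frequent value for `city` is just one reasonable assumption.

```python
from statistics import mean, median, mode

# Hypothetical data; None marks a missing value.
rows = [
    {"id": 1, "age": 25, "city": "Hanoi"},
    {"id": 2, "age": None, "city": "Hue"},
    {"id": 3, "age": 31, "city": None},
    {"id": 4, "age": 40, "city": "Hanoi"},
]

# Option 1: drop every observation that has any missing value.
complete = [r for r in rows if None not in r.values()]


# Option 2: impute, using a strategy that fits the variable's type.
def impute(rows, column, strategy):
    """Fill missing values in one column with strategy(observed values)."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = strategy(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


imputed = impute(impute(rows, "age", median), "city", mode)

# Option 3: keep all rows and skip missing values only where a variable is used.
mean_age = mean(r["age"] for r in rows if r["age"] is not None)

print([r["id"] for r in complete])
print(imputed[1]["age"], imputed[2]["city"])
print(mean_age)
```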
Data cleaning

DATA CLEANING, DATA ENRICHING

Invalid value:
● Drop the observation altogether.
● Correct the value using the most reasonable method.
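
A minimal sketch of both options, correcting what can be corrected and dropping the rest. The validity rules, the country codes and the alias table are hypothetical assumptions made for this example.

```python
# Hypothetical rules: quantity must be positive, and country must be
# one of a known set of codes after normalisation.
VALID_COUNTRIES = {"VN", "US", "BR"}
ALIASES = {"VIETNAM": "VN", "VIET NAM": "VN", "USA": "US", "BRAZIL": "BR"}

rows = [
    {"id": 1, "quantity": 3, "country": "vn "},
    {"id": 2, "quantity": -2, "country": "US"},
    {"id": 3, "quantity": 1, "country": "Viet Nam"},
    {"id": 4, "quantity": 5, "country": "??"},
]


def fix_country(raw):
    """Return a corrected code, or None when no reasonable correction exists."""
    cleaned = raw.strip().upper()
    cleaned = ALIASES.get(cleaned, cleaned)
    return cleaned if cleaned in VALID_COUNTRIES else None


cleaned_rows, dropped_ids = [], []
for row in rows:
    country = fix_country(row["country"])
    if row["quantity"] <= 0 or country is None:
        dropped_ids.append(row["id"])  # cannot be corrected with confidence: drop
    else:
        cleaned_rows.append({**row, "country": country})

print(cleaned_rows)
print(dropped_ids)
```

Keeping the list of dropped ids makes it possible to review later whether the rules were too strict.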
Data cleaning

DATA CLEANING, DATA ENRICHING: Remove, impute or correct missing values and invalid values.
Data cleaning

Value inconsistency: lengths
Data type inconsistency: text (“20”) vs. number (19)

DATA VALIDATING: Validate and ensure the correct data types, data homogeneity and constraints.
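
The checks named above (data type, consistent length, constraints) might look like the sketch below. The column names, the expected id length of 4 and the non-negative quantity rule are hypothetical.

```python
def validate(rows, schema, id_length):
    """Return (row index, problem) pairs; an empty list means the data passed."""
    problems = []
    for index, row in enumerate(rows):
        # Data type check: every column must hold the expected Python type.
        for column, expected_type in schema.items():
            if not isinstance(row[column], expected_type):
                found = type(row[column]).__name__
                problems.append(
                    (index, f"{column} is {found}, expected {expected_type.__name__}")
                )
        # Homogeneity check: all ids should have the same length.
        if len(str(row["order_id"])) != id_length:
            problems.append((index, "order_id has an unexpected length"))
        # Constraint check: quantities cannot be negative.
        if isinstance(row["quantity"], int) and row["quantity"] < 0:
            problems.append((index, "quantity violates the >= 0 constraint"))
    return problems


rows = [
    {"order_id": "A001", "quantity": 19},
    {"order_id": "A002", "quantity": "20"},  # number stored as text
    {"order_id": "A3", "quantity": 5},       # id shorter than the others
]
schema = {"order_id": str, "quantity": int}

problems = validate(rows, schema, id_length=4)
for index, problem in problems:
    print(index, problem)
```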
Data cleaning

DATA PUBLISHING: Store and manage data in a suitable format and system, then deliver and distribute it to end users through platforms and tools.
Data cleaning

DATA STRUCTURING: Remove duplicates.
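
A minimal sketch of duplicate removal that keeps the first occurrence of each row. The rows are hypothetical, and which columns define a duplicate (`key_columns`) is an assumption that depends on the dataset.

```python
# Hypothetical rows; the third one repeats the first exactly.
rows = [
    {"order_id": "A001", "amount": 20},
    {"order_id": "A002", "amount": 35},
    {"order_id": "A001", "amount": 20},
]


def drop_duplicates(rows, key_columns):
    """Keep the first occurrence of each key, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


deduped = drop_duplicates(rows, key_columns=["order_id", "amount"])
print(deduped)
```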
MID-TERM BRIEFING

MID-TERM BRIEFING

Instruction & Dataset: BA_S3(2024-2025)_Assignment_Mid-term
● Work in team
● Submit on eLearning by Mon, 16th Jun 2025

PRACTICE: data cleaning on the mid-term dataset