9/22/2020

INTRODUCTION TO DATA MINING

UNIT # 2

FALL 2020 Sajjad Haider 1

ACKNOWLEDGEMENT

• A few slides in this presentation are taken from material provided by
  - Han and Kamber (Data Mining: Concepts and Techniques) and
  - Tan, Steinbach and Kumar (Introduction to Data Mining)


RECENT TRENDS


DATA SCIENCE AND THE ART OF PERSUASION (HBR 2019)

• Despite the success stories, many companies aren’t getting the value they could from data science.
• Four of the top seven “barriers faced at work”:
  - lack of management/financial support
  - lack of clear questions to answer
  - results not used by decision makers
  - explaining data science to others


DO YOUR DATA SCIENTISTS KNOW THE ‘WHY’ BEHIND THEIR WORK? (HBR 2019)

• Data science, broadly defined, has been around for a long time. But the failure rates of big data projects in general and AI projects in particular remain disturbingly high.
• The following were found to be the two most important reasons:
  - Many data scientists are much more interested in pursuing their craft (namely, finding interesting nuggets buried in data) than they are in solving business problems.
  - From the company’s perspective, the talent is rare, and protecting data scientists from the chaos of everyday work just makes sense. But doing so increases the distance between data scientists and the company’s most important problems and opportunities.


WHY DATA SCIENCE TEAMS NEED GENERALISTS, NOT SPECIALISTS (HBR 2019)

• Dividing labor in data science projects like a pin-factory assembly line, where “One [person] draws out the wire, another straights it, a third cuts it, a fourth points it, a fifth grinds it,” doesn’t work well.
• Algorithmic products and services like recommendation systems, style preference classification, seasonal trend detection, and more can’t be designed up-front.
• With data science, you learn as you go, not before you go.


WHAT’S THE BEST APPROACH TO DATA ANALYTICS? (HBR 2020)

• Organizations’ approaches generally fall into one of five scenarios:
  - “We’re here to help. Do you have any problems to solve?” (typically fails)
  - “Boil the ocean.” (typically fails)
  - “Let a thousand flowers bloom.” (works partially)
  - “Three years and $10 million from now, it’s going to be great.” (works partially)
  - “Start with high-leverage business problems.” (best approach)


POPULAR APPLICATIONS OF DATA MINING


ANALYTICS

• Two major types: Descriptive and Predictive Analytics
• Descriptive Analytics
  - What happened, and why did it happen?
  - Referred to as “unsupervised learning” in machine learning
• Predictive Analytics
  - What will happen?
  - Referred to as “supervised learning” in machine learning


POPULAR APPLICATIONS OF DATA MINING

Technique          Purpose
Clustering         Grouping items by similarity
Association rules  Discovering relationships between items
Regression         Determining the relationship between the outcome and the input variables
Text analytics     Analyzing text data to find trending terms, sentiment analysis, document classification, etc.
Classification     Assigning a label/class to records

CLASSIFICATION EXAMPLE

Training Set

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

[Figure: Training Set -> Learn Classifier -> Model; the Model is then applied to the Test Set.]

CLASSIFICATION: DEFINITION

• Given a collection of records (training set)
  - Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
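
The train/test procedure described above can be sketched as follows, using the ten labeled records from the classification example. This is only an illustration: the one-attribute rule learner (`learn_one_rule`), the choice of Marital Status as the single predictor, and the 7/3 split are assumptions made for the sketch, not something the slides prescribe.

```python
from collections import Counter, defaultdict

# Labeled records from the classification example:
# (refund, marital_status, taxable_income_in_K, cheat)
RECORDS = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]
MARITAL = 1  # index of the attribute used as the (assumed) single predictor


def learn_one_rule(train, attr_index):
    """Map each value of one attribute to the majority class seen in training."""
    counts = defaultdict(Counter)
    for rec in train:
        counts[rec[attr_index]][rec[-1]] += 1
    # Fallback class for attribute values never seen in training.
    default = Counter(rec[-1] for rec in train).most_common(1)[0][0]
    rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    return rule, default


def predict(model, rec, attr_index):
    rule, default = model
    return rule.get(rec[attr_index], default)


def accuracy(model, data, attr_index):
    """Fraction of records whose predicted class matches the true class."""
    correct = sum(predict(model, rec, attr_index) == rec[-1] for rec in data)
    return correct / len(data)


# Hypothetical split: first 7 records build the model, last 3 validate it.
train, test = RECORDS[:7], RECORDS[7:]
model = learn_one_rule(train, MARITAL)
train_accuracy = accuracy(model, train, MARITAL)
test_accuracy = accuracy(model, test, MARITAL)
print(f"train accuracy: {train_accuracy:.2f}, test accuracy: {test_accuracy:.2f}")
```

Comparing the two numbers shows why the model is validated on records it has not seen: accuracy measured on the training set tends to be optimistic.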

CLASSIFICATION: APPLICATION 1

• Direct Marketing
  - Goal: Reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
  - Approach:
    - Use the data for a similar product introduced before.
    - We know which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute.
    - Collect various demographic, lifestyle, and company-interaction related information about all such customers.
    - Use this information as input attributes to learn a classifier model.

CLASSIFICATION: APPLICATION 2

• Fraud Detection
  - Goal: Predict fraudulent cases in credit card transactions.
  - Approach:
    - Use credit card transactions and information on the account holder as attributes.
      - For example: when a customer buys, what they buy, how often they pay on time, etc.
    - Label past transactions as fraudulent or fair. This forms the class attribute.
    - Learn a model for the class of the transactions.
    - Use this model to detect fraud by observing credit card transactions on an account.


CLASSIFICATION: APPLICATION 3

• Customer Attrition/Churn
  - Goal: Predict whether a customer is likely to be lost to a competitor.
  - Approach:
    - Use detailed records of transactions with each past and present customer to find attributes.
      - For example: how often the customer calls, where they call, what time of day they call most, their financial status, marital status, etc.
    - Label the customers as loyal or disloyal.
    - Find a model for loyalty.


REGRESSION

• Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
• Applications:
  - Predicting sales amounts of a new product based on advertising expenditure.
  - Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
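
As a minimal sketch of the linear case, the block below fits a straight line by ordinary least squares to predict sales from advertising expenditure. The data values and the helper name `fit_line` are hypothetical and chosen only for illustration; a nonlinear dependency would need a different model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical history: advertising expenditure and resulting sales amounts.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 62, 85, 103]

intercept, slope = fit_line(ad_spend, sales)

# Predict the sales amount for a new, unseen advertising budget.
predicted_sales = intercept + slope * 60
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"predicted sales at ad_spend=60: {predicted_sales:.1f}")
```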


CLUSTERING DEFINITION

• Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
  - Data points in one cluster are more similar to one another.
  - Data points in separate clusters are less similar to one another.
• Similarity Measures:
  - Euclidean distance if attributes are continuous.
  - Other problem-specific measures.
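
The definition above does not name an algorithm. As one possible illustration, the sketch below uses Euclidean distance inside a basic k-means loop; k-means, the sample points, and the initial centroids are all assumptions made for this example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points with continuous attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then recompute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids


# Hypothetical 2-D data with two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
# Assumed initialization: one starting centroid taken from each group.
clusters, centroids = kmeans(points, [points[0], points[3]])
for cluster, centroid in zip(clusters, centroids):
    print(centroid, cluster)
```

Points that end up in the same cluster are close to each other and far from the other cluster's centroid, which is the property the definition asks for.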


ILLUSTRATING CLUSTERING

[Figure: Euclidean distance based clustering in 3-D space. Intracluster distances are minimized; intercluster distances are maximized.]


CLUSTERING: APPLICATION 1

• Market Segmentation
  - Goal: Subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
  - Approach:
    - Collect different attributes of customers based on their geographical and lifestyle related information.
    - Find clusters of similar customers.
    - Measure the clustering quality by observing buying patterns of customers in the same cluster vs. those from different clusters.


CLUSTERING: APPLICATION 2

• Document Clustering
  - Goal: Find groups of documents that are similar to each other based on the important terms appearing in them.
  - Approach: Identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
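
A small sketch of the approach: count term frequencies per document, then compare documents with a frequency-based similarity. Cosine similarity is used here as one common choice; the slide does not specify the measure, and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase term occurs in a document."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors (0 = no shared terms)."""
    shared = set(tf_a) & set(tf_b)
    dot = sum(tf_a[t] * tf_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents: two about data mining, one about sports.
docs = {
    "d1": "data mining finds patterns in data",
    "d2": "mining patterns from large data sets",
    "d3": "the team won the football match",
}
tf = {name: term_frequencies(text) for name, text in docs.items()}

sim_d1_d2 = cosine_similarity(tf["d1"], tf["d2"])
sim_d1_d3 = cosine_similarity(tf["d1"], tf["d3"])
print(f"d1 vs d2: {sim_d1_d2:.3f}")
print(f"d1 vs d3: {sim_d1_d3:.3f}")
```

A clustering algorithm would then group d1 with d2, since they share frequent terms, and keep d3 apart.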


ASSOCIATION RULE DISCOVERY: DEFINITION

• Given a set of records, each of which contains some number of items from a given collection;
• Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.
• Most popular application: Recommender Systems

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
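
To make the two discovered rules concrete, the sketch below scores them against the five transactions with support and confidence. These are the standard rule-quality measures, though this slide does not define them, so treat the function names and definitions here as supplementary assumptions.

```python
# The five transactions from the slide (TID 1 to 5).
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


rules = [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]
for antecedent, consequent in rules:
    s = support(antecedent | consequent, TRANSACTIONS)
    c = confidence(antecedent, consequent, TRANSACTIONS)
    print(sorted(antecedent), "-->", sorted(consequent),
          f"support={s:.2f} confidence={c:.2f}")
```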


ASSOCIATION RULE DISCOVERY: APPLICATION 1

• Marketing and Sales Promotion:
  - Let the rule discovered be {Bagels, … } --> {Potato Chips}
  - Potato Chips as consequent => Can be used to determine what should be done to boost its sales.
  - Bagels in the antecedent => Can be used to see which products would be affected if the store discontinues selling bagels.
  - Bagels in the antecedent and Potato Chips in the consequent => Can be used to see what products should be sold with Bagels to promote the sale of Potato Chips!


ASSOCIATION RULE DISCOVERY: APPLICATION 2

• Supermarket Shelf Management
  - Goal: Identify items that are bought together by sufficiently many customers.
  - Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among items.
  - A classic rule:
    - If a customer buys diapers and milk, then they are very likely to buy beer.
    - So, don’t be surprised if you find six-packs stacked next to diapers!


ASSOCIATION RULE DISCOVERY: APPLICATION 3

• Inventory Management
  - Goal: A consumer appliance repair company wants to anticipate the nature of repairs on its consumer products and keep the service vehicles equipped with the right parts, to reduce the number of visits to consumer households.
  - Approach: Process the data on tools and parts required in previous repairs at different consumer locations and discover the co-occurrence patterns.


TEXT ANALYTICS

• Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into useful insight.
• Applications:
  - Sentiment analysis
  - Tag cloud
  - Topic modeling
  - Machine translation
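
As a rough illustration of the simplest item in the list, a tag cloud boils down to counting how often each meaningful term appears in the text. The sample text, the tiny stop-word list, and the helper name `term_counts` below are hypothetical; real systems use much larger stop-word lists and better tokenization.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "is", "and", "of", "to", "in", "it"}


def term_counts(text):
    """Count non-stop-word terms; the counts would drive tag sizes in a tag cloud."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


# Hypothetical unstructured text, e.g. customer reviews.
reviews = ("The battery is great and the screen is great. "
           "The battery life of the phone is short.")

top_terms = term_counts(reviews).most_common(2)
print(top_terms)
```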


DATA MINING VS ALLIED FIELDS


STATISTICS VS. MACHINE LEARNING

• Data mining has its origins in various disciplines, of which the two most important are statistics and machine learning.
• Statistics has its roots in mathematics, and therefore there has been an emphasis on mathematical rigor, a desire to establish that something is sensible on theoretical grounds before testing it in practice.
• In contrast, the machine learning community has its origin very much in computer practice. This has led to a practical orientation, a willingness to test something out to see how well it performs, without waiting for a formal proof of effectiveness.


STATISTICS VS. MACHINE LEARNING (CONT’D)

• Modern statistics is entirely driven by the notion of a model. This is a postulated structure, or an approximation to a structure, which could have led to the data.
• In place of the statistical emphasis on models, machine learning tends to emphasize algorithms.


DATA MINING AND MACHINE LEARNING

• Data mining as a process includes data understanding, data preparation and data modeling, while machine learning takes the processed data as input and performs predictions by applying algorithms.
• Thus, data mining requires the involvement of human beings to clean and prepare the data and to understand the patterns.
• In machine learning, by contrast, human effort is involved only in defining an algorithm, after which the algorithm takes over operations.


DM AND ML (CONT’D)

• Nevertheless, it is worth pointing out some of the differences to give perspective.
• Speaking generally, because Machine Learning is concerned with many types of performance improvement, it includes subfields such as robotics and computer vision that are not part of Data Mining.
• Machine Learning is also concerned with issues of agency and cognition (how an intelligent agent will use learned knowledge to reason and act in its environment), which are not concerns of Data Mining.


BI VS. DATA ANALYTICS

• Business Intelligence (BI) focuses on using a consistent set of metrics to measure past performance and inform business planning.
• Data Analytics refers to a combination of analytical and machine learning techniques used for drawing inferences and insight out of data.

[Figure: time axis running from Past to Future]


BI ANSWERS FOR FRAUD DETECTION

• How many cases were investigated last month?
• What was the success rate in collecting debts?
• How much revenue was recovered through collections?
• What was the close rate of cases in the past month? Past quarter? Past year?
• For debts that were closed out, how many days did it take on average to close them out?
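
BI questions like these are answered by aggregating records of what has already happened. The sketch below computes a few of the metrics from a hypothetical case log; the field names (`opened`, `closed`, `recovered`) and all values are invented for illustration.

```python
from datetime import date

# Hypothetical investigation log; "closed" is None while a case is still open.
CASES = [
    {"opened": date(2020, 8, 3), "closed": date(2020, 8, 13), "recovered": 1200.0},
    {"opened": date(2020, 8, 10), "closed": date(2020, 8, 30), "recovered": 0.0},
    {"opened": date(2020, 8, 17), "closed": None, "recovered": 0.0},
    {"opened": date(2020, 8, 24), "closed": date(2020, 9, 8), "recovered": 800.0},
]

closed_cases = [c for c in CASES if c["closed"] is not None]

# How many cases were investigated?
cases_investigated = len(CASES)
# What was the close rate?
close_rate = len(closed_cases) / len(CASES)
# How much revenue was recovered?
revenue_recovered = sum(c["recovered"] for c in CASES)
# For closed cases, how many days did closing take on average?
avg_days_to_close = (
    sum((c["closed"] - c["opened"]).days for c in closed_cases) / len(closed_cases)
)

print(cases_investigated, close_rate, revenue_recovered, avg_days_to_close)
```

Every number here is a summary of the past; none of it estimates what will happen to a new case.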


PREDICTIVE ANALYTICS FOR FRAUD DETECTION

• What is the likelihood that the transaction is fraudulent?
• What is the likelihood the invoice is fraudulent or warrants further investigation?
• Which characteristics of the transaction are most related to or most predictive of fraud?
• What is the expected amount of fraud?
• Historically, which demographic and historic purchase patterns were most related to fraud?
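
Once a model supplies a fraud likelihood per transaction, questions such as the expected amount of fraud become simple arithmetic. In the sketch below the probabilities stand in for the output of some already-trained classifier; the amounts, probabilities, and the 0.25 review threshold are all hypothetical.

```python
# Hypothetical (amount, predicted fraud probability) pairs; the probabilities
# are assumed to come from a previously trained classification model.
scored_transactions = [
    (250.0, 0.02),
    (1200.0, 0.30),
    (40.0, 0.01),
]

# Expected amount of fraud = sum over transactions of amount * P(fraud).
expected_fraud = sum(amount * p for amount, p in scored_transactions)

# Transactions whose likelihood warrants further investigation (assumed cutoff).
REVIEW_THRESHOLD = 0.25
to_investigate = [t for t in scored_transactions if t[1] >= REVIEW_THRESHOLD]

print(f"expected fraud amount: {expected_fraud:.2f}")
print("flag for review:", to_investigate)
```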


BI ANSWERS FOR CUSTOMER ANALYTICS

• Which regions/states/ZIPs had the highest response rates?
• Which products had the highest/lowest click-through rates?
• How many repeat purchasers were there last month?
• How many new subscriptions to the loyalty program were there?
• How many visits to the store/website did a person have?


PREDICTIVE ANALYTICS FOR CUSTOMER ANALYTICS

• What is the likelihood an e-mail will be opened?
• What is the likelihood a customer will click through a link in an e-mail?
• Which product is a customer more likely to purchase if given the choice?
• How many e-mails should the customer receive to maximize the likelihood of a purchase?
• What is the likelihood a product will sell out if it is put on sale?
