Topic 1c - Tasks & Techniques

The document provides an overview of data mining, including its objectives, tasks, and techniques. It discusses various data mining tasks such as classification, clustering, and association rule discovery, along with their applications in fields like marketing and fraud detection. Additionally, it highlights the importance of understanding data relationships and the methods used to analyze and predict trends from data.


Topic 1c: Tasks and Techniques of Data Mining
Ts. Dr. Tuan Norhafizah Tuan Zakaria
Objectives

• To introduce Data Mining and its relationship with data and knowledge.
• To discuss the history, evolution and motivation of Data Mining.
• To discuss Data Mining techniques, tasks, applications and some major issues.
Knowledge Discovery in Databases

DM: Tasks and Techniques

Tasks
• Classification
• Clustering
• Association Rules
• Prediction
• Sequential Analysis
• Deviation analysis
• Similarity analysis
• Trend analysis

Techniques
• Decision Trees
• Association Rule
• k-means
• Neural Networks
• Naïve Bayes
• k-nearest neighbor
• Statistical Method
Classification: Definition

Given a collection of records (training set):
• Each record contains a set of attributes; one of the attributes is the class.

Find a model for the class attribute as a function of the values of the other attributes.

Goal: previously unseen records should be assigned a class as accurately as possible.
• A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
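The split into training and test sets described above can be sketched in a few lines. This is a minimal illustration, not a method from the slides: the records are hypothetical, the helper names (`holdout_split`, `fit_majority_class`, `accuracy`) are invented for this example, and the "model" is deliberately the simplest one possible (always predict the most frequent class).

```python
from collections import Counter

def holdout_split(records, n_train):
    """Return (training set, test set): the first n_train records and the rest."""
    return records[:n_train], records[n_train:]

def fit_majority_class(training_set):
    """Build a trivial 'model': the most frequent class label in the training set."""
    labels = Counter(label for _attributes, label in training_set)
    return labels.most_common(1)[0][0]

def accuracy(model_label, test_set):
    """Fraction of test records whose true class matches the model's prediction."""
    hits = sum(1 for _attributes, label in test_set if label == model_label)
    return hits / len(test_set)

# Hypothetical labelled records: (attributes, class label).
records = [({"id": i}, "yes" if i % 4 == 0 else "no") for i in range(20)]

# The training set builds the model; the held-out test set validates it.
training_set, test_set = holdout_split(records, n_train=14)
model = fit_majority_class(training_set)
test_accuracy = accuracy(model, test_set)
print(model, test_accuracy)
```

In practice the records would usually be shuffled before splitting; the fixed split here only keeps the sketch reproducible.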
Classification Example

Attribute types: Refund (categorical), Marital Status (categorical), Taxable Income (continuous), Cheat (class).

Training Set:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set:

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

Flow: Training Set → Learn Classifier → Model; the Test Set is then applied to the Model.
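To make the example concrete, the sketch below encodes the training and test records from the table and applies a small hand-written rule set to them. The rules and the 80K income threshold are assumptions chosen for illustration because they happen to fit the training records; they are not a model given in the slides, and a real classifier would be learned from the data by an algorithm such as a decision tree.

```python
# Training records from the table: (refund, marital status, taxable income in K, cheat).
TRAINING_SET = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]

# Test records from the table; their class ("?") is unknown.
TEST_SET = [
    ("No", "Single", 75),
    ("Yes", "Married", 50),
    ("No", "Married", 150),
    ("Yes", "Divorced", 90),
    ("No", "Single", 40),
    ("No", "Married", 80),
]

def predict_cheat(refund, marital_status, income_k):
    """Hand-written rules (an assumption for illustration, not a learned model)."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # The 80K threshold is an illustrative choice that separates the training data.
    return "Yes" if income_k > 80 else "No"

# Check how well the rules reproduce the known classes of the training set.
correct = sum(
    1 for refund, status, income, cheat in TRAINING_SET
    if predict_cheat(refund, status, income) == cheat
)
training_accuracy = correct / len(TRAINING_SET)

# Assign a class to each previously unseen record.
predictions = [predict_cheat(*record) for record in TEST_SET]
print(training_accuracy, predictions)
```

Accuracy on the training set alone says little about unseen records; since the test classes are unknown here, the rules cannot be validated on them.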
Classification: Direct Marketing

Goal: Reduce cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.

Approach:
• Identify which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute.
• Collect various demographic, lifestyle, and company-interaction related information: type of business, where they stay, how much they earn, etc.
• Use this information as input attributes to learn a classifier model.
Classification: Customer Attrition/Churn

Goal: To predict whether a customer is likely to be lost to a competitor.

Approach:
• Use detailed records of transactions (past and present customers): how often the customer calls, where he calls, what time of day he calls most, his financial status, marital status, etc.
• Label the customers as loyal or disloyal.
• Find a model for loyalty.
Clustering

Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:
• Data points in one cluster are more similar to one another.
• Data points in separate clusters are less similar to one another.

Similarity Measures:
• Euclidean Distance if attributes are continuous.
• Other Problem-specific Measures.
Clustering: Euclidean Distance Based Clustering in 3-D space

• Intracluster distances are minimized.
• Intercluster distances are maximized.
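A minimal sketch of Euclidean distance based clustering in 3-D, using k-means (listed among the techniques earlier). The points, the starting centroids and the function names are hypothetical choices for this illustration; library implementations typically add random restarts and a convergence check.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 3-D data with two visibly separate groups.
points = [(1, 1, 1), (1, 2, 1), (2, 1, 2), (8, 8, 9), (9, 8, 8), (8, 9, 9)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

Each reassignment step can only shorten the distance from a point to its centroid, which is how the procedure drives intracluster distances down.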
Clustering: Market Segmentation

Goal: Subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.

Approach:
• Collect different attributes of customers based on their geographical and lifestyle related information.
• Find clusters of similar customers.
• Measure the clustering quality by observing buying patterns of customers in the same cluster vs. those from different clusters.
Clustering: Market Segmentation

• Segment 1: high duration but low number of generated calls and moderate number of sent and received SMS.
• Segment 2: moderate duration of generated calls and moderate to high data usage.
• Segment 3: high duration of off-net calls, high number of generated calls, and moderate to low of both duration of generated calls and data usage.
• Segment 4: very low call duration, high sent and received SMS, and high data usage.
• Segment 5: very low data usage, low duration of generated calls, and high number of received calls with respect to the number of generated calls.
• Segment 6: relatively high duration of international calls.
Clustering: Document Clustering

Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.

Approach:
• Identify frequently occurring terms in each document.
• Form a similarity measure based on the frequencies of different terms. Use it to cluster.

Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents.
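One possible similarity measure based on term frequencies is cosine similarity between term-count vectors. The slides do not name a specific measure, so cosine similarity is an assumption here, as are the three toy documents and the function names.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count how often each lower-cased, whitespace-separated term occurs."""
    return Counter(text.lower().split())

def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors (0 = no shared terms)."""
    shared = set(tf_a) & set(tf_b)
    dot = sum(tf_a[t] * tf_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents: two about data mining, one unrelated.
docs = {
    "d1": "data mining finds patterns in data",
    "d2": "mining data reveals hidden patterns",
    "d3": "football match ends in a draw",
}
tf = {name: term_frequencies(text) for name, text in docs.items()}

sim_d1_d2 = cosine_similarity(tf["d1"], tf["d2"])
sim_d1_d3 = cosine_similarity(tf["d1"], tf["d3"])
print(sim_d1_d2, sim_d1_d3)
```

The two data mining documents score higher with each other than with the unrelated one, which is the signal a clustering step would use to group them. Real systems usually also remove common words such as "in" and weight rare terms more heavily.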
Association Rule Discovery

• Given a set of records, each of which contains some number of items from a given collection;
• Produce dependency rules which will predict occurrence of an item based on occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
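How strongly the five transactions back each discovered rule can be checked with support and confidence. These two measures are not defined in the slides; they are the standard ones from the association rule literature, and the function names below are chosen for this sketch.

```python
# The five transactions from the table above.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

rules = [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]
for antecedent, consequent in rules:
    s = support(antecedent | consequent, TRANSACTIONS)
    c = confidence(antecedent, consequent, TRANSACTIONS)
    print(sorted(antecedent), "-->", sorted(consequent), round(s, 2), round(c, 2))
```

For {Milk} --> {Coke}, Milk appears in four of the five transactions and Coke accompanies it in three of those; for {Diaper, Milk} --> {Beer}, the antecedent appears in three transactions and Beer accompanies it in two.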
Association Rule Discovery: Marketing & Sales Promotion

• Let the rule discovered be {Bagels, … } --> {Potato Chips}
• Potato Chips as consequent: can be used to determine what should be done to boost its sales.
• Bagels in the antecedent: can be used to see which products would be affected if the store discontinues selling bagels.
• Bagels in antecedent and Potato Chips in consequent: can be used to see what products should be sold with Bagels to promote sale of Potato Chips!
Association Rule Discovery: Supermarket Shelf Management

Goal: To identify items that are bought together by sufficiently many customers.

Approach:
• Process the point-of-sale data collected with barcode scanners to find dependencies among items.

A classic rule:
• If a customer buys diaper and milk, then he is very likely to buy rootbeer.
• So, don’t be surprised if you find six-packs of rootbeer stacked next to diapers!
Retail Analytics

https://www.digitalnewsasia.com/download/tapwaycasestudy.pdf
Regression

• Predict a value of a given continuous valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
• Extensively studied in the statistics and machine learning fields.

Examples:
• Predicting sales amounts of new product based on advertising expenditure.
• Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
• Time series prediction of stock market indices.
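The first example (sales as a function of advertising expenditure) can be sketched with a least-squares straight line, the simplest linear model of dependency. The figures are invented for illustration and lie exactly on a line so the result is easy to verify by hand; `fit_line` is a name chosen for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising expenditure and resulting sales (arbitrary units).
advertising = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(advertising, sales)

# Predict sales for an expenditure level not present in the data.
predicted_sales = slope * 5.0 + intercept
print(slope, intercept, predicted_sales)
```

Real data would scatter around the line, and a nonlinear model may fit better; the fitting step stays the same in spirit, only the assumed form of the dependency changes.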
Deviation Analysis

• Discovering the most significant changes in data from previously measured or normative values.
• Usually categorized separately from other data mining tasks.
• Deviations are often infrequent.
• Modifications of classification, clustering, and time series analysis can be used as a means to achieve the goal.
• Outlier detection in statistics.

Deviation Analysis: Anomaly Detection

Detect significant deviations from normal behavior.

Applications:
• Credit card fraud detection
• Network intrusion detection

Typical network traffic at University level may reach over 100 million connections per day.
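One simple statistical way to flag a significant deviation from normal behavior is a z-score test: measure how many standard deviations each value lies from the mean. The sketch below assumes hypothetical transaction amounts and an arbitrary threshold of 2.5; neither comes from the slides, and production fraud or intrusion systems use far richer models.

```python
import statistics

def flag_deviations(values, threshold=2.5):
    """Return the indices of values lying more than `threshold` population
    standard deviations away from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / spread > threshold]

# Hypothetical card transaction amounts; the last one is unusually large.
amounts = [52, 48, 50, 51, 49, 50, 53, 47, 50, 500]

flagged = flag_deviations(amounts)
print(flagged)
```

Because the outlier itself inflates the mean and the standard deviation, a single extreme value in a small sample can never reach a very high z-score, so the threshold has to be chosen with the sample size in mind.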
Deviation Analysis: Fraud Detection

• Identify employee accounts at financial institutions that have excess numbers of credit memos. Excess credit memos can indicate diversion of funds into employee accounts.
• Compare employee home addresses, social security numbers, telephone numbers and bank routing and account numbers to those of vendors from vendor master file. This test can reveal bogus or improperly selected vendor accounts.
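The employee-versus-vendor comparison can be sketched as a field-by-field match between two record lists. All records, field names and the `find_overlaps` helper are hypothetical; only three of the fields mentioned above are included, and a real test would need more careful normalisation of addresses and numbers.

```python
# Hypothetical employee records and vendor master file entries.
EMPLOYEES = [
    {"name": "Employee A", "address": "12 Jalan Satu", "phone": "555-0101", "bank_account": "111-222"},
    {"name": "Employee B", "address": "7 Jalan Dua", "phone": "555-0102", "bank_account": "333-444"},
]
VENDORS = [
    {"name": "Vendor X", "address": "99 Jalan Tiga", "phone": "555-0199", "bank_account": "333-444"},
    {"name": "Vendor Y", "address": "12 jalan satu", "phone": "555-0150", "bank_account": "555-666"},
]
MATCH_FIELDS = ("address", "phone", "bank_account")

def normalise(value):
    """Lower-case and collapse whitespace so trivial differences do not hide a match."""
    return " ".join(value.lower().split())

def find_overlaps(employees, vendors, fields=MATCH_FIELDS):
    """List (employee, vendor, shared fields) for every pair sharing at least one field."""
    overlaps = []
    for employee in employees:
        for vendor in vendors:
            shared = [f for f in fields if normalise(employee[f]) == normalise(vendor[f])]
            if shared:
                overlaps.append((employee["name"], vendor["name"], shared))
    return overlaps

overlaps = find_overlaps(EMPLOYEES, VENDORS)
for employee_name, vendor_name, shared in overlaps:
    print(employee_name, "shares", shared, "with", vendor_name)
```

A match is only a lead for review, not proof of fraud: an employee and a vendor can share details for legitimate reasons.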
Deviation Analysis: Fraud Detection

https://www.insurancebusinessmag.com/asia/news/breaking-news/malaysias-antifraud-system-operational-by-october-74933.aspx
Profiteering Cases

https://www.freemalaysiatoday.com/category/nation/2018/08/25/yes-keep-receipts-to-fight-profiteering-say-retailers/

Yes, keep receipts to fight profiteering, say retailers
Robin Augustin - August 25, 2018 8:00 AM

http://english.astroawani.com/malaysia-news/gst-1-256-profiteering-cases-detected-1-115-notices-issued-till-june-5-61853
