Classification and Prediction

The document discusses classification and prediction, describing classification as predicting categorical class labels by constructing a model based on training data, while regression models continuous functions. It covers issues in classification like data preparation and model evaluation, and describes decision tree induction as a method for classification that generates trees to partition data based on attribute tests at internal nodes.

Uploaded by

Bhagirath Prajapati

Classification and Prediction
• What is classification? What is regression?
• Issues regarding classification and prediction
• Classification by decision tree induction
• Scalable decision tree induction
Classification vs. Prediction
• Classification:
  • predicts categorical class labels
  • constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data
• Regression:
  • models continuous-valued functions, i.e., predicts unknown or missing values
• Typical applications
  • credit approval
  • target marketing
  • medical diagnosis
  • treatment effectiveness analysis
Why Classification? A Motivating Application
• Credit approval
  • A bank wants to classify its customers based on whether they are expected to pay back their approved loans
  • The history of past customers is used to train the classifier
  • The classifier provides rules, which identify potentially reliable future customers
• Classification rule:
  • If age = "31...40" and income = high then credit_rating = excellent
• Future customers
  • Paul: age = 35, income = high → excellent credit rating
  • John: age = 20, income = medium → fair credit rating
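The rule above can be sketched as a small function. This is only an illustration: the slide gives a single rule, so the fallback class "fair" is an assumption chosen to match John's outcome, and the function name `credit_rating` is hypothetical.

```python
def credit_rating(age: int, income: str) -> str:
    """Apply the slide's single classification rule to one customer."""
    # Rule from the slide: age in 31...40 and income = high -> excellent.
    if 31 <= age <= 40 and income == "high":
        return "excellent"
    # Assumption: every customer not covered by the rule is rated "fair".
    return "fair"


paul = credit_rating(35, "high")    # covered by the rule
john = credit_rating(20, "medium")  # falls through to the assumed default
print(paul, john)
```

A real classifier would carry many such rules, learned from the history of past customers and not written by hand.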
Classification: A Two-Step Process
• Model construction: describing a set of predetermined classes
  • Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  • The set of tuples used for model construction is the training set
  • The model is represented as classification rules, decision trees, or mathematical formulae
• Model usage: classifying future or unknown objects
  • Estimate the accuracy of the model
    • The known label of each test sample is compared with the model's classification of that sample
    • The accuracy rate is the percentage of test set samples that are correctly classified by the model
    • The test set must be independent of the training set; otherwise the accuracy estimate is overly optimistic and over-fitting goes undetected

Classification Process (1): Model Construction

Training Data → Classification Algorithms → Classifier (Model)

Training data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Classifier (model):

IF rank = 'professor' OR years > 6
THEN tenured = 'yes'
Classification Process (2): Use the Model in Prediction

Testing data → Classifier → Accuracy = ?
Unseen data: (Jeff, Professor, 4) → Classifier → Tenured?

Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Mellisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
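As a rough sketch of the second step, the tenure rule from the model-construction slide can be applied to the four test tuples to estimate accuracy, and then to the unseen tuple for Jeff. The function name `predict_tenured` is illustrative, and the rule is assumed to predict "no" whenever it does not fire.

```python
def predict_tenured(rank: str, years: int) -> str:
    """Rule from the slide: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    # Assumption: tuples not covered by the rule are classified as "no".
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


# Testing data from the slide: (name, rank, years, known label).
test_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Mellisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Accuracy rate: share of test tuples whose predicted label matches the known label.
correct = sum(
    predict_tenured(rank, years) == label for _, rank, years, label in test_data
)
accuracy = correct / len(test_data)
print(f"accuracy = {accuracy:.2f}")  # 3 of 4 correct -> 0.75

# Classifying the unseen tuple (Jeff, Professor, 4).
print(predict_tenured("Professor", 4))  # yes
```

The one error is Mellisa: the rule fires on years > 6, but her known label is "no".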
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
  • Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  • New data is classified based on the training set
• Unsupervised learning (clustering)
  • The class labels of the training data are unknown
  • Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Issues regarding classification and prediction (1): Data Preparation
• Data cleaning
  • Preprocess data in order to reduce noise and handle missing values
• Relevance analysis (feature selection)
  • Remove the irrelevant or redundant attributes
• Data transformation
  • Generalize and/or normalize data
    • numerical attribute income → categorical {low, medium, high}
    • normalize all numerical attributes to [0,1)
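The two transformations above might look like the following sketch. The income cut-offs are hypothetical, since the slide gives none. The normalization shown is plain min-max scaling, which maps the largest value to exactly 1; obtaining the half-open range [0,1) stated on the slide would need an upper bound slightly above the observed maximum.

```python
from typing import List


def discretize_income(
    income: float, low_max: float = 30_000, medium_max: float = 70_000
) -> str:
    """Generalize a numerical income to {low, medium, high}.

    The thresholds are illustrative assumptions, not values from the slides.
    """
    if income <= low_max:
        return "low"
    if income <= medium_max:
        return "medium"
    return "high"


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; map them to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


incomes = [20_000, 50_000, 80_000]
print([discretize_income(x) for x in incomes])  # ['low', 'medium', 'high']
print(min_max_normalize(incomes))               # [0.0, 0.5, 1.0]
```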
Issues regarding classification and prediction (2): Evaluating Classification Methods
• Predictive accuracy
• Speed
  • time to construct the model
  • time to use the model
• Robustness
  • handling noise and missing values
• Scalability
  • efficiency in disk-resident databases
• Interpretability
  • understanding and insight provided by the model
• Goodness of rules (quality)
  • decision tree size
  • compactness of classification rules
Classification by Decision Tree Induction
• Decision tree
  • A flow-chart-like tree structure
  • Internal node denotes a test on an attribute
  • Branch represents an outcome of the test
  • Leaf nodes represent class labels or class distribution
• Decision tree generation consists of two phases
  • Tree construction
    • At start, all the training examples are at the root
    • Partition examples recursively based on selected attributes
  • Tree pruning
    • Identify and remove branches that reflect noise or outliers
• Use of decision tree: classifying an unknown sample
  • Test the attribute values of the sample against the decision tree
Training Dataset

This follows an example from Quinlan's ID3.

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no
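The slides name Quinlan's ID3 but do not show how the attribute for each split is selected. ID3 uses information gain for this, so the following is a minimal sketch of that computation on the 14 training tuples. The names `ROWS`, `entropy`, and `information_gain` are illustrative, and the age range is written in ASCII as "31...40".

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["age", "income", "student", "credit_rating"]

# Training tuples from the slide; the last field is the class label buys_computer.
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the class labels minus the weighted entropy after the split."""
    labels = [row[-1] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row[-1])
    remainder = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("best split:", best)  # age
```

On this data, age has the highest gain, so it is the attribute tested first.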
Output: A Decision Tree for "buys_computer"

age?
  <=30   → student?
             no  → no
             yes → yes
  31…40  → yes
  >40    → credit rating?
             excellent → no
             fair      → yes
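One possible way to use this tree for classifying an unknown sample is to store it as nested dictionaries and follow the branch matching each tested attribute. The representation and the names `TREE` and `classify` are assumptions made for illustration; the age range is written in ASCII as "31...40".

```python
# Internal nodes are dicts with the tested attribute and its branches;
# leaves are plain strings holding the predicted buys_computer label.
TREE = {
    "attribute": "age",
    "branches": {
        "<=30": {
            "attribute": "student",
            "branches": {"no": "no", "yes": "yes"},
        },
        "31...40": "yes",
        ">40": {
            "attribute": "credit_rating",
            "branches": {"excellent": "no", "fair": "yes"},
        },
    },
}


def classify(sample: dict, node=TREE) -> str:
    """Walk from the root to a leaf, testing one attribute per internal node."""
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node


sample = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}
print(classify(sample))  # yes
```

Only the attributes on the path from root to leaf are consulted; income is never tested by this tree.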
Scalable Decision Tree Induction Methods
• SLIQ (EDBT’96, Mehta et al.)
  • Builds an index for each attribute; only the class list and the current attribute list reside in memory
• SPRINT (VLDB’96, J. Shafer et al.)
  • Constructs an attribute list data structure
• PUBLIC (VLDB’98, Rastogi & Shim)
  • Integrates tree splitting and tree pruning: stops growing the tree earlier
• RainForest (VLDB’98, Gehrke, Ramakrishnan & Ganti)
  • Builds an AVC-list (attribute, value, class label)
• BOAT (PODS’99, Gehrke, Ganti, Ramakrishnan & Loh)
  • Uses bootstrapping to create several small samples
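To make the AVC idea more concrete, the sketch below aggregates tuples into counts keyed by (attribute, value, class label). It assumes that this count summary is what the slide's parenthetical refers to; it is not RainForest itself, and the function name `avc_counts` and the four sample rows are chosen only for illustration.

```python
from collections import Counter


def avc_counts(rows, attributes):
    """Count how often each (attribute, value, class label) combination occurs.

    Each row is a tuple of attribute values followed by the class label.
    """
    counts = Counter()
    for row in rows:
        label = row[-1]
        for index, name in enumerate(attributes):
            counts[(name, row[index], label)] += 1
    return counts


attributes = ["age", "income", "student", "credit_rating"]
rows = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
]

counts = avc_counts(rows, attributes)
print(counts[("age", "<=30", "no")])       # 2
print(counts[("income", "high", "yes")])   # 1
```

The summary grows with the number of distinct attribute values, not with the number of tuples, which is why such counts can fit in memory when the raw data does not.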
