
Data Mining and Business Intelligence

Lecture 5: Classification - Attribute Selection Measures, Tree Pruning, Extracting Rules

By Dr. Nora Shoaip

Damanhour University
Faculty of Computers & Information Sciences
Department of Information Systems

2024-2025
Outline
• The Basics
  - What is Classification?
  - General Approach
• Decision Tree Induction
  - The Algorithm
  - Attribute Selection Measures
  - Tree Pruning
  - Extracting Rules from Decision Trees
The Basics: What is Classification?

• Motivation: prediction
  - Is a bank loan applicant "safe" or "risky"?
  - Which treatment is better for a patient, "treatment X" or "treatment Y"?
• Classification is a data analysis task in which a model is constructed to predict class labels (categories)
The Basics: General Approach

• A two-step process:
  - Learning (training) step → construct the classification model
      - Build a classifier for a predetermined set of classes
      - Learn from a training dataset (data tuples + their associated class labels) → supervised learning
  - Classification step → the model is used to predict class labels for given data (the test set)
The Basics: General Approach

[Learning step: training data (attribute vectors + class labels) → classification algorithm → classification rules (the learned model)]
The Basics: General Approach

[Classification step: the classification rules are first applied to test data to estimate classifier accuracy (% of test-set tuples correctly classified, guarding against overfitting), then used to predict the class of new data, e.g. (Mohammed, youth, medium) → loan decision? → risky]
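Not part of the slides: a minimal scikit-learn sketch of the two steps. The loan table, its column names, and the encoding are made up for illustration.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical loan-application training data: attribute vectors + class labels
loans = pd.DataFrame({
    "age":           ["youth", "youth", "middle aged", "senior", "senior", "middle aged"],
    "income":        ["low",   "medium", "high",       "medium", "low",    "high"],
    "loan_decision": ["risky", "risky",  "safe",       "safe",   "risky",  "safe"],
})

enc = OrdinalEncoder()
X = enc.fit_transform(loans[["age", "income"]])
y = loans["loan_decision"]

# Step 1 - learning: build the classifier from the training data (supervised learning)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2 - classification: estimate accuracy on the held-out test set
# (% of test tuples correctly classified), then label new, unseen data
print("test accuracy:", model.score(X_test, y_test))
new_applicant = enc.transform(pd.DataFrame({"age": ["youth"], "income": ["medium"]}))
print("loan decision for (youth, medium):", model.predict(new_applicant)[0])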
Decision Tree Induction

• Learning of decision trees from a training dataset
• Decision tree → a flowchart-like tree structure
  - Internal node → a test on an attribute
  - Branch → an outcome of the test
  - Leaf node → a class label
• The constructed tree can be binary or non-binary
Decision Tree Induction

Benefits
• No domain knowledge required
• No parameter setting
• Can handle multidimensional data
• Easy-to-understand representation
• Simple and fast
Decision Tree Induction: The Algorithm

[Diagram: the algorithm starts at a node N representing all the tuples of the training dataset D. Its inputs are D, the attribute list, and an attribute selection method; the attribute selection method determines the splitting criterion: the splitting attribute plus either a split point or splitting subsets.]
Decision Tree Induction: The Algorithm

[Diagram continued: the splitting criterion is applied at node N, one branch is grown per outcome (Outcome 1 ... Outcome n), and D is divided into the corresponding partitions (Partition 1 ... Partition n); the algorithm then recurses on each partition.]
Decision Tree Induction: The Algorithm

[Diagram: the shape of the split depends on the splitting attribute: a discrete-valued attribute gives one branch per known value; a continuous-valued attribute gives two branches (≤ split point / > split point); a discrete-valued attribute in a binary tree gives a yes/no test on membership in the splitting subset.]
Decision Tree Induction: The Algorithm

The splitting criterion is a test that determines:
• Which attribute to test at node N → the "best" way to partition D into mutually exclusive classes
• Which (and how many) branches to grow from node N to represent the test outcomes

• The resulting partitions at each branch should be as "pure" as possible
  - A partition is "pure" if all of its tuples belong to the same class
• Once an attribute is chosen to split the training dataset, it is removed from the attribute list
Decision Tree Induction: The Algorithm

Terminating conditions (see the sketch below):
• All the tuples in D (represented at node N) belong to the same class
• There are no remaining attributes on which the tuples may be further partitioned
  - Majority voting is employed → the node is converted into a leaf labeled with the most common class in the data partition
• There are no tuples for a given branch
  - A leaf is created and labeled with the majority class of the parent data partition
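A minimal Python sketch of this recursive procedure, assuming a pandas DataFrame partition D and a select_attribute function (e.g. information gain); the names and the nested-dict tree representation are illustrative, not prescribed by the lecture.

import pandas as pd

def generate_decision_tree(D, attribute_list, target, domains, select_attribute):
    """Return a decision tree as nested dicts {attribute: {value: subtree or class label}}."""
    # Terminating condition 1: all tuples at node N belong to the same class
    if D[target].nunique() == 1:
        return D[target].iloc[0]
    # Terminating condition 2: no remaining attributes -> majority voting
    if not attribute_list:
        return D[target].mode()[0]

    A = select_attribute(D, attribute_list, target)      # attribute selection method
    remaining = [a for a in attribute_list if a != A]     # A is removed from the attribute list

    node = {A: {}}
    for value in domains[A]:                              # one branch per known value of A
        Dj = D[D[A] == value]
        if Dj.empty:
            # Terminating condition 3: no tuples for this branch ->
            # leaf labeled with the majority class of the parent partition D
            node[A][value] = D[target].mode()[0]
        else:
            node[A][value] = generate_decision_tree(Dj, remaining, target,
                                                    domains, select_attribute)
    return node

Here domains maps each attribute to all of its values in the full training set, which is what makes an empty branch (condition 3) possible.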
Decision Tree Induction: Attribute Selection Measures

• Attribute selection measure → a heuristic for selecting the splitting criterion that "best" splits a given data partition into smaller, mutually exclusive classes
• Attributes are ranked according to the measure
  - The attribute with the best score is chosen as the splitting attribute
  - The measure also determines the split point for continuous attributes
  - The measure also determines the splitting subset for discrete attributes when binary trees are built
• Measures: Information Gain, Gain Ratio, Gini Index
Decision Tree Induction: Attribute Selection Measures

• Information Gain
  - Based on Shannon's information theory
  - Goal: minimize the expected number of tests needed to classify a tuple
      - Guarantees that a simple tree is found
  - The attribute with the highest information gain is chosen as the splitting attribute
      - It minimizes the information needed to classify the tuples in the resulting partitions
      - It reflects the least "impurity" in the resulting partitions
Decision Tree Induction: Attribute Selection Measures
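For reference, the standard definitions behind information gain, consistent with the numbers computed on the following slides: p_i is the probability that a tuple in D belongs to class C_i (with m classes), and attribute A splits D into partitions D_1, ..., D_v.

\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \,\mathrm{Info}(D_j)

\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)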
Attribute Selection Measures

Training data D, class label buys_computer (C1 = yes: 9 tuples, C2 = no: 5 tuples)

RID  age          income  student  credit_rating  buys_computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      no
7    middle aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle aged  medium  no       excellent      yes
13   middle aged  high    yes      fair           yes
14   senior       medium  no       excellent      no
Attribute Selection Measures

Using the table above (9 "yes" tuples, 5 "no" tuples):
1. Compute Info(D) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940 bits
2. Compute Info_age(D) = (5/14) Info(D_youth) + (4/14) Info(D_middle aged) + (5/14) Info(D_senior) = 0.694 bits
3. Compute Gain(age) = 0.940 - 0.694 = 0.246 bits
Attribute Selection Measures

Similarly, compute the gain of the remaining attributes:
Gain(age) = 0.246 bits
Gain(income) = 0.029 bits
Gain(student) = 0.151 bits
Gain(credit_rating) = 0.048 bits

Gain(age) is the highest information gain → age is chosen as the splitting attribute
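Not from the lecture: a short Python sketch that reproduces these numbers from the table above (the slides only give the final gains).

import math
import pandas as pd

# The buys_computer training data from the table above (rows 1-14)
data = pd.DataFrame({
    "age": ["youth", "youth", "middle aged", "senior", "senior", "senior", "middle aged",
            "youth", "youth", "senior", "youth", "middle aged", "middle aged", "senior"],
    "income": ["high", "high", "high", "medium", "low", "low", "low",
               "medium", "low", "medium", "medium", "medium", "high", "medium"],
    "student": ["no", "no", "no", "no", "yes", "yes", "yes",
                "no", "yes", "yes", "yes", "no", "yes", "no"],
    "credit_rating": ["fair", "excellent", "fair", "fair", "fair", "excellent", "excellent",
                      "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent"],
    "buys_computer": ["no", "no", "yes", "yes", "yes", "no", "yes",
                      "no", "yes", "yes", "yes", "yes", "yes", "no"],
})

def info(labels):
    """Expected information (entropy) Info(D) of a collection of class labels, in bits."""
    counts = pd.Series(labels).value_counts()
    n = counts.sum()
    return -sum((c / n) * math.log2(c / n) for c in counts)

def gain(D, attribute, target="buys_computer"):
    """Information gain obtained by splitting partition D on the given attribute."""
    info_A = sum(len(Dj) / len(D) * info(Dj[target]) for _, Dj in D.groupby(attribute))
    return info(D[target]) - info_A

for a in ["age", "income", "student", "credit_rating"]:
    print(f"Gain({a}) = {gain(data, a):.3f} bits")
# Prints approximately 0.247, 0.029, 0.152, 0.048; the slides round
# Gain(age) to 0.246 and Gain(student) to 0.151.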
Attribute Selection Measures

Resulting decision tree:

age?
  - youth → student?
      - no → no
      - yes → yes
  - middle_aged → yes
  - senior → credit_rating?
      - fair → yes
      - excellent → no
Decision Tree Induction: Tree Pruning

• The tree may overfit anomalies and outliers in the training data
• Pruning removes the least reliable branches → the decision tree becomes less complex
• Prepruning → statistically assess the goodness of a split before it takes place
  - Hard to choose appropriate thresholds for statistical significance
• Postpruning → remove subtrees from the already constructed tree (see the sketch below)
  - Remove a subtree's branches and replace it with a leaf node
  - The leaf is labeled with the most frequent class in the subtree
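The lecture does not tie postpruning to a specific algorithm; as one concrete illustration, here is a sketch of cost-complexity postpruning with scikit-learn (the dataset is only a convenient stand-in).

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Any labeled dataset will do for the illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate pruning levels for the fully grown (likely overfitted) tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best = None
for alpha in path.ccp_alphas:
    # Larger ccp_alpha -> more aggressive postpruning -> smaller, less complex tree
    model = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    acc = model.score(X_test, y_test)   # held-out accuracy guards against overfitting
    if best is None or acc > best[0]:
        best = (acc, alpha, model.get_n_leaves())

print(f"best held-out accuracy {best[0]:.3f} at ccp_alpha={best[1]:.5f} with {best[2]} leaves")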
Decision Tree Induction: Rule Extraction from a Decision Tree

Example tuple from the table above, to classify with the extracted rules:
X: (age = youth, income = medium, student = yes, credit_rating = fair)
Decision Tree Induction: Rule Extraction from a Decision Tree - Resolving Rule Conflicts

• Rule conflicts arise when a tuple triggers more than one rule and the rules make different class predictions
• Two resolution strategies:
  - Size ordering → the rule with the largest (toughest) antecedent has the highest priority; it fires and returns its class prediction (see the sketch below)
  - Rule ordering → rules are prioritized beforehand according to:
      - Class-based ordering → classes sorted by decreasing importance (e.g. order of prevalence: most frequent first)
      - Rule-based ordering → measures of rule quality (e.g. accuracy, size, domain expertise)
• A fallback (default) rule fires when no other rule is triggered
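A small sketch of the size-ordering strategy; the rule representation and names are made up for illustration.

def fires(rule, tuple_):
    """rule = (antecedent, prediction); the antecedent maps attribute -> required value.
    The rule is triggered when the tuple satisfies every condition."""
    antecedent, _ = rule
    return all(tuple_.get(a) == v for a, v in antecedent.items())

def classify_size_ordering(rules, tuple_, default="unknown"):
    """Size ordering: among the triggered rules, the one with the largest
    (toughest) antecedent fires; otherwise fall back to a default rule."""
    triggered = [r for r in rules if fires(r, tuple_)]
    if not triggered:
        return default                                  # fallback (default) rule
    antecedent, prediction = max(triggered, key=lambda r: len(r[0]))
    return prediction

# Two hypothetical conflicting rules and a tuple that triggers both
rules = [
    ({"age": "youth"}, "no"),
    ({"age": "youth", "student": "yes"}, "yes"),        # larger antecedent -> higher priority
]
print(classify_size_ordering(rules, {"age": "youth", "student": "yes", "income": "medium"}))
# -> "yes"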
Decision Tree Induction: Rule Extraction from a Decision Tree

• Create one rule for each path from the root to a leaf in the decision tree (as sketched below)
  1. Each splitting criterion along the path is ANDed to form the rule antecedent (the IF part)
  2. The leaf node holds the class prediction (the THEN part)

Can the rules resulting from decision trees have conflicts?

R1: IF age = youth AND student = no THEN buys_computer = no
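A sketch of this procedure over the nested-dict tree produced by the earlier induction sketch; the tree below is transcribed from the decision tree shown earlier, with the senior-branch leaves following the training data.

def extract_rules(tree, target="buys_computer", antecedent=None):
    """Turn every root-to-leaf path of a nested-dict decision tree into an IF-THEN rule."""
    antecedent = antecedent or []
    if not isinstance(tree, dict):                      # leaf node holds the class prediction (THEN)
        conditions = " AND ".join(f"{a} = {v}" for a, v in antecedent)
        return [f"IF {conditions} THEN {target} = {tree}"]
    rules = []
    (attribute, branches), = tree.items()               # one splitting test per internal node
    for value, subtree in branches.items():             # each branch adds a condition to the antecedent (IF)
        rules += extract_rules(subtree, target, antecedent + [(attribute, value)])
    return rules

# Decision tree from the earlier slides, in nested-dict form
tree = {"age": {
    "youth":       {"student": {"no": "no", "yes": "yes"}},
    "middle_aged": "yes",
    "senior":      {"credit_rating": {"fair": "yes", "excellent": "no"}},
}}

for rule in extract_rules(tree):
    print(rule)
# First rule printed: IF age = youth AND student = no THEN buys_computer = no  (R1 on this slide)

Because every tuple follows exactly one root-to-leaf path, rules extracted directly from a decision tree are mutually exclusive and exhaustive, so the conflict-resolution strategies above only become necessary once the rule set is later pruned or otherwise modified.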
