Data Mining and Business Intelligence
Classification: Attribute Selection Measures, Tree Pruning, Extracting Rules
By Dr. Nora Shoaip
Lecture 5
Damanhour University
Faculty of Computers & Information Sciences
Department of Information Systems
2024-2025
Outline
The Basics
• What is Classification?
• General Approach
Decision Tree Induction
• The Algorithm
• Attribute Selection Measures
• Tree Pruning
• Extracting Rules from Decision Trees
The Basics: What is Classification?
Motivation: Prediction
Is a bank loan applicant “safe” or “risky”?
Which treatment is better for a patient, “treatment X” or “treatment Y”?
Classification is a data analysis task in which a model is constructed to predict class labels (categories).
The Basics: General Approach
A two-step process:
Learning (training) step: construct the classification model
• Build a classifier for a predetermined set of classes
• Learn from a training dataset (data tuples + their associated class labels) → supervised learning
Classification step: the model is used to predict class labels for given data (the test set)
The Basics: General Approach
[Figure: learning step – the training data (attribute vectors with their class labels) is fed to the classification algorithm, which produces the model, e.g., a set of classification rules]
The Basics: General Approach
[Figure: classification step – the classification rules are first applied to test data to estimate classifier accuracy (% of test-set tuples correctly classified, guarding against overfitting), then used to predict the class of new data, e.g., applicant (Mohammed, youth, medium) → loan decision: risky]
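A minimal sketch of this two-step process in Python with scikit-learn; the tiny applicant table, its two numeric attributes (age, income), and the safe/risky labels are all made up for illustration:

```python
# Two-step classification: (1) learn a model from training data, (2) use it on test/new data.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical attribute vectors (age, income) and class labels ("safe"/"risky")
X = [[25, 30000], [45, 80000], [35, 42000], [52, 110000], [23, 12000], [40, 60000]]
y = ["risky", "safe", "safe", "safe", "risky", "safe"]

# Hold out part of the data as a test set for estimating accuracy
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Step 1 -- learning (training): construct the classifier from the training set
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Step 2 -- classification: predict class labels for the test set and estimate accuracy
y_pred = clf.predict(X_test)
print("Estimated accuracy:", accuracy_score(y_test, y_pred))

# The accepted model can then classify new, unseen applicants
print("New applicant ->", clf.predict([[30, 35000]])[0])
```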
Outline
The Basics
• What is Classification?
• General Approach
Decision Tree Induction
• The Algorithm
• Attribute Selection Measures
• Tree Pruning
• Extracting Rules from Decision Trees
Decision Tree Induction
Learning of decision trees from a training dataset
Decision tree: a flowchart-like tree structure
• Internal node: a test on an attribute
• Branch: a test outcome
• Leaf node: a class label
The constructed tree can be binary or otherwise
Decision Tree Induction
Benefits
No domain knowledge required
No parameter setting
Can handle multidimensional data
Easy-to-understand representation
Simple and fast
Decision Tree Induction: The Algorithm
Inputs: the training dataset D, the attribute list, and an attribute selection method
[Figure: a node N initially represents all of D; the attribute selection method is applied to D and the attribute list to determine the splitting attribute together with its split point(s) or splitting subsets]
Decision Tree Induction: The Algorithm
[Figure: the chosen splitting criterion labels node N; one branch is grown from N for each outcome 1..n of the test, and D is split into the corresponding partitions 1..n, to which the algorithm is applied recursively]
Decision Tree Induction :The Algorithm
Splitting
Attribute Splitting Criterion
Outcome 1 Outcome n
Discrete
Partition 1 Partition n
Continuous
No Yes
Discrete
Attribute
Binary
Tree Selection Method
12
Decision Tree Induction: The Algorithm
The splitting criterion is a test that determines:
• which attribute to test at node N, i.e., the “best” way to partition D into mutually exclusive classes
• which (and how many) branches to grow from node N to represent the test outcomes
Resulting partitions at each branch should be as “pure” as possible
• A partition is “pure” if all of its tuples belong to the same class
Once an attribute is chosen to split the training dataset, it is removed from the attribute list
Decision Tree Induction: The Algorithm
Terminating conditions:
• All the tuples in D (represented at node N) belong to the same class
• There are no remaining attributes on which the tuples may be further partitioned
– majority voting is employed: node N is converted into a leaf labeled with the most common class in the data partition
• There are no tuples for a given branch
– a leaf is created and labeled with the majority class in the data partition
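A compact Python sketch of this recursive procedure, assuming tuples are represented as dicts with the class label under a "class" key and that an attribute_selection_method function is supplied by the caller (all names here are illustrative, not from the slides):

```python
from collections import Counter

def majority_class(D):
    """Most common class label in partition D (used for majority voting)."""
    return Counter(t["class"] for t in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, attribute_selection_method):
    """Recursive decision tree induction over a list of dict tuples D."""
    classes = {t["class"] for t in D}
    # Terminating condition 1: all tuples at this node belong to the same class
    if len(classes) == 1:
        return classes.pop()                      # leaf labeled with that class
    # Terminating condition 2: no remaining attributes -> majority voting
    if not attribute_list:
        return majority_class(D)
    # The attribute selection measure picks the splitting attribute,
    # which is then removed from the attribute list
    splitting_attribute = attribute_selection_method(D, attribute_list)
    remaining = [a for a in attribute_list if a != splitting_attribute]
    node = {splitting_attribute: {}}
    # One branch per outcome (here: one per discrete value observed in D)
    for value in {t[splitting_attribute] for t in D}:
        Dj = [t for t in D if t[splitting_attribute] == value]
        # Terminating condition 3: empty partition -> leaf with majority class of D
        # (cannot occur when branching only on observed values; kept for fidelity)
        node[splitting_attribute][value] = (
            majority_class(D) if not Dj
            else generate_decision_tree(Dj, remaining, attribute_selection_method)
        )
    return node
```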
Decision Tree Induction: Attribute Selection Measures
An attribute selection measure is a heuristic for selecting the splitting criterion that “best” splits a given data partition into smaller, mutually exclusive classes
• Attributes are ranked according to the measure; the attribute with the best score is chosen as the splitting attribute
• A split point must also be determined for continuous attributes, and a splitting subset for discrete attributes when building binary trees
Measures: Information Gain, Gain Ratio, Gini Index
Decision Tree Induction: Attribute Selection Measures
Information Gain
• Based on Shannon’s information theory
• The goal is to minimize the expected number of tests needed to classify a tuple, which helps guarantee that a simple tree is found
• The attribute with the highest information gain is chosen as the splitting attribute
– it minimizes the information needed to classify tuples in the resulting partitions
– it reflects the least “impurity” in the resulting partitions
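For reference, the standard definitions underlying these measures (and used in the worked example below) are:

```latex
% Expected information (entropy) needed to classify a tuple in D,
% where p_i is the proportion of tuples of D belonging to class C_i (m classes):
\[ Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i) \]

% Information still required after splitting D on attribute A into partitions D_1, ..., D_v:
\[ Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j) \]

% Information gain of attribute A:
\[ Gain(A) = Info(D) - Info_A(D) \]

% Gain ratio (used by C4.5) normalizes the gain by the split information:
\[ SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2\frac{|D_j|}{|D|}, \qquad
   GainRatio(A) = \frac{Gain(A)}{SplitInfo_A(D)} \]

% Gini index (used by CART) measures the impurity of D:
\[ Gini(D) = 1 - \sum_{i=1}^{m} p_i^2 \]
```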
Attribute Selection Measures – Example: the buys_computer training data
Class distribution: C1 (buys_computer = yes) = 9 tuples, C2 (buys_computer = no) = 5 tuples
RID age income student credit_rating Class: buys_computer
1 youth high no fair no
2 youth high no excellent no
3 middle aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent no
7 middle aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle aged medium no excellent yes
13 middle aged high yes fair yes
14 senior medium no excellent no
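For the Python sketches below, the same training data can be written as a list of dicts (the attribute names follow the table; the encoding itself is mine, not the slides'):

```python
# The buys_computer training data from the table above, one dict per tuple (RIDs 1..14 in order).
D = [
    {"age": "youth",       "income": "high",   "student": "no",  "credit_rating": "fair",      "class": "no"},
    {"age": "youth",       "income": "high",   "student": "no",  "credit_rating": "excellent", "class": "no"},
    {"age": "middle_aged", "income": "high",   "student": "no",  "credit_rating": "fair",      "class": "yes"},
    {"age": "senior",      "income": "medium", "student": "no",  "credit_rating": "fair",      "class": "yes"},
    {"age": "senior",      "income": "low",    "student": "yes", "credit_rating": "fair",      "class": "yes"},
    {"age": "senior",      "income": "low",    "student": "yes", "credit_rating": "excellent", "class": "no"},
    {"age": "middle_aged", "income": "low",    "student": "yes", "credit_rating": "excellent", "class": "yes"},
    {"age": "youth",       "income": "medium", "student": "no",  "credit_rating": "fair",      "class": "no"},
    {"age": "youth",       "income": "low",    "student": "yes", "credit_rating": "fair",      "class": "yes"},
    {"age": "senior",      "income": "medium", "student": "yes", "credit_rating": "fair",      "class": "yes"},
    {"age": "youth",       "income": "medium", "student": "yes", "credit_rating": "excellent", "class": "yes"},
    {"age": "middle_aged", "income": "medium", "student": "no",  "credit_rating": "excellent", "class": "yes"},
    {"age": "middle_aged", "income": "high",   "student": "yes", "credit_rating": "fair",      "class": "yes"},
    {"age": "senior",      "income": "medium", "student": "no",  "credit_rating": "excellent", "class": "no"},
]
```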
Attribute Selection Measures
Worked example on the buys_computer data:
1. Compute Info(D) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94 bits
2. Compute Info_age(D) = 0.694 bits
3. Compute Gain(age) = 0.94 - 0.694 = 0.246 bits
Attribute Selection Measures
Similarly:
Gain(age) = 0.246 bits
Gain(income) = 0.029 bits
Gain(student) = 0.151 bits
Gain(credit_rating) = 0.048 bits
Gain(age) is the highest information gain, so age is chosen as the splitting attribute
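Continuing from the D list defined under the data table, a short sketch that recomputes these gains from the standard definitions:

```python
from math import log2
from collections import Counter

def info(D):
    """Expected information (entropy) Info(D) of a partition, in bits."""
    counts = Counter(t["class"] for t in D)
    total = len(D)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_a(D, attribute):
    """Info_A(D): expected information after partitioning D on the attribute."""
    total = len(D)
    result = 0.0
    for value in {t[attribute] for t in D}:
        Dj = [t for t in D if t[attribute] == value]
        result += (len(Dj) / total) * info(Dj)
    return result

def gain(D, attribute):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(D) - info_a(D, attribute)

for a in ["age", "income", "student", "credit_rating"]:
    print(f"Gain({a}) = {gain(D, a):.3f} bits")
# Prints approximately: age 0.247, income 0.029, student 0.152, credit_rating 0.048
# (the slide's 0.246 / 0.151 round the intermediate Info values);
# age has the highest gain either way and becomes the splitting attribute.
```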
Attribute Selection Measures
The resulting decision tree:
age?
• youth → student? (no → No, yes → Yes)
• middle_aged → Yes
• senior → credit_rating? (fair → Yes, excellent → No)
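The same tree can be written down in the nested-dict form produced by the earlier induction sketch and used to classify a tuple, e.g., the tuple X that reappears in the rule-extraction example later:

```python
# The tree above, hand-written in the nested-dict representation of the earlier sketch.
tree = {"age": {
    "youth":       {"student": {"no": "no", "yes": "yes"}},
    "middle_aged": "yes",
    "senior":      {"credit_rating": {"fair": "yes", "excellent": "no"}},
}}

def classify(tree, tuple_):
    """Follow the branch matching the tuple's attribute value until a leaf is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[tuple_[attribute]]
    return tree

X = {"age": "youth", "income": "medium", "student": "yes", "credit_rating": "fair"}
print(classify(tree, X))   # -> "yes" (buys_computer = yes)
```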
Decision Tree Induction: Tree Pruning
The constructed tree may overfit anomalies and outliers in the training data
Pruning removes the least reliable branches, making the decision tree less complex
Prepruning: statistically assess the goodness of a split before it takes place
• it is hard to choose appropriate thresholds for statistical significance
Postpruning: remove subtrees from the already constructed tree
• a subtree’s branches are removed and replaced with a leaf node
• the leaf is labeled with the most frequent class in the subtree
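As one concrete realisation of postpruning (not necessarily the method the lecture has in mind), scikit-learn exposes cost-complexity pruning: the full tree is grown first, and subtrees are then collapsed according to a complexity parameter chosen on held-out data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow the full tree, then compute candidate pruning levels (ccp_alpha values).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best = None
for alpha in path.ccp_alphas:
    # Larger alpha => more subtrees collapsed into leaves => simpler tree.
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = pruned.score(X_test, y_test)   # held-out accuracy guards against overfitting
    if best is None or score >= best[0]:   # on ties, prefer the simpler (more pruned) tree
        best = (score, alpha, pruned.get_n_leaves())

# For a real workflow, alpha should be chosen on a separate validation set.
print("best held-out accuracy %.3f at ccp_alpha=%.5f with %d leaves" % best)
```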
Decision Tree Induction: Rule Extraction from a Decision Tree
Example: classify the tuple X = (age = youth, income = medium, student = yes, credit_rating = fair) from the buys_computer data
Following the tree (age = youth → student? = yes), X is predicted as buys_computer = yes
Decision Tree Induction: Rule Extraction from a Decision Tree – Resolving Rule Conflicts
Rule conflicts arise when a tuple fires more than one rule with different class predictions
Two resolution strategies:
• Size ordering: the rule with the largest (toughest) antecedent has the highest priority; it fires and returns its class prediction
• Rule ordering: rules are prioritized a priori according to
– class-based ordering: decreasing class importance (e.g., order of prevalence – most frequent classes first)
– rule-based ordering: measures of rule quality (e.g., accuracy, size, domain expertise)
A fallback (default) rule fires when no other rule is triggered
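A small sketch of ordered rule firing with a fallback rule; the rule list, its ordering, and the default class are illustrative (the default here is the most frequent class, yes):

```python
# Rules are kept in priority order (class-based or rule-based ordering);
# the first rule whose antecedent is satisfied fires and returns its prediction.
rules = [
    ({"age": "youth", "student": "no"}, "no"),     # R1
    ({"age": "youth", "student": "yes"}, "yes"),   # R2
    ({"age": "middle_aged"}, "yes"),               # R3
]
default_class = "yes"   # fallback rule: most frequent class in the training data

def predict(tuple_):
    for antecedent, consequent in rules:
        if all(tuple_.get(a) == v for a, v in antecedent.items()):
            return consequent          # first matching rule fires
    return default_class               # no rule triggered -> default rule

print(predict({"age": "senior", "credit_rating": "fair"}))   # -> "yes" via the default rule
```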
Decision Tree Induction: Rule Extraction from a Decision Tree
• Create one rule for each path from the root to a leaf in the decision tree
1. Each splitting criterion along the path is ANDed to form the rule antecedent (the IF part)
2. The leaf node holds the class prediction (the THEN part)
Can the rules resulting from decision trees have conflicts?
R1: IF age = youth AND student = no THEN buys_computer = no
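A sketch of this extraction procedure over the nested-dict tree used earlier: conditions along each root-to-leaf path are ANDed into the IF part, and the leaf supplies the THEN part:

```python
def extract_rules(tree, conditions=()):
    """One IF-THEN rule per root-to-leaf path of a nested-dict decision tree."""
    if not isinstance(tree, dict):                       # reached a leaf: emit the rule
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {antecedent} THEN buys_computer = {tree}"]
    attribute, branches = next(iter(tree.items()))
    rules = []
    for value, subtree in branches.items():              # each branch adds one condition
        rules += extract_rules(subtree, conditions + ((attribute, value),))
    return rules

tree = {"age": {
    "youth":       {"student": {"no": "no", "yes": "yes"}},
    "middle_aged": "yes",
    "senior":      {"credit_rating": {"fair": "yes", "excellent": "no"}},
}}
for rule in extract_rules(tree):
    print(rule)
# First rule printed: IF age = youth AND student = no THEN buys_computer = no  (R1 above)
```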