Classification
Decision Trees
EL Moukhtar ZEMMOURI
ENSAM-Meknès
2023-2024
What is a decision tree
• Decision trees are a classification method (a classifier)
• The classification process is modeled as :
  • a set of hierarchical decisions on the feature variables (attributes),
  • arranged in a tree-like structure
What is a decision tree
• A decision tree is a tree composed of nodes, branches and leaves.
• Each node contains a test (condition) on one or more attributes.
  • ⇒ the split criterion
• At each node, the condition is chosen so that it separates the training examples into distinct classes
as well as possible
• Each branch of a node represents an outcome of the test
• Example : Color=red, Color=blue, Color=green (3 branches)
• A leaf node represents a class label (class value).
• A new instance is classified by following a matching path to a leaf node.
What is a decision tree
• Example : credit approval
• With attributes : Marital status, gender, age, has children
• New customers :
  • (Single, Male, 25, No, ?)
  • (Married, Female, 35, Yes, ?)
• [Decision tree figure] Root : Marital status
  • Single → test Age : ≥ 30 → Yes, < 30 → No
  • Married → Yes
  • Divorced → test Has Children : yes → No, no → Yes
Classification
Decision Tree Induction
Building a Decision Tree
• Top-down tree construction : recursive divide-and-conquer (see the code sketch after this slide)
• At start, all training examples are at the root node
• Select one attribute for the root node and create the corresponding branches
• Split the instances into subsets, one subset for each branch
• Repeat recursively for each branch
• Stop the growth of the tree based on a stopping criterion and create a leaf
  • A simple stopping criterion : all instances on a branch have the same class (beware of overfitting !)
• Problem :
• How to choose the best splitting attribute to test in a node?
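Below is a minimal Python sketch of this top-down, divide-and-conquer procedure (illustrative only, not the exact ID3/C4.5 pseudocode). The `choose_best_attribute` helper is a placeholder for the split criteria introduced in the following slides, and examples are assumed to be dictionaries mapping attribute names to values.

```python
from collections import Counter

def choose_best_attribute(rows, labels, attributes):
    # Placeholder: a real implementation scores each attribute with a split
    # criterion (information gain, gain ratio, Gini index, ... see below).
    return attributes[0]

def build_tree(rows, labels, attributes):
    """Top-down, recursive divide-and-conquer tree induction."""
    # Stopping criterion: all examples share one class, or no attribute is left.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf = majority class
    best = choose_best_attribute(rows, labels, attributes)
    node = {"attribute": best, "branches": {}}
    for value in {row[best] for row in rows}:          # one branch per attribute value
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = build_tree(
            [rows[i] for i in idx],                    # subset of examples for this branch
            [labels[i] for i in idx],
            [a for a in attributes if a != best],      # remaining attributes
        )
    return node
```

The returned tree is a nested dictionary; a leaf is simply a class label (the majority class of the remaining examples).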
Building a Decision Tree
• Algorithms:
• CART : Breiman et al. 1984
• ID3 : Quinlan 1986
  • Supports only nominal attributes
• C4.5 and C5.0 : Quinlan 1993
  • Successor of ID3
  • Supports both nominal and numeric attributes
Split Criteria
• At each node, the available attributes are evaluated on the basis of how well they
separate the classes of the training examples.
• A goodness function is used for this purpose
  • ⇒ the split criterion
• Typical goodness functions:
• Information Gain (ID3/C4.5)
• Information Gain Ratio
• Gini Index (CART)
• …
Choosing the Splitting Attribute
• Example : weather dataset
• Which attribute to select first ?
Day Outlook Temp Humidity Windy Play
x1 Sunny Hot High False No
x2 Sunny Hot High True No
x3 Overcast Hot High False Yes
x4 Rainy Mild High False Yes
x5 Rainy Cool Normal False Yes
x6 Rainy Cool Normal True No
x7 Overcast Cool Normal True Yes
x8 Sunny Mild High False No
x9 Sunny Cool Normal False Yes
x10 Rainy Mild Normal False Yes
x11 Sunny Mild Normal True Yes
x12 Overcast Mild High True Yes
x13 Overcast Hot Normal False Yes
x14 Rainy Mild High True No
Entropy and Information gain
• Entropy :
• Theory developed by Claude Shannon (1916-2001)
• Information as a quantity measured in bits
• Given a probability distribution, the information required to predict an event
is the distribution’s entropy
• Entropy formula :
  • Let $X$ be a discrete random variable with possible values $x_1, x_2, \dots, x_n$
  • The entropy $H$ of $X$ is given by :
  • $H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) = -\sum_{i=1}^{n} p_i \log_2 p_i$
Entropy and Information gain
• Entropy of a dataset :
• Let $(X, y)$ be a set of $n$ labeled examples
  • Each data point $x \in X$ is labeled by a class $y \in C$
  • Where $C = \{c_1, \dots, c_k\}$ is a finite set of $k$ predefined classes
  • $p_1, p_2, \dots, p_k$ is the class distribution (proportions of examples of each class $c_i$)
• The entropy $H$ of $X$ is given by : $H(X) = -\sum_{i=1}^{k} p_i \log_2 p_i$
• Property :
  • $0 \le H(X) \le \log_2 k$ (so at most 1 bit for binary classification, $k = 2$)
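A small Python helper for this entropy (a sketch that computes $H$ from a list of class labels, using base-2 logarithms):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(X) = -sum_i p_i * log2(p_i), computed from a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())
```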
Entropy and Information gain
• Binary classification k = 2
• Two classes (yes/no, 1/0, true/false, +/-, …)
• $H(X) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}$
Entropy and Information gain
• Example : entropy of the weather dataset (14 examples : 9 Yes, 5 No)
  • $p_{yes} = \frac{9}{14}$, $p_{no} = \frac{5}{14}$
  • $H(X) = -\frac{9}{14} \log_2 \frac{9}{14} - \frac{5}{14} \log_2 \frac{5}{14} = 0.940$ bits
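The same value can be checked with the `entropy` helper sketched above (the label list below is the Play column of the weather data):

```python
# Play labels of the 14 weather examples (9 Yes, 5 No), reusing entropy() above
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
print(round(entropy(play), 3))   # -> 0.94
```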
Entropy and Information gain
• Information Gain of an attribute =
• Information before split – Information after split
• $Gain(X, a_j) = H(X) - \sum_{v \in values(a_j)} \frac{card(X_{a_j = v})}{card(X)} \times H(X_{a_j = v})$
• where $X_{a_j = v}$ is the subset of examples for which attribute $a_j$ takes the value $v$
• Notes :
• Information gain increases with the average purity of the subsets that an
attribute produces
• How to choose the splitting attribute ?
• Choose the one that results in greatest information gain
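A possible Python implementation of this gain, reusing the `entropy` helper sketched earlier (examples are assumed to be dictionaries mapping attribute names to values):

```python
def information_gain(rows, labels, attr):
    """Gain(X, a) = H(X) - sum over values v of |X_{a=v}|/|X| * H(X_{a=v})."""
    n = len(labels)
    subsets = {}                                   # labels of each subset X_{a=v}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in subsets.values())
    return entropy(labels) - remainder
```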
Example : weather data
• $Gain(X, Outlook) =\ ?$
• $H(X) = 0.940$ bits
• $H(X_{Outlook = Sunny}) = -\frac{2}{5} \log_2 \frac{2}{5} - \frac{3}{5} \log_2 \frac{3}{5} = 0.971$ bits
• $H(X_{Outlook = Overcast}) = -1 \log_2 1 - 0 \log_2 0 = 0$ bits
• $H(X_{Outlook = Rainy}) = -\frac{3}{5} \log_2 \frac{3}{5} - \frac{2}{5} \log_2 \frac{2}{5} = 0.971$ bits
• ⇒ $Gain(X, Outlook) = 0.940 - \frac{5}{14} \times 0.971 - \frac{4}{14} \times 0 - \frac{5}{14} \times 0.971 = 0.246$ bits
Example : weather data
• $H(X) = 0.940$ bits
• $Gain(X, Outlook) = 0.246$ bits
• What does that mean ?
  • If we split the training dataset according to the Outlook attribute, we gain 0.246 bits of
information (insight into the data) !
Example : Weather data
• Compute Information Gain for :
• Humidity, Windy, Temperature
Attribute Gain
Outlook 0.246
Humidity 0.151
Windy 0.048
Temperature 0.029
• The best attribute is Outlook
• The one to put as root of the decision tree
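These values can be reproduced with the helpers sketched above; the tuple encoding of the weather data below is my own, not taken from the slides:

```python
ATTRS = ["Outlook", "Temp", "Humidity", "Windy"]
DATA = [  # (Outlook, Temp, Humidity, Windy, Play)
    ("Sunny", "Hot", "High", False, "No"),      ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),  ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),  ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),  ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),   ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),("Rainy", "Mild", "High", True, "No"),
]
rows = [dict(zip(ATTRS, d[:4])) for d in DATA]
labels = [d[4] for d in DATA]
for a in ATTRS:
    print(a, round(information_gain(rows, labels, a), 3))
# prints approximately: Outlook 0.247, Temp 0.029, Humidity 0.152, Windy 0.048
# (tiny rounding differences from the table above are expected)
```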
Building the decision Tree
• Construct the decision tree recursively
• ⇒ continue splitting each branch
• For the Outlook = Sunny branch :
  • $Gain(temperature) = 0.571$ bits, $Gain(windy) = 0.020$ bits, $Gain(humidity) = 0.971$ bits
Building the decision Tree
• Stop splitting when data can’t be split any more
• Note: not all leaves need to be pure; sometimes identical instances have
different classes !
• ⇒ the final decision tree
Interpretation and use
• How to classify new instance ?
• (sunny, cool, normal, true)
• Attribute relevance :
  • The root attribute (Outlook) is the most relevant
  • Temperature doesn’t appear in the tree
  • If Outlook = Sunny, then Humidity becomes relevant
• …
Interpretation and use
• Rule induction :
• Example : one rule per leaf
• outlook = overcast ⇒ play = yes
• outlook = sunny and humidity = high ⇒ play = no
• …
Decision trees
High branching problem
Highly branching attributes
• High branching problem:
• attributes with a large number of values
• extreme case: ID attribute
• Note : Subsets are more likely to be pure if there is a large number of
values
• ⇒ Information gain is biased towards choosing attributes with a large
number of values
• ⇒ This may result in overfitting
  • i.e. selection of an attribute that is non-optimal for prediction
Highly branching attributes
• Example : weather data with an added Day attribute (Day = 1, 2, …, 14)
• $Gain(X, Day) = H(X) = 0.940$ bits
  • each Day value identifies a single example, so every subset is pure
• ⇒ Information gain is maximal for the Day attribute
• [Figure] Splitting on Day gives 14 branches (1, 2, …, 14), each ending in a single-example
leaf (No, No, …, Yes, …, Yes, No)
Information Gain Ratio
• A modification of the information gain that reduces its bias towards highly
branching attributes
• Gain ratio takes the number and size of branches into account when
choosing an attribute
• It corrects the information gain by taking the intrinsic information of a split
into account
• ⇒ how much info do we need to tell which branch an instance belongs to ?
Information Gain Ratio
• Gain ratio (Quinlan 1986) normalizes information gain by :
• $GainRatio(X, a_j) = \frac{Gain(X, a_j)}{SplitInfo(X, a_j)}$
• Where :
  • $Gain(X, a_j) = H(X) - \sum_{v \in values(a_j)} \frac{card(X_{a_j = v})}{card(X)} \times H(X_{a_j = v})$
  • $SplitInfo(X, a_j) = -\sum_{v \in values(a_j)} \frac{card(X_{a_j = v})}{card(X)} \times \log_2 \frac{card(X_{a_j = v})}{card(X)}$
• ⇒ The importance of an attribute decreases as its split information gets larger
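A sketch of both quantities in Python, reusing `entropy` and `information_gain` from the earlier snippets:

```python
from collections import Counter
from math import log2

def split_info(rows, attr):
    """SplitInfo(X, a): entropy of the partition induced by attribute a."""
    n = len(rows)
    sizes = Counter(row[attr] for row in rows).values()
    return -sum((s / n) * log2(s / n) for s in sizes)

def gain_ratio(rows, labels, attr):
    si = split_info(rows, attr)
    return information_gain(rows, labels, attr) / si if si > 0 else 0.0

# On the weather data encoded earlier: gain_ratio(rows, labels, "Outlook") ≈ 0.156
```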
Information Gain Ratio
• Example :
• Day attribute :
  • $Gain(X, Day) = H(X) = 0.940$ bits
  • $SplitInfo(X, Day) = -14 \times \frac{1}{14} \log_2 \frac{1}{14} = 3.807$ bits
  • $GainRatio(X, Day) = \frac{0.940}{3.807} = 0.247$
• Outlook attribute :
  • $Gain(X, Outlook) = 0.246$ bits
  • $SplitInfo(X, Outlook) = -\frac{5}{14} \log_2 \frac{5}{14} - \frac{4}{14} \log_2 \frac{4}{14} - \frac{5}{14} \log_2 \frac{5}{14} = 1.577$ bits
  • $GainRatio(X, Outlook) = \frac{0.246}{1.577} = 0.156$
Information Gain Ratio
Attribute Gain Split Info Gain Ratio
Outlook 0.246 1.577 0.156
Humidity 0.151 1 0.151
Windy 0.048 0.985 0.049
Temperature 0.029 1.557 0.019
• Choose attribute that has the best Gain Ratio
• Note that Day ID is manually eliminated !!
Information Gain Ratio : some issues !
• The Outlook attribute still has a good gain ratio
• But the Day attribute has an even greater gain ratio
• Standard fix: manually eliminate identifiers to prevent splitting on that type of
attribute
• Problem with gain ratio: it may overcompensate
• May choose an attribute just because its intrinsic information is very low
• Standard fix:
• First, only consider attributes with greater than average information gain
• Then, compare them on gain ratio
Decision trees
Gini Index
Gini Index
• Gini index of a dataset $(X, y)$ :
  • Each data point $x \in X$ is labeled by a class $y \in C$
  • Where $C = \{c_1, \dots, c_k\}$ is a finite set of $k$ predefined classes
  • $p_i$ is the proportion of examples of class $c_i$
• The Gini index is :
  • $Gini(X) = 1 - \sum_{i=1}^{k} p_i^2$
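As a sketch, the Gini index of a list of class labels in Python:

```python
from collections import Counter

def gini(labels):
    """Gini(X) = 1 - sum_i p_i^2 over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# For the weather data (9 Yes, 5 No): 1 - (9/14)**2 - (5/14)**2 ≈ 0.459
```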
Gini Split
• Gini index after splitting based on attribute $a_j$ :
  • It is the weighted average of the Gini index values of the subsets $X_{a_j = v}$
  • $GiniSplit(X, a_j) = \sum_{v \in values(a_j)} \frac{card(X_{a_j = v})}{card(X)} \times Gini(X_{a_j = v})$
GDHI J
• The attribute with the smallest GiniSplit is chosen to split the data at
a given node.
• The CART algorithm uses the Gini Index as the split criterion
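And a corresponding `GiniSplit` sketch, reusing the `gini` helper above (rows as attribute-name → value dictionaries):

```python
def gini_split(rows, labels, attr):
    """Weighted average Gini of the subsets X_{a=v} produced by splitting on attr."""
    n = len(labels)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    return sum(len(ys) / n * gini(ys) for ys in subsets.values())

# The attribute minimizing gini_split(...) is chosen at each node (CART-style).
```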
Gini Index vs Entropy
• The two metrics achieve a similar goal
• ⇒ they measure the discriminative power of a particular feature (attribute)
• [Figure : variation of the two criteria (Gini index and entropy) with the class distribution
of a two-class problem; both are 0 for a pure node and maximal at a 50/50 split]
Decision trees
C4.5 Algorithm
How to deal with numeric attributes?
C4.5
• C4.5 innovations (Quinlan):
• permit numeric attributes
• deal sensibly with missing values
• pruning to deal with noisy data
• C4.5 is one of the best-known and most widely used learning algorithms
• Last research version: C4.8, implemented in Weka as J4.8 (Java)
• Commercial successor: C5.0
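For a quick hands-on illustration, scikit-learn can be used; note that its `DecisionTreeClassifier` implements a CART-style tree with binary splits (not C4.5), and that the ordinal encoding of the nominal attributes below is my own choice for the example:

```python
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Weather data: (Outlook, Humidity, Windy) -> Play
X = [("Sunny", "High", "False"), ("Sunny", "High", "True"), ("Overcast", "High", "False"),
     ("Rainy", "High", "False"), ("Rainy", "Normal", "False"), ("Rainy", "Normal", "True"),
     ("Overcast", "Normal", "True"), ("Sunny", "High", "False"), ("Sunny", "Normal", "False"),
     ("Rainy", "Normal", "False"), ("Sunny", "Normal", "True"), ("Overcast", "High", "True"),
     ("Overcast", "Normal", "False"), ("Rainy", "High", "True")]
y = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

X_enc = OrdinalEncoder().fit_transform(X)                          # nominal -> numeric codes
clf = DecisionTreeClassifier(criterion="entropy").fit(X_enc, y)    # entropy ~ information gain
print(export_text(clf, feature_names=["Outlook", "Humidity", "Windy"]))
```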
Numeric attributes
• Simple and standard method : binary splits
• Example : temperature < 30
• But, unlike nominal attributes, a numeric attribute has many possible
splitting points
• Solution is a straightforward extension :
• Evaluate Information Gain (or other measure) for every possible split point of
the attribute
• Choose “best” split point
• Information Gain for best split point is info gain for attribute
• Computationally more demanding
Numeric attributes
• Efficient computation of Information Gain :
• Sort the instances of the dataset $(X, y)$ by the values of the numeric attribute
• Evaluate entropy only between points of different classes
• Breakpoints between values of the same class cannot be optimal (Fayyad & Irani, 1992)
• Split points can be placed between values or directly at values
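A sketch of this search in Python, reusing the `entropy` helper from earlier. For simplicity it evaluates every midpoint between consecutive distinct values rather than only class-change boundaries (the Fayyad & Irani optimization):

```python
def best_numeric_split(values, labels):
    """Return (threshold, gain) of the best binary split value <= t / value > t."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # no threshold between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2   # candidate midpoint
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

On the Temperature values of the next slide, this scan should give 0.045 at s = 21.5 and 0.113 at s = 29, matching the split-point table.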
Numeric attributes : Gain computation
• Example : Temperature attribute
Temp 30 26 28 21 18 16 15 23 20 24 24 23 27 22
Play No No Yes Yes Yes No Yes No Yes Yes Yes Yes Yes No
• Sort the examples :
Temp 15 16 18 20 21 22 23 23 24 24 26 27 28 30
Play Yes No Yes Yes Yes No No Yes Yes Yes No Yes Yes No
• Evaluate the info gain at each breakpoint (example : s = 21.5) :
  • $H(X_{Temp \le 21.5}) = -\frac{1}{5} \log_2 \frac{1}{5} - \frac{4}{5} \log_2 \frac{4}{5} = 0.722$ bits
  • $H(X_{Temp \ge 21.5}) = -\frac{5}{9} \log_2 \frac{5}{9} - \frac{4}{9} \log_2 \frac{4}{9} = 0.991$ bits
  • $Gain(X, Temp, s = 21.5) = 0.940 - \frac{5}{14} \times 0.722 - \frac{9}{14} \times 0.991 = 0.045$ bits

Split Point   Gain
s = 15.5      0.047
s = 17        0.010
s = 21.5      0.045
s = 25        0.024
s = 26.5      0.0002
s = 29        0.113

• Finally, choose the breakpoint that gives the best gain.
Decision trees
Stopping, Overfitting and Pruning
Stopping Criterion
• When the size of the decision tree increases, it may overfit
• ⇒ it may generalize poorly to unseen test instances
• The stopping criterion is generally related to the pruning strategy
Pruning
• Goal : prevent overfitting to noise in the data
• Method : remove overgrown subtrees that do not improve the expected accuracy
on new data.
• Example : The contact lenses data
Pruning
• Two strategies for pruning the decision tree:
• Postpruning : take a fully-grown decision tree and discard unreliable parts
• Prepruning : stop growing a branch when information becomes unreliable
• Postpruning is preferred in practice
• prepruning can “stop too early”
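As an illustration of postpruning, the sketch below uses scikit-learn's cost-complexity pruning (a CART-style mechanism, not C4.5's error-based pruning); the dataset and the `ccp_alpha` value are arbitrary choices for the example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)                 # unpruned tree
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)  # pruned tree

print("full  :", full.get_n_leaves(), "leaves, test accuracy", round(full.score(X_te, y_te), 3))
print("pruned:", pruned.get_n_leaves(), "leaves, test accuracy", round(pruned.score(X_te, y_te), 3))
```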
Decision trees
Missing Values
Missing values
• Missing values may be estimated during the data cleaning phase
• Classification can be used for missing values estimation
• Challenge : more than one attribute can have missing values
Missing values
• C4.5 handles missing values directly in the algorithm (denoted “?”)
• Simple idea:
• Treat missing as a separate value of the attribute
• This may not be good if values are missing due to different reasons
• Example : attribute pregnant=missing for a male patient should be treated
differently (no) than for an adult female patient (unknown)
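A minimal pandas sketch of the “separate value” idea (the column and the placeholder value "?" are illustrative):

```python
import pandas as pd

outlook = pd.Series(["Sunny", None, "Rainy", "Overcast", None], name="Outlook")
outlook = outlook.fillna("?")          # treat missing as its own category "?"
print(outlook.value_counts())
```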
Summary
To Sum Up
• Algorithms for top-down induction of decision trees
• ID3, C4.5, CART
• Measures for choosing the Splitting Attribute
• Information Gain : biased towards attributes with a large number of values
• ID3 and C4.5
• Gain Ratio takes number and size of branches into account
• Gini Index
• CART
• There are many other attribute selection criteria, but they make almost no difference in the accuracy of the result !
• ID3 vs C4.5
• ID3 processes only nominal attributes
• C4.5 processes nominal as well as numeric attributes, and deals with missing values and noisy data.