Decision Tree Learning
Mahesh G Huddar
Asst. Professor
CSED, HIT, Nidasoshi
Inductive inference with decision trees
Decision tree learning is one of the most widely used and
practical methods of inductive inference
Features
Method for approximating discrete-valued functions
(including boolean)
Learned functions are represented as decision trees (or if-
then-else rules)
Expressive hypothesis space, including disjunctions
Robust to noisy data
Decision tree representation (PlayTennis)
Example: Outlook=Sunny, Temp=Hot, Humidity=High, Wind=Strong → PlayTennis=No
Decision trees expressivity
Decision trees represent a disjunction of conjunctions of
constraints on the attribute values:
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
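To make the disjunction-of-conjunctions reading concrete, here is a minimal Python sketch (attribute names follow the PlayTennis example; the function name is illustrative, not part of the original slides):

def play_tennis(outlook, humidity, wind):
    # True exactly when one of the three conjunctions above is satisfied
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Sunny", "High", "Strong"))     # False: no conjunction matches
print(play_tennis("Overcast", "High", "Strong"))  # True: second disjunct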
When to use Decision Trees
Problem characteristics:
Instances can be described by attribute value pairs
Target function is discrete valued
Disjunctive hypothesis may be required
Possibly noisy training data (robust to errors in the training data)
Possibly missing attribute values
Typical classification problems:
Equipment or medical diagnosis
Credit risk analysis
Several tasks in natural language processing
Top-down induction of Decision Trees
ID3 (Quinlan, 1986) is a basic algorithm for learning DTs
Given a training set of examples, the algorithm for building a DT
performs a search in the space of decision trees
The construction of the tree is top-down. The algorithm is greedy.
The fundamental question is “which attribute should be tested next?
Which question gives us more information?”
Select the best attribute
A descendant node is then created for each possible value of this
attribute, and the examples are partitioned according to this value
The process is repeated for each successor node until all the
examples are classified correctly or there are no attributes left
Which attribute is the best classifier?
A statistical property called information gain measures how
well a given attribute separates the training examples
Information gain uses the notion of entropy, commonly used in
information theory
Information gain = expected reduction of entropy
Entropy in binary classification
Entropy measures the impurity of a collection of examples. It
depends on the distribution of the random variable p.
S is a collection of training examples
p+ the proportion of positive examples in S
p– the proportion of negative examples in S
Entropy(S) ≡ − p+ log2 p+ − p− log2 p−    [convention: 0 log2 0 = 0]
Entropy([14+, 0−]) = − 14/14 log2(14/14) − 0 log2(0) = 0
Entropy([9+, 5−]) = − 9/14 log2(9/14) − 5/14 log2(5/14) = 0.94
Entropy([7+, 7−]) = − 7/14 log2(7/14) − 7/14 log2(7/14) = 1/2 + 1/2 = 1    [log2(1/2) = −1]
Note: the log of a number < 1 is negative; for 0 ≤ p ≤ 1, 0 ≤ entropy ≤ 1
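As a quick check of these values, a small Python sketch of the binary entropy (the function name is illustrative; it uses the 0 log2 0 = 0 convention):

import math

def entropy_binary(p_pos, p_neg):
    # Entropy of a two-class collection; terms with probability 0 contribute 0
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(entropy_binary(14/14, 0/14))  # 0.0
print(entropy_binary(9/14, 5/14))   # ~0.940
print(entropy_binary(7/14, 7/14))   # 1.0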
Entropy
Example
Entropy in general
Entropy measures the amount of information in a random
variable
Entropy(X) = – p+ log2 p+ – p– log2 p– X = {+, –}
for binary classification [two-valued random variable]
Entropy(X) = − Σi=1..c pi log2 pi = Σi=1..c pi log2(1/pi)    X = {1, …, c}
for classification in c classes
Example: rolling a die with 8 equally probable sides
Entropy(X) = − Σi=1..8 (1/8) log2(1/8) = − log2(1/8) = log2 8 = 3
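A short Python sketch of the general c-class entropy, reproducing the die example (the function name is illustrative):

import math

def entropy(probs):
    # Entropy of a discrete random variable given its probability distribution
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/8] * 8))      # 3.0 bits for the 8-sided fair die
print(entropy([9/14, 5/14]))   # ~0.940, the binary case seen earlier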
Information gain as entropy reduction
Information gain is the expected reduction in entropy caused by
partitioning the examples on an attribute.
The higher the information gain the more effective the attribute
in classifying training data.
Expected reduction in entropy knowing A
Gain(S, A) = Entropy(S) − Σv∈Values(A) (|Sv| / |S|) Entropy(Sv)
Values(A) possible values for A
Sv subset of S for which A has value v
Example: expected information gain
Let
Values(Wind) = {Weak, Strong}
S = [9+, 5−]
SWeak = [6+, 2−]
SStrong = [3+, 3−]
Information gain due to knowing Wind:
Gain(S, Wind) = Entropy(S) − 8/14 Entropy(SWeak) − 6/14 Entropy(SStrong)
= 0.94 − (8/14) × 0.811 − (6/14) × 1.00
= 0.048
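The same computation in a small Python sketch (the entropy helper counts positives and negatives directly; names are illustrative):

import math

def entropy(pos, neg):
    # Entropy of a collection with pos positive and neg negative examples
    total = pos + neg
    return -sum(p * math.log2(p) for p in (pos/total, neg/total) if p > 0)

# S = [9+, 5-]; Wind partitions S into Weak = [6+, 2-] and Strong = [3+, 3-]
gain_wind = entropy(9, 5) - (8/14) * entropy(6, 2) - (6/14) * entropy(3, 3)
print(round(gain_wind, 3))  # 0.048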
Which attribute is the best classifier?
First step: which attribute to test at the root?
Which attribute should be tested at the root?
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.084
Gain(S, Temperature) = 0.029
Outlook provides the best prediction for the target
Let's grow the tree:
add to the tree a successor for each possible value of Outlook
partition the training samples according to the value of Outlook
After first step
Second step
Working on Outlook=Sunny node:
Gain(SSunny, Humidity) = 0.970 − (3/5) × 0.0 − (2/5) × 0.0 = 0.970
Gain(SSunny, Wind) = 0.970 − (2/5) × 1.0 − (3/5) × 0.918 = 0.019
Gain(SSunny, Temp.) = 0.970 − (2/5) × 0.0 − (2/5) × 1.0 − (1/5) × 0.0 = 0.570
Humidity provides the best prediction for the target
Let's grow the tree:
add to the tree a successor for each possible value of Humidity
partition the training samples according to the value of Humidity
Second and third steps
{D1, D2, D8} → No    {D9, D11} → Yes    {D4, D5, D10} → Yes    {D6, D14} → No
ID3: algorithm
ID3(X, T, Attrs)    X: training examples,
T: target attribute (e.g. PlayTennis),
Attrs: other attributes, initially all attributes
Create Root node
If all X's are +, return Root with class +
If all X's are −, return Root with class −
If Attrs is empty, return Root with class = most common value of T in X
else
A ← best attribute; decision attribute for Root ← A
For each possible value vi of A:
- add a new branch below Root, for the test A = vi
- Xi ← subset of X with A = vi
- If Xi is empty then add a new leaf with class = the most common value of T in X
else add the subtree generated by ID3(Xi, T, Attrs − {A})
return Root
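A compact Python sketch of the procedure above, under the assumption that examples are plain dicts mapping attribute names to values and the target attribute is given by its key (all names are illustrative, not a reference implementation):

import math
from collections import Counter

def entropy(examples, target):
    # Entropy of the class distribution in a set of examples
    counts = Counter(x[target] for x in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain(examples, attr, target):
    # Expected reduction in entropy from partitioning on attr
    total = len(examples)
    remainder = 0.0
    for v in set(x[attr] for x in examples):
        subset = [x for x in examples if x[attr] == v]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

def most_common(examples, target):
    return Counter(x[target] for x in examples).most_common(1)[0][0]

def id3(examples, target, attrs):
    labels = set(x[target] for x in examples)
    if len(labels) == 1:              # all examples have the same class
        return labels.pop()
    if not attrs:                     # no attributes left to test
        return most_common(examples, target)
    best = max(attrs, key=lambda a: gain(examples, a, target))
    tree = {best: {}}
    # Branches are created only for values seen in the data; for an unseen
    # value the pseudocode's empty-subset case would return the most common class.
    for v in set(x[best] for x in examples):
        subset = [x for x in examples if x[best] == v]
        tree[best][v] = id3(subset, target, [a for a in attrs if a != best])
    return tree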
Search space in Decision Tree learning
The search space is made by
partial decision trees
The algorithm is hill-climbing
The evaluation function is
information gain
The hypothesis space is complete
(it can represent all discrete-valued
functions)
The search maintains a single
current hypothesis
No backtracking; no guarantee of
optimality
It uses all the available examples
(not incremental)
May terminate early, accepting a
hypothesis that imperfectly fits noisy data
Inductive bias in decision tree learning
What is the inductive bias of DT learning?
1. Shorter trees are preferred over longer trees
Not enough. This is the bias exhibited by a simple breadth-
first algorithm generating all DTs and selecting the shortest one
2. Prefer trees that place high information gain attributes close to
the root
Note: DTs are not limited in expressive power: they can represent all possible (discrete-valued) functions
Two kinds of biases
Preference or search biases (due to the search strategy)
ID3 searches a complete hypothesis space; the search strategy is
incomplete
Restriction or language biases (due to the set of hypotheses
expressible or considered)
Candidate-Elimination searches an incomplete hypothesis space; the
search strategy is complete
A combination of both biases occurs in learning a linear combination of
weighted features in board games.
Prefer shorter hypotheses: Occam's razor
Why prefer shorter hypotheses?
Arguments in favor:
There are fewer short hypotheses than long ones
If a short hypothesis fits the data, it is unlikely to be a coincidence
Elegance and aesthetics
Arguments against:
Not every short hypothesis is a reasonable one.
Occam's razor: "The simplest explanation is usually the best one."
A principle usually (though incorrectly) attributed to the 14th-century English
logician and Franciscan friar William of Ockham.
lex parsimoniae ("law of parsimony", "law of economy", or "law of
succinctness")
The term razor refers to the act of shaving away unnecessary
assumptions to get to the simplest explanation.
Issues in decision trees learning
Overfitting
Reduced error pruning
Rule post-pruning
Extensions
Continuous valued attributes
Alternative measures for selecting attributes
Handling training examples with missing attribute values
Handling attributes with different costs
Improving computational efficiency
Most of these improvements are implemented in C4.5 (Quinlan, 1993)
Overfitting: definition
Building trees that “adapt too much” to the training examples
may lead to “overfitting”.
Consider the error of hypothesis h over
the training data: errorD(h) (empirical error)
the entire distribution X of the data: errorX(h) (expected error)
Hypothesis h overfits the training data if there is an alternative
hypothesis h' ∈ H such that
errorD(h) < errorD(h’) and
errorX(h’) < errorX(h)
i.e. h’ behaves better over unseen data
Example
D15: Outlook=Sunny, Temp=Hot, Humidity=Normal, Wind=Strong, PlayTennis=No
Overfitting in decision trees
Outlook=Sunny, Temp=Hot, Humidity=Normal, Wind=Strong, PlayTennis=No
New noisy example causes splitting of second leaf node.
Overfitting in decision tree learning
Avoid overfitting in Decision Trees
Two strategies:
1. Stop growing the tree earlier, before perfect classification
2. Allow the tree to overfit the data, and then post-prune the tree
Training and validation set
split the training set in two parts (training and validation) and use the
validation set to assess the utility of post-pruning
Reduced error pruning
Rule pruning
Other approaches
Use a statistical test to estimate effect of expanding or pruning
Minimum description length principle: use a measure of the complexity of
encoding the DT and the examples, and halt growing the tree when this
encoding size is minimal
Reduced-error pruning (Quinlan 1987)
Each node is a candidate for pruning
Pruning consists of removing the subtree rooted at a node: the
node becomes a leaf and is assigned the most common
classification of the examples associated with that node
Nodes are removed only if the resulting tree performs no
worse on the validation set.
Nodes are pruned iteratively: at each iteration the node
whose removal most increases accuracy on the validation set is
pruned.
Pruning stops when no further pruning increases accuracy
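A rough Python sketch of this procedure for the nested-dict trees produced by the ID3 sketch earlier (a leaf is a class label, an internal node is {attribute: {value: subtree}}); all helper names are illustrative:

import copy
from collections import Counter

def classify(tree, example, default):
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(example.get(attr), default)
    return tree

def accuracy(tree, examples, target, default):
    hits = sum(classify(tree, x, default) == x[target] for x in examples)
    return hits / len(examples)

def internal_paths(tree, path=()):
    # Yield the path (sequence of (attribute, value) tests) to every internal node
    if isinstance(tree, dict):
        yield path
        attr = next(iter(tree))
        for value, sub in tree[attr].items():
            yield from internal_paths(sub, path + ((attr, value),))

def pruned_copy(tree, path, leaf_label):
    # Copy of the tree with the subtree reached via path replaced by a leaf
    if not path:
        return leaf_label
    new_tree = copy.deepcopy(tree)
    node = new_tree
    for attr, value in path[:-1]:
        node = node[attr][value]
    last_attr, last_value = path[-1]
    node[last_attr][last_value] = leaf_label
    return new_tree

def reduced_error_prune(tree, train, validation, target):
    default = Counter(x[target] for x in train).most_common(1)[0][0]
    while True:
        best_tree, best_acc = None, accuracy(tree, validation, target, default)
        for path in internal_paths(tree):
            # Leaf label = most common class of the training examples reaching the node
            reaching = [x for x in train if all(x[a] == v for a, v in path)]
            label = (Counter(x[target] for x in reaching).most_common(1)[0][0]
                     if reaching else default)
            candidate = pruned_copy(tree, path, label)
            acc = accuracy(candidate, validation, target, default)
            if acc >= best_acc:       # prune only if the tree performs no worse
                best_tree, best_acc = candidate, acc
        if best_tree is None:         # no pruning helps: stop
            return tree
        tree = best_tree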
Effect of reduced error pruning
Rule post-pruning
1. Create the decision tree from the training set
2. Convert the tree into an equivalent set of rules
Each path corresponds to a rule
Each node along a path corresponds to a pre-condition
Each leaf classification to the post-condition
3. Prune (generalize) each rule by removing those preconditions
whose removal improves accuracy …
… over validation set
… over training with a pessimistic, statistically inspired, measure
4. Sort the rules in estimated order of accuracy, and consider
them in sequence when classifying new instances
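A small Python sketch of step 2, again assuming the nested-dict tree representation used in the earlier sketches (names are illustrative):

def tree_to_rules(tree, preconditions=()):
    # Each root-to-leaf path becomes one rule: (list of (attribute, value) tests, class)
    if not isinstance(tree, dict):
        return [(list(preconditions), tree)]
    attr = next(iter(tree))
    rules = []
    for value, subtree in tree[attr].items():
        rules += tree_to_rules(subtree, preconditions + ((attr, value),))
    return rules

Each extracted rule can then be generalized by dropping any precondition whose removal does not hurt estimated accuracy, and the surviving rules sorted by estimated accuracy.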
Converting to rules
(Outlook=Sunny) ∧ (Humidity=High) ⇒ (PlayTennis=No)
Why convert to rules?
Each distinct path produces a different rule: a condition
removal may be based on a local (contextual) criterion. Node
pruning is global and affects all the rules
In rule form, tests are not ordered and there is no book-
keeping involved when conditions (nodes) are removed
Converting to rules improves readability for humans
Dealing with continuous-valued attributes
So far we have assumed discrete values for attributes and for the outcome.
Given a continuous-valued attribute A, dynamically create a
new attribute Ac
Ac = True if A < c, False otherwise
How to determine threshold value c ?
Example. Temperature in the PlayTennis example
Sort the examples according to Temperature
Temperature: 40  48 | 60  72  80 | 90
PlayTennis:  No  No | Yes Yes Yes | No
Determine candidate thresholds by averaging consecutive values where
there is a change in classification: (48+60)/2=54 and (80+90)/2=85
Evaluate candidate thresholds (attributes) according to information gain.
The best is Temperature>54. The new attribute competes with the other
ones
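A small Python sketch of the candidate-threshold step, using the Temperature values from the example (the information-gain evaluation of the resulting Boolean attributes is omitted):

temperatures = [40, 48, 60, 72, 80, 90]
play_tennis  = ["No", "No", "Yes", "Yes", "Yes", "No"]

pairs = sorted(zip(temperatures, play_tennis))
candidates = [(a + b) / 2                       # midpoint where the class changes
              for (a, la), (b, lb) in zip(pairs, pairs[1:])
              if la != lb]
print(candidates)   # [54.0, 85.0]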
Problems with information gain
Natural bias of information gain: it favours attributes with
many possible values.
Consider the attribute Date in the PlayTennis example.
Date would have the highest information gain since it perfectly
separates the training data.
It would be selected at the root resulting in a very broad tree
Although very good on the training data, this tree would perform poorly in predicting
unknown instances: overfitting.
The problem is that the partition is too specific, too many small
classes are generated.
We need to look at alternative measures …
An alternative measure: gain ratio
SplitInformation(S, A) ≡ − Σi=1..c (|Si| / |S|) log2(|Si| / |S|)
Si are the sets obtained by partitioning on value i of A
SplitInformation measures the entropy of S with respect to the values of A. The
more uniformly dispersed the data the higher it is.
GainRatio(S, A) ≡ Gain(S, A) / SplitInformation(S, A)
GainRatio penalizes attributes that split examples into many small classes, such as
Date. Let |S| = n; Date splits the examples into n classes
SplitInformation(S, Date) = −[(1/n) log2(1/n) + … + (1/n) log2(1/n)] = −log2(1/n) = log2 n
Compare with an attribute A which splits the data into two even classes:
SplitInformation(S, A) = −[(1/2) log2(1/2) + (1/2) log2(1/2)] = −[−1/2 − 1/2] = 1
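A minimal Python sketch of the two measures, assuming examples are dicts and a gain(examples, attr, target) function like the one sketched earlier (names are illustrative):

import math
from collections import Counter

def split_information(examples, attr):
    # Entropy of S with respect to the values of attr
    total = len(examples)
    sizes = Counter(x[attr] for x in examples).values()
    return -sum((s / total) * math.log2(s / total) for s in sizes)

def gain_ratio(examples, attr, target):
    # gain(...) is assumed to be defined as in the earlier information-gain sketch
    si = split_information(examples, attr)
    return gain(examples, attr, target) / si if si > 0 else 0.0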
Adjusting gain-ratio
Problem: SplitInformation(S, A) can be zero or very small
when |Si| ≈ |S| for some value i
To mitigate this effect, the following heuristic has been used:
1. compute Gain for each attribute
2. apply GainRatio only to attributes with Gain above average
Other measures have been proposed:
Distance-based metric [Lopez-De Mantaras, 1991] on the partitions of
data
Each partition (induced by an attribute) is evaluated according to the
distance to the partition that perfectly classifies the data.
The partition closest to the ideal partition is chosen
Handling incomplete training data
How to cope with the problem that the value of some attribute
may be missing?
Example: Blood-Test-Result in a medical diagnosis problem
The strategy: use the other examples to guess the missing attribute value
1. Assign the value that is most common among the training examples at
the node
2. Assign a probability to each value, based on frequencies, and assign
a value to the missing attribute according to this probability distribution
Missing values in new instances to be classified are treated
accordingly, and the most probable classification is chosen
(C4.5)
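An illustrative Python sketch of the first strategy (fill a missing value with the most common value among the training examples at the node); the helper name is hypothetical:

from collections import Counter

def fill_missing(example, attr, examples_at_node):
    # Replace a missing value of attr by the most common value at this node
    if example.get(attr) is None:
        values = [x[attr] for x in examples_at_node if x.get(attr) is not None]
        example = dict(example, **{attr: Counter(values).most_common(1)[0][0]})
    return example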
Handling attributes with different costs
Instance attributes may have an associated cost: we would
prefer decision trees that use low-cost attributes
ID3 can be modified to take into account costs:
1. Tan and Schlimmer (1990):
Gain²(S, A) / Cost(S, A)
2. Nunez (1988):
(2^Gain(S, A) − 1) / (Cost(A) + 1)^w,   with w ∈ [0, 1]
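A small Python sketch of the two cost-sensitive measures above, assuming the gain values and attribute costs are already available as numbers (function names are illustrative):

def tan_schlimmer(gain_value, cost):
    # Gain^2(S, A) / Cost(S, A)
    return gain_value ** 2 / cost

def nunez(gain_value, cost, w=0.5):
    # (2^Gain(S, A) - 1) / (Cost(A) + 1)^w, with w in [0, 1] weighting the cost
    return (2 ** gain_value - 1) / (cost + 1) ** w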
References
Tom Mitchell, Machine Learning, McGraw-Hill International
Editions, 2013, India Edition.