21CS54 (AI/ML) Prof. Thameeza
MODULE 4
DECISION TREE LEARNING & BAYESIAN LEARNING
Introduction
Why is it called a decision tree?
Because it starts from a root node and branches out to find a number of possible solutions.
The benefits of having a decision tree are as follows:
It does not require any domain knowledge.
It is easy to comprehend.
The learning and classification steps of a decision tree are simple and fast.
Example: a toll-free number.
Structure of a Decision Tree
A decision tree is a structure that includes a root node, branches, and leaf nodes.
Each internal node denotes a test on an attribute, each branch denotes the
outcome of a test, and each leaf node holds a class label.
The topmost node in the tree is the root node. Decision trees apply to both classification and regression models.
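To make this concrete, here is a minimal Python sketch (not from the text) of such a structure: internal nodes test an attribute, each branch maps a test outcome to a child node, and leaves hold a class label. The 'CGPA'/'Job Offer' names are illustrative placeholders.

class Node:
    """Internal nodes test an attribute; leaves hold a class label."""
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # attribute tested at this internal node
        self.label = label           # class label if this is a leaf
        self.branches = {}           # test outcome -> child Node

    def is_leaf(self):
        return self.label is not None

def classify(node, instance):
    # Walk from the root, following the branch matching each test outcome.
    while not node.is_leaf():
        node = node.branches[instance[node.attribute]]
    return node.label

# Hypothetical tree: the root tests 'CGPA'; leaves hold 'Job Offer' labels.
root = Node(attribute='CGPA')
root.branches['>= 9'] = Node(label='Yes')
root.branches['< 9'] = Node(label='No')
print(classify(root, {'CGPA': '>= 9'}))   # -> Yes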
Figure 6.1 shows the symbols that are used to represent different nodes in the construction of a decision tree.
Decision networks are also called influence diagrams.
The decision tree consists of two major procedures:
1) Building a tree and
2) Knowledge inference or classification.
Building the Tree
Knowledge Inference or Classification
Advantages of Decision Trees
Disadvantages of Decision Trees
Fundamentals of Entropy
Given a training dataset with a set of attributes, the decision tree is constructed
by finding the attribute that best describes the target class for the given test
instances.
The best split attribute is the one which, among all features, contains the most information about how to split the dataset so that the target class is accurately identified for the test instances.
The split should be as pure as possible at every stage of selecting the best feature; a pure subset is one that contains instances of only one class.
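As a quick illustration of how purity is measured, the following Python snippet (a sketch, not the textbook's code) computes the entropy of a class distribution; the counts 7 and 3 match the 'Job Offer' example used later.

import math

def entropy(counts):
    # Entropy (in bits) of a class distribution, given per-class counts.
    # A pure node (all instances in one class) has entropy 0.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# For a target class with 7 'Yes' and 3 'No' instances:
print(round(entropy([7, 3]), 4))   # 0.8813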
Algorithm 6.1: General Algorithm for Decision Trees
Decision tree induction algorithms
ID3 Tree Construction (ID3 stands for Iterative Dichotomiser 3)
A decision tree is one of the most powerful tools of supervised learning, used for both classification and regression tasks.
ID3 builds a flowchart-like tree structure where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
It is constructed by recursively splitting the training data into subsets based on
the values of the attributes until a stopping criterion is met, such as the
maximum depth of the tree or the minimum number of samples required to split
a node.
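The recursive construction just described can be sketched compactly in Python. This is an illustrative ID3-style sketch, not the textbook's Algorithm 6.1: it uses information gain to pick the split attribute and stops at pure nodes or when attributes run out (extra stopping criteria such as maximum depth are omitted).

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Expected reduction in entropy from splitting on attr.
    n = len(labels)
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    # Returns a class label (leaf) or a pair (best_attr, {value: subtree}).
    if len(set(labels)) == 1:                 # pure node -> leaf
        return labels[0]
    if not attrs:                             # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for v in set(r[best] for r in rows):
        sub = [(r, lab) for r, lab in zip(rows, labels) if r[best] == v]
        sub_rows, sub_labels = zip(*sub)
        rest = [a for a in attrs if a != best]
        branches[v] = id3(list(sub_rows), list(sub_labels), rest)
    return (best, branches)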
Step 1: Calculate the Entropy for the target class ‘Job Offer’.
Entropy_Info(7, 3) = -[(7/10) log2(7/10) + (3/10) log2(3/10)] = 0.8813
Iteration 1:
Step 2: Calculate Entropy_Info and Gain (information gain) for each of the attributes in the training dataset.
Step 3: From Table 6.8, choose the attribute for which the entropy is minimum, and therefore the gain is maximum, as the best split attribute.
Iteration 2:
Example: Construct a decision tree for the training dataset below using the ID3 algorithm.
C4.5 Construction
C4.5 is a widely used algorithm for constructing decision trees from a dataset.
The disadvantages of ID3 are: attributes must be nominal, the dataset must not include missing data, and the algorithm tends to overfit.
To overcome these disadvantages, Ross Quinlan, the inventor of ID3, made improvements for these bottlenecks and created a new algorithm named C4.5.
The algorithm can now create more generalized models, including for continuous data, and can handle missing data. It also works with discrete data and supports post-pruning.
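C4.5 replaces plain information gain with the gain ratio, which normalizes the gain by the split information of the attribute. A self-contained Python sketch of that computation (illustrative, assuming rows are dictionaries of attribute values) might look like this:

import math
from collections import Counter

def class_entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    # Gain ratio = information gain / split information of the attribute.
    n = len(labels)
    remainder, split_info = 0.0, 0.0
    for v in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        p = len(subset) / n
        remainder += p * class_entropy(subset)
        split_info -= p * math.log2(p)
    gain = class_entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0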
Example: Construct a decision tree for Table 6.3 using C4.5.
Iteration 1:
Step 1: Calculate the Entropy for the target class ‘Job Offer’.
Entropy_Info(7, 3) = -[(7/10) log2(7/10) + (3/10) log2(3/10)] = 0.8813
Step 2: Calculate Entropy_Info, Gain (information gain), Split_Info, and Gain Ratio for each of the attributes in the training dataset.
Step 3: Choose the attribute for which the gain ratio is maximum as the best split attribute.
The final decision tree is shown in figure below.
Example 2:
Dealing with Continuous Attributes in C4.5
Similarly, the calculations are done for each of the distinct values of the attribute CGPA and a table is created. The value of CGPA with the maximum gain is then chosen as the threshold value, or best split point. From Table 6.13, we can observe that CGPA = 7.9 has the maximum gain of 0.4462.
Hence, 7.9 is chosen as the split point. We can now discretize the continuous values of CGPA into two categories, CGPA ≤ 7.9 and CGPA > 7.9. The resulting discretized instances are shown in Table 6.14.
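The threshold-selection procedure described above can be sketched as follows. This is an illustrative Python version, assuming a binary target: candidate thresholds lie between distinct sorted values, and the one with the highest information gain is returned.

import math

def _entropy2(pos, neg):
    # Binary-class entropy from positive/negative counts.
    out = 0.0
    for c in (pos, neg):
        if c:
            p = c / (pos + neg)
            out -= p * math.log2(p)
    return out

def best_threshold(values, labels, positive='Yes'):
    # Returns the (threshold, gain) pair maximizing information gain when the
    # data is split into value <= threshold versus value > threshold.
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(1 for _, lab in pairs if lab == positive)
    total_neg = n - total_pos
    base = _entropy2(total_pos, total_neg)
    best = (None, -1.0)
    left_pos = left_neg = 0
    for i in range(n - 1):
        if pairs[i][1] == positive:
            left_pos += 1
        else:
            left_neg += 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue                         # only split between distinct values
        gain = base \
            - (i + 1) / n * _entropy2(left_pos, left_neg) \
            - (n - i - 1) / n * _entropy2(total_pos - left_pos, total_neg - left_neg)
        if gain > best[1]:
            best = (pairs[i][0], gain)
    return best

# Made-up CGPA values and Job Offer labels, just to exercise the function:
print(best_threshold([6.8, 7.9, 8.5, 9.1], ['No', 'No', 'Yes', 'Yes']))   # (7.9, 1.0)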
Classification and Regression Trees (CART) Construction
Classification and Regression Trees (CART) is a widely used algorithm for constructing decision trees that can be applied to both classification and regression tasks. CART is similar to C4.5 but has some differences in its construction and splitting criteria.
It constructs the tree as a binary tree by recursively splitting a node into two child nodes, even if an attribute has more than two possible values. The Gini index is calculated for all binary subsets of attribute values, and the subset which yields the maximum value (i.e., the largest reduction in impurity) is selected as the best split subset.
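A minimal Python sketch of the Gini computation for a binary split (illustrative numbers; the 7/3 class counts echo the 'Job Offer' dataset, while the child counts are made up):

def gini(counts):
    # Gini impurity of a node with the given class counts.
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_reduction(parent_counts, left_counts, right_counts):
    # Impurity reduction achieved by a binary split, as used by CART.
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    weighted = (n_left / n) * gini(left_counts) + (n_right / n) * gini(right_counts)
    return gini(parent_counts) - weighted

# Hypothetical binary split of 10 instances (7 Yes / 3 No):
print(round(gini([7, 3]), 4))                             # 0.42
print(round(gini_reduction([7, 3], [5, 0], [2, 3]), 4))   # 0.18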
Example: Construct a decision tree for Table 6.3 using CART.
Regression Trees
Homework Problems
VALIDATING AND PRUNING OF DECISION TREES
Validating and pruning decision trees is a crucial part of building accurate and
robust machine learning models.
Decision trees are prone to overfitting, which means they can learn to capture noise and details in the training data that do not generalize well to new, unseen data.
Validation and pruning are techniques used to mitigate this issue and improve
the performance of decision tree models.
The pre-pruning technique for decision trees tunes the hyperparameters prior to the training pipeline. It involves the heuristic known as 'early stopping', which stops the growth of the decision tree, preventing it from reaching its full depth.
It stops the tree-building process to avoid producing leaves with small samples.
During each stage of the splitting of the tree, the cross-validation error is monitored.
If the value of the error does not decrease anymore, then we stop the growth of the decision tree.
The hyperparameters that can be tuned for early stopping and preventing overfitting are max_depth, min_samples_leaf, and min_samples_split.
These same parameters can also be tuned to get a robust model, as the sketch below illustrates.
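As a hedged illustration (using scikit-learn and its built-in iris dataset rather than anything from the text), pre-pruning amounts to setting these hyperparameters before fitting:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
pre_pruned = DecisionTreeClassifier(
    max_depth=4,            # stop growing beyond this depth
    min_samples_split=10,   # do not split a node with fewer than 10 samples
    min_samples_leaf=5,     # never produce a leaf with fewer than 5 samples
    random_state=0,
).fit(X, y)
print(pre_pruned.get_depth())   # bounded by max_depth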
Post-pruning does the opposite of pre-pruning and allows the decision tree model to grow to its full depth. Once the model grows to its full depth, tree branches are removed to prevent the model from overfitting.
The algorithm will continue to partition the data into smaller subsets until the final subsets produced are similar in terms of the outcome variable.
The final subsets of the tree will consist of only a few data points, allowing the tree to learn the training data to a T. However, when a new data point that differs from the learned data is introduced, it may not be predicted well.
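One common way to realize post-pruning in practice is scikit-learn's cost-complexity pruning; the sketch below (illustrative, on the iris dataset) grows a full tree, extracts the candidate pruning strengths, and keeps the pruned tree that scores best on held-out data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow the tree to full depth, then extract candidate pruning strengths.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)

# Refit at each alpha and keep the pruned tree that validates best.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_test, y_test),
)
print(best.get_depth(), round(best.score(X_test, y_test), 3))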
Chapter 8
Bayesian Learning
8.1 Introduction to probability-based learning
8.2 Fundamentals of Bayes theorem
8.3 Classification using Bayes model
8.3.1 Naïve Bayes Algorithm
8.3.2 Brute Force Bayes Algorithm
8.3.3 Bayes optimal classifier
8.3.4 Gibbs Algorithm
8.4 Naïve Bayes Algorithm for continuous attributes
8.5 Other popular types of Naïve Bayes classifiers
Bayesian learning
Bayesian learning is a learning method that describes and represents knowledge in an uncertain domain and provides a way to reason about this knowledge using probability measures.
It uses Bayes theorem to infer the unknown parameters of a model. Bayesian
inference is useful in many applications which involve reasoning and
diagnosis such as game theory, medicine, etc.
Bayesian inference is much more powerful in handling missing data and
for estimating any uncertainty in predictions.
8.1 Introduction to probability-based learning
Probability-based learning is one of the most important practical learning
methods which combines prior knowledge or prior probabilities with
observed data.
Probabilistic learning uses the concept of probability theory that describes
how to model randomness, uncertainty, and noise to predict future events.
It is a tool for modelling large datasets and uses Bayes rule to infer unknown quantities, to predict, and to learn from data. In a probabilistic model, randomness plays a major role and the solution is given as a probability distribution, while in a deterministic model there is no randomness: the same initial conditions produce the same single outcome every time the model is run.
Bayesian learning differs from probabilistic learning as it uses subjective probabilities (i.e., probabilities based on an individual's belief or interpretation about the outcome of an event, which can change over time) to infer the parameters of a model.
Two practical learning algorithms called Naive Bayes learning and
Bayesian Belief Network (BBN) form the major part of Bayesian learning.
These algorithms use prior probabilities and apply Bayes rule to infer useful information.
8.2 Fundamentals of Bayes theorem
The Naive Bayes model relies on Bayes theorem.
It works on the principle of three kinds of probabilities: prior probability, likelihood probability, and posterior probability.
Prior Probability
It is the general probability of an uncertain event before an observation is seen
or some evidence is collected.
It is the initial probability that is believed before any new information
is collected.
Likelihood Probability
Likelihood probability is the relative probability of the observation occurring
for each class or the sampling density for the evidence given the hypothesis.
It is stated as P(Evidence | Hypothesis), which denotes the likeliness of the occurrence of the evidence given the parameters.
Posterior Probability
It is the updated or revised probability of an event taking into account
the observations from the training data.
P(Hypothesis | Evidence) is the posterior distribution representing the belief about the hypothesis, given the evidence from the training data.
Therefore, informally,
Posterior probability = prior probability updated with new evidence
8.3 Classification using Bayes model
Naive Bayes classification models work on the principle of Bayes theorem.
Bayes rule is a mathematical formula used to determine the posterior probability, given the prior probabilities of events. Generally, Bayes theorem is used to select the most probable hypothesis from the data, considering both prior knowledge and posterior distributions.
It is based on the calculation of the posterior probability and is stated as:
P(Hypothesis h | Evidence E) = [P(Evidence E | Hypothesis h) × P(Hypothesis h)] / P(Evidence E)    ... (8.1)
where Hypothesis h is the target class to be classified and Evidence E is the given test instance.
P(Hypothesis h) is the prior probability of the hypothesis h without observing the training data or considering any evidence.
It denotes the prior belief or the initial probability that the hypothesis h is correct.
P(Evidence E) is the prior probability of the evidence E from the training dataset, without any knowledge of which hypothesis holds. It is also called the marginal probability.
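A small worked example may help tie these three probabilities together. The numbers below are made up for illustration: a prior of 0.3 is revised upward once evidence that is more likely under the hypothesis is observed.

# Made-up numbers: prior P(h) = 0.3, likelihood P(E|h) = 0.8, P(E|not h) = 0.2.
p_h = 0.3
p_e_given_h = 0.8
p_e_given_not_h = 0.2

# Marginal probability of the evidence (law of total probability).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes rule: the prior 0.3 is revised upward by the evidence.
posterior = p_e_given_h * p_h / p_e
print(round(posterior, 4))   # 0.6316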
Maximum A Posteriori (MAP) Hypothesis, hMAP
Given a set of candidate hypotheses, the hypothesis which has the maximum posterior value is considered the maximum probable hypothesis or most probable hypothesis.
This most probable hypothesis is called the Maximum A Posteriori Hypothesis, hMAP. Bayes theorem Eq. (8.1) can be used to find hMAP:
hMAP = argmax_h P(h | E) = argmax_h P(E | h) P(h)    ... (8.2)
Maximum Likelihood (ML) Hypothesis, hML
Given a set of candidate hypotheses, if every hypothesis is equally probable, only P(E | h) is used to find the most probable hypothesis.
The hypothesis that gives the maximum likelihood P(E | h) is called the Maximum Likelihood (ML) Hypothesis, hML:
hML = argmax_h P(E | h)
Correctness of Bayes theorem
One related concept of Bayes theorem is the principle of Minimum
Description Length (MDL).
The minimum description length (MDL) principle is yet another
powerful method like Occam's razor principle to perform inductive
inference.
It states that the best and most probable hypothesis is chosen for a set of observed data as the one with the minimum description. Recall from Eq. (8.2) the Maximum A Posteriori (MAP) Hypothesis, hMAP, which says that, given a set of candidate hypotheses, the hypothesis which has the maximum posterior value is considered the maximum probable hypothesis or most probable hypothesis.
Naive Bayes algorithm uses the Bayes theorem and applies this MDL
principle to find the best hypothesis for a given problem.
8.3.1 Naïve Bayes Algorithm
It is a supervised binary-class or multi-class classification algorithm that works on the principle of Bayes theorem.
There is a family of Naive Bayes classifiers based on a common principle.
These algorithms classify datasets whose features are assumed to be independent, with each feature given equal weightage. The approach works particularly well for large datasets and is very fast.
It is one of the most effective and simple classification algorithms.
This algorithm considers all features to be independent of each other, even though they may individually be dependent on the classified object.
Each feature contributes a probability value independently during classification, and hence this algorithm is called 'naive'.
Some important applications of these algorithms are text classification, recommendation systems, and face recognition.
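A compact sketch of the training and classification steps (illustrative Python, not the textbook's algorithm; it omits Laplace smoothing, and the CGPA/Interactiveness rows are made-up stand-ins for the table data):

from collections import Counter, defaultdict

def train_nb(rows, labels):
    priors = Counter(labels)                  # class counts
    counts = defaultdict(Counter)             # (feature, class) -> value counts
    for row, c in zip(rows, labels):
        for f, v in row.items():
            counts[(f, c)][v] += 1
    return priors, counts, len(labels)

def predict_nb(model, instance):
    priors, counts, n = model
    scores = {}
    for c, c_count in priors.items():
        score = c_count / n                        # prior P(class)
        for f, v in instance.items():
            score *= counts[(f, c)][v] / c_count   # likelihood P(value | class)
        scores[c] = score                          # posterior, up to a constant factor
    return max(scores, key=scores.get)

# Made-up training rows in the spirit of the CGPA / Job Offer example:
rows = [{'CGPA': '>= 9', 'Interactiveness': 'Yes'},
        {'CGPA': '< 8',  'Interactiveness': 'No'},
        {'CGPA': '>= 9', 'Interactiveness': 'Yes'}]
labels = ['Yes', 'No', 'Yes']
model = train_nb(rows, labels)
print(predict_nb(model, {'CGPA': '>= 9', 'Interactiveness': 'Yes'}))   # Yes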
As explained earlier, the likelihood probability is stated as the sampling density for the evidence given the hypothesis.
It is denoted as P(Evidence | Hypothesis), which says how likely the occurrence of the evidence is, given the parameters.
It is calculated as the number of instances of each attribute value for a given class value, divided by the number of instances with that class value.
For example, P(CGPA ≥ 9 | Job Offer = Yes) denotes the number of instances with 'CGPA ≥ 9' and 'Job Offer = Yes' divided by the total number of instances with 'Job Offer = Yes'.
From Table 8.3, the frequency matrix of CGPA, the number of instances with 'CGPA ≥ 9' and 'Job Offer = Yes' is 3.
The total number of instances with 'Job Offer = Yes' is 7.
Hence, P(CGPA ≥ 9 | Job Offer = Yes) = 3/7.
Similarly, the likelihood probability is calculated for all attribute values of the feature CGPA.
8.3.2 Brute Force Bayes Algorithm
Applying Bayes theorem, the Brute Force Bayes algorithm relies on the idea of concept learning: given a hypothesis space H for a training dataset T, the algorithm computes the posterior probabilities for all the hypotheses hi ∈ H.
Then, the Maximum A Posteriori (MAP) Hypothesis, hMAP, is used to output the hypothesis with the maximum posterior probability. The algorithm is quite expensive since it requires computations for all the hypotheses.
Although computing posterior probabilities this way is inefficient, the idea is applied in various other algorithms, which is also quite interesting.
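In code, the brute-force idea reduces to scoring every hypothesis and returning the argmax; the sketch below is illustrative, with made-up priors and likelihoods, and drops the common marginal P(T) since it does not affect the argmax.

def h_map(hypotheses, prior, likelihood, data):
    # P(h | T) is proportional to P(T | h) * P(h); the marginal P(T) is a
    # common factor across hypotheses, so the argmax ignores it.
    return max(hypotheses, key=lambda h: likelihood(data, h) * prior(h))

# Made-up hypothesis space with assumed priors and likelihoods:
hypotheses = ['h1', 'h2', 'h3']
prior = {'h1': 0.5, 'h2': 0.3, 'h3': 0.2}.get
likelihood = lambda data, h: {'h1': 0.1, 'h2': 0.6, 'h3': 0.4}[h]
print(h_map(hypotheses, prior, likelihood, data=None))   # h2 (0.3 * 0.6 = 0.18)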
8.3.3 Bayes Optimal Classifier
The Bayes optimal classifier is a probabilistic model which, in fact, uses Bayes theorem to find the most probable classification for a new instance, given the training data, by combining the predictions of all the hypotheses weighted by their posterior probabilities.
This is different from the Maximum A Posteriori (MAP) Hypothesis, hMAP, which chooses only the single most probable hypothesis.
Here, a new instance can be classified to a possible classification value Ci by the following:
C = argmax_{Ci} Σ_{hi ∈ H} P(Ci | hi) P(hi | T)
hMAP chooses h1, which has the maximum posterior probability value 0.3, as the solution, and gives the result that the patient is COVID negative.
But the Bayes optimal classifier combines the predictions of h2, h3 and h4, whose posteriors sum to 0.4, and gives the result that the patient is COVID positive.
Therefore, max_{Ci ∈ {COVID Positive, COVID Negative}} Σ_{hi ∈ H} P(Ci | hi) P(hi | T) = COVID Positive.
Thus, the algorithm diagnoses the new instance to be COVID positive.
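The combination step can be sketched as follows. The individual posteriors for h2, h3 and h4 are assumptions chosen only to sum to 0.4 as in the example; h1's value of 0.3 and the predicted labels follow the text.

# Assumed posteriors: h1 is 0.3 per the example; h2, h3, h4 are made up to sum to 0.4.
posteriors = {'h1': 0.3, 'h2': 0.2, 'h3': 0.1, 'h4': 0.1}
predictions = {'h1': 'COVID Negative', 'h2': 'COVID Positive',
               'h3': 'COVID Positive', 'h4': 'COVID Positive'}

# Sum P(Ci | hi) * P(hi | T) per class; here each hi predicts one class outright.
scores = {}
for h, p in posteriors.items():
    scores[predictions[h]] = scores.get(predictions[h], 0.0) + p
print(max(scores, key=scores.get))   # COVID Positive (0.4 beats 0.3)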
8.3.4 Gibbs Algorithm
The main drawback of the Bayes optimal classifier is that it computes the posterior probability for all hypotheses in the hypothesis space and then combines the predictions to classify a new instance.
The Gibbs algorithm is a sampling technique which randomly selects a hypothesis from the hypothesis space according to the posterior probability distribution and uses it to classify a new instance.
It can be shown that the expected prediction error of the Gibbs algorithm is at most twice that of the Bayes optimal classifier.
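A minimal sketch of the Gibbs step (reusing the assumed posteriors from the sketch above): draw one hypothesis in proportion to its posterior and classify with it alone.

import random

# Reusing the assumed posteriors from the Bayes optimal classifier sketch.
posteriors = {'h1': 0.3, 'h2': 0.2, 'h3': 0.1, 'h4': 0.1}
hs, weights = zip(*posteriors.items())

# Draw a single hypothesis in proportion to its posterior and classify with it.
h = random.choices(hs, weights=weights, k=1)[0]
print(h)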
8.4 Naive Bayes Algorithm For Continuous Attributes
There are two ways to predict with Naive Bayes algorithm for
continuous attributes:
1. Discretize continuous feature to discrete feature.
2. Apply Normal or Gaussian distribution for continuous feature.
Gaussian Naive Bayes Algorithm
In Gaussian Naive Bayes, the values of continuous features are assumed to
be sampled from a Gaussian distribution.
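Concretely, the class-conditional likelihood of a continuous value is evaluated with the Gaussian probability density function; in the sketch below the mean and variance for the 'Yes' class are made-up numbers, not those of Table 8.13.

import math

def gaussian_pdf(x, mean, var):
    # Class-conditional likelihood of a continuous value under a Gaussian.
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Assumed class statistics for CGPA given Job Offer = Yes (made-up numbers):
mean_yes, var_yes = 8.9, 0.25
print(round(gaussian_pdf(8.5, mean_yes, var_yes), 4))   # likelihood of CGPA = 8.5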
Example 8.4:
Assess a student's performance using the Naïve Bayes algorithm for a continuous attribute. Predict whether a student gets a job offer or not in the final year of the course. The training dataset T consists of 10 data instances with attributes such as 'CGPA' and 'Interactiveness', as shown in Table 8.13. The target variable is Job Offer, which is classified as Yes or No for a candidate student.
Solution:
Step 1: Compute the prior probability for the target feature Job offer
8.5 OTHER POPULAR TYPES OF NAIVE BAYES CLASSIFIERS
Some of the popular variants of Bayesian classifier are listed below:
Bernoulli Naive Bayes Classifier: Bernoulli Naive Bayes works with discrete features. In this algorithm, the features used for making predictions are Boolean variables that take only two values, 'yes' or 'no'. This is particularly useful for text classification where all features are binary, each feature indicating whether a word occurs in the document or not (see the sketch after this list).
Multinomial Naive Bayes Classifier: This algorithm is a generalization of the
Bernoulli Naive Bayes model that works for categorical data or particularly
integer features. This classifier is useful for text classification where each
feature will have an integer value that represents the frequency of occurrence of
words.
Multi-class Naive Bayes Classifier: This algorithm is useful for classification problems with more than two classes, where the target feature contains multiple classes and a test instance has to be predicted with the class it belongs to.
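A hedged scikit-learn illustration of the first two variants (the count matrix and labels below are made up): Multinomial Naive Bayes consumes word frequencies, while Bernoulli Naive Bayes consumes word presence/absence.

import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Made-up word-count features; binarizing them yields presence/absence features.
X_counts = np.array([[3, 0, 1], [0, 2, 4], [2, 1, 0], [0, 3, 3]])
X_binary = (X_counts > 0).astype(int)
y = np.array([0, 1, 0, 1])

print(MultinomialNB().fit(X_counts, y).predict([[2, 0, 1]]))   # frequency-based
print(BernoulliNB().fit(X_binary, y).predict([[1, 0, 1]]))     # presence-based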