21CS54 (AI/ML) Prof. Thameeza

MODULE 4

DECISION TREE LEARNING & BAYESIAN LEARNING


Introduction

• Why is it called a decision tree?

• Because it starts from a root node and branches out into a number of possible solutions, like an inverted tree.
• The benefits of having a decision tree are as follows:
• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a decision tree are simple and fast.
• Example: a toll-free number.

Structure of a Decision Tree


• A decision tree is a structure that includes a root node, branches, and leaf nodes.
Each internal node denotes a test on an attribute, each branch denotes the
outcome of a test, and each leaf node holds a class label.
• The topmost node in the tree is the root node. Decision trees apply to both classification and regression models.


• Figure 6.1 shows the symbols that are used to represent different nodes in the construction of decision trees.
• Decision networks are also called influence diagrams.

• The decision tree consists of 2 major procedures:

1) Building a tree and

2) Knowledge inference or classification.

Building the Tree

Knowledge Inference or Classification

Advantages of Decision Trees


Disadvantages of Decision Trees

Fundamentals of Entropy

• Given a training dataset with a set of attributes, the decision tree is constructed
by finding the attribute that best describes the target class for the given test
instances.
• The best split attribute is the one which, among all features, carries the most information about how to split the dataset so that the target class is identified accurately for the test instances.
• The splitting should be as pure as possible at every stage of selecting the best feature; entropy and information gain quantify this purity, as sketched below.
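A minimal Python sketch of these two quantities, assuming the training data for one candidate attribute is given as a list of (attribute_value, class_label) rows; the names entropy and information_gain are illustrative, not from the textbook.

import math
from collections import Counter

def entropy(labels):
    """Entropy_Info of a list of class labels: -sum p_i * log2(p_i)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows):
    """Gain of one attribute; rows is a list of (attribute_value, class_label)."""
    labels = [label for _, label in rows]
    base = entropy(labels)                       # entropy of the whole set
    by_value = {}
    for value, label in rows:
        by_value.setdefault(value, []).append(label)
    weighted = sum(len(subset) / len(rows) * entropy(subset)
                   for subset in by_value.values())   # Entropy_Info of the attribute
    return base - weighted

# Example with the class distribution used later in this module (7 Yes, 3 No):
print(round(entropy(['Yes'] * 7 + ['No'] * 3), 4))    # 0.8813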


Algorithm 6.1: General Algorithm for Decision Trees

Decision tree induction algorithms

ID3 Tree Construction (ID3 stands for Iterative Dichotomiser 3)


• A decision tree is one of the most powerful tools among supervised learning algorithms, used for both classification and regression tasks.
• It builds a flowchart-like tree structure where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.


• It is constructed by recursively splitting the training data into subsets based on
the values of the attributes until a stopping criterion is met, such as the
maximum depth of the tree or the minimum number of samples required to split
a node.
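A minimal recursive sketch of this construction, assuming each training row is a dict such as {'CGPA': '>=9', 'Interactiveness': 'Yes', 'Job Offer': 'Yes'}; it reuses the information_gain helper from the earlier sketch, picks the attribute with maximum gain as in ID3, and stops on pure nodes, exhausted attributes, or a depth limit.

from collections import Counter
# information_gain is the helper defined in the earlier entropy sketch

def best_attribute(data, attributes, target):
    """Pick the attribute with maximum information gain."""
    return max(attributes,
               key=lambda a: information_gain([(row[a], row[target]) for row in data]))

def build_tree(data, attributes, target, depth=0, max_depth=5):
    labels = [row[target] for row in data]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure, no attributes remain, or the depth limit is reached
    if len(set(labels)) == 1 or not attributes or depth >= max_depth:
        return majority
    split = best_attribute(data, attributes, target)
    tree = {split: {}}
    for value in set(row[split] for row in data):
        subset = [row for row in data if row[split] == value]
        remaining = [a for a in attributes if a != split]
        tree[split][value] = build_tree(subset, remaining, target, depth + 1, max_depth)
    return tree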


Step 1: Calculate the Entropy for the target class ‘Job Offer’.

Entropy_Info(7, 3) = -[(7/10) log2(7/10) + (3/10) log2(3/10)] = 0.8813

Iteration 1:
Step 2: Calculate Entropy_Info and Gain (information gain) for each attribute in the training dataset.


Step 3: From Table 6.8, choose the attribute for which Entropy_Info is minimum, and therefore the gain is maximum, as the best split attribute.

Iteration 2:


Example: Construct a decision tree for the training dataset below using ID3
Algorithm

C4.5 Construction
• C4.5 is a widely used algorithm for constructing decision trees from a dataset.
• Disadvantages of ID3 are: attributes must be nominal, the dataset must not include missing data, and the algorithm tends to overfit.
• To overcome these disadvantages, Ross Quinlan, the inventor of ID3, improved on these bottlenecks and created a new algorithm named C4.5.
• The new algorithm can build more generalized models, including handling continuous data and missing data. It also still works with discrete data and supports post-pruning. Instead of raw information gain, C4.5 splits on the gain ratio (gain divided by split information), as sketched below.
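A minimal sketch of the C4.5 splitting criterion, assuming the same (attribute_value, class_label) row layout and the information_gain helper from the earlier sketch; split_info and gain_ratio follow the standard definitions.

import math

def split_info(rows):
    """Split_Info(A) = -sum |S_v|/|S| * log2(|S_v|/|S|) over the attribute's values."""
    total = len(rows)
    counts = {}
    for value, _ in rows:
        counts[value] = counts.get(value, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(rows):
    """Gain_Ratio(A) = Gain(A) / Split_Info(A); guard against a zero split info."""
    si = split_info(rows)
    return information_gain(rows) / si if si > 0 else 0.0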


Example: Construct a decision tree for Table 6.3 using C4.5

Iteration 1:
Step 1: Calculate the Entropy for the target class ‘Job Offer’.

Entropy_Info(7, 3) = -[(7/10) log2(7/10) + (3/10) log2(3/10)] = 0.8813

Step 2: Calculate Entropy_Info, Gain (information gain), Split_Info, and Gain Ratio for each attribute in the training dataset.


Step 3: Choose the attribute for which gain ratio is maximum as the best split
attribute


The final decision tree is shown in the figure below.

Example 2:


Dealing with Continuous Attributes in C4.5


• Similarly, the calculations are done for each distinct value of the attribute CGPA and a table is created. The value of CGPA with maximum gain is then chosen as the threshold value or the best split point. From Table 6.13, we can observe that CGPA = 7.9 has the maximum gain of 0.4462.
• Hence, CGPA ≤ 7.9 is chosen as the split point. We can now discretize the continuous values of CGPA into two categories, CGPA ≤ 7.9 and CGPA > 7.9. The resulting discretized instances are shown in Table 6.14.
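A minimal sketch of this threshold search, assuming rows of (cgpa_value, class_label) and the information_gain helper from the earlier sketch; every distinct value is tried as a binary split point and the one with maximum gain is returned, mirroring the CGPA 7.9 example above.

def best_threshold(rows):
    """Try each distinct continuous value as a binary split point; return the best."""
    best_point, best_gain = None, -1.0
    for candidate in sorted(set(value for value, _ in rows)):
        # Discretize into '<=' versus '>' the candidate, then measure the resulting gain
        binary_rows = [('<=' if value <= candidate else '>', label)
                       for value, label in rows]
        gain = information_gain(binary_rows)
        if gain > best_gain:
            best_point, best_gain = candidate, gain
    return best_point, best_gain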


Classification and Regression Trees (CART) Construction


• Classification and Regression Trees (CART) is a widely used algorithm for constructing decision trees that can be applied to both classification and regression tasks. CART is similar to C4.5 but has some differences in its construction and splitting criteria.
• It constructs the tree as a binary tree by recursively splitting a node into two child nodes, even if an attribute has more than two possible values. The Gini index is calculated for all binary subsets of each attribute's values, and the subset that gives the largest reduction in impurity (equivalently, the smallest weighted Gini index) is selected as the best split subset, as sketched below.
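A minimal sketch of the Gini computation for one candidate binary split, assuming rows of (attribute_value, class_label); the chosen subset of attribute values goes to the left child, the remaining values go to the right, and the split with the largest impurity reduction (smallest weighted Gini) is preferred.

from collections import Counter

def gini(labels):
    """Gini index of a node: 1 - sum p_i^2."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_of_split(rows, left_values):
    """Weighted Gini of the binary split induced by a subset of attribute values."""
    left = [label for value, label in rows if value in left_values]
    right = [label for value, label in rows if value not in left_values]
    n = len(rows)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

def gini_gain(rows, left_values):
    """Reduction in impurity achieved by the split (higher is better)."""
    return gini([label for _, label in rows]) - gini_of_split(rows, left_values)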


Example: Construct a decision tree for Table 6.3 using CART


Regression Trees


Homework Problems


VALIDATING AND PRUNING OF DECISION TREES

• Validating and pruning decision trees is a crucial part of building accurate and
robust machine learning models.
• Decision trees are prone to overfitting, which means they can learn to capture noise and details in the training data that do not generalize well to new, unseen data.
• Validation and pruning are techniques used to mitigate this issue and improve
the performance of decision tree models.
• The pre-pruning technique for decision trees is tuning the hyperparameters prior to the training pipeline. It involves the heuristic known as 'early stopping', which stops the growth of the decision tree, preventing it from reaching its full depth.
• It stops the tree-building process to avoid producing leaves with small samples. During each stage of splitting the tree, the cross-validation error is monitored.
• If the value of the error does not decrease any more, we stop the growth of the decision tree.
• The hyperparameters that can be tuned for early stopping and preventing overfitting are max_depth, min_samples_leaf, and min_samples_split (see the sketch after this list).
• These same parameters can also be tuned to obtain a robust model.
• Post-pruning does the opposite of pre-pruning and allows the decision tree model to grow to its full depth. Once the model grows to its full depth, tree branches are removed to prevent the model from overfitting.
• The algorithm will continue to partition data into smaller subsets until the final subsets produced are similar in terms of the outcome variable.
• The final subset of the tree will consist of only a few data points, allowing the tree to have learned the data to a T. However, when a new data point that differs from the learned data is introduced, it may not be predicted well.
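A short illustrative example of both ideas using scikit-learn's DecisionTreeClassifier; max_depth, min_samples_leaf, and min_samples_split implement pre-pruning, while cost-complexity pruning (ccp_alpha) implements post-pruning. The dataset and parameter values are placeholders, not tuned recommendations.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: early stopping via hyperparameters set before training
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                    min_samples_split=10, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: grow the full tree, then prune using cost-complexity pruning
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
ccp_alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # one candidate alpha
post_pruned = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=0)
post_pruned.fit(X_train, y_train)

print(pre_pruned.score(X_test, y_test), post_pruned.score(X_test, y_test))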


Chapter 8
Bayesian Learning
8.1 Introduction to probability-based learning
8.2 Fundamentals of Bayes theorem
8.3 Classification using Bayes model
8.3.1 Naïve Bayes Algorithm
8.3.2 Brute Force Bayes Algorithm
8.3.3 Bayes optimal classifier
8.3.4 Gibbs Algorithm
8.4 Naïve Bayes Algorithm for continuous attributes
8.5 Other popular types of Naïve Bayes classifiers

Bayesian learning

• Bayesian Learning is a learning method that describes and represents knowledge


in an uncertain domain and provides a way to reason about this knowledge using a probability measure.
• It uses Bayes theorem to infer the unknown parameters of a model. Bayesian
inference is useful in many applications which involve reasoning and diagnosis
such as game theory, medicine, etc.
• Bayesian inference is much more powerful in handling missing data and for
estimating any uncertainty in predictions.

8.1 Introduction to probability-based learning


• Probability-based learning is one of the most important practical learning
methods which combines prior knowledge or prior probabilities with observed
data.
• Probabilistic learning uses the concept of probability theory that describes how
to model randomness, uncertainty, and noise to predict future events.
• It is a tool for modelling large datasets and uses Bayes rule to infer unknown quantities, predict, and learn from data. In a probabilistic model, randomness plays a major role and the solution is given as a probability distribution, while in a deterministic model there is no randomness: given the same initial conditions, it produces the same result every time the model is run, with a single possible outcome as the solution.
• Bayesian learning differs from probabilistic learning as it uses subjective probabilities (i.e., probabilities based on an individual's belief or interpretation of the outcome of an event, which can change over time) to infer the parameters of a model.
• Two practical learning algorithms called Naive Bayes learning and Bayesian
Belief Network (BBN) form the major part of Bayesian learning.
• These algorithms use prior probabilities and apply Bayes rule to infer useful
information

8.2 Fundamentals of Bayes theorem


• Naive Bayes Model relies on Bayes theorem


• It works on the principle of three kinds of probabilities called prior probability,
likelihood probability, and posterior probability.

Prior Probability
• It is the general probability of an uncertain event before an observation is seen or
some evidence is collected.
• It is the initial probability that is believed before any new information is
collected.

Likelihood Probability
• Likelihood probability is the relative probability of the observation occurring for
each class or the sampling density for the evidence given the hypothesis.
• It is stated as P(Evidence | Hypothesis), which denotes the likeliness of the occurrence of the evidence given the parameters.
Posterior Probability

• It is the updated or revised probability of an event taking into account the


observations from the training data.
• P(Hypothesis | Evidence) is the posterior distribution representing the belief about the hypothesis, given the evidence from the training data. Informally, posterior probability = prior probability updated by new evidence; formally, the posterior is proportional to the likelihood times the prior.
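A small worked illustration with hypothetical numbers (not from the textbook dataset), showing how the three probabilities fit together: suppose a disease affects 1% of a population, so the prior is P(D) = 0.01; a test detects it with likelihood P(+ | D) = 0.9 and raises false alarms with P(+ | not D) = 0.05. After a positive test, the posterior is
P(D | +) = P(+ | D) P(D) / [P(+ | D) P(D) + P(+ | not D) P(not D)] = (0.9 × 0.01) / (0.9 × 0.01 + 0.05 × 0.99) ≈ 0.15,
so the initial belief of 1% is revised to about 15% once the evidence is observed.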

8.3 Classification using Bayes model


• Naive Bayes classification models work on the principle of Bayes theorem.
• Bayes' rule is a mathematical formula used to determine the posterior probability, given the prior probabilities of events. Generally, Bayes theorem is used to select the most probable hypothesis from data, considering both prior knowledge and posterior distributions.
• It is based on the calculation of the posterior probability and is stated as:
P(Hypothesis h | Evidence E) = P(Evidence E | Hypothesis h) P(Hypothesis h) / P(Evidence E)      (Eq. 8.1)
• Here, Hypothesis h is the target class to be classified and Evidence E is the given test instance.

• P(Hypothesis h) is the prior probability of the hypothesis h without observing the training data or considering any evidence.
• It denotes the prior belief or the initial probability that the hypothesis h is correct.
• P(Evidence E) is the prior probability of the evidence E from the training dataset without any knowledge of which hypothesis holds. It is also called the marginal probability.


Maximum A Posteriori (MAP) Hypothesis, hMAP


• Given a set of candidate hypotheses, the hypothesis which has the maximum posterior probability value is considered the maximum probable hypothesis or most probable hypothesis.
• This most probable hypothesis is called the Maximum A Posteriori Hypothesis, hMAP. Bayes theorem Eq. (8.1) can be used to find hMAP.

Maximum Likelihood (ML) Hypothesis, hML


• Given a set of candidate hypotheses, if every hypothesis is equally probable, only P(E | h) is used to find the most probable hypothesis.
• The hypothesis that gives the maximum likelihood P(E | h) is called the Maximum Likelihood (ML) Hypothesis, hML.
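In symbols (standard forms, with the denominator P(E) dropped because it is the same for every hypothesis):
hMAP = argmax over h ∈ H of P(h | E) = argmax over h ∈ H of P(E | h) P(h)
hML = argmax over h ∈ H of P(E | h), which coincides with hMAP when all the priors P(h) are equal.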

Correctness of Bayes theorem


• One related concept of Bayes theorem is the principle of Minimum Description


Length (MDL).
• The minimum description length (MDL) principle is yet another powerful
method like Occam's razor principle to perform inductive inference.
• It states that, for a set of observed data, the best and most probable hypothesis is the one with the minimum description. Recall from Eq. (8.2) the Maximum A Posteriori (MAP) Hypothesis, hMAP, which says that given a set of candidate hypotheses, the hypothesis which has the maximum posterior probability value is considered the maximum probable hypothesis or most probable hypothesis.
• Naive Bayes algorithm uses the Bayes theorem and applies this MDL principle
to find the best hypothesis for a given problem.

8.3.1 Naïve Bayes Algorithm


• It is a supervised binary class or multi class classification algorithm that works
on the principle of Bayes theorem.
• There is a family of Naive Bayes classifiers based on a common principle. These algorithms assume the features of the dataset are independent of each other and give each feature equal weightage. The family works particularly well for large datasets and is very fast.
• It is one of the most effective and simple classification algorithms.
• This algorithm considers all features to be independent of each other even though, in reality, they may individually depend on the object being classified.
• Each feature contributes a probability value independently during classification, and hence this algorithm is called naive.
• Some important applications of these algorithms are text classification,
recommendation system and face recognition.


• As explained earlier, the likelihood probability is stated as the sampling density for the evidence given the hypothesis.
• It is denoted as P(Evidence | Hypothesis), which says how likely the occurrence of the evidence is given the parameters.
• It is calculated as the number of instances with each attribute value for a given class value, divided by the number of instances with that class value.
• For example, P(CGPA ≥ 9 | Job Offer = Yes) denotes the number of instances with 'CGPA ≥ 9' and 'Job Offer = Yes' divided by the total number of instances with 'Job Offer = Yes'.
• From Table 8.3, the Frequency Matrix of CGPA, the number of instances with 'CGPA ≥ 9' and 'Job Offer = Yes' is 3.
• The total number of instances with 'Job Offer = Yes' is 7.
• Hence, P(CGPA ≥ 9 | Job Offer = Yes) = 3/7.
• Similarly, the likelihood probability is calculated for all attribute values of the feature CGPA.
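A minimal sketch of this frequency-matrix computation, assuming the training data is a list of dicts with a 'Job Offer' target column; priors and likelihoods are read off by counting, exactly as in the 3/7 example above, and the class with the largest product of prior and likelihoods is predicted (Laplace smoothing is omitted so the counts match the worked example).

from collections import Counter, defaultdict

def train_naive_bayes(data, target):
    priors = Counter(row[target] for row in data)          # class counts
    counts = defaultdict(Counter)                           # (feature, class) -> value counts
    for row in data:
        for feature, value in row.items():
            if feature != target:
                counts[(feature, row[target])][value] += 1
    return priors, counts

def predict(instance, priors, counts, total):
    best_class, best_score = None, -1.0
    for cls, cls_count in priors.items():
        score = cls_count / total                           # prior P(class)
        for feature, value in instance.items():
            score *= counts[(feature, cls)][value] / cls_count   # likelihood P(value | class)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

With the textbook table loaded as rows, counts[('CGPA', 'Yes')]['>=9'] / priors['Yes'] would reproduce P(CGPA ≥ 9 | Job Offer = Yes) = 3/7.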


8.3.2 Brute Force Bayes Algorithm


• Applying Bayes theorem, the Brute Force Bayes algorithm relies on the idea of concept learning: given a hypothesis space H for the training dataset T, the algorithm computes the posterior probabilities for every hypothesis h_i ∈ H.
• Then, the Maximum A Posteriori (MAP) Hypothesis, hMAP, is used to output the hypothesis with the maximum posterior probability. The algorithm is quite expensive since it requires computations for all the hypotheses.
• Although computing posterior probabilities for every hypothesis is inefficient, this idea is applied in various other algorithms, which is also quite interesting.


8.3.3 Bayes Optimal Classifier


• The Bayes optimal classifier is a probabilistic model which, in fact, uses Bayes theorem to find the most probable classification for a new instance given the training data by combining the predictions of all posterior hypotheses.
• This is different from the Maximum A Posteriori (MAP) Hypothesis, hMAP, which chooses the single most probable hypothesis.
• Here, a new instance can be classified to a possible classification value C_j by the following:

• hMAP chooses h1, which has the maximum individual posterior probability value of 0.3, as the solution and gives the result that the patient is COVID negative.
• But the Bayes Optimal Classifier combines the predictions of h2, h3 and h4, which together amount to 0.4, and gives the result that the patient is COVID positive.

• Therefore, max over C_j ∈ {COVID Positive, COVID Negative} of Σ_{h_i ∈ H} P(C_j | h_i) P(h_i | T) = COVID Positive.
• Thus the algorithm diagnoses the new instance to be COVID Positive.

8.3.4 Gibbs Algorithm


• The main drawback of the Bayes optimal classifier is that it computes the posterior probability for all hypotheses in the hypothesis space and then combines the predictions to classify a new instance.


• The Gibbs algorithm is a sampling technique which randomly selects a hypothesis from the hypothesis space according to the posterior probability distribution and classifies a new instance with it.
• It is found that the expected prediction error of the Gibbs algorithm is at most twice that of the Bayes optimal classifier.

8.4 Naive Bayes Algorithm For Continuous Attributes


• There are two ways to predict with Naive Bayes algorithm for continuous
attributes:
1. Discretize continuous feature to discrete feature.
2. Apply Normal or Gaussian distribution for continuous feature.

Gaussian Naive Bayes Algorithm


• In Gaussian Naive Bayes, the values of continuous features are assumed to be
sampled from a Gaussian distribution.
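A minimal sketch of the Gaussian likelihood used in this variant, assuming the continuous attribute values observed for one class are given as a plain list; the class mean and variance are estimated and plugged into the normal density. The CGPA values in the usage line are hypothetical, not the textbook's table.

import math

def gaussian_likelihood(x, values):
    """P(x | class) under a normal distribution fitted to the class's observed values."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)   # sample variance
    return (1.0 / math.sqrt(2 * math.pi * variance)) * math.exp(-(x - mean) ** 2 / (2 * variance))

# Hypothetical CGPA values for the 'Job Offer = Yes' class:
print(round(gaussian_likelihood(8.5, [9.1, 8.0, 8.5, 9.5, 7.9, 8.9, 9.0]), 4))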

Example 8.4:
Assess a student's performance using Naïve Bayes algorithm for the continuous
attribute. Predict whether a student gets a job offer or not in his final year of the
course. The training dataset T consists of 10 data instances with attributes such as 'CGPA' and 'Interactiveness' as shown in Table 8.13. The target variable is Job Offer, which is classified as Yes or No for a candidate student.

Solution:
Step 1: Compute the prior probability for the target feature Job offer


8.5 OTHER POPULAR TYPES OF NAIVE BAYES CLASSIFIERS


Some of the popular variants of the Bayesian classifier are listed below (a short scikit-learn sketch follows the list):

• Bernoulli Naive Bayes Classifier: Bernoulli Naive Bayes works with discrete features. In this algorithm, the features used for making predictions are Boolean variables that take only two values, either 'yes' or 'no'. This is particularly useful for text classification where all features are binary, each feature indicating whether a word occurs or not.
• Multinomial Naive Bayes Classifier: This algorithm is a generalization of the
Bernoulli Naive Bayes model that works for categorical data or particularly
integer features. This classifier is useful for text classification where each feature
will have an integer value that represents the frequency of occurrence of words.
• Multi-class Naive Bayes Classifier: This algorithm is useful for classification problems with more than two classes, where the target feature contains multiple classes and a test instance has to be assigned the class it belongs to.
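A short illustrative example of the first two variants using scikit-learn's BernoulliNB and MultinomialNB on tiny made-up word-count data (the documents, vocabulary, and labels are placeholders, not from the textbook).

import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Rows = documents, columns = words; counts for Multinomial, presence/absence for Bernoulli
X_counts = np.array([[3, 0, 1], [0, 2, 0], [2, 1, 0], [0, 0, 4]])
y = np.array(['spam', 'ham', 'spam', 'ham'])

multinomial = MultinomialNB().fit(X_counts, y)                  # uses word frequencies
bernoulli = BernoulliNB().fit((X_counts > 0).astype(int), y)    # uses word presence only

test = np.array([[1, 0, 2]])
print(multinomial.predict(test), bernoulli.predict((test > 0).astype(int)))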
