UNIT – III : Syllabus
Models Based on Decision Trees:
1. Decision Trees for Classification
2. Impurity Measures
3. Properties
4. Regression Based on Decision Trees ( Decision Trees for Regression )
5. Bias–Variance Trade-off
6. Random Forests for Classification and Regression
The Bayes Classifier:
1. Introduction to the Bayes Classifier
2. Bayes’ Rule and Inference
3. The Bayes Classifier and its Optimality
4. Multi-Class Classification
5. Class Conditional Independence
6. Naive Bayes Classifier (NBC)
Decision Tree for Classification
o Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, but it is mostly preferred for solving
classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision
rules, and each leaf node represents the outcome.
o In a decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make decisions and have multiple branches,
whereas leaf nodes are the outputs of those decisions and do not contain any further
branches.
o The decisions or the test are performed on the basis of features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
o A decision tree simply asks a question, and based on the answer (Yes/No), it further
splits the tree into subtrees.
o The diagram below explains the general structure of a decision tree:
Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.
Why use Decision Trees?
There are various algorithms in machine learning, so choosing the best algorithm for the
given dataset and problem is a key consideration when creating a machine learning
model. Below are two reasons for using a decision tree:
o Decision trees usually mimic the human thinking process while making a decision, so
they are easy to understand.
o The logic behind the decision tree can be easily understood because it shows a tree-
like structure.
Decision Tree Terminologies
Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be split further
once a leaf node is reached.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
Branch/Sub Tree: A subtree formed by splitting the main tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: A node that is divided into sub-nodes is called the parent node, and the
sub-nodes are called child nodes.
How does the Decision Tree algorithm Work?
In a decision tree, to predict the class of a given record, the algorithm starts from the
root node of the tree. It compares the value of the root attribute with the corresponding
attribute of the record (from the real dataset) and, based on the comparison, follows the
branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with those of the sub-nodes
and moves further. It continues this process until it reaches a leaf node of the tree. The
complete process can be better understood using the algorithm below:
o Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure
(ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created
in Step-3. Continue this process until a stage is reached where the nodes cannot be
classified further; such final nodes are called leaf nodes.
Example: Suppose there is a candidate who has a job offer and wants to decide whether he
should accept the offer or not. To solve this problem, the decision tree starts with the root
node (the Salary attribute, chosen by ASM). The root node splits further into the next decision
node (distance from the office) and one leaf node based on the corresponding labels. The next
decision node further splits into one decision node (cab facility) and one leaf node.
Finally, that decision node splits into two leaf nodes (Accepted offer and Declined offer).
Consider the below diagram:
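To make the above steps concrete, here is a minimal sketch in Python using scikit-learn's DecisionTreeClassifier on a made-up toy version of the job-offer example (the feature values below are illustrative assumptions, not data from these notes):

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy job-offer data (illustrative only): [salary_lakhs, distance_km, cab_facility]
X = [[12, 5, 1], [15, 30, 0], [8, 10, 1], [20, 25, 1], [10, 40, 0], [18, 8, 0]]
y = [1, 0, 0, 1, 0, 1]   # 1 = offer accepted, 0 = offer declined

# The classifier picks the best attribute at each node using an impurity-based ASM (Gini here)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X, y)

# Inspect the learned decision rules, from the root split down to the leaf nodes
print(export_text(clf, feature_names=["salary", "distance", "cab"]))

# Classify a new candidate by walking from the root node to a leaf
print(clf.predict([[16, 12, 1]]))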
Attribute Selection Measures
While implementing a decision tree, the main issue is how to select the best attribute
for the root node and for the sub-nodes. To solve such problems there is a technique
called the Attribute Selection Measure, or ASM. With this measurement, we can easily select
the best attribute for the nodes of the tree. There are two popular ASM techniques, which
are:
o Information Gain
o Gini Index
1. Information Gain:
o Information gain is the measurement of changes in entropy after the segmentation of a
dataset based on an attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the decision
tree.
o A decision tree algorithm always tries to maximize the value of information gain, and
the node/attribute with the highest information gain is split first. Information gain can
be calculated using the formula below:
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies
randomness in data. Entropy can be calculated as:
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
Where,
o S = the set of samples
o P(yes)= probability of yes
o P(no)= probability of no
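As a small illustration (a sketch using the classic 9-yes/5-no textbook split, not a dataset from these notes), entropy and information gain can be computed as follows:

import math

def entropy(p_yes, p_no):
    # Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no); 0*log2(0) is treated as 0
    return -sum(p * math.log2(p) for p in (p_yes, p_no) if p > 0)

parent = entropy(9/14, 5/14)      # ≈ 0.940 for a node with 9 "yes" and 5 "no" samples

# Suppose an attribute splits the 14 samples into subsets of size 8 (6 yes, 2 no) and 6 (3 yes, 3 no)
child_1 = entropy(6/8, 2/8)       # ≈ 0.811
child_2 = entropy(3/6, 3/6)       # = 1.000

# Information Gain = Entropy(S) - weighted average of the child entropies
info_gain = parent - (8/14 * child_1 + 6/14 * child_2)
print(round(info_gain, 3))        # ≈ 0.048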
2. Gini Index:
o Gini index is a measure of impurity or purity used while creating a decision tree in the
CART(Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over an attribute with a high
Gini index.
o The CART algorithm uses the Gini index to create splits, and it creates only binary
splits.
o Gini index can be calculated using the below formula:
Gini Index = 1 - ∑j Pj²
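For comparison, a quick sketch of the Gini index for a binary node (illustrative probabilities only):

def gini_index(class_probs):
    # Gini Index = 1 - sum of squared class probabilities
    return 1 - sum(p ** 2 for p in class_probs)

print(gini_index([0.5, 0.5]))     # 0.5  -> maximally impure binary node
print(gini_index([1.0, 0.0]))     # 0.0  -> pure node
print(gini_index([9/14, 5/14]))   # ≈ 0.459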
Pruning: Getting an Optimal Decision tree
Pruning is a process of deleting the unnecessary nodes from a tree in order to get the optimal
decision tree.
A tree that is too large increases the risk of overfitting, while a small tree may not capture all
the important features of the dataset. A technique that decreases the size of the learning
tree without reducing accuracy is therefore known as Pruning. There are mainly two types of
tree pruning techniques used:
o Cost Complexity Pruning
o Reduced Error Pruning.
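As a rough sketch of cost complexity pruning (assuming scikit-learn; the dataset and parameter values are illustrative, not from these notes):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate alpha values for cost complexity pruning (larger alpha -> smaller tree)
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  test acc={pruned.score(X_test, y_test):.2f}")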
Advantages of the Decision Tree
o It is simple to understand, as it follows the same process that a human follows while
making a decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o It requires less data cleaning compared to other algorithms.
Disadvantages of the Decision Tree
o A decision tree may contain many layers, which makes it complex.
o It may have an overfitting issue, which can be resolved using the Random Forest
algorithm.
o With more class labels, the computational complexity of the decision tree may increase.
Decision Tree - ID3 Algorithm - Solved Numerical Example-1
Impurity Measures
Properties of Impurity Measures
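The formulas for this part are not reproduced in these notes; as a rough sketch (standard definitions, assumed here), the common impurity measures for a node with class probabilities p1, ..., pK, and the properties usually highlighted (each measure is zero for a pure node and maximal at the uniform class distribution), can be illustrated as follows:

import math

def entropy(probs):
    # Entropy: -sum p_k log2 p_k (0 log 0 treated as 0)
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    # Gini impurity: 1 - sum p_k^2
    return 1 - sum(p ** 2 for p in probs)

def misclassification(probs):
    # Misclassification error: 1 - max_k p_k
    return 1 - max(probs)

pure = [1.0, 0.0]         # pure node: every measure is 0
uniform = [0.5, 0.5]      # uniform node: every measure attains its maximum
for measure in (entropy, gini, misclassification):
    print(measure.__name__, measure(pure), measure(uniform))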
Decision Tree for Regression
A Decision Tree for Regression is a type of supervised learning algorithm used when the
target variable is continuous (numeric), rather than categorical. Instead of predicting class
labels, regression trees predict a real-valued output.
How Regression Trees Work
1. Splitting:
o The dataset is split into subsets based on feature values.
o The split is chosen to minimize a measure of error (commonly Mean Squared
Error (MSE) or Mean Absolute Error (MAE)) between predicted and actual
values.
2. Recursive Partitioning:
o The process is repeated recursively, creating a tree with decision rules at each
node.
o Each split divides the input space into smaller, more homogeneous regions.
3. Prediction:
o For a new input, the tree traverses from the root to a leaf node based on feature
conditions.
o The prediction at the leaf node is usually the mean (or median) of the target
values of training samples in that node.
Example
Suppose you want to predict house price based on features like size and location.
The tree might first split on size > 2000 sqft.
Then, within each group, it might split on location = urban or rural.
Each leaf will output the average house price for the training samples in that region.
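As an illustrative sketch (scikit-learn's DecisionTreeRegressor with a made-up toy dataset; the numbers below are assumptions, not data from these notes):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy house-price data: [size_sqft, is_urban] -> price
X = np.array([[1500, 1], [1800, 0], [2100, 1], [2400, 1], [2600, 0], [3000, 1]])
y = np.array([200_000, 180_000, 320_000, 360_000, 300_000, 420_000])

# A shallow tree keeps the partitions (and the decision rules) easy to interpret
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# The prediction is the mean target value of the leaf region the input falls into
print(reg.predict([[2200, 1]]))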
Objective Function
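The objective itself is not written out in these notes; under the usual CART squared-error criterion (an assumption here), the best split of a node is the one that minimizes the within-region squared error of the two children:

% Sketch of the standard squared-error splitting objective for regression trees
\min_{\text{split}} \; \sum_{x_i \in R_1} \left( y_i - \bar{y}_{R_1} \right)^2 \; + \; \sum_{x_i \in R_2} \left( y_i - \bar{y}_{R_2} \right)^2

where R_1 and R_2 are the two child regions produced by the split and \bar{y}_R is the mean of the target values of the training samples in region R.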
Advantages
Easy to interpret and visualize
Can capture nonlinear relationships
No need for feature scaling
Disadvantages
Can overfit (trees grow too deep)
High variance (small data changes can alter structure)
Not as accurate as ensemble methods (e.g., Random Forests, Gradient Boosted Trees)
Bias-Variance Trade-off
1. It is important to understand the prediction errors (bias and variance).
2. The trade-off refers to finding the best balance between the two, for example when
selecting a value such as the regularization constant.
3. A proper understanding of these errors helps to avoid overfitting and
underfitting.
Bias:
1. Bias is the difference between the predicted value and the correct value.
2. High bias gives a large error on training as well as testing data.
3. An algorithm should always have low bias to avoid underfitting.
4. With high bias, the predicted values follow a straight-line pattern.
5. Such fitting is known as underfitting of the data.
6. This happens when the hypothesis is too simple or linear.
[Figure: a linear (high-bias) hypothesis underfitting the data]
Variance:
1. The variability of the model's prediction for a given data point is called the variance of
the model.
2. A model with high variance is very complex and fits the training data too closely.
3. Such a model performs very well on training data but has a high error rate on test
data.
4. When a model has high variance, it is said to be overfitting the data.
5. While training a model, the variance should be low.
[Figure: a complex (high-variance) hypothesis overfitting the training data]
Trade-off:
1. If the algorithm is too simple (a hypothesis with a linear equation), it may have high
bias and low variance, and is thus error-prone.
2. If the algorithm fits too complex a model (a hypothesis with a high-degree equation), it
may have high variance and low bias.
3. The balance between these two conditions is known as the Trade-off, or the
Bias-Variance Trade-off.
4. On a model-complexity graph, the ideal trade-off appears as shown below:
[Figure: model-complexity graph showing the bias-variance trade-off]
This point is referred to as the best point for training the algorithm, as it gives low
error on training as well as testing data.
Both the bias and the variance should be minimal when fitting the data to the algorithm.
In the best-fit region, the algorithm's performance is good.
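A minimal sketch of the trade-off in code (polynomial regression on synthetic data; the values and degree choices are illustrative assumptions): a degree-1 model underfits (high bias), a degree-15 model overfits (high variance), and an intermediate degree lands near the best-fit region.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 60)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)
X_train, y_train, X_test, y_test = X[::2], y[::2], X[1::2], y[1::2]

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    # Low degree: high train and test error (bias); high degree: low train but high test error (variance)
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")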
Naive Bayes Classifier Algorithm
o The Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes'
theorem and is used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional training
dataset.
o Naïve Bayes Classifier is one of the simplest and most effective classification
algorithms; it helps in building fast machine learning models that can make
quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
o Some popular examples of the Naive Bayes algorithm are spam filtering, sentiment
analysis, and classifying articles.
Why is it called Naive Bayes?
The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be
described as:
o Naive: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on
the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized
as an apple. Hence each feature individually contributes to identifying it as an apple,
without depending on the other features.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on the
conditional probability.
o The formula for Bayes' theorem is given as:
P(A|B) = [P(B|A) * P(A)] / P(B)
Where,
P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the Likelihood probability: the probability of the evidence B given that hypothesis A is true.
P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the Marginal probability: the probability of the evidence.
Dataset of weather conditions (used in the example below):
Day   Outlook    Play
1     Overcast   Yes
2     Sunny      Yes
3     Overcast   Yes
4     Overcast   Yes
5     Sunny      No
6     Rainy      Yes
7     Sunny      Yes
8     Overcast   Yes
9     Rainy      No
10    Sunny      No
11    Sunny      Yes
12    Rainy      Yes
13    Overcast   Yes
14    Rainy      No
Working of Naive Bayes' Classifier:
Working of Naïve Bayes' Classifier can be understood with the help of the below example:
Suppose we have a dataset of weather conditions and a corresponding target variable "Play".
Using this dataset, we need to decide whether we should play or not on a particular day
according to the weather conditions. To solve this problem, we need to follow the steps
below:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the weather dataset given above.
Frequency table for the weather conditions:
Weather     Yes    No
Overcast    5      0
Rainy       2      2
Sunny       3      2
Total       10     4
Likelihood table for the weather conditions:
Weather     No             Yes            P(Weather)
Overcast    0              5              5/14 = 0.35
Rainy       2              2              4/14 = 0.29
Sunny       2              3              5/14 = 0.35
All         4/14 = 0.29    10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41
As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
Hence, on a sunny day, the player can play the game.
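The same calculation can be sketched in code (counts taken from the frequency table above; exact fractions give 0.6 and 0.4, while the 0.60/0.41 in the notes come from rounding intermediate values):

# Class-conditional counts from the frequency table above
counts = {"Yes": {"Overcast": 5, "Rainy": 2, "Sunny": 3},
          "No":  {"Overcast": 0, "Rainy": 2, "Sunny": 2}}
total = 14

def posterior(label, weather="Sunny"):
    n_label = sum(counts[label].values())                                 # 10 for Yes, 4 for No
    likelihood = counts[label][weather] / n_label                         # P(weather | label)
    prior = n_label / total                                               # P(label)
    evidence = (counts["Yes"][weather] + counts["No"][weather]) / total   # P(weather)
    return likelihood * prior / evidence                                  # Bayes' theorem

print(round(posterior("Yes"), 2))   # 0.6 -> play
print(round(posterior("No"), 2))    # 0.4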
Advantages of Naive Bayes Classifier:
o Naive Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
o It can be used for binary as well as multi-class classification.
o It performs well in multi-class prediction compared to other algorithms.
o It is the most popular choice for text classification problems.
Disadvantages of Naive Bayes Classifier:
o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn
the relationship between features.
Applications of Naive Bayes Classifier:
o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naive Bayes Classifier is an eager
learner.
o It is used in Text classification such as Spam filtering and Sentiment analysis.