
4.CLASSIFICATION

Decision Tree Induction

A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal
node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node
holds a class label. The topmost node in the tree is the root node.

The following decision tree is for the concept buy_computer, which indicates whether a customer at
a company is likely to buy a computer or not. Each internal node represents a test on an attribute.

Each leaf node represents a class.
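
As an illustration of such a structure in code (not reproduced from the figure), the small Python sketch below hard-codes a buy_computer-style tree; the attributes age, student, and credit_rating and their test values are assumptions made for the example.

def buys_computer(age, student, credit_rating):
    # Illustrative decision tree for the buy_computer concept; the split
    # attributes (age, student, credit_rating) are assumed, not taken from the figure.
    if age == "youth":                               # internal node: test on age
        return "yes" if student == "yes" else "no"   # branch on student, then leaf
    elif age == "middle_aged":
        return "yes"                                 # leaf node: class label
    else:  # age == "senior"
        return "yes" if credit_rating == "fair" else "no"

print(buys_computer("youth", "yes", "fair"))         # -> yes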

Decision Tree Induction

• A decision tree is a supervised learning method used in data mining for classification and
regression tasks. It is a tree that helps us with decision-making.
• The decision tree builds classification or regression models as a tree structure. It separates
a data set into smaller and smaller subsets while, at the same time, the tree is steadily
developed. The final tree consists of decision nodes and leaf nodes.
• A decision node has at least two branches, while the leaf nodes show a classification or
decision; no further splits can be made on leaf nodes. The uppermost decision node in a tree,
which corresponds to the best predictor, is called the root node. Decision trees can deal with
both categorical and numerical data.

Key Factors

Entropy:

Entropy refers to a common way to measure impurity. In the decision tree, it measures the
randomness or impurity in data sets.

Information Gain:

Information Gain refers to the decline in entropy after the dataset is split on an attribute. It is
also called Entropy Reduction. Building a decision tree is all about discovering the attributes that
return the highest information gain.
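
As a hedged illustration of both quantities, the short Python sketch below computes entropy and the information gain of a split; the label counts in the example are invented purely for demonstration.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels (impurity measure).
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_lists):
    # Entropy of the parent minus the weighted entropy of the children after a split.
    total = len(parent_labels)
    weighted = sum(len(child) / total * entropy(child) for child in child_label_lists)
    return entropy(parent_labels) - weighted

# Invented example: 14 records split by some attribute into two subsets.
parent = ["yes"] * 9 + ["no"] * 5
children = [["yes"] * 6 + ["no"] * 1, ["yes"] * 3 + ["no"] * 4]
print(round(entropy(parent), 3))                     # ~0.94
print(round(information_gain(parent, children), 3))  # gain obtained by this split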


In short, a decision tree is just like a flow chart diagram, with the terminal nodes showing decisions.
Starting with the whole dataset, we can measure the entropy to find a way to segment the set until the
data in each subset belongs to the same class.

Why are decision trees useful?

• It enables us to analyze the possible consequences of a decision thoroughly.
• It provides us with a framework to measure the values of outcomes and the probabilities of
achieving them.
• It helps us to make the best decisions based on existing data and best estimates.
• In other words, we can say that a decision tree is a hierarchical tree structure that can be
used to split an extensive collection of records into smaller sets of classes by applying a
sequence of simple decision rules.
• A decision tree model comprises a set of rules for partitioning a large heterogeneous
population into smaller, more homogeneous, mutually exclusive classes. The attributes used for
splitting can be nominal, ordinal, binary, or quantitative variables; in contrast, the classes
must be of a qualitative type, such as categorical, ordinal, or binary.
• In brief, given data described by attributes together with a class, a decision tree creates a set
of rules that can be used to identify the class. One rule is applied after another, resulting
in a hierarchy of segments within a segment.
• The hierarchy is known as the tree, and each segment is called a node. With each
progressive division, the members of the resulting subsets become more and more similar
to each other. Hence, the algorithm used to build a decision tree is referred to as recursive
partitioning. A well-known such algorithm is CART (Classification and Regression Trees).

Consider the example of a factory where:

• Expanding the factory costs $3 million; the probability of a good economy is 0.6 (60%), which
leads to $8 million profit, and the probability of a bad economy is 0.4 (40%), which leads
to $6 million profit.
• Not expanding the factory costs $0; the probability of a good economy is 0.6 (60%), which
leads to $4 million profit, and the probability of a bad economy is 0.4 (40%), which leads to $2
million profit.
• The management team needs to take a data-driven decision on whether or not to expand based on
the given data.

NetExpand = (0.6 × 8 + 0.4 × 6) − 3 = $4.2M
NetNotExpand = (0.6 × 4 + 0.4 × 2) − 0 = $3.2M
$4.2M > $3.2M, therefore the factory should be expanded.
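
The same expected-value arithmetic can be verified with a few lines of Python; the costs, probabilities, and profits are the ones stated above (in $ millions).

def expected_net(cost, p_good, profit_good, profit_bad):
    # Expected profit of an option minus its upfront cost (all values in $M).
    return p_good * profit_good + (1 - p_good) * profit_bad - cost

net_expand = expected_net(cost=3, p_good=0.6, profit_good=8, profit_bad=6)   # 4.2
net_keep   = expected_net(cost=0, p_good=0.6, profit_good=4, profit_bad=2)   # 3.2
print("Expand" if net_expand > net_keep else "Do not expand")                # -> Expand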

Decision Tree Terminologies


• Root Node: A decision tree’s root node, which represents the original choice or
feature from which the tree branches, is the highest node.


• Internal Nodes (Decision Nodes): Nodes in the tree whose choices are determined
by the values of particular attributes. There are branches on these nodes that go to
other nodes.
• Leaf Nodes (Terminal Nodes): The endpoints of branches, where final decisions or predictions
are made. There are no further branches on leaf nodes.
• Branches (Edges): Links between nodes that show how decisions are made in
response to particular circumstances.
• Splitting: The process of dividing a node into two or more sub-nodes based on a
decision criterion. It involves selecting a feature and a threshold to create subsets of
data.
• Parent Node: A node that is split into child nodes. The original node from which a
split originates.
• Child Node: Nodes created as a result of a split from a parent node.
• Decision Criterion: The rule or condition used to determine how the data should be
split at a decision node. It involves comparing feature values against a threshold.
• Pruning: The process of removing branches or nodes from a decision tree to improve
its generalisation and prevent overfitting.

Example of a Decision Tree Algorithm


Forecasting Activities Using Weather Information
• Root node: Whole dataset
• Attribute: “Outlook” (Sunny, Overcast, Rainy).
• Subsets: Sunny, Overcast, and Rainy.


• Recursive Splitting: Divide the sunny subset even more according to humidity, for
example.
• Leaf Nodes: Activities include “swimming,” “hiking,” and “staying inside.”

Advantages of Decision Tree


• Easy to understand and interpret, making them accessible to non-experts.
• Handle both numerical and categorical data without requiring extensive
preprocessing.
• Provides insights into feature importance for decision-making.
• Handle missing values and outliers without significant impact.
• Applicable to both classification and regression tasks.
Disadvantages of Decision Tree
• Disadvantages include the potential for overfitting
• Sensitivity to small changes in data, limited generalization if training data is not
representative

• Potential bias in the presence of imbalanced data.

How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given record, the algorithm starts from the root
node of the tree. The algorithm compares the value of the root attribute with the corresponding
attribute of the record (from the real dataset) and, based on the comparison, follows the branch and
jumps to the next node.

For the next node, the algorithm again compares the attribute value with the other sub-nodes and
moves further. It continues the process until it reaches a leaf node of the tree. The complete process
can be better understood using the algorithm below:

o Step-1: Begin the tree with the root node, says S, which contains the complete dataset.

o Step-2: Find the best attribute in the dataset using Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain possible values of the best attribute.


o Step-4: Generate the decision tree node, which contains the best attribute.

o Step-5: Recursively make new decision trees using the subsets of the dataset created

o Step-6: Continue this process until a stage is reached where the nodes cannot be classified
further; such final nodes are called leaf nodes.
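
A minimal Python sketch of this recursive procedure is given below, using information gain as the Attribute Selection Measure. It is an illustrative ID3-style implementation under assumed data formats (a list of attribute dictionaries plus a label list), not a production algorithm.

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def build_tree(rows, labels, attributes):
    # Step-6: stop when the subset is pure or no attributes remain -> leaf node.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        # Step-2: Attribute Selection Measure (information gain of splitting on attr).
        subsets = {}
        for row, lab in zip(rows, labels):
            subsets.setdefault(row[attr], []).append(lab)
        remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
        return entropy(labels) - remainder

    best = max(attributes, key=gain)                 # Step-4: node holds the best attribute.
    node = {best: {}}
    for value in {row[best] for row in rows}:        # Step-3: divide S into subsets.
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = [r for r, _ in sub], [l for _, l in sub]
        # Step-5: recursively build a sub-tree on each subset.
        node[best][value] = build_tree(sub_rows, sub_labels,
                                       [a for a in attributes if a != best])
    return node

# Invented toy data in the spirit of the weather example above.
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rainy", "windy": "no"}]
labels = ["play", "stay in", "play"]
print(build_tree(rows, labels, ["outlook", "windy"]))   # -> {'windy': {'no': 'play', 'yes': 'stay in'}}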

Advantages of the Decision Tree

o It is simple to understand, as it follows the same process that a human follows while
making any decision in real life.

o It can be very useful for solving decision-related problems.

o It helps to think about all the possible outcomes for a problem.

o There is less requirement of data cleaning compared to other algorithms.

Disadvantages of the Decision Tree


o The decision tree contains lots of layers, which makes it complex.

o It may have an overfitting issue, which can be resolved using the Random Forest
algorithm.

o For more class labels, the computational complexity of the decision tree may increase.

Bayes Classification Methods

In numerous applications, the relationship between the attribute set and the class variable is
non-deterministic. In other words, the class label of a test record cannot be predicted with
certainty even though its attribute set is the same as that of some of the training examples. These
circumstances may arise due to noisy data or the presence of certain confounding factors that
influence classification but are not included in the analysis.

For example, consider the task of predicting whether an individual is at risk of liver illness based
on the individual's eating habits and working efficiency. Although most people who eat healthily and
exercise consistently have a lower probability of developing liver disease, they may still develop it
due to other factors, for example, consumption of high-calorie street food or alcohol abuse.
Determining whether an individual's eating routine is healthy or the workout efficiency is sufficient
is also subject to interpretation, which in turn may introduce uncertainties into the learning
problem.

Bayesian classification uses Bayes' theorem to predict the probability of occurrence of an event.
Bayesian classifiers are statistical classifiers based on Bayesian probability. The theorem
expresses how a level of belief, expressed as a probability, should change to account for evidence.

Bayes' theorem is named after Thomas Bayes, who first used conditional probability to provide an
algorithm that uses evidence to calculate limits on an unknown parameter.

Bayes' theorem is expressed mathematically by the following equation:

P(X/Y) = [P(Y/X) · P(X)] / P(Y)

where X and Y are events and P(Y) ≠ 0.

P(X/Y) is the conditional probability of event X occurring given that Y is true.

P(Y/X) is the conditional probability of event Y occurring given that X is true.

P(X) and P(Y) are the probabilities of observing X and Y independently of each other. This is
known as the marginal probability.

Bayesian interpretation:
In the Bayesian interpretation, probability measures a "degree of belief." Bayes' theorem connects
the degree of belief in a hypothesis before and after accounting for evidence. For example, let us
consider the example of a coin. If we toss a coin, we get either heads or tails, and each outcome
occurs 50% of the time. If the coin is flipped a number of times and the outcomes are observed, the
degree of belief may rise, fall, or remain the same depending on the outcomes.

For proposition X and evidence Y,

o P(X), the prior, is the initial degree of belief in X.

o P(X/Y), the posterior, is the degree of belief after accounting for Y.

o The quotient P(Y/X) / P(Y) represents the support Y provides for X.

Bayes' theorem can be derived from the definition of conditional probability: since
P(X and Y) = P(X/Y) · P(Y) and also P(X and Y) = P(Y/X) · P(X), equating the two and dividing
by P(Y) gives P(X/Y) = [P(Y/X) · P(X)] / P(Y).
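
As a quick numerical illustration of the theorem in a classification setting, the Python snippet below uses invented figures: assume 1% of patients have a disease (the prior), a test detects it 90% of the time, and it falsely flags 5% of healthy patients.

# Hypothetical numbers, purely to illustrate Bayes' theorem.
p_disease = 0.01              # P(X): prior probability of the class "disease"
p_pos_given_disease = 0.90    # P(Y/X): probability of a positive test given disease
p_pos_given_healthy = 0.05    # probability of a positive test given no disease

# Marginal probability of a positive test, P(Y), by the law of total probability.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior P(X/Y) = P(Y/X) * P(X) / P(Y)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # ~0.154: the belief rises from 1% to about 15%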

Advantages of Naïve Bayes Classifier:

o Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets.

o It can be used for Binary as well as Multi-class Classifications.

o It performs well in multi-class predictions as compared to the other algorithms.

o It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:


o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn
the relationship between features.

Applications of Naïve Bayes Classifier:

o It is used for Credit Scoring.

o It is used in medical data classification.

o It can be used in real-time predictions because the Naïve Bayes Classifier is an eager learner.
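
A minimal sketch of a naïve Bayes classifier for categorical data is given below. It hand-rolls the counting rather than using a library, and the toy training set, attribute names, and Laplace smoothing constant are all assumptions made for the example.

from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    # Estimate class priors and per-class value counts from categorical data.
    priors = Counter(labels)
    cond = defaultdict(Counter)     # (class, attribute) -> counts of attribute values
    domains = defaultdict(set)      # attribute -> set of observed values
    for row, lab in zip(rows, labels):
        for attr, value in row.items():
            cond[(lab, attr)][value] += 1
            domains[attr].add(value)
    return priors, cond, domains

def predict(row, priors, cond, domains, alpha=1.0):
    # Choose the class maximizing P(class) * prod P(value | class), with Laplace smoothing,
    # under the naive assumption that attributes are independent given the class.
    total = sum(priors.values())
    best, best_score = None, float("-inf")
    for lab, count in priors.items():
        score = count / total
        for attr, value in row.items():
            score *= (cond[(lab, attr)][value] + alpha) / (count + alpha * len(domains[attr]))
        if score > best_score:
            best, best_score = lab, score
    return best

# Invented toy data: predict whether to play from the outlook.
rows = [{"outlook": "sunny"}, {"outlook": "rainy"}, {"outlook": "sunny"}, {"outlook": "overcast"}]
labels = ["no", "no", "yes", "yes"]
print(predict({"outlook": "overcast"}, *train_naive_bayes(rows, labels)))   # -> "yes"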

Rule-based classification

o Rule-based classifiers are just another type of classifier which makes the class decision
by using a set of "if...then" rules. These rules are easily interpretable, and thus these
classifiers are generally used to generate descriptive models. The condition used with
"if" is called the antecedent, and the predicted class of each rule is called the
consequent.

Properties of rule-based classifiers:


• Coverage: The percentage of records that satisfy the antecedent conditions of a
particular rule. The rules generated by a rule-based classifier may not be exhaustive,
i.e. there may be some records that are not covered by any of the rules.
• The decision boundary created by an individual rule is linear, but the overall model can be
much more complex than that of a decision tree, because many rules may be triggered for the
same record.

Using IF-THEN Rules for Classification


IF-THEN Rule
To define the IF-THEN rule, we can split it into two parts:


Rule Antecedent: This is the "if condition" part of the rule. It is present on the
LHS (left-hand side). The antecedent can have one or more attribute conditions, combined with the
logical AND operator.

Rule Consequent: This is present in the rule's RHS(Right Hand Side). The rule consequent
consists of the class prediction.

Example:
R1: IF tutor = coding Ninja AND student = yes
THEN happy Learning = true
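
A hedged Python sketch of how such IF-THEN rules can be applied is shown below; the rule list, attribute names, and default class are invented to mirror the example above, and rules are fired in order (first match wins).

# Each rule is (antecedent: attribute -> required value, consequent class). Illustrative only.
rules = [
    ({"tutor": "coding Ninja", "student": "yes"}, "happy Learning = true"),
    ({"tutor": "other"}, "happy Learning = false"),
]

def classify(record, rules, default="unknown"):
    # Fire the first rule whose antecedent fully matches the record.
    for antecedent, consequent in rules:
        if all(record.get(attr) == value for attr, value in antecedent.items()):
            return consequent
    return default   # no rule covers this record (the rule set is not exhaustive)

print(classify({"tutor": "coding Ninja", "student": "yes"}, rules))   # -> "happy Learning = true"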
Properties of Rule-Based Classifiers
There are two significant properties of rule-based classification in data mining. They are:
• Rules may not be mutually exclusive
• Rules may not be exhaustive

Rule Pruning
The assessment of rule quality is made on the original set of training data. A rule may perform well on
training data but less well on subsequent data. That is why rule pruning is required.

FOIL is one simple and effective method for rule pruning. For a given rule R,

FOIL_Prune(R) = (pos − neg) / (pos + neg)

where pos and neg are the numbers of positive and negative tuples covered by R, respectively.

Note − This value will increase with the accuracy of R on the pruning set. Hence, if the
FOIL_Prune value is higher for the pruned version of R, then we prune R.
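
The measure can be stated in one line of code; the tuple counts below are hypothetical, used only to show how a rule would be compared with its pruned version.

def foil_prune(pos, neg):
    # FOIL_Prune = (pos - neg) / (pos + neg) for a rule covering pos positive and neg negative tuples.
    return (pos - neg) / (pos + neg)

original_rule = foil_prune(pos=40, neg=10)   # 0.60
pruned_rule   = foil_prune(pos=45, neg=12)   # ~0.58, lower -> keep the original rule in this case
print(original_rule, pruned_rule)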

Advantages of Rule-Based Classification

• The rule-based classification is easy to generate.

• It is highly expressive in nature and very easy to understand.


• It classifies new records very quickly.

• It helps us to handle redundant values during classification properly.

k-Nearest-Neighbor Classifiers
The k-nearest-neighbor method was first described in the early 1950s. The method is labor-intensive
when given large training sets, and it did not gain popularity until the 1960s, when increased
computing power became available. It has since been widely used in the area of pattern
recognition.

Nearest-neighbor classifiers are based on learning by analogy, that is, by comparing a given test
tuple with training tuples that are similar to it. The training tuples are described by n attributes.
Each tuple represents a point in an n-dimensional space. In this way, all of the training tuples are
stored in an n-dimensional pattern space. When given an unknown tuple, a k-nearest-neighbor
classifier searches the pattern space for the k training tuples that are closest to the unknown tuple.
These k training tuples are the k "nearest neighbors" of the unknown tuple.

"Closeness" is defined in terms of a distance metric, such as Euclidean distance. The
Euclidean distance between two points or tuples, say, X1 = (x11, x12, ..., x1n) and X2 = (x21,
x22, ..., x2n), is

dist(X1, X2) = sqrt((x11 − x21)^2 + (x12 − x22)^2 + ... + (x1n − x2n)^2)
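
A minimal k-nearest-neighbor classifier based on this Euclidean distance is sketched below in Python; the two-dimensional training tuples and the choice of k are invented for illustration.

import math
from collections import Counter

def euclidean(x1, x2):
    # Euclidean distance between two n-dimensional tuples.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_classify(unknown, training, k=3):
    # Find the k training tuples closest to the unknown tuple and take a majority vote.
    neighbors = sorted(training, key=lambda item: euclidean(unknown, item[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Invented 2-D training data: (point, class label).
training = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"), ((5.5, 4.5), "B")]
print(knn_classify((1.1, 1.0), training, k=3))   # -> "A"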

Classification and Prediction Issues

The major issue is preparing the data for Classification and Prediction. Preparing the data involves
the following activities, such as:


1. Data Cleaning: Data cleaning involves removing the noise and treatment of missing
values. The noise is removed by applying smoothing techniques, and the problem of
missing values is solved by replacing a missing value with the most commonly occurring
value for that attribute.

2. Relevance Analysis: The database may also have irrelevant attributes. Correlation analysis
is used to know whether any two given attributes are related.

3. Data Transformation and Reduction: The data can be transformed by any of the following
methods.

o Normalization: The data is transformed using normalization. Normalization involves
scaling all values of a given attribute so that they fall within a small specified range.
Normalization is used when neural networks or methods involving distance measurements
are used in the learning step.

o Generalization: The data can also be transformed by generalizing it to higher-level
concepts. For this purpose, we can use concept hierarchies.
NOTE: Data can also be reduced by some other methods such as wavelet transformation,
binning, histogram analysis, and clustering.
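
As a hedged illustration of normalization, the Python sketch below applies min-max scaling to one attribute; the target range [0, 1] and the sample values are assumptions made for the example.

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    # Scale all values of an attribute so that they fall within [new_min, new_max].
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

# Invented attribute values, e.g. incomes in thousands.
print(min_max_normalize([12, 30, 54, 98]))   # -> [0.0, 0.209..., 0.488..., 1.0]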

Comparison of Classification and Prediction Methods

Here are the criteria for comparing the methods of Classification and Prediction, such as:


o Accuracy: The accuracy of the classifier can be referred to as the ability of the classifier
to predict the class label correctly, and the accuracy of the predictor can be referred to as how
well a given predictor can estimate the unknown value.

o Speed: The speed of the method depends on the computational cost of generating and using
the classifier or predictor.

o Robustness: Robustness is the ability of the classifier or predictor to make correct
predictions even when the incoming data is noisy or contains missing values.

o Scalability: Scalability refers to the ability to construct the classifier or predictor
efficiently even when large amounts of data are given.

o Interpretability: Interpretability is how readily we can understand the reasoning behind

predictions or classification made by the predictor or classifier.

Precision and Recall


Precision is the proportion of correct positive classifications (true positive) divided by the total
number of predicted positive classifications that were made (true positive + false positive).
Recall is the proportion of correct positive classifications (true positive) divided by the total
number of the truly positive classifications (true positive + false negative).
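The two ratios translate directly into code; the confusion-matrix counts in the Python example below are invented.

def precision(tp, fp):
    # Correct positive classifications out of all predicted positives: TP / (TP + FP).
    return tp / (tp + fp)

def recall(tp, fn):
    # Correct positive classifications out of all truly positive records: TP / (TP + FN).
    return tp / (tp + fn)

# Hypothetical counts from a classifier's confusion matrix.
print(precision(tp=80, fp=20))   # 0.8
print(recall(tp=80, fn=40))      # ~0.667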
A PR curve is simply a graph with Precision values on the y-axis and Recall values on the
x-axis. In other words, the PR curve contains TP / (TP + FP) on the y-axis and
TP / (TP + FN) on the x-axis.
• It is important to note that Precision is also called the Positive Predictive Value (PPV).


• The recall is also called Sensitivity, Hit Rate, or True Positive Rate (TPR). The figure
below shows a comparison of sample PR and ROC curves.

Interpreting a Precision-Recall Curve


It is desired that the algorithm should have both high precision and high recall. However, most
machine learning algorithms often involve a trade-off between the two. A good PR curve has
greater AUC (area under the curve). In the figure above, the classifier corresponding to the blue
line has better performance than the classifier corresponding to the green line. It is important to
note that the classifier that has a higher AUC on the ROC curve will always have a higher AUC
on the PR curve as well.

Why do we need a PR curve when the ROC curve exists?


PR curve is particularly useful in reporting Information Retrieval results.
Information Retrieval involves searching a pool of documents to find ones that are relevant to a
particular user query. For instance, assume that the user enters a search query “Pink Elephants”.
The search engine skims through millions of documents (using some optimized algorithms) to
retrieve a handful of relevant documents. Hence, we can safely assume that the no. of relevant
documents will be much less compared to the no. of non-relevant documents.
In this scenario,
• TP = No. of retrieved documents that are relevant (good results).
• FP = No. of retrieved documents that are non-relevant (bogus search results).
• TN = No. of non-retrieved documents that are non-relevant.
• FN = No. of non-retrieved documents that are relevant (good documents we missed).


The ROC curve is a plot containing FPR = FP / (FP + TN) on the x-axis and Recall = TPR =
TP / (TP + FN) on the y-axis. Since the number of true negatives, i.e. non-retrieved
documents that are non-relevant, is such a huge number, the FPR becomes insignificantly
small.
Further, FPR does not help us evaluate a retrieval system well because we want to focus more on
the retrieved documents, and not the non-retrieved ones. The PR curve helps solve this issue. The
PR curve has the Recall value (TPR) on the x-axis and Precision = TP / (TP + FP) on the
y-axis. Precision helps highlight how relevant the retrieved results are, which is more
important while judging an IR system. Hence, a PR curve is more common for problems
involving information retrieval.
