
UNIT – 4 CLASSIFICATION AND PREDICTION

CLASSIFICATION

• A classification technique is a systematic approach to building classification models from an input data set; the model is then used to assign a class label to new records.
• The input data set is called the training data set.
• In classification, we assign new data to a class with the help of current or past data.

There are two stages of classification

1. Model Construction
2. Model Usage.

Model Construction: A classification model is built from the training data set, in which every record already carries a known class label.

Model Usage / Testing: The model's accuracy is estimated on an independent test set; if the accuracy is acceptable, the model is used to assign class labels to new, unseen data.


APPLICATIONS OF CLASSIFICATION:
1. Sentiment Analysis: Sentiment analysis is highly helpful in social media monitoring. We can use it
to extract social media insights.
2. Document Classification: We can use document classification to organize the documents into
sections according to the content. Document classification refers to text classification; we can
classify the words in the entire document.
3. Image Classification: Image classification assigns an image to one of a set of trained categories.
4. Machine Learning Classification: It uses statistically demonstrable algorithm rules to execute analytical tasks that would take humans hundreds of hours to perform.
ISSUES IN CLASSIFICATION & PREDICTION:
The main issues concern preparing the data (data cleaning, relevance analysis, and data transformation) and comparing methods on criteria such as accuracy, speed, robustness, scalability, and interpretability.
DECISION TREE INDUCTION ALGORITHM (ID3, CART ALGORITHM)
The machine learning researcher J. Ross Quinlan developed the decision tree algorithm known as ID3 (Iterative Dichotomiser) in the early 1980s. The algorithm involves no backtracking; trees are constructed in a top-down, recursive, divide-and-conquer manner.
A decision tree is a structure that includes a root node, branches, and leaf nodes. Decision tree
algorithm creates classification or regression models as a tree structure to solve the problem.
Decision Tree Terminologies:
1. Root Node: The topmost node in the tree; it contains the complete data set and its attributes.
2. Internal Node: A node between the root and the leaves; it denotes a test on an attribute.
3. Leaf Node: A terminal node; it represents an output or class label.
4. Splitting: The process of dividing a decision node/root node into sub-nodes according to the given conditions.
5. Branch / Sub-Tree: A tree formed by splitting the tree.
6. Pruning: The process of removing unwanted branches from the tree.
The root node and internal nodes are drawn as rectangles; leaf nodes are drawn as ovals.
Attribute Selection Measure / Key Factors:
Entropy: A common way to measure impurity. In a decision tree, it measures the degree of randomness or uncertainty in the data set; for class proportions p_i it is computed as Entropy = -Σ p_i log2(p_i).
Information Gain: The decline in entropy after the data set is split, also called entropy reduction. It measures the reduction in entropy (or variance) that results from splitting a data set on a specific attribute.
Gini Impurity (or Index): A score that evaluates how pure a split is among the classified groups, computed as Gini = 1 - Σ p_i^2. It lies in the range between 0 and 1: 0 when all observations belong to one class, with higher values indicating that the elements are distributed more evenly (randomly) across the classes.
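These three measures can be computed directly from class labels. The following is a minimal sketch (function names and the 9/5 example split are my own choices, not from the notes):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels (log base 2)."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return 1 - sum((k / n) ** 2 for k in counts.values())

def information_gain(parent, subsets):
    """Entropy of the parent minus the weighted entropy of the split subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

labels = ["yes"] * 9 + ["no"] * 5       # a 9-vs-5 class split
print(round(entropy(labels), 2))        # 0.94
print(round(gini(labels), 2))           # 0.46
```

A perfectly pure split of a balanced two-class set gives the maximum possible gain of 1.0, since each subset's entropy drops to 0.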
ALGORITHM:
Step-1: Begin the tree with the root node, say S, which contains the complete data set.

Step-2: Find the best attribute in the data set using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively build new decision trees using the subsets of the data set created in Step-3.
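The five steps above can be sketched as a small recursive ID3-style implementation. This is a minimal illustration rather than the full textbook algorithm; the toy records and attribute names are invented for the example, and information gain serves as the ASM:

```python
import math
from collections import Counter

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_attribute(rows, attrs, target):
    # Step-2: choose the attribute with the highest information gain
    def gain(a):
        remainder = 0.0
        for v in set(r[a] for r in rows):
            subset = [r for r in rows if r[a] == v]
            remainder += len(subset) / len(rows) * entropy(subset, target)
        return entropy(rows, target) - remainder
    return max(attrs, key=gain)

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                  # pure node -> leaf with class label
        return labels[0]
    if not attrs:                              # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = best_attribute(rows, attrs, target)          # Steps 2 and 4
    tree = {best: {}}
    for v in set(r[best] for r in rows):                # Step-3: split S into subsets
        subset = [r for r in rows if r[best] == v]
        tree[best][v] = id3(subset, [a for a in attrs if a != best], target)  # Step-5
    return tree

# Hypothetical toy records (values made up for illustration)
rows = [
    {"weather": "sunny", "play": "no"},
    {"weather": "sunny", "play": "no"},
    {"weather": "cloudy", "play": "yes"},
    {"weather": "rainy", "play": "yes"},
]
print(id3(rows, ["weather"], "play"))
```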
EXAMPLE:
Advantages of Decision Tree algorithm

• Results are simple to interpret.


• Classification and regression trees are Nonparametric and Nonlinear.
• Classification and regression trees implicitly perform feature selection.
• Outliers have no meaningful effect on CART.
• It requires minimal supervision and produces easy-to-understand models.
Limitations of Decision Tree algorithm

• Overfitting.
• High variance.
• Low bias.
• The tree structure may be unstable.
Applications of the Decision Tree algorithm

• For quick Data insights.


• In Blood Donors Classification.
• For environmental and ecological data.
• In the financial sectors.

BAYES CLASSIFICATION
Bayesian classification uses Bayes' theorem to predict the probability of an event. Bayesian classifiers are statistical classifiers grounded in Bayesian probability.
BAYES THEOREM: Bayes' Theorem is named after Thomas Bayes, who first used conditional probability to provide an algorithm that uses evidence to calculate limits on an unknown parameter. For a hypothesis H and evidence E it states: P(H | E) = P(E | H) * P(H) / P(E).
1. Prior Probability: The probability of an event occurring before new data is collected.
2. Posterior Probability: Once new data or information is collected, the prior probability of an event is revised to produce a more accurate measure of a possible outcome.
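The prior-to-posterior update can be shown with a short calculation. The numbers below (a 1% prior and hypothetical test characteristics) are assumed purely for illustration:

```python
# Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)
# Assumed numbers: hypothesis H holds with 1% prior probability;
# the evidence appears 95% of the time when H holds and 10% when it does not.
p_h = 0.01                    # prior probability of H
p_e_given_h = 0.95            # P(E | H)
p_e_given_not_h = 0.10        # P(E | not H)

# Total probability of the evidence over both cases
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posterior: the prior revised in light of the evidence
posterior = p_e_given_h * p_h / p_e
print(round(posterior, 3))    # 0.088
```

Even strong evidence only raises a 1% prior to about 8.8%, which is exactly the prior-vs-posterior distinction the two definitions above describe.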
NAÏVE BAYES ALGORITHM

• The Naïve Bayes algorithm is a supervised learning algorithm, based on Bayes' theorem, used to solve classification problems.
• It is widely used for text classification, which involves high-dimensional training data sets.
• It is a simple and effective algorithm for building fast ML models that can make quick predictions.
• It is a probabilistic classifier.
EXAMPLE:

P(YES | FLU, COVID) = P(FLU | YES) * P(COVID | YES) * P(YES)
                    = 3/7 * 4/7 * 7/10
                    ≈ 0.17

P(NO | FLU, COVID) = P(FLU | NO) * P(COVID | NO) * P(NO)
                   = 2/3 * 2/3 * 3/10
                   ≈ 0.13

Since P(YES | FLU, COVID) > P(NO | FLU, COVID), the given person (Flu = Yes, Covid = Yes) is classified as Fever = YES.
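The worked example can be checked with a few lines of code. The conditional probabilities are taken from the frequency table assumed in the notes:

```python
# Conditional probabilities from the (assumed) flu/covid frequency table
p_flu_yes, p_covid_yes, p_yes = 3 / 7, 4 / 7, 7 / 10
p_flu_no, p_covid_no, p_no = 2 / 3, 2 / 3, 3 / 10

# Naive Bayes scores: product of attribute likelihoods and the class prior
score_yes = p_flu_yes * p_covid_yes * p_yes   # ≈ 0.17
score_no = p_flu_no * p_covid_no * p_no       # ≈ 0.13

# Predict the class with the larger score
prediction = "YES" if score_yes > score_no else "NO"
print(round(score_yes, 2), round(score_no, 2), prediction)
```

Note these products are unnormalized posteriors; since both share the same denominator P(FLU, COVID), comparing them directly is enough to pick the class.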

RULE BASED CLASSIFICATION


IF-THEN Rules

A rule-based classifier makes use of a set of IF-THEN rules for classification. We can express a rule in the following form:

IF condition THEN conclusion


Let us consider a rule R1,
R1: IF age = youth AND student = yes THEN buy_computer = yes

• The IF part of the rule is called rule antecedent or precondition.


• The THEN part of the rule is called rule consequent.
• The antecedent part (the condition) consists of one or more attribute tests, and these tests are logically ANDed.
• The consequent part consists of the class prediction.

Assessment of a Rule
In rule-based classification in data mining, there are two factors on which we can assess the rules. These are:
Coverage of Rule: The fraction of the records which satisfy the antecedent conditions of a particular rule
is called the coverage of that rule.
Coverage(R) = n_covers / n
n_covers = number of records satisfying the rule's antecedent
n = total number of records in the data set


Accuracy of a rule: The fraction of the records that satisfy the antecedent conditions and meet the
consequent values of a rule is called the accuracy of that rule.
Accuracy(R) = n_correct / n_covers
n_correct = number of covered records that also satisfy the consequent
n_covers = number of records satisfying the rule's antecedent
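These two formulas can be sketched as a small function; the rule R1 and the four hypothetical records below are invented for illustration:

```python
def coverage_and_accuracy(antecedent, consequent, records):
    """Coverage(R) = n_covers / n ; Accuracy(R) = n_correct / n_covers."""
    n = len(records)
    covered = [r for r in records if antecedent(r)]     # records matching the IF part
    correct = [r for r in covered if consequent(r)]     # also matching the THEN part
    coverage = len(covered) / n
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

# Hypothetical records for R1: IF age = youth AND student = yes
# THEN buy_computer = yes
records = [
    {"age": "youth", "student": "yes", "buy_computer": "yes"},
    {"age": "youth", "student": "yes", "buy_computer": "no"},
    {"age": "youth", "student": "no", "buy_computer": "no"},
    {"age": "senior", "student": "yes", "buy_computer": "yes"},
]
cov, acc = coverage_and_accuracy(
    lambda r: r["age"] == "youth" and r["student"] == "yes",
    lambda r: r["buy_computer"] == "yes",
    records,
)
print(cov, acc)   # 0.5 0.5  (2 of 4 records covered; 1 of those 2 correct)
```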
Rule Extraction

Here we will learn how to build a rule-based classifier by extracting IF-THEN rules from a decision tree.
To extract a rule from a decision tree –

• One rule is created for each path from the root to the leaf node.
• To form a rule antecedent, each splitting criterion is logically ANDed.
• The leaf node holds the class prediction, forming the rule consequent
RULES:
1. If weather = 'cloudy' then play = 'yes'
2. If weather = 'sunny' and humidity = 'high' then play = 'no'
3. If weather = 'sunny' and humidity = 'normal' then play = 'yes'
4. If weather = 'rainy' and wind = 'strong' then play = 'no'
5. If weather = 'rainy' and wind = 'weak' then play = 'yes'
For a new test record such as Day = 11, weather = cloudy, temp = hot, humidity = high AND wind = weak, rule 1 fires, so play = 'yes'.
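The rule set above can be encoded as a first-match rule list. In this sketch each antecedent's tests are logically ANDed (per the rule-antecedent definition earlier), and the `classify` helper and its default label are my own additions, not part of the notes:

```python
# Each rule is (antecedent predicate, class label); first match wins
rules = [
    (lambda r: r["weather"] == "cloudy", "yes"),
    (lambda r: r["weather"] == "sunny" and r["humidity"] == "high", "no"),
    (lambda r: r["weather"] == "sunny" and r["humidity"] == "normal", "yes"),
    (lambda r: r["weather"] == "rainy" and r["wind"] == "strong", "no"),
    (lambda r: r["weather"] == "rainy" and r["wind"] == "weak", "yes"),
]

def classify(record, rules, default="yes"):
    """Return the label of the first rule whose antecedent matches."""
    for antecedent, label in rules:
        if antecedent(record):
            return label
    return default   # fallback when no rule fires

# The Day = 11 test record from the notes
new = {"weather": "cloudy", "temp": "hot", "humidity": "high", "wind": "weak"}
print(classify(new, rules))   # yes (rule 1 fires)
```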

LAZY LEARNER ALGORITHM (KNN)


Lazy Learners: Lazy learners are also known as instance-based learners; they do not learn a model during the training phase. Instead, they simply store the training data and use it to classify new instances at prediction time.
Training is very fast because no model is built, but prediction can be expensive since all the computation is deferred until a query arrives. Lazy learners are also less effective in high-dimensional spaces or when the number of training instances is large.
Examples of lazy learners include k-nearest neighbours (KNN)
K – NEAREST NEIGHBOURS (KNN) ALGORITHM
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised
Learning technique.
• K-NN algorithm stores all the available data and classifies a new data point based on the
similarity
• K-NN is a non-parametric algorithm, which means it does not make any assumption on
underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the data set and performs the computation at classification time.

WORKING OF KNN ALGORITHM

• Step-1: Select the number K of the neighbours


• Step-2: Calculate the Euclidean distance of K number of neighbours
• Step-3: Take the K nearest neighbours as per the calculated Euclidean distance.
• Step-4: Among these k neighbours, count the number of the data points in each category.
• Step-5: Assign the new data points to that category for which the number of the neighbour is
maximum.
• Step-6: Our model is ready.
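The six steps above can be sketched in a few lines. The 2-D training points and class labels below are invented for illustration:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Steps 1-5: rank training points by Euclidean distance to the query,
    take the k nearest, and assign the majority class among them."""
    dists = sorted((math.dist(point, query), label) for point, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training points with two classes
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 6.0), "B"),
]
print(knn_predict(train, (2.0, 2.0), k=3))   # A (all 3 nearest neighbours are A)
```

Note there is no training step at all; `knn_predict` scans the stored data at query time, which is exactly why KNN is called a lazy learner.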
Advantages of KNN Algorithm:

• It is simple to implement.
• It is robust to the noisy training data
• It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:

• The value of K always needs to be determined, which may be complex at times.
• The computation cost is high because distances must be calculated between the query point and all the training samples.
PREDICTION

• Another process of data analysis is prediction, which is used to find a numerical output. As in classification, the training data set contains the inputs and their corresponding numerical output values.
• The algorithm derives a model, or predictor, from the training data set. The model should produce a numerical output when new data is given.
• Unlike classification, this method does not use a class label.
• The model predicts a continuous-valued function or an ordered value.
• Regression is generally used for prediction.
ACCURACY
• Accuracy is a metric that measures how often a machine learning model correctly predicts the
outcome.
• You can calculate accuracy by dividing the number of correct predictions by the total number of
predictions.
PRECISION

• Precision is a metric that measures how often a machine learning model correctly predicts the
positive class.
• You can calculate precision by dividing the number of correct positive predictions (true positives)
by the total number of instances the model predicted as positive (both true and false positives).
RECALL

• Recall is a metric that measures how often a machine learning model correctly identifies positive instances (true positives) out of all the actual positive samples in the data set.
• You can calculate recall by dividing the number of true positives by the total number of actual positive instances (true positives plus false negatives).
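The three metrics follow directly from the confusion-matrix counts. The counts below are hypothetical, chosen only to make the arithmetic easy to follow:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct / all predictions
    precision = tp / (tp + fp)                   # correct positives / predicted positives
    recall = tp / (tp + fn)                      # correct positives / actual positives
    return accuracy, precision, recall

# Assumed counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives
acc, prec, rec = metrics(tp=40, fp=10, fn=20, tn=30)
print(acc, prec, round(rec, 3))   # accuracy 0.7, precision 0.8, recall ≈ 0.667
```

The example shows why the three metrics differ: the model is right 70% of the time overall, 80% of its positive calls are correct, but it finds only two-thirds of the actual positives.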
