
CSC604 (Machine Learning)

Module I
➢ BY:
DR. ARUNDHATI DAS
Module I: Introduction to Machine Learning
• 1.1 Introduction to Machine Learning, Issues in Machine Learning,
Application of Machine Learning, Steps of developing a Machine
Learning Application.
• 1.2 Supervised and Unsupervised Learning: Concepts of
Classification, Clustering and prediction, Training, Testing and
validation dataset, cross validation, overfitting and underfitting of
model
• 1.3 Performance Measures: Measuring Quality of model- Confusion
Matrix, Accuracy, Recall, Precision, Specificity, F1 Score, RMSE

Module I: Introduction to Machine Learning
• 1.1 Introduction to Machine Learning, Issues in Machine Learning,
Application of Machine Learning, Steps of developing a Machine
Learning Application.

Module No. 1: Introduction

Courtesy: Internet
Module No. 1: Introduction
➢ Brief history of machine learning
➢ 1950s
➢ Samuel’s checkers-playing program
➢ Samuel coined the term Machine Learning

➢ 1960s
➢ Neural network: Rosenblatt’s perceptron
➢ Perceptron is a simple neural network unit. It resembles how a biological neuron works
➢ Widrow and Hoff’s Delta Rule (Least Mean Square Rule)
➢ The Delta rule is used for learning perceptrons
➢ It is a special case of the backpropagation update used in neural network training
➢ Together these developments gave birth to good linear classifiers
➢ Minsky and Papert pointed out some limitations of the perceptron
➢ Following this, neural network research went on a pause until the 1980s

Module No. 1: Introduction
➢ Brief history of machine learning
➢ 1970s
➢ Symbolic concept induction
➢ J. R. Quinlan came up with the ID3 decision tree learning algorithm (published in 1986)
➢ Subsequently, improvements and alternatives to ID3 were developed, such as CART and regression trees

➢ 1980s
➢ Advanced decision tree and rule learning developed
➢ Learning, planning, problem solving were developed
➢ Revival of neural network
➢ Multilayer perceptron in 1981
➢ The backpropagation algorithm specific to neural networks was developed
➢ A lot of research was done on multilayer perceptrons after that
➢ Theoretical frameworks of machine learning developed
➢ Valiant’s PAC (Probably Approximately Correct) learning
➢ Focus shifted to experimental methodologies
Module No. 1: Introduction
➢ Brief history of machine learning
➢ 1990s
➢ Machine learning (ML) and statistics
➢ SVM proposed by Cortes and Vapnik
➢ SVMs provided both theoretical and experimental/empirical grounding
➢ Adaptive agents, web applications
➢ Text learning
➢ Reinforcement learning
➢ Ensembles or boosting, Adaboost
➢ 1994 first self driving car road test

➢ 2000s
➢ Kernel SVM
➢ Random Forest
➢ Bayes net learning

Module No. 1: Introduction
➢ Why is machine learning popular today?
➢ New software/algorithms
➢ Neural networks
➢ Deep learning
➢ New hardware
➢ GPUs
➢ Cloud computing
➢ Big data available

➢ What is machine learning?

➢ How does machine learning differ from traditional programming?

➢ Fundamental laws that govern machine learning:

➢ ML explores algorithms that learn from data
➢ builds models from data
➢ The models can be used for tasks like prediction, decision making or solving tasks

Definition of ML

➢ Formal definition: Machine learning is about building computer systems that automatically improve with experience

➢ Definition by Tom M Mitchell: A computer program is said to learn from experience E with respect to some
class of Tasks T and performance measure P, if its performance on Tasks T as measured by P improves with
experience E

➢ Tasks T: e.g. prediction, classification, acting in an environment

➢ Experience E: e.g. data

➢ Measure of improvement in performance P: e.g. improved accuracy in prediction, new skills that the agent
did not previously possess, or improved efficiency of problem solving.

DL vs ML vs AI

Courtesy: Internet
DL vs ML vs AI vs DS

Courtesy: Internet
DL vs ML vs AI vs GenAI vs LLM

Courtesy: Internet
➢Applications of machine learning

➢ Medicine
➢ Diagnose a disease
➢ i/p → symptoms, lab measurements, test results, DNA tests etc
➢ o/p→ one from the possible set of diseases or none of the above
➢ Background knowledge (examples)→ learn from past medical records

➢ Computer vision
➢ What/ where objects appear in an image
➢ Convert hand-written digits/texts to characters/recognize text written in image (OCR optical character
recognition)

➢ Robot control
➢ Designing autonomous robots

➢Applications of machine learning

➢ Natural Language Processing


➢ Speech recognition
➢ Machine translation

➢ Financial
➢ Predict if a stock will rise or fall
➢ Predict if a user will click on an ad or not
➢ Improve customer’s experience in online shopping

➢ Miscellaneous applications
➢ Identify price sensitivity of a customer product, identify optimum price point that maximizes profit
➢ Optimize product location at a super market retail outlet
➢ Credit card fraud detection

Steps of developing a Machine Learning
Application.
Data preparation or data pre-processing
• The process of preparing raw data so that it is suitable for further
processing and analysis.
• Data preparation or data pre-processing includes collecting, reshaping,
filtering, merging, cleaning, outlier removal, handling missing values,
feature normalization and labeling raw data into a form suitable for
machine learning (ML) algorithms, and then exploring and visualizing the
data.

Data preparation steps
• Data preparation steps: 1. Gather data, 2. Discover and assess the data, 3. Cleanse the data,
4. Transform and enrich the data, 5. Store the data.
• 1. Gather data
• The data preparation process begins with finding the right data. This can come from an existing data catalog or data
sources can be created for a particular application.
• 2. Discover and assess data
• After collecting the data, it is important to discover each dataset. This step is about getting to know the data and
understanding what has to be done before the data becomes useful in a particular context.

• 3. Cleanse data
• Cleaning up the data is traditionally the most time-consuming part of the data
preparation process, but it’s crucial for removing faulty data and filling in gaps.
Important tasks here include:
• Removing extraneous data and outliers
• Filling in missing values
• Conforming data to a standardized pattern mostly by normalizing

Data preparation steps
• 4. Transform and enrich data
• Data transformation is the process of updating the format or value entries in
order to reach a well-defined outcome, or to make the data more easily
understood by a wider audience. Enriching data refers to adding and
connecting data with other related information to provide deeper insights.
• 5. Store data
• Once prepared, the data can be stored safely.

• P.S.: Train, Test and improve will be elaborated in the subsequent slides.
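As an illustration, below is a minimal sketch of the cleanse and transform steps using pandas and scikit-learn; the file name and column names (houses.csv, price, area) are hypothetical.

```python
# A sketch of data preparation steps, assuming a hypothetical CSV file
# "houses.csv" with "price" and "area" columns.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("houses.csv")                        # 1-2. gather and inspect the data
df = df.drop_duplicates()                             # 3. remove extraneous rows
df = df[df["price"] < df["price"].quantile(0.99)]     #    crude outlier removal
df["area"] = df["area"].fillna(df["area"].median())   #    fill in missing values
df[["area", "price"]] = MinMaxScaler().fit_transform(df[["area", "price"]])  # 4. normalize
df.to_csv("houses_clean.csv", index=False)            # 5. store the prepared data
```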
ML is about how expert you are in discerning patterns

Module I: Introduction to Machine Learning
• 1.2 Supervised and Unsupervised Learning: Concepts of
Classification, Clustering and prediction, Training, Testing and
validation dataset, cross validation, overfitting and underfitting of
model

➢How to create a learning system
➢ Choose the training experience/data
➢ Choose the target function that is to be learnt
➢ Choose how we want to represent the model
➢ Choose a learning algorithm to infer the target function

Different types of machine learning

➢ Supervised (inductive) learning


➢ Unsupervised learning
➢ Semi-supervised learning
➢ Reinforcement learning

➢Different types of machine learning
➢ Supervised (inductive) learning
Training data includes desired outputs
➢ X,y
➢ Given a new observation x, what is the best label of y?

➢ Unsupervised learning
Training data does not include desired outputs
➢ X
➢ Given a set of x’s, cluster or group or summarize them

➢ Semi-supervised learning
Training data includes a few desired outputs (a few labelled and many unlabelled examples)

➢ Reinforcement learning
Rewards from a sequence of actions (rewards for correct actions, penalties for incorrect actions)
➢ Determine what to do based on rewards and penalties

Supervised (inductive) learning (classification)
➢ Supervised (inductive) learning (classification)
Training data includes desired outputs
➢ X,y
➢ Given a new observation x, what is the best label of y?

Fig: Schematic diagram of Supervised learning (Courtesy NPTEL)

Unsupervised learning (clustering)
➢ Unsupervised learning (clustering)
Training data does not include desired outputs
➢ X
➢ Given a set of x’s, cluster or group or summarize them

Fig: Schematic diagram of Unsupervised learning (Courtesy NPTEL)

Semi-supervised learning
➢ Semi-supervised learning
Training data includes a few desired outputs (a few labelled and many unlabelled examples)

Fig: Schematic diagram of Semi-supervised learning (Courtesy NPTEL)


Reinforcement learning

➢ Reinforcement learning
Rewards from sequence of actions (rewards for correct action, penalty for incorrect action)
➢ Determine what to do based on rewards and penalty

Fig: Schematic diagram of Reinforcement learning (Courtesy NPTEL)

➢Supervised (inductive) learning

➢ A set of input features x1, x2,…., xn


➢ Target feature y

➢ A set of training examples: values for the i/p features and the target feature are given for each example

➢ Test data: A new example: values for the i/p features are given

➢ Predict the values for the target features for the new example
➢ Classification when y is discrete
➢ Regression when y is continuous

➢ Feature:
➢ Categorical: eg. Color red, yellow, blue, green; Blood group A, B, AB, O
➢ Ordinal: eg. Small, medium, large
➢ Integer valued: eg. Class 1, 2,3,…
➢ Real valued (continuous): eg. height

Supervised Learning
➢ Model or hypothesis

Fig: Schematic diagram of Supervised learning (Courtesy NPTEL)

Supervised Learning

Fig: Detailed schematic diagram of Supervised learning (Courtesy NPTEL)

➢ Representation

➢ (1) Decision tree

➢ (2) Linear function

➢ (3) Multivariate linear function

Fig. (1) Decision Tree (Courtesy: A K Pujari book)

Fig. (2) Linear function, (3) Multivariate linear function (Courtesy: S Haykin book)
➢ Representation

➢ (4) Single layer perceptron

➢ (5) Multilayer perceptron neural network

Fig. (4) Single layer perceptron, (5) Multilayer perceptron (Courtesy: Internet)


Some Terminologies

➢ Hypothesis
➢ A machine learning hypothesis is a candidate model that approximates a target function for mapping inputs
to outputs.

➢ Target function
➢ The target function is the underlying input-to-output mapping that the learning algorithm tries to approximate in order to calculate predictions.

➢ Hypothesis space
➢ Supervised learning algorithm/machine can be considered as a device that explores a hypothesis space
➢ Each setting of the parameters in the machine is a different hypothesis about the function that maps
input vectors to output vectors

➢ Features: Distinct traits that can be used to describe each item in a quantitative manner.

➢ Feature vector: n-dimensional vector of numerical features that represent some object/class.
➢ Feature space: the n-dimensional space of all possible feature vectors

➢Some Terminologies

➢ Instance space X: Set of all possible objects describable by features.

➢ (x,y): instance x (input) with label y (output) where y=f(x)

➢ Concept c: Subset of objects from X (c is unknown)

➢ Target function f: f maps each instance x (∈ X) to a target label y (∈ Y)

➢ Training data: collection of examples observed by the learning algorithm.

➢ Inductive learning (or prediction or concept learning): process where learner discovers rules by observing
examples or on the basis of past experience, formulating a generalized concept.

➢ Classification vs regression vs probability estimation:

➢ f(x) is discrete vs f(x) is continuous vs f(x) = probability of x

Hypothesis and hypotheses space

➢ IRIS data set
➢ Feature values, class labels (target feature)
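As a quick look at the data, here is a small sketch that loads the IRIS dataset through scikit-learn and prints its feature values and class labels:

```python
# Load the IRIS dataset bundled with scikit-learn and inspect it.
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)   # sepal/petal length and width (cm)
print(iris.target_names)    # setosa, versicolor, virginica
print(iris.data[:3])        # first three feature vectors
print(iris.target[:3])      # their class labels (0 = setosa)
```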

Outline:
Model Validation in Classification: Cross Validation -
Holdout Method, K-Fold, Stratified K-Fold, Leave-One-Out
Cross Validation. Bias-Variance tradeoff, Regularization,
Overfitting, Underfitting.
Model Validation in Classification
• What is model validation?
• Model validation means checking whether the designed Machine learning
model is capable of correctly classifying/performing well on unseen data.
• Validating also means checking that correct classifications are not happening merely by chance!
• Why model validation is required?
• To see how it will perform on unseen data.
• To confirm whether the model will perform equally well outside the
laboratory datasets.
• How model validation is done?
• There are various ways of validating a model, among which the most famous
techniques are Train/Test split and Cross Validation.
➢ Experimental evaluation of machine learning algorithms
➢ Importance of evaluation
➢ We need to evaluate the performance of the trained model
➢ To predict the class labels of the data points

➢ Performance evaluation is done using


➢ Error
➢ Accuracy
➢ Precision/ Recall

➢ Sampling methods
➢ Training and test sets: disjoint sets are preferred
➢ K-fold cross validation: splitting data into different training sets to tune parameters of the algorithm

How model validation is done?

Some of the popular model evaluation techniques are:

1. Hold Out method
   i. Train/Test split
   ii. Train/Validation/Test split
2. Cross validation
   i. K-Fold Cross-Validation
   ii. Stratified K-Fold Cross-Validation
   iii. Leave One Out Cross-Validation
1. i. Train/Test split
• The original dataset is split into a Train/Training set and a Test set.
• The dataset can be divided 70-30, 60-40, 75-25, 80-20, or even 50-50 depending on the
application at hand. As a rule, the proportion of training data has to be larger than the test data.
• On the Train set, the machine learning model is built.
• The built ML model can be tested on both the Train data (already seen/known by the model)
and the Test set (the unseen/unknown data for the model).
• The model will give some accuracy value on the Train set and some other accuracy value on
the Test set.
• If the accuracy on the Train set and the Test set are similar or close, then the model is said to
be a well trained model.
• If the accuracy on the Test set is less than the accuracy on the Train set, then the model is said
to be not a well trained model. This situation is called overfitting.

Fig: Dataset split into Training Set and Test Set
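A minimal sketch of the hold-out method with an 80-20 split using scikit-learn; the kNN classifier is only an illustrative choice:

```python
# Hold-out (train/test) validation: compare accuracy on seen vs unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # 80-20 split

model = KNeighborsClassifier().fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))   # close values suggest a well trained model
```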
1. ii. Train/Validation/Test split
• The original dataset is split into a Train set, a Validation set and a Test set.
• The dataset can be divided 70-20-10, 60-20-20, or 80-10-10, depending on the application
at hand.
• On the Train set, suppose ‘k’ machine learning models are trained (for example, k=6:
Logistic regression, ID3, kNN, SVM, naïve Bayes etc.)
• Now, the question is, out of all these ‘k’ models, which one is the best performing model.
We find it out by running the ‘k’ models on the Validation set.
• After finding the best performing model on the Validation set, the model is again tested on
the Test set as an added layer of check, to make sure that the accuracy value obtained by the
best performing model on the Validation set is maintained on the Test set as well.
• Once the best performing model maintains a similar accuracy value on the Validation set
and the Test set, the model is said to be a well trained model.

Fig: Dataset split into Training Set, Validation Set and Test Set
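A sketch of a 60-20-20 Train/Validation/Test split done with two successive train_test_split calls; the two candidate models are illustrative choices:

```python
# Train k candidate models, pick the best on the validation set, then
# confirm its accuracy on the held-out test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

models = {"logistic regression": LogisticRegression(max_iter=1000),
          "decision tree": DecisionTreeClassifier(random_state=0)}
for name, m in models.items():
    m.fit(X_train, y_train)
    print(name, "validation accuracy:", m.score(X_val, y_val))

best = max(models.values(), key=lambda m: m.score(X_val, y_val))
print("test accuracy of best model:", best.score(X_test, y_test))
```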
Cross Validation
• Cross-validation is another model validation technique popularly used in
Machine Learning. The steps are as follows:
1. The dataset is randomly split up into ‘n’ sets of equal size.
2. One of the sets is used as the test set and the rest are used as the training set.
3. The model is trained on the training set and tested on the test set.
4. Then the process is repeated for ‘n’ iterations until each unique set has been
used as the test set.

Fig: Dataset split into Group no. 1, Group no. 2, …, Group no. n
2. i. K-fold Cross Validation
• When the value of ‘n’ is ‘k’ it is called K-fold Cross Validation.
• K-fold Cross-validation has the following steps:
1. The dataset is randomly split up into ‘k’ sets of equal size. It uses a random
sampling method to split the data.
2. In the first iteration, the 1st set is selected as the test set and the model is
trained on the remaining k-1 sets. The accuracy value is noted down.
3. In the second iteration, the 2nd set is selected as the test set and the remaining
k-1 sets are used to train the model, and the accuracy value is noted down.
4. Then the process is repeated for ‘k’ iterations until each unique set has been
used as the test set.
5. Finally, the average of all the accuracy values over the ‘k’ iterations is taken as
the final accuracy value:

final accuracy = (1/k) Σᵢ accuracyᵢ  (sum over i = 1 … k)

Fig.: Working of K-fold Cross Validation (Courtesy: Internet)
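A sketch of the same procedure with scikit-learn's KFold; the final accuracy is the mean of the per-fold accuracies, as in the formula above:

```python
# 5-fold cross validation: each of the 5 sets serves once as the test set.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
accuracies = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
    accuracies.append(model.score(X[test_idx], y[test_idx]))
print("final accuracy:", np.mean(accuracies))   # average over the k folds
```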
2. ii. Stratified K-Fold Cross-Validation
• This is a slight variation of K-Fold Cross Validation, which uses ‘stratified
sampling’ instead of ‘random sampling.’
• Stratified K-fold Cross-validation has the following steps:
1. The dataset is split up into ‘k’ sets of equal size. It uses the stratified sampling
method to split the data. By using stratified sampling, each split set has an equal
proportion of sample contributions from each class of the dataset.
2. In the first iteration, the 1st set is selected as the test set and the model is
trained on the remaining k-1 sets. The accuracy value is noted down.
3. In the second iteration, the 2nd set is selected as the test set and the remaining
k-1 sets are used to train the model, and the accuracy value is noted down.
4. Then the process is repeated for ‘k’ iterations until each unique set has been
used as the test set.
5. Finally, the average of all the accuracy values over the ‘k’ iterations is taken as
the final accuracy value:

final accuracy = (1/k) Σᵢ accuracyᵢ  (sum over i = 1 … k)

Fig: Dataset split into Group no. 1, …, Group no. k, each containing the same class
proportions (e.g. versicolor and virginica)
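The same loop written with StratifiedKFold, so that every fold preserves the class proportions of the full dataset; note that the labels y must be passed to split for stratification:

```python
# Stratified 5-fold cross validation: folds keep the class proportions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accuracies = []
for train_idx, test_idx in skf.split(X, y):   # y is required for stratified splitting
    model = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
    accuracies.append(model.score(X[test_idx], y[test_idx]))
print("final accuracy:", np.mean(accuracies))
```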


Why we need Stratified K-Fold Cross-Validation
• Suppose your data contains reviews for a cosmetic product used by both the male and female population.
• When we perform random sampling to split the data into train and test sets, there is a possibility that most of
the data representing males is not represented in training data but might end up in test data.
• When we train the model on sample training data that is not a correct representation of the actual population,
the model will not predict the test data with good accuracy.
• This is where Stratified Sampling comes to the rescue. Here the data is split in such a way that it represents
all the classes from the population.

• Let’s consider an example with cosmetic product reviews from 1000 customers, of
which 60% are male and 40% are female, as shown in the fig. The male population is
larger than the female population.
• We want to split the data into train and test data in the proportion 80:20. 80% of 1000
customers is 800, chosen in such a way that there are 480 reviews associated with the
male population and 320 representing the female population.
• In a similar fashion, 20% of 1000 customers will be chosen for the test data (with the
same male and female representation).
2. iii. Leave One Out Cross-Validation
In this method, we divide the data into train and test
sets, but with a twist.
1. Instead of dividing the data into two subsets, we
select a single sample as test data; everything else is
labeled as training data and the model is trained. The
accuracy value is noted down.
2. Now the next sample is selected as test data, the
model is trained on the remaining data, and the
accuracy value is noted down.
3. Then the process is repeated for ‘n’ iterations
until each sample has been used as the test data.
4. Finally, the average of all the accuracy values over
the ‘n’ iterations is taken as the final accuracy value:

final accuracy = (1/n) Σᵢ accuracyᵢ  (sum over i = 1 … n)
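A sketch of leave-one-out cross validation via scikit-learn; it is equivalent to K-fold with k = n:

```python
# Leave One Out: n iterations, one sample held out as the test set each time.
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=LeaveOneOut())
print("final accuracy:", scores.mean())   # mean of n single-sample accuracies
```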


Bias-Variance trade-off
• What is a trade-off?
• giving up of one thing in return for another
• What is Bias or bias error?
• Bias is a value that represents assumptions taken while designing a machine learning
model.
• Low Bias: A low bias model will make fewer assumptions about the form of the target function.
• High Bias: A model with a high bias makes more assumptions, and the model becomes unable to
capture the important features of our dataset.
• For some ML models, we must make some assumptions (limiting the hypothesis space,
limiting the range of values for some parameter, imposing an ordering etc.) to proceed and
reach a better conclusion. But the amount of bias should never be very high.
• A high bias value will lead to underfitting of the ML model.
• Some examples of ML algorithms with low bias are Decision Trees, KNN and SVM.
Algorithms with high bias are Linear Regression, Logistic Regression and Linear
Discriminant Analysis.
Bias-Variance trade-off
• What is Variance or variance error?
• Variance is a value that represents the variability in the model prediction like how much the ML target
function can adjust depending on the changes in the given data set.
• Low variance means there is a small variation in the prediction of the target function with changes in the training data set.
• High variance shows a large variation in the prediction of the target function with changes in the training dataset.
• If we increase or decrease the percentage of training data, or change the number of features in
the training data, the resulting variability in the model prediction is the variance. The variance should always be
low.
• High variance value leads to overfitting the ML model.
• Some examples of ML algorithms with low variance are Linear Regression, Logistic Regression, and Linear
Discriminant Analysis. Algorithms with high variance are Decision Tree, SVM and KNN.
• What is Bias-Variance trade-off?
• Bias-Variance trade-off is the conflict in trying to simultaneously minimize these two sources of errors to get
better classification accuracy in a ML model.
• While building the machine learning model, it is really important to take care of bias and variance in order to
avoid overfitting and underfitting in the model.
• If the model is very simple with fewer parameters, it may have low variance and high bias. Whereas, if the
model has a large number of parameters, it will have high variance and low bias.
• So, it is required to make a balance between bias and variance errors, and this balance between the bias error
and variance error is known as the Bias-Variance trade-off.
Bias-Variance trade-off
• For an accurate prediction of the
model, algorithms need a low
variance and low bias. But this
is not possible because bias and
variance are related to each
other:
• If we decrease the variance, it
will increase the bias.
• If we decrease the bias, it will
increase the variance.
Underfitting and Overfitting
Overfitting and Underfitting are the two main
problems that occur in machine learning and
degrade the performance of the machine learning
models.

➢ Underfitting:
➢ Model is too simple to represent all the
relevant class characteristics
➢ High training error, high test error
➢ Low variance, high bias

➢ Overfitting:
➢ Model is too complex and fits irrelevant
characteristics (noise) in the data
➢ Low training error, high test error
➢ High variance, low bias

Fig: Underfitting vs Robust (Appropriate) fitting vs Overfitting (Courtesy: Internet)
Underfitting
• Underfitting occurs when the
ML model is not able to capture
the underlying trend of the data.
• In the case of underfitting, the
model is not able to learn
enough from the training data,
and hence it reduces the
accuracy and produces
unreliable predictions.
• An underfitted model has high
bias and low variance.
Overfitting
• Overfitting occurs when the ML model
tries to cover all the data points or more
than the required data points present in the
given dataset.
• Because of this, the model starts caching
noise and inaccurate values present in the
dataset, and all these factors reduce the
efficiency and accuracy of the model.
• The overfitted model has low
bias and high variance.
• The chances of overfitting increase the
more we train our model: the more
training we do, the higher the chance of
ending up with an overfitted model.
Overfitting in Linear Regression
• x1 = size of house
• x2 = no. of bedrooms
• x3 = no. of floors
• x4 = age of house
• x5 = average income in neighborhood
• x6 = kitchen size
• ⋮
• x100

(Fig: house price in $1000’s plotted against size in feet²)

hθ(x) = g(θ0 + θ1·x1 + θ2·x2 + θ3·x1² + θ4·x2² + θ5·x1·x2 + θ6·x1³·x2 + θ7·x1·x2³ + …)

If we have too many features (i.e. a complex model), the learned hypothesis may fit the
training set very well,

J(θ) = (1/2m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² ≈ 0  (sum over the m training examples)

but fail to generalize to new examples
(predict prices on new examples).
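A sketch of this effect on synthetic data (the data, noise level and polynomial degrees are made up for illustration): as the model grows more complex, the training error approaches zero while the test error worsens.

```python
# Fit polynomials of increasing degree to noisy data and compare
# training error with test error to expose overfitting.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, 60).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)   # noisy target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):   # high bias -> balanced -> high variance
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree,
          "train MSE:", mean_squared_error(y_tr, model.predict(X_tr)),
          "test MSE:", mean_squared_error(y_te, model.predict(X_te)))
```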
Overfitting in Logistic Regression

(Fig: three decision boundaries plotted over Age vs Tumor Size)

Underfitting: hθ(x) = g(θ0 + θ1·x1 + θ2·x2)
Appropriate fit: hθ(x) = g(θ0 + θ1·x1 + θ2·x2 + θ3·x1² + θ4·x2² + θ5·x1·x2)
Overfitting: hθ(x) = g(θ0 + θ1·x1 + θ2·x2 + θ3·x1² + θ4·x2² + θ5·x1·x2 + θ6·x1³·x2 + θ7·x1·x2³ + …)
Solution for overfitting
• How do we deal with this?
1) Reduce number of features
• Manually select which features to keep
• But, in reducing the number of features we lose some information
• Ideally select those features which minimize data loss, but even so, some
info is lost
2) Regularization
• Keep all features, but reduce magnitude of parameters θ
• Works well when we have a lot of features, each of which contributes a bit to
predicting y
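A sketch of option 2 using ridge regression, which keeps all the polynomial features but shrinks the magnitudes of the parameters θ; the data and the alpha value are illustrative:

```python
# Regularization demo: ridge shrinks coefficient magnitudes relative to
# an unregularized least-squares fit with the same 15 polynomial features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)

plain = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X, y)
ridge = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0)).fit(X, y)
print("max |theta| without regularization:", np.abs(plain[-1].coef_).max())
print("max |theta| with ridge:", np.abs(ridge[-1].coef_).max())
```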
Module I: Introduction to Machine Learning
• 1.3 Performance Measures: Measuring Quality of model- Confusion
Matrix, Accuracy, Recall, Precision, Specificity, F1 Score, RMSE

Classification metrics for Performance
measure
➢Classification metrics are used to evaluate the performance of the data
mining/ machine learning algorithms

➢Experimental evaluation of algorithms


➢ Importance of evaluation
➢ We need to evaluate the performance of the trained model
➢ To predict the class labels of the data points

➢Performance evaluation is done using


➢ Accuracy
➢ Error
➢ Precision
➢ Recall
Classification metrics
➢ Experimental Evaluation on predicted values
➢ To make prediction of target feature (or class label) of x
➢ Say, y is the observed value (from the dataset) of the target feature of x
➢ Say, y′ (or hθ(x)) is the predicted value of the target feature of x
➢ What is the error between the observed and predicted values?

➢ Few of the different types of errors:


➢ Absolute error : in case of regression
➢ Sum of squares error: in case of regression
➢ Number of misclassification: in case of classification
➢ Confusion matrix: in case of classification
➢ Accuracy, Precision, Recall
1. Mean Absolute Error (MAE)

MAE = (1/N) Σᵢ |Yᵢ − Ŷᵢ|

• The actual value (Yᵢ)
• Predicted value (Ŷᵢ)
• N is the total number of samples
• MAE represents the difference between the original and predicted
values, obtained by averaging the absolute differences over the data
set.
2. Mean Squared Error (MSE)

MSE = (1/N) Σᵢ (Yᵢ − Ŷᵢ)²

• The actual value (Yᵢ)
• Predicted value (Ŷᵢ)
• N is the total number of samples
• MSE represents the difference between the original and predicted
values, obtained by averaging the squared differences over the data set.
3. Root Mean Squared Error (RMSE)

RMSE = √MSE = √( (1/N) Σᵢ (Yᵢ − Ŷᵢ)² )

• The actual value (Yᵢ)
• Predicted value (Ŷᵢ)
• N is the total number of samples
• RMSE is the square root of MSE.
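A sketch computing MAE, MSE and RMSE from the formulas above on a made-up set of actual and predicted values:

```python
# Compute MAE, MSE and RMSE directly from their definitions.
import numpy as np

y_actual = np.array([3.0, -0.5, 2.0, 7.0])   # Yi (made-up values)
y_pred = np.array([2.5, 0.0, 2.0, 8.0])      # Y-hat

mae = np.mean(np.abs(y_actual - y_pred))     # 0.5
mse = np.mean((y_actual - y_pred) ** 2)      # 0.375
rmse = np.sqrt(mse)                          # ~0.612
print(mae, mse, rmse)
```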
Confusion matrix
• For a 2-class problem:

                    Predicted Positive       Predicted Negative
Actual Positive     TP (True Positive)       FN (False Negative)
Actual Negative     FP (False Positive)      TN (True Negative)

Specificity = TN / (TN + FP)
Definitions
• Confusion matrix: A Confusion matrix is an N x N matrix used for evaluating
the performance of a classification model, where N is the number of target
classes. The matrix compares the actual target values with those predicted by the
machine learning model.
• Accuracy: Accuracy simply measures how often the classifier makes the correct
prediction. It’s the ratio between the number of correct predictions and the total
number of predictions.
• Precision: It is a measure of correctness achieved in positive prediction. In
simple words, it tells us how many of all the predicted positives are actually
positive.
• Recall: It is a measure of actual observations which are predicted correctly, i.e.
how many observations of positive class are actually predicted as positive. It is
also known as Sensitivity or TPR (True Positive Rate).
• Specificity: Specificity is the measure of a test's ability to correctly identify true
negatives (i.e., correctly excluding cases where the condition is absent). Also
known as TNR (True Negative Rate).
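A sketch computing these measures from the four entries of a 2-class confusion matrix; the counts are made up:

```python
# Derive accuracy, precision, recall, specificity and F1 score from
# hypothetical confusion-matrix counts.
TP, FN, FP, TN = 40, 10, 5, 45

accuracy = (TP + TN) / (TP + TN + FP + FN)       # 0.85
precision = TP / (TP + FP)                       # ~0.889
recall = TP / (TP + FN)                          # 0.8  (sensitivity / TPR)
specificity = TN / (TN + FP)                     # 0.9  (TNR)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, specificity, f1)
```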
Confusion matrix in case of a 3-class problem
Numerical on example dataset
• A numerical example on a random dataset (dog vs cat) was discussed in class.
• The numerical example worked out on the board is important; practice it.
Q and A
THANK YOU

