Unit 3: Evaluating Models
What is evaluation?
Model evaluation is the process of assessing an AI model with different evaluation metrics to understand how well the machine learning model performs.
An AI model becomes better with constructive feedback from evaluation.
Splitting the dataset into training and testing sets:
After exploring the data in the Data Exploration stage, the data is divided into a training set and a testing set.
The train-test split is a technique for evaluating the performance of a machine
learning algorithm.
The training set can be used to teach any supervised learning algorithm; the testing set is used to evaluate the trained model.
Why do we need a train-test split?
The train dataset is used to make the model learn; after that, the test dataset is provided as input to the trained model. The model makes predictions, and the predicted values are compared to the expected values (reality). The objective is to estimate the performance of the machine learning model on new data: data not used to train the model.
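In code, the split is usually one function call. Below is a minimal sketch, assuming the scikit-learn library is available; the tiny feature and label arrays are invented for illustration.

```python
# A minimal sketch of a train-test split using scikit-learn (an assumed
# dependency); the feature and label values here are made up.
from sklearn.model_selection import train_test_split

X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]  # features (hypothetical)
y = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]                       # labels (hypothetical)

# Hold back 20% of the rows for testing; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 train rows, 2 test rows
```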
Accuracy and Error
In AI model evaluation, accuracy and error are key metrics that help us understand how well a model performs and identify areas for improvement. Higher accuracy means a better model, while lower error means fewer mistakes.
Accuracy – Accuracy is the ratio of right predictions to the total predictions made by the AI model. Accuracy and performance are directly proportional: the better the model performs, the more accurate its predictions.
Error – Error is the difference between the value the model predicts and the actual value. Based on the error, we choose the machine learning model that performs best for a particular dataset.
How to find the accuracy of an AI model
To find the accuracy of an AI model, we first calculate the proportion of correct predictions the model makes on the testing dataset. The formulas are:
Total Predictions = Correct Predictions + Wrong Predictions
Correct Prediction: Prediction == Reality
Wrong Prediction: Prediction != Reality
Error = Reality – Prediction
Error Rate = |Error| / Reality
Accuracy = 1 – Error Rate
Accuracy in percentage = Accuracy × 100
Calculate the accuracy of the House Price prediction AI model
| Predicted House Price (USD) | Actual House Price (USD) | Error (Actual – Predicted) | Abs Error Rate (Error / Actual) | Accuracy (1 – Error Rate) | Accuracy % (Accuracy × 100) |
|---|---|---|---|---|---|
| 65 lakh | 75 lakh | 75 lakh – 65 lakh = 10 lakh | 10/75 = 0.1333 | 1 – 0.1333 = 0.8667 | 0.8667 × 100 = 86.67% |
Another example. Given values:
Predicted House Price = 48 lakhs
Actual House Price = 50 lakhs
Step 1: Calculate Absolute Error
Error: 50 - 48 = 2 lakh
Step 2: Calculate Error Rate
Error Rate: 2 / 50 = 0.04
Step 3: Calculate Accuracy
Accuracy: 1 – 0.04 = 0.96
Step 4: Convert to Percentage
Accuracy in percentage: 0.96 × 100 = 96%
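The same four steps can be written as a short function. This is a minimal sketch mirroring the calculation above; the helper name is our own.

```python
# Sketch: accuracy from absolute error, mirroring the steps above.
def accuracy_from_error(actual, predicted):
    error = abs(actual - predicted)  # Step 1: absolute error
    error_rate = error / actual      # Step 2: error rate
    return 1 - error_rate            # Step 3: accuracy as a fraction

acc = accuracy_from_error(actual=50, predicted=48)
print(f"{acc * 100:.0f}%")           # Step 4: prints 96%
```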
Evaluation metrics for classification
What is Classification?
In artificial intelligence, classification is a technique that organizes data into categories. It is a type of machine learning that uses algorithms to sort data into predefined classes. Imagine you go to a supermarket and are given two trolleys: in one, you have to place the fruits and vegetables; in the other, you must put grocery items like bread, oil, eggs, etc. You are essentially classifying the items of the supermarket into two classes:
fruits and vegetables
grocery
Classification metrics
Classification metrics are used to evaluate the performance of a classification model in machine learning; in other words, they are performance measures that quantify how effective the model is. They help us compare different models and identify the best one.
(Figure: different types of classification techniques in AI.)
Popular metrics used for classification model
Confusion matrix
Classification accuracy
Precision
Recall
F1 Score
1. What is confusion matrix?
The confusion matrix is a handy presentation of the accuracy of a model with two or more classes. The comparison between prediction and reality can be recorded in what we call the confusion matrix, which allows us to understand the prediction results.
It consists of four values:
True Positive (TP): Correctly predicted positive cases.
False Negative (FN): Model predicted negative, but it was actually
positive.
False Positive (FP): Model predicted positive, but it was actually
negative.
True Negative (TN): Correctly predicted negative cases.
Prediction and reality can be easily mapped together with the help of this confusion matrix.
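As a quick illustration, here is a sketch of producing a confusion matrix in code, assuming scikit-learn is available; the five "yes"/"no" labels are invented.

```python
# Sketch: a confusion matrix with scikit-learn; the labels are hypothetical.
from sklearn.metrics import confusion_matrix

reality    = ["yes", "no", "yes", "no", "yes"]   # actual values
prediction = ["yes", "yes", "no", "no", "yes"]   # model outputs

# Rows are reality, columns are predictions; putting "yes" first in
# labels= makes the layout [[TP, FN], [FP, TN]].
cm = confusion_matrix(reality, prediction, labels=["yes", "no"])
print(cm)  # [[2 1]
           #  [1 1]]
```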
2. Classification accuracy
Classification accuracy allows you to count the total number of accurate
predictions made by a model.
The accuracy calculation is as follows: accuracy determines how many of the model's predictions were correct, considering both True Positives and True Negatives:
Accuracy = (TP + TN) / Total observations
Here, total observations cover all the possible cases of prediction that can be
True Positive (TP), True Negative (TN), False Positive (FP) and False Negative
(FN).
3. Precision
Precision is defined as the percentage of true positive cases out of all the cases where the prediction is positive. That is, it takes into account the True Positives and False Positives:
Precision = TP / (TP + FP)
4. Recall
Recall can be described as the percentage of actual positive cases that are correctly detected. The scenarios where a fire actually existed in reality, whether or not the machine recognized it, are heavily considered. That is, it takes into account both False Negatives (there was a forest fire but the model didn't predict it) and True Positives (there was a forest fire in reality and the model predicted a forest fire):
Recall = TP / (TP + FN)
5. F1 Score
The F1 score can be defined as the measure of balance between precision and recall; it provides a way to combine precision and recall into a single measure that captures both properties:
F1 Score = 2 × Precision × Recall / (Precision + Recall)
Let us explore the variations we can have in the F1 score: it ranges from 0 to 1, is 1 only when both precision and recall are perfect, and falls toward 0 when either of them is low.
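Reduced to code, these metrics are a few lines of arithmetic. Below is a minimal sketch; the function names and the example counts are our own illustration, not from the text.

```python
# Sketch: the classification metrics computed from confusion-matrix counts.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)  # correct predictions / total observations

def precision(tp, fp):
    return tp / (tp + fp)       # of all positive predictions, how many were right

def recall(tp, fn):
    return tp / (tp + fn)       # of all actual positives, how many were found

def f1_score(p, r):
    return 2 * p * r / (p + r)  # harmonic-mean balance of precision and recall

# Hypothetical counts for illustration:
p, r = precision(tp=90, fp=10), recall(tp=90, fn=30)
print(accuracy(90, 70, 10, 30), p, r, round(f1_score(p, r), 2))  # 0.8 0.9 0.75 0.82
```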
How to draw the confusion matrix
Let's look at a question: draw the confusion matrix for the following data.
1. the number of true positives = 100
2. the number of true negatives = 47
3. the number of false positives = 62
4. the number of false negatives = 290
Placing each count in its cell gives:

| | Predicted: Yes | Predicted: No |
|---|---|---|
| Actual: Yes | TP = 100 | FN = 290 |
| Actual: No | FP = 62 | TN = 47 |
Build the confusion matrix from scratch
Let's assume we are predicting the presence of a disease; "yes" means the person has the disease, and "no" means they don't. So the AI model's output is Yes or No. Suppose the test data gave the following ten (Actual, Predicted) pairs:

| Actual | Predicted |
|---|---|
| Yes | Yes |
| No | No |
| Yes | No |
| No | Yes |
| Yes | Yes |
| Yes | No |
| No | No |
| Yes | Yes |
| No | Yes |
| Yes | Yes |

Now, count each type of prediction:
TP (Yes, Yes) = 4
FN (Yes, No) = 2
FP (No, Yes) = 2
TN (No, No) = 2
These four counts form the confusion matrix for the table above. Now let's find the accuracy:
Accuracy = (TP + TN) / Total Predictions = (4 + 2) / 10 = 6 / 10 = 0.6
The model correctly predicted 6 out of 10 cases, meaning the accuracy is 60%.
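The counting above can be reproduced in a few lines. A minimal sketch over the same ten (actual, predicted) pairs:

```python
# Sketch: count TP/FN/FP/TN from the ten (actual, predicted) pairs above.
pairs = [("Yes", "Yes"), ("No", "No"), ("Yes", "No"), ("No", "Yes"),
         ("Yes", "Yes"), ("Yes", "No"), ("No", "No"), ("Yes", "Yes"),
         ("No", "Yes"), ("Yes", "Yes")]

tp = sum(a == "Yes" and p == "Yes" for a, p in pairs)  # 4
fn = sum(a == "Yes" and p == "No" for a, p in pairs)   # 2
fp = sum(a == "No" and p == "Yes" for a, p in pairs)   # 2
tn = sum(a == "No" and p == "No" for a, p in pairs)    # 2

print((tp + tn) / len(pairs))  # 0.6, i.e. 60% accuracy
```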
Can we use Accuracy all the time?
Accuracy alone is only suitable when there is an equal number of observations in each class, i.e., a balanced dataset (which is rarely the case), and when all predictions and prediction errors are equally important, which is often not the case.
Classification Accuracy Calculation
Let's assume you are testing your model on 1000 total test records, of which the actual values are 900 Yes and only 100 No (an unbalanced dataset). Let's assume that you have built a faulty model which, irrespective of any input, will give a prediction of Yes.
True Positives (TP) = 900
False Negatives (FN) = 0
False Positives (FP) = 100
True Negatives (TN) = 0
Now, applying the formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
= (900 + 0) / (900 + 0 + 100 + 0)
= 900 / 1000
= 0.9
Accuracy = 0.9 x 100 = 90%
The model appears 90% accurate, but this is misleading because it never predicts "No." We should use precision, recall, and F1 score to get a better evaluation.
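To see this failure mode in code, here is a small sketch with the same made-up 900/100 split and an "always Yes" model:

```python
# Sketch: an always-Yes model scores 90% accuracy on an unbalanced test set.
actual = ["Yes"] * 900 + ["No"] * 100
predicted = ["Yes"] * 1000  # faulty model: ignores the input entirely

correct = sum(a == p for a, p in zip(actual, predicted))
print(correct / len(actual))  # 0.9 -- looks good, yet TN = 0 and it never says No
```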
Evaluating the classifier model using precision, recall and F1 score
Question 1: An AI model made the following sales prediction for a new mobile phone which was recently launched (from the model's confusion matrix: TP = 900, FP = 0, FN = 100):
Identify the total number of wrong predictions made by the model.
Calculate precision, recall and F1 Score.
Answer:
(i) The total number of wrong predictions made by the model is the sum of
false positive and false negative.
FP+FN=0+100= 100
(ii) Before calculating, we will first see the formulas for precision, recall, and
F1 score.
Precision = TP / (TP + FP) = 900 / (900 + 0) = 1.0
Recall = TP / (TP + FN) = 900 / (900 + 100) = 900 / 1000 = 0.9
F1 Score = 2 × Precision × Recall / (Precision + Recall) = 2 × 1.0 × 0.9 / (1.0 + 0.9) = 1.8 / 1.9 ≈ 0.947
Previously, plain accuracy said the model was 90% accurate; the F1 score of 0.947 (94.7%) combines precision and recall into a single, more informative measure of the model's performance.
Question 2: An AI model made the following sales prediction for a new mobile phone which was recently launched (from the model's confusion matrix: TP = 50, FP = 40, FN = 12):
Identify the total number of wrong predictions made by the model.
Calculate precision, recall and F1 Score.
Answer:
(i) The total number of wrong predictions made by the model is the sum of
false positive and false negative.
FP+FN=40+12= 52
(ii) Before calculating, we will first see the formulas for precision, recall, and
F1 score.
Precision = TP / (TP + FP) = 50 / (50 + 40) = 50 / 90 ≈ 0.55
Recall = TP / (TP + FN) = 50 / (50 + 12) = 50 / 62 ≈ 0.81
F1 Score = 2 × Precision × Recall / (Precision + Recall) = 2 × 0.55 × 0.81 / (0.55 + 0.81) = 0.891 / 1.36 ≈ 0.65
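A quick check of the Question 2 arithmetic, using the counts from the question (TP = 50, FP = 40, FN = 12):

```python
# Sketch: verify the Question 2 metrics straight from the counts.
tp, fp, fn = 50, 40, 12
p = tp / (tp + fp)          # 0.5556
r = tp / (tp + fn)          # 0.8065
f1 = 2 * p * r / (p + r)    # 0.6579, i.e. about 65.8%
print(round(p, 2), round(r, 2), round(f1, 3))
```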
Which metric is appropriate to evaluate the AI model?
Let's compare which metric is most appropriate for evaluating this model.
| Metric | Use Case | When to Choose It | Suitability for This Model |
|---|---|---|---|
| Accuracy | General performance | When classes are balanced | Not suitable |
| Precision | Minimize false positives | Spam detection, fraud detection | Suitable |
| Recall | Minimize false negatives | Medical diagnosis, safety alerts | Suitable |
| F1-Score | Balance in precision & recall | When both FP & FN are important | Best choice for overall performance |
The F1 score (about 65.8%) is the most appropriate evaluation metric for this AI model.
Ethical concerns around model evaluation
Ethical concerns around model evaluation primarily focus on three aspects:
bias, transparency, and accuracy. Nowadays, we are moving from the information era to the artificial intelligence era: we no longer build solutions from raw data or information alone, but from the intelligence derived from that data. We need to keep ethical practices in mind while developing solutions using AI. Let us understand some of these ethical concerns in detail.
Bias – Bias occurs when a model generates unfair or discriminatory results. This can happen because the training data or the algorithm favours certain groups. For example, if Amazon's recommendation system favoured male customers only, most product suggestions would be shown only to them, which would reduce the company's profit.
Transparency – The AI decision-making process should be transparent, so that people can easily understand and interpret the results. If transparency is lacking, people will not trust the model. For example, if a person applies for a loan and the AI model denies the application, the applicant should be told why the loan application was rejected.
Accuracy – The AI model should predict correct results. An accurate model produces error-free and reliable outputs. For example, in medicine, an AI model should generate accurate diagnoses; otherwise, a wrong diagnosis can lead to serious harm to patients.