EVALUATION - Class 10
Artificial Intelligence (417)
1. What is Evaluation?
Ans : Evaluation is the process of understanding the reliability of an AI model
by feeding a test dataset into the model and comparing its outputs with the
actual answers. Its purpose is to make judgments about a program, to improve
its effectiveness, and/or to inform programming decisions.
2. Why is Evaluation important? Explain.
Ans : Evaluation is a process that critically examines a program by collecting
and analyzing information about a program’s activities, characteristics and
outcomes. The advantages of Evaluation are as follows :
i. Evaluation ensures that the model is operating correctly and
optimally.
ii. Evaluation is an initiative to understand how well the model achieves its goals.
iii. Evaluation helps to determine what works well and what could be
improved in a program.
3. What is meant by Overfitting of Data?
Ans : Overfitting is "the production of an analysis that corresponds too closely or
exactly to a particular set of data, and may therefore fail to fit additional data
or predict future observations reliably".
OR
Models that are tested on the same dataset they were trained on will always
give correct outputs. This is known as overfitting.
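To see overfitting in practice, here is a small illustrative Python sketch (not part of these notes; the random data, the scikit-learn usage and the variable names are our own assumptions). A decision tree that memorises random training data scores almost perfectly on that data but only around chance level on unseen test data:

# Illustrative sketch (assumed random data): a model that memorises its
# training data looks perfect on that data but fails on new data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # random features with no real pattern
y = rng.integers(0, 2, size=200)       # random Yes/No labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

print("Accuracy on training data:", model.score(X_train, y_train))   # close to 1.0
print("Accuracy on unseen test data:", model.score(X_test, y_test))  # close to 0.5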
4. What are Prediction & Reality in relation to Evaluation?
Ans : Prediction – It is the output given by the AI model using a Machine Learning
Algorithm.
Reality – It is the real scenario of the situation for which the prediction
has been made.
5. Differentiate between Prediction and Reality.
Ans :
The prediction is the output given by the machine (the AI model), while the reality
is the real scenario for which the prediction has been made. A prediction may or
may not match the reality, so the two terms cannot be used interchangeably;
comparing the prediction with the reality is the basis of evaluation.
6. Terminologies of Model Evaluation
The Scenario
Let’s imagine that we have an AI-based prediction model which has been
deployed to identify a Football or a soccer ball.
Now, the objective of the model is to predict whether the given/shown figure
is a football. Now, to understand the efficiency of this model, we need to
check whether the predictions it makes are correct or not. So we need to
compare the Prediction with the Reality.
Case 1 :
a) Prediction = YES
b) Reality = YES
The predicted value matches the actual value.
Here, the Prediction is positive and matches Reality. Hence, this
condition is termed as True Positive.
Case 2 :
a) Prediction = No
b) Reality = No
The predicted value matches the actual value.
Here, the Prediction is negative and matches Reality. Hence, this
condition is termed as True Negative.
Case 3 :
a) Prediction = Yes
b) Reality = No
The predicted value does not match the actual value.
Here, the Prediction is positive and does not match Reality. Hence, this
condition is termed as False Positive.
This is also known as Type 1 Error.
Case 4 :
a) Prediction = No
b) Reality = Yes
The predicted value does not match the actual value.
Here, the Prediction is negative and does not match Reality. Hence, this
condition is termed as False Negative.
This is also known as Type 2 Error.
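The four cases above can be counted directly by comparing a list of predictions with the corresponding realities. The following is a small illustrative Python sketch (the sample lists are assumed, not taken from these notes):

# Illustrative sketch: counting TP, TN, FP and FN for the football example.
# The prediction/reality lists below are assumed sample values.
predictions = ["Yes", "No", "Yes", "No", "Yes"]
reality     = ["Yes", "No", "No", "Yes", "Yes"]

TP = TN = FP = FN = 0
for p, r in zip(predictions, reality):
    if p == "Yes" and r == "Yes":
        TP += 1        # True Positive
    elif p == "No" and r == "No":
        TN += 1        # True Negative
    elif p == "Yes" and r == "No":
        FP += 1        # False Positive (Type 1 Error)
    else:
        FN += 1        # False Negative (Type 2 Error)

print("TP:", TP, "TN:", TN, "FP:", FP, "FN:", FN)    # TP: 2 TN: 1 FP: 1 FN: 1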
7. What is Confusion Matrix?
Ans : A Confusion Matrix is a tabular structure which helps in measuring the
performance of an AI model using the test data. The result of the comparison
between the prediction and the reality is recorded in the confusion matrix.
It is a record that helps in evaluation.
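If scikit-learn is available, the same counts can be arranged into the 2 x 2 matrix automatically. This is only an illustrative sketch with assumed sample data; the notes themselves do not prescribe any particular library:

# Illustrative sketch: building a confusion matrix with scikit-learn.
from sklearn.metrics import confusion_matrix

reality     = ["Yes", "No", "No", "Yes", "Yes"]     # actual values (assumed)
predictions = ["Yes", "No", "Yes", "No", "Yes"]     # model outputs (assumed)

# labels=["Yes", "No"] puts the positive class first:
# rows correspond to reality, columns to the prediction.
cm = confusion_matrix(reality, predictions, labels=["Yes", "No"])
print(cm)
# [[2 1]   -> TP = 2, FN = 1
#  [1 1]]  -> FP = 1, TN = 1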
8. Parameters to Evaluate a Model
Ans : The parameters used to evaluate an AI model are Accuracy, Precision,
Recall and F1 Score. Each of these is explained below.
9. What is Accuracy? Mention its formula.
Ans : Accuracy is defined as the percentage of correct predictions out of all
the observations. A prediction is said to be correct if it matches reality.
Here we have two conditions in which the Prediction matches with the
Reality, i.e., True Positive and True Negative. Therefore, Formula for
Accuracy is –
Accuracy = ((TP + TN) / (TP + TN + FP + FN)) * 100%
Where TP = True Positives, TN = True Negatives, FP = False Positives, and FN
= False Negatives.
10. What is Precision? Mention its formula.
Ans : Precision is defined as the percentage of true positive cases versus all
the cases where the prediction is true. That is, it takes into account the
True Positives and False Positives. Therefore, the formula for Precision is –
Precision = (TP / (TP + FP)) * 100%
11. What is Recall? Mention its formula.
Ans : Recall is defined as the fraction of positive cases that are correctly
identified out of all the cases that are positive in reality. That is, it takes
into account the True Positives and False Negatives. The formula for Recall is –
Recall = TP / (TP + FN)
12. How do you suggest which evaluation metric is more important for any
case ?
Ans :
The F1 score is the evaluation metric that matters most in such cases, because it
maintains a balance between the Precision and the Recall of the classifier. If the
Precision is low, the F1 score is low, and if the Recall is low, the F1 score is
again low. The F1 score is a number between 0 and 1 and is the harmonic mean of
Precision and Recall:
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
When both Precision and Recall are 1 (that is, 100%), the F1 score is also an ideal
1 (100%), which is known as the perfect value for the F1 score. As the values of
both Precision and Recall range from 0 to 1, the F1 score also ranges from 0 to 1.
A model is said to have a good performance if its F1 score is high.
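The four metrics from questions 9 to 12 can be computed together from the confusion-matrix counts. A minimal Python sketch (the function name evaluate is our own, not from these notes):

# Minimal sketch: the four evaluation metrics from the confusion-matrix counts.
def evaluate(TP, TN, FP, FN):
    accuracy  = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    f1_score  = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1_score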
13.Give an example where High Accuracy is not usable.
Ans : SCENARIO: An expensive robotic chicken crosses a very busy road a
thousand times per day. An ML model evaluates traffic patterns and
predicts when this chicken can safely cross the street with an accuracy of
99.99%.
Explanation: A 99.99% accuracy value on a very busy road strongly suggests
that the ML model is far better than chance. In some settings, however, the
cost of making even a small number of mistakes is still too high. 99.99%
accuracy means that the expensive chicken will need to be replaced, on
average, every 10 days. (The chicken might also cause extensive damage to
cars that it hits.)
14.Give an example where High Precision is not usable.
Ans : Example: predicting a mail as "Spam" or "Not Spam".
False Positive: the mail is predicted as "spam" but it is "not spam".
False Negative: the mail is predicted as "not spam" but it is "spam".
A filter can have high Precision (very few important mails wrongly marked as spam)
and still let too many spam mails through as False Negatives, which makes the
filter ineffective. Hence high Precision alone is not usable here; Recall also
has to be considered.
15. Which evaluation metric would be crucial in the following cases? Justify.
In a case like Forest Fire, a False Negative can cost us a lot and is risky
too. Imagine no alert being given even when there is a Forest Fire. The whole
forest might burn down.
Another case where a False Negative can be dangerous is Viral Outbreak.
Imagine a deadly virus has started spreading and the model which is supposed
to predict a viral outbreak does not detect it. The virus might spread widely
and infect a lot of people.
On the other hand, there can be cases in which the False Positive
condition costs us more than False Negatives. One such case is Mining.
Imagine a model telling you that there exists treasure at a point and you keep
on digging there but it turns out that it is a false alarm. Here, the False
Positive case (predicting there is a treasure but there is no treasure) can be
very costly.
Similarly, let’s consider a model that predicts whether a mail is spam or
not. If the model wrongly predicts that an important mail is spam, the user would
not look at it and might eventually lose important information. Here also the
False Positive condition (predicting the mail as spam while the mail is not spam)
has a high cost.
16. Cases of High FN Cost
Forest Fire
Viral Outbreak
Cases of High FP Cost
Spam Filtering
Mining
17. Calculate Accuracy, Precision, Recall and F1 Score for the following
Confusion Matrix on Heart Attack Risk. Also suggest which metric would be a
good evaluation parameter here and why?
Where True Positive (TP) = 50, True Negative (TN) = 20, False Positive (FP) = 20 and
False Negative (FN) = 10.
Accuracy
=((50+20) / (50+20+20+10))*100%
= (70/100) * 100%
= 0.7 * 100% = 70%
Precision:
Precision is defined as the percentage of true positive cases versus all the cases
where the prediction is true.
= (50 / (50 + 20)) * 100%
= (50/70)*100%
= 0.714 *100% = 71.4%
Recall: It is defined as the fraction of positive cases that are correctly identified.
= 50 / (50 + 10)
= 50 / 60
= 0.833
F1 Score:
F1 score is defined as the measure of balance between precision and recall.
= 2 * (0.714 * 0.833) / (0.714 + 0.833)
= 1.190 / 1.547
= 0.769
Therefore,
Accuracy = 0.7, Precision = 0.714, Recall = 0.833, F1 Score = 0.769
Here there is a trade-off within the test, but Recall is the good evaluation metric
for this case, and it is the metric that most needs to improve.
Because,
False Positive (impacts Precision): a person is predicted as high risk but does not
have a heart attack.
False Negative (impacts Recall): a person is predicted as low risk but does have a
heart attack. False Negatives miss actual heart patients, hence the Recall metric
needs more improvement.
False Negatives are more dangerous here than False Positives.
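As a quick check, the evaluate() sketch given after question 12 reproduces these figures (rounded); the same call can be reused for questions 18 and 19 by changing the four counts:

# Heart-attack example: TP = 50, TN = 20, FP = 20, FN = 10.
accuracy, precision, recall, f1_score = evaluate(50, 20, 20, 10)
print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1_score, 3))
# prints: 0.7 0.714 0.833 0.769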
18. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion
Matrix on Water Shortage in Schools: Also suggest which metric would not be
a good evaluation parameter here and why?
Where True Positive (TP) = 75, True Negative (TN) = 15, False Positive (FP) = 5 and
False Negative (FN) = 5.
Accuracy
Accuracy is defined as the percentage of correct predictions out of all the
observations
= ((75+15) / (75+15+5+5))*100%
= (90 / 100) *100%
=0.9 *100% = 90%
Precision:
Precision is defined as the percentage of true positive cases versus all the cases
where the prediction is true.
= (75 / (75+5))*100%
= (75 /80)*100%
= 0.9375 * 100% = 93.75%
Recall:
It is defined as the fraction of positive cases that are correctly identified.
= 75 / (75+5)
= 75 /80
= 0.9375
F1 Score:
F1 score is defined as the measure of balance between precision and recall.
= 2 * ((0.9375 * 0.9375) / (0.9375 + 0.9375))
= 2 * (0.8789 / 1.875)
= 2 * 0.46875 = 0.9375
Accuracy = 90%, Precision = 93.75%, Recall = 0.9375, F1 Score = 0.9375
Here Precision, Recall and the F1 Score are all the same (0.9375), because the
numbers of False Positives and False Negatives are equal; Accuracy is slightly
lower at 90%.
19. Calculate Accuracy, Precision, Recall and F1 Score for the following
Confusion Matrix on SPAM FILTERING: Also suggest which metric would not be a
good evaluation parameter here and why?
Where True Positive (TP) = 10, True Negative (TN) = 25, False Positive (FP) = 55 and
False Negative (FN) = 10.
Accuracy
Accuracy is defined as the percentage of correct predictions out of all the
observations.
= ((10 + 25) / (10 + 25 + 55 + 10)) * 100%
= (35 / 100) * 100%
= 0.35 * 100% = 35%
Precision:
Precision is defined as the percentage of true positive cases versus all the cases
where the prediction is true.
= (10 / (10 +55))*100%
= (10 /65) *100%
= 0.15 *100% = 15%
Recall:
It is defined as the fraction of positive cases that are correctly identified.
= 10 / (10 + 10)
= 10 / 20
= 0.5
F1 Score
F1 score is defined as the measure of balance between precision and recall.
= 2 * ((0.15 * 0.5) / (0.15 + 0.5))
= 2 * (0.075 / 0.65)
= 2 * 0.115
= 0.23
Accuracy = 35%, Precision = 15%, Recall = 0.5, F1 Score = 0.23
Here there is a trade-off within the test, and Precision (15%) is the metric that
is not good here and needs to improve the most.
Because,
False Positive (impacts Precision): a mail is predicted as "spam" but it is not spam.
False Negative (impacts Recall): a mail is predicted as "not spam" but it is spam.
Too many False Negatives will make the spam filter ineffective, but False
Positives may cause important mails to be missed. Hence, Precision is the more
important metric to improve.