What is Evaluation?
•As we know, we have two kinds of datasets:
 • Training Dataset
 • Testing Dataset
•Evaluation is the process of understanding the reliability of an AI model by feeding the test dataset into the model and comparing its outputs with the actual answers.
•It is not recommended to use the data we used to build the model to evaluate it, because the model will simply remember the whole training set and will therefore always predict the correct label for any point in it. This is known as overfitting.
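To make the split concrete, here is a minimal sketch of holding out a test set before evaluation. It assumes the scikit-learn library is available; the feature values and labels are hypothetical placeholders.

```python
# A minimal sketch of splitting data before evaluation, assuming
# scikit-learn is installed; X and y below are hypothetical.
from sklearn.model_selection import train_test_split

X = [[0.2], [0.5], [0.9], [0.1], [0.7], [0.4]]  # hypothetical features
y = [0, 1, 1, 0, 1, 0]                          # hypothetical labels (1 = fire)

# Hold out 25% of the data for testing; the model never sees these
# points during training, so evaluation reflects real predictive ability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```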
POSSIBLE REASONS FOR AN AI MODEL NOT BEING EFFICIENT
• Lack of Training Data
• Unauthenticated or Wrong Data
• Inefficient Coding / Wrong Algorithms
• Insufficient Testing
• Not Easy to Use
• Low Accuracy
Consider a scenario where we have an AI model that predicts the possibility of fires in a forest. The main aim of this model is to predict whether a forest fire has broken out or not. To understand whether the model is working properly, we need to check whether the predictions it makes are correct. So there are two conditions:
1. Prediction
2. Reality
Type 1 Error (False Positive): the model predicts a fire when in reality there is none.
Type 2 Error (False Negative): the model predicts no fire when in reality a fire has broken out.
Confusion Matrix
It is a comparison between prediction and reality. It helps us understand the prediction results. It is not an evaluation metric itself, but a record that helps in evaluation.
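As a small sketch, the four confusion matrix counts can be derived by comparing prediction with reality pair by pair; the two lists below are hypothetical (1 = fire, 0 = no fire).

```python
# Hypothetical reality and prediction labels (1 = fire, 0 = no fire).
reality    = [1, 0, 0, 1, 0, 1, 0, 0]
prediction = [1, 0, 1, 0, 0, 1, 0, 0]

pairs = list(zip(reality, prediction))
tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # fire predicted, fire in reality
tn = sum(1 for r, p in pairs if r == 0 and p == 0)  # no fire predicted, none in reality
fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # Type 1 error (False Positive)
fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # Type 2 error (False Negative)

print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=2, TN=4, FP=1, FN=1
```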
Evaluation Methods
The commonly used evaluation methods are as follows:
1.Accuracy
2.Precision
3.Recall
4.F1 Score
Accuracy
•The percentage of correct predictions out of all the observations is called accuracy.
•If the prediction matches with reality, then it is said to be correct.
•There are two conditions where the prediction matches with reality:
• True Positive
• True Negative
•So the formula for accuracy is:

Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%
This can return a high accuracy for an AI model even when it simply predicts "no fire" every time, because the actual cases where the fire broke out are not taken into account. Therefore there is a need to look at another parameter that takes such cases into account as well.
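A short sketch of the accuracy formula, with hypothetical counts showing how a model that always predicts "no fire" can still score high accuracy:

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions (TP + TN) out of all cases, as a percentage."""
    return (tp + tn) / (tp + tn + fp + fn) * 100

# Hypothetical imbalanced case: in 100 observations there were only 2 real
# fires, and the model predicted "no fire" every single time.
print(accuracy(tp=0, tn=98, fp=0, fn=2))  # 98.0 - yet every fire was missed
```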
Precision
•The percentage of true positive cases out of all the cases where the prediction is positive.
•It takes into account the True Positives and False Positives.
•If precision is high, it means there are more True Positive cases.
•If precision is low, it means there are more False Positive cases.
•So the formula for precision is:

Precision = TP / (TP + FP) × 100%
Let us consider a model with 100% precision, which means that whenever the machine says there is a fire, there actually is a fire (True Positive). In the same model, there can still be a rare exceptional case where there was an actual fire but the system could not detect it. This is a False Negative condition, and the precision value is not affected by it because precision does not take FN into account. Is precision then a good parameter for model performance?
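A minimal sketch of the precision formula, using hypothetical counts:

```python
def precision(tp, fp):
    """True Positives out of all positive predictions (TP + FP), as a percentage."""
    return tp / (tp + fp) * 100

# Hypothetical counts: 2 correct fire alarms, 1 false alarm.
print(precision(tp=2, fp=1))  # ~66.67 - one of every three alarms was false
```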
Recall
In the recall method, the fraction of positive cases that are correctly identified is taken into consideration. It takes into account the cases where, in reality, there was a fire, whether the machine detected it correctly or not. That is, it considers True Positives and False Negatives:

Recall = TP / (TP + FN) × 100%
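A matching sketch of the recall formula, again with hypothetical counts:

```python
def recall(tp, fn):
    """True Positives out of all actual fire cases (TP + FN), as a percentage."""
    return tp / (tp + fn) * 100

# Hypothetical counts: 2 fires detected, 1 fire missed.
print(recall(tp=2, fn=1))  # ~66.67 - one of every three real fires was missed
```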
Which Metric is Important?
Choosing between Precision and Recall depends on which type of error is costlier in a given situation, so we often need a single measure that balances the two.
F1 Score
F1 score can be defined as the measure of balance between precision and recall:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

When can we get a perfect F1 score? An ideal situation would be when both Precision and Recall have a value of 1 (that is, 100%); in that case the F1 score would also be an ideal 1 (100%), known as the perfect value for the F1 score. As the values of both Precision and Recall range from 0 to 1, the F1 score also ranges from 0 to 1.
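A small sketch of the F1 formula, showing the perfect score and how the harmonic mean is pulled toward the weaker of the two metrics; the input values are hypothetical:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both on a 0-1 scale."""
    return 2 * precision * recall / (precision + recall)

print(f1_score(precision=1.0, recall=1.0))  # 1.0 - the perfect F1 score
print(f1_score(precision=0.8, recall=0.4))  # ~0.53 - dragged toward the weaker value
```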
EVALUATION METRIC | CONSIDERATION
Accuracy          | Correct predictions (TP + TN)
Precision         | Positive predictions (TP + FP)
Recall            | True reality cases (TP + FN)
F1 Score          | Precision & Recall
Why should we avoid using the training data for evaluation?
What should be the value of F1 score if the model needs to have 100% accuracy?
Calculate Accuracy, Precision, Recall and F1 Score.
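For the last exercise, here is a worked sketch using hypothetical counts (TP = 60, TN = 25, FP = 5, FN = 10); the same steps apply to any confusion matrix:

```python
# Hypothetical confusion matrix counts for the exercise.
tp, tn, fp, fn = 60, 25, 5, 10

accuracy  = (tp + tn) / (tp + tn + fp + fn)                # 85 / 100 = 0.85
precision = tp / (tp + fp)                                 # 60 / 65  ~ 0.923
recall    = tp / (tp + fn)                                 # 60 / 70  ~ 0.857
f1        = 2 * precision * recall / (precision + recall)  # ~ 0.889

print(f"Accuracy={accuracy:.2%}, Precision={precision:.2%}, "
      f"Recall={recall:.2%}, F1={f1:.2%}")
```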