Evaluation Chapter Notes | Artificial Intelligence for Class 10



What is Evaluation?

Evaluation is a process that critically examines a program. It involves collecting and analyzing information about a program’s activities, characteristics, and outcomes. Its purpose is to make judgments about a program, to improve its effectiveness, and/or to inform programming decisions.

Let me explain this to you:

Evaluation is basically checking the performance of your AI model. It is based on two things: “Prediction” and “Reality”. Evaluation is done as follows:

• First, take some testing data whose actual outcome is known to be 100% true.
• Then, feed that testing data to the AI model while keeping the correct outcome with yourself; this known outcome is termed “Reality”.
• When you get the predicted outcome from the AI model, called “Prediction”, compare it with the actual outcome, “Reality” (a small sketch of this comparison is given after the note on overfitting below).

You can do this to:

• Improve the efficiency and performance of your AI model.

• Identify and correct mistakes.


Prediction and Reality

• Try not to use the dataset that was used during data acquisition or training when evaluating the model.
• This is because your model will simply remember the whole training set and will therefore always predict the correct label for any point in the training set. This is known as overfitting.
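To make the procedure concrete, here is a minimal sketch of the Prediction-vs-Reality comparison, assuming a hypothetical model object with a predict() method and a list of testing examples whose correct outcomes are known (both names are placeholders, not part of this chapter):

def evaluate(model, test_data):
    # test_data is a list of (features, reality) pairs, where "reality" is the known correct outcome.
    correct = 0
    for features, reality in test_data:
        prediction = model.predict(features)   # "Prediction": the output given by the model
        if prediction == reality:              # compare Prediction with "Reality"
            correct += 1
    return correct / len(test_data)            # fraction of predictions that match reality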

Evaluation Terminologies

There are various terminologies that come up when we evaluate our model. Let’s explore them with an example of the football scenario.

The Scenario

• Imagine you have developed an AI-based prediction model designed to identify a football (soccer ball). The objective of the model is to predict whether the given/shown figure is a football. To understand the efficiency of this model, we need to check if the predictions it makes are correct or not. Thus, there are two conditions that we need to consider: Prediction and Reality.
◦ Prediction: The output given by the machine.
◦ Reality: The actual scenario about the figure shown when the prediction is made.
• Now, let's look at the various combinations that we can have with these two conditions:
◦ True Positive (TP): The model predicts the figure as a football, and it is indeed a football.
◦ True Negative (TN): The model predicts the figure as not a football, and it is indeed not a football.
◦ False Positive (FP): The model predicts the figure as a football, but it is not a football.
◦ False Negative (FN): The model predicts the figure as not a football, but it is indeed a football.

By analyzing these combinations, we can evaluate the performance and efficiency of the AI model. The goal is to maximize the number of True Positives and True Negatives while minimizing the number of False Positives and False Negatives.
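As a rough sketch of this bookkeeping (the two lists below are made-up values purely for illustration, with "yes" meaning "the figure is a football"):

predictions = ["yes", "yes", "no", "no", "yes", "no"]   # what the model said
realities   = ["yes", "no",  "no", "yes", "yes", "no"]  # what was actually shown

counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
for prediction, reality in zip(predictions, realities):
    if prediction == "yes" and reality == "yes":
        counts["TP"] += 1   # predicted football, and it really is a football
    elif prediction == "no" and reality == "no":
        counts["TN"] += 1   # predicted not a football, and it really is not
    elif prediction == "yes" and reality == "no":
        counts["FP"] += 1   # predicted football, but it is not
    else:
        counts["FN"] += 1   # predicted not a football, but it really is
print(counts)               # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}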


Try yourself: What is the term used to describe when the model predicts the figure as a football, and it is indeed a football?
• A. True Positive (TP)
• B. True Negative (TN)
• C. False Positive (FP)
• D. False Negative (FN)

Confusion Matrix

The comparison between the results of Prediction and Reality is known as the Confusion Matrix.

The confusion matrix helps us interpret the prediction results. It is not an evaluation metric itself but serves as a record to aid in evaluation. Let’s review the four conditions related to the football example once more.

Confusion Matrix table:

                     Reality: Yes (football)     Reality: No (not a football)
Prediction: Yes      True Positive (TP)          False Positive (FP)
Prediction: No       False Negative (FN)         True Negative (TN)
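If scikit-learn happens to be available (an assumption for this sketch, not a requirement of the chapter), the same record can be produced with its confusion_matrix helper, reusing the made-up lists from the earlier sketch:

from sklearn.metrics import confusion_matrix

predictions = ["yes", "yes", "no", "no", "yes", "no"]
realities   = ["yes", "no",  "no", "yes", "yes", "no"]

# labels=["yes", "no"] puts the positive class ("yes" = football) first.
# Rows are Reality, columns are Prediction.
print(confusion_matrix(realities, predictions, labels=["yes", "no"]))
# [[2 1]   Reality = yes: 2 TP, 1 FN
#  [1 2]]  Reality = no:  1 FP, 2 TN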

Parameters to Evaluate a Model

Now let us go through all the possible combinations of “Prediction” and “Reality”, and see how we can use these conditions to evaluate the model.


Accuracy

Definition: Accuracy is the percentage of “correct predictions out of all observations”. A prediction is considered correct if it aligns with the reality. In this context, there are two scenarios where the Prediction matches the Reality: True Positive and True Negative.

Accuracy Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%

Here, total observations cover all the possible cases of prediction, which can be True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN).

Example: Let’s revisit the football example.

Assume the model always predicts that there is no football. In reality, there is a 2% chance of encountering a football. In this scenario, the model will be correct in the 98% of cases where there is no football, but incorrect in the 2% of cases where a football is actually present, since it still predicts no football.

Here, out of 100 observations: TP = 0, TN = 98, FP = 0, FN = 2, so Accuracy = (0 + 98) / 100 × 100% = 98%.

Conclusion: The accuracy looks very high (98%), yet the model never actually detects a football. High accuracy alone is therefore not enough to judge a model, which is why we also look at Precision and Recall.
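A minimal sketch of that calculation, writing the 100 cases out as counts (the numbers follow directly from the 2%/98% split above):

TP, TN, FP, FN = 0, 98, 0, 2         # 98 cases with no football, 2 cases where one was missed
accuracy = (TP + TN) / (TP + TN + FP + FN) * 100
print(accuracy)                      # 98.0 -- high accuracy, yet the model never detects a football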

Precision

Definition: Precision is the percentage of “true positive cases” out of all cases where the prediction is positive. This metric considers both True Positives and False Positives. It measures how well the model identifies positive cases among all the cases it predicts as positive. In other words, it evaluates the proportion of correctly identified positive instances compared to all instances the model predicted as positive.

Precision Formula: Precision = TP / (TP + FP) × 100%

In the football example, if the model always predicts the presence of a football, regardless of reality, all positive predictions are evaluated, including:

• True Positive (Prediction = Yes and Reality = Yes)
• False Positive (Prediction = Yes and Reality = No)

Just like the story of the boy who falsely cried out about wolves and was ignored when real wolves arrived, if the precision is low (indicating more false positives), it could lead to complacency: players might start ignoring the predictions, thinking they are mostly false, and thus fail to check for the ball when it is actually there.
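As a rough sketch of this low-precision situation, reusing the 2%/98% split from the accuracy example for a model that always predicts "football":

TP, FP = 2, 98                       # every one of the 100 cases is a positive prediction
precision = TP / (TP + FP) * 100
print(precision)                     # 2.0 -- almost every alarm is false, so the alarms get ignored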


Recall

Definition: Recall, also known as Sensitivity or True Positive Rate, is the fraction of actual positive cases that are correctly identified by the model.

In the football example, recall focuses on the cases where a football was actually present, examining how well the model detected it. It takes into account:

• True Positives (TP): Cases where the model correctly identified the presence of a football.
• False Negatives (FN): Cases where a football was present, but the model failed to detect it.

Recall Formula: Recall = TP / (TP + FN) × 100%

In both Precision and Recall, the numerator is the same: True Positives. However, the denominators differ: Precision includes False Positives, while Recall includes False Negatives.
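A minimal sketch that computes both metrics side by side (the four counts are made-up values, chosen only to show that the two metrics can differ):

TP, TN, FP, FN = 3, 4, 1, 2          # illustrative counts
precision = TP / (TP + FP) * 100     # penalised by False Positives (false alarms)
recall    = TP / (TP + FN) * 100     # penalised by False Negatives (missed footballs)
print(precision, recall)             # 75.0 60.0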

F1 Score

Definition: The F1 Score measures the balance between precision and recall. It is used when there is no clear preference for one metric over the other, providing a way to seek a balance between them.

F1 Score Formula: F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
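Continuing with the illustrative precision and recall values from the previous sketch (expressed here as fractions between 0 and 1):

precision, recall = 0.75, 0.60
f1_score = 2 * (precision * recall) / (precision + recall)
print(round(f1_score, 2))            # 0.67 -- lies between the two, pulled towards the smaller value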

Try yourself: Which metric measures the balance between precision and recall?
• A. Accuracy
• B. Precision
• C. Recall
• D. F1 Score

Which Metric is Important?

Choosing between Precision and Recall depends on the specific context and the costs associated with False Positives and False Negatives:

• Forest Fire Detection: Here, a False Negative (failing to detect a fire when there is one) is critical because it could lead to devastating consequences, like the forest burning down. Therefore, Recall (which emphasizes detecting all positive cases) is crucial in this scenario.
• Viral Outbreak Prediction: A False Negative here (not identifying an outbreak when it occurs) can lead to widespread infection and severe public health issues. Hence, Recall is again more important.
• Mining: If a model predicts the presence of treasure (a False Positive) but there's none, it could result in unnecessary and costly digging. In this case, Precision (which focuses on avoiding false alarms) is more valuable.
• Spam Email Classification: If a model incorrectly labels a legitimate email as spam (a False Positive), it could lead to missing important messages. Therefore, Precision is critical in this scenario as well.

Cases of High FN Cost (Recall matters more):
• Forest Fire Detection
• Viral Outbreak Prediction

Cases of High FP Cost (Precision matters more):
• Spam Email Classification
• Mining

Both Parameters are Important

To sum up, if you want to assess your model’s performance comprehensively, both Precision and Recall are crucial metrics.

• High Precision might come at the cost of Low Recall, and vice versa.
• The F1 Score is a metric that balances both Precision and Recall, providing a single score to evaluate model performance.
• An ideal scenario would be where both Precision and Recall are 100%, leading to an F1 Score of 1 (or 100%).

Both Precision and Recall range from 0 to 1, and so does the F1 Score, with 1 representing the perfect performance.

Let us explore the variations we can have in the F1 Score. Because the F1 Score is the harmonic mean of Precision and Recall, it is high only when both are high. For example (values computed directly from the formula):

Precision       Recall          F1 Score
Low (0.2)       Low (0.2)       Low (0.20)
High (0.8)      Low (0.2)       Low (0.32)
Low (0.2)       High (0.8)      Low (0.32)
High (0.8)      High (0.8)      High (0.80)
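A small sketch that reproduces these variations by computing the F1 Score for a few Precision/Recall combinations (the specific values are illustrative):

def f1(precision, recall):
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

for p, r in [(0.2, 0.2), (0.8, 0.2), (0.2, 0.8), (0.8, 0.8), (1.0, 1.0)]:
    print(p, r, round(f1(p, r), 2))
# 0.2 0.2 0.2
# 0.8 0.2 0.32
# 0.2 0.8 0.32
# 0.8 0.8 0.8
# 1.0 1.0 1.0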
