
Performance Metrics

Performance metrics are quantitative measures that evaluate how well a statistical or machine learning model fits the data and makes predictions. They differ based on the type of task: classification or regression.

Classification Performance Metrics


Classification tasks involve predicting categorical outcomes. Performance is
measured by how well the predicted labels match the true labels.

1. Confusion Matrix–Based Metrics

2. Threshold Curve–Based Metrics

3. Probabilistic Metrics

4. Class-Imbalance–Sensitive Metrics

5. Ranking / Ordering Metrics

1. Confusion Matrix–Based Metrics

 Accuracy: Overall correctness of the model. Works poorly with imbalanced data.

 Precision (Positive Predictive Value, PPV): Out of all predicted positives, how many are actually positive.

 Recall (Sensitivity, True Positive Rate, TPR): Out of all actual positives,
how many are correctly predicted.

 Specificity (True Negative Rate, TNR): Out of all actual negatives, how
many are correctly predicted.

 F1 Score: Harmonic mean of precision and recall. Useful when class distribution is imbalanced.

 Fβ Score: Weighted F-score. β>1 emphasizes recall, β<1 emphasizes precision.

 Balanced Accuracy: Average performance across both classes, useful for imbalanced data.

 Matthews Correlation Coefficient (MCC): Correlation between true and predicted labels. Ranges from −1 (worst) to +1 (best).

 Cohen’s Kappa: Adjusted accuracy that accounts for agreement by chance.
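
As a quick illustration, all of the confusion matrix–based metrics above can be computed directly from a pair of label vectors. The following is a minimal sketch assuming scikit-learn is installed; the labels are made up for demonstration:

# Minimal sketch: confusion matrix–based metrics with scikit-learn
# (hypothetical labels, for illustration only).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, fbeta_score, balanced_accuracy_score,
                             matthews_corrcoef, cohen_kappa_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]   # predicted labels

print("Accuracy:          ", accuracy_score(y_true, y_pred))
print("Precision (PPV):   ", precision_score(y_true, y_pred))
print("Recall (TPR):      ", recall_score(y_true, y_pred))
print("F1 score:          ", f1_score(y_true, y_pred))
print("F2 score (beta=2): ", fbeta_score(y_true, y_pred, beta=2))  # beta > 1 favours recall
print("Balanced accuracy: ", balanced_accuracy_score(y_true, y_pred))
print("MCC:               ", matthews_corrcoef(y_true, y_pred))
print("Cohen's kappa:     ", cohen_kappa_score(y_true, y_pred))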

Confusion matrix

A confusion matrix is a table that is used to evaluate the performance of a classification model by comparing predicted values against actual values. It is an important tool for understanding the accuracy of a model, and can help identify areas of improvement.

The confusion matrix serves as the basis for calculating essential evaluation metrics that offer nuanced insights into a model's performance:
 Accuracy: Accuracy quantifies the ratio of correct predictions to the total number of predictions ((TP + TN) / (TP + TN + FP + FN)). While informative, this metric can be misleading when classes are imbalanced.
 Precision: Precision evaluates the proportion of true positive
predictions among all positive predictions (TP / (TP + FP)). This
metric is crucial when the cost of false positives is high.
 Recall (Sensitivity or True Positive Rate): Recall measures the
ratio of true positive predictions to the actual number of positive
instances (TP / (TP + FN)). This metric is significant when
missing positive instances is costly.
 Specificity (True Negative Rate): Specificity calculates the
ratio of true negative predictions to the actual number of
negative instances (TN / (TN + FP)). This metric is vital when
the emphasis is on accurately identifying negative instances.
 F1-Score: The F1-Score strikes a balance between precision and
recall, making it useful when both false positives and false
negatives carry similar importance.
Example:

Suppose you are working for a bank and are responsible for assessing loan applications. You
have built a machine learning model that predicts whether an applicant is likely to default on
their loan or not. The model has been trained on historical data and has achieved an accuracy
of 85%.

To evaluate the performance of the model, you can use a confusion matrix. The matrix is
constructed by comparing the predicted values against the actual values, as shown below:
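
As a minimal sketch (with hypothetical labels chosen so that overall accuracy works out to 85%; these are not the bank's actual data), the matrix and the metrics derived from its cells can be computed with scikit-learn:

# Hypothetical loan-default labels (1 = default, 0 = no default), chosen for
# illustration so that accuracy = 85%; these are not real application data.
import numpy as np
from sklearn.metrics import confusion_matrix

y_actual    = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
y_predicted = np.array([1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0])

# scikit-learn's convention: rows = actual class, columns = predicted class,
# ordered [0, 1], so ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

# Metrics derived directly from the four cells, using the formulas above.
accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)
specificity = tn / (tn + fp)
f1          = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.2f}  precision={precision:.2f}  "
      f"recall={recall:.2f}  specificity={specificity:.2f}  f1={f1:.2f}")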
Applications of the Confusion Matrix
The confusion matrix has applications across various fields:
 Model Evaluation: The primary application of the confusion matrix is to
evaluate the performance of a classification model. It provides insights
into the model's accuracy, precision, recall, and F1-score.
 Medical Diagnosis: The confusion matrix finds extensive use in medical
fields for diagnosing diseases based on tests or images. It aids in
quantifying the accuracy of diagnostic tests and identifying the balance
between false positives and false negatives.
 Fraud Detection: Banks and financial institutions use confusion matrices
to detect fraudulent transactions by showcasing how AI algorithms help
identify patterns of fraudulent activities.
 Natural Language Processing (NLP): NLP models use confusion
matrices to evaluate sentiment analysis, text classification, and named
entity recognition.
 Customer Churn Prediction: Confusion matrices play a pivotal role in
predicting customer churn and show how AI-driven models use historical
data to anticipate and mitigate customer attrition.
 Image and Object Recognition: Confusion matrices assist in training
models to identify objects in images, enabling technologies like self-
driving cars and facial recognition systems.
 A/B Testing: A/B testing is crucial for optimizing user experiences.
Confusion matrices help analyze the results of A/B tests, enabling data-
driven decisions in user engagement strategies.
2. Threshold Curve–Based Metrics

Threshold curve–based metrics help visualize and evaluate classifier performance across
different decision thresholds. A decision threshold in machine learning is a cut-off value
used to convert predicted probabilities or scores into class labels.

For binary classification, it determines whether an instance is assigned to the positive or negative class based on whether its predicted score is above or below this point.

How It Works

 Probabilistic classifiers (like logistic regression) output a score between 0 and 1 for each data point.

 The threshold (often defaulted to 0.5) decides which class a data point is assigned:
scores above the threshold are labeled positive, and scores below are labeled negative.
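
A minimal sketch of this thresholding step (the probabilities below are made up):

# Turning predicted probabilities into class labels with a decision threshold.
import numpy as np

probs = np.array([0.10, 0.35, 0.48, 0.52, 0.73, 0.91])  # hypothetical classifier scores

for threshold in (0.5, 0.3, 0.7):
    labels = (probs >= threshold).astype(int)   # 1 = positive, 0 = negative
    print(f"threshold={threshold}: {labels.tolist()}")

# Lowering the threshold labels more instances as positive (higher recall,
# usually lower precision); raising it does the opposite.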

List of Threshold Curve–Based Metrics:


 Receiver Operating Characteristic (ROC) Curve: Visual tool for classifier
performance. Plots TPR vs. FPR at different thresholds.

 Area Under the ROC Curve (AUC-ROC): Probability that the model ranks
a random positive higher than a random negative.

 Precision–Recall Curve (PRC): More informative for imbalanced datasets.

 Area Under the Precision–Recall Curve (AUC-PR): Summarizes trade-off between precision and recall.

1. Receiver Operating Characteristic (ROC) Curve:


The Receiver Operating Characteristic (ROC) Curve is a graphical tool that visualizes the performance of a binary classifier across different decision thresholds. It plots the True Positive Rate (TPR), also called sensitivity or recall, against the False Positive Rate (FPR) for each possible threshold.

Applications

ROC curves are widely used in fields like medical diagnostics, signal detection, and machine
learning to assess, select, and compare models independently of the underlying class
distribution or cost context.
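
A minimal sketch of tracing a ROC curve, assuming scikit-learn and matplotlib are available (labels and scores are hypothetical):

# Each threshold on the scores produces one (FPR, TPR) point on the ROC curve.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                   # hypothetical labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]  # hypothetical scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, marker="o", label="classifier")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guess")  # diagonal baseline
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.legend()
plt.show()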
2. Area Under the ROC Curve (AUC-ROC):

Area Under the ROC Curve (AUC-ROC) is a scalar metric that summarizes the overall
performance of a binary classifier across all possible decision thresholds. AUC-ROC is the
probability that a classifier will rank a randomly chosen positive example higher than a
randomly chosen negative example.

 The value ranges from 0 to 1; 0.5 corresponds to no discrimination (equivalent to random guessing) and 1.0 to perfect discrimination.

 Higher values indicate better ability to distinguish between the classes.


Example:
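
A minimal sketch (hypothetical data, scikit-learn assumed) that also checks the ranking interpretation by counting correctly ordered positive/negative pairs:

# AUC-ROC equals the fraction of (positive, negative) pairs in which the
# positive example receives the higher score (ties count as 0.5).
from itertools import product
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

print("AUC-ROC:", roc_auc_score(y_true, y_score))

pos = [s for s, y in zip(y_score, y_true) if y == 1]
neg = [s for s, y in zip(y_score, y_true) if y == 0]
pairs = list(product(pos, neg))
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
print("Correctly ranked pairs:", wins / len(pairs))   # matches the AUC value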
3. Precision–Recall Curve (PRC): More informative for imbalanced datasets.

The Precision–Recall Curve (PRC) is a graphical representation used to evaluate the performance of a binary classifier, especially on imbalanced datasets where the positive class is rare.

 The x-axis represents Recall (also called sensitivity or True Positive Rate), which
measures the proportion of actual positives correctly identified by the model.

 The y-axis represents Precision, which measures the proportion of predicted positives
that are actually correct.
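
A minimal sketch of computing the curve's points with scikit-learn (hypothetical data):

# One (recall, precision) pair per candidate threshold over the scores.
from sklearn.metrics import precision_recall_curve

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
for p, r in zip(precision, recall):
    print(f"recall={r:.2f}  precision={p:.2f}")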

4. Area Under the Precision–Recall Curve (AUC-PR)

Definition:
The area under the Precision–Recall Curve (PRC). It summarizes the trade-off between
precision and recall over all classification thresholds.
 A single-number measure of how well the classifier balances precision and recall.

 Higher AUC–PR = better performance, especially on rare positive classes.

 A value of 1.0 indicates a perfect classifier (always high precision and recall).

 The baseline value equals the prevalence of the positive class (i.e., the proportion of positives in the dataset).
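
A minimal sketch comparing AUC-PR (approximated in scikit-learn by average precision) with the prevalence baseline, on synthetic data with a rare positive class:

# Synthetic, imbalanced data: 5% positives, scores that separate the classes fairly well.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
y_true  = np.array([1] * 5 + [0] * 95)
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.15, size=100), 0, 1)

print("AUC-PR (average precision):", average_precision_score(y_true, y_score))
print("Baseline (positive prevalence):", y_true.mean())   # 0.05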

3. Probabilistic Metrics

Probabilistic metrics evaluate how well the predicted probabilities match the actual
outcomes. These are more informative than hard classification accuracy, especially when we
care about calibration and confidence.

 Log Loss (Cross-Entropy Loss): Heavily penalizes wrong predictions made with high confidence.
 Brier Score: Measures accuracy of predicted probabilities (lower is better).
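
A minimal sketch of both metrics on hypothetical predicted probabilities (scikit-learn assumed):

# Both metrics compare predicted probabilities of the positive class with the true labels.
from sklearn.metrics import log_loss, brier_score_loss

y_true = [1, 0, 1, 1, 0]
p_hat  = [0.9, 0.2, 0.6, 0.4, 0.1]   # predicted probability of the positive class

print("Log loss:   ", log_loss(y_true, p_hat))           # lower is better
print("Brier score:", brier_score_loss(y_true, p_hat))   # mean squared error of the probabilities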

4. Class-Imbalance–Sensitive Metrics

 Macro Average Precision/Recall/F1 (averaging metrics across classes equally)

 Micro Average Precision/Recall/F1 (averaging metrics across instances)

 Weighted Average Precision/Recall/F1 (weighted by class size)

 Geometric Mean (G-Mean)
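
A minimal sketch of the averaging strategies and G-Mean on a hypothetical multi-class problem (G-Mean is computed here as the geometric mean of per-class recalls):

# Hypothetical 3-class problem with an imbalanced class distribution.
import numpy as np
from sklearn.metrics import f1_score, recall_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 2, 2]

print("Macro F1:   ", f1_score(y_true, y_pred, average="macro"))     # classes weighted equally
print("Micro F1:   ", f1_score(y_true, y_pred, average="micro"))     # instances weighted equally
print("Weighted F1:", f1_score(y_true, y_pred, average="weighted"))  # weighted by class size

per_class_recall = recall_score(y_true, y_pred, average=None)        # recall for each class
g_mean = np.prod(per_class_recall) ** (1 / len(per_class_recall))
print("G-Mean:     ", g_mean)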

5. Ranking / Ordering Metrics

 Top-K Accuracy (useful in multi-class problems like image classification): Probability that the true class is among the top K predicted classes.

 Mean Average Precision (mAP): Common in information retrieval/object detection; averages precision over recall levels.
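
A minimal sketch of Top-K accuracy on hypothetical per-class scores (top_k_accuracy_score is available in scikit-learn 0.24 and later):

# Three classes; each row of y_score holds the predicted scores for classes 0, 1 and 2.
from sklearn.metrics import top_k_accuracy_score

y_true  = [0, 1, 2, 2]
y_score = [[0.5, 0.3, 0.2],
           [0.2, 0.3, 0.5],
           [0.2, 0.3, 0.5],
           [0.6, 0.3, 0.1]]

print("Top-1 accuracy:", top_k_accuracy_score(y_true, y_score, k=1))  # exact match required
print("Top-2 accuracy:", top_k_accuracy_score(y_true, y_score, k=2))  # true class within the two highest scores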
