03 Performance Metrics
Recall (Sensitivity, True Positive Rate, TPR): Out of all actual positives,
how many are correctly predicted.
Specificity (True Negative Rate, TNR): Out of all actual negatives, how
many are correctly predicted.
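As a quick numeric sketch (the counts below are hypothetical), both metrics come straight from the four confusion-matrix cells:

# Hypothetical confusion-matrix counts, purely for illustration
tp, fn = 40, 10      # actual positives: 50 in total
tn, fp = 80, 20      # actual negatives: 100 in total

recall = tp / (tp + fn)          # 40 / 50 = 0.80 -> share of actual positives caught
specificity = tn / (tn + fp)     # 80 / 100 = 0.80 -> share of actual negatives caught
print(recall, specificity)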
Confusion Matrix
Suppose you are working for a bank and are responsible for assessing loan applications. You
have built a machine learning model that predicts whether an applicant is likely to default on
their loan or not. The model has been trained on historical data and has achieved an accuracy
of 85%.
To evaluate the performance of the model beyond its headline accuracy, you can use a confusion matrix. The matrix is
constructed by comparing the predicted values against the actual values, so that every application falls into one of
four cells: true positive, false positive, false negative, or true negative.
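For illustration, here is a minimal sketch of building such a matrix with scikit-learn; the labels below are hypothetical and merely stand in for the bank's historical data:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 0, 1, 0, 1, 0, 0]   # 1 = applicant defaulted, 0 = repaid
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]   # labels predicted by the model

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))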
Applications of the Confusion Matrix
The confusion matrix has applications across various fields:
Model Evaluation: The primary application of the confusion matrix is to
evaluate the performance of a classification model. It provides insights
into the model's accuracy, precision, recall, and F1-score.
Medical Diagnosis: The confusion matrix finds extensive use in medical
fields for diagnosing diseases based on tests or images. It aids in
quantifying the accuracy of diagnostic tests and identifying the balance
between false positives and false negatives.
Fraud Detection: Banks and financial institutions use confusion matrices
to evaluate models that flag fraudulent transactions, weighing how often
fraud is missed (false negatives) against how often legitimate
transactions are wrongly flagged (false positives).
Natural Language Processing (NLP): Confusion matrices are used to
evaluate NLP models for sentiment analysis, text classification, and named
entity recognition.
Customer Churn Prediction: Confusion matrices are used to evaluate
churn-prediction models, which learn from historical data to anticipate
and help mitigate customer attrition.
Image and Object Recognition: Confusion matrices help evaluate models
that identify objects in images, supporting technologies like self-
driving cars and facial recognition systems.
A/B Testing: A/B testing is crucial for optimizing user experiences.
Confusion matrices help analyze the results of A/B tests, enabling data-
driven decisions in user engagement strategies.
2. Threshold Curve–Based Metrics
Threshold curve–based metrics help visualize and evaluate classifier performance across
different decision thresholds. A decision threshold in machine learning is a cut-off value
used to convert predicted probabilities or scores into class labels.
How It Works
The threshold (often defaulted to 0.5) determines which class a data point is assigned to:
scores above the threshold are labeled positive, and scores below it are labeled negative.
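As a minimal sketch (with hypothetical scores), thresholding predicted probabilities looks like this:

scores = [0.10, 0.35, 0.48, 0.52, 0.73, 0.91]   # predicted probabilities of the positive class
threshold = 0.5                                  # default cut-off

# Scores at or above the cut-off become positive labels, the rest negative
labels = [1 if s >= threshold else 0 for s in scores]
print(labels)                                    # [0, 0, 0, 1, 1, 1]

Moving the threshold up or down changes which points are labeled positive; the curves below summarize performance across all such cut-offs.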
1. ROC Curve (Receiver Operating Characteristic):
The ROC curve plots the True Positive Rate (recall) on the y-axis against the False Positive
Rate on the x-axis as the decision threshold is varied. The Area Under the ROC Curve
(AUC-ROC) summarizes this curve: it is the probability that the model ranks a random
positive higher than a random negative.
Applications
ROC curves are widely used in fields like medical diagnostics, signal detection, and machine
learning to assess, select, and compare models independently of the underlying class
distribution or cost context.
2. Area Under the ROC Curve (AUC-ROC):
Area Under the ROC Curve (AUC-ROC) is a scalar metric that summarizes the overall
performance of a binary classifier across all possible decision thresholds. AUC-ROC is the
probability that a classifier will rank a randomly chosen positive example higher than a
randomly chosen negative example.
An AUC of 0.5 indicates no discrimination (equivalent to random guessing), while 1.0
indicates perfect discrimination; values below 0.5 mean the model's ranking is
systematically worse than chance.
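A minimal sketch with scikit-learn (hypothetical labels and scores): roc_curve gives the points of the curve, and roc_auc_score gives the area under it.

from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                      # actual labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]     # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one (FPR, TPR) point per threshold
auc_roc = roc_auc_score(y_true, y_score)            # fraction of (positive, negative) pairs ranked correctly
print(auc_roc)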
3. Area Under the Precision–Recall Curve (AUC-PR):
The Precision–Recall Curve (PRC) plots precision against recall at every decision threshold.
The x-axis represents Recall (also called sensitivity or True Positive Rate), which
measures the proportion of actual positives correctly identified by the model.
The y-axis represents Precision, which measures the proportion of predicted positives
that are actually correct.
Definition:
The area under the Precision–Recall Curve (PRC). It summarizes the trade-off between
precision and recall over all classification thresholds.
A single-number measure of how well the classifier balances precision and recall.
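A matching sketch for the PR curve, again with scikit-learn and hypothetical data; average_precision_score is a closely related single-number summary of the same curve:

from sklearn.metrics import auc, average_precision_score, precision_recall_curve

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
pr_auc = auc(recall, precision)                     # area under the precision-recall curve
ap = average_precision_score(y_true, y_score)       # step-wise summary of the same curve
print(pr_auc, ap)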
3. Probabilistic Metrics
Probabilistic metrics evaluate how well the predicted probabilities match the actual
outcomes. These are more informative than hard classification accuracy, especially when we
care about calibration and confidence.
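As an illustration, log loss and the Brier score are two commonly used probabilistic metrics (chosen here as examples; the section's own list may differ). Both compare predicted probabilities against the true outcomes:

from sklearn.metrics import brier_score_loss, log_loss

y_true = [0, 1, 1, 0, 1]
y_prob = [0.1, 0.9, 0.7, 0.3, 0.6]   # predicted probability of the positive class

print(log_loss(y_true, y_prob))          # heavily penalizes confident but wrong probabilities
print(brier_score_loss(y_true, y_prob))  # mean squared error between probabilities and outcomes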
4. Class-Imbalance–Sensitive Metrics