
Performance Metrics

Performance metrics are quantitative measures that evaluate how well a statistical or machine learning model fits the data and makes predictions. They differ based on the type of task: classification or regression.

Classification Performance Metrics


Classification tasks involve predicting categorical outcomes. Performance is
measured by how well the predicted labels match the true labels.

1. Confusion Matrix–Based Metrics

2. Threshold Curve–Based Metrics

3. Probabilistic Metrics

4. Class-Imbalance–Sensitive Metrics

5. Ranking / Ordering Metrics

1. Confusion Matrix–Based Metrics

 Accuracy: Overall correctness of the model. Works poorly with imbalanced data.

 Precision (Positive Predictive Value, PPV): Out of all predicted positives, how many are actually positive.

 Recall (Sensitivity, True Positive Rate, TPR): Out of all actual positives,
how many are correctly predicted.

 Specificity (True Negative Rate, TNR): Out of all actual negatives, how
many are correctly predicted.

 F1 Score: Harmonic mean of precision and recall. Useful when class distribution is imbalanced.

 Fβ Score: Weighted F-score. β>1 emphasizes recall, β<1 emphasizes precision.

 Balanced Accuracy: Average performance across both classes, useful for imbalanced data.

 Matthews Correlation Coefficient (MCC): Correlation between true and predicted labels. Ranges from −1 (worst) to +1 (best).

 Cohen’s Kappa: Adjusted accuracy that accounts for agreement by chance.
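
As a quick illustration, all of the confusion matrix–based metrics above can be computed directly from a pair of label vectors. The following is a minimal sketch assuming scikit-learn is installed; the labels are made up for demonstration:

# Minimal sketch: confusion matrix–based metrics with scikit-learn
# (hypothetical labels, for illustration only).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, fbeta_score, balanced_accuracy_score,
                             matthews_corrcoef, cohen_kappa_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]   # predicted labels

print("Accuracy:          ", accuracy_score(y_true, y_pred))
print("Precision (PPV):   ", precision_score(y_true, y_pred))
print("Recall (TPR):      ", recall_score(y_true, y_pred))
print("F1 score:          ", f1_score(y_true, y_pred))
print("F2 score (beta=2): ", fbeta_score(y_true, y_pred, beta=2))  # beta > 1 favours recall
print("Balanced accuracy: ", balanced_accuracy_score(y_true, y_pred))
print("MCC:               ", matthews_corrcoef(y_true, y_pred))
print("Cohen's kappa:     ", cohen_kappa_score(y_true, y_pred))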

Confusion matrix

A confusion matrix is a table that is used to evaluate the performance of a classification model by comparing predicted values against actual values. It is an important tool for understanding the accuracy of a model, and can help identify areas of improvement.

The confusion matrix serves as the basis for calculating essential evaluation metrics that offer nuanced insights into a model's performance:
 Accuracy: Accuracy quantifies the ratio of correct predictions to the total number of predictions ((TP + TN) / (TP + TN + FP + FN)). While informative, this metric can be misleading when classes are imbalanced.
 Precision: Precision evaluates the proportion of true positive
predictions among all positive predictions (TP / (TP + FP)). This
metric is crucial when the cost of false positives is high.
 Recall (Sensitivity or True Positive Rate): Recall measures the
ratio of true positive predictions to the actual number of positive
instances (TP / (TP + FN)). This metric is significant when
missing positive instances is costly.
 Specificity (True Negative Rate): Specificity calculates the
ratio of true negative predictions to the actual number of
negative instances (TN / (TN + FP)). This metric is vital when
the emphasis is on accurately identifying negative instances.
 F1-Score: The F1-Score strikes a balance between precision and
recall, making it useful when both false positives and false
negatives carry similar importance.
Example:

Suppose you are working for a bank and are responsible for assessing loan applications. You
have built a machine learning model that predicts whether an applicant is likely to default on
their loan or not. The model has been trained on historical data and has achieved an accuracy
of 85%.

To evaluate the performance of the model, you can use a confusion matrix. The matrix is
constructed by comparing the predicted values against the actual values, as shown below:
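
As a minimal sketch (with hypothetical labels chosen so that overall accuracy works out to 85%; these are not the bank's actual data), the matrix and the metrics derived from its cells can be computed with scikit-learn:

# Hypothetical loan-default labels (1 = default, 0 = no default), chosen for
# illustration so that accuracy = 85%; these are not real application data.
import numpy as np
from sklearn.metrics import confusion_matrix

y_actual    = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
y_predicted = np.array([1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0])

# scikit-learn's convention: rows = actual class, columns = predicted class,
# ordered [0, 1], so ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

# Metrics derived directly from the four cells, using the formulas above.
accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)
specificity = tn / (tn + fp)
f1          = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.2f}  precision={precision:.2f}  "
      f"recall={recall:.2f}  specificity={specificity:.2f}  f1={f1:.2f}")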
Applications of the Confusion Matrix
The confusion matrix has applications across various fields:
 Model Evaluation: The primary application of the confusion matrix is to
evaluate the performance of a classification model. It provides insights
into the model's accuracy, precision, recall, and F1-score.
 Medical Diagnosis: The confusion matrix finds extensive use in medical
fields for diagnosing diseases based on tests or images. It aids in
quantifying the accuracy of diagnostic tests and identifying the balance
between false positives and false negatives.
 Fraud Detection: Banks and financial institutions use confusion matrices
to detect fraudulent transactions by showcasing how AI algorithms help
identify patterns of fraudulent activities.
 Natural Language Processing (NLP): NLP models use confusion
matrices to evaluate sentiment analysis, text classification, and named
entity recognition.
 Customer Churn Prediction: Confusion matrices play a pivotal role in
predicting customer churn and show how AI-driven models use historical
data to anticipate and mitigate customer attrition.
 Image and Object Recognition: Confusion matrices assist in training
models to identify objects in images, enabling technologies like self-
driving cars and facial recognition systems.
 A/B Testing: A/B testing is crucial for optimizing user experiences.
Confusion matrices help analyze the results of A/B tests, enabling data-
driven decisions in user engagement strategies.
2. Threshold Curve–Based Metrics

Threshold curve–based metrics help visualize and evaluate classifier performance across
different decision thresholds. A decision threshold in machine learning is a cut-off value
used to convert predicted probabilities or scores into class labels.

For binary classification, it determines whether an instance is assigned to the positive or negative class based on whether its predicted score is above or below this point.

How It Works

 Probabilistic classifiers (like logistic regression) output a score between 0 and 1 for each data point.

 The threshold (often defaulted to 0.5) decides which class a data point is assigned:
scores above the threshold are labeled positive, and scores below are labeled negative.
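
A minimal sketch of this thresholding step (the probabilities below are made up):

# Turning predicted probabilities into class labels with a decision threshold.
import numpy as np

probs = np.array([0.10, 0.35, 0.48, 0.52, 0.73, 0.91])  # hypothetical classifier scores

for threshold in (0.5, 0.3, 0.7):
    labels = (probs >= threshold).astype(int)   # 1 = positive, 0 = negative
    print(f"threshold={threshold}: {labels.tolist()}")

# Lowering the threshold labels more instances as positive (higher recall,
# usually lower precision); raising it does the opposite.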

List of Threshold Curve–Based Metrics:


 Receiver Operating Characteristic (ROC) Curve: Visual tool for classifier
performance. Plots TPR vs. FPR at different thresholds.

 Area Under the ROC Curve (AUC-ROC): Probability that the model ranks
a random positive higher than a random negative.

 Precision–Recall Curve (PRC): More informative for imbalanced datasets.

 Area Under the Precision–Recall Curve (AUC-PR): Summarizes trade-off between precision and recall.

1. Receiver Operating Characteristic (ROC) Curve:


The Receiver Operating Characteristic (ROC) Curve is a graphical tool that visualizes the performance of a binary classifier across different decision thresholds. It plots the True Positive Rate (TPR), also called sensitivity or recall, against the False Positive Rate (FPR) for each possible threshold.

Applications

ROC curves are widely used in fields like medical diagnostics, signal detection, and machine
learning to assess, select, and compare models independently of the underlying class
distribution or cost context.
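
A minimal sketch of tracing a ROC curve, assuming scikit-learn and matplotlib are available (labels and scores are hypothetical):

# Each threshold on the scores produces one (FPR, TPR) point on the ROC curve.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                   # hypothetical labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]  # hypothetical scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, marker="o", label="classifier")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guess")  # diagonal baseline
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.legend()
plt.show()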
2. Area Under the ROC Curve (AUC-ROC):

Area Under the ROC Curve (AUC-ROC) is a scalar metric that summarizes the overall
performance of a binary classifier across all possible decision thresholds. AUC-ROC is the
probability that a classifier will rank a randomly chosen positive example higher than a
randomly chosen negative example.

 The value ranges from 0 to 1; 0.5 corresponds to no discrimination (equivalent to random guessing) and 1.0 to perfect discrimination.

 Higher values indicate better ability to distinguish between the classes.


Example:
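
A minimal sketch (hypothetical data, scikit-learn assumed) that also checks the ranking interpretation by counting correctly ordered positive/negative pairs:

# AUC-ROC equals the fraction of (positive, negative) pairs in which the
# positive example receives the higher score (ties count as 0.5).
from itertools import product
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

print("AUC-ROC:", roc_auc_score(y_true, y_score))

pos = [s for s, y in zip(y_score, y_true) if y == 1]
neg = [s for s, y in zip(y_score, y_true) if y == 0]
pairs = list(product(pos, neg))
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
print("Correctly ranked pairs:", wins / len(pairs))   # matches the AUC value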
3. Precision–Recall Curve (PRC): More informative for imbalanced datasets.

The Precision–Recall Curve (PRC) is a graphical representation used to evaluate the performance of a binary classifier, especially on imbalanced datasets where the positive class is rare.

 The x-axis represents Recall (also called sensitivity or True Positive Rate), which
measures the proportion of actual positives correctly identified by the model.

 The y-axis represents Precision, which measures the proportion of predicted positives
that are actually correct.
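
A minimal sketch of computing the curve's points with scikit-learn (hypothetical data):

# One (recall, precision) pair per candidate threshold over the scores.
from sklearn.metrics import precision_recall_curve

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
for p, r in zip(precision, recall):
    print(f"recall={r:.2f}  precision={p:.2f}")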

4. Area Under the Precision–Recall Curve (AUC-PR)

Definition:
The area under the Precision–Recall Curve (PRC). It summarizes the trade-off between
precision and recall over all classification thresholds.
 A single-number measure of how well the classifier balances precision and recall.

 Higher AUC–PR = better performance, especially on rare positive classes.

 A value of 1.0 indicates a perfect classifier (always high precision and recall).

 The baseline value equals the prevalence of the positive class (i.e., the proportion of positives in the dataset).
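
A minimal sketch comparing AUC-PR (approximated in scikit-learn by average precision) with the prevalence baseline, on synthetic data with a rare positive class:

# Synthetic, imbalanced data: 5% positives, scores that separate the classes fairly well.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
y_true  = np.array([1] * 5 + [0] * 95)
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.15, size=100), 0, 1)

print("AUC-PR (average precision):", average_precision_score(y_true, y_score))
print("Baseline (positive prevalence):", y_true.mean())   # 0.05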

3. Probabilistic Metrics

Probabilistic metrics evaluate how well the predicted probabilities match the actual
outcomes. These are more informative than hard classification accuracy, especially when we
care about calibration and confidence.

 Log Loss (Cross-Entropy Loss): Heavily penalizes wrong predictions made with high confidence.
 Brier Score: Measures accuracy of predicted probabilities (lower is better).
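
A minimal sketch of both metrics on hypothetical predicted probabilities (scikit-learn assumed):

# Both metrics compare predicted probabilities of the positive class with the true labels.
from sklearn.metrics import log_loss, brier_score_loss

y_true = [1, 0, 1, 1, 0]
p_hat  = [0.9, 0.2, 0.6, 0.4, 0.1]   # predicted probability of the positive class

print("Log loss:   ", log_loss(y_true, p_hat))           # lower is better
print("Brier score:", brier_score_loss(y_true, p_hat))   # mean squared error of the probabilities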

4. Class-Imbalance–Sensitive Metrics

 Macro Average Precision/Recall/F1 (averaging metrics across classes equally)

 Micro Average Precision/Recall/F1 (averaging metrics across instances)

 Weighted Average Precision/Recall/F1 (weighted by class size)

 Geometric Mean (G-Mean)
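
A minimal sketch of the averaging strategies and G-Mean on a hypothetical multi-class problem (G-Mean is computed here as the geometric mean of per-class recalls):

# Hypothetical 3-class problem with an imbalanced class distribution.
import numpy as np
from sklearn.metrics import f1_score, recall_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 2, 2]

print("Macro F1:   ", f1_score(y_true, y_pred, average="macro"))     # classes weighted equally
print("Micro F1:   ", f1_score(y_true, y_pred, average="micro"))     # instances weighted equally
print("Weighted F1:", f1_score(y_true, y_pred, average="weighted"))  # weighted by class size

per_class_recall = recall_score(y_true, y_pred, average=None)        # recall for each class
g_mean = np.prod(per_class_recall) ** (1 / len(per_class_recall))
print("G-Mean:     ", g_mean)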

5. Ranking / Ordering Metrics

 Top-K Accuracy (useful in multi-class problems like image classification): Probability that the true class is among the top K predicted classes.

 Mean Average Precision (mAP): Common in information retrieval/object detection; averages precision over recall levels.
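
A minimal sketch of Top-K accuracy on hypothetical per-class scores (top_k_accuracy_score is available in scikit-learn 0.24 and later):

# Three classes; each row of y_score holds the predicted scores for classes 0, 1 and 2.
from sklearn.metrics import top_k_accuracy_score

y_true  = [0, 1, 2, 2]
y_score = [[0.5, 0.3, 0.2],
           [0.2, 0.3, 0.5],
           [0.2, 0.3, 0.5],
           [0.6, 0.3, 0.1]]

print("Top-1 accuracy:", top_k_accuracy_score(y_true, y_score, k=1))  # exact match required
print("Top-2 accuracy:", top_k_accuracy_score(y_true, y_score, k=2))  # true class within the two highest scores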
