03 Performance Metrics
Recall (Sensitivity, True Positive Rate, TPR): Out of all actual positives,
how many are correctly predicted.
Specificity (True Negative Rate, TNR): Out of all actual negatives, how
many are correctly predicted.
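As a quick numeric sketch (the counts below are hypothetical), both metrics come straight from the four confusion-matrix cells:

# Hypothetical confusion-matrix counts, purely for illustration
tp, fn = 40, 10      # actual positives: 50 in total
tn, fp = 80, 20      # actual negatives: 100 in total

recall = tp / (tp + fn)          # 40 / 50 = 0.80 -> share of actual positives caught
specificity = tn / (tn + fp)     # 80 / 100 = 0.80 -> share of actual negatives caught
print(recall, specificity)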
Confusion Matrix
Suppose you are working for a bank and are responsible for assessing loan applications. You
have built a machine learning model that predicts whether an applicant is likely to default on
their loan or not. The model has been trained on historical data and has achieved an accuracy
of 85%.
To evaluate the performance of the model beyond its headline accuracy, you can use a confusion matrix. The matrix is
constructed by comparing the predicted values against the actual values, so that every application falls into one of
four cells: true positive, false positive, false negative, or true negative.
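For illustration, here is a minimal sketch of building such a matrix with scikit-learn; the labels below are hypothetical and merely stand in for the bank's historical data:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 0, 1, 0, 1, 0, 0]   # 1 = applicant defaulted, 0 = repaid
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]   # labels predicted by the model

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))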
Applications of the Confusion Matrix
The confusion matrix has applications across various fields:
Model Evaluation: The primary application of the confusion matrix is to
evaluate the performance of a classification model. It provides insights
into the model's accuracy, precision, recall, and F1-score.
Medical Diagnosis: The confusion matrix finds extensive use in medical
fields for diagnosing diseases based on tests or images. It aids in
quantifying the accuracy of diagnostic tests and identifying the balance
between false positives and false negatives.
Fraud Detection: Banks and financial institutions use confusion matrices
to evaluate models that flag fraudulent transactions, weighing how often
fraud is missed (false negatives) against how often legitimate
transactions are wrongly flagged (false positives).
Natural Language Processing (NLP): Confusion matrices are used to
evaluate NLP models for sentiment analysis, text classification, and named
entity recognition.
Customer Churn Prediction: Confusion matrices are used to evaluate
churn-prediction models, which learn from historical data to anticipate
and help mitigate customer attrition.
Image and Object Recognition: Confusion matrices help evaluate models
that identify objects in images, supporting technologies like self-
driving cars and facial recognition systems.
A/B Testing: A/B testing is crucial for optimizing user experiences.
Confusion matrices help analyze the results of A/B tests, enabling data-
driven decisions in user engagement strategies.
2. Threshold Curve–Based Metrics
Threshold curve–based metrics help visualize and evaluate classifier performance across
different decision thresholds. A decision threshold in machine learning is a cut-off value
used to convert predicted probabilities or scores into class labels.
How It Works
The threshold (often defaulted to 0.5) determines which class a data point is assigned to:
scores above the threshold are labeled positive, and scores below it are labeled negative.
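As a minimal sketch (with hypothetical scores), thresholding predicted probabilities looks like this:

scores = [0.10, 0.35, 0.48, 0.52, 0.73, 0.91]   # predicted probabilities of the positive class
threshold = 0.5                                  # default cut-off

# Scores at or above the cut-off become positive labels, the rest negative
labels = [1 if s >= threshold else 0 for s in scores]
print(labels)                                    # [0, 0, 0, 1, 1, 1]

Moving the threshold up or down changes which points are labeled positive; the curves below summarize performance across all such cut-offs.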
1. ROC Curve (Receiver Operating Characteristic):
The ROC curve plots the True Positive Rate (recall) on the y-axis against the False Positive
Rate on the x-axis as the decision threshold is varied. The Area Under the ROC Curve
(AUC-ROC) summarizes this curve: it is the probability that the model ranks a random
positive higher than a random negative.
Applications
ROC curves are widely used in fields like medical diagnostics, signal detection, and machine
learning to assess, select, and compare models independently of the underlying class
distribution or cost context.
2. Area Under the ROC Curve (AUC-ROC):
Area Under the ROC Curve (AUC-ROC) is a scalar metric that summarizes the overall
performance of a binary classifier across all possible decision thresholds. AUC-ROC is the
probability that a classifier will rank a randomly chosen positive example higher than a
randomly chosen negative example.
An AUC of 0.5 indicates no discrimination (equivalent to random guessing), while 1.0
indicates perfect discrimination; values below 0.5 mean the model's ranking is
systematically worse than chance.
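A minimal sketch with scikit-learn (hypothetical labels and scores): roc_curve gives the points of the curve, and roc_auc_score gives the area under it.

from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                      # actual labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]     # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one (FPR, TPR) point per threshold
auc_roc = roc_auc_score(y_true, y_score)            # fraction of (positive, negative) pairs ranked correctly
print(auc_roc)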
3. Area Under the Precision–Recall Curve (AUC-PR):
The Precision–Recall Curve (PRC) plots precision against recall at every decision threshold.
The x-axis represents Recall (also called sensitivity or True Positive Rate), which
measures the proportion of actual positives correctly identified by the model.
The y-axis represents Precision, which measures the proportion of predicted positives
that are actually correct.
Definition:
The area under the Precision–Recall Curve (PRC). It summarizes the trade-off between
precision and recall over all classification thresholds.
A single-number measure of how well the classifier balances precision and recall.
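A matching sketch for the PR curve, again with scikit-learn and hypothetical data; average_precision_score is a closely related single-number summary of the same curve:

from sklearn.metrics import auc, average_precision_score, precision_recall_curve

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
pr_auc = auc(recall, precision)                     # area under the precision-recall curve
ap = average_precision_score(y_true, y_score)       # step-wise summary of the same curve
print(pr_auc, ap)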
3. Probabilistic Metrics
Probabilistic metrics evaluate how well the predicted probabilities match the actual
outcomes. These are more informative than hard classification accuracy, especially when we
care about calibration and confidence.
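As an illustration, log loss and the Brier score are two commonly used probabilistic metrics (chosen here as examples; the section's own list may differ). Both compare predicted probabilities against the true outcomes:

from sklearn.metrics import brier_score_loss, log_loss

y_true = [0, 1, 1, 0, 1]
y_prob = [0.1, 0.9, 0.7, 0.3, 0.6]   # predicted probability of the positive class

print(log_loss(y_true, y_prob))          # heavily penalizes confident but wrong probabilities
print(brier_score_loss(y_true, y_prob))  # mean squared error between probabilities and outcomes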
4. Class-Imbalance–Sensitive Metrics