Confusion Matrix:
A confusion matrix provides a summary of the prediction results
in a classification problem. Correct and incorrect predictions are
summarized in a table with their counts, broken down by
each class.
Confusion Matrix for Binary Classification
Calculating a confusion matrix:
Let’s take an example:
We have a total of 10 cats and dogs and our model predicts
whether it is a cat or not.
Actual values = [‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’,
‘cat’, ‘dog’]
Predicted values = [‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘cat’,
‘cat’, ‘cat’]
Remember, we describe predicted values as Positive/Negative
and actual values as True/False.
Definition of the Terms:
True Positive: You predicted positive and it’s true. You predicted
that an animal is a cat and it actually is.
True Negative: You predicted negative and it's true. You
predicted that the animal is not a cat and it actually is not (it's a
dog).
False Positive (Type 1 Error): You predicted positive and it's
false. You predicted that the animal is a cat but it actually is not
(it's a dog).
False Negative (Type 2 Error): You predicted negative and it's
false. You predicted that the animal is not a cat but it actually is.
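To connect these definitions to the cat and dog example above, the four counts can be tallied directly from the two lists. A minimal Python sketch, treating 'cat' as the positive class:

actual = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

# Compare each actual/predicted pair against the definitions above ('cat' = positive).
tp = sum(a == 'cat' and p == 'cat' for a, p in zip(actual, predicted))  # predicted cat, actually cat
tn = sum(a == 'dog' and p == 'dog' for a, p in zip(actual, predicted))  # predicted dog, actually dog
fp = sum(a == 'dog' and p == 'cat' for a, p in zip(actual, predicted))  # predicted cat, actually dog
fn = sum(a == 'cat' and p == 'dog' for a, p in zip(actual, predicted))  # predicted dog, actually cat

print(tp, tn, fp, fn)  # 3 4 2 1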
Classification Accuracy:
Classification Accuracy is given by the relation:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Recall (aka Sensitivity):
Recall is defined as the ratio of the total number of correctly
classified positive classes divided by the total number of actual
positive classes. Or, out of all the actual positive classes, how
many we predicted correctly. Recall should be high.
Recall = TP / (TP + FN)
Precision:
Precision is defined as the ratio of the total number of correctly
classified positive classes divided by the total number of
predicted positive classes. Or, out of all the predicted positive
classes, how many we predicted correctly. Precision should be
high.
Precision = TP / (TP + FP)
Trick to remember: Precision has Predicted results in the
denominator.
F-score or F1-score:
It is difficult to compare two models with different Precision
and Recall, so to make them comparable we use the F-score. It is
the Harmonic Mean of Precision and Recall. Compared to the
Arithmetic Mean, the Harmonic Mean punishes extreme values
more. F-score should be high.
F-score = (2 * Precision * Recall) / (Precision + Recall)
Specificity:
Specificity determines the proportion of actual negatives that
are correctly identified.
Specificity = TN / (TN + FP)
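As a quick sketch, the formulas above translate directly into Python; the helper name classification_metrics below is only illustrative:

def classification_metrics(tp, tn, fp, fn):
    # Straightforward translations of the formulas above.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)          # out of actual positives, how many were found
    precision = tp / (tp + fp)       # out of predicted positives, how many were correct
    f_score = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)     # out of actual negatives, how many were found
    return accuracy, recall, precision, f_score, specificity

# Counts from the cat and dog example above (TP = 3, TN = 4, FP = 2, FN = 1).
print(classification_metrics(tp=3, tn=4, fp=2, fn=1))
# accuracy 0.70, recall 0.75, precision 0.60, F-score ~0.67, specificity ~0.67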
Example to interpret the confusion matrix:
Let's calculate the confusion matrix using the cat and dog
example above. Counting the four outcomes from the two lists gives:
True Positive (TP) = 3, True Negative (TN) = 4, False Positive (FP) = 2, False Negative (FN) = 1
Classification Accuracy:
Accuracy = (TP + TN) / (TP + TN + FP + FN) =
(3+4)/(3+4+2+1) = 0.70
Recall: Recall tells us, when the actual class is yes (a cat), how
often the model predicts yes.
Recall = TP / (TP + FN) = 3/(3+1) = 0.75
Precision: Precision tells us, when the model predicts yes (a cat),
how often it is correct.
Precision = TP / (TP + FP) = 3/(3+2) = 0.60
F-score:
F-score = (2*Recall*Precision)/(Recall+Precision) =
(2*0.75*0.60)/(0.75+0.60) = 0.67
Specificity:
Specificity = TN / (TN + FP) = 4/(4+2) = 0.67
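If scikit-learn is available, the same matrix and scores can be cross-checked with its built-in functions (a sketch; pos_label='cat' marks the positive class):

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

actual = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

# With labels=['dog', 'cat'] the layout is [[TN, FP], [FN, TP]] for 'cat' as positive.
print(confusion_matrix(actual, predicted, labels=['dog', 'cat']))  # [[4 2], [1 3]]
print(accuracy_score(actual, predicted))                           # 0.7
print(recall_score(actual, predicted, pos_label='cat'))            # 0.75
print(precision_score(actual, predicted, pos_label='cat'))         # 0.6
print(f1_score(actual, predicted, pos_label='cat'))                # ~0.67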
AUC-ROC Curve:
The AUC-ROC curve, or Area Under the Receiver Operating
Characteristic curve, is a graphical representation of the performance of a
binary classification model at various classification thresholds. It is
commonly used in machine learning to assess the ability of a model to
distinguish between two classes, typically the positive class (e.g.,
presence of a disease) and the negative class (e.g., absence of a
disease).
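As a rough sketch of how this is computed in practice, scikit-learn's roc_curve and roc_auc_score work from predicted probabilities rather than hard class labels; the labels and scores below are made-up illustrative values, not taken from the examples in this section:

from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical ground truth (1 = positive class) and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.50, 0.70]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
print(roc_auc_score(y_true, y_score))              # area under that curve, here 0.875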
Another example to interpret a confusion matrix:
Environmental scientists want to solve a two-class
classification problem: predicting whether a population
contains a specific genetic variant. They can use a confusion
matrix to see how often the machine learning classification
model they're analyzing confuses the two classes. Assuming
the scientists use 500 samples for their data analysis, a table
is constructed for the predicted and actual values before
calculating the confusion matrix.
                                    | Predicted without the variant | Predicted with the variant
Actual number without the variant   |                               |
Actual number with the variant      |                               |
Total predicted value               |                               |
After creating the matrix, the scientists analyze their
sample data. Assume the scientists predict that 350
test samples contain the genetic variant and 150
samples don't. If they determine that the actual number of
samples containing the variant is 305, then the actual
number of samples without the variant is 195. These
values become the "true" values in the matrix and the
scientists enter the data in the table:
                                        | Predicted without the variant | Predicted with the variant
Actual number without the variant = 195 | True negative = 45            | False positive = 150
Actual number with the variant = 305    | False negative = 105          | True positive = 200
Total predicted value                   | 150                           | 350
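Plugging the scientists' counts into the formulas from earlier gives an overall picture of this model; the sketch below simply reads the four values from the table above:

# From the table: TP = 200, TN = 45, FP = 150, FN = 105.
tp, tn, fp, fn = 200, 45, 150, 105

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 245 / 500 = 0.49
recall = tp / (tp + fn)                      # 200 / 305 ≈ 0.66
precision = tp / (tp + fp)                   # 200 / 350 ≈ 0.57
specificity = tn / (tn + fp)                 # 45 / 195  ≈ 0.23
print(accuracy, recall, precision, specificity)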