Learning Objectives
• Continuing the discussion on kNN
• Performance metrics for classification
• Significance of the different metrics
kNN: Classification
Effect of Outliers:
● Consider k=1.
● Sensitive to outliers: the decision boundary
changes drastically with outliers.
● Solution?
○ Increase k
kNN: Classification
Effect of k:
● Low k: overfitting, highly
unstable decision boundary
● Good k: smooth boundary, no
overfitting/underfitting
● High k: everything classified
as the most probable class
● How to find a good k?
[Figure: decision boundaries for k=1 and k=15]
Cross validation is our friend!
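To make this concrete, here is a minimal sketch (assumed, not the lecture's code) that picks k by 5-fold cross-validation over odd values of k, using scikit-learn's GridSearchCV on the Iris data:

```python
# A minimal sketch (assumed, not the lecture's code): choose k by 5-fold
# cross-validation over odd values of k.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 31, 2))},  # odd k values to try
    cv=5,                                                # 5-fold cross-validation
)
search.fit(X, y)
print(search.best_params_)  # the k with the best cross-validated accuracy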
kNN: Classification
What if we get the same number of votes
from both classes?
Potential solutions for tie-breaking:
● Take k odd
● Randomly select a class
● Use the class with the larger prior, as sketched below
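A minimal sketch of majority voting with prior-based tie-breaking; the helper name and the toy data are hypothetical, not from the slides:

```python
# A minimal sketch of kNN majority voting where a tie is broken by the
# class with the larger prior (estimated from the training labels).
from collections import Counter


def knn_vote(neighbor_labels, train_labels):
    votes = Counter(neighbor_labels)
    top = max(votes.values())
    tied = [c for c, v in votes.items() if v == top]
    if len(tied) == 1:
        return tied[0]
    # Tie: fall back to the class with the larger prior.
    priors = Counter(train_labels)
    return max(tied, key=lambda c: priors[c])


# Example: k=4 with a 2-2 tie between classes 0 and 1; the prior favours class 0.
train_labels = [0, 0, 0, 1, 1]
print(knn_vote([0, 1, 0, 1], train_labels))  # -> 0
```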
kNN: Classification
A probabilistic variant: probabilistic kNN
E.g. k=4, c=3 classes: if 3 of the 4 neighbours are of class y=1 and 1 is of class y=3, then
P = [3/4, 0, 1/4] over the classes y=1, y=2, y=3
kNN: Classification
A probabilistic variant: probabilistic kNN
Same example (k=4, c=3), now with pseudo-counts (add-one smoothing):
P = [(3+1)/(4+3), (0+1)/(4+3), (1+1)/(4+3)]
  = [4/7, 1/7, 2/7] over the classes y=1, y=2, y=3
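A minimal sketch (assumed, not from the slides) of the probabilistic kNN output, with and without add-one pseudo-counts; knn_class_probs is a hypothetical helper:

```python
# Class frequencies among the k neighbours, optionally smoothed with
# add-one pseudo-counts, as in the example above.
import numpy as np


def knn_class_probs(neighbor_labels, n_classes, pseudo_count=0):
    """Return P(y = c) for c = 0, ..., n_classes - 1."""
    counts = np.bincount(neighbor_labels, minlength=n_classes).astype(float)
    counts += pseudo_count
    return counts / counts.sum()


# Example from the slides: k=4, c=3, neighbour classes (0-indexed) = [0, 0, 0, 2]
print(knn_class_probs([0, 0, 0, 2], n_classes=3))                  # [0.75 0.   0.25]
print(knn_class_probs([0, 0, 0, 2], n_classes=3, pseudo_count=1))  # [4/7 1/7 2/7]
```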
kNN: Regression
A simple regression algorithm:
● Training examples {(x_i, y_i)}, i = 1, ..., n, where y_i is a continuous real-valued target
● Given a test input x:
● Find the distances from x to the n training examples using a distance metric
● Select the k closest training examples and their target values
● The output is the mean of the target values of the k neighbours
Can be used for interpolation.
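A minimal NumPy sketch of the procedure above, assuming Euclidean distance; the function and the toy data are hypothetical:

```python
# Predict the mean target of the k nearest training points.
import numpy as np


def knn_regress(X_train, y_train, x_query, k=3):
    # Euclidean distances from the query to all n training examples.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training examples.
    nearest = np.argsort(dists)[:k]
    # Output: mean of their target values.
    return y_train[nearest].mean()


# Toy 1-D example: targets follow y = 2x, query at x = 2.5.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
print(knn_regress(X_train, y_train, np.array([2.5]), k=2))  # -> 5.0
```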
kNN: Challenges
Computationally expensive:
● Need to store all training examples
● Need to compute distances to all n training examples at prediction time
There are ways to optimize kNN computation:
● Reduce dimensionality using dimensionality reduction techniques
● Reduce the number of comparisons:
○ k-d tree implementation
○ Locality-sensitive hashing
kNN: Computational Complexity
Brute force method
● Training time complexity: O(1)
● Training space complexity: O(1)
● Prediction time complexity: O(k * n * d)
● Prediction space complexity: O(1)
kNN: Computational Complexity
k-d tree method
● Training time complexity: O(d * n * log(n))
● Training space complexity: O(d * n)
● Prediction time complexity: O(k * log(n))
● Prediction space complexity: O(1)
kNN: k-d Tree
● A k-dimensional tree (k-d tree) is a tree data structure used to represent
points in a k-dimensional space (here k is the dimensionality of the data, not the number of neighbours).
● Used for applications such as nearest-point search (in k-dimensional space), efficient
storage of spatial data, range search, etc.
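A short sketch (assumed, not the lecture's code) that builds a k-d tree with SciPy's cKDTree and queries the 3 nearest neighbours of a test point:

```python
# Build the tree once at "training" time, then query the k nearest
# neighbours of a test point without scanning all training points.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))      # n = 10,000 points in d = 3 dimensions

tree = cKDTree(X_train)                # O(d * n * log n) construction

x_query = rng.random(3)
dists, idx = tree.query(x_query, k=3)  # 3 nearest neighbours of the query
print(idx, dists)
```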
kNN: k-d Tree
Example:
[Figure: k-d tree example]
kNN: Computational Complexity
The more “traditional” application of kNN is the classification of data, which
often involves quite a lot of points; e.g., MNIST has 60k training images and 10k test
images. Classification is done offline: we first do the training
phase, then just use the results during prediction. Therefore, if we want to
construct the data structure, we only need to do so once. For the 10k test images,
let's compare the brute-force approach (which calculates all distances every time) with the
k-d tree for 3 neighbours.
kNN: Computational Complexity
● Brute force, O(k * n): 3 * 10,000 = 30,000
● k-d tree, O(k * log(n)): 3 * log2(10,000) ≈ 3 * 13 = 39
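To make the comparison concrete, here is a rough sketch (assumed setup, not the lecture's benchmark) using scikit-learn's KNeighborsClassifier with its brute-force and k-d tree back-ends on random low-dimensional data; the sizes are arbitrary:

```python
# The k-d tree pays its O(d * n * log n) build cost once and is then
# cheaper per query than brute force for low-dimensional data.
import time

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))
y_train = rng.integers(0, 2, size=10_000)
X_test = rng.random((1_000, 3))

for algorithm in ("brute", "kd_tree"):
    clf = KNeighborsClassifier(n_neighbors=3, algorithm=algorithm)
    clf.fit(X_train, y_train)           # "training": stores data / builds the tree
    start = time.perf_counter()
    clf.predict(X_test)
    print(algorithm, f"{time.perf_counter() - start:.3f}s")
```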
Classification Metrics
How to measure the performance of a classification model?
Classification Metrics
Most widely used metrics and tools to assess classification models:
● Confusion matrix
● Accuracy
● Precision/Recall/F1-score
● Area under the ROC curve
Classification Metrics
Confusion Matrix
A table to summarize how successful the classification model is at
predicting examples belonging to various classes.
Classification Metrics
Confusion Matrix
E.g., for binary classification, a model predicts one of two classes, "spam" and
"not_spam", for a given email.

                        prediction
                        spam                  not_spam
actual   spam           True Positive (TP)    False Negative (FN)
         not_spam       False Positive (FP)   True Negative (TN)
Classification Metrics
Confusion Matrix
Exercise 1: Consider a cricket tournament. Map each of the following
statements to one of TP, FN, FP, TN:
1. You had predicted that India would win and it won.
2. You had predicted that England would not win and it lost.
3. You had predicted that England would win, but it lost.
4. You had predicted that India would not win, but it won.
Classification Metrics
Confusion Matrix
Exercise 2:

                        prediction
                        1        0
actual   1              TP=?     FN=?
         0              FP=?     TN=?
Classification Metrics
Confusion Matrix
Exercise 2:

                        prediction
                        1        0
actual   1              TP=6     FN=2
         0              FP=1     TN=3
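As a check, a minimal sketch using scikit-learn's confusion_matrix, with hypothetical labels chosen to reproduce the counts above (TP=6, FN=2, FP=1, TN=3):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # 8 positives, 4 negatives
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]   # 6 TP, 2 FN, 1 FP, 3 TN

# labels=[1, 0] lays the matrix out as in the slides:
# rows = actual (1, 0), columns = predicted (1, 0)
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[6 2]
#  [1 3]]
```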
Classification Metrics
Confusion Matrix
Multiclass Classification: e.g., emotion classification
[Empty 6x6 confusion matrix: rows = actual emotion, columns = predicted emotion
(Happy, Sad, Angry, Surprise, Disgust, Neutral)]
Classification Metrics
Accuracy
Accuracy is given by the number of correctly classified examples divided by the
total number of classified examples.
Acc = (TP + TN) / (TP + TN + FP + FN)

                        prediction
                        spam                  not_spam
actual   spam           True Positive (TP)    False Negative (FN)
         not_spam       False Positive (FP)   True Negative (TN)
Classification Metrics
Accuracy
Accuracy is given by the number of correctly classified examples divided by the
total number of classified examples.

                        prediction
                        1        0
actual   1              TP=6     FN=2
         0              FP=1     TN=3

Accuracy = ?
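A quick worked check, assuming the counts from the confusion matrix above:

```python
TP, FN, FP, TN = 6, 2, 1, 3
accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 9 / 12 = 0.75
```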
Classification Metrics
Precision
Precision is the ratio of correct positive predictions to the overall number of
positive predictions.

Precision = TP / (TP + FP)

Precision matters when FP is costly!

                        prediction
                        spam                  not_spam
actual   spam           True Positive (TP)    False Negative (FN)
         not_spam       False Positive (FP)   True Negative (TN)
Classification Metrics
Recall
Recall is the ratio of correct positive predictions to the overall number of positive
examples.

Recall = TP / (TP + FN)

Recall matters when FN is costly!

                        prediction
                        spam                  not_spam
actual   spam           True Positive (TP)    False Negative (FN)
         not_spam       False Positive (FP)   True Negative (TN)
Classification Metrics
F1-Score
● The standard F1-score is the harmonic mean of precision and recall:
F1 = 2 * Precision * Recall / (Precision + Recall)
● Best of both worlds
● A perfect model has an F1-score of 1.
● Use it when FP and FN are both costly!
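A small illustration (with hypothetical precision/recall values) of why the harmonic mean is used: F1 stays low unless both precision and recall are high.

```python
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f1(0.9, 0.9))  # 0.90 - both high -> high F1
print(f1(0.9, 0.1))  # 0.18 - the arithmetic mean would be 0.50; F1 punishes the low recall
```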
Classification Metrics
Visualizing Precision/Recall
[Figure: visualization of precision and recall (source: Wikipedia)]
Classification Metrics
Precision/Recall/F1-score
                        prediction
                        1        0
actual   1              TP=6     FN=2
         0              FP=1     TN=3

Precision = ?
Recall = ?
F1-score = ?
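A worked check of the exercise, assuming the counts above:

```python
TP, FN, FP, TN = 6, 2, 1, 3

precision = TP / (TP + FP)                          # 6 / 7 ~ 0.857
recall = TP / (TP + FN)                             # 6 / 8 = 0.750
f1 = 2 * precision * recall / (precision + recall)  # = 0.800

print(precision, recall, f1)
```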
Classification Metrics
Examples: it all depends on the problem!
Diagnosis of cancer.

                        prediction
                        cancer    no_cancer
actual   cancer         Perfect   X
         no_cancer      OK        Perfect

This matters in medical cases: it doesn't matter much
whether we raise a false alarm, but the actual positive cases
should not go undetected!
What metric would you pick?
Classification Metrics
Examples: it all depends on the problem!
Diagnosis of cancer: missed positive cases (FN) are costly, so pick recall.

Recall = TP / (TP + FN)
Classification Metrics
Examples: it all depends on the problem!
Detecting whether an email is spam or not spam.

                        prediction
                        spam      no_spam
actual   spam           Perfect   OK
         no_spam        X         Perfect

For email it is more important not to lose any important email
to the spam folder than to occasionally receive a spam email as no_spam.
What metric would you pick?
Classification Metrics
Examples: it all depends on the problem!
Spam detection: false alarms (FP) are costly, so pick precision.

Precision = TP / (TP + FP)
Classification Metrics
Multiclass Classification
[6x6 confusion matrix: rows = actual emotion, columns = predicted emotion
(Happy, Sad, Angry, Surprise, Disgust, Neutral)]
● Can you define recall (Happy)?
● Can you define precision (Happy)?
Classification Metrics
Multiclass Classification
recall(Happy) = (# examples correctly predicted as Happy) / (# examples actually Happy),
i.e. the (Happy, Happy) diagonal cell divided by the sum of the Happy row.
Classification Metrics
Multiclass Classification
precision(Happy) = (# examples correctly predicted as Happy) / (# examples predicted as Happy),
i.e. the (Happy, Happy) diagonal cell divided by the sum of the Happy column.
Classification Metrics
Multiclass Classification
Can you define accuracy?
Classification Metrics
Multiclass Classification
Accuracy = sum of the diagonal cells divided by the sum of all cells of the confusion matrix.
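A minimal sketch (with hypothetical labels, not the lecture's data) of per-class precision/recall and overall accuracy for a multiclass problem, using scikit-learn's classification_report:

```python
from sklearn.metrics import accuracy_score, classification_report

classes = ["Happy", "Sad", "Angry"]                 # reduced label set for brevity
y_true = ["Happy", "Happy", "Sad", "Angry", "Sad", "Happy"]
y_pred = ["Happy", "Sad",   "Sad", "Angry", "Sad", "Happy"]

# Per-class precision, recall and F1-score.
print(classification_report(y_true, y_pred, labels=classes))
print("accuracy:", accuracy_score(y_true, y_pred))  # diagonal sum / total = 5/6
```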
Classification Metrics
Area under the ROC Curve (AUC)
● The ROC curve (ROC stands for "receiver operating characteristic", a term
from radar engineering: the method was originally developed for operators of
military radar receivers starting in 1941, which led to its name) is
a commonly used method to assess the performance of binary classification
models.
● ROC curves use a combination of:
(1) the true positive rate (the proportion of positive examples predicted correctly,
defined exactly as recall) and
(2) the false positive rate (the proportion of negative examples predicted
incorrectly)
to build up a summary picture of the classification performance.
Classification Metrics
Area under the ROC Curve (AUC)

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
Classification Metrics
Area under the ROC Curve (AUC)

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)

Sensitivity / Recall = TPR
Specificity = 1 - FPR = TN / (TN + FP)

                        prediction
                        spam                  not_spam
actual   spam           True Positive (TP)    False Negative (FN)
         not_spam       False Positive (FP)   True Negative (TN)
Classification Metrics
Area under the ROC Curve (AUC)
[Figure: ROC curve, with TPR = TP / (TP + FN) on the y-axis and FPR = FP / (FP + TN) on the x-axis]
Classification Metrics
Area under the ROC Curve (AUC)
● Many classification models use a threshold to turn
scores into class predictions
● Typically models that give a probabilistic output score
● The ROC curve is traced out by sweeping this threshold
Classification Metrics
Area under the ROC Curve (AUC)
● To compare different classifiers, it can
be useful to summarize the
performance of each classifier into a
single measure.
● One common approach is to
calculate the area under the ROC
curve, which is abbreviated to AUC.
Classification Metrics
Area under the ROC Curve (AUC)
● AUC ranges in value from 0 to 1
● A model whose predictions are 100% wrong has an AUC of 0.0
● One whose predictions are 100% correct has an AUC of 1.0
● AUC is classification-threshold-invariant and therefore suitable for
comparing models
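A minimal sketch (with hypothetical scores, not the lecture's data) of computing the ROC curve and AUC in scikit-learn by sweeping the decision threshold:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 0, 0, 1, 1, 1, 1]                    # actual labels
y_score = [0.1, 0.3, 0.6, 0.2, 0.8, 0.7, 0.4, 0.9]   # model's probabilistic scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)    # one (FPR, TPR) point per threshold
print(list(zip(fpr, tpr)))
print("AUC:", roc_auc_score(y_true, y_score))
```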
Classification Metrics
Area under the ROC Curve (AUC)
                        prediction
                        spam      not_spam
actual   spam           10        0
         not_spam       10        0

All predictions say "spam".
(1) TPR = ?
(2) FPR = ?
(3) Where is the point on the ROC curve?
Classification Metrics
Area under the ROC Curve (AUC)
                        prediction
                        spam      not_spam
actual   spam           0         10
         not_spam       0         10

All predictions say "not_spam".
(1) TPR = ?
(2) FPR = ?
(3) Where is the point on the ROC curve?
Classification Metrics
Area under the ROC Curve (AUC)
                        prediction
                        spam      not_spam
actual   spam           10        0
         not_spam       0         10

All predictions are perfect.
(1) TPR = ?
(2) FPR = ?
(3) Where is the point on the ROC curve?
Classification Metrics
Area under the ROC Curve (AUC)
                        prediction
                        spam      not_spam
actual   spam           5         5
         not_spam       5         5

Some random predictions.
(1) TPR = ?
(2) FPR = ?
(3) Where is the point on the ROC curve?
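A quick worked check of the four cases above, using the counts from the tables:

```python
def roc_point(TP, FN, FP, TN):
    return FP / (FP + TN), TP / (TP + FN)        # (FPR, TPR)

print(roc_point(TP=10, FN=0, FP=10, TN=0))   # all "spam"     -> (1.0, 1.0)
print(roc_point(TP=0, FN=10, FP=0, TN=10))   # all "not_spam" -> (0.0, 0.0)
print(roc_point(TP=10, FN=0, FP=0, TN=10))   # perfect        -> (0.0, 1.0), top-left corner
print(roc_point(TP=5, FN=5, FP=5, TN=5))     # random         -> (0.5, 0.5), on the diagonal
```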