P04 EvaluationKNN SolutionNotes

Aprendizagem 2023

Lab 4: kNN and Evaluation

Practical exercises

Consider the following data:

              input          output
            y1     y2      y3     y4
  𝐱1         1      1       A     1.4
  𝐱2         2      1       B     0.5
  𝐱3         2      3       B     2
  𝐱4         3      3       B     2.2
  𝐱5         1      0       A     0.7
  𝐱6         1      4       A     1.2

1. Assuming a k-nearest neighbors (kNN) model with k=3 applied within a leave-one-out scheme:


a) Let 𝑦3 be the output variable (categorical). Classify 𝐱1 considering uniform
weights and:
i. Euclidean (l2) distance (real input variables)

  ‖𝐱𝑖 − 𝐱𝑗‖2    𝐱1    𝐱2    𝐱3    𝐱4    𝐱5    𝐱6
      𝐱1        -     1     √5    √8    1     3

𝑧̂1 = 𝑚𝑜𝑑𝑒(𝐵, 𝐵, 𝐴) = 𝐵
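This leave-one-out classification can be checked with a short Python sketch (numpy assumed available; the helper `knn_classify_loo` is illustrative, not part of the lab):

```python
import numpy as np
from collections import Counter

# Dataset from the exercise: inputs (y1, y2) and the categorical output y3.
X = np.array([[1, 1], [2, 1], [2, 3], [3, 3], [1, 0], [1, 4]], dtype=float)
y3 = np.array(["A", "B", "B", "B", "A", "A"])

def knn_classify_loo(X, y, i, k=3):
    """Classify observation i with uniform-weight kNN, leaving it out."""
    d = np.linalg.norm(X - X[i], axis=1)   # Euclidean (l2) distances
    d[i] = np.inf                          # leave-one-out: exclude x_i itself
    nn = np.argsort(d)[:k]                 # indices of the k nearest neighbours
    return Counter(y[nn]).most_common(1)[0][0]

print(knn_classify_loo(X, y3, i=0))  # mode(B, A, B) = B
```

The three nearest neighbours of 𝐱1 are 𝐱2 and 𝐱5 at distance 1 and 𝐱3 at distance √5, matching the table above.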

ii. Hamming distance (categorical input variables)

H(𝐱𝑖 , 𝐱𝑗 ) 𝐱1 𝐱2 𝐱3 𝐱4 𝐱5 𝐱6
𝐱1 - 1 2 2 1 1

𝑧̂1 = 𝑚𝑜𝑑𝑒(𝐵, 𝐴, 𝐴) = 𝐴
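The Hamming variant only changes the distance: count mismatching attributes instead of taking the l2 norm (a numpy sketch, with a stable sort so the three ties at distance 1 are picked):

```python
import numpy as np
from collections import Counter

# Same dataset, treating the inputs as categorical and using
# Hamming distance (number of mismatching attributes).
X = np.array([[1, 1], [2, 1], [2, 3], [3, 3], [1, 0], [1, 4]])
y3 = np.array(["A", "B", "B", "B", "A", "A"])

d = (X != X[0]).sum(axis=1)            # Hamming distance of every row to x1
d[0] = X.shape[1] + 1                  # leave-one-out: push x1 past any real distance
nn = np.argsort(d, kind="stable")[:3]  # x2, x5, x6 are all at distance 1
pred = Counter(y3[nn]).most_common(1)[0][0]
print(pred)  # mode(B, A, A) = A
```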

b) Let 𝑦4 be the output variable (numeric). Considering cosine similarity, provide the mean
regression estimate for 𝐱1

cos 𝐱1 𝐱2 𝐱3 𝐱4 𝐱5 𝐱6
𝐱1 - 0.95 0.98 1 0.70 0.86

𝑧̂1 = 𝑚𝑒𝑎𝑛(0.5, 2, 2.2) = 1.5(6) ≈ 1.567
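The regression estimate can be verified numerically; note that with cosine similarity the k neighbours are the most *similar* ones, so we sort in descending order (numpy sketch):

```python
import numpy as np

X = np.array([[1, 1], [2, 1], [2, 3], [3, 3], [1, 0], [1, 4]], dtype=float)
y4 = np.array([1.4, 0.5, 2.0, 2.2, 0.7, 1.2])

# Cosine similarity of every observation to x1 (larger = more similar).
sim = X @ X[0] / (np.linalg.norm(X, axis=1) * np.linalg.norm(X[0]))
sim[0] = -np.inf                       # leave-one-out
nn = np.argsort(sim)[::-1][:3]         # three most similar: x4, x3, x2
est = y4[nn].mean()
print(round(est, 3))  # mean(2.2, 2, 0.5) = 1.567
```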

c) Considering a distance-weighted kNN with Euclidean (l2) distance, identify:


i. the weighted mode estimate of 𝐱1 for the 𝑦3 outcome
  ‖𝐱𝑖 − 𝐱𝑗‖2    𝐱1    𝐱2    𝐱3    𝐱4    𝐱5    𝐱6
      𝐱1        -     1     √5    √8    1     3

𝑧̂1 = weighted_mode(1 × A, (1/1 + 1/√5) × B) = weighted_mode(1 × A, 1.45 × B) = B
ii. the weighted mean estimate of 𝐱1 for the 𝑦4 outcome
𝑧̂1 = ((1/1) × 0.5 + (1/√5) × 2 + (1/1) × 0.7) / (1/1 + 1/√5 + 1/1) ≈ 0.86
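Both weighted estimates can be reproduced with inverse-distance weights (a numpy sketch; `weighted_mode` is implemented here with a plain vote accumulator):

```python
import numpy as np
from collections import defaultdict

X = np.array([[1, 1], [2, 1], [2, 3], [3, 3], [1, 0], [1, 4]], dtype=float)
y3 = np.array(["A", "B", "B", "B", "A", "A"])
y4 = np.array([1.4, 0.5, 2.0, 2.2, 0.7, 1.2])

d = np.linalg.norm(X - X[0], axis=1)
d[0] = np.inf                          # leave-one-out
nn = np.argsort(d)[:3]                 # x2, x5 (d=1) and x3 (d=sqrt(5))
w = 1.0 / d[nn]                        # inverse-distance weights

# Weighted mode for the categorical outcome y3.
votes = defaultdict(float)
for j, wj in zip(nn, w):
    votes[y3[j]] += wj                 # A: 1.0 vs B: 1 + 1/sqrt(5) ~ 1.45
mode_est = max(votes, key=votes.get)

# Weighted mean for the numeric outcome y4.
mean_est = (w * y4[nn]).sum() / w.sum()

print(mode_est, round(mean_est, 2))  # B 0.86
```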

2. Let 𝑥𝑗 be the measurement on variable 𝑦𝑗 for observation 𝐱.


Given the learnt regression model 𝑥̂4 = 1 − 0.8𝑥1 + 0.2𝑥2² + 0.2𝑥1𝑥2:
a) Compute the 𝑦4 regression estimates for the observations of the aforementioned dataset
𝒛̂ = (0.6 0 2.4 2.2 0.2 4.2)

b) Compute the training Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE)

𝒛 − 𝒛̂ = (0.8 0.5 -0.4 0 0.5 -3)


𝑀𝐴𝐸 = 0.8(6), 𝑅𝑀𝑆𝐸 = 1.31
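The estimates and both error measures can be checked numerically (a numpy sketch of the model from the question):

```python
import numpy as np

X = np.array([[1, 1], [2, 1], [2, 3], [3, 3], [1, 0], [1, 4]], dtype=float)
z = np.array([1.4, 0.5, 2.0, 2.2, 0.7, 1.2])        # true y4 values

x1, x2 = X[:, 0], X[:, 1]
z_hat = 1 - 0.8 * x1 + 0.2 * x2**2 + 0.2 * x1 * x2  # learnt regression model

mae = np.abs(z - z_hat).mean()
rmse = np.sqrt(((z - z_hat) ** 2).mean())
print(z_hat)                           # [0.6 0. 2.4 2.2 0.2 4.2]
print(round(mae, 3), round(rmse, 2))   # 0.867 1.31
```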

c) Perform a residue analysis to assess the presence of systemic biases against 𝑦1 and 𝑦2

[Scatter plots of the residues (𝒛 − 𝒛̂) against y1 (top) and y2 (bottom).]
There is no evidence of a bias with respect to y1. However, as the residues appear to be correlated
with y2, we can hypothesize that the learnt regressor is moderately biased against y2 for the given data.
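The visual assessment can be complemented by quantifying the linear association between the residues and each input variable (a numpy sketch; Pearson correlation is one possible choice, not the lab's prescribed method):

```python
import numpy as np

y1 = np.array([1, 2, 2, 3, 1, 1], dtype=float)
y2 = np.array([1, 1, 3, 3, 0, 4], dtype=float)
res = np.array([0.8, 0.5, -0.4, 0.0, 0.5, -3.0])  # z - z_hat from 2b)

# Pearson correlation of the residues against each input variable:
# a value near 0 suggests no systemic bias.
r1 = np.corrcoef(y1, res)[0, 1]
r2 = np.corrcoef(y2, res)[0, 1]
print(round(r1, 2), round(r2, 2))  # prints 0.2 -0.8
```

The weak correlation against y1 and the strong negative correlation against y2 agree with the hypothesis above.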

3. [optional] Consider the probabilistic outcome of a classifier for the given six observations to be
𝐩(𝑦3 = 𝐴 | 𝐱) = [𝑝(𝑦3 = 𝐴 | 𝐱 𝟏 ), … , 𝑝(𝑦3 = 𝐴 | 𝐱 𝟔 )] = [0.45 0.4 0.3 0.6 0.8 0.4]
a) Draw the training ROC curve

  z    ẑ       θ>0    θ>0.3   θ>0.4   θ>0.45   θ>0.6   θ>0.8
  1    0.45    TP     TP      TP      FN       FN      FN
  0    0.4     FP     FP      TN      TN       TN      TN
  0    0.3     FP     TN      TN      TN       TN      TN
  0    0.6     FP     FP      FP      FP       TN      TN
  1    0.8     TP     TP      TP      TP       TP      FN
  1    0.4     TP     TP      FN      FN       FN      FN
  FPR=FP/N     1.00   0.67    0.33    0.33     0.00    0.00
  TPR=TP/P     1.00   1.00    0.67    0.33     0.33    0.00
  F1           2/3    0.75    2/3     0.4      0.5     NA
[ROC curve: TPR against FPR, with the operating points labelled θ>0, θ>0.3, θ>0.4, θ>0.45, θ>0.6 and θ>0.8.]

b) Compute the training AUC

𝐴𝑈𝐶 = (1/3 × 1/3) + (1/3 × 2/3 + 1/2 × 1/3 × 1/3) + (1/3 × 1) = 13/18 ≈ 0.72
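The (FPR, TPR) operating points and the area under the step curve can be reproduced with a short sketch (plain numpy; the trapezoidal integration is written out explicitly):

```python
import numpy as np

z = np.array([1, 0, 0, 0, 1, 1])               # y3 = A encoded as 1
p = np.array([0.45, 0.4, 0.3, 0.6, 0.8, 0.4])  # p(y3 = A | x)

# Sweep every threshold at which the predictions change.
points = []
for t in [0.0, 0.3, 0.4, 0.45, 0.6, 0.8]:
    pred = (p > t).astype(int)
    tp = ((pred == 1) & (z == 1)).sum()
    fp = ((pred == 1) & (z == 0)).sum()
    points.append((fp / (z == 0).sum(), tp / (z == 1).sum()))  # (FPR, TPR)

# Sort by FPR and integrate with the trapezoidal rule.
pts = sorted(points)
auc = 0.0
for (f0, t0), (f1, t1) in zip(pts, pts[1:]):
    auc += (f1 - f0) * (t0 + t1) / 2
print(round(auc, 2))  # 13/18 ~ 0.72
```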

c) Would you change the default 0.5 probability threshold for this classifier in order to maximize
training F1?
Yes, training F1 is maximal when the probability threshold 𝜃 ∈ ]0.3, 0.4]
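Sweeping the threshold over the training F1 confirms this (a numpy sketch; `f1_at` is an illustrative helper):

```python
import numpy as np

z = np.array([1, 0, 0, 0, 1, 1])               # y3 = A encoded as 1
p = np.array([0.45, 0.4, 0.3, 0.6, 0.8, 0.4])  # p(y3 = A | x)

def f1_at(t):
    """Training F1 when predicting A whenever p > t."""
    pred = (p > t).astype(int)
    tp = ((pred == 1) & (z == 1)).sum()
    fp = ((pred == 1) & (z == 0)).sum()
    fn = ((pred == 0) & (z == 1)).sum()
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

for t in [0.0, 0.3, 0.4, 0.45, 0.5, 0.6]:
    print(t, round(f1_at(t), 3))
# F1 peaks at 0.75 for thresholds in ]0.3, 0.4], above the 0.4 reached at the default 0.5
```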

Programming quest

1. Consider the accuracy estimates collected under a 5-fold CV for two predictive models M1 and
M2, accM1=(0.7,0.5,0.55,0.55,0.6) and accM2=(0.75,0.6,0.6,0.65,0.55).
Using scipy, assess whether the differences in predictive accuracy are statistically significant.
Resource: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html
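Since both models are evaluated on the same fold partition, the accuracies are paired and `scipy.stats.ttest_rel` applies directly (a minimal sketch, assuming scipy is installed):

```python
from scipy import stats

# Per-fold accuracies of the two models under the same 5-fold CV partition,
# so a paired (related-samples) t-test is appropriate.
acc_m1 = [0.7, 0.5, 0.55, 0.55, 0.6]
acc_m2 = [0.75, 0.6, 0.6, 0.65, 0.55]

t_stat, p_value = stats.ttest_rel(acc_m1, acc_m2)
print(round(t_stat, 3), round(p_value, 3))
# p > 0.05: the observed accuracy differences are not statistically significant
```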

2. Consider the housing dataset available at https://web.ist.utl.pt/~rmch/dscience/data/housing.arff
and the Regression notebook available at the course's webpage. Using a 10-fold cross-validation:
a) Assess the MAE of a kNN regressor for 𝑘 ∈ {1,5,9} (remaining parameters as default)
b) Compare the RMSE of the default kNN and decision tree regressors
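Both parts follow the same `cross_val_score` pattern. The sketch below uses a synthetic dataset (`make_regression`) as a stand-in, since loading housing.arff requires a download (it would be read with `scipy.io.arff.loadarff`); only the data source is an assumption, the CV protocol is the one asked for:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data; shapes and protocol are the same.
X, y = make_regression(n_samples=200, n_features=8, noise=10, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# a) MAE of a kNN regressor for k in {1, 5, 9}, remaining parameters as default.
for k in (1, 5, 9):
    scores = cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y,
                             cv=cv, scoring="neg_mean_absolute_error")
    print(f"kNN k={k}: MAE={-scores.mean():.2f}")

# b) RMSE of the default kNN and decision tree regressors.
for model in (KNeighborsRegressor(), DecisionTreeRegressor(random_state=0)):
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_root_mean_squared_error")
    print(f"{type(model).__name__}: RMSE={-scores.mean():.2f}")
```

sklearn scorers follow a "greater is better" convention, hence the negated error metrics.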
