P04 EvaluationKNN SolutionNotes

Aprendizagem 2023

Lab 4: kNN and Evaluation

Practical exercises

Consider the following data:

              input          output
            y1     y2      y3     y4
  𝐱1         1      1       A     1.4
  𝐱2         2      1       B     0.5
  𝐱3         2      3       B     2
  𝐱4         3      3       B     2.2
  𝐱5         1      0       A     0.7
  𝐱6         1      4       A     1.2

1. Assuming a k-nearest neighbors (kNN) model with k=3 applied within a leave-one-out scheme:


a) Let 𝑦3 be the output variable (categorical). Classify 𝐱1 considering uniform
weights and:
i. Euclidean (l2) distance (real input variables)

  ‖𝐱𝑖 − 𝐱𝑗‖2    𝐱1    𝐱2    𝐱3    𝐱4    𝐱5    𝐱6
      𝐱1        -     1     √5    √8    1     3

𝑧̂1 = 𝑚𝑜𝑑𝑒(𝐵, 𝐵, 𝐴) = 𝐵
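This leave-one-out classification can be checked with a short Python sketch (numpy assumed available; the helper `knn_classify_loo` is illustrative, not part of the lab):

```python
import numpy as np
from collections import Counter

# Dataset from the exercise: inputs (y1, y2) and the categorical output y3.
X = np.array([[1, 1], [2, 1], [2, 3], [3, 3], [1, 0], [1, 4]], dtype=float)
y3 = np.array(["A", "B", "B", "B", "A", "A"])

def knn_classify_loo(X, y, i, k=3):
    """Classify observation i with uniform-weight kNN, leaving it out."""
    d = np.linalg.norm(X - X[i], axis=1)   # Euclidean (l2) distances
    d[i] = np.inf                          # leave-one-out: exclude x_i itself
    nn = np.argsort(d)[:k]                 # indices of the k nearest neighbours
    return Counter(y[nn]).most_common(1)[0][0]

print(knn_classify_loo(X, y3, i=0))  # mode(B, A, B) = B
```

The three nearest neighbours of 𝐱1 are 𝐱2 and 𝐱5 at distance 1 and 𝐱3 at distance √5, matching the table above.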

ii. Hamming distance (categorical input variables)

H(𝐱𝑖 , 𝐱𝑗 ) 𝐱1 𝐱2 𝐱3 𝐱4 𝐱5 𝐱6
𝐱1 - 1 2 2 1 1

𝑧̂1 = 𝑚𝑜𝑑𝑒(𝐵, 𝐴, 𝐴) = 𝐴
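The Hamming variant only changes the distance: count mismatching attributes instead of taking the l2 norm (a numpy sketch, with a stable sort so the three ties at distance 1 are picked):

```python
import numpy as np
from collections import Counter

# Same dataset, treating the inputs as categorical and using
# Hamming distance (number of mismatching attributes).
X = np.array([[1, 1], [2, 1], [2, 3], [3, 3], [1, 0], [1, 4]])
y3 = np.array(["A", "B", "B", "B", "A", "A"])

d = (X != X[0]).sum(axis=1)            # Hamming distance of every row to x1
d[0] = X.shape[1] + 1                  # leave-one-out: push x1 past any real distance
nn = np.argsort(d, kind="stable")[:3]  # x2, x5, x6 are all at distance 1
pred = Counter(y3[nn]).most_common(1)[0][0]
print(pred)  # mode(B, A, A) = A
```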

b) Let 𝑦4 be the output variable (numeric). Considering cosine similarity, provide the mean
regression estimate for 𝐱1

cos 𝐱1 𝐱2 𝐱3 𝐱4 𝐱5 𝐱6
𝐱1 - 0.95 0.98 1 0.70 0.86

𝑧̂1 = 𝑚𝑒𝑎𝑛(0.5, 2, 2.2) = 1.5(6) ≈ 1.567
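The regression estimate can be verified numerically; note that with cosine similarity the k neighbours are the most *similar* ones, so we sort in descending order (numpy sketch):

```python
import numpy as np

X = np.array([[1, 1], [2, 1], [2, 3], [3, 3], [1, 0], [1, 4]], dtype=float)
y4 = np.array([1.4, 0.5, 2.0, 2.2, 0.7, 1.2])

# Cosine similarity of every observation to x1 (larger = more similar).
sim = X @ X[0] / (np.linalg.norm(X, axis=1) * np.linalg.norm(X[0]))
sim[0] = -np.inf                       # leave-one-out
nn = np.argsort(sim)[::-1][:3]         # three most similar: x4, x3, x2
est = y4[nn].mean()
print(round(est, 3))  # mean(2.2, 2, 0.5) = 1.567
```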

c) Considering a distance-weighted kNN with Euclidean (l2) distance, identify:


i. the weighted mode estimate of 𝐱1 for the 𝑦3 outcome
  ‖𝐱𝑖 − 𝐱𝑗‖2    𝐱1    𝐱2    𝐱3    𝐱4    𝐱5    𝐱6
      𝐱1        -     1     √5    √8    1     3

𝑧̂1 = weighted_mode(1 × A, (1/1 + 1/√5) × B) = weighted_mode(1 × A, 1.45 × B) = B
ii. the weighted mean estimate of 𝐱1 for the 𝑦4 outcome
𝑧̂1 = ((1/1) × 0.5 + (1/√5) × 2 + (1/1) × 0.7) / (1/1 + 1/√5 + 1/1) ≈ 0.86
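Both weighted estimates can be reproduced with inverse-distance weights (a numpy sketch; `weighted_mode` is implemented here with a plain vote accumulator):

```python
import numpy as np
from collections import defaultdict

X = np.array([[1, 1], [2, 1], [2, 3], [3, 3], [1, 0], [1, 4]], dtype=float)
y3 = np.array(["A", "B", "B", "B", "A", "A"])
y4 = np.array([1.4, 0.5, 2.0, 2.2, 0.7, 1.2])

d = np.linalg.norm(X - X[0], axis=1)
d[0] = np.inf                          # leave-one-out
nn = np.argsort(d)[:3]                 # x2, x5 (d=1) and x3 (d=sqrt(5))
w = 1.0 / d[nn]                        # inverse-distance weights

# Weighted mode for the categorical outcome y3.
votes = defaultdict(float)
for j, wj in zip(nn, w):
    votes[y3[j]] += wj                 # A: 1.0 vs B: 1 + 1/sqrt(5) ~ 1.45
mode_est = max(votes, key=votes.get)

# Weighted mean for the numeric outcome y4.
mean_est = (w * y4[nn]).sum() / w.sum()

print(mode_est, round(mean_est, 2))  # B 0.86
```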

2. Let 𝑥𝑗 be the measurement on variable 𝑦𝑗 for observation 𝐱.


Given the learnt regression model 𝑥̂4 = 1 − 0.8𝑥1 + 0.2𝑥2² + 0.2𝑥1𝑥2:
a) Compute the 𝑦4 regression estimates for the observations of the aforementioned dataset
𝒛̂ = (0.6 0 2.4 2.2 0.2 4.2)

b) Compute the training Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE)

𝒛 − 𝒛̂ = (0.8 0.5 -0.4 0 0.5 -3)


𝑀𝐴𝐸 = 0.8(6), 𝑅𝑀𝑆𝐸 = 1.31
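The estimates and both error measures can be checked numerically (a numpy sketch of the model from the question):

```python
import numpy as np

X = np.array([[1, 1], [2, 1], [2, 3], [3, 3], [1, 0], [1, 4]], dtype=float)
z = np.array([1.4, 0.5, 2.0, 2.2, 0.7, 1.2])        # true y4 values

x1, x2 = X[:, 0], X[:, 1]
z_hat = 1 - 0.8 * x1 + 0.2 * x2**2 + 0.2 * x1 * x2  # learnt regression model

mae = np.abs(z - z_hat).mean()
rmse = np.sqrt(((z - z_hat) ** 2).mean())
print(z_hat)                           # [0.6 0. 2.4 2.2 0.2 4.2]
print(round(mae, 3), round(rmse, 2))   # 0.867 1.31
```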

c) Perform a residue analysis to assess the presence of systemic biases against 𝑦1 and 𝑦2

[Scatter plots of the residues (𝒛 − 𝒛̂) against y1 (top) and y2 (bottom).]
There is no evidence of a bias with respect to y1. However, as the residues appear to be correlated
with y2, we can hypothesize that the learnt regressor is moderately biased against y2 for the given data.
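The visual assessment can be complemented by quantifying the linear association between the residues and each input variable (a numpy sketch; Pearson correlation is one possible choice, not the lab's prescribed method):

```python
import numpy as np

y1 = np.array([1, 2, 2, 3, 1, 1], dtype=float)
y2 = np.array([1, 1, 3, 3, 0, 4], dtype=float)
res = np.array([0.8, 0.5, -0.4, 0.0, 0.5, -3.0])  # z - z_hat from 2b)

# Pearson correlation of the residues against each input variable:
# a value near 0 suggests no systemic bias.
r1 = np.corrcoef(y1, res)[0, 1]
r2 = np.corrcoef(y2, res)[0, 1]
print(round(r1, 2), round(r2, 2))  # prints 0.2 -0.8
```

The weak correlation against y1 and the strong negative correlation against y2 agree with the hypothesis above.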

3. [optional] Consider the probabilistic outcome of a classifier for the given six observations to be
𝐩(𝑦3 = 𝐴 | 𝐱) = [𝑝(𝑦3 = 𝐴 | 𝐱 𝟏 ), … , 𝑝(𝑦3 = 𝐴 | 𝐱 𝟔 )] = [0.45 0.4 0.3 0.6 0.8 0.4]
a) Draw the training ROC curve

  z    ẑ       θ>0    θ>0.3   θ>0.4   θ>0.45   θ>0.6   θ>0.8
  1    0.45    TP     TP      TP      FN       FN      FN
  0    0.4     FP     FP      TN      TN       TN      TN
  0    0.3     FP     TN      TN      TN       TN      TN
  0    0.6     FP     FP      FP      FP       TN      TN
  1    0.8     TP     TP      TP      TP       TP      FN
  1    0.4     TP     TP      FN      FN       FN      FN
  FPR=FP/N     1.00   0.67    0.33    0.33     0.00    0.00
  TPR=TP/P     1.00   1.00    0.67    0.33     0.33    0.00
  F1           2/3    0.75    2/3     0.4      0.5     NA
[ROC curve: TPR against FPR, with the operating points labelled θ>0, θ>0.3, θ>0.4, θ>0.45, θ>0.6 and θ>0.8.]

b) Compute the training AUC

𝐴𝑈𝐶 = (1/3 × 1/3) + (1/3 × 2/3 + 1/2 × 1/3 × 1/3) + (1/3 × 1) = 13/18 ≈ 0.72
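The (FPR, TPR) operating points and the area under the step curve can be reproduced with a short sketch (plain numpy; the trapezoidal integration is written out explicitly):

```python
import numpy as np

z = np.array([1, 0, 0, 0, 1, 1])               # y3 = A encoded as 1
p = np.array([0.45, 0.4, 0.3, 0.6, 0.8, 0.4])  # p(y3 = A | x)

# Sweep every threshold at which the predictions change.
points = []
for t in [0.0, 0.3, 0.4, 0.45, 0.6, 0.8]:
    pred = (p > t).astype(int)
    tp = ((pred == 1) & (z == 1)).sum()
    fp = ((pred == 1) & (z == 0)).sum()
    points.append((fp / (z == 0).sum(), tp / (z == 1).sum()))  # (FPR, TPR)

# Sort by FPR and integrate with the trapezoidal rule.
pts = sorted(points)
auc = 0.0
for (f0, t0), (f1, t1) in zip(pts, pts[1:]):
    auc += (f1 - f0) * (t0 + t1) / 2
print(round(auc, 2))  # 13/18 ~ 0.72
```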

c) Would you change the default 0.5 probability threshold for this classifier in order to maximize
training F1?
Yes, training F1 is maximal when the probability threshold 𝜃 ∈ ]0.3, 0.4]
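Sweeping the threshold over the training F1 confirms this (a numpy sketch; `f1_at` is an illustrative helper):

```python
import numpy as np

z = np.array([1, 0, 0, 0, 1, 1])               # y3 = A encoded as 1
p = np.array([0.45, 0.4, 0.3, 0.6, 0.8, 0.4])  # p(y3 = A | x)

def f1_at(t):
    """Training F1 when predicting A whenever p > t."""
    pred = (p > t).astype(int)
    tp = ((pred == 1) & (z == 1)).sum()
    fp = ((pred == 1) & (z == 0)).sum()
    fn = ((pred == 0) & (z == 1)).sum()
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

for t in [0.0, 0.3, 0.4, 0.45, 0.5, 0.6]:
    print(t, round(f1_at(t), 3))
# F1 peaks at 0.75 for thresholds in ]0.3, 0.4], above the 0.4 reached at the default 0.5
```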

Programming quest

1. Consider the accuracy estimates collected under a 5-fold CV for two predictive models M1 and
M2, accM1=(0.7,0.5,0.55,0.55,0.6) and accM2=(0.75,0.6,0.6,0.65,0.55).
Using scipy, assess whether the differences in predictive accuracy are statistically significant.
Resource: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html
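Since both models are evaluated on the same fold partition, the accuracies are paired and `scipy.stats.ttest_rel` applies directly (a minimal sketch, assuming scipy is installed):

```python
from scipy import stats

# Per-fold accuracies of the two models under the same 5-fold CV partition,
# so a paired (related-samples) t-test is appropriate.
acc_m1 = [0.7, 0.5, 0.55, 0.55, 0.6]
acc_m2 = [0.75, 0.6, 0.6, 0.65, 0.55]

t_stat, p_value = stats.ttest_rel(acc_m1, acc_m2)
print(round(t_stat, 3), round(p_value, 3))
# p > 0.05: the observed accuracy differences are not statistically significant
```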

2. Consider the housing dataset available at https://web.ist.utl.pt/~rmch/dscience/data/housing.arff
and the Regression notebook available at the course's webpage. Using a 10-fold cross-validation:
a) Assess the MAE of a kNN regressor for 𝑘 ∈ {1,5,9} (remaining parameters as default)
b) Compare the RMSE of the default kNN and decision tree regressors
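Both parts follow the same `cross_val_score` pattern. The sketch below uses a synthetic dataset (`make_regression`) as a stand-in, since loading housing.arff requires a download (it would be read with `scipy.io.arff.loadarff`); only the data source is an assumption, the CV protocol is the one asked for:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data; shapes and protocol are the same.
X, y = make_regression(n_samples=200, n_features=8, noise=10, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# a) MAE of a kNN regressor for k in {1, 5, 9}, remaining parameters as default.
for k in (1, 5, 9):
    scores = cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y,
                             cv=cv, scoring="neg_mean_absolute_error")
    print(f"kNN k={k}: MAE={-scores.mean():.2f}")

# b) RMSE of the default kNN and decision tree regressors.
for model in (KNeighborsRegressor(), DecisionTreeRegressor(random_state=0)):
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_root_mean_squared_error")
    print(f"{type(model).__name__}: RMSE={-scores.mean():.2f}")
```

sklearn scorers follow a "greater is better" convention, hence the negated error metrics.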
