Statistics 3850A - Introduction to Machine Learning

Winter 2025
Homework Assignment 1
January 28, 2025
Due: February 4, 17:00

• You may create your document in Word, Google Docs, LaTeX or any other word processor, or you may write by hand. If you choose the latter option, write as legibly as possible.
• Write detailed solutions and justify your answers. Round your results to four decimal digits.
• If a task is not clear, feel free to ask questions before or after the lectures, during office hours or
via e-mail; do not wait until the deadline to ask for explanations.
• Collaboration is encouraged when completing assignments. However, each student needs to submit
their own work.
• Assignments can be dropped off before the deadline in the box labelled STAT3850A outside office C521, or handed in during the lectures. Please make sure to write your name on the assignment.
• The assignment will be accepted with a 25% penalty if submitted within 48 hours of the due date.
Otherwise, late assignments will not be marked, except for documented medical reasons.

Problem 1 (3 points)
Consider the simple linear regression model between the quantitative response Y_i and the regressor X_i, i = 1, \ldots, n:

Y_i = \beta_0 + \beta_1 \cdot x_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2),

where x_1, \ldots, x_n are the realizations of X_1, \ldots, X_n. The least squares estimators for the slope \beta_1 and the intercept \beta_0 are

\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x}_n)(Y_i - \bar{Y}_n)}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}, \qquad \hat{\beta}_0 = \bar{Y}_n - \hat{\beta}_1 \cdot \bar{x}_n,

where \bar{Y}_n is the sample mean (estimator) of the random variables Y_1, \ldots, Y_n and \bar{x}_n is the sample mean of x_1, \ldots, x_n.
Prove the following equivalent expression for the variance of the estimator \hat{\beta}_0:

\operatorname{Var}(\hat{\beta}_0) = \hat{\sigma}^2 \cdot \frac{\sum_{i=1}^n x_i^2}{n \sum_{i=1}^n (x_i - \bar{x}_n)^2} = \hat{\sigma}^2 \left( \frac{1}{n} + \frac{\bar{x}_n^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2} \right),

where \hat{\sigma}^2 is the unbiased estimator for \sigma^2.
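Note (one possible route, shown only as a sketch; other derivations work equally well): since \hat{\beta}_0 = \bar{Y}_n - \hat{\beta}_1 \cdot \bar{x}_n,

\operatorname{Var}(\hat{\beta}_0) = \operatorname{Var}(\bar{Y}_n) + \bar{x}_n^2 \operatorname{Var}(\hat{\beta}_1) - 2 \bar{x}_n \operatorname{Cov}(\bar{Y}_n, \hat{\beta}_1),

and the covariance term can be shown to vanish.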

Problem 2 (3 points)
Consider the multiple linear regression model between the quantitative response Yi and the regressors
Xi,1 , . . . , Xi,k , i = 1, . . . , n, k ≥ 2:
Y_i = \beta_0 + \beta_1 \cdot x_{i,1} + \cdots + \beta_k \cdot x_{i,k} + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2),

where x_i = (x_{i,1}, \ldots, x_{i,k})^\top are the realizations of X_{i,j}, for i = 1, \ldots, n and j = 1, \ldots, k.
To check whether there is a relation between the response and the regressors, the following hypothesis
is tested:
H0 : β1 = . . . = βk = 0 versus H1 : at least one βj is non-zero,
with the F-statistic

F = \frac{(\mathrm{TSS} - \mathrm{RSS})/k}{\mathrm{RSS}/(n-k-1)},

where \mathrm{TSS} = \sum_{i=1}^n (Y_i - \bar{Y}_n)^2 and \mathrm{RSS} = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 are the total sum of squares and the residual sum of squares, respectively.


If the assumptions of the model hold, \frac{\mathrm{RSS}}{n-k-1} is an unbiased estimator for \sigma^2.

a) Prove that, if H_0 holds, \frac{\mathrm{TSS} - \mathrm{RSS}}{k} is an unbiased estimator for \sigma^2.
   Hint: observe that \frac{\mathrm{TSS}}{n-1} is an unbiased estimator for \sigma^2.
b) If there is no relation between the response and the regressors, what is the value of F? Justify your answer.
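Note (illustrative only, not part of the proof): a short R simulation under H_0 — the sample size and seed are arbitrary — comparing the statistic computed from the formula above with the one reported by lm:

    # Simulate data for which H0 is true and compare the hand-computed
    # F-statistic with the value reported by lm().
    set.seed(1)
    n <- 50; k <- 3
    X <- matrix(rnorm(n * k), n, k)
    y <- rnorm(n)                              # all slopes are zero
    fit <- lm(y ~ X)
    rss <- sum(residuals(fit)^2)
    tss <- sum((y - mean(y))^2)
    ((tss - rss) / k) / (rss / (n - k - 1))    # F from the formula
    summary(fit)$fstatistic["value"]           # F reported by lm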

Problem 3 (8 points)
The dataset mtcars in the R package datasets contains the characteristics of 32 cars, among which wt =
weight (1000 lbs), mpg = miles/gallon and hp = gross horsepower. These variables are used in a multiple
logistic regression model to predict the response am = transmission (0 = automatic, 1 = manual):
\operatorname{logit}(P(am = 1)) = \beta_0 + \beta_1 \cdot wt + \beta_2 \cdot mpg + \beta_3 \cdot hp.
The model provides the following output:
             Estimate    Std. error   z value   p-value
Intercept   -15.72137    40.00281     -0.393    0.6943
wt           -6.95492     3.35297     -2.074    ?
mpg           1.22930     1.58109     ?         0.4369
hp            0.08389     0.08228     ?         ?
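The table can be reproduced in R along these lines (a sketch; the last printed digits may vary with your R version):

    # Fit the logistic regression on mtcars and print the coefficient table.
    fit <- glm(am ~ wt + mpg + hp, data = mtcars, family = binomial)
    summary(fit)$coefficients   # Estimate, Std. Error, z value, Pr(>|z|)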

a) Based on the estimated values for the slopes, approximate the probability of a car having a manual transmission, given that \widetilde{wt} = 3.215, \widetilde{mpg} = 21.4 and \widetilde{hp} = 110.
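Recall that the logit link inverts to

p = \frac{\exp(\eta)}{1 + \exp(\eta)},

where \eta denotes the fitted linear predictor evaluated at the new observation.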

b) Determine the missing values in the output of the model. You may use the z table at the end of the assignment to approximate the values.
   Hint: under the null hypothesis for each regressor, H_0: \beta = 0, the statistic \frac{\hat{\beta}}{S_{\hat{\beta}}} \sim N(0, 1), where \hat{\beta} is the estimator for \beta and S_{\hat{\beta}} is the standard deviation (standard error) of the estimator \hat{\beta}.
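Note: in R the table lookup corresponds to pnorm. A minimal check against the intercept row, where both values are already given:

    # Two-sided p-value from a z value via the standard normal CDF.
    z <- -0.393                  # z value of the intercept
    2 * (1 - pnorm(abs(z)))      # approximately 0.6943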

c) If the significance level of each hypothesis test is \alpha = 0.05, decide which regressor is significant in the model. What is the relation between the significant regressor(s) and the response am?

Problem 4 (6 points)
 
The linear discriminant analysis classifies an observation x = (x_1, x_2)^\top to class 2 if \hat{\delta}_2(x) > \hat{\delta}_1(x), or equivalently,

x^\top \hat{\Sigma}^{-1} (\hat{\mu}_2 - \hat{\mu}_1) > \frac{1}{2} (\hat{\mu}_1 + \hat{\mu}_2)^\top \hat{\Sigma}^{-1} (\hat{\mu}_2 - \hat{\mu}_1) - \log \frac{\hat{\pi}_2}{\hat{\pi}_1},

and to class 1 otherwise, where \hat{\mu}_1 and \hat{\mu}_2 are the estimators for the mean vector of the regressors in class 1 and class 2, respectively, \hat{\Sigma}^{-1} is the estimator for the inverse covariance matrix, and \hat{\pi}_1 and \hat{\pi}_2 are the prior probabilities of belonging to class 1 and class 2, respectively.
For a given sample with the same number of observations in each group, the following values are estimated (rounded to 2 digits):

\hat{\mu}_1 = \begin{pmatrix} -0.42 \\ 2.21 \end{pmatrix}, \qquad \hat{\mu}_2 = \begin{pmatrix} 0.69 \\ 4.24 \end{pmatrix}, \qquad \hat{\Sigma}^{-1} = \begin{pmatrix} 0.07 & 0 \\ 0 & 0.07 \end{pmatrix}.

Determine the LDA boundary.
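Note (for checking your hand computation only; the group sizes are equal, so the log term drops out): a minimal R sketch that evaluates the two sides of the rule above:

    # The boundary is the set of x with a[1]*x1 + a[2]*x2 equal to b.
    mu1  <- c(-0.42, 2.21)
    mu2  <- c( 0.69, 4.24)
    Sinv <- diag(0.07, 2)
    a <- Sinv %*% (mu2 - mu1)                          # boundary coefficients
    b <- 0.5 * t(mu1 + mu2) %*% Sinv %*% (mu2 - mu1)   # boundary threshold
    c(a, b)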

Problem 5 (5 points)
The Euclidean distance (p = 2) between a training set of 7 observations and a new observation x̃ is given
below:
Observation      1      2      3      4      5      6      7
Distance to x̃   25.00  33.54  68.01  14.14  61.03  47.17  45.28
Class            2      1      1      2      1      2      1
We apply a K-nearest neighbor classifier to x̃ with K = 5. Run the algorithm and decide to which class the new observation x̃ is assigned.
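Note (mechanics only, with placeholder numbers — substitute the values from the table above): the K-nearest-neighbor vote in R:

    # Find the K smallest distances and take a majority vote of their classes.
    d  <- c(4.2, 1.1, 3.3, 2.7, 5.0, 0.9, 3.8)   # placeholder distances
    cl <- c(1, 2, 1, 2, 1, 2, 2)                 # placeholder class labels
    K  <- 5
    nearest <- order(d)[1:K]                     # indices of the K nearest
    table(cl[nearest])                           # majority class wins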

Cumulative distribution function of the standard normal distribution

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998

Quantiles of the standard normal distribution
α 0.9 0.95 0.975 0.99 0.995
zα 1.282 1.645 1.96 2.326 2.576
