SDSC3006 - Assignment 1

SDSC 3006 Fundamentals of Machine Learning I

Assignment #1

Deadline: Sunday, October 13, 10:00 PM

1. For each of parts (a) through (d), indicate whether we would generally expect the performance
of a flexible statistical learning method to be better or worse than an inflexible method. Justify
your answer.
(a) The sample size n is extremely large, and the number of predictors p is small.
(b) The number of predictors p is extremely large, and the number of observations n is small.
(c) The relationship between the predictors and response is highly non-linear.
(d) The variance of the error terms, i.e., $\sigma^2 = \mathrm{Var}(\epsilon)$, is extremely high.

2. We now revisit the bias-variance decomposition.

(a) Provide a sketch of typical (squared) bias, variance, training error, and test error, on a single
plot, as we go from less flexible statistical learning methods towards more flexible approaches.
The x-axis should represent the amount of flexibility in the method, and the y-axis should
represent the values for each curve. There should be four curves. Make sure to label each one.
(b) Explain why each of the four curves has the shape displayed in part (a).
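
A reference point for (b): the expected test MSE at a point $x_0$ splits into three non-negative parts, as in the bias-variance trade-off discussion in ISLP Chapter 2. Here $\hat f$ is the fitted model, $y_0$ is a new response observed at $x_0$, and $\epsilon$ is the irreducible error with $\sigma^2 = \mathrm{Var}(\epsilon)$.

```latex
\mathbb{E}\big[(y_0 - \hat f(x_0))^2\big]
  = \operatorname{Var}\big(\hat f(x_0)\big)
  + \big[\operatorname{Bias}\big(\hat f(x_0)\big)\big]^2
  + \operatorname{Var}(\epsilon)
```

Because every term on the right is non-negative, the expected test error can never fall below $\operatorname{Var}(\epsilon)$, whereas training error has no such floor.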

3. Suppose we have a data set with five predictors, $X_1$ = GPA, $X_2$ = IQ, $X_3$ = Gender (1 for
Female and 0 for Male), $X_4$ = Interaction between GPA and IQ, and $X_5$ = Interaction between
GPA and Gender. The response is starting salary after graduation (in thousands of dollars).
Suppose we use least squares to fit the model, and get $\hat\beta_0 = 50$, $\hat\beta_1 = 20$, $\hat\beta_2 = 0.07$, $\hat\beta_3 = 35$,
$\hat\beta_4 = 0.01$, $\hat\beta_5 = -10$.
(a) Which answer is correct, and why?
i. For a fixed value of IQ and GPA, males earn more, on average, than females.
ii. For a fixed value of IQ and GPA, females earn more, on average, than males.
iii. For a fixed value of IQ and GPA, males earn more, on average, than females provided that
the GPA is high enough.
iv. For a fixed value of IQ and GPA, females earn more, on average, than males provided that the
GPA is high enough.
(b) Predict the salary of a female with an IQ of 110 and a GPA of 4.0.
(c) True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is
very little evidence of an interaction effect. Justify your answer.
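
A small sketch that transcribes the fitted equation from the coefficients above into Python, so the arithmetic behind (a) and (b) can be checked by hand. The function names are hypothetical; Gender is coded 1 for Female and 0 for Male, as stated in the question.

```python
# Coefficient estimates given in Question 3 (salary in thousands of dollars).
B0, B1, B2, B3, B4, B5 = 50.0, 20.0, 0.07, 35.0, 0.01, -10.0


def predicted_salary(gpa, iq, female):
    """Fitted salary: B0 + B1*GPA + B2*IQ + B3*Gender + B4*GPA*IQ + B5*GPA*Gender.

    female is 1 for Female and 0 for Male, matching the coding of X3.
    """
    return (B0 + B1 * gpa + B2 * iq + B3 * female
            + B4 * gpa * iq + B5 * gpa * female)


def female_minus_male(gpa, iq):
    """Female-minus-male gap at fixed GPA and IQ (algebraically B3 + B5 * gpa)."""
    return predicted_salary(gpa, iq, 1) - predicted_salary(gpa, iq, 0)


if __name__ == "__main__":
    # Part (b): a female with an IQ of 110 and a GPA of 4.0.
    print(predicted_salary(gpa=4.0, iq=110, female=1))
    # Part (a): how the gap changes with GPA (IQ cancels out of the difference).
    for gpa in (3.0, 3.5, 4.0):
        print(gpa, female_minus_male(gpa, iq=110))
```

Writing the gap as its own function makes it visible that the GPA and Gender interaction, not the Gender coefficient alone, decides which group is predicted to earn more.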

4. Use the Carseats data set (ISLP package) to answer the following questions.
(a) Fit a multiple regression model to predict Sales using Price, Urban, and US.
(b) Provide an interpretation of each coefficient in the model. Be careful—some of the variables
in the model are qualitative!
(c) Write out the model in equation form, being careful to handle the qualitative variables
properly.
(d) For which of the predictors can you reject the null hypothesis $H_0\colon \beta_j = 0$?
(e) On the basis of your response to the previous question, fit a smaller model that only uses the
predictors for which there is evidence of association with the outcome.
(f) How well do the models in (a) and (e) fit the data?
(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).
(h) Is there evidence of outliers or high leverage observations in the model from (e)?
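
A minimal sketch of the mechanics behind (c) and (h), assuming only NumPy. It shows one way to turn the Yes/No factors Urban and US into 0/1 dummy variables (treating "No" as the baseline, which is an assumed choice here), fit least squares, and compute leverage values from the hat matrix. The data in the demo are synthetic with made-up coefficients, used only to check the code; they are not the Carseats data, and the function names are hypothetical.

```python
import numpy as np


def design_matrix(price, urban, us):
    """Columns: intercept, Price, Urban == "Yes", US == "Yes".

    "No" is the (assumed) baseline level, so each dummy coefficient is the
    average difference in Sales between "Yes" and "No" stores, holding the
    other predictors fixed.
    """
    price = np.asarray(price, dtype=float)
    urban_yes = (np.asarray(urban) == "Yes").astype(float)
    us_yes = (np.asarray(us) == "Yes").astype(float)
    return np.column_stack([np.ones_like(price), price, urban_yes, us_yes])


def fit_ols(X, y):
    """Least-squares coefficient estimates."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


def leverage(X):
    """Diagonal of the hat matrix H = X (X^T X)^{-1} X^T."""
    # Solve the normal equations rather than forming an explicit inverse.
    A = np.linalg.solve(X.T @ X, X.T)  # shape (p + 1, n)
    return np.einsum("ij,ji->i", X, A)


if __name__ == "__main__":
    # Synthetic stand-in data with made-up coefficients (NOT Carseats).
    rng = np.random.default_rng(0)
    n = 40
    price = rng.uniform(50, 150, size=n)
    urban = rng.choice(["Yes", "No"], size=n)
    us = rng.choice(["Yes", "No"], size=n)
    X = design_matrix(price, urban, us)
    true_beta = np.array([10.0, -0.05, 0.5, 1.0])
    y = X @ true_beta  # noise-free, so least squares should recover true_beta

    print(np.round(fit_ols(X, y), 4))
    h = leverage(X)
    # The average leverage always equals (p + 1) / n; here p = 3 predictors.
    print(h.mean(), (3 + 1) / n)
    # An arbitrary illustrative threshold for flagging high-leverage rows.
    print(np.flatnonzero(h > 2 * h.mean()))
```

With the real data, one possible route (assuming the ISLP and statsmodels packages are installed) is to load the frame with `load_data('Carseats')` from `ISLP` and pass its `Price`, `Urban`, `US` and `Sales` columns to these helpers, or to fit the same model with `statsmodels.formula.api.ols('Sales ~ Price + Urban + US', data=Carseats).fit()`, whose results object also reports p-values and `conf_int()` for (d) and (g). Leverage values well above the average $(p+1)/n$ are the usual signal of high-leverage observations.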

5. Suppose we collect data for a group of students in a statistics class with variables $X_1$ = hours
studied, $X_2$ = undergrad GPA, and $Y$ = receive an A. We fit a logistic regression and produce
estimated coefficients $\hat\beta_0 = -6$, $\hat\beta_1 = 0.05$, $\hat\beta_2 = 1$.
(a) Estimate the probability that a student who studies for 40 h and has an undergrad GPA of 3.5
gets an A in the class.
(b) How many hours would the student in part (a) need to study to have a 50% chance of getting
an A in the class?
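
A short sketch of the two calculations, assuming the standard logistic model $p(X) = 1/(1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2)})$ with the estimates above plugged in. Part (b) inverts the model through the log-odds, which are linear in the predictors. The function names are hypothetical.

```python
import math

# Coefficient estimates given in Question 5.
B0, B1, B2 = -6.0, 0.05, 1.0


def prob_a(hours, gpa):
    """Estimated P(Y = A) = 1 / (1 + exp(-(B0 + B1*hours + B2*gpa)))."""
    logit = B0 + B1 * hours + B2 * gpa
    return 1.0 / (1.0 + math.exp(-logit))


def hours_for_prob(p, gpa):
    """Hours needed for probability p: solve log(p/(1-p)) = B0 + B1*hours + B2*gpa."""
    return (math.log(p / (1.0 - p)) - B0 - B2 * gpa) / B1


if __name__ == "__main__":
    print(prob_a(hours=40, gpa=3.5))       # part (a)
    print(hours_for_prob(p=0.5, gpa=3.5))  # part (b)
```

For $p = 0.5$ the log-odds are zero, so part (b) reduces to solving a linear equation in the hours studied.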

6. Answer the following questions about the differences between LDA and QDA.
(a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the
training set? On the test set?
(b) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on
the training set? On the test set?
(c) In general, as the sample size n increases, do we expect the test prediction accuracy of QDA
relative to LDA to improve, decline, or be unchanged? Why?
(d) True or False: Even if the Bayes decision boundary for a given problem is linear, we will
probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible
enough to model a linear decision boundary. Justify your answer.
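
As background for (a) through (d), these are the standard Gaussian discriminant functions used by LDA and QDA, written up to additive terms that are the same for every class $k$. Here $\mu_k$, $\Sigma_k$ and $\pi_k$ are the class mean, covariance matrix and prior, with $p$ predictors and $K$ classes.

```latex
\text{LDA } (\Sigma_1 = \dots = \Sigma_K = \Sigma):\quad
\delta_k(x) = x^\top \Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^\top \Sigma^{-1}\mu_k + \log \pi_k

\text{QDA } (\text{separate } \Sigma_k):\quad
\delta_k(x) = -\tfrac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1}(x-\mu_k) - \tfrac{1}{2}\log\lvert\Sigma_k\rvert + \log \pi_k

\text{covariance parameters: } \tfrac{p(p+1)}{2} \text{ (LDA)} \quad\text{vs.}\quad K\,\tfrac{p(p+1)}{2} \text{ (QDA)}
```

LDA's discriminant is linear in $x$ while QDA's is quadratic, and QDA estimates $K$ times as many covariance parameters, which is where its extra flexibility and extra variance come from.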

7. This question should be answered using the Weekly data set (ISLP package). These data are similar
in nature to the Smarket data from the ISLP Chapter 4 lab, except that they contain 1,089 weekly returns
over 21 years, from the beginning of 1990 to the end of 2010.
(a) Produce some numerical and graphical summaries of the Weekly data. Do there appear to be
any patterns?
(b) Use the full data set to perform a logistic regression with Direction as the response and the five
lag variables plus Volume as predictors. Use the summary function to print the results. Do any of
the predictors appear to be statistically significant? If so, which ones?
(c) Compute the confusion matrix and overall fraction of correct predictions. Explain what the
confusion matrix is telling you about the types of mistakes made by logistic regression.
(d) Now fit the logistic regression model using a training data period from 1990 to 2008, with Lag2
as the only predictor. Compute the confusion matrix and the overall fraction of correct predictions
for the held out data (that is, the data from 2009 and 2010).
(e) Repeat (d) using LDA.
(f) Repeat (d) using QDA.
(g) Repeat (d) using KNN with K = 1.
(h) Which of these methods appears to provide the best results on this data?
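
A dependency-free sketch of the bookkeeping needed in (c) through (g): a confusion matrix, the overall fraction of correct predictions, and K = 1 nearest-neighbour prediction with a single numeric predictor such as Lag2. The function names and the row/column layout are choices made here, not ISLP's API. With the real data, the training rows would be those with Year from 1990 through 2008 and the held-out rows those from 2009 and 2010. The toy labels in the demo are invented only to show the output format.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=("Down", "Up")):
    """Counts as a nested list: rows = predicted class, columns = true class."""
    counts = Counter(zip(predicted, actual))
    return [[counts[(p, a)] for a in labels] for p in labels]


def accuracy(actual, predicted):
    """Overall fraction of correct predictions."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def knn1_predict(train_x, train_y, test_x):
    """K = 1 nearest neighbour on one numeric predictor (e.g. Lag2).

    Distance ties go to the earliest training point; library implementations
    may break ties differently.
    """
    preds = []
    for x in test_x:
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        preds.append(train_y[nearest])
    return preds


if __name__ == "__main__":
    # Invented toy labels, only to show the output format.
    actual = ["Up", "Up", "Down", "Up", "Down"]
    predicted = ["Up", "Down", "Down", "Up", "Up"]
    for label, row in zip(("Down", "Up"), confusion_matrix(actual, predicted)):
        print("predicted", label, row)
    print("accuracy:", accuracy(actual, predicted))
    print(knn1_predict([-1.0, 0.5, 2.0], ["Down", "Up", "Up"], [-0.8, 1.9, 0.4]))
```

The off-diagonal cells separate the two kinds of mistakes: predicting Up in a week that went Down, and predicting Down in a week that went Up.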
