
Machine Learning: Lecture 1 (Student)

Lecturer: Ya-Mei Chang

Office: Room 446, ext 66117

[email protected]
Textbook
Title: An Introduction to Statistical Learning: with Applications in R, 2021
Authors: G. James, D. Witten, T. Hastie and R. Tibshirani

Reference Book
Title: The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Authors: T. Hastie, R. Tibshirani and J. Friedman

Grading:
⚫ Attendance 10%
⚫ Coursework (usual performance) 30%
⚫ Midterm Exam 30%
⚫ Final Report 30%

Office hours:
Tue. 10:00~11:00
Thu. 10:00~11:00
What Is Statistical Learning?
⚫ An example:
[Figure: scatter plot of a response Y against a predictor X]
More generally, suppose that we observe a quantitative response Y and p
different predictors, X1, X2, . . . , Xp. We assume that there is some relationship
between Y and X = (X1, X2, . . . , Xp), which can be written in the very general form

Y = f(X) + ε.

Here f is some fixed but unknown function of X1, . . . , Xp, and ε is a random
error term, which is independent of X and has mean zero. In this formulation, f
represents the systematic information that X provides about Y.

⚫ In essence, statistical learning refers to a set of approaches for estimating f, as well as tools for evaluating the estimates that are obtained.
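
As a concrete illustration, a minimal R sketch (not from the lecture) that simulates data from Y = f(X) + ε with a known f; the function f() and all parameter values below are made up for illustration:

set.seed(1)
x <- runif(200, 0, 10)
f <- function(x) sin(x) + 0.5 * x          # the systematic part, f(X)
eps <- rnorm(200, mean = 0, sd = 0.3)      # random error: independent of X, mean zero
y <- f(x) + eps
plot(x, y)                                 # observations scatter around the true f
curve(f, from = 0, to = 10, add = TRUE, col = "red")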

Linear Regression
⚫ Simple Linear Regression
➢ Assumption:
It assumes that there is approximately a linear relationship between X and
Y. Mathematically, we can write this linear relationship as

Y ≈ β0 + β1 X

β0 and β1 are two unknown constants that represent the intercept and
slope terms in the linear model. Once we have used our training data to
produce estimates β̂0 and β̂1 for the model coefficients, we can predict
future sales on the basis of a particular value of TV advertising by
computing

ŷ = β̂0 + β̂1 x,

where ŷ indicates a prediction of Y on the basis of X = x.
➢ Estimating the Coefficients:
Let (x1, y1), (x2, y2), . . . , (xn, yn) represent n observation pairs, each of which
consists of a measurement of X and a measurement of Y. Let
ŷi = β̂0 + β̂1 xi be the prediction for Y based on the ith value of X. Then
ei = yi − ŷi represents the ith residual, the difference between the
ith observed response value and the ith response value that is predicted by
our linear model. We define the residual sum of squares (RSS) as

RSS = e1² + e2² + · · · + en²,

or equivalently

RSS = (y1 − β̂0 − β̂1 x1)² + (y2 − β̂0 − β̂1 x2)² + · · · + (yn − β̂0 − β̂1 xn)².

The least squares approach chooses β̂0 and β̂1 to minimize the RSS.
Using some calculus, one can show that the minimizers are

β̂1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²,
β̂0 = ȳ − β̂1 x̄,

where ȳ = Σi yi / n and x̄ = Σi xi / n are the sample means (sums run over i = 1, . . . , n).
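
As a quick check, a minimal R sketch (not from the lecture) that evaluates these closed-form least squares estimates on simulated data; the true coefficient values below are made up for illustration:

set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)   # true intercept 2, true slope 3
beta1.hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0.hat <- mean(y) - beta1.hat * mean(x)
c(beta0.hat, beta1.hat)       # should match coef(lm(y ~ x))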

➢ Assessing the Accuracy of the Coefficient Estimates:

1. About μ (the population mean of Y), the standard error of the sample mean μ̂ satisfies SE(μ̂)² = σ² / n.

In general, σ² is not known. It is estimated by the residual standard error
(RSE), which is given by the formula RSE = √(RSS / (n − 2)).

2. About β0,

the 95% confidence interval for β0 approximately takes the form

β̂0 ± 2 · SE(β̂0).

3. About β1,

the 95% confidence interval for β1 approximately takes the form

β̂1 ± 2 · SE(β̂1).

The most common hypothesis test involves testing

H0 : There is no relationship between X and Y
versus
Ha : There is some relationship between X and Y.

Mathematically, this corresponds to testing

H0 : β1 = 0 versus Ha : β1 ≠ 0,

which is done by computing the t-statistic t = (β̂1 − 0) / SE(β̂1).
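
A minimal R sketch (not from the lecture) computing the RSE and the approximate 95% intervals by hand; the simulated data are made up for illustration:

set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
fit <- lm(y ~ x)
rse <- sqrt(sum(residuals(fit)^2) / (length(y) - 2))   # estimates sigma
se <- summary(fit)$coefficients[, "Std. Error"]
cbind(lower = coef(fit) - 2 * se,
      upper = coef(fit) + 2 * se)                      # compare with confint(fit)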

➢ Assessing the Accuracy of the Model:

1. RSE
2. R² statistic: R² = (TSS − RSS) / TSS = 1 − RSS / TSS, where TSS = Σi (yi − ȳ)² is the total sum of squares.
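
A minimal R sketch (not from the lecture) computing R² from its definition; the simulated data are made up for illustration:

set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
fit <- lm(y ~ x)
rss <- sum((y - fitted(fit))^2)
tss <- sum((y - mean(y))^2)
1 - rss / tss                 # matches summary(fit)$r.squared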

⚫ Multiple Linear Regression


➢ Assumption:

Y = β0 + β1 X1 + · · · + βp Xp + ε

➢ Estimating the Coefficients:

We choose β0, β1, . . . , βp to minimize the sum of squared residuals

RSS = Σi (yi − ŷi)² = Σi (yi − β̂0 − β̂1 xi1 − · · · − β̂p xip)².
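
A minimal R sketch (not from the lecture): multiple-regression least squares via the normal equations, with made-up simulated data; in practice lm() solves the same problem more stably via a QR decomposition:

set.seed(1)
n <- 100
X <- cbind(1, x1 = rnorm(n), x2 = rnorm(n))   # design matrix with intercept column
y <- X %*% c(2, 1, -1) + rnorm(n)             # true coefficients 2, 1, -1
beta.hat <- solve(t(X) %*% X, t(X) %*% y)     # solves (X'X) b = X'y
drop(beta.hat)                                # compare with coef(lm(y ~ X[, -1]))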


➢ Some Important Questions
1. Is at least one of the predictors X1,X2, . . . , Xp useful in predicting
the response?
◼ We test the null hypothesis,
H0 : β1 = β2 = ···= βp = 0
versus
Ha : at least one βj is non-zero.
This hypothesis test is performed by computing the F-statistic,

F = [(TSS − RSS) / p] / [RSS / (n − p − 1)],

where TSS = Σi (yi − ȳ)² and RSS = Σi (yi − ŷi)².

◼ Sometimes we wish to test whether a particular subset of the coefficients is zero. For example, the last q variables, with coefficients βp−q+1, . . . , βp, may not be useful in prediction. This corresponds to a null hypothesis

H0 : βp−q+1 = βp−q+2 = . . . = βp = 0.

In this case we fit a second model that uses all the variables except those
last q. Suppose that the residual sum of squares for that model is RSS0.
Then the appropriate F-statistic is

F = [(RSS0 − RSS) / q] / [RSS / (n − p − 1)].
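
For illustration, a minimal sketch of this partial F-test in R via anova(), using the Boston data that appears later in the Computer Session; the choice of dropped variables (age and indus, so q = 2) is made up:

library(ISLR2)
fit.full <- lm(medv ~ ., data = Boston)                    # model with all p predictors
fit.reduced <- lm(medv ~ . - age - indus, data = Boston)   # model without the last q
anova(fit.reduced, fit.full)                               # the F-statistic above, with p-value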

2. Do all the predictors help to explain Y, or is only a subset of the
predictors useful?

3. How well does the model fit the data?

4. Given a set of predictor values, what response value should we
predict, and how accurate is our prediction?

Computer Session

➢ Simple Linear Regression

library(MASS)    # functions and data sets for applied statistics
library(ISLR2)   # data sets for the textbook, including Boston

##
## Attaching package: 'ISLR2'

## The following object is masked from 'package:MASS':
##
## Boston

head (Boston)

## crim zn indus chas nox rm age dis rad tax ptratio lstat medv
## 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 4.98 24.0
## 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 9.14 21.6
## 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 4.03 34.7
## 4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 2.94 33.4
## 5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 5.33 36.2
## 6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 5.21 28.7

lm.fit <- lm(medv ~ lstat, data = Boston)   # simple linear regression of medv on lstat

attach(Boston)                # attach so that columns can be referenced by name
lm.fit <- lm(medv ~ lstat)    # same fit, using the attached data
lm.fit                        # print the fitted coefficients

##
## Call:
## lm(formula = medv ~ lstat)
##
## Coefficients:
## (Intercept) lstat
## 34.55 -0.95

summary(lm.fit)

##
## Call:
## lm(formula = medv ~ lstat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.168 -3.990 -1.318 2.034 24.500
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.55384 0.56263 61.41 <2e-16 ***
## lstat -0.95005 0.03873 -24.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.216 on 504 degrees of freedom
## Multiple R-squared: 0.5441, Adjusted R-squared: 0.5432
## F-statistic: 601.6 on 1 and 504 DF, p-value: < 2.2e-16

names(lm.fit)

## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"

coef(lm.fit)

## (Intercept) lstat
## 34.5538409 -0.9500494

confint(lm.fit)

## 2.5 % 97.5 %
## (Intercept) 33.448457 35.6592247
## lstat -1.026148 -0.8739505

predict(lm.fit, data.frame(lstat = c(5, 10, 15)), interval = "confidence")

## fit lwr upr
## 1 29.80359 29.00741 30.59978
## 2 25.05335 24.47413 25.63256
## 3 20.30310 19.73159 20.87461

predict(lm.fit, data.frame(lstat = c(5, 10, 15)), interval = "prediction")

## fit lwr upr
## 1 29.80359 17.565675 42.04151
## 2 25.05335 12.827626 37.27907
## 3 20.30310 8.077742 32.52846

plot(lstat, medv)                     # scatter plot of medv against lstat
abline(lm.fit)                        # add the least squares line
abline(lm.fit, lwd = 3)               # thicker line
abline(lm.fit, lwd = 3, col = "red")  # thicker red line

plot(lstat, medv, col = "red")

plot(lstat, medv, pch = 20)           # solid-dot plotting symbol

plot(lstat, medv, pch = "+")
plot(1:20, 1:20, pch = 1:20)          # display the first 20 plotting symbols
par(mfrow = c(1, 2))                  # split the plotting region into a 1 x 2 grid
plot(lm.fit)                          # regression diagnostic plots
plot(predict(lm.fit), residuals(lm.fit))   # residuals versus fitted values

➢ Multiple Linear Regression

lm.fit <- lm(medv ~ lstat + age, data = Boston)   # multiple regression with two predictors

summary(lm.fit)

## Call:
## lm(formula = medv ~ lstat + age, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.981 -3.978 -1.283 1.968 23.158
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.22276 0.73085 45.458 < 2e-16 ***
## lstat -1.03207 0.04819 -21.416 < 2e-16 ***
## age 0.03454 0.01223 2.826 0.00491 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.173 on 503 degrees of freedom
## Multiple R-squared: 0.5513, Adjusted R-squared: 0.5495
## F-statistic: 309 on 2 and 503 DF, p-value: < 2.2e-16

lm.fit <- lm(medv ~ ., data = Boston)   # regress medv on all remaining predictors
summary(lm.fit)

##
## Call:
## lm(formula = medv ~ ., data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.1304 -2.7673 -0.5814 1.9414 26.2526
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.617270 4.936039 8.431 3.79e-16 ***
## crim -0.121389 0.033000 -3.678 0.000261 ***
## zn 0.046963 0.013879 3.384 0.000772 ***
## indus 0.013468 0.062145 0.217 0.828520
## chas 2.839993 0.870007 3.264 0.001173 **
## nox -18.758022 3.851355 -4.870 1.50e-06 ***
## rm 3.658119 0.420246 8.705 < 2e-16 ***
## age 0.003611 0.013329 0.271 0.786595
## dis -1.490754 0.201623 -7.394 6.17e-13 ***
## rad 0.289405 0.066908 4.325 1.84e-05 ***
## tax -0.012682 0.003801 -3.337 0.000912 ***
## ptratio -0.937533 0.132206 -7.091 4.63e-12 ***
## lstat -0.552019 0.050659 -10.897 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 4.798 on 493 degrees of freedom
## Multiple R-squared: 0.7343, Adjusted R-squared: 0.7278
## F-statistic: 113.5 on 12 and 493 DF, p-value: < 2.2e-16

library(car)   # provides vif() for variance inflation factors

## Loading required package: carData

vif(lm.fit)

## crim zn indus chas nox rm age dis
## 1.767486 2.298459 3.987181 1.071168 4.369093 1.912532 3.088232 3.954037
## rad tax ptratio lstat
## 7.445301 9.002158 1.797060 2.870777

lm.fit1 <- lm(medv ~ . - age, data = Boston)   # refit without the age predictor

summary(lm.fit1)

## Call:
## lm(formula = medv ~ . - age, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.1851 -2.7330 -0.6116 1.8555 26.3838
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.525128 4.919684 8.441 3.52e-16 ***
## crim -0.121426 0.032969 -3.683 0.000256 ***
## zn 0.046512 0.013766 3.379 0.000785 ***
## indus 0.013451 0.062086 0.217 0.828577
## chas 2.852773 0.867912 3.287 0.001085 **
## nox -18.485070 3.713714 -4.978 8.91e-07 ***
## rm 3.681070 0.411230 8.951 < 2e-16 ***
## dis -1.506777 0.192570 -7.825 3.12e-14 ***
## rad 0.287940 0.066627 4.322 1.87e-05 ***
## tax -0.012653 0.003796 -3.333 0.000923 ***
## ptratio -0.934649 0.131653 -7.099 4.39e-12 ***
## lstat -0.547409 0.047669 -11.483 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.794 on 494 degrees of freedom
## Multiple R-squared: 0.7343, Adjusted R-squared: 0.7284
## F-statistic: 124.1 on 11 and 494 DF, p-value: < 2.2e-16
