Topic 3: Linear Regression
In data science, many applications involve making predictions about the outcome y based
on a number of predictors x.
y ≈ f (x)
It is called `supervised learning' because we have data on both the outcome y and the
predictor x.
Therefore, the data can `teach' us, for a given value of the predictor x, the most likely corresponding outcome y.
Linear regression is an analytical technique used to model the relationship between several
input variables and a continuous outcome variable.
A key assumption is that the relationship between the input variables and the outcome variable is linear.
For example, in simple linear regression with only one predictor, we assume a model of the
form
y ≈ f (x) = β0 + β1 x.
Suppose we are interested in building a linear regression model that estimates an HDB unit's resale price as a function of floor area in square meters.
Suppose we have three observations. Each observation has an outcome y and an input
variable x.
yi ≈ β0 + β1 xi
Since there is only one input variable, this is an example of a simple linear model.
 i    xi    yi
 1    -1    -1
 2     3    3.5
 3     5     3
yi ≈ f (xi ) = β0 + β1 xi
 i    xi    yi     β0 + β1 xi     residual: ei = yi − (β0 + β1 xi)
 1    -1    -1     β0 + (−1)β1    −1 − (β0 + (−1)β1) = −1 − β0 + β1
 2     3    3.5    β0 + (3)β1     3.5 − (β0 + (3)β1) = 3.5 − β0 − 3β1
 3     5     3     β0 + (5)β1     3 − (β0 + (5)β1) = 3 − β0 − 5β1
We do not want the residuals to cancel each other out, so we square each of them, leading to the squared residuals.
 i    residual: ei = yi − (β0 + β1 xi)         squared residual: ei²
 1    −1 − (β0 + (−1)β1) = −1 − β0 + β1        [−1 − β0 + β1]²
 2    3.5 − (β0 + (3)β1) = 3.5 − β0 − 3β1      [3.5 − β0 − 3β1]²
 3    3 − (β0 + (5)β1) = 3 − β0 − 5β1          [3 − β0 − 5β1]²
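For illustration, the residuals and squared residuals can be evaluated in R for any trial coefficients; the trial values β0 = 0 and β1 = 1 below are arbitrary:
> x = c(-1, 3, 5); y = c(-1, 3.5, 3)
> b0 = 0; b1 = 1          # arbitrary trial coefficients
> e = y - (b0 + b1 * x)   # residuals: 0.0 0.5 -2.0
> e^2                     # squared residuals: 0.00 0.25 4.00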
To express the total magnitude of the deviations, we sum the squared residuals over all data points: RSS = e1² + e2² + e3². This quantity is called the Residual Sum of Squares, abbreviated RSS; some texts denote it SSres, the sum of squared residuals.
We now need to find the values of β0 and β1 such that RSS is minimized, where

h(β0, β1) = RSS = [−1 − β0 + β1]² + [3.5 − β0 − 3β1]² + [3 − β0 − 5β1]².

The whole process (computing each ei and ei², forming RSS, and minimizing it to obtain the values of β0 and β1) is known as the method of ordinary least squares (OLS).
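As a numerical sanity check, h can also be minimized directly in R with the general-purpose optimizer optim() (the starting values below are arbitrary):
> x = c(-1, 3, 5); y = c(-1, 3.5, 3)
> h = function(b) sum((y - (b[1] + b[2] * x))^2)  # RSS as a function of (beta0, beta1)
> optim(c(0, 0), h)$par                           # close to (0.125, 0.7321)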
To find the minimum of h(β0, β1), we first take its partial derivative w.r.t. β0 while holding β1 constant, and then its partial derivative w.r.t. β1 while holding β0 constant:

∂h(β0, β1)/∂β0 = −11 + 6β0 + 14β1,
∂h(β0, β1)/∂β1 = −53 + 14β0 + 70β1.
Setting both partial derivatives to zero and solving, we obtain the least squares estimates
β0 ≈ 0.1250
β1 ≈ 0.7321
We usually add a hat on top of a parameter to denote its estimated value, so the least squares estimates are
β̂0 = 0.1250
β̂1 = 0.7321.
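These estimates can be verified in R by rewriting the two derivative equations as 6β0 + 14β1 = 11 and 14β0 + 70β1 = 53 and solving the 2×2 linear system:
> A = matrix(c(6, 14, 14, 70), nrow = 2)  # coefficient matrix, filled column-wise
> b = c(11, 53)                           # right-hand sides
> solve(A, b)                             # 0.125 0.7321...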
In the previous slides, we had a specific example, a data set with 3 points, and carried out the OLS method by hand.
We now generalize OLS to a data set which has 2 variables X and Y with n observations
(x1 , y1 ), ..., (xn , yn ).
yi ≈ β0 + β1 xi , i = 1, ..., n.
RSS = Σᵢ₌₁ⁿ ei² = Σᵢ₌₁ⁿ [yi − (β0 + β1 xi)]².
Take the derivative of RSS w.r.t. β0 and β1, one at a time:

∂RSS/∂β0 = −2 Σᵢ₌₁ⁿ (yi − β0 − β1 xi),
∂RSS/∂β1 = −2 Σᵢ₌₁ⁿ (yi − β0 − β1 xi) xi.
The least squares estimates of β0 and β1, β̂0 and β̂1, are the solution when we set the derivatives to zero:
β̂0 + β̂1 (1/n) Σᵢ₌₁ⁿ xi − (1/n) Σᵢ₌₁ⁿ yi = 0    (1)

β̂0 (1/n) Σᵢ₌₁ⁿ xi + β̂1 (1/n) Σᵢ₌₁ⁿ xi² − (1/n) Σᵢ₌₁ⁿ yi xi = 0    (2)

Denote ȳ = (1/n) Σᵢ₌₁ⁿ yi and x̄ = (1/n) Σᵢ₌₁ⁿ xi.
From (1), we have β̂0 = ȳ − β̂1 x̄; substituting this β̂0 into (2), we obtain

β̂1 = [Σᵢ₌₁ⁿ yi xi − (Σᵢ₌₁ⁿ yi)(Σᵢ₌₁ⁿ xi)/n] / [Σᵢ₌₁ⁿ xi² − (Σᵢ₌₁ⁿ xi)²/n].
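As a check, these closed-form expressions can be evaluated directly on the three-point example (here n = 3):
> x = c(-1, 3, 5); y = c(-1, 3.5, 3)
> b1 = (sum(x * y) - sum(x) * sum(y) / 3) / (sum(x^2) - sum(x)^2 / 3)  # about 0.7321
> b0 = mean(y) - b1 * mean(x)                                          # 0.125

The built-in lm() function returns the same estimates: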
> x = c(-1, 3, 5)
> y = c(-1, 3.5, 3)
> lm(y~x)
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
0.1250 0.7321
We can now write the fitted model as

ŷ = 0.125 + 0.7321x,

matching the coefficients returned by lm() above.
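Storing the fitted model in an object also allows predictions at new x values via predict(); for example, at x = 2 the fitted value is 0.125 + 0.7321 × 2 ≈ 1.589:
> M = lm(y ~ x)                            # store the fitted model
> predict(M, newdata = data.frame(x = 2))  # about 1.589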
▶ Coefficient of determination, R².
▶ When comparing the goodness-of-fit of two models with the same response, we can use the Residual Standard Error (RSE) as a criterion.
▶ The overall F-test of model significance. Its null hypothesis (H0) states that the model is NOT significant; its alternative (H1) states that the model is significant. Equivalently: if the test has a small p-value (such as < 0.05), then the data provide strong evidence against H0; otherwise, we cannot reject H0.
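In R, the overall F statistic and its degrees of freedom can be extracted from the fitted model M defined earlier; the corresponding p-value appears in the printed output of summary(M):
> summary(M)$fstatistic  # F value, numerator df, denominator df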
R² = (TSS − RSS)/TSS = 1 − RSS/TSS,

where TSS = Σᵢ₌₁ⁿ (yi − ȳ)² is the total sum of squares.
TSS measures the total variance in the response in the given data, and can be thought of as the amount of variability inherent in the response before the regression is performed.
RSS measures the amount of variability that is left unexplained after performing the
regression.
R² measures the proportion of variability in the response Y that is explained by the fitted model. A larger R² indicates a better model fit.
Computing R² directly in R from the fitted model M:
> summary(M)$r.squared
[1] 0.822407
RSE = √(RSS/(n − 2)), where RSS = Σᵢ₌₁ⁿ (yi − ŷi)².

For the same response, one may fit many different linear models; the one with the larger RSE indicates the poorer model fit.
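For the three-point model M, RSE can be computed from the definition or read off the model summary (R reports it as the residual standard error):
> sqrt(sum(resid(M)^2) / (3 - 2))  # n = 3, so n - 2 = 1; about 1.47
> summary(M)$sigma                 # the same value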
Suppose we have n observations. Each observation has an outcome y and multiple input
variables x1 , ..., xp .
y ≈ β0 + β1 x1 + β2 x2 + ... + βp xp
or equivalently
yi ≈ β0 + β1 x1i + β2 x2i + ... + βp xpi , i = 1, ..., n.
RSS = Σᵢ₌₁ⁿ [yi − (β0 + β1 x1i + β2 x2i + ... + βp xpi)]².
The least squares estimates of β0, β1, β2, ..., βp are returned by the lm() function in R.
> lm(y~x1+x2)
Call:
lm(formula = y ~ x1 + x2)
Coefficients:
(Intercept) x1 x2
0.9362 1.7649 -4.9560
R² can be inflated simply by adding more regressors to the model (even insignificant terms). The adjusted R² corrects for this:

R²adj = 1 − [RSS/(n − p − 1)] / [TSS/(n − 1)].
When comparing two models of the same response, the model with the larger R²adj is preferred.
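In R, the adjusted R² is reported in the model summary next to R²; for a fitted model such as M it can be extracted with:
> summary(M)$adj.r.squared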
HDB flats are sold on 99-year leases. Hence, the older the lease commencement date (the date the first owner collected the keys from HDB), the lower the resale price usually is, given that other conditions are similar.
We may therefore consider the number of years from the lease commencement date until this year as a quantitative regressor, called years_left.
Can you fit a linear model for the resale price with two regressors, floor area and years left (call this model M2)? A sketch of the fitting step is given below.
Write down the equation of model M2 and report its goodness-of-fit.
Compared with the simple model (with only floor area as the regressor), which model would you prefer, and why?
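A minimal sketch of the fitting step, assuming the data sit in a data frame hdb with columns resale_price, floor_area and years_left (all three names here are placeholders for the actual data):
> M2 = lm(resale_price ~ floor_area + years_left, data = hdb)
> summary(M2)  # coefficients for the model equation, R-squared, adjusted R-squared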