University of Southeastern Philippines
COLLEGE OF ENGINEERING
Obrero, Davao City
MATH 212
ENGINEERING DATA ANALYSIS
DALIA M. RECONALLA, Ph.D
August 2020
Faculty Information:
Name: Dalia M. Reconalla
Email: [email protected]
Contact Number: 0906-209-6611
Office: College of Engineering
Contact Number: (082) 224-3334
Consultation Hours: By appointment - may be arranged through:
Official email
Facebook messenger/Facebook group chat
Text or call
Getting help
For academic concerns (College/Adviser - Contact details)
For administrative concerns (College Dean - Contact details)
For UVE concerns (KMD - Contact details)
For health and wellness concerns (UAGC, HSD and OSAS - Contact details)
TABLE OF CONTENTS
CONTENTS
Cover Page
Faculty Information
Table of Contents
Lesson 3
Application 3
Module Summary
Module Assessment
References
F-Distribution Table
Learning Outcomes:
o Estimate the value of the response variable from given values of the independent variables.
o Conduct a test of hypothesis on the significance of the regression model.
Time Frame: Week 13
Introduction
In most research problems where regression analysis is applied, more than one independent variable is needed in the regression model. The complexity of most scientific mechanisms is such that, in order to predict an important response, a multiple regression model is needed. When this model is linear in the coefficients, it is called a multiple linear regression model.
Activity
Given the simple linear regression model y = 30.04 + 0.897x for the intelligence test score x and the freshman Math 121 grade y of a group of engineering students, what could be the grade of a randomly selected student with an intelligence test score of 75?
Analysis
Sketch the graph of the regression line and interpret the model.
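As a quick numerical check, here is a minimal Python sketch (the model y = 30.04 + 0.897x comes from the Activity; the plotting range of test scores is an assumption) that evaluates the fitted line at x = 75 and draws the graph asked for in the Analysis:

```python
import numpy as np
import matplotlib.pyplot as plt

# Fitted simple linear regression model from the Activity
a, b = 30.04, 0.897                  # intercept and slope

# Predicted Math 121 grade for an intelligence test score of 75
print("Predicted grade at x = 75:", a + b * 75)   # 30.04 + 0.897(75) = 97.315

# Sketch of the regression line over an assumed range of test scores
x = np.linspace(40, 100, 100)
plt.plot(x, a + b * x, label="y = 30.04 + 0.897x")
plt.scatter([75], [a + b * 75], color="red", zorder=3, label="x = 75")
plt.xlabel("Intelligence test score, x")
plt.ylabel("Math 121 grade, y")
plt.legend()
plt.show()
```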
Abstraction
From Lesson 2 (Simple Linear Regression) we learned that when there is one independent variable or predictor, the regression equation for predicting y from x is ŷ = a + bx.
The simultaneous use of two or more independent variables in predicting a dependent variable is called multiple regression.
When there are two independent variables, the sample regression equation for the i-th observation is
ŷᵢ = a + b₁x₁ᵢ + b₂x₂ᵢ
where:
ŷᵢ = the predicted value,
a = the y-intercept,
b₁ = the expected change in y when x₁ changes one unit and x₂ remains constant,
x₁ᵢ = the value of the first independent variable,
b₂ = the expected change in y when x₂ changes one unit and x₁ remains constant,
x₂ᵢ = the value of the second independent variable, and
i = 1, 2, ..., n, where n is the number of observations.
The equation for two independent variables can be extended to any number of independent variables, say k, such as x₁, x₂, ..., xₖ. The mean of y│x₁, x₂, ..., xₖ (read as "y given x₁, x₂, ..., xₖ") is given by the multiple regression model
μ_y│x₁,...,xₖ = β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ     (k = number of independent variables)
and the estimated response is obtained from the sample regression equation
ŷ = b₀ + b₁x₁ + b₂x₂ + ... + bₖxₖ
where each regression coefficient βᵢ is estimated by bᵢ from the sample data using the method of least squares.
Estimating Coefficients
We shall obtain the least squares estimators of the parameters β₀, β₁, ..., βₖ by fitting the multiple linear regression model
μ_y│x₁,...,xₖ = β₀ + β₁x₁ + ... + βₖxₖ
to the data points
{(x₁ᵢ, x₂ᵢ, ..., xₖᵢ, yᵢ), i = 1, 2, ..., n and n > k},
where yᵢ is the observed response to the values x₁ᵢ, x₂ᵢ, ..., xₖᵢ of the k independent variables x₁, x₂, ..., xₖ. Each observation satisfies the equation
yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + ... + βₖxₖᵢ + εᵢ
or
yᵢ = b₀ + b₁x₁ᵢ + b₂x₂ᵢ + ... + bₖxₖᵢ + eᵢ,
where εᵢ and eᵢ are the random error and residual, respectively, associated with the response yᵢ.
In using the concept of least squares to arrive at the estimates b₀, b₁, ..., bₖ, we minimize the expression
SSE = Σeᵢ² = Σ(yᵢ − b₀ − b₁x₁ᵢ − ... − bₖxₖᵢ)².
Differentiating SSE in turn with respect to b₀, b₁, ..., bₖ and equating to zero, we generate the set of k + 1 normal equations:
nb₀ + b₁Σx₁ᵢ + b₂Σx₂ᵢ + ... + bₖΣxₖᵢ = Σyᵢ
b₀Σx₁ᵢ + b₁Σx₁ᵢ² + b₂Σx₁ᵢx₂ᵢ + ... + bₖΣx₁ᵢxₖᵢ = Σx₁ᵢyᵢ
⋮
b₀Σxₖᵢ + b₁Σxₖᵢx₁ᵢ + b₂Σxₖᵢx₂ᵢ + ... + bₖΣxₖᵢ² = Σxₖᵢyᵢ
These equations can be solved for b₀, b₁, b₂, ..., bₖ by any appropriate method for solving systems of linear equations, including the vector-matrix approach. Because solving them by hand is a tedious process, estimating the coefficients in practice usually requires the use of a computer program.
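As the paragraph above notes, in practice the normal equations are solved by computer. A minimal sketch of the vector-matrix approach in Python is shown below; the data arrays are hypothetical placeholders, not the Example 1 data.

```python
import numpy as np

# Hypothetical observations: k = 2 predictors, n = 5 data points
x1 = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
x2 = np.array([18.0, 20.0, 22.0, 21.0, 19.0])
y  = np.array([150.0, 170.0, 200.0, 210.0, 190.0])

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])

# np.linalg.lstsq minimizes SSE, which is equivalent to solving the normal equations
b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2 =", b)
```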
In what follows we consider only two independent variables as an example, for ease of manual algebraic computation. The sample regression equation is ŷ = a + b₁x₁ + b₂x₂, where the regression coefficients are determined from the system of normal equations:
Σy = na + b₁Σx₁ + b₂Σx₂
Σx₁y = aΣx₁ + b₁Σx₁² + b₂Σx₁x₂
Σx₂y = aΣx₂ + b₁Σx₁x₂ + b₂Σx₂²
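For the two-variable case, these three normal equations can also be assembled directly from the sums, as in this hedged sketch (the arrays are again placeholders):

```python
import numpy as np

# Placeholder data for two independent variables and the response
x1 = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
x2 = np.array([18.0, 20.0, 22.0, 21.0, 19.0])
y  = np.array([150.0, 170.0, 200.0, 210.0, 190.0])
n = len(y)

# Coefficient matrix and right-hand side of the three normal equations
A = np.array([
    [n,        x1.sum(),      x2.sum()],
    [x1.sum(), (x1**2).sum(), (x1*x2).sum()],
    [x2.sum(), (x1*x2).sum(), (x2**2).sum()],
])
rhs = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])

a, b1, b2 = np.linalg.solve(A, rhs)
print("a, b1, b2 =", a, b1, b2)
```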
Example 1: The average monthly electric power consumption (y) at a certain manufacturing plant is considered to be linearly dependent on the ambient temperature (x₁) and the number of working days in a month (x₂). Consider the one-year data given in the table.
a. Determine the least-squares estimates of the associated linear regression coefficients. Find the regression equation representing the average electric power consumption in terms of ambient temperature and number of working days in a month.
b. Estimate the average monthly electric power consumption if the plant's average ambient temperature is 48 and the number of working days in the month is 22.
Solution: Here k = 2 independent variables and n = 12 monthly observations. Compute the sums Σx₁, Σx₂, Σy, Σx₁², Σx₂², Σx₁x₂, Σx₁y, and Σx₂y from the data (presented in the table), then substitute these values into the system of normal equations.
Substituting the sums gives the numerical system of equations for the regression coefficients. Find the values of a, b₁, and b₂ using the algebraic method, determinants, or matrices. The estimated regression equation based on the data is
ŷ = a + b₁x₁ + b₂x₂, with fitted slopes b₁ = 0.39 and b₂ = 10.80.
Interpretation:
For every unit change in the ambient temperature, there is a corresponding 0.39 increase in the average monthly electric power consumption, holding the number of working days in a month constant. Likewise, for every additional working day in a month, there is a 10.80 increase in the average monthly power consumption, holding the ambient temperature constant.
b. Estimate the average monthly electric power consumption if the plant's average ambient temperature is 48 and the number of working days in the month is 22.
From the equation ŷ = a + b₁x₁ + b₂x₂, with x₁ = 48 and x₂ = 22,
ŷ = 222.48.
Properties of the Least Squares Estimator
For the multiple linear regression model
y = β₀ + β₁x₁ + ... + βₖxₖ + ε,
an unbiased estimate of the variance σ² is given by the error or residual mean square
s² = MSE = SSE / (n − (k + 1)),
where
SSE = Σeᵢ² = Σ(yᵢ − ŷᵢ)².
The sum of squares identity
Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σ(yᵢ − ŷᵢ)²
continues to hold, that is,
SST = SSR + SSE,
with SST = Σ(yᵢ − ȳ)² = total sum of squares
and SSR = Σ(ŷᵢ − ȳ)² = regression sum of squares.
There are k degrees of freedom associated with SSR, SSE has n − (k + 1) degrees of freedom, and, as always, SST has n − 1 degrees of freedom.
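A short sketch of these quantities in Python; the observed and fitted values below are hypothetical stand-ins for the output of a k = 2 predictor fit:

```python
import numpy as np

# Hypothetical observed responses and fitted values (k = 2 predictors)
y     = np.array([150.0, 170.0, 200.0, 210.0, 190.0, 180.0])
y_hat = np.array([155.0, 168.0, 195.0, 205.0, 195.0, 182.0])
n, k = len(y), 2

SSE = np.sum((y - y_hat) ** 2)          # error (residual) sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
SST = np.sum((y - y.mean()) ** 2)       # total sum of squares

# The identity SST = SSR + SSE holds exactly only for true least-squares fitted
# values; with these made-up y_hat values the two sides need not match.
print("SST =", SST, " SSR + SSE =", SSR + SSE)

s2 = SSE / (n - (k + 1))                # residual mean square, the unbiased estimate of sigma^2
print("s^2 =", s2)
```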
Inference in Multiple Regression
In multiple regression analysis, the response variable is described as a function of more than one predictor variable; therefore, there are several types of inferences that can be made using this model. In the simple linear model studied earlier, the test for the slope (t-test) is equivalent to the test for the utility of the model (F-test). In multiple regression, however, they differ on account of there being more than one slope parameter.
A Test of Model Adequacy
To find a statistic that measures how well a multiple regression model fits a set of data, we use the multiple regression equivalent of r², the coefficient of determination for the straight-line model. Thus, we define the multiple coefficient of determination, R², as
R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)² = 1 − SSE/SST,
where ŷᵢ = the predicted value of yᵢ from the fitted model. R² is the fraction of the sample variation of the y values (measured by SSyy = SST) that is explained by the least-squares prediction equation.
Thus, R² = 0 implies a complete lack of fit of the model to the data, and R² = 1 implies a perfect fit, with the model passing through every data point. In general, 0 ≤ R² ≤ 1, and the larger the value of R², the better the model fits the data.
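In code, R² follows directly from the two sums of squares. As a hedged illustration, the SSE and SST values below are the ones reported later in Example 2 for the Example 1 data:

```python
# Multiple coefficient of determination from the sums of squares
SSE = 2004.7456     # error sum of squares (reported in Example 2)
SST = 6707.667      # total sum of squares (reported in Example 2)

R2 = 1 - SSE / SST  # equivalently SSR / SST
print("R^2 =", round(R2, 4))   # about 0.70
```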
The fact that R² is a sample statistic implies that it can be used to make inferences about the utility of the entire model for predicting the population of y values at each setting of the independent variables.
Testing the Utility of Multiple Regression Model: The Global F-Test
H₀: β₁ = β₂ = ... = βₖ = 0
Hₐ: At least one of the parameters β₁, β₂, ..., βₖ is nonzero.
Test statistic:
F = (R²/k) / [(1 − R²)/(n − (k + 1))] = MSR/MSE
Rejection region: F > F_α(k, n − [k + 1])
Using the p-value approach: reject H₀ if p-value < α, where the
p-value = P(F(k, n − [k + 1]) > F).
Conditions:
1. The error component ε is normally distributed.
2. The mean of ε is zero.
3. The errors associated with different observations are independent.
Analysis of Variance (ANOVA)
The analysis of variance table for the multiple regression problem provides a test of the null hypothesis
H₀: β₁ = β₂ = ... = βₖ = 0,
which implies that the response variable y is not related to any of the k input variables.
Analysis of Variance (ANOVA) Table
Source of Variation    df             Sum of Squares    Mean Square               F
Regression             k              SSR               MSR = SSR/k               MSR/MSE
Error (Residual)       n − (k + 1)    SSE               MSE = SSE/(n − (k + 1))
Total                  n − 1          SST
The tail values of the F-distribution are given in the F-Distribution Table (Appendix Table 1). The F-test statistic becomes large as the coefficient of determination R² becomes large. To determine how large F must be before we can conclude, at a given significance level, that the model is useful for predicting y, we set up the rejection region (RR) as F > F_α(k, n − [k + 1]).
Example 2. Using the data given in Example 1, decide, at the 5% significance level, whether the data provide sufficient evidence to conclude that the ambient temperature and the number of working days in a month (predictor variables) are useful for predicting the average monthly power consumption (response variable).
Solution:
Step 1. State the null and alternative hypotheses.
H₀: β₁ = β₂ = 0
Hₐ: At least one of the parameters is nonzero.
Step 2. Decide on the significance level, α.
Perform the hypothesis test at the 5% significance level, or α = 0.05.
Step 3. Compute the value of the test statistic, F.
k = 2, n = 12
SSE = Σ(yᵢ − ŷᵢ)² = 2004.7456
SST = Σ(yᵢ − ȳ)² = 6707.667
Finding R²:
R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)² = 1 − SSE/SST = 1 − 2004.7456/6707.667 = 0.7011
So
F = (R²/k) / [(1 − R²)/(n − (k + 1))] = (0.7011/2) / (0.2989/9) = 10.56.
We can also find F using the formula
F = MSR/MSE.
Solving for the mean squares, with SSR = SST − SSE = 6707.667 − 2004.7456 = 4702.9214:
MSR = SSR/k = 4702.9214/2 = 2351.4607
MSE = SSE/(n − (k + 1)) = 2004.7456/9 = 222.7495
F = MSR/MSE = 2351.4607/222.7495 = 10.56
Step 4. Decide whether to reject H₀.
Compare F with the critical value F_0.05(k, n − [k + 1]) = F_0.05(2, 9) = 4.2565 (refer to the F-Distribution Table, Appendix Table 1, in this module).
Since F = 10.56 > F_0.05(2, 9) = 4.2565, we reject the null hypothesis and conclude that, at the 5% level of significance, there is sufficient evidence that the ambient temperature and the number of working days in a month can be used to predict the average monthly power consumption. Likewise, it can be concluded that average monthly power consumption is linearly related to either ambient temperature or number of working days in a month, or both.
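The arithmetic in Example 2 can be checked with a short sketch using scipy's F distribution; the sums of squares are the ones reported above.

```python
from scipy import stats

SSE, SST = 2004.7456, 6707.667     # from Example 2
k, n = 2, 12
SSR = SST - SSE                    # 4702.9214

MSR = SSR / k                      # 2351.4607
MSE = SSE / (n - (k + 1))          # 222.7495
F = MSR / MSE                      # about 10.56

F_crit = stats.f.ppf(0.95, k, n - (k + 1))   # about 4.2565
p_value = stats.f.sf(F, k, n - (k + 1))      # well below alpha = 0.05

print("F =", round(F, 2), " F_crit =", round(F_crit, 4), " p-value =", round(p_value, 4))
```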
Multiple Correlation
The correlation between y and the combined predictors x₁, x₂, ..., xₖ is called the coefficient of multiple correlation and is denoted by R_y·12...k, or simply R.
The dot after y in the notation R_y·12...k separates the dependent variable, y, from the independent variables, x₁, x₂, ..., xₖ.
For the two-predictor case, R_y·12 is given by
R_y·12 = √[(r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²)],
where r_y1, r_y2, and r_12 are the correlation coefficients for the respective pairs of variables.
The multiple correlation coefficient can assume values from 0 to 1, where 0 indicates the absence of a linear multiple correlation between y and the independent variables and 1 indicates a perfect linear multiple correlation in which all of the observed y's fall on the regression plane.
Coefficient of Multiple Determination. The proportion of variance in y
accounted for by the combined predictors x1, x2, . . . , xk is obtained by squaring
the multiple correlation coefficient and is called the coefficient of multiple
determination, R2.
This coefficient is an extension of the coefficient of determination for one predictor, r², discussed in Lesson 1.
For two predictors,
R²_y·12 = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²),
and the multiple correlation coefficient is R_y·12 = √(R²_y·12).
A comparison of the value of R² with that for r² indicates the improvement in predicting y that can be achieved by using a multiple regression equation instead of a one-predictor regression equation.
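A minimal sketch of the two-predictor computation; the three pairwise correlations below are hypothetical values, not the Example 1 results:

```python
import math

# Hypothetical pairwise correlations among y, x1, and x2
r_y1, r_y2, r_12 = 0.60, 0.75, 0.30

R2 = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
R = math.sqrt(R2)
print("R^2 =", round(R2, 4), " R =", round(R, 4))
```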
Table 1. Intercorrelation among the Variables
Variable    y        x₁       x₂
y           1.000
x₁          r_y1     1.000
x₂          r_y2     r_12     1.000
The coefficient of multiple determination will be relatively large when the correlation of each of the predictors with y is large and the correlations among the predictors are 0 or very small. In fact, if the independent variables are mutually uncorrelated,
R²_y·12...k = r_y1² + r_y2² + ... + r_yk².
If correlations exist among some or all of the independent variables, it is usually the case that
R²_y·12...k < r_y1² + r_y2² + ... + r_yk².
The presence of nonzero correlations among the independent variables is referred to
as multicollinearity.
Extreme multicollinearity occurs when one independent variable is a linear function of other independent variables; for example, x₂ might equal 3x₁, or x₃ might equal x₁ + x₂. In the latter case, the inclusion of x₃ in the regression equation would not account for any variance in y not already accounted for by x₁ and x₂. Ideally, you would like to have predictors that have high correlations with the dependent variable and zero correlations with each other. Unfortunately, in the behavioral sciences, health sciences, and education, it is difficult to find predictors that meet these criteria. Once you have found three or four good predictors, it is often difficult to find additional predictors that are not highly correlated with at least one of the original predictors.
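As an illustration of extreme multicollinearity, the sketch below (hypothetical data) builds a predictor that is an exact linear function of two others and shows how the redundancy appears in the correlation matrix and in the rank of the predictor matrix:

```python
import numpy as np

# Hypothetical predictors; x3 is an exact linear function of x1 and x2
x1 = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
x2 = np.array([18.0, 20.0, 22.0, 21.0, 19.0])
x3 = x1 + x2                                   # extreme multicollinearity

X = np.column_stack([x1, x2, x3])
print(np.corrcoef(X, rowvar=False))            # large off-diagonal correlations involving x3
print("rank =", np.linalg.matrix_rank(X))      # 2, not 3: x3 adds no new information
```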
Application
1. Construct a table showing the intercorrelation among the variables average monthly electric power consumption (y), ambient temperature (x₁), and number of working days in a month (x₂) in Example 1, and determine the multiple coefficient of determination R²_y·12. Verify whether multicollinearity exists between the two independent variables.
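One possible starting point for the Application is sketched below; the arrays are placeholders that must be replaced with the twelve monthly values of y, x₁, and x₂ from the Example 1 table:

```python
import numpy as np

# Placeholder arrays: substitute the twelve monthly values from Example 1
y  = np.array([210.0, 225.0, 240.0, 250.0, 245.0, 230.0])
x1 = np.array([30.0, 35.0, 42.0, 50.0, 48.0, 38.0])
x2 = np.array([20.0, 21.0, 23.0, 24.0, 22.0, 21.0])

# Intercorrelation table, rows/columns ordered y, x1, x2 (compare with Table 1)
corr = np.corrcoef(np.vstack([y, x1, x2]))
print(np.round(corr, 3))

# Multiple coefficient of determination from the pairwise correlations
r_y1, r_y2, r_12 = corr[0, 1], corr[0, 2], corr[1, 2]
R2 = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
print("R^2 =", round(R2, 4))
```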
Closure
Congratulations! You have successfully completed the tasks and activities for Lesson 3. It is expected that your knowledge of correlation and regression will help you in solving other real-life problems and practical applications involving prediction or estimation.
You are almost done with this module. The module summary and assessment follow.
SUMMARY
o A measure of the degree of linear relationship between two variables is called the correlation coefficient, r.
The value of r is a measure of the extent to which x and y are linearly related.
The value of r does not depend on the unit of measurement for either variable.
The value of r does not depend on which of the two variables is considered x.
The value of r is between −1 and +1.
A correlation coefficient of r = +1 occurs only when all the points in a scatterplot of the data lie exactly on a straight line that slopes upward. Similarly, r = −1 occurs only when all the points lie exactly on a downward-sloping line.
o Regression analysis is a statistical technique used for determining the functional form of the relationship between two or more variables, where one variable is called the dependent or response variable and the rest are called the independent or explanatory variables.
o The coefficient of determination, denoted by r², gives the proportion of variation in y that can be attributed to an approximate linear relationship between x and y.
MODULE ASSESSMENT
Solve the following problems.
1. Regression methods were used to analyze the data from a study investigating the relationship between roadway surface temperature (x) and pavement deflection (y). The data follow:
Given the data above:
(a) Estimate the intercept and slope regression coefficients. Write the estimated regression line.
(b) Find the standard errors of the slope and intercept coefficients.
(c) Compute the coefficient of determination, r². Comment on the value.
(d) Use a t-test to test for significance of the intercept and slope coefficients at α = 0.05.
(e) Draw the regression line.
2. The article “How to Optimize and Control the Wire Bonding Process” described an experiment carried out to assess the impact of the variables force, x₁, and temperature (degrees Celsius), x₂, on ball bond shear strength (gm), y. The following data were generated:
a. Find the regression equation representing the ball bond shear strength in terms of force and temperature.
b. Determine whether the data provide sufficient evidence to conclude that the force and temperature are useful for predicting the ball bond shear strength at the 5% level of significance.
References
Broto, A. S. (2007). Simplified Approach to Inferential Statistics (1st ed.). National, Philippines.
Carambas, Zenaida U. (2011). Basic Probability and Statistics. Valencia Educational Supply, Baguio City.
Peck, R., Olsen, C., & Devore, J. L. (2012). Introduction to Statistics and Data Analysis (4th ed.). Brooks/Cole, Cengage Learning, Boston, MA, USA.
Ott, R. L., & Longnecker, M. (2010). An Introduction to Statistical Methods and Data Analysis (6th ed.). Brooks/Cole, Cengage Learning, CA, USA.
Roussas, George (2003). Introduction to Probability and Statistical Inference. Elsevier Science, USA.
Walpole, R. E., & Myers, R. H. (1993). Probability and Statistics for Engineers and Scientists (5th ed.). Macmillan Publishing Company, New York.
Weiss, N. A. (2012). Elementary Statistics (8th ed.). Addison-Wesley, Pearson Education, Inc., Boston, MA.
Woodbury, George (2002). An Introduction to Statistics (1st ed.). Thomson Learning, Inc., USA.
Appendix Table 1. F Distribution