10/12/13
Statistics One
Lecture 7
Introduction to Regression
Three segments
Overview
Calculation of regression coefficients
Assumptions
Lecture 7 ~ Segment 1
Regression: Overview
Important concepts & topics
Simple regression vs. multiple regression
Regression equation
Regression model
Regression: a statistical analysis used to
predict scores on an outcome variable,
based on scores on one or multiple
predictor variables
Example: IMPACT (see Lab 2)
An online assessment tool to investigate the
effects of sports-related concussion
http://www.impacttest.com
Simple regression: one predictor variable
Multiple regression: multiple predictors
IMPACT example
IMPACT provides data on 6 variables
Verbal memory
Visual memory
Visual motor speed
Reaction time
Impulse control
Symptom score
IMPACT: Correlations preinjury
IMPACT: Correlations postinjury
IMPACT example
For this example, assume:
Symptom Score is the outcome variable
Simple regression example:
Predict Symptom Score from just one variable
Multiple regression example:
Predict Symptom Score from two variables
Regression equation
Y = m + bX + e
Y is a linear function of X
m = intercept
b = slope
e = error (residual)
Y = B0 + B1X1 + e
Y is a linear function of X1
B0 = intercept = regression constant
B1 = slope = regression coefficient
e = error (residual)
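As a minimal sketch of how the regression equation is used, the snippet below computes predicted scores and residuals for a handful of cases. The coefficients and scores are hypothetical values chosen for illustration, not IMPACT data, and `predict` is a helper name introduced here.

```python
# Hypothetical coefficients and scores, chosen only to illustrate the equation.
B0 = 2.0  # intercept (regression constant)
B1 = 0.5  # slope (regression coefficient)

X1 = [1.0, 2.0, 3.0, 4.0]  # predictor scores
Y = [2.4, 3.1, 3.4, 4.2]   # observed outcome scores


def predict(x1, b0=B0, b1=B1):
    """Return the predicted score B0 + B1*X1 (the equation without e)."""
    return b0 + b1 * x1


predicted = [predict(x1) for x1 in X1]

# The residual e is what the line leaves unexplained: e = Y - predicted.
residuals = [y - y_hat for y, y_hat in zip(Y, predicted)]

for x1, y, y_hat, e in zip(X1, Y, predicted, residuals):
    print(f"X1={x1:.1f}  Y={y:.1f}  predicted={y_hat:.2f}  e={e:+.2f}")
```

Adding each residual back to its predicted score recovers the observed score, which is all that Y = B0 + B1X1 + e says.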
Model R and R²
R = multiple correlation coefficient
R = rŶY
The correlation between the predicted scores and the observed scores
R²
The percentage of variance in Y explained by the model
IMPACT example
Y = B0 + B1X1 + e
Let Y = Symptom Score
Let X1 = Impulse Control
Solve for B0 and B1
In R, function lm
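The lecture solves for B0 and B1 with R's lm. As a rough illustration of what that step computes for one predictor, here is a Python sketch of the closed-form least-squares solution. The scores are hypothetical stand-ins, not the IMPACT data, and `fit_simple_regression` is a name introduced here.

```python
def fit_simple_regression(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross products and sum of squares of the predictor.
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    if ss_x == 0:
        raise ValueError("predictor has no variance")
    b1 = sp_xy / ss_x
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical scores, for illustration only.
impulse_control = [2, 5, 3, 8, 6, 4]
symptom_score = [22, 30, 24, 33, 27, 26]

b0, b1 = fit_simple_regression(impulse_control, symptom_score)
print(f"predicted Symptom Score = {b0:.2f} + {b1:.2f}(Impulse Control)")
```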
IMPACT example
Ŷ = 20.48 + 1.43(X)
r = .40
R² = 16%
Regression model
The regression model is used to model or predict future behavior
The model is just the regression equation
Later in the course we will discuss more complex models that consist of a set of regression equations
Regression: It gets better
The goal is to produce better models so we can generate more accurate predictions
Add more predictor variables, and/or
Develop better predictor variables
IMPACT example
Y = B0 + B1X1 + B2X2 + e
Let Y = Symptom Score
Let X1 = Impulse Control
Let X2 = Verbal Memory
Solve for B0 and B1 and B2
In R, function lm
Ŷ = 4.13 + 1.48(X1) + 0.22(X2)
R² = 22%
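For two predictors, the least-squares coefficients can still be written in closed form by solving the two normal equations on mean-centered scores. The sketch below does this by hand in Python; the lecture itself uses R's lm. The scores are hypothetical and `fit_two_predictors` is a name introduced here.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares B0, B1, B2 for Y = B0 + B1*X1 + B2*X2 + e."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Sums of squares and cross products of the centered scores.
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Solve the 2x2 normal equations for the two slopes.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical scores, for illustration only.
impulse_control = [2, 5, 3, 8, 6, 4]
verbal_memory = [80, 92, 85, 90, 78, 88]
symptom_score = [22, 30, 24, 33, 27, 26]

b0, b1, b2 = fit_two_predictors(impulse_control, verbal_memory, symptom_score)
print(f"predicted = {b0:.2f} + {b1:.2f}(X1) + {b2:.2f}(X2)")
```

With more than two predictors the same idea is normally expressed in matrix form, which is what general-purpose routines do.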
Model R and R²
R = multiple correlation coefficient
R = rŶY
The correlation between the predicted scores and the observed scores
R²
The percentage of variance in Y explained by the model
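The two descriptions of the model fit above can be checked against each other on a small case. In this sketch, R is the correlation between predicted and observed scores, and R² is compared with the proportion of variance explained. The data are hypothetical; 2.2 and 0.6 are the least-squares estimates for this particular toy set, and `pearson_r` is a helper introduced here.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sp = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    ss_a = sum((u - ma) ** 2 for u in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    return sp / sqrt(ss_a * ss_b)


# Hypothetical scores; 2.2 and 0.6 are the least-squares estimates for them.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in X]

# R: correlation between predicted and observed scores.
R = pearson_r(predicted, Y)
r_squared = R ** 2

# Proportion of variance in Y explained by the model.
mean_y = sum(Y) / len(Y)
ss_y = sum((y - mean_y) ** 2 for y in Y)
ss_residual = sum((y - p) ** 2 for y, p in zip(Y, predicted))
explained = 1 - ss_residual / ss_y

print(f"R = {R:.3f}, R^2 = {r_squared:.3f}, explained = {explained:.3f}")
```

The two quantities agree here because the predictions come from a least-squares fit with an intercept. For arbitrary predictions they need not match.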
IMPACT example
rŶY = .47
R² = 22%
Segment summary
Important concepts & topics
Simple regression vs. multiple regression
Regression equation
Regression model
END SEGMENT
Lecture 7 ~ Segment 2
Calculation of regression coefficients
Estimation of coefficients
The values of the coefficients (e.g., B1) are estimated such that the regression model yields optimal predictions
Minimize the residuals!
Regression equation:
Y = B0 + B1X1 + e
Ŷ = B0 + B1X1
(Y - Ŷ) = e (residual)
IMPACT example
Ordinary Least Squares estimation
Minimize the sum of the squared (SS)
residuals
SS.RESIDUAL = Σ(Y - Ŷ)²
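To make "minimize the sum of squared residuals" concrete, the sketch below computes SS.RESIDUAL for the least-squares line and for a few nearby lines, and confirms that every alternative does worse. The data and the perturbation sizes are hypothetical, and `ss_residual` is a helper name introduced here.

```python
def ss_residual(x, y, b0, b1):
    """Sum of squared residuals for the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical scores, for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

# Closed-form least-squares estimates for one predictor.
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / sum(
    (x - mean_x) ** 2 for x in X
)
b0 = mean_y - b1 * mean_x

ols_ss = ss_residual(X, Y, b0, b1)

# Shift the intercept and slope slightly and recompute SS.RESIDUAL.
nearby_ss = [
    ss_residual(X, Y, b0 + d0, b1 + d1)
    for d0 in (-0.5, 0.0, 0.5)
    for d1 in (-0.2, 0.0, 0.2)
    if (d0, d1) != (0.0, 0.0)
]

print(f"least-squares SS.RESIDUAL = {ols_ss:.2f}")
print(f"smallest nearby SS.RESIDUAL = {min(nearby_ss):.2f}")
```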
Estimation of coefficients
Sum of Squared deviation scores (SS) in variable Y: SS.Y
Sum of Squared deviation scores (SS) in variable X: SS.X
Sum of Cross Products: SP.XY
Sum of Cross Products = SS of the Model
SP.XY = SS.MODEL
SS.RESIDUAL = (SS.Y - SS.MODEL)
Formula for the unstandardized coefficient
B1 = r × (SDy / SDx)
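The quantities in this segment can be computed directly for a small case. The sketch below checks two things: B1 = r × (SDy / SDx) gives the same slope as the usual least-squares expression SP.XY / SS.X, and SS.RESIDUAL equals SS.Y minus SS.MODEL. The data are hypothetical. SS.MODEL is computed here from the predicted scores, which is an assumption about how the term is meant.

```python
from math import sqrt

# Hypothetical scores, for illustration only.
X = [2, 4, 5, 7, 9]
Y = [20, 24, 29, 30, 37]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

# Sums of squared deviation scores and sum of cross products.
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

sd_x = sqrt(ss_x / (n - 1))
sd_y = sqrt(ss_y / (n - 1))
r = sp_xy / sqrt(ss_x * ss_y)

# Unstandardized slope two ways: from r and the SDs, and from SP.XY / SS.X.
b1_from_r = r * (sd_y / sd_x)
b1_least_squares = sp_xy / ss_x
b0 = mean_y - b1_from_r * mean_x

# SS.MODEL taken as the squared deviations of predicted scores from mean Y.
predicted = [b0 + b1_from_r * x for x in X]
ss_model = sum((p - mean_y) ** 2 for p in predicted)
ss_residual = sum((y - p) ** 2 for y, p in zip(Y, predicted))

print(f"B1 from r: {b1_from_r:.4f}, B1 from SP.XY/SS.X: {b1_least_squares:.4f}")
print(f"SS.Y - SS.MODEL = {ss_y - ss_model:.4f}, SS.RESIDUAL = {ss_residual:.4f}")
```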
Formula for the standardized coefficient
If X and Y are standardized then
SDy = SDx = 1
B = r × (SDy / SDx)
= r
Segment summary
Important concepts
Regression equation and model
Ordinary least squares estimation
Unstandardized regression coefficients
Standardized regression coefficients
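The standardized-coefficient result from this segment (B = r when SDy = SDx = 1) can be checked numerically. In this sketch both variables are converted to z-scores, the least-squares slope is computed on the z-scores, and the result is compared with r computed from the raw scores. The data are hypothetical and `standardize` is a helper introduced here.

```python
from math import sqrt


def standardize(values):
    """Convert scores to z-scores using the sample SD (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Hypothetical scores, for illustration only.
X = [2, 4, 5, 7, 9]
Y = [20, 24, 29, 30, 37]
n = len(X)

# Pearson r from the raw scores.
mean_x, mean_y = sum(X) / n, sum(Y) / n
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)
r = sp_xy / sqrt(ss_x * ss_y)

# Least-squares slope on z-scores; both means are 0, so no centering is needed.
zx, zy = standardize(X), standardize(Y)
b_standardized = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

print(f"r = {r:.4f}, standardized B = {b_standardized:.4f}")
```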
END SEGMENT
Lecture 7 ~ Segment 3
Assumptions
Assumptions of linear regression
Normal distribution for Y
Linear relationship between X and Y
Homoscedasticity
Reliability of X and Y
Validity of X and Y
Random and representative sampling
Assumptions
Assumptions of linear regression
Normal distribution for Y
Linear relationship between X and Y
Homoscedasticity
Anscombe's quartet
Regression equation for all 4 examples:
Y = B0 + B1X1 + e
Ŷ = 3.00 + 0.50(X1)
To test assumptions, save residuals
e = (Y - Ŷ)
Then examine a scatterplot with
X on the X-axis
Residuals on the Y-axis
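The shared regression equation and the saved residuals can be reproduced with a short sketch. The values below are the Anscombe quartet as commonly distributed (for example in R's built-in anscombe dataset). They are typed in here as an assumption and do not come from the slides. `fit_line` and the dictionary layout are names introduced for illustration.

```python
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    b1 = sp_xy / ss_x
    return mean_y - b1 * mean_x, b1


results = {}
for name, (x, y) in ANSCOMBE.items():
    b0, b1 = fit_line(x, y)
    # Saved residuals: e = Y - predicted, paired with X for a residual plot.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    results[name] = (b0, b1, list(zip(x, residuals)))
    print(f"{name}: predicted = {b0:.2f} + {b1:.2f}(X1)")
```

Each `(x, residual)` pair is one point on the scatterplot described above, with X on the X-axis and the residual on the Y-axis. Any plotting tool can draw it from `results`.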
Segment summary
Assumptions when interpreting r
Normal distributions for Y
Linear relationship between X and Y
Homoscedasticity
Examine residuals to evaluate assumptions
END SEGMENT
END LECTURE 7