Multiple Regression and
Correlation
Dr. Carlo Magno
Bivariate correlation: Y = a + bX
Multiple correlation: Y = a + b1X1 + b2X2 + ... + bnXn
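A minimal sketch of fitting both equations with NumPy's least-squares solver; the data, seed, and variable names are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Invented data: two predictors and a criterion.
X = rng.normal(size=(50, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=50)

# Bivariate regression Y = a + bX, using the first predictor only.
A1 = np.column_stack([np.ones(50), X[:, 0]])
a, b = np.linalg.lstsq(A1, y, rcond=None)[0]

# Multiple regression Y = a + b1X1 + b2X2.
A2 = np.column_stack([np.ones(50), X])
a_multi, b1, b2 = np.linalg.lstsq(A2, y, rcond=None)[0]
print(a, b, a_multi, b1, b2)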
Multiple regression: the association between a criterion variable and two or more predictor variables (Aron & Aron, 2003).
Multiple correlation coefficient = R
Using two or more variables to predict a criterion variable.
Onwuegbuzie, A. J., Bailey, P., & Daley, C. E. (2000). Cognitive, affective,
personality, and demographic predictors of foreign-language achievement.
The Journal of Educational Research, 94, 3-15.
[Path model: four blocks of predictors pointing to the criterion]
Cognitive: Academic achievement, Study habits, Expectation
Affective: Perception, Anxiety
Personality: Cooperativeness, Competitiveness
Demographic: Gender, Age
Criterion: Foreign language achievement
Espin, C., Shin, J., Deno, S. L., Skare, S., Robinson, S., & Brenner, B. (2000).
Identifying indicators of written expression proficiency for middle school students.
The Journal of Special Education, 34, 140-153.
[Path model: ten predictors pointing to the criterion]
Predictors: Words written; Words correct; Characters; Sentences; Characters per word; Words per sentence; Correct word sequences; Incorrect word sequences; Correct minus incorrect word sequences; Mean length of correct word sequences
Criterion: Written expression proficiency
Results
Regression coefficient (b) / beta weight (β): the distinct contribution of a predictor variable, excluding any overlap with the other predictor variables.
The unstandardized regression coefficient (b) is expressed in the raw units of the variables.
The standardized regression coefficient (β) results from converting the variables (independent and dependent) to z-scores before doing the regression. It indicates which independent variable has the most effect on the dependent variable.
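A minimal sketch of the z-score route to β, on invented data; equivalently, β = b × (SD of X / SD of Y).

import numpy as np

def ols(X, y):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 2.0 + 1.5 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(size=100)

b = ols(X, y)[1:]                        # unstandardized b
Xz = (X - X.mean(0)) / X.std(0, ddof=1)  # z-score each predictor
yz = (y - y.mean()) / y.std(ddof=1)      # z-score the criterion
beta = ols(Xz, yz)[1:]                   # standardized β
print(b, beta, b * X.std(0, ddof=1) / y.std(ddof=1))  # last two match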
Results
Multiple correlation coefficient (R): the correlation between the criterion variable and all the predictor variables taken together.
Squared multiple correlation coefficient (R2): the percent of variance in the dependent variable explained collectively by all of the independent variables.
R2adj: assesses the goodness of fit of a regression equation, i.e., how well the predictors (regressors), taken together, explain variation in the dependent variable, with a penalty for the number of predictors.
R2adj = 1 - (1 - R2)(N - 1)/(N - k - 1), where N is the sample size and k is the number of predictors.
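A short sketch computing R2 and R2adj on invented data; the helper name is an illustrative assumption.

import numpy as np

def adjusted_r2(r2, n, k):
    # Penalize R2 for the number of predictors k, given sample size n.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = X @ np.array([0.6, 0.0, -0.4]) + rng.normal(size=60)

A = np.column_stack([np.ones(60), X])
resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
print(r2, adjusted_r2(r2, n=60, k=3))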
Interpreting R2adj:
above 75%: very good
50-75%: good
25-50%: fair
below 25%: poor and perhaps unacceptable
R2adj values above 90% are rare in psychological data.
Residual: the deviation of an observed point from the regression line, i.e., from its predicted value.
t-tests: used to assess the significance of individual b coefficients.
F test: used to test the significance of R.
F = [R2/k] / [(1 - R2)/(n - k - 1)], where k is the number of predictors and n is the sample size.
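A minimal sketch of this overall F test, with SciPy supplying the p-value; R2 = 0.52 with n = 16 and k = 3 reproduces roughly the F(3,12) = 4.32 of the reporting example later in the slides.

import scipy.stats as st

def overall_f(r2, n, k):
    # F statistic for H0: R = 0, with k and n - k - 1 degrees of freedom.
    f = (r2 / k) / ((1 - r2) / (n - k - 1))
    return f, st.f.sf(f, k, n - k - 1)

print(overall_f(r2=0.52, n=16, k=3))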
Considerations in using
multiple regression:
The units (usually people) observed
should be a random sample from some
well defined population.
The dependent variable should be
measured on an interval, continuous
scale.
The independent variables should be measured on interval scales.
Considerations in using
multiple regression:
The distributions of all the variables should be normal.
The relationships between the dependent variable and the independent variables should be linear.
Although the independent variables can be correlated, there must be no perfect (or near-perfect) correlations among them, a situation called multicollinearity (see the variance-inflation sketch after this list).
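A common multicollinearity check is the variance inflation factor, VIF_j = 1/(1 - R2_j), where R2_j comes from regressing predictor j on the others; values much above 10 are a conventional warning sign. A from-scratch sketch on invented data (the threshold and helper name are assumptions, not from the slides):

import numpy as np

def vif(X):
    # VIF_j = 1 / (1 - R2_j), regressing predictor j on the rest.
    n, p = X.shape
    out = []
    for j in range(p):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        resid = X[:, j] - A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
        tss = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r2_j = 1 - (resid @ resid) / tss
        out.append(1 / (1 - r2_j))
    return np.array(out)

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=80)  # near-duplicate of column 0
print(vif(X))  # columns 0 and 2 show inflated values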
Considerations in using
multiple regression:
There must be no interactions (in the ANOVA sense) between independent variables.
A rule of thumb for testing b coefficients is to have N >= 104 + m, where m = number of independent variables; for example, with m = 3 predictors, at least 107 cases are needed.
Reporting regression results:
“The data were analyzed by multiple regression, using as regressors age, income and gender. The regression was a rather poor fit (R2adj = 40%), but the overall relationship was significant (F(3,12) = 4.32, p < 0.05). With other variables held constant, depression scores were negatively related to age and income, decreasing by 0.16 for every extra year of age, and by 0.09 for every extra pound per week income. Women tended to have higher scores than men, by 3.3 units. Only the effect of income was significant (t(12) = 3.18, p < 0.01).”
Partial Correlation
In its squared form, it is the percent of variance in the dependent variable uniquely attributable to the given independent variable when the other variables in the equation are controlled.
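A sketch of the residual method for partial correlation: regress both the criterion and the focal predictor on the remaining predictors, then correlate the two sets of residuals. Data are invented.

import numpy as np

def residualize(v, Z):
    # Residuals of v after regressing it on Z (with intercept).
    A = np.column_stack([np.ones(len(v)), Z])
    return v - A @ np.linalg.lstsq(A, v, rcond=None)[0]

rng = np.random.default_rng(4)
Z = rng.normal(size=(100, 2))            # the controlled variables
x = Z @ np.array([0.5, 0.2]) + rng.normal(size=100)
y = 0.7 * x + Z @ np.array([0.3, -0.4]) + rng.normal(size=100)

partial_r = np.corrcoef(residualize(x, Z), residualize(y, Z))[0, 1]
print(partial_r, partial_r ** 2)         # squared form: unique variance share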
Stepwise Regression
y = β0 + β1x1 + β2x2 + ... + β14x14 + ε
Choose a subset of the independent variables which "best" explains the dependent variable.
Stepwise Regression
1) Forward Selection
Start by choosing the independent
variable which explains the most variation
in the dependent variable.
Choose a second variable which explains
the most residual variation, and then
recalculate regression coefficients.
Continue until no variables "significantly" explain residual variation (see the sketch after this list).
Stepwise Regression
2) Backward Selection
Start with all the variables in the model, and
drop the least "significant", one at a time, until
you are left with only "significant" variables.
3) Mixture of the two
Perform a forward selection, but drop variables that are no longer "significant" after the introduction of new variables.
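A compact sketch of forward selection as described above: add whichever predictor most increases R2, and stop when the added variable's partial F test is non-significant. The 0.05 cutoff and helper names are illustrative assumptions.

import numpy as np
import scipy.stats as st

def fit_r2(X, y):
    A = np.column_stack([np.ones(len(y)), X])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def forward_select(X, y, alpha=0.05):
    n, p = X.shape
    chosen, r2_old = [], 0.0
    while len(chosen) < p:
        best, best_r2 = None, r2_old
        for j in set(range(p)) - set(chosen):
            r2 = fit_r2(X[:, chosen + [j]], y)
            if r2 > best_r2:
                best, best_r2 = j, r2
        if best is None:
            break
        # Partial F test for the added variable (1 numerator df).
        k = len(chosen) + 1
        f = (best_r2 - r2_old) / ((1 - best_r2) / (n - k - 1))
        if st.f.sf(f, 1, n - k - 1) > alpha:
            break
        chosen.append(best)
        r2_old = best_r2
    return chosen

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 5))
y = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(size=120)
print(forward_select(X, y))  # typically selects columns 0 and 3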
Hierarchical Regression
The researcher determines the order of
entry of the variables.
F-tests are used to compute the significance of each added variable (or set of variables) to the explanation reflected in R-square.
This is an alternative to comparing betas for purposes of assessing the importance of the independents.
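A short sketch of the R2-change F test applied when a block of predictors is entered; block membership and data are invented.

import numpy as np
import scipy.stats as st

def r2(X, y):
    A = np.column_stack([np.ones(len(y)), X])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def r2_change_test(X_reduced, X_full, y):
    # F for the increment in R2 when a block of predictors is added.
    n = len(y)
    k_full, k_red = X_full.shape[1], X_reduced.shape[1]
    q = k_full - k_red
    r2_f, r2_r = r2(X_full, y), r2(X_reduced, y)
    f = ((r2_f - r2_r) / q) / ((1 - r2_f) / (n - k_full - 1))
    return f, st.f.sf(f, q, n - k_full - 1)

rng = np.random.default_rng(6)
X = rng.normal(size=(90, 4))
y = X @ np.array([0.5, 0.0, 0.6, 0.0]) + rng.normal(size=90)
print(r2_change_test(X[:, :2], X, y))  # block 2 adds variables 3-4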
Categorical Regression
Used when there is a combination of
nominal, ordinal, and interval-level
independent variables.
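The slides name no procedure; one common way to mix a nominal predictor with interval ones is dummy (indicator) coding, sketched below on invented data. (SPSS's Categorical Regression, CATREG, instead quantifies categories via optimal scaling, which this sketch does not implement.)

import numpy as np

# Hypothetical data: a nominal predictor with 3 levels plus an interval one.
group = np.array(["a", "b", "c", "a", "b", "c", "a", "b"])
score = np.array([1.2, 0.4, 2.1, 0.9, 0.6, 2.4, 1.1, 0.3])
y = np.array([3.0, 1.5, 5.2, 2.8, 1.9, 5.6, 3.1, 1.4])

# Dummy-code the nominal variable, dropping one reference level ("a").
dummies = np.column_stack([(group == lv).astype(float) for lv in ["b", "c"]])

A = np.column_stack([np.ones(len(y)), dummies, score])
coefs = np.linalg.lstsq(A, y, rcond=None)[0]
print(coefs)  # intercept, b-vs-a effect, c-vs-a effect, slope of score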