Essentials of Research: Day 1
Dr Shweta Pandey
Quantitative estimate of linear correlation
Karl Pearson's correlation coefficient (r): requires interval or ratio scale data, a linear relationship, and normally distributed variables
Significance of correlation coefficient
We test the null hypothesis of no correlation in the population against the
alternative hypothesis that there is a significant correlation, using

t = r_xy * sqrt(n - 2) / sqrt(1 - r_xy^2)

where r_xy is the sample correlation coefficient between X and Y,
and the t statistic has n - 2 degrees of freedom.
If the computed absolute value of t exceeds the tabulated value of t with n - 2
degrees of freedom, the null hypothesis is rejected.
In SPSS, if the p-value < 0.05, the null hypothesis is rejected at the
5% level of significance.
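As a cross-check on the SPSS output, the correlation coefficient and its t statistic can be computed directly. This is a minimal pure-Python sketch; the function names and the sample data are ours, not from the slides.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical data with a strong linear relationship
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
# Compare |t| with the tabulated t for n - 2 = 4 df (2.776 at the 5% level);
# |t| above it rejects the null hypothesis of no correlation.
```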
What is regression?
[Scatter plot: dependent variable (y) on the vertical axis, independent variable (x) on the horizontal axis]
• It depicts the variation in a dependent variable using one or more independent
variables.
• It offers an explanation of causation.
• If the independent variables sufficiently explain the variation in the dependent
variable, the model can be used for prediction.
• Dependent variable: ratio/interval (continuous)
• Independent variables: continuous or categorical
Simple Linear Regression
Depicted as y = b0 + b1x + e, where:
• y is the dependent/effect variable
• x is the independent/regressor/causal variable
• b0 is the y-intercept
• b1 is the slope (∆y/∆x), i.e. the estimated change in the average value of y for a 1-unit change in x
• e is the error term
b0 and b1 are parameters that must be estimated so that the equation best
represents the given data. Simple linear regression fits a straight line to
the data.
[Scatter plot with fitted line: dependent variable (y) vs independent variable (x)]
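The least-squares estimates of b0 and b1 have a closed form, which can be sketched in a few lines of Python. The function name and the data below are hypothetical illustrations, not part of the slides.

```python
def fit_simple_ols(x, y):
    """Estimate b0 (intercept) and b1 (slope) for y = b0 + b1*x + e
    by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
         sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx          # the fitted line passes through (mean x, mean y)
    return b0, b1

# Hypothetical data: y is roughly 3 + 2x
x = [0, 1, 2, 3, 4]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_ols(x, y)
# b1 estimates the change in the average value of y per 1-unit change in x
```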
Multiple Regression
The percentage of variation in the dependent variable explained by the
independent variables is known as the coefficient of determination, often
referred to as R².
H0: There is no relationship between the dependent and independent variables.
H1: There is a significant relationship between the dependent and independent
variables.
If the probability of the F statistic for the overall regression relationship
(e.g. < 0.001) is less than or equal to the level of significance of 0.05, we
reject the null hypothesis (R² = 0).
Coefficient of Determination
The value of R² can range between 0 and 1; the higher its value, the more
accurate the regression model is.
It is independent of units of measurement and can therefore be used for
comparing the goodness of fit of two regression equations.
Rule of thumb for correlation strength:
up to 0.20 is characterized as very weak;
above 0.20 up to 0.40 is weak;
above 0.40 up to 0.60 is moderate;
above 0.60 up to 0.80 is strong;
above 0.80 is very strong.
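The rule of thumb above is easy to encode; this small helper (name ours) is used later to characterize the stepwise output's Multiple R of 0.532 as moderate.

```python
def strength(r):
    """Characterize a correlation's strength using the slides' rule of thumb."""
    a = abs(r)
    if a <= 0.20:
        return "very weak"
    if a <= 0.40:
        return "weak"
    if a <= 0.60:
        return "moderate"
    if a <= 0.80:
        return "strong"
    return "very strong"
```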
Types of Multiple Regression
1. Standard multiple regression is used to evaluate the relationships between a
set of independent variables and a dependent variable.
2. Hierarchical, or sequential, regression is used to examine the relationships
between a set of independent variables and a dependent variable, after
controlling for the effects of some other independent variables on the dependent
variable.
3. Stepwise, or statistical, regression is used to identify the subset of
independent variables that has the strongest relationship to a dependent
variable.
Standard Multiple Regression: Session 1.3 Activity 1 data
To compute a multiple
regression in SPSS,
select the Regression |
Linear command from
the Analyze menu.
Variables
First, move the dependent variable Demand to the Dependent text box.
Second, move the independent variables price and income to the Independent(s) list box.
Third, select the method for entering the variables into the analysis from the drop-down Method menu. In this example, we accept the default of Enter for direct entry of all variables, which produces a standard multiple regression.
Fourth, click on the Statistics… button to specify the statistics options that we want.
Standard Multiple Regression: Statistics
First, mark the checkbox for Estimates on the Regression Coefficients panel.
Second, mark the checkboxes for Model Fit and Descriptives.
Third, mark Collinearity statistics to check for multicollinearity issues.
Fourth, click on the Continue button to close the dialog box. Then click OK.
Standard Multiple Regression
1. All the independent variables are entered at the same time
2. R² measures the strength of the relationship between the set of independent
variables and the dependent variable.
3. An F test is used to determine if the relationship can be generalized to the
population represented by the sample.
4. A t-test is used to evaluate the individual relationship between each
independent variable and the dependent variable.
Dummy Variables:
• The dependent variable may be influenced by qualitative variables: gender,
marital status, profession, geographical region, religion, etc.
• To quantify qualitative variables, dummy variables are used.
• The number of dummy variables in a regression model equals the number of
categories of data less one.
• A dummy variable may take any two distinct values, such as 0 and 1 (or 10
and 11), though 0/1 coding is conventional.
• Dummy variables can also be used to examine a moderator effect
between two variables.
Dummy Variables: Categorical variables
Suppose the dependent variable is impacted by categorical/nominal variables,
for example: cosmetics ordered = f(Price, Gender).
Here we use dummy variables.
Number of dummy variables = number of groups - 1 = 2 - 1 = 1
Define Gender:
Value = 0 if female
      = 1 if male
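The groups-minus-one coding rule can be sketched as a small helper. The function name and sample data are ours; SPSS does this automatically when you define a categorical predictor.

```python
def dummy_code(values, categories=None):
    """0/1 dummy variables: one column per category except the first
    (number of dummies = number of groups - 1)."""
    if categories is None:
        categories = sorted(set(values))
    rest = categories[1:]          # the first category is the reference group
    return {c: [1 if v == c else 0 for v in values] for c in rest}

gender = ["female", "male", "male", "female", "male"]
dummies = dummy_code(gender)       # {'male': [0, 1, 1, 0, 1]}
```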
Session 1.3 Activity 2 data
Dummy Variables
Salary, Y = f(Experience, Gender)
Gender (D) = 1 if male
           = 0 if female
Dummy Variables
Y = 17.231 + 1.545*Experience + 3.286*Gender
All p-values < 0.05, so the coefficients are significant.
Interpretation (Gender = 0 for females):
For females: Salary = 17.231 + 1.545*Experience
For males: Salary = 17.231 + 1.545*Experience + 3.286
The average salary of males exceeds the average salary of females when
experience is held constant.
Session 1.3 Activity 2
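Plugging values into the fitted equation from the slide makes the interpretation concrete: at any fixed level of experience, the male-female difference equals the dummy's coefficient. The function name is ours; the coefficients are the slide's.

```python
def predicted_salary(experience, male):
    """Fitted equation from the slide: Y = 17.231 + 1.545*Experience + 3.286*Gender
    (Gender dummy: 1 = male, 0 = female)."""
    return 17.231 + 1.545 * experience + 3.286 * male

# At the same experience, the gap is exactly the dummy's coefficient, 3.286
gap = predicted_salary(5, 1) - predicted_salary(5, 0)
```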
Multi-Collinearity
A VIF (Variance Inflation Factor) below 5 indicates no multicollinearity issue.
Session 1.3 Activity 2 dummy variables data
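With only two predictors, the VIF reduces to 1 / (1 - r²), where r is the correlation between the predictors. A minimal sketch (function name and data are hypothetical; SPSS reports VIF directly under Collinearity statistics):

```python
import math

def vif_two_predictors(x1, x2):
    """VIF for a model with two predictors: 1 / (1 - r^2),
    where r is the correlation between the predictors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r = sxy / math.sqrt(sxx * syy)
    return 1 / (1 - r ** 2)

experience = [1, 2, 3, 4, 5, 6]
gender     = [0, 1, 0, 1, 0, 1]   # hypothetical, nearly uncorrelated predictors
v = vif_two_predictors(experience, gender)
# v < 5 would indicate no multicollinearity issue
```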
Moderating effects:
Suppose we want to see the moderating impact of years of experience on the
starting salary of males vs females:
Salary = f(X, DX)
X = Experience
D = dummy variable for gender
Value = 0 if female
      = 1 if male
Moderating effects: Session 1.3 Activity 2 Data
Compute variable DX in SPSS
Moderating effects: Session 1.3 Activity 2
Suppose we want to see moderating impact of years of experience on the starting salary of males vs
females
Moderating effects: Session 1.3 Activity 2
Y (Salary) = 18.964 + 1.225X + 0.639DX
D = 0 for females: Female salary = 18.964 + 1.225X
D = 1 for males: Male salary = 18.964 + 1.225X + 0.639X
Thus male salary is Peso 639 higher per month for every year of experience as
compared to female lecturers.
Moderator Variable
Consider Y = a + b1x + b2z + b3xz
• If b3 is insignificant and b2 is significant, then z is not a moderator
variable but simply an independent predictor variable.
• If b2 is insignificant and b3 is significant, then z is a PURE moderator
variable.
• If both b2 and b3 are significant, then z is a QUASI moderator variable.
• In our example, the coefficients of both Experience and Gender*Experience
(DX) are significant, so Gender is a quasi moderator.
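The interaction term's effect is easiest to see as a difference in slopes. Using the slide's fitted moderation model (function name is ours), the female slope is b1 and the male slope is b1 plus the interaction coefficient:

```python
def salary(X, D):
    """Slide's fitted moderation model: Y = 18.964 + 1.225*X + 0.639*D*X
    (X = years of experience; D = 1 for male, 0 for female)."""
    return 18.964 + 1.225 * X + 0.639 * D * X

# Slope = change in predicted salary per extra year of experience
female_slope = salary(2, 0) - salary(1, 0)   # 1.225
male_slope   = salary(2, 1) - salary(1, 1)   # 1.225 + 0.639
```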
Standard multiple regression
1. Dependent variable metric?/ Independent variables metric or dichotomous?
2. Ratio of cases to independent variables at least 5 to 1?
3. Probability of ANOVA test of regression less than/equal to level of significance?
4. Probability of relationship between each IV and DV <= level of significance?
5. Coefficient of determination value
HIERARCHICAL MULTIPLE REGRESSION
1. In hierarchical multiple regression, the independent variables are entered in
two stages.
2. In the first stage, the independent variables that we want to control for are
entered. In the second stage, the independent variables whose relationship
we want to examine after the controls are entered.
3. A statistical test of the change in R² from the first stage is used to evaluate
the importance of the variables entered in the second stage.
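The test of the change in R² is an F statistic comparing the model with and without the second-stage variables. A sketch of the standard formula, with hypothetical R² values (SPSS reports this as "Sig. F Change"):

```python
def r2_change_F(r2_reduced, r2_full, n, k_full, m_added):
    """F statistic for the change in R² when m_added variables are entered
    after the controls (n cases, k_full predictors in the full model).
    Degrees of freedom: m_added and n - k_full - 1."""
    num = (r2_full - r2_reduced) / m_added
    den = (1 - r2_full) / (n - k_full - 1)
    return num / den

# Hypothetical: controls (age, sex) give R² = .10; adding health and life
# satisfaction raises it to .35, with n = 100 cases and 4 predictors in all.
F = r2_change_F(0.10, 0.35, n=100, k_full=4, m_added=2)
```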
Hierarchical multiple regression
Happiness = f(Age, Sex, Health, Life)
Control variables: Age, Sex
To compute a multiple
regression in SPSS,
select the Regression |
Linear command from
the Analyze menu.
Specify Control variables
First, move the dependent variable to the Dependent text box.
Second, move the independent variables to control for, age and sex, to the Independent(s) list box.
Third, select the method for entering the variables into the analysis from the drop-down Method menu. In this example, we accept the default of Enter for direct entry of all variables in the first block, which will force the controls into the regression.
Fourth, click on the Next button to tell SPSS to add another block of variables to the regression analysis.
Add the other independent variables
SPSS identifies that we will now be adding variables to a second block. Move the other independent variables, Health and Life, to the Independent(s) list box for block 2. Click on the Statistics… button to specify the statistics options that we want.
Specify the statistics output options
Mark the checkboxes for Model Fit, Descriptives, and R squared change.
The R squared change statistic will tell
us if the variables added after the
controls have a relationship to the
dependent variable.
Hierarchical multiple regression
1. Dependent variable metric?/ Independent variables metric or
dichotomous?
2. Ratio of cases to independent variables at least 5 to 1?
3. Probability of F test for change in R² less than or equal to level of
significance?
4. Change in R² correctly reported?
5. Probability of relationship between each IV added after controls and DV
less than or equal to level of significance?
6. Direction of relationship between each IV added after controls and DV
interpreted correctly?
STEPWISE MULTIPLE REGRESSION
1. Find the most parsimonious set of predictors that are most effective in
predicting the dependent variable.
2. Variables are added to the regression equation one at a time, using the
statistical criterion of maximizing the R² of the included variables.
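The selection loop can be sketched as greedy forward selection: at each step, add the candidate that most increases R², and stop when the improvement is negligible. This is a simplified stand-in for SPSS's probability-of-F-to-enter/remove criteria; all names and data below are hypothetical.

```python
def fit_r2(cols, y):
    """R² of the OLS regression of y on the given predictor columns (plus an
    intercept), via the normal equations and Gaussian elimination."""
    n, p = len(y), len(cols) + 1
    rows = [[1.0] + [c[i] for c in cols] for i in range(n)]
    A = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    v = [sum(rows[i][a] * y[i] for i in range(n)) for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], v[c], v[piv] = A[piv], A[c], v[piv], v[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [A[r][j] - f * A[c][j] for j in range(p)]
            v[r] -= f * v[c]
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (v[r] - sum(A[r][j] * b[j] for j in range(r + 1, p))) / A[r][r]
    yhat = [sum(b[j] * rows[i][j] for j in range(p)) for i in range(n)]
    my = sum(y) / n
    return 1 - (sum((y[i] - yhat[i]) ** 2 for i in range(n))
                / sum((yi - my) ** 2 for yi in y))

def forward_select(candidates, y, min_gain=0.01):
    """Greedy forward selection: repeatedly add the candidate variable that
    most increases R²; stop when the best gain falls below min_gain."""
    chosen, remaining, best_r2 = [], dict(candidates), 0.0
    while remaining:
        name, gain = max(
            ((nm, fit_r2([remaining[nm]] + [candidates[c] for c in chosen], y)
              - best_r2) for nm in remaining),
            key=lambda t: t[1])
        if gain < min_gain:
            break
        chosen.append(name)
        best_r2 += gain
        del remaining[name]
    return chosen, best_r2

# Hypothetical data: y depends strongly on x1, weakly on x2, not at all on x3
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [5, 3, 4, 6, 2, 5, 3, 4]
y = [3 * a + b for a, b in zip(x1, x2)]
selected, r2_final = forward_select({"x1": x1, "x2": x2, "x3": x3}, y)
```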
Request a stepwise multiple regression
To compute a multiple
regression in SPSS,
select the Regression |
Linear command from
the Analyze menu.
Specify variables and method for selecting variables
First, move the dependent variable income98 to the Dependent text box.
Second, move the independent variables hrs1, prestg80, educ, and degree to the Independent(s) list box.
Third, select the Stepwise method for entering the variables into the analysis from the drop-down Method menu.
Open statistics options dialog box
Click on the Statistics…
button to specify the statistics
options that we want.
Specify the statistics output options
First, mark the checkbox for Estimates on the Regression Coefficients panel.
Second, mark the checkboxes for Model Fit and Descriptives.
Third, click on the Continue button to close the dialog box.
Request the regression output
Click on the OK
button to request
the regression
output.
Relationship between dependent and independent variables

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .492a   .242       .237                3.607
2       .532b   .283       .273                3.522
a. Predictors: (Constant), RS HIGHEST DEGREE
b. Predictors: (Constant), RS HIGHEST DEGREE, RS OCCUPATIONAL PRESTIGE SCORE (1980)

The Multiple R for the relationship between the subset of independent
variables that best predicts the dependent variable is 0.532, which would be
characterized as moderate.
Relationship between dependent and independent variables

The most important predictor of total family income is highest academic
degree. The second most important predictor of total family income is
occupational prestige score.

Variables Entered/Removed(a)
Model 1: Entered RS HIGHEST DEGREE. Method: Stepwise (Criteria:
Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
Model 2: Entered RS OCCUPATIONAL PRESTIGE SCORE (1980). Method: Stepwise
(Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: TOTAL FAMILY INCOME
Stepwise multiple regression
1. Dependent variable metric?/ Independent variables metric or dichotomous?
2. Ratio of cases to independent variables at least 5 to 1?
3. Probability of ANOVA test of regression less than/equal to level of significance?
4. Strength of relationship for included variables interpreted correctly?
5. Is the stated order of importance of the independent variables correct?
6. Probability of F test for change in R² less than or equal to level of significance?
7. Change in R² correctly reported?
8. Probability of relationship between each IV added after controls and DV less than or
equal to level of significance?
9. Direction of relationship between each IV added after controls and DV interpreted
correctly?