Ch. 8 Measures of Association

Uploaded by

Etsub Samuel
Copyright
© © All Rights Reserved
CHAPTER 8: MEASURES OF ASSOCIATION

Session Learning Objectives:

By the end of this session students are expected to:

• Explain the purpose of correlation analysis
• Understand the use of scatter plots to study association
• Compute covariance and Karl Pearson’s coefficient of correlation
• Apply linear regression analysis to estimate the linear relationship between two
variables using the least squares method
• Identify the coefficient of regression
• Describe and compute the coefficient of determination

Coefficient of Correlation (r): A measure of the strength of the linear relationship between two
variables. It ranges between -1 and +1: -1 denotes a perfect negative linear relationship and +1
denotes a perfect positive linear relationship.

As r approaches -1 or +1, a strong correlation is implied; r = 0 indicates no linear
relationship; and r near 0 indicates a weak linear relationship. Correlation can be simple, partial,
or multiple; positive or negative; linear or non-linear. Correlation does not necessarily imply
causation. Correlation can be depicted clearly on a scatter diagram.

E.g. high school GPA vs. college GPA; price vs. demand (quantity sold); number of children vs.
annual demand; height vs. weight; hot weather vs. ice cream consumption.

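Covariance: indicates the direction of the linear relationship. Taking the point of means
$(\bar{x}, \bar{y})$ as the origin, a positive covariance means the points in quadrants I and III
dominate (positive linear relationship); a negative covariance means the points in quadrants II
and IV dominate (negative linear relationship); a covariance of 0 means the points are spread
evenly across all four quadrants (no linear relationship). The sign therefore shows which
quadrants hold the points with the greatest influence, and hence the direction of the relationship.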
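
The quadrant reading of covariance can be checked with a short Python sketch. It is only an illustration: the data are the four months from the ABU Furniture example later in this chapter, and the names `sample_covariance` and `quadrant` are hypothetical helpers, not part of the original notes.

```python
# Illustrative data (taken from the ABU Furniture example in this chapter).
x = [2, 1, 3, 4]     # advertising expense, $ million
y = [7, 3, 8, 10]    # sales revenue, $ million


def sample_covariance(xs, ys):
    """Sum of the deviation products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys)]
    return sum(products) / (n - 1)


def quadrant(xi, yi, x_bar, y_bar):
    """Quadrant of a point when the point of means is used as the origin."""
    dx, dy = xi - x_bar, yi - y_bar
    if dx == 0 or dy == 0:
        return "axis"          # such a point contributes a product of 0
    if dx > 0:
        return "I" if dy > 0 else "IV"
    return "II" if dy > 0 else "III"


x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
for xi, yi in zip(x, y):
    product = (xi - x_bar) * (yi - y_bar)
    print(f"({xi}, {yi}): quadrant {quadrant(xi, yi, x_bar, y_bar)}, product {product:+.2f}")

cov = sample_covariance(x, y)
print(f"Cov(x, y) = {cov:.3f}")
```

For these data every non-zero product comes from a point in quadrant I or III, which is why the covariance is positive.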

Methods of studying correlation: i) scatter diagram; ii) Karl Pearson’s method; iii) Spearman’s rank correlation method

Simple Linear Regression: The simplest type of regression analysis involving one independent
variable and one dependent variable, in which the relationship between the variables is
approximated by a straight line.

It is a statistical method used to estimate the unknown value of one variable from the known
value of another, given that the two are correlated.

Dependent variable (y): the variable being predicted. Independent variable (x): the predictor variable.

Simple Linear Regression Model: explains how y is related to x and an error term $\varepsilon$.

$y = \beta_0 + \beta_1 x + \varepsilon$, where $\beta_0$ and $\beta_1$ are the parameters of the model and $\varepsilon$ accounts for the
variability in y that cannot be attributed to the linear relationship between x and y.

In sample notation, $y = a + bx + e$, where e is the error. Setting the error to 0 gives the
regression equation of Y on X, $\hat{y} = a + bx$.

Estimated simple linear regression equation: $\hat{y} = a + bx$ is developed from sample data using the
least squares method
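
For readers who want to see where the least squares values of a and b come from, the following derivation is a standard sketch rather than part of the original notes. It assumes n paired observations $(x_i, y_i)$ and uses SSE as a label (an assumption of this sketch) for the sum of squared errors that the method minimizes.

```latex
\begin{aligned}
\mathrm{SSE}(a, b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial\,\mathrm{SSE}}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
  \quad &\Rightarrow \quad a = \bar{y} - b\,\bar{x} \\
\frac{\partial\,\mathrm{SSE}}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
  \quad &\Rightarrow \quad b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```

The slope formula follows by substituting $a = \bar{y} - b\bar{x}$ into the second condition and using the fact that deviations from a mean sum to zero.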

Recall that the coefficient of correlation cannot be interpreted precisely except at -1, 0, and +1;
other values can only be judged by their proximity to -1, 0, or +1. Fortunately, another measure
that can be interpreted precisely is the coefficient of determination, which is calculated by
squaring the coefficient of correlation. For this reason, it is denoted $r^2$. The coefficient of
determination measures the proportion of the variation in the dependent variable that is
explained (accounted for) by the variation in the independent variable. For instance, if the
coefficient of correlation is r = 0.8711, the coefficient of determination is
$r^2 = (0.8711)^2 = 0.7588$; this indicates that 75.88% of the variation in the dependent variable is
explained by the independent variable. The remaining 24.12% is unexplained. The value of $r^2$
ranges between 0 and 1 (inclusive). It cannot be negative, hence it does not show the direction
of the relationship. When $r^2 = 1$, all the points on the scatter diagram fall on the regression line
and all of the variation is explained by the straight line. On the other hand, when $r^2 = 0$, the
regression line explains none of the variation in the dependent variable, meaning that there is
no linear relationship between the two variables.
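
The link between "squaring r" and "proportion of variation explained" can be sketched in Python. This is an illustration under assumptions: the function name `explained_proportion` is hypothetical, and the data are the ABU Furniture figures from the example that follows. The sketch computes 1 - SSE/SST from a least squares line and compares it with the squared correlation coefficient.

```python
import math

x = [2, 1, 3, 4]     # independent variable (illustrative data)
y = [7, 3, 8, 10]    # dependent variable


def explained_proportion(xs, ys):
    """Return (1 - SSE/SST, r squared) for a least squares line fitted to the data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sst = sum((yi - y_bar) ** 2 for yi in ys)          # total variation in y
    b = sxy / sxx                                      # least squares slope
    a = y_bar - b * x_bar                              # least squares intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))  # unexplained variation
    r = sxy / math.sqrt(sxx * sst)
    return 1 - sse / sst, r ** 2


from_sums, from_r = explained_proportion(x, y)
print(round(from_sums, 4), round(from_r, 4))   # the two routes give the same value

# The arithmetic quoted in the text for r = 0.8711
print(round(0.8711 ** 2, 4))
```

Both routes give the same number, which is why the squared correlation coefficient can be read as the share of variation in y accounted for by x.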

Example
ABU Furniture is a family business that has been selling to retail customers in the Merkato area
for many years. The company advertises extensively on radio, TV, and the Internet, emphasizing
low prices and easy credit terms. The owner would like to review the relationship between sales
and the amount spent on advertising. Below is information on a sample of sales and advertising
expense for the last four months.

Month                                  July   August   September   October
Advertising Expense (x) ($ million)    2      1        3           4
Sales Revenue (y) ($ million)          7      3        8           10

The owner wants to forecast sales on the basis of advertising expense.


a) Which variable is the dependent variable? Which variable is the independent variable?

b) Draw a scatter diagram.

c) Determine the sample covariance and interpret


d) Determine the correlation coefficient and interpret the result
e) Determine the estimated regression equation.

f) Interpret the values of a and b.

g) Draw the estimated regression line on the scatter diagram

h) Estimate sales when $3 million is spent on advertising.

i) Identify the coefficient of regression ($b_{yx}$)

j) Compute the coefficient of determination ($r^2$) and interpret.

Answer

a) Independent variable: Advertisement expense; Dependent variable: Sales

b) Refer to g.

c) Covariance: $\mathrm{Cov}(x,y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} = \frac{11}{4 - 1} = \frac{11}{3} = 3.667$

Interpretation:
Cov(x,y)=3.667; there is a positive covariance between advertising expense and sales
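d) Correlation coefficient: $r = \frac{\mathrm{Cov}(x,y)}{S_x S_y} = \frac{3.667}{(1.2910)(2.9439)} = 0.9648$
or

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{11}{\sqrt{(5)(26)}} = 0.9648$

Note:

$S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{5}{3}} = 1.2910$, $S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} = \sqrt{\frac{26}{3}} = 2.9439$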

Interpretation:
r = 0.9648; there is a strong positive correlation between advertising expense and sales

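e) Estimated regression equation $\hat{y} = a + bx$: $\hat{y} = 1.5 + 2.2x$

Note:

$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{11}{5} = 2.2$, $a = \bar{y} - b\bar{x} = 7 - 2.2(2.5) = 1.5$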

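f) The slope is 2.2. This indicates that each additional $1 million spent on advertising is
associated with an estimated increase of $2.2 million in sales.
The intercept is 1.5. If there were no expenditure on advertising, estimated sales would be
$1.5 million. Note that x = 0 lies outside the observed range of 1 to 4, so this is an extrapolation.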
g) The straight-line graph of the estimated regression equation is drawn on the scatter diagram
from part b by plotting the fitted values below and joining them:
x            1     2     3     4
$\hat{y}$    3.7   5.9   8.1   10.3

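h) $\hat{y} = 1.5 + 2.2(3) = 8.1$, i.e. estimated sales of $8.1 million
i) The coefficient of regression ($b_{yx}$) is the slope: b = 2.2

$b = r \frac{S_y}{S_x}$, where $S_x$ and $S_y$ are the respective standard deviations and r is the correlation coefficient.

This is another formula that can be used to find b; using the values from part d, $b = 0.9648 \times \frac{2.9439}{1.2910} = 2.2$
j) Coefficient of determination: r was 0.9648, so $r^2 = (0.9648)^2 = 0.9308$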
Interpretation: Ninety-three percent of the variation in sales is accounted for by advertising
expense.
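
The answers to parts d, e, h, and i can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic, not a prescribed method; the function names `fit_line`, `sample_sd`, and `pearson_r` are hypothetical, and only the standard library is used.

```python
import math

x = [2, 1, 3, 4]     # advertising expense, $ million
y = [7, 3, 8, 10]    # sales revenue, $ million


def fit_line(xs, ys):
    """Least squares intercept a and slope b of y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    syy = sum((yi - y_bar) ** 2 for yi in ys)
    return sxy / math.sqrt(sxx * syy)


a, b = fit_line(x, y)                          # part e
r = pearson_r(x, y)                            # part d
forecast = a + b * 3                           # part h: advertising of $3 million
b_from_r = r * sample_sd(y) / sample_sd(x)     # part i: b = r * Sy / Sx

print(f"a = {a:.1f}, b = {b:.1f}, r = {r:.4f}")
print(f"forecast at x = 3: {forecast:.1f}; slope from r: {b_from_r:.1f}")
```

Computing the slope twice, once from the deviation sums and once from r, is a convenient check that the hand calculations in parts d and e agree.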
Exercise
1. The following data refer to two variables, promotional expenses and sales (in thousands of
dollars), collected in the context of a promotional study.

Promotional Expenses   10   12   15   23   20
Sales                  14   17   23   25   21

a. Which variable is the dependent variable? Which variable is the independent variable?
b. Draw the scatter diagram
c. Calculate the covariance and interpret.
d. Calculate the correlation coefficient and interpret.
e. Determine the estimated regression equation.
f. Interpret the values of a and b.
g. Draw the estimated regression line on the scatter diagram
h. Estimate sales when $30 thousand is spent on promotion.
i. Identify the coefficient of regression ($b_{yx}$)
j. Compute the coefficient of determination ($r^2$) and interpret.
Synopsis:

• The scatter diagram depicts relationships graphically; the covariance and the
coefficient of correlation describe the linear relationship numerically for interval or
ratio scale data.
• Simple correlation analysis is concerned with measuring the relationship
between only one independent variable and the dependent variable.
• There are assumptions concerning (population) simple correlation analysis
• Correlation does not necessarily mean causation
• The correlation coefficient lies between -1 and 1 inclusive
• r = 1 indicates perfect positive correlation, and r = -1 perfect negative correlation;
while r = 0 implies no linear correlation
• A positive sign indicates a direct relationship, while a negative sign implies an inverse
relationship
• Sample covariance: $\mathrm{Cov}(x,y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
• Pearson’s sample correlation coefficient: $r = \frac{\mathrm{Cov}(x,y)}{S_x S_y}$ or
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
• The purpose of the simple linear regression equation is to quantify a linear relationship
between two variables (that are interval or ratio scale).
• In regression analysis, we estimate one variable based on another variable; the
variable being estimated is the dependent variable, and the variable used to make the
estimate or predict the value is the independent variable.
• The least squares criterion is used to determine the regression equation:

$\hat{y} = a + bx$

where a is the y intercept and b is the slope or coefficient of regression.

a and b are determined using the appropriate formulas:
$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$

• The slope of the regression line is called the regression coefficient ($b_{yx}$)
• The coefficient of determination is a more readily interpretable measure of the degree of
relationship; it gives the percentage of variation in the dependent variable that is
accounted for by the independent variable.
• The square of the correlation coefficient (r) is the coefficient of determination $r^2$; its
value ranges between 0 and 1 inclusive.
Wrap up Discussion Questions:

• Identify the tools used to measure association.
• Does correlation imply a cause and effect relationship?
• What is the difference between covariance and correlation?
• Explain how the Pearsonian correlation coefficient is interpreted and what it measures.
• What is the purpose of regression analysis?
• What is the difference between the correlation coefficient and regression?
• What is the coefficient of regression?
• Explain the least squares principle.
• What is the coefficient of determination? How is it measured?

Topic: Linear Correlation: Testing the Significance of Correlation Coefficient

Session Learning Objectives:

By the end of this session students are expected to:

• Test the significance of the population correlation coefficient

Reading Assignment Discussion:

• Why is the test of significance of the correlation coefficient conducted for the population?

Reading Text:

Typically, the null hypothesis of interest is that the population correlation $\rho = 0$, for if this
hypothesis is rejected at a specified level, we would conclude that there is a relationship
between the two variables in the population. The hypothesis can also be formulated as a one-tail
test. Given that the assumptions in Session 56’s reading text are satisfied, the following sampling
statistic involving r is distributed as the t distribution with degrees of freedom df = n - 2 when
$\rho = 0$:

$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$

Example

For a sample of n = 10 loan recipients at a finance company, the correlation coefficient between
household income and amount of outstanding short-term debt is found to be r = +0.50.
a. Test the hypothesis that there is no correlation between these two variables for the entire
population of loan recipients, using the 5 percent level of significance.
b. Interpret the meaning of the correlation coefficient which was computed.
(a) $H_0: \rho = 0$, $H_1: \rho \neq 0$
Critical t (df = 10 - 2 = 8, $\alpha = 0.05$, two-tailed) = ±2.306

$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{0.50}{\sqrt{\frac{1 - 0.25}{8}}} = \frac{0.50}{0.3062} = 1.633$

Because the computed t statistic of +1.633 is not in a region of rejection, the null hypothesis
cannot be rejected, and we cannot conclude that there is a relationship between the two
variables in the population. The observed sample relationship can be ascribed to chance at the 5
percent level of significance.
(b) Based on the correlation coefficient of r = 0.50, we might be tempted to conclude that
because $r^2 = 0.25$, approximately 25 percent of the variance in short-term debt is explained
statistically by the amount of household income. This is true for the sample data. However,
because the null hypothesis in part (a) above was not rejected, a more appropriate interpretation
for the population is that none of the variance in Y can be assumed to be associated with
changes in X.
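
The test in part (a) can be sketched in Python as a check on the arithmetic. This is an illustration under stated assumptions: the function name `correlation_t_statistic` is hypothetical, and the critical value 2.306 is simply the two-tailed table value quoted in the example (df = 8, 5 percent level), not something the code computes.

```python
import math


def correlation_t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r, n = 0.50, 10          # sample correlation and sample size from the example
t_critical = 2.306       # two-tailed critical t for df = 8 at the 5 percent level (from the text)

t_stat = correlation_t_statistic(r, n)
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.3f}, df = {n - 2}, reject H0: {reject_h0}")
```

With r = 0.50 and n = 10 the statistic is about 1.633, which lies inside the interval from -2.306 to +2.306, so the null hypothesis of zero population correlation is not rejected.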

Exercise
1. A sample of 25 mayoral campaigns in medium-sized cities with populations between
50,000 and 250,000 showed that the correlation between the percent of the vote received
and the amount spent on the campaign by the candidate was 0.43. At the 0.05
significance level, is there a positive association between the variables?
2. Ethiopian Petroleum Corporation is studying the relationship between the pump price of
gasoline and the number of gallons sold. For a sample of 20 stations last Tuesday, the
correlation was 0.78. At the 0.01 significance level, is the correlation in the population
greater than zero?

Synopsis:

• A hypothesis test of the coefficient of correlation can be made in order to generalize
from the sample to the population.
• To test the hypothesis that a population correlation is different from 0, we use the following
statistic:

$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$

with n − 2 degrees of freedom

Wrap up Discussion Questions:

• How is the test of significance of correlation conducted for the population?
