CHAPTER 8
Draw a scatter plot (scatter diagram) to see the relationship between two variables.
Understand and interpret the terms dependent variable and independent variable.
Find a linear regression model and use it to make predictions.
Study the strength of the relationship, known as correlation analysis.
In a simple linear relationship, only TWO variables are involved:
X = independent variable
Y = dependent variable
Examples:
1. A sociologist wants to find out whether an increase in the crime rate is due to an increase in the cost of living.
   X = cost of living
   Y = crime rate
2. A fitness instructor wants to find out the relationship between weight loss and the amount of workout time.
   X = amount of workout time
   Y = weight loss
A scatter plot is a plot of the paired (x, y) values, used to examine the relationship between two variables, X and Y.
It gives a general idea of whether X is related to Y.
Points that follow a clear pattern suggest a relationship between X and Y. Points that follow no particular pattern suggest no relationship between X and Y.
Increasing pattern. As X increases, Y also increases.
Positive linear relationship between X and Y.
Decreasing pattern. As X increases, Y decreases.
Negative linear relationship between X and Y.
No particular pattern.
No relationship between X and Y.
Question: You are a marketing analyst for Hasbro Toys. You gather the following data:

Ad (RM)        1  2  3  4  5
Sales (Units)  1  1  2  2  4
Sketch a scatter plot of the data above.
Answer:
[Scatter plot: Sales, Y (vertical axis, 0 to 4) against Advertising, X (horizontal axis, 0 to 5)]
1. Are X and Y related?
2. Is the relationship positive or negative?
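The advertising data can also be laid out as a rough text grid, which is enough to see the direction of the pattern. The following is a minimal Python sketch, not part of the original slides; the function name `text_scatter` is hypothetical and the approach assumes small non-negative integer data.

```python
def text_scatter(xs, ys):
    """Return a rough text scatter plot (one string per row) for small integer data."""
    points = set(zip(xs, ys))
    x_axis = range(0, max(xs) + 1)
    rows = []
    for y in range(max(ys), -1, -1):  # the largest Y value is drawn first
        marks = ["*" if (x, y) in points else "." for x in x_axis]
        rows.append(f"{y} " + " ".join(marks))
    rows.append("  " + " ".join(str(x) for x in x_axis))  # X-axis labels
    return rows


ad = [1, 2, 3, 4, 5]      # Advertising, X (RM)
sales = [1, 1, 2, 2, 4]   # Sales, Y (units)
print("\n".join(text_scatter(ad, sales)))
```

The stars rise from the lower left to the upper right of the grid, which is the increasing pattern described earlier.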
A regression line is a mathematical equation that describes the linear relationship between X and Y. It can be used to predict the values of Y from known values of X. It represents a straight line, so it has the form y = mx + c, where m is the slope and c is the y-intercept.
In statistical regression, we write the linear model as
$Y = \beta_0 + \beta_1 X + \varepsilon$
where
$\beta_0$ = y-intercept
$\beta_1$ = slope
$\varepsilon$ = random error component
This regression line is usually estimated by using the paired sample data. The estimated regression line is given by
$Y' = a + bX$
where
a = estimated y-intercept
b = estimated slope
The method used to find the values of a and b is slightly different from the familiar method you learned in algebra.
It uses the least-squares method.
Formula to estimate a and b:
$b = \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$
$a = \dfrac{\sum Y}{n} - b\,\dfrac{\sum X}{n}$
Now we can fit the regression line to the data using the values of a and b. The estimated regression line is
$Y' = a + bX$
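The two formulas translate almost directly into code. The following is a minimal Python sketch, assuming the data are held in two plain lists of equal length; the function name `fit_line` is illustrative and not part of the original slides. It is applied here to the advertising and sales data from the scatter plot question.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y' = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = [n*SumXY - SumX*SumY] / [n*SumX^2 - (SumX)^2]
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(Y) - b * mean(X)
    a = sum_y / n - b * sum_x / n
    return a, b


ad = [1, 2, 3, 4, 5]      # Advertising, X (RM)
sales = [1, 1, 2, 2, 4]   # Sales, Y (units)
a, b = fit_line(ad, sales)
print(f"Y' = {a:.2f} + {b:.2f}X")
```

The denominator is zero when every X value is the same, so the sketch assumes at least two distinct X values.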
Question: You are an economist for the county cooperative. You gather the following data.

Fertilizer (lb.)  4    6    10   12
Yield (lb.)       3.0  5.5  6.5  9.0
Find the estimated regression line relating crop yield and fertilizer.
Answer: Construct this table first.

          X      Y     X²     XY
          4    3.0     16     12
          6    5.5     36     33
         10    6.5    100     65
         12    9.0    144    108
Total:   32   24.0    296    218
Mean:     8      6
Answer: Using values from the table, estimate a and b.
$b = \dfrac{4(218) - (32)(24)}{4(296) - (32)^2} = 0.65$
$a = 6 - 0.65(8) = 0.8$
Therefore, the estimated regression line is
$Y' = 0.8 + 0.65X$
Answer:
[Plot: Yield (Y) against Fertilizer (X), showing the data points and the fitted line y = 0.8 + 0.65x]
Answer: What do a and b in the regression line mean?
1. Y-intercept, a = 0.8: the average crop yield (Y) is expected to be 0.8 lb. when no fertilizer is used (X = 0, Y' = 0.8).
2. Slope, b = 0.65: the crop yield (Y) is expected to increase by 0.65 lb. for each 1 lb. increase in fertilizer (X).
Question: A student wants to know the relationship between number of pages and the price of the book. To analyze this, he selects a sample of 8 textbooks currently on sale in a bookstore. Develop a regression line to fit the data given.
Question:
Book         No. of Pages (X)   Price (Y)
History            500             84
Algebra            700             75
Geometry           800             99
Physics            600             72
Sociology          400             69
Biology            500             81
Statistics         600             63
Nursing            800             93
Answer: Construct this table first.
            X       Y          X²        XY
          500      84     250,000    42,000
          700      75     490,000    52,500
          800      99     640,000    79,200
          600      72     360,000    43,200
          400      69     160,000    27,600
          500      81     250,000    40,500
          600      63     360,000    37,800
          800      93     640,000    74,400
Total:   4900     636   3,150,000   397,200
Mean:   612.5    79.5
Answer: Using values from the table, estimate a and b.
$b = \dfrac{8(397200) - (4900)(636)}{8(3150000) - (4900)^2} \approx 0.0514$
$a = 79.5 - 0.0514(612.5) \approx 48$
Therefore, the estimated regression line is
$Y' = 48 + 0.0514X$
Now that we have estimated the regression line, we can predict Y for a given value of X by substituting X into the estimated regression line, $Y' = a + bX$. However, the value of X substituted into the equation must be within the range of X in the data set.
For Example 3, predict the price of the book that has 550 pages.
$Y' = 48 + 0.0514(550) = 76.27$
Thus, if the book is 550 pages long, the price is estimated to be RM76.27.
REMEMBER! To predict Y, the value of X must lie within the range of X in the data set.
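The range rule can be enforced in code so that a prediction outside the data is rejected rather than silently returned. The following is a minimal Python sketch, not part of the original slides; the function name `predict` and the choice to raise `ValueError` are assumptions made for illustration. It uses the estimated line for the textbook data, with a = 48 and b = 0.0514.

```python
def predict(a, b, x, x_min, x_max):
    """Predict Y' = a + b*x, refusing values of x outside the observed range."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"x={x} is outside the data range [{x_min}, {x_max}]")
    return a + b * x


pages = [500, 700, 800, 600, 400, 500, 600, 800]  # No. of Pages (X)
price = predict(48, 0.0514, 550, min(pages), max(pages))
print(round(price, 2))  # 76.27
```

A call such as `predict(48, 0.0514, 1000, min(pages), max(pages))` raises an error, because no book in the sample has more than 800 pages.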
Correlation measures the strength of a linear relationship between two variables: is the relationship strong or weak?
The correlation coefficient tells us about both the strength and the direction of the relationship.
A numerical measure for correlation of the quantitative data is the Pearson correlation coefficient, r. The formula is given by
$r = \dfrac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$
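The formula for r uses the same sums as the slope formula, plus the sum of the squared Y values. The following is a minimal Python sketch, assuming two plain lists of equal length; the function name `pearson_r` is illustrative and not part of the original slides. It is applied here to the fertilizer and yield data from the earlier regression question.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r computed from the raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


fertilizer = [4, 6, 10, 12]        # X (lb.)
crop_yield = [3.0, 5.5, 6.5, 9.0]  # Y (lb.)
r = pearson_r(fertilizer, crop_yield)
print(round(r, 4))
```

The result is roughly 0.96, so the fertilizer data show a strong positive linear relationship. The sketch assumes that neither variable is constant, since that would make the denominator zero.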
$-1 \le r \le 1$
Values of r close to 1 indicate a strong positive linear relationship between X and Y.
Values of r close to -1 indicate a strong negative linear relationship between X and Y.
Values of r close to 0 indicate little or no linear relationship between X and Y.
Question: A food analyst wants to know how much a person would spend on food, given a certain amount of income. He selects a random sample of 7 people and records their income and food expenditure, as shown below.

Income (RM '00)         35  49  21  39  15  28  25
Food Expend. (RM '00)    9  15   7  11   5   8   9
Question: (i) Find the estimated regression line for the data.
(ii) How much would a person spend on food if his income is RM 3000?
(iii) Compute Pearson correlation coefficient, r. Interpret the r value.
Answer: Construct this table first.
         Income, X   Food Exp, Y     X²     Y²     XY
             35            9        1225     81    315
             49           15        2401    225    735
             21            7         441     49    147
             39           11        1521    121    429
             15            5         225     25     75
             28            8         784     64    224
             25            9         625     81    225
Total:      212           64        7222    646   2150
Mean:   30.2857       9.1429
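The column totals in a working table like this one are easy to get wrong by hand, so it can help to check them in code. The following is a minimal Python sketch, not part of the original slides; the function name `working_table` is hypothetical.

```python
def working_table(xs, ys):
    """Build the rows (X, Y, X^2, Y^2, XY) and the total of each column."""
    rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals


income = [35, 49, 21, 39, 15, 28, 25]  # X (RM '00)
food = [9, 15, 7, 11, 5, 8, 9]         # Y (RM '00)
rows, totals = working_table(income, food)
print(totals)                        # (212, 64, 7222, 646, 2150)
n = len(income)
print(round(totals[0] / n, 4))       # mean of X: 30.2857
print(round(totals[1] / n, 4))       # mean of Y: 9.1429
```

The five totals are exactly the sums needed by the formulas for b, a and r.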
Answer:
$b = \dfrac{7(2150) - (212)(64)}{7(7222) - (212)^2} = 0.2642$
$a = 9.1429 - 0.2642(30.2857) = 1.1414$
(i) Therefore, the estimated regression line is
$Y' = 1.1414 + 0.2642X$
The slope, b = 0.2642, is positive, which means the relationship is positive. That is, people with higher income tend to spend more on food.
Answer: (ii) If income is RM3000, that is X=30, then food expenditure is
$Y' = 1.1414 + 0.2642(30) = 9.0674$
So we expect him to spend RM906.74 on food if his income is RM3000.
Answer: (iii) Pearson correlation coefficient, r
$r = \dfrac{7(2150) - (212)(64)}{\sqrt{[7(7222) - (212)^2][7(646) - (64)^2]}} = 0.9587$
The value r = 0.9587 shows a very strong positive linear relationship between income and food expenditure. As income increases, food expenditure also tends to increase.