Unit 2
1. We want to calculate the correlation r between two variables X and Y for a sample of
individuals. Which of the following conditions are necessary for r to be a meaningful
measure of association?
(I) X and Y are both quantitative variables.
(II) The relationship between X and Y is linear.
(III) X is an explanatory variable and Y is a response variable.
(A) I only
(B) II only
(C) I and II only
(D) I and III only
(E) I, II and III
2. Determine whether the correlation for each of the following pairs of variables is most
likely positive or negative:
(I) X = Speed of wind in a snowstorm
Y = Visibility
(II) X = Global supply of oil
Y = Price of gasoline
(III) X = Number of people in line at a bank when you arrive
Y = Time until you are served by a teller
(A) (I) negative, (II) positive, (III) positive
(B) (I) positive, (II) positive, (III) positive
(C) (I) negative, (II) negative, (III) positive
(D) (I) positive, (II) negative, (III) negative
(E) (I) negative, (II) negative, (III) negative
3. Which of the following pairs of variables would be the most likely to have a correlation
close to r = 0.5?
(A) Select a sample of commercial airline flights leaving from the airport one day:
X = flight distance in kilometres; Y = flight distance in miles
(B) Select a sample of grocery stores:
X = price of orange juice; Y = amount of orange juice sold
(C) Select a sample of STAT 1000 students:
X = number of incorrect answers on the midterm test; Y = score on the test
(D) Select a sample of adults in Winnipeg:
X = IQ; Y = weight
(E) Select a sample of male students at the University of Manitoba:
X = height; Y = shoe size
4. We take a sample of cars driving down a highway. Which of the following pairs of
variables will likely have a positive correlation?
(A) X = age of the car, Y = value of the car
(B) X = amount of gas in car’s tank, Y = cost to fill the tank with gas
(C) X = age of the driver, Y = speed of the car
(D) X = year the car was made, Y = total distance the car has been driven
(E) none of the above
5. A clothing store is having a sale. All items are 30% off (i.e., the sale price of any item
is only 70% of the retail price). What is the correlation between the retail price and the
sale price for items in the clothing store?
(A) 1 (B) −0.30 (C) 0.70 (D) −1 (E) 0.30
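The effect of a fixed percentage discount on correlation can be checked numerically. The sketch below is only an illustration: the retail prices are made up and `pearson_r` is a hand-written helper, neither of which appears in the question. Because every sale price is the same positive multiple of its retail price, the points lie exactly on a line with positive slope.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical retail prices in dollars (not taken from the question).
retail = [19.99, 35.00, 49.50, 80.00, 120.00]
# Every sale price is 70% of the corresponding retail price.
sale = [0.70 * p for p in retail]

r = pearson_r(retail, sale)
print(round(r, 6))
```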
6. A national consumer magazine obtained data for several variables measured on a random
sample of cars. The magazine reported the following correlations:
The correlation between car weight and car reliability is −0.30.
The correlation between car weight and annual maintenance cost is 0.20.
Which of the following statements is/are true?
(I) Lighter weight cars tend to be more reliable.
(II) Heavier cars tend to cost more to maintain.
(III) Car weight is related more strongly to maintenance cost than to reliability.
(A) I only
(B) II only
(C) I and II only
(D) II and III only
(E) I, II and III
7. A course coordinator is interested in determining whether there is a relationship between
a student’s section (A01, A02, A03, etc.) and his or her final exam grade for an
introductory course. A random sample of 500 students was selected and the coordinator
calculated a correlation of r = −0.8 between section and final exam grade. You conclude
that:
(A) students in lower numbered sections (A01, A02, etc.) did better on the exam than
students in higher numbered sections (A08, A09, etc.).
(B) students in lower numbered sections (A01, A02, etc.) did worse on the exam than
students in higher numbered sections (A08, A09, etc.).
(C) there is a weak negative linear association between a student’s section and his or
her final exam grade.
(D) a correlation is inappropriate here because the relationship between section number
and final exam grade is likely non-linear.
(E) the coordinator has made an error.
8. Recording January 1st as Day 1 and December 31st as Day 365, the manager of a 7-11
store records the daily Slurpee sales (in $) over the course of a year. The manager
calculates the correlation between the day number X and sales Y to be close to zero,
which surprises her. What is the most likely explanation for this?
(A) The daily sales didn’t change much over the course of the year.
(B) The manager made a calculation error.
(C) The relationship between the two variables is not linear.
(D) There is a lurking variable that was not accounted for.
(E) There really is no relationship between the two variables.
9. Four swimmers participated in a race. The shortest swimmer finished first, the second
shortest swimmer finished second, the second tallest swimmer finished third and the
tallest swimmer finished fourth. What can be said about the correlation between height
and race time for these four swimmers?
(A) There is a perfect negative correlation, so r = −1.
(B) The correlation is negative, but not necessarily equal to −1.
(C) The correlation is close to zero.
(D) There is a perfect positive correlation, so r = 1.
(E) The correlation is positive, but not necessarily equal to 1.
10. A police officer would like to determine how the number of alcoholic beverages consumed
by a person can predict his or her blood alcohol level. The officer measures the values of
both variables on a sample of people leaving a bar one night. The correlation between
the two variables is calculated to be 0.88 and the equation of the least squares regression
line is calculated to be ŷ = 0.003 + 0.012x. Which of the following statements is true?
(A) 88% of the variation in blood alcohol level can be accounted for by its regression on
number of alcoholic beverages consumed.
(B) 77% of the variation in number of alcoholic beverages consumed can be accounted
for by its regression on blood alcohol level.
(C) 94% of the variation in blood alcohol level can be accounted for by its regression on
number of alcoholic beverages consumed.
(D) 88% of the variation in number of alcoholic beverages consumed can be accounted
for by its regression on blood alcohol level.
(E) 77% of the variation in blood alcohol level can be accounted for by its regression on
number of alcoholic beverages consumed.
11. Data are collected for some explanatory variable X and some response variable Y and the
correlation r and the least squares regression line are calculated. Which of the following
statements is false?
(A) The correlation tells us the direction of the linear relationship between X and Y.
(B) The correlation measures the strength of the linear relationship between X and Y.
(C) The slope of the regression line tells us the direction of the linear relationship
between X and Y.
(D) The slope of the regression line measures the strength of the linear relationship
between X and Y.
(E) The intercept of the regression line is the predicted value of Y when X = 0.
12. The least squares regression line is the line that minimizes:
(A) Σ (ŷᵢ − ȳ)
(B) Σ (yᵢ − ŷᵢ)²
(C) Σ (yᵢ − ȳ)²
(D) Σ (ŷᵢ − ȳ)²
(E) Σ (yᵢ − ŷᵢ)
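The difference between summing residuals and summing squared residuals can be seen on a small example. The following is a minimal sketch on hypothetical data; the helper names `fit_line`, `residuals` and `sse` are assumptions introduced here, not part of the question. It compares the sum of squared residuals of the fitted line against nearby lines, and shows that a deliberately wrong line through (x̄, ȳ) still has residuals that sum to about zero.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(xs, ys, intercept, slope):
    """Observed y minus predicted y for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for the given line."""
    return sum(e ** 2 for e in residuals(xs, ys, intercept, slope))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

a, b = fit_line(xs, ys)
best = sse(xs, ys, a, b)

# Nudging the intercept or the slope never lowers the sum of squares.
nudged = [sse(xs, ys, a + da, b + db)
          for da in (-0.5, 0.0, 0.5)
          for db in (-0.2, 0.0, 0.2)]
print(best <= min(nudged))

# A steeper line through (x-bar, y-bar) also has residuals summing to about
# zero, so the plain sum of residuals cannot single out one line.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
steep_slope = b + 1.0
steep_intercept = mean_y - steep_slope * mean_x
print(abs(sum(residuals(xs, ys, steep_intercept, steep_slope))) < 1e-9)
```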
13. Which of the following statements about the least squares regression line is/are true?
(I) The slope of the least squares regression line always has the same sign as the
correlation.
(II) The least squares regression line is the line that minimizes the sum of residuals.
(III) The least squares regression line is the line that maximizes the value of r².
(A) I only
(B) II only
(C) I and II only
(D) I and III only
(E) I, II and III
14. Consider the scatterplot shown below, displaying the relationship between some explanatory
variable X and some response variable Y:
Which of the following statements is true?
(A) Point A (the red point) is the residual when X = 4.
(B) Point B (the yellow point) is the predicted value of Y when X = 8.
(C) Point C (the blue point) is an influential observation.
(D) Point D (the green point) is an outlier in the y-direction.
(E) All of the above are true.
15. A professor would like to conduct a regression analysis to determine whether a student’s
STAT 2000 final exam score can be predicted from their STAT 1000 final exam score.
She records the exam scores for a sample of students who have taken both courses. The
STAT 1000 exam scores have a mean of 71.0 and a standard deviation of 9.4. The STAT
2000 exam scores have a mean of 66.2 and a standard deviation of 10.3. The correlation
between STAT 1000 and STAT 2000 final exam score is calculated to be 0.84.
What is the predicted increase in STAT 2000 exam score when a student’s STAT 1000
exam score increases by one?
(A) 0.71 (B) 0.76 (C) 0.84 (D) 0.88 (E) 0.92
The next two questions (16 and 17) refer to the following:
Leonardo da Vinci believed that a person’s height could be predicted from their armspan
(the distance, measured across the back, between the person’s fingertips, when the arms
are held out straight and horizontally). The values of both variables are measured (in
cm) for a sample of nine individuals. The scatterplot is shown below with the least
squares regression line.
The equation of the least squares regression line is calculated to be ŷ = −8.52 + 1.06x.
The sample means and standard deviations are shown below:
           mean    std. dev.
Armspan    169.6   6.87
Height     171.3   8.05
16. Which of the following statements is false?
(A) A reliable prediction for the height of a person with an armspan of 170 cm is 171.68
cm.
(B) A reliable prediction for the height of a person with an armspan of 195 cm is 198.18
cm.
(C) Despite the strong linear relationship between armspan and height, we cannot conclude
a causal relationship.
(D) One individual in the sample had an armspan of 176.1 cm and a height of 180.0 cm.
The residual for this individual is equal to 1.854 cm.
(E) There are five positive residuals and four negative residuals.
17. What is the value of the correlation between armspan and height for this sample of
individuals?
(A) 0.837 (B) 0.858 (C) 0.884 (D) 0.905 (E) 0.939
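The calculations behind questions 16 and 17 can be sketched in a few lines. This is a minimal illustration that uses only the regression line and the summary statistics reported above; the function name `predict_height` is an assumption introduced here.

```python
# Values reported in the problem statement.
intercept, slope = -8.52, 1.06
sd_armspan, sd_height = 6.87, 8.05

def predict_height(armspan_cm):
    """Predicted height (cm) from the reported regression line."""
    return intercept + slope * armspan_cm

# Residual = observed - predicted, for the individual with armspan 176.1 cm.
residual = 180.0 - predict_height(176.1)

# The slope satisfies b = r * s_y / s_x, so r = b * s_x / s_y.
r = slope * sd_armspan / sd_height

print(round(predict_height(170), 2), round(residual, 3), round(r, 3))
```

An armspan of 195 cm is about 3.7 standard deviations above the sample mean of 169.6 cm, so it is very likely outside the range of the nine sampled armspans. The line still returns a number for it, but that prediction would be an extrapolation.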
18. Do people buy more ice cream when it’s hot outside? Researchers collected data from 19
American cities for the month of August. The average monthly temperature X (in °F)
was recorded for each city, as well as the ice cream consumption Y (in pints per capita).
The following calculations were made:
The correlation between temperature and ice cream consumption was 0.5.
The average temperature was 74.
The average ice cream consumption was 0.4.
The standard deviation of temperatures was 8.
The standard deviation of ice cream consumption was 0.16.
What is the predicted ice cream consumption in an American city with an average
August temperature of 80°F?
(A) 0.45 (B) 0.46 (C) 0.47 (D) 0.48 (E) 0.49
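A regression line can be built from summary statistics alone, using b = r·s_y/s_x for the slope and the fact that the line passes through (x̄, ȳ) for the intercept. The sketch below applies this to the values reported in the question; the function name `predict_consumption` is an assumption introduced here.

```python
# Summary statistics reported in the question.
r = 0.5
mean_temp, sd_temp = 74.0, 8.0
mean_ice, sd_ice = 0.4, 0.16

# Slope: b = r * s_y / s_x.
slope = r * sd_ice / sd_temp
# The line passes through (x-bar, y-bar), so a = y-bar - b * x-bar.
intercept = mean_ice - slope * mean_temp

def predict_consumption(temp_f):
    """Predicted pints per capita at the given average temperature (deg F)."""
    return intercept + slope * temp_f

print(round(predict_consumption(80), 2))
```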
19. Can the number of calories in breakfast cereal be predicted by the sugar content?
Researchers gathered data for ten breakfast cereals, including sugar content and calories
per serving (both in grams). The data are as follows:
Cereal       1     2     3     4     5     6     7     8     9    10    mean   std. dev.
Sugar      4.3   7.1   3.8   5.7   8.5   4.2   9.7   3.5   4.9   6.3     5.8   2.09
Calories    99   109    97   106   107   104   112   102   103   102   104.1   4.53
The correlation between sugar content and calories for this sample is calculated to be
0.84. The equation of the least squares regression line is:
(A) ŷ = 93.54 + 1.82x
(B) ŷ = 101.84 + 0.39x
(C) ŷ = 95.23 + 1.53x
(D) ŷ = 114.29 + 1.82x
(E) ŷ = 104.10 + 0.39x
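Since the raw data are given, the correlation and the regression line can also be computed directly rather than from the rounded summary statistics. The following is a minimal sketch in plain Python; the variable names are assumptions introduced here.

```python
from math import sqrt

# Data from the table in the question.
sugar = [4.3, 7.1, 3.8, 5.7, 8.5, 4.2, 9.7, 3.5, 4.9, 6.3]
calories = [99, 109, 97, 106, 107, 104, 112, 102, 103, 102]

n = len(sugar)
mean_x = sum(sugar) / n
mean_y = sum(calories) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in sugar)
syy = sum((y - mean_y) ** 2 for y in calories)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sugar, calories))

r = sxy / sqrt(sxx * syy)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"r = {r:.2f}, y-hat = {intercept:.2f} + {slope:.2f}x")
```

Working from the raw data avoids rounding the slope first, so the intercept may differ in the last decimal place from a value computed with the rounded summary statistics.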
20. Researchers want to determine how the height of a mountain can help explain the temperature
at the top of the mountain. The two variables were measured for a sample of
mountains and the least squares regression line was calculated. It was also reported that
67% of the variation in temperature at the top of a mountain can be explained by its
regression on the mountain’s height. What is the value of the correlation between the
two variables?
(A) 0.82 (B) 0.67 (C) −0.45 (D) −0.82 (E) −0.67
The next two questions (21 and 22) refer to the following:
A student would like to determine whether the number of pages in a textbook can be
used to predict its price. She took a random sample of 30 textbooks from the campus
bookstore and recorded the price (in $) and the number of pages for each book. The
least squares regression line is calculated to be ŷ = 83 + 0.3x. It is also reported that
58% of the variation in the price of a textbook can be explained by its regression on the
number of pages.
21. What is the value of the correlation between price and number of pages?
(A) 0.76 (B) 0.34 (C) 0.30 (D) 0.58 (E) 0.83
22. One textbook in the sample had 150 pages and cost $121. What is the value of the
residual for this textbook?
(A) 7 (B) 5 (C) −7 (D) −5 (E) 6
The next three questions (23 to 25) refer to the following:
We would like to examine how the speed of a professional baseball player can be used
to predict his power. Two important statistics in baseball are home runs and stolen
bases. Strong hitters will hit many home runs, and fast runners will steal many bases. A
random sample of 20 Major League Baseball players is selected, and the number of bases
X the player stole and the number of home runs Y the player hit last season are recorded.
The equation of the least squares regression line is calculated to be ŷ = 35.19 − 0.78x.
It is also determined that 61.9% of the variation in number of home runs hit can be
accounted for by its regression on number of stolen bases.
23. What is the correct interpretation of the slope of the least squares regression line?
(A) When the number of stolen bases increases by one, we predict an increase of 0.78
home runs.
(B) When the number of home runs increases by one, we predict a decrease of 0.78
stolen bases.
(C) When the number of stolen bases increases by 0.78, we predict a decrease of one
home run.
(D) When the number of home runs increases by 0.78, we predict a decrease of one
stolen base.
(E) When the number of stolen bases increases by one, we predict a decrease of 0.78
home runs.
24. What is the correlation between number of stolen bases and number of home runs?
(A) 0.787 (B) −0.619 (C) −0.352 (D) 0.619 (E) −0.787
25. One player in the sample hit 4 home runs and had a residual of −7.79. How many stolen
bases did this player have?
(A) 28 (B) 30 (C) 32 (D) 35 (E) 40
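Two of the calculations in this group can be sketched directly from the reported values: recovering r from r² (its sign matches the sign of the slope), and working backwards from a residual to the value of x. This is only an illustration; the function name `stolen_bases` is an assumption introduced here.

```python
from math import copysign, sqrt

# Values reported in the problem statement.
intercept, slope = 35.19, -0.78
r_squared = 0.619

# r has the same sign as the slope, and magnitude sqrt(r^2).
r = copysign(sqrt(r_squared), slope)

def stolen_bases(home_runs, residual):
    """Solve residual = y - (a + b*x) for x, given y and the residual."""
    predicted = home_runs - residual
    return (predicted - intercept) / slope

print(round(r, 3), round(stolen_bases(4, -7.79)))
```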
26. Which of the following represents the strongest linear relationship between X and Y ?
(A) ŷ = 0.37 + 0.82x, r = 0.44
(B) ŷ = 0.63 − 0.54x, r = −0.77
(C) ŷ = 0.96 + 0.72x, r = 0.55
(D) ŷ = 0.74 − 0.95x, r = −0.33
(E) ŷ = 0.58 + 0.49x, r = 0.66
The next two questions (27 and 28) refer to the following:
A frustrated student wonders why it seems like the bus takes longer to come when it’s
cold outside. He records the temperature (in °C) and the time he waits for the bus (in
minutes) over a seven-day period. He would like to determine how the temperature can
help predict the time it takes his bus to come. The data are shown below:
Day                1      2      3      4      5      6      7
Temperature X    −32    −21    −25    −27    −18    −20    −17
Time Y          12.5    8.7   10.1    9.8    7.3    9.2    3.4
The least squares regression line is calculated to be ŷ = −1.57 − 0.45x and the correlation
is calculated to be r = −0.86.
27. What is the residual for the sixth day?
(A) 5.72 (B) −1.77 (C) 7.43 (D) −5.72 (E) 1.77
28. What is the correct interpretation of the slope of the least squares regression line?
(A) As temperature increases by 1°C, we predict the wait time to decrease by 0.45
minutes.
(B) As temperature decreases by 0.45°C, we predict the wait time to increase by 1
minute.
(C) As the wait time increases by one minute, we predict the temperature to decrease
by 0.45°C.
(D) As the wait time decreases by 0.45 minutes, we predict the temperature to increase
by 1°C.
(E) As temperature increases by 1°C, we predict the wait time to increase by 0.45
minutes.
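Residuals and the meaning of the slope for the bus data can be checked with a short script. The sketch below uses the seven observations and the reported regression line; the function name `predict_wait` is an assumption introduced here.

```python
# Data from the table in the problem statement.
temperature = [-32, -21, -25, -27, -18, -20, -17]   # deg C
wait_time = [12.5, 8.7, 10.1, 9.8, 7.3, 9.2, 3.4]   # minutes

# Reported least squares line: y-hat = -1.57 - 0.45x.
intercept, slope = -1.57, -0.45

def predict_wait(temp_c):
    """Predicted wait (minutes) at the given temperature (deg C)."""
    return intercept + slope * temp_c

# Residual = observed - predicted, one per day.
residuals = [y - predict_wait(x) for x, y in zip(temperature, wait_time)]
for day, e in enumerate(residuals, start=1):
    print(day, round(e, 2))

# Raising the temperature by 1 deg C changes the prediction by the slope.
change_per_degree = predict_wait(-19) - predict_wait(-20)
print(round(change_per_degree, 2))
```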
29. The values of an explanatory variable X and a response variable Y are measured on a
sample of individuals. The scatterplot of the data is shown below:
Which of the following is a histogram of the residuals?
(A) (B)
(C) (D)
(E)
30. Data are collected on a sample of individuals for some explanatory variable X and some
response variable Y and are displayed on the scatterplot below:
What would be the equation of the least squares regression line for these data?
(A) ŷ = 80 − 0.3x
(B) ŷ = 70 − 0.6x
(C) ŷ = 80 − 1.2x
(D) ŷ = 70 − 0.3x
(E) ŷ = 80 − 0.6x
31. Can the speed of a car be predicted by the age of its driver? The speeds (in km/h) of a
sample of ten vehicles are recorded, along with the ages of their drivers. The equation
of the least squares regression line is calculated to be ŷ = 133.72 − 0.56x. Melissa is
driving home from work on the highway. It is known that her predicted speed is 107.4
km/h. How old is Melissa?
(A) 39 (B) 43 (C) 47 (D) 51 (E) 55
32. The values of an explanatory variable X and a response variable Y are measured on a
sample of individuals. Their values are plotted on the scatterplot shown below:
[Scatterplot "Bivariate Fit of Y By X": X axis from 5 to 20, Y axis from 10 to 80, with two labelled points, A and B.]
Which of the following statements is true about the effect these points have on the least
squares regression line?
(A) Point A will have a greater effect because it is an outlier in the x-direction, and
these outliers are more likely to be influential.
(B) Point B will have a greater effect because it is an outlier in the y-direction, and
these outliers are more likely to be influential.
(C) The two points will have an approximately equal effect, as they are about equally
far from the rest of the points.
(D) Neither point will have an effect on the least squares regression line, as this line is
resistant to the presence of outliers.
(E) It is impossible to determine which point will have a greater effect without calculating
the least squares regression line with and without these points present.
33. Which of the following statements is false?
(A) The correlation r should not be used as a measure of association if the relationship
between two variables is not linear.
(B) The point (x̄, ȳ) always falls on the least squares regression line.
(C) The slope of the least squares regression line cannot be negative if the correlation
is positive.
(D) The least squares regression line minimizes the sum of squared residuals.
(E) The correlation always falls between 0 and 1, inclusive.
The next two questions (34 and 35) refer to the following:
The values of an explanatory variable X and a response variable Y are measured on a
sample of individuals. The scatterplot of the data and the least squares regression line
are shown below:
34. Suppose we add an additional point to the scatterplot, for which x = 15 and y = 5. If
we recalculate the regression line including this point in the data set, then:
(A) the intercept will decrease and the slope will increase.
(B) the intercept will increase and the slope will decrease.
(C) both the intercept and the slope will increase.
(D) both the intercept and the slope will decrease.
(E) the intercept will remain the same and the slope will decrease.
35. The change in the position of the regression line in the previous question is due to:
(A) an outlier in the y-direction.
(B) a lurking variable.
(C) extrapolation.
(D) an influential observation.
(E) a large residual.
36. Which of the following statements about regression and correlation is true?
(A) The correlation r is resistant to the effects of outliers.
(B) Positive values of r indicate a strong linear relationship and negative values of r
indicate a weak linear relationship.
(C) If r² is high, a predicted value of Y will be reliable, even if the value of X is far
outside the range of data values in the sample.
(D) The correlation r is expressed in the same units as the standard deviations of X
and Y.
(E) none of the above
37. A veterinarian would like to determine how the age of a Shetland pony can be used to
predict its weight. She measures both variables for a sample of ponies and calculates the
least squares regression line. This is the line that minimizes:
(A) the sum of the distances between the actual weights and the predicted weights.
(B) the sum of the squared differences between the actual ages and the predicted ages.
(C) the sum of the distances between the actual ages and the predicted ages.
(D) the sum of the squared differences between the actual weights and the predicted
weights.
(E) the sum of the squared differences between the actual ages and the predicted weights.
38. We would like to determine how the dose of an allergy drug (measured in mg) affects
the duration of relief (in hours) from a person’s allergy symptoms. The values of both
variables are measured for the 50 individuals who participated in a study. The data
are plotted on a scatterplot and it is apparent that a linear relationship is a reasonable
assumption. The correlation between the two variables is calculated to be 0.87, and the
sample means and standard deviations are shown below:
            mean    std. dev.
Dose        9.14    4.45
Duration    12.86   6.80
When dose increases by 1 mg, duration of relief is predicted to increase by how many
hours?
(A) 0.57 (B) 0.76 (C) 1.33 (D) 1.84 (E) 2.45
The next two questions (39 and 40) refer to the following:
Crickets make their chirping sounds by rapidly sliding one wing over the other. The faster
this movement, the higher the chirping sound that is produced. It has been discovered
by scientists that crickets move their wings faster when the temperature is warm than
when it is cold. As such, by listening to the pitch of a chirping cricket, it is possible
to estimate the temperature. We record the pitch X of a chirping cricket (measured
in vibrations per minute) and the temperature Y (in °C) for a sample of 15 days. The
equation of the least squares regression line is calculated to be ŷ = 2.5 + 0.025x.
39. One cricket in the sample had a pitch of 930 and the temperature was 27°C. What is
the value of the residual for this observation?
(A) −3.50 (B) −1.25 (C) 25.75 (D) 1.25 (E) 3.50
40. If we had instead measured pitch in vibrations per second, which of the following values
would change?
(I) slope
(II) intercept
(III) correlation
(A) I only
(B) II only
(C) I and II only
(D) I and III only
(E) II and III only
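The effect of changing the units of X can be explored numerically. The sketch below is only an illustration: the six readings are hypothetical (the question does not list the data) and `fit` is a hand-written helper. It fits the line twice, once with pitch per minute and once with pitch per second, and compares the slope, intercept and correlation.

```python
from math import sqrt

def fit(xs, ys):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy / sqrt(sxx * syy)

# Hypothetical readings (not from the question): pitch in vibrations per
# minute and temperature in deg C.
pitch_per_min = [780, 840, 900, 930, 990, 1050]
temperature = [21.5, 24.0, 25.0, 27.0, 26.5, 29.0]

# Same measurements with pitch expressed in vibrations per second.
pitch_per_sec = [p / 60 for p in pitch_per_min]

a_min, b_min, r_min = fit(pitch_per_min, temperature)
a_sec, b_sec, r_sec = fit(pitch_per_sec, temperature)

print("slope:    ", b_min, b_sec)
print("intercept:", a_min, a_sec)
print("r:        ", r_min, r_sec)
```

Dividing every x by 60 multiplies the slope by 60, because the same change in Y is now spread over an x-range that is 60 times smaller.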
Answers
1. C 21. A
2. C 22. C
3. E 23. E
4. E 24. E
5. A 25. B
6. C 26. B
7. E 27. E
8. C 28. A
9. E 29. C
10. E 30. E
11. D 31. C
12. B 32. A
13. A 33. E
14. C 34. B
15. E 35. D
16. B 36. E
17. D 37. D
18. B 38. C
19. A 39. D
20. D 40. A