Chapter 4
Bivariate data
What is a scatterplot, how is it constructed and what does it tell us?
What is the q-correlation coefficient, how is it calculated and what does it tell us?
How do we fit a straight line to a scatterplot by eye?
How do we fit a straight line to a scatterplot using the two-mean method?
How do we interpret the intercept and slope of a line fitted to a scatterplot?
How do we use a line fitted to a scatterplot to make predictions?
What is the difference between interpolation and extrapolation?
In Chapter 1, ‘Univariate data’, you learned about the statistical methods we use to analyse
data recorded about a single variable, such as a person’s weight. In this chapter, you will learn
about the statistical methods used to analyse data recorded about two related variables, such as
a person’s weight and height. Such data is called bivariate data (two-variable data).
When we analyse bivariate data, we are interested in how the two variables relate to each
other. We try to answer questions such as: ‘Is there a relationship between these two
variables?’ and ‘Does knowing the value of one of the variables tell us anything about the
value of the second variable?’
For example, let us take as our two variables the mark a student obtained on a test and the
amount of time they spent studying for that test. Since the amount of time spent studying may
affect the mark obtained, we distinguish between the two variables by calling the time spent
studying the independent variable (IV) and the mark obtained the dependent variable (DV).
Cambridge University Press • Uncorrected Sample Pages • 978-0-521-74049-4
2008 © Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard
P1: FXS/ABE P2: FXS
0521672600Xc04.xml CUAU034-EVANS September 15, 2008 17:27
Student 1 2 3 4 5 6 7 8 9 10
Time (hours) 4 36 23 19 1 11 18 13 18 8
Mark (%) 41 87 67 62 23 52 61 43 65 52
will become very important when we come to fitting lines to scatterplots later in the chapter.
The horizontal or x-coordinate of the point represents the time spent studying (the IV).
The vertical or y-coordinate represents the mark obtained (the DV).
The scatterplot below shows the point for Student 1, who studied 4 hours for the examination and obtained a mark of 41.
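As a minimal illustration (a Python sketch, not part of the original text), the study data can be paired into (x, y) coordinates with time as the independent variable x and mark as the dependent variable y. The names `times`, `marks` and `points` are our own.

```python
# Study data from the table above: time is the independent variable (x),
# mark is the dependent variable (y).
times = [4, 36, 23, 19, 1, 11, 18, 13, 18, 8]    # hours (IV)
marks = [41, 87, 67, 62, 23, 52, 61, 43, 65, 52]  # per cent (DV)

# Each student becomes one point (x, y) on the scatterplot.
points = list(zip(times, marks))

for student, (x, y) in enumerate(points, start=1):
    print(f"Student {student}: ({x}, {y})")
```

The first line printed is the point for Student 1, (4, 41).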
[Scatterplot: Mark (%) against Time (hours), showing the single point for Student 1 at (4, 41)]
The scatterplot is completed by plotting the points for each remaining student, as shown below.
[Scatterplot: Mark (%) against Time (hours) for all 10 students]
In this relationship, the car’s price is clearly the dependent variable (DV) as it depends on its age, so price is plotted on the vertical axis. Age, the independent variable (IV), is plotted on the horizontal axis.
[Scatterplot: Price ($’000) against Age (years)]
complete this task.
The data below give the marks that students obtained on an examination and the times
they spent studying for the examination.
Time (hours) 4 36 23 19 1 11 18 13 18 8
Mark (%) 41 87 67 62 23 52 61 43 65 52
Use a graphics calculator to construct a scatterplot. Treat time as the independent
(x) variable.
Steps
1 Start a new document (by pressing / + N )
and select 3:Add Lists & Spreadsheet.
3 To construct a scatterplot
a Move the cursor to the textbox area below the
horizontal (or x-) axis. Press when
prompted and select the variable time (i.e. the
independent variable). Press enter to paste the
variable onto that axis.
b Move the cursor towards the centre of the
vertical (or y-) axis until a textbox appears.
Press when prompted to select the variable
mark.
c Finally, press enter to paste the variable mark
onto that axis and generate the required
scatterplot, which is shown opposite. The plot
is automatically scaled.
The data below give the marks that students obtained on an examination and the
times they spent studying for the examination.
Time (hours) 4 36 23 19 1 11 18 13 18 8
Mark (%) 41 87 67 62 23 52 61 43 65 52
Use a graphics calculator to construct a scatterplot. Treat time as the independent
(x) variable.
Steps
1 Open the Statistics application and
enter the coordinate values into
lists named time and mark, as
shown.
2 Tap from the toolbar to open
the Set StatGraphs dialog box.
Tap h to confirm your selections.
3 Tapping from the toolbar at
the top of the screen
automatically plots a scaled
graph in the lower half of the
screen.
Tapping the icon will give a
full-screen sized graph. Tap
again to return to a half-screen.
4 Tapping from the toolbar
places a marker on the first data
point (xc = 4, yc = 41).
Use the horizontal cursor arrow
( ) to move from point to
point.
Exercise 4A
3 Number of seats 405 296 288 258 240 193 188 148
Airspeed (km/h) 830 797 774 736 757 765 760 718
The table above shows the numbers of seats and airspeeds of eight passenger aircraft. Use a
graphics calculator to construct a scatterplot with number of seats as the independent
variable.
4 Drug dosage (mg) 0.5 1.2 4.0 5.3 2.6 3.7 5.1 1.7 0.3 0.6
Response time (min) 65 35 15 10 22 16 10 18 70 50
The table above shows the response times of 10 patients
given a pain relief drug, and the drug dosages. Use a
graphics calculator to construct a scatterplot using
drug dosage as the independent variable.
5 Time (min) 0 5 10 15 20 25
Number in cinema 87 102 118 123 135 137
The table above shows the numbers of people in a cinema at 5-minute intervals after the
advertisements started. Use a graphics calculator to construct an appropriate scatterplot.
4.2
Presence of a relationship
First we look to see if there is a clear pattern in the scatterplot.
In the example opposite, there is no clear pattern
For the three examples below, there is a clear (but different) pattern in each set of points, so
we conclude that there is a relationship in each case.
[Scatterplots: three examples of y against x, each showing a clear but different pattern]
Having found a clear pattern, there are two main things we look for in the pattern of points:
direction and outliers (if any)
strength of the relationship (amount of scatter).
[Scatterplot: horizontal axis Age (years)]
In contrast, there is a clear pattern in this scatterplot
of the mark students obtained in an exam and the
time they spent studying for the exam.
Strong relationship
When there is a strong relationship between the variables, the points will tend to follow a
single stream. A pattern is clearly seen. There is only a small amount of scatter in the plot.
[Scatterplots: strong positive relationship; strong positive relationship; strong negative relationship]
Moderate relationship
As the amount of scatter in the plot increases, the pattern becomes less clear. This indicates
that the relationship is less strong. In the examples below, we might say that there is a
moderate relationship between the variables.
[Scatterplots: moderate positive relationship; moderate positive relationship; moderate negative relationship]
Weak relationship
As the amount of scatter increases further, the pattern becomes even less clear. This indicates
that any relationship between the variables is weak. The scatterplots below are examples of
weak relationships between the variables.
No relationship
Finally, when all we have is scatter, as seen in the scatterplots below, no pattern can be seen. In
this situation we say that there is no relationship between the variables.
[Scatterplots: no relationship (three examples)]
These scatterplots should help you to get a feel for the strength of a relationship, as indicated
by the amount of scatter in a scatterplot. Later in this chapter, you will learn to calculate a
measure of this strength using the idea of q-correlation. At the moment, you only need to be able
to estimate the strength of a relationship as strong, moderate, weak or none, by comparing it
with the standard scatterplots given above.
Exercise 4B
1 For each of the following pairs of variables, indicate whether you expect a relationship to
exist between the variables and, if so, whether you would expect the variables to be
positively or negatively related.
a Fitness level and amount of daily exercise b Foot length and height
c Comfort level and temperature above 30°C d Foot length and intelligence
e Time taken to get to school and distance travelled
f Weight of an ice cube and surrounding temperature
[Exercise 4B scatterplots: axes include Height (cm), Business ($’000), Temperature (°C) and Husband’s age (years)]
4.3 The q-correlation coefficient
In the previous section you learned how to estimate the strength of a relationship from a
scatterplot by considering the amount of scatter in the plot. In this section, you will learn how
the q-correlation coefficient (q for quadrant) can be used to give a measure of the strength of
the relationship between two variables.
The idea behind the q-correlation coefficient
From our earlier investigation of the relationship between two variables, we found that for:
positive relationships, high values on one variable tend to go with high values for the
other variable, and vice versa
negative relationships, high values on one variable tend to go with low values for the other
variable.
150 Essential Standard General Mathematics
Solution
1 Find the median of the x-values. There are 11 points, so the median will be the 6th point from the left.
2 Draw a vertical dotted line through this point.
3 Find the median of the y-values. There are 11 points, so the median will be the 6th point up from the bottom of the scatterplot.
4 Draw a horizontal dotted line through this point.
5 The scatterplot has now been divided into four quadrants. Label them A, B, C and D, proceeding anticlockwise from the top right.
6 Count the number of points in each of the quadrants A, B, C and D. Call these a, b, c and d respectively. Any points that lie on either median line are omitted.
[Scatterplot: 11 points with x and y from 0 to 10, divided by the median lines into quadrants A (a = 4), B (b = 1), C (c = 3) and D (d = 1)]
7 Substitute the values for a, b, c and d into
$$q = \frac{(a + c) - (b + d)}{a + b + c + d}$$
and evaluate:
$$\therefore q = \frac{(4 + 3) - (1 + 1)}{4 + 1 + 3 + 1} = \frac{5}{9} \approx 0.56$$
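The arithmetic in the final step can be checked with a short Python sketch. The function name `q_from_counts` is hypothetical; `Fraction` from the standard library keeps the exact value rather than a rounded decimal.

```python
from fractions import Fraction


def q_from_counts(a, b, c, d):
    """q-correlation coefficient from the quadrant counts a, b, c and d.

    Assumes points lying on either median line have already been left
    out of the counts.
    """
    total = a + b + c + d
    if total == 0:
        raise ValueError("no points lie off the median lines")
    return Fraction((a + c) - (b + d), total)


q = q_from_counts(a=4, b=1, c=3, d=1)
print(q, round(float(q), 2))
```

Using exact fractions makes it easy to confirm that the result is 5/9, which rounds to 0.56.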
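We can see that the q-correlation can take both positive and negative values.
Suppose that all the points lie in quadrants A and C. Then b = 0 and d = 0 and
$$q = \frac{(a + c) - (0 + 0)}{a + 0 + c + 0} = \frac{a + c}{a + c} = 1$$
Similarly, if all the points lie in quadrants B and D, then a = 0 and c = 0 and
$$q = \frac{(0 + 0) - (b + d)}{0 + b + 0 + d} = -1$$
If instead the points are spread equally over the four quadrants, so that a = b = c = d, then
$$q = \frac{(a + c) - (b + d)}{a + b + c + d} = \frac{0}{4a} = 0$$
Here there is no relationship (q = 0).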
Thus we can see that in general:
• −1 ≤ q ≤ 1
• If there is a positive relationship then most of the points are in A and C and q is positive.
• If there is a negative relationship then most of the points are in B and D and q is negative.
[Scatterplot: vertical axis from 0 to 70, horizontal axis Drug dosage (mg)]
Solution
1 Draw in the median line for both variables on the scatterplot. Since there are 10 points, the median
[Scatterplot: Reaction time (min) against Drug dosage (mg) with median lines drawn; quadrant B, b = 4; quadrant A, a = 1]
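The whole quadrant procedure can also be sketched in Python, working from raw data rather than a drawn scatterplot. This is an illustration under stated assumptions: medians are taken with `statistics.median` (which averages the two middle values when there is an even number of points), points on either median line are skipped, and `q_correlation` is a hypothetical name. It is applied here to the drug dosage and response time data from Exercise 4A, Question 4. The resulting counts a = 1 and b = 4 agree with the labels in the example's figure, which suggests, but does not confirm, that the example uses the same data.

```python
from statistics import median


def q_correlation(xs, ys):
    """q-correlation coefficient computed from raw (x, y) data.

    Quadrants are labelled anticlockwise from the top right:
    A (x and y above their medians), B (x below, y above),
    C (both below) and D (x above, y below).
    Points on either median line are omitted.
    """
    mx, my = median(xs), median(ys)
    a = b = c = d = 0
    for x, y in zip(xs, ys):
        if x == mx or y == my:
            continue  # on a median line: omitted from the counts
        if x > mx and y > my:
            a += 1
        elif x < mx and y > my:
            b += 1
        elif x < mx and y < my:
            c += 1
        else:  # x > mx and y < my
            d += 1
    total = a + b + c + d
    if total == 0:
        raise ValueError("no points lie off the median lines")
    return ((a + c) - (b + d)) / total, (a, b, c, d)


dosage = [0.5, 1.2, 4.0, 5.3, 2.6, 3.7, 5.1, 1.7, 0.3, 0.6]  # mg
response = [65, 35, 15, 10, 22, 16, 10, 18, 70, 50]          # min
q, counts = q_correlation(dosage, response)
print(counts, q)
```

With these data the medians are 2.15 mg and 20 min, no point lies on a median line, and the sketch gives q = −0.6.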
using the q-correlation coefficient
Earlier, we used the degree of scatter in a scatterplot to classify the strength of the
relationship observed as weak, moderate or strong. Using the table opposite, we can do the
same using the q-correlation coefficient.
For example, a q-correlation coefficient of q = 0.86 indicates that there is a strong positive
relationship.
In contrast, a q-correlation coefficient of q = −0.34 indicates that there is a weak negative
relationship.
Strong positive relationship       0.75 ≤ q ≤ 1
Moderate positive relationship     0.5 ≤ q < 0.75
Weak positive relationship         0.25 ≤ q < 0.5
No relationship                    −0.25 < q < 0.25
Weak negative relationship         −0.5 < q ≤ −0.25
Moderate negative relationship     −0.75 < q ≤ −0.5
Strong negative relationship       −1 ≤ q ≤ −0.75
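The classification table translates directly into a chain of comparisons, checked from the top category down. In this Python sketch (`classify_q` is a hypothetical name), the weak categories are taken to fill the gaps between the other rows, that is 0.25 ≤ q < 0.5 and −0.5 < q ≤ −0.25.

```python
def classify_q(q):
    """Classify a q-correlation coefficient using the strength table."""
    if not -1 <= q <= 1:
        raise ValueError("q must lie between -1 and 1")
    if q >= 0.75:
        return "strong positive relationship"
    if q >= 0.5:
        return "moderate positive relationship"
    if q >= 0.25:
        return "weak positive relationship"
    if q > -0.25:
        return "no relationship"
    if q > -0.5:
        return "weak negative relationship"
    if q > -0.75:
        return "moderate negative relationship"
    return "strong negative relationship"


for value in (0.86, -0.34):
    print(value, classify_q(value))
```

Because each test only runs if the ones above it failed, each boundary value (such as q = 0.75) falls into exactly one category, matching the ≤ and < signs in the table.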
Exercise 4C
1 Use the table of q-correlation coefficients to classify each of the following.
a q = 0.20 b q = −0.30 c q = −0.85 d q = 0.33
e q = 0.95 f q = −0.75 g q = 0.75 h q = −0.24
i q = −1 j q = 0.25 k q = 1 l q = −0.50
2 Calculate the value of the q-correlation coefficient for each of the following scatterplots.
[Scatterplots a–f: each plots y against x, with both axes running from 0 to 10]
3 Calculate the q-correlation coefficient for each pair of variables shown in the following
scatterplots.
[Scatterplots a and b, each with Age (years) on the horizontal axis]
[Scatterplot: Mark (%) against Time (hours)]
Once we have the equation, we can use it to predict the value of the dependent variable (y) for
different values of the independent variable (x).
Once the line is drawn, we can use the methods you learned in ‘Linear graphs’ (Chapter 3)
to find its equation. The starting point for fitting a line ‘by eye’ is a scatterplot.
The scatterplot opposite plots mark against time spent studying for an examination, for
10 students.
In this plot, mark is the y (or dependent) variable and time is the x (or independent) variable.
Fit a line to the scatterplot by eye and write its equation in terms of:
a x and y
b the variables mark and time.
[Scatterplot: Mark (%) against Time (hours)]
Solution
1 Place a transparent ruler on the scatterplot so that the points in the scatterplot are
reasonably evenly spread around the line made by the edge of the ruler.
[Scatterplot: Mark (%) against Time (hours), with a line fitted by eye; rise = 60]
Example 4 Fitting a line by eye using the two-point formula
The scatterplot opposite plots weight against height for eight people. In this plot, height is
the x (or independent) variable, and weight is the y (or dependent) variable.
Fit a line to the scatterplot by eye and write its equation in terms of:
a x and y
b the variables weight and height.
[Scatterplot: Weight (kg) against Height (cm)]
Solution
1 Place a ruler on the scatterplot so that the points in the scatterplot are reasonably evenly
spread around the line.
[Scatterplot: Weight (kg) against Height (cm), with the fitted line]
$$\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$$
4 Noting that y represents the variable weight and x represents the variable height, rewrite
the equation in terms of weight and height.
∴ weight = −40 + 0.6 × height
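Applying the two-point formula to two points read off a fitted line can be sketched in Python. The two points used below are hypothetical readings, not taken from the figure, and `line_through` is our own name. The function returns the intercept a and slope b of y = a + bx.

```python
def line_through(p1, p2):
    """Return (intercept, slope) of the line y = a + bx through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x-values")
    b = (y2 - y1) / (x2 - x1)  # slope = rise / run
    a = y1 - b * x1            # intercept: the value of y when x = 0
    return a, b


# Hypothetical (height, weight) readings from a line fitted by eye.
a, b = line_through((160, 56), (185, 71))
print(f"weight = {a:.0f} + {b:.1f} × height")
```

For these two hypothetical points the slope is 15/25 = 0.6 and the intercept is 56 − 0.6 × 160 = −40.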
Fitting a line using the two-mean method
While fitting a line by eye is quick and easy, it is not a reliable method for finding the equation
of the line that best fits a scatterplot, as everyone is likely to come up with a slightly different
line.
One method for overcoming this problem is to use the two-mean method. The two-mean
method locates the line on the scatterplot by ordering the data by their x-values, finding the
mean point of the bottom half and of the top half of the data values, and drawing a line
through these two points.
Fitting a line using the two-mean method requires both the scatterplot and the data values.
The data below give the marks that students obtained on an examination and the times they
spent studying for the examination.
Time (hours), x 4 36 23 19 1 11 18 13 18 8
Mark (%), y 41 87 67 62 23 52 61 43 65 52
Fit a line to the scatterplot using the two-mean method and write its equation in terms of:
a x and y b the variables mark and time.
Solution
1 Rewrite the data pairs in order, according to the x values.
Time, x 1 4 8 11 13 18 18 19 23 36
Mark, y 23 41 52 52 43 61 65 62 67 87
2 Divide the ordered table into two new tables: one for the lower half of data values, the
other for the top half of data values. Find the mean values of x and y for each new table.
Lower half
Time, x   1    4    8    11   13    $\bar{x}_L = 7.4$
Mark, y   23   41   52   52   43    $\bar{y}_L = 42.2$
Upper half
Time, x   18   18   19   23   36    $\bar{x}_U = 22.8$
Mark, y   61   65   62   67   87    $\bar{y}_U = 68.4$
3 Plot the two mean points (7.4, 42.2) and (22.8, 68.4) on the scatterplot.
4 Draw in the line through the two mean points to plot the two-mean line.
[Scatterplot: Mark (%) against Time (hours), with the mean points (7.4, 42.2) and (22.8, 68.4) and the two-mean line through them]
5 Use the two mean points (7.4, 42.2) and (22.8, 68.4) to find the equation of the line in
terms of y and x. Use either the two-point formula or a graphics calculator (see page 109).
$$b = \frac{68.4 - 42.2}{22.8 - 7.4} = \frac{26.2}{15.4} \approx 1.7, \qquad a = 42.2 - b \times 7.4 \approx 29.6$$
6 Rewrite the equation of the two-mean line in terms of the variables mark and time.
Equation of the two-mean line:
y = 29.6 + 1.7x
∴ mark = 29.6 + 1.7 × time
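The two-mean procedure in steps 1 to 5 can be sketched in Python. The sketch makes three assumptions: the number of data pairs is even (the examples in this section all are), ties in x keep their original order when sorting, and `two_mean_line` is a hypothetical name.

```python
from statistics import mean


def two_mean_line(xs, ys):
    """Fit y = a + bx by the two-mean method.

    Assumes an even number of points, split into a lower and an upper
    half after sorting on x.
    """
    n = len(xs)
    if n != len(ys) or n % 2 != 0:
        raise ValueError("need an even number of (x, y) pairs")
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])  # order by x only
    lower, upper = pairs[: n // 2], pairs[n // 2:]
    xl, yl = mean(p[0] for p in lower), mean(p[1] for p in lower)
    xu, yu = mean(p[0] for p in upper), mean(p[1] for p in upper)
    b = (yu - yl) / (xu - xl)  # slope of the line through the two mean points
    a = yl - b * xl            # intercept
    return a, b, (xl, yl), (xu, yu)


times = [4, 36, 23, 19, 1, 11, 18, 13, 18, 8]
marks = [41, 87, 67, 62, 23, 52, 61, 43, 65, 52]
a, b, low_mean, high_mean = two_mean_line(times, marks)
print(low_mean, high_mean)
print(f"mark = {a:.1f} + {b:.1f} × time")
```

The mean points come out as (7.4, 42.2) and (22.8, 68.4), and the rounded equation agrees with the worked solution.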
M
It is interesting to note that the equation of the two-mean line is very close to the equation we
got by fitting a line by eye. This is often the case when the points in the scatterplot are
reasonably closely scattered around the line. However, for scatterplots where this is not the
case, the two-mean method is a more reliable technique to use than fitting a line by eye.
Exercise 4D
[Scatterplot for Question 1: horizontal axis Female literacy rate (%)]
2 Fit a line by eye to the scatterplot opposite.
Write the equation of the line in terms of the
variables height and age.
[Scatterplot for Question 2: horizontal axis Age (months)]
Write the equation of the line in terms of the
variables daughter’s height and mother’s height.
[Scatterplot for Question 3: vertical axis from 150 to 180]
Time (s)          0.5    1      1.5    2      3      3.5    4      5
Velocity (m/s)    19.3   20.4   18.6   22.2   22.5   24.3   22.5   25.5
[Scatterplot: Velocity (m/s) against Time (s)]
Find the equation of the two-mean line for this data. Write the equation in terms of the
variables velocity and time.
5 The data below give the prices and ages of 12 used cars. Also shown is the scatterplot
constructed from this data.
Age (years)   Price ($)
2             15 800
3             14 300
3             13 800
4             11 800
4             13 000
4             13 300
5             11 000
6             12 200
6             9 500
7             8 300
7             9 700
8             8 000
[Scatterplot: Price ($’000) against Age (years)]
Find the equation of the two-mean line for this data. Write the equation in terms of the
variables price and age.
6 The data below give the airspeed and the number of seats in 8 aircraft. Also shown is the
scatterplot constructed from this data.
Number of seats   Airspeed (km/h)
405               830
296               797
288               774
258               736
240               757
193               765
188               760
148               718
[Scatterplot: Airspeed (km/h) against Number of seats]
Find the equation of the two-mean line for this data. Write the equation in terms of the
variables airspeed and number of seats.
For example, in Example 5 we fitted a line to the data relating students’ marks on an
examination to the time they spent studying for the examination. The equation was
mark = 29.6 + 1.7 × time
Using this equation, and rounding off to the nearest whole number, we would predict that a
student who spent:
0 hours studying would obtain a mark of 30% (mark = 29.6 + 1.7 × 0 = 29.6)
8 hours studying would obtain a mark of 43% (mark = 29.6 + 1.7 × 8 = 43.2)
12 hours studying would obtain a mark of 50% (mark = 29.6 + 1.7 × 12 = 50)
30 hours studying would obtain a mark of 81% (mark = 29.6 + 1.7 × 30 = 80.6)
80 hours studying would obtain a mark of 166%! (mark = 29.6 + 1.7 × 80 = 165.6)
This last result points to one of the limitations of regression lines: we are predicting someone
to get more than 100%. When using a regression line to make predictions, we must remember
that, strictly speaking, the equation only applies to the range of data values used to determine
the equation.
Thus, we can use the line with reasonable confidence to make predictions within this data
range. This is called interpolation.
However, we must be extremely careful about how much faith we put into predictions made
outside the data range. Making predictions outside the data range is called extrapolation.
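A minimal Python sketch of making predictions with the line mark = 29.6 + 1.7 × time while labelling each one as interpolation or extrapolation. The `predict` function is hypothetical, and the data range of 1 to 36 hours is taken from the study-time data. On this definition the prediction for 0 hours is itself an extrapolation, since no student in the data studied for less than 1 hour.

```python
def predict(x, a, b, x_min, x_max):
    """Predict y = a + bx and label the prediction.

    The label is 'interpolation' when x lies within the range of x-values
    used to fit the line, and 'extrapolation' otherwise.
    """
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Line from Example 5; study times in the data run from 1 to 36 hours.
for hours in (0, 8, 12, 30, 80):
    mark, kind = predict(hours, a=29.6, b=1.7, x_min=1, x_max=36)
    print(f"{hours:>2} hours: mark = {mark:.1f}% ({kind})")
```

The 80-hour case prints a predicted mark of 165.6%, labelled as an extrapolation.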
Exercise 4E
1 Complete the following sentences. Using a regression line to make a prediction:
a within the range of data that was used to derive the equation is called .
b outside the range of data that was used to derive the equation is called .
2 For children between the ages of 36 and 60 months, the equation relating their height (in cm)
to their age (in months) is:
height = 72 + 0.4 × age
Use this equation to predict the height (to the nearest cm) of a child who is:
a 40 months old. Is this interpolation or extrapolation?
b 55 months old. Is this interpolation or extrapolation?
c 70 months old. Is this interpolation or extrapolation?
3 For shoe sizes between 6 and 12, the equation
relating a person’s weight (in kg) to shoe size is:
weight = 48.1 + 2.2 × shoe size
Use this equation to predict the weight (to
the nearest kg) of a person whose shoe size is:
4 When preparing between 25 and 100 meals, a cafeteria’s cost (in dollars) is given by the
equation:
cost = 175 + 5.8 × number of meals
Use this equation to predict the cost (to the nearest dollar) of preparing:
a no meals. Is this interpolation or extrapolation?
b 60 meals. Is this interpolation or extrapolation?
c 89 meals. Is this interpolation or extrapolation?
5 For women of heights from 150 to 180 cm, the equation relating a daughter’s adult height
(in cm) to her mother’s height (in cm) is:
Review
Key ideas and chapter summary
Identifying relationships between two numerical variables: In a scatterplot, the dependent
variable (DV) is plotted on the vertical axis and the independent variable (IV) on the horizontal
axis.
[Example scatterplot: DV on the vertical axis against IV (Age) on the horizontal axis]
[q-correlation table (excerpt): No relationship, −0.25 < q < 0.25; Moderate negative relationship, −0.75 < q ≤ −0.5]
Fitting lines to scatterplots: linear regression
Fitting a line by eye: Fitting a line by eye means drawing a line on the scatterplot that
captures the general trend of the data. It is most suitable when there is minimal scatter in the
scatterplot.
Fitting a line using the two-mean method: The two-mean method positions the line on the
scatterplot by finding the mean point of the bottom half and of the top half of the data values.
A line is then drawn between the two mean points.
Using a regression line to make predictions: The regression line y = a + bx enables the value
of y to be determined for a given value of x.
Interpolation and extrapolation: Predicting within the range of data is called interpolation.
Predicting outside the range of data is called extrapolation.
Skills check
Multiple-choice questions
C Political party preference (Labor, Liberal, Other) and age in years
D Age in years and blood pressure in mm Hg
E Height in cm and sex (male, female)
2 For the scatterplot shown, the relationship between the
variables is best described as:
A weak negative
B strong negative
C no relationship
D weak positive
E strong positive
5 For the scatterplot shown, the relationship between the
variables is best described as:
A weak negative
B strong negative
C no relationship
D weak positive
E strong positive
6 A q-correlation coefficient of 0.32 would describe a relationship classified as:
A weak positive B moderate positive C strong positive
D close to zero E moderately strong
7 For the scatterplot shown, the q-correlation coefficient is:
A −1
B −0.5
C 0
D 0.5
E 1
[Scatterplot for Question 7: x and y from 0 to 10]
coefficient is:
A −1
B −0.5
C 0
D 0.5
E 1
[Scatterplot: x and y from 0 to 10]
B velocity = 19 + 1 × time
C velocity = 1 + 19 × time
D velocity = 19 + 5 × time
E velocity = 5 + 19 × time
[Scatterplot: Velocity (m/s) against Time (s)]
11 For the scatterplot shown, the line drawn by
B −1000
C −200
D 2000
E 1000
[Scatterplot: horizontal axis Age (years)]
The weekly income and weekly food costs for a group of 10 university students are given
in the following table.
Income ($) 150 250 300 300 380 450 600 850 950 1000
Food cost ($) 40 60 70 130 150 260 120 460 200 600
12 The equation of the two-mean line would be found by finding the equation of the
14 The equation predicts that the amount spent on entertainment by a person with an
income of $800 is:
A $40 B $80 C $120 D $160 E $1200
15 The following statements relate to the equation
expenditure = 40 + 0.10 × income
Which statement is not true?
A Expenditure is the dependent variable. B Income is the independent variable.
C The slope of the line is 0.10. D The intercept of the line is 40.
E Using the line to predict the expenditure of a person with an income of $1500
per week is called interpolation.
Short-answer questions
1 The following table gives the number of times the ball was inside the 50 m line in an
AFL football game, and the team’s score in that game.
Inside 50 m 64 57 34 61 51 52 53 51 64 55 58 71
Score (points) 90 134 76 92 93 45 120 66 105 108 88 133
a Construct a scatterplot of score against the number of times the ball was
inside 50 m.
b From the scatterplot, describe any relationship between the two variables.
2 Determine the q-correlation coefficient for the scatterplot shown.
[Scatterplot: Distance (km) against Time (min)]
scatterplot. Find the equation of the line.
[Scatterplot for Question 3: vertical axis from 100 to 160]
4 The time taken to complete a task, and the number of errors on the task, were
recorded for a sample of 10 primary school children. Determine the equation of the
two-mean line that fits this data.
Time (s) 22.6 21.7 21.7 21.3 19.3 17.6 17.0 14.6 14.0 8.8
Errors 2 3 3 4 5 5 7 7 9 9
Extended-response questions
1 A marketing company wishes to predict the likely number of new clients that each of
its graduates will attract to the business in their first year of employment. It plans to
do this by using the graduates’ scores on a marketing examination in the final year of
their course.
Graduate   Examination score   Number of new clients
1          65                  10
2          72                  10
3          68                  7
4          85                  9
5          74                  8
6          61                  8
7          60                  6
8          78                  10
9          70                  5
10         82                  11
2 To investigate the relationship between marks on an assignment and the final
examination mark, a sample of 10 students was taken. The table below indicates the
marks for the assignment and the final exam mark for each student.
Assignment mark (max = 80)   80 77 71 78 65 80 68 64 50 66
Final exam mark (max = 90)   83 83 79 75 68 84 71 69 66 58
a Which is the independent variable and which is the dependent variable?
b Construct a scatterplot of this data.
c Describe the relationship between the assignment mark and the final examination
mark.
d Determine the value of the q-correlation coefficient for this data, and classify the
strength of the relationship.
e Use your answer to part d to comment on the statement: ‘Good final exam marks
are the result of good assignment marks.’
f Determine the equation for the two-mean line and write it down in terms of the
variables final exam mark and assignment mark.
g Use your equation to predict the final examination mark for a student who scored
50 on the assignment.
h In making this prediction, are you interpolating or extrapolating?
3 A marketing firm wanted to investigate the relationship between airplay and CD
sales (in the following week) of newly released CDs. The following data was
collected on a random sample of 10 CDs.
Number of times played   47   34   40   34   33   50   28   53   25   46
Weekly sales             3950 2500 3700 2800 2900 3750 2300 4400 2200 3400
4 The following table gives the gold-medal winning distance, in metres, for the men’s
long jump for the Olympic games for the years 1896 to 2004. (Some years are
missing owing to the two world wars.)
Year 1896 1900 1904 1908 1912 1920 1924 1928 1932 1936 1948 1952 1956
Distance (m) 6.35 7.19 7.34 7.49 7.59 7.16 7.44 7.75 7.65 8.05 7.82 7.57 7.82
Year 1960 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004
Distance (m) 8.13 8.08 8.92 8.26 8.36 8.53 8.53 8.72 8.67 8.50 8.55 8.59
a Which is the independent variable and which is the dependent variable?
b Construct a scatterplot of these data.
c Describe the association between the distance and year.
d Determine the value of the q-correlation coefficient for these data, and classify the