Chapter 4

This document provides an introduction to analyzing bivariate or two-variable data using scatterplots. It discusses how to construct a scatterplot by plotting points representing each pair of values from two variables, with the dependent variable on the y-axis and independent variable on the x-axis. Examples are given of interpreting relationships from scatterplots and using a graphics calculator to easily construct scatterplots from list data. Guidelines are provided for fitting lines to scatterplots to study the relationship between the two variables.


P1: FXS/ABE P2: FXS

0521672600Xc04.xml CUAU034-EVANS September 15, 2008 17:27


CHAPTER 4
Bivariate data

What is a scatterplot, how is it constructed and what does it tell us?
What is the q-correlation coefficient, how is it calculated and what does it tell us?
How do we fit a straight line to a scatterplot by eye?
How do we fit a straight line to a scatterplot using the two-mean method?
How do we interpret the intercept and slope of a line fitted to a scatterplot?
How do we use a line fitted to a scatterplot to make predictions?
What is the difference between interpolation and extrapolation?

In Chapter 1, ‘Univariate data’, you learned about the statistical methods we use to analyse
data recorded about a single variable, such as a person’s weight. In this chapter, you will learn
about the statistical methods used to analyse data recorded about two related variables, such as
a person’s weight and height. Such data is called bivariate data (two-variable data).
When we analyse bivariate data, we are interested in how the two variables relate to each
other. We try to answer questions such as: ‘Is there a relationship between these two
variables?’ and ‘Does knowing the value of one of the variables tell us anything about the
value of the second variable?’
For example, let us take as our two variables the mark a student obtained on a test and the
amount of time they spent studying for that test. Since the amount of time spent studying may
affect the mark obtained, we distinguish between the two variables by calling the time spent
studying the independent variable (IV) and the mark obtained the dependent variable (DV).

4.1 Displaying bivariate data


Scatterplots
The first step in investigating the relationship between two numerical variables is to construct a
scatterplot.
We will illustrate the process by constructing a scatterplot to display the marks students
obtained on an examination (the DV) against the times they spent studying for the examination
(the IV).

Cambridge University Press • Uncorrected Sample Pages • 978-0-521-74049-4
2008 © Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard

Student         1   2   3   4   5   6   7   8   9  10
Time (hours)    4  36  23  19   1  11  18  13  18   8
Mark (%)       41  87  67  62  23  52  61  43  65  52

In a scatterplot, each point represents a single case, in this instance a student.


When constructing a scatterplot, it is conventional to use the vertical or y-axis for the
dependent variable (DV) and the horizontal or x-axis for the independent variable (IV). This
will become very important when we come to fitting lines to scatterplots later in the chapter.
The horizontal or x-coordinate of the point represents the time spent studying (the IV).
The vertical or y-coordinate represents the mark obtained (the DV).
The scatterplot below shows the point for Student 1, who studied 4 hours for the examination
and obtained a mark of 41.

[Scatterplot: Mark (%) on the vertical axis against Time (hours) on the horizontal axis, showing the single point for Student 1 at (4, 41)]

The scatterplot is completed by plotting the points for each remaining student, as shown
below.
[Scatterplot: Mark (%) against Time (hours) for all 10 students]
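
The pairing of values into points can also be sketched in code. The short Python example below is an illustration only, not part of the calculator instructions in this chapter; the names `time_hours`, `mark_percent` and `make_points` are assumptions chosen for this sketch. It builds one (x, y) point per student from the table of study times and marks.

```python
# Data from the table of study times and examination marks.
time_hours = [4, 36, 23, 19, 1, 11, 18, 13, 18, 8]        # independent variable (x)
mark_percent = [41, 87, 67, 62, 23, 52, 61, 43, 65, 52]   # dependent variable (y)


def make_points(xs, ys):
    """Pair each x-value with its y-value, giving one (x, y) point per case."""
    if len(xs) != len(ys):
        raise ValueError("each case needs both an x-value and a y-value")
    return list(zip(xs, ys))


points = make_points(time_hours, mark_percent)

# List the point that would be plotted for each student.
for student, (x, y) in enumerate(points, start=1):
    print(f"Student {student}: ({x}, {y})")
```

The first point produced is (4, 41), the point for Student 1.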


For example, in the scatterplot below, the advertised prices of 12 second-hand cars are
plotted against the cars’ ages (in years).
In this relationship, the car’s price is clearly the dependent variable (DV) as it depends on
its age, so price is plotted on the vertical axis. Age, the independent variable (IV), is
plotted on the horizontal axis.
[Scatterplot: Price ($’000) against Age (years) for 12 second-hand cars]

Using a graphics calculator to construct a scatterplot

While you need to understand the principles of constructing a scatterplot, and maybe to
construct one by hand for a few points, in practice you will use a graphics calculator to
complete this task.

How to construct a scatterplot using the TI-Nspire CAS

The data below give the marks that students obtained on an examination and the times
they spent studying for the examination.

Time (hours)    4  36  23  19   1  11  18  13  18   8
Mark (%)       41  87  67  62  23  52  61  43  65  52

Use a graphics calculator to construct a scatterplot. Treat time as the independent
(x) variable.
Steps
1 Start a new document (by pressing / + N )
and select 3:Add Lists & Spreadsheet.
Enter the data into lists named time and mark.

2 Statistical graphing is done through the Data &
Statistics application.
Press and select 5:Data & Statistics.
Note: A random display of dots will appear – this is to
indicate list data are available for plotting. It is not a
statistical plot.


3 To construct a scatterplot
a Move the cursor to the textbox area below the
horizontal (or x-) axis. Press when
prompted and select the variable time (i.e. the
independent variable). Press enter to paste the
variable onto that axis.
b Move the cursor towards the centre of the
vertical (or y-) axis until a textbox appears.
Press when prompted to select the variable
mark.
c Finally, press enter to paste the variable mark
onto that axis and generate the required
scatterplot, which is shown opposite. The plot
is automatically scaled.

How to construct a scatterplot using the ClassPad

The data below give the marks that students obtained on an examination and the
times they spent studying for the examination.

Time (hours)    4  36  23  19   1  11  18  13  18   8
Mark (%)       41  87  67  62  23  52  61  43  65  52

Use a graphics calculator to construct a scatterplot. Treat time as the independent
(x) variable.

Steps
1 Open the Statistics application and
enter the coordinate values into
lists named time and mark, as
shown.
2 Tap from the toolbar to open
the Set StatGraphs dialog box.


Complete the dialog box as given below.
For Draw: select On
For Type: select Scatter ( )
For XList: select main \ time ( )
For YList: select main \ mark ( )
Leave Freq: as 1
Leave Mark: as square
Tap h to confirm your selections.
3 Tapping from the toolbar at
the top of the screen
automatically plots a scaled
graph in the lower half of the
screen.
Tapping the icon will give a
full-screen sized graph. Tap
again to return to a half-screen.
4 Tapping from the toolbar
places a marker on the first data
point (xc = 4, yc = 41).
Use the horizontal cursor arrow
( ) to move from point to
point.

Exercise 4A

1  Height, x    190  183  176  178  185  165  185  163
   Weight, y     77   73   70   65   65   65   74   54
The table above shows the heights and weights of eight people. Use a graphics calculator to
construct a scatterplot with height as the IV (i.e. x variable).

2  Wife’s age       26  29  27  21  23  31  27  20  22  17  22
   Husband’s age    29  43  33  22  27  36  26  25  26  21  24
The table above shows the ages at marriage of 11 couples. Use a graphics calculator to
construct a scatterplot with wife’s age as the independent variable.


3  Number of seats    405  296  288  258  240  193  188  148
   Airspeed (km/h)    830  797  774  736  757  765  760  718
The table above shows the numbers of seats and airspeeds of eight passenger aircraft. Use a
graphics calculator to construct a scatterplot with number of seats as the independent
variable.

4  Drug dosage (mg)       0.5  1.2  4.0  5.3  2.6  3.7  5.1  1.7  0.3  0.6
   Response time (min)     65   35   15   10   22   16   10   18   70   50
The table above shows the response times of 10 patients given a pain relief drug, and the
drug dosages. Use a graphics calculator to construct a scatterplot using drug dosage as the
independent variable.

5  Time (min)            0    5   10   15   20   25
   Number in cinema     87  102  118  123  135  137
The table above shows the numbers of people in a cinema at 5-minute intervals after the
advertisements started. Use a graphics calculator to construct an appropriate scatterplot.

4.2 How to interpret a scatterplot


What features do we look for in a scatterplot that will help us to identify and describe any
relationships present?

Presence of a relationship
First we look to see if there is a clear pattern in the scatterplot.
In the example below, there is no clear pattern in the points. The points are randomly
scattered across the plot, so we conclude that there is no relationship.
[Scatterplot: y against x, with the points randomly scattered]

For the three examples below, there is a clear (but different) pattern in each set of points, so
we conclude that there is a relationship in each case.
[Three scatterplots of y against x, each showing a different clear pattern]

Having found a clear pattern, there are two main things we look for in the pattern of points:
direction and outliers (if any)
strength of the relationship (amount of scatter).


Direction and outliers
This scatterplot of calf diameter against age of a group of people is just a random scatter
of points. This suggests that there is no relationship between the variables calf
diameter and age for this group of people. However, there is an outlier, the
person with a calf diameter of 22 cm.
[Scatterplot: Calf diameter (cm) against Age (years)]

In contrast, there is a clear pattern in this scatterplot of the mark students obtained in an
exam and the time they spent studying for the exam. The two variables, mark and time, are
related. Furthermore, the points seem to drift upwards from left to right. When this happens,
we say that there is a positive relationship between the variables. People who spend more time
studying tend to get higher marks, and vice versa.
In this scatterplot there are no outliers.
[Scatterplot: Mark (%) against Time (hours)]

Likewise, this scatterplot of the price against age of a number of second-hand cars shows
a clear pattern. The two variables are related. However, in this case the points seem
to drift downwards from left to right. When this happens, we say that there is a negative
relationship between the variables. Older second-hand cars tend to have a lower
price than newer second-hand cars.
In this scatterplot there are no outliers.
[Scatterplot: Price ($’000) against Age (years)]

Strength of a relationship (scatter)


The strength of a relationship is measured by how much scatter there is in a scatterplot.


Strong relationship
When there is a strong relationship between the variables, the points will tend to follow a
single stream. A pattern is clearly seen. There is only a small amount of scatter in the plot.
[Three scatterplots: strong positive relationship, strong positive relationship, strong negative relationship]

Moderate relationship
As the amount of scatter in the plot increases, the pattern becomes less clear. This indicates
that the relationship is less strong. In the examples below, we might say that there is a
moderate relationship between the variables.
[Three scatterplots: moderate positive relationship, moderate positive relationship, moderate negative relationship]

Weak relationship
As the amount of scatter increases further, the pattern becomes even less clear. This indicates
that any relationship between the variables is weak. The scatterplots below are examples of
weak relationships between the variables.
[Three scatterplots: weak positive relationship, weak positive relationship, weak negative relationship]


No relationship
Finally, when all we have is scatter, as seen in the scatterplots below, no pattern can be seen. In
this situation we say that there is no relationship between the variables.
[Three scatterplots, each showing no relationship]

These scatterplots should help you to get a feel for the strength of a relationship, as indicated
by the amount of scatter in a scatterplot. Later in this chapter, you will learn to calculate its
value using the idea of q-correlation. At the moment, you only need to be able to estimate the
strength of a relationship as strong, moderate, weak or none, by comparing it with the standard
scatterplots given above.

Exercise 4B
1 For each of the following pairs of variables, indicate whether you expect a relationship to
exist between the variables and, if so, whether you would expect the variables to be
positively or negatively related.
a Fitness level and amount of daily exercise
b Foot length and height
c Comfort level and temperature above 30°C
d Foot length and intelligence
e Time taken to get to school and distance travelled
f Weight of an ice cube and surrounding temperature

2 For each of the following scatterplots:
i state whether the variables appear to be related and note any possible outliers.
If the variables appear to be related:
ii state whether the relationship is positive or negative.
iii estimate the strength of the relationship as strong, moderate or weak.
a [Scatterplot: Height (cm) against Age (years)]
b [Scatterplot: Business ($’000) against Advertising ($)]
c [Scatterplot: Daughter’s height (cm) against Mother’s height (cm)]
d [Scatterplot: Reaction time (min) against Drug dosage (mg)]
e [Scatterplot: Score on test against Temperature (°C)]
f [Scatterplot: Wife’s age (years) against Husband’s age (years)]

4.3 The q-correlation coefficient

In the previous section you learned how to estimate the strength of a relationship from a
scatterplot by considering the amount of scatter in the plot. In this section, you will learn how
the q-correlation coefficient (q for quadrant) can be used to give a measure of the strength of
the relationship between two variables.
The idea behind the q-correlation coefficient
From our earlier investigation of the relationship between two variables, we found that for:
positive relationships, high values on one variable tend to go with high values for the
other variable, and vice versa
negative relationships, high values on one variable tend to go with low values for the other
variable, and vice versa.

The q-correlation coefficient gives a measure of the tendency for points in a scatterplot to
follow these patterns.

Example 1 Calculating the q-correlation coefficient

Calculate the q-correlation coefficient for the scatterplot shown.
[Scatterplot: 11 points plotted against x- and y-axes, each running from 0 to 10]

Solution
1 Find the median of the x-values. There are 11 points, so the median will be the 6th
point from the left.
2 Draw a vertical dotted line through this point.
3 Find the median of the y-values. There are 11 points, so the median will be the 6th point
up from the bottom of the scatterplot.
4 Draw a horizontal dotted line through this point.
5 The scatterplot has now been divided into four quadrants. Label them A, B, C and D,
proceeding anticlockwise from the top right.
6 Count the number of points in each of the quadrants A, B, C and D.
Call these a, b, c and d respectively.
Any points that lie on the lines are omitted.
The values used in the calculation below are a = 4, b = 1, c = 3 and d = 1.
7 The q-correlation coefficient is given by

$$q = \frac{(a + c) - (b + d)}{a + b + c + d}$$

Substitute the values for a, b, c and d and evaluate.

$$\therefore q = \frac{(4 + 3) - (1 + 1)}{4 + 1 + 3 + 1} = \frac{5}{9}$$

The properties of the q-correlation coefficient are summarised below.

The q-correlation coefficient
The q-correlation coefficient is defined by

$$q = \frac{(a + c) - (b + d)}{a + b + c + d}$$

where a, b, c and d are the numbers of points in the four
quadrants of the scatterplot, labelled A, B, C and D respectively.
Any points that lie on the lines are omitted.
[Diagram: x- and y-axes with the quadrants labelled A (top right), B (top left), C (bottom left) and D (bottom right)]

We can see that the q-correlation can take both positive and negative values.
Suppose that all the points lie in quadrants A and C, as shown. Then b = 0 and
d = 0 and

$$q = \frac{(a + c) - (0 + 0)}{a + 0 + c + 0} = \frac{a + c}{a + c} = 1$$


Suppose all the points lie in quadrants B and D, as shown.
Then a = 0 and c = 0 and

$$q = \frac{(0 + 0) - (b + d)}{0 + b + 0 + d} = \frac{-(b + d)}{b + d} = -1$$

When there is an equal number of points in each quadrant, then a = b = c = d, so
a + c = b + d, and

$$q = \frac{(a + c) - (b + d)}{a + b + c + d} = \frac{0}{4a} = 0$$

Here there is no relationship (q = 0).
Thus we can see that in general:
−1 ≤ q ≤ 1
If there is a positive relationship then most of the points are in A and C and q is positive.
If there is a negative relationship then most of the points are in B and D and q is negative.
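
The formula and its three special cases can be checked with a few lines of code. The Python sketch below is only an illustration; the function name `q_from_counts` is an assumption made for this example and does not come from the text or from any calculator.

```python
def q_from_counts(a, b, c, d):
    """Return q = ((a + c) - (b + d)) / (a + b + c + d) from the quadrant counts.

    a, b, c and d are the numbers of points in quadrants A, B, C and D.
    Points lying on the median lines are assumed to have been left out already.
    """
    total = a + b + c + d
    if total == 0:
        raise ValueError("at least one point must lie off the median lines")
    return ((a + c) - (b + d)) / total


# Counts from the worked calculation in Example 1.
print(q_from_counts(4, 1, 3, 1))

# All points in A and C, all points in B and D, and equal counts in every quadrant.
print(q_from_counts(3, 0, 2, 0), q_from_counts(0, 2, 0, 3), q_from_counts(2, 2, 2, 2))
```

The first call evaluates 5/9, and the three special cases give 1, −1 and 0 respectively, in line with the properties listed above.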

Example 2 Calculating the q-correlation coefficient

Use the scatterplot below to calculate the q-correlation coefficient for reaction time and
drug dosage.
[Scatterplot: Reaction time (min) against Drug dosage (mg), 10 points]

Solution
1 Draw in the median line for both variables on the scatterplot.
Since there are 10 points, the median lines fall between the 5th and 6th points.
2 Count the number of points in each of the quadrants A, B, C and D. Call
these a, b, c and d respectively.
[Scatterplot with both median lines drawn in: a = 1, b = 4, c = 1, d = 4]
3 The q-correlation coefficient is given by

$$q = \frac{(a + c) - (b + d)}{a + b + c + d}$$

Substitute the values for a, b, c and d and evaluate.

$$q = \frac{(1 + 1) - (4 + 4)}{1 + 4 + 1 + 4} = \frac{-6}{10} = -0.6$$
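
The whole procedure (find both medians, count the points in each quadrant, apply the formula) can also be sketched in code. The Python example below is a sketch under stated assumptions, not the textbook's method of working: the function name `q_correlation` is invented for this illustration, and the data used are the drug dosage and response time values from Exercise 4A, question 4, which are assumed here to be the data behind the scatterplot in Example 2.

```python
from statistics import median


def q_correlation(xs, ys):
    """Return (q, (a, b, c, d)) using the median lines to form the quadrants."""
    x_med, y_med = median(xs), median(ys)
    a = b = c = d = 0
    for x, y in zip(xs, ys):
        if x == x_med or y == y_med:
            continue          # points on a median line are omitted
        if x > x_med and y > y_med:
            a += 1            # quadrant A: top right
        elif x < x_med and y > y_med:
            b += 1            # quadrant B: top left
        elif x < x_med and y < y_med:
            c += 1            # quadrant C: bottom left
        else:
            d += 1            # quadrant D: bottom right
    total = a + b + c + d
    if total == 0:
        raise ValueError("every point lies on a median line")
    return ((a + c) - (b + d)) / total, (a, b, c, d)


# Data assumed from Exercise 4A, question 4.
dosage_mg = [0.5, 1.2, 4.0, 5.3, 2.6, 3.7, 5.1, 1.7, 0.3, 0.6]
response_min = [65, 35, 15, 10, 22, 16, 10, 18, 70, 50]

q, counts = q_correlation(dosage_mg, response_min)
print(counts, q)
```

With these data the counts come out as a = 1, b = 4, c = 1 and d = 4, which agrees with the counts read from the scatterplot in Example 2.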

Guidelines for classifying the strength of a linear relationship
using the q-correlation coefficient
Earlier, we used the degree of scatter in a scatterplot to classify the strength of the
relationship observed as weak, moderate or strong. Using the table below, we can
do the same using the q-correlation coefficient.
For example, a q-correlation coefficient of q = 0.86 indicates that there is a strong
positive relationship.
In contrast, a q-correlation coefficient of q = −0.34 indicates that there is a weak
negative relationship.

Strong positive relationship       0.75 ≤ q ≤ 1
Moderate positive relationship     0.5 ≤ q < 0.75
Weak positive relationship         0.25 ≤ q < 0.5
No relationship                    −0.25 < q < 0.25
Weak negative relationship         −0.5 < q ≤ −0.25
Moderate negative relationship     −0.75 < q ≤ −0.5
Strong negative relationship       −1 ≤ q ≤ −0.75
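
The classification table translates directly into a small lookup. The Python sketch below is an illustration only; the function name `classify_q` is an assumption for this example. Because the table is symmetric about zero, the sketch tests the size of q first and then attaches the sign.

```python
def classify_q(q):
    """Classify a q-correlation coefficient using the guideline table."""
    if not -1 <= q <= 1:
        raise ValueError("q must lie between -1 and 1")
    size = abs(q)
    if size < 0.25:
        return "no relationship"
    if size < 0.5:
        strength = "weak"
    elif size < 0.75:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if q > 0 else "negative"
    return f"{strength} {direction} relationship"


print(classify_q(0.86))    # the first example in the text
print(classify_q(-0.34))   # the second example in the text
```

Boundary values follow the inequalities in the table, so q = 0.25 is classed as weak positive and q = −0.75 as strong negative.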

Correlation and causation


The existence of even a strong relationship between two variables is not, in itself, sufficient to
imply that altering one variable causes a change in the other. It only implies that this may be
the explanation. It may be that both the measured variables are affected by a third and different
variable. For example, if data about the variables crime rates and unemployment in a range of
cities were gathered, a high correlation would be found. But could it be inferred that high
unemployment causes high crime rates? The explanation could be that both of these variables
are dependent on other variables, such as home circumstances, peer group pressure, level of
education or economic conditions, all of which may be related to both unemployment and
crime rates. These two variables may vary together, without one being the direct cause of the
other. Correlations must be interpreted with care.


Exercise 4C
1 Use the table of q-correlation coefficients to classify each of the following.
a q = 0.20     b q = −0.30    c q = −0.85    d q = 0.33
e q = 0.95     f q = −0.75    g q = 0.75     h q = −0.24
i q = −1       j q = 0.25     k q = 1        l q = −0.50

2 Calculate the value of the q-correlation coefficient for each of the following scatterplots.
[Scatterplots a to f: each plots y against x, with both axes running from 0 to 10]


3 Calculate the q-correlation coefficient for each pair of variables shown in the following
scatterplots.
a [Scatterplot: Price ($’000) against Age (years)]
b [Scatterplot: Calf diameter (cm) against Age (years)]
c [Scatterplot: Mark (%) against Time (hours)]

4.4 Fitting lines to scatterplots

If the points on the scatterplot tend to lie on a straight line, then we can fit a line to the
scatterplot. The process of fitting a straight line to bivariate data is known as linear
regression. The aim of linear regression is to model the relationship between two numerical
variables by using a simple equation: the equation of a straight line.

In regression, we write the equation of a straight line as
y = a + bx
where:
y is the dependent variable (DV)
x is the independent variable (IV)
a is the y-intercept of the line
b is the slope of the line.

Once we have the equation, we can use it to predict the value of the dependent variable (y) for
different values of the independent variable (x).


Fitting a line by eye

What we want to find is the straight line that ‘best’ fits the data. You met this idea earlier, in the
chapter on linear graphs. There is no one way of finding the line that best fits a set of bivariate
data. There are many ways.
The easiest way to fit a line to bivariate data is to construct a scatterplot and draw the line in
‘by eye’. To do this, place a ruler on the scatterplot in a position that captures the general trend
of the data, and then use the ruler to draw a straight line. This method works best when the
points in the scatterplot are reasonably tightly clustered around a straight line.
Once the line is drawn, we can use the methods you learned in ‘Linear graphs’ (Chapter 3)
to find its equation. The starting point for fitting a line ‘by eye’ is a scatterplot.

Example 3 Fitting a line by eye using the intercept and slope

The scatterplot below plots mark against time spent studying for an examination, for
10 students.
In this plot, mark is the y (or dependent) variable and time is the x (or independent) variable.
Fit a line to the scatterplot by eye and write its equation in terms of:
a x and y
b the variables mark and time.
[Scatterplot: Mark (%) against Time (hours)]

Solution
1 Place a transparent ruler on the scatterplot so that the points in the scatterplot are
reasonably evenly spread around the line made by the edge of the ruler.
2 Draw in the line.
[Scatterplot: Mark (%) against Time (hours) with the fitted line drawn in; rise = 60, run = 35]
3 Find the equation of the line in terms of y and x.
As the y-intercept can be read from the graph, use the intercept–slope form of the equation
of a straight line, y = a + bx.
To calculate the slope, choose two easily read points that are reasonably widely separated.
The points (0, 30) and (35, 90) are suitable.

$$a = y\text{-intercept} = 30$$

$$b = \text{slope} = \frac{\text{rise}}{\text{run}} = \frac{60}{35} = 1.7 \text{ (to 1 d.p.)}$$

Substitute the values for a and b into the equation.
∴ y = 30 + 1.7x
4 Noting that y represents the variable mark and x represents the variable time, rewrite
the equation in terms of mark and time.
∴ mark = 30 + 1.7 × time
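
The arithmetic in this example, and the use of the fitted line for prediction, can be sketched in Python. This is an illustration only: the function names `slope_from_rise_run` and `predict` are assumptions for this sketch, and the study time of 10 hours is a hypothetical value chosen to show a prediction, not one taken from the text.

```python
def slope_from_rise_run(rise, run):
    """Slope of a line from its rise and run."""
    if run == 0:
        raise ValueError("run must be non-zero")
    return rise / run


def predict(a, b, x):
    """Value of y on the line y = a + bx."""
    return a + b * x


a = 30                                       # y-intercept read from the graph
b = round(slope_from_rise_run(60, 35), 1)    # slope to 1 decimal place

# Predicted mark for a hypothetical student who studies for 10 hours.
predicted_mark = predict(a, b, 10)
print(f"mark = {a} + {b} × time")
print(f"Predicted mark for 10 hours of study: {predicted_mark:.0f}")
```

Rounding the slope before predicting mirrors the worked solution, which states the equation with b = 1.7.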

Example 4 Fitting a line by eye using the two-point formula

The scatterplot below plots weight against height for eight people. In this plot, height is the
x (or independent) variable, and weight is the y (or dependent) variable.
Fit a line to the scatterplot by eye and write its equation in terms of:
a x and y
b the variables weight and height.
[Scatterplot: Weight (kg) against Height (cm)]

Solution
1 Place a ruler on the scatterplot so that the points in the scatterplot are reasonably
evenly spread around the line.
2 Draw in the line.
[Scatterplot: Weight (kg) against Height (cm) with the fitted line drawn in]
3 Find the equation of the line in terms of y and x.
As the y-intercept cannot be read from the graph, use the two-point formula,

$$\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$$

or use a graphics calculator.
Choose two easily read points that are reasonably widely separated. The points
(155, 53) and (195, 77) are suitable.
Either substitute these values into the formula and transform or use a graphics
calculator (see page 109).

$$x_1 = 155,\ y_1 = 53;\quad x_2 = 195,\ y_2 = 77$$

$$\frac{y - 53}{x - 155} = \frac{77 - 53}{195 - 155}$$

$$\frac{y - 53}{x - 155} = 0.6$$

$$y - 53 = 0.6(x - 155)$$

$$y - 53 = 0.6x - 93$$

$$\therefore y = -40 + 0.6x$$

4 Noting that y represents the variable weight and x represents the variable
height, rewrite the equation in terms of weight and height.
∴ weight = −40 + 0.6 × height
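
The two-point calculation can be checked with a short piece of code. The Python sketch below is an illustration only; the function name `line_through` is an assumption for this example. It rearranges the two-point formula into the form y = a + bx by computing the slope first and then the intercept.

```python
def line_through(p1, p2):
    """Return (a, b) for the line y = a + bx through the points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x-values")
    b = (y2 - y1) / (x2 - x1)   # slope from the two-point formula
    a = y1 - b * x1             # intercept, from y - y1 = b(x - x1)
    return a, b


# The two points chosen in Example 4.
a, b = line_through((155, 53), (195, 77))
print(f"weight = {a:.0f} + {b:.1f} × height")
```

The intercept and slope agree with the values −40 and 0.6 found by hand.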

Fitting a line using the two-mean method
While fitting a line by eye is quick and easy, it is not a reliable method for finding the equation
of the line that best fits a scatterplot, as everyone is likely to come up with a slightly different
line.
One method for overcoming this problem is to use the two-mean method. The two-mean
method locates the line on the scatterplot by finding the mean of the bottom half and top half
of the data values and draws a line between the two.
To fit a line using the two-mean method requires both the scatterplot and the data values.

Example 5 Fitting a line using the two-mean method

The data below give the marks that students obtained on an examination and the times they
spent studying for the examination.

Time (hours), x    4  36  23  19   1  11  18  13  18   8
Mark (%), y       41  87  67  62  23  52  61  43  65  52
Fit a line to the scatterplot using the two-mean method and write its equation in terms of:
a x and y b the variables mark and time.

Solution
1 Rewrite the data pairs in order, according to the x values.
Time, x 1 4 8 11 13 18 18 19 23 36
Mark, y 23 41 52 52 43 61 65 62 67 87


2 Divide the ordered table into two new tables: one for the lower half of data values, the
other for the top half of data values. Find the mean values of x and y for each new table.
Lower half
Time, x    1   4   8  11  13    $\bar{x}_L = 7.4$
Mark, y   23  41  52  52  43    $\bar{y}_L = 42.2$
Upper half
Time, x   18  18  19  23  36    $\bar{x}_U = 22.8$
Mark, y   61  65  62  67  87    $\bar{y}_U = 68.4$

3 Plot the two mean points (7.4, 42.2) and (22.8, 68.4) on the scatterplot.
4 Draw in the line through the two mean points to plot the two-mean line.
[Scatterplot: mark (%) against time (hours), with the two mean points (7.4, 42.2) and (22.8, 68.4) marked and the two-mean line drawn through them]
5 Use the two mean points (7.4, 42.2) and (22.8, 68.4) to find the equation of the line in terms of y and x. Use either the two-point formula or a graphics calculator (see page 109).
Equation of the two-mean line:
y = 29.6 + 1.7x
6 Rewrite the equation of the two-mean line in terms of the variables mark and time.
∴ mark = 29.6 + 1.7 × time
It is interesting to note that the equation of the two-mean line is very close to the equation we
got by fitting a line by eye. This is often the case when the points in the scatterplot are
reasonably closely scattered around the line. However, for scatterplots where this is not the
case, the two-mean method is a more reliable technique to use than fitting a line by eye.
To find the equation of the two-mean line:

1 Order the data pairs according to the x values and divide into two equal-sized groups: lower and upper. If there is an odd number of data points, discard the middle data point.
2 Find the coordinates of the point $(\bar{x}_L, \bar{y}_L)$, where $\bar{x}_L$ is the mean of the x values in the lower half and $\bar{y}_L$ the mean of the y values in the lower half.
3 Find the coordinates of the point $(\bar{x}_U, \bar{y}_U)$, where $\bar{x}_U$ is the mean of the x values in the upper half and $\bar{y}_U$ the mean of the y values in the upper half.
4 Mark in the two points on the scatterplot. Draw a line through the two points to display the two-mean line.
5 Use the two points $(\bar{x}_L, \bar{y}_L)$ and $(\bar{x}_U, \bar{y}_U)$ to find the equation of the line. This can be done using either the two-point formula or a graphics calculator.
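
The steps above can be sketched in code as a check on hand calculations. This is an illustrative sketch, not a prescribed method: the function name two_mean_line is hypothetical, and pairs with equal x values are ordered by their y values, which is an assumption the text does not address.

```python
from statistics import mean


def two_mean_line(xs, ys):
    """Return (intercept, slope, lower_point, upper_point) of the two-mean line.

    Hypothetical helper following the boxed steps: order by x, split into
    lower and upper halves (dropping the middle pair if n is odd), find the
    mean point of each half, then fit the line through the two mean points.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    pairs = sorted(zip(xs, ys))            # order the data pairs by x value
    half = len(pairs) // 2
    lower = pairs[:half]
    upper = pairs[len(pairs) - half:]      # skips the middle pair when n is odd
    x_l, y_l = mean(x for x, _ in lower), mean(y for _, y in lower)
    x_u, y_u = mean(x for x, _ in upper), mean(y for _, y in upper)
    if x_l == x_u:
        raise ValueError("the two mean points have the same x value")
    slope = (y_u - y_l) / (x_u - x_l)
    intercept = y_l - slope * x_l
    return intercept, slope, (x_l, y_l), (x_u, y_u)


# Data from Example 5
time = [4, 36, 23, 19, 1, 11, 18, 13, 18, 8]
mark = [41, 87, 67, 62, 23, 52, 61, 43, 65, 52]
a, b, lower_point, upper_point = two_mean_line(time, mark)
print(lower_point, upper_point)
print(f"mark = {a:.1f} + {b:.1f} x time")
```

For the Example 5 data, the two mean points and the rounded coefficients should agree with the worked solution.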


Exercise 4D

1 Fit a line by eye to the scatterplot opposite. Write the equation of the line in terms of the variables infant death rate and female literacy rate.
[Scatterplot: infant death rate (per 100 000) against female literacy rate (%)]

2 Fit a line by eye to the scatterplot opposite. Write the equation of the line in terms of the variables height and age.
[Scatterplot: height (cm) against age (months)]

3 Fit a line by eye to the scatterplot opposite. Write the equation of the line in terms of the variables daughter’s height and mother’s height.
[Scatterplot: daughter’s height (cm) against mother’s height (cm)]

4 The data below gives the velocity of a motorbike (in m/s) over a 5-second interval. Also
shown is the scatterplot in which velocity is plotted against time.

Time (s)         0.5   1     1.5   2     3     3.5   4     5
Velocity (m/s)   19.3  20.4  18.6  22.2  22.5  24.3  22.5  25.5

[Scatterplot: velocity (m/s) against time (s)]

Find the equation of the two-mean line for this data. Write the equation in terms of the
variables velocity and time.

5 The data below gives the prices and ages of 12 used cars. Also shown is the scatterplot
constructed from this data.

Age (years)   Price ($)
2             15 800
3             14 300
3             13 800
4             11 800
4             13 000
4             13 300
5             11 000
6             12 200
6             9 500
7             8 300
7             9 700
8             8 000

[Scatterplot: price ($’000) against age (years)]

Find the equation of the two-mean line for this data. Write the equation in terms of the
variables price and age.

6 The data below gives the airspeed and the number of seats in 8 aircraft. Also shown is the
scatterplot constructed from this data.

Number of seats   Airspeed (km/h)
405               830
296               797
288               774
258               736
240               757
193               765
188               760
148               718

[Scatterplot: airspeed (km/h) against number of seats]

Find the equation of the two-mean line for this data. Write the equation in terms of the
variables airspeed and number of seats.

4.5 Using regression lines to make predictions


As we said earlier, the process of fitting a straight line to bivariate data is known as linear
regression. The aim of linear regression is to model the relationship between two numerical
variables by using the equation of a straight line. Once we have this equation, we can use the
equation to make predictions.


For example, in Example 5 we fitted a line to the data relating students’ marks on an
examination to the time they spent studying for the examination. The equation was
mark = 29.6 + 1.7 × time
Using this equation, and rounding off to the nearest whole number, we would predict that a
student who spent:
0 hours studying would obtain a mark of 30% (mark = 29.6 + 1.7 × 0 = 29.6)
8 hours studying would obtain a mark of 43% (mark = 29.6 + 1.7 × 8 = 43.2)
12 hours studying would obtain a mark of 50% (mark = 29.6 + 1.7 × 12 = 50)
30 hours studying would obtain a mark of 81% (mark = 29.6 + 1.7 × 30 = 80.6)
80 hours studying would obtain a mark of 166%! (mark = 29.6 + 1.7 × 80 = 165.6)
This last result points to one of the limitations of regression lines: we are predicting that someone will get more than 100%. When using a regression line to make predictions, we must remember that, strictly speaking, the equation only applies to the range of data values used to determine the equation.
Thus, we are safe using the line to make predictions within this data range. This is called interpolation.
However, we must be extremely careful about how much faith we put into predictions made outside the data range. Making predictions outside the data range is called extrapolation.

Predicting within the range of data is called interpolation.


Predicting outside the range of data is called extrapolation.
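
The distinction can be made explicit when predictions are calculated. The sketch below is illustrative only: the function name predict is hypothetical, and the data range of 1 to 36 hours is taken from the study times listed in Example 5.

```python
def predict(x, intercept, slope, x_min, x_max):
    """Return (predicted y, kind) for the line y = intercept + slope * x.

    kind is "interpolation" when x lies within the range of x values used
    to fit the line, and "extrapolation" otherwise. Hypothetical helper.
    """
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return intercept + slope * x, kind


# Two-mean line from Example 5: mark = 29.6 + 1.7 x time,
# fitted to study times between 1 and 36 hours.
for hours in (0, 8, 12, 30, 80):
    mark, kind = predict(hours, 29.6, 1.7, x_min=1, x_max=36)
    print(f"{hours} hours: predicted mark {mark:.1f}% ({kind})")
```

Note that 0 hours also falls outside the data range, so that prediction is an extrapolation as well, even though its value looks plausible.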
For example, if we use the regression line to predict the examination mark for 30 hours of studying time, we would be interpolating because we would be making a prediction within the data.
However, if we use the regression line to predict the examination mark for 50 hours of studying time, we would be extrapolating because we would be making a prediction outside the data. Extrapolation is a less reliable process than interpolation because we are going beyond the original data, and we don’t know if the relationship is still linear there.
[Scatterplot: mark (%) against time (hours), with the regression line extended beyond the data. Annotations: "Interpolation: line is used to make prediction within the data range." and "Extrapolation: line is used to make prediction outside the data range."]

Exercise 4E
1 Complete the following sentences. Using a regression line to make a prediction:
a within the range of data that was used to derive the equation is called ________.
b outside the range of data that was used to derive the equation is called ________.


2 For children between the ages of 36 and 60 months, the equation relating their height (in cm)
to their age (in months) is:
height = 72 + 0.4 × age
Use this equation to predict the height (to the nearest cm) of a child who is:
a 40 months old. Is this interpolation or extrapolation?
b 55 months old. Is this interpolation or extrapolation?
c 70 months old. Is this interpolation or extrapolation?

3 For shoe sizes between 6 and 12, the equation
relating a person’s weight (in kg) to shoe size is:
weight = 48.1 + 2.2 × shoe size
Use this equation to predict the weight (to
the nearest kg) of a person whose shoe size is:

a 5. Is this interpolation or extrapolation?
b 8. Is this interpolation or extrapolation?
c 11. Is this interpolation or extrapolation?

4 When preparing between 25 and 100 meals, a cafeteria’s cost (in dollars) is given by the
equation:
cost = 175 + 5.8 × number of meals
Use this equation to predict the cost (to the nearest dollar) of preparing:
a no meals. Is this interpolation or extrapolation?
b 60 meals. Is this interpolation or extrapolation?
c 89 meals. Is this interpolation or extrapolation?

5 For women of heights from 150 to 180 cm, the equation relating a daughter’s adult height
(in cm) to her mother’s height (in cm) is:
daughter’s height = 18.3 + 0.91 × mother’s height


Use this equation to predict (to the nearest centimetre) the adult height of a woman whose
mother is:
a 168 cm tall. Is this interpolation or extrapolation?
b 196 cm tall. Is this interpolation or extrapolation?
c 155 cm tall. Is this interpolation or extrapolation?


Review
Key ideas and chapter summary

Scatterplot
A scatterplot is used to help identify and describe the relationship between two numerical variables. In a scatterplot, the dependent variable (DV) is plotted on the vertical axis and the independent variable (IV) on the horizontal axis.
[Scatterplot: score on hearing test (DV) against age (IV)]

Identifying relationships between two numerical variables
A random cluster of points (no clear pattern) indicates that the variables are unrelated.
A clear pattern in the scatterplot indicates that the variables are related.

Describing relationships in scatterplots
Relationships are described in terms of:
direction (positive or negative) and outliers
strength (strong, moderate, weak or none).

q-correlation coefficient
The quadrant or q-correlation coefficient is a measure of the strength of the relationship between two numerical variables.
The q-correlation coefficient is defined by
$$q = \frac{(a + c) - (b + d)}{a + b + c + d}$$
where a, b, c and d are the number of points in the four quadrants of the scatterplot labelled A, B, C and D respectively.
Any points that lie on the lines are omitted.
[Diagram: x and y axes with origin O, divided into four quadrants: B (upper left), A (upper right), C (lower left), D (lower right)]
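
The definition can be sketched in code as follows. This is a hedged illustration, not the chapter's own procedure: the function names are hypothetical, and the dividing lines are assumed to be drawn at the median x value and the median y value, since this summary does not restate how they are positioned. The cut-off values in classify_strength follow the strength guidelines given in this summary.

```python
from statistics import median


def q_correlation(xs, ys):
    """Return the q-correlation coefficient for paired data.

    Assumption: the quadrant lines are the median x and median y values.
    """
    x_mid, y_mid = median(xs), median(ys)
    a = b = c = d = 0
    for x, y in zip(xs, ys):
        if x == x_mid or y == y_mid:
            continue                    # points on a dividing line are omitted
        if x > x_mid and y > y_mid:
            a += 1                      # quadrant A: upper right
        elif x < x_mid and y > y_mid:
            b += 1                      # quadrant B: upper left
        elif x < x_mid and y < y_mid:
            c += 1                      # quadrant C: lower left
        else:
            d += 1                      # quadrant D: lower right
    total = a + b + c + d
    if total == 0:
        raise ValueError("every point lies on a dividing line")
    return ((a + c) - (b + d)) / total


def classify_strength(q):
    """Classify q using the strength guidelines in the chapter summary."""
    size = abs(q)
    if size < 0.25:
        return "no relationship"
    direction = "positive" if q > 0 else "negative"
    if size < 0.5:
        strength = "weak"
    elif size < 0.75:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction} relationship"


# Study time (hours) and mark (%) data from Example 5
time = [4, 36, 23, 19, 1, 11, 18, 13, 18, 8]
mark = [41, 87, 67, 62, 23, 52, 61, 43, 65, 52]
q = q_correlation(time, mark)
print(q, classify_strength(q))
```

Because q only counts points in quadrants, it ignores how far each point lies from the dividing lines, which is why it is quick to calculate by hand from a scatterplot.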


q-correlation: strength
The q-correlation coefficient can be used to classify the strength of the relationship between two numerical variables as weak, moderate or strong, using the guidelines shown in the table.

Strong positive relationship      0.75 ≤ q ≤ 1
Moderate positive relationship    0.5 ≤ q < 0.75
Weak positive relationship        0.25 ≤ q < 0.5
No relationship                   −0.25 < q < 0.25
Weak negative relationship        −0.5 < q ≤ −0.25
Moderate negative relationship    −0.75 < q ≤ −0.5
Strong negative relationship      −1 ≤ q ≤ −0.75

Fitting lines to scatterplots: linear regression

A straight line can be used to model the relationship between two numerical variables when the relationship is linear. This is known as linear regression.
The relationship can then be described by a rule of the form
y = a + bx
where y is the dependent variable (DV), x is the independent variable (IV), a is the y-intercept of the line and b is the slope of the line.

Fitting a line by eye
Fitting a line by eye means drawing a line on the scatterplot that captures the general trend of the data. It is most suitable when there is minimal scatter in the scatterplot.

Fitting a line using the two-mean method
The two-mean method positions the line on the scatterplot by ordering the data by x value and finding the mean point of the lower half and of the upper half of the data values. A line is then drawn through the two mean points.

Using a regression line to make predictions
The regression line y = a + bx enables the value of y to be determined for a given value of x.

Interpolation and extrapolation
Predicting within the range of data is called interpolation.
Predicting outside the range of data is called extrapolation.

Skills check
Having completed the current chapter you should be able to:
construct a scatterplot
use a scatterplot to comment on the direction of a relationship (positive or negative) and possible outliers
calculate and interpret the q-correlation coefficient
determine the equation of a line drawn by eye
determine the equation of a two-mean line
use the equation of the line for prediction
distinguish between interpolation and extrapolation when using a line to make a prediction.


Multiple-choice questions

1 For which one of the following pairs of variables would it be appropriate to construct a scatterplot?
A Eye colour (blue, green, brown, other) and hair colour (black, brown, blonde, red, other)
B Score out of 100 on a test for a group of Year 9 students and a group of Year 11 students
C Political party preference (Labor, Liberal, Other) and age in years
D Age in years and blood pressure in mm Hg
E Height in cm and sex (male, female)
2 For the scatterplot shown, the relationship between the variables is best described as:
A weak negative
B strong negative
C no relationship
D weak positive
E strong positive
[Scatterplot: y against x]

3 For the scatterplot shown, the relationship between the variables is best described as:
A weak negative
B strong negative
C no relationship
D weak positive
E strong positive
[Scatterplot: y against x]

4 For the scatterplot shown, the relationship between the variables is best described as:
A weak negative
B strong negative
C no relationship
D weak positive
E strong positive
[Scatterplot: y against x]

5 For the scatterplot shown, the relationship between the variables is best described as:
A weak negative
B strong negative
C no relationship
D weak positive
E strong positive
[Scatterplot: y against x]

6 A q-correlation coefficient of 0.32 would describe a relationship classified as:
A weak positive B moderate positive C strong positive
D close to zero E moderately strong
7 For the scatterplot shown, the q-correlation coefficient is:
A −1
B −0.5
C 0
D 0.5
E 1
[Scatterplot: both axes scaled from 0 to 10]

8 For the scatterplot shown, the q-correlation coefficient is:
A −1
B −0.5
C 0
D 0.5
E 1
[Scatterplot: both axes scaled from 0 to 10]

9 For the scatterplot shown, the q-correlation coefficient is:
A 0.2
B 0.4
C 0.6
D 0.8
E 1.0
[Scatterplot: both axes scaled from 0 to 10]


10 For the scatterplot shown, the line drawn by eye would have an equation closest to:
A velocity = 5 × time
B velocity = 19 + 1 × time
C velocity = 1 + 19 × time
D velocity = 19 + 5 × time
E velocity = 5 + 19 × time
[Scatterplot: velocity (m/s) against time (s)]

11 For the scatterplot shown, the line drawn by eye would have a slope closest to:
A −2000
B −1000
C −200
D 2000
E 1000
[Scatterplot: price ($’000) against age (years)]

The following information relates to Questions 12 and 13
The weekly income and weekly food costs for a group of 10 university students are given in the following table.

Income ($)       150  250  300  300  380  450  600  850  950  1000
Food cost ($)     40   60   70  130  150  260  120  460  200   600

12 The equation of the two-mean line would be found by finding the equation of the line passing through the points:

A (276, 90) and (770, 328) B (300, 70) and (850, 460)
C (90, 276) and (328, 770) D (150, 40) and (1000, 600)
E (276, 84) and (770, 334)
13 The equation of the two-mean line that would enable food cost to be predicted from
weekly income is closest to:
A food cost = 0.48 + 43 × income B food cost = 0.48 − 43 × income
C food cost = −43 + 0.48 × income D food cost = 240 + 1.4 × income
E food cost = 1.4 + 240 × income

The following information relates to Questions 14 and 15


For incomes between $600 and $1200 per week, the equation of a line that relates
weekly expenditure on entertainment (in dollars) to weekly income (in dollars) is given
by:
expenditure = 40 + 0.10 × income
14 The equation predicts that the amount spent on entertainment by a person with an
income of $800 is:
A $40 B $80 C $120 D $160 E $1200
15 The following statements relate to the equation
expenditure = 40 + 0.10 × income
Which statement is not true?
A Expenditure is the dependent variable.
B Income is the independent variable.
C The slope of the line is 0.10.
D The intercept of the line is 40.
E Using the line to predict the expenditure of a person with an income of $1500 per week is called interpolation.

Short-answer questions

1 The following table gives the number of times the ball was inside the 50 m line in an
AFL football game, and the team’s score in that game.

Inside 50 m 64 57 34 61 51 52 53 51 64 55 58 71
Score (points) 90 134 76 92 93 45 120 66 105 108 88 133

a Construct a scatterplot of score against the number of times the ball was
inside 50 m.
b From the scatterplot, describe any relationship between the two variables.
2 Determine the q-correlation coefficient for the scatterplot shown.
[Scatterplot: distance (km) against time (min)]

3 The following scatterplot shows the relationship between height and weight for a group of obese people. A line by eye has been drawn on the scatterplot. Find the equation of the line.
[Scatterplot: weight (kg) against height (cm), with a line fitted by eye]


4 The time taken to complete a task, and the number of errors on the task, were
recorded for a sample of 10 primary school children. Determine the equation of the
two-mean line that fits this data.
Time (s) 22.6 21.7 21.7 21.3 19.3 17.6 17.0 14.6 14.0 8.8
Errors 2 3 3 4 5 5 7 7 9 9

Extended-response questions

1 A marketing company wishes to predict the likely number of new clients that each of
its graduates will attract to the business in their first year of employment. It plans to
do this by using the graduates’ scores on a marketing examination in the final year of
their course.

Graduate   Examination score   Number of new clients
1          65                  10
2          72                  10
3          68                  7
4          85                  9
5          74                  8
6          61                  8
7          60                  6
8          78                  10
9          70                  5
10         82                  11

a Which is the independent variable and which is the dependent variable?
b Construct a scatterplot of this data.
c Describe the relationship between the number of new clients and the examination score.
d Determine the value of the q-correlation coefficient for this data, and classify the
strength of the relationship.
e Determine the equation for the two-mean line and write it down in terms of the
variables number of new clients and examination score.
f Use your equation to predict, to the nearest whole number, the number of new
clients for a graduate who scored 100 on the examination.
g In making this prediction, are you interpolating or extrapolating?

2 To investigate the relationship between marks on an assignment and the final
examination mark, a sample of 10 students was taken. The table below indicates the
marks for the assignment and the final exam mark for each student.

Assignment mark (max = 80)   80  77  71  78  65  80  68  64  50  66
Final exam mark (max = 90)   83  83  79  75  68  84  71  69  66  58

a Which is the independent variable and which is the dependent variable?
b Construct a scatterplot of this data.
c Describe the relationship between the assignment mark and the final examination
mark.

d Determine the value of the q-correlation coefficient for this data, and classify the
strength of the relationship.
e Use your answer to part d to comment on the statement: ‘Good final exam marks
are the result of good assignment marks.’
f Determine the equation for the two-mean line and write it down in terms of the
variables final exam mark and assignment mark.
g Use your equation to predict the final examination mark for a student who scored
50 on the assignment.
h In making this prediction, are you interpolating or extrapolating?
3 A marketing firm wanted to investigate the relationship between airplay and CD
sales (in the following week) of newly released CDs. The following data was
collected on a random sample of 10 CDs.

Number of times played     47    34    40    34    33    50    28    53    25    46
Weekly sales             3950  2500  3700  2800  2900  3750  2300  4400  2200  3400

a Which is the independent variable and which is the dependent variable?
b Construct a scatterplot of this data.
c Describe the association between the number of times the CD was played and
weekly sales.
d Determine the value of the q-correlation coefficient for this data, and classify the
strength of the relationship.
e Determine the equation for the two-mean line and write it down in terms of the
variables number of times played and weekly sales.
f Use your equation to predict the weekly sales for a CD that was played 60 times.
g In making this prediction, are you interpolating or extrapolating?


4 The following table gives the gold-medal winning distance, in metres, for the men’s long jump for the Olympic games for the years 1896 to 2004. (Some years were missing owing to the two world wars.)

Year 1896 1900 1904 1908 1912 1920 1924 1928 1932 1936 1948 1952 1956
Distance (m) 6.35 7.19 7.34 7.49 7.59 7.16 7.44 7.75 7.65 8.05 7.82 7.57 7.82
Year 1960 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004
Distance (m) 8.13 8.08 8.92 8.26 8.36 8.53 8.53 8.72 8.67 8.50 8.55 8.59

a Which is the independent variable and which is the dependent variable?
b Construct a scatterplot of these data.
c Describe the association between the distance and year.
d Determine the value of the q-correlation coefficient for these data, and classify the strength of the relationship.
e Determine the equation for the two-mean line and write it down in terms of the variables distance and year.
f Use your equation to predict the winning distance in the year 2008.
g How reliable is the prediction made in part f?