Regression MS

Regression [157 marks]

1. [Maximum mark: 5] SPM.2.SL.TZ0.5


The following table shows the marks scored by seven students on two
different mathematics tests.

Let L1 be the regression line of x on y. The equation of the line L1 can be written in
the form x = ay + b.

(a) Find the value of a and the value of b. [2]

Markscheme

a = 1.29 and b = −10.4 A1A1

[2 marks]

(b) Let L2 be the regression line of y on x. The lines L1 and L2 pass
through the same point with coordinates (p, q).

Find the value of p and the value of q. [3]

Markscheme

recognising both lines pass through the mean point (M1)

p = 28.7, q = 30.3 A2

[3 marks]

2. [Maximum mark: 7] SPM.2.AHL.TZ0.4


The following table shows the marks scored by seven students on two
different mathematics tests.

Let L1 be the regression line of x on y. The equation of the line L1 can be written in
the form x = ay + b.

(a) Find the value of a and the value of b. [2]

Markscheme

a = 1.29 and b = −10.4 A1A1

[2 marks]

Let L2 be the regression line of y on x. The lines L1 and L2 pass through the same
point with coordinates (p, q).

(b) Find the value of p and the value of q. [3]

Markscheme

recognising both lines pass through the mean point (M1)

p = 28.7, q = 30.3 A2

[3 marks]

(c) Jennifer was absent for the first test but scored 29 marks on the
second test. Use an appropriate regression equation to estimate
Jennifer’s mark on the first test. [2]

Markscheme

substitution into their x on y equation (M1)


x = 1.29082(29) − 10.3793

x = 27.1 A1

Note: Accept 27.

[2 marks]

3. [Maximum mark: 15] EXM.2.SL.TZ0.1


The principal of a high school is concerned about the effect social media use might
be having on the self-esteem of her students. She decides to survey a random
sample of 9 students to gather some data. She wants the number of students in
each grade in the sample to be, as far as possible, in the same proportion as the
number of students in each grade in the school.

(a) State the name for this type of sampling technique. [1]

Markscheme

Stratified sampling A1

[1 mark]

The number of students in each grade in the school is shown in the table.

(b.i) Show that 3 students will be selected from grade 12. [3]

Markscheme
There are 260 students in total A1

84/260 × 9 = 2.91 M1A1

So 3 students will be selected. AG

[3 marks]

(b.ii) Calculate the number of students in each grade in the sample. [2]

Markscheme

grade 9 = 60/260 × 9 ≈ 2, grade 10 = 83/260 × 9 ≈ 3, grade 11 = 33/260 × 9 ≈ 1 A2

[2 marks]
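
The proportional allocation in parts (b.i) and (b.ii) can be checked with a short sketch. The grade sizes are those implied by the markscheme fractions (60, 83, 33 and 84 out of 260). The function name `allocate` is illustrative, and rounding each share to the nearest integer is an assumption: it happens to total 9 here, but it need not do so for other data.

```python
def allocate(strata, sample_size):
    """Proportional (stratified) allocation, each share rounded to the nearest integer."""
    total = sum(strata.values())
    return {name: round(size / total * sample_size) for name, size in strata.items()}

# Grade sizes taken from the fractions in the markscheme.
grades = {"grade 9": 60, "grade 10": 83, "grade 11": 33, "grade 12": 84}
sample = allocate(grades, 9)
print(sample)
```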

In order to select the 3 students from grade 12, the principal lists their names in
alphabetical order and selects the 28th, 56th and 84th student on the list.

(c) State the name for this type of sampling technique. [1]

Markscheme

Systematic sampling A1

[1 mark]
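
As a rough illustration of the systematic selection described above, the list positions can be generated from a fixed interval. The function name is hypothetical, and starting at the interval itself (rather than at a random position within the first interval) is an assumption that matches the positions stated in the question.

```python
def systematic_positions(population_size, sample_size):
    """1-based list positions taken at a fixed interval, ending at the last name."""
    step = population_size // sample_size
    return list(range(step, population_size + 1, step))

# 84 grade 12 students, 3 to be selected.
positions = systematic_positions(84, 3)
print(positions)
```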

Once the principal has obtained the names of the 9 students in the random sample,
she surveys each student to find out how long they used social media the previous
day and measures their self-esteem using the Rosenberg scale. The Rosenberg scale
is a number between 10 and 40, where a high number represents high self-esteem.
(d.i) Calculate Pearson’s product moment correlation coefficient, r. [2]

Markscheme

r = −0.901 A2

[2 marks]

(d.ii) Interpret the meaning of the value of r in the context of the
principal’s concerns. [1]

Markscheme

The negative value of r indicates that more time spent on social media leads
to lower self-esteem, supporting the principal’s concerns. R1

[1 mark]

(d.iii) Explain why the value of r makes it appropriate to find the
equation of a regression line. [1]

Markscheme

r being close to −1 indicates there is strong correlation, so a regression line is
appropriate. R1

[1 mark]

(e) Another student at the school, Jasmine, has a self-esteem value of 29.

By finding the equation of an appropriate regression line, estimate
the time Jasmine spent on social media the previous day. [4]

Markscheme

Find the regression line of t on s. M1

t = −0.281s + 9.74 A1

t = (−0.2807…)(29) + 9.739… = 1.60 hours M1A1

[4 marks]

4. [Maximum mark: 7] 23M.2.SL.TZ1.4


The total number of children, y, visiting a park depends on the highest
temperature, T, in degrees Celsius (°C). A park official predicts the total number of
children visiting his park on any given day using the model
y = −0.6T² + 23T + 110, where 10 ≤ T ≤ 35.

(a) Use this model to estimate the number of children in the park on a
day when the highest temperature is 25 °C. [2]

Markscheme

recognising to find y(25) (M1)

y(25) = −0.6 × 25² + 23 × 25 + 110

= 310 (children) A1

[2 marks]

An ice cream vendor investigates the relationship between the total number of
children visiting the park and the number of ice creams sold, x. The following table
shows the data collected on five different days.

Total number of children (y)    81   175   202   346   360
Ice creams sold (x)             15    27    23    35    46

(b) Find an appropriate regression equation that will allow the
vendor to predict the number of ice creams sold on a day when
there are y children in the park. [3]

Markscheme

recognizing x on y is required (M1)

0.0935114… and 7.43053… (A1)

x = 0.0935y + 7.43 A1

[3 marks]

(c) Hence, use your regression equation to predict the number of ice
creams that the vendor sells on a day when the highest
temperature is 25°C. [2]

Markscheme

attempt to substitute their answer to part (a) into their regression equation for
either x or y (M1)

x = 0.0935114… × 310 + 7.43053… (= 36.4190…)

36 (accept 37 or 36.4) A1

Note: Award (M1)A1FT for x = 37 found from using y = 9.39x − 41.5.

Award (M1)A0FT for a correct FT answer that lies outside [15, 46].
[2 marks]
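
Parts (a) to (c) can be reproduced without a GDC. The following is a minimal sketch, assuming the usual least-squares formulas (slope = Sxy/Sxx, with the line passing through the mean point). The helper name `least_squares` and the variable names are illustrative and not part of the question.

```python
def least_squares(u, v):
    """Slope and intercept of the least-squares line v = slope * u + intercept."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    slope = s_uv / s_uu
    return slope, mean_v - slope * mean_u

children = [81, 175, 202, 346, 360]   # y, from the table
ice_creams = [15, 27, 23, 35, 46]     # x, from the table

# x on y: the number of children is the explanatory variable.
slope, intercept = least_squares(children, ice_creams)

def children_model(T):
    """The park official's model from the question, intended for 10 <= T <= 35."""
    return -0.6 * T ** 2 + 23 * T + 110

y_25 = children_model(25)              # part (a)
x_pred = slope * y_25 + intercept      # part (c)
print(round(slope, 4), round(intercept, 2), round(y_25), round(x_pred, 1))
```

Swapping the two arguments of `least_squares` gives the y on x line mentioned in the follow-through note.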

5. [Maximum mark: 5] 22N.2.SL.TZ0.1


The following table shows the Mathematics test scores (x) and the Science test
scores (y) for a group of eight students.

The regression line of y on x for this data can be written in the form y = ax + b.

(a) Find the value of a and the value of b. [2]

Markscheme

1.01206…, 2.45230…

a = 1.01, b = 2.45 (1.01x + 2.45) A1A1

[2 marks]

(b) Write down the value of the Pearson’s product-moment


correlation coefficient, r. [1]

Markscheme

0.981464…

r = 0.981 A1

Note: A common error is to enter the data incorrectly into the GDC, and obtain
the answers a = 1.01700…, b = 2.09814… and r = 0.980888… Some candidates
may write the 3 sf answers, i.e. a = 1.02, b = 2.10 and r = 0.981, or the 2 sf
answers, i.e. a = 1.0, b = 2.1 and r = 0.98. In these cases award A0A0 for part (a)
and A0 for part (b). Even though some values round to an accepted answer,
they come from incorrect working.

[1 mark]

(c) Use the equation of your regression line to predict the Science test
score for a student who has a score of 78 on the Mathematics test.
Express your answer to the nearest integer. [2]

Markscheme

correct substitution of 78 into their regression equation (M1)

81.3930… (81.23 from 3 sf answer)

81 A1

[2 marks]

6. [Maximum mark: 7] 22M.1.SL.TZ1.3


A survey at a swimming pool is given to one adult in each family. The age of
the adult, a years old, and of their eldest child, c years old, are recorded.

The ages of the eldest child are summarized in the following box and whisker
diagram.
(a) Find the largest value of c that would not be considered an outlier. [3]

Markscheme

IQR = 10 − 6 (= 4) (A1)

attempt to find Q3 + 1.5 × IQR (M1)

10 + 6

16 A1

[3 marks]

The regression line of a on c is a = (7/4)c + 20. The regression line of c on a is
c = (1/2)a − 9.

(b.i) One of the adults surveyed is 42 years old. Estimate the age of
their eldest child. [2]

Markscheme

choosing c = (1/2)a − 9 (M1)

(1/2) × 42 − 9

= 12 (years old) A1

[2 marks]

(b.ii) Find the mean age of all the adults surveyed. [2]

Markscheme

attempt to solve system by substitution or elimination (M1)

34 (years old) A1

[2 marks]
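
Both regression lines pass through the mean point, so the mean age of the adults is the a-coordinate of their intersection. A minimal sketch of the substitution, using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

# a = (7/4)c + 20  and  c = (1/2)a - 9, written with exact fractions.
m1, k1 = Fraction(7, 4), Fraction(20)   # a on c
m2, k2 = Fraction(1, 2), Fraction(-9)   # c on a

# Substitute c = m2*a + k2 into a = m1*c + k1 and solve for a.
mean_a = (m1 * k2 + k1) / (1 - m1 * m2)
mean_c = m2 * mean_a + k2
print(mean_a, mean_c)
```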

7. [Maximum mark: 7] 21N.2.AHL.TZ0.1


In Lucy’s music academy, eight students took their piano diploma examination and
achieved scores out of 150. For her records, Lucy decided to record the average
number of hours per week each student reported practising in the weeks prior to
their examination. These results are summarized in the table below.

(a) Find Pearson’s product-moment correlation coefficient, r, for
these data. [2]

Markscheme

use of GDC to give (M1)

r = 0.883529…

r = 0.884 A1

Note: Award the (M1) for any correct value of r, a, b or r² = 0.780624…
seen in part (a) or part (b).

[2 marks]

(b) The relationship between the variables can be modelled by the
regression equation D = ah + b. Write down the value of a
and the value of b. [1]

Markscheme

a = 1.36609…, b = 64.5171…

a = 1.37, b = 64.5 A1

[1 mark]

(c) One of these eight students was disappointed with her result and
wished she had practised more. Based on the given data,
determine how her score could have been expected to alter had
she practised an extra five hours per week. [2]

Markscheme

attempt to find their difference (M1)

5 × 1.36609… OR
1.36609…(h + 5) + 64.5171… − (1.36609…h + 64.5171…)

6.83045…

= 6.83 (6.85 from 1.37)

the student could have expected her score to increase by 7 marks. A1

Note: Accept an increase of 6, 6.83 or 6.85.

[2 marks]

(d) Lucy asserts that the number of hours a student practises has a
direct effect on their final diploma result. Comment on the validity
of Lucy’s assertion. [1]

Markscheme

Lucy is incorrect in suggesting there is a causal relationship.

This might be true, but the data can only indicate a correlation. R1

Note: Accept ‘Lucy is incorrect as correlation does not imply causation’ or
equivalent.

[1 mark]

(e) Lucy suspected that each student had not been practising as much
as they reported. In order to compensate for this, Lucy deducted a
fixed number of hours per week from each of the students’
recorded hours.

State how, if at all, the value of r would be affected. [1]

Markscheme

no effect A1

[1 mark]
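
The answer to part (e) follows because r depends only on deviations from the means, and subtracting a constant shifts every value and the mean by the same amount. The sketch below illustrates this with hypothetical data (NOT the values from Lucy's table, which is not reproduced here) and an arbitrary deduction of 3 hours.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for illustration only.
hours = [5, 8, 10, 12, 15, 18, 20, 25]
scores = [70, 78, 75, 84, 82, 91, 95, 98]

r_reported = pearson_r(hours, scores)
r_adjusted = pearson_r([h - 3 for h in hours], scores)  # deduct a fixed 3 hours
print(abs(r_reported - r_adjusted) < 1e-12)
```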

8. [Maximum mark: 7] 21M.2.SL.TZ1.2


The following table shows the data collected from an experiment.

The data is also represented on the following scatter diagram.

The relationship between x and y can be modelled by the regression line of y on x
with equation y = ax + b, where a, b ∈ ℝ.

(a) Write down the value of a and the value of b. [2]

Markscheme

a = 0.433156…, b = 4.50265…

a = 0.433, b = 4.50 A1A1

[2 marks]
(b) Use this model to predict the value of y when x = 18. [2]

Markscheme

attempt to substitute x = 18 into their equation (M1)

y = 0.433 × 18 + 4.50

= 12.2994…

= 12.3 A1

[2 marks]

(c) Write down the value of x̄ and the value of ȳ. [1]

Markscheme

x̄ = 15, ȳ = 11 A1

[1 mark]

(d) Draw the line of best fit on the scatter diagram. [2]

Markscheme
A1A1

Note: Award marks as follows:

A1 for a straight line going through (15, 11)

A1 for intercepting the y-axis between their b ± 1.5 (when their line is
extended), which includes all the data for 3.3 ≤ x ≤ 25.3.

If the candidate does not use a ruler, award A0A1 where appropriate.

[2 marks]

9. [Maximum mark: 6] 21M.2.SL.TZ2.1


At a café, the waiting time between ordering and receiving a cup of coffee is
dependent upon the number of customers who have already ordered their coffee
and are waiting to receive it.

Sarah, a regular customer, visited the café on five consecutive days. The following
table shows the number of customers, x, ahead of Sarah who have already ordered
and are waiting to receive their coffee and Sarah’s waiting time, y minutes.

The relationship between x and y can be modelled by the regression line of y on x
with equation y = ax + b.

(a.i) Find the value of a and the value of b. [2]

Markscheme

a = 0.805084… and b = 2.88135…

a = 0.805 and b = 2.88 A1A1

[2 marks]

(a.ii) Write down the value of Pearson’s product-moment correlation
coefficient, r. [1]

Markscheme

r = 0.97777…

r = 0.978 A1

[1 mark]

(b) Interpret, in context, the value of a found in part (a)(i). [1]

Markscheme
a represents the (average) increase in waiting time (0.805 mins) per
additional customer (waiting to receive their coffee) R1

[1 mark]

(c) On another day, Sarah visits the café to order a coffee. Seven
customers have already ordered their coffee and are waiting to
receive it.

Use the result from part (a)(i) to estimate Sarah’s waiting time to
receive her coffee. [2]

Markscheme

attempt to substitute x = 7 into their equation (M1)

8.51693…

8.52 (mins) A1

[2 marks]

10. [Maximum mark: 6] 20N.2.SL.TZ0.S_2


Lucy sells hot chocolate drinks at her snack bar and has noticed that she sells
more hot chocolates on cooler days. On six different days, she records the maximum
daily temperature, T , measured in degrees centigrade, and the number of hot
chocolates sold, H . The results are shown in the following table.
The relationship between H and T can be modelled by the regression line
with equation H = aT + b.
(a.i) Find the value of a and of b. [3]

Markscheme

valid approach (M1)

eg correct value for a or b (or for r or r² = 0.962839 seen in (ii))

a = −9.84636, b = 221.592

a = −9.85, b = 222 A1A1 N3

[3 marks]

(a.ii) Write down the correlation coefficient. [1]

Markscheme

−0.981244

r = −0.981 A1 N1

[1 mark]

(b) Using the regression equation, estimate the number of hot
chocolates that Lucy will sell on a day when the maximum
temperature is 12°C. [2]

Markscheme

correct substitution into their equation (A1)

eg −9.85 × 12 + 222

103.435 (103.8 from 3 sf)

103 (hot chocolates) A1 N2

[2 marks]
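
The two values quoted in the markscheme, 103.435 and 103.8, differ only in when the coefficients are rounded. A small sketch of the comparison, using the coefficient values as printed in part (a.i) (so the first result agrees with the markscheme only to the digits shown):

```python
T = 12  # maximum temperature in °C

full = -9.84636 * T + 221.592   # coefficients as printed to 6 sf in part (a.i)
rounded = -9.85 * T + 222       # coefficients rounded to 3 sf before substituting

print(round(full, 3), round(rounded, 1))
print(round(full), round(rounded))
```

Rounding the coefficients first changes the nearest-integer estimate, which is why it is safer to carry the unrounded GDC values through the calculation.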

11. [Maximum mark: 6] 19N.2.SL.TZ0.S_1


The number of messages, M , that six randomly selected teenagers sent during the
month of October is shown in the following table. The table also shows the time, T ,
that they spent talking on their phone during the same month.

The relationship between the variables can be modelled by the regression
equation M = aT + b.

(a) Write down the value of a and of b. [3]

Markscheme

evidence of set up (M1)

eg correct value for a or b (accept r = 0.966856)

4.30161, 163.330

a = 4.30, b = 163 (accept y = 4.30x + 163) A1A1 N3

[3 marks]

(b) Use your regression equation to predict the number of messages
sent by a teenager that spent 154 minutes talking on their phone
in October. [3]

Markscheme
valid approach (M1)

eg 4.30 (154) + 163

eg 825.778 (825.2 from 3 sf values) (A1)

number of messages = 826 (must be an integer) A1 N3

[3 marks]

12. [Maximum mark: 7] 19N.3.AHL.TZ0.Hsp_1


Peter, the Principal of a college, believes that there is an association between the
score in a Mathematics test, X , and the time taken to run 500 m, Y seconds, of his
students. The following paired data are collected.

It can be assumed that (X, Y ) follow a bivariate normal distribution with product
moment correlation coefficient ρ.

(a.i) State suitable hypotheses H0 and H1 to test Peter’s claim, using a
two-tailed test. [1]

Markscheme

H0 : ρ = 0 H1 : ρ ≠ 0 A1

Note: It must be ρ.

[1 mark]

(a.ii) Carry out a suitable test at the 5% significance level. With
reference to the p-value, state your conclusion in the context of
Peter’s claim. [4]

Markscheme

p = 0.649 A2

Note: Accept anything that rounds to 0.65

0.649 > 0.05 R1

hence, we accept H0 and conclude that Peter’s claim is wrong A1

Note: The A mark depends on the R mark and the answer must be given in
context. Follow through the p-value in part (b).

[4 marks]

(b) Peter uses the regression line of y on x as y = 0.248x + 83.0
and calculates that a student with a Mathematics test score of 73
will have a running time of 101 seconds. Comment on the validity
of his calculation. [2]

Markscheme

a statement along the lines of ‘(we have accepted that) the two
variables are independent’ or ‘the two variables are weakly correlated’ R1

a statement along the lines of ‘the use of the regression line is invalid’ or ‘it
would give an inaccurate result’ R1

Note: Award the second R1 only if the first R1 is awarded.

Note: FT the conclusion in (a)(ii). If a candidate concludes that the claim is
correct, mark as follows: (as we have accepted H1) the 2 variables are
dependent and 73 lies in the range of x values R1, hence the use of the
regression line is valid R1.

[2 marks]

13. [Maximum mark: 6] 19M.1.SL.TZ2.T_2


Colorado beetles are a pest, which can cause major damage to potato crops. For a
certain Colorado beetle the amount of oxygen, in millilitres (ml), consumed each
day increases with temperature as shown in the following table.

This information has been used to plot a scatter diagram.

(a) Find the equation of the regression line of y on x. [2]

Markscheme

* This question is from an exam for a previous syllabus, and may contain minor
differences in marking or structure.

y = 15.5x − 80 (A1)(A1) (C2)

Note: Award (A1) for 15.5x; (A1) for −80. Award at most (A1)(A0) if answer is not
an equation. Award (A0)(A1)(ft) for y = −80x + 15.5.

[2 marks]
The mean point has coordinates (20, 230).

(b) Draw the regression line of y on x on the scatter diagram. [2]

Markscheme

(A1)(A1) (C2)

Note: Award (A1) for a straight line using a ruler passing through (20, 230); (A1)
for correct y-intercept. If a ruler has not been used, award at most (A0)(A1).

[2 marks]

(c) In order to estimate the amount of oxygen consumed, this
regression line is considered to be reliable for a temperature x
such that a ≤ x ≤ b.

Write down the value of a and of b. [2]

Markscheme
a = 10 AND b = 30 (A1)(A1) (C2)

Note: Accept [10, 30] or 10 ≤ x ≤ 30.

[2 marks]
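
Part (c) is about interpolation: the line is only trusted inside the range of the recorded temperatures. One way to make that explicit is a sketch like the following, which uses the 3 sf equation from part (a) and the range from part (c). The function name and the choice to raise an error outside the range are illustrative assumptions.

```python
def oxygen_estimate(x, low=10, high=30):
    """Estimate from y = 15.5x - 80, refusing temperatures outside [low, high]."""
    if not low <= x <= high:
        raise ValueError(f"x = {x} is outside the reliable range [{low}, {high}]")
    return 15.5 * x - 80

# The mean temperature 20 gives the mean point's y-value, 230.
print(oxygen_estimate(20))
```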

14. [Maximum mark: 6] 19M.1.SL.TZ2.T_2


Colorado beetles are a pest, which can cause major damage to potato crops. For a
certain Colorado beetle the amount of oxygen, in millilitres (ml), consumed each
day increases with temperature as shown in the following table.

This information has been used to plot a scatter diagram.

(a) Find the equation of the regression line of y on x. [2]

Markscheme
* This question is from an exam for a previous syllabus, and may contain minor
differences in marking or structure.

y = 15.5x − 80 (A1)(A1) (C2)

Note: Award (A1) for 15.5x; (A1) for −80. Award at most (A1)(A0) if answer is not
an equation. Award (A0)(A1)(ft) for y = −80x + 15.5.

[2 marks]

The mean point has coordinates (20, 230).

(b) Draw the regression line of y on x on the scatter diagram. [2]

Markscheme

(A1)(A1) (C2)

Note: Award (A1) for a straight line using a ruler passing through (20, 230); (A1)
for correct y-intercept. If a ruler has not been used, award at most (A0)(A1).
[2 marks]

(c) In order to estimate the amount of oxygen consumed, this
regression line is considered to be reliable for a temperature x
such that a ≤ x ≤ b.

Write down the value of a and of b. [2]

Markscheme

a = 10 AND b = 30 (A1)(A1) (C2)

Note: Accept [10, 30] or 10 ≤ x ≤ 30.

[2 marks]

15. [Maximum mark: 6] 19M.2.SL.TZ1.S_5


A jigsaw puzzle consists of many differently shaped pieces that fit together to form
a picture.

Jill is doing a 1000-piece jigsaw puzzle. She started by sorting the edge pieces from
the interior pieces. Six times she stopped and counted how many of each type she
had found. The following table indicates this information.
Jill models the relationship between these variables using the regression equation
y = ax + b.

(a) Write down the value of a and of b. [3]

Markscheme

* This question is from an exam for a previous syllabus, and may contain minor
differences in marking or structure.

valid approach (M1)

eg correct value for a or b (ignore incorrect labels)

a = 6.92986, b = 8.80769

a = 6.93, b = 8.81 (accept y = 6.93x + 8.81) A1A1 N3

[3 marks]

(b) Use the model to predict how many edge pieces she had found
when she had sorted a total of 750 pieces. [3]

Markscheme

valid approach (M1)

eg 750 = x + y, edge + interior = 750

correct working (A1)

eg 750 − x = 6.9298x + 8.807 , 93.4684

93 (pieces) (accept 94) A1 N3

[3 marks]
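
The working in part (b) combines the regression equation with the constraint x + y = 750, where x is the number of edge pieces and y the number of interior pieces (as the markscheme working implies). A minimal sketch of the rearrangement, using the coefficients as printed in part (a):

```python
a, b = 6.92986, 8.80769   # regression coefficients from part (a)
total = 750

# y = a*x + b and x + y = total, so x * (1 + a) = total - b.
edge = (total - b) / (1 + a)
interior = total - edge
print(round(edge, 2), round(edge))
```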

16. [Maximum mark: 16] 19M.2.SL.TZ1.T_1


A healthy human body temperature is 37.0 °C. Eight people were medically
examined and the difference in their body temperature (°C), from 37.0 °C, was
recorded. Their heartbeat (beats per minute) was also recorded.

(a) Draw a scatter diagram for temperature difference from 37 °C (x)
against heartbeat (y). Use a scale of 2 cm for 0.1 °C on the
horizontal axis, starting with −0.3 °C. Use a scale of 1 cm for 2
heartbeats per minute on the vertical axis, starting with 60 beats
per minute. [4]

Markscheme

* This question is from an exam for a previous syllabus, and may contain minor
differences in marking or structure.
(A4)

Note: Award (A1) for correct scales, axis labels, minimum x = −0.3, and
minimum y = 60. Award (A0) if axes are reversed and follow through for their
points.

Award (A3) for all eight points correctly plotted,
(A2) for six or seven points correctly plotted,
(A1) for four or five points correctly plotted.

Allow a tolerance of half a small square.

If graph paper has not been used, award at most (A1)(A0)(A0)(A0).

If accuracy cannot be determined award (A0)(A0)(A0)(A0).

[4 marks]

(b.i) Write down, for this set of data the mean temperature difference
from 37 °C, x̄. [1]

Markscheme

0.025 (1/40) (A1)

[1 mark]

(b.ii) Write down, for this set of data the mean number of heartbeats per
minute, ȳ. [1]

Markscheme

74 (A1)

[1 mark]

(c) Plot and label the point M(x̄, ȳ) on the scatter diagram. [2]

Markscheme

the point M labelled, correctly plotted on their diagram (A1)(A1)(ft)

Note: Award (A1) for labelled M. Do not accept any other label. Award (A1)(ft)
for their point M correctly plotted. Follow through from part (b).

[2 marks]

(d.i) Use your graphic display calculator to find the Pearson’s product–
moment correlation coefficient, r. [2]
Markscheme

0.807 (0.806797…) (G2)

[2 marks]

(d.ii) Hence describe the correlation between temperature difference
from 37 °C and heartbeat. [2]

Markscheme

(moderately) strong, positive (A1)(ft)(A1)(ft)

Note: Award (A1) for (moderately) strong, (A1) for positive. Follow through
from part (d)(i). If there is no answer to part (d)(i), award at most (A0)(A1).

[2 marks]

(e) Use your graphic display calculator to find the equation of the
regression line y on x. [2]

Markscheme

y = 22.0x + 73.5 (y = 21.9819 … x + 73.4504 …) (G2)

Note: Award (G1) for 22.0x, (G1) for 73.5.

Award a maximum of (G0)(G1) if the answer is not an equation.

[2 marks]

(f) Draw the regression line y on x on the scatter diagram. [2]

Markscheme

their regression line correctly drawn on scatter diagram (A1)(ft)(A1)(ft)

Note: Award (A1)(ft) for a straight line, using a ruler, intercepting their mean
point, and (A1)(ft) for intercepting the y-axis at their 73.5 and the gradient of
the line is positive. If graph paper is not used, award at most (A1)(A0). Follow
through from part (e).

[2 marks]

17. [Maximum mark: 6] 19M.2.SL.TZ2.S_1


A group of 7 adult men wanted to see if there was a relationship between their
Body Mass Index (BMI) and their waist size. Their waist sizes, in centimetres, were
recorded and their BMI calculated. The following table shows the results.

The relationship between x and y can be modelled by the regression equation
y = ax + b.

(a.i) Write down the value of a and of b. [3]

Markscheme

valid approach (M1)

eg correct value for a or b (or for correct r or r² = 0.955631 seen in (ii))

0.141120, 11.1424

a = 0.141, b = 11.1 A1A1 N3

[3 marks]

(a.ii) Find the correlation coefficient. [1]


Markscheme

0.977563

r = 0.978 A1 N1

[1 mark]

(b) Use the regression equation to estimate the BMI of an adult man
whose waist size is 95 cm. [2]

Markscheme

correct substitution into their regression equation (A1)

eg 0.141(95) + 11.1

24.5488

24.5 A1 N2

[2 marks]

18. [Maximum mark: 14] 18N.2.SL.TZ0.T_1


The marks obtained by nine Mathematical Studies SL students in their projects (x)
and their final IB examination scores (y) were recorded. These data were used to
determine whether the project mark is a good predictor of the examination score.
The results are shown in the table.

(a.i) Use your graphic display calculator to write down x̄, the mean
project mark. [1]
Markscheme

14 (G1)

[1 mark]

(a.ii) Use your graphic display calculator to write down ȳ, the mean
examination score. [1]

Markscheme

54 (G1)

[1 mark]

(a.iii) Use your graphic display calculator to write down r, Pearson’s
product–moment correlation coefficient. [2]

Markscheme

0.5 (G2)

[2 marks]

The equation of the regression line y on x is y = mx + c.

(b.i) Find the exact value of m and of c for these data. [2]

Markscheme
m = 0.875, c = 41.75 (m = 7/8, c = 167/4) (A1)(A1)

Note: Award (A1) for 0.875 seen. Award (A1) for 41.75 seen. If 41.75 is rounded to
41.8 do not award (A1).

[2 marks]

(b.ii) Show that the point M (x̄, ȳ) lies on the regression line y on x. [2]

Markscheme

y = 0.875(14) + 41.75 (M1)

Note: Award (M1) for their correct substitution into their regression line. Follow
through from parts (a)(i) and (b)(i).

= 54

and so the mean point lies on the regression line (A1)

(accept 54 is ȳ, the mean value of the y data)

Note: Do not award (A1) unless the conclusion is explicitly stated and the 54
seen. The (A1) can be awarded only if their conclusion is consistent with their
equation and it lies on the line.

The use of 41.8 as their c value precludes awarding (A1).

OR

54 = 0.875(14) + 41.75 (M1)

54 = 54
Note: Award (M1) for their correct substitution into their regression line. Follow
through from parts (a)(i) and (b)(i).

and so the mean point lies on the regression line (A1)

Note: Do not award (A1) unless the conclusion is explicitly stated. Follow
through from part (a).

The use of 41.8 as their c value precludes the awarding of (A1).

[2 marks]
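
The check in part (b.ii) is exact when the values from part (b.i) are kept as fractions, and it fails once c is rounded to 41.8, which is the reason the notes above rule that value out. A minimal sketch (the variable names are illustrative):

```python
from fractions import Fraction

m, c = Fraction(7, 8), Fraction(167, 4)   # exact values from part (b)(i)
x_bar, y_bar = 14, 54                     # means from parts (a)(i) and (a)(ii)

on_line = m * x_bar + c == y_bar

# With c rounded to 41.8 the substitution no longer gives 54 exactly.
with_rounded_c = m * x_bar + Fraction("41.8")
print(m * x_bar + c, on_line, float(with_rounded_c))
```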

A tenth student, Jerome, obtained a project mark of 17.

(c.i) Use the regression line y on x to estimate Jerome’s examination
score. [2]

Markscheme

y = 0.875(17) + 41.75 (M1)

Note: Award (M1) for correct substitution into their regression line.

= 56.6 (56.625) (A1)(ft)(G2)

Note: Follow through from part (b)(i).

[2 marks]

(c.ii) Justify whether it is valid to use the regression line y on x to
estimate Jerome’s examination score. [2]

Markscheme

the estimate is valid (A1)

since this is interpolation and the correlation coefficient is large enough (R1)

OR

the estimate is not valid (A1)

since the correlation coefficient is not large enough (R1)

Note: Do not award (A1)(R0). The (R1) may be awarded for reasoning based on
strength of correlation, but do not accept “correlation coefficient is not strong
enough” or “correlation is not large enough”.

Award (A0)(R0) for this method if no numerical answer to part (a)(iii) is seen.

[2 marks]

(d) In his final IB examination Jerome scored 65.

Calculate the percentage error in Jerome’s estimated examination
score. [2]

Markscheme

|(56.6 − 65)/65| × 100 (M1)

Note: Award (M1) for correct substitution into percentage error formula.
Follow through from part (c)(i).

= 12.9 (%)(12.9230…) (A1)(ft)(G2)

Note: Follow through from part (c)(i). Condone use of percentage symbol.
Award (G0) for an answer of −12.9 with no working.
[2 marks]
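
The percentage error in part (d) can be checked with a short sketch. It assumes the usual definition, |estimate − exact| / |exact| × 100, which is why an answer of −12.9 is not accepted; the function name is illustrative.

```python
def percentage_error(estimate, exact):
    """Percentage error as |estimate - exact| / |exact| * 100."""
    return abs(estimate - exact) / abs(exact) * 100

# Estimated score 56.6 from part (c)(i), actual score 65.
error = percentage_error(56.6, 65)
print(round(error, 1))
```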

19. [Maximum mark: 6] 18N.2.SL.TZ0.S_2


The following table shows the hand lengths and the heights of five athletes on a
sports team.

The relationship between x and y can be modelled by the regression line with
equation y = ax + b.

(a.i) Find the value of a and of b. [3]

Markscheme

evidence of set up (M1)

eg correct value for a or b or r (seen in (ii)) or r² (= 0.973)

9.91044, −31.3194

a = 9.91, b = −31.3, y = 9.91x − 31.3 A1A1 N3

[3 marks]

(a.ii) Write down the correlation coefficient. [1]

Markscheme

0.986417
r = 0.986 A1 N1

[1 mark]

(b) Another athlete on this sports team has a hand length of 21.5 cm.
Use the regression equation to estimate the height of this athlete. [2]

Markscheme

substituting x = 21.5 into their equation (M1)

eg 9.91(21.5) − 31.3

181.755

182 (cm) A1 N2

[2 marks]

20. [Maximum mark: 6] 18M.1.SL.TZ1.T_4


A scientist measures the concentration of dissolved oxygen, in milligrams per litre
(y) , in a river. She takes 10 readings at different temperatures, measured in degrees
Celsius (x).

The results are shown in the table.

It is believed that the concentration of dissolved oxygen in the river varies linearly
with the temperature.
(a.i) For these data, find Pearson’s product-moment correlation
coefficient, r. [2]

Markscheme

−0.974 (−0.973745…) (A2)

Note: Award (A1) for an answer of 0.974 (minus sign omitted). Award (A1) for an
answer of −0.973 (incorrect rounding).

[2 marks]

(a.ii) For these data, find the equation of the regression line y on x. [2]

Markscheme

y = −0.365x + 17.9 (y = −0.365032…x + 17.9418…) (A1)(A1) (C4)

Note: Award (A1) for −0.365x, (A1) for 17.9. Award at most (A1)(A0) if not an
equation or if the values are reversed (eg y = 17.9x −0.365).

[2 marks]

(b) Using the equation of the regression line, estimate the
concentration of dissolved oxygen in the river when the
temperature is 18 °C. [2]

Markscheme

y = −0.365032… × 18 + 17.9418… (M1)

Note: Award (M1) for correctly substituting 18 into their part (a)(ii).

= 11.4 (11.3712…) (A1)(ft) (C2)

Note: Follow through from part (a)(ii).


[2 marks]

21. [Maximum mark: 6] 18M.1.SL.TZ2.T_1


The following scatter diagram shows the scores obtained by seven students in their
mathematics test, m, and their physics test, p.

The mean point, M, for these data is (40, 16).

(a) Plot and label the point M(m̄, p̄) on the scatter diagram. [2]

Markscheme

* This question is from an exam for a previous syllabus, and may contain minor
differences in marking or structure.
(A1)(A1) (C2)

Note: Award (A1) for mean point plotted and (A1) for labelled M.

[2 marks]

(b) Draw the line of best fit, by eye, on the scatter diagram. [2]

Markscheme

straight line through their mean point crossing the p-axis at 5±2 (A1)(ft)(A1)(ft)
(C2)

Note: Award (A1)(ft) for a straight line through their mean point. Award (A1)(ft)
for a correct p-intercept if line is extended.

[2 marks]

(c) Using your line of best fit, estimate the physics test score for a
student with a score of 20 in their mathematics test. [2]

Markscheme
point on line where m = 20 identified and an attempt to identify y-coordinate
(M1)

10.5 (A1)(ft) (C2)

Note: Follow through from their line in part (b).

[2 marks]

© International Baccalaureate Organization, 2024
