PP Stats From Past Papers-1

The document contains a series of statistical problems and experiments related to plant heights, correlation coefficients, ogives, and regression analysis. It includes tasks such as estimating heights, calculating correlations, drawing box-and-whisker plots, and analyzing cumulative frequency graphs. The problems are structured in sections, each focusing on different statistical concepts and applications.

Uploaded by

Favour Emeruh
Statistics

Section A
Question 1

The heights of several plants (in cm) were measured at a certain stage after planting, and
the following data were recorded:

x = days after planting    14   20    8   15   18   11   14
h = height (cm)             6   11    3    8   10    4

The record of the last height has been lost, but we do know that the regression line had
equation h = 0,72x – 3,31.

1.1 Estimate, to the nearest centimetre, what the last recorded height was. (2)

1.2 Calculate the correlation coefficient for the data relating to the first 6 plants
(i.e. ignoring the last column). (2)

1.3 Some time later another plant’s height 25 days after planting was found to
be 20 cm. Comment on how surprising (or not) this is in the light of your
previous results. (2)
[6]
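
As a hedged check on Question 1, the sketch below evaluates the given regression line at x = 14 and x = 25 and computes Pearson's correlation coefficient for the first six plants in plain Python. The names `pearson_r` and `predict_height` are illustrative assumptions, not part of the paper.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def predict_height(x):
    """Regression line given in the question: h = 0,72x - 3,31."""
    return 0.72 * x - 3.31

days = [14, 20, 8, 15, 18, 11]      # first six plants only
heights = [6, 11, 3, 8, 10, 4]

last_height = round(predict_height(14))   # 1.1: seventh plant, x = 14
r = pearson_r(days, heights)              # 1.2
predicted_25 = predict_height(25)         # 1.3: compare with the observed 20 cm

print(last_height, round(r, 3), round(predicted_25, 2))
```

For 1.3, note that x = 25 lies outside the observed range of 8 to 20 days, so the line's value there is an extrapolation.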
Question 2

A scientific experiment finds two variables related by the following table:

x     0,2     0,7     0,3     1       1,3     0,95    1,1
y     3,98    0,4     2,5     0,1     0,03    0,125   0,06

To try to find a relationship between them, the scientists plot the graph of x against
log y and look for the line of best fit.

2.1 Complete the table given on the diagram sheet by writing in the values of
log y correct to 3 d.p. (2)

2.2 Find the equation of the line of best fit between x and log y, giving
coefficients correct to the nearest integer. (3)

2.3 Deduce from this equation a new one that describes y explicitly in terms
of x. (2)

2.4 Hence determine the value of y that corresponds to x = 0,5 (1)


[8]
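
The steps in Question 2 can be sketched in Python as follows, assuming "log" means the base-10 logarithm and that log y is regressed on x. The helper `least_squares` and the other names are hypothetical, introduced only for illustration.

```python
from math import log10

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

x = [0.2, 0.7, 0.3, 1, 1.3, 0.95, 1.1]
y = [3.98, 0.4, 2.5, 0.1, 0.03, 0.125, 0.06]

log_y = [round(log10(v), 3) for v in y]     # 2.1: log y to 3 d.p.
a, b = least_squares(x, log_y)              # 2.2: line log y = a + b*x
a_int, b_int = round(a), round(b)           # coefficients to the nearest integer

def y_from_x(value):
    """2.3: log y = a + b*x implies y = 10**(a + b*x)."""
    return 10 ** (a_int + b_int * value)

print(log_y, a_int, b_int, y_from_x(0.5))   # 2.4: y at x = 0,5
```

Rounding the coefficients before exponentiating mirrors the "nearest integer" instruction in 2.2; using the unrounded fit would give a slightly different curve.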
Question 3

Given the stem-and-leaf plot below, draw the box-and-whisker plot for
the same data on the scale provided on the diagram sheet:

0 | 3
1 | 4 7
2 | 3 3 4 6
3 | 2 2 3 4 5 5
4 | 1 2 2 3 6 6
5 | 0 3 4 4 5
[6]
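
A box-and-whisker plot needs the five-number summary. The sketch below derives it from the stem-and-leaf plot, assuming each stem is a tens digit (so "1 4" means 14) and using the median-of-halves convention for quartiles; other quartile conventions can give slightly different values. The function name `five_number_summary` is an illustrative choice.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]      # values below the median position
    upper = data[-half:]     # values above the median position
    return min(data), median(lower), median(data), median(upper), max(data)

# Stem-and-leaf rows expanded as stem * 10 + leaf (assumed key)
plot = {0: [3], 1: [4, 7], 2: [3, 3, 4, 6], 3: [2, 2, 3, 4, 5, 5],
        4: [1, 2, 2, 3, 6, 6], 5: [0, 3, 4, 4, 5]}
values = [stem * 10 + leaf for stem, leaves in plot.items() for leaf in leaves]

summary = five_number_summary(values)
print(summary)
```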

Question 4

The diagram shows three ogives, which all correspond to the same number of observations,
with the same median in each case. List the ogives in order of
increasing IQR, using their letters to identify the ogives. [3]

[Figure: OGIVES. Three ogives labelled A, B and C; vertical axis from 0 to 35, horizontal axis from 1 to 10.]
Section B

1.1 Below is a cumulative frequency graph representing the 2015 mid-year
matric mathematics marks at a certain school. The maximum possible mark
is 100%. There were 67 scholars who wrote mathematics, and the mean
result was 63%.

[Figure: cumulative frequency graph (ogive); horizontal axis: Mid-year Mathematics Marks (%).]

1.1.1 Draw a box-and-whisker plot associated with the cumulative
frequency graph. Use the horizontal axis given in the answer
book to align your box-and-whisker plot and indicate on the
ogive where the necessary readings were taken. (4)

1.1.2 Given that the standard deviation is 18,6%, determine how
many scholars achieved a mark within one standard deviation
of the mean. (2)

1.1.3 The ogive groups the data into class intervals of width 10
(20 to 30; 30 to 40; et cetera). Which of these intervals
is the modal group? (1)

1.1.4 How many scholars achieved above 80% on their mid-year
report? (2)
1.2 Below is a scatter plot indicating the height (in feet) of the basket of a hot air
balloon above the ground during its first 4 minutes of flight.
[Note: The line given is NOT the best-fit (regression) line.]

[Figure: scatter plot with Daniel's line; vertical axis: Height above the ground (in feet); horizontal axis: Time (in minutes).]

1.2.1 Daniel has placed an estimated line-of-best-fit on the scatter
plot. Would the correct line-of-best-fit be steeper or less steep
than Daniel’s line? (1)

1.2.2 What is the real world meaning of the slope of the regression
line in this context? (2)

Given that the data used for the scatter plot above is as indicated in the
table below, answer the questions that follow:

Time (in mins)       0   0,5     1   1,5     2   2,5     3   3,5     4
Height (in feet)     5    10    30    58    70    88    92   107   112

1.2.3 Determine the equation of the regression line. (3)


1.2.4 Determine the correlation coefficient, and briefly describe the
nature of the correlation between the time and height of the flight. (3)
1.2.5 Estimate the height of the basket above the ground after 5 minutes. (2)
[20]
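
Questions 1.2.3 to 1.2.5 ask for what a calculator's regression mode returns. A minimal plain-Python sketch of the same calculation is given below; `fit_line` and the variable names are assumptions made for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / sqrt(sxx * syy)

time_min = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4]
height_ft = [5, 10, 30, 58, 70, 88, 92, 107, 112]

a, b, r = fit_line(time_min, height_ft)   # 1.2.3 and 1.2.4
height_at_5 = a + b * 5                   # 1.2.5: extrapolation beyond 4 minutes

print(round(a, 2), round(b, 2), round(r, 3), round(height_at_5, 1))
```

The slope `b` is in feet per minute, which is the quantity 1.2.2 asks you to interpret.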
Section C

1.

The histogram above shows the ages of staff in a school.

1.1 Use the histogram to complete the cumulative frequency table below. (2)

Age             Frequency    Cumulative Frequency
25 < A ≤ 30     2            2
30 < A ≤ 35     8            10
35 < A ≤ 40
40 < A ≤ 45
45 < A ≤ 50
50 < A ≤ 55
55 < A ≤ 60
60 < A ≤ 65     6

1.2 Draw a cumulative frequency graph on the set of axes provided to represent
the data in the table. (3)

1.3 Use your cumulative frequency graph to find an estimate for the median age. (2)

1.4 Use your cumulative frequency graph to find an estimate for the percentage of
teachers older than 50 years. (2)

1.5 Use your cumulative frequency graph to draw a box and whisker diagram
for the given data. Use the number line provided. (3)

1.6 Comment on the skewness of the data. (1)


2. In the table below, the scores in beam and floor events of 13 gymnasts who
participated at the Rio 2016 Games are given. A typical score under today's rules
ranges from 13 to 16 points.

A scatterplot diagram is also given to show the correlation between the two events.

Beam (x)    Floor (y)
13.666      14.733
13.066      14.133
13.2        13.233
13.6        13.766
14.366      13.9
13.7        14.3
14.866      15.433
13.2        13.833
13.866      13.933
13.9        14.133
13.8        14.233
13.8        14.075
14.666      14.9

2.1 Use your calculator to determine the equation of the least squares regression
line y = A + B x. Give your answers correct to 4 decimal places. (3)

2.2 Calculate the value of r, the correlation coefficient for the data, correct to
4 decimal places. (2)

2.3 Discuss the correlation between the two sets of data. (2)

2.4 Use the least squares regression line found in 2.1 to predict what a gymnast
is likely to score for the floor event if she were to obtain a score of 14,5 for the
beam event. (2)
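
The calculator steps in 2.1, 2.2 and 2.4 can be mirrored in plain Python, as in the hedged sketch below. It regresses floor (y) on beam (x); the helper `fit_line` and the variable names are illustrative assumptions.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (A, B, r) for the least squares line y = A + B*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / sqrt(sxx * syy)

beam = [13.666, 13.066, 13.2, 13.6, 14.366, 13.7, 14.866,
        13.2, 13.866, 13.9, 13.8, 13.8, 14.666]
floor = [14.733, 14.133, 13.233, 13.766, 13.9, 14.3, 15.433,
         13.833, 13.933, 14.133, 14.233, 14.075, 14.9]

A, B, r = fit_line(beam, floor)          # 2.1 and 2.2
floor_at_14_5 = A + B * 14.5             # 2.4: predicted floor score

print(round(A, 4), round(B, 4), round(r, 4), round(floor_at_14_5, 3))
```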
Section D

2.
The given data values reflect the masses (in kg) of 20 athletes in the school team:

40 47 52 53 55 57 57 58 60 x

63 64 64 65 66 67 67 68 69 73

(a) Determine the value of x if the median of the data is 61,5 kg. (2)

(b) Draw a box-and-whisker diagram of the given data, indicating the necessary values
clearly. Use a scale of 2 cm = 5 kg. (5)

(c) A value in the data set is considered to be an outlier if that value is either:
less than Q1 − (1,5 × IQR) or greater than Q3 + (1,5 × IQR).

Determine whether 40 kg is an outlier of this data set. (3)

(d) If 4 kg were added to each data value in the given set, what effect would it have on
the resulting
(1) mean and (2) standard deviation? (2)
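
Parts (a) and (c) can be checked with the short sketch below. It assumes the data are listed in ascending order, so x is the 10th value, and uses the median-of-halves convention for the quartiles; the variable names are illustrative.

```python
from statistics import median

known = [40, 47, 52, 53, 55, 57, 57, 58, 60,
         63, 64, 64, 65, 66, 67, 67, 68, 69, 73]

# (a) With 20 sorted values the median is the mean of the 10th and 11th,
# so (x + 63) / 2 = 61,5.
x = 2 * 61.5 - 63
masses = sorted(known + [x])

half = len(masses) // 2
q1, q3 = median(masses[:half]), median(masses[half:])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr      # (c) outlier rule from the question
upper_fence = q3 + 1.5 * iqr
is_outlier_40 = 40 < lower_fence

print(x, q1, q3, lower_fence, upper_fence, is_outlier_40)
```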

7.
An athlete’s ability to take in and use oxygen effectively is called his/her VO2-max.

Twelve athletes with pre-recorded VO2-max readings ran for one hour. The distances (in
kilometres) that they each covered are represented in the table:
VO2-max reading      20  55  30  25  40  30  50  40  35  30  50  40
Distance run (km)     8  18  13  10  11  12  16  14  13   9  15  12

(a) Draw a scatter plot of the data. (Place VO2-max on the horizontal axis) (4)

(b) Use the correlation coefficient to describe the correlation between the two sets of
data. (2)

(c) Determine the equation of the least squares line of best fit and draw it on the graph. (5)

(d) Use the method of interpolation to predict the distance run if an athlete has a
VO2-max value of 26. (2)
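
For parts (b) to (d), a hedged plain-Python sketch of the correlation coefficient, the least squares line and the prediction at a VO2-max of 26 follows. The helper `fit_line` and the variable names are assumptions for illustration; part (d) could equally be read off the drawn line.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / sqrt(sxx * syy)

vo2_max = [20, 55, 30, 25, 40, 30, 50, 40, 35, 30, 50, 40]
distance_km = [8, 18, 13, 10, 11, 12, 16, 14, 13, 9, 15, 12]

a, b, r = fit_line(vo2_max, distance_km)   # (b) and (c)
distance_at_26 = a + b * 26                # (d): 26 lies inside the observed range

print(round(a, 3), round(b, 3), round(r, 3), round(distance_at_26, 2))
```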
Section E

1.

Mr Jacobs owns a laundromat and charges his clients by the weight of their laundry. He kept
a record of the weights (in kg) brought in by 22 clients.

0,5 0,8 1,5 1,9 2,1 2,8 2,8 3,3 3,4 3,4 3,6

4,0 4,5 5,4 6,2 6,4 6,5 7,7 7,8 9,2 9,8 13,0

1.1 Determine the mean weight of the laundry. (2)

1.2 Calculate the standard deviation of the data. (2)

1.3 How many clients have laundry weights that are within one standard deviation of the
mean? (2)

1.4 He calibrates his scale and discovers that the scale was recording a weight 0,5 kg
greater than the actual weight. Determine the actual mean and standard deviation.
(2)
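
Questions 1.1 to 1.4 can be checked with Python's `statistics` module, as sketched below. The sketch assumes the population standard deviation (`pstdev`) is intended; if the sample standard deviation is required, `stdev` would be used instead. Variable names are illustrative.

```python
from statistics import mean, pstdev

weights = [0.5, 0.8, 1.5, 1.9, 2.1, 2.8, 2.8, 3.3, 3.4, 3.4, 3.6,
           4.0, 4.5, 5.4, 6.2, 6.4, 6.5, 7.7, 7.8, 9.2, 9.8, 13.0]

avg = mean(weights)      # 1.1
sd = pstdev(weights)     # 1.2: population standard deviation (assumed)
within = sum(1 for w in weights if avg - sd <= w <= avg + sd)   # 1.3

# 1.4: every reading was 0,5 kg too high, so shift all values down by 0,5.
actual = [w - 0.5 for w in weights]
actual_avg, actual_sd = mean(actual), pstdev(actual)

print(round(avg, 2), round(sd, 2), within, round(actual_avg, 2), round(actual_sd, 2))
```

Shifting every value by the same constant moves the mean by that constant and leaves the standard deviation unchanged, which is what the last two printed values illustrate.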

2.

Further to his investigations, Mr Jacobs collects data for the whole week, and prepares an
ogive (cumulative frequency curve) to summarise his data.
[Figure: ogive; vertical axis: Cumulative Frequency; horizontal axis: Amount of Laundry (kg).]

2.1 Use the table in the ANSWER BOOK and complete the table below. (3)

Weight (in kg)    Cumulative Frequency    Frequency
0 ≤ x < 2
2 ≤ x < 4
4 ≤ x < 6
6 ≤ x < 8
8 ≤ x < 10
10 ≤ x < 12
12 ≤ x < 14
2.2 Use the frequency data to determine an approximate mean weight of laundry for the
week. (3)

2.3 The data set has a median laundry weight of 5,6 kg. Use the ogive to estimate the
lower quartile and upper quartile. Show evidence on the ogive.
(2)

2.4 A client brings in a load of 13,6 kg. Determine (by calculation) whether this weight is
an outlier. (3)

2.5 Mr Jacobs uses a heavy duty machine to wash loads greater than 11 kg. Determine
the number of clients who will have their laundry washed in the heavy duty machine.
(2)

Section F
2. The diagram below shows a cumulative frequency curve (ogive) which summarizes the
amount of time spent by 50 people answering their emails on a particular day:

Use the ogive to answer the following questions:

1.1. What was the median time spent answering emails? (1)
Indicate with the letter A where you read this value from on the ogive.

1.2. What was the inter-quartile range (IQR)? (3)
Indicate with the letters B and C where you read the values from
on the ogive.

1.3. How many people spent between 15 and 20 minutes answering emails? (2)

1.4. If 80% of the people spent more than y minutes answering emails, what
is the value of y? (2)
1.5. Using the 5-number summary, draw a box-and-whisker plot for the data
on the scale below: (3)

0 5 10 15 20 25 30 35 40

1.6. Describe the skewness of the data in the box-and-whisker plot. (1)
Is the data skewed to the left, skewed to the right or symmetrical?

3. A person’s level of fitness is often determined by the time t (in minutes) that it takes
for his or her pulse rate P (in beats per minute) to return to normal after strenuous
exercise. The assumption is that the greater a person’s level of fitness, the less time
that it takes for their pulse rate to return to normal.

Olorato’s pulse rate was measured for several minutes after she had been running
on a steep tread-mill. Her results were as follows:

Time (t), minutes 1,0 2,0 3,0 4,0 5,0 6,0 7,0

Pulse rate (P), beats/min 130 115 105 96 88 82 72

3.1 Determine the equation of the least squares regression line for P in
terms of t in the form P̂ = A + Bt. (Round off your values for A and B to
3 decimal places.) (3)

3.2 Determine the correlation coefficient, r, for the set of data, and describe
the relationship between P and t that this value represents: (3)

3.3 Owami used the equation in 3.1 to calculate that Olorato’s pulse rate after
8 minutes would have been 62 beats per minute. Comment on the
reliability of using this calculation method. (2)
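
A hedged plain-Python sketch of 3.1 to 3.3 is given below: it fits the line for P in terms of t, computes r, and reproduces the 8-minute value Owami obtained. The helper `fit_line` and the variable names are assumptions for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (A, B, r) for the least squares line y = A + B*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / sqrt(sxx * syy)

t = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]       # minutes
pulse = [130, 115, 105, 96, 88, 82, 72]       # beats/min

A, B, r = fit_line(t, pulse)                  # 3.1 and 3.2
pulse_at_8 = A + B * 8                        # 3.3: t = 8 is outside the data range

print(round(A, 3), round(B, 3), round(r, 3), round(pulse_at_8))
```

Because t = 8 lies beyond the last measurement at 7 minutes, the value is an extrapolation, which is relevant to the reliability comment asked for in 3.3.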
