CHAPTER 11 Univariate data analysis
11.1 Overview
11.1.1 Introduction
A boxplot is a method of graphically representing groups
of numerical data through their quartiles, where the
spacings between the different sections of the box indicate
the degree of spread and skewness in the data. Boxplots
are also used to determine and show outliers. The boxplot
gives a snapshot of a number of values, such as interquartile
range, maximum value, minimum value, range and
median of the data.
The boxplot was introduced by the mathematician John
Tukey, who is regarded as one of the most influential statisticians
of the past 50 years. Some of his work in modern
statistics led to concepts that played a central role in the
creation of today’s telecommunication technology. He is
credited with the invention of the computer term ‘bit’.
John Tukey was born in New Bedford, Massachusetts
in 1915. He obtained a BA and an MSc in chemistry from
Brown University in 1937, before moving to Princeton
University where he completed his PhD in mathematics.
He became a professor at 35 and founding chairman of the
Princeton statistics department in 1965. He was awarded
the IEEE Medal of Honor in 1982 for his contributions
to the spectral analysis of random processes and the fast
Fourier transform (FFT) algorithm. He introduced the
boxplot in his book Exploratory Data Analysis in 1977.
LEARNING SEQUENCE
11.1 Overview
11.2 Classifying and displaying data
11.3 Construct, describe and interpret dot plots and stem-and-leaf plots
11.4 Construct, describe and interpret column graphs and histograms
11.5 Measures of centre
11.6 Measures of spread
11.7 Review: exam practice
Fully worked solutions for this chapter are available in the Resources section of your eBookPLUS at
www.jacplus.com.au.
Categorical data
Data that can be organised into groups or categories is known as categorical data. Categorical data is often
an ‘object’, ‘thing’ or ‘idea’, with examples including brand names, colours, general sizes and opinions.
Categorical data can be classified as either ordinal or nominal. Ordinal data is placed into a natural order or
ranking, whereas nominal data is split into subgroups with no particular order or ranking.
For example, if you were collecting data on income in terms of whether it was ‘High’, ‘Medium’ or ‘Low’,
the assumed order would be to place the ‘Medium’ category between the other two, so this is ordinal data.
On the other hand, if you were investigating preferred car colours the order doesn’t really matter, so this is
nominal data.
Numerical data
Data that can be counted or measured is known as numerical data. Numerical data can be classified as either
discrete or continuous. Discrete data is counted in exact values, with the values often being whole numbers,
whereas continuous data can have an infinite number of values, with an additional value always possible
between any two given values.
For example, the housing industry might consider the number of bedrooms in residences offered for sale. In
this case, the data can only be a restricted group of numbers (1, 2, 3, etc.), so this is discrete data. Now consider
meteorological data, such as the maximum daily temperatures over a particular time period. Temperature data
could have an infinite number of decimal places (23°C, 25.6°C, 18.21°C, etc.), so this is continuous data.
418 Jacaranda Maths Quest 11 General Mathematics Units 1 & 2 for Queensland
WORKED EXAMPLE 1
THINK: 1. Identify the type of data.
WRITE: The data collected is the brand or model of cars, so this is categorical data.
THINK: 2. Identify whether the order of the data is relevant.
WRITE: When assessing the types of different cars, the order is not relevant, so this is nominal data.
THINK: 3. State the answer.
WRITE: The data collected is nominal data.
WORKED EXAMPLE 2
THINK: 1. Identify the type of data.
WRITE: The data collected is the number of people in sporting venues, so this is numerical data.
THINK: 2. Does the data have a restricted or infinite set of possible values?
WRITE: The data involves counting people, so only whole number values are possible.
THINK: 3. State the answer.
WRITE: The data collected is discrete data.
Frequency tables
Frequency tables split the collected data into defined categories and
register the frequency of each category in a separate column. A tally
column is often included to help count the frequency.
Favourite colour   Tally   Frequency
Red                ||||    4
Blue               |||     3
Yellow             ||      2
Purple             |       1
[Bar chart: Favourite colour (Red, Blue, Yellow, Purple) on the horizontal axis; Frequency (0 to 4) on the vertical axis]
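The tallying process can also be sketched in code. The example below is a minimal illustration in Python, assuming a hypothetical list of raw responses (`responses`) that is consistent with the favourite colour table; the original survey responses are not given in the text.

```python
from collections import Counter

# Hypothetical raw responses, chosen only so that the counts match the
# favourite colour table (4 Red, 3 Blue, 2 Yellow, 1 Purple).
responses = ["Red", "Blue", "Red", "Yellow", "Blue", "Red",
             "Purple", "Yellow", "Blue", "Red"]

# Counter maps each category to the number of times it occurs.
frequency = Counter(responses)

# Print one row per category: name, tally marks, frequency.
for colour, count in frequency.most_common():
    print(f"{colour:<8}{'|' * count:<6}{count}")
```

Each row printed corresponds to one row of the frequency table, with the tally column built from the count.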
Bar charts
Bar charts display the categories of data on the horizontal axis and the frequency of the data on the vertical
axis. As the categories are distinct, there should be a space between all of the bars in the chart.
The bar chart above displays the previous data about favourite colours.
WORKED EXAMPLE 3
The number of students from a particular school who participate in organised sport on weekends
is shown in the frequency table.
Display the data in a bar chart.
Sport Frequency
Tennis 40
Swimming 30
Cricket 60
Basketball 50
No sport 70
THINK: 1. Display the different categories along the horizontal axis.
WRITE: [Axes: Sport played (Tennis, Swimming, Cricket, Basketball, No sport) on the horizontal axis; Frequency on the vertical axis, marked in intervals of 10]
THINK: 2. Draw bars to represent the frequency of each category, making sure there are spaces between the bars.
WRITE: [Bar chart: Sport played on the horizontal axis; Frequency on the vertical axis, from 0 to 70 in intervals of 10]
The mode
For categorical data, the mode is the category that has the highest frequency. When displaying categorical
data in a bar chart, the modal category is the highest bar.
Identifying the mode allows us to know which category is the most common or most popular, which can
be particularly useful when analysing data.
In some instances there may be either no modal category or more than one modal category. If the data has
no modal category then there is no mode. If it has 2 modal categories it is bimodal, and if it has 3 modal
categories it is trimodal.
WORKED EXAMPLE 4
Thirty students were asked to pick their favourite time of the day between the following categories:
Morning (M), Early afternoon (A), Late afternoon (L), Evening (E)
The following data was collected:
A, E, L, E, M, L, E, A, E, M, E, L, E, A, L, M, E, E, L, M, E, A, E, M, L, L, E, E, A, E.
a. Represent the data in a frequency table.
b. Construct a bar chart to represent the data.
c. Determine which time of day is the most popular.
[Bar chart: time of day (Morning, Early afternoon, Late afternoon, Evening) on the horizontal axis; Frequency on the vertical axis]
THINK: c. 1. The highest bar is the modal category. This is the most popular category. Write the answer.
WRITE: c. Evening is the most popular time of day among the students.
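The counting in this example can be checked with a short script. The sketch below is one possible approach in Python; the helper name `modal_categories` is an assumption introduced here for illustration. It returns every category tied for the highest frequency, so a bimodal data set would return two categories.

```python
from collections import Counter

# The 30 responses from the worked example (M, A, L, E).
data = ("A, E, L, E, M, L, E, A, E, M, E, L, E, A, L, M, E, E, L, M, "
        "E, A, E, M, L, L, E, E, A, E").split(", ")

def modal_categories(values):
    """Return all categories that share the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(c for c, n in counts.items() if n == highest)

counts = Counter(data)          # frequency table
modes = modal_categories(data)  # modal category or categories

print(counts)
print(modes)
```

Deciding that a data set has no mode (for example, when every category has the same frequency) is left to the reader; the function simply reports the ties.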
discrete or continuous.
a. The amount of daily rainfall in Geelong
b. The heights of players in the National Basketball League
c. The number of children in families
d. The type of pet owned by families
4. Identify whether the following categorical or numerical data is nominal, ordinal, discrete or continuous.
a. The times taken for the place getters in the Olympic 100 m sprinting final
b. The number of gold medals won by countries competing at the Olympic Games
c. The type of medals won by a country at the Olympic Games
d. The countries that won at least one gold medal in any Olympic Games
5. WE3 The preferred movie genre of 100 students is shown in the following frequency table.
[Bar chart: Favourite pizza on the horizontal axis; Frequency on the vertical axis, from 0 to 15]
Construct a frequency table to represent the data.
7. A group of students at a university were surveyed about their usual method of travel, with the results
shown in the following table.
8. In a telephone survey people were asked the question, ‘Do you agree that convicted criminals should be
required to serve their full sentence and not receive early parole?’ They were required to respond with
either ‘Yes’, ‘No’ or ‘Don’t care’ and the results are as follows.
Data Type
Example: The types of meat displayed in a butcher shop. Categorical Nominal
a. Wines rated as high, medium or low quality
b. The number of downloads from a website
c. Electricity usage over a three-month period
d. The volume of petrol sold by a petrol station per day
10. WE4 Twenty-five students were asked to pick their favourite type of animal to keep as a pet. The
[Bar chart: Type of coffee on the horizontal axis; Frequency on the vertical axis, from 0 to 20]
a. Determine the modal category of the coffees sold.
b. How many coffees were sold in that hour?
13. The results of an opinion survey are displayed in the following bar chart.
[Bar chart: Opinion on the horizontal axis; Frequency on the vertical axis, from 0 to 45]
a. What type of data is being displayed?
b. Explain what is wrong with the current data display.
c. Redraw the bar chart displaying the data correctly.
14. Exam results for a group of students are shown in the following table.
Number of bedrooms
City 2 3 4 5
Adelaide 8 12 5 4
Brisbane 15 11 8 6
Canberra 8 12 9 2
Hobart 3 9 5 1
Melbourne 16 18 12 11
Sydney 23 19 15 9
Perth 7 9 12 3
Use the given information to construct a bar graph that represents the number of bedrooms of properties
sold in the capital cities during this time period.
16. The maximum daily temperatures (°C) in Adelaide during a 15-day period in February are listed in the
following table.
Day 1 2 3 4 5 6 7 8
Temp(°C) 31 32 40 42 32 34 41 29
Day 9 10 11 12 13 14 15
Temp(°C) 25 33 34 24 22 24 30
Category Frequency
Fruit 6
Vegetables 8
Frozen goods 5
Packaged goods 11
Toiletries 3
Other 7
Birthplace Frequency
Australia 128
United Kingdom 14
India 10
China 9
Ireland 6
Other 33
[Bar chart: Age (years) on the horizontal axis; Frequency on the vertical axis, from 0 to 20]
a. Construct a frequency table to represent the data.
b. Determine the modal category.
c. The age groups are changed to ‘Under 20’, ‘20–39’, ‘40–59’ and ‘60+’. Redraw the bar chart with
these new categories.
d. Does this change the modal category?
20. Data for the main area of education and study for a selected group of people aged 15 to 64 during a
particular year in Australia is shown in the following table.
a. Construct separate bar charts for each area of education and study to represent the data.
b. Construct separate bar charts for each age group to represent the data.
Note that the data has been presented in neat vertical columns, making it easy to read.
Always remember to include a key with your stem plot to indicate what the stem and the leaf represents
when put together.
CHAPTER 11 Univariate data analysis 429
Back-to-back stem plots
As we will see later in this chapter back-to-back stem plots can be used to compare two different sets of
data. Back-to-back stem plots share the same stem, with one data set appearing on the left of the stem and
the other data set appearing on the right.
Key: 1 | 4 = 14
     1* | 7 = 17
   Leaf | Stem | Leaf
    3 2 | 1    | 4
9 8 6 6 | 1*   | 7 7 9
4 2 1 0 | 2    | 2 3 4 4
    7 6 | 2*   | 6 8
  4 2 1 | 3    | 0 1 3
    6 5 | 3*   | 5 6 9
        | 4    | 0
WORKED EXAMPLE 5
The following data set (of 31 values) shows the maximum daily temperature during the month of
January in a particular area.
26, 22, 24, 26, 28, 28, 27, 42, 25, 25, 29, 31, 23, 33, 34, 27, 39, 44, 35, 34, 27, 30, 36, 30, 30, 28, 33,
23, 24, 34, 37
Construct a stem plot to represent the data.
THINK: 1. Identify the place values for the data. If there are 4 or less different place values, split each into two.
WRITE: The temperature data has values in the 20s, 30s and 40s.
Stem | Leaf
2    |
2*   |
3    |
3*   |
4    |
THINK: 2. Write the units for each stem place value in numerical order, with the smallest values closest to the stem. Make sure to keep consecutive numbers level as they move away from the stem. Remember to add a key to your plot.
WRITE: Key: 2 | 2 = 22°C
Stem | Leaf
2    | 2 3 3 4 4
2*   | 5 5 6 6 7 7 7 8 8 8 9
3    | 0 0 0 1 3 3 4 4 4
3*   | 5 6 7 9
4    | 2 4
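The same split stem plot can be produced programmatically. The following is a rough sketch in Python, assuming two-digit whole-number data; the function name `split_stem_plot` is hypothetical and introduced only for this illustration. Leaves 0 to 4 go on the plain stem and leaves 5 to 9 on the starred stem.

```python
from collections import defaultdict

# Maximum daily temperatures from the worked example (31 values).
temps = [26, 22, 24, 26, 28, 28, 27, 42, 25, 25, 29, 31, 23, 33, 34, 27,
         39, 44, 35, 34, 27, 30, 36, 30, 30, 28, 33, 23, 24, 34, 37]

def split_stem_plot(values):
    """Return {stem label: sorted leaves}, splitting each stem at leaf 5."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)          # tens digit, units digit
        label = f"{stem}*" if leaf >= 5 else str(stem)
        rows[label].append(leaf)
    return dict(rows)

plot = split_stem_plot(temps)

print("Key: 2 | 2 = 22°C")
for label, leaves in plot.items():
    print(f"{label:<3}| {' '.join(map(str, leaves))}")
```

Stems with no data values are simply omitted by this sketch, which matches the plot above because no temperatures fall between 45 and 49.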
WORKED EXAMPLE 6
The frequency table shows the number of floors in apartment buildings in a particular area.
Construct a dot plot to represent the data.
THINK: 1. Draw a horizontal scale using the discrete data values shown.
WRITE: The discrete data values are given by the number of floors.
[Horizontal scale: Number of floors, marked 2, 3, 4, 5, 6, 7]
b. The number of hours per week spent checking emails by a group of workers at a particular company.
6. Construct dot plots to represent the following collections of data.
a. The scores per round of a golfer over a particular time period (40 values):
73, 77, 74, 77, 73, 74, 72, 75, 72, 76, 77, 75, 74, 73, 75, 77, 78, 73, 77, 74,
72, 77, 73, 70, 72, 75, 73, 70, 77, 75, 77, 76, 70, 73, 75, 76, 78, 77, 74, 75
b. The scores out of 10 in a multiple choice test for a group of students (30 values):
6, 7, 4, 7, 3, 7, 7, 5, 7, 6, 7, 5, 1, 3, 5, 7, 8, 3, 7, 4, 9, 5, 4, 6, 7, 9, 10, 5, 7, 4
7. The data below give the time taken (in minutes) for each of 40 runners on a 10 km fun run. Prepare a
stem-and-leaf diagram for the data using a class size of 10 minutes.
36 42 52 38 47 59 72 68 57 82
66 75 45 42 55 38 42 46 48 39
42 58 40 41 47 53 68 43 39 48
71 42 50 46 40 52 37 54 48 52
8. The typing speed (in words per minute) of 30 word processors is recorded below. Prepare a stem-and-leaf
diagram of the data using a class size of 5.
96 102 92 96 95 102 95 115 110 108
88 86 107 111 107 108 103 121 107 96
124 95 98 102 108 112 120 99 121 130
9. Twenty transistors are tested by applying increasing voltage until they are destroyed. The maximum
voltage that each could withstand is recorded below. Prepare a stem-and-leaf plot of the data using a class
size of 0.5.
14.8 15.2 13.8 14.0 14.8 15.7 15.5 15.6 14.7 14.3
14.6 15.2 15.9 15.1 14.3 14.6 13.9 14.7 14.5 14.2
Questions 10 and 11 refer to the following. Each student in a class has been assigned a newly planted
tree to look after, and must provide a weekly report on its growth and condition. From the latest reports, the
teacher recorded the height of each tree (in mm), and entered these in the stem-and-leaf plot shown below.
Key: 12 | 1 = 1210 mm
12* | 5 = 1250 mm
Stem Leaf
12 1 2 4
12* 5 7 7 9 9
13 0 1 1 2 3 4 4
13* 5 6 6 7 9 9
14 0 2 3 4
14* 6 7
13. The total number of games played by the players from two
basketball squads is shown in the following stem plots.
14. Consider the set of data in the stem plot shown.
Key: 0 | 1 = 1
Stem | Leaf
0    | 1
1    | 1 1 1 4 4 6 6 7 8
2    | 3 3 4 4 7 7 9
a. Instead of grouping the data in 10s, the stems could be split in half to use groups of 5. For example,
a split stem plot for the same data set could place the data values from 10 to 14 in a row labelled ‘1’,
while data values from 15 to 19 are put in a row labelled ‘1*’. Use the data from the original stem plot
to complete the split stem plot.
b. Comment on the effect of splitting the stem for the data in this question.
11.4 Construct, describe and interpret column graphs and histograms
11.4.1 Grouped data
Numerical data may be represented as either grouped data or ungrouped data. When assessing ungrouped
data, the analysis we do is exact; however, if we have a large data set, the data can be difficult to work with.
Grouping data allows us to gain a clearer picture of the data’s distribution, and the resultant data is usually
easier to work with.
When grouping data, we try to pick class sizes so that between 5 and 10 classes are formed. Ensure that all
of the classes are distinct and that there are no overlaps between classes.
When creating a frequency table to represent grouped continuous data, we will represent our class intervals
in the form 12 −< 14. This interval covers all values from 12 up to 14 but does not include 14.
WORKED EXAMPLE 7
The following data represents the time (in seconds) it takes for each individual in a group of
20 students to run 100 m.
18.2, 20.1, 15.6, 13.5, 16.7, 15.9, 19.3, 22.5, 18.4, 15.9, 12.4, 14.1, 17.7, 19.4, 21.0, 20.4,
18.2, 15.8, 16.1, 14.6
Group and display the data in a frequency table.
THINK: 1. Identify the smallest and largest values in the data set. This will help you to choose your class size and decide what the first class should be.
WRITE: Smallest value = 12.4
Largest value = 22.5
We will have class intervals of 2, starting with 12−<14.
THINK: 2. Draw a frequency table to represent the data. Complete the tally column in your table, and use this to fill in the frequency column.
WRITE:
Time (seconds)   Tally     Frequency
12−<14           ||        2
14−<16           ||||| |   6
16−<18           |||       3
18−<20           |||||     5
20−<22           |||       3
22−<24           |         1
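Grouping continuous data into class intervals can be sketched in a few lines of code. The example below is a minimal illustration in Python, assuming that every value is at least as large as the starting value of the first class; the function name `group` is hypothetical. Each class includes its lower boundary but not its upper boundary, matching the 12−<14 notation.

```python
import math
from collections import Counter

# 100 m times in seconds from the worked example (20 values).
times = [18.2, 20.1, 15.6, 13.5, 16.7, 15.9, 19.3, 22.5, 18.4, 15.9,
         12.4, 14.1, 17.7, 19.4, 21.0, 20.4, 18.2, 15.8, 16.1, 14.6]

def group(values, start, width):
    """Count values in the classes [start, start + width), and so on."""
    # The class index is how many whole widths a value sits above start.
    counts = Counter(math.floor((v - start) / width) for v in values)
    return {(start + k * width, start + (k + 1) * width): counts[k]
            for k in range(max(counts) + 1)}

table = group(times, start=12, width=2)

for (low, high), freq in table.items():
    print(f"{low}-<{high}: {freq}")
```

Changing `width` is a quick way to see how the choice of class size affects the number of classes formed.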
Sport Frequency
AFL 6
Basketball 2
Cricket 7
Netball 2
Rugby League 3
Rugby Union 1
Soccer 2
Tennis 1
THINK:
1. Draw the horizontal axis showing each sport.
2. Draw a vertical axis to show frequencies up to 7.
3. Draw the columns all the same width with gaps between.
4. Use a ruler.
5. Label the axes.
6. Give the graph a title.
WRITE:
[Column graph titled ‘Favourite sports of 24 students’: Sport (AFL, Basketball, Cricket, Netball, Rugby League, Rugby Union, Soccer, Tennis) on the horizontal axis; Frequency on the vertical axis, from 0 to 7]
Sector graphs
A sector graph (circle graph, or pie graph) is used when we want the graph to display a comparison of
quantities. An angle is drawn at the centre of the circle that is the same fraction of 360° as the fraction of
people making each response.
WORKED EXAMPLE 9
Sport Frequency
AFL 6
Basketball 2
Cricket 7
Netball 2
Rugby League 3
Rugby Union 1
Soccer 2
Tennis 1
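THINK: 1. Calculate each angle as a fraction of 360° by dividing the frequency of each sport by the total frequency and multiplying by 360°.
WRITE:
AFL = $\frac{6}{24} \times 360^\circ = 90^\circ$
Basketball = $\frac{2}{24} \times 360^\circ = 30^\circ$
Cricket = $\frac{7}{24} \times 360^\circ = 105^\circ$
Netball = $\frac{2}{24} \times 360^\circ = 30^\circ$
Rugby League = $\frac{3}{24} \times 360^\circ = 45^\circ$
Rugby Union = $\frac{1}{24} \times 360^\circ = 15^\circ$
Soccer = $\frac{2}{24} \times 360^\circ = 30^\circ$
Tennis = $\frac{1}{24} \times 360^\circ = 15^\circ$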
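The sector angles can be verified with a short calculation. The sketch below, written in Python as an illustration only, applies the rule of frequency divided by total frequency, multiplied by 360°, to each sport in the table.

```python
# Frequencies from the favourite sports table (24 students in total).
frequencies = {"AFL": 6, "Basketball": 2, "Cricket": 7, "Netball": 2,
               "Rugby League": 3, "Rugby Union": 1, "Soccer": 2, "Tennis": 1}

total = sum(frequencies.values())

# Angle for each sector: the same fraction of 360 degrees as the
# category's share of the total frequency.
angles = {sport: f * 360 / total for sport, f in frequencies.items()}

for sport, angle in angles.items():
    print(f"{sport}: {angle:.0f} degrees")
```

A useful check is that the angles add to 360°, which confirms that no category has been missed.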
[Histogram: Data values from 0 to 90 in intervals of 10 on the horizontal axis; Frequency on the vertical axis]
WORKED EXAMPLE 10
The following frequency table represents the heights of players in a basketball squad.
Frequency 1 3 6 3 1 1
THINK: 1. Look at the data range and use the leading values from each interval in the table for the scale of the horizontal axis.
WRITE: The height data in the table has intervals starting from 175 cm and increasing by 5 cm.
[Axes titled ‘Heights of a basketball squad’: Height (cm) from 175 to 205 in intervals of 5 on the horizontal axis; Frequency from 0 to 7 on the vertical axis]
THINK: 2. Draw rectangles for each interval to the height of the frequency indicated by the data in the table.
WRITE: [Histogram titled ‘Heights of a basketball squad’: Height (cm) from 175 to 205 in intervals of 5 on the horizontal axis; Frequency from 0 to 7 on the vertical axis]
40–49
50–59
60–69
70–79
80–89
90–99
6. Construct a column graph to display the data from question 5.
7. Construct a sector graph to compare the number of people in each category from question 5.
8. A class of students was asked to identify the make of car their family owned. Their responses are shown
below.
11. WE10 The following frequency table represents the cholesterol levels measured for a group of people.
12. The following frequency table represents the distances travelled to school by a group of students.
Construct a histogram to represent this data.
13. Organise each of the following data sets into a frequency table using intervals of five, commencing from
the lowest value. Then draw a histogram to represent the data.
5, 7, 14, 17, 13, 24, 22, 15, 12, 26, 17, 15, 14, 13, 15, 7, 8, 13, 17, 24,
22, 7, 13, 20, 12, 15, 23, 20, 17, 15, 17, 16, 20, 23, 15, 16, 8, 17, 14, 15
14. Organise each of the following data sets into a frequency table using intervals of five, commencing from
the lowest value. Then draw a histogram to represent the data.
34, 28, 45, 46, 13, 24, 11, 33, 41, 35, 16, 15, 35, 13, 14, 28, 27, 22, 36, 31,
11, 18, 24, 20, 12, 15, 41, 50, 27, 13, 14, 16, 20, 23, 31, 26, 25, 27, 34, 35
Shape
The shape of a numerical distribution is an important indicator of some of the key measures for further analysis
and is one of the most important reasons for displaying the data in a graphical form. Shape will generally be
described in terms of symmetry or skew. Symmetrical data distributions have higher frequencies around their
centres with a relatively evenly balanced spread to either side, while skewed distributions have the majority of
their values towards one end. Distributions with higher frequencies on the left side of the graph are positively
skewed, while those with higher frequencies on the right side are negatively skewed.
[Figures: three histograms showing a symmetrical distribution, a positively skewed distribution and a negatively skewed distribution]
Modality
The mode of a distribution is the data value or class interval that has the highest frequency. This will be
the column or row on the display that is the longest. When there is more than one mode, the data distribution
is multimodal. This can indicate that there may be subgroups within the distribution that may require further
investigation. Bimodal distributions can occur when there are two distinct groups present, such as in data
values that typically have clear differences between male and female measurements.
[Figure: histogram of a bimodal distribution]
Spread
An awareness of how widely spread the data is can be an important consideration when conducting any further
analysis. Common indicators of spread include the measures of range and the standard deviation. The graph
will again point to which measures might be most appropriate to use.
Outliers
An outlier is a data value that is an anomaly when compared to the majority of the sample. Sometimes outliers
are just unusual readings or measurements, but they can also be the result of errors when recording the data.
Outliers can have a significant effect on some of the measures that are used for further data analysis, and
they are sometimes removed from the sample for those calculations. The graphical display of the data can
alert us to the presence of potential outliers.
WORKED EXAMPLE 11
[Histogram: data values from 1 to 13 on the horizontal axis; frequency from 0 to 5 on the vertical axis]
THINK: 1. Look for the mode and comment on its value.
WRITE: The distribution has one mode with data values most frequently in the 2 ≤ x < 3 interval.
THINK: 2. Identify the presence of any potential outliers.
WRITE: There is one potential outlier in the interval between 12 and 13.
THINK: 3. Describe the shape in terms of symmetry or skewness.
WRITE: If we include the outlier, the data set can be described as positively skewed as it is clustered to the left. If we don’t include the outlier, the distribution can be considered to be approximately symmetrical.
The Greek letter Σ (sigma) indicates calculating the sum of these values.
WORKED EXAMPLE 12
Calculate the mean of the following data set, correct to 2 decimal places.
6, 3, 4, 5, 7, 7, 4, 8, 5, 10, 6, 10, 9, 8, 3, 6, 5, 4
THINK: 1. Calculate the sum of the data values.
WRITE: 6 + 3 + 4 + 5 + 7 + 7 + 4 + 8 + 5 + 10 + 6 + 10 + 9 + 8 + 3 + 6 + 5 + 4 = 110
THINK: 2. Divide the sum by the number of data values.
WRITE: $\bar{x} = \frac{110}{18} = 6.111\ldots$
THINK: 3. State the answer.
WRITE: The mean of the data set is 6.11.
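The steps of this calculation translate directly into code. The following is a minimal sketch in Python, intended only as a check of the arithmetic: sum the values, count them, and divide.

```python
# Data set from the worked example (18 values).
data = [6, 3, 4, 5, 7, 7, 4, 8, 5, 10, 6, 10, 9, 8, 3, 6, 5, 4]

total = sum(data)   # step 1: sum of the data values
n = len(data)       # number of data values
mean = total / n    # step 2: divide the sum by the number of values

print(total, n, round(mean, 2))
```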
If f = the values of the frequencies and x = the values of the midpoints, then $\bar{x} = \frac{\sum xf}{\sum f}$.
WORKED EXAMPLE 13
Calculate the mean of the data set displayed in the following frequency table.
Intervals Frequency
0−<5 3
5−<10 12
10−<15 3
15−<20 2
THINK: 1. Add a column to the table and enter the midpoints for the corresponding intervals.
WRITE:
Intervals   Frequency   Midpoint (x)
0−<5        3           2.5
5−<10       12          7.5
10−<15      3           12.5
15−<20      2           17.5
THINK: 2. Add a column to the table and enter the product of the frequencies and midpoints (xf) for the corresponding intervals.
WRITE:
Intervals   Frequency   Midpoint (x)   xf
0−<5        3           2.5            7.5
5−<10       12          7.5            90
10−<15      3           12.5           37.5
15−<20      2           17.5           35
THINK: 3. Calculate the totals of the f and xf columns.
WRITE:
Intervals   Frequency   Midpoint (x)   xf
0−<5        3           2.5            7.5
5−<10       12          7.5            90
10−<15      3           12.5           37.5
15−<20      2           17.5           35
            Σf = 20                    Σxf = 170
THINK: 4. Calculate the mean using the formula $\bar{x} = \frac{\sum xf}{\sum f}$.
WRITE: $\bar{x} = \frac{\sum xf}{\sum f} = \frac{170}{20} = 8.5$
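The table-based method can be mirrored in code, with one list per column. The sketch below is an illustrative Python version of the same steps; the variable names (`midpoints`, `sum_xf` and so on) are chosen here to match the column headings and are not part of the original text.

```python
# Class intervals and frequencies from the worked example.
intervals = [(0, 5), (5, 10), (10, 15), (15, 20)]
frequencies = [3, 12, 3, 2]

# Step 1: midpoint (x) of each interval.
midpoints = [(low + high) / 2 for low, high in intervals]

# Steps 2 and 3: the xf products and the column totals.
sum_f = sum(frequencies)
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))

# Step 4: mean of the grouped data.
mean = sum_xf / sum_f

print(midpoints, sum_f, sum_xf, mean)
```

Because the midpoints stand in for the actual data values, the result is an estimate of the mean rather than an exact value.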
Median = $\left(\frac{n + 1}{2}\right)$th data value
WORKED EXAMPLE 14
b. 16, 3, 4, 5, 17, 27, 14, 18, 15, 10, 6, 10, 9, 8, 23, 26, 35
THINK WRITE
3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 10
median = 5.5
3, 4, 5, 6, 8, 9, 10, 10, 14, 15, 16, 17, 18, 23, 26, 27, 35
median = 14
3. State the answer. The median of the data set is 14.
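The position rule for the median can be expressed as a small function. The sketch below is one possible Python version, assuming the usual convention that a position ending in .5 means averaging the two neighbouring values; the function name `median` is defined here for illustration rather than taken from a library.

```python
def median(values):
    """Median found from the (n + 1) / 2 position in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                  # 1-based position
    if position.is_integer():
        return ordered[int(position) - 1]   # a single middle value
    lower = ordered[int(position) - 1]      # value just below the position
    upper = ordered[int(position)]          # value just above the position
    return (lower + upper) / 2

# The two ordered data sets shown in the worked example.
set_a = [3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 10]
set_b = [16, 3, 4, 5, 17, 27, 14, 18, 15, 10, 6, 10, 9, 8, 23, 26, 35]

print(median(set_a), median(set_b))
```

With 18 values the position is 9.5, so the 9th and 10th values are averaged; with 17 values the position is 9, so the 9th value is the median.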
The median
In the previous example we saw that the mean is affected by an extreme value in the data set; however, the
same cannot be said for the median. In both of the data sets 3, 4, 5, 6, 7, 8, 9 and 3, 4, 5, 6, 7, 8, 90
the median will be the value in the fourth position.
The median is therefore more reliable than the mean when the data is skewed or contains extreme values.
Another potential advantage of median is that it is often one of the data values, whereas the mean often isn’t.
However, the median can be considered unrepresentative as it is not calculated by taking into account the
actual values in the data set.
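The effect of an extreme value on each measure can be seen by computing both for the two data sets 3, 4, 5, 6, 7, 8, 9 and 3, 4, 5, 6, 7, 8, 90. The sketch below uses the `mean` and `median` functions from Python's standard `statistics` module as a quick comparison.

```python
from statistics import mean, median

original = [3, 4, 5, 6, 7, 8, 9]
with_extreme = [3, 4, 5, 6, 7, 8, 90]   # 9 replaced by the extreme value 90

# The median stays at the fourth value; the mean is pulled upwards.
print(median(original), median(with_extreme))
print(mean(original), round(mean(with_extreme), 2))
```

Only the largest value changed, yet the mean nearly triples while the median is unchanged.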
Choosing between measures of centre
In most situations it is preferable to give both the median and the mean as a measure of centre, as between
them they portray a more accurate picture of the data set. However, sometimes it is only possible to give one
of these values to represent our data set.
When choosing which measure of centre to use to represent a data set, take into account the distribution of
the data. If the data has no outliers and is approximately symmetrical, then the mean is probably the better
measure of centre to represent the data. If there are outliers, the median will be significantly less affected by these
and would be a better choice to represent the data. The median is also a good choice to represent skewed data.
Also consider what each measure of centre tells you about the data. The values of the mean and median
can vary significantly, so choosing which one to represent the data set can be important, and you will need to
justify your choice.
WORKED EXAMPLE 15
The following histogram represents the IQ test results for a group of people.
[Histogram: IQ from 60 to 160 in intervals of 10 on the horizontal axis; Frequency on the vertical axis]
Determine which measure of centre is best to represent the data set.
THINK: 1. Look at the distribution of the data set.
WRITE: The data set is approximately symmetrical and has no outliers.
THINK: 2. If the data set is approximately symmetrical with no outliers, the mean is probably the better measure of centre to represent the data set. If there are outliers or the data is skewed, the median is probably the best measure of centre to use. State the answer.
WRITE: The mean is the better measure of centre to represent this data set.
2. Describe the distribution of the following data sets after drawing histograms with intervals of
10 commencing with the smallest values.
a. 105, 70, 140, 127, 132, 124, 122, 125, 123, 126, 107, 105, 104, 113, 125, 70, 88, 103, 107,
124, 122, 76, 103, 120, 112, 115, 123, 120, 117, 115, 107, 106, 120, 123, 115, 74, 128, 119
b. 4, 18, 35, 26, 12, 25, 21, 34, 43, 37, 6, 25, 25, 23, 34, 38, 37, 22, 36, 31,
21, 28, 34, 30, 32, 25, 31, 40, 37, 33, 24, 26, 10, 13, 21, 36, 35, 37, 24, 25
3. A group of 26 students received the following marks on a test:
6, 4, 3, 8, 6, 9, 5, 6, 9, 7, 7, 8, 5, 7, 4, 3, 8, 6, 5, 7, 9, 5, 6, 6, 7, 8
a. Construct a dot plot to display the data.
b. Describe the distribution.
4. WE12 Calculate the mean of the following data set.
108, 135, 120, 132, 113, 138, 125, 138, 107, 131, 113, 136, 119, 152, 134, 158, 136, 132, 113, 128
5. a. Calculate the mean of the following data set correct to 2 decimal places.
25, 23, 24, 25, 27, 26, 23, 28, 24, 20, 25, 20, 29, 28, 23, 27, 24
b. Replace the highest value in the data set from part a with the number 79, and then calculate the mean
again, correct to 2 decimal places.
c. How did changing the highest value in the data set affect the mean?
6. WE13 Calculate the means of the data sets displayed in the following tables, giving your answer correct
to 2 decimal places.
a. b.
Intervals Frequency Intervals Frequency
7. For each of the following sets of data, estimate the mean by creating a table using intervals that
commence with the lowest data value and increase by an amount that is equal to the difference between
the highest and lowest data value divided by 5. Give your answers correct to 2 decimal places.
a. 205, 203, 204, 205, 207, 216, 213, 218, 214, 220, 225, 220, 229, 228, 233, 238, 234
b. 5, 13, 24, 5, 27, 16, 13, 18, 24, 10, 5, 20, 30, 18, 13, 7, 14
8. Calculate the means of the following data sets correct to 2 decimal places.
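a. Key: 1 | 2 = 12
Stem | Leaf
0    | 1 1 5 7
1    | 2 6
2    | 3 4 4 5
3    | 1 3
4    | 0 0 3
5    | 5
6    | 5
b. [Histogram: data values from 0 to 35 in intervals of 5 on the horizontal axis; frequency from 0 to 10 on the vertical axis]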
a.
b. 1.02, 2.01, 3.21, 4.63, 1.49, 3.45, 1.17, 1.38, 1.47, 1.70, 5.02, 1.38, 1.91, 8.54
12. WE15 The following stem plot represents the lifespan of different animals at an animal sanctuary.
Determine which measure of centre is better to represent the data set.
Key: 1 | 2 = 12
Stem Leaf
0 3 5 9
1 2 4 6 8
2 0 1 4 5 5 7 9
3 0 2 6
4
5
6 0 3
b. Which would be the most appropriate measure of centre to represent this data?
15. On a particular weekend, properties are sold at auction for the following 30 prices:
$4 700 000, $3 160 000, $2 725 000, $2 616 000, $2 560 000, $241 000,
$265 000, $266 000, $310 000, $320 000, $3 010 000, $2 580 000,
$2 450 000, $2 300 000, $2 275 000, $286 000, $325 000, $330 000,
$435 500, $456 000, $1 350 000, $1 020 000, $900 000, $735 000,
$733 000, $305 000, $330 000, $347 000, $357 000, $408 000
a. Calculate the mean and median for the data.
b. Construct a histogram of the data using intervals commencing at the lowest value and increasing by amounts
of $250 000.
c. Mark in the location of the mean and median on the histogram.
d. Which would be the more appropriate measure of centre to represent this data?
16. The heights in metres of fruit trees in an orchard were measured with the following results:
1.83, 1.94, 1.98, 1.91, 1.88, 1.76, 2.12, 2.05, 2.11, 2.01, 2.04, 2.08,
2.07, 2.06, 2.05, 2.03, 1.94, 1.96, 2.12, 2.14, 2.04, 2.01, 2.03, 2.06,
2.02, 1.94, 1.98, 2.25, 2.04, 2.06
a. Use intervals of 0.05 m starting with 1.75−<1.80 to display the data in
a frequency table.
b. Construct a histogram to display the data and use it to comment on the
appropriateness of using either the mean or the median to represent
the data.
c. Use the frequency table to calculate the mean.
17. The winning margins in the NRL over a particular period of time were as follows.
Quartiles
A clearer picture of the spread of data can be obtained by looking at smaller sections. A common way to do
this is to divide the data into quarters, known as quartiles.
The lower quartile (Q1) is the value that indicates the median of the lower half of the data.
The second quartile (Q2) is the median of the distribution of data.
The upper quartile (Q3) is the value that indicates the median of the upper half of the data.
When calculating the values of the lower and upper quartiles, the median should not be included. If the
median is between values, then these values should be considered in your calculations.
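The quartile rule above can be sketched in Python. This is a minimal illustration only: the function name `quartiles` and both small data sets are made up for the example, and the sketch assumes the data set has at least two values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the 'exclude the median' rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Odd n: skip the median itself; even n: the upper half starts at the middle.
    upper = values[half + 1:] if n % 2 == 1 else values[half:]
    return median(lower), median(values), median(upper)

odd_set = [3, 5, 7, 8, 12, 13, 14]   # hypothetical data, 7 values
even_set = [2, 4, 6, 8, 10, 12]      # hypothetical data, 6 values
print(quartiles(odd_set))   # (5, 8, 13)
print(quartiles(even_set))  # (4, 7.0, 10)
```

Note that calculators and spreadsheets may use a different quartile convention, so their results can differ slightly from this rule.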
WORKED EXAMPLE 16
THINK WRITE
1. Put the data in order. 15, 16, 17, 19, 22, 22, 23, 23, 24, 34, 34, 34, 35, 45, 56, 56, 56, 67, 78
2. Identify the median. There are 19 data values, so the median will be in position $\frac{19 + 1}{2} = 10$. The 10th value is 34, so the median is 34.
3. Identify Q1 by finding the median of the lower half of the data. There are 9 values in the lower half of the data, so Q1 will be the 5th of these values.
Q1 = 22
4. Identify Q3 by finding the median of the upper half of the data. There are 9 values in the upper half of the data, so Q3 will be the 5th of these values.
Q3 = 56
Sample variance: $s^2 = \dfrac{\sum f(x - \bar{x})^2}{(\sum f) - 1}$
Sample standard deviation: $s = \sqrt{\dfrac{\sum f(x - \bar{x})^2}{(\sum f) - 1}} = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
This reverses the previous mathematical process of squaring the differences between the data values and
the mean, so that the standard deviation reverts to a comparative unit of measurement for the original data.
The following example shows that the variance and standard deviation can become very messy to calculate
once you have large groups of data. Spreadsheets, calculators and similar technologies are a more practical
and reliable option for these computations.
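As one example of using technology for these computations, the sketch below applies the ungrouped formula for the sample variance to a small set of raw data in Python and compares it with the built-in `statistics` functions. The function name `sample_variance` and the data values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev, variance

def sample_variance(data):
    """Sum of squared differences from the mean, divided by n - 1."""
    x_bar = mean(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

scores = [4, 6, 7, 9, 9]   # hypothetical sample; the mean is 7

s_squared = sample_variance(scores)
print(s_squared)                   # 4.5
print(round(sqrt(s_squared), 2))   # 2.12

# The library functions use the same n - 1 divisor.
print(variance(scores), round(stdev(scores), 2))
```

Taking the square root at the end returns the measure to the same unit as the original data.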
The table shows a grouped distribution of a sample of data with a mean of 6.5.
The second last column in the lower table shows the square of the difference between the midpoint and the
mean, and the last column shows this value multiplied by the frequency for the interval.
The sum of the final column can then be used with the sum of the frequency column in the formulas to
calculate the variance and standard deviation of the sample.
Sample variance: $s^2 = \dfrac{\sum f(x - \bar{x})^2}{(\sum f) - 1} = \dfrac{40}{9} \approx 4.44$
Sample standard deviation: $s = \sqrt{4.44} \approx 2.11$
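The same grouped calculation can be sketched in Python. The midpoints and frequencies below are hypothetical: they were chosen only so that the mean is 6.5, the frequencies sum to 10 and the final column sums to 40, and they are not the values from the table. The function name `grouped_sample_stats` is also an assumption for this sketch.

```python
from math import sqrt

def grouped_sample_stats(midpoints, freqs):
    """Return (mean, sample variance, sample standard deviation) for grouped data."""
    n = sum(freqs)                                         # sum of f
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    # Final column of the table: f * (x - mean)^2, summed over all intervals.
    total = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    var = total / (n - 1)
    return mean, var, sqrt(var)

midpoints = [2.5, 4.5, 6.5, 8.5, 10.5]   # hypothetical interval midpoints
freqs = [1, 1, 6, 1, 1]                  # hypothetical frequencies

mean, var, sd = grouped_sample_stats(midpoints, freqs)
print(mean)                         # 6.5
print(round(var, 2), round(sd, 2))  # 4.44 2.11
```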
WORKED EXAMPLE 17
Calculate the variance and standard deviation for the sample from the information shown in the
table. Give your answers correct to 2 decimal places.
THINK WRITE
1. Sum the $f(x - \bar{x})^2$ column. $\sum f(x - \bar{x})^2 = 72 + 48 = 120$
2. Substitute the values into the formulas for variance and standard deviation. $s^2 = \dfrac{120}{19} \approx 6.32$ and $s = \sqrt{6.32} \approx 2.51$
3. State the answer. The variance of the sample is 6.32 and the standard deviation of the sample is 2.51.
Interactivity: The median, the interquartile range, the range and the mode (int-6244)
Interactivity: The mean and the standard deviation (int-6246)
a. Calculate the spread for the ‘Goals for’ column by using the range.
b. Calculate the spread for the ‘Goals for’ column by using the interquartile range.
c. Compare the spread of the ‘Goals for’ column with the spread of the ‘Goals against’ column.
5. WE17 Calculate the variance and standard deviation for the sample from the information shown in the
table. Give your answers correct to 2 decimal places.
6. Complete the table and calculate the variance and standard deviation for the following sample correct to
3 decimal places.
7. Complete the details of the following table, which shows the results of a survey of the ages of a sample
of workers in the hospitality industry.
15– < 20 14
20– < 25 18
25– < 30 11
30– < 35 7
35– < 40 5
∑f = ∑ xf =
a. Calculate the interquartile range and standard deviation (correct to 1 decimal place) for both years.
b. Recalculate the interquartile range and standard deviation for both years after removing the smallest
category.
c. Comment on the effect of removing the smallest category on the interquartile ranges and standard
deviations.
11. The table shows the number of registered passenger vehicles in two particular years for the states and
territories of Australia.
a. Calculate the interquartile range and standard deviation (correct to 1 decimal place) for both years.
b. Recalculate the interquartile range and standard deviation for both years after removing the three smallest values.
c. Comment on the effect of the removal of the three smallest values on the interquartile ranges and
standard deviations.
12. Data collected on the number of daylight hours in Alice Springs is as shown.
10.3, 9.8, 9.6, 9.5, 8.5, 8.4, 9.1, 9.8, 10.0, 10.0, 10.1, 10.0, 10.1, 10.1, 10.6, 8.7, 8.8, 9.0, 8.0,
8.5, 10.6, 10.8, 10.5, 10.9, 8.5, 9.5, 9.3, 9.0, 9.4, 10.6, 8.3, 9.3, 9.0, 10.3, 8.4, 8.9
a. Calculate the range of the data.
b. Calculate the interquartile range of the data.
c. Comment on the difference between the two measures and what this indicates.
13. The volume of wine (’000 litres) available for consumption in Australia for a random selection of
months over a 10-year time period is shown in the following table.
a. Calculate the mean and standard deviation of the data correct to 2 decimal places.
b. Calculate the median and interquartile range of the data.
c. What percentage, correct to 2 decimal places, of the actual data values
from the sample are within one standard deviation of the mean (i.e.
between the number obtained by subtracting the standard deviation
from the mean and the number obtained by adding the
standard deviation to the mean)?
d. What percentage of the actual data values from the sample are between the first and third quartiles?
e. Comment on the differences between your answers for parts c and d.
14. A random sample of the monthly consumer price indices in various cities of Australia is shown in the
following table. Answer the following questions, giving answers correct to 2 decimal places where
appropriate.
a. Calculate the standard deviation and interquartile range of the entire data set.
b. Calculate the standard deviation and interquartile range for each city.
c. Which city bears the closest similarity to the entire data set?
d. Which city bears the least similarity to the entire data set?
15. Answer the questions on the data in the following table. Where appropriate, give answers correct
to 2 decimal places.
a. Calculate the interquartile range and standard deviation for the Australian data.
b. Compare the measures of spread for the Australian data with those for India, China, the United
Kingdom and the United States.
c. For this data, which measure of spread is more appropriate?
16. Answer the questions on the data in the following table. Where appropriate, give your answers correct
to 2 decimal places.
Alcohol consumption per adult (litres)
Country Consumption per adult (litres)
Australia 10.21
Canada 10.01
France 12.48
Germany 12.14
Greece 11.01
Indonesia 0.56
Ireland 14.92
New Zealand 9.99
Russia 16.23
South Africa 10.16
Spain 11.83
Sri Lanka 0.81
United Kingdom 13.24
United States 9.7
Yemen 0.2
a. Calculate the interquartile range and variance for the data set correct to 2 decimal places.
b. Calculate the interquartile range and variance after removing the three lowest values, correct to 2
decimal places.
c. Compare the results from parts a and b.
12. The frequency table below shows the crowds at football matches for a team over a season.
a. Copy the frequency table and complete the class centre column.
b. Show the information in a frequency histogram.
Complex familiar
13. The price of a barrel of oil in US dollars over a particular 18-month time period is shown in the
following table.
Month Price(US$)
Jan 102.96
Feb 97.63
Mar 108.76
Apr 105.25
May 106.17
Jun 83.17
Jul 83.72
Aug 88.99
Sep 95.34
Oct 92.44
Nov 87.05
Dec 88.69
Jan 93.14
Feb 97.46
Mar 90.71
Apr 97.1
May 90.74
Jun 93.41
a. Calculate the mean and median for this data set. Give your answers correct to 1 decimal place.
b. Calculate the standard deviation for this data set. Give your answer correct to 2 decimal places.
Complex unfamiliar
17. Use the data on the incidence of communicable diseases in Australia to answer the following questions.
Incidence of communicable diseases in Australia over two consecutive years
a. Calculate the mean (correct to 1 decimal place) and median number of cases of communicable
diseases of the sample for each year.
b. Comment on the differences between the mean and median values calculated in part a.
18. The number of passengers arriving from overseas during a particular time period at various airports in
Australia is shown in the following table.
Calculate the mean and standard deviation for the sample. Give your answers correct to 1 decimal place.
19. Use the data shown to answer the following questions.
Women who gave birth and Indigenous status by states and territories, 2009
Year Temp. (°C) Year Temp. (°C) Year Temp. (°C) Year Temp. (°C)
1980 19.3 1985 19.4 2003 19.6 2008 20.1
1981 19.0 1986 18.8 2004 21.2 2009 20.3
1982 19.6 1987 20.0 2005 20.4 2010 20.6
1983 19.6 1988 19.0 2006 19.9 2011 20.2
1984 18.8 1989 19.9 2007 20.6 2012 20.0
a. Calculate the mean and standard deviation of the temperature data for the two 10-year periods of
1980–89 and 2003–12. Give your answers correct to 2 decimal places.
b. What do the means and standard deviations calculated indicate about the two 10-year periods?
c. Calculate the mean and standard deviation of the total 20 years of the sample data. Give your
answers correct to 2 decimal places.
d. How do the measurements in part c compare to the calculations you made in part a?
3. a. Numerical and continuous
b. Numerical and continuous
c. Numerical and discrete
d. Categorical
4. a. Continuous b. Discrete
c. Ordinal d. Nominal
[Column graph: Frequency by Response (Yes, Don’t care, No)]
5. [Column graph: Frequency by category]
[Column graph: Frequency by Favourite animal]
[Column graph: Frequency by Transport method]
c. Cat
11. a.
Favourite type of music Frequency
Pop 9
Rock 7
Classical 2
Folk 3
Electronic 9
b. [Column graph: Frequency by favourite type of music (Pop, Rock, Classical, Folk, Electronic)]
[Column graph: Frequency by Result (A to E)]
c. Ordinal categorical
15. [Column graph: Frequency]
[Column graph: Frequency by Response]
[Column graph: Frequency by Temperature]
14. a.
Result Frequency
A 3
B 2
c. Ordinal categorical
[Column graph: Frequency by Birthplace]
[Column graph: Frequency by Age group]
17. a. 40
b. 15%
c. 64%
19. a.
Age group Frequency
Under 20s 18
20–29 15
30–39 13
40–49 10
50–59 7
60+ 12
b. Under 20s
c. [Column graph: Frequency by Age group (Under 20s to 60+)]
[Column graphs: Frequency by Age group, with panels including Engineering and Health]
[Column graphs: Frequency by Main area of education and study, with panels for the age groups 15–19, 25–34, 35–44 and 45–64]
0* | 6 7 9
1 | 3 4 4
1* | 7 7
2 | 0 0 1 1 1 3 3 4 4
2* | 5 5 6
3. Key: 1 | 2 = 12 passengers
Stem Leaf
1 | 2 3 4
1* | 5 5 5 5 7 7
2 | 0 0 2 2 3 3 3 3 4
2* | 5 7 7 7 7 7 8 8
3 | 0 3 4 4
3* | 5 5 6 6 6 7
4 | 2 3 4
4* | 7
4. Key: 1 | 7 = 17 patients
Stem Leaf
1 | 7 7
2 | 1 2 3 3 3 4 4 4 5 5 5 6 6 6 8
3 | 0 0 1 2 3 4 4 5 7 8 8
4 | 1 1 3 4 5 5 6
5 | 1 1 5 6
6 | 0
5. a. [Dot plot: Number of wickets, 0 to 5]
b. [Dot plot: horizontal axis 1 to 6]
b.
Time (seconds) Frequency
85–<90 2
90–<95 1
95–<100 5
100–<105 2
105–<110 3
110–<115 3
3.
Class interval Frequency
100–124 3
125–149 5
150–174 10
175–199 9
200–224 4
4. [Histogram: Frequency by class interval, 100–124 to 200–224]
6. [Column graph: Number of students by mark interval, 40–49 to 90–99]
7. [Pie chart: Number of students by Marks on maths exam (40–49%, 50–59%, 60–69%, 70–79%, 80–89%, 90–99%); sector angles 14.4°, 28.8°, 28.8°, 129.6°, 43.2° and 115.2°]
8.
Make Tally Frequency
Nissan || 2
Mazda ||| 3
Toyota |||| || 7
Mitsubishi || 2
9. [Column graph: Frequency by Make of car]
[Histogram: Frequency by Cholesterol level (mmol/L), 1 to 8.5]
10–<15 9
15–<20 15
20–<25 9
25–<30 1
[Histogram: Frequency, horizontal axis 5 to 30]
14.
Class interval Frequency
11–<16 10
16–<21 5
21–<26 5
26–<31 6
31–<36 8
36–<41 1
41–<46 3
46–<51 2
[Histogram: Frequency]
[Histogram: Frequency, horizontal axis 70 to 150]
The distribution has one mode with data values that are most frequent in the 120–<130 interval. There are potential outliers in the 70–<80 interval, and there is a negative skew to the distribution.
b. [Histogram: Frequency, horizontal axis 4 to 44]
The distribution has one mode with data values that are most frequent in the 24–<34 interval. There are no obvious outliers, and there is a negative skew to the distribution.
3. a. [Dot plot: Marks, 3 to 9]
b. The distribution has one mode with a value of 6. There are no obvious outliers and there is a slight negative skew to the distribution.
4. 128.4
5. a. 24.76
b. 27.71
7. a.
Interval Frequency (f) Midpoint (x) xf
∑f = 17 ∑fx = 3727.5
$\bar{x} = 219.26$
b.
5–<10 4 7.5 30
∑f = 17 ∑fx = 272.5
$\bar{x} = 16.03$
8. a. 26.18 b. 21.67
9. a. 51 b. 198
10. a. 24
b. 24
c. The median is unchanged.
11. a. 100 b. 1.805
12. The median, as the data set has two clear outliers
13. a. $74 230.77
b. $65 000
c. It would be in the workers’ interest to use a higher figure when negotiating salaries, whereas it would be in the
management’s interest to use a lower figure.
14. a. Mean = 867.89 mm, Median = 654 mm
b. The median, as it is not affected by the extreme values present in the data set.
[Histogram: Frequency by Property prices ($000s), in intervals of 250 from 241 to 4741, with the mean and median marked]
d. The median, as the mean is affected by a few very high values.
1.80–<1.85 1
1.85–<1.90 1
1.90–<1.95 4
1.95–<2.00 3
2.00–<2.05 8
2.05–<2.10 7
2.10–<2.15 4
2.15–<2.20 0
2.20–<2.25 0
2.25–<2.30 1
[Histogram: Frequency by Height (m), in intervals of 0.05 from 1.75 to 2.30]
The median would be the preferred choice due to the extreme values in the data set.
c. 2.02 m
17. a. Mean = 7.55, median = 6
b. The median would be the preferred choice due to the extreme value of 34.
18. a. Mean = 91.125, median = 89.5
b., c. [Histogram: Frequency by Value (US cents), 80 to 110, with the median and mean marked]
c. The mean is higher than the median as it has been more influenced by the values at the higher end of the distribution.
19. a. Mean = $847 354, median = $310 000
b., c. [Histogram: Frequency by Earnings ($000s), with the median and mean marked]
d. The median is the best measure as the mean is affected by the extreme values.
20. a. i. [Frequency table (Interval, Frequency) and histogram: Frequency by BMI, axis ticks from 18.1 to 37.3 in steps of 3.2]
ii. [Frequency table (Interval, Frequency) and histogram: Frequency by BMI, axis ticks from 18.1 to 35.7 in steps of 1.6]
b. The first histogram has two modes and is near symmetrical, with a slight positive skew.
The second histogram shows two distinct groups, with a symmetrical lower group and a positively skewed upper group.
c. Table 1: 25.95, Table 2: 25.91, Raw data: 25.76
Both of the tables give a higher value for the mean than the raw data, although the differences are small.
d. The total data set is generally symmetrical with no obvious outliers, so the mean is the best measure of centre.
∑ f = 55 ∑ fx = 1367.5
8. a. 7.37
b. 7
c. Standard deviation = 11.49, interquartile range = 7
d. The standard deviation increased by 4.12, while the interquartile range was unchanged.
9. a. $16 327.50
b. $46 902
c. There is a much larger spread in the maximum salaries than the minimum salaries.
10. a. Year 1: standard deviation = 20 382.8, interquartile range = 38 907.5
Year 2: standard deviation = 19 389.0, interquartile range = 37 110.0
b. Year 1: standard deviation = 18 123.5, interquartile range = 31 107.5
Year 2: standard deviation = 17 289.2, interquartile range = 29 140
c. Both values are reduced by a similar amount, but there is a larger impact on the standard deviation than the interquartile
range.
11. a. Year 1: standard deviation = 1 301 033.5, interquartile range = 2 336 546
Year 2: standard deviation = 1 497 303.5, interquartile range = 2 734 078
b. Year 1: standard deviation = 1 082 470.9, interquartile range = 2 136 718
Year 2: standard deviation = 1 228 931.0, interquartile range = 2 415 365
c. Both values are reduced, but there is a bigger impact on the interquartile range than the standard deviation.
12. a. 2.9
b. 1.25
c. The range is slightly more than double the value of the interquartile range (2.9 > 2 × 1.25 = 2.5). This indicates that the data is
bunched with no outliers.
13. a. Mean = 41 440.78, standard deviation = 2248.92
b. Median = 41 333, interquartile range = 3609
c. 59.38%
d. 50%
e. There is a greater percentage of the sample within one standard deviation of the mean than between the first and third
quartiles.
14. a. Standard deviation = 0.46, interquartile range = 0.7.
b.
Sydney Melbourne Brisbane Adelaide Perth Hobart Darwin Canberra
Std dev. 0.46 0.40 0.51 0.45 0.44 0.38 0.67 0.41
c. Sydney
d. Darwin
15. a. Standard deviation = 18.81, interquartile range = 36.21
b. India: standard deviation = 105.86, interquartile range = 158.59
China: standard deviation = 1143.53, interquartile range = 1988.7
United Kingdom: standard deviation = 8.21, interquartile range = 9.48
USA: standard deviation = 87.34, interquartile range = 145.48
c. The standard deviation is appropriate, as there appear to be no obvious outliers in the data for any country.
16. a. Interquartile range = 2.78, variance = 25.40
b. Interquartile range = 2.78, variance = 4.43
c. The interquartile range has stayed the same value, while the variance has reduced significantly.
0–4 ||| 3
5–9 |||| |||| 9
10–14 |||| |||| 9
15–19 ||| 3
20–24 |||| 4
25–29 | 1
[Histogram: Frequency (0 to 8), horizontal axis in intervals of 5 from 0–4 to 35–39]
9. Key: 0 | 6 = 6 errors
Stem Leaf
0 | 6
1 | 3 5 7 8
2 | 0 0 5 6 6 7 8 9
3 | 1 2 2 8
4 | 3 6
5 | 2
10. 8.4s
11. Mean = 250.65 g, Range = 4.6 g
5000–9999 7 500 1
b. [Frequency histogram: f against x]
[Graph: Frequency by Number of sales, 0 to 6]
15. Key: 2 | 1 = 21
Stem Leaf
2 | 1 1 3 4 8 8 8 9 9 9
3 | 0 3 4 5 5 5 6 8 8 8 8 9
4 | 0 0 0 1 1 2 2 4 5 5 5 6 8 9
16. a. 28
b. 38
17. a. Year 1: mean = 6432.9, median = 1324
Year 2: mean = 2637.6, median = 1201
b. The mean values are significantly different but the medians are very similar. This would seem to indicate the presence of
extreme values in the data.
18. Mean = 202 461.4, standard deviation = 257 819.6
19. a. [Column graph: Indigenous and Non-Indigenous values (0 to 100 000) by state and territory: NSW, Vic., Qld, WA, SA, Tas., ACT, NT]
b. Indigenous mean = 1410.5, Non-Indigenous mean = 35 241.6
c. Indigenous median = 1156, Non-Indigenous median = 24 008
d. Indigenous: standard deviation = 1193.8, Non-Indigenous: standard deviation = 33 949.0
20. a. 1980–89: mean = 19.34, standard deviation = 0.44
2003–12: mean = 20.29, standard deviation = 0.45
b. The mean temperature is about one degree higher in the period 2003–12, but the standard deviations indicate that the data
have similar spreads.
c. Total data: mean = 19.82, standard deviation = 0.65
d. The mean of the total data is halfway between the two separate time periods. The standard deviation indicates a much
greater variation from the mean for the total data.