
CH 11 UnivariateDataAnalysis

Chapter 11 focuses on univariate data analysis, introducing boxplots as a method for graphically representing numerical data through quartiles and identifying outliers. It discusses the classification of data into categorical (ordinal and nominal) and numerical (discrete and continuous) types, along with methods for displaying this data using frequency tables and bar charts. The chapter also includes worked examples and exercises to reinforce the concepts presented.

Uploaded by 636charliseblue
Copyright © All Rights Reserved

CHAPTER 11

Univariate data analysis

11.1 Overview
11.1.1 Introduction
A boxplot is a method of graphically representing groups of numerical data through their quartiles, where the spacings between the different sections of the box indicate the degree of spread and skewness in the data. Boxplots are also used to determine and show outliers. The boxplot gives a snapshot of a number of values, such as the interquartile range, maximum value, minimum value, range and median of the data.
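The values a boxplot summarises can be computed directly. The Python sketch below is an illustration only, not the book's method: it assumes the "median of each half" rule for quartiles (other textbooks and calculators use slightly different rules) and the widely used 1.5 × IQR rule for outliers, which this section does not define. The function names `five_number_summary` and `outliers` are hypothetical; `statistics.median` is from Python's standard library. The data are the January temperatures from Worked example 5 later in this chapter.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for at least two numbers.

    Assumed quartile rule: Q1 and Q3 are the medians of the lower and
    upper halves; when the count is odd, the middle value is left out
    of both halves.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]


def outliers(data, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR (assumed 1.5 x IQR rule)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in sorted(data) if x < low_fence or x > high_fence]


# Maximum daily January temperatures (°C) from Worked example 5
temps = [26, 22, 24, 26, 28, 28, 27, 42, 25, 25, 29, 31, 23, 33, 34, 27,
         39, 44, 35, 34, 27, 30, 36, 30, 30, 28, 33, 23, 24, 34, 37]

minimum, q1, med, q3, maximum = five_number_summary(temps)
print("min", minimum, "Q1", q1, "median", med, "Q3", q3, "max", maximum)
print("range", maximum - minimum, "IQR", q3 - q1)
print("outliers", outliers(temps))
```

Under these assumptions the summary is 22, 26, 29, 34 and 44, so the IQR is 8 and the outlier fences are 14 and 46; no temperature lies outside them.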
The boxplot was introduced by the mathematician John Tukey, who is regarded as one of the most influential statisticians of the past 50 years. Some of his work in modern statistics led to concepts that played a central role in the creation of today’s telecommunication technology. He is credited with the invention of the computer term ‘bit’.
John Tukey was born in New Bedford, Massachusetts, in 1915. He obtained a BA and an MSc in chemistry from Brown University by 1937, before moving to Princeton University, where he completed his PhD in mathematics. He became a professor at 35 and the founding chairman of the Princeton statistics department in 1965. He was awarded the IEEE Medal of Honor in 1982 for his contributions to the spectral analysis of random processes and the fast Fourier transform (FFT) algorithm. He introduced the boxplot in his 1977 book Exploratory Data Analysis.

LEARNING SEQUENCE
11.1 Overview
11.2 Classifying and displaying data
11.3 Construct, describe and interpret dot plots and stem-and-leaf plots
11.4 Construct, describe and interpret column graphs and histograms
11.5 Measures of centre
11.6 Measures of spread
11.7 Review: exam practice

Fully worked solutions for this chapter are available in the Resources section of your eBookPLUS at
www.jacplus.com.au.

CHAPTER 11 Univariate data analysis 417


11.2 Classifying and displaying data
11.2.1 Data types
When analysing data it is important to know
what type of data you are dealing with. This
can help to determine the best way to both
display and analyse the data.
Data can be split into two major groups:
categorical data and numerical data. Both
of these can be further divided into two
subgroups.

Categorical data
Data that can be organised into groups or categories is known as categorical data. Categorical data is often
an ‘object’, ‘thing’ or ‘idea’, with examples including brand names, colours, general sizes and opinions.
Categorical data can be classified as either ordinal or nominal. Ordinal data is placed into a natural order or
ranking, whereas nominal data is split into subgroups with no particular order or ranking.
For example, if you were collecting data on income in terms of whether it was ‘High’, ‘Medium’ or ‘Low’,
the assumed order would be to place the ‘Medium’ category between the other two, so this is ordinal data.
On the other hand, if you were investigating preferred car colours the order doesn’t really matter, so this is
nominal data.

Numerical data
Data that can be counted or measured is known as numerical data. Numerical data can be classified as either
discrete or continuous. Discrete data is counted in exact values, with the values often being whole numbers,
whereas continuous data can have an infinite number of values, with an additional value always possible
between any two given values.
For example, the housing industry might consider the number of bedrooms in residences offered for sale. In
this case, the data can only be a restricted group of numbers (1, 2, 3, etc.), so this is discrete data. Now consider
meteorological data, such as the maximum daily temperatures over a particular time period. Temperature data
could have an infinite number of decimal places (23°C, 25.6°C, 18.21°C, etc.), so this is continuous data.

Data
  ├─ Categorical data
  │    ├─ Ordinal data
  │    └─ Nominal data
  └─ Numerical data
       ├─ Discrete data
       └─ Continuous data

418 Jacaranda Maths Quest 11 General Mathematics Units 1 & 2 for Queensland
WORKED EXAMPLE 1

Data on the different types of cars on display in a car yard is collected.
Verify that the collected data is categorical, and determine whether it is ordinal or nominal.

1. THINK: Identify the type of data.
   WRITE: The data collected is the brand or model of cars, so this is categorical data.
2. THINK: Identify whether the order of the data is relevant.
   WRITE: When assessing the types of different cars, the order is not relevant, so this is nominal data.
3. THINK: State the answer.
   WRITE: The data collected is nominal data.

WORKED EXAMPLE 2

Data on the number of people attending matches at sporting venues is collected.
Verify that the collected data is numerical, and determine whether it is discrete or continuous.

1. THINK: Identify the type of data.
   WRITE: The data collected is the number of people in sporting venues, so this is numerical data.
2. THINK: Does the data have a restricted or infinite set of possible values?
   WRITE: The data involves counting people, so only whole number values are possible.
3. THINK: State the answer.
   WRITE: The data collected is discrete data.

11.2.2 Displaying categorical data


Once raw data has been collected, it is helpful to summarise the information into a table or display. Categorical data is usually displayed in either frequency tables or bar charts. Both of these display the frequency (number of times) that a piece of data occurs in the collected data.

Frequency tables
Frequency tables split the collected data into defined categories and
register the frequency of each category in a separate column. A tally
column is often included to help count the frequency.



For example, if we have collected the following data about people’s favourite colours, we could display it
in a frequency table.
Red, Blue, Yellow, Red, Purple, Blue, Red, Yellow, Blue, Red

Favourite colour   Tally   Frequency
Red                ||||    4
Blue               |||     3
Yellow             ||      2
Purple             |       1

[Bar chart: Favourite colour (Red, Blue, Yellow, Purple) on the horizontal axis against Frequency (0 to 5) on the vertical axis]
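Tallying can also be done by a short program. This Python sketch (the helper name `frequency_table` is hypothetical) uses the standard library's `collections.Counter` to count the favourite-colour data above; for simplicity it draws one stroke per occurrence rather than grouping strokes into gates of five.

```python
from collections import Counter


def frequency_table(data):
    """Return (category, tally, frequency) rows, most frequent category first."""
    counts = Counter(data)
    # most_common() sorts by frequency; tied categories keep first-seen order
    return [(category, "|" * freq, freq) for category, freq in counts.most_common()]


colours = ["Red", "Blue", "Yellow", "Red", "Purple",
           "Blue", "Red", "Yellow", "Blue", "Red"]

print(f"{'Favourite colour':<18}{'Tally':<8}Frequency")
for category, tally, freq in frequency_table(colours):
    print(f"{category:<18}{tally:<8}{freq}")
```

The frequencies (Red 4, Blue 3, Yellow 2, Purple 1) add up to 10, the number of responses, which is a quick check that no value was missed.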

Bar charts
Bar charts display the categories of data on the horizontal axis and the frequency of the data on the vertical
axis. As the categories are distinct, there should be a space between all of the bars in the chart.
The bar chart above displays the previous data about favourite colours.

WORKED EXAMPLE 3

The number of students from a particular school who participate in organised sport on weekends
is shown in the frequency table.
Display the data in a bar chart.

Sport Frequency
Tennis 40
Swimming 30
Cricket 60
Basketball 50
No sport 70

1. THINK: Choose an appropriate scale for the bar chart. As the frequencies go up to 70 and all of the values are multiples of 10, we will mark our intervals in 10s. Display the different categories along the horizontal axis.
   WRITE: [Axes: Sport played (Tennis, Swimming, Cricket, Basketball, No sport) on the horizontal axis; Frequency from 0 to 70 in steps of 10 on the vertical axis]
2. THINK: Draw bars to represent the frequency of each category, making sure there are spaces between the bars.
   WRITE: [Bar chart: Tennis 40, Swimming 30, Cricket 60, Basketball 50, No sport 70, with spaces between the bars; Sport played on the horizontal axis, Frequency from 0 to 70 on the vertical axis]

The mode
For categorical data, the mode is the category that has the highest frequency. When categorical data is displayed in a bar chart, the modal category is the one with the highest bar.
Identifying the mode tells us which category is the most common or most popular, which can be particularly useful when analysing data.
In some instances there may be either no modal category or more than one modal category. If the data has no modal category, then there is no mode; if it has two modal categories, it is bimodal; and if it has three modal categories, it is trimodal.
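Finding the modal category amounts to picking every category that shares the highest frequency. In the Python sketch below (`modal_categories` is a hypothetical helper name), one result means a single mode, two means the data is bimodal and three means trimodal; the sketch does not decide when a data set should instead be described as having no mode. The second data list is made up purely for illustration.

```python
from collections import Counter


def modal_categories(data):
    """Return every category whose frequency equals the highest frequency."""
    counts = Counter(data)
    highest = max(counts.values())
    return [category for category, freq in counts.items() if freq == highest]


colours = ["Red", "Blue", "Yellow", "Red", "Purple",
           "Blue", "Red", "Yellow", "Blue", "Red"]
print(modal_categories(colours))  # ['Red']

# Hypothetical drinks data with two categories tied for the highest frequency
drinks = ["Tea", "Coffee", "Tea", "Juice", "Coffee"]
print(modal_categories(drinks))   # ['Tea', 'Coffee'] (bimodal)
```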

WORKED EXAMPLE 4

Thirty students were asked to pick their favourite time of the day between the following categories:
Morning (M), Early afternoon (A), Late afternoon (L), Evening (E)
The following data was collected:
A, E, L, E, M, L, E, A, E, M, E, L, E, A, L, M, E, E, L, M, E, A, E, M, L, L, E, E, A, E.
a. Represent the data in a frequency table.
b. Construct a bar chart to represent the data.
c. Determine which time of day is the most popular.

a. 1. THINK: Create a frequency table to capture the data.
      WRITE:
      Time of day        Tally   Frequency
      Morning
      Early afternoon
      Late afternoon
      Evening



   2. THINK: Go through the data, filling in the tally column as you progress. Count the tally marks in each row to complete the frequency column.
      WRITE:
      Time of day        Tally        Frequency
      Morning            ∥∥           5
      Early afternoon    ∥∥           5
      Late afternoon     ∥∥ ∥         7
      Evening            ∥∥ ∥∥ |||    13

b. 1. THINK: Choose an appropriate scale for the bar chart. As the frequencies only go up to 13, we will mark our intervals in single digits. Display the different categories along the horizontal axis.
      WRITE: [Axes: Favourite time of day (Morning, Early afternoon, Late afternoon, Evening) on the horizontal axis; Frequency from 0 to 14 in steps of 1 on the vertical axis]


   2. THINK: Draw bars to represent the frequency of each category, making sure there are spaces between the bars.
      WRITE: [Bar chart: Morning 5, Early afternoon 5, Late afternoon 7, Evening 13, with spaces between the bars; Favourite time of day on the horizontal axis, Frequency on the vertical axis]

c. 1. THINK: The highest bar is the modal category. This is the most popular category. Write the answer.
      WRITE: Evening is the most popular time of day among the students.

Interactivity: Create a bar chart (int-6493)

Units 1 & 2 Area 6 Sequence 1 Concepts 1 & 2

Classifying data types Summary screen and practice questions


Categorical data Summary screen and practice questions

Exercise 11.2 Classifying and displaying data


1. WE1 Data on the different types of cereal on
supermarket shelves is collected.
Verify that the collected data is categorical, and
determine whether it is ordinal or nominal.
2. Data on the rating of hotels from ‘one star’ to ‘five star’
is collected.
Verify that the collected data is categorical, and
determine whether it is ordinal or nominal.
3. WE2 Identify whether each of the following is categorical or numerical data, and whether any numerical data is discrete or continuous.
a. The amount of daily rainfall in Geelong
b. The heights of players in the National Basketball League
c. The number of children in families
d. The type of pet owned by families
4. Identify whether each of the following sets of categorical or numerical data is nominal, ordinal, discrete or continuous.
a. The times taken for the place getters in the Olympic 100 m sprinting final
b. The number of gold medals won by countries competing at the Olympic Games
c. The type of medals won by a country at the Olympic Games
d. The countries that won at least one gold medal in any Olympic Games
5. WE3 The preferred movie genre of 100 students is shown in the following frequency table.

Favourite movie genre Frequency


Action 32
Comedy 19
Romance 13
Drama 15
Horror 7
Musical 4
Animation 10

Construct a bar chart to represent the data.



6. The favourite pizza type of 60 students is shown in the following bar chart.

[Bar chart: Favourite pizza (categories include Margherita, Pepperoni, Supreme, Meat feast, Vegetarian and Other) on the horizontal axis against Frequency (0 to 15) on the vertical axis]
Construct a frequency table to represent the data.
7. A group of students at a university were surveyed about their usual method of travel, with the results
shown in the following table.

Student Transport method Student Transport method


A Bus N Car
B Walk O Bus
C Train P Car
D Bus Q Bus
E Car R Bicycle
F Bus S Car
G Walk T Train
H Bicycle U Bus
I Bus V Walk
J Car W Car
K Car X Train
L Train Y Bus
M Bicycle Z Bus

a. What type of data is being collected?


b. Organise the data into a frequency table.
c. Construct a bar chart to represent the data.

8. In a telephone survey people were asked the question, ‘Do you agree that convicted criminals should be
required to serve their full sentence and not receive early parole?’ They were required to respond with
either ‘Yes’, ‘No’ or ‘Don’t care’ and the results are as follows.

Person Opinion Person Opinion


A Yes N Yes
B Yes O No
C Yes P No
D Yes Q Yes
E Don’t care R Yes
F No S Yes
G Don’t care T Yes
H Yes U No
I No V Yes
J No W Yes
K Yes X Don’t care
L No Y Yes
M Yes Z Yes
a. Organise the data into an appropriate table.
b. Construct a bar graph to represent the data.
c. Identify the data as either nominal or ordinal. Explain your answer.
9. Complete the following table by indicating the type of data.

Data Type
Example: The types of meat displayed in a butcher shop. Categorical Nominal
a. Wines rated as high, medium or low quality
b. The number of downloads from a website
c. Electricity usage over a three-month period
d. The volume of petrol sold by a petrol station per day

10. WE4 Twenty-five students were asked to pick their favourite type of animal to keep as a pet. The following data was collected.


Dog, Cat, Cat, Rabbit, Dog, Guinea pig, Dog, Cat, Cat, Rat, Rabbit, Ferret, Dog, Guinea pig, Cat,
Rabbit, Rat, Dog, Dog, Rabbit, Cat, Cat, Guinea pig, Cat, Dog
a. Construct a frequency table to represent the data.
b. Construct a bar chart to represent the data.
c. Which animal is the most popular?



11. Thirty students were asked to pick their favourite type of music from the following categories: Pop (P),
Rock (R), Classical (C), Folk (F), Electronic (E).
The following data was collected:
E, R, R, P, P, E, F, E, E, P, R, C, E, P, E, P, C, R, P, F, E, P, P, E, R, R, E, F, P, R
a. Construct a frequency table to represent the data.
b. Construct a bar chart to represent the data.
c. Which type of music is the most popular?
12. The different types of coffee sold at a café in one hour
are displayed in the following bar chart.

[Bar chart: Type of coffee (categories include Espresso, Latte, Cappuccino, Macchiato, Mocha, Flat white and Long black) on the horizontal axis against Frequency (0 to 20) on the vertical axis]
a. Determine the modal category of the coffees sold.
b. How many coffees were sold in that hour?
13. The results of an opinion survey are displayed in the following bar chart.

[Bar chart: Opinion on the horizontal axis, with one bar for each of the categories Strongly agree, Agree, Not sure, Disagree and Strongly disagree; Frequency (0 to 45) on the vertical axis]
a. What type of data is being displayed?
b. Explain what is wrong with the current data display.
c. Redraw the bar chart displaying the data correctly.

14. Exam results for a group of students are shown in the following table.

Student Result Student Result Student Result Student Result


1 A 6 C 11 B 16 C
2 B 7 C 12 C 17 A
3 D 8 C 13 C 18 C
4 E 9 E 14 C 19 D
5 A 10 D 15 D 20 E

a. Construct a frequency table to represent the exam result data.
b. Construct a bar chart to represent the data.
c. What is the type of data collected?
15. The number of properties sold in the capital cities of Australia for a particular time period is shown in
the following table.

Number of bedrooms
City 2 3 4 5
Adelaide 8 12 5 4
Brisbane 15 11 8 6
Canberra 8 12 9 2
Hobart 3 9 5 1
Melbourne 16 18 12 11
Sydney 23 19 15 9
Perth 7 9 12 3
Use the given information to construct a bar graph that represents the number of bedrooms of properties
sold in the capital cities during this time period.
16. The maximum daily temperatures (°C) in Adelaide during a 15-day period in February are listed in the
following table.

Day 1 2 3 4 5 6 7 8

Temp(°C) 31 32 40 42 32 34 41 29

Day 9 10 11 12 13 14 15

Temp(°C) 25 33 34 24 22 24 30

Temperatures greater than or equal to 39°C are considered above average and those less than 25°C are considered below average.



a. Organise the data into three categories and display the results in a frequency table.
b. Construct a bar graph to display the data.
c. What is the type of data displayed in your bar graph?
17. The following frequency table displays the different categories of purchases in a shopping basket.

Category Frequency
Fruit 6
Vegetables 8
Frozen goods 5
Packaged goods 11
Toiletries 3
Other 7

a. Determine how many items were purchased in total.


b. Calculate what percentage of the total purchases were fruit.
18. The birthplaces of 200 Australian citizens were recorded and are shown in the following frequency table.

Birthplace Frequency
Australia 128
United Kingdom 14
India 10
China 9
Ireland 6
Other 33

a. What type of data is being collected?


b. Construct a bar chart to represent this information.
c. Determine what percentage of the respondents were born in Australia.
19. The following bar chart represents the ages of attendees at a local sporting event.

[Bar chart: Age (years) in the groups Under 20s, 20–29, 30–39, 40–49, 50–59 and 60+ on the horizontal axis against Frequency (0 to 20) on the vertical axis]

a. Construct a frequency table to represent the data.
b. Determine the modal category.
c. The age groups are changed to ‘Under 20’, ‘20–39’, ‘40–59’ and ‘60+’. Redraw the bar chart with these new categories.
d. Does this change the modal category?
20. Data for the main area of education and study for a selected group of people aged 15 to 64 during a
particular year in Australia is shown in the following table.

Number of people (thousands)


Main area of education and study 15–19 20–24 25–34 35–44 45–64
Agriculture 10 9 14 5 5
Creative arts 36 51 20 10 9
Engineering 59 75 50 13 6
Health 44 76 64 32 32
Management and commerce 71 155 135 86 65

a. Construct separate bar charts for each area of education and study to represent the data.
b. Construct separate bar charts for each age group to represent the data.

11.3 Construct, describe and interpret dot plots and stem-and-leaf plots
11.3.1 Stem plots
Stem plots (or stem-and-leaf plots) can be used to display both discrete and continuous numerical data. The
data is grouped according to its numerical place value (the ‘stem’) and then displayed horizontally as a single
digit (the ‘leaf’). In an unordered stem plot, the data values have been placed into categories but do not appear
in order. In an ordered stem plot, the values are placed in numerical order with the smallest values closest to
the stem. When answering questions relating to stem plots, present your final answer as an ordered stem plot.
If there are 4 or fewer different place values in your data, it may be preferable to make each stem of the plot represent a class size of 5 instead of a class size of 10. This can be done by writing each stem twice and inserting an asterisk (*) after the second of the two stems with the same number: the first stem holds the leaves 0 to 4 and the starred stem holds the leaves 5 to 9, as shown in the following example.
Key: 1 | 4 = 14
1* | 7 = 17
Stem Leaf
1 4
1* 7 7 9
2 2 3 4 4
2* 6 8
3 0 1 3
3* 5 6 9
4 0

Note that the data has been presented in neat vertical columns, making it easy to read.
Always remember to include a key with your stem plot to indicate what the stem and the leaf represent when put together.
Back-to-back stem plots
As we will see later in this chapter, back-to-back stem plots can be used to compare two different sets of data. Back-to-back stem plots share the same stem, with one data set appearing on the left of the stem and the other data set appearing on the right.

Key: 1 | 4 = 14
     1* | 7 = 17
   Leaf | Stem | Leaf
    3 2 | 1    | 4
9 8 6 6 | 1*   | 7 7 9
4 2 1 0 | 2    | 2 3 4 4
    7 6 | 2*   | 6 8
  4 2 1 | 3    | 0 1 3
    6 5 | 3*   | 5 6 9
        | 4    | 0
WORKED EXAMPLE 5

The following data set (of 31 values) shows the maximum daily temperature during the month of
January in a particular area.
26, 22, 24, 26, 28, 28, 27, 42, 25, 25, 29, 31, 23, 33, 34, 27, 39, 44, 35, 34, 27, 30, 36, 30, 30, 28, 33,
23, 24, 34, 37
Construct a stem plot to represent the data.

1. THINK: Identify the place values for the data. If there are 4 or fewer different place values, split each into two.
   WRITE: The temperature data has values in the 20s, 30s and 40s.
   Stem Leaf
   2
   2*
   3
   3*
   4
2. THINK: Write the units digit of each data value beside its stem, in numerical order, with the smallest values closest to the stem. Keep the leaves evenly spaced so that they line up in columns as they move away from the stem. Remember to add a key to your plot.
   WRITE:
   Key: 2 | 2 = 22°C
   Stem Leaf
   2    2 3 3 4 4
   2*   5 5 6 6 7 7 7 8 8 8 9
   3    0 0 0 1 3 3 4 4 4
   3*   5 6 7 9
   4    2 4
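The ordering in step 2 can be automated. The Python sketch below (the helper name `split_stem_plot` is hypothetical) assumes the data are non-negative whole numbers, uses the tens digit as the stem and the units digit as the leaf, and splits each stem in two as in this worked example: leaves 0 to 4 go beside the plain stem and leaves 5 to 9 beside the starred stem. Empty half-stems between the smallest and largest values are kept so that the scale has no jumps.

```python
from collections import defaultdict


def split_stem_plot(data):
    """Return (stem, leaves) rows of an ordered split stem plot.

    Assumes non-negative whole numbers. Each half-stem covers 5 values:
    value // 5 gives its index, so 20-24 -> '2' and 25-29 -> '2*'.
    """
    groups = defaultdict(list)
    for value in sorted(data):  # sorting puts the smallest leaves nearest the stem
        groups[value // 5].append(value % 10)
    rows = []
    for half_stem in range(min(data) // 5, max(data) // 5 + 1):
        stem = str(half_stem // 2) + ("*" if half_stem % 2 else "")
        rows.append((stem, groups.get(half_stem, [])))
    return rows


temps = [26, 22, 24, 26, 28, 28, 27, 42, 25, 25, 29, 31, 23, 33, 34, 27,
         39, 44, 35, 34, 27, 30, 36, 30, 30, 28, 33, 23, 24, 34, 37]

print("Key: 2 | 2 = 22")
for stem, leaves in split_stem_plot(temps):
    print(f"{stem:<3}| {' '.join(str(leaf) for leaf in leaves)}")
```

The printed rows match the ordered stem plot in step 2.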

11.3.2 Dot plots


Discrete numerical data can also be displayed as a dot plot. In these plots every
data value is represented by a dot. The most common values can then be clearly
identified. You can also use dot plots to represent categorical data.
When drawing a dot plot, be careful to make sure that the dots are evenly and
consistently spaced.
[Example dot plot on a horizontal scale from 1 to 5]

WORKED EXAMPLE 6

The frequency table shows the number of floors in apartment buildings in a particular area.
Construct a dot plot to represent the data.

Number of floors Frequency


2 2
3 5
4 3
5 0
6 4
7 2

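1. THINK: Draw a horizontal scale using the discrete data values shown.
   WRITE: The discrete data values are given by the number of floors.
   [Horizontal scale labelled Number of floors, marked 2, 3, 4, 5, 6, 7]
2. THINK: Place one dot directly above the number on the scale for each discrete data value present, making sure to keep corresponding dots at the same level.
   WRITE: [Dot plot: 2 dots above 2, 5 dots above 3, 3 dots above 4, no dots above 5, 4 dots above 6 and 2 dots above 7; horizontal axis Number of floors]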
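A rough text version of this dot plot can be drawn from the frequency table. The Python sketch below (`text_dot_plot` is a hypothetical helper, and the letter "o" stands in for a dot) stacks one dot per data value above each point of the scale, giving every value the same column width so the dots stay evenly spaced; values with a frequency of 0, such as 5 floors here, keep their place on the scale.

```python
def text_dot_plot(freq_table, dot="o"):
    """Return the lines of a text dot plot, top line first, scale last.

    freq_table maps each discrete value to its frequency.
    """
    values = sorted(freq_table)
    width = max(len(str(v)) for v in values) + 2  # same width for every column
    tallest = max(freq_table.values())
    lines = []
    for level in range(tallest, 0, -1):  # build from the top row down
        cells = (dot if freq_table[v] >= level else " " for v in values)
        lines.append("".join(cell.center(width) for cell in cells).rstrip())
    lines.append("".join(str(v).center(width) for v in values).rstrip())
    return lines


# Number of floors -> frequency, from Worked example 6
floors = {2: 2, 3: 5, 4: 3, 5: 0, 6: 4, 7: 2}
for line in text_dot_plot(floors):
    print(line)
print("Number of floors")
```

Centring every cell in a fixed-width column is what keeps corresponding dots at the same level and directly above their value on the scale.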

Interactivity: Stem plots (int-6242)


Interactivity: Create stem plots (int-6495)
Interactivity: Dot plots, frequency tables and histograms, and bar charts (int-6243)

Units 1 & 2 Area 6 Sequence 1 Concept 3

Stem-and-leaf plots and dot plots Summary screen and practice questions



Exercise 11.3 Construct, describe and interpret dot plots and stem-and-leaf plots
1. WE5 Construct a stem plot for the following data set.

The dollars spent per day on lunch by a group of 15 people:


22, 21, 22, 24, 19, 22, 24, 21, 22, 23, 25, 26, 22, 23, 22
2. Construct a stem plot for the following data set. The number of hours spent per week playing computer
games by a group of 20 students at a particular school:
14, 21, 25, 7, 25, 20, 21, 14, 21, 20, 6, 23, 26, 23, 17, 13, 9, 24, 17, 24
3. Construct a stem plot for the following data set, the number of passengers per day transported by a taxi driver (40 values):
33, 27, 44, 47, 23, 24, 22, 35, 42, 36, 17, 25, 34,
13, 15, 27, 28, 23, 37, 34, 22, 27, 23, 20, 12, 15,
43, 30, 27, 15, 27, 36, 20, 23, 35, 36, 28, 17, 14, 15
4. Construct a stem plot for the number of patients per day treated by a doctor (41 values):
44, 38, 55, 56, 23, 34, 31, 43, 51, 45, 26, 25, 45, 23,
24, 38, 37, 32, 46, 41, 21, 28, 20, 34, 30, 22, 25, 51, 60, 17, 23, 24,
26, 30, 33, 41, 26, 35, 17, 24, 25
5. WE6 Construct a dot plot to represent each of the following collections of data.
a. The number of wickets per game taken by a bowler in a cricket season.

Number of wickets Frequency


0 4
1 6
2 4
3 2
4 1
5 1

b. The number of hours per week spent checking emails by a group of workers at a particular company.

Hours checking emails Frequency


1 1
2 1
3 2
4 4
5 8
6 4

6. Construct dot plots to represent the following collections of data.
a. The scores per round of a golfer over a particular time period (40 values):

73, 77, 74, 77, 73, 74, 72, 75, 72, 76, 77, 75, 74, 73, 75, 77, 78, 73, 77, 74,
72, 77, 73, 70, 72, 75, 73, 70, 77, 75, 77, 76, 70, 73, 75, 76, 78, 77, 74, 75
b. The scores out of 10 in a multiple choice test for a group of students (30 values):
6, 7, 4, 7, 3, 7, 7, 5, 7, 6, 7, 5, 1, 3, 5, 7, 8, 3, 7, 4, 9, 5, 4, 6, 7, 9, 10, 5, 7, 4
7. The data below give the time taken (in minutes) for each of 40 runners on a 10 km fun run. Prepare a
stem-and-leaf diagram for the data using a class size of 10 minutes.
36 42 52 38 47 59 72 68 57 82
66 75 45 42 55 38 42 46 48 39
42 58 40 41 47 53 68 43 39 48
71 42 50 46 40 52 37 54 48 52
8. The typing speed (in words per minute) of 30 word processors is recorded below. Prepare a stem-and-leaf
diagram of the data using a class size of 5.
96 102 92 96 95 102 95 115 110 108
88 86 107 111 107 108 103 121 107 96
124 95 98 102 108 112 120 99 121 130
9. Twenty transistors are tested by applying increasing voltage until they are destroyed. The maximum
voltage that each could withstand is recorded below. Prepare a stem-and-leaf plot of the data using a class
size of 0.5.
14.8 15.2 13.8 14.0 14.8 15.7 15.5 15.6 14.7 14.3
14.6 15.2 15.9 15.1 14.3 14.6 13.9 14.7 14.5 14.2
Questions 10 and 11 refer to the following. Each student in a class has been assigned a newly planted
tree to look after, and must provide a weekly report on its growth and condition. From the latest reports, the
teacher recorded the height of each tree (in mm), and entered these in the stem-and-leaf plot shown below.

Key: 12 | 1 = 1210 mm
12* | 5 = 1250 mm
Stem Leaf
12 1 2 4
12* 5 7 7 9 9
13 0 1 1 2 3 4 4
13* 5 6 6 7 9 9
14 0 2 3 4
14* 6 7



10. MC The class size used in the stem-and-leaf plot is:
A. 1 B. 10 C. 33 D. 50
11. MC The number of scores that have been recorded is:
A. 21 B. 27 C. 33 D. 1210

12. The following set of data indicates the number of people who attend early morning fitness classes run by a business for its workers:
14, 17, 13, 8, 16, 21, 25, 16, 19, 17, 21, 8, 13
Display the data as a stem plot.

13. The total number of games played by the players from two
basketball squads is shown in the following stem plots.

Key: 0 | 1 = 1 game played Key: 2 | 4 = 24 games played


Stem Leaf Stem Leaf
0 1 2 4
1 4 7 3 1 2 6
2 4 4 8 4 3 4 5
3 3 3 5 6 5 2
4 1 2 3 6
5 1 1 7
6 5 8 2 5 7
7 9 3
8
9 1

a. Describe the shape of each distribution.


b. Construct a stem plot that combines the data for the two teams.

14. Consider the set of data in the stem plot shown.
Key: 0 | 1 = 1
Stem Leaf
0    1
1    1 1 1 4 4 6 6 7 8
2    3 3 4 4 7 7 9
a. Instead of grouping the data in 10s, the stems could be split in half to use groups of 5. For example, a split stem plot for the same data set could place the data values from 10 to 14 in a row labelled ‘1’, while data values from 15 to 19 are put in a row labelled ‘1*’. Use the data from the original stem plot to complete the split stem plot.
b. Comment on the effect of splitting the stem for the data in this question.

11.4 Construct, describe and interpret column graphs and histograms
11.4.1 Grouped data
Numerical data may be represented as either grouped data or ungrouped data. When assessing ungrouped
data, the analysis we do is exact; however, if we have a large data set, the data can be difficult to work with.
Grouping data allows us to gain a clearer picture of the data’s distribution, and the resultant data is usually
easier to work with.
When grouping data, we try to pick class sizes so that between 5 and 10 classes are formed. Ensure that all
of the classes are distinct and that there are no overlaps between classes.
When creating a frequency table to represent grouped continuous data, we will represent our class intervals in the form 12 −< 14. This interval covers all values from 12 up to 14 but does not include 14; that is, 12 ≤ x < 14.

WORKED EXAMPLE 7

The following data represents the time (in seconds) it takes for each individual in a group of
20 students to run 100 m.
18.2, 20.1, 15.6, 13.5, 16.7, 15.9, 19.3, 22.5, 18.4, 15.9, 12.4, 14.1, 17.7, 19.4, 21.0, 20.4,
18.2, 15.8, 16.1, 14.6
Group and display the data in a frequency table.

1. THINK: Identify the smallest and largest values in the data set. This will help you to choose your class size and decide what the first class should be.
   WRITE: Smallest value = 12.4
   Largest value = 22.5
   We will have class intervals of 2, starting with 12−<14.
2. THINK: Draw a frequency table to represent the data. Complete the tally column in your table, and use this to fill in the frequency column.
   WRITE:
   Time (seconds)   Tally    Frequency
   12−<14           ∥        2
   14−<16           ∥∥ |     6
   16−<18           |||      3
   18−<20           ∥∥       5
   20−<22           |||      3
   22−<24           |        1
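The counting in step 2 can be checked with a short program. This Python sketch (the helper `group_data` is hypothetical) counts how many values fall in each class of the form lower −< upper, that is lower ≤ x < upper, starting from a chosen first class and a fixed class width, until the largest value is covered.

```python
def group_data(data, start, width):
    """Return (class label, frequency) pairs for classes start-<start+width, ...

    Each class includes its lower limit but not its upper limit.
    """
    if width <= 0 or start > min(data):
        raise ValueError("width must be positive and start must not exceed the smallest value")
    classes = []
    lower = start
    while lower <= max(data):
        upper = lower + width
        frequency = sum(1 for x in data if lower <= x < upper)
        classes.append((f"{lower}-<{upper}", frequency))
        lower = upper
    return classes


# 100 m times (seconds) from Worked example 7
times = [18.2, 20.1, 15.6, 13.5, 16.7, 15.9, 19.3, 22.5, 18.4, 15.9,
         12.4, 14.1, 17.7, 19.4, 21.0, 20.4, 18.2, 15.8, 16.1, 14.6]

for label, frequency in group_data(times, start=12, width=2):
    print(f"{label:<8}{frequency}")
```

The frequencies add to 20, one for each student, which confirms that the classes are distinct and together cover every value.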

11.4.2 Displaying numerical distributions


The type of display we choose to represent numerical data depends on whether that data is discrete or continuous. Displays of discrete data should make it clear that values between the listed values are impossible, so we usually insert gaps between the data values. On the other hand, displays of continuous data usually have no gaps between neighbouring values, as all values between the listed values are possible.
A column graph (or bar graph) is used when we wish to show a quantity. Categories are written on the horizontal axis and frequencies on the vertical axis.



WORKED EXAMPLE 8

The table below shows the results of the survey on favourite sports.
Construct a column graph to represent this information.

Sport          Frequency
AFL            6
Basketball     2
Cricket        7
Netball        2
Rugby League   3
Rugby Union    1
Soccer         2
Tennis         1

THINK
1. Draw the horizontal axis showing each sport.
2. Draw a vertical axis to show frequencies up to 7.
3. Draw the columns all the same width with gaps between.
4. Use a ruler.
5. Label the axes.
6. Give the graph a title.

WRITE
[Column graph titled ‘Favourite sports of 24 students’: Sport on the horizontal axis (AFL 6, Basketball 2, Cricket 7, Netball 2, Rugby League 3, Rugby Union 1, Soccer 2, Tennis 1); Frequency from 0 to 7 on the vertical axis; columns of equal width with gaps between them]

Sector graphs
A sector graph (circle graph, or pie graph) is used when we want the graph to display a comparison of quantities. For each category, the angle at the centre of the circle is the same fraction of 360° as the category’s frequency is of the total frequency.

WORKED EXAMPLE 9

The table below shows the results of a survey on favourite sports. Draw a sector graph of the results.

Sport          Frequency
AFL            6
Basketball     2
Cricket        7
Netball        2
Rugby League   3
Rugby Union    1
Soccer         2
Tennis         1

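1. THINK: Calculate each angle as a fraction of 360° by dividing the frequency of each sport by the total frequency and multiplying by 360°.
   WRITE:
   AFL = $\frac{6}{24} \times 360^\circ = 90^\circ$
   Basketball = $\frac{2}{24} \times 360^\circ = 30^\circ$
   Cricket = $\frac{7}{24} \times 360^\circ = 105^\circ$
   Netball = $\frac{2}{24} \times 360^\circ = 30^\circ$
   Rugby League = $\frac{3}{24} \times 360^\circ = 45^\circ$
   Rugby Union = $\frac{1}{24} \times 360^\circ = 15^\circ$
   Soccer = $\frac{2}{24} \times 360^\circ = 30^\circ$
   Tennis = $\frac{1}{24} \times 360^\circ = 15^\circ$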

2. THINK: Construct the graph, labelling each sector or providing a legend.
   WRITE: [Sector graph with a legend headed ‘Sport’: AFL 90°, Basketball 30°, Cricket 105°, Netball 30°, Rugby League 45°, Rugby Union 15°, Soccer 30°, Tennis 15°]



Histogram
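We can represent continuous numerical data using a histogram, which is similar to a column graph with a few essential differences: the columns are drawn with no gaps between them, and the horizontal axis is a continuous number scale.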
In a histogram, the width of each column represents a range of data values, while the height represents their
frequencies. For example, in the following histogram the first column represents the frequency of data values
that are greater than or equal to 10 but less than 20 (10 ≤ x < 20).

[Histogram: horizontal axis 'Data values' from 0 to 90 in steps of 10; vertical axis 'Frequency']
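To make the interval rule concrete, here is a small Python sketch that counts how many values fall in each column of a histogram, treating each interval as including its lower boundary and excluding its upper boundary. The function name `histogram_counts` and the sample values are illustrative assumptions only.

```python
def histogram_counts(values, edges):
    """Count values in each interval edges[i] <= x < edges[i + 1].

    Values outside the range from edges[0] up to edges[-1] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for x in values:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Edges 10, 20, ..., 90 give the columns 10 <= x < 20, ..., 80 <= x < 90.
edges = list(range(10, 100, 10))
sample = [10, 19.9, 20, 35, 89.9]  # hypothetical data values
print(histogram_counts(sample, edges))  # [2, 1, 1, 0, 0, 0, 0, 1]
```

Note that the value 20 is counted in the second column (20 ≤ x < 30), not the first, which is exactly what the notation 10 ≤ x < 20 means.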

WORKED EXAMPLE 10

The following frequency table represents the heights of players in a basketball squad.

Height (cm)    Frequency
175−<180       1
180−<185       3
185−<190       6
190−<195       3
195−<200       1
200−<205       1

Construct a histogram to represent this data.

THINK WRITE
1. Look at the data range and use the leading values from each interval in the table for the scale of the horizontal axis.
   The height data in the table has intervals starting from 175 cm and increasing by 5 cm.
   [Axes titled 'Heights of a basketball squad': horizontal axis 'Height (cm)' from 175 to 205 in steps of 5; vertical axis 'Frequency' from 0 to 7]
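2. Draw rectangles for each interval to the height of the frequency indicated by the data in the table.
   [Histogram titled 'Heights of a basketball squad': adjacent columns of heights 1, 3, 6, 3, 1, 1 over the intervals from 175 cm to 205 cm; horizontal axis 'Height (cm)'; vertical axis 'Frequency' from 0 to 7]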

Choosing which plot to use


Grouped data should be represented by a histogram, boxplot or dot plot. On the other hand, ungrouped data should usually be represented by a stem plot.
We should not use a stem plot to represent our data if the range of values in the data set is large, or if the data values have many digits (ignoring decimal places), as such stem plots can become unwieldy and difficult to use.


Exercise 11.4 Construct, describe and interpret column graphs and histograms

1. WE7 The following data represents the time (in seconds) it takes for each individual in a group of 20 students to swim 50 m.
48.5, 54.1, 63.0, 39.7, 51.3, 57.7, 68.4, 59.4, 37.5, 41.8,
72.3, 56.3, 45.4, 39.2, 60.3, 56.6, 48.1, 42.9, 53.3, 64.1

Group and display the data in a frequency table.



2. The following data set indicates the time, in seconds, it takes for a tram to travel between two stops on
20 weekday mornings.
95, 112, 99, 91, 105, 110, 97, 122, 108, 101, 95, 89, 100, 115, 124, 98, 87, 111, 115, 106
a. Group and display the data in a frequency table with intervals of width 10 seconds.
b. Group and display the data in a frequency table with intervals of width 5 seconds.
3. The data below show the number of customers that entered a shop each day in a certain month.
114, 195, 175, 163, 180, 120, 204, 199, 178, 216, 200, 147, 168, 173, 102, 150,
169, 185, 173, 164, 130, 199, 158, 163, 141, 155, 132, 143, 190, 179, 200
Display the data in a frequency table of 5 groups.
4. Construct a column graph to display the data from question 3.
5. The marks scored on a Maths exam, out of 100, by 25 Year
11 students are shown below.
87, 44, 95, 66, 78, 69, 66, 92, 78, 54, 60, 66, 69, 66, 77, 79, 66, 71, 71, 83,
74, 81, 69, 70, 57
Copy and complete the table.

Mark     Tally   Frequency
40–49
50–59
60–69
70–79
80–89
90–99
6. Construct a column graph to display the data from question 5.
7. Construct a sector graph to compare the number of people in each category from question 5.
8. A class of students was asked to identify the make of car their family owned. Their responses are shown below.
Holden Ford Nissan Mazda Toyota Holden
Ford Holden Ford Mitsubishi Toyota Toyota
Nissan Holden Holden Ford Toyota Mazda
Mazda Toyota Ford Holden Holden Ford
Mitsubishi Toyota Holden Ford Ford Toyota
Construct a frequency table to display these results.
9. WE8 Construct a column graph to display the data from question 8.
10. WE9 Construct a sector graph to display the data from question 8.

11. WE10 The following frequency table represents the cholesterol levels measured for a group of people. Construct a histogram to represent this data.

Cholesterol level (mmol/L)   1−<2.5   2.5−<4.0   4.0−<5.5   5.5−<7.0   7.0−<8.5
Frequency                    2        8          12         14         10

12. The following frequency table represents the distances travelled to school by a group of students. Construct a histogram to represent this data.

Distance travelled (km)   0−<2   2−<4   4−<6   6−<8   8−<10
Frequency                 18     26     14     8      2

13. Organise the following data set into a frequency table using intervals of five, commencing from the lowest value. Then draw a histogram to represent the data.
5, 7, 14, 17, 13, 24, 22, 15, 12, 26, 17, 15, 14, 13, 15, 7, 8, 13, 17, 24,
22, 7, 13, 20, 12, 15, 23, 20, 17, 15, 17, 16, 20, 23, 15, 16, 8, 17, 14, 15
14. Organise the following data set into a frequency table using intervals of five, commencing from the lowest value. Then draw a histogram to represent the data.
34, 28, 45, 46, 13, 24, 11, 33, 41, 35, 16, 15, 35, 13, 14, 28, 27, 22, 36, 31,
11, 18, 24, 20, 12, 15, 41, 50, 27, 13, 14, 16, 20, 23, 31, 26, 25, 27, 34, 35

11.5 Measures of centre


11.5.1 Describing distributions
The distribution of a set of data can be described in terms of a number of key features, including shape,
modality, spread and outliers.

Shape
The shape of a numerical distribution indicates which measures are most suitable for further analysis, and revealing it is one of the most important reasons for displaying the data in graphical form. Shape will generally be described in terms of symmetry or skew. Symmetrical data distributions have their highest frequencies around the centre, with the remaining values spread fairly evenly to either side, while skewed distributions have the majority of their values towards one end. Distributions with higher frequencies on the left side of the graph (and a tail to the right) are positively skewed, while those with higher frequencies on the right side (and a tail to the left) are negatively skewed.
[Figure: frequency graphs of a symmetrical distribution and a positively skewed distribution]


[Figure: frequency graph of a negatively skewed distribution]

Modality
The mode of a distribution is the data value or class interval that has the highest frequency. This will be the longest column or row on the display. When there is more than one mode, the data distribution is multimodal. This can indicate that there may be subgroups within the distribution that require further investigation. Bimodal distributions can occur when there are two distinct groups present, such as in measurements that typically differ clearly between males and females.
[Figure: frequency graph of a bimodal distribution]
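Because a distribution can have more than one mode, a helper that finds the mode should return every value that shares the highest frequency. A minimal Python sketch, using the standard-library `collections.Counter` (the function name `modes` is our own), applied to an 18-value data set that turns out to be multimodal:

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, in order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


data = [6, 3, 4, 5, 7, 7, 4, 8, 5, 10, 6, 10, 9, 8, 3, 6, 5, 4]
print(modes(data))  # [4, 5, 6]: three modes, so the data is multimodal
```

The same function works on class-interval labels (for example, strings such as "10−<15"), since it only compares how often each value occurs.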

Spread
An awareness of how widely spread the data is can be an important consideration when conducting any further
analysis. Common indicators of spread include the measures of range and the standard deviation. The graph
will again point to which measures might be most appropriate to use.

Outliers
An outlier is a data value that is an anomaly when compared to the majority of the sample. Sometimes outliers are just unusual readings or measurements, but they can also be the result of errors when recording the data. Outliers can have a significant effect on some of the measures that are used for further data analysis, and they are sometimes removed from the sample for those calculations. The graphical display of the data can alert us to the presence of potential outliers.
[Figure: frequency graph showing a potential outlier]

WORKED EXAMPLE 11

Describe the distribution of the data shown in the following histogram.

[Histogram: horizontal axis from 1 to 13 in steps of 1; vertical axis from 0 to 5]

THINK WRITE
1. Look for the mode and comment on its value.
   The distribution has one mode, with data values most frequently in the 2 ≤ x < 3 interval.
2. Identify the presence of any potential outliers.
   There is one potential outlier in the interval between 12 and 13.
3. Describe the shape in terms of symmetry or skewness.
   If we include the outlier, the data set can be described as positively skewed, as it is clustered to the left. If we don’t include the outlier, the distribution can be considered to be approximately symmetrical.

11.5.2 Measures of centre


In many practical settings it is common to use a single measurement to represent an entire set of data. For
example, discussions about fuel costs will often focus on the average price of petrol, while in real estate
the median house price is considered an important measurement. These representative values are known as
measures of centre as they are located in the central region of the data. The mean, median and mode are all
measures of centre, and the most appropriate one to use depends on various characteristics of the data set.

11.5.3 The mean (or arithmetic mean)


The mean of a data set is what we commonly refer to as the average. It is calculated by dividing the sum of the data values by the number of data values. If the data set is a sample of the population, the symbol used for the mean is x̄ (pronounced ‘x-bar’), whereas if the data set is the whole population, we use the Greek letter 𝜇 (pronounced ‘mu’).

$\bar{x} = \frac{\text{sum of data values}}{\text{number of data values}} = \frac{\sum x_i}{n} \quad \left(\text{or } \mu = \frac{\sum x_i}{n}\right)$

The Greek letter Σ (sigma) means ‘the sum of’, so Σxᵢ is the sum of all the data values.
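The formula translates directly into code. The following Python sketch (the function name `mean` is our own choice) uses the data from Worked Example 12 below and checks the result against Python's standard-library `statistics.mean`.

```python
import statistics


def mean(values):
    """Mean: the sum of the data values divided by how many there are."""
    return sum(values) / len(values)


data = [6, 3, 4, 5, 7, 7, 4, 8, 5, 10, 6, 10, 9, 8, 3, 6, 5, 4]
print(sum(data), len(data))   # 110 18
print(round(mean(data), 2))   # 6.11
# The standard library's statistics.mean gives the same value.
print(round(statistics.mean(data), 2))
```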

WORKED EXAMPLE 12

Calculate the mean of the following data set, correct to 2 decimal places.
6, 3, 4, 5, 7, 7, 4, 8, 5, 10, 6, 10, 9, 8, 3, 6, 5, 4

THINK WRITE
1. Calculate the sum of the data values.
   6 + 3 + 4 + 5 + 7 + 7 + 4 + 8 + 5 + 10 + 6 + 10 + 9 + 8 + 3 + 6 + 5 + 4 = 110
2. Divide the sum by the number of data values.
   $\bar{x} = \frac{110}{18} = 6.111\ldots$
3. State the answer.
   The mean of the data set is 6.11.



Calculating the mean for grouped data
To calculate the mean from a table of data that has been organised into groups, we first need to calculate the
midpoints of the intervals. We then multiply the values of the midpoints by the corresponding frequencies,
and find the sum of these values. Finally, we divide this sum by the total of the frequencies.

If f = the values of the frequencies and x = the values of the midpoints, then $\bar{x} = \frac{\sum xf}{\sum f}$.
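The same steps can be sketched in Python, assuming each row of the table is stored as a tuple (lower bound, upper bound, frequency); the function name `grouped_mean` is hypothetical. The data is the frequency table from Worked Example 13 below.

```python
def grouped_mean(table):
    """Estimate the mean from grouped data as sum(x * f) / sum(f).

    `table` holds (lower, upper, frequency) rows for intervals lower <= x < upper;
    each interval's midpoint x stands in for the values inside it.
    """
    total_f = sum(f for _, _, f in table)
    total_xf = sum((lower + upper) / 2 * f for lower, upper, f in table)
    return total_xf / total_f


table = [(0, 5, 3), (5, 10, 12), (10, 15, 3), (15, 20, 2)]
print(grouped_mean(table))  # 8.5
```

Because the midpoints stand in for the actual values, the result is an estimate of the mean of the original (ungrouped) data.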

WORKED EXAMPLE 13

Calculate the mean of the data set displayed in the following frequency table.

Intervals Frequency
0−<5 3
5−<10 12
10−<15 3
15−<20 2

THINK WRITE
1. Add a column to the table and enter the midpoints for the corresponding intervals.
   Intervals   Frequency   Midpoint (x)
   0−<5        3           2.5
   5−<10       12          7.5
   10−<15      3           12.5
   15−<20      2           17.5
2. Add a column to the table and enter the product of the frequencies and midpoints (xf) for the corresponding intervals.
   Intervals   Frequency   Midpoint (x)   xf
   0−<5        3           2.5            7.5
   5−<10       12          7.5            90
   10−<15      3           12.5           37.5
   15−<20      2           17.5           35
3. Calculate the totals of the f and xf columns.
   Intervals   Frequency   Midpoint (x)   xf
   0−<5        3           2.5            7.5
   5−<10       12          7.5            90
   10−<15      3           12.5           37.5
   15−<20      2           17.5           35
               Σf = 20                    Σxf = 170
4. Calculate the mean using the formula $\bar{x} = \frac{\sum xf}{\sum f}$.
   $\bar{x} = \frac{170}{20} = 8.5$
5. State the answer.
   The mean of the distribution is 8.5.

11.5.4 The median


When considering a value that truly indicates the centre of a distribution, it makes sense to look at the number that is actually in the middle of the data set. The median of a distribution is the middle value of the ordered data set if there is an odd number of values. If there is an even number of values, the median is halfway between the two middle values. Its position can be found using the rule:

Median = $\left(\frac{n+1}{2}\right)$th data value
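The position rule can be sketched in Python as follows, using the two data sets from Worked Example 14 below. The function name `median` is our own; Python's standard-library `statistics.median` gives the same results for these data sets.

```python
import math


def median(values):
    """Median using the ((n + 1) / 2)th-position rule on the ordered data."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2  # 1-based position, e.g. 9.5 when n = 18
    # For a whole-number position both indices coincide; for a half position
    # they pick out the two middle values, and we take the value halfway between.
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


print(median([5, 3, 4, 5, 7, 7, 4, 8, 5, 10, 6, 10, 9, 8, 3, 6, 5, 4]))  # 5.5
print(median([16, 3, 4, 5, 17, 27, 14, 18, 15, 10, 6, 10, 9, 8, 23, 26, 35]))  # 14.0
```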

WORKED EXAMPLE 14

Calculate the median of the following data sets.


a. 5, 3, 4, 5, 7, 7, 4, 8, 5, 10, 6, 10, 9, 8, 3, 6, 5, 4

b. 16, 3, 4, 5, 17, 27, 14, 18, 15, 10, 6, 10, 9, 8, 23, 26, 35

THINK WRITE
a. 1. Put the data set in order from lowest to highest.
      3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 10
   2. Identify the data value in the $\left(\frac{n+1}{2}\right)$th position.
      There are 18 data values, so the median will be in position $\frac{18+1}{2} = 9.5$, that is, halfway between position 9 and position 10.
      3, 3, 4, 4, 4, 5, 5, 5, [5, 6], 6, 7, 7, 8, 8, 9, 10, 10
      The 9th value is 5 and the 10th value is 6, so median = 5.5.


   3. State the answer.
      The median of the data set is 5.5.
b. 1. Put the data set in order from lowest to highest.
      3, 4, 5, 6, 8, 9, 10, 10, 14, 15, 16, 17, 18, 23, 26, 27, 35
   2. Identify the data value in the $\left(\frac{n+1}{2}\right)$th position.
      There are 17 data values, so the median will be in position $\frac{17+1}{2} = 9$.
      3, 4, 5, 6, 8, 9, 10, 10, [14], 15, 16, 17, 18, 23, 26, 27, 35
      median = 14
   3. State the answer.
      The median of the data set is 14.

11.5.5 Limitations of measures of centre


The mean
An important property of the mean is that it includes all the data in its calculation. As such, it has genuine credibility as a representative value for the distribution. On the other hand, this property also makes it susceptible to being distorted by values that are extreme compared with the rest of the distribution.
Consider the data set 3, 4, 5, 6, 7, 8, 9, which has a mean of
$\frac{3+4+5+6+7+8+9}{7} = \frac{42}{7} = 6$.
Compare this to a data set with the same values except for the largest one, 3, 4, 5, 6, 7, 8, 90, which has a mean of
$\frac{3+4+5+6+7+8+90}{7} = \frac{123}{7} \approx 17.6$.
As we can see, the mean has been significantly influenced by the one extreme value.
When the data is skewed or contains extreme values, the mean becomes less reliable as a measure of centre.

The median
In the previous example we saw that the mean is affected by an extreme value in the data set; however, the same cannot be said for the median. In both data sets, 3, 4, 5, 6, 7, 8, 9 and 3, 4, 5, 6, 7, 8, 90, the median is the value in the fourth position, 6.
The median is therefore more reliable than the mean when the data is skewed or contains extreme values.
Another potential advantage of the median is that it is often one of the data values, whereas the mean often isn’t.
However, the median can be considered unrepresentative, as it depends only on the middle of the ordered data and does not take into account the actual values of the rest of the data set.
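A short Python check of this comparison, using the standard-library `statistics.mean` and `statistics.median` functions on the two data sets above:

```python
from statistics import mean, median

original = [3, 4, 5, 6, 7, 8, 9]
with_extreme = [3, 4, 5, 6, 7, 8, 90]  # largest value replaced by 90

for name, data in [("original", original), ("with extreme value", with_extreme)]:
    print(f"{name}: mean = {mean(data):.1f}, median = {median(data)}")
```

Replacing 9 with 90 nearly triples the mean (from 6.0 to about 17.6), while the median stays at 6.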
Choosing between measures of centre
In most situations it is preferable to give both the median and the mean as measures of centre, as together they give a more accurate picture of the data set. However, sometimes it is only possible to give one of these values to represent our data set.
When choosing which measure of centre to use to represent a data set, take into account the distribution of the data. If the data has no outliers and is approximately symmetrical, then the mean is probably the better measure of centre to represent the data. If there are outliers, the median will be much less affected by them and would be a better choice to represent the data. The median is also a good choice to represent skewed data.
Also consider what each measure of centre tells you about the data. The values of the mean and median can vary significantly, so the choice of which one represents the data set can be important, and you will need to justify your choice.

WORKED EXAMPLE 15

The following histogram represents the IQ test results for a group of people.
[Histogram: horizontal axis 'IQ' from 60 to 160 in steps of 10; vertical axis 'Frequency']
Determine which measure of centre is best to represent the data set.
THINK WRITE
1. Look at the distribution of the data set.
   The data set is approximately symmetrical and has no outliers.
2. If the data set is approximately symmetrical with no outliers, the mean is probably the better measure of centre to represent the data set. If there are outliers or the data is skewed, the median is probably the better measure of centre to use. State the answer.
   The mean is the better measure of centre to represent this data set.




Exercise 11.5 Measures of centre
1. WE11 Describe the numerical distributions shown by the following histograms.
a. [Histogram: horizontal axis from 0 to 45 in steps of 5]
b. [Histogram: horizontal axis from 0 to 135 in steps of 15]
2. Describe the distribution of the following data sets after drawing histograms with intervals of
10 commencing with the smallest values.
a. 105, 70, 140, 127, 132, 124, 122, 125, 123, 126, 107, 105, 104, 113, 125, 70, 88, 103, 107,
124, 122, 76, 103, 120, 112, 115, 123, 120, 117, 115, 107, 106, 120, 123, 115, 74, 128, 119
b. 4, 18, 35, 26, 12, 25, 21, 34, 43, 37, 6, 25, 25, 23, 34, 38, 37, 22, 36, 31,
21, 28, 34, 30, 32, 25, 31, 40, 37, 33, 24, 26, 10, 13, 21, 36, 35, 37, 24, 25
3. A group of 26 students received the following marks on a test:
6, 4, 3, 8, 6, 9, 5, 6, 9, 7, 7, 8, 5, 7, 4, 3, 8, 6, 5, 7, 9, 5, 6, 6, 7, 8
a. Construct a dot plot to display the data.
b. Describe the distribution.
4. WE12 Calculate the mean of the following data set.

108, 135, 120, 132, 113, 138, 125, 138, 107, 131, 113, 136, 119, 152, 134, 158, 136, 132, 113, 128
5. a. Calculate the mean of the following data set correct to 2 decimal places.
25, 23, 24, 25, 27, 26, 23, 28, 24, 20, 25, 20, 29, 28, 23, 27, 24
b. Replace the highest value in the data set from part a with the number 79, and then calculate the mean
again, correct to 2 decimal places.
c. How did changing the highest value in the data set affect the mean?
6. WE13 Calculate the means of the data sets displayed in the following tables, giving your answer correct
to 2 decimal places.
a.
Intervals   Frequency
0–<5        12
5–<10       10
10–<15      1
15–<20      4

b.
Intervals   Frequency
20–<35      12
35–<50      6
50–<65      13
65–<80      4

7. For each of the following sets of data, estimate the mean by creating a table using intervals that
commence with the lowest data value and increase by an amount that is equal to the difference between
the highest and lowest data value divided by 5. Give your answers correct to 2 decimal places.
a. 205, 203, 204, 205, 207, 216, 213, 218, 214, 220, 225, 220, 229, 228, 233, 238, 234
b. 5, 13, 24, 5, 27, 16, 13, 18, 24, 10, 5, 20, 30, 18, 13, 7, 14
8. Calculate the means of the following data sets correct to 2 decimal places.

a. Key: 1 | 2 = 12
Stem Leaf
0 1 1 5 7
1 2 6
2 3 4 4 5
3 1 3
4 0 0 3
5 5
6 5

b. [Histogram: horizontal axis from 0 to 35 in steps of 5; vertical axis from 0 to 10]

9. WE14 Calculate the medians of the following data sets.
a. 15, 3, 54, 53, 27, 72, 41, 85, 15, 11, 62, 16, 49, 81, 53, 56, 75, 42
b. 126, 301, 422, 567, 179, 267, 149, 198, 165, 170, 602, 180, 109, 85, 223, 206, 335
10. a. Calculate the median of the following data set.
21, 22, 23, 24, 27, 26, 22, 27, 23, 21, 24, 20, 31, 25, 24, 28, 23
b. Replace the highest value in the data set from part a with the number 96 and then calculate the
median again.
c. How does changing the highest value in the data set affect the median?
11. Calculate the median of the following data sets.
a. [Dot plot on a number line from 76 to 112 in steps of 4]

b. 1.02, 2.01, 3.21, 4.63, 1.49, 3.45, 1.17, 1.38, 1.47, 1.70, 5.02, 1.38, 1.91, 8.54
12. WE15 The following stem plot represents the lifespan of different animals at an animal sanctuary.
Determine which measure of centre is better to represent the data set.

Key: 1 | 2 = 12
Stem Leaf
0 3 5 9
1 2 4 6 8
2 0 1 4 5 5 7 9
3 0 2 6
4
5
6 0 3



13. The following data set represents the salaries (in $000s) of workers at a small business.
45, 50, 55, 55, 55, 60, 65, 65, 70, 70, 75, 80, 220
a. Calculate the mean of the salaries correct to 2 decimal places.
b. Calculate the median of the salaries.
c. When it comes to negotiating salaries, the workers want to use the mean to represent the data and the
management want to use the median. Explain why this might be the case.
14. a. Calculate the mean (correct to 2 decimal places) and median for the following data set.

Average annual rainfall in selected Australian cities


City Rainfall (mm)
Sydney 1276
Melbourne 654
Brisbane 1194
Adelaide 563
Perth 745
Hobart 576
Darwin 1847
Canberra 630
Alice Springs 326

b. Which would be the more appropriate measure of centre to represent this data?
15. On a particular weekend, properties are sold at auction for the following 30 prices:
$4 700 000, $3 160 000, $2 725 000, $2 616 000, $2 560 000, $241 000,
$265 000, $266 000, $310 000, $320 000, $3 010 000, $2 580 000,
$2 450 000, $2 300 000, $2 275 000, $286 000, $325 000, $330 000,
$435 500, $456 000, $1 350 000, $1 020 000, $900 000, $735 000,
$733 000, $305 000, $330 000, $347 000, $357 000, $408 000
a. Calculate the mean and median for the data.
b. Construct a histogram of the data using intervals commencing at the lowest value and increasing by amounts
of $250 000.
c. Mark in the location of the mean and median on the histogram.
d. Which would be the more appropriate measure of centre to represent this data?
16. The heights in metres of fruit trees in an orchard were measured with the following results:

1.83, 1.94, 1.98, 1.91, 1.88, 1.76, 2.12, 2.05, 2.11, 2.01, 2.04, 2.08,
2.07, 2.06, 2.05, 2.03, 1.94, 1.96, 2.12, 2.14, 2.04, 2.01, 2.03, 2.06,
2.02, 1.94, 1.98, 2.25, 2.04, 2.06
a. Use intervals of 0.05 m starting with 1.75−<1.80 to display the data in
a frequency table.
b. Construct a histogram to display the data and use it to comment on the
appropriateness of using either the mean or the median to represent
the data.
c. Use the frequency table to calculate the mean.

17. The winning margins in the NRL over a particular period of time were as follows.

Winning margin Frequency


2 4
4 12
6 8
8 5
10 4
12 4
16 1
20 1
34 1

a. Calculate the mean and the median.
b. Which is the more appropriate measure of centre for this data set and why?
18. The value of the Australian dollar in US cents over a particular period of time was as follows:
93, 91, 88, 94, 86, 90, 93, 95, 84, 81, 91, 96, 99, 101, 106, 104, 104, 99, 99, 96, 94, 95, 91,
90, 89, 88, 89, 86, 88, 87, 83, 88, 84, 85, 86, 86, 87, 88, 87, 84
a. Calculate the mean and median of the raw data.
b. Construct a histogram to display the data, using intervals commencing at 80 – < 85.
c. Mark in the positions of the mean and median on the histogram.
d. Comment on the positions of the mean and median.
19. The annual earnings of a group of professional tennis players are as follows:
$5 700 000, $1 125 000, $620 000, $4 950 000, $275 000, $220 000, $242 000, $350 000,
$375 000, $300 000, $422 000, $2 150 000, $270 000, $420 000, $300 000, $245 000,
$385 000, $284 000, $320 000, $444 000, $185 000, $200 500, $264 000, $290 000
a. Calculate the mean and median of the raw data. Give your answers correct to the nearest dollar.
b. Construct a histogram to display the data, using intervals commencing with $180 000–<$380 000.
c. Mark in the positions of the mean and median on the histogram.
d. Which is the more appropriate measure of centre for this data? Justify your response.
20. The body mass index (BMI) is an accepted measure of obesity with a value of 30 or more being the
obese category. The BMI results for a group of people are shown in the table.

22.5 31.4 28.4 18.5 33.2 26.3


27.1 28.6 31.2 21.2 19.8 20.4
20.7 26.4 29.4 27.1 31.6 21.4
34.1 32.1 26.3 21.4 27.3 23.2
28.3 21.4 26.1 26.3 28.4 29.1
22.8 23.7 20.4 28.1 30.4 22.4
18.1 22.5 24.3 25.2 24.7 30.2



a. Display the data in two frequency tables and draw the corresponding histograms.
i. In the first frequency table, use intervals commencing at the lowest value and increasing by an
amount that is calculated by dividing the difference between the lowest and highest data value by 5.
ii. In the second frequency table, use intervals commencing at the lowest value and increasing by an amount that is calculated by dividing the difference between the lowest and highest data value by 10.
b. Describe the two histograms.
c. Calculate the mean for each frequency table and compare them to the mean of the raw data. Give
your answers correct to 2 decimal places.
d. Which measure of centre is the better representation of this data? Justify your response.

11.6 Measures of spread


11.6.1 Measures of centre and measures of spread
While measures of centre such as the mean or median give valuable information about a set of data,
taken in isolation they can be quite misleading. Take for example the data sets {36, 43, 44, 59, 68} and
{1, 2, 44, 80, 123}. Both groups have a mean of 50 and a median of 44, but the values in the second set are
much further apart from each other. Measures of centre tell us nothing about how variable the data values in
a set might be; for this we need to consider the measures of spread of the data.

11.6.2 Range and quartiles


Range
In the simplest terms, the spread of a data set can be determined by looking at the difference between the smallest and largest values. This is called the range of the distribution. While the range is a useful calculation, it is also limited: because it uses only the two most extreme values, a single outlier will result in the range giving a misleading indication of the spread of the data.

Range = largest value – smallest value
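As a quick illustration, the following Python sketch (the function name `data_range` is our own) applies this formula to the two data sets from section 11.6.1, which share a mean of 50 and a median of 44 but have very different ranges.

```python
from statistics import mean, median


def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


set_1 = [36, 43, 44, 59, 68]
set_2 = [1, 2, 44, 80, 123]
for data in (set_1, set_2):
    print(mean(data), median(data), data_range(data))
# Same mean (50) and median (44), but ranges of 32 and 122.
```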

Quartiles
A clearer picture of the spread of data can be obtained by looking at smaller sections. A common way to do this is to divide the ordered data into quarters; the values that mark these divisions are known as quartiles.
The lower quartile (Q1) is the median of the lower half of the data.
The second quartile (Q2) is the median of the whole distribution of data.
The upper quartile (Q3) is the median of the upper half of the data.
When calculating the values of the lower and upper quartiles, the median should not be included in either half. If the median lies between two data values (that is, there is an even number of values), then those two values are included, one in each half.

The interquartile range


The interquartile range (IQR) is found by calculating the difference between the third and first quartiles (Q3 − Q1), which gives an indication of the spread of the middle 50% of the data.
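The rule above can be sketched in Python as follows, using the data from Worked Example 16 below. The function name `quartiles` is our own. Software packages use a variety of quartile conventions (Python's `statistics.quantiles`, for example, interpolates between data values), so quartiles calculated elsewhere may differ slightly from this method for some data sets.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3), excluding the median from both halves.

    With an odd number of values the middle value is left out of both halves;
    with an even number the data splits into two equal halves.
    Assumes at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


data = [23, 34, 67, 17, 34, 56, 19, 22, 24, 56, 56, 34, 23, 78, 22, 16, 15, 35, 45]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, "IQR =", q3 - q1)  # 22 34 56 IQR = 34
```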

WORKED EXAMPLE 16

Calculate the interquartile range of the following set of data.


23, 34, 67, 17, 34, 56, 19, 22, 24, 56, 56, 34, 23, 78, 22, 16, 15, 35, 45

THINK WRITE
1. Put the data in order.
   15, 16, 17, 19, 22, 22, 23, 23, 24, 34, 34, 34, 35, 45, 56, 56, 56, 67, 78
2. Identify the median.
   There are 19 data values, so the median will be in position $\frac{19+1}{2} = 10$.
   15, 16, 17, 19, 22, 22, 23, 23, 24, [34], 34, 34, 35, 45, 56, 56, 56, 67, 78
   The median is 34.
3. Identify Q1 by finding the median of the lower half of the data.
   There are 9 values in the lower half of the data, so Q1 will be the 5th of these values.
   15, 16, 17, 19, [22], 22, 23, 23, 24
   Q1 = 22
4. Identify Q3 by finding the median of the upper half of the data.
   There are 9 values in the upper half of the data, so Q3 will be the 5th of these values.
   34, 34, 35, 45, [56], 56, 56, 67, 78
   Q3 = 56
5. Calculate the interquartile range using IQR = Q3 − Q1.
   IQR = Q3 − Q1
       = 56 − 22
       = 34
6. State the answer.
   The interquartile range is 34.

11.6.3 Spread around the mean


When the mean is used as a representative value for data, it makes sense to take note of how much the data varies in comparison to the mean. Two indicators of the spread of data around the mean are the variance and the standard deviation. These measures generally only apply to continuous numerical data. The larger the variance and standard deviation are, the more spread out the data is away from the mean.
Variance
The variance is calculated by finding the difference between each data value and the mean. Because values below the mean give negative differences, each difference is squared. The squared differences are then added together and, for a sample, divided by one less than the number of data values to give a single number. The variance of a sample is calculated using the following formula:

Sample variance: $s^2 = \frac{\sum f(x - \bar{x})^2}{(\sum f) - 1}$



Standard deviation
The standard deviation is calculated by taking the square root of the variance.


Sample standard deviation: $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{(\sum f) - 1}}$

or, for ungrouped data, $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Taking the square root reverses the earlier squaring of the differences between the data values and the mean, so the standard deviation is expressed in the same units as the original data.
The following example shows that the variance and standard deviation can become very messy to calculate
once you have large groups of data. Spreadsheets, calculators and similar technologies are a more practical
and reliable option for these computations.
The table shows a grouped distribution of a sample of data with a mean of 6.5.

Intervals   Frequency (f)   Midpoint (x)   xf
0–<5        2               2.5            5
5–<10       8               7.5            60
                                           Σxf = 65

The second last column in the lower table shows the square of the difference between the midpoint and the mean, and the last column shows this value multiplied by the frequency for the interval.

Intervals   Frequency (f)   Midpoint (x)   xf         (x − x̄)²               f(x − x̄)²
0–<5        2               2.5            5          (2.5 − 6.5)² = 16      32
5–<10       8               7.5            60         (7.5 − 6.5)² = 1       8
            Σf = 10                        Σxf = 65                          Σf(x − x̄)² = 40

The sum of the final column can then be used with the sum of the frequency column in the formulas to calculate the variance and standard deviation of the sample.

Sample variance: $s^2 = \frac{\sum f(x - \bar{x})^2}{(\sum f) - 1} = \frac{40}{9} \approx 4.44$

Sample standard deviation: $s = \sqrt{4.44} \approx 2.11$
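As the text suggests, technology is more practical for these calculations. The following Python sketch (the function name `grouped_sample_variance` is our own) repeats the calculation above and also checks the answers to Worked Example 17 below. It takes the square root of the unrounded variance, which avoids carrying a rounding error forward. For ungrouped sample data, the standard-library functions `statistics.variance` and `statistics.stdev` use the same n − 1 divisor.

```python
import math


def grouped_sample_variance(table):
    """Sample variance from grouped data: sum(f * (x - mean)**2) / (sum(f) - 1).

    `table` holds (lower, upper, frequency) rows; each interval's midpoint x
    stands in for the values inside it.
    """
    rows = [((lower + upper) / 2, f) for lower, upper, f in table]
    total_f = sum(f for _, f in rows)
    mean = sum(x * f for x, f in rows) / total_f
    squared_deviations = sum(f * (x - mean) ** 2 for x, f in rows)
    return squared_deviations / (total_f - 1)


for table in ([(0, 5, 2), (5, 10, 8)], [(10, 15, 8), (15, 20, 12)]):
    variance = grouped_sample_variance(table)
    print(round(variance, 2), round(math.sqrt(variance), 2))
# 4.44 2.11  (the example above)
# 6.32 2.51  (Worked Example 17)
```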

WORKED EXAMPLE 17

Calculate the variance and standard deviation for the sample from the information shown in the
table. Give your answers correct to 2 decimal places.

Intervals   Frequency (f)   Midpoint (x)   xf          (x − x̄)²                 f(x − x̄)²
10–<15      8               12.5           100         (12.5 − 15.5)² = 9       72
15–<20      12              17.5           210         (17.5 − 15.5)² = 4       48
            Σf = 20                        Σxf = 310

THINK WRITE
1. Sum the f(x − x̄)² column.
   Σf(x − x̄)² = 72 + 48
              = 120
2. Substitute the values into the formulas for variance and standard deviation.
   $s^2 = \frac{120}{19} \approx 6.32$
   $s = \sqrt{6.32} \approx 2.51$
3. State the answer.
   The variance of the sample is 6.32 and the standard deviation of the sample is 2.51.

11.6.4 Preferred measures of spread


The standard deviation is generally considered the preferred measure of the spread of a distribution when
there are no outliers and no skew, as all of the data contributes to its calculation. When there are outliers or the
data is skewed, the interquartile range is a better option as it is not adversely influenced by extreme values.
As the interquartile range is calculated on the basis of just two numbers that may or may not be actual
values from the data set, it could be considered to be unrepresentative of the data set.


Exercise 11.6 Measures of spread


1. WE16 Calculate the interquartile range of the following set of data.
421, 331, 127, 105, 309, 512, 129, 232, 124, 154, 246, 124, 313, 218, 112, 136, 155, 305, 415
2. Calculate the interquartile range of the following set of data.
3.11, 3.16, 1.13, 1.56, 3.19, 4.43, 1.98, 4.89, 2.12, 4.78, 3.21, 8.88, 1.21, 5.67, 2.22, 3.34



3. The results for a multiple choice test for 20 students in two different classes are as follows.
Class A: 7, 13, 14, 13, 14, 14, 12, 8, 18, 13, 14, 12, 16, 14, 12, 11, 13, 14, 13, 15
Class B: 18, 19, 12, 12, 11, 17, 9, 18, 17, 14, 13, 11, 17, 13, 17, 14, 14, 15, 13, 12
a. Compare the spread of the marks for each class by using the range.
b. Compare the spread of the marks for each class by using the interquartile range.
4. The competition ladder of the Australian and New Zealand netball championship is as follows.

Position Team Win Loss Goals for Goals against


1 Adelaide Thunderbirds 12 1 688 620
2 Melbourne Vixens 9 4 692 589
3 Waikato BOP Magic 9 4 749 650
4 Queensland Firebirds 9 4 793 691
5 Central Pulse 8 5 736 706
6 Southern Steel 6 7 812 790
7 West Coast Fever 5 8 715 757
8 NSW Swifts 4 9 652 672
9 Canterbury Tactix 2 11 700 882
10 Northern Mystics 1 12 699 879

a. Calculate the spread for the ‘Goals for’ column by using the range.
b. Calculate the spread for the ‘Goals for’ column by using the interquartile range.
c. Compare the spread of the ‘Goals for’ column with the spread of the ‘Goals against’ column.
5. WE17 Calculate the variance and standard deviation for the sample from the information shown in the
table. Give your answers correct to 2 decimal places.

Intervals   Frequency (f)   Midpoint (x)   xf    (x − x̄)²                f(x − x̄)²
0– < 10     14              5              70    (5 − 8.3)² = 10.89      152.46
10– < 20    7               15             105   (15 − 8.3)² = 44.89     314.23
            ∑f = 21                        ∑xf = 175

6. Complete the table and calculate the variance and standard deviation for the following sample correct to
3 decimal places.

Intervals   Frequency (f)   Midpoint (x)   xf      (x − x̄)²   f(x − x̄)²
0– < 50     35
50– < 100   125
            ∑f =                           ∑xf =              ∑f(x − x̄)² =

456 Jacaranda Maths Quest 11 General Mathematics Units 1 & 2 for Queensland
7. Complete the details of the following table, which shows the results of a survey of the ages of a sample
of workers in the hospitality industry.

Age group (years) Frequency ( f ) Midpoint (x) xf

15– < 20 14

20– < 25 18

25– < 30 11

30– < 35 7

35– < 40 5

∑f = ∑ xf =

8. A survey recorded the number of motor vehicles that passed a school between 8.30 am and 9.30 am on 10 days during a term. The results are as follows.
72, 89, 94, 78, 83, 84, 88, 97, 82, 88
a. Calculate the standard deviation of the sample correct to
2 decimal places.
b. Calculate the interquartile range of the sample.
c. The lowest number is reduced by 10 and the highest
value increased by 10. Recalculate the values of the
standard deviation and interquartile range.
d. How is each of the measures affected by the change in the values?
9. A survey of a large sample of people from particular areas of employment found the following average
Australian salary ranges.

Employment area Average minimum Average maximum


Mining $65 795 $262 733
Management $66 701 $240 000
Engineering $56 572 $233 451
Legal $53 794 $193 235
Building and construction $46 795 $186 412
Telecommunications $47 354 $193 735
Science $47 978 $211 823
Medical $42 868 $228 806
Sales $42 917 $176 783

a. Calculate the interquartile range for the average minimum salaries.


b. Calculate the interquartile range for the average maximum salaries.
c. Comment on the two interquartile ranges.



10. A sample of crime statistics over a two-year period is shown in the following table.

Crime Year 1 Year 2


Theft from motor vehicle 46 700 42 900
Theft (steal from shop) 19 800 20 600
Theft of motor vehicle 15 650 14 670
Theft of bicycle 4 200 4 660
Theft (other) 50 965 50 650

a. Calculate the interquartile range and standard deviation (correct to 1 decimal place) for both years.
b. Recalculate the interquartile range and standard deviation for both years after removing the smallest
category.
c. Comment on the effect of removing the smallest category on the interquartile ranges and standard
deviations.
11. The table shows the number of registered passenger vehicles in two particular years for the states and
territories of Australia.

Number of passenger vehicles


Year 1 Year 2
New South Wales 3 395 905 3 877 515
Victoria 2 997 856 3 446 548
Queensland 2 138 364 2 556 581
South Australia 915 059 1 016 590
Western Australia 1 205 266 1 476 743
Tasmania 271 365 305 913
Northern Territory 73 302 91 071
Australian Capital Territory 191 763 229 060

a. Calculate the interquartile range and standard deviation (correct to 1 decimal place) for both years.
b. Recalculate the interquartile range and standard deviation for both years after removing the three smallest values.
c. Comment on the effect of the removal of the three smallest values on the interquartile ranges and
standard deviations.
12. Data collected on the number of daylight hours in Alice Springs is as shown.

10.3, 9.8, 9.6, 9.5, 8.5, 8.4, 9.1, 9.8, 10.0, 10.0, 10.1, 10.0, 10.1, 10.1, 10.6, 8.7, 8.8, 9.0, 8.0,
8.5, 10.6, 10.8, 10.5, 10.9, 8.5, 9.5, 9.3, 9.0, 9.4, 10.6, 8.3, 9.3, 9.0, 10.3, 8.4, 8.9
a. Calculate the range of the data.
b. Calculate the interquartile range of the data.
c. Comment on the difference between the two measures and what this indicates.

13. The volume of wine (’000 litres) available for consumption in Australia for a random selection of
months over a 10-year time period is shown in the following table.

38 595 41 301 44 212 39 362 38 914 38 273 39 456 38 823


41 123 42 981 44 567 41 675 41 365 42 845 43 987 41 583
39 347 42 673 44 835 39 773 38 586 38 833 39 756 39 095
42 946 46 382 44 892 41 038 41 402 42 587 43 689 41 209

a. Calculate the mean and standard deviation of the data correct to 2 decimal places.
b. Calculate the median and interquartile range of the data.
c. What percentage, correct to 2 decimal places, of the actual data values
from the sample are within one standard deviation of the mean (i.e.
between the number obtained by subtracting the standard deviation
from the mean and the number obtained by adding the
standard deviation to the mean)?
d. What percentage of the actual data values from the sample are between the first and third quartiles?
e. Comment on the differences between your answers for parts c and d.
14. A random sample of the monthly consumer price indices in various cities of Australia is shown in the
following table. Answer the following questions, giving answers correct to 2 decimal places where
appropriate.

Sydney Melbourne Brisbane Adelaide Perth Hobart Darwin Canberra


0.4 0.8 0.9 0.7 0.6 0.3 1.2 0.8
0.9 1.0 1.1 1.0 0.8 0.8 0.3 1.0
1.4 1.3 1.3 1.5 1.4 1.3 0.9 1.4
1.5 1.2 1.7 1.3 1.6 1.0 1.5 1.2
1.1 1.2 1.4 1.3 1.0 1.1 1.8 1.5
0.1 0.3 0.2 0.2 0.1 0.2 0.1 0.3
0.4 0.3 0.5 0.5 0.9 0.5 1.1 0.6
1.1 0.5 1.4 1.1 0.8 1.2 1.9 0.9
0.5 0.6 0.3 0.4 0.5 0.6 0.1 0.4
0.8 1.3 0.7 0.5 1.2 0.7 0.5 0.6

a. Calculate the standard deviation and interquartile range of the entire data set.
b. Calculate the standard deviation and interquartile range for each city.
c. Which city bears the closest similarity to the entire data set?
d. Which city bears the least similarity to the entire data set?
15. Answer the questions on the data in the following table. Where appropriate, give answers correct
to 2 decimal places.



Carbon dioxide emissions (million metric tons of carbon dioxide)
Country 2001 2002 2003 2004 2005 2006
Australia 374.05 382.65 380.68 391.03 416.89 417.06
Canada 553.55 573.25 602.46 614.69 632.01 614.33
China 3107.99 3440.60 4061.64 4847.33 5429.30 6017.69
Germany 877.71 857.35 874.04 871.88 852.57 857.60
India 1035.42 1033.52 1048.11 1151.33 1194.01 1293.17
Indonesia 300.18 314.88 305.44 323.29 323.51 280.36
Japan 1197.15 1203.33 1253.29 1257.89 1249.62 1246.76
Russia 1571.14 1571.77 1626.86 1663.44 1698.56 1704.36
United Kingdom 575.19 563.89 575.17 582.29 584.65 585.71
United States 5762.33 5823.80 5877.73 5969.28 5994.29 5902.75

a. Calculate the interquartile range and standard deviation for the Australian data.
b. Compare the measures of spread for the Australian data with those for India, China, the United
Kingdom and the United States.
c. For this data, which measure of spread is more appropriate?
16. Answer the questions on the data in the following table. Where appropriate, give your answers correct
to 2 decimal places.
Alcohol consumption per adult (litres)
Country Consumption per adult (litres)
Australia 10.21
Canada 10.01
France 12.48
Germany 12.14
Greece 11.01
Indonesia 0.56
Ireland 14.92
New Zealand 9.99
Russia 16.23
South Africa 10.16
Spain 11.83
Sri Lanka 0.81
United Kingdom 13.24
United States 9.7
Yemen 0.2

a. Calculate the interquartile range and variance for the data set correct to 2 decimal places.
b. Calculate the interquartile range and variance after removing the three lowest values, correct to 2
decimal places.
c. Compare the results from parts a and b.

11.7 Review: exam practice


A summary of this chapter is available in the Resources section of your eBookPLUS at www.jacplus.com.au.
Simple familiar
1. MC The interquartile range of the data distribution shown in the stem plot is:
Key: 2 | 6 = 26
Stem | Leaf
0    | 2
1    | 1 5
2    | 6 6 7 8
3    | 8 8 9
4    | 3 4
5    | 2
A. 41 B. 50 C. 28 D. 20.5
2. MC The mean of the data distribution shown in the table is:

Interval Frequency (f)


0– < 15 5
15– < 30 7
30– < 45 6
45– < 60 2

A. 22.4 B. 26.25 C. 24.35 D. 25.65


3. MC Data gathered on the number of home runs in a baseball season would be classified as:
A. discrete B. nominal
C. continuous D. ordinal
4. MC For the sample data set 2, 3, 5, 2, 3, 6, 3, 8, 9, 2, 8, 9, 2, 6, 7, the mean and
standard deviation respectively would be closest to:
A. 5 and 6 B. 5 and 2.6
C. 2.6 and 5 D. 5 and 2.7
5. MC For the following stem plot, the median and range respectively are:
Key: 5 | 1 = 51
Stem | Leaf
5    | 1 2
6    | 2 3 4
7    | 3 4 4 5
8    | 6 6
9    | 2
A. 73 and 41 B. 73.5 and 41 C. 71 and 39 D. 71 and 41



6. State whether each of the following data types is categorical or numerical.
a. The television program that people watch at 7:00 pm
b. The number of pets in each household
c. The amount of water consumed by athletes in a marathon run
d. The average distance that students live from school
7. For each of the numerical data types below, determine if the data are discrete or continuous.
a. The dress sizes of Year 11 girls
b. The volume of backyard swimming pools
c. The amount of water used in households
d. The number of viewers of a particular television program
8. A group of Year 11 students was asked to state the number of movies that they had purchased in the last
year. The results are shown below.
12, 1, 13, 20, 5, 22, 35, 12, 17, 20,
9, 5, 11, 0, 14, 25, 3, 8, 10, 9,
12, 6, 18, 7, 10, 9, 6, 23, 14, 19
a. Put the results into a table using the categories 0–4, 5–9, 10–14 etc.
b. Draw a column graph to represent the results.
9. The data below give the number of errors made each week by 20 machine operators. Prepare a
stem-and-leaf diagram of the data using a stem of 0, 1, 2, etc.
6, 15, 20, 25, 28, 18, 32, 43, 52, 27, 17, 26, 38, 31, 26, 29, 32, 46, 13, 20
10. The time taken (in seconds) for a test vehicle to accelerate from 0 to 100 km/h is recorded during a test
of 24 trials. The results are represented by the stem-and-leaf plot below.
Calculate the median of the data.
Key: 7 | 2 = 7.2 s
7* | 6 = 7.6 s
Stem | Leaf
7    | 2 4 4
7*   | 5 5 7 9
8    | 0 0 1 2 4 4 4
8*   | 5 5 6 8 9
9    | 2 2 3
9*   | 5 7
11. The stem-and-leaf plot below gives the exact mass of 24 packets of biscuits. Find the mean and range
of the data.
Key: 248 | 4 = 248.4 g
Stem | Leaf
248  | 4 7 8
249  | 2 3 6 6
250  | 0 0 1 1 6 9 9
251  | 1 5 5 5 6 7
252  | 1 5 8
253  | 0

12. The frequency table below shows the crowds at football matches for a team over a season.

Class           Class centre   Frequency
5000–9999                      1
10 000–14 999                  5
15 000–19 999                  9
20 000–24 999                  3
25 000–29 999                  2
30 000–34 999                  2

a. Copy the frequency table and complete the class centre column.
b. Show the information in a frequency histogram.
Complex familiar
13. The price of a barrel of oil in US dollars over a particular 18-month time period is shown in the
following table.

Month Price (US$)
Jan 102.96
Feb 97.63
Mar 108.76
Apr 105.25
May 106.17
Jun 83.17
Jul 83.72
Aug 88.99
Sep 95.34
Oct 92.44
Nov 87.05
Dec 88.69
Jan 93.14
Feb 97.46
Mar 90.71
Apr 97.1
May 90.74
Jun 93.41

a. Calculate the mean and median for this data set. Give your answers correct to 1 decimal place.
b. Calculate the standard deviation for this data set. Give your answer correct to 2 decimal places.



14. The table below shows the number of sales made each day over a month in a car yard.

Number of sales Frequency


0 2
1 5
2 12
3 6
4 2
5 0
6 1

Show this information in a frequency histogram.


15. Display the following scores in a stem-and-leaf plot.
45, 21, 38, 46, 42, 41, 42, 49, 35, 29, 24, 28,
36, 21, 38, 45, 44, 40, 29, 28, 35, 35, 33, 38,
40, 41, 48, 39, 34, 38, 45, 28, 23, 29, 30, 40
16. Use the stem-and-leaf plot drawn in the previous question to find:
a. the range b. the median

Complex unfamiliar
17. Use the data on the incidence of communicable diseases in Australia to answer the following questions.
Incidence of communicable diseases in Australia over two consecutive years

Disease Year 1 Year 2


Hepatitis C 11 089 7 286
Typhoid Fever 116 96
Legionellosis 302 298
Meningococcal disease 259 230
Tuberculosis 1 324 1 327
Influenza (laboratory confirmed) 59 090 13 419
Measles 104 70
Mumps 165 95
Chickenpox 1 753 1 743
Shingles 2 716 2 978
Dengue virus infection 1 406 1 201
Malaria 508 399
Ross River virus infection 4 796 5 147

a. Calculate the mean (correct to 1 decimal place) and median number of cases of communicable
diseases of the sample for each year.
b. Comment on the differences between the mean and median values calculated in part a.

18. The number of passengers arriving from overseas during a particular time period at various airports in
Australia is shown in the following table.

Airport Number of passengers


Adelaide 5 743
Brisbane 480 625
Cairns 5 110
Coolangatta 7 655
Darwin 5 318
Melbourne 594 286
Perth 318 493

Calculate the mean and standard deviation for the sample. Give your answers correct to 1 decimal place.
19. Use the data shown to answer the following questions.
Women who gave birth and Indigenous status by states and territories, 2009

Status NSW Vic. Qld WA SA Tas ACT NT Aust


Indigenous 2 904 838 3 332 1 738 607 284 107 1 474 11 284
Non-Indigenous 91 958 70 328 57 665 29 022 18 994 5 996 5 601 2 369 281 933

a. Show the data using an appropriate display.
b. Calculate the mean births per state/territory of Australia in 2009 for both Indigenous and
Non-Indigenous groups. Give your answers correct to 1 decimal place.
c. Calculate the median births per state/territory of Australia in 2009 for both Indigenous and
Non-Indigenous groups.
d. Calculate the standard deviation (correct to 1 decimal place) for the data on births per state/territory
of Australia in 2009 for both Indigenous and non-Indigenous groups.
20. Use the data on Tokyo’s average maximum temperatures to answer the questions.
Tokyo average maximum temperature, 1980–89 and 2003–12

Year Temp. (°C) Year Temp. (°C) Year Temp. (°C) Year Temp. (°C)
1980 19.3 1985 19.4 2003 19.6 2008 20.1
1981 19.0 1986 18.8 2004 21.2 2009 20.3
1982 19.6 1987 20.0 2005 20.4 2010 20.6
1983 19.6 1988 19.0 2006 19.9 2011 20.2
1984 18.8 1989 19.9 2007 20.6 2012 20.0

a. Calculate the mean and standard deviation of the temperature data for the two 10-year periods of
1980–89 and 2003–12. Give your answers correct to 2 decimal places.
b. What do the means and standard deviations calculated indicate about the two 10-year periods?
c. Calculate the mean and standard deviation of the total 20 years of the sample data. Give your
answers correct to 2 decimal places.
d. How do the measurements in part c compare to the calculations you made in part a?



Answers

Chapter 11 Univariate data analysis

Exercise 11.2 Classifying and displaying data
1. Nominal
2. Categorical, ordinal
3. a. Numerical and continuous
b. Numerical and continuous
c. Numerical and discrete
d. Categorical
4. a. Continuous b. Discrete
c. Ordinal d. Nominal
5. [Column graph: Frequency against Favourite movie genre]
6.
Favourite pizza   Frequency
Margherita        7
Pepperoni         11
Supreme           9
Meat feast        14
Vegetarian        6
Other             13
7. a. Nominal categorical
b.
Transport method   Frequency
Bus                9
Walk               3
Train              4
Car                7
Bicycle            3
c. [Column graph: Frequency against Transport method]
8. a.
Response     Frequency
Yes          16
Don’t care   3
No           7
b. [Column graph: Frequency against Response (Yes, Don’t care, No)]
c. Ordinal, as it makes sense to arrange the data in order from ‘Yes’ to ‘No’, with ‘Don’t care’ between them.
9.
    Data                                                  Type
a   Wines rated as high, medium or low quality            Categorical   Ordinal
b   The number of downloads from a website                Numerical     Discrete
c   The electricity usage over a three-month period       Numerical     Continuous
d   A volume of petrol sold by a petrol station per day   Numerical     Continuous
10. a.
Favourite animal   Frequency
Dog                7
Cat                8
Rabbit             4
Guinea pig         3
Rat                2
Ferret             1
b. [Column graph: Frequency against Favourite animal]
c. Cat

11. a.
Favourite type of music   Frequency
Pop                       9
Rock                      7
Classical                 2
Folk                      3
Electronic                9
b. [Column graph: Frequency against Favourite type of music]
c. Pop and electronic
12. a. Flat white b. 70
13. a. Ordinal categorical
b. The data should be in order from ‘Strongly agree’ through to ‘Strongly disagree’.
c. [Column graph: Frequency against Response, from ‘Strongly agree’ to ‘Strongly disagree’]
14. a.
Result   Frequency
A        3
B        2
C        8
D        4
E        3
b. [Column graph: Frequency against Result (A to E)]
c. Ordinal categorical
15. [Column graph: Frequency against Number of bedrooms (2 to 5)]
16. a.
Temperature     Frequency
Above average   3
Average         9
Below average   3
b. [Column graph: Frequency against Temperature (Above average, Average, Below average)]
c. Ordinal categorical
17. a. 40 b. 15%
18. a. Nominal categorical



b. [Column graph: Frequency against Birthplace]
c. 64%
19. a.
Age group   Frequency
Under 20s   18
20–29       15
30–39       13
40–49       10
50–59       7
60+         12
b. Under 20s
c. [Column graph: Frequency against Age group]
d. Yes, the modal category is now 20–39.
20. a. [Column graphs: Frequency against Age group (15–19, 20–24, 25–34, 35–44, 45–64), one for each main area of education and study: Agriculture, Creative arts, Engineering, Health, Management and commerce]

[Column graphs: Frequency against Main area of education and study, one for each age group: 15–19, 20–24, 25–34, 35–44, 45–64]

Exercise 11.3 Construct, describe and interpret dot plots and stem-and-leaf plots
1. Key: 1* | 9 = $19
Stem | Leaf
1*   | 9
2    | 1 1 2 2 2 2 2 2 3 3 4 4
2*   | 5 6
2. Key: 0* | 6 = 6 hours
Stem | Leaf
0*   | 6 7 9
1    | 3 4 4
1*   | 7 7
2    | 0 0 1 1 1 3 3 4 4
2*   | 5 5 6
3. Key: 1 | 2 = 12 passengers
Stem | Leaf
1    | 2 3 4
1*   | 5 5 5 5 7 7
2    | 0 0 2 2 3 3 3 3 4
2*   | 5 7 7 7 7 7 8 8
3    | 0 3 4 4
3*   | 5 5 6 6 6 7
4    | 2 3 4
4*   | 7
4. Key: 1 | 7 = 17 patients
Stem | Leaf
1    | 7 7
2    | 1 2 3 3 3 4 4 4 5 5 5 6 6 6 8
3    | 0 0 1 2 3 4 4 5 7 8 8
4    | 1 1 3 4 5 5 6
5    | 1 1 5 6
6    | 0
5. a. [Dot plot: Number of wickets, 0 to 5]
b. [Dot plot: Hours checking email, 1 to 6]



6. a. [Dot plot: Round score, 70 to 78]
b. [Dot plot: Test results, 1 to 10]
7. Key: 3 | 6 = 36 min
Stem | Leaf
3    | 6 7 8 8 9 9
4    | 0 0 1 2 2 2 2 2 3 5 6 6 7 7 8 8 8
5    | 0 2 2 2 3 4 5 7 8 9
6    | 6 8 8
7    | 1 2 5
8    | 2
8. Key: 10 | 1 = 101 wpm, 10* | 6 = 106 wpm
Stem | Leaf
8*   | 6 8
9    | 2
9*   | 5 5 5 6 6 6 8 9
10   | 2 2 2 3
10*  | 7 7 7 8 8 8
11   | 0 1 2
11*  | 5
12   | 0 1 1 4
12*  |
13   | 0
9. Key: 14 | 3 = 14.3 V, 14* | 8 = 14.8 V
Stem | Leaf
13*  | 8 9
14   | 0 2 3 3
14*  | 5 6 6 7 7 8 8
15   | 1 2 2
15*  | 5 6 7 9
10. D
11. B
12. Key: 0* | 8 = 8 people
Stem | Leaf
0*   | 8 8
1    | 3 3 4
1*   | 6 6 7 7 9
2    | 1 1
2*   | 5
13. a. The first stem plot has one mode with data values that are most frequent in the 30– < 40 interval. There is a possible outlier at 91, and the distribution appears to be symmetrical.
The second stem plot has 3 modes and two distinct groups of data. There are no obvious outliers, and there is a slight positive skew to the distribution.
b. Key: 0 | 1 = 1 game played
Stem | Leaf
0    | 1
1    | 4 7
2    | 4 4 4 8
3    | 1 2 3 3 5 6 6
4    | 1 2 3 3 4 5
5    | 1 1 2
6    | 5
7    |
8    | 2 5 7
9    | 1 3
14. a. Key: 0 | 1 = 1
Stem | Leaf
0    | 1
0*   |
1    | 1 1 1 4 4
1*   | 6 6 7 8
2    | 3 3 4 4
2*   | 7 7 9
b. Splitting the stem for this data gives a clearer picture of the spread and shape of the distribution of the data set.

Exercise 11.4 Construct, describe and interpret column graphs and histograms
1.
Time (seconds)   Frequency
30– < 40         3
40– < 50         5
50– < 60         7
60– < 70         4
70– < 80         1
2. a.
Time (seconds)   Frequency
80– < 90         2
90– < 100        6
100– < 110       5
110– < 120       5
120– < 130       2

b.
Time (seconds)   Frequency
85– < 90         2
90– < 95         1
95– < 100        5
100– < 105       2
105– < 110       3
110– < 115       3
115– < 120       2
120– < 125       2
3.
Class interval   Frequency
100–124          3
125–149          5
150–174          10
175–199          9
200–224          4
4. [Histogram: Frequency against Number of customers per day, 100–124 to 200–224]
5.
Mark    Tally        Frequency
40–49   |            1
50–59   ||           2
60–69   ||||| ||||   9
70–79   ||||| |||    8
80–89   |||          3
90–99   ||           2
6. [Histogram: Number of students against Maths exam mark, 40–49 to 90–99]
7. [Pie chart: Number of students, Marks on maths exam (sectors 40–49%, 50–59%, 60–69%, 70–79%, 80–89% and 90–99%); sector angles 14.4°, 28.8°, 28.8°, 129.6°, 43.2° and 115.2°]
8.
Make         Tally        Frequency
Holden       ||||| |||    8
Ford         ||||| |||    8
Nissan       ||           2
Mazda        |||          3
Toyota       ||||| ||     7
Mitsubishi   ||           2
9. [Column graph: Frequency against Make of car]
10. [Pie chart: Make of car; Holden 26.67%, Ford 26.67%, Toyota 23.33%, Mazda 10%, Nissan 6.67%, Mitsubishi 6.67%]
11. [Histogram: Frequency against Cholesterol level (mmol/L), 1 to 8.5]



12. [Histogram: Frequency against Distance travelled (km), 0 to 10]
13.
Class interval   Frequency
5– < 10          6
10– < 15         9
15– < 20         15
20– < 25         9
25– < 30         1
[Histogram: Frequency against class interval, 5 to 30]
14.
Class interval   Frequency
11– < 16         10
16– < 21         5
21– < 26         5
26– < 31         6
31– < 36         8
36– < 41         1
41– < 46         3
46– < 51         2
[Histogram: Frequency against class interval, 11 to 51]

Exercise 11.5 Measures of centre
1. a. The distribution has one mode with data values that are most frequent in the 35– < 40 interval. There are no obvious outliers, and there is a negative skew to the distribution.
b. The distribution has one mode with data values that are most frequent in the 45– < 60 interval. There are potential outliers in the 120– < 135 interval, and the distribution is either symmetrical (excluding the outliers) or has a slight positive skew (including the outliers).
2. a. [Histogram: Frequency against values from 70 to 150]
The distribution has one mode with data values that are most frequent in the 120– < 130 interval. There are potential outliers in the 70– < 80 interval, and there is a negative skew to the distribution.
b. [Histogram: Frequency against values from 4 to 44]
The distribution has one mode with data values that are most frequent in the 24– < 34 interval. There are no obvious outliers, and there is a negative skew to the distribution.
3. a. [Graph: Marks, 3 to 9]
b. The distribution has one mode with a value of 6. There are no obvious outliers and there is a slight negative skew to the distribution.
4. 128.4
5. a. 24.76
b. 27.71
c. As the highest value increased, the mean increased significantly.
6. a. 6.94 b. 46.36

7. a.
Interval     Frequency (f)   Midpoint (x)   xf
203– < 210   5               206.5          1032.5
210– < 217   3               213.5          640.5
217– < 224   3               220.5          661.5
224– < 231   3               227.5          682.5
231– < 238   2               234.5          469
238– < 245   1               241.5          241.5
             ∑f = 17                        ∑xf = 3727.5
x̄ = 219.26

b.
Interval     Frequency (f)   Midpoint (x)   xf
5– < 10      4               7.5            30
10– < 15     5               12.5           62.5
15– < 20     3               17.5           52.5
20– < 25     3               22.5           67.5
25– < 30     1               27.5           27.5
30– < 35     1               32.5           32.5
             ∑f = 17                        ∑xf = 272.5
x̄ = 16.03

8. a. 26.18 b. 21.67
9. a. 51 b. 198
10. a. 24
b. 24
c. The median is unchanged.
11. a. 100 b. 1.805
12. The median, as the data set has two clear outliers
13. a. $74 230.77
b. $65 000
c. It would be in the workers’ interest to use a higher figure when negotiating salaries, whereas it would be in the
management’s interest to use a lower figure.
14. a. Mean = 867.89 mm, Median = 654 mm
b. The median, as it is not affected by the extreme values present in the data set.

15. a. Mean = $1 269 850, Median = $594 500


b., c. [Histogram: Frequency against Property prices ($000s), with the mean and median marked]
d. The median, as the mean is affected by a few very high values.



16. a.
Interval       Frequency
1.75– < 1.80   1
1.80– < 1.85   1
1.85– < 1.90   1
1.90– < 1.95   4
1.95– < 2.00   3
2.00– < 2.05   8
2.05– < 2.10   7
2.10– < 2.15   4
2.15– < 2.20   0
2.20– < 2.25   0
2.25– < 2.30   1
b. [Histogram: Frequency against Height (m), 1.75 to 2.30]
The median would be the preferred choice due to the extreme values in the data set.
c. 2.02 m
17. a. Mean = 7.55, median = 6
b. The median would be the preferred choice due to the extreme value of 34.
18. a. Mean = 91.125, median = 89.5
b., c. [Histogram: Frequency against Value (US cents), with the mean and median marked]
c. The mean is higher than the median as it has been more influenced by the values at the higher end of the distribution.
19. a. Mean = $847 354, median = $310 000
b., c. [Histogram: Frequency against Earnings ($000s), with the mean and median marked]
d. The median is the best measure as the mean is affected by the extreme values.

20. a. i.
Interval       Frequency
18.1– < 21.3   7
21.3– < 24.5   10
24.5– < 27.7   10
27.7– < 30.9   9
30.9– < 34.1   5
34.1– < 37.3   1
[Histogram: Frequency against BMI, 18.1 to 37.3]
ii.
Interval       Frequency
18.1– < 19.7   2
19.7– < 21.3   5
21.3– < 22.9   7
22.9– < 24.5   3
24.5– < 26.1   2
26.1– < 27.7   8
27.7– < 29.3   6
29.3– < 30.9   3
30.9– < 32.5   4
32.5– < 34.1   1
34.1– < 35.7   1
[Histogram: Frequency against BMI, 18.1 to 35.7]
b. The first histogram has two modes and is near symmetrical, with a slight positive skew.
The second histogram shows two distinct groups, with a symmetrical lower group and a positively skewed upper group.
c. Table 1: 25.95, Table 2: 25.91, Raw data: 25.76
Both of the tables give a higher value for the mean than the raw data, although the differences are small.
d. The total data set is generally symmetrical with no obvious outliers, so the mean is the best measure of centre.
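The grouped mean in answer 7 a. can be reproduced with a short Python sketch. Like the table, it assumes that every value in a class is represented by the class midpoint. The function name grouped_mean is hypothetical.

```python
def grouped_mean(classes):
    """Return (∑f, ∑xf, mean) for grouped data given as
    (lower bound, upper bound, frequency) triples, using class midpoints as x."""
    sum_f = sum(f for _, _, f in classes)
    sum_xf = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    return sum_f, sum_xf, sum_xf / sum_f

# Intervals and frequencies from answer 7 a.
classes_7a = [(203, 210, 5), (210, 217, 3), (217, 224, 3),
              (224, 231, 3), (231, 238, 2), (238, 245, 1)]
sum_f, sum_xf, mean = grouped_mean(classes_7a)
print(sum_f, sum_xf, round(mean, 2))  # prints: 17 3727.5 219.26
```

The same function applied to the intervals in answer 7 b. gives 272.5 ÷ 17 ≈ 16.03.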



Exercise 11.6 Measures of spread
1. 186
2. 2.555
3. a. Class A = 11, Class B = 10
b. Class A = 2, Class B = 5
4. a. 160
b. 57
c. Goals against: range = 293, interquartile range = 140
The ‘Goals against’ column is significantly more spread out than the ‘Goals for’ column.
5. Variance = 23.33, standard deviation = 4.83
6.
Interval    Frequency (f)   Midpoint (x)   xf     (x − x̄)²   f(x − x̄)²
0– < 50     35              25             875    1525.88    53405.8
50– < 100   125             75             9375   119.63     14953.6
            ∑f = 160                       ∑xf = 10 250      ∑f(x − x̄)² = 68 359.4

Variance = 429.933, standard deviation = 20.735


7.
Age group (years) Frequency (f ) Midpoint (x) xf

15– < 20 14 17.5 245

20– < 25 18 22.5 405

25– < 30 11 27.5 302.5

30– < 35 7 32.5 227.5

35– < 40 5 37.5 187.5

∑ f = 55 ∑ fx = 1367.5

8. a. 7.37
b. 7
c. Standard deviation = 11.49, interquartile range = 7
d. The standard deviation increased by 4.12, while the interquartile range was unchanged.
9. a. $16 327.50
b. $46 902
c. There is a much larger spread in the maximum salaries than the minimum salaries.
10. a. Year 1: standard deviation = 20 382.8, interquartile range = 38 907.5
Year 2: standard deviation = 19 389.0, interquartile range = 37 110.0
b. Year 1: standard deviation = 18 123.5, interquartile range = 31 107.5
Year 2: standard deviation = 17 289.2, interquartile range = 29 140
c. Both values are reduced, but removing the smallest category has a larger impact on the interquartile range (a reduction of roughly 20% in each year) than on the standard deviation (a reduction of roughly 11% in each year).
11. a. Year 1: standard deviation = 1 301 033.5, interquartile range = 2 336 546
Year 2: standard deviation = 1 497 303.5, interquartile range = 2 734 078
b. Year 1: standard deviation = 1 082 470.9, interquartile range = 2 136 718
Year 2: standard deviation = 1 228 931.0, interquartile range = 2 415 365
c. Both values are reduced, but there is a bigger impact on the standard deviation (a reduction of about 17–18%) than on the interquartile range (a reduction of about 9–12%).
12. a. 2.9
b. 1.25
c. The range is slightly more than double the value of the interquartile range (2.9 > 2 × 1.25 = 2.5). This indicates that the data is bunched with no outliers.
13. a. Mean = 41 440.78, standard deviation = 2248.92
b. Median = 41 333, interquartile range = 3609
c. 59.38%
d. 50%

e. There is a greater percentage of the sample within one standard deviation of the mean than between the first and third
quartiles.
14. a. Standard deviation = 0.46, interquartile range = 0.7.
b.
Sydney Melbourne Brisbane Adelaide Perth Hobart Darwin Canberra

Std dev. 0.46 0.40 0.51 0.45 0.44 0.38 0.67 0.41

IQR 0.7 0.7 0.9 0.8 0.6 0.6 1.2 0.6

c. Sydney
d. Darwin
15. a. Standard deviation = 18.81, interquartile range = 36.21
b. India: standard deviation = 105.86, interquartile range = 158.59
China: standard deviation = 1143.53, interquartile range = 1988.7
United Kingdom: standard deviation = 8.21, interquartile range = 9.48
USA: standard deviation = 87.34, interquartile range = 145.48
c. The standard deviation is appropriate, as there appear to be no obvious outliers in the data for any country.
16. a. Interquartile range = 2.78, variance = 25.40
b. Interquartile range = 2.78, variance = 4.43
c. The interquartile range has stayed the same value, while the variance has reduced significantly.
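The comparisons asked for in questions 10 c. and 11 c. can be checked with a short Python sketch. It recomputes both measures before and after the smallest values are removed and reports each reduction as a percentage of the original value. The helper names are hypothetical. The quartiles use the median-of-halves convention, with the middle value excluded for odd counts, which reproduces the interquartile ranges above. The printed percentages show the interquartile range falling by more in question 10 and the standard deviation falling by more in question 11.

```python
from statistics import median, stdev

def iqr(data):
    """Median of the upper half minus median of the lower half
    (middle value excluded when the count is odd)."""
    values = sorted(data)
    half = len(values) // 2
    return median(values[-half:]) - median(values[:half])

def percent_fall(before, after):
    """Reduction from `before` to `after` as a percentage of `before`."""
    return 100 * (before - after) / before

# Question 10 removes the smallest category; question 11 removes the three smallest values.
questions = {
    "Q10": (1, {"Year 1": [46700, 19800, 15650, 4200, 50965],
                "Year 2": [42900, 20600, 14670, 4660, 50650]}),
    "Q11": (3, {"Year 1": [3395905, 2997856, 2138364, 915059,
                           1205266, 271365, 73302, 191763],
                "Year 2": [3877515, 3446548, 2556581, 1016590,
                           1476743, 305913, 91071, 229060]}),
}

results = {}
for question, (k, years) in questions.items():
    for year, data in years.items():
        reduced = sorted(data)[k:]  # drop the k smallest values
        sd_fall = percent_fall(stdev(data), stdev(reduced))
        iqr_fall = percent_fall(iqr(data), iqr(reduced))
        results[(question, year)] = (sd_fall, iqr_fall)
        print(f"{question} {year}: s falls {sd_fall:.1f}%, IQR falls {iqr_fall:.1f}%")
```

Expressing each change as a percentage allows the two measures to be compared fairly, even though they have different sizes.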

11.7 Review: exam practice


1. D
2. B
3. A
4. D
5. B
6. a. Categorical b. Numerical c. Numerical d. Numerical
7. a. Discrete b. Continuous c. Continuous d. Discrete
8. a.
Number of DVDs   Tally      Number of students
0–4              |||        3
5–9              ∥∥ ∥∥      9
10–14            ∥∥ ∥∥      9
15–19            |||        3
20–24            ∥∥         4
25–29            |          1
30–34                       0
35–39            |          1
b. [Histogram: number of students (0 to 10) against number of movies purchased, classes 0–4 to 35–39]
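The histogram in part b is drawn straight from the frequency table in part a. The sketch below prints a sideways text version, one row per class. The class labels use ASCII hyphens, and the `#` bar character is an arbitrary choice.

```python
def text_histogram(labels, counts, mark="#"):
    """Return one text row per class: label, a bar of marks, then the count."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {mark * count} {count}" for label, count in zip(labels, counts)]

# Classes and frequencies from the table in part a
labels = ["0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39"]
counts = [3, 9, 9, 3, 4, 1, 0, 1]

for row in text_histogram(labels, counts):
    print(row)
print("total students:", sum(counts))
```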
9. Key: 0 | 6 = 6 errors
Stem | Leaf
0 | 6
1 | 3 5 7 8
2 | 0 0 5 6 6 7 8 9
3 | 1 2 2 8
4 | 3 6
5 | 2
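A stem-and-leaf plot splits each value into a stem (here the tens digit) and a leaf (the units digit). The sketch below rebuilds the plot for question 9 from the 20 values read back off the plot. It assumes non-negative whole numbers and a stem unit of 10. Empty stems are kept so that gaps in the data remain visible.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_unit=10):
    """Return stem-and-leaf rows such as '2 | 0 0 5' for non-negative integers."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // stem_unit].append(v % stem_unit)  # stem = tens, leaf = units
    rows = []
    for stem in range(min(groups), max(groups) + 1):  # include empty stems
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

# Values read back from the stem-and-leaf plot in question 9
errors = [6, 13, 15, 17, 18, 20, 20, 25, 26, 26, 27, 28, 29, 31, 32, 32, 38, 43, 46, 52]
for row in stem_and_leaf(errors):
    print(row)
```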

10. 8.4 s
11. Mean = 250.65 g, range = 4.6 g

CHAPTER 11 Univariate data analysis 477


12. a.
Class            Class centre   Frequency
5000–9999        7 500          1
10 000–14 999    12 500         5
15 000–19 999    17 500         9
20 000–24 999    22 500         3
25 000–29 999    27 500         2
30 000–34 999    32 500         2
b. [Histogram: frequency (0 to 10) against number of people at a football match, with bars at class centres 7 500, 12 500, 17 500, 22 500, 27 500 and 32 500]
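The class centres in part a lie halfway between the lower limit of one class and the lower limit of the next class. The sketch below recomputes them from the table. As an illustration only, it also estimates the mean from the grouped data by weighting each class centre by its frequency. That estimate is not part of the printed answer.

```python
# Class lower limits and frequencies from the table in part a
lower_limits = [5000, 10000, 15000, 20000, 25000, 30000]
class_width = 5000
freqs = [1, 5, 9, 3, 2, 2]

# Class centre: midpoint between this class's lower limit and the next one
centres = [low + class_width / 2 for low in lower_limits]

# Grouped-data estimate of the mean: sum(f * x) / sum(f)
n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, centres)) / n
print(centres)
print(f"n = {n}, estimated mean = {grouped_mean:.1f}")
```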

13. a. Mean = 94.6, median = 93.3
b. Standard deviation = 7.49
14. [Histogram: frequency (0 to 12) against number of sales (0 to 6)]
15. Key: 2 | 1 = 21
Stem | Leaf
2 | 1 1 3 4 8 8 8 9 9 9
3 | 0 3 4 5 5 5 6 8 8 8 8 9
4 | 0 0 0 1 1 2 2 4 5 5 5 6 8 9

16. a. 28
b. 38
17. a. Year 1: mean = 6432.9, median = 1324
Year 2: mean = 2637.6, median = 1201
b. The mean values are significantly different but the medians are very similar. This would seem to indicate the presence of
extreme values in the data.
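Part b argues that a large gap between mean and median points to extreme values. The sketch below shows the effect on a small hypothetical data set that is not from the question. One very large value pulls the mean far above the median. Once that value is removed, the two measures agree.

```python
import statistics

# Hypothetical values: four similar values plus one extreme value
values = [1200, 1250, 1300, 1350, 40000]
trimmed = values[:-1]  # the same data without the extreme value

print("with extreme value:   ", statistics.mean(values), statistics.median(values))
print("without extreme value:", statistics.mean(trimmed), statistics.median(trimmed))
```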
18. Mean = 202 461.4, standard deviation = 257 819.6

19. a. [Graph: Indigenous and Non-Indigenous values for NSW, Vic., Qld, WA, SA, Tas., ACT and NT, vertical scale 0 to 100 000]
b. Indigenous mean = 1410.5, Non-Indigenous mean = 35 241.6
c. Indigenous median = 1156, Non-Indigenous median = 24 008
d. Indigenous: standard deviation = 1193.8, Non-Indigenous: standard deviation = 33 949.0
20. a. 1980–89: mean = 19.34, standard deviation = 0.44
2003–12: mean = 20.29, standard deviation = 0.45
b. The mean temperature is about one degree higher in the period 2003–12, but the standard deviations indicate that the data
have similar spreads.
c. Total data: mean = 19.82, standard deviation = 0.65
d. The mean of the total data is halfway between the means of the two separate time periods. The standard deviation indicates a much
greater variation from the mean for the total data.
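The combined values in parts c and d can be recovered from the two period summaries. The total variance is the average of the within-period variances plus the variance of the two period means about the overall mean. This is why the combined standard deviation is much larger than either period's. The sketch below assumes both periods contain the same number of years (10 each) and treats the standard deviations as population values. With sample standard deviations the result differs slightly.

```python
import math

def combine_two_groups(mean1, sd1, mean2, sd2):
    """Combine two equal-sized groups given each group's mean and population SD."""
    combined_mean = (mean1 + mean2) / 2
    within = (sd1 ** 2 + sd2 ** 2) / 2  # average spread inside each period
    between = ((mean1 - combined_mean) ** 2 + (mean2 - combined_mean) ** 2) / 2  # spread of the means
    return combined_mean, math.sqrt(within + between)

# Summaries from part a: 1980-89 and 2003-12
mean, sd = combine_two_groups(19.34, 0.44, 20.29, 0.45)
print(f"combined mean = {mean:.3f}, combined SD = {sd:.2f}")
```

This gives a combined mean of about 19.82 and a combined standard deviation of about 0.65, in line with part c.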

