Scales of measurement, Measures of Central Tendency and Dispersion
Quantitative Methods
Prof. Sonia Nangalia
Amazon Data Example
10675489 B000001OAA 10.99 Chris G.
Samuel P. Orange County 10783489 12837593
Canada Garbage 16.99 Ohio
B000002BK9 312 Monique D. Y
Boston 15.98 Kansas 902
B000068ZVQ Bad Blood Nashville N
Chicago N 11.99 N
B00000I5Y6 440 15783947 413
Massachusetts Katherine H. Illinois Let Go
• Data doesn’t always have to be numerical
• Numbers don’t always represent numerical quantities
• Data is meaningless without context
• Consider Who, What, When, Where, Why and How to
add context
Amazon Data Example
PO Number  Name          Ship To        Price  Area Code  Previous Purchase  Gift?  ASIN        Artist
10675489   Katherine H.  Ohio           10.99  440        Nashville          N      B00000I5Y6  Kansas
10783489   Samuel P.     Illinois       16.99  312        Orange County      Y      B000002BK9  Boston
12837593   Chris G.      Massachusetts  15.98  413        Bad Blood          N      B000068ZVQ  Chicago
15783947   Monique D.    Canada         11.99  902        Let Go             N      B000001OAA  Garbage
• Who? Where?
• What? When?
• Why? How?
Types of Data
• Categorical (or Qualitative) data: discrete
– Nominal
– Ordinal
• Quantitative data: discrete or continuous
– Interval
– Ratio
Nominal, ordinal, interval and ratio are the four ‘levels of measurement’.
• Nominal data. These are purely qualitative data. All
you can do to present these data neatly is to put
each observation into one of a number of categories.
That is why they are also called ‘categorical’ data.
The different categories of the variable involved
cannot be ranked from ‘high’ to ‘low’. Examples:
religion, eye colour.
• Ordinal data. These data record some kind of
ranking. It is possible to say whether one
observation is ‘bigger’ or ‘better’ than another. But
due to the subjectivity or the lack of precision of the
measurement, it is not possible to state ‘how much
bigger’ or ‘better’. Examples: a ranking of favourite bands;
steaks cooked ‘rare’, ‘medium’ or ‘well done’.
• Interval data. These data can be ranked on a scale
that uses a fixed unit of measurement. Thus, you can
say ‘how much more’ one observation is compared
to another. However, it does not make sense to
express one observation as a ratio of another.
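Example: temperature in °C or °F. Because zero on these
scales is an arbitrary point, 20 °C is 10 degrees warmer
than 10 °C but not ‘twice as hot’.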
• Ratio data. These data have a true zero point, so you
can sensibly express one observation as a ratio of
another. Examples: distance, time, income.
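The interval/ratio distinction can be checked numerically. This minimal Python sketch converts two temperatures from °C to °F and two distances from km to miles. The differences between the temperatures stay proportional, but their ratio changes with the unit. The ratio between the distances does not change. The helper names `celsius_to_fahrenheit` and `km_to_miles` are illustrative, not from the slides.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Interval scale: the conversion shifts the zero point as well as the unit."""
    return c * 9 / 5 + 32

def km_to_miles(km: float) -> float:
    """Ratio scale: the conversion only rescales; 0 km is still 0 miles."""
    return km * 0.621371

# Temperatures: 20 °C looks like 'twice' 10 °C, but the ratio changes in °F.
t1_c, t2_c = 10.0, 20.0
t1_f, t2_f = celsius_to_fahrenheit(t1_c), celsius_to_fahrenheit(t2_c)
temp_ratio_c = t2_c / t1_c   # 2.0
temp_ratio_f = t2_f / t1_f   # 68 / 50 = 1.36, so the 'ratio' depends on the unit
temp_diff_f = t2_f - t1_f    # 18.0: a 10 °C difference is always an 18 °F difference

# Distances: 20 km is twice 10 km in any unit.
d1_km, d2_km = 10.0, 20.0
dist_ratio_km = d2_km / d1_km
dist_ratio_mi = km_to_miles(d2_km) / km_to_miles(d1_km)

print(temp_ratio_c, temp_ratio_f, temp_diff_f)
print(dist_ratio_km, dist_ratio_mi)
```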
EXERCISE
• Classify the following data using the variable
types on the previous slide:
– Temperatures of the 30 days in June
– The hair colour of first year students
– The number of DVDs sold by a music store each day
– Social class codings of A, B, C1, C2, D, E.
– Species of butterfly
EXERCISE
– How could a customer’s age be collected so it
is recorded as qualitative data?
– How could a customer’s age be collected so it
is recorded as quantitative data?
Summary Statistics
• A summary measure is a single value that
describes a characteristic of a sample of data.
For example, if we want to know something about where
the centre of the data is located (i.e., average) we would
calculate a summary measure of location, e.g., the
mean, median or mode.
• Two characteristics are important to decision makers:
– Central Tendency
– Dispersion
Central Tendency
• Middle point of a distribution.
• Also called the measures of location.
Dispersion
• The spread of the data in a distribution.
or
• The extent to which the observations are
scattered.
Arithmetic Mean (average)
• Most common measure of central tendency.
• Best for making predictions.
• Symbolized as:
– x̄ for the mean of a sample
– μ for the mean of a population
Calculating mean for ungrouped data
Add up all the values and divide by the number of values:
$\bar{x} = \dfrac{\sum x}{n}$
Example
• Eleven Geography students were asked how much they spent on
travel each week (in £).
16 20 24 11 20 15 18 22 10 14 17
• Find the mean travel spend for these students:
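The ungrouped mean formula $\bar{x} = \sum x / n$ can be applied directly to the eleven spends above. This is a minimal Python sketch. The function name `mean` is illustrative; Python's standard `statistics.mean` gives the same result.

```python
from typing import Sequence

def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of ungrouped data: sum of the values divided by their count."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Weekly travel spend (in £) of the eleven Geography students
spend = [16, 20, 24, 11, 20, 15, 18, 22, 10, 14, 17]
print(mean(spend))  # 187 / 11 = 17.0
```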
Mean – Grouped Data
Example: The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company. Calculate the mean.
Number of orders   f
10 – 12            4
13 – 15            12
16 – 18            20
19 – 21            14
Solution: n = 50. Here x is the midpoint of each class, found by adding the two class limits and dividing by 2.
Number of orders   f    x    fx
10 – 12            4    11   44
13 – 15            12   14   168
16 – 18            20   17   340
19 – 21            14   20   280
                   n = 50    Σfx = 832
$\bar{x} = \dfrac{\sum fx}{n} = \dfrac{832}{50} = 16.64$
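The grouped-mean steps (midpoint x, then Σfx / n) can be sketched in Python as follows. The sketch assumes the frequency table is stored as `(lower limit, upper limit, frequency)` tuples; that layout and the name `grouped_mean` are illustrative choices.

```python
from typing import List, Tuple

def grouped_mean(table: List[Tuple[int, int, int]]) -> float:
    """Estimate the mean as sum(f * x) / n, where x is each class midpoint."""
    n = sum(f for _, _, f in table)
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in table)
    return sum_fx / n

# Orders received per day over 50 days: (lower limit, upper limit, frequency)
orders = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]
print(grouped_mean(orders))  # 832 / 50 = 16.64
```

Using midpoints assumes the observations in each class are spread evenly around its centre, so the grouped mean is an estimate of the true mean.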
• Advantages:
– Easy to understand and calculate.
– Because it is based on all observations, it is a good representative of the data.
– Capable of further algebraic treatment.
• Disadvantages:
– Affected by extreme values.
– Sometimes gives absurd results, such as 4.4 children per family.
– Cannot be calculated when the data contain open-ended class intervals.
Median
• Middle-most Value
• 50% of observations are above the Median, 50%
are below it
To compute the median
• First arrange the data into ascending or
descending order.
• Use (n + 1)/2 to find the position of the middle value.
• If the data contains an odd number of items, the
middle item is the median.
• If there is an even number of items, the median
is the average of the two middle items.
Example
• Eleven Geography students were asked how much they
spent on travel each week (in £).
16 20 24 11 20 15 18 22 10 14 17
• Find the median travel spend for these students:
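The median steps above (sort, locate position (n + 1)/2, average the two middle items when n is even) can be sketched in Python as follows. The function name `median` is illustrative; the standard `statistics.median` follows the same rule.

```python
from typing import Sequence

def median(values: Sequence[float]) -> float:
    """Median of ungrouped data using the (n + 1) / 2 position rule."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        position = (n + 1) // 2       # 1-based position of the middle item
        return float(ordered[position - 1])
    # Even n: (n + 1) / 2 falls halfway between two items, so average them
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

spend = [16, 20, 24, 11, 20, 15, 18, 22, 10, 14, 17]
print(median(spend))  # sorted: 10 11 14 15 16 17 18 20 20 22 24, 6th value = 17.0
```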
Median for grouped data
Step 1: Construct the cumulative frequency distribution.
Step 2: Decide which class contains the median. The median class is the first class whose cumulative frequency is at least n/2.
Step 3: Find the median by using the following formula:
$\text{Median} = L_m + \left(\dfrac{\frac{n}{2} - F}{f_m}\right) i$
Where:
$n$ = the total frequency
$F$ = the cumulative frequency of the class before the median class
$f_m$ = the frequency of the median class
$i$ = the class width
$L_m$ = the lower boundary of the median class
Example
Time to travel to work   Frequency
1 – 10                   8
11 – 20                  14
21 – 30                  12
31 – 40                  9
41 – 50                  7
Solution:
1st Step: Construct the cumulative frequency distribution
Time to travel to work   Frequency   Cumulative Frequency
1 – 10                   8           8
11 – 20                  14          22
21 – 30                  12          34
31 – 40                  9           43
41 – 50                  7           50
n/2 = 50/2 = 25, so the median class is the 3rd class (21 – 30): it is the first class whose cumulative frequency (34) is at least 25.
So, $F = 22$, $f_m = 12$, $L_m = 20.5$ and $i = 10$.
Therefore,
$\text{Median} = L_m + \left(\dfrac{\frac{n}{2} - F}{f_m}\right) i = 20.5 + \left(\dfrac{25 - 22}{12}\right) 10 = 23$
Thus, about 25 persons take less than 23 minutes to travel to work and the other 25 persons take more than 23 minutes to travel to work.
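The three steps for the grouped median can be sketched in Python as follows. The sketch assumes integer class limits with a gap of 1 between classes. Under that assumption, the lower boundary is the lower limit minus 0.5 and the class width is upper − lower + 1, which reproduces $L_m = 20.5$ and $i = 10$ for the travel-time table. The table layout and the name `grouped_median` are illustrative.

```python
from typing import List, Tuple

def grouped_median(table: List[Tuple[int, int, int]]) -> float:
    """Estimate the median as Lm + ((n/2 - F) / fm) * i.

    Each row is (lower class limit, upper class limit, frequency). Assumes
    integer class limits with a gap of 1 between consecutive classes.
    """
    n = sum(f for _, _, f in table)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    cumulative = 0  # F: cumulative frequency of all classes before the current one
    for lower, upper, f in table:
        if cumulative + f >= half:      # first class whose cumulative frequency reaches n/2
            lm = lower - 0.5            # Lm: lower boundary of the median class
            width = upper - lower + 1   # i: class width between boundaries
            return lm + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("no median class found")

travel_time = [(1, 10, 8), (11, 20, 14), (21, 30, 12), (31, 40, 9), (41, 50, 7)]
print(grouped_median(travel_time))  # 20.5 + (25 - 22) / 12 * 10 = 23.0
```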
• Advantages:
– Easy to calculate and understand.
– Not affected by extreme values.
• Disadvantages:
– Arranging the data in order is time-consuming when there is a large number of observations.
Mode
• Mode is the value that is repeated most
often in the data set.
• A data set can have more than one mode: two modes (bimodal) or more than two (multimodal).
Example
• Eleven Geography students were asked how much they
spent on travel each week (in £).
16 20 24 11 20 15 18 22 10 14 17
• Find the mode travel spend for these students:
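Because a data set can be bimodal or multimodal, a mode function should return every most-frequent value rather than just one. This is a minimal Python sketch using `collections.Counter`; the function name `modes` is illustrative.

```python
from collections import Counter
from typing import List, Sequence

def modes(values: Sequence[float]) -> List[float]:
    """Return every value that occurs most often (several values if multimodal)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

spend = [16, 20, 24, 11, 20, 15, 18, 22, 10, 14, 17]
print(modes(spend))  # [20]: 20 is the only value that appears twice
```

If every value occurs exactly once, this sketch returns all of them. Many texts would instead say such data have no mode.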
Mode – Grouped Data
• Mode is the value that has the highest frequency in a data set.
• For grouped data, the modal class (or class mode) is the class with the highest frequency.
• To find the mode for grouped data, use the following formula:
$\text{Mode} = L_{mo} + \left(\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$
Where:
$i$ = the class width
$\Delta_1$ = the frequency of the modal class minus the frequency of the class below it (the preceding class)
$\Delta_2$ = the frequency of the modal class minus the frequency of the class above it (the following class)
$L_{mo}$ = the lower boundary of the modal class
Example
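Time to travel to work   Frequency
1 – 10                   8
11 – 20                  14
21 – 30                  12
31 – 40                  9
41 – 50                  7
Solution:
Based on the table, the modal class is 11 – 20 (frequency 14), so
$L_{mo} = 10.5$, $\Delta_1 = 14 - 8 = 6$, $\Delta_2 = 14 - 12 = 2$ and $i = 10$.
$\text{Mode} = 10.5 + \left(\dfrac{6}{6 + 2}\right) 10 = 18$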
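The grouped-mode formula can be sketched in Python under the same assumptions as the travel-time table: integer class limits with a gap of 1, so $L_{mo}$ is the lower limit minus 0.5. The sketch also treats a missing neighbouring class (when the modal class is first or last) as having frequency 0. That edge-case convention is an assumption, not something the slides specify. The name `grouped_mode` is illustrative.

```python
from typing import List, Tuple

def grouped_mode(table: List[Tuple[int, int, int]]) -> float:
    """Mode = Lmo + (d1 / (d1 + d2)) * i for (lower limit, upper limit, frequency) rows."""
    freqs = [f for _, _, f in table]
    k = freqs.index(max(freqs))                          # modal class (first one if tied)
    lower, upper, f_mo = table[k]
    f_before = freqs[k - 1] if k > 0 else 0              # preceding class, 0 if none (assumption)
    f_after = freqs[k + 1] if k + 1 < len(freqs) else 0  # following class, 0 if none (assumption)
    d1 = f_mo - f_before
    d2 = f_mo - f_after
    if d1 + d2 == 0:
        raise ValueError("formula undefined: neighbouring classes tie with the modal class")
    lmo = lower - 0.5                                    # Lmo: lower boundary of the modal class
    width = upper - lower + 1                            # i: class width
    return lmo + d1 / (d1 + d2) * width

travel_time = [(1, 10, 8), (11, 20, 14), (21, 30, 12), (31, 40, 9), (41, 50, 7)]
print(grouped_mode(travel_time))  # 10.5 + 6 / (6 + 2) * 10 = 18.0
```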
• Advantages:
– It is simple to understand and calculate.
– It is not unduly affected by extreme values.
• Disadvantages:
– When a data set contains two, three or more modes, it is difficult to interpret and compare.
Dispersion
• Central Tendency doesn’t tell us everything
• Dispersion/Deviation/Spread tells us a lot about
how a variable is distributed.
• We are most interested in the Standard Deviation (σ) and the Variance (σ²)
Importance and application of Dispersion
• It gives additional information that enables us to judge the reliability of our measure of central tendency.
• It enables us to compare the dispersion of various samples.
• Financial analysts use it to assess the dispersion of a firm’s earnings.
• Quality control experts use it to analyse the dispersion of a product’s quality levels.
Variance and Standard Deviation
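• Both measure how far, on average, the observations in a data set lie from the mean of the distribution. The variance is expressed in squared units; the standard deviation (its square root) is in the original units.
Denoted by:
Sample: s = standard deviation, s² = variance
Population: σ = standard deviation, σ² = variance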
Calculating variance
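• Divide the sum of the squared distances between the mean and each observation in the population by the total number of observations, N. For a sample, divide by n − 1 instead.
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}, \qquad s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$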
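Dividing the sum of squared deviations by N gives the population variance. Dividing by n − 1, as in the sample formula used for grouped data, gives the sample variance. This minimal Python sketch computes both for the Geography travel-spend data. The function name `variance` and its `sample` flag are illustrative; the standard library's `statistics.pvariance` and `statistics.variance` follow the same two conventions.

```python
import math
from typing import Sequence

def variance(values: Sequence[float], sample: bool = True) -> float:
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    mu = sum(values) / n
    squared_deviations = sum((x - mu) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

spend = [16, 20, 24, 11, 20, 15, 18, 22, 10, 14, 17]
pop_var = variance(spend, sample=False)  # 192 / 11
samp_var = variance(spend)               # 192 / 10 = 19.2
print(pop_var, math.sqrt(pop_var))       # population variance and standard deviation
print(samp_var, math.sqrt(samp_var))     # sample variance and standard deviation
```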
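Variance and Standard Deviation – Grouped Data
(x is the class midpoint and f the class frequency)
Population variance: $\sigma^2 = \dfrac{\sum f x^2 - \dfrac{(\sum f x)^2}{N}}{N}$
Sample variance: $s^2 = \dfrac{\sum f x^2 - \dfrac{(\sum f x)^2}{n}}{n - 1}$
Standard deviation:
Population: $\sigma = \sqrt{\sigma^2}$
Sample: $s = \sqrt{s^2}$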
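The shortcut formulas for grouped data can be applied to the mail-order example from the grouped-mean slide (Σfx = 832, n = 50). This is a minimal Python sketch, assuming class midpoints as x and the same `(lower limit, upper limit, frequency)` table layout used in the earlier sketches. The name `grouped_variance` is illustrative.

```python
import math
from typing import List, Tuple

def grouped_variance(table: List[Tuple[int, int, int]], sample: bool = True) -> float:
    """(sum(f x^2) - (sum(f x))^2 / n) divided by n - 1 (sample) or N (population),
    where x is the class midpoint."""
    n = sum(f for _, _, f in table)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in table)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in table)
    ss = sum_fx2 - sum_fx ** 2 / n  # 14216 - 832^2 / 50 = 371.52
    return ss / (n - 1 if sample else n)

orders = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]
s2 = grouped_variance(orders)                    # 371.52 / 49
sigma2 = grouped_variance(orders, sample=False)  # 371.52 / 50 = 7.4304
print(s2, math.sqrt(s2))
print(sigma2, math.sqrt(sigma2))
```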
• Other measures of dispersion:
– Range
– Quartile deviations
Range = value of the highest observation − value of the lowest observation
Range
• Advantages
– Easy to calculate and understand.
• Disadvantages
– Ignores how the observations between the two extremes vary.
– Affected by extreme values.
– Cannot be calculated for open-ended classes.
INTERFRACTILE RANGE
• A measure of the spread between two fractiles in a frequency distribution, i.e. the difference between the values of two fractiles.
• Some fractiles have special names: deciles, quartiles and percentiles.
Interquartile range
• The difference between the first and the
third quartile.
• IQR = Q3-Q1
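The range and the interquartile range can both be computed for the Geography travel-spend data. This sketch uses Python's standard `statistics.quantiles` with its default `method='exclusive'`, which places the quartiles at positions (n + 1)p in the sorted data, the same idea as the (n + 1)/2 median rule. Other textbooks and software use other quartile conventions, so their Q1 and Q3 can differ slightly.

```python
import statistics

spend = [16, 20, 24, 11, 20, 15, 18, 22, 10, 14, 17]

# Range: highest observation minus lowest observation
data_range = max(spend) - min(spend)          # 24 - 10 = 14

# quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3]
q1, q2, q3 = statistics.quantiles(spend, n=4)
iqr = q3 - q1

print(data_range, q1, q3, iqr)  # 14 14.0 20.0 6.0
```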
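RELATIVE DISPERSION: COEFFICIENT OF VARIATION
• The standard deviation cannot be the sole basis for comparing the variability of two data sets, e.g. when their means or units of measurement differ.
• A relative measure gives the magnitude of the deviation relative to the mean:
$CV = \dfrac{\sigma}{\mu} \times 100\%$ (or $\dfrac{s}{\bar{x}} \times 100\%$ for a sample)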
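The following Python sketch shows why the standard deviation alone can mislead. It records the same Geography travel spends in pounds and in pence. The standard deviation changes by a factor of 100, but the coefficient of variation stays the same. It uses the standard `statistics.stdev` (sample standard deviation) and `statistics.mean`. The function name `coefficient_of_variation` is illustrative. The CV is only meaningful for ratio-scale data with a positive mean.

```python
import statistics

def coefficient_of_variation(values) -> float:
    """CV = (sample standard deviation / mean) * 100, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

spend_pounds = [16, 20, 24, 11, 20, 15, 18, 22, 10, 14, 17]
spend_pence = [100 * x for x in spend_pounds]  # the same data in a different unit

# The standard deviation depends on the unit...
print(statistics.stdev(spend_pounds), statistics.stdev(spend_pence))
# ...but the coefficient of variation does not
print(coefficient_of_variation(spend_pounds), coefficient_of_variation(spend_pence))
```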
Some concepts
• Skewness
• Kurtosis
• Chebychev’s Theorem
EXERCISE DA20: Symmetrical Distribution
EXERCISE DA20: Positive or Right Skew
EXERCISE DA20: Negative or Left Skew
Thank You