Data Analysis

There are three main measures of dispersion used to describe the variability in a data set: 1. Range is the difference between the highest and lowest values, but it only uses two values and can be impacted by outliers. 2. Interquartile range describes the middle 50% of values and is not impacted by outliers, but it is not amenable to mathematical manipulation. 3. Standard deviation is the most commonly used measure, calculated as the square root of the average squared deviations from the mean. It indicates how spread out values are and can be used to detect skewness, but it is inappropriate for skewed data sets.

Uploaded by

Rjian Llanes

R.J.

Ian Dela Cruz


ELET-2201

Measures of Dispersion

Measures of central tendency alone are not adequate to describe data. Two data sets can
have the same mean and yet be entirely different. To describe data, one therefore also needs to
know the extent of variability, which is given by the measures of dispersion. Range, interquartile
range, and standard deviation are the three commonly used measures of dispersion.
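
As a small illustration of two data sets sharing a mean while differing in spread, the sketch below compares two made-up samples. The lists `tight` and `wide` are hypothetical values chosen for this example; they do not come from the text.

```python
from statistics import mean, pstdev

# Two hypothetical data sets chosen for illustration (not from the text).
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, data in (("tight", tight), ("wide", wide)):
    spread = max(data) - min(data)          # range of the data set
    print(name, "mean =", mean(data), "range =", spread,
          "SD =", round(pstdev(data), 2))
```

Both samples have a mean of 50, but the range and the standard deviation of `wide` are far larger.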

RANGE

The range is the difference between the largest and the smallest observation in the data.
The prime advantage of this measure of dispersion is that it is easy to calculate. On the other
hand, it has several disadvantages: it is very sensitive to outliers and does not use all the
observations in a data set. It is more informative to report the minimum and the maximum
values than to report the range alone.

The range is one of the most common and most easily understood measures of dispersion. It is
the difference between the two extreme observations of the data set. If $X_{\max}$ and $X_{\min}$
are the two extreme observations, then

$$\text{Range} = X_{\max} - X_{\min}$$
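
The formula above can be sketched in a few lines. The helper name `value_range` and the sample values are assumptions made for this example; the second call shows how a single extreme value changes the result.

```python
def value_range(data):
    """Return the largest observation minus the smallest."""
    return max(data) - min(data)

scores = [12, 15, 11, 14, 13]     # hypothetical sample
with_outlier = scores + [60]      # same sample plus one extreme value

print(value_range(scores))        # 4
print(value_range(with_outlier))  # 49
```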

Merits of Range

-It is the simplest of the measures of dispersion
-Easy to calculate
-Easy to understand
-Independent of change of origin

Demerits of Range

-It is based on only the two extreme observations and is therefore affected by fluctuations
-It is not a reliable measure of dispersion
-Dependent on change of scale
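
The two properties about origin and scale can be checked directly. This is a minimal sketch with a hypothetical sample: adding a constant to every value (change of origin) leaves the range unchanged, while multiplying every value by a constant (change of scale) multiplies the range by the same constant.

```python
def value_range(data):
    """Return the largest observation minus the smallest."""
    return max(data) - min(data)

data = [3, 7, 8, 12]                 # hypothetical sample, range = 9
shifted = [x + 100 for x in data]    # change of origin: add a constant
scaled = [x * 10 for x in data]      # change of scale: multiply by a constant

print(value_range(data), value_range(shifted), value_range(scaled))  # 9 9 90
```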

INTERQUARTILE RANGE

Interquartile range is defined as the difference between the 75th and 25th percentiles (also
called the third and first quartiles). Hence the interquartile range describes the middle 50% of
observations. If the interquartile range is large, the middle 50% of observations are spaced
wide apart. An important advantage of the interquartile range is that it can be used as a
measure of variability even when the extreme values are not recorded exactly (as in the case of
open-ended class intervals in a frequency distribution). Another advantage is that it is not
affected by extreme values. The main disadvantage of the interquartile range as a measure of
dispersion is that it is not amenable to mathematical manipulation.
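
A minimal sketch of the interquartile range, assuming Python's `statistics.quantiles` with its default "exclusive" method. Several quartile conventions exist and they can give slightly different values. The helper name `iqr` is chosen here, and the second list simply replaces the largest value with an extreme one to show that the interquartile range does not move while the range does.

```python
from statistics import quantiles

def iqr(data):
    """Third quartile minus first quartile (default 'exclusive' method)."""
    q1, _, q3 = quantiles(data, n=4)   # three cut points: Q1, median, Q3
    return q3 - q1

data = [1, 3, 5, 5, 6, 7, 9, 10]
extreme = [1, 3, 5, 5, 6, 7, 9, 100]   # largest value replaced by an outlier

print(iqr(data), max(data) - min(data))          # 5.0 9
print(iqr(extreme), max(extreme) - min(extreme)) # 5.0 99
```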

STANDARD DEVIATION

Standard deviation (SD) is the most commonly used measure of dispersion. It is a
measure of the spread of data about the mean. SD is the square root of the sum of squared
deviations from the mean divided by the number of observations.

This formula is a definitional one; for calculations, an easier computational formula is used.
The computational formula also avoids rounding errors during calculation.

In both these formulas n − 1 is used instead of n in the denominator, as this produces a
more accurate estimate of the population SD.
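
The two formulas described above can be written out as follows, assuming they are the usual sample formulas with n − 1 in the denominator. The symbols $s$, $x_i$, and $\bar{x}$ are notation chosen here, not taken from the text. The first line is the definitional form and the second is the computational form.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}

s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1}}
```

The two are algebraically equal, because $\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$ and $n\bar{x}^2 = \frac{1}{n}\left(\sum x_i\right)^2$.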
The reason why SD is a very useful measure of dispersion is that, if the observations are
from a normal distribution, then [3]

-68% of observations lie between mean ± 1 SD
-95% of observations lie between mean ± 2 SD
-99.7% of observations lie between mean ± 3 SD

The other advantage of SD is that, along with the mean, it can be used to detect skewness. The
disadvantage of SD is that it is an inappropriate measure of dispersion for skewed data.
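
The percentages quoted for the normal distribution can be checked with the standard library's `statistics.NormalDist`. This is a small sketch using a standard normal (mean 0, SD 1); the proportions are the same for any mean and SD.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)   # probability of lying within mean ± k SD
    print(f"within ±{k} SD: {share:.1%}")
```

The printed shares are roughly 68.3%, 95.4%, and 99.7%, so the figures in the text are rounded values.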
The standard deviation is the positive square root of the arithmetic mean of the squares of
the deviations of the given values from their arithmetic mean. It is denoted by the Greek letter
sigma, σ, and is also referred to as the root mean square deviation. For observations
$y_1, y_2, \dots, y_n$ with mean $\bar{y}$, the standard deviation is given as

$$\sigma = \left[\frac{1}{n}\sum_i (y_i - \bar{y})^2\right]^{1/2} = \left[\frac{1}{n}\sum_i y_i^2 - \bar{y}^2\right]^{1/2}$$

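For a grouped frequency distribution with class frequencies $f_i$ and $N = \sum_i f_i$, it is

$$\sigma = \left[\frac{1}{N}\sum_i f_i (y_i - \bar{y})^2\right]^{1/2} = \left[\frac{1}{N}\sum_i f_i y_i^2 - \bar{y}^2\right]^{1/2}$$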

The square of the standard deviation is the variance. It is also a measure of dispersion.

$$\sigma^2 = \frac{1}{n}\sum_i (y_i - \bar{y})^2 = \frac{1}{n}\sum_i y_i^2 - \bar{y}^2$$

For a grouped frequency distribution with class frequencies $f_i$ and $N = \sum_i f_i$, it is

$$\sigma^2 = \frac{1}{N}\sum_i f_i (y_i - \bar{y})^2 = \frac{1}{N}\sum_i f_i y_i^2 - \bar{y}^2$$
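
The equality between the definitional form and the shortcut form of the grouped variance can be checked numerically. The class mid-values and frequencies below are hypothetical, chosen only for this sketch.

```python
from math import isclose, sqrt

# Hypothetical grouped data: class mid-values y_i with frequencies f_i.
mids = [5, 15, 25, 35]
freqs = [2, 5, 8, 5]

N = sum(freqs)                                           # N = sum of f_i
y_bar = sum(f * y for f, y in zip(freqs, mids)) / N      # grouped mean

# Definitional form: mean of squared deviations from the mean.
var_definition = sum(f * (y - y_bar) ** 2 for f, y in zip(freqs, mids)) / N
# Shortcut form: mean of squares minus square of the mean.
var_shortcut = sum(f * y * y for f, y in zip(freqs, mids)) / N - y_bar ** 2

print(y_bar, var_definition, var_shortcut, round(sqrt(var_definition), 2))
```

For this sample the mean is 23 and both forms give a variance of 86.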

If, instead of the mean, we measure the deviations from any other arbitrary number, say A, the
standard deviation becomes the root mean square deviation about A.
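
A short worked identity, added here as a sketch in the notation of the formulas above, shows how the mean square deviation about an arbitrary A relates to the variance:

```latex
\frac{1}{n}\sum_i (y_i - A)^2
  = \frac{1}{n}\sum_i \bigl[(y_i - \bar{y}) + (\bar{y} - A)\bigr]^2
  = \frac{1}{n}\sum_i (y_i - \bar{y})^2
    + \frac{2(\bar{y} - A)}{n}\sum_i (y_i - \bar{y})
    + (\bar{y} - A)^2
  = \sigma^2 + (\bar{y} - A)^2
```

The middle term vanishes because $\sum_i (y_i - \bar{y}) = 0$. The root mean square deviation is therefore smallest when A equals the mean, in which case it equals σ.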

Mean Deviation

Mean deviation is the arithmetic mean of the absolute deviations of the observations from
a measure of central tendency. If $x_1, x_2, \dots, x_n$ are the observations, then the mean
deviation of x about the average A (mean, median, or mode) is

$$\text{Mean deviation about } A = \frac{1}{n}\sum_i |x_i - A|$$

For a grouped frequency distribution, it is calculated as:

$$\text{Mean deviation about } A = \frac{1}{N}\sum_i f_i\,|x_i - A|, \qquad N = \sum_i f_i$$

Here, $x_i$ and $f_i$ are respectively the mid-value and the frequency of the ith class interval.
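
Both versions of the mean deviation can be sketched with one small function. The name `mean_deviation` and its `freqs` parameter are assumptions made for this example. The ungrouped values are the eight numbers used in the worked example of this document, and the grouped mid-values and frequencies are hypothetical.

```python
from math import isclose
from statistics import mean, median

def mean_deviation(values, center, freqs=None):
    """Mean of |x_i - center|, optionally weighted by class frequencies f_i."""
    if freqs is None:
        freqs = [1] * len(values)      # ungrouped data: every frequency is 1
    total = sum(freqs)                 # N = sum of f_i
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / total

data = [1, 3, 5, 5, 6, 7, 9, 10]
print(mean_deviation(data, mean(data)))     # 2.25 (about the mean, 5.75)
print(mean_deviation(data, median(data)))   # 2.25 (about the median, 5.5)

# Hypothetical grouped data: class mid-values and their frequencies.
mids, freqs = [5, 15, 25, 35], [2, 5, 8, 5]
grouped_md = mean_deviation(mids, 23, freqs)   # 23 is the grouped mean
print(round(grouped_md, 2))
```

For this particular sample the mean deviation about the mean and about the median coincide; in general they differ.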

Merits of Mean Deviation

-Based on all observations
-It takes its minimum value when the deviations are taken from the median
-Independent of change of origin

Demerits of Mean Deviation

-Not easily understandable
-Its calculation is not easy and is time-consuming
-Dependent on the change of scale
-Ignoring the signs of the deviations is artificial and makes it unsuitable for further
mathematical treatment
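
The claim that the mean deviation is smallest about the median can be illustrated with a brute-force check. This is only a sketch on one hypothetical sample, not a proof; the helper `mean_deviation` is a name chosen for this example.

```python
from statistics import mean, median

def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from `center`."""
    return sum(abs(x - center) for x in values) / len(values)

data = [1, 2, 3, 4, 100]   # hypothetical sample with one extreme value

print(mean_deviation(data, median(data)))  # 20.2 (median is 3)
print(mean_deviation(data, mean(data)))    # 31.2 (mean is 22)

# Scan integer centers 0..100 and keep the one with the smallest mean deviation.
best = min(range(0, 101), key=lambda a: mean_deviation(data, a))
print(best)  # 3
```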

Find the variance and standard deviation of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10.

The mean = 46/8 = 5.75

Step 1: Subtract the mean from each value: (1 – 5.75), (3 – 5.75), (5 – 5.75), (5 – 5.75),
(6 – 5.75), (7 – 5.75), (9 – 5.75), (10 – 5.75)
= -4.75, -2.75, -0.75, -0.75, 0.25, 1.25, 3.25, 4.25
Step 2: Squaring the above values we get 22.5625, 7.5625, 0.5625, 0.5625, 0.0625, 1.5625,
10.5625, 18.0625
Step 3: 22.5625 + 7.5625 + 0.5625 + 0.5625 + 0.0625 + 1.5625 + 10.5625 + 18.0625
= 61.5
Step 4: n = 8, therefore variance (σ²) = 61.5/8 = 7.6875 = 7.69 (3 sf)
Now, standard deviation (σ) = √7.6875 = 2.77 (3 sf)
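
The worked example can be checked with Python's `statistics` module. `pvariance` and `pstdev` divide by n, as the steps above do. `variance` and `stdev` divide by n − 1 and are shown only for comparison with the sample formulas mentioned in the standard deviation section.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [1, 3, 5, 5, 6, 7, 9, 10]

# Divisor n, matching the worked example.
print(round(pvariance(data), 2))  # 7.69
print(round(pstdev(data), 2))     # 2.77

# Divisor n - 1 (sample estimates), for comparison.
print(round(variance(data), 2))   # 8.79
print(round(stdev(data), 2))      # 2.96
```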
