Descriptive Stats

Uploaded by

dina

DESCRIPTIVE STATISTICS

1 INTRODUCTION

Numbers and quantification offer us a very special language which enables us
to express ourselves in exact terms. This language is called Mathematics. We
will now learn the basic rules of Mathematics in order to communicate
effectively with figures. A large part of psychological research relies on
statistical analysis, so an adequate mathematical background is needed to
understand statistical computations.

1.1 Pocket calculator

For this course, you will need a scientific calculator, that is, one that has
statistical functions and, preferably, a linear regression (LR) mode. The most
cost-effective calculator for the purpose of this section is the CASIO FX-82 TL.
It will save you a tremendous amount of time: once the data have been entered,
statistics like the number of observations, mean, standard deviation,
correlation and regression coefficients can be obtained at the press of a
button. Computer software like SPSS or SAS is of course much more powerful, but
the calculator lets you determine basic statistics very quickly ‘on the spot’.

1.2 Summation notation

The summation notation is used to summarise a series, that is, the sum of
the terms of a sequence. It is denoted by Greek capital letter sigma, ∑ , as
opposed to small letter sigma, σ , which, in Statistics, stands for standard
deviation.

Sigma is most often seen in the following form:

\[ \sum_{r=a}^{b} f(r) \]

where r is known as the index, a and b are the lower and upper limits of
summation respectively and f(r) is known as the general term. Like a counter, r
starts at a and increases in steps of 1 until it reaches b. Each term of the
series is obtained by substituting successive values of r in the general term.
The following example illustrates the mechanism.

Example

\[ \sum_{k=2}^{6} (2k + 1) = [2(2) + 1] + [2(3) + 1] + \dots + [2(6) + 1] = 5 + 7 + 9 + 11 + 13 = 45. \]

Here, the index (counter) is k. It can be observed that k takes on an initial value of
2 (the lower limit) and increases by steps of 1 until it reaches the upper limit 6.
Every value that k assumes is substituted in the general term (2k + 1) in order to
generate a term of the series. Obviously, the terms are added up since Sigma
stands for summation.
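The mechanism can also be checked with a few lines of Python. This is only an illustrative sketch: the variable names are chosen here and are not part of the course material.

```python
# Evaluate the series sum_{k=2}^{6} (2k + 1) term by term.
lower, upper = 2, 6

# range() stops one short of its end point, hence upper + 1.
terms = [2 * k + 1 for k in range(lower, upper + 1)]
total = sum(terms)

print(terms)  # [5, 7, 9, 11, 13]
print(total)  # 45
```

The list `terms` plays the role of the expanded series and `sum` plays the role of Sigma.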

In Statistics, however, we do not actually evaluate such expressions
numerically but rather use the summation notation strictly for summarisation
purposes. This is because the upper limit is generally non-numerical, that is, a
variable. We deal mostly with expressions of the form \(\sum_{i=1}^{n} x_i\). If
expanded, this summation cannot be evaluated numerically since it only gives the
expression

\[ x_1 + x_2 + x_3 + \dots + x_{n-1} + x_n. \]

Such expressions are found in the formulae for arithmetic mean and
standard deviation. In this module, students are simply required to recognise the
summation notation and understand its meaning so that they can at least use
relevant statistical functions on calculators.

2 DISTRIBUTIONS

A distribution is a set of observations which have been classified and
organised in an attempt to display information or calculate descriptive
statistics. A frequency distribution of grouped data is a good example of a
distribution.

2.1 Ungrouped data

This type of information occurs as individual observations, usually as a table
or array of disorderly values (Fig. 2.1.1). These observations must first be
arranged in some order (ascending or descending if they are numerical) or simply
‘grouped’ together in the form of a discrete frequency table (Fig. 2.1.2), as
opposed to a continuous frequency table, before they can be properly presented
on diagrams. We do not lose any information if the original data is arranged in
an array or grouped as a discrete frequency table.

2 7 8 11 15
16 18 19 19 19
23 23 24 26 27
29 33 40 44 47
49 51 54 63 68

Fig. 2.1.1 Array

Age Frequency
19 14
20 23
21 134
22 149
23 71
24 8
Total 399

Fig. 2.1.2 Discrete frequency table

2.2 Grouped data

When the range of values (not observations) is too wide, a discrete frequency
table becomes lengthy and cumbersome. Observations are then grouped into cells
or classes in order to compress the set of data for more suitable tabulation.
The data from Fig. 2.1.2 would not be a good illustration of this, given the
small variation in the ages of the students (from 19 to 24).

Age group Real limits Mid-class value Frequency

21 – 25 20.5 – 25.5 23 5
26 – 30 25.5 – 30.5 28 12
31 – 35 30.5 – 35.5 33 23
36 – 40 35.5 – 40.5 38 39
41 – 45 40.5 – 45.5 43 32
46 – 50 45.5 – 50.5 48 21
51 – 55 50.5 – 55.5 53 9
56 – 60 55.5 – 60.5 58 2
Total 143

Fig. 2.2.1 Continuous frequency distribution

The main drawback in grouping of data is that the identity (value) of each
observation is lost so that important descriptive statistics like the mean and
standard deviation can only be estimated and not exactly calculated. For example,
if the age group ‘21–25’ has frequency 5 (Fig. 2.2.1), nothing can be said about
the values of these 5 observations. Besides, a lot of new quantities have to be
calculated in order to satisfy statistical calculations and analyses as will be
explained in the following sections.

2.2.1 Limits and real limits (or boundaries)

A class is bounded by a lower and an upper limit: in the previous paragraph,
the lower and upper limits of the age group ‘21–25’ are 21 and 25 respectively.
A real limit (Fig. 2.2.1) is obtained by making a continuity correction to a
limit (explained below). In a frequency distribution, limits and real limits can
be told apart by the fact that the upper limit of a cell is never equal to the
lower limit of the next cell, whereas the upper real limit of a cell always
equals the lower real limit of the next. Real limits are fictitious values if
the values recorded are discrete. However, they are useful not only for
calculations but also for presenting data on histograms and several other types
of charts and diagrams.

For instance, if we have a frequency distribution of ages with the two
neighbouring cells ‘21–25’ and ‘26–30’, then drawing a histogram for this
distribution requires that the limits 25 and 26 be brought together, since
there is no ‘gap’ between any two successive rectangles of a histogram. We
therefore make a continuity correction of ± 0.5, the equivalent of half a ‘gap’,
so that both limits become 25.5.

Note The ‘gap’ between any pair of successive cells in a frequency distribution
is equal to the degree of accuracy to which the original observations were
recorded.

In the above example, it is easy to deduce that age was recorded to the
nearest unit since the ‘gap’ between the cells ‘21–25’ and ‘26–30’ is 1. The real
limits of these two cells are therefore ‘20.5–25.5’ and ‘25.5–30.5’. Note that
the following relationships hold:

Lower real limit = Lower limit – continuity correction

Upper real limit = Upper limit + continuity correction

2.2.2 Mid-class values (MCV)

The mid-class value, MCV, of a cell is defined as its midpoint, that is, the
average of its limits or real limits. Thus, the MCV of the cell ‘21–25’ is 23. The
MCV of a cell is the representative of that cell in the sense that, since the
values of the observations in the cell are unknown individually, it is assumed
that they are all equal to the MCV. This assumption is neither arbitrary nor
unjustified. If the observations are unknown, the best way of estimating
statistics accurately is to assume that they are, at least, uniformly
distributed within the cell (which could be untrue, of course!).
Mathematically, the sum of the observations would then be equal to the number of
observations multiplied by the MCV (think about it!). The importance of the
mid-class value should therefore not be underestimated, especially for the
calculation of crucial statistics like the mean and standard deviation.

2.2.3 Class interval or cell width

The cell width is simply the length of the cell, that is, the difference
between its lower and upper real limits.

Note Do not make the mistake of subtracting the lower limit from the upper
limit since this will not give the exact cell width.

This can be easily verified by taking the cell ‘21–25’. Its cell width is 5
(it covers the values 21, 22, 23, 24 and 25), which is obtained by subtracting
20.5 from 25.5, whereas 25 – 21 gives only 4. We therefore use the following
formula:

Cell width = Upper real limit – Lower real limit
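The three quantities defined in this section (real limits, mid-class value and cell width) can be put together in a short Python sketch. The function name `class_summary` and its `gap` parameter are hypothetical, introduced only for illustration; `gap` stands for the degree of accuracy to which the observations were recorded.

```python
def class_summary(lower_limit, upper_limit, gap=1.0):
    """Real limits, mid-class value and cell width of one class."""
    correction = gap / 2  # continuity correction = half a 'gap'
    lower_real = lower_limit - correction
    upper_real = upper_limit + correction
    return {
        "real_limits": (lower_real, upper_real),
        "mcv": (lower_real + upper_real) / 2,
        "width": upper_real - lower_real,
    }


summary = class_summary(21, 25)
print(summary)
# {'real_limits': (20.5, 25.5), 'mcv': 23.0, 'width': 5.0}
```

For the cell ‘21–25’ this reproduces the real limits 20.5 and 25.5, the MCV of 23 and the cell width of 5 quoted in the text.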

3 DESCRIPTION OF A DISTRIBUTION

A distribution is usually defined in terms of very precisely calculated
statistics like the mean and standard deviation. The main objective of
descriptive statistics is to be able to summarise an entire set of data, grouped
or ungrouped, in terms of a few figures only. Summary statistics must be
powerful and explicit enough to paint a global idea of a distribution,
especially for the non-statistician. In general, a distribution is described in
terms of four main characteristics:

1. Location
2. Dispersion
3. Skewness
4. Kurtosis

3.1 LOCATION (LOCALITY OR CENTRAL TENDENCY)

A measure of location, otherwise known as central tendency, is a point in a
distribution that corresponds to a typical, representative or middle score in
that distribution. The most common measures of location are the mean
(arithmetic), median and mode.

3.1.1 Arithmetic mean

The arithmetic mean is the most common form of average. For a given set
of data, it is defined as the sum of the values of all the observations divided
by the total number of observations. The mean is denoted by \(\bar{x}\) for a
sample and by \(\mu\) for a population. Its formula, however, differs for
ungrouped and grouped data.

Ungrouped data

\[ \bar{x} = \frac{\sum x}{n} \qquad \mu = \frac{\sum X}{N} \]

Grouped data

\[ \bar{x} = \frac{\sum fx}{\sum f} \qquad \mu = \frac{\sum fX}{N} \]

n = sample size
N = population size
f = frequency of classes
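Both versions of the formula can be tried out on the data of Fig. 2.1.1 and Fig. 2.1.2. The following Python sketch is illustrative only; the variable names are chosen here.

```python
# Ungrouped data: the array of Fig. 2.1.1
raw = [2, 7, 8, 11, 15, 16, 18, 19, 19, 19, 23, 23, 24,
       26, 27, 29, 33, 40, 44, 47, 49, 51, 54, 63, 68]
mean_raw = sum(raw) / len(raw)  # sum of x divided by n

# Frequency table: the ages and frequencies of Fig. 2.1.2
ages = [19, 20, 21, 22, 23, 24]
freqs = [14, 23, 134, 149, 71, 8]
sum_fx = sum(f * x for f, x in zip(freqs, ages))
mean_table = sum_fx / sum(freqs)  # sum of fx divided by sum of f

print(round(mean_raw, 2))    # 29.4
print(round(mean_table, 2))  # 21.66
```

For a continuous frequency distribution the same second formula applies, with the mid-class values taking the place of x.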

Merits

1. It is widely understood.
2. Its calculation involves all observations.
3. It is suited to further statistical analysis.

Limitations

1. It cannot be located by inspection nor can it be found graphically.

2. Its value may be purely theoretical.
3. It is sensitive to extreme values.
4. It is not applicable for qualitative data.

3.1.2 Geometric mean

The geometric mean is a specialised measure of location. It is used to
measure proportional changes in, for example, wages or prices of goods.

The geometric mean of n items is defined as the nth root of their combined
product. The general formula which is used to calculate the geometric mean is as
follows:

\[ \text{Geometric mean} = \sqrt[n]{x_1 \times x_2 \times x_3 \times \dots \times x_n} \]

where n is the number of items to be averaged and
\(x_1, x_2, x_3, \dots, x_n\) are the individual values of the items to be averaged.

The best way to demonstrate the geometric mean when it is used to
calculate proportional increases is by means of an example.

Example

The price of a particular commodity has been increasing over a four-year
period as follows.

$84 $97 $116 $129

The proportional increases from each year to the next are

\[ p_1 = \frac{97 - 84}{84} = 0.155 \]

\[ p_2 = \frac{116 - 97}{97} = 0.196 \]

\[ p_3 = \frac{129 - 116}{116} = 0.112 \]

\[ \text{Geometric mean} = \sqrt[n]{(1 + p_1)(1 + p_2) \dots (1 + p_n)} = \sqrt[3]{1.155 \times 1.196 \times 1.112} = 1.154 \]

Average proportional increase = 1.154 – 1 = 0.154 = 15.4%

Note $84 × 1.154³ ≈ $129.
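The same calculation can be sketched in Python. The variable names are chosen here for illustration, and `math.prod` assumes Python 3.8 or later.

```python
import math

prices = [84, 97, 116, 129]

# Growth factor (1 + p) from each year to the next.
factors = [later / earlier for earlier, later in zip(prices, prices[1:])]

# Geometric mean = nth root of the product of the n growth factors.
gm = math.prod(factors) ** (1 / len(factors))

print([round(f, 3) for f in factors])  # [1.155, 1.196, 1.112]
print(round(gm, 3))                    # 1.154
print(round(prices[0] * gm ** 3))      # 129
```

Because the product of the growth factors is simply 129/84, applying the geometric mean three times to the first price brings us back to the last price, which is what the note above verifies.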

Merit

It takes little account of extreme values.

Limitation

It cannot be applied if the data contains zero values.

3.1.3 Harmonic mean

The harmonic mean is another specialised measure of location. It is used
when the data consists of a set of rates such as prices ($/kg), speeds (km/hr) or
production (output/man-hour).

The harmonic mean of n items is the number of items divided by the sum of the
reciprocals of the individual items.

The general formula for calculating the harmonic mean is given as:

\[ \text{Harmonic mean} = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \dots + \dfrac{1}{x_n}} \]

with the usual notation.

Example

An organisation owns three lorries. Over a distance of 100 miles, one does
14 miles per gallon, one 18 miles per gallon and one 20 miles per gallon.

\[ \text{Harmonic mean} = \frac{3}{\dfrac{1}{14} + \dfrac{1}{18} + \dfrac{1}{20}} = 16.95 \]

Average consumption = 16.95 miles per gallon.
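A short Python sketch shows why the harmonic mean is the right average here: it equals the total distance covered divided by the total fuel used when every lorry travels the same distance. The variable names are illustrative only.

```python
rates = [14, 18, 20]  # miles per gallon for the three lorries

# Harmonic mean: number of items over the sum of reciprocals.
hm = len(rates) / sum(1 / r for r in rates)

# Cross-check: total distance / total fuel, each lorry driving 100 miles.
distance = 100
fuel_used = sum(distance / r for r in rates)
overall = len(rates) * distance / fuel_used

print(round(hm, 2))       # 16.95
print(round(overall, 2))  # 16.95
```

The arithmetic mean of the three rates (about 17.33) would overstate the fleet's consumption figure, since the least efficient lorry burns the most fuel over the same distance.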

Merit

It takes little account of extreme values.

3.1.4 Weighted mean

A weighted mean is used whenever a simple average fails to give an
accurate reflection of the relative importance of the items being averaged.

If a weight of \(w_i\) is assigned to an item \(x_i\), then the formula for the
weighted mean is given by

\[ \bar{x}_{\text{weighted}} = \frac{\sum w_i x_i}{\sum w_i} \]

Example

In a certain institution, the year marks for modules are based upon a
first-term test, a second-term test and a final exam at the end of the year.
Given the number of topics to be covered for each assessment, they have a
relative importance in the ratio 2:3:5. If a student obtained 74 marks in the
first test, 63 in the second test and 55 in the final exam, what is his year
mark?

The year mark is calculated as

\[ \frac{(2 \times 74) + (3 \times 63) + (5 \times 55)}{2 + 3 + 5} = \frac{612}{10} = 61.2 \]
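As a quick check, the weighted mean of the example can be computed in Python. The names `weights` and `marks` are chosen here for illustration.

```python
weights = [2, 3, 5]   # relative importance of the three assessments
marks = [74, 63, 55]  # marks obtained in each assessment

year_mark = sum(w * m for w, m in zip(weights, marks)) / sum(weights)
simple_average = sum(marks) / len(marks)

print(year_mark)       # 61.2
print(simple_average)  # 64.0
```

The simple average (64) is higher than the year mark because it ignores the fact that the weakest result, the final exam, carries the greatest weight.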

3.1.5 Median

The median is the middle observation of a distribution and is usually
denoted by \(Q_2\), given that it is also the second quartile. It is important
to know that the median can only be determined after arranging numerical data in
ascending (or descending) order. If n is the total number of observations, then
the rank of the median is given by \(\tfrac{1}{2}(n + 1)\). For ungrouped data,
if n is odd, the median is simply the middle observation but, if n is even, then
the median is the mean of the two middle observations.

In the case of grouped data, the determination of the value of the median is
slightly more complicated since the identity of individual observations is
unknown. We proceed as follows:

1. Calculate the rank of the median.

2. Locate the cell in which the median is found (with the help of cumulative
frequencies).
3. Determine the value of the median by linear interpolation (simple
proportion).

The formula for calculating the median is given by

\[ \text{Median} = \text{LCB} + \left( \frac{\frac{n + 1}{2} - CF}{f} \right) c \]

where LCB is the lower real limit of the median class
f is the frequency of the median class
c is the class interval of the median class
CF is the cumulative frequency of the class preceding the median class

Note The ‘median class’ is the class containing the median.
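The three steps (rank, median class, linear interpolation) can be sketched in Python using the distribution of Fig. 2.2.1. The function name `grouped_quantile` is hypothetical; it takes the rank as an argument so that the same code also serves for the quartiles, and it assumes all classes have the same width.

```python
def grouped_quantile(lower_real_limits, freqs, width, rank):
    """Locate the class containing `rank` and interpolate linearly in it."""
    cf = 0  # cumulative frequency of the classes before the current one
    for lcb, f in zip(lower_real_limits, freqs):
        if cf + f >= rank:
            return lcb + (rank - cf) / f * width
        cf += f
    raise ValueError("rank exceeds the total frequency")


lcbs = [20.5, 25.5, 30.5, 35.5, 40.5, 45.5, 50.5, 55.5]
freqs = [5, 12, 23, 39, 32, 21, 9, 2]
n = sum(freqs)

median = grouped_quantile(lcbs, freqs, 5, (n + 1) / 2)
q1 = grouped_quantile(lcbs, freqs, 5, (n + 1) / 4)
q3 = grouped_quantile(lcbs, freqs, 5, 3 * (n + 1) / 4)

print(round(q1, 2), round(median, 2), round(q3, 2))  # 34.63 39.6 45.03
```

With n = 143 the rank of the median is 72, which falls in the class ‘36–40’ (cumulative frequency 40 before it, frequency 39), so the median is 35.5 + (32/39) × 5.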

Merits

1. It is rigidly defined.
2. It is easily understood and, in some cases, it can even be located by
inspection.
3. It is not at all affected by extreme values.

Limitations

1. If n is even, the median is purely theoretical.

2. It is a rank-based statistic so that its calculation does not involve all the
observations.
3. It is not suited to further statistical analysis.

Special note on percentiles

A percentile is a score which indicates rank: it reveals the percentage of
those being measured who fall below that particular score. The k-th percentile
is denoted by \(P_k\) and its rank is given by \(\tfrac{k}{100}(n + 1)\). For
example, the median, \(Q_2\) or \(P_{50}\), is the 50th percentile. The most
widely used percentiles are the quartiles. Quartiles divide a distribution into
four equal parts in terms of observations. The first or lower quartile is the
value below which 25% of the distribution lies while the upper 25% of the
distribution lies above the third or upper quartile. The median is also known as
the second or middle quartile.

Quartiles are calculated in the same way as the median, that is using the
same formula except, obviously, for the rank. (Formula to be explained in detail.)
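For ungrouped data, the rank formula can be applied directly to the sorted observations. The Python sketch below uses the array of Fig. 2.1.1; the function names are hypothetical, and the use of linear interpolation for a fractional rank is an assumption made here (for a rank ending in .5 it reduces to the mean of the two neighbouring observations). The rank is assumed to lie between 1 and n.

```python
import math


def value_at_rank(sorted_values, rank):
    """Value at a 1-based, possibly fractional, rank."""
    lower = math.floor(rank)
    fraction = rank - lower
    value = sorted_values[lower - 1]
    if fraction > 0:
        # Move part of the way towards the next observation.
        value += fraction * (sorted_values[lower] - value)
    return value


def percentile(values, k):
    """k-th percentile using the rank k/100 * (n + 1)."""
    data = sorted(values)
    rank = k / 100 * (len(data) + 1)
    return value_at_rank(data, rank)


data = [2, 7, 8, 11, 15, 16, 18, 19, 19, 19, 23, 23, 24,
        26, 27, 29, 33, 40, 44, 47, 49, 51, 54, 63, 68]

print(percentile(data, 25))  # 17.0  (rank 6.5: between 16 and 18)
print(percentile(data, 50))  # 24    (rank 13)
print(percentile(data, 75))  # 45.5  (rank 19.5: between 44 and 47)
```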

3.1.6 Mode

The mode is the observation which occurs the most or with the highest
frequency. Sometimes, it is denoted by x̂ . For ungrouped data, it may easily be
detected by inspection. If there is more than one observation with the same
highest frequency, then we either say that there is no mode or that the distribution
is multimodal.

For grouped data, we can only estimate the mode – the class with the
highest frequency is known as the modal class. Since we would prefer a single
value for the mode (instead of an entire class), a rough approximation is the mid-
class value of the modal class. However, there are two ways of estimating the
mode quite accurately. Both should theoretically lead to the same result, the first
one being numerical and the second, graphical.

The formula for a numerical estimation of the mode is given by

\[ \text{Mode} = \text{LCB} + \left( \frac{f_1}{f_1 + f_2} \right) c \]

where LCB is the lower real limit of the modal class, c is its class interval,
\(f_1\) is the difference between the frequencies of the modal class and that of
the class preceding it and \(f_2\) is the difference between the frequencies of
the modal class and that of the class following it.
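A minimal Python sketch of this formula, applied to the distribution of Fig. 2.2.1, is given below. The function name `grouped_mode` is hypothetical, equal class widths are assumed, and treating a missing neighbouring class as having frequency 0 is an assumption made here for the case where the modal class is the first or last one.

```python
def grouped_mode(lower_real_limits, freqs, width):
    """Estimate the mode of a grouped distribution by interpolation."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])  # modal class index
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    f1 = freqs[i] - before  # excess over the preceding class
    f2 = freqs[i] - after   # excess over the following class
    return lower_real_limits[i] + f1 / (f1 + f2) * width


lcbs = [20.5, 25.5, 30.5, 35.5, 40.5, 45.5, 50.5, 55.5]
freqs = [5, 12, 23, 39, 32, 21, 9, 2]

mode = grouped_mode(lcbs, freqs, 5)
print(round(mode, 2))  # 38.98
```

Here the modal class is ‘36–40’ with frequency 39, so that f1 = 39 – 23 = 16 and f2 = 39 – 32 = 7.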

The mode may also be estimated by means of a frequency distribution
histogram. We simply draw a histogram with the modal class and its two
neighbouring classes, that is, those found immediately before and after it.

[Figure: histogram of the modal class and its two neighbouring classes;
vertical axis: frequency density, horizontal axis: values]

Fig. 3.1.6 Estimating the mode on a histogram

Merits

1. It is easy to understand and can sometimes be located by inspection.

2. It is not influenced by extreme values.
3. It may even be used for non-numerical data.

Limitations

1. Its calculation does not involve all the observations.

2. It is not clearly defined when there are several modes in a distribution.
3. It is not suited to further statistical analysis.

3.2 DISPERSION

A measure of dispersion shows the amount of variation or spread in the
scores (values of observations) of a variable. When the dispersion is large, the
values are widely scattered whereas, when it is small, they are tightly
clustered. The two most well-known measures of dispersion are the variance and
standard deviation.

3.2.1 Range

The range is simply the difference between the values of the maximum
and minimum observations. It can only measure the extent to which the
distribution spreads over the x-axis.

Merit

It is easy to calculate and understand.

Limitations

1. It is directly affected by extreme values.

2. It gives no indication of spread between the extremes.
3. It is not suited to further statistical analysis.

3.2.2 Variance

The variance is the most accurate way of determining the spread of a
distribution as it satisfies almost all the properties laid down for an ideal
measure of dispersion. Sample and population variances are denoted by \(s^2\)
and \(\sigma^2\) respectively. All statistical formulae, for ungrouped or
grouped data, are given in terms of variance:

Ungrouped data

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n} \qquad \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

Grouped data

\[ s^2 = \frac{\sum f (x - \bar{x})^2}{\sum f} \qquad \sigma^2 = \frac{\sum f (X - \mu)^2}{N} \]

with the usual notations.

Note The formula for variance can be simplified using the laws of summation
so that calculations may become shorter and less complicated.

\[ s^2 = \frac{\sum x^2}{n} - \bar{x}^2 \qquad s^2 = \frac{\sum f x^2}{\sum f} - \bar{x}^2 \]
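That the definition and the simplified formula agree can be checked numerically. The Python sketch below uses the array of Fig. 2.1.1; the variable names are illustrative only.

```python
raw = [2, 7, 8, 11, 15, 16, 18, 19, 19, 19, 23, 23, 24,
       26, 27, 29, 33, 40, 44, 47, 49, 51, 54, 63, 68]
n = len(raw)
mean = sum(raw) / n

# Definition: mean of the squared deviations from the mean.
var_definition = sum((x - mean) ** 2 for x in raw) / n

# Simplified form: mean of the squares minus the square of the mean.
var_shortcut = sum(x * x for x in raw) / n - mean ** 2

sd = var_shortcut ** 0.5  # standard deviation = positive square root

print(round(var_definition, 2))  # 309.68
print(round(var_shortcut, 2))    # 309.68
print(round(sd, 3))              # 17.598
```

The simplified form needs only the running totals of x and x², which is why it is the one used on pocket calculators and in the tabulated examples.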

3.2.3 Standard deviation

Standard deviation is defined as the positive square root of variance. It is
as important as variance but is more commonly used because, unlike variance, it
is expressed in the same units as the observations. The more widely the scores
are spread out, the larger the standard deviation. We also use the term standard
error in the case of an estimate.

The concept of standard deviation is so important that it can be treated as
the foundation stone for inferential statistics, that is, estimation and
hypothesis testing.

3.2.4 Mean deviation

The mean deviation is a measure of the average amount by which the
values in a distribution differ from the arithmetic mean. Its formula is given by

\[ \text{Mean deviation} = \frac{\sum f\,|x - \bar{x}|}{n} \]

Note The frequency f drops out when there are no classes in the
distribution, that is, when there are only individual values.

Merits

1. It uses all values in the distribution to measure dispersion.

2. It is not greatly affected by extreme values.

Limitations

1. The distance from the mean does not reveal whether the observation is less
than or greater than the mean.
2. It is not suitable for further statistical analysis.

3.2.5 Quartile deviation and the inter-quartile range

A measure of spread in a frequency distribution is the quartile deviation.
This is equal to half the difference between the lower and upper quartiles and is
sometimes called the semi inter-quartile range. Its formula is given by

\[ \text{Quartile deviation} = \frac{Q_3 - Q_1}{2} \]

The quartile deviation shows the average distance between a quartile and
the median. The smaller the quartile deviation, the less dispersed is the
distribution. Just like the range, the quartile deviation can be misleading. If the
majority of the data is towards the lower end of the range, then the third quartile
will be considerably further above the median than the first quartile is below it. In
such a case, when the two distances from the median are averaged, the difference
is disguised. Then, it would be better to quote the actual values of the two
quartiles rather than the quartile deviation.

It is customary to compare the efficiency of the median-quartile deviation
pair with the mean-standard deviation pair in describing a distribution. Most of
the time, the mean and the standard deviation are better since both their
calculations involve all the observations. However, the median and the quartile
deviation are hardly influenced by extreme values given that they are
rank-based.

3.2.6 Coefficient of variation

The coefficient of variation (CV) is mainly used to compare two
distributions and is thus considered to be a relative measure of dispersion. When
two distributions have the same mean but different standard deviations, it is easy
to conclude which one is more dispersed: that would be the one with the higher
standard deviation. However, if the means are not equal, it is somewhat difficult
to compare the dispersions just by looking at the standard deviations.

The formula for the coefficient of variation is given by

\[ \text{Coefficient of variation} = \frac{s}{\bar{x}} \times 100 \]

Example

Consider the two variables A and B in the following distributions.

A B
Mean 120 125
Standard deviation 50 51
Coefficient of variation 41.7 40.8

Table 3.2.6

At first glance, we would conclude that B has a greater variation (dispersion)
since it has a higher standard deviation (51). However, the means are not equal,
so the only reliable way to compare the degrees of dispersion is to calculate
the coefficient of variation for each distribution.

A has a CV of 41.7% while B has a CV of 40.8%, so A is in fact relatively
more dispersed than B. This shows the usefulness of the coefficient of
variation. It is used especially in the comparison of rates of return in
financial investments.
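The comparison in Table 3.2.6 can be reproduced with a small Python sketch. The function name `coefficient_of_variation` is chosen here for illustration.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


cv_a = coefficient_of_variation(50, 120)
cv_b = coefficient_of_variation(51, 125)

print(round(cv_a, 1))  # 41.7
print(round(cv_b, 1))  # 40.8
print("A" if cv_a > cv_b else "B", "is relatively more dispersed")
```

Although B has the larger standard deviation, A has the larger coefficient of variation once each spread is scaled by its own mean.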

3.2.7 Quartile coefficient of dispersion

The quartile coefficient of dispersion measures the dispersion using
quartiles. It differs from the quartile deviation because it is expressed as a
proportion and not in units of the value of the variable. The lower the
proportion, the less the dispersion.

Its formula is given by

\[ \text{Quartile coefficient of dispersion} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]

3.2.8 Coefficient of mean deviation

The coefficient of mean deviation is simply the mean deviation expressed
as a proportion of the arithmetic mean. This may be a useful measure because it
shows the relative size of the mean deviation.

Its formula is given as

\[ \text{Coefficient of mean deviation} = \frac{\sum f\,|x - \bar{x}|}{n \bar{x}} \]

Again, the frequency f drops out if there are no classes in the distribution.

3.3 SKEWNESS

Skewness is a measure of asymmetry: it determines whether there is a
concentration of observations somewhere in particular in a distribution. If most
observations lie at the lower end of the distribution, the distribution is said
to be positively skewed (or skewed to the right). If the concentration of
observations is towards the upper end of the distribution, then it is said to
display negative skewness (skewed to the left). A symmetrical distribution is
said to have zero skewness.

Fig. 3.3 shows the various possible shapes of frequency distributions.
The vertical bars on each diagram indicate the respective positions of the mean
(bold), median (dashed) and mode (normal). In the case of a symmetrical
distribution, the mean, median and mode are all equal in value (for example, the
normal distribution).

[Figure: three frequency curves labelled Positively skewed, Symmetrical and
Negatively skewed]

Fig. 3.3 Skewness

3.3.1 Pearson’s coefficient of skewness

This is the most accurate measure of skewness since its formula contains
two of the most reliable statistics, the mean and standard deviation. The formula
is given as

\[ \alpha = \frac{3(\bar{x} - Q_2)}{s} \]

Note The validity of the formula can be verified by looking at the positions of
the mean and median in Fig. 3.3.

3.3.2 Quartile coefficient of skewness

A less accurate but relatively quicker way of estimating skewness is by the
use of the quartiles of a distribution. The formula is given by

\[ \alpha = \frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1} \]
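Both coefficients are easy to evaluate once the summary statistics are known. The Python sketch below uses the values quoted for the grouped example in Table 4.6 (mean 39.64, median 39.60, standard deviation 7.590, quartiles 34.63 and 45.03); the function names are hypothetical.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's coefficient: 3 (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


def quartile_skewness(q1, q2, q3):
    """Quartile coefficient: (Q1 + Q3 - 2 Q2) / (Q3 - Q1)."""
    return (q1 + q3 - 2 * q2) / (q3 - q1)


alpha_pearson = pearson_skewness(39.64, 39.60, 7.590)
alpha_quartile = quartile_skewness(34.63, 39.60, 45.03)

print(round(alpha_pearson, 3))   # 0.016
print(round(alpha_quartile, 3))  # 0.044
```

Both values are small and positive, which suggests a distribution that is very nearly symmetrical with a slight positive skew. The two coefficients are on different scales, so only their signs and rough magnitudes should be compared.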

3.4 KURTOSIS

Kurtosis has a specific mathematical definition but, in the general sense, it
indicates the degree of ‘peakedness’ of a unimodal frequency distribution. It may
also be considered as a measure of the relative concentration of observations in
the centre, upper and lower ends and the ‘shoulders’ of a distribution. Kurtosis
usually indicates the extent to which a curve (distribution) departs from the
bell-shaped or normal curve.

Kurtosis can be expressed numerically or graphically. The normal
distribution has a kurtosis of 3 and is used as a reference in the calculation of
the coefficient of kurtosis of any given distribution. If we observe the normal
curve, we will see that its tails are neither too thick nor too thin and that
there are neither too many nor too few observations concentrated in the centre.
It is thus said to be mesokurtic.

If we start with the normal distribution and move scores from both centre
and tails towards the shoulders, the curve becomes flatter and is said to be
platykurtic. If, on the other hand, we move scores from the shoulders to the centre
and tails, the curve becomes more peaked with thicker tails. In that case, it is said
to be leptokurtic. Fig. 3.4 shows the degree of peakedness for three types of
distributions.

[Figure: three frequency curves labelled Platykurtic, Mesokurtic and
Leptokurtic]

Fig. 3.4 Kurtosis

3.4.1 Coefficient of kurtosis

The formula for calculating kurtosis is given by

\[ \beta = \frac{\sum (x - \bar{x})^4}{n s^4} \quad \text{or} \quad \beta = \frac{\sum f (x - \bar{x})^4}{n s^4} \]

It is customary to subtract 3 from \(\beta\) for the sake of reference to the
normal distribution. A negative value would indicate a platykurtic curve whereas
a positive coefficient of kurtosis indicates a leptokurtic distribution.
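The ungrouped formula can be sketched in Python as follows. The function name `kurtosis` and the five-value data set are made up for illustration and do not come from the course examples; the variance is computed with divisor n, as in section 3.2.2.

```python
def kurtosis(values):
    """Coefficient of kurtosis: sum of (x - mean)^4 over n * s^4."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n  # s^2, divisor n
    fourth_powers = sum((x - mean) ** 4 for x in values)
    return fourth_powers / (n * variance ** 2)  # s^4 = (s^2)^2


sample = [1, 2, 3, 4, 5]  # made-up data, evenly spread
beta = kurtosis(sample)
excess = beta - 3  # reference to the normal distribution

print(round(beta, 2))    # 1.7
print(round(excess, 2))  # -1.3
```

For this sample the mean is 3, the variance is 2 and the fourth powers of the deviations add up to 34, so β = 34 / (5 × 4) = 1.7. After subtracting 3 the coefficient is negative, which indicates a platykurtic (flat) distribution, as expected for evenly spread values.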

4 EXAMPLES

We shall now illustrate the application of all the theory learnt in the
previous sections by means of the following three examples (ungrouped and
grouped data). The complete procedures for the calculations of descriptive
statistics will be shown but it is generally advisable to use a pocket
calculator to save computation time.

All three cases will be studied:

1. Ungrouped raw data

2. Ungrouped data in a discrete frequency distribution
3. Grouped (continuous) data

The full descriptive statistics have been calculated and given in Tables
4.4, 4.5 and 4.6.

4.1 Example 1 (ungrouped raw data)

Data already arranged in ascending order:

2 7 8 11 15
16 18 19 19 19
23 23 24 26 27
29 33 40 44 47
49 51 54 63 68

Table 4.1

4.2 Example 2 (ungrouped data – discrete frequency table)

Age (x) Frequency (f) cf fx fx²

19 14 14 266 5054
20 23 37 460 9200
21 134 171 2814 59094
22 149 320 3278 72116
23 71 391 1633 37559
24 8 399 192 4608
Total 399 8643 187631
Table 4.2

4.3 Example 3 (grouped data)

Age group Real limits MCV (x) f cf fx fx²

21 – 25 20.5 – 25.5 23 5 5 115 2645
26 – 30 25.5 – 30.5 28 12 17 336 9408
31 – 35 30.5 – 35.5 33 23 40 759 25047
36 – 40 35.5 – 40.5 38 39 79 1482 56316
41 – 45 40.5 – 45.5 43 32 111 1376 59168
46 – 50 45.5 – 50.5 48 21 132 1008 48384
51 – 55 50.5 – 55.5 53 9 141 477 25281
56 – 60 55.5 – 60.5 58 2 143 116 6728
Total 143 5669 232977

Table 4.3

Table 4.4 Descriptive statistics for Example 4.1

Mean: \(\bar{x} = \frac{\sum x}{n} = \frac{735}{25} = 29.4\)

Median: rank of median \(= \tfrac{1}{2}(25 + 1) = 13\); median = 24

Mode: the observation with the highest frequency (3) is 19

Lower quartile: rank of first quartile \(= \tfrac{1}{4}(25 + 1) = 6.5\); \(Q_1 = \frac{16 + 18}{2} = 17\)

Upper quartile: rank of third quartile \(= \tfrac{3}{4}(25 + 1) = 19.5\); \(Q_3 = \frac{44 + 47}{2} = 45.5\)

Maximum: maximum observation = 68

Minimum: minimum observation = 2

Range: range = 68 – 2 = 66

Quartile deviation: \(QD = 0.5 \times (45.5 - 17) = 14.25\)

Mean deviation: \(MD = \frac{\sum |x - \bar{x}|}{n} = \frac{368.8}{25} = 14.752\)

Standard deviation: \(s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{29351}{25} - (29.4)^2} = 17.598\)

Quartile coefficient of dispersion: \(\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45.5 - 17}{45.5 + 17} = 0.456\)

Coefficient of mean deviation: \(\frac{\sum |x - \bar{x}|}{n \bar{x}} = \frac{14.752}{29.4} = 0.50\)

Pearson’s coefficient of skewness: \(\alpha = \frac{3(\bar{x} - Q_2)}{s} = \frac{(3)(29.4 - 24)}{17.598} = 0.92\)

Coefficient of kurtosis: \(\beta = \frac{\sum (x - \bar{x})^4}{n s^4} = \frac{5651540.24}{(25)(17.598)^4} = 2.36\)

Table 4.5 Descriptive statistics for Example 4.2

Mean: \(\bar{x} = \frac{\sum fx}{\sum f} = \frac{8643}{399} = 21.66\)

Median: rank of median \(= \tfrac{1}{2}(399 + 1) = 200\); median = 22

Mode: the observation with the highest frequency (149) is 22

Lower quartile: rank of first quartile \(= \tfrac{1}{4}(399 + 1) = 100\); \(Q_1 = 21\)

Upper quartile: rank of third quartile \(= \tfrac{3}{4}(399 + 1) = 300\); \(Q_3 = 22\)

Maximum: maximum observation = 24

Minimum: minimum observation = 19

Range: range = 24 – 19 = 5

Quartile deviation: \(QD = 0.5 \times (22 - 21) = 0.5\)

Mean deviation: \(MD = \frac{\sum f\,|x - \bar{x}|}{n} = \frac{328.38}{399} = 0.823\)

Standard deviation: \(s = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{187631}{399} - \left(\frac{8643}{399}\right)^2} = 1.013\)

Quartile coefficient of dispersion: \(\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{22 - 21}{22 + 21} = 0.023\)

Coefficient of mean deviation: \(\frac{\sum f\,|x - \bar{x}|}{n \bar{x}} = \frac{0.823}{21.66} = 0.038\)

Pearson’s coefficient of skewness: \(\alpha = \frac{3(\bar{x} - Q_2)}{s} = \frac{(3)(21.66 - 22)}{1.013} = -1.007\)

Coefficient of kurtosis: \(\beta = \frac{\sum f (x - \bar{x})^4}{n s^4} = \frac{1372.59}{(399)(1.013)^4} = 3.27\)

Table 4.6 Descriptive statistics for Example 4.3

Mean: \(\bar{x} = \frac{\sum fx}{\sum f} = \frac{5669}{143} = 39.64\)

Median: rank of median \(= \tfrac{1}{2}(143 + 1) = 72\); \(\text{Median} = 35.5 + \left(\frac{72 - 40}{39}\right)(5) = 39.60\)

Mode: modal class 36 – 40; \(\text{Mode} = 35.5 + \left(\frac{16}{16 + 7}\right)(5) = 38.98\)

Lower quartile: rank of first quartile \(= \tfrac{1}{4}(143 + 1) = 36\); \(Q_1 = 30.5 + \left(\frac{36 - 17}{23}\right)(5) = 34.63\)

Upper quartile: rank of third quartile \(= \tfrac{3}{4}(143 + 1) = 108\); \(Q_3 = 40.5 + \left(\frac{108 - 79}{32}\right)(5) = 45.03\)

Maximum: maximum observation = 60

Minimum: minimum observation = 21

Range: range = 60 – 21 = 39

Quartile deviation: \(QD = 0.5 \times (45.03 - 34.63) = 5.2\)

Mean deviation: \(MD = \frac{\sum f\,|x - \bar{x}|}{n} = \frac{879.6}{143} = 6.151\)

Standard deviation: \(s = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{232977}{143} - \left(\frac{5669}{143}\right)^2} = 7.590\)

Quartile coefficient of dispersion: \(\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45.03 - 34.63}{45.03 + 34.63} = 0.13\)

Coefficient of mean deviation: \(\frac{\sum f\,|x - \bar{x}|}{n \bar{x}} = \frac{6.151}{39.64} = 0.155\)

Pearson’s coefficient of skewness: \(\alpha = \frac{3(\bar{x} - Q_2)}{s} = \frac{(3)(39.64 - 39.60)}{7.590} = 0.016\)

Coefficient of kurtosis: \(\beta = \frac{\sum f (x - \bar{x})^4}{n s^4} = \frac{1269281.26}{(143)(7.590)^4} = 2.67\)
