DESCRIPTIVE STATISTICS
PART 1
Abdulwali Sabo Abdulrahman (MSc Medical Statistics)
Department of Community Medicine
OUTLINE
1. INTRODUCTION
2. MEASURES OF CENTRAL TENDENCY
3. MEASURES OF VARIABILITY
1. INTRODUCTION
When measurements of a random variable are taken on the
entities of a population or sample, the resulting values are
made available to the researcher or statistician as a mass of
unordered data.
Measurements that have not been organized, summarized, or
otherwise manipulated are called raw data.
Unless the number of observations is extremely small, these raw data are unlikely to impart much
information until they have been put into some kind of order.
2. MEASURES OF CENTRAL TENDENCY
Measures of central tendency convey information
regarding the average value of a set of values. They give
us information about the center of the distribution.
The three most commonly used measures of central
tendency are the mean, the median, and the mode.
These measures give us the ability to summarize the data
by means of a single number called a descriptive measure.
2.1 The Mean
The most familiar measure of central tendency is the mean. It is the
descriptive measure most people have in mind when they speak of the
“average.”
The mean is obtained by adding all the values in a population or sample
and dividing by the number of values that are added.
For example: Porcellini et al. studied 13 HIV-positive patients who
were treated with highly active antiretroviral therapy (HAART) for at
least 6 months. The CD4 T cell counts (× 10⁶/L) at baseline for the 13
subjects are listed below.
230, 205, 313, 207, 227, 245, 173, 58, 103, 181, 105, 301, 169.
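As a worked check of the definition, the short Python sketch below adds the 13 baseline CD4 counts and divides by 13. The helper name `mean` is our own illustration, not something from the study; rounding to two decimals is only for display.

```python
# Baseline CD4 T cell counts (x 10^6/L) for the 13 HAART patients listed above
cd4 = [230, 205, 313, 207, 227, 245, 173, 58, 103, 181, 105, 301, 169]

def mean(values):
    """Arithmetic mean: add all the values and divide by how many were added."""
    return sum(values) / len(values)

total = sum(cd4)   # sum of the 13 counts
x_bar = mean(cd4)  # total / 13

print(total, len(cd4), round(x_bar, 2))  # 2517 13 193.62
```

So the mean baseline CD4 count is 2517 / 13 ≈ 193.62 × 10⁶/L.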
2.1 Properties of the Mean
1. Uniqueness: For a given set of data, there is one and only one mean.
2. Simplicity: The mean is easily understood and easy to compute.
3. Sensitivity to extreme values: Because every value enters its computation, the mean is affected by extreme values.
2.2 The Median
The median is that value that divides the sorted data set into two equal parts such that the
number of values equal to or greater than the median is equal to the number of values equal to
or less than the median.
If the number of values is odd, the median is the middle value when all values have been
arranged in order of magnitude. When the number of values is even, there is no single middle
value; the median is then taken to be the mean of the two middle values.
In general, the median is the ((n + 1)/2)th observation in the ordered data; when n is even, this
position falls halfway between the two middle observations.
For example, find the median age of the participants listed below:
59, 61, 50, 57, 64, 65, 66, 38, 43, 57.
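The sketch below applies the ordering rule to these ages. The helper `median` is a hypothetical illustration of the (n + 1)/2 rule; Python's standard `statistics.median` gives the same result. With n = 10, the (n + 1)/2 = 5.5th position lies between the 5th and 6th ordered values, 57 and 59.

```python
def median(values):
    """Median of a list of numbers, following the (n + 1)/2 rule."""
    ordered = sorted(values)  # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1)/2)th value is the single middle value
        return ordered[mid]
    # even n: mean of the two middle values (the (n/2)th and (n/2 + 1)th)
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [59, 61, 50, 57, 64, 65, 66, 38, 43, 57]
print(sorted(ages))  # [38, 43, 50, 57, 57, 59, 61, 64, 65, 66]
print(median(ages))  # 58.0
```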
2.2 Properties of the Median
1. Uniqueness: As is true with the mean, there is only one median for a given
set of data.
2. Simplicity: The median is easy to calculate.
3. It is not as drastically affected by extreme values as is the mean.
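To illustrate the contrast between the two measures, the sketch below compares the mean and the median of the same ages after one value is changed to an extreme one. The mistyped age of 660 is a hypothetical value chosen for illustration, not real data; `fmean` and `median` come from Python's standard `statistics` module.

```python
from statistics import fmean, median

ages = [59, 61, 50, 57, 64, 65, 66, 38, 43, 57]
# Hypothetical: the age 66 mistyped as 660, creating one extreme value
ages_with_outlier = [660 if a == 66 else a for a in ages]

print(fmean(ages), median(ages))                            # 56.0 58.0
print(fmean(ages_with_outlier), median(ages_with_outlier))  # 115.4 58.0
```

One extreme value roughly doubles the mean, while the median does not move, because the two middle values are unchanged.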
2.3 The Mode
The mode of a set of values is that value that occurs most frequently. If all the values are
different, there is no mode; on the other hand, a set of values may have two modes (bimodal).
It’s also possible to find data sets with more than two modes (multimodal).
Unlike the mean and the median, the mode can also be used to describe qualitative data.
For example, suppose the patients seen in a mental health clinic during a given year received
one of the following diagnoses: mental retardation, organic brain syndrome, psychosis, neurosis,
and personality disorder. The diagnosis occurring most frequently in the group of patients would
be called the modal diagnosis.
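A minimal sketch of the mode under the definition above, assuming we want every most-frequent value returned (so bimodal and multimodal data are handled) and an empty result when all values differ. The helper `modes` is hypothetical. Python's `statistics.multimode` returns every value when all values differ, which does not match the "no mode" convention used here, so the check is written out explicitly. The diagnosis list is invented purely to show that the same code works for qualitative data.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most frequently.

    Returns an empty list when all values are different (no mode), and
    more than one value for bimodal or multimodal data.
    """
    if not values:
        return []
    counts = Counter(values)      # frequency of each distinct value
    top = max(counts.values())    # highest frequency
    if top == 1:
        return []                 # all values different: no mode
    return [v for v, c in counts.items() if c == top]

ages = [59, 61, 50, 57, 64, 65, 66, 38, 43, 57]
print(modes(ages))  # [57]

# Qualitative data (hypothetical diagnosis records, for illustration only)
diagnoses = ["psychosis", "neurosis", "psychosis", "personality disorder"]
print(modes(diagnoses))  # ['psychosis']
```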
3. MEASURES OF VARIABILITY
A measure of variability conveys information regarding the amount of dispersion present in a set of
data.
If all the values are the same, there is no dispersion; if they are not all the same, dispersion is
present in the data. The amount of dispersion is small when the values, though different, are
close together, and greater when the values are widely scattered.
The figure below shows the frequency polygons for two populations that have equal means but
different amounts of variability. Population B, which is more variable than population A, is more
spread out.
Other terms used synonymously with dispersion include variation, spread, and scatter.
Figure: Two frequency distributions with equal means but different amounts of dispersion.
3.1 The Range
The range is the difference between the largest and smallest value in a set of observations.
For example, suppose we wish to compute the range of the ages of the subjects in the sample
below:
59, 61, 50, 57, 64, 65, 66, 38, 43, 57.
The oldest of the sample = 66.
The youngest of the sample = 38.
Range = 66 – 38 = 28.
The usefulness of the range is limited. Because it takes into account only the two extreme values,
it is a poor measure of dispersion.
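The range calculation, and its main weakness, can be sketched as follows. The `clustered` sample is hypothetical: it shares the smallest and largest values of the ages data, but its other eight values all sit at 52, yet its range is the same 28 years. `value_range` is an illustrative name chosen to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

ages = [59, 61, 50, 57, 64, 65, 66, 38, 43, 57]
print(max(ages), min(ages), value_range(ages))  # 66 38 28

# Hypothetical sample: same extremes, but the other eight values are bunched at 52
clustered = [38, 52, 52, 52, 52, 52, 52, 52, 52, 66]
print(value_range(clustered))  # 28, although the spread is clearly different
```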
3.2 The Variance
The variance is a measure of dispersion that reflects the scatter of the values about their mean.
In computing the variance of a sample of values, we subtract the mean from each of the values,
square the resulting differences, and then add up the squared differences. This sum of squared
differences is divided by the sample size minus 1 (n – 1) to obtain the sample variance.
Letting S² stand for the sample variance, the procedure may be written in notational form as
follows:
$$ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$
where x_i is the ith value, x̄ is the sample mean, and n is the sample size.
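The reason for dividing by n – 1 rather than n, as we might have expected, is
the theoretical consideration referred to as degrees of freedom: the n deviations from the sample
mean always sum to zero, so once n – 1 of them are known the last one is fixed, leaving only
n – 1 free to vary.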
Example: Compute the variance of the ages of the subjects listed below.
43, 66, 61, 64, 65, 38, 59, 57, 57, 50.
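The procedure described above (subtract the mean, square, sum, divide by n – 1) can be followed one step at a time in Python for these ages. Variable names are illustrative. The last line cross-checks the result against `statistics.variance`, which also uses the n – 1 divisor, and the printed sum of the deviations illustrates the degrees-of-freedom point: the deviations about the mean add to zero.

```python
from statistics import variance

ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

n = len(ages)
x_bar = sum(ages) / n                   # sample mean: 560 / 10
deviations = [x - x_bar for x in ages]  # subtract the mean from each value
ss = sum(d ** 2 for d in deviations)    # sum of squared deviations
s2 = ss / (n - 1)                       # divide by n - 1, not n

print(x_bar, ss, s2)     # 56.0 810.0 90.0
print(sum(deviations))   # 0.0, so only n - 1 deviations are free to vary

# Cross-check with the standard library's sample variance
print(variance(ages))
```

The sample variance of the ages is therefore 810 / 9 = 90 (years squared).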
3.3 The Standard Deviation
The variance is expressed in squared units and, therefore, is not an appropriate measure of
dispersion when we wish to express this concept in terms of the original units.
To obtain a measure of dispersion in original units, we merely take the square root of the
variance. The result is called the standard deviation.
In general, the standard deviation of a sample is given by:
$$ S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
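Continuing with the ages from the variance example, a minimal sketch: taking the square root of the sample variance returns a dispersion measure in years rather than squared years. `statistics.stdev` is used only as a cross-check.

```python
import math
from statistics import stdev

ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]
n = len(ages)
x_bar = sum(ages) / n
s2 = sum((x - x_bar) ** 2 for x in ages) / (n - 1)  # sample variance, in squared years
s = math.sqrt(s2)                                   # standard deviation, back in years

print(round(s2, 2), round(s, 2))  # 90.0 9.49
print(round(stdev(ages), 2))      # 9.49
```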
3.4 The Coefficient of Variation
When we want to compare the dispersion in two sets of data, however, comparing the two standard
deviations directly may lead to erroneous conclusions.
The two variables involved may be measured in different units. For example, we may
wish to know, for a certain population, whether serum cholesterol levels, measured in
milligrams per 100 ml, are more variable than body weight, measured in pounds.
Furthermore, even when the same unit of measurement is used, the two means may be quite
different.
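The coefficient of variation (C.V.) expresses the standard deviation as a percentage of the mean,
which gives a relative measure of dispersion that does not depend on the unit of measurement.
The formula is expressed as
$$ C.V. = \frac{S}{\bar{x}} \times 100\% $$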
Suppose two samples of students yield the following results:
                      Sample 1      Sample 2
Age                   25 years      18 years
Mean weight           145 pounds    80 pounds
Standard deviation    10 pounds     10 pounds
We wish to know which is more variable, the weights of the 25-year-olds or the weights of the
18-year-olds.
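Expressing each standard deviation as a percentage of its mean answers the question. In this sketch `coefficient_of_variation` is a hypothetical helper name. Although both standard deviations are 10 pounds, the weights of the 18-year-olds are relatively more variable (C.V. 12.5%) than those of the 25-year-olds (about 6.9%).

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (standard deviation / mean) x 100, in percent."""
    return sd / mean * 100

cv_25 = coefficient_of_variation(10, 145)  # Sample 1: 25-year-olds
cv_18 = coefficient_of_variation(10, 80)   # Sample 2: 18-year-olds

print(f"25-year-olds: {cv_25:.1f}%")  # 25-year-olds: 6.9%
print(f"18-year-olds: {cv_18:.1f}%")  # 18-year-olds: 12.5%
```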
THANK YOU FOR YOUR ATTENTION