Explain why we study statistics
Identify types of statistics
Definition of Biostatistics
The word statistics has two meanings for social scientists or epidemiologists.
First, it refers to any collection of numerical observations on an item or
aggregates such as the number of students in the Open University, number of
people who vote in a particular election or number of people suffering from a
particular disease and so on. Second, statistics refers to a specialized field of
study like mathematics or sociology. Statistics involves numbers, and an
understanding of it requires some basic knowledge of mathematics. But as
social scientists or epidemiologists, we are more interested in the application
of statistics as a tool.
The term statistics is generic, while biostatistics is
statistics relating to living things, e.g.
Attendance at the STD clinic over a period of time.
The age distribution of deaths from HIV/AIDS over a period of time.
The number of people treated for malaria in a health centre in a month or
one year.
Since human beings are different from each other, the information collected
on two or more people is likely to be different. For example, the
people attending an STD clinic in a particular health centre are not likely to be
of the same age, sex, marital status, number of partners and social
background. Likewise, the number of people treated for, or dying from, malaria is
likely to vary from one clinic to another, from one place to another and from
one year to another.
All these descriptions suggest that measurements or observations do not
necessarily have the same magnitude and so are subject to a lot of variation.
This variation is central to the subject matter of statistics. Statistics as a
discipline can therefore be defined as the “scientific methods” for the
collection, summarization, presentation, analysis and interpretation of data.
Why Study Statistics
We study statistics for two main reasons. First, we are exposed to a
wide range of numerical information in our daily lives. For example, we are told
the number of candidates who sit for the Joint Admissions and
Matriculation Board examination every year, the number who meet the
minimum requirements for admission and the number who are finally
placed in the different universities. We are told about the number of
motor accidents on our roads, the number of people who die in
such accidents and the number who suffer injuries. We also read
about the distribution of certain products over a period
of time. For example, it is now customary for the NNPC to publish
the daily distribution of petroleum products across the country. Such
information may be true, but it may also be untrue. Statistics
can be used to support almost anything, and it is easy to lie with
statistics and to use them purposely to distort the truth. It is therefore
important to study statistics so that we know how to evaluate
published data: when to believe them, when to be skeptical and when
to reject them.
The second reason for studying statistics is to secure employment in
both the private and public sectors of the economy. Knowledge of statistics
is relevant to the understanding of nearly all aspects of daily life.
An employee may be asked to record or estimate the frequency of
occurrence of certain events, for example the number of students
who sat or are likely to sit for a particular examination, the
number of people who are likely to purchase a particular product,
or the number of people who are likely to vote for a particular
political party in a national election. We may also want to know the
characteristics of these people: their age, sex and other social
background.
Although statistics is a separate discipline, it is essential to the social,
biological and physical sciences because all scientists use
observations of natural phenomena, through sample surveys or
experimentation to gather facts, test hypotheses and develop theories.
Statistics has become one of the major tools of the scientific method
and is necessary for the development of knowledge in all fields of
science.
3.2 Types of Statistics
There are two types of statistics, which are important for application
in the social sciences. They are descriptive statistics and inferential
statistics.
What is descriptive statistics? Descriptive statistics provides
social scientists with graphical and numerical techniques for
describing or summarizing complex and massive information in a concise
form. For example, if 700,000 candidates applied for
admission into the Open and Distance Learning Education programme
in a particular year, we might want to know the characteristics of the
applicants: their social
background, their age, sex and their state of origin or local
government area. Descriptive statistics provides the techniques for
summarizing such a mass of data.
Descriptive statistics can also be used to describe the characteristics
of the people treated for malaria in a clinic over a period of time or
of women attending an ante-natal clinic. It can also
be used to make comparisons between two characteristics measured
on every person in a group, or between groups using the same
characteristic. We might want to examine the relationship between
UME score and the performance of students or class of degree on
graduation, or to compare the socioeconomic backgrounds of males and females at
the time of their admission.
Descriptive statistics is important in preliminary analysis of data prior
to the use of more rigorous statistical analysis.
What is inferential statistics? Most social scientists study large
groups in their investigations and it is often impossible to reach every
member of the population. What social scientists then do is to select a
fraction or a sample of the large population. Let us go back to the
example of the 700,000 applicants for admission in the Open
University programme. Suppose we wish to estimate the proportion
of all the 700,000 applicants who are qualified for admission and who
are willing to take up the offer if given admission. One way of doing
this is to contact all the 700,000 applicants. It will be time consuming
and almost impossible to visit and interview all the 700,000
candidates. An easier and more efficient approach is to select a
fraction or a sample of them. We can draw a random sample of 2,500 of
them, or even fewer, depending on the resources available for the
exercise. We can then collect the information from the sample and
use it to make estimates for the total population.
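The sampling idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is artificial, the "true" proportion of 0.6 is a made-up value (in a real study it is unknown, which is why we sample), and the names `population`, `sample` and `estimate` are chosen here for illustration.

```python
import random

random.seed(42)  # fixed seed so the illustration is repeatable

POPULATION_SIZE = 700_000   # all applicants, as in the example above
SAMPLE_SIZE = 2_500         # size of the random sample
TRUE_PROPORTION = 0.6       # hypothetical value, unknown in a real study

# Build an artificial population: 1 = qualified and willing, 0 = otherwise
population = [1 if random.random() < TRUE_PROPORTION else 0
              for _ in range(POPULATION_SIZE)]

# Draw a simple random sample without replacement
sample = random.sample(population, SAMPLE_SIZE)

# The sample proportion is our estimate of the population proportion
estimate = sum(sample) / SAMPLE_SIZE

# Scale the estimate up to the whole population
estimated_total = estimate * POPULATION_SIZE

print(f"Estimated proportion: {estimate:.3f}")
print(f"Estimated number of applicants: {estimated_total:.0f}")
```

With a sample of this size the estimate normally falls close to the population value, even though fewer than 1 in 250 applicants are contacted.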
Inferential statistics is useful in predicting election results. For
example, we might want to predict the outcome of a local government
election or a presidential election. It is not possible to contact all
eligible voters, even in a single local government area, so a sample of the
registered voters is taken and questions are posed to them about
their voting behaviour. The results are then analyzed to predict the
possible outcome of the election. Survey techniques are widely used
in opinion surveys today, and inferential statistics is the main tool for
handling such information.
4.0 Conclusion
Statistics is an important tool for the social scientists. It has dual meaning for
social researchers. It refers to numbers such as rates of deaths, number of
applicants to the Open University and number of students per course.
Statistics is also a distinct field of study. Statistics is essential to the social,
biological and physical sciences because all scientists make use of
observations of natural phenomena through surveys or experimentation, to
gather facts, test hypotheses and develop theories. Statistics helps social
scientists to understand social phenomena, study them objectively and make
predictions about them.
5.0 Summary
In this Unit, students have learnt the definition of statistics in general and
the distinction between general statistics and biostatistics. We have also
discussed why we study statistics, the two types of statistics often used in the
social sciences and the uses of statistics.
6.0 Exercise
1. Define statistics and biostatistics.
2. Distinguish between descriptive and inferential statistics.
3. Discuss why we study and use statistics.
7.0 Further Readings and Other resources
Blalock, H.M. Social Statistics, 2nd ed. New York: McGraw-Hill, 1972
Orubuloye, I.O. and Folakemi Oguntimehin. The Study of Human Populations.
Ado-Ekiti, Nigeria: Centre of Population and Health Research, 2000.
UNIT 2
BIOSTATISTICS: VARIABLES AND MEASUREMENT
Table of contents
1.0 Introduction
2.0 Objectives
3.0 Definition of Variables
3.1 Measurement of Variables
3.2 Scales of Measurement of Variables
4.0 Conclusion
5.0 Summary
6.0 Exercise
7.0 Further Readings and Other Resources
1.0 Introduction
This Unit is a follow-up to Unit 1 which focused on the definition of
biostatistics, types of statistics and use of statistics. This Unit will discuss
the definition of variables and examine the different scales of measurement:
nominal, ordinal, interval and ratio.
2.0 Objectives
At the end of this unit, students should be able to do the following:
Define the concept of variables
Examine the nature of measurement
Identify the different types of scales of measurement
3.0 Definition of variables
In Unit 1 we emphasized that social scientists, like other scientists, make use
of observations of natural phenomena, through surveys or experimentation,
to gather facts, test hypotheses and develop theories.
When such observations on the same phenomenon remain constant in
successive trials, the phenomenon is called a constant, and when they vary from
trial to trial, the phenomenon is called a variable. In general, two types of
variables are distinguishable in statistics. These are quantitative variables,
whose observations vary in magnitude from trial to trial and have numerical values,
and qualitative variables, whose observations vary in kind but not in degree and
are mainly categorical with no notion of numerical strength.
Examples of quantitative variables are age, size of the class of the Diploma in
HIV Education and Management or size of the student population of the
Open University. Qualitative variables on the other hand can be sex,
education, religion and ethnic group. Quantitative variables can further be
classified into discrete variables and continuous variables. When observations
on a quantitative variable can assume only a countable number of values, that is,
values that are distinct and separated, such as whole numbers, the variable is
called a discrete variable. Examples of discrete variables are parity and age
last birthday. When observations on a quantitative variable can assume any
one of the uncountably many values in an interval on the number line, including
decimals and fractions, it is called a continuous variable. Examples of
continuous variables are height, weight and exact age.
3.1 Measurement of Variables
It is generally assumed that human social behaviour is so complex and
elusive that measurement is meaningless or impossible. Measurement
always takes place in a more or less complex situation in which
innumerable factors may affect both the characteristics being
measured and the process of measurement itself.
According to Selltiz, Jahoda, Deutsch and Cook, the basic problem in
evaluating the results of any measurement is that of defining what
shall be considered as variations due to error in measurement. They
went further to consider some possible sources of differences in
scores among a group of individuals. They are:
Differences due to transient personal factors
Differences due to statistical factors.
True differences in the characteristic, which one is attempting to
measure
True differences in other relatively stable characteristics of the
individual which affect his score.
Differences due to variations in administration.
Differences due to sampling of items.
Differences due to lack of clarity of the measuring instrument
Differences due to mechanical factors.
Differences due to factors in the analysis (processes of scoring,
coding, and computation.)
3.2 Scales of Measurement of variables
There are four scales on which variables are measured:
Nominal Scale: A nominal scale classifies qualitative objects into
categories by name. This is the simplest scale and it is mainly
classificatory. It has no notion of numerical magnitude, as categories
are mutually exclusive and unordered. Examples of variables measured on a
nominal scale are ethnic group, sex, religion, marital status and level of education.
All qualitative variables are measured on a nominal scale. For
example, the qualitative variable education can be divided into six or
more groups such as no schooling, Koranic, primary only, secondary,
polytechnic and university. For purposes of identification, the
categories of a nominal scale may be assigned numerical codes
as follows: 1 for no schooling, 2 for
Koranic, 3 for primary only, 4 for secondary, 5 for polytechnic and 6
for university. These codes are labels only and carry no numerical meaning.
Ordinal Scale: An ordinal scale incorporates the features of a
nominal scale and the additional feature that observations can be
ordered or ranked from low to high. That is, the categories carry no notion of
numerical magnitude but are well defined, ordered
and ranked in a way that shows that one level is higher than another.
An example of an ordinal scale is the grading of academic staff of the
Open University. On an ordinal scale, the academic staff can be
ranked as follows: Professor 1, Associate Professor 2, Senior
Lecturer 3, Lecturer I 4, Lecturer II 5, Assistant Lecturer 6 and
Graduate Assistant 7.
Interval Scale: An interval scale contains all the features of both
nominal and ordinal scales and the additional feature that makes it
possible to specify distances between levels on the scale. The
numerical distances between any two points on an interval scale are
known and can be calculated. An example of an interval scale is the
average scores of students who sat for the UME in all the secondary
schools in Abuja in 2002. On an interval scale it is possible to rank the
average scores from lowest to highest and to state the exact distances,
measured in score units, between the performances of the schools.
Ratio Scale: A ratio scale contains all the features of nominal,
ordinal and interval scales and, in addition, ratios can be formed with
levels of the scale. Nearly all quantitative variables are measured on
the ratio scale. Examples of ratio scales are morbidity rates, death
rates and population growth rates.
4.0 Conclusion
Two types of variables are distinguishable in biostatistics. They are
quantitative variables, whose magnitude has numerical values, and qualitative
variables, which are mainly categorical with no notion of numerical strength.
Qualitative variables are always measured on a nominal scale, while ordinal,
interval and ratio scales are appropriate for quantitative variables. The four
scales differ in their ability to quantify data: nominal scales allow for no
quantitative interpretation, while ratio scales possess the most quantitative
sophistication.
5.0 Summary
In this Unit, we discussed types of variables. A distinction was made between
quantitative variables and qualitative variables. The Unit also examined
measurement and the scales of measurement for qualitative and quantitative
variables. The measurement scales are the nominal scale, whose levels are
identified by name only; the ordinal scale, whose levels can be identified by name
and can be ranked according to their relative magnitudes; the interval scale, which
combines the functions of both nominal and ordinal scales and, in addition,
allows distances to be determined between levels of the scale; and the ratio scale,
which combines the functions of the other scales and, in addition, allows ratios to be
formed with levels of the scale. The nominal scale is the least advanced as a
measurement instrument while the ratio scale is the most advanced.
6.0 Exercise
Discuss the types of variables in social sciences
Examine the possible sources of differences in scores among a group
of individuals.
Identify and discuss scales of measurement of qualitative and
quantitative variables.
7.0 Further Readings and Other Resources
Blalock, H.M. Social Statistics, 2nd ed. New York: McGraw-Hill, 1972.
Selltiz, C., M. Jahoda, M. Deutsch and S.W. Cook. Research Methods in
Social Relations, Revised One-Volume Edition. London: Methuen & Co.
Ltd., 1959.
UNIT 3
BIOSTATISTICS: MEASURES OF LOCATION
Table of Contents
1.0 Introduction
2.0 Objectives
3.0 Definition of Concepts
3.1 Arithmetic mean
3.2 Median
3.3 Mode
4.0 Conclusion
5.0 Summary
6.0 Exercise
7.0 Further Readings and Other Resources
1.0 Introduction
This Unit is a follow-up to Unit 2, which focused on types of variables and
scales of measurement. This Unit will
discuss measures of location and examine the differences between the various
measures of location.
2.0 Objectives
At the end of this unit, students should be able to do the following:
Define the measures of location
Identify the different types of measures of location
Understand the various statistical uses of the measures
3.0 Definition of Concepts
When quantitative data are collected on the characteristics of a population,
summary figures are required to enable good use of the data. One of the
techniques of summarizing data is the measure of location, otherwise known
as the measure of central tendency. Measures of location include the arithmetic
mean, the median, the mode, the harmonic mean, the geometric mean and
the weighted mean. Of these, the arithmetic mean, the
median and the mode are the most commonly used in social and
epidemiological studies. These are discussed in detail below.
3.1 The Arithmetic Mean
The arithmetic mean, or average, is the sum of a group of
observations divided by the number of observations. It is a measure of location of the
group and a single number that enables researchers or observers to
assess the position in which the group is located with respect to other
groups. For example, if a group of 15 students sat for a competitive
examination in Biostatistics and obtained the marks 58, 34,
57, 54, 44, 57, 61, 21, 36, 57, 45, 47, 38, 48, 51, the marks scored
by the group are added up and divided by the number of students in the
group. The figure obtained is the arithmetic mean for the group. The
procedure is as follows:
58+34+57+54+44+57+61+21+36+57+45+47+38+48+51=708.
When the total of 708 is divided by 15, the number of students who
sat for the examination, it gives a value of 47.2. Therefore the
arithmetic mean or average score for the group is 47.2.
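The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the course procedure; the variable names are chosen here for clarity.

```python
# Marks of the 15 students from the example above
scores = [58, 34, 57, 54, 44, 57, 61, 21, 36, 57, 45, 47, 38, 48, 51]

total = sum(scores)        # add up all the marks
n = len(scores)            # number of students in the group
mean_score = total / n     # arithmetic mean = total / number of observations

print(f"Total = {total}, n = {n}, mean = {mean_score:.1f}")
```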
3.2 Median
The median is the middle observation or score when the scores are
arranged in order of magnitude. This arrangement can be in
ascending or descending order. The median is
the figure at the centre of the observations or scores. Let us return
to the scores of the 15 students who sat for the statistics examination.
The scores can be arranged in ascending order as follows:
21, 34, 36, 38, 44, 45, 47, 48, 51, 54, 57, 57, 57, 58, 61 or in
descending order as follows:
61, 58, 57, 57, 57, 54, 51, 48, 47, 45, 44, 38, 36, 34, 21. The figure in
the middle of the scores is 48. Therefore the median
for the scores is 48. When the number of observations is even, two
observations fall in the middle, and the average of these two
observations is taken as the median.
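The two cases described above (an odd and an even number of observations) can be sketched as a small Python function. The function name `median` and the four-value data set used for the even case are illustrative assumptions, not taken from the course text.

```python
def median(values):
    """Return the middle value of the data once arranged in order.

    For an even number of values, return the average of the two
    values that fall in the middle.
    """
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count

# The 15 examination scores from the example above (odd count)
scores = [58, 34, 57, 54, 44, 57, 61, 21, 36, 57, 45, 47, 38, 48, 51]
median_odd = median(scores)

# A hypothetical set of four scores to show the even case
even_scores = [21, 34, 36, 38]
median_even = median(even_scores)

print(median_odd, median_even)
```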
3.3 Mode
The mode is the observation that occurs most frequently in a series of
observations. It can be identified when certain figures are
repeated in a series of observations. From the example of the scores of
the 15 students who sat for the statistics examination as shown above,
the observation that occurs most frequently is 57, scored by three
students. Therefore, the mode of the 15 scores is
57. It is also possible for two observations to occur equally often and more
frequently than all the others. When this occurs, we have a bi-modal distribution.
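Finding the mode amounts to counting how often each value occurs. The sketch below returns every value that shares the highest count, so it also covers the bi-modal case; the function name `modes` and the second data set are hypothetical, introduced only for illustration.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most frequently, in ascending order."""
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())        # the largest frequency observed
    return sorted(v for v, c in counts.items() if c == highest)

# The 15 examination scores from the example above
scores = [58, 34, 57, 54, 44, 57, 61, 21, 36, 57, 45, 47, 38, 48, 51]
score_modes = modes(scores)

# A hypothetical data set with two equally frequent values (bi-modal)
bimodal_example = modes([2, 3, 3, 5, 5, 7])

print(score_modes, bimodal_example)
```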
4.0 Conclusion
In sample surveys, or when we want to evaluate the performance of students
in a competitive examination, the measures of location are useful in descriptive
analysis. Of all the measures of location, the arithmetic mean is the most
useful, as it is easy to calculate and amenable to further mathematical treatment.
However, certain situations arise when the other measures provide better
summary measures. For example, abnormal individuals among those observed
may have an exaggerated effect on the mean. From the example of the scores
in the statistics examination, seven students scored below the mean of the group,
while the other eight students scored above it, so the mean
is an indicator of the performance of the group. The mean is not useful in
situations where the observations are markedly skewed, that is, where some
observations are much lower or much higher than the majority of observations. In such
situations the median provides a more suitable measure of location.
The median is unaffected by abnormal observations, but it is unsuitable for
work demanding mathematical calculation, in the sense that if the set of figures is
extended or reduced, the median will be altered.
The mode has limited use because it is not easy to determine with precision
when observations fall into groups. Although the mode is unaffected by
abnormal individuals, the observation that occurs most frequently in a series of
observations may fall well below or above the group average. This can lead
to erroneous conclusions about the group.
5.0 Summary
In this Unit, we discuss the various measures of location or measures of
central tendency, the mean, median and mode. We also discuss the procedure
for calculating the various measures and their relative advantages and
disadvantages.
6.0 Exercise
Identify and discuss the various measures of location
What are the merits and demerits of the mean, median and mode as
measures of location?
Twenty students obtained the following marks in an examination:
60,69,56,75,57,65,54,51,47,65,45,38,61,44,36,48,54,25,33,37.
(a) Calculate: (i) the mean score
(ii) the median
(iii) the mode
(b) Discuss your findings
7.0 Further Readings and Other Resources
L. Ott, R.F. Larson and W. Mendenhall. Statistics: A Tool for the Social
Sciences. Third Edition, Duxbury Press, Boston, Massachusetts, 1983.
UNIT 4
BIOSTATISTICS: MEASURES OF VARIABILITY
Table of Contents
1.0 Introduction
2.0 Objectives
3.0 The Range
3.1 Interquartile Range
3.2 Variance
3.3 Standard Deviation
3.4 Coefficient of Relative Variation
4.0 Conclusion
5.0 Summary
6.0 Exercise
7.0 Further Readings and Other Resources
1.0 Introduction
This unit is a follow-up to Unit 3, which discussed the various measures of
location, the mean, median, and mode. The three measures of location only
locate the centre of a distribution of data but tell us nothing about the spread
or variation of the scores. This Unit will discuss measures of variability and
their use in social investigation.
2.0 Objectives
At the end of this unit, students should be able to do the following:
Define the measures of dispersion
Identify the different types of measures of variability
Understand the various statistical uses of the measures
3.0 The Range
The range is the simplest measure of data variation. It is defined as the
difference between the largest and smallest scores of ungrouped data. When
the data are grouped in classes, the range is defined as the difference
between the upper real limit of the highest class and the lower real limit of the lowest
class. For example, if the scores for ten students in an examination are
45, 47, 48, 52, 54, 57, 64, 69, 72, 75, the range is the difference between 75, the
largest score, and 45, the smallest score. In this case the range of the scores is
75 minus 45, which is equal to 30. The range can be used to describe
variations in weight, height, salaries, wages, temperature, rainfall,
etc. The range is a good measure of variation when dealing with small
samples.
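For ungrouped data, the range calculation described above can be sketched as follows. The variable names are illustrative only.

```python
# Scores of the ten students from the example above
scores = [45, 47, 48, 52, 54, 57, 64, 69, 72, 75]

largest = max(scores)              # the largest score
smallest = min(scores)             # the smallest score
score_range = largest - smallest   # range = largest minus smallest

print(f"Range = {largest} - {smallest} = {score_range}")
```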
3.1 Interquartile Range
The range is simple to define and calculate but is not always a
satisfactory measure of variability. Two sets of observations can have the
same range but differ greatly in variation, and any change in
the extreme observations alters its value. Calculating the interquartile
range minimizes this instability.
The interquartile range is defined as the difference between the
upper and lower quartiles of a set of observations. It measures the spread of the
observations that fall between the lower quartile and the upper quartile, that is,
the middle half of the data.
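As a rough sketch, the interquartile range can be computed with Python's standard `statistics.quantiles` function (available from Python 3.8). Note that several conventions exist for locating quartiles in a small data set, so other textbooks or software may give slightly different quartile values; the default method of `statistics.quantiles` is assumed here, and the ten scores are those used in the range example in section 3.0.

```python
import statistics

# Scores of the ten students used in the range example
scores = [45, 47, 48, 52, 54, 57, 64, 69, 72, 75]

# n=4 splits the ordered data into four equal parts and returns
# the three cut points: lower quartile, median, upper quartile
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1   # interquartile range = upper quartile - lower quartile

print(f"Lower quartile = {q1}, upper quartile = {q3}, IQR = {iqr}")
```

Unlike the range, this value does not change if the lowest score of 45 or the highest score of 75 is replaced by a more extreme value.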
The range, the interquartile range and the median provide fairly good
descriptions of a set of observations. Nevertheless, they do not say
much about the variations in the set of observations. The variance
and the standard deviation are suitable for the understanding of the
variations in a set of observations.
3.2 Variance
The variance of a set of n ungrouped scores is the sum of the
squared deviations of the scores about their mean, divided by n-1.
The variance is denoted as follows:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
where n is the number of scores, $\bar{x}$ is their mean, and the sum of the
squared deviations is
$$\sum (x - \bar{x})^2$$
The variance is rarely used in general data presentation but most
frequently used in technical and professional reports.
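The definition of the variance can be followed step by step in Python. This is a minimal sketch using the ten scores from the range example in section 3.0; the comparison with `statistics.variance`, which also divides by n-1, is included only as a cross-check.

```python
import statistics

# Scores of the ten students used in the range example
scores = [45, 47, 48, 52, 54, 57, 64, 69, 72, 75]

n = len(scores)
mean_score = sum(scores) / n

# Squared deviation of each score about the mean
squared_deviations = [(x - mean_score) ** 2 for x in scores]

# Variance = sum of squared deviations divided by n - 1
variance = sum(squared_deviations) / (n - 1)

# Cross-check against the standard library's sample variance
library_variance = statistics.variance(scores)

print(f"Mean = {mean_score:.1f}, variance = {variance:.2f}")
```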
3.3 Standard Deviation
The standard deviation of a set of observations is defined as the positive
square root of the variance. It is devised to ensure that the unit of the
measure of variability is the same as the unit of the observations,
whereas the unit of the variance is the square of the unit of the
observations. The standard deviation, like the arithmetic mean,
employs each of the observations in its calculation. It indicates how
far the observations typically lie from the arithmetic mean. For example, if the
scores of five students in an examination are 6, 5, 8, 7 and 4, the mean
$\bar{x}$ is 6 and the sum of the squared deviations of each score from the mean is
$$(6-6)^2 + (5-6)^2 + (8-6)^2 + (7-6)^2 + (4-6)^2 = 0+1+4+1+4 = 10$$
Therefore the variance is
$$\frac{10}{5-1} = \frac{10}{4} = 2.5$$
The standard deviation is
$$\sqrt{\text{Variance}} = \sqrt{2.5} \approx 1.58$$
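The worked example with the five scores can be reproduced in Python as a check on the arithmetic. This is an illustrative sketch; the variable names are not from the course text.

```python
import math

# Scores of the five students from the example above
scores = [6, 5, 8, 7, 4]

n = len(scores)
mean_score = sum(scores) / n                            # 30 / 5

# Sum of the squared deviations of each score from the mean
sum_squared = sum((x - mean_score) ** 2 for x in scores)

variance = sum_squared / (n - 1)                        # divide by n - 1
standard_deviation = math.sqrt(variance)                # positive square root

print(f"Variance = {variance}, standard deviation = {standard_deviation:.2f}")
```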
3.4 Coefficient of Relative Variation
It is often desirable to compare the variability of observations made
on different items or variables which are not measured in the same
unit. This can be done with the
coefficient of relative variation, defined as the standard deviation
expressed as a percentage of the mean $\bar{x}$. The comparison is
made by calculating the coefficient of variation for each set of observations
and comparing their magnitudes.
The coefficient of variation can be obtained as follows:
$$\text{Coefficient of variation} = \frac{\text{Standard Deviation}}{\text{Arithmetic Mean}} \times 100$$
That is, the standard deviation of the observations of each
variable is expressed as a percentage of its arithmetic mean. The variable with the
higher coefficient of variation has the greater relative variability.
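The comparison can be sketched in Python as follows. The heights and weights below are made-up figures for five hypothetical people, used only to show two variables measured in different units; the function name `coefficient_of_variation` is likewise an illustrative choice.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the arithmetic mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements on the same five people
heights_cm = [150, 160, 165, 170, 180]   # centimetres
weights_kg = [50, 58, 65, 72, 90]        # kilograms

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"CV of height = {cv_height:.1f}%")
print(f"CV of weight = {cv_weight:.1f}%")

# The variable with the higher coefficient shows the greater relative variability
more_variable = "weight" if cv_weight > cv_height else "height"
print(f"More variable: {more_variable}")
```

Because each coefficient is a percentage of its own mean, the two values can be compared directly even though one variable is in centimetres and the other in kilograms.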
4.0 Conclusion
The mean, median and mode are three measures of location or central
tendency. They are measures of location of the centre of a distribution of
quantitative observations. They tell the observers little about the spread or
the variation of the observations. The range, the interquartile range, the
variance and the standard deviation provide measures for the description of
quantitative data. The variance and the standard deviation of a set of
quantitative observations provide the information that enables social
researchers to compare variability between sets of observations and interpret
the variability of a single set of observations.
5.0 Summary
In this Unit, we discussed measures of variability, the range, the interquartile
range, the variance, the standard deviation and the coefficient of relative
variation. These are measures useful in the description of quantitative sets of
observations. The variance and standard deviation of a set of observations
are particularly useful in comparing variations between two sets of
quantitative observations and in interpreting a single set of observations.
6.0 Exercise
A group of 25 students sat for a competitive examination for admission into
the HIV/AIDS Diploma Programme of the Open University and scored the
following marks:
58, 34, 57, 54, 44, 57, 61, 21, 36, 57, 45, 47, 38, 48, 51, 60, 61, 65, 51, 75, 58, 54, 51, 45, 25.
Calculate
(i) the mean, the median and the mode
(ii) the range, the interquartile range, the variance, the standard deviation
and the coefficient of relative variation
(iii) write an essay on your findings.
7.0 Further Readings and Other Resources
L. Ott, R.F. Larson and W. Mendenhall. Statistics: A Tool for
the Social Sciences. Third Edition, Duxbury Press, Boston,
Massachusetts, 1983.
UNIT 5
BIOSTATISTICS: SOURCES OF STATISTICS
Table of Contents
1.0 Introduction
2.0 Objective
3.0 The Census
3.1 Vital Registration
3.2 Population Register
3.3 Migration Statistics
3.4 Sample Surveys
3.5 Other Sources of Statistics
4.0 Conclusion
5.0 Summary
6.0 Exercise
7.0 Further Readings and Other Resources
1.0 Introduction
This Unit is a follow-up to Unit 4, which discussed the measures of
variability, the range, interquartile range, variance, standard deviation and
coefficient of relative variation. This Unit will discuss the various sources of
statistics in social and health investigation.
2.0 Objective
At the end of this unit, students should be able to do the following:
Identify the different sources of statistics
Determine the problems associated with the various sources
Understand the ways in which such data are collected
3.0 The Census
The census is the main source of population statistics in many countries. The
census can be defined as the total process of collecting, compiling and
publishing demographic data about every inhabitant of a particular territory.
It is a sort of social photograph of certain conditions of a population at a
given moment.
Periodicity and universality are important characteristics of censuses.
Censuses must be taken at a well-defined period usually every ten years and
must also include all members of the defined territory.
The concept of census differs from place to place. There is the de facto
approach, whereby each individual is counted at the place where he or she was
found at the time of the census, and the de jure approach, whereby people are
recorded according to their usual place of residence. Nigeria and Britain are
examples of countries that practice the de facto approach, while the United States
of America practices the de jure approach. In Brazil both approaches have
been adopted in past censuses. It is not easy to say which is superior to
the other. The success of each method depends upon the
circumstances of the individual country. In Nigeria, for example, population
mobility, multiple residence and the homelessness of a significant number of the
population made the de jure approach less satisfactory than the de facto
approach. During the 1991 Nigerian Population Census, slightly more than
one million people, or about 1 per cent of the population, were reported as nomadic,
homeless or transient.
Items covered by Census Schedules in Nigeria 1951-1991
Year: 1952/53 1963 1973 1991
(i) Name X X X X
(ii) Relationship to X X X X
head of X X X X
Household X X X X
(iii) Sex X X X X
(iv) Age X X X -
(v) Marital Status X
(vi) Religion X -
(vii) Ethnic origin / X X -
Nationality X X
(viii) Home Place / X X X X
Birth Place X X
(ix) Language /
Spoken
(x) Literacy /
Education
(xi) Employment
Status
(xii) Occupation
(xiii) Industry
Sources: (1) Federal Office of Statistics, Population Census of
Nigeria, Lagos, 1952, 1963
(2) National Census Board, the Report of the 1973
population Census Board of Nigeria, Lagos, 1973.
(3) National Population Commission, Census 1991 Summary,
Lagos, 1994.
Coverage of censuses
The basic statistics normally obtained in a complete census are sex, age,
residential status, birthplace, employment, education, ethnic origin,
religion and marital status. These statistics vary from place to place, from
time to time and from census to census.
The United Nations recommends that the census should seek the
following information:
Total population
Age, sex and marital status
Place of birth, citizenship or nationality
Mother tongue, literacy and educational qualifications
Economic status
Urban or rural domicile
Household or family structure
Fertility
How census began
Early census counts or enumerations were often related to taxation (as
in China). The word census originated from the Latin censere, meaning to
value or tax. Another reason for censuses was military service. The ancient
Greeks were known to count the number of adult males in times of war
and the general population when food was in short supply.
In the United States of America, early censuses were for the
determination of political representation. Today, census information is
confidential and used for statistical analysis only. Data on individuals are
not published. People are less concerned that the census is connected
with taxation. However, in the developed countries, many people are
worried about the kind of questions asked during censuses and are
concerned about maintaining their privacy.
The first modern census, that is, a continuing complete count taken
accurately at regular intervals, began in Sweden in 1749. Norway and
Denmark conducted general enumeration in 1769. In the United States,
local censuses were conducted in Virginia in 1624, in Connecticut in
1756, and in Massachusetts in 1764. The 1787 American constitution
made provision for a national census every ten years. This opened the
way for the first nationwide census conducted in 1790. The census was
conducted for a period of 18 months at a cost of $0.01 per capita. The
census put the population of continental USA at 3.9 million.
By the turn of the 19th century, most European countries had begun to
hold modern-type censuses. The first modern census took place in
England in 1801, Belgium in 1829, France in 1835, Japan in 1873, India
in 1881, Egypt in 1897 and Russia in 1897. Although China is rich in
historical population statistics dating back to the Zhou Dynasty,
nationwide censuses were conducted only in 1953, 1964, 1982 and year
2000. The 1982 census was the largest ever executed anywhere in the
world. China is one of the countries that have conducted a population
census in the new millennium. A total of six million enumerators were
involved in the census.
In Nigeria, the first census of the colony of Lagos was taken in 1866. A
rough estimate of the population by sex was made for the northern parts
of the country in 1911. In the southern parts the 1911 census was based
partly on estimates and partly on enumeration. The 1952/53 census was
generally considered the first modern census conducted in Nigeria.
Subsequent head counts were undertaken in 1962, 1963, 1973 and 1991.
The 1962 and 1973 censuses were rejected on account of irregularities.
The 1991 population census was the least controversial census ever
conducted in Nigeria. Another population census scheduled to take place
in the year 2001 has now been postponed till the year 2004.
Between 1955 and 1964 an estimated 68 per cent of the world’s
population was covered by censuses. The coverage was almost universal
in Europe (97 per cent), but was 62 per cent in Africa and 53 per cent in
Asia. The position has since changed in Asia because China, the largest
single group in that region, has joined the club of countries which have
conducted a population census in the 20th and 21st centuries.
Problems that militate against accurate head counts vary from country to
country and from time to time. The problems are more acute in the
developing countries. Such problems are often organizational, physical,
technical and attitudinal.
In the conduct of population census, the officially recognized body in
charge of the census operations has encountered certain problems. Those
identified are:
Organization
Engagement of functionaries hastily put together to conduct the
census in their respective states without adequate training and relevant
experience.
Lack of sufficient and experienced specialists to organize the conduct
of the census.
Insufficient training for the different cadres of census functionaries.
Lack of uniformity in the execution of the actual head count.
Inadequate publicity and enlightenment of census operations.
Time allocated for the planning and preparation for the census was not
adequate for successful operation of the actual head count.
Inconsistency in the approach to the operations of the head count: people
who were expected to be counted in their households were counted at
random at road blocks, market places, churches and mosques, thus
resulting in double counting and negating the concept of a de facto
approach.
Poor census logistics.
Physical
Inaccessibility of some parts of the country at certain times of the year
because of the difficult terrain.
Poor communication and transportation facilities.
Lack of office and storage facilities at both the states and National
Census Offices.
Technical
Most enumeration areas are poorly demarcated.
Lack of adequate and up to date base maps.
Difficulties in the distribution and retrieval of completed census
questionnaires.
Attitudinal
Politicization of census operations, which often leads to double
counting and over-enumeration.
Lack of cooperation on the part of the respondents in respect of
questions that are personal or conflict with traditional beliefs and
values.
Lack of patriotism and sense of responsibility on the part of census
enumerators who collude with members of the public to inflate census
figures.
Although most of the problems reported during the 1973 census were
minimized during the 1991 census, allegations of double counting
and under-enumeration were reported, while politicization of the
exercise was still a major problem. Nevertheless, the experience
gained in the last two censuses is likely to lead to a significant
improvement in the proposed year 2004 census exercise.
3.1 Vital Registration
While censuses describe the state of the population at a fixed point in
time, vital statistics provide a continuous, movie-camera record of
the incidence of births, deaths, marriages, annulments, separations
and adoptions. These events are recorded and compiled at or near the
time of occurrence. In many countries of the
world, particularly the developed ones, such registration is
compulsory and legal. In most of the developing countries, vital
statistics are non-existent. Where they exist, the data are not available
in sufficient quantity and quality for any meaningful usage.
The registration of vital events started in the 14th century in Europe
where local records or parish registers were kept by some churches.
Later, the registration of vital events became a state affair. However,
up till today, the church authorities still have responsibility for
baptisms, burials and weddings, rather than births, deaths and
marriages.
Civil and State registration systems developed in many parts of the
world during the 18th and 19th centuries. Europeans who were not
members of an established church were omitted from vital statistics
until national civil registration was established in Europe. Civil
registration was established in Norway in 1685, Sweden in 1756,
France in 1792, Belgium in 1796, England in 1837 and in the United
States of America in 1885.
Vital registration was first initiated in Lagos Colony in 1867 and
made compulsory in 1908. A compulsory national vital registration
system decree was passed in 1979 and the 1979 Federal Republic of
Nigeria’s Constitution charged the National Population Commission
with the responsibility of establishing and maintaining the
machinery for continuous and universal registration of births and
deaths throughout the country. The nationwide vital registration
system is yet to take off. In limited centres where such exercises
exist, the extent of coverage and the reliability of the information
gathered are poor.
In many developed countries, a wide coverage of the registration of
vital events has been achieved. In the developing countries, the
difficulties and costs of establishing a complete registration system
are so enormous that vital registration is unlikely to provide reliable
demographic data in the foreseeable future.
Coverage of Information
The coverage and type of information derived from vital registration
systems vary greatly between countries. In some countries, over 50
different items of information may occur on the statistical report
forms for births, deaths, marriages and divorces.
In the cases of births, deaths and marriages, the following minimum
information is recommended by the United Nations:
Minimum Information Recommended for Vital Registration by
the United Nations
Births:
Date and place of birth
Name (if any)
Sex
Name and surname of father
Name and maiden surname of mother
Father’s occupation
Signature, description and residence of informant
Deaths:
Date and place of death
Name and surname
Sex
Age
Occupation
Cause of death
Marital status (of deceased)
Age of the surviving spouse, if any, of the deceased
Marriages:
Date of marriage
Name and surname
Ages
Marital condition
Occupations
Residences at time of marriage
Name, signature and description of the person solemnizing the marriage
3.2 Population Registers
The population register is a government data collection system in
which the demographic and socio-economic characteristics of all or a
part of the population are continuously recorded. According to the
United Nations, the population register is expected to
provide for the continuous recording of the characteristics of each
individual and of information on the vital events that occur to the
individual.
Population registers were first kept in ancient China, and were later
adapted by the Japanese. In a number of European countries, a
continuous population register is maintained to serve a number of
legal and administrative functions. Such registers may provide a
continuous flow of data on vital events or a cross-sectional population
census. They may also provide direct information on internal
movements of population.
Universal population registers, which cover the whole population, are
less common than censuses or vital registration statistics. Sweden has
the best-established population register in the world. In all, only
a few countries have registers with almost complete coverage used for
demographic purposes. Of these countries only four (Taiwan,
Israel, Korea and Thailand) are outside Europe. The maintenance of
population registers requires a lot of resources, a reasonably
accurate address system and a literate population. Keeping the
register often means that everyone has to carry some kind of
identification, and in some countries this is thought to infringe
on the freedom of the individual. People who move within the
country are expected to register at the place of destination as well as
give notice at the place of origin of the movement.
3.3 Migration Statistics
In the past, much of the information on internal migration has been
obtained from comparison of successive census enumerations after
allowance had been made for natural increase. Modern censuses now
contain information on change of residence and place of birth, which
facilitates migration analysis. It is now possible to obtain information
on volume, frequency, direction and characteristics of internal
migrations.
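The comparison of successive censuses described above can be sketched as a short calculation. The Python example below is only an illustration of the residual idea: the function name and all the figures are hypothetical, and it assumes that the births and deaths between the two censuses are known and that both censuses cover the same area.

```python
def net_migration_residual(pop_first_census, pop_second_census, births, deaths):
    """Estimate net migration as population change minus natural increase."""
    natural_increase = births - deaths
    total_change = pop_second_census - pop_first_census
    return total_change - natural_increase

# Hypothetical figures for one district (not taken from any real census).
net = net_migration_residual(
    pop_first_census=100_000,
    pop_second_census=112_000,
    births=9_000,
    deaths=4_000,
)
print(net)  # 7000: a positive value suggests net in-migration
```

The residual absorbs any errors in the census counts and in the birth and death records, so it should be read as a rough estimate only.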
Internal migration statistics can also be obtained through sample
surveys where information can be sought about the characteristics of
migrants and the reasons for movement. In general, internal migration
data remain among the least accurate of all demographic data.
International migration statistics are derived from the records of
arrivals and departures at the international boundaries. Data are
drawn from a variety of sources: frontier control, port statistics,
passport statistics of certain categories of travellers, local population
registers and work permits for aliens. Persons crossing international
boundaries usually have to produce their passport, and to complete
various forms on arrival and departure.
Statistics of international migrations are available for only a small
number of countries. Each country collects only the data that it
needs for its own administrative purposes. It has always been
difficult to record all international movements by all countries.
Several million Mexicans are known to have illegally crossed the
border to live in the United States of America. The problem is more
acute within the West African region where most of the borders are
artificial. In the fifties and early sixties, several thousand people from
Nigeria, Togo and Upper Volta (now Burkina Faso) were known to
have migrated illegally to Ghana and Ivory Coast. In the 1980s
several thousand West African citizens, particularly from Ghana,
also migrated illegally into Nigeria. Since the economic
difficulties began in Nigeria in 1987, millions of Nigerians have
migrated to other countries in the world, especially to Europe and
the United States of America, and recently to North and South Africa.
A significant proportion of those who emigrated from Nigeria arrived
at their destinations as illegal migrants.
Their host countries often do not have accurate statistics on them,
hence they do not appear in the migration statistics of either their home
or host countries.
Coverage of information on international migration
The information contained in the arrival and departure cards varies
from country to country. Most countries normally ask the following
minimum questions:
Information Contained in Immigration and Emigration Records
Immigration Record Emigration Record
Name Name
Sex Sex
Age Age
Occupation Occupation
Address at point of exit Address at point of exit
Address at destination Address at destination
Port of embarkation Port of embarkation
Mode of transportation Mode of transportation
Duration of stay Duration of absence
Reasons for entry Reasons for departure
Signature and date Signature and date
3.4 Sample surveys
It is expensive and time-consuming to interview everyone in a
country; hence demographers and other social scientists often use
sample surveys to acquire information about a segment of the
population, from which they make generalizations about the entire
population.
Sample surveys are frequently used to test the accuracy of census and
registration data or to collect vital statistics if registration is inadequate
or non-existent. In the United States, the ten-year intercensal period
is filled with several surveys on demographic issues, usually
conducted by the Bureau of the Census and other organizations. It is
customary to ask only a few questions of the entire population
during censuses, while most items are administered to a 25 per cent
sample of the population.
In many developing countries of the world where census statistics are
incomplete or unreliable, sample surveys often provide necessary
information which can be used to estimate demographic parameters.
In Nigeria, the 1965-66 Rural Demographic Sample Survey was
aimed at providing information that could not be obtained through the
census of 1963. Similar surveys have recently been conducted to
provide necessary benchmark data for estimating demographic
parameters.
The World Fertility Survey (WFS), which ran through the 1970s, has
been described as the largest single social science research project
ever undertaken in the world since the inception of demographic
sample surveys. The project involved some 350,000 women in 40
developing and 20 developed countries throughout the world. The
primary aim of the project was to assist a large number of interested
countries, particularly the developing ones, in carrying out nationally
representative, internationally comparable, and scientifically designed
and conducted surveys of human reproductive behaviour. Findings of
the investigation are now available for nearly all the countries that
took part in the project.
From the mid-1980s to the late 1990s, Demographic and Health Surveys
(DHS) were conducted in several countries in the world, mostly in the
developing countries, to generate data for estimating demographic and
health parameters. The Nigerian component of the DHS was undertaken
in 1990 and 1999. In 1990, data were collected from 8,999 households and
complete interviews were conducted with 8,781 women aged 15-49
years across Nigeria. The 1999 Nigeria Demographic and
Health Survey, like that of 1990, was also a nationally representative
survey, covering 8,199 women aged 15-49 years and 3,082 men aged 15-64
years. The 1999 survey was designed to provide information on
levels and trends of fertility, family planning practice, maternal and
child health, infant and child mortality, and maternal mortality, as
well as awareness of HIV/AIDS and other sexually transmitted
diseases and female circumcision. The results of the Demographic and
Health Surveys are now being used for planning purposes.
Problems of sample surveys
Sampling consists of selecting a component or a segment to represent
the entire population or a particular section of it. This process of
selection creates certain problems such as sampling errors and the
difficulty of making a sample truly representative. A sample contains
a small number of people and it is often impossible to make
generalizations about the entire population. If a researcher is not
adequately trained in statistics and research methodology, he may find
it difficult to achieve representativeness in studies based on sampling.
If the sample is properly designed, it will save money and energy and
enable the researcher to pay more attention to cases that are often
neglected during censuses.
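The sampling error mentioned above can be given a rough size. The sketch below assumes a simple random sample (an assumption; real surveys often use more complex designs) and uses the usual formula for the standard error of a proportion, the square root of p(1 - p)/n. The function names and the survey figures are hypothetical.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion p from a simple random sample of size n."""
    if not 0 <= p <= 1 or n <= 0:
        raise ValueError("p must be between 0 and 1 and n must be positive")
    return math.sqrt(p * (1 - p) / n)

def approximate_95_interval(p, n):
    """Approximate 95 per cent interval: the estimate plus or minus 1.96 standard errors."""
    margin = 1.96 * standard_error_of_proportion(p, n)
    return p - margin, p + margin

# Hypothetical survey: 30 per cent of 400 sampled women report using family planning.
low, high = approximate_95_interval(0.30, 400)
print(round(low, 3), round(high, 3))
```

Under this assumption, quadrupling the sample size halves the standard error, which is why a larger sample gives more precise estimates but at a higher cost.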
In spite of the methodological problems and the possible unrepresentativeness of
sample surveys, they provide more detailed and higher-quality
information than a census because more time and effort can be spent
on each interview. A census question may indicate the number of
children each woman has, but details about each birth and pregnancy
require several probing questions which are only possible in
sample surveys.
3.5 Other Sources of Statistics
Administrative record systems of both private and public bodies may serve
as sources of demographic data. Such records include the various
social security programmes, which contain information on old age
and survivors insurance, unemployment compensation and
employment services. In the United States, where such records are of
sufficient quantity and quality, they provide valuable demographic
data.
The national identification system as well as the voting register may
provide excellent demographic data for a large segment of the
population of a country particularly the adult population if they are
carefully kept. These records are available in most of the developed
countries and can be used to generate demographic information.
In most of the developing countries, social security programmes and
national identification systems do not exist. The voting registers that
exist in some developing countries contain a lot of irregularities and
multiple entries that render them useless for any meaningful
demographic analysis.
4.0 Conclusion
Nigeria is deficient in data, and the census, which is the main source of
information in many parts of the world, has not been conducted on a regular
basis. By the normal tradition of censuses, another census was due in Nigeria
in the year 2001 but, for certain reasons, it has now been postponed till the year 2004,
three years behind schedule. Vital registration that could be used as a
substitute has no national coverage. Most of the statistics being used for
health planning are derived from sample surveys that are of limited value.
5.0 Summary
In this Unit, we have discussed the various sources of statistics, how they are
generated, the problems associated with the collection of data from the
various sources, and their relative advantages and disadvantages.
6.0 Exercise
Identify and discuss the various sources of statistics in your country.
Discuss the history of censuses and vital registration in Nigeria.
Enumerate the items normally covered by census and vital registration
schedules.
Discuss the various stages which you would adopt in conducting a population census of
your community.
UNIT 6
BIOSTATISTICS
Table of Contents
1.0 Introduction
2.0 Objectives
3.0 Use of Census Data
3.1 Use of Vital Registration Data
3.2 Use of Population Register and Migration Statistics
3.3 Use of Sample Survey Data
4.0 Conclusion
5.0 Summary
6.0 Exercise
7.0 Further Readings and Other Resources
1.0 Introduction
This Unit is a follow-up to Unit 5 which focused on the various sources of
statistics in social and health investigation. This unit will discuss the various
uses of statistics, their problems and limitations.
2.0 Objective
At the end of this unit, students should be able to do the following:
Identify the different uses of statistics
Determine the problems associated with the various uses
Understand the limitations in the use of the various sources
3.0 Use of Census Data
Census statistics are the basic data required for planning,
administrative and research purposes. There are two ways in which
population components may enter into the planning process: for the
distribution of goods and services and for the supply of the required
manpower to administer them. Statistics are needed in the planning
for education, health, housing, and employment as well as the demand
for food and other essential services.
Educational
In educational planning, knowledge of the population of school-going
age, by sex, is essential. The size of the population of children
one year old is needed for planning for those who will be subsequently
admitted into day care centres, nursery schools and primary schools
at ages one, two and six years respectively. Data on the population aged 6-11
years are needed for the development of primary schools and for
forecasting the secondary school age population; the populations aged 12-17
years and 18-24 years are useful for the development of the secondary
school and tertiary levels of education respectively.
Health
Planning for the number of persons per doctor, nurse and other
paramedical personnel, and the number of persons per hospital bed, requires
population data on the size, age and sex distribution of the population in
the country.
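The planning ratios named above are simple divisions of the population by the number of staff or facilities. The short Python sketch below illustrates this; the function name, the population figure and the staff and bed counts are all hypothetical and are not drawn from any census.

```python
def persons_per_unit(population, units):
    """Return the number of persons served by each unit (doctor, nurse, bed)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return population / units

# Hypothetical state population and health resources.
population = 2_400_000
resources = {"doctor": 480, "nurse": 2_000, "hospital bed": 1_600}

ratios = {name: persons_per_unit(population, count) for name, count in resources.items()}
for name, ratio in ratios.items():
    print(f"Persons per {name}: {ratio:,.0f}")
```

The same calculation can be repeated for a particular age or sex group, for example children under one year per immunization centre, once the census gives the size of that group.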
Health needs of the population tend to differ according to age and
sex. Knowledge of the population aged 0-1 will aid the government
in forecasting the magnitude of the immunization programme for the young
members of the population, childcare aid and family allowance to
mothers. The size of the population of women aged 15-45 years will
enable the planners to estimate the population of women of child-bearing
age, as well as their health needs. The old members of the
population, particularly those aged 60 and above, need special care and
attention. The size of such a group will enable the government to
determine whether the population is ageing and also to plan for the type
of health care, housing and general welfare of this group.
Housing
The demand for housing depends largely on the size of the population,
the age and sex distribution, family size, population distribution
(rural-urban distribution) and the mobility of the population, as well as
household incomes. An increase in population size, household size and
income may create a new demand for residential places. Housing
demand varies from place to place and from time to time. In Nigeria
the demand for more residential places is generally higher in the urban
areas, where the rate of growth of the population appears to be higher
than in the rural areas.
Labour supply, manpower and employment
The supply of labour in any given population depends upon the size of
the population, its age and sex structure, labour force participation rates
and the level of fertility. The size of the population between the ages of
18 and 64 years will enable planners to forecast the country’s
employment needs for persons in the labour force. Reliable population
statistics will enable the planners to make necessary predictions about
future employment situations in the country.
Demand for food
The effects of changes in the number of people on the demand for
food have been observed since the inception of population study. An
increase in the size of population without a corresponding increase in
food supply is a major cause of food shortage in many developing
countries. Where food production has failed to keep pace with
population, demand for food imports has usually risen sharply.
In many developing countries, the rate of growth of food production in
the last decades has barely kept pace with the rate of growth of the
population. Consequently, this has led to severe shortages and
starvation. Population statistics are important to monitor the
relationship between population growth and the supply of and demand for food.
3.1 Use of Vital registration statistics
Vital registration Statistics can be used as follows:
To provide additional data, independent of the census, on measures
of fertility.
To check on census enumeration, particularly at the infant and young
ages where under-enumeration is most common.
To provide the sex ratio at birth, which can be used in
population projections.
To provide legal and documentary evidence for purposes of certification
and determination of age, civil status, rights and claims.
To support epidemiological studies of the incidence and prevalence of
diseases and the planning of health services and programmes.
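One of the uses listed above, the sex ratio at birth, can be illustrated with a short calculation. The Python sketch below uses the conventional definition (male births per 100 female births) and then applies the ratio to split a projected number of births by sex. The function names and all the figures are hypothetical.

```python
def sex_ratio_at_birth(male_births, female_births):
    """Male births per 100 female births."""
    if female_births <= 0:
        raise ValueError("female_births must be positive")
    return 100 * male_births / female_births

def split_projected_births(total_births, sex_ratio):
    """Split a projected total of births into males and females using the sex ratio."""
    male_share = sex_ratio / (100 + sex_ratio)
    males = total_births * male_share
    return males, total_births - males

# Hypothetical registered births in one year.
ratio = sex_ratio_at_birth(male_births=10_500, female_births=10_000)
males, females = split_projected_births(total_births=41_000, sex_ratio=ratio)
print(ratio, round(males), round(females))
```

A ratio computed from incomplete registration may be biased if births of one sex are registered more often than the other, so the result depends on the coverage of the registration system.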
3.2 Use of Population register and migration statistics
Population registers can be used for a wide range of administrative
matters which include: identification of persons, control of electoral
rolls, selection for military service or any other national service
programme and the preparation of the tax list. Statistics on social
security, health, education, family income, housing and taxation can
also be derived from population registers.
A wide range of statistics on the size, structure, composition and
movement of the population can be derived from migration records.
Such information is useful in urban and country planning as well as in
the study of problems of population pressure and depopulation. The
pattern, volume and seasonal variation in population movement are
very crucial in agricultural regions. Statistics on these aspects are
useful in agricultural planning, particularly in labour intensive
ventures.
3.3 Use of sample surveys
Statistics from sample surveys can be used in checking the accuracy
of census returns. A post-enumeration survey conducted after the main
population census is one of the most important ways by which the
accuracy of census returns can be verified. Information from
sample surveys can also be used in the estimation of demographic
components such as fertility, mortality and migration, or the total size
and the spatial distribution of the population of a country or territory.
The rate of growth generated from the 1965-66 Rural Demographic
Sample Survey conducted in Nigeria was the only national figure
available for the estimation of the population until 1984, when the
result of the National Sample Survey of 1981/82 was published. The
new rate of growth from this study was used for several years as the basis
for estimating the population of Nigeria, until the 1990 Demographic
and Health Survey and the 1991 Population Census were undertaken.
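How a rate of growth is used to estimate a population can be sketched as follows. The text does not say which formula was applied in Nigeria, so this Python example assumes the common exponential model, in which the population after t years is the base population multiplied by e raised to the power (rate x t). The base population, growth rate and function names are hypothetical.

```python
import math

def project_population(base_population, annual_growth_rate, years):
    """Exponential projection: base population times exp(rate * years)."""
    return base_population * math.exp(annual_growth_rate * years)

def doubling_time(annual_growth_rate):
    """Years needed for the population to double at a constant growth rate."""
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / annual_growth_rate

# Hypothetical base population of one million growing at 3 per cent a year.
projected = project_population(1_000_000, 0.03, 10)
print(round(projected))
print(round(doubling_time(0.03), 1))
```

Under this assumed model a population growing at 3 per cent a year doubles in about 23 years, which shows how sensitive such estimates are to the growth rate taken from the survey.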
The general lack of reliable and adequate statistics on the size, structure
and composition of the population of any country or territory will affect
the planning and development process of that country. This has
apparently been the case in Nigeria since the inception of the various
Development Plans.
4.0 Conclusion
As social scientists we use statistics in several ways. We use statistics to
characterize a group or to make comparisons between groups: for example,
how many people will vote or voted for a particular party in an election, or
how the voting pattern differs between the rural and urban populations
or between Christians and Muslims. Statistics can also be used to design
surveys and experiments, for planning and, above all, for evaluating the success or
otherwise of government programmes.
5.0 Summary
In this Unit we have discussed the various uses of statistics. We discussed
the use of census data, vital registration data, population register and
migration records, and sample survey data in educational, health, housing,
labour, manpower and employment and food planning. In planning, the
various sources of data tend to complement each other.
6.0 Exercise
Identify and discuss the use of census and vital registration data in
educational and health planning
Discuss how population and vital registration statistics can be useful for
planning in your community
Compare the use of census data and sample survey data for planning
7.0 Further Readings and Other Resources
I.O. Orubuloye and Folakemi Oguntimehin. The Study of Human
Populations. Centre for Population and Health Research,
Ado-Ekiti, Nigeria, 2002.