LECTURE NOTES FOR DEGREE STUDENTS
MAFLEKUMEN
BIOSTATISTICS
Course instructor
Tabe Cletus
(Molecular epidemiology, Public Health)
SECTION 1. INTRODUCTION TO BIOSTATISTICS
INTRODUCTION TO BIOSTATISTICS
SECTION OBJECTIVES
After completing this section, the students will be able to:
1. Define and explain Statistics and Biostatistics
2. Enumerate the importance and limitations of statistics
3. Define and identify the different types of data and understand why we need to
classify variables
1. Definition of Statistics and Biostatistics
Statistics is a field of study concerned with the collection, organization and
summarization of data, and the drawing of inferences about a body of data when
only part of the data is observed. In this sense it is a branch of scientific method that
helps us to understand better the object under study.
It can also be seen as the study and use of theory and methods for the analysis of
data arising from random processes or phenomena; in short, the study of how we make
sense of data.
It can be simply defined as the science of assembling and interpreting numerical
data.
When the different statistical methods are applied to biological, medical and public
health data, they constitute the discipline of Biostatistics.
2. Rationale, importance and limitations of statistics
The planning, conduct, and interpretation of much of medical research are becoming
increasingly reliant on statistical processes and techniques. Such research raises questions like: Is this
new drug or procedure better than the one commonly in use? How much better? What, if
any, are the risks of side effects associated with its use? In testing a new drug, how many
patients must be treated, and in what manner, in order to demonstrate its worth? What
is the normal variation in some clinical measurement? How reliable and valid is the
measurement? What is the magnitude and effect of laboratory and technical error? How
does one interpret abnormal values?
Answering the questions above requires quantifiable measures, and statistical
processes help us obtain them. With such information we can make better
decisions in society.
Therefore:
Statistics allows us to organize information on a wider and more formal basis than
relying on the exchange of anecdotes (stories in the neighbourhood) and personal
experience.
With statistics, many things are now measured quantitatively in medicine and public
health.
There is a great deal of intrinsic (inherent) variation in most biological processes.
Statistics (Biostatistics) helps us to quantify this variation and draw adequate
conclusions, with the end goal of solving a problem.
Statistics can help us to make predictions about the future. For example, predictions
are being made in the field of environmental science about the devastating effects of
climate change.
Uses of Biostatistics in different settings
Hospital utility statistics (Example: At Limbe General Hospital, 5% of the patients were diagnosed
with diabetes mellitus (DM) last year).
Resource allocation.
Vaccination uptake.
Magnitudes of a disease/condition.
Assessing risk factors.
Disease frequency.
Making diagnosis and choosing an appropriate treatment (probability).
*Question. How can we use Biostatistics in the different settings above to make better
judgement and decisions?
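As a small illustration of how a hospital statistic such as the 5% figure above is obtained, the sketch below divides the number of diagnosed patients by the total number of patients seen. The counts (150 and 3000) and the function name `percentage` are hypothetical; they were chosen only so that the result matches the example and are not real hospital data.

```python
def percentage(cases: int, total: int) -> float:
    """Return cases as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be a positive number")
    return 100 * cases / total


# Hypothetical counts, chosen only to reproduce the 5% in the example.
patients_seen = 3000
patients_with_dm = 150

dm_percentage = percentage(patients_with_dm, patients_seen)
print(f"{dm_percentage:.1f}% of patients were diagnosed with DM")
```

The same calculation applies to the other settings in the list, for example vaccination uptake (vaccinated children divided by eligible children).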
Limitations of statistics (Biostatistics)
It deals only with those subjects of inquiry that are capable of being quantitatively
measured and numerically expressed.
It deals with aggregates of facts and attaches no importance to individual items; it is
suited only to cases where group characteristics are to be studied.
Statistical data are only approximately, and not mathematically, correct.
3. Define and identify the different types of data and understand why we need to
classify variables
What is data?
Data are the quantities (numbers) and qualities (attributes) measured or observed that are
to be collected and/or analysed. The word data is plural and datum is singular.
Data, also referred to as raw data in other fields, is a collection of text, numbers or
symbols in raw or unorganised form, with no meaning on its own. Data therefore has to be
processed, or provided with a context, before it can have meaning (information). Examples:
• 3, 6, 9, 12
• cat, dog, gerbil, rabbit
• 161.2, 175.3, 166.4, 164.7, 169.3
These are meaningless sets of data. They could be the first four answers in the 3 times table, a list of
household pets and the heights of 15-year-old students, but without a context we cannot know.
Information:
Information is the result of processing data, usually by computer or the brain. This results in
facts (situations or phenomena that exist, or that can be proven or verified because data
exist) which enable the processed data to be used in context and have meaning.
Information is data that has meaning. For example:
1 + 1 = 2
The sum 1 + 1 gives 2 because two cannot exist unless one is added to another one.
So 1 is the data (a datum), 1 + 1 is the processing, and 2 is the fact or information. Note that
the above process can be proven.
When does data become information?
Data on its own has no meaning. It only takes on meaning and becomes information when it
is interpreted. Data consists of raw facts and figures. When that data is processed into sets
according to context, it provides information. Data refers to raw input that when processed
or arranged makes meaningful output. Information is usually the processed outcome of data.
When data is processed into information, it becomes interpretable and gains significance. In
other words, data in a meaningful form becomes information. Information can be about facts,
things, concepts, or anything relevant to the topic concerned. It may provide answers to
questions like who, which, when, why, what, and how. If we put Information into an equation
it would look like this:
Data + Meaning = Information.
Data → Information → Knowledge
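The idea "Data + Meaning = Information" can be sketched in a few lines of Python. The numbers are the third example list given earlier in this section; the context attached to them (heights of 15-year-old students, measured in centimetres) is an assumption made for illustration, since the notes only say what the numbers could be.

```python
from statistics import mean, stdev

# Raw data: on their own, these are just numbers.
raw_values = [161.2, 175.3, 166.4, 164.7, 169.3]

# Meaning: the context is supplied by us (assumed, not given with the data).
context = {"variable": "height", "unit": "cm", "group": "15-year-old students"}

# Processing: summarise the raw values.
average = mean(raw_values)
spread = stdev(raw_values)  # sample standard deviation

# Information: processed data presented in context.
print(
    f"Mean {context['variable']} of {context['group']}: "
    f"{average:.1f} {context['unit']} (SD {spread:.1f} {context['unit']})"
)
```

Without the `context` dictionary, the computed mean would still be a number, but nobody could say what it describes.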
Data and Variables
In an environment or system, features or characteristics vary among the things that make it up.
For example, two dogs can live in the same ecosystem, but one is red and the other is
black. Colour therefore varies from one dog to another within the same ecosystem.
Colour is a feature that varies between the two dogs within the same environment, so
colour is a variable.
What then is a variable?
Any aspect of an individual that is measured (like blood pressure) or recorded (like age or sex),
and that can take a different value (data) for different individuals or cases, is called a variable.
A variable can be any characteristic that differs from person to person, such as height, sex,
smallpox vaccination status, or physical activity pattern. The value of a variable is the number or
descriptor that applies to a particular person, such as 5'6" (168 cm), female, and never vaccinated.
The values we get from measuring features that change from one person to another (variables) are
data.
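The relationship between variables and their values can be pictured as a small record per person. In the sketch below, the class name `Person` and its field names are hypothetical. The first record uses the values quoted in the text (168 cm, female, never vaccinated); the second record is invented for contrast.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Each field is a variable: a characteristic that can differ between people.
    height_cm: float
    sex: str
    smallpox_vaccinated: bool


people = [
    Person(height_cm=168, sex="female", smallpox_vaccinated=False),  # values from the text
    Person(height_cm=175, sex="male", smallpox_vaccinated=True),     # hypothetical person
]

# The data for one variable are its values across all individuals.
heights = [p.height_cm for p in people]
sexes = [p.sex for p in people]
print(heights, sexes)
```

Each field of the record is a variable, and each list collected across people is the data for that variable.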
Types of Variables/Data
Variables can be classified as qualitative or quantitative.

Classification based on the nature of data

Qualitative
Definition: These are variables that are non-numerical in nature. They take the form of words and
different categories.
Example: Gender: male/female. Severity of pain: mild/moderate/severe.

Quantitative
Definition: These are variables that are numerical in nature. They can assume numbers after
measurement.
Example: Height: 1.5 m, 1.8 m, 1.6 m, etc. Number of cells: 1000 cells, 200000 cells, etc.
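The distinction between qualitative and quantitative data can be sketched with a simple rule: values recorded as numbers are treated as quantitative, and values recorded as words are treated as qualitative. The function name `nature_of_data` is hypothetical, and the rule is a deliberate simplification. A qualitative variable stored as numeric codes (for example 1 = male, 2 = female) would be mislabelled by it, so the judgement ultimately belongs to the researcher and not to the software.

```python
from numbers import Number


def nature_of_data(values):
    """Label a list of values as 'quantitative' or 'qualitative' (simplified rule)."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "qualitative"


# Examples taken from the classification above.
observations = {
    "gender": ["male", "female"],
    "severity_of_pain": ["mild", "moderate", "severe"],
    "height_m": [1.5, 1.8, 1.6],
    "number_of_cells": [1000, 200000],
}

for name, values in observations.items():
    print(f"{name}: {nature_of_data(values)}")
```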
Type of variable/data
Quantitative: interval or ratio (each may be discrete or continuous)
Qualitative: nominal or ordinal
Classification based on scale of measurement

Qualitative
Nominal scale: There is no implied order to the categories of nominal data. In these types of data,
individuals are simply placed in the proper category or group. Examples: eye colour (brown, black,
etc.), religion (Christianity, Islam, Hinduism, etc.), sex (male, female). Any category can come
before any other, so there is no order.
Ordinal scale: There is an order among the response classifications (categories). The spaces or
intervals between the categories are not necessarily equal. For example, perception measures like:
strongly agree > agree > no opinion > disagree > strongly disagree.

Quantitative (metric variable)
Discrete: Quantitative data that take whole numbers and not decimals or fractions.
Continuous: Quantitative data that can take decimals or fractions.
Interval data: In interval data the intervals between values are the same. For example, in the
Fahrenheit temperature scale, the difference between 70 degrees and 71 degrees is the same as the
difference between 32 and 33 degrees. But the scale is not a ratio scale: 40 degrees Fahrenheit is
not twice as much as 20 degrees Fahrenheit.
Ratio scale: The data values in ratio data do have meaningful ratios. For example, age is ratio
data: someone who is 40 is twice as old as someone who is 20.
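The Fahrenheit example can be checked numerically. The sketch below converts both temperatures to the Kelvin scale, which has a true zero, and compares the ratios. The function name `fahrenheit_to_kelvin` is introduced here for illustration only. The conversion K = (F - 32) × 5/9 + 273.15 is the standard one.

```python
def fahrenheit_to_kelvin(f: float) -> float:
    """Convert degrees Fahrenheit to kelvin (a scale with a true zero)."""
    return (f - 32) * 5 / 9 + 273.15


# Interval property: equal differences mean the same thing anywhere on the scale.
same_difference = (71 - 70) == (33 - 32)

# Ratio of the raw Fahrenheit numbers (misleading on an interval scale).
naive_ratio = 40 / 20

# Ratio after converting to a scale with a true zero.
true_ratio = fahrenheit_to_kelvin(40) / fahrenheit_to_kelvin(20)

# Age has a true zero, so its ratio is meaningful as it stands.
age_ratio = 40 / 20

print(f"Equal differences: {same_difference}")
print(f"Naive temperature ratio: {naive_ratio:.2f}")
print(f"Temperature ratio in kelvin: {true_ratio:.2f}")
print(f"Age ratio: {age_ratio:.2f}")
```

On the kelvin scale the ratio is only about 1.04, so 40 degrees Fahrenheit is roughly 4% "hotter" than 20 degrees Fahrenheit in absolute terms, not twice as hot.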
Exercises
Identify the type of data (nominal, ordinal, interval or ratio) represented by each of the
following.
1. Blood group
2. Temperature (Celsius)
3. Ethnic group
4. Job satisfaction index (1-5)
5. Number of heart attacks
6. Calendar year
7. Serum uric acid (mg/100ml)
8. Number of accidents in a 3-year period
9. Number of cases of each reportable disease reported by a health worker
Characteristics such as age, sex, height, weight, body mass index (BMI), blood group, body
temperature, blood glucose level, blood pressure, heart rate, number of teeth and severity of disease
(mild, moderate, severe) are some examples of biological variables in research.
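Severity of disease (mild, moderate, severe) is a variable whose categories have an order. One way to sketch this in Python is an `IntEnum`, where the numbers serve only as ranks. The class name `Severity`, the rank values 1 to 3 and the patient list are hypothetical choices made for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    # The numbers encode order only; the gaps between them carry no meaning.
    MILD = 1
    MODERATE = 2
    SEVERE = 3


# Hypothetical observations for four patients.
patients = [Severity.MODERATE, Severity.MILD, Severity.SEVERE, Severity.MILD]

worst = max(patients)      # ordering makes "worst case" well defined
ordered = sorted(patients)  # categories can be ranked

print("Worst severity:", worst.name)
print("Ranked:", [s.name for s in ordered])
```

Because the ranks are not measurements, comparing and sorting them is meaningful, but subtracting or averaging them is not: nothing says the step from mild to moderate equals the step from moderate to severe.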