1 Introduction

Uploaded by

DrElias Davis

UNIVERSITY OF DODOMA

COLLEGE OF HEALTH AND ALLIED SCIENCES

1
ER 615: BIOSTATISTICS

INSTRUCTOR: CHRISTOPHER MBOTWA

2
COURSE DESCRIPTION
• This course covers the basic principles of
biostatistics and equips students with
practical biostatistical skills.
• The course will provide skills for
designing, implementing, analyzing
and interpreting scientific
investigations.
3
Learning Objectives
1. To introduce students to the basic
principles and methods of biostatistics
2. To strengthen students' ability to apply
biostatistical procedures in different
medical disciplines
3. To orient students to the application of
biostatistics in producing descriptive and
inferential statistics
4
Learning Outcomes
• Apply the basic principles of biostatistics in planning and the
provision of health care services
• Explain the need for studying biostatistics in medicine and
describe descriptive methods for qualitative and quantitative
data
• Describe the Normal distribution curve and its characteristics, and
demonstrate skills in standard Normal distribution calculations
• Describe the concept of sampling techniques and calculate
sample size for estimation of a mean and a proportion
• Use significance tests effectively and interpret their results
• Use statistical packages for data analysis
5
TOPIC 1: INTRODUCTION AND
DESCRIPTIVE STATISTICS

6
Definition
• Biostatistics can be defined as the
application of statistical methods to the
solution of biological problems.
• The biological problems of this definition are
those arising in the basic biological sciences
as well as in such applied areas as the
health-related sciences and the agricultural
sciences.
7
Definition Cont’d..
• Biostatistics is a growing field with
applications in many areas of biology
including epidemiology, medical
sciences, health sciences, educational
research and environmental sciences.
• Statistical thinking is not really different
from ordinary disciplined scientific
thinking, in which we try to quantify our
observations.
8
Definition Cont’d..
• Any science needs precision for its
development.
• Precision is all the more important
when it comes to health sciences. To
achieve precision, facts, observations or
measurements have to be expressed in
figures.
9
Definition Cont’d..
• Medicine, like any other health science,
is essentially an empirical science. It
depends on observations and not on
theories or theorems.
• As a part of clinical practice or research
we deal with many observations, which
when systematically arranged, are
called data.
10
Definition Cont’d..
• Everything in medicine, be it research,
diagnosis or treatment, depends on
counting or measurement.
• High or low blood pressure has no
meaning, unless it is expressed in figures.
• Thus, the knowledge of health statistics or
biostatistics is needed to analyze and
interpret those figures.
11
Definition Cont’d..
• In nature, blood pressure, pulse rate,
action of a drug or any other
measurement or counting varies not only
from person to person but also from group
to group.
• The extent of this variability in an attribute
or characteristic, and whether it arises by
chance (i.e. is biological or normal), is learnt
by studying statistics as a science.
12
Definition Cont’d..
• Collected data lying in a haphazard mass
are of no use unless they are properly
sorted, presented, compared, analyzed
and interpreted.
• For such a study of figures, one has to
apply certain mathematical techniques
called statistical methods.
13
Definition Cont’d..
• Statistics is concerned with collection,
organization, summarization and
analysis of data.
• We seek to draw inferences about a
body of data when only a part of the
data is observed.

14
Application of Biostatistics
Discuss the use of biostatistics in:-
1. Medicine
2. Physiology and Anatomy
3. Pharmacology
4. Public health
5. Nutrition
6. Health planning, monitoring and evaluation
7. Genetics
15
Descriptive vs Inferential Statistics
• Descriptive statistics are used to
describe the basic features of the data in
a study. They provide simple summaries
about the sample and the measures.
• With inferential statistics, you are trying
to reach conclusions that extend beyond
the immediate data alone.
16
Descriptive vs Inferential Statistics
• For instance, we use inferential statistics
to try to infer from the sample data what
the population might think.
• Or, we use inferential statistics to make
judgments of the probability that an
observed difference between groups is a
dependable one or one that might have
happened by chance in this study.
17
Descriptive vs Inferential Statistics
• Thus, we use inferential statistics to
make inferences from our data to more
general conditions;
• we use descriptive statistics simply to
describe what's going on in our data.

18
Data
• Data are numbers which can be
measured or can be obtained by
counting.
• Statistics is concerned with the
interpretation of the data and the
communication of information about the
data.
19
Sources of data
Data are obtained from
• Surveys
• Experiments
• Direct observation
• Records
• Reports, e.g. census reports
20
Variables
• A variable is an object, characteristic or
property that can have different values. A
variable can be quantitative or qualitative.
• A quantitative variable can be
measured in some way and expressed in
numerical form, e.g. weight, height.
• A qualitative variable cannot be
measured numerically but can be
sorted into categories, e.g. color, sex.
21
Variables Cont’d..
• A random variable is one whose value cannot
be predicted in advance because it arises
by chance. It is a real number x
associated with a random experiment E.
• Observations or measurements are used
to obtain the value of a random variable.
• Random variables may be discrete or
continuous.
22
Variables Cont’d..
• A discrete random variable takes at
most a countable number of values in a
specified range.
• Discrete variables cannot be expressed
in decimals or fractions, e.g. number of
people, number of still births per year,
number of accidents per month.
23
Variables Cont’d..
• A continuous random variable
takes all possible values in a
specified range.
• Continuous variables cannot be
counted and can be expressed in
decimals or fractions, e.g.
temperature, height, weight, etc.
24
Scales of Measurement of the Variable
There are four scales/levels of
measurement of the variable
i. Nominal
ii. Ordinal
iii. Interval
iv. Ratio
25
Nominal
• Nominal scales are used for labeling
variables, without any quantitative value.
• Nominal scales could simply be called
labels.
• A good way to remember all of this is
that “nominal” sounds a lot like “name”
and nominal scales are kind of like
“names” or labels.
26
Examples (Nominal)
• Gender
1. Male 2. Female
• Hair color
1. Brown 2. Black 3. Gray 4. Other
• Place of residence
1. Rural 2. Urban
NB: The best descriptive statistic for
nominal variables is the mode. The mean and
median cannot be computed.
27
Ordinal
• With ordinal scales, the order of the
values is what’s important and
significant, but the differences between
them are not really known.
• For example, is the difference between
“OK” and “Unhappy” the same as the
difference between “Very Happy” and
“Happy?” We can’t say.
28
Ordinal
• Ordinal scales are typically measures
of non-numeric concepts like
satisfaction, happiness, discomfort, etc.
• “Ordinal” is easy to remember because
it sounds like “order” and that’s the key
to remember with “ordinal scales”–it is
the order that matters, but that’s all you
really get from these.
29
Ordinal
• Note: The best way to determine
central tendency on a set of ordinal
data is to use the mode or median; the
mean cannot be defined from an
ordinal set.
• Here are some examples of measurement
on an ordinal scale:
30
Ordinal
• How do you feel today?
1. Very unhappy 2. Unhappy 3. OK
4. Happy 5. Very happy
• How satisfied are you with our
services?
1. Very unsatisfied 2. Unsatisfied
3. Neutral 4. Satisfied 5. Very
satisfied
31
Interval
• Interval scales are numeric scales in
which we know not only the order, but
also the exact differences between the
values.
• The classic example of an interval scale
is Celsius temperature because the
difference between successive values is
the same.
32
Interval
• For example, the difference between 60
and 50 degrees is a measurable 10
degrees, as is the difference between
80 and 70 degrees.
• Clock or calendar time is another good
example of an interval scale in which the
increments are known, consistent, and
measurable.
33
Interval
• Interval scales are nice because the
realm of statistical analysis on these
data sets opens up.
• For example, central tendency can be
measured by mode, median, or mean;
standard deviation can also be
calculated.
34
Interval
• Like the others, you can remember the
key points of an “interval scale” pretty
easily. “Interval” itself means “space in
between,” which is the important thing to
remember–interval scales not only tell us
about order, but also about the value
between each item.
• Here’s the problem with interval scales:
35
Interval
• They don’t have a “true zero.” For
example, 0 degrees Celsius does not mean
“no temperature.” Without a true zero, it is
impossible to compute meaningful ratios.
• With interval data, we can add and
subtract, but cannot meaningfully multiply
or divide. Consider this: 10 degrees + 10 degrees
= 20 degrees, but 20 degrees is not
twice as hot as 10 degrees.
36
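The point about ratios can be checked numerically. The sketch below is an illustration added here, not part of the original slides. It converts the two Celsius readings to kelvins, a ratio scale with a true zero (273.15 is the standard conversion offset), and compares the ratios on each scale.

```python
# Illustrative sketch: ratios are not meaningful on the Celsius (interval) scale.
KELVIN_OFFSET = 273.15  # standard Celsius-to-kelvin offset


def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvins, a scale with a true zero."""
    return celsius + KELVIN_OFFSET


cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale and survive the conversion.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful on the ratio scale (kelvins).
naive_ratio = warm_c / cold_c
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(f"Celsius ratio: {naive_ratio:.2f}, kelvin ratio: {true_ratio:.3f}")
```

The Celsius ratio of 2 is an artefact of where the zero was placed; on the kelvin scale the warmer reading is only a few percent higher.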
Ratio
• Ratio scales are the best when it
comes to measurement scales
because they tell us about the order,
they tell us the exact value between
units, AND they also have an absolute
zero–which allows for a wide range of
both descriptive and inferential
statistics to be applied.
37
Ratio
• Everything about interval data applies
to ratio scales, with the addition that
ratio scales have a clear definition of
zero.
• Good examples of ratio variables include
height and weight.
• Ratio scales provide a wealth of
possibilities when it comes to statistical
analysis.
38
Ratio
• These variables can be meaningfully
added, subtracted, multiplied, divided
(ratios).
• Central tendency can be measured by
mode, median, or mean; measures of
dispersion, such as standard deviation
and coefficient of variation can also be
calculated from ratio scales.
39
Population and sample
• A population is the collection or set
of all of the values that a variable
may have.
• A sample is a portion of a
population that a researcher
intends to study.
40
Statistic and parameter
• A statistic is a descriptive measure
computed from the data of a sample,
for example the sample mean and sample
standard deviation.
• A parameter is a descriptive measure
computed from the data of a population,
for example the population mean and
population standard deviation.
41
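The distinction between a parameter and a statistic can be illustrated with a small simulation. This is a sketch only: the population below is synthetic (10,000 simulated systolic blood pressures with an assumed mean of 120 mmHg and standard deviation of 15 mmHg), and the sample size of 50 is likewise an arbitrary choice made for illustration.

```python
import random
import statistics

random.seed(615)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated systolic blood pressures (mmHg).
population = [random.gauss(120, 15) for _ in range(10_000)]

# Parameters are computed from every value in the population.
population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)

# Statistics are computed from a sample drawn from that population.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

print(f"parameter: mean={population_mean:.1f}, sd={population_sd:.1f}")
print(f"statistic: mean={sample_mean:.1f}, sd={sample_sd:.1f}")
```

The sample values will usually be close to, but not equal to, the population values; that gap is what statistical inference has to account for.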
Statistical inference
• Statistical inference is the procedure
used to reach a conclusion about a
population based on the information
derived from a sample that has been
drawn from that population.

42
Descriptive Statistics
• Descriptive statistics is the branch of statistics
that describes the data you have. It includes:-
i. Frequencies and frequency distribution
ii. Measures of central tendency
iii. Measures of dispersion
iv. Measures of distribution of the data

43
Frequencies
• Summarizing categorical variables is
straightforward, the main task being to count
the number of observations in each category.
These counts are called frequencies.
• They are often also presented as relative
frequencies; that is as proportions or
percentages of the total number of
individuals
44
Frequencies
• For example, Table 1.1 summarizes the method
of delivery recorded for 600 births in a hospital.
• The variable of interest here is method of
delivery, a categorical variable with three levels
(Normal, Forceps, and Caesarean section).
• Frequencies and relative frequencies are
commonly illustrated by a bar chart (also known
as a bar diagram) or by a pie chart.

45
Table 1.1 Method of delivery of 600 babies born in a hospital.

Method of delivery   No. of births   Percentage
Normal                         478         79.7
Forceps                         65         10.8
Caesarean                       57          9.5
Total                          600        100.0
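As a minimal sketch of how the percentage column is obtained, the snippet below divides each count from the method-of-delivery table by the total and expresses the result as a percentage. The variable names are illustrative only.

```python
# Counts taken from the method-of-delivery table above.
births = {"Normal": 478, "Forceps": 65, "Caesarean": 57}

total = sum(births.values())

# Relative frequency = category count / total, expressed as a percentage.
percentages = {
    method: round(100 * count / total, 1) for method, count in births.items()
}

for method, count in births.items():
    print(f"{method:<10} {count:>4} {percentages[method]:>6.1f}")
print(f"{'Total':<10} {total:>4} {100.0:>6.1f}")
```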

46
Frequency distributions
• A frequency distribution is a table showing
the number of observations at different
values or within certain ranges.
• For a discrete variable the frequencies
may be tabulated either for each value of
the variable or for groups of values.
• With continuous variables, groups have to
be formed.
47
Guidelines for preparing frequency distribution table

• Choose between 5 and 20 classes
• Choose classes that will accommodate all the
data
• Choose classes that are mutually exclusive
• If possible make all classes of equal length
• Consider the data in Table 1.2 for hemoglobin,
which has been measured in g/100 ml. Construct
a frequency distribution table
48
Table 1.2 Hemoglobin levels in g/100 ml for
70 women.
10.2 13.7 10.4 14.9 11.5 12.0 11.0
13.3 12.9 12.1 9.4 13.2 10.8 11.7
10.6 10.5 13.7 11.8 14.1 10.3 13.6
12.1 12.9 11.4 12.9 10.6 11.4 11.9
9.3 13.5 14.6 11.2 11.7 10.9 10.4
12.0 12.9 11.1 8.8 10.2 11.6 12.5
13.4 12.1 10.9 11.3 14.7 10.8 13.3
11.9 11.4 12.5 13.0 11.6 13.1 9.7
15.1 10.7 12.9 13.4 12.3 12.9 11.0
11.1 13.5 10.9 13.1 11.8 12.2 13.5
49
Table 1.3. Frequency distribution table

Hemoglobin   No. of Women   Percentage
8.0-8.9                 1          1.4
9.0-9.9                 3          4.3
10.0-10.9              14         20.0
11.0-11.9              19         27.1
12.0-12.9              14         20.0
13.0-13.9              13         18.6
14.0-14.9               5          7.1
15.0-15.9               1          1.4
Total                  70        100
50
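The tallying step behind such a table can be sketched in a few lines. The helper `frequency_table` is a hypothetical name introduced here, and to keep the example short it is applied only to the first row of the hemoglobin data (seven values), so its counts are not those of the full table.

```python
def frequency_table(values, start, width, n_classes):
    """Count how many values fall in each of n_classes equal-width classes.

    Class k covers start + k*width up to (but not including)
    start + (k+1)*width.
    """
    counts = [0] * n_classes
    for value in values:
        k = int((value - start) // width)
        if not 0 <= k < n_classes:
            raise ValueError(f"{value} falls outside the chosen classes")
        counts[k] += 1
    return counts


# First row of the hemoglobin data (g/100 ml); the other rows are omitted.
first_row = [10.2, 13.7, 10.4, 14.9, 11.5, 12.0, 11.0]

counts = frequency_table(first_row, start=8.0, width=1.0, n_classes=8)
total = len(first_row)
for k, count in enumerate(counts):
    lower = 8.0 + k
    print(f"{lower:.1f}-{lower + 0.9:.1f}  {count:>2}  {100 * count / total:5.1f}")
```

Raising an error for out-of-range values enforces the guideline that the classes must accommodate all the data.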
• NB: Frequency distributions are usually
illustrated by histograms. Either the
frequencies or the percentages may be
used; the shape of the histogram will be the
same.
Question: Construct a histogram from the
frequency distribution above (Table 1.3)

51
Terminologies Associated with a Frequency Distribution

1. Class Interval
• A group of values such as 10.0-10.9 in Table 1.3
is called a class interval. The smaller value, 10.0 in
this case, is called the Lower Class Limit and the larger
one, 10.9 in this case, is called the Upper Class Limit.
• Where either the lower class limit of the
first class interval or the upper class limit of the
last class interval is not indicated, we say we
have an Open Class Interval.
52
2. Class Boundaries
• These are the dividing lines between
successive class intervals.
• A boundary is obtained by adding the
upper class limit of one class interval to
the lower class limit of the succeeding
class interval and dividing by 2.
• For example, the class boundaries of class
interval 10.0-10.9 are 9.95 and 10.95.
53
3. Class Mark
• This is the mid-point of the class interval.
• It is obtained by adding the lower class limit
and the upper class limit and dividing by 2.
• For example, the class mark for class
interval 10.0-10.9 in our case is 10.45.
• We usually denote the class mark by X.

54
4. Class Interval Size (or Length or
Width)
• This is the difference between the upper
class boundary and the lower class
boundary of a given class interval.
• In our example, the class interval size is 1.
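The four definitions above can be checked for the class interval 10.0-10.9 with a short sketch. The function name `describe_class` and the `gap` parameter are illustrative assumptions: `gap` is the distance between one upper class limit and the next lower class limit (0.1 here), and the sketch assumes that gap is the same on both sides of the class.

```python
def describe_class(lower_limit, upper_limit, gap):
    """Return boundaries, class mark and size for one class interval.

    gap is the distance between this class's upper limit and the next
    class's lower limit (0.1 when limits are recorded to one decimal place).
    """
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    class_mark = (lower_limit + upper_limit) / 2
    size = upper_boundary - lower_boundary
    return lower_boundary, upper_boundary, class_mark, size


lb, ub, mark, size = describe_class(10.0, 10.9, gap=0.1)
print(f"boundaries {lb:.2f}-{ub:.2f}, class mark {mark:.2f}, size {size:.1f}")
```

Subtracting half the gap from the lower limit gives the same value as averaging it with the preceding upper limit, which is the rule stated for class boundaries.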

55
Measures of Central Tendency
• Measures of central tendency are statistical
measures which describe the position of a
distribution.
• In the univariate context, the mean, median
and mode are the most commonly used
measures of central tendency.
• Measures of central tendency can be computed
for ungrouped and grouped data
56
Ungrouped Data
1. Arithmetic Mean
• The arithmetic mean is a mathematical average and
is the most popular measure of central tendency.
• It is frequently referred to as the ‘mean’ or ‘average’. It is
obtained by dividing the sum of the values of all
observations in a series by the number of items
constituting the series.
• The arithmetic mean for ungrouped data is computed
as:-
57

$$\bar{X} = \frac{\sum X}{n}$$

• where X is the value of an item
• and n is the number of items.

58
2. Median
• The median is the value that divides the given
distribution into two equal parts.
• In order to calculate the median we have to
arrange the observations in ascending or
descending order
• When we have an odd number of observations n,
the median is the value in position $(n+1)/2$.

59
• If there is an even number of observations,
there are two middle items. In this case, the
average of the two ‘middle’ ones is taken
as the median.
NB: The most important thing when
computing the median is to arrange your
observations in either ascending order or
descending order.

60
3. Mode
• The mode is that value (or those values) that
occurs most often in the distribution. That is
the value with the highest frequency.
• Theoretically, there can be no mode or there
can be more than one mode.
• A distribution with one mode is said to be
unimodal.
• A distribution with more than one mode is
called multimodal.
61
Example 1.1: Compute mean, median and mode
for the following plasma volumes
(litres) of eight healthy adult males:
2.75, 2.86, 3.37, 2.76, 2.62, 3.49,
3.05, 3.12
Example 1.2: Calculate the mean, median and
mode of the following BMI measured for 7 adult
females:
18.6, 20.5, 25.8, 18.2, 22.9, 20.5, 24.0
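One way to check hand calculations for Examples 1.1 and 1.2 is with Python's standard `statistics` module. This is a sketch, not the method prescribed in the course: the helper `summarize` is a name introduced here, `statistics.multimode` requires Python 3.8 or later, and treating "every value occurs equally often" as "no mode" is a convention chosen for this example.

```python
import statistics

plasma_volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # Example 1.1 (litres)
bmi = [18.6, 20.5, 25.8, 18.2, 22.9, 20.5, 24.0]  # Example 1.2


def summarize(values):
    """Return (mean, median, modes) for ungrouped data."""
    modes = statistics.multimode(values)
    # If several distinct values all occur equally often, report no mode.
    if len(modes) > 1 and len(modes) == len(set(values)):
        modes = []
    return statistics.mean(values), statistics.median(values), modes


for label, data in [("Example 1.1", plasma_volumes), ("Example 1.2", bmi)]:
    mean, median, modes = summarize(data)
    print(f"{label}: mean={mean:.4f}, median={median:.3f}, modes={modes}")
```

`statistics.median` sorts the data internally and averages the two middle values when the number of observations is even, which matches the rule given for the median.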

62
Grouped Data
1. Arithmetic Mean
For grouped data, the arithmetic mean is
defined as:-

$$\bar{X} = \frac{\sum fX}{\sum f}$$

where X refers to the class mark and f refers to
the corresponding frequency.
63
2. Median
For grouped data, the median is given as:-

$$\tilde{X} = L + \left(\frac{\frac{N}{2} - F}{f}\right) C$$

Where:
$\tilde{X}$ is the median.
64
$L$ is the lower class boundary of the median
class interval, that is the class interval
containing the median value.
$N$ is the total of the frequencies.
$F$ is the cumulative frequency of all
the class intervals below the
median class interval.
$f$ is the frequency of the median class
interval.
$C$ is the class interval size of the median
class interval.
65
NB: The median class interval is that interval
at which the cumulative frequencies
exceed N/2 for the first time.
3. Mode
The mode is computed using the
following formula:

$$\hat{X} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) C$$

66
Where:
$\hat{X}$ is the modal value.
$L$ is the lower class boundary of the modal
class, that is the class interval containing
the mode.
$\Delta_1$ is the difference between the frequency of
the modal class interval and the frequency
of the class interval immediately below (i.e.
preceding) the modal class interval.
67
$\Delta_2$ is the difference between the
frequency of the modal class interval
and the frequency of the class interval
immediately above (i.e. following) the modal
class interval.
$C$ is the class interval size of the
modal class interval.
NB: The modal class interval is that interval
which has the highest frequency.
68
• Example 1.3: The following data
represent the age distribution of a sample
of 100 people covered by health insurance
(private or government)

Age      Number
25-34        23
35-44        29
45-54        28
55-64        20

Compute Mean, Median and Mode.
69
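The grouped-data calculations for Example 1.3 can be sketched as follows. The code follows the verbal definitions given for the grouped mean, median and mode (class marks, lower class boundary, cumulative frequency, class interval size); the variable names and the helper functions are assumptions introduced for illustration, and `GAP` is the distance of 1 between one upper class limit and the next lower class limit.

```python
# Example 1.3: (lower class limit, upper class limit, frequency)
classes = [(25, 34, 23), (35, 44, 29), (45, 54, 28), (55, 64, 20)]
GAP = 1  # distance between one upper limit and the next lower limit

freqs = [f for _, _, f in classes]
n_total = sum(freqs)

# Mean: sum of (class mark x frequency) divided by the total frequency.
marks = [(lo + hi) / 2 for lo, hi, _ in classes]
mean = sum(x * f for x, f in zip(marks, freqs)) / n_total


def lower_boundary(i):
    """Lower class boundary of class i."""
    return classes[i][0] - GAP / 2


def class_size(i):
    """Upper boundary minus lower boundary of class i."""
    lo, hi, _ = classes[i]
    return (hi + GAP / 2) - (lo - GAP / 2)


# Median: the first class whose cumulative frequency exceeds N/2.
cumulative = 0
for i, f in enumerate(freqs):
    if cumulative + f > n_total / 2:
        median = lower_boundary(i) + (n_total / 2 - cumulative) / f * class_size(i)
        break
    cumulative += f

# Mode: interpolate inside the class with the highest frequency.
m = freqs.index(max(freqs))
before = freqs[m - 1] if m > 0 else 0
after = freqs[m + 1] if m < len(freqs) - 1 else 0
d1, d2 = freqs[m] - before, freqs[m] - after
mode = lower_boundary(m) + d1 / (d1 + d2) * class_size(m)

print(f"mean={mean:.2f}, median={median:.2f}, mode={mode:.2f}")
```

Both the median class and the modal class turn out to be 35-44 here, so the two interpolations start from the same lower boundary of 34.5.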


Example 1.4: Consider the frequency
distribution of serum cholesterol in
86 stroke patients:-

Interval   Frequency
3.0-3.9            3
4.0-4.9           14
5.0-5.9           21
6.0-6.9           20
7.0-7.9           21
8.0-8.9            5
9.0-9.9            2

Compute Mean, Median and Mode
70


QUANTILES OR FRACTILES
• We now consider dividing a frequency
distribution into a number of specified
fractions or quantities.
• The general terminology is that of
quantiles or Fractiles.
• Let’s focus on the Median, Quartiles,
Deciles and Percentiles.

71
• The Median divides the distribution into two
equal parts.
• The Quartiles divide the distribution into
four equal parts. There are three quartiles
(1st , 2nd , and 3rd).
• The Deciles divide the distribution into ten
equal parts.
• The Percentiles divide the distribution into
one hundred equal parts.
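For ungrouped data, quartiles, deciles and percentiles can be obtained with `statistics.quantiles` from Python's standard library (Python 3.8 or later). This is a sketch using the BMI values from Example 1.2. Textbooks and software packages use several different interpolation conventions, so the function's default "exclusive" method may give slightly different answers from a hand calculation that follows another rule.

```python
import statistics

bmi = [18.6, 20.5, 25.8, 18.2, 22.9, 20.5, 24.0]  # Example 1.2

# Splitting into n equal parts needs n - 1 cut points.
quartiles = statistics.quantiles(bmi, n=4)      # 3 cut points
deciles = statistics.quantiles(bmi, n=10)       # 9 cut points
percentiles = statistics.quantiles(bmi, n=100)  # 99 cut points

q1, q2, q3 = quartiles
print(f"Q1={q1:.2f}, Q2={q2:.2f}, Q3={q3:.2f}")
print(f"5th decile={deciles[4]:.2f}, 9th decile={deciles[8]:.2f}")
print(f"25th percentile={percentiles[24]:.2f}, 75th percentile={percentiles[74]:.2f}")
```

The 2nd quartile, the 5th decile and the 50th percentile all coincide with the median, which is a useful sanity check on any quantile calculation.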
72
MEASURES OF DISPERSION
• There are several measures of
dispersion/variability. But we will consider
only three common measures, which are:-
1. Range
2. Variance
3. Standard Deviation

73
1. Range
• This is the simplest measure of dispersion
or variation.
• It is defined as the difference between the
highest value and the lowest value.
• For a frequency distribution, the range is
given by the difference between the upper
class limit of the highest class interval and
the lower class limit of the lowest class
interval.
74
2. Variance ($S^2$)
• The variance is the average squared distance of
individual observations from the mean.
• High variance means that most scores are
far away from the mean. Low variance
indicates that most scores cluster tightly
about the mean.
• Variance for ungrouped data is given by:-

$$S^2 = \frac{\sum (X - \bar{X})^2}{n}$$
75
However, the formula given above
gives a biased estimate of the
population variance. An unbiased
estimate of the population variance
is given by:-

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
76
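Variance for grouped data is given by:

$$S^2 = \frac{\sum f (X - \bar{X})^2}{\sum f - 1}$$

where X is the class mark and f is the corresponding frequency.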
3. Standard Deviation (S)

• This is the statistical measure which shows
how individual scores vary from the mean.
• Standard deviation is the square root of the
variance; thus, it is expressed in the original
units of measurement.
77
Coefficient of Variation
• This is the ratio of the standard deviation to the
arithmetic mean of the distribution.
• It is defined as:

$$C.V = \frac{S}{\bar{X}} \times 100\%$$

• NB: If you are comparing distributions, the
distribution with the smallest C.V is said to be
less variable than the other distributions.
78
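The measures of dispersion can be computed step by step for the plasma volumes of Example 1.1. This is a sketch under stated assumptions: it shows both the divide-by-n and the divide-by-(n - 1) variance, takes the standard deviation as the square root of the (n - 1) version, and expresses the coefficient of variation as a percentage. Variable names are illustrative.

```python
import math

plasma_volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # Example 1.1 (litres)

n = len(plasma_volumes)
mean = sum(plasma_volumes) / n

# Range: highest value minus lowest value.
value_range = max(plasma_volumes) - min(plasma_volumes)

# Sum of squared deviations from the mean, shared by both variance formulas.
sum_sq_dev = sum((x - mean) ** 2 for x in plasma_volumes)

variance_biased = sum_sq_dev / n          # divides by n
variance_unbiased = sum_sq_dev / (n - 1)  # divides by n - 1

# Standard deviation is in the original units (litres).
std_dev = math.sqrt(variance_unbiased)

# Coefficient of variation: standard deviation relative to the mean, in percent.
cv_percent = 100 * std_dev / mean

print(f"range={value_range:.2f}, S^2 (n)={variance_biased:.4f}, "
      f"S^2 (n-1)={variance_unbiased:.4f}, S={std_dev:.4f}, CV={cv_percent:.1f}%")
```

Because the coefficient of variation is unit-free, it can be used to compare the variability of measurements recorded in different units, such as plasma volume and BMI.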
Exercise
Using the data in Examples 1.1-1.4,
compute:
i. the range, variance and standard
deviation;
ii. the 1st quartile, 3rd quartile, 5th
decile, 9th decile, 25th percentile,
and 75th percentile.
79
END

80
