MLS 314 Module 1

MLS 314

BIOSTATISTICS I

By
MEDUGU, JESSY THOMAS

DEPT. OF MEDICAL LABORATORY SCIENCE,


UNIVERSITY OF MAIDUGURI.
Course outline
 What is Biostatistics?
The role of Statistics in Medical and Health
Sciences.
Definitions and Terminologies in Biostatistics
Data Collection Methods
Descriptive and Inferential Statistics
Measures of Central Tendency
Measures of Dispersion
What is Statistics?

Statistics is the scientific study of methods of collection, collation, analysis, presentation and interpretation of data.

Biostatistics is the application of statistical methods and procedures in the study and understanding of medicine, health and biological sciences.

The terms Biostatistics and Medical Statistics are often used interchangeably.
Statistics in Medical Sciences & Public Health

• Medicine and medical sciences are becoming increasingly quantitative rather than qualitative.

• The planning, conduct and interpretation of medical and public health research are dependent on statistical methods.

• Statistics influences public health decisions and actions.

• Statistics pervades the medical and health literature.


Statistics in Medical Sciences & Public Health cont’d…

 Determining the magnitude of disease burden in a population.


 Identifying risk factors and population at risk.
 Assessing the impact of public health interventions.
 Modeling and hypothesizing to enable prediction of future health-
related events.
 Biostatistics concepts are adopted as basis to define treatment
protocols.
 Biostatistics concepts are adopted in trials of new drugs,
pharmaceuticals and vaccines.
Definitions and Terminologies in Biostatistics

Variable

• A variable is any entity, item or characteristic that is able or liable to change or vary.

• Depending on its characteristics, a variable can be classified as:
  • Quantitative variable
  • Qualitative variable

• A quantitative (numerical) variable is either continuous or discrete.

• A qualitative (categorical) variable is either nominal or ordinal.

• Continuous quantitative variables can take any numerical value within a range, e.g. height, weight, glucose level and account balance.

• Discrete quantitative variables take integer values, typically counts, e.g. family size, coliform counts, number of rooms and number of wives.

• Nominal qualitative variables are categories that cannot be described numerically, e.g. colour of skin (black, brown, yellow), height recorded as short or tall, and gender (male or female).

• Ordinal qualitative variables have categories that are mutually exclusive, ordered and can be ranked, e.g. social class (low, medium and high), disease severity (mild, moderate and severe) and level of education.
Population
 In research or biostatistics, population is used in a wider sense than usual.
 It refers to a group of people, animals or objects of research or statistical
interest, e.g. diabetics, hypertensive individuals, etc.
Sample
• This is a subset of a population that is systematically chosen to serve as a true representative of the population.
• Universal sample (sampling): this occurs when the population is small enough to be exhausted in a research study.
Why do we sample?
• Limited time
• To avoid excessive expense or resource requirements
• To avoid cumbersomeness
• Data: a collection of related variables or observations which, when analyzed, gives useful information for decision making.

• Data point: a single observation.

• Based on scale of measurement, data are either quantitative or qualitative.

• Based on source, data are either primary or secondary.
Validity of data:
• Accuracy and reliability of a test; the extent to which the test measures what it is supposed to measure.
• Researchers depend on various types of validity to verify the effectiveness of the measurement procedures used.
• Types of validity: external, construct, criterion, content and face validity.
Threats to validity of data
• Confounders
• Selection bias
• Observer variation
Data Collection Methods

• Data collection techniques must be employed to enable us to systematically collect information on our study units (people, objects, events or phenomena).

• If data are collected haphazardly, it becomes difficult to answer our research questions.

• Data collection is the first step; it must be taken seriously and conducted accurately before collation of data for onward analysis, presentation and interpretation.
Data Collection Methods Cont’d…

 The following are data collection methods:

1. Observation/measurements

2. Interviews (SIS or SSIS)

3. Self-administered questionnaires

4. Focus group discussions

5. Key Informant Interview (In-depth interviews)

6. Use of documentary sources


Observation
 This is a technique that involves systematically selecting,
watching and recording of events, objects or phenomena (e.g.
blood pressure, weight, use of protective devices, student’s
behaviour).
 Participant observation: the observer takes part in the situation
he/she observes.
 Non-participant observation (concealed): the observer watches
the situation, does not participate and remains concealed.
 Non-participant observation (open, non-concealed): the
observer watches the situation openly (does not hide his/her
presence) but does not participate.
Measurements

• If observations are made using a defined scale or equipment, the technique is referred to as measurement, e.g. using a weighing scale for weight or an improved Neubauer counting chamber for white cell counts.
Advantages of Observation Techniques
• Gives more detailed and context-related information.
• Permits collection of information on facts not mentioned in the questionnaire.
• Permits testing the reliability of responses to questionnaires.
• Measurement allows actual quantification of targets.
Measurements cont’d…

Disadvantages of Observation Techniques


 Ethical issues concerning confidentiality.
 Observer bias may occur
 The presence of the data collector can influence the situation
observed.
 Thorough training of research assistants is required.
Interviews

• An interview is a data collection technique that involves oral questioning of respondents (using SIS or SSIS).

• SIS (Structured Interview Schedule): involves the use of a fixed list of questions asked in a standard sequence, with fixed or pre-categorized responses.

• SSIS (Semi-structured Interview Schedule): allows flexibility in the ordering of the questions. An interviewer may ask additional questions to gain more useful information.
Self-administered Questionnaires
 This is a technique in which written questions are presented and
are to be answered by the respondents in writing.
 The types of questionnaires used are:
 Unstructured
 Semi-structured
 Structured
 Written questionnaires can be administered in different ways,
including:
 Through the mail
 Gathering respondents in group (s)
 Hand delivering to respondents and collecting them later.
Focus Group Discussion (FGD)

• In an FGD, a discussion among 6-10 persons with similar characteristics is organized.

• The discussion is usually guided by a facilitator, and members talk freely and spontaneously about a certain topic.

• The aim of the discussion is to capture the perceptions, attitudes and ideas of the participants.

• The discussions are either taped or captured by note-takers.


Key Informant Interview (In-depth Interview)

• KII allows people who have a monopoly of knowledge or information to tell stories and provide insight about an issue.

• A key informant could be a knowledgeable community leader, a health expert, or an experienced driver or cleaner.

• The interview may include one or two informative members of a target group.

• In this technique, the data collector must be diplomatic, blend in with the local culture and use euphemisms where appropriate.
Use of Documentary Sources

 This method involves retrieving data already collected (but not


analyzed) from existing sources.

 Examples of such sources include: medical records, archives,


records from college of medical sciences and state LGA records
(e.g. data on disease surveillance).

 A documentary source can be programme-specific data e.g.


immunization coverage, HIV/AIDS clients receiving services from
SIDHAS project.
Use of Documentary Sources cont’d…

Advantages of using documentary sources

 Inexpensive (data had already been collected)

 Permits examination of past trends

Disadvantages of using documentary sources

 Issue of access

 Confidentiality

 Biased information

 Missing information
Descriptive and Inferential Statistics

• Descriptive Statistics simply describes the attributes of a data set:
  • Frequency of occurrence of certain values
  • Typical or representative values
  • Degree of spread or scatter, etc.
• Inferential Statistics refers to making generalizations about a population based on the attributes of a sample, using the knowledge of probability.
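
As a small illustration of the descriptive side, the sketch below tabulates the frequency of occurrence of each value in a data set. The `severity` records are hypothetical and chosen only for illustration; they are not taken from this module.

```python
from collections import Counter

# Hypothetical disease-severity records for 10 patients (illustrative only).
severity = ["mild", "moderate", "mild", "severe", "mild",
            "moderate", "mild", "severe", "moderate", "mild"]

frequency = Counter(severity)   # how often each category occurs
n = len(severity)

# Print each category with its count and its share of all records.
for category, count in frequency.most_common():
    print(f"{category:<9} {count:>2}  {100 * count / n:.0f}%")
```

A table like this describes the sample only; generalizing from it to all patients would be a matter for inferential statistics.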
Measures of Central Tendency
Measures of Central Tendency (Location) cont’d…

Measures of CT

 Arithmetic mean

 Median

 Mode
Arithmetic mean (AM)

 AM is customarily just called Mean.

 It is the simple average of a given set of values (numbers).

 It is the sum of given values divided by the total number of the


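values.

• The mean is calculated by the formula:

\[ \bar{x} = \frac{\sum x}{n} \]

where ∑x is the sum of the given values and n is the total number of values.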


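
The definition above (sum of the values divided by the number of values) can be sketched in a few lines of Python. The data are illustrative and are not taken from the module's own worked examples, which did not survive in this copy.

```python
# Illustrative data only.
values = [5, 7, 8, 9, 2]

n = len(values)                      # total number of values
arithmetic_mean = sum(values) / n    # sum of the values divided by n
print(arithmetic_mean)

# Adding one extreme value pulls the mean strongly towards it.
with_extreme = values + [100]
mean_with_extreme = sum(with_extreme) / len(with_extreme)
print(mean_with_extreme)
```

The second calculation shows why the mean is a poor measure of location when a data set contains extreme values.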

Advantages of AM
 It is based on all the values in the series
 It is easy to understand and simple to calculate
 It is not influenced by the position of values in the series.
 The mean is used in a number of inferential statistics.
Limitations of AM
 The mean is easily influenced by extreme values.
 It is not a good measure of location when the data are skewed.
Assignment 1

 Write a short note on any two other types of mean other than the
arithmetic mean.
• Note: this should not be more than a page.
Median
 This is the item that divides a set of data into two equal halves.

 If a given series of measurements or observations is arranged


in increasing order, the median is the middle one.

• If the number of observations n is odd, the median is the value in position (n + 1)/2 of the ordered data.

• If n is even, the median is the mean of the two middle values, in positions n/2 and (n/2) + 1 of the ordered data.


Median cont’d…
 For grouped, discrete data, data with class interval, the
formula is:

\[ \text{Median} = L_1 + \left( \frac{\frac{N}{2} - (\sum f)_1}{f_{median}} \right) \times c \]

where L1 = lower class boundary of the median class,
N = number of given data (total frequency),
(∑f)1 = sum of the frequencies of all classes lower than the median class,
fmedian = frequency of the median class,
c = size of the median class.
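
A minimal sketch of the grouped-data median, assuming the standard interpolation formula Median = L1 + ((N/2 − (∑f)1) / fmedian) × c and classes of equal width. The function name `grouped_median` and the frequency table are hypothetical and chosen only for illustration.

```python
def grouped_median(lower_boundaries, frequencies, class_width):
    """Median = L1 + ((N/2 - (sum f)_1) / f_median) * c for grouped data."""
    total = sum(frequencies)                 # N
    if total == 0:
        raise ValueError("at least one observation is required")
    half = total / 2
    below = 0                                # (sum f)_1, frequencies below the current class
    for lower, freq in zip(lower_boundaries, frequencies):
        # The median class is the first class whose cumulative frequency reaches N/2.
        if freq > 0 and below + freq >= half:
            return lower + (half - below) / freq * class_width
        below += freq
    raise ValueError("no median class found; check that the inputs have equal length")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with their frequencies.
boundaries = [0, 10, 20, 30]
freqs = [5, 8, 12, 5]
result = grouped_median(boundaries, freqs, class_width=10)
print(round(result, 2))
```

Here N = 30, so N/2 = 15 falls in the class 20-30, giving L1 = 20, (∑f)1 = 13, fmedian = 12 and c = 10.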
Advantages of Median
 Median is not easily affected by extreme values.
 It can be obtained graphically.
 It is a measure of rank or position.
 It gives clear idea of the distribution of the data.
Disadvantages of Median
 It may not be representative if there are few data.
 Beyond descriptive statistics, median is rarely used in inferential
statistics.
 It may require rearrangement of data involved. This may be
cumbersome if large sample is involved.
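
The sketch below finds the median of ungrouped data by sorting and picking the middle position, taking the mean of the two middle values when the number of observations is even (the usual convention). It also illustrates the first advantage listed above: replacing the largest value with an extreme one leaves the median unchanged. The helper name `median_of` and the data are illustrative.

```python
def median_of(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)            # the data must be arranged in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is required")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]             # position (n + 1)/2 when counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median_of([3, 4, 7, 9, 15, 4, 5])        # 7 observations
even_case = median_of([5, 7, 8, 9, 2, 4])           # 6 observations
extreme_case = median_of([3, 4, 7, 9, 150, 4, 5])   # 15 replaced by 150
print(odd_case, even_case, extreme_case)
```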
Mode

 The mode is that value in a data set which occurs most frequently.

 It is identified by counting the number of times each value occurs in


the set and selecting that value which occurs most often.

• E.g. the mode of this set of 7 observations: 3, 4, 7, 9, 15, 4, 5 is 4.

• If two or more values share the highest frequency, the data set is bimodal or multimodal: each of those values is a mode, so the mode is not unique.
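
A short sketch of the counting procedure described above. The helper `modes` is a hypothetical name; it returns every value that shares the highest frequency, so a tie yields more than one mode rather than a single number.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency, in sorted order."""
    if not values:
        raise ValueError("at least one observation is required")
    counts = Counter(values)                 # number of times each value occurs
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


single = modes([3, 4, 7, 9, 15, 4, 5])   # the 7 observations from the example
tied = modes([1, 1, 2, 2, 3])            # two values tie for the highest frequency
print(single, tied)
```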
Mode for grouped, discrete data with class intervals
 To determine a value for mode here, an interval with highest
frequency is identified first.
 It is called modal interval or modal class.
 The formula for mode is then used;

\[ \text{Mode} = L_1 + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) \times C \]

where: L1 = lower class boundary of the modal class
∆1 = excess of the modal frequency over the frequency of the immediate lower class (∆1 = fm − fm−1)
∆2 = excess of the modal frequency over the frequency of the immediate upper class (∆2 = fm − fm+1)
C = class width or size
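
A minimal sketch of the grouped-data mode, assuming the standard formula Mode = L1 + (∆1 / (∆1 + ∆2)) × C with the symbols defined above and classes of equal width. The function name `grouped_mode` and the frequency table are hypothetical; a missing neighbouring class is treated as having frequency 0, which is an assumption of this sketch.

```python
def grouped_mode(lower_boundaries, frequencies, class_width):
    """Mode = L1 + (d1 / (d1 + d2)) * C for grouped data with equal class widths."""
    modal_freq = max(frequencies)
    if modal_freq == 0:
        raise ValueError("at least one observation is required")
    i = frequencies.index(modal_freq)        # first class with the highest frequency
    lower_freq = frequencies[i - 1] if i > 0 else 0
    upper_freq = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = modal_freq - lower_freq             # delta 1 = fm - f(m-1), always > 0 here
    d2 = modal_freq - upper_freq             # delta 2 = fm - f(m+1)
    return lower_boundaries[i] + d1 / (d1 + d2) * class_width


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with their frequencies.
boundaries = [0, 10, 20, 30]
freqs = [5, 8, 12, 5]
result = grouped_mode(boundaries, freqs, class_width=10)
print(round(result, 2))
```

The modal class is 20-30, so L1 = 20, ∆1 = 12 − 8 = 4, ∆2 = 12 − 5 = 7 and C = 10.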
Advantages of Mode
 It can be obtained graphically.
 It is not affected by extreme values.
 It is easy to understand and compute.
 The data does not need any ordering before mode can be
obtained.
Disadvantages of Mode
 It is not an ideal measure of CT.
 It is not useful in further statistical processing.
 It does not make use of all the values in the distribution.
 It may not be unique because 2 or more values may be equally
frequent.
Measures of Dispersion

 Apart from knowing the typical or representative values (CT), it


is often of interest to know the extent of variability exhibited by the various values.

• The dispersion of a set of observations refers to the variety that the values of the observations exhibit.

• If all the values are the same, there is no dispersion; if they are not the same, dispersion is present in the data.

 The study of Statistics is all about variability.


Measures of Dispersion

• Range

• Mean deviation

• Variance

• Standard deviation

• Coefficient of variation

• Standard error
Range

• Range is the simplest measure of dispersion.

• It is the difference between the largest (Xh) and smallest (XL) values, usually denoted as R = Xh − XL.
• E.g. the range of 5, 7, 8, 9, 2 is 9 − 2 = 7.
• Range has two main disadvantages:
  • It only takes into account the two extreme values. The variability of the intermediate values is ignored.
  • It tends to increase as the number of observations increases.
Mean Deviation (MD)

• MD measures the average spread of values from the AM.

• MD is the average amount by which a value in a given data set differs from the AM.
• It is otherwise known as Mean Absolute Deviation.
• For ungrouped data:

\[ MD = \frac{\sum |x - \bar{x}|}{n} \]

• For grouped data:

\[ MD = \frac{\sum f\,|x - \bar{x}|}{\sum f} \]

Features of MD
• It is straightforward to calculate.
• It is less distorted by extreme values than the variance and standard deviation, because the deviations are not squared.
• It makes use of every value in the data set.
• It cannot be used for further statistical processing.
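
The two mean deviation formulas (ungrouped and grouped) can be sketched as follows. The function names and data are illustrative; for grouped data the sketch assumes that x is the class midpoint and f the class frequency.

```python
def mean_deviation(values):
    """MD = sum(|x - mean|) / n for ungrouped data."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


def grouped_mean_deviation(midpoints, frequencies):
    """MD = sum(f * |x - mean|) / sum(f), with x taken as the class midpoint."""
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / total
    return sum(f * abs(x - mean) for x, f in zip(midpoints, frequencies)) / total


md_ungrouped = mean_deviation([5, 7, 8, 9, 2])
# Hypothetical classes 0-10, 10-20, 20-30, 30-40 represented by their midpoints.
md_grouped = grouped_mean_deviation([5, 15, 25, 35], [5, 8, 12, 5])
print(round(md_ungrouped, 2), round(md_grouped, 2))
```

For the ungrouped data the mean is 6.2 and the absolute deviations sum to 10.8, so MD = 10.8 / 5.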
Variance and Standard Deviation

• The variance and standard deviation are estimates that assess how scattered the individual measurements are around the mean.
• They are defined in terms of the deviations (x − x̄) of the observations from the mean.
• The variance can be described as the estimated average of the squared deviations.
• The standard deviation is the square root of the variance.
• The standard deviation is the most important measure of dispersion used in statistical analysis.
Variance and Standard Deviation cont’d…

• When analyzing a sample, in most cases we do not have information on the whole population.
• If we do, the formulas for the variance and standard deviation, called the population variance and population standard deviation respectively, are:

\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N}, \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]

where μ is the population mean and N is the population size.
Variance and Standard Deviation cont’d…

• When analyzing a sample without information on the whole population, the formula for the sample variance is:

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]

• The corresponding formula for the sample standard deviation is:

\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

where x̄ is the sample mean and n is the sample size.
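
The sketch below contrasts the population and sample versions of the variance and standard deviation, assuming the usual definitions: division by N for a whole population and by n − 1 for a sample. It uses Python's standard `statistics` module; the data are illustrative.

```python
import statistics

# Illustrative data only.
values = [5, 7, 8, 9, 2]

# Treating the values as a whole population: divide by N.
population_variance = statistics.pvariance(values)
population_sd = statistics.pstdev(values)

# Treating the values as a sample: divide by n - 1.
sample_variance = statistics.variance(values)
sample_sd = statistics.stdev(values)

# The same sample variance computed directly from the squared deviations.
mean = sum(values) / len(values)
squared_deviations = sum((x - mean) ** 2 for x in values)
manual_sample_variance = squared_deviations / (len(values) - 1)

print(round(population_variance, 2), round(sample_variance, 2))
print(round(population_sd, 2), round(sample_sd, 2))
```

The sample figures are slightly larger than the population figures because the same sum of squared deviations (30.8 here) is divided by 4 rather than 5.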
Coefficient of Variation (CV)

• CV is the ratio of the sample standard deviation to the sample mean, multiplied by 100.
• It expresses the SD as a percentage of the sample mean.
• It answers the question: “What percentage of the sample mean is the SD?”
• It is used to compare variability between data sets measured in different units, because it is unit-less.
• It is given by:

\[ CV = \frac{s}{\bar{x}} \times 100\% \]

where s = sample SD and x̄ = sample mean.
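
As an illustration of comparing variability across different units, the sketch below computes the CV for two hypothetical data sets, heights in centimetres and weights in kilograms. Both have the same SD, yet their CVs differ because their means differ. The function name and data are assumptions made for this example.

```python
import statistics


def coefficient_of_variation(values):
    """CV = (sample SD / sample mean) * 100, expressed as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical measurements, for illustration only.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"height CV = {cv_height:.1f}%, weight CV = {cv_weight:.1f}%")
```

In this made-up example the weights are relatively more variable than the heights, which the raw SDs alone would not show.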
Standard Error (SE)
• SE is a measure of how precisely the population mean is estimated by the sample mean.
• It is a measure of the precision of a sample in estimating a population parameter.
• The size of the SE depends both on how much variation there is in the population and on the size of the sample.
• The larger the sample size n, the smaller the SE.
• It is given by the formula:

\[ SE = \frac{s}{\sqrt{n}} \]

where s = sample SD and n = sample size.
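
A short sketch of the standard error of the mean, SE = s/√n, using illustrative data. It also shows the effect of sample size: with the same SD, a sample four times larger has half the SE.

```python
import math
import statistics

# Illustrative data only.
values = [5, 7, 8, 9, 2]

s = statistics.stdev(values)          # sample standard deviation
n = len(values)                       # sample size
standard_error = s / math.sqrt(n)
print(round(standard_error, 3))

# With the same SD, quadrupling the sample size halves the SE.
se_larger_sample = s / math.sqrt(4 * n)
print(round(se_larger_sample, 3))
```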
Assignment 2
End of Module 1
