Biostatistics
Basic Concepts & Descriptive Statistics
Lecture 1 & 2
Dr. Moataza Abdel Wahab, Dr. Tajammal Mustafa,
Email:
[email protected]; Phone: 0500758041
Department of Family and Community Medicine
Thursday, February 6, 2025
Lectures 1 & 2 Objectives
On the successful completion of this unit, students will be
able to:
▪ Define different types of variables
▪ Define methods of describing central tendency (mean and
median) and variability (variance, standard deviation,
range, inter-quartile range)
▪ Design a scientifically sound Table
▪ Identify different types of graphs and their appropriate use
▪ Identify the shape of data distribution (Normal distribution
curve and skewed data)
Thursday, February 6, 2025 Introduction to Population Health Module
For further reading, please refer to:
Bonita R, Beaglehole R, Kjellstrom T. Basic epidemiology.
Geneva: World Health Organization; 2006. Pages: 63-69
Biostatistics
➢ Biostatistics is the branch of statistics concerned with
the application of statistical methods to medical and
biological data.
➢ In the medical field, statistical methods enable us to study
the effectiveness of different treatments and medicines.
➢ Various statistical methods are frequently used in the
analysis of data in health and medical sciences.
➢ Knowledge of statistical methods is very important in
health and medical research and in clinical practice for
dealing with uncertainty in diagnosis and treatments.
Statistics
Statistics is a scientific field that deals with the
collection, presentation, analysis, and interpretation of
data.
TYPES:
Descriptive statistics is a branch of statistics
dedicated to the organization, summarization and
description of data.
Inferential statistics is the branch of statistics
concerned with using sample data to make inferences
about a population. In inferential statistics, predictions
are made and conclusions are drawn about the target
population based on a sample.
Statistical data
➢ An individual recorded value of the result of an experiment is
called an observation.
➢ A collection of such observations may be termed as data or
statistical data.
➢ Suppose we want to know the average weight of students of
first year class in a certain college. We record the weight of
each student in that class. The numerical value so recorded is
called an observation. The collected observations are
collectively called data.
Statistical data or data are classified as Primary data and
Secondary data.
➢ If the data is collected for the first time for the purpose of the
current study, it is called Primary data.
➢ Secondary data is data that has already been collected by
someone or some organization for some other purpose.
Variable: a characteristic that can take on
different values for different persons, places,
or things
Types of Variables
Qualitative Variables (Categorical Variables)
1. Nominal Variables
2. Ordinal Variables
Quantitative Variables (Numerical Variables)
1. Discrete Variables
2. Continuous Variables
Types of variables
Quantitative/Numerical variables: recorded as numbers
CONTINUOUS: accept decimals and fractions
DISCRETE: integers; do not accept decimals and fractions
Qualitative/Categorical variables: recorded as text; usually coded for computer use
NOMINAL: cannot be ordered
ORDINAL: can be ordered or ranked
Qualitative Variables (Categorical Variables)
A categorical variable is one for which the observations
(usually textual) recorded result in a set of categories.
Some characteristics cannot be measured in the sense
that height or weight are measured. In such cases,
measuring consists of categorizing, e.g., sex, race.
Nominal variables :-- have no inherent ranking or order.
e.g., Disease status – present or absent; Alive – yes or
no; Occupation – clerical, professional, managerial;
Food types – carbohydrate, fat, protein; Eye color –
blue, green, brown.
Ordinal variables :-- have an inherent ranking. e.g., summary
of clinical condition – good, fair, poor; staging of tumor
– I, II, III, IV.
If a variable can take on only two values such as absent or
present, positive or negative, alive or dead, these are
called Binary or Dichotomous variables.
Quantitative Variables (Numerical Variables)
A variable that can be measured in the usual sense. For
example, height, weight, blood pressure, age.
▪ Continuous variables :-- Can take on every
possible value between two numbers; decimals and
fractions are possible. e.g., age, systolic blood
pressure, serum cholesterol, height, weight.
▪ Discrete variables:-- Variable with gaps or
interruptions in possible values, such as counts,
e.g., number of hospital admissions per day,
number of students in nursing class.
Types of Variables
EXAMPLE 1.
▪ Suppose that we measure whether or not one
regularly takes a vitamin for a sample of 50
pregnant women attending antenatal clinic at
FAMCO. Then,
– The variable is ------------
– The population -----------
– The sample ---------------
– The Type of variable ---------------
Types of Variables
EXAMPLE 2.
▪ Suppose that we measure the hemoglobin level in
g/dl for a sample of 75 people who have a certain
disease ‘X’. Then,
– The variable is ------------
– The population -----------
– The sample ---------------
– The Type of variable ---------------
Methods of Presenting Data :
▪ Tables
▪ Graphs
– Line graph
– Histogram & Frequency Polygon
– Population Pyramid
– Bar Chart
– Grouped Bar Chart
– Stacked Bar Chart
– 100% Component Bar Chart
▪ Maps
- Spot Map
- Area Map
▪ Summary statistics
TABLES
A table is a set of data arranged in rows and columns.
Almost any quantitative information can be organized into a
table.
Some General Principles to Make Tables
Tables should be as simple as possible.
Generally, three variables are the maximum number that can be
read easily.
Some General Principles to Make Tables
▪ Tables should be self explanatory.
– Use a clear and concise title that describes the what,
where, and when of the data in the table.
– Codes, abbreviations, or symbols should be
explained in detail in a footnote.
– Each row and column should be labeled
concisely and clearly.
– The specific units of measurement for the data
should be given.
– Totals should be shown.
1- Tabular presentation
1.1 Simple frequency distribution Table
(S.F.D.T.)
Title: answers What? Where? When?
Name of variable (units of variable)    Frequency    %
- (category)
- (category)
- (category)
Total
The following are the blood groups of 25 patients: A, AB, O, A, B, O, B,
B, O, A, O, A, A, B, B, O, A, B, B, O, B, B, AB, O, O.
Table 1. Distribution of 25 patients at the surgical department of “x”
hospital in January 2015 according to their ABO blood groups
Blood group Frequency %
A 6 24
B 9 36
AB 2 8
O 8 32
Total 25 100
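The tally behind Table 1 can be reproduced with a short script. This is only a sketch: the variable names are illustrative and `collections.Counter` is one convenient way to count categories, not a method prescribed by the lecture.

```python
from collections import Counter

# Blood groups of the 25 patients, exactly as listed above.
blood_groups = [
    "A", "AB", "O", "A", "B", "O", "B", "B", "O", "A", "O", "A", "A",
    "B", "B", "O", "A", "B", "B", "O", "B", "B", "AB", "O", "O",
]

counts = Counter(blood_groups)   # frequency of each category
total = sum(counts.values())     # number of observations

# (frequency, percentage of total) for each category, in the order of Table 1.
table = {g: (counts[g], 100 * counts[g] / total) for g in ["A", "B", "AB", "O"]}

print(f"{'Blood group':<12}{'Frequency':>10}{'%':>6}")
for group, (freq, pct) in table.items():
    print(f"{group:<12}{freq:>10}{pct:>6.0f}")
print(f"{'Total':<12}{total:>10}{100:>6}")
```

The printed frequencies and percentages should match the rows of Table 1.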
Table 2. Age Distribution of Study Participants
Age (years) Frequency Percentage
15-<20 6 4.9
20-<25 10 8.1
25-<30 32 26.0
30-<40 41 33.3
40-<50 30 24.4
50+ 4 3.3
TOTAL 123 100
1.2 Complex frequency distribution Table
Table 3. Distribution of 60 patients at the chest department of “x”
hospital in May 2008 according to drug type and cure
         Lung disease
Drug     Cured          Deteriorated    Total
         No.    %       No.    %        No.    %
A        15     65.2    8      34.8     23     100
B        5      13.5    32     86.5     37     100
Total    20     33.3    40     66.7     60     100
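The percentages in Table 3 are row percentages: each count is divided by its row total. A minimal sketch of that calculation follows; the dictionary layout and the helper name `row_percentages` are assumptions made for illustration.

```python
# Counts from Table 3: rows are drugs, columns are outcomes.
counts = {
    "A": {"Cured": 15, "Deteriorated": 8},
    "B": {"Cured": 5, "Deteriorated": 32},
}

def row_percentages(row):
    """Return each cell as a percentage of its row total, rounded to 1 decimal."""
    row_total = sum(row.values())
    return {outcome: round(100 * n / row_total, 1) for outcome, n in row.items()}

percentages = {drug: row_percentages(row) for drug, row in counts.items()}

# Column totals give the "Total" row of the table.
totals = {o: sum(counts[d][o] for d in counts) for o in ["Cured", "Deteriorated"]}
percentages["Total"] = row_percentages(totals)

for label, row in percentages.items():
    print(label, row)
```

Row percentages answer the question "of the patients given this drug, what share was cured?", which is the comparison the table is designed to support.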
Two Variable Table
Three Variable Table
2- Graphical presentation
Graphs drawn using Cartesian coordinates
• Line graph
• Bar graph
• Histogram
• Frequency polygon
• Frequency curve
• Scatter plot
• Box Plot
Pie chart
2.1 Line Graph
For numeric data changing over time, from simple or complex tables
Year    MMR (per 1000)
1960    50
1970    45
1980    26
1990    15
2000    12
[Line graph: MMR/1000 (Y-axis, 0-60) plotted against year (X-axis, 1960-2000)]
Figure (1): Maternal mortality rate of (country), 1960-2000
Line graphs are often used to plot changes in data over time, such as
temperature changes of a patient (temperature chart). You can use
these changes to find trends in your data and possibly to predict future
results.
2.2 Bar Chart
➢Simple Bar Charts
In these charts, bars of uniform width are used.
The length of each bar is proportional to the
magnitude of the value it represents. A bar chart is
obtained by plotting the categories (at some constant
width) along the X-axis and raising bars of length
equal to the corresponding numbers along the Y-axis.
Usually a fixed gap is left between bars. Bar
charts are mainly used for graphical representation of
categorical data.
Simple Bar Chart
Blood group    %
A              40
B              10
O              30
AB             20
[Bar chart: % (Y-axis, 0-100) by blood group (X-axis)]
Figure (1): ABO blood group for Healthcare workers ("X" hospital 2015)
➢Grouped/Multiple Bar Charts
In these charts, grouped bars are used to represent
related sets of data, for example males and females.
The advantage of multiple bar charts is that comparison
can be made easily.
➢Subdivided/stacked Bar Charts
In these charts, each bar is subdivided into different
sections having different shadings.
➢100% Component Bar Charts
This is a variant of the stacked bar chart in which
all the bars have the same height (or length) and the
components are shown as percentages of the total rather
than as actual values.
This is useful for comparing the contribution of different
components to each of the categories of the main
variable.
Grouped Bar Chart
Blood group    Males (%)    Females (%)
A              40           20
B              10           30
O              30           5
AB             20           45
[Grouped bar chart: % (Y-axis, 0-50) by blood group (X-axis), with separate bars for males and females]
Figure (2): ABO blood group by gender of Healthcare workers ("X" hospital 2015)
100% Component Bar Chart
[100% component bar chart: one bar each for males and females, each subdivided by blood group (A, B, O, AB); Y-axis: %, 0-100]
Figure (2): ABO blood group by gender of Healthcare workers ("X" hospital 2015)
Subdivided Bar or Stacked Bar Chart
2.3 Pie chart
▪ A pie chart is a circle divided into different
sectors, with angles at the center proportional to
different components of a total. A pie chart can be
used to compare the total and its components.
Continent Number of cases
America 252977
Africa 129066
Europe 60195
Australia 3189
Asia 1254
Total 446681
Graphical Presentation of Quantitative
Data
Commonly used graphs are histogram, frequency polygon and
frequency curve.
➢Histogram is a graphical display of a frequency
distribution and is obtained by plotting the class intervals
along the X-axis and frequencies along the Y-axis.
➢Frequency polygon is a graph obtained by joining the
mid points of the tops of the bars of the histogram
with straight lines.
➢Frequency curve is a smoothed curve, which does not
necessarily pass through the mid points like frequency
polygon. This curve is very important as analysis of the data
depends on the shape of the curve drawn.
➢Box plot
2.3 Histogram
For Continuous data in simple table
Reaction time (in seconds)    Frequency
0-<10                         1
10-<20                        2
20-<30                        8
30-<40                        12
40-<50                        6
50-<60                        3
Figure (2): Distribution of patients at (place), in (time) by
reaction time to drug “X”
Population Pyramid
2.4 Frequency polygon
For Continuous data in simple or complex tables
Frequency polygon
Age (years)    Males    Females    Mid-point (M-P)
20-            12%      10%        25
30-            36%      30%        35
40-            8%       25%        45
50-            16%      15%        55
60-70          8%       20%        65
[Frequency polygon: % (Y-axis, 0-40) against age mid-points 25 to 65 (X-axis), one line each for males and females]
Figure (2): Distribution of 45 patients at (place), in (time) by age and sex
2.5 Frequency curve
For Continuous data in simple or complex tables of many categories
[Frequency curve: frequency (Y-axis, 0-9) against age in years (X-axis: 20-, 30-, 40-, 50-, 60-69), one smoothed curve each for females and males]
2.7 Scatter diagram
To show the relation between two numerical variables
• Rectangular coordinates
• Two quantitative variables
• One variable is called independent (X) and the second is called dependent (Y)
• Points are not joined
• No frequency table
NB: Variables can further be divided
into dependent and independent
variables. A dependent variable is also
called a response variable, and an
independent variable is also called a
predictor or explanatory variable.
Wt. (kg)      67   69   85   83   74   81   97   92   114  85
SBP (mmHg)    120  125  140  160  130  180  150  140  200  130
[Scatter plot: SBP in mmHg (Y-axis, 80-220) against weight in kg (X-axis, 60-120)]
Scatter diagram of weight and systolic blood pressure
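The direction and strength of the pattern in this scatter diagram can be summarized with a single number. The sketch below uses Pearson's correlation coefficient, which is not defined on these slides and is introduced here only as one common, assumed choice; the function name `pearson_r` is illustrative.

```python
from math import sqrt

# Paired observations from the scatter diagram above.
weight_kg = [67, 69, 85, 83, 74, 81, 97, 92, 114, 85]
sbp_mmhg = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(weight_kg, sbp_mmhg)
print(round(r, 2))
```

For these ten pairs r comes out at roughly 0.74, a positive relation: heavier subjects tend to have higher systolic blood pressure.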
Scatter Plot Patterns
The figure below shows common patterns of correlation.
[Figure: six scatter plot panels (Y against X): strong positive correlation; strong negative correlation; weak positive correlation; weak negative correlation; complex; no correlation]
2.8 Box Plot
Suitable for quantitative data
• Q1 = 1st quartile (25th percentile)
• Q2 = 2nd quartile = Median
(middle value)
• Q3 = 3rd quartile (75th percentile)
• Dots represent outliers.
Maps
▪ Maps are used to show the geographic
location of events or attributes
(spot or area map)
Spot Maps
▪ Spot maps use dots or other symbols to show
where each case-patient lived or was
exposed.
▪ A spot map is useful for showing the
geographic distribution of cases
Spot Map
Area Map
▪ An area map, also called a choropleth map, can
be used to show rates of disease or other health
conditions in different areas by using different
shades or colors.
▪ When choosing shades or colors for each
category, ensure that the intensity of
shade or color reflects increasing disease
burden
Area Map
Frequency Distribution
▪ Grouping of Data:
▪ Categorical or Quantitative Variables
Frequency Distribution
Table 2. Age Distribution of Study Participants
Age (years) No. %
15-<20 6 4.9
20-<25 10 8.1
25-<30 32 26.0
30-<40 41 33.3
40-<50 30 24.4
50+ 4 3.3
TOTAL 123 100
Frequency Distribution
Table 1. Demographic characteristics of study participants
Characteristic (n=123) No. %
AGE
<=30 Yrs 48 39.0
> 30 Yrs 75 61.0
SEX
Male 75 61.0
Female 48 39.0
EDUCATION
< High school (H.S) 48 39.0
H.S or above 75 61.0
Frequency Distribution
Histogram and Frequency Polygon
- A frequency distribution may be displayed
graphically by a histogram
- The values are shown on the X-axis
- The frequency is shown on the Y-axis
- Above each class interval a rectangular
bar, or cell, is constructed. The height of the cell
corresponds to the frequency.
Frequency Distribution
Histogram and Frequency Polygon
Summary statistics
Central location (averages)
– Mean
– Median
– Mode
Non-central location
– Quartiles and percentiles
Variability (dispersion)
– Range
– Variance
– Standard deviation
– Semi-interquartile range
– Coefficient of variation
3-Descriptive statistics
A) Measures of Central Tendency
– Characteristics that describe the middle
or most commonly occurring values in
a series
– Used as summary measures for series
i.e., summarize the attributes of
continuous variables.
• Mean
• Median
• Mode
Arithmetic mean
▪ Probably the most common of the measures of central
tendency
– a.k.a. ‘average’
▪ Definition
x̄ = ∑ xi / n
(the sum of all observations divided by the number of observations, n)
– For a given set of data there is one and only one
arithmetic mean.
– Arithmetic mean is easily understood and easy to
compute
Weakness
– Influenced by extreme values
Example:
Suppose the weights of 14 patients are 62, 64, 65,
66, 68, 70, 70, 70, 70, 74, 74, 77, 77 and 79 Kg,
The mean for this data is
= (62+64+65+66+68+70+70+70+70+74+74+77+77+79) / 14
= 986/14 ≈ 70.4 Kg.
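The arithmetic is easy to check by machine. The following is a minimal sketch (variable names are illustrative) that sums the 14 weights and divides by n.

```python
# Weights (Kg) of the 14 patients in the example.
weights_kg = [62, 64, 65, 66, 68, 70, 70, 70, 70, 74, 74, 77, 77, 79]

total = sum(weights_kg)   # sum of all observations
n = len(weights_kg)       # number of observations
mean = total / n          # arithmetic mean

print(total, n, round(mean, 1))
```

The sum of the 14 weights is 986, so the mean is 986/14, about 70.4 Kg.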
Median:
– The value which divides the ‘ordered array’ into two equal
parts
– Frequently used if there are extreme values in a distribution
or if the distribution is non-normal
If there is an odd number of observations, the median is the
((n+1)/2)th observation
Ex.: Median of 11 observations is the 6th observation
If there is an even number of observations, the median is the
midpoint between the middle two observations
Ex.: Median of 12 observations is the midpoint between the 6th and
7th observations
MEDIAN:
For data of the previous example
Example:
Suppose the weights of 14 patients are 62, 64, 65, 66, 68,
70, 70, 70, 70, 74, 74, 77, 77 and 79 Kg,
i) Already in order
ii) Median ranks = 14/2 =7 & (14/2)+1 =8 (7th and 8th
observation)
iii) Median = (70 + 70 )/2 = 70 Kg (The average of the 7th and
8th observation)
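The rank rules for the median (odd n: the ((n+1)/2)th observation; even n: the average of the (n/2)th and (n/2 + 1)th observations) can be sketched as a small function. The function name `median_by_rank` is illustrative.

```python
def median_by_rank(values):
    """Median via the rank rules: sort, then pick or average the middle value(s)."""
    ordered = sorted(values)              # step i: put the data in order
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation (1-based rank).
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the (n/2)th and (n/2 + 1)th observations.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

weights_kg = [62, 64, 65, 66, 68, 70, 70, 70, 70, 74, 74, 77, 77, 79]
median_weight = median_by_rank(weights_kg)
print(median_weight)
```

For the 14 weights the 7th and 8th ordered values are both 70, so the median is 70 Kg.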
➢Quartiles and percentiles are calculated using the same
concept and steps as the median, but each uses a different
rank:
➢Q1 rank= (n+1)/4
➢Q3 rank= (n+1) * ¾
➢P10 rank= n* 10/100
➢P97 rank=n* 97/100 and so on.
➢Note that the median = Q2 = P50
Mode
▪ Not used very frequently in practice
Definition: Value that occurs most frequently in a data set
For example, for weights: 50, 63, 67, 63, 52, 70, 75, 72
– Mode is 63
▪ If all values are different, there is no mode
For example, for weights: 50, 63, 67, 61, 52, 70, 75, 72
▪ There may be more than one mode
– Bimodal or multimodal
For example, for weights: 50, 63, 67, 63, 52, 70, 75, 70
Here there are two modes: 63 Kg and 70 Kg.
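The three cases above (one mode, no mode, more than one mode) can be sketched with a frequency count. The helper name `modes` is illustrative, and returning an empty list when every value occurs once is an assumption chosen to mirror the "no mode" convention used on this slide.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

one_mode = modes([50, 63, 67, 63, 52, 70, 75, 72])
no_mode = modes([50, 63, 67, 61, 52, 70, 75, 72])
two_modes = modes([50, 63, 67, 63, 52, 70, 75, 70])
print(one_mode, no_mode, two_modes)
```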
Range:
is the difference between the highest and lowest values.
Heavily influenced by the two most extreme values and
ignores the rest of the distribution
Example:
The following data represent the weight of 10 persons:
60, 53, 80, 76, 89, 56, 42, 46, 88, and 95 kg. Find the range.
Answer :
largest observation = 95 smallest observation = 42
The range = 95 - 42 = 53 kg
Interquartile range (IQR):
It is equal to the distance between third and first
quartiles
IQR = (Q3 - Q1)
Example:
For the set of weights, where
Q1 = 10.5 Kg and Q3 = 35.5 Kg
IQR= Q3-Q1= 35.5-10.5= 25 kg
Semi Interquartile range (SIQR):
It is equal to half of the distance between third and
first quartiles
SIQR = (Q3 - Q1) / 2
Example: for the set of weights where Q1 = 10.5 Kg
and Q3 = 35.5 Kg
IQR= Q3-Q1= 35.5-10.5= 25 kg
SIQR=(Q3-Q1)/2 = 12.5 kg
SIQR is used along with MEDIAN when data is
skewed.
Variance
▪ Variance measures the spread of values
around their mean
▪ Definition of Population Variance
It is the average of squared deviations from
the mean, i.e., the sum of the squared
deviations from the mean divided by the
number of observations.
σ² = ∑ (xi – µ)² / N
where µ is the population mean and N is the number of observations in the population.
Variance
Formula for sample variance
s² = ∑ (xi − x̄)² / (n − 1)
▪ Degrees of freedom
- n − 1 is used because the deviations from the sample mean
have to sum to zero, so once n − 1 deviations are known, the
nth deviation is determined.
Standard Deviation
▪ It is the positive square root of the variance.
s = √s²
▪ Standard deviation is in the same units as the mean
– Variance is in units²
➢Standard deviation (SD) is a widely used measure of
variability or dispersion when the data are normally
distributed
➢It shows how much individual values vary from the
“mean”.
➢A low standard deviation indicates that the data points
tend to be very close to the mean, whereas high
standard deviation indicates that the data is spread out
over a large range of values.
Example
The weights (in Kg) of 9 children attending a well-baby
clinic are as follows:
2, 4, 5, 5, 6, 6, 6, 4, 7
Compute:
a) The mean b) The median
c) The mode d) Range
e) Variance f) Standard Deviation
Example
Data: 2, 4, 5, 5, 6, 6, 6, 4, 7
Mean = 45/9 = 5
xi       (xi – x̄)    (xi – x̄)²
2        -3          9
4        -1          1
5        0           0
5        0           0
6        1           1
6        1           1
6        1           1
4        -1          1
7        2           4
Sum: 45  0           18
Variance:
s² = (9 + 1 + 0 + 0 + 1 + 1 + 1 + 1 + 4) / (9 − 1)
= 18/8
= 2.25
Standard Deviation:
s = √2.25 = 1.5
Range: Smallest = 2 Largest = 7
Range = 7-2 = 5
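All six requested summaries for this example can be checked with Python's standard `statistics` module. This is a sketch for verification only; note that `statistics.variance` and `statistics.stdev` use the sample (n − 1) denominator, matching the formula used in this lecture.

```python
import statistics

# Weights (Kg) of the 9 children in the example.
weights_kg = [2, 4, 5, 5, 6, 6, 6, 4, 7]

mean = statistics.mean(weights_kg)                 # 45 / 9
median = statistics.median(weights_kg)             # middle (5th) ordered value
mode = statistics.mode(weights_kg)                 # most frequent value
data_range = max(weights_kg) - min(weights_kg)     # largest minus smallest
variance = statistics.variance(weights_kg)         # sample variance, divides by n - 1
sd = statistics.stdev(weights_kg)                  # square root of the sample variance

print(mean, median, mode, data_range, variance, sd)
```

The results agree with the hand calculation: mean 5, median 5, mode 6, range 5, variance 2.25 and standard deviation 1.5.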
[Figure: three data sets with the same mean (Mean = 7) but different spread: SD = 0, SD = 0.63, and SD = 4.04]
Box Plot
A box and whisker plot is defined as a graphical method
of displaying variation in a set of data.
A boxplot is a standardized way of displaying the
distribution of data based on a five number summary
(“minimum”, first quartile (Q1), median, third quartile
(Q3), and “maximum”).
It can also tell you if your data is symmetrical, how
tightly your data is grouped, and if and how your data is
skewed.
➢ Question No (1):
Draw a boxplot for the following data:
10, 6, 16, 17, 13, 12, 8, 14, 15, 9, 20, 23, 5
Answer:
Arranged data: 5, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17,
20, 23
Min: 5 Max: 23 Q2 (Median) = 13 (the 7th observation)
Q1 rank = (13+1) x 1/4 = 14/4 = 3.5th observation
(average of the 3rd and 4th observations, 8 and 9) = 8.5
Q3 rank = (13+1) x 3/4 = 10.5th observation
(average of the 10th and 11th observations, 16 and 17) = 16.5
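The five number summary in this answer can be reproduced with the (n + 1) rank rule. The sketch below is one way to code it: the helper `value_at_rank` is hypothetical, it interpolates linearly between neighbouring observations when the rank is fractional, and it assumes the rank lies between 1 and n.

```python
def value_at_rank(ordered, rank):
    """Value at a 1-based, possibly fractional rank in sorted data.

    Assumes 1 <= rank <= len(ordered); interpolates linearly between neighbours.
    """
    lower = int(rank)            # whole part of the rank
    fraction = rank - lower      # fractional part of the rank
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [10, 6, 16, 17, 13, 12, 8, 14, 15, 9, 20, 23, 5]
ordered = sorted(data)
n = len(ordered)

q1 = value_at_rank(ordered, (n + 1) / 4)        # rank 3.5
q2 = value_at_rank(ordered, (n + 1) / 2)        # rank 7 (median)
q3 = value_at_rank(ordered, 3 * (n + 1) / 4)    # rank 10.5
five_number_summary = (ordered[0], q1, q2, q3, ordered[-1])
print(five_number_summary)
```

Other quartile conventions exist (statistical packages differ), so results from software may not match the hand method for every data set.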
Question No (1): Answer
[Box plot: whiskers from Min = 5 to Max = 23; box from Q1 = 8.5 to Q3 = 16.5; median line at 13]
The Normal Distribution
➢The normal distribution is the most important
of the continuous distributions. It is a
probability distribution of a continuous random
variable X which ranges from −∞ to +∞. It has
two parameters: the mean µ and the standard
deviation σ.
➢Standard Normal distribution: - The normal
probability distribution of Z = (X − µ) / σ, which
has zero mean and unit variance, is called the
standardized normal distribution or standard
normal distribution and is denoted by N (0, 1).
The Normal Distribution curve
Properties of a Normal Distribution
▪ The distribution is bell-shaped, uni-modal, and symmetrical.
▪ As the distribution is symmetrical, its mean, median and
mode coincide and are all equal to µ. That is, in a normal
distribution Mean = Median = Mode.
▪ In a normal distribution, approximately 68% of the data (area)
lies within 1 SD of the mean, 95% within 2 SD
of the mean, and 99.7% within 3 SD of
the mean.
▪ Area under the curve = 1, divided into 2 symmetrical halves
by a vertical line from the highest point, meeting the
horizontal axis at the mean.
▪ It is completely determined by its mean µ and standard
deviation σ (or variance σ²)
Question No (2):
For a sample of 200 medical students, the mean
heart rate is 70 beats/min, with a standard deviation
of 10 beats/min. Assume that heart rates are
normally distributed.
What percentage of students will have heart rates:
1) Between 60-80 beats/min
2) Above 90 beats/min
3) Below 60 beats/min
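The three percentages can be checked numerically. This sketch assumes a normal distribution with mean 70 and SD 10, as stated in the question, and uses `statistics.NormalDist` from the Python standard library; variable names are illustrative.

```python
from statistics import NormalDist

# Heart rate model from the question: mean 70 beats/min, SD 10 beats/min.
heart_rate = NormalDist(mu=70, sigma=10)

# 1) Between 60 and 80 beats/min (within 1 SD of the mean).
p_between_60_80 = heart_rate.cdf(80) - heart_rate.cdf(60)
# 2) Above 90 beats/min (more than 2 SD above the mean).
p_above_90 = 1 - heart_rate.cdf(90)
# 3) Below 60 beats/min (more than 1 SD below the mean).
p_below_60 = heart_rate.cdf(60)

for label, p in [("60-80", p_between_60_80), (">90", p_above_90), ("<60", p_below_60)]:
    print(f"{label}: {100 * p:.1f}%")
```

The approximate 68% and 95% rule gives 68%, 2.5% and 16%; the exact areas are close to these, at about 68.3%, 2.3% and 15.9%.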
Question No (2):
Ans.
Skewness:
Skewness is the degree of departure from symmetry of a
distribution.
A positively / Right skewed
distribution has a "tail" which
is pulled in the positive
direction.
A negatively / Left skewed
distribution has a "tail" which
is pulled in the negative
direction.
Positively skewed (or skewed to the
right) distribution
▪ It arises when the mean is increased by some
unusually high values.
▪ Most of the observations are of low values.
Mode < Md. < Mean
Negatively skewed (or skewed to the left)
distribution
▪ It arises when the mean is reduced by some
extremely low values
▪ Most of the observations are of high values
Mean < Md. < Mode
Distribution of data
▪ Normal: Mean = Median = Mode
▪ Left (Negatively) skewed: Mean ≤ Median ≤ Mode
▪ Right (Positively) skewed: Mode ≤ Median ≤ Mean
Question No (3):
For the set of data {15, 18, 11, 8, 19, 11, 23, 35, 46,
7, 26, 29, 36}, you would call it
(a) Normally distributed
(b) Positively skewed
(c) Negatively skewed
(d) Cannot be determined
Question No (3): Answer
1. First, arrange the data in ascending order
7, 8, 11, 11, 15, 18, 19, 23, 26, 29, 35, 36, 46
2. Find the median, mode, and mean
Median = 19 Mode = 11 Mean = 21.85
In this case, Mode ≤ Median ≤ Mean. So, the data are positively
(right) skewed.
You could reach this conclusion even by looking at just the mode
and median.
Also, the difference between the min value (7) and the median (19) is
12, whereas the difference between the median (19) and the max
value (46) is 27. This tells us that the extreme values, and so the
tail, are on the right side. So, the data are positively (right)
skewed.
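The comparison of mode, median and mean used in this answer can be sketched in code. The ordering rule is the rule of thumb from the slides, not a formal skewness test, and the variable names are illustrative.

```python
import statistics

data = [15, 18, 11, 8, 19, 11, 23, 35, 46, 7, 26, 29, 36]

mean = statistics.mean(data)       # 284 / 13
median = statistics.median(data)   # 7th of the 13 ordered values
mode = statistics.mode(data)       # the only repeated value

# Rule of thumb from the slides: compare the order of mode, median and mean.
if mode < median < mean:
    shape = "positively (right) skewed"
elif mean < median < mode:
    shape = "negatively (left) skewed"
elif mean == median == mode:
    shape = "symmetrical"
else:
    shape = "cannot be determined from this rule"

print(round(mean, 2), median, mode, shape)
```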
THANK YOU