Session 1 DEN1015H 2013 Lecture Notes

DEN 1015H LECTURE NOTES Session 1

INTRODUCTION

Statistics is the field that deals with the collection, analysis and
interpretation of numerical information. Statistics is applied to the planning and
analysis of research studies in virtually all subject areas. Biostatistics is a branch
of statistics that deals primarily with the health sciences and the biological
sciences.

Statistical and biostatistical methods are used for summarizing and organizing data
to allow efficient reporting and presentation of the results and also to provide
simple interpretations of the findings. This lecture gives many examples of the
graphical and tabular techniques of descriptive statistics that are used for
describing and summarizing data. The next lecture introduces concepts of
probability. Probability models describe the variability in data and also form the
logical basis of statistical inference, by which conclusions are drawn from data.
The remainder of the course will focus on several methods of statistical inference
that have important applications in biomedical research. Many of the examples
used to illustrate the concepts involve data from recent dental research.

DESCRIPTIVE STATISTICS

1. Defining & Summarizing Data

“Raw” data are observations derived from people, lab animals, lab specimens,
hospitals, etc.

The number of individuals (or animals, specimens, cells, hospitals, etc.) is called
the sample size.

A variable relates to anything that is measured, recorded or manipulated in a
study.

In a data set the variables are the columns and the observations are the rows (Fig.
1).

© Dr. Herenia P. Lawrence
Implant Subject Type Irradiation Graft Failure
1 1 1 1 1 1
2 1 2 1 1 0
3 2 1 1 1 1
4 2 2 1 1 1
5 3 1 1 1 1
6 3 2 1 1 1
7 4 1 1 0 0
8 4 2 1 0 0
9 5 1 1 0 1
10 5 2 1 0 1
11 6 1 1 0 1
12 6 2 1 0 1
13 7 1 0 1 1
14 7 2 0 1 0
15 8 1 0 1 0
16 8 2 0 1 0
17 8 2 0 1 0
18 9 1 0 1 1
19 9 2 0 1 1
20 10 1 0 0 0
21 10 2 0 0 0
22 11 1 0 0 0
23 11 2 0 0 0
24 12 1 0 0 0
25 12 2 0 0 0
‘Type’: 1 = maxilla; 2 = mandible. For ‘irradiation’, ‘graft’, and
‘failure’: 1 = yes; 0 = no.

Fig. 1. Hypothetical data set from a study investigating the failure rate of 25
implants placed in grafted and non-grafted mandibular/maxillary bone in 6
irradiated and 6 non-irradiated subjects.

TYPES OF VARIABLES

The first step, before performing statistical tests, is to decide what type of variables
(or data) one is dealing with, since different statistical analyses are needed for each
type of variable. Nominal and ordinal variables consist of counts in categories and
are analyzed using “non-parametric” statistics. Interval and ratio variables consist
of actual quantitative measurements and are analyzed using “parametric” statistics.

There are a number of typologies, but one that has proven useful is given in Table
1 and Fig. 2. The basic distinction is between quantitative data (for which one asks
“how much?”) and qualitative or categorical data (for which one asks “what
type?”). For a review, see suggested readings for today’s class.

Table 1. Examples of types of variables (data)

Qualitative (or categorical): A categorical variable records the category to which an
observation belongs. Numerals (1, 2, 3, etc.) are usually used to record the category
membership but these are not to be confused as having any meaning as numerical values.
If the data could be recorded using letters (A, B, C, etc.) without losing information, then
we have a categorical variable.
a) Nominal (unordered categories, e.g. Caucasian/Black/Hispanic)
Subtype: Binary/Dichotomous (e.g. yes vs. no, dead vs. alive, male vs. female,
treatment vs. control). For purposes of statistical analysis such variables are
usually numerically coded using the numerals 0 and 1.
b) Ordinal (ordered categories, e.g. categories of pain severity: none, mild,
moderate, and severe; stage I, II or III cancer; a Likert scale – strongly
disagree, disagree, neutral, agree, strongly agree).
Please note: you should not use mean scores for these!!!
c) Ranked (e.g. ten leading causes of failure of dental implants, which have been
arranged from the cause that resulted in the greatest number of failures to the
cause that resulted in the fewest. These causes were then assigned consecutive
integers that correspond to their place in the sequence).

Quantitative: For a quantitative variable, differences between possible values have
meaning independent of the values themselves. For example, a difference between a
value of 0 and a value of 1 is comparable to a difference between a value of 1 and a value
of 2.
a) Discrete (integer or whole numbers, e.g. DMF score; number of teeth; number
of children).
b) Continuous
These have two subtypes:
Interval scale – no natural zero (e.g. IQ, degree Celsius, probing pocket depth,
clinical attachment level)
Ratio scale – has a natural zero (e.g. length in metres, salivary flow rate, age,
pulse rate, vital capacity).

OK to compute....                                        Nominal   Ordinal   Interval   Ratio
frequency distribution.                                  Yes       Yes       Yes        Yes
median and percentiles.                                  No        Yes       Yes        Yes
add or subtract.                                         No        No        Yes        Yes
mean, standard deviation, standard error of the mean.    No        No        Yes        Yes
ratio, or coefficient of variation.                      No        No        No         Yes

In general, the amount of information increases as one goes from nominal to ratio
variables. Classifying interval measures into large categories is akin to throwing
away data.
There are other ways of defining types of variables. In an experiment, the
independent variables are those that are varied by and under the control of the
experimenter; the dependent variables are those that respond to experimental
manipulation. For example, in a clinical trial to determine the effect of periodontal
therapies on attachment gain, the independent variable is the type of therapy and
the dependent variable is the gain in attachment measured in millimeters.
Dependent variables should be clinically important and related to the independent
variables.

Data Types

Qualitative
   Categorical/Nominal [sex (dichotomous), marital status]
   Categorical/Ordinal (stage of cancer, pain rating, Likert scale)
Quantitative
   Discrete (# teeth)
   Continuous
      Interval (36º-38º C, probing pocket depth)
      Ratio (age, pulse rate, vital capacity, VAS)
Dependent (periapical lesion)
Independent (CHX irrigation versus saline)

Fig. 2. Types of Variables

FREQUENCIES, FREQUENCY DISTRIBUTIONS AND GRAPHS
The next step in the process of analyzing data is to describe the data using a
frequency distribution, which shows how often each possible value occurs and so
reflects the probability of its occurrence. A frequency distribution consists of a
set of frequencies for all possibilities.

Frequencies for categorical variables

Summarizing categorical variables is straightforward, the main task being to count
the number of observations in each category. These counts are called frequencies.
They are often also presented as relative frequencies; that is, as proportions or
percentages of the total number of individuals (sample size).

Example: Type of malocclusion in a sample of 200 schoolchildren (WHO criteria
of malocclusion)

Type of Malocclusion    Frequency    Relative Frequency (%)
Normal                     100             50.0
Moderate                    61             30.5
Severe                      39             19.5
Total                      200            100.0

Frequency (or count): the number of people belonging to a category, e.g. number
of students who present moderate malocclusion = 61.

Relative frequency: the proportion of people belonging to a category (frequency
divided by the total sample size). It can be expressed as a percentage, e.g. the
percentage of pupils who have a moderate degree of malocclusion =
61/200 × 100 = 30.5%.
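The calculation of relative frequencies can be sketched in a few lines of Python. This is only an illustration using the counts from the malocclusion table; the variable names are arbitrary and not part of the original notes.

```python
# Frequencies (counts) from the malocclusion example.
counts = {"Normal": 100, "Moderate": 61, "Severe": 39}

# The sample size is the sum of the frequencies.
total = sum(counts.values())

# Relative frequency (%) = frequency / sample size x 100.
relative = {category: 100 * n / total for category, n in counts.items()}

for category, n in counts.items():
    print(f"{category:<10}{n:>6}{relative[category]:>8.1f}")
print(f"{'Total':<10}{total:>6}{sum(relative.values()):>8.1f}")
```

The relative frequencies always add up to 100%, which is a quick check on the arithmetic.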

Bar Diagram & Pie Chart

Bar and pie charts are popular types of graphs used to display frequencies and
relative frequencies.

[Bar chart: number of schoolchildren by type of occlusion (normal occlusion 100, moderate malocclusion 61, severe malocclusion 39).]

[Pie chart: normal occlusion 50%, moderate malocclusion 30.5%, severe malocclusion 19.5%.]

Frequency distributions for quantitative variables

Example: DMF scores for a group of 50 8-year-old children.

DMF score   Frequency           Relative        Cumulative   Cumulative Relative
            (no. of children)   Frequency (%)   Frequency    Frequency (%)
0            4                   8               4             8
1            2                   4               6            12
2            3                   6               9            18
3            6                  12              15            30
4            7                  14              22            44
5           10                  20              32            64
6            9                  18              41            82
7            3                   6              44            88
8            1                   2              45            90
9            1                   2              46            92
10           4                   8              50           100
Total       50                 100

Cumulative Frequency: the number of people in the sample with values less than
or equal to a specified value, e.g. 32 of the children have a DMF score of 5 or less.

Relative Frequency (%): the number of people in the sample taking each value
divided by the total number of people studied and multiplied by 100, i.e., the
proportion of the total sample in each category. This proportion can be interpreted
as the probability that an individual chosen at random from the original sample
may fall in a particular category or within a range of categories.

Cumulative Relative Frequency (%): the percentage of people in the sample with
values less than or equal to a specified value, e.g. 64% of the sample have a DMF
score of 5 or less.
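As a rough illustration, the cumulative columns of the DMF table can be rebuilt from the frequency column alone. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from itertools import accumulate

# DMF scores 0..10 and the number of children with each score (from the table).
dmf_scores = list(range(11))
frequency = [4, 2, 3, 6, 7, 10, 9, 3, 1, 1, 4]

n = sum(frequency)  # sample size

# Running total of the frequencies gives the cumulative frequency.
cumulative = list(accumulate(frequency))

# Express both columns as percentages of the sample size.
relative_pct = [100 * f / n for f in frequency]
cumulative_pct = [100 * c / n for c in cumulative]

for score, f, r, c, cp in zip(dmf_scores, frequency, relative_pct,
                              cumulative, cumulative_pct):
    print(f"{score:>3}{f:>5}{r:>7.0f}{c:>5}{cp:>7.0f}")
```

The last cumulative frequency must equal the sample size, and the last cumulative relative frequency must equal 100%.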

Histograms
The histogram is an appropriate method for depicting a frequency distribution for
discrete or continuous data. Values are grouped into intervals, generally of equal
size. These intervals are then represented by bars with (if intervals have equal
width) heights proportional to the frequency of observations contained within
them.

Example: Histogram with equal class intervals

The following frequency distribution shows the age at onset of edentia in a sample
of 200 edentulous persons.

Age at the last birthday Frequency

10-19 8
20-29 56
30-39 70
40-49 43
50-59 18
60-69 4
70-79 1
Total 200

The class intervals are all ten years.

[Histogram: frequency (y-axis, 0 to 80) against age in years (x-axis) in 10-year classes from 10-19 to 70-79; bar heights 8, 56, 70, 43, 18, 4 and 1.]

Example: Histogram with unequal class intervals

The following data are abstracted from a paper on the age at onset of edentia in
another sample of edentulous persons. The percentage distribution is shown.

Age at the       Class interval    % of total
last birthday    (years)
11-15             5                  0.5
16-19             4                  3.5
20-24             5                 10.5
25-29             5                 17.5
30-34             5                 20.0
35-44            10                 29.0
45-54            10                 14.0
55-74            20                  5.0*
Total                              100.0

*5% divided by 20 and multiplied by 5 equals 1.25

[Histogram: % per 5 years of age (y-axis) against age in years (x-axis), drawn in 5-year bars with heights 0.5, 4.4, 10.5, 17.5, 20, 14.5, 14.5, 7, 7, 1.25, 1.25, 1.25 and 1.25.]

To standardize the data into equal intervals of 5 years, you will need to divide the
relative frequency by the width of the interval and then multiply it by 5 (see
example above*).
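The standardization rule just described (divide by the class width, multiply by 5) can be sketched as follows. The class labels, widths and percentages are those of the table above; the function name `adjusted_height` is hypothetical.

```python
# (age class, class width in years, % of total) from the table above.
classes = [
    ("11-15", 5, 0.5),
    ("16-19", 4, 3.5),
    ("20-24", 5, 10.5),
    ("25-29", 5, 17.5),
    ("30-34", 5, 20.0),
    ("35-44", 10, 29.0),
    ("45-54", 10, 14.0),
    ("55-74", 20, 5.0),
]

STANDARD_WIDTH = 5  # years; the unit class interval used for the histogram


def adjusted_height(percent, width, standard_width=STANDARD_WIDTH):
    """Relative frequency per standard-width interval."""
    return percent / width * standard_width


heights = {label: adjusted_height(pct, width) for label, width, pct in classes}

for label, height in heights.items():
    print(f"{label:<8}{height:6.2f}")
```

With this adjustment the area of each bar, rather than its height, is proportional to the percentage of subjects in the class.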

Frequency Polygon
The frequency polygon is similar to the histogram. It is constructed by placing a
point at the center of each interval of the histogram such that the height of the point
is equal to the frequency or relative frequency associated with that interval.

[Figure: number of hours worked per week as reported by dentists, shown first as a histogram and then as a frequency polygon. x-axis: number of hours (1 to 10, 11 to 20, 21 to 30, 31 to 40, 41 to 50, 51 to 60); y-axis: percentage.]

Stem-and-leaf plot
It resembles a histogram, with the first digit(s) of each datum along the “stem” and
the last digit(s) forming the “leaves”.

Urinary concentration of lead in 15 children from housing estate (μmol/24hr)

0.6, 0.1, 1.1, 0.4, 2.6, 2.0, 0.8, 1.3, 1.2, 1.5, 3.2, 1.7, 1.9, 1.9, 2.2

a) Stem-and-leaf “as they come”

Stem | Leaf

0 | 6 1 4 8
1 | 1 3 2 5 7 9 9
2 | 6 0 2
3 | 2

We then order the leaves, as in “b)”

b) Ordered stem-and-leaf plot

Stem | Leaf

0 | 1 4 6 8
1 | 1 2 3 5 7 9 9
2 | 0 2 6
3 | 2
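The ordered plot can also be produced programmatically. The sketch below assumes values recorded to one decimal place, so that the stem is the whole number and the leaf is the first decimal; the function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict

# Urinary lead concentrations from the example above.
lead = [0.6, 0.1, 1.1, 0.4, 2.6, 2.0, 0.8, 1.3, 1.2, 1.5, 3.2, 1.7, 1.9, 1.9, 2.2]


def stem_and_leaf(values):
    """Return {stem: ordered leaves} for values with one decimal place."""
    plot = defaultdict(list)
    for value in values:
        tenths = round(value * 10)       # e.g. 2.6 -> 26 (avoids float error)
        stem, leaf = divmod(tenths, 10)  # 26 -> stem 2, leaf 6
        plot[stem].append(leaf)
    # Order the stems, and the leaves within each stem.
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}


plot = stem_and_leaf(lead)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Skipping the `sorted(leaves)` step would give the "as they come" version of the plot.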

Numerical data can be further summarized by measures that describe where the
center of the distribution lies – mean, median, and mode – and measures of how
wide the distribution is – range, percentile, and standard deviation.

MEASURES OF CENTRAL TENDENCY

MEAN the sum of all observations divided by the total number of
observations (or sample mean or arithmetic mean), as follows:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

$\bar{x}$ is pronounced “x-bar,” where
$x_i$ = the value for the ith subject in the sample
$\sum$ = the Greek capital letter sigma, indicating a summation over all $x_i$’s
n = the number of subjects in the sample (also called the sample size).

MEDIAN the middle value of the distribution, i.e., the value for which 50% of
the sample have values less than or equal to the median and 50% have
values greater than or equal to the median (50th percentile). It is
calculated by rank ordering (from lowest to highest) the values and
then determining the value corresponding to the middle rank, i.e., the
rank order (n+1)/2. Thus, if the sample contains an odd number of
subjects, the median will be the value of the subject with the middle
rank. If the sample contains an even number of subjects, the median
value will fall half-way between the values of the two midmost
subjects.

MODE the most common single value, i.e., the peak of the frequency
distribution. A distribution with two or more modes is referred to as
bimodal, trimodal, etc.

Example: Number of teeth in a sample of 10 babies

6, 6, 6, 6, 7, 8, 8, 10, 10, 12

Mean = (6 + 6 + 6 + 6 + 7 + 8 + 8 + 10 + 10 + 12)/10 = 79/10 = 7.9 teeth

Median = 7.5 teeth [the average of the (n/2)th and (n/2 + 1)th observations if n is
even].

Mode = 6 teeth

N.B.: The mode is seldom used. If the sample is small, either it may not be
possible to estimate the mode (e.g. when all the values are different), or the
estimate obtained may be misleading.
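The three measures for the sample of 10 babies can be checked with Python's standard `statistics` module. This is just a verification sketch, not part of the original notes.

```python
import statistics

# Number of teeth in the sample of 10 babies.
teeth = [6, 6, 6, 6, 7, 8, 8, 10, 10, 12]

mean = statistics.mean(teeth)      # sum of the observations / n
median = statistics.median(teeth)  # average of the 5th and 6th ordered values (n is even)
mode = statistics.mode(teeth)      # most common single value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```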

Properties of the mean and median:

1. If the sample is shifted by a constant c, i.e., c is added to all data values, then
the mean and median are also changed by this same amount.

Example: If a measuring device is not calibrated correctly so that every
measurement is out by c units then the mean of any sample of values will
also be out by c units.

2. If the sample is rescaled by a constant c, i.e., all data values are multiplied
by c, then the mean and median are also multiplied by c.

Example: If the units of serum glucose are changed from mg/dl to g/l
(multiply by 10/1000=1/100), then the mean is changed from 110.1 mg/dl to
1.101 g/l.

Comparison of the mean and the median:

1. The mean is substantially influenced by unusual values (outliers), so it is
most suitable for distributions that are roughly symmetrical. If unusually
large or small values (outliers) can arise, the median will be less influenced
by these.

2. In some cases, the mean must be used for interpretative reasons since it takes
into account each individual observation.

Example: In a needs assessment survey, the mean number of decayed teeth
per child in a sample of children is more relevant than the median for
inferring the total needs of the population.

Possible distributions of data values
Symmetric vs. asymmetric (skewed) distributions

Two asymmetric curves, one with positive skew (Curve A - skewed to the right)
and one with negative (Curve B - skewed to the left) skew.

Symmetrical and bell-shaped distributions differing in terms of kurtosis.

Bimodal

The mean, median, and mode in a symmetric distribution

The mean, median, and mode in a skewed distribution

Histogram of highly skewed data. Note the outlier = 43.

Data Transformations

Logarithmic transformation (natural logarithms, i.e. logarithms to base e = ln, or
logarithms to base 10 = log; only with positive values):
for positively skewed distributions.

Square or cubic transformations:
for negatively skewed distributions.

Example: The following are the number of days spent in hospital by 17 subjects
following an operation:

3, 4, 4, 6, 8, 8, 8, 10, 10, 12, 14, 14, 17, 25, 27, 37, 42

[One-way scatter plots of the 17 lengths of stay: a) raw data, on an axis from 0 to 50; b) ln data, on an axis from 0 to 5.]

Fig. 1. One-way scatter plots of length of hospital stay showing a) raw data and b)
data on a logarithmic scale.

The Geometric Mean

Consider the 17 observations of the duration of stay in hospital plotted on a
one-way scatter plot in Figure 1a above. The distribution is skewed to the right
with a few rather large observations. Because of this skewness, the mean duration
(14.65) would not be a satisfactory measure of the central value; the median (10)
would be more useful. Figure 1b shows a one-way scatter plot of the logarithms of
the observations and now the distribution is more symmetric. The mean log duration
(2.41) is therefore a satisfactory measure of the central value of the distribution of
log duration. The anti-logarithm of this mean (antilog 2.41 = exp(2.41) = 11.13),
known as the geometric mean, is a better measure of the central value of the
distribution of duration than the original mean. In fact the geometric mean is
usually close to the value of the median.

Thus, the geometric mean can be calculated by the following three steps:

1. take the logarithm of all data values;

2. calculate the sample mean of the log data values;

3. the geometric mean is the anti-logarithm of the sample mean found in step 2.

antilog of $\log_{10} x = 10^{\log_{10} x}$

antilog of $\ln x = e^{\ln x}$

The geometric mean is used only with data which are heavily positively skewed.
Examples of variables for which logarithms and geometric means are sometimes
useful include concentrations and bacterial counts.
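The three steps for the geometric mean can be followed literally in Python for the hospital-stay data. The sketch below uses natural logarithms, as in the worked example; the variable names are illustrative.

```python
import math
import statistics

# Days spent in hospital by the 17 subjects.
days = [3, 4, 4, 6, 8, 8, 8, 10, 10, 12, 14, 14, 17, 25, 27, 37, 42]

# Step 1: take the (natural) logarithm of all data values.
logs = [math.log(d) for d in days]

# Step 2: calculate the sample mean of the log data values.
mean_log = statistics.mean(logs)

# Step 3: the geometric mean is the anti-logarithm of that mean.
geometric_mean = math.exp(mean_log)

arithmetic_mean = statistics.mean(days)
median = statistics.median(days)

print(f"arithmetic mean = {arithmetic_mean:.2f}")
print(f"median          = {median}")
print(f"mean log        = {mean_log:.2f}")
print(f"geometric mean  = {geometric_mean:.2f}")
```

Using base-10 logarithms in step 1 and `10 ** mean_log` in step 3 would give the same geometric mean; Python 3.8 and later also provide `statistics.geometric_mean` as a shortcut.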

MEASURES OF DISPERSION or VARIATION, i.e., STATISTICS USED
TO DESCRIBE THE SPREAD OF A DISTRIBUTION

Range: the interval between the lowest and highest value in the distribution.

Percentile ranges
A percentile range is an interval between two specified percentile points, e.g. the
interquartile range includes the values between the 25th (Q25) and 75th (Q75)
percentiles; the median is equivalent to the 50th (Q50) percentile point, since
(n+1)/2 = (n+1) × 50/100. For a distribution with a large number of observations
the quartiles are most easily found from the cumulative relative frequency by
reading off the values that correspond to 25%, 50%, and 75% (see the cumulative
distribution plot below, which illustrates the DMF score data from the frequency
distribution example earlier in these notes).

[Cumulative distribution plot of the DMF scores: cumulative relative frequency (%) against DMF score (0 to 10).]

A box plot (or box and whiskers plot) displays the 1st and 3rd quartiles as a box with
the median at the centre. Lines are drawn from the box to the extreme
observations (although values lying too far from the box are sometimes identified
separately as outliers). Like the histogram, the box plot is useful for checking the
symmetry of a distribution (i.e., whether it has the same shape on either side of the
median).

Example: Using the data from the previous example, i.e., the number of days spent
in hospital by 17 subjects following an operation:

3, 4, 4, 6, 8, 8, 8, 10, 10, 12, 14, 14, 17, 25, 27, 37, 42

The smallest value = 3

The lower quartile, Q25 = (n+1) × 0.25 = 4.5th value of the ordered observations,
i.e., the average of the 4th and 5th values = (6 + 8)/2 = 7
The median, Q50 = (n+1) × 0.50 = (17+1)/2 = 9th value of the ordered observations = 10
The upper quartile, Q75 = (n+1) × 0.75 = 13.5th value of the ordered
observations, i.e., the average of the 13th and 14th values = (17 + 25)/2 = 21
The largest value = 42
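The five-number summary above can be reproduced with a small helper that applies the (n+1) × p rank rule. This is a sketch under one assumption not stated in the notes: when the rank is not a whole number it interpolates linearly between the two neighbouring values, which reduces to their average when the rank ends in .5, as in this example. The function name `percentile_point` is hypothetical.

```python
import math

# Days spent in hospital by the 17 subjects.
days = [3, 4, 4, 6, 8, 8, 8, 10, 10, 12, 14, 14, 17, 25, 27, 37, 42]


def percentile_point(values, p):
    """Value at rank (n + 1) * p of the ordered observations (p between 0 and 1)."""
    ordered = sorted(values)
    n = len(ordered)
    rank = (n + 1) * p            # 1-based rank, e.g. 4.5 for Q25 when n = 17
    rank = min(max(rank, 1), n)   # keep the rank inside 1..n
    lower = math.floor(rank)
    fraction = rank - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Interpolate between the two neighbouring ordered values.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


q25 = percentile_point(days, 0.25)
q50 = percentile_point(days, 0.50)
q75 = percentile_point(days, 0.75)

print("min =", min(days), "Q25 =", q25, "Q50 =", q50, "Q75 =", q75, "max =", max(days))
print("interquartile range =", q75 - q25)
```

Statistical packages use several slightly different quartile definitions, so their output may differ a little from the hand calculation. In Python 3.8 and later, `statistics.quantiles(days, n=4, method="exclusive")` follows the same (n+1) × p rule.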

[Box plot of days in hospital on an axis from 0 to 50 days, marking Min, Q25, Q50 (10), Q75 and Max.]

Another important use of the box plot is for comparing distributions. For example,
the figure below displays box plots of changes in number of decayed, missing, or
filled surfaces (DMFS) in a clinical trial to compare caries-preventive effects of
various chewing gums. The plots are on the same scale, allowing easy comparison
between treatment groups.

Fig. Box plots of change in DMFS by treatment group in a chewing gum study.

Sample Variance: the average squared ‘distance’/difference of each observation
from the mean. By squaring the difference, all terms will be positive (see your
algebra notes).

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

$x_i$ = the value for the ith subject in the sample
$\bar{x}$ = the sample mean
$\sum$ = the Greek letter sigma, indicating a summation over all $x_i$’s
n = the sample size

The quantity n-1 is called the number of degrees of freedom of the variance. The
formula uses n-1 because the variance of the sample calculated in this way better
approximates the variance of its target population.

Sample Standard Deviation (abbreviated SD, s.d., or s): the square root of the
variance. The standard deviation will have the same unit of measurement as the
original data. The smaller the standard deviation, the less each score varies from
the mean. The larger the spread of scores, the larger the SD becomes.
Algebraically, the formula looks like this:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Example 1: The number of teeth in a sample of 9 babies aged 9 months are:

5, 6, 6, 7, 8, 8, 10, 10, 12

The range is 12 - 5 = 7

The sample mean $\bar{x}$ = (5 + 6 + 6 + 7 + 8 + 8 + 10 + 10 + 12)/9 = 72/9 = 8 teeth

N.B.: Many calculators have built-in functions for the mean and standard
deviation. The keys are commonly labeled $\bar{x}$ and $\sigma_{n-1}$, respectively, where $\sigma$ is
the lower case Greek letter sigma.

Calculation of the Variance and Standard Deviation

Observation   Value ($x_i$)   $x_i - \bar{x}$   $(x_i - \bar{x})^2$

1              5              -3                 9
2              6              -2                 4
3              6              -2                 4
4              7              -1                 1
5              8               0                 0
6              8               0                 0
7             10               2                 4
8             10               2                 4
9             12               4                16
Sum           72               0                42

The variance is 42/8 = 5.25 teeth²
The standard deviation = square root of the variance = √5.25 = 2.29 teeth
Alternative formula for the variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$
Example 2: Variance calculation using the second formula

$x_i$     $x_i^2$
5          25
6          36
6          36
7          49
8          64
8          64
10        100
10        100
12        144
Sum 72    618

$\sum x_i^2$ = 618
$(\sum x_i)^2 / n$ = 72²/9 = 576
Thus, variance = (618 − 576)/(9 − 1) = 42/8 = 5.25 teeth²
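Both variance formulas can be checked against each other in Python for the sample of 9 babies. The sketch below is illustrative; `statistics.variance` and `statistics.stdev` are the standard-library functions that use the n − 1 divisor.

```python
import math
import statistics

# Number of teeth in the sample of 9 babies.
teeth = [5, 6, 6, 7, 8, 8, 10, 10, 12]
n = len(teeth)
mean = sum(teeth) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
var_definition = sum((x - mean) ** 2 for x in teeth) / (n - 1)

# Alternative formula: (sum of squares - (sum)^2 / n) / (n - 1).
var_shortcut = (sum(x * x for x in teeth) - sum(teeth) ** 2 / n) / (n - 1)

sd = math.sqrt(var_definition)

print(f"variance (definition)  = {var_definition}")
print(f"variance (alternative) = {var_shortcut}")
print(f"standard deviation     = {sd:.2f}")
print(f"library check          = {statistics.variance(teeth)}, {statistics.stdev(teeth):.2f}")
```

The alternative formula needs only the running totals of x and x², which is why it was convenient for hand calculation.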

Properties of the standard deviation and inter-quartile range:

1. If the sample is translated by a constant c, i.e., c is added to all data values,
then the standard deviation and inter-quartile range are not changed.

Example: If a measuring device is not calibrated correctly then the
variability of the values will be unaffected, unlike the mean.

2. If the sample is rescaled by a constant c, i.e., all data values are multiplied
by c, then the standard deviation and inter-quartile range are also multiplied
by c.

Example: In mg/dl, serum glucose has a mean of 110.1 mg/dl and a standard
deviation of 30.0 mg/dl; in g/l, the mean is 1.101 g/l with a standard
deviation of 0.3 g/l.

Comparison of measures of spread:

1. The inter-quartile range is much less influenced by unusually large or small
observations than the standard deviation. In cases when such observations
can arise, summarizing the data using the median and inter-quartile range is
preferable to the mean and standard deviation.

2. For some measurements the coefficient of variation (the standard deviation
divided by the mean) is the most meaningful summary measure. The
coefficient of variation is denoted by CV, and so we have $CV = s/\bar{x}$. The CV
expresses the standard deviation as a percentage of the mean, i.e.,
$CV = 100 \times (s/\bar{x})\%$. The coefficient of variation is a unitless measure of
spread. For example, for serum glucose, we have
CV = 30.0 mg/dl ÷ 110.1 mg/dl = 0.27 or 27%. It is often used along with the
geometric mean for data which are heavily positively skewed.

3. The range (largest value minus smallest) of a set of data is not a good
indicator of spread because...

(a) it is highly sensitive to extreme values,

(b) it does not make efficient use of the data, and
(c) ranges based on different numbers of values cannot be meaningfully
compared.
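The translation and rescaling properties of the standard deviation, and the fact that the coefficient of variation is unitless, can be illustrated numerically. The sketch below reuses the sample of 9 babies from Example 1; the constant `c = 3` is an arbitrary value chosen for the illustration, and the helper `coefficient_of_variation` is hypothetical.

```python
import statistics

# Number of teeth in the sample of 9 babies (Example 1).
teeth = [5, 6, 6, 7, 8, 8, 10, 10, 12]
c = 3  # arbitrary constant, chosen only for this illustration

shifted = [x + c for x in teeth]  # c added to every value
scaled = [x * c for x in teeth]   # every value multiplied by c

sd = statistics.stdev(teeth)
sd_shifted = statistics.stdev(shifted)  # unchanged by the shift
sd_scaled = statistics.stdev(scaled)    # multiplied by c


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


cv = coefficient_of_variation(teeth)
cv_scaled = coefficient_of_variation(scaled)  # rescaling cancels out

# Serum glucose example from the text, using the summary statistics only.
glucose_cv = 30.0 / 110.1

print(f"SD = {sd:.2f}, shifted SD = {sd_shifted:.2f}, scaled SD = {sd_scaled:.2f}")
print(f"CV = {cv:.3f}, CV after rescaling = {cv_scaled:.3f}")
print(f"serum glucose CV = {glucose_cv:.2f}")
```

Note that the CV is unchanged by rescaling but not by translation, because adding a constant changes the mean while leaving the standard deviation alone.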

Always report the number of observations (n) on which the summary is based.
For binary responses (e.g. A, B) report the percentage of As or Bs but not both.

If the central value of a quantitative distribution is measured using the median
(as in positively skewed distributions), give the lower and upper quartiles as
well.

If the central value of a quantitative distribution is measured using the mean,
give the standard deviation as well.

The table below shows the distribution of the number of previous pregnancies of a
group of women aged 30-34 taking part in a study of the association between
periodontal disease in expecting women and reproductive outcomes. Eighteen of
the 100 women had no previous pregnancies, 27 had one, 31 had two, 19 had three,
and five had four previous pregnancies.

No. of previous pregnancies    0    1    2    3    4    Total
No. of women                  18   27   31   19    5    100

As, for example, adding 2 thirty-one times is equivalent to adding the product (2 ×
31), the total number of previous pregnancies is calculated by
$\sum x$ = (0 × 18) + (1 × 27) + (2 × 31) + (3 × 19) + (4 × 5)
= 0 + 27 + 62 + 57 + 20 = 166

The average number of previous pregnancies is, therefore:

$\bar{x}$ = 166/100 = 1.66

In the same way:

$\sum x^2$ = (0² × 18) + (1² × 27) + (2² × 31) + (3² × 19) + (4² × 5)
= 0 + 27 + 124 + 171 + 80 = 402

The standard deviation is, therefore:

$$s = \sqrt{\frac{402 - 166^2/100}{99}} = \sqrt{\frac{126.44}{99}} = 1.13$$
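The same calculation from a frequency table can be sketched in Python, weighting each value by its frequency. The variable names are illustrative; the numbers are those of the pregnancy table.

```python
import math

# Frequency table: number of previous pregnancies and number of women.
pregnancies = [0, 1, 2, 3, 4]
women = [18, 27, 31, 19, 5]

n = sum(women)  # sample size

# Each value is counted as many times as its frequency.
sum_x = sum(x * f for x, f in zip(pregnancies, women))
sum_x2 = sum(x ** 2 * f for x, f in zip(pregnancies, women))

mean = sum_x / n
sd = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(f"n = {n}, sum of x = {sum_x}, sum of x squared = {sum_x2}")
print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")
```

For grouped data, replacing `pregnancies` by the mid-points of the groups gives the approximate mean and standard deviation.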

If a variable has been grouped when constructing a frequency distribution, its mean
and standard deviation should be calculated using the original values, not the
frequency distribution. There are occasions, however, when only the frequency
distribution is available. In such a case, approximate values for the mean and
standard deviation can be calculated by using the values of the mid-points of the
groups and proceeding as above.

SUGGESTED READINGS

Norman GR, Streiner DL. Biostatistics: The bare essentials (2nd ed.). Hamilton,
ON: B.C. Decker Inc., 2000. Chapters 1, 2, and 3.

Weintraub JA, Douglass CW, Gillings DB. Biostats: Data analysis for dental
health care professionals (2nd ed.). Research Triangle Park, NC: CAVCO Inc.,
1985. Chapters 4 and 5.

Kim JS, Dailey RJ. Biostatistics for oral healthcare (1st ed.). Ames, IA: Blackwell
Pub. Professional, 2008. Chapters 2 and 3.

EXERCISES

1. The data below show the arch lengths of the maxilla of 110 boys aged 6 years. Arch length
is defined as the perpendicular distance between lines tangent to the labial surfaces of the
central incisors and the distal portions of the second primary molars or their permanent
successors. The measurements are in mm.
25.5 31.0 30.0 33.4 30.6 32.0 32.6 30.3 31.6 30.7
31.1 28.0 32.7 32.1 30.1 30.5 29.4 27.6 34.7 29.5
30.6 32.8 32.2 30.2 33.2 30.4 31.0 29.6 31.3 28.9
31.2 31.8 26.4 32.7 30.5 32.3 28.0 31.9 31.2 33.4
27.5 32.9 30.9 31.7 27.3 34.3 28.1 33.6 27.7 30.8
31.3 30.8 32.8 28.3 31.6 28.2 32.4 27.8 33.5 28.9
30.7 32.9 27.4 31.5 29.1 29.7 29.3 30.0 32.5 29.4
29.1 30.5 29.3 30.4 29.2 31.5 26.6 30.2 28.8 29.7
27.2 29.0 33.5 28.4 29.9 30.3 29.6 31.5 28.7 31.4
29.2 28.5 31.4 34.6 30.9 35.5 28.6 29.5 29.5 29.8
28.2 30.5 29.8 31.3 32.7 30.5 33.4 28.2 27.6 32.5

a) Construct a frequency distribution using 1 mm intervals.

b) Calculate relative frequencies (%) in each 1 mm interval.
c) Calculate cumulative relative frequencies (%) for each 1 mm interval.

2. In a caries prevention trial in the elderly, the 150 subjects had the following numbers of teeth
present:
40 subjects had fewer than 5 teeth
60 subjects had 5– teeth
30 subjects had 10– teeth
10 subjects had 15– teeth
10 subjects had more than 25 teeth

Draw a histogram to illustrate the data using 5 teeth as the unit class interval. Describe the
shape of the distribution.

The best measure of the typical number of teeth present for these subjects is

3. The following data represent the number of CNS symptoms reported by 25 dentists who used
a squeeze cloth technique for mercury-rich amalgam mixtures:
1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 8, 9, 9, 10.
Calculate the mean, median and mode as well as the standard deviation for the number of
CNS symptoms reported.

4. The following figures are the maximum pocket depths recorded in 12 women referred for
dental treatment during pregnancy. The depths are in mm.

1.5 1.5 3.0 5.0 9.0 0.5 0.5 5.0 4.0 4.0 6.5 3.0

a) What is the range of recorded pocket depths?

b) Calculate the mean, variance and standard deviation of these maximum pocket depths.
c) Construct a box plot representing the five number summary of the distribution, i.e., the
smallest value, the lower quartile (Q25), the median (Q50), the upper quartile (Q75) and the
largest value.
d) Do the data appear to be skewed?
e) Do the data contain any outlying observations?
f) What is the geometric mean of this distribution?

5. An absent-minded instructor calculated the following statistics for an examination: mean=50,
range=50, number of cases=99, minimum=20, and maximum=70. She then found an
additional examination with a score of 50. Recalculate the statistics, including the additional
exam score.

6. A sample of pocket depths has sample mean 4 mm and sample standard deviation 1 mm. If
the units of the measurements are changed to cm, the new sample mean is

The new sample standard deviation is

7. In the above question, what is the coefficient of variation of the pocket depths?

To answer this question, did you use mm or cm (or does it matter)?

ANSWERS TO EXERCISES

1.
Arch Length   Arch Length   Frequency   Relative        Cumulative
(mm)          (Class)                   Frequency (%)   Relative Freq. (%)
25-25.9        1              1           0.9              0.9
26-26.9        2              2           1.8              2.7
27-27.9        3              8           7.3             10.0
28-28.9        4             14          12.7             22.7
29-29.9        5             19          17.3             40.0
30-30.9        6             22          20.0             60.0
31-31.9        7             18          16.4             76.4
32-32.9        8             15          13.6             90.0
33-33.9        9              7           6.4             96.4
34-34.9       10              3           2.7             99.1
35-35.9       11              1           0.9            100.0

2.
Teeth    Class interval   Frequency   Relative        5-teeth Adjusted
                                      Frequency (%)   Relative Freq. (%)
0-4       5                40          26.7            26.7
5-9       5                60          40.0            40.0
10-14     5                30          20.0            20.0
15-24    10                10           6.7            (6.7/10) × 5 = 3.35
25-32     8                10           6.7            25-29: (6.7/8) × 5 = 4.2
                                                       30-32: (6.7/8) × 3 = 2.5

The best measure of the typical number of teeth present for these subjects is the median.

[Histogram: % subjects (y-axis) against number of teeth (x-axis); bar heights 26.7 (0-4), 40 (5-9), 20 (10-14), 3.35 (15-19), 3.35 (20-24), 4.2 (25-29) and 2.5 (30-32). Positively skewed distribution.]

4. a) Range = 8.5 mm
b) Mean = 3.625 mm
Variance = 6.415 mm²
Standard deviation = 2.533 mm
c)
Descriptives

POCKET                                               Statistic   Std. Error
Mean                                                 3.625       .731
95% Confidence Interval for Mean    Lower Bound      2.016
                                    Upper Bound      5.234
5% Trimmed Mean                                      3.500
Median                                               3.500
Variance                                             6.415
Std. Deviation                                       2.533
Minimum                                              .5
Maximum                                              9.0
Range                                                8.5
Interquartile Range                                  3.500
Skewness                                             .696        .637
Kurtosis                                             .340        1.232

[Box plot of POCKET (maximum pocket depth), N = 12.]

d) The data are slightly positively skewed.

e) The expecting mother with 9.0 mm of maximum pocket depth appears to be an outlier.
However, the sample size is too small to make conclusions.
f) The mean log maximum pocket depth = 0.9735 (you should have logarithmically
transformed all the data first). The geometric mean is the anti-log of this mean = 2.65
mm.

5. Mean=50

Since $\bar{x} = \frac{\sum x_i}{n}$, we have $50 = \frac{\sum x_i}{99}$, so

$\sum x_i = 50 \times 99 = 4950$

Recalculated $\bar{x} = \frac{4950 + 50}{100} = 50$

Range=50
Number of cases=100
Minimum=20
Maximum=70

8. A sample of pocket depths has sample mean 4 mm and sample standard deviation 1 mm. If
the units of the measurements are changed to cm, the new sample mean is
0.4 cm

The new sample standard deviation is

0.1 cm

9. In the above question, what is the coefficient of variation of the pocket depths?
$CV = s / \bar{x} = 1 / 4 = 0.25$

To answer this question, did you use mm or cm (or does it matter)?

It does not matter because CV is unitless.
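
Answers 8 and 9 can be illustrated numerically. The sketch below (Python, not part of the original notes) uses a small hypothetical sample chosen only because it has mean 4 mm and standard deviation 1 mm; it is not data from the course.

```python
import statistics

# Hypothetical pocket depths in mm, chosen to give mean 4 and SD 1.
depths_mm = [3.0, 4.0, 5.0]

# Changing units from mm to cm multiplies every value by 1/10.
depths_cm = [d / 10 for d in depths_mm]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)

print(statistics.mean(depths_mm), statistics.stdev(depths_mm))
print(statistics.mean(depths_cm), statistics.stdev(depths_cm))
print(coefficient_of_variation(depths_mm), coefficient_of_variation(depths_cm))
```

Both the mean and the standard deviation are multiplied by the same factor, so the factor cancels in their ratio and the coefficient of variation is the same in either unit.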
