Statistics and Data Analysis for
Nursing Research
Second Edition
CHAPTER 3
Central Tendency, Variability, and Relative Standing
Statistics and Data Analysis for Nursing Research, Second Edition Copyright ©2010 by Pearson Education, Inc.
Denise F. Polit Upper Saddle River, New Jersey 07458
All rights reserved.
Characteristics of a Data Distribution
Shape (Chapter 2)
Central tendency
Variability
Both central tendency and variability can be
expressed by indexes that are descriptive
statistics
Central Tendency
Indexes of central tendency provide a single
number to characterize a distribution
Measures of central tendency come from the
center of the distribution of data values,
indicating what is “typical,” and where data
values tend to cluster
Popularly called an “average”
Central Tendency Indexes
Three alternative indexes:
The mode
The median
The mean
The Mode
The mode is the score value with the highest
frequency; the most “popular” score
Age: 26 27 27 28 29 30 31
Mode = 27
[Figure: histogram of AGE (26.0 to 31.0); Mean = 28.3, Std. Dev = 1.80, N = 7]
The Mode: Advantages
Can be used with data measured on any
measurement level (including nominal level)
Easy to “compute”
Reflects an actual value in the distribution,
so it is easy to understand
The Mode: Disadvantages
Ignores most information in the distribution
Tends to be unstable (i.e., value varies a lot
from one sample to the next)
Some distributions may not have a mode (e.g.,
10, 10, 11, 11, 12, 12)
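The mode can be checked with Python's standard-library `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. A minimal sketch using the ages from the mode slide and the tied example above:

```python
from statistics import multimode

# Ages from the mode slide: 27 occurs twice, every other age once.
ages = [26, 27, 27, 28, 29, 30, 31]
print(multimode(ages))  # [27]

# Every value occurs equally often, so all values tie and there is
# no single most "popular" score.
tied = [10, 10, 11, 11, 12, 12]
print(multimode(tied))  # [10, 11, 12]
```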
The Median
The median is the score that divides the
distribution into two equal halves: 50% of
scores are below the median, 50% above
Age: 26 27 27 28 29 30 31
Median (Mdn) = 28
[Figure: histogram of AGE (26.0 to 31.0); Mean = 28.3, Std. Dev = 1.80, N = 7]
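As a sketch, `statistics.median` from Python's standard library reproduces the median above. The second call, on the first six ages only, is an illustration added here: with an even N the median is the average of the two middle values, so it need not be an actual data value.

```python
from statistics import median

ages = [26, 27, 27, 28, 29, 30, 31]
print(median(ages))  # 28 (the 4th of the 7 sorted values)

# Illustration only: drop the last age to get an even N.
first_six = ages[:6]
print(median(first_six))  # 27.5, the average of 27 and 28
```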
The Median: Advantages
Not influenced by outliers
Particularly good index of what is “typical”
when distribution is skewed
Easy to “compute”
Appropriate when data are ordinal level
The Median: Disadvantages
Does not take actual data values into
account—only an index of position
Value of median not necessarily an actual
data value, so it is more difficult to
understand than mode
The Mean
The mean is the arithmetic average
Data values are summed and divided by N
Age: 26 27 27 28 29 30 31
Mean = 198 ÷ 7 = 28.3
[Figure: histogram of AGE (26.0 to 31.0); Mean = 28.3, Std. Dev = 1.80, N = 7]
The Mean (cont’d)
Most frequently used measure of central
tendency—usually preferred for interval- and
ratio-level data
Equation:
M = ΣX ÷ N
Where:
M = sample mean
Σ = the sum of
X = actual data values
N = number of people
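The equation M = ΣX ÷ N translates directly into code. A minimal sketch with the ages from the mean slide; the function name `sample_mean` is illustrative, not from the text.

```python
ages = [26, 27, 27, 28, 29, 30, 31]  # ages from the mean slide

def sample_mean(values):
    """M = ΣX ÷ N: sum the data values, then divide by the number of cases."""
    return sum(values) / len(values)

total, n = sum(ages), len(ages)
m = sample_mean(ages)
print(total, n)     # 198 7
print(round(m, 1))  # 28.3
```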
The Mean: Advantages
The balance point in the distribution:
Sum of deviations above the mean always
exactly balances those below it
Does not ignore any information
The most stable index of central tendency
Many inferential statistics are based on the
mean
The Mean: Disadvantages
Sensitive to outliers
Gives a distorted view of what is “typical”
when data are skewed
Value of mean is often not an actual data
value
The Mean: Symbols
Sample means:
In reports, usually symbolized as M
In statistical formulas, usually symbolized as
X̄ (pronounced “X bar”)
Population means:
The Greek letter μ (mu)
Central Tendency in Normal
Distributions
In a normal
distribution, all
three indexes
coincide
Central Tendency in Skewed
Distributions
In a skewed distribution, the mean is pulled
“off center” in the direction of the skew
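To see the mean pulled toward the tail, here is a minimal sketch with hypothetical length-of-stay data (days), invented for illustration and not taken from the text. Most stays are short and two are long, so the distribution is positively skewed.

```python
from statistics import mean, median, multimode

# Hypothetical hospital length-of-stay data (days), for illustration only.
stay = [2, 2, 3, 3, 3, 4, 4, 5, 21, 30]

print(multimode(stay))  # [3]
print(median(stay))     # 3.5
print(mean(stay))       # 7.7, pulled toward the long right tail
```

In this positively skewed example the three indexes line up as mode < median < mean.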
Variability
Variability concerns how spread out or
dispersed data values in a distribution are
Two distributions with the same mean could
have different dispersion
Variability (cont’d)
High variability: A
heterogeneous
distribution (A)
Low variability: A
homogeneous
distribution (B)
Indexes of Variability
Range
Interquartile range
Standard deviation
Variance
The Range
Range: The difference between the highest
and lowest value in the distribution
Weights (pounds):
110 120 130 140 150 150 160 170 180 190
The range here is 80 (190 – 110)
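The range is a one-line computation. A minimal Python sketch with the weights above. The second list is hypothetical: the heaviest person is recorded as 290 pounds, which shows how a single outlier inflates the range.

```python
weights = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]

# Range: highest value minus lowest value.
weight_range = max(weights) - min(weights)
print(weight_range)  # 80

# Hypothetical: the last person's weight recorded as 290 instead of 190.
with_outlier = weights[:-1] + [290]
print(max(with_outlier) - min(with_outlier))  # 180
```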
The Range: Advantages
Easy to compute
Readily understood
Communicates information of interest to
readers of a report
The Range: Disadvantages
Depends on only two scores, so it does not take all
information into account
Sensitive to outliers
Tends to be unstable—fluctuates from sample
to sample
Influenced by sample size (tends to be larger in larger samples)
The Interquartile Range
Interquartile range (IQR): Based on quartiles
Lower quartile (Q1): Point below which 25% of
scores lie
Upper quartile (Q3): Point below which 75% of
scores lie
IQR = Q3 - Q1
IQR is the range of scores within which the middle
50% of scores lie
The Interquartile Range (cont’d)
IQR Example: Weights (pounds):
110 120 130 140 150 150 160 170 180 190
The IQR is 45.0 (172.5 – 127.5)
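Quartile definitions differ across software. As a sketch, Python's `statistics.quantiles` (Python 3.8+) with its default `method="exclusive"` interpolates at position k(N + 1)/4 in the sorted data, which reproduces the 127.5 and 172.5 above. Other methods (e.g., `method="inclusive"`) give somewhat different cut points for the same data.

```python
from statistics import quantiles

weights = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]

# Quartile k sits at position k * (N + 1) / 4 (e.g., 2.75 for Q1),
# interpolating between the neighboring sorted values.
q1, q2, q3 = quantiles(weights, n=4)
print(q1, q2, q3)  # 127.5 150.0 172.5
print(q3 - q1)     # 45.0
```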
The Interquartile Range: Advantages
Reduces influence of outliers and extreme
scores in expressing variability
Uses more information than the range
Important in evaluating outliers
Appropriate as index of variability with
ordinal measures
The Interquartile Range: Disadvantages
Is not particularly easy to compute
Is not well understood
Does not take all values into account
The Standard Deviation
Standard deviation (SD): An index that conveys
how much, on average, scores in a distribution
vary
SDs are based on deviation scores (x),
calculated by subtracting the mean from each
person’s original score
x=X-M
The Standard Deviation (cont’d)
Standard deviation example: Weights (pounds):
110 120 130 140 150 150 160 170 180 190
In this distribution, M = 150
For the first person, x = -40
For the last person, x = +40
Standard Deviation (cont’d)
The sum of all deviation scores in a
distribution always equals 0
Thus, to compute SDs, deviation scores
must be squared (x²) before being summed
SD equation:
SD = √(Σx² ÷ (N − 1))
Standard Deviation (cont’d)
Weights (pounds):
110 120 130 140 150 150 160 170 180 190
Deviation scores (x) for M = 150:
-40 -30 -20 -10 0 0 10 20 30 40
Squared deviation scores (x²):
1600 900 400 100 0 0 100 400 900 1600
Sum of squared deviation scores:
1600+900+400+100+0+0+100+400+900+1600 = 6000
SD = √(6000 ÷ (N − 1)) = √(6000 ÷ 9) = √666.67 = 25.82
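The same steps, written out in Python as a minimal sketch: compute deviation scores, square them, sum them, divide by N − 1, and take the square root. The standard-library `statistics.stdev` also divides by N − 1, so it serves as a cross-check.

```python
import math
from statistics import stdev

weights = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]
n = len(weights)
m = sum(weights) / n                     # M = 150.0

deviations = [x - m for x in weights]    # x = X - M
squared = [d ** 2 for d in deviations]   # x squared
ss = sum(squared)                        # sum of squared deviations

sd = math.sqrt(ss / (n - 1))             # divide by N - 1, then take the root
print(sum(deviations))                   # 0.0: deviation scores sum to zero
print(ss, round(sd, 2))                  # 6000.0 25.82
print(math.isclose(sd, stdev(weights)))  # True
```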
Standard Deviation Interpretation
Provides a “standard”—the SD indicates the
average amount of deviation of scores from
the mean
Tells you how wrong, on average, the mean
is as a summary of the overall distribution
An SD provides valuable information when
the distribution is normal:
Virtually all cases in a normal distribution
lie within three SDs above or below the mean
Standard Deviation Interpretation
(cont’d)
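In a normal distribution, a fixed percentage
of cases lie within certain distances from the
mean (on each side of the mean):
About 34.1% between the mean and 1 SD
About 13.6% between 1 and 2 SDs
About 2.3% beyond 2 SDs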
SDs and Individual Scores
A person who scores one SD below the mean
has a higher score than 16% of the cases
(2.3% + 13.6%)
A person who scores one SD above the mean
has a higher score than 84% of the cases
(50.0% + 34.1%)
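These percentages come from the normal curve and can be reproduced with `statistics.NormalDist` (Python 3.8+). A minimal sketch; values are rounded to one decimal place, so 15.9 and 84.1 correspond to the rounded 16% and 84% above.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Percentage of cases below a score 1 SD below / 1 SD above the mean.
print(round(std_normal.cdf(-1.0) * 100, 1))  # 15.9
print(round(std_normal.cdf(1.0) * 100, 1))   # 84.1

# Percentage of cases within 1, 2, and 3 SDs of the mean.
for k in (1, 2, 3):
    within = std_normal.cdf(k) - std_normal.cdf(-k)
    print(k, round(within * 100, 1))  # 1 68.3, then 2 95.4, then 3 99.7
```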
Standard Deviation: Advantages
Takes all data into account in describing
variability
Is more stable as a measure of variability than
the range or IQR
Lends itself to computation of other measures
often used in inferential statistics
Is helpful in interpreting individual scores
when data are distributed approximately
normally
Standard Deviation: Disadvantages
Can be influenced by extreme scores
Not as “intuitive” or as easy to interpret as
the range
Variance
An important variability concept in inferential
statistics, but not used descriptively
The variance = SD²
In the earlier example, SD² = 25.82² = 666.67
Not easily interpreted because it is not in
units of original data—it is in units squared
(here, pounds squared)
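A short Python check, as a sketch: `statistics.variance` uses the same N − 1 denominator as the SD formula, and its result equals the SD squared.

```python
from statistics import stdev, variance

weights = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]

var = variance(weights)               # sample variance, N - 1 denominator
print(round(var, 2))                  # 666.67 (pounds squared)
print(round(stdev(weights) ** 2, 2))  # 666.67: the SD squared
```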
Measurement Scales and Descriptive
Statistics
Relative Standing
Central tendency and variability indexes
describe a distribution
There are also descriptive statistics to
describe individual scores—i.e., their relative
standing or position in a distribution:
Percentile ranks
Standard scores
Percentiles
A percentile is one one-hundredth of a
distribution
Quartiles divide a distribution into quarters
Deciles divide a distribution into tenths
Each percentile, quartile, etc. can be
determined in relation to a score in a
distribution
Percentile Rank
A percentile rank is the location of a given
score in the distribution—it communicates
what percentage of cases fall at or below
that value
Score → What percentile rank?
Percentile → What score?
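A minimal sketch of the score → percentile-rank direction, using the definition given here (percentage of cases at or below a score) and the weights from the earlier slides. The helper `percentile_rank` is hypothetical; statistical packages may handle tied scores differently (e.g., counting only half of the cases tied at the score).

```python
weights = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]

def percentile_rank(values, score):
    """Percentage of cases at or below `score`."""
    at_or_below = sum(1 for v in values if v <= score)
    return 100 * at_or_below / len(values)

print(percentile_rank(weights, 150))  # 60.0: 6 of the 10 weights are <= 150
print(percentile_rank(weights, 110))  # 10.0
```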
Percentiles and Outliers
Outliers are often defined in relation to
percentiles
There are:
Mild outliers
Extreme outliers
Outliers: Formal Definition
A mild outlier is a score that lies more than 1.5
but no more than 3.0 times the IQR below Q1
or above Q3
An extreme outlier is a score that lies more
than 3.0 times the IQR below Q1 or above Q3
Box Plots
A box plot (or box-and-whiskers plot) is a
graphic depiction of a distribution that
shows the median, the IQR, and the outer
limits of values not considered outliers
Outlying cases can be shown on the box plot,
with identifying information (e.g., an ID
number)
Box Plots (cont’d)
Bottom of “box” shows Q1
Top of “box” shows Q3
Horizontal line in box shows median
“Whiskers” show outer limits of what is
NOT an outlier
In SPSS, a circle (O) marks the value and ID of a
mild outlier
An asterisk (*) marks an extreme outlier
Box Plot Illustration
Textbook Heart Rate Data:
Q1 = 62
Q2 = 66 = Median
Q3 = 68
“Whiskers” limits: 53, 77
Mild outliers:
50 (#106), 45 (#105)
Extreme outliers:
40 (#104), 90 (#103),
95 (#102), 100 (#101)
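The 1.5 × IQR and 3.0 × IQR rules from the formal definition can be checked against this illustration. A minimal Python sketch using only the quartiles and flagged values listed above; the helper `classify` is hypothetical, and treating a value that falls exactly on a fence as the less extreme category is an assumption.

```python
q1, q3 = 62, 68   # heart rate quartiles from the illustration
iqr = q3 - q1     # 6

inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # 53.0, 77.0
outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr  # 44.0, 86.0

def classify(score):
    """Label a score with the 1.5 x IQR (mild) and 3.0 x IQR (extreme) rules."""
    if score < outer_low or score > outer_high:
        return "extreme outlier"
    if score < inner_low or score > inner_high:
        return "mild outlier"
    return "not an outlier"

flagged = [(106, 50), (105, 45), (104, 40), (103, 90), (102, 95), (101, 100)]
for case_id, heart_rate in flagged:
    print(case_id, heart_rate, classify(heart_rate))
```

The inner fences (53 and 77) match the whisker limits shown, and the labels agree with the mild and extreme outliers listed.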
Box Plots Versus Histograms
Outliers can be seen in histograms, but box
plots give more useful information about
degree of extremity and ID numbers
Standard Scores
Standard scores—another index of relative
standing helpful in interpreting raw scores
A standard score (also called a z score) is a
score expressed as its distance from the mean
in standard deviation units
Standard Scores (cont’d)
Standard score equation:
z = (X – M) ÷ SD
That is, the mean is subtracted from an
individual score, and the difference is divided by the SD
For example:
M = 100, SD = 25, X = 125, z = 1.0
M = 100, SD = 25, X = 50, z = -2.0
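The standard score equation as a small Python function (the name `z_score` is illustrative), reproducing the two examples above:

```python
def z_score(x, mean, sd):
    """z = (X - M) / SD: the score's distance from the mean in SD units."""
    return (x - mean) / sd

print(z_score(125, 100, 25))  # 1.0
print(z_score(50, 100, 25))   # -2.0
```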
Standard Scores (cont’d)
Standard scores have a mean of 0.0 and an SD of
1.0, but z scores can be transformed mathematically
to have any mean and SD
Most typical:
Mean = 500, SD = 100 (e.g., GRE, SAT)
Mean = 100, SD = 15 (e.g., IQ tests)
Mean = 50, SD = 10 (called T scores)
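Transforming a z score to a new scale multiplies it by the new SD and adds the new mean. A minimal sketch; `rescale` is a hypothetical helper, and the T-score line uses the conventional T-score scaling (mean 50, SD 10).

```python
def rescale(z, new_mean, new_sd):
    """Convert a z score to a scale with a chosen mean and SD."""
    return new_mean + new_sd * z

z = -2.0                     # e.g., X = 50 when M = 100 and SD = 25
print(rescale(z, 500, 100))  # 300.0 on a mean-500, SD-100 scale
print(rescale(z, 100, 15))   # 70.0 on an IQ-type scale
print(rescale(z, 50, 10))    # 30.0 as a T score
```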
Uses of Descriptive Statistics
Indexes of central tendency and variability
are used to:
Understand data, get a “big picture”
Evaluate outliers and the need for strategies to
address problems (e.g., using a trimmed mean,
which recalculates the mean after deleting a fixed
percentage of cases, such as 5%, from each end)
Describe research participants (e.g., their age,
education, length of illness)
Answer descriptive questions
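A minimal sketch of a trimmed mean, using hypothetical length-of-stay data invented for illustration. The helper `trimmed_mean` removes the given proportion of cases from each end of the sorted data. Because the count is rounded down, trimming 5% from a sample of only 10 cases removes nothing, so the example also trims 10%.

```python
import math

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the cases from EACH end of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = math.floor(len(data) * proportion)  # cases removed from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Hypothetical length-of-stay data (days), for illustration only.
stay = [2, 2, 3, 3, 3, 4, 4, 5, 21, 30]
print(sum(stay) / len(stay))     # 7.7   ordinary mean
print(trimmed_mean(stay, 0.05))  # 7.7   5% of 10 cases rounds down to 0
print(trimmed_mean(stay, 0.10))  # 5.625 one case dropped from each end
```

SciPy users can compare the result with `scipy.stats.trim_mean(a, proportiontocut)`, which also cuts the given proportion from each end.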
Descriptive Statistics in SPSS
Can be obtained through Analyze → Descriptive
Statistics, which offers three procedures
(each has slightly different options):
Frequencies → Statistics
Descriptives → Options
Explore → Statistics
Descriptive Statistics in SPSS
Frequencies
Percentile values
Central tendency
Dispersion (variability)
Skewness and Kurtosis
Descriptive Statistics in SPSS
Descriptives
Mean (no median)
Dispersion (variability)
Skewness and Kurtosis
No percentiles
BUT has good display options
Descriptive Statistics in SPSS
Descriptives (cont’d)
Another important
feature: The ability
to create standard
scores
Descriptive Statistics in SPSS Explore
Can request both
statistical description
and graphical
description (plots)
Select options with
pushbuttons
Descriptive Statistics in SPSS Explore
(cont’d)
Statistical options:
Full descriptive
statistics
Outliers
Percentiles
Descriptive Statistics in SPSS Explore
(cont’d)
Important graphic
option: Box-and-
whiskers plots