Comparing variables
Statistical Comparison Tests
What tests should be used?
Numerical data
Comparing independent groups
Comparing groups: numerical data, independent / unrelated groups

             Normally distributed    Not Normally distributed
2 groups     Independent t-test      Mann-Whitney U-test
≥ 3 groups   One-factor ANOVA        Kruskal-Wallis test
Example
• Is the average age of patients who receive
best supportive care significantly different
from chemotherapy patients?
• Or, could any difference seen be explained by
chance?
Comparing 2 independent groups
[Figure: histograms of age at diagnosis for each group; x-axis: Age at Diagnosis (0 to 100), y-axis: Frequency]

                      Chemotherapy patients   Best supportive care patients
Mean                  66.7                    75.2
Median                67.0                    76.0
Standard Deviation    9.2                     9.4
Interquartile Range   12.0                    12.0
Null hypothesis (H0):
Average age is the same in the two groups
Alternative hypothesis (H1):
Average age is different
If p > 0.05 we don’t reject the null hypothesis
If p < 0.05 we reject the null hypothesis: there is a
statistically significant difference between the
groups
Example
Is the average age of patients who receive best
supportive care significantly different from
chemotherapy patients?
Chemotherapy patients: mean age = 66.7
Best Supportive care patients: mean age = 75.2
T-test; p=0.02
Numerical data
Comparing dependent groups
Dependent/related groups?
= paired data
• Repeat data on same group of patients
Eg. before and after some intervention
• Patient groups which have been one-to-one
matched for characteristics such as age, sex,
stage of cancer.
Comparing groups: numerical data, dependent / related groups

             Normally distributed      Not Normally distributed
2 groups     Paired t-test             Wilcoxon signed rank test
≥ 3 groups   Repeated Measures ANOVA   Friedman test
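The two decision tables for numerical data can be written as a small lookup. This is only a sketch: the function name `choose_numerical_test` and its parameters are illustrative assumptions, not part of the slides.

```python
def choose_numerical_test(paired: bool, normal: bool, n_groups: int) -> str:
    """Return the test suggested by the decision tables for numerical data.

    paired   -- True for dependent/related groups, False for independent groups
    normal   -- True if the data are (approximately) Normally distributed
    n_groups -- number of groups being compared (at least 2)
    """
    if n_groups < 2:
        raise ValueError("need at least 2 groups to compare")
    three_or_more = n_groups >= 3
    table = {
        # (paired, normal, three_or_more): suggested test
        (False, True, False): "Independent t-test",
        (False, False, False): "Mann-Whitney U-test",
        (False, True, True): "One-factor ANOVA",
        (False, False, True): "Kruskal-Wallis test",
        (True, True, False): "Paired t-test",
        (True, False, False): "Wilcoxon signed rank test",
        (True, True, True): "Repeated Measures ANOVA",
        (True, False, True): "Friedman test",
    }
    return table[(paired, normal, three_or_more)]


# Two independent, Normally distributed groups (e.g. age in two treatment groups)
print(choose_numerical_test(paired=False, normal=True, n_groups=2))
```

The lookup only encodes the tables; whether data are "Normally distributed" still has to be judged from the data themselves.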
Comparing 2 groups: Paired data
[Figure: time to produce organ at risk contours; y-axis: time (mins); x-axis: Manual Contour vs. Modified Contour]
Null hypothesis (H0): The average change
between manual and modified is 0
Alternative hypothesis (H1): The average
change between manual and modified is not 0
If p > 0.05 we don’t reject the null hypothesis
If p < 0.05 we reject the null hypothesis: there is a
statistically significant difference between the
groups
HADS anxiety scores were obtained for 30 cancer patients
before and after a session with a clinical psychologist to assess
whether there had been a significant change. The HADS scores
had a significantly skewed distribution.
Which is the most appropriate test?
(a) Two-sample t-test
(b) Wilcoxon signed rank test
(c) One factor ANOVA
(d) Mann-Whitney test
(e) Paired t-test
Non numerical / categorical data
Comparing independent groups
Comparing groups: non-numeric / categorical data

Independent / unrelated groups   Related groups
Chi-square test                  McNemar’s test
Fisher’s exact test
Linear trend test
Comparing 2 groups with
a categorical outcome
Group 1 = female
Group 2 = male
Outcome = tumour side (left vs right)
Use a chi-square test
Is there a difference in tumour side between
males and females?
Counts by sex and tumour side, with row percentages in parentheses:

Sex      Left        Right       Total
Female   386 (44%)   500 (56%)   886
Male     457 (42%)   644 (58%)   1,101
Total    843         1,144       1,987
Null hypothesis (H0): Tumour side is similar
between sexes (eg. 44% ≈ 42% for left side)
Alternative hypothesis (H1): Tumour side
is different between sexes (or tumour side is
associated with sex)
If p > 0.05 we don’t reject the null hypothesis
If p < 0.05 we reject the null hypothesis: there is a
statistically significant association between sex
and tumour side
Is there a difference in tumour side between
males and females?
Chi-square test; p=0.36
No significant difference in tumour side between sexes
No evidence of an association between tumour side
and sex of patient
What is the chi-square test?
Compares observed and expected values
(under the null hypothesis) for each cell in
the table
Expected values are calculated by
multiplying the relevant column and row
totals and dividing by the grand total
The chi-square statistic is the sum of the terms

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

with one term for each cell in the table, where O is the observed value and E is the expected value.
This numerical value is then compared
with the chi-square distribution
Is there a difference in tumour side between
males and females?
Observed counts, with expected values in parentheses:

Sex      Left          Right         Total
Female   386 (375.9)   500 (510.1)   886
Male     457 (467.1)   644 (633.9)   1,101
Total    843           1,144         1,987
Chi-square statistic =

$$\frac{(386-375.9)^2}{375.9} + \frac{(500-510.1)^2}{510.1} + \frac{(457-467.1)^2}{467.1} + \frac{(644-633.9)^2}{633.9} = 0.85; \quad p = 0.36$$
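The calculation above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library. The function name `chi_square_2x2` is an assumption, and the p-value shortcut `erfc(sqrt(x / 2))` applies only to a 2×2 table (1 degree of freedom), with no continuity correction.

```python
import math


def chi_square_2x2(table):
    """Chi-square test for a 2x2 table of observed counts.

    Returns (expected, statistic, p_value). For a chi-square distribution
    with 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected value = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    p_value = math.erfc(math.sqrt(statistic / 2))
    return expected, statistic, p_value


observed = [[386, 500],   # female: left, right
            [457, 644]]   # male: left, right
expected, statistic, p_value = chi_square_2x2(observed)
print([[round(e, 1) for e in row] for row in expected])
print(round(statistic, 2), round(p_value, 2))  # 0.85 0.36, as on the slide
```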
Chi-square test
Can also be used for comparing more than 2
groups, and for outcomes that have more
than 2 categories.
Eg. Comparing patients grouped into 4
distinct categories of treatment and an
outcome measured at 12 months as ‘died’,
‘relapse’, ‘remission’.
Fisher’s Exact test
Use instead of chi-square test if group sizes
are small
Eg. in previous example if only 4 females
and 5 males
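A sketch of how Fisher's exact test works for a small 2×2 table, using only the Python standard library. The counts below are hypothetical (the slides give only the group sizes), and the two-sided rule used here, summing the probabilities of all tables no more likely than the observed one, is one common convention.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all row and column totals fixed, the top-left count follows a
    hypergeometric distribution. The p-value sums the probabilities of
    every possible table that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Probability that the top-left cell equals k
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    return sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)  # small tolerance for rounding
    )


# Hypothetical counts: 4 females (3 left, 1 right), 5 males (1 left, 4 right)
p_value = fisher_exact_2x2(3, 1, 1, 4)
print(round(p_value, 3))
```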
Linear trend test
If the grouping factor is an ordered category, eg
stage of tumour, and the outcome is a 2-category
variable, eg death within 12 months, a more
powerful analysis than a simple chi-square test is a
chi-square test for linear trend.
This method effectively fits a straight line to the
death rates corresponding to each stage.
Non numerical / categorical data
Comparing dependent/related
groups
Comparing paired groups with a
categorical outcome
One group with data before and after treatment
Outcome = binary (eg. yes vs no)
Example
McNemar’s test is based on comparing the off-diagonal numbers:
Before ‘yes’ / After ‘no’ vs. Before ‘no’/After ‘yes’
ie. 785 vs 75
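A sketch of the calculation for the two off-diagonal counts above. The slides do not give the formula; the statistic used here, (b − c)² / (b + c) compared with a chi-square distribution on 1 degree of freedom, is the usual uncorrected form of McNemar's test and is stated as an assumption. Which of the two counts is called `b` or `c` does not affect the result.

```python
import math


def mcnemar(b, c):
    """McNemar's test from the two discordant (off-diagonal) counts.

    b -- number who were 'yes' before and 'no' after
    c -- number who were 'no' before and 'yes' after
    Returns (statistic, p_value) without continuity correction. For a
    chi-square distribution with 1 degree of freedom,
    P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


statistic, p_value = mcnemar(785, 75)
print(round(statistic, 1), p_value < 0.001)
```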
1000 cancer patients diagnosed via screening or presenting
with symptoms were classified according to their ER status
(positive vs negative).
What is the most appropriate test to use to see whether those
diagnosed via symptoms are more likely to be ER negative
than those diagnosed via screening?
(a) Fisher’s Exact test
(b) McNemar’s test
(c) Chi-square test for linear trend
(d) Wilcoxon signed rank test
(e) Chi-square test
Scatterplot
[Figure: scattergram of creatinine (y-axis, 0 to 140) vs. digoxin (x-axis, 0 to 120)]
Correlation
Pearson correlation:
• Used for Normally distributed data
• Measures linear relation between variables
Correlation
• r = 0: no linear relationship
• r = 1: perfect positive relationship
• r = -1: perfect negative relationship
Pearson's correlation coefficient
[Figure: eight scatterplots of y against x illustrating r = 1, 0.93, 0.75, -0.2 (top row) and r = -1, -0.89, -0.8, -0.08 (bottom row)]
Pearson's correlation coefficient
Sensitive to outliers
[Figure: two scatterplots of y against x, with r = 0.85 and r = 0.32]
Correlation
Spearman correlation:
• Used for non-Normally distributed data
• Measures monotonic relation between
variables
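The difference between a linear and a monotonic relation can be seen on a small example. This is a sketch with hypothetical data, using only the Python standard library; Spearman's correlation is computed here as the Pearson correlation of the ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y always rises with x, but along a curve, not a line
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(round(pearson(x, y), 2), round(spearman(x, y), 2))
```

Because y always increases with x, the Spearman correlation is 1, while the Pearson correlation is below 1 because the points do not lie on a straight line.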
Correlation does not imply
causation
Eg.
Correlation between pork consumption and cirrhosis
mortality r = 0.40 (over 16 countries)
Relationships: continuous data

Normally distributed                  Not Normally distributed
Pearson’s correlation (2 variables)   Spearman’s correlation (2 variables)
In a study of the relationship between pre-treatment
HADS anxiety score and post-treatment satisfaction
score (SAT), both scores were obtained for 130 women.
The Pearson correlation was -0.65; p<0.001.
What is the single best statement?
(a) HADS score explains 65% of the variability in satisfaction score
(b) We can conclude that an increasing HADS score is a cause of poor
SAT score
(c) A correlation of 1 is interpreted as showing no difference
between the two factors.
(d) There is a significant negative linear relationship between HADS
and SAT
(e) The correlation between HADS and SAT can be different from the
correlation between SAT and HADS
Prediction
• Describe the relationship between two
variables
• Be able to predict the value of one
variable for a subject when we only have
data on the other variable
Regression
“How can we predict creatinine from digoxin?”
[Figure: scattergram of creatinine (y-axis, 0 to 140) vs. digoxin (x-axis, 0 to 120)]
Simple linear regression equation
Used to describe the linear relationship
between two variables x and y
y = b + mx
b = intercept of line on y-axis (value of y when x is zero)
m = slope or gradient of line
x = independent variable
y = dependent variable
Linear Regression
Creatinine vs. digoxin
Creatinine = b + m digoxin
Fitted regression equation:
Creatinine = 5 + 1.2 digoxin
For every one-unit increase in digoxin,
creatinine increases by 1.2 units on average
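The fitted equation can be used directly for prediction. A minimal sketch: the function name `predict_creatinine` is an assumption, the coefficients are the ones from the fitted equation above, and the digoxin values are arbitrary inputs chosen for illustration.

```python
def predict_creatinine(digoxin, intercept=5.0, slope=1.2):
    """Predicted creatinine from the fitted line: intercept + slope * digoxin."""
    return intercept + slope * digoxin


# Predictions one unit of digoxin apart differ by the slope
at_50 = predict_creatinine(50)
at_51 = predict_creatinine(51)
print(round(at_50, 1), round(at_51 - at_50, 1))
```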
Goodness of fit
• How well does the regression line
fit the data?
• How accurate are the predictions?
R²
• The proportion of the total
variation explained by the model
• For simple linear regression = the
square of the correlation between
x and y.
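A sketch checking this on a small example: fit the least-squares line, compute R² as the proportion of variation explained, and compare it with the squared correlation. The data and function names are hypothetical and for illustration only.

```python
import math


def fit_line(x, y):
    """Least-squares estimates (b, m) for the line y = b + m * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return b, m


def r_squared(x, y, b, m):
    """Proportion of the total variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_total = sum((c - mean_y) ** 2 for c in y)
    ss_residual = sum((c - (b + m * a)) ** 2 for a, c in zip(x, y))
    return 1 - ss_residual / ss_total


def correlation(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((c - mean_y) ** 2 for c in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (x, y) measurements
x = [10, 20, 30, 40, 50, 60]
y = [18, 27, 45, 50, 68, 75]
b, m = fit_line(x, y)
r2 = r_squared(x, y, b, m)
r = correlation(x, y)
print(round(r2, 4), round(r ** 2, 4))  # the two values agree
```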
Multiple linear regression
More than one ‘x’ variable (independent
variable) in the regression equation
Eg.
Creatinine = b + m1 digoxin + m2 age
Multiple linear regression
Can be used to adjust for confounding variables:
Eg.
“Is there a relationship between cholesterol and
age after adjusting for BMI?”
chol = b + m1 age + m2 BMI
Confounders
• Complicate relationships between two
variables of interest.
• Related to both the dependent variable and
the independent variable
The predictive model for the link between tumour size and
age was derived for a cohort of 250 women with breast
cancer:
Tumour size = 2.47 + 0.10 x age ; p=0.13
(a) Age is treated as a confounder in the regression
(b) The model shows that tumour size significantly
increases by 0.10 for every 1 year increase in age
(c) The association between tumour size and age is not
statistically significant
(d) For every unit change in tumour size, there is a 0.10
change in average age.
(e) The correlation between tumour size and age is
0.10
Prediction models

Outcome = continuous                         Outcome = binary
Predictors = continuous and/or categorical   Predictors = continuous and/or categorical
Linear Regression                            Logistic Regression
Multiple linear regression                   Multiple logistic regression
Logistic regression
Dependent variable = binary (two categories)
Eg.
Predict ‘complications (yes/no) following op’
from ‘current systolic BP’
Logistic regression
Predict ‘complications (yes/no) following op’
from ‘pre-op systolic BP’
log_e [p/(1-p)] = b + m × (pre-op BP)
Where p = probability of having a complication
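To get a predicted probability, the equation is solved for p. A minimal sketch: the coefficients b and m below are hypothetical values chosen only to illustrate the arithmetic, not estimates from any data, and the function name is an assumption.

```python
import math


def complication_probability(pre_op_bp, b, m):
    """Solve log_e[p / (1 - p)] = b + m * pre_op_bp for p."""
    log_odds = b + m * pre_op_bp
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical coefficients, for illustration only
b, m = -6.0, 0.03
probabilities = {bp: complication_probability(bp, b, m) for bp in (120, 160, 200)}
for bp, p in probabilities.items():
    print(bp, round(p, 3))
```

Whatever the values of b and m, the predicted p always lies between 0 and 1, which is why the log odds, rather than p itself, is modelled as a straight line.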
Screening Tests – Mammograms
             Test result
             Abnormal   Normal      Total
Cancer       5,747      963         6,710
Non-Cancer   174,310    1,653,760   1,828,070
Total        180,057    1,654,723   1,834,780
NCI-funded Breast Cancer Surveillance Consortium (HHSN261201100031C). Downloaded 09/01/2015 from the Breast Cancer Surveillance
Consortium Web site - http://breastscreening.cancer.gov/statistics/benchmarks/screening/2009/table7.html .
Abnormal Normal Total
Cancer 5,747 963 6,710
Non-Cancer 174,310 1,653,760 1,828,070
Total 180,057 1,654,723 1,834,780
Sensitivity:
The percentage of people who really have the disease
who test positive.
= 5,747 / 6,710 = 86%
Specificity:
The percentage of people who really don’t have the
disease who test negative.
= 1,653,760/1,828,070 = 90%
Positive predictive value (PPV):
The percentage of test positive results that are true
positives
= 5,747 / 180,057 = 3%
Negative predictive value (NPV):
The percentage of test negative results that are true
negatives
= 1,653,760 / 1,654,723 = 99.9%
             Abnormal         Normal           Total
Cancer       True positive    False negative
Non-Cancer   False positive   True negative
Total
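The four measures follow directly from the four cells of the table. A minimal sketch using the mammogram counts; the function name `diagnostic_measures` is an assumption.

```python
def diagnostic_measures(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, NPV and prevalence from a 2x2 table.

    tp -- true positives  (disease, test abnormal)
    fn -- false negatives (disease, test normal)
    fp -- false positives (no disease, test abnormal)
    tn -- true negatives  (no disease, test normal)
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "prevalence": (tp + fn) / (tp + fn + fp + tn),
    }


measures = diagnostic_measures(tp=5747, fn=963, fp=174310, tn=1653760)
for name, value in measures.items():
    print(f"{name}: {value:.2%}")
```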
Prevalence of disease
• PPV and NPV vary with prevalence
• Sensitivity and specificity do not vary with
prevalence
Varying prevalence
Prevalence Sensitivity Specificity PPV NPV
1.0 per 1,000 0.86 0.90 0.85% 99.98%
3.6 per 1,000 0.86 0.90 3% 99.9%
20 per 1,000 0.86 0.90 15% 99.7%
Relationship between PPV, sensitivity,
specificity and prevalence

$$\text{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})}$$
Relationship between NPV, sensitivity,
specificity and prevalence

$$\text{NPV} = \frac{\text{specificity} \times (1 - \text{prevalence})}{(1 - \text{sensitivity}) \times \text{prevalence} + \text{specificity} \times (1 - \text{prevalence})}$$
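A sketch applying the two formulas at the three prevalences in the "Varying prevalence" table, with sensitivity 0.86 and specificity 0.90. The function names are assumptions.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from sensitivity, specificity and prevalence."""
    true_negative = specificity * (1 - prevalence)
    false_negative = (1 - sensitivity) * prevalence
    return true_negative / (true_negative + false_negative)


for per_thousand in (1.0, 3.6, 20):
    prevalence = per_thousand / 1000
    print(
        f"{per_thousand} per 1,000:",
        f"PPV = {ppv(0.86, 0.90, prevalence):.2%},",
        f"NPV = {npv(0.86, 0.90, prevalence):.2%}",
    )
```

Sensitivity and specificity are held fixed, so the change in PPV and NPV across the three rows comes from prevalence alone.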
To assess the effectiveness of a new ultrasound screening test, 100
women with abnormal mammogram results were screened. 25
women were later found, by pathology, to have cancer. 20 of these
25 women had a positive ultrasound result. 65 of those without
cancer had a negative ultrasound.
(a) The PPV is 20/35
(b) The sensitivity of a test depends on the prevalence of cancer
(c) The sensitivity is 20/100
(d) The specificity is 65/75
(e) Specificity = 1 - sensitivity
Question 1
Multiple linear regression analysis (multivariable
analysis) can be used to:
Select the best statement
(a) adjust for confounding by adding the confounding
variable as an independent variable.
(b) determine if a study result is clinically important.
(c) adjust for loss to follow-up bias.
(d) calculate a study's power.
(e) correct the p-value for multiple comparisons.
Question 2
Select the best statement:
(a) The correlation coefficient, r, is dependent on the units of
measurement.
(b) In correlation, it matters which variable is put on the x-axis
and which variable is put on the y-axis.
(c) A correlation coefficient is always positive
(d) The Pearson correlation coefficient, r, is sensitive to
extreme values (outliers).
(e) Correlation is a measure of the relationship between the
population mean and the sample mean.
Question 3
Select the best statement concerning a scatterplot:
(a) It plots the distribution of potential confidence intervals in
a data set.
(b) It plots the distribution of a single variable.
(c) It provides a visual description of the distribution of
potential sample means drawn from a given population.
(d) It plots the standard errors of randomly selected data from
a given population.
(e) It plots the values of two numerical variables in a data set.
Question 4
The following is a linear regression model:
Energy Expenditure = 0.56 x (Caloric Intake) + 502.
Select the correct answer:
(a) The model adjusts for potential confounding by caloric intake.
(b) For every unit change in energy expenditure there is a 0.56 unit
change in mean caloric intake.
(c) For every unit change in caloric intake there is a 0.56 unit change
in mean energy expenditure.
(d) For every unit change in energy expenditure there is a change in
caloric intake of 0.56 +502.
(e) 502 is the value of caloric intake when energy expenditure is zero.
Question 5
Follow-up results for up to 84 months were reported for a series of 323 men with prostate
cancer treated by brachytherapy (BXT). At entry, a Prognostic Index was calculated for each man,
and each was classed as low, intermediate or high risk.
The table below gives summary statistics comparing the three risk groups for age and the
outcome measure, which was the proportion with PSA ≤0.2 ng/ml at five years.
What is the most appropriate statistical test to compare age by the level of risk?
(a) a one way analysis of variance
(b) a log-rank test
(c) a chi-squared test
(d) an unpaired t-test
(e) none of the above
Question 6
A cohort of cancer patients experiencing nausea and
vomiting during chemotherapy were given a new anti-
nausea treatment. A symptom score (1-10) was recorded
before and after treatment. The clinical effect of the
medication could be evaluated using:
(a) Wilcoxon signed rank test
(b) Spearman correlation coefficient
(c) Mann-Whitney U-test
(d) McNemar’s test
(e) One-factor ANOVA
Question 7
Select the best statement concerning Pearson's correlation coefficient:
(a) The associated hypothesis test has a null hypothesis that the correlation is
equal to one.
(b) It reflects the magnitude of the association for linear and non-linear
relationships between two numerical variables.
(c) A Pearson's correlation coefficient of zero indicates there is no linear
association between the two variables.
(d) A Pearson's correlation coefficient of positive one indicates there is no linear
association between the two variables.
(e) A Pearson's correlation coefficient of negative one indicates there is a non-
linear relationship between the two variables.
Question 8
Results of a liver scan (positive vs negative) and subsequent
pathology result (abnormal vs normal) were obtained for 344
patients. Abnormal pathology results were found for 258, and
positive liver scans were found for 263 patients. Of these 263
patients, 231 had abnormal pathology
(a) The sensitivity of the liver scan is 231/263
(b) The prevalence of abnormal pathology in the cohort is 258/344
(c) The specificity of the liver scan cannot be determined
(d) The PPV of the liver scan is 263/344
(e) The NPV of the liver scan cannot be determined