TRAINING ON STATISTICAL ANALYSIS
Hazwan Mat Din
Malaysian Research Institute on Ageing
Universiti Putra Malaysia
Hazwan Mat Din / MyAgeing 1
STATISTICAL ANALYSIS:
Which to choose?
Points to consider when selecting the right statistical test
• Research question/ hypothesis
• Number of variables
• Type of data
• Number of groups
• Sample distribution
• Sample type
Common research questions in medical research
• Difference between / among means
• Difference between / among proportions
• Association between / among factors
• Difference between / among treatment effects
T-TEST
TYPES OF T-TEST
1. One sample t-test
Was this sample drawn from a population with a certain mean?
2. Independent t-test
Are the groups drawn from populations with different means?
3. Paired t-test
Two dependent or related samples
Each score in one sample is paired with a score in the other
What is the mean difference in an outcome variable under the two conditions?
INDEPENDENT T-TEST
• Also known as Student’s t-test
• Univariable analysis of numerical data
• Compares the means of two independent groups
• Dependent variable is continuous
e.g. weight, age, height, BP, Hb level
• Independent variable is categorical (ordinal/nominal) with two levels
e.g. sex (male/female), disease outcome (alive/died)
Assumptions for independent t-test
• Random samples (by design)
• Independent observations
• Data are normally distributed in each group (to check)
• Sufficient sample size
• Population variances are homogeneous between groups (to check)
Hands on
• Data: pain_medication.sav
• Hypothesis:
There is no significant mean difference in age between males and females;
H0: µ(male) = µ(female)
There is a significant mean difference in age between males and females;
HA: µ(male) ≠ µ(female)
Assumption: Checking normality
Histograms show an approximately normal distribution of age in each gender group
Assumption: Equal variance
• Levene’s test for equality of variances
Not significant (p-value > 0.05): equal variance assumption met
Significant (p-value < 0.05): equal variance assumption not met
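To show what Levene’s test actually computes, here is a minimal standard-library Python sketch of the classic mean-centred version: take the absolute deviation of each value from its own group mean, then run a one-way ANOVA on those deviations. The function name `levene_w` and the example numbers are hypothetical; in the hands-on session SPSS reports this test for you, and the p-value (from an F distribution on k - 1 and N - k degrees of freedom) is left to software here.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W (mean-centred) for two or more groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_means = [mean(zg) for zg in z]
    z_grand = mean([v for zg in z for v in zg])
    # One-way ANOVA on the absolute deviations
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Hypothetical data: same spread in both groups vs. very different spread
print(levene_w([1, 2, 3], [11, 12, 13]))
print(levene_w([1, 2, 3], [0, 10, 20]))
```

A W close to 0 (non-significant) supports equal variances; a large W suggests they differ.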
Hands on: Independent t-test
Output
• Assumption of homogeneity of variances: Levene’s test p-value = 0.854, therefore the variances are homogeneous (equal variances assumed)
• T-test p-value = 0.705: there is no significant mean difference in age between males and females
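To make the t statistic in the output less of a black box, here is a minimal standard-library Python sketch of the equal-variance (pooled) independent t-test. The function name `student_t` and the ages are hypothetical. It returns t and its degrees of freedom, n1 + n2 - 2 (198 for the 200 subjects implied by Table 1); the p-value would still come from the t distribution in SPSS or a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group1, group2):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / se
    return t, df


# Hypothetical ages, for illustration only
male_ages = [52, 61, 48, 57, 55]
female_ages = [54, 59, 50, 58, 60]
print(student_t(male_ages, female_ages))
```

When Levene’s test is significant, the pooled variance is no longer appropriate and the "equal variances not assumed" row of the output should be read instead.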
Presentation
Table 1. Mean difference in age between genders
Gender    Mean age (SD)    t statistic (df)    p value
Male      54.9 (10.6)      -0.378 (198)        0.705
Female    55.4 (10.2)
PAIRED T-TEST
• To compare the means of two samples that are paired, related or dependent, e.g. matched subjects or subjects measured twice (pre-post test design)
• Paired data are analysed by examining the difference within each pair
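Because the paired t-test is simply a one-sample t-test on the within-pair differences, it can be sketched in a few lines of standard-library Python. The function name `paired_t` and the weights are hypothetical; the sketch returns the mean difference (post minus pre), t, and df = n - 1, leaving the p-value to software.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t-test as a one-sample t-test on the differences (post minus pre)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same subjects")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = mean(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean_diff, mean_diff / se, n - 1


# Hypothetical weights for four subjects before and after a diet
pre = [80, 92, 74, 101]
post = [79, 90, 73, 98]
print(paired_t(pre, post))
```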
Assumptions
• Random samples
• Observations are paired (dependent), e.g. pre- and post- data
• The differences (post minus pre) are normally distributed, or the sample size is sufficient (to check)
Hands On
• Data: Diet_pre_post.sav
• Hypothesis
H0: There is no significant difference in mean weight before and after the diet
HA: There is a significant difference in mean weight before and after the diet
Assumption: Checking normality
• Compute the difference (post minus pre) for each subject: a new variable is created
• Run a histogram of the new variable to check for normality
Hands on: Paired t-test
Output
• p-value < 0.05 (significant)
• Reject H0
• There is a significant difference in mean weight before and after the diet
Presentation
Table 2. Mean difference in weight before and after diet
Pre: mean weight (SD)    Post: mean weight (SD)    Mean difference (95% CI)    t statistic (df)*    P value
198.4 (33.4)             190.3 (33.5)              -8.0 (-9.6, -6.5)           -11.18 (15)          <0.001
*Paired t-test
Analysis of Variance
(ANOVA)
THE GENERAL LINEAR MODEL
UNIVARIATE
• The general linear model is an extension of multiple linear
regression for a single dependent variable
• Provides regression analysis and analysis of variance for one dependent variable by one or more factors and/or covariates
• Can investigate interactions between factors as well as the effects of individual factors, the effects of covariates and covariate-by-factor interactions
ANOVA
• One way ANOVA
To test differences in one continuous variable across the groups of one categorical variable with > 2 groups
• Two way ANOVA
To test the effect of 2 categorical independent variables on one continuous
dependent variable
• Multifactorial ANOVA
To test the effects of >2 categorical independent variables on one
continuous dependent variable
ONE WAY ANOVA
• Used to determine the effect of a single factor on one numerical outcome (dependent variable)
• To compare the means of more than 2 groups of an independent variable
• For a factor with only 2 categories (comparing the means of 2 groups), the independent t-test is preferred
• In principle, an independent t-test could be run for each pair of groups, but this inflates the type 1 error rate
Type 1 error
• Multiple tests to identify differences in pairs of group means
• With alpha set at 0.05, the probability of making a correct decision (not falsely rejecting a true H0) in an individual test is 1 – 0.05 = 0.95
• If three t-tests are conducted, each with alpha set at 0.05 and treated as independent, the probability of making correct decisions on all three tests is:
0.95 x 0.95 x 0.95 = 0.857375
• The probability of making at least one error is 1 – 0.857375 = 0.142625. Thus, the probability of erroneously rejecting a true null hypothesis has increased from the desired 0.05 to an actual error rate of 0.142625
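The same arithmetic generalises to any number of comparisons. The short Python sketch below (the function name `familywise_error` is hypothetical, and it assumes the tests are independent, as in the calculation above) shows how quickly the error rate grows, e.g. 6 pairwise comparisons are needed for 4 groups.

```python
def familywise_error(alpha, k):
    """P(at least one false rejection) across k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


for k in (1, 3, 6, 10):
    print(k, round(familywise_error(0.05, k), 4))
```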
Assumptions
• Random samples
• Independent samples
• Normality of distribution
Histogram
Sufficient sample size (e.g. n = 15 per group)
• Homogeneity of variance
Levene’s test
If met, use the Bonferroni post hoc test
If violated, use Dunnett’s C post hoc test
Hands on
• Data: iqdata.sav
• Hypothesis:
1) HA: There is a significant mean difference in IQ level among student groups
2) HA: There is a significant association between IQ level and student groups
Assumption: Checking normality
Histograms show an approximately normal distribution of IQ level in each group
Hands On: One Way ANOVA
Output
• Equal variance met
• Out of 2268.9 units of total variation (total sum of squares), 431.2 units (19.0%) are explained by student group
• The mean IQ level is significantly different among groups
• But which groups differ?
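The F statistic and the "19.0% explained" figure both come from sums of squares. The standard-library Python sketch below (the function name `one_way_anova` is hypothetical) computes F, its degrees of freedom and eta-squared (SS between / SS total) from raw groups; the last lines plug in the sums of squares quoted above, which should reproduce F ≈ 4.93 on (2, 42) df and about 19% explained.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return F, its degrees of freedom and eta-squared (SS between / SS total)."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_squared = ss_between / (ss_between + ss_within)
    return f, (k - 1, n - k), eta_squared


# Sums of squares from the output: 431.2 between groups, 2268.9 total, df (2, 42)
ss_b, ss_t = 431.2, 2268.9
print(round((ss_b / 2) / ((ss_t - ss_b) / 42), 2), round(ss_b / ss_t, 3))
```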
Post hoc tests
• A significant F test in the ANOVA table does not tell which pairs of means are significantly different from one another.
• Additional tests are needed to determine exactly which mean differences are significant
• These are called post hoc tests / multiple comparison procedures / a posteriori comparisons
• In which pair(s) does the significant difference lie?
Multiple Comparison Tests
• Equal variance assumed
Bonferroni
Duncan multiple range test
Scheffé’s test
Tukey’s test
• Equal variance not assumed
Dunnett’s C
Correction in Multiple Comparison
• What the Bonferroni post hoc test does
• To control the error rate: divide alpha by the number of tests, e.g. 0.05/3 = 0.0167
Now alpha is set at 0.0167
The probability of making a correct decision for an individual test is 1 – 0.0167 = 0.983
The probability of making three correct decisions would be 0.983 x 0.983 x 0.983 = 0.94986
The probability of making an error is 1 – 0.94986 = 0.05014 ≈ 0.05
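The Bonferroni correction can also be written as code. In this minimal Python sketch (all function names are hypothetical), the first function divides alpha by the number of comparisons. The second shows the equivalent adjusted p-value view: multiply each raw p-value by the number of comparisons and cap it at 1. We assume this is how the post hoc p-values in the output are reported, which would explain values such as p > 0.999. Using the unrounded alpha of 0.05/3, the family-wise error comes out at about 0.049, slightly below the 0.05014 above, which came from rounding 1 – 0.0167 to 0.983.

```python
def bonferroni_alpha(alpha, k):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / k


def bonferroni_adjusted_p(p_values):
    """Equivalent view: multiply each raw p-value by the number of comparisons, capped at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


def familywise_error(alpha_per_test, k):
    """P(at least one false rejection) across k independent tests."""
    return 1 - (1 - alpha_per_test) ** k


a = bonferroni_alpha(0.05, 3)
print(round(a, 4), round(familywise_error(a, 3), 4))
```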
Hands On: Post Hoc Test
Output
Physics vs Maths, p >0.999
Physics vs Chemistry, p = 0.027
Maths vs Chemistry, p = 0.029
Presentation
Table 3: Mean and standard deviation of IQ level among student groups
Group                 Mean (SD)       F statistic (df)*    p-value
Physics students      35.80 (7.79)
Maths students        35.87 (6.94)    4.93 (2, 42)         0.012
Chemistry students    42.40 (4.73)
*One-way ANOVA
Post hoc comparisons (Bonferroni):
Physics vs Maths, p > 0.999
Physics vs Chemistry, p = 0.027
Maths vs Chemistry, p = 0.028
There is a significant mean difference in IQ level among student groups
There is a significant association between IQ level and student groups
CORRELATION
What is correlation?
• Describes the relationship between two quantitative variables, without allowing causal relationships to be inferred
• A statistical technique used to determine the degree to which two variables are related
Correlation-scatter plot
• Two quantitative variables
• Points are not joined
CORRELATION (Pearson)
• Data: HbA1c.sav
• Hypothesis
• HA: There is a significant correlation between age and HbA1c level in the study sample
Assumptions
1. Random sample
2. Independent observation
3. Linear relationship: checked using a scatter plot (an elliptical shape)
4. Bivariate normal distribution
*If assumption 3 and/or 4 is not met, use Spearman correlation
Checking assumption 3 & 4
An ‘elliptical shape’ suggests a linear relationship and bivariate normality
Hands on - Correlation
r > 0.75           very good to perfect correlation*
r = 0.50 – 0.75    moderate to good correlation*
r = 0.25 – 0.50    fair correlation*
r < 0.25           weak or no correlation*
There is a significant correlation between age and HbA1c level (p < 0.001). The observed r of 0.35 suggests a fair positive correlation
*Colton T.: Statistics in Medicine. Little, Brown, 1974
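The sketch below illustrates this slide in standard-library Python. It computes Pearson’s r, the t statistic used to test H0: population correlation = 0 (t = r√((n - 2)/(1 - r²)) on n - 2 df), and the strength label from the thresholds above. The function names are hypothetical. Treating exact boundary values as the lower band is our assumption, because the slide’s ranges overlap at 0.25, 0.50 and 0.75. With a large sample, even a fair correlation such as r = 0.35 can be highly significant.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for paired observations x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_t_statistic(r, n):
    """t statistic with n - 2 df for H0: population correlation = 0."""
    return r * sqrt((n - 2) / (1 - r ** 2))


def colton_label(r):
    """Strength labels from the slide; exact boundary values fall into the lower band."""
    size = abs(r)
    if size > 0.75:
        return "very good to perfect"
    if size > 0.50:
        return "moderate to good"
    if size > 0.25:
        return "fair"
    return "weak or no"


print(colton_label(0.35))  # fair
```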
PEARSON CHI-SQUARE &
FISHER’S EXACT TEST
PEARSON CHI-SQUARE/FISHER’S EXACT TEST
• To test the association between two categorical variables
• Example:
Gender (Male, Female) vs HIV (Yes, No)
Race (Malay, Chinese, Indian) vs Smoking Status (Yes, No)
Checking Assumptions
• Observations are independent
• Two variables are categorical
• Percentage of cells with an expected count < 5:
• 20% or more: use Fisher’s exact test
• Less than 20%: use Pearson chi-square
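To see where the "expected count" in this rule comes from, the standard-library Python sketch below computes each cell’s expected count (row total × column total / grand total), the uncorrected Pearson chi-square statistic, its degrees of freedom, and the percentage of cells with an expected count below 5, which it then uses to choose between the two tests. The function name `chi_square_check` and the counts are hypothetical. SPSS also prints a continuity-corrected statistic for 2 x 2 tables, and this sketch does not compute it.

```python
def chi_square_check(table):
    """Pearson chi-square (no continuity correction) plus the expected-count rule."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total x column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    cells = [e for exp_row in expected for e in exp_row]
    pct_below_5 = 100 * sum(e < 5 for e in cells) / len(cells)
    test = "Fisher's exact test" if pct_below_5 >= 20 else "Pearson chi-square"
    return chi2, df, pct_below_5, test


# Hypothetical 2 x 2 table: rows = gender, columns = disease yes / no
print(chi_square_check([[10, 20], [30, 40]]))
```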
PEARSON CHI-SQUARE
• Data: Coronary.sav
• Hypothesis
HA: There is a significant association between gender and coronary artery disease
Hands on – Pearson Chi-Square
Output – Pearson Chi-Square
p-value < 0.001
There is a significant association between gender and coronary artery disease
Assumption met: fewer than 20% of cells have an expected count < 5
MULTIPLE LINEAR
REGRESSION
REGRESSION
• Regression analysis is a statistical tool that utilizes the relationship
between variables so that one variable can be predicted from the
other, or others
• Uses a variable (x) to predict some outcome variable (y)
• Values in y change as a function of changes in values of x
SIMPLE LINEAR REGRESSION
• To explore the nature of the relationship between two continuous variables
• To investigate the change in response:
how much the value of y (dependent variable) changes with one unit of change in the value of x (independent variable)
y = mx + c
m: slope of the line
c: intercept on the y axis
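Software estimates m and c by ordinary least squares. The minimal Python sketch below uses a hypothetical `fit_line` function on toy data: the slope is the sum of cross-products divided by the sum of squares of x, and the intercept makes the line pass through the point (mean of x, mean of y).

```python
def fit_line(x, y):
    """Ordinary least-squares slope m and intercept c for y = mx + c."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    m = sxy / sxx    # slope: change in y per one-unit change in x
    c = my - m * mx  # intercept: the line passes through (mean x, mean y)
    return m, c


m, c = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, c)  # 2.0 1.0
```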
REGRESSION COEFFICIENT
• The constant m represents the rate of change of one variable (y) as a function of changes in the other (x)
• The slope of the regression line
Coefficient of Determination (r²)
• It provides a measure of how well future outcomes are likely to be predicted by the model
• The proportion of the variation in the dependent variable ‘y’ that is explained by the independent variable ‘x’
• Equal to the square of the correlation coefficient in SLR
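This Python sketch illustrates the last point with toy data. It computes r² in two ways: as 1 - SS(residual) / SS(total) from the fitted line, and as the square of Pearson’s r. The function name `r_squared_two_ways` is hypothetical, and the two values agree only for simple linear regression with an intercept.

```python
from math import sqrt


def r_squared_two_ways(x, y):
    """Return (1 - SS_residual / SS_total, Pearson r squared) for a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)  # total sum of squares of y
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    m = sxy / sxx
    c = my - m * mx
    ss_res = sum((b - (m * a + c)) ** 2 for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    return 1 - ss_res / syy, r ** 2


print(r_squared_two_ways([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))
```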
Hands On
• Data: Cholesterol_MLR.sav
• Hypothesis
HA: There is a significant linear relationship of age, diet, exercise and socio-economic index with serum cholesterol level
Hands On: SLR
Output
Regression coefficient:
y = mx + c
m = 0.06
c = 5.90
Cholesterol = 5.90 + (0.06*age)
Each one-year increase in age is associated with an increase of 0.06 mmol/L in serum cholesterol (95% CI: 0.02, 0.09; p = 0.002)
SLR
• Simple (SLR) – Only ONE independent variable
Age → Cholesterol
Diet → Cholesterol
Exercise → Cholesterol
Socio-economic → Cholesterol
Presentation
Table 4. Association of age, duration of exercise, diet inventory score and socio-economic index with serum cholesterol level using simple linear regression
Independent variable               SLRa: b* (95% CI)       P value
Age (years)                        0.06 (0.02, 0.09)       0.002
Duration of exercise (hrs/week)    -0.62 (-0.79, -0.46)    <0.001
Diet inventory score               0.45 (0.30, 0.61)       <0.001
Socio-economic index               0.21 (0.17, 0.25)       <0.001
a Simple linear regression (dependent variable: cholesterol, mmol/L)
* Crude regression coefficient
MULTIPLE LINEAR REGRESSION
• Very often a single independent variable provides an inadequate description of the variation in a dependent variable
For example, using age as the only independent variable to explain the variability in cholesterol level would be inadequate
Age may explain only a small percentage of the total variability in cholesterol level
The effect found may not be due to age alone but may be influenced by other factors such as exercise, diet and socio-economic index
MULTIPLE LINEAR REGRESSION
Age, Diet, Exercise, Socio-economic → Cholesterol
Steps in Analysis
1. Data exploration
2. SLR
3. Variable selection
Preliminary main-effect model
4. Checking interaction and multicollinearity
Preliminary final model
5. Checking model assumptions
Final model
6. Interpretation & presentation
Step 3: Variable selection
• Repeat variable selection using each method:
Stepwise method
Forward method
Backward method
• Choose the model with the largest number of significant independent variables
Hands on - Variable selection
Output – Stepwise method
Output – Forward Method
Output – Backward Method
The backward and stepwise methods produced models with the same variables, unlike the forward method. The backward model was chosen because all of its variables are significant
Preliminary main-effect model: Cholesterol = 7.297 + 0.033*age + 0.394*diet – 0.540*exercise
Step 4.1: Checking interaction
• All possible two-way interactions should be checked
• Create interaction terms via the Transform > Compute Variable menu
• Add the interaction term and run the model using the “Enter” method
• Check ONE interaction at a time
• Keep the term in the model if p-value < 0.05
Hands on – Checking Interaction
Output – Checking Interaction
Step 4.2: Checking Multicollinearity
No VIF > 10: no multicollinearity problem
Step 5: Checking Assumptions
• Random sample
• Linearity
• (Overall fit) Scatter plot of unstandardized residuals against unstandardized predicted values (XPYR)
• Independent observations
• Normality
• Histogram of unstandardized residuals
• Equal variance
• Same procedure as linearity (scatter plot)
Hands on – Generate residual & predicted values
Checking Assumptions – Linearity & Equal Variance
Output – Linearity & Equal variance
Example of a scatter plot showing non-linearity and unequal variance
Checking Assumption – Normality of residuals
Step 6: Presentation
Table 5. Association of age, duration of exercise, diet inventory score and socio-economic index with serum cholesterol level using multiple linear regression
Independent variable               MLRa: Adj. b* (95% CI)    P value
Age (years)                        0.03 (0.01, 0.06)         0.005
Duration of exercise (hrs/week)    -0.54 (-0.66, -0.44)      <0.001
Diet inventory score               0.39 (0.29, 0.50)         <0.001
a Multiple linear regression (R² = 0.69; the model fits well; model assumptions are met; there is no interaction effect between independent variables and no multicollinearity problem)
* Adjusted regression coefficient
For a prediction study, it is essential to report the final model equation:
Cholesterol = 7.30 + (0.03*Age) + (-0.54*Exercise) + (0.39*Diet)
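To show how a reported equation is used for prediction, here is a minimal Python sketch. It uses the unrounded coefficients of the preliminary main-effect model (7.297, 0.033, 0.394 and -0.540). We assume these carry over unchanged to the final model, since no interaction term was kept. The function name and the example subject are hypothetical.

```python
def predict_cholesterol(age, exercise_hrs_per_week, diet_score):
    """Predicted serum cholesterol (mmol/L) from the main-effect model equation.

    Assumption: the preliminary main-effect coefficients are also the final ones.
    """
    return 7.297 + 0.033 * age + 0.394 * diet_score - 0.540 * exercise_hrs_per_week


# Hypothetical subject: 50 years old, 5 hrs/week of exercise, diet inventory score 10
print(round(predict_cholesterol(50, 5, 10), 2))
```

Predictions are only meaningful within the range of ages, exercise durations and diet scores observed in the study sample.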
Step 6: Interpretation
• There is a significant linear relationship between age and serum cholesterol level (p = 0.005)
• Each additional year of age is associated with a cholesterol level higher by 0.03 mmol/L (95% CI: 0.01, 0.06 mmol/L)
• There is a significant negative linear relationship between duration of exercise and serum cholesterol level (p < 0.001)
• Each additional hour per week of exercise is associated with a cholesterol level lower by 0.54 mmol/L (95% CI: -0.66, -0.44 mmol/L)
• There is a significant linear relationship between diet inventory score and serum cholesterol level (p < 0.001)
• Each one-unit increase in the diet inventory score is associated with a cholesterol level higher by 0.39 mmol/L (95% CI: 0.29, 0.50 mmol/L)
• With three significant variables, the model explains 69% of the variation in serum cholesterol level in the study sample
MULTIPLE LOGISTIC
REGRESSION
MULTIPLE LOGISTIC REGRESSION
• Multiple logistic regression estimates the relationship between a dichotomous dependent variable and more than one independent variable (covariate)
• Independent variables can be a combination of numerical and categorical variables
• The outcome is a binary categorical variable
ODDS
• Odds express the chance of an event, but they are not the same as probability
• The odds of an event is the ratio of the number of ways the event can occur to the number of ways the event cannot occur
• E.g.: On average, 54 girls are born in every 100 births. What are the odds that a randomly chosen delivery is a girl?
• Number of girls/number of boys = 54/46 = 1.17
• So, the odds of having a baby girl are about 1.17
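The Python sketch below shows how odds and probability convert into each other, using the example above. The function names are hypothetical. A probability of 0.54 corresponds to odds of 54/46 ≈ 1.17, and converting back gives 0.54 again.

```python
def odds_from_counts(events, non_events):
    """Odds = number of ways the event occurs / number of ways it does not."""
    return events / non_events


def odds_from_probability(p):
    return p / (1 - p)


def probability_from_odds(odds):
    return odds / (1 + odds)


girls_odds = odds_from_counts(54, 46)
print(round(girls_odds, 2))                         # 1.17
print(round(probability_from_odds(girls_odds), 2))  # 0.54
```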
ODDS RATIO
• The odds ratio is calculated by dividing one odds by the other
• E.g.: What is the odds ratio of men having CAD compared with women?
• The odds of men having CAD / The odds of women having CAD
               Have CAD = 1    No CAD = 0
Men = 1            a               b
Women = 0          c               d
• The odds of men having CAD = (n of men having CAD) / (n of men not having CAD) = a/b
• The odds of women having CAD = (n of women having CAD) / (n of women not having CAD) = c/d
• The odds ratio (OR) of men having CAD compared with women = (a/b)/(c/d)
Thus, OR = ad/bc
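As an illustration, the Python sketch below computes OR = ad/bc from a 2 x 2 table together with a 95% confidence interval. The interval uses the log (Woolf) method: SE(log OR) = √(1/a + 1/b + 1/c + 1/d). This CI method is our choice and does not appear in the slides. For a single binary predictor it should agree with the Wald interval from simple logistic regression. The function name and the counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """OR = ad/bc with a 95% CI from the log (Woolf) method.

    a, b = exposed with / without the outcome; c, d = unexposed with / without.
    """
    if 0 in (a, b, c, d):
        raise ValueError("a zero cell makes the log-OR standard error undefined")
    or_value = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(or_value) - z * se_log_or)
    upper = exp(log(or_value) + z * se_log_or)
    return or_value, (lower, upper)


# Hypothetical counts: 20 men with CAD, 10 without; 10 women with CAD, 20 without
print(odds_ratio_2x2(20, 10, 10, 20))
```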
CODING OF CATEGORICAL VARIABLES
• Always start with 0
• 0: Reference group (low risk, non-diseased, normal)
• E.g. 2-level categories:
Smoking status: 0 (non-smoker), 1 (smoker)
Cancer status: 0 (no cancer), 1 (has cancer)
• E.g. 3-level categories:
Race: 0 (Malay), 1 (Chinese), 2 (Indian)
Income level: 0 (low), 1 (medium), 2 (high)
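When a variable with 3 or more levels is declared categorical, the software builds 0/1 indicator (dummy) variables for every level except the reference group. The Python sketch below imitates that step with a hypothetical `dummy_code` function. It assumes the first listed category is the reference group, which gets no dummy of its own.

```python
def dummy_code(values, categories):
    """Create 0/1 indicator variables; categories[0] is the reference group (all zeros)."""
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"values not in category list: {sorted(unknown)}")
    _reference, *others = categories
    return {cat: [1 if v == cat else 0 for v in values] for cat in others}


race = ["Malay", "Chinese", "Indian", "Malay"]
print(dummy_code(race, ["Malay", "Chinese", "Indian"]))
```

Each dummy’s odds ratio is then read against the reference group. For example, the Chinese row in the race results compares Chinese with Malay participants.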
STEPS IN MULTIPLE LOGISTIC
REGRESSION
1. Data exploration & cleaning
2. Simple logistic regression
3. Variable selection
Preliminary main effect model
4. Checking interaction & multicollinearity
Preliminary final model
5. Checking assumptions
Final model
6. Interpretation and presentation
Hands on
• Dataset: Coronary.sav
• Research question:
• What are the factors associated with coronary artery disease in the study sample?
Step 2: Simple logistic regression
• To screen for important independent variables
Output
• B – regression coefficient
• S.E. – standard error
• Exp(B) – odds ratio, e.g. Exp(0.332) = 1.394
• 95% CI for Exp(B)/OR
• The output shows the reference group defined in the analysis for the categorical variable
• Gender is significant in the model
• Interpretation:
- Men have 1.4 times the odds of having CAD compared with women, when other confounders are not adjusted for
• If the odds ratio is less than 1, the factor is protective (lower risk)
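Exp(B) is simply the exponential of the coefficient, and its 95% CI is exp(B ± 1.96 × S.E.). The minimal Python sketch below uses a hypothetical function name. It reproduces the gender example, while the standard-error argument in the tests is a made-up value, because the slide does not report the S.E. for gender.

```python
from math import exp


def odds_ratio_from_b(b, se=None, z=1.96):
    """Convert a logistic regression coefficient B to an odds ratio, with optional Wald 95% CI."""
    odds_ratio = exp(b)
    if se is None:
        return odds_ratio, None
    return odds_ratio, (exp(b - z * se), exp(b + z * se))


print(round(odds_ratio_from_b(0.332)[0], 3))  # 1.394, as in the gender example
```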
• For a numerical variable
• Interpretation:
- A person with a 1 mmHg increase in systolic blood pressure has 1.03 times the odds of having CAD, when other confounders are not adjusted for
Repeat step 2 for all independent variables:
dbp, chol, age, bmi, race
Result from Simple Logistic Regression
Table 6. Associated factors of coronary artery disease by simple logistic regression model
Variable                           Regression         Crude odds ratio     Wald         p-value
                                   coefficient (b)    (95% CI)             statistic
Systolic blood pressure (mmHg)     0.02               1.025 (1.02, 1.03)   203.49       <0.001
Diastolic blood pressure (mmHg)    0.05               1.053 (1.05, 1.06)   245.58       <0.001
Serum cholesterol (mmol/L)         0.25               1.28 (1.21, 1.37)    63.07        <0.001
Age (years)                        0.03               1.04 (1.02, 1.05)    45.50        <0.001
Body mass index (kg/m²)            -0.01              0.99 (0.97, 1.01)    1.15         0.285
Gender
  Women                            0                  1
  Men                              0.33               1.39 (1.17, 1.66)    13.89        <0.001
Race
  Malay                            0                  1
  Chinese                          -0.03              0.97 (0.79, 1.19)    0.08         0.772
  Indian                           -0.09              0.91 (0.74, 1.12)    0.74         0.389
Step 3: Variable Selection
• Include all significant independent variables from simple logistic
regression.
• May include clinically important independent variables
• Methods compared:
• Forward (LR) method
• Backward (LR) method
Step 3: Hands on
Output
Repeat the same process using Backward (LR)
Comparing Forward (LR) & Backward (LR)
Forward (LR)
Backward (LR)
Forward (LR) provides a model without non-significant independent variables. The variables in the Forward (LR) model are chosen for the preliminary main-effect model.
Step 4: Checking Multicollinearity & Interaction
Checking multicollinearity
Output: Multicollinearity
No value > 0.5: no multicollinearity problem
Checking interaction
• Check for possible 2-way interactions in this model
• Dbp & chol
• Dbp & gender
• Chol & gender
Use the Ctrl key to select both variables of the interaction term
The interaction term (cholesterol & diastolic blood pressure) is not significant
The interaction term (cholesterol & gender) is not significant
The interaction term (diastolic blood pressure & gender) is not significant
Step 5: Checking Assumptions
• The Hosmer-Lemeshow test
• Classification table
• Area under the Receiver Operating Characteristic (ROC) curve
• Hosmer-Lemeshow test: not significant (p > 0.05) for assumption met
• Classification table: > 70% correctly classified for assumption met
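The classification table compares each subject’s observed outcome with the outcome predicted from their predicted probability at a cut-off value. The Python sketch below assumes a cut-off of 0.5 (SPSS’s usual default) and treats a probability exactly at the cut-off as a predicted 1, which is our assumption. The function name and data are hypothetical. The overall percentage correctly classified is the figure compared against the 70% rule of thumb.

```python
def classification_table(y_true, p_pred, cutoff=0.5):
    """Counts of correct/incorrect predictions and overall percentage correctly classified."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_pred):
        predicted = 1 if p >= cutoff else 0  # assumption: p equal to the cut-off counts as 1
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    percent_correct = 100 * (tp + tn) / len(y_true)
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn, "percent_correct": percent_correct}


print(classification_table([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

The percentage depends on the cut-off, and with a rare outcome a high percentage can be reached simply by predicting "no disease" for everyone.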
• Checking the area under the curve: create predicted values (predicted probabilities)
Create ROC curve
• A significant p-value means the assumption for the area under the curve is met
• The area under the ROC curve is 70.9%: the model ranks a randomly chosen case above a randomly chosen non-case 70.9% of the time
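The area under the ROC curve has a simple pairwise meaning. It is the proportion of (case, non-case) pairs in which the case has the higher predicted probability, with ties counted as half. The Python sketch below (hypothetical function name, toy data) computes it directly this way. The double loop is fine for illustration, but it is slow for large datasets.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random case outranks a random non-case (ties count 0.5)."""
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    non_cases = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                wins += 1
            elif case_score == non_case_score:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.5, 0.1]))  # 0.75
```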
Step 6: Interpretation & presentation
Table 7: Associated factors of coronary heart disease by multiple logistic regression model
Variable                           Regression         Adjusted odds ratio    Wald         p-value
                                   coefficient (b)    (95% CI)               statistic
Diastolic blood pressure (mmHg)    0.05               1.05 (1.04, 1.06)      212.62       <0.001
Serum cholesterol (mmol/L)         0.14               1.15 (1.07, 1.23)      15.66        <0.001
Gender
  Women                            0                  1
  Men                              0.40               1.49 (1.24, 1.78)      18.55        <0.001
a Forward (LR) multiple logistic regression model was applied
Multicollinearity and interaction were checked and not found
The Hosmer-Lemeshow test (p = 0.214), classification table (overall percentage correctly classified = 86.4%) and area under the ROC curve (70.9%) were applied to check the model fitness
Interpretation
• Each 1 mmHg increase in diastolic blood pressure is associated with 1.05 times the odds of having coronary heart disease (95% CI: 1.04, 1.06; p < 0.001), adjusted for gender and serum cholesterol
• Each 1 mmol/L increase in serum cholesterol is associated with 1.15 times the odds of having coronary heart disease (95% CI: 1.07, 1.23; p < 0.001), adjusted for diastolic blood pressure and gender
• Men have 1.49 times the odds of having coronary heart disease compared with women (95% CI: 1.24, 1.78; p < 0.001), adjusted for diastolic blood pressure and serum cholesterol
• The prediction equation:
log(p/(1 - p)) = -7.242 + (0.050*dbp) + (0.137*chol) + [0.398*gender(1)]
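The prediction equation gives the log-odds. To obtain a predicted probability, apply the inverse transformation p = 1 / (1 + e^(-logit)). The minimal Python sketch below does this with the coefficients from the equation above. It assumes gender(1) means men coded 1 and women coded 0, as in the coding convention used in this section. The function name and the example patient are hypothetical. The final value printed should be close to exp(0.398) ≈ 1.49, the adjusted odds ratio for men.

```python
from math import exp


def predicted_probability(dbp, chol, male):
    """Probability of coronary heart disease from the fitted equation (male = 1, female = 0)."""
    logit = -7.242 + 0.050 * dbp + 0.137 * chol + 0.398 * male
    return 1 / (1 + exp(-logit))  # inverse of log(p / (1 - p))


def odds(p):
    return p / (1 - p)


# Hypothetical patient: DBP 80 mmHg, cholesterol 6 mmol/L
p_man = predicted_probability(80, 6, 1)
p_woman = predicted_probability(80, 6, 0)
print(round(p_man, 3), round(p_woman, 3), round(odds(p_man) / odds(p_woman), 2))
```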