Zemen Postgraduate College
DEPARTMENT OF PUBLIC HEALTH
BIOSTATISTICS 2 GROUP ASSIGNMENT ON
“QUESTIONS RELATED TO THE SUBJECT MATTER”
Group members
No.  Name              ID
1    Ousman Ahmed      383/Zpgce/2014
2    Tewfik Ahmed      405/Zpgce/2014
3    Mitku Eshetu      367/Zpgce/2014
4    Wessen Ayalew     411/Zpgce/2014
5    Zewdu Tibebu      418/Zpgce/2014
6    Mohamed Yesuf     368/Zpgce/2014
7    Selamawit Yimam
SUBMITTED TO: Mr. Getawa walle (Assistant Professor)
DESSIE, ETHIOPIA
April, 2023
Acknowledgement
We would like to thank Zemen Postgraduate College, Department of Public Health, for giving us
the chance to take part in the MPH program. We are also grateful to Mr. Getawa walle
(Assistant Professor), our instructor at ZPGC, for his valuable teaching and for giving us this
assignment to put our knowledge into practice.
1) Linear regression
A) Check and report the assumptions of linear regression
Solution
Simple linear regression studies the relationship between two continuous variables in order to
predict the value of the dependent variable from the value of the independent variable.
Simple linear regression equation: Y = b0 + b1X + ε
where Y is the dependent variable, X is the independent variable, b0 is the constant (intercept)
and b1 is the regression coefficient (slope).
The epsilon term (ε) is the residual (error). It captures all influences on Y that are not
explained by the independent variable X.
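As a minimal sketch of how b0 and b1 in the equation above are estimated by least squares, the following Python function computes them by hand. The function name and the five data points are hypothetical illustrations and are not taken from the assignment data set.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates for Y = b0 + b1*X, plus the residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope (regression coefficient)
    b0 = mean_y - b1 * mean_x      # intercept (constant)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


# Hypothetical illustration data (not the assignment data)
income = [1.0, 2.0, 3.0, 4.0, 5.0]
weight = [2.6, 2.8, 3.1, 3.2, 3.5]
b0, b1, residuals = fit_simple_regression(income, weight)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

The residuals returned here are the quantities that the normality, independence and homoscedasticity assumptions refer to.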
The aims of regression analysis
1) Predict the value of the response variable for different values of the independent variable.
2) Determine whether variation in the independent variable can explain variation in the
dependent variable and, if so, what relationship exists.
Assumptions of linear regression
Assumption 1: Both variables (the dependent and the independent variable) are continuous.
Using the data set, we run a simple linear regression to determine whether infant birth weight
is influenced by family income.
Note: both of these variables are continuous, so Assumption 1 is fulfilled.
Assumption 2: Linear relationship. The relationship between the independent and dependent
variables must be approximately linear. The graph indicates that as family income increases,
infant birth weight increases, so the second assumption is fulfilled.
Assumption 3: Normality. The residuals of the model should be normally distributed so that the
significance tests and confidence intervals are valid.
Tests of Normality
                      Kolmogorov-Smirnov(a)       Shapiro-Wilk
                      Statistic  df   Sig.        Statistic  df   Sig.
birth weight in kgs   .125       70   .009        .968       70   .070
a. Lilliefors Significance Correction
The null hypothesis is that the data are normally distributed. The Shapiro-Wilk significance
value is 0.070, which is greater than 0.05, so we fail to reject the null hypothesis. This means
the data are approximately normally distributed, so Assumption 3 is accepted. (The
Kolmogorov-Smirnov test gives a significance value of 0.009, so this conclusion rests on the
Shapiro-Wilk result.)
Assumption 4: Independence. The observations should be independent of each other. This means
the errors are independent: the residuals are not correlated with one another (no autocorrelation).
Model Summary(b)
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .726(a)  .527      .520               .36815                      2.159
a. Predictors: (Constant), Monthly family income
b. Dependent Variable: birth weight in kgs
Because the Durbin-Watson statistic of 2.159 lies between 1.5 and 2.5, there is no evidence of
autocorrelation among the residuals, so Assumption 4 is fulfilled.
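To make the Durbin-Watson check concrete, here is a small sketch of how the statistic is computed from a list of residuals: the sum of squared differences between successive residuals divided by the sum of squared residuals. The function names and the example residuals are hypothetical, and the 1.5 to 2.5 range is the rule of thumb used in this report rather than a formal critical value.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def within_rule_of_thumb(dw, low=1.5, high=2.5):
    """Rule of thumb used in this report (an assumption, not a formal test)."""
    return low <= dw <= high


# Hypothetical residuals that alternate in sign (strong negative autocorrelation)
alternating = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(alternating))      # 3.0
print(within_rule_of_thumb(2.159))     # value reported in the Model Summary
```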
Assumption 5: Homoscedasticity. The variance of the residuals should be constant across all
levels of the independent variable.
According to the graph, Assumption 5 (homoscedasticity) is satisfied.
Assumption 6: There are no significant outliers in the data.
Descriptives: birth weight in kgs
                                                Statistic  Std. Error
Mean                                            2.9857     .06349
95% Confidence Interval for Mean   Lower Bound  2.8591
                                   Upper Bound  3.1124
5% Trimmed Mean                                 2.9865
Median                                          3.0000
Variance                                        .282
Std. Deviation                                  .53121
Minimum                                         1.85
Maximum                                         4.00
Range                                           2.15
Interquartile Range                             .57
Skewness                                        -.120      .287
Kurtosis                                        -.323      .566
As the minimum and maximum standardized values are less than 3 in absolute value, there is no
outlier problem. In addition, the Cook's distance values are less than 1, so Assumption 6 is fulfilled.
B) Check for multicollinearity between explanatory variables, if it exists, and report it.
Solution
To check for multicollinearity between the explanatory variables, we can use the Variance
Inflation Factor (VIF), the tolerance, or a correlation matrix. Multicollinearity exists when
there is a high correlation between two or more explanatory variables, which can lead to
unstable coefficient estimates and inflated standard errors. If multicollinearity is found, we
may need to remove one of the collinear variables from the model.
Coefficients(a)
                               Unstandardized        Standardized                   95.0% Confidence    Collinearity
                               Coefficients          Coefficients                   Interval for B      Statistics
Model                          B        Std. Error   Beta          t       Sig.     Lower     Upper     Tolerance  VIF
1  (Constant)                  -10.601  1.827                      -5.803  .000     -14.252   -6.950
   Period of gestation (days)  .029     .008         .416          3.424   .001     .012      .045      .219       4.567
   Height of mother in cms     .034     .012         .375          2.763   .008     .009      .058      .176       5.682
   Age of father in years      .004     .006         .059          .760    .450     -.007     .016      .542       1.845
   Monthly family income       .000     .000         .106          1.146   .256     .000      .000      .377       2.656
   Age of mother in years      .005     .009         .053          .512    .610     -.014     .023      .302       3.307
   Sex                         -.046    .070         -.042         -.657   .513     -.187     .094      .778       1.285
a. Dependent Variable: birth weight in kgs
Note: According to Tabachnick and Fidell (2001), a tolerance value below 0.1 or a variance
inflation factor (VIF) above 10 indicates the presence of multicollinearity in the data.
In the table above, all tolerance values are greater than 0.1 (the smallest is 0.176) and all VIF
values are less than 10 (the largest is 5.682), so there is no serious multicollinearity among the
explanatory variables.
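The two collinearity statistics in the table are linked by VIF = 1 / tolerance. The short sketch below recomputes VIF from the tolerance values reported above and applies the usual cut-offs (tolerance below 0.1 or VIF above 10). The function and variable names are hypothetical; small differences from the printed VIF values come from the tolerances being rounded to three decimals.

```python
def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance (tolerance = 1 - R^2 of a predictor on the others)."""
    return 1.0 / tolerance


# Tolerance values copied from the Coefficients table above
tolerances = {
    "Period of gestation (days)": 0.219,
    "Height of mother in cms": 0.176,
    "Age of father in years": 0.542,
    "Monthly family income": 0.377,
    "Age of mother in years": 0.302,
    "Sex": 0.778,
}

flagged = []
for name, tol in tolerances.items():
    vif = vif_from_tolerance(tol)
    if tol < 0.1 or vif > 10:      # common rule-of-thumb cut-offs
        flagged.append(name)
    print(f"{name}: tolerance = {tol:.3f}, VIF = {vif:.3f}")

print("Variables flagged for multicollinearity:", flagged)
```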
C) By taking infant birth weight as dependent variable, do linear regression to identify factors
associated with infant weight change. (First, do bivariable and then multivariable linear
regression).
Solution
Bivariable (simple) linear regression studies the relationship between a single continuous
dependent variable Y and one independent variable X. Many factors affect infant birth weight,
but I will select some scientifically sound variables for bivariable (simple) linear regression.
Y = a + bX
Hypothesis testing
Ho: β = 0
Ha: β ≠ 0, or β > 0, or β < 0
Decision: if sig < 0.05, reject Ho
Bivariate Linear Regression
Family income and infant birth weight
Coefficients(a)
                    Unstandardized Coefficients   Standardized Coefficients
Model               B          Std. Error         Beta                        t         Sig.
1  (Constant)       3040.271   7.509                                          404.880   .000
   Family Income    .560       .035               .185                        16.211    .000
a. Dependent Variable: Weight of infant at birth
R² = 0.034 and ANOVA test, P<0.05
The model is statistically significant, although family income alone explains only 3.4% of the
variation in birth weight.
Infant birth weight = 3040.271 + 0.560 (Family income)
Interpretation: for each one-unit increase in family income, infant birth weight increases on
average by 0.560 grams (standardized beta = 0.185).
Family size and infant birth weight
Coefficients(a)
                    Unstandardized Coefficients   Standardized Coefficients
Model               B          Std. Error         Beta                        t         Sig.
1  (Constant)       2909.918   16.038                                         181.441   .000
   Family Size      36.825     2.707              .152                        13.605    .000
a. Dependent Variable: Weight of infant at birth
R² = 0.023 and ANOVA test, P<0.05
The model is statistically significant, although family size alone explains only 2.3% of the
variation in birth weight.
Infant birth weight = 2909.918 + 36.825 (Family size)
Interpretation: as family size increases by one person, infant birth weight increases on average
by 36.825 grams (standardized beta = 0.152).
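The t value in each coefficient table is the unstandardized coefficient divided by its standard error, and the Sig. column is the two-sided p value for that t. The sketch below reproduces the t value for family size. The p value uses a normal approximation, which is an assumption on our part (reasonable only because the sample is large); SPSS itself uses the t distribution. The function names are hypothetical.

```python
import math


def t_statistic(b, se):
    """Test statistic for Ho: beta = 0."""
    return b / se


def two_sided_p_normal(t):
    """Two-sided p value from the standard normal distribution (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


# B and Std. Error for Family Size, copied from the table above
t_family_size = t_statistic(36.825, 2.707)
p_family_size = two_sided_p_normal(t_family_size)

print(f"t = {t_family_size:.3f}")          # close to the reported 13.605
print("reject Ho:", p_family_size < 0.05)
```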
Highest grade completed mother and infant birth weight
Coefficients(a)
                                     Unstandardized Coefficients   Standardized Coefficients
Model                                B          Std. Error         Beta                        t         Sig.
1  (Constant)                        3062.063   6.937                                          441.425   .000
   Highest grade completed mother    78.686     5.940              .148                        13.247    .000
a. Dependent Variable: Weight of infant at birth
R² = 0.022 and ANOVA test, P<0.05
The model is statistically significant, although this variable alone explains only 2.2% of the
variation in birth weight.
Infant birth weight = 3062.063 + 78.686 (Highest grade completed mother)
Interpretation: for each one-unit increase in the highest grade completed by the mother, infant
birth weight increases on average by 78.686 grams (standardized beta = 0.148).
Total live births by mom and infant birth weight
Coefficients(a)
                               Unstandardized Coefficients   Standardized Coefficients
Model                          B          Std. Error         Beta                        t         Sig.
1  (Constant)                  2935.368   17.341                                         169.275   .000
   Total live births by mom    85.556     7.849              .122                        10.901    .000
a. Dependent Variable: Weight of infant at birth
R² = 0.015 and ANOVA test, P<0.05
The model is statistically significant, although this variable alone explains only about 1.5% of
the variation in birth weight.
Infant birth weight = 2935.368 + 85.556 (Total live births by mom)
Interpretation: for each additional live birth by the mother, infant birth weight increases on
average by 85.556 grams (standardized beta = 0.122).
Mother's Age and infant birth weight
Coefficients(a)
                    Unstandardized Coefficients   Standardized Coefficients
Model               B          Std. Error         Beta                        t         Sig.
1  (Constant)       3052.130   14.403                                         211.903   .000
   Mother's Age     20.131     4.315              .053                        4.665     .000
a. Dependent Variable: Weight of infant at birth
R² = 0.003 and ANOVA test, P<0.05
The model is statistically significant, although mother's age alone explains only 0.3% of the
variation in birth weight.
Infant birth weight = 3052.130 + 20.131 (Mother's Age)
Interpretation: for each one-unit increase in Mother's Age, infant birth weight increases on
average by 20.131 grams (standardized beta = 0.053). This variable has a mean of 3.10 in the
descriptive statistics, so it appears to be coded in age groups rather than in years.
Multivariable Linear Regression
Multivariable linear regression analyzes one dependent variable against several independent
variables simultaneously.
I selected the following variables to assess their association with infant birth weight.
Descriptive Statistics
                                      Mean      Std. Deviation  N
Weight of infant at birth             3115.81   516.801         7171
Study area                            2.35      .818            7171
Total pregnancies by mom              2.14      .729            7171
Total live births by mom              2.12      .730            7171
Family Size                           5.53      2.103           7171
mother's age                          26.59     6.266           7171
Marital status                        1.01      .147            7171
Highest grade completed of husband    1.07      1.140           7171
Family Wealth                         1.16      .857            7171
Family Income                         136.40    172.897         7171
Mother's Age                          3.10      1.336           7171
Length of infant at birth             49.200    2.8049          7171
Arm circumference of infant           10.731    1.1234          7171
Highest grade completed mother        .62       .952            7171
Occupation of mother                  1.31      1.037           7171
Using the forward selection method, the following variables were significant.
Variables Entered/Removed(a)
Model  Variables Entered            Variables Removed  Method
1      Arm circumference of infant  .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
2      Length of infant at birth    .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
3      Study area                   .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
4      Total live births by mom     .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
5      Family Income                .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
6      Family Size                  .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
7      Mother's Age                 .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
a. Dependent Variable: Weight of infant at birth
Check whether the model fits the data or not using R² and ANOVA test, P<0.05.
Model Summary(h)
                                                                         Change Statistics
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2   Sig. F Change
1      .579(a)  .336      .336               421.271                     .336             3621.535  1    7169  .000
2      .689(b)  .475      .475               374.495                     .139             1903.712  1    7168  .000
3      .709(c)  .503      .503               364.417                     .028             402.965   1    7167  .000
4      .712(d)  .507      .507               362.915                     .004             60.437    1    7166  .000
5      .713(e)  .509      .508               362.400                     .001             21.392    1    7165  .000
6      .713(f)  .509      .509               362.274                     .000             5.970     1    7164  .015
7      .714(g)  .509      .509               362.150                     .000             5.929     1    7163  .015
a. Predictors: (Constant), Arm circumference of infant
b. Predictors: (Constant), Arm circumference of infant, Length of infant at birth
c. Predictors: (Constant), Arm circumference of infant, Length of infant at birth, Study area
d. Predictors: (Constant), Arm circumference of infant, Length of infant at birth, Study area, Total live births by mom
e. Predictors: (Constant), Arm circumference of infant, Length of infant at birth, Study area, Total live births by mom, Family Income
f. Predictors: (Constant), Arm circumference of infant, Length of infant at birth, Study area, Total live births by mom, Family Income, Family Size
g. Predictors: (Constant), Arm circumference of infant, Length of infant at birth, Study area, Total live births by mom, Family Income, Family Size, Mother's Age
h. Dependent Variable: Weight of infant at birth
Interpretation: the model summary table shows that R Square and Adjusted R Square increase at
each step, from 0.336 to 0.509, so the final model explains about 50.9% of the variation in
infant birth weight.
ANOVA(a)
Model            Sum of Squares   df    Mean Square     F         Sig.
1  Regression    642711099.052    1     642711099.052   3621.535  .000(b)
   Residual      1272277058.608   7169  177469.251
   Total         1914988157.660   7170
2  Regression    909700293.256    2     454850146.628   3243.216  .000(c)
   Residual      1005287864.404   7168  140246.633
   Total         1914988157.660   7170
3  Regression    963213824.088    3     321071274.696   2417.714  .000(d)
   Residual      951774333.572    7167  132799.544
   Total         1914988157.660   7170
4  Regression    971173812.835    4     242793453.209   1843.432  .000(e)
   Residual      943814344.825    7166  131707.277
   Total         1914988157.660   7170
5  Regression    973983332.839    5     194796666.568   1483.221  .000(f)
   Residual      941004824.821    7165  131333.541
   Total         1914988157.660   7170
6  Regression    974766831.204    6     162461138.534   1237.870  .000(g)
   Residual      940221326.456    7164  131242.508
   Total         1914988157.660   7170
7  Regression    975544468.191    7     139363495.456   1062.608  .000(h)
   Residual      939443689.469    7163  131152.267
   Total         1914988157.660   7170
a. Dependent Variable: Weight of infant at birth
b. Predictors: (Constant), Arm circumference of infant
c. Predictors: (Constant), Arm circumference of infant, Length of infant at birth
d. Predictors: (Constant), Arm circumference of infant, Length of infant at birth, Study area
e. Predictors: (Constant), Arm circumference of infant, Length of infant at birth, Study area, Total live births by mom
f. Predictors: (Constant), Arm circumference of infant, Length of infant at birth, Study area, Total live births by mom, Family Income
g. Predictors: (Constant), Arm circumference of infant, Length of infant at birth, Study area, Total live births by mom, Family Income, Family Size
h. Predictors: (Constant), Arm circumference of infant, Length of infant at birth, Study area, Total live births by mom, Family Income, Family Size, Mother's Age
The ANOVA table also shows us the model fits the data because sig<0.05(significant).
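The F statistic and R Square for the final model can be recomputed from the sums of squares and degrees of freedom in the ANOVA table. The sketch below does this for model 7; the function name is hypothetical and the input numbers are copied from the table.

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Return F, R-squared and adjusted R-squared from an ANOVA table row."""
    ss_total = ss_regression + ss_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f_value = ms_regression / ms_residual
    r_squared = ss_regression / ss_total
    df_total = df_regression + df_residual            # n - 1
    adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual
    return f_value, r_squared, adj_r_squared


# Model 7 values from the ANOVA table
f_value, r_squared, adj_r_squared = anova_summary(
    ss_regression=975544468.191,
    ss_residual=939443689.469,
    df_regression=7,
    df_residual=7163,
)
print(f"F = {f_value:.3f}, R2 = {r_squared:.3f}, adjusted R2 = {adj_r_squared:.3f}")
```

The results agree with the reported F of 1062.608 and R Square of .509, which is a useful check that the Model Summary and ANOVA tables describe the same model.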
Coefficients(a)
                                 Unstandardized          Standardized                    Correlations                 Collinearity
                                 Coefficients            Coefficients                                                 Statistics
Model                            B          Std. Error   Beta          t        Sig.     Zero-order  Partial  Part    Tolerance  VIF
1  (Constant)                    256.003    47.781                     5.358    .000
   Arm circumference of infant   266.500    4.428        .579          60.179   .000     .579        .579     .579    1.000      1.000
2  (Constant)                    -2649.747  78.990                     -33.545  .000
   Arm circumference of infant   197.059    4.246        .428          46.407   .000     .579        .481     .397    .860       1.163
   Length of infant at birth     74.206     1.701        .403          43.632   .000     .563        .458     .373    .860       1.163
3  (Constant)                    -2140.416  80.944                     -26.443  .000
   Arm circumference of infant   199.665    4.134        .434          48.298   .000     .579        .496     .402    .859       1.165
   Length of infant at birth     68.405     1.680        .371          40.717   .000     .563        .433     .339    .834       1.199
   Study area                    -107.246   5.343        -.170         -20.074  .000     -.250       -.231    -.167   .969       1.032
4  (Constant)                    -2180.220  80.773                     -26.992  .000
   Arm circumference of infant   197.269    4.129        .429          47.782   .000     .579        .492     .396    .854       1.171
   Length of infant at birth     67.886     1.674        .368          40.543   .000     .563        .432     .336    .833       1.201
   Study area                    -109.931   5.332        -.174         -20.618  .000     -.250       -.237    -.171   .965       1.036
   Total live births by mom      45.991     5.916        .065          7.774    .000     .119        .091     .064    .986       1.014
5  (Constant)                    -2204.713  80.831                     -27.275  .000
   Arm circumference of infant   196.688    4.125        .428          47.687   .000     .579        .491     .395    .853       1.172
   Length of infant at birth     67.532     1.674        .367          40.346   .000     .563        .430     .334    .831       1.203
   Study area                    -97.180    5.996        -.154         -16.209  .000     -.250       -.188    -.134   .761       1.314
   Total live births by mom      46.203     5.908        .065          7.821    .000     .119        .092     .065    .986       1.014
   Family Income                 .130       .028         .044          4.625    .000     .190        .055     .038    .773       1.293
6  (Constant)                    -2208.156  80.816                     -27.323  .000
   Arm circumference of infant   196.462    4.124        .427          47.637   .000     .579        .490     .394    .853       1.173
   Length of infant at birth     67.419     1.674        .366          40.278   .000     .563        .430     .333    .830       1.204
   Study area                    -95.994    6.013        -.152         -15.964  .000     -.250       -.185    -.132   .756       1.323
   Total live births by mom      33.304     7.921        .047          4.204    .000     .119        .050     .035    .548       1.824
   Family Income                 .118       .029         .040          4.145    .000     .190        .049     .034    .751       1.332
   Family Size                   6.796      2.781        .028          2.443    .015     .160        .029     .020    .535       1.869
7  (Constant)                    -2206.650  80.790                     -27.313  .000
   Arm circumference of infant   196.441    4.123        .427          47.648   .000     .579        .491     .394    .853       1.173
   Length of infant at birth     67.370     1.673        .366          40.259   .000     .563        .430     .333    .830       1.204
   Study area                    -94.389    6.047        -.149         -15.609  .000     -.250       -.181    -.129   .747       1.339
   Total live births by mom      45.457     9.360        .064          4.856    .000     .119        .057     .040    .392       2.549
   Family Income                 .116       .029         .039          4.066    .000     .190        .048     .034    .750       1.333
   Family Size                   8.059      2.828        .033          2.849    .004     .160        .034     .024    .517       1.934
   Mother's Age                  -11.289    4.636        -.029         -2.435   .015     .051        -.029    -.020   .477       2.098
a. Dependent Variable: Weight of infant at birth
Interpretation of the coefficients (b)
Ŷ = a + b1x1 + b2x2 + b3x3 + ... + b7x7
where Ŷ = the predicted weight of infant at birth; a = the intercept; b1 = the regression
coefficient for variable 1; x1 = the value of variable 1; b2 = the regression coefficient for
variable 2; x2 = the value of variable 2; and so on up to b7 and x7 for variable 7.
Ŷ (Weight of infant at birth) = -2206.650 + 196.441 (Arm circumference of infant)
+ 67.370 (Length of infant at birth) - 94.389 (Study area) + 45.457 (Total live births by mom)
+ 0.116 (Family Income) + 8.059 (Family Size) - 11.289 (Mother's Age)
Positively related variables (holding the other variables in the model constant)
A one-unit increase in arm circumference of infant increases infant birth weight by 196.441 grams.
A one-unit increase in length of infant at birth increases infant birth weight by 67.370 grams.
One additional live birth by the mother increases infant birth weight by 45.457 grams.
A one-unit increase in family income increases infant birth weight by 0.116 grams.
One additional family member increases infant birth weight by 8.059 grams.
Negatively related variables (holding the other variables in the model constant)
A one-unit increase in study area (coded 1 = Urban, 2 = Other urban, 3 = Rural) decreases infant
birth weight by 94.389 grams.
A one-unit increase in Mother's Age decreases infant birth weight by 11.289 grams.
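The fitted equation can be turned into a small prediction function. The sketch below is only an illustration: the function and argument names are hypothetical, and the example input is the set of sample means from the Descriptive Statistics table (using the Mother's Age variable whose mean is 3.10). A least-squares equation evaluated at the predictor means should return approximately the mean of the outcome, which gives a quick check on the transcribed coefficients.

```python
def predict_birth_weight(arm_circumference, length, study_area, live_births,
                         family_income, family_size, mothers_age):
    """Predicted birth weight in grams from the final (model 7) equation."""
    return (-2206.650
            + 196.441 * arm_circumference
            + 67.370 * length
            - 94.389 * study_area
            + 45.457 * live_births
            + 0.116 * family_income
            + 8.059 * family_size
            - 11.289 * mothers_age)


# Sample means copied from the Descriptive Statistics table
predicted_at_means = predict_birth_weight(
    arm_circumference=10.731,
    length=49.200,
    study_area=2.35,
    live_births=2.12,
    family_income=136.40,
    family_size=5.53,
    mothers_age=3.10,
)
print(f"Predicted weight at the sample means: {predicted_at_means:.1f} g")
```

The result is within rounding error of the reported mean birth weight of 3115.81 grams.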
2. Check the assumption of linear regression, specifically normality and homogeneity of
variance.
We can check normality with a histogram and a normal P-P plot of the standardized residuals.
Interpretation: the histogram is symmetrical, which indicates that the standardized residuals
are normally distributed.
Interpretation: the normal P-P plot assesses whether the standardized residuals follow a normal
distribution; points falling close to the diagonal line support the normality assumption.
Homogeneity of variance: the variance of the residuals is constant, which was checked with the
residual plot.
1. Check multicollinearity between explanatory variables if exist and report
Multicollinearity is diagnosed for those variables which were entered to final multivariable
analysis.
Coefficients(a)
                                       Unstandardized          Standardized                    Collinearity
                                       Coefficients            Coefficients                    Statistics
Model                                  B          Std. Error   Beta          t        Sig.     Tolerance  VIF
1  (Constant)                          -2260.138  92.151                     -24.526  .000
   TOTAL PREGNANCIES BY MOM            -13.854    11.745       -.066         -1.179   .238     .022       45.808
   TOTAL LIVE BIRTH BY MOM             5.484      12.367       .025          .443     .657     .021       47.876
   Study area                          -93.482    6.634        -.148         -14.092  .000     .621       1.611
   Total pregnancies by mom            5.364      30.549       .008          .176     .861     .037       27.095
   Total live births by mom            55.711     31.099       .079          1.791    .073     .036       28.151
   Family Size                         9.261      3.185        .038          2.908    .004     .408       2.452
   mother's age                        2.799      3.487        .034          .803     .422     .038       26.107
   Highest grade completed of husband  -5.159     5.582        -.011         -.924    .355     .452       2.213
   Family Wealth                       8.080      5.853        .013          1.381    .167     .727       1.376
   Family Income                       .117       .036         .039          3.275    .001     .482       2.073
   Mother's Age                        -19.426    16.117       -.050         -1.205   .228     .039       25.360
   Length of infant at birth           67.346     1.676        .366          40.191   .000     .828       1.208
   Arm circumference of infant         196.195    4.125        .426          47.558   .000     .851       1.175
   Highest grade completed mother      6.594      6.792        .012          .971     .332     .438       2.285
   Occupation of mother                -7.191     4.801        -.014         -1.498   .134     .738       1.355
a. Dependent Variable: Weight of infant at birth
Interpretation of multicollinearity
Multicollinearity among the variables was judged from the VIF values in the collinearity
statistics table, as follows.
- VIF > 10: severe multicollinearity. The total pregnancies by mom, total live births by mom
and mother's age variables (both versions of each in the table, with VIF values from 25.360 to
47.876) fall in this category.
- VIF 5-10: moderate multicollinearity. No variable falls in this category.
- VIF < 5: low multicollinearity. The remaining variables fall in this category.
2)Logistic regression
1. By taking birth weight as outcome: <2500 gm (low BWt) and >2500 gm, do logistic
regression to explore factors associated with normal birth weight. (First, do
bivariable and then multivariable logistic regression).
Logistic regression is the appropriate regression analysis when the dependent variable is
dichotomous (binary). It is used to describe data and to explain the relationship between one
binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent
variables. In the given data, the weight of infant at birth is a continuous variable. So, I first
transformed the continuous infant weight into a categorical variable using "recode into new
variable", with "0" for <2500 gm and "1" for ≥2500 gm. I then ran bivariable logistic regression
followed by multivariable logistic regression, step by step.
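The recoding step can be sketched in a few lines of Python. The function name and the sample weights are hypothetical; treating a weight of exactly 2500 gm as normal is an assumption that follows the "≥2500" label used in the SPSS output tables.

```python
def recode_birth_weight(weight_g, cutoff=2500):
    """Return 0 for low birth weight (<2500 g) and 1 for normal birth weight (>=2500 g)."""
    return 1 if weight_g >= cutoff else 0


# Hypothetical birth weights in grams (not from the assignment data)
weights = [1900, 2499, 2500, 3115, 4000]
recoded = [recode_birth_weight(w) for w in weights]
print(recoded)  # [0, 0, 1, 1, 1]
```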
Bivariable logistic regression
Case Processing Summary
Unweighted Cases(a)                       N      Percent
Selected Cases     Included in Analysis   7389   91.8
                   Missing Cases          661    8.2
                   Total                  8050   100.0
Unselected Cases                          0      .0
Total                                     8050   100.0
a. If weight is in effect, see classification table for the total number of cases.
The Case Processing Summary tells us how many cases are included in the analysis and how many
are missing. Here, 661 cases are missing and 7389 cases are analyzed.
Dependent Variable Encoding
Original Value Internal Value
Low birth wt. 0
Normal birth wt. 1
This table tells us how our outcome variable is encoded.
Classification Table(a,b)
                                            Predicted
                                            Weight of infant at birth
        Observed                            <2500    ≥2500    Percentage Correct
Step 0  Weight of infant at birth   <2500   0        624      .0
                                    ≥2500   0        6765     100.0
        Overall Percentage                                    91.6
a. Constant is included in the model.
b. The cut value is .500
Classification table tells us about baseline model which is a model that does not include our
explanatory variables.
Interpretation; Classification table tells us 91.6% correct prediction by the baseline model.
Variables in the Equation
                  B      S.E.  Wald      df  Sig.  Exp(B)
Step 0  Constant  2.383  .042  3245.254  1   .000  10.841
Interpretation; since p value is <0.05, constant is a statistically significant predictor of the
outcome.
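The Step 0 results follow directly from the observed counts: the baseline model predicts the larger category for everyone, and its constant is the natural log of the odds of normal versus low birth weight. A short sketch using the counts from the Step 0 classification table (the variable names are hypothetical):

```python
import math

n_low = 624        # observed <2500
n_normal = 6765    # observed >=2500

# The baseline model classifies every infant as >=2500
baseline_accuracy = n_normal / (n_low + n_normal)

# Odds of normal birth weight and the corresponding constant B
odds_normal = n_normal / n_low
constant_b = math.log(odds_normal)

print(f"Overall percentage correct = {100 * baseline_accuracy:.1f}")
print(f"Exp(B) = {odds_normal:.3f}, B = {constant_b:.3f}")
```

These values match the reported 91.6% correct, Exp(B) = 10.841 and B = 2.383.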
Omnibus Tests of Model Coefficients
               Chi-square  df  Sig.
Step 1  Step   1609.428    59  .000
        Block  1609.428    59  .000
        Model  1609.428    59  .000
Interpretation: since the p value is <0.05, the model with the predictors predicts the outcome
significantly better than the constant-only model.
Model Summary
Step  -2 Log likelihood  Cox & Snell R Square  Nagelkerke R Square
1     2668.880(a)        .196                  .445
a. Estimation terminated at iteration number 20 because maximum iterations has been reached.
Final solution cannot be found.
Interpretation: according to the Nagelkerke R Square, about 44.5% of the variability in the
outcome is explained by the added predictors (19.6% according to Cox & Snell).
Check the Hosmer and Lemeshow test and report it.
Hypothesis test
Ho: predicted value = observed value
Ha: predicted value ≠ observed value
The Hosmer and Lemeshow goodness-of-fit test suggests that the model is a good fit to the data,
because the p value is not significant and we therefore fail to reject Ho.
Hosmer and Lemeshow Test
Step  Chi-square  df  Sig.
1     4.952       8   .763
Interpretation: since the p value is >0.05, the model fits the data (we fail to reject Ho).
Classification Table(a)
                                            Predicted
                                            Weight of infant at birth
        Observed                            <2500    ≥2500    Percentage Correct
Step 1  Weight of infant at birth   <2500   219      405      35.1
                                    ≥2500   83       6682     98.8
        Overall Percentage                                    93.4
a. The cut value is .500
Interpretation; it shows an improvement in classification accuracy after new predictor added in
the model (from 91.6 to 93.4).
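The percentages in the Step 1 classification table can be recomputed from its four cell counts. The sketch below does so; the variable names are hypothetical and the counts are copied from the table.

```python
# Rows are observed categories, columns are predicted categories
low_pred_low, low_pred_normal = 219, 405          # observed <2500
normal_pred_low, normal_pred_normal = 83, 6682    # observed >=2500

total = low_pred_low + low_pred_normal + normal_pred_low + normal_pred_normal

pct_low_correct = 100 * low_pred_low / (low_pred_low + low_pred_normal)
pct_normal_correct = 100 * normal_pred_normal / (normal_pred_low + normal_pred_normal)
pct_overall = 100 * (low_pred_low + normal_pred_normal) / total

print(f"<2500 correctly classified:  {pct_low_correct:.1f}%")
print(f">=2500 correctly classified: {pct_normal_correct:.1f}%")
print(f"Overall correctly classified: {pct_overall:.1f}%")
```

Although overall accuracy rises to 93.4%, only 35.1% of low-birth-weight infants are classified correctly, so the model is much better at identifying normal-weight infants.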
Variables in the Equation
B S.E. Wald df Sig. Exp(B) 95% C.I.for EXP(B)
Lower Upper
q211(1) -.580 .217 7.133 1 .008 .560 .366 .857
q213(1) .082 .349 .055 1 .814 1.086 .548 2.152
site 24.787 2 .000
site(1) 1.154 .299 14.929 1 .000 3.170 1.766 5.692
site(2) .952 .211 20.298 1 .000 2.590 1.712 3.918
gravida 1.962 2 .375
gravida(1) -.217 .764 .081 1 .776 .805 .180 3.599
gravida(2) -.590 .568 1.078 1 .299 .554 .182 1.689
parity 4.793 2 .091
parity(1) .123 .763 .026 1 .872 1.131 .253 5.047
Step 1a
parity(2) .823 .572 2.069 1 .150 2.277 .742 6.983
famsize 7.990 10 .630
famsize(1) -.962 1.266 .577 1 .447 .382 .032 4.570
famsize(2) -.969 .862 1.264 1 .261 .379 .070 2.055
famsize(3) -.645 .857 .566 1 .452 .525 .098 2.816
famsize(4) -.714 .854 .699 1 .403 .490 .092 2.610
famsize(5) -.854 .853 1.001 1 .317 .426 .080 2.268
famsize(6) -.875 .851 1.056 1 .304 .417 .079 2.212
famsize(7) -.690 .855 .651 1 .420 .502 .094 2.681
famsize(8) -.475 .895 .282 1 .596 .622 .108 3.594
famsize(9) -.762 .904 .712 1 .399 .467 .079 2.742
famsize(10) -1.450 .953 2.317 1 .128 .235 .036 1.518
wealth 3.001 8 .934
wealth(1) -16.989 27260.289 .000 1 1.000 .000 .000 .
wealth(2) -16.917 27260.289 .000 1 1.000 .000 .000 .
wealth(3) -16.703 27260.289 .000 1 1.000 .000 .000 .
wealth(4) -16.552 27260.289 .000 1 1.000 .000 .000 .
wealth(5) -16.865 27260.289 .000 1 1.000 .000 .000 .
wealth(6) .250 28412.618 .000 1 1.000 1.284 .000 .
wealth(7) -.217 33355.847 .000 1 1.000 .805 .000 .
wealth(8) -6.403 38349.572 .000 1 1.000 .002 .000 .
faminc .002 .001 3.500 1 .061 1.002 1.000 1.003
agegrp 1.758 6 .941
agegrp(1) .440 .639 .475 1 .491 1.553 .444 5.429
agegrp(2) .579 .621 .867 1 .352 1.783 .528 6.029
agegrp(3) .582 .614 .898 1 .343 1.789 .537 5.955
agegrp(4) .572 .610 .882 1 .348 1.773 .537 5.855
agegrp(5) .611 .611 1.001 1 .317 1.843 .557 6.099
agegrp(6) .457 .664 .473 1 .491 1.579 .430 5.800
length .335 .021 244.203 1 .000 1.397 1.340 1.457
armcircum 1.192 .060 400.178 1 .000 3.293 2.930 3.701
eductn .515 3 .916
eductn(1) -.078 .385 .041 1 .839 .925 .435 1.966
eductn(2) -.172 .384 .200 1 .655 .842 .397 1.786
eductn(3) -.126 .439 .082 1 .774 .882 .373 2.084
q322 6.767 6 .343
q322(1) .870 .383 5.147 1 .023 2.386 1.126 5.057
q322(2) .633 .414 2.340 1 .126 1.884 .837 4.242
q322(3) .626 .733 .729 1 .393 1.870 .444 7.864
q322(4) .663 .720 .849 1 .357 1.941 .473 7.962
q322(5) 16.592 5698.156 .000 1 .998 16061819.449 .000 .
q322(6) 1.403 1.272 1.216 1 .270 4.066 .336 49.197
q312 19.895 5 .001
q312(1) -.263 .762 .119 1 .730 .769 .173 3.422
q312(2) -.391 .454 .742 1 .389 .676 .278 1.646
q312(3) .377 .242 2.422 1 .120 1.457 .907 2.341
q312(4) .583 .161 13.157 1 .000 1.792 1.308 2.456
q312(5) .063 .133 .223 1 .637 1.065 .821 1.381
q313 5.163 4 .271
q313(1) -.598 .355 2.845 1 .092 .550 .274 1.102
q313(2) -.835 .490 2.909 1 .088 .434 .166 1.133
q313(3) 1.426 1.733 .677 1 .410 4.163 .139 124.256
q313(4) -.404 .366 1.221 1 .269 .667 .326 1.367
q314 11.513 5 .042
q314(1) -1.226 .780 2.469 1 .116 .294 .064 1.354
q314(2) -.086 .778 .012 1 .912 .918 .200 4.212
q314(3) 18.231 8819.415 .000 1 .998 82699588.268 .000 .
q314(4) -.659 .722 .833 1 .361 .517 .126 2.130
q314(5) -.545 .716 .579 1 .447 .580 .143 2.360
q319(1) -.519 .190 7.448 1 .006 .595 .410 .864
Constant -8.509 27260.289 .000 1 1.000 .000
a. Variable(s) entered on step 1: q211, q213, site, gravida, parity, famsize, wealth, faminc, agegrp, length, armcircum, eductn, q322, q312,
q313, q314, q319.
Interpretations
The column labeled "Sig." gives the p value for the test of whether each odds ratio differs
significantly from one. The outcome is coded 1 for normal birth weight (≥2500 gm), so an odds
ratio above 1 means higher odds of normal birth weight.
Variables that increase the odds of normal birth weight (OR > 1, p < 0.05)
1. Infants in urban areas have 3.17 times the odds of normal birth weight compared with infants
in rural areas.
2. Infants in other urban areas have 2.59 times the odds of normal birth weight compared with
infants in rural areas.
3. For each one-unit increase in infant length, the odds of normal birth weight are multiplied
by 1.397.
4. For each one-unit increase in arm circumference, the odds of normal birth weight are
multiplied by 3.293.
5. Drinking water from a protected well/spring is associated with 1.792 times the odds of normal
birth weight compared with drinking river/pond water.
Variables that decrease the odds of normal birth weight (OR < 1, p < 0.05)
1. A history of abortion is associated with 0.560 times the odds of normal birth weight compared
with no abortion.
2. Using soap to wash hands is associated with 0.595 times the odds of normal birth weight
compared with not using soap.
N.B. The other variables have no significant association with infant birth weight.
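Each Exp(B) and its 95% confidence interval in the table are obtained by exponentiating B and B ± 1.96 × S.E. The sketch below reproduces the odds ratio for site(1) (urban versus rural). The function name is hypothetical, and small differences from the SPSS output are expected because B and S.E. are rounded to three decimals in the table.

```python
import math


def odds_ratio_ci(b, se, z=1.96):
    """Odds ratio with an approximate 95% Wald confidence interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# B and S.E. for site(1), copied from the Variables in the Equation table
odds_ratio, ci_lower, ci_upper = odds_ratio_ci(1.154, 0.299)
print(f"OR = {odds_ratio:.3f}, 95% CI = {ci_lower:.3f} to {ci_upper:.3f}")
```

Because the interval does not include 1, the association is statistically significant at the 5% level, which agrees with the Sig. value for this row.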
Multinomial Logistic Regression
Assumptions
1. Dependent variable should be measured at the nominal level.
2. There should be one or more independent variables that are continuous, ordinal or
nominal (including dichotomous variables).
3. There is independence of observations and the dependent variable should have mutually
exclusive and exhaustive categories.
4. There should be no multicollinearity.
5. There needs to be a linear relationship between any continuous independent variables and
the logit transformation of the dependent variable.
6. There should be no outliers, high leverage values or highly influential points.
I have done multinomial logistic regression by using independent variables which have
significant association with infant birth wt. checked by binary logistic regression.
Case Processing Summary
                                                          N      Marginal Percentage
Weight of infant at birth        <2500                    666    8.5%
                                 ≥2500                    7203   91.5%
Source of drinking water         Tap inside the house     83     1.1%
                                 Private tap in compound  323    4.1%
                                 Public tap               1111   14.1%
                                 Protected well/spring    2462   31.3%
                                 Unprotected well/spring  2469   31.4%
                                 River/pond               1421   18.1%
Use soap to wash hands           Yes                      6833   86.8%
                                 No                       1036   13.2%
Study area                       Urban                    1862   23.7%
                                 Other urban              1728   22.0%
                                 Rural                    4279   54.4%
HAVE YOU EVER HAD AN ABORTION?   YES                      601    7.6%
                                 NO                       7268   92.4%
Valid                                                     7869   100.0%
Missing                                                   181
Total                                                     8050
Subpopulation                                             1950(a)
a. The dependent variable has only one value observed in 1785 (91.5%) subpopulations.
Model fit techniques
Hypothesis testing
Ho; the model fits the data
Ha; the model does not fit the data well
Decision; reject Ho if p< 0.05
Goodness-of-Fit
          Chi-Square  df    Sig.
Pearson   1549.817    1938  1.000
Deviance  865.651     1938  1.000
Interpretation; since p>0.05, the model fits the data.
Another option to get overall measure of the model is to consider the statistics presented in the
model fitting information.
Model Fitting Information
                Model Fitting Criteria   Likelihood Ratio Tests
Model           -2 Log Likelihood        Chi-Square  df  Sig.
Intercept Only  2826.760
Final           1224.596                 1602.163    11  .000
Interpretation; since p <0.05, the full model statistically significantly predicts the dependent
variable better than the intercept only model alone.
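The likelihood ratio chi-square in the Model Fitting Information table is the difference between the two -2 log likelihood values. The sketch below recomputes it and compares it with the tabulated 5% critical value of the chi-square distribution with 11 degrees of freedom (19.675). The variable names are hypothetical.

```python
neg2ll_intercept_only = 2826.760
neg2ll_final = 1224.596
df = 11

# Likelihood ratio chi-square: drop in -2 log likelihood when the predictors are added
lr_chi_square = neg2ll_intercept_only - neg2ll_final

# Tabulated critical value of chi-square for df = 11 at alpha = 0.05
critical_value = 19.675

print(f"LR chi-square = {lr_chi_square:.3f} on {df} df")
print("Final model significantly better:", lr_chi_square > critical_value)
```

The result differs from the reported 1602.163 only in the last decimal, because the -2 log likelihood values in the table are rounded.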
Likelihood Ratio Tests
Effect Model Fitting Likelihood Ratio Tests
Criteria
-2 Log Likelihood of Chi-Square df Sig.
Reduced Model
Intercept 1224.596a .000 0 .
length 1517.990 293.393 1 .000
armcircum 1814.721 590.124 1 .000
q312 1242.873 18.276 5 .003
q319 1236.870 12.274 1 .000
site 1307.122 82.526 2 .000
q211 1231.168 6.572 1 .010
The chi-square statistic is the difference in -2 log-likelihoods between the
final model and a reduced model. The reduced model is formed by
omitting an effect from the final model. The null hypothesis is that all
parameters of that effect are 0.
a. This reduced model is equivalent to the final model because omitting
the effect does not increase the degrees of freedom.
Interpretation; since p <0.05, all variables are statistically significant.
Parameter Estimates
Weight of infant at birth(a): <2500
                                                                 95% Confidence Interval for Exp(B)
Parameter    B       Std. Error  Wald     df  Sig.  Exp(B)       Lower Bound  Upper Bound
Intercept    24.771  1.007       605.475  1   .000
length       -.321   .019        273.386  1   .000  .725         .698         .753
armcircum    -1.160  .055        443.838  1   .000  .314         .282         .349
[q312=1]     -.009   .617        .000     1   .988  .991         .296         3.320
[q312=2]     .183    .392        .217     1   .641  1.200        .557         2.588
[q312=3]     -.382   .219        3.061    1   .080  .682         .445         1.047
[q312=4]     -.527   .151        12.185   1   .000  .591         .439         .794
[q312=5]     -.046   .126        .132     1   .716  .955         .747         1.222
[q312=6]     0(b)    .           .        0   .     .            .            .
[q319=1]     .584    .173        11.386   1   .001  1.794        1.278        2.519
[q319=2]     0(b)    .           .        0   .     .            .            .
[site=1]     -1.433  .206        48.405   1   .000  .238         .159         .357
[site=2]     -1.097  .162        45.622   1   .000  .334         .243         .459
[site=3]     0(b)    .           .        0   .     .            .            .
[q211=1]     .496    .187        7.045    1   .008  1.643        1.139        2.370
[q211=2]     0(b)    .           .        0   .     .            .            .
a. The reference category is: ≥2500
b. This parameter is set to zero because it is redundant.
Interpretations
Variables associated with higher odds of underweight (<2500) (OR > 1, p < 0.05)
1. Using soap for hand washing ([q319=1]): the odds of underweight rather than normal weight are 1.794 times the odds in the reference category.
2. Abortion ([q211=1]): the odds of underweight rather than normal weight are 1.643 times the odds in the reference category.
Variables associated with lower odds of underweight (<2500) (OR < 1, p < 0.05)
1. Length of infant: each one-unit increase in length multiplies the odds of underweight by 0.725 (a 27.5% decrease).
2. Arm circumference: each one-unit increase multiplies the odds of underweight by 0.314 (a 68.6% decrease).
3. Drinking protected well/spring water ([q312=4]): the odds of underweight are 0.591 times the odds in the reference category ([q312=6]).
4. Urban area: the odds of underweight are 0.238 times the odds in the rural area.
5. Other urban area: the odds of underweight are 0.334 times the odds in the rural area.
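The Exp(B) column and its confidence interval follow directly from B and its standard error: OR = exp(B), with limits exp(B ± 1.96 × SE). The short sketch below recomputes a few rows of the Parameter Estimates table as a check. The helper name odds_ratio_ci is our own, and the results can differ from SPSS in the third decimal because B and SE are rounded in the printed table.

```python
import math

def odds_ratio_ci(b, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a logit coefficient."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# (B, Std. Error) pairs copied from the Parameter Estimates table
coefs = {
    "length": (-0.321, 0.019),
    "armcircum": (-1.160, 0.055),
    "[q319=1]": (0.584, 0.173),
    "[site=1]": (-1.433, 0.206),
}

results = {}
for name, (b, se) in coefs.items():
    odds_ratio, lower, upper = odds_ratio_ci(b, se)
    results[name] = (odds_ratio, lower, upper)
    # Percent change in the odds of underweight relative to the comparison group
    pct_change = (odds_ratio - 1.0) * 100.0
    print(f"{name}: OR = {odds_ratio:.3f} ({lower:.3f}, {upper:.3f}), "
          f"change in odds = {pct_change:+.1f}%")
```

An OR below 1 therefore reads as a percentage decrease in the odds, and an OR above 1 as a percentage increase.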
3) Survival analysis
1. Taking time to infant death as the outcome, do Cox regression (first bivariable
and then multivariable Cox regression)
Bivariable Cox regression between area and time to infant death
Categorical Variable Codings(a)
                          Frequency   (1)   (2)
site(b)   1=Urban         1927        1     0
          2=Other urban   1763        0     1
          3=Rural         4360        0     0
a. Category variable: site (Study area)
b. Indicator Parameter Coding
Variables in the Equation
           B      SE     Wald     df   Sig.   Exp(B)   95.0% CI for Exp(B)
                                                       Lower    Upper
site                     57.526   2    .000
site(1)    .209   .029   52.223   1    .000   1.233    1.165    1.305
site(2)    .129   .029   19.261   1    .000   1.138    1.074    1.206
Interpretation
1. At any given time, the hazard of death for an infant in an urban area is 1.233 times that for an
infant in a rural area (HR = 1.233; p < 0.001; 95% CI 1.165 to 1.305, which does not contain 1).
2. At any given time, the hazard of death for an infant in another urban area is 1.138 times that for an
infant in a rural area (HR = 1.138; p < 0.001; 95% CI 1.074 to 1.206, which does not contain 1).
Bivariable Cox regression between gravida and time to infant death
Categorical Variable Codings(a)
                       Frequency   (1)   (2)
gravida(b)   1=Primi   1804        1     0
             2=2-4     3585        0     1
             3=>4      2651        0     0
a. Category variable: gravida (Total pregnancies by mom)
b. Indicator Parameter Coding
Variables in the Equation
              B      SE     Wald    df   Sig.   Exp(B)   95.0% CI for Exp(B)
                                                         Lower    Upper
gravida                     5.974   2    .050
gravida(1)    .078   .032   5.912   1    .015   1.081    1.015    1.152
gravida(2)    .037   .027   1.915   1    .166   1.038    .985     1.094
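Interpretation
1. At any given time, the hazard of infant death for a primigravida mother is 1.081 times that for a
mother with more than 4 pregnancies, which is the reference category (p = 0.015; 95% CI 1.015 to 1.152).
2. Mothers with 2-4 pregnancies do not differ significantly from the reference category
(HR = 1.038; p = 0.166; the CI contains 1).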
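The Wald and Sig. columns for a single coefficient can be checked from B and SE alone: the Wald statistic is (B / SE)^2 on 1 degree of freedom, and its p-value is the two-sided normal tail of B / SE. The sketch below is a hand check under that assumption. The name wald_test is ours, and because the printed B and SE are rounded, the recomputed values only approximate the SPSS figures.

```python
import math

def wald_test(b, se):
    """Wald chi-square (1 df) and two-sided p-value for one coefficient."""
    z = b / se
    wald = z * z
    # For 1 df, the chi-square upper tail equals the two-sided normal tail
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return wald, p_value

# (B, SE) copied from the gravida Variables in the Equation table
gravida_1 = wald_test(0.078, 0.032)  # primigravida vs more than 4 pregnancies
gravida_2 = wald_test(0.037, 0.027)  # 2-4 pregnancies vs more than 4 pregnancies

for name, (wald, p) in [("gravida(1)", gravida_1), ("gravida(2)", gravida_2)]:
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: Wald = {wald:.3f}, p = {p:.3f} ({verdict} at 0.05)")
```

Only the primigravida contrast falls below 0.05, which agrees with the Sig. column of the table.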
Bivariable Cox regression between length of infant and time to infant death
Variables in the Equation
          B      SE     Wald   df   Sig.   Exp(B)   95.0% CI for Exp(B)
                                                    Lower    Upper
length    .003   .004   .654   1    .419   1.003    .995     1.012
Interpretation
Length of infant is not significantly associated with time to infant death (p = 0.419; the CI contains 1).
Multivariable Cox regression
Variables in the Equation
B SE Wald df Sig. Exp(B) 95.0% CI for Exp(B)
Lower Upper
weight .000 .000 .884 1 .347 1.000 1.000 1.000
site 29.989 2 .000
site(1) .253 .047 28.587 1 .000 1.288 1.174 1.414
site(2) .167 .041 16.810 1 .000 1.182 1.091 1.280
gravida .816 2 .665
gravida(1) .108 .158 .472 1 .492 1.114 .818 1.518
gravida(2) .085 .096 .780 1 .377 1.088 .902 1.313
parity .483 2 .785
parity(1) -.080 .161 .246 1 .620 .924 .674 1.265
parity(2) -.068 .098 .476 1 .490 .934 .771 1.133
famsize 8.225 10 .607
famsize(1) .610 .255 5.715 1 .017 1.840 1.116 3.034
famsize(2) .099 .120 .680 1 .410 1.104 .872 1.399
famsize(3) .116 .117 .985 1 .321 1.123 .893 1.413
famsize(4) .132 .116 1.300 1 .254 1.142 .909 1.434
famsize(5) .124 .115 1.167 1 .280 1.132 .904 1.419
famsize(6) .114 .115 .983 1 .321 1.121 .894 1.405
famsize(7) .070 .117 .359 1 .549 1.073 .853 1.349
famsize(8) .148 .122 1.483 1 .223 1.160 .914 1.471
famsize(9) .096 .132 .525 1 .469 1.101 .849 1.426
famsize(10) .035 .156 .049 1 .824 1.035 .763 1.404
faminc .000 .000 1.475 1 .225 1.000 1.000 1.000
agegrp 3.167 6 .788
agegrp(1) -.012 .159 .006 1 .941 .988 .723 1.350
agegrp(2) -.041 .154 .072 1 .789 .960 .709 1.298
agegrp(3) -.078 .152 .260 1 .610 .925 .687 1.247
agegrp(4) -.037 .151 .058 1 .810 .964 .716 1.297
agegrp(5) -.046 .152 .093 1 .760 .955 .709 1.286
agegrp(6) .017 .165 .011 1 .917 1.017 .737 1.405
length .004 .005 .449 1 .503 1.004 .993 1.014
armcircum -.022 .014 2.534 1 .111 .978 .952 1.005
sex -.034 .025 1.885 1 .170 .967 .921 1.015
eductn .231 3 .972
eductn(1) .008 .059 .019 1 .891 1.008 .898 1.132
eductn(2) .011 .057 .035 1 .851 1.011 .905 1.129
eductn(3) -.016 .066 .055 1 .814 .985 .864 1.121
q322 11.731 6 .068
q322(1) .090 .084 1.168 1 .280 1.095 .929 1.290
q322(2) .086 .091 .904 1 .342 1.090 .912 1.303
q322(3) .060 .140 .180 1 .671 1.061 .806 1.397
q322(4) .454 .142 10.284 1 .001 1.575 1.193 2.080
q322(5) .180 .184 .955 1 .329 1.197 .834 1.719
q322(6) .033 .132 .063 1 .802 1.034 .798 1.339
q312 1.524 5 .910
q312(1) -.069 .133 .269 1 .604 .933 .719 1.211
q312(2) -.047 .076 .393 1 .531 .954 .823 1.106
q312(3) -.009 .049 .037 1 .848 .991 .900 1.090
q312(4) -.035 .038 .819 1 .366 .966 .896 1.041
q312(5) -.032 .036 .801 1 .371 .968 .902 1.039
q313 7.418 4 .115
q313(1) .064 .048 1.753 1 .185 1.066 .970 1.173
q313(2) .002 .086 .001 1 .980 1.002 .846 1.187
q313(3) .010 .156 .004 1 .950 1.010 .744 1.370
q313(4) -.039 .047 .672 1 .412 .962 .877 1.055
q319 -.064 .040 2.598 1 .107 .938 .868 1.014
Interpretation
1. At any given time, the hazard of death for an infant in an urban area is 1.288 times that for an
infant in a rural area (p < 0.001; 95% CI 1.174 to 1.414, which does not contain 1).
2. At any given time, the hazard of death for an infant in another urban area is 1.182 times that for an
infant in a rural area (p < 0.001; 95% CI 1.091 to 1.280, which does not contain 1).
3. At any given time, the hazard of infant death in a family of size 2 is 1.840 times that in a family of
size 12 (p = 0.017; 95% CI 1.116 to 3.034), although the overall test for family size is not
significant (p = 0.607).
4. At any given time, the hazard of infant death when the mother is a maid/cleaner is 1.575 times that
in the reference category, "Other" (p = 0.001; 95% CI 1.193 to 2.080); the overall test for mother's
occupation (q322) gives p = 0.068.
2. Plot Kaplan-Meier (K-M) curves for the significant categorical variables and interpret them
Case Processing Summary
Study area     Total N   N of Events   Censored
                                       N      Percent
Urban          1927      1748          179    9.3%
Other urban    1763      1634          129    7.3%
Rural          4360      3913          447    10.3%
Overall        8050      7295          755    9.4%
Overall Comparisons
                                  Chi-Square   df   Sig.
Log Rank (Mantel-Cox)             113.032      2    .000
Breslow (Generalized Wilcoxon)    147.376      2    .000
Tarone-Ware                       138.702      2    .000
Test of equality of survival distributions for the different levels of
Study area.
Interpretation: since p < 0.05 on all three tests, there is a statistically significant difference in
time to infant death between the study areas.
Interpretation of the K-M curve: survival among rural infants stays at 100% until about 300 days,
whereas survival among urban and other urban infants stays at 100% only until around 75 days.
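For readers who want to see how a K-M curve is built, the following is a minimal sketch of the product-limit estimator. It does not reproduce the SPSS curves: the six follow-up times and event indicators are hypothetical toy values chosen for illustration, and the function name kaplan_meier is our own.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where at least one death occurs.

    times  : follow-up time for each infant
    events : 1 if death was observed at that time, 0 if the infant was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the share of the risk set surviving t
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored cases both leave the risk set after t
        n_at_risk -= removed
    return curve

# Hypothetical toy data (days of follow-up), NOT taken from the study
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]

km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"day {t}: estimated survival = {s:.3f}")
```

The curve only steps down at times where a death is observed; censored infants reduce the number at risk without lowering the estimate.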
Case Processing Summary
Family Size   Total N   N of Events   Censored
                                      N      Percent
2             52        26            26     50.0%
3             1490      1323          167    11.2%
4             1535      1389          146    9.5%
5             1429      1287          142    9.9%
6             1175      1069          106    9.0%
7             919       853           66     7.2%
8             639       591           48     7.5%
9             394       372           22     5.6%
10            206       192           14     6.8%
11            97        86            11     11.3%
12            104       98            6      5.8%
Overall       8040      7286          754    9.4%
Overall Comparisons
                                  Chi-Square   df   Sig.
Log Rank (Mantel-Cox)             22.044       10   .015
Breslow (Generalized Wilcoxon)    28.401       10   .002
Tarone-Ware                       24.501       10   .006
Test of equality of survival distributions for the different levels of
Family Size.
Interpretation: since p < 0.05 on all three tests, there is a statistically significant difference in
time to infant death between the family-size groups.
Interpretation of the K-M curve: survival for family size 2 drops earlier than for the other family sizes.
Case Processing Summary
Occupation of mother                Total N   N of Events   Censored
                                                            N      Percent
Fulltime housewife                  6524      5911          613    9.4%
Housewife with occasional income    759       691           68     9.0%
Student                             209       188           21     10.0%
Maid/cleaner                        98        87            11     11.2%
Clerical/Typist/Cashier             43        40            3      7.0%
Professional (Nurse, teacher)       115       108           7      6.1%
Other                               286       257           29     10.1%
Overall                             8034      7282          752    9.4%
Overall Comparisons
                                  Chi-Square   df   Sig.
Log Rank (Mantel-Cox)             53.893       6    .000
Breslow (Generalized Wilcoxon)    60.730       6    .000
Tarone-Ware                       57.351       6    .000
Test of equality of survival distributions for the different levels of
Occupation of mother.
Interpretation: since p < 0.05 on all three tests, there is a statistically significant difference in
time to infant death between the occupation groups.
Interpretation of the K-M curve: infant survival for mothers who are maids/cleaners drops earlier than for the other occupation groups.