Test of differences
Non-parametric tests
Nominal variables (categorical variables)
1. Binomial Test
The binomial test is useful for determining whether the proportion of cases in one of two categories
differs from a specified value.
Condition: 1 variable with 2 categories (type: numeric)
E.g.: gender (male / female)
H0 (null hypothesis): the respondents are split in equal proportions between the two categories.
Interpretation:
If the p value is less than 0.05, there is a difference and we reject the null hypothesis.
If the p value is greater than 0.05, there is no difference and we fail to reject the null hypothesis.
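The notes above describe the SPSS dialog; as a rough cross-check, a minimal Python sketch of the same idea (using scipy.stats.binomtest, SciPy 1.7+, with made-up counts) might look like this:

from scipy import stats

# Hypothetical data: 60 males out of 100 respondents
males, n = 60, 100

# H0: the two categories occur in equal proportion (p = 0.5)
result = stats.binomtest(males, n, p=0.5)
print(result.pvalue)  # p < 0.05 -> reject H0; p > 0.05 -> fail to reject H0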
2. Chi-Squared, One-Variable Test:
One variable with more than 2 categories (type: numeric)
E.g.: zone (south, north, east, and west)
Note: every value in the expected column should be >= 5 (if any expected value is < 5, we should not
perform this test).
If an expected value is < 5, collapse the variable into 2 categories and perform the binomial test instead of the chi-square test.
H0: there is no significant difference between the observed (O) and expected (E) frequencies.
Interpretation:
If none of the categories has an expected frequency less than 5, the test is valid.
If the p value is less than 0.05, there is a difference and we reject the null hypothesis.
If the p value is greater than 0.05, there is no difference and we fail to reject the null hypothesis.
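An equivalent one-variable chi-square sketch in Python (scipy.stats.chisquare, illustrative counts only; expected frequencies default to equal across categories):

from scipy import stats

# Hypothetical zone counts: south, north, east, west (total = 100)
observed = [30, 25, 20, 25]

# Expected frequencies are 100 / 4 = 25 per zone, all >= 5, so the test is valid
# H0: no significant difference between observed and expected frequencies
stat, p = stats.chisquare(observed)
print(stat, p)  # p < 0.05 -> reject H0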
3. Cross tabulation (Two variables: one nominal variable with 2 categories, another variable
with more than 2 categories)
E.g.: gender and zone
Perform cross tab, row = gender (dichotomous variable), column= zone
Statistics -> click on chi-square
Cells -> click on expected
Interpretation:
If the p value is less than 0.05, there is a difference and we reject the null hypothesis.
If the p value is greater than 0.05, there is no difference and we fail to reject the null hypothesis.
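A rough Python counterpart of the crosstab chi-square (scipy.stats.chi2_contingency, with a hypothetical gender x zone table):

from scipy import stats

# Rows: male / female; columns: south, north, east, west (made-up counts)
observed = [[20, 15, 10, 15],
            [10, 10, 12, 8]]

# H0: gender and zone are independent (no association)
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(p)         # p < 0.05 -> reject H0
print(expected)  # check that all expected counts are >= 5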
4. McNemar test (2 dichotomous variables of the same nature [e.g., Satisfied / Dissatisfied], data
collected at 2 different time points)
E.g.: one variable for attendance in January and another for attendance in February
Nonparametric Tests (NPT) -> 2 Related Samples -> select McNemar
Interpretation:
If the p value is less than 0.05, there is a difference and we reject the null hypothesis.
If the p value is greater than 0.05, there is no difference and we fail to reject the null hypothesis.
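A minimal Python sketch of the McNemar test (statsmodels mcnemar, hypothetical 2x2 attendance table):

from statsmodels.stats.contingency_tables import mcnemar

# Rows: January (present, absent); columns: February (present, absent) -- made-up counts
table = [[30, 10],
         [5, 15]]

# H0: the proportion of 'present' is the same in January and February
result = mcnemar(table, exact=True)
print(result.pvalue)  # p < 0.05 -> reject H0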
5. Cochran's Q test (3 or more dichotomous variables of the same nature [e.g., Satisfied / Dissatisfied],
data collected at more than 2 different time points)
E.g.: one variable for attendance in January, another for February, and another for March
Nonparametric Tests (NPT) -> K Related Samples -> select Cochran's Q test
Interpretation:
If the p value is less than 0.05, there is a difference and we reject the null hypothesis.
If the p value is greater than 0.05, there is no difference and we fail to reject the null hypothesis.
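A minimal Python sketch of Cochran's Q (statsmodels cochrans_q, hypothetical attendance of 8 people over 3 months, 1 = present, 0 = absent):

import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q

data = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [1, 1, 1],
                 [0, 0, 0],
                 [1, 1, 1],
                 [1, 0, 1],
                 [1, 1, 0],
                 [0, 1, 0]])   # columns: Jan, Feb, Mar

# H0: the proportion of 'present' is the same across the three months
result = cochrans_q(data)
print(result.pvalue)  # p < 0.05 -> reject H0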
Ordinal variables:
1. 1-sample K-S (Kolmogorov-Smirnov) test (1 ordinal variable)
E.g. employee satisfaction
Select test distribution as uniform
Interpretation:
If the p value is less than 0.05, there is a difference and we reject the null hypothesis.
If the p value is greater than 0.05, there is no difference and we fail to reject the null hypothesis.
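A rough Python sketch of the one-sample K-S test against a uniform distribution (scipy.stats.kstest, made-up 1-5 satisfaction scores; the K-S test assumes continuous data, so this is only approximate for an ordinal scale):

from scipy import stats

satisfaction = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 2, 3, 4, 1, 5]

# H0: the scores follow a uniform distribution over the observed range
lo, rng = min(satisfaction), max(satisfaction) - min(satisfaction)
stat, p = stats.kstest(satisfaction, 'uniform', args=(lo, rng))
print(p)  # p < 0.05 -> reject H0 (distribution differs from uniform)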
2. 2-sample K-S test (1 ordinal variable, 1 dichotomous nominal variable)
E.g.: employee satisfaction with respect to gender
Grouping variable = dichotomous variable
Choose test type = Kolmogorov-Smirnov Z
Interpretation:
If the p value is less than 0.05, there is a difference and we reject the null hypothesis.
If the p value is greater than 0.05, there is no difference and we fail to reject the null hypothesis.
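A rough Python sketch of the two-sample K-S test (scipy.stats.ks_2samp, hypothetical satisfaction scores split by gender):

from scipy import stats

male_scores   = [3, 4, 2, 5, 3, 4, 4, 2, 3, 5]
female_scores = [2, 3, 3, 4, 2, 3, 1, 2, 4, 3]

# H0: both groups have the same distribution of satisfaction
stat, p = stats.ks_2samp(male_scores, female_scores)
print(p)  # p < 0.05 -> reject H0 (distributions differ)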
3. Friedman test (3 or more ordinal variables measured at different time intervals)
E.g.: job satisfaction in 3 different months
Nonparametric Tests (NPT) -> K Related Samples -> select Friedman test
Interpretation:
If the p value is less than 0.05, there is a difference and we reject the null hypothesis.
If the p value is greater than 0.05, there is no difference and we fail to reject the null hypothesis.
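A minimal Python sketch of the Friedman test (scipy.stats.friedmanchisquare, hypothetical ratings of the same 6 employees in 3 months):

from scipy import stats

month1 = [3, 4, 2, 5, 3, 4]
month2 = [2, 4, 3, 4, 3, 3]
month3 = [4, 5, 3, 5, 4, 4]

# H0: job satisfaction does not differ across the three months
stat, p = stats.friedmanchisquare(month1, month2, month3)
print(p)  # p < 0.05 -> reject H0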
Parametric tests are performed on interval/scale variables.
The distribution should be normal.
Check normality before performing any of the parametric t-tests. If the means are the same in the
t-test, look at the F value (Levene's test for equality of variances in the output) to check for differences in variance.
1. t-test with one sample (one scale variable)
E.g.: assume the average of the previous batch is 26.2; is the average of the current batch different from it?
(If literature data from 2006 gave a value of 26,000, what is the value now? A one-sample t-test checks
whether the current mean differs from that reference value.)
E.g. variables: age, salary, etc.
Analyze -> compare means -> one-sample T test
Interpretation:
If the p value is less than 0.05, there is a difference and we reject the null hypothesis.
If the p value is greater than 0.05, there is no difference and we fail to reject the null hypothesis.
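A minimal Python sketch of the one-sample t-test (scipy.stats.ttest_1samp, made-up ages, test value 26.2 as in the example above):

from scipy import stats

ages = [25, 27, 26, 28, 24, 29, 26, 27, 25, 28]

# H0: the mean age of the current batch equals the previous batch's average of 26.2
stat, p = stats.ttest_1samp(ages, popmean=26.2)
print(p)  # p < 0.05 -> reject H0 (mean differs from 26.2)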
2. t-test for 2 variables (one interval variable, another nominal dichotomous variable)
We perform this test to check whether the interval variable differs between the two groups of the
dichotomous variable.
E.g. leadership interaction w.r.t gender
Analyze->compare means -> independent Samples T test
Grouping variable = dichotomous variable (gender)
Normality test :
Analyze -> Explore -> dependent scale variable
Statistics -> check descriptives and outliers
Plots -> histogram and normality curve
If the p value in the test of normality is less than 0.05, that variable is not normally distributed.
If the Shapiro-Wilk p value is > 0.05, the data are normally distributed.
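A rough Python sketch of the same workflow (Shapiro-Wilk normality check, Levene's test for equal variances, then the independent-samples t-test; the scores are made up):

from scipy import stats

# Hypothetical leadership-interaction scores for the two gender groups
group_male   = [12, 15, 14, 10, 13, 16, 14, 12]
group_female = [11, 13, 10, 12, 9, 14, 11, 10]

# Shapiro-Wilk: p > 0.05 -> normally distributed
_, p_m = stats.shapiro(group_male)
_, p_f = stats.shapiro(group_female)
print(p_m, p_f)

# Levene's test: p > 0.05 -> equal variances can be assumed
_, p_lev = stats.levene(group_male, group_female)

# H0: the two group means are equal
stat, p = stats.ttest_ind(group_male, group_female, equal_var=(p_lev > 0.05))
print(p)  # p < 0.05 -> reject H0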
3. ANOVA (one variable should be interval and one should be nominal or ordinal)
E.g.: income and educational qualification
zone vs income
dependent list -> scale variable
factor list -> either nominal or ordinal variable
Interpretation:
If the F value is greater than 1 and the significance value is < 0.05, there is evidence of a difference between the groups.
An F value well above 1 means the between-group variance is high relative to the within-group variance; an F value close to 1 means it is low.
Alternatively, check the F-table for the relevant degrees of freedom (e.g., 4 and 30); if the computed F exceeds the table value, reject the null hypothesis.
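A minimal Python sketch of a one-way ANOVA (scipy.stats.f_oneway, hypothetical incomes grouped by zone):

from scipy import stats

south = [30000, 32000, 31000, 29000]
north = [35000, 36000, 34000, 37000]
east  = [28000, 27000, 30000, 29000]
west  = [33000, 31000, 32000, 34000]

# H0: the mean income is the same in all zones
f_value, p = stats.f_oneway(south, north, east, west)
print(f_value, p)  # F well above 1 and p < 0.05 -> reject H0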
Test of relations
Correlation:
Measures the strength and direction of the relationship between variables.
1. Linear correlation (interval variables) (Pearson correlation) (parametric) (assumes a normal distribution)
2. Rank correlation (ordinal variables) (non-parametric; does not assume a normal distribution)
a. Spearman (NPT) (use with >= 30 data points)
b. Kendall's tau-b (NPT) (use only with around 10-15 data points)
Note: two asterisks mark a relationship significant at the 99% confidence level, one asterisk at the 95% confidence level.
Note: if the distribution is not normal, choose Spearman even for an interval variable.
Analyze -> Correlate -> Bivariate
1. Check normality.
2. Plot a scatter graph.
3. Check the direction from the sign of the correlation coefficient.
If the p value is < 0.05, a relationship exists; the correlation coefficient gives its strength.
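A rough Python sketch of the three correlation options (scipy.stats pearsonr / spearmanr / kendalltau, made-up paired data):

from scipy import stats

x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [1, 3, 6, 6, 9, 10, 12, 12]

# Pearson: parametric, interval data, normal distribution
r, p = stats.pearsonr(x, y)
print(r, p)

# Spearman rank correlation: non-parametric / non-normal data
rho, p_s = stats.spearmanr(x, y)
print(rho, p_s)

# Kendall's tau-b: small samples
tau, p_k = stats.kendalltau(x, y)
print(tau, p_k)
# p < 0.05 -> a relationship exists; the coefficient gives strength and direction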
Regression:
1. Check for normality.
2. Check that the correlation between the variables is better than moderate.
3. Run the scatter plot.
y is always the dependent variable: y = a + bx.
In the Plots dialog, put *ZPRED on the X axis and *ZRESID on the Y axis (the residuals belong to the dependent variable).
In regression, look at the R-squared (R2) value.
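A minimal Python sketch of simple linear regression y = a + bx (scipy.stats.linregress, made-up data):

from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]                     # independent variable
y = [2.1, 4.3, 5.9, 8.2, 9.8, 12.1, 14.2, 15.9]  # dependent variable

result = stats.linregress(x, y)
print(result.intercept, result.slope)  # a and b in y = a + bx
print(result.rvalue ** 2)              # R-squared
print(result.pvalue)                   # significance of the slope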