Statistics

Uploaded by

dr bhushan
Made with Xodo PDF Reader and Editor

2 STATISTICS

Definition

* It is a science of data analysis.
* Science of learning from data and of measuring, controlling and communicating uncertainty.
* Branch of mathematics that deals with the collection, analysis, interpretation and presentation of masses of numerical data.
Data
* Data is defined as numbers, which are the raw materials of statistics.
* Sources of data are: Records (Census, Registration of vital events, Hospital records), Survey (gives information about changing trends in health, provides feedback, provides timely warning), Experiment, and Published Literature.
* Measurement scales and Types of data (4 scales)
1. Nominal scale
  - Provides names or labels for certain characteristics.
  - Variables assessed on a nominal scale are called categorical variables.
  - E.g. Gender, Religion, Class roll numbers, etc.
2. Ordinal scale
  - Numbers are assigned a rank and represented in rank order.
  - E.g. Visual Analogue Scale (VAS).
3. Interval scale
  - Data in this scale is represented in a continuous scale of measurement such that there are equal differences between values in the scale.
An Entrance Guide on Research Methodology & Medical Statistics
  - The zero value can be arbitrary.
  - E.g. Fahrenheit scale.
4. Ratio scale
  - In this scale, measurement is the estimation of the ratio between a magnitude of a continuous quantity and a magnitude of the same kind.
  - Zero actually indicates the point where nothing is scored on the scale.
  - E.g. Kelvin scale of temperature, measurement of height and weight.
Variable
* Variable refers to a particular character on which a set of data are recorded.
* Variables can be broadly classified into 2 types: Qualitative and Quantitative variables.
1. Qualitative Variables
  - Defined by some characteristics.
  - Also known as categorical variables.
  - Can be a Nominal or an Ordinal variable.
  - Nominal variable: Categories that cannot be ranked, i.e. no category is more valuable than another.
  - Ordinal variable: Categories follow a logical hierarchy and hence can be ranked.
2. Quantitative variables
  - Measured on numerical scales.
  - Also known as numerical variables.
  - Can be Continuous or Discrete.
  - Continuous variable includes fractional values. E.g. Hb%.
  - Discrete variable can take on only certain discrete values, i.e. contains only whole numbers. E.g. RBC, No. of patients dying from cancer.
Types of Statistics
Descriptive Statistics
* Deals with the description of the raw data in a meaningful way.
* It includes organisation and summarization of the raw data collected from different sources.
* Summarization of data is done by computing a single number, which is a descriptive measure.
* Parameter: Descriptive measure computed from the data of a population.
* Statistic: Descriptive measure computed from the data of a sample.
* The extent to which observations cluster is summarized by Measures of central tendency (Mean, Median, Mode).
* The spread can be described by measures of dispersion (Range, Variance, Standard Deviation, Coefficient of Variation, Percentile, Interquartile range).
Inferential Statistics
* Involves making inferences that go beyond the actual data.
* This incorporates the application of statistical methods used to draw conclusions from a sample and make inferences to the entire population.
* Related to hypothesis testing. Steps are:
  - Select proper research design and appropriate sample size.
  - Decide the test of statistical significance.
  - Determine the p value.
  - Compare it with the chosen significance level (critical value) of, say, 0.05.
  - If the p value is less than the critical value, reject the null hypothesis.
Measurement of Central Tendency
Mean
* Most common measure of central tendency.
* Group of data items is represented as a single number.
* Various types of means commonly used are:
1. Arithmetic mean: Calculated by dividing the sum of data items by the number of data items.
   $$\bar{x} = \frac{\sum x}{n}$$
   where $\bar{x}$ = Arithmetic mean, $\sum x$ = Sum of observations, $n$ = Number of observations.
2. Geometric mean: The geometric mean of n numbers is obtained by multiplying them all together and then taking the nth root. For positive values it is always equal to or less than the arithmetic mean.
3. Harmonic mean: It is calculated by dividing the number of observations by the sum of the reciprocals of each number in the series. Thus, the harmonic mean is the reciprocal of the arithmetic mean of the reciprocals.
* Disadvantage: The mean is affected by outliers in the data set.
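The three means above can be compared on one small data set. The following is a minimal sketch using Python's standard `statistics` module; the values and variable names are hypothetical and chosen only for illustration.

```python
import math
import statistics

# Hypothetical positive values, chosen only for illustration.
values = [2, 4, 8]

arithmetic = statistics.mean(values)            # sum of items / number of items
geometric = statistics.geometric_mean(values)   # n-th root of the product
harmonic = statistics.harmonic_mean(values)     # n / sum of reciprocals

# The harmonic mean written out from its definition, as a cross-check.
harmonic_by_hand = len(values) / sum(1 / v for v in values)

print(arithmetic, geometric, harmonic)
```

For positive data the ordering harmonic ≤ geometric ≤ arithmetic holds, which matches the note that the geometric mean never exceeds the arithmetic mean.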

Median
* It is the middle number of the group when they are ranked in order (for an odd number of observations).
* If there is an even number of observations, the mean of the middle two is taken.
* It denotes that half of the observations are larger and half are smaller.
* Useful measure particularly if the distribution of data is not symmetrical.
Mode
* The most frequently occurring value in a data series.
* Advantage: It can be used with non-numerical data.
* E.g. In data series 1, 2, 3, 3, 3, 4, 4; mode is 3.
* Two or more values may occur with equal frequency: Bimodal & Multimodal.
* If mean and median are available, Mode can be estimated (an empirical approximation for moderately skewed distributions) by:
  Mode = 3 Median - 2 Mean
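A minimal sketch of the median and mode rules, again with the standard `statistics` module. The first series is the one used in the text; the other lists and all variable names are assumptions made for illustration.

```python
import statistics

series = [1, 2, 3, 3, 3, 4, 4]   # the data series used in the text

median = statistics.median(series)   # middle value of the ranked series
mode = statistics.mode(series)       # most frequently occurring value

# Even number of observations: the mean of the middle two is taken.
median_even = statistics.median([1, 2, 3, 4])

# A bimodal series: two values occur with equal (highest) frequency.
modes = statistics.multimode([1, 1, 2, 2, 3])

# Empirical relation Mode = 3 Median - 2 Mean; an approximation, not an identity.
mean = statistics.mean(series)
mode_estimate = 3 * median - 2 * mean

print(median, mode, median_even, modes, round(mode_estimate, 2))
```

Here the estimate from the empirical relation is close to, but not exactly equal to, the true mode of 3, which illustrates why it is only a rule of thumb.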
Measure of Dispersion
Range
* It is the difference between the lowest and the highest scores in the distribution.
* Simplest measure of variability.
Percentiles
* Can be obtained by ranking the values and then grouping them into 100 equal parts, that are called centiles or percentiles.
* Median represents the 50th percentile.
* Used to compare an individual value with a norm.
Interquartile Range
* It is defined as the difference between the 25th and 75th percentiles, also called first and third quartiles, respectively.
* If the dispersion in the data series is less, we can use the 10th to 90th percentile value to denote spread.
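The range, quartiles and percentiles described above can be sketched as follows. The scores are hypothetical, and `statistics.quantiles` is used with its default settings; other software may use a slightly different percentile convention and give slightly different cut points.

```python
import statistics

scores = list(range(1, 12))   # hypothetical scores 1, 2, ..., 11

# Range: difference between the highest and the lowest score.
value_range = max(scores) - min(scores)

# Quartiles: three cut points dividing the ranked data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                 # interquartile range (75th minus 25th percentile)

# Percentiles: 99 cut points dividing the ranked data into 100 parts.
percentiles = statistics.quantiles(scores, n=100)
p10, p50, p90 = percentiles[9], percentiles[49], percentiles[89]

print(value_range, (q1, q2, q3), iqr, (p10, p50, p90))
```

The 50th percentile coincides with the median, as stated above.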
Variance
* Mean of the squares of all the deviation scores in the distribution.
* Represented as σ² for a population and s² for a sample.
Standard Deviation
* It is simply the square root of the variance.
* Represented as σ for a population and by s for a sample.
* When the standard deviation of a distribution is greater than its mean, the mean does not adequately represent the measure of central tendency.
Note: The formulae for the variance (and standard deviation) for a population have the value 'n' as the denominator. However, the expression (n-1) is used when calculating the variance and SD of a sample,
i.e. denominator n for a population and (n-1) for a sample,
where n-1 is the degree of freedom.
Coefficient of Variation:
* CV is the standard deviation divided by the mean (multiplied by 100).
* CV of a data series denotes the SD expressed as a percentage of the mean.
  $$CV = \frac{SD}{Mean} \times 100$$
* It is commonly used to describe variability of measuring instruments.
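The difference between the n and (n-1) denominators, and the coefficient of variation, can be checked numerically. This is a minimal sketch with hypothetical readings; the variable names are illustrative only.

```python
import math
import statistics

readings = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical readings

mean = statistics.mean(readings)

# Population formulae: denominator n.
pop_var = statistics.pvariance(readings)
pop_sd = statistics.pstdev(readings)

# Sample formulae: denominator (n - 1), the degrees of freedom.
sample_var = statistics.variance(readings)
sample_sd = statistics.stdev(readings)

# Coefficient of variation: SD expressed as a percentage of the mean.
cv_percent = sample_sd / mean * 100

print(mean, pop_var, pop_sd, round(sample_var, 3), round(cv_percent, 1))
```

Because the sample formula divides by a smaller number, the sample variance is always a little larger than the population variance computed from the same values.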
Measures of Precision
* Coefficient of variation
* Standard Error
  - Defined as the standard deviation of repeated sample means or proportions taken from the same population.
  - Used to define a range in which the true population value is likely to lie, and this range is known as the Confidence interval.
    $$SE = \frac{SD}{\sqrt{n}}$$
* Conventionally, the 95% confidence interval is most commonly used.
* For a normal distribution curve:
  - 95% CI of the mean would cover a range 1.96 SE either side of the sample mean and will have a 95% probability of including the population mean.
  - 99% CI will span 2.58 SE either side of the sample mean and will have 99% probability of including the population mean.
* CI would be narrower if SEM is smaller.
* Hence, if the sample is larger, it is evident that the SEM would be smaller and the CI would be narrower.
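A minimal sketch of the standard error and the 95% and 99% confidence intervals, using the 1.96 and 2.58 multipliers quoted above. The haemoglobin values and the helper name `confidence_interval` are hypothetical; with very small samples a t-based multiplier is often preferred, so treat this as the large-sample approximation described in the text.

```python
import math
import statistics
from statistics import NormalDist

def confidence_interval(sample, multiplier=1.96):
    """Return (lower, upper, standard error) for the sample mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # SE = SD / sqrt(n)
    return mean - multiplier * se, mean + multiplier * se, se

# Hypothetical haemoglobin values (g/dL).
haemoglobin = [12.1, 13.4, 11.8, 12.9, 13.0, 12.5, 13.8, 12.2, 12.7, 13.1]

low95, high95, se = confidence_interval(haemoglobin)
low99, high99, _ = confidence_interval(haemoglobin, multiplier=2.58)

# A sample four times as large (same values repeated) gives a smaller SE.
_, _, se_large = confidence_interval(haemoglobin * 4)

# The multipliers come from the standard normal distribution.
z95 = NormalDist().inv_cdf(0.975)
z99 = NormalDist().inv_cdf(0.995)

print((round(low95, 2), round(high95, 2)), (round(low99, 2), round(high99, 2)))
```

The 99% interval is wider than the 95% interval, and the larger sample gives a smaller standard error and hence a narrower interval.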
Data Presentation
* Data need to be presented in tables and graphs.
* Regarding data presentation in tables, it is helpful to remember the following:
  - For Quantitative/Numerical data and symmetric distribution: Mean is used.
  - For Ordinal data, or Numerical data and Skewed distribution: Median is used.
  - For examining bimodal or multi-modal distributions: Mode is used.
  - For numerical data to emphasize extreme values: Range is used.
  - SD is to be used along with Mean.
  - Interquartile range or Percentiles should be used along with Median.
  - SD and Percentiles may be used for normative data.
  - If intention is to compare variability between sets: Coefficient of variation.
  - If intention is to draw inferences about populations from samples: 95% CI should be used.
Methods of Presentation

Tabulation
* Qualitative: Simple table, Complex table
* Quantitative: Frequency distribution table
Drawings
* Qualitative: Bar diagram, Pie/Sector diagram, Pictogram, Map/Spot diagram
* Quantitative: Frequency polygon, Frequency curve, Line chart, Cumulative frequency curve, Scatter/Dot diagram

Table
* Simplest and most revealing method for both qualitative and quantitative data.
Variability
* It is a measure of the spread of a data set.
* Within-group variability is known as error variance.
* Between-group variance is also known as effect variance.
* Types of variability:
  - Biological variability: Any difference between cells, individual organisms or groups of organisms caused by genetic differences.
  - Real variability: When variability is not by chance and is more than the defined limits in the universe. SE (of mean and proportion) is used to verify the real variability.
  - Experimental variability: When the variation is due to materials and methods, procedures or defects in the techniques involved in the experiment. It is of three types:
1. Observer error
  - May be subjective or objective.
  - Subjective error: The observer or researcher may alter some information and thereby introduce additional error, when human participants are not trained properly.
  - Objective error: An untrained observer may record measurements with error. The error that occurs due to different positions of the observer is known as Parallax error.
2. Instrumental Error
  - Defects in weighing machines, height measures and other tools may cause undesirable variability.
3. Sampling Error
  - When a sample is not a good representative of a population.
  - Sample must be representative and sufficiently large.
Error bias or Problem bias
* These are the factors that affect the sample size calculations:
  - Type I Error: Occurs when one concludes that a difference exists between the groups when, in reality, it does not.
    - Considered as a false positive result.
    - Denoted by α.
    - Also known as Regulator's Error.
  - Type II Error: Occurs when one concludes that a difference does not exist between the groups being compared when, in reality, a difference does exist.
* Regression equation of x on y, if y is known; the regression co-efficient is denoted by bxy.
* Regression equation of y on x, if x is known; the regression co-efficient is denoted by byx.
* For correlation, both variables should be random variables.
* For regression, only the response variable y must be random.
Intra-class Correlation Coefficient:
* Used to assess the agreement between multiple raters giving scores to a number of subjects.
* Calculation is based on the true (between subjects) variance and the variance of the measurement error (during repeat measurement).
* ICC takes a value between 0 and 1.
* Value 1 indicates complete inter-rater agreement (but is seldom achieved).
* Arbitrarily, it is taken that a value > 0.7 indicates strong agreement.
Statistical Test
* They are Parametric and Non-Parametric.
Parametric tests
* Parametric tests are used to compare samples of normally distributed data.
* Most common parametric tests are: t test, ANOVA test, Z test.
* Parametric tests assume that:
  - Data are numerical.
  - Normal distribution.
  - Observations within a group are independent of each other.
  - Samples have been drawn randomly.
* If data is numerical/quantitative, the aim is to see whether there is any difference between groups, the data is unpaired and there are 2 groups: Unpaired 't' test.
* If data is numerical/quantitative, the aim is to see whether there is any difference between groups, the data is paired and there are 2 groups: Paired 't' test.
* If data is numerical/quantitative, the aim is to see whether there is any difference between groups, the data is unpaired and there are >2 groups: ANOVA.
* If data is numerical/quantitative, the aim is to see whether there is any difference between groups, the data is paired and there are >2 groups: Repeated measures ANOVA.
Z test
* It is also known as the Normal deviate test and Large sample test.
* Sample size must be sufficiently large, i.e. more than 30.
* If the SD of the population is known, Z test can still be applied even if sample size is smaller than 30.
* As the absolute Z value increases, the P value (the probability of the result happening by chance) decreases.
  $$Z = \frac{\text{Sample mean} - \text{Population mean}}{\text{SE of sample mean}}$$
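The Z formula above can be sketched in a few lines. The function name `z_test` and the numbers are hypothetical; the two-sided p value is taken from the standard normal distribution via `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, population_mean, population_sd, n):
    """One-sample Z test; returns (z, two-sided p value)."""
    se = population_sd / math.sqrt(n)              # SE of the sample mean
    z = (sample_mean - population_mean) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical example: sample of 100 with mean 103, population mean 100, SD 15.
z, p = z_test(sample_mean=103, population_mean=100, population_sd=15, n=100)

# A larger difference gives a larger Z and a smaller p value.
z_bigger, p_smaller = z_test(sample_mean=106, population_mean=100, population_sd=15, n=100)

print(round(z, 2), round(p, 4))
```

With a significance level of 0.05, the first example would lead to rejecting the null hypothesis, since its p value falls just below 0.05.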
t-test
* Also known as Student's t test.
* Applied if:
  - Samples are random.
  - Data are numerical/quantitative.
  - Sample size is less than 30.
  - Variables are normally distributed.
* Unpaired t-test
  - Also known as Independent samples t test.
  - To test if means estimated for two independent samples differ significantly.
* Paired t-test
  - To test if the means of two dependent samples, or in other words two related data sets, differ significantly.
  - For the scenario of before and after treatment.
ANOVA test
* Stands for Analysis of variance test.
* Also known as F test.
* Used to test for significant differences between the means of two or more groups.
* It may be regarded as a multiple group extension of the t test.
* In ANOVA, we actually compare the ratio of two variances.
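The "ratio of two variances" can be made concrete with a one-way ANOVA computed from first principles. This is a minimal sketch; the function name `one_way_anova_f` and the groups are hypothetical, and the resulting F would be compared with an F table at the returned degrees of freedom.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: values around their own group mean.
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # between-group (effect) variance
    ms_within = ss_within / (n - k)     # within-group (error) variance
    return ms_between / ms_within, k - 1, n - k

# Three hypothetical groups.
f_value, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

# With only two groups, F equals the square of the pooled-variance t statistic.
f_two, _, _ = one_way_anova_f([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]])

print(f_value, df1, df2, f_two)
```

The two-group case shows why ANOVA can be regarded as a multiple group extension of the t test.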
Non-Parametric tests:
* Categorical variables commonly represent counts or frequencies.
* Non-parametric tests are used when data is not normally distributed.
* Non-parametric tests are: Chi-square test, Wilcoxon test, Mann-Whitney test, Kruskal-Wallis test, Fisher's Exact test, McNemar's test.
* If data is categorical, the aim is to see whether there is a difference between groups, the data is unpaired and there are 2 groups: Chi-square test / Fisher's Exact test.
* If data is categorical, the aim is to see whether there is a difference between groups, the data is paired and there are 2 groups: McNemar's test.
* If data is categorical, the aim is to see whether there is a difference between groups, the data is unpaired and there are >2 groups: Chi-square test.
* If data is categorical, the aim is to see whether there is a difference between groups, the data is paired and there are >2 groups: Cochran's test.
Chi-square test
* Data follows a specific distribution known as the chi-square distribution.
* Chi-square distribution is a continuous probability distribution given by positive values that are skewed to the right.
* The shape of the chi-squared distribution is characterized by the 'degrees of freedom'. As the degrees of freedom increase, it becomes more symmetrical and approaches the normal distribution.
* Used for both small and large samples.
* Sample must be drawn independently at random.
* Expected frequency in each cell must be more than 5 or else use Yates' correction formula. Yates' correction involves subtracting 0.5 from the positive discrepancies and adding 0.5 to the negative discrepancies before these values are squared.
* The data must be whole numbers and not in percentages.
  $$\chi^2 = \sum \frac{(O - E)^2}{E}$$
  where O = Observed, E = Expected.
* Degree of freedom for chi-square is (column - 1)(row - 1).
* All cells should have an expected frequency greater than 1 and at least 80% of the cells should have expected frequencies of at least 5. If this is not the case, it may help to combine some categories: this is known as collapsing categories.
* Chi-square test does not measure the strength of association.
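The chi-square formula, Yates' correction and the degrees of freedom can be sketched together. The function name `chi_square` and the 2x2 counts are hypothetical; expected frequencies are derived from the row and column totals in the usual way.

```python
def chi_square(table, yates=False):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(observed - expected)
            if yates:
                # Shrink each discrepancy by 0.5 before squaring.
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1)(columns - 1)
    return chi2, df

# Hypothetical 2x2 table of whole-number counts (not percentages).
counts = [[10, 20],
          [20, 10]]

chi2_plain, df = chi_square(counts)
chi2_yates, _ = chi_square(counts, yates=True)

print(round(chi2_plain, 3), round(chi2_yates, 3), df)
```

The corrected statistic is smaller than the uncorrected one, so Yates' correction makes the test more conservative.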
1. Wilcoxon Rank Sum test
  - Alternative to the two sample 't' test.
  - Used to test the difference in scores of data before and after the experimental manipulation.
  - The centre point is said to be zero.
  - This test is a linear transformation of the Mann-Whitney test.
2. Wilcoxon Signed Rank Sum test
  - Non-parametric test and distribution free test.
  - Used to test the null hypothesis that the median of a distribution is equal to some value.
  - Used to compare one sample or paired two samples. If population is too small or not normally distributed, Wilcoxon signed rank test is applied.
  - This test is used in place of the paired 't' test.
  - The data are measured at least on an ordinal scale.
3. Mann-Whitney test
  - Also known as U test.
  - Used to compare two population means that come from the same population, to test whether the two means are equal or not.
  - It is an alternative for the independent samples t test.
4. Kruskal-Wallis test
  - Also known as H test.
  - Used when the assumptions of ANOVA are not met.
  - Used as an extension of the Mann-Whitney test to three or more groups.
5. Fisher's Exact test
  - Used to determine if there are non-random associations between two categorical variables.
  - This test is most commonly applied to 2x2 contingency tables and is used in small samples.
  - Used to examine the significance of association between the two kinds of classifications.
* McNemar's test
  - Used to assess if a statistically significant change in proportions has occurred on a dichotomous trait at two time points on the same population.
  - This test is used whenever the same individuals are measured.
  - If McNemar's test value is greater than the table value, the null hypothesis is rejected.
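Both tests for 2x2 tables can be sketched with plain arithmetic. Neither formula is spelled out in the notes, so the following uses the standard textbook forms as an assumption: Fisher's exact test sums hypergeometric probabilities of tables at least as extreme as the observed one, and McNemar's statistic uses only the discordant pairs b and c, here with the usual continuity correction. Function names and counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

def mcnemar_chi2(b, c, continuity=True):
    """McNemar chi-square (df = 1) from the two discordant pair counts."""
    if b + c == 0:
        raise ValueError("no discordant pairs")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)

# Hypothetical small-sample 2x2 table.
p_fisher = fisher_exact_two_sided(3, 1, 1, 3)

# Hypothetical paired data: 15 subjects changed one way, 5 the other way.
chi2_mcnemar = mcnemar_chi2(15, 5)

print(round(p_fisher, 4), chi2_mcnemar)
```

The McNemar value would be compared with the chi-square table value at 1 degree of freedom (3.84 at the 0.05 level), as described above.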
For Diagnostic tests
* Sensitivity: A test would be considered sensitive, in general, if it is positive for most individuals having the disease.
  - It is the true positive rate.
* Specificity: A test would be considered specific, in general, if it is positive for only a small fraction of those without the disease.
  - It is the true negative rate.
  - Sensitivity and Specificity are inherent properties of the test.
* Predictive Values
  - They depend on the prevalence of the disease being tested for.
  - Positive predictive value: The probability of disease being present when the test is positive.
  - Negative predictive value: The probability of disease being absent when the test is negative.
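The four measures above follow directly from the counts in a 2x2 table of test result against true disease status. This is a minimal sketch; the function name `diagnostic_measures` and both sets of counts are hypothetical, chosen so that the test has the same sensitivity and specificity at two different prevalences.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity and predictive values from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # disease present given a positive test
        "npv": tn / (tn + fn),           # disease absent given a negative test
    }

# Hypothetical population with 50% prevalence (100 diseased, 100 healthy).
high_prevalence = diagnostic_measures(tp=90, fp=20, fn=10, tn=80)

# Same test, hypothetical population with 10% prevalence (100 diseased, 900 healthy).
low_prevalence = diagnostic_measures(tp=90, fp=180, fn=10, tn=720)

print(high_prevalence)
print(low_prevalence)
```

Sensitivity and specificity are identical in the two populations, while the positive predictive value falls sharply when the disease is rarer, which is the sense in which predictive values depend on prevalence.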
Likelihood Ratios
  - One can summarize information about a diagnostic test using a measure known as the likelihood ratio, which combines information about the sensitivity and specificity.
  - Likelihood ratio of a positive test result:
    $$LR^{+} = \frac{\text{Sensitivity}}{1 - \text{Specificity}}$$
  - Likelihood ratio of a negative test result:
    $$LR^{-} = \frac{1 - \text{Sensitivity}}{\text{Specificity}}$$
Cohen's Kappa Statistic
  - It is a measure of inter-rater agreement for categorical variables.
  - If raters are in complete agreement then κ = 1.
  - The kappa tends to be higher when there are fewer
