RELIABILITY
Reliability
(“consistency”)
Reliability indicates the accuracy or
precision of the measuring instrument. It
refers to a condition where the measurement
process yields consistent responses over
repeated measurements.
Reliability
(“consistency”)
You need questions that yield
consistent scores when asked
repeatedly, so that scores from an
instrument are stable and
consistent.
Types of Reliability
Each type measures the consistency of...
Test-Retest: the same test over time.
Interrater: the same test conducted by different people.
Parallel Forms: different versions of a test which are designed to be equivalent.
Internal Consistency: the individual items of a test.
Test-retest
Test-retest reliability measures the
consistency of results when you repeat
the same test on the same sample at a
different point in time. You use it when
you are measuring something that you
expect to stay constant in your sample.
Test-retest
How to measure/assess:
To measure test-retest reliability, you
conduct the same test on the same group
of people at two different points in time.
Then you calculate the correlation
between the two sets of results.
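As a rough illustration, the sketch below computes that correlation in Python. The participant scores in time1 and time2 are hypothetical, made up only to show the calculation:

```python
# Minimal sketch of assessing test-retest reliability: correlate the scores
# from two administrations of the same test to the same people (hypothetical data).
from scipy.stats import pearsonr

time1 = [98, 102, 110, 95, 120, 105]   # scores at the first administration
time2 = [99, 100, 112, 97, 118, 107]   # scores two months later, same participant order

r, p = pearsonr(time1, time2)
print(f"Test-retest correlation: r = {r:.2f} (p = {p:.3f})")
# A high positive r suggests scores are stable over time (high test-retest reliability).
```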
Test-retest
Example:
You devise a questionnaire to measure the IQ of a
group of participants (a property that is unlikely to
change significantly over time). You administer the
test two months apart to the same group of people,
but the results are significantly different, so the test-
retest reliability of the IQ questionnaire is low.
Interrater
Interrater reliability (also called interobserver
reliability) measures the degree of agreement
between different people observing or assessing
the same thing. You use it when data is
collected by researchers assigning ratings,
scores or categories to one or more variables,
and it can help mitigate observer bias.
Interrater
How to measure/assess:
To measure interrater reliability, different
researchers conduct the same measurement
or observation on the same sample. Then you
calculate the correlation between their
different sets of results. If all the researchers
give similar ratings, the test has high
interrater reliability.
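A minimal sketch of this check, assuming three raters score the same six patients (all ratings below are hypothetical), is to compute the pairwise correlations between the raters' score vectors:

```python
# Minimal sketch of checking interrater reliability: correlate the ratings
# that different raters gave to the same cases (hypothetical data).
import numpy as np

ratings = np.array([
    [3, 4, 2, 5, 4, 3],   # rater A's scores for six patients
    [3, 4, 3, 5, 4, 2],   # rater B, same patients in the same order
    [2, 4, 2, 5, 5, 3],   # rater C
])

corr = np.corrcoef(ratings)   # rows are raters, so this gives rater-by-rater correlations
print(np.round(corr, 2))
# Off-diagonal values close to 1 indicate strong agreement, i.e. high interrater reliability.
```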
Interrater
Example:
A team of researchers observe the progress of wound
healing in patients. To record the stages of healing,
rating scales are used, with a set of criteria to assess
various aspects of wounds. The results of different
researchers assessing the same set of patients are
compared, and there is a strong correlation between all
sets of results, so the test has high interrater reliability.
Parallel / Equivalent Forms
Parallel forms reliability measures the
correlation between two equivalent
versions of a test. You use it when you
have two different assessment tools or
sets of questions designed to measure
the same thing.
Parallel / Equivalent Forms
How to measure/assess:
The most common way to measure
parallel forms reliability is to produce a
large set of questions to evaluate the
same thing, then divide these
randomly into two question sets.
Parallel / Equivalent Forms
How to measure/assess:
The same group of respondents
answers both sets, and you calculate the
correlation between the results. High
correlation between the two indicates
high parallel forms reliability.
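A rough sketch of this procedure in Python, using simulated (hypothetical) responses in which every item reflects one underlying trait: the items are split randomly into two forms and the respondents' form totals are correlated.

```python
# Minimal sketch of parallel forms reliability with simulated data: split the
# items randomly into two question sets and correlate the total scores.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
trait = rng.normal(size=30)                                      # latent trait of 30 respondents
items = trait[:, None] + rng.normal(scale=0.5, size=(30, 20))    # 20 noisy item scores per person

order = rng.permutation(20)                   # randomly divide the items into two sets
form_a = items[:, order[:10]].sum(axis=1)     # each respondent's total score on form A
form_b = items[:, order[10:]].sum(axis=1)     # each respondent's total score on form B

r, _ = pearsonr(form_a, form_b)
print(f"Parallel forms correlation: r = {r:.2f}")
# A high correlation between the two forms indicates high parallel forms reliability.
```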
Parallel / Equivalent Forms
Example:
A set of questions is formulated to
measure financial risk aversion in a group
of respondents. The questions are
randomly divided into two sets, and the
respondents are randomly divided into
two groups.
Parallel / Equivalent Forms
Example:
Both groups take both tests: group A takes
test A first, and group B takes test B first.
The results of the two tests are compared,
and the results are almost identical,
indicating high parallel forms reliability.
Thank You