Reliability – refers to the consistency of
scores obtained by the same person when
re-examined with the same test on different
occasions, or with different sets of equivalent
items, or under other variable examining
conditions.
Classical Test Score Theory – this assumes
that each person has a true score that would
be obtained if there were no errors in
measurement.
Measurement error – the difference between
the observed score and the true score.
E = X - T
(error) = (observed score) - (true score)
Standard error of measurement – the
standard deviation of the distribution of
errors for each repeated application of the
same test on an individual.
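The definitions above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true score of 100, the error spread of 5, the test standard deviation of 15, and the reliability of 0.89 are all hypothetical values, and the formula SEM = SD * sqrt(1 - r) is the commonly used classical estimate, which these notes do not state themselves.

```python
import math
import random
import statistics


def standard_error_of_measurement(sd_observed, reliability):
    """Common classical estimate (an assumption here): SEM = SD * sqrt(1 - r)."""
    return sd_observed * math.sqrt(1.0 - reliability)


# Hypothetical person with true score T = 100, tested many times.
rng = random.Random(0)          # fixed seed so the run is repeatable
true_score = 100.0              # T (assumed)
error_sd = 5.0                  # spread of random error (assumed)

observed = [true_score + rng.gauss(0.0, error_sd) for _ in range(10_000)]  # X = T + E
errors = [x - true_score for x in observed]                                # E = X - T

mean_observed = statistics.fmean(observed)   # should sit close to the true score
sd_errors = statistics.pstdev(errors)        # standard deviation of the error distribution

# SEM estimated from a test's standard deviation and reliability (both assumed)
sem = standard_error_of_measurement(15.0, 0.89)

print(round(mean_observed, 2), round(sd_errors, 2), round(sem, 2))
```

With random error only, the average of the repeated observed scores approaches the true score, and the standard deviation of the errors is what the standard error of measurement describes.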
Factors that contribute to consistency:
These consist entirely of those stable attributes of
the individual, which the examiner is trying to
measure.
Factors that contribute to inconsistency:
These include characteristics of the individual,
test, or situation, which have nothing to do with
the attribute being measured, but which
nonetheless affect test scores.
Domain Sampling Model
There is a problem in using a limited number of
items to represent a larger and more complicated
construct.
A. Item selection
One source of measurement error is the
instrument itself. A test developer must settle
upon a finite number of items from a potentially
infinite pool of test questions.
B. Test Administration
General environmental conditions may exert an
untoward influence on the accuracy of
measurement, such as uncomfortable room
temperature, dim lighting, and excessive noise.
C. Test Scoring
Whenever a psychological test uses a format other
than machine-scored multiple choice items, some
degree of judgment is required to assign points to
answers.
Computerized Adaptive Testing
With the help of a computer, the item
difficulty is calibrated to the mental ability of
the test taker.
If you get several easy items correct, the
computer then moves to more difficult
items.
If you get several difficult items wrong, the
computer moves back to average items.
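The rule described above can be sketched as a simple streak-based selector. This is a simplified illustration, not how operational adaptive tests are built (those typically rely on item response theory); the three level names, the `next_level` function, and the streak length of 3 standing in for "several" are all assumptions.

```python
LEVELS = ["easy", "average", "difficult"]  # hypothetical difficulty bands


def next_level(current, recent_answers, streak=3):
    """Pick the difficulty of the next item.

    recent_answers is a list of booleans (True = correct), oldest first.
    Moves up one level after `streak` consecutive correct answers and
    down one level after `streak` consecutive wrong answers.
    """
    idx = LEVELS.index(current)
    last = recent_answers[-streak:]
    if len(last) == streak and all(last):
        idx = min(idx + 1, len(LEVELS) - 1)   # already at the top: stay there
    elif len(last) == streak and not any(last):
        idx = max(idx - 1, 0)                 # already at the bottom: stay there
    return LEVELS[idx]


print(next_level("easy", [True, True, True]))            # moves up
print(next_level("difficult", [False, False, False]))    # moves back to average
print(next_level("average", [True, False, True]))        # mixed answers: no change
```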
Test-Retest Reliability
It is also known as time sampling reliability.
This is used only when we measure traits or
characteristics that do not change over time.
Carryover effect
Practice effect
Error variance
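Time sampling reliability is usually reported as the correlation between scores from two administrations of the same test. The sketch below assumes a Pearson correlation is used; the helper `pearson_r` and the two score lists are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of five people tested on two occasions
time1 = [10, 12, 14, 16, 18]
time2 = [11, 12, 15, 15, 19]

r_test_retest = pearson_r(time1, time2)
print(round(r_test_retest, 2))
```

A uniform practice effect that raises every retest score by the same amount leaves this correlation unchanged, so the coefficient is lowered only when carryover affects test takers unevenly.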
Parallel Forms Reliability
It is established when at least two different versions
of the test yield almost the same scores.
It is also known as item sampling reliability or
alternate forms reliability.
The error variance in this case represents
fluctuations in performance from one set of items to
another, but not fluctuations over time.
One of the most rigorous and burdensome
assessments of reliability since test
developers have to create two forms of the
same test.
Practical constraints make it difficult to retest
the same group of individuals.
Inter-Rater Reliability
It is the degree of agreement between two
observers who simultaneously record
measurements of the same behaviors.
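Agreement between two observers can be quantified in more than one way. The sketch below shows simple percent agreement and Cohen's kappa, which corrects for agreement expected by chance; kappa is not named in these notes and is offered only as one common option. The observation codes are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of observations on which both observers gave the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Agreement expected if the observers coded independently
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both observers used a single identical code throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical codes for ten observation intervals ("on" = on-task, "off" = off-task)
observer_a = ["on", "on", "off", "on", "off", "off", "on", "on", "off", "on"]
observer_b = ["on", "off", "off", "on", "off", "on", "on", "on", "off", "on"]

agreement = percent_agreement(observer_a, observer_b)
kappa = cohens_kappa(observer_a, observer_b)
print(round(agreement, 2), round(kappa, 2))
```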
Split-Half Reliability
It is obtained by splitting the items on a
questionnaire or test in half, computing a separate
score for each half, and then calculating the
degree of consistency between the two scores for
a group of participants.
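The procedure above can be sketched with an odd-even split. The choice of an odd-even split, the Pearson correlation as the measure of consistency, and the Spearman-Brown step-up (2r / (1 + r), often applied because each half is only half as long as the full test) are assumptions not stated in these notes; the response matrix is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half(responses):
    """Return (correlation between halves, Spearman-Brown corrected estimate).

    responses: one row per participant, one item score per column.
    Items in positions 1, 3, 5, ... form one half; 2, 4, 6, ... the other.
    """
    first_half = [sum(row[0::2]) for row in responses]
    second_half = [sum(row[1::2]) for row in responses]
    r_half = pearson_r(first_half, second_half)
    corrected = 2 * r_half / (1 + r_half)   # Spearman-Brown step-up (assumed)
    return r_half, corrected


# Hypothetical data: 5 participants, 6 items scored 1 (correct) or 0 (wrong)
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

r_half, r_full = split_half(responses)
print(round(r_half, 2), round(r_full, 2))
```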
The task we must set
for ourselves is not to
feel secure, but to be
able to tolerate
insecurity.
- Erich Fromm