PSYCHOLOGICAL ASSESSMENT
PROFESSORS: Ms. Mary Anne Joseph Montoya, RPm & Ms. Irish Mandap, RPm, RPsy, MS
2nd year finals for the 2nd Semester A.Y. 2024-2025
BACHELOR OF SCIENCE IN PSYCHOLOGY 2027
MODULE 4: OF TESTS AND TESTING
TOPIC OUTLINE:
I. Assumptions about Psychological Testing
II. Qualities of a Good Test
III. Norms
IV. Types of Norms
V. Fixed Reference Group Scoring Systems

ASSUMPTIONS ABOUT PSYCHOLOGICAL TESTING
PSYCHOLOGICAL TRAITS AND STATES
➢ TRAITS – any distinguishable, relatively enduring way in which one individual varies from another
  o They may change over time, yet there are often high correlations between trait scores at different time points.
  o How a trait is manifested depends on the situation, because its expression is affected by external circumstances. Traits change, but not by much.
➢ STATES – also distinguish one person from another, but are relatively less enduring
- Traits and states are informed, scientific concepts developed or constructed to describe or explain behavior
  o They exist only as constructs, measurable through overt behavior

TRAITS AND STATES CAN BE QUANTIFIED AND MEASURED
- Once a construct is defined, test developers turn to item content and item weighting
- Different definitions = different measurements

TEST-RELATED BEHAVIOR PREDICTS NON-TEST-RELATED BEHAVIOR
- The test-taking behavior itself is not what is evaluated (an exam is not a test of your creativity or drawing skills)
- Ex: when you take an exam and shade the answer, the shading is not what we measure; rather, we are measuring your knowledge of the content.

ALL TESTS HAVE LIMITS AND IMPERFECTIONS
- Competent test users understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources.
- There will be bias, blind spots, and questions regarding reliability and validity, but we must compensate for these weaknesses by administering more tests.

VARIOUS SOURCES OF ERROR ARE PART OF THE ASSESSMENT PROCESS
- Assumption that factors other than what a test attempts to measure will influence performance on the test.
- Test score = true score + error (X = T + E)
➢ ERROR VARIANCE – the component of a test score attributable to sources other than the trait or ability measured

UNFAIR AND BIASED ASSESSMENT PROCEDURES CAN BE IDENTIFIED AND REFORMED
- Tests are tools; they can be used either properly or improperly.
- No matter how good the test is, there will always be some bias.
- Consider the client: their characteristics determine whether a test fits them.
- Problems arise if the test is used with people for whom it was not intended.

TESTING AND ASSESSMENT OFFER POWERFUL BENEFITS TO SOCIETY
- Without tests, anyone could claim to be whoever they want to be.

QUALITIES OF A GOOD TEST
GOOD TESTS = HIGH-QUALITY DECISIONS
- This can be seen in: hiring employees, choosing medicines to prescribe, making a diagnosis, entrance exams, scholarship programs, grades, licensure exams, and treatment for mental illnesses
- Good tests are easy to administer, score, and interpret, and, where needed, they have adequate norms.
BAD TESTS = LOW-QUALITY DECISIONS

PSYCHOMETRIC PROPERTIES
1. RELIABILITY
- The criterion of reliability involves the consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements.
- We want to be reasonably certain that the measuring tool or test we are using is consistent. That is, we want to know that it yields the same numerical measurement every time it measures the same thing under the same conditions.
2. VALIDITY
- A test is considered valid when it measures what it purports to measure.

NORMS
NORM-REFERENCED TESTING AND ASSESSMENT
- A method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker's score and comparing it to the scores of a group of test takers
➢ NORMS – are the test performance data of a particular group of test takers that are designed
for use as a reference when evaluating or interpreting individual test scores.
➢ NORMATIVE SAMPLE – a group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test takers.

NOTE: Norms are established during test development; we develop them with our future test takers in mind.

SAMPLING TO DEVELOP NORMS
➢ TEST STANDARDIZATION – the process of administering a test to a representative sample of test takers for the purpose of establishing norms and the instructions on how to administer the test.
  o STANDARDIZED TEST – has clearly specified procedures for administration and scoring (with normative data)

SAMPLING
- Test developers select a population for which the test is intended, one that has at least one common, observable characteristic
- Sampling is the process of selecting the portion of the universe deemed to be representative of the whole population
➢ POPULATION – the complete universe or set of individuals with at least one common, observable characteristic
➢ PROBABILITY SAMPLING – every member has an equal chance of being selected for the sample

SIMPLE RANDOM SAMPLING
- Example: the fishbowl method

CLUSTER SAMPLING
- Clusters of participants within a population of interest are randomly selected, and then all individuals in each selected cluster are used
  o Ex: all students from CBA, CED, and CEA will be the sample

MULTISTAGE SAMPLING
- 1st stage: random sample of clusters
- 2nd stage: random sample of people within those clusters
  o Ex: randomly select students from CBA, CED, and CEA (not everyone)

STRATIFIED RANDOM SAMPLING
- Purposefully select groups or STRATA, then randomly select individuals within each stratum, in proportion to their membership in the population

NON-PROBABILITY SAMPLING
CONVENIENCE SAMPLING    Easy to reach, quickest, first you can find; most convenient
PURPOSIVE SAMPLING      Intentional; participants chosen on purpose
SNOWBALL SAMPLING       Through recommendations
QUOTA SAMPLING          Fill in subsets of the population; like stratified sampling but not random

NOTE: Non-probability samples are made up of people who are convenient to reach or motivated to volunteer.

SAMPLING TO DEVELOP NORMS
- Having obtained a sample, test developers:
  o Administer the test with a standard set of instructions
  o Recommend a setting for test administration
  o Collect and analyze data
  o Summarize data using descriptive statistics, including measures of central tendency and variability
  o Provide a detailed description of the standardization sample

TYPES OF NORMS
PERCENTILES
- An expression of the percentage of people whose score on a test or measure falls below a particular raw score.
Ex: In such a distribution, the xth percentile is equal to the score at or below which x% of scores fall.
- 87th percentile = the test taker's score or performance is equal to or higher than that of 87% of their peers

NOTE: Real differences between raw scores may be minimized near the ends of the distribution and exaggerated in the middle of the distribution.

AGE NORMS
- Average performance of different samples of test takers who were at various ages when the test was administered

GRADE NORMS
- The average test performance of test takers in a given school grade

NATIONAL NORMS
- Derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.

NATIONAL ANCHOR NORMS
- An equivalency table for scores on two different tests. Allows for a basis of comparison.
  o Technical considerations entail that it would be a mistake to treat these equivalencies as precise equalities
Ex: Reading test 1: 96th percentile = raw score of 69
    Reading test 2: 96th percentile = raw score of 14
    Therefore, a raw score of 69 on test 1 is equivalent to (not precisely equal to) a raw score of 14 on test 2.
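The percentile and national anchor norm ideas above can be sketched in a few lines of Python. This is a minimal illustration, not an official procedure: the function names `percentile_rank` and `equivalent_score` are hypothetical, the score lists are invented so the result mirrors the 69 ≈ 14 example (at the 70th rather than the 96th percentile), and the "closest percentile rank" lookup is a simplified stand-in for the equipercentile method used with real normative samples.

```python
def percentile_rank(scores, raw):
    """Percentage of norm-group scores that fall below `raw` (the PERCENTILES definition above)."""
    below = sum(1 for s in scores if s < raw)
    return 100.0 * below / len(scores)


def equivalent_score(scores_a, scores_b, raw_a):
    """Return the test-B score whose percentile rank is closest to that of `raw_a` on test A.

    A rough equipercentile lookup: the result is an equivalence, not a precise equality.
    """
    target = percentile_rank(scores_a, raw_a)
    candidates = sorted(set(scores_b))
    # min() keeps the first (smallest) score when two candidates are equally close
    return min(candidates, key=lambda s: abs(percentile_rank(scores_b, s) - target))


# Hypothetical norm-group scores for two reading tests (invented for illustration)
reading_test_1 = [40, 45, 50, 55, 60, 62, 65, 69, 72, 80]
reading_test_2 = [5, 6, 7, 8, 9, 10, 12, 14, 15, 18]

print(percentile_rank(reading_test_1, 69))                   # 70.0
print(equivalent_score(reading_test_1, reading_test_2, 69))  # 14
```

Counting only scores strictly below the raw score follows the PERCENTILES definition in these notes; counting "at or below" instead would shift every rank slightly, which is one reason equivalencies should not be treated as exact.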
SUBGROUP NORMS
- A normative sample can be segmented by any of the criteria initially used in selecting subjects for the sample.
Ex: geographical location, educational level, gender, etc.

LOCAL NORMS
- Provide normative information with respect to the local population's performance on some test

FIXED REFERENCE GROUP SCORING SYSTEMS
- The distribution of scores obtained on the test from one group of test takers is used as the basis for the calculation of test scores for future administrations of the test.

NORM-REFERENCED TESTS
- Involve comparing individuals to the normative group
- Norm-referenced tests compare a student's performance against the performance of their peers.
➢ How are the scores interpreted? An individual's raw score is calculated and then compared with the scores of others (the appropriate norm group). Raw scores are transformed and reported in more meaningful units such as percentiles or grade equivalents.

CRITERION-REFERENCED TESTS
- Test takers are evaluated as to whether they meet a set standard
- Criterion-referenced tests compare a student's knowledge and skills against a predetermined standard, cut score, or other criterion
- In criterion-referenced tests, the performance of other students does not affect a student's score
➢ How are the scores interpreted? An individual's score is calculated and then compared with the total possible score on the test. Raw scores are then transformed and reported in more meaningful units such as percentages.

CULTURE AND INFERENCE
NO TEST IS PERFECT
➢ A test may be less effective for specific populations
➢ Cultural bias

CULTURALLY INFORMED ASSESSMENT: SOME DO'S AND DON'TS
1. DO: Be aware of the cultural assumptions on which a test is based.
   DO NOT: Take for granted that a test is based on assumptions that impact all groups in much the same way.
2. DO: Consider consulting with members of particular cultural communities regarding the appropriateness of particular assessment techniques, tests, or test items.
   DO NOT: Take for granted that members of all cultural communities will automatically deem particular techniques, tests, or test items appropriate for use.
3. DO: Strive to incorporate assessment methods that complement the worldview and lifestyle of assessees who come from a specific cultural and linguistic population.
   DO NOT: Take a one-size-fits-all view of assessment when it comes to evaluation of persons from various cultural and linguistic populations.
4. DO: Be knowledgeable about the many alternative tests or measurement procedures that may be used to fulfill the assessment objectives.
   DO NOT: Select tests or other tools of assessment with little or no regard for the extent to which such tools are appropriate for use with a particular assessee.
5. DO: Be aware of equivalence issues across cultures, including equivalence of language used and constructs measured.
   DO NOT: Simply assume that a test that has been translated into another language is automatically equivalent in every way to the original.
6. DO: Score, interpret, and analyze assessment data in its cultural context, with due consideration of cultural hypotheses as possible explanations of findings.
   DO NOT: Score, interpret, and analyze assessment data in a cultural vacuum.
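Returning to the NORM-REFERENCED and CRITERION-REFERENCED TESTS sections, the difference in how one raw score is interpreted can be sketched in Python. This is a minimal illustration under stated assumptions: the 50-item test, the eight-person norm group, the 75% cut score, and the function names `norm_referenced` and `criterion_referenced` are all hypothetical, chosen only to make the contrast visible.

```python
def norm_referenced(raw, norm_group):
    """Interpret a raw score relative to peers: percentage of the norm group scoring below it."""
    below = sum(1 for s in norm_group if s < raw)
    return 100.0 * below / len(norm_group)


def criterion_referenced(raw, total_possible, cut_percent):
    """Interpret a raw score against the test itself: percent correct and whether it meets the cut score."""
    percent_correct = 100.0 * raw / total_possible
    return percent_correct, percent_correct >= cut_percent


# Hypothetical data: a 50-item test, a norm group of 8 peers, and a 75% cut score
norm_group = [20, 25, 28, 30, 33, 35, 38, 45]
raw = 36

print(norm_referenced(raw, norm_group))     # 75.0 (scored above 6 of 8 peers)
print(criterion_referenced(raw, 50, 75.0))  # (72.0, False)

# A stronger peer group changes the norm-referenced result...
stronger_group = [s + 10 for s in norm_group]
print(norm_referenced(raw, stronger_group))  # 25.0
# ...but the criterion-referenced result is unchanged, since peers play no part in it
print(criterion_referenced(raw, 50, 75.0))  # (72.0, False)
```

The same raw score of 36 looks strong against one norm group and weak against another, while the criterion-referenced interpretation depends only on the total possible score and the cut score.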