| Parametric Test | Non-parametric Equivalent | Short Explanation |
|---|---|---|
| Simple Linear Regression | Non-parametric Regression (e.g., Theil–Sen estimator, LOESS) | Predicts a continuous outcome from one predictor; the non-parametric versions do not require normally distributed residuals (Theil–Sen) or a straight-line form (LOESS). |
| Two-way ANOVA | Scheirer–Ray–Hare test (or Aligned Rank Transform ANOVA) | Compares means when there are two independent factors, and checks for interaction effects. |
| Independent t-test | Mann–Whitney U test | Compares the means or distributions of two independent groups (e.g., scores of Group A vs. Group B). |
| Paired t-test | Wilcoxon signed-rank test | Compares two related or matched samples, such as “before vs. after” (e.g., pre-test vs. post-test scores of the same students). |
| Repeated Measures ANOVA | Friedman test | Tests for differences across three or more related measurements on the same subjects (e.g., scores at three time points). |
| One-way ANOVA | Kruskal–Wallis test | Checks whether three or more independent groups differ, in their means for ANOVA or in their medians for Kruskal–Wallis (e.g., comparing exam scores from three teaching methods). |
| Pearson Correlation (r) | Spearman’s rho or Kendall’s tau | Measures the strength and direction of the relationship between two variables. Spearman/Kendall use ranks and handle non-normal data. |
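To see why rank-based tests are less sensitive to outliers than mean-based ones, here is a minimal sketch of the Mann–Whitney U statistic using only the Python standard library. The function name and the sample scores are made up for illustration, and the sketch computes only the U values, not a p-value.

```python
from statistics import mean

def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b): how often a score from one group beats a score
    from the other group, counting ties as half a win each."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Hypothetical scores: Group A contains one extreme outlier (90).
group_a = [12, 15, 14, 90]
group_b = [16, 18, 17, 19]

print("Mean A:", mean(group_a), "Mean B:", mean(group_b))
print("U values (A, B):", mann_whitney_u(group_a, group_b))
```

In this made-up data the outlier pulls Group A's mean above Group B's, while the U values show that Group B's scores win most of the pairwise comparisons.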
| Correlation Type | When to Use | What It Shows |
|---|---|---|
| Pearson’s r | Two continuous variables, normally distributed | Strength & direction of a linear relationship |
| Spearman’s rho | Ordinal data or continuous data not normal | Strength & direction of a monotonic (increasing/decreasing) relationship |
| Kendall’s tau | Ordinal data, especially with small samples or many ties | Strength of association based on rank agreement |
| Point-Biserial | One continuous + one binary variable | Link between a yes/no category and a continuous score |
| Phi coefficient | Two binary variables | Association between two yes/no categories |
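The difference between a linear (Pearson) and a monotonic (Spearman) relationship can be sketched in a few lines of standard-library Python. The helper names and the data below are illustrative assumptions; Spearman's rho is computed here as Pearson's r on average ranks.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: always increasing, but along a curve rather than a line.
hours = [1, 2, 3, 4, 5]
score = [1, 8, 27, 64, 125]

print("Pearson r:", round(pearson_r(hours, score), 3))
print("Spearman rho:", round(spearman_rho(hours, score), 3))
```

Because the relationship is perfectly monotonic but curved, rho reaches 1 while r stays below 1. The point-biserial and phi coefficients can be obtained from the same `pearson_r` function by coding the binary variable(s) as 0 and 1.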
| Type | Shape | Key Points | Example Situation |
|---|---|---|---|
| Positive Skew (Right-skewed) | Tail is longer on the right; most scores are low, a few very high scores pull the mean to the right. | Mean > Median > Mode | Income distribution where few people earn extremely high salaries. |
| Negative Skew (Left-skewed) | Tail is longer on the left; most scores are high, a few very low scores pull the mean to the left. | Mean < Median < Mode | Exam where most students scored very high but a few scored very low. |
| Zero Skew (Symmetrical) | Symmetrical shape such as the normal bell curve; left and right sides mirror each other. | Mean = Median = Mode | Heights of adults in a large population. |
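The ordering of mean, median, and mode in a skewed distribution can be checked with a small standard-library sketch. The income figures are invented for illustration, and the skewness function uses the simple population moment formula (the average cubed z-score), which is only one of several definitions in use.

```python
from statistics import mean, median, mode, pstdev

def skewness(data):
    """Population skewness: the average of cubed z-scores."""
    m, s = mean(data), pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

# Hypothetical monthly incomes (in thousands): most are low, one is very high.
incomes = [20, 20, 20, 25, 25, 30, 35, 40, 60, 150]

print("mean:", mean(incomes))      # pulled toward the long right tail
print("median:", median(incomes))
print("mode:", mode(incomes))
print("skewness > 0:", skewness(incomes) > 0)
```

Flipping the sign of every value mirrors the distribution, which reverses the ordering to Mean < Median < Mode and makes the skewness negative.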
Intelligence Test
| Person | Test / Contribution | Key Points |
|---|---|---|
| Francis Galton | Early measurement of sensory abilities & individual differences. | Pioneer of psychometrics and statistics in psychology. |
| James McKeen Cattell | Coined “mental tests.” | Brought Galton’s ideas to the U.S. |
| Alfred Binet & Théodore Simon | Binet–Simon Intelligence Scale (1905). | First practical IQ test for children. |
| Lewis Terman | Stanford–Binet Intelligence Scale (1916). | Popularized the IQ concept in the U.S. |
| Florence Goodenough | Goodenough–Harris Draw-A-Man Test. | Nonverbal test of children’s intelligence; quick screening tool. |
| David Wechsler | WAIS, WISC, WPPSI. | Modern standard IQ tests across ages. |
| Charles Spearman | Proposed the g factor (general intelligence). | Developed factor analysis. |
| Raymond Cattell | Fluid & Crystallized intelligence. | Also developed the 16 Personality Factor Questionnaire (16PF). |
| John Carroll | Three-Stratum Theory of intelligence. | Integrated many models of cognitive ability. |
Personality and Clinical Assessment
| Person | Test / Contribution | Key Points |
|---|---|---|
| Hermann Rorschach | Rorschach Inkblot Test. | Classic projective test of personality. |
| Henry Murray & Christiana Morgan | Thematic Apperception Test (TAT). | Projective measure of motives and personality. |
| John Buck | House–Tree–Person (HTP) test. | Projective drawing test to explore personality and emotional functioning. |
| Robert Woodworth | Woodworth Personal Data Sheet. | First modern self-report personality inventory (WWI). |
| Starke Hathaway & J.C. McKinley | MMPI (Minnesota Multiphasic Personality Inventory). | Widely used objective personality test. |
| Hans Eysenck | Eysenck Personality Questionnaire (EPQ). | Measured major personality dimensions. |
| Raymond Cattell | 16 Personality Factor Questionnaire (16PF). | Factor-analytic personality assessment. |
| Paul Costa & Robert McCrae | NEO-PI-R / Big Five Model. | OCEAN personality traits. |
Vocational / Aptitude & Special Tests
| Person | Test / Contribution | Key Points |
|---|---|---|
| John Holland | RIASEC / Holland Codes. | Career interest & vocational assessment. |
| Frank Parsons | Father of vocational guidance. | Matched individual traits to suitable careers. |
| Edward Thorndike | Early achievement & aptitude testing. | Helped shape educational testing. |
| David McClelland | Developed thematic measures of achievement motivation (nAch). | Studied motivation and success factors. |
| Type of Reliability (Estimate) | When to Use | How It’s Measured | Easy Example |
|---|---|---|---|
| Test–Retest | You want to check if test scores stay the same over time. Best for traits that do not change quickly (IQ, personality). | Pearson r (correlation between first and second test scores). | Give the same IQ test now and again after 2 weeks → scores should be almost the same. |
| Parallel / Alternate Forms | You made two versions of a test and want them to be equally hard and reliable. | Pearson r (correlation between Form A and Form B). | Two different math exams should give similar results to the same group of students. |
| Inter-Rater | When different raters/judges are scoring and you want their ratings to agree. | Cohen’s kappa (for categories), ICC (for numbers), or percent agreement. Fleiss’ kappa when there are more than 2 raters. | Two psychologists watch the same child and both rate the child’s anxiety level almost the same. |
| Internal Consistency | You want to see if items in one test measure the same thing. Good for surveys and questionnaires. | Cronbach’s Alpha (for scales), KR-20 (for right/wrong items), or Split-half (Spearman–Brown). | All questions in a depression scale should consistently measure depression. |
0.70 or higher = acceptable reliability.
0.80–0.90 = good for most uses.
0.90+ = needed for high-stakes tests.
Reliability = Consistency (same result again).
Validity = Accuracy (measures what it should).
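As an illustration of an internal-consistency estimate and the cutoffs listed above, here is a minimal standard-library sketch of Cronbach's alpha. The response matrix is invented, and the label wording in `interpret` is an assumption that simply mirrors the 0.70 / 0.80 / 0.90 guidelines.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (same item count each).
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def interpret(alpha):
    """Map alpha onto the rule-of-thumb bands from the notes above."""
    if alpha >= 0.90:
        return "suitable for high-stakes tests"
    if alpha >= 0.80:
        return "good for most uses"
    if alpha >= 0.70:
        return "acceptable"
    return "below the usual 0.70 cutoff"

# Hypothetical data: 5 respondents answering 3 items on a 1-5 scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3), "->", interpret(alpha))
```

With so few respondents the estimate is only a toy value; real reliability studies use far larger samples.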
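📌 Spearman–Brown Formula
➡️ $r_{new} = \dfrac{k\,r}{1 + (k - 1)\,r}$, where $r$ is the reliability of the current test and $k$ is the factor by which the test length changes (k = 2 doubles the test, k = 0.5 halves it).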
Purpose:
Used to estimate how reliability changes when you change the length of a test
(e.g., making it longer or shorter).
This is important because longer tests are usually more reliable.
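A short sketch of the Spearman–Brown prophecy calculation, r_new = k·r / (1 + (k − 1)·r), where r is the current reliability and k is the factor by which test length changes. The function names and the starting reliability of 0.70 are illustrative assumptions; the second function is the same formula solved for k.

```python
def spearman_brown(r, k):
    """Predicted reliability when test length is multiplied by k."""
    return k * r / (1 + (k - 1) * r)

def length_factor_needed(r, target):
    """How many times longer the test must be to reach the target reliability."""
    return target * (1 - r) / (r * (1 - target))

r_current = 0.70  # hypothetical reliability of the existing test

print("Doubled:", round(spearman_brown(r_current, 2), 3))
print("Halved:", round(spearman_brown(r_current, 0.5), 3))
print("Factor for 0.90:", round(length_factor_needed(r_current, 0.90), 2))
```

The formula assumes the added items are comparable in quality to the existing ones, so the predicted gains are best treated as estimates.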
| Theory | Main Idea | Key Concepts / Formula | Focus of Measurement | Strengths | Limitations | Typical Uses |
|---|---|---|---|---|---|---|
| Classical Test Theory (CTT) | Every observed test score is made of a person’s true score plus random error. | X = T + E • X = observed score • T = true score • E = error | Total test score – evaluates the test as a whole. | ✔ Simple, easy to compute. ✔ Works even with small samples. | ❗ Item difficulty & discrimination depend on the specific sample. ❗ Assumes measurement error is the same for all ability levels. | Most classroom tests, many traditional psychological scales. |
| Domain Sampling Theory | A test is just a sample of questions from a large universe (domain) of possible questions about the trait. | Reliability is based on how well the sample of items represents the entire domain. Uses Spearman–Brown prophecy formula to estimate reliability when test length changes. | Sampling of items from the content domain. | ✔ Explains why longer tests are more reliable. ✔ Highlights importance of good item sampling. | ❗ Assumes the “universe” of all possible items is clearly defined. ❗ Still relies on sample statistics. | Test construction, especially when creating large item banks (e.g., achievement or licensure tests). |
| Item Response Theory (IRT) | The chance of a correct response depends on the person’s ability level (θ) and the characteristics of each item. | Probability of correct answer is modeled by logistic functions with item parameters: • a = discrimination • b = difficulty • c = guessing (3-parameter model). | Each item and person–item interaction. | ✔ Item parameters are sample independent once calibrated. ✔ Allows computerized adaptive testing. ✔ Provides precision at different ability levels. | ❗ Requires large sample sizes and complex statistical software. | High-stakes standardized exams (e.g., GRE, NCLEX, CAT exams), modern psychological assessments. |
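The 3-parameter logistic model named in the IRT row can be sketched as P(θ) = c + (1 − c) / (1 + e^(−a(θ − b))). The item parameter values below are made up for illustration, and some texts add a scaling constant of 1.7 inside the exponent, which is omitted here.

```python
import math

def p_correct(theta, a, b, c=0.0):
    """3PL item response function.
    theta: person ability, a: discrimination, b: difficulty, c: guessing."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, average difficulty,
# and a 20% chance of guessing correctly.
item = {"a": 1.2, "b": 0.0, "c": 0.2}

for theta in (-2, -1, 0, 1, 2):
    print(f"theta={theta:+d}  P(correct)={p_correct(theta, **item):.3f}")
```

When ability equals the item difficulty (θ = b), the probability sits halfway between the guessing level and 1, which is 0.60 for this item. Setting c = 0 gives the 2-parameter model.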
| Type of Validity | What It Means | How It’s Checked | Example |
|---|---|---|---|
| Content Validity | The test covers all important parts of the concept or skill. | Expert judgment; comparing test items to the topic or competency outline. | A math exam includes all key topics taught in class. |
| Criterion-related Validity | The test predicts or relates to an external criterion (real-world outcome). Two subtypes: Predictive and Concurrent. | Correlation between test scores and a relevant criterion. | • Predictive: Entrance exam scores predict future college GPA. • Concurrent: Current job skills test correlates with supervisor ratings. |
| Construct Validity | The test truly measures the theoretical construct it claims to measure. | Statistical analysis: factor analysis, correlations with related tests (convergent) or unrelated tests (discriminant). | A depression scale really measures depression, not just general sadness. |
| Type | Purpose | Example |
|---|---|---|
| Exploratory Factor Analysis (EFA) | Used when you don’t know the structure yet. Lets the data reveal how many factors exist. | A researcher designs a new personality questionnaire and explores what dimensions (like extraversion, openness) appear. |
| Confirmatory Factor Analysis (CFA) | Used when you already have a theory or model and want to test if the data fit that model. | Testing if a 5-factor model of personality (Big Five) fits the responses from a large survey. |
| Term | Meaning |
|---|---|
| Factor Loading | The correlation between an item and the factor (like how strongly a question measures a trait). |
| Eigenvalue | Indicates how much variance a factor explains; factors with eigenvalue ≥1 are usually kept. |
| Communality | The proportion of a variable’s variance explained by the factors. |
| Rotation | Mathematical method to simplify factor structure for easier interpretation (e.g., Varimax, Oblimin). |
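The link between loadings, communalities, and eigenvalues can be shown with simple arithmetic. This sketch assumes uncorrelated (orthogonal) factors and uses an invented loading matrix: each item's communality is the sum of its squared loadings across factors, and each factor's explained variance is the sum of squared loadings down its column (this equals the eigenvalue for an unrotated solution).

```python
# Hypothetical loadings: 4 items (rows) on 2 orthogonal factors (columns).
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.2, 0.6],
    [0.1, 0.9],
]

def communalities(matrix):
    """Share of each item's variance explained by all factors together."""
    return [sum(l ** 2 for l in row) for row in matrix]

def factor_variances(matrix):
    """Variance explained by each factor (sum of squared loadings per column)."""
    n_factors = len(matrix[0])
    return [sum(row[j] ** 2 for row in matrix) for j in range(n_factors)]

for i, h2 in enumerate(communalities(loadings), start=1):
    print(f"Item {i}: communality = {h2:.2f}")

for j, ev in enumerate(factor_variances(loadings), start=1):
    keep = "keep" if ev >= 1 else "drop"
    print(f"Factor {j}: variance explained = {ev:.2f} ({keep})")
```

Both factors in this toy example explain more variance than a single item would (a value of at least 1), so both would be retained under the rule in the table.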
| Type of Bias / Error | What Happens | Example | Key Point to Remember |
|---|---|---|---|
| Halo Effect | Rater lets one positive trait influence all ratings. | Because an employee is very friendly, the manager also rates their productivity as excellent even if it’s average. | Judge each trait separately. |
| Horns Effect | Opposite of halo: one negative trait lowers all ratings. | A worker is often late, so the supervisor also scores their teamwork as poor, even if they are cooperative. | Avoid letting one flaw affect other areas. |
| Leniency Error | Rater gives higher scores than deserved to everyone. | A supervisor gives all team members “Excellent” to avoid conflict. | Be objective; use clear standards. |
| Severity (Strictness) Error | Rater gives lower scores than deserved. | A very demanding manager rates even strong employees as “average.” | Balance standards with evidence. |
| Central Tendency | Rater avoids extremes and rates most people in the middle. | Giving everyone “Satisfactory” even when some are excellent or poor. | Use the full rating scale when justified. |
| Recency Effect | Rater focuses on most recent behavior, forgetting earlier performance. | An employee’s strong work last week overshadows months of poor output. | Review the entire evaluation period. |
| First Impression (Primacy) Effect | Early impressions strongly influence ratings. | An employee’s great performance in the first month leads to consistently high ratings despite later decline. | Base ratings on total evidence, not first impressions. |
| Contrast Effect | Rating is influenced by comparison to others, not by standards. | A good performer looks average when rated right after a star employee. | Rate each employee against the criteria, not peers. |
| Similar-to-Me Bias | Rater favors people who are similar in background or interests. | A manager rates a fellow alumnus higher than others with equal performance. | Focus on job-related behaviors. |
| Stereotyping | Judgments are based on group characteristics (gender, age, culture) rather than actual performance. | Assuming older employees are less tech-savvy and rating them lower. | Evaluate individuals, not groups. |
| Attribution Error | Rater misjudges reasons for behavior (internal vs. external causes). | Blaming an employee’s poor performance on laziness rather than a lack of resources. | Consider situational factors. |
TEST DEVELOPMENT
1. Test Conceptualization
2. Test Construction
3. Test Tryout
4. Item Analysis
5. Test Revision
| Tool / Test | When to Use | Concrete Example | What it Tells You |
|---|---|---|---|
| Pearson’s r | Two continuous (interval/ratio) variables, roughly normal. | 📏 Height & Weight – check if taller people weigh more. | Strength & direction of a linear relationship (e.g., r = +0.80). |
| Spearman’s rho (ρ) | At least ordinal data, or when not normal. | 🎓 Class rank vs. hours studied. | Correlation of ranked data. |
| Kendall’s Tau (τ) | Small sample or many tied ranks. | 🏅 Compare two judges’ rankings in a talent show. | Agreement of two ranking sets. |
| Point–Biserial Correlation | One variable continuous, the other a true (naturally occurring) dichotomy. | 👩‍🎓 Gender (M/F) & math score. | Relationship between a binary and a continuous variable. |
| Biserial Correlation | One variable continuous, the other an artificial dichotomy. | 📝 Pass/Fail (cut at 75) vs. actual exam score. | Underlying relationship beyond the artificial split. |
| Phi Coefficient (φ) | Both variables dichotomous. | 🚭 Smoking (yes/no) & lung disease (yes/no). | Association of two binary variables. |
| Cramér’s V | Two nominal variables in a contingency table larger than 2×2. | 🍕 Pizza topping preference vs. blood type. | Strength of association (0 = none, 1 = perfect). |
| Eta (η) | One variable nominal, other continuous, relationship may be non-linear. | 🎶 Music genre vs. average study hours. | Degree of association without assuming linearity. |
| t-test (Independent) | Compare mean scores of two independent groups. | 🧠 IQ of males vs. females. | If group means differ significantly. |
| t-test (Paired/Dependent) | Compare mean scores of the same group at two time points. | 🏫 Pre-test vs. post-test anxiety scores after therapy. | If the group changed over time. |
| One-way ANOVA | Compare means of 3 or more independent groups. | 🍎 Effect of three different diets on weight loss. | If at least one group mean differs. |
| Repeated-Measures ANOVA | Same subjects measured 3+ times. | 📅 Mood ratings across three therapy sessions. | Whether average scores change across conditions/time. |
| Chi-Square Test of Independence | Two categorical variables. | 🏥 Smoking status vs. presence of asthma. | Whether the two categorical variables are related. |
| Mann–Whitney U | Non-parametric alternative to independent t-test (ordinal or not normal). | 💼 Median job satisfaction of two departments. | Difference in medians between two groups. |
| Wilcoxon Signed-Rank | Non-parametric alternative to paired t-test. | 😌 Stress scores before vs. after meditation in same group. | Difference in paired observations without normality assumption. |
| Kruskal–Wallis H | Non-parametric alternative to one-way ANOVA. | 🏡 Comparing median rent prices across 4 cities when data are skewed. | Difference in medians among 3+ groups. |
| Correlation Ratio (η²) | Used in ANOVA to show effect size. | Diet type explaining 30% of weight loss variance. | Proportion of variance explained by group membership. |
| Regression (Simple/Multiple) | Predict a continuous outcome from one or more predictors. | 📈 Predict job performance from IQ and experience. | Strength & direction of prediction. |
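To connect the one-way ANOVA and η² rows, here is a minimal standard-library sketch that splits total variability into between-group and within-group sums of squares. The weight-loss numbers are invented, and the sketch stops at the F ratio and η² without computing a p-value.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F ratio, eta squared) for a list of independent groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_ratio, eta_squared

# Hypothetical weight loss (kg) under three diets.
diet_a = [2, 3, 4]
diet_b = [4, 5, 6]
diet_c = [6, 7, 8]

f_ratio, eta_squared = one_way_anova([diet_a, diet_b, diet_c])
print("F =", round(f_ratio, 2))
print("eta squared =", round(eta_squared, 2))
```

Here η² is read as the proportion of the variance in weight loss that is explained by diet group in this toy sample.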
| Test Name (Acronym) | Developer(s) / Year | Main Purpose / What It Measures | Target Users / Age Group | Format & Scoring | Special Notes / Application |
|---|---|---|---|---|---|
| Philippine Aptitude Classification Test (PACT) | UP Institute of Psychology, 1970s | Measures general scholastic aptitude in verbal, numerical, and abstract reasoning; helps identify academic strengths. | High school and college students. | Paper-and-pencil; multiple-choice; raw scores converted to percentile ranks and stanines. | Widely used in schools for career guidance and identifying potential college majors. |
| Philippine Intelligence Test (PIT) | UP Institute of Psychology, 1960s | Measures general intellectual ability suited to Filipino language and culture. | Grade-school to high school students. | Multiple-choice; raw scores converted to IQ-type standard scores (mean = 100, SD = 15). | Developed to avoid cultural bias of Western IQ tests. |
| Philippine Personality Inventory (PPI) | Dr. Ester A. Bugarin & UP Institute of Psychology, 1970s | Assesses major personality traits such as emotional stability, sociability, and dependability. | High school, college students, and adults (especially in HR). | Likert-type items; scoring uses norm-referenced T-scores. | Commonly used in employee selection, guidance, and counseling. |
| Panukat ng Ugali at Pagkatao (PUP) | Dr. Maria Lourdes Carandang & colleagues, late 1970s | Filipino-language test of personality traits: leadership, emotional stability, social orientation, self-reliance, etc. | High school and college students. | 220-item paper-and-pencil; scores reported as percentiles per trait. | One of the few fully Filipino personality tests; ideal for school guidance programs. |
| Philippine Occupational Interest Inventory (POII) | UP Institute of Psychology, 1980s | Identifies career interests across different occupational fields. | High school and college students. | Checklist format; scores expressed as interest profiles. | Used in career counseling and vocational guidance. |
| Philippine Values Survey (PVS) | Dr. Alfredo Lagmay & team, 1980s | Measures core Filipino values (e.g., hiya, pakikisama, utang na loob). | Adolescents and adults. | Likert-type items; normative scoring. | Often used in research on Filipino culture and social psychology. |
| Filipino Family APGAR | Adapted by Dr. Ramon L. Castillo from the original APGAR, 1990s | Assesses family functioning and support in five areas: Adaptability, Partnership, Growth, Affection, Resolve. | Filipino families; adolescents to adults. | 5-item checklist; scores categorized as high, moderate, or low family function. | Used in community health, social work, and family counseling. |
| Masaklaw na Panukat ng Loob (MAPANLOOB) | Dr. Lillian P. Gonzales, 1990s | Measures self-concept and self-perception: physical, social, intellectual, and moral self. | Filipino adolescents and young adults. | 150-item self-report; scores in percentile ranks. | Useful in guidance counseling and developmental research. |
| Philippine Emotional Quotient Inventory (PEQI) | UP Institute of Psychology, early 2000s | Evaluates emotional intelligence (self-awareness, empathy, relationship skills). | College students and working adults. | Likert scale; scoring in standard scores. | Applied in organizational training and counseling settings. |
| Test Name | Developer(s) | Main Area Measured | Scoring Method | Key Features | Important Notes on Results |
|---|---|---|---|---|---|
| Wechsler Adult Intelligence Scale (WAIS-IV) | David Wechsler (first ed. 1955; IV = 2008) | Global IQ and 4 indexes: Verbal Comprehension, Perceptual Reasoning, Working Memory, Processing Speed | Raw scores → Scaled scores (mean = 10, SD = 3) → Index & Full-Scale IQ (mean = 100, SD = 15) | Gold-standard adult IQ test; updated norms for ages 16–90 | Examine pattern of index scores; large gaps may signal learning or cognitive disorders even if FSIQ is average |
| Wechsler Intelligence Scale for Children (WISC-V) | David Wechsler (1st ed. 1949; V = 2014) | Children’s intellectual ability | Same as WAIS (IQ mean = 100, SD = 15) | For ages 6–16; parallel to WAIS | Discrepancies between indexes help identify specific learning disabilities |
| Stanford–Binet Intelligence Scales (5th ed.) | Lewis Terman (1916 adaptation of Binet–Simon) | Five factors of cognitive ability | Raw scores → Standard IQ (mean = 100, SD = 15) | Covers 2–85+ yrs; long history of use | Compare factor scores to detect giftedness or developmental delay |
| Raven’s Progressive Matrices | John C. Raven (1938) | Non-verbal reasoning, fluid intelligence | Raw score → Percentile rank & standard score vs. age norms | Culture-fair, language-free | Ideal when verbal or educational bias must be minimized |
| Minnesota Multiphasic Personality Inventory–2 (MMPI-2) | Starke Hathaway & J.C. McKinley (1943; 2nd ed. 1989) | Personality structure & psychopathology | Raw → T-scores (mean = 50, SD = 10) | Validity scales detect faking or inconsistent answers | T ≥ 65 suggests clinically significant elevation |
| 16 Personality Factors (16PF) | Raymond B. Cattell (1949) | 16 primary & 5 global personality traits | Raw → Sten scores (1–10, mean = 5.5) | Widely used in counseling & HR | Extreme sten scores (1–3 or 8–10) show pronounced traits |
| NEO Personality Inventory–Revised (NEO-PI-R) | Paul Costa & Robert McCrae (1992) | “Big Five” traits | Raw → T-scores (mean = 50, SD = 10) | Provides domain & facet profiles | Scores ±1 SD indicate high or low trait levels |
| Beck Depression Inventory–II (BDI-II) | Aaron T. Beck (1996) | Severity of depressive symptoms | 21 items rated 0–3; Total 0–63; cutoffs: 0–13 minimal, 14–19 mild, 20–28 moderate, 29–63 severe | Quick self-report | Reflects current depression level; not a stand-alone diagnosis |
| Hamilton Anxiety Rating Scale (HAM-A) | Max Hamilton (1959) | Anxiety severity | Clinician rates 14 items 0–4; Total 0–56 | Covers psychic & somatic anxiety | ≥25 = moderate–severe anxiety |
| Rorschach Inkblot Test | Hermann Rorschach (1921) | Personality dynamics, unconscious processes | Responses coded using Exner Comprehensive System or R-PAS | Classic projective test using 10 inkblots | Interpretation requires specialized training; results are descriptive, not a direct diagnosis |
| Thematic Apperception Test (TAT) | Henry Murray & Christiana Morgan (1935) | Motives, needs, interpersonal themes | Qualitative story analysis; some systems use need/press coding | 31 picture cards; projective | Reveals underlying drives and conflicts |
| Strong Interest Inventory | Edward Kellogg Strong Sr. (1927; revised many times) | Vocational interests | Raw → Standard scores & percentile ranks | Matches interests to occupational themes | High similarity to career groups suggests good vocational fit |
| Child Behavior Checklist (CBCL) | Thomas Achenbach (1983) | Emotional & behavioral problems in children | Parent/teacher ratings → T-scores (mean = 50, SD = 10) | Versions for 1½–5 and 6–18 yrs | T ≥ 65 indicates clinically significant concerns |
| Draw-A-Person Test (DAP) | Florence Goodenough (1926); later Machover (projective version 1949) | Cognitive maturity (Goodenough) or personality (Machover) | Scoring differs: cognitive version uses quantitative criteria; projective version qualitative | Simple, quick; often used with children | Results must be interpreted in context with other data |
| Bender Visual-Motor Gestalt Test | Lauretta Bender (1938) | Visual-motor integration & possible brain dysfunction | Qualitative analysis of reproduction errors; some systems give quantitative scores | Nine geometric figures to copy | Used as screening for neurological impairment |
| Philippine Aptitude Classification Test (PACT) | UP Institute of Psychology (1970s) | Scholastic aptitude: verbal, numerical, abstract reasoning | Raw → Percentile ranks & stanines | Filipino-normed; for career guidance | Culturally valid measure of academic potential |
| Philippine Personality Inventory (PPI) | Ester Bugarin & UP Institute of Psychology (1970s) | Major personality traits for Filipinos | Likert items → T-scores | HR selection & counseling | Culture-specific norms reduce Western bias |
| Panukat ng Ugali at Pagkatao (PUP) | Maria Lourdes Carandang et al. (late 1970s) | Personality traits in Filipino language | 220 items → Percentile scores per trait | Fully Filipino personality test | Suited for school guidance and HR |
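The scoring scales that recur in the table (T-scores, deviation IQ, scaled scores, stens, stanines) are all rescalings of a z-score. The sketch below shows the arithmetic under stated assumptions: a simple linear conversion from a norm-group mean and SD, a sten SD of 2, and stanines with mean 5 and SD 2. The function names and the raw-score example are hypothetical, and published tests rely on their own norm tables rather than these formulas.

```python
import math

def z_from_raw(raw, norm_mean, norm_sd):
    """Standardize a raw score against a norm group's mean and SD."""
    return (raw - norm_mean) / norm_sd

def t_score(z):
    return 50 + 10 * z          # mean 50, SD 10

def deviation_iq(z):
    return 100 + 15 * z         # mean 100, SD 15

def scaled_score(z):
    return 10 + 3 * z           # mean 10, SD 3 (subtest scaled scores)

def sten(z):
    # Ten half-SD-wide bands centered on 5.5, clipped to 1..10.
    return max(1, min(10, math.floor(2 * z + 6)))

def stanine(z):
    # Nine half-SD-wide bands centered on 5, clipped to 1..9.
    return max(1, min(9, math.floor(2 * z + 5.5)))

# Hypothetical example: raw score 65 on a scale normed at mean 50, SD 10.
z = z_from_raw(65, 50, 10)
print("z =", z)
print("T =", t_score(z), "| IQ-type =", deviation_iq(z))
print("scaled =", scaled_score(z), "| sten =", sten(z), "| stanine =", stanine(z))
```

A z-score of 1.5 corresponds to T = 65, the elevation threshold cited for the MMPI-2 and CBCL rows above.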