Reviewer for Psychological Testing and Assessment

CHAPTER 3: A Statistics Refresher

Scales of Measurement
• Measurement: Assigning numbers or symbols to characteristics of things according to rules.
• Scale: A set of numbers or symbols representing empirical properties.

Types of Scales:
1. Nominal Scale: Classification or categorization (e.g., gender in a study).
2. Ordinal Scale: Rank ordering but no absolute zero (e.g., intelligence test rankings).
3. Interval Scale: Equal intervals between numbers but no absolute zero (e.g., IQ scores).
4. Ratio Scale: Has a true zero point (e.g., time taken to complete a puzzle).

Describing Data
• Distribution: A set of test scores arranged for study.
• Raw Score: An unmodified numerical representation of performance.
• Frequency Distributions:
o Simple Frequency Distribution: Lists all scores and how often each occurs.
o Grouped Frequency Distribution: Groups test scores into class intervals.
• Graph Types:
1. Histogram: Contiguous vertical bars drawn at the true limits of the test scores.
2. Bar Graph: Non-contiguous rectangular bars showing categorical data.
3. Frequency Polygon: A continuous line connecting the points that represent each score's frequency.

Measures of Central Tendency
1. Mean: The arithmetic average of the scores.
2. Median: The middle score in a distribution.
3. Mode: The most frequently occurring score.

Measures of Variability
• Range: The difference between the highest and lowest scores.
• Interquartile Range (IQR): The difference between Q3 and Q1 (the middle 50% of scores).
• Semi-Interquartile Range: The IQR divided by 2.
• Standard Deviation: The square root of the variance; indicates how widely scores are dispersed around the mean.
• Skewness:
o Positive skew: Relatively few high scores.
o Negative skew: Relatively few low scores.
• Kurtosis:
o Platykurtic: A relatively flat distribution.
o Leptokurtic: A relatively peaked distribution.
o Mesokurtic: Moderate peakedness, as in the normal distribution.

The Normal Curve
• A bell-shaped, symmetrical, mathematically defined curve.
• Developed through the work of de Moivre and Laplace, and later Karl Pearson.

Standard Scores
• Standard Score: A raw score converted to a scale with a preset mean and standard deviation.
• Types of Standard Scores:
1. Z-score: Indicates how many standard deviations a raw score lies above or below the mean.
2. T-score: A scale with a mean of 50 and a standard deviation of 10.
3. Stanine: A standardized nine-point scale (1 to 9).
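To make the standard-score conversions concrete, here is a small sketch (mine, not the reviewer's; the raw scores are hypothetical). It uses the convention that stanines have a mean of 5 and a standard deviation of about 2:

```python
import statistics

raw_scores = [12, 15, 9, 18, 15, 11, 14, 16, 10, 15]  # hypothetical raw test scores

mean = statistics.mean(raw_scores)   # central tendency
sd = statistics.pstdev(raw_scores)   # population standard deviation

for x in sorted(set(raw_scores)):
    z = (x - mean) / sd                          # z-score: mean 0, SD 1
    t = 50 + 10 * z                              # T-score: mean 50, SD 10
    stanine = min(9, max(1, round(5 + 2 * z)))   # stanine: mean 5, SD ~2, clipped to 1-9
    print(f"raw={x:>2}  z={z:+.2f}  T={t:5.1f}  stanine={stanine}")
```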
CHAPTER 5: Reliability

I. Key Concepts
• Reliability: The consistency of a measurement.
• Reliability Coefficient: The ratio of true score variance to the total variance of observed test scores.
• Variance: A statistic describing the sources of test score variability.
o True Variance: Variance due to actual differences in the trait being measured.
o Error Variance: Variance caused by random, irrelevant factors.
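In classical test theory notation (standard notation, added here as a supplement to the reviewer), these definitions combine as

\[ r_{xx} = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{total}}} = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}} \]

so a reliability coefficient of .90 means that 90% of the variance in observed scores is attributable to true differences in the trait.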
II. Sources of Error Variance
1. Test Construction
2. Test Administration
3. Test Scoring and Interpretation

III. Types of Reliability Estimates
1. Test-Retest Reliability
o Correlates scores from the same test taken at different times.
o Coefficient of Stability: Used if the time interval between administrations is greater than six months.
2. Parallel-Forms & Alternate-Forms Reliability
o Coefficient of Equivalence: Measures the relationship between different forms of a test.
o Parallel Forms: Identical means and variances; scores correlate equally with the true score.
o Alternate Forms: Different versions of a test, designed to be parallel.
o Both require two test administrations and may be affected by factors such as fatigue or practice.
3. Internal Consistency Reliability
o Measures the correlation among test items without requiring multiple test administrations.
o Homogeneous: Items measure a single trait.
o Heterogeneous: Items measure multiple factors.

IV. Methods of Internal Consistency Reliability
1. Split-Half Reliability
o Divides the test into two halves and correlates the scores on the halves.
o Uses the Pearson r and the Spearman-Brown formula.
o Odd-Even Reliability: A specific type of split-half reliability.
2. Spearman-Brown Formula
o Estimates reliability as a function of test length.
o Not suitable for heterogeneous tests or speed tests.
3. Kuder-Richardson Formulas
o KR-20: Measures inter-item consistency for dichotomous items.
o KR-21: A simplification of KR-20, usable when item difficulty is consistent across items.
4. Coefficient Alpha (Cronbach's Alpha)
o The mean of all possible split-half correlations.
o Used for Likert-scale tests and other non-dichotomous items.
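These estimates are easy to relate in code. Below is a sketch (mine, with a hypothetical response matrix) that computes an odd-even split-half correlation, corrects it with Spearman-Brown, and then computes KR-20 and coefficient alpha; the general Spearman-Brown formula for a test lengthened n-fold is r' = nr / (1 + (n − 1)r):

```python
import numpy as np

# Hypothetical dichotomous responses (1 = correct, 0 = incorrect):
# rows = examinees, columns = items.
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
])
k = scores.shape[1]
totals = scores.sum(axis=1)

# 1. Odd-even split-half, then the Spearman-Brown correction to estimate
#    the reliability of the full-length test: r_sb = 2r / (1 + r).
odd, even = scores[:, 0::2].sum(axis=1), scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
r_sb = 2 * r_half / (1 + r_half)

# 2. KR-20 for dichotomous items: (k/(k-1)) * (1 - sum(p*q) / total-score variance).
p = scores.mean(axis=0)
kr20 = (k / (k - 1)) * (1 - np.sum(p * (1 - p)) / totals.var())

# 3. Coefficient alpha: same form, but with per-item variances, so it also
#    applies to Likert-type items. For 0/1 items it equals KR-20.
alpha = (k / (k - 1)) * (1 - scores.var(axis=0).sum() / totals.var())

print(f"split-half r = {r_half:.2f} -> Spearman-Brown = {r_sb:.2f}")
print(f"KR-20 = {kr20:.2f}, alpha = {alpha:.2f}")
```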
V. Measures of Inter-Scorer Reliability
• Inter-Scorer Reliability: The degree of agreement between multiple raters.
• Coefficient of Inter-Scorer Reliability: A correlation coefficient measuring scorer consistency.
• Kappa Statistic: Used to calculate inter-scorer agreement.
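For two raters, the kappa statistic can be sketched directly; it discounts the agreement the raters would be expected to reach by chance. The ratings below are hypothetical:

```python
from collections import Counter

# Two raters classify the same eight protocols (hypothetical data).
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

n = len(rater_a)
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability that both raters independently pick the same category.
count_a, count_b = Counter(rater_a), Counter(rater_b)
p_chance = sum((count_a[c] / n) * (count_b[c] / n) for c in set(rater_a) | set(rater_b))

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"observed = {p_observed:.2f}, chance = {p_chance:.2f}, kappa = {kappa:.2f}")
```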
VI. Standard Error of Measurement (SEM)
• Estimates the amount of error in an observed score.
• Measures the precision of individual test scores.
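The usual formula is SEM = SD × √(1 − r), where SD is the test's standard deviation and r its reliability coefficient. A quick sketch with hypothetical values:

```python
import math

# SEM for an IQ-style scale (SD = 15) with a reliability of .91 (hypothetical values).
sd, r_xx = 15, 0.91
sem = sd * math.sqrt(1 - r_xx)
print(f"SEM = {sem:.1f}")                  # -> 4.5

# A 95% confidence band around an observed score of 100:
observed = 100
low, high = observed - 1.96 * sem, observed + 1.96 * sem
print(f"95% CI: {low:.0f} to {high:.0f}")  # -> 91 to 109
```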
CHAPTER 6: Validity

I. Definition of Validity
• Validity: The extent to which a test measures what it claims to measure in a specific context.
• Trinitarian View of Validity:
1. Content Validity – Evaluates the extent to which a test covers the subject matter.
2. Criterion-Related Validity – Examines the relationship between test scores and external measures.
3. Construct Validity – Assesses whether a test aligns with theoretical concepts.

II. Types of Validity

1. Face Validity
• Refers to how well a test appears to measure a certain trait from the perspective of test-takers.
• More about perception than actual psychometric soundness.

2. Content Validity
• Determines whether test items representatively sample the subject matter.
• Used in objective, achievement, and aptitude tests.
• Two key tools:
o Table of Specifications (TOS)
o Subject Matter Experts (SMEs)
• Content Validity Ratio (CVR): A formula used to quantify content validity.
o Negative CVR: Less than half of the panelists rate an item as essential.
o Zero CVR: Exactly half of the panelists rate an item as essential.
o Positive CVR: More than half but not all panelists rate an item as essential.
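Lawshe's CVR formula makes the three cases explicit: CVR = (n_e − N/2) / (N/2), where n_e is the number of panelists rating the item essential and N is the panel size. A small sketch:

```python
def cvr(n_essential: int, n_panelists: int) -> float:
    """Lawshe's Content Validity Ratio for a single item."""
    return (n_essential - n_panelists / 2) / (n_panelists / 2)

# Hypothetical 10-person expert panel:
print(cvr(4, 10))   # -0.2 -> negative: fewer than half rated it essential
print(cvr(5, 10))   #  0.0 -> zero: exactly half
print(cvr(9, 10))   #  0.8 -> positive: more than half
```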
3. Criterion-Related Validity
• Evaluates how well test scores predict outcomes based on a criterion.
• Criterion: A standard used for evaluating the accuracy of test scores.
• Characteristics of a good criterion:
1. Relevant – Applicable to what the test measures.
2. Valid – Meaningful for its intended purpose.
3. Uncontaminated – Not influenced by the predictor measures.
• Types of Criterion-Related Validity:
1. Concurrent Validity: Compares test scores with criterion scores collected at the same time.
2. Predictive Validity: Measures how well test scores predict future criterion scores.
• Statistical Measures of Criterion-Related Validity:
o Validity Coefficient: The correlation between test scores and criterion scores.
o Incremental Validity: Determines how much a new predictor improves predictive ability beyond existing predictors.
o Expectancy Data: Uses expectancy tables to predict the probability of certain outcomes.
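A validity coefficient is an ordinary Pearson correlation between predictor and criterion. A sketch on hypothetical data (incremental validity would then ask how much the test improves prediction, e.g., the change in R², when added to predictors already in use):

```python
import numpy as np

# Hypothetical predictor (test scores) and criterion (supervisor ratings).
test_scores = np.array([55, 62, 48, 70, 66, 58, 74, 51])
job_ratings = np.array([3.1, 3.8, 2.6, 4.2, 3.9, 3.0, 4.5, 2.9])

validity = np.corrcoef(test_scores, job_ratings)[0, 1]
print(f"validity coefficient r = {validity:.2f}")
```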
4. Construct Validity
• Assesses whether test scores meaningfully relate to a theoretical concept.
• Construct: A theoretical trait or ability that is not directly observable (e.g., intelligence, motivation).
• Evidence of Construct Validity:
1. Convergent Evidence – High correlation with measures of similar constructs.
2. Divergent Evidence – Low correlation with measures of unrelated constructs.
3. Factor Analysis – Identifies test components that contribute to a construct.
4. Evidence of Homogeneity – Shows that test items measure a single construct.
5. Evidence of Changes with Age – Constructs develop predictably over time.
6. Evidence from Pretest-Posttest Changes – Scores change in response to interventions.
7. Evidence from Distinct Groups – The test differentiates between groups known to differ on the construct.

III. Validity, Bias, and Fairness

1. Test Bias
• Refers to systematic errors that unfairly advantage or disadvantage certain groups.
• Types of Rating Errors:
o Leniency Error – Overly generous ratings.
o Severity Error – Overly harsh ratings.
o Central Tendency Error – Avoiding extreme ratings.
o Halo Effect – Ratings influenced by unrelated characteristics.

2. Test Fairness
• Ensures that tests are administered and interpreted equitably for all individuals.

CHAPTER 8: Test Development

I. Stages of Test Development
The process of developing a psychological test involves five key stages:
1. Test Conceptualization – Defining the purpose and scope of the test.
2. Test Construction – Developing items and selecting the format.
3. Test Tryout – Administering a preliminary version of the test.
4. Item Analysis – Evaluating the quality of test items.
5. Test Revision – Modifying the test based on findings from item analysis.

II. Test Conceptualization
• The initial stage, where the idea for a test is conceived.
• Important questions to consider:
o What will the test measure?
o Who will take and use the test?
o How will it be administered?
o What responses will it require?
o How will scores be interpreted?
• Norm-Referenced vs. Criterion-Referenced Tests:
o Norm-Referenced Tests: Compare test-takers' scores to a group norm.
o Criterion-Referenced Tests: Measure mastery of specific skills or knowledge.

Pilot Work
• Preliminary research to refine the test before full development.
• Involves literature reviews, experimenting with test items, and refining content.

III. Test Construction

Scaling Methods (Assigning Numbers to Measurement)
• Age-Based Scaling – Compares performance based on age.
• Grade-Based Scaling – Compares performance by educational level.
• Stanine Scaling – Transforms raw scores into a scale from 1 to 9.
• Unidimensional vs. Multidimensional Scales:
o Unidimensional – Measures a single construct.
o Multidimensional – Measures multiple constructs.

Common Scaling Methods
1. Rating Scale – Assesses the intensity of a trait (e.g., Likert scale).
2. Method of Paired Comparisons – Presents pairs of stimuli for comparison.
3. Comparative Scaling – Requires ranking of items.
4. Categorical Scaling – Assigns items to distinct categories.
5. Guttman Scaling – Arranges items from the weakest to the strongest expression of the trait.

IV. Writing Test Items
• Item Pool: The collection of potential test items.
• Two Major Item Formats:
1. Selected-Response Items: Multiple-choice, matching, true/false.
2. Constructed-Response Items: Short-answer, essay, completion-type.

Computerized Test Item Development
• Item Bank – A large collection of test items for future use.
• Item Branching – Adjusts test difficulty based on a test-taker's responses.
• Computerized Adaptive Testing (CAT):
o Dynamically selects items based on prior responses.
o Reduces floor effects (difficulty discriminating at low ability levels) and ceiling effects (difficulty discriminating at high ability levels).
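A toy sketch of the item-branching rule at the heart of CAT (operational systems select items with IRT models; the difficulty ladder and the simulated examinee below are hypothetical):

```python
def run_adaptive(items, answers_correctly, length=5):
    """items: difficulty levels, easiest to hardest; start in the middle."""
    idx = len(items) // 2
    administered = []
    for _ in range(length):
        administered.append(items[idx])
        if answers_correctly(items[idx]):
            idx = min(idx + 1, len(items) - 1)  # correct -> branch to a harder item
        else:
            idx = max(idx - 1, 0)               # incorrect -> branch to an easier item
    return administered

# Simulated examinee who can handle difficulties up to 6:
print(run_adaptive([1, 2, 3, 4, 5, 6, 7, 8, 9], lambda d: d <= 6))  # [5, 6, 7, 6, 7]
```

The sequence quickly homes in on the examinee's level, which is why CAT reduces floor and ceiling effects: items far too easy or far too hard are simply never administered.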
V. Test Tryout
• Administering the test to a sample similar to the target population.
• Must simulate real testing conditions as closely as possible.

VI. Item Analysis
Statistical techniques used to evaluate test items:
1. Item Difficulty Index – The proportion of test-takers who answer the item correctly.
2. Item Reliability Index – Indicates the internal consistency of the test.
3. Item Validity Index – Assesses whether items measure the intended construct.
4. Item Discrimination Index – Determines how well an item differentiates between high and low scorers.
5. Item Characteristic Curve – Graphically represents item difficulty and discrimination.
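The difficulty and discrimination indices are simple to compute. A sketch on a hypothetical 0/1 response matrix, using the common upper-group-minus-lower-group discrimination index:

```python
import numpy as np

# Hypothetical responses: rows = examinees, columns = items (1 = correct).
scores = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
])

# Item difficulty index p: proportion answering correctly (higher p = easier item).
p = scores.mean(axis=0)

# Item discrimination index d: item difficulty in the top-scoring group minus
# item difficulty in the bottom-scoring group (here, top and bottom halves).
order = np.argsort(scores.sum(axis=1))
lower, upper = scores[order[:3]], scores[order[-3:]]
d = upper.mean(axis=0) - lower.mean(axis=0)

print("difficulty p:     ", np.round(p, 2))
print("discrimination d: ", np.round(d, 2))
```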
Other Considerations in Item Analysis
• Guessing – Difficult to control but affects test accuracy.
• Bias – Items should not unfairly favor one group over another.
• Speed Tests – Later items may appear more difficult simply because of time constraints.

VII. Test Revision
• Occurs at two stages:
1. During new test development – After analyzing test results.
2. Throughout the life cycle of an existing test – When updates are necessary.
• Cross-Validation: Testing on a different sample to confirm validity.
• Validity Shrinkage: A decrease in test validity after cross-validation.
• Co-Validation: Validating multiple assessments on the same sample for efficiency.

Other Scoring Considerations
• Anchor Protocol: A highly accurate reference scoring method.
• Scoring Drift: Changes in scoring consistency over time.

CHAPTER 9: Utility

I. Definition of Utility
• Utility: The usefulness or practical value of a test or assessment.
• Helps determine whether a test improves efficiency in decision-making.
• Can also refer to the effectiveness of a training program or intervention.

II. Factors Affecting a Test's Utility
1. Psychometric Soundness – A test is useful only if it provides reliable and valid information for decision-making.
2. Costs – Expenses related to:
o Purchasing the test.
o Printing test materials.
o Scoring and interpretation (manual or computerized).
3. Benefits – The profits, advantages, or improvements gained from using a test.

III. Utility Analysis
• Definition: A family of techniques used for cost-benefit analyses of tests.
• Determines whether the use of a test is worth the investment.

How Utility Analysis is Conducted
1. Expectancy Data
o Used to predict the likelihood of success based on test scores.
o Expectancy tables help categorize test-takers into passing, acceptable, or failing groups.
o Taylor-Russell and Naylor-Shine Tables: Used to relate a test's validity to selection outcomes in employment settings.
2. Brogden-Cronbach-Gleser Formula
o Used to estimate the monetary or practical benefits of using a test in selection decisions.
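One common statement of the Brogden-Cronbach-Gleser productivity gain is: utility gain = (number selected) × (average tenure) × (validity coefficient) × (SD of job performance in dollars) × (mean standardized test score of those selected) − (total cost of testing). A sketch with entirely hypothetical figures:

```python
# All values are hypothetical.
n_selected = 10          # people hired on the basis of the test
tenure_years = 2.0       # average time they stay on the job
validity = 0.40          # validity coefficient of the test
sd_dollars = 8_000       # SD of job performance expressed in dollars per year
mean_z_selected = 1.0    # average standardized test score of those hired
n_tested = 60            # applicants screened
cost_per_applicant = 30  # cost of testing one applicant

gain = (n_selected * tenure_years * validity * sd_dollars * mean_z_selected
        - n_tested * cost_per_applicant)
print(f"estimated utility gain: ${gain:,.0f}")  # -> $62,200
```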
IV. Practical Considerations in Utility Analysis
• Size of the Applicant Pool – Affects how selective the hiring process can be.
• Job Complexity – More complex jobs require more predictive and specialized tests.
• Cut Scores – The minimum score required for passing or selection.

V. Methods for Setting Cut Scores
1. Angoff Method
o Experts estimate how minimally competent individuals would perform on each item (see the sketch after this list).
o Issue: Low inter-rater reliability can lead to disagreement among the judges.
2. Known Groups Method
o Compares test scores of groups known to have or lack a trait.
o Issue: The cut score depends on group composition, which can vary.
3. IRT-Based Methods (Item Response Theory)
o Determine cut scores based on test-takers' performance across all items.
o Item Mapping Method: Used for licensing exams.
o Bookmark Method: Common in academic settings.
4. Discriminant Analysis
o A statistical method for classifying individuals into categories (e.g., successful vs. unsuccessful employees).
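The Angoff computation flagged above reduces to averaging the judges' item-level probability estimates; the spread between judges is exactly where the inter-rater reliability problem shows up. All estimates here are hypothetical:

```python
import statistics

# Each judge estimates, per item, the probability that a minimally competent
# test-taker answers correctly (hypothetical panel of three judges, five items).
judge_estimates = [
    [0.9, 0.7, 0.6, 0.8, 0.5],  # judge 1
    [0.8, 0.6, 0.5, 0.9, 0.6],  # judge 2
    [0.9, 0.8, 0.4, 0.7, 0.5],  # judge 3
]

# A judge's predicted minimum raw score is the sum of their estimates;
# the Angoff cut score is the average across judges.
per_judge_cut = [round(sum(row), 2) for row in judge_estimates]
cut_score = statistics.mean(per_judge_cut)
print(f"per-judge cuts: {per_judge_cut} -> Angoff cut score = {cut_score:.1f}")
```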
CHAPTER 10: Intelligence and Its Measurement

I. Definition of Intelligence
• Intelligence is a multifaceted capacity that includes the ability to:
o Acquire and apply knowledge
o Reason logically
o Plan effectively
o Solve problems
o Make sound judgments
o Visualize concepts
o Adapt to new situations
o Find words and thoughts easily
o Pay attention and be intuitive

II. Early Theories of Intelligence
1. Francis Galton – First to study the heritability of intelligence.
o Believed intelligence was hereditary and linked to sensory abilities.
2. Alfred Binet – Developed the first intelligence test.
o Saw intelligence as including reasoning, judgment, memory, and abstraction.
3. David Wechsler – Viewed intelligence as a global capacity.
o Emphasized the role of non-intellective factors (e.g., motivation).
4. Jean Piaget – Viewed intelligence as biological adaptation.
o Learning occurs through:
▪ Assimilation – Incorporating new information into existing cognitive structures.
▪ Accommodation – Modifying existing structures to fit new information.

III. Factor-Analytic Theories of Intelligence
Factor analysis identifies relationships between different cognitive abilities.
1. Charles Spearman (Two-Factor Theory)
o Intelligence consists of:
▪ General Intelligence (g) – Affects all cognitive tasks.
▪ Specific Abilities (s) – Unique to particular tasks.
2. Louis Thurstone (Primary Mental Abilities)
o Identified seven primary abilities (e.g., verbal comprehension, numerical ability).
o Later acknowledged that a g-factor influences all abilities.
3. J.P. Guilford (Structure of Intellect Model)
o Rejected g and proposed that intelligence consists of 150+ distinct abilities.
4. Howard Gardner (Multiple Intelligences)
o Identified seven types of intelligence:
▪ Logical-mathematical
▪ Linguistic
▪ Musical
▪ Spatial
▪ Bodily-kinesthetic
▪ Interpersonal
▪ Intrapersonal
o Basis for later theories of emotional intelligence.
5. Raymond Cattell (Fluid & Crystallized Intelligence)
o Crystallized Intelligence (Gc) – Knowledge acquired through education and experience.
o Fluid Intelligence (Gf) – Nonverbal, relatively culture-free problem-solving ability.
6. John Horn (Extended Cattell's Model)
o Added factors such as visual, auditory, and quantitative processing.
o Distinguished vulnerable abilities (decline with age) from maintained abilities (stay stable).
7. John Carroll (Three-Stratum Theory)
o Intelligence has three levels:
▪ Stratum I – Narrow abilities (e.g., memory, speed).
▪ Stratum II – Broad abilities (e.g., Gf, Gc).
▪ Stratum III – The g-factor (general intelligence).
8. CHC Model (Cattell-Horn-Carroll)
o A combination of Cattell's, Horn's, and Carroll's models.
o Guides modern intelligence testing.

IV. Measuring Intelligence
• Mental Age – Compares performance to that of a specific age group.
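The mental-age concept underlies the classical ratio IQ (mental age ÷ chronological age × 100), which modern tests have replaced with deviation scores. A one-line illustration with hypothetical ages:

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Classical ratio IQ: performance age relative to actual age, times 100."""
    return mental_age / chronological_age * 100

print(ratio_iq(10, 8))  # 125.0 -> performing above age level
print(ratio_iq(6, 8))   # 75.0  -> performing below age level
```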
Deviation = 3.
o Includes a behavioral checklist for
V. Intelligence: Key Issues examiners.
1. Nature vs. Nurture
o Preformationism: Intelligence is II. The Wechsler Tests
fixed at birth.
• Created by David Wechsler.
o Predeterminism: Intelligence is
genetically determined and • Designed for preschoolers to adults.
unchangeable. • Evolution of Wechsler tests:
2. Stability of Intelligence o WAIS-IV (Wechsler Adult Intelligence
o Vocabulary improves with age. Scale)
o WISC-IV (Wechsler Intelligence Scale
for Children)
o WPPSI-III (Wechsler Preschool and Primary Scale of Intelligence)

1. Wechsler Adult Intelligence Scale (WAIS-IV)
• Consists of core and supplemental subtests:
o Core subtests: Used to calculate a composite score.
o Supplemental subtests: Provide additional clinical information.
• 10 Core Subtests: Block Design, Similarities, Digit Span, Matrix Reasoning, Vocabulary, Arithmetic, Symbol Search, Visual Puzzles, Information, Coding.
• 5 Supplemental Subtests: Letter-Number Sequencing, Figure Weights, Comprehension, Cancellation, Picture Completion.

2. Wechsler Intelligence Scale for Children (WISC-IV)
• First published in 1949.
• Based on the CHC model of intelligence.
• Contains five supplemental tests (which add about 30 minutes to testing time).

3. Wechsler Preschool and Primary Scale of Intelligence (WPPSI-III)
• First test to properly sample the total U.S. population, including racial minorities.
• Tests children from 2 years 6 months of age and up.
• Subtests are categorized as core, supplemental, or optional.
• Useful for children with short attention spans or special conditions.

III. Thinking Styles in Intelligence
1. Convergent Thinking (Guilford, 1967)
o Deductive reasoning: Narrows down solutions to one correct answer.
o Requires fact recall and logical judgment.
o Example: Standardized IQ tests.
2. Divergent Thinking
o Creative reasoning: Generates multiple possible solutions.
o Requires flexibility, originality, and imagination.
o Example: Creative problem-solving tasks.