Constructing Measures:
An Item Response Modeling Approach
(Expanded and Revised 2nd Edition)
Mark Wilson
University of California, Berkeley
Draft 1.0 July 2022
Not for citation or quotation without the author’s permission.
Table of Contents
Preface
Aims of the book
Audience for the book
Structure of the book
Using this book to teach a course
Acknowledgements
Part I. A Constructive Approach to Measurement
Chapter 1: The BEAR Assessment System: Overview of the "4 building blocks"
approach
1.0 Book Overview
1.1 What is “Measurement”?
1.1.1 Construct Modeling
1.2 The BEAR Assessment System (BAS)
1.3 The Construct Map
1.3.1 Example 1: The MoV Construct in the Data Modeling Assessments
1.4 The Items Design
1.4.1 Example 1: MoV Items
1.4.2 The Relationship Between the Construct and the Responses
1.5 The Outcome Space
1.5.1 Example 1: The MoV Outcome Space
1.6 The Calibration Model
1.6.1 Example 1, continued: The MoV Wright Map
1.6.2 Return to the Discussion of Causation and Inference
1.7 Reporting the Results to the Measurer and Other Users
1.8 Using the 4 Building Blocks to Develop an Instrument
1.9 Resources
1.10 Exercises and activities
Appendix 1A The MoV Outcome Space
Appendix 1B The BEAR Assessment System (BAS): Papers about its Uses and Applications
Textbox 1.1 Some Useful Terminology
Part II. The Four Building Blocks
Chapter 2: Construct Maps
2.0 Chapter Overview
2.1 The Construct Map
2.2 Examples of Construct Maps
2.2.1 Example 1: The Six Constructs in the Data Modeling Curriculum
2.2.2 Example 2: A Social and Emotional Learning Example (RIS – Researcher Identity Scale)
2.2.3 Example 3: An Attitude Example (GEB – General Ecological Behavior)
2.2.4 Example 4: A 21st Century Skills Example (LPS-Argumentation)
2.2.5 Example 5: A Process Measurement Example: Collaborative Problem
Solving (CPS)
2.2.6 Example 6: A Health Assessment Example (PF-10—Physical Functioning
10)
2.2.7 Example 7: An Interview Example (CUE – Conceptual Underpinnings of Evolution)
2.2.8 Example 8: An Observational Instrument: Early Childhood (DRDP)
2.2.9 The Issues, Evidence and You Science Assessment (IEY)
2.3 Using Construct Mapping to Help Develop an Instrument
2.4 Resources
2.5 Exercises and activities
Appendix 2A The full set of GEB items
Chapter 3: The Items Design
3.0 Chapter Overview
3.1 The Idea of an Item
3.2 The Facets of the Items Design
3.2.1 The Construct Facet
3.2.2 The Secondary Facets
3.3 Different Types of Item Responses
3.3.1 Participant Observation
3.3.2 Specifying (Just) the Topics
3.3.3 Open-ended Items
3.3.4 Selected Response Items
3.3.5 Steps in Item Development
3.4 A Unique Feature of Human Measurement: Listening to the respondents
3.5 Building-in Fairness through Design
3.5.1 Universal Design
3.6 Resources
3.7 Exercises and activities
Appendix 3A: The Item Panel
Appendix 3B: Supporting Construct-Irrelevant Action
Chapter 4: The Outcome Space
4.0 Chapter Overview
4.1 The Qualities of an Outcome Space
4.1.1 Well-defined Categories
4.1.2 Research-based Categories
4.1.3 Context-specific Categories
4.1.4 Finite and Exhaustive Categories
4.1.5 Ordered Categories
4.2 Scoring the Outcome Space (The Scoring Guide)
4.3 General Approaches to Constructing an Outcome Space
4.3.1 Phenomenography
4.3.2 The SOLO Taxonomy
4.3.3 Guttman Items
4.4 When Humans Become a Part of the Items Design: The Rater
4.5 Resources
4.6 Exercises and Activities
Appendix 4A: The item pilot investigation
Appendix 4B: Matching Likert and Guttman Items in the RIS Example
Chapter 5: The Calibration Model
5.0 Chapter Overview
5.1 Combining the Two Approaches to Measurement
5.2 The Construct Map and the Rasch Calibration Model
5.2.1 The Wright Map
5.2.2 Modeling the Response Vector
5.3 The PF-10 Example (Example 6)
5.4 Reporting Measurements
5.4.1 Interpretation and Errors
5.4.2 The PF-10 Example (Example 6), continued
5.5 Resources
5.6 Exercises and Activities
Appendix 5A Results from the PF-10 Dichotomous Analysis
Textbox 5.1 Making Sense of Logits
Chapter 6: Using the Calibration Model
6.0 Chapter Overview
6.1 More than Two Score Categories: Polytomous Data
6.1.1 The PF-10 Example (Example 6), continued
6.2 Evaluating Fit
6.2.1 Item Fit
6.2.2 Respondent Fit
6.3 Resources
6.4 Exercises and Activities
Appendix 6A Results for the PF-10 Polytomous Analysis
Textbox 6.1 The Partial Credit Model
Textbox 6.2 Calculating the Thurstone Thresholds
Part III. Quality Control Methods
Chapter 7: Trustworthiness, Precision and Reliability
7.0 Chapter Overview
7.1 Trustworthiness in Measurement
7.2 Measurement Error—Precision
7.3 Summaries of Measurement Error
7.3.1 Internal Consistency Coefficients
7.3.2 Test-Retest Coefficients
7.3.3 Alternate Forms Coefficients
7.3.4 Other Reliability Coefficients
7.4 Inter-rater Consistency
7.5 Resources
7.6 Exercises and activities
Appendix 7A Results from the PF-10 Analysis
Chapter 8: Trustworthiness, Validity and Fairness
8.0 Chapter Overview
8.1 Trustworthiness, continued
8.2 Evidence Based on Instrument Content
8.2.1 Instrument Content Evidence for Example 2, The Researcher Identity Scale-G
8.3 Evidence Based on Response Processes
8.3.1 Response Process Evidence Related to Example 8—the DRDP
8.4 Evidence Based on Internal Structure
8.4.1 Evidence of Internal Structure at the Instrument Level: Dimensionality
8.4.2 Dimensionality Evidence for Example 2, The Researcher Identity Scale-G
8.4.3 Evidence of Internal Structure at the Instrument-level: The Wright Map
8.4.4 Wright Map Evidence for Example 2, The Researcher Identity Scale-G
8.4.5 Evidence of Internal Structure at the Item-level
8.4.6 Item-level Evidence of Internal Structure for the PF-10 instrument
8.5 Evidence Based on Relations to Other Variables
8.5.1 “Other Variables” Evidence for Two Examples
8.6 Evidence Based on Consequences of Using an Instrument
8.7 Evidence Based on Fairness
8.7.1 Differential Item Functioning (DIF)
8.7.2 DIF Evidence for the RIS-G
8.8 Crafting a Full Validity Argument
8.9 Resources
8.10 Exercises and activities
Appendix 8A: Tables of Results for the RIS-G
Part IV. A Beginning Rather than a Conclusion
Chapter 9: Building on the Building Blocks
9.0 Chapter Overview
9.1 Choosing the Calibration Model
9.1.1 Interpretation of Thurstone’s Requirement in Terms of the Construct Map
9.2 Comparing Overall Model Fit
9.3 Beyond the Lone Construct Map—Multidimensionality
9.4 Resources
9.5 Exercises and activities
Textbox 9.1 Showing that Equation 9.5 Holds for the Rasch Calibration Model
Textbox 9.2 Statistical Formulation of the Multidimensional Partial Credit model
Chapter 10: Beyond the Building Blocks
10.0 Chapter Overview
10.1 Beyond the Construct Map: Learning Progressions
10.2 Beyond the Items Design and the Outcome Space—Process Measurement
10.3 Beyond the Calibration Model—Considering a More Complex Scientific Model
10.4 Other Similar Frameworks: Principled Assessment Designs
10.4.1 Example: Evidence-Centered Design
10.4.2 Going “Outside the Triangle”
10.5 A Beginning Rather than a Conclusion
10.5.1 Further Reading About the History of Measurement in the Social Sciences
10.5.2 Further Reading About Alternative Approaches
10.5.3 Further Reading About the Philosophy of Measurement
10.6 Exercises and activities
References
Appendices
Appendix 1 The Examples Archive
Appendix 2 Computerized design, development, delivery, scoring and reporting—BASS
Appendix 3 The BEAR Assessment System (BAS): Papers about its Uses and
Applications