TEST DEVELOPMENT
The process of developing a test occurs in five stages (CCTAR):
A. Test conceptualization
B. Test construction
C. Test tryout
D. Item analysis
E. Revision
A. TEST CONCEPTUALIZATION
The beginning of any published test
Defining the SCOPE, PURPOSE, and LIMITS of a test
An emerging social phenomenon or pattern of behavior may serve as the stimulus for developing a new test
SOME PRELIMINARY QUESTIONS
1. What is the test designed to measure? (construct of interest)
2. What is the objective of the test? (Goal and use of test)
3. Is there a need for this test? (advantages)
4. Who will use this test? (test users and their purpose of use)
5. Who will take this test? (specific details of testtakers)
6. What content will the test cover? (scope)
7. How will the test be administered? (individually or in groups; paper-and-pencil or by computer)
8. What is the ideal format of the test? (true-false, multiple choice, etc.)
9. Should more than one form of the test be developed?
10. What special training will be required of test users for administering or interpreting the test? (background and qualifications of test users)
11. What types of responses will be required of testtakers?
12. Who benefits from the administration of this test?
13. Is there any potential for harm as the result of an administration of this test? (ethics)
14. How will meaning be attributed to scores on this test? (score meaning and scoring procedures)
Another question to consider:
“Should the test be NORM-REFERENCED or CRITERION-REFERENCED?”
PILOT WORK
Pilot work, pilot study, and pilot research refer to the preliminary research surrounding the
creation of a prototype of the test.
Test items may be pilot studied to evaluate whether they should be included in the final
form of the instrument.
The test developer typically attempts to determine how best to measure a targeted construct
Once pilot work has been completed, the process of test construction begins
B. TEST CONSTRUCTION
Pilot work is a necessity when constructing tests or other measuring instruments for publication
and wide distribution
Scaling is the process of:
designing and calibrating a measuring device
assigning numbers (scale values) to different amounts of the trait, attribute, or
characteristic being measured
TYPES OF SCALES
1. Age−based scale – used when the test taker's performance as a function of age is of critical interest
2. Grade−based scale – used when the test taker's performance as a function of grade is of critical interest
3. Stanine scale – all raw scores on the test are transformed into scores that can range
from 1 to 9
4. Unidimensional scale - only one dimension is presumed to underlie the ratings
5. Multidimensional scale - more than one dimension is thought to guide the test taker’s
responses
6. Paired comparisons – test takers are presented with pairs of stimuli (e.g., two photos, two
objects, or two statements) from which they must select one according to some
rule (e.g., "Which of the two statements do you agree with more?")
7. Comparative scaling – a method of sorting that entails judging a stimulus in comparison
with every other stimulus on the scale. The test taker compares an item (person/object/statement)
with every other item on the list and sorts them into a rank order
- E.g., Arrange the following statements from most important to least important.
8. Categorical scaling – stimuli are sorted into one of two or more alternative categories that differ quantitatively with
respect to some continuum
- For example, test takers may be asked to sort cards into three piles: those behaviors that
are never justified, those that are sometimes justified, and those that are always justified.
9. Guttman scales – items range sequentially from weaker to stronger expressions of the attitude, belief, or
feeling being measured. The scales are developed through the administration of a number
of items to a target group.
- Scalogram analysis – an item-analysis procedure and approach to test development that
involves a graphic mapping of a test taker’s responses
10. Rating scale – grouping of statements or symbols on which judgments of the strength of a
particular trait or emotion are indicated by the test taker.
11. Summative scale – the final test score is obtained by summing the ratings across all the items
12. Likert scale – one type of summative rating scale, usually used to scale attitudes; it typically has five
alternative responses (sometimes seven) and yields ordinal-level data. (A scoring sketch follows this list.)
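To make the summative (Likert-style) scoring in items 11 and 12 concrete, here is a minimal Python sketch. The item names, ratings, and the set of reverse-keyed items are hypothetical; the point is only that ratings are summed after reverse-keyed items are flipped.

```python
# Minimal sketch of summative (Likert) scoring with hypothetical data.
# Ratings run 1-5; reverse-keyed items are flipped so that a higher score
# always indicates more of the attitude being measured.

responses = {"item1": 4, "item2": 2, "item3": 5, "item4": 1}  # one testtaker's ratings
reverse_keyed = {"item2", "item4"}                            # hypothetical reverse-worded items
SCALE_MIN, SCALE_MAX = 1, 5

def keyed_rating(item, rating):
    """Flip the rating for reverse-keyed items (e.g., 2 becomes 4 on a 1-5 scale)."""
    return SCALE_MIN + SCALE_MAX - rating if item in reverse_keyed else rating

summative_score = sum(keyed_rating(item, r) for item, r in responses.items())
print(summative_score)  # 4 + 4 + 5 + 5 = 18
```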
WRITING ITEMS
What range of content should the items cover?
Which of the many different types of item formats should be employed?
How many items should be written in total and for each content area covered?
Item Pool - refers to the reservoir or well from which items will or will not be drawn for the final version of
the test.
In writing an item pool, it is advisable that the first draft contain approximately twice the
number of items that the final version of the test will contain
If an item is poorly written, the test developer should either rewrite it (sampling the same content) or create a new item
HOW TO DEVELOP AN ITEM POOL?
Write a large number of items from personal experience
Ask for help from others, including experts
Conduct interviews (to get insights that could assist in item writing)
Search through the academic research literature and other databases
Item format
Variables such as the form, plan, structure, arrangement and layout of test items.
TYPES OF ITEM FORMAT
Selected−response format
o Requires testtakers to select a response from a set of alternative responses
3 types: multiple choice, matching, binary choice
Constructed−response format
o Requires testtakers to supply or to create the correct answer, not merely to select it
3 types: completion item, short answer, essay
3 TYPES OF SELECTED−RESPONSE FORMAT
MULTIPLE CHOICE FORMAT
Has three elements
1. A stem
2. A correct alternative or option
3. Distractors or foils
BINARY CHOICE ITEM – only 2 possible responses
Varieties of binary-choice format:
agree or disagree
yes or no
right or wrong
fact or opinion
true or false
ADVANTAGE: typically easier to write than multiple−choice items because they do not
contain distractors and can be written relatively quickly
DISADVANTAGE: there is a 50% probability of obtaining a correct answer through chance
(guessing)
MATCHING ITEM
The test taker is presented with two columns: PREMISES on the left and RESPONSES
on the right.
3 TYPES OF CONSTRUCTED−RESPONSE ITEM
COMPLETION ITEM
o Requires the examinee to provide a word or phrase that completes a sentence
o Example: The standard deviation is generally the most useful measure of ____.
o A good completion item should be worded so that the correct answer is specific
SHORT−ANSWER ITEM
o Another form of completion item but much shorter and more specific
o Example: What term refers to the incorrect response options in a multiple−choice item? (Answer: distractors)
ESSAY ITEM - requires the test taker to respond to a question by writing a composition, typically
one that demonstrates recall of facts, understanding, analysis, and/or interpretation
o The answer is usually in sentence or paragraph form and shows depth of knowledge
o DISADVANTAGES: it covers a more limited content area than a series of selected-response
or completion items could cover in the same amount of time, AND scoring is subjective,
with inter-scorer differences.
WRITING ITEMS FOR COMPUTER ADMINISTRATION
Item bank - a relatively large and easily accessible collection of test questions
Item Branching - the ability of the computer to tailor the content and order of presentation of test items on
the basis of responses to previous items (a minimal sketch of this branching logic follows this section)
− Computerized Adaptive Testing (CAT)
o An interactive, computer−administered test−taking process wherein items presented to
the testtaker are based in part on the testtaker’s performance on previous items
o The computer may not permit the testtaker to continue with the test until the practice
items have been responded to in a satisfactory manner and the test taker has
demonstrated an understanding of the test procedure
− CAT tends to reduce the FLOOR EFFECTS and CEILING EFFECTS
Floor effects (TOO HARD)
o Refers to the diminished utility of an assessment tool for distinguishing test takers at
the low end of the ability, trait, or other attribute being measured
Ceiling effects (TOO EASY)
o Refers to the diminished utility of an assessment tool for distinguishing test takers at
the high end of the ability, trait, or other attribute being measured.
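Below is a minimal sketch of the item-branching logic described above, using a hypothetical item bank ordered by difficulty: a correct response branches to a harder item, an incorrect response to an easier one. This illustrates only the branching idea, not a full IRT-based CAT.

```python
# Minimal item-branching sketch (hypothetical item bank ordered from easiest to hardest).
item_bank = [("i1", 0.2), ("i2", 0.4), ("i3", 0.5), ("i4", 0.7), ("i5", 0.9)]

def next_index(current, was_correct):
    """Move to a harder item after a correct response, an easier one after an error."""
    step = 1 if was_correct else -1
    return max(0, min(len(item_bank) - 1, current + step))

def administer(answers_correctly, start=2, n_items=4):
    """Present n_items adaptively; answers_correctly(difficulty) simulates the testtaker."""
    index, record = start, []
    for _ in range(n_items):
        item_id, difficulty = item_bank[index]
        correct = answers_correctly(difficulty)
        record.append((item_id, correct))
        index = next_index(index, correct)
    return record

# Hypothetical testtaker who answers correctly whenever item difficulty is below 0.6.
print(administer(lambda difficulty: difficulty < 0.6))
```

A real CAT would also avoid re-presenting items and would estimate ability with an IRT model; this sketch keeps only the branching rule.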
SCORING ITEMS
Models of Test Scoring:
1. Cumulative Scoring - the higher the score on the test, the higher the testtaker is
on the ability, trait, or other characteristic that the test purports to measure
2. Class scoring (Category scoring) - test taker responses earn credit toward
placement in a particular class or category with other test takers whose pattern
of responses is presumably similar in some way
E.g. MBTI
3. Ipsative Scoring - Comparing a test taker’s score on one scale within a test to another
scale within that same test
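To contrast cumulative and ipsative scoring, here is a small sketch with hypothetical items keyed to two scales (A and B): the cumulative score sums everything, while the ipsative comparison looks at one scale relative to another within the same testtaker.

```python
# Cumulative vs. ipsative scoring sketch (hypothetical items and scale keys).
item_scores = {"q1": 3, "q2": 5, "q3": 2, "q4": 4}           # one testtaker's item scores
scale_key   = {"q1": "A", "q2": "A", "q3": "B", "q4": "B"}   # which scale each item belongs to

# Cumulative scoring: the higher the total, the more of the measured attribute.
cumulative_score = sum(item_scores.values())

# Ipsative scoring: compare one scale with another *within the same testtaker*.
scale_totals = {}
for item, score in item_scores.items():
    scale = scale_key[item]
    scale_totals[scale] = scale_totals.get(scale, 0) + score
ipsative_contrast = scale_totals["A"] - scale_totals["B"]

print(cumulative_score, scale_totals, ipsative_contrast)  # 14 {'A': 8, 'B': 6} 2
```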
C. TEST TRYOUT
a. The part of test development in which the test developer will try out the test
b. In test tryout:
i. The test should be tried out on people who are similar in critical
aspects to the people for whom the test was designed
ii. There should be no fewer than 5 subjects, and preferably as many as 10,
for each item on the test (see the quick calculation after this list)
iii. The more subjects employed, the weaker the role of chance in
subsequent data analysis
iv. The more the merrier
v. It should be executed under conditions as identical as possible to the
conditions under which the standardized test will be administered
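The 5-to-10-subjects-per-item rule of thumb in point (ii) translates directly into a target sample size; a quick illustration with a hypothetical item count:

```python
# Rule-of-thumb tryout sample size: 5-10 subjects per test item (hypothetical item count).
n_items = 60
minimum_n, preferred_n = 5 * n_items, 10 * n_items
print(f"Try out the {n_items}-item draft on at least {minimum_n}, ideally {preferred_n}, people.")
```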
D. ITEM ANALYSIS
WHAT IS A GOOD ITEM?
a. Must be reliable and valid
b. A good item helps discriminate between test takers: the high scorers and the low scorers
ITEM ANALYSIS
c. A statistical technique which is used for selecting and rejecting the items of the
test on the basis of
i. An index of the item's difficulty
ii. An index of the item's reliability
iii. An index of the item's validity
iv. An index of item discrimination
ITEM−DIFFICULTY INDEX
- An index of an item’s difficulty is obtained by calculating the proportion of the total number
of testtakers who answered the item correctly.
- Can be referred to as an item−endorsement index in other contexts, such as personality testing
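A minimal sketch of the item-difficulty index p for a single item, using hypothetical 0/1 item scores (1 = correct): p is simply the proportion of testtakers who got the item right.

```python
# Item-difficulty index: proportion of testtakers answering the item correctly.
item_responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical scored responses (1 = correct)

p = sum(item_responses) / len(item_responses)
print(p)  # 0.7 -> 70% answered correctly; a higher p means an easier item
```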
ITEM−RELIABILITY INDEX
- The item−reliability index provides an indication of the internal consistency of a test; the
higher this index, the greater the test’s internal consistency.
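As the notes present it, the index reflects an item's contribution to internal consistency; in common textbook treatments it is computed as the item-score standard deviation multiplied by the item-total correlation. A sketch with hypothetical data, under that assumption:

```python
# Item-reliability index sketch: item standard deviation x item-total correlation
# (hypothetical data; this formula is the common textbook definition, assumed here).
from statistics import mean, pstdev

item_scores  = [1, 0, 1, 1, 0, 1, 0, 1]            # one item, scored 0/1 per testtaker
total_scores = [42, 30, 38, 45, 28, 40, 33, 44]    # total test scores, same testtakers

def pearson_r(x, y):
    """Pearson correlation using population statistics."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

item_reliability_index = pstdev(item_scores) * pearson_r(item_scores, total_scores)
print(round(item_reliability_index, 3))
```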
ITEM−DISCRIMINATION INDEX
The item−discrimination index is a measure of the difference between the proportion of high
scorers answering an item correctly and the proportion of low scorers answering the item
correctly.
the higher the value of d, the greater the number of high scorers (relative to low scorers) answering the item correctly; a negative d means more low scorers than high scorers answered it correctly
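A minimal sketch of d using hypothetical upper and lower scoring groups (often the top and bottom 27% on total score): d is the difference between the two groups' proportions of correct answers on the item.

```python
# Item-discrimination index d = (proportion correct in upper group) - (proportion correct in lower group).
upper_group = [1, 1, 1, 0, 1, 1, 1, 1, 1, 0]  # item scores for the highest total scorers (hypothetical)
lower_group = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]  # item scores for the lowest total scorers (hypothetical)

d = sum(upper_group) / len(upper_group) - sum(lower_group) / len(lower_group)
print(d)  # 0.8 - 0.3 = 0.5; a negative d would flag an item that low scorers pass more often
```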
ITEM−CHARACTERISTIC CURVES
A graphic representation of item difficulty and discrimination
The steeper the slope, the greater the item discrimination. An item may also vary in terms
of its difficulty level
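The notes do not name a specific model, so purely as an illustration, here is an item-characteristic curve drawn from a two-parameter logistic function, where a controls the slope (discrimination) and b the difficulty:

```python
# Illustrative item-characteristic curve using a two-parameter logistic (2PL) function;
# the choice of model is an assumption made only for this sketch.
import math

def icc(theta, a, b):
    """Probability of a correct response given ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):
    # A steep item (a = 2.0) discriminates more sharply around b than a flat item (a = 0.5).
    print(theta, round(icc(theta, 2.0, 0.0), 2), round(icc(theta, 0.5, 0.0), 2))
```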
OTHER CONSIDERATIONS IN ITEM ANALYSIS
1) GUESSING
2) ITEM FAIRNESS
a. A biased test item is an item that favors one particular group of examinees in relation
to another when differences in group ability are controlled
b. item−characteristic curves can be used to identify biased items
c. Choice of item-analysis method may affect determinations of item bias
3) SPEED TEST - Item analyses of tests taken under speed conditions yield misleading or
uninterpretable results. The closer an item is to the end of the test, the more difficult it may
appear to be. This is because testtakers simply may not get to items near the end of the test
before time runs out
QUALITATIVE ITEM ANALYSIS - A general term for various nonstatistical procedures designed to
explore how individual test items work
- Techniques of data collection and analysis that rely primarily on verbal procedures (interviews,
observations, open−ended questionnaires) rather than on mathematical or statistical
procedures
A. “THINK ALOUD” TEST ADMINISTRATION - Qualitative research tool designed to shed light on
the testtaker’s thought processes during the administration of a test. Useful in assessing
why and how testtakers are misinterpreting a particular item
B. EXPERT PANELS - May also provide qualitative analyses of test items
- A sensitivity review is a study of test items, typically conducted during the test
development process, in which items are examined for fairness to all prospective
testtakers and for the presence of offensive language, stereotypes, or situations
E. TEST REVISION AS A STAGE IN NEW TEST DEVELOPMENT
Some ways of approaching test revision:
- Characterize each item according to its strengths and weaknesses.
Items with numerous weaknesses, or items that are too hard or too easy, must be
eliminated or revised.
- Balance strengths and weaknesses across items.
If the test is too easy, add difficult items or revise some items.
TEST REVISION IN THE LIFE CYCLE OF AN EXISTING TEST
When to revise an existing test?
1. The stimulus materials look dated and current test takers cannot relate to them
2. The verbal content of the test, including the administration instructions and the test
items, contains dated vocabulary that is not readily understood by current test takers
3. As popular culture changes and words take new meanings, certain words or expressions
in the test items or directions may be perceived as inappropriate or even offensive to a
particular group and must therefore be changed
4. The test norms are no longer adequate as a result of group membership changes in the
population of potential test takers
5. The test norms are no longer adequate as a result of age−related shifts in the abilities
measured over time, and so an age extension of the norms (upward, downward, or in
both directions) is necessary.
6. The reliability or the validity of the test, as well as the effectiveness of the individual test
items, can be significantly improved by a revision
7. The theory on which the test was originally based has been improved significantly, and
these changes should be reflected in the design and the content of the test
CROSS−VALIDATION AND CO−VALIDATION
A. Cross-validation – refers to the revalidation of a test on a sample of testtakers other than
those on whom test performance was originally found to be a valid predictor of some
criterion.
Validity shrinkage - the decrease in item validities that inevitably occurs after cross-validation of findings
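Here is a small sketch of cross-validation with hypothetical data: the test-criterion correlation (validity coefficient) is computed on the original sample, then recomputed on a new sample of testtakers; the drop between the two illustrates validity shrinkage.

```python
# Cross-validation sketch with hypothetical data: revalidate the test-criterion
# correlation on a fresh sample and compare it with the original coefficient.
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Original (derivation) sample
test_orig, criterion_orig = [55, 60, 48, 70, 65, 52], [3.1, 3.4, 2.8, 3.9, 3.6, 3.0]
# New (cross-validation) sample
test_new,  criterion_new  = [58, 62, 50, 68, 63, 54], [3.0, 3.2, 3.1, 3.5, 3.3, 2.9]

print(round(pearson_r(test_orig, criterion_orig), 3))  # original validity coefficient
print(round(pearson_r(test_new,  criterion_new),  3))  # cross-validated coefficient (lower here)
```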
B. Co-validation - defined as a test validation process conducted on two or more tests using the
same sample of test takers
Co-norming – the process of conducting co-validation in conjunction with the creation of norms or the
revision of existing norms.