
6507 2

The document discusses the differences between aptitude tests and achievement tests, highlighting that achievement tests assess knowledge of previously learned material while aptitude tests measure potential for future learning. It emphasizes the importance of classroom assessments in promoting learning through corrective instruction and effective feedback, as well as the need for teachers to adapt their assessment strategies. Additionally, it contrasts norm-referenced and criterion-referenced grading, noting their different purposes and implications for student performance evaluation.

Uploaded by

hasnain1079363

Course: Educational Measurement and Evaluation (6507)

Semester: Spring, 2020

Assignment No. 2
Q.1 What are the uses of aptitude tests? How are aptitude tests different from achievement tests? Explain
in detail.
Achievement Tests
Achievement tests are designed to assess a test taker’s knowledge in certain academic areas. If you think about
the word achievement, that is precisely what these kinds of tests measure. An achievement test will measure
your achievement or mastery of content, skill or general academic knowledge.
Achievement tests can be both standardized and formal, but they can also be summative, non-standardized
assessments given in class. Either way, these types of assessments will measure your achievement or mastery
of the content. This type of assessment focuses on your previous learning and knowledge.
Non-Standardized Achievement Tests
Examples of less formal, non-standardized achievement tests include a cumulative final exam in a psychology
class or an end-of-course assessment in math. Another example would be a comprehensive assessment of
Spanish II at the end of the year. An assessment like this may include a written and speaking portion of the
exam. This type of achievement test will measure whether or not you mastered the content of the course.
These informal assessments measure a student’s achievement in specific academic areas. They may determine
promotion to the next grade level or determine pass or fail of a certain subject area. They may also measure a
student’s current level of ability in the subject area by demonstrating skills through performance measures. An
example may be a performance assessment in a language that you are studying.
Lastly, a non-standardized achievement test may be a specific skill demonstration to determine your ability in
martial arts or athletic skill. For example, athletes hoping to get recruited for a college football team will
perform a series of achievement tests such as sprints, jumps and agility. These assessments will measure
and highlight their athletic ability. When you look at achievement vs. aptitude vs. ability, you get different
results.
Standardized Achievement Tests
Standardized achievement tests differ from informal types of achievement tests because these are standardized
to measure specific things. These types of tests can only be administered by individuals who have been trained
to do so. Also, results for these tests are often compared across the age and grade level of test takers.
A standardized test includes the same format, same types of questions and the same content no matter when or
where the test is administered or who is taking the test. Standardized tests share the common characteristic of
being measurable and quantifiable. Scores from standardized tests are quantifiable and result in a numeric
measure, often a percentile, percentage or grade equivalency.
Examples of standardized achievement tests include the Woodcock-Johnson Tests of Achievement (WJ), the
Peabody Individual Achievement Test (PIAT-R) and the Wechsler Individual Achievement Test (WIAT).
Other, more familiar standardized achievement tests include the ACT and the SAT.
Why Use Achievement Tests?
The specific results obtained from an achievement test are most commonly used for admissions or high school
placement. Colleges and universities can rely on these measures for accuracy mainly because the results are
standardized.
Achievement tests may also be used for scholarship applications or acceptance into honor societies. Because
these tests focus on mastery of previously learned material and content, schools rely on them as an indicator of
past and future academic success.
Aptitude Tests
While the achievement tests measure a test taker’s level of knowledge or mastery of specific content, the
aptitude test measures a test taker’s potential for future learning. In this instance, think of the word aptitude,
which is defined as a person’s natural ability to learn a skill or perform a task. Additionally, this type of
assessment measures a student’s current and potential ability to perform certain tasks.
Aptitude tests measure a test taker’s natural talents or abilities and can serve as a guide for future planning.
These types of assessments may include a series of questions in which a test taker simply makes a value
judgement, to agree or disagree, and the results may show what types of career paths they would be suited for.
These tests may also ask test takers to indicate preferences.
Other types of aptitude tests include personality inventories. These types of assessments will indicate the
personal preferences and interpersonal strengths and weaknesses of the test taker. These tests may also measure
a test taker’s ability to solve complex problems or future abilities to perform certain tasks.
Why Use Aptitude Tests?
For high school students, aptitude tests are widely used to help them determine a path of future study. For
example, an aptitude test may indicate a test taker is an extrovert and enjoys public speaking. For another test
taker, it may indicate a strength in complex reasoning and problem solving.
These results will align with certain areas of study and professional careers. This may be especially helpful for
students who are not sure what they want to study or what type of post-high school career they want to pursue.
Aptitude Tests and IQ Tests
The main difference between aptitude tests and IQ tests is the focus of the tests. IQ tests measure a very broad
range of abilities and the results indicate a person’s general intelligence. You might think of this test as a
shallow assessment of a broad range of items.
The aptitude test, however, measures a much narrower range of abilities. This test uses a specific set of
parameters to go in-depth into certain areas of skill. While this test is very specific, it is important to understand
that it is limited in what it can predict.
Achievement vs. Aptitude
Similarities
The most significant similarity between an achievement test and an aptitude test is that both can be
standardized. Results from both can be used to determine strengths, abilities and parts of intelligence in test
takers.
Differences
The main difference between achievement tests and aptitude tests is the way the tests value previously learned
material. The achievement test specifically assesses a test taker’s mastery of previously learned material.
However, the aptitude test essentially disregards the information previously learned by the test taker. In other
words, achievement tests measure past learning, and aptitude tests measure future potential.
Q.2 How can classroom testing be used to promote learning? Develop a protocol for effective classroom
techniques.
Teachers who develop useful assessments, provide corrective instruction, and give students second
chances to demonstrate success can improve their instruction and help students learn.
Large-scale assessments, like all assessments, are designed for a specific purpose. Those used in most states
today are designed to rank-order schools and students for the purposes of accountability—and some do so fairly
well. But assessments designed for ranking are generally not good instruments for helping teachers improve
their instruction or modify their approach to individual students. First, students take them at the end of the
school year, when most instructional activities are near completion. Second, teachers don't receive the results
until two or three months later, by which time their students have usually moved on to other teachers. And third,
the results that teachers receive usually lack the level of detail needed to target specific improvements (Barton,
2002; Kifer, 2001).

The assessments best suited to guide improvements in student learning are the quizzes, tests, writing
assignments, and other assessments that teachers administer on a regular basis in their classrooms. Teachers
trust the results from these assessments because of their direct relation to classroom instructional goals. Plus,
results are immediate and easy to analyze at the individual student level. To use classroom assessments to make
improvements, however, teachers must change both their view of assessments and their interpretation of results.
Specifically, they need to see their assessments as an integral part of the instruction process and as crucial for
helping students learn.
Despite the importance of assessments in education today, few teachers receive much formal training in
assessment design or analysis. A recent survey showed, for example, that fewer than half the states require
competence in assessment for licensure as a teacher (Stiggins, 1999). Lacking specific training, teachers rely
heavily on the assessments offered by the publisher of their textbooks or instructional materials. When no
suitable assessments are available, teachers construct their own in a haphazard fashion, with questions and essay
prompts similar to the ones that their teachers used. They treat assessments as evaluation devices to administer
when instructional activities are completed and to use primarily for assigning students' grades.
To use assessments to improve instruction and student learning, teachers need to change their approach to
assessments in three important ways.
Make Assessments Useful
For Students
Nearly every student has suffered the experience of spending hours preparing for a major assessment, only to
discover that the material that he or she had studied was different from what the teacher chose to emphasize on
the assessment. This experience teaches students two unfortunate lessons. First, students realize that hard work
and effort don't pay off in school because the time and effort that they spent studying had little or no influence
on the results. And second, they learn that they cannot trust their teachers (Guskey, 2000a). These are hardly the
lessons that responsible teachers want their students to learn.
Nonetheless, this experience is common because many teachers still mistakenly believe that they must keep
their assessments secret. As a result, students come to regard assessments as guessing games, especially from
the middle grades on. They view success as depending on how well they can guess what their teachers will ask
on quizzes, tests, and other assessments. Some teachers even take pride in their ability to out-guess students.
They ask questions about isolated concepts or obscure understandings just to see whether students are reading
carefully. Generally, these teachers don't include such “gotcha” questions maliciously, but rather—often
unconsciously—because such questions were asked of them when they were students.
Classroom assessments that serve as meaningful sources of information don't surprise students. Instead, these
assessments reflect the concepts and skills that the teacher emphasized in class, along with the teacher's clear
criteria for judging students' performance. These concepts, skills, and criteria align with the teacher's
instructional activities and, ideally, with state or district standards. Students see these assessments as fair
measures of important learning goals. Teachers facilitate learning by providing students with important
feedback on their learning progress and by helping them identify learning problems (Bloom, Madaus, &
Hastings, 1981; Stiggins, 2002).
Critics sometimes contend that this approach means “teaching to the test.” But the crucial issue is, What
determines the content and methods of teaching? If the test is the primary determinant of what teachers teach
and how they teach it, then we are indeed “teaching to the test.” But if desired learning goals are the foundation
of students' instructional experiences, then assessments of student learning are simply extensions of those same
goals. Instead of “teaching to the test,” teachers are more accurately “testing what they teach.” If a concept or
skill is important enough to assess, then it should be important enough to teach. And if it is not important
enough to teach, then there's little justification for assessing it.

For Teachers
The best classroom assessments also serve as meaningful sources of information for teachers, helping them
identify what they taught well and what they need to work on. Gathering this vital information does not require
a sophisticated statistical analysis of assessment results. Teachers need only make a simple tally of how many
students missed each assessment item or failed to meet a specific criterion. State assessments sometimes
provide similar item-by-item information, but concerns about item security and the cost of developing new
items each year usually make assessment developers reluctant to offer such detailed information. Once teachers
have made specific tallies, they can pay special attention to the trouble spots—those items or criteria missed by
large numbers of students in the class.
In reviewing these results, the teacher must first consider the quality of the item or criterion. Perhaps the
question is ambiguously worded or the criterion is unclear. Perhaps students misinterpreted the question.
Whatever the case, teachers must determine whether these items adequately address the knowledge,
understanding, or skill that they were intended to measure.
If teachers find no obvious problems with the item or criterion, then they must turn their attention to their
teaching. When as many as half the students in a class answer a clear question incorrectly or fail to meet a
particular criterion, it's not a student learning problem—it's a teaching problem. Whatever teaching strategy was
used, whatever examples were employed, or whatever explanation was offered, it simply didn't work.
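The item tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student names, item numbers, and function names are hypothetical, and the 0.5 threshold simply encodes the "as many as half the students" rule of thumb from the text rather than any fixed standard.

```python
from collections import Counter

# Hypothetical data: for each student, the set of item numbers answered incorrectly.
missed_by_student = {
    "Student A": {2, 5},
    "Student B": {5},
    "Student C": {1, 5},
    "Student D": {2},
}

def tally_missed_items(missed_by_student):
    """Count how many students missed each assessment item."""
    tally = Counter()
    for missed in missed_by_student.values():
        tally.update(missed)
    return tally

def trouble_spots(tally, class_size, threshold=0.5):
    """Return the items missed by at least `threshold` of the class, in item order.

    The default of 0.5 is an assumption taken from the "half the students" heuristic.
    """
    return sorted(item for item, count in tally.items()
                  if count / class_size >= threshold)

tally = tally_missed_items(missed_by_student)
flagged = trouble_spots(tally, class_size=len(missed_by_student))

print(dict(sorted(tally.items())))  # {1: 1, 2: 2, 5: 3}
print(flagged)                      # [2, 5]
```

Items that end up in the flagged list are the ones to review first for wording problems and, if the wording is sound, for a change in teaching approach.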
Analyzing assessment results in this way means setting aside some powerful ego issues. Many teachers may
initially say, “I taught them. They just didn't learn it!” But on reflection, most recognize that their effectiveness
is not defined on the basis of what they do as teachers but rather on what their students are able to do. Can
effective teaching take place in the absence of learning? Certainly not.
Some argue that such a perspective puts too much responsibility on teachers and not enough on students.
Occasionally, teachers respond, “Don't students have responsibilities in this process? Shouldn't students display
initiative and personal accountability?”
Indeed, teachers and students share responsibility for learning. Even with valiant teaching efforts, we cannot
guarantee that all students will learn everything excellently. Only rarely do teachers find items or assessment
criteria that every student answers correctly. A few students are never willing to put forth the necessary effort,
but these students tend to be the exception, not the rule. If a teacher is reaching fewer than half of the students in
the class, the teacher's method of instruction needs to improve. And teachers need this kind of evidence to help
target their instructional improvement efforts.
Follow Assessments with Corrective Instruction
If assessments provide information for both students and teachers, then they cannot mark the end of learning.
Instead, assessments must be followed by high-quality, corrective instruction designed to remedy whatever
learning errors the assessment identified (see Guskey, 1997). To charge ahead knowing that students have not
learned certain concepts or skills well would be foolish. Teachers must therefore follow their assessments with
instructional alternatives that present those concepts in new ways and engage students in different and more
appropriate learning experiences.
High-quality, corrective instruction is not the same as reteaching, which often consists simply of restating the
original explanations louder and more slowly. Instead, the teacher must use approaches that accommodate
differences in students' learning styles and intelligences (Sternberg, 1994). Although teachers generally try to
incorporate different teaching approaches when they initially plan their lessons, corrective instruction involves
extending and strengthening that work. In addition, those students who have few or no learning errors to correct
should receive enrichment activities to help broaden and expand their learning. Materials designed for gifted
and talented students provide an excellent resource for such activities.

Developing ideas for corrective instruction and enrichment activities can be difficult, especially if teachers
believe that they must do it alone, but structured professional development opportunities can help teachers share
strategies and collaborate on teaching techniques (Guskey, 1998, 2000b). Faculty meetings devoted to
examining classroom assessment results and developing alternative strategies can be highly effective. District-
level personnel and collaborative partnerships with local colleges and universities offer wonderful resources for
ideas and practical advice.
Occasionally, teachers express concern that if they take time to offer corrective instruction, they will sacrifice
curriculum coverage. Because corrective work is initially best done during class and under the teacher's
direction, early instructional units will typically involve an extra class period or two. Teachers who ask students
to complete corrective work independently, outside of class, generally find that those students who most need to
spend time on corrective work are the least likely to do so.
As students become accustomed to this corrective process and realize the personal benefits it offers, however,
the teacher can drastically reduce the amount of class time allocated to such work and accomplish much of it
through homework assignments or in special study sessions before or after school. And by not allowing minor
errors to become major learning problems, teachers better prepare students for subsequent learning tasks,
eventually need less time for corrective work (Whiting, Van Burgh, & Render, 1995), and can proceed at a
more rapid pace in later learning units. By pacing their instructional units more flexibly, most teachers find that
they need not sacrifice curriculum coverage to offer students the benefits of corrective instruction.
Q.3 Explain the purpose and use of norm-referenced and criterion-referenced grading. Criticize the
existing grading procedures used in our context at secondary level.
Due to the recent and unprecedented emphasis on educational accountability, assessment selection has become
an important consideration. There are various types of assessments that can be used to measure student
performance. Criterion-Referenced Tests (CRT) and Norm-Referenced Tests (NRT) are two types of
assessments that measure performance, but relative to different criteria. Additionally, scores are reported in
different formats, interpreted differently and target different content.
Difference Between NRT and CRT
Tests based on norms measure the performance of a group of test takers against the performance of another
group of test takers. This type of assessment result can be used to compare the performance of seventh graders in a
particular school system to the performance of a broader, and perhaps more diverse (nationally or state-wide),
group of seventh graders. Criterion-based tests measure the performance of test takers relative to particular
criteria covered in the curriculum. In other words, CRT test scores can be used to determine if the test taker has
met program objectives.
Pros and Cons
The advantages and disadvantages of norm-referenced tests vs. criterion-referenced tests depend on the purpose
and objective of testing. Norm-referenced tests may measure the acquisition of skills and knowledge from
multiple sources such as notes, texts and syllabi. Criterion-referenced tests measure performance on specific
concepts and are often used in a pre-test / post-test format. These tests can also be used to determine if
curriculum goals have been met. The content of NRT is much broader and more superficial than the content measured
by CRT.
Differing Methods of Test Administration
Norm-referenced tests must be administered in a standardized format, while criterion-referenced tests do not
necessitate a standard administration. Since norm-referenced tests measure the performance of test takers relative to
other test takers, it is essential that testing conditions closely match those of the norm-setting test takers.
Therefore, the test administration is scripted. This is in sharp contrast to criterion referenced testing
administration.

Score Reporting and Interpretation


Scores are reported differently for criterion referenced and norm referenced tests. Criterion referenced test
results are reported in categories or ranges. For instance, performance may be reported as not proficient,
proficient or very proficient. The interpretation of this performance is obvious and directly related to the
acquisition of stated curriculum objectives. The reporting of results for a norm referenced test is accomplished
by a percentile rank. A test taker who scores in the 95th percentile has performed better than 95% of the
individuals taking the test. In general, scoring at the 50th percentile is average and indicates that the test taker
has scored better than 50% of the individuals testing.
Criterion-referenced tests compare a person’s knowledge or skills against a predetermined standard, learning
goal, performance level, or other criterion. With criterion-referenced tests, each person’s performance is
compared directly to the standard, without considering how other students perform on the test. Criterion-
referenced tests often use “cut scores” to place students into categories such as “basic,” “proficient,” and
“advanced.” If you’ve ever been to a carnival or amusement park, think about the signs that read “You must be
this tall to ride this ride!” with an arrow pointing to a specific line on a height chart. The line indicated by the
arrow functions as the criterion; the ride operator compares each person’s height against it before allowing them
to get on the ride.
Note that it doesn’t matter how many other people are in line or how tall or short they are; whether or not you’re
allowed to get on the ride is determined solely by your height. Even if you’re the tallest person in line, if the top
of your head doesn’t reach the line on the height chart, you can’t ride.
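The cut-score idea and the height-chart analogy can be sketched as follows. This is a minimal illustration, not an actual scoring scheme: the category names come from the text, while the numeric cut scores, the 120 cm height requirement, and the function names are assumptions chosen only for the example.

```python
# Hypothetical cut scores: the minimum score needed for each category, lowest first.
CUT_SCORES = [(0, "basic"), (60, "proficient"), (85, "advanced")]

def categorize(score, cut_scores=CUT_SCORES):
    """Return the highest category whose cut score the student meets.

    Only the student's own score is used; no other student's result is consulted.
    """
    label = cut_scores[0][1]
    for minimum, name in cut_scores:
        if score >= minimum:
            label = name
    return label

def may_ride(height_cm, required_cm=120):
    """Height-chart analogy: the decision depends only on the rider's own height."""
    return height_cm >= required_cm

print(categorize(59))   # basic
print(categorize(60))   # proficient
print(categorize(92))   # advanced
print(may_ride(118))    # False, no matter how tall or short the others in line are
```

Neither function takes a list of other test takers or riders as input, which is the defining feature of a criterion-referenced decision.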
Criterion-referenced assessments work similarly: An individual’s score, and how that score is categorized, is not
affected by the performance of other students. In the charts below, you can see the student’s score and
performance category (“below proficient”) do not change, regardless of whether they are a top-performing
student, in the middle, or a low-performing student. This means knowing a student’s score for a criterion-
referenced test will only tell you how that specific student compared in relation to the criterion, but not whether
they performed below-average, above-average, or average when compared to their peers.
Norm-referenced measures compare a person’s knowledge or skills to the knowledge or skills of the norm group. The
composition of the norm group depends on the assessment. For student assessments, the norm group is often a
nationally representative sample of several thousand students in the same grade (and sometimes, at the same
point in the school year). Norm groups may also be further narrowed by age, English Language Learner (ELL)
status, socioeconomic level, race/ethnicity, or many other characteristics. One norm-referenced measure that
many families are familiar with is the baby weight growth charts in the pediatrician’s office, which show which
percentile a child’s weight falls in. A child in the 50th percentile has an average weight; a child in the 75th
percentile weighs more than 75% of the babies in the norm group and the same as or less than the heaviest 25%
of babies in the norm group; and a child in the 25th percentile weighs more than 25% of the babies in the norm
group and the same as or less than 75% of them. It’s important to note that these norm-referenced measures do
not say whether a baby’s birth weight is “healthy” or “unhealthy,” only how it compares with the norm group.
For example, a baby who weighed 2,600 grams at birth would be in the 7th percentile, weighing the same as or
less than 93% of the babies in the norm group. However, despite the very low percentile, 2,600 grams is
classified as a normal or healthy weight for babies born in the United States—a birth weight of 2,500 grams is
the cut-off, or criterion, for a child to be considered low weight or at risk. (For the curious, 2,600 grams is about
5 pounds and 12 ounces.) Thus, knowing a baby’s percentile rank for weight can tell you how they compare
with their peers, but not if the baby’s weight is “healthy” or “unhealthy.”
Norm-referenced assessments work similarly: An individual student’s percentile rank describes their
performance in comparison to the performance of students in the norm group, but does not indicate whether or
not they met or exceeded a specific standard or criterion.

In the charts below, you can see that, while the student’s score doesn’t change, their percentile rank does change
depending on how well the students in the norm group performed. When the individual is a top-performing
student, they have a high percentile rank; when they are a low-performing student, they have a low percentile
rank. What we can’t tell from these charts is whether or not the student should be categorized as proficient or
below proficient.
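The contrast between the two kinds of reporting can be made concrete with a small sketch. Everything numeric here is hypothetical: the student's score, the three norm groups, and the cut score of 70 are invented for illustration. Percentile rank is computed as the percentage of the norm group scoring strictly below the student, which matches the "weighs more than 75% of the babies" reading used earlier; other definitions exist.

```python
def percentile_rank(score, norm_group):
    """Percentage of the norm group that scored strictly below `score`."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

def category(score, cut_score=70):
    """Criterion-referenced label; the cut score of 70 is an assumed value."""
    return "proficient" if score >= cut_score else "below proficient"

student_score = 65

# Hypothetical norm groups of ten students each.
norm_groups = {
    "low-performing group": [40, 45, 50, 55, 60, 62, 63, 64, 70, 80],
    "mixed group": [40, 50, 55, 60, 64, 66, 70, 75, 80, 90],
    "high-performing group": [60, 66, 70, 72, 75, 78, 80, 85, 90, 95],
}

for name, group in norm_groups.items():
    print(name, percentile_rank(student_score, group), category(student_score))
# low-performing group 80.0 below proficient
# mixed group 50.0 below proficient
# high-performing group 10.0 below proficient
```

The same raw score of 65 yields a percentile rank of 80, 50, or 10 depending on the norm group, while the criterion-referenced category never changes.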

Q.4 Highlight the pros and cons of mainstreaming at secondary level in our local context.
Many students with special needs are placed into a self-contained classroom or multi-classroom program in
which they learn alongside peers who have disabilities as well. This is sometimes referred to by the ratio of
students to teaching staff, such as a 12:1:1 classroom environment: 12 students, 1 teaching assistant, 1
teacher. Placing students with special needs into the regular education classroom is known as mainstreaming.
Pros of Mainstreaming
Social Advantages: Students receive their education alongside non-disabled peers of the same age. In doing so,
they interact with their peers in ways that the special education classroom would not allow. Many students with
special needs have an identified need to improve their social skills, and placing them in classes with a diverse
group of students can help build those skills. It also helps self-esteem, because the students know that they are
in “regular” education classes with their peers. No matter how hard we work to break down walls and build
acceptance, the social stigma of being different still exists. Blending students of differing abilities into one
classroom helps not only the students with special needs but also the regular education students, by teaching
them how to work with others who are different from them. It teaches all students compassion, acceptance,
collaboration and patience, lifelong skills that will better prepare them for the future.
Academic Advantages: Another advantage of mainstreaming is that the students receive the same curricular
material as their non-disabled peers. Although they may receive accommodations and modifications to the
curriculum, they are still learning what everyone else is learning. It gives these students a chance to learn
something that they may not have had a chance to learn in a special education classroom.
Tolerance: If classrooms aren’t mainstreamed, then a great majority of the student population will not be exposed
to students with special needs. This means that they will never get to learn or promote the kind of tolerance
that they will carry with them through adulthood. Mainstreaming special needs students with the rest of the
population exposes all students to all types of people, whether they have disorders or not. As the other students
learn tolerance, the students with special needs will learn which behaviors are acceptable and which ones aren’t.
Cons of Mainstreaming
While there can be many benefits, there can also be downsides to running a mainstream classroom.
Social Disadvantages: Some students with special needs have behavioral issues that will need to be addressed in
the classroom. These issues are not only disruptive to the rest of the class, but can also be embarrassing to the
student, causing more damage to their self-esteem and social world than would happen if the student were not
mainstreamed.
Academic Disadvantages: While the students with special needs are able to use the same curricula as students
without special needs, they may not be able to keep up with the work. This can result in them feeling like the
odd one out. The extra effort that teachers have to put into ensuring everyone understands the work may also
take time away from the rest of the class. This can impact the pace of the classroom as a whole. While some
mainstreamed students with special needs will have pull-outs into a resource room or some other means of
individualized tutoring, any slowdown in the classroom pace that can impact reaching specific goals is a concern.
Tolerance: Tolerance is a wonderful thing to learn, but it can also backfire. Students who do not have special
needs may be under the impression that the student with special needs “gets away” with more than the rest of the
class because of his or her disability. This can lead to resentment and it can also lead to the other students
acting out.
Weighing the Pros and Cons
You’ve looked at the pros and cons. Mainstreaming offers enough of both for those involved to be able to form
a clear and informed opinion on what is the right path for a particular student. As stated before, more and more
students with special needs are being placed into regular education classes because of a general belief that it is
the best placement for them, based on their needs. As with anything, this placement comes with a lot of work
for the students, parents, and teachers involved in the process. The IEP team needs to make the decision based
on what is best for the student. The decision needs to be carefully thought out, and if the student is
mainstreamed, the student needs to be carefully monitored and the team needs to make sure that the student has
everything needed to be successful in the mainstream classroom. The pros and cons need to continue to be weighed so that the plan
works to the benefit of the student and does not cause a decrease in achieving the academic goals of either the
individual or of the other students in the class.
Q.5 Which statistics are used to make comparisons under norm reference grading system? Explain with
examples.
Norm-referenced refers to standardized tests that are designed to compare and rank test takers in relation to
one another. Norm-referenced tests report whether test takers performed better or worse than a hypothetical
average student, which is determined by comparing scores against the performance results of a statistically
selected group of test takers, typically of the same age or grade level, who have already taken the exam.
Calculating norm-referenced scores is called the “norming process,” and the comparison group is known as the
“norming group.” Norming groups typically comprise only a small subset of previous test takers, not all or even
most previous test takers. Test developers use a variety of statistical methods to select norming groups, interpret
raw scores, and determine performance levels.
Norm-referenced scores are generally reported as a percentage or percentile ranking. For example, a student
who scores in the seventieth percentile performed as well as or better than seventy percent of other test takers of
the same age or grade level, and thirty percent of students performed better (as determined by norming-group
scores).
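The percentile ranking described above can be sketched in a few lines of Python. This is only an illustration: the function name `percentile_rank` and the norming-group scores are hypothetical, and it assumes the "at or below" definition used in the paragraph above (test publishers may use slightly different formulas).

```python
def percentile_rank(score, norming_scores):
    """Return the percentage of norming-group scores at or below `score`."""
    if not norming_scores:
        raise ValueError("norming group must not be empty")
    at_or_below = sum(1 for s in norming_scores if s <= score)
    return 100.0 * at_or_below / len(norming_scores)


# Hypothetical raw scores of a ten-student norming group (not real data).
norming_group = [42, 48, 51, 55, 58, 60, 63, 67, 72, 80]

# A raw score of 63 is at or above 7 of the 10 norming scores.
print(percentile_rank(63, norming_group))  # 70.0
```

Here a raw score of 63 falls in the seventieth percentile, which means three of the ten norming-group students scored higher.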
Norm-referenced tests often use a multiple-choice format, though some include open-ended, short-answer
questions. They are usually based on some form of national standards, not locally determined standards
or curricula. IQ tests are among the most well-known norm-referenced tests, as are developmental-screening
tests, which are used to identify learning disabilities in young children or determine eligibility for special-
education services. A few major norm-referenced tests include the California Achievement Test, Iowa Test of
Basic Skills, Stanford Achievement Test, and TerraNova.
The following are a few representative examples of how norm-referenced tests and scores may be used:
• To determine a young child’s readiness for preschool or kindergarten. These tests may be designed to
measure oral-language ability, visual-motor skills, and cognitive and social development.
• To evaluate basic reading, writing, and math skills. Test results may be used for a wide variety of
purposes, such as measuring academic progress, making course assignments, determining readiness for
grade promotion, or identifying the need for additional academic support.
• To identify specific learning disabilities, such as autism, dyslexia, or nonverbal learning disability, or to
determine eligibility for special-education services.
• To make program-eligibility or college-admissions decisions (in these cases, norm-referenced scores are
generally evaluated alongside other information about a student). Scores on SAT or ACT exams are a
common example.


Norm-Referenced vs. Criterion-Referenced Tests


Norm-referenced tests are specifically designed to rank test takers on a “bell curve,” or a distribution of scores
that resembles, when graphed, the outline of a bell—i.e., a small percentage of students performing well, most
performing average, and a small percentage performing poorly. To produce a bell curve each time, test
questions are carefully designed to accentuate performance differences among test takers, not to determine if
students have achieved specified learning standards, learned certain material, or acquired specific skills and
knowledge. Tests that measure performance against a fixed set of standards or criteria are called criterion-
referenced tests.
Criterion-referenced test results are often based on the number of correct answers provided by students, and
scores might be expressed as a percentage of the total possible number of correct answers. On a norm-
referenced exam, however, the score would reflect how many more or fewer correct answers a student gave in
comparison to other students. Hypothetically, if all the students who took a norm-referenced test performed
poorly, the least-poor results would rank students in the highest percentile. Similarly, if all students performed
extraordinarily well, the least-strong performance would rank students in the lowest percentile.
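The contrast between the two kinds of score can be sketched as follows. The classes, the 50-item test length, and the function names are all hypothetical; the sketch only illustrates that a criterion-referenced score depends on the number of correct answers, while a norm-referenced rank depends on the rest of the group.

```python
def percent_correct(correct, total_items):
    """Criterion-referenced style score: share of items answered correctly."""
    return 100.0 * correct / total_items


def percentile_rank(score, group_scores):
    """Norm-referenced style score: share of the group at or below `score`."""
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100.0 * at_or_below / len(group_scores)


TOTAL_ITEMS = 50  # assumed test length

# Hypothetical numbers of correct answers in two classes.
weak_class = [8, 10, 12, 15, 18]
strong_class = [45, 46, 47, 48, 49]

# Best student in a weak class: low percent correct, top percentile.
print(percent_correct(18, TOTAL_ITEMS), percentile_rank(18, weak_class))
# 36.0 100.0

# Lowest student in a strong class: high percent correct, bottom percentile.
print(percent_correct(45, TOTAL_ITEMS), percentile_rank(45, strong_class))
# 90.0 20.0
```

The student with 18 correct answers ranks highest in the weak class despite answering only 36 percent of the items, while the student with 45 correct answers ranks lowest in the strong class.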
It should be noted that norm-referenced tests cannot measure the learning achievement or progress of an entire
group of students, but only the relative performance of individuals within a group. For this reason, criterion-
referenced tests are used to measure whole-group performance.
Reform
Norm-referenced tests have historically been used to make distinctions among students, often for the purposes
of course placement, program eligibility, or school admissions. Yet because norm-referenced tests are designed
to rank student performance on a relative scale—i.e., in relation to the performance of other students—norm-
referenced testing has been abandoned by many schools and states in favor of criterion-referenced tests, which
measure student performance in relation to a common set of fixed criteria or standards.
It should be noted that norm-referenced tests are typically not the form of standardized test widely used to
comply with state or federal policies—such as the No Child Left Behind Act—that are intended to measure
school performance, close “achievement gaps,” or hold schools accountable for improving student learning
results. In most cases, criterion-referenced tests are used for these purposes because the goal is to determine
whether schools are successfully teaching students what they are expected to learn.
Similarly, the assessments being developed to measure student achievement of the Common Core State
Standards are also criterion-referenced exams. However, some test developers promote their norm-referenced
exams—for example, the TerraNova Common Core—as a way for teachers to “benchmark” learning progress
and determine if students are on track to perform well on Common Core–based assessments.
Debate
While norm-referenced tests are not the focus of ongoing national debates about “high-stakes testing,” they are
nonetheless the object of much debate. The essential disagreement is between those who view norm-referenced
tests as objective, valid, and fair measures of student performance, and those who believe that relying on
relative performance results is inaccurate, unhelpful, and unfair, especially when making important educational
decisions for students. While part of the debate centers on whether or not it is ethically appropriate, or even
educationally useful, to evaluate individual student learning in relation to other students (rather than evaluating
individual performance in relation to fixed and known criteria), much of the debate is also focused on whether
there is a general overreliance on standardized-test scores in the United States, and whether a single test, no
matter what its design, should be used—in exclusion of other measures—to evaluate school or student
performance.
It should be noted that perceived performance on a standardized test can potentially be manipulated, regardless
of whether a test is norm-referenced or criterion-referenced. For example, if a large number of students are
performing poorly on a test, the performance criteria—i.e., the bar for what is considered “passing” or
“proficient”—could be lowered to “improve” perceived performance, even if students are not learning more or
performing better than past test takers. For example, if a standardized test administered in eleventh grade uses
proficiency standards that are considered to be equivalent to eighth-grade learning expectations, it will appear
that students are performing well, when in fact the test has not measured learning achievement at a level
appropriate to their age or grade. For this reason, it is important to investigate the criteria used to determine
“proficiency” on any given test—and particularly when a test is considered “high stakes,” since there is greater
motivation to manipulate perceived test performance when results are tied to sanctions, funding reductions,
public embarrassment, or other negative consequences.
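The effect of moving the proficiency bar can be illustrated with a short sketch. The scores, the two cut scores, and the name `proficiency_rate` are hypothetical; the point is only that the reported rate changes while the underlying scores stay the same.

```python
def proficiency_rate(scores, cut_score):
    """Return the percentage of students scoring at or above the cut score."""
    if not scores:
        raise ValueError("scores must not be empty")
    proficient = sum(1 for s in scores if s >= cut_score)
    return 100.0 * proficient / len(scores)


# Hypothetical scores for eight students (not real data).
scores = [35, 42, 48, 55, 61, 64, 70, 78]

print(proficiency_rate(scores, cut_score=60))  # 50.0
print(proficiency_rate(scores, cut_score=45))  # 75.0
```

Lowering the assumed cut score from 60 to 45 raises the reported proficiency rate from 50 to 75 percent, although no student answered a single additional question correctly.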
The following are representative of the kinds of arguments typically made by proponents of norm-referenced
testing:
• Norm-referenced tests are relatively inexpensive to develop, simple to administer, and easy to score. As
long as the results are used alongside other measures of performance, they can provide valuable
information about student learning.
• The quality of norm-referenced tests is usually high because they are developed by testing experts,
piloted, and revised before they are used with students, and they are dependable and stable for what they
are designed to measure.
• Norm-referenced tests can help differentiate students and identify those who may have specific
educational needs or deficits that require specialized assistance or learning environments.
• The tests are an objective evaluation method that can decrease bias or favoritism when making
educational decisions. If there are limited places in a gifted and talented program, for example, one
transparent way to make the decision is to give every student the same test and allow the highest-scoring
students to gain entry.
The following are representative of the kinds of arguments typically made by critics of norm-referenced testing:
• Although testing experts and test developers warn that major educational decisions should not be made
on the basis of a single test score, norm-referenced scores are often misused in schools when making
critical educational decisions, such as grade promotion or retention, which can have potentially harmful
consequences for some students and student groups.
• Norm-referenced tests encourage teachers to view students in terms of a bell curve, which can lead them
to lower academic expectations for certain groups of students, particularly special-needs students, English-
language learners, or minority groups. And when academic expectations are consistently lowered year
after year, students in these groups may never catch up to their peers, creating a self-fulfilling prophecy.
• Multiple-choice tests, the dominant norm-referenced format, are better suited to measuring
remembered facts than more complex forms of thinking. Consequently, norm-referenced tests promote rote
learning and memorization in schools over more sophisticated cognitive skills, such as writing, critical
reading, analytical thinking, problem solving, or creativity.
• Overreliance on norm-referenced test results can lead to inadvertent discrimination against minority
groups and low-income student populations, both of which tend to face more educational obstacles than
non-minority students from higher-income households. For example, many educators have argued that the
overuse of norm-referenced testing has resulted in a significant overrepresentation of minority students in
special-education programs. On the other hand, using norm-referenced scores to determine placement in
gifted and talented programs, or other “enriched” learning opportunities, leads to the underrepresentation of
minority and lower-income students in these programs. Similarly, students from higher-income households
may have an unfair advantage in the college-admissions process because they can afford expensive test-
preparation services.
• An overreliance on norm-referenced test scores undervalues important achievements, skills, and abilities
in favor of the narrower set of skills measured by the tests.
