CHAPTER 1 Testing, Assessing, and Teaching
written production, open-ended responses, integrated performance (across skill
areas), group performance, and other interactive tasks. To be sure, such assessment
is time-consuming and therefore expensive, but those extra efforts are paying off in
the form of more direct testing because students are assessed as they perform actual
or simulated real-world tasks. In technical terms, higher content validity (see
Chapter 2 for an explanation) is achieved because learners are measured in the
process of performing the targeted linguistic acts.
In an English language-teaching context, performance-based assessment means
that you may have a difficult time distinguishing between formal and informal
assessment. If you rely a little less on formally structured tests and a little more on
evaluation while students are performing various tasks, you will be taking some
steps toward meeting the goals of performance-based testing. (See Chapter 10 for a
further discussion of performance-based assessment.)
A characteristic of many (but not all) performance-based language assessments
is the presence of interactive tasks. In such cases, the assessments involve learners
in actually performing the behavior that we want to measure. In interactive tasks,
test-takers are measured in the act of speaking, requesting, responding, or in com-
bining listening and speaking, and in integrating reading and writing. Paper-and-
pencil tests certainly do not elicit such communicative performance.
A prime example of an interactive language assessment procedure is an oral
interview. The test-taker is required to listen accurately to someone else and to
respond appropriately. If care is taken in the test design process, language elicited
and volunteered by the student can be personalized and meaningful, and tasks can
approach the authenticity of real-life language use (see Chapter 7).
CURRENT ISSUES IN CLASSROOM TESTING
The design of communicative, performance-based assessment rubrics continues to
challenge both assessment experts and classroom teachers. Such efforts to improve
various facets of classroom testing are accompanied by some stimulating issues, all
of which are helping to shape our current understanding of effective assessment.
Let's look at three such issues: the effect of new theories of intelligence on the
testing industry, the advent of what has come to be called "alternative" assessment,
and the increasing popularity of computer-based testing.
New Views on Intelligence
Intelligence was once viewed strictly as the ability to perform (a) linguistic and (b)
logical-mathematical problem solving. This "IQ" (intelligence quotient) concept of
intelligence has permeated the Western world and its way of testing for almost a
century. Since "smartness" in general is measured by timed, discrete-point tests con-
sisting of a hierarchy of separate items, why shouldn't every field of study be so mea-
sured? For many years, we have lived in a world of standardized, norm-referenced
tests that are timed in a multiple-choice format consisting of a multiplicity of logic-
constrained items, many of which are inauthentic.
However, research on intelligence by psychologists like Howard Gardner,
Robert Sternberg, and Daniel Goleman has begun to turn the psychometric world
upside down. Gardner (1983, 1999), for example, extended the traditional view of
intelligence to seven different components.¹ He accepted the traditional conceptu-
alizations of linguistic intelligence and logical-mathematical intelligence on which
standardized IQ tests are based, but he included five other "frames of mind" in his
theory of multiple intelligences:
• spatial intelligence (the ability to find your way around an environment, to
  form mental images of reality)
• musical intelligence (the ability to perceive and create pitch and rhythmic
  patterns)
• bodily-kinesthetic intelligence (fine motor movement, athletic prowess)
• interpersonal intelligence (the ability to understand others and how they
  feel, and to interact effectively with them)
• intrapersonal intelligence (the ability to understand oneself and to develop a
  sense of self-identity)
Robert Sternberg (1988, 1997) also charted new territory in intelligence re-
search in recognizing creative thinking and manipulative strategies as part of intel-
ligence. All "smart" people aren't necessarily adept at fast, reactive thinking. They
may be very innovative in being able to think beyond the normal limits imposed by
existing tests, but they may need a good deal of processing time to enact this cre-
ativity. Other forms of smartness are found in those who know how to manipulate
their environment, namely, other people. Debaters, politicians, successful salesper-
sons, smooth talkers, and con artists are all smart in their manipulative ability to per-
suade others to think their way, vote for them, make a purchase, or do something
they might not otherwise do.
More recently, Daniel Goleman's (1995) concept of "EQ" (emotional quotient)
has spurred us to underscore the importance of the emotions in our cognitive pro-
cessing. Those who manage their emotions—especially emotions that can be detri-
mental—tend to be more capable of fully intelligent processing. Anger, grief,
resentment, self-doubt, and other feelings can easily impair peak performance in
everyday tasks as well as higher-order problem solving.
These new conceptualizations of intelligence have not been universally
accepted by the academic community (see White, 1998, for example). Nevertheless,
their intuitive appeal infused the decade of the 1990s with a sense of both freedom
and responsibility in our testing agenda. Coupled with parallel educational reforms
at the time (Armstrong, 1994), they helped to free us from relying exclusively on
timed, discrete-point, analytical tests in measuring language. We were prodded to
cautiously combat the potential tyranny of "objectivity" and its accompanying imper-
sonal approach. But we also assumed the responsibility for tapping into whole lan-
guage skills, learning processes, and the ability to negotiate meaning. Our challenge
was to test interpersonal, creative, communicative, interactive skills, and in doing so
to place some trust in our subjectivity and intuition.

¹ For a summary of Gardner's theory of intelligence, see Brown (2000, pp. 100-102).
Traditional and “Alternative” Assessment
Implied in some of the earlier description of performance-based classroom assess-
ment is a trend to supplement traditional test designs with alternatives that are more
authentic in their elicitation of meaningful communication. Table 1.1 highlights dif-
ferences between the two approaches (adapted from Armstrong, 1994, and Bailey,
1998, p. 207).
Two caveats need to be stated here. First, the concepts in Table 1.1 represent
some overgeneralizations and should therefore be considered with caution. It is dif-
ficult, in fact, to draw a clear line of distinction between what Armstrong (1994) and
Bailey (1998) have called traditional and alternative assessment. Many forms of
assessment fall in between the two, and some combine the best of both.
Second, it is obvious that the table shows a bias toward alternative assessment,
and one should not be misled into thinking that everything on the left-hand side is
tainted while the list on the right-hand side offers salvation to the field of language
assessment! As Brown and Hudson (1998) aptly pointed out, the assessment tradi-
tions available to us should be valued and utilized for the functions that they pro-
vide. At the same time, we might all be stimulated to look at the right-hand list and
ask ourselves if, among those concepts, there are alternatives to assessment that we
can constructively use in our classrooms.
Table 1.1. Traditional and alternative assessment

Traditional Assessment               Alternative Assessment
One-shot, standardized exams         Continuous long-term assessment
Timed, multiple-choice format        Untimed, free-response format
Decontextualized test items          Contextualized communicative tasks
Scores suffice for feedback          Individualized feedback and washback
Norm-referenced scores               Criterion-referenced scores
Focus on the "right" answer          Open-ended, creative answers
Summative                            Formative
Oriented to product                  Oriented to process
Non-interactive performance          Interactive performance
Fosters extrinsic motivation         Fosters intrinsic motivation

It should be noted here that considerably more time and higher institutional
budgets are required to administer and score assessments that presuppose more
subjective evaluation, more individualization, and more interaction in the process of
offering feedback. The payoff for the latter, however, comes with more useful feed-
back to students, the potential for intrinsic motivation, and ultimately a more
complete description of a student's ability. (See Chapter 10 for a complete treatment
of alternatives in assessment.) More and more educators and advocates for educa-
tional reform are arguing for a de-emphasis on large-scale standardized tests in favor
of building budgets that will offer the kind of contextualized, communicative,
performance-based assessment that will better facilitate learning in our schools. (In
Chapter 4, issues surrounding standardized testing are addressed at length.)
Computer-Based Testing
Recent years have seen a burgeoning of assessment in which the test-taker performs
responses on a computer. Some computer-based tests (also known as "computer-
assisted" or "web-based" tests) are small-scale "home-grown" tests available on web-
sites. Others are standardized, large-scale tests in which thousands or even tens of
thousands of test-takers are involved. Students receive prompts (or probes, as they
are sometimes referred to) in the form of spoken or written stimuli from the com-
puterized test and are required to type (or in some cases, speak) their responses.
Almost all computer-based test items have fixed, closed-ended responses; however,
tests like the Test of English as a Foreign Language (TOEFL®) offer a written essay
section that must be scored by humans (as opposed to automatic, electronic, or
machine scoring). As this book goes to press, the designers of the TOEFL are on the
verge of offering a spoken English section.
A specific type of computer-based test, a computer-adaptive test, has been
available for many years but has recently gained momentum. In a computer-adaptive
test (CAT), each test-taker receives a set of questions that meet the test specifica-
tions and that are generally appropriate for his or her performance level. The CAT
starts with questions of moderate difficulty. As test-takers answer each question, the
computer scores the question and uses that information, as well as the responses to
previous questions, to determine which question will be presented next. As long as
examinees respond correctly, the computer typically selects questions of greater or
equal difficulty. Incorrect answers, however, typically bring questions of lesser or
equal difficulty. The computer is programmed to fulfill the test design as it continu-
ously adjusts to find questions of appropriate difficulty for test-takers at all perfor-
mance levels. In CATs, the test-taker sees only one question at a time, and the
computer scores each question before selecting the next one. As a result, test-takers
cannot skip questions, and once they have entered and confirmed their answers,
they cannot return to questions or to any earlier part of the test.
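The adaptive loop just described can be sketched in a few lines of code. The following is a minimal illustration only, not the algorithm of any actual CAT engine (operational tests such as the computer-based TOEFL rely on item response theory to estimate ability); the item bank, the three-level difficulty scale, and the one-step adjustment rule are all simplifying assumptions made for this sketch.

```python
# A toy computer-adaptive test: start at moderate difficulty, move up one
# level after a correct answer, down one level after an incorrect one, and
# never present the same item twice. Item names and levels are hypothetical.

ITEM_BANK = {
    1: ["easy-q1", "easy-q2"],        # level 1: easiest items
    2: ["medium-q1", "medium-q2"],    # level 2: moderate items
    3: ["hard-q1", "hard-q2"],        # level 3: hardest items
}

def run_cat(answer_fn, num_items=4):
    """Administer up to num_items questions adaptively.

    answer_fn(question) -> True if the test-taker answers correctly.
    Returns the list of (question, correct) pairs in presentation order.
    """
    level = 2          # the CAT begins with a question of moderate difficulty
    used = set()       # a presented item can never be revisited or skipped
    history = []
    for _ in range(num_items):
        # select the first unused question at the current target level;
        # stop early if that level's items are exhausted
        candidates = [q for q in ITEM_BANK.get(level, []) if q not in used]
        if not candidates:
            break
        question = candidates[0]
        used.add(question)
        correct = answer_fn(question)      # score immediately, before the next item
        history.append((question, correct))
        # adjust difficulty: up after a correct answer, down after an incorrect one
        level = min(level + 1, 3) if correct else max(level - 1, 1)
    return history
```

Running `run_cat` with a responder that always answers correctly walks the test-taker from the moderate level up to the hardest items, while a responder that always answers incorrectly walks down to the easiest ones, mirroring the adjustment behavior described above.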
Computer-based testing, with or without CAT technology, offers these advantages:

• classroom-based testing
• self-directed testing on various aspects of a language (vocabulary, grammar,
  discourse, one or all of the four skills, etc.)
• practice for upcoming high-stakes standardized tests
• some individualization, in the case of CATs
• large-scale standardized tests that can be administered easily to thousands of
  test-takers at many different stations, then scored electronically for rapid
  reporting of results
Of course, some disadvantages are present in our current predilection for com-
puterizing testing. Among them:

• Lack of security and the possibility of cheating are inherent in classroom-
  based, unsupervised computerized tests.
• Occasional "home-grown" quizzes that appear on unofficial websites may be
  mistaken for validated assessments.
• The multiple-choice format preferred for most computer-based tests contains
  the usual potential for flawed item design (see Chapter 3).
• Open-ended responses are less likely to appear because of the need for
  human scorers, with all the attendant issues of cost, reliability, and turn-
  around time.
• The human interactive element (especially in oral production) is absent.
More is said about computer-based testing in subsequent chapters, especially
Chapter 4, in a discussion of large-scale standardized testing. In addition, the fol-
lowing websites provide further information and examples of computer-based tests:

Educational Testing Service                        www.ets.org
Test of English as a Foreign Language              www.toefl.org
Test of English for International Communication    www.toeic.com
International English Language Testing System      www.ielts.org
Dave's ESL Café (computerized quizzes)             www.eslcafe.com
Some argue that computer-based testing, pushed to its ultimate level, might mit-
igate against recent efforts to return testing to its artful form of being tailored by
teachers for their classrooms, of being designed to be performance-based, and of
allowing a teacher-student dialogue to form the basis of assessment. This need not
be the case. Computer technology can be a boon to communicative language
testing. Teachers and test-makers of the future will have access to an ever-increasing
range of tools to safeguard against impersonal, stamped-out formulas for assessment.
By using technological innovations creatively, testers will be able to enhance authen-
ticity, to increase interactive exchange, and to promote autonomy.
As you read this book, I hope you will do so with an appreciation for the place
of testing in assessment, and with a sense of the interconnection of assessment and
teaching. Assessment is an integral part of the teaching-learning cycle. In an inter-
active, communicative curriculum, assessment is almost constant. Tests, which are a
subset of assessment, can provide authenticity, motivation, and feedback to the
learner. Tests are essential components of a successful curriculum and one of sev-
eral partners in the learning process. Keep in mind these basic principles:
1. Periodic assessments, both formal and informal, can increase motivation by
   serving as milestones of student progress.
2. Appropriate assessments aid in the reinforcement and retention of informa-
   tion.
3. Assessments can confirm areas of strength and pinpoint areas needing further
   work.
4. Assessments can provide a sense of periodic closure to modules within a cur-
   riculum.
5. Assessments can promote student autonomy by encouraging students' self-
   evaluation of their progress.
6. Assessments can spur learners to set goals for themselves.
7. Assessments can aid in evaluating teaching effectiveness.
[Note: (I) Individual work; (G) Group or pair work; (C) Whole-class discussion.]
1. (G) In a small group, look at Figure 1.1 on page 5 that shows tests as a subset
   of assessment and the latter as a subset of teaching. Do you agree with this
   diagrammatic depiction of the three terms? Consider the following classroom
   teaching techniques: choral drill, pair pronunciation practice, reading aloud,
   information gap task, singing songs in English, writing a description of the
   weekend's activities. What proportion of each has an assessment facet to it?
   Share your conclusions with the rest of the class.
2. (G) The chart below shows a hypothetical line of distinction between forma-
   tive and summative assessment, and between informal and formal assessment.
   As a group, place the following techniques/procedures into one of the four
   cells and justify your decision. Share your results with other groups and dis-
   cuss any differences of opinion.
Placement tests
Diagnostic tests
Periodic achievement tests
Short pop quizzes
Standardized proficiency tests
Final exams
Portfolios
Journals
Speeches (prepared and rehearsed)
Oral presentations (prepared, but not rehearsed)
Impromptu student responses to teacher's questions
Student-written response (one paragraph) to a reading assignment
Drafting and revising writing
Final essays (after several drafts)
Student oral responses to teacher questions after a videotaped lecture
Whole class open-ended discussion of a topic
Formative Summative
Informal
Formal
3. (I) Review the distinction between norm-referenced and criterion-
   referenced testing. If norm-referenced tests typically yield a distribution of
   scores that resemble a bell-shaped curve, what kinds of distributions are
   typical of classroom achievement tests in your experience?
4. (I/C) Restate in your own words the argument between unitary trait propo-
   nents and discrete-point testing advocates. Why did Oller back down from the
   unitary trait hypothesis?
5. (I/C) Why are cloze and dictation considered to be integrative tests?
6. (G) Look at the list of Gardner's seven intelligences. Take one or two intelli-
   gences, as assigned to your group, and brainstorm some teaching activities
   that foster that type of intelligence. Then, brainstorm some assessment tasks
   that may presuppose the same intelligence in order to perform well. Share
   your results with other groups.
7. (C) As a whole-class discussion, brainstorm a variety of test tasks that class
   members have experienced in learning a foreign language. Then decide
   which of those tasks are performance-based, which are not, and which ones
   fall in between.
8. (G) Table 1.1 lists traditional and alternative assessment tasks and characteris-
   tics. In pairs, quickly review the advantages and disadvantages of each, on
   both sides of the chart. Share your conclusions with the rest of the class.
9. (C) Ask class members to share any experiences with computer-based testing
   and evaluate the advantages and disadvantages of those experiences.
FOR YOUR FURTHER READING
McNamara, Tim. (2000). Language testing. Oxford: Oxford University Press.

One of a number of Oxford University Press's brief introductions to various
areas of language study, this 140-page primer on testing offers definitions
of basic terms in language testing with brief explanations of fundamental
concepts. It is a useful little reference book to check your understanding of
testing jargon and issues in the field.
Mousavi, Seyyed Abbas. (2002). An encyclopedic dictionary of language testing.
Third Edition. Taipei: Tung Hua Book Company.

This publication may be difficult to find in local bookstores, but it is a
highly useful compilation of virtually every term in the field of language
testing, with definitions, background history, and research references. It
provides comprehensive explanations of theories, principles, issues, tools,
and tasks. Its exhaustive 88-page bibliography is also downloadable at
http://www.abbas-mousavi.com. A shorter version of this 942-page tome
may be found in the previous version, Mousavi's (1999) Dictionary of lan-
guage testing (Tehran: Rahnama Publications).