The Challenge of Large-Scale
English Language Testing in China
Yan Jin
Shanghai Jiao Tong University
INTRODUCTION
English language testing in China
Purpose        Education level                Test             Proficiency level
Admission      High school                    Zhong Kao        Band 5
               College or university          Gao Kao (NMET)   Band 8
               Graduate school                GSEEE            /
Program exit   Vocational English             PRETCO-B         Level B
                                              PRETCO-A         Level A
               College, non-English major     CET-4            Band 4
                                              CET-6            Band 6
                                              CET-SET          Grade A, B, C, D
               College, English major         TEM-4            Band 4
                                              TEM-8            Band 8
Proficiency    Public English language test   PETS             Band 1, 2, 3, 4, 5
The scale of English language tests
Number of test takers in 2012 (million)
TEM • Test for English Majors: 0.50m
GSEEE • Graduate School English Entrance Examination: 1.65m
PETS • Public English Test System: 2.04m
NMET • National Matriculation English Test: 9.15m
CET • College English Test: 17.98m
…… • ……
The high-stakes purposes of EFL tests
[Diagram: TESTS at the center, linked to the purposes below]
College admission
Graduate program admission
College graduation
Employment and career
……
Modern Language Testing at the Turn of the Century:
Assuring What We Count Counts
“a strong program of
test validation that includes
considerations of ethical test use”
Bachman (2000, p. 1)
Test validation
“steers between the Scylla and Charybdis
of what Messick called construct
underrepresentation, on the one hand, and
construct-irrelevant variance, on the other”
McNamara & Roever (2006, p. 18)
Ethical test use
Language testing needs to go beyond the
cognitive and psychological models and
explore a model that encompasses
a clear social dimension
McNamara & Roever (2006)
Challenges
Construct underrepresentation:
native speaker norms
Construct-irrelevant variance:
Computer-based language testing
Ethical language test use
Native-speaker norms?
The relevance of native speaker norms
As the use of English in international communication
increasingly involves only nonnative speakers in many
settings, the relevance to this communication of
relatively distant native speaker norms is increasingly
questioned. The topic raises complex sociolinguistic,
policy, cultural, and political issues that are only
beginning to be explored…
McNamara & Roever (2006: 252)
A paradigm shift for ELT in Asia
A paradigm shift for English language teaching
within the Asian region that is responsive to
“the increasing use of English as a language
of contact between non-native speakers
across national boundaries”
Wang & Hill (2011, p. 206)
Three ways of using English as an international language
1. World Englishes (WEs) approach
Using a variety of Englishes in the lectures and dialogues
that form the stimulus material in a test—an approach
currently being used in international tests.
2. English as a lingua franca (ELF) approach
Elder and Davies (2006): testing ELF with language input that is standard
English but designing test accommodations that modify the test delivery
system in order to make it accessible and fair for ELF users without changing
the construct; or testing ELF as a construct unto itself while favoring strategic
competence.
3. Locally defined English as an international language
(EIL) approach
Decisions are based on carefully considered local needs for English including
its international uses. Such local needs will typically be based on a thorough
needs analysis of the EIL language and context involved in a particular local
English learning situation.
J.D. Brown, 2012
Accents in listening tests
Test       Test taker           Accent                                                 Speaker
TEM        English major        British & American, Australian                         NS
GSEEE      College graduate     British & American                                     NS
PETS       English learner      British & American                                     NS
NMET       High school leaver   British & American                                     NS
CET        Non-English major    British & American                                     NS
IB-CET     Non-English major    British & American                                     NS & NNS
TOEFL iBT  High school leaver   British & American, Australian, New Zealand, Canadian  NS
IELTS      High school leaver   British & Australian, American, New Zealand            NS
PTE-A      High school leaver   British & American, non-native accents                 NS & NNS
Example 1: CET Listening
Paper-based:
Fully scripted
Recorded in a studio
Native speaker with a standard accent
Preferably English radio/TV broadcaster or teacher/editor/writer
Male + Female
British + American

Internet-based:
Audio clips (radio program)
Video clips (TV, movie, online materials)
Possible presence of NNS or NS with a local accent
Questions on NS with a standard accent
Example 2: IELTS Listening
http://www.ielts.org/test_takers_information/what_is_ielts/test_format.aspx
Example 3: TOEFL iBT Listening
http://www.ets.org/toefl/important_update/english_accents_added
Example 4: PTE-Academic Listening
http://pearsonpte.com/au/Documents/RelevantFactsheet.pdf
More research on WEs and language testing
The attitude toward WEs in language testing will have
a marked effect on the construct representation:
test design, testing processes, rating criteria, rater training,
score interpretation …
JD Brown (2013):
7 recommendations on how language testers and the WEs
community can start working together.
Recommendation No. 4:
Base tests on context, needs and decision purposes.
Messick (1996: 243)
“… nothing important be left out of the
assessment of the focal construct”
Computer-based language testing
Computer-based language tests
Timeline of computer-based language tests (A to E, in order of launch):
A. CB-IELTS: since 2000 (www.ielts.org/researchers/research/computer_based_ielts.aspx)
B. TOEFL iBT: since 2005 (www.ets.org/toefl/ibt)
C. IB-CET: since 2008 (www.ccets.org)
D. PTE-Academic: since 2009 (www.pearsonpte.com/pteacademic)
E. CB-CETSET: since 2012 (www.cet.edu.cn)
Validity of CBLT/CALT
Growing concern about the potential threat to validity of
computerized tests in the context of high-stakes testing:
Validation studies of large-scale computer-
based language tests center on whether the
use of technology will confound the
measurement of test takers’ language
proficiency with computer proficiency,
resulting in test bias and unfairness.
Empirical studies
Taylor et al. 1998    Computer familiarity and TOEFL CBT
Sawaki 2001           PB and CB L2 reading assessment
Choi et al. 2003      PB and CB TEPS in Seoul
Brown 2003            Handwritten vs. word-processed IELTS essays
Breland et al. 2004   Handwritten vs. word-processed TOEFL writing
Wolfe & Manalo 2005   TOEFL composition medium & score quality
Weir et al. 2007      CB IELTS vs. IELTS writing
Cross-modal validity aiming at establishing equivalence
Mixed results: enhance/impede/have no effect
on test performance
Assumption: Computer as a source of CIV
Studies of CB CET-SET and IB CET
Cognitive processing (Weir 2005):
(Jin, Wu and Yan, 2012): Cognitive processing of writing
Is computer literacy construct-relevant in a language test in
the 21st century? (LTRC 2011)
(Jin & Zhang, L. 2012): Communicative strategies in speaking
The impact of test mode on the use of communication
strategies in the paired discussion task (AFELTA 2012)
(Jin & Zhang, X. 2013): Cognitive processing of integrated and
independent tasks
A comparative study of PTE-Academic and IB-CET (LTRC 2013)
Computer Literacy and CBLT construct
What is the role of computer literacy in the
conceptualization of the construct of a CBLT?
(Is the ability to use the computer part of the
construct to be measured?)
What do we want to measure in a CBLT?
Computer operation in IB-CET
Click (MCQ)
Double click (de-select)
Drag and drop (match)
Type (SAQ, composition)
Cut and paste (composition)
Talk to the microphone
Highlight while reading
Scroll while reading & writing
Check and adjust equipment

? Automatic spelling check
? Automatic grammar check
? Automatic capitalization
? Online dictionary
? Online resources
? Cut & paste (summary writing)
? Pauses of video clips (dictation)
? Repeat (listening to repeat)
? Previewing listening questions
Social-cognitive construct representation
A global construct:
‘ability–in language user’ + ‘language user–in context’
Incorporates interaction from an individual-focused
cognitive perspective (a psycholinguistic ability model)
A local, context-bound construct:
ability–in language user–in context
Adopts a social interactional perspective: individual ability
and contextual facets interact in ways that change them
both
(Chalhoub-Deville 2003: 369-383)
Ability within context as the construct
... future L2 construct exploration should
address the ‘abilities – in language users – in
contexts’, as well as the connections that enable
language users to transfer relevant schemes to
appropriately engage in a variety of
communicative events.
(Chalhoub-Deville, 2003: 378)
A research agenda
A paradigm shift for the conceptualization of the
construct of a CBLT/CALT:
Specify computer-mediated TLU situations
Design CBLT tasks and interfaces to elicit best performance
Investigate the interaction involved in language use:
among attributes of the test taker;
between the test taker and task characteristics.
……
Assessment Use
We spend a lot of time worrying about the
technical reliability and technical excellence of
our instruments, and too little worrying about
the validity for the purposes for which they are
intended (or about the ethical justification for
those uses).
(Spolsky, 1995: 9)
… once the assessment-based interpretations are
actually used for the purpose for which they are
intended, the assessment can develop a life of its
own. It may be lured out of the well-described
domain where the test developer intended it to
reside, and other test users may co-opt it for uses
beyond those for which it was developed
(Bachman & Palmer, 2010, p. 429)
College English Test (CET)
[Timeline of CET development]
1987  CET-4      CET Band 4
1989  CET-6      CET Band 6
1999  CETSET     CET Spoken English Test
2008  IB-CET     Internet-Based CET
2012  CB-CETSET  Computer-Based CET Spoken English Test
CET: A history of over 25 years
CET: the first decade
Psychometric-structuralist approach to language
testing
(Spolsky, 1995)
100,000 in 1987 → 1 million in the mid-1990s
Ensuring the psychometric and technical qualities of
this large-scale test: “A good test would automatically
produce good effects in the classroom.”
(Wall, 2000, pp. 505-506)
CET: since the late 1990s
The social dimension of language testing
(McNamara & Roever, 2006; Yang & Gui, 2007)
9.5 million in 2005 → 18 million in 2012
The large scale and the high stakes place greater
professional and social responsibilities on the test
developer. However, …
“Is it possible for test developers to take the responsibility
for the consequences of the test without being given
sufficient power over the use of the test?”
CET: the intended purpose
Based on the requirements stipulated in the national
College English teaching syllabus
(State Education Commission, 1985; 1986)
“check whether the English language proficiency of
college students has met the requirements set in the
national College English teaching syllabus”
(College English Test Working Group, 1987; 1988)
Promote the implementation of the teaching syllabus
CET: uses for other purposes
[Diagram: CET at the center, surrounded by the uses below]
Employment opportunity
Graduate program admission
Residential permit
Diploma/degree
……
Consequences of overuse
The de facto curriculum
Teaching to the test
Learning to the test
Cheating, fake certificates
Concern with test fairness
Decline in moral standards
Corrupting the test
When a quantitative indicator is used for social
decision-making, it distorts and corrupts the indicator
itself and the social process it was intended to monitor
(Campbell, 1975, in Madaus et al., 2009: 155)
Who should answer for the consequences?
Critical language testing
A hidden political agenda? (Shohamy, 2001, pp. 113-114)
Gatekeeping function of language tests (Spolsky, 1997)
The CET is used mainly for initial screening purposes.
A fiercely competitive culture in which opportunities are
insufficient or unevenly distributed
Ethical language testing
The test developer and the test user should shoulder the
responsibilities of test development and use.
(Bachman and Palmer, 2010; Davies, 1997, 2004)
[Table: stages of assessment development and use, set against the test developer’s
responsibility and the decision maker’s responsibility. At each stage one party has
primary responsibility and the other needs to understand.]
1 Initial Planning
2 Design
3 Operationalization
4 Trialing
5 Assessment Use
Bachman & Palmer, 2010, p. 432
A responsible test developer/decision maker…
A responsible test developer needs to
convince the decision maker that
the assessment records are consistent,
the assessment-based interpretations are meaningful,
impartial, generalizable, relevant, and sufficient …
A responsible decision maker needs to
convince the other stakeholders that
the decisions are values sensitive and equitable,
the consequences are beneficial …
(Bachman & Palmer, 2010, p. 433)
Cooperation among stakeholders
Fruitful cooperation toward a common goal is possible
only if all the stakeholders are sufficiently equal in power
and ability (Mathew, 2004, pp. 123-134)
Empower test developers:
Adequate funding for quality test development
Truth about the intended purpose of the test
Express expert views on how tests impact stakeholders
Educate test users:
With a good knowledge of the test, test users will be in a better
position to justify their decisions and the consequences of the
decisions.
Developing an “ethical milieu”
Contribute to the development of an “ethical milieu”
for language testing. (Davies, 1997b, p. 336)
Improve stakeholders’ assessment literacy.
Teachers: make the best use of a standardized
language test and avoid teaching to the test.
Students: understand what is being assessed and
how to improve performance on the test.
Develop a code of practice for language assessment in
China.
Limits of language testing and language tests
The test developer: reconcile
users’ high expectations with
the ethics of test use
The test user: understand the
test-based interpretations and
the intended use of the test
Language testing: A weak profession
“… unlike medicine and the law, there is no quasi-legal
body controlling entry and authorizing the right to
practice.” (Boyd & Davies, 2002, p. 307)
There are limits to what language tests can tell us
about test takers and what test developers can do in
their professional role as language testers.
Collaborate to strengthen their roles as promoters
of professionalism and ethics in language testing.
Local English language tests
Global English language tests
Strengths and Weaknesses
Contextualization: e.g., needs, culture, policy
Alignment to curriculum
Resources: e.g., human, facility, finance
Accessibility
Cost effectiveness
……

Theoretical model underpinning the test design
Evidence supporting validity arguments: e.g., cognitive processing, meaningfulness of scores…
Code of practice: e.g., piloting, equating, accommodation, post-test analysis, transparency
……
Linking to the CEF
Test             A1        A2        B1        B2        C1        C2
IELTS            -         -         4-5       5.5-6.5   7-8       8.5-9
BEC-Higher       -         -         -         -         C         -
PTE-Academic     -         -         43-58     59-75     76-84     -
TOEFL iBT        -         -         57-86     87-109    110-120   -
TOEIC_R/L        60-109    110-274   275-399   400-489   490-495   -
TOEIC_Speaking   50-89     90-119    120-159   160-199   200       -
TOEIC_Writing    30-69     70-119    120-149   150-199   200       -
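As an illustration only, the linking table above can be read as a score-to-level lookup. The sketch below is a minimal Python example under stated assumptions: the names CEF_BANDS and cef_level are hypothetical, only three of the tests are included, and the score bands are copied from the table rather than taken from any official conversion tool. A score outside the listed bands returns None.

```python
from typing import Optional

# Score bands copied from the linking table above (inclusive bounds).
# CEF_BANDS and cef_level are hypothetical names used for illustration.
CEF_BANDS = {
    "IELTS": [("B1", 4.0, 5.0), ("B2", 5.5, 6.5), ("C1", 7.0, 8.0), ("C2", 8.5, 9.0)],
    "PTE-Academic": [("B1", 43, 58), ("B2", 59, 75), ("C1", 76, 84)],
    "TOEFL iBT": [("B1", 57, 86), ("B2", 87, 109), ("C1", 110, 120)],
}


def cef_level(test: str, score: float) -> Optional[str]:
    """Return the CEF level whose band contains the score, or None if no band does."""
    for level, low, high in CEF_BANDS[test]:
        if low <= score <= high:
            return level
    return None


print(cef_level("IELTS", 6.5))         # B2
print(cef_level("PTE-Academic", 30))   # None: the table lists no band below 43
```

A lookup of this kind only restates the table; it says nothing about how the links were established or how comparable the tests are.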
English language testing in China
An English language framework for China
Purpose        Education level                Test             Proficiency level
Admission      High school                    Zhong Kao        Band 5
               College or university          Gao Kao (NMET)   Band 8
               Graduate school                GSEEE            /
Program exit   Vocational English             PRETCO-B         Level B
                                              PRETCO-A         Level A
               College, non-English major     CET-4            Band 4
                                              CET-6            Band 6
                                              CET-SET          Grade A, B, C, D
               College, English major         TEM-4            Band 4
                                              TEM-8            Band 8
Proficiency    Public English language test   PETS             Band 1, 2, 3, 4, 5
A PROFOUND NEED FOR RESEARCH
[Cycle diagram with R at the center: Plan → Design → Operate → Trial → Use]
The End
Thanks for your attention