BERA Annual Conference | 2009
Assessment is the key to identifying and measuring achievement
– it shows when education works. Effective assessment leads to
better opportunities in life.
Enriching Education
Established in 1858, we aim to promote educational excellence and high quality learning through the use of assessment. Although there have been many changes to the education system over the years, the sense of mission that sparked the creation of the University of Cambridge Local Examinations Syndicate (the original name of Cambridge Assessment) remains at the heart of everything we do today. We strive for continuous improvement of assessment systems and methodologies around the world to guarantee learners everywhere access to the benefits of their education.

Cambridge Assessment provides fair, valid and reliable assessments that encourage personal development by recognising achievement.

We play a leading role in researching, developing and delivering educational assessment to eight million learners in over 150 countries every year through our three major exam boards: Cambridge ESOL, CIE and OCR. Cambridge Assessment is a department of the University of Cambridge and a not-for-profit organisation.
www.cambridgeassessment.org.uk
Cambridge Assessment Research Presentations at the BERA Annual Conference 2009

This booklet features information about the Cambridge Assessment research that is being presented at this year's BERA conference.

At Cambridge Assessment, the reliability of our assessments stems from evidence-based and research-led approaches to all products, services and new developments. We have the largest research capability of its kind in Europe, with more than 50 researchers who pioneer the latest techniques and evaluate current assessments.

Externally funded research is also undertaken, including for the regulators in the United Kingdom and for many education ministries. The results of our research are widely published in well-respected, major refereed journals such as Review of Educational Research and Assessment in Education, as well as being presented at seminars and conferences. We also have our own publications, Research Matters and Research Notes.

Publications available at www.cambridgeassessment.org.uk

Cambridge Assessment has supported the BERA Annual Conference for many years. Like BERA, we believe that educational research plays a vital role in the continuous improvement of education and assessment policies and practices. Members of our Research Division contribute to the comprehensive conference programme by presenting papers that cover a wide range of assessment issues. We look forward to seeing you either at one of our presentations or at our exhibition stand.

Sylvia Green, Director – Research Division

Objective questions in GCSE science: Exploring question difficulty, item functioning and the effect of reading difficulties

Victoria Crisp

Time: Thursday 3 September from 2:30pm to 4:00pm
Session: Main Conference Parallel Session 3
Reference: 0159

Victoria Crisp, presenter

Breaking from a strong tradition of constructed response examinations, one revised science qualification in the UK now involves some examinations with only objective questions. It was considered of interest to investigate characteristics of these newer GCSE papers, such as their difficulty and their contribution to validity. This study also explored the potential to use access arrangements data to investigate how students with certain needs may be affected differently from other students by features of exam questions.

Item level performance data for the entire candidature of two GCSE science examination papers were obtained. Traditional statistical measures of difficulty (facility values) and discrimination (correlations of item score with total mark) were calculated for each item. Rasch analysis was also conducted to provide estimates of difficulty independent of student ability, together with information on item functioning. For one of the papers, a ‘Reader’ group of students was identified, comprising all students who had access to a reader in their exam. A ‘Norm’ group of the same size was selected randomly from students without a reader. Measures of difficulty and functioning were compared between groups. For a number of interesting items, a sample of student responses was analysed.
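The traditional statistics named above are simple to compute. The following minimal sketch, using entirely illustrative data rather than anything from the study, shows a facility value (the proportion of candidates answering an item correctly), the item–total discrimination described in the abstract, and the Rasch model's item response function that underlies the ability-free difficulty estimates:

```python
import numpy as np

# Entirely illustrative data: one row per candidate, one column per
# objective item, with 1 = correct and 0 = incorrect.
rng = np.random.default_rng(0)
scores = rng.binomial(1, 0.6, size=(5000, 30))

# Facility value: proportion of candidates answering each item
# correctly (higher = easier).
facility = scores.mean(axis=0)

# Discrimination: correlation of each item's score with the total mark.
total = scores.sum(axis=1)
discrimination = np.array(
    [np.corrcoef(scores[:, i], total)[0, 1] for i in range(scores.shape[1])]
)

# The Rasch model: probability that a candidate of ability theta
# answers an item of difficulty b correctly.
def rasch_p(theta, b):
    return 1.0 / (1.0 + np.exp(-(theta - b)))
```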
A number of factors potentially making questions easier (e.g. absence of technical terms) or more difficult (e.g. an incorrect response option that makes an accurate statement) were identified. Factors potentially contributing to problems with item functioning were also identified (e.g. objective questions that facilitate guessing). The analyses also suggested a number of question features that may have particularly influenced those requiring reading support (e.g. better performances on questions with little technical language), some of which are unsurprising. The findings have implications for question writing practice.
Standard-maintaining by expert judgement: using the rank-ordering method for determining the pass mark on multiple-choice tests

Milja Curcin, Beth Black and Tom Bramley

Time: Thursday 3 September from 2:30pm to 4:00pm
Session: Main Conference Parallel Session 3
Reference: 0176

Milja Curcin, presenter

The Angoff method for determining pass marks on multiple-choice tests is widely used in North America, Australia and the UK. It involves experts judging the difficulty of the test items for ‘minimally competent’ candidates.

However, as a standard-setting method, the Angoff method has no explicit mechanism for standard maintaining, i.e. keeping the pass mark at the same standard session on session. There is therefore a need to explore judgemental methods of standard maintaining for multiple-choice tests in situations where the requirements for statistical equating and linking are not met.

This study involved piloting an adapted rank-ordering method, which allowed direct comparison of items from a previous session with those from the current live session of a test. Each judge was given several packs of four items (two from each session). Their task was to place the four items in rank order of perceived difficulty.

By fitting a Rasch model which estimates relative difficulty for each item based on the judges' rank orders, we obtained a common scale of ‘perceived difficulty’ on which to compare the two tests. Knowing the pass mark for the previous test, we could map it to the pass mark on the live test which would be achieved by a candidate of inferred equivalent ability. This would allow standards to be maintained session on session. The exercise was carried out twice, in two different OCR vocational qualifications, in order to investigate its consistency across contexts and over time.

We will discuss the validity of this method and compare it with the Angoff procedure. We will also discuss its potential as a standard-maintaining technique in different examination contexts.
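The scaling step can be illustrated in miniature. The sketch below, with hypothetical packs and item numbers, decomposes each judge's rank order of four items into six paired comparisons and fits a Bradley-Terry model, which is mathematically equivalent to the Rasch formulation of paired comparisons used in rank-ordering studies, placing items from both sessions on one ‘perceived difficulty’ scale:

```python
import numpy as np
from itertools import combinations

# Hypothetical packs: each list is one judge's rank order of four item
# ids, hardest first. Items 0-3 are from the previous session, 4-7 from
# the live session (two items from each session per pack).
rankings = [
    [4, 0, 5, 1],
    [5, 1, 6, 2],
    [6, 2, 7, 3],
    [3, 7, 4, 0],
]

n_items = 8
wins = np.zeros((n_items, n_items))
for pack in rankings:
    # A rank order of four items implies six paired comparisons.
    for harder, easier in combinations(pack, 2):
        wins[harder, easier] += 1

# Fit Bradley-Terry strengths with the simple MM (Zermelo) algorithm.
p = np.ones(n_items)
n_ij = wins + wins.T  # comparisons made between each pair of items
for _ in range(500):
    for i in range(n_items):
        denom = sum(n_ij[i, j] / (p[i] + p[j])
                    for j in range(n_items) if n_ij[i, j] > 0)
        p[i] = wins[i].sum() / denom
    p /= np.exp(np.log(p).mean())  # fix the scale's origin

difficulty = np.log(p)  # items from both sessions on a common scale
```

With both tests on one scale, the previous session's pass mark can then be mapped to an equivalent mark on the live test, as the abstract describes.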
Aspects of AS and A-level Physics uptake

Tim Gill, Carmen Vidal Rodeiro and John F. Bell

Time: Wednesday 2 September from 3:00pm to 4:30pm
Session: Main Conference Parallel Session 1
Reference: 0350

Tim Gill, presenter

Concern is often expressed about the declining uptake of A-level Physics in England. However, such concerns do not always take account of important information. In particular, some analyses have been based on A-level entries, when the actual supply of A-level Physicists is based on passes.

This paper will draw together the findings from previous work undertaken by Cambridge Assessment into the uptake of different subjects at A-level and report on new work that addresses some of the issues arising.

We review the trends in Physics A-level uptake over the last twenty years in relation to other A-level subjects (for example, the Physics entry as a proportion of the overall A-level entry) and consider the impact of broadening the sixth form curriculum.

We also consider the uptake of A-level Physics by school type, ethnicity and social factors. We will address the claim that the independent sector is particularly successful at encouraging students to take A-level Physics. The apparent decline of entries in the state sector compared to the independent sector is placed in the context of the ability of the candidates taking the A-level.

Patterns of entry of GCSE science subjects and how they relate to A-level uptake will also be described. In particular, it will be demonstrated how the compensatory nature of Double Award Science has led to misleading views of its efficacy as preparation for Physics A-level.

The paper will also review the reasons why students choose Physics at A-level, including secondary analyses from a large-scale survey conducted by Cambridge Assessment into the reasons for A-level choice.
How are archive scripts used in judgements about maintaining grading standards?

Jackie Greatorex

Time: Thursday 3 September from 2:30pm to 4:00pm
Session: Main Conference Parallel Session 3
Reference: 0182

Jackie Greatorex, presenter

Background

Generally, GCE and GCSE Awarding Bodies use:

◆ Awarding procedures to determine grade boundaries (including archive scripts to remind examiners of the previous year's standard).
◆ Comparability studies to monitor standards over time or between Awarding Bodies.

Some authors have suggested replacing aspects of marking and/or Awarding with Thurstone pairs and/or rank ordering. Both involve judging the quality of scripts, and both are used in some comparability studies. These ideas are still being explored, refined and debated.

At the International Association for Educational Assessment conference in 2008, Greatorex et al presented some findings from a wider project. The project data comprised over twenty verbal protocols of examiners judging script quality in experimental conditions which replicated Thurstone pairs, rank ordering and part of the Awarding procedure. Greatorex et al reported that the questions that statistically discriminated between grade A and grade B performance were not necessarily the questions examiners attended to most in the live scripts. My BERA paper also draws from the wider project and focuses on archive scripts.

BERA paper

There are two aims: (1) to compare conditions in terms of the questions receiving most attention in archive scripts; and (2) to identify how well these questions discriminated between the performance of candidates who actually received grades A and B. Data analysis is still ongoing. Interim results indicate that two questions statistically discriminated between the question level marks of candidates who were awarded grades A and B, and that these two questions were not always the most referenced questions. Discussion will focus on how the findings relate to practice or potential practices.

An investigation into marker reliability and other qualitative aspects of on-screen essay marking

Martin Johnson

Time: Thursday 3 September from 9:00am to 10:30am
Session: Main Conference Parallel Session 2
Reference: 0205

Martin Johnson and Hannah Shiell, presenters

Literature suggests that readers' comprehension of texts might be weaker when extended texts are read on screen rather than on paper. This has important implications for assessment, implying a need to explore whether the mode in which an essay is accessed might influence assessors' judgements about the quality of the essay.

This project investigated whether examiners could mark digital images of a set of GCSE English Literature essays as reliably on screen as they could in the traditional paper mode, whilst also employing a variety of methods to capture some of the complex reading behaviours that pertain to the assessment of extended texts.

To investigate essay marking reliability, examiners' marks were statistically compared across both modes and with an independent reference mark for each essay. To consider whether mode affected the script features (or constructs) being attended to by the examiners, Kelly's Repertory Grid technique was used to elicit constructs and ratings from two senior examiners. These were then used to build a profile of each script, while marking reliability analyses were used to infer any potential relationship between construct recognition and mode.

Examiners' cognitive load whilst marking was measured by a Task Load Index, which enabled a comparison of each marker's cognitive workload in each mode. This was complemented by a measure comparing examiners' spatial encoding abilities across modes. Finally, examiners' navigation flow and annotation practices were observed, coded and compared across a sample of scripts marked in both modes. These observations were then used to inform a series of semi-structured interviews with each examiner.
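As a small illustration of the kind of cross-mode comparison described above, the sketch below (with entirely hypothetical marks, not data from the study) compares one examiner's paper-mode and screen-mode marks against the independent reference mark for each essay:

```python
import numpy as np

# Entirely hypothetical marks for ten essays: an independent reference
# mark, and one examiner's marks for the same essays in each mode.
reference = np.array([12, 15, 8, 19, 11, 14, 17, 9, 13, 16])
on_paper  = np.array([11, 15, 9, 18, 11, 13, 17, 10, 12, 16])
on_screen = np.array([12, 14, 8, 17, 12, 13, 16, 9, 13, 15])

for mode, marks in (("paper", on_paper), ("screen", on_screen)):
    bias = np.mean(marks - reference)         # severity (-) or leniency (+)
    mad = np.mean(np.abs(marks - reference))  # average absolute discrepancy
    r = np.corrcoef(marks, reference)[0, 1]   # agreement with reference
    print(f"{mode}: bias = {bias:+.2f}, mean |diff| = {mad:.2f}, r = {r:.2f}")
```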
What was this student doing?: Evidencing validity in A-level assessments

Stuart Shaw and Victoria Crisp

Time: Thursday 3 September from 4:30pm to 6:00pm
Session: Main Conference Parallel Session 4
Reference: 0160

Stuart Shaw and Victoria Crisp, presenters

Validity is about the extent to which the inferences made from an assessment's outcomes are appropriate. A claim of validity is generally agreed to require several kinds of evidence. Whilst a number of possible frameworks for evaluating validity have been proposed, there have been few attempts to apply such frameworks in the UK.

As part of the piloting of a multi-faceted methodology for providing comprehensive validity evidence, this paper reports some of the evidence garnered to address one of the validation questions within the framework used: ‘Do performances on exam tasks reflect relevant qualities/intended thought processes?’

Eleven questions from the examinations of an international A-level geography qualification were selected. For each exam question, six geography experts were presented with the question and its mark scheme and asked to identify the processes that they would expect students to use to answer each sub-question well. The experts were then shown responses to the question from three students (one strong, one average and one weak response) and were asked to identify the processes that they thought the students had actually used to arrive at these answers. Finally, the experts were asked to reflect on the match between the expected and apparent processes.

The experts' views on the anticipated and perceived processes were analysed, looking for commonalities. Additionally, expected and apparent processes were compared, with reference to the experts' reflections.

The paper will report on stronger and weaker matches between expected and apparent processes and what these suggest with respect to this aspect of validity.

Must examiners meet in order to standardise their marking? An experiment with new and experienced examiners of GCE AS Psychology

Nicholas Raikes, Jane Fidler and Tim Gill

Time: Thursday 3 September from 9:00am to 10:30am
Session: Main Conference Parallel Session 2
Reference: 0672

Nicholas Raikes, presenter

When high stakes examinations are marked by a panel of examiners, the examiners must be standardised so that candidates are not advantaged or disadvantaged according to which examiner marks their work.

It is common practice for Awarding Bodies' standardisation processes to include a ‘Standardisation’ or ‘Co-ordination’ meeting, where all examiners meet to be briefed by the Principal Examiner and to discuss the application of the mark scheme in relation to specific examples of candidates' work. However, research into the effectiveness of standardisation meetings has cast doubt on their usefulness, at least for experienced examiners.

In the present study we address the following research questions:

1. What is the effect on marking accuracy of including a face-to-face meeting as part of an examiner standardisation process?
2. How does the effect on marking accuracy of a face-to-face meeting vary with the type of question being marked (short-answer or essay) and the level of experience of the examiners?
3. To what extent do examiners carry forward standardisation on one set of questions to a different but very similar set of questions?

Detailed results and discussion will be included in the paper presented at the conference.

The findings of the study will help stakeholders in public examinations decide whether examiners must meet in order to be standardised, and whether this varies according to the experience of the examiners and the type of questions.
Continuing development for assessment professionals
The Cambridge Assessment Network enables professionals in assessment to share and develop
knowledge and expertise – a centre of excellence in assessment. Our aim is to build an international
community committed to high quality assessment that enhances learning.
We offer formal and informal professional development activities for those working in assessment:
◆ Seminars, training sessions and other events covering key issues in assessment
◆ A certificated course in the subject, accredited by the University of Cambridge
◆ Access to formal and informal sources of expertise
◆ A wide range of networking opportunities.
Our programme is supported by AssessNet, a virtual learning environment, through which we can
deliver bespoke online courses to members of the profession.
www.assessnet.org.uk
Forthcoming events/courses
Certificate in the Principles and Practice of Assessment
This innovative programme is offered by the University of Cambridge Institute of Continuing
Education together with Cambridge Assessment. Led by specialists in assessment, the programme
provides an introduction to educational assessment, using topical and relevant examples.
Cambridge Assessment Conference – Issues of control and innovation: the role of the state in
assessment systems. 19 October, Robinson College, Cambridge
The keynote speakers will be Professor Alison Wolf, King’s College London, and Professor Robin
Alexander, University of Cambridge. Experts, including Professor Mary James, Faculty of Education,
University of Cambridge; Isabel Nisbet, Ofqual; and Dr John Allan, SQA, will lead a series of
seminars. For further information please visit www.assessnet.org.uk/annualconference.
Cambridge Assessment
1 Hills Road
Cambridge CB1 2EU
United Kingdom
tel +44 (0) 1223 553311
fax +44 (0) 1223 460278
www.cambridgeassessment.org.uk
Cambridge Assessment is the brand name of the University of Cambridge Local Examinations Syndicate, a department of the University of Cambridge. Cambridge Assessment is a not-for-profit organisation.

Cover: Corbis Images