
1. Language testing and assessment

Definition of test
 A test is a sample of an individual’s behaviour/performance on the basis of which inferences
are made about the more general underlying competence of that individual
 Language tests involve any kind of measurement/examination technique which aims at
describing the test taker’s foreign language proficiency, e.g. oral interview, listening
comprehension task, or free composition writing.
 Language tests may differ in test method and purpose.

Test types based on the testing method

Paper-and-pencil language tests


 assessment of
o a separate component of the language (grammar, vocabulary)
o receptive understanding (reading, listening)
 test item: fixed response format (a number of possible responses are presented, and the
candidate is required to choose one, e.g. multiple choice)
o correct answer: key; incorrect answers: distractors
o distractors are chosen based on observations of typical errors of learners
 not useful in testing the productive skills (except indirectly)

Performance-based tests
 skills are assessed in an act of communication
 assessment of speaking and writing
 the samples are elicited through simulations of real-world tasks in realistic contexts
 test taker is assessed by trained raters using an agreed rating process

Test types based on the purpose

Achievement tests
 associated with the process of instruction
 during or at the end of a course of study
 whether and where progress has been made in terms of the goals of learning
 should support the teaching to which they relate
 possible negative effect on teaching: teaching to the test
 may be self-enclosed: it may not bear direct relationship to language use
o successful performance does not necessarily indicate successful achievement
 relate to the past: they measure what students have learned as a result of teaching
 alternative assessment: avoid teaching and studying just for the test; involve students in
assessment and enable them to self-assess their progress

Proficiency tests
 relate to the future situation of language use
o without necessary reference to the previous process of teaching
 based on a specification of what candidates have to be able to do in a language
 criterion: the students’ real-life language use
 include performance features, where characteristics of the criterion setting are represented
o e.g.: test of communicative abilities of a health professional: communicating with
patients
 admissions to a foreign university, occupation requiring L2 skills

The criterion:
- criterion: relevant communicative behaviour in the target situation; series of performances
subsequent to the test, the target
- Test: a performance representing samples from the criterion
- some teachers question the value of direct testing → how can you test behaviour?

Other limits to testing:


- authenticity: there is an inevitable gap between the test and the criterion
- validity: generalizability; does the test actually measure what it is supposed to measure?
- Observer’s paradox

Reliability
Reliability shows how precisely we measured. The scores obtained should be very similar to those
which would have been obtained by the same students with the same ability, but at a different time.

The reliability coefficient:


 quantifies the reliability of a test (a value between 0 and 1)
 allows the reliability of different tests to be compared
 ideal = 1 → would give the same results for a particular set of candidates regardless of when
it was administered
 it can be different for different types of language tests
o a good vocab or reading test is between .90-.99
o auditory comprehension is often .80-.89
o and oral production may be .70-.79
 the acceptable level also depends on the importance of the decisions that are to be taken on
the basis of the test
 determining → need two sets of scores for comparison
o Test-retest method: get a group of subjects to take the same test twice (problematic:
subjects are likely to recall items, learning or forgetting might take place between the
two tests, and motivation to take the test twice may be low)
o Alternate forms method: use two different forms of the same test; often not
available
o Split-half method: each subject is given two scores, one for each half of the test →
the scores are used as if the same test had been taken twice →
provides a coefficient of internal consistency

The standard error of measurement and the true score


Classical test theory assumes that each person has a true score: the score that would be obtained if
there were no errors in measurement. If the same test could be taken over and over again without any
effect of circumstances, the scores would vary, and their average would be the true score.

Standard error of measurement (SEoM)


 based on the reliability coefficient and a measure of the spread of all the scores on the test
o e.g. if SEoM = 5 and a candidate scores 56, his/her true score is likely to lie between 51 and 61
 statements based on what is known about the pattern of scores that would occur if it were
possible to take the test over and over again
 Item Response Theory (IRT): estimate how far an individual test taker’s actual score is likely to
diverge from their true score; estimate for each individual, based on their performance on
each of the items on the test
 standard error of measurement serves to remind us that in the case of some individuals
there is quite possibly a large discrepancy btw actual score and true score
Reliability cannot be estimated directly since that would require one to know the true scores, which
according to classical test theory is impossible.
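The worked example above (SEoM = 5, score 56, band 51–61) follows from the classical formula SEoM = SD × √(1 − r). The SD and reliability figures below are assumed for illustration; they are simply one pair that yields SEoM = 5.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical SEoM from the spread of scores (standard deviation)
    and the reliability coefficient: SEoM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical figures: a test with SD = 10 and reliability .75 gives SEoM = 5,
# matching the worked example in the notes.
seom = standard_error_of_measurement(10, 0.75)

observed = 56
band = (observed - seom, observed + seom)  # true score likely within +/- 1 SEoM
```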

Scorer reliability
Ideally the same scorer should give the same scores regardless of the circumstances, and these would
be the same scores as would be given by any other scorer on any occasion.
 Scorer reliability coefficient → quantifies the level of agreement given by the same or
different scorers on different occasions
o scorer reliability coefficient of a multiple choice test: 1 (requires no judgement)
o Interview → a degree of judgement is called for on the part of the scorer, perfect
consistency is not to be expected
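One simple way to quantify scorer agreement is to correlate two raters' marks for the same set of performances. A sketch, with invented interview ratings; a multiple-choice key would trivially give a coefficient of 1, while human judgement yields something below that.

```python
import statistics

def scorer_reliability(scorer_a, scorer_b):
    """Pearson correlation between two scorers' marks for the same scripts,
    used here as a simple scorer-reliability coefficient."""
    ma, mb = statistics.mean(scorer_a), statistics.mean(scorer_b)
    num = sum((a - ma) * (b - mb) for a, b in zip(scorer_a, scorer_b))
    den = (sum((a - ma) ** 2 for a in scorer_a)
           * sum((b - mb) ** 2 for b in scorer_b)) ** 0.5
    return num / den

# Hypothetical interview ratings (0-10 scale) from two trained raters:
rater_1 = [7, 5, 8, 6, 9, 4]
rater_2 = [6, 5, 8, 7, 9, 5]
r = scorer_reliability(rater_1, rater_2)
```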

How to make tests more reliable


 Take enough samples of behaviour (the more items you have on a test, the more reliable the
test will be)
o each additional item should represent a fresh start → gain additional information
 Exclude items which do not discriminate well between weaker and stronger students as they
contribute little to the reliability of a test (they are either too easy or too difficult)
 Do not allow too much freedom in answering → depressing effect on reliability
 Write unambiguous items
 Provide clear and explicit instructions, both in written and oral tasks
 Ensure that tests are well laid out and perfectly legible
 Make candidates familiar with format and design techniques
 Provide uniform and non-distracting conditions of administration
 Use items that permit scoring which is as objective as possible (multiple choice, open-ended
gap-fill questions with one-word answers)
 Make comparisons between candidates as direct as possible (e.g. set 2 compulsory composition
tasks rather than a choice from 6)
 Provide a detailed scoring key
 Train scorers
 Agree on acceptable responses and appropriate scores at the outset of scoring
 Identify candidates by number, not by name
 Have multiple, independent scorers

Reliability and validity


In measurement terms, reliability corresponds to precision (how consistent the results are), while
validity corresponds to accuracy (whether the right thing is being measured).
An example often used to illustrate the difference between reliability and validity in the experimental
sciences involves a common bathroom scale. If someone who is 200 pounds steps on a scale 10 times
and gets readings of 15, 250, 95, 140, etc., the scale is not reliable. If the scale consistently reads
"150", then it is reliable, but not valid. If it reads "200" each time, then the measurement is both
reliable and valid.
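The bathroom-scale example can be made concrete with a small sketch. The tolerance threshold and the reading lists are illustrative only: tight clustering of readings indicates reliability, and an average near the true value indicates validity.

```python
import statistics

TRUE_WEIGHT = 200  # pounds; the person's actual weight

def classify(readings, tolerance=5):
    """Classify a scale's readings: reliable if they cluster tightly,
    valid if their average is close to the true value."""
    reliable = statistics.pstdev(readings) < tolerance
    valid = abs(statistics.mean(readings) - TRUE_WEIGHT) < tolerance
    return reliable, valid

erratic = [15, 250, 95, 140, 180, 210, 60, 130, 175, 95]            # neither reliable nor valid
consistent_off = [150, 151, 149, 150, 150, 151, 149, 150, 150, 150] # reliable but not valid
accurate = [200, 201, 199, 200, 200, 199, 201, 200, 200, 200]       # both reliable and valid
```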
