
1. Definition and classification of tests


Test. A test is a narrower concept than measurement, assessment, or evaluation. Test most commonly
refers to a set of items or questions designed to be presented to one or more students under specified
conditions (Wiersma and Jurs, 1985). When a test is given, measurement takes place; however, not all
measurement is testing. Suppose a teacher records information about the learning styles preferred by
the students. This is an example of measurement but is not considered testing.

Achievement test: is a test that measures the extent to which an individual has achieved something -
acquired certain information or mastered certain skills, usually as a result of specific instruction or
general schooling.

Aptitude test: is usually a measure, in the cognitive or psychomotor domain, of the likelihood of an
individual's benefiting from a training program.

Mastery test: is a measure of the extent to which a student has mastered a specific set of objectives or
met minimum requirements set by the teacher or examining agency.

Diagnostic tests: are used to measure students' strengths and weaknesses, usually to identify
deficiencies in skills or performance. Such tests may also be used to identify learning problems. Most
often diagnostic tests are designed to provide in-depth measurement that locates the source of a particular
problem. They are related to prescriptive tests, which go further and prescribe learning activities
intended to overcome student deficiencies.

Measurement. A broad definition of measurement is often stated as the assignment of numbers to the
characteristics of objects or events according to rules (Wiersma and Jurs, 1985). Measurement
often includes the assignment of a number to express in quantitative terms the degree to which a pupil
possesses a given characteristic. The numerals represent some specific characteristic of the
object or event; however, a numeral itself has no relevance to measurement until it is assigned
quantitative meaning.

Measurements involving length, weight, and volume are commonplace and readily understandable by
most people, since the quantification in such measurement is quite apparent.
Measurement of educational attributes, however, although it involves the same general concepts and
ideas, is not as easily understood. The crucial element is, of course, the rule. For this reason the rule,
and what goes into it, require specific attention.

Suppose a student is measured on science achievement through the use of a 20-item test, each item
representing 5 points. The rule is that a correct response to an item receives 5 points. The points are then
totaled for the achievement score. Even if the rule is applicable and produces a score representing
quantification, the test cannot produce measurement relevant to achievement unless the test items
are appropriate.
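As a minimal sketch of such a scoring rule (in Python; the answer key and student responses below are invented for illustration):

def score_test(responses, key, points_per_item=5):
    """Apply the rule: each correct response earns points_per_item points."""
    return sum(points_per_item for r, k in zip(responses, key) if r == k)

key = list("ABCDA" * 4)                          # hypothetical 20-item answer key
responses = list("ABCDA" * 3) + list("BBCDA")    # hypothetical student: 19 items correct
print(score_test(responses, key))                # 95

The rule produces a quantitative score, but as noted above, the score is a relevant measurement of achievement only if the items themselves are appropriate.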

Assessment. The term assessment is not always used with consistent meaning. Wiersma and Jurs
(1985), for example, used assessment as synonymous with measurement and defined it as collecting
data in the context of conducting measurement. On the other hand, Payne (1992) defined assessment
as the systematic evaluative appraisal of an individual's ability and performance in a particular
environment or context. It is characterized by a synthesis of a variety of data such as observations,
stress interviews, performance measures, group discussions, individual and group tasks, peer ratings,
projective techniques, and various kinds of structured tests. Assessment is thus viewed as concerned
with the totality of the educational setting; it subsumes the terms measurement and evaluation and is
more inclusive than either.

Assessment should be considered separately from evaluation, although the two are related.
Assessment includes such activities as grading, examining, determining achievement in a particular
course or measuring an individual attitude about an activity, group, or job. In general, assessment is
the use of various written and oral measures and tests to determine the progress of students toward
reaching the program objectives. To be informative, assessment must be done in a systematic manner,
including ensuring consistency within measures (from one assessment period to the next with the same
instrument) and across measures (similar results achieved with different instruments). Evaluation is the
summarization and presentation of these results for the purpose of judging the overall
effectiveness and worth of the program.

Evaluation. Evaluation is the process of making a value judgment about the worth of a student's
achievement, product, or performance (Nitko, 1996). It is a process that includes measurement and
possibly testing, but it also contains the notion of value judgment. If a teacher administers a science
test to a class and computes the percentage of correct responses, measurement and testing have taken
place. The scores must then be interpreted, which may mean judging them to be excellent, good, fair, or
poor. This process is evaluation, because value judgments are being made. Evaluation is sometimes
based on objective data alone; more commonly, however, it involves a synthesis of information from two or
more sources such as test scores, values, and impressions. In any event, evaluation does include
making a value judgment, and in education such judgments should be based on objective information.
The relationship between test, measurement, and evaluation is illustrated by Wiersma and Jurs (1985).

When a teacher makes value judgments about pupils' performance, then she is doing more than
measuring. She is using measurement data to evaluate. All teachers evaluate pupils. Evaluation takes
place when a teacher determines which students have satisfactorily completed a course and which ones
have not, when the teacher finds that John can operate the microscope better than anyone in the class,
when we decide which students are eligible for participation in interschool competition and which
students are not. In any school, evaluation is inescapable.

A student's performance may be compared with the performance of other students (normative
evaluation) as in the case of John above--he can operate a microscope better than anyone else in the
class; or a student's performance may be compared with a predetermined standard (criterion
evaluation) as in the case of determining which students are eligible for interschool competition.

Formative Evaluation: Testing occurs continually alongside learning so that teachers can evaluate the
effectiveness of teaching methods along with assessing students' abilities. Formative evaluations are
initial or intermediate evaluations; they provide information about the progress of the participant each
day throughout a learning unit. Formative evaluation involves breaking a learning unit down into
smaller parts to enable both the educator and the students to identify the precise parts of a task or
performance that are in error and need correcting. Formative evaluations should occur throughout the
instructional, training, or research process.

Summative Evaluation

Evaluation that tests students' performance to determine their final overall assimilation of course
material and/or the overall effectiveness of the instructional method is summative. Summative
evaluation generally takes place after a period of instruction and requires making a judgment about
the learning that has occurred (e.g., by grading or scoring a test or paper). It is a final, comprehensive
judgment conducted near the end of an instruction or training program; for example, the final grade
of A. Testing is done at the end of the instructional unit, and the test score is seen as the summation of
all knowledge learned during a particular subject unit.

Norm-referenced evaluation (NRT) is evaluation based on a comparison of a student's performance
with one or more other students' performance on the same test. It reports a level of achievement
relative to a clearly defined subgroup, such as all women or men your age; that is, it reports how well
a performance compares with that of others (people of the same age, gender, or class).

Norm-referenced standards are designed to rank order individuals from best to worst and are
usually expressed in percentile ranks.

Criterion-referenced evaluation is evaluation based on a comparison of a student's performance
with some preset performance standard which is determined independently of the test or test
scores. With a criterion-referenced approach, the examiner compares an individual's result against a
specific, predetermined level of achievement (standard) to determine mastery.

Criterion-referenced standards are a minimum proficiency or pass-fail standard.

Criterion-referenced tests (CRTs) determine "...what test takers can do and what they know, not
how they compare to others" (Anastasi, 1988, p. 102). CRTs report how well students are doing
relative to a pre-determined performance level on a specified set of educational goals or outcomes
included in the school, district, or state curriculum.

Item analysis for criterion-referenced mastery tests

The item analysis procedure used with norm-referenced tests is not directly applicable to criterion-
referenced mastery tests. The reasons are:
 CRTs are designed to describe pupils in terms of the types of learning tasks they can
perform
 CRT items measure the effects of instruction, not the ranking of students
 In preparing items for criterion-referenced tests, the item writer need not make a
conscious decision to write items of about moderate difficulty.
 The level of difficulty and the discriminating index of criterion-referenced test items are
determined by the learning outcome they are designed to measure.
The ability of test items to discriminate between high and low achievers is not crucial for criterion-
referenced test items, because some good items might have very low, or zero, indexes of
discrimination. If all students answered a test item correctly, the item would be eliminated from a
norm-referenced test; in a criterion-referenced test this may indicate that both the instruction and
the item have been effective. An item earmarked for revision for one type of test may be selected
without change for use in the other type.
The steps followed in the analysis of criterion-referenced mastery items are listed below and illustrated in the sketch after this list:
 The same test is given before instruction (pretest) and after instruction (posttest) to
determine the extent to which the test items measure the effects of the instruction.
 Prepare a chart by listing the numbers of the items across the top of the chart and the
students' names down the side of the chart.
 Record correct (+) and incorrect (-) responses for each pupil on the pretest (B) and posttest (A).
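A minimal Python sketch of such a chart and its use; the pupil names and response patterns are invented, and the index S = (RA − RB)/T computed in the loop is the commonly used sensitivity-to-instructional-effects summary, not one prescribed by this text:

# Pretest (B) and posttest (A) responses: '+' correct, '-' incorrect,
# one character per item. Names and patterns are invented.
pretest = {
    "Pupil 1": "--+-",
    "Pupil 2": "-+--",
    "Pupil 3": "----",
}
posttest = {
    "Pupil 1": "++++",
    "Pupil 2": "+++-",
    "Pupil 3": "++-+",
}

T = len(pretest)                  # number of pupils who took both tests
for item in range(4):             # four items in this invented chart
    r_b = sum(resp[item] == "+" for resp in pretest.values())   # correct before
    r_a = sum(resp[item] == "+" for resp in posttest.values())  # correct after
    s = (r_a - r_b) / T           # sensitivity to instructional effects
    print(f"Item {item + 1}: pretest {r_b}/{T}, posttest {r_a}/{T}, S = {s:.2f}")

An item answered correctly by few pupils before instruction and by most pupils after it (S near 1) is doing its job, even though its discrimination index would be poor by norm-referenced standards.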
Classroom Vs Standardized Tests

Classroom tests are teacher-made tests constructed especially for measuring students' achievement.

Standardized tests are commercially produced tests constructed by professional test-makers.

The major distinction between standardized tests and classroom tests is that in a standardized test a
systematic sampling of performance (that is, students' scores) has been obtained under prescribed
directions of administration. They also differ markedly in terms of their sampling of content,
construction, norms, and purposes.

Measures of central tendency


Averages or measures of central tendency are descriptive properties of a set of observations or
their corresponding frequency distributions. The average is a central reference value which is
usually close to the point of greatest concentration of the measurements and may in some
sense be thought to typify the whole set. The three commonly used measures of central
tendency are the mean, the median and the mode. Each is described below.

The mean (X̄)

The mean of a set of scores is the arithmetic average. For ungrouped data it is found by
summing all the scores and dividing the sum by the number of scores. The mean is
obtained by the following formula:

X̄ = ΣX / N

Where: X̄ = mean
ΣX = the sum of all scores X
N = total number of scores

The mean of the 50 scores in Table 8 is

X̄ = ΣfX / N = (49 + (47 × 2) + ... + (31 × 2) + 30) / 50 = 1949 / 50 = 38.98

The mean is the most useful of the three measures of central tendency because many
important statistical procedures are based on it and because it is based on all of the data in the
distribution. It is also a more reliable measure than either the median or the mode. In
addition to its use in describing a set of data, it is used in inferential statistics to estimate
the population mean.
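A minimal Python sketch of both computations, using the Table 10 midpoints and frequencies:

def mean_ungrouped(scores):
    # X-bar = sum of X / N
    return sum(scores) / len(scores)

def mean_grouped(midpoints, freqs):
    # X-bar = sum of f*X / N, with class midpoints standing in for the scores
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, midpoints)) / n

midpoints = [49, 46, 43, 40, 37, 34, 31]
freqs = [1, 6, 9, 12, 8, 9, 5]
print(mean_grouped(midpoints, freqs))   # 1949 / 50 = 38.98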

The Median
The median is the point on the scale of measurement above and below which 50% of the
scores fall. It is the 50th percentile.

Computing the median for ungrouped data essentially consists of identifying the middle
score. If there is an odd number of scores, the median is the middle score in the
distribution. If the number of scores is even, the median falls between the two middle
scores. Therefore, for ungrouped data we compute the median by taking the
following steps.
 Arrange the scores in descending order (highest to lowest)
 The median is the (N + 1)/2 th value, for both odd and even numbers of scores.
Examples: Find the median for
(a) 19, 23, 6, 18, 3, 21, 12
(b) 18, 46, 44, 23, 29, 40, 28, 27
Solution
 Arrange the scores in descending order and determine the (N + 1)/2 th value for the data in (a) and (b).

a) 23, 21, 19, 18, 12, 6, 3

Median = (N + 1)/2 th value = (7 + 1)/2 th = 4th value = 18

b) 46, 44, 40, 29, 28, 27, 23, 18

Median = (N + 1)/2 th value = (8 + 1)/2 th = 4.5th value = (29 + 28)/2 = 28.5
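A minimal Python sketch of the (N + 1)/2 rule, checked against the two examples above:

import math

def median_ungrouped(scores):
    s = sorted(scores, reverse=True)          # descending, as in the steps above
    pos = (len(s) + 1) / 2                    # 1-based position of the median
    lo, hi = math.floor(pos), math.ceil(pos)  # equal for an odd number of scores
    return (s[lo - 1] + s[hi - 1]) / 2        # averages the two middle scores when even

print(median_ungrouped([19, 23, 6, 18, 3, 21, 12]))        # 18.0
print(median_ungrouped([18, 46, 44, 23, 29, 40, 28, 27]))  # 28.5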

The process of computing the median becomes more complex when scores are grouped
into class intervals. The formula to calculate the median for grouped data is:

Mdn = ll + ((N/2 − cf) / fi) × i

Where ll = lower exact limit of the interval containing the N/2 score
N = total number of scores
cf = cumulative frequency of scores below the interval containing the N/2 score
fi = frequency of scores in the interval containing the N/2 score
i = size (width) of the class interval

Applying the formula for the data in Table 9, the median is:

Mdn = 38.5 + ((50/2 − 22) / 12) × 3 = 38.5 + 0.75 = 39.25

The median is a very useful statistic; it can be used for a fairly small distribution with a
few extreme scores, when the distribution is badly skewed, or when there are missing scores.
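A minimal Python sketch of the grouped-data formula, with the values read from Table 10 (ll = 38.5, cf = 22, fi = 12, i = 3):

def median_grouped(ll, n, cf, fi, i):
    # Mdn = ll + ((N/2 - cf) / fi) * i
    return ll + ((n / 2 - cf) / fi) * i

print(median_grouped(ll=38.5, n=50, cf=22, fi=12, i=3))  # 39.25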

The Mode
The mode is the simplest index of central tendency. It is the most frequent score in the
distribution. In ungrouped data it is determined by inspection or counting rather than by
computation. In grouped data the mode can be estimated by using:

Mode = 3 × median − 2 × mean = 3(39.25) − 2(38.98) = 39.79

The mode is used when there is a need to estimate the central tendency quickly, when
there is a large number of cases, or when the data are nominal or categorical.
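A minimal Python sketch of both approaches (the ungrouped scores are invented for illustration):

from statistics import mode

print(mode([3, 5, 5, 6, 7, 5, 2]))  # 5, the most frequent score

def mode_grouped(median, mean):
    # empirical estimate: mode = 3 * median - 2 * mean
    return 3 * median - 2 * mean

print(mode_grouped(39.25, 38.98))   # 39.79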

5.1.3. Measures of variations

Measures of variability are indicators of the dispersion of the distribution of scores.
Although averages summarize the central tendency of a group of scores, they do not
summarize how the raw scores spread out over the score scale. For example, the
mean of the English test for two 10th grade classes may be 65. However, in one class the
scores may range widely from 25 to 90, while in the other the scores may range from 60
to 70. Obviously, the students in the latter class are more nearly alike in their English
achievement than the students in the former class. You will need to provide for more widely
differing English levels when teaching the former class than when teaching the latter.
This section describes two measures of variability: the variance and its square root, the
standard deviation.

Variance and standard deviation
The variance is the average squared difference between the scores and the mean; the
standard deviation is its square root and measures how widely the scores in the
distribution are spread about the mean.
The standard deviation of a population is symbolically defined as

σ = √(SS/N) = √(Σ(X − μ)² / N) = √(Σx² / N)

Where
μ = mean of the distribution
N = total number of scores in the distribution
SS = sum of squares
X = any raw score
x = any deviation score (X − μ)
σ = population standard deviation
The standard deviation of a sample is symbolically defined as

s = √(SS/(N − 1)) = √(Σ(X − X̄)² / (N − 1))

Where
X̄ = mean of the sample
s = sample standard deviation
Using the definition formula of the variance would be a tedious task for a large number of
cases. The following computational formula, derived algebraically from the definition
formula, is used to find the sample standard deviation:


s = √(SS/(n − 1)) = √((ΣX² − (ΣX)²/n) / (n − 1)) or s = √((nΣX² − (ΣX)²) / (n(n − 1)))

To illustrate the computation of the standard deviation, we will consider the data in Table 9. The
midpoints of the class intervals serve as X. The procedure for computing the values required
in the computation is as follows.

Table 10: Data for the computation of the standard deviation

Class Interval   f    Midpoint (X)   fX     fX²
48-50            1    49             49     2401
45-47            6    46             276    12696
42-44            9    43             387    16641
39-41            12   40             480    19200
36-38            8    37             296    10952
33-35            9    34             306    10404
30-32            5    31             155    4805
Total            50                  1949   77099
From Table 10 the following results are obtained and inserted into the computational
formula to obtain the standard deviation of the data:
ΣfX = 1,949
ΣfX² = 77,099
n = 50


s = √((n(ΣfX²) − (ΣfX)²) / (n(n − 1)))

= √((50(77,099) − (1949)²) / ((50)(49)))

= √((3,854,950 − 3,798,601) / 2,450) = √(56,349 / 2,450) = √22.999 = 4.80

The standard deviation is a measure of the dispersion of the scores around the mean. The
mean of the data is 38.98. Therefore, one standard deviation above the mean is 38.98 +
4.80 = 43.78 and two standard deviations above the mean is 38.98 + (4.80 × 2) = 48.58.
Similarly, one standard deviation below the mean is 38.98 − 4.80 = 34.18 and two standard
deviations below the mean is 38.98 − (4.80 × 2) = 29.38. This means that almost all scores
in the data lie within ± 2 standard deviations of the mean.
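A minimal Python sketch of the computational formula applied to Table 10:

import math

midpoints = [49, 46, 43, 40, 37, 34, 31]
freqs = [1, 6, 9, 12, 8, 9, 5]

n = sum(freqs)                                               # 50
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # 1949
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))  # 77099

s = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))
print(round(s, 2))                                           # 4.8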

6.1.4. Measures of Correlation

Correlation deals with the extent to which two or more variables are related. The
magnitude of the relationship between the variables is measured by an index called the
correlation coefficient.

The first step in determining the nature of the relationship is to graph the data. A graph,
or scatter diagram, of the pairs of scores on two variables (which may be test scores) can
be plotted to give a visual illustration of the relationship between the two variables.
For example, suppose that 20 students took two tests, one in English and the other in
History. The pairs of scores are presented below.

Table 11: English and History Scores

Student   English   History
1         7         5
2         10        12
3         18        10
4         20        12
5         11        17
6         8         13
7         25        15
8         16        18
9         10        18
10        8         5
11        18        14
12        29        20
13        24        22
14        16        12
15        9         10
16        16        18
17        12        6
18        11        9
19        24        23
20        14        22

The scatterplot of the data is given below.


[Scatterplot: History scores (vertical axis, 5.00-25.00) plotted against English scores (horizontal axis, 5.00-30.00).]
Note that the plotted points tend to run from the lower left to the upper right. This pattern
represents a positive correlation between the two variables. In other words, a positive
relationship is represented when high scores on variable X are associated with high
scores on variable Y. On the other hand, points running from the upper left to the lower
right would represent a negative correlation. A perfect relationship between two
variables exists when all points in the scatterplot lie on a straight line. A scatterplot in
which the points form a nearly circular pattern illustrates zero or near-zero correlation.

The computed value of a perfect positive correlation coefficient is +1.00, and the
computed value of a perfect negative correlation coefficient is -1.00. When no
relationship exists between the two variables, the correlation coefficient is 0.00. The
computed value of the correlation coefficient is a function of the slope of the general
pattern of points in the scatterplot and the width of the ellipse that encloses the points. If
the slope is negative, the sign of the correlation coefficient is negative. The narrower the
ellipse, the larger the degree of relationship and the larger the correlation coefficient.

The Pearson Product-Moment Correlation Coefficient

The most commonly used correlation coefficient in the behavioral sciences is the
Pearson product-moment correlation coefficient, or the Pearson r. The level of
measurement required for using the Pearson r is the interval or ratio scale.
The standard score formula for the Pearson r is

r = Σ(Zx Zy) / N

Where: Zx = standard score (Z) of X = (X − X̄) / sx
Zy = standard score (Z) of Y = (Y − Ȳ) / sy
N = number of cases or observations

The formula states that all X and Y scores must be converted into standard scores (Z
scores) and the product of each pair of Z scores computed. These products are then summed
and the sum divided by N. More simply, r equals the sum of the cross products of the two
variables, in standard score form, divided by N.

Computational Formula for the Correlation Coefficient

The Z-score formula is not generally used to compute the Pearson r, because it requires
converting all observed scores to standard scores. However, by using the definition of
standard scores, it is possible to derive a computational formula that involves only the
observed X and Y scores. The formula is

r = (NΣXY − (ΣX)(ΣY)) / √([NΣX² − (ΣX)²][NΣY² − (ΣY)²])
where N = number of pairs of scores
ΣX = the sum of the X scores
ΣY = the sum of the Y scores
ΣX² = the sum of the squared X scores
ΣY² = the sum of the squared Y scores
ΣXY = the sum of the products of paired X and Y scores
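A minimal Python sketch of this computational formula applied to the Table 11 score pairs:

import math

english = [7, 10, 18, 20, 11, 8, 25, 16, 10, 8,
           18, 29, 24, 16, 9, 16, 12, 11, 24, 14]
history = [5, 12, 10, 12, 17, 13, 15, 18, 18, 5,
           14, 20, 22, 12, 10, 18, 6, 9, 23, 22]

def pearson_r(x, y):
    # r = (N*sum(XY) - sum(X)*sum(Y)) / sqrt([N*sum(X^2) - (sum X)^2][N*sum(Y^2) - (sum Y)^2])
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

print(round(pearson_r(english, history), 2))  # about 0.60 for these data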
The concepts of reliability, validity and usability of tests will be discussed in the
following pages.

Validity

It is of paramount importance that the method of evaluation employed be able to accurately
measure the skill or knowledge that it seeks to measure; that is, that it be valid. It is also important
that evaluations exhibit what is known as face validity. Face validity means that elements of the
evaluation appear to be related to stated course objectives. A common student complaint is that
they cannot perceive the connection between the evaluation and the course objectives. It is
therefore necessary not only that the instructor be able to make a connection between the
evaluation and the course, but that the student be able to do so as well.

In addition to face validity, evaluations must have content validity. The format of an evaluation
must conform closely to the course objectives that it seeks to evaluate. If a course objective
states that students will be able to apply theories of practice to case studies, then an evaluation
should provide them with appropriate cases to demonstrate this ability.

Finally, effective methods of evaluation have certain predictive characteristics. A student who
performs well on an evaluation concerning a certain skill might be expected to perform well on
similar evaluations on related skills. Additionally, that student might be expected to score
consistently when evaluated in the future.

When constructing or selecting tests and other evaluation instruments, the most
important question is, to what extent will the interpretation of the scores be
appropriate, meaningful, and useful for the intended application of the results? The
validity of data collected implies that the evaluation has actually been focused on
the subject initially targeted for evaluation. For instance, for the sake of validity,
learners’ written and oral skills cannot be evaluated with the same tests.
Validity refers to:
 The appropriateness, meaningfulness, and usefulness of the specific
inferences made from test scores.
 The consistency (accuracy) with which the scores measure a particular
cognitive ability of interest.
From these definitions we can see that there are two aspects of validity. These are:
 What is measured (refers to abilities to perform observable tasks, or
command of substantive knowledge)
 How consistently it is measured (refers to the reliability of the scores)
For example, if a test is to be used to describe students' achievement, we should be
able to interpret the scores as a relevant and representative sample of the achievement
domain to be measured. If the results are to be used as a measure of pupils' reading
comprehension, we should like our interpretation to be based on evidence that the scores
actually reflect reading comprehension and are not distorted by irrelevant factors.
Basically, then, validity is always concerned with the specific use of the results and the
soundness of our proposed interpretations.

Reliability is a necessary ingredient of validity, but it is not sufficient to ensure validity.
Unless the test scores measure what the test user intends to measure, no matter how
reliably, the scores will not be valid.

When interpreting validity in relation to testing and evaluation, there are certain things to
remember. These are:
1. Validity refers to the appropriateness of the interpretation of the results of a test or
evaluation instrument for a given group of individuals, not to the instrument itself.
2. Validity is a matter of degree; it does not exist on an all-or-none basis. Hence we speak
of high validity, moderate validity, and low validity.
3. Validity is always specific to some particular use or interpretation. No test is valid
for all purposes.
4. Validity is a unitary concept. In the most recent revision of the standards, the
traditional view that there are several different types of validity has been discarded.
Instead, validity is viewed as a unitary concept based on various kinds of evidence.

Kinds of Validity Evidence

i. Content validity
Content validity is related to how adequately the content of the test samples the
domain about which inferences are to be made. The procedure is to compare the test
tasks to the test specifications describing the task domain under consideration. If the
test specification is carefully constructed and carefully followed in building the test,
it will contribute much to ensuring content validity.
There is no single commonly used numerical expression for content validity. It is
determined by:
 Whether or not each item represents the total domain or sub-domain;
 Critical inspection of the test items to determine whether the items represent the
content;
 Inter-judge agreement about the match of the items to the domain;
 Building two tests over the same content, giving both to the same set of
students, and correlating the results.

ii. Criterion-related validity

Criterion-related validity provides information on how well test performance predicts
future performance or estimates current performance on some valued measure other
than the test itself, called the criterion. The procedure is to compare test scores with
another measure of performance obtained at a later date (for prediction) or with
another measure of performance obtained concurrently (for estimating present
status). For example, scores on a dictation test are generally accepted as measures
of spelling achievement. We can make a distinction between two kinds of criterion-
related validity. These are:
 Concurrent validity: criterion data are collected at approximately the
same time as the test data.
 Predictive validity: criterion data are gathered at a later date. The concern is with the
usefulness of the test scores in predicting some future performance.
Example:
 Do scores on the CEE (college entrance examination) predict freshman program
performance in the university?
 Do supervisors' ratings predict success on the job?
There is a need to show that a positive relationship exists between scores on the CEE
(the predictor) and grade point average in the freshman program (the criterion). If a
correlation of, say, 0.60 is obtained, we might conclude that the examination is a useful
predictor of future performance; that is, there is support for using the scores to predict
success in university. The degree of positive relationship between the predictor and the
criterion can range from no correlation (r = 0.00) through moderate positive correlation
(r = 0.60) to perfect positive correlation (r = 1.00).
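As a hypothetical illustration (the entrance scores and grade point averages below are invented), the predictive validity coefficient is simply the correlation between predictor and criterion (Python 3.10+):

from statistics import correlation

cee_scores = [72, 65, 80, 55, 90, 60, 75, 68]             # invented predictor scores
freshman_gpa = [3.1, 2.8, 3.4, 2.2, 3.8, 2.9, 3.0, 2.7]   # invented criterion values

print(round(correlation(cee_scores, freshman_gpa), 2))    # a strong positive r for these invented data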
One problem in criterion-related validity is the lack of suitable criteria for validating
achievement tests. This makes it difficult for instructors to select satisfactory criteria to
validate their examinations. In that case instructors have to depend on procedures of
logical analysis to ensure valid test interpretation. Accordingly, they are advised to:
 Carefully identify the objectives of instruction
 State the objectives in terms of changes in students’ performance.
 Construct or select evaluation instruments that satisfactorily measure the learning
outcomes sought.

iii. Construct-related validity

When we are interested in interpreting test performance in terms of some psychological trait
or quality, we are concerned with construct-related validity evidence. Construct validity
is the degree to which we can infer certain constructs in a psychological theory from the
test scores. A construct is a psychological quality that we assume exists in order to
explain some aspect of behavior.
Examples:
 Mathematical reasoning ability
 Intelligence
 Creativity
 Reading comprehension
 Sociability
 Honesty
For example, rather than speak about a pupil's score on a particular mathematics test or
how well it predicts grades in future mathematics courses, we might want to infer that
the pupil possesses a certain degree of mathematical reasoning ability.

The test constructor builds a paper-and-pencil test to measure mathematical reasoning.
The mathematical reasoning test would be considered to have construct validity to the
degree that test scores are related to the judgments made from observing behavior
identified by the psychological theory as mathematical reasoning. If the anticipated
relationships are not found, then the construct validity of the inference that the test is
measuring mathematical reasoning is not supported. Construct validation has commonly
been used in theory building and theory testing.

Factors influencing validity

The following factors can prevent test items from functioning as intended and
thereby lower the validity of the interpretations made from the test scores.
1. Unclear directions;
2. Reading vocabulary and sentence structure too difficult;
3. Inappropriate level of difficulty of the test items;
4. Poorly constructed test items;
5. Ambiguity;
6. Test items inappropriate for the outcome being measured;
7. Inadequate time limits;
8. Test too short;
9. Improper arrangement of items;
10. Identifiable pattern of answers.

Reliability
The concept of reliability is closely related to (and often confused with) validity. A reliable method of
evaluation will produce similar results (within certain limitations) for the same student across time and
circumstances. Reliability can be defined as the degree of consistency between two measures of
the same thing. It:
 Provides the consistency that makes validity possible.
 Indicates how much confidence we can place in our results.
The concept of reliability as applied to testing and evaluation can be clarified by noting the
following general points.
1. Reliability refers to the results (test scores) obtained with an evaluation instrument and not
to the instrument itself.
2. Estimates of reliability always refer to a particular type of consistency. Test scores are not
reliable in general. They are reliable or generalizable over different
 periods of time
 samples of questions
 raters
3. Reliability is a necessary but not sufficient condition for validity. A test that produces
totally inconsistent results cannot possibly provide valid information about the
performance being measured. Low reliability can be expected to restrict the degree of
validity that is obtained, but high reliability does not ensure that a satisfactory degree of
validity will be present.
4. Reliability is primarily statistical. Shifts in the relative standing of students in the group,
or the amount of variation to be expected in an individual's score, are reported by
means of a reliability coefficient or the standard error of measurement (see the sketch below).
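As a minimal sketch, assuming the commonly used formula SEM = s × √(1 − r), where s is the standard deviation of the test scores and r is the reliability coefficient (the values below are invented):

import math

def standard_error_of_measurement(sd, reliability):
    # SEM = s * sqrt(1 - r): expected spread of an individual's obtained
    # scores around his or her "true" score
    return sd * math.sqrt(1 - reliability)

print(round(standard_error_of_measurement(sd=4.80, reliability=0.91), 2))  # 1.44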
