Sample Size Determination For Rasc
4, 708–724
DECEMBER 2010
DOI: 10.1007/s11336-010-9182-4
CLEMENS DRAXLER
LUDWIG-MAXIMILIAN-UNIVERSITY MUNICH
This paper is concerned with supplementing statistical tests for the Rasch model so that, in addition
to the probability of the error of the first kind (Type I probability), the probability of the error of the second
kind (Type II probability) can be controlled at a predetermined level by basing the test on the appropriate
number of observations. An approach to determining a practically meaningful extent of model deviation is
proposed, and the approximate distribution of the Wald test is derived under the extent of model deviation
of interest.
Key words: Rasch model, Wald test, sample size, error of the second kind (Type II error).
1. Introduction
Testing the Rasch model from the frequentists’ point of view has a long tradition. Various
test statistics have been proposed, including exact tests based on discrete probability distributions
as well as statistics based on asymptotic theory.
As far as the former are concerned, Georg Rasch himself initiated the development. He
proposed an exact, parameter-free inference approach to testing the dichotomous Rasch model
(Rasch, 1960; Fischer, 1974) which is based on the conditional uniform distribution of the obser-
vations (responses of a number of persons to a number of items) given the sufficient statistics of
the model’s parameters. Ponocny (2001) proposed many other test statistics which have power
against violations of different assumptions of the model. These are uniformly most powerful
unbiased tests. For lack of suitable combinatorial or analytical methods to derive the exact,
discrete distribution of these test statistics (under the complicated uniform distribution), Monte
Carlo methods have been applied to approximate the exact distributions by random sampling
from the uniform distribution (Snijders, 1991; Ponocny, 2001; Chen & Small, 2005; Verhelst,
2008). Molenaar (1983) pointed to some exceptional cases of exact tests.
The majority of research with regard to testing the Rasch model has focused on the applica-
tion of asymptotic theory. In their review of such tests Glas and Verhelst (1995a, 1995b) named
generalized Pearson χ 2 tests (van den Wollenberg, 1982; Glas, 1988; Glas & Verhelst, 1989;
Verhelst & Glas, 1995), likelihood ratio tests (Andersen 1973, 1980; Martin-Löf, 1973; Kelder-
man 1984, 1989), Wald tests (Wald, 1943), and Lagrange multiplier tests (Aitchison & Silvey,
1958). In contrast to the exact tests, these tests also apply to polytomous Rasch models.
The research mentioned above has solely focused on deriving the exact or the asymptotic
distribution of a statistic under the hypothesis of the validity of the Rasch model. Consequently,
the probability of the error of the second kind (Type II probability) β is left uncontrolled when
a model test is carried out. This paper is generally concerned with supplementing the existing
procedures for testing the Rasch model so that in addition to the probability of the error of the first
kind (Type I probability) α the probability of the error of the second kind β can be controlled at
a predetermined level. In particular, this paper focuses on the Wald test (Wald, 1943), which has
been discussed with regard to testing the Rasch model by Glas and Verhelst (1995a, 1995b). The
Requests for reprints should be sent to Clemens Draxler, Department Psychology, Ludwig-Maximilians-Universität
München, Leopoldstraße 13, 80802 Munich, Germany. E-mail: [email protected]
© 2010 The Psychometric Society
objective is to approximate the complete distribution of the Wald statistic for a finite number of
observations under a practically meaningful and useful alternative hypothesis. This involves the
predetermination of a practically meaningful extent of deviation from the Rasch model, measured
on a useful scale so that the acceptance of the model is considered an error of practical importance
whenever the true extent of deviation is at least as great as the one predetermined. The upper
bound of the probability of accepting the model when the true extent of deviation is greater than
or equal to the one predetermined can then be controlled at a predetermined level β by basing
the test on the respective (appropriate) number of observations (given the probability of the error
of the first kind α).
In their review of statistical tests for testing polytomous Rasch models, Glas and Verhelst
(1995b) define a general framework of Rasch models for polytomous item responses. The Wald
test is one of the tests discussed by Glas and Verhelst which applies to this general class of
models, and so are the proposals and results of this paper.
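The general model is defined as follows. Consider $k$ items indexed by $i = 1, \ldots, k$ with
$m_i + 1$ response categories indexed by $h = 0, 1, \ldots, m_i$. Let the binary response of a person to
category $h$ of item $i$ be modeled by the Bernoulli variable $X_{ih}$ which can take on the values
$x_{ih} = 0$ and $x_{ih} = 1$. The probability distribution of $X_{ih}$ is determined by
\[
P(X_{ih} = x_{ih})
= \frac{\exp\bigl[x_{ih}\bigl(\sum_{p=1}^{q} r_{ihp}\theta_p - \sum_{c=1}^{d} s_{ihc}\beta_{ic}\bigr)\bigr]}
       {\sum_{l=0}^{m_i} \exp\bigl(\sum_{p=1}^{q} r_{ilp}\theta_p - \sum_{c=1}^{d} s_{ilc}\beta_{ic}\bigr)}
= \frac{\exp\bigl[x_{ih}(\mathbf{r}_{ih}'\boldsymbol{\theta} - \mathbf{s}_{ih}'\boldsymbol{\beta}_i)\bigr]}
       {\sum_{l=0}^{m_i} \exp(\mathbf{r}_{il}'\boldsymbol{\theta} - \mathbf{s}_{il}'\boldsymbol{\beta}_i)}, \tag{1}
\]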
Identifiability requires an additional restriction, for instance, setting one of the $k$ item parameters
equal to a constant or setting the sum of the $k$ item parameters equal to a constant. For example,
one can set $\beta_k = 0$ or $\sum_{i=1}^{k} \beta_i = 0$. Another unidimensional model frequently applied for
polytomous item responses is the partial credit model. It is derived from (1) by setting $q = 1$ so that
the person parameter vector $\boldsymbol{\theta}$ and the vector of weights $\mathbf{r}_{ih}$ reduce to scalars $\theta$ and $r_{ih}$. The latter
is restricted so that $r_{ih} = h$. Further, let the vector $\mathbf{s}_{ih}$ be restricted to the scalar $s_{ih}$. That is, each
category $h$ of each item $i$ is associated with one weight $s_{ih}$. Let $s_{ih} = 0$, for $i = 1, \ldots, k$, $h = 0$,
and $s_{ih} = 1$, for $i = 1, \ldots, k$, $h = 1, \ldots, m_i$. With these settings one obtains the well-known
form of the partial credit model $P(X_{ih} = x_{ih}) \propto \exp[x_{ih}(h\theta - \sum_{c=1}^{h} \beta_{ic})]$, for $i = 1, \ldots, k$,
$h = 1, \ldots, m_i$, and with the exponent equal to 0, for $h = 0$. The set of item parameters $\{\beta_{ic}\}$ can
be interpreted as a set of response category bounds between successive categories. An additional
identifiability constraint sets the sum of all category bounds (item parameters) equal to 0. That
is, $\sum_{i=1}^{k} \sum_{c=1}^{m_i} \beta_{ic} = 0$.
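As a small illustration of the partial credit form above, the following Python sketch evaluates the category probabilities of a single item by normalizing the exponents $h\theta - \sum_{c=1}^{h} \beta_{ic}$ over the categories $h = 0, \ldots, m_i$. The function name `pcm_probabilities` and the example values are hypothetical and are not taken from the paper.

```python
import math


def pcm_probabilities(theta, betas):
    """Category probabilities P(h), h = 0..m, for one partial credit item.

    theta: scalar person parameter.
    betas: category bounds (beta_1, ..., beta_m) of the item.
    The exponent of category h is h*theta - sum_{c<=h} beta_c, and 0 for h = 0.
    """
    exponents = [0.0]
    cumulative = 0.0
    for h, beta in enumerate(betas, start=1):
        cumulative += beta
        exponents.append(h * theta - cumulative)
    top = max(exponents)  # subtract the maximum for numerical stability
    kernels = [math.exp(e - top) for e in exponents]
    total = sum(kernels)
    return [kern / total for kern in kernels]


# Hypothetical item with three category bounds (four response categories).
probs = pcm_probabilities(theta=0.5, betas=[-1.0, 0.0, 1.0])
```

With a single category bound the function reduces to the dichotomous Rasch model, which is a convenient sanity check.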
A remarkable feature of Rasch models is the separability of the person parameters and
the item parameters. By conditioning on the sufficient statistics for the incidental person pa-
rameters, the conditional likelihood is a function of the item parameters only. Maximizing the
conditional likelihood yields a consistent estimator for the item parameters (Andersen, 1970).
This is the well-known conditional maximum likelihood (CML) procedure. The application
of the Wald statistic (Wald, 1943) for testing the Rasch model shall be based on the asymp-
totic distribution of the CML estimator of the item parameters in this paper. CML estima-
tion of the item parameters of model (1) is feasible and is briefly indicated as follows (Glas
& Verhelst, 1995b). Consider the following matrices of weights. Let a q × mi matrix be de-
fined as R i = [r i0 , . . . , r ih , . . . , r imi ], for i = 1, . . . , k, and let a d × mi matrix be defined
as S i = [s i0 , . . . , s ih , . . . , s imi ], for i = 1, . . . , k. Further let R = [R 1 , . . . , R i , . . . , R ki ] be a
q × kmi matrix and S = [S 1 , . . . , S i , . . . , S k ] be a d × kmi matrix. Define the response of a
person to item i by the vector x i = (xi0 , . . . , xih , . . . , ximi ) so that the responses of a person to
all items, the response pattern x, is given by x = (x 1 , . . . , x i , . . . , x k ). Denote by {x} the set
of all possible response patterns. For each response pattern, for each element of {x}, define the
vector of sufficient statistics r = Rx for the vector of person parameters θ . Denote by {r} the set
of all possible values of r and define for each value of r the set {x | r = Rx} of response patterns
consistent with r = Rx. Note that each set {x | r = Rx} containing one response pattern only
will be excluded from all further considerations. The conditional probability of such a response
pattern is always equal to 1 and thus does not contain any statistical information (about the va-
lidity of the model). Finally, let β = (β 1 , . . . , β i , . . . , β k ). With these preparations as well as the
assumption of local independence, the conditional probability of the responses of a person to k
items, the response pattern x, is given by
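\[
P(\mathbf{x} \mid \mathbf{r} = \mathbf{R}\mathbf{x}) = \frac{\exp(-\mathbf{x}'\mathbf{S}'\boldsymbol{\beta})}{\gamma_{\mathbf{r}}(\boldsymbol{\beta})}, \tag{2}
\]
where $\gamma_{\mathbf{r}}(\boldsymbol{\beta})$ is a combinatorial function, a normalizing constant not depending on the
observations and which is defined by the sum of $\exp(-\mathbf{y}'\mathbf{S}'\boldsymbol{\beta})$ over the set $\{\mathbf{y} \mid \mathbf{r} = \mathbf{R}\mathbf{y}\}$ of all
response patterns consistent with the associated value of $\mathbf{r} = \mathbf{R}\mathbf{y}$. The conditional likelihood of all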
observations is then obtained by the product over the conditional probabilities of all observed
response patterns. Taking the partial derivatives of the conditional likelihood with respect to the
item parameters and setting them equal to zero yields the CML estimation equations for the item
parameters. Since the model considered defines an exponential family, it is well known (Ander-
sen, 1980) that parameter estimation reduces to equating the observed sufficient statistics (for the
item parameters) to their expected values.
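For the dichotomous special case, the conditional form (2) can be illustrated by brute-force enumeration. The sketch below is only an illustration under that assumption: the sufficient statistic is taken to be the sum score, the kernel $\exp(-\sum_i x_i \beta_i)$ plays the role of $\exp(-\mathbf{x}'\mathbf{S}'\boldsymbol{\beta})$, and the function name and parameter values are hypothetical.

```python
import itertools
import math


def conditional_pattern_probs(beta, r):
    """Return {pattern: P(pattern | sum score r)} for dichotomous items."""
    k = len(beta)
    # All response patterns consistent with the sum score r.
    patterns = [x for x in itertools.product((0, 1), repeat=k) if sum(x) == r]
    kernels = [math.exp(-sum(xi * bi for xi, bi in zip(x, beta))) for x in patterns]
    gamma_r = sum(kernels)  # normalizing constant gamma_r(beta)
    return {x: kern / gamma_r for x, kern in zip(patterns, kernels)}


# Hypothetical item parameters for three items, score group r = 1.
probs = conditional_pattern_probs([-0.5, 0.0, 0.5], r=1)
```

The person parameter does not appear anywhere in the computation, which is the separability property referred to above. Enumeration is only feasible for a handful of items; it is used here for transparency, not efficiency.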
The Wald test (1943) is based on the rationale that there exists a general model and a special
case of it which is derived by imposing one or more restrictions on the general model. The
statistical hypothesis to be tested is given by these restrictions. This principle applied to the
problem of testing the Rasch model means assuming that the Rasch model holds for different
subpopulations of persons separately and testing the restriction of the equality of these models
(the equality of the parameters). This paper is concerned with testing a general class of Rasch
models defined by (1) and (2) respectively. Let the population of persons (respondents to the
items) be partitioned into u subpopulations indexed by t = 1, . . . , u and assume that model (2),
with its own item parameters, holds separately for each of the u subpopulations. This shall be
indicated by introducing the index t for the item parameter vector β. Thus, β t shall be associated
with subpopulation t. The partition can either be based on an external criterion such as sex,
age, education, etc. or on the vector r of person scores (the sufficient statistics for the person
parameters). If the partition is based on the latter, the test will particularly have power against the
alternative of unequal item discriminations as modeled by the two- and three-parameter logistic
(2 PL and 3 PL) models. Testing the Rasch model which is defined for the whole population of
respondents is then equivalent to testing the restriction that the vector of functions
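\[
\mathbf{f}(\boldsymbol{\phi}) = (\boldsymbol{\beta}_1^{*} - \boldsymbol{\beta}_2^{*}, \ldots, \boldsymbol{\beta}_t^{*} - \boldsymbol{\beta}_{t+1}^{*}, \ldots, \boldsymbol{\beta}_{u-1}^{*} - \boldsymbol{\beta}_u^{*}) = \mathbf{0}, \tag{3}
\]
where $\boldsymbol{\phi} = (\boldsymbol{\beta}_1^{*}, \ldots, \boldsymbol{\beta}_t^{*}, \ldots, \boldsymbol{\beta}_u^{*})$. The asterisk indicates that the dimension of $\boldsymbol{\beta}_t^{*}$ and $\boldsymbol{\beta}_{t+1}^{*}$
depends on and is equal to the number of differences between the free item parameters in
subpopulations $t$ and $t + 1$ which are restricted to be equal to 0. Any parameters not involved in any
restriction will be discarded. Thus, the hypothesis defined by (3) is the statement of the equality of the
vectors of free item parameters between the subpopulations, that is, $\boldsymbol{\beta}_1^{*} = \cdots = \boldsymbol{\beta}_t^{*} = \cdots = \boldsymbol{\beta}_u^{*}$.
This hypothesis is tested against the alternative hypothesis
\[
\mathbf{f}(\boldsymbol{\phi}) \neq \mathbf{0}. \tag{4}
\]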
To determine the sample size for the Wald test of the Rasch model for given values of the
probabilities of the errors of the first and second kind α and β, the objective is to set the vector
of functions f (φ) equal to a vector c. That is,
\[
\mathbf{f}(\boldsymbol{\phi}) = \mathbf{c} \neq \mathbf{0}. \tag{8}
\]
The vector $\mathbf{c}$ has to be chosen on the basis of practical considerations, which shall be discussed
in detail below. The limiting distribution of the statistic (5) under (8) can be derived under the
additional technical assumption that the model deviation (8) converges to (3) as the number
of observations $n \to \infty$ at the rate $n^{-1/2}$ or faster (Wald, 1943; Stroud, 1972). In practice, it
is reasonable to use a fixed model deviation like (8) to approximate the distribution of (5) for
a given value of $n$ (Agresti, 2002, pp. 243, 591–592). In order to obtain this approximation
of the distribution of (5) under (8) for a given value of $n$, notice that the CML estimator $\hat{\boldsymbol{\beta}}_t$
has approximately a multivariate normal distribution, with $\boldsymbol{\beta}_t$ as the vector of expected values
(population means) and $\boldsymbol{\Sigma}_t$ as the covariance matrix (Andersen 1970, 1980). If (8) holds, the
expected values of the elements of the vector of differences $\mathbf{f}(\hat{\boldsymbol{\phi}})$ between pairs of multivariately
normally distributed estimators are equal to the elements of the vector $\mathbf{c} \neq \mathbf{0}$. It follows that the
joint distribution (of the estimators) of these differences is multivariately normal with expected
values given by the vector $\mathbf{c}$ and the covariance matrix $\mathbf{T}'(\boldsymbol{\phi})\boldsymbol{\Sigma}^{*}\mathbf{T}(\boldsymbol{\phi})$. Hence, under the model
deviation given by (8), the quadratic form (5) has approximately a noncentral $\chi^2$ distribution with
the number of degrees of freedom equal to the number of tested restrictions and the noncentrality
parameter
\[
\lambda = \mathbf{c}'\bigl[\mathbf{T}'(\boldsymbol{\phi})\boldsymbol{\Sigma}^{*}\mathbf{T}(\boldsymbol{\phi})\bigr]^{-1}\mathbf{c}. \tag{9}
\]
The scalar parameter $\lambda$ represents the model deviation defined by the vector (8).
The conditional form (2) of the model considered in this paper determines the conditional
probability of each element in the set {x | r = Rx} separately for all possible values that the vec-
tor of sufficient statistics r = Rx can take. The conditional probabilities of the response patterns
within each set {x | r = Rx} add up to 1. Thus, in order to derive a formula for the determination
of the sample size for the test of hypothesis (3) against (8), given the error probabilities α and β,
each set has to be treated separately. To do so, consider the asymptotic covariance matrix $\boldsymbol{\Sigma}_t$
of $\hat{\boldsymbol{\beta}}_t$ (of all item parameters associated with $t$), for $t = 1, \ldots, u$. It is given by
\[
\boldsymbol{\Sigma}_t = -\mathbf{I}_t^{-1}, \tag{10}
\]
where $\mathbf{I}_t$ is defined as the matrix of the expected values of the second-order partial
derivatives of the conditional log-likelihood function with respect to the item parameters associated with
subpopulation $t$, and $-\mathbf{I}_t$ is referred to as the associated information matrix (Fischer, 1974;
Andersen, 1980). Since all entries (the second-order partial derivatives) of $\mathbf{I}_t$ are composed of a
sum over the elements of the set $\{\mathbf{r}\}$, $\mathbf{I}_t$ can be written as
\[
\mathbf{I}_t = \sum_{\{\mathbf{r}\}} \mathbf{I}_{t\mathbf{r}}. \tag{11}
\]
If one or more of the $u$ subpopulations are defined as a subset of $\{\mathbf{r}\}$, notice that the summation
has to be taken over the elements of the respective subset of $\{\mathbf{r}\}$. All entries of $\mathbf{I}_{t\mathbf{r}}$ can be written
as the product of two factors, where one of the factors is the number of respondents $n_{t\mathbf{r}}$ drawn
from subpopulation $t$ with the vector of sufficient statistics taking the value $\mathbf{r}$. Hence,
\[
\mathbf{I}_{t\mathbf{r}} = n_{t\mathbf{r}} \boldsymbol{\Gamma}_{t\mathbf{r}}, \tag{12}
\]
for each element in the set $\{\mathbf{r}\}$ and for $t = 1, \ldots, u$. For each element in the set $\{\mathbf{r}\}$ and for
each subpopulation $t = 1, \ldots, u$, let the weight $w_{t\mathbf{r}} = n_{t\mathbf{r}}/n$ be defined so that
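\[
\boldsymbol{\Sigma}^{*} = n^{-1}\boldsymbol{\Gamma}^{*}
= n^{-1}
\begin{pmatrix}
-\bigl(\sum_{\{\mathbf{r}\}} w_{1\mathbf{r}} \boldsymbol{\Gamma}_{1\mathbf{r}}^{*}\bigr)^{-1} & & & & \\
& \ddots & & & \\
& & -\bigl(\sum_{\{\mathbf{r}\}} w_{t\mathbf{r}} \boldsymbol{\Gamma}_{t\mathbf{r}}^{*}\bigr)^{-1} & & \\
& & & \ddots & \\
& & & & -\bigl(\sum_{\{\mathbf{r}\}} w_{u\mathbf{r}} \boldsymbol{\Gamma}_{u\mathbf{r}}^{*}\bigr)^{-1}
\end{pmatrix}, \tag{14}
\]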
with all off-diagonal entries equal to zero. The asterisk again indicates that each row and column
of the complete matrix Γ tr associated with an item parameter not involved in a restriction will
be deleted so that the number of rows and columns of Γ ∗tr is equal to the number of free item
parameters involved in a restriction.
In order to determine the sample sizes $n$ and $n_{t\mathbf{r}}$ respectively for the test of hypothesis (3)
against (8), consider the following. Let the error probabilities $\alpha$ and $\beta$ as well as the (critical)
value $\chi_0^2$ be given. That is, for the argument $\chi_0^2$ the cumulative distribution function of the central
$\chi^2$ distribution with the number of degrees of freedom as defined above takes the value $1 - \alpha$.
Consider the noncentral $\chi^2$ distribution with the number of degrees of freedom as defined above
and the noncentrality parameter $\lambda$. With regard to the latter, choose the value $\lambda_0$ so that for the
argument $\chi_0^2$ the cumulative distribution function of the noncentral $\chi^2$ distribution with the number
of degrees of freedom as defined above and with $\lambda = \lambda_0$ takes the value $\beta$. One requirement with
regard to the probabilistic properties of the statistical test of hypothesis (3) is that the probability
of accepting hypothesis (3) is equal to the predetermined level $\beta$ if restriction (8) holds. This
requirement will obviously be met if the noncentrality parameter given by (9) takes the value $\lambda_0$.
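The search for $\chi_0^2$ and $\lambda_0$ described above can be sketched numerically. The following Python code is a minimal, standard-library-only illustration and not the author's implementation: the noncentral $\chi^2$ distribution function is evaluated as a Poisson mixture of central $\chi^2$ distribution functions, the central one via the power series of the regularized incomplete gamma function, and both roots are found by bisection. All function names are hypothetical; statistical libraries provide these distribution functions directly.

```python
import math


def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x) by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    denom = a
    for _ in range(100000):
        denom += 1.0
        term *= x / denom
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_cdf(x, df, lam=0.0):
    """CDF of the (non)central chi-square distribution (Poisson mixture)."""
    half = lam / 2.0
    weight = math.exp(-half)  # Poisson probability of j = 0
    total, j = 0.0, 0
    while True:
        total += weight * gamma_p(df / 2.0 + j, x / 2.0)
        j += 1
        weight *= half / j  # Poisson probability of the next j
        if j > half and weight < 1e-16:
            break
    return total


def bisect(func, lo, hi, iterations=100):
    """Root of a monotone function that changes sign on [lo, hi]."""
    sign_lo = func(lo) > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (func(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_value(alpha, df):
    """chi_0^2: the central CDF takes the value 1 - alpha at this argument."""
    hi = df + 10.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    return bisect(lambda x: chi2_cdf(x, df) - (1.0 - alpha), 0.0, hi)


def lambda_zero(alpha, beta, df):
    """lambda_0: the noncentral CDF at chi_0^2 takes the value beta."""
    chi0 = critical_value(alpha, df)
    hi = 1.0
    while chi2_cdf(chi0, df, hi) > beta:
        hi *= 2.0
    return bisect(lambda lam: chi2_cdf(chi0, df, lam) - beta, 0.0, hi)


# Hypothetical settings: one tested restriction, alpha = 0.05, beta = 0.20.
chi0 = critical_value(0.05, 1)
lam0 = lambda_zero(0.05, 0.20, 1)
```

For one degree of freedom, $\alpha = 0.05$ and $\beta = 0.20$ this gives $\lambda_0$ of roughly 7.85, the value familiar from power tables.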
714 PSYCHOMETRIKA
Using (13) the number $n_{t\mathbf{r}} = n w_{t\mathbf{r}}$, for $t = 1, \ldots, u$ and for each element in the set $\{\mathbf{r}\}$, is then
also determined. The denominator of (15) can be considered as a scalar measure of model
deviation. It is a predetermined value following from restriction (8), that is, from the choice of the
elements of the vector $\mathbf{c}$.
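The role of this scalar measure can be made explicit from (9) and (14). Equation (15) itself is not reproduced in this excerpt, so the following is only a sketch of how a sample-size formula of that shape follows from the definitions given here, under the assumption that $\mathbf{T}(\boldsymbol{\phi})$ does not depend on $n$:

```latex
\[
\lambda
= \mathbf{c}'\bigl[\mathbf{T}'(\boldsymbol{\phi})\, n^{-1}\boldsymbol{\Gamma}^{*}\, \mathbf{T}(\boldsymbol{\phi})\bigr]^{-1}\mathbf{c}
= n\, \mathbf{c}'\bigl[\mathbf{T}'(\boldsymbol{\phi})\boldsymbol{\Gamma}^{*}\mathbf{T}(\boldsymbol{\phi})\bigr]^{-1}\mathbf{c},
\qquad
\lambda = \lambda_0
\;\Longrightarrow\;
n = \frac{\lambda_0}{\mathbf{c}'\bigl[\mathbf{T}'(\boldsymbol{\phi})\boldsymbol{\Gamma}^{*}\mathbf{T}(\boldsymbol{\phi})\bigr]^{-1}\mathbf{c}} .
\]
```

In this form the noncentrality parameter grows linearly in $n$, while the quadratic form in the denominator depends only on the chosen deviation $\mathbf{c}$ and on the weights.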
The hypothesis given by (3) will be rejected if the observed value of the Wald statistic given
by (5) is greater than or equal to χ02 . Otherwise, it will be accepted. If the test of hypothesis (3)
against (8), given the weights wtr , is based on the number of observations given by (13)
and (15), the following requirements with regard to its probabilistic properties will be met. The
probability of rejecting (3) will be approximately equal to the predetermined level α if (3) is true.
The probability of accepting (3) will be approximately equal to the predetermined level β if (8) is
true; that is, if the true extent of model deviation expressed in the form given by the denominator
of (15) is equal to the predetermined number of the denominator of (15). If the true extent of
model deviation is greater than the predetermined number given by the denominator of (15), the
probability of accepting the hypothesis (3) will be smaller than β. Thus, the predetermined level
β is an upper bound for the probability of the error of the second kind of the model test.
The objective is to utilize a practically useful measure of model deviation which can di-
rectly be linked to the statistical formulation of model deviation in the form of the alternative
hypothesis given by (4) and by λ > 0. In other words, an equivalent practically useful form to
the statistical formulation (4) shall be utilized since an interpretation of the practical meaning
of differences between real-valued item parameters is hardly ever possible. To choose a value
of such a practically useful measure means predetermining a practically meaningful extent of
deviation from the model so that the acceptance of the model is considered an error of practical
importance whenever the true extent is at least as great. If the true extent of deviation is greater
than zero but smaller than the one predetermined, the acceptance of the model is not considered
an error of practical importance. Such a predetermination of a practically meaningful extent of
model deviation shall be equivalent to the statistical formulation given by restriction (8), the non-
centrality parameter (9), and the denominator of (15) respectively. As already stated above, the
scalar defined by the denominator of (15) is another equivalent definition of model deviation. It
can be considered as a global measure of model deviation which is, contrary to the noncentrality
parameter (9), independent of the sample size n.
Testing the Rasch model involves testing a composite hypothesis against a composite alter-
native. There exist basically infinitely many possibilities of choosing values for the elements of
the vector β ∗t , for t = 1, . . . , u, so that the vector of functions f (φ) is equal to the vector 0 when
considering (3) and equal to c in the case of (8). In this paper it is argued that, from a practical
point of view, it suffices to choose one of the infinitely many possibilities. Without loss of gen-
erality, this shall be indicated by means of a special case of the general model (1) and (2), the
dichotomous Rasch model, merely to simplify the presentation. For the dichotomous model
already described above, the sufficient statistic $r$ for the person parameter $\theta$ is simply given by the
sum score $r = \sum_{i=1}^{k} x_{i1}$, where $x_{i1} \in \{0, 1\}$ is the observed value of the response of a person to
category $h = 1$ of item $i$. Let the conditional probability that $x_{i1} = 1$, given $r$, be denoted by $\pi_{ir}$.
It is well-known and was already indicated for the general case above that πir is a function of
the item parameters only. It is given by
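\[
\pi_{ir} = \frac{\exp(-\beta_i)\, \gamma_{r-1}^{(i)}(\boldsymbol{\beta})}{\gamma_r(\boldsymbol{\beta})}, \tag{16}
\]
where $\gamma_r(\boldsymbol{\beta})$ is the elementary symmetric function of order $r$ of the item parameters and $\gamma_{r-1}^{(i)}(\boldsymbol{\beta})$
its first-order partial derivative with respect to $\beta_i$.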
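Formula (16) is straightforward to evaluate numerically. The sketch below is an illustration with hypothetical function names and parameter values: it computes the elementary symmetric functions of $\epsilon_i = \exp(-\beta_i)$ by the usual summation recursion and evaluates $\gamma_{r-1}^{(i)}$ as the elementary symmetric function of order $r - 1$ of the parameters with item $i$ left out, which is how this derivative is commonly computed.

```python
import math


def esf(eps):
    """Elementary symmetric functions gamma_0, ..., gamma_k of the values eps."""
    gamma = [1.0] + [0.0] * len(eps)
    for i, e in enumerate(eps, start=1):
        # Update from high to low order so each item enters every product once.
        for r in range(i, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma


def conditional_probs(beta):
    """pi[i][r] = P(x_i = 1 | sum score r) for r = 0, ..., k, following (16)."""
    eps = [math.exp(-b) for b in beta]
    gamma = esf(eps)
    k = len(beta)
    pi = []
    for i in range(k):
        gamma_without_i = esf(eps[:i] + eps[i + 1:])  # gamma^{(i)}: item i removed
        row = [0.0]  # r = 0: no item can have been solved
        for r in range(1, k + 1):
            row.append(eps[i] * gamma_without_i[r - 1] / gamma[r])
        pi.append(row)
    return pi


# Hypothetical item parameters for four items.
pi = conditional_probs([-1.0, 0.3, 0.7, 1.2])
```

For every score group the conditional probabilities add up to $r$ over the items, and for equal item parameters they reduce to $r/k$; both properties are useful checks on an implementation.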
The determination of a practically meaningful deviation from the Rasch model shall be
based on the conditional probability πir , for i = 1, . . . , k and r = 1, . . . , k − 1. As was also
already indicated for the general model above, the values r = 0 and r = k are neglected since
the conditional probability πir is for each of these values equal to 1. In order to determine a
practically meaningful deviation, consider hypothesis (3) first. Choose values for the elements
of the vector β which best represent assumptions about the values of the item parameters for
the application under consideration. Useful information about the values of the item parameters
can, for instance, be obtained from the estimated values of a pilot survey or former analyses
of the items under consideration. If there is no such information available at all and there are
no other proper assumptions about the values of the item parameters, one may also choose all
elements of β to be equal. In this paper it will be argued that from a practical point of view it is
not really substantial for the chosen values of the item parameters to be near (or even equal) to
the true or the estimated values of the elements of β. For the application of the procedure to be
proposed in the sequel this is neither of primary interest nor of practical importance. The chosen
values shall merely serve as a (typical, possible) scenario for the sake of determining a practically
meaningful extent of model deviation. Any scenario regardless of its degree of discrepancy to the
real scenario (true values of the item parameters) may in principle be considered. All possible
scenarios serve the same purpose. This will be discussed in more detail below.
Note that for the case of the dichotomous Rasch model the vector of item parameters is
$k$-dimensional only. That is, $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_i, \ldots, \beta_k)$. Let the chosen values be denoted by
$\boldsymbol{\beta} = \boldsymbol{\beta}^{(0)}$. Since the conditional probability $\pi_{ir}$ is a function of the elements of $\boldsymbol{\beta}$ given by
(16), one obtains by using $\boldsymbol{\beta} = \boldsymbol{\beta}^{(0)}$ the conditional probabilities under hypothesis (3). That is,
$\pi_{ir} = \pi_{ir}^{(0)}$, for $i = 1, \ldots, k$ and $r = 1, \ldots, k - 1$. These conditional probabilities represent one
typical scenario (from the infinitely many possible) for the application under consideration for
which hypothesis (3), the Rasch model, holds.
To determine a practically meaningful model deviation, it is proposed to choose with
regard to each subpopulation $t = 1, \ldots, u$ an alternative value for each conditional probability $\pi_{ir}$.
Denote the conditional probability associated with subpopulation $t$ by $\pi_{itr}$. Under the Rasch
model and under the chosen typical scenario, respectively, it holds that $\boldsymbol{\beta}_t = \boldsymbol{\beta} = \boldsymbol{\beta}^{(0)}$ and
$\pi_{itr} = \pi_{ir} = \pi_{ir}^{(0)}$, for $i = 1, \ldots, k$, $t = 1, \ldots, u$, and $r = 1, \ldots, k - 1$. Recall that $\boldsymbol{\beta}_t$ is the
vector of item parameters associated with subpopulation $t$ (which in the case of the dichotomous
model is also only $k$-dimensional). An equivalent determination of model deviation as defined by
restriction (8) as well as by the denominator of (15) and by the noncentrality parameter (9) is the
following. Choose alternative values with regard to the conditional probabilities under
consideration. That is, $\pi_{itr} = \pi_{itr}^{(1)}$, for $i = 1, \ldots, k$, $t = 1, \ldots, u$, $r = 1, \ldots, k - 1$. Making this choice
the following two restrictions have to be taken into account. The first one is given by
\[
u \sum_{r=1}^{k-1} \pi_{ir}^{(0)} = \sum_{t=1}^{u} \sum_{r=1}^{k-1} \pi_{itr}^{(1)}, \tag{17}
\]
for i = 1, . . . , k. Note that in the case of a partitioning of the population of respondents according
to the values of the sufficient statistic r = 1, . . . , k − 1 (so that the different subpopulations are
defined to consist of persons with different sum scores), the summation on the right-hand side of
(17) has to be taken over r only, and on the left-hand side u has to be set equal to 1. The second
restriction is given by
\[
\sum_{i=1}^{k} \pi_{itr}^{(1)} = r, \tag{18}
\]
for $t = 1, \ldots, u$ and $r = 1, \ldots, k - 1$. Given (17) and (18) there are
\[
[u(k - 1) - 1](k - 1) = u(k - 1)^2 - (k - 1)
\]
conditional probabilities free to vary (free to choose). Again, $u$ will be set equal to 1 if the
subpopulations are defined so that they correspond exactly to the values of $r$. In a more elegant
mathematical form these restrictions may be stated as follows. Let $\delta_{itr} = \pi_{itr}^{(1)} - \pi_{ir}^{(0)}$, for
$i = 1, \ldots, k$, $r = 1, \ldots, k - 1$ and $t = 1, \ldots, u$, be defined. Then the restrictions (17) and (18) are
given by $\sum_{t=1}^{u} \sum_{r=1}^{k-1} \delta_{itr} = 0$ and $\sum_{i=1}^{k} \delta_{itr} = 0$.
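Whether a chosen set of alternative probabilities respects (17) and (18) is easy to check mechanically. The following sketch assumes a partition by an external criterion into $u$ subpopulations; the function names, the scenario of equal item parameters (for which $\pi_{ir}^{(0)} = r/k$) and the deviations are hypothetical choices made only for illustration.

```python
def restriction_violations(pi0, pi1):
    """Largest absolute violations of restrictions (17) and (18).

    pi0[i][r-1]    : conditional probabilities under the model, r = 1..k-1.
    pi1[t][i][r-1] : alternative values chosen for subpopulation t.
    """
    u, k = len(pi1), len(pi0)
    v17 = max(
        abs(u * sum(pi0[i]) - sum(sum(pi1[t][i]) for t in range(u)))
        for i in range(k)
    )
    v18 = max(
        abs(sum(pi1[t][i][r - 1] for i in range(k)) - r)
        for t in range(u)
        for r in range(1, k)
    )
    return v17, v18


def free_probabilities(u, k):
    """Number of conditional probabilities free to vary, given (17) and (18)."""
    return (u * (k - 1) - 1) * (k - 1)


k, u = 3, 2
# Scenario of equal item parameters: pi_ir^(0) = r / k.
pi0 = [[r / k for r in range(1, k)] for _ in range(k)]
# Hypothetical deviations delta_i1r for subpopulation 1; subpopulation 2 mirrors them.
delta = [[0.10, 0.0], [-0.05, 0.0], [-0.05, 0.0]]
pi1 = [
    [[pi0[i][j] + delta[i][j] for j in range(k - 1)] for i in range(k)],
    [[pi0[i][j] - delta[i][j] for j in range(k - 1)] for i in range(k)],
]
v17, v18 = restriction_violations(pi0, pi1)
```

Mirroring the deviations between the two subpopulations is merely one convenient way to satisfy (17); any other choice with zero sums over $t$ and $r$ would pass the same check.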
Let $\boldsymbol{\beta}_t = \boldsymbol{\beta}_t^{(1)} = (\beta_{1t}^{(1)}, \ldots, \beta_{it}^{(1)}, \ldots, \beta_{kt}^{(1)})$, for $t = 1, \ldots, u$, denote the vector of item
parameters under the chosen model deviation, represented by the chosen conditional probabilities.
Consider the structure of the CML estimation equations for the item parameters of the dichoto-
mous Rasch model (for each subpopulation t). It follows from these equations that
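\[
\sum_{r=1}^{k-1} n_{tr}\, \pi_{itr}^{(1)}
= \sum_{r=1}^{k-1} n_{tr}\, \frac{\exp(-\beta_{it}^{(1)})\, \gamma_{r-1}^{(it)}(\boldsymbol{\beta}_t^{(1)})}{\gamma_r(\boldsymbol{\beta}_t^{(1)})}, \tag{19}
\]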
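where $n_{tr}$ is the number of persons drawn from subpopulation $t$ having obtained the sum score $r$,
$\gamma_r(\boldsymbol{\beta}_t^{(1)})$ is the elementary symmetric function of order $r$ of the elements of $\boldsymbol{\beta}_t^{(1)}$, and $\gamma_{r-1}^{(it)}(\boldsymbol{\beta}_t^{(1)})$
is its first-order partial derivative with respect to $\beta_{it}^{(1)}$. Using the given (chosen) weights defined
by (13), that is (for the dichotomous model) $n_{tr} = n w_{tr}$, and substituting $n w_{tr}$ for $n_{tr}$ in (19), it
follows that
\[
\begin{aligned}
\sum_{r=1}^{k-1} n w_{tr}\, \pi_{itr}^{(1)}
&= \sum_{r=1}^{k-1} n w_{tr}\, \frac{\exp(-\beta_{it}^{(1)})\, \gamma_{r-1}^{(it)}(\boldsymbol{\beta}_t^{(1)})}{\gamma_r(\boldsymbol{\beta}_t^{(1)})}, \\
\sum_{r=1}^{k-1} w_{tr}\, \pi_{itr}^{(1)}
&= \sum_{r=1}^{k-1} w_{tr}\, \frac{\exp(-\beta_{it}^{(1)})\, \gamma_{r-1}^{(it)}(\boldsymbol{\beta}_t^{(1)})}{\gamma_r(\boldsymbol{\beta}_t^{(1)})},
\end{aligned} \tag{20}
\]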
for $i = 1, \ldots, k$ and $t = 1, \ldots, u$. The system of equations defined by (20) can easily be solved
by usual numerical algorithms applied for the Rasch model, for instance, by a Newton–Raphson
procedure which is described by Fischer (1974) and Andersen (1980, 1995). Having obtained
values for the elements of $\boldsymbol{\beta}_t^{(1)}$, for $t = 1, \ldots, u$, in this way, one has thus also obtained the
values for the elements of the vector $\mathbf{c}$, which correspond to the model deviation chosen on the
basis of $\pi_{itr} = \pi_{itr}^{(1)}$, for $i = 1, \ldots, k$, $t = 1, \ldots, u$, $r = 1, \ldots, k - 1$.
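A system of the form (20) can be solved for one subpopulation with very little code. The sketch below deliberately swaps in a simple cyclic fixed-point iteration for the Newton–Raphson procedure mentioned above, because it needs no derivatives; it is an illustration under that assumption, not the procedure of the cited references. The function names, the normalization $\sum_i \beta_i = 0$ and the example values are hypothetical choices.

```python
import math


def esf(eps):
    """Elementary symmetric functions gamma_0, ..., gamma_k of the values eps."""
    gamma = [1.0] + [0.0] * len(eps)
    for i, e in enumerate(eps, start=1):
        for r in range(i, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma


def solve_item_parameters(target, weights, tol=1e-12, max_sweeps=10000):
    """Solve (20) for the item parameters of one subpopulation.

    target[i]    : left-hand side of (20), sum_r w_r * pi_ir^(1).
    weights[r-1] : weight w_r of score group r = 1, ..., k-1.
    Returns beta normalized so that sum(beta) = 0.
    """
    k = len(target)
    eps = [1.0] * k  # eps_i = exp(-beta_i), started at beta = 0
    for _ in range(max_sweeps):
        old = list(eps)
        for i in range(k):
            gamma = esf(eps)
            gamma_without_i = esf(eps[:i] + eps[i + 1:])
            denom = sum(
                w * gamma_without_i[r - 1] / gamma[r]
                for r, w in enumerate(weights, start=1)
            )
            eps[i] = target[i] / denom  # update one item at a time
        # Fix the scale: geometric mean of eps equal to 1, i.e. sum(beta) = 0.
        shift = sum(math.log(e) for e in eps) / k
        eps = [e / math.exp(shift) for e in eps]
        change = max(abs(math.log(a) - math.log(b)) for a, b in zip(eps, old))
        if change < tol:
            break
    return [-math.log(e) for e in eps]


# Hypothetical check: generate targets from known parameters, then recover them.
true_beta = [-0.5, 0.0, 0.5]
weights = [0.4, 0.6]
true_eps = [math.exp(-b) for b in true_beta]
true_gamma = esf(true_eps)
target = []
for i in range(3):
    g_wo = esf(true_eps[:i] + true_eps[i + 1:])
    target.append(sum(w * true_eps[i] * g_wo[r - 1] / true_gamma[r]
                      for r, w in enumerate(weights, start=1)))
estimate = solve_item_parameters(target, weights)
```

The iteration is slower than Newton–Raphson but adequate for the handful of parameters involved in planning a sample size; the differences between the solutions for the subpopulations then give the elements of the vector c.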
The reason for the proposal of choosing the practically relevant extent of model deviation on
the basis of πitr is that the practical meaning of differences between probabilities may be easier to
interpret than differences between the real-valued item parameters. However, the problem arises
that there exist infinitely many possibilities to choose alternative values for the conditional
probabilities which yield the same value of the denominator $\mathbf{c}'[\mathbf{T}'(\boldsymbol{\phi})\boldsymbol{\Gamma}^{*}\mathbf{T}(\boldsymbol{\phi})]^{-1}\mathbf{c}$ of (15), which
represents the (chosen) model deviation as a whole. For instance, with regard to any $\pi_{itr}$
the alternative value can be chosen to be greater or smaller than the associated value under
the Rasch model and the typical scenario chosen appropriately. That is, different signs of the
differences between the alternative probabilities and the corresponding values under the chosen
scenario (different directions of deviation) can be determined. Another example is the following.
Concerning any πitr , the deviation of the alternative value from the corresponding value under
the model may be chosen to be greater than that regarding any other conditional probability. That
is, the extent of deviation may be chosen to differ between the conditional probabilities. One
can, for instance, choose the smaller deviations of the alternative values from the corresponding
values under the chosen scenario (under the model) the nearer the conditional probabilities under
the chosen scenario are to their limiting values 0 and 1. However, it is argued in this paper
that considering one of the infinitely many possibilities suffices without limiting the validity
and practicality of the procedure. The determination of only one scenario of model deviation
yields a particular value of the global measure of model deviation $\mathbf{c}'[\mathbf{T}'(\boldsymbol{\phi})\boldsymbol{\Gamma}^{*}\mathbf{T}(\boldsymbol{\phi})]^{-1}\mathbf{c}$ which
represents all possible scenarios of deviation of exactly the same global extent. As a consequence,
the statistic (5) has the same distribution for all possible scenarios of model deviation with the
same value of $\mathbf{c}'[\mathbf{T}'(\boldsymbol{\phi})\boldsymbol{\Gamma}^{*}\mathbf{T}(\boldsymbol{\phi})]^{-1}\mathbf{c}$ and the probability of the error of the second kind is equal
to the predetermined level β. In other words, a local determination of model deviation with
regard to each free varying conditional probability yields a value for the global deviation which
is consistent with all possible local determinations yielding the same global extent of deviation.
Thus, if one is interested in a global test of the model, the infinitely many possibilities of local
model deviations need not be considered in particular. The consideration of only one suffices. In
the sequel this will be discussed in more detail.
In order to get an idea of the sample sizes for different numbers of items and extents of model
deviation for given values of the error probabilities α and β, a number of numerical examples
shall be considered. These illustrations may also serve as a tentative guideline for the practical
application of the procedure theoretically discussed above. The details involved in determining a
practically relevant model deviation will also be discussed by considering a three-step approach
which may have the potential to be routinely used for practical applications. As already indicated
above, the first step involves the choice of a scenario concerning the item parameters. The second
step is to determine a scenario of a local model deviation, and the third step consists in making an
assumption about the weights, the probabilities of observing persons in the different score groups.
The three steps yield a value of the measure of global model deviation. This is a value of the
denominator of (15). Along with the error probabilities α and β, the latter determines the total
sample size.
true value of at least one conditional probability $\pi_{ir}$ referring to one particular (selected) item
$i$ and one particular (selected) score group $r$ deviates from the corresponding value $\pi_{ir}^{(0)}$ under
the model at least by a certain practically relevant amount $\delta_{ir}$. In this case the acceptance of
the model will be considered an error of practical importance. Thus, the conditional probability
under the alternative $\pi_{ir}^{(1)}$ will be determined so that $\pi_{ir}^{(1)} - \pi_{ir}^{(0)} = \delta_{ir}$ holds. From restrictions
(17) and (18) the following is then obtained. It holds that
\[
\pi_{jr}^{(1)} - \pi_{jr}^{(0)} = \frac{-\delta_{ir}}{k - 1} = \delta_{jr}, \tag{21}
\]
for each item $j \neq i$ and score group $r$,
least as great as 0.05, where the proportion of persons with score r = 1 is only one thousandth.
The practical question is whether the deviation 0.05 is relevant if the proportion of persons to
which it refers is that small. Thus, from a practical viewpoint, it might be more appropriate to
choose a score group r (for fixing δir ) with a greater weight, a proportion which is actually of
practical importance.
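The within-score-group part of this construction can be sketched numerically. The following is a minimal illustration, not the author's implementation. It assumes the scenario of equal item parameters, under which every item has conditional probability r/k in score group r, and it applies (21) to the remaining items. The function name `alternative_probabilities` and the values k = 5, r = 1, δ = 1/20 are hypothetical choices for illustration only.

```python
from fractions import Fraction

def alternative_probabilities(k, r, i, delta):
    """Conditional probabilities for score group r under the local alternative.

    Assumption: equal item parameters, so each of the k items has conditional
    probability r/k under the model. Item i is shifted by delta and, as in
    (21), every other item is shifted by -delta / (k - 1).
    """
    base = Fraction(r, k)
    delta = Fraction(delta)
    probs = [base + delta if j == i else base - delta / (k - 1)
             for j in range(1, k + 1)]
    # The alternative must stay inside the admissible interval [0, 1].
    if not all(0 <= p <= 1 for p in probs):
        raise ValueError("alternative probabilities leave [0, 1]")
    return probs

# Hypothetical example: 5 items, score group r = 1, deviation 1/20 on item 1.
probs = alternative_probabilities(k=5, r=1, i=1, delta=Fraction(1, 20))
print([str(p) for p in probs])
print(sum(probs))  # the deviations cancel, so the sum still equals r
```

Exact fractions are used so that the cancellation of the deviations within the score group can be checked without rounding error.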
In the three-step process described above, particular assumptions are made which are jus-
tified as follows. The first step involves the assumption of a scenario of equal item parameters.
However, in most applications the true item parameters will not be equal. It is argued in this pa-
per that a discrepancy of the assumed from the true scenario neither invalidates the procedure nor
limits its practicality. Rather, it has the advantage of simple applicability. Such a simple scenario
merely serves the purpose of determining a practically relevant model deviation. Determining a
practically relevant model deviation locally on the basis of such a simple scenario (equal item
parameters) yields a value of a global measure of model deviation, the denominator of (15),
which is independent of all possible scenarios and all possible local model deviations yielding
the same value of the global measure. Each of the infinitely many combinations of scenarios with
local model deviations—which yield exactly the same value of the global measure as obtained
on the basis of the scenario of equal item parameters and the particular local model deviation
utilized—is then also considered practically relevant. Whenever the true global deviation is at
least as great as the predetermined global value, the acceptance of the model is considered an
error of practical importance regardless of the particular scenario and the particular local model
deviation utilized. This is the principle which simplifies the applicability of the whole procedure;
otherwise, an infinite number of different scenarios would have to be considered.
To support this argument, consider the following simple example. Let X be a binomially dis-
tributed random variable with parameters n and p. Let n = 100 be given. The hypothesis p = 0.5
shall be tested by applying asymptotic theory. One of a number of statistics serving this purpose
is given by $(x - np)^2 / [np(1 - p)]$. It is asymptotically $\chi^2$ distributed with df = 1. Assume
that X takes on the value x = 45. The observed (or estimated) deviation from the hypothesis
p = 0.5 is equal to p − x/n = 0.05 and (1 − p) − (n − x)/n = −0.05. It can be considered a
local deviation analogous to the procedure concerning the Rasch model proposed above. The $\chi^2$
statistic yields the value $\chi^2 = 1$, which can be seen as a global measure of deviation. Consider
another hypothesis p = 0.9 to be tested and assume that x = 87. In this case, the observed local
deviation from the hypothesis p = 0.9 is smaller. One obtains 0.03 and −0.03 respectively. On
the other hand, the observed value of the $\chi^2$ statistic, the measure of global deviation, yields
the same value as in the first case. It is also equal to 1 (since the variance of X under p = 0.9
is smaller than that under p = 0.5). Since the same value of the global measure is obtained for
the two scenarios, an observed absolute local deviation of 0.05 regarding the scenario p = 0.5
is considered equivalent to an observed absolute local deviation of 0.03 regarding the scenario
p = 0.9. Consequently, if an absolute local deviation of 0.05 concerning the scenario p = 0.5 is
considered practically relevant, an absolute local deviation of only 0.03 concerning the scenario
p = 0.9 is then also considered practically relevant.
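The arithmetic of this binomial example can be checked with a few lines of code. This is only a sketch of the two scenarios described above; the function name `chi_square_statistic` is a hypothetical label.

```python
def chi_square_statistic(x, n, p):
    """Statistic (x - n p)^2 / [n p (1 - p)], asymptotically chi-square, df = 1."""
    return (x - n * p) ** 2 / (n * p * (1 - p))

n = 100
results = []
for p, x in [(0.5, 45), (0.9, 87)]:
    local = p - x / n                      # observed local deviation
    stat = chi_square_statistic(x, n, p)   # global measure of deviation
    results.append((local, stat))
    print(f"p = {p}: local deviation = {local:.2f}, statistic = {stat:.2f}")
```

Up to floating-point rounding, both scenarios give a statistic of 1, although the local deviations are 0.05 and 0.03.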
Returning to the case of testing the Rasch model, consider the local model deviation δir
regarding item i and score group r. The deviations of the conditional probabilities concerning
the other items and score groups are then given by (21), (22), and (23). Assume that this scenario
of a local model deviation will be considered practically relevant if the true item parameters are
all equal. However, if the true item parameters are not all equal, some conditional probabilities
under the model will be nearer to their limiting values 0 and 1 than under the equality assumption.
Thus, to obtain the same value of the global measure of model deviation as is obtained for the
scenario of equal item parameters with the local model deviation δir , the absolute local deviations
concerning the various conditional probabilities must on average be smaller than for the scenario
of equal item parameters.
TABLE 1.
Total sample sizes (values of the ceiling function of n) for different numbers of items and two different extents of a local
model deviation obtained for α = β = 0.05.
Note: The local model deviation δ11 refers to item 1 and score group r = 1. The deviations referring to all
other item and score group combinations are given by (21), (22), (23). The weights w1 , . . . , wk−1 are given
by the binomial distribution with parameters p = 0.5 and n = k − 2.
A similar problem to that of assuming the scenario of equal item parameters and a particular
scenario of a local model deviation is involved in the third step. The observed relative frequency
nr /n of the number of persons with score r will almost surely deviate from the corresponding
assumed weight wr . Two approaches shall be proposed to deal with this problem. First, one can
give an analogous argument as the one above for the case of assuming the scenario of equal item
parameters. A value of the global model deviation is obtained only if a local model deviation is
determined along with an assumption about the weights. Consider that the chosen local deviation
δir and the weights given by (24) yield a particular value of the global deviation, say δ. All
possible combinations of different local deviations and different assumptions about the weights
which yield the same value δ will then be considered equivalent so that the same total sample size
will be obtained (given α and β). The second approach of dealing with the problem of a possible
discrepancy between wr and nr /n is the following. Draw enough observations so that for each
score r the number of respondents is at least as large as the values obtained on the basis of the
chosen local deviation δir and the chosen weights according to (24). If one does so, however, the
number of respondents nr will frequently be larger for one or more values of the sum score r
than the calculated values. Thus, the power of the test will be increased so that the predetermined
nominal level β is only the upper bound of the probability of the error of the second kind, given
α and the chosen extent of model deviation.
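The second approach can be illustrated with a short sketch. It assumes that the weights follow the binomial distribution with p = 0.5 and n = k − 2 stated in the table notes, that score r = 1, …, k − 1 corresponds to the binomial outcome r − 1, and that the required number of respondents in score group r is the total sample size times the weight of that group. The function names and the total sample size of 1000 are hypothetical.

```python
import math

def binomial_weights(k, p=0.5):
    """Weights w_1, ..., w_{k-1} from a binomial distribution with n = k - 2.

    Assumption: score r = 1, ..., k - 1 is mapped to the binomial outcome r - 1.
    """
    m = k - 2
    return [math.comb(m, r - 1) * p ** (r - 1) * (1 - p) ** (m - r + 1)
            for r in range(1, k)]

def minimum_group_sizes(n_total, weights):
    """Smallest integers n_r with n_r >= n_total * w_r for each score group."""
    return [math.ceil(n_total * w) for w in weights]

# Hypothetical example: k = 5 items and a calculated total sample size of 1000.
weights = binomial_weights(k=5)
sizes = minimum_group_sizes(1000, weights)
print(weights)
print(sizes)
```

Sampling would then continue until every score group has reached its minimum size, which is why the realized power can exceed the nominal value.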
TABLE 2.
Total sample sizes (values of the ceiling function of n) for different numbers of items and two different extents of a local
model deviation obtained for α = β = 0.05.
Note: The local model deviation δ11 = · · · = δ1r0 refers to item 1 and the score groups of the low score
region, where r0 = (k − 1)/2. The deviations referring to all other item and score group combinations are
determined analogously to (21), (22), (23) accounting for restrictions (17), (18). The weights w1 , . . . , wk−1
are given by the binomial distribution with parameters p = 0.5 and n = k − 2.
the subsequent examples. The scenario of equal item parameters will again be utilized and it will
be assumed that the weight wr , for r = 1, . . . , k − 1, is given by (24). Concerning the determina-
tion of a local model deviation, consider the following. Let r = r0 be the largest value of r (the
largest score group) belonging to the low score region. That is, the low score region consists of
all values of r from 1 to r0 . Hence, the high score region consists of all values of r from r0 + 1 to
k − 1. For the following examples let r0 = (k − 1)/2. Note that all considered values of k are odd.
It is assumed that the rejection of the model will be preferred if all true conditional probabilities
$\pi_{i1}, \ldots, \pi_{ir_0}$ referring to the low score region of at least one (selected) item $i$ deviate simultaneously
from the corresponding values $\pi_{i1}^{(0)}, \ldots, \pi_{ir_0}^{(0)}$ under the model (the assumed scenario of
equal item parameters) by a practically relevant amount $\delta_{i1} = \cdots = \delta_{ir_0}$ (which is equal for all
score groups belonging to the low score region). Thus, the alternative conditional probabilities
$\pi_{i1}^{(1)}, \ldots, \pi_{ir_0}^{(1)}$ are determined so that $\pi_{ir}^{(1)} - \pi_{ir}^{(0)} = \delta_{ir}$ holds for $r = 1, \ldots, r_0$. The deviations
concerning all other conditional probabilities are then determined analogously to (21), (22), and
(23) accounting for restrictions (17) and (18). Assume again that this scenario yields a particular
value of the global model deviation, say δ. Consider the case where the true absolute deviations
concerning the selected item i and the score groups of the low score region are smaller than the
absolute value of the predetermined deviation δi1 = · · · = δir0 but where the true extent of the
global model deviation is equal to δ (because the true item parameters are not equal and/or the
true absolute deviations concerning other item and score group combinations are greater than for
the considered local scenario above). Then the acceptance of the model will also be considered
an error of practical importance even though the true absolute deviations concerning item i and
the score groups of the low score region are smaller than the absolute value of the predetermined
deviation δi1 = · · · = δir0 .
Table 2 shows the results for different (larger) numbers of items and two different extents of
a local model deviation δ11 = · · · = δ1r0 with regard to item 1 and the low score region. It shows
the obtained values of the ceiling function of n. Again, the cases with r = 0 and r = k are not
included in the total sample sizes.
Note that one has to make sure that the alternative conditional probabilities $\pi_{i1}^{(1)}, \ldots, \pi_{ir_0}^{(1)}$
do not fall outside of the admissible interval. For larger item numbers it is thus recommended
to choose a positive sign for δi1 = · · · = δir0 . The examples show an interesting result from
a theoretical and practical point of view. For each of the two different local model deviations
considered in Table 2, a value of the global measure of model deviation is obtained which is
approximately equal for all item numbers. The denominator of (15) takes approximately the value
1/100 for the local model deviation δ11 = · · · = δ1r0 = 0.05 and approximately 4/100 for the case
δ11 = · · · = δ1r0 = 0.1, independent of the item number. This observation may lead to a useful
classification of the global measure of model deviation in order for a few practically relevant
categories to represent different levels of the global model deviation. Such a classification would
probably improve the practicality of the procedure and be of great help for applied researchers.
One may consider three categories low, middle, and high, analogous to the classifications of
different measures regarding different statistical tests by Cohen (1988). Determining a practically
relevant model deviation could then easily be realized by choosing one of three values of the
denominator of (15).
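How a chosen value of the global measure could translate into a total sample size may be sketched as follows. Since (15) is not reproduced in this section, the sketch rests on an assumption: that the numerator of (15) is the noncentrality parameter of the noncentral χ² distribution for which a test at level α attains power 1 − β, so that n is this parameter divided by the global deviation. The degrees of freedom `df` are a free input here, and the value used in the example call is a hypothetical placeholder, not the value applying to the Wald test of this paper. All function names are illustrative.

```python
import math

def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def ncx2_cdf(c, df, lam):
    """Noncentral chi-square CDF as a Poisson mixture of central chi-square CDFs."""
    mu = lam / 2.0
    if mu == 0.0:
        return gamma_p(df / 2.0, c / 2.0)
    j_max = int(mu + 10.0 * math.sqrt(mu) + 50.0)  # Poisson tail beyond is negligible
    return sum(
        math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1.0))
        * gamma_p(df / 2.0 + j, c / 2.0)
        for j in range(j_max + 1)
    )

def bisect(f, lo, hi, iters=100):
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def noncentrality(df, alpha, beta):
    """Noncentrality parameter giving power 1 - beta for a level-alpha test."""
    hi = 1.0
    while gamma_p(df / 2.0, hi / 2.0) < 1.0 - alpha:
        hi *= 2.0
    # Critical value: the (1 - alpha) quantile of the central chi-square.
    crit = bisect(lambda c: gamma_p(df / 2.0, c / 2.0) - (1.0 - alpha), 0.0, hi)

    def power_gap(lam):
        return (1.0 - ncx2_cdf(crit, df, lam)) - (1.0 - beta)

    hi = 1.0
    while power_gap(hi) < 0.0:
        hi *= 2.0
    return bisect(power_gap, 0.0, hi)

def total_sample_size(global_deviation, df, alpha=0.05, beta=0.05):
    """Ceiling of n = noncentrality / global deviation (assumed form of (15))."""
    return math.ceil(noncentrality(df, alpha, beta) / global_deviation)

# Global deviations 1/100 and 4/100 as reported above; df = 12 is hypothetical.
for deviation in (0.01, 0.04):
    print(deviation, total_sample_size(deviation, df=12))
```

Only the standard library is used, so the noncentral distribution is evaluated by a series that is adequate for moderate degrees of freedom. A statistical library would be preferable for production use.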
7. Discussion
Glas and Verhelst (1995a, 1995b) have reviewed the most prominent approaches to testing
the Rasch model. This paper is generally concerned with supplementing statistical tests of the
Rasch model so that the probability of the error of the second kind (Type II probability) can, in
addition to the probability of the error of the first kind (Type I probability), be controlled at a
predetermined level. The focus in particular lies on the Wald test (Wald, 1943), which was also
discussed by Glas and Verhelst (1995a, 1995b). The motivation for this paper stems from the un-
derstanding that, from a practical viewpoint, the negative consequences of an error of the second
kind are at least as serious as those of an error of the first kind. Many researchers applying the
Rasch model will probably agree that in many cases the consequences of an error of the second
kind are even more serious. Thus, statistical tests for the Rasch model that control the probability
of the error of the second kind at a predetermined, desired level are of great practical interest.
Unfortunately, to the author’s knowledge, there has been no paper providing a satisfactory solution for
the problem under consideration, i.e., the predetermination of a practically meaningful and rele-
vant deviation from the Rasch model as well as the derivation of the probability distribution of a
statistic under the predetermined deviation. In this paper a solution is proposed which is based on
a three-step approach. The first step assumes one proper scenario with regard to the values of the
item parameters. The second step determines practically relevant deviations from the conditional
probabilities which follow from the assumed scenario (values of the item parameters) of the first
step. This is referred to as a local model deviation or one scenario of a local model deviation.
The third step makes an assumption about the weights, i.e., the probabilities of observing persons
in the different score groups. From the determination of one particular scenario of local model
deviation (alternative conditional probabilities) and particular weights, a value of the global extent
of model deviation will be obtained. This is a value of the denominator of (15). Disregarding
the infinitely many possible scenarios concerning the values of the item parameters in the first
step, the infinitely many possibilities of determining model deviations locally in the second step,
and the infinitely many possibilities of choosing weights in the third step has no influence on
the result (total sample size). All combinations of scenarios from the first step, scenarios from
the second step, and scenarios regarding the weights which yield the same value of the global
measure of model deviation will be considered equivalent extents of model deviation. Thus, the
consideration of only one combination of scenarios of the three steps suffices. The particular
chosen combination of scenarios of the three steps merely serves the purpose of predetermining
the global extent of model deviation. The optimal total sample size depends on the predetermined
global extent of deviation and the levels of the error probabilities α, β only, not on each of the
infinitely many combinations of scenarios of the three steps which yield an equal global extent.
It is customary in statistics to utilize global measures of model discrepancy on which sam-
ple size and power considerations are based. For instance, the approach by Cohen (1988) for a
number of different statistical tests is well known. Another nice example occurs in the context of
structural equation modeling. Satorra and Saris (1985) discuss a procedure (to determine a global
discrepancy) which uses the quadratic expression of the Wald statistic also used in this paper.
Another point worth discussing concerns the formulation of the alternative hypothesis of the
test described here. The determination of a practically meaningful model deviation is based on
a partition of the population of respondents. It is assumed that different Rasch models (different
item parameters) hold for different subpopulations. If the partition is based on the person score
which is a sufficient statistic for the person parameter, such a formulation of the alternative
will also cover models assuming unequal item discriminations as well as models with a lower
asymptote parameter, such as the 2PL and 3PL models.
The first part of this paper deals with a general class of Rasch models which includes models
for polytomously scored items and multidimensional models. The last part of the paper is con-
cerned with the dichotomous model only, where a concrete approach of determining practically
relevant model deviations (choosing a combination of scenarios of the three steps above) is dis-
cussed in detail. The main purpose is to show the practicality of the procedure based on a simple
case. The application of the presented approach of determining model deviations to other models
belonging to the general class of Rasch models (1) is straightforward.
As a perspective with regard to subsequent research, it would probably be of help for applied
researchers if the global measure of model deviation given by the denominator of (15) could be
classified according to a few practically relevant categories (e.g., low, middle, high) representing
different levels of model deviation. For given values of the probabilities of the errors of the first
and second kind, the optimal total sample sizes could then be provided for the different practi-
cally relevant levels of the global measure of model deviation, analogous to the tables by Cohen
(1988). Tentative results for the dichotomous Rasch model are provided in this paper. These re-
sults show that, independent of the number of items, two global levels of model deviation can
be distinguished. Furthermore, it may also be of interest for subsequent research to treat power
and sample size considerations concerning other test statistics mentioned in the introduction and
compare them with the results in this paper.
References
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: The Danish Institute of
Education Research (Expanded Edition, 1980. Chicago: University of Chicago Press).
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. Berkeley: University of California
Press.
Satorra, A., & Saris, W.E. (1985). The power of the likelihood ratio test in covariance structure analysis. Psychometrika,
50, 83–90.
Snijders, T.A.B. (1991). Enumeration and simulation methods for 0-1 matrices with given marginals. Psychometrika, 56,
397–417.
Stroud, T.W.F. (1972). Fixed alternatives and Wald’s formulation of the noncentral asymptotic behavior of the likelihood
ratio statistic. Annals of Mathematical Statistics, 43, 447–454.
van den Wollenberg, A. (1982). Two new test statistics for the Rasch model. Psychometrika, 47, 123–140.
Verhelst, N.D. (2008). An efficient MCMC algorithm to sample binary matrices with fixed marginals. Psychometrika,
73, 705–728.
Verhelst, N.D., & Glas, C.A.W. (1995). The one parameter logistic model. In G.H. Fischer, & I.W. Molenaar (Eds.),
Rasch models—foundations, recent developments and applications (pp. 215–237). New York: Springer.
Verhelst, N.D., Glas, C.A.W., & Verstralen, H.H.F.M. (1994). OPLM: Computer program and manual. Arnhem: CITO.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large.
Transactions of the American Mathematical Society, 54, 426–482.
Wilson, M., & Masters, G.N. (1993). The partial credit model and null categories. Psychometrika, 58, 87–99.