PSYCHOMETRIKA, VOL. 75, NO. 4, 708–724
DECEMBER 2010
DOI: 10.1007/s11336-010-9182-4

SAMPLE SIZE DETERMINATION FOR RASCH MODEL TESTS

CLEMENS DRAXLER
LUDWIG-MAXIMILIAN-UNIVERSITY MUNICH

This paper is concerned with supplementing statistical tests for the Rasch model so that, in addition
to the probability of the error of the first kind (Type I probability), the probability of the error of the second
kind (Type II probability) can be controlled at a predetermined level by basing the test on the appropriate
number of observations. An approach to determining a practically meaningful extent of model deviation is
proposed, and the approximate distribution of the Wald test is derived under the extent of model deviation
of interest.
Key words: Rasch model, Wald test, sample size, error of the second kind (Type II error).

1. Introduction

Testing the Rasch model from the frequentists’ point of view has a long tradition. Various
test statistics have been proposed, including exact tests based on discrete probability distributions
as well as statistics based on asymptotic theory.
As far as the former are concerned, Georg Rasch himself initiated the development. He
proposed an exact, parameter-free inference approach to testing the dichotomous Rasch model
(Rasch, 1960; Fischer, 1974) which is based on the conditional uniform distribution of the obser-
vations (responses of a number of persons to a number of items) given the sufficient statistics of
the model’s parameters. Ponocny (2001) proposed many other test statistics which have power
against violations of different assumptions of the model. These are uniformly most powerful un-
biased tests. Through lack of suitable combinatorial or analytical methods to derive the exact,
discrete distribution of these test statistics (under the complicated uniform distribution), Monte
Carlo methods have been applied to approximate the exact distributions by random sampling
from the uniform distribution (Snijders, 1991; Ponocny, 2001; Chen & Small, 2005; Verhelst,
2008). Molenaar (1983) pointed to some exceptional cases of exact tests.
The majority of research with regard to testing the Rasch model has focused on the applica-
tion of asymptotic theory. In their review of such tests Glas and Verhelst (1995a, 1995b) named
generalized Pearson χ² tests (van den Wollenberg, 1982; Glas, 1988; Glas & Verhelst, 1989;
Verhelst & Glas, 1995), likelihood ratio tests (Andersen 1973, 1980; Martin-Löf, 1973; Kelder-
man 1984, 1989), Wald tests (Wald, 1943), and Lagrange multiplier tests (Aitchison & Silvey,
1958). In contrast to the exact tests, these tests also apply to polytomous Rasch models.
The research mentioned above has solely focused on deriving the exact or the asymptotic
distribution of a statistic under the hypothesis of the validity of the Rasch model. Consequently,
the probability of the error of the second kind (Type II probability) β is left uncontrolled when
a model test is carried out. This paper is generally concerned with supplementing the existing
procedures for testing the Rasch model so that in addition to the probability of the error of the first
kind (Type I probability) α the probability of the error of the second kind β can be controlled at
a predetermined level. In particular, this paper focuses on the Wald test (Wald, 1943), which has
been discussed with regard to testing the Rasch model by Glas and Verhelst (1995a, 1995b). The

Requests for reprints should be sent to Clemens Draxler, Department Psychology, Ludwig-Maximilians-Universität
München, Leopoldstraße 13, 80802 Munich, Germany. E-mail: [email protected]

© 2010 The Psychometric Society

objective is to approximate the complete distribution of the Wald statistic for a finite number of
observations under a practically meaningful and useful alternative hypothesis. This involves the
predetermination of a practically meaningful extent of deviation from the Rasch model, measured
on a useful scale so that the acceptance of the model is considered an error of practical importance
whenever the true extent of deviation is at least as great as the one predetermined. The upper
bound of the probability of accepting the model when the true extent of deviation is greater than
or equal to the one predetermined can then be controlled at a predetermined level β by basing
the test on the respective (appropriate) number of observations (given the probability of the error
of the first kind α).

2. A General Class of Rasch Models

In their review of statistical tests for testing polytomous Rasch models, Glas and Verhelst
(1995b) define a general framework of Rasch models for polytomous item responses. The Wald
test is one of the tests discussed by Glas and Verhelst which applies to this general class of
models, and so are the proposals and results of this paper.
The general model is defined as follows. Consider k items indexed by i = 1, . . . , k with
mi + 1 response categories indexed by h = 0, 1, . . . , mi . Let the binary response of a person to
category h of item i be modeled by the Bernoulli variable Xih which can take on the values
xih = 0 and xih = 1. The probability distribution of Xih is determined by
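\[
P(X_{ih} = x_{ih})
= \frac{\exp\bigl[x_{ih}\bigl(\sum_{p=1}^{q} r_{ihp}\theta_p - \sum_{c=1}^{d} s_{ihc}\beta_{ic}\bigr)\bigr]}
       {\sum_{l=0}^{m_i} \exp\bigl(\sum_{p=1}^{q} r_{ilp}\theta_p - \sum_{c=1}^{d} s_{ilc}\beta_{ic}\bigr)}
= \frac{\exp\bigl[x_{ih}(r_{ih}'\theta - s_{ih}'\beta_i)\bigr]}
       {\sum_{l=0}^{m_i} \exp(r_{il}'\theta - s_{il}'\beta_i)}, \tag{1}
\]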

where θ = (θ_1, . . . , θ_p, . . . , θ_q) is a vector of real-valued person parameters and β_i = (β_i1, . . . ,
β_ic, . . . , β_id) a vector of real-valued parameters associated with item i. The vectors of constants
r ih = (rih1 , . . . , rihp , . . . , rihq ) and s ih = (sih1 , . . . , sihc , . . . , sihd ) are predetermined (or known)
weights and are usually referred to as scoring functions (or score functions). Multiplying these
weights with the realized values xih yields the summands of the sufficient statistics (which are
weighted sums) for the parameters of the model.
A model of the general form (1) is basically unidentified without placing restrictions on
the vectors of weights r ih , s ih , and the parameters respectively. Glas and Verhelst (1995b) give
some examples of such restrictions and show that a number of well-known unidimensional and
multidimensional models as well as the dichotomous Rasch model are derived from model (1)
as special cases, including, for example, the unidimensional polytomous Rasch model (Rasch,
1961; Andersen 1977, 1995), the rating scale model (Andrich, 1978a), the binomial trials model
(Andrich, 1978b), the partial credit model (Masters, 1982), the extended partial credit model
(Wilson & Masters, 1993), the one parameter logistic model (OPLM) for dichotomous (Ver-
helst & Glas, 1995) and polytomous item responses (Verhelst, Glas, & Verstralen, 1994) and the
multidimensional Rasch model (Rasch, 1961; Andersen 1977, 1995). For instance, the dichoto-
mous Rasch model is obtained from (1) by restricting the number of response categories so that
h = 0, 1, for each item i = 1, . . . , k, and setting q = d = 1 so that the vectors r ih , s ih , θ , and β i ,
for i = 1, . . . , k and h = 0, 1, reduce to scalars rih , sih , θ , and βi respectively. Furthermore, the
weights rih and sih for response category h = 0, 1 are restricted so that ri0 = 0, ri1 = 1, si0 = 0,
and si1 = 1, for i = 1, . . . , k. Hence, it is a unidimensional model with one scalar person and one
scalar item parameter modeling the binary responses to the items. Let category h = 0 stand for
the negative or incorrect and category h = 1 for the positive or correct response alternative of
item i. If the response to item i is correct, the Bernoulli variable Xi1 takes on the value xi1 = 1.
If it is incorrect, Xi1 = 0. Then one obtains P (Xi1 = xi1 ) ∝ exp[xi1 (θ − βi )], for i = 1, . . . , k.

Identifiability requires an additional restriction, for instance, setting one of the k item parameters
equal to a constant or setting the sum of the k item parameters equal to a constant. For example,
one can set β_k = 0 or $\sum_{i=1}^{k} \beta_i = 0$. Another unidimensional model frequently applied for
polytomous item responses is the partial credit model. It is derived from (1) by setting q = 1 so that
the person parameter vector θ and the vector of weights r ih reduce to scalars θ and rih . The latter
is restricted so that rih = h. Further, let the vector s ih be restricted to the scalar sih . That is, each
category h of each item i is associated with one weight sih . Let sih = 0, for i = 1, . . . , k, h = 0,
and sih = 1, for i = 1, . . . , k, h = 1, . . . , mi . With these settings one obtains the well-known
form of the partial credit model $P(X_{ih} = x_{ih}) \propto \exp[x_{ih}(h\theta - \sum_{c=1}^{h} \beta_{ic})]$, for i = 1, . . . , k,
h = 1, . . . , m_i, and with the exponent equal to 0, for h = 0. The set of item parameters {β_ic} can
be interpreted as a set of response category bounds between successive categories. An additional
identifiability constraint sets the sum of all category bounds (item parameters) equal to 0. That
is, $\sum_{i=1}^{k} \sum_{c=1}^{m_i} \beta_{ic} = 0$.
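The two special cases just described can be illustrated numerically. The following is a minimal sketch, assuming the standard logistic form of the dichotomous Rasch model and the partial credit exponents hθ − Σ_{c≤h} β_ic stated above; the function names `rasch_prob` and `pcm_probs` are illustrative and do not come from the paper.

```python
import math

def rasch_prob(theta, beta):
    """P(X_i1 = 1) for the dichotomous Rasch model: exp(theta - beta) / (1 + exp(theta - beta))."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def pcm_probs(theta, thresholds):
    """Category probabilities of the partial credit model for one item.

    thresholds[c-1] is beta_ic for c = 1..m_i. Category h has the exponent
    h*theta - sum_{c<=h} beta_ic, and category 0 has the exponent 0.
    """
    exponents = [0.0]
    for b in thresholds:
        # moving from category h-1 to h adds (theta - beta_ih) to the exponent
        exponents.append(exponents[-1] + theta - b)
    top = max(exponents)                      # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]
```

With a single threshold the partial credit model reduces to the dichotomous Rasch model, so `pcm_probs(theta, [beta])[1]` agrees with `rasch_prob(theta, beta)`.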
A remarkable feature of Rasch models is the separability of the person parameters and
the item parameters. By conditioning on the sufficient statistics for the incidental person pa-
rameters, the conditional likelihood is a function of the item parameters only. Maximizing the
conditional likelihood yields a consistent estimator for the item parameters (Andersen, 1970).
This is the well-known conditional maximum likelihood (CML) procedure. The application
of the Wald statistic (Wald, 1943) for testing the Rasch model shall be based on the asymp-
totic distribution of the CML estimator of the item parameters in this paper. CML estima-
tion of the item parameters of model (1) is feasible and is briefly indicated as follows (Glas
& Verhelst, 1995b). Consider the following matrices of weights. Let a q × mi matrix be de-
fined as R i = [r i0 , . . . , r ih , . . . , r imi ], for i = 1, . . . , k, and let a d × mi matrix be defined
as S_i = [s_i0, . . . , s_ih, . . . , s_im_i], for i = 1, . . . , k. Further let R = [R_1, . . . , R_i, . . . , R_k] be a
q × kmi matrix and S = [S 1 , . . . , S i , . . . , S k ] be a d × kmi matrix. Define the response of a
person to item i by the vector x i = (xi0 , . . . , xih , . . . , ximi ) so that the responses of a person to
all items, the response pattern x, is given by x = (x 1 , . . . , x i , . . . , x k ). Denote by {x} the set
of all possible response patterns. For each response pattern, for each element of {x}, define the
vector of sufficient statistics r = Rx for the vector of person parameters θ . Denote by {r} the set
of all possible values of r and define for each value of r the set {x | r = Rx} of response patterns
consistent with r = Rx. Note that each set {x | r = Rx} containing one response pattern only
will be excluded from all further considerations. The conditional probability of such a response
pattern is always equal to 1 and thus does not contain any statistical information (about the va-
lidity of the model). Finally, let β = (β 1 , . . . , β i , . . . , β k ). With these preparations as well as the
assumption of local independence, the conditional probability of the responses of a person to k
items, the response pattern x, is given by

\[
P(x \mid r = Rx) = \frac{\exp(-x' S' \beta)}{\gamma_r(\beta)}, \tag{2}
\]

where γ_r(β) is a combinatorial function, a normalizing constant not depending on the observations
and which is defined by the sum of exp(−y′S′β) over the set {y | r = Ry} of all response
patterns consistent with the associated value of r = Ry. The conditional likelihood of all
observations is then obtained by the product over the conditional probabilities of all observed
response patterns. Taking the partial derivatives of the conditional likelihood with respect to the
item parameters and setting them equal to zero yields the CML estimation equations for the item
parameters. Since the model considered defines an exponential family, it is well known (Ander-
sen, 1980) that parameter estimation reduces to equating the observed sufficient statistics (for the
item parameters) to their expected values.
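For the dichotomous special case, the conditional probability (2) can be evaluated directly. The sketch below assumes binary response patterns, in which case γ_r(β) is the elementary symmetric function of order r of the quantities exp(−β_i); the helper names `esf` and `cond_pattern_prob` are hypothetical and serve illustration only.

```python
import math

def esf(eps):
    """Elementary symmetric functions gamma_0..gamma_k of eps_i = exp(-beta_i)."""
    gamma = [1.0] + [0.0] * len(eps)
    for j, e in enumerate(eps, start=1):
        # add item j; update the highest order first so each item enters at most once
        for r in range(j, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def cond_pattern_prob(x, beta):
    """P(x | r) = exp(-sum_i x_i beta_i) / gamma_r(beta) for a 0/1 pattern x with sum score r."""
    r = sum(x)
    gamma = esf([math.exp(-b) for b in beta])
    return math.exp(-sum(xi * bi for xi, bi in zip(x, beta))) / gamma[r]
```

Summing `cond_pattern_prob` over all patterns with the same sum score gives 1, which reflects the role of γ_r(β) as the normalizing constant. The person parameter does not appear anywhere in the computation.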

3. The Wald Statistic for Testing the Rasch Model

The Wald test (1943) is based on the rationale that there exists a general model and a special
case of it which is derived by imposing one or more restrictions on the general model. The
statistical hypothesis to be tested is given by these restrictions. This principle applied to the
problem of testing the Rasch model means assuming that the Rasch model holds for different
subpopulations of persons separately and testing the restriction of the equality of these models
(the equality of the parameters). This paper is concerned with testing a general class of Rasch
models defined by (1) and (2) respectively. Let the population of persons (respondents to the
items) be partitioned into u subpopulations indexed by t = 1, . . . , u and assume that model (2)
and its item parameters respectively holds separately for the u subpopulations. This shall be
indicated by introducing the index t for the item parameter vector β. Thus, β t shall be associated
with subpopulation t. The partition can either be based on an external criterion such as sex,
age, education, etc. or on the vector r of person scores (the sufficient statistics for the person
parameters). If the partition is based on the latter, the test will particularly have power against the
alternative of unequal item discriminations as modeled by the two- and three-parameter logistic
(2 PL and 3 PL) models. Testing the Rasch model which is defined for the whole population of
respondents is then equivalent to testing the restriction that the vector of functions

\[
f(\phi) = \bigl(\beta_1^{*} - \beta_2^{*}, \ldots, \beta_t^{*} - \beta_{t+1}^{*}, \ldots, \beta_{u-1}^{*} - \beta_u^{*}\bigr) = 0, \tag{3}
\]

where φ = (β 1∗ , . . . , β t ∗ , . . . , β u∗ ). The asterisk indicates that the dimension of β ∗t and β ∗t+1 de-
pends on and is equal to the number of differences between the free item parameters in subpopu-
lations t and t +1 which are restricted to be equal to 0. Any parameters not involved in any restric-
tion will be discarded. Thus, the hypothesis defined by (3) is the statement of the equality of the
vectors of free item parameters between the subpopulations, that is, β ∗1 = · · · = β ∗t = · · · = β ∗u .
This hypothesis is tested against the alternative hypothesis

\[
f(\phi) \neq 0 \tag{4}
\]

on the basis of the following statistic proposed by Wald (1943). It is given by

\[
W = f(\hat\phi)' \bigl[T(\hat\phi)' \Sigma^{*} T(\hat\phi)\bigr]^{-1} f(\hat\phi), \tag{5}
\]
where φ̂ is composed of the CML estimators β̂*_1, . . . , β̂*_u for the u subpopulations. Note that the
restrictions which have to be placed on the item parameters (normalization condition) to identify
the model must be the same for all subpopulations since comparing item parameters from dif-
ferent scales is both meaningless and useless. Another possibility of comparing item parameters
between subpopulations on the basis of (5) which is independent of the chosen normalization
is discussed by Glas and Verhelst (1995a). The matrix T (φ) is defined by the matrix of partial
derivatives
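\[
T(\phi) = \frac{\partial f(\phi)}{\partial \phi} \tag{6}
\]
and
\[
\Sigma^{*} =
\begin{pmatrix}
\Sigma_1^{*} & & & & \\
& \ddots & & & \\
& & \Sigma_t^{*} & & \\
& & & \ddots & \\
& & & & \Sigma_u^{*}
\end{pmatrix}, \tag{7}
\]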

where Σ*_t is a submatrix of the (complete) asymptotic covariance matrix Σ_t of β̂_t, for t =
1, . . . , u. The asterisk indicates that each row and column of the complete covariance matrix
Σ_t associated with an item parameter not involved in a restriction will be deleted so that the
number of rows and columns of Σ*_t is equal to the number of free item parameters involved
in a restriction. The off-diagonal blocks of Σ* are all equal to zero since the responses of
respondents from the u subpopulations are independent. The product of matrices T(φ̂)′Σ*T(φ̂)
is the asymptotic covariance matrix of the estimator f(φ̂) of the vector of differences f(φ).
Wald (1943) proved that the statistic given by (5) is asymptotically central χ² distributed with
the number of degrees of freedom equal to the number of tested restrictions. In other words, the
number of degrees of freedom is equal to the number of differences between free item parameters
which are set equal to 0.
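For u = 2 subpopulations the statistic (5) takes a particularly simple form: f(φ̂) is the vector of differences between the two sets of estimates and, because the two samples are independent, the covariance matrix of these differences is the sum Σ*_1 + Σ*_2. The following sketch assumes this two-group case and uses only the standard library; `solve` and `wald_two_groups` are illustrative names, and the estimates and covariance matrices passed in are assumed to come from a separate CML routine.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]    # augmented copy, inputs stay untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):                      # back substitution
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z

def wald_two_groups(beta1, beta2, cov1, cov2):
    """W = d' (cov1 + cov2)^{-1} d with d = beta1 - beta2 (free item parameters only)."""
    diff = [b1 - b2 for b1, b2 in zip(beta1, beta2)]
    cov = [[c1 + c2 for c1, c2 in zip(r1, r2)] for r1, r2 in zip(cov1, cov2)]
    z = solve(cov, diff)                                # z = (cov1 + cov2)^{-1} d
    return sum(d * zi for d, zi in zip(diff, z))
```

The returned value would be compared with the 1 − α quantile of the central χ² distribution whose degrees of freedom equal the length of `diff`.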

4. Determining the Sample Size

To determine the sample size for the Wald test of the Rasch model for given values of the
probabilities of the errors of the first and second kind α and β, the objective is to set the vector
of functions f (φ) equal to a vector c. That is,

\[
f(\phi) = c \neq 0. \tag{8}
\]

The vector c has to be chosen on the basis of practical considerations, which shall be discussed
in detail below. The limiting distribution of the statistic (5) under (8) can be derived under the
additional technical assumption that the model deviation (8) converges to (3) as the number
of observations n → ∞ at the rate n^{−1/2} or faster (Wald, 1943; Stroud, 1972). In practice, it
is reasonable to use a fixed model deviation like (8) to approximate the distribution of (5) for
a given value of n (Agresti, 2002, pp. 243, 591–592). In order to obtain this approximation
of the distribution of (5) under (8) for a given value of n, notice that the CML estimator β̂ t
has approximately a multivariate normal distribution, with β t as the vector of expected values
(population means) and Σ t as the covariance matrix (Andersen 1970, 1980). If (8) holds, the
expected values of the elements of the vector of differences f(φ̂) between pairs of multivariately
normally distributed estimators are equal to the elements of the vector c ≠ 0. It follows that the
joint distribution (of the estimators) of these differences is multivariately normal with expected
values given by the vector c and the covariance matrix T(φ)′Σ*T(φ). Hence, under the model
deviation given by (8), the quadratic form (5) has approximately a noncentral χ² distribution with
the number of degrees of freedom equal to the number of tested restrictions and the noncentrality
parameter
\[
\lambda = c' \bigl[T(\phi)' \Sigma^{*} T(\phi)\bigr]^{-1} c. \tag{9}
\]
The scalar parameter λ represents the model deviation defined by the vector (8).
The conditional form (2) of the model considered in this paper determines the conditional
probability of each element in the set {x | r = Rx} separately for all possible values that the vec-
tor of sufficient statistics r = Rx can take. The conditional probabilities of the response patterns
within each set {x | r = Rx} add up to 1. Thus, in order to derive a formula for the determination
of the sample size for the test of hypothesis (3) against (8), given the error probabilities α and β,
each set has to be treated separately. To do so, consider the asymptotic covariance matrix Σ t
of β̂ t (of all item parameters associated with t), for t = 1, . . . , u. It is given by

\[
\Sigma_t = -I_t^{-1}, \tag{10}
\]

where I t is defined as the matrix of the expected values of the second-order partial deriva-
tives of the conditional likelihood function with respect to the item parameters associated with
subpopulation t, and −I t is referred to as the associated information matrix (Fischer, 1974;
Andersen, 1980). Since all entries (the second-order partial derivatives) of I t are composed of a
sum over the elements of the set {r}, I t can be written as

\[
I_t = \sum_{\{r\}} I_{tr}. \tag{11}
\]

If one or more of the u subpopulations are defined as a subset of {r}, notice that the summation
has to be taken over the elements of the respective subset of {r}. All entries of I tr can be written
as the product of two factors, where one of the factors is the number of respondents ntr drawn
from subpopulation t with the vector of sufficient statistics taking the value r. Hence,

I tr = ntr Γ tr , (12)

for each element in the set {r} and for t = 1, . . . , u. Let for each element in the set {r} and for
each subpopulation t = 1, . . . , u the weight wtr = ntr /n be defined so that

ntr = nwtr , (13)

with $\sum_{t=1}^{u} \sum_{\{r\}} w_{tr} = 1$ and n as the total number of respondents in the sample. The values of
these weights have to be chosen on the basis of assumptions about each relative frequency ntr /n
in the sample of observations. This shall be discussed in more detail below. Using (10), (11),
(12), and (13), the matrix (7) can be written as

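\[
\Sigma^{*} = n^{-1}\Gamma^{*} = n^{-1}
\begin{pmatrix}
-\bigl(\sum_{\{r\}} w_{1r}\Gamma_{1r}^{*}\bigr)^{-1} & & & & \\
& \ddots & & & \\
& & -\bigl(\sum_{\{r\}} w_{tr}\Gamma_{tr}^{*}\bigr)^{-1} & & \\
& & & \ddots & \\
& & & & -\bigl(\sum_{\{r\}} w_{ur}\Gamma_{ur}^{*}\bigr)^{-1}
\end{pmatrix}, \tag{14}
\]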

with all off-diagonal entries equal to zero. The asterisk again indicates that each row and column
of the complete matrix Γ tr associated with an item parameter not involved in a restriction will
be deleted so that the number of rows and columns of Γ ∗tr is equal to the number of free item
parameters involved in a restriction.
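The step from (10) through (13) to the factor n^{−1} may be spelled out for a single subpopulation t as follows. This is a sketch of the algebra for the complete matrices Σ_t and Γ_tr, that is, before rows and columns not involved in a restriction are deleted:

\[
\Sigma_t = -I_t^{-1}
= -\Bigl(\sum_{\{r\}} I_{tr}\Bigr)^{-1}
= -\Bigl(\sum_{\{r\}} n_{tr}\,\Gamma_{tr}\Bigr)^{-1}
= -\Bigl(n \sum_{\{r\}} w_{tr}\,\Gamma_{tr}\Bigr)^{-1}
= n^{-1}\Bigl[-\Bigl(\sum_{\{r\}} w_{tr}\,\Gamma_{tr}\Bigr)^{-1}\Bigr].
\]

The bracketed matrix depends only on the weights w_tr and on the item parameters, so the total sample size n enters the covariance matrix solely through the scalar factor n^{−1}.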
In order to determine the sample sizes n and ntr respectively for the test of hypothesis (3)
against (8), consider the following. Let the error probabilities α and β as well as the (critical)
value χ²_0 be given. That is, for the argument χ²_0 the cumulative distribution function of the central
χ² distribution with the number of degrees of freedom as defined above takes the value 1 − α.
Consider the noncentral χ² distribution with the number of degrees of freedom as defined above
and the noncentrality parameter λ. With regard to the latter, choose the value λ_0 so that for the
argument χ²_0 the cumulative distribution function of the noncentral χ² distribution with the number
of degrees of freedom as defined above and with λ = λ_0 takes the value β. One requirement with
regard to the probabilistic properties of the statistical test of hypothesis (3) is that the probability
of accepting hypothesis (3) is equal to the predetermined level β if restriction (8) holds. This
requirement will obviously be met if the noncentrality parameter given by (9) takes the value λ0 .

Equating λ0 to (9) and using (14), it follows that

\[
\lambda_0 = c' \bigl[n^{-1}\, T(\phi)' \Gamma^{*} T(\phi)\bigr]^{-1} c,
\qquad
n = \frac{\lambda_0}{c' \bigl[T(\phi)' \Gamma^{*} T(\phi)\bigr]^{-1} c}. \tag{15}
\]

Using (13) the number ntr = nwtr , for t = 1, . . . , u and for each element in the set {r}, is then
also determined. The denominator of (15) can be considered as a scalar measure of model devi-
ation. It is a predetermined value following from restriction (8), that is, from the choice of the
elements of the vector c.
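The choice of λ_0 and the resulting sample size can be sketched numerically. The code below is an illustration under stated assumptions, not the procedure used in the paper: the central and noncentral χ² distribution functions are evaluated with a standard power series and a Poisson mixture so that only the standard library is needed (a statistics library such as SciPy, with `scipy.stats.chi2` and `scipy.stats.ncx2`, would normally be used instead), the roots are found by bisection, and the value passed as `deviation` stands for the denominator of (15), which has to be computed separately.

```python
import math

def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), by its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def ncx2_cdf(x, df, lam):
    """Noncentral chi-square CDF as a Poisson(lam/2) mixture of central chi-square CDFs."""
    half = lam / 2.0
    weight = math.exp(-half)                 # Poisson probability of j = 0
    total, j = 0.0, 0
    while j <= half or weight > 1e-16:
        total += weight * gamma_p(df / 2.0 + j, x / 2.0)
        j += 1
        weight *= half / j
    return total

def bisect(func, lo, hi, tol=1e-10):
    """Root of a monotone function on [lo, hi]; assumes a sign change on the interval."""
    positive_at_lo = func(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (func(mid) > 0) == positive_at_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lambda0(alpha, beta, df):
    """Return (critical value, lambda_0) for given error probabilities and degrees of freedom."""
    hi = df + 10.0
    while ncx2_cdf(hi, df, 0.0) < 1.0 - alpha:       # bracket the critical value
        hi *= 2.0
    crit = bisect(lambda x: ncx2_cdf(x, df, 0.0) - (1.0 - alpha), 0.0, hi)
    lam_hi = 1.0
    while ncx2_cdf(crit, df, lam_hi) > beta:         # bracket lambda_0
        lam_hi *= 2.0
    lam = bisect(lambda l: ncx2_cdf(crit, df, l) - beta, 0.0, lam_hi)
    return crit, lam

def sample_size(lam0, deviation):
    """n = lambda_0 / deviation, rounded up; deviation is the denominator of (15)."""
    return math.ceil(lam0 / deviation)
```

For example, `crit, lam = lambda0(0.05, 0.20, 1)` followed by `sample_size(lam, 0.05)` gives the number of respondents for one tested restriction and a hypothetical deviation of 0.05. The group-by-score counts then follow from n_tr = n w_tr as in (13).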
The hypothesis given by (3) will be rejected if the observed value of the Wald statistic given
by (5) is greater than or equal to χ02 . Otherwise, it will be accepted. If the test of hypothesis (3)
against (8), given the weights wtr , will be based on the number of observations given by (13)
and (15), the following requirements with regard to its probabilistic properties will be met. The
probability of rejecting (3) will be approximately equal to the predetermined level α if (3) is true.
The probability of accepting (3) will be approximately equal to the predetermined level β if (8) is
true; that is, if the true extent of model deviation expressed in the form given by the denominator
of (15) is equal to the predetermined number of the denominator of (15). If the true extent of
model deviation is greater than the predetermined number given by the denominator of (15), the
probability of accepting the hypothesis (3) will be smaller than β. Thus, the predetermined level
β is an upper bound for the probability of the error of the second kind of the model test.

5. The Formulation of Practically Meaningful Model Deviations

The objective is to utilize a practically useful measure of model deviation which can di-
rectly be linked to the statistical formulation of model deviation in the form of the alternative
hypothesis given by (4) and by λ > 0. In other words, an equivalent practically useful form to
the statistical formulation (4) shall be utilized since an interpretation of the practical meaning
of differences between real-valued item parameters is hardly ever possible. To choose a value
of such a practically useful measure means predetermining a practically meaningful extent of
deviation from the model so that the acceptance of the model is considered an error of practical
importance whenever the true extent is at least as great. If the true extent of deviation is greater
than zero but smaller than the one predetermined, the acceptance of the model is not considered
an error of practical importance. Such a predetermination of a practically meaningful extent of
model deviation shall be equivalent to the statistical formulation given by restriction (8), the non-
centrality parameter (9), and the denominator of (15) respectively. As already stated above, the
scalar defined by the denominator of (15) is another equivalent definition of model deviation. It
can be considered as a global measure of model deviation which is, contrary to the noncentrality
parameter (9), independent of the sample size n.
Testing the Rasch model involves testing a composite hypothesis against a composite alter-
native. There exist basically infinitely many possibilities of choosing values for the elements of
the vector β ∗t , for t = 1, . . . , u, so that the vector of functions f (φ) is equal to the vector 0 when
considering (3) and equal to c in the case of (8). In this paper it is argued that, from a practical
point of view, it suffices to choose one of the infinitely many possibilities. Without loss of gen-
erality, this shall be indicated by means of a special case of the general model (1) and (2), the
dichotomous Rasch model, merely to simplify the presentation. For the dichotomous model al-
ready described above, the sufficient statistic r for the person parameter θ is simply given by the
sum score $r = \sum_{i=1}^{k} x_{i1}$, where x_i1 ∈ {0, 1} is the observed value of the response of a person to
category h = 1 of item i. Let the conditional probability that xi1 = 1, given r, be denoted by πir .

It is well-known and was already indicated for the general case above that πir is a function of
the item parameters only. It is given by
\[
\pi_{ir} = \frac{\exp(-\beta_i)\,\gamma_{r-1}^{(i)}(\beta)}{\gamma_r(\beta)}, \tag{16}
\]
where γ_r(β) is the elementary symmetric function of order r of the item parameters and γ_{r−1}^{(i)}(β)
its first-order partial derivative with respect to β_i.
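Equation (16) is straightforward to evaluate for a small number of items. In the sketch below, γ_{r−1}^{(i)} is computed as the elementary symmetric function of order r − 1 of the quantities exp(−β_j) with item i left out, which is the usual computational form; the function names are illustrative assumptions.

```python
import math

def esf(eps):
    """Elementary symmetric functions gamma_0..gamma_k of eps_i = exp(-beta_i)."""
    gamma = [1.0] + [0.0] * len(eps)
    for j, e in enumerate(eps, start=1):
        for r in range(j, 0, -1):          # highest order first, in-place update
            gamma[r] += e * gamma[r - 1]
    return gamma

def cond_item_probs(beta):
    """pi[i][r] = P(x_i1 = 1 | sum score r) for r = 0..k, following equation (16)."""
    eps = [math.exp(-b) for b in beta]
    k = len(eps)
    gamma = esf(eps)
    pi = []
    for i, e in enumerate(eps):
        gamma_i = esf(eps[:i] + eps[i + 1:])            # orders 0..k-1 with item i left out
        pi.append([0.0] + [e * gamma_i[r - 1] / gamma[r] for r in range(1, k + 1)])
    return pi
```

For every sum score r the probabilities add up to r over the items. If all item parameters are chosen to be equal, as suggested in the text for the case without prior information, every π_ir equals r/k.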
The determination of a practically meaningful deviation from the Rasch model shall be
based on the conditional probability πir , for i = 1, . . . , k and r = 1, . . . , k − 1. As was also
already indicated for the general model above, the values r = 0 and r = k are neglected since
the conditional probability πir is for each of these values equal to 1. In order to determine a
practically meaningful deviation, consider hypothesis (3) first. Choose values for the elements
of the vector β which best represent assumptions about the values of the item parameters for
the application under consideration. Useful information about the values of the item parameters
can, for instance, be obtained from the estimated values of a pilot survey or former analyses
of the items under consideration. If there is no such information available at all and there are
no other proper assumptions about the values of the item parameters, one may also choose all
elements of β to be equal. In this paper it will be argued that from a practical point of view it is
not really substantial for the chosen values of the item parameters to be near (or even equal) to
the true or the estimated values of the elements of β. For the application of the procedure to be
proposed in the sequel this is neither of primary interest nor of practical importance. The chosen
values shall merely serve as a (typical, possible) scenario for the sake of determining a practically
meaningful extent of model deviation. Any scenario regardless of its degree of discrepancy to the
real scenario (true values of the item parameters) may in principle be considered. All possible
scenarios serve the same purpose. This will be discussed in more detail below.
Note that for the case of the dichotomous Rasch model the vector of item parameters is
k-dimensional only. That is, β = (β1 , . . . , βi , . . . , βk ). Let the chosen values be denoted by
β = β^(0). Since the conditional probability π_ir is a function of the elements of β given by
(16), one obtains by using β = β^(0) the conditional probabilities under hypothesis (3). That is,
π_ir = π_ir^(0), for i = 1, . . . , k and r = 1, . . . , k − 1. These conditional probabilities represent one
typical scenario (from the infinitely many possible) for the application under consideration for
which hypothesis (3), the Rasch model, holds.
To determine a practically meaningful model deviation, it is proposed to choose with re-
gard to each subpopulation t = 1, . . . , u an alternative value for each conditional probability πir .
Denote the conditional probability associated with subpopulation t by π_itr. Under the Rasch
model and under the chosen typical scenario, respectively, it holds that β_t = β = β^(0) and
π_itr = π_ir = π_ir^(0), for i = 1, . . . , k, t = 1, . . . , u, and r = 1, . . . , k − 1. Recall that β_t is the
vector of item parameters associated with subpopulation t (which in the case of the dichotomous
model is also only k-dimensional). An equivalent determination of model deviation as defined by
restriction (8) as well as by the denominator of (15) and by the noncentrality parameter (9) is the
following. Choose alternative values with regard to the conditional probabilities under consideration.
That is, π_itr = π_itr^(1), for i = 1, . . . , k, t = 1, . . . , u, r = 1, . . . , k − 1. Making this choice,
the following two restrictions have to be taken into account. The first one is given by
\[
u \sum_{r=1}^{k-1} \pi_{ir}^{(0)} = \sum_{t=1}^{u} \sum_{r=1}^{k-1} \pi_{itr}^{(1)}, \tag{17}
\]

for i = 1, . . . , k. Note that in the case of a partitioning of the population of respondents according
to the values of the sufficient statistic r = 1, . . . , k − 1 (so that the different subpopulations are

defined to consist of persons with different sum scores), the summation on the right-hand side of
(17) has to be taken over r only, and on the left-hand side u has to be set equal to 1. The second
restriction is given by
\[
\sum_{i=1}^{k} \pi_{itr}^{(1)} = r, \tag{18}
\]
for t = 1, . . . , u and r = 1, . . . , k − 1. Given (17) and (18) there are

\[
[u(k-1) - 1](k-1) = u(k-1)^2 - (k-1)
\]

conditional probabilities free to vary (free to choose). Again, u will be set equal to 1 if the
subpopulations are defined so that they correspond exactly to the values of r. In a more elegant
mathematical form these restrictions may be stated as follows. Let δ_itr = π_itr^(1) − π_ir^(0), for
i = 1, . . . , k, r = 1, . . . , k − 1, and t = 1, . . . , u, be defined. Then the restrictions (17) and (18) are
given by $\sum_{t=1}^{u} \sum_{r=1}^{k-1} \delta_{itr} = 0$ and $\sum_{i=1}^{k} \delta_{itr} = 0$.
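A chosen set of alternative probabilities can be checked against both restrictions in their δ form. The sketch below assumes a partition by an external criterion (so that u is the actual number of subpopulations) and a nested-list layout for the probabilities; the function names, the data layout, and the tolerance are illustrative assumptions.

```python
def free_probabilities(u, k):
    """Number of conditional probabilities left free by restrictions (17) and (18)."""
    return (u * (k - 1) - 1) * (k - 1)

def check_restrictions(pi0, pi1, tol=1e-9):
    """Check (17) and (18) in their delta form.

    pi0[i][r-1]    : pi_ir^(0) under the Rasch scenario, for r = 1..k-1
    pi1[t][i][r-1] : chosen alternative pi_itr^(1) for subpopulation t
    Returns (restriction_17_holds, restriction_18_holds).
    """
    u, k = len(pi1), len(pi0)
    scores = range(k - 1)
    delta = [[[pi1[t][i][r] - pi0[i][r] for r in scores] for i in range(k)]
             for t in range(u)]
    # (17): for each item, the deviations sum to zero over subpopulations and scores
    holds_17 = all(abs(sum(delta[t][i][r] for t in range(u) for r in scores)) < tol
                   for i in range(k))
    # (18): within each subpopulation and score, the deviations sum to zero over items
    holds_18 = all(abs(sum(delta[t][i][r] for i in range(k))) < tol
                   for t in range(u) for r in scores)
    return holds_17, holds_18
```

For k = 3 items with equal item parameters and u = 2 subpopulations, raising the probabilities of one item by 0.1 and lowering those of another by 0.1 in the first subpopulation, with the opposite changes in the second, satisfies both restrictions.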
Let β_t = β_t^(1) = (β_1t^(1), . . . , β_it^(1), . . . , β_kt^(1)), for t = 1, . . . , u, denote the vector of item
parameters under the chosen model deviation, represented by the chosen conditional probabilities.
Consider the structure of the CML estimation equations for the item parameters of the dichoto-
mous Rasch model (for each subpopulation t). It follows from these equations that
k−1 k−1 (1) (it) (1)
(1) exp(−βit )γr−1 (β t )
ntr πitr = ntr (1)
, (19)
r=1 r=1 γr (β t )

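where $n_{tr}$ is the number of persons drawn from subpopulation t having obtained the sum score r,
$\gamma_r(\boldsymbol{\beta}_t^{(1)})$ is the elementary symmetric function of order r of the elements of $\boldsymbol{\beta}_t^{(1)}$, and $\gamma_{r-1}^{(it)}(\boldsymbol{\beta}_t^{(1)})$
is its first-order partial derivative with respect to $\beta_{it}^{(1)}$. Using the given (chosen) weights defined
by (13), that is (for the dichotomous model) $n_{tr} = n w_{tr}$, and substituting $n w_{tr}$ for $n_{tr}$ in (19), it
follows that
\[ \begin{aligned}
\sum_{r=1}^{k-1} n w_{tr} \pi_{itr}^{(1)} &= \sum_{r=1}^{k-1} n w_{tr} \frac{\exp(-\beta_{it}^{(1)}) \gamma_{r-1}^{(it)}(\boldsymbol{\beta}_t^{(1)})}{\gamma_r(\boldsymbol{\beta}_t^{(1)})}, \\
\sum_{r=1}^{k-1} w_{tr} \pi_{itr}^{(1)} &= \sum_{r=1}^{k-1} w_{tr} \frac{\exp(-\beta_{it}^{(1)}) \gamma_{r-1}^{(it)}(\boldsymbol{\beta}_t^{(1)})}{\gamma_r(\boldsymbol{\beta}_t^{(1)})},
\end{aligned} \tag{20} \]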
for i = 1, . . . , k and t = 1, . . . , u. The system of equations defined by (20) can easily be solved
by usual numerical algorithms applied for the Rasch model, for instance, by a Newton–Raphson
procedure which is described by Fischer (1974) and Andersen (1980, 1995). Having obtained
values for the elements of $\boldsymbol{\beta}_t^{(1)}$, for t = 1, . . . , u, in this way, one has thus also obtained the
values for the elements of the vector c, which correspond to the model deviation chosen on the
basis of $\pi_{itr} = \pi_{itr}^{(1)}$, for i = 1, . . . , k, t = 1, . . . , u, r = 1, . . . , k − 1.
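The solution of (20) for one subpopulation can be sketched numerically as follows. This is a minimal illustration, not the Newton–Raphson procedure cited above: it swaps in a simple multiplicative fixed-point update, and the function names (`esf`, `cond_probs`, `solve_beta`) as well as the sum-zero normalization of the item parameters are assumptions made only for this example.

```python
import math

def esf(eps):
    """Elementary symmetric functions gamma_0..gamma_k of the values in eps."""
    g = [1.0] + [0.0] * len(eps)
    for e in eps:
        for r in range(len(eps), 0, -1):
            g[r] += e * g[r - 1]
    return g

def cond_probs(beta):
    """pi[i][r-1] = P(X_i = 1 | sum score r), r = 1..k-1, under the Rasch model."""
    eps = [math.exp(-b) for b in beta]
    k = len(eps)
    g = esf(eps)
    pi = []
    for i in range(k):
        g_i = esf(eps[:i] + eps[i + 1:])  # symmetric functions with item i left out
        pi.append([eps[i] * g_i[r - 1] / g[r] for r in range(1, k)])
    return pi

def solve_beta(target, weights, tol=1e-12, max_iter=10000):
    """Find beta (normalized to sum zero) such that, for every item i,
    sum_r w_r * pi_ir(beta) equals sum_r w_r * target[i][r-1], as in (20)."""
    k = len(target)
    s = [sum(w * p for w, p in zip(weights, row)) for row in target]
    beta = [0.0] * k
    for _ in range(max_iter):
        fitted = [sum(w * p for w, p in zip(weights, row)) for row in cond_probs(beta)]
        # multiplicative update of exp(-beta_i) by the ratio target / fitted
        new = [b - math.log(si / fi) for b, si, fi in zip(beta, s, fitted)]
        mean = sum(new) / k
        new = [b - mean for b in new]
        if max(abs(a - b) for a, b in zip(new, beta)) < tol:
            return new
        beta = new
    raise RuntimeError("fixed-point iteration did not converge")
```

In use, `target` would hold the chosen alternative probabilities $\pi_{itr}^{(1)}$ of one subpopulation and `weights` the corresponding $w_{tr}$. A quick plausibility check is to generate `target` from known item parameters with `cond_probs` and confirm that `solve_beta` recovers them.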
The reason for the proposal of choosing the practically relevant extent of model deviation on
the basis of πitr is that the practical meaning of differences between probabilities may be easier to
interpret than differences between the real-valued item parameters. However, the problem arises
that infinitely many possibilities to choose alternative values for the conditional probabilities
exist, so that the same value of the denominator $c'[T(\phi)\Gamma^{*}T'(\phi)]^{-1}c$ of (15) representing the
(chosen) model deviation as a whole will be obtained. For instance, with regard to any πitr
the alternative value can be chosen to be greater or smaller than the associated value under
the Rasch model and the typical scenario chosen appropriately. That is, different signs of the
differences between the alternative probabilities and the corresponding values under the chosen
scenario (different directions of deviation) can be determined. Another example is the following.
Concerning any πitr , the deviation of the alternative value from the corresponding value under
the model may be chosen to be greater than that regarding any other conditional probability. That
is, the extent of deviation may be chosen to differ between the conditional probabilities. One
can, for instance, choose the smaller deviations of the alternative values from the corresponding
values under the chosen scenario (under the model) the nearer the conditional probabilities under
the chosen scenario are to their limiting values 0 and 1. However, it is argued in this paper
that considering one of the infinitely many possibilities suffices without limiting the validity
and practicality of the procedure. The determination of only one scenario of model deviation
yields a particular value of the global measure of model deviation $c'[T(\phi)\Gamma^{*}T'(\phi)]^{-1}c$ which
represents all possible scenarios of deviation of exactly the same global extent. As a consequence,
the statistic (5) has the same distribution for all possible scenarios of model deviation with the
same value of $c'[T(\phi)\Gamma^{*}T'(\phi)]^{-1}c$ and the probability of the error of the second kind is equal
to the predetermined level β. In other words, a local determination of model deviation with
regard to each free varying conditional probability yields a value for the global deviation which
is consistent with all possible local determinations yielding the same global extent of deviation.
Thus, if one is interested in a global test of the model, the infinitely many possibilities of local
model deviations need not be considered in particular. The consideration of only one suffices. In
the sequel this will be discussed in more detail.

6. Examples Concerning the Dichotomous Rasch Model

In order to get an idea of the sample sizes for different numbers of items and extents of model
deviation for given values of the error probabilities α and β, a number of numerical examples
shall be considered. These illustrations may also serve as a tentative guideline for the practical
application of the procedure theoretically discussed above. The details involved in determining a
practically relevant model deviation will also be discussed by considering a three-step approach
which may have the potential to be routinely used for practical applications. As already indicated
above, the first step involves the choice of a scenario concerning the item parameters. The second
step is to determine a scenario of a local model deviation, and the third step consists in making an
assumption about the weights, the probabilities of observing persons in the different score groups.
The three steps yield a value of the measure of global model deviation. This is a value of the
denominator of (15). Along with the error probabilities α and β, the latter determines the total
sample size.
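How a value of the global deviation translates into a total sample size can be illustrated with a small sketch. It rests on two assumptions that are not confirmed by the text of this section: that the Wald statistic is approximately noncentral χ² distributed with some number of degrees of freedom `df` under the alternative, and that (15) has the form n = λ / (global deviation), where λ is the noncentrality parameter yielding Type II probability β at level α. All function names are hypothetical, and the distribution functions are computed from their series definitions with the standard library only.

```python
import math

def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-16 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_cdf(x, df):
    return gamma_p(df / 2.0, x / 2.0)

def ncx2_cdf(x, df, lam):
    """Noncentral chi-square CDF as a Poisson mixture of central CDFs."""
    if lam <= 0.0:
        return chi2_cdf(x, df)
    half = lam / 2.0
    j_max = int(half + 12.0 * math.sqrt(half) + 30.0)
    total = 0.0
    for j in range(j_max + 1):
        log_w = -half + j * math.log(half) - math.lgamma(j + 1)
        total += math.exp(log_w) * gamma_p(df / 2.0 + j, x / 2.0)
    return total

def bisect(f, lo, hi, iters=100):
    """Root of a monotone function f with a sign change on [lo, hi]."""
    f_lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def critical_value(alpha, df):
    hi = df + 10.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    return bisect(lambda x: chi2_cdf(x, df) - (1.0 - alpha), 0.0, hi)

def noncentrality(alpha, beta, df):
    """Noncentrality parameter giving Type II probability beta at level alpha."""
    crit = critical_value(alpha, df)
    hi = 1.0
    while ncx2_cdf(crit, df, hi) > beta:
        hi *= 2.0
    return bisect(lambda lam: ncx2_cdf(crit, df, lam) - beta, 0.0, hi)

def sample_size(alpha, beta, df, global_dev):
    # ASSUMPTION: n = lambda / global deviation, the presumed form of (15)
    return math.ceil(noncentrality(alpha, beta, df) / global_dev)
```

The degrees of freedom are left as an argument because they depend on the number of items and subpopulations; the sketch is not claimed to reproduce the tables below.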

6.1. The Three-Step Process

Starting with the first step, the details involved in a practical application are the following.
Consider that no proper assumptions exist about the true values of the item parameters with
regard to the application under consideration. In such a case it shall be recommended to choose
a scenario with the values of the item parameters kept as simple as possible. This is the reason
for assuming that the values of the item parameters are all equal in the examples discussed here.
On the other hand, if one has reliable information about the values of the item parameters, for
instance, the estimated values from a pilot survey or former analyses, one can base the scenario
on these estimated values.
In the second step assume that the subpopulations are defined by the different score groups
r so that it holds for the conditional probability that πitr = πir , for i = 1, . . . , k, t = 1, . . . , u,
r = 1, . . . , k − 1. It shall be assumed that the rejection of the model will be preferred if the
true value of at least one conditional probability $\pi_{ir}$ referring to one particular (selected) item
i and one particular (selected) score group r deviates from the corresponding value $\pi_{ir}^{(0)}$ under
the model at least by a certain practically relevant amount $\delta_{ir}$. In this case the acceptance of
the model will be considered an error of practical importance. Thus, the conditional probability
under the alternative $\pi_{ir}^{(1)}$ will be determined so that $\pi_{ir}^{(1)} - \pi_{ir}^{(0)} = \delta_{ir}$ holds. From restrictions
(17) and (18) the following is then obtained. It holds that
\[ \pi_{jr}^{(1)} - \pi_{jr}^{(0)} = \frac{-\delta_{ir}}{k-1} = \delta_{jr}, \tag{21} \]
for each item j ≠ i and score group r,
\[ \pi_{is}^{(1)} - \pi_{is}^{(0)} = \frac{-\delta_{ir}}{k-2} = \delta_{is}, \tag{22} \]
for item i and each score group s ≠ r, and
\[ \pi_{js}^{(1)} - \pi_{js}^{(0)} = \frac{\delta_{ir}}{(k-1)(k-2)} = \delta_{js}, \tag{23} \]
for each item j ≠ i and each score group s ≠ r. The determination of these differences in this
way is only one of infinitely many possibilities, but it is also one scenario of model deviation of
practical relevance. If the item parameters are all equal and the true absolute difference concern-
ing the conditional probability of the selected item i and the selected score group r is equal to the
absolute value of δir or greater, the acceptance of the model will be considered an error of prac-
tical importance. If it is smaller than the absolute value of δir but greater than 0, the acceptance
of the model will not be considered an error of practical importance.
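That the deviations (21), (22), and (23) satisfy the two restrictions can be checked with a short sketch. The function name `local_deviations` and the dictionary layout are assumptions made for illustration; exact rational arithmetic is used so that the sums are exactly zero.

```python
from fractions import Fraction

def local_deviations(k, i, r, delta):
    """Deviations d[(j, s)] for items j = 1..k and score groups s = 1..k-1,
    given the chosen deviation delta for item i and score group r (k >= 3)."""
    delta = Fraction(delta)
    d = {}
    for j in range(1, k + 1):
        for s in range(1, k):
            if j == i and s == r:
                d[j, s] = delta                          # chosen delta_ir
            elif s == r:
                d[j, s] = -delta / (k - 1)               # (21): other items, same score
            elif j == i:
                d[j, s] = -delta / (k - 2)               # (22): same item, other scores
            else:
                d[j, s] = delta / ((k - 1) * (k - 2))    # (23): other items, other scores
    return d

d = local_deviations(k=5, i=1, r=1, delta="0.05")
# restriction (18) in delta form: deviations sum to zero over items for every score
item_sums = [sum(d[j, s] for j in range(1, 6)) for s in range(1, 5)]
# restriction (17) with u = 1: deviations sum to zero over scores for every item
score_sums = [sum(d[j, s] for s in range(1, 5)) for j in range(1, 6)]
```

Both lists contain only zeros. Under the scenario of equal item parameters, restriction (18) together with the equality of the model probabilities across items implies that the model value is r/k, so the alternative probabilities follow by adding these deviations to r/k.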
The third step involves the choice of the weight wtr = wtr = wr , for r = 1, . . . , k − 1 (the
probability of observing a person with score r). It will be assumed for all numerical examples
that the distribution of the person parameters in the population is such that the number of persons
nr with score r = 1, . . . , k − 1 can be modeled by a discrete random variable having a binomial
distribution with parameters p = 0.5 and n = k − 2. This random variable can take as many
values as there are person scores considered. This is k − 1. Thus, the assumption about the
weight wr is given by the binomial distribution with parameters p = 0.5 and n = k − 2. That is,
\[ w_r = \binom{k-2}{r-1} \left(\frac{1}{2}\right)^{k-2}, \tag{24} \]
for r = 1, . . . , k − 1. For example, for k = 5 and r = 1, . . . , 4 one obtains w1 = w4 = 1/8 and
w2 = w3 = 3/8.
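The weights in (24) are straightforward to compute. A minimal sketch, with the hypothetical function name `score_weights`:

```python
from fractions import Fraction
from math import comb

def score_weights(k):
    """Weights w_1..w_{k-1} from (24): binomial probabilities with
    parameters p = 0.5 and n = k - 2, evaluated at r - 1."""
    return [Fraction(comb(k - 2, r - 1), 2 ** (k - 2)) for r in range(1, k)]

weights_k5 = score_weights(5)  # weights for the k = 5 example in the text
```

Exact fractions are used here so that the weights sum to exactly 1; floating-point values would serve equally well in practice.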
Note that the approach of determining a practically relevant local model deviation (second
step) involves the selection of a particular item i and a particular score group r so that δir can
be fixed. As far as the selection of an item is concerned, it makes no difference (concerning the
global deviation) which item is chosen, as long as in the first step the scenario of equal item
parameters is assumed. The reason is that $\pi_{1r}^{(0)} = \cdots = \pi_{kr}^{(0)}$ holds, for r = 1, . . . , k − 1. On the
other hand, the extent of global model deviation will depend on the score group r if the weights
are not equal for all score groups. Thus, if the weights are determined by (24), different values of
the global deviation will be obtained depending on the score group r. The question of which score
group shall be chosen can be answered on the basis of practical considerations only. Consider
the example in which δ11 = 0.05 with regard to item 1 and score group r = 1 as well as weight
w1 = 0.001 will be chosen. This means that the acceptance of the model will be considered an
error of practical importance if the true deviation regarding item 1 and score group r = 1 is at
least as great as 0.05, where the proportion of persons with score r = 1 is only one thousandth.
The practical question is whether the deviation 0.05 is relevant if the proportion of persons to
which it refers is that small. Thus, from a practical viewpoint, it might be more appropriate to
choose a score group r (for fixing δir ) with a greater weight, a proportion which is actually of
practical importance.
In the three-step process described above, particular assumptions are made which are jus-
tified as follows. The first step involves the assumption of a scenario of equal item parameters.
However, in most applications the true item parameters will not be equal. It is argued in this pa-
per that a discrepancy of the assumed from the true scenario neither invalidates the procedure nor
limits its practicality. Rather, it has the advantage of simple applicability. Such a simple scenario
merely serves the purpose of determining a practically relevant model deviation. Determining a
practically relevant model deviation locally on the basis of such a simple scenario (equal item
parameters) yields a value of a global measure of model deviation, the denominator of (15),
which is independent of all possible scenarios and all possible local model deviations yielding
the same value of the global measure. Each of the infinitely many combinations of scenarios with
local model deviations—which yield exactly the same value of the global measure as obtained
on the basis of the scenario of equal item parameters and the particular local model deviation
utilized—is then also considered practically relevant. Whenever the true global deviation is at
least as great as the predetermined global value, the acceptance of the model is considered an
error of practical importance regardless of the particular scenario and the particular local model
deviation utilized. This is the principle which simplifies the applicability of the whole procedure;
otherwise, an infinite number of different scenarios would have to be considered.
To support this argument, consider the following simple example. Let X be a binomially dis-
tributed random variable with parameters n and p. Let n = 100 be given. The hypothesis p = 0.5
shall be tested by applying asymptotic theory. One of a number of statistics serving this purpose
is given by $(x - np)^2/[np(1 - p)]$. It is asymptotically $\chi^2$ distributed with df = 1. Assume
that X takes on the value x = 45. The observed (or estimated) deviation from the hypothesis
p = 0.5 is equal to p − x/n = 0.05 and (1 − p) − (n − x)/n = −0.05. It can be considered a
local deviation analogous to the procedure concerning the Rasch model proposed above. The $\chi^2$
statistic yields the value $\chi^2 = 1$, which can be seen as a global measure of deviation. Consider
another hypothesis p = 0.9 to be tested and assume that x = 87. In this case, the observed local
deviation from the hypothesis p = 0.9 is smaller. One obtains 0.03 and −0.03 respectively. On
the other hand, the observed value of the $\chi^2$ statistic, the measure of global deviation, yields
the same value as in the first case. It is also equal to 1 (since the variance of X under p = 0.9
is smaller than that under p = 0.5). Since the same value of the global measure is obtained for
the two scenarios, an observed absolute local deviation of 0.05 regarding the scenario p = 0.5
is considered equivalent to an observed absolute local deviation of 0.03 regarding the scenario
p = 0.9. Consequently, if an absolute local deviation of 0.05 concerning the scenario p = 0.5 is
considered practically relevant, an absolute local deviation of only 0.03 concerning the scenario
p = 0.9 is then also considered practically relevant.
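The arithmetic of this binomial illustration can be retraced in a few lines. The function names are hypothetical; the first evaluates the statistic given above, the second inverts it to give the absolute local deviation that corresponds to a fixed global value.

```python
import math
from fractions import Fraction

def chi2_stat(x, n, p):
    """Global measure: (x - n p)^2 / [n p (1 - p)]."""
    return (x - n * p) ** 2 / (n * p * (1 - p))

def local_deviation(global_value, n, p):
    """Absolute deviation |p - x/n| that yields the given global value."""
    return math.sqrt(global_value * p * (1 - p) / n)

# exact rational arithmetic for the two scenarios discussed in the text
g_half = chi2_stat(45, 100, Fraction(1, 2))
g_nine_tenths = chi2_stat(87, 100, Fraction(9, 10))
```

Both scenarios give the global value 1, while `local_deviation(1, 100, 0.5)` and `local_deviation(1, 100, 0.9)` are approximately 0.05 and 0.03, which mirrors the equivalence argued for above.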
Returning to the case of testing the Rasch model, consider the local model deviation δir
regarding item i and score group r. The deviations of the conditional probabilities concerning
the other items and score groups are then given by (21), (22), and (23). Assume that this scenario
of a local model deviation will be considered practically relevant if the true item parameters are
all equal. However, if the true item parameters are not all equal, some conditional probabilities
under the model will be nearer to their limiting values 0 and 1 than under the equality assumption.
Thus, to obtain the same value of the global measure of model deviation as is obtained for the
scenario of equal item parameters with the local model deviation δir , the absolute local deviations
concerning the various conditional probabilities must on average be smaller than for the scenario
of equal item parameters.
TABLE 1.
Total sample sizes (values of the ceiling function of n) for different numbers of items and two different extents of a local
model deviation obtained for α = β = 0.05.

Local model deviation δ11      Number of items k
                               3       4       5
0.05                           1373    3852    8512
0.1                            344     963     2128

Note: The local model deviation δ11 refers to item 1 and score group r = 1. The deviations referring to all
other item and score group combinations are given by (21), (22), (23). The weights w1 , . . . , wk−1 are given
by the binomial distribution with parameters p = 0.5 and n = k − 2.

A similar problem to that of assuming the scenario of equal item parameters and a particular
scenario of a local model deviation is involved in the third step. The observed relative frequency
nr /n of the number of persons with score r will almost surely deviate from the corresponding
assumed weight wr . Two approaches shall be proposed to deal with this problem. First, one can
give an analogous argument as the one above for the case of assuming the scenario of equal item
parameters. A value of the global model deviation is obtained only if a local model deviation is
determined along with an assumption about the weights. Consider that the chosen local deviation
δir and the weights given by (24) yield a particular value of the global deviation, say δ. All
possible combinations of different local deviations and different assumptions about the weights
which yield the same value δ will then be considered equivalent so that the same total sample size
will be obtained (given α and β). The second approach of dealing with the problem of a possible
discrepancy between wr and nr /n is the following. Draw enough observations so that for each
score r the number of respondents is at least as large as the values obtained on the basis of the
chosen local deviation δir and the chosen weights according to (24). If one does so, however, the
number of respondents nr will frequently be larger for one or more values of the sum score r
than the calculated values. Thus, the power of the test will be increased so that the predetermined
nominal level β is only the upper bound of the probability of the error of the second kind, given
α and the chosen extent of model deviation.

6.2. Results of the Numerical Examples

Table 1 shows the values of the ceiling function of n (total sample sizes) obtained for α =
β = 0.05 and different numbers of items and two different extents of a local model deviation δ11
with regard to item 1 and score group 1. The weights w1 , . . . , wk−1 are determined by (24) as
described above. Note that cases with r = 0 and r = k are not included in the total sample sizes
presented in Table 1. Detailed results of the sample sizes for each of the other score groups are
then obtained by multiplying the total sample size n for each number of items k given in Table 1
with the respective values of the weights w1 , . . . , wk−1 determined by (24). For example, for the
case k = 5 and δ11 = 0.05 one obtains n1 = n4 = 1064 and n2 = n3 = 3192.
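The split of a total sample size into score groups, and the check needed for the second approach described in Section 6.1 (drawing observations until every score group is large enough), can be sketched as follows. The function names are hypothetical, and rounding each group size up to the next integer is an assumption of this sketch.

```python
from math import comb

def required_per_score(n_total, k):
    """Required n_r = n * w_r for r = 1..k-1 with w_r from (24), rounded up."""
    denom = 2 ** (k - 2)
    return [-(-n_total * comb(k - 2, r - 1) // denom) for r in range(1, k)]

def shortfalls(observed, required):
    """Number of additional respondents still needed in each score group."""
    return [max(0, q - o) for o, q in zip(observed, required)]

required = required_per_score(8512, 5)
# hypothetical observed counts for scores r = 1..4
missing = shortfalls([1100, 3000, 3300, 1064], required)
```

For k = 5 and n = 8512 this gives the group sizes quoted above. In the hypothetical sample, only score group r = 2 would still need further respondents.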
As can be seen in Table 1, the total sample size increases with an increasing number of
items. This is a consequence of the approach of determining the local model deviation as pro-
posed above by choosing δir and using (21), (22), and (23). The more items and the more score
groups that are considered, the smaller will be the absolute average deviations concerning all
items and score groups so that greater sample sizes will be obtained. Table 1 also shows that
the total sample size increases rapidly with an increasing number of items. Thus, for larger item
numbers it is recommended to partition the range of sum scores into two regions only, a low and
a high score region. This approach is also consistent with the usual practice of testing the Rasch
model in which mostly a low and a high score region are considered. This will be demonstrated in
the subsequent examples.

TABLE 2.
Total sample sizes (values of the ceiling function of n) for different numbers of items and two different extents of a local
model deviation obtained for α = β = 0.05.

Local model deviation δ11 = · · · = δ1r0      Number of items k
                                              9      15     19     25     35     45
0.05                                          2140   2616   2871   3199   3655   3956
0.1                                           535    654    718    800    914    1011

Note: The local model deviation δ11 = · · · = δ1r0 refers to item 1 and the score groups of the low score
region, where r0 = (k − 1)/2. The deviations referring to all other item and score group combinations are
determined analogously to (21), (22), (23) accounting for restrictions (17), (18). The weights w1 , . . . , wk−1
are given by the binomial distribution with parameters p = 0.5 and n = k − 2.

The scenario of equal item parameters will again be utilized and it will
be assumed that the weight wr , for r = 1, . . . , k − 1, is given by (24). Concerning the determina-
tion of a local model deviation, consider the following. Let r = r0 be the largest value of r (the
largest score group) belonging to the low score region. That is, the low score region consists of
all values of r from 1 to r0 . Hence, the high score region consists of all values of r from r0 + 1 to
k − 1. For the following examples let r0 = (k − 1)/2. Note that all considered values of k are odd.
It is assumed that the rejection of the model will be preferred if all true conditional probabilities
$\pi_{i1}, \ldots, \pi_{ir_0}$ referring to the low score region of at least one (selected) item i deviate simultaneously
from the corresponding values $\pi_{i1}^{(0)}, \ldots, \pi_{ir_0}^{(0)}$ under the model (the assumed scenario of
equal item parameters) by a practically relevant amount $\delta_{i1} = \cdots = \delta_{ir_0}$ (which is equal for all
score groups belonging to the low score region). Thus, the alternative conditional probabilities
$\pi_{i1}^{(1)}, \ldots, \pi_{ir_0}^{(1)}$ are determined so that $\pi_{ir}^{(1)} - \pi_{ir}^{(0)} = \delta_{ir}$ holds for r = 1, . . . , r0 . The deviations
concerning all other conditional probabilities are then determined analogously to (21), (22), and
(23) accounting for restrictions (17) and (18). Assume again that this scenario yields a particular
value of the global model deviation, say δ. Consider the case where the true absolute deviations
concerning the selected item i and the score groups of the low score region are smaller than the
absolute value of the predetermined deviation δi1 = · · · = δir0 but where the true extent of the
global model deviation is equal to δ (because the true item parameters are not equal and/or the
true absolute deviations concerning other item and score group combinations are greater than for
the considered local scenario above). Then the acceptance of the model will also be considered
an error of practical importance even though the true absolute deviations concerning item i and
the score groups of the low score region are smaller than the absolute value of the predetermined
deviation δi1 = · · · = δir0 .
Table 2 shows the results for different (larger) numbers of items and two different extents of
a local model deviation δ11 = · · · = δ1r0 with regard to item 1 and the low score region. It shows
the obtained values of the ceiling function of n. Again, the cases with r = 0 and r = k are not
included in the total sample sizes.
Note that one has to make sure that the alternative conditional probabilities $\pi_{i1}^{(1)}, \ldots, \pi_{ir_0}^{(1)}$
do not fall outside of the admissible interval. For larger item numbers it is thus recommended
to choose a positive sign for δi1 = · · · = δir0 . The examples show an interesting result from
a theoretical and practical point of view. For each of the two different local model deviations
considered in Table 2, a value of the global measure of model deviation is obtained which is
approximately equal for all item numbers. The denominator of (15) takes approximately the value
1/100 for the local model deviation δ11 = · · · = δ1r0 = 0.05 and approximately 4/100 for the case
δ11 = · · · = δ1r0 = 0.1, independent of the item number. This observation may lead to a useful
classification of the global measure of model deviation in order for a few practically relevant
categories to represent different levels of the global model deviation. Such a classification would
probably improve the practicality of the procedure and be of great help for applied researchers.
One may consider three categories low, middle, and high, analogous to the classifications of
different measures regarding different statistical tests by Cohen (1988). Determining a practically
relevant model deviation could then easily be realized by choosing one of three values of the
denominator of (15).

7. Discussion

Glas and Verhelst (1995a, 1995b) have reviewed the most prominent approaches to testing
the Rasch model. This paper is generally concerned with supplementing statistical tests of the
Rasch model so that the probability of the error of the second kind (Type II probability) can, in
addition to the probability of the error of the first kind (Type I probability), be controlled at a
predetermined level. The focus in particular lies on the Wald test (Wald, 1943), which was also
discussed by Glas and Verhelst (1995a, 1995b). The motivation for this paper stems from the un-
derstanding that, from a practical viewpoint, the negative consequences of an error of the second
kind are at least as serious as those of an error of the first kind. Many researchers applying the
Rasch model will probably agree that in many cases the consequences of an error of the second
kind are even more serious. Thus, statistical tests for the Rasch model that control the
error of the second kind at a predetermined, desired level are of great practical interest. Unfortunately,
to the author’s knowledge, there has been no paper providing a satisfactory solution for
the problem under consideration, i.e., the predetermination of a practically meaningful and rele-
vant deviation from the Rasch model as well as the derivation of the probability distribution of a
statistic under the predetermined deviation. In this paper a solution is proposed which is based on
a three-step approach. The first step assumes one proper scenario with regard to the values of the
item parameters. The second step determines practically relevant deviations from the conditional
probabilities which follow from the assumed scenario (values of the item parameters) of the first
step. This is referred to as a local model deviation or one scenario of a local model deviation.
The third step makes an assumption about the weights, the probabilities of observing persons
in the different score groups. From the determination of one particular scenario of local model
deviation (alternative conditional probabilities) and particular weights, a value of the global extent
of model deviation will be obtained. This is a value of the denominator of (15). Disregarding
the infinitely many possible scenarios concerning the values of the item parameters in the first
step, the infinitely many possibilities of determining model deviations locally in the second step,
and the infinitely many possibilities of choosing weights in the third step has no influence on
the result (total sample size). All combinations of scenarios from the first step, scenarios from
the second step, and scenarios regarding the weights which yield the same value of the global
measure of model deviation will be considered equivalent extents of model deviation. Thus, the
consideration of only one combination of scenarios of the three steps suffices. The particular
chosen combination of scenarios of the three steps merely serves the purpose of predetermining
the global extent of model deviation. The optimal total sample size depends on the predetermined
global extent of deviation and the levels of the error probabilities α, β only, not on each of the
infinitely many combinations of scenarios of the three steps which yield an equal global extent.
It is customary in statistics to utilize global measures of model discrepancy on which sam-
ple size and power considerations are based. For instance, the approach by Cohen (1988) for a
number of different statistical tests is well known. Another nice example occurs in the context of
structural equation modeling. Satorra and Saris (1985) discuss a procedure (to determine a global
discrepancy) which uses the quadratic expression of the Wald statistic also used in this paper.
Another point worth discussing concerns the formulation of the alternative hypothesis of the
test described here. The determination of a practically meaningful model deviation is based on
a partition of the population of respondents. It is assumed that different Rasch models (different
item parameters) hold for different subpopulations. If the partition is based on the person score
which is a sufficient statistic for the person parameter, such a formulation of the alternative
will also cover models assuming unequal item discriminations as well as models with a lower
asymptote parameter like the 2PL and 3PL models.
The first part of this paper deals with a general class of Rasch models which includes models
for polytomously scored items and multidimensional models. The last part of the paper is con-
cerned with the dichotomous model only, where a concrete approach of determining practically
relevant model deviations (choosing a combination of scenarios of the three steps above) is dis-
cussed in detail. The main purpose is to show the practicality of the procedure based on a simple
case. The application of the presented approach of determining model deviations to other models
belonging to the general class of Rasch models (1) is straightforward.
As a perspective with regard to subsequent research, it would probably be of help for applied
researchers if the global measure of model deviation given by the denominator of (15) could be
classified according to a few practically relevant categories (e.g., low, middle, high) representing
different levels of model deviation. For given values of the probabilities of the errors of the first
and second kind, the optimal total sample sizes could then be provided for the different practi-
cally relevant levels of the global measure of model deviation, analogous to the tables by Cohen
(1988). Tentative results for the dichotomous Rasch model are provided in this paper. These re-
sults show that, independent of the number of items, two global levels of model deviation can
be distinguished. Furthermore, it may also be of interest for subsequent research to treat power
and sample size considerations concerning other test statistics mentioned in the introduction and
compare them with the results in this paper.

References

Agresti, A. (2002). Categorical data analysis. New York: Wiley.

Aitchison, J., & Silvey, S.D. (1958). Maximum likelihood estimation of parameters subject to restraints. Annals of
Mathematical Statistics, 29, 813–828.
Andersen, E.B. (1970). Asymptotic properties of conditional maximum likelihood estimators. Journal of the Royal
Statistical Society, Series B, 32, 283–301.
Andersen, E.B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.
Andersen, E.B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42, 69–81.
Andersen, E.B. (1980). Discrete statistical models with social science applications. Amsterdam: North-Holland.
Andersen, E.B. (1995). Polytomous Rasch models and their estimation. In G.H. Fischer, & I.W. Molenaar (Eds.), Rasch
models—foundations, recent developments and applications (pp. 271–291). New York: Springer.
Andrich, D. (1978a). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.
Andrich, D. (1978b). A binomial latent trait model for the study of Likert-style attitude questionnaires. British Journal
of Mathematical and Statistical Psychology, 31, 84–98.
Chen, Y., & Small, D. (2005). Exact tests for the Rasch model via sequential importance sampling. Psychometrika, 70,
11–30.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York: Erlbaum.
Fischer, G.H. (1974). Einführung in die Theorie psychologischer Tests. (Introduction to the theory of psychological tests).
Bern: Huber.
Glas, C.A.W. (1988). The derivation of some tests for the Rasch model from the multinomial distribution. Psychometrika,
53, 525–546.
Glas, C.A.W., & Verhelst, N.D. (1989). Extensions of the partial credit model. Psychometrika, 54, 635–659.
Glas, C.A.W., & Verhelst, N.D. (1995a). Testing the Rasch model. In G.H. Fischer, & I.W. Molenaar (Eds.), Rasch
models—foundations, recent developments and applications (pp. 69–95). New York: Springer.
Glas, C.A.W., & Verhelst, N.D. (1995b). Tests of fit for polytomous Rasch models. In G.H. Fischer, & I.W. Molenaar
(Eds.), Rasch models—foundations, recent developments and applications (pp. 325–352). New York: Springer.
Kelderman, H. (1984). Log linear Rasch model tests. Psychometrika, 49, 223–245.
Kelderman, H. (1989). Item bias detection using log linear IRT. Psychometrika, 54, 681–697.
Martin-Löf, P. (1973). Statistiska modeller. (Statistical models. Notes from seminars 1969–1970 by Rolf Sundberg.)
Stockholm.
Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Molenaar, I.W. (1983). Some improved diagnostics for failure of the Rasch model. Psychometrika, 48, 49–72.
Ponocny, I. (2001). Nonparametric goodness-of-fit tests for the Rasch model. Psychometrika, 66, 437–460.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: The Danish Institute of
Education Research (Expanded Edition, 1980. Chicago: University of Chicago Press).
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. Berkeley: University of California
Press.
Satorra, A., & Saris, W.E. (1985). The power of the likelihood ratio test in covariance structure analysis. Psychometrika,
50, 83–90.
Snijders, T.A.B. (1991). Enumeration and simulation methods for 0-1 matrices with given marginals. Psychometrika, 56,
397–417.
Stroud, T.W.F. (1972). Fixed alternatives and Wald’s formulation of the noncentral asymptotic behavior of the likelihood
ratio statistic. Annals of Mathematical Statistics, 43, 447–454.
van den Wollenberg, A. (1982). Two new test statistics for the Rasch model. Psychometrika, 47, 123–140.
Verhelst, N.D. (2008). An efficient MCMC algorithm to sample binary matrices with fixed marginals. Psychometrika,
73, 705–728.
Verhelst, N.D., & Glas, C.A.W. (1995). The one parameter logistic model. In G.H. Fischer, & I.W. Molenaar (Eds.),
Rasch models—foundations, recent developments and applications (pp. 215–237). New York: Springer.
Verhelst, N.D., Glas, C.A.W., & Verstralen, H.H.F.M. (1994). OPLM: Computer program and manual. Arnhem: CITO.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large.
Transactions of the American Mathematical Society, 54, 426–482.
Wilson, M., & Masters, G.N. (1993). The partial credit model and null categories. Psychometrika, 58, 87–99.

Manuscript Received: 25 SEP 2008

Final Version Received: 12 MAY 2010
Published Online Date: 15 OCT 2010

