Biometrika (2005), 92, 3, pp. 519–528
© 2005 Biometrika Trust
Printed in Great Britain
A note on composite likelihood inference and model selection
BY CRISTIANO VARIN
Department of Statistics, University of Padova, via C. Battisti 241, I-35121 Padova, Italy
[email protected]

AND PAOLO VIDONI

Department of Statistics, University of Udine, via Treppo 18, I-33100 Udine, Italy

[email protected]
SUMMARY
A composite likelihood consists of a combination of valid likelihood objects, usually
related to small subsets of data. The merit of composite likelihood is to reduce the com-
putational complexity so that it is possible to deal with large datasets and very complex
models, even when the use of standard likelihood or Bayesian methods is not feasible. In
this paper, we aim to suggest an integrated, general approach to inference and model
selection using composite likelihood methods. In particular, we introduce an information
criterion for model selection based on composite likelihood. We also describe applications
to the modelling of time series of counts through dynamic generalised linear models and to
the analysis of the well-known Old Faithful geyser dataset.
Some key words: Dynamic generalised linear model; Hidden Markov model; Information criterion; Old Faithful
geyser data; Pairwise likelihood.
1. INTRODUCTION
In a number of applications, the presence of large correlated datasets or the specification
of highly structured statistical models makes infeasible the computation of the likelihood
function. An alternative to ordinary likelihood methods or Bayesian strategies is to adopt
simpler pseudolikelihoods, like those belonging to the composite likelihood class (Lindsay,
1988). A composite likelihood consists of a combination of valid likelihood objects, usually
related to small subsets of data. It has good theoretical properties and it behaves well
in many applications concerning, for example, spatial statistics (Hjort & Omre, 1994;
Heagerty & Lele, 1998; Varin et al., 2005), multivariate survival analysis (Parner, 2001),
generalised linear mixed models (Renard et al., 2004), frailty models (Henderson &
Shimakura, 2003) and genetics (Fearnhead & Donnelly, 2002). In this paper, we develop
and justify an integrated, general approach for inference and model selection, using com-
posite likelihood methods. In particular, we focus on a new information criterion for
model selection, which is the composite likelihood counterpart of Takeuchi's information criterion (Takeuchi, 1976; Shibata, 1989, p. 222; Burnham & Anderson, 2002).
2. INFERENCE AND MODEL SELECTION VIA COMPOSITE LIKELIHOOD
The term composite likelihood (Lindsay, 1988) denotes a rich class of pseudolikelihoods
based on likelihood-type objects.
DEFINITION 1. Let $\{f(y; \theta),\, y \in \mathcal{Y},\, \theta \in \Theta\}$ be a parametric statistical model, with $\mathcal{Y} \subseteq \mathbb{R}^n$, $\Theta \subseteq \mathbb{R}^d$, $n \geq 1$ and $d \geq 1$. Consider a set of events $\{A_i : A_i \in \mathcal{F},\, i \in I\}$, where $I \subseteq \mathbb{N}$ and $\mathcal{F}$ is some sigma algebra on $\mathcal{Y}$. A composite likelihood is defined as
$$L_C(\theta; y) = \prod_{i \in I} f(y \in A_i; \theta)^{w_i},$$
where $f(y \in A_i; \theta) = f(\{y_j \in \mathcal{Y} : y_j \in A_i\}; \theta)$, with $y = (y_1, \ldots, y_n)$, while $\{w_i,\, i \in I\}$ is a set of suitable weights. The associated composite loglikelihood is $\ell_C(\theta; y) = \log L_C(\theta; y)$.
With the full likelihood $L(\theta; y)$ viewed as a special case, we can group composite
likelihoods into two general classes. The first includes ‘subsetting methods’ and it
contains, for example, the pairwise likelihood (Cox & Reid, 2004), which is based on
marginal events related to pairs of observations. Analogously, we may define the tripletwise
likelihood and so on. These composite likelihoods will be considered in the applications
of § 4. The second class is based on ‘omission methods’, since composite likelihoods
are obtained by omitting components of the full likelihood. The mth-order likelihood
(Azzalini, 1983), $f(y_1) \prod_{i=2}^{n} f(y_i \mid y_{i-m}^{i-1})$, with $y_{i-m}^{i-1} = (y_{i-m}, \ldots, y_{i-1})$ and $m \in \{1, \ldots, n-1\}$
fixed, belongs to this class. Other examples are the Besag (1974) pseudolikelihood and
the partial likelihood (Cox, 1975).
Since each component of $L_C(\theta; y)$ is a likelihood object, it is almost immediate to state that the estimating equation $\nabla \ell_C(\theta; y) = 0$ is unbiased, under the usual regularity conditions. The associated maximum composite likelihood estimator $\hat\theta_{MCL} = \hat\theta_{MCL}(Y)$ is consistent and asymptotically normally distributed, with mean $\theta$ and variance matrix $H(\theta)^{-1} J(\theta) H(\theta)^{-T}$. Here, $J(\theta) = \mathrm{var}_f\{\nabla \ell_C(\theta; Y)\}$ and $H(\theta) = E_f\{\nabla^2 \ell_C(\theta; Y)\}$, where the expectations are with respect to $f(y; \theta)$. Although a deep analysis of efficiency issues is still lacking, some useful results may be found in Lindsay (1988) and Cox & Reid (2004).
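As a concrete illustration of this machinery (our sketch, not the authors' code), consider a directly observed stationary Gaussian AR(1) process and the pairwise likelihood built from consecutive pairs with unit weights $w_i$; the bivariate marginals are exact, so $\ell_C$ can be written down and maximised directly. The model and all tuning values below are illustrative assumptions.

```python
# A minimal sketch (ours): maximum pairwise likelihood for a directly
# observed stationary Gaussian AR(1) process, consecutive pairs, w_i = 1.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

def simulate_ar1(n, lam, sigma):
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1 - lam**2))  # stationary start
    for i in range(1, n):
        x[i] = lam * x[i - 1] + rng.normal(0.0, sigma)
    return x

def pairwise_loglik(theta, y):
    """l_P: sum of the bivariate Gaussian logdensities of (y_{i-1}, y_i)."""
    lam, sigma = theta
    if not (-1 < lam < 1) or sigma <= 0:
        return -np.inf
    v = sigma**2 / (1 - lam**2)                 # stationary variance
    cov = np.array([[v, lam * v], [lam * v, v]])
    pairs = np.column_stack([y[:-1], y[1:]])
    return multivariate_normal(mean=[0, 0], cov=cov).logpdf(pairs).sum()

y = simulate_ar1(500, lam=0.35, sigma=1.0)
fit = minimize(lambda th: -pairwise_loglik(th, y), x0=[0.0, 0.5],
               method="Nelder-Mead")
print("maximum pairwise likelihood estimate:", fit.x)  # near (0.35, 1.0)
```

The sandwich matrices $H(\theta)$ and $J(\theta)$ for such an estimator are taken up in § 3.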
In this paper, we aim to emphasise the use of composite likelihood methods both for
making inference and for model selection purposes. We shall introduce a predictive model
selection procedure based on the following generalisation of the Kullback–Leibler
information.
DEFINITION 2. Given a random variable $Z = (Z_1, \ldots, Z_n)$, with density $g(z)$, the composite Kullback–Leibler divergence of a density $h(z)$ relative to $g(z)$ is
$$I_C(g, h) = E_{g(z)}\left[\log \frac{L_C(g; Z)}{L_C(h; Z)}\right] = \sum_{i \in I} E_{g(z)}\{\log g(Z \in A_i) - \log h(Z \in A_i)\}\, w_i,$$
where $L_C(g; Z) = \prod_{i \in I} g(Z \in A_i)^{w_i}$ and $L_C(h; Z) = \prod_{i \in I} h(Z \in A_i)^{w_i}$.
Note that $I_C(g, h)$ is defined as the expectation, with respect to the true density $g(z)$,
of the difference between the composite loglikelihoods associated with g(z) and h(z),
respectively. Moreover, it is a linear combination of the ordinary Kullback–Leibler
divergences, corresponding to the likelihood objects forming the composite likelihood
function.
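As an illustration of Definition 2 (our example, not from the paper), when $g$ and $h$ are zero-mean multivariate normal densities and the events $A_i$ are the coordinate pairs with unit weights, each term of $I_C(g, h)$ is an ordinary bivariate Gaussian Kullback–Leibler divergence, available in closed form:

```python
# A small sketch (ours): composite Kullback-Leibler divergence between two
# zero-mean trivariate normal densities, events = coordinate pairs, w_i = 1.
import numpy as np
from itertools import combinations

def gaussian_kl(S0, S1):
    """KL(N(0, S0) || N(0, S1)) for d-dimensional covariance matrices."""
    d = S0.shape[0]
    inv1 = np.linalg.inv(S1)
    return 0.5 * (np.trace(inv1 @ S0) - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def composite_kl(Sg, Sh):
    """I_C(g, h): sum over coordinate pairs of the marginal KL divergences."""
    d = Sg.shape[0]
    return sum(gaussian_kl(Sg[np.ix_(p, p)], Sh[np.ix_(p, p)])
               for p in combinations(range(d), 2))

Sg = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]])
Sh = np.eye(3)                   # h wrongly assumes independence
print(composite_kl(Sg, Sh))      # positive, and zero when Sh equals Sg
```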
Consider the sample $Y = (Y_1, \ldots, Y_n)$ and a parametric statistical model specified by the family of density functions $\{f(y; \theta),\, y \in \mathcal{Y},\, \theta \in \Theta\}$, with respect to a common dominating
measure. There might be several plausible statistical models for Y , which may or may not
contain the true g(y). We would like to choose the model which offers the most satisfactory
predictive description of the observed data y. To be more precise, if Z is a future random
variable, defined as an independent copy of Y , we are interested in the choice of the ‘best’
model for forecasting Z, given a realisation of Y , using composite likelihood methods.
As usual for an information criterion, model selection can be approached on the basis
of the expected composite Kullback–Leibler information between the true density g(z)
and the estimated density $\hat f(z) = f(z; \hat\theta_{MCL})$, under the assumed statistical model: we select the model which minimises $E_{g(y)}\{I_C(g, \hat f)\}$ or, equivalently, which maximises
$$Q(g, f) = \sum_{i \in I} E_{g(y)}[E_{g(z)}\{\log f(Z \in A_i; \hat\theta_{MCL})\}]\, w_i. \tag{1}$$
Equation (1) defines a theoretical criterion for predictive model selection, using composite
likelihood. However, it requires the knowledge of the true density g(z). Thus, in practice
we should maximise a selection statistic $\hat Q(g, f)$, defined as a suitable estimator for $Q(g, f)$,
based on Y . In particular, we look for estimators that are unbiased, either exactly or to
the relevant order of approximation. A natural estimator is
$$\ell_C(\hat\theta_{MCL}; Y) = \log L_C(\hat\theta_{MCL}; Y) = \sum_{i \in I} \log f(Y \in A_i; \hat\theta_{MCL})\, w_i,$$
which is the sample counterpart of (1) and corresponds to the maximised composite loglikelihood. In the following section, we prove that $\ell_C(\hat\theta_{MCL}; Y)$ is biased and we introduce a modification which corrects the first-order bias.
3. A COMPOSITE-LIKELIHOOD INFORMATION CRITERION
Since standard likelihood theory under misspecification (White, 1994) can usually be applied, with modest changes, within the composite likelihood context, we state the following usual regularity assumptions.
Assumption 1. The parameter space $\Theta$ is a compact subset of $\mathbb{R}^d$ ($d \geq 1$) and, for every fixed $y \in \mathcal{Y}$, $L_C(\theta; y)$ is twice differentiable, with continuity, with respect to $\theta$.

Assumption 2. The estimator $\hat\theta_{MCL}$ is defined as a solution to the composite likelihood equation and there exists a vector $\theta_* \in \mathrm{int}(\Theta)$ such that, exactly or with an error term that is negligible as $n \to +\infty$, $E_{g(y)}\{\nabla \ell_C(\theta_*; Y)\} = 0$.

Assumption 3. The estimator $\hat\theta_{MCL}$ is consistent for $\theta_*$ and asymptotically normally distributed.
The quantity $\theta_* \in \mathrm{int}(\Theta)$ is a pseudo-true parameter value, such that the composite Kullback–Leibler divergence between $g(y)$ and $f(y; \theta)$ is minimal. If the model is correctly specified for $Y$, $g(y) = f(y; \theta_0)$, for some $\theta_0 \in \mathrm{int}(\Theta)$, which is the true parameter value.
Note that, since we are concerned with model selection problems, two potential sources
of misspecification are involved: the first one reflects the fact that the true distribution
may not belong to the working family of distributions and the second is related to the
use of the composite likelihood instead of the full likelihood function. As emphasised in
the following, when using composite likelihood methods, we have to consider likelihood
theory under misspecification even if the true model for the data is taken into account.
The following lemmas motivate the main result of the paper. The arguments involve
expansions that are standard in full likelihood considerations and are similar to those
leading to Takeuchi’s information criterion, discussed below. However, the results are
presented here in a more general context, using composite likelihood.
LEMMA 1. Under Assumptions 1–3, we have that
$$Q(g, f) = E_{g(y)}\{\ell_C(\theta_*; Y)\} + \tfrac{1}{2}\, \mathrm{tr}\{J(\theta_*) H(\theta_*)^{-1}\} + o(1),$$
with
$$J(\theta_*) = \mathrm{var}_{g(y)}\{\nabla \ell_C(\theta_*; Y)\}, \qquad H(\theta_*) = E_{g(y)}\{\nabla^2 \ell_C(\theta_*; Y)\}, \tag{2}$$
where the expectations are with respect to $g(y)$.

Proof. Consider the stochastic Taylor expansion for $\ell_C\{\hat\theta_{MCL}(Y); Z\}$ around $\hat\theta_{MCL}(Y) = \theta_*$:
$$\ell_C\{\hat\theta_{MCL}(Y); Z\} = \ell_C(\theta_*; Z) + \{\hat\theta_{MCL}(Y) - \theta_*\}^{T} \nabla \ell_C(\theta_*; Z) + \tfrac{1}{2}\{\hat\theta_{MCL}(Y) - \theta_*\}^{T}\{\nabla^2 \ell_C(\theta_*; Z)\}\{\hat\theta_{MCL}(Y) - \theta_*\} + o_p(1).$$
Taking expectations term by term, with respect to the true distribution of $Z$, since $Y$ and $Z$ are independent and identically distributed, and since Assumption 2 holds, we have
$$E_{g(z)}[\ell_C\{\hat\theta_{MCL}(Y); Z\}] = E_{g(y)}\{\ell_C(\theta_*; Y)\} + \tfrac{1}{2}\{\hat\theta_{MCL}(Y) - \theta_*\}^{T} E_{g(z)}\{\nabla^2 \ell_C(\theta_*; Z)\}\{\hat\theta_{MCL}(Y) - \theta_*\} + o_p(1).$$
Moreover, the mean value of the above expansion, with respect to the true distribution of $Y$, gives
$$Q(g, f) = E_{g(y)}\{\ell_C(\theta_*; Y)\} + \tfrac{1}{2}\, \mathrm{tr}\{H(\theta_*) V(\theta_*)\} + o(1), \tag{3}$$
where $V(\theta_*) = E_{g(y)}[\{\hat\theta_{MCL}(Y) - \theta_*\}\{\hat\theta_{MCL}(Y) - \theta_*\}^{T}]$. By means of standard asymptotic arguments concerning the variance matrix of $\hat\theta_{MCL}(Y)$, we obtain
$$V(\theta_*) = H(\theta_*)^{-1} J(\theta_*) H(\theta_*)^{-1} + o(n^{-1}). \tag{4}$$
Plugging (4) into (3) completes the proof. □
LEMMA 2. Under Assumptions 1–3, we have that
$$E_{g(y)}[\ell_C\{\hat\theta_{MCL}(Y); Y\}] = E_{g(y)}\{\ell_C(\theta_*; Y)\} - \tfrac{1}{2}\, \mathrm{tr}\{J(\theta_*) H(\theta_*)^{-1}\} + o(1),$$
with $J(\theta_*)$ and $H(\theta_*)$ given by (2).

Proof. Consider the stochastic Taylor expansion for $\ell_C\{\hat\theta_{MCL}(Y); Y\}$ around $\hat\theta_{MCL}(Y) = \theta_*$:
$$\ell_C\{\hat\theta_{MCL}(Y); Y\} = \ell_C(\theta_*; Y) + \{\hat\theta_{MCL}(Y) - \theta_*\}^{T} \nabla \ell_C(\theta_*; Y) + \tfrac{1}{2}\{\hat\theta_{MCL}(Y) - \theta_*\}^{T} \nabla^2 \ell_C(\theta_*; Y)\{\hat\theta_{MCL}(Y) - \theta_*\} + o_p(1).$$
Since, by standard asymptotic arguments, $\nabla \ell_C(\theta_*; Y)$ may be approximated by
$$-\{\hat\theta_{MCL}(Y) - \theta_*\}^{T} \nabla^2 \ell_C(\theta_*; Y),$$
we obtain
$$\ell_C\{\hat\theta_{MCL}(Y); Y\} = \ell_C(\theta_*; Y) - \tfrac{1}{2}\{\hat\theta_{MCL}(Y) - \theta_*\}^{T} \nabla^2 \ell_C(\theta_*; Y)\{\hat\theta_{MCL}(Y) - \theta_*\} + o_p(1).$$
Taking expectations, with respect to the true distribution of $Y$, and using relationship (4) and $\nabla^2 \ell_C(\theta_*; Y) = E_{g(y)}\{\nabla^2 \ell_C(\theta_*; Y)\} + o_p(n)$, we complete the proof. □
From these lemmas, it is immediate to see that $\ell_C(\hat\theta_{MCL}; Y)$ is biased and that, under
standard regularity conditions, the following information criterion is a first-order unbiased
estimator for Q(g, f ).
DEFINITION 3. Consider a random sample $Y$, as previously defined. The composite likelihood information criterion selects the model maximising
$$\ell_C(\hat\theta_{MCL}; Y) + \mathrm{tr}\{\hat J(Y)\, \hat H(Y)^{-1}\}, \tag{5}$$
where $\hat J(Y)$ and $\hat H(Y)$ are consistent, first-order unbiased estimators for $J(\theta_*)$ and $H(\theta_*)$, respectively.
The statistic (5) is a generalisation of Takeuchi's information criterion, which takes the same form but with the parameter estimator being the ordinary maximum likelihood estimator $\hat\theta_{ML}$ and the quantities $\hat J(Y)$ and $\hat H(Y)$ computed from the ordinary likelihood function. In that setting, when the candidate model includes the true one, the information identity $J(\theta) = -H(\theta)$ holds and the selection statistic reduces to the more familiar Akaike (1973) criterion $\ell(\hat\theta_{ML}; Y) - d$. In the present setting this simplification will never occur, since the true model for the data does not play the role of the true model for the composite likelihood; that is, the information identity does not hold.
Finally, we briefly mention two further important points. The first concerns the choice
of the weights in the composite likelihood. Typically, with regard to the pairwise likelihood,
the weights are chosen in order to eliminate non-neighbour pairs of observations, which
should be less informative; see for example Nott & Rydén (1999).
The second point concerns the efficient estimation of $J(\theta)$ and $H(\theta)$. The latter does not pose difficulties and, under standard regularity conditions, a consistent estimator is $\hat H\{\hat\theta_{MCL}(Y)\} = \nabla^2 \ell_C\{\hat\theta_{MCL}(Y); Y\}$. Much more difficult is the estimation of $J(\theta)$, since the associated naive estimator $\hat J(\theta) = \nabla \ell_C(\theta; Y)\, \nabla \ell_C(\theta; Y)^{T}$ vanishes when evaluated at $\theta = \hat\theta_{MCL}(Y)$. If independent observations of the random vector $Y$ are available, $J(\theta)$ can be estimated by the sample variance of the associated individual contributions to the composite score function. On the other hand, if we observe only a single replicate of $Y$, as in the case of time series data, a different strategy is needed.
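For the independent-replicates case just described, here is a minimal sketch of the criterion (5) (ours, not the authors' code): $J$ is estimated from the replicate-level score contributions, $H$ by finite differences; the exchangeable trivariate normal model, the numerical derivatives and all tuning values are illustrative assumptions.

```python
# A sketch (ours) of criterion (5) with N iid replicates of a trivariate
# normal with exchangeable correlation rho, fitted by pairwise likelihood.
import numpy as np
from itertools import combinations
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
d, N, rho_true = 3, 200, 0.4
S = (1 - rho_true) * np.eye(d) + rho_true * np.ones((d, d))
Y = rng.multivariate_normal(np.zeros(d), S, size=N)
PAIRS = [list(p) for p in combinations(range(d), 2)]

def loglik_rows(rho):
    """Replicate-level contributions to the pairwise loglikelihood."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=cov)
    return sum(mvn.logpdf(Y[:, p]) for p in PAIRS)

fit = minimize(lambda r: -loglik_rows(r[0]).sum(), x0=[0.0],
               method="Nelder-Mead")
rho_hat, eps = fit.x[0], 1e-4

# J: N times the sample variance of the per-replicate scores (central diff.)
scores = (loglik_rows(rho_hat + eps) - loglik_rows(rho_hat - eps)) / (2 * eps)
J_hat = np.atleast_2d(N * scores.var())
# H: finite-difference second derivative of the total loglikelihood
H_hat = np.atleast_2d((loglik_rows(rho_hat + eps).sum()
                       - 2 * loglik_rows(rho_hat).sum()
                       + loglik_rows(rho_hat - eps).sum()) / eps**2)

clic = loglik_rows(rho_hat).sum() + np.trace(J_hat @ np.linalg.inv(H_hat))
print(rho_hat, clic)   # tr(J H^{-1}) < 0 here, since H is negative definite
```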
In this case, one might estimate $J(\theta)$ by defining a partition of the sample $Y$ such that the associated contributions to the composite score function are approximately uncorrelated. The sample variance matrix of these contributions may define an estimator for $J(\theta)$. However, the specification of this partition can be problematic for time series data, especially in the case of long-range dependence. As an alternative, a parametric bootstrap
procedure may be defined. However, in this case, the model has to be considered as
correctly specified and, if the data are high-dimensional, the computation can be very
time-consuming. A further possibility for time series and spatial data that do not seriously
depart from the condition of stationarity is to consider a resampling procedure called
window subsampling; see Heagerty & Lumley (2000) and references therein. This method
has been developed in the context of pairwise likelihood for binary spatial data by
Heagerty & Lele (1998). The idea behind window subsampling is to define suitable over-
lapping subseries of the original data that may be viewed as independent, identically
distributed, replicated observations. If we consider all the overlapping subseries of
dimension $m$, a suitable estimator for $J(\theta)$ is
$$\hat J_m(\theta) = \frac{1}{n-m+1} \sum_{i=1}^{n-m+1} \frac{m}{n}\, \nabla \ell_C(\theta; Y_i^{i+m})\, \nabla \ell_C(\theta; Y_i^{i+m})^{T},$$
evaluated at $\theta = \hat\theta_{MCL}(Y)$, where $Y_i^{i+m} = (Y_i, \ldots, Y_{i+m})$. More refined estimators, and useful comments on the choice of the window dimension $m$, may be found in Heagerty & Lumley (2000).
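A sketch of the window-subsampling estimator just displayed (ours, not the authors' code): for simplicity the subseries is taken as the length-$m$ window starting at $i$, and the user-supplied function score_fn, which returns the composite score of a (sub)series, is an assumption of this illustration.

```python
# A sketch (ours) of the window-subsampling estimator of J for a single
# time series: average of outer products of subseries composite scores,
# scaled by m/n as in the displayed formula.
import numpy as np

def J_window(score_fn, theta_hat, y, m):
    """score_fn(theta, y) is assumed to return the composite score vector
    of the (sub)series y; windows are the overlapping length-m subseries."""
    y = np.asarray(y)
    n = len(y)
    terms = [np.outer(u, u)
             for i in range(n - m + 1)
             for u in [np.atleast_1d(score_fn(theta_hat, y[i:i + m]))]]
    return (m / n) * np.mean(terms, axis=0)
```

In § 4·1 below, score_fn would be the score of the pairwise likelihood (6) and the window dimension would be $m = 50$.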
4. APPLICATIONS
4·1. Testing overdispersion in dynamic generalised linear models
When modelling count data, usually by means of the Poisson distribution, one often
needs to account for overdispersion. Here, by means of a simulation study, we discuss the
use of composite likelihood methods for testing the presence of overdispersion with
regard to time series of count data. We compare two dynamic generalised linear models
(West & Harrison, 1997, p. 521) based respectively on the Poisson and on the negative
binomial distribution.
Given count data $y = (y_1, \ldots, y_n)$, consider a Poisson-AR(1) model, which is a partially observable stochastic process $\{X_i, Y_i\}_{i \geq 1}$. The unobservable part $\{X_i\}_{i \geq 1}$ is an AR(1) process and the observations $Y_i$ ($i \geq 1$) are conditionally independent given $\{X_i\}_{i \geq 1}$ and follow a Poisson distribution. More precisely, $Y_i \mid X_i = x_i \sim \mathrm{Po}(e^{x_i})$, for $i \geq 1$, and $X_i = \lambda X_{i-1} + \varepsilon_i$, for $i \geq 2$, with $\varepsilon_i \sim N(0, \sigma^2)$, independently for each $i$. We assume that $|\lambda| < 1$, so that the latent model is stationary, and we set $X_1 \sim N\{0, \sigma^2/(1-\lambda^2)\}$. An alternative model, useful for describing overdispersion, is obtained by substituting the Poisson distribution with a negative binomial distribution with mean $\mu_i = e^{x_i}$ and an additional size parameter $k > 0$, such that, for $i \geq 1$,
$$f(y_i \mid x_i; k) = \frac{\Gamma(k^{-1} + y_i)}{\Gamma(k^{-1})\, y_i!} \left(\frac{k\mu_i}{1 + k\mu_i}\right)^{y_i} \left(\frac{1}{1 + k\mu_i}\right)^{1/k} \quad (y_i \in \mathbb{N}),$$
where $\Gamma(\cdot)$ is the gamma function. Note that these two models are nested, since the second tends to the first as $k \to 0$. Although these models look simple, their analysis using standard likelihood-based procedures is problematic, since the computation of the likelihood function requires the evaluation of an intractable $n$-dimensional integral. Although approximate solutions involving simulation-based methods are possible (Durbin & Koopman, 2001, Ch. 11), we pursue a different strategy that relies on the pairwise likelihood based on consecutive pairs of observations:
$$L_P(\theta; y) = \prod_{i=2}^{n} \int\!\!\int f(y_i \mid x_i; \theta)\, f(y_{i-1} \mid x_{i-1}; \theta)\, f(x_i \mid x_{i-1}; \theta)\, f(x_{i-1}; \theta)\, dx_i\, dx_{i-1}. \tag{6}$$
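As noted above, the negative binomial model tends to the Poisson model as $k \to 0$; a quick numerical check (ours), assuming that scipy's parameterisation nbinom(n = 1/k, p = 1/(1 + k*mu)) matches the density displayed above:

```python
# A quick check (ours) that the negative binomial density tends to the
# Poisson density as k -> 0, using scipy's nbinom parameterisation.
from scipy.stats import nbinom, poisson

mu, y = 3.0, 4
for k in [1.0, 0.1, 0.01, 0.001]:
    print(k, nbinom.pmf(y, 1 / k, 1 / (1 + k * mu)), poisson.pmf(y, mu))
```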
In the first simulation study, we generated 500 datasets with $n = 300$ observations from the Poisson-AR(1) model with $\lambda = 0{\cdot}35$ and $\sigma = 1$. The two-dimensional integrals forming the pairwise likelihood (6) were approximated by means of double Gauss–Hermite quadrature with 10 nodes for each dimension. Using adaptive quadrature gave similar results. The sample means of the simulated pairwise likelihood estimators for $\lambda$ and $\sigma$ were respectively 0·3500 and 0·9820, with sample standard deviations 0·1128 and 0·0831. The good performance of the composite likelihood estimators also held for other values of $\lambda$ and $\sigma$.
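The quadrature step just described can be sketched as follows (our illustration, not the authors' code): writing $X_{i-1}$ in terms of its stationary law and $X_i \mid X_{i-1}$ as a Gaussian shift, each bivariate integral in (6) reduces to a double Gauss–Hermite sum.

```python
# A sketch (ours) of one pairwise term of (6) for the Poisson-AR(1) model
# via double Gauss-Hermite quadrature: substitute x_{i-1} = sqrt(2)*s*z1
# (s = stationary sd) and x_i = lam*x_{i-1} + sqrt(2)*sigma*z2.
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.stats import poisson

def pair_term(y_prev, y_curr, lam, sigma, nodes=10):
    z, w = hermgauss(nodes)
    s_st = sigma / np.sqrt(1 - lam**2)       # stationary standard deviation
    total = 0.0
    for z1, w1 in zip(z, w):
        x_prev = np.sqrt(2) * s_st * z1
        for z2, w2 in zip(z, w):
            x_curr = lam * x_prev + np.sqrt(2) * sigma * z2
            total += w1 * w2 * (poisson.pmf(y_prev, np.exp(x_prev))
                                * poisson.pmf(y_curr, np.exp(x_curr)))
    return total / np.pi                     # Gauss-Hermite normalisation

# the pairwise loglikelihood sums log pair_term over consecutive pairs
```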
The second simulation study deals with model selection. We generated 100 datasets with $n = 300$ observations from the negative binomial-AR(1) model with $\lambda = 0{\cdot}35$, $\sigma = 0{\cdot}5$ and various values for the size parameter $k$, namely 1, 1/2, 1/4, 1/8 and 0; $k = 0$ indicates that the simulations are from the Poisson-AR(1) model. We compared the two alternative models using the composite likelihood criterion (5). The matrix $J(\theta)$ was estimated with a window subsampling procedure with window dimension $m = 50$. Table 1(a) gives the frequencies of correct model selection over the 100 simulated datasets, for different values of $k$.
Table 1. Frequencies of correct model selection over the 100 simulated datasets with $k = 1, 1/2, 1/4, 1/8, 0$ and (a) $\lambda = 0{\cdot}35$, $\sigma = 0{\cdot}5$, and (b) $\lambda = -0{\cdot}6$, $\sigma = 0{\cdot}7$; $k = 0$ indicates a true Poisson-AR(1) model

                 (a)                                     (b)
  k=1   k=1/2   k=1/4   k=1/8   k=0      k=1   k=1/2   k=1/4   k=1/8   k=0
  100    90      64      59     55        99    82      64      45     58
Whenever $k \geq 1/2$, the criterion (5) almost always indicated the true model, that is the negative binomial one. Note that, as expected, as $k$ approaches zero we choose the wrong model more often, since the overdispersion in the observations is then only slight and the two models provide almost equivalent data descriptions, as detected by the composite Kullback–Leibler divergence. When $k$ is less than 1/4, the values of the selection statistics are usually very similar, as confirmed by the analysis of the magnitudes of the observed differences. Similar results are obtained with $\lambda = -0{\cdot}6$ and $\sigma = 0{\cdot}7$; see Table 1(b).
4·2. The Old Faithful geyser data
Here we present an application to the Old Faithful geyser dataset discussed in Azzalini
& Bowman (1990). We find that composite likelihood methods based on the tripletwise
likelihood perform well for parameter estimation but fail to distinguish between the two
models considered.
The data consist of a binary version of the time series of the duration of the successive eruptions of the Old Faithful geyser in the Yellowstone National Park in the period from 1 to 15 August 1985. The short and long eruptions, based on a threshold of 3 minutes, are labelled as 0 and 1, respectively. The random variables $N_r$, for $r = 0, 1$, indicate the corresponding observed numbers of eruptions, and $N_0 = 105$ and $N_1 = 194$. Also the one-step observed transitions $N_{rs}$, for $r, s = 0, 1$, from state $r$ to state $s$, are $N_{00} = 0$, $N_{10} = 105$, $N_{01} = 104$ and $N_{11} = 89$. Since $N_{00} = 0$, only five two-step observed transitions are nonnull: $N_{010} = 69$, $N_{110} = 35$, $N_{011} = 35$, $N_{101} = 104$ and $N_{111} = 54$. In order to find a
plausible model for these data, MacDonald & Zucchini (1997; § 4.2) compare a suitable
hidden Markov model based on the binomial distribution with the second-order Markov
chain model proposed by Azzalini & Bowman (1990). A hidden Markov model is a partially observable stochastic process $\{X_i, Y_i\}_{i \geq 1}$, such that the unobserved part, $\{X_i\}_{i \geq 1}$, is a Markov chain. The evaluation of the likelihood function requires $O(w^n)$ computations, with $w$ the number of states of the hidden chain, but MacDonald & Zucchini (1997) consider a suitable rearrangement of the terms involved, thereby significantly reducing the computational burden.
An alternative to the full likelihood may be found within the class of composite likelihoods. The simplest candidate is the pairwise likelihood, based on pairs of consecutive observations. However, for the two models under consideration, this is not useful since the associated likelihood equation has an infinite number of solutions. We therefore define a tripletwise likelihood based on triplets of consecutive observations.
We consider the two competing models. For the two-state second-order Markov chain, the tripletwise likelihood is easily computed and it involves $\mathrm{pr}(Y_{i-2} = r, Y_{i-1} = s, Y_i = t)$, for $i > 2$. In order to calculate these probabilities it is convenient to consider
$$D_{(sr)(ts)} = \mathrm{pr}(Y_{i-1} = s, Y_i = r \mid Y_{i-2} = t, Y_{i-1} = s) = \mathrm{pr}(Y_i = r \mid Y_{i-1} = s, Y_{i-2} = t),$$
for $r, s, t = 0, 1$, $i > 2$. It is easy to see that $D_{(10)(01)} = b$, $D_{(10)(11)} = c$ and $D_{(01)(10)} = 1$, with $b, c \in (0, 1)$ as unknown parameters. Here $\theta = (b, c)$ and the maximum tripletwise likelihood estimates are found to be $\hat b_{MTL} = 0{\cdot}6634$ and $\hat c_{MTL} = 0{\cdot}3932$, which equal the maximum
likelihood estimates. Graphical inspection of Fig. 1 shows, as expected, that the ordinary
loglikelihood has a more peaked form; indeed, the contour plots show different gradient
directions. The maximised tripletwise loglikelihood is $\ell_T(\hat b_{MTL}, \hat c_{MTL}; y) = -451{\cdot}5889$. Note that the maximum tripletwise likelihood estimates allow for equality between the estimated and observed frequencies for the five triplets of potential observations: the estimate for $\mathrm{pr}(Y_{i-2} = 0, Y_{i-1} = 1, Y_i = 0)$, based on $\hat b_{MTL}$ and $\hat c_{MTL}$, equals $N_{010}/(n-2)$, and similarly for the remaining triplets. We can therefore say that this model reaches a sort of 'best' possible fit, as detected by the tripletwise likelihood.
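Since, as just stated, the fitted triplet probabilities equal the observed relative frequencies, the tripletwise estimates and the maximised tripletwise loglikelihood can be recovered directly from the published counts; a small sketch (ours, under that assumption):

```python
# A sketch (ours): tripletwise fit of the second-order chain from the
# published two-step transition counts N_rst (r = Y_{i-2}, s = Y_{i-1},
# t = Y_i).  b and c are conditional relative frequencies, and the
# maximised tripletwise loglikelihood is sum N_rst * log(N_rst / (n-2)).
import numpy as np

N = {'010': 69, '110': 35, '011': 35, '101': 104, '111': 54}
total = sum(N.values())                       # n - 2 = 297

b_hat = N['010'] / (N['010'] + N['011'])      # pr(0 | Y_{i-1}=1, Y_{i-2}=0)
c_hat = N['110'] / (N['110'] + N['111'])      # pr(0 | Y_{i-1}=1, Y_{i-2}=1)
l_T = sum(n_rst * np.log(n_rst / total) for n_rst in N.values())
print(b_hat, c_hat, l_T)   # about 0.6634, 0.3932 and -451.5889
```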
Fig. 1. Contour plots of (a) the ordinary loglikelihood and (b) the tripletwise loglikelihood for the
second-order Markov chain model fitted to the Old Faithful geyser data.
With regard to the second model, we recall that, for a hidden Markov model, the tripletwise likelihood is
$$L_T(\theta; y) = \prod_{i=3}^{n} \sum_{x_{i-2}, x_{i-1}, x_i} f(x_{i-2}, x_{i-1}, x_i; \theta)\, f(y_{i-2} \mid x_{i-2}; \theta)\, f(y_{i-1} \mid x_{i-1}; \theta)\, f(y_i \mid x_i; \theta),$$
where $f(x_{i-2}, x_{i-1}, x_i; \theta)$ is the joint probability function of $(X_{i-2}, X_{i-1}, X_i)$ and the summation is over all the triplets of subsequent latent observations. In this case,
$$L_T(\theta; y) = \prod_{r, s, t \in \{0,1\}} \mathrm{pr}(Y_{i-2} = r, Y_{i-1} = s, Y_i = t)^{N_{rst}}.$$
We assume that the transition probabilities of the hidden Markov chain are
$$\mathrm{pr}(X_{i+1} = 1 \mid X_i = 0) = 1, \qquad \mathrm{pr}(X_{i+1} = 0 \mid X_i = 1) = a,$$
for $i \geq 1$, and $\mathrm{pr}(Y_i = y \mid X_i = 0) = \rho^{y}(1-\rho)^{1-y}$, $y = 0, 1$, $\mathrm{pr}(Y_i = 1 \mid X_i = 1) = 1$, for $i \geq 1$, with $a \in (0, 1)$ and $\rho \in (0, 1)$ as unknown parameters. If we assume stationarity it is not difficult to compute the relevant triplet probabilities. Here $\theta = (a, \rho)$ and the maximum tripletwise likelihood estimates are $\hat a_{MTL} = 0{\cdot}8948$ and $\hat\rho_{MTL} = 0{\cdot}2584$, while the maximum likelihood estimates, though not the same, are very similar; that is, $\hat a_{ML} = 0{\cdot}827$ and $\hat\rho_{ML} = 0{\cdot}225$.
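A companion sketch (ours, not the authors' code) for the hidden Markov model: the triplet probabilities follow from the stationary law of the hidden chain and the emission probabilities stated above, and numerical maximisation of the tripletwise loglikelihood should land near the published estimates.

```python
# A sketch (ours): tripletwise fit of the two-state hidden Markov model,
# with hidden transitions P = [[0, 1], [a, 1-a]], stationary law
# pi = (a, 1)/(1+a), and emissions pr(Y=y|X=0) = rho^y (1-rho)^(1-y),
# pr(Y=1|X=1) = 1.
import numpy as np
from itertools import product
from scipy.optimize import minimize

N = {'010': 69, '110': 35, '011': 35, '101': 104, '111': 54}

def triplet_prob(r, s, t, a, rho):
    P = np.array([[0.0, 1.0], [a, 1.0 - a]])
    pi = np.array([a, 1.0]) / (1.0 + a)
    def emit(y, x):                           # pr(Y = y | X = x)
        return rho**y * (1 - rho)**(1 - y) if x == 0 else float(y == 1)
    return sum(pi[x0] * emit(r, x0) * P[x0, x1] * emit(s, x1)
               * P[x1, x2] * emit(t, x2)
               for x0, x1, x2 in product([0, 1], repeat=3))

def neg_l_T(theta):
    a, rho = theta
    if not (0 < a < 1 and 0 < rho < 1):
        return np.inf
    return -sum(n * np.log(triplet_prob(int(k[0]), int(k[1]), int(k[2]),
                                        a, rho))
                for k, n in N.items())

fit = minimize(neg_l_T, x0=[0.5, 0.5], method="Nelder-Mead")
print(fit.x, -fit.fun)   # about (0.8948, 0.2584) and -451.5889
```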
Graphical inspection of the contour plots of the ordinary loglikelihood and of the
tripletwise loglikelihood leads to conclusions similar to those emphasised in the previous
case.
We find that the maximised tripletwise loglikelihood $\ell_T(\hat a_{MTL}, \hat\rho_{MTL}; y) = -451{\cdot}5889$ coincides with that obtained for the second-order Markov chain. This is a consequence of the perfect match between the estimated and the observed frequencies for the five triplets of potential observations, which holds for the two-state hidden Markov model as well. Indeed, the computation, using Monte Carlo simulation, of the bias correction term $\mathrm{tr}\{\hat J(Y)\, \hat H(Y)^{-1}\}$ specifying the composite likelihood information criterion also gives the same approximate value of 4·65 for the two models. Thus, in this case, the tripletwise
likelihood, though useful for inferential purposes, does not detect the high-order structural
differences between the two estimated models. This conclusion emphasises that a careful
choice of the composite likelihood is necessary, both for inference and model selection,
with the aim of balancing the improved computational facility and the reduced descriptive
ability that characterise pseudolikelihood procedures.
ACKNOWLEDGEMENT
The authors would like to thank Prof. A. Azzalini, Dr M. Chiogna and the associate
editor for helpful comments. This research is partially supported by grants from the
Ministry of Education and University, Italy.
REFERENCES
AKAIKE, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proc. Second International Symposium on Information Theory, Ed. B. N. Petrov and F. Csaki, pp. 267–81. Budapest: Akademiai Kiado.
AZZALINI, A. (1983). Maximum likelihood estimation of order m for stationary stochastic processes. Biometrika 70, 381–7.
AZZALINI, A. & BOWMAN, A. W. (1990). A look at some data on the Old Faithful geyser. Appl. Statist. 39, 357–65.
BESAG, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems (with Discussion). J. R. Statist. Soc. B 36, 192–236.
BURNHAM, K. P. & ANDERSON, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed. New York: Springer-Verlag.
COX, D. R. (1975). Partial likelihood. Biometrika 62, 269–76.
COX, D. R. & REID, N. (2004). A note on pseudolikelihood constructed from marginal densities. Biometrika 91, 729–37.
DURBIN, J. & KOOPMAN, S. J. (2001). Time Series Analysis by State Space Methods. Oxford: Oxford University Press.
FEARNHEAD, P. & DONNELLY, P. (2002). Approximate likelihood methods for estimating local recombination rates. J. R. Statist. Soc. B 64, 657–80.
HEAGERTY, P. J. & LELE, S. R. (1998). A composite likelihood approach to binary spatial data. J. Am. Statist. Assoc. 93, 1099–111.
HEAGERTY, P. J. & LUMLEY, T. (2000). Window subsampling of estimating functions with application to regression models. J. Am. Statist. Assoc. 95, 197–211.
HENDERSON, R. & SHIMAKURA, S. (2003). A serially correlated gamma frailty model for longitudinal count data. Biometrika 90, 355–66.
HJORT, N. L. & OMRE, H. (1994). Topics in spatial statistics. Scand. J. Statist. 21, 289–357.
LINDSAY, B. (1988). Composite likelihood methods. In Statistical Inference from Stochastic Processes, Ed. N. U. Prabhu, pp. 221–39. Providence, RI: American Mathematical Society.
MACDONALD, I. L. & ZUCCHINI, W. (1997). Hidden Markov and Other Models for Discrete-valued Time Series. London: Chapman and Hall.
NOTT, D. J. & RYDÉN, T. (1999). Pairwise likelihood methods for inference in image models. Biometrika 86, 661–76.
PARNER, E. T. (2001). A composite likelihood approach to multivariate survival data. Scand. J. Statist. 28, 295–302.
RENARD, D., MOLENBERGHS, G. & GEYS, H. (2004). A pairwise likelihood approach to estimation in multilevel probit models. Comp. Statist. Data Anal. 44, 649–67.
SHIBATA, R. (1989). Statistical aspects of model selection. In From Data to Model, Ed. J. Willems, pp. 215–40. New York: Springer-Verlag.
TAKEUCHI, K. (1976). Distribution of information statistics and criteria for adequacy of models (in Japanese). Math. Sci. 153, 12–8.
VARIN, C., HØST, G. & SKARE, Ø. (2005). Pairwise likelihood inference in spatial generalized linear mixed models. Comp. Statist. Data Anal. To appear.
WEST, M. & HARRISON, J. (1997). Bayesian Forecasting and Dynamic Models, 2nd ed. New York: Springer-Verlag.
WHITE, H. (1994). Estimation, Inference and Specification Analysis. New York: Cambridge University Press.
[Received November 2003. Revised December 2004]