A Review of Bayesian Methods for Infinite Factorisations

Margarita Grushanina*

September 25, 2023

arXiv:2309.12990v1 [stat.ME] 22 Sep 2023

* Department of Economics, Vienna University of Economics and Business, Welthandelsplatz 1, 1020 Vienna, Austria

Abstract

Determining the number of latent factors has been one of the most challenging problems in factor analysis. Infinite factor models offer a solution to this problem by applying increasing shrinkage to the columns of the factor loading matrix, thus penalising increasing factor dimensionality. The adaptive MCMC algorithms used for inference in such models make it possible to infer the dimension of the latent factor space automatically from the data. This paper presents an overview of Bayesian models for infinite factorisations, with some discussion of the properties of such models as well as their comparative advantages and drawbacks.
Keywords: Factor analysis, adaptive Gibbs sampling, spike-and-slab prior, Indian buffet process, multiplicative gamma process, increasing shrinkage

1 Introduction
Latent factor models represent a popular tool for data analysis in many areas of science, including psychology, marketing, economics, finance, genetic research, pharmacology and medicine. Their history dates back to Spearman (1904), who first suggested common factor analysis as a single factor model in the context of psychology. Thurstone (1931) and Thurstone (1934) extended it to multiple common factors and introduced some important factor analysis concepts, such as communality, uniqueness, and rotation. Anderson and Rubin (1956) in their seminal paper established important theoretical foundations of latent factor analysis. Since then there has been a vast and constantly growing pool of literature covering various theoretical and practical aspects of factor analysis. Some selective reviews include, for example, Barhoumi et al. (2013) and Stock and Watson (2016) for dynamic factor models, Bai and Wang (2016) for large factor models, and Fan et al. (2021) for factor models in application to econometric learning.
Recent years have also seen considerable research in the area of Bayesian latent factor models. Some of the many important contributions in this area include Geweke and Zhou (1996), Aguilar and West (2000), West (2003), Lopes and West (2004), Frühwirth-Schnatter and Lopes (2010), Conti et al. (2014), Ročková and George (2016), Kaufmann and Schumacher (2019) and Frühwirth-Schnatter et al. (2022a).
One of the most challenging tasks in factor analysis concerns the inference of the true number
of latent factors in the model. The most common approach in the literature has long been to use
various criteria to choose a model with the correct number of factors. Thus, Bai and Ng (2002)
use information criteria to compare models with different numbers of factors. Kapetanios (2010) performs model comparison using test statistics, while Polasek (1997) and Lopes and West (2004) rely on marginal likelihood estimation to determine the true number of factors in the model. Carvalho et al. (2008) perform an evolutionary stochastic model search which iteratively increases the model by an additional factor until reaching some pre-specified limit or until the process stops including additional factors. As a different approach, Lopes and West (2004) customise the reversible jump MCMC (RJMCMC) algorithm introduced in Green (1995) for moving between models with different numbers of factors, while Frühwirth-Schnatter and Lopes (2018) suggest a one-sweep algorithm to estimate the true number of factors from an overfitting factor model.
However, such methods are often computationally demanding, especially when the dimensionality of the analysed data set is high. Recently, another approach has been developed which allows the factors' cardinality to be derived from the data by letting the number of factors be potentially infinite. Dimension reduction is then achieved by assigning a nonparametric prior to the factor loadings which penalises an increasing number of columns in the factor loading matrix via increasing shrinkage of the factor loadings on each additional factor towards zero. Thus, in their pioneering work, Bhattacharya and Dunson (2011) introduced the multiplicative gamma process (MGP) prior on the precision of factor loadings, which is defined as a cumulative product of gamma distributions. Knowles and Ghahramani (2011) and Ročková and George (2016) employed the Indian buffet process (IBP) to enforce sparsity on factor loadings and at the same time penalise the increasing dimensionality of latent factors. Legramanti et al. (2020) introduced the cumulative shrinkage process (CUSP) prior, which applies cumulative shrinkage on the increasing number of columns of the factor loading matrix via a sequence of spike-and-slab distributions. Model inference is usually performed via Gibbs sampler steps; however, the models' changing dimensions at different iterations of the sampler require the use of adaptive algorithms, which have some specific properties that need to be taken into account.
This paper provides a review of the methods for infinite factorisations, with a focus on their properties, comparative advantages and drawbacks. The paper proceeds as follows: Section 2 briefly reviews the formulation of a Bayesian factor model and a shrinkage prior on factor loadings. Sections 3–5 provide insight into the three above-mentioned priors for infinite factorisations, namely the MGP, CUSP and IBP priors, and outline their main advantages and drawbacks. Section 6 reviews the concept of generalised infinite factorisation models. Section 7 concludes with a discussion.

2 Bayesian infinite factor model


2.1 Bayesian latent factor model
In traditional Bayesian factor analysis, data on p related variables are assumed to arise from a multivariate normal distribution y_t ~ N_p(0, Ω), where y_t is the t-th of the T observations and Ω is the unknown covariance matrix of the data. A factor model represents each observation y_t as a linear combination of K common factors f_t = (f_1t, ..., f_Kt)^T:

$$y_t = \Lambda f_t + \epsilon_t, \qquad (1)$$

where Λ is an unknown p × K factor loading matrix with factor loadings λ_ih (i = 1, ..., p, h = 1, ..., K), and it is typically assumed that K ≪ p.
Often, the latent factors are assumed to be orthogonal and to follow a normal distribution f_t ~ N_K(0, I_K). Furthermore, it is assumed that the factors f_t and f_s are pairwise independent for t ≠ s.

The idiosyncratic errors ε_t are also assumed normal and pairwise independent:

$$\epsilon_t \sim N_p(0, \Sigma), \qquad \Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2).$$

These assumptions allow the covariance matrix of the data to be represented in the following way:

$$\Omega = \Lambda\Lambda^T + \Sigma. \qquad (2)$$

There are many different ways to choose a prior for the elements of the factor loading matrix Λ. A typical choice involves a version of a normal prior λ_ih ~ N(d⁰_ih, D⁰_ih), for reasons of conjugacy. The hyperparameter d⁰_ih is often chosen to be equal to zero. This has the additional advantage that, with a suitably chosen hyperprior for D⁰_ih, such a setting can result in a sparse Λ with many zero elements, which is justified in many applications of factor models. To ensure identifiability, it is often assumed that Λ has a full-rank lower triangular structure, which imposes the choice of a truncated normal prior for the diagonal elements of Λ to ensure positivity and a normal prior for the lower diagonal elements (see, e.g., Geweke and Zhou (1996), Lopes and West (2004), Ghosh and Dunson (2009), amongst others).
The idiosyncratic variances σ_i² are usually assigned an inverse gamma prior σ_i² ~ G⁻¹(c_0i, C_0i), mainly for reasons of conditional conjugacy.
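As a concrete illustration, the following sketch simulates a small data set from the model (1)–(2); the dimensions and the dense loading matrix are illustrative assumptions rather than settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, p, K = 200, 8, 2                              # illustrative sizes with K << p

Lam = rng.normal(0.0, 1.0, (p, K))               # factor loading matrix Lambda
sigma2 = rng.uniform(0.2, 1.0, p)                # idiosyncratic variances sigma_i^2
F = rng.standard_normal((T, K))                  # latent factors f_t ~ N_K(0, I_K)
eps = rng.normal(0.0, np.sqrt(sigma2), (T, p))   # idiosyncratic errors
Y = F @ Lam.T + eps                              # y_t = Lambda f_t + eps_t

# implied covariance Omega = Lambda Lambda^T + Sigma, cf. equation (2)
Omega = Lam @ Lam.T + np.diag(sigma2)
print(np.allclose(np.cov(Y, rowvar=False), Omega, atol=0.5))  # rough agreement
```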

2.2 Standard Gibbs sampler


Inference is usually performed via a Gibbs sampler, sequentially sampling factor loadings, idiosyncratic variances and factors from their respective conditional distributions. These steps are rather generic for a wide range of factor models and choices of parameters. Assuming that the data is explained by K latent factors and that d⁰_ih = 0 in the normal prior for the elements of the factor loading matrix, the Gibbs sampler steps for updating Λ, Σ and F = {f_t : t = 1, ..., T} look as follows:
Step 1. Sample λ_i for i in (1, ..., p) from

$$\lambda_i^T \mid - \sim N_K\big((\Psi_i^{-1} + \sigma_i^{-2}F^T F)^{-1} F^T\sigma_i^{-2} y_i, \ (\Psi_i^{-1} + \sigma_i^{-2}F^T F)^{-1}\big),$$

where Ψ_i = diag(D⁰_i1, ..., D⁰_iK), λ_i is the ith row of the factor loading matrix Λ, F is the T × K matrix of factors, and y_i is the T-vector of observations on variable i.

Step 2. Sample σ_i⁻² for i in (1, ..., p) from

$$\sigma_i^{-2} \mid - \sim \mathcal{G}\Big(c_{0i} + \frac{T}{2}, \ C_{0i} + \frac{1}{2}\sum_{t=1}^{T}(y_{it} - \lambda_i^T f_t)^2\Big).$$

Step 3. Sample f_t for t in (1, ..., T) from

$$f_t \mid - \sim N_K\big((I_K + \Lambda^T\Sigma^{-1}\Lambda)^{-1}\Lambda^T\Sigma^{-1} y_t, \ (I_K + \Lambda^T\Sigma^{-1}\Lambda)^{-1}\big),$$

where Σ = diag(σ_1², ..., σ_p²).

Additional steps can be added to update hyperparameters if hyperpriors are assigned to any of the parameters of the prior distributions for λ_ih and σ_i².
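A minimal Python sketch of one sweep of Steps 1–3 is given below, assuming a fixed number of factors K and the conjugate priors above; all names (gibbs_sweep, Psi for the prior variances D⁰_ih, c0, C0) are illustrative, not from the paper.

```python
import numpy as np

def gibbs_sweep(Y, Lam, F, sigma2, Psi, c0, C0, rng):
    """One Gibbs sweep over (Lambda, Sigma, F) for data Y of shape (T, p)."""
    T, p = Y.shape
    K = Lam.shape[1]
    # Step 1: update each row of the factor loading matrix
    for i in range(p):
        prec = np.diag(1.0 / Psi[i]) + F.T @ F / sigma2[i]
        cov = np.linalg.inv(prec)
        mean = cov @ (F.T @ Y[:, i]) / sigma2[i]
        Lam[i] = rng.multivariate_normal(mean, cov)
    # Step 2: idiosyncratic precisions from their conditional gamma
    resid = Y - F @ Lam.T
    sigma2 = 1.0 / rng.gamma(c0 + T / 2.0,
                             1.0 / (C0 + 0.5 * (resid ** 2).sum(axis=0)))
    # Step 3: factors, conditionally independent across t
    prec_f = np.eye(K) + Lam.T @ (Lam / sigma2[:, None])
    cov_f = np.linalg.inv(prec_f)
    mean_f = (Y / sigma2) @ Lam @ cov_f          # row t is E[f_t | -]
    F = mean_f + rng.multivariate_normal(np.zeros(K), cov_f, size=T)
    return Lam, F, sigma2
```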

2.3 Infinite factorisations and increasing shrinkage of the prior for Λ
In the Gibbs sampler steps described above, we take the number of latent factors K as known. In reality, this is rarely the case, and determining the plausible number of latent factors can be a difficult and time-consuming problem, especially in high-dimensional data sets. In the last decade, there has been a rise in the literature using a different approach towards determining the number of latent factors. This approach assumes that a factor model can in theory include infinitely many factors, i.e. the factor loading matrix Λ can be comprised of infinitely many columns. This means that Λ is seen as a parameter-expanded factor loading matrix with redundant parameters.
More formally, if Θ_Λ denotes the collection of all matrices Λ with p rows and infinitely many columns, then the product ΛΛ^T is a p × p matrix with all entries finite if and only if the following condition holds (this follows from the Cauchy–Schwarz inequality; see the proof in Bhattacharya and Dunson (2011)):

$$\Theta_\Lambda = \Big\{\Lambda = (\lambda_{ih}), \ i = 1,\ldots,p, \ h = 1,\ldots,\infty : \ \max_{1\le i\le p}\sum_{h=1}^{\infty}\lambda_{ih}^2 < \infty\Big\}.$$

The prior on the elements of Λ is defined in such a way that it allows the λ_ih's to decrease in magnitude as the column index h grows, thus penalising increasing factor dimensionality. This approach allows the number of factors to be derived automatically from the data via an adaptive inference algorithm. In the next sections we discuss the most notable methods for infinite factorisations in detail.

3 Multiplicative gamma process prior


3.1 The prior specification
In their seminal paper, Bhattacharya and Dunson (2011) proposed one way to choose a prior on the elements of a factor loading matrix so as to penalise the effect of additional columns: the λ_ih's are given a normal prior centred at zero, while the prior precisions of the λ_ih's for each h are defined as a cumulative product of gamma priors.
The MGP prior can be formalised as follows:

$$\lambda_{ih} \mid \phi_{ih}, \tau_h \sim N(0, \phi_{ih}^{-1}\tau_h^{-1}), \quad \phi_{ih} \sim \mathcal{G}(\nu_1/2, \nu_2/2), \quad \tau_h = \prod_{l=1}^{h}\delta_l, \qquad (3)$$

$$\delta_1 \sim \mathcal{G}(a_1, b_1), \quad \delta_l \sim \mathcal{G}(a_2, b_2), \ l \ge 2,$$
where the δ_l (l = 1, ..., ∞) are independent, τ_h is a global shrinkage parameter for the h-th column, and the φ_ih are local shrinkage parameters for the elements of the h-th column. The condition a_2 > 1 is imposed on the shape parameter of the prior for δ_l to ensure that the τ_h's are stochastically increasing with increasing h. In Bhattacharya and Dunson (2011), b_1 and b_2 are set at 1, while a_1 and a_2 are assigned the hyperprior G(2, 1) and sampled in a Metropolis-within-Gibbs step.
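A minimal sketch of drawing a truncated loading matrix from the MGP prior (3); the truncation level k_star and the hyperparameter values are illustrative assumptions. The column magnitudes tend to decay with the column index h, which is the increasing shrinkage the prior is designed to induce.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k_star = 10, 20
nu1 = nu2 = 3.0
a1, b1, a2, b2 = 2.0, 1.0, 3.0, 1.0              # a2 > 1, as in the original proposal

delta = np.concatenate([rng.gamma(a1, 1.0 / b1, 1),
                        rng.gamma(a2, 1.0 / b2, k_star - 1)])
tau = np.cumprod(delta)                          # global column shrinkage tau_h
phi = rng.gamma(nu1 / 2.0, 2.0 / nu2, (p, k_star))   # local shrinkage phi_ih
Lam = rng.normal(0.0, 1.0 / np.sqrt(phi * tau), (p, k_star))
print(np.abs(Lam).mean(axis=0))                  # column magnitudes shrink as h grows
```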

3.2 Inference and adaptive Gibbs sampler


The inference is done via a Gibbs sampler with a few steps added to the standard ones described in Section 2.2. A distinctive feature of the sampler suggested in Bhattacharya and Dunson (2011) is that it truncates the factor loading matrix Λ to have k* columns, where k* is the number of factors supported by the data at each given iteration of the sampler. The truncation procedure deserves some closer attention.
Although theoretically the number of factors is allowed to be infinitely large, in practice one chooses a suitable level of truncation k*, designed to be large enough not to miss any important factors, but not so large as to induce unnecessary computational effort. The sampler is initiated with a conservative guess K_0, which is chosen to be substantially larger than the supposed actual number of factors. At each iteration of the sampler, the posterior samples of the factor loading matrix Λ contain information about the effective number of factors supported by the data in the following way. Let m^(g) be the number of columns of Λ at iteration g which have all their elements so small that they fall within some pre-specified neighbourhood of zero. These columns are considered redundant, and k*^(g) = k*^(g−1) − m^(g) is defined to be the effective number of factors at iteration g. To keep a balance between dimensionality reduction and exploring the whole space of possible factors, k* is adapted with probability p(g) = exp(α_0 + α_1 g), with the parameters chosen so that adaptation occurs more often at the beginning of the chain and decreases in frequency exponentially fast (the adaptation is designed to satisfy the diminishing adaptation condition in Theorem 5 of Roberts and Rosenthal (2007), which is necessary for convergence). When adaptation occurs, the redundant factors are discarded and the corresponding columns are deleted from the loading matrix (together with all other corresponding parameters). If none of the columns appear redundant at iteration g, a factor is added, with all its parameters sampled from the corresponding prior distributions. Adaptation is made to occur after a suitable burn-in period in order to ensure that the true posterior distribution is being sampled from before truncating the loading matrices.
In the adaptive Gibbs sampler with the MGP prior on the factor loadings, the first three steps are essentially the same as in Section 2.2, with two alterations: the number of factors K is replaced by k*, and in Step 1 D⁰_i1, ..., D⁰_iK are consequently replaced by φ_i1⁻¹τ_1⁻¹, ..., φ_ik*⁻¹τ_k*⁻¹.
The additional steps have the following form:
Step 4. Sample φ_ih for i in (1, ..., p) and h in (1, ..., k*) from

$$\phi_{ih} \mid - \sim \mathcal{G}\Big(\frac{\nu_1 + 1}{2}, \ \frac{\nu_2 + \tau_h\lambda_{ih}^2}{2}\Big).$$

Step 5. Sample δ_1 from

$$\delta_1 \mid - \sim \mathcal{G}\Big(\frac{2a_1 + pk^*}{2}, \ 1 + \frac{1}{2}\sum_{l=1}^{k^*}\tau_l^{(1)}\sum_{i=1}^{p}\phi_{il}\lambda_{il}^2\Big).$$

Sample δ_h for h ≥ 2 from

$$\delta_h \mid - \sim \mathcal{G}\Big(\frac{2a_2 + p(k^* - h + 1)}{2}, \ 1 + \frac{1}{2}\sum_{l=h}^{k^*}\tau_l^{(h)}\sum_{i=1}^{p}\phi_{il}\lambda_{il}^2\Big),$$

where $\tau_l^{(h)} = \prod_{t=1, t\neq h}^{l}\delta_t$ for h in (1, ..., k*).
Step 6. Sample the posterior densities of a_1 | δ_1 and a_2 | δ_2, ..., δ_k* via a random walk Metropolis-Hastings step, with a_1^p ~ N(a_1, s_1²) and a_2^p ~ N(a_2, s_2²) serving as proposal quantities and the acceptance probabilities being

$$\rho_{a_1} = \frac{\Gamma(a_1)}{\Gamma(a_1^p)}\,\frac{a_1^p}{a_1}\,\delta_1^{a_1^p - a_1}\,e^{a_1 - a_1^p},$$

$$\rho_{a_2} = \Big(\frac{\Gamma(a_2)}{\Gamma(a_2^p)}\Big)^{k^*-1}\,\frac{a_2^p}{a_2}\,\Big(\prod_{l=2}^{k^*}\delta_l\Big)^{a_2^p - a_2}\,e^{a_2 - a_2^p}.$$

Step 7. At each iteration, generate a random number u_g from U(0, 1). If u_g ≤ p(g), check whether any columns of the factor loading matrix Λ are within the pre-specified neighbourhood of 0, and if so, discard the redundant columns along with all their corresponding parameters. If the number of such columns is zero, generate an additional factor by sampling its parameters from the prior distributions.
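A minimal sketch of this adaptation step (Step 7), assuming Lam is the current (p, k_star) loading matrix; the threshold eps, the required proportion prop of near-zero loadings per column, and (alpha0, alpha1) are illustrative tuning values in the spirit of those used in Section 3.3, and the hyperparameters of the newly generated factor are assumed fixed.

```python
import numpy as np

def adapt_mgp(Lam, tau, phi, F, g, rng, eps=1e-2, prop=0.8,
              alpha0=-0.5, alpha1=-3e-4):
    """Drop redundant columns, or add one factor from the prior, w.p. p(g)."""
    if rng.uniform() > np.exp(alpha0 + alpha1 * g):
        return Lam, tau, phi, F                  # no adaptation this iteration
    near_zero = (np.abs(Lam) < eps).mean(axis=0) >= prop
    if near_zero.any():                          # discard redundant columns
        keep = ~near_zero
        return Lam[:, keep], tau[keep], phi[:, keep], F[:, keep]
    # otherwise add a factor, sampling its parameters from the priors
    p = Lam.shape[0]
    tau = np.append(tau, tau[-1] * rng.gamma(3.0, 1.0))  # assumed a2 = 3, b2 = 1
    phi_new = rng.gamma(1.5, 2.0 / 3.0, p)               # assumed nu1 = nu2 = 3
    lam_new = rng.normal(0.0, 1.0 / np.sqrt(phi_new * tau[-1]))
    return (np.column_stack([Lam, lam_new]), tau,
            np.column_stack([phi, phi_new]),
            np.column_stack([F, rng.standard_normal(F.shape[0])]))
```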

3.3 Practical applications and properties


The MGP prior was initially developed for high-dimensional data sets with p ≫ T and a sparse covariance matrix structure, such as gene expression data. However, it acquired widespread popularity and has proved useful in various applications, see e.g. Montagna et al. (2012) and Rai et al. (2014), amongst others. An application of particular interest is the infinite mixture of infinite factor analysers (IMIFA) model introduced in Murphy et al. (2020), where the MGP prior was used in the context of a mixture of factor analysers to allow automatic inference on the number of latent factors within each cluster.
However, the MGP model also has some important limitations. Some of these limitations are investigated in Durante (2017), who addressed the dependence of the shrinkage induced by the MGP prior on the values of the hyperparameters a_1 > 0 and a_2 > 0. Bhattacharya and Dunson (2011) state that the τ_h's in (3) are stochastically increasing with increasing h under the restriction a_2 > 1, which means that the induced prior on 1/τ_h increasingly shrinks the underlying quantity towards zero as the column index h increases. Durante (2017) argues that this is not sufficient to guarantee the increasing shrinkage property in the general case. Instead, further conditions are required, such as

$$a_2 > b_2 + 1, \quad a_2 > a_1 \qquad (4)$$

for the increasing penalisation of a high number of factors to hold (in expectation), provided that a_1 > 0 and a_2 > 0 and the values of a_1 are not excessively high. In his simulation study of the performance of the MGP prior for various values of the hyperparameters a_1 and a_2, Durante (2017) investigates the behaviour of the model with T = 100, p = 10, and two different values for the true number of factors, namely K = 2 and K = 6. The results show an improved posterior concentration when the parameters a_1 and a_2 satisfy condition (4), especially for the case K = 2. As the true rank of the model increases, there is evidence that the shrinkage induced by the MGP prior might be too strong.
Another critique of the MGP prior appeared in Legramanti et al. (2020), who pointed out that the hyperparameters a_1 and a_2 control both the rate of shrinkage and the prior for the loadings on active factors. This creates a trade-off between the need to maintain considerably diffuse priors for active components and the endeavour to shrink the redundant ones. In their simulation study, Legramanti et al. (2020) found that the MGP prior significantly overestimates the number of active factors on a medium-sized data set with p < T.

  (p, K)      mode k*   IQR    â1     â2
  (6, 2)       6.00     1.00   1.41   5.89
  (10, 3)      5.75     1.30   1.31   5.12
  (30, 5)      8.34     1.30   2.61   3.27
  (50, 8)     12.30     1.60   2.62   2.49
  (100, 15)   19.80     1.70   2.68   2.10
  (150, 25)    5.00     0.00   4.32   4.96

Table 1: Performance of the adaptive Gibbs sampler based on the MGP prior for various combinations of p and K. The modal estimates of k* and the interquartile range (IQR) are reported. â1 and â2 are the estimates of the values of a_1 and a_2 in (3) inferred via the Metropolis-Hastings step.
In an attempt to evaluate the performance of the MGP prior when the hyperparameters a_1 and a_2 are derived from the data, we simulated data in a similar way as in Bhattacharya and Dunson (2011). More specifically, a synthetic data set was simulated with T = 100 and idiosyncratic variances sampled from G⁻¹(1, 0.25). The number of non-zero elements in each column of Λ was chosen between k + 1 and 2k, with zeros allocated randomly and non-zero elements sampled independently from N(0, 9). We generated y_t from N_p(0, Ω), where Ω = ΛΛ^T + Σ. Further, we chose six (p, K) combinations to test various dimensions of Λ, namely (6, 2), (10, 3), (30, 5), (50, 8), (100, 15) and (150, 25), with a conservative initial upper bound of k_0 = min(p, 5 log(p)), and k_0 = 10 log(p) for the latter case with p > T. For each pair we considered 10 simulation replicates. The simulation was run for 30,000 iterations with a burn-in of 10,000.
We used the following hyperparameter values: ν_1 and ν_2 both equal to 3, and the rate parameters b_1 and b_2 in the gamma priors for δ_1 and δ_l set at 1. For the case when p < T, α_0 and α_1 in the adaptation probability expression were set to −0.5 and −3 × 10⁻⁴, and to −1 and −5 × 10⁻⁴ for the case when p ≥ T. The threshold for monitoring the columns to discard was set at 0.01², with the proportion of elements required to be below the threshold at 80% of p.
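A minimal sketch of this data-generating scheme for one (p, K) pair; the exact allocation of the zero elements is an illustrative reading of the setup in Bhattacharya and Dunson (2011).

```python
import numpy as np

rng = np.random.default_rng(42)
T, p, K = 100, 30, 5
Lam = np.zeros((p, K))
for h in range(K):
    n_nonzero = rng.integers(K + 1, 2 * K + 1)        # between k+1 and 2k non-zeros
    idx = rng.choice(p, size=min(int(n_nonzero), p), replace=False)
    Lam[idx, h] = rng.normal(0.0, 3.0, size=len(idx))  # non-zeros from N(0, 9)
sigma2 = 1.0 / rng.gamma(1.0, 1.0 / 0.25, p)          # sigma_i^2 ~ G^{-1}(1, 0.25)
Omega = Lam @ Lam.T + np.diag(sigma2)
Y = rng.multivariate_normal(np.zeros(p), Omega, size=T)
```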
The simulation results in Table 1 show that the model tends to overestimate the number of active factors in the cases with p ≤ T. In the last case, when the number of variables p exceeds the number of observations T, the number of active factors is severely underestimated compared to the true one. The last two columns in Table 1 show the posterior means of a_1 and a_2. The first efficient shrinkage condition of Durante (2017), a_2 > b_2 + 1, holds for all (p, K) combinations considered. For the first three combinations of p and K, the column shrinkage parameters a_1 and a_2, estimated from the data, are in accordance with the second efficient shrinkage condition of Durante (2017), namely a_2 > a_1. However, the condition a_2 > a_1 seems to cease holding when p gets closer to 50. This result is of some interest, especially in view of the simulation study in Durante (2017), which suggests that the shrinkage induced by the MGP prior (and satisfying the condition a_2 > a_1) might prove too strong when the dimension of the data set increases.
Assigning a hyperprior to influential parameters, as we did in the case of a_1 and a_2, is a good way to reduce the uncertainty and subjectivity of the model. However, the adaptation mechanism of such a sampler involves several hyperparameters, which may need to be adjusted depending on the nature

and dimensionality of the data. For example, we used an additional parameter indicating the proportion of the factor loadings in a column of Λ which need to be within the chosen neighbourhood of zero for the column to be considered redundant. This was first introduced in Murphy et al. (2020), who found the choice of these truncation parameters to be a delicate issue which strongly depends on the type of the data. The threshold defining the neighbourhood of 0, which is used to decide which factor loadings should be discarded, is another such example. Moreover, the parameters of the adaptation probability, α_0 and α_1, also need some tuning. In our simulation study, the speed of adaptation differed between the settings with p < T and p > T when using the same values for α_0 and α_1.
² Setting the threshold for monitoring the redundant columns at a value smaller than 0.01 in the case when p ≥ T led to an improvement of the results. However, tuning the threshold parameters remains highly heuristic and can be tricky when working with real data sets, where the true number of factors is not known.
The importance and difficulty of choosing a suitable truncation criterion in adaptive infinite factor algorithms was addressed in Schiavon and Canale (2020). The authors argue that the choice of truncation criterion, such as the predefined neighbourhood of zero, plays a vital role in the performance of the model. The optimal value of the criterion depends on the scale of the data, while the number of active factors can be severely underestimated if the value of the truncation criterion is too large, and severely overestimated if it is too small. This is especially true for high-dimensional data, as with p getting larger, the probability of having all values of |λ_ih| smaller than the predefined threshold goes to zero exponentially. In the absence of any guidance towards choosing an optimal value of such a threshold, this remains a highly subjective and arbitrary procedure. Schiavon and Canale (2020) suggest another way to define a criterion for truncating the redundant factors, which is robust to the scale of the data and has a well-defined upper bound. The main idea is to truncate Λ in such a way that the truncated model is able to explain at least a fraction Q ∈ (0, 1) of the total variability of the data, where the variability of y is measured by the trace of the covariance matrix Ω:

$$\frac{\mathrm{tr}(\Lambda_{k^*}\Lambda_{k^*}^T) + \mathrm{tr}(\Sigma)}{\mathrm{tr}(\Omega)} \ge Q,$$

where Λ_k* denotes the factor loading matrix obtained by discarding the columns of Λ starting from k* + 1. The authors conduct a simulation study which shows that using the suggested method to select the relevant active factors drastically improves the performance of the MGP model.
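A minimal sketch of this truncation rule, assuming the columns of Lambda are ordered by the increasing shrinkage prior; the function name and the default value of Q are illustrative.

```python
import numpy as np

def truncate_by_variance(Lam, sigma2, Q=0.99):
    """Smallest k_star whose truncated model explains a fraction >= Q of tr(Omega)."""
    total = np.sum(Lam ** 2) + np.sum(sigma2)    # tr(Omega) = tr(Lam Lam^T) + tr(Sigma)
    col_var = np.sum(Lam ** 2, axis=0)           # column h contributes sum_i lambda_ih^2
    ratio = (np.cumsum(col_var) + np.sum(sigma2)) / total
    return int(min(np.searchsorted(ratio, Q) + 1, Lam.shape[1]))
```

Because the criterion is a ratio of traces, it is invariant to a common rescaling of the data, which is the robustness property the authors emphasise.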

4 Cumulative shrinkage process prior


4.1 The prior specification
Legramanti et al. (2020) proposed another type of nonparametric prior on the variances of the elements of Λ, which largely corrects the drawbacks of the MGP prior. The CUSP prior on the factor loadings induces shrinkage via a sequence of spike-and-slab distributions that assign growing mass to the spike as the model complexity grows. The CUSP prior is formalised as follows:

$$\lambda_{ih} \mid \theta_h \sim N(0, \theta_h), \quad i = 1,\ldots,p, \ h = 1,\ldots,\infty,$$

$$\theta_h \mid \pi_h \sim (1 - \pi_h)\,\mathcal{G}^{-1}(a_\theta, b_\theta) + \pi_h\,\delta_{\theta_\infty}, \quad \pi_h = \sum_{l=1}^{h} w_l, \quad w_l = v_l\prod_{m=1}^{l-1}(1 - v_m), \qquad (5)$$

where π_h ∈ (0, 1) and the v_h's are generated independently from B(1, α), following the usual stick-breaking representation introduced in Sethuraman (1994). In equation (5), the inverse gamma distribution for the slab is chosen for reasons of conjugacy; in principle, the expression provides a general prior, where a sufficiently diffuse continuous distribution needs to be chosen for the slab. By integrating out θ_h, each loading λ_ih has the marginal prior

$$\lambda_{ih} \sim (1 - \pi_h)\,t_{2a_\theta}(0, b_\theta/a_\theta) + \pi_h\,N(0, \theta_\infty),$$

where t_{2aθ}(0, b_θ/a_θ) denotes the Student-t distribution with 2a_θ degrees of freedom, location 0 and scale b_θ/a_θ. To facilitate effective shrinkage of the redundant factors, θ_∞ should be set close to 0. The authors recommend a small value θ_∞ > 0, following Ishwaran and Rao (2005), as it induces a continuous shrinkage prior on every factor loading, thus improving mixing and identification of inactive factors. The authors use the fixed value θ_∞ = 0.05; however, it can be replaced by some continuous distribution without affecting the key properties of the prior. This is shown in Kowal and Canale (2022), where a normal mixture of inverse-gamma priors is employed for the spike and slab distributions. The slab parameters a_θ and b_θ should be specified so as to induce a moderately diffuse prior on active loadings.
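A minimal sketch of drawing column variances from the CUSP prior (5) via its stick-breaking representation; the truncation H mirrors the H = p + 1 rule described below, and the hyperparameter values follow the simulations in Section 4.3.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 30
H, alpha = p + 1, 5.0
a_theta, b_theta, theta_inf = 2.0, 2.0, 0.05

v = rng.beta(1.0, alpha, H)
w = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])  # stick-breaking weights
pi = np.cumsum(w)                                # spike probability pi_h grows with h
spike = rng.uniform(size=H) < pi                 # columns assigned to the spike
theta = np.where(spike, theta_inf, 1.0 / rng.gamma(a_theta, 1.0 / b_theta, H))
# theta[h] is the prior variance of the loadings in column h: late columns are
# increasingly likely to be shrunk to the small spike value theta_inf.
```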

4.2 Inference and adaptive Gibbs sampler


The inference is done via Gibbs sampler steps. Similarly to the MGP model, the first three steps remain essentially the same as in Section 2.2, with the difference that in Step 1 D⁰_i1, ..., D⁰_iK are replaced by θ_1, ..., θ_H, where H is the truncation level. This truncation level is chosen differently than in Bhattacharya and Dunson (2011), and the adaptation process is also different, designed in such a way that it depends less on heuristically chosen parameters.
While the probability of adaptation at iteration g of the sampler is also set to satisfy the diminishing adaptation condition of Roberts and Rosenthal (2007), there is no need to pre-specify an ad-hoc parameter describing some small neighbourhood of 0. The inactive columns of Λ are identified as those which are assigned to the spike and are discarded at iteration g with probability p(g) = e^(α_0 + α_1 g), together with all corresponding parameters. If at iteration g all columns of the factor loading matrix are identified as active, i.e. assigned to the slab, an additional column of Λ is generated from the spike and all the corresponding parameters are sampled from their respective prior distributions. The initial number of columns H at which the CUSP model is truncated is set equal to p + 1, following the consideration that there can be at most p active factors and by construction at least one column is assigned to the spike. The assignment of the columns of Λ to spike or slab at iteration g is done using H^(g) categorical variables z_h ∈ {1, 2, ..., H^(g)} with the discrete prior Pr(z_h = l | w_l) = w_l, where H^(g) is the number of columns in Λ at iteration g.
The additional Gibbs sampler steps look as follows:
Step 4. Sample θ_h in a data augmentation step. Thus, (5) can be obtained by marginalising out independent latent indicators z_h with probabilities p(z_h = l | w_l) = w_l for l = 1, ..., H from the equation

$$\theta_h \mid z_h \sim \{1 - \mathbb{1}(z_h \le h)\}\,\mathcal{G}^{-1}(a_\theta, b_\theta) + \mathbb{1}(z_h \le h)\,\delta_{\theta_\infty}.$$

Sample z_h for h in (1, ..., H) from a categorical distribution with probabilities

$$p(z_h = l \mid -) \propto \begin{cases} w_l\,N_p(\lambda_h; 0, \theta_\infty I_p), & l = 1,\ldots,h, \\ w_l\,t_{2a_\theta}(\lambda_h; 0, (b_\theta/a_\theta)I_p), & l = h+1,\ldots,H. \end{cases}$$
Step 5. Sample v_l for l in (1, ..., H − 1) from

$$v_l \mid - \sim \mathcal{B}\Big(1 + \sum_{h=1}^{H}\mathbb{1}(z_h = l), \ \alpha + \sum_{h=1}^{H}\mathbb{1}(z_h > l)\Big).$$

Set v_H = 1 and update w_1, ..., w_H from $w_l = v_l\prod_{m=1}^{l-1}(1 - v_m)$.
Step 6. For h in (1, ..., H): if z_h ≤ h, set θ_h = θ_∞; otherwise sample θ_h from $\mathcal{G}^{-1}\big(a_\theta + \frac{p}{2}, \ b_\theta + \frac{1}{2}\sum_{i=1}^{p}\lambda_{ih}^2\big)$.
Step 7. After some burn-in period g̃ required for the stabilisation of the chain, the truncation index H^(g) and the number of active factors $H^{*(g)} = \sum_{h=1}^{H^{(g)}}\mathbb{1}(z_h > h)$ are adapted with probability p(g) = exp(α_0 + α_1 g), with α_0 and α_1 chosen according to the criteria described in Section 3.2, as follows:

– if H^{*(g)} < H^{(g−1)} − 1: set H^(g) = H^{*(g)} + 1, drop the inactive columns in Λ^(g) along with the associated parameters in F^(g), θ^(g) and w^(g), and add a final column sampled from the spike to Λ^(g), together with the associated parameters in F^(g), θ^(g) and w^(g) sampled from the corresponding priors;

– otherwise: set H^(g) = H^{(g−1)} + 1 and add a final column sampled from the spike to Λ^(g), together with the associated parameters in F^(g), θ^(g) and w^(g) sampled from the corresponding priors.
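A minimal sketch of this adaptation step, assuming z holds the latent indicators z_h (so column h is active when z_h > h) and theta_inf = 0.05 as above; the tuning values and the simplification of appending only the loading column are illustrative.

```python
import numpy as np

def adapt_cusp(Lam, z, g, rng, theta_inf=0.05, alpha0=-1.0, alpha1=-5e-4):
    """Drop spike columns and append one final column drawn from the spike."""
    H = Lam.shape[1]
    if rng.uniform() > np.exp(alpha0 + alpha1 * g):
        return Lam                               # no adaptation this iteration
    active = z > np.arange(1, H + 1)             # z_h > h marks slab (active) columns
    if active.sum() < H - 1:
        Lam = Lam[:, active]                     # discard the inactive columns
    spike_col = rng.normal(0.0, np.sqrt(theta_inf), Lam.shape[0])
    return np.column_stack([Lam, spike_col])     # H^(g) = H* + 1 or H^(g-1) + 1
```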

4.3 Practical applications and properties


Since its introduction, the CUSP prior has been widely used in both theoretical studies and practical applications. The most notable of these include Kowal and Canale (2022), who employed a further generalised CUSP prior in the context of nonparametric functional bases; Frühwirth-Schnatter (2023), who extended the CUSP prior to the class of generalised cumulative shrinkage priors with arbitrary stick-breaking representations, which might be finite or infinite; and Gu and Dunson (2023), who applied the CUSP prior to infer the number of latent binary variables in the context of a Bayesian Pyramid (a multilayer discrete latent structure model for discrete data).
In contrast to the MGP prior, the CUSP prior on factor loadings provides a clear separation between the parameters which control active factors and those which control the shrinkage of the redundant terms. The shrinkage rate depends on α, in the sense that smaller values of α enforce more rapid shrinkage and therefore a smaller number of factors. The parameters a_θ, b_θ of the inverse gamma prior for the slab control the modelling of active factors (the inverse gamma prior can be replaced by another suitable continuous prior) and can be inferred from the data in the spirit of the parameters a_1 and a_2 in the MGP model.
To evaluate the comparative performance of the model with the CUSP prior on data sets of various dimensionality, we simulated data sets in the same way as in Section 3.3. The stick-breaking parameter α, which represents a prior expectation of the number of active factors in the data set, was set to 5 (as in Legramanti et al. (2020)). We also chose the same parameters of the slab distribution as in Legramanti et al. (2020), namely a_θ = b_θ = 2 and θ_∞ = 0.05. The parameters of the adaptation probability of the sampler, α_0 and α_1, were set to −1 and −5 × 10⁻⁴. The simulations were run for 15,000 iterations, with 5,000 discarded as burn-in, as convergence was achieved faster than in the case of the MGP prior. The simulation results are presented in Table 2 and show that the model was able to recover the correct number of factors in all considered cases.

  (p, K)      mode H*   IQR
  (6, 2)       2.00     0.00
  (10, 3)      3.00     0.00
  (30, 5)      5.00     0.00
  (50, 8)      8.00     0.00
  (100, 15)   15.00     0.00

Table 2: Performance of the adaptive Gibbs sampler based on the CUSP prior for various combinations of p and K. The modal estimates of H* and the interquartile range (IQR) are reported.
The CUSP model offers significant advantages over the MGP model by eliminating the very subjective and influential truncation threshold and by decoupling the generation mechanisms for active and redundant components. This results in much more robust estimation of the number of factors in data sets of various dimensions. In our experience, assigning some continuous distribution to δ_θ∞ and a hyperprior to b_θ can improve the performance, especially on non-standardised data sets. The model provides poor uncertainty quantification, however, with the sampler often being stuck at one (in most cases correct) value of H*. This problem was addressed in Kowal and Canale (2022) by extending the CUSP prior with a parameter expansion scheme which disperses the shrinkage applied to the factors.

5 Indian buffet process prior


5.1 The prior specification
Another, slightly different approach to modelling factor loading matrices involves the Indian buffet process (Griffiths and Ghahramani (2006)), which defines a distribution over infinite binary matrices, to provide sparsity and a framework for inferring the number of latent factors in the data set. This approach was first suggested in Knowles and Ghahramani (2011) and is formally presented below.
First, a binary matrix Z is introduced whose elements indicate whether an observed variable i has a contribution (non-zero loading) from factor h. Then the elements of Λ can be modelled in the following way:

$$\lambda_{ih} \mid z_{ih} \sim z_{ih}\,N(\lambda_{ih}; 0, \beta_h^{-1}) + (1 - z_{ih})\,\delta_0(\lambda_{ih}),$$

where β_h is the precision of the factor loadings in the hth column of Λ and δ_0 is a delta function with a point mass at 0.
Thus, the factor loadings are modelled via a spike-and-slab distribution; however, differently from the CUSP prior, the separation into the spike and the slab is done not with a variance parameter but directly for the factor loadings λ_ih via an auxiliary binary indicator matrix. This allows a potentially infinite number of latent factors, i.e. Z has infinitely many columns, of which only a finite number will have non-zero entries. If π_h is the probability of factor h contributing to any of the p variables, and K is the (for the moment finite) number of latent factors, the IBP with intensity parameter α_IB arises from the Beta-Bernoulli prior

$$z_{ih} \mid \pi_h \sim \mathrm{Bernoulli}(\pi_h), \quad \pi_h \mid \alpha_{IB} \sim \mathcal{B}\Big(\frac{\alpha_{IB}}{K}, 1\Big),$$

by setting K → ∞ and integrating out π_h.
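A minimal sketch of simulating such a binary matrix via the IBP "restaurant" construction: variable i activates each previously used factor h with probability m_h/i and then opens a Poisson(alpha_IB/i) number of new factors; the function name is illustrative.

```python
import numpy as np

def sample_ibp(p, alpha_ib, rng):
    counts, rows = [], []                        # counts m_h of variables using factor h
    for i in range(1, p + 1):
        row = [rng.uniform() < m / i for m in counts]
        k_new = rng.poisson(alpha_ib / i)        # new factors opened by variable i
        counts = [m + int(u) for m, u in zip(counts, row)] + [1] * k_new
        rows.append(row + [True] * k_new)
    Z = np.zeros((p, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(p=10, alpha_ib=2.0, rng=np.random.default_rng(3))
print(Z.sum(axis=0))                             # only finitely many non-zero columns
```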

5.2 Inference and adaptive Gibbs sampler


The inference is done via a Gibbs sampler, of which the second and third steps are the same as in Section 2.2. The initial number of factors, which defines the dimensions of Λ and Z, is chosen as some conservative number which clearly overfits any possible number of factors in the data set. Step 1 differs in that not the ith row of the factor loading matrix Λ but each element λ_ih is sampled separately from a univariate normal distribution, if z_ih = 1:
Step 1. Sample λ_ih for which z_ih = 1 from

$$\lambda_{ih} \mid - \sim N\big((\beta_h + \sigma_i^{-2} f_h f_h^T)^{-1}\sigma_i^{-2} f_h y_i^T, \ (\beta_h + \sigma_i^{-2} f_h f_h^T)^{-1}\big),$$

where f_h is the (row) vector of t = 1, ..., T observations of factor h.


The precisions β_h are sampled in the following way:
Step 4. Sample β_h, provided it is given a gamma prior G(a_β, b_β), from

$$\beta_h \mid z_h, \lambda_h \sim \mathcal{G}\Big(a_\beta + \frac{\sum_{i=1}^{p} z_{ih}}{2}, \ b_\beta + \frac{1}{2}\sum_{i:\, z_{ih}=1}\lambda_{ih}^2\Big).$$

The binary indicator z_ih can be sampled using the fact that the ratio p(z_ih = 1 | −)/p(z_ih = 0 | −) can be calculated from the likelihood and prior probabilities, and for every element there are only two possible events, z_ih = 1 or z_ih = 0. This is done in the following way:
Step 5. Sample the binary indicator z_ih using

$$\frac{p(z_{ih} = 1 \mid -)}{p(z_{ih} = 0 \mid -)} = \sqrt{(\beta_h + \sigma_i^{-2} f_h f_h^T)^{-1}\beta_h}\,\exp\Big(\frac{1}{2}(\beta_h + \sigma_i^{-2} f_h f_h^T)^{-1}(\sigma_i^{-2} f_h y_i^T)^2\Big)\,\frac{m_{-i,h}}{T - 1 - m_{-i,h}},$$

where m_{−i,h} is the number of other variables for which factor h is active, not counting variable i.
Although the binary matrix Z has infinitely many columns, only the non-zero ones contribute to the likelihood. However, one needs to take into account the zero columns too, as the number of factors can (and in many cases will) change at subsequent iterations of the sampler. Let us denote by κ_i the number of columns of Z which contain 1 only in row i, so that it carries information about the number of factors which are active only for variable i (in terms of the Indian buffet process, this is the number of new dishes customer i tries). After the sampling step 5, κ_i = 0 for any i by design, so the new factors κ_i are sampled in a separate MH step. Note that this is not a random walk MH step, as the proposal densities are not symmetric.
Step 6. Sample the number of new active factors κ_i in an MH step with the acceptance ratio

$$\rho_{\kappa_i} = (2\pi)^{-\frac{T\kappa_i}{2}}\,|M|^{-\frac{T}{2}}\,\exp\Big(\frac{1}{2}\sum_{t} m_t^T M m_t\Big)\,\frac{\mathrm{Pois}(\kappa_i; \alpha_{IB}/(p-1))}{\mathrm{Pois}(\kappa_i; \alpha_{IB}\nu/(p-1))},$$

where ν > 0 is a tuning parameter aimed at improving mixing, M = σ_i⁻²λ_κi λ_κi^T + I_κi with λ_κi denoting a 1 × κ_i vector of the new elements of the factor loading matrix, and m_t = M⁻¹σ_i⁻²λ_κi(y_it − λ_i^T f_t). Steps 5 and 6 are designed to be in one loop over i = (1, ..., p): for each variable i, first the indicator z_ih is sampled for every h, and then the number of new factors for variable i is sampled in the following step.
Step 7. Assuming the gamma prior G(a_α, b_α), sample the IBP strength parameter α_IB from

$$\alpha_{IB} \mid Z \sim \mathcal{G}\Big(a_\alpha + K_+, \ b_\alpha + \sum_{j=1}^{p}\frac{1}{j}\Big),$$

where K_+ is the number of active factors for which z_ih = 1 for at least one i.
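A minimal sketch of the Bernoulli update in Step 5, turning the printed odds ratio into a draw of z_ih; the argument layout and names are illustrative.

```python
import numpy as np

def sample_z(f_h, y_i, beta_h, sigma2_i, m_neg, T, rng):
    """Draw z_ih from its conditional, given factor h and variable i."""
    prec = beta_h + (f_h @ f_h) / sigma2_i               # beta_h + sigma^-2 f_h f_h^T
    lik = np.sqrt(beta_h / prec) * np.exp(0.5 * (f_h @ y_i / sigma2_i) ** 2 / prec)
    odds = lik * m_neg / (T - 1 - m_neg)                 # prior odds as printed above
    return int(rng.uniform() < odds / (1.0 + odds))
```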

5.3 Practical applications and properties


The IBP prior coupled with a spike-and-slab distribution proved to be a useful approach to modelling sparse factor loadings, and represents an alternative to imposing increasing shrinkage on the columns of the factor loading matrix as a way of inferring the number of active factors. A somewhat related work was introduced earlier by Rai and Daume (2008) in the context of a nonparametric Bayesian factor regression model, where a sparse IBP prior was coupled with a hierarchical prior over factors. The authors did not assume independence of factors as in traditional factor analysis, and instead of a normal prior used Kingman's coalescent prior, which describes an exchangeable distribution over a countable set of factors.
The original model of Knowles and Ghahramani (2011) was further extended in Ročková and George (2016), where the authors couple the IBP prior on the binary indicators with a spike-and-slab LASSO (SSL) prior on the elements of Λ. The SSL prior assigns to both the spike and the slab components a Laplace distribution, designed so that the slab has a common scale parameter and the spike has a factor-specific scale parameter (different for each h). This prior tackles the problem of rotational invariance of Λ by automatically promoting rotations with many zero loadings, thus resulting in many exact zeros in the factor loading matrix and facilitating identification. Differently from Knowles and Ghahramani (2011) and Rai and Daume (2008), who do inference via a Gibbs sampler, Ročková and George (2016) use an expectation-maximization (EM) algorithm, which brings computational advantages for high-dimensional data.
Recently, Frühwirth-Schnatter (2023) suggested an exchangeable shrinkage process (ESP) prior for a finite number of factors K, which is related to the IBP prior when K → ∞. The prior in its general form is formulated as follows:

$$\lambda_{ih} \mid \tau_h \sim (1 - \tau_h)\,\delta_0 + \tau_h\,P_{slab}(\lambda_{ih}), \quad \tau_h \mid K \sim \mathcal{B}(a_K, b_K), \quad h = 1,\ldots,K, \qquad (6)$$

where δ_0 is a Dirac delta, P_slab is an arbitrary continuous slab distribution, and K is the finite number of factors. The slab probabilities τ_h then determine the number of active factors K_+ < K. When in (6) b_K = 1 and a_K = α_IB/K, this prior converges to the IBP prior for K → ∞ (Teh et al. (2007)). The ESP prior has been used in the context of sparse Bayesian factor analysis in Frühwirth-Schnatter et al. (2022a) and in the context of a mixture of factor analysers model in Grushanina and Frühwirth-Schnatter (2023).

6 Generalised infinite factor models


One of the recent developments in the area of infinite factor models is the generalised infinite factorisation model developed in Schiavon et al. (2022), where the authors were motivated by the existing methods' drawbacks, such as the lack of accommodation for grouped variables and other non-exchangeable structures. While the existing increasing shrinkage models focus on priors for Λ which are exchangeable within columns, they lack consideration for possible grouping of the rows of Λ, which can occur in many applications, such as, for example, different genes in genomic data sets.
Here we briefly outline the main idea of the proposed method without going into much detail.
The generalised model is defined in the following way:

$$y_{it} = s_i(z_{it}), \quad z_t = \Lambda f_t + \epsilon_t, \quad \epsilon_t \sim \eta_\epsilon, \qquad (7)$$

where Λ is a p × K factor loading matrix, f_t is a K-dimensional factor with a diagonal covariance matrix Ξ = diag(ξ_11, ..., ξ_KK), ε_t is a p-dimensional error term independent of the factors, η_ε is some arbitrary distribution, and s_i : R → R is a transformation function for i = 1, ..., p. Here, differently from the factor model described in Section 2.1, it is not necessarily assumed that f_t and ε_t are normally distributed.
When, in fact, this is the case and s_i is the identity function, the model (7) takes the form of the Gaussian linear factor model described in Section 2.1. When s_i = F_i⁻¹(Φ(z_it)), with Φ denoting the Gaussian cumulative distribution function, the model (7) becomes a Gaussian copula factor model as described in Murray et al. (2013). Choosing an appropriate s_i and modifying the assumptions regarding the distribution of the parameters in (7) results in other types of factor models. The covariance matrix Ω as in (2) has a more general form in the case of the generalised infinite factorisation model, Ω = ΛΞΛ^T + Σ, where Σ is the covariance matrix of the error term. The suggested prior on the elements of Λ allows infinitely many columns, so that the number of factors K → ∞, and is formulated as follows:

$$\lambda_{ih} \mid \theta_{ih} \sim N(0, \theta_{ih}), \quad \theta_{ih} = \tau_0\gamma_h\phi_{ih}, \quad \tau_0 \sim \eta_{\tau_0}, \quad \gamma_h \sim \eta_{\gamma_h}, \quad \phi_{ih} \sim \eta_{\phi_i}, \qquad (8)$$

where τ_0, γ_h and φ_ih are responsible for global, column-specific and local shrinkage, respectively; they are independent a priori, and the distributions η_τ0, η_γh and η_φi are supported on [0, ∞).
What is essentially different from the previously described models is that, via φ_ih, a non-exchangeable structure is imposed on the rows of Λ through some meta covariates X, which inform the sparsity structure of Λ. Denoting by X_{p×q} a matrix of q meta covariates, η_φi should be chosen so as to satisfy

$$E(\phi_{ih} \mid \beta_h) = g(x_i^T\beta_h), \quad \beta_h = (\beta_{1h},\ldots,\beta_{qh})^T, \quad \beta_{mh} \sim \eta_\beta, \ m = 1,\ldots,q,$$

where g is a smooth one-to-one differentiable link function, x_i = (x_i1, ..., x_iq) denotes the ith row of X, and β_h are coefficients controlling the impact of the meta covariates on the shrinkage of the elements of the hth column of Λ. Taking the example from the ecology application studied in Schiavon et al. (2022), different bird species (variables i) may belong to the same phylogenetic order (meta covariates m), have roughly the same size, follow a similar diet, etc.
In more detail, the priors and hyperpriors on the factor loadings are specified as follows:

$$\tau_0 = 1, \quad \gamma_h = \nu_h\rho_h, \quad \phi_{ih} \mid \beta_h \sim \mathrm{Ber}\{\mathrm{logit}^{-1}(x_i^T\beta_h)\,c_p\},$$

$$\nu_h^{-1} \sim \mathcal{G}(a_\nu, b_\nu), \ a_\nu > 1, \quad \rho_h \sim \mathrm{Ber}(1 - \pi_h), \quad \beta_h \sim N_q(0, \sigma_\beta^2 I_q),$$

where the link function g(x) takes the form logit⁻¹(x) = e^x/(1 + e^x) and c_p ∈ (0, 1) is a possible offset. The distribution of the parameter π_h = p(γ_h = 0) follows a stick-breaking construction

$$\pi_h = \sum_{l=1}^{h} w_l, \quad w_l = v_l\prod_{m=1}^{l-1}(1 - v_m), \quad v_m \sim \mathcal{B}(1, \alpha_{gen}),$$

similar to Legramanti et al. (2020).
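A minimal sketch of drawing the structured scales θ_ih in (8) under these choices; X, c_p and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, H = 20, 3, 10
X = rng.standard_normal((p, q))                  # meta covariates informing sparsity
sigma_beta, c_p, a_nu, b_nu, alpha_gen = 1.0, 0.99, 2.0, 1.0, 5.0

v = rng.beta(1.0, alpha_gen, H)
w = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
pi = np.cumsum(w)                                # pi_h = p(gamma_h = 0)

beta = rng.normal(0.0, sigma_beta, (q, H))
prob = c_p / (1.0 + np.exp(-(X @ beta)))         # logit^{-1}(x_i' beta_h) * c_p
phi = rng.uniform(size=(p, H)) < prob            # Bernoulli local indicators phi_ih
rho = rng.uniform(size=H) < (1.0 - pi)           # column-level activity rho_h
nu = 1.0 / rng.gamma(a_nu, 1.0 / b_nu, H)        # nu_h^{-1} ~ G(a_nu, b_nu)
theta = phi * (nu * rho)                         # theta_ih = tau0 * gamma_h * phi_ih
```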


The model inference is performed via an adaptive Gibbs sampler which resembles the one developed for the CUSP model. The frequency of adaptation is set in accordance with Theorem 5 of Roberts and Rosenthal (2007), and at the iterations at which adaptation occurs, the redundant columns of the loading matrix are discarded together with all other corresponding parameters, and the number of active factors is adapted accordingly. The redundant columns are identified as those for which ρ_h = 0. If at some iteration there are no redundant columns, then an additional factor and all its corresponding parameters are generated from the priors.
The exact form of the Gibbs sampler steps depends on the prior assumptions for the elements of (7). In the case of the standard isotropic Gaussian and inverse gamma priors for the factors and idiosyncratic variances, steps 2 and 3 of the sampler are identical to the ones described in Section 2.2. For a detailed description of the Gibbs sampler steps, the reader is referred to the Supplementary Material of Schiavon et al. (2022).

7 Discussion and identification issues


Infinite factorisation models offer the enormous advantage of automatic inference on the number of active factors by allowing it to be derived from the data. This is done by assigning a nonparametric prior to the elements of the factor loading matrix which penalises an increasing number of columns. Some of these models at the same time account for element-wise sparsity of the factor loadings, which is justified in many real-life applications, such as genetics, economics, biology, and many others.
One of the weak points of such models is that they often rely on rather subjective truncation parameters, with a lack of clear guidance towards the procedure of choosing such parameters. The MGP prior of Bhattacharya and Dunson (2011) is the most prominent example of this; the simulation studies in Schiavon and Canale (2020) and in Section 3.3 of this paper illustrate the point. This subjectivity was significantly reduced in the CUSP prior of Legramanti et al. (2020). Generalisation of the CUSP prior by setting a hyperprior on the spike parameter, as in Kowal and Canale (2022), significantly improved the performance of the model on data sets of different nature and eliminated the need for data-dependent parameter tuning. In addition, the parameter-expanded version of the CUSP model suggested in Kowal and Canale (2022) resulted in better uncertainty quantification. The class of generalised infinite factorisation models of Schiavon et al. (2022) generalises the idea of infinite factorisations with increasing shrinkage on factor loadings and incorporates it into a wide class of various types of factor models. In addition, it allows the grouping of variables, which provides a useful feature for a wide range of applications. The truncation of the redundant factors is done in a similar way to the CUSP model; however, the complexity of this rather general model makes some subjective choices regarding hyperparameters and functional forms unavoidable.
Another important issue concerns the identification of factor loadings. It is well known that the decomposition of the covariance matrix Ω as in (2) is not unique. First, the correct identification of the idiosyncratic covariance matrix should be ensured, to guarantee that in the following two representations,

$$\Omega = \Lambda\Lambda^T + \Sigma, \qquad \Omega = \Theta\Theta^T + \Sigma_0,$$

Σ = Σ_0 and, hence, the cross-covariance matrix ΛΛ^T = ΘΘ^T is uniquely identified. This problem is known under the name of variance identification. The row deletion property of Anderson and Rubin (1956) presents a sufficient condition for variance identification and states that whenever an arbitrary row is deleted from Λ, two disjoint matrices of rank K should remain. This property imposes an upper bound on the number of factors, K ≤ (p − 1)/2. So, for dense factor models, variance identification can fail if the number of factors is too high. For sparse factor models, additional restrictions on the number of non-zero elements in each column of Λ need to be applied (see, e.g., Frühwirth-Schnatter et al. (2022b)). Although in most cases K ≪ p and the upper bound will be respected, there is no formal guarantee of variance identification for infinite factor models even when the factor loading matrix is dense, and even less so in the case of sparse infinite factor models.
The second problem deals with the correct identification of Λ from ΛΛ^T. It is referred to as the problem of rotational invariance and stems from the fact that for any semi-orthogonal matrix P with PP^T = I, and Θ = ΛP, g_t = P^T f_t, the two models

$$y_t = \Lambda f_t + \epsilon_t \quad \text{and} \quad y_t = \Theta g_t + \epsilon_t$$

are observationally indistinguishable. This problem is often addressed in the literature by imposing restrictions on the elements of Λ, such as, for example, setting the upper diagonal elements equal to zero and requiring the diagonal elements to be positive, so that Λ is a positive lower triangular matrix. This approach was first implemented by Geweke and Zhou (1996) and followed by many others (see, for example, Lopes and West (2004) and Carvalho et al. (2008)). This constraint introduces order dependence upon the variables, which results in posterior distributions whose shapes depend on the ordering of the variables in the data set, and is thus not applicable to infinite factor models. However, these models can still be employed for the tasks of covariance matrix estimation, variable selection and prediction, which do not require identification.
However, while variance identification is rarely addressed in the literature, and not at all in the context of infinite factor models, in recent years some ex-post identification methods aimed at tackling rotational invariance have been proposed which are applicable to infinite factor models. These methods usually involve some kind of orthogonalisation procedure applied at a post-processing step, such as, for example, the orthogonal Procrustes algorithm (Aßmann et al. (2016)) or the Varimax procedure (Poworoznek et al. (2021)).
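As an illustration of such post-processing, the sketch below gives a textbook SVD-based Varimax rotation in numpy; it is a generic implementation of the usual Varimax criterion, not the matching algorithm of Poworoznek et al. (2021).

```python
import numpy as np

def varimax(Lam, tol=1e-8, max_iter=500):
    """Rotate a loading matrix towards the Varimax criterion."""
    p, k = Lam.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        L = Lam @ R
        # gradient of the Varimax criterion with respect to the rotation
        G = Lam.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                               # closest orthogonal matrix to G
        crit = s.sum()
        if crit < crit_old * (1.0 + tol):
            break
        crit_old = crit
    return Lam @ R
```

Applied draw by draw to posterior samples of Λ (followed by alignment of column signs and order), this kind of rotation resolves much of the rotational ambiguity ex post.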
There have also been some attempts to embed identification considerations into the estimation procedure. Thus, Ročková and George (2016) offer a solution to the indeterminacy due to rotational invariance via the SSL prior, which automatically promotes rotations with many zero loadings and thus reduces posterior multimodality. Their EM algorithm provides sparse posterior modal estimates with exact zeros in the factor loading matrix. Schiavon et al. (2022) propose an identification scheme which is somewhat similar in spirit. They search for an approximation to the maximum a posteriori estimators of Λ, β = (β_1, β_2, ...) and Σ by integrating out the scale parameters and latent factors from the posterior density function and taking the parameters of interest from the draw which produced the highest marginal posterior density f(Λ, β, Σ | y).

References
Aguilar, O. and M. West (2000). “Bayesian Dynamic Factor Models and Portfolio Allocation”. In:
Journal of Business and Economic Statistics 18(3), pp. 338–357.
Anderson, T.W. and H. Rubin (1956). “Statistical inference in factor analysis”. In: Proceedings of the
Third Berkeley Symposium on Mathematical Statistics and Probability Volume V, pp. 111–150.
Aßmann, C., J. Boysen-Hogrefe, and M. Pape (2016). “Bayesian analysis of static and dynamic factor
models: An ex-post approach towards the rotation problem”. In: Journal of Econometrics 192(1),
pp. 190–206.
Bai, J. and S. Ng (2002). “Determining the number of factors in approximate factor models”. In:
Econometrica 70(1), pp. 191–221.
Bai, Jushan and Peng Wang (2016). “Econometric Analysis of Large Factor Models”. In: Annual
Review of Economics 8, pp. 53–80.
Barhoumi, K., O. Darné, and L. Ferrara (2013). Dynamic Factor Models: A Review of the Literature.
Working papers. Banque de France.
Bhattacharya, A. and D.B. Dunson (2011). “Sparse Bayesian infinite factor models”. In: Biometrika
98(2), pp. 291–306.
Carvalho, C.M. et al. (2008). “High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics”. In: Journal of the American Statistical Association 103(484), pp. 1438–1456.
Conti, J.C. et al. (2014). “Bayesian Exploratory Factor Analysis”. In: Journal of Econometrics 183,
pp. 31–57.
Durante, D. (2017). “A note on the multiplicative gamma process”. In: Statistics & Probability Letters
122, pp. 198–204.
Fan, Jianqing, Kunpeng Li, and Yuan Liao (2021). “Recent Developments in Factor Models and Ap-
plications in Econometric Learning”. In: Annual Review of Financial Economics 13(1), pp. 401–
430.
Frühwirth-Schnatter, S., D. Hosszejni, and H. F. Lopes (2022a). “Sparse finite Bayesian factor analy-
sis when the number of factors is unknown”. In: ArXiv 2301.06459.
Frühwirth-Schnatter, S., D. Hosszejni, and H.F. Lopes (2022b). “When it counts - Econometric iden-
tification of factor models based on GLT structures”. In: ArXiv: 2301.06354.
Frühwirth-Schnatter, S. and H. Lopes (2010). Parsimonious Bayesian Factor Analysis when the Number of Factors is Unknown. Research report. Booth School of Business, University of Chicago.
Frühwirth-Schnatter, S. and H. Lopes (2018). “Sparse Bayesian Factor Analysis when the Number of
Factors is Unknown”. In: ArXiv 1804.04231.
Frühwirth-Schnatter, Sylvia (2023). “Generalized Cumulative Shrinkage Process Priors with Applications to Sparse Bayesian Factor Analysis”. In: Philosophical Transactions of the Royal Society A 381, 20220148. DOI: 10.1098/rsta.2022.0148.
Geweke, J. and G. Zhou (1996). “Measuring the pricing error of the arbitrage pricing theory”. In:
Review of Financial Studies 9(2), pp. 557–587.

Ghosh, Joyee and David B. Dunson (2009). “Default Prior Distributions and Efficient Posterior Com-
putation in Bayesian Factor Analysis”. In: Journal of Computational and Graphical Statistics
18(2), pp. 306–320.
Green, P. (1995). “Reversible jump Markov chain Monte Carlo computation and Bayesian model
determination”. In: Biometrika 82(4), pp. 711–732.
Griffiths, T. and Z. Ghahramani (2006). “Infinite latent feature models and the Indian buffet process”.
In: Advances in Neural Information Processing Systems. Ed. by Y. Weiss, B. Schölkopf, and J.
Platt. Vol. 18. MIT Press.
Grushanina, M. and S. Frühwirth-Schnatter (2023). Dynamic Mixture of Finite Mixtures of Factor
Analysers with Automatic Inference on the Number of Clusters and Factors. arXiv: 2307.07045.
Gu, Y. and D.B. Dunson (2023). “Bayesian Pyramids: identifiable multilayer discrete latent struc-
ture models for discrete data”. In: Journal of the Royal Statistical Society Series B: Statistical
Methodology 85(2), pp. 399–426.
Ishwaran, H. and J.S. Rao (2005). “Spike and slab variable selection: Frequentist and Bayesian strate-
gies”. In: The Annals of Statistics 33(2), pp. 730–773.
Kapetanios, G. (2010). “A testing procedure for determining the number of factors in approximate
factor models with large datasets”. In: Journal of Business and Economic Statistics 3(28), pp. 251–
258.
Kaufmann, S. and C. Schumacher (2019). “Bayesian estimation of sparse dynamic factor models with order-independent and ex-post mode identification”. In: Journal of Econometrics 210(1), pp. 116–134.
Knowles, D. and Z. Ghahramani (2011). “Nonparametric Bayesian sparse factor models with appli-
cation to gene expression modeling”. In: The Annals of Applied Statistics 5(2B), pp. 1534–1552.
Kowal, D.R. and A. Canale (2022). “Semiparametric Functional Factor Models with Bayesian Rank
Selection”. In: ArXiv 2108.02151.
Legramanti, S., D. Durante, and D.B. Dunson (2020). “Bayesian cumulative shrinkage for infinite
factorizations”. In: Biometrika 107(3), pp. 745–752.
Lopes, H.F. and M. West (2004). “Bayesian model assessment in factor analysis”. In: Statistica Sinica
14(1), pp. 41–67.
Montagna, Silvia et al. (2012). “Bayesian Latent Factor Regression for Functional and Longitudinal
Data”. In: Biometrics 68(4), pp. 1064–1073.
Murphy, K., C. Viroli, and I.C. Gormley (2020). “Infinite Mixtures of Infinite Factor Analysers”. In:
Bayesian analysis 15(3), pp. 937–963.
Murray, J.S. et al. (2013). “Bayesian Gaussian Copula Factor Models for Mixed Data”. In: Journal of
the American Statistical Association 108(502), pp. 656–665.
Polasek, W. (1997). “Factor analysis and outliers: a Bayesian approach”. In: Discussion Paper, Uni-
versity of Basel.
Poworoznek, E., F. Ferrari, and D. Dunson (July 2021). “Efficiently resolving rotational ambiguity in
Bayesian matrix sampling with matching”. In: ArXiv: 2107.13783.

Rai, P. and H. Daume (2008). “The Infinite Hierarchical Factor Regression Model”. In: Advances in
Neural Information Processing Systems. Ed. by D. Koller et al. Vol. 21.
Rai, P. et al. (2014). “Scalable Bayesian Low-Rank Decomposition of Incomplete Multiway Tensors”.
In: Proceedings of the 31st International Conference on Machine Learning. Vol. 32. 2, pp. 1800–
1808.
Roberts, G.O. and J.S. Rosenthal (2007). “Coupling and ergodicity of adaptive Markov chain Monte
Carlo algorithms”. In: Journal of Applied Probability 44(2), pp. 458–475.
Ročková, V. and E.I. George (2016). “Fast Bayesian factor analysis via automatic rotation to sparsity”.
In: Journal of the American Statistical Association 111(516), pp. 1608–1622.
Schiavon, L. and A. Canale (2020). “On the truncation criteria in infinite factor models”. In: Stat 9(1),
e298.
Schiavon, L., A. Canale, and D.B. Dunson (2022). “Generalized infinite factorization models”. In:
Biometrika 109(3), pp. 817–835.
Sethuraman, J. (1994). “A constructive definition of Dirichlet priors”. In: Statistica Sinica 4, pp. 639–
650.
Spearman, C. (1904). “‘General Intelligence,’ Objectively Determined and Measured”. In: The American Journal of Psychology 15(2), pp. 201–292.
Stock, J.H. and M.W. Watson (2016). “Chapter 8 - Dynamic Factor Models, Factor-Augmented Vector
Autoregressions, and Structural Vector Autoregressions in Macroeconomics”. In: ed. by John B.
Taylor and Harald Uhlig. Vol. 2. Handbook of Macroeconomics. Elsevier, pp. 415–525.
Teh, Y., D. Görür, and Z. Ghahramani (2007). “Stick-breaking Construction for the Indian Buffet
Process”. In: Proceedings of the Eleventh International Conference on Artificial Intelligence and
Statistics. Ed. by Marina Meila and Xiaotong Shen. Vol. 2. Proceedings of Machine Learning
Research. PMLR: San Juan, Puerto Rico, pp. 556–563.
Thurstone, L.L. (1931). “Multiple factor analysis”. In: Psychological Review 38(5), pp. 406–427.
Thurstone, L.L. (1934). “The Vectors of Mind”. In: The Psychological Review 41, pp. 1–32.
West, M. (2003). “Bayesian Factor Regression Models in the ‘large p, small n’ Paradigm”. In: Bayesian Statistics. Oxford University Press, pp. 723–732.
