PARAFAC. Tutorial and Applications: Elsevier
ELSEVIER Chemometrics and Intelligent Laboratory Systems 38 (1997) 149-171
Tutorial
Abstract
This paper explains the multi-way decomposition method PARAFAC and its use in chemometrics. PARAFAC is a generalization of PCA to higher-order arrays, but some of the characteristics of the method are quite different from the ordinary two-way case. There is no rotation problem in PARAFAC, and, e.g., pure spectra can be recovered from multi-way spectral data. One cannot, as in PCA, estimate components successively, as this will give a model with poorer fit than if the simultaneous solution is estimated. Finally, scaling and centering are not as straightforward in the multi-way case as in the two-way case. An important advantage of using multi-way methods instead of unfolding methods is that the estimated models are very simple in a mathematical sense, and therefore more robust and easier to interpret. All these aspects and more are explained in this tutorial, and an implementation in Matlab code is available that contains most of the features explained in the text. Three examples show how PARAFAC can be used for specific problems. The applications include subjects such as: analysis of variance by PARAFAC, a five-way application of PARAFAC, PARAFAC with half the elements missing, PARAFAC constrained to positive solutions and PARAFAC for regression as in principal component regression.
Contents
1. Introduction
2. Nomenclature
3. The model
3.1. Uniqueness
3.2. Rank of multi-way arrays
4. Implementation
4.1. Alternating least squares
4.1.1. Compressing
4.1.2. Extrapolating
4.1.3. Initialization
4.2. Stopping criterion
4.3. Constraining the solution
4.4. Missing values
5. Preprocessing
10. Application III: Prediction of amino-N in sugar samples from fluorescence
10.1. Data
10.2. Results and discussion
10.3. M’n’M
R. Bro / Chemometrics and Intelligent Laboratory Systems 38 (1997) 149-171

[...] hierarchy among these methods. Kiers [6] shows that PARAFAC can be considered a constrained version of Tucker3, and Tucker3 a constrained version of two-way PCA. Any data set that can be modeled adequately with PARAFAC can thus also be modeled by Tucker3 or two-way PCA, but PARAFAC uses fewer degrees of freedom. A two-way PCA model always fits data better than a Tucker3 model, which again will fit better than a PARAFAC model, except for extreme cases where the models may fit equally well. If a PARAFAC model is adequate, Tucker3 and two-way PCA models will tend to use the excess degrees of freedom to model noise or model the systematic variation in a redundant way (see the last application). Therefore one will generally prefer to use the simplest possible model. This principle of using the simplest possible model is old, in fact dating back to the fourteenth century (Occam’s razor), and is now also known as the law or principle of parsimony [7]. In the sense that it uses the most degrees of freedom, the PCA model can be considered the most complex and flexible model, while PARAFAC is the simplest and most restricted model.

Conceptually some may find two-way PCA simpler than the multi-linear methods, but in a multi-way context this is not so. Because the array has to be unfolded to a matrix before two-way analysis, the variables in the unfolded modes get mixed up, so that the effect of one variable is not associated with one but with many elements of a loading vector. Consider an even more complex model than two-way PCA, e.g. a model that does not assume any structure at all but models each data element individually. This model would equal the data and obviously use all degrees of freedom, giving a perfect fit. Thus, the more structure, the poorer the fit and the simpler the model. It is apparent that the reason for using multi-way methods is not to obtain better fit, but rather more adequate, robust and interpretable models. This can to some extent be compared to the difference between using multiple linear regression (MLR) and partial least squares regression (PLS) for multivariate calibration. MLR is known to give the best fit to the dependent variable of the calibration data, but in most cases PLS has better predictive power. PLS can be seen as a constrained version of MLR, where the constraints help the model focus on the systematic part of the data. In the same way multi-way methods are less sensitive to noise and further give loadings that can be directly related to the different modes of the multi-way array. That two-way PCA can give very complex models can be illustrated with an example. For an F-component PCA solution to an I × J × K array unfolded to an I × JK matrix, the PCA model consists of $F(I + JK)$ parameters (scores and loading elements). A corresponding Tucker model with an equal number of components in each mode would consist of $F(I + J + K) + F^3$, and PARAFAC of $F(I + J + K)$ parameters. For a hypothetical example consider a 10 × 100 × 20 array modeled by a 5-component solution. A two-way PCA model of the 10 × 2000 unfolded array consists of 10050 parameters, a Tucker model of 775 and a PARAFAC model of 650 parameters. Clearly, the PCA model will be more difficult to interpret than the multi-way models.

In this paper a tutorial on how to use PARAFAC is given. The interest in PARAFAC and related methods is often hampered by practical considerations regarding how to implement the algorithm, how to do sound analysis etc. Many excellent papers on PARAFAC are not published in readily available papers. The essence of some of these papers is presented. A very annoying characteristic of PARAFAC is the long time required to calculate the models. The algorithms used are most often based on alternating least squares (ALS) initialized by either random values or values calculated by a direct trilinear decomposition based on the generalized eigenvalue problem. Here the ALS algorithm of PARAFAC is modified in simple ways, which reduces the number of iterations and the time required to calculate the models by up to 20 times.

In the following, the discussion will be limited to three-way data for simplicity, but most results are valid for data and models of any (higher) order. Three applications will show some typical uses of PARAFAC and also include higher order models.

2. Nomenclature

In the following, scalars are indicated by lower-case italics, vectors by bold lower-case characters, bold capitals are used for two-way matrices, and underlined bold capitals for three-way arrays. The letters I, J, K, L and M are reserved for indicating the dimension of different modes. The ijkth element of X is called $x_{ijk}$. The terms mode, way and order are used more or less interchangeably though a distinction is sometimes made between the geometrical dimension of the hypercube - the number of ways - and the number of independent ways - which is the order/mode [3,6]. An ordinary two-way covariance matrix is only a one-mode array, because the variables are identical in the two ways. Likewise, no distinction will be made between the terms factor and component. When three-way arrays are unfolded to matrices the following notation will be used: If X is an I × J × K array and is unfolded to an I × JK matrix, the order of J and K indicates which indices are running fastest. In this case the indices of J are running fastest, meaning that the first J columns of X contain all variables for k = 1 and for j = 1 to j = J.

3. The model

PARAFAC is a decomposition method, which conceptually can be compared to bilinear PCA, or rather it is one generalization of bilinear PCA, while the Tucker3 decomposition is another generalization of PCA to higher orders [8,9]. The model was independently proposed by Harshman [1] and by Carroll and Chang [2], who named the model CANDECOMP (canonical decomposition). A decomposition of the data is made into triads or trilinear components, but instead of one score vector and one loading vector as in bilinear PCA, each component consists of one score vector and two loading vectors. It is common three-way practice not to distinguish between scores and loadings as these are treated equally numerically. A PARAFAC model of a three-way array is given by three loading matrices, A, B, and C with elements $a_{if}$, $b_{jf}$, and $c_{kf}$. The trilinear model is found to minimize the sum of squares of the residuals, $e_{ijk}$, in the model

$$x_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} + e_{ijk} \qquad (1)$$

This equation is shown graphically in Fig. 1 for two components (F = 2). The model can also be written

$$\underline{\mathbf{X}} = \sum_{f=1}^{F} \mathbf{a}_f \otimes \mathbf{b}_f \otimes \mathbf{c}_f$$

where $\mathbf{a}_f$, $\mathbf{b}_f$ and $\mathbf{c}_f$ are the fth columns of the loading matrices A, B and C respectively [9].

Fig. 1. A graphical representation of a two-component PARAFAC model of the data array $\underline{\mathbf{X}}$.

3.1. Uniqueness

An obvious advantage of the PARAFAC model is the uniqueness of the solution. In bilinear methods there is a well-known problem of rotational freedom. The loadings in a spectral bilinear decomposition reflect the pure spectra of the analytes measured, but it is not possible without external information to actually find the pure spectra because of the rotation problem. This fact has prompted a lot of different methods for obtaining more interpretable models than PCA and models alike [10-12], or for rotating the PCA solution to more appropriate solutions. Most of these methods, however, are more or less arbitrary or have ill-defined properties. This is not the case in PARAFAC. If the data is indeed trilinear, the true underlying spectra (or whatever constitute the variables) will be found if the right number of components is used and the signal-to-noise ratio is appropriate [13-15]. This important fact is what originally prompted R. A. Harshman to develop the method based on an idea from 1944 [16]. It is a very strong feature, which gives the PARAFAC model an unsurpassed advantage.

Leurgans et al. [17] among others have shown, that
unique solutions can be expected if the loading vectors are linearly independent in two of the modes, and furthermore in the third mode the less restrictive condition is that no two loading vectors are linearly dependent. A good example of this is given in the second application below. Kruskal [15,18] gives even less restrictive conditions for when unique solutions can be expected. He uses the k-rank of the loading matrices, which is a term introduced by Harshman and Lundy [19]. If any combination of $k'$ columns of A has full column-rank, and this does not hold for $k' + 1$, then the k-rank of A is $k'$. The k-rank is thus related, but not equal, to the rank of the matrix, as the k-rank can never exceed the rank. Kruskal proves that if

$$k_1 + k_2 + k_3 \geq 2F + 2,$$

then the PARAFAC solution is unique. Here $k_1$ is the k-rank of A, $k_2$ is the k-rank of B, $k_3$ is the k-rank of C and F is the number of PARAFAC components sought.

The mathematical meaning of uniqueness is that the estimated PARAFAC model cannot be rotated without a loss of fit, as opposed to two-way analysis where one may rotate scores and loadings without changing the fit of the model. A unique solution therefore means that no restrictions are necessary to estimate the model apart from trivial variations of scale and column order. For appropriate noise, i.e. random and not too severe, it also holds that the true underlying trilinear model will be the model with the best fit. Therefore the true and estimated models must coincide when the right number of components is chosen.

3.2. Rank of multi-way arrays

[...] three vectors [9]. The rank of a three-way array is equal to the minimal number of triads necessary to describe the array. For a 2 × 2 × 2 array it turns out that the maximal rank is three! This means that there exist 2 × 2 × 2 arrays that cannot be described using only two components. An example can be seen in [20]. For a 3 × 3 × 3 array the maximal rank is five (see for example [18]). These results may seem strange, but are due to the special structure of the multilinear model compared to the bilinear. Furthermore Kruskal has shown that if, for example, 2 × 2 × 2 arrays are generated randomly from any reasonable distribution, the volumes or probabilities of the array being of rank two or three are both positive. This is as opposed to two-way matrices, where only the full-rank case has positive volume. The practical implication of this is yet to be seen, but the rank of an array might have importance when one wants to create a multi-way array in a parsimonious way, yet still with sufficient dimensions to describe the phenomena under investigation. It is already known that unique decompositions can be obtained even for arrays where the rank exceeds any of the dimensions of the different modes. It has been reported that a ten-factor model was uniquely determined from an 8 × 8 × 8 array [1,14,19]. This shows that parsimonious arrays might contain sufficient information for quite complex problems, specifically that the three-way decomposition is capable of extracting more information from data than two-way PCA. Unfortunately there are no explicit rules for determining the maximal rank of arrays in general, except for the two-way case and some simple three-way arrays.
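Kruskal's condition above can be checked numerically for a given set of loading matrices. The following is a minimal Python/NumPy sketch, not part of the original Matlab implementation; the function names `k_rank` and `kruskal_condition` are hypothetical, and the k-rank is found by brute force over column subsets, which is only practical for a small number of components.

```python
from itertools import combinations

import numpy as np


def k_rank(M):
    """Largest k such that EVERY set of k columns of M is linearly independent."""
    n_rows, n_cols = M.shape
    k = 0
    # No set of more than min(n_rows, n_cols) columns can be independent.
    for size in range(1, min(n_rows, n_cols) + 1):
        if all(np.linalg.matrix_rank(M[:, list(cols)]) == size
               for cols in combinations(range(n_cols), size)):
            k = size
        else:
            break
    return k


def kruskal_condition(A, B, C):
    """True if k_1 + k_2 + k_3 >= 2F + 2 for loading matrices with F columns."""
    F = A.shape[1]
    return k_rank(A) + k_rank(B) + k_rank(C) >= 2 * F + 2


# The k-rank can be lower than the ordinary rank:
# this matrix has rank 2, but two of its columns are proportional.
M = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
print(np.linalg.matrix_rank(M), k_rank(M))

# Ten generic components in an 8 x 8 x 8 array: each k-rank is 8, and 24 >= 22.
rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((8, 10)) for _ in range(3))
print(kruskal_condition(A, B, C))
```

The second example mirrors the ten-factor 8 × 8 × 8 case mentioned in the text: with generic (here random) loadings, the condition is satisfied even though the number of components exceeds every dimension of the array.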
[...] moderate PC (depending on implementation and convergence criterion of course). This is problematic when recalculation of the model is necessary, which is often the case e.g. during outlier detection. To make PARAFAC a workable method it is therefore of utmost importance to develop faster algorithms. Using more computer power could of course solve the problem, but there is an annoying tendency for the data sets always to be a little larger than what is optimal for the current computer power. In the implementation used here two acceleration methods have been built in (for others see e.g. [1,22,23]).

4.1.1. Compressing

As in bilinear PCA, one most often seeks a low-dimensional representation of a high-dimensional array in PARAFAC. This implies that the data array is redundant, i.e., there is collinearity between the variables. Consider a 5 × 6 × 500 array, where the third, 500-dimensional mode might be spectral. Using ALS on such an array is computationally costly. From the theory of PCA it is known that the variations in the spectra can be well represented by a low-dimensional score matrix that contains the main systematic part of the variations. If the data array is unfolded keeping the high-dimensional mode intact, one obtains a 30 × 500 matrix. By two-way PCA we can describe most of the variation in this matrix by a score matrix of, say, dimension 30 × 5. Folding back the matrix to a three-way array, the array now has the dimensions 5 × 6 × 5. Calculating the PARAFAC model on this low-dimensional array takes only a fraction of the time required to calculate the PARAFAC model of the high-dimensional array. The estimated model only describes the score matrix and not the original array, but it is only in the compressed mode that the estimated loadings differ. We can convert the calculated loadings in that mode into the original variable space by multiplying the loadings from the PARAFAC model - which are loadings in a score space - with the loadings from the PCA. The PARAFAC model achieved hopefully equals a PARAFAC model calculated from the original array. To ensure this, ALS is applied to the original array using the calculated loadings and scores as starting values. If the model is good, only a few extra iterations will be necessary in the high-dimensional space. If several modes are high-dimensional, the compressed array can of course be compressed further in another mode.

The compression of modes has been implemented so that compression is done whenever the number of variables in one mode exceeds the number of factors sought by more than ten. If one mode consists of 20 variables and a 5-factor model is estimated, this mode is thus compressed, as the dimension of the mode (20) exceeds 5 + 10. The number of principal components to compute is set to the number of factors in the PARAFAC model plus two. These settings work for many types of problems often encountered in our research group, although sometimes other settings may be optimal, because the optimal settings depend on the type of data investigated (primarily the signal-to-noise ratio). When implementing PCA, one has to pay attention to the time demand of the PCA algorithm. Working on cross-product matrices instead of the raw data can speed up the algorithm substantially if one mode of the unfolded array is very large compared to the other mode.

When a nonnegativity constraint is used (see later), compressing by PCA is not appropriate. Instead one can use a subset of the original variables to estimate an initial model. This submodel can be found on a smoothed version of the original data to ensure that important aspects are not missing.

4.1.2. Extrapolating

Another method for speeding up the ALS algorithm is to use the ‘temporal’ information in the iterations. The simple idea is to perform a predefined number of cycles of ALS iterations and then use these estimates of the loadings to predict new estimates elementwise. There are two good reasons for using the temporal information in the iterations of the PARAFAC-ALS algorithm. (i) It is only in the first few iterations that major changes occur in the estimates of the elements of the loadings. The main fraction of iterations is used for minor modifications of these factors. (ii) The changes in each element of the factors are most often systematic and quite linear over short ranges of iterations.

To make it profitable to extrapolate, it is necessary that the time required to extrapolate is less than the time required to perform a corresponding number of iterations. This to some extent limits the applicability of the method, because very ingenious extrapolations with quadratic fit and adaptive parameters tend to be so slow that there is no gain in computing time. Several implementations have been tried, ending up with an algorithm by Claus A. Andersson, which works fast in our Matlab code. At the ith iteration the estimated factor loadings A, B and C are saved as $\mathbf{A}_1$, $\mathbf{B}_1$ and $\mathbf{C}_1$. After the (i + 1)th iteration a linear regression is performed for each element to predict the value of that element a certain number of iterations ahead. As only two values of each element are used in the regression, the prediction can simply be written

$$\mathbf{A}_{\mathrm{new}} = \mathbf{A}_1 + (\mathbf{A} - \mathbf{A}_1)\,d,$$

where d is the number of iterations to predict ahead. Making d dependent on the number of iterations has proved useful, and specifically letting

$$d = \mathit{it}^{1/3}$$

where $\mathit{it}$ is the number of iterations. When applying the extrapolation, it is important not to extrapolate during the first, say five, iterations, because the variations in the elements are very unstable in the beginning. If some modes are constrained, extrapolation has to wait longer for the iterations to be stable. Furthermore, if the extrapolations persistently fail to improve the fit (more than four times), the number d is lowered from $\mathit{it}^{1/n}$ to $\mathit{it}^{1/(n+1)}$.

4.1.3. Initialization

[...] ALS algorithm tends to get stuck in local minima, a good initialization might help in overcoming this problem. Our experience is that local minima are seldom a problem if the data are trilinear, but others have reported differently [28,29]. Another practical problem with these methods is how to extend them to higher orders. This problem has not yet been addressed.

4.2. Stopping criterion

The importance of using a suitable stopping criterion has been mentioned by several authors. It sometimes occurs that even small changes in the fit can be associated with huge differences in the estimated loadings, because the response surface of the least squares error function is very flat [1]. This is especially true if some underlying phenomena are highly correlated. As a safeguard against this, one can run the algorithm twice. If the algorithm has truly converged, the two solutions will essentially be identical. If the algorithm has not converged, it is unlikely that the estimated solutions are identical if a random initialization has been used. A common criterion is to stop the iterations when the relative change in fit between two iterations is below a certain value (e.g., $10^{-6}$). In some cases a low relative change in the loadings is used instead [30]. The difference between these two approaches is not clear.
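The extrapolation rule of Section 4.1.2 and the relative-fit stopping criterion above are simple enough to sketch in a few lines. The Python/NumPy snippet below is an illustration only, not the author's Matlab code: the function names, the `warmup` argument and the use of the residual sum of squares as the measure of fit are assumptions made here.

```python
import numpy as np


def extrapolate_loadings(previous, current, it, root=3.0):
    """Predict loadings d = it**(1/root) iterations ahead, elementwise.

    `previous` holds the loadings saved at iteration i (A1 in the text),
    `current` the loadings after iteration i + 1 (A in the text).
    """
    d = it ** (1.0 / root)
    return previous + (current - previous) * d


def may_extrapolate(it, warmup=5):
    """Skip extrapolation during the first few, unstable iterations."""
    return it > warmup


def has_converged(sse_old, sse_new, tol=1e-6):
    """Relative change in fit (here: residual sum of squares) below tol."""
    return abs(sse_old - sse_new) <= tol * sse_old


# Two consecutive estimates of a 1 x 2 loading matrix at iteration 27:
A1 = np.array([[1.0, 2.0]])
A = np.array([[1.5, 1.0]])
if may_extrapolate(27):
    A_new = extrapolate_loadings(A1, A, it=27)  # d is approximately 3
    print(A_new)

print(has_converged(100.0, 99.0))           # 1 % change: keep iterating
print(has_converged(100.0, 100.0 - 5e-5))   # relative change 5e-7: stop
```

In an actual ALS loop the extrapolated loadings would only be accepted if they improve the fit, and `root` would be increased from n to n + 1 after repeated failures, as described in the text.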
[...] the loadings. Unless all modes are to be orthogonalized, this is not a problem, but merely a matter of scaling. Models estimated under orthogonality constraints will differ from models estimated without this constraint. The models however, will still be mathe[...]

[...]ing vectors. For closed systems one might for example want to restrict the sum of the F - 1 first loading vectors to equal the Fth loading vector in one mode to ensure that the solution follows the known behavior of the underlying phenomena. This can be accomplished using equality constraints in the least squares solutions [32-36].

It is possible to fix certain loadings to predefined values (usually zero or one) by adjusting for these elements during the regression steps in the ALS algorithm. For other knowledge based types of constraints see [37,38].

4.4. Missing values

Missing values in PARAFAC are easily handled by iteratively estimating the missing values. This estimate is given for free when iterating in the ALS algorithm. The estimate of the ijkth element of $\underline{\mathbf{X}}$ is

$$\hat{x}_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf}.$$

[...]

This is often referred to as single-centering. The centering shown above is called centering across the first mode, which is the terminology suggested in [39]. The centering can of course be applied to any of the modes, depending on the problem. If centering is to be performed across more than one mode, one has to do this by first centering one mode, and then centering the outcome of this centering. If two centerings are performed in this way, it is often referred to as double-centering. Triple-centering means centering across all three modes one at a time. In [39-41] the effect of both scaling and centering on the trilinear behavior of the data is described. It turns out that centering one mode at a time is the only appropriate way of centering with respect to the assumptions of the PARAFAC model. Centering one mode at a time essentially removes any constant levels in that particular mode. Centering for example matrices instead of
columns will destroy the multilinear behavior of the data, because more constant levels are introduced than eliminated. The same holds for other kinds of centering. One may, for instance, know that the true model consists of a set of PARAFAC terms and one overall level, which might incline one to estimate a PARAFAC model on the original data with the grand level subtracted. However, even though the mathematical structure might theoretically be true, the subtraction of the grand level introduces some artifacts in the data that are not easily described by the PARAFAC model. The model obtained as the grand level plus the PARAFAC model is not the global least squares estimate given the required structure. The grand level and the PARAFAC model would have to be estimated simultaneously to obtain the global least squares model. Instead the subtraction of the grand level shifts the data, so that an extra spurious component will be necessary to describe the variation [40]. Scaling in multi-way analysis also has to be done taking the trilinear model into account. One should not, as with centering, scale column-wise, but rather whole ‘slabs’ of the array should be scaled. If variable j of the second mode is to be scaled (compared to the rest of the variables in the second mode), it is necessary to scale all columns where variable j occurs. This means that one has to scale whole matrices instead of columns. For a four-way array, one would have to scale three-way arrays. Mathematically scaling can be described

[...]

where $s_i$ can be defined as

[...]

The scaling shown above is referred to as scaling within the first mode. When scaling within several modes is desired, the situation is a bit complicated because scaling one mode affects the scale of the other modes. If scaling to norm one is desired within several modes, this has to be done iteratively, until convergence [39]. Another complicating issue is the interdependence of centering and scaling. In general scaling within one mode disturbs prior centering across the same mode, but not across other modes. Centering across one mode disturbs scaling within all modes [41]. Hence only centering across arbitrary modes or scaling within one mode is straightforward, and not all combinations of iterative scaling and centering will converge. These rules may sound complicated, but in practice it need not influence the outcome much if the iterative approach is not used. Scaling to a sum-of-squares of one is arbitrary anyway and it may be just as defensible to just scale within the modes of interest once, thereby having at least mostly equalized any huge differences in scale. Centering can then be performed after scaling and thereby it is assured that the modes to be centered are indeed centered. In the Matlab code available from the Internet (see materials and methods) an M-file is given to perform the iterative scaling and centering procedures.

A common rule of thumb is to center across the mode of interest, but of course the purpose of centering is to remove constant levels, hence knowledge of the data might guide the proper preprocessing. The appropriate centering and scaling procedures can most easily be summarized in a figure where the array is shown unfolded to a matrix (Fig. 5). Centering must be done across the columns of this matrix, while scaling should be done on the rows of this matrix.

Fig. 5. Three-way unfolded array, with rows constituting one intact mode. Centering must be done across the columns of this matrix, while scaling has to be done on the rows.
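The centering and scaling rules summarized above can be illustrated with a short Python/NumPy sketch. It is not the M-file referred to in the text; the function names are hypothetical, and scaling each slab to a root-sum-of-squares of one is an assumption made here (the text notes that this choice of norm is arbitrary).

```python
import numpy as np


def center_across_mode(X, mode):
    """Subtract the mean taken along `mode`, e.g. over i for mode 0."""
    return X - X.mean(axis=mode, keepdims=True)


def scale_within_mode(X, mode):
    """Divide every slab of `mode` by its root-sum-of-squares."""
    other_axes = tuple(ax for ax in range(X.ndim) if ax != mode)
    s = np.sqrt((X ** 2).sum(axis=other_axes, keepdims=True))
    return X / s


rng = np.random.default_rng(1)
X = rng.standard_normal((4, 5, 6)) + 3.0     # I x J x K array with an offset

Xc = center_across_mode(X, mode=0)           # centering across the first mode

# Scaling within the SECOND mode keeps the first-mode centering intact ...
Xs = scale_within_mode(Xc, mode=1)
print(np.abs(Xs.mean(axis=0)).max())         # numerically zero

# ... whereas scaling within the FIRST mode disturbs it.
Xd = scale_within_mode(Xc, mode=0)
print(np.abs(Xd.mean(axis=0)).max())         # clearly non-zero
```

The two printed values reproduce the rule stated in the text: scaling within one mode disturbs prior centering across the same mode, but not across other modes.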
6. Assessing the solution

6.1. Postprocessing

As in two-way PCA different scalings of the solution can be used. Scaling one loading vector by a constant does not change the model, if another loading vector of the same component is scaled accordingly by the inverse of the same constant. The loading vectors of the second and third mode can be normalized to length one. The scores or loadings of the first mode will then show the sum-of-squares, SS, of each component as

[...]

[...] is the ith element of u and has a value between zero and one [42]. A high value indicates an influential sample or variable, while a low value indicates the opposite. Samples or variables with high leverages and low in case of a variable mode must be investigated to verify if they are inappropriate for the model (outliers) or are indeed influential and acceptable. If a new sample is fit to an existing model, the leverage can be calculated using the new scores for that sample as in ordinary regression analysis. The leverage is then no longer restricted to be below one. As leverages are actually developed for regression analysis, the term squared Mahalanobis distance might be more appropriate for a decomposition method such as PARAFAC, but as leverages are also widely used in two-way PCA, the term leverage is preferred here.
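The normalization described under Postprocessing can be sketched as follows in Python/NumPy. This is an illustration under assumptions, not the author's code: the function names are hypothetical, and the sum-of-squares of a component is taken here to be the sum of squared elements of its rank-one term $a_{if} b_{jf} c_{kf}$.

```python
import numpy as np


def normalize_second_third(A, B, C):
    """Scale the columns of B and C to length one and move the scale into A."""
    norm_b = np.linalg.norm(B, axis=0)
    norm_c = np.linalg.norm(C, axis=0)
    return A * norm_b * norm_c, B / norm_b, C / norm_c


def component_sum_of_squares(A_normalized):
    """With unit-length b_f and c_f, SS of component f is sum_i a_if**2."""
    return (A_normalized ** 2).sum(axis=0)


rng = np.random.default_rng(2)
A = rng.standard_normal((4, 2))
B = rng.standard_normal((5, 2))
C = rng.standard_normal((6, 2))

An, Bn, Cn = normalize_second_third(A, B, C)

# The model itself is unchanged by the rescaling.
before = np.einsum('if,jf,kf->ijk', A, B, C)
after = np.einsum('if,jf,kf->ijk', An, Bn, Cn)
print(np.allclose(before, after))

# Sum-of-squares per component, read directly from the first-mode loadings.
print(component_sum_of_squares(An))
```

Because the scale of each component is moved into the first mode, the rescaling leaves the fitted array untouched while making the size of each component directly readable from A.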
[...] i and j indicating the ith and jth component. Mitchell and Burdick [29,45] refer to TC as the uncorrected correlation coefficient (UCC). The TC value can be shown to correspond to the cosine of the angle between two vectors, $\mathbf{x}_i$ and $\mathbf{x}_j$, where $\mathbf{x}_i$ is the vector obtained by properly unfolding the tensor product of all loading vectors with index i (like the b and c vectors in Fig. 4, only a should also be included).

A TC value close to -1 indicates a degenerate solution. A TC value lower than -0.85 is an indication of a troublesome model according to [46], but this can just be taken as a rule of thumb. Furthermore, for degenerate solutions the TC value will typically continue to worsen for more iterations. If the numerically high TC value is just caused by a poor initialization, the TC value will decrease again numerically. If several new estimations of the same model are consistent and not degenerate, the degenerate solution can be discarded as an accidental local minimum. If decreasing the convergence criterion does not eliminate degeneracy, the cause is most often one of three [41,47]. (i) Too many components are extracted. This will be easily recognizable by the fact that models with fewer components yield nondegenerate solutions. Often extracting too many components will give high positive TC values just as well as negative, which is not the case for real degenerate solutions. Split-half experiments will also help to distinguish this situation from more serious problems. (ii) Poor preprocessing has been applied, which can be characterized by degenerate solutions even for a low number of components, and when other information indicates that further systematic information is [...]

[...] still be appropriate but if the differences are large, this indicates that some latent variables do not vary across some of the ways, or perhaps vary interdependently. In such a case the Tucker (or restricted versions) or unfold bilinear models might be better [48-50].

Mitchell and Burdick [29,45] investigate degeneracy and find it profitable to do several runs of a few iterations, and only use those runs that are not subject to degeneracy. Another way of circumventing degenerate solutions is by applying orthogonality constraints on the model.

If the variation in one mode does not exactly obey the linearity of the PARAFAC model, it is possible to eliminate this mode by using it for calculating covariance or cross-product matrices. Fitting the model to these derived data is called indirect fitting and has been used for longitudinal data [8,51]. Consider a three-way data array where the third mode could be chromatographic. Perhaps the chromatographic profiles of the same analytes change a little from sample to sample due to analytical properties. The data array is therefore almost trilinear, but the differences from sample to sample in the third mode make it hard to make a sound PARAFAC model. The I × J × K array can be seen as J matrices of size I × K. For each j (1 to J) one can calculate an I × I covariance matrix as $\mathbf{X}_j \mathbf{X}_j^{\mathrm{T}}$, where $\mathbf{X}_j$ is the I × K submatrix of X on the jth level of the second mode. The data array thus obtained has the size I × I × J and consists of covariance matrices. The original third mode has vanished and fitting the PARAFAC model to this array will give the following model (compare Eq. (1)).

[...]
There are three different types of data, that are PPO activity
8.1. Data
Fig. 7. Activities of one replicate set versus the other (a). Predictions of the activities of one replicate
set from a model of the other replicate set (b).
In the PARAFAC model, the data is interpreted as a multi-way array of activities, specifically a five-way
array. In a sense, the whole array is seen as one sample, namely PPO activity, which is measured at
different conditions. The five different modes (ways) are: O2 (dimension five), CO2 (dimension three), pH
(dimension three), temperature (dimension three), and substrate type (dimension three). The ijklmth element
of the five-way array contains the activity at the ith O2 level, the jth CO2 level, the kth level of
temperature, the lth level of substrate type, for the mth pH. The five-way array is depicted in Fig. 6.
Each element in the array is the average of the two replicates.

8.2. Results and discussion

Preprocessing of ANOVA data is somewhat difficult and no general guidelines can be given. One must either
try different scalings and centerings or use external knowledge for guidance. In this case, the only
preprocessing thought to be of potential importance was scaling the oxygen mode. From the residuals of the
two sets of replicates, some heteroscedasticity was observed in the oxygen mode. However, scaling the data
according to this did not improve the predictions of one replicate set from the other, simply because the
elements with high residuals were downweighted and hence modeled with even higher error. Other kinds of
meancentering and scaling were tried, but without improving the solution [58].

To decide on the number of components, a five-way PARAFAC model was made using the first of the two
replicate sets instead of just using the mean of these. The model from this analysis, given by the loadings
A, B, C, D and E, was compared to the other replicate set. The number of components was chosen to minimize
the sum squared prediction error calculated as

$$SS = \sum_{i,j,k,l,m} \left( \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} d_{lf} e_{mf} - x_{ijklm} \right)^{2},$$

$x_{ijklm}$ being the ijklmth element of the replicate set not used for estimating the model. One component
gave the lowest prediction error, which furthermore was in the neighborhood of the intrinsic error of the
reference value (standard deviation between replicates 11.9 and standard deviation between the model and
the test set 13.4, corresponding to 94% variance explained). The predictions are shown in Fig. 7, where one
clearly sees that the model is very good and comparable to the intrinsic error of the data.

The activity can hence be modeled by a very simple one-component model. The loading vectors of this model
are shown in Fig. 8. To predict the activity at a certain setting of the different factors, one simply
reads the ordinate values of the five different factors and multiplies these. If a low activity is sought
it is very easy to see how this can be obtained, i.e. by keeping the temperature, oxygen and pH levels as
low as possible.

A one-component solution was also found from a
test set procedure where half the elements of the five-way array were eliminated. The elements were
eliminated according to a fractional factorial design. For the remaining data, a PARAFAC model was built
with an algorithm that handles missing data. With the model the activity of the left-out samples was
predicted for both a one- and a two-component PARAFAC model. The root mean square error of prediction,
RMSEP, for a one-component model was 12.6, while for two components an RMSEP of 51.9 was found.

From the PARAFAC loadings it is possible to predict the effect at any level of the factors investigated. To
validate this, a PARAFAC model was made leaving out all samples with 20% oxygen. A curvefit of the oxygen
loading vector makes it possible to find the oxygen effect of any level between 0 and 80%. For 20% the
loading was estimated from a quadratic curvefit of the loading vector. From this value and the loadings of
the remaining modes the 27 (1 × 3 × 3 × 3) left-out samples were predicted with an RMSEP of 13.1. This
shows that a completely general model is obtained showing the effect of each factor as simply a loading
vector.

The data constitute a full factorial design, and analysis of variance is thus an obvious tool for
investigating the influence of different factors. However, the problem is highly nonlinear and the results
from ANOVA are hard to interpret. A standard ANOVA performed in SAS pointed to the following model.

Activity = A + B + C + D + E + AB + AC + AD + AE + BC + BD + BE + CD
           + CE + DE + ABC + ABD + ABE + ACE + ADC + ABDC + ABCE
           + ADE + ABDE + DCE + ADCE,

A, B, C, D and E being the main effects of O2, CO2, temperature, substrate and pH, and e.g. ACD the
interaction between O2, temperature and substrate. PARAFAC on the other hand suggested that the five-way
multiplicative interaction term is sufficient to model the variations

Activity = ABCDE,

or rather

$$\mathrm{Activity} = a_i b_j c_k d_l e_m .$$

It is interesting that one of the few interactions not significant in the ANOVA model is the five-way
interaction term! Even though more sophisticated ANOVA methods can be used, this example illustrates that
choosing the right mathematical method can greatly influence the outcome, both with respect to prediction
and interpretability. When using half the samples to estimate a model and predict the left-out samples an
RMSEP of 35.3 and 12.6 was achieved for
the ANOVA and the PARAFAC model respectively. The reason for the better results with PARAFAC is simply that
the underlying multiplicative model is more appropriate for the enzymatic data than is the mathematical
model underlying ANOVA. It is to be expected that the main variation in the activity is caused by
something that could be approximately multiplicative. If pH is 3, little activity is observed no matter the
oxygen level, but if pH is 6, the activity of PPO is extremely dependent on oxygen.

8.3. Further modification of the model

Mathematically the one-component PARAFAC model could also be obtained by ANOVA by a logarithmic
transformation of the data, but PARAFAC offers an even more general model and is furthermore unique, which
is not the case for the ANOVA model. In an effort to make a better model a constrained modification of
PARAFAC was used where some loadings were forced to ones, thereby permitting modeling of lower-order
interactions and main effects. The best model was found to consist of one main additive effect, one
four-way and one five-way interaction. This model, specifically

$$x_{ijklm} = a_{i1} + a_{i2} b_{j2} c_{k2} d_{l2} + a_{i3} b_{j3} c_{k3} d_{l3} e_{m3},$$

estimated from one of the replicate sets gave a model that predicted the other replicate set with an RMSEP
of 12.3. This should be compared to the intrinsic error between the two replicates, 11.9, and the error
obtained predicting with the one-component five-way interaction model, 13.4. However, the model only
explained one percent more of the variance than the one-component model, and hence the increased complexity
was judged not to be beneficial enough to justify the model. The model was estimated by a three-component
PARAFAC where the loadings of the first component were all fixed at the value one except in the first mode.
In the second component the loadings of the fifth mode were forced to ones. Further investigation is now in
progress trying to develop a general multiplicative ANOVA model using PARAFAC as sketched here. Problems
with defining degrees of [...] problems to be dealt with. It is noteworthy that this approach can also be
used for estimating constant baselines or other lower-order effects in a spectral decomposition.

9. Application II: Unique decomposition of sparse fluorescence data

9.1. Data

This problem is an illustrative example of the unique decomposition obtained by PARAFAC using a
nonnegativity constraint. The data set is part of an investigation conducted by Claus A. Andersson at our
laboratory. Two samples containing different amounts of tyrosine, tryptophane and phenylalanine were
measured by fluorescence (excitation 250-300 nm, emission 250-450 nm, 1 nm intervals). The array to be
decomposed is hence 2 × 51 × 201. The samples were measured on a PE LS50B spectrofluorometer with an
excitation slitwidth of 2.5 nm, an emission slitwidth of 10 nm and a scanspeed of 1500 nm/min. Originally
five samples were used, which could be decomposed by unconstrained PARAFAC without problems. However, to
show how to incorporate external knowledge in the decomposition only two of the samples are used here.

Fig. 9. A plot of the fluorescence of a sample containing Tyr, Trp and Phe. Notice the Rayleigh scatter in
the left corner.

The theoretical multilinearity of fluorescence has been described in [17,28,59]. In Fig. 9 one of the two
samples is shown. Notice the Rayleigh scatter in the left part, which is not multilinear in its nature
[60]. Rayleigh scatter should be avoided in a multilinear decomposition if possible, and there are three
ways of doing that: (i) Only measure the emission above the excitation, if this wavelength area contains
sufficient
information; (ii) Use curvefitting in some form to estimate the emission in the neighborhood of the
excitation wavelengths; (iii) Measure a blank and subtract this measurement from the sample measurement.
This, however, can be problematic if the Rayleigh scatter is mainly caused by particles in the sample. In
this experiment nothing was done to eliminate the Rayleigh scatter initially.

9.2. Results and discussion

A three-component PARAFAC solution should give the correct solution if the trilinear model is appropriate.
The emission loadings of a three-component PARAFAC solution are shown in Fig. 10a. The spectrum
corresponding to tryptophane has large negative areas. It was concluded that the decomposition was
difficult due to the low variability (two samples), and knowing that the fluorescence spectra and
concentrations should be positive, it was natural to constrain the PARAFAC loadings to positive values. In
Fig. 10b the estimated emission loadings are shown using nonnegativity constraints. The spectra are quite
similar to the pure spectra of the analytes, but for tryptophane there is a small hump below 300 nm caused
by the non-multilinear Rayleigh scatter. To avoid this, all variables influenced by Rayleigh scatter were
set to missing values and the corresponding PARAFAC model was estimated. The result can be seen in
Fig. 10c. Apparently this alone is not sufficient to ensure a good curve resolution for the tryptophane
spectrum. Combining the missing elements approach with the nonnegativity constraint helps the model focus
on the right aspects of the data, and the estimated loadings in Fig. 10d are shown together with the pure
spectra. As seen the estimated loadings are now quite similar to the pure spectra. The estimated excitation
spectra are shown in Fig. 11.

Fig. 10. Estimated and true emission spectra: (a) Unconstrained PARAFAC, (b) NNLS PARAFAC, (c) PARAFAC with
missing elements and (d) missing elements and NNLS. In (d) the true spectra are also shown.

The model precisely estimates the three pure spectra, even though there are only two independent samples,
and the excitation spectra of tyrosine and tryptophane are very alike (correlation 0.93). According to the
rule mentioned in the paragraph on uniqueness, it is theoretically possible to estimate these three
different spectra correctly if only the concentrations vary independently pairwise and no spectra are lin-
Table 3
Percentage of variance explained of the dependent variable (amino-N) of the test set. Each column
corresponds to a different model and each row to the number of latent variables/components used. Bold
numbers indicate variance explained for candidate models, and the numbers in parentheses are the percentage
of variance explained of the three-way array of independent variables in the test set (fluorescence
spectra).

        PARAFAC       PARAFAC          PARAFAC       Two-way       Three-way     Tucker        PCR
        (raw)         (meancentered)   (NNLS)        PLS           PLS
1 LV    84.0          84.1             84.0          84.4          84.2          84.3          83.9
2 LV    85.4          85.4             85.5          86.6          86.1          84.8          85.7
3 LV    85.2          85.4             85.2          88.5          88.9          85.3          86.8
4 LV    87.1          85.1             86.8          91.6          91.4          88.0          87.2
5 LV    91.2 (99.8)   90.7 (99.9)      91.1 (99.8)   91.9 (96.0)   92.3 (95.7)   87.7 (99.8)   88.0 (99.9)
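As a hedged illustration of the kind of quantities reported in Table 3, the sketch below computes a percentage of explained variance and an RMSEP for a set of test-set predictions. It assumes that explained variance is measured relative to the variation of the reference values around their mean; the text does not state the exact definition used, and the function names and numbers are made up for illustration.

```python
import math


def rmsep(reference, predicted):
    """Root mean square error of prediction."""
    squared = [(y - p) ** 2 for y, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def percent_variance_explained(reference, predicted):
    """100 * (1 - residual SS / total SS around the mean of the reference)."""
    mean = sum(reference) / len(reference)
    residual_ss = sum((y - p) ** 2 for y, p in zip(reference, predicted))
    total_ss = sum((y - mean) ** 2 for y in reference)
    return 100.0 * (1.0 - residual_ss / total_ss)


# Made-up reference values and predictions, not data from the paper.
y_test = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(round(rmsep(y_test, y_pred), 3))
print(round(percent_variance_explained(y_test, y_pred), 1))
```

If the variance were instead taken relative to the uncentered sum of squares, only `total_ss` would change.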
excitation and emission loadings from the PARAFAC model and from these the estimated amino-N content was
determined from the regression model. For comparison the results of using multi-way PLS (N-PLS, [62]),
ordinary two-way PLS, Tucker3 regression, and two-way principal component regression (PCR) are also
calculated. N-PLS is a multi-way calibration model. By N-PLS the array of independent variables is
sequentially decomposed to a multi-linear model, in such a way that the scores have maximal covariance with
the yet unexplained variation of the dependent variable. The Tucker regression was performed by decomposing
the raw data with a Tucker3 model using the same number of components in each mode and using the loadings
of the sample mode for regression. PCR was performed using the successively estimated score vectors from a
PCA model for regression. The spectral data was meancentered prior to the PCA decomposition.

The results from the calibration models are shown in Table 3. Several interesting aspects are illustrated
here. All models obtain optimal or near-optimal predictions around five components. PLS and N-PLS seem to
perform slightly better than the other methods and furthermore use fewer components. All pure
decomposition methods (PARAFAC, Tucker3, PCA) describe approximately 99.8% of the spectral variation using
five components. Even though the PCA and Tucker3 models are more complex and flexible than PARAFAC the
flexibility apparently does not contribute to better modeling of the spectra. Combining this with the fact
that the PARAFAC regression models outperform both Tucker3 and PCA illustrates very well that when PARAFAC
is adequate there is no advantage in using more complex models. The constraints imposed in PLS and N-PLS
seem to be more adequate. Both give more predictive models for amino-N. Both models fit the spectral data
more poorly than the pure decomposition methods, which is to be expected given the constraint that the
scores have maximal covariance with the dependent variable. N-PLS uses only a fraction of the number of
parameters that PLS uses to model the spectral data, so in a mathematical sense, N-PLS obtains optimal
predictions with the simplest model. Therefore one can argue that the N-PLS model is the most appropriate
model. However, the N-PLS model does not possess the uniqueness properties of PARAFAC. One might therefore
also argue that the five-component nonnegativity constrained PARAFAC model is preferable, if the found
loadings can be related to specific chemical analytes; an issue that will not be further investigated here.

Using PARAFAC for regression as shown here has the potential for simultaneously providing a model that
predicts the dependent variable and precisely describes which phenomena in the independent variables are
crucial for describing the variations in the dependent variable. However, the little experience obtained
so far in our laboratory indicates that often one is better off focusing on either decomposition (PARAFAC)
or calibration (N-PLS). Purely spectral data as here is the only type of data where there seems to be
little difference in the predictive ability.

10.3. M’n’M

All calculations were done on a 133 MHz Dell PC with 32 Mb RAM. The PARAFAC algorithm was made in the
mathematical software Matlab for Windows 4.2c.1 (Mathworks). This implementation works with arrays up to
ten ways. It also contains the possibility to constrain loadings to be orthogonal or nonnegative and
handles missing data. The algorithm is available from the Internet at
http:\\newton.foodsci.kvl.dk\foodtech.html. Also available are M-files for PARAFAC and Tucker3 made by
Claus A. Andersson, and N-PLS by R. Bro. Other programs for PARAFAC modeling are also available. Richard A.
Harshman, Dept. Psychology, Social Science Center, University Western Ontario, London, Ontario, Canada N6A
5C2 has a very general PARAFAC program for three-way analysis which runs in batch mode on PC's. Rob Ross,
at http://www.biosci.ohio-state.edu/~rtr/multilin/muldoc.html offers fortran code for PARAFAC, and Pentti
Paatero, Dept. Physics, University of Helsinki, BOX 9, FIN-00014 University, Helsinki, Finland, has made a
program for two- and three-way PARAFAC which incorporates nonnegativity constraints and weighted
regression. P.M. Kroonenburg's latest version of his three-mode toolbox now contains the PARAFAC model. The
program runs in DOS mode. Orders should be sent to P.M. Kroonenburg, Dept. Education, Leiden University,
Wassenaarseweg 52, 2333 AK Leiden, The Netherlands.

11. Conclusion

The PARAFAC model and its estimation have been described and its application for ANOVA, curve-resolution
and calibration has been exemplified. It is my hope that this tutorial might encourage others to
investigate multi-way methods. Multi-way methods have many advantages (and of course shortcomings) that
have not yet been fully acknowledged.

Acknowledgements

Rasmus Bro is grateful for support and inspiration from, and funds to, Professor Lars Munck from Nordic
Industry Foundation project P93149 and the FØTEK foundation. Claus A. Andersson, Age K. Smilde and Henk
Kiers are thanked for numerous suggestions during this work. Lars Norgaard, Hanne Heimdal and Claus A.
Andersson are thanked for letting me use their data sets. Anonymous referees are thanked for helpful
suggestions.

References

[1] R.A. Harshman, Foundations of the PARAFAC procedure: Model and conditions for an 'explanatory' multi-mode factor analysis, UCLA Working Papers in Phonetics 16 (1970) 1.
[2] J.D. Carroll, J. Chang, Analysis of individual differences in multidimensional scaling via an N-way generalization of 'Eckart-Young' decomposition, Psychometrika 35 (1970) 283.
[3] P. Geladi, Analysis of multi-way (multi-mode) data, Chemom. Intell. Lab. Syst. 7 (1989) 11.
[4] A.K. Smilde, Three-way analyses. Problems and prospects, Chemom. Intell. Lab. Syst. 5 (1992) 143.
[5] P.M. Kroonenburg, Three-mode principal component analysis, Theory and applications, DSWO Press, Leiden, 1983.
[6] H.A.L. Kiers, Hierarchical relations among three-way methods, Psychometrika 56 (1991) 449.
[7] M.B. Seasholz, B.R. Kowalski, The parsimony principle applied to multivariate calibration, Anal. Chim. Acta 277 (1993) 165.
[8] R.A. Harshman, S.A. Berenbaum, Basic concepts underlying the PARAFAC-CANDECOMP three-way factor analysis model and its application to longitudinal data, in: D.H. Eichorn, J.A. Clausen, N. Haan, M.P. Honzik, P.H. Mussen (Eds.), Present and past in middle life, Academic Press, NY, 1981, pp. 435-459.
[9] D.S. Burdick, An introduction to tensor products with applications to multiway data analysis, Chemom. Intell. Lab. Syst. 28 (1995) 229.
[10] I. Scarminio, M. Kubista, Analysis of correlated spectral data, Anal. Chem. 65 (1993) 409.
[11] L. Sarabia, M.C. Ortiz, R. Leardi, G. Drava, A program for non-orthogonal factor analysis, Trends Anal. Chem. 12 (1993) 226.
[12] N.M. Faber, M.C. Buydens, G. Kateman, Generalized rank annihilation method I: derivation of eigenvalue problems, J. Chemom. 8 (1994) 147.
[13] R.A. Harshman, Determination and proof of minimum uniqueness conditions for PARAFAC1, UCLA Working Papers in Phonetics 22 (1972) 111.
[14] J.B. Kruskal, More factors than subjects, tests and treatments: An indeterminacy theorem for canonical decomposition and individual differences scaling, Psychometrika 41 (1976) 281.
[15] J.B. Kruskal, Three-way arrays: Rank and uniqueness of trilinear decomposition, with application to arithmetic complexity and statistics, Linear Algebra Appl. 18 (1977) 95.
[16] R.B. Cattell, Parallel proportional profiles and other principles for determining the choice of factors by rotation, Psychometrika 9 (1944) 267.
[17] S. Leurgans, R.T. Ross, R.B. Abel, A decomposition for three-way arrays, SIAM J. Matrix Anal. Appl. 14 (1993) 1064.
[18] J.B. Kruskal, Rank, decomposition, and uniqueness for 3-way and N-way arrays, in: R. Coppi, S. Bolasco (Eds.), Multiway data analyses, Elsevier Science Pub., North-Holland, 1989.
[19] R.A. Harshman, M.E. Lundy, The PARAFAC model for three-way factor analysis and multidimensional scaling, in: H.G. Law, C.W. Snyder, J.A. Hattie, R.P. McDonald (Eds.), Research methods for multimode data analysis, Praeger, New York, 1984.
[20] J.M.F. ten Berge, H.A.L. Kiers, J. de Leeuw, Explicit Candecomp/PARAFAC solution for a contrived 2 × 2 × 2 array of rank three, Psychometrika 53 (1988) 579.
[21] J.M.F. ten Berge, Kruskal's polynomial for 2 × 2 × 2 arrays and a generalization to 2 × n × n arrays, Psychometrika 56 (1991) 631.
[22] H.A.L. Kiers, W.P. Krijnen, An efficient algorithm for PARAFAC of three-way data with large numbers of observation units, Psychometrika 56 (1991) 147.
[23] R.A. Harshman, M.E. Lundy, PARAFAC: Parallel factor analysis, Comp. Stat. Data Anal. 18 (1994) 39.
[24] R. Sands, F.W. Young, Component models for three-way data: An alternating least squares algorithm with optimal scaling features, Psychometrika 45 (1980) 39.
[25] D.S. Burdick, X.M. Tu, L.B. McGown, D.W. Millican, Resolution of multicomponent fluorescent mixtures by analysis of excitation-emission-frequency array, J. Chemom. 4 (1990) 15.
[26] E. Sanchez, B.R. Kowalski, Tensorial resolution: A direct trilinear decomposition, J. Chemom. 4 (1990) 29.
[27] S. Li, P.J. Gemperline, Eliminating complex eigenvectors and eigenvalues in multiway analyses using the direct trilinear decomposition method, J. Chemom. 7 (1993) 77.
[28] S. Leurgans, R.T. Ross, Multilinear models in spectroscopy, Statist. Sci. 7 (1992) 289.
[29] B.C. Mitchell, D.S. Burdick, An empirical comparison of resolution methods for three-way arrays, Chemom. Intell. Lab. Syst. 20 (1993) 149.
[30] X.M. Tu, D.S. Burdick, Resolution of trilinear mixtures: Application in spectroscopy, Stat. Sinica 2 (1992) 577.
[31] N. Cliff, Orthogonal rotation to congruence, Psychometrika 31 (1966) 33.
[32] C.L. Lawson, R.J. Hanson, Solving least squares problems, Classics in Appl. Math., No. 15, SIAM, Philadelphia, 1995.
[33] R.J. Hanson, Linear least squares with bounds and linear constraints, SIAM J. Sci. Stat. Comput. 7 (1986) 826.
[34] H. Späth, Mathematical algorithms for linear regression, Academic Press, Inc., Boston, 1987.
[35] J.L. Barlow, Error analysis and implementation aspects of deferred correction for equality constrained least squares problems, SIAM J. Numer. Anal. 25 (1988) 1340.
[36] J.L. Barlow, N.K. Nichols, R.J. Plemmons, Iterative methods for equality-constrained least squares problems, SIAM J. Sci. Stat. Comput. 9 (1988) 892.
[37] J.D. Carroll, S. Pruzansky, J.B. Kruskal, Candelinc: A general approach to multidimensional analysis of many-ways arrays with linear constraints on parameters, Psychometrika 45 (1980) 3.
[38] R.T. Ross, S. Leurgans, Component resolution using multilinear models, Methods Enzymol. 246 (1995) 679.
[39] J.M.F. ten Berge, Convergence of PARAFAC preprocessing procedures and the Deming-Stephan method of iterative proportional fitting, in: R. Coppi, S. Bolasco (Eds.), Multiway data analyses, Elsevier Science Pub., North-Holland, 1989.
[40] J.B. Kruskal, Multilinear methods, Proc. Symp. Appl. Math. 28 (1983) 75.
[41] R.A. Harshman, M.E. Lundy, Data preprocessing and the extended PARAFAC model, in: H.G. Law, C.W. Snyder, J.A. Hattie, R.P. McDonald (Eds.), Research methods for multimode data analysis, Praeger, New York, 1984.
[42] R.D. Cook, S. Weisberg, Residuals and influence in regression, Chapman and Hall Ltd, New York, 1982.
[43] A.K. Smilde, D.A. Doornbos, Simple validatory tools for judging the predictive performance of PARAFAC and three-way PLS, J. Chemom. 6 (1992) 11.
[44] S.R. Durell, C. Lee, R.T. Ross, E.L. Gross, Factor analysis of the near-ultraviolet absorption spectrum of plastocyanin using bilinear, trilinear and quadrilinear models, Arch. Biochem. Biophys. 278 (1990) 148.
[45] B.C. Mitchell, D.S. Burdick, Slowly converging PARAFAC sequences: Swamps and two-factor degeneracies, J. Chemom. 8 (1994) 155.
[46] W.P. Krijnen, The analysis of three-way arrays by constrained PARAFAC methods, Ph.D. thesis, University of Groningen, 1993.
[47] J.B. Kruskal, Multilinear methods, in: H.G. Law, C.W. Snyder, J.A. Hattie, R.P. McDonald (Eds.), Research methods for multimode data analysis, Praeger, New York, 1984.
[48] J.B. Kruskal, R.A. Harshman, M.E. Lundy, How 3-MFA data can cause degenerate PARAFAC solutions, among other relationships, in: R. Coppi, S. Bolasco (Eds.), Multiway data analyses, Elsevier Science Pub., North-Holland, 1989.
[49] A.K. Smilde, Y. Wang, B.R. Kowalski, Theory of medium-rank second-order calibration with restricted-Tucker models, J. Chemom. (1994) 21.
[50] A.K. Smilde, R. Tauler, J.M. Henshaw, L.W. Burgess, B.R. Kowalski, Multicomponent determination of chlorinated hydrocarbons using a reaction-based chemical sensor. 3. Medium-rank second-order calibration with restricted Tucker models, Anal. Chem. 66 (1994) 3345.
[51] R.A. Harshman, PARAFAC2: Mathematical and technical notes, UCLA Working Papers in Phonetics 22 (1972) 30.
[52] H.A.L. Kiers, An alternating least squares algorithm for PARAFAC2 and three-way DEDICOM, Comp. Stat. Data Anal. 16 (1993) 103.
[53] J.R. Kettenring, A case study in data analysis, Proc. Symp. Appl. Math. 28 (1983) 105.
[54] A. Aastveit, H. Martens, ANOVA interactions by partial least squares regression, Biometrics 42 (1984) 829.
[55] H. Martens, L. Izquierdo, M. Thomassen, M. Martens, Partial least squares regression on design variables as an alternative to analysis of variance, Anal. Chim. Acta 191 (1986) 133.
[56] M.V. Martinez, J.R. Whitaker, The biochemistry and control of enzymatic browning, Trends Food Sci. Technol. 6 (1995) 195.
[57] H. Heimdal, L.M. Larsen, L. Poll, R. Bro, Oxidation of chlorogenic acid and (-)-epicatechin by lettuce polyphenol oxidase in model solutions at various combinations of O2, CO2, temperature and pH, J. Agric. Food Chem., in press.
[58] R. Bro, H. Heimdal, Enzymatic browning of vegetables. Calibration and analysis of variance by multiway methods, Chemom. Intell. Lab. Syst. 34 (1996) 85.
[59] L. Norgaard, Classification and prediction of quality and process parameters of thick juice and beet sugar by fluorescence spectroscopy and chemometrics, Zuckerind. 120 (1995) 970.
[60] G.W. Ewing, Instrumental methods of chemical analysis, McGraw-Hill Book Company, NY, 1985.
[61] L. Norgaard, A multivariate chemometric approach to fluorescence spectroscopy, Talanta 42 (1995) 1305.
[62] R. Bro, Multiway calibration. Multilinear PLS, J. Chemom. 10 (1996) 47.