
Chemometrics and Intelligent Laboratory Systems 38 (1997) 149-171, Elsevier

Tutorial

PARAFAC. Tutorial and applications


Rasmus Bro *
Chemometrics Group, Food Technology, Royal Veterinary & Agricultural University, Rolighedsvej 30, III, DK-1958 Frederiksberg C, Denmark

Received 4 April 1996; revised 30 July 1996; accepted 8 March 1997

Abstract

This paper explains the multi-way decomposition method PARAFAC and its use in chemometrics. PARAFAC is a generalization of PCA to higher-order arrays, but some of the characteristics of the method are quite different from the ordinary two-way case. There is no rotation problem in PARAFAC, so that, e.g., pure spectra can be recovered from multi-way spectral data. Unlike in PCA, components cannot be estimated successively, as this gives a model with poorer fit than when all components are estimated simultaneously. Finally, scaling and centering are not as straightforward in the multi-way case as in the two-way case. An important advantage of multi-way methods over unfolding methods is that the estimated models are very simple in a mathematical sense, and therefore more robust and easier to interpret. All these aspects and more are explained in this tutorial, and an implementation in Matlab code is available that contains most of the features explained in the text. Three examples show how PARAFAC can be used for specific problems. The applications include: analysis of variance by PARAFAC, a five-way application of PARAFAC, PARAFAC with half the elements missing, PARAFAC constrained to positive solutions, and PARAFAC for regression as in principal component regression.

Contents
1. Introduction ... 150
2. Nomenclature ... 151
3. The model ... 152
3.1. Uniqueness ... 152
3.2. Rank of multi-way arrays ... 153
4. Implementation ... 153
4.1. Alternating least squares ... 153
4.1.1. Compressing ... 155
4.1.2. Extrapolating ... 155
4.1.3. Initialization ... 155
4.2. Stopping criterion ... 156
4.3. Constraining the solution ... 156
4.4. Missing values ... 157
5. Preprocessing ... 157
6. Assessing the solution ... 159
6.1. Postprocessing ... 159
6.2. Leverages and residuals ... 159
6.3. Number of components ... 159
6.4. Degenerate solutions ... 160
7. Types of data suitable for PARAFAC analysis ... 162
8. Application I: Analysis of variance ... 162
8.1. Data ... 162
8.2. Results and discussion ... 163
8.3. Further modification of the model ... 165
9. Application II: Unique decomposition of sparse fluorescence data ... 165
9.1. Data ... 165
9.2. Results and discussion ... 166
10. Application III: Prediction of amino-N in sugar samples from fluorescence ... 167
10.1. Data ... 167
10.2. Results and discussion ... 167
10.3. M’n’M ... 168
11. Conclusion ... 169
Acknowledgements ... 169
References ... 169

* E-mail: [email protected].

0169-7439/97/$17.00 Copyright © 1997 Elsevier Science B.V. All rights reserved
PII S0169-7439(97)00032-4

1. Introduction

PARAFAC is a multi-way method originating from psychometrics [1,2]. It is attracting increasing interest in chemometrics and associated areas for several reasons: increased awareness of the method and its possibilities, the increased complexity of the data dealt with in science and industry, and increased computational power [3,4].

Multi-way data are characterized by several sets of variables that are measured in a crossed fashion. Chemical examples are fluorescence emission spectra measured at several excitation wavelengths for several samples, fluorescence lifetimes measured at several excitation and emission wavelengths, or any kind of spectrum measured chromatographically for several samples. Measuring such variables gives rise to three-way data; i.e., the data can be arranged in a cube instead of a matrix as in standard multivariate data sets. In psychometrics a typical data set could be a set of variables measured on several persons/subjects on several occasions. Similar configurations can be imagined in, for example, sensometrics. In practice many other types of data might be multi-way: two-way data determined for several chemical treatments, pH's, times, locations, etc. An important source of multi-way data is, of course, images of all kinds.

PARAFAC is one of several decomposition methods for multi-way data. The two main competitors are the Tucker3 method [5] and simply unfolding the multi-way array to a matrix and then applying standard two-way methods such as PCA. The Tucker3 method should rightfully be called three-mode principal component analysis (or N-mode principal component analysis), but here the term Tucker3 or just Tucker will be used instead. PARAFAC, Tucker and two-way PCA are all multi- or bilinear decomposition methods, which decompose the array into sets of scores and loadings that hopefully describe the data in a more condensed form than the original data array. All the methods have advantages and disadvantages, and often several methods must be tried to find the most appropriate one.

Without going into the details of two-way PCA and Tucker, it is important to have a feeling for the hierarchy among these methods. Kiers [6] shows that PARAFAC can be considered a constrained version of Tucker3, and Tucker3 a constrained version of two-way PCA. Any data set that can be modeled adequately with PARAFAC can thus also be modeled by Tucker3 or two-way PCA, but PARAFAC uses fewer degrees of freedom. A two-way PCA model always fits the data better than a Tucker3 model, which in turn fits better than a PARAFAC model, except in extreme cases where the models may fit equally well. If a PARAFAC model is adequate, Tucker3 and two-way PCA models will tend to use the excess degrees of freedom to model noise or to model the systematic variation in a redundant way (see the last application). Therefore one will generally prefer the simplest possible model. This principle is old, dating back to the fourteenth century (Occam's razor), and is now also known as the law or principle of parsimony [7]. In the sense that it uses the most degrees of freedom, the PCA model can be considered the most complex and flexible model, while PARAFAC is the simplest and most restricted model.

Conceptually some may find two-way PCA simpler than the multi-linear methods, but in a multi-way context this is not so. Because the array has to be unfolded to a matrix before two-way analysis, the variables in the unfolded modes get mixed up, so that the effect of one variable is associated not with one but with many elements of a loading vector. Consider an even more complex model than two-way PCA, e.g. a model that does not assume any structure at all but models each data element individually. This model would equal the data and obviously use all degrees of freedom, giving a perfect fit. Thus, the more structure, the poorer the fit and the simpler the model. It is apparent that the reason for using multi-way methods is not to obtain a better fit, but rather more adequate, robust and interpretable models. This can to some extent be compared to the difference between using multiple linear regression (MLR) and partial least squares regression (PLS) for multivariate calibration. MLR is known to give the best fit to the dependent variable of the calibration data, but in most cases PLS has better predictive power. PLS can be seen as a constrained version of MLR, where the constraints help the model focus on the systematic part of the data. In the same way, multi-way methods are less sensitive to noise and furthermore give loadings that can be directly related to the different modes of the multi-way array.

That two-way PCA can give very complex models can be illustrated with an example. For an F-component PCA solution to an I × J × K array unfolded to an I × JK matrix, the PCA model consists of F(I + JK) parameters (score and loading elements). A corresponding Tucker model with an equal number of components in each mode would consist of F(I + J + K) + F³ parameters, and a PARAFAC model of F(I + J + K) parameters. As a hypothetical example, consider a 10 × 100 × 20 array modeled by a 5-component solution. A two-way PCA model of the 10 × 2000 unfolded array consists of 5(10 + 2000) = 10050 parameters, a Tucker model of 5(10 + 100 + 20) + 5³ = 775, and a PARAFAC model of 5(10 + 100 + 20) = 650 parameters. Clearly, the PCA model will be more difficult to interpret than the multi-way models.

In this paper a tutorial on how to use PARAFAC is given. Interest in PARAFAC and related methods is often hampered by practical considerations regarding how to implement the algorithm, how to do a sound analysis, etc. Many excellent papers on PARAFAC are not published in readily available journals; the essence of some of these papers is presented here. A very annoying characteristic of PARAFAC is the long time required to calculate the models. The algorithms used are most often based on alternating least squares (ALS), initialized either by random values or by values calculated by a direct trilinear decomposition based on the generalized eigenvalue problem. Here the ALS algorithm of PARAFAC is modified in simple ways, which reduces the number of iterations and the time required to calculate the models by a factor of up to 20.

In the following, the discussion will be limited to three-way data for simplicity, but most results are valid for data and models of any (higher) order. Three applications will show some typical uses of PARAFAC and also include higher-order models.

2. Nomenclature

In the following, scalars are indicated by lower-case italics, vectors by bold lower-case characters, two-way matrices by bold capitals, and three-way arrays by underlined bold capitals. The letters I, J, K, L and M are reserved for indicating the dimensions of the different modes. The ijkth element of X is called $x_{ijk}$. The terms mode, way and order are used more or less interchangeably, though a distinction is sometimes made between the geometrical dimension of the hypercube (the number of ways) and the number of independent ways (the order/mode) [3,6]. For example, an ordinary two-way covariance matrix is only a one-mode array, because the variables are identical in the two ways. Likewise, no distinction will be made between the terms factor and component. When three-way arrays are unfolded to matrices the following notation will be used: if X is an I × J × K array and is unfolded to an I × JK matrix, the order of J and K indicates which indices are running fastest. In this case the index j runs fastest, meaning that the first J columns of X contain all variables for k = 1 and for j = 1 to j = J.

3. The model

PARAFAC is a decomposition method which can conceptually be compared to bilinear PCA; or rather, it is one generalization of bilinear PCA, while the Tucker3 decomposition is another generalization of PCA to higher orders [8,9]. The model was independently proposed by Harshman [1] and by Carroll and Chang [2], who named the model CANDECOMP (canonical decomposition). The data are decomposed into triads or trilinear components, but instead of one score vector and one loading vector as in bilinear PCA, each component consists of one score vector and two loading vectors. It is common three-way practice not to distinguish between scores and loadings, as these are treated equally numerically.

A PARAFAC model of a three-way array is given by three loading matrices, A, B and C, with elements $a_{if}$, $b_{jf}$ and $c_{kf}$. The trilinear model is found by minimizing the sum of squares of the residuals $e_{ijk}$ in the model

$$x_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} + e_{ijk} \qquad (1)$$

This equation is shown graphically in Fig. 1 for two components (F = 2).

Fig. 1. A graphical representation of a two-component PARAFAC model of the data array X.

The model can also be written

$$\underline{\mathbf{X}} = \sum_{f=1}^{F} \mathbf{a}_f \otimes \mathbf{b}_f \otimes \mathbf{c}_f$$

where $\mathbf{a}_f$, $\mathbf{b}_f$ and $\mathbf{c}_f$ are the fth columns of the loading matrices A, B and C, respectively [9].

3.1. Uniqueness

An obvious advantage of the PARAFAC model is the uniqueness of the solution. In bilinear methods there is the well-known problem of rotational freedom. The loadings in a spectral bilinear decomposition reflect the pure spectra of the analytes measured, but it is not possible without external information to actually find the pure spectra because of the rotation problem. This has prompted many different methods for obtaining more interpretable models than PCA and similar models [10-12], or for rotating the PCA solution to more appropriate solutions. Most of these methods, however, are more or less arbitrary or have ill-defined properties. This is not the case in PARAFAC. If the data are indeed trilinear, the true underlying spectra (or whatever constitutes the variables) will be found if the right number of components is used and the signal-to-noise ratio is appropriate [13-15]. This important fact is what originally led R. A. Harshman to develop the method, based on an idea from 1944 [16]. It is a very strong feature, which gives the PARAFAC model an unsurpassed advantage.

Leurgans et al. [17], among others, have shown that
unique solutions can be expected if the loading vectors are linearly independent in two of the modes and, furthermore, in the third mode no two loading vectors are linearly dependent (a less restrictive condition). A good example of this is given in the second application below. Kruskal [15,18] gives even less restrictive conditions for when unique solutions can be expected. He uses the k-rank of the loading matrices, a term introduced by Harshman and Lundy [19]. If every combination of k' columns of A has full column rank, and this does not hold for k' + 1, then the k-rank of A is k'. The k-rank is thus related, but not equal, to the rank of the matrix, as the k-rank can never exceed the rank. Kruskal proves that if

$$k_A + k_B + k_C \geq 2F + 2,$$

then the PARAFAC solution is unique. Here $k_A$ is the k-rank of A, $k_B$ is the k-rank of B, $k_C$ is the k-rank of C, and F is the number of PARAFAC components sought.

The mathematical meaning of uniqueness is that the estimated PARAFAC model cannot be rotated without a loss of fit, as opposed to two-way analysis, where one may rotate scores and loadings without changing the fit of the model. A unique solution therefore means that no restrictions are necessary to identify the model apart from the trivial indeterminacies of scale and column order. For appropriate noise, i.e. random and not too severe, it also holds that the true underlying trilinear model will be the model with the best fit. Therefore the true and estimated models must coincide when the right number of components is chosen.

3.2. Rank of multi-way arrays

An issue that is quite astonishing at first is the rank of multi-way arrays. Little is known in detail, but Kruskal [15,18], ten Berge et al. [20] and ten Berge [21] have worked on this issue. A 2 × 2 matrix has maximal rank two. In other words: any 2 × 2 matrix can be expressed as a sum of two rank-one matrices, two principal components for example. A rank-one matrix can be written as the outer product of two vectors (a score and a loading vector). Such a component is called a dyad. A triad is the trilinear equivalent of a dyad, namely a trilinear (PARAFAC) component, which is given by the tensor product of three vectors [9]. The rank of a three-way array is equal to the minimal number of triads necessary to describe the array. For a 2 × 2 × 2 array it turns out that the maximal rank is three! This means that there exist 2 × 2 × 2 arrays that cannot be described using only two components. An example can be seen in [20]. For a 3 × 3 × 3 array the maximal rank is five (see for example [18]). These results may seem strange, but are due to the special structure of the multilinear model compared to the bilinear one. Furthermore, Kruskal has shown that if, for example, 2 × 2 × 2 arrays are generated randomly from any reasonable distribution, the volumes or probabilities of the array being of rank two and of rank three are both positive. This is as opposed to two-way matrices, where only the full-rank case has positive volume. The practical implication of this is yet to be seen, but the rank of an array might be important when one wants to create a multi-way array in a parsimonious way, yet still with sufficient dimensions to describe the phenomena under investigation. It is already known that unique decompositions can be obtained even for arrays where the rank exceeds the dimension of every mode. It has been reported that a ten-factor model was uniquely determined from an 8 × 8 × 8 array [1,14,19]. This shows that parsimonious arrays might contain sufficient information for quite complex problems; specifically, the three-way decomposition is capable of extracting more information from the data than two-way PCA. Unfortunately there are no explicit rules for determining the maximal rank of arrays in general, except for the two-way case and some simple three-way arrays.

4. Implementation

4.1. Alternating least squares

The solution to the PARAFAC model can be found by alternating least squares (ALS), successively assuming the loadings in two modes known and then estimating the unknown set of parameters of the last mode. This is also how the model was initially proposed to be estimated. Consider a 2 × 2 × 2 array sliced into two 2 × 2 matrices as shown in Fig. 2. Consider then a one-component PARAFAC model of this array. This model can also be written in terms of two bilinear models, as shown in Fig. 3.
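To make the slicing and unfolding concrete, here is a minimal pure-Python sketch. It is not taken from the paper's Matlab implementation; the helper names `triad`, `frontal_slices` and `unfold_i_jk` and the example vectors are hypothetical. It builds a one-component 2 × 2 × 2 array with $x_{ijk} = a_i b_j c_k$, splits it into the two 2 × 2 slices of Fig. 2, and unfolds it to a 2 × 4 matrix with the index j running fastest, as defined in Section 2. Each slice is then $c_k$ times the dyad $\mathbf{a}\mathbf{b}'$, and the unfolded matrix is the outer product of a with a row vector z whose element (j, k) is $b_j c_k$.

```python
def triad(a, b, c):
    """One PARAFAC component: x[i][j][k] = a[i] * b[j] * c[k]."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]


def frontal_slices(X):
    """Split an I x J x K array into K matrices X_k of size I x J (cf. Fig. 2)."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    return [[[X[i][j][k] for j in range(J)] for i in range(I)] for k in range(K)]


def unfold_i_jk(X):
    """Unfold to an I x JK matrix; column index is j + k*J, so j runs fastest."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    return [[X[i][j][k] for k in range(K) for j in range(J)] for i in range(I)]


# Hypothetical loading vectors for a 2 x 2 x 2 example.
a, b, c = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
X = triad(a, b, c)

# Each slice is c[k] times the dyad a b' (the two bilinear models of Fig. 3).
for k, Xk in enumerate(frontal_slices(X)):
    dyad = [[ai * bj * c[k] for bj in b] for ai in a]
    assert Xk == dyad

# The unfolded matrix equals a times the row vector z, with z[j + k*J] = b[j] * c[k].
z = [bj * ck for ck in c for bj in b]
X_unf = unfold_i_jk(X)
assert X_unf == [[ai * zz for zz in z] for ai in a]
print(X_unf)  # [[15.0, 20.0, 18.0, 24.0], [30.0, 40.0, 36.0, 48.0]]
```

Plain nested lists are used only to keep the sketch dependency-free; the index arithmetic is the point, and with j running fastest the "properly arranged" product of b and c is simply the list `z` above.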
Fig. 2. A 2 × 2 × 2 three-way array X can be represented by two matrices X1 and X2.

This way of representing the three-way model as two two-way models can be further modified by simply unfolding the array, i.e., concatenating the two matrices X1 and X2 and correspondingly modifying the loading vectors (Fig. 4). All three versions are equivalent and merely different graphical formulations of the same model.

Fig. 4. The principle of unfolding applied to a three-way array (and the corresponding one-component PARAFAC model).

If estimates of b and c are given, it is now easily seen that a can be determined as the least-squares solution to the model a(b ⊗ c) = X, where (b ⊗ c) is interpreted as the row vector obtained as the properly arranged tensor product of the vectors b and c, and X is the unfolded array of size I × JK as shown in Fig. 4. If the vector (b ⊗ c) is called z, or Z in the case of more than one component, the model defining A is

$$\mathbf{X} = \mathbf{A}\mathbf{Z}$$

The conditional least squares estimate of A is

$$\mathbf{A} = \mathbf{X}\mathbf{Z}'(\mathbf{Z}\mathbf{Z}')^{-1}$$

The general PARAFAC ALS algorithm can be written
(0) Decide on the number of components, F
(1) Initialize B and C
(2) Estimate A from X, B and C by least squares regression
(3) Estimate B likewise
(4) Estimate C likewise
(5) Continue from 2 until convergence (little change in fit or loadings).

A is an I × F matrix containing in its fth column the fth loading vector. B and C are defined likewise. In step 2, X is unfolded to an I × JK matrix and the fth row in the F × JK matrix Z is defined as

$$\mathbf{z}_f = (\mathbf{b}_f \otimes \mathbf{c}_f)$$

The estimate of A is then determined as shown above. For estimating, e.g., B, X is unfolded to a J × IK matrix and Z becomes an F × IK matrix calculated from A and C. B is then found as $\mathbf{X}\mathbf{Z}'(\mathbf{Z}\mathbf{Z}')^{-1}$. For three-way PARAFAC, computationally efficient formulations can be seen in e.g. [22]. The ALS algorithm will, in each iteration, improve (or not worsen) the fit of the model. If the algorithm converges to the global minimum, which is most often the case for well-behaved problems, the least-squares solution to the model is found.
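Steps (0) to (5) can be sketched in a few dozen lines of pure Python. This is a minimal illustration under stated assumptions, not the author's Matlab code and not necessarily the formulation of [22]: the function names (`solve`, `gram`, `update_mode`, `sse`, `parafac_als`), the random initialization, the default tolerance and the iteration cap are all hypothetical choices. To avoid building Z explicitly, it uses the standard identity that $\mathbf{Z}\mathbf{Z}'$ equals the elementwise product of $\mathbf{B}'\mathbf{B}$ and $\mathbf{C}'\mathbf{C}$, since $\mathbf{z}_f \mathbf{z}_g' = (\sum_j b_{jf} b_{jg})(\sum_k c_{kf} c_{kg})$. Stopping follows the relative-change-in-fit criterion of Section 4.2.

```python
import random


def solve(M, rhs):
    """Solve the small square system M x = rhs by Gaussian elimination with partial pivoting.

    Assumes M is nonsingular (true for ZZ' in well-behaved problems).
    """
    n = len(M)
    aug = [list(M[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for cc in range(col, n + 1):
                aug[r][cc] -= factor * aug[col][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][cc] * x[cc] for cc in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def gram(M):
    """Cross-product M'M of a (rows x F) loading matrix."""
    F = len(M[0])
    return [[sum(row[f] * row[g] for row in M) for g in range(F)] for f in range(F)]


def update_mode(Y, P, Q):
    """Conditional LS estimate W = XZ'(ZZ')^-1 for the first mode of Y (N x M x L).

    P (M x F) and Q (L x F) are the fixed loadings; Z is never formed:
    (XZ')[n][f] = sum_{m,l} Y[n][m][l] P[m][f] Q[l][f] and ZZ' = (P'P) * (Q'Q) elementwise.
    """
    F = len(P[0])
    GP, GQ = gram(P), gram(Q)
    ZZt = [[GP[f][g] * GQ[f][g] for g in range(F)] for f in range(F)]
    W = []
    for slab in Y:
        XZt = [sum(slab[m][l] * P[m][f] * Q[l][f]
                   for m in range(len(P)) for l in range(len(Q))) for f in range(F)]
        W.append(solve(ZZt, XZt))  # ZZ' is symmetric, so row n solves ZZ' w = (XZ')_n
    return W


def sse(X, A, B, C):
    """Residual sum of squares of model (1)."""
    F = len(A[0])
    return sum((X[i][j][k] - sum(A[i][f] * B[j][f] * C[k][f] for f in range(F))) ** 2
               for i in range(len(A)) for j in range(len(B)) for k in range(len(C)))


def parafac_als(X, F, max_iter=2000, tol=1e-12, seed=0):
    """Random init of B and C (step 1), then update A, B, C (steps 2-4) until step 5 holds."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    Xj = [[[X[i][j][k] for k in range(K)] for i in range(I)] for j in range(J)]  # J x I x K
    Xk = [[[X[i][j][k] for j in range(J)] for i in range(I)] for k in range(K)]  # K x I x J
    rng = random.Random(seed)
    B = [[rng.random() for _ in range(F)] for _ in range(J)]
    C = [[rng.random() for _ in range(F)] for _ in range(K)]
    history = []
    for _ in range(max_iter):
        A = update_mode(X, B, C)   # step 2
        B = update_mode(Xj, A, C)  # step 3
        C = update_mode(Xk, A, B)  # step 4
        history.append(sse(X, A, B, C))
        if len(history) > 1 and abs(history[-2] - history[-1]) <= tol * history[-2]:
            break                  # step 5: relative change in fit below tol
    return A, B, C, history


# Usage on a hypothetical noiseless rank-2 array built from known loadings.
A0 = [[1.0, 0.2], [0.5, 1.0], [1.5, 0.3], [0.2, 2.0], [1.0, 1.0]]
B0 = [[1.0, 0.1], [0.3, 1.0], [2.0, 0.5], [0.7, 1.5]]
C0 = [[1.0, 0.4], [0.2, 1.0], [1.2, 1.1]]
X = [[[sum(A0[i][f] * B0[j][f] * C0[k][f] for f in range(2)) for k in range(3)]
      for j in range(4)] for i in range(5)]
A, B, C, history = parafac_als(X, F=2)
print(len(history), history[-1])
```

Because every update is an exact conditional least squares solve, the recorded residual sum of squares should never increase from one iteration to the next (up to rounding), which is the property stated in the text; the recovered loadings will match A0, B0 and C0 only up to the scale and column-order indeterminacies discussed in Section 3.1.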
Fig. 3. A trilinear decomposition expressed as either a model of the three-way array or two models of two two-way arrays.

ALS is an attractive method because it ensures an improvement of the solution in every iteration, but a major drawback of ALS is the time required to estimate the models, especially when the number of variables is high. Several hundred or thousands of iterations are sometimes necessary before convergence is achieved. With a data array of size 50 × 50 × 50 a model might very well take hours to calculate on a
moderate PC (depending, of course, on the implementation and the convergence criterion). This is problematic when recalculation of the model is necessary, which is often the case, e.g., during outlier detection. To make PARAFAC a workable method it is therefore of the utmost importance to develop faster algorithms. Using more computer power could of course solve the problem, but there is an annoying tendency for data sets always to be a little larger than what is optimal for the current computer power. In the implementation used here two acceleration methods have been built in (for others see e.g. [1,22,23]).

4.1.1. Compressing

As in bilinear PCA, one most often seeks a low-dimensional representation of a high-dimensional array in PARAFAC. This implies that the data array is redundant, i.e., there is collinearity between the variables. Consider a 5 × 6 × 500 array, where the third, 500-dimensional mode might be spectral. Using ALS on such an array is computationally costly. From the theory of PCA it is known that the variation in the spectra can be well represented by a low-dimensional score matrix that contains the main systematic part of the variation. If the data array is unfolded keeping the high-dimensional mode intact, one obtains a 30 × 500 matrix. By two-way PCA we can describe most of the variation in this matrix by a score matrix of, say, dimension 30 × 5. Folding the score matrix back to a three-way array, the array now has the dimensions 5 × 6 × 5. Calculating the PARAFAC model of this low-dimensional array takes only a fraction of the time required to calculate the PARAFAC model of the high-dimensional array. The estimated model describes only the score matrix and not the original array, but it is only in the compressed mode that the estimated loadings differ. We can convert the calculated loadings in that mode into the original variable space by multiplying the loadings from the PARAFAC model (which are loadings in a score space) with the loadings from the PCA. The resulting PARAFAC model hopefully equals a PARAFAC model calculated from the original array. To ensure this, ALS is applied to the original array using the calculated loadings and scores as starting values. If the model is good, only a few extra iterations will be necessary in the high-dimensional space. If several modes are high-dimensional, the compressed array can of course be compressed further in another mode.

The compression of modes has been implemented so that compression is done whenever the number of variables in a mode exceeds the number of factors sought plus ten. If one mode consists of 20 variables and a 5-factor model is estimated, this mode is thus compressed, as the dimension of the mode (20) exceeds 5 + 10. The number of principal components to compute is set to the number of factors in the PARAFAC model plus two. These settings work for many types of problems often encountered in our research group, although other settings may sometimes be better, because the optimal settings depend on the type of data investigated (primarily the signal-to-noise ratio). When implementing PCA, one has to pay attention to the time demand of the PCA algorithm. Working on cross-product matrices instead of the raw data can speed up the algorithm substantially if one mode of the unfolded array is very large compared to the other.

When a nonnegativity constraint is used (see later), compressing by PCA is not appropriate. Instead one can use a subset of the original variables to estimate an initial model. This submodel can be found on a smoothed version of the original data to ensure that important aspects are not missed.

4.1.2. Extrapolating

Another method for speeding up the ALS algorithm is to use the 'temporal' information in the iterations. The simple idea is to perform a predefined number of cycles of ALS iterations and then use these estimates of the loadings to predict new estimates elementwise. There are two good reasons for using the temporal information in the iterations of the PARAFAC-ALS algorithm. (i) It is only in the first few iterations that major changes occur in the estimates of the elements of the loadings; the main fraction of the iterations is used for minor modifications of these factors. (ii) The change in each element of the factors is most often systematic and quite linear over short ranges of iterations.

To make extrapolation profitable, the time required to extrapolate must be less than the time required to perform a corresponding number of iterations. This to some extent limits the applicability of the method, because very ingenious extrapolations with quadratic fit and adaptive parameters tend to be so slow that there is no gain in computing time. Several implementations have been tried, ending up with an algorithm by Claus A. Andersson, which works fast in our Matlab code. At the ith iteration the estimated factor loadings A, B and C are saved as $\mathbf{A}_1$, $\mathbf{B}_1$ and $\mathbf{C}_1$. After the (i + 1)th iteration a linear regression is performed for each element to predict the value of that element a certain number of iterations ahead. As only two values of each element are used in the regression, the prediction can simply be written

$$\mathbf{A}_{\text{new}} = \mathbf{A}_1 + (\mathbf{A} - \mathbf{A}_1)\,d$$

where d is the number of iterations to predict ahead. Making d dependent on the number of iterations has proved useful, specifically letting

$$d = \mathit{it}^{1/3}$$

where it is the number of iterations. When applying the extrapolation, it is important not to extrapolate during the first, say, five iterations, because the variation in the elements is very unstable in the beginning. If some modes are constrained, extrapolation has to wait longer for the iterations to be stable. Furthermore, if the extrapolations persistently fail to improve the fit (more than four times), d is lowered from $\mathit{it}^{1/n}$ to $\mathit{it}^{1/(n+1)}$.

4.1.3. Initialization

Good starting values for the ALS algorithm could potentially speed up the algorithm and help ensure that the global minimum is found. Several kinds of initialization have been proposed. Harshman and Lundy [19] advocate using random starting values and starting the algorithm from several different starting points. If the same solution is (essentially) reached several times, there is little chance that a local minimum has been reached due to an unfortunate initial guess. In [24-27] it is proposed to use starting values based on generalized eigenvalue decompositions. These eigenvalue decompositions are all comparable to the generalized rank annihilation methods, where two samples are used to estimate the loadings in the second and third modes. With respect to speed, however, there is often no advantage in using these initialization methods. Rather, the advantage is that if the ALS algorithm tends to get stuck in local minima, a good initialization might help overcome this problem. Our experience is that local minima are seldom a problem if the data are trilinear, but others have reported differently [28,29]. Another practical problem with these methods is how to extend them to higher orders. This problem has not yet been addressed.

4.2. Stopping criterion

The importance of using a suitable stopping criterion has been mentioned by several authors. It sometimes occurs that even small changes in the fit are associated with huge differences in the estimated loadings, because the response surface of the least squares error function is very flat [1]. This is especially true if some underlying phenomena are highly correlated. As a safeguard against this, one can run the algorithm twice. If the algorithm has truly converged, the two solutions will be essentially identical. If the algorithm has not converged, it is unlikely that the estimated solutions are identical if a random initialization has been used. A common criterion is to stop the iterations when the relative change in fit between two iterations is below a certain value (e.g., $10^{-6}$). In some cases a small relative change in the loadings is used instead [30]. Which of these two approaches is preferable is not clear.

4.3. Constraining the solution

Constraining the PARAFAC solution can sometimes be helpful in terms of interpretability or stability of the solution. The fit of a constrained model can never be better than the fit of an unconstrained model, but if the constrained model is more interpretable and realistic, this may justify the decrease in fit. In psychometrics, orthogonalization has been described as a means of overcoming problems with unstable solutions [22]. For the first mode, an orthogonal least squares solution to the PARAFAC model can be estimated as

$$\mathbf{A} = \mathbf{X}\mathbf{Z}'(\mathbf{Z}\mathbf{X}'\mathbf{X}\mathbf{Z}')^{-1/2}$$

X being I × JK and Z being F × JK as defined before [23,31]. This estimation method also normalizes
the loadings. Unless all modes are to be orthogonalized, this is not a problem but merely a matter of scaling. Models estimated under orthogonality constraints will differ from models estimated without this constraint. The models will, however, still be mathematically unique; they are simply least squares models under the orthogonality constraint. Orthogonalization is not often used in chemometrics, because it hinders the straightforward interpretation of the loadings, but for more explorative purposes it can be useful. It also enables a more straightforward interpretation of, e.g., data from experimental designs, as the sum-of-squares described by the model can be partitioned into contributions from individual components.

Another, more frequently used, constraint is to require nonnegative loadings in, e.g., a spectral data set. While orthogonalization has a purely mathematical basis, a nonnegativity constraint is often chosen on the basis of specific knowledge of the data; for instance, absorbance measurements should be positive if proper blanking is used. Finding the least squares loading vector under a nonnegativity constraint is somewhat complicated. A general method has been described by Lawson and Hanson [32]. This method is implemented in Matlab as NNLS. An equivalent but faster algorithm is available from the author on request.

For certain types of data it can be fruitful to apply constraints on the interrelationship between the loading vectors. For closed systems, one might, for example, want to restrict the sum of the first F - 1 loading vectors to equal the Fth loading vector in one mode. This ensures that the solution follows the known behavior of the underlying phenomena. This can be accomplished using equality constraints in the least squares solutions [32-36].

It is possible to fix certain loadings to predefined values (usually zero or one) by adjusting for these elements during the regression steps of the ALS algorithm. For other knowledge-based types of constraints see [37,38].

4.4. Missing values

Missing values in PARAFAC are easily handled by iteratively estimating the missing values. This estimate comes for free when iterating in the ALS algorithm. The estimate of the ijkth element of the array X is

$$\hat{x}_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf}.$$

The missing elements are consistently replaced with the estimates of the elements. The ALS is continued until no changes occur in the estimates of these missing elements and the overall convergence criterion is fulfilled. It is also possible to handle missing values by weighted regression, setting the weights of missing values to zero.

5. Preprocessing

Preprocessing of three-way arrays is more complicated than in the two-way case, though understandable in light of the multilinear variation presumed to be an acceptable model of the data.

Centering the first mode can be done by unfolding the calibration array to an I × JK matrix and then centering this matrix as in ordinary PCA:

$$x_{ijk}^{\text{cent}} = x_{ijk} - \bar{x}_{jk}, \qquad \text{where} \qquad \bar{x}_{jk} = \frac{1}{I}\sum_{i=1}^{I} x_{ijk}.$$

This is often referred to as single-centering. The centering shown above is called centering across the first mode, which is the terminology suggested in [39]. The centering can of course be applied to any of the modes, depending on the problem. If centering is to be performed across more than one mode, one has to first center one mode and then center the outcome of this centering. If two centerings are performed in this way, it is often referred to as double-centering. Triple-centering means centering across all three modes, one at a time. In [39-41] the effect of both scaling and centering on the trilinear behavior of the data is described. It turns out that centering one mode at a time is the only appropriate way of centering with respect to the assumptions of the PARAFAC model. Centering one mode at a time essentially removes any constant levels in that particular mode. Centering, for example, matrices instead of
columns will destroy the multilinear behavior of the data, because more constant levels are introduced than eliminated. The same holds for other kinds of centering. One may, for instance, know that the true model consists of a set of PARAFAC terms and one overall level. This might incline one to estimate a PARAFAC model on the original data after subtracting the grand level. However, even though this mathematical structure might theoretically be true, subtracting the grand level introduces artifacts in the data that are not easily described by the PARAFAC model. The model obtained as the grand level plus the PARAFAC model is not the global least squares estimate given the required structure. To obtain the global least squares model, the grand level and the PARAFAC model would have to be estimated simultaneously. Instead, subtracting the grand level shifts the data so that an extra, spurious component becomes necessary to describe the variation [40].

Scaling in multi-way analysis also has to be done taking the trilinear model into account. Unlike centering, scaling should not be done column-wise; rather, whole 'slabs' of the array should be scaled. If variable j of the second mode is to be scaled (compared to the rest of the variables in the second mode), it is necessary to scale all columns where variable j occurs. This means that one has to scale whole matrices instead of columns. For a four-way array, one would have to scale three-way arrays. Mathematically, scaling can be described as

$$x_{ijk}^{\text{scaled}} = x_{ijk}\, s_i,$$

where $s_i$ can be defined as

$$s_i = \frac{1}{\sqrt{\sum_{j=1}^{J}\sum_{k=1}^{K} x_{ijk}^{2}}}.$$

The scaling shown above is referred to as scaling within the first mode. When scaling within several modes is desired, the situation is somewhat more complicated, because scaling one mode affects the scale of the other modes. If scaling to norm one is desired within several modes, this has to be done iteratively, until convergence [39]. Another complicating issue is the interdependence of centering and scaling. In general, scaling within one mode disturbs prior centering across the same mode, but not across other modes. Centering across one mode disturbs scaling within all modes [41]. Hence only centering across arbitrary modes or scaling within one mode is straightforward, and not all combinations of iterative scaling and centering will converge. These rules may sound complicated, but in practice they need not influence the outcome much if the iterative approach is not used. Scaling to a sum-of-squares of one is arbitrary anyway. It may be just as defensible to scale within the modes of interest once, thereby at least largely equalizing any huge differences in scale. Centering can then be performed after scaling, which ensures that the modes to be centered are indeed centered. In the Matlab code available from the Internet (see materials and methods), an M-file is given to perform the iterative scaling and centering procedures.

A common rule of thumb is to center across the mode of interest. However, the purpose of centering is to remove constant levels, so knowledge of the data might guide the proper preprocessing. The appropriate centering and scaling procedures are most easily summarized in a figure where the array is shown unfolded to a matrix (Fig. 5). Centering must be done across the columns of this matrix, while scaling should be done on the rows of this matrix.

Fig. 5. Three-way unfolded array, with rows constituting one intact mode. Centering must be done across the columns of this matrix, while scaling has to be done on the rows.
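The centering and scaling rules above can be checked numerically. The following is a minimal NumPy sketch, not the M-file mentioned above. The function names `center_across` and `scale_within` are hypothetical, and the sketch assumes scaling to unit sum-of-squares (norm one).

- Centering across a mode subtracts the mean over that mode, which is column-centering of the unfolded matrix.
- Scaling within a mode scales whole slabs, which are rows of the unfolded matrix.

The printed checks reproduce, on a made-up array, the interdependence of centering and scaling described above.

```python
import numpy as np


def center_across(X, mode):
    """Center X across `mode` by subtracting the mean taken over that mode.

    For mode 0 this is x_ijk - mean_i(x_ijk), i.e. column-centering of the
    I x JK matrix obtained by unfolding X with mode 0 along the rows.
    """
    return X - X.mean(axis=mode, keepdims=True)


def scale_within(X, mode):
    """Scale every slab of X along `mode` to unit sum-of-squares.

    For mode 0 this is x_ijk * s_i with s_i = 1 / sqrt(sum_jk x_ijk**2),
    i.e. scaling whole rows of the I x JK unfolded matrix.
    """
    other_axes = tuple(ax for ax in range(X.ndim) if ax != mode)
    norms = np.sqrt((X ** 2).sum(axis=other_axes, keepdims=True))
    return X / norms


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5, 6)) + 3.0  # made-up 4 x 5 x 6 array with a constant offset

# Centering across mode 0 equals column-centering the I x JK unfolded matrix.
X_unf = X.reshape(4, 5 * 6)
print(np.allclose(center_across(X, 0).reshape(4, 30), X_unf - X_unf.mean(axis=0)))  # True

# Scaling within mode 1 keeps a prior centering across mode 0 ...
print(np.allclose(scale_within(center_across(X, 0), 1).mean(axis=0), 0))  # True
# ... but scaling within mode 0 generally disturbs it.
print(np.allclose(scale_within(center_across(X, 0), 0).mean(axis=0), 0))  # False

# Scaling once and centering afterwards leaves the data centered,
# while the centering in turn disturbs the unit slab norms.
Xsc = center_across(scale_within(X, 1), 0)
print(np.allclose(Xsc.mean(axis=0), 0))              # True
print(np.allclose((Xsc ** 2).sum(axis=(0, 2)), 1))   # False
```

The sketch follows the scale-once-then-center recommendation. The iterative scheme for scaling within several modes is not shown.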
6. Assessing the solution

6.1. Postprocessing

As in two-way PCA, different scalings of the solution can be used. Scaling one loading vector by a constant does not change the model if another loading vector of the same component is scaled by the inverse of the same constant. The loading vectors of the second and third mode can be normalized to length one. The scores or loadings of the first mode will then show the sum-of-squares, SS, of each component as

$$SS_f = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}\left(a_{if}\,b_{jf}\,c_{kf}\right)^{2} = \sum_{i=1}^{I} a_{if}^{2}$$

in the same way as in bilinear PCA. PARAFAC solutions are in general nonorthogonal, so one cannot simply add the sum-of-squares of all components to get the total sum-of-squares. To judge a component's influence, one should compare the sum-of-squares of the original data with the sum-of-squares of the data after subtracting that specific component.

It is also common practice to scale the loading vectors so that the maximal loading is one, to enhance visual interpretability. Other scalings can be applied, guided by the problem. As there is no predefined order of the components, the order of the components might not be the same in two estimates of the same data set, even though the models are identical. This is just a matter of permutation. One can of course build a sorting step into the algorithm, so that components are sorted, e.g., in order of how much of the data they describe, as in two-way PCA.

6.2. Leverages and residuals

Leverages and residuals can be used for influence and residual analysis. As the loading vectors are not orthogonal, the leverages have to be calculated as

$$\mathbf{u} = \operatorname{diag}\left(\mathbf{A}\left(\mathbf{A}^{\mathsf T}\mathbf{A}\right)^{-1}\mathbf{A}^{\mathsf T}\right),$$

with $\mathbf{A}$ being replaced by the proper loading matrix ($\mathbf{A}$, $\mathbf{B}$ or $\mathbf{C}$), and diag meaning the diagonal of the matrix. The leverage for the ith sample or variable, $u_i$, is the ith element of $\mathbf{u}$ and has a value between zero and one [42]. A high value indicates an influential sample or variable, while a low value indicates the opposite. Samples or variables with high leverages (and, in the case of a variable mode, low leverages) must be investigated. The purpose is to verify whether they are inappropriate for the model (outliers) or are indeed influential and acceptable. If a new sample is fitted to an existing model, its leverage can be calculated using the new scores for that sample, as in ordinary regression analysis. The leverage is then no longer restricted to be below one. Leverages were originally developed for regression analysis, so the term squared Mahalanobis distance might be more appropriate for a decomposition method such as PARAFAC. As leverages are also widely used in two-way PCA, the term leverage is preferred here.

Residuals are easily calculated by subtracting the model from the data. These residuals can be used for calculating variance-like estimates [43], or they can be plotted in various ways to detect trends and systematic variation.

6.3. Number of components

It is difficult to decide the best rank of a PARAFAC model. This area is not yet well founded, and research is clearly called for. It is not in general profitable to use cross-validation as in bilinear PCA. In PCA one deflates the X matrix after calculating each component, and therefore eventually the components describe noise instead of systematic variation. This is seen as an increase in the residuals when modeling independent samples, which is the basis of cross-validation and jackknifing. Sometimes the increase in the residual variance is not very pronounced, which makes it difficult to estimate the proper rank of the model correctly. This situation can be even worse in PARAFAC. In PARAFAC, one does not deflate the array. The trilinear model calculated simultaneously for all components can be shown to fit the array better than components calculated successively, as is possible in PCA [28]. As a consequence, extracting too many components does not only mean that noise is being increasingly modeled, but also that the true factors are being modeled by more (correlated) components. In general, one will therefore not see as steep an increase in a cross-validation procedure as in the bilinear models.

There are three main ways of determining the correct number of components: (1) split-half experiments, (2) judging residuals, and (3) comparing with external knowledge of the data being modeled. If the PARAFAC model is to be used for, e.g., calibration, one can of course cross-validate the predictions of the dependent variable to find the optimal model.

Ad. (1). Harshman and Lundy [19] advocate using split-half experiments. The idea is to divide the data into two halves and then make a PARAFAC model on each half. Due to the uniqueness of the PARAFAC model, one will obtain the same result on both data sets if the correct number of components is chosen, i.e. the same loadings in the modes that were not split. To judge whether two models are equal, one must remember the intrinsic indeterminacy in PARAFAC: the order and scale of the components may change if not fixed algorithmically. If a wrong number of components is chosen in the split-half experiment, there is a good chance that the two models will not be equal, due to the differences between the samples. When doing a split-half experiment one has to decide which mode to split. In general, one should split the data in a mode with a sufficient number of independent variables/samples. With a high-dimensional spectral mode, an obvious idea would be to use this spectral mode for splitting. However, the collinearity of the variables in this mode would impede sound results: any number of components would yield the same result if the spectra behave additively.

Ad. (2). As in bilinear models, one can judge a model on its fit. If systematic variation is left in the residuals, it is an indication that more components can be extracted. If a plot of the residual sum of squares versus the number of components sharply flattens out at a certain number of components, this indicates the true number of components. If the residual variance is larger than the known experimental error, it is indicative of more systematic variation in the data. To calculate variance-like estimators, [44] gives the following degrees of freedom for a trilinear PARAFAC model

$$\mathrm{dof}(F) = IJK - F(I+J+K-2)$$

and

$$\mathrm{dof}(F) = IJKL - F(I+J+K+L-3)$$

for a quadrilinear PARAFAC model. I, J, K and L are the dimensions of the first, second, third and fourth mode, respectively, and F is the number of components in the model. These degrees of freedom might be used for explorative purposes, but they are not to be taken as statistically exact numbers of degrees of freedom; such are currently not available.

Ad. (3). With experience one gets a feeling for which results are good and which results are bad. This can be very important for making good models. Experience and intuition can also be used more systematically. Often one knows certain things about the underlying phenomena in the data:

- the spectra of certain analytes might be known,
- the shape of chromatographic profiles might be known, or
- the nonnegativity of certain phenomena might be known.

These kinds of hard facts can be very informative when comparing different models. In [38,44] some examples of how to use residuals and external knowledge to choose the appropriate number of components are shown.

6.4. Degenerate solutions

Degenerate solutions are sometimes encountered. Degenerate solutions are solutions that are hard to handle for the PARAFAC model; the estimated models are often unstable and unreliable. A typical sign of a degenerate solution is that loading vectors of the same mode have high correlations. Most often a degenerate solution is characterized by two PARAFAC components showing equally shaped loading vectors in all modes. The pair of loading vectors is then negatively correlated in either one or all three modes, and positively correlated in the remaining modes. An indication of degenerate solutions can thus be obtained by monitoring the correlation between all pairs of loading vectors. In practice the triple cosine, TC, of all combinations of components is used. TC is defined as

$$TC_{ij} = \cos(\mathbf{a}_i,\mathbf{a}_j)\cos(\mathbf{b}_i,\mathbf{b}_j)\cos(\mathbf{c}_i,\mathbf{c}_j) = \frac{\mathbf{a}_i^{\mathsf T}\mathbf{a}_j}{\|\mathbf{a}_i\|\,\|\mathbf{a}_j\|}\,\frac{\mathbf{b}_i^{\mathsf T}\mathbf{b}_j}{\|\mathbf{b}_i\|\,\|\mathbf{b}_j\|}\,\frac{\mathbf{c}_i^{\mathsf T}\mathbf{c}_j}{\|\mathbf{c}_i\|\,\|\mathbf{c}_j\|}$$
with i and j indicating the ith and jth component. Mitchell and Burdick [29,45] refer to TC as the uncorrected correlation coefficient (UCC). The TC value can be shown to correspond to the cosine of the angle between two vectors, $\mathbf{x}_i$ and $\mathbf{x}_j$. Here $\mathbf{x}_i$ is the vector obtained by properly unfolding the tensor product of all loading vectors with index i (like the b and c vectors in Fig. 4, except that a must also be included).

A TC value close to -1 indicates a degenerate solution. According to [46], a TC value lower than -0.85 indicates a troublesome model, but this should only be taken as a rule of thumb. Furthermore, for degenerate solutions the TC value will typically continue to worsen with more iterations. If the numerically high TC value is just caused by a poor initialization, the TC value will decrease again numerically. If several new estimations of the same model are consistent and not degenerate, the degenerate solution can be discarded as an accidental local minimum. If decreasing the convergence criterion does not eliminate degeneracy, the cause is most often one of three [41,47].

(i) Too many components are extracted. This is easily recognized by the fact that models with fewer components yield nondegenerate solutions. Extracting too many components will often give high positive TC values as well as negative ones, which is not the case for real degenerate solutions. Split-half experiments will also help to distinguish this situation from more serious problems.

(ii) Poor preprocessing has been applied. This can be characterized by degenerate solutions even for a low number of components, while other information indicates that further systematic information is present.

(iii) The last situation of degeneracy occurs when the model is simply inappropriate, for example, when the data are not trilinear as the model assumes. In [48] some of these situations are referred to as two-factor degeneracies. When two factors are interrelated, a Tucker3 model is appropriate. Estimating PARAFAC models with too few factors can then yield degenerate models that can be shown not to converge to a minimum. Estimating models with a higher number of components is difficult due to the correlations between the components. An indication of this situation might be that the estimated two-way rank of the unfolded array differs depending on which mode is unfolded. A PARAFAC model may still be appropriate, but large differences indicate that some latent variables do not vary across some of the ways, or perhaps vary interdependently. In such a case, Tucker models (or restricted versions) or unfold bilinear models might be better [48-50].

Mitchell and Burdick [29,45] investigate degeneracy and find it profitable to do several runs of a few iterations, and only use those runs that are not subject to degeneracy. Another way of circumventing degenerate solutions is to apply orthogonality constraints on the model.

If the variation in one mode does not exactly obey the linearity of the PARAFAC model, this mode can be eliminated by using it for calculating covariance or cross-product matrices. Fitting the model to these derived data is called indirect fitting and has been used for longitudinal data [8,51]. Consider a three-way data array where the third mode could be chromatographic. Perhaps the chromatographic profiles of the same analytes change a little from sample to sample due to analytical properties. The data array is therefore almost trilinear, but the differences from sample to sample in the third mode make it hard to make a sound PARAFAC model. The I × J × K array can be seen as J matrices of size I × K. For each j (1 to J) one can calculate an I × I covariance matrix as $\mathbf{X}_j\mathbf{X}_j^{\mathsf T}$, where $\mathbf{X}_j$ is the I × K submatrix of X on the jth level of the second mode. The resulting data array has the size I × I × J and consists of covariance matrices. The original third mode has vanished, and fitting the PARAFAC model to this array will give the following model (compare Eq. (1)):

$$\tilde{x}_{ijk} = \sum_{f=1}^{F} a_{if}\, a_{jf}\, b_{kf}^{2},$$

disregarding the noise, where $\tilde{x}_{ijk}$ is the (i, j)th element of the kth covariance matrix. The a's and b's in this model correspond exactly to the a's and b's in the model obtained from the raw data array (Eq. (1)) if the loading vectors in the third mode are orthonormal. This, however, is not very likely. Harshman has suggested a solution to this problem in the form of a model called PARAFAC2 [51], in which the loading vectors of the third mode can be oblique (nonorthogonal). The PARAFAC2 model has not yet been used very extensively, perhaps because the implementations so far have been complicated and slow [52].
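The diagnostics of Section 6 only need the loading matrices. Below is a minimal NumPy sketch of three of them. It uses hypothetical function names and is not the author's Matlab code.

- The leverages of Section 6.2.
- The degrees-of-freedom formulas of Section 6.3. These are generalized here, as an assumption, to N modes by following the pattern of the three- and four-way formulas.
- The triple cosine of Section 6.4, with the -0.85 rule of thumb from [46] as the default threshold.

The loading matrices are made up so that one pair of components looks degenerate.

```python
import numpy as np


def leverages(A):
    """Leverages u = diag(A (A'A)^-1 A') for a loading matrix A (rows x F).

    A linear solve is used instead of forming the inverse explicitly.
    """
    M = np.linalg.solve(A.T @ A, A.T)      # (A'A)^-1 A', shape F x rows
    return np.einsum("if,fi->i", A, M)     # diagonal of A @ M


def triple_cosine(A, B, C):
    """F x F matrix with TC_ij = cos(a_i, a_j) cos(b_i, b_j) cos(c_i, c_j)."""
    def cosines(M):
        Mn = M / np.linalg.norm(M, axis=0)  # columns scaled to unit length
        return Mn.T @ Mn                    # cosines between all column pairs
    return cosines(A) * cosines(B) * cosines(C)


def flag_degenerate(TC, threshold=-0.85):
    """Pairs (i, j) with i < j whose triple cosine is below `threshold`."""
    i, j = np.triu_indices_from(TC, k=1)
    below = TC[i, j] < threshold
    return list(zip(i[below].tolist(), j[below].tolist()))


def parafac_dof(shape, F):
    """Approximate dof: prod(shape) - F * (sum(shape) - (N - 1)).

    Assumed N-way generalization; it reduces to IJK - F(I+J+K-2) for
    three-way and IJKL - F(I+J+K+L-3) for four-way arrays.
    """
    return int(np.prod(shape)) - F * (sum(shape) - (len(shape) - 1))


# Made-up loadings: component 2 equals component 0 in modes B and C and is
# approximately sign-flipped in mode A, which gives a TC close to -1.
A = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0,  0.0],
              [0.0, 0.0,  0.1],
              [0.5, 0.0, -0.5]])
B = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 0.0]])
C = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])

TC = triple_cosine(A, B, C)
print(f"{TC[0, 2]:.3f}")              # -0.996
print(flag_degenerate(TC))            # [(0, 2)]
print(f"{leverages(A).sum():.3f}")    # 3.000 (leverages sum to the number of components)
print(parafac_dof((2, 51, 201), 3))   # 19746
```

The sum of the leverages equals the number of components because $\mathbf{A}(\mathbf{A}^{\mathsf T}\mathbf{A})^{-1}\mathbf{A}^{\mathsf T}$ is a projection of rank F. This is a convenient sanity check.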
7. Types of data suitable for PARAFAC analysis

There are three different types of data that are more or less commonly analyzed by PARAFAC:

- PCA-like data,
- analysis of variance (ANOVA) data, and
- multidimensional scaling data.

In chemometrics the best-known application of PARAFAC is to PCA-like data, for example spectral data. The use of PARAFAC for this kind of data should be rather simple, following the same strategy as for decomposing bilinear data.

The use of PARAFAC for analysis of variance is rare [53]. However, the use of PCA and related methods for ANOVA has been known for several years (see [54,55] and references therein). The advantage of using PARAFAC for ANOVA lies in the way interaction terms are modeled. In a standard ANOVA, an interaction between three factors (A, B and C) would be estimated as $E_{ijk}$. In a trilinear model, this effect would instead be estimated as $a_i b_j c_k$, or as a sum of such expressions if more PARAFAC components are estimated. Rather than being estimated as a whole, the interaction is thus modeled as a multiplicative effect of the three different factors. If the multiplicative model is appropriate, the applied restriction ($a_i b_j c_k$ instead of merely $E_{ijk}$) will give a more interpretable model.

Carroll and Chang [2] developed PARAFAC (CANDECOMP) simultaneously with Harshman, for its application to multidimensional scaling. In psychometrics it has gained widespread use for this purpose, but this will not be touched upon specifically here.

8. Application I: Analysis of variance

8.1. Data

This data set was obtained for exploring the influence and properties of enzymatic browning of vegetables. The primary contributor to enzymatic browning is PPO, polyphenol oxidase [56]. The relationship between PPO activity (expressed as oxygen consumption) and experimental conditions is investigated. The activity of PPO was determined in replicate for:

- five O₂ levels,
- three CO₂ levels,
- three pH values,
- three different temperatures, and
- three substrate types,

all varied independently. Building a calibration model to predict the activity from the experimental conditions would give important information on how the PPO activity, and therefore the color formation, is influenced by the different factors. The different levels of the factors are shown in Table 1. The number of samples in the replicated full factorial design is 5 × 3 × 3 × 3 × 3 × 2 = 810. For details on the experimental conditions and a more in-depth discussion of the technological aspects see [57,58].

Table 1
Experimental design (PPO activity)

Factor        Levels
O₂ (%)        0, 5, 10, 20, 80
CO₂ (%)       0, 10, 20
pH            3.0, 4.5, 6.0
Temp (°C)     5, 20, 30
Substrate     CG, EPI, MIX
Replicates    I, II

In [58] the results obtained with PARAFAC are compared with ANOVA, locally weighted regression and nonlinear methods based on PLS and feedforward neural networks. Here the focus is on PARAFAC and, partially, ANOVA.

Fig. 6. A graphical representation of the five-way array of activities.
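The multiplicative interaction $a_i b_j c_k$ of Section 7 is a one-component PARAFAC term. The same kind of model is fitted to the five-way activity array in Section 8.2. Below is a minimal NumPy sketch of fitting such a one-component model by alternating least squares. It is a generic rank-one ALS written for illustration, with hypothetical function names and made-up noise-free data. It is not the constrained or missing-data algorithms discussed in the text.

The sketch also shows why the model is so compact. Each of the 405 cells of a 5 × 3 × 3 × 3 × 3 array is the product of one entry from each of five loading vectors. That is 17 numbers in total, or 13 free parameters once the scale indeterminacy is removed.

```python
import numpy as np
from functools import reduce


def outer_all(vectors):
    """Outer product of 1-D vectors: element [i, j, ...] = v0[i] * v1[j] * ..."""
    return reduce(np.multiply.outer, vectors)


def fit_rank_one(X, max_iter=500, tol=1e-12, seed=0):
    """One-component PARAFAC of an N-way array X by alternating least squares.

    Each loading vector is updated in turn with the others held fixed:
    a_n = (X contracted with all other loadings) / prod_{m != n} ||a_m||^2.
    Afterwards every loading except the first is scaled to unit length and the
    scale is moved into the first mode (signs remain indeterminate).
    """
    rng = np.random.default_rng(seed)
    loads = [rng.standard_normal(size) for size in X.shape]   # random start
    total_ss = np.sum(X ** 2)
    prev_ss = np.inf
    for _ in range(max_iter):
        for n in range(X.ndim):
            y = X
            # Contract the highest axes first so lower axis numbers stay valid.
            for m in reversed(range(X.ndim)):
                if m != n:
                    y = np.tensordot(y, loads[m], axes=([m], [0]))
            denom = np.prod([loads[m] @ loads[m] for m in range(X.ndim) if m != n])
            loads[n] = y / denom
        ss = np.sum((X - outer_all(loads)) ** 2)   # residual sum of squares
        if abs(prev_ss - ss) <= tol * total_ss:    # change relative to total SS
            break
        prev_ss = ss
    for n in range(1, X.ndim):
        norm = np.linalg.norm(loads[n])
        loads[n] = loads[n] / norm
        loads[0] = loads[0] * norm
    return loads


# Made-up loadings for a 5 x 3 x 3 x 3 x 3 array (same shape as the activity array).
true_loads = [np.array([0.2, 0.5, 1.0, 1.6, 2.4]),
              np.array([1.0, 0.8, 0.6]),
              np.array([0.3, 0.7, 1.2]),
              np.array([0.5, 1.0, 1.5]),
              np.array([1.1, 0.9, 1.0])]
X = outer_all(true_loads)          # exactly multiplicative, noise-free

loads = fit_rank_one(X)
X_hat = outer_all(loads)
print(np.allclose(X_hat, X))       # True

# A single prediction is the product of one entry from each loading vector.
idx = (3, 0, 2, 1, 2)
pred = np.prod([v[k] for v, k in zip(loads, idx)])
print(np.isclose(pred, X[idx]))    # True

print(X.size, sum(v.size for v in loads))  # 405 17
```

With real, noisy data the fit would not be exact. Several random starts, as advocated in Section 4.1.3, would be a sensible precaution.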
Fig. 7. Activities of one replicate set versus the other (a). Predictions of the activities of one replicate set from a model of the other replicate set (b). (Axes: Replicate 1, Replicate 2.)

In the PARAFAC model, the data are interpreted as a multi-way array of activities, specifically a five-way array. In a sense, the whole array is seen as one sample, namely PPO activity, which is measured at different conditions. The five different modes (ways) are:

- O₂ (dimension five),
- CO₂ (dimension three),
- pH (dimension three),
- temperature (dimension three), and
- substrate type (dimension three).

The ijklmth element of the five-way array contains the activity at the ith O₂ level, the jth CO₂ level, the kth level of temperature and the lth level of substrate type, for the mth pH. The five-way array is depicted in Fig. 6. Each element in the array is the average of the two replicates.

8.2. Results and discussion

Preprocessing of ANOVA data is somewhat difficult, and no general guidelines can be given. One must either try different scalings and centerings or use external knowledge for guidance. In this case, the only preprocessing thought to be of potential importance was scaling the oxygen mode. The residuals of the two sets of replicates showed some heteroscedasticity in the oxygen mode. However, scaling the data accordingly did not improve the predictions of one replicate set from the other. The elements with high residuals were simply downweighted and hence modeled with even higher error. Other kinds of mean-centering and scaling were tried, but without improving the solution [58].

To decide on the number of components, a five-way PARAFAC model was made using the first of the two replicate sets instead of the mean of both. The model from this analysis, given by the loadings A, B, C, D and E, was compared to the other replicate set. The number of components was chosen to minimize the sum of squared prediction errors, calculated as

$$SS = \sum_{i}\sum_{j}\sum_{k}\sum_{l}\sum_{m}\left(\sum_{f=1}^{F} a_{if}\,b_{jf}\,c_{kf}\,d_{lf}\,e_{mf} - x_{ijklm}\right)^{2},$$

with $x_{ijklm}$ being the ijklmth element of the replicate set not used for estimating the model. One component gave the lowest prediction error, which furthermore was in the neighborhood of the intrinsic error of the reference value. The standard deviation between replicates was 11.9, and the standard deviation between the model and the test set was 13.4, corresponding to 94% variance explained. The predictions are shown in Fig. 7, where one clearly sees that the model is very good and comparable to the intrinsic error of the data.

The activity can hence be modeled by a very simple one-component model. The loading vectors of this model are shown in Fig. 8. To predict the activity at a certain setting of the different factors, one simply reads the ordinate values of the five different factors and multiplies these. If a low activity is sought, it is very easy to see how this can be obtained, i.e. by keeping the temperature, oxygen and pH levels as low as possible.

A one-component solution was also found from a
test set procedure where half the elements of the five-way array were eliminated. The elements were eliminated according to a fractional factorial design. For the remaining data, a PARAFAC model was built with an algorithm that handles missing data. With this model, the activity of the left-out samples was predicted for both a one- and a two-component PARAFAC model. The root mean square error of prediction, RMSEP, for a one-component model was 12.6, while for two components an RMSEP of 51.9 was found.

Fig. 8. Loading vectors from a one-component PARAFAC model of enzymatic data.

From the PARAFAC loadings it is possible to predict the effect at any level of the factors investigated. To validate this, a PARAFAC model was made leaving out all samples with 20% oxygen. A curvefit of the oxygen loading vector makes it possible to find the oxygen effect at any level between 0 and 80%. For 20%, the loading was estimated from a quadratic curvefit of the loading vector. From this value and the loadings of the remaining modes, the 27 (1 × 3 × 3 × 3) left-out samples were predicted with an RMSEP of 13.1. This shows that a completely general model is obtained, showing the effect of each factor simply as a loading vector.

The data constitute a full factorial design, and analysis of variance is thus an obvious tool for investigating the influence of the different factors. However, the problem is highly nonlinear, and the results from ANOVA are hard to interpret. A standard ANOVA performed in SAS pointed to the following model:

Activity = A + B + C + D + E + AB + AC + AD + AE + BC + BD + BE + CD + CE + DE + ABC + ABD + ABE + ACE + ADC + ABDC + ABCE + ADE + ABDE + DCE + ADCE,

with A, B, C, D and E being the main effects of O₂, CO₂, temperature, substrate and pH, and, e.g., ACD the interaction between O₂, temperature and substrate. PARAFAC, on the other hand, suggested that the five-way multiplicative interaction term is sufficient to model the variations:

Activity = ABCDE,

or rather

$$\text{Activity} = a_i\, b_j\, c_k\, d_l\, e_m.$$

It is interesting that one of the few interactions not significant in the ANOVA model is the five-way interaction term! More sophisticated ANOVA methods could be used. Still, this example illustrates that choosing the right mathematical method can greatly influence the outcome, both with respect to prediction and interpretability. When using half the samples to estimate a model and predicting the left-out samples, RMSEPs of 35.3 and 12.6 were achieved for
the ANOVA and the PARAFAC model, respectively. The reason for the better results with PARAFAC is simply that the underlying multiplicative model is more appropriate for the enzymatic data than the mathematical model underlying ANOVA. It is to be expected that the main variation in the activity is caused by something that could be approximately multiplicative. If the pH is 3, little activity is observed no matter the oxygen level. If the pH is 6, the activity of PPO is extremely dependent on oxygen.

8.3. Further modification of the model

Mathematically, the one-component PARAFAC model could also be obtained by ANOVA after a logarithmic transformation of the data. PARAFAC, however, offers an even more general model and is furthermore unique, which is not the case for the ANOVA model. In an effort to make a better model, a constrained modification of PARAFAC was used where some loadings were forced to ones, thereby permitting modeling of lower-order interactions and main effects. The best model was found to consist of one main additive effect, one four-way and one five-way interaction. This model, specifically

$$x_{ijklm} = a_{i1} + a_{i2}\,b_{j2}\,c_{k2}\,d_{l2} + a_{i3}\,b_{j3}\,c_{k3}\,d_{l3}\,e_{m3},$$

was estimated from one of the replicate sets. It predicted the other replicate set with an RMSEP of 12.3. This compares with the intrinsic error between the two replicates, 11.9, and the error obtained predicting with the one-component five-way interaction model, 13.4. However, the model only explained one percent more of the variance than the one-component model. Hence the increased complexity was judged not to be beneficial enough to justify the model. The model was estimated by a three-component PARAFAC where the loadings of the first component were all fixed at the value one except in the first mode. In the second component, the loadings of the fifth mode were forced to ones. Further investigation is now in progress, aiming to develop a general multiplicative ANOVA model using PARAFAC as sketched here. The most obvious problems to be dealt with are defining degrees of freedom and the numerical obstacles in estimating the constrained PARAFAC models. It is noteworthy that this approach can also be used for estimating constant baselines or other lower-order effects in a spectral decomposition.

9. Application II: Unique decomposition of sparse fluorescence data

9.1. Data

This problem is an illustrative example of the unique decomposition obtained by PARAFAC using a nonnegativity constraint. The data set is part of an investigation conducted by Claus A. Andersson at our laboratory. Two samples containing different amounts of tyrosine, tryptophane and phenylalanine were measured by fluorescence (excitation 250-300 nm, emission 250-450 nm, 1 nm intervals). The array to be decomposed is hence 2 × 51 × 201. The samples were measured on a PE LS50B spectrofluorometer with the following settings:

- excitation slit width 2.5 nm,
- emission slit width 10 nm,
- scan speed 1500 nm/s.

Originally five samples were used, which could be decomposed by unconstrained PARAFAC without problems. However, to show how to incorporate external knowledge in the decomposition, only two of the samples are used here.

Fig. 9. A plot of the fluorescence of a sample containing Tyr, Trp and Phe. Notice the Rayleigh scatter in the left corner.

The theoretical multilinearity of fluorescence has been described in [17,28,59]. In Fig. 9 one of the two samples is shown. Notice the Rayleigh scatter in the left part, which is not multilinear in its nature [60]. Rayleigh scatter should be avoided in a multilinear decomposition if possible, and there are three ways of doing that: (i) Only measure the emission above the excitation, if this wavelength area contains sufficient
information; (ii) Use curve fitting in some form to estimate the emission in the neighborhood of the excitation wavelengths; (iii) Measure a blank and subtract this measurement from the sample measurement. This, however, can be problematic if the Rayleigh scatter is mainly caused by particles in the sample. In this experiment nothing was done initially to eliminate the Rayleigh scatter.

9.2. Results and discussion

A three-component PARAFAC solution should give the correct solution if the trilinear model is appropriate. The emission loadings of a three-component PARAFAC solution are shown in Fig. 10a. The spectrum corresponding to tryptophane has large negative areas. It was concluded that the decomposition was difficult because of the low variability (two samples). Since the fluorescence spectra and concentrations should be positive, it was natural to constrain the PARAFAC loadings to nonnegative values. In Fig. 10b the estimated emission loadings are shown using nonnegativity constraints. The spectra are quite similar to the pure spectra of the analytes, but for tryptophane there is a small hump below 300 nm caused by the non-multilinear Rayleigh scatter. To avoid this, all variables influenced by Rayleigh scatter were set to missing values and the corresponding PARAFAC model was estimated. The result can be seen in Fig. 10c. Apparently this alone is not sufficient to ensure a good curve resolution of the tryptophane spectrum. Combining the missing elements approach with the nonnegativity constraint helps the model focus on the right aspects of the data. The estimated loadings are shown together with the pure spectra in Fig. 10d. As seen, the estimated loadings are now quite similar to the pure spectra. The estimated excitation spectra are shown in Fig. 11.

Fig. 10. Estimated and true emission spectra: (a) Unconstrained PARAFAC, (b) NNLS PARAFAC, (c) PARAFAC with missing elements and (d) missing elements and NNLS. In (d) the true spectra are also shown.

Fig. 11. Estimated excitation spectra using missing elements and nonnegativity constraints. The thick lines are the pure spectra of Trp, Tyr and Phe.

The model precisely estimates the three pure spectra, even though there are only two independent samples and the excitation spectra of tyrosine and tryptophane are very similar (correlation 0.93). According to the rule mentioned in the paragraph on uniqueness, it is theoretically possible to estimate these three different spectra correctly provided that the concentrations vary independently pairwise and no spectrum is linearly dependent on any of the others. However, because of Rayleigh scatter, noise and spectral similarity, unconstrained PARAFAC was not sufficient for resolving the spectra in this case.

The four different models differ mainly in the area of the Rayleigh scatter, but for all models the fit to the non-Rayleigh part of the data is almost identical. The unconstrained PARAFAC model explains 99.957% of the variation, the nonnegativity constrained model 99.958%, the missing elements model 99.974%, and the combined missing elements and nonnegativity model 99.973%. The small differences in explained variance support the preconceived assumptions that the loadings are nonnegative and that the Rayleigh scatter is inappropriate for the model. Otherwise the altered models would have had a significantly poorer fit than the unconstrained model. Resolving the spectra by generalized rank annihilation as described in [26] was also tried, but the result was similar to, though worse than, the result using PARAFAC with missing values.

The loadings of the sample mode are estimates of the concentrations of the analytes if the right number of components has been chosen. Due to the scaling indeterminacy in PARAFAC, the concentration of an analyte cannot be estimated without knowing its concentration in one sample. If the concentrations of the analytes in the first sample are known, the PARAFAC solution can be scaled and the concentration estimates for the second sample compared with the true concentrations. The result is shown in Table 2. Although the relative errors are large, the estimates are in the right neighborhood.

Table 2
Predictions of concentrations in the second sample

Predicted concentration    True concentration
8.8 × 10^-7                7.8 × 10^-7
4.4 × 10^-6                3.5 × 10^-6
3.0 × 10^-4                2.3 × 10^-4

10. Application III: Prediction of amino-N in sugar samples from fluorescence

10.1. Data

As for bilinear PCA, the outcome of a PARAFAC model can be used as input to other models, most often for regression. In this example the emission spectra of 98 sugar samples dissolved in phosphate buffered water were measured at four excitation wavelengths (excitation 230, 240, 290 and 340 nm; emission 375-560 nm, 0.5 nm intervals). The amino-N content was also determined by a standard wet-chemical procedure as described in [61]. Following roughly the strategy of PCR, a PARAFAC model is sought whose scores can predict the amino-N content of the sugar samples from the fluorescence. The scores constitute the independent variables and are related to the amino-N content by multiple linear regression.

10.2. Results and discussion

Three different PARAFAC calibration models were made: one using raw fluorescence data and an unconstrained PARAFAC model, one using raw data and nonnegativity constraints on the emission mode, and one using meancentered data and unconstrained PARAFAC. The models were made using test set validation, with 49 samples in each set. A PARAFAC model was estimated, and a regression model was estimated from the scores of the PARAFAC model. The scores of the test set samples were calculated from the

Table 3
Percentage of variance of the dependent variable (amino-N) explained in the test set. Each column corresponds to a different model and each row to the number of latent variables/components used. Bold numbers indicate variance explained for candidate models, and the numbers in parentheses are the percentages of variance of the three-way array of independent variables (fluorescence spectra) explained in the test set.

      PARAFAC        PARAFAC        PARAFAC        Two-way        Three-way      Tucker         PCR
      (raw)          (meancentered) (NNLS)         PLS            PLS
1 LV  84.0           84.1           84.0           84.4           84.2           84.3           83.9
2 LV  85.4           85.4           85.5           86.6           86.1           84.8           85.7
3 LV  85.2           85.4           85.2           88.5           88.9           85.3           86.8
4 LV  87.1           85.1           86.8           91.6           91.4           88.0           87.2
5 LV  91.2 (99.8)    90.7 (99.9)    91.1 (99.8)    91.9 (96.0)    92.3 (95.7)    87.7 (99.8)    88.0 (99.9)

excitation and emission loadings of the PARAFAC model, and from these the amino-N content was estimated with the regression model. For comparison, the results of using multi-way PLS (N-PLS [62]), ordinary two-way PLS, Tucker3 regression and two-way principal component regression (PCR) were also calculated. N-PLS is a multi-way calibration model. In N-PLS the array of independent variables is sequentially decomposed into a multilinear model in such a way that the scores have maximal covariance with the yet unexplained variation of the dependent variable. The Tucker regression was performed by decomposing the raw data with a Tucker3 model using the same number of components in each mode and using the loadings of the sample mode for regression. PCR was performed using the successively estimated score vectors from a PCA model for regression. The spectral data were meancentered prior to the PCA decomposition.

The results from the calibration models are shown in Table 3. Several interesting aspects are illustrated here. All models obtain optimal or near-optimal predictions at around five components. PLS and N-PLS seem to perform slightly better than the other methods, and they furthermore use fewer components. All pure decomposition methods (PARAFAC, Tucker3, PCA) describe approximately 99.8% of the spectral variation using five components. Even though the PCA and Tucker3 models are more complex and flexible than PARAFAC, the flexibility apparently does not contribute to better modeling of the spectra. Combined with the fact that the PARAFAC regression models outperform both Tucker3 and PCA, this illustrates very well that when PARAFAC is adequate there is no advantage in using more complex models. The constraints imposed in PLS and N-PLS seem to be more adequate: both give more predictive models for amino-N. Both models fit the spectral data more poorly than the pure decomposition methods, which is to be expected because their scores are constrained to have maximal covariance with the dependent variable. N-PLS uses only a fraction of the number of parameters that PLS uses to model the spectral data, so in a mathematical sense N-PLS obtains optimal predictions with the simplest model. Therefore one can argue that the N-PLS model is the most appropriate model. However, the N-PLS model does not possess the uniqueness properties of PARAFAC. One might therefore also argue that the five-component nonnegativity constrained PARAFAC model is preferable if the found loadings can be related to specific chemical analytes, an issue that will not be investigated further here.

Using PARAFAC for regression as shown here has the potential to provide simultaneously a model that predicts the dependent variable and a precise description of which phenomena in the independent variables are crucial for describing the variations in the dependent variable. However, the limited experience obtained so far in our laboratory indicates that one is often better off focusing on either decomposition (PARAFAC) or calibration (N-PLS). Purely spectral data, as here, is the only type of data where there seems to be little difference in predictive ability.

10.3. M'n'M

All calculations were done on a 133 MHz Dell PC with 32 MB RAM. The PARAFAC algorithm was made in the mathematical software Matlab for Windows 4.2c.1 (Mathworks). This implementation works with arrays of up to ten ways. It also contains the possibility to constrain loadings to be orthogonal or nonnegative, and it handles missing data. The algorithm is available from the Internet at http:\\newton.foodsci.kvl.dk\foodtech.html. Also available are M-files for PARAFAC and Tucker3 made by Claus A. Andersson, and for N-PLS by R. Bro. Other programs for PARAFAC modeling are also available. Richard A. Harshman, Dept. Psychology, Social Science Center, University of Western Ontario, London, Ontario, Canada N6A 5C2, has a very general PARAFAC program for three-way analysis which runs in batch mode on PCs. Rob Ross, at http://www.biosci.ohio-state.edu/~rtr/multilin/muldoc.html, offers Fortran code for PARAFAC. Pentti Paatero, Dept. Physics, University of Helsinki, BOX 9, FIN-00014 University, Helsinki, Finland, has made a program for two- and three-way PARAFAC which incorporates nonnegativity constraints and weighted regression. P.M. Kroonenberg's latest version of his three-mode toolbox now contains the PARAFAC model. The program runs in DOS mode. Orders should be sent to P.M. Kroonenberg, Dept. Education, Leiden University, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands.

11. Conclusion

The PARAFAC model and its estimation have been described, and its application for ANOVA, curve resolution and calibration has been exemplified. It is my hope that this tutorial might encourage others to investigate multi-way methods. Multi-way methods have many advantages (and of course shortcomings) that have not yet been fully acknowledged.

Acknowledgements

Rasmus Bro is grateful for support and inspiration from and funds to Professor Lars Munck from Nordic Industry Foundation project P93149 and the FØTEK foundation. Claus A. Andersson, Age K. Smilde and Henk Kiers are thanked for numerous suggestions during this work. Lars Nørgaard, Hanne Heimdal and Claus A. Andersson are thanked for letting me use their data sets. Anonymous referees are thanked for helpful suggestions.

References

[1] R.A. Harshman, Foundations of the PARAFAC procedure: Model and conditions for an 'explanatory' multi-mode factor analysis, UCLA Working Papers in Phonetics 16 (1970) 1.
[2] J.D. Carroll, J. Chang, Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition, Psychometrika 35 (1970) 283.
[3] P. Geladi, Analysis of multi-way (multi-mode) data, Chemom. Intell. Lab. Syst. 7 (1989) 11.
[4] A.K. Smilde, Three-way analyses. Problems and prospects, Chemom. Intell. Lab. Syst. 5 (1992) 143.
[5] P.M. Kroonenberg, Three-mode principal component analysis. Theory and applications, DSWO Press, Leiden, 1983.
[6] H.A.L. Kiers, Hierarchical relations among three-way methods, Psychometrika 56 (1991) 449.
[7] M.B. Seasholtz, B.R. Kowalski, The parsimony principle applied to multivariate calibration, Anal. Chim. Acta 277 (1993) 165.
[8] R.A. Harshman, S.A. Berenbaum, Basic concepts underlying the PARAFAC-CANDECOMP three-way factor analysis model and its application to longitudinal data, in: D.H. Eichorn, J.A. Clausen, N. Haan, M.P. Honzik, P.H. Mussen (Eds.), Present and past in middle life, Academic Press, NY, 1981, pp. 435-459.
[9] D.S. Burdick, An introduction to tensor products with applications to multiway data analysis, Chemom. Intell. Lab. Syst. 28 (1995) 229.
[10] I. Scarminio, M. Kubista, Analysis of correlated spectral data, Anal. Chem. 65 (1993) 409.
[11] L. Sarabia, M.C. Ortiz, R. Leardi, G. Drava, A program for non-orthogonal factor analysis, Trends Anal. Chem. 12 (1993) 226.
[12] N.M. Faber, M.C. Buydens, G. Kateman, Generalized rank annihilation method I: derivation of eigenvalue problems, J. Chemom. 8 (1994) 147.
[13] R.A. Harshman, Determination and proof of minimum uniqueness conditions for PARAFAC1, UCLA Working Papers in Phonetics 22 (1972) 111.
[14] J.B. Kruskal, More factors than subjects, tests and treatments: An indeterminacy theorem for canonical decomposition and individual differences scaling, Psychometrika 41 (1976) 281.
[15] J.B. Kruskal, Three-way arrays: Rank and uniqueness of trilinear decomposition, with application to arithmetic complexity and statistics, Linear Algebra Appl. 18 (1977) 95.
[16] R.B. Cattell, Parallel proportional profiles and other principles for determining the choice of factors by rotation, Psychometrika 9 (1944) 267.
[17] S. Leurgans, R.T. Ross, R.B. Abel, A decomposition for three-way arrays, SIAM J. Matrix Anal. Appl. 14 (1993) 1064.
[18] J.B. Kruskal, Rank, decomposition, and uniqueness for 3-way and N-way arrays, in: R. Coppi, S. Bolasco (Eds.), Multiway data analyses, Elsevier Science Pub., North-Holland, 1989.
[19] R.A. Harshman, M.E. Lundy, The PARAFAC model for three-way factor analysis and multidimensional scaling, in: H.G. Law, C.W. Snyder, J.A. Hattie, R.P. McDonald (Eds.), Research methods for multimode data analysis, Praeger, New York, 1984.
[20] J.M.F. ten Berge, H.A.L. Kiers, J. de Leeuw, Explicit Candecomp/PARAFAC solution for a contrived 2 × 2 × 2 array of rank three, Psychometrika 53 (1988) 579.
[21] J.M.F. ten Berge, Kruskal's polynomial for 2 × 2 × 2 arrays and a generalization to 2 × n × n arrays, Psychometrika 56 (1991) 631.
[22] H.A.L. Kiers, W.P. Krijnen, An efficient algorithm for PARAFAC of three-way data with large numbers of observation units, Psychometrika 56 (1991) 147.
[23] R.A. Harshman, M.E. Lundy, PARAFAC: Parallel factor analysis, Comp. Stat. Data Anal. 18 (1994) 39.
[24] R. Sands, F.W. Young, Component models for three-way data: An alternating least squares algorithm with optimal scaling features, Psychometrika 45 (1980) 39.
[25] D.S. Burdick, X.M. Tu, L.B. McGown, D.W. Millican, Resolution of multicomponent fluorescent mixtures by analysis of excitation-emission-frequency array, J. Chemom. 4 (1990) 15.
[26] E. Sanchez, B.R. Kowalski, Tensorial resolution: A direct trilinear decomposition, J. Chemom. 4 (1990) 29.
[27] S. Li, P.J. Gemperline, Eliminating complex eigenvectors and eigenvalues in multiway analyses using the direct trilinear decomposition method, J. Chemom. 7 (1993) 77.
[28] S. Leurgans, R.T. Ross, Multilinear models in spectroscopy, Statist. Sci. 7 (1992) 289.
[29] B.C. Mitchell, D.S. Burdick, An empirical comparison of resolution methods for three-way arrays, Chemom. Intell. Lab. Syst. 20 (1993) 149.
[30] X.M. Tu, D.S. Burdick, Resolution of trilinear mixtures: Application in spectroscopy, Stat. Sinica 2 (1992) 577.
[31] N. Cliff, Orthogonal rotation to congruence, Psychometrika 31 (1966) 33.
[32] C.L. Lawson, R.J. Hanson, Solving least squares problems, Classics in Appl. Math., No. 15, SIAM, Philadelphia, 1995.
[33] R.J. Hanson, Linear least squares with bounds and linear constraints, SIAM J. Sci. Stat. Comput. 7 (1986) 826.
[34] H. Späth, Mathematical algorithms for linear regression, Academic Press, Inc., Boston, 1987.
[35] J.L. Barlow, Error analysis and implementation aspects of deferred correction for equality constrained least squares problems, SIAM J. Numer. Anal. 25 (1988) 1340.
[36] J.L. Barlow, N.K. Nichols, R.J. Plemmons, Iterative methods for equality-constrained least squares problems, SIAM J. Sci. Stat. Comput. 9 (1988) 892.
[37] J.D. Carroll, S. Pruzansky, J.B. Kruskal, Candelinc: A general approach to multidimensional analysis of many-way arrays with linear constraints on parameters, Psychometrika 45 (1980) 3.
[38] R.T. Ross, S. Leurgans, Component resolution using multilinear models, Methods Enzymol. 246 (1995) 679.
[39] J.M.F. ten Berge, Convergence of PARAFAC preprocessing procedures and the Deming-Stephan method of iterative proportional fitting, in: R. Coppi, S. Bolasco (Eds.), Multiway data analyses, Elsevier Science Pub., North-Holland, 1989.
[40] J.B. Kruskal, Multilinear methods, Proc. Symp. Appl. Math. 28 (1983) 75.
[41] R.A. Harshman, M.E. Lundy, Data preprocessing and the extended PARAFAC model, in: H.G. Law, C.W. Snyder, J.A. Hattie, R.P. McDonald (Eds.), Research methods for multimode data analysis, Praeger, New York, 1984.
[42] R.D. Cook, S. Weisberg, Residuals and influence in regression, Chapman and Hall Ltd, New York, 1982.
[43] A.K. Smilde, D.A. Doornbos, Simple validatory tools for judging the predictive performance of PARAFAC and three-way PLS, J. Chemom. 6 (1992) 11.
[44] S.R. Durell, C. Lee, R.T. Ross, E.L. Gross, Factor analysis of the near-ultraviolet absorption spectrum of plastocyanin using bilinear, trilinear and quadrilinear models, Arch. Biochem. Biophys. 278 (1990) 148.
[45] B.C. Mitchell, D.S. Burdick, Slowly converging PARAFAC sequences: Swamps and two-factor degeneracies, J. Chemom. 8 (1994) 155.
[46] W.P. Krijnen, The analysis of three-way arrays by constrained PARAFAC methods, Ph.D. thesis, University of Groningen, 1993.
[47] J.B. Kruskal, Multilinear methods, in: H.G. Law, C.W. Snyder, J.A. Hattie, R.P. McDonald (Eds.), Research methods for multimode data analysis, Praeger, New York, 1984.
[48] J.B. Kruskal, R.A. Harshman, M.E. Lundy, How 3-MFA data can cause degenerate PARAFAC solutions, among other relationships, in: R. Coppi, S. Bolasco (Eds.), Multiway data analyses, Elsevier Science Pub., North-Holland, 1989.
[49] A.K. Smilde, Y. Wang, B.R. Kowalski, Theory of medium-rank second-order calibration with restricted-Tucker models, J. Chemom. (1994) 21.
[50] A.K. Smilde, R. Tauler, J.M. Henshaw, L.W. Burgess, B.R. Kowalski, Multicomponent determination of chlorinated hydrocarbons using a reaction-based chemical sensor. 3. Medium-rank second-order calibration with restricted Tucker models, Anal. Chem. 66 (1994) 3345.
[51] R.A. Harshman, PARAFAC2: Mathematical and technical notes, UCLA Working Papers in Phonetics 22 (1972) 30.
[52] H.A.L. Kiers, An alternating least squares algorithm for PARAFAC2 and three-way DEDICOM, Comp. Stat. Data Anal. 16 (1993) 103.
[53] J.R. Kettenring, A case study in data analysis, Proc. Symp. Appl. Math. 28 (1983) 105.
[54] A. Aastveit, H. Martens, ANOVA interactions by partial least squares regression, Biometrics 42 (1984) 829.
[55] H. Martens, L. Izquierdo, M. Thomassen, M. Martens, Partial least squares regression on design variables as an alternative to analysis of variance, Anal. Chim. Acta 191 (1986) 133.
[56] M.V. Martinez, J.R. Whitaker, The biochemistry and control of enzymatic browning, Trends Food Sci. Technol. 6 (1995) 195.
[57] H. Heimdal, L.M. Larsen, L. Poll, R. Bro, Oxidation of chlorogenic acid and (-)-epicatechin by lettuce polyphenol oxidase in model solutions at various combinations of O2, CO2, temperature and pH, J. Agric. Food Chem., in press.
[58] R. Bro, H. Heimdal, Enzymatic browning of vegetables. Calibration and analysis of variance by multiway methods, Chemom. Intell. Lab. Syst. 34 (1996) 85.
[59] L. Nørgaard, Classification and prediction of quality and process parameters of thick juice and beet sugar by fluorescence spectroscopy and chemometrics, Zuckerind. 120 (1995) 970.
[60] G.W. Ewing, Instrumental methods of chemical analysis, McGraw-Hill Book Company, NY, 1985.
[61] L. Nørgaard, A multivariate chemometric approach to fluorescence spectroscopy, Talanta 42 (1995) 1305.
[62] R. Bro, Multiway calibration. Multilinear PLS, J. Chemom. 10 (1996) 47.
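Section 8.3 notes that the one-component PARAFAC model could also be obtained by ANOVA after a logarithmic transformation. The minimal pure-Python check below illustrates only the algebraic identity behind that remark. It uses hypothetical positive loadings, not values from the PPO study: a one-component model x_ijk = a_i b_j c_k becomes an additive main-effects model in the log domain. The sketch says nothing about estimation. Least squares on log-transformed data weights the errors differently, and the log route requires strictly positive data.

```python
import math
from itertools import product

# Hypothetical positive loadings for a one-component 3 x 2 x 4 model.
a = [2.0, 0.5, 1.5]
b = [1.2, 3.0]
c = [0.8, 4.0, 2.5, 1.0]

# One-component (multiplicative) PARAFAC structure: x_ijk = a_i * b_j * c_k.
x = {(i, j, k): a[i] * b[j] * c[k]
     for i, j, k in product(range(len(a)), range(len(b)), range(len(c)))}

# In the log domain the same values follow an additive main-effects model:
# log x_ijk = alpha_i + beta_j + gamma_k.
alpha = [math.log(v) for v in a]
beta = [math.log(v) for v in b]
gamma = [math.log(v) for v in c]

max_diff = max(abs(math.log(value) - (alpha[i] + beta[j] + gamma[k]))
               for (i, j, k), value in x.items())
print(f"largest deviation from the additive log model: {max_diff:.1e}")
```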
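The constrained five-way model of Section 8.3 can be illustrated with a small pure-Python sketch. It evaluates a three-component PARAFAC model in which the first component's loadings are fixed at one in every mode except the first, and the second component's fifth-mode loadings are fixed at one. It then checks the result against the closed form x_ijklm = a_i1 + a_i2 b_j2 c_k2 d_l2 + a_i3 b_j3 c_k3 d_l3 e_m3. The mode sizes and loading values are hypothetical placeholders, not estimates from the PPO data, and the helper names are illustrative.

```python
from itertools import product


def parafac_model(loadings):
    """Evaluate an N-way PARAFAC model from a list of loading matrices.

    loadings[n][level][f] is the loading of `level` in mode n for component f.
    Returns a dict mapping index tuples to modelled values.
    """
    dims = [len(mat) for mat in loadings]
    n_comp = len(loadings[0][0])
    model = {}
    for idx in product(*(range(size) for size in dims)):
        total = 0.0
        for f in range(n_comp):
            term = 1.0
            for mode, level in enumerate(idx):
                term *= loadings[mode][level][f]
            total += term
        model[idx] = total
    return model


def fix_to_ones(matrix, component):
    """Copy of `matrix` with the loadings of one component fixed at 1."""
    return [[1.0 if f == component else v for f, v in enumerate(row)]
            for row in matrix]


# Hypothetical free loadings for a 2 x 3 x 2 x 2 x 2 design, three components.
# Entries of 9.0 are overwritten by the constraints and never used.
a = [[1.0, 0.5, 0.2], [2.0, -0.3, 0.1]]
b_free = [[9.0, 0.4, 1.5], [9.0, 0.8, -0.5], [9.0, 1.2, 0.7]]
c_free = [[9.0, 1.1, 0.9], [9.0, 0.6, 0.3]]
d_free = [[9.0, 0.7, 1.3], [9.0, 1.4, 0.2]]
e_free = [[9.0, 9.0, 0.6], [9.0, 9.0, 1.8]]

# Component 0: ones in all modes but the first, giving the main effect a_i1.
# Component 1: ones in the fifth mode, giving a four-way interaction.
b = fix_to_ones(b_free, 0)
c = fix_to_ones(c_free, 0)
d = fix_to_ones(d_free, 0)
e = fix_to_ones(fix_to_ones(e_free, 0), 1)

model = parafac_model([a, b, c, d, e])


def closed_form(i, j, k, l, m):
    """a_i1 + a_i2 b_j2 c_k2 d_l2 + a_i3 b_j3 c_k3 d_l3 e_m3 (0-based columns)."""
    return (a[i][0]
            + a[i][1] * b[j][1] * c[k][1] * d[l][1]
            + a[i][2] * b[j][2] * c[k][2] * d[l][2] * e[m][2])


max_diff = max(abs(value - closed_form(*idx)) for idx, value in model.items())
print(f"largest difference from the closed form: {max_diff:.1e}")
```

Fixing loadings in this way does not change the PARAFAC structure itself. It only removes parameters, so the same alternating least squares machinery can estimate the model with those entries held constant.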
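Section 8.3 reports an RMSEP, and Table 3 reports the percentage of variance of amino-N explained in the test set, but the text gives no formulas. The sketch below uses the common textbook definitions. For the second quantity it assumes the mean in the denominator is taken over the test set itself; whether the paper used the test-set or calibration-set mean is not stated. The data are hypothetical.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(press / len(y_true))


def percent_explained(y_true, y_pred):
    """100 * (1 - PRESS / sum of squares of y_true around its own mean)."""
    mean = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss = sum((t - mean) ** 2 for t in y_true)
    return 100.0 * (1.0 - press / ss)


# Hypothetical reference values and predictions for four test samples.
y_test = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 2.0, 2.5, 4.0]
print(rmsep(y_test, y_hat), percent_explained(y_test, y_hat))
```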
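Sections 9.1 and 9.2 describe setting all variables influenced by Rayleigh scatter to missing values. The text does not say how the affected region was delimited. The sketch below therefore assumes a hypothetical band: every emission wavelength below the excitation wavelength plus `half_width` nm is treated as missing (NaN). The wavelength axes reproduce the 51 excitation × 201 emission grid of Section 9.1. The function names and the 15 nm width are illustrative assumptions.

```python
import math


def wavelength_axis(start, stop, step=1.0):
    """Evenly spaced axis including both end points, e.g. 250..300 nm."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


def rayleigh_mask(excitation, emission, half_width):
    """True for (excitation, emission) pairs treated as scatter.

    Marks every emission wavelength below excitation + half_width: the
    first-order Rayleigh line plus the region on its short-wavelength side.
    """
    return [[em < ex + half_width for em in emission] for ex in excitation]


def set_missing(eem, mask):
    """Copy of one sample's excitation x emission matrix with masked cells as NaN."""
    return [[math.nan if masked else value for value, masked in zip(row, mrow)]
            for row, mrow in zip(eem, mask)]


ex_axis = wavelength_axis(250, 300)   # excitation, 1 nm steps
em_axis = wavelength_axis(250, 450)   # emission, 1 nm steps
mask = rayleigh_mask(ex_axis, em_axis, half_width=15)  # hypothetical 15 nm band

sample = [[1.0] * len(em_axis) for _ in ex_axis]       # placeholder EEM
masked_sample = set_missing(sample, mask)
n_missing = sum(math.isnan(v) for row in masked_sample for v in row)
print(len(ex_axis), len(em_axis), n_missing)
```

A PARAFAC algorithm that handles missing data then simply leaves the NaN elements out of the least squares loss.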
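Section 9.2 quotes explained variances such as 99.957% and 99.974% without giving the formula. The sketch below assumes one common convention for PARAFAC fits to raw data: 100·(1 - SS_residual/SS_data) with uncentered sums of squares, skipping missing elements. It is an assumption that the paper's percentages for the missing-element models were computed exactly this way. The data are hypothetical and unfolded to a flat list.

```python
import math


def percent_fit(data, model):
    """Percentage of the (uncentered) sum of squares explained by a model.

    Elements that are missing (NaN) in `data` are skipped, so a model fitted
    with missing elements is judged only on the elements it was fitted to.
    """
    ss_res = 0.0
    ss_tot = 0.0
    for x, x_hat in zip(data, model):
        if math.isnan(x):
            continue
        ss_res += (x - x_hat) ** 2
        ss_tot += x ** 2
    return 100.0 * (1.0 - ss_res / ss_tot)


# Hypothetical unfolded data with one missing element, and a model for it.
data = [1.0, 2.0, math.nan, 4.0]
model = [1.0, 2.0, 100.0, 3.0]
print(percent_fit(data, model))  # 100 * (1 - 1/21)
```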
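Fig. 10 labels the nonnegativity-constrained model "NNLS PARAFAC". In alternating least squares, each loading update then becomes a nonnegative least squares (NNLS) problem, for which the reference list cites Lawson and Hanson [32]. The pure-Python sketch below swaps in a simpler technique, cyclic coordinate descent. It solves the same convex problem, min ||y - Xb||^2 subject to b >= 0, for a single right-hand side. It is an illustration, not the algorithm used in the paper.

```python
def nnls_cd(X, y, max_iter=1000, tol=1e-12):
    """Nonnegative least squares, min ||y - X b||^2 subject to b >= 0.

    Cyclic coordinate descent: each coefficient in turn is moved to its
    unconstrained optimum given the others and then clipped at zero. Because
    the problem is convex, the sweeps converge to an NNLS solution.
    X is a list of rows and y a list; returns the coefficient list.
    """
    n_rows, n_cols = len(X), len(X[0])
    b = [0.0] * n_cols
    r = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(max_iter):
        largest_step = 0.0
        for j in range(n_cols):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot change the fit
            xj_dot_r = sum(X[i][j] * r[i] for i in range(n_rows))
            new_value = max(0.0, b[j] + xj_dot_r / col_sq[j])
            step = new_value - b[j]
            if step != 0.0:
                for i in range(n_rows):
                    r[i] -= X[i][j] * step  # keep the residual up to date
                b[j] = new_value
            largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return b


# Hypothetical 3 x 2 problem whose unconstrained least-squares solution is (1, -1).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, -1.0, 0.0]
print(nnls_cd(X, y))
```

For this example matrix the unconstrained solution (1, -1) violates the constraint. The NNLS solution is (0.5, 0), with the second coefficient held at its bound.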
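Section 9.2 appeals to "the rule mentioned in the paragraph on uniqueness". The worked check below assumes that rule is Kruskal's k-rank condition [15], stated for an F-component model with sample-mode loadings A, excitation loadings B and emission loadings C. The k-ranks are inferred from the stated conditions rather than computed from the data. Pairwise independent concentrations in a 2 × 3 matrix give k_A = 2, the maximum possible with two samples. Three spectra, none of which is linearly dependent on the others, give k_B = k_C = 3:

$$
\begin{aligned}
k_{\mathbf{A}} + k_{\mathbf{B}} + k_{\mathbf{C}} &\ge 2F + 2,\\
2 + 3 + 3 = 8 &\ge 2 \cdot 3 + 2 = 8 .
\end{aligned}
$$

The condition is sufficient rather than necessary, and here it holds only with equality. This fits the text's remark that resolution is possible in theory, while in practice the unconstrained model failed on these data.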
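Section 9.2 removes the PARAFAC scaling indeterminacy by using known concentrations in the first sample. A minimal sketch of that rescaling follows. It assumes that each component has already been matched to an analyte (for example by comparing estimated and pure spectra) and that the reference sample's loadings are nonzero. The loadings and concentrations are hypothetical, not the values behind Table 2, and the function name is illustrative.

```python
def scale_to_reference(scores, reference_conc, ref_sample=0):
    """Rescale PARAFAC sample-mode loadings into concentration units.

    scores[i][f]        loading of sample i on component f
    reference_conc[f]   known concentration of the analyte behind component f
                        in sample `ref_sample`
    Each component gets its own factor, because PARAFAC determines each
    component only up to a scale that can be moved between modes.
    """
    factors = [conc / s for conc, s in zip(reference_conc, scores[ref_sample])]
    return [[s * k for s, k in zip(row, factors)] for row in scores]


# Hypothetical sample-mode loadings (2 samples x 3 components) and known
# concentrations in the first sample.
scores = [[0.40, 1.10, 2.00],
          [0.20, 1.65, 0.50]]
known_first = [1.0e-6, 5.0e-6, 2.0e-4]
estimates = scale_to_reference(scores, known_first)
print(estimates[1])  # concentration estimates for the second sample
```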
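Section 10.2 computes test-set scores from the excitation and emission loadings of the calibration model, then predicts amino-N by multiple linear regression on the scores. The pure-Python sketch below shows one way to do both steps. It assumes each new sample's excitation-emission matrix is fitted by ordinary least squares with the two loading matrices held fixed; the text does not spell out the criterion. All loadings, scores and reference values are hypothetical. The normal-equations solver is a teaching device, not a numerically robust choice.

```python
def solve(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [value] for row, value in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def lstsq(Z, y):
    """Least squares via the normal equations (fine for small, well-conditioned Z)."""
    p = len(Z[0])
    ztz = [[sum(row[u] * row[v] for row in Z) for v in range(p)] for u in range(p)]
    zty = [sum(row[u] * t for row, t in zip(Z, y)) for u in range(p)]
    return solve(ztz, zty)


def new_sample_scores(eem, B, C):
    """Scores of one new sample given fixed excitation (B) and emission (C) loadings.

    Minimises sum_jk (x_jk - sum_f a_f B[j][f] C[k][f])^2 over a; the design
    matrix has one row per (excitation j, emission k) pair.
    """
    n_comp = len(B[0])
    Z, x = [], []
    for j, row in enumerate(eem):
        for k, value in enumerate(row):
            Z.append([B[j][f] * C[k][f] for f in range(n_comp)])
            x.append(value)
    return lstsq(Z, x)


def fit_mlr(scores, y):
    """Multiple linear regression y = b0 + sum_f b_f * score_f."""
    return lstsq([[1.0] + list(s) for s in scores], y)


def predict(coef, s):
    """Prediction from MLR coefficients (intercept first) and one score vector."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], s))


# Hypothetical loadings: 3 excitation x 4 emission wavelengths, 2 components.
B = [[1.0, 0.2], [0.5, 1.0], [0.1, 0.6]]
C = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.05, 1.0]]
true_scores = [0.7, 1.9]
eem = [[sum(true_scores[f] * B[j][f] * C[k][f] for f in range(2))
        for k in range(4)] for j in range(3)]
test_scores = new_sample_scores(eem, B, C)  # recovers true_scores up to rounding

# Hypothetical calibration scores and reference values, y = 0.5 + 2 s1 - s2.
cal_scores = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
cal_y = [0.5 + 2 * s1 - s2 for s1, s2 in cal_scores]
coef = fit_mlr(cal_scores, cal_y)
print(test_scores, coef, predict(coef, [2.5, 2.5]))
```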
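Section 10.2 states that N-PLS uses only a fraction of the number of parameters that two-way PLS uses to model the spectral data. A rough count of the spectral weights per latent variable makes this concrete. The sketch assumes that both end points of the 375-560 nm emission range were recorded, giving 371 emission wavelengths. It counts only the weight vectors, not scores or regression coefficients.

```python
def n_points(start, stop, step):
    """Number of wavelengths on an evenly spaced axis including both ends."""
    return int(round((stop - start) / step)) + 1


n_ex = 4                          # excitation at 230, 240, 290 and 340 nm
n_em = n_points(375, 560, 0.5)    # emission 375-560 nm in 0.5 nm steps (assumed inclusive)

# Spectral weights needed per latent variable:
pls_weights = n_ex * n_em    # two-way PLS on the unfolded samples x (ex*em) matrix
npls_weights = n_ex + n_em   # tri-linear PLS: one weight vector per spectral mode

for n_lv in (1, 5):
    print(n_lv, n_lv * pls_weights, n_lv * npls_weights)
```

Under these assumptions N-PLS needs about a quarter of the spectral weights that unfolded two-way PLS needs, for any number of latent variables.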
