Spca Iecr
All content following this page was uploaded by János Abonyi on 26 May 2021.
E-mail: [email protected]
Abstract
With the ever-increasing use of sensor technologies in industrial processes, and more
data becoming available to engineers, the fault detection and isolation activities in the
process industries have gained renewed attention. A widely used tool is principal component analysis
(PCA), which can reduce the dimensionality of large data sets without compromising
the information content. While most process monitoring methods offer satisfactory
detection capabilities, understanding the root cause of malfunctions and providing the
physical basis for their occurrence have been challenging. The relatively new sparse
PCA techniques represent a further development of the PCA in which not only the data
dimension is reduced but the data is also made more interpretable, revealing clearer
correlation structures among variables. Hence, taking a step forward from classical
PCA-based process monitoring, in this work a decentralized fault detection approach is proposed
based on a sparse algorithm. The resulting control charts reveal the correlation struc-
tures associated with the monitored process and facilitate a structural analysis of the
occurred faults. The applicability of the proposed method is demonstrated using data
generated from the simulation of the benchmark vinyl acetate process. It is shown
that the proposed method, as part of a decentralized multivariate monitoring framework, can provide physical insight towards the origins of process
faults.
Introduction
As the costs of sensors and computing continue to decrease, the amount of data collected from
a process has risen dramatically. 1 With better sensors, higher sampling rates are possible,
allowing the operators to have a real-time view of the plant states. In addition, a decrease
in sensor costs enables the economically viable deployment of more sensors. Consequently,
process engineers have access to more data about underlying processes than ever. The
challenge remains, however, to find and develop efficient tools for analyzing and interpreting
the data for specific problems.
While in most plants low-level control actions are carried out automatically, managing
abnormalities is still mostly carried out manually by the operators. 2 Confronted by the vast
number of measurements in complex plants, however, this task is not trivial, leading to human
errors and often accidents. Consequently, there is a significant potential for automating
the surveillance methods to provide effective operator support. Often trained with data
collected during normal operation, these methods are expected to not only detect deviations
from normal operation sufficiently early, but also to provide information about the nature
of such abnormal behavior and even further insight towards the cause of the abnormality to
help mitigate it.
As opposed to early process measurements that were limited to key variables sampled at
a few strategic locations, in modern processes, numerous process states and physical properties
can be accessed and stored with high sampling frequencies with the aid of a vast network
of sensor arrays. 3 A key step to analyzing such large data sets is effective complexity reduc-
tion, during which data size is reduced and fundamental information structure is preserved.
Principal component analysis (PCA) is a popular method for achieving dimension reduc-
tion with applications in various scientific fields, 4 especially in the field of statistical process
monitoring. 5 PCA conserves as much information as possible while reducing the number of
dimensions, or in this context, measured variables, by introducing new dimensions called
principal components (PCs). The PCs obtained via this transformation are linear combi-
nations of all original measurements, and hence, while the amount of data is reduced, the
PCs no longer possess any physical attributes. This creates a challenge when one needs to
interpret the behavior or a trend associated with a specific PC.
Various methods have been proposed to address this drawback. In most applications,
this problem was historically solved by simply truncating PC loadings. However, Cadima et
al. 6 showed that this thresholding technique can often be misleading. Other early techniques
to create a sparser structure involved rotation of loading matrices to obtain as many
zero loadings as possible. 6 A major shortcoming of these simple techniques is the loss of or-
thogonality and/or the correlation structure manifested in the loadings due to the rotation.
Additionally, different normalization techniques may lead to varying and unsatisfactory re-
sults. 6 Since the underlying correlations should not depend on the normalization technique,
the rotation approach is deemed unsuitable. As an alternative, the Simplified Component
Technique Least Absolute Shrinkage (SCoTLASS), introduced by Jolliffe et al., 7 systemat-
ically obtains sparse solutions as an extension to PCA. Sparseness is achieved through a
Lasso constraint and the resulting loadings are forced to be orthogonal. However, the ob-
jective function is non-convex, which makes the resulting optimization problem difficult to
solve. 8 In addition, the loading vectors are either uncorrelated or normalized, but not both.
Zhou et al. 9 further developed the SCoTLASS algorithm into a Sparse Principal Component
Analysis (called SPCA) method. The idea is to formulate the principal component problem
as a regression problem in which non-zero loadings are penalized via the elastic net. An advantage of this
method is that the number of non-zero-loadings (NNZL) per PC is controllable. However,
the loading structure depends on the number of desired PCs for this algorithm. 8 Journée
et al. 10 proposed an algorithm to achieve sparseness via Lasso or cardinality constraints.
They reformulated the problem in a way to take advantage of the generalized power method.
Trendafilov 11 considered it the most efficient sparsity approach in his review. Most sparse
PCA algorithms fail to sustain orthogonality among the sparse loading vectors.
The algorithm developed by Qi et al. 8 creates either orthogonal loadings or uncorrelated
PCs. Other approaches involved semidefinite programming 12 or greedy algorithms. 13 This
list is far from complete, as the number of sparse PCA algorithms is constantly growing
in a field of ongoing research.
The field of statistical process monitoring has spawned many different approaches, espe-
cially those that exploit PCA and its variants. For example, Bin Shams et al. 14 proposed
an advancement to contribution plots for better fault isolation and diagnosis. Choi et al. 15
based their fault detection system on kernel PCA to reflect nonlinear system behaviour.
Jiang et al. 1 advanced this approach by using PCA to reflect linear relations and Kernel
PCA to reflect nonlinear relations in a chemical plant. They then used both models si-
multaneously for better fault detection. Misra et al. 16 combined PCA with wavelets in their
approach. Doing that, they accounted for time dependent behaviour of continuous processes.
Also accounting for autocorrelation, Dong et al. 17 applied Dynamic PCA (DPCA) to the
Tennessee Eastman process and normal PCA to the residuals. On the other hand, the con-
cept of sparse PCA algorithms has been introduced into the fault detection and diagnosis
literature much more recently. Grbovic et al. 18 used sparse PCA to find blocks in the sensor
network. They then trained Support-Vector-Machines for fault classification and fused their
decisions using Maximum Entropy algorithms. They applied this approach on the Tennessee
Eastman benchmark process. Gajjar et al. 19 demonstrated that sparse PCA models can also
serve as a foundation for multivariate control charts, such as Hotelling’s T 2 , with almost the
same results as PCA on the Tennessee Eastman benchmark process data set. Moreover, they
demonstrated the use of contribution plots in the diagnosis of faults. Furthermore, Gajjar et
al. 20 utilized least squares sparse PCA (LS SPCA) and visualized their results with parallel
coordinates for fault detection while for fault diagnosis, they used supervised Random Forest
algorithms to successfully isolate even the most difficult faults. In addition, by taking advan-
tage of the multiple correlated variables and units in complex plant-wide monitoring tasks,
distributed control charts are gaining increasing popularity: Tang et al. 21 provided
an exhaustive review of other distributed monitoring and control approaches, while Jiang
et al. 22 focused on the data-driven distributed monitoring of industrial plant-wide processes
in their review paper. The key idea of such approaches rests on the same fundamentals:
first, decompose a complex plant-wide process into multiple connected subprocesses, and then
develop a data-driven model for their monitoring.
In a related but rather different study, Gao et al. 3 developed a high level sparse PCA
algorithm based on SPCA to obtain loading vectors with high captured variance and called
it the Forward SPCA. The approach was then applied to the Tennessee Eastman process to
discover underlying correlations. Their goal was to provide further process understanding
and physical insight into process behaviors based on the process geometries and control
structures. That study provides the inspiration and the motivation for the present work.
In this paper, the above mentioned advantages of SPCA algorithms and decentralized
fault detection charts are merged and a novel, decentralized, SPCA-based fault detection
approach is proposed. The novelty of the present approach is as follows:
• Building upon the correlation preserving characteristic of PCA and the structural
grouping of SPCA, the presented methodology utilizes SPCA to group the variables
of similar characteristics and uses them to monitor the faults of structural elements
related to the underlying correlated process units.
• Decentralized fault monitoring charts are utilized to exploit the advantage of the structural grouping of the variables, localizing detected faults to the related process units.
Figure 1: Approach discussed in this work.
The approach is divided into three phases (Figure 1). During the first phase, which
is assumed to be under normal operating conditions, data is collected and a sparse PCA
model is built. Given its sparsity, only strongly correlated measurement variables load onto
the same principal components. Revealed correlations are utilized for a deeper understanding
of the process. Hotelling’s T 2 control charts built on the individual principal components
allow for a close monitoring of fault behavior in Phase II. Utilizing the known correlation
structure from the first step, these charts provide the basis for interpreting and explaining
the physical and structural underpinnings for occurring faults. The approach is applied in a
case study to the simulation of a vinyl acetate production plant. Normal operation as well
as two abnormal behaviours serve as Phase I and Phase II data sets, respectively.
The rest of the paper is organized as follows: In the next section, the PCA method
and the sparse PCA algorithm (SPCA) are briefly introduced as well as a metric for sparsity
evaluation. This is followed by the description of the case study, the benchmark vinyl acetate
production process. The applicability of the proposed method is demonstrated through the
application of the SPCA model to data describing two abnormal behaviours and a detailed
discussion of observed dynamics is provided. Finally, the concluding remarks are offered. In
the Appendix section, a short description of the Hotelling’s T 2 control charts and the table
of the applied measurements of the vinyl acetate production process are provided.
In the following, Principal Component Analysis (PCA) as a method for dimensionality re-
duction is briefly discussed. This is followed by a review of the SPCA algorithm. The section
closes with the introduction of a metric, an index of sparseness, to evaluate the performance
of the algorithm.
It is well established that the PCA maximizes the amount of variance captured by the PCs
sequentially. Therefore, the first PC contains the most variance, the second PC less so, etc.
Mathematically, PCA can be formulated as an eigenvalue-decomposition problem where X
is a n × p matrix with n observations and p predictors. X is assumed to be normalized with
zero mean and unit variance, otherwise the variance per predictor p is distorted. 9 X can
then be expressed using the Singular Value Decomposition (SVD):
X = U D V^T and Z = U D (1)
Here Z is commonly referred to as the principal components (PCs), or the score matrix, and
V contains the corresponding loadings; U is the matrix of left singular vectors, while D is
the diagonal matrix of singular values. Consequently, p new dimensions or PCs are initially obtained. To reduce the
dimensions, in a second step, only the first l PCs are selected. In order to decide how many
PCs are sufficient, the variance of each PC is calculated; i.e., the variance of the j-th PC is
given as:
d_j = D_jj^2 / n (2)
A common strategy to determine the desired number of PCs is the cumulative percentage
of explained variance (CPV). The CPV expresses how much variance of the total variance
is captured by the first l PCs. The threshold k is defined as an acceptable percentage of
explained variance:
CPV = ( Σ_{j=1}^{l} d_j / Σ_{j=1}^{p} d_j ) · 100% ≥ k (3)
Typical values of k range from 75 to 85 percent, depending on how much critical infor-
mation is to be retained and how much noise (or unstructured information) remains in the
residual. The first l PCs which satisfy this condition are then used (model space) while
the other p − l PCs are excluded (residual or error space). This way the data dimension
is reduced to l ≪ p PCs. The drawback of this dimension reduction technique, however, is
that all of the l PCs are still linear combinations of all of the original p predictors. Thus,
even though in many applications the p predictors are associated with physical properties, it
becomes almost impossible to interpret the aggregate loading matrix V and the individual
PCs.
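Using the definitions above, the SVD route of Eqs. (1)-(3) — scores Z = UD, per-PC variances, and CPV-based truncation — can be sketched in a few lines of NumPy. This is a didactic sketch on random data, not the implementation used in the paper; the function name and the threshold value are arbitrary choices:

```python
import numpy as np

def pca_svd(X, k=0.85):
    """PCA via the SVD of the standardized data matrix X (n x p).
    Returns the scores Z = U D, the loadings V, and the number of PCs l
    whose cumulative explained variance first reaches the threshold k."""
    n, p = X.shape
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U * s                                  # scores, Eq. (1)
    d = s ** 2 / n                             # variance per PC, Eq. (2)
    cpv = np.cumsum(d) / d.sum()               # cumulative fraction, Eq. (3)
    l = int(np.searchsorted(cpv, k)) + 1       # first l PCs with CPV >= k
    return Z, Vt.T, l

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 23))    # 240 samples, 23 measurements, as in the case study
Z, V, l = pca_svd(X)
```

With real plant data, the first l PCs span the model space and the remaining p − l PCs the residual space.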
To address the aforementioned drawback, SPCA was developed. 9 In the present work, the
SPCA methodology is formulated via elastic net regression, where the basic idea is
to combine the advantages of PCA with an elastic net formulation to create sparse
loading vectors which depend only on the most critical physical variables (measurements).
Elastic net regression is used to regress the loading vectors of the original PCA 23 as:

β = argmin_β ||Z_i − Xβ||^2 + λ ||β||_2^2 + λ_1 ||β||_1 (4)

where Z_i is the i-th PC of the original PCA and β is the sparse loading vector. Sparseness
is achieved through the parameters λ and λ_1, which penalize all non-zero loadings in β.
For λ, λ_1 = 0 the regression problem simplifies and the obtained loadings β reduce to the PCA
loadings. The L1 norm is weighted by λ_1 and the L2 norm by λ in this formulation. The L2 norm
is necessary when the number of predictors p exceeds the number of observations (p >> n).
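A didactic NumPy sketch of the regression in Eq. (4) is given below, solving the elastic net by cyclic coordinate descent on synthetic data with one dominant group of four correlated variables. The penalty values, the data, and the solver are illustrative assumptions, not the SPaSM implementation used later in this work:

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator associated with the L1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_loading(X, z, lam=300.0, lam1=100.0, n_iter=200):
    """Sparse loading vector per Eq. (4): minimize
    0.5*||z - X b||^2 + 0.5*lam*||b||_2^2 + lam1*||b||_1
    by cyclic coordinate descent (a didactic solver, not SPaSM)."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = z.copy()                        # residual z - X b
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r + col_sq[j] * b[j]
            b_new = soft(rho, lam1) / (col_sq[j] + lam)
            r += X[:, j] * (b[j] - b_new)
            b[j] = b_new
    return b / max(np.linalg.norm(b), 1e-12)   # normalize like a loading vector

# Synthetic data: variables 0-3 share one latent factor, 4-5 another
rng = np.random.default_rng(0)
t1, t2 = rng.normal(size=(2, 300))
X = np.column_stack([t1, t1, t1, t1, t2, t2]) + 0.1 * rng.normal(size=(300, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
b1 = sparse_loading(X, U[:, 0] * s[0])   # sparse counterpart of the first PC
```

The ridge weight lam supplies the grouping effect that keeps correlated variables loading together, while lam1 drives the remaining loadings to exact zeros.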
While this formulation preserves some advantages of PCA, others are lost. The
explained variance is still maximized sequentially; therefore, a threshold k
can still be defined for SPCA to determine the number of retained (model) PCs. However,
since the regression formulation does not directly impose an orthogonality constraint, the
regressed sparse loadings are not necessarily orthogonal to each other. Consequently, simply
calculating the captured variance per PC would overestimate the explained variance of each
PC. For the original PCs Z, tr(Z T Z) would represent the total explained variance. For
the modified sparse PCs Z 0 with corresponding sparse loadings B, this estimation is too
optimistic. 9 Instead, using QR-decomposition, the modified PCs Z' are decomposed into
Q, an orthonormal matrix, and R, an upper triangular matrix, using methods such as the
Gram-Schmidt orthogonalization. 19
Q, R = qr(XB) (5)
The adjusted variance is calculated as Σ_{j=1}^{l} R_jj^2. The first l loading vectors can then
be used for dimensionality reduction, given a threshold value for the amount of captured
variance. In this work, the SPaSM toolbox in Matlab was used 24 where the SPCA algorithm
as described by Zou et al. 23 is implemented. The toolbox also allows control of the number
of non-zero-loadings (NNZL) per principal component.
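The adjusted-variance computation around Eq. (5) can be sketched as follows; the data and the toy sparse loading matrix B are illustrative assumptions:

```python
import numpy as np

def adjusted_variance(X, B):
    """Adjusted explained variance of the sparse PCs Z' = XB (Eq. 5).
    Since sparse loadings are generally not orthogonal, the variance is
    read off the diagonal of R in a QR decomposition of Z' instead of
    tr(Z'^T Z'), which would double-count shared variance."""
    Q, R = np.linalg.qr(X @ B)          # Z' = QR, Q orthonormal, R upper triangular
    return np.sum(np.diag(R) ** 2)      # sum of R_jj^2 over the sparse PCs

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
B = np.eye(8)[:, :3]                    # toy sparse loadings: 3 PCs, one variable each
naive = np.trace((X @ B).T @ (X @ B))   # optimistic, unadjusted estimate
adjusted = adjusted_variance(X, B)      # never exceeds the naive estimate
```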
Index of Sparseness
A challenge when applying sparse PCA is to determine the NNZL for each PC. Sparse PCA
algorithms involve an inherent trade-off among three different goals. First, the amount of explained
variance should be as high as possible for as few PCs as possible. Since more NNZL can
capture more variance, maximizing the NNZL is critical to achieve this goal. A second goal
is making PCs interpretable. Consistent with the idea of sparse PCA, with increasing NNZL,
more variables load onto the same PC, making the physical explanation more challenging.
A third goal is to keep each PC as uncorrelated as possible. This can be achieved by
decreasing the NNZL; while this is not the only way, it appears to be the simplest. The
Index of Sparseness, introduced by Trendafilov, 11 assembles these three goals into a single
metric:
IS = (V_a · V_s / V_o^2) · (#0 / (n × p)) (6)

where V_a is the adjusted variance of the sparse PCs, V_s their unadjusted variance, V_o the
total variance captured by ordinary PCA, and #0 the number of zero loadings.
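A small NumPy sketch of Eq. (6), with the zero fraction computed over the entries of the loading matrix; the data and loading matrices are illustrative:

```python
import numpy as np

def index_of_sparseness(X, B, V_o):
    """Index of Sparseness, Eq. (6): (V_a * V_s / V_o^2) scaled by the
    fraction of zero loadings. V_a is the QR-adjusted variance of the
    sparse PCs, V_s their unadjusted variance, V_o the total variance of
    ordinary PCA, and #0 the number of zero entries of B."""
    Z_s = X @ B
    _, R = np.linalg.qr(Z_s)
    V_a = np.sum(np.diag(R) ** 2)
    V_s = np.trace(Z_s.T @ Z_s)
    frac_zero = (B.size - np.count_nonzero(B)) / B.size
    return (V_a * V_s / V_o ** 2) * frac_zero

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
V_o = np.trace(X.T @ X)                    # total variance captured by ordinary PCA
is_sparse = index_of_sparseness(X, np.eye(6)[:, :3], V_o)             # many zeros
is_dense = index_of_sparseness(X, np.linalg.svd(X)[2].T[:, :3], V_o)  # no zeros
```

A fully dense loading matrix scores zero, so the metric rewards sparsity only when it does not destroy too much explained variance.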
The steps of the proposed methodology, as previously presented in Figure 1, start with the
SPCA model of the process, which is built on data recorded during normal operation. This
model provides the opportunity to investigate the correlation structure of the process, as the
variables loading on the same PC are correlated and the different PCs are orthogonal to each
other and hence, behave independently and do not share any information. This structural
grouping of the variables provides the opportunity to methodologically ground the SPCA-
based decentralized fault detection approach. In the following step, the same model is
applied to the process data recorded during abnormal operation, providing the opportunity
to compare it to the PCs of the normal operation. In the proposed approach, PCs are
evaluated and displayed individually as they contribute to the Hotelling’s T 2 statistic,
establishing a decentralized fault detection framework. The core question investigated in the
present section is how to build the sparse PCA model of the process and how to analyse the
process structure aiming for fault detection using the resultant PCs.
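A minimal sketch of such decentralized charts, monitoring each PC's contribution to Hotelling's T 2 separately, with empirical 95% limits estimated from Phase I data. The score values are simulated stand-ins, not outputs of the vinyl acetate model:

```python
import numpy as np

def per_pc_t2(scores, variances):
    """Per-PC Hotelling T^2 contributions: T2_j = z_j^2 / d_j.
    Monitoring each sparse PC on its own chart yields the
    decentralized fault detection framework."""
    return scores ** 2 / variances

# Phase I: simulated normal-operation scores for 6 PCs, 240 samples
rng = np.random.default_rng(3)
Z1 = rng.normal(size=(240, 6)) * np.array([3.0, 2.0, 1.5, 1.0, 1.0, 0.5])
d = Z1.var(axis=0)
ucl = np.percentile(per_pc_t2(Z1, d), 95, axis=0)   # one empirical 95% UCL per chart

# Phase II: a new sample whose fault loads almost entirely on PC 1
z_new = np.array([10.0, 0.1, 0.2, 0.1, 0.3, 0.1])
alarms = per_pc_t2(z_new, d) > ucl                  # only chart 1 should fire
```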
Prior to building an SPCA model for fault detection, to serve as a comparison, a PCA
model is also fitted on the data recorded during normal process behaviour. First of all, this
provides the optimal choice of the number of PCs together with the target captured
variance, the same number of PCs being utilized in the case of the SPCA model. Second, in
implementing the SPCA model, since the number of PCs is already fixed, the next step is to
determine the NNZL per PC. The structure of the loadings of the PCA model can already
provide some insights into the complexity of the process and hence into the choice of the NNZL.
To provide more interpretability, the same NNZL is set for all PCs. The determination of
the NNZL lies in a trade-off between competing targets: with a higher NNZL the CPV
increases, but at the same time, more variables load onto the same PC, making the
interpretation of the PCs more difficult. Therefore, the Index of Sparseness is used as the
metric to uniformly set the parameters.
It is noted that to select a sparse PCA algorithm to be applied in facilitating the dis-
covery of the basis of abnormal behaviors, different attributes need to be considered. Some
algorithms (e.g., generalized power method 10 ) lead to high CPV (close to CPV of PCA).
This means that the PCs calculated by such algorithms can explain the highest ratio of
the variance. At the same time, these approaches also lead to a high Index of Sparseness,
suggesting that they should be chosen when only the ratio of explained variance is considered. In
contrast, the SPCA algorithm has the same NNZL per PC for all PCs, which is of significant
advantage taking into consideration our aim of understanding the underlying connections
among the variables. Due to the uniform NNZL distribution, the explained variance of the
different PCs tends to be more uniform as well. Since in this work not only the dimension
of the measured data is to be reduced, but also the PCs are to be interpreted, the
algorithm achieving its sparsity via elastic net regression, i.e., SPCA, is preferred.
In the final step, the correlation structure of the process based on the SPCA model is
to be investigated. Analogous to the methodology followed in Gao et al., 3 as the variables
loading onto the same PC are correlated, underlying connections can be deduced among
the variables of the physical system, as a result of conservation laws and the laws of
thermodynamics. 25 Hence, the sparse PCs and the physical structure of the process are
compared and interpreted. The expectation is that not only the
known relationships can be reaffirmed and verified but novel relationships can also
be discovered that may not be obvious to engineers and operators.
Figure 2: The flowsheet of the simulated vinyl acetate monomer production process
The simulated plant comprises multiple process flows and 23 controller loops aiming to stabilize the process (Figure 2). Seven chemical
components are processed in the plant (ethylene (C2 H4 ), oxygen (O2 ), acetic acid (HAc
or CH3 COOH) as well as vinyl acetate (CH2 CHOCOCH3 ), water (H2 O), carbon dioxide
(CO2 ) and ethane (C2 H6 )). Two exothermic reactions take place. One is the conversion of
ethylene to vinyl acetate
C2H4 + CH3COOH + 1/2 O2 −−→ C4H6O2 + H2O (7)
and the second is the undesired combustion of ethylene:

C2H4 + 3 O2 −−→ 2 CO2 + 2 H2O (8)

The simulation returns 43 measurements of different states within the plant, including
liquid levels, stream pressures, temperatures, flow rates and concentrations. Since the latter
property is often difficult to measure online, it is not considered in this study. Therefore,
the number of measurements is reduced to 23, as listed in the table in the Appendix.
Phase I
For building the PCA model, a 240-minute normal operation was simulated. To increase
variability during normal operation, for a more realistic reflection of a chemical plant operation,
a random value ∆mi following a Gaussian distribution with mean mi and standard deviation
of 1% of mi was added to the simulated values of measurement 5 (reactor exit temperature)
and measurement 23 (HAc tank level) at each sample time. The variables were then set to
zero mean and unit variance, before the application of PCA and SPCA.
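The Phase I data preparation can be sketched as follows; the constant signal levels are hypothetical placeholders, and drawing each sample from a Gaussian with mean m_i and standard deviation 1% of m_i is one plausible reading of the variability injection described above:

```python
import numpy as np

def perturb(values, rng):
    """Draw each sample from a Gaussian with mean m_i and a standard
    deviation of 1% of m_i (hypothetical reading of the Phase I setup)."""
    return rng.normal(values, 0.01 * np.abs(values))

def standardize(X):
    """Zero mean and unit variance per measurement, as required before
    applying PCA or SPCA."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

rng = np.random.default_rng(4)
temp = np.full(240, 432.0)     # hypothetical reactor exit temperature trace (meas. 5)
tank = np.full(240, 2.0)       # hypothetical HAc tank level trace (meas. 23)
X = standardize(np.column_stack([perturb(temp, rng), perturb(tank, rng)]))
```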
Phase II
In Phase II, different abnormal behaviors can be simulated and analyzed. These typically
would include process faults (failures), operational (set-point) changes and external distur-
bances. While several abnormal behaviours, disturbances, set-point changes and faults were
investigated with the presented method, for demonstration and brevity purposes, we focus
on two faults. The chosen faults are different in origin and behaviour and hence, are able to
point out the different connections revealed by the algorithm. The fault modes occur at the
100th minute and last for 5 minutes. The total simulation time is 240 minutes.
The first fault is indicated by the fault ID 6 in the simulation. In this fault, the oxygen
feed to the process is lost. The second fault is the one with the fault ID 8. This malfunction
corresponds to the loss of the distillation column feed.
By the analysis of the sparse principal components and their Hotelling’s T 2 statistic, we
indicate how the methodology provides actionable knowledge and a monitoring tool to the
operators. A violation of the Hotelling’s T 2 control limit indicates a deviation from normal
operation, and the contribution of the sparse principal components narrows down the origin
of the malfunction to the variables loading on the specific component. With the expertise
of the operators, these variables or process segments can be easily checked for malfunctions;
moreover, the connections among the different variables can be readily interpreted.
Prior to building the SPCA model, to serve as a comparison, the PCA model is also fitted for
the Phase I data. The result can be evaluated by plotting the captured variance against the
number of PCs in Figure 3(a); the first 6 PCs explain 85.86% of the total variance. The
resulting loading structure is depicted in Figure 3 c), where the horizontal axis represents the
PCs, and the vertical axis indicates the number of the measured variable (according to
the table in the Appendix). In this graphic, the size of the dots is proportional to the absolute
value of the loadings, while the red and blue colors indicate the negative and positive values,
respectively. Clearly, with such a complex and full loading matrix, making any definitive
statements regarding the underlying variable correlations would be a challenging task.
Some sparse PCA algorithms display a dependency of the individual PCs on the total
number of calculated PCs. 8 Since this dependency makes it difficult to find the right number
of PCs per model, 6 PCs are used as default in the case of the SPCA model. As sparse PCA
models typically capture less variance than the ordinary PCA for the same number of PCs,
the cumulative percentage of explained variance (CPV) is expected to be less than 85.86%
for the SPCA model.
The next step is to determine the NNZL per PC. To enhance interpretability, the same
number of NNZL is used for all PCs. The choice of NNZL lies in a trade-off: with a higher
NNZL, the CPV would increase but the interpretability may be compromised. Therefore,
the Index of Sparseness is used as the metric to guide us in this process.
[Figure 3 panels omitted: (a) explained variance distribution of the PCA model; (b) explained variance distribution of the SPCA model; (c) loading structure for PCA; (d) loading structure for SPCA.]
Figure 3: The comparison of the PCA and SPCA models. Red and green dots indicate
negative and positive loadings, respectively.
Figure 4 shows the Pareto front of the competing targets: the ratio of the adjusted to
total variance, ratio of unadjusted to total variance, and the ratio of NNZL to all loadings.
As can be seen, the global optimum for the Index of Sparseness is 7 nonzero loadings per PC,
with a value of 0.296 for the Index of Sparseness. In the optimal case, the explained variance
is 60.37%, which is, as expected, less than in the case of PCA. The variance explained by
each PC is depicted in Figure 3b. Comparing Figure 3d to Figure 3c, the difference between
the interpretability of the PCA and SPCA models is clear, as the sparse structure greatly
facilitates the analysis of the underlying correlations. Due to the uniform NNZL distribution (Figure 3b),
the explained variance of the different PCs is more uniform as well.
Figure 4: Pareto plot for SPCA
A direct comparison of the Hotelling’s T 2 statistic between the PCA and the SPCA
algorithms (Figure 5) shows that the models have a similar trend in Phase I. The UCL
was set at the 95% confidence limit, thus it is expected that about 5 out of 100 sample points
could violate the UCL. Neither of the models indicates significant violations; therefore, the
assumption that the underlying data represents normal operation is valid.
[Figure 5 panels omitted: (a) Hotelling’s T 2 statistic for the PCA model; (b) Hotelling’s T 2 statistic for the SPCA model; both shown with the 95% UCL over time in minutes.]
Interpretation of the SPCA model
(a) PC 1, PC 2, PC 3
(b) PC 4, PC 5, PC 6
Figure 6: Measurements per PC illustrated at their position in the plant. Colors connect
measurements of the same PC.
As the variables loading onto the same PC are correlated, underlying connections can be
deduced among the dynamical behaviour of the variables loading onto the same PC, as a
result of conservation laws and the laws of thermodynamics present in the physical system.
In the following, some of these will be discussed and an interpretation of their grouping will
be provided. Especially significant are the variables with high loadings, since they exhibit a
strongly correlated behavior.
• The first PC (Figure 6a) is dominated by variables 3 and 10, the vaporizer and the
separator temperatures, since their loadings are by far the highest within the sparse
loading vector. Although these measurements seem to be far from one another in
the flowsheet, their properties are connected, since the gas leaving the separator
unabsorbed is fed back into the vaporizer. Besides these two dominant variables,
other, less dominant variables are also present in the cluster.
• The second PC (Figure 6a) groups the variables around a different part of the plant.
With variable 13 (absorber level) and variables 17 and 18 (organic product flow rate
and level), this PC localizes around the downstream part of the plant. Other variables
that are correlated with these dominating variables also load onto this PC.
• The third PC (Figure 6a) depicts the relationship between variables 1 (the pressure
in the vaporizer) and 12 (the pressure in the absorber). Since both variables are pres-
sures, they possess a natural correlation through the gas dynamics. Furthermore, the
absorber recycling gas feeds into the vaporizer, therefore influencing its pressure. Other
variables that load onto this PC are more scattered. Some other, less trivial correla-
tions can also be seen in this PC, such as measurement 5 (reactor exit temperature).
This is, however, a non-obvious correlation, which can only be discovered through the
present SPCA construction.
• The fourth PC (Figure 6b) is localized on the distillation column with variables 21 &
22 (liquid level and temperature in column) as well as 23 (liquid level in HAc tank,
which is fed by the column bottoms). The aqueous product level (variable 19), which
is partially re-fed into the column, loads onto this PC. As before, three variables 13,
15, 16 also appear to correlate with the distillation column variables, albeit with a less
obvious physical interpretation.
• The fifth PC (Figure 6b) directly captures the relationships associated with the effluent
heat exchanger. Variables 5, 6 and 8 are the hot fluid inlet and outlet temperatures and
its flow rate, respectively, while variable 7 is the outlet temperature of the cooling fluid.
The correlations captured by this PC are directly connected to the heat exchanger, thus
the correlation is as expected.
• The sixth PC (Figure 6b) explains the behavior of the decanter and the absorber. Ap-
parently, they have a strong relation even though the distillation column is in between.
Variables 11, 13, 14 (absorber liquid feed temperature, absorber liquid level and wash
acid fed into the absorber) describe the absorber dynamics. The decanter temperature,
the organic product flow rate leaving the decanter and the organic product level share
their origin in the decanter unit operation (variables 17, 18 and 20).
In this section, we demonstrated that the dependency structure within the vinyl acetate
process is not trivial. While some PCs, such as PC 5, directly reflect variables related to a
unit operation in their loadings, others show less obvious relations distributed around the
plant. A reason for this complex behaviour is the complexity of the process itself. With
several liquid and gas recycling streams, all unit operations are connected in some way.
Next, the knowledge obtained through the PCs will be used in Phase II to better understand
and interpret the trends associated with abnormal behaviors.
For Phase II, Fault 6 was analyzed first. It simulates the loss of the oxygen feed for a period
of 5 minutes. The Hotelling's T 2 statistic was calculated for the SPCA model and, for
comparison, for the PCA model as well. Figure 7 illustrates that both models behave similarly:
Fault 6 is clearly flagged at minute 100, as the T 2 statistic responds
immediately. It can be observed that while the fault itself lasts for only 5 minutes, the UCL
continues to be violated for an additional 50 minutes in both models.
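As a compact illustration of how such a chart is computed, the following Python sketch (our own illustrative code, not the authors' implementation; for simplicity, the UCL is taken as the empirical 95th percentile of the training statistic rather than the usual F-distribution limit) fits a PCA monitoring model on normal operating data and evaluates Hotelling's T 2 on new samples:

```python
import numpy as np

def fit_pca_monitor(X_train, n_components, alpha=0.95):
    """Fit a PCA monitoring model on normal operating data.

    Returns the mean, the retained loadings, the per-PC variances, and an
    empirical upper control limit (UCL) for Hotelling's T^2, taken here as
    the alpha-quantile of the training T^2 values.
    """
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    # Loadings are the right singular vectors of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                              # shape (p, a)
    lam = (s[:n_components] ** 2) / (len(X_train) - 1)   # PC variances
    T2_train = (((Xc @ P) ** 2) / lam).sum(axis=1)
    ucl = np.quantile(T2_train, alpha)
    return mu, P, lam, ucl

def hotelling_t2(X, mu, P, lam):
    """Overall Hotelling's T^2 statistic for new samples."""
    t = (np.asarray(X) - mu) @ P       # scores
    return ((t ** 2) / lam).sum(axis=1)
```

A faulty sample then shows up as T 2 values exceeding the UCL, as in Figure 7.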
Figure 7: Comparison of the T 2 statistic for Fault 6 using the PCA and SPCA models. (a)
Hotelling's T 2 statistic, PCA model; (b) Hotelling's T 2 statistic, SPCA model. Both charts
plot T 2 against time in minutes together with the 95 % UCL.
Next, the contribution of each PC to the Hotelling’s T 2 statistic was calculated, resulting
in 6 individual control charts (Figure 8). From the distinct trends of the control charts,
combined with the process knowledge about correlated variables among the PCs, the occur-
rence of Fault 6 and the associated process response can be better understood. To achieve
this, the analysis of each control chart is necessary. We note that we have
also analysed the residual statistics (Q-statistic) of the models (see Additional Information);
however, as no significantly new information was obtained, that analysis is omitted here.
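The decomposition into individual control charts follows from the fact that the PC scores are uncorrelated on the training data, so the overall T 2 splits into one term per PC. A minimal Python sketch (illustrative function names, not the authors' code; per-chart UCLs are again taken as empirical quantiles):

```python
import numpy as np

def t2_contributions(X, mu, P, lam):
    """Per-PC contributions to Hotelling's T^2.

    Column k of the returned matrix is t_k^2 / lambda_k for each sample,
    so the columns sum to the overall T^2 statistic. Each column is one
    decentralized control chart.
    """
    t = (np.asarray(X) - mu) @ P
    return (t ** 2) / lam

def per_pc_ucl(contrib_train, alpha=0.95):
    """Empirical per-chart UCLs from normal-operation contributions."""
    return np.quantile(contrib_train, alpha, axis=0)
```

Plotting each column against time with its own UCL yields the six charts analyzed below.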
• The characteristic trend of the first PC (Figure 8a) during the fault consists of two
waves. The first wave, an immediate response to the fault, violates the UCL at minute
102 and peaks 4 minutes later; 4 minutes after that, the T 2 value is below the UCL
again. A second wave, peaking at minute 118, comes shortly after, again violating the
UCL. The T 2 statistic is back at a normal level 24 minutes after the first occurrence
of the fault. The first wave is caused by the lack of oxygen in the feed. As stated
before, the first PC is mainly influenced by the temperatures of the vaporizer and the
separator. Due to the lack of oxygen, the reactor outlet flow rate and temperature are
changed, which also alters the temperature of the vaporizer through the temperature
of the heat exchanger and the separator. The control system is crucial to understanding
Figure 8: Contributions of the six sparse PCs to the Hotelling's T 2 statistic for Fault 6
(panels a-f correspond to PC 1 through PC 6; each chart shows T 2 against time in minutes
with its 95 % UCL).
the second wave. A controller measures the oxygen concentration at the reactor
inlet; if the inlet lacks oxygen, more fresh oxygen is fed into the system. After the
5 minutes of faulty operation, the reactor inlet lacks oxygen. The controller overshoots
the amount of oxygen needed, hence too much oxygen is present in the system, which
again causes the change of the vaporizer and separator temperatures, leading to the
second wave of UCL violation.
• The second PC (Figure 8b) behaves similarly to the first in the sense that there are
again two distinct waves. In comparison to PC 1, the waves occur later and are less
intense (maximum values of 45 vs. 19). The delayed response to the fault predicted
in Phase I is indeed observed here. The two unit operations loading onto the second
PC, decanter and absorber, are far downstream. Thus, the fault reaches them later,
leading to delayed fault detection. In addition, between the fault source and the two
unit operations are several other unit operations and controllers, further dampening
the effect of the fault.
• As can be seen in Figure 8c, the corresponding contribution to the Hotelling’s T 2 for
PC 3 looks different from that of PC 1 and PC 2. There are two waves as well; however,
each carries several smaller peaks, and the two peaks within a wave always have a constant
time difference. The UCL is reached immediately (minute 101), which is the fastest
response of all charts. Again, the two waves can be explained by the lack of oxygen and
the response of the controller. The loadings of PC 3 are dominated by the correlation
between the pressure of the vaporizer and the absorber. Due to the lack of oxygen, the
pressure in the absorber is almost immediately out of normal state, leading to an early
alarm. The second peak within the same wave can be explained by the recycling gas
lowering the pressure in the vaporizer as well. The same happens during the second
wave. The time difference between the two peaks is constant as it corresponds to the
recycle time of the process. This PC is especially valuable for this fault since it triggers
the alarm much faster than the rest of the PCs. Its trend furthermore points directly
to a problem in the gas composition, since the pressure changes quickly. For
effective fault diagnosis, it would have been crucial to analyze this PC if the cause
were unknown.
• PC 4 has the most unique response among the 6 PCs (Figure 8d). Until minute 138,
the UCL is not violated, which is the latest response among the PCs regarding this
fault. Also, there is no second wave. The value of the T 2 statistic remains low as well
compared to the other PCs (6.9 at minute 146). The main unit operation associated
with PC 4 is the distillation column, which is not only far downstream of the fault
source, but also insensitive to changes in the oxygen feed. Thus, there is little to
no response to the fault, and monitoring PC 4 is not helpful in this case.
• The measurements associated with the heat exchanger, which load onto PC 5, show
a trend similar to PC 1 and PC 2: the Hotelling's T 2 statistic (Figure 8e) again has
two waves, with characteristic times very similar to those of PC 1. Also, the maximum
T 2 value of 46 at the peak is almost the same. The heat exchanger is close to the
oxygen feed, where the fault occurs; therefore, the reaction to the fault is immediate.
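The constant time difference noted for the twin peaks of PC 3 can be checked numerically from a T 2 trend. The helper below is a minimal, hypothetical peak finder (a simple stand-in for, e.g., scipy.signal.find_peaks, not part of the authors' toolchain); the spacing of the returned peak times can then be compared with the known recycle time of the process:

```python
import numpy as np

def peak_times(t2, ucl, times):
    """Return the times of local maxima of a T^2 trend that exceed the UCL.

    A sample counts as a peak if it is strictly larger than both of its
    neighbours and above the control limit.
    """
    t2 = np.asarray(t2, dtype=float)
    is_peak = (t2[1:-1] > t2[:-2]) & (t2[1:-1] > t2[2:]) & (t2[1:-1] > ucl)
    return np.asarray(times)[1:-1][is_peak]
```

Taking np.diff of the returned times gives the peak spacings, which for PC 3 should be roughly constant and match the recycle time.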
As a second malfunction in Phase II, Fault 8 was simulated, where the column feed is lost
between 100 and 105 minutes. To fit the data, the same procedure as for Fault 6 was applied.
Then, the overall Hotelling’s T 2 for the PCA and the SPCA models were calculated.
Figure 9: Comparison of the T 2 statistic for Fault 8 using the PCA and SPCA models. (a)
Hotelling's T 2 statistic, PCA model; (b) Hotelling's T 2 statistic, SPCA model. Both charts
plot T 2 against time in minutes together with the 95 % UCL.
One can see that the overall plots again look very similar (Figure 9). Both charts
violate the UCL around minutes 102-103, with both trends peaking at minute 108. The maximum
value of the T 2 statistic for the PCA model is slightly smaller than that for the SPCA
model. The individual contributions to the Hotelling's T 2 statistic were calculated for this
fault as well, and are discussed below.
Figure 10: Contributions of the six sparse PCs to the Hotelling's T 2 statistic for Fault 8
(panels a-f correspond to PC 1 through PC 6; each chart shows T 2 against time in minutes
with its 95 % UCL).
• The trend of the first PC (Figure 10a) shows no reaction to the fault at minute 100
(the time of its occurrence). In fact, the trend violates the UCL only a few
times, probably unrelated to the fault. As mentioned before, this PC is dominated by
the temperatures of the vaporizer and separator. Both unit operations are far away
from the distillation column, where the fault is occurring and the loss of column feed
does not influence the temperatures of these units. Therefore, a faulty trend cannot be
observed directly, making this chart uninformative for this fault.
• In contrast, the trend of the Hotelling’s T 2 statistic contribution for the second PC
(Figure 10b) clearly reflects the effect of the fault. The UCL is violated for the first
time at minute 103 for 6 minutes, the duration of the fault itself. In contrast to Fault
6, there is no characteristic second wave in the violation of the UCL. The two main
unit operations described by the second PC are the decanter and the absorber. The
decanter is directly connected to the head of the column. Thus, a loss of column feed
influences the state of the decanter as well.
• As can be seen in Figure 10c, the third PC's trend shows no fault-related alarm. The
UCL is violated a few times after minute 100, but in the same way as it was violated
before the occurrence of the fault. Therefore, the variables loading onto the PC do not
seem to detect the presence of any fault. The third PC is dominated by the pressure
measurements (variables 1 & 12), however, both of these measurements are far away
from the abnormality source.
• The contribution to the Hotelling’s T 2 statistic of the fourth PC (Figure 10d) most
clearly signals the effect of Fault 8. The UCL is first violated at minute 110, and the
statistic stays above it until minute 131. The peak can be described as wide, with a plateau
around a Hotelling’s T 2 statistic value of 9. The loadings of the fourth PC are grouped
around the distillation column, explaining the expected strong reaction to Fault 8. Es-
pecially variables 21 and 22, the distillation column level and temperature, are directly
influenced by the loss of feed. However, this PC, associated with the distillation column,
shows no immediate response to the fault; instead, a 10-minute delay occurs. This can be
explained by the complex and nonlinear nature of the distillation column. Due to its
many stages, the column has both vapor and liquid holdups. A lack of feed therefore is
buffered by this storage first, explaining the delay. Interestingly, this demonstrates
that other PCs can flag the occurrence of the fault before the PC that directly monitors
the variables associated with the affected unit itself.
• The fifth PC (Figure 10e) shows no clear response to the fault either. As an
explanation, the location of the variables is provided: the heat exchanger described
by this PC is located far away from the column and upstream. Only the acetic acid
recycle stream influences the upstream part, explaining the lack of an observed fault.
• PC 6 (Figure 10f) shows an immediate response to the fault, as the first wave occurs
right after the start of Fault 8. The peak of the first wave is followed immediately by
a second and higher wave of UCL violation. The first wave can be described as short:
at minute 110, the UCL is no longer violated. Twenty minutes later, another wave is
observed, which is more sustained and lasts for 5 minutes; in contrast to the first two
waves, its peak is only a little above the UCL. After another twenty minutes, a third,
even smaller wave occurs, which is barely above the UCL threshold. PC 6 is dominated
by the absorber and decanter units, which are directly connected to the distillation
column. The bottom product of the absorber flows to the distillation column, while
the product of the head of the distillation column flows to the decanter. On the other
hand, the bottom product of the distillation column partly flows back to the absorber
as washing acid. The first two waves, which are directly connected, are similar to the
behaviour observed for Fault 6 and, as there, are caused by the dynamics
of the controller. In the decanter, the temperature is a controlled variable. The column
head product is cooled in a condenser, which is manipulated by the controller. Due to
the change in the column feed, the temperature in the decanter drops. The controller
overshoots the set-point by raising the inlet temperature, which produces the effect
of the two directly connected waves. The waves are separated by a constant time
difference, which suggests that they are due to the recycle streams; this time
delay is again consistent with the recycle dynamics of the process.
This fault demonstrates particularly well the advantages of the decentralized process
monitoring strategy. Some of the T 2 statistic trends are insensitive to the fault, while others,
especially the PC concerning the distillation column, suggest the presence of a malfunction.
A correct diagnosis given these charts would be highly likely.
Conclusions
A data-based method to cluster the process variables for decentralized fault detection has
been presented. The method utilizes a sparse PCA model, where the number of nonzero
loadings (NNZL) was chosen based on the Index of Sparseness, a metric combining the different
optimization goals of sparse PCA. Then, the obtained loadings were analysed and connections as well
as dependencies among process variables were articulated. The method is illustrated using
the analysis of the benchmark vinyl acetate process and data generated from its simulation,
from which a sparse PCA model using a robust algorithm was built. It was found that
the arrangement of unit operations can lead to sparse loading clusters, but not exclusively.
More complex relations were found as well, primarily due to complex recycle structures.
Then, the obtained model was applied to the detection and isolation of various process
abnormalities using a decentralized scheme, i.e., instead of monitoring just one multivariate
chart, six individual control charts were built using the sparse PCs. Each control chart
was analyzed and a connection between the knowledge discovered during Phase I and the
behaviour during Phase II was made. It was demonstrated that sparse PCA is not only
useful for knowledge discovery, but also for applying the discovered knowledge to determine
and articulate the root cause of faults.
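To sketch the model-selection step summarized above, the fragment below sparsifies PCA loadings by simple magnitude thresholding (a naive stand-in for the robust sparse PCA algorithm actually used) and scores each candidate NNZL with one common form of the Index of Sparseness, the explained-variance fraction multiplied by the fraction of zero loadings; the paper's exact definitions may differ in detail:

```python
import numpy as np

def sparsify_loadings(P, nnz):
    """Keep only the nnz largest-magnitude entries of each loading vector."""
    Ps = np.zeros_like(P)
    for k in range(P.shape[1]):
        keep = np.argsort(np.abs(P[:, k]))[-nnz:]
        Ps[keep, k] = P[keep, k]
        Ps[:, k] /= np.linalg.norm(Ps[:, k])   # re-normalize to unit length
    return Ps

def index_of_sparseness(Xc, P, Ps):
    """Explained-variance fraction of the sparse model times its zero fraction.

    Xc is the centered data, P the dense PCA loadings, Ps the sparse loadings.
    This is one common form of the Index of Sparseness, used here as a simple
    proxy (it ignores the non-orthogonality of sparse components).
    """
    ev_pca = np.var(Xc @ P, axis=0).sum()
    ev_sparse = np.var(Xc @ Ps, axis=0).sum()
    zero_frac = (Ps == 0.0).mean()
    return (ev_sparse / ev_pca) * zero_frac
```

Evaluating index_of_sparseness over a grid of candidate NNZL values and taking the maximizer then trades variance retention against interpretability, in the spirit of the selection step described above.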
Acknowledgement
This research was partially carried out during the studies abroad of Maximilian Theisen at
the Chemical Engineering Department at UC Davis which was funded by the ISAP program
of the German Academic Exchange Service (DAAD). Gyula Dörgő gratefully acknowledges
the financial support of the Fulbright Scholarship, which made his year at the University of
California, Davis possible and founded a collaboration between the research groups.
References
(1) Jiang, Q.; Huang, B. Distributed monitoring for large-scale processes based on mul-
tivariate statistical analysis and Bayesian method. J. Process Contr. 2016, 46, 75–83.
(2) Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S. N. A review of process
fault detection and diagnosis: Part I: Quantitative model-based methods. Comput.
Chem. Eng. 2003, 27, 293–311.
(3) Gao, H.; Gajjar, S.; Kulahci, M.; Zhu, Q.; Palazoglu, A. Process Knowledge Discovery
Using Sparse Principal Component Analysis. Ind. Eng. Chem. Res. 2016, 55, 12046–
12059.
(4) Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault Detection and Diagnosis in Industrial
Systems; Springer London: London, 2001; pp 35–55.
(5) Raich, A.; Çinar, A. Statistical process monitoring and disturbance diagnosis in multi-
variable continuous processes. AICHE J. 1996, 42, 995–1009.
(6) Cadima, J.; Jolliffe, I. T. Loading and Correlations in the Interpretation of Principle
Components. J. Appl. Stat. 1995, 22, 203–214.
(7) Jolliffe, I. T.; Trendafilov, N. T.; Uddin, M. A Modified Principal Component Technique
Based on the LASSO. J. Comput. Graph. Stat. 2003, 12, 531–547.
(8) Qi, X.; Luo, R.; Zhao, H. Sparse principal component analysis by choice of norm. J.
Multivariate Anal. 2013, 114, 127–160.
(9) Zou, H.; Hastie, T.; Tibshirani, R. Sparse Principal Component Analysis. J. Comput.
Graph. Stat. 2006, 15, 265–286.
(10) Journée, M.; Nesterov, Y.; Richtárik, P.; Sepulchre, R. Generalized Power Method for
Sparse Principal Component Analysis. J. Mach. Learn. Res. 2010, 11, 517–553.
(11) Trendafilov, N. T. From simple structure to sparse components: a review. Comput.
Stat. 2014, 29, 431–454.
(12) d’Aspremont, A.; Ghaoui, L. E.; Jordan, M. I.; Lanckriet, G. R. G. A direct formulation
for sparse PCA using semidefinite programming. CoRR 2004, cs.CE/0406021 .
(13) Moghaddam, B.; Weiss, Y.; Avidan, S. Spectral Bounds for Sparse PCA: Exact and
Greedy Algorithms. Proceedings of the 18th International Conference on Neural Infor-
mation Processing Systems. Cambridge, MA, USA, 2005; p 915–922.
(14) Bin Shams, M. A.; Budman, H. M.; Duever, T. A. Fault Detection, Identification and
Diagnosis Using CUSUM Based PCA. Chem. Eng. Sci. 2011, 66, 4488–4498.
(15) Choi, S. W.; Lee, C.; Lee, J.-M.; Park, J. H.; Lee, I.-B. Fault detection and identification
of nonlinear processes based on kernel PCA. Chemometr. Intell. Lab. 2005, 75, 55–67.
(16) Misra, M.; Yue, H.; Qin, S.; Ling, C. Multivariate process monitoring and fault diagnosis
by multi-scale PCA. Comput. Chem. Eng. 2002, 26, 1281–1293.
(17) Dong, Y.; Qin, S. J. A novel dynamic PCA algorithm for dynamic data modeling and
process monitoring. J. Process Contr. 2018, 67, 1–11, Big Data: Data Science for
Process Control and Operations.
(18) Grbovic, M.; Li, W.; Xu, P.; Usadi, A. K.; Song, L.; Vucetic, S. Decentralized fault
detection and diagnosis via sparse PCA based decomposition and Maximum Entropy
decision fusion. J. Process Contr. 2012, 22, 738–750.
(19) Gajjar, S.; Kulahci, M.; Palazoglu, A. Real-time fault detection and diagnosis using
sparse principal component analysis. J. Process Contr. 2018, 67, 112–128, Big Data:
Data Science for Process Control and Operations.
(20) Gajjar, S.; Kulahci, M.; Palazoglu, A. Least Squares Sparse Principal Component Anal-
ysis and Parallel Coordinates for Real-Time Process Monitoring. Ind. Eng. Chem. Res.
2020, 59, 15656–15670.
(21) Tang, W.; Daoutidis, P. Distributed control and optimization of process system net-
works: A review and perspective. Chinese J. Chem. Eng. 2019, 27, 1461–1473.
(22) Jiang, Q.; Yan, X.; Huang, B. Review and Perspectives of Data-Driven Distributed
Monitoring for Industrial Plant-Wide Processes. Ind. Eng. Chem. Res. 2019, 58, 12899–
12912.
(23) Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. Roy.
Stat. Soc. B 2005, 67, 301–320.
(24) Sjöstrand, K.; Clemmensen, L.; Larsen, R.; Einarsson, G.; Ersbøll, B. SpaSM: A MAT-
LAB Toolbox for Sparse Statistical Modeling. J. Stat. Softw. 2018, 84, 1–37.
(25) Feng, C.-M.; Gao, Y.-L.; Liu, J.-X.; Zheng, C.-H.; Li, S.-J.; Wang, D. A Simple Re-
view of Sparse Principal Components Analysis. Intelligent Computing Theories and
Application. Cham, 2016; pp 374–383.
(26) Chen, R.; Dave, K.; McAvoy, T. J.; Luyben, M. A Nonlinear Dynamic Model of a Vinyl
Acetate Process. Ind. Eng. Chem. Res. 2003, 42, 4478–4487.
(27) Gollmann, D.; Gurikov, P.; Isakov, A.; Krotofil, M.; Larsen, J.; Winnicki, A. Cyber-
Physical Systems Security: Experimental Analysis of a Vinyl Acetate Monomer Plant.
Proceedings of the 1st ACM Workshop on Cyber-Physical System Security. New York,
NY, USA, 2015; p 1–12.
Supporting Information