Jürgen Beyerer
Christian Kühnert
Oliver Niggemann Editors
Machine Learning
for Cyber Physical
Systems
Selected papers from the International
Conference ML4CPS 2018
Technologien für die intelligente Automation
Technologies for Intelligent Automation
Band 9
Oliver Niggemann
inIT - Institut für industrielle
Informationstechnik
Hochschule Ostwestfalen-Lippe
Lemgo, Germany
Springer Vieweg
© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and
reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence and indicate if changes were made.
The images or other third party material in this book are included in the book's Creative Commons licence,
unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative
Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use,
you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions
that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
This Springer Vieweg imprint is published by the registered company Springer-Verlag GmbH, DE, part of Springer Nature.
The registered company address is: Heidelberger Platz 3, 14197 Berlin, Germany
Preface
Cyber Physical Systems are characterized by their ability to adapt and to learn.
They analyze their environment, learn patterns, and they are able to generate
predictions. Typical applications are condition monitoring, predictive mainte-
nance, image processing and diagnosis. Machine Learning is the key technology
for these developments.
The editors would like to thank all contributors, whose work led to a pleasant and
rewarding conference. Additionally, the editors would like to thank all reviewers
for sharing their time and expertise with the authors. It is hoped that these
proceedings will form a valuable addition to the scientific and developmental
knowledge in the research fields of machine learning, information fusion, system
technologies and Industry 4.0.
Making Industrial Analytics work for Factory Automation Applications . . . 116
Markus Koester
Application of Reinforcement Learning in Production Planning and
Control of Cyber Physical Production Systems . . . 123
Andreas Kuhnle, Gisela Lanza
1 Introduction
industries from the aluminum and plastic sectors are to be optimized via a
data-driven methodology.
In this work, we are focusing on a specific use case from the plastic industry.
We use sensor measurements provided by the cyber-physical systems of a real
production line producing coffee capsules and aim to reduce the waste quantity,
i.e., the number of low-quality production cycles, in a data-driven way. To this
end, we model the problem of waste quantity reduction as a two-class classifica-
tion problem and investigate different fundamental machine learning approaches
for detecting and predicting low-quality production cycles. We evaluate the ap-
proaches on a data set from a real production line and compare them in terms
of classification accuracy.
The paper is structured as follows. In Section 2, we describe the production
process and the collected sensor measurements. In Section 3, we present our
classification methodology and discuss the results. In Section 4, we conclude this
paper with an outlook on future work.
parts. Ideally, the defect rate of each cycle tends toward zero with a minimum
waste of raw material. In fact, only cycles with a defect rate below a certain
threshold are acceptable to the manufacturer. In order to elucidate the man-
ufacturing process, we schematically show the parts and periphery of a typical
injection molding machine in Figure 1. As can be seen in the figure, the injection
molding machine comprises different parts, among which the plastification unit
builds the core of the machine, and controllers that allow steering the production
process.
The MONSOON Coffee Capsule and Context data set [2] utilized in this
work comprises information about 250 production cycles of coffee capsules from
a real injection molding machine. It contains 36 real-valued attributes reflecting
the machine’s internal sensor measurements for each cycle. These measurements
include values about the internal states, e.g. temperature and pressure values,
as well as timings about the different phases within each cycle. In addition, we
also take into account quality information for each cycle, i.e., the number of non-
defective coffee capsules, which varies across individual production cycles. If
the number of produced coffee capsules is larger than a predefined threshold,
we label the corresponding cycle with high.quality, otherwise we assign the label
low.quality. The decision about the quality labels was made by domain experts.
Based on this data set, we benchmark different fundamental machine learning
approaches and their capability of classifying low-quality production cycles based
on the aforementioned sensor measurements. The methodology and results are
described in the following section.
– Classification and Regression Trees [9]: A decision tree classifier that hierar-
chically partitions the data.
– Random Forests [3]: A combination of multiple decision trees in order to
avoid over-fitting.
– Support Vector Machines [11]: An approach that aims to separate the classes
by means of a hyperplane. We investigate both linear SVM and SVM with
RBF kernel function.
As can be seen from the table above, all predictive models reach a clas-
sification accuracy of at least 63%, while the highest classification accuracy of
approximately 69% is achieved by the k-Nearest Neighbor classifier. For this clas-
sifier, we utilized the Euclidean distance and set the number of nearest neighbors
k to a value of 7. In fact, the k-Nearest Neighbor classifier is able to predict the
correct quality labels for 172 out of 250 cycles on average.
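A minimal sketch of how such a benchmark could be set up with scikit-learn in Python is given below; it is illustrative only, the feature matrix X (the 36 sensor attributes) and the label vector y are assumed to be available, and all parameter values except k = 7 are placeholders rather than the authors' settings.

# Illustrative benchmark of fundamental classifiers on the coffee-capsule data.
# X: (250, 36) array of sensor attributes, y: quality labels per cycle.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def benchmark(X, y):
    """Compare fundamental classifiers by cross-validated accuracy."""
    models = {
        "kNN (k=7, Euclidean)": KNeighborsClassifier(n_neighbors=7, metric="euclidean"),
        "Decision tree (CART)": DecisionTreeClassifier(),
        "Random forest": RandomForestClassifier(n_estimators=100),
        "Linear SVM": SVC(kernel="linear"),
        "RBF SVM": SVC(kernel="rbf"),
    }
    for name, model in models.items():
        # Standardize the real-valued sensor attributes before classification.
        pipe = make_pipeline(StandardScaler(), model)
        scores = cross_val_score(pipe, X, y, cv=10, scoring="accuracy")
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")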
It is worth noting that even this rather low classification accuracy (69%) might
have a high impact on the real production process: in our particular domain,
hundreds of coffee capsules are produced every minute, so that even a small
enhancement in waste quantity reduction will lead to a major reduction in
production costs. In addition, we have shown that the performance of
the k-Nearest Neighbor classifier can be improved to 72% when enriching
the sensor measurements with additional process parameters [2].
To conclude, the empirical results reported above indicate that even a simple
machine learning approach such as the k-Nearest Neighbor classifier is able to
predict low-quality production cycles and thus to enhance the waste quantity
reduction. Although the provided sensor measurements are of limited extent
regarding the number of measurements, we believe that our investigations will be
helpful for further data-driven approaches in the scope of the project MONSOON
and beyond.
5 Acknowledgements
This project has received funding from the European Union's Horizon 2020 re-
search and innovation programme under grant agreement No 723650 – MON-
SOON. This paper reflects only the authors' views and the Commission is not
responsible for any use that may be made of the information it contains. It is
based on a previous paper [2].
References
1. Beecks, C.: Distance based similarity models for content based multimedia re-
trieval. Ph.D. thesis, RWTH Aachen University (2013)
2. Beecks, C., Devasya, S., Schlutter, R.: Data mining and industrial internet of
things: An example for sensor-enabled production process optimization from the
plastic industry. In: International Conference on Industrial Internet of Things and
Smart Manufacturing (2018)
3. Breiman, L.: Random forests. Machine learning 45(1), 5–32 (2001)
4. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE transactions on
information theory 13(1), 21–27 (1967)
5. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier
under zero-one loss. Machine Learning 29(2), 103–130 (1997)
6. Hetland, M.L., Skopal, T., Lokoč, J., Beecks, C.: Ptolemaic access methods: Chal-
lenging the reign of the metric space model. Information Systems 38(7), 989–1006
(2013)
7. Kuhn, M.: Building predictive models in R using the caret package. Journal of
Statistical Software, Articles 28(5), 1–26 (2008)
Deduction of time-dependent machine tool characteristics
by fuzzy-clustering
Abstract. With the onset of ICT and big data capabilities, physical assets and
data computation are integrated in manufacturing through Cyber Physical Sys-
tems (CPS). This strategy, also denoted as Industry 4.0, will improve any kind of
monitoring for maintenance and production planning purposes. So-called big-
data approaches try to use the extensive amounts of diffuse and distributed data
in production systems for monitoring based on artificial neural networks
(ANN). These machine learning approaches are robust and accurate if the data
base for a given process is sufficient and the scope of the target functions is cur-
tailed. However, a considerable proportion of high-performance manufacturing
is characterized by permanently changing process, workpiece and machine con-
figuration conditions; e.g., machining of large workpieces is often performed in
batch sizes of one or of a few parts. Therefore, it is not possible to implement
robust condition monitoring based on ANN without structured data analyses
considering different machine states – e.g. a certain machining operation for a
certain machine configuration. Fuzzy-clustering of machine states over time
creates a stable pool representing different typical machine configuration clus-
ters. The time-dependent adjustment and automated creation of clusters enables
monitoring and interpretation of machine tool characteristics independently
of single machine states and pre-defined processes.
1 Introduction
This makes it difficult to correlate any kind of measuring data with the health
state of the machine and its components. Measures to address these challenges are:
1. Definition of Machine States (MSs) based on trigger parameters (TPs) (Table 1).
2. Deduction and comparison of Characteristic Values (CVs) is only carried out
a. for the same machine state
b. gradually for a cluster resulting from the fuzzy-clustering (see 5 below)
3. Deduction of dynamic limits for the CVs over time
4. Fuzzy-based interpretation of the current CV values regarding their expectation
values (see section 5, Fig. 5)
5. Fuzzy-clustering of MSs to create a stable pool including a broad range of charac-
teristic configurations of the machine tool
Table 1. Normalized data of MSs using the relative normalization of TP, overall cycle.

TP                                    MS 1    2    3    4    5    6    7    8    9
1.1 Automatic mode                       1    1    1    1    1    1    1    1    1
3.1 x-pos.                               1    1    1    1    1    1    1    1    1
4.1 y-pos.                               0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5
4.2 y-SRV Δ                              0    0    0    0    0    0    0    0    0
5.1 z-pos.                               1    0.5  0    0.86 0.41 0.14 0.05 0.55 0.95
6.1 Jerk                                 1    1    1    1    1    1    1    1    1
7.1 Acceleration                         1    1    1    0.5  0.5  0    0.75 0.75 0.75
8.1 Feed rapid traverse                  1    1    1    0    0.67 0.83 0.67 1    1
9.1 Temperature of y2 ball-screw nut     0    0.40 0.66 0.81 0.96 0.91 1    0.71 0.70
TPs can vary over a broad range, e.g. the current position of an axis or the feed. A
combination that does not occur in practice – e.g. a stroke between 0 and 1 mm for a
given axis – is not detectable and therefore does not increase the complexity. How-
ever, an axis stroke of 1000 mm could, in principle, be divided into any integer number
of sections. Thus it is still necessary to have an upfront definition of TP
ranges. A practical solution for dynamic TPs like the jerk, the acceleration or the feed
consists in defining altering constraints to divide a MS into sub-phases.
A MS is not a singular event but a process which is characterized by a given
timespan. Real-life processes of machine tools are continuous and can be fragmented
into several sub-phases by various measures. An example would be a boring operation
with a specific tool. Another one could be the stroke of a single axis as depicted in
Table 2 and Fig. 2.
The definition of an overall process is complex and may vary depending on the de-
sired application or monitoring object. This process would be the highest level of a
MS, as depicted in Table 1. The y-axis executes a stroke from 300 mm up to 2400 mm
and back, therefore representing a complete cycle. This overall stroke can consequently
be divided into several sub-phases which can be treated as discrete MSs. These "sub-
MSs" can be identified based on the changes of the dynamic parameters as de-
scribed in Table 2. To distinguish them from each other, every sub-MS is described by
numerical values depending on the level of the dynamic parameter (Table 2, left).
Alternative identifications are also conceivable. However, the introduced description
based on levels links physical parameters directly to the sub-MSs.
If the lowest possible level is defined by the direction of the jerk, a maximum of 50
sub-phases can be identified based on path dynamics. For demonstration purposes, we
divide the overall stroke into 12 sub-phases based on the identification levels 1-3 of
Table 2, as depicted in Fig. 2. In practice, other TPs like the dynamic path of a second
axis as well as process parameters could also vary in parallel.
The fuzzy clustering of MSs, as presented in [8], can be exercised without any consid-
eration of possible correlations between TPs and CVs. This is possible for a limited
number of pre-defined MSs based on practical considerations about components of
interest and – heuristically anticipated – correlations between CVs and TPs. If a broad
range of TPs is combined with a variable resolution of TP sections as well as time
spans, the clustering of all combinations – for every CV – becomes impractical and statis-
tically challenging, and the information content decays. Therefore a reduction to sig-
nificant MSs and the TPs for these states is necessary. This task can be addressed by the
usage of an artificial neural network (ANN), but the robustness and accuracy of such a
network depend heavily on the quantity of training data. This means that every relevant MS
has to occur several times before the ANN can play to its strengths. This is not a giv-
en in non-serial machine tool applications as described in section 1.
For this purpose, regression analysis between the TPs and the CVs can be em-
ployed, as suggested in this paper. Based on the introduced cycle, a regression analysis
was carried out. The input variables (TPs) and the responses (CVs) used in the regres-
sion analysis are shown in Table 8. This includes all varying parameters of the MS.
The considered MS regression analysis does not aim at a quantification of the regres-
sion function between the input variables and the responses; rather, it should statistically
validate the significance of the input variables (for more detail see [9]). Thus, a linear
function without any interactions is chosen for the regression analysis.
The included MSs are 10 sub-phases of Fig. 2 for every TP combination of Table
1. Sub-phases 113 and 213 (Fig. 2) are not considered due to their corrupted meas-
urement data. It should be noted that the TPs 4.1 and 4.2 vary in accordance with the sub-
phases. Therefore, 90 different – but related – MSs are taken into account.
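The following is a minimal sketch of such a significance screening with statsmodels (an assumed implementation, not the Cornerstone® analysis used by the authors); the data frame and column names are hypothetical placeholders. The purely linear formula without interaction terms mirrors the model choice described above.

# Sketch: fit a linear model without interaction terms and inspect the p-values
# of the trigger parameters (TPs) for one characteristic value (CV).
import pandas as pd
import statsmodels.formula.api as smf

def screen_significance(df: pd.DataFrame, cv_column: str, tp_columns: list,
                        alpha: float = 0.05) -> pd.Series:
    """Return the p-value of each TP term for the given CV response."""
    formula = f"{cv_column} ~ " + " + ".join(tp_columns)   # purely linear, no interactions
    model = smf.ols(formula, data=df).fit()
    pvalues = model.pvalues.drop("Intercept")
    relevant = pvalues[pvalues < alpha]
    print(f"adjusted R^2 = {model.rsquared_adj:.3f}")
    print("significant TPs:", list(relevant.index))
    return pvalues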
4 Practical example
The test cycle of Fig. 2 was derived for the 9 MSs in Table 1 (Fig. 4). 51 cycles were
successively executed for each MS, resulting in an overall time of 2550 s. Every cycle
includes all sub-phases ("sub-MSs") of Fig. 2.
Fig. 4. UNION PCR130 machine; y- and z-axis used for the test cycles
Based on these cycles, a linear regression analysis was carried out for the sub-phases
using the commercial software Cornerstone®. The aim of the regression analysis is
not to derive a quantitative model for predicting the CVs based on the TPs;
the data available is not sufficient for such a purpose, and the regression model is only
linear and not representative for the overall range of the TPs and the CVs. However,
the regression analysis deduces significance terms for every input parameter (= TP),
therefore distinguishing the relevant TPs for a given CV (responses in Table 4) from
the irrelevant ones. Furthermore, when comparing the significance terms of the TPs
with the adjusted R-Square value of the correlation analysis, we obtain an assessment
Table 4. Correlation analysis results for the sub-phases of MS 1-9 and both CVs. For every
sub-phase, the table lists – for the two responses (= CVs) Peff and fmax – the significance
values of the regression terms constant, z-position, temperature, feed f and acceleration a,
followed by the R-Square, the adjusted R-Square and the RMS error of the fit; each term is
marked as significant, semi-significant or non-significant.
Several important conclusions can be drawn from the results of the correlation
analysis and the subsequent survey of the covariance matrix of the significant TPs:
– The most promising sub-phases with the best correlations are the dynamic phases
in the middle of the axis stroke; the auto-definition detects these sub-phase MSs.
– The effective vibration level is clearly correlated to the temperature of the nut.
– The ball pass frequency of the ball-screw nut outer ring is clearly correlated to the
feed (the frequency can be calculated based on geometric parameters).
– The quality of regression for the effective vibration level (Peff) is significant in
more sub-phases and therefore more generally usable than the ball pass frequency
of the ball-screw nut of Y2 (fmax).
Therefore the auto-detection mechanism would choose sub-phases 112 and 212 as
most relevant for monitoring. With regard to the CVs, the temperature remains the only
relevant TP for the effective vibration level, while the feed remains the only relevant
TP for the outer ring frequency of the ball-screw nut.
The clustering was conducted solely on the basis of the two relevant TPs for each of the
two CVs as described in section 4. The algorithm is described in detail in [8], based on
[9]. Every MS is gradually attributed to the cluster centres. The relevant TPs 8.1 and
9.1 do not vary in accordance with the sub-phases, so the clustering solely depends on
the (average) TP of the 9 MSs. We obtain cluster centres at 0.71/0.99/0.00 for TP 8.1
(feed rapid traverse) and at 0.09/0.92/0.64 for TP 9.1 (temperature of the y2 ball-screw
nut). Table 5 depicts the TP values for each MS and their affiliation rates; a minimal
illustrative clustering sketch is given after the table.
Table 5. Normalized TP and affiliation rates per cluster for all MS; optimization cycles nopt =
100; fuzzifier w = 1.5.

Machine state                               1     2     3     4     5     6     7     8     9
Relevant TPs
8.1 Feed rapid traverse (for CV2)           1     1     1     0     0.67  0.83  0.67  1     1
9.1 Temperature of y2 ball-screw nut (CV1)  0     0.40  0.66  0.81  0.96  0.91  1     0.71  0.70
Affiliation rates per cluster
Cluster 1, TP 8.1                           1     0     0     0     1     0.732 1     0     0
Cluster 1, TP 9.1                           1     0.273 0     0     0     0     0     0     0
Cluster 2, TP 8.1                           0     1     1     0     0     0.268 0     1     1
Cluster 2, TP 9.1                           0     0.034 0     0.857 1     1     0.997 0.013 0.06
Cluster 3, TP 8.1                           0     0     0     1     0     0.000 0     0     0
Cluster 3, TP 9.1                           0     0.693 1     0.143 0     0     0.003 0.987 0.994
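A minimal fuzzy c-means sketch for a single normalized TP is given below, assuming the standard fuzzy c-means update rules with fuzzifier w = 1.5 and 100 optimization cycles as quoted above; it is an illustrative re-implementation in Python, not the algorithm of [8]/[9] itself.

# Fuzzy c-means on one normalized trigger parameter (one value per MS).
import numpy as np

def fuzzy_c_means(x, n_clusters=3, w=1.5, n_iter=100, seed=0):
    """Return cluster centres and the affiliation (membership) matrix."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(-1, 1)          # one TP value per MS
    u = rng.random((len(x), n_clusters))
    u /= u.sum(axis=1, keepdims=True)                      # memberships sum to 1
    for _ in range(n_iter):
        um = u ** w
        centres = (um.T @ x) / um.sum(axis=0)[:, None]     # membership-weighted means
        d = np.abs(x - centres.T) + 1e-12                  # distances MS -> centres
        u = 1.0 / (d ** (2 / (w - 1)))                     # inverse-distance memberships
        u /= u.sum(axis=1, keepdims=True)
    return centres.ravel(), u

# Example: TP 9.1 (temperature of the y2 ball-screw nut) for the nine MSs.
tp_91 = [0, 0.40, 0.66, 0.81, 0.96, 0.91, 1, 0.71, 0.70]
centres, affiliation = fuzzy_c_means(tp_91)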
Fig. 5. Cluster-CV progress including fuzzification; CV1: Peff of the ball-screw nut of the Y2 axis
6 Conclusion
quality stability – becomes possible when combined with structural information and
a process evaluation regarding their cluster attribution.
Further research is necessary regarding different clustering approaches as well as more
complex regression model approaches (e.g. quadratic). Furthermore, the deduction of
complex characteristic values for entire structural components, using several CVs
based on different algorithms, will be investigated.
Acknowledgements
The research presented in this paper is funded by the European Union (European
Social Fund) and by the Free State of Saxony. The authors would like to thank the
funders.
References
1. Lee, J.; Bagheri, B.; Kao, H.-A.: "A Cyber-Physical Systems architecture for Industry 4.0-
based manufacturing systems", Manufacturing Letters, pp. 18–23, 2015.
2. Lu, Y.: "Industry 4.0: A survey on technologies, applications and open research issues",
Journal of Industrial Information Integration, vol. 6, pp. 1–10, 2017.
3. Gausemeier, J.; Klocke, F.: Industrie 4.0 – International Benchmark, Options for the Fu-
ture and Recommendations for Manufacturing Research, Paderborn, 2016.
4. Farinha, J. T.; Fonseca, I.; Oliveira, R.; Raposo, H.: "CMMS – An integrated view from
maintenance management to on-line condition monitoring", in Proceedings of the
Maintenance Performance Measurement and Management (MPMM) Conference,
Coimbra, Portugal, 2014.
5. Teti, R.; Jemielniak, K.; O'Donnell, G.; Dornfeld, D.: "Advanced monitoring of
machining operations", CIRP Annals – Manufacturing Technology, vol. 59, pp. 717–739,
2010.
6. Derigent, W.; Thomas, E.; Levrat, E.; Iung, B.: "Opportunistic maintenance based on
fuzzy modelling of component proximity", CIRP Annals – Manufacturing Technology,
vol. 58, pp. 29–32, 2009.
7. Putz, M.; Frieß, U.; Wabner, M.; Friedrich, A.; Zander, A.; Schlegel, H.: "State-based and
self-adapting algorithm for condition monitoring", in 10th CIRP Conference on
Intelligent Computation in Manufacturing Engineering – CIRP ICME '16, Ischia, Naples,
Italy, 20–22 July 2016.
8. Frieß, U.; Kolouch, M.; Putz, M.; Friedrich, A.; Zander, A.: "Fuzzy-clustering of machine
states for condition monitoring", CIRP Journal of Manufacturing Science and Technology,
Vol. XX, xxx-xxx, 2018.
9. Kruse, R.; Borgelt, C.; Braune, C.; Klawonn, F.; Moewes, C.; Steinbrecher, M.:
Computational Intelligence – Eine methodische Einführung in Künstliche Neuronale
Netze, Evolutionäre Algorithmen, Fuzzy-Systeme und Bayes-Netze, 2nd ed., Springer
Vieweg, Wiesbaden, 2015.
Unsupervised Anomaly Detection
in Production Lines
1 Introduction
In the last couple of years, the importance of cyber-physical systems for optimizing
industrial processes has led to a significant increase of sensorized pro-
duction environments. Data collected in this context allows for new intelligent
solutions, e.g. to support decision processes or to enable predictive maintenance.
One problem related to the latter case is the detection of anomalies in the behav-
ior of machines without any kind of predefined ground truth. This task is further
complicated if a reconfiguration of machine parameters is done on-the-fly, due
to varying requirements of multiple items processed by the same production
line. As a consequence, a change of adjustable parameters in most cases directly
leads to divergent measurements, even though those observations should not be
regarded as anomalies.
In the scope of the EU project COMPOSITION (under grant no. 723145), the
task of detecting anomalies for predictive maintenance within historical sensor
data from a real reflow oven was investigated. While the oven is used for soldering
surface-mount electronic components to printed circuit boards based on contin-
uously changing recipes, one related problem was the unsupervised recognition
2 Related Work
While the topic of anomaly detection and feature extraction is covered by a broad
amount of literature, in the following we will focus on a selection of approaches
that led to the here presented algorithm. Recently, the automatic description
of time series, in order to understand the behavior of data or to perform sub-
sequent operations has drawn the attention of many researchers. One idea in
this regard is the exploitation of Gaussian processes [3, 5] or related structural
compositions [4]. Here, a time series is analyzed using a semantically intuitive
grammar consisting of a kernel alphabet. Although corresponding evaluations
show impressive results, they are rather applicable to smaller or medium sized
historical data, since the training of models is comparatively time consuming.
In contrast, other approaches exist, which focus on the extraction of well-known
statistical features, further optimized by means of an additional feature selec-
tion in a prior stage [2]. However, the selection of features is evaluated based on
already provided knowledge and thus not applicable in unsupervised use-cases.
A last approach discussed here, uses the idea of segmented self-similarity joins
based on raw data [7]. In order to decrease the complexity, segments of a time
series are compared against each other in the frequency domain. Even though
this idea provides an efficient foundation for many consecutive application sce-
narios, it lacks the semantic expressiveness of descriptive features as it is the
case for the already mentioned methods.
In the upcoming chapter we consequently try to deal with those challenges, while
presenting our approach for unsupervised anomaly detection.
3 Approach
(3.4). GADPL is also summarized in Algorithm 1 at the end of this chap-
ter.
3.2 Segmentation
the transition part of all segments, which have been created due to configu-
ration changes, gets truncated. If segments become smaller than a predefined
threshold, they can be ignored in the upcoming phases.
Algorithm 1 GADPL
Require: Time series T, machine parameters M, configuration transition time p,
         segment length (lmin, lmax), number of nearest neighbors k,
         dissimilarity threshold Δmax
C = cluster_configurations(T, M)
R = {R1, .., R|C|}
for all configuration segments Ci in C do
  for all segments sj in Ci do
    sj = truncate_transitions(sj, p)
    if |sj| < lmin then
      Ci = Ci \ sj
    else if |sj| > lmax then
      sj = split_segments(sj, lmax)
      Ci = Ci ∪ sj
      Ci = Ci \ sj
    end if
    Ri = Ri ∪ extract_features(sj)
  end for
end for
for all configuration representatives Ri in R do
  for all representatives rj in Ri do
    NNk = query_index(rj, k)
    if Δ(rj, NNk) > Δmax then
      emit_anomaly(i, j)
    end if
  end for
end for
anomalous behavior.
As one potential solution, GADPL instead uses the mean over a specified number
of nearest neighbors, depicting the most similar behavior with respect to each seg-
ment. The idea is that even though there might be multiple distinct characteristics
in the data, at least a predefined number of elements represent the same be-
havior compared to the processed item. Otherwise, this item will have a high
average dissimilarity even with respect to the most similar observations and can
therefore be classified as an anomaly.
Let $r_i$ be the representative vector of the $i$-th segment obtained by feature ex-
traction and let $NN_k(r_i)$ be the corresponding set of $k$ nearest neighbors. The dis-
similarity measure $\Delta$ for $r_i$ is defined as
$$\Delta\bigl(r_i, NN_k(r_i)\bigr) = \frac{1}{k} \sum_{j=1}^{k} \delta\bigl(r_i, NN_k^j(r_i)\bigr)$$
where $NN_k^j(r_i)$ corresponds to the $j$-th nearest neighbor and $\delta$ to a ground dis-
tance defined on $\mathbb{R}^n$.
Here, for the vectorized feature representations, any suitable distance function
$\delta$ is applicable. In the context of COMPOSITION we decided to use the Eu-
clidean distance with a uniform distribution of weights, applied to normalized
feature representations.
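A minimal sketch of this scoring step with scikit-learn is given below; the feature matrix, the choice of k and the threshold Δmax are assumptions for illustration, not the COMPOSITION implementation.

# Mean Euclidean distance to the k nearest neighbors as a dissimilarity score,
# computed on min-max normalized segment representatives.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler

def knn_dissimilarity(features, k=5, delta_max=None):
    """features: (n_segments, n_features) array of segment representatives."""
    R = MinMaxScaler().fit_transform(features)          # normalized representations
    nn = NearestNeighbors(n_neighbors=k + 1).fit(R)     # +1: each point is its own neighbor
    dist, _ = nn.kneighbors(R)
    delta = dist[:, 1:].mean(axis=1)                    # mean distance to the k nearest neighbors
    anomalies = None if delta_max is None else np.flatnonzero(delta > delta_max)
    return delta, anomalies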
Fig. 1. Application of GADPL: The upper part shows the segmentation of time anno-
tated power consumption data in percent. The lower part illustrates the result of the
dissimilarity measurement, where the red rectangle indicates classified anomalies.
4 Evaluation
In this section we discuss the evaluation performed on a historical data
set provided in the scope of COMPOSITION. While in the future the algorithm
should be applied to continuously streamed sensor data, the initial evaluation
was performed on recorded data, captured over a period of seven years. The
data consists of machine parameters (already classified by recipe names) and
time-annotated sensor measurements, including temperature values and power
consumption, based on a sampling rate of 5 minutes. In addition, a separate
maintenance log covers the dates of previous fan exchanges. However, malfunc-
tions only occurred twice during runtime and are therefore comparatively
rare. A confirmation of results by actual defect components is consequently
restricted to some extent. Since this project and the approach presented here
are regarded as ongoing work, the outlined evaluation is continued likewise.
Figure 1 illustrates the application of GADPL, including segmentation (upper
part) and dissimilarity measurement (lower part), for the time around one fan
failure. Here, differently colored circles represent slices of the time series after
segmentation, describing the percentage power consumption of a fan. Using the
features mentioned in section 3.3, we intended to perceive deviating values and
untypical fluctuations within the data, without being sensitive to outliers aris-
ing from single incorrect sensor measurements. Having one of the recorded fan
exchanges at the end of February 2012, the result of the algorithm clearly shows
significantly higher values for the dissimilarity (red rectangle) prior to the event.
Even though increased dissimilarity values at the end of May 2011 and around
September 2011 can be explained by analyzing the original data, there
were no recordings of a defect component at those times. However, this does not
automatically imply incorrect indications, since defect machine parts are not the
only reason for anomalous characteristics in the data. An appropriate choice
of the maximal dissimilarity value, defining the anomaly threshold, can
therefore highly influence the accuracy.
Both cases of defect fan behavior were clearly captured by the algorithm and
emphasized by a high dissimilarity.
5 Conclusion
6 Acknowledgements
This project has received funding from the European Union's Horizon 2020 re-
search and innovation programme under grant agreement No 723145 – COMPO-
SITION. This paper reflects only the authors' views and the Commission is not
responsible for any use that may be made of the information it contains.
References
1. Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The
R*-tree: an efficient and robust access method for points and rectangles. In ACM
SIGMOD Record, volume 19, pages 322–331. ACM, 1990.
2. Maximilian Christ, Andreas W. Kempa-Liehr, and Michael Feindt. Distributed and
parallel time series feature extraction for industrial big data applications. CoRR,
abs/1610.07717, 2016.
3. David Duvenaud, James R. Lloyd, Roger Grosse, Josh B. Tenenbaum, and Zoubin
Ghahramani. Structure discovery in nonparametric regression through composi-
tional kernel search. In Sanjoy Dasgupta and David McAllester, editors, ICML
2013: Proceedings of the 30th International Conference on Machine Learning, vol-
ume 28 of JMLR Proceedings, pages 1166–1174. JMLR.org, June 2013.
4. Roger Grosse, Ruslan Salakhutdinov, William T. Freeman, and Joshua B. Tenen-
baum. Exploiting compositionality to explore a large space of model structures. In
Nando de Freitas and Kevin Murphy, editors, Proceedings of the 28th Conference in
Uncertainty in Artificial Intelligence, Corvallis, Oregon, USA, 2012. AUAI Press.
5. James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, and
Zoubin Ghahramani. Automatic construction and natural-language description of
nonparametric regression models. CoRR, abs/1402.4304, April 2014.
6. Svante Wold, Kim Esbensen, and Paul Geladi. Principal component analysis.
Chemometrics and intelligent laboratory systems, 2(1-3):37–52, 1987.
7. Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding,
Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, and Eamonn Keogh. Matrix
profile i: all pairs similarity joins for time series: a unifying view that includes motifs,
discords and shapelets. In Data Mining (ICDM), 2016 IEEE 16th International
Conference on, pages 1317–1322. IEEE, 2016.
A Random Forest Based Classifier for Error
Prediction of Highly Individualized Products
Gerd Gröner
1 Introduction
In a manufacturing process with highly individual products like ophthalmic
lenses, which are produced according to personalized prescriptions, it is difficult
to identify in advance those orders that are likely to fail within the production
process. These products might fail due to their difficult and diverse parameter
combinations. The parameters cover raw material characteristics, lens design,
geometry and manufacturing parameters (i.e., machine setting values). Even
such individual, prescribed products are not excluded from hard market compe-
tition. Accordingly, avoiding waste of material and working time is an emerging
problem. Obviously, since such customer-specific, individual products are not
interchangeable or replaceable by other products (as in the case of on-stock prod-
ucts), it is highly valuable to avoid any kind of scrap or failure already before
production. Summing up, it is becoming more and more useful to analyze
product (order) parameters and to find features and feature correlations in order
to predict (potential) failures already prior to the start of any manufacturing
process.
In our case, we are confronted with a rather hard problem since the products
cannot be perfectly discriminated into good or bad ones solely based on their
product characteristics (which in our case are given by individual prescription and
design) and their corresponding target processing parameters. Therefore, it
is a challenging machine learning (ML) task to remedy this problem with an
advance distinction between good and potentially faulty products while, at the
same time, avoiding ML pitfalls like over-fitting. Furthermore, the sheer number
of features is high and the data set is quite imbalanced, hampering the straight-
forward exploitation of ML models.
Until now, ML has been used for error detection in different manufacturing areas
(e.g., [1–3]), but due to the domain-specific data (highly individualized) and
fully-automated and very standardized manufacturing processes, the gap be-
tween different parameter combinations and the resulting processing steps is an
open challenge for applying ML technologies and assessing their benefits accord-
ingly.
We present a random forest classifier for error prediction that resulted from
a deep analysis of different ML algorithms, which have been used to train various
models. These models are evaluated in terms of their classification quality. The
best model is presented in detail. Interestingly, doubts (like the difficult distinction)
and findings (like important features) of the domain experts from the manufac-
turing division were confirmed by the model. Finally, we argue why the random
forest model outperforms other (rather complex) models like neural networks and
Support Vector Machines (SVM) within this particular use case.
2 Background
a variety of algorithms (cf. [4–6]), ranging from rather basic ones like regression
and Naive Bayes to algorithms that are more difficult in terms of set-up and compu-
tation, like artificial neural networks (ANN), support vector machines (SVM),
decision trees and extensions of them like random forest classifiers (RFCs) and
boosted decision trees. Boosted decision trees and random forests belong to the
so-called ensemble algorithms, i.e., a set of trees or a forest is built by an en-
semble of decision trees. Ensemble algorithms implement methods to generate
multiple classifiers and then aggregate their results (cf. [16]). Boosted decision
tree algorithms apply a strategy of stage-wise optimization of trees (measured
in terms of loss functions) [14, 15]. Each tree in the ensemble of a random forest is
obtained by randomly selecting the input features. Within each tree, each node
is still split by using the best feature (measured in terms of cost functions). The
final result of the forest is obtained by unit votes from the trees for the most
popular class.
Although the data set is obtained from a rather dedicated domain, following a production
process for highly individualized products, there are some essential key charac-
teristics that are comparable and transferable to different problems in completely
different domains. Therefore, we have to tackle challenges to cope with the following
data and application characteristics.
The data set is highly imbalanced, which is actually in the nature of error and
non-error classification problems. As already mentioned, roughly 5.4 % of the data
samples belong to the minority class (error case), while the remaining 94.6 %
belong to the majority class (non-error case). It is well known that the best
classification results can be achieved on balanced data sets (cf. [11–13]). Furthermore,
in our case, we are not only interested in the correct classification, we also want to
know which are the most influential features for ending up in one of these two classes.
Thus, a sound prediction model that is able to do a proper classification (i.e., a
non-guessing solution!) is needed.
A further property is the complexity of the model. The sheer number of sam-
ples (roughly 560,000 entries in the data set) is a decent size, but the number of
features (about 130) is rather high in comparison. In particular, not only the number
itself is an issue; it is rather the feature characteristic that counts for complexity,
as we will see later. There are no dominating single features and the number of
influential features is high, ending up with models that need a deep consideration
of feature manifestations and combinations, as demonstrated in the next section.
Finally, the third characteristic is the vague discriminability, which is the
most difficult one to handle in our case. Given all the features of a particularly
ordered product of an error case, the manufacturing process failed at the first
attempt, while a second run with quite similar or even the same features (in-
features. Moreover, the discrimination between error and non-error (if possible
at all) requires the comprehensive consideration of various features and their
relations, which has been outlined in our comparison. For instance, less complex
algorithms like Naive Bayes and regression are not able to do a decent classifica-
tion. Algorithms known to be complex and partially hard to initialize, like support
vector machines (SVM) and artificial neural networks (ANN), are able to make
proper binary classifications, but with a low F1 score. Tree-based algorithms out-
perform all others. The best results are obtained by boosted trees and, slightly
better, by random forest classifiers.
Table 1 shows an excerpt of the algorithm comparison. The first column de-
scribes the algorithm used to train the model. Column two gives the setting pa-
rameters of the algorithm. If no parameter is given, the default values (from
scikit-learn) are taken. The presented setting parameters are those which ended up
in the best results, mainly obtained by several trials and applying cross-validation
strategies (we used a 5-fold cross-validation on the training data set).
The third column describes the performance in terms of precision, followed by
the recall in column four, the summarized F1 score in column five, and concluded
by the ROC-AUC value (area under the ROC curve). All models were trained
with these algorithms from the scikit-learn package in Python.
For the random forest classifier (RFC), we explicitly parametrized the algo-
rithm with the minimum number of samples for a split set to 3 and no limit on the
maximum depth of the branches in a tree. The quality of a split is measured by
the Gini impurity. This measure judges the quality of a selected target variable
which is used to split a node, i.e., reflecting the importance or "best split criterion"
in a tree. The Gini impurity measures how often a randomly chosen element would
be wrongly classified (i.e., assigned to a subset (bin)) if it were labeled randomly
according to the distribution of labels within the subset.
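Expressed with scikit-learn parameter names, the described configuration could look roughly as follows; the ensemble size is an illustrative assumption, since it is not stated in the text.

# Sketch of the RFC configuration described above (scikit-learn).
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(
    min_samples_split=3,   # minimum number of samples required to split a node
    max_depth=None,        # no limit on the depth of the branches
    criterion="gini",      # split quality measured by the Gini impurity
    n_estimators=100,      # illustrative ensemble size (not stated in the text)
    n_jobs=-1,
    random_state=0,
)
# rfc.fit(X_train, y_train); predictions = rfc.predict(X_test)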
The boosted decision tree (implemented by AdaBoost in scikit-learn) has
been set up in a rather similar way. The tree properties are set to a minimum
number of samples for a split of three, no limitation on the depth,
and again the Gini impurity is used to assess the split quality. The learning rate
shrinks the contribution of a single classifier within the ensemble. We use the
default boosting algorithm (SAMME.R), which aims at converging faster than
the other options.
The artificial neural network (ANN) (also referred to as multi-layer perceptron
(MLP) classifier) uses an adaptive learning rate, which means that the learning
rate is reduced (divided by five) whenever the training loss does not decrease in
two successive runs. The parameter alpha represents the regulation of the
L2 penalty (i.e., ridge penalty). The value is higher than the default, implying
smaller coefficients (weights). The parameter on the hidden layers defines the
number of hidden layers (five in our case) and also the number of nodes (neurons)
in each layer.
For the support vector machine (SVM) (or support vector classifier), we use
the rbf (radial basis function) kernel. The rbf kernel uses a squared Euclidean
distance as the measure for data (point) separation. The gamma coefficient
is set to auto, which means it is the quotient of one and the number (n) of
features. The penalty parameter for errors (C) is five. This parameter balances
errors in training against errors in testing, i.e., it influences the
generalization of the classifier to unseen data.
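A hedged sketch of the described MLP and SVM configurations in scikit-learn terms follows; the concrete value of alpha and the hidden layer width are placeholders, since only their qualitative setting is given above.

# Sketch of the ANN (MLP) and SVM settings described in the text.
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

mlp = MLPClassifier(
    learning_rate="adaptive",      # rate is divided by 5 when the training loss stops decreasing
    alpha=1e-3,                    # L2 (ridge) penalty, higher than the default 1e-4
    hidden_layer_sizes=(50,) * 5,  # five hidden layers; the width is illustrative
    solver="sgd",                  # the adaptive learning rate applies to the SGD solver
)

svm = SVC(
    kernel="rbf",   # squared Euclidean distance inside the RBF kernel
    gamma="auto",   # gamma = 1 / n_features
    C=5,            # error penalty, balancing training error against generalization
)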
The random forest classifier was set up by using a 5-fold cross-validation
(grid search with parameter alternatives) in order to find the best parameter
combinations (e.g., the minimum number of samples within a leaf). We need very
deep trees (setting no depth limitation) and a very low splitting rate in the nodes
(best results are achieved with three-sample splits). The average tree depth is
51. A further interesting finding is the distance between precision and recall:
while the precision is about 0.74, the recall ended up at 0.4 (the F1 score is 0.52).
While it is often argued that both described tree algorithms (i.e., boosted deci-
sion trees and random forests) tend to adapt perfectly to their feature values and
thus often suffer from overfitting, Breiman [5] showed that random forests are
robust against overfitting, providing (among others) possibilities to set regular-
ization parameters.
It is worth noting that, due to the rather low ratio of error samples (the so-called
minority class), we applied re-sampling methods [7, 8] to obtain a more balanced
data set. The best results were achieved by down-sampling (i.e., reducing the
data set size) in combination with a slight up-sampling, such that the error
ratio rises to nearly 18 %. There is no dominating feature among the most
important features.
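A possible realization of this re-sampling strategy with imbalanced-learn [7] is sketched below; the intermediate over-sampling ratio is an assumption chosen so that the final minority share lands near 18 %.

# Combine slight up-sampling of the error class with down-sampling of the
# non-error class (imbalanced-learn).
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

def rebalance(X, y):
    """Return a re-sampled data set with roughly 18 % error samples."""
    over = RandomOverSampler(sampling_strategy=0.1, random_state=0)     # slight up-sampling
    under = RandomUnderSampler(sampling_strategy=0.22, random_state=0)  # down-sample majority
    X_up, y_up = over.fit_resample(X, y)
    return under.fit_resample(X_up, y_up)  # minority share ~ 0.22 / 1.22 ~ 18 %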
While several practical comparisons (e.g., [19]) show that complex ANNs
outperform random forests, the variety of important (but not dominating) fea-
tures, combined with their diverse interactions and the threat of
overfitting, might cause the predominance of random forests in our case.
Nevertheless, we stress that the best result of the random forests is based on
the underlying data set and application use case, with no indication of a general
superiority of random forest classifiers over other classification algorithms, which
was for instance argued in [18], but later contradicted (in terms of generalizabil-
ity) in [17].
References
1. Henmi, T., Deng, M., Yoshinaga, S.: Early Detection of Plant Faults by Using
Machine Learning. Int. Conf. on Advanced Mechatronic Systems (ICAMechS), 2016
2. Zidek, K., Maxim, V.: Diagnostics of Product Defects by Clustering and Machine
Learning Classification Algorithm. Journal of Automation and Control, vol.3, 2015
3. Meshram, A., Haas, C.: Anomaly Detection in Industrial Networks using Machine
Learning: A Roadmap. Machine Learning for Cyber Physical Systems: Selected pa-
pers from the International Conference ML4CPS 2016. Ed.: J. Beyerer, Springer,
pp. 65–72, 2017
4. Géron, A.: Hands-On Machine Learning with Scikit-Learn & TensorFLow. O’Reilly,
2017
5. Breiman, L.: Random Forests. Machine Learning, vol. 45, pp. 5–32, Kluwer Academic
Publishers, 2001
6. Rashid, T., Neuronale Netze selbst programmieren. O’Reilly, 2017
7. Lemaı̂tre, G., Nogueira, F., Aridas, C.K.: Imbalanced-learn: A Python Toolbox to
Tackle the Curse of Imbalanced Datasets in Machine Learning. Journal of Machine
Learning Research, vol. 18, pp. 1-5, 2017
8. Batista, G.E.A.P.A., Prati, R.C., Monard, M.C.: A Study of the Behavior of Several
Methods for Balancing Machine Learning Training Data. ACM SIGKDD, vol. 6 (1),
pp. 20–29, 2004
9. Ahmad, M.W., Mourshed, M., Rezgui, Y.: Trees vs Neurons: Comparison between
Random Forest and ANN for High-Resolution Prediction of Building Energy Con-
sumption. Energy and Buildings, Elsevier, vol. 147, pp. 77–89, 2017
10. Zhang, Y., Guo, W., Ray, S.: On the Consistency of Feature Selection With Lasso
for Non-linear Targets. Proc. of the 33rd Int. Conference on Machine Learning,
vol. 48, pp. 183–191, 2016
11. Eitrich, T., Kless, A., Druska, C., Meyer, W., Grotendorst, J.: Classification of
Highly Unbalanced CYP450 Data of Drugs Using Cost Sensitive Machine Learning
Techniques. Journal of Chemical Information and Modeling, vol. 47 (1), pp. 92–103,
2007
12. Wang, S., Yao, X.: Multiclass Imbalance Problems: Analysis and Potential Solu-
tions. Systems Man Cybernetics Part B - Journal IEEE Transactions on Cybernet-
ics, vol. 42, pp. 1119–1130, 2012
13. Kubat, M., Matwin, S.: Addressing the Curse of Imbalanced Training Sets: One-
Sided Selection. Proc. of the 14th Int. Conference on Machine Learning, pp. 217–225,
1997
14. Wyner, A.J., Olson, M., Bleich, J., Mease, D.: Explaining the Success of AdaBoost
and Random Forests as Interpolating Classifiers. Journal of Machine Learning Re-
search, vol. 18, pp. 48:1–48:33, 2017
15. Friedman, J.: Greedy Function Approximation: A Gradient Boosting Machine.
Annals of Statistics, pp. 1189–1232, 2001
16. Liaw, A., Wiener, M.: Classification and Regression by randomForest. R News,
vol. 2 (3), pp. 18–22, 2002
17. Wainberg, M., Alipanahi, B., Frey, B.,J.: Are Random Forests Truly the Best
Classifiers?. Journal of Machine Learning Research, vol. 17, pp. 110:1–110:5, 2016
18. Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D: Do we Need Hun-
dreds of Classifiers to Solve Real World Classification Problems? Journal of Machine
Learning Research, vol. 15, pp. 3133–3181, 2014
19. Ahmad, W. M., Mourshed, M., Rezgui, Y.: Trees vs Neurons: Comparison between
random forest and ANN for high-resolution prediction of building energy consump-
tion Journal on Energy and Buildings, vol. 147, pp. 77–89, 2017
Web-based Machine Learning Platform for Condition-
Monitoring
Abstract. Modern water system infrastructures are equipped with a large number
of sensors. In recent years, machine-learning (ML) algorithms have become a promis-
ing option for data analysis. However, currently ML algorithms are not frequently
used in real-world applications. One reason is the costly and time-consuming in-
tegration and maintenance of ML algorithms by data scientists. To overcome this
challenge, this paper proposes a generic, adaptable platform for real-time data
analysis in water distribution networks. The architecture of the platform makes it
possible to connect to different types of data sources, to process their measurements
in real-time with and without ML algorithms and finally to push the results to different
sinks, like a database or a web-interface. This is achieved by a modular, plugin-based
software architecture of the platform. As a use case, a data-driven anomaly
detection algorithm is used to monitor the water quality of several water treat-
ment plants of the city of Berlin.
1 Introduction
In recent years, a large number of new water quality and hydraulic sensors have been
installed in water distribution networks and water treatment plants. Reasons for this
trend are (1) many new sensor companies and corresponding new sensors have appeared
on the market, which means decreasing costs and increasing performance of the sensor
units; (2) due to wireless communication technologies (e.g. GSM) the installation costs
are drastically decreasing. Hence, there is a need for the development of integrated
platforms for the storage, visualisation and enhanced analysis of these data. The
benefit of advanced data analysis in water infrastructures has already been investigated
for different scenarios, e.g. monitoring of drinking water quality [4], forecasting of the
water consumption [6] or the modelling of sediment transport [1]. However, different data
suppliers and old plants with an outdated IT infrastructure still complicate the
integration of state-of-the-art data analysis algorithms. In spite of the fact that many
IoT and data analysis platforms are available nowadays, the effort for integrating
these platforms into the IT infrastructure of water utilities and for implementing ML
algorithms is still very high. To overcome some of these challenges, this paper presents
a generic data fusion and analysis platform with a focus on condition monitoring of
the WDN with machine learning algorithms. The platform follows a plug-in based ar-
chitecture, which means that depending on the specific needs of the current use case
(e.g. saving data in a database, performing anomaly detection) different software com-
ponents can be installed. As a use case, the platform is used to perform the condition
monitoring of nine water quality measuring stations in parallel with a combination of
Principal Component Analysis (PCA) [2] and Gaussian Mixture Models (GMMs) [9]. The
results of the machine learning algorithms, comprising the learned process map, the
state trajectory and the anomaly index, are visualized for all stations in a web-interface.
2 Platform Architecture
The architecture of the proposed platform consists of three main parts, shown in figure
1: (1) the platform core, (2) a plugin structure and (3) a web-interface. The platform
core is responsible for the management of the different software modules and the data han-
dling and is described in section 2.1; the plugins provide the required use-case-specific
application functionality (e.g. analysis algorithms; connection to data sources) and are
described in section 2.2. Finally, the web-interface, used to give feedback to the user,
is explained in section 2.3.
Fig 1: Plug-in architecture of the platform for real-time data analysis applications
the core as central component, thus preventing any plugin-to-plugin communication.
The core itself uses the Model-View-Controller (MVC) pattern [3].
The core manager is the controller of the platform. It is the owner of all plugins as well
as the core cache and is responsible for their creation and destruction. Since it is also the
facade for the whole core, it is known by reference by all plugins, which need to request
access for each core cache entry they want to access.
The core cache acts as the model to separate the core's data from its logic. In order to
establish either a read-only or a read/write connection to the core cache, a plugin has to
be granted permission by the core logic. Once a connection is established, the plugin
receives a local copy of the requested core cache data which stays in sync with the
cache via the observer pattern [3].
2.2 Plugins
To maintain the maximum amount of flexibility, the platform follows a plugin-based architecture. This means that, depending on the specific needs of the current use case, different software components can be integrated into the platform. Basically, a plugin represents a software module fulfilling a specific task. Examples are the connection to the SCADA system of the water utility, the implementation of an event detection algorithm, or the automated generation of a daily, weekly or monthly report. Plugins employ the factory pattern [3], which allows creating several instances that can be configured, started and stopped individually.
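A minimal sketch of this factory-based plugin mechanism could look as follows; class and method names are illustrative assumptions, not taken from the platform.

```python
# Illustrative sketch of factory-based plugins that can be configured, started and
# stopped individually (not the authors' implementation).
from abc import ABC, abstractmethod

class Plugin(ABC):
    def __init__(self, config):
        self.config = config
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    @abstractmethod
    def run_once(self, cache):
        """Perform one unit of work, e.g. poll a data source or run an analysis."""

class PluginFactory:
    _registry = {}                       # plugin name -> plugin class

    @classmethod
    def register(cls, name, plugin_cls):
        cls._registry[name] = plugin_cls

    @classmethod
    def create(cls, name, config):
        # several instances of the same plugin type, each with its own configuration
        return cls._registry[name](config)
```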
3 Data-driven Condition-Monitoring
[Figure: processing pipeline of the condition-monitoring approach - the data is normalized, reduced with PCA and modelled with a GMM; scoring yields the process map and the Z-score (anomaly index).]
$$p(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2}\,|\Sigma_k|^{1/2}} \exp\!\left(-\frac{1}{2}\,(x-\mu_k)'\,\Sigma_k^{-1}\,(x-\mu_k)\right) \quad (7)$$
with μ_k the mean vector and Σ_k the covariance matrix. The log-probability of a sample x ∈ ℝ^(1×D) is then determined as

$$\hat{a} = \log \sum_{k=1}^{K} \omega_k \, p(x \mid \mu_k, \Sigma_k)$$

with â ∈ ℝ. Training the GMM means estimating the weights ω_k, the means μ_k and the covariances Σ_k; for this, usually an Expectation Maximization (EM) algorithm is used [9]. The EM algorithm iteratively increases the expected log-likelihood of the complete training data set by changing the GMM parameters until they converge. In this paper, the first two principal components of the initial training set are used to train the GMM.
A low value of â indicates an abnormal state; a good practice for threshold selection is to take the lowest value of â obtained on the training data.
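The following sketch illustrates the described PCA/GMM combination with scikit-learn; it is a minimal example under the assumption of z-score normalization, two principal components and the lowest training score as threshold, not the exact implementation used in the platform.

```python
# Minimal PCA + GMM condition-monitoring sketch (illustrative, with toy data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

def fit_monitor(X_train, n_components=3):
    """Fit normalization, the 2D process map (PCA) and the GMM on normal training data."""
    scaler = StandardScaler().fit(X_train)
    pca = PCA(n_components=2).fit(scaler.transform(X_train))
    Z = pca.transform(scaler.transform(X_train))
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(Z)
    threshold = gmm.score_samples(Z).min()        # lowest training log-probability as threshold
    return scaler, pca, gmm, threshold

def anomaly_index(x, scaler, pca, gmm):
    """Log-probability of a new sample on the learned process map (the anomaly index)."""
    z = pca.transform(scaler.transform(x.reshape(1, -1)))
    return gmm.score_samples(z)[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 6))           # six water quality signals (toy data)
    scaler, pca, gmm, thr = fit_monitor(X_train)
    print(anomaly_index(rng.normal(size=6), scaler, pca, gmm), thr)
```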
Fig 3: (Left) Visualization of the calculated GMM and the trajectory of a measuring
station from Berliner Wasserbetriebe in normal state. (Right) The same map with a
detected anomaly, namely a reduction of the redox-potential in the measurements.
4.1 Plugins
For the use case of water quality monitoring, the following plugins are implemented.
Data polling and parsing plugin (1): The measurements from the water quality monitoring stations are exported by the SCADA system as chunked .csv files to a secure FTPS server with a sample time of a few minutes. A plugin cyclically polls the FTPS server and checks whether new data is available. If so, the corresponding files are downloaded, parsed and written into the cache, from where they are analyzed by the condition-monitoring plugin. A minimal sketch of such a polling step is given below.
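In the sketch, server address, credentials and the file layout are assumptions for illustration only.

```python
# Illustrative FTPS polling and parsing step (connection details are placeholders).
import io
from ftplib import FTP_TLS
import pandas as pd

def poll_once(host, user, password, seen, cache):
    """Download and parse new .csv chunks, then hand them to the core cache."""
    ftps = FTP_TLS(host)
    ftps.login(user, password)
    ftps.prot_p()                                   # secure the data connection
    for name in ftps.nlst():
        if name.endswith(".csv") and name not in seen:
            buf = io.BytesIO()
            ftps.retrbinary("RETR " + name, buf.write)
            buf.seek(0)
            cache.write(name, pd.read_csv(buf))     # e.g. the observer-based cache sketched above
            seen.add(name)
    ftps.quit()

# a plugin would call poll_once(...) cyclically, e.g. every few minutes
```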
Fig 4, upper part, shows the plug-in manager with the loaded plug-ins. The lower plot gives a screenshot of the real-time data cache containing results from the different plug-ins.
Fig 4: (Upper plot) Plug-in manager with loaded plug-ins for monitoring; (lower plot)
real-time data cache
4.2 Web-interface
The web-interface provides an overview of the current state of the monitored measurement stations, the process map with the trajectory, as well as information about the historic results from the condition-monitoring algorithms. Furthermore, the complete website is kept responsive, which means that the results can be visualized on a tablet or smartphone as well. In summary, the interface covers the following main features:
Dashboard: The dashboard consists of a set of tiles and serves as a summary of the current state of each measuring station. Tiles are shown in green (normal state) or red (anomaly detected), depending on the value of the anomaly index (see section 3.3). If the index falls below a predefined threshold, the color changes from green to red. A screenshot of the dashboard is shown in Fig 5, left side.
Process map and trajectory visualization: The calculated map as well as the trajectory and the anomaly index, described in section 3, are visualized in the web-client. This gives an overview of the current state of the process and shows whether it is in a normal or abnormal state. A screenshot of the process map is given in Fig 6.
Time series visualization: The web-client provides historic and real-time access to the anomaly indices (Fig 5, middle). Additionally, for a predefined time frame, a plot of the alarm index with the corresponding measurements is generated. A screenshot is shown in Fig 5 on the right-hand side.
Fig 5: (Left) Screenshot of the Dashboard; (middle) exemplary anomaly indices for
measuring stations; (right) graph covering GMM scoring results with the corresponding
measurements of the last month for a measuring station
Fig 6: Visualization of the process map and trajectory within the web-client
5 Conclusion
This paper presents a generic platform for data analysis with a focus on data-driven condition monitoring in water distribution. To this end, a plugin-based software architecture is proposed, which can be used to collect data from different sources, treat the data with different analysis algorithms and provide the results through a web-based user interface. Due to the plugin structure, the platform provides great flexibility and can be adapted to very complex scenarios. For data analysis, a data-driven condition-monitoring approach based on a combination of Principal Component Analysis and Gaussian Mixture Models was realized. Within this approach, the original input data is reduced to two dimensions to generate a map of the process. This map is then used in combination with the calculated process trajectory to visualize whether the process is close to a cluster center, i.e. in a normal state. Furthermore, an anomaly index is calculated, which indicates whether the process is in a normal or abnormal state. As a use case, the results of monitoring the water quality parameters in the city of Berlin have been presented.
Acknowledgements
The project ResiWater [7] is supported by the German Federal Ministry of Education
and Research (BMBF) and by the French Agence Nationale de la Recherche (ANR).
References
1. B. Bhattacharya, R.K. Price, D.P. Solomatine: Machine Learning Approach to Modeling
Sediment Transport, Journal of Hydraulic Engineering, 2007
2. C. Bishop: Pattern Recognition and Machine Learning, Springer, 2006
3. E. Gamma, R. Helm, R. Johnson, J. Vlissides: Design Patterns: Elements of Reusable Object-oriented Software, Addison-Wesley, 1994
4. C. Kuehnert et al.: A new alarm generation concept for water distribution networks based on machine learning algorithms, 11th International Conference on Hydroinformatics, 2014
5. T. Marwala: Gaussian Mixture Models and Hidden Markov Models for Condition Monitor-
ing, In:Condition Monitoring Using Computational Intelligence Methods, Springer, 2012
6. Z. Ren: Short-term demand forecasting for distributed water supply networks: A multi-scale
approach, WCICA, 2016
7. Project ResiWater - Innovative Secure Sensor Networks and Model-based Assessment Tools
for Increased Resilience of Water Infrastructure, project website: https://www.resiwater.eu;
funded by BMBF (13M13688) and ANR (ANR-14-PICS-0003)
8. Fodor, Imola K. A survey of dimension reduction techniques. No. UCRL-ID-148494. Law-
rence Livermore National Lab., CA (US), 2002.
9. Reynolds, Douglas. "Gaussian mixture models." Encyclopedia of biometrics (2015): 827-
832.
10. Qin, S. Joe. "Survey on data-driven industrial process monitoring and diagnosis." Annual
reviews in control 36.2 (2012): 220-234.
11. Yin, Shen, et al. "A review on basic data-driven approaches for industrial process monitor-
ing." IEEE Transactions on Industrial Electronics 61.11 (2014): 6418-6428
Selection and Application of Machine Learning-
Algorithms in Production Quality
Jonathan Krauß1, Maik Frye1, Gustavo Teodoro Döhler Beck1, Robert H. Schmitt2
1 Fraunhofer Institute for Production Technology IPT
Steinbachstr. 17, Aachen 52074, Germany
{jonathan.krauss,maik.frye,gustavo.beck}@ipt.fraunhofer.de
2 Laboratory for Machine Tools WZL RWTH Aachen University
Steinbachstr. 19, Aachen 52074, Germany
[email protected]
Digitalization has led to a steady increase in data in recent years. Through higher computing power, it is possible to process these large amounts of data [1]. Analyzing the acquired data can enhance both process understanding and process efficiency - or, in the words of Peter Sondergaard: "Information is the oil of the 21st century, and analytics is the combustion engine" [2]. Sectors like the financing domain or the marketing domain are leading when it comes to generating value from data [3]. In particular, the use of Machine Learning (ML)-algorithms has increased over the last decade. The main reasons for this trend, apart from the higher computing power and data input mentioned above, are the increasing reliability of the algorithms, their simpler implementation as well as the easier data acquisition. [1]
Even though the application of ML-algorithms is well established in other domains, it is not common in the context of production quality. For process optimization in the production quality domain, physically based modeling (PBM) is commonly used. While PBM offers the advantage of describing the current and future state of a system by physical dependencies, data-driven models use the information from observed data to identify current system characteristics and to predict the future state without requiring a deeper understanding of the physical interdependencies of the process. [4] The development of data-driven models thus shows a high potential for even further optimization of production processes. In the presented case we chose to transform the data into a data-driven model by applying ML-algorithms.
amount of dimensions. Besides a multitude of missing values, the data set is also imbalanced. In this context, an imbalanced data set means that more products are in-spec than off-spec.
To predict the product quality, it is necessary to trace the product data throughout the entire process chain. For that reason, the six different CSV-files need to be linked. This link is created using a product identification number. Since the CSV-files are not uniformly structured, the files need to be transformed multiple times. After the product-related link, the data is cleaned by deleting empty values and apparent correlations as well as by reducing dimensions. Overall, the process of data understanding and preparation took about 80 % of the time of the entire CRISP-DM procedure.
In the beginning of the modeling step, a suitable approach for creating a model needs to be selected. Due to the time it takes to learn a data-driven model with an ML-algorithm, only a small number of algorithms can be applied. The process of selecting ML-algorithms depends highly on the use case, the appearance of the data set and the personal experience of the involved data scientists. In this specific case, we interpret the prediction of whether a product will be in-spec or off-spec as a classification problem. One class includes all products that run through the process chain being in-spec. Since the quality of the product is measured after each process, the product can become off-spec after each process, resulting in six additional classes. Because we are able to label the data set, this multiclass classification problem can be solved using supervised learning algorithms. Fig. 2 shows a visualization of the processes and the seven classes.
The characteristics of the data set result in requirements for the algorithm, which has to deal with an imbalanced data set, few samples and many dimensions. Best practices from other sectors with similar problems are taken from the literature. Besides the results of the literature research, our own experience shows beneficial results when decision tree algorithms are applied. Considering these points, the decision tree algorithm Classification and Regression Tree (CART) is selected for this use case [6]. CART can handle high-dimensional data sets and has the further advantage that process owners can understand the results of the analysis very quickly and intuitively. The location at which the prediction states that the product will run out of tolerance can be easily identified. Furthermore, the implementation and validation of the decision tree algorithm is simple.
There exist many different platforms for data mining as well as for ML-algorithm implementation [7]. These platforms can be divided into "Data Science and Machine Learning platforms" like Matlab or RapidMiner and "open source platforms" like Python and R. Data Science and Machine Learning platforms are characterized by easy handling and fast model development [8]. Nevertheless, operating such a platform can result in high licensing costs [9]. Open source platforms like Python and R play an increasingly important role in the data science market because they are free of charge and are the most common programming languages for ML implementation [8]. We decided to use the "open source platform" Python because the libraries that can be called, such as TensorFlow and scikit-learn, are undergoing strong development. The algorithm is implemented in Python by calling the decision tree algorithm via the scikit-learn library. Scikit-learn uses an optimized version of the CART algorithm [10].
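A hedged sketch of this step is shown below; the data set is replaced by synthetic data, since the real feature matrix and the seven-class label vector are not reproduced here.

```python
# Illustrative call of the (CART-based) decision tree classifier in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared production data (7 classes, many dimensions).
X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = DecisionTreeClassifier(random_state=0)   # scikit-learn's optimized CART variant
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
```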
To achieve a better performance of the ML-algorithm, hyperparameters must be set. Hyperparameters are configuration values that are external to the model and whose values cannot be estimated from the data set [11]. They are initially set when the algorithm is called via scikit-learn and need to be optimized. Hyperparameters of the decision tree algorithm are e.g. the maximum depth and the minimum size of the tree. There are different approaches to optimizing hyperparameters; basic approaches are grid search and random search. For this use case, random search is applied to the decision tree algorithm: it randomly selects a combination of hyperparameters from an interval of possible values, and if this combination leads to better results, the parameters are updated. Over the last years, other tuning approaches like Bayesian Optimization and Gradient Descent have become popular [12]. In addition to these advanced approaches, research institutes try to apply heuristics to the hyperparameter tuning problem. These academic approaches include metaheuristics like Particle Swarm Optimization, Ant Colony Optimization and Harmony Search [13].
After training, the performance of the model can be evaluated with a multitude of metrics. The basis for measuring the performance of a classification model is the confusion matrix. The rows of the 2x2 confusion matrix represent the instances of a predicted class while the columns represent the instances of an actual class [14]. If the classification model correctly classifies the input as positive (in-spec) or negative (off-spec), the results are counted as true positives (TP) or true negatives (TN). Classifying products falsely as positive or negative counts as false positives (FP) or false negatives (FN). Based on the confusion matrix, we can derive different metrics.
Metrics that can be easily derived from the confusion matrix are accuracy and error rate. Other single-value metrics like the F1-score and the Matthews Correlation Coefficient (MCC) are more complex to set up but can still be derived from the confusion matrix. In order to evaluate the performance of the CART algorithm in this specific use case, the MCC is selected, since it handles imbalanced data sets better than accuracy and error rate [14]. The mathematical relationship is given in equation (1).

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \quad (1)$$
The MCC considers the mutual accuracies and error rates on both classes. Furthermore, the MCC is a correlation coefficient between the observed and predicted classifications and returns a value between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 a prediction no better than random, and -1 indicates total disagreement between prediction and observation. [14]
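For illustration, equation (1) can be evaluated directly from the confusion matrix counts; the sketch below compares that with scikit-learn's built-in implementation on a toy label vector.

```python
# Illustrative MCC calculation from the confusion matrix (toy labels, not project data).
import math
from sklearn.metrics import confusion_matrix, matthews_corrcoef

def mcc_from_counts(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

y_true = [1, 1, 0, 1, 0, 0, 1, 0]     # 1 = in-spec, 0 = off-spec (toy labels)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(mcc_from_counts(tp, tn, fp, fn), matthews_corrcoef(y_true, y_pred))   # both 0.5
```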
In order to predict the product quality after each process, different CART-algorithms need to be trained, because at each process a different amount of data is available for training. This leads to four different CART-algorithms, whose performances are depicted in Fig. 3. The results include the decision trees that were created after the hyperparameter tuning. By applying random search, the results could be improved by 30 %, which can be observed in other cases as well [15]. Since no new data is generated in the fourth process, no new decision tree was learned for the change from the fourth to the fifth process.
The MCC shows the performance of the algorithm in predicting the actual classes of the process. For the first process step the metric is MCC = 0.21. This means that there is a match between predicted and actual class which is relatively low, but better than random prediction. The MCC increases the more processes are completed and the fewer processes remain to be carried out. The quality of the model improves when more data points are used for the learning task; in addition, fewer processes and results need to be predicted for the future. After the completion of the fifth process, the metric value is MCC = 0.70, which means that the decision tree is a suitable algorithm to predict the product quality sufficiently [16].
The use of methodologies to solve a specific task creates comprehensible and reproducible results. Therefore, methodologies were developed especially for data mining and knowledge discovery [18]. Due to the mentioned benefits, they are used in the majority of corresponding projects [3].
CRISP-DM, SEMMA (Sample, Explore, Modify, Model, and Assess) and KDD (Knowledge Discovery in Databases), as the top three methodologies, all include a phase specifically designated to creating the model for the problem [19]. Due to the generic nature of the three methodologies, the activities in the "Modeling" phase can be on different levels of complexity, ranging from the application of linear regression up to deep learning. Therefore, a data scientist has to decide how to conduct the "Modeling" phase, e.g. by applying an ML-algorithm. Normally the following three aspects are included in this decision: personal experience, appearance of the data set and literature review. [20]
The problems and corresponding data sets that need to be tackled are domain-specific. Tools that support the data scientist in selecting an ML-algorithm are mostly so-called "cheat sheets" [21]. Team members solely bring domain-specific knowledge into the solution. The process of choosing the ML-algorithm is therefore highly dependent on the expertise of the data scientist. Since neither methodologies nor tools include this domain-specific knowledge, the process of selecting the ML-algorithm is not reproducible. Not all domain-specific knowledge can be integrated into a tool, and the process of selecting the ML-algorithm stands out by the required creativity of the data scientist. Therefore, a decision-making tool cannot relieve the data scientist of his responsibility, but it can serve as a support in fulfilling that task. In the following, we present a concept for setting up such a domain-specific decision-making tool.
The decision-making tool (DMT) works as domain-specific support for the data scientist in selecting an appropriate ML-algorithm to create a model that fulfils problem-specific requirements. This is done by including three main aspects, as depicted in Fig. 4: the appearance of the data input, the requirements of the model to be created, and domain-specific knowledge regarding the considered use case. All three factors are included when providing the user with a recommendation.
The data scientist interacts with the DMT through a user interface (UI), which he uses to describe the specific case he wants to model by applying ML-algorithms. The DMT compares the input with historical assessments and problems, including their evaluation. Afterwards, the DMT provides the data scientist with a list of ML-algorithms probably suitable for the specific use case and with additional information about the corresponding selection process. The concept of the DMT is depicted in Fig. 5 and described in detail in the following.
Using the UI, the data scientist loads the characteristics of the data set, the requirements of the model to be created and a description of the use case into the DMT. Characteristics of the data set are, for example, the dimensionality of the data, the number of features, the number of data points, data quality, data distribution or data noise. Requirements of the model to be created are, for instance, the learning time, the performance of the model or the transparency of the model. The description of the use case includes information about the type of the use case, e.g. predictive maintenance or product quality prediction. Characteristics like the dimensionality or the maximum running time are quantitative and can directly be loaded into the DMT. Others, like the transparency of the model, need to be transformed from their qualitative state into a measurable form, using for example goal question metrics [22]. This influences the degree of automation to which the characteristics can be loaded into the DMT.
Two main databases function as the backbone of the DMT: a database that includes the domain-specific characteristics of ML-algorithms and a database that stores problem-specific characteristics of ML-algorithms. A toy illustration of how these inputs could be combined into a recommendation is sketched below.
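In the sketch, algorithm profiles standing in for the two databases are scored against the requirements entered via the UI. All names and numbers are invented for illustration and are not part of the presented concept's data.

```python
# Toy illustration of the DMT matching idea (all profiles and weights are invented).
algorithm_profiles = {
    "CART":          {"handles_imbalance": 0.6, "high_dim": 0.8, "transparency": 0.9, "training_speed": 0.9},
    "Random Forest": {"handles_imbalance": 0.7, "high_dim": 0.9, "transparency": 0.5, "training_speed": 0.6},
    "SVM":           {"handles_imbalance": 0.4, "high_dim": 0.7, "transparency": 0.3, "training_speed": 0.4},
}

def recommend(requirements, profiles):
    """Rank algorithms by a weighted match with the user's requirements (weights in 0..1)."""
    scores = {name: sum(requirements.get(k, 0.0) * v for k, v in profile.items())
              for name, profile in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

user_input = {"handles_imbalance": 1.0, "high_dim": 0.8, "transparency": 0.7, "training_speed": 0.3}
print(recommend(user_input, algorithm_profiles))
```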
5 Conclusion
In this paper, we presented how ML-algorithms can be applied in a tangible use case from the production quality domain. In a process chain consisting of six processes, it should be predicted after completion of each individual process whether the product would be off-spec in the following processes. In order to achieve beneficial results, the CRISP-DM methodology was followed. After focusing on the process understanding, data was acquired. Afterwards, the formats as well as the characteristics of the data set have been explored. The preparation of the data comprised cleaning, transformation and dimensionality reduction in order to apply the ML-algorithm effectively. Since we have a multiclass classification problem, the decision tree algorithm CART was selected. The evaluation of the CART algorithm showed that both the methodology and the application of ML-algorithms can lead to beneficial results. On the basis of the mentioned use case, tangible lessons learned could be derived and were divided into lessons learned on the management, project and technology level.
Given the variety of ML-algorithms, it is difficult to determine which ML-algorithm is the most suitable for predicting the product quality. In this use case, we compared the performance of different algorithms. These algorithms were selected based on the character of the problem, by analyzing the data, by reviewing literature and by the authors' own experience. This process of choosing the ML-algorithm is highly dependent on the expertise of the involved team members. Therefore, a tool that supports the user in selecting the ML-algorithm could help to make the process more reliable.
We explained why methodologies are widely used in data mining projects but why they are just a footnote when choosing an ML-algorithm for a specific problem. A concept of how a DMT can support data scientists in selecting ML-algorithms for a specific problem was presented. The DMT takes domain-specific demands into account and characterizes ML-algorithms accordingly. Problem-type-specific evaluations of ML-algorithms are included in the recommendations. Nevertheless, domain-specific knowledge, expertise regarding the selection and implementation of ML-algorithms and the creativity of data scientists will not become obsolete.
6 Funding notes
“The IGF promotion plan 18504N of the Research Community for Quality (FQS),
August-Schanz-Str. 21A, 60433 Frankfurt/Main has been funded by the AiF within the
programme for sponsorship by Industrial Joint Research (IGF) of the German Federal
Ministry of Economic Affairs and Energy based on an enactment of the German Par-
liament.”
7 References
1. Michael Driscoll (2011) Building data startups: Fast, big, and focused: Low
costs and cloud tools are empowering new data startups. http://radar.oreilly.com/2011/08/building-data-startups.html. Accessed 14 May 2018
2. Peter Sondergaard (2011) Gartner Says Worldwide Enterprise IT Spending to
Reach $2.7 Trillion in 2012. https://www.gartner.com/newsroom/id/1824919.
Accessed 14 May 2018
3. Piatetsky-Shapiro G (2014) What main methodology are you using for your analytics, data mining, or data science projects? https://www.kdnuggets.com/polls/2014/analytics-data-mining-data-science-methodology.html. Accessed 14 May 2018
4. Datong P. Zhou, Qie Hu, Claire J. Tomlin (2017) Quantitative comparison of
data-driven and physics-based models for commercial building HVAC systems
5. Pete Chapman, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas
Reinartz, Colin Shearer and Rüdiger Wirth CRISP-DM: Step-by-step data min-
ing guide
6. Scikit-learn Developers (2018) Decision Tree Classifier. http://scikit-
learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html.
Accessed 14 May 2018
7. Gregory Piatetsky (2018) Survey regarding data mining platforms: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months? http://vote.sparklit.com/poll.spark/203792. Accessed 14 May 2018
8. Gregory Piatetsky (2018) Gainers and Losers in Gartner 2018 Magic Quadrant for Data Science and Machine Learning Platforms. https://www.kdnuggets.com/2018/02/gartner-2018-mq-data-science-machine-learning-changes.html. Accessed 14 May 2018
9. Gregory Piatetsky (2017) Forrester vs Gartner on Data Science Platforms and
Machine Learning Solutions. https://www.kdnuggets.com/2017/04/forrester-
gartner-data-science-platforms-machine-learning.html. Accessed 14 May 2018
10. Scikit-learn Developers (2018) Decision Trees. http://scikit-learn.org/stable/modules/tree.html. Accessed 14 May 2018
11. Dr. Jason Brownlee (2017) What is the Difference Between a Parameter and a Hyperparameter? https://machinelearningmastery.com/difference-between-a-parameter-and-a-hyperparameter/. Accessed 14 May 2018
12. Rafael G. Mantovani, Tomáš Horváth, Ricardo Cerri, Joaquin Vanschoren, André C.P.L.F. de Carvalho (2016) Hyperparameter Tuning of a Decision Tree Induction Algorithm. IEEE, Piscataway, NJ
13. Pawel Matuszyk, Rene Tatua Castillo, Daniel Kottke A Comparative Study on
Hyperparameter Optimization for Recommender Systems
Which deep artificial neural network architecture to use for anomaly detection in Mobile Robots' kinematic data?
1 Introduction
The navigation of mobile robots typically relies on laser scanner data. Small humps on the floor, e.g. cable channels, doorsills, floor unevenness or other environmental anomalies, are beyond its detectable scope. Typically, only a 2D map of the environment, e.g. 10 cm above ground, can be established. However, even such small irregularities can have a tremendous effect on the robot's stability and the path quality. Induced vibrations can impact cargo or can reduce the storage life of the robot or its mechanical components.
The new idea of our project is to integrate the detection of small anomalies into the dynamic adaptation during the execution of a path and into path planning itself. This should be done based on acceleration data, which can be collected simply and inexpensively with inertial sensors.
Commercial mobile platforms like the Mir-100 allow the definition of driving routes by manually defining a few target points in the map. Subsequent path planning is then done automatically, considering several boundary conditions, e.g. distances to walls. Such map-based path planning can be extended by dynamic path planning in order to adjust to temporary changes in the environment [1]. By driving around or stopping in front of unpredicted and potentially dynamic obstacles, collisions can be avoided.
2 Methodology
3 Concept
The broader aim of the project behind this paper is to make the usage of mobile robots more robust and flexible through dynamic adaptation to a changing environment. This paper extends the work in [13], which describes in detail the kinematics of the commercially available mobile platform Mir-100 during the overrun of a cable channel as a model for an environmental anomaly. Takeoffs are particularly strong for the rear wheels, because the front and the drive wheels have already passed the cable channel and therefore pull more effectively. To avoid damage to the platform or its cargo, the idea is to detect the overrun of the front wheels as an anomaly in real time and to slow down the mobile platform before the rear wheels reach the cable channel.
The measurements described in [13] were done with high precision by a marker-based optical system to have a "gold standard". This dataset is also used to train the DNNs presented in this paper.
4 Experiments
The data was recorded while driving a mobile platform Mir-100 (Fig. 1) in a gait and motion analysis lab. Details of the dataset and its acquisition are described in [13]. Three trials are arbitrarily chosen to build a validation set.
The DNNs are trained with the remaining 24 example trials of about 15000 time frames each. Only the sections of the trials without overruns of the cable channel are included in the training set. Over each trial, a time window of width 100 frames is moved step by step, and the resulting 100-frame sequences are shuffled to build the training set. To normalize the data and make it more suitable as input for the DNN, the mean is subtracted and the result is divided by the standard deviation. A minimal sketch of this preprocessing is given below.
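The sketch reflects our reading of the description (window width 100, per-trial z-score normalization); it is not the authors' code.

```python
# Illustrative sliding-window preprocessing for the DNN training set.
import numpy as np

def make_training_windows(trials, width=100, seed=0):
    """trials: list of 1D acceleration arrays taken from the anomaly-free trial sections."""
    rng = np.random.default_rng(seed)
    windows = []
    for trial in trials:
        trial = (trial - trial.mean()) / trial.std()      # normalize per trial
        for start in range(len(trial) - width + 1):       # move the window step by step
            windows.append(trial[start:start + width])
    windows = np.stack(windows)
    rng.shuffle(windows)                                  # mix up the sequences
    return windows[..., np.newaxis]                       # shape (samples, 100, 1) for the DNN
```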
Three further test trials with acceleration data (sampling rate 120 Hz) are collected from an inertial measurement unit MPU-9250 (InvenSense) connected via I2C to an Omega2 module (Onion, Fig. 2) and mounted on the mobile platform. To test the DNNs, the data is saved in csv files. In principle, the data can be streamed via WiFi to an external laptop, which also collects the position data of the mobile platform via the MiR's REST API.
Vertical acceleration data is collected for three test trials while driving the robot along a corridor at full speed. A cable channel (Fig. 3) is overrun in the middle of each trial.
5 Results
Training of LSTM based autoencoder and the VAE (4) both converges well with a
batch size of 50 and a learning rate of 0.2. Loss function values after training with
1 and after 5 epochs are 4.686 and 1.154 for the LSTM layers based autoencoder
and 0.619 and 0.039 for the VAE. The values show no differences between the
three test trials (optical marker based measurements) for the shown digits.
Reconstructed non-anomalous data looks very similar in both cases, and the overruns of the cable channel are clearly detected as anomalies in all cases (validation trials and inertial sensor based test trials). Fig. 5 shows the difference between the original and the predicted/reconstructed data for non-anomalous data. The data was normalized to one over the complete trial, including the anomalous data; that is why the values for non-anomalous data in Fig. 5 are so small. Fig. 6 shows a part of the same trial with anomalous data. The three peaks correspond to the overrun of the front, drive and rear wheels. The detection also works well for the inertial sensor based test trials, although the DNNs were trained with the marker-based optical high-precision lab data only.
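For orientation, a hedged sketch of an LSTM-based autoencoder of the kind used here is given below; it assumes Keras/TensorFlow (the paper does not state its framework) and illustrative layer sizes, with the reconstruction error serving as the anomaly score.

```python
# Illustrative LSTM autoencoder for 100-frame acceleration windows (framework and
# layer sizes are assumptions, not taken from the paper).
from tensorflow.keras import layers, models

window, channels = 100, 1
model = models.Sequential([
    layers.LSTM(32, input_shape=(window, channels)),   # encoder: compress the window to a code
    layers.RepeatVector(window),                        # repeat the code for every output frame
    layers.LSTM(32, return_sequences=True),             # decoder
    layers.TimeDistributed(layers.Dense(channels)),     # reconstruct the acceleration signal
])
model.compile(optimizer="adam", loss="mse")

# X_train: normal windows, e.g. from make_training_windows(...) above
# model.fit(X_train, X_train, batch_size=50, epochs=5)
# score = ((model.predict(X_test) - X_test) ** 2).mean(axis=(1, 2))   # per-window error
```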
The approach with a convolutional layer based architecture has not been successful so far.
Fig. 4. Score (value of the loss function) over the current minibatch (x-axis), during
training of the VAE.
6 Discussion
Anomaly detection works well for both tested DNN architectures, but training of the VAE converges faster and to smaller loss function values, which can be an advantage.
These positive results should not hide the fact that a neural net application often needs more care and effort in its configuration than an explicitly formulated algorithm. Neural nets always carry the risk of learning hidden but unwanted rules, so-called overfitting. In practice this can be countered by a number of measures. One is carefully chosen architecture details: e.g. for the variational autoencoder used in this project, the number of hidden nodes is set higher than the number of input/output nodes, which helps a lot against overfitting. Furthermore, so-called data augmentation techniques can be used if the training data set is not diverse enough or too small. To prevent the DNN from learning the concrete paths of the training data as "normal", we cut the complete movement paths into pieces and create the training set from a random sequence of these pieces.
If the configuration is so sensitive, why use a neural net at all? The overrun of the cable channel produces a time window with spikes. With a simple threshold spike detector, anomaly detection could be achieved with less effort. Furthermore, this could have the additional advantage that the time threshold for spiky data considered anomalous can be defined explicitly, so that the concrete mobile platform is meaningfully affected. If only 1D acceleration data is available, this can be the better approach; a minimal sketch of such a detector is given below.
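The threshold values in the sketch are assumptions chosen for illustration.

```python
# Minimal threshold spike detector for 1D acceleration data (thresholds are assumptions).
import numpy as np

def spike_anomaly(acc, threshold=3.0, min_frames=5):
    """acc: z-normalized vertical acceleration; True if the magnitude exceeds the
    threshold for at least min_frames consecutive frames."""
    above = np.abs(acc) > threshold
    run = 0
    for flag in above:
        run = run + 1 if flag else 0
        if run >= min_frames:
            return True
    return False
```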
However, if multichannel data is available, e.g. from multiple 3D acceleration and other sensors in combination, and if the algorithm should be robust against single sensor dropouts, the DNN approach is more flexible. It is much easier to train a DNN with a different sensor configuration than to adjust thresholds for multiple sensors and to implement configuration-specific logic to make the system robust against dropouts.
The failure of our convolutional layer approach seems to be caused by a training data set that is too small.
References
1. Meyer J., Filliat D.: Map-based navigation in mobile robots: II A review of map-
learning and path-planning strategies. Cognitive Systems Research 4 283-317 (2003)
2. Gamboa, J. C. B.: Deep Learning for Time-Series Analysis. arXiv preprint
arXiv:1701.01887. (2017)
3. Tai L., Liu M.: Deep-learning in mobile robotics-from perception to control systems:
A survey on why and why not. arXiv:1612.07139. (2016)
4. Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., Shroff, G.:
LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint
arXiv:1607.00148. (2016)
5. Neto, H. V., Nehmzow, U.: Real-time automated visual inspection using mobile
robots. Journal of Intelligent and Robotic Systems, 49(3), 293-307 (2007)
6. Sofman, B., Neuman, B., Stentz, A., Bagnell, J. A.: Anytime online novelty and
change detection for mobile robots. Journal of Field Robotics, 28(4), 589-618 (2011)
7. Kingma, D. P., Welling, M.: Auto-encoding variational bayes. arXiv preprint
arXiv:1312.6114. (2013)
8. Rezende, D. J., Mohamed, S., Wierstra, D.: Stochastic backpropagation and approx-
imate inference in deep generative models. arXiv preprint arXiv:1401.4082. (2014)
9. Sölch, M., Bayer, J., Ludersdorfer, M., van der Smagt, P.: Variational infer-
ence for on-line anomaly detection in high-dimensional timeseries. arXiv preprint
arXiv:1602.07109. (2016)
10. Fabius, O., van Amersfoort, J. R.: Variational recurrent auto-encoders. arXiv
preprint arXiv:1412.6581 (2014)
11. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation,
9(8), 1735-1780 (1997)
12. Graves, A.: Supervised sequence labelling with recurrent neural networks. Studies
in Computational Intelligence 385 (2012)
13. Rettig, O., Müller, S., Strand, M., Katic, D.: Unsupervised Hump Detection for Mobile Robots Based on Kinematic Measurements and Deep-Learning Based Autoencoder. IAS-15 (http://www.ias-15.org) 2018 (submitted and accepted)
GPU GEMM-Kernel Autotuning for scalable
machine learners
Abstract. Deep learning (DL) is one of the key technologies in the artificial intelligence (AI) domain. Deep learning neural networks (DLNN) profit a lot from the overall exponential data growth, while on the other hand the computational effort for training and inference increases strongly. Most of the computational time in DLNN is consumed by the convolution step, which is based on a general matrix multiplication (GEMM). In order to accelerate the computation for DLNN, different highly optimized GEMM implementations for Graphics Processing Units (GPUs) have been presented in the last years [1]. Most of these approaches are GPU-hardware-specific implementations of the GEMM software kernel and do not incorporate the performance dependency on the layout of the training data. In order to achieve maximum performance, the parameters of the GEMM algorithm have to be tuned for the different GPU hardware and the specific data layout of the training task. In this paper we present a two-step autotuning approach for GPU-based GEMM algorithms. In the first step, the kernel parameter search space is pruned by several performance criteria; afterwards it is further processed by a modified Simulated Annealing in order to find the best kernel parameter combinations with respect to the GPU hardware and the task-specific data layout. Our results were obtained on 160 different input problems; with the proposed approach an average speedup of around 12 over the state-of-the-art implementation from NVIDIA (cuBLAS) can be achieved on an NVIDIA GTX 1080 Ti accelerator card.
1 Introduction
1.1 Motivation
Deep learning (DL) is one of the key technologies in the artificial intelligence (AI) domain. Deep learning neural networks (DLNN) profit a lot from the overall exponential data growth, while on the other hand the computational effort for training and inference increases strongly. Machine learning applications profit a lot from that overall data growth, since the models can be trained more precisely. However, the runtime of those algorithms depends heavily on the input data. Most of the computational time in DLNN is consumed by the convolution step, which is based on a general matrix multiplication (GEMM). In order to accelerate the computation for DLNN, different highly optimized GEMM implementations for Graphics Processing Units (GPUs) have been presented in the last years [1]. In order to achieve a high computational throughput, most of these approaches are based on a hardware-specific software kernel implementation of the GEMM algorithm. Usually the different hardware-dependent kernel parameters are tuned manually, which requires expertise about the specific GPU architecture. Furthermore, the performance of the GEMM kernel is strongly affected by the shape of the input data processed; different data sizes have a huge impact on the computational runtime of the GEMM kernel due to the different memory layouts of the GPU accelerators.
In order to achieve maximum performance, the parameters of the GEMM algorithm have to be tuned hardware- and task-specifically. In the last years, several autotuning approaches for GEMM kernel parameters have been proposed [2]; the basic idea is to automatically tune a limited number of essential GPU kernel parameters in order to achieve maximum performance. Usually these approaches do not take into account the size and shape of the given input data, which leads to varying computational runtimes.
The motivation of the presented work is to develop an autotuning procedure for GPU-based GEMM kernels which takes into account a comprehensive set of kernel parameters and varying shapes of the input data.
Proposed autotuning solutions such as [2] usually require a lot of computational runtime to find an optimal kernel parameter set. The kernel parameter space, e.g. in the MAGMA GEMM kernel [4], is very large; therefore restrictions are made to reduce the search space for the kernel parameters, followed by a brute-force search mechanism. This usually results in long search times for the kernel parameters to be set.
the performance of these concepts is not optimal. The work presented in this contribution is based on the well-known MAGMA GEMM kernel. The software implementation is characterized by an extensive GPU kernel parameter space. The MAGMA GEMM kernel has already been investigated in several autotuning approaches [2,12–16]. The original kernel implementation has been described in [12] and a first autotuning concept in [13]. With the introduction of the NVIDIA Fermi GPU architecture, the kernel implementation was revised [14] and an autotuning procedure was presented in [2]. That approach is characterized by a huge search space for the GPU kernel parameters in conjunction with a brute-force parameter search mechanism, which leads to a high computational effort for finding optimal kernel parameters. With respect to small GEMM operations, [15,16] presented approaches for batched GEMM operations, and [17] describes the utilization of the MAGMA GEMM kernel in machine learning procedures. The autotuning approach presented in [18] focuses on the energy efficiency of the GPU while processing GEMM operations.
Most of the presented state-of-the-art work is based on a brute-force approach for determining the optimal GEMM kernel parameters. This usually leads to a huge parameter search space, and therefore most of the approaches use a parameter combination pre-elimination step in order to reduce the computational effort. The different heuristics for reducing the search space can possibly dismiss optimal kernel parameter combinations. With respect to these suppositions, the presented work focuses on defining optimal heuristics to reduce the search space, in combination with a Simulated Annealing (SA) procedure to efficiently find optimally performing GEMM kernel parameters.
2 Solution
Optimal GPU kernel parameters strongly rely on the underlying GPU hardware architecture, the memory layout and the input data size; different settings lead to different optimal parameter combinations. Therefore the resulting search space for finding the optimal parameter combination can be enormous. Tuning the parameters by hand is impractical, since it has to be redone for every GPU architecture and every input data size. With respect to these suppositions, in the following sections we present a two-step autotuning approach for GPU-based GEMM algorithms. In the first step, the kernel parameter search space is pruned by several heuristic performance criteria, keeping well-performing parameter combinations for a set of different use cases. In the second step, based on a modified Simulated Annealing (SA) algorithm, the remaining parameter sets are further processed in order to find the best kernel parameter combinations with respect to the GPU architecture and the task-specific data layout.
The proposed autotuning approach is presented in the following sections: in section 2.1 a short overview of the MAGMA GEMM kernel is given, in section 2.2 we explain the developed heuristics for reducing the search space, and in section 2.3 the SA approach is introduced.
Blocksizes The blocksizes BLK_M, BLK_N and BLK_K define how many elements a thread block will calculate.
Subdimensions The subdimensions DIM_XA, DIM_XB, DIM_YA and DIM_YB determine how the Shared Memory (SMEM) is filled.
Algorithm 1: GEMM Kernel Algorithm (simplified)
Pre-eliminations
We started with reducing the viable thread counts, respectively the thread block dimensions. The thread block dimensions (DIM_X, DIM_Y) can only be 8, 16 or 32, resulting in 64, 256, 512 or 1024 threads. The GPU manufacturer NVIDIA recommends using a minimum of 64 threads [20], which is the lower limit we are applying; the upper limit is given by the hardware specification of the GPU. Other configurations will not map onto the GPU hardware.
Utilization criteria
The idea behind this approach is to make use of the latency hiding principle of the GPU explained in [21]. Basically, when the GPU chip loads data from the off-chip Global Memory (GMEM), it will pause the corresponding warp, which is a bundle of 32 threads. The GPU will schedule another warp while the previous one is waiting. Typically, loading data from GMEM takes many hundreds of GPU cycles, so latency hiding is essential for performance. To enable latency hiding, it is essential that GPU kernels keep enough warps available so the GPU can switch between contexts while loading data.
The number of available warps on the GPU is described by the utilization. The utilization is limited by the available SMEM and the number of registers (REG) used by the GPU kernel itself. Based on these resources, the upper limit of the achievable utilization can be calculated. The resource consumption and the maximum utilization can be determined by analysing the kernel source code - a similar approach can be found in [2]. It is important to note that the presented work measures the utilization in warps per Streaming Multiprocessor (SM); since the GPU schedules everything in warps, this seems to be a reasonable approach. Furthermore, we are forcing similar utilization levels of SMEM and REG. This constraint avoids parameter combinations which heavily utilize one resource while barely utilizing the other one. Parameter combinations which are heavily limited in utilization due to REG suffer from poor performance, as do those which are heavily limited through SMEM. Parameter combinations which are heavily limited in utilization due to SMEM keep too few entries of the result matrix for the utilization they achieve; therefore, data has to be loaded from GMEM more frequently than necessary. Parameter combinations which are highly restricted by REG keep too little data to read for achieving faster times; therefore, they have to load and wait more frequently.
Efficiency criteria
The presented work introduces a further criterion for finding optimal kernel parameters: the efficiency criterion describes how long a parameter combination can work until data has to be reloaded from GMEM. The efficiency criterion is calculated from the kernel source code by the equations given in (1) to (3).
• Equation 1 describes how often data is loaded from SMEM, minus how often data is loaded from GMEM.
• Equation 2 describes how often data is read from SMEM compared to loading data from GMEM.
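To make the two-step idea tangible, the following toy sketch prunes a kernel-parameter space with simplified criteria and then runs a simulated-annealing style search. The parameter names follow the text (DIM_*, BLK_*), but the pruning rules shown are simplified illustrations rather than the exact criteria of this work, and the runtime function is a placeholder for a real kernel benchmark.

```python
# Toy two-step autotuning sketch: heuristic pruning followed by simulated annealing.
import itertools
import math
import random

def candidates():
    dims = [8, 16, 32]
    blks = [16, 32, 64, 96, 128]
    for dim_x, dim_y, blk_m, blk_n, blk_k in itertools.product(dims, dims, blks, blks, blks):
        threads = dim_x * dim_y
        if not 64 <= threads <= 1024:            # thread-count pre-elimination (lower limit 64)
            continue
        if blk_m % dim_x or blk_n % dim_y:       # simplified divisibility heuristic (assumption)
            continue
        yield (dim_x, dim_y, blk_m, blk_n, blk_k)

def anneal(space, runtime, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Search the pruned space for the parameter set with the lowest measured runtime."""
    rng = random.Random(seed)
    current = rng.choice(space)
    current_cost = runtime(current)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        neighbor = rng.choice(space)             # simple neighborhood: random re-draw
        cost = runtime(neighbor)
        if cost < current_cost or rng.random() < math.exp((current_cost - cost) / t):
            current, current_cost = neighbor, cost
            if cost < best_cost:
                best, best_cost = neighbor, cost
        t *= cooling
    return best, best_cost

# Example with a dummy cost standing in for a real kernel benchmark:
# space = list(candidates())
# print(anneal(space, runtime=lambda p: abs(p[2] * p[3] - 4096), steps=300))
```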
3 Performance Evaluation
The performance evaluation of the proposed work is based on an NVIDIA Pascal GPU (MSI GeForce GTX 1080 Ti Aero 11G OC) in combination with an Intel Xeon E5-1620 host system with 96 GB memory. The operating system is 64-bit Windows with NVIDIA driver version 390.65 and CUDA 8. To evaluate the performance of the proposed approach, different data sets are used - Table 1 gives an overview of the different matrix shapes used for evaluation. These matrix shapes have been chosen because cuBLAS has proven to perform very well on them. An evaluation test consists of three matrices A, B and C with formats M x K, K x N and M x N, where M, N, K ∈ ℕ. Additionally, in order to illustrate the flexibility of the proposed approach, several other matrix shapes have been evaluated. The results of the performance evaluation are shown in Figure 1 and Table 2. Figure 1 shows the achieved speedups over cuBLAS with respect to the matrix shapes. It can be seen that the larger N is, the lower the performance speedup. In the worst case the result of the proposed approach is 1.3 times faster than the highly optimized cuBLAS routine; in the best case the speedup is 187 times.
Table 2 shows a comparison between the best solutions found with a standard brute-force approach and those found with the SA-based approach proposed in this work. Finding optimal kernel parameters with the proposed SA approach is nearly five to six times faster than with the standard brute-force approach, while the performance loss for GEMM kernel execution is at most 10 %.
[Figure 1: speedup over cuBLAS on a logarithmic axis (1.0 to 1000.0) for the matrix formats M=25, M=0.5*N, M=N and M=5*N; x-axis: format of the matrices [M x K, K x N, M x N].]
Fig. 1. Comparison of the speedup over cuBLAS achieved with the brute-force approach on the examples from Algorithm 2. The minimum speedup was 1.3, the maximum was 187 times as fast as cuBLAS. The average was 12.3, compared to 11.9 for the Simulated Annealing approach. The figure shows that with an increasing size of N compared to M the speedup decreases, but there was no negative speedup in this test, so the results are always faster than the calculation with cuBLAS.
4 Conclusion
5 Acknowledgements
References
1. Bergstra, J., Bastien, F., Breuleux, O., Lamblin, P., Pascanu, R., Delalleau, O., Desjardins, G., Warde-Farley, D., Goodfellow, I., Bergeron, A., et al.: Theano: Deep learning on GPUs with Python. NIPS 2011, BigLearning Workshop, Granada, Spain
2. Kurzak, J., Tomov, S., Dongarra, J.: Autotuning GEMMs for Fermi. 2011
3. Garcia, V., Debreuve, E., Barlaud, M.: Fast k nearest neighbour search using GPU. Available at: http://vincentfpgarcia.github.io/kNN-CUDA/, accessed 13.06.2018
4. MAGMA project page, http://icl.cs.utk.edu/magma/, accessed 13.06.2018
1 Introduction
One of the goals of Industry 4.0 is the optimization and customization of production processes through digitization with algorithms, big data approaches and high technologies [1]. Currently, machine learning (ML) approaches support monitoring, diagnosis and (off-line) system optimization for fault detection, maintenance, decision support and product quality improvement [2,3]. The field of ML is manifold and various different methods are available. However, in manufacturing and other fields of application, the complexity of ML methods can hinder their adoption, even though data acquisition is possible for many production processes and a sufficient database is available or can be obtained. Therefore, this work aims to implement a simple ML and optimization approach for a production line. The paper starts with a discussion of work related to ML and process control in Section 2, followed by the presentation of the methodology in Section 3, which includes a description of the data sets, the data preparation, and the estimation techniques. The results of the analysis are described in Section 4. Section 5 presents the conclusions and a discussion of practical implications.
We consider a production line for the press hardening of sheet metal in order to produce center pillars, which are ultra-high-strength car body parts. Here, we focus on the three process steps warming, handling and quenching, see Figure 1. The process involves inserting sheets, which have been heated beyond the austenitizing temperature of about 900 °C, into a cooled forming tool, in which they are then quenched. The thermally integrated processing produces press-hardened parts with an extremely high tensile strength of up to 1,500 MPa for the ultra-high-strength steel 22MnB5. The handling of the sheets is done by robots.
Fig. 1. Production line for the press hardening of sheet metal focusing on the three
steps: (1) warming in a furnace unit, (2) handling with a robot system with grippers,
and (3) quenching.
Fig. 2. Production line with three process steps and their respective controllable and
uncontrollable variables. Linear regression is conducted based on the existing database.
After the warming process is finished, parameter optimization for the process steps
handling and quenching is possible.
ML model to describe the relationship between the input and output parameters. Upper and lower boundaries for the allowed input parameter variations are defined as stated in Table 1. Boundaries for the quality criteria have to be defined as well; these depend on the type of component that is produced. The focus can be on maximum component hardness or, for example, on the maximum thickness of the finished component. As we focus on a part from the automotive industry, we want to increase both the sheet thickness and the hardness at critical points which are prone to tearing. Thus, no upper boundaries for P1H and P2ST were defined.
Table 1. Process parameters, quality criteria, and regression coefficients for the esti-
mation of P1H and P2ST.
Description of the Model Ultimately, we aim for on-line process control, which makes the application of high-speed models and fast predictions necessary. As a first step – conducted off-line – we need to describe the relationship between the input and output variables in a distinguishable way. A general linear model which accounts for the single parameters' linear effects was considered. In general, a linear regression equation has the following form

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$$

where y is the quality criterion, x_1, ..., x_p are the process parameters, β_0, ..., β_p are the regression coefficients and ε is the error term.
Validation of the Model The regression analysis indicates that STemp, TT, ToTemp, QF, QT and Sp have a significant influence on P1H, which is confirmed by the p-values (no significant influence of ST). The overall suitability of a linear regression approach is supported by an adjusted R2 of 0.90, which describes the percentage of the dependent variable's variation explained by the model. P2ST can be thoroughly described by linear combinations of ST, TT, ToTemp, QF, QT and Sp (no significant influence of STemp) with an adjusted R2 of 0.99.
Since the total number of observations is limited and a partition into training and test data is not sensible without losing significant modeling capability, the models were validated with K-fold cross-validation. For K = 5, the overall mean square of the prediction error is 97.6 for the linear model (compared to 102 for the complete model with all variables) when predicting P1H, and 3.87 × 10−6 for the prediction of P2ST (compared to 6.14 × 10−6 for the complete model). This indicates reasonably good linear models despite the limited number of observations, which will be increased in the future.
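An equivalent sketch of this fitting and validation step is given below in Python (the paper itself references R [8, 9] for its analysis); the data file name is a placeholder, and the column names follow the parameter abbreviations used above.

```python
# Illustrative linear model with 5-fold cross-validation (file name is a placeholder).
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

df = pd.read_csv("press_hardening_runs.csv")           # hypothetical export of the observations
features = ["STemp", "ST", "TT", "ToTemp", "QF", "QT", "Sp"]
X, y = df[features], df["P1H"]                          # predict the hardness criterion P1H

model = LinearRegression()
cv = KFold(n_splits=5, shuffle=True, random_state=0)
mse = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
print("mean squared prediction error per fold:", mse)
```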
lower boundaries of the adjustable variables and the linear regression equations combined with the quality boundaries. The optimization after process step 2 is conducted in a similar way, but only 4 adjustable variables remain.
Sometimes the quality prognosis after process step 1 indicates that the produced part will not meet the final product quality requirements. Given that the prognosis is accurate, this is very valuable information this early in the production line, because defective parts can be removed early in the production process, with the additional benefit of cost and energy savings. Table 4 shows an example where after process step 2 no parameter adjustment is possible without violating the constraints: P1H and P2ST will both be too low no matter how the process parameters in process 3 are altered.
A Process Model for Enhancing Digital Assistance in
Knowledge-Based Maintenance
Klaudia Kovacs1,2*, Fazel Ansari1,2, Claudio Geisert3, Eckart Uhlmann3,4,
Robert Glawar2, Wilfried Sihn1,2
1 Vienna University of Technology (TU Wien), Institute of Management Science, Vienna, Austria
2 Fraunhofer Austria, Division of Production & Logistics Management, Vienna, Austria
3 Fraunhofer Institute for Production Systems and Design Technology IPK, Berlin, Germany
4 Institute for Machine Tools and Factory Management, TU Berlin, Berlin, Germany
1 Introduction
1.1 Digital Assistance in Knowledge-Based Maintenance
Maintenance is a knowledge-intensive process in which the process participants (organizations or groups of individuals involved in the maintenance process and sub-process(es), either as internal or external stakeholders) create, (re)use, and share specialized professional knowledge, while enriching their implicit and experiential knowledge.
Considering maintenance organization as a learnable unit, it encompasses the creation,
Considering the discussion above, this paper presents an approach to improve maintenance efficiency through DAS, using a morphological approach for the proper hardware selection combined with a process-modeling tool that provides the adequate information to fulfill the maintenance task at hand. The goal of the proposed process model is to systematically identify functionalities of the emerging technologies on the market and match these functionalities to requirements in order to find appropriate assistance systems for various industrial applications. Therefore, an overview of present digital assistance solutions is given and a morphological approach for the elicitation of derived requirements on digital assistance solutions is presented.
Fig. 1. Overview of digital assistance systems on the market and their market entrance.
As a result, the four most common DAS in industrial applications are: industrial tablets, smartwatches, smartphones and head-mounted displays [12], [17], [18], [19]. While the pros and cons of handheld devices (industrial tablets, smartwatches, smartphones) are well known and elaborated in the literature, the potential of head-mounted displays is disputed. The most value-creating functionalities of head-mounted displays lie in information provision, environmental identification and tracking [6]. The opportunity to access information hands-free provides additional benefits. However, due to various technical limitations and challenges, such as wearing comfort or poor wireless network connections, the question of usefulness in maintenance still arises.
2 Selection Methodology
This section explains the methodology of the developed model to select proper DAS
for maintenance tasks. The proposed model builds on three integrated elements (see
Fig. 2): i) Morphological Approach, ii) Application Layer and iii) Device Selection
Layer.
First, the problem complexity is categorized into several dimensions. Second, all possible conditions (also referred to as parameters) for each dimension are identified. These parameters represent the characteristics of each dimension. Finally, a morphological matrix is developed based on the identified dimensions and their assigned condition parameters [22]. Figure 3 depicts a morphological matrix, which contains a collection of identified features that are critical for selecting an assistance system. Key features for an adequate assistance system can be categorized into three groups: i) requirements regarding the application (software): how and to what extent is maintenance information presented to maintenance operators and engineers in order to increase their performance in an affordable manner? ii) requirements regarding the information system: how and to what extent is maintenance information tailored to the application? iii) requirements regarding the hardware: which hardware should be applied for the selected case?
Fig. 5. Schematic software architecture for a context sensitive digital assistance system.
To model business processes, the method of integrated enterprise modeling (IEM)
was developed in the 1990s at the Fraunhofer IPK [23]. The application of the IEM
supports the description of business processes and their interactions with description
elements of companies, such as organization, system, product or control. It is compati-
ble with DIN EN ISO 19440 "Enterprise Integration - Constructs for Enterprise Mod-
elling" and describes four element classes that can be related by five connection types.
Table 1 shows a selection of element classes and connection types which are needed to
model maintenance processes. The graphical modeling tool MO²GO[24], also devel-
oped at Fraunhofer IPK, is well suited to model the maintenance processes and forms
the basis for the implementation of DAS [25]. MO²GO supports the XML (eXtensible Markup Language) exchange format, which is suitable for exchanging data between different applications. For the process step representation in a graphical user interface (GUI) of a digital assistance system, MO²GO offers an interface to provide the XML format of the process model as a Java object representation. The elements and their connections are then converted to JSON format and interpreted by an application programming interface (API) to link resources, generate context-sensitive instructions and to initialize support functions on the maintained system during the various process steps. This JSON representation is then transformed to the web-capable HTML5 format in which JavaScript is embedded to realize human-machine interaction.
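The conversion chain described above (XML process model to JSON steps for the GUI) could in principle be sketched as follows; the element and attribute names used here are assumptions for illustration and do not reflect the actual MO²GO export schema or the Java-based implementation.

```python
import json
import xml.etree.ElementTree as ET

def process_model_to_json(xml_path):
    """Convert an exported process model into a JSON step list for the GUI.
    'Activity', 'Resource' and 'name' are assumed names, not the MO2GO schema."""
    root = ET.parse(xml_path).getroot()
    steps = []
    for activity in root.iter("Activity"):
        steps.append({
            "name": activity.get("name"),
            "resources": [r.get("name") for r in activity.findall("Resource")],
        })
    return json.dumps({"steps": steps}, indent=2)
```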
Table 1. Excerpt of IEM classes and connection types used for maintenance process modeling.
Objects and linkage types / Description
Product (Produkt): All objects that are changed by activities during a field service deployment, e.g. the product "machine tool with failure" (start condition) is changed by the activity "performing service deployment" towards the product "machine tool without failure" (final condition)
Activity (Aktion): Changes the condition of a product
Resource (Ressource): All objects necessary to perform an activity, e.g. a web service call to invoke a test routine on the machine tool
Parallel branching: Activities are performed in parallel; all parallel activities have to be completed before the next activity can be started
Loop (X=0 / X=1): The activity in the loop is performed until the condition for starting the next activity is met
Fig. 6 Pictorial representation of a need for action and textual explanation of the activity com-
bined with a pictorial representation of the tool and the object to be exchanged.
see a complete virtual model of the equipment and the needed information to fulfill the maintenance task to the right. The MO²GO model is used to provide the logic and information for the augmented reality (AR) based assistance system and to guide the worker through eight process steps.
Acknowledgement
The authors would like to acknowledge the financial support of the European Com-
mission provided through the H2020 project EPIC under the grant No. 739592. The TU
Wien Pilot Factory Industry 4.0 has been partly funded by the public through the Aus-
trian Research Promotion Agency (FFG) and several private industrial firms – our part-
ners in the project.
References
1. Ansari, F.: Meta-Analysis of Knowledge Assets for Continuous Improvement of Maintenance
Cost Controlling. Faculty of Science and Technology, University of Siegen (2014).
2. Nemeth, T., Ansari, F., Sihn, W., Haslhofer, B., Schindler, A.: PriMa-X: A Reference Model for
Realizing Prescriptive Maintenance and Assessing its Maturity Enhanced by Machine Learning.
Procedia CIRP, Vol. 72, pp. 1039-1044. (2018).
3. Glawar, R., Karner, M., Nemeth, T., Matyas, K., Sihn, W.: An Approach for the Integration of
Anticipative Maintenance Strategies within a Production Planning and Control Model. Procedia
CIRP 67 46 – 51, (2018).
4. Hao, Y., & Helo, P.: The role of wearable devices in meeting the needs of cloud manufacturing:
A case study. Robotics and Computer-Integrated Manufacturing, 45. Jg., S. 168-179. (2017).
5. Kernchen, A., Jachmann, D., Adler, S..: Assistenzsysteme für die Instandhaltung und Störungs-
behebung. 21. Magdeburger Logistik Tage. Logistik neu denken und gestalten. S.195. (2016).
6. Niemöller, C., Metzger, D., Fellmann, M., Özcan, D., Thomas, O.: Shaping the future of mobile
service support systems-ex-ante evaluation of smart glasses in technical customer service pro-
cesses. Informatik 2016, (2016).
7. Erkoyuncu, J. A., del Amo, I. F., Dalle Mura, M., Roy, R., Dini, G.: Improving efficiency of
industrial maintenance with context aware adaptive authoring in augmented reality. CIRP Annals
66.1. 465-468. (2017).
8. Uhlmann E., Raue N., Geisert C.: Unterstützungspotenziale der Automatisierungstechnik im tech-
nischen Kundendienst. Summary of an explorative survey on best pactices in field service. Berlin:
Fraunhofer IPK, (2013).
9. Mourtzis, D., Zogopoulos, V., Vlachou, E.: Augmented reality application to support remote
maintenance as a service in the Robotics industry. Procedia CIRP 63: 46-51. (2017).
10. Neges, M., Wolf, M., Abramovici, M.: Secure access augmented reality solution for mobile
maintenance support utilizing condition-oriented work instructions. Procedia CIRP, 38, 58-62.
(2015).
11. Palmarini, R., Erkoyuncu, J., Rajkumar, R..: An innovative process to select Augmented Reality
(AR) technology for maintenance. Procedia CIRP 59: 23-28 (2017).
12. Palmarini, R., Erkoyuncu, J. A., Roy, R., Torabmostaedi, H.: A systematic review of augmented
reality applications in maintenance. Robotics and Computer-Integrated Manufacturing 49: 215-
228. (2018).
13. Hold, P., Erol, S., Reisinger, G., & Sihn, W.: Planning and Evaluation of Digital Assistance Sys-
tems. Procedia Manufacturing 9:143-150. (2017).
14. Reisinger, G., Komenda, T., Hold, P., & Sihn, W.: A Concept towards Automated Data-Driven
Reconfiguration of Digital Assistance Systems. Education & Training 2351: 9789. (2018).
15. Hohwieler E, Geisert C.: Intelligent Machines Offer Condition Monitoring and Maintenance Pre-
diction Services. In: Teti R, editor. Proceedings of the 4th CIRP International Seminar on Intelli-
gent Computation in Manufacturing Engineering (CIRP ICME ’04). 30 June - 2 July 2004, Sor-
rento, Italy; pp. 599-604. (2004).
16. Hohwieler E, Berger R, Geisert C.: Condition Monitoring Services for e-Maintenance. In: Za-
remba M, Sasiadek J, Erbe HH, editors. A proceedings volume from the 7th IFAC Symposium,
Gatineau, Québec, Canada, 6-9 June 2004. Oxford: Elsevier pp. 239-244. (2005).
17. Ziegler, J., Heinze, S., Urbas, L.: The potential of smartwatches to support mobile industrial
maintenance tasks. Emerging Technologies & Factory Automation (ETFA), IEEE 20th Confer-
ence on. IEEE, (2015).
18. Bokrantz, J., Skoogh, A., Berlin, C., & Stahre, J.: Maintenance in digitalised manufacturing: Del-
phi-based scenarios for 2030. International Journal of Production Economics, 191, 154-169.
(2017).
19. Hold, P., Ranz, F., Hummel, V., Sihn, W..: Durchblick im Variantendschungel: visuelle Assis-
tenzsysteme als Flexibilitätshebel auf dem Shop Floor (2015).
20. Ritchey, T.: Modeling alternative futures with general morphological analysis. World Future Re-
view, 3(1), 83-94. (2011).
21. Ritchey, T.: Problem structuring using computer-aided morphological analysis. Journal of the
Operational Research Society, 57(7), 792-801. (2006).
22. Im, K., Cho, H.: A systematic approach for developing a new business model using morphological
analysis and integrated fuzzy approach. Expert Systems with Applications, 40(11), 4463-4477.
(2013).
23. Spur, G.; Mertins, K.; Jochem, R.: Integrated Enterprise Modelling. Berlin, Wien, Zürich: Beuth.
(1996).
24. Mertins K, Jaekel FW.: MO²GO: User Oriented Enterprise Models for Organisational and IT So-
lutions. In: Schmidt G, Mertins K, Bernus P, editors. Handbook on architectures of information
systems. Berlin, New York: Springer p. 649-663. (2006).
25. Uhlmann, E.; Geisert, C.; Raue, N.; Gabriel, C.: Situation Adapted Field Service Support Using
Business Process Models and ICT Based Human-Machine-Interaction. Procedia CIRP 47, p. 240-
245. (2016).
Detection of Directed Connectivities in Dynamic
Systems for Different Excitation Signals using
Spectral Granger Causality
1 Introduction
Process control systems at production plants usually consist of a large number
of process variables, while the interconnectivity of the variables is not always
directly evident. Hence, due to the interconnectivity, if some change, intentional or not, is performed on one unit, this can lead to unwanted effects at another unit. Therefore, it is of great interest to understand which variable has
a significant influence on another variable.
A wide variety of methods already exists for the automatic detection of directed connectivities in time series, mainly developed for use in neuroscience (e.g. [3] or [1] for reviews) or for the analysis of econometric data [9]. One of the first methods was developed by Granger [8] and is called the Granger Causality. This method uses two vector autoregressive models and, by comparing their residual sums of squares, tells whether one variable causes the other or not. The original approach, taking place in the time domain,
was extended by Geweke [7] into the spectral domain, which has the advantage that specific frequencies can be selected for the analysis. In 2000, Schreiber [14] developed a method
called Transfer Entropy, which measures the amount of information transferred
from one random process to another. In recent research, Transfer Entropy has
been extended, for example by Partial Transfer Entropy [11] or Symbolic Transfer Entropy [13]. Bauer [2] proposes a Nearest-Neighbor approach for
cause-effect analysis. In [12] different methods for the detection of significant
directed influences were developed and compared on several benchmarks, consisting of simulated dynamic system data, biosignals and disturbances from a glass forming process. Kaminski [10] proposes the estimation of directed transfer functions.
The aim of this paper is to investigate under which circumstances it is possible to detect directed influences in measurements, depending on the excitation signal as well as on the underlying dynamic system. As the specific detection method, Spectral Granger Causality [7] is used, which is extended with a surrogate-based significance test. In contrast to [12], which already defines benchmark processes for the detection of causal dependencies, the current paper focuses more on the characteristics of the excitation signal.
The paper is structured as follows: Section 2 introduces how directed connectivi-
ties can be detected in time series and how Spectral Granger Causality is applied.
Additionally, the surrogate-based calculation of the significance threshold is ex-
plained. Section 3 describes the defined input signals and dynamic systems for
benchmarking, while section 4 discusses the results. Finally, section 5 gives a
summary and some ideas for future research.
In equations (1) and (2), n is the model order, a_uu, a_uy, a_yu, a_yy ∈ R^n contain the model coefficients and e_u[k], e_y[k] ∈ R define the residuals. Finally, GC checks the coefficients in a_yu (respectively a_uy). If these are significantly different from zero, it is assumed that u causes y (respectively y causes u). Usually, this is done by comparing the sum of squared residuals of e_u (respectively e_y) with and without taking into account the influencing variable y (respectively u).
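A minimal sketch of this residual-comparison idea in the time domain is shown below (ordinary least-squares fits of the restricted and the full autoregressive model); it only illustrates the principle and is not the implementation used in this paper.

```python
import numpy as np

def residual_variance(target, regressors, order):
    """Least-squares fit of target[k] on lags 1..order of each regressor;
    returns the variance of the residuals."""
    N = len(target)
    rows = []
    for k in range(order, N):
        row = []
        for x in regressors:
            row.extend(x[k - order:k][::-1])    # x[k-1], ..., x[k-order]
        rows.append(row)
    X = np.asarray(rows)
    y = target[order:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (y - X @ beta).var()

def granger_time_domain(u, y, order=5):
    """Log ratio of residual variances: restricted model (y only) vs. full model
    (y and u). Values clearly above zero suggest 'u causes y'."""
    var_restricted = residual_variance(y, [y], order)
    var_full = residual_variance(y, [y, u], order)
    return np.log(var_restricted / var_full)
```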
$$\begin{pmatrix} A_{uu}(f) & A_{uy}(f) \\ A_{yu}(f) & A_{yy}(f) \end{pmatrix}\begin{pmatrix} u(f) \\ y(f) \end{pmatrix} = \begin{pmatrix} e_u(f) \\ e_y(f) \end{pmatrix} \qquad (4)$$
with u(f) and y(f) being the Fourier transformed time series of u[k] and y[k], and e_u(f), e_y(f) the Fourier transforms of e_u[k] and e_y[k]. The components of A are then transformed as
$$A_{uu}(f) = 1 - \sum_{k=1}^{n} a_{uu}(k)\, e^{-i 2\pi f k} \qquad (5)$$

$$A_{uy}(f) = - \sum_{k=1}^{n} a_{uy}(k)\, e^{-i 2\pi f k} \qquad (6)$$
which holds analogously for A_yu(f) and A_yy(f). Finally, equation (4) can be rewritten as
$$\begin{pmatrix} u(f) \\ y(f) \end{pmatrix} = \begin{pmatrix} H_{uu}(f) & H_{uy}(f) \\ H_{yu}(f) & H_{yy}(f) \end{pmatrix}\begin{pmatrix} e_u(f) \\ e_y(f) \end{pmatrix} \qquad (7)$$
with H(f ) defining the transfer function matrix. Following Geweke [7], under
the assumption that the covariance Σuy = 0, the auto spectrum Suu (f ) for the
time series u[k] can be derived as
$$S_{uu}(f) = H_{uu}(f)\,\Sigma_{uu}\,H_{uu}(f)^{*} + H_{uy}(f)\,\Sigma_{yy}\,H_{uy}(f)^{*} \qquad (8)$$
The asterisk in equation (8) denotes the transposed and complex conjugated transfer function. According to Seth [15], equation (8) can finally be divided into an intrinsic part, namely H_uu(f) Σ_uu H_uu(f)*, and a causal part, namely H_uy(f) Σ_yy H_uy(f)*.
Hence, the Granger Causality for each frequency can be calculated as
$$f_{u \to y}(f) = \ln \frac{|S_{uu}(f)|}{\left|S_{uu}(f) - H_{uy}(f)\,\Sigma_{yy}\,H_{uy}(f)^{*}\right|}.$$
Finally, the causal strength Fu→y is calculated by integrating over the complete
frequency spectrum being defined as
$$F_{u \to y} = \frac{1}{2\pi}\int_{0}^{2\pi} f_{u \to y}(f)\, df \qquad (9)$$
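Assuming the spectral quantities S_uu(f), H_uy(f) and Σ_yy have already been estimated on a frequency grid (e.g. from a fitted VAR model), the frequency-wise measure and its integral in equation (9) could be evaluated roughly as follows; scalar signals and a trapezoidal integration are assumed, and all names are placeholders.

```python
import numpy as np

def spectral_gc_strength(S_uu, H_uy, Sigma_yy, freqs):
    """f_{u->y}(f) = ln(|S_uu(f)| / |S_uu(f) - H_uy(f) Sigma_yy H_uy(f)*|),
    averaged over the frequency grid `freqs` as in equation (9)."""
    causal_part = np.abs(H_uy) ** 2 * Sigma_yy       # H_uy * Sigma_yy * conj(H_uy)
    f_uy = np.log(np.abs(S_uu) / np.abs(S_uu - causal_part))
    # Normalized integral over the frequency axis (trapezoidal rule).
    return np.trapz(f_uy, freqs) / (freqs[-1] - freqs[0])
```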
2.2 Threshold
The causal strength F_{u→y} defined in equation (9) is not bounded, meaning that from the bare value alone it is not possible to tell whether a causal dependency is really significant or not. Therefore, a threshold needs to be calculated each time an input u is tested against a possible output y. Following Choudhury [5], a surrogate time series needs to be calculated for u, where surrogate means that the phase coupling is removed while the signal keeps the same power spectrum. In other words, all causal information is removed from the signal. To calculate the surrogate of u, the following steps need to be performed:
$$u_{\mathrm{FFT}} = \mathrm{FFT}(u)$$

$$u_{\mathrm{FFT}}^{\mathrm{surr}}[k] = \begin{cases} u_{\mathrm{FFT}}[k] & k = 1,\ N/2+1 \\ u_{\mathrm{FFT}}[k]\, e^{j\Phi_{k-1}} & k = 2, \ldots, N/2 \\ u_{\mathrm{FFT}}[k]\, e^{j\Phi_{k-1}} & k = N/2+2, \ldots, N \end{cases}$$

$$u^{\mathrm{surr}} = \mathrm{IFFT}\!\left(u_{\mathrm{FFT}}^{\mathrm{surr}}\right)$$
with FFT being the Fourier transform and IFFT being the inverse Fourier transform. Here, N describes the number of samples and Φ_k ∈ [0, 2π] with k = 1, . . . , N/2 − 1 is a random phase value.
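A possible numpy sketch of this surrogate construction is given below; it differs from the listing above only in that the upper half of the spectrum is mirrored with conjugate phases, a common choice to keep the inverse transform real-valued.

```python
import numpy as np

def phase_randomized_surrogate(u):
    """Surrogate of u: identical power spectrum, randomized phases, i.e. the
    causal information is destroyed. N is assumed to be even."""
    rng = np.random.default_rng()
    N = len(u)
    U = np.fft.fft(u)
    phases = rng.uniform(0.0, 2.0 * np.pi, N // 2 - 1)
    U_surr = U.copy()
    U_surr[1:N // 2] = U[1:N // 2] * np.exp(1j * phases)
    # Mirror the upper half with conjugate phases so the surrogate stays real.
    U_surr[N // 2 + 1:] = np.conj(U_surr[1:N // 2][::-1])
    return np.fft.ifft(U_surr).real
```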
The final threshold is derived in terms of a 3σ test, defined as

$$F_{u \to y}^{\mathrm{Threshold}} = \mu^{\mathrm{surr}} + 3\,\sigma^{\mathrm{surr}}$$

with

$$\mu^{\mathrm{surr}} = \frac{1}{M}\sum_{m=1}^{M} F_{u^{\mathrm{surr}} \to y}, \qquad \sigma^{\mathrm{surr}} = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\left(F_{u^{\mathrm{surr}} \to y} - \mu^{\mathrm{surr}}\right)^{2}}$$
with M being the number of surrogate trials. If the outcome indicates $F_{u \to y} > F_{u \to y}^{\mathrm{Threshold}}$, the found causal dependency is defined as being significant.
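Building on such a surrogate generator, the 3σ threshold test could be sketched as follows, where strength_fn stands for any estimator of the causal strength F_{u→y} (e.g. the spectral Granger causality described above) and the number of surrogate trials M is a free parameter.

```python
import numpy as np

def significance_threshold(u, y, strength_fn, n_surrogates=19):
    """3-sigma surrogate threshold for F_{u->y}; n_surrogates corresponds to M."""
    values = np.array([
        strength_fn(phase_randomized_surrogate(u), y)
        for _ in range(n_surrogates)
    ])
    return values.mean() + 3.0 * values.std()

# A dependency u -> y is reported as significant if
#   strength_fn(u, y) > significance_threshold(u, y, strength_fn)
```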
3 Benchmarks
For the detection of directed connectivities in time series two things are impor-
tant, namely the characteristics of the excitation signal and the underlying pro-
cess behavior. Hence, this section proposes several possible input signals (section
3.1) and several dynamic SISO systems (section 3.2). Next, the Spectral Granger
Causality is used to detect the input and output signal for each pair.
White Noise - A white-noise time series is a sequence of uncorrelated random variables with constant mean μ and variance σ². In the
following, the input time series uwn [k] ∈ R is modeled as a stochastic process
with μ = 0 and σ 2 = 1.
Sawtooth Wave - This time series can be interpreted as some sort of drift, e.g. when sensors slowly become polluted. For the sawtooth wave the input series u_sw[k] ∈ R is defined as u_sw[k] = frac(k/T + Φ) with a period of T = 100, the phase Φ = 0 and frac being the fractional part, defined as frac(x) ≡ x − ⌊x⌋.
Impulse train - A so-called impulse or spike train occurs when, e.g., an inert gas or fluid injection into a process takes place at a predefined cycle. Therefore, the input time series u_it[k] ∈ R is defined as $u_{it}[k] = \sum_{m=0}^{N/K-1} \delta[k - mK]$ with K | N, δ being a Dirac impulse, N representing the length of the time series and K the period. In the following, the period K is set to 100.
Random Walk - The time series of a random walk is defined as a process where
the value at sample point [k] is composed of the past value [k − 1] plus an error
term defined as white noise. In this paper the random walk is used to investigate how the used method behaves on low-frequency changes in a process, e.g. when having a fluctuation of some concentration in a fluid. Therefore, the input time series u_RW[k] ∈ R is defined as u_RW[k] = u_RW[k − 1] + ε[k], where ε[k] is a white-noise sequence with μ = 0 and σ² = 0.1.
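For reference, the four excitation signals can be generated in a few lines; the series length N is an assumption, the remaining parameters follow the definitions above.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000                                              # number of samples (assumed)

u_wn = rng.normal(0.0, 1.0, N)                        # white noise, mu = 0, sigma^2 = 1
T, phi = 100, 0.0
u_sw = np.mod(np.arange(N) / T + phi, 1.0)            # sawtooth: fractional part of k/T + phi
K = 100
u_it = np.zeros(N); u_it[::K] = 1.0                   # impulse train with period K
u_rw = np.cumsum(rng.normal(0.0, np.sqrt(0.1), N))    # random walk, sigma^2 = 0.1 per step
```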
Fig. 1. Investigated excitation signals in the time domain and their corresponding
power spectra.
Low-pass filter - The low pass filter with the time constant T = 1 s represents
the most basic system for the detection of input and output signal. In process
technology, low-pass filters are, e.g., fluid tanks or pipes, which tend to attenuate a disturbance and hence sometimes make it complicated to track back the disturbance propagation path. This benchmark is mainly used to investigate the behavior regarding the input signals defined in section 3.1.
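As a stand-in for this benchmark system, a discretized first-order low pass with T = 1 s can be simulated as follows; the sampling rate is an assumption.

```python
import numpy as np
from scipy.signal import lfilter

def first_order_lowpass(u, T=1.0, fs=10.0):
    """Simple discretization of G(s) = 1 / (T s + 1), sampled at fs Hz.
    Implements y[k] = (1 - a) * y[k-1] + a * u[k] with a = dt / (T + dt)."""
    dt = 1.0 / fs
    a = dt / (T + dt)
    return lfilter([a], [1.0, -(1.0 - a)], u)
```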
Fig. 2. Used transfer functions for the validation of the detection of directed influences.
4 Results
For analysis, each dynamic system was excited with the different input signals and the spectral Granger causality was used for the detection of directed influences from u → y, with results shown in figure 3, and from y → u, where the results are shown in figure 4. If a directed influence has been found, the corresponding box contains a checkmark, otherwise it contains a cross. In the following, a summary is given, structured according to the defined benchmark dynamic systems.
Dead time: In this use case, consisting of a simple time shift, for all input signals the directed dependencies from u → y are detected and defined as being significant. Nevertheless, for the input signals u_sin and u_imp a false positive directed influence has been found pointing from y → u. The explanation is straightforward: since the impulse train as well as the sinusoid are cyclic excitation signals and there is only a time shift between the signals, it is not possible to distinguish the input from the output signal.
Fig. 3. Results of the benchmarks when testing for directed influences Fu→y
Fig. 4. Results of the benchmarks when testing for directed influences Fy→u
Low-pass filter: Regarding the low-pass filter, u_wn, u_sin and u_imp lead to the detection of the correct directed connectivity. The sawtooth and the random walk, both having a similar power spectrum (see figure 1), are not detected. The reason is that the low-pass filter has too much attenuation, resulting in an output signal which already contains too much information about itself in its past values. Hence, in terms of Granger Causality, this results in a non-significant information gain for u_rw. The only excitation signal leading to a connectivity from y → u is the sinusoid. As for the dead time benchmark, the reason is that the sinusoid is cyclic.
5 Summary
The results showed that, when using spectral Granger Causality, the detection of directed influences in time series depends on the excitation signal as well as on the underlying dynamic system. Regarding the excitation signals, for none of the signals was it possible to detect the correct directed influence u → y for all four dynamic systems while at the same time never detecting a wrong influence y → u. Hence, when using Granger Causality, detected or not detected directed influences in data always need to be questioned in terms of the excitation as well as in terms of the underlying process behavior. Still, this method can be of great help to generate a first understanding of the influences variables have onto each other in a data set, since not always, but most of the time, Granger Causality detected the correct dependency.
In terms of the development of benchmarks, there is a variety of directions for future research. Questions that arise are the impact of noise in the data or how a directed influence can still be detected if variables have a common cause. Regarding Granger Causality, it can be evaluated in which cases it is possible to differentiate between direct and indirect influences, e.g. when using the multivariate Granger Causality. Additionally, the benchmarks should be used to compare several methods like Transfer Entropy with its extensions or the estimation of Directed Transfer Functions.
6 Acknowledgements
This work was developed in the Fraunhofer Cluster of Excellence ”Cognitive
Internet Technologies”.
References
1. Bastos, A. and Schoffelen, J.: A Tutorial Review of Functional Connectivity Analysis
Methods and Their Interpretational Pitfalls, Frontiers in Systems Neuroscience(9),
pp. 175, (2016)
2. Bauer, M.: Data-driven Methods for Process Analysis. University College London,
PhD Thesis, (2005)
3. Blinowska, K.: Review of the methods of determination of directed connectivity
from multichannel data, Medical & Biological Engineering & Computing (49), pp.
521 - 529, (2011)
4. Blinowska, K. et al.: Granger causality and information flow in multivariate processes, Phys. Rev. E 70(4), (2004)
5. Choudhury, A.A.S. and Shah, S.L. and Thornhill, N.F.: Diagnosis of Process Non-
linearities and Valve Stiction: Data Driven Approaches, Advances in Industrial Con-
trol, Springer, (2008)
6. Spectral connectivity toolbox: ”https://github.com/Eden-Kramer-Lab/spectral_
connectivity/”, last access August 2018
7. Geweke, J., Measurement of linear dependence and feedback between multiple time
series. Journal of American Statistical Association, vol. 77. pp 304–313 (1982)
8. Granger, C. W.J.: Investigating Causal relations by Econometric Models and Cross-
Spectral Methods, Econometrica 37, (1969)
9. Heckman, J.: Econometric causality, International statistical review(76), pp. 1–27,
(2008)
10. Kaminski M., and Blinowska K.J.: Directed Transfer Function is not influenced
by volume conduction—inexpedient pre-processing should be avoided, Frontiers in
Computational Neuroscience. (8), (2014)
11. Kugiumtzis, D.: Partial transfer entropy on rank vectors, The European Physical
Journal Special Topics (222), pp. 401–420, (2013)
12. Kühnert, C., et al.: Methoden zur datengetriebenen Formulierung und Visual-
isierung von Kausalitätshypothesen. at - Automatisierungstechnik Methoden und
Anwendungen der Steuerungs-, Regelungs- und Informationstechnik, 60(10), pp.
630-640. , (2012)
13. Staniek, M. and Lehnertz, K.: Symbolic transfer entropy: inferring directionality
in biosignals,Biomedizinische Technik/Biomedical Engineering (54), pp.323–328 ,
(2009)
14. Schreiber, T.: Measuring information transfer, Physical Review letters 85(2), pp.
461-464, (2000)
15. Seth, A. et al.: Granger Causality Analysis in Neuroscience and Neuroimaging, Journal of Neuroscience 35(8), pp. 3293–3297, (2015)
16. Yang, D. et al.: Granger Causality for Multivariate Time Series Classification,
IEEE International Conference on Big Knowledge (ICBK), pp. 103-110, (2017)
Enabling Self-Diagnosis of Automation Devices through
Industrial Analytics
Weidmüller Interface GmbH & Co. KG, Klingenbergstraße 16, 32758 Detmold, Germany
{carlos.paizgatica;alexander.boschmann}@weidmueller.com
Abstract. This paper shows how automation components can be enhanced with
self-monitoring capabilities, which are more effective than traditional rule-based
methods, by using Industrial Analytics approaches. Two application examples
are presented to show how this approach allows the realization of a predictive
maintenance strategy, while drastically reducing the realization effort. Further-
more, the benefits of a flexible architecture combining edge and cloud computing for the realization of such a monitoring system are discussed.
This paper shows the use of Industrial Analytics as a means of enabling a condition-based or even a predictive maintenance strategy for simple automation components lacking dedicated monitoring resources. It is shown in Section 2 how a flexible architecture combining edge and cloud computation enables the realization of such a monitoring system. The process to develop an Industrial Analytics solution is then explained in Section 3. Two practical use cases are then presented in Section 4, disclosing the potential of this approach to reduce maintenance costs while increasing its effectiveness.
The next step is the selection, training and tuning of machine learning algorithms to
derive a model from the selected features (model learning). Again, the combination of
analytics expertise and domain knowledge is key to develop an efficient model. Once
developed, the model can be used at runtime to monitor the machine or process (model
execution). To be useful the results need to be properly visualized (visualization). The
kind of visualization should be selected according to the role of the person who shall
use this information, e.g., the machine operator, the maintenance manager, etc. The
integration of an industrial analytics function in an automation system can be done at
different levels, for instance at the machine, or using a cloud platform. These possibil-
ities are explored in the next section.
The core benefit of this approach is to allow for low latency by computing the data
where it is created without incurring network latencies, which is essential for real-time
condition monitoring applications. Another benefit is scalability: while a traditional
centralized approach will no longer be feasible with an increasing number of communi-
cating devices, Edge Computing provides a linear scalability and is needed as augmen-
tation to reduce pressure on network infrastructure. Furthermore, storage and operation
cost can be reduced by processing time-sensitive data locally and significantly reducing
raw data before being sent to the Cloud. This technique can also be used to preserve
privacy by ensuring that sensitive data is pre-processed on-premise so that only privacy-compliant data is transferred to the Cloud. Following the steps from data acquisition to
analytics processing and to the visualization of meaningful machine information, vari-
ous processing steps at different system components are involved. Figure 1 illustrates
an example of a flexible automation system architecture implementing Industrial Ana-
lytics at Edge-, on-premise- and Cloud levels.
Raw data are acquired by Remote Terminal Units (RTUs) from machines, and pro-
cess-relevant actuators and sensors over a fieldbus, e.g. PROFIBUS, depicted by green
bus connections. An initial pre-processing stage such as filtering can be implemented
on these devices. The signals are then collected by a Programmable Logic Controller
(PLC) and used to control the system. Additional process-independent components like
smart temperature-, vibration- or pressure sensors are typically connected to an Indus-
trial IoT (IIoT) gateway via Bluetooth, WiFi, Ethernet or the emerging 5G
[PLZW2015]. These components play an important role in the process of retrofitting
and enabling Industrial Analytics services on older machines. Monitoring systems for
important control parts that usually do not offer data interfaces by design (i.e. electromechanical relays or solenoid valves) can ideally be connected to an IIoT gateway. We
present two practical use cases for these systems in the following section of this paper.
Low-latency Edge Analytics functions can be implemented in both modern PLCs and IIoT gateways. While the PLC can only monitor the devices connected to it, the
IIoT gateway typically can access the PLC data in addition to the process-independent
component data to generate a larger machine learning model. If necessary, the data
density can be further decreased at the Edge level. In addition to data storage and visu-
alization, more complex analytics functions over multiple machines or devices can be
performed on-premise by an Industrial PC (IPC) or in the Cloud at the cost of higher
latency and increased network traffic. Rich and detailed visualization functions are of-
fered by the Supervisory Control and Data Acquisition (SCADA) or Manufacturing
Execution System (MES).
4 Use Cases
In this section two use cases are presented, which show the benefits of enabling sim-
ple automation devices with self-monitoring capabilities: Monitoring of electrome-
chanical relays and solenoid valves.
They are widely used in industrial areas such as plant construction, mechanical engi-
neering or shipbuilding for switching inductive loads, e.g. for controlling solenoid
valves.
In this use case, electromechanical relays were tested for inductive load over their
lifetime to develop Industrial Analytics methods for failure detection. In the experi-
mental setup, relays were tested by switching on and off repeatedly under a high DC
load. An inductive load was connected to the contact side of the relays, causing an arc
between the opening contact surfaces at the moment of switch-off and damaging the
relay contacts. This process was repeated until failure of the relay.
A combination of features based on the electric current flow through the relay coil
in combination with a Kullback-Leibler divergence-based classifier [KL1951] has been
found which allows for a prediction of imminent failure and predictive maintenance. In
this study, only features that can be directly measured in the relay without additional
sensors were considered. Figure 3 shows an example plot of the classification output.
Here, the relays were classified into three categories: healthy (green), damaged (or-
ange) and possible failure (red). With the method presented in this paper it is possible
to detect an imminent failure due to welding of the relay contacts with high accuracy.
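A schematic sketch of such a divergence-based classification is shown below; the feature histograms, bin count and decision thresholds are illustrative assumptions and not the parameters used in the study.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence D(p || q) between two histograms."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def classify_relay(feature_window, reference_window, bins=32,
                   warn_at=0.5, fail_at=2.0):
    """Compare the current coil-current feature distribution against a healthy
    reference; the thresholds warn_at / fail_at are purely illustrative."""
    lo = min(feature_window.min(), reference_window.min())
    hi = max(feature_window.max(), reference_window.max())
    p, _ = np.histogram(feature_window, bins=bins, range=(lo, hi))
    q, _ = np.histogram(reference_window, bins=bins, range=(lo, hi))
    d = kl_divergence(p.astype(float), q.astype(float))
    if d < warn_at:
        return "healthy"
    return "damaged" if d < fail_at else "possible failure"
```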
In this case, a condition monitoring system can trigger a warning and initiate a predic-
tive maintenance measure before actual damage has occurred. The time remaining in a
concrete use case scenario to respond to the imminent failure depends heavily on the
switching frequency of the relay being monitored. Based on our experiments, the
method presented here allows enough reaction time for applications having high
switching frequencies (10 operations per second) or low switching frequencies (1 op-
eration per hour). For this kind of applications, analytics
When a current is applied to the magnet winding, the movable magnet armature is
attracted, thus releasing the valve plug from the valve seat (see Figure 5). A medium
can flow. When switching off the current, the return spring ensures the lowering of the
magnet armature and thus the closure of the valve seat by the valve plug. Mechanical
loads on the moving parts and the permanent flow of media cause signs of wear inside
the solenoid valve. Also, the continuous use under difficult operating conditions, such
as high temperatures and vibrating environments, can cause additional wear. Since so-
lenoid valves are often used in safety-critical applications, malfunctions can have cata-
strophic economic consequences and, above all, put in danger human lives. Not only is
wear within a solenoid valve a safety hazard, errors in the signal line (e.g., wire break,
short circuit) to the solenoid valve can also cause failures and thus pose a high risk.
To prevent premature failures due to wear, the valve mechanism of the solenoid valve and the signal line to the valve must be monitored. Four error classes dominate the failure reports [NRC1987]:
- Foreign matter in the valve (16%)
- Burnt coil / short circuit (15%)
- Worn or defective valve parts (11%)
- Open circuit in coil (9%)
When monitoring solenoid valves, there are two different approaches. The first approach is a rule-based approach: during operation, the load current is monitored by means of an electronic component. If the current falls below or exceeds the set limits, the block sends a signal to the controller.
Figure 6: A significant shift in the curves indicates signs of wear on the valve mechanism
With this method, events such as wire breakage, short circuit or overvoltage and
undervoltage can be detected and reported. However, changes in the dynamics of the
system inside the defined boundaries are not detected.
The second approach pursues the goal of early detection of valve failure. Here, the
current waveforms of switching cycles are recorded and compared (see Figure 6). This
approach enables device- and application-specific monitoring, because the reference
model is created or parameterized during operation. Deviations beyond a certain extent may indicate an imminent defect and thus initiate the timely replacement of the valve (see Figure 6). As in the previous case, the realization of this monitoring strategy does not require the use of dedicated sensors, because features extracted from already existing signals are used. This enables the realization of such a strategy also for low-cost applications.
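The waveform-comparison idea can be sketched with a simple deviation metric between the recorded switching-cycle current and a reference waveform; the normalization and the threshold in the comment are assumptions for illustration.

```python
import numpy as np

def valve_wear_indicator(cycle_current, reference_current):
    """Normalized RMS deviation of a switching-cycle current waveform from the
    reference waveform recorded during commissioning; a value growing over many
    cycles suggests wear of the valve mechanism."""
    cycle_current = np.asarray(cycle_current, dtype=float)
    reference_current = np.asarray(reference_current, dtype=float)
    err = cycle_current - reference_current
    return np.sqrt(np.mean(err ** 2)) / (np.ptp(reference_current) + 1e-12)

# Example rule on top of the indicator (threshold is an assumption):
# indicator > 0.05 -> schedule valve replacement at the next maintenance window.
```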
References
1. [SD2016] W. Shi and S. Dustdar, "The Promise of Edge Computing," in Computer, vol. 49,
no. 5, pp. 78-81, May 2016.
2. [GJFVR2016] D. Georgakopoulos, P. P. Jayaraman, M. Fazia, M. Villari and R. Ranjan,
"Internet of Things and Edge Cloud Computing Roadmap for Manufacturing," in IEEE
Cloud Computing, vol. 3, no. 4, pp. 66-73, July-Aug. 2016.
3. [PLZW2015] M. Peng, Y. Li, Z. Zhao and C. Wang, "System architecture and key technol-
ogies for 5G heterogeneous cloud radio access networks," in IEEE Network, vol. 29, no. 2,
pp. 6-14, March-April 2015.
4. [KL1951] Kullback, S.; Leibler, R. A. On Information and Sufficiency. Ann. Math. Statist.
22 (1951), no. 1, 79--86.
5. [NRC1987] Aging and service wear of solenoid-operated valves used in safety systems of
nuclear power plants: Volume 1, Operating experience and failure identification (Nuclear
Regulatory Commission, Washington, DC, USA, 1987)
Making Industrial Analytics work for Factory
Automation Applications
Markus Koester
1 Introduction
this paper is more on the challenges in implementing these solutions. Section 2 gives
a high-level overview over the functionality of the industrial analytics pipeline, which
was considered in the implementations. This pipeline describes the data flow starting
from the raw data created by the target machine to the visualization of the analytics
results. Section 3 covers the main scope of this contribution by highlighting the chal-
lenges, which are a) design considerations to allow for scalability of the solution, b)
our underlying process from the first idea of the solution to the final production-ready
software, and c) a continuous integration (CI) and continuous delivery (CD) pipeline
for automatically building the software solution. In Section 4 we give an overview of an example application for which we have implemented the industrial analytics solution.
The core concept of the analytics pipeline is presented in Fig. 1. Collecting machine data is highly use-case dependent and has to be tailored to the given data sources and accessibilities of the target machine. To simplify the data processing of the following analytics steps, the raw data should be collected and stored centrally if the target architecture allows it. Having a single data source for further data operations of the pipeline, such as a centralized data base, greatly simplifies the data handling.
Preprocessing of the data is a key step to filter out data that has little or even no
impact on the modeling success and to create relevant features that represent the actu-
al state of the target machine. As described in the context of data dependencies in [2]
the quality of the result of an analytics model greatly depends on the given input fea-
tures. Besides statistical and data centric approaches, we consider domain knowledge
provided by the machine user in the creation of features. Thus we combine expert
know-how from the industry application domain and from the data science domain.
The selected features are used in the two branches model learning and model exe-
cution. The selection of the underlying machine learning algorithm highly depends on
the target application. Once a model is created it can be used in the model execution
branch to compute analytics results. These can be numerical indicators for anomaly
detection or contextual information reflecting the current state of the machine. For the
scenario of predictive maintenance the output of the model can be e.g. the likelihood
of a failure in a given future time interval. This information is finally visualized to
support the user in taking decisions for optimizing the efficiency of the machine and
for avoiding unplanned down-times.
[Figure: container-based architecture with a frontend UI, scheduler, machine/model configuration and plot creation, connected to the analytics pipeline, the analytics models and the analytics results.]
The frontend user interface holds different functionalities, which are described in the following: Status monitoring is used to track the state of the analytics pipeline and to inform the user about abnormal behavior. The analytics architecture is designed to handle different users and to provide authentication and user-grouping functionalities. Analytics functions, such as model scoring or model learning, can be executed in different time intervals, which can be configured in a scheduler. The user can select different models out of the ones given in the model data base and configure and tune the models according to the target machine. The plot creation container is used to generate user-defined plots based on the resulting analytics data.
The machine data is collected and stored in a corresponding data base, which is
used as source for the data analytics pipeline container. Besides the machine data the
architecture additionally comprises a model data base, where different machine learning models with their pre-processing pipelines are stored.
In a typical flow of the analytics functionality the scheduler triggers the execution
of the analytics pipeline, which loads the selected model from the database and ap-
plies the model to the specified input machine data. For model scoring the resulting
data are written to the analytics result data base, which holds the data for result visual-
ization. For a model learning scenario, the result of the analytics pipeline is a new or
updated machine model, which is stored in the model data base, and which can be
used for scoring in the future.
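A highly simplified sketch of this scheduler-driven flow is given below; the data base interfaces, configuration keys and model methods are placeholders and do not correspond to the actual implementation.

```python
import time

def run_pipeline_once(model_store, machine_db, result_db, config):
    """One scheduler-triggered pass: load the selected model, score the configured
    machine-data window and persist the analytics result."""
    model = model_store.load(config["model_id"])            # model data base
    data = machine_db.read_window(config["machine_id"],     # machine data base
                                  config["window_seconds"])
    features = model.preprocess(data)
    score = model.predict(features)
    result_db.write({                                        # analytics result data base
        "machine_id": config["machine_id"],
        "timestamp": time.time(),
        "score": float(score),
    })

def scheduler_loop(interval_seconds, **kwargs):
    """Minimal stand-in for the scheduler container."""
    while True:
        run_pipeline_once(**kwargs)
        time.sleep(interval_seconds)
```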
The architecture is designed for horizontal scalability and platform independence.
Instead of using a single analytics pipeline, the architecture allows for running several
analytics pipelines at the same time, which can be used to speed up the execution, or
to run different models concurrently. Its container-based implementation allows the
architecture to be deployed locally on a single PC (with reasonable amount of availa-
ble resources) as well as on virtual environments in the cloud.
[Figure: development process from target definition via data exploration, proof of concept and pilot to development for production.]
CI/CD pipeline for the industrial analytics solution and were able to significantly
reduce the development effort.
4 Practical Example
5 Summary
References
1. Gatica, C. P.; Koester, M.; Gaukstern, T.; Berlin, E.; Meyer, M.:"An industrial analytics
approach to predictive maintenance for machinery applications," 2016 IEEE 21st Interna-
tional Conference on Emerging Technologies and Factory Automation (ETFA), Berlin,
(2016).
2. Sculley, D.; Holt, G.; Golovin, D. et al.: Hidden technical debt in Machine learning sys-
tems. In: Proceedings of the 28th International Conference on Neural Information Pro-
cessing Systems - Volume 2 (NIPS'15), C. Cortes, D. D. Lee, M. Sugiyama, and R. Gar-
nett (Eds.), Vol. 2. pp. 2503-2511. MIT Press, Cambridge, MA, USA (2015).
3. Wirth, Rüdiger: CRISP-DM: Towards a Standard Process Model for Data Mining. In: Pro-
ceedings of the Fourth International Conference on the Practical Application of
Knowledge Discovery and Data Mining, pp. 29-39 (2000).
Application of Reinforcement Learning in Production
Planning and Control of Cyber Physical Production
Systems
1 Introduction
The productivity of manufacturing systems and thus their economic efficiency depends
on the performance of production control mechanisms. Because of an increasing global
competition and high customer demands, the optimal use of existing resources is ever
more important. Optimizing production control is hence a central issue in the manufac-
turing industry.
Companies are additionally facing complex manufacturing processes due to high
product diversity, lot size reduction and high quality requirements. In the herein con-
sidered real-world example of the semiconductor industry, complexity arises through a
high number of manufacturing processes and their precision on a nanometer level [1].
Planning and coordinating processes is a challenging task and requires appropriate con-
trol methods and decision support systems.
Moreover, production control has to deal with a dynamic and non-deterministic system
inside a volatile environment and thus has to handle uncertainty and unexpected inci-
dents [2]. Currently, production planning and control systems such as mathematical
programming, heuristics and rule-based approaches are highly centralized and mono-
lithic and not able to meet these needs [3]. Therefore, the dynamic characteristics of
production systems are poorly met.
Through the integration of manufacturing components, enhanced process monitor-
ing and data collection, Cyber Physical Production Systems (CPPS) provide real-time
data such as order tracking, machine breaks and inventory levels. This makes it possible
to apply data-driven techniques, such as Machine Learning (ML) algorithms. Addition-
ally, these are able to adjust to the current system state by analyzing the available data
in real-time. This paper shows the successful implementation of a decentral production
control system that is based on ML algorithms. The system focuses on the following
two use cases: order dispatching and maintenance management. As performance bench-
mark an existing rule-based heuristic is considered. The real-world use case is taken
from a semiconductor manufacturing company that is regarded as a highly suitable ex-
ample of a cyber physical and digitized production system.
Given the challenges of wafer fabrication, order dispatching and maintenance man-
agement becomes crucial. Based on real-time process and product data the dispatching
and maintenance decisions can be enhanced by ML algorithms in order to optimally
match the current manufacturing situation and objectives.
The agent perceives the actual state of the environment as a vector S_t. In order to decide on an action A_t, the information is processed in the agent function that stores the current policy π_t(a|s) = P(A_t = a | S_t = s). After the action is performed in the environment, the agent perceives the new state S_t+1 and a reward signal R_t+1. Note that the environmental transformation is closely linked to the concepts of Markov Decision Processes (MDP). According to the received feedback, the agent adapts its policy. [11]
These steps are repeated in an iterative procedure. As a result, the agent optimizes its behavior in a way to find a policy π maximizing the long-term reward and therefore a policy that corresponds best to the agent's objectives. [11]
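The described interaction can be summarized in a generic training loop; the environment and agent interfaces below are placeholders (a Gym-style wrapper around the production simulation is assumed) and do not reflect the authors' implementation.

```python
def train(agent, env, n_iterations=100_000):
    """Iterative agent-environment interaction: observe S_t, act, receive R_t+1
    and S_t+1, then adapt the policy pi_t based on the feedback."""
    state = env.reset()
    for _ in range(n_iterations):
        action = agent.act(state)                          # sample A_t ~ pi_t(a | S_t)
        next_state, reward, done, _ = env.step(action)     # observe S_t+1 and R_t+1
        agent.update(state, action, reward, next_state)    # adapt the policy
        state = env.reset() if done else next_state
```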
Finding an optimal policy is an iterative process. In each iteration, the current policy π_t is adapted depending on the latest experiences. There are two main techniques to determine the new policy: (i) value-based and (ii) policy-based approaches. The main difference between both approaches is that value approximation learns the action-value function during the interaction instead of directly learning a policy π. The value function q_π(s,a) defines the expected long-term return when choosing an action a in state s and following policy π. The policy is then derived from the estimated value of all possible actions in each state. Policy approximation, on the other hand, directly updates the policy function π_t = π_t(a|s).
Most real-world problems deal with continuous action and state spaces. Storing and
updating the policy or value function in a table is therefore computationally inefficient
and requires lots of memory space. One possibility is to store the original policy or
value function approximatively. Artificial neural networks are widely used for that pur-
pose, as they are capable of approximating complex functional relationships via multi-
ple weights connecting the neurons within the network and allow the adaption of those
weights dynamically during the learning process [11]. As a result, neural networks re-
duce the computational effort by updating a set of weight parameters instead of the
values for each state-action pair in each iteration. A dense fully connected feed-forward
network is considered in this paper.
Depending on the dimension and the characteristics of the problem, different learn-
ing approaches lead to good results. In recent years, new kinds of RL algorithms such
as PPO [12], TRPO [13] and DQN [14] were developed to deal with complex problems
in different domains. They can be regarded as advanced policy or value approximation
algorithms that are optimized with regard to an efficient and stable learning process.
The results of this paper are based on these RL algorithms.
Both the simulation model and the RL algorithm are implemented in Python in order to enable the bidirectional interaction of the RL agent with the production system.
It can be shown that the RL algorithm improves its performance over time, proving
that it can be applied as flexible order dispatching control that continuously learns the
optimal behavior. Fig. 3 shows the development of the reward signal starting from the
initial state, where the agent's behavior is completely random. The agent successfully learns a high-performance behavior without losing the desired flexibility. The reward fluctuation indicates that the agent is adaptive enough to react to changing
conditions of the production system (e.g. disturbances, demand fluctuations). The
benchmark FIFO-heuristic approach is based on a set of if-then-rules, e.g. “take the
longest waiting batch next” and “first dispatch all batches in one area and move to an-
other area afterwards” (to minimize time consuming area changes). According to Fig.
3, the RL-based algorithm yields a superior performance. After the first iterations the utilization drops to a bottom value. In the end, an overall machine utilization of above 90% is achieved, compared to a utilization of far below 90% for the heuristic.
The same applies for the TPT. Moreover, the heuristic results show an almost stable
performance that is not able to adapt to changing conditions. [15]
Fig. 3. Reward signal (left), utilization (middle) and throughput time (right); moving average
values for 1000 iterations
Each machine deteriorates during operation and breaks down after a certain period. If a machine breaks, a maintenance engineer who is responsible for all
machines repairs it, after which the machine returns to its desired operating mode.
In this use case, the intelligent maintenance agent is responsible for deciding when
and which maintenance action to take. The goal is to reduce the opportunistic mainte-
nance cost, i.e. to choose the optimal action considering the current system load of incoming or-
ders, the cost of the action and the cost of a machine breakdown.
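A hedged sketch of how such an opportunistic cost trade-off could be scored per decision step follows; the cost terms, weights and failure model are illustrative assumptions, not figures from the paper:

def maintenance_cost(action, system_load, remaining_time_to_failure,
                     action_cost=50.0, breakdown_cost=500.0, lost_load_weight=1.0):
    """Illustrative opportunistic-maintenance cost: acting early wastes remaining
    lifetime and blocks capacity; acting too late risks an expensive breakdown."""
    if action == "maintain":
        # Direct cost of the action plus opportunity cost of taking the machine
        # offline while orders are waiting.
        return action_cost + lost_load_weight * system_load
    # action == "wait": assumed breakdown risk grows as the failure approaches.
    breakdown_probability = max(0.0, 1.0 - remaining_time_to_failure / 100.0)
    return breakdown_probability * breakdown_cost

# The agent's reward can then be defined as the negative of this cost.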
Fig. 4 illustrates, over the iterations of the learning phase, the remaining time to failure of a machine in a critical state at the moment
the agent performs the maintenance action. The agent learns to
follow a strategy that moves the action closer to the failure. Additionally, the results
show that the algorithm is able to implicitly learn the failure prediction and, based on it,
perform a suitable preventive action.
Fig. 4 also shows that conducting maintenance as late as possible increases
the overall output of the system and comes at a lower total cost, since fewer maintenance
actions are carried out. The results are compared to two benchmarks: a reactive and a
time-based maintenance strategy. The numbers do not take into account the additional
wear reserve of the machine components that is exploited by maintaining at the latest possible time,
which is why the actual benefit tends to be underestimated.
Fig. 4. Remaining time to failure (left, moving average values for 1000 iterations) and cost
comparison with benchmark maintenance strategies (right, average values of 40 runs)
This research has shown that CPPS providing real-time data pave the way for the ap-
plication of data-driven algorithms to enhance the operational efficiency of production
planning and control. RL algorithms are successfully implemented for order dispatch-
ing and maintenance management, outperforming existing rule-based approaches.
However, ML algorithms are not favorable for all industrial applications. The fol-
lowing properties are advantageous: (i) applications with a limited scope in terms of
the number of states and actions (the learning period depends on these dimensions),
(ii) responsive real-time decision systems (evaluating a trained ML model requires
only simple, mostly linear operations), (iii) "cheap" training data (the trial-and-error approach
requires large amounts of data) and (iv) complex environments that can hardly be described
in detail (ability to generalize) [15].
This work brings the application of ML algorithms and the transition towards auton-
omous production systems one step closer to reality. However, the limitations of ML
algorithms, and of RL in particular, still persist, e.g. in terms of solution robustness. Further
research on the design of RL algorithms is needed to achieve broad application
in other areas of production control as well, such as employee allocation and capacity con-
trol. Furthermore, research on multi-agent systems is required to broaden the scope of
applications.
Acknowledgments
We extend our sincere thanks to the German Federal Ministry of Education and Re-
search (BMBF) for supporting this research project 02P14B161 “Empowerment and
Implementation Strategies for Industry 4.0”.
References
1. Mönch L, Fowler JW, Mason SJ (2013) Production planning and control for semiconductor
wafer fabrication facilities. Springer, New York
2. Monostori L, Csáji BC, Kádár B (2004) Adaptation and Learning in Distributed Production
Control. CIRP Annals 53:349-352
3. Csáji BC, Monostori L, Kádár B (2006) Reinforcement learning in a distributed market-
based production control system. Advanced Engineering Informatics 20:279-288
4. Sturm R (2006) Modellbasiertes Verfahren zur Online-Leistungsbewertung von automati-
sierten Transportsystemen in der Halbleiterfertigung. Jost-Jetter Verlag, Heimsheim
5. Monostori L, Váncza J, Kumara SRT (2006) Agent-Based Systems for Manufacturing.
CIRP Annals 55:697-720
6. Günther J, Pilarski PM, Helfrich G, Shen H, Diepold K (2016) Intelligent laser welding
through representation, prediction, and control learning. Mechatronics 34:1-11
7. Stegherr F (2000) Reinforcement Learning zur dispositiven Auftragssteuerung in der Vari-
anten-Reihenproduktion. Herbert Utz Verlag, München
8. Mitchell TM (1997) Machine Learning. McGraw-Hill, New York
9. Ertel W (2011) Introduction to Artificial Intelligence. Springer, London
10. Russell S, Norvig P (2016) Artificial Intelligence. Pearson Education Limited, Malaysia
11. Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction. MIT press, Cam-
bridge
12. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal Policy Optimi-
zation Algorithms. arXiv preprint:1707.06347
13. Schulman J, Levine S, Moritz P, Jordan MI, Abbeel P (2015) Trust Region Policy Optimi-
zation. International Conference on Machine Learning, 1889-1897
14. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M
(2013) Playing Atari with Deep Reinforcement Learning. arXiv preprint:1312.5602
15. Stricker N, Kuhnle A, Sturm R, Friess S (2018) Reinforcement learning for adaptive order
dispatching in the semiconductor industry. CIRP Annals 67:511-514
LoRaWan for Smarter Management of Water Network:
From metering to data analysis
Water distribution systems (WDSs) are large, complex infrastructures made of
pipes, valves, pumps, tanks and other elements designed and built to transport water
of sufficient quality from water sources to consumers. The number of these ele-
ments, which can reach tens of thousands of links and junctions, their frequently
wide spatial dispersion and the highly dynamic nature of WDSs
make the management of real WDSs a complex problem [1-4]. Although the
main objective is to supply water in the required quantity and quality, other require-
ments are also essential, namely maintaining conditions far from failure scenarios [5,6], the
ability to quickly detect sources of contamination intrusion [7,8], the minimization of leaks
[9,10], etc.
Advances in low-power sensors and data transmission are paving the way for the
creation of smarter water networks. Although prices are becoming attractive, the return on
investment is far from clear for many managers in the water distribution industry.
To raise in these managers a real interest in deploying an adequate lattice of sensors in their water distribution net-
works, and to provide them with convincing arguments for a rapid implementation,
three important questions should first be answered, as they are the
main support elements in the associated decision-making: firstly, how many sensors are
needed; secondly, where the sensors should be located in order to get the most out of them;
and, finally, what to do with the measurements in terms of improving operation and
customer services. This contribution addresses the third of these questions without for-
getting the other two and presents a pilot project at an early stage.
There are three aspects that are crucially important for water utilities and where the correct
use of measurements makes the difference in what the company can achieve: reduction
of non-revenue water, optimization of network operation and provision of a quality
service. This contribution presents the development of a platform for Smarter Water
Network Operation and Management specifically aimed at supporting these three
aspects. It uses a water network analysis engine to estimate the state of the water net-
work based on measurements taken from the field combined with a mathematical model
of the water distribution network. The network state is estimated from the current moment
of the analysis and looking 24 hours ahead. This makes it pos-
sible to optimize the operation of the pumps for the next 24 hours considering the price of
energy, the expected demands and the available tank capacity in the network. The op-
erating decision for the pumps is corrected every hour and can be transmitted directly to the
pump station or entered there by an operator, depending on the technology available.
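The paper does not describe the optimization method used by the platform; the following is only a minimal greedy sketch of the kind of 24-hour pump-scheduling problem described above, with all numbers and the greedy rule being illustrative assumptions:

def schedule_pumps(prices, demands, tank_level, tank_min, tank_max, pump_rate, cheap_price):
    """Greedy 24-hour pump schedule: always pump when the tank would otherwise fall
    below its minimum level, and additionally pump in cheap hours if there is headroom."""
    schedule, cost = [], 0.0
    for price, demand in zip(prices, demands):
        must_pump = tank_level - demand < tank_min
        opportunistic = price <= cheap_price and tank_level + pump_rate - demand <= tank_max
        pump_on = must_pump or opportunistic
        if pump_on:
            tank_level += pump_rate
            cost += price * pump_rate
        tank_level -= demand
        schedule.append(pump_on)
    return schedule, cost

# Example with made-up hourly energy prices and demands:
prices = [0.10] * 8 + [0.25] * 8 + [0.15] * 8        # cost per m3 pumped (illustrative)
demands = [30] * 24                                   # m3 per hour (illustrative)
schedule, cost = schedule_pumps(prices, demands, tank_level=200,
                                tank_min=100, tank_max=400, pump_rate=60, cheap_price=0.12)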
A sensitive element in the mathematical modelling of water networks is the estimation
of demands. Underestimating demands when optimizing the operation of the network
can result in a lower quality of service, while overestimating them results in
excess costs. The platform developed includes the possibility to receive consumption
measurements directly from water meters installed on the customer side or at different points
of interest in the network. In this way, demand values and forecasting algorithms are
periodically updated based on the information received. Measuring demand
helps in this case not only to improve the results of the operation optimization but
also to create a water balance between the water volume supplied and the volume consumed in the
network. The water balance is the first analytic step towards estimating non-revenue water
in a distribution system. Running water balances for subregions or sectors of the net-
work can help to locate zones with a higher leakage impact. Identifying these zones and
eliminating their leaks improves the levels of non-revenue water at utilities. The
effect of leaks, and consequently the non-revenue water volume, can also be reduced
by properly managing the pressure in the network based on a robust mathematical
model of the distribution system. Additionally, consumption measurements also
help to achieve a better quality of service: the platform checks the plausibility of
consumption and informs both the utility and the client about potential leaks on the client
side. Discovering leaks on the client side avoids the surprise of receiving an expen-
sive invoice with a high consumption due to undetected leaks.
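A water balance of the kind mentioned above amounts to comparing supplied and billed volumes per sector; a minimal sketch with illustrative figures (not data from the pilot project):

def non_revenue_water(volume_supplied, volume_billed):
    """Water balance for one sector: the share of supplied water that generates no revenue
    (leaks, metering errors, unbilled consumption)."""
    nrw = volume_supplied - volume_billed
    return nrw, nrw / volume_supplied

# Illustrative sector volumes in m3 for one month:
sectors = {"sector_A": (12000, 9800), "sector_B": (8000, 7600)}
for name, (supplied, billed) in sectors.items():
    nrw, share = non_revenue_water(supplied, billed)
    print(f"{name}: {nrw} m3 non-revenue water ({share:.1%})")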
The development of the platform described here is the result of a collaboration be-
tween the Fluing group of the Polytechnic University of Valencia, Aguas Bixquert S.L.
and Ingeniousware GmbH. This collaboration has resulted in a pilot project carried out
on a water distribution system managed by the company Aguas Bixquert S.L. For in-
strumenting the water network, it was considered convenient to use highly energy-effi-
cient sensor nodes, preferably battery-powered and able to communicate over long dis-
tances. These characteristics motivated the use of Low-Power Wide Area Network
(LPWAN) [11] technologies for supporting the measurements in the pilot project. A Lo-
RaWAN [11] antenna was installed at a high point of the zone and redirects all meas-
urement data to the servers of Ingeniousware, where the platform for smarter water net-
works is running. About 30 water meters transmitting consumption via LoRaWAN have
already been installed at different parts of the network. Installation directly at the clients will
happen in the next phase of the project. A first version of the mathematical model of the
water network has been developed and can be visualized directly from the platform.
The consumption at all installed water meters can also be visualized, as well as transmission
statistics. The installed water meters have an integrated temperature sensor and also transmit
the temperature value at the installation point. Temperature is a factor that significantly
improves the estimation of the water consumption in the network.
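The paper does not specify how temperature enters the demand forecast; the sketch below shows one common option, a least-squares fit of consumption against temperature, using purely illustrative data:

import numpy as np

# Illustrative historical observations (not pilot data): ambient temperature in deg C
# and metered consumption in litres per hour for the same time slots.
temperature = np.array([12.0, 15.0, 18.0, 22.0, 26.0, 30.0])
consumption = np.array([410.0, 430.0, 455.0, 500.0, 560.0, 610.0])

# Simple linear model consumption ~ a * temperature + b fitted by least squares.
a, b = np.polyfit(temperature, consumption, deg=1)

def forecast_consumption(temp_forecast):
    """Temperature-adjusted consumption forecast from the fitted linear model."""
    return a * np.asarray(temp_forecast) + b

print(forecast_consumption([20.0, 28.0]))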
The coverage of the data transmission, its stability and the accuracy of the received
consumption measurements compared to manual readings of the water meters have been
evaluated. A water meter test bank was created for this purpose. The most im-
portant conclusion of our evaluation is that certification authorities should include the
additional error introduced by water meters when converting the mechanical movement
of the device into a digital signal. Differences of up to 18% were obtained when
comparing transmitted values with values read directly from the water meter. This
suggests the need to extend the certification of metering devices, which considers
the maximum error they can have depending on the existing flow. This certification,
which defines the class of the device and the range of flow in which it may operate, should also
consider the potential errors that occur when converting the mechanical movement of
the water meter into a digital signal. Note that all water meters installed in the
pilot project so far are mechanical. The situation may be different for water meters
based on a different measurement technology, such as ultrasonic meters, but this is still to be tested.
At the current stage of the project, water meters from only one manufacturer have been tested;
it is planned to include at least two additional water meter providers for compar-
ison purposes.
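The accuracy evaluation described above reduces to comparing transmitted and manually read index values; a minimal sketch with made-up readings (not the test bank data):

def relative_errors(transmitted, manual):
    """Relative deviation of transmitted meter values from manual reference readings."""
    return [abs(t - m) / m for t, m in zip(transmitted, manual)]

# Illustrative meter index values in litres:
manual = [1000.0, 2500.0, 5200.0]
transmitted = [980.0, 2510.0, 4260.0]   # the last value shows a large deviation
errors = relative_errors(transmitted, manual)
print([f"{e:.1%}" for e in errors])     # -> ['2.0%', '0.4%', '18.1%']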
References
1. Perelman L. and Ostfeld A. (2012) Water distribution systems simplifications through clus-
tering, Journal of Water Resources Planning and Management Division, ASCE, Vol. 138,
No. 3, pp. 218 – 229, http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000173.
2. Izquierdo, J., Montalvo, I., Pérez-García, R., Matías, A., On the Complexities of the Design
of Water Distribution Networks. Mathematical Problems in Engineering, Vol. 2012, 1-25,
2012.
3. Ostfeld A. (2012) "Optimal reliable design and operation of water distribution systems
through decomposition", Water Resources Research, Vol. 48, W10521, 14.
4. Diao, K., Fu, G., Farmani, R., Guidolin, M. and Butler, D., 2015. Twin-hierarchy decompo-
sition for optimal design of water distribution systems. Journal of Water Resources Planning
and Management, 142(5), p.C4015008.
5. Ostfeld A., Oliker N. and Salomons E. (2014). "Multi-objective optimization for least cost
design and resiliency of water distribution systems", Journal of Water Resources Planning
and Management Division, ASCE, Vol. 140, No. 12, 04014037,
http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000407
6. Herrera, M., Abraham, E., & Stoianov, I. (2016). A graph-theoretic framework for assessing
the resilience of sectorised water distribution networks. Water Resources Management,
30(5), 1685-1699.
7. Islam N, Farahat A, Al-Zahrani MAM, Rodriguez MJ, Sadiq R (2015) Contaminant intru-
sion in water distribution networks: review and proposal of an integrated model for decision
making. Environ Rev 23(3):337–352
8. Nafi, A., Crastes, E., Sadiq, R. et al. Intentional contamination of water distribution net-
works: developing indicators for sensitivity and vulnerability assessments Stoch Environ
Res Risk Assess (2018) 32: 527. https://doi.org/10.1007/s00477-017-1415-y
9. Covas, D. and Ramos, H., 1999. Practical methods for leakage control, detection, and loca-
tion in pressurized systems. In BHR Group Conference Series Publication, Bury St. Ed-
munds; Professional Engineering Publishing, Vol. 37, pp. 135-152.
10. Candelieri, A., Conti, D. and Archetti, F., 2014. A graph-based analysis of leak localization
in urban water networks. Procedia Engineering, 70, pp. 228-237.
11. Hutsoane O., Isong, B., Abu-Mahfouz, A., 2017. IoT devices and applications based on
LoRa/LoRaWAN. IECON 2017 – 43rd Annual Conference of the IEEE Industrial Electron-
ics Society.