BPMN4sML: A BPMN Extension For Serverless Machine Learning
Thesis committee:
Dr. Indika P.K. Weerasingha Dewage (Supervisor)
Prof. Dr. Willem-Jan van den Heuvel (Supervisor)
Context: Machine learning (ML) continues to permeate all layers of academia, industry and
society. Despite its successes, mental frameworks to capture and represent machine learning
workflows in a consistent and coherent manner are lacking. For instance, the de facto process modeling standard, Business Process Model and Notation (BPMN), managed by the
Object Management Group, is widely accepted and applied. However, it is short of specific
support to represent machine learning workflows.
Further, the number of heterogeneous tools for deployment of machine learning solutions
can easily overwhelm practitioners. Research is needed to align the process from modeling
to deploying ML workflows.
Objective: Confronting the shortcomings with respect to consistent and coherent modeling
of ML workflows in a technology independent and interoperable manner, we extend BPMN
and introduce BPMN4sML (BPMN for serverless machine learning). We further address
the heterogeneity in deployment by proposing a conceptual mapping to convert BPMN4sML
models to corresponding deployment models using TOSCA.
Method: We first analyze requirements for standard-based conceptual modeling of machine learning workflows and their serverless deployment. Building on these requirements, we extend BPMN’s Meta-Object Facility (MOF) metamodel and the corresponding notation.
Our extension BPMN4sML follows the same outline referenced by the Object Management
Group (OMG) for BPMN. We then take the extended metamodel elements and relate them
to corresponding TOSCA elements to identify a conceptual mapping.
Conclusion: BPMN4sML extends the standard BPMN 2.0 with additional modeling capabilities to support (serverless) machine learning workflows, thereby functioning as a consistent and coherent mental framework. The conceptual mapping illustrates the potential of leveraging BPMN4sML workflow models to derive corresponding serverless deployment models.
Preface
This thesis synthesizes the work of the past few months which truly has been a rollercoaster
of ups and downs, dead-ends and new solutions. Certainly, conceptualizing a domain (or
even parts of a domain) as vast as machine learning can seem like an overwhelming task, but
once the first step is taken, another one follows and soon enough the mountain is climbed.
To me, this work represents a great learning experience, both personally and academically,
starting with conceptual modeling over to serverless computing, model-driven engineering
and general design science research.
Through it all, my supervisor Dr. Indika P.K. Weerasingha Dewage has been a tremendous help. You gave me unlimited support and confidence when things did not go as planned, pushed me to reach beyond what I thought possible, remained critical whenever necessary and made sure I still enjoyed the research. You provided me with incredibly helpful resources
and advice and stood patiently next to me (virtually) without hesitation. For that I am
sincerely grateful.
I would also like to express my sincere gratitude to my supervisor Prof. Dr. Willem-Jan
van den Heuvel. You supported me and provided me with great opportunities and learning
experiences both within and outside the scope of this thesis.
Finally, I want to thank my friends and my family, for being patient, for their relentless
support and for answering countless phone calls.
I hope that you, the reader, will benefit from this work as much as I did.
Laurens Martin Tetzlaff
’s-Hertogenbosch, May 2022
Contents
List of Tables v
List of Figures vi
1 Introduction 1
1.1 Problem Indication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Research Scope and Relevance . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3.1 Theoretical Relevance . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.2 Practical Relevance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Reading Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Literature Review 6
2.1 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Machine Learning Methods . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Machine Learning Lifecycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Requirement Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.2 Data Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.3 Model Learning and Verification . . . . . . . . . . . . . . . . . . . . . 11
2.2.4 Model Deployment and Monitoring . . . . . . . . . . . . . . . . . . . . 11
2.2.5 Federated Learning Workflow . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.6 Machine Learning in Operations . . . . . . . . . . . . . . . . . . . . . 14
2.3 A Primer on Serverless . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.1 Functions-as-a-Service . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Business Process Management . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.1 Business Process Management Lifecycle . . . . . . . . . . . . . . . . . 18
2.4.2 Process Modeling as an instance of Conceptual Modeling . . . . . . . 19
2.5 Business Process Model and Notation . . . . . . . . . . . . . . . . . . . . . . 21
2.5.1 BPMN Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6 Model-driven Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.1 Deployment Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.2 Topology and Orchestration Specification for Cloud Applications . . . 25
2.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.7.1 Machine Learning Workflows and Lifecycle . . . . . . . . . . . . . . . 27
2.7.2 Serverless Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . 27
2.7.3 Serverless Application and Workflow Modeling . . . . . . . . . . . . . 27
2.7.4 BPMN Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3 Methodology 30
3.1 Design Science . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Research Method and Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
7 Validation 80
7.1 Illustrative Example 1 : The Case of Self-managing Service Architectures . . 80
7.1.1 ML Application Architecture . . . . . . . . . . . . . . . . . . . . . . . 80
7.1.2 Representation as a BPMN4sML Workflow . . . . . . . . . . . . . . . 81
7.1.3 Evaluation of BPMN4sML’s Modeling Capability . . . . . . . . . . . . 82
7.2 Illustrative Example 2 : The Case of Home Credit Default Risk Prediction . . 84
7.2.1 Simplified BPMN4sML Machine Learning Pipeline . . . . . . . . . . . 84
7.2.2 BPMN4sML to TOSCA via Conceptual Mapping . . . . . . . . . . . . 85
7.2.3 Evaluation of Current Extent of Conceptual Mapping . . . . . . . . . 87
9 Conclusion 93
9.1 Implications for Research and Practice . . . . . . . . . . . . . . . . . . . . . . 93
9.2 Future Research Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Bibliography 95
Appendix 105
List of Tables
4.1 Aggregated analysis and BPMN equivalence check of generic FaaS and work-
flow concepts and ML workflow and lifecycle concepts for derivation of exten-
sion element requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.1 Explicit and implicit mappings between BPMN4sML and TOSCA metamodels 76
List of Figures
A.1 Winery TOSCA Topology Template of home credit default ML pipeline . . . 105
Chapter 1
Introduction
”Data is food for AI,” says Andrew Ng [1], stimulating the search for mental frameworks
to better capture and represent the lifecycle of machine learning (ML).
The domain is at the forefront of methods to analyse and understand large volumes of
data for value creation. It is thus a vast and resourceful solution for any organization to
turn to. Not only do novel applications draw on machine learning for ’intelligent’ solutions; regular business processes, too, are being overlaid with or amplified by ML, interconnecting
digitized services with fully automated intelligent functionalities [2]. Nonetheless, keeping
track of machine learning workflows and integrating them with existing processes is a cumbersome endeavour, not least due to the lack of a shared mental framework and language for communicating what an ML solution entails [3]. General approaches to alleviate
such challenges of complexity include the adoption of standardized modeling languages to
help conceptualize, visualize and operationalize a domain in a straightforward manner.
For instance, process modeling, as part of general business process management practices, or deployment modeling, embodied in model-driven engineering, help streamline the identification and representation of individual components and the modeling of their interactions, so that ultimately both non-technical and technical individuals can grasp the logic, the workflow and the overall process or application.
Accounting for the intricacies of machine learning by means of a conceptual model, let
alone modeling and deploying ML workflows that are derived from it, remains difficult with
the functionality provided by existing standards.
What are modeling and deployment challenges with current standards and tools?
Process modeling languages such as FlowChart or the Business Process Model and Notation
(BPMN) offer a variety of functionalities to represent workflows [11]. Available extensions
to BPMN cover recently emerged fields like ubiquitous computing technologies or quantum
applications [12, 13, 14], while still neglecting the specific requirements for an appropriate representation of machine learning concepts.
Several solutions to help realize and manage machine learning workflows exist, commonly
referred to as machine learning in operations (MLOps). However, MLOps frameworks come
with their own language and functionality, making it difficult for ML practitioners to master all of them. Provider differences regarding serverless workflows further intensify this heterogeneity. First steps have been taken to decrease this burden on the developer [9, 15].
To facilitate and standardize the process of designing and communicating machine learn-
ing workflows irrespective of the selected cloud provider and technology, new modeling con-
structs are required. Moreover, to bridge the gap between modeling of machine learning
workflows and their serverless deployment, the workflow model needs to be relatable to a
deployable artefact.
In order to better answer the main research question, it is operationalized into sub-questions.
First, a knowledge base needs to be established on which the remaining work can build, ensuring that it draws from the current state of the art. Thus, the first set of questions is directed towards the main domains this work references, namely 1) technology-independent
and interoperable modeling of processes and application deployment, 2) machine learning
and its processes and 3) serverless computing. This results in the following questions:
1.1 What constitutes standard-based modeling and deployment?
1.2 What constitutes machine learning and its processes?
1.3 What constitutes serverless computing?
With an established understanding of the domains, the underlying concepts and require-
ments for modeling machine learning workflows considering the goal of their subsequent
serverless deployment orchestration need to be identified. Further, established requirements
are to be associated with the capabilities of existing process modeling solutions to discover potential shortcomings and the need for extension. As the Business Process Model and Notation
is the de-facto standard in process modeling, it is chosen as the language of comparison.
This results in the following set of questions:
2.1 What are the underlying concepts of a machine learning workflow, i.e. operations,
artefacts, events?
2.2 What are the relevant concepts for FaaS-based deployment of machine learning work-
flows?
2.3 What are the requirements for and shortcomings of BPMN as a modeling language
standard to represent serverless machine learning workflows?
Answering this set of questions lays the foundation for addressing the guiding research question: applying the identified concepts and requirements makes it possible to create artefacts that enable ML engineers and other stakeholders to model ML workflows in a technology-independent and interoperable manner. To further structure and realize
this process, the following set of sub-questions is derived:
3.1 How can a modeling language standard be extended to allow for modeling of serverless
machine learning workflows?
(a) How can serverless machine learning workflows be conceptually captured?
(b) How can serverless machine learning workflows be visually represented?
In answering these questions, the core artefacts of this work can be developed. To further
address the remaining aspects of the guiding research question, i.e. the modeling of serverless
deployment orchestration of designed ML workflows, a conceptual mapping is required that
enables derivation of a deployment model while continuously adhering to the requirements
of technology independence and interoperability. This leads to the last sub-question:
4.1 How can a machine learning workflow model be conceptually transformed into a deployment model for serverless deployment orchestration in a technology-independent and interoperable manner?
In answering this question, a method proposition can be derived to support the modeling
of ML workflows and their serverless deployment orchestration that ML engineers and other
stakeholders may draw from. It further facilitates communication and analysis of designed
ML workflows in a standardized and structured manner.
that allow modeling (serverless) ML workflows. To that end, this work 5) extends the
Business Process Model and Notation to capture and represent the respective ML concepts.
Referencing the extension, an initial conceptual mapping is proposed to 6) translate ML
workflow model diagrams into corresponding (TOSCA) deployment models such that their
serverless deployment orchestration is facilitated.
we arrived at as well as their academic and practical implications and provide directions for
future work.
Chapter 2
Literature Review
This chapter presents the dominant themes and concepts of importance to the research questions in section 1.2, thereby establishing a knowledge base and the current state of the art upon which the remaining work builds. Section 2.1 provides an overview of the domain of machine learning. Following that, section 2.2 explicates the machine learning lifecycle, accounting for regular machine learning as well as for federated learning workflows. Additionally, a brief introduction to machine learning in operations is given. Subsequently, a primer on serverless computing introduces the recent paradigm in section 2.3. Section 2.3.1 puts forward Function-as-a-Service as an upcoming service model and delineates possible ways of architecting FaaS-based solutions in event-driven or orchestrated fashions. By drawing from business process management, its lifecycle and associated activities, the field of business process modeling is introduced in section 2.4. As a solution to business process modeling, the Business Process Model and Notation is presented as the de-facto standard, alongside its extension mechanism, in section 2.5. Following that, section 2.6 describes different ways of deployment modeling as part of the overall model-driven engineering methodology, which builds the connection to the deployment modeling standard Topology and Orchestration Specification for Cloud Applications elaborated on in section 2.6.2. Finally, section 2.7 concludes this literature review by synthesizing a selection of related studies.
Supervised learning tackles challenges for which a correct output, which is to be predicted, exists. Hence,
based on labeled input-output pairs the algorithm searches for a mapping from input in-
stances x to output instances y. Thus, it can be trained on a ground truth through a loss
function which measures and minimizes the offset or distance between the predicted value
and the correct one. In contrast, unsupervised learning problems address situations in which
the measurement of the outcome y is absent and only input instances x are given. In that
case, rather than predicting a target variable, the unsupervised algorithm focuses on organizing or clustering the data, which can later be used, for instance, to group new observations with existing ones, thereby informing on their context and relationships in the overall dataset. In this learning problem, it is not explicitly stated what patterns shall be looked for,
and clearly defined error metrics as applied in supervised learning are missing [21, p. 2ff.,
p. 9ff.]. A combination of the two, semi-supervised learning, addresses problems for which a
small number of labeled observations is available next to a larger subset of unlabeled data.
To improve upon the performance of the supervised learner, information in the unlabeled
instances is exploited through unsupervised algorithms. Reinforcement learning focuses on
identifying and performing the correct action in a situation - learning what to do in a given
context. Hence in this field, algorithms map a situation to the corresponding action such
that a numerical reward signal can be maximized that is computed through a reward func-
tion. In this problem context, actions taken by the algorithm can influence later inputs
(situations). Further, the action in each situation ultimately yielding the highest reward is
unknown to the algorithm and needs to be discovered through trial and error [22, p. 15ff.]. In
the context of this work, supervised and unsupervised learning concepts are considered while
aspects of reinforcement learning are left to future research as characteristics and processes
specific to it differ and are therefore out of scope.
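To make the distinction concrete, the following minimal Python sketch (using scikit-learn; the dataset and model choices are arbitrary illustrations and not taken from the cited works) contrasts a supervised learner fitted on labeled pairs with an unsupervised algorithm that only organizes the unlabeled inputs.

```python
# Illustrative only: contrast supervised and unsupervised learning on toy data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Supervised: labeled pairs (x, y) let the learner minimize a loss against the ground truth.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("predicted labels:", clf.predict(X[:3]))

# Unsupervised: only x is given; the algorithm clusters the observations instead.
km = KMeans(n_clusters=2, random_state=0).fit(X)
print("cluster assignments:", km.labels_[:3])
```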
Apart from the introduced sub-categories, various overarching machine learning methods
exist - too many to allow for an exhaustive presentation. These methodologies, most of which pertain to supervised learning, address various challenges inherent to ML, such as improving learner performance, making the statistical model more robust to unseen data, or dealing with limited computational resources or decentralized datasets, to name a few. In the following, an overview of a popular subset is given. Nonetheless, each topic can be elaborated upon much
more in depth, hence, the interested reader may consult the indicated references for further
information.
Aside from semi-supervised ML, other hybrid methodologies have been brought forward, one of which is self-supervised learning. Similar to semi-supervised learning, it combines
unsupervised learning problems with supervised learning techniques to circumvent the bot-
tleneck of labeled datasets. By creating an artificial supervisory signal, i.e. output y, an
unsupervised learning task can be reformulated as a supervised learning problem for which
a supervised algorithm is able to pick up co-occurring relationships inherent to the data-
set [23, 24, 25].
As another method, transfer learning can be applied in order to leverage possibilities
of re-use and build up on already well-performing ML models from different but related
domains with respect to the current ML problem. To do so, given one or several source
domains, a target domain and corresponding ML tasks, the related information from the
already solved ML source task(s) is used to improve the predictive function that is applied
to the ML task of the target domain. As a consequence, dependence on data from the target
domain needed to create the target learner can be greatly reduced [26, 27, 28].
Beyond the distinction by learning problem, some machine learning techniques that are not bound to a particular learning problem are relevant to this work. Online learning addresses challenges inherent to
streaming data, observations that become available over time. In contrast to conventional
machine learning practices in which an algorithm is applied to batch data offline, online
learning structures the ML task in such a way that the algorithm continuously learns from
observations one by one or in grouped intervals. Thereby, the online learner can leverage more and more ground truths for observations that previously had to be predicted [29].
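As a small illustration of this idea, the sketch below (scikit-learn's SGDClassifier is an assumption chosen for the example, not a tool prescribed by this thesis) updates a model incrementally as batches of observations arrive.

```python
# Illustrative only: online learning via incremental updates on arriving mini-batches.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()                # a linear learner that supports partial_fit
classes = np.array([0, 1])

for step in range(10):                 # each iteration mimics newly arriving observations
    X_batch = rng.normal(size=(32, 4))
    y_batch = (X_batch[:, 0] > 0).astype(int)
    # Update the model with the latest ground truths instead of retraining from scratch.
    model.partial_fit(X_batch, y_batch, classes=classes)

print(model.predict(rng.normal(size=(3, 4))))
```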
In some scenarios, one ML model may not be sufficient to adequately solve a machine
learning problem in which case the application of the model to unseen data results in poor
7
CHAPTER 2. LITERATURE REVIEW
learner performance. By considering several different learners applied to the same problem
and combining them, one learner may compensate for the other’s errors and limitations.
This approach is known as ensemble learning: multiple ML algorithms are applied to the ML task, each producing inferior results on its own; however, when their outputs are fused via a voting mechanism, better performance can be achieved. Various techniques exist to create the set of distinct
learners such as iterative permutation of the dataset used for training the algorithms or
applying the same algorithm with changing constraints and parameters. Further, several
output fusion approaches have been brought forward. For an overview of ensemble learning
the reader is referred to [30, 31, 32].
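A minimal voting-ensemble sketch is shown below; the choice of base learners and of hard (majority) voting is purely illustrative and not taken from the referenced literature.

```python
# Illustrative only: fuse several distinct learners for the same task via majority voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3)),
        ("nb", GaussianNB()),
    ],
    voting="hard",                     # fuse the individual predictions by majority vote
)
ensemble.fit(X, y)
print("ensemble accuracy on training data:", ensemble.score(X, y))
```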
Computational bottlenecks are another challenge inherent to machine learning and gave rise to distributed ML. The data processing demands of large ML models exceed the progress in computing power. Therefore, various solutions to transfer the machine
learning task and share the associated workload over multiple machines gained popularity.
To accelerate a workload, conventional distributed computing differentiates between vertical
scaling, adding more resources to one single machine, and horizontal scaling, adding more
machines (worker nodes) to the system. Distributed ML considers a system made up of several machines; the ML problem can thus be solved by parallelizing the data (data-parallel approach), the model (model-parallel approach), or both. When applying the data-parallel
approach, the dataset is partitioned equally over all worker nodes, thereby creating dis-
tinct data subsets on which the same algorithm is then trained. A single coherent result,
output y, can emerge since all machines have access to the same model. In contrast, the
model-parallel approach is more constrained. Each machine receives an exact copy of the
full dataset while it operates on a distinct part of the algorithm. The final ML model is
then an aggregate of all ML model parts. Nonetheless, distributing the ML algorithm across
multiple machines has limits since most times the algorithm parameters cannot be easily
split up [5]. Consequently, most distributed ML approaches refer to a data-parallel solution.
Next to parallelising the algorithm training, other steps occurring during a machine learning workflow can benefit from distribution over multiple worker nodes, such as trying out different hyperparameter settings, one setting per worker node. A closer look at other
steps involved in the machine learning workflow is taken in the subsequent section 2.2. To
take load off of the data scientist or engineer, recent solutions have been brought forward
that allow offloading the ML task to a more suitable machine learning platform that poten-
tially applies distributed techniques hidden behind the service interface. Leveraging such a
solution abstracts the low-level implementation and management intricacies of sophisticated
techniques.
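The idea of distributing one hyperparameter setting per worker can be sketched as follows; the process pool merely stands in for the worker nodes of a cluster, and all parameter values and names are illustrative assumptions.

```python
# Illustrative only: each worker process evaluates one hyperparameter setting,
# mimicking "one setting per worker node" in a distributed set-up.
from concurrent.futures import ProcessPoolExecutor

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

def evaluate(c):
    """Train and score one model configuration on the (shared) dataset."""
    return c, cross_val_score(SVC(C=c), X, y, cv=3).mean()

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(evaluate, [0.1, 1.0, 10.0, 100.0]))
    print("best setting:", max(results, key=lambda r: r[1]))
```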
Not only computational limits but also restricted access to decentralized proprietary data
sources are a challenge in machine learning research. An upcoming solution is federated
learning, a term first coined by Google [33], which facilitates collaboratively solving an ML problem across a heterogeneous set of devices while retaining data ownership and adhering to
legislation and other restrictions of the location where data is stored - albeit with limitations.
Federated learning can differ in terms of workflow and life-cycle operations that are presented
in section 2.2. Consequently, to clarify these differences, section 2.2.5 elaborates on it.
Other ongoing streams of ML research are methods such as active learning or multi-task
learning, see [34, 35, 36].
When considering a general machine learning workflow, activities involved may differ
significantly depending on the method applied. Most of the established tools in practice
support an end-to-end ML workflow particularly covering regular supervised learning. It is
one of the best researched and well-established disciplines. Accordingly, the focus of this
study lies predominantly on supervised learning workflows. To account for upcoming fields, a look is taken at federated learning as an instance of distributed learning, however with a limited scope.
[Figure: Overview of the machine learning lifecycle - requirement-related and data-oriented activities (data analysis, preprocessing, encoding) and model-oriented activities (training, evaluation, verification), producing a verified dataset and trained, verified model(s), followed by model deployment, integration, monitoring, and maintenance & updating.]
phase to improve differentiation with the subsequent operational ML pipeline that can be
deployed [44].
independent and identically distributed (iid). The ML problem that is solved by the distrib-
uted nodes (e.g. mobile devices) is coordinated by a central aggregator. The key difference
to other distributed ML solutions is the concept of not only using a ML model locally on
the distributed node but also training it only with the data available in place while simul-
taneously benefiting from a shared global model that has been informed by all local model
instances through transmission of local model parameters. The parameters are uploaded to
the aggregator, creating a global model which can in turn be shared again with the par-
ticipants (distributed nodes) to improve the solution of the ML task [49, 50, 51, 52]. The
two main parties in a federated learning workflow are represented by the mentioned clients
and the central server [53], see Figure 2.2 for an overview. The notion of what a client
constitutes may differ depending on the scenario - in case of cross-silo FL a client may be
an organization or institution that runs the client-side operations on cloud or on their own
data centres; in cross-device FL a client would constitute an edge device [54].
In comparison to the presented machine learning lifecycle, federated learning can differ
in that the data management, model learning and verification and model deployment phases
are more interconnected and can occur at different times on different devices thereby also
changing the location and environment of executed tasks and generated artefacts. While
data management activities such as data collection are mostly equivalent in their logic to the conventional ML lifecycle, especially the subsequent stages, i.e. model training, evaluation, deployment and monitoring, can vary, not least due to the federated nature of
the ML workflow. The model learning phase in particular encapsulates a set of activities that
interact between federated clients and the central server, i.e. local model training, model
uploading, global model collection and aggregation as well as potential model evaluation
and broadcasting back to the clients to iterate over the phase.
Since federated learning has only just recently been introduced to the research com-
munity and industry, its applications and processes involved are still debated and remain
inconsistent [53, 49, 50, 56, 54]. To mitigate this, the present work leverages the reference
architecture proposed by Lo et al. [53] to establish an overview and common ground regard-
ing the federated learning workflow, see Figure 2.3. Note that for an in-depth discussion of
various federated learning architectures and intricacies, the reader is referred to the following
works [53, 55, 49, 50, 51].
To condense the intricacies and possibilities of federated learning workflows and focus
on the elements studied within this work, only the main components, i.e. activities, are
elaborated upon. For further explanations on patterns, we refer to the original paper by Lo
et al. [53].
Federated learning does not replace all phases and activities that are relevant to the
machine learning lifecycle previously explained. Instead, it intermingles them. In most
cases, a federated learning process starts with a job creation executed by the central server.
This encompasses the requirement analysis, potential data analysis (if possible) and other configuration activities prior to the initialization of the workflow in order to define its goal, constraints and set-up, similar to the ML lifecycle. Data management activities that are possible at a central level can also be executed. Once a job is created, a first version of
a global ML model is generated. Thus, the initialization of a federated learning process
can be closely associated with the model learning phase of the ML lifecycle. After job
creation and its propagation to the respective clients, local data collection and processing
activities are executed that only differ from conventional operations in terms of the entity
that manages them, i.e. the client side. Subsequently, the propagated model is trained on the local dataset and evaluated. In case the evaluation complies with previously set requirements and constraints, the now local ML model is sent back to the central server for aggregation.
The aggregation considers the parameters of all models received thus far, resulting in a new
updated global model. As previously mentioned, this process can be iterative in that the now aggregated model can be evaluated by the central server and, in case of poor performance, sent
back to other clients to train and update it again. Once a sufficiently good global model is
trained, it can be deployed on the client side to allow for its inference services - represented in
Figure 2.3 by the Decision-Maker component. These activities are in line with the previously
covered model verification and model deployment phase. The ML model monitoring can be
executed on a client level to investigate local model performance and potentially demand a
local re-training or initialization of a new global training process.
Note that optional activities and artefacts represented by Patterns in Figure 2.3 do not
explicitly include data privacy related implementations. Further, patterns such as a client
registry or deployment selector are needed to fully specify the clients which are to be included
in the federated training or deployment.
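To make the described round trip of broadcasting, local training, uploading and aggregation more tangible, the following NumPy-only sketch simulates one simplified federated setting; client selection, weighting by dataset size and any privacy mechanisms are deliberately omitted, and the linear-regression clients are an assumption chosen for illustration.

```python
# Illustrative only: a toy federated learning loop (broadcast, local training, aggregation).
import numpy as np

def local_training(global_weights, client_data, lr=0.1, steps=10):
    """Client side: start from the broadcast global model and fit it to the local data."""
    X, y = client_data
    w = global_weights.copy()
    for _ in range(steps):                        # a few local gradient steps (linear regression)
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w                                      # only parameters are uploaded, never raw data

def aggregate(client_weights):
    """Server side: combine the uploaded local models into a new global model."""
    return np.mean(client_weights, axis=0)        # simple unweighted average

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
global_w = np.zeros(3)

for _round in range(5):                           # iterate: broadcast, train locally, aggregate
    updates = [local_training(global_w, data) for data in clients]
    global_w = aggregate(updates)
print("global model parameters:", global_w)
```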
• utilization-based billing, i.e. paying solely for actually used resources which can be
enabled by scaling them to zero when idle
• limited control (NoOps), i.e. abstraction of management and maintenance away from
the user
• event-driven, i.e. changes in states, so called events, e.g. a pub-sub message, function
as triggers to invoke (request) a serverless execution unit when needed.
A popular service model whose name is used almost synonymously with serverless computing is Function-as-a-Service (FaaS). Since this thesis particularly focuses on FaaS as a means
for serverless computing and uses it as the underlying mode of deployment for a modeled
ML pipeline, a primer on FaaS is in order.
2.3.1 Functions-as-a-Service
Function-as-a-Service describes another level of explicit abstraction, away from virtual ma-
chines, containers and micro-services towards event-driven code snippets, functions, that
incorporate only the program’s logic and are managed and scaled automatically by pro-
viders (as is typical in serverless computing). The cloud providers take care of the entire
deployment stack that is required for hosting the function3 [9]. In contrast, even with container solutions such as Docker, the developer is still required to consider and construct the deployment environments.
Fundamentally, the idea behind FaaS directly resembles the concept of functions in math-
ematics or traditional functional programming. A function is created mapping inputs to
outputs. A composition of several functions would then result in a program. Transferred to
the concept of cloud computing, a developer should then be able to ’program the cloud’ by
registering serverless functions on it and combining them into a program [63].
What makes FaaS not only attractive for users but also for cloud providers are promises
that can be enforced through this paradigm. For instance, setting a hard upper limit on function running time allows providers to better predict and allocate resource utilization, as well as to base the billing on the actual running time of a function. Moreover, statelessness enables the provider to safely remove or relocate all information, i.e. states, of the function after its execution, providing more possibilities to utilize their computing resources [61]. Next to
the traits typical to generic serverless computing, additional characteristics can therefore be
defined for FaaS:
• statelessness, i.e. pure-function like behaviour of an invocation of an execution unit
• fixed memory, i.e. limited amount of memory an invocation can be allocated with
• short running time, i.e. a limited time window for execution completion.
FaaS functions are stateless in that they are ephemeral. They are deployed, run through, and the instance is then deleted, which also deletes any variables or states that were created during the function invocation inside the environment of that function. This may not always be
the case, as potential solutions are investigated to make the FaaS concept more performant
by for example reducing the impact of so called cold-starts, i.e. having to redeploy a func-
tion environment from scratch instead of reusing existing ones. However, providers do not
guarantee that any state is still available in the next invocation, thus making it stateless. An
ongoing effort in research and industry is put into relaxing such characteristics for example
by enabling stateful serverless applications4 , facilitating state propagation, expanding the
memory a function can use or extending possible function lifetimes [64, 65]. While these
particularities of FaaS simplify the general understanding of the overall concept it comes
with a certain slew of complications. As Hellerstein et al. [63] postulate while FaaS en-
ables to harvest some benefits of cloud computing such as pay-as-you-go, at least at the
3 FaaS functions are referred to as functions, serverless functions, cloud functions or FaaS functions
moment of writing it still limits realising other potentials such as unlimited data storage or
distributed computing power. The reason for that essentially lies in some of the conceived
benefits. Limited lifetimes, isolation of the function from data and statelessness slow down communication between components, for instance by making it necessary to write produced
data artefacts to persistent storage locations such as AWS S3 or Google cloud storage in
between function calls. Novel solutions to this are continuously being introduced, e.g. to
speed up data querying and caching, and are yet to be fully tested for various use cases. A
full discussion on topics such as choosing the ideal serverless architecture and products for
certain applications such as heavy machine learning is out of the scope of this work given the
abundance of potential solutions and vendor differences (one only needs to look for example
at AWS storage options5 and related services or Google cloud versions6 of it).
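The following hypothetical AWS Lambda-style handler sketches this pattern: nothing computed inside the invocation survives it, so the produced artefact is written to object storage and only a reference is returned. The bucket and key names are placeholders and not part of this thesis.

```python
# Hypothetical sketch: a stateless FaaS handler that persists its artefact to object storage.
import json

import boto3

s3 = boto3.client("s3")   # created at (cold) start; may be reused on warm starts

def handler(event, context):
    # Read the raw dataset produced by an earlier function from persistent storage.
    raw = s3.get_object(Bucket="ml-pipeline-artifacts", Key=event["raw_key"])
    rows = json.loads(raw["Body"].read())

    processed = [r for r in rows if r.get("label") is not None]   # toy preprocessing step

    # Persist the result for the next function; local variables vanish with this instance.
    out_key = "processed/" + event["raw_key"]
    s3.put_object(
        Bucket="ml-pipeline-artifacts",
        Key=out_key,
        Body=json.dumps(processed).encode("utf-8"),
    )
    return {"processed_key": out_key}
```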
Regardless, due to the computational limitations of FaaS, a composition of several functions is often necessary to cover the entire logic required for representing the requested program or functionality, posing an ongoing area of research [9]. In the context of this work,
the FaaS paradigm is of most interest from a user, e.g. developer, perspective. Therefore in
the following sections, design patterns to compose functions in order to model workflows or
full application systems are introduced.
FaaS Choreography
FaaS choreography is based on the typical serverless paradigm of event-driven architectures [66]. When composing multiple related functions, event-driven mechanisms chain them together through data dependencies that allow outputs and inputs to be passed along. Data dependencies arise when functions manipulate states in some way that can be tracked by queueing systems or object stores, which in turn inform the subsequent function and result in its invocation [63]. For instance, a component responsible for data pre-processing may produce an event once the pre-processing is finished and the dataset is written to an object store. These so-called event producers are responsible for
generating events. A broker, i.e. event router then ingests and filters incoming events and
maps them to the respective event consumer. Finding the right event consumer is defined
beforehand by triggers 7 . The event router is responsible for connecting the different com-
ponents and serves as the medium through which messages are propagated by executing
a response to the originally produced event and sending it out to the corresponding con-
sumers. Regarding our ML example, an event consumer may be another component that
reads the processed dataset for model training or tuning, see Figure 2.4. If several differ-
ent models are to be trained on this data set, multiple model training components can be
set as subscribers to this event. By creating logic around these three artefacts (producers,
router, consumers) loose-coupling can be achieved [67]. Each component can independently
be deployed, updated or scaled depending on changing requirements.
Some challenges with exclusively event-driven approaches, particularly with regards to
complex potentially long-running workflows such as machine learning pipelines, lie in prop-
erly tracking state and individual execution failures and, as a result, also triggering recovery
actions from errors. While the representation of typical workflow patterns, see section 2.4.2,
is also supported in event-driven architectures [68], providers claim to enable better and
more consistent support and alleviation of other challenges through function orchestration
engines. To that end, explicit function orchestration (function orchestration in short) gained
popularity as an approach to composing multiple serverless functions together with other ap-
plications while supporting common control flow patterns, e.g. branching constructs, and
integration with other internal and external services and actors.
5 https://docs.aws.amazon.com/whitepapers/latest/aws-overview/storage-services.html
6 https://cloud.google.com/products/storage
7 Exact terminology may differ between providers and among researchers. We make use of Google Cloud’s
vocabulary [67]
[Figure 2.4: Event-driven FaaS choreography - an event producer emits events to an event router (broker), which filters them according to triggers and forwards them to the subscribed (targetable) event consumers.]
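A provider-agnostic toy simulation of the producer, event router and consumer roles described above is sketched below; in a real deployment the router would be a managed broker or pub/sub service, and all names are illustrative.

```python
# Illustrative only: a minimal in-memory event router wiring producers to consumers.
from collections import defaultdict

class EventRouter:
    def __init__(self):
        self.triggers = defaultdict(list)              # event type -> subscribed consumers

    def subscribe(self, event_type, consumer):
        self.triggers[event_type].append(consumer)     # trigger definition

    def publish(self, event):
        for consumer in self.triggers[event["type"]]:  # filter and route the event
            consumer(event)

def train_model(event):
    # Event consumer: reacts once the preprocessing component announces a finished dataset.
    print("training on", event["dataset_uri"])

router = EventRouter()
router.subscribe("dataset.preprocessed", train_model)

# Event producer: the preprocessing component emits an event after writing the dataset.
router.publish({"type": "dataset.preprocessed", "dataset_uri": "s3://bucket/processed.csv"})
```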
FaaS Orchestration
Function orchestrations draw from workflow research to represent for example business pro-
cesses or more generic applications in a serverless manner. Most providers support some
degree of workflow patterns, most commonly patterns such as sequences of a set of consec-
utive functions, branching, e.g. parallel execution of different sets of functions based on the
same input, or conditional process flows. Next to that, other characteristics common to business processes can be supported by some providers, for instance integrating events such as waiting timers to pause the workflow for a specified duration. Leveraging function orchestration en-
gines, better state and execution control is possible which also simplifies troubleshooting and
helps with recovery actions through explicit error handling for each step in the workflow. A
closer look at these functionalities is taken in section 4.2 when considering inherent charac-
teristics that create requirements for the BPMN modeling extension. To create a serverless
function orchestration at least two different modeling concepts have to be considered [9]:
• modeling the control flow of the orchestration, which would represent the workflow,
covering multiple functions
• modeling the deployment of the overall serverless application that surrounds the func-
tion orchestrations.
When modeling the control flow, business logic aspects also need to be considered and implemented with respect to constraints set by the respective orchestration engine and serverless function provider. For actually modeling the function orchestration, either vendor-specific modeling languages, such as Amazon’s ASL (Amazon States Language), or proprietary tech-
nology or regular programming languages such as Python can be used. This heterogeneity
leads not only to lock-in effects but also to challenges when trying to even semantically trans-
fer the function orchestration logic [9, 10]. These characteristics are however also typical
for the level of maturity of function orchestration technologies. Most providers offer their
own function orchestration engines with varying levels of functionality such as Amazon’s
AWS Step Functions [69], Google’s Workflows [70] or Azure’s Durable Functions [71] which
are ready to be used as-a-service. Finally, deploying the modeled function orchestration also
requires considerations regarding which vendor or open-source technology and respective
services as well as deployment modeling language to use. This leads to two levels of expertise a developer requires to make use of function orchestrations: 1) the function orchestration modeling language and 2) the deployment modeling language [9, 10].
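As a hedged illustration of such an orchestration, the sketch below registers a small state machine with AWS Step Functions through boto3; the definition is expressed in the Amazon States Language as a Python dict, and all ARNs, names and states are placeholders rather than artefacts of this thesis.

```python
# Hypothetical sketch: a three-step ML pipeline as an AWS Step Functions state machine.
import json

import boto3

sfn = boto3.client("stepfunctions")

definition = {
    "StartAt": "Preprocess",
    "States": {
        "Preprocess": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:preprocess",
            "Next": "Train",
        },
        "Train": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:train",
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
            "Next": "Evaluate",
        },
        "Evaluate": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:evaluate",
            "End": True,
        },
    },
}

response = sfn.create_state_machine(
    name="ml-pipeline-orchestration",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/step-functions-role",
)
print(response["stateMachineArn"])
```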
model is generated. The BPM life-cycle is relevant to this research to the extent that the
study addresses the limitations of current process modeling - one of the core activities of
BPM - with respect to machine learning. The BPM life-cycle thus serves as a more general
orientation, putting process modeling into perspective and highlighting its importance in
the context of business process management, particularly regarding design and analysis of
the process as well as its representation and communication among various stakeholders.
Thus, subsequently process modeling is presented as a part of general conceptual modeling.
object and iii) to eventually solve the problem being tackled through domain analysis and
designing of solutions [79]. Ultimately, a conceptual model can guide development activities
in their respective domain [8].
As a version of conceptual modeling, process modeling derives a conceptualization spe-
cialized in largely representing a workflow, i.e. a flow of activities, decisions and events that
are interdependent. A large body of research has been devoted to better understanding workflows, their patterns, functionality, potentials and drawbacks. The fundamental element, a
task, can, when combined with other elements, be arranged to describe complex patterns,
i.e. workflow patterns, the most fundamental ones being [80]:
• sequence, i.e. one task is dependent on the completion of the preceding one
• parallel, i.e. several tasks can be realized independent of each other
• distribution (e.g. fan-out), i.e. m tasks directly follow the completion of a single
preceding task
• aggregation, i.e. a dependency relation between one task and several directly preceding
ones.
For a more elaborate discussion on workflow patterns see Russell et al. [81, p. 105ff.].
Once a process model depicting a workflow is generated, in operations an instance of it
can represent the running process in its actual form. The practice of process modeling con-
sequently aims to abstractly express the process at its current state or to-be state through
a clear and standardized graphical notation and formalized schema language thereby redu-
cing ambiguity, risks of misinterpretation and allowing for backtracing of process instances
against the designed model to identify areas of improvement. Process modeling can be ap-
plied for two main purposes, namely 1) organizational design which leads to conceptual mod-
els to improve understanding, communication and facilitate process (re-)design and analysis
and 2) application system design which concentrates on IT-oriented aspects such as process
automation and implementation details represented in an executable process model [78].
The level of detail that a process model should contain depends on its purpose. For documentation purposes, a high-level model alongside text annotations is sufficient. When quantitative analysis of process performance is to be performed, more fine-grained in-
formation in the process model is required such as time taken to fulfill a task. Executable
process models that are to be deployed require even more granular information on the process
itself and any information associated with or required by it, for instance inputs and outputs
of the tasks. The presented work focuses on the latter level of detail when extending the
modeling language as the process model needs to contain the required information for it to
enable conversion to TOSCA and deployment of the ML pipeline. Various process modeling languages exist for designing a process diagram. Most are based on two basic kinds of nodes - control nodes and activity nodes - with activity nodes representing a unit of work that is to be performed by a person, a software system or another agent, and control nodes regulating
the execution flow among activities. Further, event nodes are supported by some languages
as a third major category to indicate an event taking place that requires a corresponding
reaction during the process [73]. More generally, a modeling language is composed of four aspects - 1) vocabulary, 2) syntax, 3) semantics and 4) notation. The modeling elements are given by the vocabulary and are in turn constrained by the syntax, which enforces the rules of the language and describes the relationships between the available elements. Their precise
meaning in the domain context is bound by the language’s semantics and mostly given via
textual descriptions [82]. Finally, the notation provides graphical symbols to visualize the
elements when modeling the concept at hand [78, 83].
A flowchart, one of the oldest process modeling languages, is a graphical representation
of formalised structures such as a software logic sequence or a work process [84]. By using
symbols the process is described as a sequence of actions in an easy-to-understand fashion
while providing a high degree of flexibility and low adoption threshold. With it, however,
come several weaknesses such as a blurred boundary of the actual process and a rapid increase
in size of the model due to it having no differentiation between main and sub-activities.
Further, responsibilities and performers cannot be easily represented. The language lends itself well to explaining processes at a high level of detail, while it falls short at providing an overview of the process itself. Numerous other process modeling techniques have also been brought forward over the years, originating from various methodologies such as Gantt charts or Petri nets, each having their own characteristics and weaknesses [85, 86].
A noteworthy modeling technique that took inspiration from flowcharts is the UML diagram. The Unified Modeling Language (UML), an object-oriented method, can model a
process which is represented by objects that are transformed by the process activities over
the process lifetime. UML, further, provides nine different diagram types addressing various
aspects of a system that is to be modeled [87]. All of them build up on three main concepts
- 1) objects that represent the entity under study by incorporating the data structure, i.e.
attributes, and its behaviour, i.e. supported operations, 2) state a condition that the ob-
ject may be in with its attributes taking on specific values, and 3) behaviour which are the
actions and reactions according to the operations the object supports and can perform and
that lead to state changes. As Aguilar-Savén constitutes UML aides in ’specifying, visual-
izing, constructing and documenting the artefacts of [...] systems as well as [in] business
modeling’ [85]. In this work, the UML class diagram notation is applied to conceptualize
the meta-model artefacts of the BPMN extension for ML workflows. A class refers to a set
of objects that have similar properties. Classes can be associated with one another to depict
relations between instances of them, e.g. an aggregation to explain a part-whole relationship
or multiplicities to elaborate on various forms of cardinality. Additional features are sup-
ported to facilitate enrichment of the conceptual models such as generalizations and other
forms of hierarchies [88]. A minimal example applying the class diagram notation to arrive
at a conceptual model is shown in Figure 2.6. The Unified Modeling Language established
itself as a standard among object-oriented modeling techniques. While UML allows for internal consistency checking and for directly building software off of it, it is not without
shortcomings. Modeling UML diagrams is a time-consuming and complex process. Addi-
tionally, users may quickly be overwhelmed with excessively large models and fragmented
information.
Among the available process modeling languages, the Business Process Model and Notation established itself as a standard, answering the need for a language that is both expressive
and formal while still being understandable and comprehensive to take into account the
various stakeholders, technical and non-technical, affected by a process and contributing to
the overall business process management practices [89, 90, 91]. Consequently, throughout
the rest of this thesis, we place our focus on BPMN.
[Figure: Excerpt of the BPMN metamodel as a UML class diagram - BaseElement (id: String), InputOutputSpecification with DataInput and DataOutput, process attributes (processType, isClosed, isExecutable), ResourceRole and HumanPerformer, FlowElement and FlowNode, Activity (isForCompensation, startQuantity, completionQuantity), Task, Event and Gateway, as well as SequenceFlow (sourceRef/targetRef, incoming/outgoing) and DataObject.]
a full arrow-head [73]. Events are described by circles. They can mark the start or end
of a process as well as intermediate situations happening instantaneously during execution
and can be further classified into catching and throwing events. Circles with a thin (thick)
border describe start (end) events while a double layered border indicates an intermediate
event. Catching events are markers with no fill whereas throwing events are markers with a
dark fill [78]. BPMN further accounts for abstract artefacts such as data related objects, for
instance a data store or data in- and outputs. Additional information can be provided via
text annotations which however complicate process validation or verification as the textual
data contained within is ill-defined [12]. Process resources such as process participants are
described by pools or lanes. Resources can be active and passive - the former capable of
autonomously performing an activity while the latter is only somewhat involved with the
activity’s performance. A complete overview of the BPMN vocabulary, syntax, semantics
and entire range of elements is available in the latest documentation of BPMN [11]. Making use of and extending BPMN in the course of this research comes with two benefits - 1) leveraging widely known and properly defined semantics, syntax and logic and 2) building on the existing state of the art and thereby improving the relevance and validity of the artefact to be created.
can apply extension by addition, i.e. attaching novel elements to the pre-defined existing
ones of the language. When extending BPMN, three pillars can be considered - 1) the MOF
meta-model defining abstract BPMN objects in UML, 2) the XML schema documents which
are derived from the meta-model and represent the structure in a machine readable format
and 3) the graphical notation that is being used by the modeler. A semantically correct
extension primarily focuses on the first pillar which in turn requires the designer to consider
low level implementation challenges and particularities of the respective domain. Based on
the metamodel, the XML schema documents may be extended. A standardized approach
for visualizing the graphical representation of the extended elements has not been fully
formalized, thus most extensions stick close to existing BPMN elements incorporating slight
changes via context-representative icons [95, 14, 12]. The latest BPMN version provides a
guiding extension mechanism that focuses on the addition of elements and attributes to the
BPMN metamodel via four components - 1) ExtensionDefinition to group new attributes
under a new concept name, 2) ExtensionAttributeDefinition to represent the respective at-
tribute, 3) ExtensionAttributeValue to store the attribute’s value and 4) Extension to bind
the new concept to the BPMN model. Notably, this is a means of guidance rather than a set of hard rules for aligning with the BPMN core semantics, and various approaches exist in the literature. Building on the BPMN metamodel representation, new extensions can
be precisely ideated, defined and explained [13, 97].
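Purely as an illustration of how these four components relate, the sketch below mirrors them as Python dataclasses; the ML-flavoured concept and attribute names are hypothetical and do not anticipate the extension elements defined later in this thesis.

```python
# Illustrative only: the four BPMN extension components modelled as plain dataclasses.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExtensionAttributeDefinition:
    name: str                       # a newly introduced attribute, e.g. "mlFramework"
    type: str = "xsd:string"

@dataclass
class ExtensionDefinition:
    name: str                       # groups new attributes under a new concept name
    attributes: List[ExtensionAttributeDefinition] = field(default_factory=list)

@dataclass
class ExtensionAttributeValue:
    value: str                      # stores the concrete value of an extension attribute

@dataclass
class Extension:
    definition: ExtensionDefinition  # binds the new concept to the BPMN model
    mustUnderstand: bool = False

# Hypothetical usage: a concept carrying a machine-learning-specific attribute.
ml_concept = ExtensionDefinition(
    name="MLTrainingTask",
    attributes=[ExtensionAttributeDefinition(name="mlFramework")],
)
binding = Extension(definition=ml_concept)
print(binding)
```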
Depending on the purpose of the extension, the introduced BPMN additions in related
research focus on different process modeling perspectives - be it, for instance, to conceptualize
and visualize the domain-specific process in a more articulate fashion to process participants
or to enrich the data contained in the process model, e.g. the data flow with its corresponding
inputs and outputs, such that it contains the required information to enable execution
(transformation towards executable models) in the respective application domain. As the
focal point of the proposed extension artefact is of a technical nature - the ML pipeline design and implementation - the latter kind of existing extensions are of interest, considering potential re-use of proposed elements or as a stepping stone to build upon. The BPMN extension
artefact presented in this thesis is based on a metamodel extension.
specific middleware and infrastructure components. Reusable node types and relationship
types are referenced to define the characteristics, i.e. the semantics, of the corresponding
node and relationship templates. This way, type hierarchies can be encoded. A set of
normative types is specified by the TOSCA standard, e.g. ’hostedOn’ to define dependency
relationships or ’connectsTo’ to define communicative relations [107]. To connect nodes
through a certain relationship a requirement can be introduced in the source node type and
linked to a target node by adding the corresponding capability [9].
Node and relationship types allow to abstract one layer from the more concrete node
and relationship templates, see Figure 2.10. An instance of the template then functions as
a real existing and instantiated component or relationship [59].
Similar to what Yussupov et al. [9] propose, more customized and domain-specific types can be derived from the existing normative ones. In order to better configure node and
relationship templates, properties can be specified such as a port number for communication
or an identifier defining in which region of a cloud provider a node should be set up. In
addition, nodes and relationships can offer defined interface operations to specify actual
deployment and management details which enable TOSCA-compliant provisioning engines
to trigger the correct lifecycle operation. Again, normative lifecycle operations are pre-
defined, i.e. create, configure, start, stop, delete, and can be further customized [107, 17]. To
provide the actual logic performing the required operation, implementation artefacts (IAs)
can be assigned to the corresponding node or relationship type through a node type implementation
or a respective relationship type implementation. The actual business logic, i.e. what work
a node should perform, can be attached as deployment artefact (DA) to the corresponding
node template, e.g. as a .zip archive of the component’s code [9].
The combination of structural and behavioural information as well as other metadata and
attached artefacts is referred to as a service template and represents a complete, deployment-ready
application [17]. The TOSCA application model can then be packaged into a cloud
service archive (CSAR) that groups all required information into one file.
The CSAR enables TOSCA-compliant deployment technologies to consume all necessary
artefacts, enact their logic and ultimately deploy the application.
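To make the interplay of the described TOSCA concepts more concrete, the following minimal Python sketch assembles a hypothetical service-template-like structure as a nested dictionary - a FaaS function node hosted on a platform node, with properties, lifecycle interface operations and an attached deployment artefact. The custom type names, scripts and artefact names are illustrative assumptions; normative types such as tosca.relationships.HostedOn come from the TOSCA standard, and an actual template would be expressed in the TOSCA YAML or XML syntax and packaged into a CSAR rather than serialized as JSON.

```python
import json

# Illustrative, simplified service-template-like structure: a FaaS function node
# hosted on a serverless platform node. Custom type names, scripts and artefact
# names are hypothetical.
service_template = {
    "node_templates": {
        "predict_function": {
            "type": "custom.nodes.FaaSFunction",      # derived from a normative node type
            "properties": {"region": "eu-west-1", "memory_mb": 512},
            "requirements": [                          # connected to its host via HostedOn
                {"host": {"node": "serverless_platform",
                          "relationship": "tosca.relationships.HostedOn"}}
            ],
            "interfaces": {                            # lifecycle operations (create, ..., delete)
                "Standard": {"create": "scripts/create_function.sh",
                             "delete": "scripts/delete_function.sh"}
            },
            "artifacts": {"code": "predict_function.zip"},  # deployment artefact (DA)
        },
        "serverless_platform": {
            "type": "custom.nodes.ServerlessPlatform",
            "capabilities": {"host": {"type": "tosca.capabilities.Compute"}},  # satisfies the requirement
        },
    },
}

# In practice, the template plus all referenced artefacts would be packaged into
# a CSAR; JSON is used here only to keep the sketch dependency-free.
print(json.dumps(service_template, indent=2))
```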
Chapter 3
Methodology
This chapter introduces the research methodology followed throughout this thesis. Since the
present work ties in with information systems and software engineering research, design sci-
ence research as a prospective methodology can be considered and is discussed in section 3.1.
Further, with the main goal of this work being the creation of artefacts, the Design Science
Research Methodology process can be applied. Thus, next to introducing design science, the
steps it mandates for creating the artefacts are explained alongside the research design of
this work in section 3.2.
exact process and activities involved is lacking. Peffers et al. close this gap for the case
of information systems by extrapolating the guidelines and related work into a Design Science
Research Methodology, which involves six activities structured as a process in nominal
sequence that can be iterated upon. The research design of this work follows the Design
Science Research Methodology.
4) Demonstration
Refers to demonstration of the artefact’s capability to solve instances of the researched
problem by means of an appropriate activity.
Various exemplary snippets of the core artefact are depicted throughout its presentation
and explanation. Further, an illustrative use case serves as a form of demonstration. It
showcases the possibility of modeling both FaaS-based machine learning aspects and
offloaded capabilities in an end-to-end process model and associates the overall ML workflow
with related services (in this use case, a resource provisioner and an external
monitoring component). Extensive simulations or experimentation cycles were not feasible
within the scope and time of this work.
Further, a conceptual mapping scenario is demonstrated that converts a simple BPMN4sML
diagram of a credit default prediction machine learning workflow into a corresponding TO-
SCA template to orchestrate it on a public cloud provider, namely AWS. The serverless
nature of the workflow is realized through FaaS functions orchestrated by the AWS Step
Functions engine.
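For orientation only, the sketch below approximates what the serverless orchestration target of such a mapping could look like, expressed as an Amazon States Language definition built from a Python dictionary. The state names, function names and account identifiers are hypothetical placeholders and do not reproduce the actual template derived in the mapping scenario.

```python
import json

# Hypothetical Amazon States Language definition chaining three Lambda-backed
# ML steps; ARNs and state names are placeholders for illustration only.
state_machine = {
    "Comment": "Sketch of a serverless credit default prediction workflow",
    "StartAt": "PrepareData",
    "States": {
        "PrepareData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:prepare-data",
            "Next": "TrainModel",
        },
        "TrainModel": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:train-model",
            "Next": "VerifyModel",
        },
        "VerifyModel": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:verify-model",
            "End": True,
        },
    },
}

print(json.dumps(state_machine, indent=2))
```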
5) Evaluation
Refers to observation and measurement of how effectively the artefact supports a solution
to the studied problem and ties in with the foregoing demonstration. Evaluation of design
artefacts can follow different methodologies, such as elaborate evaluation frameworks that
draw from previously established requirements to assess the artefact. Depending on available
capabilities and constraints, different methods can be pursued. In place of a sophisticated
evaluation set-up, a descriptive method can be applied, which however constitutes a
limitation and a call for future research.
In the course and scope of this thesis, a descriptive evaluation method is considered by
leveraging the preceding illustrative scenarios. The use case references an existing imple-
mentation of an online machine learning solution proposed in peer-reviewed literature
to highlight the artefact's ability to depict such a system as a BPMN4sML workflow
model. It further validates the developed notation to represent existing machine learning
tasks, data artefacts and events while accounting for appropriate semantics.
Further, the conceptual mapping scenario validates the possibility of converting a tech-
nology independent and interoperable machine learning process model into a corresponding
deployment model for serverless orchestration. Taken together, the illustrative use case ref-
erenced from related literature and the mapping scenario realizing the conceptual mapping
from BPMN4sML to TOSCA are considered to represent a sufficiently convincing proto-
type that addresses the specified objectives and answers the identified research problem.
The synthesis of requirements through analysis of literature and industry publications can be
understood as an artefact and contribution in its own right, albeit one that is difficult to quantify.
With regards to novelty, the artefacts are new to existing solutions - BPMN4sML incor-
porates (serverless) machine learning previously unaddressed by the standard; the concep-
tual mapping and transformation to TOSCA realizes a new mapping from the novel BPMN
elements to existing TOSCA counterparts. On their own, the artefacts represent a high level
of novelty. As an extension to current solutions, i.e. BPMN and TOSCA, the entirety of
the artefacts and solution could however be viewed as limited in its novelty given that they
extend current solutions.
The artefacts contribute to the knowledge base of (serverless) machine learning workflows
as well as to general process modeling and deployment modeling as part of model-driven
engineering. They address both practical and academic perspectives.
6) Communication
Refers to the communication of the research process, its artefacts and results as well as
the developed knowledge and overall research contribution to different audiences such as
technology-savvy stakeholders as well as managerial ones.
As this study is conducted outside of an enterprise, stakeholders cannot be considered
directly. The communication is realized by means of this thesis which explains the processes,
generated artefacts and results as well as illustrates and discusses them. A scientific public-
ation would be necessary to fully adhere to communication as it is addressed by DSRM.
Overall Considerations
While working towards the objectives and generally throughout the entirety of the DSRM
process, research rigor needs to be ensured. For the generation of the artefacts, mathematical
knowledge or a similar formalism to establish proofs is not suitable. Instead, a
well-established theoretical foundation is created while adhering to the research methodology.
The theoretical foundation of this work draws in particular from machine learning
research and industry practices. Further, serverless computing concepts are formed and
organized, especially with regards to Functions-as-a-Service. Business process management
practices, workflow patterns and modeling languages are reviewed.
To form the theoretical knowledge foundation and understand the current state of re-
search, this work conducts a white and grey literature review that adheres (to the best
extent possible given the scope of this work) to principles brought forward by Kaiwartya
et al. [123]. A systematic multi-vocal literature review as proposed by Petersen et al. [124]
and realized by, for instance, Cascavilla et al. [125] was infeasible. To circumvent the ensuing
limitations, existing literature reviews, surveys and reference architectures are leveraged and
snowballed upon.
To create the BPMN extension, next to the DSRM process, a methodology proposed and
applied in related literature is followed [12, 16]: 1) the target domain for the extension
is analysed in depth, 2) the scope of and requirements for the extension are established, 3)
the core structure of BPMN (i.e. the metamodel) is extended and ultimately 4) the notation
is extended.
Chapter 4
Requirements for Standard Based Conceptual Modeling for Serverless ML Workflows
In order to properly represent machine learning workflows in a process model diagram, their
constituent characteristics such as artefacts generated or accessed throughout the process
or activities recurring over various ML workflows need to be identified so as to derive over-
all modeling elements accounting for the domain. The corresponding requirement analysis
builds on the previous sections w.r.t. business process modeling, serverless paradigms
and machine learning in chapter 2. Machine learning specificities are elaborated upon by
revisiting the machine learning lifecycle in detail in section 4.1. Moreover, given that the
present research pursues a serverless deployment approach, certain serverless characteristics
come into play that potentially need to be considered when creating the modeling exten-
sion elements. Consequently, section 4.2 discusses prerequisites associated particularly with
the Function-as-a-Service paradigm. Ultimately, BPMN functionality and extensions for
related domains need to be taken into account, ensuring that already established concepts
and elements are re-used, built upon or referenced. Therefore, this chapter concludes with
a synthesis of the identified requirements and an equivalence check against existing BPMN
functionality that informs the subsequent work. Note that the requirement indices provided in
this chapter are structured according to Table 4.1 for clarity's sake and may thus appear
out of order throughout the following sections.
experts such as data scientists perform analytical tasks both on available data, i.e. explor-
atory data analysis, and on surrounding business processes and goals in order to establish a
boundary frame in which the developed ML service can be placed. The respective activities
can be considered outside the actual machine learning workflow, in the sense of a pipeline
that is to be deployed, and thus are not explicitly limited by currently available modeling
elements - for instance a user task in BPMN can reflect the activities a data scientist per-
forms. In contrast, potential information in form of artefacts that are generated through
these tasks may need to be referenced later on in the ML workflow. An example is a
requirements document which formalizes the constraints for the ML solution through performance
thresholds or other explicit standards, as elaborated by Ashmore et al. [37] (R 24).
Additionally, decisions made with regards to which overall ML methodology is applicable to
the respective scenario may inform the subsequent ML workflow [42].
A relevant event occurring throughout this phase can be the establishment of new re-
quirements, e.g. a higher threshold for ML model performance, or the identification of new
relevant features that shall be added to the dataset used for model training. If new re-
quirements present themselves, the entire ML pipeline or selected activities such as model
learning or model verification may need to be initialized (R 7). Furthermore, a decision for
actively triggering the subsequent ML workflow phases and activities, i.e. data preparation,
ML learning and verification and potential deployment, can be taken (R 6).
not directly require labeled data instances as part of the feature sets (they may instead
need to be combined). Thus, in the case of supervised learning, another storage for the final
dataset used for the ML learning phase can be necessary, i.e. a dataset repository (R 28,
29) [41]. Note that this is not an explicit constraint but more of a guideline, as feature sets
may indeed also hold a target variable.
Different from sourcing and fusion operations, data management and preparation in the
context of machine learning require separate representation. As indicated in section 2.2.2,
a large number of data pre-processing steps can be conducted to shape the stored data or
features into the ideal format for a ML task. While not each potential technique needs to be
differentiated, on a modeling level it is still beneficial to account for heterogeneous outputs of
these operations. For instance, a conventional pre-processing activity such as data cleaning
(removing missing values etc.) takes raw data or a feature set and outputs a processed
or ’cleaned’ dataset or a further processed feature set (R 30) [37, 42, 132]. In contrast,
a feature engineering activity may provide a dataset or a feature set which is differently
structured from the one used as input, i.e. only a selected number of new, changed features is
returned - for instance by combining existing features through some numerical computations
or by selecting a set of features through some technique (R 31) [37, 42, 109]. Moreover,
the processed dataset or features can be enriched, e.g. by combining disjoint features on
different aspects of the phenomenon under study into one larger feature or dataset in order
to create a stronger, enriched signal for the ML algorithm to capture (R 32). In the case of
supervised learning, an important operation during the data management phase is splitting
the processed datasets into subsets used for 1) training, 2) potential validation and 3) testing
(R 29, 33) [44, 4, 37].
Notably, not all data preparation activities can be conducted on the dataset prior to
splitting it. In cases such as standardization, dimensionality reduction or imputation, data
leakage must be avoided; such techniques may therefore only be fitted within the bounded
context of either training, validation or testing. In literature, these operations still count as
pre-processing, feature engineering or enrichment activities, and their correct application lies
in the hands of ML engineers or data scientists [45]. Overall, the
discussed activities (i.e. general overall data preparation) can be understood as a sequence
of immutable operations (or one overall operation) solely aimed at the provision of the right
datasets as input for the ML task.
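As a minimal sketch of the splitting and leakage considerations above (assuming scikit-learn and a synthetic stand-in for a processed feature set), the following snippet derives training, validation and test subsets and fits a standardization step on the training subset only:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data standing in for a processed feature set with a target variable.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 4)), rng.integers(0, 2, size=500)

# Split into training, validation and test subsets (R 29, 33).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# To avoid data leakage, the scaler is fitted on the training subset only and
# merely applied to the validation and test subsets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s, X_test_s = scaler.transform(X_val), scaler.transform(X_test)
```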
Further, pre-processing techniques such as data balancing can lead to a new, larger dataset
carrying synthetic data either stored in a dataset repository or directly fed to the ML
algorithm. Hence, these activities may also be associated with an embedded operation as
part of the actual model learning [37, 109]. Thus, next to defining such pre-processing steps
as activities, for instance in the case of job offloading (see section 4.2) the actual processing
can be handed over as a recipe, i.e. a source code file or string containing the code, to the
respective training or tuning operation (R 34). Besides proper representation of such code
recipes sent to offloading jobs, storing them for future access can also be necessary, e.g.
through a metadata repository (R 35) [41].
Relevant events occurring throughout this phase can be deduced. The presence of latest
available raw data or an update to raw data, feature sets or datasets in the respective
repository can trigger subsequent operations (R 8) [44]. In the case of infrequent data
updates, scheduled events for data sourcing or preparation may be put into place to trigger
a new run of activities (R 9) [44]. Moreover, if a data scientist or engineer decides to update
the data preparation sequence, they may need to manually (or automatically) trigger a
new run - this action-based trigger follows the same concept as the one introduced in R 6.
Further, enabling the modeling of events triggering not just the beginning of a phase (or
the entire ML pipeline) but also the actual operation should be supported - in case of large
volumes of data, only the very necessary set of steps should be instantiated to avoid needless
computations (R 22). Due to being mostly event-driven, FaaS-based architectures support
this requirement as well, see section 4.2.
automated aspect of the operation, i.e. many models are trained and scored on different
hyperparameter sets. In this case, the tuning operation incorporates required splits for
validation sets - as part of pre-defined resampling strategies that are part of the overall tuning
configuration such as nested resampling [136]. Consequently, any tuning operation requires
a training dataset and potentially a tuning configuration recipe [44]. Similar to regular
training, the tuning configuration may incorporate an identifier to locate the training dataset
instead. Produced artefacts can be the sets of hyperparameters alongside the aggregated
performance scores of the models, i.e. the tuning results (R 24) [41, 37]. Further, the
trained models can also be kept and stored for future access. In case a large number of models
is generated, it is more efficient to only keep the list of hyperparameters and performance
scores.
Subsequently, a final regular training job can be run on the full dataset (i.e. training
+ validation) that takes the best performing hyperparameter configuration and returns
the possibly ’best’ model with respect to the context. Ultimately, the ML solution is stored
in a model repository, triggering the verification phase given that it meets the requirement
thresholds (R 47) [44, 43]. If evaluation requirements still cannot be met, a performance deficit
report may be generated (R 24) [37].
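A compact, hedged illustration of this tuning-then-final-training pattern, using scikit-learn's GridSearchCV on synthetic data (the hyperparameter grid is arbitrary and the nested resampling described above is simplified to a single cross-validation loop):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Tuning configuration: hyperparameter sets plus a (simplified) resampling strategy.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5, refit=True)
search.fit(X, y)

# Tuning result: the hyperparameter sets with their aggregated performance scores;
# keeping only this compact summary is more efficient than keeping every model.
tuning_result = {"params": search.cv_results_["params"],
                 "mean_scores": search.cv_results_["mean_test_score"].tolist()}

# Final model refit on the full data with the best-performing configuration.
best_model = search.best_estimator_
```

With refit=True, the best configuration is automatically refit on the full data handed to the search, mirroring the final training job described above.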
Apart from model training, potential operations preceding or subsequent to it can occur,
referenced as further pre-processing or post-processing. These range from miscellaneous but
necessary activities such as identifying and extracting the best performing set of hyperpara-
meters and models after a tuning strategy (R 44) [44] to more complex ones depending on
the ML method in place. As such, transfer learning requires loading an existing ML model
as part of the training job to learn the model on the new dataset (R 45) [27]. Next to this,
in the case of ensemble learning several ML algorithms are run, each trained and scored or
tuned directly. Ultimately their inferences are merged through a voting or consensus opera-
tion that may aggregate the predictions in some manner or even eliminate the propositions
of some of the models [31]. Principally, ensemble learning can be applied as part of a single
algorithm, e.g. Random Forest or AdaBoost, or across algorithms. For modeling purposes,
it is of interest to represent the cross-algorithm ensemble functionality, whereas the single-algorithm
case is already covered by a single learning operation (R 46).
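A minimal sketch of the cross-algorithm case (R 46), assuming scikit-learn: several algorithms are trained separately and their inferences are merged through a majority vote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Cross-algorithm ensemble: each base learner is trained separately and their
# inferences are merged through a (hard) majority vote.
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier(random_state=0))],
    voting="hard",
)
ensemble.fit(X, y)
predictions = ensemble.predict(X[:5])
```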
Relevant events occurring throughout the model learning phase are manifold. Re-training
based on an existing configuration can be necessary in response to several ML lifecycle
scenarios (R 10). Likewise re-tuning the entire solution may be required in case the con-
figuration of hyperparameter values is no longer deemed applicable (R 11) [44]. Reasons
include changes in raw data, feature sets or full datasets. At large, a degradation of a de-
ployed model’s performance [137] caused by events related to concept and data drift requires
proper representation on a modeling level as such events can trigger the model learning phase
as well as possible data preparation activities (R 12, 13). Next to restarts in response to
ML lifecycle situations, scheduled re-training and tuning is also common practice (see R
9) [137, 44, 48]. Just as for changes in the implementation of data preparation activities, a
manual trigger by action of a data scientist or engineer should allow restarting the model
learning phase (or selected activities) from the beginning as well (R 6). Having fully trained,
tuned and evaluated a model, two types of events can occur - 1) a model learning deficit
event holding the generated performance deficit report to request intervention of an expert
or 2) a request for model verification (possibly triggered by writing the model artefacts to a
corresponding directory) (R 15, 16) [37, 42, 44].
Next to triggers for initialisation of operations, in machine learning workflows, excep-
tions and execution errors play an ever-present role. Consequently, accounting for mitigating
actions within a process model should be supported (R 48) [5, 44]. While understanding
all possible manifestations of errors is not the focus of this work, a selected sample for illus-
tration purposes can be considered - this includes: 1) a training or tuning job failure due
to convergence problems; 2) tuning or evaluation error e.g. due to new unseen classes of
categorical features that the trained model is unable to process; 3) computational failures
(or exceptions) outside from mathematical problems, such as a training task running out of
allocated memory or crossing a threshold of allocated time (see section 4.2). Considering
such training, tuning or evaluation errors, forwarding them to data scientists for inspection
4.1.5 Deployment
To provision a ML model and make it accessible to other components of the broader system
infrastructure of an organization (e.g. virtually as a service or physically on an edge device),
different storage and provisioning solutions are available. Similar to operations and artefacts
described previously, the deployment operation warrants a representation as a modeling
element (R 51) [37, 48]. Depending on the scenario, the actual model deployment or integration
activity may be constrained by available computing resources or e.g. by specific demands
for short-time inference responses. Further, ML model deployment may consist of more
than a singular operation - in case of edge-ML, deploying the model can involve elaborate
pre-processing and optimization techniques w.r.t. the ML model to specify and optimize
the deployment plan [109]. Adhering to these constraints can however be understood as a
matter of implementing the modeled deployment operation correctly prior to running it and
does therefore not necessitate an explicitly new modeling statement to arrive at a minimal
set of requirements for conceptual modeling of ML workflows. The deployment operation
can either be informed by a model identifier (e.g. a path to locate the verified ML model)
or by actually receiving the model as a file, for instance in case of edge-based scenarios. It
then correctly integrates the model in the environment and returns identifiable information
w.r.t. the model’s access points [44].
In security-critical environments, prior to the actual model deployment, the ML solution
may instead be integrated into a staging environment resembling the production environ-
ment (e.g. in the form of shadowing). The purpose is to verify the model's adherence to
requirements as well as to fully validate its operational fit [137, 44]. A staging operation can
be synonymous with a deployment operation, the only difference being the value of a variable
specifying the deployment environment.
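The following sketch illustrates the described deployment operation as a plain Python function: it receives a model identifier, integrates the model into a target environment and returns identifiable access-point information. The endpoint URL scheme and identifiers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DeploymentResult:
    model_id: str
    endpoint: str       # identifiable information w.r.t. the model's access point
    environment: str    # e.g. "staging" or "production"

def deploy_model(model_id: str, environment: str = "production") -> DeploymentResult:
    """Hypothetical deployment operation: a real implementation would provision the
    serving infrastructure and register the model behind the returned access point."""
    endpoint = f"https://ml.example.org/{environment}/models/{model_id}/predict"
    return DeploymentResult(model_id=model_id, endpoint=endpoint, environment=environment)

# A staging operation only differs in the value of the environment variable.
staged = deploy_model("credit-default-rf-v3", environment="staging")
deployed = deploy_model("credit-default-rf-v3")
```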
In case a change or update of an already deployed ML model is required, after deployment
of the latest ML solution, the previous one(s) are to be deprecated (R 52) [48]. This involves
removing the access to the now retired model(s), documenting the process and potentially
saving the retired models in a different location. Thus, new metadata is also created, such
as a new identifier or path and the ML model status (e.g. deprecated).
Relevant events occurring throughout the deployment phase are mainly 1) informing
connected services of the now active ML solution depending on the mode of deployment -
for example initiating a monitoring service (R 20) - and 2) potentially triggering deprecation
of an existing model (R 19).
events relevant to the machine learning lifecycle can be highlighted and included in the modeling
diagram.
Relevant events occurring throughout the monitoring and inference phase are related to
concept or data drifts which in turn can trigger previous steps of the ML pipeline. Thus,
a data drift may initialize data preparation and subsequent model training activities. Sim-
ilarly, registering concept drift indicating upcoming model performance degradation can
require a data scientist to step in or to automatically perform re-training or re-tuning (see
R 12, 13). Further, in case the model’s operational performance, e.g. response time,
degrades due to shortcomings in the overall system infrastructure an event requesting an
expert’s intervention to the ML solution can be necessary (R 14). An inference activity
may be automatically initialized in some scenarios and thus justifies a representative event
(R 21). A violation of the data structure or format of an incoming inference request may result
in an inference error (R 48).
are possible or entirely new ways of incorporating model and data partitioning [113]. Such
scenarios can be considered sub-classes or rather instances of the identified general federated
learning operations and artefacts. Hence, in case the explained generic federated learning
elements do not capture the scenarios sufficiently, additional extensions can be created build-
ing on top of the existing one in future work.
The Business Process Model and Notation standard is compliant (or can be compliant)
with respect to all listed elements. Moreover, similar to reasoning in related works [9, 16],
BPMN 1) is a well-established technology independent and interoperable process modeling
standard, thereby lending itself well for machine learning workflow modeling, 2) supports
a machine-readable format of the modeled workflow incorporating modeling constructs and
workflow artefacts and 3) helps in explaining the workflows to both non-technical and tech-
nical stakeholders through a graphical notation.
To conclude the requirements analysis, a summary alongside an equivalence check with
BPMN on the identified machine learning specific requirements is provided in Table 4.1.
Table 4.1: Aggregated analysis and BPMN equivalence check of generic FaaS and work-
flow concepts and ML workflow and lifecycle concepts for derivation of extension element
requirements
Chapter 5
BPMN4sML: BPMN for Serverless Machine Learning
This chapter consolidates the preceding requirement analysis and introduces the BPMN
extension for modeling FaaS-based (serverless) machine learning workflows (from now on
abbreviated as BPMN4sML). The addition of sML specific BPMN elements targets improved
modeling, analysis, visualization and communication of ML workflows in a standardized
manner, further increasing transparency of activities involved in a ML solution. Moreover,
the extension elements shall help in mapping and translating ML workflows to equivalent
artefacts of deployment models such that a deployment model can eventually be derived from
its preceding BPMN process model. In line with the requirement analysis, the extension
elements build on top of concepts from the ML lifecycle, incorporating phases within and
outside of ML pipelines, and on top of underlying concepts of the Function-as-a-Service
paradigm. Distinctive ML lifecycle and workflow elements can be identified with respect
to activities (i.e. BPMN tasks) alongside their inputs and outputs or produced artefacts
(i.e. data objects) as well as artefact repositories (i.e. data stores). Further, certain key
occurrences (i.e. events) can be generalized across ML workflow instances. In line with
Yousfi et al. [12], the added elements constitute a conservative extension, that is
to say an extension which neither alters nor contradicts the BPMN semantics of the OMG
standard.
In the following the extended BPMN metamodel is presented in section 5.1 with a sub-
sequent presentation of a corresponding notation, i.e. graphical elements that allow modeling
the added metamodel components, in section 5.2. The notation goes along with exemplary
modeling snippets to showcase usage and further validate applicability of the introduced
extension for specific ML workflow scenarios.
• Figure 5.1 introduces new tasks as extension to Activity belonging to the Flow Object
category
• Figure 5.2 extends the BPMN Event from the category Flow Object
• Figure 5.3 introduces the extension of DataObject and DataStore from the category
Data.
Figure 5.1: BPMN4sML metamodel activity (task) extension - new task types such as FaaSTask, OffloadedTask, JobConfigurationTask, TuningTask, ScoringTask, EvaluationTask and MonitoringTask derived from the BPMN ServiceTask.
All extended elements are fundamental to represent a ML workflow with varying flavors -
for example a fully FaaS-based pipeline, a hybrid pipeline that leverages job offloading or
a service that only focuses on how a deployed model is integrated with another business
process. Grey coloured elements highlight the extension whereas white coloured elements
are part of the BPMN standard metamodel.
A FaaSTask comes with three attributes: 1) platform of type String, 2) FaaSConfiguration of type String and 3) script of type String. The platform property is spe-
cified to account for differences in function configuration between FaaS-providers. As such
if a FaaSTask is mapped to an element of a deployment model, different verification checks
for its configuration can be run depending on the specified platform on which the serverless
function is deployed. The FaaSConfiguration refers to implementation details that define
the serverless function (for instance an identifier such as an AWS ARN number or allocated
memory). Finally, an optional script may be included (similar to the ScriptTask ) which
can be referenced if the serverless function needs to be freshly deployed and implemented.
The script can be used to define the business logic of the FaaSTask. A script may not be
necessary if the serverless function already is deployed and its logic is implemented.
An OffloadedTask comes with three attributes: 1) offloadingTechnology of type String,
2) MLPlatform of type String and similar to the FaaSTask a script of type String. The
offloadingTechnology takes from the implementation of the ServiceTask and is used to dif-
ferentiate on an abstract level between technology types, e.g. cloud or edge. In case of cloud
the remaining attributes MLPlatform and script may be defined. They allow specifying the
1) machine learning platform such as Azure Machine Learning or Amazon SageMaker which
can be directly included in a machine learning workflow instead of a serverless function and
2) a potential script which can hold configuration instructions of the offloaded job.
Next to the two overarching Tasks, different task types typical to the machine learning
domain are proposed as an extension to the standard. The identified tasks relate to the
requirement synthesis and its preceding analysis in section 4.3. The new task types can
inherit from either the FaaSTask or the OffloadedTask; this shows that the actual activity
can be realized as a serverless function or as an offloaded job. Thus, depending on the
modeled scenario different attributes may need to be specified. An exception is the JobCon-
figurationTask which only inherits from the FaaSTask and can precede an offloaded task to
generate any artefacts that are required to fulfill the job. Note that the DeploymentTask
defines environment as an optional attribute of type String. The attribute allows specifying
the target environment of the deployment activity and can take the form of a staging or a
production environment in which the verified machine learning model should be placed.
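Purely for illustration, the attributes described above can be pictured as the following Python dataclasses. This is an in-memory sketch, not the actual metamodel or XML Schema definition of BPMN4sML, and the single inheritance shown for DeploymentTask simplifies the metamodel, where the ML task types may specialize either FaaSTask or OffloadedTask.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FaaSTask:
    platform: str                      # e.g. "AWS", "Azure", "Google Cloud"
    faas_configuration: str            # e.g. an ARN or allocated memory settings
    script: Optional[str] = None       # business logic, only needed for fresh deployments

@dataclass
class OffloadedTask:
    offloading_technology: str         # abstract technology type, e.g. "cloud" or "edge"
    ml_platform: Optional[str] = None  # e.g. "Amazon SageMaker", "Azure Machine Learning"
    script: Optional[str] = None       # configuration instructions of the offloaded job

@dataclass
class DeploymentTask(FaaSTask):
    environment: Optional[str] = None  # e.g. "staging" or "production"

tune_job = OffloadedTask(offloading_technology="cloud", ml_platform="Amazon SageMaker")
```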
Figure 5.2: BPMN4sML metamodel event extension - new event definitions covering raw data, feature set and dataset updates, requirement changes, performance deficits, verification and verification failure, deployment, deprecation, data drift, concept drift and operational performance degradation, together with the item definitions they reference.
Figure 5.3: BPMN4sML metamodel data object & data store extension
Extended BPMN data artefacts relating to artefacts produced throughout the machine learning
lifecycle, particularly throughout ML pipelines, as well as to artefact repositories relevant to machine
learning in operation. The new artefacts inherit from the base BPMN DataObject or from DataStore.
They define properties thereby lowering the level of abstraction to accommodate ML and FaaS
characteristics.
DataObject Extension
In total, seven new elements inheriting from DataObject are proposed. Extensions to the
BPMN DataObject are necessary as a plain DataObject would otherwise remain at too high
a level of abstraction. Further, the use of a Property as an alternative ItemAwareElement is not
possible as only processes, activities and events, i.e. FlowElements, may be associated with
it [11, p.208], [16]. Consequently, lowering the level of abstraction can be achieved by creating
sub-classes of DataObject and defining the necessary attributes.
A MLModelObject represents the machine learning models created throughout the ML
workflow. It further specifies two attributes, an identifier of type String for explicit identi-
fication of the model artefact and an optional status of type String to allow differentiating
between a trained, verified, deployed or deprecated ML model.
Apart from the ML model, other core ML artefacts are accounted for by means of
MLDataObject. The extension element provides an identifier attribute of type String to
explicitly describe it. Further, a MLDataObject defines two other attributes, a dataObjectType
which can take the values RawData, FeatureSet or FullDataSet, and an optional dataSetType
which can take the values TrainingDataSet, ValidationDataSet, VerificationDataSet or InferenceRequestDataSet.
The dataSetType may be specified in case the dataObjectType is set to FullDataSet. A choice
has been made to account for the different types by means of an attribute in order to avoid
cluttering the metamodel.
Further, a CodeObject is defined with two attributes, 1) an identifier of type String and
2) an operation of type String. The operation attribute allows specifying the logic or set of
programming commands that the artefact holds.
Next, BPMN4sML proposes a LearningConfiguration with three attributes, 1) an identifier
of type String, 2) a configuration of type String and 3) a configType of type configurationType
which can take the values TrainingConfiguration, EvaluationConfiguration or TuningConfig-
uration. The configuration describes the actual training, evaluation or tuning configuration.
To further address the identified requirements a LogObject is introduced with a LogCon-
tent attribute of type String which holds the information and content of that log file. In
a similar vein, a MetadataObject is proposed alongside three attributes. An association of
type String relates the metadata artefact to the corresponding artefact that the metadata
pertain to. A location of type String holds the location of the related artefact. Optionally,
further information can be described via the description attribute of type String to provide
enough flexibility to account for the various information a metadata object may hold, as
elaborated by for instance Schelter et al. [110].
Finally, a Document artefact is proposed, inspired by the extension by Braun et al. [16].
Similar to the preceding extension elements, the Document is an extension of DataObject.
It holds three attributes. An identifier of type String as well as a documentContent of
type String which holds the information of the document. Further, a documentType allows
referencing a specific type. The different types relate to the various documents that are
relevant to the machine learning lifecycle or are created by ML tasks - they can be realised
as RequirementDocument, TuningResult, EvaluationResult, DeficitReport, VerificationRes-
ult, InferenceResult or Model & Data Statistics.
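Again purely as an illustrative sketch (not the metamodel definition itself), the attribute structure of two of these elements can be pictured with Python enums and dataclasses; the identifier values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class MLDataObjectType(Enum):
    RAW_DATA = "RawData"
    FEATURE_SET = "FeatureSet"
    FULL_DATASET = "FullDataSet"

class DataSetType(Enum):
    TRAINING = "TrainingDataSet"
    VALIDATION = "ValidationDataSet"
    VERIFICATION = "VerificationDataSet"
    INFERENCE_REQUEST = "InferenceRequestDataSet"

@dataclass
class MLModelObject:
    identifier: str
    status: Optional[str] = None  # e.g. "trained", "verified", "deployed", "deprecated"

@dataclass
class MLDataObject:
    identifier: str
    data_object_type: MLDataObjectType
    data_set_type: Optional[DataSetType] = None  # only meaningful for FullDataSet

train_set = MLDataObject("credit-default-train-v1",
                         MLDataObjectType.FULL_DATASET, DataSetType.TRAINING)
```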
DataStore Extension
Next to the DataObject extensions, DataStore extensions provide possibilities for modeling
artefact-specific storage. A ModelRegistry defines a repository to store MLModelObjects at.
It specifies two attributes of type String, a placement and a platform. A placement allows
differentiation between local, i.e. internally hosted, or external cloud-based model registries.
The platform further allows describing the cloud provider in case of an external cloud-based
storage solution. Two other types of persistent storage are defined with the same attribute
semantic. A LogStore allows modeling a repository to (or from) which LogObjects can be
written (read). A MetadataRepository allows to formally model a storage that can hold
artefacts pertaining to the ML workflow such as MetadataObjects, CodeObjects, Learning-
Configurations, or other Documents. Furthermore, a DataRepository extends the abstract
DataStore. Next to a placement and platform attribute, it allows specifying a repositoryType
which can take the values RawDataRepository, FeatureSetRepository or DataSetRepository.
While the metamodel extension does not specify an individual element for each data repos-
itory, the notation extension supports differentiation by means of different visual elements.
5.2.2 Activities
As explained in section 2.5, activities represent operations in a BPMN process diagram. Of
interest to this study is the Task type through which an atomic activity can be defined de-
scribing a specific action executed within the process flow [11, p.154ff]. The OMG standard
provides seven sub-classes to the Task type, namely Service, Send, Receive, User, Manual,
Business Rule and Script. Each of the sub-types constrains the generic Task type to define
the operation more precisely, addressing various scenarios. In the context of machine learn-
ing workflows two sub-classes are potential candidates for extension - the ScriptTask and
the ServiceTask. While the former allows specifying a script which can be equated with
the business logic of a serverless function, it is constrained to the execution environment of
business process engines and is thus not appropriate for the FaaS-based nature of the modeled
ML workflows. In contrast, a ServiceTask does not constrain the execution environment.
While it does not fully reflect the needs for the context of this research, it can be extended
to do so. Consequently, within BPMN4sML new task types as sub-classes of ServiceTask are
defined that express the extension for the FlowObject Activity in context of the previously
described metamodel in Figure 5.1. Each task comes with its own notation and follows a
set of guidelines that are elaborated in the remaining part of this section. For explanation
purposes, selected examples describing the application of the extension are given with some
of them being visually depicted. The examples are workflow fragments, highlighted by using
Link Intermediate events of type Catch or Throw; this is valid for all of the following examples.
Note that all tasks can produce MetadataObjects and LogObjects, which is
therefore omitted in the task description to highlight task-specific differences. Furthermore,
the respective produced and sourced artefacts implicitly imply a read or write operation to
the various data store solutions.
FaaSTask
A FaaSTask is a Task that represents a serverless function which is run on a cloud and
executes some logic. It can interact with DataObjects and DataStores and has typical FaaS
constraints, i.e. short running time, fixed memory, statelessness. A FaaSTask is triggered by
an event (e.g. a simple invocation, not to be confused with a BPMN event) which serves as
data input and is assumed to be in JSON format. Its content may be identifiers to relevant
data artefacts that the FaaSTask operates on. On a BPMN modeling level the triggering
event may either be explicitly declared via a BPMN event or implicitly by modeling arcs, e.g.
sequence flow between activities. After executing its logic, the FaaSTask is completed. It
can return information in JSON format, i.e. data outputs, and thereby produce events in the
serverless context. JSON has been selected because FaaS products (e.g. by Google Cloud,
Azure, AWS) and common programming languages support its interpretation.
A FaaSTask object shares the same shape as the Task, which is a rectangle that has rounded
corners. However, there is a FaaS icon in the upper right corner of the shape that indicates
that the Task is a FaaSTask, see Figure 5.6.
• Example: After a tuning job, a new tuning result is produced carrying information
about hyperparameter sets, their values and respective performance scores that the
corresponding models achieved. The result is propagated to a serverless function,
i.e. FaaSTask, triggering its invocation. The function identifies the best score and
the associated hyperparameter values and MLModelObject to reference it in the next
steps. This ML workflow fragment is illustrated within Figure 5.7.
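A minimal sketch of such a FaaSTask's business logic, written as a Lambda-style Python handler; the structure of the incoming tuning-result payload (field names such as tuning_result, hyperparameters, score and model_id) is an assumption for illustration.

```python
def handler(event, context=None):
    """Hypothetical FaaS function: pick the best configuration from a tuning result.

    'event' is the JSON payload produced by the preceding tuning job, assumed to
    contain a list of trials, each with hyperparameter values, a score and a model id.
    """
    trials = event["tuning_result"]
    best = max(trials, key=lambda trial: trial["score"])
    # Return JSON-serializable information to be referenced in the next steps.
    return {"best_hyperparameters": best["hyperparameters"],
            "best_model_id": best["model_id"],
            "best_score": best["score"]}

# Example invocation with a hypothetical payload.
print(handler({"tuning_result": [
    {"hyperparameters": {"max_depth": 3}, "score": 0.81, "model_id": "m-1"},
    {"hyperparameters": {"max_depth": 6}, "score": 0.86, "model_id": "m-2"},
]}))
```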
OffloadedTask
An OffloadedTask is a Task that represents an operation which is run outside of a serverless
function on an optimized hardware stack or on an edge device. The offloaded task executes
some logic. The logic can be a pre-defined configuration or handed to it for instance through
a CodeObject. Similar to a FaaSTask, in cloud-based scenarios the offloaded task may directly
be triggered by an event (from the serverless context) which serves as data input and return
a data output, thereby producing an event in the serverless system (not to be confused with
BPMN events). An OffloadedTask can but does not have to be resource constrained as a
FaaSTask is. After executing its logic, the OffloadedTask is completed. An OffloadedTask
object shares the same shape as the Task, which is a rectangle that has rounded corners.
However, there is a cloud-edge icon in the upper right corner of the shape that indicates that
the Task is an OffloadedTask, see Figure 5.6. An illustrative example is given in Figure 5.7.
JobConfigurationTask
In the context of BPMN4sML, a JobConfigurationTask represents a serverless function
used to prepare an offloaded task. This can manifest in the generation of a CodeObject that
specifies a script to run during the offloaded task, in the generation of a LearningConfigura-
tion, e.g. to specify a tuning set-up, or in the identification of devices on which the offloaded
job shall be executed. The generated artefacts may be directly propagated as files or as iden-
tifiers to those files (stored in a MetadataRepository). A JobConfigurationTask may precede
a task that inherits from OffloadedTask. Alternatively, it may define the entire operation
through the CodeObject. A JobConfigurationTask object shares the same shape as the Task,
which is a rectangle that has rounded corners. However, there is a configuration icon in the
upper left corner of the shape that indicates that the Task is a JobConfigurationTask, see
Figure 5.6.
• Example: This example describes a use case for a JobConfigurationTask, an of-
floaded TuningTask and a JobOffloadEvent. For an XGBoost algorithm, a complex
tuning job defined by a large GridSearch and 10-fold nested cross-validation needs to
be executed to identify an appropriate set of hyperparameters. To mitigate computa-
tional resource constraints, the tuning job is directly run on AWS SageMaker which
serves as a machine learning platform. The tuning strategy may be directly specified
within the configuration script in JSON format as part of the OffloadedTask. Altern-
atively for this use case, a JobConfigurationTask defines certain aspects of the tuning
configuration, such as which hyperparameter sets to try out, dynamically prior to ex-
ecuting the offloaded task. The configuration is then propagated to the TuningTask.
A JobOffloadEvent is triggered once the tuning job finishes and the process continues.
In this scenario, the produced TuningResult is not lost due to the offloaded tuning
task not being stateless. Subsequent tasks may therefore access and analyse the Tun-
ingResult without it being explicitly written to a MetadataRepository. This workflow
fragment is illustrated in Figure 5.7.
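The configuration step of this example could look roughly as follows: a serverless function assembles the tuning configuration as JSON before it is propagated to the offloaded TuningTask. The hyperparameter names, grid values and payload field names are hypothetical.

```python
import json

def configure_tuning_job(event, context=None):
    """Hypothetical JobConfigurationTask logic: assemble the hyperparameter sets and
    resampling strategy for an offloaded XGBoost tuning job."""
    config = {
        "configType": "TuningConfiguration",
        "algorithm": "xgboost",
        "search_strategy": "grid",
        "resampling": {"strategy": "nested-cv", "outer_folds": 10, "inner_folds": 5},
        "hyperparameter_grid": {
            "max_depth": [3, 6, 9],
            "eta": [0.01, 0.1, 0.3],
            "subsample": [0.7, 1.0],
        },
        "training_dataset_id": event["training_dataset_id"],
    }
    # The configuration is returned as JSON and propagated to the TuningTask
    # (or stored in a MetadataRepository and referenced by identifier).
    return json.dumps(config)

print(configure_tuning_job({"training_dataset_id": "credit-default-train-v1"}))
```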
Note that the following tasks can be realized as FaaSTask or as OffloadedTask. Their
notation allows differentiating between the two by having either the FaaSTask or the Of-
floadedTask icon in the upper right corner of the element.
Figure 5.7: ML workflow fragment combining a JobConfigurationTask ('Configure Hyper Parameter Sets') within the FaaS-based coordinator, an offloaded TuningTask ('Tune XGBoost [SageMaker]') reading the training data from a DatasetRepository, a 'Tuning Job Finished' event and a FaaSTask ('Retrieve best Hyperparameter Values') consuming the TuningResults.
DataSourcingTask
A DataSourcingTask represents an operation to retrieve raw data from a raw data pro-
vider, potentially transform it and write it to a RawDataRepository. The data collected and
written to the repository is a MLRawDataObject. The task can propagate the identifier to
the new MLRawDataObject. In case the sourced data is already processed it can be written
as a MLFeatureSetObject to a FeatureSetRepository. This task is often the initial step of
a machine learning workflow. A DataSourcingTask object shares the same shape as
the Task, which is a rectangle that has rounded corners. However, there is a sourcing (ETL)
icon in the upper left corner of the shape that indicates that the Task is a DataSourcingTask,
see Figure 5.6.
• Example: For a hypothetical stock trading bot the latest stock information needs to
be regularly sourced. Every evening, a DataSourcingTask is executed to retrieve the
stock data of that day in order to update a machine learning model for the next day.
DataValidationTask
A DataValidationTask is a Task that examines if anomalies start occurring in a MLFea-
tureSetObject or MLDatasetObject. Anomalies may be the disappearance of features or a
strong change in the data distribution. The new feature sets or datasets are compared against
existing schemata and known data statistics which can be sourced from the MetadataRepos-
itory to realize the validation activity. A DataValidationTask object shares the same shape
as the Task, which is a rectangle that has rounded corners. However, there is an additional
icon representing data validation in the upper left corner of the shape that indicates that
the Task is a DataValidationTask, see Figure 5.6.
• Example: Elaborating on the previous example, prior to considering the retrieved
stock data as input for model training, it is ensured that no drastic change in the data
statistics occurred - for instance, in case of a crash of a specific stock or a part of the
stock market, the existing machine learning model may no longer be applicable.
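A simplified sketch of the validation logic, assuming the known statistics have already been read from the MetadataRepository; the feature name, tolerance threshold and statistics are illustrative.

```python
def validate_features(feature_set: dict, known_stats: dict, tolerance: float = 3.0) -> list:
    """Compare a feature set (feature name -> list of values) against known statistics
    (feature name -> {"mean": ..., "std": ...}) and return a list of anomalies."""
    anomalies = []
    for name, stats in known_stats.items():
        values = feature_set.get(name)
        if not values:                                # feature disappeared
            anomalies.append(f"missing feature: {name}")
            continue
        mean = sum(values) / len(values)
        # Flag a strong shift of the mean relative to the known distribution.
        if abs(mean - stats["mean"]) > tolerance * stats["std"]:
            anomalies.append(f"distribution shift in feature: {name}")
    return anomalies

known = {"daily_return": {"mean": 0.0, "std": 0.02}}
print(validate_features({"daily_return": [-0.30, -0.25, -0.28]}, known))
```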
DataFusionTask
A DataFusionTask is a Task that fuses data or features originating from different data
providers but specifying the same phenomenon. It can operate on MLRawDataObjects as well as MLFeatureSetObjects.
• Example: A radar and a camera capture separate observations of the same car driving
through a street. To create a unified MLFeatureSetObject the two data artefacts need
to be fused by correctly associating the observations with each other. The unified
MLFeatureSetObject is subsequently written to the FeatureRepository.
PreprocessingTask
A PreprocessingTask is a Task that performs various data processing operations on a
MLDataObject such as data cleaning or imputation of missing observations. It can operate
on exactly one MLDataObject and writes the processed MLDataObject to the respective
FeatureRepository or DatasetRepository. A PreprocessingTask object shares the same shape
as the Task, which is a rectangle that has rounded corners. However, there is an additional
icon representing data processing in the upper left corner of the shape that indicates that the
Task is a PreprocessingTask, see Figure 5.6. An illustrative example is given in Figure 5.8.
FeatureEngineeringTask
A FeatureEngineeringTask is a Task that creates structural changes on a MLFeature-
SetObject or a MLDatasetObject. For instance this can be the creation of new features
through the numerical combination of existing ones (e.g. creating group averages) or selec-
tion of a subset of features relevant to the model learning activities. It can operate on exactly
one MLDataObject and writes the processed MLDataObject to the respective FeatureRepos-
itory or DatasetRepository. A FeatureEngineeringTask object shares the same shape as the
Task, which is a rectangle that has rounded corners. However, there is an additional icon
representing feature engineering in the upper left corner of the shape that indicates that the
Task is a FeatureEngineeringTask, see Figure 5.6.
FeatureEnrichmentTask
A FeatureEnrichmentTask is a Task that combines several MLFeatureSetObjects or MLData-
setObjects into one overall MLDatasetObject to improve the predictive signal. The enriched
data artefact is then written to the respective DataRepository. It differs from the previously
introduced PreprocessingTask and FeatureEngineeringTask as it requires several MLDataOb-
jects. In contrast to the DataFusionTask, a FeatureEnrichmentTask operates on features and
datasets of related but different observations. A FeatureEnrichmentTask object shares the
same shape as the Task, which is a rectangle that has rounded corners. However, there is
an additional icon representing feature enrichment in the upper left corner of the shape that
indicates that the Task is a FeatureEnrichmentTask, see Figure 5.6.
• Example: A ML model within a recommender system of a fashion online shop predicts
potential products that a site visitor is interested in based on already visited products.
If the site visitor is a regular user who can be identified, the dataset used to train the
ML model with may be enriched with a feature on the shopping history of users to
create a more customized model.
DataSplitTask
A DataSplitTask is a Task that can split a MLDatasetObject into several MLDataObjects
such as a TrainDatasetObject, VerificationDatasetObject and a ValidationDatasetObject. Not
all three splits must be realized, and the splits can be further customized. The data artefacts are read
from and written to a DatasetRepository which the task has a unique connection to. A
DataSplitTask object shares the same shape as the Task, which is a rectangle that has
rounded corners. However, there is an additional icon representing a data split in the upper
left corner of the shape that indicates that the Task is a DataSplitTask, see Figure 5.6. An
illustrative example is given in Figure 5.8.
TrainingTask
A TrainingTask is a Task that trains a machine learning algorithm to produce a MLMode-
lObject which may be written to the ModelRegistry. The TrainingTask requires exactly one
connection to a TrainDatasetObject which can be sourced from the DatasetRepository and
an optional one to a TrainingConfiguration sourced from a MetadataRepository (or directly
handed to the task as JSON). A TrainingTask object shares the same shape as the Task,
which is a rectangle that has rounded corners. However, there is an additional icon repres-
enting ML model training in the upper left corner of the shape that indicates that the Task
is a TrainingTask, see Figure 5.6. An illustrative example is given in Figure 5.8.
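A hedged sketch of a TrainingTask's logic, assuming scikit-learn and simplifying the ModelRegistry to a local pickle file; the configuration format and the registry path are illustrative assumptions.

```python
import json
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def training_task(train_X, train_y, training_configuration: str = "{}",
                  model_registry_path: str = "credit-default-rf.pkl"):
    """Hypothetical TrainingTask: fit a model according to an optional JSON
    configuration and persist the resulting MLModelObject to a model registry
    (simplified here to a local pickle file)."""
    config = json.loads(training_configuration)
    model = RandomForestClassifier(**config.get("hyperparameters", {}), random_state=0)
    model.fit(train_X, train_y)
    with open(model_registry_path, "wb") as f:
        pickle.dump(model, f)
    # Return an identifier so downstream tasks (e.g. verification) can locate the model.
    return {"model_id": model_registry_path, "status": "trained"}

X, y = make_classification(n_samples=200, n_features=6, random_state=0)
result = training_task(X, y, '{"hyperparameters": {"n_estimators": 50, "max_depth": 4}}')
```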
ScoringTask
A ScoringTask is a Task that scores a single trained MLModelObject by means of a scor-
ing metric and a ValidationDatasetObject. The scoring metric can be hard-coded as part
of the task or sourced from a Document of type RequirementDocument. The ScoringTask
requires exactly one connection to a ValidationDatasetObject sourced from a DatasetRepos-
itory. A ScoringTask produces a performance score for a MLModelObject. Alternatively,
if the learning job continues to not meet the requirements, a DeficitReport may be pro-
duced. A ScoringTask object shares the same shape as the Task, which is a rectangle that
has rounded corners. However, there is an additional icon representing model scoring in the
upper left corner of the shape that indicates that the Task is a ScoringTask, see Figure 5.6.
EvaluationTask
An EvaluationTask is a Task that evaluates the potential fit of an algorithm for a ML
problem. The task requires exactly one connection to a TrainDatasetObject. The resampling
strategy and evaluation parameters can either be hard-coded as part of the task or sourced
from an EvaluationConfiguration. An EvaluationTask produces an EvaluationResult, e.g. a
robust performance estimate of the algorithm, and optionally writes the ML models produced
during resampling to a ModelRegistry (or in case of an offloaded job, stores the models in that
environment). Alternatively, if the evaluation indicates that the algorithm underperforms,
a DeficitReport may be produced. An EvaluationTask object shares the same shape as the
Task, which is a rectangle that has rounded corners. However, there is an additional icon
representing model evaluation in the upper left corner of the shape that indicates that the
Task is an EvaluationTask, see Figure 5.6.
[Figure: illustrative workflow fragment; a Hold-out Split on the Credit Default Dataset, a Balance TrainSet task, a Train RF model task and a Score RF Model task, connected to a DatasetRepository, a MetadataRepository (Cost Metric), a ModelRegistry (trained RF Model) and a ValidationSet.]
• Example: Various algorithms are considered to solve a given ML problem. For each
algorithm an EvaluationTask is executed to robustly estimate the fit of that algorithm
for the ML task. The most promising algorithm is trained once more on the entirety
of the training dataset before ultimately verifying it.
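The comparison described in this example could, for instance, use cross-validated scoring as its resampling strategy. The following sketch assumes scikit-learn; the candidate algorithms and the scoring metric are illustrative choices, not prescribed by the extension.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def evaluate_candidates(X, y):
    """Hypothetical EvaluationTask body: robust performance estimate per candidate algorithm."""
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=200),
    }
    evaluation_result = {}
    for name, estimator in candidates.items():
        # 5-fold cross validation as the resampling strategy (EvaluationConfiguration).
        scores = cross_val_score(estimator, X, y, cv=5, scoring="roc_auc")
        evaluation_result[name] = {"mean_auc": float(scores.mean()), "std_auc": float(scores.std())}
    # The EvaluationResult could be written to a MetadataRepository; a DeficitReport could be
    # produced instead if no candidate reaches a required threshold.
    return evaluation_result
```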
TuningTask
A TuningTask is a Task that tunes a machine learning algorithm. The task requires
exactly one connection to a TrainDatasetObject which can be sourced from the DatasetRe-
pository and an optional one to a TuningConfiguration sourced from a MetadataRepository
(or directly handed to the task as JSON). A TuningTask produces a TuningResult that can
be stored in a MetadataRepository if necessary and optionally writes the tuned model(s) to
a ModelRegistry (or in case of an offloaded job, stores the model(s) in that environment). A
TuningTask object shares the same shape as the Task, which is a rectangle that has rounded
corners. However, there is an additional icon representing ML model tuning in the upper
left corner of the shape that indicates that the Task is a TuningTask, see Figure 5.6. An
illustrative example is given in Figure 5.7.
TransferLearningTask
A TransferLearningTask is a Task that re-learns an existing MLModelObject sourced
from the ModelRegistry on a new but related TrainDatasetObject sourced from the Data-
setRepository. The task produces a new MLModelObject which can be written to the Mod-
elRegistry. A TransferLearningTask object shares the same shape as the Task, which is a
rectangle that has rounded corners. However, there is an additional icon representing model
transfer in the upper left corner of the shape that indicates that the Task is a Transfer-
LearningTask, see Figure 5.6.
• Example: A city wants to integrate a machine learning model in their smart traffic
control system. The model directly extracts the letters on a license plate of speeding
cars to automatically identify and fine the owner. To avoid spending computational
resources on training a sophisticated model from scratch, an existing one shall be lever-
aged. For this task, convolutional neural networks are ideal solution candidates and
consequently a TransferLearningTask is picked to model the respective ML workflow.
A pre-trained deep convolutional neural network is sourced from the ModelRegistry.
Its first few layers are frozen whereas the last layers can be updated to associate the
patterns with the numbers of the license plate. Once updated, the new MLModelObject
is stored in the model registry from where other services and components can access
it.
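The layer-freezing step mentioned in this example could, in one possible implementation, look roughly as follows; the sketch assumes TensorFlow/Keras and uses a public pre-trained backbone as a stand-in for the MLModelObject sourced from the ModelRegistry, with the number of output classes chosen purely for illustration.

```python
import tensorflow as tf

# A pre-trained backbone stands in for the MLModelObject sourced from the ModelRegistry.
base = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3), pooling="avg"
)
base.trainable = False  # freeze the first layers; only the new head will be updated

# New classification head, e.g. for 36 alphanumeric license plate characters (illustrative).
model = tf.keras.Sequential([base, tf.keras.layers.Dense(36, activation="softmax")])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(...) would then re-learn the head on the new but related TrainDatasetObject,
# and the resulting MLModelObject would be written back to the ModelRegistry.
```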
VotingTask
A VotingTask is a Task that conducts a consensus operation. The task requires the InferenceResults of at least two MLModelObjects. It weighs each inference result
according to a pre-defined schema or method (e.g. a majority vote) and produces a new
InferenceResult. A VotingTask object shares the same shape as the Task, which is a rectangle
that has rounded corners. However, there is an additional icon representing a consensus
operation in the upper left corner of the shape that indicates that the Task is a VotingTask,
see Figure 5.6.
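A simple weighing schema such as the majority vote mentioned above could be realized as sketched below in plain Python; the (label, weight) interface of the inference results is an illustrative assumption, not part of the metamodel.

```python
from collections import defaultdict


def majority_vote(inference_results):
    """Hypothetical VotingTask body: weighted consensus over at least two InferenceResults.

    `inference_results` is assumed to be a list of (predicted_label, weight) tuples,
    one per contributing MLModelObject.
    """
    if len(inference_results) < 2:
        raise ValueError("A VotingTask requires the InferenceResults of at least two models.")
    tally = defaultdict(float)
    for label, weight in inference_results:
        tally[label] += weight
    # The new InferenceResult is the label with the highest accumulated weight.
    return max(tally, key=tally.get)


# Two equally weighted models vote 'default', one votes 'no default'.
print(majority_vote([("default", 1.0), ("default", 1.0), ("no default", 1.0)]))  # -> 'default'
```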
VerificationTask
A VerificationTask is a Task that verifies whether a trained MLModelObject complies with all constraints of a RequirementDocument. The constraints can either be embedded within the task or handed to it via the RequirementDocument. It differs from the EvaluationTask as it sources a previously unseen VerificationDatasetObject to test the model with. It can produce
a VerificationResult in case the constraints are not met. Otherwise, it may modify the status
of a MLModelObject to verified. A VerificationTask object shares the same shape as the
Task, which is a rectangle that has rounded corners. However, there is an additional icon
representing a verification operation in the upper left corner of the shape that indicates that
the Task is a VerificationTask, see Figure 5.6.
DeploymentTask
A DeploymentTask is a Task that deploys a MLModelObject in the specified environment
to make it accessible for inference jobs. It may modify the status of a MLModelObject to
deployed. The deployment operation may send the MLModelObject as a file to an endpoint
or write it to a specific directory within the ModelRegistry and provide its identifiable path.
A DeploymentTask object shares the same shape as the Task, which is a rectangle that has
rounded corners. However, there is an additional icon representing a deployment operation
in the upper left corner of the shape that indicates that the Task is a DeploymentTask, see
Figure 5.6.
DeprecationTask
A DeprecationTask is a Task that retires a deployed MLModelObject and removes it from
its accessible endpoint. It may modify the status of a MLModelObject to deprecated. Usually,
a DeprecationTask is executed in response to the presence of a new, better-performing model. The
deprecated MLModelObject can be archived in the ModelRegistry. A DeprecationTask object
shares the same shape as the Task, which is a rectangle that has rounded corners. However,
there is an additional icon representing a deprecation operation in the upper left corner of
the shape that indicates that the Task is a DeprecationTask, see Figure 5.6.
InferenceTask
An InferenceTask is a Task that generates a prediction, i.e. InferenceResult, by accessing
a deployed MLModelObject and using it to run its prediction capability on the InferenceRe-
questDataset. In FaaS-based systems the MLModelObject is loaded from the ModelRegistry
into the serverless function to compute the prediction. In case of offloaded jobs, the MLM-
odelObject may not have to be loaded and can be directly accessed, i.e. no connection to
a ModelRegistry is required. An InferenceTask object shares the same shape as the Task,
which is a rectangle that has rounded corners. However, there is an additional icon repres-
enting an inference operation in the upper left corner of the shape that indicates that the
Task is an InferenceTask, see Figure 5.6.
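For the FaaS-based case, the following hedged sketch shows how an InferenceTask handler might load the deployed MLModelObject from the ModelRegistry and score the InferenceRequestDataset; the boto3 and joblib calls are standard, but all bucket, key and payload names are assumptions.

```python
import io

import boto3
import joblib
import pandas as pd

s3 = boto3.client("s3")
_model = None  # cached across warm invocations of the serverless function


def handler(event, context):
    """Hypothetical InferenceTask: load the deployed MLModelObject and return an InferenceResult."""
    global _model
    if _model is None:
        # Load the deployed MLModelObject from the ModelRegistry (here: an S3 bucket).
        obj = s3.get_object(Bucket="model-registry", Key="models/rf/deployed.joblib")
        _model = joblib.load(io.BytesIO(obj["Body"].read()))

    request = pd.DataFrame(event["inference_request_dataset"])  # InferenceRequestDataset as JSON records
    return {"inference_result": _model.predict(request).tolist()}
```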
ModelSelectionTask
A ModelSelectionTask is a Task that chooses between several MLModelObjects based on
some criteria in order to select the most appropriate one to fulfil the prediction task. It then
propagates the request to the corresponding ML model endpoint or in case of a purely FaaS-
based solution, it may propagate the request alongside the selected MLModelObject identifier
to the next task. A ModelSelectionTask object shares the same shape as the Task, which
is a rectangle that has rounded corners. However, there is an additional icon representing
a selection operation in the upper left corner of the shape that indicates that the Task is a
ModelSelectionTask, see Figure 5.6.
• Example: A complex ML anomaly detection system provides specialized machine
learning models for certain anomaly categories. The categories have been formed
by means of an unsupervised clustering algorithm. Prior to executing the Inferen-
ceTask, the InferenceRequest is assigned to a cluster. The ModelSelectionTask then
identifies the corresponding MLModelObject (or endpoint) and propagates the Infer-
enceRequestDataset accordingly after which the InferenceTask is executed.
ExplanationTask
An ExplanationTask is a Task that justifies, i.e. explains, a decision for a specific pre-
diction, i.e. InferenceResult. It thus focuses on local explainability of a MLModelObject.
Depending on the implemented method the task may source the MLModelObject that gen-
erated the InferenceResult alongside the InferenceRequestDataset. The ExplanationTask
produces a Document of type ModelExplanation. An ExplanationTask object shares the
same shape as the Task, which is a rectangle that has rounded corners. However, there is an
additional icon representing a model explanation operation in the upper left corner of the
shape that indicates that the Task is an ExplanationTask, see Figure 5.6.
MonitoringTask
A MonitoringTask is a Task that can investigate various LogObjects and MetadataOb-
jects which are sourced from the LogRepository and MetadataRepository. Depending on
its implementation, the task may analyse data and model statistics to identify for instance
prospective data drift or concept drift. It can produce new Model & Data Statistics. Addi-
tionally, it is the main activity to trigger subsequent events relating to critical ML workflow
situations, i.e. data or concept drift. A MonitoringTask object shares the same shape as
the Task, which is a rectangle that has rounded corners. However, there is an additional
icon representing a monitoring operation in the upper left corner of the shape that indicates
that the Task is a MonitoringTask, see Figure 5.6.
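As one possible heuristic for how a MonitoringTask might detect prospective data drift, the sketch below compares feature distributions of the training data against recent serving data with a two-sample Kolmogorov-Smirnov test; it assumes SciPy and pandas, and the significance threshold is an illustrative choice.

```python
from scipy.stats import ks_2samp


def detect_data_drift(train_df, serving_df, p_threshold=0.01):
    """Hypothetical MonitoringTask check: flag features whose distribution appears to have shifted.

    `train_df` and `serving_df` are pandas DataFrames reconstructed from LogObjects and
    MetadataObjects; a low p-value suggests the serving data no longer matches the training data.
    """
    drifted = []
    for column in train_df.columns:
        result = ks_2samp(train_df[column], serving_df[column])
        if result.pvalue < p_threshold:
            drifted.append({"feature": column,
                            "statistic": float(result.statistic),
                            "p_value": float(result.pvalue)})
    # A non-empty result would be written as new Model & Data Statistics and could
    # trigger a subsequent DataDriftEvent.
    return drifted
```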
5.2.3 Events
As explained in section 2.5 events can be differentiated into Start, Intermediate and End
events. They can further be classified into Catch and Throw events, the former catching a
trigger and the latter throwing it. Start and Intermediate Catch events are of type Catch
whereas Intermediate Throw and End events are of type Throw [11, p.233]. The OMG
standard comes with twelve events defined via their EventDefinition, with the exception of a
NoneEvent [11, p.238ff]. The proposed BPMN4sML extension adds fifteen new events to
the BPMN 2.0.2 standard (see Figure 5.2). The corresponding notation is depicted in
Figures 5.9 and 5.10.
DataSourceEvent
The DataSourceEvent is an Event triggered by the presence of new raw data detected in
the environment of the raw data provider. It is fired upon detection of the raw data. This
event can be of type Start Event and Intermediate Catch.
[Figures 5.9 and 5.10: notation of the new events, labelled Source, Dataset Update, Requirement Update, Data Drift, Concept Drift, Performance Deficit, Verification, Verification Failure, Deployment, Deprecation, Inference, Operation Degradation and Job Offloading.]
[Figure: feature enrichment example; a FeatureEnrichmentTask combines the Credit Default Dataset from the DatasetRepository with the Customer Default Features and Customer Spending Behaviour feature sets from the FeatureSetRepository, triggered by customer feature set updated and credit default dataset updated events.]
• Example: In an online mobile game a new match has just concluded. Data on
each team, their actions taken, the final score as well as system information become
available, triggering the DataSourceEvent. Subsequently, the DataSourcingTask is
automatically run to ingest the raw data into the RawDataRepository.
RawDataUpdateEvent
The RawDataUpdateEvent is an Event triggered by an update of a RawDataObject in
the RawDataRepository. It is fired as soon as the new raw data file is written to the storage
and its identifier becomes available. This event can be of type Start Event and Intermediate
Catch.
FeatureSetUpdateEvent
The FeatureSetUpdateEvent is an Event triggered by an update of a FeatureSetObject in
the FeatureSetRepository. It is fired as soon as the new feature set is written to the storage
and its identifier becomes available. This event can be of type Start Event and Intermediate
Catch.
DatasetUpdateEvent
The DatasetUpdateEvent is an Event triggered by an update of a DatasetObject in the
DatasetRepository. It is fired as soon as the new dataset is written to the storage and its
identifier becomes available. This event can be of type Start Event and Intermediate Catch.
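In an event-driven FaaS realization, a DatasetUpdateEvent could correspond to an object-created notification emitted by the DatasetRepository. The sketch below shows a handler catching such a notification and extracting the dataset identifier; it assumes the standard S3 event notification structure, and the downstream hand-over is only indicated in comments.

```python
def handler(event, context):
    """Hypothetical catch of a DatasetUpdateEvent realized as an S3 object-created notification."""
    caught = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]  # the DatasetRepository
        key = record["s3"]["object"]["key"]      # identifier of the updated DatasetObject
        # The identifier would be handed to the subsequent task of the ML workflow,
        # e.g. by starting a state machine or invoking the next function (not shown).
        caught.append(f"s3://{bucket}/{key}")
    return {"datasets_updated": caught}
```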
RequirementUpdateEvent
The RequirementUpdateEvent is an Event triggered by a change or update of a Re-
quirementDocument in the MetadataRepository. It is fired as soon as the new requirement
document presents itself in the process environment. This event can be of type Start Event
and Intermediate Catch.
DataDriftEvent
The DataDriftEvent is an Event triggered by a detected increasing skew between the
MLDatasetObject used for learning and verifying the model and the InferenceRequestData-
setObject, i.e. the data that the deployed ML solution receives for prediction. Typically,
in consequence of a data drift the deployed model needs to be re-learned (i.e. trained or
tuned) on the latest available TrainDatasetObject. Intervention of a domain expert may be necessary to update the data preparation tasks. The DataDriftEvent can be of type Start
as well as of type Intermediate (throw and catch).
• Example: An audio streaming media service provider trained a machine learning model to suggest songs to users based on songs that other users, who liked similar songs, listened to. After expanding its user base, a change in the user demographic occurs and more senior users access the services. The song recommendation no longer accounts for the new demographic. A DataDriftEvent catches this change. Subsequently, information is sent to a domain expert to facilitate their intervention. Note that the
event is of type Start as the data drift initialises the process, see Figure 5.12.
ConceptDriftEvent
The ConceptDriftEvent is an Event triggered by a detected changing relationship between
the explanatory variables and the target variable used for a MLModelObject in a supervised
learning setting. Typically, in consequence of a concept drift the referenced TrainDatasetO-
bject needs to be updated and the model needs to be re-learned (i.e. trained or tuned). The
ConceptDriftEvent can be of type Start as well as of type Intermediate (throw and catch).
• Example: A monitoring service detects a concept drift which subsequently triggers
an intermediate throwing ConceptDriftEvent. The corresponding catch event registers
the concept drift and initializes subsequent activities.
PerformanceDeficitEvent
The PerformanceDeficitEvent is an Event triggered by a generated PerformanceDeficit
Document that informs on what happened during the model learning phase that led to
no model meeting the requirement constraints. The event can re-route an automated ML
workflow to allow for intervention of a domain expert. The PerformanceDeficitEvent is an
Intermediate event of type Catch.
• Example: After an extensive model tuning operation, the ML pipeline is still unable
to create a good-enough ML solution. Consequently, a PerformanceDeficitReport is
generated which is caught by a PerformanceDeficitEvent and propagated to a domain
expert to request for intervention. Note that in this case the PerformanceDeficitRe-
port does not have to be explicitly connected to the PerformanceDeficitEvent as it is
fired once the report presents itself in the ML workflow. The snippet containing the
PerformanceDeficitEvent is illustrated in Figure 5.13.
VerificationEvent
The VerificationEvent is an Event triggered by the presence of a newly trained MLModelObject in a ModelRegistry. It can initialize a VerificationTask or request confirmation by a domain expert. The VerificationEvent can be a Start event or an Intermediate
event of type Catch.
• Example: A trained RandomForest model is written to the ModelRegistry for sub-
sequent access. Once registered a notification is produced informing about the exist-
ence of the new trained RF model. A VerificationEvent catches the trained model ID
and initializes a VerificationTask to fire-proof the MLModelObject prior to its deploy-
ment.
VerificationFailureEvent
The VerificationFailureEvent is an Event triggered by a generated VerificationResult
Document that informs on why a ML model verification by means of a VerificationTask
was unsuccessful. The event fires as soon as a new VerificationResult presents itself in the
process environment. The VerificationFailureEvent is an Intermediate event of type Catch.
• Example: A tuned RandomForest model performed well throughout the ML learning
phase and complied with the defined requirements when evaluating it on the Valid-
ationDatasetObject. After it is written to the ModelRegistry, the VerificationTask is conducted, referencing a VerificationDatasetObject that contains a data sample on which the RF model no longer performs sufficiently well. Consequently, a VerificationDoc-
ument is generated and caught by a VerificationFailureEvent. The report is then
propagated as a Message to a domain expert similar to the previous example of the
PerformanceDeficitEvent.
DeploymentEvent
The DeploymentEvent is an Event that can be triggered by information of the ModelRe-
gistry such as the presence of a newly verified MLModelObject. Once the ID of the
verified model is available the event is fired. The DeploymentEvent can be of type Start as
well as of type Intermediate (Throw and Catch).
DeprecationEvent
The DeprecationEvent is an Event that can be triggered as response to information cap-
tured throughout the ML lifecycle. Once the identifier of the deployed MLModelObject is
received, the event is fired. It can be used to initialize a DeprecationTask. The Depreca-
tionEvent can be of type Start as well as of type Intermediate (Throw and Catch).
[Figure 5.14: inference example; an InferenceRequestDataset (user information) is processed by a Predict Products task that accesses the deployed Model from the ModelRegistry and the MetadataRepository to produce a Product Prediction as InferenceResult.]
InferenceEvent
The InferenceEvent is an Event triggered by the presence of a new InferenceRequestData-
Set. The InferenceEvent can be a Start event or an Intermediate event of type Catch.
• Example: A user arrives on the front page of an online web-shop automatically
producing a set of user specific datapoints. The InferenceEvent captures the new In-
ferenceRequestDataSet and initializes the InferenceTask of the recommendation engine
to showcase products that the user is most likely to be interested in. Note that an
Inference Start Event is used since the prediction process starts with the reception of
the InferenceRequestDataSet. The example is depicted in Figure 5.14.
OperationDegradationEvent
The OperationDegradationEvent is an Event triggered by detected operational under-
performance of a deployed MLModelObject. The event can be of type Start as well as of
type Intermediate (throw and catch).
JobOffloadingEvent
The JobOffloadingEvent is an Event triggered by JobData specifying information about
an offloaded job. The JobOffloadingEvent is an Intermediate event of type Catch.
Chapter 6
Conceptual Mapping for BPMN4sML to TOSCA Conversion
The proposed BPMN4sML extension enables end-users to design serverless machine learn-
ing workflows based on an accepted process modeling standard. This helps in realizing the
main objective of standardized modeling, analysis and communication of ML workflows in
a technology independent and interoperable manner. To now also facilitate the serverless
deployment orchestration of generated BPMN4sML diagrams in a technology-agnostic man-
ner, TOSCA as an OASIS standard can be leveraged. As elaborated upon in section 2.6.2,
TOSCA enables creation, automated deployment and management of portable cloud ap-
plications and supports declarative deployment modeling [103]. Further, recent TOSCA
extensions form first steps towards support for FaaS-based applications and their choreo-
graphy [17] as well as limited support for function orchestration [9]. A TOSCA deployment
model in form of a service template realizes representation of service components as nodes
as well as their relationships and configuration. Respective semantics of both nodes and
relationships further define functionality, i. e. attributes, properties, requirements and cap-
abilities and corresponding interfaces. Consequently, TOSCA is an ideal candidate solution
that in combination with an orchestrator such as xOpera allows to execute and deploy
modeled topologies on specified resources. The missing link between BPMN4sML model
diagrams and TOSCA deployment models is a mapping on how to relate the BPMN4sML
tasks, artefacts, events and control flow to corresponding TOSCA elements.
To provide orientation, section 6.1 aggregates the entire process as a nominal sequence of
activities starting with serverless ML workflow modeling using BPMN4sML and arriving at
a deployment model realized by TOSCA. Following, section 6.2 elaborates upon the choices
made to realize respective mapping rules towards a potential conceptual model-to-model
mapping. Subsequently, section 6.3 presents the identified mapping.
Note that the scope and functionality of the conceptual mapping and deployment modeling do not cover the full and automated conversion from BPMN4sML to TOSCA service templates. First, certain architectural design choices for deployment need to be specified by the modeler. Second, currently available TOSCA node types and relationship types require extension for different kinds of event flow logic, function chaining and representation of offloaded services (ML platforms). This exceeds the scope and time of this work and thus only allows showcasing the mapping and transformation on a conceptual level rather than a fully implemented one for BPMN4sML models, which constitutes both a limitation and an opportunity for future research.
Figure 6.1: Method for technology independent and interoperable modeling of ML workflows & their serverless deployment orchestration
A conceptual mapping first requires identifying the elements of the source and target which are to be related. Mappings that are built on the metamodel level of both
languages can then be applied to the model level to transform actual model instances from
source to target [147].
To arrive at a model to model transformation one may build conceptual mappings and
subsequently define transformation rules that select which mapping to apply for a given case
by either analysing and relating the respective elements of each modeling language manually
or through pre-defined detecting measurements [146]. Wang et al. propose an automated
detection approach based on semantic and syntactic checks to derive model mappings that
are built among the models’ elements and the group of properties they contain [147]. Between
models each possible pair and combination is iterated over so that the association (pair of
elements) with the highest identified match (semantic & syntactic value) can be considered
a mapping. Adopting such an approach or the one brought forward by Jouault et al. [144] exceeds the bounds of this thesis. Instead, the former, i.e. a manual analysis between elements and properties of each metamodel, is applied to generate a conceptual mapping.
To do so, the syntax, describing the language’s rules and relationships between available
elements, as well as the semantics that establish their meaning in the domain context are
considered.
Table 6.1: Explicit and implicit mappings between BPMN4sML and TOSCA metamodels
1. BPMN4sML: FaaSTask (and inheriting elements such as VotingTask etc.)
   TOSCA (explicit mapping): TOSCA Node Type (e.g. technology-specific AwsLambdaFunction Node Type)

2. BPMN4sML: FaaSTask name/configuration; OffloadedTask script
   TOSCA (explicit mapping): TOSCA Node Properties (translates to any of the available node properties - for a Function Node Type e.g. runtime, memory, timeout and function provider specific properties etc.)

3. BPMN4sML: FaaSTask platform / OffloadedTask offloadingTechnology
   TOSCA (explicit mapping): TOSCA Node Type + Relationship Type to the actual Node that the BPMN4sML task represents (e.g. cloud provider / edge Node as Node Type + hostedOn Relationship Type to the hosted Node)

4. BPMN4sML: FaaSTask script
   TOSCA (explicit mapping): TOSCA Node Property (specifying a path to a zip file attached as TOSCA Implementation Artefact)

5. BPMN4sML: Instance of FaaSTask, OffloadedTask (or subclasses)
   TOSCA (explicit mapping): TOSCA Node Template with values for node properties

6. BPMN4sML: OffloadedTask (MLPlatform)
   TOSCA (explicit mapping): TOSCA Node Type (translates to different Types depending on implementation, such as 1) edge Node Type for an application component on edge; or 2) ML cloud service Node Type for an ML platform, e.g. AwsSageMaker - note that cloud service specific ML platform Node Types currently have no TOSCA counterpart and require extension)

7. BPMN4sML: DataRepository; MetadataRepository; LogStore; ModelRegistry
   TOSCA (explicit mapping): TOSCA Node Type (e.g. AWS DynamoDB Table, AWS S3 Bucket, Google Cloud Bucket etc.); Repositories and Registries can potentially be realized as one node with different directories

8. BPMN4sML: Platform (DataRepository; MetadataRepository; LogStore; ModelRegistry)
   TOSCA (explicit mapping): TOSCA Node Type + Relationship Type (hostedOn) to the actual Node representing the BPMN4sML data artefact - similar to mapping 4

9. BPMN4sML: BPMN DataAssociation (connecting Tasks with DataStores)
   TOSCA (explicit mapping): TOSCA Relationship Type (connectsTo)

10. BPMN4sML: BPMN4sML Event
    TOSCA (explicit mapping): Conditional semantic equivalence with TOSCA Relationship Type or TOSCA Node Type (depends on actual implementation of event flow and type of BPMN4sML event - difference between function orchestration and event-driven; e.g. DatasetUpdateEvent can equal a connectsTo Relationship Type with S3 trigger specification or a Node Type such as S3TriggeredFunction) - see Wurster et al. [17] and Radon Particles
    TOSCA (implicit mapping): Potentially requires an additional TOSCA Node Type to realize the TOSCA Relationship Type (e.g. a Node for a Bucket or Notification service)

11. BPMN4sML: BPMN4sML process (connecting Tasks with Tasks + workflow patterns)
    TOSCA (explicit mapping): TOSCA Node Type (e.g. for function orchestration realized as an abstract Workflow Node Type or as a technology-specific AwsSFOrchestration or AzureOrchestratingFunction Node Type); function orchestration further requires an orchestration file that can be derived from a BPMN process model as shown by Yussupov et al. [9]; alternative: specify workflow logic as event-driven via Relationship Types or Node Types, but this requires TOSCA extension
    TOSCA (implicit mapping): In case of function orchestration, the TOSCA Function Orchestration Node Type connects via a TOSCA Relationship Type to the orchestrated Nodes and can require further Node Types for realization of the Function Orchestration Node Type depending on the technology provider - see Yussupov et al. [9]; it also necessitates a hostedOn Relationship Type to the cloud platform represented by a Node Type (similar to mapping 4)
[Figure 6.2: Mapping example of a BPMN4sML workflow fragment (left) to a TOSCA deployment model (right); the Hold-out_Split task maps to an AwsLambdaFunction node template (Name: SplitData) and the DatasetRepository to an AwsS3Bucket node template (Name: TrainData).]
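To illustrate mappings 1-5 and the fragment of Figure 6.2 in a machine-readable form, the following sketch assembles a simplified, TOSCA-style node template as a plain data structure from a BPMN4sML FaaSTask instance; this is a conceptual illustration only, not a validated TOSCA service template, and the node type and property names merely echo those referenced in Table 6.1.

```python
def faas_task_to_node_template(task):
    """Conceptual sketch of mappings 1-5: a BPMN4sML FaaSTask instance as a TOSCA-style node template."""
    return {
        task["name"]: {
            # Mapping 1: FaaSTask -> (technology-specific) function Node Type.
            "type": "radon.nodes.aws.AwsLambdaFunction",  # illustrative type name
            # Mapping 2: task name/configuration -> node properties (runtime, memory, timeout, ...).
            "properties": {"function_name": task["name"], **task.get("configuration", {})},
            # Mapping 4: the task script -> implementation artefact (path to a zip file).
            "artifacts": {"deployment_package": task["script"]},
            # Mapping 3: the platform the task runs on -> hostedOn relationship to a platform node.
            "requirements": [{"host": task["platform"]}],
        }
    }


# Mapping 5: an instance of the FaaSTask with concrete property values (cf. Figure 6.2).
split_task = {
    "name": "Hold-out_Split",
    "configuration": {"runtime": "python3.9", "memory": 256, "timeout": 60},
    "script": "artifacts/split_data.zip",
    "platform": "AwsPlatform",
}
print(faas_task_to_node_template(split_task))
```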
Chapter 7
Validation
To further validate the proposed BPMN4sML extension artefact and the conceptual mapping
towards deployment, two illustrative use cases are referenced. A modeling example inspired
by Alipour et al. [138] is considered that encompasses a simple yet convincing ML applica-
tion, showcasing the functionality and feasibility of BPMN4sML to describe ML workflows.
The authors present a self-managing service architecture for cloud data services that lever-
ages serverless functions and machine learning models to predict future workload of said
data services. The prediction informs decision-making for better resource-provisioning. Sec-
tion 7.1 elaborates on the setting of the use case and presents the corresponding BPMN4sML
diagram.
Next to validating the potential of representing serverless ML workflows through BPMN4sML,
a minimalistic example illustrating the application of the established conceptual mapping to
derive a TOSCA service template is demonstrated in section 7.2. The example references a
simplified instance of a home credit default risk prediction challenge published on Kaggle by
the Home Credit Group [151]. Note that, at the moment of writing, the available Ansible roles corresponding to the TOSCA Node Types and Relationship Types provided by Yussupov et al. [9], which this work draws from, are not fully aligned with the updates on serverless functions made by the respective cloud providers and therefore result in erroneous deployment orchestration. Updating the Ansible roles as well as the TOSCA Node Types exceeds the scope of this work and is left to future research.
• Storage - stores sourced monitoring data that serve as ML model training samples;
translates to DatasetRepository
• ML service - wrapper for offloaded training jobs on ML platform. Note that the pro-
posed implementation leverages Amazon Machine Learning which has been replaced by
AWS SageMaker to provide easier access and direct endpoints; translates to an offloaded TrainingTask and an optional OffloadingJobEvent depending on the service provider (i.e. whether the ML platform allows direct integration with serverless functions or communication via notification services)

[Figure 7.1: Illustrative example 1 - Conceptual service architecture for online machine learning, comprising a Controller, Monitoring Service, ML Service (cross validation over Algorithm 1 and Algorithm 2), Prediction Service, Micro Service, Data Storage and Model Storage. Example adjusted and drawn from Alipour et al. [138].]
The data preparation step produces two datasets - one used to train the models with and one (the logs of the last minutes) to predict
the future workload. As the focus lies on the machine learning section, the subsequent pro-
visioning operations are hidden within a closed resource manager that adjusts the required
resources for the data services based on the latest inference result. In this modeling example
the different lanes depict the services specified by Alipour et al. [138], see Figure 7.1. The
Controller service is no longer necessary as the modeled workflow replaces its logic.
[Figure 7.2: Illustrative example 1 - Referenced use case depicted as BPMN4sML model. Lanes for Data Preparation, Prediction Component and Resource Manager; usage and resource demand metrics are sourced from AWS CloudWatch, split into train and prediction sets, Multinomial LogReg and LinearReg candidates are evaluated, and artefacts are stored in a DatasetRepo and MetadataRepo (AWS S3) and a ModelRegistry (AWS DynamoDB).]
[Figure 7.3: Illustrative Example 2 - simplified BPMN4sML model for home credit default machine learning pipeline (pool: Home Credit Default Machine Learning Pipeline). Tasks: Source Loan Application Data, Eliminate and Recode Features, Hold-out Split into train and test, Tune RF Model via Grid Search, Verify AUC Performance, Deploy tuned RF model; artefacts include the LoanApplication FeatureSet, Dataset, TrainSet and Testset as well as the trained and deployed RF Model, stored in a FeatureSetRepository, DatasetRepository and ModelRegistry on AWS S3, with an External Data Store as source.]
as a feature. Moreover, categorical features are dummy encoded. The updated feature set
is then written as a dataset to a dataset repository. Subsequently, a data split task divides
the dataset into a training and test set by performing a simple 80:20 holdout split. The new
datasets are again written to the dataset repository. Next, a Random Forest (RF) learner is
tuned on the training dataset via a small GridSearch testing various hyperparameter values
for the number of trees per model and their respective depths. The tuning routine is defined
as a 5-fold cross validation and applies ROC-AUC as a scoring metric. The best performing
Random Forest model is directly written to the model registry. Afterwards, a verification
task verifies the RF model performance on the available test data. Again, ROC-AUC is
measured. Finally, the deployment task writes the verified model as a deployed model to
a new access point, i.e. directory of the model registry, through which other applications
and services can retrieve it. To simplify this workflow, some considerations were made. We
omit an extensive data preparation and tuning routine. Further, the verification task does
not consider potential verification failure that might trigger a verification failure event or
request for intervention. Overall, the entire workflow is kept light so that it can be realized solely on serverless functions, as it would otherwise necessitate further TOSCA extension to represent ML platforms offered by cloud providers as TOSCA Node Types.
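The tuning and verification steps of this pipeline could, for example, be realized along the following lines; the sketch assumes scikit-learn, and the hyperparameter grid, file name and column names are illustrative rather than taken from the actual implementation.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Hold-out split of the LoanApplication dataset (80:20), as modeled by the data split task.
dataset = pd.read_csv("loan_application_dataset.csv")  # illustrative file and column names
X, y = dataset.drop(columns=["TARGET"]), dataset["TARGET"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Tuning task: small grid search over number of trees and depth, 5-fold CV, ROC-AUC scoring.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [6, 12]},
    cv=5,
    scoring="roc_auc",
)
grid.fit(X_train, y_train)
best_rf = grid.best_estimator_  # best performing RF model, written to the ModelRegistry

# Verification task: verify ROC-AUC on the held-out test data before deployment.
test_auc = roc_auc_score(y_test, best_rf.predict_proba(X_test)[:, 1])
print(f"Tuned RF test ROC-AUC: {test_auc:.3f}")
```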
[Figure 7.4: Illustrative Example 2 - TOSCA Topology Template of BPMN4sML home credit default ML pipeline, comprising a Home Default Prediction Pipeline node template (AwsSFOrchestration, Name: DefaultPred), function node templates named SourceData, EngineerFeats, SplitData, TuneRF, VerifyRF and DeployModel, a ModelDataStorage node template (AwsS3Bucket) and an AwsPlatform node (Region: eu-central-1), related via AwsSFOrchestrates, connectsTo and hostedOn relationship types.]
Figure 7.5: Illustrative Example 2 - direct execution of the modeled workflow on AWS via
Step Functions
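The function orchestration executed in Figure 7.5 could be expressed as an Amazon States Language definition chaining the six functions of the pipeline. The sketch below assembles such a definition as a plain Python dictionary; the Lambda ARNs are placeholders and the state names simply mirror the node names of Figure 7.4.

```python
import json

STEPS = ["SourceData", "EngineerFeats", "SplitData", "TuneRF", "VerifyRF", "DeployModel"]


def build_state_machine(steps):
    """Assemble a minimal Amazon States Language definition chaining the pipeline functions."""
    states = {}
    for i, name in enumerate(steps):
        state = {
            "Type": "Task",
            "Resource": f"arn:aws:lambda:eu-central-1:123456789012:function:{name}",  # placeholder ARN
        }
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"Comment": "Home credit default ML pipeline", "StartAt": steps[0], "States": states}


print(json.dumps(build_state_machine(STEPS), indent=2))
```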
Chapter 8
Discussion and Limitations
This study proposes three artefacts as significant contributions, two core artefacts and a
second-order one. First, it presents a metamodel extension to BPMN based on a preceding
extensive requirement analysis. Second, it brings forward constructs corresponding to the
metamodel and incorporating the necessary modeling elements, extending the BPMN nota-
tion and respective semantics. Together, the two artefacts form the proposed BPMN4sML
extension addressing the requirements of machine learning workflows as well as character-
istics stemming from their serverless deployment. Third, this thesis establishes a conceptual
mapping between BPMN4sML and TOSCA elements with a focus on serverless deployment
orchestrations. The artefacts are consolidated as a method describing a process for techno-
logy independent and interoperable modeling of (serverless) ML workflows via BPMN4sML
and their mapping to TOSCA for subsequent deployment. In this chapter, the findings
of the requirement analysis and artefacts are discussed and interpreted in the context of the
research questions.
Machine Learning
Different interpretations exist on what constitutes a machine learning workflow [37, 42, 41].
We establish basics on machine learning as a domain and organize machine learning processes
along the ML lifecycle phases, see sections 2.1, 2.2. We consider requirement analysis, data
management, model learning, model verification, model deployment and model monitoring
and inference as phases relevant to the derivation of specific requirements for the conceptual
modeling of ML, see section 4.1. Each phase is characterized by a set of activities shared
across ML processes. Nonetheless, activities do not pertain solely to one phase. Further, each
phase produces and consumes data artefacts such as machine learning models, data sets and
other code, configuration, result or information objects. Besides, events and decision points
can influence the respective ML lifecycle phase, making the whole processes potentially
iterative.
We further observe that machine learning is a vast and quickly developing field of re-
search. Thus, the scope of this work constrains the extent to which requirements specific to
ML methodologies can be considered. For example, while particularities of federated learn-
ing are identified, they are not further considered in the creation of the extension artefact.
However, future integration of the outlined concepts is supported.
Serverless Computing
Realizing a machine learning workflow with serverless computing can be achieved mainly
via two options, 1) function composition and 2) function orchestration, see section 2.3.1.
Moreover, the inherent characteristics of serverless computing - 1) utilization-based billing,
2) limited control, 3) event-driven - and of Function-as-a-Service - 1) statelessness, 2) fixed-
memory, 3) short running time - require job offloading in the case of a more compute-intensive
ML activity, see section 4.2.
This study concentrates on an approach for generic modeling of ML workflows and their
serverless deployment orchestration. Accordingly, creation of frameworks to manage for
instance resource provisioning and mitigate the serverless / FaaS characteristics in new
ways as shown by Carreira et al. [112] or Jiang et al. [116] is not pursued. Instead, we refer
to a combination of BPMN and TOSCA to realize our objective.
Modeling
Making machine learning more accessible to involved stakeholders within an organization
is an ongoing challenge. By reviewing process modeling fundamentals as well as poten-
tial modeling languages and basic workflow concepts in sections 2.4.2, 2.5, we identify the
Business Process Model and Notation as the de-facto standard and reference it henceforth.
BPMN and BPMN extensions however do not offer the required constructs to fully repres-
ent machine learning workflows as we explain in section 2.7. Similarly, existing literature on
standard-based deployment of ML workflows is scarce.
The absence of a fitting modeling language and notation poses large obstacles for non-
technical stakeholders to get involved and for technical stakeholders to communicate and
design workflows in a consistent manner [18, 2]. Further, integration of a ML process into
the overall process infrastructure and environment of an organization is difficult [8]. Also
technical stakeholders such as data scientists and ML engineers are confronted with an
abundance of potential tools and offerings, making the modeling and serverless deployment
orchestration of ML workflows in a technology independent and interoperable manner a chal-
lenge. Therefore, abstraction is necessary to be able to represent ML concepts irrespective
of chosen provider or technology.
Synthesizing the literature review and requirement analysis, we identify 57 requirements
needed to enable fundamental conceptual modeling of (serverless) ML workflows. Ten iden-
tified requirements can be related directly to existing BPMN concepts or concepts proposed
by BPMN extensions. Representing the remaining ones requires extension.
The corresponding visual constructs are provided as a BPMN notation extension, allowing the new elements of the metamodel extension to be represented visually (see section 5.2). In doing so, we lay the foundation to
tackle technology independent and interoperable modeling of (serverless) ML workflows.
Specifically, BPMN4sML facilitates modeling of machine learning tasks, event streams,
data objects and repositories across the entire ML lifecycle. It supports a large part of
supervised and unsupervised machine learning and accounts for various methodologies such
as ensemble or transfer learning.
Further, several ways of implementing and deploying the workflow are considered (serverless,
offloaded, hybrid), enabling modelers to differentiate both conceptually and visually between
entirely FaaS-based solutions or ML services leveraging specific machine learning platforms.
Notably, while this work focuses on serverless machine learning workflows, the identified
requirements, conceptualization, notation and semantics are generalizable (or can be easily
generalized) to ML workflows outside of the domain of serverless computing. Moreover, the
identified ML concepts and notation are applicable to largely represent machine learning
workflows as they are advocated by main cloud providers such as Google Cloud [44, 129],
Amazon Web Services [130, 134, 132] and Microsoft Azure [133, 42] by incorporating a
similar (or more extensive) list of modeling elements.
With BPMN4sML, we further answer the call for research by Lukyanenko et al. [8].
When modeling ML workflows, ML engineers, data scientists and other process analysts
can minimize ambiguity by referring to the most expressive core modeling elements which
best portray the machine learning workflow instead of overloading the model diagrams with
redundant text annotations to describe an element. Consequently, when implementing a
model, ideas can be conveyed precisely, reducing need for clarifications and thereby also
time and concomitant costs.
Leveraging common process modeling, practitioners can increase business understanding
with respect to ML projects. Moreover, by visualizing a ML workflow as a process model,
transparency and comprehensibility can be improved. Each task, data artefact and decision
can be unambiguously presented. As a result, BPMN4sML also contributes to the field of
explainable artificial intelligence by providing overview and insight into the mechanics of a
ML solution.
Nonetheless, the current version of BPMN4sML does not allow modeling of the entire
domain of machine learning. Extending it further with respect to specific ML methodologies
is necessary. Moreover, this work focused on the conceptualization of the artefact. As a
result, schema descriptions corresponding to the metamodels are still required and will have
to be added in the future. As with any new modeling extension, BPMN4sML requires
machine learning practitioners and stakeholders to first learn and understand the notation
in order to reap its benefits. Thus, communicating directly with such domain experts and
gathering feedback still is paramount to a successful adoption.
As shown in our first illustrative scenario in section 7.1, using BPMN4sML one can
describe end-to-end machine learning workflows in a standardized and coherent manner. It
is however noteworthy that ML workflows in general operate on a large set of different data
artefacts. Further, by focusing on serverless deployment most tasks require a connection to
one or several of the introduced repositories. Hence, a modeler may need to continuously
balance the expressiveness of a BPMN4sML model with element overload in order to only
depict the core set of required data artefacts. BPMN4sML supports this by offering a concise
notation and specific semantics.
Taken together, the artefacts form a method for technology independent and interoperable modeling of (serverless) ML workflows via BPMN4sML and their corresponding serverless deployment orchestration via TOSCA.
Referencing TOSCA as a standard helps in adhering to technology independence and
interoperability with respect to serverless deployment orchestration. Moreover, it allows us
to leverage an extensive body of existing work on declarative deployment modeling. Drawing from our proposed method can ultimately enable ML engineers to speed up the time it takes from ideating ML solutions to deploying them and, by extension, shorten the time-to-market. Nevertheless, to realize this objective, further work is necessary in automating the
BPMN4sML to TOSCA conversion. In section 6.1 we elaborate that this relates to three
pillars - 1) expressing BPMN4sML models as XML documents, 2) potentially incorporating
functionality to directly derive workflow logic as a function orchestration file and 3) accordingly extending TOSCA Node and Relationship Types as well as corresponding Ansible roles.
Realizing this was constrained by the scope and time of this thesis.
Our illustrative scenario 2 in section 7.2 showcases the potential to map a BPMN4sML
model to a TOSCA deployment model. It however also highlights the need for more work
on the side of TOSCA to fully support deployment of serverless workflows. In particular
with respect to serverless computing, current TOSCA solutions are limited. Yussupov et
al. [9] and Wurster et al. [17] take first steps in this direction. Nevertheless, the latter's proposition does not support serverless workflows and the former's necessitates further
extension and maintenance to restore its functionality. Similarly, TOSCA propositions as
part of the RADON project [150] are not directed towards serverless workflow orchestration.
Further, to fully realize the proposed method, new TOSCA Node and Relationship Types
are required for representing offloading technologies such as machine learning platforms.
Alternatively, also other tools, albeit not necessarily standards, can be experimented with
to circumvent current TOSCA shortcomings. Terraform [152] may be another solution
towards cloud infrastructure automation. Similarly, the Serverless Workflow Specification
may aid in better realizing function orchestrations.
Chapter 9
Conclusion
This thesis researches how machine learning workflows and their serverless deployment or-
chestration can be modeled in a technology independent and interoperable manner via exist-
ing standards (BPMN and TOSCA). As a result, we propose the specification BPMN4sML
(Business Process Model and Notation for serverless Machine Learning) as an extension to
BPMN 2.0 to support machine learning engineers and other process analysts in modeling
ML processes. Our extension addresses the various challenges that practitioners face when
describing a (serverless) machine learning workflow with the existing standard by:
• Conceptualizing the machine learning lifecycle and workflow by means of a metamodel.
• Developing a new, more accurate and meaningful notation and semantics to better
represent machine learning concepts and artefacts.
• Aiding both technical and managerial personnel to describe, communicate and under-
stand ML workflows in a consistent and transparent manner.
• Taking a first step towards a standardized depiction of machine learning solutions
which can be further extended for upcoming machine learning methodologies.
We realize this by conducting a thorough literature review and extensive requirements ana-
lysis of the core machine learning domains as well as the relevant paradigms for serverless
computing and process and deployment modeling. We further propose a conceptual map-
ping from the introduced BPMN4sML elements to corresponding TOSCA elements show-
casing the potential of directly deriving technology specific deployable topology templates
from a generic machine learning process model. We exemplify our findings by means of
example workflow snippets and two illustrative use cases inspired by 1) a publication and
2) a real-world Kaggle challenge. The BPMN4sML notation was created leveraging draw.io
/ diagrams.net; a first version of the notation can be imported as an XML library.
Bibliography
[1] Gil Press. Andrew Ng Launches A Campaign For Data-Centric AI. Forbes, 2021.
[2] Rohit Panikkar, Tamim Saleh, Maxime Szybowski, and Rob Whiteman. Operational-
izing machine learning in processes, 2021.
[3] Sam Ransbotham, David Kiron, and Pamela Prentice. Beyond the hype: The hard
work behind analytics success. MIT Sloan Management Review, 57:6–6, 2016.
[4] Gharib Gharibi, Vijay Walunj, Raju Nekadi, Raj Marri, and Yugyung Lee. Automated
end-to-end management of the modeling lifecycle in deep learning. Empirical Software
Engineering, 26, 3 2021.
[5] Joost Verbraeken, Matthijs Wolting, Jonathan Katzy, Jeroen Kloppenburg, Tim Ver-
belen, and Jan S. Rellermeyer. A survey on distributed machine learning. ACM
Comput. Surv., 53(2), March 2020.
[9] Vladimir Yussupov, Jacopo Soldani, Uwe Breitenbücher, and Frank Leymann.
Standards-based modeling and deployment of serverless function orchestrations us-
ing BPMN and TOSCA. Software: Practice and Experience, 1 2022.
[10] Pedro García López, Marc Sánchez-Artigas, Gerard París, Daniel Barcelona Pons,
Álvaro Ruiz Ollobarren, and David Arroyo Pinto. Comparison of FaaS Orchestra-
tion Systems. In 2018 IEEE/ACM International Conference on Utility and Cloud
Computing Companion (UCC Companion), pages 148–153, 2018.
[11] OMG. Business Process Model and Notation (BPMN), Version 2.0.2, 2014.
[12] Alaaeddine Yousfi, Christine Bauer, Rajaa Saidi, and Anind K. Dey. UBPMN: A
BPMN extension for modeling ubiquitous business processes. Information and Soft-
ware Technology, 74:55–68, 6 2016.
[13] Raquel M. Pillat, Toacy C. Oliveira, Paulo S.C. Alencar, and Donald D. Cowan.
BPMNt: A BPMN extension for specifying software process tailoring. volume 57,
pages 95–115. Elsevier B.V., 2015.
[14] Benjamin Weder, Uwe Breitenbücher, Frank Leymann, and Karoline Wild. Integrat-
ing quantum computing into workflow modeling and execution. In 2020 IEEE/ACM
13th International Conference on Utility and Cloud Computing (UCC), pages 279–291,
2020.
[15] Serverless Workflow Specification Authors. Serverless workflow, 2022. Last Accessed:
2022-05-26.
[16] Richard Braun, Hannes Schlieter, Martin Burwitz, and Werner Esswein. BPMN4CP:
Design and implementation of a BPMN extension for clinical pathways. In 2014 IEEE
International Conference on Bioinformatics and Biomedicine (BIBM), pages 9–16,
2014.
[17] Michael Wurster, Uwe Breitenbücher, Kálmán Képes, Frank Leymann, and Vladimir
Yussupov. Modeling and Automated Deployment of Serverless Applications Using TO-
SCA. In 2018 IEEE 11th Conference on Service-Oriented Computing and Applications
(SOCA), pages 73–80, 2018.
[18] David Sjödin, Vinit Parida, Maximilian Palmié, and Joakim Wincent. How AI capab-
ilities enable business model innovation: Scaling AI through co-evolutionary processes
and feedback loops. Journal of Business Research, 134:574–587, 9 2021.
[19] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Sci-
ence and Statistics). Springer-Verlag, Berlin, Heidelberg, 2006.
[20] Trevor Hastie, Robert Tibshirani, Jerome H Friedman, and Jerome H Friedman. The
elements of statistical learning: data mining, inference, and prediction, volume 2.
Springer, 2009.
[21] Kevin P Murphy. Machine learning: a probabilistic perspective. MIT press, 2012.
[22] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction.
MIT press, 2018.
[23] Xiao Liu, Fanjin Zhang, Zhenyu Hou, Li Mian, Zhaoyu Wang, Jing Zhang, and Jie
Tang. Self-supervised learning: Generative or contrastive. IEEE Transactions on
Knowledge and Data Engineering, 2021.
[24] Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee,
and Fillia Makedon. A survey on contrastive self-supervised learning. Technologies,
9(1), 2021.
[25] Longlong Jing and Yingli Tian. Self-supervised visual feature learning with deep neural
networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence,
43:4037–4058, 2021.
[26] Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu,
Hui Xiong, and Qing He. A comprehensive survey on transfer learning. Proceedings
of the IEEE, 109(1):43–76, 2021.
[27] Karl Weiss, Taghi M Khoshgoftaar, and DingDing Wang. A survey of transfer learning.
Journal of Big data, 3(1):1–40, 2016.
[28] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I. Jordan. Deep transfer
learning with joint adaptation networks. In Doina Precup and Yee Whye Teh, editors,
Proceedings of the 34th International Conference on Machine Learning, volume 70 of
Proceedings of Machine Learning Research, pages 2208–2217. PMLR, 06–11 Aug 2017.
[29] Steven C.H. Hoi, Doyen Sahoo, Jing Lu, and Peilin Zhao. Online learning: A compre-
hensive survey. Neurocomputing, 459:249–289, 10 2021.
[30] M. A. Ganaie, Minghui Hu, Muhammad Tanveer, and Ponnuthurai Nagaratnam Sug-
anthan. Ensemble deep learning: A review. ArXiv, abs/2104.02395, 2021.
[31] Xibin Dong, Zhiwen Yu, Wenming Cao, Yifan Shi, and Qianli Ma. A survey on
ensemble learning. Frontiers of Computer Science, 14(2):241–258, 2020.
[32] Omer Sagi and Lior Rokach. Ensemble learning: A survey. Wiley Interdisciplinary
Reviews: Data Mining and Knowledge Discovery, 8(4):e1249, 2018.
[33] Brendan McMahan and Daniel Ramage. Federated Learning: Collaborative Machine
Learning without Centralized Training Data. https://ai.googleblog.com/2017/04/federated-learning-collaborative.html, 2017. Last Accessed: 2022-02-15.
[34] Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Brij B. Gupta,
Xiaojiang Chen, and Xin Wang. A survey of deep active learning. ACM Comput.
Surv., 54(9), oct 2021.
[35] Punit Kumar and Atul Gupta. Active learning query strategies for classification,
regression, and clustering: a survey. Journal of Computer Science and Technology,
35(4):913–945, 2020.
[36] Yu Zhang and Qiang Yang. A survey on multi-task learning. IEEE Transactions on
Knowledge and Data Engineering, pages 1–1, 2021.
[37] Rob Ashmore, Radu Calinescu, and Colin Paterson. Assuring the machine learning
lifecycle: Desiderata, methods, and challenges. ACM Computing Surveys, 54, 2021.
[38] Rüdiger Wirth. CRISP-DM: Towards a standard process model for data mining. In
Proceedings of the Fourth International Conference on the Practical Application of
Knowledge Discovery and Data Mining, pages 29–39, 2000.
[39] Microsoft. The Team Data Science Process. https://docs.microsoft.com/en-us/azure/architecture/data-science-process/overview. Last Accessed: 2022-04-24.
[40] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. The KDD Process
for Extracting Useful Knowledge from Volumes of Data. Commun. ACM, 39(11):27–34,
nov 1996.
[41] Samuel Idowu, Daniel Strüber, and Thorsten Berger. Asset management in machine
learning: A survey. 2 2021.
[42] Saleema Amershi, Andrew Begel, Christian Bird, Robert DeLine, Harald Gall, Ece
Kamar, Nachiappan Nagappan, Besmira Nushi, and Thomas Zimmermann. Software
engineering for machine learning: A case study. In 2019 IEEE/ACM 41st International
Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP),
pages 291–300, 2019.
[43] Görkem Giray. A software engineering perspective on engineering machine learning
systems: State of the art and challenges. Journal of Systems and Software, 180:111031,
10 2021.
[44] Khalid Salam, Jarek Kazmierczak, and Donna Schut. Practitioners guide to MLOps:
A framework for continuous delivery and automation of machine learning, 2021. Last
Accessed: 2022-02-15.
[45] Stamatios-Aggelos N. Alexandropoulos, Sotiris B. Kotsiantis, and Michael N. Vrahatis.
Data preprocessing in predictive data mining. The Knowledge Engineering Review,
34, 2019.
[46] Alice Zheng and Amanda Casari. Feature engineering for machine learning: principles
and techniques for data scientists. O’Reilly Media, Inc., 2018.
[47] Max Kuhn, Kjell Johnson, et al. Applied predictive modeling, volume 26. Springer,
2013.
[48] Chong Sun, Nader Azari, and Chintan Turakhia. Gallery: A machine learning model
management system at uber. In EDBT, 2020.
[49] Chen Zhang, Yu Xie, Hang Bai, Bin Yu, Weihong Li, and Yuan Gao. A survey on
federated learning. Knowledge-Based Systems, 216, 3 2021.
[50] Tian Li, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. Federated learning:
Challenges, methods, and future directions. 8 2019.
[51] Qi Xia, Winson Ye, Zeyi Tao, Jindi Wu, and Qun Li. A survey of federated learning
for edge computing: Research problems and solutions. High-Confidence Computing,
1:100008, 6 2021.
[52] Ji Liu, Jizhou Huang, Yang Zhou, Xuhong Li, Shilei Ji, Haoyi Xiong, and Dejing Dou.
From distributed machine learning to federated learning: A survey. Knowledge and
Information Systems, pages 1–33, 2022.
[53] Sin Kit Lo, Qinghua Lu, Hye Young Paik, and Liming Zhu. FLRA: A Reference
Architecture for Federated Learning Systems. In Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioin-
formatics), volume 12857 LNCS, pages 83–98. Springer Science and Business Media
Deutschland GmbH, 2021.
[54] Andreas Grafberger, Mohak Chadha, Anshul Jindal, Jianfeng Gu, and Michael Gerndt.
FedLess: Secure and Scalable Federated Learning Using Serverless Computing. In
Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021, pages
164–173. Institute of Electrical and Electronics Engineers Inc., 2021.
[55] Sin Kit Lo, Qinghua Lu, Liming Zhu, Hye-young Paik, Xiwei Xu, and Chen Wang.
Architectural Patterns for the Design of Federated Learning Systems. January 2021.
[56] Vatsal Patel, Sarth Kanani, Tapan Pathak, Pankesh Patel, Muhammad Intizar Ali,
and John Breslin. A Demonstration of Smart Doorbell Design Using Federated Deep
Learning. In Proceedings of ACM Conference (Conference’17), volume 1. Association
for Computing Machinery, 2020.
[57] Aiswarya Raj Munappy, David Issa Mattos, Jan Bosch, Helena Holmström Olsson,
and Anas Dakkak. From Ad-Hoc Data Analytics to DataOps. In Proceedings of the
International Conference on Software and System Processes, ICSSP ’20, page 165–174.
Association for Computing Machinery, 2020.
[58] Damian A. Tamburri. Sustainable MLOps: Trends and Challenges. In 2020 22nd In-
ternational Symposium on Symbolic and Numeric Algorithms for Scientific Computing
(SYNASC), pages 17–23, 2020.
[59] Tobias Binz, Uwe Breitenbücher, Oliver Kopp, and Frank Leymann. TOSCA: Portable
Automated Deployment and Management of Cloud Applications. In Advanced Web
Services, pages 527–549, New York, NY, 2014. Springer New York.
[60] Anupama Mampage, Shanika Karunasekera, and Rajkumar Buyya. A Holistic View on
Resource Management in Serverless Computing Environments: Taxonomy and Future
Directions. May 2021.
[61] Samuel Kounev, Cristina Abad, Ian T. Foster, et al. Toward a Definition for Serverless
Computing. In Serverless Computing (Dagstuhl Seminar 21201), volume 11, pages 34–
93. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2021.
[62] Ioana Baldini, Paul Castro, Kerry Chang, Perry Cheng, Stephen Fink, Vatche
Ishakian, Nick Mitchell, Vinod Muthusamy, Rodric Rabbah, Aleksander Slominski,
et al. Serverless computing: Current trends and open problems. In Research advances
in cloud computing, pages 1–20. Springer, 2017.
[63] Joseph M. Hellerstein, Jose Faleiro, Joseph E. Gonzalez, Johann Schleier-Smith,
Vikram Sreekanti, Alexey Tumanov, and Chenggang Wu. Serverless computing: One
step forward, two steps back. CIDR 2019 - 9th Biennial Conference on Innovative
Data Systems Research, 3, 2019.
[64] Narges Shahidi, Jashwant Raj Gunasekaran, and Mahmut Taylan Kandemir. Cross-
Platform Performance Evaluation of Stateful Serverless Workflows. In Proceedings
- 2021 IEEE International Symposium on Workload Characterization, IISWC 2021,
pages 63–73. Institute of Electrical and Electronics Engineers Inc., 2021.
[65] Sebastian Burckhardt, Chris Gillum, David Justo, Konstantinos Kallas, Connor
McMahon, and Christopher S. Meiklejohn. Durable functions: Semantics for stateful
serverless. Proc. ACM Program. Lang., 5(OOPSLA), October 2021.
[66] Ralf Bruns and Jürgen Dunkel. Towards pattern-based architectures for event pro-
cessing systems. Software: Practice and Experience, 44(11):1395–1416, November 2014.
[67] Google Cloud. Event-driven architectures, 2022. Last Accessed: 2022-04-29.
[68] Davide Taibi, Nabil El Ioini, Claus Pahl, and Jan Raphael Schmid Niederkofler. Pat-
terns for serverless functions (function-as-a-service): A multivocal literature review. In
CLOSER 2020 - Proceedings of the 10th International Conference on Cloud Computing
and Services Science, volume 1, pages 181–192. Science and Technology Publications
(SciTePress), 2020.
[69] Amazon Web Services. Aws step functions, 2020. Last Accessed: 2022-04-27.
[70] Google Cloud. Google cloud workflows, 2021. Last Accessed: 2022-04-27.
[71] Microsoft Azure. Azure durable functions, 2021. Last Accessed: 2022-04-27.
[72] Youseef Alotaibi and Fei Liu. Survey of business process management: challenges and
solutions. Enterprise Information Systems, 11:1119–1153, September 2017.
[73] Marlon Dumas, Marcello La Rosa, Jan Mendling, and Hajo A. Reijers. Introduction
to Business Process Management, pages 1–33. Springer Berlin Heidelberg, Berlin,
Heidelberg, 2018.
[74] Wil M. P. van der Aalst. Business process management: A comprehensive survey.
ISRN Software Engineering, 2013:1–37, February 2013.
[75] Wil M.P. Van Der Aalst, Marcello La Rosa, and Flávia Maria Santoro. Business
process management: Don’t forget to improve the process! Business and Information
Systems Engineering, 58:1–6, January 2016.
[76] Paul Mathiesen, Jason Watson, Wasana Bandara, and Michael Rosemann. Applying
social technology to business process lifecycle management. volume 99 LNBIP, pages
231–241. Springer Verlag, 2012.
[77] John Mylopoulos. Conceptual modelling and telos. Conceptual modelling, databases,
and CASE: An integrated view of information system development, pages 49–68, 1992.
[78] Marlon Dumas, Marcello La Rosa, Jan Mendling, and Hajo A. Reijers. Essential
Process Modeling, pages 75–115. Springer Berlin Heidelberg, Berlin, Heidelberg, 2018.
[79] Roman Lukyanenko. Rethinking the role of conceptual modeling in the introductory
IS curriculum. December 2018.
[80] Shishir Bharathi, Ann Chervenak, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, and
Karan Vahi. Characterization of scientific workflows. In 2008 Third Workshop on
Workflows in Support of Large-Scale Science, pages 1–10, 2008.
[81] Nick Russell, Wil M. P. Van Der Aalst, and Arthur H. M. Ter Hofstede. Workflow
patterns: the definitive guide. MIT Press, 2016.
[82] Yair Wand, David E. Monarchi, Jeffrey Parsons, and Carson C. Woo. Theoretical
foundations for conceptual modelling in information systems development. Decision
Support Systems, 15(4):285–304, 1995.
[83] Jan Philipp Friedenstab, Christian Janiesch, Martin Matzner, and Oliver Müller. Ex-
tending BPMN for business activity monitoring. pages 4158–4167. IEEE Computer
Society, 2012.
[84] Richard Lakin, Nick Capon, and Neil Botten. BPR enabling software for the financial
services industry. Management services, 40(3):18–20, 1996.
[85] Ruth Sara Aguilar-Savén. Business process modelling: Review and framework. Inter-
national Journal of Production Economics, 90:129–149, July 2004.
[86] Khodakaram Salimifard and Mike Wright. Petri net-based modelling of workflow
systems: An overview. European Journal of Operational Research, 134(3):664–676,
2001.
[91] Michael zur Muehlen and Jan Recker. How Much Language Is Enough? Theoretical
and Practical Use of the Business Process Modeling Notation. In Zohra Bellahsène and
Michel Léonard, editors, Advanced Information Systems Engineering, pages 465–479,
Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.
[92] Manfred A. Jeusfeld. Metamodel, pages 1727–1730. Springer US, Boston, MA, 2009.
[95] Gregor Polančič and Boštjan Orban. A BPMN-based language for modeling corporate
communications. Computer Standards and Interfaces, 65:45–60, July 2019.
[96] Jan Philipp Friedenstab, Christian Janiesch, Martin Matzner, and Oliver Müller. Ex-
tending BPMN for business activity monitoring. pages 4158–4167. IEEE Computer
Society, 2012.
[97] Mariam Ben Hassen, Molka Keskes, Mohamed Turki, and Faïez Gargouri. BPMN4KM:
Design and Implementation of a BPMN Extension for Modeling the Knowledge Per-
spective of Sensitive Business Processes. volume 121, pages 1119–1134. Elsevier B.V.,
2017.
[98] Alfonso Rodríguez, Angélica Caro, Cinzia Cappiello, and Ismael Caballero. A BPMN
Extension for Including Data Quality Requirements in Business Process Modeling. In
Jan Mendling and Matthias Weidlich, editors, Business Process Model and Notation,
pages 116–125, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
[99] Sonja Meyer, Andreas Ruppen, and Carsten Magerkurth. Internet of things-aware
process modeling: Integrating iot devices as business process resources. In Camille
Salinesi, Moira C. Norrie, and Óscar Pastor, editors, Advanced Information Systems
Engineering, pages 84–98, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
[100] Stefano Tranquillini, Patrik Spieß, Florian Daniel, Stamatis Karnouskos, Fabio Casati,
Nina Oertel, Luca Mottola, Felix Jonathan Oppermann, Gian Pietro Picco, Kay
Römer, and Thiemo Voigt. Process-based design and integration of wireless sensor
network applications. In Alistair Barros, Avigdor Gal, and Ekkart Kindler, editors,
Business Process Management, pages 134–149, Berlin, Heidelberg, 2012. Springer Ber-
lin Heidelberg.
[101] Luis Jesús Ramón Stroppi, Omar Chiotti, and Pablo David Villarreal. Extending
bpmn 2.0: Method and tool support. In Business Process Model and Notation, pages
59–73. Springer Berlin Heidelberg, 2011.
[102] Douglas C. Schmidt. Model-driven engineering. IEEE Computer, 39(2), February
2006.
[103] Michael Wurster, Uwe Breitenbücher, Michael Falkenthal, Christoph Krieger, Frank
Leymann, Karoline Saatkamp, and Jacopo Soldani. The essential deployment
metamodel: a systematic review of deployment automation technologies. volume 35,
pages 63–75. Springer, August 2020.
[104] Uwe Breitenbücher, Kálmán Képes, Frank Leymann, and Michael Wurster. Declar-
ative vs. Imperative: How to Model the Automated Deployment of IoT Applications?
In Proceedings of the 11th Advanced Summer School on Service Oriented Computing,
pages 18–27. IBM Research Division, 2017.
[105] OASIS. TOSCA simple profile in YAML version 1.3, 2020.
[106] Johannes Wettinger, Tobias Binz, Uwe Breitenbücher, Oliver Kopp, Frank Leymann,
and Michael Zimmermann. Unified Invocation of Scripts and Services for Provisioning,
Deployment, and Management of Cloud Applications Based on TOSCA. In CLOSER,
pages 559–568, 2014.
[107] Karoline Wild, Uwe Breitenbücher, Lukas Harzenetter, Frank Leymann, Daniel Vietz,
and Michael Zimmermann. TOSCA4QC: Two Modeling Styles for TOSCA to Auto-
mate the Deployment and Orchestration of Quantum Applications. In Proceedings -
2020 IEEE 24th International Enterprise Distributed Object Computing Conference,
EDOC 2020, pages 125–134. Institute of Electrical and Electronics Engineers Inc., October
2020.
[108] Tobias Binz, Gerd Breiter, Frank Leymann, and Thomas Spatzier. Portable cloud
services using TOSCA. IEEE Internet Computing, 16:80–85, May 2012.
[109] B Qian, J Su, Z Wen, DN Jha, Y Li, Y Guan, D Puthal, P James, R Yang, AY Zo-
maya, O Rana, L Wang, M Koutny, and R Ranjan. Orchestrating the Development
Lifecycle of Machine Learning-based IoT Applications: A Taxonomy and Survey. ACM
Computing Surveys, 53(4), September 2020.
[110] Sebastian Schelter, Joos-Hendrik Böse, Johannes Kirschnick, Thoralf Klein, and
Stephan Seufert. Automatically Tracking Metadata and Provenance of Machine Learn-
ing Experiments. Machine Learning Systems Workshop at NIPS, pages 1–8, 2017.
[111] Cédric Renggli, Luka Rimanic, Nezihe Merve Gürel, Bojan Karlas, Wentao Wu, and
Ce Zhang. A Data Quality-Driven View of MLOps. CoRR, abs/2102.07750, 2021.
[112] Joao Carreira, Pedro Fonseca, Alexey Tumanov, Andrew Zhang, and Randy Katz. Cir-
rus: A Serverless Framework for End-To-end ML Workflows. pages 13–24. Association
for Computing Machinery, November 2019.
[113] Minchen Yu, Zhifeng Jiang, Hok Chun Ng, Wei Wang, Ruichuan Chen, and Bo Li.
Gillis: Serving large neural networks in serverless functions with automatic model
partitioning. 2021 IEEE 41st International Conference on Distributed Computing
Systems (ICDCS), pages 138–148, 2021.
[114] Dheeraj Chahal, Manju Ramesh, Ravi Ojha, and Rekha Singhal. High performance
serverless architecture for deep learning workflows. In Proceedings - 21st IEEE/ACM
International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2021,
pages 790–796. Institute of Electrical and Electronics Engineers Inc., 2021.
[115] Malte S. Kurz. Distributed double machine learning with a serverless architecture. In
ICPE 2021 - Companion of the ACM/SPEC International Conference on Performance
Engineering, pages 27–33. Association for Computing Machinery, Inc., April 2021.
[116] Jiawei Jiang, Shaoduo Gan, Yue Liu, Fanlin Wang, Gustavo Alonso, Ana Klimovic,
Ankit Singla, Wentao Wu, and Ce Zhang. Towards Demystifying Serverless Machine
Learning Training. Proceedings of the ACM SIGMOD International Conference on
Management of Data, pages 857–871, 2021.
[117] Ta Phuong Bac, Minh Ngoc Tran, and Younghan Kim. Serverless Computing Ap-
proach for Deploying Machine Learning Applications in Edge Layer. In International
Conference on Information Networking, volume 2022-January, pages 396–401. IEEE
Computer Society, 2022.
[118] Sasko Ristov, Stefan Pedratscher, and Thomas Fahringer. AFCL: An Abstract Func-
tion Choreography Language for serverless workflow specification. Future Gener. Com-
put. Syst., 114:368–382, 2021.
[119] Alan R. Hevner, Salvatore T. March, Jinsoo Park, and Sudha Ram. Design Science
in Information Systems Research. MIS Quarterly, 28(1):75–105, 2004.
[120] Salvatore T. March and Gerald F. Smith. Design and natural science research on
information technology. Decision Support Systems, 15(4):251–266, 1995.
[121] Herbert A. Simon. The Sciences of the Artificial (3rd Ed.). MIT Press, Cambridge,
MA, USA, 1996.
[122] Ken Peffers, Tuure Tuunanen, Marcus A. Rothenberger, and Samir Chatterjee. A
design science research methodology for information systems research. Journal of
Management Information Systems, 24(3):45–77, 2007.
[123] Omprakash Kaiwartya, Abdul Hanan Abdullah, Yue Cao, Ayman Altameem, Mukesh
Prasad, Chin Teng Lin, and Xiulei Liu. Guidelines for performing Systematic Liter-
ature Reviews in Software Engineering. IEEE Access, 4:5356–5373, 2016.
[124] Kai Petersen, Robert Feldt, Shahid Mujtaba, and Michael Mattsson. Systematic map-
ping studies in software engineering. In 12th International Conference on Evaluation
and Assessment in Software Engineering (EASE) 12, pages 1–10, 2008.
[125] Giuseppe Cascavilla, Damian A. Tamburri, and Willem-Jan Van Den Heuvel. Cyber-
crime threat intelligence: A systematic multi-vocal literature review. Computers and
Security, 105:102258, 2021.
[126] Vasileios Theodorou, Alberto Abelló, Maik Thiele, and Wolfgang Lehner. Frequent
patterns in ETL workflows: An empirical approach. Data and Knowledge Engineering,
112(July):1–16, 2017.
[127] Judith Awiti, Alejandro A. Vaisman, and Esteban Zimányi. Design and implement-
ation of ETL processes using BPMN and relational algebra. Data and Knowledge
Engineering, 129:101837, September 2020.
[128] Farshid Hassani Bijarbooneh, Wei Du, Edith C.-H. Ngai, Xiaoming Fu, and Jiangchuan
Liu. Cloud-assisted data fusion and sensor selection for internet of things. IEEE
Internet of Things Journal, 3(3):257–268, 2016.
[129] Google Cloud. MLOps: Continuous delivery and automation pipelines in ma-
chine learning. https://cloud.google.com/architecture/mlops-continuous-delivery-and-
automation-pipelines-in-machine-learning, 2020. Last Accessed: 2022-06-01.
[130] Amazon Web Services - Amazon Sagemaker. Machine Learning with
Amazon SageMaker. https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-
works-mlconcepts.html, 2022. Last Accessed: 2022-06-01.
[131] Michael Schmitt and Xiao Xiang Zhu. Data Fusion and Remote Sensing: An ever-
growing relationship. IEEE Geoscience and Remote Sensing Magazine, 4(4):6–23, December
2016.
[132] Kwabena Mensa-Bonsu and Luca Piccolo. Taming Machine Learning on AWS
with MLOps: A Reference Architecture. https://aws.amazon.com/blogs/apn/
taming-machine-learning-on-aws-with-mlops-a-reference-architecture/,
2021. Last Accessed: 2022-06-01.
[133] Microsoft Azure. MLOps for Python models using Azure Machine Learning. https:
//docs.microsoft.com/en-us/azure/architecture/reference-architectures/
ai/mlops-python, 2022. Last Accessed: 2022-06-01.
[134] Ali Arsanjani. Architect and build the full machine learning lifecycle with AWS: An
end-to-end Amazon SageMaker demo. https://aws.amazon.com/blogs/machine-
learning/architect-and-build-the-full-machine-learning-lifecycle-with-amazon-
sagemaker/, 2021. Last Accessed: 2022-06-01.
[135] Tong Yu and Hong Zhu. Hyper-Parameter Optimization: A Review of Algorithms
and Applications. pages 1–56, 2020.
[136] Gavin C. Cawley and Nicola L. C. Talbot. On over-fitting in model selection and
subsequent selection bias in performance evaluation. Journal of Machine Learning
Research, 11(70):2079–2107, 2010.
[137] Yue Zhou, Yue Yu, and Bo Ding. Towards MLOps: A Case Study of ML Pipeline
Platform. 2020 International Conference on Artificial Intelligence and Computer En-
gineering (ICAICE), pages 494–500, 2020.
[138] Hanieh Alipour and Yan Liu. Online machine learning for cloud resource provisioning
of microservice backend systems. In 2017 IEEE International Conference on Big Data
(Big Data), pages 2433–2441, 2017.
[139] Rupesh Raj Karn, Prabhakar Kudva, and Ibrahim Abe M. Elfadel. Dynamic autose-
lection and autotuning of machine learning models for cloud network analytics. IEEE
Transactions on Parallel and Distributed Systems, 30(5):1052–1064, May 2019.
[140] Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot,
Siham Tabik, Alberto Barbado, Salvador Garcia, Sergio Gil-Lopez, Daniel Molina,
Richard Benjamins, Raja Chatila, and Francisco Herrera. Explainable Artificial Intel-
ligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible
AI. Information Fusion, 58:82–115, 2020.
[144] Frédéric Jouault, Freddy Allilaire, Jean Bézivin, and Ivan Kurtev. ATL: A model
transformation tool. Sci. Comput. Program., 72:31–39, 2008.
[145] Manuel Wimmer, Michael Strommer, Horst Kargl, and Gerhard Kramler. Towards
model transformation generation by-example. Proceedings of the Annual Hawaii In-
ternational Conference on System Sciences, pages 1–10, 2007.
[146] Tiexin Wang, Sébastien Truptil, and Frédérick Bénaben. An automatic model-to-
model mapping and transformation methodology to serve model-based systems engin-
eering. Information Systems and e-Business Management, 15(2):323–376, 2017.
[147] Liwen Zhang, Franck Fontanili, Elyes Lamine, Christophe Bortolaso, Mustapha
Derras, and Hervé Pingaud. A systematic model to model transformation for
knowledge-based planning generation problems. In Lecture Notes in Computer Sci-
ence, volume 12144 LNAI, pages 140–152. Springer International Publishing, 2020.
[148] Alexander Bergmayr, Uwe Breitenbücher, Oliver Kopp, Manuel Wimmer, Gerti
Kappel, and Frank Leymann. From Architecture Modeling to Application Provisioning
for the Cloud by Combining UML and TOSCA. In CLOSER, 2016.
[149] RADON. RADON - An advanced DevOps framework. https://radon-h2020.eu/,
2020. Last Accessed: 2022-05-20.
[150] RADON. RADON Particles. https://github.com/radon-h2020/radon-
particles/tree/master/nodetypes/radon.nodes.google, 2021. Last Accessed: 2022-05-
20.
[151] Kaggle Inc. Home Credit Default Risk - Can you predict how capable each
applicant is of repaying a loan? https://www.kaggle.com/competitions/
home-credit-default-risk/overview, 2018. Last Accessed: 2022-05-20.
Appendix A
Figure A.1: Winery TOSCA Topology Template of the BPMN4sML home credit default ML
pipeline.
To model this Topology Template, Winery is started against the TOSCA repository
provided by the IAAS Serverless Prototyping Lab [153]. Alternatively, Winery can be started
against the adapted version of that repository, which already contains the displayed Topology
Template and is provided as part of this thesis.
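For illustration, a minimal sketch of how Winery could be launched against such a repository
is given below. It assumes the publicly available opentosca/winery Docker image together
with its default repository mount point (/var/repository) and default port (8080); the local
repository path is a placeholder and must point to a clone of one of the repositories mentioned
above. It is a sketch under these assumptions, not the exact setup prescribed by this thesis.

    import subprocess
    from pathlib import Path

    # Placeholder: local clone of the TOSCA repository Winery should operate on
    # (e.g. the adapted repository provided with this thesis).
    repository = Path.home() / "tosca-repository"

    # Start Winery in a container and mount the repository into the location
    # where the image expects it. Image name, mount point and port are
    # assumptions based on the public opentosca/winery image.
    subprocess.run(
        [
            "docker", "run", "--rm",
            "-p", "8080:8080",
            "-v", f"{repository}:/var/repository",
            "opentosca/winery",
        ],
        check=True,
    )

Once the container is running, the Topology Template shown in Figure A.1 should be visible
in Winery's web interface, typically reachable at http://localhost:8080.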