Workflow Engine for Clouds
Workflow Management Systems and Clouds
Architecture of Workflow Management Systems
Utilizing Clouds for Workflow Execution
Case Study: Evolutionary Multiobjective Optimizations
Visionary Thoughts for Practitioners
Workflow Management Systems and Clouds
The elastic nature of clouds allows resource quantities and characteristics to
vary at runtime, dynamically scaling up when additional resources are needed
and scaling down when demand is low.
This enables workflow management systems to readily meet the quality-of-service
(QoS) requirements of applications.
With cloud computing services coming from large commercial organizations, service-level
agreements (SLAs) have been an important concern to both service providers
and consumers.
Due to competition among emerging service providers, greater care is being taken
in designing SLAs that seek to offer
(a) better QoS guarantees to customers and
(b) clear terms for compensation in the event of violation.
This allows workflow management systems to provide better end-to-end guarantees
when meeting the service requirements of users, by mapping those requirements to
service providers based on the characteristics of their SLAs.
Architectural Overview
The following diagram presents a high-level architectural view of a Workflow
Management System (WfMS) utilizing cloud resources to drive the execution of a
scientific workflow application.
The workflow system comprises the workflow engine, a resource broker, and plug-ins
for communicating with various technological platforms, such as Aneka and Amazon
EC2.
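To make the plug-in layer concrete, the following is a minimal sketch of how a
resource broker might hide platform differences behind a common interface. The
class and method names (PlatformPlugin, submit, status) are illustrative
assumptions, not the actual Cloudbus WfMS or Aneka API.

from abc import ABC, abstractmethod

class PlatformPlugin(ABC):
    """Common interface the resource broker uses to reach a platform."""

    @abstractmethod
    def submit(self, task: dict) -> str:
        """Submit a task; return a platform-specific job identifier."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return the current state of a previously submitted job."""

class AnekaPlugin(PlatformPlugin):
    def submit(self, task: dict) -> str:
        # A real plug-in would call Aneka's Web services here.
        return "aneka-" + task["name"]

    def status(self, job_id: str) -> str:
        return "RUNNING"  # placeholder

class EC2Plugin(PlatformPlugin):
    def submit(self, task: dict) -> str:
        # A real plug-in would launch or reuse an Amazon EC2 instance here.
        return "ec2-" + task["name"]

    def status(self, job_id: str) -> str:
        return "RUNNING"  # placeholder

With such an interface, the workflow engine can dispatch tasks without knowing
which platform ultimately runs them.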
User applications can use cloud services exclusively, or use clouds together with
existing grid/cluster-based solutions.
The following figure depicts two such scenarios:
one where the Aneka platform is used in its entirety to complete the workflow, and
the other where Amazon EC2 is used to supplement a local cluster when there are
insufficient resources to meet the QoS requirements of the application.
Aneka is a PaaS cloud that can run on a corporate network or a dedicated
cluster, or be hosted entirely on an IaaS cloud.
When resources in the local network are limited, Aneka is capable of transparently
provisioning additional resources by acquiring them from third-party cloud services
such as Amazon EC2 to meet application demands.
This relieves the WfMS of the responsibility of managing and allocating
resources directly; it simply negotiates the required resources with Aneka.
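As a rough illustration of such a provisioning decision, the sketch below shows
the kind of capacity check that might precede acquiring extra cloud nodes. The
function name, threshold logic, and numbers are assumptions for illustration,
not Aneka's actual provisioning algorithm.

def nodes_to_provision(pending_tasks, local_nodes, tasks_per_node=1):
    """Return how many extra cloud nodes are needed to cover current demand."""
    shortfall = pending_tasks - local_nodes * tasks_per_node
    if shortfall <= 0:
        return 0  # local resources suffice; nothing to acquire
    return -(-shortfall // tasks_per_node)  # ceiling division

# 10 local nodes running 2 tasks each leave 5 tasks uncovered -> 3 extra nodes
print(nodes_to_provision(pending_tasks=25, local_nodes=10, tasks_per_node=2))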
Aneka also provides a set of Web services for service negotiation, job submission,
and job monitoring.
The WfMS coordinates the workflow execution by submitting jobs in the
right sequence to the Aneka Web services.
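The sketch below illustrates this coordination under stated assumptions: a
hypothetical client object stands in for the Web service calls (submission and
monitoring), and tasks are dispatched in topological order of their
dependencies. Only the ordering logic reflects the text; the client API is
invented.

from graphlib import TopologicalSorter  # standard library, Python 3.9+
import time

def run_workflow(tasks, client):
    """tasks maps each task id to the set of task ids it depends on."""
    for task_id in TopologicalSorter(tasks).static_order():
        job_id = client.submit_job(task_id)              # hypothetical call
        while client.job_status(job_id) != "COMPLETED":  # hypothetical call
            time.sleep(5)  # poll the monitoring service

# Example: B and C depend on A; D depends on both B and C.
# run_workflow({"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}, client)

A production scheduler would run independent branches concurrently; the
sequential loop here is kept deliberately simple.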
The typical flow of events when executing an application workflow on Aneka
begins with the WfMS staging in all required data for each job onto a remote
storage resource, such as Amazon S3 or an FTP server.
In this case, the data would take the form of a set of files, including the application
binaries.
These data can be uploaded by the user prior to execution, and they can be stored
in storage facilities offered by cloud services for future use.
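For example, a minimal stage-in to Amazon S3 might look like the following,
assuming the boto3 library and standard AWS credentials on the WfMS host; the
bucket name, key layout, and file list are placeholders.

import boto3  # assumes AWS credentials are configured on this machine

s3 = boto3.client("s3")
BUCKET = "workflow-staging-bucket"  # placeholder bucket name

# Upload the application binary and input files so that execution nodes
# can fetch them later.
for path in ["app-binary", "input-1.dat", "input-2.dat"]:
    s3.upload_file(path, BUCKET, "jobs/job-42/" + path)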
The WfMS then forwards workflow tasks to Aneka’s scheduler via the Web service
interface.
These tasks are subsequently examined for required files, and the storage service is
instructed to stage the files in from the remote storage server so that they are
accessible to the internal network of execution nodes.
Execution begins by scheduling tasks to available execution nodes (also known
as worker nodes). The workers download any files required by each task they
execute from the storage server, execute the application, and upload all output files
resulting from the execution back to the storage server.
These files are then staged out to the remote storage server so that they are
accessible by other tasks in the workflow managed by the WfMS. This process
continues until the workflow application is complete.
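A compact sketch of that worker-side cycle follows; the storage client and the
task fields (inputs, command, outputs) are illustrative assumptions standing in
for the storage server API.

import subprocess

def execute_task(task, storage):
    for name in task["inputs"]:
        storage.get(name)                        # download required files
    subprocess.run(task["command"], check=True)  # run the application binary
    for name in task["outputs"]:
        storage.put(name)                        # upload results for stage-out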
The second scenario describes a situation in which the WfMS has greater control
over the compute resources and provisioning policies for executing workflow
applications.
Based on user-specified QoS requirements, the WfMS schedules workflow tasks to
resources located in the local cluster and in the cloud. Typical parameters
that drive the scheduling decisions in such a scenario include deadline (time) and
budget (cost).
For instance, a policy for scheduling an application workflow at minimum
execution cost would utilize local resources first and then augment them with
cheaper cloud resources, if needed, rather than using high-end but more expensive
cloud resources.
A policy that schedules workflows to achieve minimum execution time would
always use high-end cluster and cloud resources, irrespective of cost.
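Both policies reduce to simple selection rules over a table of candidate
resources, as the sketch below shows. The prices and speeds are invented for
illustration and do not reflect any real provider's offerings.

resources = [
    {"name": "local-cluster", "cost_per_hour": 0.0, "hours_per_task": 2.0},
    {"name": "cloud-small",   "cost_per_hour": 0.1, "hours_per_task": 3.0},
    {"name": "cloud-large",   "cost_per_hour": 0.8, "hours_per_task": 0.5},
]

# Minimum-cost policy: pick the resource with the lowest cost per task.
by_cost = min(resources, key=lambda r: r["cost_per_hour"] * r["hours_per_task"])

# Minimum-time policy: pick the fastest resource, irrespective of cost.
by_time = min(resources, key=lambda r: r["hours_per_task"])

print(by_cost["name"], by_time["name"])  # -> local-cluster cloud-large

A real scheduler would weigh both objectives jointly against the user's
deadline and budget rather than optimizing one in isolation.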
The resource provisioning policy determines the extent of additional resources to be
provisioned on the public clouds. In this second scenario, the WfMS interacts
directly with the resources provisioned.
ARCHITECTURE OF WORKFLOW MANAGEMENT SYSTEMS
Scientific applications are typically modeled as workflows consisting of tasks, data
elements, control sequences, and data dependencies.
Workflow management systems are responsible for managing and executing these
workflows.
According to Raicu et al. [17], scientific workflow management systems are engaged
and applied to the following aspects of scientific computations:
1. describing complex scientific procedures (using GUI tools and workflow-specific
languages),
2. automating data derivation processes (data transfer components),
3. using high-performance computing (HPC) to improve throughput and
performance (distributed resources and their coordination), and
4. provenance management and query (persistence components).
The Cloudbus Workflow Management System [12] consists of components that are
responsible for handling tasks, data, and resources while taking into account users'
QoS requirements.
The architecture consists of three major parts:
(a) the user interface, (b) the core, and (c) plug-ins.
The user interface allows end users to work with workflow composition, workflow
execution planning, submission, and monitoring.
These features are delivered through a Web portal or through a stand-alone
application installed at the user's end.
Workflow composition is done using an XML-based Workflow Language (xWFL).
Users define task properties and link tasks based on their data dependencies.
Multiple tasks can be constructed using the copy-and-paste functions present in most GUIs.
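As a hedged illustration of what such a composition might look like, the
snippet below builds and inspects a small XML workflow description. The element
and attribute names are invented for illustration and are not the actual xWFL
schema.

import xml.etree.ElementTree as ET

XWFL_EXAMPLE = """
<workflow name="demo">
  <task id="A" executable="prepare.bin"/>
  <task id="B" executable="analyze.bin"/>
  <link from="A" to="B"/>
</workflow>
"""

root = ET.fromstring(XWFL_EXAMPLE)
tasks = [t.get("id") for t in root.findall("task")]
deps = [(link.get("from"), link.get("to")) for link in root.findall("link")]
print(tasks, deps)  # -> ['A', 'B'] [('A', 'B')]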