Stojnic, Nenad. Self-organizing distributed workflow management. 2015, Doctoral Thesis, University of Basel, Faculty of Science.
|
PDF
9Mb |
Official URL: http://edoc.unibas.ch/diss/DissB_11311
Downloads: Statistics Overview
Abstract
The proliferation of service-oriented architectures in the last decade has brought forward an important class of sophisticated distributed applications that are founded on
the idea of composing multiple simple services into a complex, coherent whole. Such applications spanning multiple service invocations can be most effectively realized by means of workflows. When it comes to high performance workflow execution, distribution (outscaling) of services is a key concept and also a very straightforward advantage of the workflow paradigm. Concretely, both the constituent services of the workflow and the system that manages their invocations have to be distributed across an environment of computational devices. In a wide spectrum of applications, that entail heterogeneity of the encompassed computational devices, e.g., modern emergency management, invocations of optimal service instances in conjunction to their reliability are
fundamental prerequisites of distributed workflow management.
At the center of this thesis is a formal model that defines the distributed (i.e., scalable) execution of workflows. To extend this model for reliability in a novel way, which
does not affect the scalability of execution, the Safety-Ring system service is presented.
The idea behind Safety-Ring is to offer recovery for a wide range of node failures, which
host active services of running workflows. To this end, the Safety-Ring provides a scalable, reliable, and consistent data store that is used for the storage of workflow execution state. The novel failure-recovery mechanism features effective reliability such that can
be applied on the nodes that host the Safety-Ring service themselves, thus we say the Safety-Ring is self-healing.
To apply the reliable (and distributed) execution model, enhanced by Safety-Ring, to heterogeneous node environments, that are predominantly composed of mobile de-
vices, this thesis presents the Compass data access protocol. In providing scalable data lookup for its maintained data, the Safety-Ring assumes network runtime characteristics which are rather stable, and thus Safety-Ring implicitly optimizes for the number of
queried nodes. Especially in mobile applications, where node network connectivity dynamically changes, data access protocols should aim at reducing the overall data lookup
latency, rather than the number of queried nodes. Compass introduces latency optimal paths to each node, which dynamically adapt to changing network characteristics. The
scalable data lookup of Safety-Ring is not affected.
In case distributed execution of workflows spans services of continuous (stateful) type, their reliability is decoupled from the Safety-Ring. Since such service types are
predominantly featured by devices of limited resources, novel approaches to resource conservative recovery of failures for continuous services have to be provided. This thesis builds on proven recovery techniques, such as passive-standby, so as to enhance them for redundancy of the continuous state and thus improve the overall reliability of the system. In doing so the redundancy of state is enforced by means of a lightweight, in terms of network overhead, consistency protocol which allows for its application in resource limited node environments.
In order to improve the execution performance of distributed workflows, in terms of throughput, this thesis offers a novel concept to services distribution. At the heart of our approach lie decentralized controllers that autonomously perform dynamic reconfiguration of the execution environment in terms of available services. This primarily affects the Safety-Ring service and all application services. Thereby, the goals of the controllers
are to prevent workflow execution bottlenecks and unnecessary service deployments that waste resources. Since the controllers are equipped at any node in the system and
can affect any other node of the system we say that the distributed workflow execution model is self-optimizing.
Finally, all the presented concepts are implemented within the context of the OSIRIS distributed workflow engine and quantitatively evaluated in a series of experiments.
The results of experiments confirm the benefits of our concepts for the distributed workflow execution model.
the idea of composing multiple simple services into a complex, coherent whole. Such applications spanning multiple service invocations can be most effectively realized by means of workflows. When it comes to high performance workflow execution, distribution (outscaling) of services is a key concept and also a very straightforward advantage of the workflow paradigm. Concretely, both the constituent services of the workflow and the system that manages their invocations have to be distributed across an environment of computational devices. In a wide spectrum of applications, that entail heterogeneity of the encompassed computational devices, e.g., modern emergency management, invocations of optimal service instances in conjunction to their reliability are
fundamental prerequisites of distributed workflow management.
At the center of this thesis is a formal model that defines the distributed (i.e., scalable) execution of workflows. To extend this model for reliability in a novel way, which
does not affect the scalability of execution, the Safety-Ring system service is presented.
The idea behind Safety-Ring is to offer recovery for a wide range of node failures, which
host active services of running workflows. To this end, the Safety-Ring provides a scalable, reliable, and consistent data store that is used for the storage of workflow execution state. The novel failure-recovery mechanism features effective reliability such that can
be applied on the nodes that host the Safety-Ring service themselves, thus we say the Safety-Ring is self-healing.
To apply the reliable (and distributed) execution model, enhanced by Safety-Ring, to heterogeneous node environments, that are predominantly composed of mobile de-
vices, this thesis presents the Compass data access protocol. In providing scalable data lookup for its maintained data, the Safety-Ring assumes network runtime characteristics which are rather stable, and thus Safety-Ring implicitly optimizes for the number of
queried nodes. Especially in mobile applications, where node network connectivity dynamically changes, data access protocols should aim at reducing the overall data lookup
latency, rather than the number of queried nodes. Compass introduces latency optimal paths to each node, which dynamically adapt to changing network characteristics. The
scalable data lookup of Safety-Ring is not affected.
In case distributed execution of workflows spans services of continuous (stateful) type, their reliability is decoupled from the Safety-Ring. Since such service types are
predominantly featured by devices of limited resources, novel approaches to resource conservative recovery of failures for continuous services have to be provided. This thesis builds on proven recovery techniques, such as passive-standby, so as to enhance them for redundancy of the continuous state and thus improve the overall reliability of the system. In doing so the redundancy of state is enforced by means of a lightweight, in terms of network overhead, consistency protocol which allows for its application in resource limited node environments.
In order to improve the execution performance of distributed workflows, in terms of throughput, this thesis offers a novel concept to services distribution. At the heart of our approach lie decentralized controllers that autonomously perform dynamic reconfiguration of the execution environment in terms of available services. This primarily affects the Safety-Ring service and all application services. Thereby, the goals of the controllers
are to prevent workflow execution bottlenecks and unnecessary service deployments that waste resources. Since the controllers are equipped at any node in the system and
can affect any other node of the system we say that the distributed workflow execution model is self-optimizing.
Finally, all the presented concepts are implemented within the context of the OSIRIS distributed workflow engine and quantitatively evaluated in a series of experiments.
The results of experiments confirm the benefits of our concepts for the distributed workflow execution model.
Advisors: | Schuldt, Heiko |
---|---|
Committee Members: | Pautasso, Cesare |
Faculties and Departments: | 05 Faculty of Science > Departement Mathematik und Informatik > Informatik > Databases and Information Systems (Schuldt) |
UniBasel Contributors: | Stojnic, Nenad and Schuldt, Heiko |
Item Type: | Thesis |
Thesis Subtype: | Doctoral Thesis |
Thesis no: | 11311 |
Thesis status: | Complete |
Number of Pages: | 247 S. |
Language: | English |
Identification Number: |
|
edoc DOI: | |
Last Modified: | 02 Aug 2021 15:11 |
Deposited On: | 13 Aug 2015 08:14 |
Repository Staff Only: item control page