Robust handling of worker and subworker crashes #39

Currently, a subworker crash can bring down its worker, and a worker crash can bring down the server. We need to fix this. However, we are *not* aiming for infrastructure resiliency at this stage: a subworker crash may still fail its task (and therefore its session), and a worker crash may still lose all of its objects and fail every session involved. The main goal is to keep the server running and deliver a graceful error.
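
The containment policy described above can be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the project's actual implementation: a subworker is modeled as a separate Python process started with `subprocess.run`, and `TaskResult`, `run_in_subworker`, `Server`, `on_task_failed` and `on_worker_lost` are illustrative names only. It shows a subworker crash turning into a failed task (and session) rather than a worker crash, and a lost worker failing only the sessions that used it while the server object keeps running.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class TaskResult:
    task_id: int
    ok: bool
    error: Optional[str] = None


def run_in_subworker(task_id: int, code: str) -> TaskResult:
    """Run `code` in a separate Python process (hypothetical stand-in for a subworker).

    A crash of the child process is converted into a failed TaskResult
    instead of propagating to the calling worker.
    """
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True)
    if proc.returncode == 0:
        return TaskResult(task_id, ok=True)
    # On POSIX, a negative return code means the child was killed by a signal.
    if proc.returncode < 0:
        reason = f"subworker killed by signal {-proc.returncode}"
    else:
        reason = f"subworker exited with code {proc.returncode}"
    return TaskResult(task_id, ok=False, error=reason)


@dataclass
class Server:
    # Hypothetical bookkeeping: session id -> ids of workers used by that session.
    session_workers: Dict[int, Set[str]] = field(default_factory=dict)
    failed_sessions: Dict[int, str] = field(default_factory=dict)

    def on_task_failed(self, session_id: int, result: TaskResult) -> None:
        # A failed task fails its whole session; the server itself keeps running.
        self.failed_sessions[session_id] = f"task {result.task_id}: {result.error}"

    def on_worker_lost(self, worker_id: str) -> None:
        # Objects on the lost worker are gone, so every session that used it
        # is failed with a descriptive error (an earlier error is kept).
        for session_id, workers in self.session_workers.items():
            if worker_id in workers:
                workers.discard(worker_id)
                self.failed_sessions.setdefault(
                    session_id, f"worker {worker_id} crashed")


# Usage: one healthy task, one crashing subworker, one lost worker.
server = Server(session_workers={1: {"w1"}, 2: {"w2"}})
ok = run_in_subworker(10, "print('hi')")
bad = run_in_subworker(11, "import os; os._exit(3)")
server.on_task_failed(1, bad)
server.on_worker_lost("w2")
print(ok.ok, bad.error)        # True subworker exited with code 3
print(server.failed_sessions)
```

Running each subworker in its own process is what makes the crash observable as an exit status rather than a crash of the worker itself; retrying a failed task on another worker would be a natural extension of `on_task_failed`.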

Robust failure handling will also pave the way for retrying tasks (possibly on different workers) and, later, for resiliency to worker crashes.
