Semi-transient and cloud workers

## Prework

* [x] Read and agree to the [code of conduct](https://ropensci.org/code-of-conduct/) and [contributing guidelines](https://github.com/ropensci/targets/blob/main/CONTRIBUTING.md).
* [x] If there is [already a relevant issue](https://github.com/ropensci/targets/issues), whether open or closed, comment on the existing thread instead of posting a new issue.
* [x] New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.
* [x] Format your code according to the [tidyverse style guide](https://style.tidyverse.org/).

## Proposal

Since its inception, `targets` has only ever supported fully transient or fully persistent workers. This has mostly worked until now because initialization, monitoring, and idle time do not have terrible consequences on traditional clusters. But the cloud will be totally different. Requesting jobs, initializing Docker images, and communicating with AWS/GCP will all take a lot more time, which means fully transient workers will be inefficient (see https://github.com/HenrikBengtsson/future/discussions/567). At the other extreme, persistent workers on a nontrivial DAG would spend a wasteful amount of time idling, driving up monetary costs.

There are fancy hybrid approaches such as [snakemake job grouping](https://snakemake.readthedocs.io/en/stable/executing/grouping.html), but they require a lot of manual work from the user, and [dynamic branching](https://books.ropensci.org/targets/dynamic.html) makes it hard to ensure proper load balancing in advance.

What I have in mind is similar to what I proposed in https://github.com/mschubert/clustermq/discussions/257#discussioncomment-567950:

> 1. Start by submitting an array job of a certain user-specified size, not necessarily the maximum size.
> 2. If more work is requested with $send_call() and the number of currently busy workers is less than the user-specified maximum, then initialize a new worker for the new job.
> 3. If a worker idles for long enough (i.e. receives nothing or only $send_wait() for some length of time) then shut down that worker.

(And of course, the implicit step 4 is to shut down idle workers when there is no more work to submit.)
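The steps above can be sketched as a small in-memory simulation. This is only an illustration of the scaling logic, written in Python to stay language-agnostic: `SemiTransientPool`, `Worker`, and every method name are hypothetical, nothing here is `clustermq` or `targets` API, and no real jobs are launched. One assumption beyond the quoted steps is that an idle worker is reused before a new one is started.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Worker:
    busy: bool = False
    idle_since: float = 0.0  # time at which the worker last became idle


@dataclass
class SemiTransientPool:
    initial_size: int    # step 1: size of the initial batch of workers
    max_workers: int     # step 2: user-specified cap on workers
    idle_timeout: float  # step 3: idle time tolerated before shutdown
    workers: List[Worker] = field(default_factory=list)

    def start(self, now: float) -> None:
        """Step 1: launch the initial batch, which may be below the cap."""
        self.workers = [Worker(idle_since=now) for _ in range(self.initial_size)]

    def submit(self, now: float) -> Optional[Worker]:
        """Step 2: hand a task to a worker, or return None if at the cap."""
        for worker in self.workers:
            if not worker.busy:  # assumption: reuse an idle worker first
                worker.busy = True
                return worker
        if len(self.workers) < self.max_workers:
            worker = Worker(busy=True, idle_since=now)
            self.workers.append(worker)
            return worker
        return None  # every worker is busy and the cap is reached

    def finish(self, worker: Worker, now: float) -> None:
        """Mark a worker idle and restart its idle clock."""
        worker.busy = False
        worker.idle_since = now

    def reap(self, now: float) -> None:
        """Step 3: shut down workers that have idled for too long."""
        self.workers = [
            w for w in self.workers
            if w.busy or now - w.idle_since < self.idle_timeout
        ]

    def shutdown_idle(self) -> None:
        """Step 4: no more work to submit, so drop every idle worker."""
        self.workers = [w for w in self.workers if w.busy]
```

Setting `idle_timeout=float("inf")` means `reap()` never removes anyone, so workers stay up until `shutdown_idle()` is called once the work runs out.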

This is what @mattwarkentin called the "heuristic" approach in https://github.com/mschubert/clustermq/discussions/257#discussioncomment-585417 (which trivially reduces to the "deterministic" approach if the idle timeout is infinite). The same feature is under discussion for `clustermq` in that thread.

To implement this, I picture a separate package to manage a dynamic collection of semi-transient workers. The look and feel should be like a `clustermq::workers()` object: it should support message-passing OOP (`R6`-like), and it should spend as little time as possible blocking the main process. However, it should sit a layer higher in the stack than `clustermq` or `future`. A subclass of a `workers` class may rely on `clustermq` or `future` in the backend, or maybe something entirely custom. This freedom should open up workarounds to increase efficiency: for some subclasses, maybe transient local background processes could launch and/or poll workers.
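A rough sketch of that layering, again in Python and purely illustrative: a generic base class owns the scaling rules and subclasses supply the backend. `Workers`, `LocalProcessWorkers`, and their methods are hypothetical names. The local-process subclass stands in for an "entirely custom" backend and only starts short-lived Python subprocesses. Both launching and polling are non-blocking, so the main process does not wait on workers.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class Workers(ABC):
    """Backend-agnostic layer: knows the cap, not how workers are run."""

    def __init__(self, max_workers: int):
        self.max_workers = max_workers
        self.handles = []  # opaque per-worker handles owned by the backend

    @abstractmethod
    def _launch_one(self):
        """Start one worker without blocking and return its handle."""

    @abstractmethod
    def _is_alive(self, handle) -> bool:
        """Report, without blocking, whether a worker is still running."""

    def scale_up(self, n: int) -> int:
        """Launch up to n workers, never exceeding max_workers."""
        room = max(self.max_workers - len(self.handles), 0)
        launched = min(n, room)
        for _ in range(launched):
            self.handles.append(self._launch_one())
        return launched

    def prune(self) -> None:
        """Forget workers that have already exited."""
        self.handles = [h for h in self.handles if self._is_alive(h)]


class LocalProcessWorkers(Workers):
    """Hypothetical backend: each worker is a local background process."""

    def __init__(self, max_workers: int, seconds: float = 0.2):
        super().__init__(max_workers)
        self.seconds = seconds  # how long each stand-in worker lives

    def _launch_one(self):
        # Popen returns immediately; the child runs in the background.
        code = f"import time; time.sleep({self.seconds})"
        return subprocess.Popen([sys.executable, "-c", code])

    def _is_alive(self, handle) -> bool:
        # poll() returns None while the process is still running.
        return handle.poll() is None
```

A `clustermq`- or `future`-backed subclass would implement the same two hooks differently, while `scale_up()` and `prune()` stay shared.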

I would like to overhaul the [`workers`](https://github.com/wlandau/workers) package for this.

Semi-transient and cloud workers #753
