Hi Will,
I haven't thought this through well enough to really assess its feasibility, but I wanted to scribble my thoughts down and get your input. Not sure if we need to loop @mschubert in or not.
As you know, for one of my current projects I am using the "mostly local, sometimes remote" approach: my project lives on my local machine, but some computationally intensive tasks are selectively sent to the HPC via SSH thanks to `clustermq`. This works great.
However, when using `options(clustermq.scheduler = "ssh")`, you have only two options: run jobs locally and sequentially in the "main" process, or send them via `ssh`. The majority of the tasks run in the "main" R process and are forced to run sequentially, all for the ability to send a few select jobs to the HPC.
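
For concreteness, a minimal sketch of my current setup, assuming `targets` + `clustermq`; the SSH host and the `quick_step()`/`heavy_step()` functions are placeholders:

```r
# _targets.R -- minimal sketch; host and step functions are placeholders
library(targets)

options(
  clustermq.scheduler = "ssh",
  clustermq.ssh.host  = "me@hpc.example.org"  # placeholder host
)

list(
  # Trivial targets: forced to run sequentially in the local "main" process
  tar_target(small_result, quick_step(), deployment = "main"),
  # Heavy targets: shipped to the HPC over SSH by clustermq
  tar_target(big_result, heavy_step(), deployment = "worker")
)
```

Running `tar_make_clustermq(workers = 2)` then sends only the `deployment = "worker"` targets through `clustermq`; everything else runs one at a time in the calling R session.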
So, long story short, I am wondering if it would somehow be possible to use `"multiprocess"` for targets with `deployment = "main"` and `"ssh"` for targets with `deployment = "worker"`. I know this is a convoluted use case, but I am actually constrained to using this workflow for this particular project and was just wondering if something like that could possibly work.
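
Purely as an illustration of the idea (none of this exists today; `clustermq.scheduler.main` is a name I just made up):

```r
# Hypothetical, NOT a real clustermq or targets option -- just to show the idea:
options(
  clustermq.scheduler      = "ssh",          # real option: backend for deployment = "worker"
  clustermq.scheduler.main = "multiprocess"  # invented: backend for deployment = "main"
)
```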
Reasons why I don't just run everything via `ssh`:
- Some of the tasks are trivial and quick, and the overhead of sending them to the HPC over sockets is unnecessary
- Some of the targets rely on the local NFS for access to files which cannot be moved to the cluster or cloud
Reasons why I don't just run everything locally:
- Memory constraints on my local machine
- There are fewer computationally intensive tasks, but they sometimes take days to run