Unevaluated promise inflates data sent to workers #279

@wlandau

Prework

  • Read and agree to the code of conduct and contributing guidelines.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • Post a minimal reproducible example like this one so the maintainer can troubleshoot the problems you identify. A reproducible example is:
    • Runnable: post enough R code and data so any onlooker can create the error on their own computer.
    • Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
    • Readable: format your code according to the tidyverse style guide.

Description

In this example on an SGE cluster, the targets are very slow to deploy to the workers.

# _targets.R
library(targets)
library(tarchetypes)
options(clustermq.scheduler = "sge")
options(clustermq.template = "cmq.tmpl")
options(crayon.enabled = FALSE)
tar_rep(x, 0, batches = 1000, reps = 1)

# cmq.tmpl
#$ -N {{ job_name }}
#$ -t 1-{{ n_jobs }}
#$ -j y
#$ -cwd
#$ -V
module load R/4.0.3
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'

The profiling study took several minutes.

px <- proffer::pprof(tar_make_clustermq(workers = 10, callr_function = NULL), host = "0.0.0.0")

I saw this flamegraph:

[Screenshot: flamegraph, 2021-01-21 12:39:01 PM]

Which tells me exactly where the bottleneck is:

self$crew$send_call(
  expr = target_run_worker(target),
  env = list(target = target)
)

And sure enough, when I changed only the retrieval option to "worker", everything went much faster.

# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(retrieval = "worker") # worker retrieval
options(clustermq.scheduler = "sge")
options(clustermq.template = "cmq.tmpl")
options(crayon.enabled = FALSE)
tar_rep(x, 0, batches = 1000, reps = 1)

[Screenshot, 2021-01-21 12:44:37 PM]

Solution

Apparently, all I need to do is force() all the pre-loaded objects in the subpipeline. It looks like an unevaluated promise object makes the target far larger than it should be, which inflates the data sent to each worker. Here I am debugging this example at $run_worker():

> tar_make_clustermq(callr_function = NULL)
● run target x_batch
● run branch x_be02823c
Called from: self$run_worker(target)
Browse[1]> object_size(target) # promise object
9.32 MB # way too big.
Browse[1]> tmp <- force(target$subpipeline$targets$x_batch_084b9b29$value$object)
Browse[1]> object_size(target) # evaluated object
193 kB # much better
Browse[1]> 
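
The size drop above can be reproduced outside of targets. The following is a minimal sketch in base R, not the package's actual code: `make_holder()` and the `object` binding are hypothetical names chosen to mirror the transcript, and serialized length stands in for `object_size()`. It assumes that an unevaluated promise keeps a reference to its evaluation environment, so everything in that environment is serialized along with it until the promise is forced.

```r
# Hypothetical helper (not part of targets): builds an environment whose
# `object` binding is a lazy promise created with delayedAssign().
make_holder <- function() {
  big <- numeric(1e6) # about 8 MB of doubles living in this function's frame
  holder <- new.env(parent = emptyenv())
  # The promise remembers this function's frame as its evaluation
  # environment, so `big` stays reachable from `holder` until it is forced.
  delayedAssign(
    "object",
    length(big),
    eval.env = environment(),
    assign.env = holder
  )
  holder
}

holder <- make_holder()

# Serialized size in bytes, a rough proxy for what would travel to a worker.
size_before <- length(serialize(holder, NULL))

# Accessing the binding forces the promise. R stores the value and drops the
# promise's reference to its evaluation environment.
value <- holder$object

size_after <- length(serialize(holder, NULL))

print(c(before = size_before, after = size_after))
```

If the assumption holds, `size_before` should be on the order of megabytes and `size_after` only a few hundred bytes, which matches the pattern in the debugging session: forcing the value before shipping the target keeps the payload small.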
