Description
Prework
- Read and agree to the code of conduct and contributing guidelines.
- If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
- Post a minimal reproducible example like this one so the maintainer can troubleshoot the problems you identify. A reproducible example is:
  - Runnable: post enough R code and data so any onlooker can create the error on their own computer.
  - Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
  - Readable: format your code according to the tidyverse style guide.
Description
In this example on an SGE cluster, the targets deploy really slowly.
# _targets.R
library(targets)
library(tarchetypes)
options(clustermq.scheduler = "sge")
options(clustermq.template = "cmq.tmpl")
options(crayon.enabled = FALSE)
tar_rep(x, 0, batches = 1000, reps = 1)
# cmq.tmpl
#$ -N {{ job_name }}
#$ -t 1-{{ n_jobs }}
#$ -j y
#$ -cwd
#$ -V
module load R/4.0.3
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
The profiling study took several minutes.
px <- proffer::pprof(tar_make_clustermq(workers = 10, callr_function = NULL), host = "0.0.0.0")
I saw this flamegraph:
[Figure: flamegraph from the profiling run]
It tells me exactly where the bottleneck is:
Lines 98 to 101 in 66a0655
self$crew$send_call(
  expr = target_run_worker(target),
  env = list(target = target)
)
And sure enough, when I changed only the retrieval setting to "worker", everything went much faster.
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(retrieval = "worker") # worker retrieval
options(clustermq.scheduler = "sge")
options(clustermq.template = "cmq.tmpl")
options(crayon.enabled = FALSE)
tar_rep(x, 0, batches = 1000, reps = 1)
Solution
Apparently, all I need to do is force() all the pre-loaded objects in the subpipeline. It looks like there is an unevaluated promise object that is consuming too much memory. Here I am debugging this example at $run_worker():
> tar_make_clustermq(callr_function = NULL)
● run target x_batch
● run branch x_be02823c
Called from: self$run_worker(target)
Browse[1]> object_size(target) # promise object
9.32 MB # way too big.
Browse[1]> tmp <- force(target$subpipeline$targets$x_batch_084b9b29$value$object)
Browse[1]> object_size(target) # evaluated object
193 kB # much better
Browse[1]>
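The lazy-evaluation mechanism behind this fix can be sketched in base R. This is only an illustration, not the code in targets: `make_getter_lazy()`, `make_getter_forced()`, `expensive()`, and `eval_log` are hypothetical names made up for the example. It shows that a function argument stays an unevaluated promise (which keeps a reference to its expression and the environment it came from) until something touches it, and that `force()` evaluates it immediately. It does not reproduce the size measurements above.

```r
# Hypothetical helpers for illustration; only base R is used.

make_getter_lazy <- function(value) {
  # `value` is still an unevaluated promise when the closure is returned.
  function() value
}

make_getter_forced <- function(value) {
  force(value) # evaluate the promise now, before the closure is returned
  function() value
}

eval_log <- character(0)
expensive <- function(tag) {
  eval_log <<- c(eval_log, tag) # record when evaluation actually happens
  42
}

lazy <- make_getter_lazy(expensive("lazy"))       # nothing evaluated yet
forced <- make_getter_forced(expensive("forced")) # evaluated right here
log_after_creation <- eval_log

lazy_value <- lazy()     # the promise is evaluated only at this point
forced_value <- forced() # already evaluated, so no further work is done
log_after_calls <- eval_log

print(log_after_creation)
print(log_after_calls)
```

Until `lazy()` is called, the closure holds the pending promise rather than the plain value, which is the kind of object the `force()` call in the debugging session replaces with an evaluated one.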