Conversation

be-marc
Member

@be-marc be-marc commented May 22, 2025

No description provided.

@be-marc be-marc requested a review from Copilot May 22, 2025 10:56
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for running experiments in parallel using the mirai package and exposes a new "mirai" option in the learner encapsulation workflow.

  • Introduces a mirai branch in future_map to dispatch work via mirai_map/collect_mirai.
  • Updates Learner$encapsulate() to accept "mirai" as a method choice.
  • Adds mirai to the DESCRIPTION Suggests field, configures a GitHub Remotes entry, and updates .Rbuildignore.
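The dispatch logic summarized above can be sketched as follows. This is a hedged reconstruction assembled from the snippets quoted later in this review, not the exact PR code; the final branch is a sequential placeholder standing in for the real future-based dispatch:

```r
# Sketch of the future_map dispatch described above (assumptions noted inline)
future_map = function(n, FUN, ..., MoreArgs = list()) {
  if (getOption("mlr3.debug", FALSE)) {
    # debug mode: run sequentially in the current session
    mapply(FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE, USE.NAMES = FALSE)
  } else if (requireNamespace("mirai", quietly = TRUE) && mirai::status()$connections) {
    # mirai daemons are connected: dispatch via mirai_map and block for the results
    mirai::collect_mirai(mirai::mirai_map(data.table::data.table(...), FUN,
      .args = c(MoreArgs, list(is_sequential = FALSE))))
  } else {
    # placeholder: the real implementation dispatches via the future package here
    mapply(FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE, USE.NAMES = FALSE)
  }
}
```

With no daemons running, the `mirai::status()$connections` check is falsy and the fallback branch is taken.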

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

  • R/helper_exec.R: added a mirai branch in future_map using mirai_map/collect_mirai.
  • R/Learner.R: extended encapsulate() to accept the "mirai" method.
  • DESCRIPTION: added mirai to Suggests and configured a Remotes entry.
  • .Rbuildignore: added ^attic$ to the ignore list.
Comments suppressed due to low confidence (5)

R/helper_exec.R:26

  • There's no test for the new mirai branch in future_map. Add a unit test that simulates an available mirai environment and verifies that mirai_map/collect_mirai is invoked.
} else if (requireNamespace("mirai", quietly = TRUE) && mirai::status()$connections) {

R/Learner.R:556

  • The roxygen block for encapsulate() wasn't updated to mention the new mirai method. Include a description of what mirai does and any additional arguments it requires.
#' @return `self` (invisibly).

R/helper_exec.R:28

  • The code calls workhorse inside mirai_map, but FUN is the user-supplied function parameter. Replace workhorse with FUN (or clarify where workhorse is defined) to ensure the intended function is executed.
mirai::collect_mirai(mirai::mirai_map(data.table(...), workhorse, .args = c(MoreArgs, list(is_sequential = FALSE))))

R/Learner.R:557

  • You’ve added "mirai" as a valid choice, but there is no corresponding branch handling method == "mirai" in encapsulate(). Implement the execution logic for the new method or remove it until ready.
assert_choice(method, c("none", "try", "evaluate", "callr", "mirai"))

DESCRIPTION:78

  • The Remotes entry points to mlr-org/mlr3misc@mirai, which appears unrelated to the mirai package. Update this to reference the actual mirai repository (or remove if mirai is on CRAN).
    mlr-org/mlr3misc@mirai

R/helper_exec.R Outdated
if (getOption("mlr3.debug", FALSE)) {
lg$info("Running experiments sequentially in debug mode with %i iterations", n)
mapply(FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE, USE.NAMES = FALSE)
} else if (requireNamespace("mirai", quietly = TRUE) && mirai::status()$connections) {
Collaborator

So this will automatically prefer mirai over future? Maybe there should be an option for this.

Member Author

@be-marc be-marc May 22, 2025

Only if the user also started a daemon with mirai::daemons()
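That is, the mirai branch is only reached after the user has started daemons themselves; an illustrative usage fragment (not code from the PR):

```r
library(mirai)

daemons(4)  # start four background daemons; mirai::status()$connections is now > 0,
            # so the mirai branch in future_map is taken from here on
# ... call resample() / benchmark() here ...
daemons(0)  # shut the daemons down; execution falls back to the future backend
```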

R/helper_exec.R Outdated
if (getOption("mlr3.debug", FALSE)) {
lg$info("Running experiments sequentially in debug mode with %i iterations", n)
mapply(FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE, USE.NAMES = FALSE)
} else if (requireNamespace("mirai", quietly = TRUE) && mirai::status()$connections) {
Member

This only works with the default compute profile

Member Author

Yes, I didn't want to make it too complicated. What would be the best way to pass the compute profile? We also have to pass it to mirai_map().

Member

I suggest we expose a .compute argument to benchmark() and resample().

Member Author

I'm not so happy with that because we are introducing parameters for a special backend. Moreover, this argument would then have to be added to all tuning and feature selection functions.


I wouldn't worry about this. Inside a with(daemons(...), { }) scope, mirai() calls with .compute = NULL already default to those settings. More to come on this front.
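For illustration, the scoped form mentioned above (mirai ships a with() method for daemons; a hedged usage sketch, the squaring task is made up):

```r
library(mirai)

# The daemons last only for the duration of the expression; mirai_map()
# inside the scope defaults to them via .compute = NULL.
with(daemons(2), {
  mirai_map(1:4, function(i) i^2)[]
})
```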

@sebffischer
Member

sebffischer commented May 22, 2025

Most importantly: we need to ensure that using mirai_map is reproducible and gets proper RNG streams.

One way to achieve this is to use L'Ecuyer-CMRG and generate seeds using:

RNGkind("L'Ecuyer-CMRG")
set.seed(1)
s <- .Random.seed
for (i in 1:10) {
  s <- parallel::nextRNGStream(s)  # advance to the next independent stream
  # send s to worker i as .Random.seed
}

This is taken from the parallel documentation.

@be-marc
Member Author

be-marc commented May 22, 2025

> we need to ensure that using mirai_map is reproducible and gets proper RNG streams.

Yes, for now I pointed out in the documentation that it does not deliver the same results as future.

> One way to achieve this is to use L'Ecuyer-CMRG and generate seeds using

Yes, I did the same in rush. Not sure why Charlie is doing it differently.

@sebffischer
Member

sebffischer commented May 23, 2025

> we need to ensure that using mirai_map is reproducible and gets proper RNG streams.
>
> Yes, for now I pointed out in the documentation that it does not deliver the same results as future.
>
> One way to achieve this is to use L'Ecuyer-CMRG and generate seeds using
>
> Yes, I did the same in rush. Not sure why Charlie is doing it differently.

Actually, Charlie is also doing it. But every mirai worker instantiates the L'Ecuyer seed once at startup, so set.seed() then has no effect on the parallel workers.

But this causes e.g. the following behavior:

library(mirai)

set.seed(1)
daemons(2)
#> [1] 2
x1 = mirai_map(1, function(i) rnorm(1))[][[1]]
set.seed(1)
x2 = mirai_map(1, function(i) rnorm(1))[][[1]]
x1 == x2
#> [1] FALSE

Created on 2025-05-23 with reprex v2.1.1

I think it would be more user-friendly for us to also set the seeds again before starting the parallel work.
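A hedged sketch of that suggestion (an assumed approach, not code from the PR): derive one L'Ecuyer substream per task in the user's session, so that set.seed() there pins down the parallel results regardless of when the daemons were started:

```r
# Derive one independent RNG substream per task from the session's seed
RNGkind("L'Ecuyer-CMRG")
set.seed(1)

n_tasks = 4
seeds = vector("list", n_tasks)
s = .Random.seed
for (i in seq_len(n_tasks)) {
  s = parallel::nextRNGStream(s)  # advance to the next independent stream
  seeds[[i]] = s
}
# each task i would then restore its stream before running, e.g.
#   assign(".Random.seed", seeds[[i]], envir = globalenv())
```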

@sebffischer
Member

sebffischer commented May 23, 2025

Another idea for optimization:

If we had more control over the daemons, e.g. via a special compute profile .mlr3, we could also globally load some packages on the workers (e.g. mlr3 itself).
Not sure whether this results in a noticeable runtime improvement, but we could explore it.

We could offer both options:
a) The user uses an existing compute profile.
b) We spawn the daemons ourselves, which is where we can add such optimizations.

@be-marc
Member Author

be-marc commented May 23, 2025

> Actually, Charlie is also doing it. But every mirai worker instantiates the L'Ecuyer seed once at startup, so set.seed() then has no effect on the parallel workers.

Does this mean that the substreams are not attached to the tasks but to the daemons? It somehow sounds like that in https://mirai.r-lib.org/reference/daemons.html#arg-seed. Otherwise, the order in which tasks are sent would have no influence.

@be-marc
Member Author

be-marc commented May 23, 2025

@sebffischer
Member

> Actually, Charlie is also doing it. But every mirai worker instantiates the L'Ecuyer seed once at startup, so set.seed() then has no effect on the parallel workers.
>
> Does this mean that the substreams are not attached to the tasks but to the daemons? It somehow sounds like that in https://mirai.r-lib.org/reference/daemons.html#arg-seed. Otherwise, the order in which tasks are sent would have no influence.

Yes, I think so

@be-marc
Member Author

be-marc commented Jul 21, 2025

The results are now reproducible with mirai 2.4.1 if a seed is passed to mirai::daemons(). However, the results are not reproducible across parallel backends.

#' To select a parallel backend, use [future::plan()].
#' To use `mirai`, call `mirai::daemons()` before calling this function.
#' The `future` package guarantees reproducible results independent of the parallel backend.
#' The results of `mirai` will not be the same but can be made reproducible by setting a `seed` and `dispatcher = FALSE` when calling `mirai::daemons()`.
Member

this is outdated

@be-marc be-marc merged commit 21d84d5 into main Aug 14, 2025
6 of 7 checks passed
@be-marc be-marc deleted the mirai branch August 14, 2025 07:14