-
Notifications
You must be signed in to change notification settings - Fork 11
Description
Related discussions in #83, #3, #78
After discussing #83 and seeing that it might be a little more work than expected for users to update to progressr from my hacky .progress = TRUE argument in furrr, I am reconsidering keeping the .progress argument, but having it use progressr internally. It would be focused on signaling the simplest progress update possible, and would not wrap with_progress() as #78 does. That would still be left to the user.
I am curious about your thoughts on this.
It would work like this:
library(dplyr)
library(tidyr)
library(furrr)
library(progressr)
df <- nest(mtcars, data = -gear)
# Without dplyr
with_progress({
future_map(df$data, nrow, .progress = TRUE)
})
# In a pipeline
df %>%
mutate(x = with_progress(future_map(data, nrow, .progress = TRUE)))
# The old way, for comparison
df %>%
mutate(x = future_map(data, nrow, .progress = TRUE))I am also considering letting .progress take FALSE/TRUE and some other type of argument that would control how the internal progressor is created, just to give it a little more flexibility, but I'm not quite sure yet. I might start simpler and do that if it is requested.
Pros of this:
-
I think it is a little bit easier to use vs creating the
p <- progressor()object and ticking it manually withp()inside the.ffunction. As we've seen in Adding a progress bar by default in a package #78, it seems to be a little tricky to get it exactly right, especially with dplyr. -
The default of
.progress = FALSEwill have zero overhead -
This approach still allows users to create a progressor object themselves and tick it manually, as displayed in Adding a progress bar by default in a package #78.
-
Perhaps this is a longer horizon "pro", but with a global calling handler we could completely remove the need for
with_progress()and it would look exactly like the original furrr implementation.
Cons of this:
-
It requires managing a
.progressoption in furrr, but I think that is okay. -
It is a little strange to have
.progress = TRUEwhich signals progress updates, but then you also have to wrap it inwith_progress()to see the updates. It also requires a little documentation to explain this, but it could go right in the docs for the.progressargument. -
It would call
p()on every element of.x, which might be overkill and result in performance issues. This is related to Limit the number of progression conditions signaled #81, but if it was a real issue then users could use a custom progressor object and control the progress updates themselves. -
Nested progress updates are still an issue, but I think that is a separate issue from this one, and whatever improvements come there would flow naturally into furrr.
-
Technically furrr's
.progress = TRUEargument "worked" on multicore, and this currently won't, but I can live with that.