Thanks to visit codestin.com
Credit goes to github.com

Skip to content

DISCUSSION: .progress argument for map-reduce wrappers #85

@DavisVaughan

Description

@DavisVaughan

Related discussions in #83, #3, #78

After discussing #83 and seeing that it might be a little more work than expected for users to update to progressr from my hacky .progress = TRUE argument in furrr, I am reconsidering keeping the .progress argument, but having it use progressr internally. It would be focused on signaling the simplest progress update possible, and would not wrap with_progress() as #78 does. That would still be left to the user.

I am curious about your thoughts on this.

It would work like this:

library(dplyr)
library(tidyr)
library(furrr)
library(progressr)

df <- nest(mtcars, data = -gear)

# Without dplyr
with_progress({
  future_map(df$data, nrow, .progress = TRUE)
})

# In a pipeline
df %>%
  mutate(x = with_progress(future_map(data, nrow, .progress = TRUE)))

# The old way, for comparison
df %>%
  mutate(x = future_map(data, nrow, .progress = TRUE))

I am also considering letting .progress take FALSE/TRUE and some other type of argument that would control how the internal progressor is created, just to give it a little more flexibility, but I'm not quite sure yet. I might start simpler and do that if it is requested.

Pros of this:

  • I think it is a little bit easier to use vs creating the p <- progressor() object and ticking it manually with p() inside the .f function. As we've seen in Adding a progress bar by default in a package #78, it seems to be a little tricky to get it exactly right, especially with dplyr.

  • The default of .progress = FALSE will have zero overhead

  • This approach still allows users to create a progressor object themselves and tick it manually, as displayed in Adding a progress bar by default in a package #78.

  • Perhaps this is a longer horizon "pro", but with a global calling handler we could completely remove the need for with_progress() and it would look exactly like the original furrr implementation.

Cons of this:

  • It requires managing a .progress option in furrr, but I think that is okay.

  • It is a little strange to have .progress = TRUE which signals progress updates, but then you also have to wrap it in with_progress() to see the updates. It also requires a little documentation to explain this, but it could go right in the docs for the .progress argument.

  • It would call p() on every element of .x, which might be overkill and result in performance issues. This is related to Limit the number of progression conditions signaled #81, but if it was a real issue then users could use a custom progressor object and control the progress updates themselves.

  • Nested progress updates are still an issue, but I think that is a separate issue from this one, and whatever improvements come there would flow naturally into furrr.

  • Technically furrr's .progress = TRUE argument "worked" on multicore, and this currently won't, but I can live with that.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions