Allow alternative data store locations via _targets.yaml #407
|
Woohoo! |
|
(Please advise if this is the appropriate place to have this conversation, or you'd prefer to discuss in an issue) Hi @wlandau - I've loosely following I'm curious about the configuration artefact, Presently, given that |
I am not sure which specific features will go into |
|
Thank you @wlandau , your answer clearly implies that thanks for developing this, & drake, I've recently reviewed |
In my hands, there are two issues with this:
|
That's surprising. Doesn't seem to reproduce on my end.
file.exists("_targets.yaml")
#> [1] FALSE
targets::tar_config_set(store = "custom/path")
file.exists("_targets.yaml")
#> [1] TRUE
readLines("_targets.yaml")
#> [1] "store: custom/path"
Created on 2021-05-21 by the reprex package (v2.0.0)
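Beyond inspecting the file directly, the setting can also be read back with tar_config_get(), which this PR introduces. The following is only a minimal sketch, assuming a version of targets that includes this PR; the throwaway directory and the variable names are illustrative and not part of the original reprex.

```r
library(targets)

# Work in a throwaway project directory (illustrative) so no real project is touched.
project <- tempfile("project")
dir.create(project)
old_wd <- setwd(project)

# No _targets.yaml exists yet, so the default store should be reported.
default_store <- tar_config_get("store")

# Write the store setting to _targets.yaml, then read it back.
tar_config_set(store = "custom/path")
custom_store <- tar_config_get("store")

# Removing _targets.yaml should restore the default setting.
unlink("_targets.yaml")
restored_store <- tar_config_get("store")

setwd(old_wd)
```

Reading the value through tar_config_get() avoids depending on the exact layout of the YAML file, which may differ between versions of the package.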
Again, that's surprising, and the current behavior should align with your expectations. That's certainly what I see on my end.
tmp <- tempfile()
dir.create(tmp)
store <- file.path(tmp, "store")
library(targets)
library(withr)
with_dir(tmp, tar_config_set(store = store))
with_dir(tmp, tar_script())
with_dir(tmp, tar_make())
#> • start target data
#> • built target data
#> • start target summary
#> • built target summary
#> • end pipeline
with_dir(tmp, tar_meta())
#> # A tibble: 3 x 17
#> name type data command depend seed path time size bytes
#> <chr> <chr> <chr> <chr> <chr> <int> <lis> <dttm> <chr> <int>
#> 1 summ func… 725b… <NA> <NA> NA <chr… NA <NA> NA
#> 2 data stem 593b… 4d28b3… 15669… 1.59e9 <chr… 2021-05-21 14:38:06 7f6e… 456
#> 3 summ… stem 8460… 504e74… aaa01… 7.60e8 <chr… 2021-05-21 14:38:06 24d5… 123
#> # … with 7 more variables: format <chr>, iteration <chr>, parent <lgl>,
#> # children <list>, seconds <dbl>, warnings <lgl>, error <lgl>
tar_meta() # Should error out.
#> Error: utility functions like tar_read() and tar_progress() require a _targets/ data store produced by tar_make() or similar.
Created on 2021-05-21 by the reprex package (v2.0.0) |
In my hands, I can do this: When I run But if you do it fails for me, erroring out on the |
|
I see what you mean now. In the general case, it would be dangerous to allow a target to modify these config settings because then it could corrupt the pipeline/run for the other targets. Many functions, including https://github.com/ropensci/targets/blob/main/R/utils_callr.R#L63-L64 One workaround is to call Related: 1f8cfb5 fixes a bug/oversight where locked config settings were incorrectly not passed to parallel workers. |
Prework
Related GitHub issues and pull requests
Background
At the inception of targets, I swore I would stick to super strict guardrails to protect users against common pitfalls from drake. drake was far too dependent on the transient state of the user's session, and the excessive customization and flexibility caused a lot of surprises and frustration for new users. drake's ability to set the path to the cache was one of those areas of flexibility I came to regret because it allowed people to create overly complicated projects with unnecessary levels of tech debt. This is exactly the sort of thing that undermines reproducibility and readability.
However, there is too much demand for the ability to set the data store to something other than ./_targets/. Several people have requested this as a matter of personal convenience or to work around rare platform-related inefficiencies, and I do not think this alone is enough to justify such a feature. However, I just learned that custom data paths are an absolute requirement if targets is going to be usable for most workflows deployed to RStudio Connect. That covers several situations where custom data store paths are reasonable. So unfortunately, given the other frameworks {targets} needs to interact with, I no longer think it is possible to avoid arbitrarily custom local data storage.
Changes
This PR creates a configuration setting management system through an optional _targets.yaml file at the project root (next to _targets.R). New functions tar_config_set() and tar_config_get() modify this file, and simply removing _targets.yaml restores all the default settings. Managing the settings through _targets.yaml (as opposed to global options or environment variables) ensures that the pipeline does not depend on the transient memory of an R session. This should avoid surprises like this one. Also, breaking _targets.yaml in the middle of a pipeline run will not corrupt that run, but it may alter the behavior of subsequent runs if _targets.yaml is not repaired.
The only currently supported YAML setting is the data store path. tar_config_set(store = "custom/path") writes store: custom/path to _targets.yaml, and subsequent calls to tar_make(), tar_read(), etc. will use custom/path as the data store instead of _targets/. custom/path does not need to exist in advance, and after the pipeline is done, you can manually move custom/path back to _targets/ and trust that your targets will remain up to date. (Lots of new automated tests in this PR.)
Example
Created on 2021-04-08 by the reprex package (v1.0.0)
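As a further illustration of the claim that custom/path can be moved back to _targets/ without invalidating anything, here is a hedged sketch rather than the reprex from this PR. It assumes a version of targets that includes this change; the two-target pipeline, the temporary project directory, and the variable names are made up for the example.

```r
library(targets)

# Throwaway project directory (illustrative), so no real project is touched.
project <- tempfile("project")
dir.create(project)
old_wd <- setwd(project)

# Point the data store at custom/path; the directory need not exist yet.
tar_config_set(store = "custom/path")

# A tiny hypothetical pipeline written to _targets.R.
tar_script(
  list(
    tar_target(data, seq_len(3)),
    tar_target(total, sum(data))
  ),
  ask = FALSE
)
tar_make()

# The store should live at custom/path, and _targets/ should not exist.
stored_in_custom <- dir.exists("custom/path")
default_created <- dir.exists("_targets")
result <- tar_read(total)

# Move the store back to the default location and drop the config file.
file.rename("custom/path", "_targets")
unlink("_targets.yaml")

# With the default store restored, no targets should be reported as outdated.
outdated <- tar_outdated()

setwd(old_wd)
```

If the description above holds, outdated is empty after the move, because the metadata travels with the store directory and does not record the store's own location.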
cc