wlandau
Member

@wlandau wlandau commented Apr 9, 2021

Prework

Related GitHub issues and pull requests

Background

At the inception of targets, I swore I would stick to super strict guardrails to protect users against common pitfalls from drake. drake was far too dependent on the transient state of the user's session, and the excessive customization and flexibility caused a lot of surprises and frustration for new users. drake's ability to set the path to the cache was one of those areas of flexibility I came to regret because it allowed people to create overly complicated projects with unnecessary levels of tech debt. This is exactly the sort of thing that undermines reproducibility and readability.

However, there is too much demand for the ability to set the data store to something other than ./_targets/. Several people have requested this as a matter of personal convenience or to work around rare platform-related inefficiencies, and I do not think this alone is enough to justify such a feature. However, I just learned that custom data paths are an absolute requirement if targets is going to be usable for most workflows deployed to RStudio Connect. That covers several situations where custom data store paths are reasonable. So unfortunately, given the other frameworks {targets} needs to interact with, I no longer think it is possible to avoid arbitrarily custom local data storage.

Changes

This PR creates a configuration setting management system through an optional _targets.yaml file at the project root (next to _targets.R). New functions tar_config_set() and tar_config_get() modify this file, and simply removing _targets.yaml restores all the default settings. Managing the settings through _targets.yaml (as opposed to global options or environment variables) ensures that the pipeline does not depend on the transient memory of an R session. This should avoid surprises like this one. Also, breaking _targets.yaml in the middle of a pipeline run will not corrupt that run, but it may alter the behavior of subsequent runs if _targets.yaml is not repaired.

The only currently supported YAML setting is the data store path. tar_config_set(store = "custom/path") writes store: custom/path to _targets.yaml, and subsequent calls to tar_make(), tar_read(), etc. will use custom/path as the data store instead of _targets/. custom/path does not need to exist in advance, and after the pipeline is done, you can manually move custom/path back to _targets/ and trust that your targets will remain up to date. (Lots of new automated tests in this PR.)
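The "move it back" step described above could look roughly like the following. This is a minimal sketch rather than an official recipe: it assumes the custom store and the project live on the same filesystem (so that file.rename() can move the directory), and the variable names project, old_wd, and outdated are made up for illustration.

```r
library(targets)

# Work in a throwaway project directory so nothing real is touched.
project <- tempfile()
dir.create(project)
old_wd <- setwd(project)

# Run a tiny pipeline against a custom store.
tar_script(list(tar_target(x, 1 + 1)))
tar_config_set(store = "custom/path")
tar_make()

# Move the store back to the default location, then remove _targets.yaml
# to restore the default settings.
file.rename("custom/path", "_targets")
unlink("_targets.yaml")

# List outdated targets; this should be empty if moving the store
# did not invalidate anything.
outdated <- tar_outdated()

setwd(old_wd)
```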

Example

library(targets)
tar_script(list(tar_target(x, 1 + 1)))

tar_config_get("store")
#> [1] "_targets"

store_path <- tempfile()

tar_config_set(store = store_path)
tar_config_get("store")
#> [1] "/var/folders/k3/q1f45fsn4_13jbn0742d4zj40000gn/T//Rtmp0IZ9IY/file1120c4fa7a822"

tar_make()
#> ● start target x
#> ● built target x
#> ● end pipeline

tar_read(x)
#> [1] 2

file.exists("_targets")
#> [1] FALSE

file.exists(store_path)
#> [1] TRUE

Created on 2021-04-08 by the reprex package (v1.0.0)

cc

@wlandau wlandau self-assigned this Apr 9, 2021
@wlandau wlandau merged commit a926e5a into main Apr 9, 2021
@wlandau wlandau deleted the 297 branch April 9, 2021 02:05
wlandau-lilly pushed a commit to ropensci-books/targets that referenced this pull request Apr 9, 2021
wlandau-lilly pushed a commit to ropensci-books/targets-design that referenced this pull request Apr 9, 2021
@strazto

strazto commented Apr 21, 2021

Woohoo!

@strazto

strazto commented Apr 21, 2021

(Please advise if this is the appropriate place to have this conversation, or whether you'd prefer to discuss it in an issue.)

Hi @wlandau - I've been loosely following targets's development since you first brought it to my attention, and I think I may have been the first user to request this feature (#196).

I'm curious about the configuration artefact, _targets.yaml:
Do you anticipate _targets.yaml being versioned, or is _targets.yaml more in line with a .env file in a Ruby / docker-compose project, i.e. a local override?

Presently, given that _targets.yaml only configures a storage path, I'd imagine that it would work just fine as a local override, however, I'm curious as to your vision of _targets.yaml's scope/role, and whether it aligns with this.

@wlandau
Member Author

wlandau commented Apr 21, 2021

I'm curious about the configuration artefact, _targets.yaml:
Do you anticipate _targets.yaml being versioned, or is _targets.yaml more in line with a .env file in a Ruby / docker-compose project, i.e. a local override?

targets does not specifically watch _targets.yaml for changes if that is what you mean. _targets.yaml is more like a special environment file like .Rprofile or .Renviron.

Presently, given that _targets.yaml only configures a storage path, I'd imagine that it would work just fine as a local override, however, I'm curious as to your vision of _targets.yaml's scope/role, and whether it aligns with this.

I am not sure which specific features will go into _targets.yaml going forward, but I have a general plan. Some settings affect the pipeline and the external callr process that runs it, while other settings affect more interactive components like menus and warnings. The former are controlled with tar_option_set(), and the latter are controlled with environment variables like TAR_ASK and TAR_WARN. _targets.yaml will manage settings that affect both the behavior of the pipeline and the local/interactive R session simultaneously. For example, both processes need to know about the data store path because tar_read() uses it as well as tar_make(). Neither a tar_option_set() option nor an environment variable would alone be sufficient.
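As a rough illustration of the three kinds of settings described above, here is a sketch. The particular values are arbitrary examples, and in a real project the tar_option_set() call would normally live in _targets.R. Note that tar_config_set() writes _targets.yaml in the current working directory.

```r
library(targets)

# 1. Settings for the pipeline and the external callr process that runs it.
tar_option_set(packages = c("stats", "utils"))

# 2. Settings for interactive components (menus and warnings),
#    controlled through environment variables in the local session.
Sys.setenv(TAR_ASK = "false", TAR_WARN = "false")

# 3. Settings that both the pipeline and the local session need,
#    such as the data store path. These go into _targets.yaml.
tar_config_set(store = "custom/path")
```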

@strazto

strazto commented Apr 21, 2021

Thank you @wlandau, your answer clearly implies that _targets.yaml is going to be more in line with a .env file, which is very helpful.

Thanks for developing this, and drake. I've recently reviewed targets's docs and been very excited by what I've seen on the HPC side of things, so I am looking forward to transitioning to it.

@nsheff

nsheff commented May 20, 2021

tar_config_set(store = "custom/path") writes store: custom/path

In my hands, there are two issues with this:

  1. if custom/path doesn't exist, tar_config_set() silently fails and does not write the _targets.yaml file.
  2. if you've set an alternative store, then with_dir calls, like withr::with_dir("/some/external/targets/folder", tar_meta()) will also expect the targets to live in the alternative store. This is not the behavior I expected; I would expect that it would only alter the storage location for the targets in the active directory where the _targets.yaml file is specified. In other words, the _targets.yaml file of the folder should be the overriding setting, not the current state of the store environment variable.

@wlandau
Member Author

wlandau commented May 21, 2021

  1. if custom/path doesn't exist, tar_config_set() silently fails and does not write the _targets.yaml file.

That's surprising. Doesn't seem to reproduce on my end.

file.exists("_targets.yaml")
#> [1] FALSE
targets::tar_config_set(store = "custom/path")
file.exists("_targets.yaml")
#> [1] TRUE
readLines("_targets.yaml")
#> [1] "store: custom/path"

Created on 2021-05-21 by the reprex package (v2.0.0)

  2. if you've set an alternative store, then with_dir calls, like withr::with_dir("/some/external/targets/folder", tar_meta()) will also expect the targets to live in the alternative store. This is not the behavior I expected; I would expect that it would only alter the storage location for the targets in the active directory where the _targets.yaml file is specified. In other words, the _targets.yaml file of the folder should be the overriding setting, not the current state of the store environment variable.

Again, that's surprising, and the current behavior should align with your expectations. That's certainly what I see on my end. targets does not use an environment variable for the store; it uses the YAML file.

tmp <- tempfile()
dir.create(tmp)
store <- file.path(tmp, "store")
library(targets)
library(withr)
with_dir(tmp, tar_config_set(store = store))
with_dir(tmp, tar_script())
with_dir(tmp, tar_make())
#> • start target data
#> • built target data
#> • start target summary
#> • built target summary
#> • end pipeline
with_dir(tmp, tar_meta())
#> # A tibble: 3 x 17
#>   name  type  data  command depend    seed path  time                size  bytes
#>   <chr> <chr> <chr> <chr>   <chr>    <int> <lis> <dttm>              <chr> <int>
#> 1 summ  func… 725b… <NA>    <NA>   NA      <chr… NA                  <NA>     NA
#> 2 data  stem  593b… 4d28b3… 15669…  1.59e9 <chr… 2021-05-21 14:38:06 7f6e…   456
#> 3 summ… stem  8460… 504e74… aaa01…  7.60e8 <chr… 2021-05-21 14:38:06 24d5…   123
#> # … with 7 more variables: format <chr>, iteration <chr>, parent <lgl>,
#> #   children <list>, seconds <dbl>, warnings <lgl>, error <lgl>
tar_meta() # Should error out.
#> Error: utility functions like tar_read() and tar_progress() require a _targets/ data store produced by tar_make() or similar.

Created on 2021-05-21 by the reprex package (v2.0.0)

@nsheff

nsheff commented May 21, 2021

  1. You're right. It was a latency issue (I was using a remote filesystem and checking for the file too quickly).
  2. Here, the issue only seems to arise when I'm inside a tar_make() call. So with_dir() works fine in an interactive R session. But the problem is that for unitar, I'm using with_dir(..., tar_meta()) inside a target factory, so I need to be able to run that against an external project while running a _targets.R file. And here, in my hands, it fails.

In my hands, I can do this:

#_targets.R
print("Test normal")
print(tar_meta())

print("Test with_dir")
print(withr::with_dir("external_targets", tar_meta()))

When I run tar_make() on this project, it works... the second tar_meta call using external_targets succeeds and correctly lists the targets in the external folder. I rely on this to build paths with unitar_path, which I need to call inside a _targets.R file for building targets that refer to external projects.

But if you do

tar_config_set(store="custom/path")
tar_make()

it fails for me, erroring out on the tar_meta call with withr... All I have to do is delete the _targets.yaml file and it works again.

@wlandau
Member Author

wlandau commented May 22, 2021

I see what you mean now. In the general case, it would be dangerous to allow a target to modify these config settings because it could then corrupt the run for the other targets. Many functions, including tar_make() and its siblings, deliberately lock the configuration settings. While the lock is in effect for the current R session, the config settings are held read-only in a special R6 class and _targets.yaml is ignored entirely.

https://github.com/ropensci/targets/blob/main/R/utils_callr.R#L63-L64
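To make the locking idea concrete, here is a toy model in base R. It is not the targets implementation (which uses an R6 class). The constructor new_config() and its methods are hypothetical names, and the YAML parsing is deliberately naive. It only shows why changes to the file on disk go unnoticed while a lock is held.

```r
# Toy illustration only: NOT how targets implements its config lock.
new_config <- function(path) {
  locked <- FALSE
  cached <- list(store = "_targets")
  # Naive reader: looks for a single "store: value" line.
  read_file <- function() {
    if (!file.exists(path)) return(list(store = "_targets"))
    line <- grep("^store:", readLines(path), value = TRUE)
    if (length(line) == 0L) return(list(store = "_targets"))
    list(store = trimws(sub("^store:", "", line[1])))
  }
  list(
    # Re-read the file on every access unless the config is locked.
    get_store = function() {
      if (!locked) cached <<- read_file()
      cached$store
    },
    # Take a snapshot of the file and ignore it from now on.
    lock = function() {
      cached <<- read_file()
      locked <<- TRUE
      invisible(NULL)
    },
    unlock = function() {
      locked <<- FALSE
      invisible(NULL)
    }
  )
}

path <- tempfile(fileext = ".yaml")
config <- new_config(path)

writeLines("store: custom/path", path)
before_lock <- config$get_store()   # "custom/path"

config$lock()
writeLines("store: other/path", path)
during_lock <- config$get_store()   # still "custom/path": file is ignored

config$unlock()
after_unlock <- config$get_store()  # "other/path"
```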

One workaround is to call with_dir() from inside a callr::r() process. That should create a new R process that avoids the config lock of the parent R process. If you don't have too many of those targets, there should not be much overhead due to creating new processes. But if it gets to be too much, I may consider a withr-like wrapper for lockless operations. Currently on the fence about that.
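A sketch of that workaround, assuming the callr and withr packages are installed. The helper name external_meta() is hypothetical and not part of targets, and whether the fresh process sidesteps the lock in every situation is an assumption that has not been verified here.

```r
# Hypothetical helper: read the metadata of an external targets project
# from a fresh R process, which does not share the in-memory config
# lock of the parent session.
external_meta <- function(project_dir) {
  callr::r(
    func = function(project_dir) {
      # Inside the child process, _targets.yaml of project_dir (if any)
      # is read normally.
      withr::with_dir(project_dir, targets::tar_meta())
    },
    args = list(project_dir = project_dir)
  )
}
```

Inside a _targets.R file or a target factory, a call such as external_meta("external_targets") would then take the place of the direct withr::with_dir("external_targets", tar_meta()) call. Each call pays the startup cost of a new R process.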

Related: 1f8cfb5 fixes a bug/oversight where locked config settings were not passed to parallel workers.
