format = "auto" #1311

@hadley

Prework

  • I understand and agree to the help guide.
  • I understand and agree to the contributing guide.
  • New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

I propose that tar_target() accept a new format called "auto" which, if it proves successful, would become the default. "auto" would behave as follows:

  • If the output of the target is a data frame, use format = "nanoparquet", which would be a new format that uses nanoparquet to create parquet files. nanoparquet is a zero-dependency parquet reader/writer so targets could take a hard dependency on this package ensuring that it's available to all users.
  • If the output is a character vector and all(file.exists(output)) is true, it would use format = "file_fast" (unless trust_object_timestamps is FALSE, in which case it would use "file").
  • For all other types, use format = "rds". (Unless you'd be willing to add qs as a hard dependency, in which case I'd argue that qs is basically uniformly superior to rds. Alternatively, you could use qs if it's installed, but I don't know enough about the architecture of targets to fully understand the consequences of that.)
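
To make the rules above concrete, here is a minimal sketch of the dispatch in base R. It is not targets code: choose_auto_format() and its arguments are hypothetical names invented for illustration, and "nanoparquet" is the proposed format name rather than an existing one. The length check on character vectors is my own assumption, added because all(file.exists(character(0))) is TRUE.

```r
# Hypothetical helper (not part of targets): returns the name of the format
# that "auto" would select for a given target output.
choose_auto_format <- function(output,
                               trust_object_timestamps = TRUE,
                               use_qs_if_installed = FALSE) {
  # Data frames go to the proposed parquet-backed format.
  if (is.data.frame(output)) {
    return("nanoparquet")
  }

  # A non-empty character vector whose elements all exist on disk is treated
  # as a set of file paths. The length check is an assumption of this sketch.
  if (is.character(output) && length(output) > 0L && all(file.exists(output))) {
    return(if (trust_object_timestamps) "file_fast" else "file")
  }

  # Everything else falls back to rds, or optionally to qs when it is installed.
  if (use_qs_if_installed && requireNamespace("qs", quietly = TRUE)) {
    return("qs")
  }
  "rds"
}

# Usage: create a real file so the path check has something to find.
path <- tempfile(fileext = ".txt")
writeLines("example", path)

choose_auto_format(mtcars)                                 # "nanoparquet"
choose_auto_format(path)                                   # "file_fast"
choose_auto_format(path, trust_object_timestamps = FALSE)  # "file"
choose_auto_format(1:10)                                   # "rds"
```

One consequence worth noting: the file check depends on the state of the file system when the target finishes, so a character vector that merely happens to name existing files would be stored as a file target.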

This steers the new user towards high-performance formats while allowing experienced users to continue to pick the best defaults for them.
