Prework
- I understand and agree to help guide.
- I understand and agree to contributing guide.
- New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.
Proposal
I propose that `tar_target()` take a new type of format called `"auto"`, which, if successful, would become the default. `"auto"` would behave in the following way:
- If the output of the target is a data frame, use `format = "nanoparquet"`, which would be a new format that uses nanoparquet to create parquet files. nanoparquet is a zero-dependency parquet reader/writer, so targets could take a hard dependency on this package, ensuring that it's available to all users.
- If the output is a character vector and `all(file.exists(output))` is true, it would use `format = "file_fast"` (unless `trust_object_timestamps` is `FALSE`, in which case it would use `"file"`).
- For all other types, `"rds"`. (Unless you'd be willing to add qs as a hard dependency, in which case I'd argue that qs is basically uniformly superior to rds. Alternatively, you could use qs if it's installed, but I don't know enough about the architecture of targets to fully understand the consequences of that.)
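
To make the dispatch rules above concrete, here is a minimal sketch in plain R. `choose_auto_format()` is a hypothetical helper written only for illustration; it is not part of targets, and `"nanoparquet"` is the format proposed here rather than an existing one. The `length(output) > 0` guard is an added assumption, since `all(file.exists(character(0)))` is `TRUE` and an empty character vector presumably should not be treated as a set of files.

```r
# Hypothetical helper illustrating the proposed "auto" rules.
# Not part of the targets API.
choose_auto_format <- function(output, trust_object_timestamps = TRUE) {
  # Rule 1: data frames go to the proposed parquet-based format.
  if (is.data.frame(output)) {
    return("nanoparquet")
  }
  # Rule 2: character vectors whose elements all name existing files.
  # The length check is an assumption added for this sketch.
  if (is.character(output) && length(output) > 0 && all(file.exists(output))) {
    return(if (isTRUE(trust_object_timestamps)) "file_fast" else "file")
  }
  # Rule 3: everything else falls back to rds.
  "rds"
}

# Example usage with a temporary file that really exists.
tmp <- tempfile()
file.create(tmp)

choose_auto_format(data.frame(x = 1))
choose_auto_format(tmp)
choose_auto_format(tmp, trust_object_timestamps = FALSE)
choose_auto_format(1:10)
choose_auto_format("just a string, not a path to an existing file")
```

Because the choice depends on the value the target returns, a real implementation would presumably have to make this decision after the target runs rather than when `tar_target()` is called.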
This steers the new user towards high-performance formats while allowing experienced users to continue to pick the best defaults for them.