Description
Under the default settings for cloud storage, targets checks each target's hash with its own AWS API call, which becomes extremely time-consuming in pipelines with many targets. This is why https://books.ropensci.org/targets/cloud-storage.html recommends `tar_cue(file = FALSE)` for large pipelines on the cloud. That is fine as long as nobody manually modifies objects in the bucket, but it is not ideal: with `file = FALSE`, targets no longer reruns a target whose stored object was changed or deleted. It would be better to find a safer way to speed up targets when it checks that cloud objects are up to date.
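For reference, the current workaround can be applied pipeline-wide in `_targets.R`. This is a minimal sketch that shows only the cue setting. The cloud repository and bucket configuration a real cloud pipeline needs are omitted here.

```r
library(targets)

# Skip the per-object cloud hash check for every target.
# Trade-off: objects changed or deleted directly in the bucket
# will no longer trigger a rerun.
tar_option_set(
  cue = tar_cue(file = FALSE)
)
```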
Previously I posted #1131. Versioning might not be a problem if we assume that most objects are at their current version most of the time. However, `list_objects_v2()` operates on whole prefixes, so it might slow us down by listing more objects than we actually need. And then there is pagination to contend with. This functionality is worth revisiting, but the ideas I have so far range from painful to infeasible.
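To make the pagination concern concrete, here is a rough sketch of a bulk check. It is not how targets works today. It assumes a client shaped like a `paws::s3()` client, whose `list_objects_v2()` accepts `Bucket`, `Prefix`, and `ContinuationToken` and returns `Contents`, `IsTruncated`, and `NextContinuationToken`. It also assumes, hypothetically, that comparing S3 ETags against ETags recorded locally is an acceptable up-to-date check. The helpers `list_all_objects()` and `stale_keys()` and the fake client are illustrative only.

```r
# Hypothetical helper: follow continuation tokens until every object
# under `prefix` has been listed. Returns a named character vector
# of ETags, keyed by object key.
list_all_objects <- function(client, bucket, prefix) {
  keys <- character(0)
  etags <- character(0)
  token <- NULL
  repeat {
    page <- client$list_objects_v2(
      Bucket = bucket,
      Prefix = prefix,
      ContinuationToken = token
    )
    keys <- c(keys, vapply(page$Contents, function(o) o$Key, character(1)))
    etags <- c(etags, vapply(page$Contents, function(o) o$ETag, character(1)))
    if (!isTRUE(page$IsTruncated)) break
    token <- page$NextContinuationToken
  }
  stats::setNames(etags, keys)
}

# Hypothetical helper: keys whose remote object is missing or whose
# ETag differs from the locally recorded one.
stale_keys <- function(expected, remote) {
  observed <- remote[names(expected)]
  changed <- is.na(observed) | observed != expected
  names(expected)[changed]
}

# Fake client that serves two pages, standing in for paws::s3().
pages <- list(
  list(
    Contents = list(
      list(Key = "objects/a", ETag = "\"1\""),
      list(Key = "objects/b", ETag = "\"2\"")
    ),
    IsTruncated = TRUE,
    NextContinuationToken = "page2"
  ),
  list(
    Contents = list(list(Key = "objects/c", ETag = "\"3\"")),
    IsTruncated = FALSE
  )
)
fake_client <- list(
  list_objects_v2 = function(Bucket, Prefix, ContinuationToken = NULL) {
    if (is.null(ContinuationToken)) pages[[1]] else pages[[2]]
  }
)

remote <- list_all_objects(fake_client, "my-bucket", "objects/")
expected <- c("objects/a" = "\"1\"", "objects/b" = "\"old\"", "objects/d" = "\"4\"")
stale_keys(expected, remote) # returns c("objects/b", "objects/d")
```

Each `list_objects_v2()` call returns at most 1,000 keys by default, so this replaces one call per target with one call per page. The catch is the one raised above: the listing covers every object under the prefix, including objects the pipeline does not care about.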