Cache list_objects_v2() to speed up the file cue for cloud objects #1172

@wlandau

Under the default settings for cloud storage, targets checks every target's hash with a separate AWS API call, which is extremely time-consuming. This is why https://books.ropensci.org/targets/cloud-storage.html recommends tar_cue(file = FALSE) for large pipelines on the cloud. That workaround is fine if you are not manually modifying objects in the bucket, but it is not ideal. It would be better to find a safer way to speed up targets when it checks that cloud objects are up to date.
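
For reference, the workaround mentioned above looks roughly like this in _targets.R. This is a minimal sketch: the bucket name, prefix, and the single target are placeholders, not values from any real project.

```r
# _targets.R
library(targets)

tar_option_set(
  repository = "aws",
  resources = tar_resources(
    aws = tar_resources_aws(
      bucket = "my-bucket",  # placeholder bucket name
      prefix = "my-prefix"   # placeholder prefix
    )
  ),
  # Skip the per-target file check, and with it the per-target API call.
  # Objects modified manually in the bucket will no longer be detected.
  cue = tar_cue(file = FALSE)
)

list(
  tar_target(data, rnorm(10))  # placeholder target
)
```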

Previously I posted #1131. Versioning might not be a problem if we assume most of the objects are in their current version most of the time. However, list_objects_v2() operates on whole prefixes, which might slow us down because it lists more objects than we really need. And then there's pagination to contend with. This functionality is worth revisiting, but the ideas I have so far range from painful to infeasible.
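
To make the prefix and pagination concerns concrete, here is a rough sketch of the caching idea, not a proposal for how targets should implement it. It assumes a paws-style S3 client whose list_objects_v2() returns Contents, IsTruncated, and NextContinuationToken, and it assumes the ETag is an acceptable fingerprint, which may not match how targets records hashes. list_etags(), is_current(), and fake_client() are hypothetical helpers; the fake client stands in for S3 so the example runs offline.

```r
# Hypothetical helper: list every object under a prefix, following
# pagination, and return a named character vector mapping key -> ETag.
list_etags <- function(client, bucket, prefix) {
  etags <- character(0)
  token <- NULL
  repeat {
    args <- list(Bucket = bucket, Prefix = prefix)
    if (!is.null(token)) args$ContinuationToken <- token
    page <- do.call(client$list_objects_v2, args)
    for (object in page$Contents) {
      etags[[object$Key]] <- object$ETag
    }
    if (!isTRUE(page$IsTruncated)) break
    token <- page$NextContinuationToken
  }
  etags
}

# Hypothetical helper: check one target against the cache, no API call.
is_current <- function(cache, key, recorded_etag) {
  key %in% names(cache) && identical(cache[[key]], recorded_etag)
}

# Hypothetical stand-in for an S3 client. It pages through an in-memory
# list of objects and counts how many listing calls were made.
fake_client <- function(objects, page_size = 2L) {
  calls <- 0L
  list(
    list_objects_v2 = function(Bucket, Prefix, ContinuationToken = NULL) {
      calls <<- calls + 1L
      matches <- Filter(function(x) startsWith(x$Key, Prefix), objects)
      start <- if (is.null(ContinuationToken)) 1L else as.integer(ContinuationToken)
      end <- min(start + page_size - 1L, length(matches))
      list(
        Contents = if (length(matches)) matches[start:end] else list(),
        IsTruncated = end < length(matches),
        NextContinuationToken = as.character(end + 1L)
      )
    },
    calls = function() calls
  )
}

objects <- list(
  list(Key = "_targets/objects/a", ETag = "e1"),
  list(Key = "_targets/objects/b", ETag = "e2"),
  list(Key = "_targets/objects/c", ETag = "e3"),
  list(Key = "other/d", ETag = "e4")
)
client <- fake_client(objects)

# One paginated listing replaces one API call per target.
cache <- list_etags(client, bucket = "my-bucket", prefix = "_targets/objects/")
```

The trade-off is the one described above: the listing covers everything under the prefix, so a bucket with many unrelated objects under the same prefix pays for pages it does not need.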
