
Content addressable storage (CAS) #1314

@wlandau

cf. #1232

Content addressable storage (CAS) is a type of storage system more amenable to portability and collaboration than what targets currently uses. In CAS, the name of each object is its hash, and there is a mapping from human-friendly target names to these hashes. CAS would allow the actual data to be stored centrally rather than locally, and it would let multiple pipelines leverage each other’s results.
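To make the idea concrete, here is a minimal, language-agnostic sketch in Python. It is not how {targets} stores data: the `ContentStore` class and its `put`/`get` methods are hypothetical names invented for illustration, and the choice of SHA-256 is an assumption. It only shows the two pieces described above: objects named by their hash, plus a mapping from human-friendly names to those hashes.

```python
import hashlib


class ContentStore:
    """Toy in-memory CAS (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}  # hash -> bytes: the object's name is its hash
        self.names = {}    # human-friendly target name -> hash

    def put(self, name, data):
        # The key is derived from the content alone (SHA-256 is an assumption).
        key = hashlib.sha256(data).hexdigest()
        self.objects[key] = data  # identical content maps to the same key
        self.names[name] = key
        return key

    def get(self, name):
        # Resolve the friendly name to a hash, then the hash to the data.
        return self.objects[self.names[name]]
```

Because the key depends only on the content, two names (or two pipelines) that produce identical bytes resolve to the same stored object, which is what would let pipelines reuse each other's results.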

Posit Conf gave me much to think about regarding CAS. Many users (many more than I originally thought) would benefit from better native support for writing custom third-party CAS systems. I have realized that the approach in #1232 (reply in thread) is difficult for users to implement.

I don’t know exactly how to go forward with this at the moment. However, I can state a few goals of a heavy-handed CAS:

  • Users should be able to declare how to upload an object, download an object, and check if an object exists. Only the hash of the object should be needed as input.
  • {targets} should be able to turn these user-supplied methods into a CAS without needing the “pointer files” in _targets/objects mentioned in [general] New cloud hashing approach and collaborative workflows #1232.
  • There should be some kind of optional “list” step at the beginning to make existence checking fast (e.g. with a LIST request in the case of AWS S3).
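The goals above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not a proposed {targets} interface: `CustomCas`, `local_directory_cas`, `refresh`, `has`, `store`, and the callback signatures `upload(key, path)`, `download(key, path)`, `exists(key)`, and `list_keys()` are all hypothetical, and a local directory stands in for a remote bucket.

```python
import os
import shutil
import tempfile


class CustomCas:
    """Hypothetical wrapper around user-declared storage callbacks."""

    def __init__(self, upload, download, exists, list_keys=None):
        self.upload = upload          # upload(key, path): store a local file under key
        self.download = download      # download(key, path): fetch key into a local file
        self._exists = exists         # exists(key) -> bool, one request per key
        self._list_keys = list_keys   # optional: list_keys() -> iterable of all keys
        self._known = None            # cache filled by the optional list step

    def refresh(self):
        # Optional list step: one bulk request instead of one check per key.
        if self._list_keys is not None:
            self._known = set(self._list_keys())

    def has(self, key):
        # Use the cached listing if there is one, else fall back to exists().
        if self._known is not None:
            return key in self._known
        return self._exists(key)

    def store(self, key, path):
        self.upload(key, path)
        if self._known is not None:
            self._known.add(key)  # keep the cached listing in sync


def local_directory_cas(root):
    """Example backend: a plain directory standing in for a bucket."""
    os.makedirs(root, exist_ok=True)
    return CustomCas(
        upload=lambda key, path: shutil.copyfile(path, os.path.join(root, key)),
        download=lambda key, path: shutil.copyfile(os.path.join(root, key), path),
        exists=lambda key: os.path.exists(os.path.join(root, key)),
        list_keys=lambda: os.listdir(root),
    )


# Usage: the key is a stand-in for the object's hash.
workdir = tempfile.mkdtemp()
cas = local_directory_cas(os.path.join(workdir, "store"))
src = os.path.join(workdir, "data.txt")
with open(src, "w") as f:
    f.write("hello")
key = "abc123"
cas.refresh()             # list once at the start
before = cas.has(key)     # answered from the cached listing
cas.store(key, src)
after = cas.has(key)
```

One trade-off of the list step is staleness: in this sketch, the cached listing would miss objects that another process uploads after `refresh()` runs.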

I am not sure the above would actually fit well enough into the design of targets.

For a lightweight CAS, a custom tar_format() could be the vehicle, but with some support that avoids the need for users to micromanage key files and their hashes.

I have not decided on a direction yet.
