[CAS] LLVMCAS implementation #68448
@llvm/pr-subscribers-platform-windows @llvm/pr-subscribers-llvm-support
Changes
Adds Content Addressable Storage implementation for LLVM, which includes:
Patch is 383.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/68448.diff
59 Files Affected:
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 103c08ffbe83b38..c5f36daa0223ad3 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -758,6 +758,18 @@ option (LLVM_ENABLE_SPHINX "Use Sphinx to generate llvm documentation." OFF)
option (LLVM_ENABLE_OCAMLDOC "Build OCaml bindings documentation." ON)
option (LLVM_ENABLE_BINDINGS "Build bindings." ON)
+if(UNIX AND CMAKE_SIZEOF_VOID_P GREATER_EQUAL 8)
+ set(LLVM_ENABLE_ONDISK_CAS_default ON)
+else()
+ set(LLVM_ENABLE_ONDISK_CAS_default OFF)
+endif()
+option(LLVM_ENABLE_ONDISK_CAS "Build OnDiskCAS." ${LLVM_ENABLE_ONDISK_CAS_default})
+option(LLVM_CAS_ENABLE_REMOTE_CACHE "Build remote CAS service" OFF)
+if(LLVM_CAS_ENABLE_REMOTE_CACHE)
+ include(FindGRPC)
+endif()
+
+
set(LLVM_INSTALL_DOXYGEN_HTML_DIR "${CMAKE_INSTALL_DOCDIR}/llvm/doxygen-html"
CACHE STRING "Doxygen-generated HTML documentation install directory")
set(LLVM_INSTALL_OCAMLDOC_HTML_DIR "${CMAKE_INSTALL_DOCDIR}/llvm/ocaml-html"
diff --git a/llvm/docs/ContentAddressableStorage.md b/llvm/docs/ContentAddressableStorage.md
new file mode 100644
index 000000000000000..4f2d9a6a3a91857
--- /dev/null
+++ b/llvm/docs/ContentAddressableStorage.md
@@ -0,0 +1,120 @@
+# Content Addressable Storage
+
+## Introduction to CAS
+
+Content Addressable Storage, or `CAS`, is a storage system that assigns
+unique addresses to the data stored. It is very useful for data deduplication
+and creating unique identifiers.
+
+Unlike other kinds of storage systems, such as file systems, CAS is immutable.
+It is more reliable to model a computation when representing the inputs and
+outputs of the computation using objects stored in CAS.
+
+The basic unit of the CAS library is a CASObject, which contains:
+
+* Data: arbitrary data
+* References: references to other CASObjects
+
+It can be conceptually modeled as something like:
+
+```
+struct CASObject {
+ ArrayRef<char> Data;
+ ArrayRef<CASObject*> Refs;
+};
+```
+
+This abstraction allows simple composition of CASObjects into a DAG to
+represent complicated data structures while still allowing data deduplication.
+Note that you can compare two DAGs by just comparing the CASObject hashes of
+the two root nodes.
+
+
+
+## LLVM CAS Library User Guide
+
+The CAS-like storage provided in LLVM is `llvm::cas::ObjectStore`.
+To reference a CASObject, there are a few different abstractions provided
+with different trade-offs:
+
+### ObjectRef
+
+`ObjectRef` is a lightweight reference to a CASObject stored in the CAS.
+This is the most commonly used abstraction and it is cheap to copy/pass
+along. It has the following properties:
+
+* `ObjectRef` is only meaningful within the `ObjectStore` that created the ref.
+`ObjectRef`s created by different `ObjectStore`s cannot be cross-referenced or
+compared.
+* `ObjectRef` doesn't guarantee the existence of the CASObject it points to. An
+explicit load is required before accessing the data stored in the CASObject.
+This load can also fail, for reasons like but not limited to: object does
+not exist, corrupted CAS storage, operation timeout, etc.
+* If two `ObjectRef`s are equal, it is guaranteed that the objects they point
+to (if they exist) are identical. If they are not equal, the underlying objects
+are guaranteed to be different.
+
+### ObjectProxy
+
+`ObjectProxy` represents a loaded CASObject. With an `ObjectProxy`, the
+underlying stored data and references can be accessed without the need
+of error handling. The class APIs also provide convenient methods to
+access underlying data. The lifetime of the underlying data is equal to
+the lifetime of the instance of `ObjectStore` unless explicitly copied.
+
+### CASID
+
+`CASID` is the hash identifier for CASObjects. It owns the underlying
+storage for hash value so it can be expensive to copy and compare depending
+on the hash algorithm. `CASID` is generally only useful in rare situations
+like printing raw hash value or exchanging hash values between different
+CAS instances with the same hashing schema.
+
+### ObjectStore
+
+`ObjectStore` is the CAS-like object storage. It provides APIs to save
+and load CASObjects, for example:
+
+```
+ObjectRef A, B, C;
+Expected<ObjectRef> Stored = ObjectStore.store("data", {A, B});
+Expected<ObjectProxy> Loaded = ObjectStore.getProxy(C);
+```
+
+It also provides APIs to convert between `ObjectRef`, `ObjectProxy` and
+`CASID`.
+
+
+
+## CAS Library Implementation Guide
+
+The LLVM ObjectStore APIs are designed so that it is easy to add
+customized CAS implementations that are interchangeable with the builtin
+CAS implementations.
+
+To add your own implementation, you just need to add a subclass of
+`llvm::cas::ObjectStore` and implement all its pure virtual methods.
+To be interchangeable with LLVM ObjectStore, the new CAS implementation
+needs to conform to the following contracts:
+
+* Different CASObjects stored in the ObjectStore need to have different hashes
+and result in different `ObjectRef`s. Conversely, the same CASObject should
+have the same hash and the same `ObjectRef`. Note two different CASObjects with
+identical data but different references are considered different objects.
+* `ObjectRef`s are comparable within the same `ObjectStore` instance, and can
+be used to determine the equality of the underlying CASObjects.
+* The loaded objects from the ObjectStore need to have a lifetime at
+least as long as the ObjectStore itself.
+
+If not specified, the behavior can be implementation defined. For example,
+`ObjectRef` can be used to point to a loaded CASObject so
+`ObjectStore` never fails to load. It is also legal to use a stricter model
+than required. For example, an `ObjectRef` that can be used to compare
+objects between different `ObjectStore` instances is legal, but users
+of the ObjectStore should not depend on this behavior.
+
+For CAS library implementers, there is also an `ObjectHandle` class that
+is an internal representation of a loaded CASObject reference.
+`ObjectProxy` is just a pair of `ObjectHandle` and `ObjectStore`, because
+just like `ObjectRef`, `ObjectHandle` is only useful when paired with
+the ObjectStore that knows about the loaded CASObject.
diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst
index 3a1d1665be439e2..ddd5ffb10c6ac85 100644
--- a/llvm/docs/Reference.rst
+++ b/llvm/docs/Reference.rst
@@ -15,6 +15,7 @@ LLVM and API reference documentation.
BranchWeightMetadata
Bugpoint
CommandGuide/index
+ ContentAddressableStorage
ConvergenceAndUniformity
ConvergentOperations
Coroutines
@@ -228,3 +229,6 @@ Additional Topics
:doc:`ConvergenceAndUniformity`
A description of uniformity analysis in the presence of irreducible
control flow, and its implementation.
+
+:doc:`ContentAddressableStorage`
+ A reference guide for using LLVM's CAS library.
diff --git a/llvm/include/llvm/ADT/TrieRawHashMap.h b/llvm/include/llvm/ADT/TrieRawHashMap.h
new file mode 100644
index 000000000000000..607f64924e75d6c
--- /dev/null
+++ b/llvm/include/llvm/ADT/TrieRawHashMap.h
@@ -0,0 +1,398 @@
+//===- TrieRawHashMap.h -----------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_ADT_TRIERAWHASHMAP_H
+#define LLVM_ADT_TRIERAWHASHMAP_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Casting.h"
+#include <atomic>
+#include <optional>
+
+namespace llvm {
+
+class raw_ostream;
+
+/// TrieRawHashMap - is a lock-free thread-safe trie that can be used to
+/// store/index data based on a hash value. It can be customized to work with
+/// any hash algorithm or store any data.
+///
+/// Data structure:
+/// Data node stored in the Trie contains both hash and data:
+/// struct {
+/// HashT Hash;
+/// DataT Data;
+/// };
+///
+/// Data is stored/indexed via a prefix tree, where each node in the tree can be
+/// either the root, a sub-trie or a data node. Assuming a 4-bit hash and two
+/// data objects {0001, A} and {0100, B}, it can be stored in a trie
+/// (assuming Root has 2 bits, SubTrie has 1 bit):
+/// +--------+
+/// |Root[00]| -> {0001, A}
+/// | [01]| -> {0100, B}
+/// | [10]| (empty)
+/// | [11]| (empty)
+/// +--------+
+///
+/// Inserting a new object {0010, C} will result in:
+/// +--------+ +----------+
+/// |Root[00]| -> |SubTrie[0]| -> {0001, A}
+/// | | | [1]| -> {0010, C}
+/// | | +----------+
+/// | [01]| -> {0100, B}
+/// | [10]| (empty)
+/// | [11]| (empty)
+/// +--------+
+/// Note object A is sunk down to a sub-trie during the insertion. All the
+/// nodes are inserted through compare-exchange to ensure the trie stays
+/// thread-safe and lock-free.
+///
+/// To find an object in the trie, walk the tree with prefix of the hash until
+/// the data node is found. Then the hash is compared with the hash stored in
+/// the data node to see if it is the same object.
+///
+/// Hash collision is not allowed so it is recommended to use trie with a
+/// "strong" hashing algorithm. A well-distributed hash can also result in
+/// better performance and memory usage.
+///
+/// It currently does not support iteration and deletion.
+
+/// Base class for a lock-free thread-safe hash-mapped trie.
+class ThreadSafeTrieRawHashMapBase {
+public:
+ static constexpr size_t TrieContentBaseSize = 4;
+ static constexpr size_t DefaultNumRootBits = 6;
+ static constexpr size_t DefaultNumSubtrieBits = 4;
+
+private:
+ template <class T> struct AllocValueType {
+ char Base[TrieContentBaseSize];
+ std::aligned_union_t<sizeof(T), T> Content;
+ };
+
+protected:
+ template <class T>
+ static constexpr size_t DefaultContentAllocSize = sizeof(AllocValueType<T>);
+
+ template <class T>
+ static constexpr size_t DefaultContentAllocAlign = alignof(AllocValueType<T>);
+
+ template <class T>
+ static constexpr size_t DefaultContentOffset =
+ offsetof(AllocValueType<T>, Content);
+
+public:
+ void operator delete(void *Ptr) { ::free(Ptr); }
+
+ LLVM_DUMP_METHOD void dump() const;
+ void print(raw_ostream &OS) const;
+
+protected:
+ /// Result of a lookup. Suitable for an insertion hint. Maybe could be
+ /// expanded into an iterator of sorts, but likely not useful (visiting
+ /// everything in the trie should probably be done some way other than
+ /// through an iterator pattern).
+ class PointerBase {
+ protected:
+ void *get() const { return I == -2u ? P : nullptr; }
+
+ public:
+ PointerBase() noexcept = default;
+ PointerBase(PointerBase &&) = default;
+ PointerBase(const PointerBase &) = default;
+ PointerBase &operator=(PointerBase &&) = default;
+ PointerBase &operator=(const PointerBase &) = default;
+
+ private:
+ friend class ThreadSafeTrieRawHashMapBase;
+ explicit PointerBase(void *Content) : P(Content), I(-2u) {}
+ PointerBase(void *P, unsigned I, unsigned B) : P(P), I(I), B(B) {}
+
+ bool isHint() const { return I != -1u && I != -2u; }
+
+ void *P = nullptr;
+ unsigned I = -1u;
+ unsigned B = 0;
+ };
+
+ /// Find the stored content with hash.
+ PointerBase find(ArrayRef<uint8_t> Hash) const;
+
+ /// Insert and return the stored content.
+ PointerBase
+ insert(PointerBase Hint, ArrayRef<uint8_t> Hash,
+ function_ref<const uint8_t *(void *Mem, ArrayRef<uint8_t> Hash)>
+ Constructor);
+
+ ThreadSafeTrieRawHashMapBase() = delete;
+
+ ThreadSafeTrieRawHashMapBase(
+ size_t ContentAllocSize, size_t ContentAllocAlign, size_t ContentOffset,
+ std::optional<size_t> NumRootBits = std::nullopt,
+ std::optional<size_t> NumSubtrieBits = std::nullopt);
+
+ /// Destructor, which asserts if there's anything to do. Subclasses should
+ /// call \a destroyImpl().
+ ///
+ /// \pre \a destroyImpl() was already called.
+ ~ThreadSafeTrieRawHashMapBase();
+ void destroyImpl(function_ref<void(void *ValueMem)> Destructor);
+
+ ThreadSafeTrieRawHashMapBase(ThreadSafeTrieRawHashMapBase &&RHS);
+
+ // Move assignment can be implemented in a thread-safe way if NumRootBits and
+ // NumSubtrieBits are stored inside the Root.
+ ThreadSafeTrieRawHashMapBase &
+ operator=(ThreadSafeTrieRawHashMapBase &&RHS) = delete;
+
+ // No copy.
+ ThreadSafeTrieRawHashMapBase(const ThreadSafeTrieRawHashMapBase &) = delete;
+ ThreadSafeTrieRawHashMapBase &
+ operator=(const ThreadSafeTrieRawHashMapBase &) = delete;
+
+ // Debug functions. Implementation details and not guaranteed to be
+ // thread-safe.
+ PointerBase getRoot() const;
+ unsigned getStartBit(PointerBase P) const;
+ unsigned getNumBits(PointerBase P) const;
+ unsigned getNumSlotUsed(PointerBase P) const;
+ std::string getTriePrefixAsString(PointerBase P) const;
+ unsigned getNumTries() const;
+ // Visit next trie in the allocation chain.
+ PointerBase getNextTrie(PointerBase P) const;
+
+private:
+ friend class TrieRawHashMapTestHelper;
+ const unsigned short ContentAllocSize;
+ const unsigned short ContentAllocAlign;
+ const unsigned short ContentOffset;
+ unsigned short NumRootBits;
+ unsigned short NumSubtrieBits;
+ struct ImplType;
+ // ImplPtr is owned by ThreadSafeTrieRawHashMapBase and needs to be freed in
+ // destroyImpl.
+ std::atomic<ImplType *> ImplPtr;
+ ImplType &getOrCreateImpl();
+ ImplType *getImpl() const;
+};
+
+/// Lock-free thread-safe hash-mapped trie.
+template <class T, size_t NumHashBytes>
+class ThreadSafeTrieRawHashMap : public ThreadSafeTrieRawHashMapBase {
+public:
+ using HashT = std::array<uint8_t, NumHashBytes>;
+
+ class LazyValueConstructor;
+ struct value_type {
+ const HashT Hash;
+ T Data;
+
+ value_type(value_type &&) = default;
+ value_type(const value_type &) = default;
+
+ value_type(ArrayRef<uint8_t> Hash, const T &Data)
+ : Hash(makeHash(Hash)), Data(Data) {}
+ value_type(ArrayRef<uint8_t> Hash, T &&Data)
+ : Hash(makeHash(Hash)), Data(std::move(Data)) {}
+
+ private:
+ friend class LazyValueConstructor;
+
+ struct EmplaceTag {};
+ template <class... ArgsT>
+ value_type(ArrayRef<uint8_t> Hash, EmplaceTag, ArgsT &&...Args)
+ : Hash(makeHash(Hash)), Data(std::forward<ArgsT>(Args)...) {}
+
+ static HashT makeHash(ArrayRef<uint8_t> HashRef) {
+ HashT Hash;
+ std::copy(HashRef.begin(), HashRef.end(), Hash.data());
+ return Hash;
+ }
+ };
+
+ using ThreadSafeTrieRawHashMapBase::operator delete;
+ using HashType = HashT;
+
+ using ThreadSafeTrieRawHashMapBase::dump;
+ using ThreadSafeTrieRawHashMapBase::print;
+
+private:
+ template <class ValueT> class PointerImpl : PointerBase {
+ friend class ThreadSafeTrieRawHashMap;
+
+ ValueT *get() const {
+ if (void *B = PointerBase::get())
+ return reinterpret_cast<ValueT *>(B);
+ return nullptr;
+ }
+
+ public:
+ ValueT &operator*() const {
+ assert(get());
+ return *get();
+ }
+ ValueT *operator->() const {
+ assert(get());
+ return get();
+ }
+ explicit operator bool() const { return get(); }
+
+ PointerImpl() = default;
+ PointerImpl(PointerImpl &&) = default;
+ PointerImpl(const PointerImpl &) = default;
+ PointerImpl &operator=(PointerImpl &&) = default;
+ PointerImpl &operator=(const PointerImpl &) = default;
+
+ protected:
+ PointerImpl(PointerBase Result) : PointerBase(Result) {}
+ };
+
+public:
+ class pointer;
+ class const_pointer;
+ class pointer : public PointerImpl<value_type> {
+ friend class ThreadSafeTrieRawHashMap;
+ friend class const_pointer;
+
+ public:
+ pointer() = default;
+ pointer(pointer &&) = default;
+ pointer(const pointer &) = default;
+ pointer &operator=(pointer &&) = default;
+ pointer &operator=(const pointer &) = default;
+
+ private:
+ pointer(PointerBase Result) : pointer::PointerImpl(Result) {}
+ };
+
+ class const_pointer : public PointerImpl<const value_type> {
+ friend class ThreadSafeTrieRawHashMap;
+
+ public:
+ const_pointer() = default;
+ const_pointer(const_pointer &&) = default;
+ const_pointer(const const_pointer &) = default;
+ const_pointer &operator=(const_pointer &&) = default;
+ const_pointer &operator=(const const_pointer &) = default;
+
+ const_pointer(const pointer &P) : const_pointer::PointerImpl(P) {}
+
+ private:
+ const_pointer(PointerBase Result) : const_pointer::PointerImpl(Result) {}
+ };
+
+ class LazyValueConstructor {
+ public:
+ value_type &operator()(T &&RHS) {
+ assert(Mem && "Constructor already called, or moved away");
+ return assign(::new (Mem) value_type(Hash, std::move(RHS)));
+ }
+ value_type &operator()(const T &RHS) {
+ assert(Mem && "Constructor already called, or moved away");
+ return assign(::new (Mem) value_type(Hash, RHS));
+ }
+ template <class... ArgsT> value_type &emplace(ArgsT &&...Args) {
+ assert(Mem && "Constructor already called, or moved away");
+ return assign(::new (Mem)
+ value_type(Hash, typename value_type::EmplaceTag{},
+ std::forward<ArgsT>(Args)...));
+ }
+
+ LazyValueConstructor(LazyValueConstructor &&RHS)
+ : Mem(RHS.Mem), Result(RHS.Result), Hash(RHS.Hash) {
+ RHS.Mem = nullptr; // Moved away, cannot call.
+ }
+ ~LazyValueConstructor() { assert(!Mem && "Constructor never called!"); }
+
+ private:
+ value_type &assign(value_type *V) {
+ Mem = nullptr;
+ Result = V;
+ return *V;
+ }
+ friend class ThreadSafeTrieRawHashMap;
+ LazyValueConstructor() = delete;
+ LazyValueConstructor(void *Mem, value_type *&Result, ArrayRef<uint8_t> Hash)
+ : Mem(Mem), Result(Result), Hash(Hash) {
+ assert(Hash.size() == sizeof(HashT) && "Invalid hash");
+ assert(Mem && "Invalid memory for construction");
+ }
+ void *Mem;
+ value_type *&Result;
+ ArrayRef<uint8_t> Hash;
+ };
+
+ /// Insert with a hint. Default-constructed hint will work, but it's
+ /// recommended to start with a lookup to avoid overhead in object creation
+ /// if it already exists.
+ pointer insertLazy(const_pointer Hint, ArrayRef<uint8_t> Hash,
+ function_ref<void(LazyValueConstructor)> OnConstruct) {
+ return pointer(ThreadSafeTrieRawHashMapBase::insert(
+ Hint, Hash, [&](void *Mem, ArrayRef<uint8_t> Hash) {
+ value_type *Result = nullptr;
+ OnConstruct(LazyValueConstructor(Mem, Result, Hash));
+ return Result->Hash.data();
+ }));
+ }
+
+ pointer insertLazy(ArrayRef<uint8_t> Hash,
+ function_ref<void(LazyValueConstructor)> OnConstruct) {
+ return insertLazy(const_pointer(), Hash, OnConstruct);
+ }
+
+ pointer insert(const_pointer Hint, value_type &&HashedData) {
+ return insertLazy(Hint, HashedData.Hash, [&](LazyValueConstructor C) {
+ C(std::move(HashedData.Data));
+ });
+ }
+
+ pointer insert(const_pointer Hint, const value_type &HashedData) {
+ return insertLazy(Hint, HashedData.Hash,
+ [&](LazyValueConstructor C) { C(HashedData.Data); });
+ }
+
+ pointer find(ArrayRef<uint8_t> Hash) {
+ assert(Hash.size() == std::tuple_size<HashT>::value);
+ return ThreadSafeTrieRawHashMapBase::find(Hash);
+ }
+
+ const_pointer find(ArrayRef<uint8_t> Hash) const {
+ assert(Hash.size() == std::tuple_size<HashT>::value);
+ return ThreadSafeTrieRawHashMapBase::find(Hash);
+ }
+
+ ThreadSafeTrieRawHashMap(std::optional<size_t> NumRootBits = std::nullopt,
+ std::optional<size_t> NumSubtrieBits = std::nullopt)
+ : ThreadSafeTrieRawHashMapBase(DefaultContentAllocSize<value_type>,
+ DefaultContentAllocAlign<value_type>,
+ DefaultContentOffset<value_type>,
+ NumRootBits, NumSubtrieBits) {}
+
+ ~ThreadSafeTrieRawHashMap() {
+ if constexpr (std::is_trivially_destructible<value_type>::value)
+ this->destroyImpl(nullptr);
+ else
+ this->destroyImpl(
+ [](void *P) { static_cast<value_type *>(P)->~value_type(); });
+ }
+
+ // Move constructor okay.
+ ThreadSafeTrieRawHashMap(ThreadSafeTrieRawHashMap...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
✅ With the latest revision this PR passed the Python code formatter.
(Be great if this could be broken up a bit more - like the TrieRawHashMap looks like it'd be a fine standalone patch to introduce the data structure and its unit tests - maybe similarly with the other VirtualOutputBackends patch, the raw_ostream_proxy might be separable (& it'd be nice if the other parts of the functionality were built up more incrementally too))
But maybe it's just not practical - it's unfortunate when things get to this state, though I understand the value of prototyping out of tree - it makes it hard to adequately review things when they come into the project this way.
Both of the things you mentioned above are separate commits in those PRs already and I will continue to break apart more functions if needed. I am happy to put up individual PRs if that is better for reviewers while shrinking this mega PR.
Ah, fair enough. Not ideal (it's still sort of one review for the whole set of changes & any updates get disjoint, because they've got to be stacked on top of the already complex stack, rather than added into the stack back where the original change was made, etc). Such is life - eventually we'll get some better dependent-patch review process I hope.
llvm/lib/Support/TrieRawHashMap.cpp
Outdated
if (Digit < 10)
  OS << char(Digit + '0');
else
  OS << char(Digit - 10 + 'a');
}
I think we already have abstractions for printing hex numbers - perhaps we could use those? (& they print whole hex numbers, which might remove the need for some of the printPrefix code too)
The complication here is that the prefix can have N bits, where N is not a multiple of 4 or 8. It was printing as much hex as possible, then printing the remaining bits in [] as binary. I changed it to print as many complete bytes as possible, so we can use the existing hex printing function.
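The scheme described in this reply can be sketched in a few lines. The following is only an illustration, not the code from the PR: `formatPrefix` is a hypothetical helper name, and it assumes the prefix starts at the most significant bit of the first hash byte. Complete bytes are emitted as two hex digits each, and any leftover bits (fewer than eight) are emitted in binary inside `[]`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not the function from the PR): format the first
// NumBits bits of Hash. Complete bytes become two hex digits each; the
// remaining bits (fewer than 8) are printed in binary inside brackets.
std::string formatPrefix(const std::vector<uint8_t> &Hash, unsigned NumBits) {
  assert(NumBits <= Hash.size() * 8 && "prefix longer than the hash");
  static const char HexDigits[] = "0123456789abcdef";
  std::string Out;

  // Complete bytes: high nibble first, then low nibble.
  unsigned NumBytes = NumBits / 8;
  for (unsigned I = 0; I < NumBytes; ++I) {
    Out += HexDigits[Hash[I] >> 4];
    Out += HexDigits[Hash[I] & 0xF];
  }

  // Leftover bits of the next byte, most significant bit first.
  unsigned Rest = NumBits % 8;
  if (Rest != 0) {
    Out += '[';
    for (unsigned Bit = 0; Bit < Rest; ++Bit)
      Out += ((Hash[NumBytes] >> (7 - Bit)) & 1) ? '1' : '0';
    Out += ']';
  }
  return Out;
}
```

With the hash bytes `0xAB 0xCD`, a 10-bit prefix is one complete byte plus the top two bits of `0xCD`, so this sketch yields `ab[11]`.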
Hello @cachemeifyoucan! Do you still plan on working on these PRs? There were some unaddressed comments/requests for changes left last year. +1 with @dwblaikie - this would need to be split in smaller patches, there's just too much code to review. Can you send a PR for the |
/// ObjectHandle encapsulates a *loaded* object in the CAS. You need one
/// of these to inspect the content of an object: to look at its stored
/// data and references.
class ObjectHandle : public ReferenceBase {
I just skimmed over the PR, but it is still unclear why ObjectHandle is needed. Is this just for semantic purposes? Why can't we just have ObjectRef?
The current intention for the ObjectHandle class is that it is only exposed to the CAS implementation. There is a documentation file https://github.com/llvm/llvm-project/blob/281d1781dc4cb9edd457600b5133afcf25bda544/llvm/docs/ContentAddressableStorage.md that explains how to use the types (together with doxygen comments).
FWIW we built something like this a while ago: https://github.com/legion-labs/legion/tree/main/crates/lgn-content-store
Hi @aganea Thanks for helping with the review! I was distracted by other obligations and was temporarily unable to follow up on the reviews, but I definitely intend to pick this up ASAP. Note this PR was created just as we transitioned to GitHub PRs, when there wasn't a clear workflow for handling a big PR like this. I was breaking up the big commit into smaller reviews (like #69528) while using this PR to provide a larger context. I might take some time to rewrite this PR using
Cool! Glad to see more community members are doing something similar and we hope we can work together!
Is there any progress on this?
Implement TrieRawHashMap, which stores objects in a trie based on the hash of the object. The user needs to supply the hashing function and guarantee the uniqueness of the hash for the objects to be inserted. Hash collisions are not supported.
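To make the insertion behaviour from the header comment concrete, here is a minimal single-threaded sketch. It is not the PR's implementation: `ToyHashTrie` and all of its members are hypothetical names, hashes are small integers rather than byte arrays, and plain pointers stand in for the compare-exchange operations the real data structure uses. It only shows how a root indexed by the leading hash bits grows a sub-trie when two different hashes land in the same slot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy, single-threaded: each slot is empty, holds one
// (hash, value) leaf, or points to a sub-trie indexed by the next hash bits.
class ToyHashTrie {
  struct Node;
  struct Leaf {
    uint32_t Hash;
    std::string Value;
  };
  struct Slot {
    std::unique_ptr<Leaf> Data;
    std::unique_ptr<Node> Sub;
  };
  struct Node {
    unsigned StartBit, NumBits;
    std::vector<Slot> Slots;
    Node(unsigned Start, unsigned Bits)
        : StartBit(Start), NumBits(Bits), Slots(std::size_t(1) << Bits) {}
  };

  unsigned HashBits, SubtrieBits;
  Node Root;
  unsigned NumSubtries = 0;

  // Slot index: NumBits bits of Hash starting at StartBit, counted from the
  // most significant of the HashBits bits.
  unsigned index(uint32_t Hash, const Node &N) const {
    return (Hash >> (HashBits - N.StartBit - N.NumBits)) &
           ((1u << N.NumBits) - 1);
  }

public:
  ToyHashTrie(unsigned HashBits, unsigned RootBits, unsigned SubtrieBits)
      : HashBits(HashBits), SubtrieBits(SubtrieBits), Root(0, RootBits) {
    assert(HashBits >= 1 && HashBits <= 16 && SubtrieBits >= 1);
    assert(RootBits >= 1 && RootBits <= HashBits);
  }

  // Returns the stored value: the existing one if Hash is already present.
  const std::string &insert(uint32_t Hash, std::string Value) {
    assert((Hash >> HashBits) == 0 && "hash wider than HashBits");
    Node *N = &Root;
    for (;;) {
      Slot &S = N->Slots[index(Hash, *N)];
      if (S.Sub) { // Descend into an existing sub-trie.
        N = S.Sub.get();
        continue;
      }
      if (!S.Data) { // Empty slot: store the new leaf here.
        S.Data = std::make_unique<Leaf>(Leaf{Hash, std::move(Value)});
        return S.Data->Value;
      }
      if (S.Data->Hash == Hash)
        return S.Data->Value;
      // A different hash owns the slot: sink it into a new sub-trie, retry.
      unsigned Start = N->StartBit + N->NumBits;
      assert(Start < HashBits && "distinct hashes must differ in some bit");
      auto Sub =
          std::make_unique<Node>(Start, std::min(SubtrieBits, HashBits - Start));
      Sub->Slots[index(S.Data->Hash, *Sub)].Data = std::move(S.Data);
      S.Sub = std::move(Sub);
      ++NumSubtries;
      N = S.Sub.get();
    }
  }

  // Walk the prefix, then compare the full hash at the leaf.
  const std::string *find(uint32_t Hash) const {
    const Node *N = &Root;
    for (;;) {
      const Slot &S = N->Slots[index(Hash, *N)];
      if (S.Sub) {
        N = S.Sub.get();
        continue;
      }
      return (S.Data && S.Data->Hash == Hash) ? &S.Data->Value : nullptr;
    }
  }

  unsigned numSubtries() const { return NumSubtries; }
};
```

Constructed as `ToyHashTrie(4, 2, 1)`, this reproduces the example from the header comment: inserting `0001` and `0100` fills two root slots, and inserting `0010` afterwards sinks `0001` into a one-bit sub-trie under root slot `00`.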
Add llvm::cas::ObjectStore abstraction and InMemoryCAS as an in-memory CAS object store implementation.
The ObjectStore models its objects as:
* Content: An array of bytes for the data to be stored.
* Refs: An array of references to other objects in the ObjectStore.
Each CAS Object can be identified with a unique ID/Hash.
ObjectStore supports the following general actions:
* Expected<ID> store(Content, ArrayRef<Ref>)
* Expected<Ref> get(ID)
It also introduces the following types to interact with a CAS ObjectStore:
* CASID: Hash representation for a CAS Object with its context to help print/compare CASIDs.
* ObjectRef: A light-weight ref for an object in the ObjectStore. It is implementation defined so it can be optimized for read/store/references depending on the implementation.
* ObjectHandle: A CAS internal light-weight handle to a loaded object in the ObjectStore. Underlying data for the object is guaranteed to be available and no error handling is required to access data. This is not exposed to the users of CAS from ObjectStore APIs.
* ObjectProxy: A proxy for the users of CAS to interact with the data inside a CAS Object. It bundles an ObjectHandle and an ObjectStore instance.
Differential Revision: https://reviews.llvm.org/D133716
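The object model in this commit message (content plus refs, with identical objects sharing one identity) can be illustrated with a small in-memory toy. Everything below is hypothetical and is not the `llvm::cas` API: `ToyRef`, `ToyObject`, and `ToyObjectStore` are made-up names, and the store deduplicates by comparing full content instead of hashing, which sidesteps hash collisions at the cost of realism.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model, not the llvm::cas API.
struct ToyRef {
  std::size_t Index; // Only meaningful inside the store that produced it.
  friend bool operator==(ToyRef A, ToyRef B) { return A.Index == B.Index; }
  friend bool operator<(ToyRef A, ToyRef B) { return A.Index < B.Index; }
};

struct ToyObject {
  std::string Data;
  std::vector<ToyRef> Refs;
};

class ToyObjectStore {
  using Key = std::pair<std::string, std::vector<ToyRef>>;
  std::map<Key, std::size_t> IndexByContent; // (data, refs) -> object index
  std::vector<ToyObject> Objects;

public:
  // Store an object, or return the existing ref when an identical
  // (data, refs) pair was stored before.
  ToyRef store(std::string Data, std::vector<ToyRef> Refs) {
    for (ToyRef R : Refs)
      assert(R.Index < Objects.size() && "ref does not belong to this store");
    Key K{std::move(Data), std::move(Refs)};
    auto It = IndexByContent.find(K);
    if (It != IndexByContent.end())
      return ToyRef{It->second};
    std::size_t NewIndex = Objects.size();
    Objects.push_back(ToyObject{K.first, K.second});
    IndexByContent.emplace(std::move(K), NewIndex);
    return ToyRef{NewIndex};
  }

  // A real CAS load can fail for many reasons; this toy only fails for
  // refs it has never handed out.
  std::optional<ToyObject> load(ToyRef Ref) const {
    if (Ref.Index >= Objects.size())
      return std::nullopt;
    return Objects[Ref.Index];
  }
};
```

In this sketch, storing `"data"` with refs `{A, B}` twice returns equal refs, while storing `"data"` with refs `{A}` returns a different one, which matches the rule that identical data with different references is a different object.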
Add a parameter to the file lock API to allow exclusive file locks. Both Unix and Windows support locking a file exclusively for write by one process, and LLVM OnDiskCAS uses an exclusive file lock to coordinate CAS creation.
Add current downstream cas API and implementation that includes OnDiskCAS implementation, different level of abstractions for CAS, different utilities.
Allow loading external CAS implementation via PluginCAS. In this patch, it adds:
* C APIs that can be implemented by plugin, from which LLVM can load the dylib to use external CAS implementation
* A PluginCAS, that implements vending external CAS implementation as llvm ObjectStore and ActionCache class.
* A libCASPluginTest dylib, that provides example external CAS implementation that wraps LLVM CAS for testing purpose.
* Add a unified way to load external CAS implementation.
Most content in this PR has been split into the following stacked PRs:
Going to close this PR once everything is moved. Please head to the new PRs for easier review. Thanks.
Adds Content Addressable Storage implementation for LLVM, which includes:
It also comes with various levels of abstraction for creating, querying, and chaining CAS instances.