Add sha256a #329
Co-authored-by: Volker Mische <[email protected]>
It looks good to me, but I'd like to also have an approval from @rvagg before merging.
Ping @rvagg :)
Wild, I think we're stretching the definition of a "hash function" with this one if we use a classic fn(content)=digest definition.
What's the "content" in this case? If you used this in a CID, what would that mean? It's got some pretty big downsides if used in place of a standard cryptographic hash function: if you broke up content, did a sha256 across each chunk, then combined them with sha256a, you'd have a "digest" that's valid for all possible ordered recombinations of those chunks.
I'm finding it hard to say no, but this is not a comfortable fit I think.
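To make the recombination concern concrete, here is a minimal Python sketch. It assumes the combining step treats each sha2-256 digest as a 256-bit big-endian integer and adds them modulo 2^256; that encoding is an assumption for illustration only, and the helper names (`sha256_int`, `summed_digest`) are hypothetical, so the linked specification remains the authority on the exact arithmetic.

```python
import hashlib
from itertools import permutations

# Assumption: digests are added as 256-bit integers, wrapping on overflow.
MOD = 1 << 256


def sha256_int(chunk: bytes) -> int:
    """Hash one chunk with sha2-256 and read the digest as a big-endian integer."""
    return int.from_bytes(hashlib.sha256(chunk).digest(), "big")


def summed_digest(chunks) -> bytes:
    """Sum the per-chunk digests; addition is commutative, so order is lost."""
    total = 0
    for chunk in chunks:
        total = (total + sha256_int(chunk)) % MOD
    return total.to_bytes(32, "big")


chunks = [b"alpha", b"beta", b"gamma"]

# Every ordering of the chunks gives the same summed digest...
summed = {summed_digest(order) for order in permutations(chunks)}

# ...whereas plain sha2-256 over the concatenated bytes distinguishes each ordering.
plain = {hashlib.sha256(b"".join(order)).digest() for order in permutations(chunks)}

print(len(summed), len(plain))
```

Under these assumptions the summed digest cannot tell the six orderings apart, which is the property being flagged for content addressing.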
All of the data the sum of which is this hash.
Not sure, this is not the intention. The goal here is to use this as a way to have upgradability if we need to use another ahash function (using something other than sha256 under the hood).
This is actually the exact reason why we are using it (see our Recon spec, or the "Range-Based Set Reconciliation and Authenticated Set Representations" [arXiv:2212.13567] paper where the idea comes from).
IIUC the distinction here is that most hash functions (i.e. all of the ones currently covered by multihash) take in bytes and spit out a hash. However, it seems like sha256a takes a list of bytes and spits out a hash, which is a different abstraction. While you could extract the part of sha256a that does the bytes -> hash mapping, that would be the part that divides up a byte array into 32-byte chunks and sums them up. That part doesn't have much to do with sha256, which is largely what makes this awkward.
@rvagg @vmx is the guidance in use for multihash still https://github.com/multiformats/multihash/tree/381cb3310e42d2263130781ab6fd559a46d40ab8/#non-cryptographic-hash-functions? Since collisions are an intentional part of the design of
@aschmahmann yeah, it's a bit ambiguous in that it builds on a cryptographic hash function but then undoes some of the guarantees provided! The text in your link that's probably most relevant is:
By that definition, if this isn't intended for use as a CID for addressing (acknowledging that there's even ambiguity in how CIDs get used in the wild, so even that isn't a clear demarcation!), then switching it to Would you mind switching to the
Makes sense, updated to
This PR adds sha256a which is an associative hash function. It's defined as the sum of multiple sha2-256 hashes.
You can read the specification here.
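As a rough illustration of "the sum of multiple sha2-256 hashes", the Python sketch below hashes each item with sha2-256 and adds the digests. The choice of 256-bit big-endian integers with wrap-around addition, and the names `leaf`, `combine`, and `sha256a_sketch`, are assumptions made for this example and are not taken from the specification, which may define the word size and byte order differently.

```python
import hashlib

# Assumption: digests are combined by integer addition modulo 2**256.
MOD = 1 << 256


def leaf(item: bytes) -> int:
    """sha2-256 of a single item, read as a big-endian integer."""
    return int.from_bytes(hashlib.sha256(item).digest(), "big")


def combine(a: int, b: int) -> int:
    """Associative (and commutative) combining step."""
    return (a + b) % MOD


def sha256a_sketch(items) -> int:
    """Fold the per-item hashes together; an empty list yields 0."""
    acc = 0
    for item in items:
        acc = combine(acc, leaf(item))
    return acc


def encode(value: int) -> bytes:
    """Render the accumulated sum as a 32-byte digest."""
    return value.to_bytes(32, "big")


items = [b"key-1", b"key-2", b"key-3", b"key-4"]

# Hash of the whole range versus the combination of two sub-range hashes.
whole = sha256a_sketch(items)
left = sha256a_sketch(items[:2])
right = sha256a_sketch(items[2:])

print(encode(whole).hex())
print(encode(combine(left, right)).hex())
```

Because the combining step is associative, the hash of a range can be assembled from the hashes of its sub-ranges without rehashing the items, which appears to be what range-based set reconciliation relies on.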