Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

oed
Copy link

@oed oed commented Jun 13, 2023

This PR adds sha256a which is an associative hash function. It's defined as the sum of multiple sha2-256 hashes.
You can read the specification here.

@oed oed requested review from rvagg and vmx as code owners June 13, 2023 09:47
Co-authored-by: Volker Mische <[email protected]>
Copy link
Member

@vmx vmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me, but I'd like to also have an approval from @rvagg before merging.

@oed
Copy link
Author

oed commented Jun 27, 2023

Ping @rvagg :)

Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wild, I think we're stretching the definition of a "hash function" with this one if we use a classic fn(content)=digest definition. What's the "content" in this case? If you used this in a CID, what would that mean? It's got some pretty big downsides if used in place of a standard cryptographic hash function - if you broke up content, did a sha256 across each chunk, then combined them with sha256a then you'd have a "digest" that's valid for all possible ordered recombinations of those chunks.

I'm finding it hard to say no, but this is not a comfortable fit I think.

@oed
Copy link
Author

oed commented Jun 28, 2023

What's the "content" in this case?

All of the data the sum of which is this hash.

If you used this in a CID, what would that mean?

Not sure, this is not the intention. Goal here is to use this as a way to have upgradability if we need to use another ahash function (using something else than sha256 under the hood)

if you broke up content, did a sha256 across each chunk, then combined them with sha256a then you'd have a "digest" that's valid for all possible ordered recombinations of those chunks.

This is actually the exact reason why we are using it (see our Recon spec, or the "Range-Based Set Reconciliation and Authenticated Set Representations” [arXiv:2212.13567] paper where the idea comes from)

@aschmahmann
Copy link
Contributor

What's the "content" in this case?

All of the data the sum of which is this hash.

IIUC the distinction here is that most hash functions (i.e. all of the ones currently covered by multihash) take in bytes and spit out a hash. However, it seems like sha256a takes a list of bytes and spits out a hash which is a different abstraction.

While you could extract the part the part of sha256a that does bytes -> hash mapping, that would be the part that divides up a byte array into 32 bytes chunks and sums them up, which doesn't have much to do with sha256 which is largely what makes this awkward.

@aschmahmann
Copy link
Contributor

aschmahmann commented Jun 28, 2023

@rvagg @vmx is the guidance in use for multihash still https://github.com/multiformats/multihash/tree/381cb3310e42d2263130781ab6fd559a46d40ab8/#non-cryptographic-hash-functions? Since collisions are an intentional part of the design of sha256a it seems like hash would be the tag to use (leaving aside the other issue of this not quite fitting the bytes -> digest abstraction hashes normally use).

@rvagg
Copy link
Member

rvagg commented Jun 29, 2023

@aschmahmann yeah, it's a bit ambiguous in that it builds on a cryptographic hash function but then undoes some of the guarantees provided! The text in your link that's probably most relevant is:

are not suitable for content addressing systems

By that definition, if this isn't intended for use as a CID for addressing (acknowledging that there's even ambiguity in how CIDs get used in the wild, so even that isn't a clear demarcation!), then switching it to hash might be a better choice here to (a) avoid confusion and side-step ambiguity and (b) not set a problematic precedent, which is a common problem we have with the table.

Would you mind switching to the hash tag @oed? I think we'd be good with that.

@oed
Copy link
Author

oed commented Jun 29, 2023

Makes sense, updated to hash. Thanks @rvagg @aschmahmann

@rvagg rvagg merged commit 08deb26 into multiformats:master Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants