Add IPFS Kademlia DHT Specification #497
src/routing/kad-dht.md
DHT Servers SHOULD NOT return their own Peer ID in responses to `FIND_NODE`
queries. However, they MUST include information about the requester, if and
only if the requester is a DHT Server in its routing table and it is among the
`k` closest nodes to the target key.
In most cases, returning information about self (DHT Server) or requester (DHT Client) will be useless for the request, since the DHT Client already knows about both. The main argument of not including these addresses is to save bytes on the wire.
I don't see a use case where it would be useful that the DHT Server sends information about itself, since listen addresses and supported protocols should be exchanged using libp2p identify.
In some cases, it could be useful for the client (if it is a DHT Server) to know whether or not it is included in the Server's routing table, or which addresses are advertised. Information about the requester will essentially be present when a node is looking up itself when refreshing its routing table, and having this information provides a guarantee that the peer is actually routable. In a network that is large enough, it is very unlikely that the requester (being a DHT Server) will be among the `k` closest nodes to a key (other than self) being looked up.
Alternatively, it is possible to get this information by starting a fresh DHT Client and querying for the initial peer, but it is a bit cumbersome.
also see libp2p/specs#535
Now I think that including the DHT server is mostly useless and takes space on the wire. It isn't useful when looking for one specific key, or the X closest peers to a key.
The only use case I can see is when DHT servers try to see whether they are included in another DHT server's routing table, and what information is stored about them. This would be useful only for crawlers to gain statistics on the network. The exact same information can be retrieved using another peer id.
Hence I would suggest that the `closestPeers` field should contain neither self (the DHT server) nor the requester's information. If self and/or the requester are among the `k` closest peers to the requested key in the DHT server's routing table, then they should be replaced by the next closest peers, so that the response always contains `k` peers (if the network is large enough).
It means that the current implementations may not be compliant with this spec change, but it should be easy to address.
WDYT @achingbrain ?
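For illustration, the proposed selection rule could look roughly like the Go sketch below. It is not taken from go-libp2p-kad-dht: `PeerID`, `xorDistance` and `closestPeers` are hypothetical names, peer IDs are plain strings, and both peers and the target are hashed with SHA-256 before XOR comparison, which is an assumption of this sketch.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// PeerID is a simplified stand-in for a libp2p peer ID (hypothetical).
type PeerID string

// xorDistance XORs the SHA-256 digests of a and b. Comparing two results
// bytewise (big-endian) gives the Kademlia closeness ordering.
func xorDistance(a, b string) []byte {
	ha, hb := sha256.Sum256([]byte(a)), sha256.Sum256([]byte(b))
	d := make([]byte, len(ha))
	for i := range ha {
		d[i] = ha[i] ^ hb[i]
	}
	return d
}

// closestPeers returns up to k peers from routingTable closest to target.
// Self and the requester are dropped before truncating to k, so the next
// closest peers take their place in the response.
func closestPeers(routingTable []PeerID, target string, self, requester PeerID, k int) []PeerID {
	candidates := make([]PeerID, 0, len(routingTable))
	for _, p := range routingTable {
		if p == self || p == requester {
			continue
		}
		candidates = append(candidates, p)
	}
	sort.Slice(candidates, func(i, j int) bool {
		di := xorDistance(string(candidates[i]), target)
		dj := xorDistance(string(candidates[j]), target)
		return bytes.Compare(di, dj) < 0
	})
	if len(candidates) > k {
		candidates = candidates[:k]
	}
	return candidates
}

func main() {
	table := []PeerID{"self", "requester", "peerA", "peerB", "peerC", "peerD"}
	// Prints three peers, none of which is "self" or "requester".
	fmt.Println(closestPeers(table, "some-key", "self", "requester", 3))
}
```

Filtering before truncation is what keeps the response at `k` peers whenever the routing table holds enough other entries.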
Related issues:
I think this makes sense.
Is there an exception here if the server has the specifically requested ID in its peer store (e.g. to allow finding multiaddrs for non-DHT servers)? What if the specifically requested ID is the server's or the requester's own ID?
Good point! In this case, I suggest the DHT server returns `k+1` peers: the `k` closest DHT servers and the peer record matching the requested peer id (unless it corresponds to a DHT server).
@guillaumemichel iiuc this thread correctly, the suggestion is to make this change:
DHT Servers SHOULD NOT return their own Peer ID in responses to `FIND_NODE`
queries. However, they MUST include information about the requester, if and
only if the requester is a DHT Server in its routing table and it is among the
`k` closest nodes to the target key.
DHT Servers MUST NOT include the requester's Peer ID in responses to `FIND_NODE`
queries. DHT Servers SHOULD NOT include their own Peer ID in responses.
When the requester or self would be among the `k` closest nodes, the server
SHOULD return the next closest peer(s) to maintain a response of `k` peers
when possible.
Special case: When a specific Peer ID is requested and that peer is known but
is not the queried DHT Server, the server SHOULD include that peer's record even if it's
not among the `k` closest DHT servers.
Sgtm, go-libp2p-kad-dht remains technically compliant, because it implements MUST but just wanted to flag it does not (afaik) do the SHOULDs.
iiuc, the current go-libp2p-kad-dht behavior:
- ✅ Already excludes requester - The code has an `if targetPid != from` check
- ❌ Does NOT exclude self - afaik there is no explicit check to exclude the DHT server's own ID?
- ❌ No special handling for non-DHT peers - doesn't distinguish between DHT servers and clients in responses?
Should we approve spec clarification here + fix reference implementation of kad-dht in go and js for consistency?
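One possible reading of the special case in the suggested text, as a hedged Go sketch: the server starts from the `k` closest DHT servers (already filtered of self and requester) and appends the exactly requested peer when it is known. `findNodeResponse` and the `peerstore` map are hypothetical and do not mirror the go-libp2p-kad-dht or js-libp2p APIs. Skipping the extra record when the target is the requester is an assumption here, derived from the MUST NOT above.

```go
package main

import "fmt"

// PeerID is a simplified stand-in for a libp2p peer ID (hypothetical).
type PeerID string

// findNodeResponse assembles the peers returned for a FIND_NODE request.
//   - closest: the k closest DHT servers, already excluding self and requester.
//   - peerstore: hypothetical set of every peer the server knows, including
//     DHT clients that are not in the routing table.
//
// The result holds at most len(closest)+1 peers.
func findNodeResponse(closest []PeerID, peerstore map[PeerID]bool, target, self, requester PeerID) []PeerID {
	resp := append([]PeerID(nil), closest...)
	// Never add self or the requester, and only add peers that are known.
	if target == self || target == requester || !peerstore[target] {
		return resp
	}
	// Avoid duplicating the target if it is already one of the closest servers.
	for _, p := range resp {
		if p == target {
			return resp
		}
	}
	return append(resp, target)
}

func main() {
	closest := []PeerID{"serverA", "serverB"}
	peerstore := map[PeerID]bool{"serverA": true, "serverB": true, "clientX": true}
	// clientX is known but is not a DHT server, so it is appended as an extra record.
	fmt.Println(findNodeResponse(closest, peerstore, "clientX", "self", "requester"))
}
```

Because the wire format has a single peer list, the extra record simply rides along in the same slice as the closest servers.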
Tried to clarify in 23d4b91.
> iiuc, the current go-libp2p-kad-dht behavior:
> 1. ✅ Already excludes requester - The code has an `if targetPid != from` check

That is correct.

> 2. ❌ Does NOT exclude self - afaik there is no explicit check to exclude the DHT server's own ID?

Self is never returned by the routing table when looking for closest peers. See error if it happens here. The only situation where a provider record could be returned for self is if self's Peer ID is directly requested.

> 3. ❌ No special handling for non-DHT peers - doesn't distinguish between DHT servers and clients in responses?

The protobuf message only has a single field to return peers. We cannot modify it, since it would be a breaking change. The only DHT client that can ever be returned in a response is if the requested Peer ID is a perfect match of a peer that is in the peerstore, but isn't a DHT Server (in the routing table). See here.

> Should we approve spec clarification here + fix reference implementation of kad-dht in go and js for consistency?
will do for now, we should support this in generator at some point
Thank you @guillaumemichel for documenting this ❤️
Made a first pass and pushed small cosmetics (fetch my changes) + some questions / suggestions in comments inline.
I think it's ready for wider feedback and, if there are no concerns, for landing in a few weeks.
Thanks for starting here, it's much needed 🙏. Most of my comments fall into:
- What should be here vs libp2p kad spec
- If we've covered all the things we need to here
replace 'recommended' with appropriate SHOULD/MAY terms throughout the document for clarity and consistency with RFC 2119 conventions #497 (comment)
require TCP+Yamux as MUST, QUIC as SHOULD. require both TLS and Noise for DHT servers to ensure maximum interoperability. both go-libp2p and js-libp2p support both security protocols by default. #497 (comment)
Long overdue
Fixes #345