feat(block-sync): add support for follower mode #5556
```diff
 		go r.respondToPeer(msg, e.Src)
 	case *bcproto.BlockResponse:
-		go bcR.handlePeerResponse(msg, e.Src)
+		// adds block to the pool
+		go r.handlePeerResponse(msg, e.Src)
 	case *bcproto.StatusRequest:
 		// Send peer our state.
-		e.Src.TrySend(p2p.Envelope{
+		go e.Src.TrySend(p2p.Envelope{
 			ChannelID: BlocksyncChannel,
 			Message: &bcproto.StatusResponse{
-				Height: bcR.store.Height(),
-				Base:   bcR.store.Base(),
+				Height: r.store.Height(),
+				Base:   r.store.Base(),
 			},
 		})
```
I guess in libp2p these are already running in goroutines, since we have parallel reactor message processing, right? Is there a pro/con or reason to run these in goroutines instead of just synchronously?
In libp2p, yes, but not in comet-p2p.
On the other hand, even though libp2p is concurrent, should Receive() wait for another p2p request to be sent (TrySend)? I see the potential downside in spawning more goroutines and creating congestion on the Go scheduler, but here we have a reasonable number of goroutines IMO (one per incoming message).
I don't feel super strongly here, but I do think that handling all of these in their own goroutines is a bit of a premature optimization that may have an impact in other places where goroutine scheduling is precious, like we have seen with RPC requests.

> should Receive() wait for another p2p request to be sent (TrySend)

I totally agree that no, Receive() shouldn't have to wait for TrySend, or for loading a block from storage, etc., when these are already synchronized internally with locks, so there's no reason to block the shared receive func. But I'm also not sure spawning a goroutine for each request is the best way to avoid that (or that we even should avoid that/optimize, since we haven't really seen this having any impact yet — but maybe you have seen this, or have some data on it from testing?).
I think pushing these onto an internal queue and having a set number of workers pulling messages off and processing them would make more sense, so we don't have an unbounded number of goroutines here (I do see it's 1-1 with p2p messages, but what if a bunch of nodes that are far behind spam us with BlockRequest messages — that would potentially be a lot of goroutines to spin up).
This PR introduces follower mode for non-validating nodes that do not participate in block production. It addresses block discrepancies between validator and non-validator nodes running over libp2p and/or when block times are small.
The new option is configured in `config.toml` under the `blocksync` section.
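A minimal sketch of the setting; the key name comes from the PR's Changes list, while the section layout and the default value shown here are assumptions:

```toml
[blocksync]
# When true, this non-validator node polls peer statuses every second
# instead of every 10 seconds. Ignored if the node is a validator.
follower_mode = true
```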
Follower nodes request statuses (min-max available height) every second (regular nodes do this every 10 seconds), keeping pace with the validators.
If the node is a validator, this mode is ignored.
Note that since the consensus reactor is perpetually "waiting", follower nodes:
- do not report `cometbft_consensus_*` metrics (use `cometbft_consensus_blocksync_*` instead)

Changes
- Add the `blocksync.follower_mode` param
- Make the `blocksync` `Reactor.Receive()` logic non-blocking

Closes STACK-2047