[p2p/simulated, estimator] simulate bandwidth and message size constraints #1452
Conversation
This is SOOOOO cool
Left 2 small notes, otherwise getting very close
Looks like some failing linting (probably from a doc):
There was already a test for this, but only between two peers, which didn't trigger the issue above. I extended the tests with one-to-many and many-to-one sends, and also verified that pipelining works as expected (i.e. only transmission blocks the pipe, not latency).

Will review your updates shortly 👍
p2p/src/simulated/network.rs (outdated):

```rust
// Always update sender's egress (sender uses bandwidth regardless of
// delivery), this reserves the "pipe" for the duration of the transmission
self.peers.get_mut(&origin).unwrap().egress_available_at = transmission_complete_at;
```
While this makes the approximation better, it blocks all egress from a peer based on the slowest peer they are connecting to (i.e. their next egress_available_at is set to transmission_complete_at).
What I think may simplify this fix (and take it over the finish line) is to loosen the pairwise effective_bps approximation and instead [1] consume the sender egress, [2] put the message into "no mans land", [3] deliver to receiver at transmission_complete_at + latency + ingress_available_at + payload/effective_bps (recipient).
I think there is an open question on ordering there but find the current approach (in my local testing) adds too large of a latency penalty because of this "uniform recipient-biased emission".
I suppose an alternative could be a pairwise reservation of bandwidth that can't exceed the global effective_bps but figured that might be a tad harder.
Lmk what you think (happy to take it from here if you are done with refactors 😅 )
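A rough sketch of the three-step proposal above, with hypothetical names (`Peer`, `schedule`, the millisecond timestamps) that are illustrative only, not the crate's actual API:

```rust
// Illustrative only: not the crate's real types. Models [1] consume sender
// egress, [2] "no man's land" latency, [3] drain through the recipient's
// ingress pipe once it is free.
struct Peer {
    egress_available_at: u64,  // ms timestamp when the egress pipe frees up
    ingress_available_at: u64, // ms timestamp when the ingress pipe frees up
    egress_bps: u64,
    ingress_bps: u64,
}

fn schedule(
    now: u64,
    latency_ms: u64,
    payload_bytes: u64,
    sender: &mut Peer,
    receiver: &mut Peer,
) -> u64 {
    // [1] sender reserves its own pipe at full egress speed
    let start = now.max(sender.egress_available_at);
    let tx_ms = payload_bytes * 8 * 1000 / sender.egress_bps;
    let transmission_complete_at = start + tx_ms;
    sender.egress_available_at = transmission_complete_at;

    // [2] the message crosses the wire ("no man's land")
    let arrive = transmission_complete_at + latency_ms;

    // [3] receiver drains it at its own ingress speed once its pipe is free
    let rx_start = arrive.max(receiver.ingress_available_at);
    let rx_ms = payload_bytes * 8 * 1000 / receiver.ingress_bps;
    let delivered_at = rx_start + rx_ms;
    receiver.ingress_available_at = delivered_at;
    delivered_at
}
```

Note this sketch decouples the sender's egress reservation from the slowest recipient: egress_available_at only advances by the sender's own transmission time, never by a slow receiver's drain time.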
I think this will only solve one side of the issue: we will still have the problem of a slow sender blocking a fast receiver. The fundamental problem is that we are making static scheduling decisions at send time. That works for accurately modeling the send side (you can use your own egress bandwidth to reserve your side of the pipe), but it's not possible to model the receive side statically like this, because other competing sends affect the receiver. For the receive side we need to track the available capacity at any point in time and adjust dynamically. I don't think there's a way around this (despite my efforts to avoid it 😅), so I'll try to come up with something that doesn't blow up complexity too much.
Edit: actually reserving the send side of the pipe at full speed is also unrealistic because of TCP backpressure. But since we're going to have to dynamically reserve capacity on the receive side, we can use the same logic for the send side.
I finally decided to bite the bullet and implement a proper transfer scheduler for reserving capacities on both sides appropriately.

The scheduler uses a delta-based approach to track bandwidth usage changes over time, where positive deltas represent bandwidth allocation and negative deltas represent release. It handles scenarios like multiple concurrent transfers competing for bandwidth, enforcing both sender egress and receiver ingress bandwidth limits simultaneously by taking the minimum available bandwidth at each point in time.

It calculates optimal bandwidth reservations upfront when a transfer begins, creating a series of time-bounded allocations that adapt to the changing availability of bandwidth as other transfers start and complete. The algorithm merges sender and receiver schedules chronologically and iterates through the time windows between bandwidth change events, calculating how much data can be transferred in each window.

There is special handling for zero-bandwidth scenarios: when a sender has 0 bandwidth, transfers block indefinitely (simulating network congestion), while a receiver with 0 bandwidth still allows transfers to complete using only sender bandwidth (simulating one-way communication failure).

At this point I think the PR might, hilariously, have too many tests. Some of them may be redundant; I'll clean that up next as needed.
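A minimal sketch of the delta-based idea described above (illustrative only, not the PR's actual code): reservations store a positive delta at their start time and a negative delta at their end time, so a prefix sum over the deltas yields bandwidth usage at any instant, and the capacity available to a new transfer is the minimum of sender-side and receiver-side headroom.

```rust
use std::collections::BTreeMap;

// Illustrative sketch, not the PR's implementation. Deltas: +bps at the
// start of a reservation, -bps at its end; summing up to time t gives usage.
#[derive(Default)]
struct BandwidthSchedule {
    deltas: BTreeMap<u64, i64>, // time -> signed bandwidth delta (bps)
}

impl BandwidthSchedule {
    /// Reserve `bps` over the half-open window [start, end).
    fn reserve(&mut self, start: u64, end: u64, bps: i64) {
        *self.deltas.entry(start).or_insert(0) += bps; // allocation
        *self.deltas.entry(end).or_insert(0) -= bps;   // release
    }

    /// Bandwidth in use at time `t` (prefix sum of deltas up to `t`).
    fn used_at(&self, t: u64) -> i64 {
        self.deltas.range(..=t).map(|(_, d)| d).sum()
    }
}

/// Headroom for a new sender->receiver transfer at time `t`: the minimum of
/// the two sides' unused capacity.
fn available_at(
    sender: &BandwidthSchedule,
    sender_cap: i64,
    receiver: &BandwidthSchedule,
    receiver_cap: i64,
    t: u64,
) -> i64 {
    (sender_cap - sender.used_at(t)).min(receiver_cap - receiver.used_at(t))
}
```

A scheduler built on this would walk the merged delta timestamps of both sides (the "bandwidth change events") and integrate `available_at` over each window until the payload is fully transferred.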
Force-pushed from 446d709 to 1424bad.
This is phenomenal. Left 1 real question/suggestion: why do we take such care to support the 0 bps case? It seems we could prevent that config in the public interface (and I don't see value to supporting a dead channel).
Maybe I'm missing something?
```rust
//! `commonware-p2p::simulated` can be run deterministically when paired with `commonware-runtime::deterministic`.
//! This makes it possible to reproduce an arbitrary order of delivered/dropped messages with a given seed.
//!
//! # Bandwidth Simulation
```
:chefs-kiss:
p2p/src/simulated/network.rs (outdated):

```rust
    let receiver = self.peers.get(&recipient).unwrap();
    (receiver.ingress_bps, receiver.ingress_available_at)
};
let sender_has_bandwidth = sender_peer.egress.bandwidth_bps > 0;
```
I wonder how much handling we could remove if we required any bandwidth specification to take a non-zero BPS? That seems like a very reasonable expectation?
For completeness, I guess it is worth supporting this to ensure a sender burns their egress (may be useful modeling a sybil attack)? If we don't add a link, the message will never be sent to begin with.
That being said, I suppose we could model a sybil as infinite bandwidth rather than 0. So, I guess my point still stands (why do we support 0 bps as a config)?
We decided to keep the special case of 0 egress/ingress bandwidth, since it adds some functionality compared to removing the link altogether: for example, modeling a scenario where a firewall blocks incoming traffic while outgoing still works fine. All comments addressed.
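A tiny sketch of the asymmetric 0-bps semantics settled on above (names and function are illustrative, not the crate's API): a sender with 0 bps blocks forever, while a receiver with 0 bps still lets the transfer complete at the sender's rate, like a firewall dropping inbound traffic.

```rust
// Illustrative only, not the crate's API. Returns the transfer duration in
// ms, or None when the transfer blocks indefinitely.
fn transfer_ms(payload_bits: u64, sender_bps: u64, receiver_bps: u64) -> Option<u64> {
    if sender_bps == 0 {
        return None; // 0 egress: the message is never emitted
    }
    // 0 ingress: the transfer still completes, paced by sender bandwidth only
    let bps = if receiver_bps == 0 {
        sender_bps
    } else {
        sender_bps.min(receiver_bps)
    };
    Some(payload_bits * 1000 / bps)
}
```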
🚀
Codecov Report

```
@@           Coverage Diff            @@
##             main    #1452    +/-   ##
==========================================
+ Coverage   91.80%   91.97%   +0.17%
==========================================
  Files         280      281       +1
  Lines       70653    72307    +1654
==========================================
+ Hits        64865    66507    +1642
- Misses       5788     5800      +12
```

... and 12 files with indirect coverage changes.
This PR implements bandwidth-aware network simulation in two parts: the p2p/simulated network now supports bandwidth constraints with proper message transmission delays and queueing, and the estimator CLI/DSL has been extended to expose this functionality to users.
Closes #1407