Conversation

@nerodono (Contributor) commented Dec 8, 2025

Currently elfo-network screws up the order of events when the destination mailbox is full and responses are involved:

ctx.send_to(actor_b, First).await.unwrap(); // waits while actor_b's mailbox is full
ctx.respond(token, Response);               // may overtake First over the network

The response can arrive before the First message gets into the target's mailbox.
While this is mostly unnoticeable and arguably even correct (we probably need to specify explicitly which guarantees elfo makes), it can make various patterns hard or impossible to implement.


Additionally, this PR introduces an UNBOUNDED flag to NetworkEnvelope and uses it to send envelopes unboundedly; the previous implementation always sent boundedly. This change is backward and forward compatible.
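
For illustration only (the constant and function names below are hypothetical, not taken from this PR), such a flag can be carried in the frame header's flags field:

// Hypothetical sketch: carrying an UNBOUNDED bit in the frame header's
// `flags` field. Names are illustrative, not the PR's.
const FLAG_UNBOUNDED: u8 = 0b0001;

fn encode_flags(unbounded: bool) -> u8 {
    if unbounded { FLAG_UNBOUNDED } else { 0 }
}

fn is_unbounded(flags: u8) -> bool {
    // Peers that predate the flag never set this bit and ignore it when set,
    // which is what keeps such a change compatible in both directions.
    flags & FLAG_UNBOUNDED != 0
}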

@nerodono requested a review from loyd, December 8, 2025 10:46
@nerodono force-pushed the feat/strict-event-order branch 6 times, most recently from acf669e to 170da04, December 9, 2025 01:10
@nerodono force-pushed the feat/strict-event-order branch from 170da04 to 348f364, December 9, 2025 15:02
codecov bot commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 72.15190% with 44 lines in your changes missing coverage. Please review.

Files with missing lines            | Patch % | Lines
elfo-network/src/worker/mod.rs      | 74.19%  | 17 missing, 7 partials ⚠️
elfo-network/src/worker/flows_rx.rs | 65.51%  | 20 missing ⚠️


@nerodono marked this pull request as ready for review, December 9, 2025 16:58
//! ┌───────────────────────┬────┬─────────────────────┐
//! │ size of whole frame   │ 32 │                     │
//! ├───────────────────────┼────┤                     │
//! │ flags                 │ 4  │                     │ flags:

Collaborator:

Please add the flag to this comment.

Collaborator:

Anyway, the flag seems to be useless according to https://github.com/elfo-rs/elfo/pull/178/changes#r2645396209.
It will be here eventually, but in the current implementation, it's a bit confusing.

Contributor (author):

> but in the current implementation, it's a bit confusing

Why so? First of all, it's only logical for unbounded sends to be unbounded. Honestly, before diving deeper into elfo-network, I thought that unbounded sends over the network were also unbounded, so realizing that they aren't brought even more confusion 🤷. The current algorithm somewhat matches what you'd observe locally: bounded sends also postpone succeeding unbounded sends.
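
A fragment illustrating that local behaviour (a sketch inside some actor's handler, not a complete actor):

// The bounded send suspends while b's mailbox is full, so the unbounded
// send below cannot start earlier.
ctx.send_to(b, First).await.unwrap();
let _ = ctx.unbounded_send_to(b, Second);
// Second never overtakes First from the same sender.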

return None;
};

if let Err((token, envelope)) = flow.try_enqueue_response(token, envelope) {

Collaborator:

I don't like the composition of all this code. Even calling object.respond in make_envelope() smells bad, but enqueuing is much worse because of the other points in the code that follows (in do_handle_message). It makes things more error-prone: logic added later to do_handle_message will be forgotten here.

}

async fn push(&self, event: RxFlowEvent, routed: bool) -> bool {
// Sadly we live in rust, `EbrGuard: !Send`, thus writing

Collaborator:

EbrGuard: !Send is the main reason why fast and safe EBR is possible =)

Contributor (author):

The comment here is more about how much code we need to move around to avoid accidentally capturing EbrGuard, which would make the returned Future !Send. But it doesn't really matter; it's just a joke. I can remove it if you consider it inappropriate for the codebase.
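
A standalone illustration of that constraint (all names hypothetical; only the !Send-guard-across-await mechanics matter):

use std::future::Future;
use std::marker::PhantomData;

// Hypothetical stand-in for EbrGuard: a raw pointer makes it !Send.
struct Guard(PhantomData<*const ()>);

impl Guard {
    fn new() -> Self {
        Guard(PhantomData)
    }
}

async fn step() {}

fn assert_send<F: Future + Send>(_: F) {}

// Holding the guard across an .await captures it in the future's state,
// making the whole future !Send:
async fn holds_guard_across_await() {
    let _guard = Guard::new();
    step().await; // `_guard` is still alive here => the future is !Send
}

// Dropping the guard before awaiting keeps the future Send:
async fn drops_guard_before_await() {
    {
        let _guard = Guard::new();
        // ... use the guard synchronously ...
    } // guard dropped here
    step().await;
}

fn check() {
    // assert_send(holds_guard_across_await()); // would not compile
    assert_send(drops_guard_before_await());
}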

let guard = EbrGuard::new();
let object = ward!(self.ctx.book().get(self.actor_addr, &guard), return false);

object.unbounded_send(Addr::NULL, envelope)

Collaborator:

Actually, I don't see the point of doing it here (and in do_handle_message).

Any boundedly sent message before this one will postpone this code anyway.
It seems that the difference between calling unbounded_send vs send here is negligible.

I do understand why it would be helpful if we considered the sender's address (in order to preserve the guarantees of ctx.send(A); ctx.unbounded_send(B); on the sender's side).

Contributor (author):

The way push is written here is simply the way that unclogs the pusher queue fastest. The fact that any bounded send postpones succeeding unbounded sends makes the original problem even more visible, doesn't it? It would just increase latencies (i.e., the time a message stays in the pusher queue) for unbounded sends, which would rather ruin the author's intended concurrency, since every unbounded send would now take someone's place in the mailbox.

IMO, it's reasonable to have this distinction, since we buffer messages on the receiver side. That makes a clogged pusher queue more of a problem: delivered unboundedly, these messages at least have a chance to be handled, in contrast with staying in the pusher queue longer 🤷.
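
A standalone model of that distinction (hypothetical types; tokio channels stand in for the bounded mailbox and the unbounded lane):

use std::collections::VecDeque;
use tokio::sync::mpsc;

// Hypothetical stand-in for a queued network envelope.
struct Queued {
    bounded: bool,
    payload: u32,
}

// Model of the pusher-queue drain discussed above: bounded envelopes wait
// for mailbox capacity and stall everything behind them; unbounded ones
// are delivered immediately and don't consume a capacity slot.
async fn drain(
    mut queue: VecDeque<Queued>,
    mailbox: mpsc::Sender<u32>,            // bounded mailbox
    unbounded: mpsc::UnboundedSender<u32>, // unbounded lane
) {
    while let Some(item) = queue.pop_front() {
        if item.bounded {
            let _ = mailbox.send(item.payload).await; // may wait for capacity
        } else {
            let _ = unbounded.send(item.payload);     // never waits
        }
    }
}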

#[derive(Debug)]
pub(crate) struct NetworkEnvelope {
pub(crate) sender: NetworkAddr,
pub(crate) bounded: bool,

Collaborator:

Let's move it somewhere below, not between related fields.

}

#[derive(Debug)]
pub(super) enum RxFlowEvent {

Collaborator:

I would prefer to avoid using "Event" because events are messages (on a par with commands).

The "Message" variant is also confusing because a "Response" is also a message. You mean regular messages and requests here, right?

},
routed,
));
self.acquire_direct(!routed);

Collaborator:

I don't understand what's happening with flow control in these methods and why it differs between them.

Collaborator:

Why is it called here and not on the caller's side, like acquire_routed?

Don't you think it's confusing to have several places responsible for calling acquire_routed and acquire_direct?

() = pruned.request_to(master, Ack).resolve().await.unwrap();
_ = pruned.unbounded_send_to(pruned.addr(), TheResponse);
}));
// ^^ Pusher([MasterFill, BeforeResponse, Respond(TheResponse)])

Collaborator:

This is wrong: TheResponse never reaches the pusher's queue, only the mailbox.

@nerodono (Contributor, author) commented Dec 25, 2025:

Yeah, I just mixed up Ack and TheResponse here; it should actually be Respond(Ack) instead.

@loyd (Collaborator) commented Dec 26, 2025

I think there is a problem with the suggested solution. Let's consider the following example:

// A

Start => {
    ctx.send_to(B, StartSpamming).await;
    ctx.request_to(C, FetchData).await
}

// B

StartSpamming => {
    loop {
        ctx.send_to(A, SpamMessage).await;
    }
}

// C

(FetchData, token) => {
    ctx.respond(token, Data);
}

Now, locally or remotely, this never freezes.

With the suggested patch, it will most likely freeze if A and C are located on different nodes: presumably, C's response now has to wait for capacity in A's mailbox, which B keeps full, while A cannot drain the mailbox because it's blocked awaiting that very response.
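
For illustration, a minimal standalone model of that freeze (hypothetical names; tokio channels stand in for A's mailbox and the pending response):

use std::time::Duration;
use tokio::sync::{mpsc, oneshot};

#[tokio::main]
async fn main() {
    // A's mailbox, already full of B's spam.
    let (mailbox_tx, _mailbox_rx) = mpsc::channel::<u32>(1);
    mailbox_tx.send(0).await.unwrap();

    let (response_tx, response_rx) = oneshot::channel::<u32>();

    // With responses ordered behind bounded sends, delivering C's response
    // first requires mailbox capacity...
    let deliver = tokio::spawn(async move {
        mailbox_tx.send(1).await.unwrap(); // blocks forever: the mailbox is full
        response_tx.send(42).unwrap();     // never reached
    });

    // ...but A can't drain its mailbox: it's blocked awaiting the response.
    tokio::select! {
        _ = response_rx => println!("got response"),
        _ = tokio::time::sleep(Duration::from_secs(1)) => {
            println!("freeze: the response never arrives");
        }
    }
    deliver.abort();
}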

