Conversation

@nerodono (Contributor) commented Dec 8, 2025

Currently elfo-network screws up the order of events when the destination mailbox is full and responses are involved:

ctx.send_to(actor_b, First).await.unwrap(); // waits while actor_b's mailbox is full
ctx.respond(token, Response);               // may overtake First over the network

The response can arrive before the First message gets into the target's mailbox.
While this is mostly unnoticeable and arguably even correct (we probably need to specify explicitly which guarantees elfo makes), it can make various patterns hard or impossible to implement.


Additionally, this PR introduces an UNBOUNDED flag to NetworkEnvelope and uses it to send envelopes unboundedly; the previous implementation always sent boundedly. This change is backward and forward compatible.
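
For illustration only (the constant and function names below are hypothetical, not taken from this PR), such a flag can be carried in the frame header's flags field:

// Hypothetical sketch: carrying an UNBOUNDED bit in the frame header's
// `flags` field. Names are illustrative, not the PR's.
const FLAG_UNBOUNDED: u8 = 0b0001;

fn encode_flags(unbounded: bool) -> u8 {
    if unbounded { FLAG_UNBOUNDED } else { 0 }
}

fn is_unbounded(flags: u8) -> bool {
    // Peers that predate the flag never set this bit and ignore it when set,
    // which is what keeps such a change compatible in both directions.
    flags & FLAG_UNBOUNDED != 0
}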

@nerodono requested a review from loyd, December 8, 2025 10:46
@nerodono force-pushed the feat/strict-event-order branch 6 times, most recently from acf669e to 170da04, December 9, 2025 01:10
@nerodono force-pushed the feat/strict-event-order branch from 170da04 to 348f364, December 9, 2025 15:02
codecov bot commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 72.15190% with 44 lines in your changes missing coverage. Please review.

Files with missing lines            | Patch % | Lines
elfo-network/src/worker/mod.rs      | 74.19%  | 17 missing, 7 partials ⚠️
elfo-network/src/worker/flows_rx.rs | 65.51%  | 20 missing ⚠️


@nerodono marked this pull request as ready for review, December 9, 2025 16:58
//! ┌───────────────────────┬────┬─────────────────────┐
//! │ size of whole frame   │ 32 │                     │
//! ├───────────────────────┼────┤                     │
//! │ flags                 │ 4  │                     │ flags:

Collaborator:

Please add the flag to this comment.

Collaborator:

Anyway, the flag seems to be useless according to https://github.com/elfo-rs/elfo/pull/178/changes#r2645396209.
It will be here eventually, but in the current implementation, it's a bit confusing.

Contributor (author):

> but in the current implementation, it's a bit confusing

Why so? First of all, it's only logical for unbounded sends to be unbounded. Honestly, before diving deeper into elfo-network, I thought that unbounded sends over the network were also unbounded, so realizing that they aren't brought even more confusion 🤷. The current algorithm somewhat matches what you'd observe locally: bounded sends also postpone succeeding unbounded sends.
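
A fragment illustrating that local behaviour (a sketch inside some actor's handler, not a complete actor):

// The bounded send suspends while b's mailbox is full, so the unbounded
// send below cannot start earlier.
ctx.send_to(b, First).await.unwrap();
let _ = ctx.unbounded_send_to(b, Second);
// Second never overtakes First from the same sender.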

return None;
};

if let Err((token, envelope)) = flow.try_enqueue_response(token, envelope) {

Collaborator:

I don't like the composition of all this code. Even calling object.respond in make_envelope() smells bad, but enqueuing is much worse because of the other points in the code that follows (in do_handle_message). It makes things more error-prone: logic added later to do_handle_message will be forgotten here.

}

async fn push(&self, event: RxFlowEvent, routed: bool) -> bool {
// Sadly we live in rust, `EbrGuard: !Send`, thus writing

Collaborator:

EbrGuard: !Send is the main reason why fast and safe EBR is possible =)

Contributor (author):

The comment here is more about how much code we need to move around to avoid accidentally capturing EbrGuard, which would make the returned Future !Send. But it doesn't really matter; it's just a joke. I can remove it if you consider it inappropriate for the codebase.
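
A standalone illustration of that constraint (all names hypothetical; only the !Send-guard-across-await mechanics matter):

use std::future::Future;
use std::marker::PhantomData;

// Hypothetical stand-in for EbrGuard: a raw pointer makes it !Send.
struct Guard(PhantomData<*const ()>);

impl Guard {
    fn new() -> Self {
        Guard(PhantomData)
    }
}

async fn step() {}

fn assert_send<F: Future + Send>(_: F) {}

// Holding the guard across an .await captures it in the future's state,
// making the whole future !Send:
async fn holds_guard_across_await() {
    let _guard = Guard::new();
    step().await; // `_guard` is still alive here => the future is !Send
}

// Dropping the guard before awaiting keeps the future Send:
async fn drops_guard_before_await() {
    {
        let _guard = Guard::new();
        // ... use the guard synchronously ...
    } // guard dropped here
    step().await;
}

fn check() {
    // assert_send(holds_guard_across_await()); // would not compile
    assert_send(drops_guard_before_await());
}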

let guard = EbrGuard::new();
let object = ward!(self.ctx.book().get(self.actor_addr, &guard), return false);

object.unbounded_send(Addr::NULL, envelope)

Collaborator:

Actually, I don't see the point of doing it here (and in do_handle_message).

Any boundedly sent message before this one will postpone this code anyway.
It seems that the difference between calling unbounded_send vs send here is negligible.

I do understand why it would be helpful if we considered the sender's address (in order to preserve the guarantees of ctx.send(A); ctx.unbounded_send(B); on the sender's side).

Contributor (author):

The way push is written here is simply the way that unclogs the pusher queue fastest. The fact that any bounded send postpones succeeding unbounded sends makes the original problem even more visible, doesn't it? It would just increase latencies (i.e., the time a message stays in the pusher queue) for unbounded sends, which would rather ruin the author's intended concurrency, since every unbounded send would now take someone's place in the mailbox.

IMO, it's reasonable to have this distinction, since we buffer messages on the receiver side. That makes a clogged pusher queue more of a problem: delivered unboundedly, these messages at least have a chance to be handled, in contrast with staying in the pusher queue longer 🤷.
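
A standalone model of that distinction (hypothetical types; tokio channels stand in for the bounded mailbox and the unbounded lane):

use std::collections::VecDeque;
use tokio::sync::mpsc;

// Hypothetical stand-in for a queued network envelope.
struct Queued {
    bounded: bool,
    payload: u32,
}

// Model of the pusher-queue drain discussed above: bounded envelopes wait
// for mailbox capacity and stall everything behind them; unbounded ones
// are delivered immediately and don't consume a capacity slot.
async fn drain(
    mut queue: VecDeque<Queued>,
    mailbox: mpsc::Sender<u32>,            // bounded mailbox
    unbounded: mpsc::UnboundedSender<u32>, // unbounded lane
) {
    while let Some(item) = queue.pop_front() {
        if item.bounded {
            let _ = mailbox.send(item.payload).await; // may wait for capacity
        } else {
            let _ = unbounded.send(item.payload);     // never waits
        }
    }
}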

#[derive(Debug)]
pub(crate) struct NetworkEnvelope {
pub(crate) sender: NetworkAddr,
pub(crate) bounded: bool,

Collaborator:

Let's move it somewhere below, not between related fields.

}

#[derive(Debug)]
pub(super) enum RxFlowEvent {

Collaborator:

I would prefer to avoid using "Event" because events are messages (on a par with commands).

The "Message" variant is also confusing because a "Response" is also a message. You mean regular messages and requests here, right?

},
routed,
));
self.acquire_direct(!routed);

Collaborator:

I don't understand what's happening with flow control in these methods and why it differs between them.

Collaborator:

Why is it called here and not on the caller's side, like acquire_routed?

Don't you think it's confusing to have several places responsible for calling acquire_routed and acquire_direct?

() = pruned.request_to(master, Ack).resolve().await.unwrap();
_ = pruned.unbounded_send_to(pruned.addr(), TheResponse);
}));
// ^^ Pusher([MasterFill, BeforeResponse, Respond(TheResponse)])

Collaborator:

This is wrong: TheResponse never reaches the pusher's queue, only the mailbox.

@nerodono (Contributor, author) commented Dec 25, 2025:

Yeah, I just mixed up Ack and TheResponse here; it should actually be Respond(Ack) instead.

@loyd (Collaborator) commented Dec 26, 2025

I think there is a problem with the suggested solution. Let's consider the following example:

// A

Start => {
    ctx.send_to(B, StartSpamming).await;
    ctx.request_to(C, FetchData).await
}

// B

StartSpamming => {
    loop {
        ctx.send_to(A, SpamMessage).await;
    }
}

// C

(FetchData, token) => {
    ctx.respond(token, Data);
}

Now, locally or remotely, this never freezes.

With the suggested patch, it will most likely freeze if A and C are located on different nodes: presumably, C's response now has to wait for capacity in A's mailbox, which B keeps full, while A cannot drain the mailbox because it's blocked awaiting that very response.
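
For illustration, a minimal standalone model of that freeze (hypothetical names; tokio channels stand in for A's mailbox and the pending response):

use std::time::Duration;
use tokio::sync::{mpsc, oneshot};

#[tokio::main]
async fn main() {
    // A's mailbox, already full of B's spam.
    let (mailbox_tx, _mailbox_rx) = mpsc::channel::<u32>(1);
    mailbox_tx.send(0).await.unwrap();

    let (response_tx, response_rx) = oneshot::channel::<u32>();

    // With responses ordered behind bounded sends, delivering C's response
    // first requires mailbox capacity...
    let deliver = tokio::spawn(async move {
        mailbox_tx.send(1).await.unwrap(); // blocks forever: the mailbox is full
        response_tx.send(42).unwrap();     // never reached
    });

    // ...but A can't drain its mailbox: it's blocked awaiting the response.
    tokio::select! {
        _ = response_rx => println!("got response"),
        _ = tokio::time::sleep(Duration::from_secs(1)) => {
            println!("freeze: the response never arrives");
        }
    }
    deliver.abort();
}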

