@antiguru antiguru commented Jul 8, 2025

Extract the inner loop of half-join into a separate function.

This change extracts the inner loop across proposals in half-join into a separate function. The idea is that the inner loop is hot, but by embedding it in the closure, the optimizer has a hard time inlining functions called from within the loop on account of the size of the closure itself. Help the optimizer by extracting the hot loop in the hope that this enables better inlining.
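To illustrate the shape of this change, here is a minimal sketch of outlining a hot loop from a closure into a free function. It is not the code from this PR: `score`, `process_batch`, `run`, and the `budget` parameter are hypothetical stand-ins, and whether the optimizer actually inlines better depends on the surrounding code and would need to be measured.

```rust
/// Hypothetical per-proposal work; a stand-in, not the real half-join logic.
#[inline]
fn score(key: u64, weight: u64) -> u64 {
    key.wrapping_mul(31).wrapping_add(weight)
}

/// The hot loop, outlined into its own small function.
///
/// Returns `true` if the caller should yield because the (hypothetical)
/// `budget` of proposals was exhausted before the batch was finished.
fn process_batch(proposals: &[(u64, u64)], budget: usize, acc: &mut u64) -> bool {
    for (index, &(key, weight)) in proposals.iter().enumerate() {
        if index >= budget {
            return true;
        }
        *acc = acc.wrapping_add(score(key, weight));
    }
    false
}

/// Driver standing in for the operator logic. In real code the closure would
/// capture much more state; here it only forwards to the outlined loop.
fn run(batches: &[Vec<(u64, u64)>], budget: usize) -> (u64, bool) {
    let mut acc = 0u64;
    let mut yielded = false;
    let mut logic = |batch: &[(u64, u64)]| {
        // `|=` always evaluates the call, so every batch is visited.
        yielded |= process_batch(batch, budget, &mut acc);
    };
    for batch in batches {
        logic(batch.as_slice());
    }
    (acc, yielded)
}
```

The closure body stays small because the loop lives in `process_batch`, which gives the optimizer a compact function in which to consider inlining `score`.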

This also changes the return type of the `half_join_internal_unsafe` function. Instead of forcing a stream of vectors, it allows the caller to provide a container builder, and returns a stream. (We could return a collection, but the caller can do that themselves easily enough.)

@antiguru antiguru force-pushed the simplify_half_join branch from 45ba04b to 38f6fb5 on July 8, 2025 15:31
*diff1 = R::zero();
}
}
yielded |= process_proposals::<G, _, _, _, _, _, _, _, _>(

I'd love to avoid relying on short-circuit evaluation if at all possible, mostly from a clarity point of view.
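As background for the short-circuiting concern, the following sketch contrasts three ways to accumulate a "should yield" flag in Rust. The helper `step` and the three function names are hypothetical and do not come from the PR. The point is only that `||` skips the right-hand call once the flag is set, while `|=` and an explicit branch always make the call.

```rust
/// Hypothetical unit of work that counts how often it is called.
fn step(calls: &mut u32, should_yield: bool) -> bool {
    *calls += 1;
    should_yield
}

/// `||` short-circuits: once `yielded` is true, `step` is no longer called.
fn with_short_circuit(steps: &[bool]) -> (bool, u32) {
    let mut calls = 0u32;
    let mut yielded = false;
    for &s in steps {
        yielded = yielded || step(&mut calls, s);
    }
    (yielded, calls)
}

/// `|=` on `bool` does not short-circuit: `step` runs on every iteration.
fn with_bitor_assign(steps: &[bool]) -> (bool, u32) {
    let mut calls = 0u32;
    let mut yielded = false;
    for &s in steps {
        yielded |= step(&mut calls, s);
    }
    (yielded, calls)
}

/// Explicit form: the call and the flag update are separate statements.
fn with_explicit_branch(steps: &[bool]) -> (bool, u32) {
    let mut calls = 0u32;
    let mut yielded = false;
    for &s in steps {
        let should_yield = step(&mut calls, s);
        if should_yield {
            yielded = true;
        }
    }
    (yielded, calls)
}
```

For the input `[true, false, false]`, all three return a flag of `true`, but the short-circuiting version calls `step` once while the other two call it three times. When the call has side effects, that difference is easy to miss in review.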

@antiguru antiguru force-pushed the simplify_half_join branch from 816ec43 to 9cea5a8 on July 10, 2025 14:18
Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru antiguru force-pushed the simplify_half_join branch from 9cea5a8 to f604faa on July 10, 2025 14:19

@frankmcsherry frankmcsherry left a comment

I think this is mergeable, but I also think it reveals that the file is at a bursting point from a readability / clarity point of view. We should do no further changes without some clean-up.

Comment on lines 294 to 298
/// Process proposals one at a time, yielding if necessary.
///
/// Returns `true` if the operator should yield.
///
/// Utility function for `half_join_internal_unsafe`.

This new layout confuses me, and I think it's partly that the comment used to be on a for-loop, where I could understand what we were talking about wrt "one at a time, yielding if necessary". Now it's a method, and .. it's totally unclear what this means.

This might just be Rust being awkward around refactoring, where the absolute wall of captured state is .. necessary, but utterly inexplicable.

The signature and constraints (20 lines) are almost as long as the method body (30 lines).

I think the tl;dr is that this probably needs a real refactoring, which I'm happy to do, whereas this is more a tear-out. My understanding is that this is for performance more than anything, because Rust/LLVM is unwilling to perform the intended inlining in complex code blocks?

Like with the type above, perhaps the best documentation here is just

/// Outlining of the inner loop of `half_join_internal_unsafe` for reasons of performance.

.as_collection()
}

/// Utility type for a session in scope `G` with a container builder `CB`.

I propose we aim for a better comment here. Yes it is a utility type, but that does not clarify what it is for, how you should hold it, etc. In particular, I'd stress that this is just a shortening of a type for readability. I think potentially a better (imo) comment would just be

/// A shorthand for an otherwise complex type describing a session with lifetime `'a` in a scope `G` with a container builder `CB`.

Maybe flipped around a bit to be

/// A session with lifetime `'a` in a scope `G` with a container builder `CB`.
///
/// This is a shorthand primarily for purposes of readability.
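As a generic illustration of this documentation pattern (a type alias that exists only to shorten a longer type, documented as such), here is a small sketch using standard library types. `BorrowedIndex` and `total_len` are hypothetical names chosen for the example and are unrelated to the session type discussed in this PR.

```rust
use std::collections::HashMap;

/// A lookup table with lifetime `'a` from keys `K` to borrowed slices of `V`.
///
/// This is a shorthand primarily for purposes of readability.
type BorrowedIndex<'a, K, V> = HashMap<K, &'a [V]>;

/// Sums the lengths of all slices in the index.
///
/// The signature reads more easily with the alias than with the
/// spelled-out `&HashMap<K, &'a [V]>`.
fn total_len<K, V>(index: &BorrowedIndex<'_, K, V>) -> usize {
    index.values().map(|values| values.len()).sum()
}
```

The first doc line says what the type is and names each parameter, and the second line states that the alias adds no behavior, which matches the structure proposed above.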

antiguru added 2 commits July 10, 2025 18:12
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
@frankmcsherry frankmcsherry left a comment

Looks good! Thank you.

@frankmcsherry frankmcsherry merged commit ce4556b into TimelyDataflow:master Jul 11, 2025
5 checks passed
@github-actions github-actions bot mentioned this pull request Jul 11, 2025
@antiguru antiguru deleted the simplify_half_join branch July 11, 2025 04:48
@github-actions github-actions bot mentioned this pull request Aug 7, 2025