[LoopPeel] Fix branch weights' effect on block frequencies #128785

Open
wants to merge 14 commits into
base: main

jdenny-ornl
Collaborator

@jdenny-ornl jdenny-ornl commented Feb 25, 2025

[LoopPeel] Fix branch weights' effect on block frequencies

This patch implements the LoopPeel changes discussed in [RFC] Fix Loop Transformations to Preserve Block Frequencies.

In summary, a loop's latch block can have branch weight metadata that encodes an estimated trip count that is derived from application profile data. Initially, the loop body's block frequencies agree with the estimated trip count, as expected. However, sometimes loop transformations adjust those branch weights in a way that correctly maintains the estimated trip count but that corrupts the block frequencies. This patch addresses that problem in LoopPeel, which it changes to:

  • Maintain branch weights consistently with the original loop for the sake of preserving the total frequency of the original loop body.
  • Store the new estimated trip count as separate loop metadata named llvm.loop.estimated_trip_count.
  • Extend llvm::getLoopEstimatedTripCount to prefer that metadata, if present, over branch weights.

This patch introduces a fixme comment in LoopPeel.cpp that should be discussed before it lands.

@jdenny-ornl jdenny-ornl force-pushed the fix-peel-branch-weights branch from be2ad30 to cec331a Compare March 6, 2025 20:46
@jdenny-ornl jdenny-ornl changed the title [LoopPeel] Fix branch weights [LoopPeel] Fix branch weights' effect on block frequencies Mar 6, 2025
@jdenny-ornl jdenny-ornl force-pushed the fix-peel-branch-weights branch from cec331a to 843b4cf Compare March 12, 2025 23:56

github-actions bot commented Mar 13, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@jdenny-ornl jdenny-ornl force-pushed the fix-peel-branch-weights branch from 843b4cf to 9d72e03 Compare March 13, 2025 00:04
For example:

```
declare void @f(i32)

define void @test(i32 %n) {
entry:
  br label %do.body

do.body:
  %i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %inc = add i32 %i, 1
  call void @f(i32 %i)
  %c = icmp sge i32 %inc, %n
  br i1 %c, label %do.end, label %do.body, !prof !0

do.end:
  ret void
}

!0 = !{!"branch_weights", i32 1, i32 9}
```

Given those branch weights, once any loop iteration is actually
reached, the probability of the loop exiting at the iteration's end is
1/(1+9).  That is, the loop is likely to exit every 10 iterations and
thus has an estimated trip count of 10.  `opt
-passes='print<block-freq>'` shows that 10 is indeed the frequency of
the loop body:

```
Printing analysis results of BFI for function 'test':
block-frequency-info: test
 - entry: float = 1.0, int = 1801439852625920
 - do.body: float = 10.0, int = 18014398509481984
 - do.end: float = 1.0, int = 1801439852625920
```

Key Observation: The frequency of reaching any particular iteration is
less than for the previous iteration because the previous iteration
has a non-zero probability of exiting the loop.  This observation
holds even though every loop iteration, once actually reached, has
exactly the same probability of exiting and thus exactly the same
branch weights.
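
The key observation can be checked numerically. The following is a minimal standalone sketch, independent of LLVM; `exitProbability`, `reachFrequency`, and `totalBodyFrequency` are hypothetical helper names, and the model assumes every iteration, once reached, exits with the same probability.

```cpp
#include <cassert>
#include <cmath>

// Probability that an iteration, once reached, exits the loop.
double exitProbability(unsigned exitWeight, unsigned backedgeWeight) {
  assert(exitWeight + backedgeWeight > 0 && "weights must not both be zero");
  return static_cast<double>(exitWeight) / (exitWeight + backedgeWeight);
}

// Frequency of reaching iteration k (1-based), relative to one loop entry.
// Each earlier iteration must have taken the backedge.
double reachFrequency(unsigned k, double pExit) {
  assert(k >= 1 && "iterations are numbered from 1");
  return std::pow(1.0 - pExit, static_cast<double>(k - 1));
}

// Sum of the reach frequencies over the first n iterations.
// As n grows, this approaches 1 / pExit.
double totalBodyFrequency(unsigned n, double pExit) {
  double total = 0.0;
  for (unsigned k = 1; k <= n; ++k)
    total += reachFrequency(k, pExit);
  return total;
}
```

With weights 1 and 9, the exit probability is 0.1, the first two iterations are reached with frequencies 1.0 and 0.9, and the sum over many iterations approaches 10, matching the `do.body` frequency above.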

Now we use `opt -unroll-force-peel-count=2 -passes=loop-unroll` to
peel 2 iterations and insert them before the remaining loop.  We
expect the key observation above not to change, but it does under the
implementation without this patch.  The block frequency becomes 1.0
for the first iteration, 0.9 for the second, and 6.4 for the main loop
body.  Again, a decreasing frequency is expected, but it decreases too
much: the total frequency of the original loop body becomes 8.3.  The
new branch weights reveal the problem:

```
!0 = !{!"branch_weights", i32 1, i32 9}
!1 = !{!"branch_weights", i32 1, i32 8}
!2 = !{!"branch_weights", i32 1, i32 7}
```

The exit probability is now 1/10 for the first peeled iteration, 1/9
for the second, and 1/8 for the remaining loop iterations.  It seems
this behavior is trying to ensure a decreasing block frequency.
However, as in the key observation above for the original loop, that
happens correctly without decreasing the branch weights across
iterations.

This patch changes the peeling implementation not to decrease the
branch weights across loop iterations so that the frequency for every
iteration is the same as it was in the original loop.  The total
frequency of the loop body, summed across all its occurrences, thus
remains 10 after peeling.
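
The arithmetic behind both sets of numbers can be reproduced with a small standalone sketch. It does not use LLVM's block frequency analysis; `BranchWeights`, `frequenciesAfterPeeling`, and `total` are hypothetical names, and the model assumes a loop whose only exit is the latch branch.

```cpp
#include <cassert>
#include <vector>

// Latch branch weights. Hypothetical stand-in for !prof branch_weights.
struct BranchWeights {
  unsigned exit;
  unsigned backedge;
};

// Probability of leaving the loop at the end of an iteration that was reached.
double exitProbability(BranchWeights w) {
  assert(w.exit > 0 && "sketch assumes a non-zero exit weight");
  return static_cast<double>(w.exit) / (w.exit + w.backedge);
}

// Frequencies relative to one function entry: one value per peeled
// iteration, followed by the frequency of the remaining loop's body.
std::vector<double> frequenciesAfterPeeling(
    const std::vector<BranchWeights> &peeled, BranchWeights remaining) {
  std::vector<double> freqs;
  double reach = 1.0; // frequency of reaching the next block
  for (BranchWeights w : peeled) {
    freqs.push_back(reach);
    reach *= 1.0 - exitProbability(w);
  }
  // The remaining loop is entered with frequency `reach`, and each entry
  // runs the body 1 / exitProbability times on average.
  freqs.push_back(reach / exitProbability(remaining));
  return freqs;
}

// Total frequency of the original loop body across all its copies.
double total(const std::vector<double> &freqs) {
  double sum = 0.0;
  for (double f : freqs)
    sum += f;
  return sum;
}
```

Under this model, the pre-patch weights `{1, 9}` and `{1, 8}` for the peeled iterations and `{1, 7}` for the remaining loop give 1.0, 0.9, and 6.4 (total 8.3). Keeping `{1, 9}` everywhere gives 1.0, 0.9, and 8.1 (total 10).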

Unfortunately, that change means a later analysis cannot accurately
estimate the trip count of the remaining loop while examining the
remaining loop in isolation without considering the probability of
actually reaching it.  For that purpose, this patch stores the new
trip count as separate metadata named `llvm.loop.estimated_trip_count`
and extends `llvm::getLoopEstimatedTripCount` to prefer it, if
present, over branch weights.
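
The lookup order described above can be sketched as follows. This is not the LLVM implementation: `LoopProfile` and `estimatedTripCount` are hypothetical stand-ins, and the round-to-nearest step is a simplifying assumption that may differ from what `llvm::getLoopEstimatedTripCount` does.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the profile data attached to a loop.
struct LoopProfile {
  std::optional<unsigned> estimatedTripCountMD; // llvm.loop.estimated_trip_count
  std::optional<unsigned> exitWeight;           // latch branch weight: exit edge
  std::optional<unsigned> backedgeWeight;       // latch branch weight: backedge
};

// Prefer the explicit metadata; fall back to the latch branch weights.
std::optional<unsigned> estimatedTripCount(const LoopProfile &loop) {
  if (loop.estimatedTripCountMD)
    return loop.estimatedTripCountMD;
  if (!loop.exitWeight || !loop.backedgeWeight || *loop.exitWeight == 0)
    return std::nullopt;
  unsigned exitW = *loop.exitWeight;
  unsigned total = exitW + *loop.backedgeWeight;
  assert(total >= exitW && "weight sum overflowed");
  // Expected iterations per loop entry is 1 / exit probability,
  // rounded to the nearest integer.
  return (total + exitW / 2) / exitW;
}
```

For the example loop, weights 1 and 9 alone yield 10. After two iterations are peeled and the weights are left unchanged, a stored value of 8 would take precedence over the 10 that the weights still imply.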

An alternative fix is for `llvm::getLoopEstimatedTripCount` to
subtract the `llvm.loop.peeled.count` metadata from the trip count
estimated by a loop's branch weights.  However, there might be other
loop transformations that still corrupt block frequencies in a similar
manner and require a similar fix.  `llvm.loop.estimated_trip_count` is
intended to provide a general way to store estimated trip counts when
branch weights cannot directly store them.
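
For comparison, the alternative could look roughly like the sketch below. `tripCountAfterPeeling` is a hypothetical helper, and saturating at zero is an assumption made here, not something the patch specifies.

```cpp
#include <cassert>

// weightTripCount: trip count implied by the latch branch weights alone.
// peeledCount: value of llvm.loop.peeled.count, or 0 when it is absent.
unsigned tripCountAfterPeeling(unsigned weightTripCount, unsigned peeledCount) {
  assert(weightTripCount > 0 && "an executed loop body runs at least once");
  // Saturate at zero instead of wrapping around when more iterations were
  // peeled than the weights predict.
  if (peeledCount >= weightTripCount)
    return 0;
  return weightTripCount - peeledCount;
}
```

This handles peeling only, which is the limitation noted above: any other transformation that changes the trip count would need its own adjustment.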

This patch introduces several fixme comments that need to be addressed
before it can land.
@jdenny-ornl jdenny-ornl force-pushed the fix-peel-branch-weights branch from 9d72e03 to f413520 Compare March 19, 2025 20:46
@@ -7866,6 +7866,17 @@ The attributes in this metadata is added to all followup loops of the
loop distribution pass. See
:ref:`Transformation Metadata <transformation-metadata>` for details.

'``llvm.loop.estimated_trip_count``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Contributor

@arsenm arsenm Apr 30, 2025

Should explain where it should be attached. What if it's placed on an arbitrary terminator that isn't a loop backedge? Consequences for it being incorrect?

Collaborator Author

Should explain where it should be attached. What if it's placed on an arbitrary terminator that isn't a loop backedge?

The llvm.loop metadata description explains placement generally for all llvm.loop.* metadata. I've also added an example in the same manner as the docs I see for other llvm.loop.* metadata.

Consequences for it being incorrect?

I've added a description of how it's consumed.

Comment on lines 7872 to 7878
This metadata records the loop's estimated trip count. If it is not present, a
loop's estimated trip count should be computed from any ``branch_weights``
metadata attached to the latch block's branch instruction.

Thus, this metadata frees loop transformations to compute latch branch weights
solely for the purpose of maintaining accurate block frequencies instead of
requiring the branch weights to always serve both roles.

Contributor

This reads as more background information rather than a direct description of the meaning of the metadata

Collaborator Author

I've added some text about the semantics.

@@ -7866,6 +7866,17 @@ The attributes in this metadata is added to all followup loops of the
loop distribution pass. See
:ref:`Transformation Metadata <transformation-metadata>` for details.

'``llvm.loop.estimated_trip_count``' Metadata

Contributor

Why _s? Most of the existing metadata use . separators

Collaborator Author

In the other loop metadata, I understood . to separate levels in a naming hierarchy while _ separates words in a single level's name.

For example: ‘llvm.loop.unroll_and_jam.count’ Metadata

I didn't think of any way to break estimated_trip_count apart into a meaningful hierarchy.

Thanks for reviewing. I'll work on your other comments.

@jdenny-ornl jdenny-ornl requested a review from jdoerfert May 5, 2025 20:14
Extending beyond the limitations of `getExpectedExitLoopLatchBranch`
is a possible improvement for the future, not an urgent fixme.

No one has pointed out code that computes estimated trip counts
without using `llvm::getLoopEstimatedTripCount`.
@jdenny-ornl jdenny-ornl marked this pull request as ready for review June 16, 2025 22:39
@MatzeB
Contributor

MatzeB commented Jun 16, 2025

  • My understanding is that this fixes the branch weights (basically by removing all the code making adjustments which turned out to preserve the trip count estimation but at the price of wrong branch weights) as discussed in the RFC. I think this part is great!
  • I'm not a fan of introducing the new metadata as I'm not convinced we even need it / have users for it; but if I overlooked something and we have indeed use cases that required the loop trip count metric (or if others are convinced we need it) then I'm fine landing this.

@MatzeB
Contributor

MatzeB commented Jun 16, 2025

Judging from reactions on the RFC, it seems others are fine with the new metadata, so no objections from my side.

@jdenny-ornl
Collaborator Author

  • My understanding is that this fixes the branch weights (basically by removing all the code making adjustments which turned out to preserve the trip count estimation but at the price of wrong branch weights) as discussed in the RFC. I think this part is great!

That's right.

  • I'm not a fan of introducing the new metadata as I'm not convinced we even need it / have users for it; but if I overlooked something and we have indeed use cases that required the loop trip count metric (or if others are convinced we need it) then I'm fine landing this.

Existing uses of the estimated trip count are listed under point 1 of this RFC comment. As discussed there and afterward, it's not clear to me what to do with estimated trip counts right now, so I think that needs a separate RFC. For now, the new metadata just gets it out of our way for fixing block frequencies.

@@ -850,27 +852,35 @@ llvm::getLoopEstimatedTripCount(Loop *L,
getEstimatedTripCount(LatchBranch, L, ExitWeight)) {
if (EstimatedLoopInvocationWeight)
*EstimatedLoopInvocationWeight = ExitWeight;
if (auto EstimatedTripCount =

Member

When would LLVMLoopEstimatedTripCount not be present, i.e. shouldn't one expect that, if the build is (for example) using instrumented profiling, the metadata should be present, unless something dropped it, which should be a bug?

Collaborator Author

@jdenny-ornl jdenny-ornl Jun 16, 2025

Currently in this PR, the new metadata is not introduced until a pass like LoopPeel calls setLoopEstimatedTripCount. Normally that would be because it creates the situation where branch weights alone cannot encode both block frequencies and estimated trip counts with desired values, so the new metadata is then required (but of course not all passes have been fixed yet to maintain block frequencies correctly).

Moreover, the todo comment this PR introduces into LoopPeel.cpp describes how getLoopEstimatedTripCount and setLoopEstimatedTripCount cannot handle some loop forms and so don't get/set the estimated trip count. That issue exists without this PR. What's new is that it skips adding the new metadata in that case.

As we discussed at some point in the RFC, the inconsistent presence of the new metadata is not ideal, and we might want to ultimately change that. But in the long term, it sounds like people might want to drop estimated trip counts altogether, so I'm not yet convinced that efforts to make the new metadata more consistently present are worthwhile. What do you think?

Member

Currently in this PR, the new metadata is not introduced until a pass like LoopPeel calls setLoopEstimatedTripCount. Normally that would be because it creates the situation where branch weights alone cannot encode both block frequencies and estimated trip counts with desired values, so the new metadata is then required (but of course not all passes have been fixed yet to maintain block frequencies correctly).

Could this cause a surprise if other loop transforms come first, i.e. should the metadata be inserted earlier, maybe by a small pass dedicated to this job?

Moreover, the todo comment this PR introduces into LoopPeel.cpp describes how getLoopEstimatedTripCount and setLoopEstimatedTripCount cannot handle some loop forms and so don't get/set the estimated trip count. That issue exists without this PR. What's new is that it skips adding the new metadata in that case.

Ack. How about inserting a form of this metadata that says "unknown". So then we can at least distinguish between "pass dropped the info == bug" from "info wasn't dropped, but you can't use it". Has the same effect wrt the trip count calculation, but (my argument goes) it's way more maintainable (I'd like to do the same for MD_prof fwiw - i.e. "unknown" vs absence)

As we discussed at some point in the RFC, the inconsistent presence of the new metadata is not ideal, and we might want to ultimately change that. But in the long term, it sounds like people might want to drop estimated trip counts altogether, so I'm not yet convinced that efforts to make the new metadata more consistently present are worthwhile. What do you think?

I think my suggestion is minimal: adding the "unknown" form of the metadata and then checking and (at minimum) LLVM_DEBUG - ing when a loop doesn't have it whatsoever. I get the concern about investment vs risk of future (but currently unknown, IIUC) alternatives. I think making it hard for passes to naively drop it would help its value a lot.

Collaborator Author

Surprises should be avoidable if we can assume accesses to the new metadata will always be via the existing get/set methods. Perhaps that's too big of an assumption, and it at least suggests more documentation encouraging that constraint.

But an "unknown" state does sound like a reasonable solution. I'll look into implementing that. Thanks.
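
One way to picture the three states under discussion is the sketch below. The encoding is purely illustrative: `TripCountMD`, `TripCountStatus`, `classify`, and `usableTripCount` are hypothetical names, and the thread does not settle how an "unknown" form would be represented in IR.

```cpp
#include <cassert>
#include <optional>

// Outer optional: is the metadata attached to the loop at all?
// Inner optional: does the attached metadata carry a value, or say "unknown"?
using TripCountMD = std::optional<std::optional<unsigned>>;

enum class TripCountStatus { MissingLikelyDropped, ExplicitlyUnknown, Known };

// Distinguish "a pass dropped the info" from "info present but unusable".
TripCountStatus classify(const TripCountMD &md) {
  if (!md)
    return TripCountStatus::MissingLikelyDropped;
  if (!*md)
    return TripCountStatus::ExplicitlyUnknown;
  return TripCountStatus::Known;
}

// Both "missing" and "unknown" yield no usable estimate, so the trip count
// calculation behaves the same; only the diagnosis differs.
std::optional<unsigned> usableTripCount(const TripCountMD &md) {
  if (classify(md) != TripCountStatus::Known)
    return std::nullopt;
  assert(md && *md && "classify() returned Known, so both levels hold a value");
  return **md;
}
```

A consumer could log or assert on `MissingLikelyDropped` while silently accepting `ExplicitlyUnknown`, which is the maintainability benefit argued for above.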
