[LV] Initial support for stores in early exit loops #137774

Draft
wants to merge 1 commit into base: main

huntergr-arm
Collaborator

Adds some basic support for a simple early exit loop with a store.

This is vectorized such that when the next vector iteration would exit, we bail out to the scalar loop to handle the exit.

The main complication faced is transforming the vplan; it feels quite fragile at the moment, and might be better done up front at initial recipe creation time. Thoughts?

An alternative would be to use tail-folding, but that requires doing brka/brkb inside the loop, which could be costly. It would also still likely require moving some IR, just not copying it to the preheader.

(One advantage of this approach is that architectures without predication could potentially take advantage of early-exit (EE) autovectorization, though I suspect we'd want this driven by pragmas instead of just guessing about a possible trip count.)
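To make the bail-out strategy concrete, here is a minimal scalar emulation in plain C++. It is a sketch, not the VPlan transform itself: the function names, the `sentinel` exit condition, and the `data[i] += 1` store are all hypothetical, and it assumes `data` is at least as long as `pred`. The emulated vector loop evaluates the exit condition for the whole next block of `VF` iterations before performing any store in that block; if any lane would exit, it hands over to the scalar loop at the start of that block.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar loop with an uncountable early exit and a store.
// Returns the index at which the loop exited (pred.size() if it never did).
std::size_t scalarLoop(const std::vector<int> &pred, std::vector<int> &data,
                       std::size_t start, int sentinel) {
  for (std::size_t i = start; i < pred.size(); ++i) {
    if (pred[i] == sentinel) // early exit
      return i;
    data[i] += 1; // store
  }
  return pred.size();
}

// Emulation of the vectorized form: full blocks of VF iterations run only
// when no lane in the block would take the early exit.
std::size_t vectorizedLoop(const std::vector<int> &pred,
                           std::vector<int> &data, int sentinel,
                           std::size_t VF) {
  const std::size_t n = pred.size();
  std::size_t i = 0;
  while (i + VF <= n) {
    // Exit condition for the next block, checked before any store in it.
    bool anyExit = false;
    for (std::size_t l = 0; l < VF; ++l)
      anyExit |= (pred[i + l] == sentinel);
    if (anyExit)
      break; // bail out; the scalar loop handles the exiting block
    for (std::size_t l = 0; l < VF; ++l)
      data[i + l] += 1; // "vector" store for the whole block
    i += VF;
  }
  // Scalar tail: remaining iterations, including the one that exits.
  return scalarLoop(pred, data, i, sentinel);
}
```

Because no store from a block is performed until the block is known to be exit-free, the emulated vector loop leaves `data` in the same state as the scalar loop and returns the same exit index.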

@huntergr-arm huntergr-arm requested review from fhahn and david-arm April 29, 2025 09:16
@david-arm
Contributor

It's worth noting that brka/brkb is very much SVE-specific here, but there are other targets where it may be less of an issue.

Contributor

@david-arm david-arm left a comment

Thanks for this! I've only reviewed the legality bits, but thought I'd leave the comments I have so far.

bool isConditionCopyRequired() const { return RequiresEarlyExitConditionCopy; }

/// Returns the load instruction, if any, nearest to an uncountable early
/// exit.
Contributor

Does nearest refer to nearest load prior to the early exit, after, or either?

Collaborator Author

The load closest in the IR graph to the exit (ideally, the load whose result is used directly in a comparison). I didn't spend much time on the naming ;)

@@ -407,6 +407,13 @@ class LoopVectorizationLegality {
    return hasUncountableEarlyExit() ? getUncountableEdge()->second : nullptr;
  }

  /// Returns true if this is an early exit loop containing a store.
  bool isConditionCopyRequired() const { return RequiresEarlyExitConditionCopy; }
Contributor

Can the store be anywhere in the loop, i.e. including after the early exit? Might be worth clarifying.

If we do stick with this code in the final version I think it probably needs a better name because it's not obvious that it specifically refers to the condition driving an early exit. How would this scale with multiple exits?

Collaborator Author

The store can be anywhere in the loop, since we either execute a full vector iteration as normal or we bail out to the scalar tail if an exit would occur in the middle.

We need to do this for all exits, so a single transformation to copy and reorder the exit condition IR is sufficient.

  /// The load used to determine an uncountable early-exit condition. This is
  /// only used to allow further analysis in canVectorizeMemory if we found
  /// what looks like a valid early exit loop with store beforehand.
  std::optional<LoadInst *> EarlyExitLoad;
Contributor

What happens in the case of two loads, e.g.:

  %ld1 = load i8, ...
  %ld2 = load i8, ...
  %cmp = icmp eq i8 %ld1, %ld2
  br i1 %cmp, label %early.exit, label %loop.inc

Collaborator Author

This would fail the check that the second operand to the compare is loop invariant. There's a backup check in the vplan transform that would reject it at that point too.
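For illustration, the two loop shapes under discussion might look like this at the source level. This is only a sketch: the function and parameter names are hypothetical, and whether a front end actually produces the IR quoted above depends on the compiler and optimization pipeline. The first loop compares a single load against a loop-invariant value, which is the shape described as accepted; the second compares two loads, which is the shape described as rejected.

```cpp
#include <cstddef>

// One load feeds the compare; the other operand (key) is loop invariant.
std::size_t exitOnInvariant(const unsigned char *pred, int *out,
                            std::size_t n, unsigned char key) {
  std::size_t i = 0;
  for (; i < n; ++i) {
    if (pred[i] == key) // early exit driven by a single load
      break;
    out[i] = 1; // store
  }
  return i;
}

// Both compare operands are loads, as in the %ld1 / %ld2 example.
std::size_t exitOnTwoLoads(const unsigned char *a, const unsigned char *b,
                           int *out, std::size_t n) {
  std::size_t i = 0;
  for (; i < n; ++i) {
    if (a[i] == b[i]) // early exit driven by two loads
      break;
    out[i] = 1; // store
  }
  return i;
}
```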

In future, it would be nice to increase the number of cases handled, but I've tried to keep it simple for now and just come up with a list of extra cases to handle later.

Some of the possible future work:

  1. Supporting a combined condition; e.g. no escaping value, but two exit conditions or'd together. Needs work in ScalarEvolution? (This actually applies to the sample loop I chose; for the IR test I manually edited the IR to create 2 exits again)

  2. Multiple uses of loads or comparisons in the exit IR chain; requires introducing a PHI node for current vector iteration value.

  3. Supporting a chain of conditional loads; e.g. early exit is itself in a conditional block so might not always execute.

  4. Supporting a non-const second term for comparison, or more varied comparisons (e.g. bit test instead of icmp).

  5. Tail folding of EE loops with a store.

  6. Dynamic bounds for known-dereferenceable load (via e.g. call void @llvm.assume(i1 true) [ "align"(ptr %pred, i64 2), "dereferenceable"(ptr %pred, i32 %n_bytes) ])

@@ -1823,6 +1930,8 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
  } else {
    if (!isVectorizableEarlyExitLoop()) {
      UncountableEdge = std::nullopt;
      EarlyExitLoad = std::nullopt;
Contributor

Feels like a good time to add a helper function to reset these variables perhaps?

@@ -1751,6 +1853,11 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
                       "backedge taken count: "
                    << *SymbolicMaxBTC << '\n');
  UncountableEdge = SingleUncountableEdge;
  if (HasStore) {
    RequiresEarlyExitConditionCopy = true;
Contributor

Looks like these are always set as a pair, so perhaps there is no need for RequiresEarlyExitConditionCopy and you can just test if EarlyExitLoad has a value or not?
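A rough sketch of how this suggestion, together with the earlier one about a reset helper, could look. It is standalone C++ with stand-in types rather than the real LLVM classes: `EarlyExitState`, `recordEarlyExit`, and `resetEarlyExitState` are hypothetical names, and the type of `UncountableEdge` is an assumption inferred from the `->second` use in the diff.

```cpp
#include <optional>
#include <utility>

// Stand-ins for the LLVM types, only so the sketch compiles on its own.
struct BasicBlock {};
struct LoadInst {};

class EarlyExitState {
  // Assumed shape: (exiting block, exit block).
  std::optional<std::pair<BasicBlock *, BasicBlock *>> UncountableEdge;
  std::optional<LoadInst *> EarlyExitLoad;

public:
  // Hypothetical setter mirroring the HasStore branch in the diff.
  void recordEarlyExit(BasicBlock *Exiting, BasicBlock *Exit,
                       LoadInst *CondLoad, bool HasStore) {
    UncountableEdge = std::make_pair(Exiting, Exit);
    if (HasStore)
      EarlyExitLoad = CondLoad;
  }

  // Single place to clear all early-exit state.
  void resetEarlyExitState() {
    UncountableEdge = std::nullopt;
    EarlyExitLoad = std::nullopt;
  }

  bool hasUncountableEarlyExit() const { return UncountableEdge.has_value(); }

  // Derived from EarlyExitLoad, so no separate bool can fall out of sync.
  bool isConditionCopyRequired() const { return EarlyExitLoad.has_value(); }
};
```

With the predicate derived from the optional, the separate flag and any assert tying the two together become unnecessary.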

  for (auto *BB : TheLoop->blocks())
    for (auto &I : *BB) {
      if (StoreInst *SI = dyn_cast<StoreInst>(&I)) {
Contributor

Does it matter if the store is prior to or after the early exit?

Collaborator Author

No. Since this prototype bails out to a scalar loop if the next vector iteration would exit partway through, it doesn't matter where the store is in the loop.

If/when we do tail-folding for these loops, we will need to generate different masks for state-changing operations before or after an exit.
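As a sketch of why the two masks would differ under tail-folding, the following plain C++ emulates them per lane. The function names are hypothetical and `std::vector<bool>` stands in for a predicate register; nothing here is taken from the patch. An operation that precedes the exit in the loop body still runs in the exiting iteration, so its mask includes the first exiting lane. An operation that follows the exit must not run in that iteration, so its mask stops one lane earlier. As far as I understand it, this is the inclusive/exclusive distinction that brka and brkb provide on SVE.

```cpp
#include <cstddef>
#include <vector>

// exitCond[l] is true if the scalar iteration in lane l takes the early exit.

// Mask for operations placed before the exit in the loop body:
// active up to and including the first exiting lane.
std::vector<bool> maskBeforeExit(const std::vector<bool> &exitCond) {
  std::vector<bool> mask(exitCond.size(), false);
  for (std::size_t l = 0; l < exitCond.size(); ++l) {
    mask[l] = true;
    if (exitCond[l])
      break;
  }
  return mask;
}

// Mask for operations placed after the exit in the loop body:
// active only in lanes strictly before the first exiting lane.
std::vector<bool> maskAfterExit(const std::vector<bool> &exitCond) {
  std::vector<bool> mask(exitCond.size(), false);
  for (std::size_t l = 0; l < exitCond.size(); ++l) {
    if (exitCond[l])
      break;
    mask[l] = true;
  }
  return mask;
}
```

For an exit condition that first fires in lane 2 of 4, the first mask enables lanes 0 to 2 and the second enables lanes 0 and 1.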

  // be even if LAA thinks it is due to performing the load for the
  // vector iteration i+1 in vector iteration i.
  if (isConditionCopyRequired()) {
    assert(EarlyExitLoad.has_value() && "EE Store without condition load.");
Contributor

If you change isConditionCopyRequired to test for EarlyExitLoad having a value, then you don't need this assert.

  if (isConditionCopyRequired()) {
    assert(EarlyExitLoad.has_value() && "EE Store without condition load.");

    if (LAI->canVectorizeMemory()) {
Contributor

Better to hoist this out and call it once to be reused here and below.

        return false;
      }
    }
  }
Contributor

What happens in the else case? Looks like we fall through to:

  if (!LAI->canVectorizeMemory())
    return canVectorizeIndirectUnsafeDependences();

and so we may still treat the loop as legal. Worth adding a test for this case I think.

  const auto *Deps = DepChecker.getDependences();

  for (const MemoryDepChecker::Dependence &Dep : *Deps) {
    if (Dep.getDestination(DepChecker) == EarlyExitLoad ||
Contributor

Do we have to worry about other loads too? The header block could have several loads, with only one of them contributing to the branch condition.

2 participants