[LV] Initial support for stores in early exit loops #137774
Adds some basic support for a simple early exit loop with a store. This is vectorized such that when the next vector iteration would exit, we bail out to the scalar loop to handle the exit.
It's worth noting that brka/brkb is very much SVE-specific here, but there are other targets where it may be less of an issue.
Thanks for this! I've only reviewed the legality bits, but thought I'd leave the comments I have so far.
  bool isConditionCopyRequired() const { return RequiresEarlyExitConditionCopy; }

  /// Returns the load instruction, if any, nearest to an uncountable early
  /// exit.
Does `nearest` refer to the nearest load prior to the early exit, after it, or either?
The load closest in the IR graph to the exit (ideally, the load whose result is used directly in a comparison). I didn't spend much time on the naming ;)
@@ -407,6 +407,13 @@ class LoopVectorizationLegality {
    return hasUncountableEarlyExit() ? getUncountableEdge()->second : nullptr;
  }

  /// Returns true if this is an early exit loop containing a store.
  bool isConditionCopyRequired() const { return RequiresEarlyExitConditionCopy; }
Can the store be anywhere in the loop, i.e. including after the early exit? Might be worth clarifying.
If we do stick with this code in the final version I think it probably needs a better name because it's not obvious that it specifically refers to the condition driving an early exit. How would this scale with multiple exits?
The store can be anywhere in the loop, since we either execute a full vector iteration as normal or we bail out to the scalar tail if an exit would occur in the middle.
We need to do this for all exits, so a single transformation to copy and reorder the exit condition IR is sufficient.
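The bail-out scheme described in this reply can be modelled in plain C++. This is only a rough sketch under stated assumptions, not the code the patch generates: `VF`, `scalarLoop`, and `chunkedLoop` are hypothetical names, the "vector" body is written as a lane loop, and the sketch assumes the exit-condition array can be read for a whole chunk and is not modified by the stores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the vectorization factor.
constexpr std::size_t VF = 4;

// Scalar reference loop: one store before the exit check and one after it.
// Starts at index Start and returns the index at which the loop stopped.
std::size_t scalarLoop(const std::vector<int> &Pred, std::vector<int> &Out,
                       int Key, std::size_t Start = 0) {
  assert(Pred.size() == Out.size() && "arrays must have the same length");
  std::size_t I = Start;
  for (; I < Pred.size(); ++I) {
    Out[I] += 1;        // store before the exit
    if (Pred[I] == Key) // uncountable early exit
      break;
    Out[I] *= 2;        // store after the exit
  }
  return I;
}

// Chunked model: a chunk of VF iterations runs as a whole only when none of
// its lanes would exit. Otherwise control passes to the scalar loop, which
// handles the exiting iteration and anything left over.
std::size_t chunkedLoop(const std::vector<int> &Pred, std::vector<int> &Out,
                        int Key) {
  assert(Pred.size() == Out.size() && "arrays must have the same length");
  const std::size_t N = Pred.size();
  std::size_t I = 0;
  while (I + VF <= N) {
    // Evaluate the exit condition for every lane before touching memory.
    bool AnyExit = false;
    for (std::size_t L = 0; L < VF; ++L)
      AnyExit |= (Pred[I + L] == Key);
    if (AnyExit)
      break; // bail out to the scalar loop

    // No lane exits, so both stores run unmasked for the full chunk.
    for (std::size_t L = 0; L < VF; ++L) {
      Out[I + L] += 1;
      Out[I + L] *= 2;
    }
    I += VF;
  }
  return scalarLoop(Pred, Out, Key, I);
}
```

Because a chunk is either executed in full or not at all, the position of a store relative to the exit check never matters inside the chunked part; only the scalar loop has to respect it.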
  /// The load used to determine an uncountable early-exit condition. This is
  /// only used to allow further analysis in canVectorizeMemory if we found
  /// what looks like a valid early exit loop with store beforehand.
  std::optional<LoadInst *> EarlyExitLoad;
What happens in the case of two loads, i.e.:
  %ld1 = load i8, ...
  %ld2 = load i8, ...
  %cmp = icmp eq i8 %ld1, %ld2
  br i1 %cmp, label %early.exit, label %loop.inc
This would fail the check that the second operand to the compare is loop invariant. There's a backup check in the vplan transform that would reject it at that point too.
In future, it would be nice to increase the number of cases handled, but I've tried to keep it simple for now and just come up with a list of extra cases to handle later.
Some of the possible future work:
- Supporting a combined condition; e.g. no escaping value, but two exit conditions or'd together. Needs work in ScalarEvolution? (This actually applies to the sample loop I chose; for the IR test I manually edited the IR to create 2 exits again)
- Multiple uses of loads or comparisons in the exit IR chain; requires introducing a PHI node for current vector iteration value.
- Supporting a chain of conditional loads; e.g. early exit is itself in a conditional block so might not always execute.
- Supporting a non-const second term for comparison, or more varied comparisons (e.g. bit test instead of icmp).
- Tail folding of EE loops with a store.
- Dynamic bounds for known-dereferenceable load (via e.g. call void @llvm.assume(i1 true) [ "align"(ptr %pred, i64 2), "dereferenceable"(ptr %pred, i32 %n_bytes) ])
@@ -1823,6 +1930,8 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
    } else {
      if (!isVectorizableEarlyExitLoop()) {
        UncountableEdge = std::nullopt;
        EarlyExitLoad = std::nullopt;
Feels like a good time to add a helper function to reset these variables perhaps?
@@ -1751,6 +1853,11 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
                       "backedge taken count: "
                    << *SymbolicMaxBTC << '\n');
  UncountableEdge = SingleUncountableEdge;
  if (HasStore) {
    RequiresEarlyExitConditionCopy = true;
Looks like these are always set as a pair, so perhaps there is no need for `RequiresEarlyExitConditionCopy` and you can just test if `EarlyExitLoad` has a value or not?
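A minimal sketch of what this suggestion could look like, using a cut-down model rather than the real `LoopVectorizationLegality` class. The `EarlyExitState` class, its `setEarlyExitLoad` and `reset` members, and the empty `LoadInst` stand-in are all hypothetical; only `EarlyExitLoad` and `isConditionCopyRequired` are names taken from the patch.

```cpp
#include <cassert>
#include <optional>

// Stand-in for llvm::LoadInst so the sketch compiles without LLVM headers.
struct LoadInst {};

// Hypothetical, cut-down model of the early-exit legality state.
class EarlyExitState {
  // The load feeding the uncountable early-exit condition, if one was found
  // in an early exit loop that also contains a store.
  std::optional<LoadInst *> EarlyExitLoad;

public:
  // Records the load; called where the patch currently sets the flag.
  void setEarlyExitLoad(LoadInst *LI) {
    assert(LI && "expected a valid load");
    EarlyExitLoad = LI;
  }

  // Derived from the optional, so no separate boolean can drift out of sync
  // and callers need no assert tying the two together.
  bool isConditionCopyRequired() const { return EarlyExitLoad.has_value(); }

  // Hypothetical helper in the spirit of the earlier "reset these variables"
  // suggestion; the real class would also clear UncountableEdge here.
  void reset() { EarlyExitLoad = std::nullopt; }
};
```

With this shape, recording the load is the only way to make `isConditionCopyRequired()` return true, and `reset()` is the only way to clear it.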
  for (auto *BB : TheLoop->blocks())
    for (auto &I : *BB) {
      if (StoreInst *SI = dyn_cast<StoreInst>(&I)) {
Does it matter if the store is prior to or after the early exit?
No. Since this prototype bails out to a scalar loop if the next vector iteration would exit partway through, it doesn't matter where the store is in the loop.
If/when we do tail-folding for these loops, we will need to generate different masks for state-changing operations before or after an exit.
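To illustrate why tail-folding would need two different masks, here is a small lane-level model. It is a sketch only: `Lanes`, `Mask`, `maskIncludingExit`, and `maskBeforeExit` are hypothetical names, and the two functions mirror the "break after" and "break before" behaviour of SVE's brka/brkb with an all-true governing predicate, not any code in this patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical lane count for the model.
constexpr std::size_t Lanes = 4;
using Mask = std::array<bool, Lanes>;

// Lanes up to and including the first lane whose exit condition is true.
// A store placed before the exit check still runs in the exiting iteration,
// so it would use this mask.
Mask maskIncludingExit(const Mask &ExitCond) {
  Mask M{};
  bool Active = true;
  for (std::size_t L = 0; L < Lanes; ++L) {
    M[L] = Active;
    if (ExitCond[L])
      Active = false; // later lanes are past the exit
  }
  return M;
}

// Lanes strictly before the first lane whose exit condition is true.
// A store placed after the exit check does not run in the exiting iteration,
// so it would use this mask.
Mask maskBeforeExit(const Mask &ExitCond) {
  Mask M{};
  bool Active = true;
  for (std::size_t L = 0; L < Lanes; ++L) {
    if (ExitCond[L])
      Active = false; // the exiting lane itself is excluded
    M[L] = Active;
  }
  return M;
}
```

For an exit condition that first fires in lane 2, the first mask enables lanes 0 to 2 and the second enables lanes 0 and 1; when no lane exits, both masks are all-true.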
    // be even if LAA thinks it is due to performing the load for the
    // vector iteration i+1 in vector iteration i.
    if (isConditionCopyRequired()) {
      assert(EarlyExitLoad.has_value() && "EE Store without condition load.");
If you change `isConditionCopyRequired` to test for `EarlyExitLoad` having a value, then you don't need this assert.
    if (isConditionCopyRequired()) {
      assert(EarlyExitLoad.has_value() && "EE Store without condition load.");

      if (LAI->canVectorizeMemory()) {
Better to hoist this out and call it once to be reused here and below.
          return false;
        }
      }
    }
  }
What happens in the else case? Looks like we fall through to:
  if (!LAI->canVectorizeMemory())
    return canVectorizeIndirectUnsafeDependences();
and so we may still treat the loop as legal. Worth adding a test for this case I think.
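The fall-through in question can be made concrete with a small boolean model. This is a sketch under assumptions, not the patch's code: `MemoryLegalityInputs` and `canVectorizeMemoryModel` are hypothetical, each field stands in for one query from the snippets above, the final `return true` stands in for whatever checks follow in the real function, and rejecting the loop when LAA fails is just one conservative option, not something the author has agreed to.

```cpp
#include <cassert>

// Hypothetical inputs standing in for the queries made during the memory
// legality check of an early exit loop.
struct MemoryLegalityInputs {
  bool ConditionCopyRequired; // early exit loop containing a store
  bool LAACanVectorize;       // LAI->canVectorizeMemory(), queried once
  bool ExitLoadHasDependence; // a reported dependence involves the exit load
  bool IndirectUnsafeDepsOK;  // canVectorizeIndirectUnsafeDependences()
};

bool canVectorizeMemoryModel(const MemoryLegalityInputs &In) {
  // Hoisted: the LAA result is read once and reused by both checks below.
  const bool LAAOk = In.LAACanVectorize;

  if (In.ConditionCopyRequired) {
    // Assumption: handle the "else" case explicitly and reject, instead of
    // falling through to the indirect-unsafe-dependence path.
    if (!LAAOk)
      return false;
    // Even if LAA is satisfied, a dependence on the exit load is unsafe,
    // because that load is performed one vector iteration early.
    if (In.ExitLoadHasDependence)
      return false;
  }

  if (!LAAOk)
    return In.IndirectUnsafeDepsOK;
  return true; // stand-in for the remaining checks
}
```

Without the explicit rejection, an early exit loop with a store where LAA fails but `canVectorizeIndirectUnsafeDependences()` succeeds would be reported as legal, which is the case the requested test would need to cover.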
      const auto *Deps = DepChecker.getDependences();

      for (const MemoryDepChecker::Dependence &Dep : *Deps) {
        if (Dep.getDestination(DepChecker) == EarlyExitLoad ||
Do we have to worry about other loads too? The header block could have several loads, with only one of them contributing to the branch condition.
Adds some basic support for a simple early exit loop with a store.
This is vectorized such that when the next vector iteration would exit, we bail out to the scalar loop to handle the exit.
The main complication is transforming the VPlan; it feels quite fragile at the moment, and might be better done up front at initial recipe creation time. Thoughts?
An alternative would be to use tail-folding, but that requires doing brka/brkb inside the loop, which could be costly. It would also still likely require moving some IR, just not copying it to the preheader.
(One advantage of this approach is that architectures without predication could potentially take advantage of early-exit autovectorization, though I suspect we'd want this driven by pragmas instead of just guessing about a possible trip count.)