-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[LV] Initial support for stores in early exit loops #137774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -407,6 +407,13 @@ class LoopVectorizationLegality { | |
return hasUncountableEarlyExit() ? getUncountableEdge()->second : nullptr; | ||
} | ||
|
||
/// Returns true if this is an early exit loop containing a store. | ||
bool isConditionCopyRequired() const { return RequiresEarlyExitConditionCopy; } | ||
|
||
/// Returns the load instruction, if any, nearest to an uncountable early | ||
/// exit. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The load closest in the IR graph to the exit (ideally, the load whose result is used directly in a comparison). I didn't spend much time on the naming ;) |
||
std::optional<LoadInst *> getEarlyExitLoad() const { return EarlyExitLoad; } | ||
|
||
/// Return true if there is store-load forwarding dependencies. | ||
bool isSafeForAnyStoreLoadForwardDistances() const { | ||
return LAI->getDepChecker().isSafeForAnyStoreLoadForwardDistances(); | ||
|
@@ -654,6 +661,16 @@ class LoopVectorizationLegality { | |
/// Keep track of the loop edge to an uncountable exit, comprising a pair | ||
/// of (Exiting, Exit) blocks, if there is exactly one early exit. | ||
std::optional<std::pair<BasicBlock *, BasicBlock *>> UncountableEdge; | ||
|
||
/// Indicates that we will need to copy the early exit condition into | ||
/// the vector preheader, as we will need to mask some operations in | ||
/// the loop (e.g. stores). | ||
bool RequiresEarlyExitConditionCopy = false; | ||
|
||
/// The load used to determine an uncountable early-exit condition. This is | ||
/// only used to allow further analysis in canVectorizeMemory if we found | ||
/// what looks like a valid early exit loop with store beforehand. | ||
std::optional<LoadInst *> EarlyExitLoad; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens in the case of two loads?, i.e. %ld1 = load i8, ... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This would fail the check that the second operand to the compare is loop invariant. There's a backup check in the vplan transform that would reject it at that point too. In future, it would be nice to increase the number of cases handled, but I've tried to keep it simple for now and just come up with a list of extra cases to handle later. Some of the possible future work:
|
||
}; | ||
|
||
} // namespace llvm | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ | |
#include "llvm/Transforms/Vectorize/LoopVectorizationLegality.h" | ||
#include "llvm/Analysis/Loads.h" | ||
#include "llvm/Analysis/LoopInfo.h" | ||
#include "llvm/Analysis/MustExecute.h" | ||
#include "llvm/Analysis/OptimizationRemarkEmitter.h" | ||
#include "llvm/Analysis/ScalarEvolutionExpressions.h" | ||
#include "llvm/Analysis/TargetLibraryInfo.h" | ||
|
@@ -1209,6 +1210,36 @@ bool LoopVectorizationLegality::canVectorizeMemory() { | |
}); | ||
} | ||
|
||
// FIXME: Remove or reduce this restriction. We're in a bit of an odd spot | ||
// since we're (potentially) doing the load out of its normal order | ||
// in the loop and that may throw off dependency checking. | ||
// A forward dependency should be fine, but a backwards dep may not | ||
// be even if LAA thinks it is due to performing the load for the | ||
// vector iteration i+1 in vector iteration i. | ||
if (isConditionCopyRequired()) { | ||
assert(EarlyExitLoad.has_value() && "EE Store without condition load."); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you change |
||
|
||
if (LAI->canVectorizeMemory()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Better to hoist this out and call it once to be reused here and below. |
||
const MemoryDepChecker &DepChecker = LAI->getDepChecker(); | ||
const auto *Deps = DepChecker.getDependences(); | ||
|
||
for (const MemoryDepChecker::Dependence &Dep : *Deps) { | ||
if (Dep.getDestination(DepChecker) == EarlyExitLoad || | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we have to worry about other loads too? The header block could have several loads, with only one of them contributing to the branch condition. |
||
Dep.getSource(DepChecker) == EarlyExitLoad) { | ||
// Refine language a little? This currently only applies when a store | ||
// is present in the early exit loop. | ||
reportVectorizationFailure( | ||
"No dependencies allowed for early exit condition load", | ||
"Early exit condition loads may not have a dependence with another" | ||
" memory operation.", | ||
"CantVectorizeStoreToLoopInvariantAddress", ORE, | ||
TheLoop); | ||
return false; | ||
} | ||
} | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens in the else case? Looks like we fall through to:
and so we may still treat the loop as legal. Worth adding a test for this case I think. |
||
} | ||
|
||
if (!LAI->canVectorizeMemory()) | ||
return canVectorizeIndirectUnsafeDependences(); | ||
|
||
|
@@ -1627,6 +1658,7 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() { | |
// Keep a record of all the exiting blocks. | ||
SmallVector<const SCEVPredicate *, 4> Predicates; | ||
std::optional<std::pair<BasicBlock *, BasicBlock *>> SingleUncountableEdge; | ||
std::optional<LoadInst *> EELoad; | ||
for (BasicBlock *BB : ExitingBlocks) { | ||
const SCEV *EC = | ||
PSE.getSE()->getPredicatedExitCount(TheLoop, BB, &Predicates); | ||
|
@@ -1656,6 +1688,21 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() { | |
return false; | ||
} | ||
|
||
// For loops with stores. | ||
// Record load for analysis by isDereferenceableAndAlignedInLoop | ||
// and later by dependence analysis. | ||
if (BranchInst *Br = dyn_cast<BranchInst>(BB->getTerminator())) { | ||
// FIXME: Handle exit conditions with multiple users, more complex exit | ||
// conditions than br(icmp(load, loop_inv)). | ||
ICmpInst *Cmp = dyn_cast<ICmpInst>(Br->getCondition()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess there is no reason to restrict this to integer comparisons, right? It should work fine with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It should, but this is supposed to be a minimal implementation :) |
||
if (Cmp && Cmp->hasOneUse() && | ||
TheLoop->isLoopInvariant(Cmp->getOperand(1))) { | ||
LoadInst *Load = dyn_cast<LoadInst>(Cmp->getOperand(0)); | ||
if (Load && Load->hasOneUse() && TheLoop->contains(Load)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is
In general, do we still make correct decisions regarding dependences in the loop between loads and stores? I'm thinking of situations where you have
or
or similarly for stores to negative offsets, like a[i - 4] There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The store can also have a dependence with another load in the loop that doesn't contribute to the early exit condition. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So there shouldn't be a problem with stores since dependencies for the EarlyExitLoad are checked later (in canVectorizeMemory()), but I didn't check for a uniform load. I've added the !loopInvariant check for now. Technically it would have been rejected by the vplan transformation and the plan deleted (since that's looking for a load with an address based on the canonical IV), but it's nice if we can reject earlier instead. |
||
EELoad = Load; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe you can avoid doing the work here and only do the analysis if you find a store? You can do this by simply keeping a copy of the terminator's condition, since we only support a single early uncountable exit anyway. |
||
} | ||
} | ||
|
||
SingleUncountableEdge = {BB, ExitBlock}; | ||
} else | ||
CountableExitingBlocks.push_back(BB); | ||
|
@@ -1708,16 +1755,31 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() { | |
} | ||
}; | ||
|
||
bool HasStore = false; | ||
for (auto *BB : TheLoop->blocks()) | ||
for (auto &I : *BB) { | ||
if (StoreInst *SI = dyn_cast<StoreInst>(&I)) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does it matter if the store is prior to or after the early exit? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No. Since this prototype bails out to a scalar loop if the next vector iteration would exit partway through, it doesn't matter where the store is in the loop. If/when we do tail-folding for these loops, we will need to generate different masks for state-changing operations before or after an exit. |
||
HasStore = true; | ||
if (SI->isSimple()) | ||
continue; | ||
|
||
reportVectorizationFailure( | ||
"Complex writes to memory unsupported in early exit loops", | ||
"Cannot vectorize early exit loop with complex writes to memory", | ||
"WritesInEarlyExitLoop", ORE, TheLoop); | ||
return false; | ||
} | ||
|
||
if (I.mayWriteToMemory()) { | ||
// We don't support writes to memory. | ||
reportVectorizationFailure( | ||
"Writes to memory unsupported in early exit loops", | ||
"Cannot vectorize early exit loop with writes to memory", | ||
"Complex writes to memory unsupported in early exit loops", | ||
"Cannot vectorize early exit loop with complex writes to memory", | ||
"WritesInEarlyExitLoop", ORE, TheLoop); | ||
return false; | ||
} else if (!IsSafeOperation(&I)) { | ||
} | ||
|
||
if (!IsSafeOperation(&I)) { | ||
reportVectorizationFailure("Early exit loop contains operations that " | ||
"cannot be speculatively executed", | ||
"UnsafeOperationsEarlyExitLoop", ORE, | ||
|
@@ -1732,13 +1794,53 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() { | |
|
||
// TODO: Handle loops that may fault. | ||
Predicates.clear(); | ||
if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC, | ||
&Predicates)) { | ||
|
||
if (HasStore && EELoad.has_value()) { | ||
LoadInst *LI = *EELoad; | ||
if (isDereferenceableAndAlignedInLoop(LI, TheLoop, *PSE.getSE(), *DT, AC, | ||
&Predicates)) { | ||
ICFLoopSafetyInfo SafetyInfo; | ||
SafetyInfo.computeLoopSafetyInfo(TheLoop); | ||
// FIXME: We may have multiple levels of conditional loads, so will | ||
// need to improve on outright rejection at some point. | ||
if (!SafetyInfo.isGuaranteedToExecute(*LI, DT, TheLoop)) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this even possible at the moment? Given the branch condition comes from a icmp, which comes from a load, then the only way this might not execute is if the early exiting block is itself only executed conditionally. This isn't a loop structure we currently support. For example, a loop like this:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. More future work, hence the FIXME. |
||
LLVM_DEBUG( | ||
dbgs() << "Early exit condition load not guaranteed to execute.\n"); | ||
reportVectorizationFailure( | ||
"Early exit condition load not guaranteed to execute", | ||
"Cannot vectorize early exit loop when condition load is not " | ||
"guaranteed to execute", | ||
"EarlyExitLoadNotGuaranteed", ORE, TheLoop); | ||
} | ||
} else { | ||
LLVM_DEBUG(dbgs() << "Early exit condition load potentially unsafe.\n"); | ||
reportVectorizationFailure("Uncounted loop condition not known safe", | ||
"Cannot vectorize early exit loop with " | ||
"possibly unsafe condition load", | ||
"PotentiallyFaultingEarlyExitLoop", ORE, | ||
TheLoop); | ||
return false; | ||
} | ||
} else if (HasStore) { | ||
LLVM_DEBUG(dbgs() << "Found early exit store but no condition load.\n"); | ||
reportVectorizationFailure( | ||
"Loop may fault", | ||
"Cannot vectorize potentially faulting early exit loop", | ||
"PotentiallyFaultingEarlyExitLoop", ORE, TheLoop); | ||
"Early exit loop with store but no condition load", | ||
"Cannot vectorize early exit loop with store but no condition load", | ||
"NoConditionLoadForEarlyExitLoop", ORE, TheLoop); | ||
return false; | ||
} else { | ||
// Read-only loop. | ||
// FIXME: as with the loops with stores, only the loads contributing to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At least prior to this patch, this statement is untrue because the way the vectorised loop is constructed in vplan the latch block is always executed regardless of whether we would exit early in the scalar loop, i.e.
will get vectorised into a single block:
Are you saying that with this patch we now create a masked load for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, that's why it's a FIXME. Future work :) |
||
// the loop condition need to be guaranteed dereferenceable and | ||
// aligned. | ||
if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC, | ||
&Predicates)) { | ||
reportVectorizationFailure( | ||
"Loop may fault", | ||
"Cannot vectorize potentially faulting early exit loop", | ||
"PotentiallyFaultingEarlyExitLoop", ORE, TheLoop); | ||
return false; | ||
} | ||
} | ||
|
||
[[maybe_unused]] const SCEV *SymbolicMaxBTC = | ||
|
@@ -1751,6 +1853,11 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() { | |
"backedge taken count: " | ||
<< *SymbolicMaxBTC << '\n'); | ||
UncountableEdge = SingleUncountableEdge; | ||
if (HasStore) { | ||
RequiresEarlyExitConditionCopy = true; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like these are always set as a pair, so perhaps there is no need for |
||
EarlyExitLoad = EELoad; | ||
} | ||
|
||
return true; | ||
} | ||
|
||
|
@@ -1823,6 +1930,8 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) { | |
} else { | ||
if (!isVectorizableEarlyExitLoop()) { | ||
UncountableEdge = std::nullopt; | ||
EarlyExitLoad = std::nullopt; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Feels like a good time to add a helper function to reset these variables perhaps? |
||
RequiresEarlyExitConditionCopy = false; | ||
if (DoExtraAnalysis) | ||
Result = false; | ||
else | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the store be anywhere in the loop, i.e. including after the early exit? Might be worth clarifying.
If we do stick with this code in the final version I think it probably needs a better name because it's not obvious that it specifically refers to the condition driving an early exit. How would this scale with multiple exits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The store can be anywhere in the loop, since we either execute a full vector iteration as normal or we bail out to the scalar tail if an exit would occur in the middle.
We need to do this for all exits, so a single transformation to copy and reorder the exit condition IR is sufficient.