[LV] Initial support for stores in early exit loops #137774

Draft
wants to merge 1 commit into
base: main
17 changes: 17 additions & 0 deletions llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
@@ -407,6 +407,13 @@ class LoopVectorizationLegality {
return hasUncountableEarlyExit() ? getUncountableEdge()->second : nullptr;
}

/// Returns true if this is an early exit loop containing a store.
bool isConditionCopyRequired() const { return RequiresEarlyExitConditionCopy; }

Contributor

Can the store be anywhere in the loop, i.e. including after the early exit? Might be worth clarifying.

If we do stick with this code in the final version I think it probably needs a better name because it's not obvious that it specifically refers to the condition driving an early exit. How would this scale with multiple exits?

Collaborator Author

The store can be anywhere in the loop, since we either execute a full vector iteration as normal or we bail out to the scalar tail if an exit would occur in the middle.

We need to do this for all exits, so a single transformation to copy and reorder the exit condition IR is sufficient.
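
The behaviour described in this reply can be modelled in plain C++. This is only an illustrative sketch, not code from the patch: `kVF`, `scalarLoop`, and `vectorLoopWithScalarTail` are hypothetical names, and the loop body (increment `data[i]` until `pred[i] == key`) is an assumed example. The point it shows is that the exit condition for a whole chunk is evaluated before any store in that chunk runs, so the position of the store relative to the exit does not matter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kVF = 4; // assumed vectorization factor

// Reference semantics: stop at the first pred[i] == key, otherwise store.
// Returns the index at which the loop stopped.
std::size_t scalarLoop(const std::vector<std::uint8_t> &pred, std::uint8_t key,
                       std::vector<int> &data, std::size_t start = 0) {
  assert(data.size() == pred.size());
  std::size_t i = start;
  for (; i < pred.size(); ++i) {
    if (pred[i] == key)
      break;
    data[i] += 1; // state-changing operation
  }
  return i;
}

// Chunked model: test the exit condition for all kVF lanes first. If any lane
// would exit, run no stores for this chunk and hand over to the scalar loop.
std::size_t vectorLoopWithScalarTail(const std::vector<std::uint8_t> &pred,
                                     std::uint8_t key, std::vector<int> &data) {
  std::size_t i = 0;
  while (i + kVF <= pred.size()) {
    bool anyExit = false;
    for (std::size_t l = 0; l < kVF; ++l)
      anyExit = anyExit || (pred[i + l] == key);
    if (anyExit)
      break; // bail out before touching memory in this chunk
    for (std::size_t l = 0; l < kVF; ++l)
      data[i + l] += 1;
    i += kVF;
  }
  return scalarLoop(pred, key, data, i); // scalar tail finishes the work
}
```

Both functions should leave `data` in the same state and return the same exit index for any input, which is the property the scalar-tail approach relies on.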


/// Returns the load instruction, if any, nearest to an uncountable early
/// exit.

Contributor

Does "nearest" refer to the nearest load prior to the early exit, after it, or either?

Collaborator Author

The load closest in the IR graph to the exit (ideally, the load whose result is used directly in a comparison). I didn't spend much time on the naming ;)

std::optional<LoadInst *> getEarlyExitLoad() const { return EarlyExitLoad; }

/// Return true if there is store-load forwarding dependencies.
bool isSafeForAnyStoreLoadForwardDistances() const {
return LAI->getDepChecker().isSafeForAnyStoreLoadForwardDistances();
@@ -654,6 +661,16 @@ class LoopVectorizationLegality {
/// Keep track of the loop edge to an uncountable exit, comprising a pair
/// of (Exiting, Exit) blocks, if there is exactly one early exit.
std::optional<std::pair<BasicBlock *, BasicBlock *>> UncountableEdge;

/// Indicates that we will need to copy the early exit condition into
/// the vector preheader, as we will need to mask some operations in
/// the loop (e.g. stores).
bool RequiresEarlyExitConditionCopy = false;

/// The load used to determine an uncountable early-exit condition. This is
/// only used to allow further analysis in canVectorizeMemory if we found
/// what looks like a valid early exit loop with store beforehand.
std::optional<LoadInst *> EarlyExitLoad;

Contributor

What happens in the case of two loads? For example:

%ld1 = load i8, ...
%ld2 = load i8, ...
%cmp = icmp eq i8 %ld1, %ld2
br i1 %cmp, label %early.exit, label %loop.inc

Collaborator Author

This would fail the check that the second operand to the compare is loop invariant. There's a backup check in the vplan transform that would reject it at that point too.

In future, it would be nice to increase the number of cases handled, but I've tried to keep it simple for now and just come up with a list of extra cases to handle later.

Some of the possible future work:

  1. Supporting a combined condition; e.g. no escaping value, but two exit conditions or'd together. Needs work in ScalarEvolution? (This actually applies to the sample loop I chose; for the IR test I manually edited the IR to create 2 exits again)

  2. Multiple uses of loads or comparisons in the exit IR chain; requires introducing a PHI node for current vector iteration value.

  3. Supporting a chain of conditional loads; e.g. early exit is itself in a conditional block so might not always execute.

  4. Supporting a non-const second term for comparison, or more varied comparisons (e.g. bit test instead of icmp).

  5. Tail folding of EE loops with a store.

  6. Dynamic bounds for known-dereferenceable load (via e.g. call void @llvm.assume(i1 true) [ "align"(ptr %pred, i64 2), "dereferenceable"(ptr %pred, i32 %n_bytes) ])
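
To make the rejection of the two-load case concrete, here is a toy model of the `br(icmp(load, loop_inv))` match. It is a hedged sketch only: `ToyValue`, `ToyCompare`, and `matchEarlyExitLoad` are hypothetical stand-ins rather than LLVM classes, and the conditions mirror the checks quoted in this thread (single use of the compare, loop-invariant second operand, single-use load that is not itself loop invariant).

```cpp
#include <cassert>

enum class ToyKind { Load, Constant, Other };

// Minimal stand-in for an IR value; not an LLVM type.
struct ToyValue {
  ToyKind kind;
  bool loopInvariant;
  unsigned numUses;
};

// Minimal stand-in for an integer compare feeding the exit branch.
struct ToyCompare {
  const ToyValue *lhs;
  const ToyValue *rhs;
  unsigned numUses;
};

// Returns the load driving the early exit, or nullptr if the shape is not
// the simple "compare one varying load against a loop-invariant value".
const ToyValue *matchEarlyExitLoad(const ToyCompare &cmp) {
  assert(cmp.lhs && cmp.rhs);
  if (cmp.numUses != 1)
    return nullptr; // compare must only feed the branch
  if (!cmp.rhs->loopInvariant)
    return nullptr; // second operand must be loop invariant
  if (cmp.lhs->kind != ToyKind::Load || cmp.lhs->numUses != 1 ||
      cmp.lhs->loopInvariant)
    return nullptr; // first operand must be a single-use, varying load
  return cmp.lhs;
}
```

With two loads in the loop as operands, the second operand is not loop invariant, so the match fails at the second check.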

};

} // namespace llvm
125 changes: 117 additions & 8 deletions llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -17,6 +17,7 @@
#include "llvm/Transforms/Vectorize/LoopVectorizationLegality.h"
#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/MustExecute.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
@@ -1209,6 +1210,36 @@ bool LoopVectorizationLegality::canVectorizeMemory() {
});
}

// FIXME: Remove or reduce this restriction. We're in a bit of an odd spot
// since we're (potentially) doing the load out of its normal order
// in the loop and that may throw off dependency checking.
// A forward dependency should be fine, but a backwards dep may not
// be even if LAA thinks it is due to performing the load for the
// vector iteration i+1 in vector iteration i.
if (isConditionCopyRequired()) {
assert(EarlyExitLoad.has_value() && "EE Store without condition load.");

Contributor

If you change isConditionCopyRequired to test for EarlyExitLoad having a value, then you don't need this assert.
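
A minimal sketch of this suggestion, using stand-in types rather than the real LLVM classes (`StandInLoad`, `ToyLegality`, and `recordEarlyExitLoad` are hypothetical names): once the query is derived from the optional, the flag and the load cannot disagree, so the assert has nothing left to check.

```cpp
#include <cassert>
#include <optional>

struct StandInLoad {}; // stand-in for llvm::LoadInst, for illustration only

class ToyLegality {
  std::optional<StandInLoad *> EarlyExitLoad;

public:
  // Derived from the optional instead of a separate bool member.
  bool isConditionCopyRequired() const { return EarlyExitLoad.has_value(); }

  std::optional<StandInLoad *> getEarlyExitLoad() const {
    return EarlyExitLoad;
  }

  // Hypothetical setter; the patch assigns the member directly.
  void recordEarlyExitLoad(StandInLoad *L) {
    assert(L && "expected a non-null load");
    EarlyExitLoad = L;
  }
};
```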


if (LAI->canVectorizeMemory()) {

Contributor

Better to hoist this out and call it once to be reused here and below.

const MemoryDepChecker &DepChecker = LAI->getDepChecker();
const auto *Deps = DepChecker.getDependences();

for (const MemoryDepChecker::Dependence &Dep : *Deps) {
if (Dep.getDestination(DepChecker) == EarlyExitLoad ||

Contributor

Do we have to worry about other loads too? The header block could have several loads, with only one of them contributing to the branch condition.

Dep.getSource(DepChecker) == EarlyExitLoad) {
// Refine language a little? This currently only applies when a store
// is present in the early exit loop.
reportVectorizationFailure(
"No dependencies allowed for early exit condition load",
"Early exit condition loads may not have a dependence with another"
" memory operation.",
"CantVectorizeStoreToLoopInvariantAddress", ORE,
TheLoop);
return false;
}
}
}

Contributor

What happens in the else case? Looks like we fall through to:

  if (!LAI->canVectorizeMemory())
    return canVectorizeIndirectUnsafeDependences();

and so we may still treat the loop as legal. Worth adding a test for this case I think.

}

if (!LAI->canVectorizeMemory())
return canVectorizeIndirectUnsafeDependences();

@@ -1627,6 +1658,7 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
// Keep a record of all the exiting blocks.
SmallVector<const SCEVPredicate *, 4> Predicates;
std::optional<std::pair<BasicBlock *, BasicBlock *>> SingleUncountableEdge;
std::optional<LoadInst *> EELoad;
for (BasicBlock *BB : ExitingBlocks) {
const SCEV *EC =
PSE.getSE()->getPredicatedExitCount(TheLoop, BB, &Predicates);
@@ -1656,6 +1688,21 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
return false;
}

// For loops with stores.
// Record load for analysis by isDereferenceableAndAlignedInLoop
// and later by dependence analysis.
if (BranchInst *Br = dyn_cast<BranchInst>(BB->getTerminator())) {
// FIXME: Handle exit conditions with multiple users, more complex exit
// conditions than br(icmp(load, loop_inv)).
ICmpInst *Cmp = dyn_cast<ICmpInst>(Br->getCondition());

Contributor

I guess there is no reason to restrict this to integer comparisons, right? It should work fine with FCmpInst too.

Collaborator Author

It should, but this is supposed to be a minimal implementation :)

if (Cmp && Cmp->hasOneUse() &&
TheLoop->isLoopInvariant(Cmp->getOperand(1))) {
LoadInst *Load = dyn_cast<LoadInst>(Cmp->getOperand(0));
if (Load && Load->hasOneUse() && TheLoop->contains(Load))

Contributor

Is TheLoop->contains enough here? It doesn't guarantee that Load varies in the loop. Perhaps the load hasn't been hoisted out because it might have a memory conflict with another store. You might still want to do:

  if (Load && Load->hasOneUse() && !TheLoop->isLoopInvariant(Cmp->getOperand(0)))

In general, do we still make correct decisions regarding dependences in the loop between loads and stores? I'm thinking of situations where you have

for.body:
  ... load from a[i] ...
  ... store to a[i + 4] ...
  ... early exit compare ...
  br i1 %cmp, label %early.exit, label %for.inc

for.inc:
  ...

or

for.body:
  ... load from a[i] ...
  ... early exit compare ...
  br i1 %cmp, label %early.exit, label %for.inc

for.inc:
  ... store to a[i + 4] ...
  ...

or similarly for stores to negative offsets, like a[i - 4]
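
The concern with a load from `a[i]` and a store to `a[i + 4]` can be seen in a small simulation. This is a hedged illustration, not part of the patch: `runChunked` is a hypothetical helper that performs all loads of a chunk before any of its stores, as a vector iteration would, and the early-exit compare is left out so that only the memory dependence is modelled. The store value (`loaded + 1`) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models: for (i = 0; i + 4 < n; ++i) a[i + 4] = a[i] + 1;
// executed in chunks of `width` lanes. width == 1 is the scalar loop.
std::vector<int> runChunked(std::vector<int> a, std::size_t width) {
  assert(a.size() >= 4 && width >= 1);
  const std::size_t trips = a.size() - 4; // scalar trip count
  std::size_t i = 0;
  for (; i + width <= trips; i += width) {
    std::vector<int> lanes(width);
    for (std::size_t l = 0; l < width; ++l)
      lanes[l] = a[i + l]; // all loads of the chunk happen first
    for (std::size_t l = 0; l < width; ++l)
      a[i + 4 + l] = lanes[l] + 1; // then all stores
  }
  for (; i < trips; ++i) // scalar remainder
    a[i + 4] = a[i] + 1;
  return a;
}
```

With a dependence distance of 4, a chunk width of 4 or less reproduces the scalar result, while a width of 8 reads elements that the scalar loop would already have overwritten. Hoisting or reordering the exit-condition load can shift this kind of distance, which is why the dependence check still matters.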

Contributor

The store can also have a dependence with another load in the loop that doesn't contribute to the early exit condition.

Collaborator Author

So there shouldn't be a problem with stores since dependencies for the EarlyExitLoad are checked later (in canVectorizeMemory()), but I didn't check for a uniform load. I've added the !loopInvariant check for now.

Technically it would have been rejected by the vplan transformation and the plan deleted (since that's looking for a load with an address based on the canonical IV), but it's nice if we can reject earlier instead.

EELoad = Load;

Contributor

Maybe you can avoid doing the work here and only do the analysis if you find a store? You can do this by simply keeping a copy of the terminator's condition, since we only support a single early uncountable exit anyway.

}
}

SingleUncountableEdge = {BB, ExitBlock};
} else
CountableExitingBlocks.push_back(BB);
@@ -1708,16 +1755,31 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
}
};

bool HasStore = false;
for (auto *BB : TheLoop->blocks())
for (auto &I : *BB) {
if (StoreInst *SI = dyn_cast<StoreInst>(&I)) {

Contributor

Does it matter if the store is prior to or after the early exit?

Collaborator Author

No. Since this prototype bails out to a scalar loop if the next vector iteration would exit partway through, it doesn't matter where the store is in the loop.

If/when we do tail-folding for these loops, we will need to generate different masks for state-changing operations before or after an exit.
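
As a rough illustration of why the masks would differ under tail folding, the sketch below derives two lane masks from a per-lane exit condition. The thread only says that different masks are needed; the exact shapes here are an assumption read off the scalar semantics (the exiting iteration still runs code placed before the exit, but not code placed after it). `kLanes`, `LaneMask`, `ExitMasks`, and `computeExitMasks` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4; // assumed vector width
using LaneMask = std::array<bool, kLanes>;

struct ExitMasks {
  LaneMask beforeExit; // lanes allowed to run operations ahead of the exit
  LaneMask afterExit;  // lanes allowed to run operations after the exit
};

ExitMasks computeExitMasks(const LaneMask &exitCond) {
  ExitMasks masks{};
  bool exited = false;
  for (std::size_t l = 0; l < kLanes; ++l) {
    // Code before the exit runs unless an earlier lane already left the loop.
    masks.beforeExit[l] = !exited;
    exited = exited || exitCond[l];
    // Code after the exit additionally requires that this lane did not exit.
    masks.afterExit[l] = !exited;
  }
  return masks;
}
```

For an exit taken in lane 2 of 4, the first mask keeps lanes 0 to 2 active and the second keeps only lanes 0 and 1.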

HasStore = true;
if (SI->isSimple())
continue;

reportVectorizationFailure(
"Complex writes to memory unsupported in early exit loops",
"Cannot vectorize early exit loop with complex writes to memory",
"WritesInEarlyExitLoop", ORE, TheLoop);
return false;
}

if (I.mayWriteToMemory()) {
// We don't support writes to memory.
reportVectorizationFailure(
"Writes to memory unsupported in early exit loops",
"Cannot vectorize early exit loop with writes to memory",
"Complex writes to memory unsupported in early exit loops",
"Cannot vectorize early exit loop with complex writes to memory",
"WritesInEarlyExitLoop", ORE, TheLoop);
return false;
} else if (!IsSafeOperation(&I)) {
}

if (!IsSafeOperation(&I)) {
reportVectorizationFailure("Early exit loop contains operations that "
"cannot be speculatively executed",
"UnsafeOperationsEarlyExitLoop", ORE,
@@ -1732,13 +1794,53 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {

// TODO: Handle loops that may fault.
Predicates.clear();
if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC,
&Predicates)) {

if (HasStore && EELoad.has_value()) {
LoadInst *LI = *EELoad;
if (isDereferenceableAndAlignedInLoop(LI, TheLoop, *PSE.getSE(), *DT, AC,
&Predicates)) {
ICFLoopSafetyInfo SafetyInfo;
SafetyInfo.computeLoopSafetyInfo(TheLoop);
// FIXME: We may have multiple levels of conditional loads, so will
// need to improve on outright rejection at some point.
if (!SafetyInfo.isGuaranteedToExecute(*LI, DT, TheLoop)) {

Contributor

Is this even possible at the moment? Given that the branch condition comes from an icmp, which comes from a load, the only way this might not execute is if the early exiting block is itself only executed conditionally. This isn't a loop structure we currently support. For example, a loop like this:

for.body:
   %cmp1 = icmp ...
   br i1 %cmp1, label %test.exit, label %for.inc

test.exit:
   %ld1 = load i8, ...
   %cmp2 = icmp eq i8 %ld1, ...
   br i1 %cmp2, label %early.exit, label %for.inc

for.inc:
   ... branch back to for.body or leave loop ...

Collaborator Author

More future work, hence the FIXME.

LLVM_DEBUG(
dbgs() << "Early exit condition load not guaranteed to execute.\n");
reportVectorizationFailure(
"Early exit condition load not guaranteed to execute",
"Cannot vectorize early exit loop when condition load is not "
"guaranteed to execute",
"EarlyExitLoadNotGuaranteed", ORE, TheLoop);
}
} else {
LLVM_DEBUG(dbgs() << "Early exit condition load potentially unsafe.\n");
reportVectorizationFailure("Uncounted loop condition not known safe",
"Cannot vectorize early exit loop with "
"possibly unsafe condition load",
"PotentiallyFaultingEarlyExitLoop", ORE,
TheLoop);
return false;
}
} else if (HasStore) {
LLVM_DEBUG(dbgs() << "Found early exit store but no condition load.\n");
reportVectorizationFailure(
"Loop may fault",
"Cannot vectorize potentially faulting early exit loop",
"PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
"Early exit loop with store but no condition load",
"Cannot vectorize early exit loop with store but no condition load",
"NoConditionLoadForEarlyExitLoop", ORE, TheLoop);
return false;
} else {
// Read-only loop.
// FIXME: as with the loops with stores, only the loads contributing to

Contributor

At least prior to this patch, this statement is untrue: because of the way the vectorised loop is constructed in vplan, the latch block is always executed regardless of whether we would exit early in the scalar loop, i.e.

for.body:
  %ld1 = load i8, ...
  ...

for.inc:
  %ld2 = load i8, ...
  ...

will get vectorised into a single block:

vector.body:
  ...
  %vld1 = load <16 x i8>, ...
  %vld2 = load <16 x i8>, ...
  ...
  %vcmp = icmp eq <16 x i8> %vld1, splat (i8 3)
  %early.exit.cmp = reduce or %vcmp
  %latch.exit.cmp = icmp ...
  %final.cmp = or i1 %early.exit.cmp, %latch.exit.cmp
  br i1 %final.cmp, label %middle.split, label %vector.body

Are you saying that with this patch we now create a masked load for %vld2 instead based on the early exit condition?

Collaborator Author

No, that's why it's a FIXME. Future work :)

// the loop condition need to be guaranteed dereferenceable and
// aligned.
if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC,
&Predicates)) {
reportVectorizationFailure(
"Loop may fault",
"Cannot vectorize potentially faulting early exit loop",
"PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
return false;
}
}

[[maybe_unused]] const SCEV *SymbolicMaxBTC =
@@ -1751,6 +1853,11 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
"backedge taken count: "
<< *SymbolicMaxBTC << '\n');
UncountableEdge = SingleUncountableEdge;
if (HasStore) {
RequiresEarlyExitConditionCopy = true;

Contributor

Looks like these are always set as a pair, so perhaps there is no need for RequiresEarlyExitConditionCopy and you can just test if EarlyExitLoad has a value or not?

EarlyExitLoad = EELoad;
}

return true;
}

@@ -1823,6 +1930,8 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
} else {
if (!isVectorizableEarlyExitLoop()) {
UncountableEdge = std::nullopt;
EarlyExitLoad = std::nullopt;

Contributor

Feels like a good time to add a helper function to reset these variables perhaps?
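
One possible shape for such a helper, sketched with stand-in types (`ToyBlock`, `ToyLoad`, `ToyEarlyExitState`, and `resetEarlyExitState` are hypothetical names; only the three member names come from the patch):

```cpp
#include <cassert>
#include <optional>
#include <utility>

struct ToyBlock {}; // stand-in for llvm::BasicBlock
struct ToyLoad {};  // stand-in for llvm::LoadInst

struct ToyEarlyExitState {
  std::optional<std::pair<ToyBlock *, ToyBlock *>> UncountableEdge;
  std::optional<ToyLoad *> EarlyExitLoad;
  bool RequiresEarlyExitConditionCopy = false;

  // Clears every piece of early-exit state in one place, so callers cannot
  // forget one of the members when legality analysis fails.
  void resetEarlyExitState() {
    UncountableEdge = std::nullopt;
    EarlyExitLoad = std::nullopt;
    RequiresEarlyExitConditionCopy = false;
    assert(!UncountableEdge && !EarlyExitLoad);
  }
};
```

The failure path in `canVectorize` would then make a single call instead of three separate assignments.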

RequiresEarlyExitConditionCopy = false;
if (DoExtraAnalysis)
Result = false;
else
21 changes: 21 additions & 0 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9246,6 +9246,15 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
VPlanTransforms::runPass(VPlanTransforms::truncateToMinimalBitwidths,
*Plan, CM.getMinimalBitwidths());
VPlanTransforms::optimize(*Plan);

// See if we can convert an early exit vplan to bail out to a scalar
// loop if state-changing operations (like stores) are present and
// an exit will be taken in the next vector iteration.
// If not, discard the plan.
if (Legal->isConditionCopyRequired() && !HasScalarVF &&
!VPlanTransforms::runPass(VPlanTransforms::tryEarlyExitConversion,
*Plan))
break;
// TODO: try to put it close to addActiveLaneMask().
// Discard the plan if it is not EVL-compatible
if (CM.foldTailWithEVL() && !HasScalarVF &&
@@ -9570,6 +9579,12 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
},
Range);
auto Plan = std::make_unique<VPlan>(OrigLoop);

// FIXME: Better place to put this? Or maybe an enum for how to handle
// early exits?
if (Legal->hasUncountableEarlyExit())
Plan->setEarlyExitContinuesInScalarLoop(Legal->isConditionCopyRequired());

// Build hierarchical CFG.
// TODO: Convert to VPlan-transform and consolidate all transforms for VPlan
// creation.
@@ -9876,6 +9891,12 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlan(VFRange &Range) {

// Create new empty VPlan
auto Plan = std::make_unique<VPlan>(OrigLoop);

// FIXME: Better place to put this? Or maybe an enum for how to handle
// early exits?
if (Legal->hasUncountableEarlyExit())
Plan->setEarlyExitContinuesInScalarLoop(Legal->isConditionCopyRequired());

// Build hierarchical CFG
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildPlainCFG();
17 changes: 17 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3522,6 +3522,13 @@ class VPlan {
/// VPlan is destroyed.
SmallVector<VPBlockBase *> CreatedBlocks;

/// Indicates that an early exit loop will exit before the condition is
/// reached, and that the scalar loop must perform the last few iterations.
/// FIXME: Is this the right place? We mainly want to make sure that we
/// know about this for transforming the plan to copy&move the exit
/// condition, but maybe it doesn't need to be in the plan itself.
bool EarlyExitContinuesInScalarLoop = false;

/// Construct a VPlan with \p Entry to the plan and with \p ScalarHeader
/// wrapping the original header of the scalar loop.
VPlan(VPBasicBlock *Entry, VPIRBasicBlock *ScalarHeader)
@@ -3825,6 +3832,16 @@ class VPlan {
return ExitBlocks.size() > 1 || ExitBlocks[0]->getNumPredecessors() > 1;
}

/// Returns true if all exit paths should reach the scalar loop.
bool shouldEarlyExitContinueInScalarLoop() const {
return EarlyExitContinuesInScalarLoop;
}

/// Set early exit vectorization to always reach the scalar loop.
void setEarlyExitContinuesInScalarLoop(bool Continues) {
EarlyExitContinuesInScalarLoop = Continues;
}

/// Returns true if the scalar tail may execute after the vector loop. Note
/// that this relies on unneeded branches to the scalar tail loop being
/// removed.