[LoopInterchange] Hoist isComputableLoopNest() in the control flow #124247
@llvm/pr-subscribers-llvm-transforms
Author: Madhur Amilkanthwar (madhur13490)
Changes
Profiling the LLVM Test-suite shows that 14,090 of 139,323 loop nests (about 10%) were not viable candidates for the transformation: the transform exited from isComputableLoopNest() without doing anything.
More importantly, dependence information (DI) was computed for these loop nests before isComputableLoopNest() was reached, even though that function does not need DI and relies solely on scalar evolution (SE).
To improve compile time, this patch moves the call to isComputableLoopNest() earlier in the control flow, so the unnecessary dependence calculations are avoided.
The effect is visible on the compile-time-tracker: the overall geometric mean improves by 0.11%, and the lencod benchmark gets a more substantial benefit of 0.44%.
Full diff: https://github.com/llvm/llvm-project/pull/124247.diff
1 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index d366e749c7370d..e995db1f5c1f6a 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -271,6 +271,26 @@ static bool hasSupportedLoopDepth(SmallVectorImpl<Loop *> &LoopList,
   }
   return true;
 }
+
+static bool isComputableLoopNest(ScalarEvolution *SE, ArrayRef<Loop *> LoopList) {
+  for (Loop *L : LoopList) {
+    const SCEV *ExitCountOuter = SE->getBackedgeTakenCount(L);
+    if (isa<SCEVCouldNotCompute>(ExitCountOuter)) {
+      LLVM_DEBUG(dbgs() << "Couldn't compute backedge count\n");
+      return false;
+    }
+    if (L->getNumBackEdges() != 1) {
+      LLVM_DEBUG(dbgs() << "NumBackEdges is not equal to 1\n");
+      return false;
+    }
+    if (!L->getExitingBlock()) {
+      LLVM_DEBUG(dbgs() << "Loop doesn't have unique exit block\n");
+      return false;
+    }
+  }
+  return true;
+}
+
 namespace {
 /// LoopInterchangeLegality checks if it is legal to interchange the loop.
@@ -426,25 +446,6 @@ struct LoopInterchange {
     return processLoopList(LoopList);
   }
-  bool isComputableLoopNest(ArrayRef<Loop *> LoopList) {
-    for (Loop *L : LoopList) {
-      const SCEV *ExitCountOuter = SE->getBackedgeTakenCount(L);
-      if (isa<SCEVCouldNotCompute>(ExitCountOuter)) {
-        LLVM_DEBUG(dbgs() << "Couldn't compute backedge count\n");
-        return false;
-      }
-      if (L->getNumBackEdges() != 1) {
-        LLVM_DEBUG(dbgs() << "NumBackEdges is not equal to 1\n");
-        return false;
-      }
-      if (!L->getExitingBlock()) {
-        LLVM_DEBUG(dbgs() << "Loop doesn't have unique exit block\n");
-        return false;
-      }
-    }
-    return true;
-  }
-
   unsigned selectLoopForInterchange(ArrayRef<Loop *> LoopList) {
     // TODO: Add a better heuristic to select the loop to be interchanged based
     // on the dependence matrix. Currently we select the innermost loop.
@@ -459,10 +460,6 @@ struct LoopInterchange {
            "Unsupported depth of loop nest.");
     unsigned LoopNestDepth = LoopList.size();
-    if (!isComputableLoopNest(LoopList)) {
-      LLVM_DEBUG(dbgs() << "Not valid loop candidate for interchange\n");
-      return false;
-    }
     LLVM_DEBUG(dbgs() << "Processing LoopList of size = " << LoopNestDepth
                       << "\n");
@@ -1755,10 +1752,17 @@ PreservedAnalyses LoopInterchangePass::run(LoopNest &LN,
   // Ensure minimum depth of the loop nest to do the interchange.
   if (!hasSupportedLoopDepth(LoopList, ORE))
     return PreservedAnalyses::all();
+
+  // Ensure computable loop nest.
+  if (!isComputableLoopNest(&AR.SE, LoopList)) {
+    LLVM_DEBUG(dbgs() << "Not valid loop candidate for interchange\n");
+    return PreservedAnalyses::all();
+  }
+
   DependenceInfo DI(&F, &AR.AA, &AR.SE, &AR.LI);
   std::unique_ptr<CacheCost> CC =
       CacheCost::getCacheCost(LN.getOutermostLoop(), AR, DI);
-
+
   if (!LoopInterchange(&AR.SE, &AR.LI, &DI, &AR.DT, CC, &ORE).run(LN))
     return PreservedAnalyses::all();
   U.markLoopNestChanged(true);
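The reordering in the diff above can be illustrated with a small standalone sketch. It does not use LLVM: SketchLoop, SketchStats, and the functions below are hypothetical stand-ins introduced only for illustration, and the "dependence" step is just a counter. The point is only that running the cheap structural check first means the expensive step is never reached for a rejected nest.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a loop; this is NOT LLVM's Loop class.
struct SketchLoop {
  bool BackedgeCountComputable; // models SE->getBackedgeTakenCount(L) succeeding
  unsigned NumBackEdges;        // models L->getNumBackEdges()
  bool HasExitingBlock;         // models L->getExitingBlock() != nullptr
};

// Counts how often the (expensive) dependence step runs.
struct SketchStats {
  unsigned DependenceComputations = 0;
};

// Mirrors the three checks of isComputableLoopNest() in the diff.
bool isComputableNestSketch(const std::vector<SketchLoop> &Nest) {
  for (const SketchLoop &L : Nest) {
    if (!L.BackedgeCountComputable)
      return false;
    if (L.NumBackEdges != 1)
      return false;
    if (!L.HasExitingBlock)
      return false;
  }
  return true;
}

// Placeholder for building DependenceInfo and CacheCost.
void computeDependenceSketch(SketchStats &S) { ++S.DependenceComputations; }

// Old ordering: dependence work first, cheap check second.
bool runBeforePatch(const std::vector<SketchLoop> &Nest, SketchStats &S) {
  computeDependenceSketch(S);
  return isComputableNestSketch(Nest);
}

// New ordering: the cheap check guards the expensive work.
bool runAfterPatch(const std::vector<SketchLoop> &Nest, SketchStats &S) {
  if (!isComputableNestSketch(Nest))
    return false;
  computeDependenceSketch(S);
  return true; // means "reached the transform stage", not "interchanged"
}
```

For a nest containing one loop whose backedge count is not computable, both orderings reject the nest, but only the old ordering increments DependenceComputations.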
I am not sure how to test this. One possible option is to add an optimization remark before |
✅ With the latest revision this PR passed the C/C++ code formatter.
I think it depends on what you are testing. Just to check that |
Yes, I think so too. I updated the tests for the new message.
}

ORE.emit([&]() {
  return OptimizationRemark(DEBUG_TYPE, "Dependence",
I don't think it's appropriate to use OptimizationRemark here. IIUC, it is used to communicate the result of the transformation. If you want to use optimization remarks, OptimizationRemarkAnalysis seems to be better.
Also, I think optimization remarks are what should show the "hints" to the user, so printing them as debug output with dbgs() seems to be appropriate in this case.
I am fine with using dbgs(), but it forces us to rely on an asserts build and thus limits the test coverage. We discussed this in my previous PRs, like here.
I agree on the choice between OptimizationRemark and OptimizationRemarkAnalysis. If we agree on remarks, I will change to OptimizationRemarkAnalysis. WDYT @sjoerdmeijer @CongzheUalberta?
I see, then I agree with using optimization remarks.
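The trade-off discussed in this thread (debug output only exists in asserts builds, remarks exist in every build) can be sketched without LLVM. SKETCH_DEBUG, Sink, and analyzeNest below are hypothetical names; the macro only imitates, under that assumption, how a debug-only macro is compiled out when NDEBUG is defined, and the real LLVM_DEBUG additionally depends on runtime debug flags.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a debug-only macro: the statement is compiled
// in only when NDEBUG is not defined (i.e. in an asserts build).
#ifndef NDEBUG
#define SKETCH_DEBUG(X)                                                        \
  do {                                                                         \
    X;                                                                         \
  } while (false)
#else
#define SKETCH_DEBUG(X)                                                        \
  do {                                                                         \
  } while (false)
#endif

// Collects what a test could observe from each channel.
struct Sink {
  std::vector<std::string> DebugLines; // only populated in asserts builds
  std::vector<std::string> Remarks;    // populated in every build
};

// Emits the same fact through both channels.
void analyzeNest(Sink &S) {
  SKETCH_DEBUG(S.DebugLines.push_back("Computed dependence info"));
  S.Remarks.push_back("Computed dependence info, invoking the transform.");
}
```

A test that matches on Remarks works in any build configuration, whereas a test that matches on DebugLines only has something to match when the code was built without NDEBUG.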
Could you please add a brief comment describing what this test checks?
Done
This is a work-in-progress patch to enable loop-interchange by default and is a continuation of the RFC: https://discourse.llvm.org/t/enabling-loop-interchange/82589

Basically, we promised to fix any compile-time and correctness issues in the different components involved here (loop-interchange and dependence analysis) before discussing enabling interchange by default. We think we are close to completing this; I would like to explain where we are and check whether there are any thoughts or concerns. A quick overview of the correctness and compile-time improvements that we have made:

Correctness:
- [LoopInterchange] Remove 'S' Scalar Dependencies (llvm#119345)
- [LoopInterchange] Fix overflow in cost calculation (llvm#111807)
- [LoopInterchange] Handle LE and GE correctly (PR llvm#124901) @kasuga-fj
- [DA] disambiguate evolution of base addresses (llvm#116628)

Compile-times:
- [LoopInterchange] Constrain number of load/stores in a loop (llvm#118973)
- [LoopInterchange] Bail out early if minimum loop nest is not met (llvm#115128)
- [LoopInterchange] Hoist isComputableLoopNest() in the control flow (llvm#124247)

In terms of remaining work, we think we are very close to fixing these dependence analysis issues:
- [DA] do not handle array accesses of different offsets (llvm#123436)
- [DA] Dependence analysis does not handle array accesses of different sizes (llvm#116630)
- [DA] use NSW arithmetic (llvm#116632)

The compile-time increase with a geomean increase of 0.19% looks good (after committing llvm#124247), I think:

stage1-O3:
Benchmark
kimwitu++          +0.10%
sqlite3            +0.14%
consumer-typeset   +0.07%
Bullet             +0.06%
tramp3d-v4         +0.21%
mafft              +0.39%
ClamAV             +0.06%
lencod             +0.61%
SPASS              +0.17%
7zip               +0.08%
geomean            +0.19%

See also: http://llvm-compile-time-tracker.com/compare.php?from=19a7fe03b4f58c4f73ea91d5e63bc4c6e61f987b&to=b24f1367d68ee675ea93ecda4939208c6b68ae4b&stat=instructions%3Au

We might want to look into lencod to see if we can improve more, but not sure it is strictly necessary.
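As a rough cross-check of the geomean figure quoted above, the per-benchmark percentages can be combined as a geometric mean of ratios. This is a minimal sketch: geomeanPercent and Stage1O3 are names made up here, the inputs are the rounded percentages from the message rather than raw instruction counts, and the tracker's own computation is assumed, not confirmed, to be equivalent.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark changes, each given in percent
// (e.g. 0.10 means +0.10%). Returns the mean change in percent.
double geomeanPercent(const std::vector<double> &PercentChanges) {
  if (PercentChanges.empty())
    return 0.0;
  double LogSum = 0.0;
  for (double P : PercentChanges)
    LogSum += std::log(1.0 + P / 100.0); // ratio new/old for one benchmark
  double MeanRatio =
      std::exp(LogSum / static_cast<double>(PercentChanges.size()));
  return (MeanRatio - 1.0) * 100.0;
}

// stage1-O3 figures from the message above, in the listed order
// (kimwitu++ through 7zip).
const std::vector<double> Stage1O3 = {0.10, 0.14, 0.07, 0.06, 0.21,
                                      0.39, 0.06, 0.61, 0.17, 0.08};
```

Calling geomeanPercent(Stage1O3) gives a value that rounds to 0.19, consistent with the geomean row. Because the changes are all well below 1%, the geometric mean is nearly identical to the arithmetic mean of the percentages.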
Addressed comments in the latest commit and moved to |
LGTM, thanks.
Sorry for being a bit late, this also looks generally good to me, but had 2 questions/nits.
define dso_local void @_foo(ptr noundef %a, ptr noundef %neg, ptr noundef %pos) {
entry:
  %a.addr = alloca ptr, align 8
Nitpick on the test case: could the IR be simplified? Will the IR be reduced if this is compiled with a higher optimisation level?
And another nitpick: maybe get rid of the "; preds = %for.inc16, %entry" comments.
return OptimizationRemarkAnalysis(DEBUG_TYPE, "Dependence",
                                  LN.getOutermostLoop().getStartLoc(),
                                  LN.getOutermostLoop().getHeader())
       << "Computed dependence info, invoking the transform.";
This is a user-facing optimisation remark, but it is quite "technical".
Correct me if I am wrong, but I think what we are trying to say here is equivalent to this:
LLVM_DEBUG(dbgs() << "Loops are legal to interchange\n");
in function processLoop on line 525.
My question is: can we turn this debug message into an OptimizationRemark and not need this new opt remark? I think this could serve two purposes: it is actually interesting information for users, and does it help you with knowing that "dep info has been computed, the transformation is legal"?
As discussed above, I think this message is just for testing that the process exits before computing dependence info if the loop is not computable. To achieve this, inserting a new message here seems reasonable to me.
However, in terms of messages to users, I think this is a bit excessive or noisy. Replacing it with dbgs() would be one option. It depends on what we want to prioritize: test coverage or having useful "technical" messages for users.
I was wondering if the state that we are trying to capture here (interchange is legal, but unknown if profitable) is caught by the "Loops are legal to interchange" debug message, so that we can change that into an optimisation remark and don't need this new one.
Alternatively, I don't mind changing this optimisation remark into a debug message. In general I don't like tests that rely on debug builds, but this is an exception that would be okay, I think.
But I don't have strong opinions on this to be honest, so WDYT @madhur13490?
Sorry for chipping in many times, but I don't think what we want to capture here is "interchange is legal", I think it is something like "This loop form is not supported, so we will not continue the following processes (including the legality check)". Let me know if I'm wrong.
Yes, I agree with @kasuga-fj. At this point in the code, we cannot guarantee that the loop nest is legal. For example, if the outermost loop does not have a unique exit, the transform can still terminate (line 489).
(I do plan to pull that check up too, but that is of lower priority.)
@sjoerdmeijer I added a simpler form of the test as you asked.
Thanks, LGTM too
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/15271 Here is the relevant piece of the build log for the reference
The profiling of the LLVM Test-suite reveals that a significant portion, specifically 14,090 out of 139,323, loop nests were identified as non-viable candidates for transformation, leading to the transform exiting from isComputableLoopNest() without any action.
More importantly, dependence information was computed for these loop nests before reaching the function isComputableLoopNest(), which does not require DI and relies solely on scalar evolution (SE).
To enhance compile-time efficiency, this patch moves the call to isComputableLoopNest() earlier in the control-flow, thereby avoiding unnecessary dependence calculations.
The impact of this change is evident on the compile-time-tracker, with the overall geometric mean improvement recorded at 0.11%, while the lencod benchmark gets a more substantial benefit of 0.44%.
This improvement can be tracked in the isc-ln-exp-2 branch under my repo.