[MemProf] Fix incorrect VP metadata update during ICP promotion #201658
Track unpromoted candidates explicitly when performing ICP during MemProf context disambiguation. Previously, the code assumed that the first N candidates were always the ones promoted, which led to incorrect metadata on the fallback indirect call if a candidate was skipped (e.g. due to missing definition or being illegal to promote).
@llvm/pr-subscribers-lto
Author: Teresa Johnson (teresajohnson)
Changes
Track unpromoted candidates explicitly when performing ICP during MemProf
Full diff: https://github.com/llvm/llvm-project/pull/201658.diff
2 Files Affected:
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index d02e67a995ec6..ac93576391aff 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -6208,8 +6208,8 @@ void MemProfContextDisambiguation::performICP(
auto *CB = Info.CB;
auto CallsiteIndex = Info.CallsiteInfoStartIndex;
auto TotalCount = Info.TotalCount;
- unsigned NumPromoted = 0;
unsigned NumClones = 0;
+ SmallVector<InstrProfValueData, 8> RemainingCandidates;
for (auto &Candidate : Info.CandidateProfileData) {
auto &StackNode = AllCallsites[CallsiteIndex++];
@@ -6238,6 +6238,7 @@ void MemProfContextDisambiguation::performICP(
// FIXME: See if we can use the new declaration importing support to
// at least get the declarations imported for this case. Hot indirect
// targets should have been imported normally, however.
+ RemainingCandidates.push_back(Candidate);
continue;
}
@@ -6251,6 +6252,7 @@ void MemProfContextDisambiguation::performICP(
<< " with count of " << ore::NV("TotalCount", TotalCount)
<< ": " << Reason;
});
+ RemainingCandidates.push_back(Candidate);
continue;
}
@@ -6304,10 +6306,9 @@ void MemProfContextDisambiguation::performICP(
// Update TotalCount (all clones should get same count above)
TotalCount -= Candidate.Count;
- NumPromoted++;
}
// Adjust the MD.prof metadata for all clones, now that we have the new
- // TotalCount and the number promoted.
+ // TotalCount and the remaining candidates.
CallBase *CBClone = CB;
for (unsigned J = 0; J < NumClones; J++) {
// If the VMap is empty, this clone was a duplicate of another and was
@@ -6322,9 +6323,8 @@ void MemProfContextDisambiguation::performICP(
// If all promoted, we don't need the MD.prof metadata.
// Otherwise we need update with the un-promoted records back.
if (TotalCount != 0)
- annotateValueSite(
- M, *CBClone, ArrayRef(Info.CandidateProfileData).slice(NumPromoted),
- TotalCount, IPVK_IndirectCallTarget, Info.NumCandidates);
+ annotateValueSite(M, *CBClone, RemainingCandidates, TotalCount,
+ IPVK_IndirectCallTarget, Info.NumCandidates);
}
}
}
diff --git a/llvm/test/ThinLTO/X86/memprof-icp-metadata.ll b/llvm/test/ThinLTO/X86/memprof-icp-metadata.ll
new file mode 100644
index 0000000000000..5eb2af0abddc6
--- /dev/null
+++ b/llvm/test/ThinLTO/X86/memprof-icp-metadata.ll
@@ -0,0 +1,96 @@
+;; Test that MemProf ICP correctly updates MD_prof metadata for the fallback
+;; call when some candidates are skipped during promotion.
+
+; REQUIRES: asserts
+
+; RUN: split-file %s %t
+; RUN: opt -thinlto-bc %t/main.ll -o %t/main.o
+; RUN: opt -thinlto-bc %t/foo.ll -o %t/foo.o
+
+;; Perform ThinLTO. We provide the definition for _ZN2B03barEj but not
+;;_ZN1B3barEj. With -memprof-require-definition-for-promotion, _ZN1B3barEj
+;; should be skipped and _ZN2B03barEj should be promoted.
+; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
+; RUN: -enable-memprof-indirect-call-support=true \
+; RUN: -supports-hot-cold-new \
+; RUN: -r=%t/foo.o,_Z3fooR2B0j,plx \
+; RUN: -r=%t/foo.o,_ZN2B03barEj, \
+; RUN: -r=%t/main.o,main,plx \
+; RUN: -r=%t/main.o,_Z3fooR2B0j, \
+; RUN: -r=%t/main.o,_ZN2B03barEj,plx \
+; RUN: -r=%t/main.o,_Znwm, \
+; RUN: -thinlto-threads=1 \
+; RUN: -pass-remarks=memprof-context-disambiguation \
+; RUN: -save-temps \
+; RUN: -memprof-require-definition-for-promotion \
+; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=REMARKS
+
+; REMARKS: promoted and assigned to call function clone _ZN2B03barEj
+
+; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+
+;; Check that the fallback call has the correct VP metadata for the skipped
+;; candidate (_ZN1B3barEj with MD5 4445083295448962937 and count 2).
+; IR: define {{.*}} @_Z3fooR2B0j
+; IR: %[[R1:[0-9]+]] = icmp eq ptr %0, @_ZN2B03barEj
+; IR: br i1 %[[R1]], label %[[LABEL:.*]], label %if.false.orig_indirect
+; IR: if.false.orig_indirect:
+; IR: tail call i32 %0(ptr null, i32 0), !prof ![[PROF:[0-9]+]]
+; IR: ![[PROF]] = !{!"VP", i32 0, i64 2, i64 4445083295448962937, i64 2}
+
+;--- foo.ll
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+declare i32 @_ZN2B03barEj(ptr, i32)
+
+define i32 @_Z3fooR2B0j(ptr %b) {
+entry:
+ %0 = load ptr, ptr %b, align 8
+ %call = tail call i32 %0(ptr null, i32 0), !prof !1, !callsite !2
+ ret i32 %call
+}
+
+;; VP metadata with two candidates:
+;; 1. MD5 4445083295448962937 (_ZN1B3barEj), count 2
+;; 2. MD5 -2718743882639408571 (_ZN2B03barEj), count 2
+!1 = !{!"VP", i32 0, i64 4, i64 4445083295448962937, i64 2, i64 -2718743882639408571, i64 2}
+!2 = !{i64 -2101080423462424381}
+
+;--- main.ll
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define i32 @main() {
+entry:
+ %call2 = call i32 @_Z3fooR2B0j(ptr null), !callsite !30
+ %call4 = call i32 @_Z3fooR2B0j(ptr null), !callsite !31
+ ret i32 0
+}
+
+declare i32 @_Z3fooR2B0j(ptr)
+
+define i32 @_ZN2B03barEj(ptr %this, i32 %s) {
+entry:
+ %call = tail call ptr @_Znwm(i64 noundef 4) #0, !memprof !33, !callsite !38
+ ret i32 %s
+}
+
+declare ptr @_Znwm(i64)
+
+attributes #0 = { builtin allocsize(0) }
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"ProfileSummary", !1}
+!1 = !{!2, !3}
+!2 = !{!"ProfileFormat", !"InstrProf"}
+!3 = !{!"TotalCount", i64 4}
+
+!30 = !{i64 -6490791336773930154}
+!31 = !{i64 5188446645037944434}
+!33 = !{!34, !36}
+!34 = !{!35, !"notcold"}
+!35 = !{i64 -852997907418798798, i64 -2101080423462424381, i64 -6490791336773930154}
+!36 = !{!37, !"cold"}
+!37 = !{i64 -852997907418798798, i64 -2101080423462424381, i64 5188446645037944434}
+!38 = !{i64 -852997907418798798}
@llvm/pr-subscribers-llvm-transforms
Author: Teresa Johnson (teresajohnson)
Changes
Track unpromoted candidates explicitly when performing ICP during MemProf
Full diff: https://github.com/llvm/llvm-project/pull/201658.diff
2 Files Affected:
david-xl
left a comment
This is in backend compilation. Why is there a need to update the fallback path's VP metadata?
Normal ICP happens later, so the VP metadata should be accurate; otherwise, I suppose, in theory we could do a promotion to the same target again.
🐧 Linux x64 Test Results
Failed Tests
Clang
Clang.Modules/rebuild.m
If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the
Looks good. With this promotion ordering, does it mean it is possible that a cold target can be promoted first?
During MemProf ICP we promote only those targets that need to invoke a clone, in order from hottest to least hot. But then, yes, presumably regular ICP may come along later and promote a hotter target, which would then fall later in the comparison sequence. We could consider refining that.