[BOLT] Add pre-parsed perf script support#163785
@llvm/pr-subscribers-bolt
Author: Ádám Kallai (kaadam)
Changes
Extend perf2bolt with a new option that reads perf-script output in textual format, as produced by the Linux perf 'script' command. This option makes it possible to add a large Spe test to the 'bolt-tests' repository to cover Arm Spe aggregation.
Why does the test need a textual-format Spe profile?
To bypass these technical difficulties, it is easier to provide a pre-generated textual profile.
How should this type of profile be generated?
Full diff: https://github.com/llvm/llvm-project/pull/163785.diff 2 Files Affected:
diff --git a/bolt/include/bolt/Profile/DataAggregator.h b/bolt/include/bolt/Profile/DataAggregator.h
index cb1b87f8d0d65..de88a8bb8ad1e 100644
--- a/bolt/include/bolt/Profile/DataAggregator.h
+++ b/bolt/include/bolt/Profile/DataAggregator.h
@@ -440,6 +440,32 @@ class DataAggregator : public DataReader {
/// B 4b196f 4b19e0 2 0
void parsePreAggregated();
+ /// Detect whether the parsed line is an mmap event or not.
+ bool isMMapEvent(StringRef Line);
+
+ /// Coordinate reading and parsing a hybrid perf-script trace created by
+ /// the following Linux perf script command:
+ /// 'perf script --show-mmap-events -F pid,brstack --itrace=bl -i perf.data'
+ ///
+ /// Note:
+ /// The original perf.data should be profiled with '-b' or 'Arm Spe'.
+ ///
+ /// What the output of this command looks like:
+ /// {<name> .* <sec>.<usec>: }PERF_RECORD_MMAP2 <pid>/<tid>: .* <file_name>
+ /// {<name> .* <sec>.<usec>: }PERF_RECORD_MMAP2 <pid>/<tid>: .* <file_name>
+ /// PID {FROM/TO/P/-/-/1/COND/-}+
+ /// PID {FROM/TO/P/-/-/1/COND/-}+
+ ///
+ /// A hybrid profile contains mmap events along with branch events.
+ /// An mmap event might appear among the branch events, so BOLT reads
+ /// the whole profile, picks out the mmap events, and treats every other
+ /// line as a branch event.
+ /// It then sets up ParsingBuf for each class of events and calls the
+ /// matching parser: parseMMapEvents() or parseBranchEvents().
+ ///
+ /// This option is only for testing purposes.
+ void parsePerfScriptEvents();
+
/// Parse the full output of pre-aggregated LBR samples generated by
/// an external tool.
std::error_code parsePreAggregatedLBRSamples();
diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp
index c13fa6dbe582b..8a2119480d49b 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -115,6 +115,12 @@ cl::opt<std::string>
"perf-script output in a textual format"),
cl::ReallyHidden, cl::init(""), cl::cat(AggregatorCategory));
+cl::opt<bool>
+ ReadPerfScript("perfscript",
+ cl::desc("skip perf and read perf-script trace created by "
+ "Linux perf tool with script command"),
+ cl::ReallyHidden, cl::cat(AggregatorCategory));
+
static cl::opt<bool>
TimeAggregator("time-aggr",
cl::desc("time BOLT aggregator"),
@@ -184,7 +190,8 @@ void DataAggregator::start() {
// Don't launch perf for pre-aggregated files or when perf input is specified
// by the user.
- if (opts::ReadPreAggregated || !opts::ReadPerfEvents.empty())
+ if (opts::ReadPreAggregated || opts::ReadPerfScript ||
+ !opts::ReadPerfEvents.empty())
return;
findPerfExecutable();
@@ -226,7 +233,7 @@ void DataAggregator::start() {
}
void DataAggregator::abort() {
- if (opts::ReadPreAggregated)
+ if (opts::ReadPreAggregated || opts::ReadPerfScript)
return;
std::string Error;
@@ -326,7 +333,7 @@ void DataAggregator::processFileBuildID(StringRef FileBuildID) {
}
bool DataAggregator::checkPerfDataMagic(StringRef FileName) {
- if (opts::ReadPreAggregated)
+ if (opts::ReadPreAggregated || opts::ReadPerfScript)
return true;
Expected<sys::fs::file_t> FD = sys::fs::openNativeFileForRead(FileName);
@@ -372,6 +379,80 @@ void DataAggregator::parsePreAggregated() {
}
}
+bool DataAggregator::isMMapEvent(StringRef Line) {
+ // Shortcut: skip the string search for lines too short to be mmap events.
+ if (Line.empty() || Line.size() < 50)
+ return false;
+
+ // Check that PERF_RECORD_MMAP2 or PERF_RECORD_MMAP appear in the line.
+ return Line.contains("PERF_RECORD_MMAP");
+}
+
+void DataAggregator::parsePerfScriptEvents() {
+ outs() << "PERF2BOLT: parsing hybrid perf-script events...\n";
+ NamedRegionTimer T("parsePerfScriptEvents", "Parsing perf-script events",
+ TimerGroupName, TimerGroupDesc, opts::TimeAggregator);
+
+ ErrorOr<std::unique_ptr<MemoryBuffer>> MB =
+ MemoryBuffer::getFileOrSTDIN(Filename);
+ if (std::error_code EC = MB.getError()) {
+ errs() << "PERF2BOLT-ERROR: cannot open " << Filename << ": "
+ << EC.message() << "\n";
+ exit(1);
+ }
+
+ FileBuf = std::move(*MB);
+ ParsingBuf = FileBuf->getBuffer();
+ Col = 0;
+ Line = 1;
+ std::string MMapEvents = "";
+ std::string BranchEvents = "";
+
+ if (!hasData())
+ return;
+
+ while (hasData()) {
+
+ size_t LineEnd = ParsingBuf.find_first_of("\n");
+ if (LineEnd == StringRef::npos) {
+ reportError("expected rest of line");
+ errs() << "Found: " << ParsingBuf << "\n";
+ }
+ StringRef Event = ParsingBuf.substr(0, LineEnd);
+
+ if (isMMapEvent(Event)) {
+ MMapEvents += Event.str();
+ MMapEvents += "\n";
+ } else {
+ BranchEvents += Event.str();
+ BranchEvents += '\n';
+ }
+
+ ParsingBuf = ParsingBuf.drop_front(LineEnd + 1);
+ Col = 0;
+ Line += 1;
+ }
+
+ // Set ParsingBuf for MMapEvents
+ ParsingBuf = StringRef(MMapEvents);
+ Col = 0;
+ Line = 1;
+ if (!ParsingBuf.empty() && parseMMapEvents()) {
+ errs() << "PERF2BOLT: failed to parse mmap events from the perf-script "
+ "file.\n";
+ exit(1);
+ }
+
+ // Set ParsingBuf for BranchEvents
+ ParsingBuf = StringRef(BranchEvents);
+ Col = 0;
+ Line = 1;
+ if (!ParsingBuf.empty() && parseBranchEvents()) {
+ errs() << "PERF2BOLT: failed to parse samples from perf-script file.\n";
+ exit(1);
+ }
+}
+
void DataAggregator::filterBinaryMMapInfo() {
if (opts::FilterPID) {
auto MMapInfoIter = BinaryMMapInfo.find(opts::FilterPID);
@@ -606,6 +687,8 @@ Error DataAggregator::preprocessProfile(BinaryContext &BC) {
if (opts::ReadPreAggregated) {
parsePreAggregated();
+ } else if (opts::ReadPerfScript) {
+ parsePerfScriptEvents();
} else {
parsePerfData(BC);
}
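The classification step in parsePerfScriptEvents() can be sketched outside of LLVM as follows. This is a minimal illustration, not the code from the patch: it uses std::string_view instead of StringRef, the names isMMapLine and splitHybridTrace are hypothetical, and the 50-character shortcut from isMMapEvent() is left out. It also treats a final line without a trailing newline as a complete line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for DataAggregator::isMMapEvent(): matches both
// PERF_RECORD_MMAP and PERF_RECORD_MMAP2 lines.
inline bool isMMapLine(std::string_view Line) {
  return Line.find("PERF_RECORD_MMAP") != std::string_view::npos;
}

// Splits a hybrid trace into {mmap lines, all other lines}. Every output line
// is newline-terminated, including a final input line that lacks a newline.
inline std::pair<std::string, std::string>
splitHybridTrace(std::string_view Buf) {
  std::string MMapLines, BranchLines;
  while (!Buf.empty()) {
    std::size_t LineEnd = Buf.find('\n');
    // With no newline left, substr() returns the rest of the buffer.
    std::string_view Line = Buf.substr(0, LineEnd);
    std::string &Out = isMMapLine(Line) ? MMapLines : BranchLines;
    Out.append(Line);
    Out.push_back('\n');
    if (LineEnd == std::string_view::npos)
      break; // The last line had no trailing newline; nothing left to read.
    assert(LineEnd < Buf.size() && "find() must return a valid index");
    Buf.remove_prefix(LineEnd + 1);
  }
  return {std::move(MMapLines), std::move(BranchLines)};
}
```

Each of the two resulting strings could then be handed to its own parser, which mirrors how the patch resets ParsingBuf before calling parseMMapEvents() and parseBranchEvents().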
|
aaupov
left a comment
Thank you for working on this; I think it's useful beyond just testing. Please see a couple of comments inline.
Additionally, we have opts::ReadPerfEvents with a similar purpose and similarly used in testing, can we unify the two options?
@aaupov that's the plan, to eventually replace my half-baked |
aaupov
left a comment
Some more thoughts on this functionality (not blocking but please consider as follow-up items):
- A note on constructing MMapEvents and BranchEvents strings - see inline comment.
- In the general case we also need task events and buildid-list output. We should include that into perf script input as well.
- Extra profile support: basic samples and mem profile.
- If all of the above is implemented, the functionality should be unified with parsePerfData, as the input would be the same (except coming from a pre-parsed file).
|
Please also retitle as e.g. "[BOLT] Add pre-parsed perf script support" |
Hi Adam,
Thanks for your work! A couple of high-level thoughts after a quick pass:
(1) For the new flag, my thinking was to provide a textual replacement for perf.data:
whatever perf2bolt would invoke, we run it beforehand ourselves (i.e. perf script ..) and pass the result to the tool.
Are there actual limitations to using --perfscript with basic samples?
Why not call parseBranchEvents or parseBasicEvents, depending on BasicAggregation?
parseBranchEvents would only process the brstack events (ie PerfBranchSample) anyway, right?
(2) The flag is currently a toggle: if set, we treat -p/-perfdata as textual format, right?
We could fully align with llvm-profgen if we want and use -ps/-perfscript as a filename instead. Ofc we'll need to adjust the logic, likely in places like this. (not a strong opinion)
ReadPerfScript("perfscript",
               cl::desc("skip perf and read perf-script trace created by "
                        "Linux perf tool with script command"),
               cl::ReallyHidden, cl::cat(AggregatorCategory));
There was a problem hiding this comment.
- cl::ReallyHidden, cl::cat(AggregatorCategory));
+ cl::Hidden, cl::cat(AggregatorCategory));
I believe we could make the new flag just Hidden instead, so it can appear with the below?
perf2bolt --help-list-hidden
The old flag (ReadPerfEvents) was never really used from the CLI, only in a single unit test (PerfSpeEvents), since it was never fully implemented.
We could replace it and drop the old flag, given that we hardcode the parsing buffer to the values used by the relevant unit test? We could do a follow-up patch on this.
|
@paschalis-mpeis, thanks for your comment.
There shouldn't be any limitation, it should depend on the input. However the current implementation covers only BranchEvent aggregation.
Yes, it only processes branch events; in all other cases it throws an error.
Yes, we can use |
|
Hi @aaupov, @paschalis-mpeis: perf2bolt would be the file generator. For example, the command below generates a perf.text hybrid profile for 'BasicAggregation', or, with '-spe', for 'BranchAggregation'.
In the test phase we can use this pre-parsed perf profile.
Does that sound reasonable to you? |
|
@kaadam, yes, the plan sounds good to me. |
|
I discussed this offline with Adam last week, and I liked his idea provided Amir approves as well. |
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed. |
|
FYI: This PR depends on #171144 |
cl::desc("skip perf event collection by supplying a "
         "perf-script output in a textual format"),
cl::ReallyHidden, cl::init(""), cl::cat(AggregatorCategory));
cl::opt<bool> ReadPerfTextProfile(
There was a problem hiding this comment.
I guess a more user-friendly way might be to auto-detect the format, and set ReadPerfTextProfile true, since we already have PerfTextMagicStr. Alternatively, could we adopt a behavior similar to llvm-profgen, where we pass the filename directly to this option without needing the -p flag at all?
There was a problem hiding this comment.
Yes, I agree. The main reason for adding the magic string to the generated file was indeed to enable better format auto-detection in the future. However, for this initial patch, I wanted to stick to the existing logic and focus solely on adding the "new" aggregation type in isolation, similar to 'pre-Aggregated' format.
Currently, the -p option is mandatory for 'perf2bolt', and the aggregation type must be explicitly defined if we alter the default aggregation. There is probably room for improvement here, but changing this core behavior was out of scope for this task.
Thanks for the heads-up! That's a great thought for a follow-up patch to improve the user experience.
There was a problem hiding this comment.
I agree with the request above: we can auto-sense perf-script format. Since perf script will be passed as -p arg, this will be handled by DataAggregator. -pa is used to drive pre-aggregated parsing. We can read the header to decide if it's pre-aggregated or perf-script format. You can add -ps (perf-script) as alias to -pa.
Does this sound reasonable to you?
There was a problem hiding this comment.
@aaupov @Jinjie-Huang I've added the -ps option as an alias for -pa and merged the names of the 'cl::opts'. I also introduced a magic detection method to automatically determine which aggregation format to run. Could you please take another look?
TimerGroupName, TimerGroupDesc, opts::TimeAggregator);
if (!Filename.empty()) {
  ErrorOr<std::unique_ptr<MemoryBuffer>> MB =
      MemoryBuffer::getFileOrSTDIN(Filename);
There was a problem hiding this comment.
Maybe we shouldn't map the full file here just for the header parsing. Could we also use getFileSlice to fetch just the first 133 bytes for the header? In some cases I've encountered, perf script files grow to several tens of GBs, so mapping the whole file could be quite heavy.
There was a problem hiding this comment.
Yes, that's a good point. Updating that.
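The idea of reading only the header can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the code from the PR: the reviewer's suggestion is to use MemoryBuffer::getFileSlice, whereas this sketch uses std::ifstream, and the helper name startsWithMagic is hypothetical. The actual value of PerfTextMagicStr is not shown in this thread, so the caller supplies whatever magic string applies.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <string_view>

// Returns true if the file at Path starts with Magic. Only Magic.size() bytes
// are read, so the cost does not grow with the size of the profile.
inline bool startsWithMagic(const std::string &Path, std::string_view Magic) {
  assert(!Magic.empty() && "magic string must not be empty");
  std::ifstream In(Path, std::ios::binary);
  if (!In)
    return false; // The file is missing or unreadable.
  std::string Header(Magic.size(), '\0');
  In.read(Header.data(), static_cast<std::streamsize>(Header.size()));
  // A short read means the file is smaller than the magic string.
  if (static_cast<std::size_t>(In.gcount()) != Magic.size())
    return false;
  return std::string_view(Header) == Magic;
}
```

Because the read is bounded by the length of the magic string, a perf script file of several tens of GBs costs the same to probe as a small one.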
|
@kaadam Thanks for the contribution! Overall, LGTM. A test case covering the full functionality would be great here. |
|
@Jinjie-Huang Thanks for your review. Yes, a test case is definitely necessary. I had an offline discussion with @paschalis-mpeis, and I'm adding a unit test for this functionality at this stage. The main purpose of this feature is to support end-to-end testing for large binaries, and the plan is to further improve test coverage with 'perf-tests' later on. |
aaupov
left a comment
I have one concern about the use of "event". This PR makes the term ambiguous: currently it means HW sampling event (perf record -e <event>), but here it's used in a sense of perf script invocations: "branch events" is fine, but "buildid events" doesn't make sense to me. Part or group is perhaps a better term. WDYT?
cl::desc("skip perf event collection by supplying a "
         "perf-script output in a textual format"),
cl::ReallyHidden, cl::init(""), cl::cat(AggregatorCategory));
cl::opt<bool> ReadPerfTextProfile(
There was a problem hiding this comment.
same here:
- cl::opt<bool> ReadPerfTextProfile(
+ cl::opt<bool> ReadPerfScript(
cl::desc("skip perf event collection by reading a "
         "pre-parsed perf-script output in a textual format"),
There was a problem hiding this comment.
- cl::desc("skip perf event collection by reading a "
-          "pre-parsed perf-script output in a textual format"),
+ cl::desc("read pre-parsed perf script output"),
Thanks for the review. Yes, I understand your concern about these terms. I will revisit all of them. I originally used 'event' as a generic term for naming variables as well, but you are right: a buildid is definitely not a hardware event. |
This PR implements the functionality to read and parse a pre-parsed
perf-script profile generated by perf2bolt's
'--generate-perf-text-data' option.
It helps to add support for large Arm Spe end-to-end tests.
Why does the test need a textual-format Spe profile?
- Collecting an Arm Spe profile with Linux Perf requires
an Arm development device with Spe support.
- Decoding Spe data also requires a suitable version of
Linux Perf.
The minimum required version of Linux Perf is v6.15.
To bypass these technical difficulties, it is easier to provide
a pre-generated textual profile.
How should this type of profile be generated?
1) You can use perf2bolt itself to generate a pre-parsed perf-script profile
in textual format.
$ perf2bolt BINARY -p perf.data -o test.text --spe --generate-perf-script
2) perf2bolt can then consume this type of profile:
$ perf2bolt BINARY -o test.fdata -p test.text --spe -perf-script
|
@aaupov Thanks for the review. |
aaupov
left a comment
Thank you, this looks very good. I've left a couple of comments, please take a look. I also realize that #200476 might cause some churn for this PR as well – I'm OK with landing it after this, WDYT? Would also appreciate you reviewing it.
Thanks for your review. I'm updating the PR based on your suggestions. That would be nice, if we could land this soon. :) I will also take a look at #200476. |
if (opts::ReadPreAggregated &&
    checkInputFileMagic(Filename, PerfTextMagicStr)) {
  if (Error Err = parsePerfScript()) {
    errs() << "PERF2BOLT-ERROR: failed to parse perfscript profile"
           << llvm::toString(std::move(Err)) << "\n";
    exit(1);
  }
} else if (opts::ReadPreAggregated) {
  parsePreAggregated();
There was a problem hiding this comment.
Move checkInputFileMagic into if (opts::ReadPreAggregated)?
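One way to read this suggestion is sketched below as a standalone dispatch function. It is an assumption-laden illustration, not the final code: the enum ProfileParser and the function selectParser are hypothetical names, the magic check is passed in as a callback so that it only runs when the pre-aggregated option is set, and the fallback to the perf.data path is assumed from the earlier revision of the diff.

```cpp
#include <cassert>
#include <functional>

// Hypothetical labels for the three parsing paths discussed in this review.
enum class ProfileParser { PerfData, PerfScript, PreAggregated };

// Picks a parser. The magic check is nested under ReadPreAggregated, so the
// option is tested once and the file is only probed when it is set.
inline ProfileParser
selectParser(bool ReadPreAggregated,
             const std::function<bool()> &HasPerfScriptMagic) {
  assert(HasPerfScriptMagic && "a magic-check callback is required");
  if (ReadPreAggregated) {
    if (HasPerfScriptMagic())
      return ProfileParser::PerfScript;
    return ProfileParser::PreAggregated;
  }
  // Assumed default path: launch perf and parse perf.data.
  return ProfileParser::PerfData;
}
```

Nesting the check avoids repeating the opts::ReadPreAggregated test in both branches and keeps the file probe off the default path.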
|
@aaupov Thanks for the approval! Please let me know if it's good to be merged. Unless there are any final objections, I will go ahead and merge it. |
paschalis-mpeis
left a comment
No further suggestions.
Feel free to proceed with merging. Great work Adam!
This PR implements the functionality to read and parse a pre-parsed
perf-script profile generated by perf2bolt's
'--profile-format=perfscript' option.
It helps to add support for large Arm Spe end-to-end tests.
Why does the test need a textual-format Spe profile?
- Collecting an Arm Spe profile with Linux Perf requires
an Arm development device with Spe support.
- Decoding Spe data also requires a suitable version of
Linux Perf.
The minimum required version of Linux Perf is v6.15.
To bypass these technical difficulties, it is easier to provide
a pre-generated perfscript profile.
How should this type of profile be generated?
1) You can use perf2bolt itself to generate a pre-parsed perf-script profile
in textual format.
$ perf2bolt BINARY -p perf.data -o test.text --spe --profile-format=perfscript
2) perf2bolt can then consume this type of profile:
$ perf2bolt BINARY -o test.fdata -p test.text --spe -ps