-
Notifications
You must be signed in to change notification settings - Fork 15k
[DWARFVerifier] Verify that DW_AT_LLVM_stmt_sequence is set correctly #152807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Peter Rong <[email protected]>
Signed-off-by: Peter Rong <[email protected]>
Signed-off-by: Peter Rong <[email protected]>
@llvm/pr-subscribers-debuginfo Author: Peter Rong (DataCorrupted) ChangesPatch is 60.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152807.diff 2 Files Affected:
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
index 8ec3f1729b974..7d6a643560643 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
@@ -851,6 +851,52 @@ unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
}
break;
}
+ case DW_AT_LLVM_stmt_sequence: {
+ // Make sure the offset in the DW_AT_LLVM_stmt_sequence attribute is valid
+ // and points to a valid sequence start in the line table.
+ auto SectionOffset = AttrValue.Value.getAsSectionOffset();
+ if (!SectionOffset) {
+ ReportError("Invalid DW_AT_LLVM_stmt_sequence encoding",
+ "DIE has invalid DW_AT_LLVM_stmt_sequence encoding:");
+ break;
+ }
+ if (*SectionOffset >= U->getLineSection().Data.size()) {
+ ReportError(
+ "DW_AT_LLVM_stmt_sequence offset out of bounds",
+ "DW_AT_LLVM_stmt_sequence offset is beyond .debug_line bounds: " +
+ llvm::formatv("{0:x8}", *SectionOffset));
+ break;
+ }
+
+ // Check if the offset points to a valid sequence start
+ const auto *LineTable = DCtx.getLineTableForUnit(U);
+ if (!LineTable) {
+ ReportError("DW_AT_LLVM_stmt_sequence without line table",
+ "DIE has DW_AT_LLVM_stmt_sequence but compile unit has no "
+ "line table");
+ break;
+ }
+ bool ValidSequenceOffset = false;
+ // Check if the offset matches any of the sequence start offsets using
+ // binary search
+ auto it = std::lower_bound(LineTable->Sequences.begin(),
+ LineTable->Sequences.end(), *SectionOffset,
+ [](const auto &Sequence, const uint64_t Offset) {
+ return Sequence.StmtSeqOffset < Offset;
+ });
+ if (it != LineTable->Sequences.end() &&
+ it->StmtSeqOffset == *SectionOffset) {
+ ValidSequenceOffset = true;
+ }
+
+ if (!ValidSequenceOffset)
+ ReportError(
+ "Invalid DW_AT_LLVM_stmt_sequence offset",
+ "DW_AT_LLVM_stmt_sequence offset " +
+ llvm::formatv("{0:x8}", *SectionOffset) +
+ " does not point to a valid sequence start in the line table");
+ break;
+ }
default:
break;
}
diff --git a/llvm/test/tools/llvm-dwarfdump/verify_stmt_seq.yaml b/llvm/test/tools/llvm-dwarfdump/verify_stmt_seq.yaml
new file mode 100644
index 0000000000000..1873eea9d49f3
--- /dev/null
+++ b/llvm/test/tools/llvm-dwarfdump/verify_stmt_seq.yaml
@@ -0,0 +1,1617 @@
+# Object file copied from llvm/test/tools/dsymutil/ARM/stmt-seq-macho.test
+# Then I manually tempered with some of the value of the attribute
+# I hope there are easier ways to construct tests like this.
+
+# RUN: yaml2obj %s -o verify_stmt_seq.o
+# RUN: not llvm-dwarfdump -verify -debug-info verify_stmt_seq.o | FileCheck %s --check-prefix=CHECK_INVALID
+
+# Line 1326 0XAB
+# CHECK_INVALID: error: DW_AT_LLVM_stmt_sequence offset 0x000000ab does not point to a valid sequence start in the line table
+# Line 1372 0xEEEEE7
+# CHECK_INVALID: error: DW_AT_LLVM_stmt_sequence offset is beyond .debug_line bounds: 0x00eeeee7
+
+# CHECK_INVALID: error: Aggregated error counts:
+# CHECK_INVALID: error: DW_AT_LLVM_stmt_sequence offset out of bounds occurred 1 time(s).
+# CHECK_INVALID: error: Invalid DW_AT_LLVM_stmt_sequence offset occurred 1 time(s).
+
+# CHECK_INVALID-NOT: error:
+--- !mach-o
+IsLittleEndian: true
+FileHeader:
+ magic: 0xFEEDFACF
+ cputype: 0x100000C
+ cpusubtype: 0x0
+ filetype: 0x1
+ ncmds: 5
+ sizeofcmds: 1176
+ flags: 0x2000
+ reserved: 0x0
+LoadCommands:
+ - cmd: LC_SEGMENT_64
+ cmdsize: 1032
+ segname: ''
+ vmaddr: 0
+ vmsize: 3125
+ fileoff: 1208
+ filesize: 3125
+ maxprot: 7
+ initprot: 7
+ nsects: 12
+ flags: 0
+ Sections:
+ - sectname: __text
+ segname: __TEXT
+ addr: 0x0
+ size: 148
+ offset: 0x4B8
+ align: 2
+ reloff: 0x10F0
+ nreloc: 8
+ flags: 0x80000400
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 00040011C0035FD600100011C0035FD600580051C0035FD600100011C0035FD600580051C0035FD6FFC300D1F44F01A9FD7B02A9FD8300916000805200000094F30300AA20058052000000941400130B6001805200000094F30300AA40058052000000947302000B0100009021000091E03F0091000000948002130BFD7B42A9F44F41A9FFC30091C0035FD600000014C0035FD6
+ relocations:
+ - address: 0x8C
+ symbolnum: 4
+ pcrel: true
+ length: 2
+ extern: true
+ type: 2
+ scattered: false
+ value: 0
+ - address: 0x74
+ symbolnum: 3
+ pcrel: true
+ length: 2
+ extern: true
+ type: 2
+ scattered: false
+ value: 0
+ - address: 0x6C
+ symbolnum: 1
+ pcrel: false
+ length: 2
+ extern: true
+ type: 4
+ scattered: false
+ value: 0
+ - address: 0x68
+ symbolnum: 1
+ pcrel: true
+ length: 2
+ extern: true
+ type: 3
+ scattered: false
+ value: 0
+ - address: 0x60
+ symbolnum: 5
+ pcrel: true
+ length: 2
+ extern: true
+ type: 2
+ scattered: false
+ value: 0
+ - address: 0x54
+ symbolnum: 6
+ pcrel: true
+ length: 2
+ extern: true
+ type: 2
+ scattered: false
+ value: 0
+ - address: 0x48
+ symbolnum: 9
+ pcrel: true
+ length: 2
+ extern: true
+ type: 2
+ scattered: false
+ value: 0
+ - address: 0x3C
+ symbolnum: 7
+ pcrel: true
+ length: 2
+ extern: true
+ type: 2
+ scattered: false
+ value: 0
+ - sectname: __cstring
+ segname: __TEXT
+ addr: 0x94
+ size: 5
+ offset: 0x54C
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x2
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: '7465737400'
+ - sectname: __debug_loc
+ segname: __DWARF
+ addr: 0x99
+ size: 412
+ offset: 0x551
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x2000000
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 08000000000000000C000000000000000100500C0000000000000010000000000000000400A301509F0000000000000000000000000000000008000000000000000C00000000000000030070039F0000000000000000000000000000000010000000000000001400000000000000010050140000000000000018000000000000000400A301509F0000000000000000000000000000000018000000000000001C000000000000000100501C0000000000000020000000000000000400A301509F0000000000000000000000000000000018000000000000001C00000000000000030070039F0000000000000000000000000000000020000000000000002400000000000000010050240000000000000028000000000000000400A301509F00000000000000000000000000000000240000000000000028000000000000000100500000000000000000000000000000000038000000000000004400000000000000030011009F4400000000000000500000000000000001006350000000000000005C0000000000000001006400000000000000000000000000000000
+ - sectname: __debug_abbrev
+ segname: __DWARF
+ addr: 0x235
+ size: 372
+ offset: 0x6ED
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x2000000
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ - sectname: __debug_info
+ segname: __DWARF
+ addr: 0x3A9
+ size: 747
+ offset: 0x861
+ align: 0
+ reloff: 0x1130
+ nreloc: 16
+ flags: 0x2000000
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ relocations:
+ - address: 0x2A7
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x28E
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x253
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x1F5
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x1E1
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x1CE
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x1BA
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x1A7
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x169
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x12D
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0xF1
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0xC4
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x88
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x5F
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x37
+ symbolnum: 2
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x22
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - sectname: __debug_str
+ segname: __DWARF
+ addr: 0x694
+ size: 400
+ offset: 0xB4C
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x2000000
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ - sectname: __apple_names
+ segname: __DWARF
+ addr: 0x824
+ size: 288
+ offset: 0xCDC
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x2000000
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 485341480100000009000000090000000C00000000000000010000000100060000000000FFFFFFFFFFFFFFFF0100000003000000040000000600000007000000080000004A08311CC78E3C8288CB36CF89CB36CFD1125E53522B705390D9F86F6A7F9A7C4908311C8C0000009C000000AC000000BC000000CC000000DC000000EC00000000010000100100000601000001000000F000000000000000D6000000010000005E00000000000000F600000001000000C30000000000000016010000010000002C01000000000000440100000100000052020000000000005C01000001000000A6020000000000002B0100000200000052020000A60200000000000026010000010000006801000000000000E6000000010000008700000000000000
+ - sectname: __apple_objc
+ segname: __DWARF
+ addr: 0x944
+ size: 36
+ offset: 0xDFC
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x2000000
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 485341480100000001000000000000000C000000000000000100000001000600FFFFFFFF
+ - sectname: __apple_namespac
+ segname: __DWARF
+ addr: 0x968
+ size: 36
+ offset: 0xE20
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x2000000
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 485341480100000001000000000000000C000000000000000100000001000600FFFFFFFF
+ - sectname: __apple_types
+ segname: __DWARF
+ addr: 0x98C
+ size: 195
+ offset: 0xE44
+ align: 0
+ reloff: 0x0
+ nreloc: 0
+ flags: 0x2000000
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 48534148010000000500000005000000140000000000000003000000010006000300050004000B000000000002000000FFFFFFFF03000000040000007CA8F05D90D9F86F5B738CDC3080880B6320957C64000000770000008A0000009D000000B0000000380100000100000027020000130000000000002B010000010000000502000013000000000000C20000000100000057000000240000000000007401000001000000DE02000024000000000000BD000000010000005000000024000000000000
+ - sectname: __debug_frame
+ segname: __DWARF
+ addr: 0xA50
+ size: 232
+ offset: 0xF08
+ align: 3
+ reloff: 0x11B0
+ nreloc: 8
+ flags: 0x2000000
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ content: 14000000FFFFFFFF0400080001781E0C1F00000000000000140000000000000000000000000000000800000000000000140000000000000008000000000000000800000000000000140000000000000010000000000000000800000000000000140000000000000018000000000000000800000000000000140000000000000020000000000000000800000000000000240000000000000028000000000000006400000000000000500C1D109E019D02930394040000000014000000000000008C000000000000000400000000000000140000000000000090000000000000000400000000000000
+ relocations:
+ - address: 0xD8
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0xC0
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x98
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x80
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x68
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x50
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x38
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - address: 0x20
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: false
+ type: 0
+ scattered: false
+ value: 0
+ - sectname: __debug_line
+ segname: __DWARF
+ addr: 0xB38
+ size: 253
+ offset: 0xFF0
+ align: 0
+ reloff: 0x11F0
+ nreloc: 8
+ flags: 0x2000000
+ reserved1: 0x0
+ reserved2: 0x0
+ reserved3: 0x0
+ relocations:
+ - address: 0xED
+ symbolnum: 1
+ pcrel: false
+ length: 3
+ extern: fa...
[truncated]
|
auto SectionOffset = AttrValue.Value.getAsSectionOffset(); | ||
if (!SectionOffset) { | ||
ReportError("Invalid DW_AT_LLVM_stmt_sequence encoding", | ||
"DIE has invalid DW_AT_LLVM_stmt_sequence encoding:"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the colon from the end of the string since there is no value to show?
if (*SectionOffset >= U->getLineSection().Data.size()) { | ||
ReportError( | ||
"DW_AT_LLVM_stmt_sequence offset out of bounds", | ||
"DW_AT_LLVM_stmt_sequence offset is beyond .debug_line bounds: " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably check if the DW_AT_LLVM_stmt_sequence is inside the current line table only? The .debug_line section conttains multiple line tables, each one has a prologue and then N sequences. We want to make sure the *SectionOffset
is after the prologue and before the end of all sequences. Each line table prologue contains:
dwarfdump --debug-line a.out.dSYM -v
a.out.dSYM/Contents/Resources/DWARF/a.out: file format Mach-O arm64
.debug_line contents:
debug_line[0x00000000]
Line table prologue:
total_length: 0x00000055
format: DWARF32
version: 5
address_size: 8
seg_select_size: 0
prologue_length: 0x00000037
The total_length
tells us where the this line table's data ends. And the prologue_length
tells us where the prologue ends. So we want to make sure that the *SectionOffset
is between the end of the prologue and the and of the current line table, not the entire .debug_line section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test for out of range (of the intended line table) test
"Invalid DW_AT_LLVM_stmt_sequence offset", | ||
"DW_AT_LLVM_stmt_sequence offset " + | ||
llvm::formatv("{0:x8}", *SectionOffset) + | ||
" does not point to a valid sequence start in the line table"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe change "valid sequence start" to "valid sequence offset"?
// Make sure the offset in the DW_AT_LLVM_stmt_sequence attribute is valid | ||
// and points to a valid sequence start in the line table. | ||
auto SectionOffset = AttrValue.Value.getAsSectionOffset(); | ||
if (!SectionOffset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a test for this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test for " invalid DW_AT_LLVM_stmt_sequence encoding"
# Then manually tempered with some of the value of the attribute | ||
# I hope there are easier ways to construct tests like this. | ||
|
||
# RUN: yaml2obj %p/Inputs/verify_stmt_seq.yaml -o verify_stmt_seq.o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the yaml input is only used in one test, we should use split-file
to inline it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That input is thousands lines long, I'd prefer to keep them separate, or loading it could take unnecessarily long (especially on web pages)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The yaml file is shorter than llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
above
# CHECK_INVALID-NEXT: error: Invalid DW_AT_LLVM_stmt_sequence encoding occurred 1 time(s). | ||
# CHECK_INVALID-NEXT: error: Invalid DW_AT_LLVM_stmt_sequence offset occurred 1 time(s). | ||
|
||
# CHECK_INVALID-NOT: error: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add --implicit-check-not=error:
to make sure no other error lines exist
if (it != LineTable->Sequences.end() && | ||
it->StmtSeqOffset == *SectionOffset) { | ||
ValidSequenceOffset = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (it != LineTable->Sequences.end() && | |
it->StmtSeqOffset == *SectionOffset) { | |
ValidSequenceOffset = true; | |
} | |
ValidSequenceOffset = it != LineTable->Sequences.end() && it->StmtSeqOffset == *SectionOffset; |
uint64_t LineTableStart = *StmtListOffset; | ||
uint64_t PrologueLength = LineTable->Prologue.PrologueLength; | ||
uint64_t TotalLength = LineTable->Prologue.TotalLength; | ||
uint64_t LineTableEnd = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extract (LineTable->Prologue.getFormParams().Format == dwarf::DWARF64 ? 12 : 4)
to a constant ?
ValidSequenceOffset = | ||
it != LineTable->Sequences.end() && it->StmtSeqOffset == *SectionOffset; | ||
|
||
if (!ValidSequenceOffset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this give invalid errors? I think that for sequence extraction we have our own logic to detect the sequences because the built-in parser's sequence detection is incomplete. See:
// The DWARF parser's discovery of sequences can be incomplete. To |
So we can have valid offsets pointing to sequences that are not detected by the DWARF parser, even if they are actually valid sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
built-in parser's sequence detection is incomplete
I feel like that's something we should spend more time on fixing.
But I've updated the logic to take that into consideration as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of patching the same logic in verifier and linker, I've managed to find the real reason some sequence is not detected:
A sequence is only valid when LowPC < HighPC
, which normally is true since no sequence has only one instruction. This is not true for thunks created by ICF, which only contains one branch instruction.
return !Empty && (LowPC < HighPC) && (FirstRowIndex < LastRowIndex); |
This logic is essentially wrong, since the HighPC was added to the sequence by:
llvm-project/llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp
Lines 582 to 590 in b3baa4d
if (Row.EndSequence) { | |
// Record the end of instruction sequence. | |
Sequence.HighPC = Row.Address.Address; | |
Sequence.LastRowIndex = RowNumber + 1; | |
Sequence.SectionIndex = Row.Address.SectionIndex; | |
if (Sequence.isValid()) | |
LineTable->appendSequence(Sequence); | |
Sequence.reset(); | |
} |
But not be considered to be as part of this sequence:
llvm-project/llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h
Lines 227 to 230 in b3baa4d
bool containsPC(object::SectionedAddress PC) const { | |
return SectionIndex == PC.SectionIndex && | |
(LowPC <= PC.Address && PC.Address < HighPC); | |
} |
My proposed fix is to strip the extra sequence matching logic here (it is hard to find a Row's offset to start with) and in the linker, and update how the Sequence is defined. Essentially, it should be [LowPC, HighPC]
(right inclusive) rather than right exclusive. Let me start a discussion with another PR, if we can change HighPC
, both problems will be solved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, #154851 enabling single-instruction sequence seems to cause more problem than it solved. I've invested a day on this and I'll add special logic here, but in the future it should go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ce9c629
to
c463b7e
Compare
# 0xd3 would be a valid offset, if the line table wan't ill formed with two rows having the same PC (0x8c). | ||
# CHECK_INVALID: error: DW_AT_LLVM_stmt_sequence offset 0x000000d3 does not point to a valid sequence offset in the line table | ||
# CHECK_INVALID: DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset] (0x000000d3) | ||
|
||
# CHECK_DEBUG_LINE: 0x000000d3: 05 DW_LNS_set_column (85) | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000d5: 0a DW_LNS_set_prologue_end | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000d6: 00 DW_LNE_set_address (0x000000000000008c) | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000e1: 03 DW_LNS_advance_line (30) | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000e3: 01 DW_LNS_copy | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000000000008c 30 85 1 0 0 0 is_stmt prologue_end | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000e4: 00 DW_LNE_end_sequence | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000000000008c 30 85 1 0 0 0 is_stmt end_sequence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dwblaikie If we agreed that this line table is wrong and needs to be fixed (#154986), on the verifier side I'll just say these type of offsets are invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect issue/PR reference? The linked one seems unrelated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say there's some extra nuance here:
- Agreed it's incorrect to describe a function of length 1 with a sequence of length 0
- But it /may/ be sort of valid to describe a function as having length 0 (in which case two functions could have the same address) - this can happen in UB situations for C++ (well, not technically UB if th efunction is never called), like this: https://godbolt.org/z/efhK69TMG (f1 has an address range [0, 0), f2 has an address range [0, 1)) - we /probably/ want to get rid of these as they break DWARF in other ways, and are more of a hazard than they need to be (it's not especially expensive to have such functions include a trap instruction rather than be truly zero-length) but it is a thing that exists today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow I didn't know this.
Two options:
- What I can do is not emitting this attribute
DW_AT_LLVM_stmt_sequence
when there is no machine instructions. - If HighPC == LowPC, skip the verification of DW_AT_LLVM_stmt_sequence. It would seem like this functions has no meaning at all so whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think potentially both are good/fine - not emitting the stmt_sequence in the first place might be nice but difficult to implement, and (2) might still be useful for older content generated without the fix to (1)
And it'd still be nice to fix the zero-length problem given the issues it creates for DWARF anyway.
And sounds like you also have bugs where a non-zero length function was getting a zero length sequence - which also needs fixing.
But yeah, any/all of these fixes sound good...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not emitting the stmt_sequence in the first place might be nice but difficult to implement
#154986 already does this, it counts the number of instructions emitted by AsmPrinter
, and skips DW_AT_LLVM_stmt_sequence
if it is less than 2 (branch + nop for icf thunks). Approve it if you could.
for older content generated
Nice catch on backward compatibility. I guess we should skip the verification in this PR as well. There could be some workflow where it verifies dSYM before symbolicating, and declaring it illegal out of no where could break them.
it'd still be nice to fix the zero-length problem given the issues it creates for DWARF anyway.
That's a bit tricky, I spent last week investigating the issue we were discussing in #154986 and here, but I haven't found a nice fix yet. While I'll keep working on it, let me know if you have more insights please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not emitting the stmt_sequence in the first place might be nice but difficult to implement
#154986 already does this, it counts the number of instructions emitted by
AsmPrinter
, and skipsDW_AT_LLVM_stmt_sequence
if it is less than 2 (branch + nop for icf thunks). Approve it if you could.
Hmm - think we might be miscommunicating...
/zero/ length sequences could be omitted (though the function still has an address that might be meaningful/interesting - so I wouldn't want to omit DW_AT_low_pc/high_pc from such a function) but you're suggesting skipping stmt_sequence on 2 byte lengths?
Are these branch+nop instructions relaxable by the linker, so what's being emitted as a 2 byte sequence is being relaxed to zero bytes? And that's the complex problem we're trying to grapple with?
for older content generated
Nice catch on backward compatibility. I guess we should skip the verification in this PR as well. There could be some workflow where it verifies dSYM before symbolicating, and declaring it illegal out of no where could break them.
it'd still be nice to fix the zero-length problem given the issues it creates for DWARF anyway.
That's a bit tricky, I spent last week investigating the issue we were discussing in #154986 and here, but I haven't found a nice fix yet. While I'll keep working on it, let me know if you have more insights please.
Last time I looked I'd tried to do something more surgical to only affect zero-length functions - but that's probably overly pedantic and something that does the thing that PS4 and MacOS targets already do is probably fine - namely, they, I think, set TrapUnreachable to true, and NoTrapAfterNoReturn to true:
llvm-project/llvm/include/llvm/Target/TargetOptions.h
Lines 268 to 273 in 2429a8f
/// Emit target-specific trap instruction for 'unreachable' IR instructions. | |
unsigned TrapUnreachable : 1; | |
/// Do not emit a trap instruction for 'unreachable' IR instructions behind | |
/// noreturn calls, even if TrapUnreachable is true. | |
unsigned NoTrapAfterNoreturn : 1; |
Maybe those two settings could/should even go away and be replaced by hardcoded/non-optional behavior... not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the code pointers.
I agree the discussion is a little bit confusing, let me summarize what we have discussed and see if we are on the same page.
Problems we have here:
Zero-length sequence
AFAIK some non-compliant code you mentioned https://godbolt.org/z/efhK69TMG
could led to a zero-length function. By "set TrapUnreachable to true, and NoTrapAfterNoReturn to true" or even make it a "hardcoded/non-optional behavior", we can resolve zero-length sequence once and for all by saying "all functions should have at lease one instruction, weather its return or trap". This transforms all zero-length problems into one-length problem to make DWARF's life easier. Am I understanding correctly?
One-length sequence is not parsed correctly
The problem is consecutive one-length sequences can't be manually terminated correct. Simply emitting DW_LNE_end_sequence
won't progress the PC of the last Row, rendering the line table incorrect. However, DW_AT_LLVM_stmt_sequence
is attached to one-instruction functions that we care about, and its causing DWARF to be invalid.
Steps I propose to resolve these issues:
Step 1 (#154986): Since DW_AT_LLVM_stmt_sequence
is blocking us the most, I'm proposing that, let's avoid emitting DW_AT_LLVM_stmt_sequence
for these zero/one-length functions by making the thunk two-instruction long, and only emit DW_AT_LLVM_stmt_sequence
for 2+ length functions. Thus, whatever problem zero/one-length functions have, we can leave them there and they should work just like before.
To answer your question:
"you're suggesting skipping stmt_sequence on 2 byte lengths?", I'm suggesting skipping DW_AT_LLVM_stmt_sequence
for functions with less than 2 instructions.
"relaxable by the linker, so what's being emitted as a 2 byte sequence is being relaxed to zero bytes?" No, the hope is to steer clear of the existing problems until we fix it.
Step 2: Fix one-instruction sequences. The ideal thing to do is something similar to -ffunction-sections
, but I've studied it for the past week and found that it was not as easy as it seems. Sections have better support than sequences: MCStream
can terminate a section by simply generate a new section symbol; but it cannot directly terminate the sequence by emitting DW_LNE_end_sequence
, otherwise the PC of the last row will be the same as the first row of the next sequence.
Step 3: Eliminate zero-length sequence set TrapUnreachable to true, and NoTrapAfterNoReturn to true
or even make them mandatory for all targets, as you mentioned above.
Step 4: Once this is done, we can revert Step 1 and land this PR as is, basically saying "all DW_AT_LLVM_stmt_sequence
should point to a valid sequence start at this point."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is consecutive one-length sequences can't be manually terminated correct.
I still don't understand this - I think it's a bug in the emission/DWARF, not in the parsing of it.
let's avoid emitting DW_AT_LLVM_stmt_sequence for these zero/one-length functions by making the thunk two-instruction long, and only emit DW_AT_LLVM_stmt_sequence for 2+ length functions.
Yeah, this seems to me like something's being missed. DWARF, so far as I know, certainly can describe a one-byte-long sequence - if we are failing to emit DWARF that does that I think that's a bug in the emission, not a limitation of DWARF - we shouldn't be working around it by making the code longer to avoid one-byte-long cases, we should be fixing our handling of one-byte-long cases to emit DWARF that correctly describes a one-byte-long sequence.
# 0xd3 would be a valid offset, if the line table wan't ill formed with two rows having the same PC (0x8c). | ||
# CHECK_INVALID: error: DW_AT_LLVM_stmt_sequence offset 0x000000d3 does not point to a valid sequence offset in the line table | ||
# CHECK_INVALID: DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset] (0x000000d3) | ||
|
||
# CHECK_DEBUG_LINE: 0x000000d3: 05 DW_LNS_set_column (85) | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000d5: 0a DW_LNS_set_prologue_end | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000d6: 00 DW_LNE_set_address (0x000000000000008c) | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000e1: 03 DW_LNS_advance_line (30) | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000e3: 01 DW_LNS_copy | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000000000008c 30 85 1 0 0 0 is_stmt prologue_end | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000e4: 00 DW_LNE_end_sequence | ||
# CHECK_DEBUG_LINE-NEXT: 0x000000000000008c 30 85 1 0 0 0 is_stmt end_sequence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect issue/PR reference? The linked one seems unrelated.
2. Bug fix on where line table starts
I agree it's the emission of dwarf not the parsing.
In terms of whether we should fix it now or later: I think I have found the
cause of the problem, if it is really the root cause, I can attempt a pr.
Yuyang (Peter) Rong
Research Scientise @ Meta
…On Wed, Sep 3, 2025, 9:39 AM David Blaikie ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In llvm/test/tools/llvm-dwarfdump/verify_stmt_seq.yaml
<#152807 (comment)>:
> +# 0xd3 would be a valid offset, if the line table wan't ill formed with two rows having the same PC (0x8c).
+# CHECK_INVALID: error: DW_AT_LLVM_stmt_sequence offset 0x000000d3 does not point to a valid sequence offset in the line table
+# CHECK_INVALID: DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset] (0x000000d3)
+
+# CHECK_DEBUG_LINE: 0x000000d3: 05 DW_LNS_set_column (85)
+# CHECK_DEBUG_LINE-NEXT: 0x000000d5: 0a DW_LNS_set_prologue_end
+# CHECK_DEBUG_LINE-NEXT: 0x000000d6: 00 DW_LNE_set_address (0x000000000000008c)
+# CHECK_DEBUG_LINE-NEXT: 0x000000e1: 03 DW_LNS_advance_line (30)
+# CHECK_DEBUG_LINE-NEXT: 0x000000e3: 01 DW_LNS_copy
+# CHECK_DEBUG_LINE-NEXT: 0x000000000000008c 30 85 1 0 0 0 is_stmt prologue_end
+# CHECK_DEBUG_LINE-NEXT: 0x000000e4: 00 DW_LNE_end_sequence
+# CHECK_DEBUG_LINE-NEXT: 0x000000000000008c 30 85 1 0 0 0 is_stmt end_sequence
The problem is consecutive one-length sequences can't be manually
terminated correct.
I still don't understand this - I think it's a bug in the emission/DWARF,
not in the parsing of it.
let's avoid emitting DW_AT_LLVM_stmt_sequence for these zero/one-length
functions by making the thunk two-instruction long, and only emit
DW_AT_LLVM_stmt_sequence for 2+ length functions.
Yeah, this seems to me like something's being missed. DWARF, so far as I
know, certainly can describe a one-byte-long sequence - if we are failing
to emit DWARF that does that I think that's a bug in the emission, not a
limitation of DWARF - we shouldn't be working around it by making the code
longer to avoid one-byte-long cases, we should be fixing our handling of
one-byte-long cases to emit DWARF that correctly describes a one-byte-long
sequence.
—
Reply to this email directly, view it on GitHub
<#152807 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFQQFGC2P3AEARGGS3E3GSL3Q4KTLAVCNFSM6AAAAACDPAGAZKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTCOBRGQ2TCNRRHE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Sometimes DW_AT_LLVM_stmt_sequence won't point to the correct offset. This feature helps us debug when/where it went wrong.
Added a new test and manually tempered with the value to show the intended verification result.