DataCorrupted
Member

@DataCorrupted DataCorrupted commented Aug 8, 2025

Sometimes DW_AT_LLVM_stmt_sequence won't point to the correct offset. This feature helps us debug when/where it went wrong.

Added a new test and manually tampered with the attribute values to show the intended verification result.

@DataCorrupted DataCorrupted marked this pull request as ready for review August 13, 2025 22:20
@llvmbot
Member

llvmbot commented Aug 13, 2025

@llvm/pr-subscribers-debuginfo

Author: Peter Rong (DataCorrupted)

Changes

Patch is 60.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152807.diff

2 Files Affected:

  • (modified) llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp (+46)
  • (added) llvm/test/tools/llvm-dwarfdump/verify_stmt_seq.yaml (+1617)
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
index 8ec3f1729b974..7d6a643560643 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
@@ -851,6 +851,52 @@ unsigned DWARFVerifier::verifyDebugInfoAttribute(const DWARFDie &Die,
     }
     break;
   }
+  case DW_AT_LLVM_stmt_sequence: {
+    // Make sure the offset in the DW_AT_LLVM_stmt_sequence attribute is valid
+    // and points to a valid sequence start in the line table.
+    auto SectionOffset = AttrValue.Value.getAsSectionOffset();
+    if (!SectionOffset) {
+      ReportError("Invalid DW_AT_LLVM_stmt_sequence encoding",
+                  "DIE has invalid DW_AT_LLVM_stmt_sequence encoding:");
+      break;
+    }
+    if (*SectionOffset >= U->getLineSection().Data.size()) {
+      ReportError(
+          "DW_AT_LLVM_stmt_sequence offset out of bounds",
+          "DW_AT_LLVM_stmt_sequence offset is beyond .debug_line bounds: " +
+              llvm::formatv("{0:x8}", *SectionOffset));
+      break;
+    }
+
+    // Check if the offset points to a valid sequence start
+    const auto *LineTable = DCtx.getLineTableForUnit(U);
+    if (!LineTable) {
+      ReportError("DW_AT_LLVM_stmt_sequence without line table",
+                  "DIE has DW_AT_LLVM_stmt_sequence but compile unit has no "
+                  "line table");
+      break;
+    }
+    bool ValidSequenceOffset = false;
+    // Check if the offset matches any of the sequence start offsets using
+    // binary search
+    auto it = std::lower_bound(LineTable->Sequences.begin(),
+                               LineTable->Sequences.end(), *SectionOffset,
+                               [](const auto &Sequence, const uint64_t Offset) {
+                                 return Sequence.StmtSeqOffset < Offset;
+                               });
+    if (it != LineTable->Sequences.end() &&
+        it->StmtSeqOffset == *SectionOffset) {
+      ValidSequenceOffset = true;
+    }
+
+    if (!ValidSequenceOffset)
+      ReportError(
+          "Invalid DW_AT_LLVM_stmt_sequence offset",
+          "DW_AT_LLVM_stmt_sequence offset " +
+              llvm::formatv("{0:x8}", *SectionOffset) +
+              " does not point to a valid sequence start in the line table");
+    break;
+  }
   default:
     break;
   }
diff --git a/llvm/test/tools/llvm-dwarfdump/verify_stmt_seq.yaml b/llvm/test/tools/llvm-dwarfdump/verify_stmt_seq.yaml
new file mode 100644
index 0000000000000..1873eea9d49f3
--- /dev/null
+++ b/llvm/test/tools/llvm-dwarfdump/verify_stmt_seq.yaml
@@ -0,0 +1,1617 @@
+# Object file copied from llvm/test/tools/dsymutil/ARM/stmt-seq-macho.test
+# Then I manually tampered with some of the values of the attribute
+# I hope there are easier ways to construct tests like this.
+
+# RUN: yaml2obj %s -o verify_stmt_seq.o
+# RUN: not llvm-dwarfdump -verify -debug-info verify_stmt_seq.o | FileCheck %s --check-prefix=CHECK_INVALID
+
+# Line 1326 0XAB
+# CHECK_INVALID: error: DW_AT_LLVM_stmt_sequence offset 0x000000ab does not point to a valid sequence start in the line table
+# Line 1372 0xEEEEE7
+# CHECK_INVALID: error: DW_AT_LLVM_stmt_sequence offset is beyond .debug_line bounds: 0x00eeeee7
+
+# CHECK_INVALID: error: Aggregated error counts:
+# CHECK_INVALID: error: DW_AT_LLVM_stmt_sequence offset out of bounds occurred 1 time(s).
+# CHECK_INVALID: error: Invalid DW_AT_LLVM_stmt_sequence offset occurred 1 time(s).
+
+# CHECK_INVALID-NOT: error:
+--- !mach-o
+IsLittleEndian: true
+FileHeader:
+  magic:           0xFEEDFACF
+  cputype:         0x100000C
+  cpusubtype:      0x0
+  filetype:        0x1
+  ncmds:           5
+  sizeofcmds:      1176
+  flags:           0x2000
+  reserved:        0x0
+LoadCommands:
+  - cmd:             LC_SEGMENT_64
+    cmdsize:         1032
+    segname:         ''
+    vmaddr:          0
+    vmsize:          3125
+    fileoff:         1208
+    filesize:        3125
+    maxprot:         7
+    initprot:        7
+    nsects:          12
+    flags:           0
+    Sections:
+      - sectname:        __text
+        segname:         __TEXT
+        addr:            0x0
+        size:            148
+        offset:          0x4B8
+        align:           2
+        reloff:          0x10F0
+        nreloc:          8
+        flags:           0x80000400
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+        content:         00040011C0035FD600100011C0035FD600580051C0035FD600100011C0035FD600580051C0035FD6FFC300D1F44F01A9FD7B02A9FD8300916000805200000094F30300AA20058052000000941400130B6001805200000094F30300AA40058052000000947302000B0100009021000091E03F0091000000948002130BFD7B42A9F44F41A9FFC30091C0035FD600000014C0035FD6
+        relocations:
+          - address:         0x8C
+            symbolnum:       4
+            pcrel:           true
+            length:          2
+            extern:          true
+            type:            2
+            scattered:       false
+            value:           0
+          - address:         0x74
+            symbolnum:       3
+            pcrel:           true
+            length:          2
+            extern:          true
+            type:            2
+            scattered:       false
+            value:           0
+          - address:         0x6C
+            symbolnum:       1
+            pcrel:           false
+            length:          2
+            extern:          true
+            type:            4
+            scattered:       false
+            value:           0
+          - address:         0x68
+            symbolnum:       1
+            pcrel:           true
+            length:          2
+            extern:          true
+            type:            3
+            scattered:       false
+            value:           0
+          - address:         0x60
+            symbolnum:       5
+            pcrel:           true
+            length:          2
+            extern:          true
+            type:            2
+            scattered:       false
+            value:           0
+          - address:         0x54
+            symbolnum:       6
+            pcrel:           true
+            length:          2
+            extern:          true
+            type:            2
+            scattered:       false
+            value:           0
+          - address:         0x48
+            symbolnum:       9
+            pcrel:           true
+            length:          2
+            extern:          true
+            type:            2
+            scattered:       false
+            value:           0
+          - address:         0x3C
+            symbolnum:       7
+            pcrel:           true
+            length:          2
+            extern:          true
+            type:            2
+            scattered:       false
+            value:           0
+      - sectname:        __cstring
+        segname:         __TEXT
+        addr:            0x94
+        size:            5
+        offset:          0x54C
+        align:           0
+        reloff:          0x0
+        nreloc:          0
+        flags:           0x2
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+        content:         '7465737400'
+      - sectname:        __debug_loc
+        segname:         __DWARF
+        addr:            0x99
+        size:            412
+        offset:          0x551
+        align:           0
+        reloff:          0x0
+        nreloc:          0
+        flags:           0x2000000
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+        content:         08000000000000000C000000000000000100500C0000000000000010000000000000000400A301509F0000000000000000000000000000000008000000000000000C00000000000000030070039F0000000000000000000000000000000010000000000000001400000000000000010050140000000000000018000000000000000400A301509F0000000000000000000000000000000018000000000000001C000000000000000100501C0000000000000020000000000000000400A301509F0000000000000000000000000000000018000000000000001C00000000000000030070039F0000000000000000000000000000000020000000000000002400000000000000010050240000000000000028000000000000000400A301509F00000000000000000000000000000000240000000000000028000000000000000100500000000000000000000000000000000038000000000000004400000000000000030011009F4400000000000000500000000000000001006350000000000000005C0000000000000001006400000000000000000000000000000000
+      - sectname:        __debug_abbrev
+        segname:         __DWARF
+        addr:            0x235
+        size:            372
+        offset:          0x6ED
+        align:           0
+        reloff:          0x0
+        nreloc:          0
+        flags:           0x2000000
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+      - sectname:        __debug_info
+        segname:         __DWARF
+        addr:            0x3A9
+        size:            747
+        offset:          0x861
+        align:           0
+        reloff:          0x1130
+        nreloc:          16
+        flags:           0x2000000
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+        relocations:
+          - address:         0x2A7
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x28E
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x253
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x1F5
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x1E1
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x1CE
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x1BA
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x1A7
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x169
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x12D
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0xF1
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0xC4
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x88
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x5F
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x37
+            symbolnum:       2
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x22
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+      - sectname:        __debug_str
+        segname:         __DWARF
+        addr:            0x694
+        size:            400
+        offset:          0xB4C
+        align:           0
+        reloff:          0x0
+        nreloc:          0
+        flags:           0x2000000
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+      - sectname:        __apple_names
+        segname:         __DWARF
+        addr:            0x824
+        size:            288
+        offset:          0xCDC
+        align:           0
+        reloff:          0x0
+        nreloc:          0
+        flags:           0x2000000
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+        content:         485341480100000009000000090000000C00000000000000010000000100060000000000FFFFFFFFFFFFFFFF0100000003000000040000000600000007000000080000004A08311CC78E3C8288CB36CF89CB36CFD1125E53522B705390D9F86F6A7F9A7C4908311C8C0000009C000000AC000000BC000000CC000000DC000000EC00000000010000100100000601000001000000F000000000000000D6000000010000005E00000000000000F600000001000000C30000000000000016010000010000002C01000000000000440100000100000052020000000000005C01000001000000A6020000000000002B0100000200000052020000A60200000000000026010000010000006801000000000000E6000000010000008700000000000000
+      - sectname:        __apple_objc
+        segname:         __DWARF
+        addr:            0x944
+        size:            36
+        offset:          0xDFC
+        align:           0
+        reloff:          0x0
+        nreloc:          0
+        flags:           0x2000000
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+        content:         485341480100000001000000000000000C000000000000000100000001000600FFFFFFFF
+      - sectname:        __apple_namespac
+        segname:         __DWARF
+        addr:            0x968
+        size:            36
+        offset:          0xE20
+        align:           0
+        reloff:          0x0
+        nreloc:          0
+        flags:           0x2000000
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+        content:         485341480100000001000000000000000C000000000000000100000001000600FFFFFFFF
+      - sectname:        __apple_types
+        segname:         __DWARF
+        addr:            0x98C
+        size:            195
+        offset:          0xE44
+        align:           0
+        reloff:          0x0
+        nreloc:          0
+        flags:           0x2000000
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+        content:         48534148010000000500000005000000140000000000000003000000010006000300050004000B000000000002000000FFFFFFFF03000000040000007CA8F05D90D9F86F5B738CDC3080880B6320957C64000000770000008A0000009D000000B0000000380100000100000027020000130000000000002B010000010000000502000013000000000000C20000000100000057000000240000000000007401000001000000DE02000024000000000000BD000000010000005000000024000000000000
+      - sectname:        __debug_frame
+        segname:         __DWARF
+        addr:            0xA50
+        size:            232
+        offset:          0xF08
+        align:           3
+        reloff:          0x11B0
+        nreloc:          8
+        flags:           0x2000000
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+        content:         14000000FFFFFFFF0400080001781E0C1F00000000000000140000000000000000000000000000000800000000000000140000000000000008000000000000000800000000000000140000000000000010000000000000000800000000000000140000000000000018000000000000000800000000000000140000000000000020000000000000000800000000000000240000000000000028000000000000006400000000000000500C1D109E019D02930394040000000014000000000000008C000000000000000400000000000000140000000000000090000000000000000400000000000000
+        relocations:
+          - address:         0xD8
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0xC0
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x98
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x80
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x68
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x50
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x38
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+          - address:         0x20
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          false
+            type:            0
+            scattered:       false
+            value:           0
+      - sectname:        __debug_line
+        segname:         __DWARF
+        addr:            0xB38
+        size:            253
+        offset:          0xFF0
+        align:           0
+        reloff:          0x11F0
+        nreloc:          8
+        flags:           0x2000000
+        reserved1:       0x0
+        reserved2:       0x0
+        reserved3:       0x0
+        relocations:
+          - address:         0xED
+            symbolnum:       1
+            pcrel:           false
+            length:          3
+            extern:          fa...
[truncated]

auto SectionOffset = AttrValue.Value.getAsSectionOffset();
if (!SectionOffset) {
ReportError("Invalid DW_AT_LLVM_stmt_sequence encoding",
"DIE has invalid DW_AT_LLVM_stmt_sequence encoding:");
Collaborator

Remove the colon from the end of the string, since there is no value to show?

if (*SectionOffset >= U->getLineSection().Data.size()) {
ReportError(
"DW_AT_LLVM_stmt_sequence offset out of bounds",
"DW_AT_LLVM_stmt_sequence offset is beyond .debug_line bounds: " +
Collaborator

This should probably check whether the DW_AT_LLVM_stmt_sequence offset is inside the current line table only. The .debug_line section contains multiple line tables, each of which has a prologue followed by N sequences. We want to make sure that *SectionOffset is after the prologue and before the end of all sequences. Each line table prologue contains:

dwarfdump --debug-line a.out.dSYM -v
a.out.dSYM/Contents/Resources/DWARF/a.out:	file format Mach-O arm64

.debug_line contents:
debug_line[0x00000000]
Line table prologue:
    total_length: 0x00000055
          format: DWARF32
         version: 5
    address_size: 8
 seg_select_size: 0
 prologue_length: 0x00000037

The total_length tells us where this line table's data ends, and the prologue_length tells us where the prologue ends. So we want to make sure that *SectionOffset is between the end of the prologue and the end of the current line table, not the end of the entire .debug_line section.
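The range described above can be sketched roughly as follows. This is a minimal standalone illustration, not the verifier's code: `LineTableHeader` and the three helper functions are hypothetical names, and the field sizes assume the standard DWARF line table header layout (a 4- or 12-byte initial length, a 2-byte version, two extra 1-byte fields in version 5, then the prologue_length field).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of one line table header. The field names
// are illustrative and do not mirror LLVM's Prologue class.
struct LineTableHeader {
  uint64_t Start;          // offset of this table within .debug_line
  uint64_t TotalLength;    // total_length (excludes the length field itself)
  uint64_t PrologueLength; // prologue_length (bytes after that field)
  uint16_t Version;
  bool IsDwarf64;
};

// Size of the initial length field: 4 bytes for DWARF32, 12 for DWARF64.
inline uint64_t initialLengthSize(bool IsDwarf64) { return IsDwarf64 ? 12 : 4; }

// One past the last byte of this line table.
inline uint64_t lineTableEnd(const LineTableHeader &H) {
  return H.Start + initialLengthSize(H.IsDwarf64) + H.TotalLength;
}

// First byte after the prologue, where the line number program begins.
inline uint64_t programStart(const LineTableHeader &H) {
  uint64_t Off = H.Start + initialLengthSize(H.IsDwarf64);
  Off += 2;                   // version
  if (H.Version >= 5)
    Off += 2;                 // address_size + seg_select_size
  Off += H.IsDwarf64 ? 8 : 4; // the prologue_length field itself
  return Off + H.PrologueLength;
}

// True if Offset lies after the prologue and before the end of this table.
inline bool offsetInsideTable(const LineTableHeader &H, uint64_t Offset) {
  assert(programStart(H) <= lineTableEnd(H) && "malformed header");
  return Offset >= programStart(H) && Offset < lineTableEnd(H);
}
```

With the values from the dump above (table at 0x0, total_length 0x55, prologue_length 0x37, version 5, DWARF32), this gives a program start of 0x43 and a table end of 0x59.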

Member Author

I added a test for an offset that is out of range of the intended line table.

"Invalid DW_AT_LLVM_stmt_sequence offset",
"DW_AT_LLVM_stmt_sequence offset " +
llvm::formatv("{0:x8}", *SectionOffset) +
" does not point to a valid sequence start in the line table");
Collaborator

Maybe change "valid sequence start" to "valid sequence offset"?

// Make sure the offset in the DW_AT_LLVM_stmt_sequence attribute is valid
// and points to a valid sequence start in the line table.
auto SectionOffset = AttrValue.Value.getAsSectionOffset();
if (!SectionOffset) {
Collaborator

We should add a test for this case.

Member Author

I added a test for "Invalid DW_AT_LLVM_stmt_sequence encoding".

# Then manually tampered with some of the values of the attribute
# I hope there are easier ways to construct tests like this.

# RUN: yaml2obj %p/Inputs/verify_stmt_seq.yaml -o verify_stmt_seq.o
Contributor

Since the YAML input is only used in one test, we should use split-file to inline it here.

Member Author

That input is thousands of lines long. I'd prefer to keep the two files separate; otherwise loading the test could take unnecessarily long (especially on web pages).

Contributor

The YAML file is shorter than llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp above.

# CHECK_INVALID-NEXT: error: Invalid DW_AT_LLVM_stmt_sequence encoding occurred 1 time(s).
# CHECK_INVALID-NEXT: error: Invalid DW_AT_LLVM_stmt_sequence offset occurred 1 time(s).

# CHECK_INVALID-NOT: error:
Contributor

You can add --implicit-check-not=error: to make sure no other error lines exist.

Comment on lines 921 to 924
if (it != LineTable->Sequences.end() &&
it->StmtSeqOffset == *SectionOffset) {
ValidSequenceOffset = true;
}
Contributor

Suggested change
-    if (it != LineTable->Sequences.end() &&
-        it->StmtSeqOffset == *SectionOffset) {
-      ValidSequenceOffset = true;
-    }
+    ValidSequenceOffset = it != LineTable->Sequences.end() && it->StmtSeqOffset == *SectionOffset;
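For readers following along, the lookup being simplified here is a `std::lower_bound` followed by an equality test. A minimal standalone sketch, assuming a hypothetical `Sequence` struct with only the `StmtSeqOffset` field and a helper name (`isSequenceStart`) that does not exist in LLVM. Note that the binary search is only meaningful if the container is ordered by `StmtSeqOffset`; the sketch asserts that precondition rather than assuming it holds for the real line table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a line table sequence. Only the field used by
// the lookup is modelled.
struct Sequence {
  uint64_t StmtSeqOffset;
};

// Returns true if Offset is exactly the start offset of one of the sequences.
inline bool isSequenceStart(const std::vector<Sequence> &Seqs,
                            uint64_t Offset) {
  // Precondition for binary search: Seqs is sorted by StmtSeqOffset.
  assert(std::is_sorted(Seqs.begin(), Seqs.end(),
                        [](const Sequence &A, const Sequence &B) {
                          return A.StmtSeqOffset < B.StmtSeqOffset;
                        }));
  // Find the first sequence whose offset is not less than Offset.
  auto It = std::lower_bound(Seqs.begin(), Seqs.end(), Offset,
                             [](const Sequence &S, uint64_t O) {
                               return S.StmtSeqOffset < O;
                             });
  // An exact match is required; landing between two starts is invalid.
  return It != Seqs.end() && It->StmtSeqOffset == Offset;
}
```

Folding the result into a single boolean expression, as suggested, removes the mutable flag and the extra branch without changing behavior.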

uint64_t LineTableStart = *StmtListOffset;
uint64_t PrologueLength = LineTable->Prologue.PrologueLength;
uint64_t TotalLength = LineTable->Prologue.TotalLength;
uint64_t LineTableEnd =
Contributor

Can we extract (LineTable->Prologue.getFormParams().Format == dwarf::DWARF64 ? 12 : 4) into a constant?

ValidSequenceOffset =
it != LineTable->Sequences.end() && it->StmtSeqOffset == *SectionOffset;

if (!ValidSequenceOffset)
Contributor

Could this report spurious errors? I think that for sequence extraction we have our own logic to detect sequences, because the built-in parser's sequence detection is incomplete. See:

// The DWARF parser's discovery of sequences can be incomplete. To

So we can have valid offsets pointing to sequences that are not detected by the DWARF parser, even if they are actually valid sequences.

Member Author

> built-in parser's sequence detection is incomplete

I feel like that's something we should spend more time on fixing.

But I've updated the logic to take that into consideration as well.

Member Author

Instead of patching the same logic in both the verifier and the linker, I've managed to find the real reason some sequences are not detected:

A sequence is considered valid only when LowPC < HighPC, which is normally true because sequences usually contain more than one instruction. It does not hold for thunks created by ICF, which contain only a single branch instruction.

return !Empty && (LowPC < HighPC) && (FirstRowIndex < LastRowIndex);

This logic is essentially wrong, since HighPC is assigned to the sequence by:

if (Row.EndSequence) {
  // Record the end of instruction sequence.
  Sequence.HighPC = Row.Address.Address;
  Sequence.LastRowIndex = RowNumber + 1;
  Sequence.SectionIndex = Row.Address.SectionIndex;
  if (Sequence.isValid())
    LineTable->appendSequence(Sequence);
  Sequence.reset();
}

but that address is not considered part of the sequence:

bool containsPC(object::SectionedAddress PC) const {
  return SectionIndex == PC.SectionIndex &&
         (LowPC <= PC.Address && PC.Address < HighPC);
}

My proposed fix is to strip the extra sequence-matching logic here (it is hard to find a Row's offset to start with) and in the linker, and to change how a Sequence is defined: it should be [LowPC, HighPC] (right-inclusive) rather than right-exclusive. Let me start that discussion in another PR; if we can change HighPC, both problems will be solved.
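To make the failure mode concrete, here is a reduced model of the two predicates quoted above. `Seq` and `makeSeq` are hypothetical names for illustration only, and the section index is left out; the point is just that a sequence whose single row and end_sequence row share one address is rejected by the `LowPC < HighPC` test.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced model of a line table sequence, keeping only the
// fields used by the two predicates quoted in this comment.
struct Seq {
  uint64_t LowPC = 0;
  uint64_t HighPC = 0;
  unsigned FirstRowIndex = 0;
  unsigned LastRowIndex = 0;
  bool Empty = true;

  bool isValid() const {
    return !Empty && (LowPC < HighPC) && (FirstRowIndex < LastRowIndex);
  }

  // Half-open range [LowPC, HighPC): HighPC itself is excluded.
  bool containsPC(uint64_t PC) const { return LowPC <= PC && PC < HighPC; }
};

// Builds a sequence with Rows ordinary rows starting at FirstAddr, followed
// by an end_sequence row at EndAddr.
inline Seq makeSeq(uint64_t FirstAddr, uint64_t EndAddr, unsigned Rows) {
  assert(FirstAddr <= EndAddr && "end_sequence must not precede first row");
  Seq S;
  S.Empty = false;
  S.LowPC = FirstAddr;
  S.HighPC = EndAddr;        // taken from the end_sequence row
  S.FirstRowIndex = 0;
  S.LastRowIndex = Rows + 1; // one past the end_sequence row
  return S;
}
```

For example, `makeSeq(0x8c, 0x8c, 1)` models a thunk whose only row and end_sequence row are both at 0x8c; `isValid()` returns false for it, so it would never be appended to the table, whereas `makeSeq(0x8c, 0x90, 1)` is accepted.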

Member Author

Unfortunately, #154851, which enables single-instruction sequences, seems to cause more problems than it solves. I've invested a day in this, so I'll add special-case logic here, but it should go away in the future.

Member Author

See the discussion on #154986 / #110192: if the sequence can't be parsed, the line table is ill-formed, and we should not bend the verifier for it.

I plan to fix that in #154986 or, failing that, with another approach.

Comment on lines 13 to 24
# 0xd3 would be a valid offset if the line table weren't ill-formed, with two rows having the same PC (0x8c).
# CHECK_INVALID: error: DW_AT_LLVM_stmt_sequence offset 0x000000d3 does not point to a valid sequence offset in the line table
# CHECK_INVALID: DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset] (0x000000d3)

# CHECK_DEBUG_LINE: 0x000000d3: 05 DW_LNS_set_column (85)
# CHECK_DEBUG_LINE-NEXT: 0x000000d5: 0a DW_LNS_set_prologue_end
# CHECK_DEBUG_LINE-NEXT: 0x000000d6: 00 DW_LNE_set_address (0x000000000000008c)
# CHECK_DEBUG_LINE-NEXT: 0x000000e1: 03 DW_LNS_advance_line (30)
# CHECK_DEBUG_LINE-NEXT: 0x000000e3: 01 DW_LNS_copy
# CHECK_DEBUG_LINE-NEXT: 0x000000000000008c 30 85 1 0 0 0 is_stmt prologue_end
# CHECK_DEBUG_LINE-NEXT: 0x000000e4: 00 DW_LNE_end_sequence
# CHECK_DEBUG_LINE-NEXT: 0x000000000000008c 30 85 1 0 0 0 is_stmt end_sequence
Member Author

@DataCorrupted DataCorrupted Aug 29, 2025

@dwblaikie If we agree that this line table is wrong and needs to be fixed (#154986), then on the verifier side I'll just report these types of offsets as invalid.

Contributor

Incorrect issue/PR reference? The linked one seems unrelated.

Member Author

@DataCorrupted DataCorrupted Aug 29, 2025

#154986 and #110192, updated.

Collaborator

I'd say there's some extra nuance here:

  • Agreed, it's incorrect to describe a function of length 1 with a sequence of length 0.
  • But it /may/ be sort of valid to describe a function as having length 0 (in which case two functions could have the same address). This can happen in UB situations for C++ (well, not technically UB if the function is never called), like this: https://godbolt.org/z/efhK69TMG (f1 has an address range [0, 0), f2 has an address range [0, 1)). We /probably/ want to get rid of these, as they break DWARF in other ways and are more of a hazard than they need to be (it's not especially expensive to have such functions include a trap instruction rather than be truly zero-length), but it is a thing that exists today.

Member Author

Wow I didn't know this.

Two options:

  1. Don't emit the DW_AT_LLVM_stmt_sequence attribute when there are no machine instructions.
  2. If HighPC == LowPC, skip the verification of DW_AT_LLVM_stmt_sequence. Such a function seems to have no meaning at all, so it doesn't matter.
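
As a rough sketch, the two options boil down to one predicate on the producer side and one on the verifier side. Both function names are hypothetical and neither exists in LLVM; `HighPC` is assumed to be already resolved to an absolute address rather than an offset from `LowPC`.

```cpp
#include <cstdint>

// Option 1 (producer side): only attach DW_AT_LLVM_stmt_sequence when the
// function has at least one machine instruction. Hypothetical helper.
constexpr bool shouldEmitStmtSequence(unsigned NumMachineInstrs) {
  return NumMachineInstrs > 0;
}

// Option 2 (verifier side): only check the attribute when the function
// covers at least one byte. Hypothetical helper; HighPC is assumed to be an
// absolute address.
constexpr bool shouldVerifyStmtSequence(std::uint64_t LowPC,
                                        std::uint64_t HighPC) {
  return HighPC != LowPC;
}
```

Option 2 also covers DWARF that was produced before option 1 existed, since it only looks at the address range.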

Collaborator

I think potentially both are good/fine - not emitting the stmt_sequence in the first place might be nice but difficult to implement, and (2) might still be useful for older content generated without the fix to (1)

And it'd still be nice to fix the zero-length problem given the issues it creates for DWARF anyway.

And sounds like you also have bugs where a non-zero length function was getting a zero length sequence - which also needs fixing.

But yeah, any/all of these fixes sound good...

Member Author

> not emitting the stmt_sequence in the first place might be nice but difficult to implement

#154986 already does this, it counts the number of instructions emitted by AsmPrinter, and skips DW_AT_LLVM_stmt_sequence if it is less than 2 (branch + nop for icf thunks). Approve it if you could.

> for older content generated

Nice catch on backward compatibility. I guess we should skip the verification in this PR as well. There could be some workflow where it verifies dSYM before symbolicating, and declaring it illegal out of nowhere could break them.

> it'd still be nice to fix the zero-length problem given the issues it creates for DWARF anyway.

That's a bit tricky, I spent last week investigating the issue we were discussing in #154986 and here, but I haven't found a nice fix yet. While I'll keep working on it, let me know if you have more insights please.

Collaborator

> not emitting the stmt_sequence in the first place might be nice but difficult to implement
>
> #154986 already does this, it counts the number of instructions emitted by AsmPrinter, and skips DW_AT_LLVM_stmt_sequence if it is less than 2 (branch + nop for icf thunks). Approve it if you could.

Hmm - think we might be miscommunicating...

/zero/ length sequences could be omitted (though the function still has an address that might be meaningful/interesting - so I wouldn't want to omit DW_AT_low_pc/high_pc from such a function) but you're suggesting skipping stmt_sequence on 2 byte lengths?

Are these branch+nop instructions relaxable by the linker, so what's being emitted as a 2 byte sequence is being relaxed to zero bytes? And that's the complex problem we're trying to grapple with?

> for older content generated
>
> Nice catch on backward compatibility. I guess we should skip the verification in this PR as well. There could be some workflow where it verifies dSYM before symbolicating, and declaring it illegal out of nowhere could break them.
>
> it'd still be nice to fix the zero-length problem given the issues it creates for DWARF anyway.
>
> That's a bit tricky, I spent last week investigating the issue we were discussing in #154986 and here, but I haven't found a nice fix yet. While I'll keep working on it, let me know if you have more insights please.

Last time I looked, I'd tried to do something more surgical to only affect zero-length functions, but that's probably overly pedantic, and something that does what the PS4 and macOS targets already do is probably fine. Namely, I think they set TrapUnreachable to true and NoTrapAfterNoreturn to true:

/// Emit target-specific trap instruction for 'unreachable' IR instructions.
unsigned TrapUnreachable : 1;
/// Do not emit a trap instruction for 'unreachable' IR instructions behind
/// noreturn calls, even if TrapUnreachable is true.
unsigned NoTrapAfterNoreturn : 1;

Maybe those two settings could/should even go away and be replaced by hardcoded/non-optional behavior... not sure.
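
Going only by the two doc comments quoted above, the decision could be sketched like this. `TrapOptions` and `emitsTrapForUnreachable` are hypothetical stand-ins for illustration, not the real target options struct or any codegen API.

```cpp
// Hypothetical stand-in for the two option bits quoted above.
struct TrapOptions {
  unsigned TrapUnreachable : 1;
  unsigned NoTrapAfterNoreturn : 1;
};

// Decide whether an 'unreachable' gets a trap instruction, following the
// wording of the two comments: trap when TrapUnreachable is set, unless the
// 'unreachable' sits behind a noreturn call and NoTrapAfterNoreturn is set.
constexpr bool emitsTrapForUnreachable(TrapOptions Opts,
                                       bool FollowsNoreturnCall) {
  return Opts.TrapUnreachable &&
         !(FollowsNoreturnCall && Opts.NoTrapAfterNoreturn);
}
```

With both bits set, a function whose body is just an 'unreachable' (and no preceding noreturn call) still gets a trap, so it is no longer zero-length.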

Member Author

Thanks for the code pointers.

I agree the discussion is a little bit confusing. Let me summarize what we have discussed and see if we are on the same page.

Problems we have here:

Zero-length sequence

AFAIK, non-compliant code like the example you mentioned (https://godbolt.org/z/efhK69TMG) could lead to a zero-length function. By "set TrapUnreachable to true, and NoTrapAfterNoreturn to true", or even by making it a "hardcoded/non-optional behavior", we can resolve zero-length sequences once and for all by saying "all functions should have at least one instruction, whether it's a return or a trap". This transforms all zero-length problems into one-length problems, which makes DWARF's life easier. Am I understanding correctly?

One-length sequence is not parsed correctly

The problem is that consecutive one-length sequences can't be manually terminated correctly. Simply emitting DW_LNE_end_sequence won't advance the PC of the last row, rendering the line table incorrect. However, DW_AT_LLVM_stmt_sequence is attached to the one-instruction functions that we care about, and it's causing the DWARF to be invalid.

Steps I propose to resolve these issues:

Step 1 (#154986): Since DW_AT_LLVM_stmt_sequence is blocking us the most, I'm proposing this: let's avoid emitting DW_AT_LLVM_stmt_sequence for these zero/one-length functions by making the thunk two-instruction long, and only emit DW_AT_LLVM_stmt_sequence for 2+ length functions. Thus, whatever problems zero/one-length functions have, we can leave them there and they should work just like before.

To answer your questions:

"you're suggesting skipping stmt_sequence on 2 byte lengths?" I'm suggesting skipping DW_AT_LLVM_stmt_sequence for functions with fewer than 2 instructions.

"relaxable by the linker, so what's being emitted as a 2 byte sequence is being relaxed to zero bytes?" No, the hope is to steer clear of the existing problems until we fix them.

Step 2: Fix one-instruction sequences. The ideal thing to do is something similar to -ffunction-sections, but I've studied it for the past week and found that it is not as easy as it seems. Sections have better support than sequences: MCStreamer can terminate a section by simply generating a new section symbol, but it cannot directly terminate a sequence by emitting DW_LNE_end_sequence; otherwise the PC of the last row will be the same as the first row of the next sequence.

Step 3: Eliminate zero-length sequences: set TrapUnreachable to true and NoTrapAfterNoreturn to true, or even make them mandatory for all targets, as you mentioned above.

Step 4: Once this is done, we can revert Step 1 and land this PR as is, basically saying "all DW_AT_LLVM_stmt_sequence should point to a valid sequence start at this point."

Collaborator

> The problem is consecutive one-length sequences can't be manually terminated correctly.

I still don't understand this - I think it's a bug in the emission/DWARF, not in the parsing of it.

> let's avoid emitting DW_AT_LLVM_stmt_sequence for these zero/one-length functions by making the thunk two-instruction long, and only emit DW_AT_LLVM_stmt_sequence for 2+ length functions.

Yeah, this seems to me like something's being missed. DWARF, so far as I know, certainly can describe a one-byte-long sequence - if we are failing to emit DWARF that does that I think that's a bug in the emission, not a limitation of DWARF - we shouldn't be working around it by making the code longer to avoid one-byte-long cases, we should be fixing our handling of one-byte-long cases to emit DWARF that correctly describes a one-byte-long sequence.
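
For illustration, a heavily reduced model of the line-number state machine shows the difference between the two encodings. `Row`, `LineProgram` and the two helper functions are hypothetical and are not LLVM's parser; the model assumes a minimum instruction length of 1 and a function that is exactly one byte long at 0x8c.

```cpp
#include <cstdint>
#include <vector>

// One emitted line-table row (only the fields needed here).
struct Row {
  std::uint64_t Address;
  bool EndSequence;
};

// Hypothetical, heavily reduced line-number state machine.
class LineProgram {
public:
  void setAddress(std::uint64_t A) { Address = A; }         // DW_LNE_set_address
  void advancePC(std::uint64_t Delta) { Address += Delta; } // DW_LNS_advance_pc
  void copy() { Rows.push_back({Address, false}); }         // DW_LNS_copy
  void endSequence() { Rows.push_back({Address, true}); }   // DW_LNE_end_sequence

  // Bytes covered: address of the end row minus address of the first row.
  std::uint64_t sequenceLength() const {
    return Rows.empty() ? 0 : Rows.back().Address - Rows.front().Address;
  }

private:
  std::uint64_t Address = 0;
  std::vector<Row> Rows;
};

// Same shape as the dump under review: end_sequence right after the only row.
inline std::uint64_t zeroLengthSequence() {
  LineProgram P;
  P.setAddress(0x8c);
  P.copy();
  P.endSequence(); // address is still 0x8c, so the sequence is [0x8c, 0x8c)
  return P.sequenceLength();
}

// Same program, but the address is advanced past the function first.
inline std::uint64_t oneByteSequence() {
  LineProgram P;
  P.setAddress(0x8c);
  P.copy();
  P.advancePC(1);  // move to the first byte after the function
  P.endSequence(); // the sequence is now [0x8c, 0x8d)
  return P.sequenceLength();
}
```

In this model the only difference between an empty sequence and a one-byte sequence is whether the address is advanced before the end_sequence row is emitted.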

Comment on lines 13 to 24
# 0xd3 would be a valid offset if the line table weren't ill-formed, with two rows having the same PC (0x8c).
# CHECK_INVALID: error: DW_AT_LLVM_stmt_sequence offset 0x000000d3 does not point to a valid sequence offset in the line table
# CHECK_INVALID: DW_AT_LLVM_stmt_sequence [DW_FORM_sec_offset] (0x000000d3)

# CHECK_DEBUG_LINE: 0x000000d3: 05 DW_LNS_set_column (85)
# CHECK_DEBUG_LINE-NEXT: 0x000000d5: 0a DW_LNS_set_prologue_end
# CHECK_DEBUG_LINE-NEXT: 0x000000d6: 00 DW_LNE_set_address (0x000000000000008c)
# CHECK_DEBUG_LINE-NEXT: 0x000000e1: 03 DW_LNS_advance_line (30)
# CHECK_DEBUG_LINE-NEXT: 0x000000e3: 01 DW_LNS_copy
# CHECK_DEBUG_LINE-NEXT: 0x000000000000008c 30 85 1 0 0 0 is_stmt prologue_end
# CHECK_DEBUG_LINE-NEXT: 0x000000e4: 00 DW_LNE_end_sequence
# CHECK_DEBUG_LINE-NEXT: 0x000000000000008c 30 85 1 0 0 0 is_stmt end_sequence

2. Bug fix on where line table starts
@DataCorrupted
Member Author

DataCorrupted commented Sep 3, 2025 via email

@DataCorrupted
Member Author

#157529 should've fixed the zero-length issue. With that in mind, on the verifier side we can safely declare the offset in DW_AT_LLVM_stmt_sequence invalid if no sequence has that offset.
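
Roughly, the rule amounts to the following standalone sketch. `Sequence`, `OffsetStatus` and `classifyStmtSequenceOffset` are hypothetical names for illustration and do not reflect the actual DWARFVerifier code; each `Sequence` is assumed to record the line-table offset at which it starts.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical view of a parsed sequence: just its starting offset.
struct Sequence {
  std::uint64_t StmtSeqOffset;
};

enum class OffsetStatus { Ok, OutOfBounds, NotSequenceStart };

// Classify the offset stored in DW_AT_LLVM_stmt_sequence.
inline OffsetStatus
classifyStmtSequenceOffset(std::uint64_t Offset, std::uint64_t LineSectionSize,
                           const std::vector<Sequence> &Sequences) {
  // The offset must lie inside the .debug_line section.
  if (Offset >= LineSectionSize)
    return OffsetStatus::OutOfBounds;
  // It is valid only if some parsed sequence starts exactly there.
  bool Matches =
      std::any_of(Sequences.begin(), Sequences.end(),
                  [Offset](const Sequence &S) { return S.StmtSeqOffset == Offset; });
  return Matches ? OffsetStatus::Ok : OffsetStatus::NotSequenceStart;
}
```

An offset that falls between two sequence starts is in bounds but still rejected, which is the case the new test exercises.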

@clayborg @alx32 Can I get a quick review please?
