Commit dc58013

thechenli and Chen Li authored
[llvm-gsymutil] Replace truncated DWARF names with mangled names from symbol table (#184221)
## Summary

- During `GsymCreator::finalize()`, when deduplicating entries with the same address range, check if the DWARF entry's name is a truncated version of the symbol table's mangled name
- If the DWARF name is a substring of the demangled symbol table name, replace it with the full mangled name before discarding the symbol table entry
- This allows downstream tools to properly demangle and display full function signatures

## Test plan

### Unit tests

- `TestMangledNameReplacement`: Verifies DWARF name `make_ftype` is replaced with `_Z10make_ftypePci` and line table is preserved
- `TestMangledNameReplacementNegative`: Verifies no replacement when both names are mangled, or when names are unrelated
- All 51 GSYM unit tests pass

### Lit test

- `elf-mangled-name-replacement.yaml`: End-to-end test creating an ELF with DWARF + symbol table, converting to GSYM, and verifying the output
- All 9/9 applicable GSYM lit tests pass (6 unsupported are ARM/macOS tests on x86_64 Linux)

### Manual end-to-end testing

Created ELF binaries with `yaml2obj` containing both DWARF debug info and symbol table entries for the same function, then converted to GSYM with `llvm-gsymutil --convert` and verified the output with `llvm-gsymutil` dump.

**Test 1: Name replacement happens when DWARF name is truncated**

- DWARF has function named `make_ftype` with line table at `0x401000`
- Symbol table has `_Z10make_ftypePci` (demangles to `make_ftype(char*, int)`) at same address
- After conversion, GSYM output shows: `"_Z10make_ftypePci"` with line table preserved ✅

**Test 2: No replacement when names are unrelated**

- DWARF has function named `unrelated_func` with line table at `0x401000`
- Symbol table has `_Z10make_ftypePci` at same address
- After conversion, GSYM output shows: `"unrelated_func"`, name unchanged ✅

**Test 3: Replacement works with namespaced functions**

- DWARF has function named `make_ftype` with line table at `0x401000`
- Symbol table has `_ZN12_GLOBAL__N_110make_ftypeEPci` (demangles to `(anonymous namespace)::make_ftype(char*, int)`) at same address
- After conversion, GSYM output shows: `"_ZN12_GLOBAL__N_110make_ftypeEPci"` with line table preserved ✅

Co-authored-by: Chen Li <[email protected]>
1 parent ba467b6 commit dc58013

6 files changed

Lines changed: 460 additions & 18 deletions


llvm/include/llvm/DebugInfo/GSYM/CallSiteInfo.h

Lines changed: 7 additions & 0 deletions
@@ -78,6 +78,13 @@ struct CallSiteInfo {
 struct CallSiteInfoCollection {
   std::vector<CallSiteInfo> CallSites;
 
+  bool operator==(const CallSiteInfoCollection &RHS) const {
+    return CallSites == RHS.CallSites;
+  }
+  bool operator!=(const CallSiteInfoCollection &RHS) const {
+    return !(*this == RHS);
+  }
+
   /// Decode a CallSiteInfoCollection object from a binary data stream.
   ///
   /// \param Data The binary stream to read the data from.

llvm/include/llvm/DebugInfo/GSYM/FunctionInfo.h

Lines changed: 11 additions & 6 deletions
@@ -217,7 +217,8 @@ struct FunctionInfo {
 
 inline bool operator==(const FunctionInfo &LHS, const FunctionInfo &RHS) {
   return LHS.Range == RHS.Range && LHS.Name == RHS.Name &&
-         LHS.OptLineTable == RHS.OptLineTable && LHS.Inline == RHS.Inline;
+         LHS.OptLineTable == RHS.OptLineTable && LHS.Inline == RHS.Inline &&
+         LHS.CallSites == RHS.CallSites;
 }
 inline bool operator!=(const FunctionInfo &LHS, const FunctionInfo &RHS) {
   return !(LHS == RHS);
@@ -233,13 +234,17 @@ inline bool operator!=(const FunctionInfo &LHS, const FunctionInfo &RHS) {
 /// inline information with the most entries will appeear last. If the inline
 /// information match, either by both function infos not having any or both
 /// being exactly the same, we will then compare line tables. Comparing line
-/// tables allows the entry with the most line entries to appear last. This
-/// ensures we are able to save the FunctionInfo with the most debug info into
-/// the GSYM file.
+/// tables allows the entry with the most line entries to appear last. As a
+/// final tiebreaker, an entry that has call site information sorts after one
+/// that does not, so that within a single address range the entry with the
+/// most debug info always appears last. This ensures we are able to save the
+/// FunctionInfo with the most debug info into the GSYM file.
 inline bool operator<(const FunctionInfo &LHS, const FunctionInfo &RHS) {
   // First sort by address range
-  return std::tie(LHS.Range, LHS.Inline, LHS.OptLineTable) <
-         std::tie(RHS.Range, RHS.Inline, RHS.OptLineTable);
+  const bool LHSHasCallSites = LHS.CallSites.has_value();
+  const bool RHSHasCallSites = RHS.CallSites.has_value();
+  return std::tie(LHS.Range, LHS.Inline, LHS.OptLineTable, LHSHasCallSites) <
+         std::tie(RHS.Range, RHS.Inline, RHS.OptLineTable, RHSHasCallSites);
 }
 
 LLVM_ABI raw_ostream &operator<<(raw_ostream &OS, const FunctionInfo &R);
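
The tiebreaker added to `operator<` can be illustrated in isolation. The sketch below is not the LLVM code: `Entry`, `Start`, `LineCount`, and `sortedEntries` are hypothetical stand-ins for `FunctionInfo` and its fields, reduced to plain integers so the example compiles on its own. It only shows how appending a `bool` to the `std::tie` tuple makes an entry with call sites sort after an otherwise equal entry without them.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical stand-in for gsym::FunctionInfo (assumption, not the real type).
struct Entry {
  unsigned Start;               // stands in for Range
  unsigned LineCount;           // stands in for OptLineTable
  std::optional<int> CallSites; // stands in for CallSites
};

// Lexicographic comparison; "has call sites" is the last, weakest key.
inline bool operator<(const Entry &LHS, const Entry &RHS) {
  const bool LHSHasCallSites = LHS.CallSites.has_value();
  const bool RHSHasCallSites = RHS.CallSites.has_value();
  return std::tie(LHS.Start, LHS.LineCount, LHSHasCallSites) <
         std::tie(RHS.Start, RHS.LineCount, RHSHasCallSites);
}

// Sorts a copy so that, for one address, the richest entry ends up last.
inline std::vector<Entry> sortedEntries(std::vector<Entry> Entries) {
  std::sort(Entries.begin(), Entries.end());
  return Entries;
}
```

Because `false < true`, the entry that carries call site data lands last among entries with the same address and line information, which is the position the dedup pass in `finalize()` keeps.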

llvm/lib/DebugInfo/GSYM/GsymCreator.cpp

Lines changed: 47 additions & 12 deletions
@@ -6,6 +6,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "llvm/DebugInfo/GSYM/GsymCreator.h"
+#include "llvm/ADT/SmallString.h"
 #include "llvm/DebugInfo/GSYM/FileWriter.h"
 #include "llvm/DebugInfo/GSYM/Header.h"
 #include "llvm/DebugInfo/GSYM/LineTable.h"
@@ -21,6 +22,34 @@
 using namespace llvm;
 using namespace gsym;
 
+// Keep this matching cheap: Itanium and Swift both encode identifiers as
+// <length><identifier> in the raw mangled name. Look for that token instead of
+// demangling during finalize().
+static bool isSupportedMangledPrefix(StringRef Name) {
+  return Name.starts_with("_Z") || Name.starts_with("$s") ||
+         Name.starts_with("$S");
+}
+
+static bool shouldReplaceWithMangledName(StringRef AlternateName,
+                                         StringRef CurrentName) {
+  // Any name is better than no name.
+  if (CurrentName.empty() && !AlternateName.empty())
+    return true;
+
+  // Keep the current name if it's already mangled, or if the alternate name
+  // is not a supported mangled name.
+  if (isSupportedMangledPrefix(CurrentName) ||
+      !isSupportedMangledPrefix(AlternateName))
+    return false;
+
+  // Confirm the alternate mangled name actually contains the current name as
+  // an Itanium/Swift identifier token (<length><identifier>).
+  SmallString<64> LengthAndName;
+  raw_svector_ostream OS(LengthAndName);
+  OS << CurrentName.size() << CurrentName;
+  return AlternateName.contains(StringRef(LengthAndName));
+}
+
 GsymCreator::GsymCreator(bool Quiet)
     : StrTab(StringTableBuilder::ELF), Quiet(Quiet) {
   insertFile(StringRef());
@@ -180,14 +209,24 @@ llvm::Error GsymCreator::finalize(OutputAggregator &Out) {
         if (ranges_equal || Prev.Range.intersects(Curr.Range)) {
           // Overlapping ranges or empty identical ranges.
           if (ranges_equal) {
-            // Same address range. Check if one is from debug
-            // info and the other is from a symbol table. If
-            // so, then keep the one with debug info. Our
-            // sorting guarantees that entries with matching
-            // address ranges that have debug info are last in
-            // the sort.
-            if (!(Prev == Curr)) {
-              if (Prev.hasRichInfo() && Curr.hasRichInfo())
+            // Same address range. The sort orders entries with more debug info
+            // last, so when exactly one entry has rich info, Prev is the
+            // non-rich (typically symbol-table) entry and Curr is the rich
+            // (typically DWARF) one. DWARF often truncates a function's
+            // linkage name to its short form, so before dropping the non-rich
+            // entry check whether its name is a more complete mangled
+            // (Itanium or Swift) form of the rich entry's name and, if so,
+            // copy it onto the rich entry. This lets downstream tools
+            // demangle the full signature.
+            const bool PrevRich = Prev.hasRichInfo();
+            const bool CurrRich = Curr.hasRichInfo();
+            if (PrevRich != CurrRich) {
+              if (shouldReplaceWithMangledName(getString(Prev.Name),
+                                               getString(Curr.Name)))
+                Curr.Name = Prev.Name;
+              std::swap(Prev, Curr);
+            } else if (Prev != Curr) {
+              if (PrevRich)
                 Out.Report(
                     "Duplicate address ranges with different debug info.",
                     [&](raw_ostream &OS) {
@@ -197,10 +236,6 @@ llvm::Error GsymCreator::finalize(OutputAggregator &Out) {
                          << Prev << "\nIn favor of this one:\n"
                          << Curr << "\n";
                     });
-
-              // We want to swap the current entry with the previous since
-              // later entries with the same range always have more debug info
-              // or different debug info.
               std::swap(Prev, Curr);
             }
           } else {
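
The name check in this hunk can be tried outside LLVM. The following is a minimal sketch, assuming C++17 and using `std::string_view` in place of `llvm::StringRef`; the names `hasMangledPrefix` and `shouldUseAlternate` are made up for this illustration and are not part of the commit. It mirrors the three steps of `shouldReplaceWithMangledName`: accept any name over an empty one, refuse when the current name is already mangled or the alternate is not, then look for the `<length><identifier>` token.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical equivalent of isSupportedMangledPrefix (Itanium "_Z", Swift "$s"/"$S").
inline bool hasMangledPrefix(std::string_view Name) {
  return Name.rfind("_Z", 0) == 0 || Name.rfind("$s", 0) == 0 ||
         Name.rfind("$S", 0) == 0;
}

// Hypothetical equivalent of shouldReplaceWithMangledName.
inline bool shouldUseAlternate(std::string_view Alternate,
                               std::string_view Current) {
  // Any name is better than no name.
  if (Current.empty() && !Alternate.empty())
    return true;
  // Keep Current if it is already mangled or Alternate is not mangled.
  if (hasMangledPrefix(Current) || !hasMangledPrefix(Alternate))
    return false;
  // Build "<length><identifier>", e.g. "10make_ftype", and search for it.
  std::string Token = std::to_string(Current.size());
  Token.append(Current);
  return Alternate.find(Token) != std::string_view::npos;
}
```

With the names from the commit message, `shouldUseAlternate("_Z10make_ftypePci", "make_ftype")` matches on the token `10make_ftype`, while `unrelated_func` produces the token `14unrelated_func`, which does not occur in the mangled name. The check is purely textual, so it trades the precision of a demangler for speed.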
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+## Test that same-range dedup keeps the DWARF line table while replacing a
+## shortened DWARF function name with the full Itanium symbol-table name.
+
+# RUN: yaml2obj %s -o %t
+# RUN: llvm-gsymutil --convert %t -o %t.gsym 2>&1 | FileCheck %s --check-prefix=CONVERT
+# RUN: llvm-gsymutil %t.gsym 2>&1 | FileCheck %s --check-prefix=DUMP
+
+# CONVERT: Loaded 1 functions from DWARF.
+# CONVERT: Loaded 1 functions from symbol table.
+# CONVERT: Pruned 1 functions, ended with 1 total
+
+# DUMP: "_Z10make_ftypePci"
+# DUMP: LineTable:
+# DUMP: main.cpp:10
+# DUMP: main.cpp:11
+
+--- !ELF
+FileHeader:
+  Class: ELFCLASS64
+  Data: ELFDATA2LSB
+  Type: ET_EXEC
+  Machine: EM_X86_64
+Sections:
+  - Name: .text
+    Type: SHT_PROGBITS
+    Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address: 0x0000000000401000
+    AddressAlign: 0x10
+    Content: 554889E531C05DC3554889E531C05DC3
+DWARF:
+  debug_str:
+    - ''
+    - main.cpp
+    - make_ftype
+  debug_abbrev:
+    - ID: 0
+      Table:
+        - Code: 0x1
+          Tag: DW_TAG_compile_unit
+          Children: DW_CHILDREN_yes
+          Attributes:
+            - Attribute: DW_AT_name
+              Form: DW_FORM_strp
+            - Attribute: DW_AT_language
+              Form: DW_FORM_udata
+            - Attribute: DW_AT_stmt_list
+              Form: DW_FORM_sec_offset
+        - Code: 0x2
+          Tag: DW_TAG_subprogram
+          Children: DW_CHILDREN_no
+          Attributes:
+            - Attribute: DW_AT_name
+              Form: DW_FORM_strp
+            - Attribute: DW_AT_low_pc
+              Form: DW_FORM_addr
+            - Attribute: DW_AT_high_pc
+              Form: DW_FORM_addr
+  debug_info:
+    - Length: 0x27
+      Version: 4
+      AbbrevTableID: 0
+      AbbrOffset: 0x0
+      AddrSize: 8
+      Entries:
+        - AbbrCode: 0x1
+          Values:
+            - Value: 0x1
+            - Value: 0x2
+            - Value: 0x0
+        - AbbrCode: 0x2
+          Values:
+            - Value: 0xA
+            - Value: 0x401000
+            - Value: 0x401010
+        - AbbrCode: 0x0
+  debug_line:
+    - Length: 61
+      Version: 2
+      PrologueLength: 31
+      MinInstLength: 1
+      DefaultIsStmt: 1
+      LineBase: 251
+      LineRange: 14
+      OpcodeBase: 13
+      StandardOpcodeLengths: [ 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 ]
+      Files:
+        - Name: main.cpp
+          DirIdx: 0
+          ModTime: 0
+          Length: 0
+      Opcodes:
+        - Opcode: DW_LNS_extended_op
+          ExtLen: 9
+          SubOpcode: DW_LNE_set_address
+          Data: 4198400
+        - Opcode: DW_LNS_advance_line
+          SData: 9
+          Data: 0
+        - Opcode: DW_LNS_copy
+          Data: 0
+        - Opcode: DW_LNS_advance_pc
+          Data: 8
+        - Opcode: DW_LNS_advance_line
+          SData: 1
+          Data: 0
+        - Opcode: DW_LNS_copy
+          Data: 0
+        - Opcode: DW_LNS_advance_pc
+          Data: 8
+        - Opcode: DW_LNS_extended_op
+          ExtLen: 1
+          SubOpcode: DW_LNE_end_sequence
+          Data: 0
+ProgramHeaders:
+  - Type: PT_LOAD
+    Flags: [ PF_X, PF_R ]
+    VAddr: 0x0000000000400000
+    Align: 0x1000
+    FirstSec: .text
+    LastSec: .text
+Symbols:
+  - Name: _Z10make_ftypePci
+    Type: STT_FUNC
+    Section: .text
+    Binding: STB_GLOBAL
+    Value: 0x0000000000401000
+    Size: 0x0000000000000010
+...
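
The `DW_AT_name` values in `debug_info` use `DW_FORM_strp`, so they are byte offsets into `debug_str`. The sketch below shows where `0x1` and `0xA` come from, assuming the strings are stored back to back with one terminating NUL each. `stringOffsets` is a hypothetical helper written for this illustration, not an LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the byte offset of each string in a table where entries are laid
// out consecutively and each one is followed by a single NUL byte.
inline std::vector<std::size_t>
stringOffsets(const std::vector<std::string> &Strings) {
  std::vector<std::size_t> Offsets;
  std::size_t Next = 0;
  for (const std::string &S : Strings) {
    Offsets.push_back(Next);
    Next += S.size() + 1; // +1 for the terminating NUL
  }
  return Offsets;
}
```

For the table `''`, `main.cpp`, `make_ftype`, the offsets are 0, 1, and 10 (`0xA`), which matches the compile unit name `0x1` and the subprogram name `0xA` in the test input.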
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+## Test that same-range dedup keeps the DWARF line table while replacing a
+## shortened Swift DWARF function name with the full Swift symbol-table name.
+
+# RUN: yaml2obj %s -o %t
+# RUN: llvm-gsymutil --convert %t -o %t.gsym 2>&1 | FileCheck %s --check-prefix=CONVERT
+# RUN: llvm-gsymutil %t.gsym 2>&1 | FileCheck %s --check-prefix=DUMP
+
+# CONVERT: Loaded 1 functions from DWARF.
+# CONVERT: Loaded 1 functions from symbol table.
+# CONVERT: Pruned 1 functions, ended with 1 total
+
+# DUMP: "$s4main10make_ftypeyyF"
+# DUMP: LineTable:
+# DUMP: main.swift:10
+# DUMP: main.swift:11
+
+--- !ELF
+FileHeader:
+  Class: ELFCLASS64
+  Data: ELFDATA2LSB
+  Type: ET_EXEC
+  Machine: EM_X86_64
+Sections:
+  - Name: .text
+    Type: SHT_PROGBITS
+    Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address: 0x0000000000401000
+    AddressAlign: 0x10
+    Content: 554889E531C05DC3554889E531C05DC3
+DWARF:
+  debug_str:
+    - ''
+    - main.swift
+    - make_ftype
+  debug_abbrev:
+    - ID: 0
+      Table:
+        - Code: 0x1
+          Tag: DW_TAG_compile_unit
+          Children: DW_CHILDREN_yes
+          Attributes:
+            - Attribute: DW_AT_name
+              Form: DW_FORM_strp
+            - Attribute: DW_AT_language
+              Form: DW_FORM_udata
+            - Attribute: DW_AT_stmt_list
+              Form: DW_FORM_sec_offset
+        - Code: 0x2
+          Tag: DW_TAG_subprogram
+          Children: DW_CHILDREN_no
+          Attributes:
+            - Attribute: DW_AT_name
+              Form: DW_FORM_strp
+            - Attribute: DW_AT_low_pc
+              Form: DW_FORM_addr
+            - Attribute: DW_AT_high_pc
+              Form: DW_FORM_addr
+  debug_info:
+    - Length: 0x27
+      Version: 4
+      AbbrevTableID: 0
+      AbbrOffset: 0x0
+      AddrSize: 8
+      Entries:
+        - AbbrCode: 0x1
+          Values:
+            - Value: 0x1
+            - Value: 0x1E
+            - Value: 0x0
+        - AbbrCode: 0x2
+          Values:
+            - Value: 0xC
+            - Value: 0x401000
+            - Value: 0x401010
+        - AbbrCode: 0x0
+  debug_line:
+    - Length: 63
+      Version: 2
+      PrologueLength: 33
+      MinInstLength: 1
+      DefaultIsStmt: 1
+      LineBase: 251
+      LineRange: 14
+      OpcodeBase: 13
+      StandardOpcodeLengths: [ 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1 ]
+      Files:
+        - Name: main.swift
+          DirIdx: 0
+          ModTime: 0
+          Length: 0
+      Opcodes:
+        - Opcode: DW_LNS_extended_op
+          ExtLen: 9
+          SubOpcode: DW_LNE_set_address
+          Data: 4198400
+        - Opcode: DW_LNS_advance_line
+          SData: 9
+          Data: 0
+        - Opcode: DW_LNS_copy
+          Data: 0
+        - Opcode: DW_LNS_advance_pc
+          Data: 8
+        - Opcode: DW_LNS_advance_line
+          SData: 1
+          Data: 0
+        - Opcode: DW_LNS_copy
+          Data: 0
+        - Opcode: DW_LNS_advance_pc
+          Data: 8
+        - Opcode: DW_LNS_extended_op
+          ExtLen: 1
+          SubOpcode: DW_LNE_end_sequence
+          Data: 0
+ProgramHeaders:
+  - Type: PT_LOAD
+    Flags: [ PF_X, PF_R ]
+    VAddr: 0x0000000000400000
+    Align: 0x1000
+    FirstSec: .text
+    LastSec: .text
+Symbols:
+  - Name: '$s4main10make_ftypeyyF'
+    Type: STT_FUNC
+    Section: .text
+    Binding: STB_GLOBAL
+    Value: 0x0000000000401000
+    Size: 0x0000000000000010
+...
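
The `DUMP` checks for lines 10 and 11 follow from the `debug_line` opcodes in the test input. The sketch below is a deliberately tiny model of a DWARF line-number program covering only the opcodes that appear there. `Op`, `Instr`, `Row`, and `runLineProgram` are hypothetical names for this illustration, and the model assumes `MinInstLength` is 1 (as in the test), so `DW_LNS_advance_pc` adds its operand directly to the address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Only the opcodes used by the test input (assumption: nothing else needed).
enum class Op { SetAddress, AdvanceLine, AdvancePc, Copy, EndSequence };

struct Instr {
  Op Code;
  std::int64_t Operand; // ignored for Copy and EndSequence
};

struct Row {
  std::uint64_t Address;
  std::int64_t Line;
  bool EndSequence;
};

// Runs the program and returns the emitted line-table rows.
inline std::vector<Row> runLineProgram(const std::vector<Instr> &Program) {
  std::vector<Row> Rows;
  std::uint64_t Address = 0;
  std::int64_t Line = 1; // the DWARF line register starts at 1
  for (const Instr &I : Program) {
    switch (I.Code) {
    case Op::SetAddress:
      Address = static_cast<std::uint64_t>(I.Operand);
      break;
    case Op::AdvanceLine:
      Line += I.Operand;
      break;
    case Op::AdvancePc:
      Address += static_cast<std::uint64_t>(I.Operand);
      break;
    case Op::Copy:
      Rows.push_back({Address, Line, false});
      break;
    case Op::EndSequence:
      Rows.push_back({Address, Line, true});
      return Rows;
    }
  }
  return Rows;
}
```

Feeding it the sequence from the YAML (set address 4198400, which is `0x401000`; advance line by 9; copy; advance pc by 8; advance line by 1; copy; advance pc by 8; end sequence) yields a row for line 10 at `0x401000`, a row for line 11 at `0x401008`, and an end-of-sequence row at `0x401010`, the function's `DW_AT_high_pc`.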
