[clang][DebugInfo] Add symbol for debugger with VTable information. #130255

Open: CarlosAlbertoEnciso wants to merge 3 commits into base: main

CarlosAlbertoEnciso
Member

The IR now includes a global variable for the debugger that holds
the address of the vtable.

Every class that contains virtual functions now has a static
member (marked as artificial) that identifies where that vtable
is loaded in memory. The unmangled name is '_vtable$'.

This new symbol will allow a debugger to easily associate
classes with the physical location of their VTables using
only the DWARF information. Previously, this had to be done
by searching for ELF symbols with matching names; something
that was time-consuming and error-prone in certain edge cases.

CarlosAlbertoEnciso added the clang, lldb, and debuginfo labels on Mar 7, 2025
CarlosAlbertoEnciso self-assigned this on Mar 7, 2025
llvmbot added the clang:modules and clang:codegen labels on Mar 7, 2025

llvmbot commented Mar 7, 2025

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-clang-modules
@llvm/pr-subscribers-lldb
@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-clang

Author: Carlos Alberto Enciso (CarlosAlbertoEnciso)

Changes


Patch is 44.76 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/130255.diff

16 Files Affected:

  • (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+53)
  • (modified) clang/lib/CodeGen/CGDebugInfo.h (+3)
  • (modified) clang/lib/CodeGen/ItaniumCXXABI.cpp (+4)
  • (added) clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-base.cpp (+14)
  • (added) clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-base.h (+15)
  • (added) clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-derived.cpp (+13)
  • (added) clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-derived.h (+14)
  • (modified) clang/test/CodeGenCXX/debug-info-class.cpp (+14-12)
  • (modified) clang/test/CodeGenCXX/debug-info-template-member.cpp (+26-26)
  • (added) clang/test/CodeGenCXX/vtable-debug-info-inheritance-diamond.cpp (+87)
  • (added) clang/test/CodeGenCXX/vtable-debug-info-inheritance-multiple.cpp (+72)
  • (added) clang/test/CodeGenCXX/vtable-debug-info-inheritance-simple-main.cpp (+87)
  • (added) clang/test/CodeGenCXX/vtable-debug-info-inheritance-simple.cpp (+55)
  • (added) clang/test/CodeGenCXX/vtable-debug-info-inheritance-virtual.cpp (+87)
  • (modified) clang/test/Modules/ExtDebugInfo.cpp (+5-5)
  • (added) llvm/test/DebugInfo/X86/vtable-debug-info-inheritance-simple.ll (+206)
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 0e6daa42ee7bf..9cadeadc54111 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -2518,6 +2518,59 @@ StringRef CGDebugInfo::getVTableName(const CXXRecordDecl *RD) {
   return internString("_vptr$", RD->getNameAsString());
 }
 
+// Emit symbol for the debugger that points to the vtable address for
+// the given class. The symbol is named as '_vtable$'.
+// The debugger does not need to know any details about the contents of the
+// vtable as it can work this out using its knowledge of the ABI and the
+// existing information in the DWARF. The type is assumed to be 'void *'.
+void CGDebugInfo::emitVTableSymbol(llvm::GlobalVariable *VTable,
+                                   const CXXRecordDecl *RD) {
+  ASTContext &Context = CGM.getContext();
+  SmallString<64> Buffer;
+  Twine SymbolName = internString("_vtable$");
+  StringRef SymbolNameRef = SymbolName.toStringRef(Buffer);
+  DeclContext *DC = static_cast<DeclContext *>(const_cast<CXXRecordDecl *>(RD));
+  SourceLocation Loc;
+  QualType VoidPtr = Context.getPointerType(Context.VoidTy);
+
+  // We deal with two different contexts:
+  // - The type for the variable, which is part of the class that has the
+  //   vtable, is placed in the context of the DICompositeType metadata.
+  // - The DIGlobalVariable for the vtable is put in the DICompileUnitScope.
+
+  // The created non-member should be mark as 'artificial'. It will be
+  // placed it inside the scope of the C++ class/structure.
+  llvm::DIScope *DContext = getContextDescriptor(cast<Decl>(DC), TheCU);
+  auto *Ctxt = cast<llvm::DICompositeType>(DContext);
+  llvm::DIFile *Unit = getOrCreateFile(Loc);
+  llvm::DIType *VTy = getOrCreateType(VoidPtr, Unit);
+  llvm::DINode::DIFlags Flags = getAccessFlag(AccessSpecifier::AS_private, RD);
+  auto Tag = CGM.getCodeGenOpts().DwarfVersion >= 5
+                 ? llvm::dwarf::DW_TAG_variable
+                 : llvm::dwarf::DW_TAG_member;
+  llvm::DIDerivedType *OldDT = DBuilder.createStaticMemberType(
+      Ctxt, SymbolNameRef, Unit, /*LineNumber=*/0, VTy, Flags,
+      /*Val=*/nullptr, Tag);
+  llvm::DIDerivedType *DT =
+      static_cast<llvm::DIDerivedType *>(DBuilder.createArtificialType(OldDT));
+
+  // Use the same vtable pointer to global alignment for the symbol.
+  LangAS AS = CGM.GetGlobalVarAddressSpace(nullptr);
+  unsigned PAlign = CGM.getItaniumVTableContext().isRelativeLayout()
+                        ? 32
+                        : CGM.getTarget().getPointerAlign(AS);
+
+  // The global variable is in the CU scope, and links back to the type it's
+  // "within" via the declaration field.
+  llvm::DIGlobalVariableExpression *GVE =
+      DBuilder.createGlobalVariableExpression(
+          TheCU, SymbolNameRef, VTable->getName(), Unit, /*LineNo=*/0,
+          getOrCreateType(VoidPtr, Unit), VTable->hasLocalLinkage(),
+          /*isDefined=*/true, nullptr, DT, /*TemplateParameters=*/nullptr,
+          PAlign);
+  VTable->addDebugInfo(GVE);
+}
+
 StringRef CGDebugInfo::getDynamicInitializerName(const VarDecl *VD,
                                                  DynamicInitKind StubKind,
                                                  llvm::Function *InitFn) {
diff --git a/clang/lib/CodeGen/CGDebugInfo.h b/clang/lib/CodeGen/CGDebugInfo.h
index 38f73eca561b7..9cbc61de99a7e 100644
--- a/clang/lib/CodeGen/CGDebugInfo.h
+++ b/clang/lib/CodeGen/CGDebugInfo.h
@@ -636,6 +636,9 @@ class CGDebugInfo {
                                                 StringRef Category,
                                                 StringRef FailureMsg);
 
+  /// Emit symbol for debugger that holds the pointer to the vtable.
+  void emitVTableSymbol(llvm::GlobalVariable *VTable, const CXXRecordDecl *RD);
+
 private:
   /// Emit call to llvm.dbg.declare for a variable declaration.
   /// Returns a pointer to the DILocalVariable associated with the
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index b145da0f0ec09..1e6245387c576 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -2059,6 +2059,10 @@ void ItaniumCXXABI::emitVTableDefinitions(CodeGenVTables &CGVT,
     if (!VTable->isDSOLocal())
       CGVT.GenerateRelativeVTableAlias(VTable, VTable->getName());
   }
+
+  // Emit symbol for debugger only if requested debug info.
+  if (CGDebugInfo *DI = CGM.getModuleDebugInfo())
+    DI->emitVTableSymbol(VTable, RD);
 }
 
 bool ItaniumCXXABI::isVirtualOffsetNeededForVTableField(
diff --git a/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-base.cpp b/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-base.cpp
new file mode 100644
index 0000000000000..ffdfce56aeadc
--- /dev/null
+++ b/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-base.cpp
@@ -0,0 +1,14 @@
+#include "vtable-debug-info-inheritance-simple-base.h"
+
+void NSP::CBase::zero() {}
+int NSP::CBase::one() { return 1; }
+int NSP::CBase::two() { return 2; };
+int NSP::CBase::three() { return 3; }
+
+#ifdef SYMBOL_AT_FILE_SCOPE
+static NSP::CBase Base;
+#else
+void fooBase() {
+  NSP::CBase Base;
+}
+#endif
diff --git a/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-base.h b/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-base.h
new file mode 100644
index 0000000000000..1522419329e1d
--- /dev/null
+++ b/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-base.h
@@ -0,0 +1,15 @@
+#ifndef BASE_H
+#define BASE_H
+
+namespace NSP {
+  struct CBase {
+    unsigned B = 1;
+    virtual void zero();
+    virtual int one();
+    virtual int two();
+    virtual int three();
+  };
+}
+
+extern void fooBase();
+#endif
diff --git a/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-derived.cpp b/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-derived.cpp
new file mode 100644
index 0000000000000..cfc555aa6a485
--- /dev/null
+++ b/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-derived.cpp
@@ -0,0 +1,13 @@
+#include "vtable-debug-info-inheritance-simple-derived.h"
+
+void CDerived::zero() {}
+int CDerived::two() { return 22; };
+int CDerived::three() { return 33; }
+
+#ifdef SYMBOL_AT_FILE_SCOPE
+static CDerived Derived;
+#else
+void fooDerived() {
+  CDerived Derived;
+}
+#endif
diff --git a/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-derived.h b/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-derived.h
new file mode 100644
index 0000000000000..c5a8854b41eac
--- /dev/null
+++ b/clang/test/CodeGenCXX/Inputs/vtable-debug-info-inheritance-simple-derived.h
@@ -0,0 +1,14 @@
+#include "vtable-debug-info-inheritance-simple-base.h"
+
+#ifndef DERIVED_H
+#define DERIVED_H
+
+struct CDerived : NSP::CBase {
+  unsigned D = 2;
+  void zero() override;
+  int two() override;
+  int three() override;
+};
+
+extern void fooDerived();
+#endif
diff --git a/clang/test/CodeGenCXX/debug-info-class.cpp b/clang/test/CodeGenCXX/debug-info-class.cpp
index 8d610ca68a9d4..0bc4fdaa565c3 100644
--- a/clang/test/CodeGenCXX/debug-info-class.cpp
+++ b/clang/test/CodeGenCXX/debug-info-class.cpp
@@ -122,14 +122,6 @@ int main(int argc, char **argv) {
 // CHECK-SAME:             ){{$}}
 
 // CHECK:      ![[INT:[0-9]+]] = !DIBasicType(name: "int"
-// CHECK: !DICompositeType(tag: DW_TAG_structure_type, name: "foo"
-// CHECK: !DICompositeType(tag: DW_TAG_class_type, name: "bar"
-// CHECK: !DICompositeType(tag: DW_TAG_union_type, name: "baz"
-// CHECK: !DICompositeType(tag: DW_TAG_class_type, name: "B"
-// CHECK-NOT:              DIFlagFwdDecl
-// CHECK-SAME:             ){{$}}
-// CHECK: !DIDerivedType(tag: DW_TAG_member, name: "_vptr$B",
-// CHECK-SAME:           DIFlagArtificial
 
 // CHECK: [[C:![0-9]*]] = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "C",
 // CHECK-NOT:                              DIFlagFwdDecl
@@ -145,6 +137,20 @@ int main(int argc, char **argv) {
 // CHECK-SAME:                     DIFlagStaticMember
 // CHECK: [[C_DTOR]] = !DISubprogram(name: "~C"
 
+// CHECK: !DICompositeType(tag: DW_TAG_structure_type, name: "K"
+// CHECK-SAME:             identifier: "_ZTS1K"
+// CHECK-SAME:             ){{$}}
+
+// CHECK: !DICompositeType(tag: DW_TAG_class_type, name: "B"
+// CHECK-NOT:              DIFlagFwdDecl
+// CHECK-SAME:             ){{$}}
+// CHECK: !DIDerivedType(tag: DW_TAG_member, name: "_vptr$B",
+// CHECK-SAME:           DIFlagArtificial
+
+// CHECK: !DICompositeType(tag: DW_TAG_structure_type, name: "foo"
+// CHECK: !DICompositeType(tag: DW_TAG_class_type, name: "bar"
+// CHECK: !DICompositeType(tag: DW_TAG_union_type, name: "baz"
+
 // CHECK: [[D:![0-9]+]] = !DICompositeType(tag: DW_TAG_structure_type, name: "D"
 // CHECK-SAME:             size:
 // CHECK-SAME:             DIFlagFwdDecl
@@ -156,10 +162,6 @@ int main(int argc, char **argv) {
 // CHECK-NOT:              identifier:
 // CHECK-SAME:             ){{$}}
 
-// CHECK: !DICompositeType(tag: DW_TAG_structure_type, name: "K"
-// CHECK-SAME:             identifier: "_ZTS1K"
-// CHECK-SAME:             ){{$}}
-
 // CHECK: [[L:![0-9]+]] = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "L"
 // CHECK-SAME:             ){{$}}
 // CHECK: [[L_FUNC_DECL:![0-9]*]] = !DISubprogram(name: "func",{{.*}} scope: [[L]]
diff --git a/clang/test/CodeGenCXX/debug-info-template-member.cpp b/clang/test/CodeGenCXX/debug-info-template-member.cpp
index 66d9ba5ebc9b4..bb947c2ad4981 100644
--- a/clang/test/CodeGenCXX/debug-info-template-member.cpp
+++ b/clang/test/CodeGenCXX/debug-info-template-member.cpp
@@ -22,29 +22,6 @@ inline int add3(int x) {
 // CHECK: [[X]] = !DIGlobalVariableExpression(var: [[XV:.*]], expr: !DIExpression())
 // CHECK: [[XV]] = distinct !DIGlobalVariable(name: "x",
 // CHECK-SAME:                                type: ![[OUTER_FOO_INNER_ID:[0-9]+]]
-//
-// CHECK: {{![0-9]+}} = distinct !DIGlobalVariable(
-// CHECK-SAME: name: "var"
-// CHECK-SAME: templateParams: {{![0-9]+}}
-// CHECK: !DITemplateTypeParameter(name: "T", type: [[TY:![0-9]+]])
-// CHECK: {{![0-9]+}} = distinct !DIGlobalVariable(
-// CHECK-SAME: name: "var"
-// CHECK-SAME: templateParams: {{![0-9]+}}
-// CHECK: !DITemplateTypeParameter(name: "T", type: {{![0-9]+}})
-// CHECK: {{![0-9]+}} = distinct !DIGlobalVariable(
-// CHECK-SAME: name: "varray"
-// CHECK-SAME: templateParams: {{![0-9]+}}
-// CHECK: !DITemplateValueParameter(name: "N", type: [[TY]], value: i32 1)
-
-// CHECK: ![[OUTER_FOO_INNER_ID:[0-9]*]] = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "inner"{{.*}}, identifier:
-// CHECK: !DICompositeType(tag: DW_TAG_structure_type, name: "foo"
-// CHECK-SAME:             elements: [[FOO_MEM:![0-9]*]]
-// CHECK-SAME:             identifier: "_ZTS3foo"
-// CHECK: [[FOO_MEM]] = !{[[FOO_FUNC:![0-9]*]]}
-// CHECK: [[FOO_FUNC]] = !DISubprogram(name: "func", linkageName: "_ZN3foo4funcEN5outerIS_E5innerE",
-// CHECK-SAME:                         type: [[FOO_FUNC_TYPE:![0-9]*]]
-// CHECK: [[FOO_FUNC_TYPE]] = !DISubroutineType(types: [[FOO_FUNC_PARAMS:![0-9]*]])
-// CHECK: [[FOO_FUNC_PARAMS]] = !{null, !{{[0-9]*}}, ![[OUTER_FOO_INNER_ID]]}
 
 // CHECK: [[C:![0-9]*]] = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "MyClass"
 // CHECK-SAME:                             elements: [[C_MEM:![0-9]*]]
@@ -55,9 +32,6 @@ inline int add3(int x) {
 
 // CHECK: [[C_FUNC]] = !DISubprogram(name: "func",{{.*}} line: 9,
 
-// CHECK: !DISubprogram(name: "add<2>"
-// CHECK-SAME:          scope: [[C]]
-//
 // CHECK: [[VIRT_TEMP:![0-9]+]] = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "virt<elem>"
 // CHECK-SAME:             elements: [[VIRT_MEM:![0-9]*]]
 // CHECK-SAME:             vtableHolder: [[VIRT_TEMP]]
@@ -74,6 +48,32 @@ inline int add3(int x) {
 // CHECK: [[VIRT_TEMP_PARAM]] = !{[[VIRT_T:![0-9]*]]}
 // CHECK: [[VIRT_T]] = !DITemplateTypeParameter(name: "T", type: [[ELEM]])
 
+// CHECK: {{![0-9]+}} = distinct !DIGlobalVariable(
+// CHECK-SAME: name: "var"
+// CHECK-SAME: templateParams: {{![0-9]+}}
+// CHECK: !DITemplateTypeParameter(name: "T", type: [[TY:![0-9]+]])
+// CHECK: {{![0-9]+}} = distinct !DIGlobalVariable(
+// CHECK-SAME: name: "var"
+// CHECK-SAME: templateParams: {{![0-9]+}}
+// CHECK: !DITemplateTypeParameter(name: "T", type: {{![0-9]+}})
+// CHECK: {{![0-9]+}} = distinct !DIGlobalVariable(
+// CHECK-SAME: name: "varray"
+// CHECK-SAME: templateParams: {{![0-9]+}}
+// CHECK: !DITemplateValueParameter(name: "N", type: [[TY]], value: i32 1)
+
+// CHECK: ![[OUTER_FOO_INNER_ID:[0-9]*]] = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "inner"{{.*}}, identifier:
+// CHECK: !DICompositeType(tag: DW_TAG_structure_type, name: "foo"
+// CHECK-SAME:             elements: [[FOO_MEM:![0-9]*]]
+// CHECK-SAME:             identifier: "_ZTS3foo"
+// CHECK: [[FOO_MEM]] = !{[[FOO_FUNC:![0-9]*]]}
+// CHECK: [[FOO_FUNC]] = !DISubprogram(name: "func", linkageName: "_ZN3foo4funcEN5outerIS_E5innerE",
+// CHECK-SAME:                         type: [[FOO_FUNC_TYPE:![0-9]*]]
+// CHECK: [[FOO_FUNC_TYPE]] = !DISubroutineType(types: [[FOO_FUNC_PARAMS:![0-9]*]])
+// CHECK: [[FOO_FUNC_PARAMS]] = !{null, !{{[0-9]*}}, ![[OUTER_FOO_INNER_ID]]}
+
+// CHECK: !DISubprogram(name: "add<2>"
+// CHECK-SAME:          scope: [[C]]
+
 template<typename T>
 struct outer {
   struct inner {
diff --git a/clang/test/CodeGenCXX/vtable-debug-info-inheritance-diamond.cpp b/clang/test/CodeGenCXX/vtable-debug-info-inheritance-diamond.cpp
new file mode 100644
index 0000000000000..5ed1353eebb10
--- /dev/null
+++ b/clang/test/CodeGenCXX/vtable-debug-info-inheritance-diamond.cpp
@@ -0,0 +1,87 @@
+// REQUIRES: target={{x86_64.*-linux.*}}
+
+// Diamond inheritance case:
+// For CBase, CLeft, CRight and CDerived we check:
+// - Generation of their vtables (including attributes).
+// - Generation of their '_vtable$' data members:
+//   * Correct scope and attributes
+
+namespace NSP {
+  struct CBase {
+    int B = 0;
+    virtual char fooBase() { return 'B'; }
+  };
+}
+
+namespace NSP_1 {
+  struct CLeft : NSP::CBase {
+    int M1 = 1;
+    char fooBase() override { return 'O'; };
+    virtual int fooLeft() { return 1; }
+  };
+}
+
+namespace NSP_2 {
+  struct CRight : NSP::CBase {
+    int M2 = 2;
+    char fooBase() override { return 'T'; };
+    virtual int fooRight() { return 2; }
+  };
+}
+
+struct CDerived : NSP_1::CLeft, NSP_2::CRight {
+  int D = 3;
+  char fooBase() override { return 'D'; };
+  int fooDerived() { return 3; };
+};
+
+int main() {
+  NSP::CBase Base;
+  NSP_1::CLeft Left;
+  NSP_2::CRight Right;
+  CDerived Derived;
+
+  return 0;
+}
+
+// RUN: %clang --target=x86_64-linux -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -emit-llvm -S -g %s -o - | FileCheck %s
+
+// CHECK: $_ZTVN3NSP5CBaseE = comdat any
+// CHECK: $_ZTVN5NSP_15CLeftE = comdat any
+// CHECK: $_ZTVN5NSP_26CRightE = comdat any
+// CHECK: $_ZTV8CDerived = comdat any
+
+// CHECK: @_ZTVN3NSP5CBaseE = linkonce_odr {{dso_local|hidden}} unnamed_addr constant {{.*}}, comdat, align 8, !dbg [[BASE_VTABLE_VAR:![0-9]*]]
+// CHECK: @_ZTVN5NSP_15CLeftE = linkonce_odr {{dso_local|hidden}} unnamed_addr constant {{.*}}, comdat, align 8, !dbg [[LEFT_VTABLE_VAR:![0-9]*]]
+// CHECK: @_ZTVN5NSP_26CRightE = linkonce_odr {{dso_local|hidden}} unnamed_addr constant {{.*}}, comdat, align 8, !dbg [[RIGHT_VTABLE_VAR:![0-9]*]]
+// CHECK: @_ZTV8CDerived = linkonce_odr {{dso_local|hidden}} unnamed_addr constant {{.*}}, comdat, align 8, !dbg [[DERIVED_VTABLE_VAR:![0-9]*]]
+
+// CHECK: [[BASE_VTABLE_VAR]] = !DIGlobalVariableExpression(var: [[BASE_VTABLE:![0-9]*]], expr: !DIExpression())
+// CHECK-NEXT: [[BASE_VTABLE]] = distinct !DIGlobalVariable(name: "_vtable$", linkageName: "_ZTVN3NSP5CBaseE"
+
+// CHECK: [[LEFT_VTABLE_VAR]] = !DIGlobalVariableExpression(var: [[LEFT_VTABLE:![0-9]*]], expr: !DIExpression())
+// CHECK-NEXT: [[LEFT_VTABLE]] = distinct !DIGlobalVariable(name: "_vtable$", linkageName: "_ZTVN5NSP_15CLeftE"
+
+// CHECK: [[TYPE:![0-9]*]] = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null, size: 64)
+
+// CHECK: !DIDerivedType(tag: DW_TAG_variable, name: "_vtable$", scope: [[LEFT:![0-9]*]], file: {{.*}}, baseType: [[TYPE]], flags: DIFlagPrivate | DIFlagArtificial | DIFlagStaticMember)
+
+// CHECK: [[LEFT]] = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "CLeft"
+
+// CHECK: [[BASE:![0-9]*]] = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "CBase"
+
+// CHECK: [[RIGHT_VTABLE_VAR]] = !DIGlobalVariableExpression(var: [[RIGHT_VTABLE:![0-9]*]], expr: !DIExpression())
+// CHECK-NEXT: [[RIGHT_VTABLE]] = distinct !DIGlobalVariable(name: "_vtable$", linkageName: "_ZTVN5NSP_26CRightE"
+
+// CHECK: !DIDerivedType(tag: DW_TAG_variable, name: "_vtable$", scope: [[RIGHT:![0-9]*]], file: {{.*}}, baseType: [[TYPE]], flags: DIFlagPrivate | DIFlagArtificial | DIFlagStaticMember)
+
+// CHECK: [[RIGHT]] = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "CRight"
+
+// CHECK: [[DERIVED_VTABLE_VAR]] = !DIGlobalVariableExpression(var: [[DERIVED_VTABLE:![0-9]*]], expr: !DIExpression())
+// CHECK-NEXT: [[DERIVED_VTABLE]] = distinct !DIGlobalVariable(name: "_vtable$", linkageName: "_ZTV8CDerived"
+
+// CHECK: !DIDerivedType(tag: DW_TAG_variable, name: "_vtable$", scope: [[DERIVED:![0-9]*]], file: {{.*}}, baseType: [[TYPE]], flags: DIFlagPrivate | DIFlagArtificial | DIFlagStaticMember)
+
+// CHECK: [[DERIVED]] = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "CDerived"
+
+// CHECK: !DIDerivedType(tag: DW_TAG_variable, name: "_vtable$", scope: [[BASE]], file: {{.*}}, baseType: [[TYPE]], flags: DIFlagPrivate | DIFlagArtificial | DIFlagStaticMember)
diff --git a/clang/test/CodeGenCXX/vtable-debug-info-inheritance-multiple.cpp b/clang/test/CodeGenCXX/vtable-debug-info-inheritance-multiple.cpp
new file mode 100644
index 0000000000000..23973a35d0e17
--- /dev/null
+++ b/clang/test/CodeGenCXX/vtable-debug-info-inheritance-multiple.cpp
@@ -0,0 +1,72 @@
+// REQUIRES: target={{x86_64.*-linux.*}}
+
+// Multiple inheritance case:
+// For CBaseOne, CBaseTwo and CDerived we check:
+// - Generation of their vtables (including attributes).
+// - Generation of their '_vtable$' data members:
+//   * Correct scope and attributes
+
+namespace NSP_1 {
+  struct CBaseOne {
+    int B1 = 1;
+    virtual int one() { return 1; }
+    virtual int two() { return 2; }
+    virtual int three() { return 3; }
+  };
+}
+
+namespace NSP_2 {
+  struct CBaseTwo {
+    int B2 = 1;
+    virtual int four() { return 4; }
+    virtual int five() { return 5; }
+    virtual int six() { return 6; }
+  };
+}
+
+struct CDerived : NSP_1::CBaseOne, NSP_2::CBaseTwo {
+  int D = 1;
+  int two() override { return 22; };
+  int six() override { return 66; }
+};
+
+int main() {
+  NSP_1::CBaseOne BaseOne;
+  NSP_2::CBaseTwo BaseTwo;
+  CDerived Derived;
+
+  return 0;
+}
+
+// RUN: %clang --target=x86_64-linux -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -emit-llvm -S -g %s -o - | FileCheck %s
+
+// CHECK: $_ZTVN5NSP_18CBaseOneE = comdat any
+// CHECK: $_ZTVN5NSP_28CBaseTwoE = comdat any
+// CHECK: $_ZTV8CDerived = comdat any
+
+// CHECK: @_ZTVN5NSP_18CBaseOneE = linkonce_odr {{dso_local|hidden}} unnamed_addr constant {{.*}}, comdat, align 8, !dbg [[BASE_ONE_VTABLE_VAR:![0-9]*]]
+// CHECK: @_ZTVN5NSP_28CBaseTwoE = linkonce_odr {{dso_local|hidden}} unnamed_addr constant {{.*}}, comdat, align 8, !dbg [[BASE_TWO_VTABLE_VAR:![0-9]*]]
+// CHECK: @_ZTV8CDerived = linkonce_odr {{dso_local|hidden}} unnamed_addr constant {{.*}}, comdat, align 8, !dbg [[DERIVED_VTABLE_VAR:![0-9]*]]
+
+// CHECK: [[BASE_ONE_VTABLE_VAR]] = !DIGlobalVariableExpression(var: [[BASE_ONE_VTABLE:![0-9]*]], expr: !DIExpression())
+// CHECK-NEXT: [[BASE_ONE_VTABLE]] = distinct !DIGlobalVariable(name: "_vtable$", linkageName: "_ZTVN5NSP_18CBaseOneE"
+
+// CHECK: [[BASE_TWO_VTABLE_VAR]] = !DIGlobalVariableExpression(var: [[BASE_TWO_VTABLE:![0-9]*]], expr: !DIExpression())
+// CHECK-NEXT: [[BASE_TWO_VTABLE]] = distinct !DIGlobalVariable(name: "_vtable$", linkageName: "_ZTVN5NSP_28CBaseTwoE"
+
+// CHECK: [[TYPE:![0-9]*]] = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null, size: 64)
+
+// CHECK: !DIDerivedType(tag: DW_TAG_variable, name: "_vtable$", scope: [[BASE_TWO:![0-9]*]], file: {{.*}}, baseType: [[TYPE]], flags: DIFlagPrivate | DIFlagArtificial...
[truncated]

SmallString<64> Buffer;
Twine SymbolName = internString("_vtable$");
StringRef SymbolNameRef = SymbolName.toStringRef(Buffer);
DeclContext *DC = static_cast<DeclContext *>(const_cast<CXXRecordDecl *>(RD));
Member

Why do we need to cast away const here?

Member Author

Changed to: const DeclContext *DC = static_cast<const DeclContext *>(RD);
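
A minimal sketch of why the `const_cast` is not needed, assuming stand-in types: `DeclContextLike` and `RecordDeclLike` are hypothetical and are not Clang's `DeclContext` and `CXXRecordDecl`. Converting a pointer-to-const derived object to a pointer-to-const base keeps the qualifier, so nothing has to be cast away:

```cpp
// Hypothetical stand-ins for a base/derived pair; not the real Clang classes.
struct DeclContextLike {
  int Kind = 7;
};
struct RecordDeclLike : DeclContextLike {};

// Upcast a const-qualified derived pointer without dropping const.
// The static_cast is optional here (the conversion is implicit); it is
// spelled out to mirror the form used in the reply above.
const DeclContextLike *asContext(const RecordDeclLike *RD) {
  return static_cast<const DeclContextLike *>(RD);
}
```

The returned pointer can still be used for every read-only query, which is all the debug-info code in the patch appears to need.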

ASTContext &Context = CGM.getContext();
SmallString<64> Buffer;
Twine SymbolName = internString("_vtable$");
StringRef SymbolNameRef = SymbolName.toStringRef(Buffer);
Member

@Michael137 Michael137 Mar 7, 2025

Don't think we need the call to internString here? AFAIU it's just used when we require the copy of a string and don't want to heap allocate? Can we just make it a local?

llvm::StringRef SymbolName = "_vtable$";

Member Author

You are right. Changed to be local.

Member

@Michael137 Michael137 left a comment

Would be great to have a simpler way of getting to the vtable info! I left some nits.

From a debugger's perspective, could you elaborate on how one would make use of these two new DIEs? E.g., how do we get to the global variable when parsing the class? If we're given the _vtable$ artificial member, there's nothing useful we can do with it, right?

// - The DIGlobalVariable for the vtable is put in the DICompileUnitScope.

// The created non-member should be mark as 'artificial'. It will be
// placed it inside the scope of the C++ class/structure.
Member

Suggested change
// placed it inside the scope of the C++ class/structure.
// placed inside the scope of the C++ class/structure.

Ctxt, SymbolNameRef, Unit, /*LineNumber=*/0, VTy, Flags,
/*Val=*/nullptr, Tag);
llvm::DIDerivedType *DT =
static_cast<llvm::DIDerivedType *>(DBuilder.createArtificialType(OldDT));
Member

Instead of calling createArtificialType could we just add llvm::DINode::FlagArtificial to the Flags parameter we pass to createStaticMemberType?

Member Author

Very good point.
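
The suggestion can be sketched with stand-in types. Everything below is hypothetical (`DebugFlags`, `StaticMember`, and `createStaticMember` are illustrative and are not the LLVM `DIBuilder` API); it only shows the idea of requesting the artificial bit up front instead of post-processing the node:

```cpp
#include <cstdint>

// Hypothetical flag bits standing in for debug-info flags; the values are
// illustrative only.
enum DebugFlags : std::uint32_t {
  FlagZero = 0,
  FlagPrivate = 1u << 0,
  FlagArtificial = 1u << 1,
  FlagStaticMember = 1u << 2,
};

constexpr DebugFlags operator|(DebugFlags A, DebugFlags B) {
  return static_cast<DebugFlags>(static_cast<std::uint32_t>(A) |
                                 static_cast<std::uint32_t>(B));
}

// Hypothetical description of a static member produced by a factory call.
struct StaticMember {
  const char *Name;
  DebugFlags Flags;
};

// The factory adds the static-member bit and keeps whatever the caller
// passed in, so 'artificial' can be requested directly by the caller.
constexpr StaticMember createStaticMember(const char *Name, DebugFlags Flags) {
  return StaticMember{Name, Flags | FlagStaticMember};
}
```

With this shape, a single call such as `createStaticMember("_vtable$", FlagPrivate | FlagArtificial)` yields the same flag set that the test expectations in the patch check for (`DIFlagPrivate | DIFlagArtificial | DIFlagStaticMember`), without a second step.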

LangAS AS = CGM.GetGlobalVarAddressSpace(nullptr);
unsigned PAlign = CGM.getItaniumVTableContext().isRelativeLayout()
? 32
: CGM.getTarget().getPointerAlign(AS);
Member

Looks like this is how ItaniumCXXABI::getAddrOfVTable does it? Might be worth splitting this out into a common helper that can be shared between the two? (there's a couple more copies of this around Clang).

Member

Probably also need to guard this behind CGM.getTarget().getCXXABI().isItaniumFamily() or something (or even just short-circuit this entire function if we're not generating for Itanium?)

Member Author

Created a helper function just to calculate the alignment.

Member Author

The whole function emitVTableSymbol is now guarded.
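
A rough sketch of the two changes described in this thread, using a simplified stand-in for the target description. `TargetInfoLike`, `getVTableAlignInBits`, and `shouldEmitVTableSymbol` are hypothetical names chosen for illustration; the actual helper and guard in the updated patch are not shown on this page and may look different:

```cpp
// Hypothetical, simplified view of the properties the patch consults.
struct TargetInfoLike {
  bool IsItaniumFamily;          // stand-in for an Itanium-family ABI check
  bool UsesRelativeVTableLayout; // stand-in for isRelativeLayout()
  unsigned PointerAlignInBits;   // stand-in for the target pointer alignment
};

// Shared alignment rule: a relative-layout vtable uses 32, otherwise the
// vtable uses the pointer alignment of the target.
unsigned getVTableAlignInBits(const TargetInfoLike &T) {
  return T.UsesRelativeVTableLayout ? 32u : T.PointerAlignInBits;
}

// Guard for the whole emission step: skip it for non-Itanium ABIs.
bool shouldEmitVTableSymbol(const TargetInfoLike &T) {
  return T.IsItaniumFamily;
}
```

Keeping the alignment rule in one function means the vtable definition and its debug symbol cannot drift apart, which is the motivation given in the review comment.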

@dwblaikie
Collaborator

I wouldn't mind a few more details here on the motivation.

> This new symbol will allow a debugger to easily associate classes with the physical location of their VTables using only the DWARF information.

What sort of features are you picturing building with this?

The DWARF currently provides access to the vtable location for /instances/ of the class, so curious what the distinction/need is for doing this from the class, without instances?

> Previously, this had to be done by searching for ELF symbols with matching names; something that was time-consuming and error-prone in certain edge cases.

(I can appreciate that, if we are at the point of searching the symbol table, it's not a great one - but could you talk more about the edge cases/error-prone situations?)

@CarlosAlbertoEnciso
Member Author

CarlosAlbertoEnciso commented Mar 14, 2025

@Michael137, @dwblaikie, @labath Thanks for your feedback.

@CarlosAlbertoEnciso
Member Author

> What sort of features are you picturing building with this?

Automatic type promotion: when displaying an object through a base pointer the debugger wants to still be able to show the object’s state.

@CarlosAlbertoEnciso
Member Author

> The DWARF currently provides access to the vtable location for /instances/ of the class, so curious what the distinction/need is for doing this from the class, without instances?

> Previously, this had to be done by searching for ELF symbols with matching names; something that was time-consuming and error-prone in certain edge cases.

> (I can appreciate that, if we are at the point of searching the symbol table, it's not a great one - but could you talk more about the edge cases/error-prone situations?)

The simplest example of an edge case is classes with VTables inside functions. This is because the demangled name of the VTable’s ELF symbol (e.g., vtable for Func()::ClassA) is not a searchable name in the global context (in the SCE debugger at least). You could argue that it should be, but it is further complicated by things like template parameters, function overloading, anonymous namespaces, etc.

Another example is the demangled ELF symbol: vtable for int (anonymous namespace)::Func<((anonymous namespace)::E)0>(int)::A.

To work out which class A this refers to would involve parsing the template parameter correctly and matching to the correct anonymous namespace. While this technically isn’t impossible, it would involve the debugger keeping a lot of extra information around to disambiguate these rare cases, something we’re unlikely to be able to justify.

Implementing such demangling-and-interpretation would be error-prone, and the whole point of DWARF is to present information in a structured manner, so making it easy to access + interpret is part of its purpose.
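
For illustration, source of roughly the following shape would give rise to a vtable whose demangled name has the form quoted above. The member function `get` and the wrappers `callFirst` and `callSecond` are hypothetical names added here; whether a given compiler actually emits each vtable also depends on optimization settings:

```cpp
namespace {

enum E { E0 = 0, E1 = 1 };

// Each instantiation of Func defines its own function-local class A, and
// each of those classes has its own vtable.
template <E Value> int Func(int X) {
  struct A {
    virtual ~A() = default;
    virtual int get(int V) const { return V + static_cast<int>(Value); }
  };
  A Obj;
  const A *P = &Obj;
  return P->get(X);
}

} // namespace

// Two instantiations, hence two distinct local classes that are both named A.
int callFirst(int X) { return Func<E0>(X); }
int callSecond(int X) { return Func<E1>(X); }
```

A name-based lookup has to tell these two `A` classes apart using only the template argument and the enclosing anonymous namespace, which is the disambiguation work the comment above describes.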

@CarlosAlbertoEnciso
Member Author

The DWARF currently provides access to the vtable location for /instances/ of the class, so curious what the distinction/need is for doing this from the class, without instances?

Doing this per class gives the debugger extra information about the vtables while the debug information is being loaded, before any code is executed. We use it to construct a map of vtable pointer => class definition to enable the type promotion.

@labath
Collaborator

labath commented Mar 14, 2025

IIUC, your debugger parses all of debug info upfront and builds up a vtable pointer->class DIE map. That's not something we would want to do in lldb, but I think we could still make use of this by first searching for the type using the name from the vtable (like we do now) and then confirming the match using the new information. It would be better if we could go from the vtable pointer straight to the right type DIE, but I don't think we can do that without inventing a new sort of an accelerator table (which I guess we don't have an appetite for).
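
As a rough illustration of the "search by name, then confirm" idea, here is a minimal sketch. It is not lldb code: `ClassInfo`, its fields, and `resolveDynamicType` are hypothetical names, and it assumes each parsed class already records the address taken from its `_vtable$` location.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of a class DIE after parsing.
struct ClassInfo {
  std::string qualifiedName;   // e.g. "(anonymous namespace)::derived"
  std::string compileUnit;     // CU that defines the type
  std::uint64_t vtableAddress; // assumed to come from the "_vtable$" location
};

// Step 1: consider every type whose name matches the demangled vtable symbol.
// Step 2: keep only the candidate whose recorded vtable address also matches.
std::optional<ClassInfo>
resolveDynamicType(const std::vector<ClassInfo> &types,
                   const std::string &nameFromSymbol,
                   std::uint64_t vtableAddress) {
  std::optional<ClassInfo> match;
  for (const ClassInfo &c : types) {
    if (c.qualifiedName != nameFromSymbol || c.vtableAddress != vtableAddress)
      continue;
    // Two distinct types should never share one vtable address.
    assert(!match && "two types claim the same vtable address");
    match = c;
  }
  return match;
}
```

With two same-named types in different CUs, the name alone yields two candidates and the address picks one of them.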

To work out which class A this refers to would involve parsing the template parameter correctly and matching to the correct anonymous namespace. While this technically isn’t impossible

Are you sure about that? Anonymous types are confined to a single CU statically, but their values can definitely leak out at runtime. So if I'm stopped in a random CU and I see an object whose dynamic type is (anonymous namespace)::X, I don't see how one could determine which type (out of possibly many) is that vtable referring to.

@CarlosAlbertoEnciso
Member Author

From a debugger's perspective, could you elaborate on how one would make use of these two new DIEs?

Using the test case (llvm/test/DebugInfo/X86/vtable-debug-info-inheritance-simple.ll):

The relevant generated DWARF for the “CDerived” class is:

0x00000042:   DW_TAG_variable
                DW_AT_specification (0x0000005c "_vtable$")
                DW_AT_alignment     (8)
                DW_AT_location      (DW_OP_addrx 0x1)
                DW_AT_linkage_name  ("_ZTV8CDerived")

0x0000004c:   DW_TAG_structure_type ("CDerived")
                ...
0x0000005c:     DW_TAG_variable
                  DW_AT_name  ("_vtable$")
                  DW_AT_type  (0x00000041 "void *")
                  DW_AT_external    (true)
                  DW_AT_declaration (true)
                  DW_AT_artificial  (true)
                  DW_AT_accessibility     (DW_ACCESS_private)
              ...

.debug_addr contents:
Addrs: [
0x0000000000000000
0x0000000000000000  <- DW_OP_addrx 0x1
0x0000000000000000
]

During DWARF loading, if we load a symbol (0x00000042 -> 0x0000005c) on a compound type (0x0000004c) with the name _vtable$ and an artificial attribute, we use its address (DW_OP_addrx 0x1) as the vtable location for that compound type.
After DWARF loading, for each type's vtable we found, we deduce its layout and store the location of the virtual functions for that type.
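
The loading step described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SCE debugger's implementation: the `Die` struct, its fields, and `buildVTableMap` are hypothetical stand-ins for a real DWARF reader, and the address is assumed to have been resolved from `DW_OP_addrx` already.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of a DIE. A real reader would decode these
// fields from .debug_info and .debug_addr.
struct Die {
  std::string tag;                          // e.g. "DW_TAG_variable"
  std::string name;                         // DW_AT_name, empty if absent
  bool artificial = false;                  // DW_AT_artificial
  std::optional<std::uint64_t> address;     // resolved DW_AT_location
  std::optional<std::size_t> specification; // index of DW_AT_specification DIE
  std::optional<std::size_t> parent;        // index of the enclosing DIE
};

// Walk all DIEs once and map each vtable address to the name of the class
// that owns the artificial "_vtable$" member.
std::map<std::uint64_t, std::string>
buildVTableMap(const std::vector<Die> &dies) {
  std::map<std::uint64_t, std::string> result;
  for (const Die &def : dies) {
    if (def.tag != "DW_TAG_variable" || !def.address || !def.specification)
      continue;
    assert(*def.specification < dies.size() && "dangling specification");
    const Die &decl = dies[*def.specification];
    if (decl.name != "_vtable$" || !decl.artificial || !decl.parent)
      continue;
    assert(*decl.parent < dies.size() && "dangling parent link");
    const Die &owner = dies[*decl.parent];
    if (owner.tag == "DW_TAG_structure_type" ||
        owner.tag == "DW_TAG_class_type")
      result[*def.address] = owner.name;
  }
  return result;
}
```

The resulting map is the "vtable pointer => class definition" association mentioned earlier in the thread, here reduced to class names for brevity.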

@CarlosAlbertoEnciso
Member Author

If we're given the _vtable$ artificial member, there's nothing useful we can do with it right?

Correct. The only direct information is the enclosing class.

Address comments from reviewers:
- Created a helper function to get the alignment.
- Remove the 'internString' call. Use a local variable.
- Remove the 'createArtificialType' call by updating the
  flags to include the 'artificial' bit.
@CarlosAlbertoEnciso
Member Author

To work out which class A this refers to would involve parsing the template parameter correctly and matching to the correct anonymous namespace. While this technically isn’t impossible

Are you sure about that? Anonymous types are confined to a single CU statically, but their values can definitely leak out at runtime. So if I'm stopped in a random CU and I see an object whose dynamic type is (anonymous namespace)::X, I don't see how one could determine which type (out of possibly many) is that vtable referring to.

@labath I will double check with our debugger team.

@dwblaikie
Collaborator

It would be better if we could go from the vtable pointer straight to the right type DIE, but I don't think we can do that without inventing a new sort of an accelerator table (which I guess we don't have an appetite for).

yeah, data address lookup (as opposed to code address lookup) is somewhat of a gap in DWARF at the moment. In /theory/ aranges held data and code addresses, but GCC only produced code addresses (LLVM produced data and code addresses, but didn't produce aranges by default because they were redundant (at least when ignoring data) with ranges)...

We could revisit that in some way - it (ranges or aranges) is not a lookup table, but it does at least give quick per-CU ranges. For DWARFv5 output, if you know only indexed addresses are being used (either because it's Split DWARF, or by scanning the abbrevs, maybe?) maybe you can still get a per-CU "which CU covers this vtable" query that's pretty quick (not sure if DWARF compression tools like dwz would make that more difficult because the indexed addr entry might not point straight to the start of the vtable - an offset to the vtable might be used somehow (like the new addr+offset form in DWARFv6)?)
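
A per-CU "which CU covers this address" query of the kind speculated about here might look like the sketch below. It assumes the consumer already holds a list of data address ranges per CU. `AddressRange`, `CuRanges`, and `cuCovering` are hypothetical names, and how the ranges would actually be obtained is left open, as in the comment.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Half-open address range [low, high), as a per-CU range list might give.
struct AddressRange {
  std::uint64_t low;
  std::uint64_t high;
};

// Hypothetical per-CU summary: the CU name plus the data ranges it covers.
struct CuRanges {
  std::string name;
  std::vector<AddressRange> ranges;
};

// Return the name of the first CU that has a range containing addr.
std::optional<std::string> cuCovering(const std::vector<CuRanges> &cus,
                                      std::uint64_t addr) {
  for (const CuRanges &cu : cus) {
    for (const AddressRange &r : cu.ranges) {
      assert(r.low <= r.high && "malformed range");
      if (addr >= r.low && addr < r.high)
        return cu.name;
    }
  }
  return std::nullopt;
}
```

This is a linear scan, not a lookup table, which matches the caveat above: it narrows the search to one CU but does not identify the type DIE.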

But, yeah, I wouldn't mind hearing more about lldb's needs/preferences/hopes/dreams for this feature so we might get a design that's more portable at least between SCE and LLDB. (bonus points if anyone's got GDB's needs in mind - perhaps @tromey might be able to lend us some insight as to how GDB does things and what they might be inclined to use/support to improve this feature area)

@tromey
Contributor

tromey commented Mar 17, 2025

But, yeah, I wouldn't mind hearing more about lldb's needs/preferences/hopes/dreams for this feature so we might get a design that's more portable at least between SCE and LLDB. (bonus points if anyone's got GDB's needs in mind - perhaps @tromey might be able to lend us some insight as to how GDB does things and what they might be inclined to use/support to improve this feature area)

For C++, GDB knows the details of the Itanium ABI. When set print object is enabled, it uses these to find the runtime type of an object. It's somewhat buggy, though, since it too does not understand types local to a function.

For Rust, we added a feature to LLVM to enable this. In Rust, a "trait object" has a vtable pointer and a pointer to the underlying real object. To discover the type of this real object, the vtable is emitted like:

 <1><2a>: Abbrev Number: 2 (DW_TAG_variable)
    <2b>   DW_AT_name        : (indirect string, offset: 0xda): <std::rt::lang_start::{closure_env#0}<()> as core::ops::function::Fn<()>>::{vtable}
    <2f>   DW_AT_type        : <0x3d>
    <33>   DW_AT_location    : 9 byte block: 3 68 5e 5 0 0 0 0 0 	(DW_OP_addr: 55e68)
 <1><3d>: Abbrev Number: 3 (DW_TAG_structure_type)
    <3e>   DW_AT_containing_type: <0xb5>
    <42>   DW_AT_name        : (indirect string, offset: 0x178): <std::rt::lang_start::{closure_env#0}<()> as core::ops::function::Fn<()>>::{vtable_type}
    <46>   DW_AT_byte_size   : 48
    <47>   DW_AT_alignment   : 8
... members ...

That is, the vtable is emitted as a global variable. Its type describes the vtable itself (of course). But the vtable type has a DW_AT_containing_type that points to the runtime type corresponding to this particular vtable.

I tend to think C++ should do something like this as well. The reason for this approach is that it makes it simple to go from some C++ object in memory to the runtime type: fetch the vtable pointer, look through the DWARF for the object at this address (which can sometimes be problematic as pointed out earlier), then examine the "containing type" to find the DWARF for the real type.
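
The lookup described here (vtable pointer, then the variable at that address, then its containing type) can be sketched as below. This is an illustrative sketch, not GDB's code: `VTableVar`, `VTableIndex`, and `runtimeTypeFor` are hypothetical names. It uses a range lookup and not an exact match, on the assumption that the pointer stored in an object may point into the vtable and not at its first byte (as with the Itanium C++ ABI's address point).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical record for one vtable variable found in the DWARF.
struct VTableVar {
  std::uint64_t size;         // DW_AT_byte_size of the vtable's type
  std::string containingType; // name of the DW_AT_containing_type target
};

// Keyed by the start address from the variable's DW_AT_location.
using VTableIndex = std::map<std::uint64_t, VTableVar>;

// Given the vtable pointer read from an object, find the vtable variable
// whose storage contains it and return the runtime type it names.
std::optional<std::string> runtimeTypeFor(const VTableIndex &index,
                                          std::uint64_t vptr) {
  auto it = index.upper_bound(vptr); // first entry starting after vptr
  if (it == index.begin())
    return std::nullopt;
  --it; // last entry starting at or before vptr
  assert(it->first <= vptr);
  if (vptr - it->first >= it->second.size)
    return std::nullopt; // vptr falls in a gap between vtables
  return it->second.containingType;
}
```

The sizes and addresses a caller would put in the index come from the DWARF, for example the 48-byte vtable type at 0x55e68 in the dump above.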

Existing code for Rust is here.

@CarlosAlbertoEnciso
Member Author

To work out which class A this refers to would involve parsing the template parameter correctly and matching to the correct anonymous namespace. While this technically isn’t impossible

Are you sure about that? Anonymous types are confined to a single CU statically, but their values can definitely leak out at runtime. So if I'm stopped in a random CU and I see an object whose dynamic type is (anonymous namespace)::X, I don't see how one could determine which type (out of possibly many) is that vtable referring to.

@labath I will double check with our debugger team.

From our debugger team:

"You could in theory look at the ELF File symbol for the VTable symbol to work out which CU the anonymous namespace refers to, which is why we say it’s technically possible. You’d have to transfer that information to the debugger during loading though which we don’t currently do."

@tromey
Contributor

tromey commented Mar 24, 2025

DW_AT_specification has a fairly specific meaning in DWARF. I don't really understand why you want to link from the class type to the vtable (the reverse seems more useful to me), but I would suggest a new attribute, considering it is a new capability. The link from the class to the specific vtable even seems mildly incorrect, in that during object construction the vtable will go through several different values, not just one.

Also linking from the vtable object to a member of the class seems less useful than the DW_AT_containing_type approach, where the link is explicitly to the type and not some member.

@dwblaikie
Collaborator

The link from the class to the specific vtable even seems mildly incorrect, in that during object construction the vtable will go through several different values, not just one.

Not sure I follow this - the object is only of the type, in some sense, when it is pointing to that particular vtable. When the base subobject is constructed - it's only that, the base subobject (Or on destruction - once the most derived destruction has run, and the vtable is set to the base type, all the object is, in some sense, at that point, is the base subobject)

Though I haven't thought seriously about the representation - truly off the cuff, take with a grain of salt, etc, the static member that is the vtable seems sort of reasonable to me.

Not sure why it'd be necessary to make that vtable global variable "global" rather than static within the class? Is that for debug_names lookup? (I think static members are still in the index, right?) If it's a class member you can still do bidirectional lookup, right? IF you find the variable, you can find its parent to see which class it applies to, and if you have the class you can find the vtable variable inside it?

@clayborg
Collaborator

FYI: There is already VTable support in our lldb::SBValue class and it is part of the public API in LLDB and doesn't require any of this:

  /// If this value represents a C++ class that has a vtable, return an value
  /// that represents the virtual function table.
  ///
  /// SBValue::GetError() will be in the success state if this value represents
  /// a C++ class with a vtable, or an appropriate error describing that the
  /// object isn't a C++ class with a vtable or not a C++ class.
  ///
  /// SBValue::GetName() will be the demangled symbol name for the virtual
  /// function table like "vtable for <classname>".
  ///
  /// SBValue::GetValue() will be the address of the first vtable entry if the
  /// current SBValue is a class with a vtable, or nothing the current SBValue
  /// is not a C++ class or not a C++ class that has a vtable.
  ///
  /// SBValue::GetValueAtUnsigned(...) will return the address of the first
  /// vtable entry.
  ///
  /// SBValue::GetLoadAddress() will return the address of the vtable pointer
  /// found in the parent SBValue.
  ///
  /// SBValue::GetNumChildren() will return the number of virtual function
  /// pointers in the vtable, or zero on error.
  ///
  /// SBValue::GetChildAtIndex(...) will return each virtual function pointer
  /// as a SBValue object.
  ///
  /// The child SBValue objects will have the following values:
  ///
  /// SBValue::GetError() will indicate success if the vtable entry was
  /// successfully read from memory, or an error if not.
  ///
  /// SBValue::GetName() will be the vtable function index in the form "[%u]"
  /// where %u is the index.
  ///
  /// SBValue::GetValue() will be the virtual function pointer value as a
  /// string.
  ///
  /// SBValue::GetValueAtUnsigned(...) will return the virtual function
  /// pointer value.
  ///
  /// SBValue::GetLoadAddress() will return the address of the virtual function
  /// pointer.
  ///
  /// SBValue::GetNumChildren() returns 0
  lldb::SBValue lldb::SBValue::GetVTable();

So you can do this:

$ cat main.cpp
   1   	#include <stdio.h>
   2   	
   3   	class Foo {
   4   	public:
   5   	  virtual ~Foo() = default;
   6   	  virtual void Dump() {
   7   	    puts(__PRETTY_FUNCTION__);
   8   	  }
   9   	};
   10  	
   11  	int main(int argc, const char **argv) {
   12  	  Foo f;
   13  	  f.Dump();
   14  	  return 0;
   15  	}

Then when you debug:

(lldb) script
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
>>> v = lldb.frame.FindVariable('f')
>>> v.GetVTable()
vtable for Foo = 0x0000000100004030 {
  [0] = 0x0000000100003ea4 a.out`Foo::~Foo() at main.cpp:5
  [1] = 0x0000000100003ef4 a.out`Foo::~Foo() at main.cpp:5
  [2] = 0x0000000100003e7c a.out`Foo::Dump() at main.cpp:6
}

Doesn't require any debug info.

@clayborg
Collaborator

The link from the class to the specific vtable even seems mildly incorrect, in that during object construction the vtable will go through several different values, not just one.

Not sure I follow this - the object is only of the type, in some sense, when it is pointing to that particular vtable. When the base subobject is constructed - it's only that, the base subobject (Or on destruction - once the most derived destruction has run, and the vtable is set to the base type, all the object is, in some sense, at that point, is the base subobject)

Though I haven't thought seriously about the representation - truly off the cuff, take with a grain of salt, etc, the static member that is the vtable seems sort of reasonable to me.

Not sure why it'd be necessary to make that vtable global variable "global" rather than static within the class? Is that for debug_names lookup? (I think static members are still in the index, right?) If it's a class member you can still do bidirectional lookup, right? IF you find the variable, you can find its parent to see which class it applies to, and if you have the class you can find the vtable variable inside it?

I don't mind this being in the debug info, but it would be nice to make sure it doesn't show up by default when dumping variable values. Some data formatters call SBValue SBValue::GetChildAtIndex(<idx>) and we don't want to change the indexes of all child values by having the vtable entry show up in a variable and possibly throw off people that use direct indexing.

@CarlosAlbertoEnciso
Member Author

@clayborg Thanks very much for the extra information.

FYI: There is already VTable support in our lldb::SBValue class and it is part of the public API in LLDB and doesn't require any of this:

$ cat main.cpp
   1   	#include <stdio.h>
   2   	
   3   	class Foo {
   4   	public:
   5   	  virtual ~Foo() = default;
   6   	  virtual void Dump() {
   7   	    puts(__PRETTY_FUNCTION__);
   8   	  }
   9   	};
   10  	
   11  	int main(int argc, const char **argv) {
   12  	  Foo f;
   13  	  f.Dump();
   14  	  return 0;
   15  	}

Then when you debug:

(lldb) script
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
>>> v = lldb.frame.FindVariable('f')
>>> v.GetVTable()
vtable for Foo = 0x0000000100004030 {
  [0] = 0x0000000100003ea4 a.out`Foo::~Foo() at main.cpp:5
  [1] = 0x0000000100003ef4 a.out`Foo::~Foo() at main.cpp:5
  [2] = 0x0000000100003e7c a.out`Foo::Dump() at main.cpp:6
}

Doesn't require any debug info.

Just a question: Can that functionality be used before the object is constructed?

@tromey
Contributor

tromey commented Mar 25, 2025

The link from the class to the specific vtable even seems mildly incorrect, in that during object construction the vtable will go through several different values, not just one.

Not sure I follow this - the object is only of the type, in some sense, when it is pointing to that particular vtable.

Yeah, I agree, sorry about that.

@CarlosAlbertoEnciso
Member Author

Though I haven't thought seriously about the representation - truly off the cuff, take with a grain of salt, etc, the static member that is the vtable seems sort of reasonable to me.

Not sure why it'd be necessary to make that vtable global variable "global" rather than static within the class? Is that for debug_names lookup? (I think static members are still in the index, right?) If it's a class member you can still do bidirectional lookup, right? IF you find the variable, you can find its parent to see which class it applies to, and if you have the class you can find the vtable variable inside it?

We used the global variable approach to give the debugger (SCE) a similar mechanism to an existing one to identify symbols that represent vtables. I think using just the static member will work and have the additional benefit of smaller debug info. I will check with the debugger team.

@dwblaikie
Collaborator

FYI: There is already VTable support in our lldb::SBValue class and it is part of the public API in LLDB and doesn't require any of this:
...
Doesn't require any debug info.

Does this/can this be used to determine the type of an object that points to that vtable, though? If so, how does lldb resolve the ambiguities discussed earlier in this PR? (it'd be great to find out there's a way to do this, but it's not clear if/how that can be done - so more details about how lldb does it, if it does, would be super interesting!)

Looks like it fails under ambiguity:
test.h

struct base {
  virtual ~base() { }
  int i = 3;
};
base* f1();
base* f2();

test.cpp

#include "test.h"
namespace {
struct derived: base {
  float f = 3.14;
};
}
base* f1() {
  return new derived();
};
void breakpoint() { }
int main() {
  base* b1 = f1();
  base* b2 = f2();

  breakpoint();
}

test2.cpp

#include "test.h"
namespace {
struct derived: base {
  long l = 42;
};
}
base* f2() {
  return new derived();
}
$ lldb ./a.out
(lldb) b breakpoint
(lldb) r
(lldb) up
(lldb) v *b1
(derived) *b1 = {
  base = (i = 3)
  f = 3.1400001
}
(lldb) v *b2
(derived) *b2 = {
  base = (i = 3)
  f = 0
}

The latter should have a member named l, not f (with a different value too, but I assume that's just from interpreting the long bitpattern as a float).

gdb fails in the same way:

(gdb) p *b1
$1 = ((anonymous namespace)::derived) {
  <base> = {
    _vptr$base = 0x555555557d20 <vtable for (anonymous namespace)::derived+16>,
    i = 3
  },
  members of (anonymous namespace)::derived:
  f = 3.1400001
}
(gdb) p *b2
$2 = ((anonymous namespace)::derived) {
  <base> = {
    _vptr$base = 0x555555557d88 <vtable for (anonymous namespace)::derived+16>,
    i = 3
  },
  members of (anonymous namespace)::derived:
  f = 0
}

If you actually debug into the derived type (add a virtual function, call it in main, step into it from there in the debugger) lldb still thinks the type of derived is the float version, even when you're in the long version - gdb at least shows you the long version from inside, even though it continues to show you the float version from outside.

==================================================

Unrelated to this discussion on vtables, I was curious about some related local-type-naming-collision things:
If you remove the virtuality from it, lldb always thinks "derived" names the type in the main file, even when you're in the context of test2.cpp. lldb correctly follows type links (so if you print a local variable of type "derived*" in test2.cpp, it prints the long version), but it does not seem to resolve the name correctly when the type is named in the expression evaluator.

(lldb) n
Process 2958937 stopped
* thread #1, name = 'a.out', stop reason = step over
    frame #0: 0x00005555555552d4 a.out`s2(b=0x000055555556b2d0) at test2.cpp:12:3
   9    }
   10   void s2(base* b) {
   11     derived* d = (derived*)b;
-> 12     breakpoint();
   13   }
(lldb) p *d
(derived) {
  base = (i = 3)
  l = 42
}
(lldb) p *(derived*)b;
(derived) {
  base = (i = 3)
  f = 0
}

(whereas gdb does get this ^ right, naming the type finds the appropriate local one - and gdb rejects using the name in a context it doesn't apply, whereas lldb will resolve the name in files that don't have this file-local name (seems to resolve it to whichever instance of the name comes first in the DWARF))

@CarlosAlbertoEnciso
Member Author

@dwblaikie:

Not sure why it'd be necessary to make that vtable global variable "global" rather than static within the class? Is that for debug_names lookup? (I think static members are still in the index, right?) If it's a class member you can still do bidirectional lookup, right? IF you find the variable, you can find its parent to see which class it applies to, and if you have the class you can find the vtable variable inside it?

0x0000004c:   DW_TAG_structure_type ("CDerived")
                ...
0x0000005c:     DW_TAG_variable
                  DW_AT_name  ("_vtable$")
                  DW_AT_type  (0x00000041 "void *")
                  DW_AT_external  (true)
                  DW_AT_artificial  (true)
                  DW_AT_accessibility  (DW_ACCESS_private)
                  DW_AT_location  (DW_OP_addrx 0x1)
              ...

.debug_addr contents:
Addrs: [
0x0000000000000000
0x0000000000000000  <- DW_OP_addrx 0x1
0x0000000000000000
]

Removing the vtable global variable and moving the "location info" into the static within the class, will work for the SCE debugger.

@tromey
Contributor

tromey commented Mar 31, 2025

Removing the vtable global variable and moving the "location info" into the static within the class, will work for the SCE debugger.

I was thinking about this last night and wondering if the vtable will appear as a class member even if the class is local to a function?

If so then it seems like this would be hard for gdb to find (can't speak for other debuggers). The issue being that gdb tends not to read DIEs that it thinks are uninteresting, and this means function bodies in general are skipped.

If the vtable were a global-but-artificial object, then it would readily be found by the initial scan.

@dwblaikie
Collaborator

Removing the vtable global variable and moving the "location info" into the static within the class, will work for the SCE debugger.

I was thinking about this last night and wondering if the vtable will appear as a class member even if the class is local to a function?

If so then it seems like this would be hard for gdb to find (can't speak for other debuggers). The issue being that gdb tends not to read DIEs that it thinks are uninteresting, and this means function bodies in general are skipped.

If the vtable were a global-but-artificial object, then it would readily be found by the initial scan.

Hmm, so I think the idea was that there'd still be an out-of-line definition (like for an inline static class member variable with storage - the DWARF contains a definition DIE that's outside the class, eg: https://godbolt.org/z/z3Kz8Eorn ) - though since static class members can't exist in function-local classes we can't quite extrapolate from there. I'd say we can extrapolate from function-local types' member function definitions, which appear outside the function the type is local to: https://godbolt.org/z/zvf15YKhE

github-actions bot commented May 9, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@CarlosAlbertoEnciso
Member Author

CarlosAlbertoEnciso commented May 9, 2025

Uploaded a patch that eliminates the global variable and moves the vtable information into the static member;
that way, a consumer will always have access to the vtable information, just by having the object instance or the object definition.

For clarity: Using the test case (llvm/test/DebugInfo/X86/vtable-debug-info-inheritance-simple.ll):

The relevant generated DWARF for the “CDerived” class before the patch is:

0x00000042:   DW_TAG_variable
                DW_AT_specification (0x0000005c "_vtable$")
                DW_AT_alignment     (8)
                DW_AT_location      (DW_OP_addrx 0x1)
                DW_AT_linkage_name  ("_ZTV8CDerived")

0x0000004c:   DW_TAG_structure_type ("CDerived")
                ...
0x0000005c:     DW_TAG_variable
                  DW_AT_name  ("_vtable$")
                  DW_AT_type  (0x00000041 "void *")
                  DW_AT_external    (true)
                  DW_AT_declaration (true)
                  DW_AT_artificial  (true)
                  DW_AT_accessibility     (DW_ACCESS_private)
              ...

.debug_addr contents:
Addrs: [
0x0000000000000000
0x0000000000000000  <- DW_OP_addrx 0x1
0x0000000000000000
]

The relevant generated DWARF for the “CDerived” class after the patch is:

0x0000004c:   DW_TAG_structure_type ("CDerived")
                ...
0x0000005c:     DW_TAG_variable
                  DW_AT_name  ("_vtable$")
                  DW_AT_type  (0x00000041 "void *")
                  DW_AT_external  (true)
                  DW_AT_artificial  (true)
                  DW_AT_accessibility  (DW_ACCESS_private)
                  DW_AT_alignment     (8)
                  DW_AT_location  (DW_OP_addrx 0x1)
                  DW_AT_linkage_name  ("_ZTV8CDerived")
              ...

.debug_addr contents:
Addrs: [
0x0000000000000000
0x0000000000000000  <- DW_OP_addrx 0x1
0x0000000000000000
]

The patch has been tested by the (SCE) debugger team.

@tromey
Contributor

tromey commented May 12, 2025

Uploaded a patch that eliminates the global variable and moves the vtable information into the static member; that way, a consumer will always have access to the vtable information, just by having the object instance or the object definition.

IIUC this means that to see the vtable for a class, the debugger has to scan the class declaration itself -- the vtable isn't a separate CU-level global variable but is instead a static member of the class.

This approach won't work well for gdb. gdb tries not to scan DIEs that it does not need, in order to improve startup times. In particular it tries not to scan function bodies until necessary.

OTOH having a separate global variable representing the vtable itself is reasonably easy to handle. And, it would solve the "function-local class" problem for gdb.

@dwblaikie
Collaborator

Uploaded a patch that eliminates the global variable and moves the vtable information into the static member; that way, a consumer will always have access to the vtable information, just by having the object instance or the object definition.

IIUC this means that to see the vtable for a class, the debugger has to scan the class declaration itself -- the vtable isn't a separate CU-level global variable but is instead a static member of the class.

This approach won't work well for gdb. gdb tries not to scan DIEs that it does not need, in order to improve startup times. In particular it tries not to scan function bodies until necessary.

My intent (haven't checked the patch) is that it'd be modeled as a static member variable - so there'd be a declaration in the class, but a definition DIE outside the class that'd be indexed by gdb OK, I'd have thought? (it'd go in .debug_names, and gdb_index, I think - figure gdb would parse/index the definition DIE?)

@dwblaikie
Collaborator

Ah, yeah, I see the example from #130255 (comment) isn't consistent with what I had in mind (in the example there the member DIE is a definition - I don't think many consumers will be ready to handle that, since it's not how even inline-defined static member variables tend to be rendered in DWARF today)

@dwblaikie
Collaborator

(I'm open to being overruled by other folks/perspectives if the straight up global variable is preferred - other folks who are in support of that?)

@tromey
Contributor

tromey commented May 12, 2025

My intent (haven't checked the patch) is that it'd be modeled as a static member variable - so there'd be a declaration in the class, but a definition DIE outside the class that'd be indexed by gdb OK, I'd have thought? (it'd go in .debug_names, and gdb_index, I think - figure gdb would parse/index the definition DIE?)

I think this would be fine. The crucial thing, I think, is that there's some indication at the CU scope. This way the initial scan can take note of the global and its address; then fully read the CU if the class type is needed at some point.
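
The two-phase reading described here can be illustrated with a small sketch. It is not GDB's implementation: `CompileUnit`, `LazyIndex`, and the `fullyRead` flag are hypothetical, and the flag merely stands in for an expensive full parse of the CU's DIE tree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical compile unit whose full DIE tree is only parsed on demand.
struct CompileUnit {
  std::string name;
  bool fullyRead = false;
  std::vector<std::uint64_t> vtableAddresses; // found by the cheap scan
};

class LazyIndex {
public:
  // Cheap initial scan: remember which CU owns each vtable address.
  explicit LazyIndex(std::vector<CompileUnit> cus) : cus_(std::move(cus)) {
    for (std::size_t i = 0; i < cus_.size(); ++i)
      for (std::uint64_t addr : cus_[i].vtableAddresses)
        owner_[addr] = i;
  }

  // Expand the owning CU only when a vtable address is actually queried.
  const CompileUnit *unitForVTable(std::uint64_t addr) {
    auto it = owner_.find(addr);
    if (it == owner_.end())
      return nullptr;
    assert(it->second < cus_.size());
    cus_[it->second].fullyRead = true; // stands in for a full DIE parse
    return &cus_[it->second];
  }

  bool isFullyRead(std::size_t i) const { return cus_.at(i).fullyRead; }

private:
  std::vector<CompileUnit> cus_;
  std::map<std::uint64_t, std::size_t> owner_;
};
```

Only the CU that owns the queried vtable is expanded. CUs whose vtables are never looked up stay unread, which is the startup-time property the comment is concerned with.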

@pogo59
Collaborator

pogo59 commented May 12, 2025

I like modeling it as an artificial static member, which I think is the "before the patch" version from #130255 (comment)
The CU-level variable definition has a DW_AT_specification pointing to its declaration within the class type (which is using DW_AT_specification correctly), letting you find the class type from the variable. The declaration within the class type has the vtable's linkage name, which lets you find the vtable from the class type.

@dwblaikie
Collaborator

My intent (haven't checked the patch) is that it'd be modeled as a static member variable - so there'd be a declaration in the class, but a definition DIE outside the class that'd be indexed by gdb OK, I'd have thought? (it'd go in .debug_names, and gdb_index, I think - figure gdb would parse/index the definition DIE?)

I think this would be fine. The crucial thing, I think, is that there's some indication at the CU scope. This way the initial scan can take note of the global and its address; then fully read the CU if the class type is needed at some point.

As a note - when you say "at the CU scope" do you mean a direct child of the CU, or anything outside a function or class definition? (ie: could be inside a namespace) - Clang puts definitions, I think, in the namespace nearest the declaration for the definition - compare these: https://godbolt.org/z/EoK4noe7o

@tromey
Contributor

tromey commented May 12, 2025

As a note - when you say "at the CU scope" do you mean a direct child of the CU, or anything outside a function or class definition? (ie: could be inside a namespace) - Clang puts definitions, I think, in the namespace nearest the declaration for the definition - compare these: https://godbolt.org/z/EoK4noe7o

Outside of function bodies is probably good enough.

For me conceptually the vtable is an artificial global, but I could understand wanting it to be in a namespace or whatever.

And really if one were going that route, having the vtable object be a function-scoped static would also make sense. It's just that this incurs a new cost on the debuginfo reader -- but not for any deep source-related reason, because these aren't source-accessible objects anyway.

@jmorse
Member

jmorse commented May 13, 2025

It sounds like there's agreement that the "before" approach was better/acceptable, i.e. having a CU-level variable that refers by DW_AT_specification to a variable in the class type. Doing so would also avoid the customisation for vtable-addresses in the latest patch with the createGlobalVariableVTableDIE method, which'd be neater. With that in mind, we'll head back in that direction.

It's also worth noting that this has spawned some DWARF issues such as https://dwarfstd.org/issues/250506.2.html , but I feel that's "future work".

@CarlosAlbertoEnciso
Member Author

@dwblaikie @tromey @pogo59 @jmorse Thanks for your input in order to reach an agreement on the "better" approach.
Reverted to the "before" patch: having a CU-level variable that refers by DW_AT_specification to a variable in the class type.

Member

@jmorse jmorse left a comment


On the whole this looks fine to me, with one final nit.

Comment on lines 1819 to 1827
  // Helper to get the alignment for a variable.
  unsigned getGlobalVarAlignment(const VarDecl *D = nullptr) {
    LangAS AS = GetGlobalVarAddressSpace(D);
    unsigned PAlign = getItaniumVTableContext().isRelativeLayout()
                          ? 32
                          : getTarget().getPointerAlign(AS);
    return PAlign;
  }

Member


It makes sense to refactor this out; I feel the name of the function should contain "vtable" somewhere though, it's fundamentally tied to producing vtable information as there's a call to getItaniumVTableContext, yes? There's a small risk that someone uses it for a different purpose, which we can fix by putting "vtable" in the name.

Member Author


Changed function name to getVtableGlobalVarAlignment.
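For readers following the rename, the decision made by the helper can be sketched in isolation. This is a simplified stand-in, not the Clang code: `VTableContextSketch`, `TargetSketch`, and `getVtableGlobalVarAlignmentSketch` are hypothetical names that only mirror the two inputs used in the snippet under review.

```cpp
#include <cassert>

// Hypothetical stand-in for the value returned by
// getItaniumVTableContext().isRelativeLayout() in the reviewed snippet.
struct VTableContextSketch {
  bool RelativeLayout;
};

// Hypothetical stand-in for getTarget().getPointerAlign(AS),
// expressed in bits as in the reviewed snippet.
struct TargetSketch {
  unsigned PointerAlignBits;
};

// Mirrors the reviewed helper: 32 bits when the vtable uses the relative
// layout, otherwise the target's pointer alignment.
unsigned getVtableGlobalVarAlignmentSketch(const VTableContextSketch &Ctx,
                                           const TargetSketch &Target) {
  return Ctx.RelativeLayout ? 32u : Target.PointerAlignBits;
}
```

Because the result depends on the vtable layout rather than on the variable alone, a name containing "vtable" signals that the helper is not a general-purpose alignment query.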

@tromey
Contributor

tromey commented May 13, 2025

Apologies if I missed it, but one thing I didn't see in the patch is a test for the case where a class is defined inside a function.

Given the discussion here, I guess this might not fully work correctly; but it seems to me that checking that the vtable symbol is global could be done and might provide some future-proofing.

Thanks.

Address comments from reviewers:
- Add 'vtable' string to the 'getGlobalVarAlignment()'
  function name to avoid any confusion on its usage.
- Add test cases to cover when a class is defined inside
  a function:
  - CBase (global) and CDerived (local)
  - CBase (local) and CDerived (local).
@CarlosAlbertoEnciso
Member Author

Apologies if I missed it, but one thing I didn't see in the patch is a test for the case where a class is defined inside a function.

Given the discussion here, I guess this might not fully work correctly; but it seems to me that checking that the vtable symbol is global could be done and might provide some future-proofing.

Added 2 test cases to cover a class defined inside a function,
using CBase and CDerived from the previous test cases:

  • CBase defined at global scope and CDerived defined at function scope.
  • CBase and CDerived both defined at function scope.
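In source form, the two cases listed above might look roughly like this. It is an illustrative sketch, not the test added to the patch: the class names CBase and CDerived come from this thread, while the function names and member bodies are hypothetical.

```cpp
#include <cassert>

// Case 1: CBase at global scope, CDerived local to a function.
struct CBase {
  virtual ~CBase() = default;
  virtual int f() { return 1; }
};

int globalBaseLocalDerived() {
  // Local class deriving from the global CBase; it still needs a vtable.
  struct CDerived : CBase {
    int f() override { return 2; }
  };
  CDerived D;
  CBase &B = D;
  return B.f();  // dispatches through the vtable of the local CDerived
}

// Case 2: CBase and CDerived both local to a function.
int localBaseLocalDerived() {
  // This local CBase shadows the global one inside the function.
  struct CBase {
    virtual ~CBase() = default;
    virtual int f() { return 3; }
  };
  struct CDerived : CBase {
    int f() override { return 4; }
  };
  CDerived D;
  CBase &B = D;
  return B.f();  // dispatches through the vtable of the local CDerived
}
```

In both cases the classes are function-local, but their vtables are still objects with static storage, which is why checking that the vtable symbol is described as a global is a meaningful test.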

@CarlosAlbertoEnciso
Member Author

Updated patch.

Collaborator

@dwblaikie dwblaikie left a comment


We can skip the LLVM testing here, I think - there are no LLVM changes to test (as far as LLVM is concerned, this is just another static member variable).

Got measurements on debug info size growth or any other metrics we should be considering?

Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category debuginfo lldb
9 participants