mpark
Member

@mpark mpark commented Aug 28, 2025

Problem Description

Consider the following example involving an anonymous union in a class template:

// a.h
template <typename T>
struct S { union { T x; }; };

using SI = S<int>;

// b.h
import "a.h";
inline void f(S<int> s = {}) { s.x; }

// main.cpp
import "a.h";
void g(S<int>) {}

import "b.h";
void h() { f(); }

This example currently triggers an assertion failure that looks like this:

clang: llvm-project/clang/lib/CodeGen/CGRecordLayout.h:204: unsigned int clang::CodeGen::CGRecordLayout::getLLVMFieldNo(const FieldDecl *) const: Assertion `FieldInfo.count(FD) && "Invalid field for record!"' failed.

We try to look up a FieldDecl in the FieldInfo of a CGRecordLayout, but no entry for it exists.

  • a.h has a ClassTemplateDecl (from the S primary template), and a ClassTemplateSpecializationDecl (from the using SI = S<int>;). Note, however, that the ClassTemplateSpecializationDecl is not an instantiation at this point.
  • b.h imports a.h, and requires a full instantiation of S<int> (from f(S<int> s)). It performs the instantiation, creates the FieldDecl instances for the anonymous union and int x;, forms the reference to them, and writes all of that to b.pcm.
  • main.cpp imports a.h, and it also requires a full instantiation of S<int> (from g(S<int>)). Since we don't have an instantiated version (we haven't imported b.h yet), we perform the instantiation again here.
  • main.cpp imports b.h, and that updates the S<int> instantiated in main.cpp with the information from b.h. The invocation of f() pulls in the s.x expression, which requires the FieldDecls (both the anonymous union and the int x;) in b.pcm. The anonymous union FieldDecl first tries to merge with the existing one, but the attempt fails and no merging occurs.
  • We flow into the CodeGen layer in this state, which produces a CGRecordLayout for the ClassTemplateSpecializationDecl, containing the FieldDecl for the anonymous union from main.cpp. We later try to look up the FieldDecl for the anonymous union from b.h. At this point, the expected behavior is for the FieldDecls to have been merged such that getCanonicalDecl returns the canonical version, which is present in the CGRecordLayout. However, since the merging failed, we end up looking up the FieldDecl for the anonymous union from b.h in a hashmap that only contains the FieldDecl for the anonymous union from main.cpp.
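
To make the last step concrete, here is a minimal toy model of the lookup. It is only a sketch: ToyFieldDecl, ToyRecordLayout and lookupImportedField are hypothetical stand-ins, not Clang types, and the assumption is that the layout is keyed by the canonical declaration, as the description above implies.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for a field declaration. A decl that was merged into
// another one reports that other decl as canonical; otherwise it is its own
// canonical decl.
struct ToyFieldDecl {
  const ToyFieldDecl *MergedInto = nullptr;
  const ToyFieldDecl *getCanonicalDecl() const {
    return MergedInto ? MergedInto : this;
  }
};

// Hypothetical stand-in for the record layout: canonical field -> field number.
struct ToyRecordLayout {
  std::unordered_map<const ToyFieldDecl *, unsigned> FieldInfo;

  // Returns true and sets No when the field is known. The real code asserts
  // instead of returning false.
  bool lookupFieldNo(const ToyFieldDecl *FD, unsigned &No) const {
    assert(FD && "null field");
    auto It = FieldInfo.find(FD->getCanonicalDecl());
    if (It == FieldInfo.end())
      return false;
    No = It->second;
    return true;
  }
};

// Builds the layout from the field instantiated in main.cpp, then looks up
// the copy of the same field that was deserialized from b.pcm.
bool lookupImportedField(bool MergeSucceeded) {
  ToyFieldDecl FromMain; // anonymous-union field instantiated in main.cpp
  ToyFieldDecl FromBPcm; // the same field as read from b.pcm
  if (MergeSucceeded)
    FromBPcm.MergedInto = &FromMain;

  ToyRecordLayout Layout;
  Layout.FieldInfo[FromMain.getCanonicalDecl()] = 0;

  unsigned No = 0;
  return Layout.lookupFieldNo(&FromBPcm, No);
}
```

In this model lookupImportedField(true) finds the field, while lookupImportedField(false) misses, which corresponds to the state in which the assertion fires.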

Why Merging Fails

The following piece of code is within ASTDeclReader::getAnonymousDeclForMerging:

  // If this is the first time, but we have parsed a declaration of the context,
  // build the anonymous declaration list from the parsed declaration.
  auto *PrimaryDC = getPrimaryDCForAnonymousDecl(DC);
  if (PrimaryDC && !cast<Decl>(PrimaryDC)->isFromASTFile()) {
    numberAnonymousDeclsWithin(PrimaryDC, [&](NamedDecl *ND, unsigned Number) {
      if (Previous.size() == Number)
        Previous.push_back(cast<NamedDecl>(ND->getCanonicalDecl()));
      else
        Previous[Number] = cast<NamedDecl>(ND->getCanonicalDecl());
    });
  }

https://github.com/llvm/llvm-project/blob/main/clang/lib/Serialization/ASTReaderDecl.cpp#L3430

Typically, an AST import action will insert a DeclContext into the AnonymousDeclarationsForMerging hashmap if no existing entry has been found. However, if the DeclContext was parsed within this source file, it doesn't go through the same codepath to insert itself into the hashmap. As such, this piece of code is trying to fill that gap by performing the insertion if the DeclContext is not from an AST file. It's meant to handle an example like this:

// b.h
struct S { union { int x; }; };

// main.cpp
struct S { union { int x; }; };  // parsed within the source!
import "b.h";  // merge with above
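
As a rough illustration of what the numbering buys us, the sketch below models anonymous members being numbered in declaration order and an imported member being matched by that number. All names (ToyDecl, numberAnonymousMembers, buildPrevious, findMergeTarget) are hypothetical; only the overall shape follows the snippet quoted above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a member declaration; an empty name models an
// anonymous member such as `union { int x; };`.
struct ToyDecl {
  std::string Name;
};

// Visits anonymous members in declaration order, passing each one its index
// among the anonymous members of the context.
template <typename Fn>
void numberAnonymousMembers(const std::vector<ToyDecl> &Members, Fn Visit) {
  unsigned Number = 0;
  for (const ToyDecl &D : Members)
    if (D.Name.empty())
      Visit(&D, Number++);
}

// Builds the per-context table that imported anonymous members are matched
// against. The table starts empty and numbers are consecutive, so each
// visited member is appended.
std::vector<const ToyDecl *> buildPrevious(const std::vector<ToyDecl> &Members) {
  std::vector<const ToyDecl *> Previous;
  numberAnonymousMembers(Members, [&](const ToyDecl *D, unsigned Number) {
    assert(Previous.size() == Number);
    Previous.push_back(D);
  });
  return Previous;
}

// An imported anonymous member with index Number merges with the existing
// entry, if there is one. With an empty table there is nothing to merge with.
const ToyDecl *findMergeTarget(const std::vector<const ToyDecl *> &Previous,
                               unsigned Number) {
  return Number < Previous.size() ? Previous[Number] : nullptr;
}
```

If the table is never built for a context, findMergeTarget has nothing to return, and the imported member stays a separate declaration.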

The problem is that this condition is not accurate enough in the face of template instantiations.

Consider this slightly modified version of the above example:

// a.h
template <typename T>
struct S { union { T x; }; };

using SI = S<int>;  // creates a `ClassTemplateSpecializationDecl` but doesn't instantiate.

// b.h
import "a.h";
void f(S<int>) {}  // full instantiation.

// main.cpp
import "a.h";
void g(S<int>) {}  // fully instantiate `S<int>` here, but it fills in
                   // the `ClassTemplateSpecializationDecl` skeleton imported from `a.h`.

import "b.h"; // <-- problem

Here, in main.cpp, the import of a.h brings in the skeleton (uninstantiated) instance of ClassTemplateSpecializationDecl, so no numbering of anonymous members takes place. The decl then gets fully instantiated by void g(S<int>), but still no numbering occurs. Semantically we're in the same situation as the previous example of having parsed the decl in the source (it just happens to be instantiated in the source rather than parsed), but isFromASTFile on the ClassTemplateSpecializationDecl reports true (since the instance was imported from a.h!). The numbering never happens, and as a result the merging doesn't happen either.
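
The sequence of events can be traced with a small toy model. This is a sketch under stated assumptions: ToyContext and the three functions are hypothetical names, and the only behavior modeled is the "not from an AST file" gate described above.

```cpp
#include <cassert>

// Hypothetical model of the state kept for one declaration context.
struct ToyContext {
  bool FromASTFile = false;       // the decl object itself was deserialized
  bool HasDefinition = false;     // members exist (parsed or instantiated)
  bool AnonymousNumbered = false; // anonymous members registered for merging
};

// Importing a.h: the uninstantiated skeleton arrives from the AST file.
ToyContext importSkeleton() {
  ToyContext C;
  C.FromASTFile = true;
  return C;
}

// `void g(S<int>) {}`: instantiation fills in the members in this TU.
// Nothing in this step numbers the anonymous members.
void instantiateLocally(ToyContext &C) {
  assert(!C.HasDefinition && "already instantiated");
  C.HasDefinition = true;
}

// Importing b.h: the fallback numbering runs only for contexts that are not
// from an AST file. Returns whether an incoming anonymous member would find
// something to merge with.
bool prepareForMerge(ToyContext &C) {
  if (C.HasDefinition && !C.FromASTFile)
    C.AnonymousNumbered = true;
  return C.AnonymousNumbered;
}
```

A context parsed in main.cpp (FromASTFile false, HasDefinition true) passes the gate, whereas importSkeleton() followed by instantiateLocally() does not, even though in both cases the members were created in the current TU.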

Proposed Solution

The solution proposed here is to check whether the point of instantiation is local, even if the decl is from an AST file. Additionally, we make an UpdateRecord only update the point of instantiation if it's not already set. This way, the point of instantiation always remains the first point of instantiation.
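
A minimal sketch of the two pieces, using hypothetical types (ToyLoc, ToySpecialization) in place of the real source-location machinery; the Local field is an assumption standing in for "this location belongs to the current TU's own source":

```cpp
#include <cassert>

// Hypothetical stand-in for a source location.
struct ToyLoc {
  unsigned Offset = 0; // 0 models an invalid location
  bool Local = false;  // true if the location is in this TU's own source
  bool isValid() const { return Offset != 0; }
};

struct ToySpecialization {
  ToyLoc PointOfInstantiation;
};

// Models the changed update step: an update coming from an imported AST file
// sets the point of instantiation only when none has been recorded yet.
void applyPointOfInstantiationUpdate(ToySpecialization &S, ToyLoc FromUpdate) {
  assert(FromUpdate.isValid() && "update carries no location");
  if (!S.PointOfInstantiation.isValid())
    S.PointOfInstantiation = FromUpdate;
}

// Models the extra check: was the first instantiation in this TU?
bool instantiatedLocally(const ToySpecialization &S) {
  return S.PointOfInstantiation.isValid() && S.PointOfInstantiation.Local;
}
```

With this shape, a specialization instantiated in main.cpp keeps its local point of instantiation when b.h's update arrives later, so the "local" answer does not flip after the import.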

Alternative Considered

An approach where we inject the numbering logic before an UpdateDecl call was considered. The idea was that since for UpdateDecl we already know we're not the first one, we know we should prepare to do some merging. This approach does actually work to address merge-anon-in-template-3.cpp where an UpdateDecl occurs. However, it does not fix the case where an UpdateDecl does not occur, e.g. in merge-anon-in-template-2.cpp. As such, I believe that the solution proposed in this PR is simpler and more robust.

@llvmbot llvmbot added the clang (Clang issues not falling into any other category) and clang:modules (C++20 modules and Clang Header Modules) labels Aug 28, 2025
@llvmbot
Member

llvmbot commented Aug 28, 2025

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Michael Park (mpark)

Changes

I'm having a hard time figuring out what the expected behavior is for this case.

If I get rid of the anonymous union and just keep S like this:

template <typename T>
struct S { T x; };

There are no issues. However, I see that merging occurs to avoid any issues in this case. With the union, merging logic also occurs there but it doesn't seem to work out properly. But, I'm not sure if this should be triggering merging logic at all?

If I structure the main.cpp like this:

import "hu-01.h";
import "hu-02.h";

void g(S<int>) {}
void h() { f(); }

There are also no issues, and this time there doesn't seem to be any merging either.

Question: Should this case (1) not be merging at all? or (2) is it correct to be merging, and we need to fix the logic for merging anonymous unions?


Full diff: https://github.com/llvm/llvm-project/pull/155948.diff

1 Files Affected:

  • (added) clang/test/Modules/anon-union-in-template.cpp (+47)
diff --git a/clang/test/Modules/anon-union-in-template.cpp b/clang/test/Modules/anon-union-in-template.cpp
new file mode 100644
index 0000000000000..97fcdc7db86be
--- /dev/null
+++ b/clang/test/Modules/anon-union-in-template.cpp
@@ -0,0 +1,47 @@
+// RUN: rm -rf %t
+// RUN: mkdir -p %t
+// RUN: split-file %s %t
+
+// RUN: %clang_cc1 -std=c++20 -fmodule-name=hu-01 -emit-header-unit -xc++-user-header %t/hu-01.h \
+// RUN:  -o %t/hu-01.pcm
+
+// RUN: %clang_cc1 -std=c++20 -fmodule-name=hu-02 -emit-header-unit -xc++-user-header %t/hu-02.h \
+// RUN:  -Wno-experimental-header-units \
+// RUN:  -fmodule-map-file=%t/hu-01.map -fmodule-file=hu-01=%t/hu-01.pcm \
+// RUN:  -o %t/hu-02.pcm
+
+// RUN: %clang_cc1 -std=c++20 -emit-obj %t/main.cpp \
+// RUN:  -Wno-experimental-header-units \
+// RUN:  -fmodule-map-file=%t/hu-01.map -fmodule-file=hu-01=%t/hu-01.pcm \
+// RUN:  -fmodule-map-file=%t/hu-02.map -fmodule-file=hu-02=%t/hu-02.pcm
+
+//--- hu-01.map
+module "hu-01" {
+  header "hu-01.h"
+  export *
+}
+
+//--- hu-02.map
+module "hu-02" {
+  header "hu-02.h"
+  export *
+}
+
+//--- hu-01.h
+#pragma once
+
+template <typename T>
+struct S { union { T x; }; };
+
+using SI = S<int>;
+
+//--- hu-02.h
+import "hu-01.h";
+inline void f(S<int> s = {}) { s.x; }
+
+//--- main.cpp
+import "hu-01.h";
+void g(S<int>) {}
+
+import "hu-02.h";
+void h() { f(); }

Member

@ChuanqiXu9 ChuanqiXu9 left a comment

I think we need to merge it. Why do you think we don't need to merge it?

@mpark
Member Author

mpark commented Aug 29, 2025

It's just that this is a failing test case that I'd like to ship with a fix.

@mpark mpark force-pushed the anon-union-in-template branch from 375388f to 7940530 Compare September 5, 2025 06:31
@mpark
Member Author

mpark commented Sep 5, 2025

I think we need to merge it. Why do you think we don't need to merge it?

oh goodness... I only just realized now that you meant that the decls should be merged. At the time I thought you were saying we should merge the PR 😂

@mpark mpark changed the title [C++20][Modules] Add a test for field info assertion failure. [C++20][Modules] Fix merging of anonymous members in templates. Sep 5, 2025
@mpark mpark changed the title [C++20][Modules] Fix merging of anonymous members in templates. [C++20][Modules] Fix merging of anonymous members of class templates. Sep 5, 2025
@mpark mpark force-pushed the anon-union-in-template branch from 7940530 to c69375f Compare September 5, 2025 19:01
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

How do these decls get merged if:

// a.h
template <typename T>
struct S { union { T x; }; };

// b.h
import "a.h";
inline void f(S<int> s = {}) { s.x; }

// main.cpp
import "a.h";
void g(S<int>) {}

import "b.h";
void h() { f(); }

I feel we can get some ideas from it.

  auto InstantiatedLocally = [](Decl *D, SourceManager &SourceMgr) -> bool {
    auto *CTSD = dyn_cast<ClassTemplateSpecializationDecl>(D);
    return CTSD && CTSD->getPointOfInstantiation().isValid() &&
           SourceMgr.isLocalSourceLocation(CTSD->getPointOfInstantiation());
Member

This is really not ideal. I only use source locations for semantic analysis when debugging... it is a Pandora's box. I really don't want that.

Member Author

Ah... great point. Let me see if I can capture this into a bit flag instead.

Member Author

Okay, I've changed the approach a bit to where this information is essentially tracked in a separate bit flag in ClassTemplateSpecializationDecl called InstantiatedLocally.
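
Roughly, the idea can be modeled as below. This is a toy sketch, not the code in the PR: ToySpecialization and both functions are hypothetical, and the exact form of the relaxed check is an assumption based on the description.

```cpp
#include <cassert>

// Hypothetical model of the relevant bits of a class template specialization.
struct ToySpecialization {
  bool FromASTFile = false;         // decl object was deserialized
  bool Instantiated = false;        // members have been created
  bool InstantiatedLocally = false; // models the new bit flag
};

// An instantiation triggered in the current TU sets the bit. A decl whose
// instantiation was read from an AST file would leave it unset.
void instantiateInThisTU(ToySpecialization &S) {
  assert(!S.Instantiated && "already instantiated");
  S.Instantiated = true;
  S.InstantiatedLocally = true;
}

// Assumed shape of the relaxed gate: number the anonymous members if the
// context is not from an AST file, or if it was instantiated locally.
bool shouldNumberAnonymousMembers(const ToySpecialization &S) {
  return !S.FromASTFile || S.InstantiatedLocally;
}
```

The flag carries the same answer as the point-of-instantiation check without consulting source locations.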

@mpark
Member Author

mpark commented Sep 11, 2025

How do these decls get merged if:

// a.h
template <typename T>
struct S { union { T x; }; };

// b.h
import "a.h";
inline void f(S<int> s = {}) { s.x; }

// main.cpp
import "a.h";
void g(S<int>) {}

import "b.h";
void h() { f(); }

I feel we can get some ideas from it.

In this case, because a.h lacks the using SI = S<int>;, there's no ClassTemplateSpecializationDecl to be imported from a.h at all when main.cpp imports it. We create a new instance of ClassTemplateSpecializationDecl within main.cpp, for which isFromASTFile is false (since we created it within main.cpp). The "not isFromASTFile" condition kicks in for this case, and it therefore behaves correctly.

@mpark mpark force-pushed the anon-union-in-template branch from c69375f to 302a2ea Compare September 13, 2025 03:50
@llvmbot llvmbot added the clang:frontend Language frontend issues, e.g. anything involving "Sema" label Sep 13, 2025
@mpark mpark force-pushed the anon-union-in-template branch from 302a2ea to c9f95c1 Compare September 13, 2025 04:26
@mpark mpark force-pushed the anon-union-in-template branch from c9f95c1 to 899fb7f Compare September 13, 2025 04:33
@llvm llvm deleted a comment from github-actions bot Sep 13, 2025
@mpark mpark requested a review from ChuanqiXu9 September 13, 2025 04:34
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category