[clang-repl] Fix Value::setRawBits unit confusion and right-size raw storage. #200886

Merged
vgvassilev merged 1 commit into llvm:main from vgvassilev:clang-repl-value-byte-units Jun 4, 2026

Conversation

@vgvassilev
Contributor

Value::setRawBits had inconsistent units: the default value and the size assert treated the parameter as bytes (sizeof(Storage)), while the memcpy treated it as bits (NBits / 8). A caller passing the natural byte count (e.g. sizeof(long long)) ended up copying only sizeof(T)/8 bytes -- one byte for an 8-byte payload, leaving the rest stale. The one in-tree caller compensated by multiplying by 8, hiding the bug.
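To make the unit mismatch concrete, here is a minimal standalone sketch. `PreFixValue` and `roundTripPreFix` are hypothetical names, and the union is a cut-down model of the pre-fix `Storage` rather than the real `clang::Value`. The default argument and the assert treat the parameter as bytes, while the `memcpy` divides by 8. Passing `sizeof(long long)` therefore copies a single byte.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the pre-fix clang::Value storage (not the real
// class). The default argument and assert treat NBits as bytes, but the
// memcpy divides by 8 as if it were a bit count.
struct PreFixValue {
  union Storage {
    long long m_LongLong;
    long double m_LongDouble;
    void *m_Ptr;
    unsigned char m_RawBits[sizeof(long double) * 8];
  } Data;

  void setRawBits(const void *Ptr, unsigned NBits = sizeof(Storage)) {
    assert(NBits <= sizeof(Storage) && "Greater than the total size");
    std::memcpy(Data.m_RawBits, Ptr, NBits / 8); // copies NBits / 8 bytes
  }
};

// Starts from a zeroed payload and passes the natural byte count (8).
// Only 8 / 8 == 1 byte is copied, so the value does not round-trip.
long long roundTripPreFix(long long Src) {
  PreFixValue V;
  V.Data.m_LongLong = 0;
  V.setRawBits(&Src, sizeof(Src));
  return V.Data.m_LongLong;
}
```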

Rename the parameter to NBytes and drop the / 8 so the API name, default, assert, and memcpy all agree on bytes. Update the caller in InterpreterValuePrinter.cpp to pass ElemSize directly.
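The following is a rough sketch of the byte-consistent version together with a caller that walks a one-dimensional array the way the value printer does. It advances the address by `ElemSize` per index and passes `ElemSize` straight through. `PostFixValue` and `sumIntArray` are hypothetical illustrations, not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the post-fix clang::Value storage (not the real
// class): the name, default, assert, and memcpy all agree on bytes.
struct PostFixValue {
  union Storage {
    int m_Int;
    long long m_LongLong;
    long double m_LongDouble;
    void *m_Ptr;
    unsigned char m_RawBits[sizeof(long double)];
  } Data;

  void setRawBits(const void *Ptr, unsigned NBytes = sizeof(Storage)) {
    assert(NBytes <= sizeof(Storage) && "Greater than the total size");
    std::memcpy(Data.m_RawBits, Ptr, NBytes);
  }
};

// Mimics the single-dimension branch of the printer loop: compute each
// element's address, then copy exactly ElemSize bytes into a scratch Value.
long long sumIntArray(const int *Arr, std::size_t N) {
  const std::size_t ElemSize = sizeof(int);
  long long Sum = 0;
  for (std::size_t Idx = 0; Idx < N; ++Idx) {
    std::uintptr_t Offset =
        reinterpret_cast<std::uintptr_t>(Arr) + Idx * ElemSize;
    PostFixValue InnerV;
    InnerV.setRawBits(reinterpret_cast<const void *>(Offset),
                      static_cast<unsigned>(ElemSize));
    Sum += InnerV.Data.m_Int;
  }
  return Sum;
}
```

With the `/ 8` gone, the caller no longer needs the compensating `* 8`.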

Right-size the Storage::m_RawBits array while we are here: it was sizeof(long double) * 8 bytes, which reads like a bit/byte confusion since the widest typed member of the union is long double itself. The oversized array made sizeof(Value) ~144 bytes on x86_64 instead of ~40, bloating every copy/move of a Value.
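The size effect can be checked at compile time with two simplified unions. These are hypothetical models of the old and new `Storage`, and the real union's `REPL_BUILTIN_TYPES` members are elided. The sketch only asserts relative sizes, because exact byte counts depend on the target's `long double`.

```cpp
#include <cstddef>

// Hypothetical models of the old and new Storage layouts (members elided).
union OversizedStorage {
  long long m_LongLong;
  long double m_LongDouble;
  void *m_Ptr;
  unsigned char m_RawBits[sizeof(long double) * 8]; // 8x the widest member
};

union RightSizedStorage {
  long long m_LongLong;
  long double m_LongDouble;
  void *m_Ptr;
  unsigned char m_RawBits[sizeof(long double)]; // as wide as long double
};

// In the old layout the raw array alone dictates the union's size.
static_assert(sizeof(OversizedStorage) >= 8 * sizeof(long double),
              "old layout is at least 8x long double");
// In the new layout the raw array no longer inflates the union.
static_assert(sizeof(RightSizedStorage) >= sizeof(long double),
              "new layout still holds a long double");

constexpr std::size_t savedBytes() {
  return sizeof(OversizedStorage) - sizeof(RightSizedStorage);
}
```

Every copy or move of a `Value` carries the whole union, so this saving applies to each one.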

Add a regression test exercising setRawBits with both an explicit byte count and the default argument. Pre-fix the test fails for both: the explicit-count branch copies 1 byte instead of 8, and the default branch copies sizeof(Storage)/8 bytes instead of the full union width.
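The two branches of the regression test can be mirrored outside the interpreter with a hypothetical `FixedValue` stand-in. This is a sketch, not the gtest in the diff. One deliberate choice differs from the PR's test: the default-argument buffer here is sized `sizeof(FixedValue::Storage)` rather than `sizeof(long double)`. The default copies the full union width, so this sizing keeps the read in bounds even if the union grows.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical post-fix stand-in for clang::Value (not the real class).
struct FixedValue {
  union Storage {
    long long m_LongLong;
    long double m_LongDouble;
    void *m_Ptr;
    unsigned char m_RawBits[sizeof(long double)];
  } Data;

  void setRawBits(const void *Ptr, unsigned NBytes = sizeof(Storage)) {
    assert(NBytes <= sizeof(Storage) && "Greater than the total size");
    std::memcpy(Data.m_RawBits, Ptr, NBytes);
  }
};

// Branch 1: explicit byte count must round-trip every byte.
bool explicitCountRoundTrips(long long Src) {
  FixedValue V;
  V.Data.m_LongLong = 0;
  V.setRawBits(&Src, sizeof(Src));
  return V.Data.m_LongLong == Src;
}

// Branch 2: the default argument copies sizeof(Storage) bytes, so the
// source buffer is padded out to the full union width.
bool defaultCountRoundTrips(long long Src) {
  FixedValue V;
  V.Data.m_LongLong = 0;
  unsigned char Buf[sizeof(FixedValue::Storage)] = {};
  std::memcpy(Buf, &Src, sizeof(Src));
  V.setRawBits(Buf);
  return V.Data.m_LongLong == Src;
}
```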

@vgvassilev vgvassilev requested a review from AaronBallman June 1, 2026 17:49
@llvmorg-github-actions Bot added the clang (Clang issues not falling into any other category) and clang:frontend (Language frontend issues, e.g. anything involving "Sema") labels Jun 1, 2026
@vgvassilev vgvassilev force-pushed the clang-repl-value-byte-units branch from 89caae4 to 926d06e Compare June 1, 2026 17:49
@llvmorg-github-actions

@llvm/pr-subscribers-clang

Author: Vassil Vassilev (vgvassilev)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/200886.diff

4 Files Affected:

  • (modified) clang/include/clang/Interpreter/Value.h (+5-2)
  • (modified) clang/lib/Interpreter/InterpreterValuePrinter.cpp (+1-1)
  • (modified) clang/lib/Interpreter/Value.cpp (+3-3)
  • (modified) clang/unittests/Interpreter/InterpreterTest.cpp (+31)
diff --git a/clang/include/clang/Interpreter/Value.h b/clang/include/clang/Interpreter/Value.h
index b91301e6096eb..23ef123ded8ee 100644
--- a/clang/include/clang/Interpreter/Value.h
+++ b/clang/include/clang/Interpreter/Value.h
@@ -98,7 +98,7 @@ class REPL_EXTERNAL_VISIBILITY Value {
     REPL_BUILTIN_TYPES
 #undef X
     void *m_Ptr;
-    unsigned char m_RawBits[sizeof(long double) * 8]; // widest type
+    unsigned char m_RawBits[sizeof(long double)]; // widest typed member
   };
 
 public:
@@ -140,7 +140,10 @@ class REPL_EXTERNAL_VISIBILITY Value {
 
   void *getPtr() const;
   void setPtr(void *Ptr) { Data.m_Ptr = Ptr; }
-  void setRawBits(void *Ptr, unsigned NBits = sizeof(Storage));
+  /// Copy `NBytes` bytes from `Ptr` into the raw storage. Default copies
+  /// the full Storage width. Used by the value printer to read a single
+  /// array element through a typed lens without an extra heap allocation.
+  void setRawBits(void *Ptr, unsigned NBytes = sizeof(Storage));
 
 #define X(type, name)                                                          \
   void set##name(type Val) { Data.m_##name = Val; }                            \
diff --git a/clang/lib/Interpreter/InterpreterValuePrinter.cpp b/clang/lib/Interpreter/InterpreterValuePrinter.cpp
index 1754e7812469a..79f1e2b6571c6 100644
--- a/clang/lib/Interpreter/InterpreterValuePrinter.cpp
+++ b/clang/lib/Interpreter/InterpreterValuePrinter.cpp
@@ -204,7 +204,7 @@ std::string Interpreter::ValueDataToString(const Value &V) const {
       if (ElemTy->isBuiltinType()) {
         // Single dim arrays, advancing.
         uintptr_t Offset = (uintptr_t)V.getPtr() + Idx * ElemSize;
-        InnerV.setRawBits((void *)Offset, ElemSize * 8);
+        InnerV.setRawBits((void *)Offset, ElemSize);
       } else {
         // Multi dim arrays, position to the next dimension.
         size_t Stride = ElemCount / N;
diff --git a/clang/lib/Interpreter/Value.cpp b/clang/lib/Interpreter/Value.cpp
index d4c9d51ffcb61..b985361ed748a 100644
--- a/clang/lib/Interpreter/Value.cpp
+++ b/clang/lib/Interpreter/Value.cpp
@@ -229,9 +229,9 @@ void *Value::getPtr() const {
   return Data.m_Ptr;
 }
 
-void Value::setRawBits(void *Ptr, unsigned NBits /*= sizeof(Storage)*/) {
-  assert(NBits <= sizeof(Storage) && "Greater than the total size");
-  memcpy(/*dest=*/Data.m_RawBits, /*src=*/Ptr, /*nbytes=*/NBits / 8);
+void Value::setRawBits(void *Ptr, unsigned NBytes /*= sizeof(Storage)*/) {
+  assert(NBytes <= sizeof(Storage) && "Greater than the total size");
+  memcpy(/*dest=*/Data.m_RawBits, /*src=*/Ptr, /*nbytes=*/NBytes);
 }
 
 QualType Value::getType() const {
diff --git a/clang/unittests/Interpreter/InterpreterTest.cpp b/clang/unittests/Interpreter/InterpreterTest.cpp
index 9ff9092524d21..2df29b3d5def5 100644
--- a/clang/unittests/Interpreter/InterpreterTest.cpp
+++ b/clang/unittests/Interpreter/InterpreterTest.cpp
@@ -421,6 +421,37 @@ TEST_F(InterpreterTest, Value) {
   EXPECT_STREQ(prettyPrint.c_str(), "(D) (One) : unsigned int 1\n");
 }
 
+// Regression: Value::setRawBits's NBytes parameter must be interpreted as a
+// byte count end-to-end. Before this was fixed, the parameter was named
+// NBits and the memcpy divided by 8, so a caller passing sizeof(T) (the
+// natural byte count) ended up copying only sizeof(T)/8 bytes -- leaving
+// the upper bytes uninitialised. The only in-tree caller compensated by
+// multiplying by 8, hiding the bug.
+TEST_F(InterpreterTest, ValueSetRawBitsCopiesByteCount) {
+  std::vector<const char *> Args;
+  std::unique_ptr<Interpreter> Interp = createInterpreter(Args);
+
+  // Explicit byte count: writing sizeof(long long) bytes must round-trip
+  // every byte. Pre-fix this copied 1 byte (8 / 8) and left the upper 7
+  // bytes stale.
+  Value V;
+  llvm::cantFail(Interp->ParseAndExecute("long long x = 0; x", &V));
+  ASSERT_EQ(V.getKind(), Value::K_LongLong);
+  long long Src = 0x0123456789ABCDEFLL;
+  V.setRawBits(&Src, sizeof(Src));
+  EXPECT_EQ(V.getLongLong(), Src);
+
+  // Default NBytes argument copies sizeof(Storage). Pre-fix this copied
+  // sizeof(Storage) / 8 bytes, dropping the high half of an 8-byte payload.
+  Value V2;
+  llvm::cantFail(Interp->ParseAndExecute("long long y = 0; y", &V2));
+  ASSERT_EQ(V2.getKind(), Value::K_LongLong);
+  unsigned char Buf[sizeof(long double)] = {};
+  std::memcpy(Buf, &Src, sizeof(Src));
+  V2.setRawBits(Buf);
+  EXPECT_EQ(V2.getLongLong(), Src);
+}
+
 TEST_F(InterpreterTest, TranslationUnit_CanonicalDecl) {
   std::vector<const char *> Args;
   std::unique_ptr<Interpreter> Interp = createInterpreter(Args);

@AaronBallman AaronBallman requested a review from tbaederr June 3, 2026 18:13
Contributor

@AaronBallman AaronBallman left a comment


Should this come with a release note?

 #undef X
     void *m_Ptr;
-    unsigned char m_RawBits[sizeof(long double) * 8]; // widest type
+    unsigned char m_RawBits[sizeof(long double)]; // widest typed member
Contributor


I think this is still a problem, but a different problem: this doesn't handle _BitInt properly, nor __int128, etc., and the size of a long double is quite platform-specific.
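One way to frame the concern is to size a raw buffer by the widest of an explicit list of types instead of assuming `long double` is widest. The sketch below is a hypothetical helper (`widestOf`, `RawSize`). It is not what this PR does, since the follow-up was deferred to a separate PR. `__int128` is a GCC/Clang extension, guarded by `__SIZEOF_INT128__`. Arbitrary-width `_BitInt(N)` values are not covered by any fixed list like this.

```cpp
#include <algorithm>
#include <cstddef>
#include <initializer_list>

// Hypothetical helper: the largest sizeof among the listed types.
template <typename... Ts> constexpr std::size_t widestOf() {
  return std::max({sizeof(Ts)...});
}

// sizeof(long double) varies by target (e.g. 8, 12, or 16 bytes), so a
// buffer sized only by long double can be narrower than __int128.
#if defined(__SIZEOF_INT128__)
constexpr std::size_t RawSize = widestOf<long double, void *, __int128>();
#else
constexpr std::size_t RawSize = widestOf<long double, void *, long long>();
#endif

static_assert(RawSize >= sizeof(long double), "covers long double");
```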

Contributor Author


Yes, you are correct, but I will address that in a separate PR.

@tbaederr tbaederr removed their request for review June 4, 2026 03:40
@vgvassilev
Contributor Author

Should this come with a release note?

This is more of a memory optimization; I don't think we need to put anything in the readme.

Contributor

@AaronBallman AaronBallman left a comment


LGTM!

@vgvassilev vgvassilev merged commit 3a1d8a8 into llvm:main Jun 4, 2026
10 checks passed
@vgvassilev vgvassilev deleted the clang-repl-value-byte-units branch June 4, 2026 13:16

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
