[libc++][test] Don't pass ill-formed UTF-8 to MAKE_STRING_VIEW #136403

Open: cpplearner wants to merge 1 commit into main
cpplearner (Contributor)

The tests escaped_output.unicode.pass.cpp and fill.unicode.pass.cpp use SV (which expands to MAKE_STRING_VIEW) to create a string view of CharT. MAKE_STRING_VIEW internally creates a u8 string literal, which is potentially non-portable when the literal contains a numeric escape sequence such as \x80 (see CWG 1656). The latest MSVC preview (v17.14.0-pre.3.0) emits warning C5321 for this.

These tests don't actually need to produce a u8 string literal. (In fact, the affected lines are exercised only if CharT is char.) It seems possible to simply avoid SV in these places.
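A minimal sketch of the pattern described above, assuming C++17 or later. The function `malformed_fill_spec_size` is hypothetical and does not exist in the libc++ test suite; it only mirrors the shape of the affected tests, where the ill-formed inputs sit behind a `CharT == char` check, so a plain narrow `std::string_view` literal is enough and no u8 literal is ever created.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical helper (not a libc++ test utility) mirroring the affected tests:
// the ill-formed UTF-8 case is only reached when CharT is char.
template <class CharT>
constexpr std::size_t malformed_fill_spec_size() {
  if constexpr (std::is_same_v<CharT, char>) {
    // Ordinary narrow literal: each \xNN escape yields exactly one char,
    // so the view holds '{', ':', 0xED, 0xA0, 0x80, '^', '}'.
    std::string_view spec{"{:\xed\xa0\x80^}"};
    return spec.size();
  } else {
    // Other character types never see the ill-formed input,
    // so no wide, u8, u16 or u32 literal has to be spelled for it.
    return 0;
  }
}

static_assert(malformed_fill_spec_size<char>() == 7);
static_assert(malformed_fill_spec_size<wchar_t>() == 0);
```

The tests themselves use `std::same_as` from `<concepts>` for the same check; `std::is_same_v` is used here only so the sketch also compiles as C++17.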

cpplearner requested a review from a team as a code owner on April 19, 2025 04:32
llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) on Apr 19, 2025
llvmbot (Member)

llvmbot commented Apr 19, 2025

@llvm/pr-subscribers-libcxx

Author: S. B. Tam (cpplearner)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/136403.diff

2 Files Affected:

  • (modified) libcxx/test/std/utilities/format/format.functions/escaped_output.unicode.pass.cpp (+1-1)
  • (modified) libcxx/test/std/utilities/format/format.functions/fill.unicode.pass.cpp (+23-13)
diff --git a/libcxx/test/std/utilities/format/format.functions/escaped_output.unicode.pass.cpp b/libcxx/test/std/utilities/format/format.functions/escaped_output.unicode.pass.cpp
index c4adf601c40af..eb27c70954664 100644
--- a/libcxx/test/std/utilities/format/format.functions/escaped_output.unicode.pass.cpp
+++ b/libcxx/test/std/utilities/format/format.functions/escaped_output.unicode.pass.cpp
@@ -337,7 +337,7 @@ void test_string() {
 
   // Ill-formed
   if constexpr (sizeof(CharT) == 1)
-    test_format(SV(R"("\x{80}")"), SV("{:?}"), SV("\x80"));
+    test_format(SV(R"("\x{80}")"), SV("{:?}"), "\x80");
 
   // *** P2713R1 examples ***
   test_format(SV(R"(["\u{301}"])"), SV("[{:?}]"), SV("\u0301"));
diff --git a/libcxx/test/std/utilities/format/format.functions/fill.unicode.pass.cpp b/libcxx/test/std/utilities/format/format.functions/fill.unicode.pass.cpp
index cd555e1ab9ce8..76f756ae91483 100644
--- a/libcxx/test/std/utilities/format/format.functions/fill.unicode.pass.cpp
+++ b/libcxx/test/std/utilities/format/format.functions/fill.unicode.pass.cpp
@@ -75,30 +75,40 @@ void test() {
 
   // Invalid Unicode Scalar Values
   if constexpr (std::same_as<CharT, char>) {
-    check_exception("The format specifier contains malformed Unicode characters", SV("{:\xed\xa0\x80^}"), 42); // U+D800
-    check_exception("The format specifier contains malformed Unicode characters", SV("{:\xed\xa0\xbf^}"), 42); // U+DBFF
-    check_exception("The format specifier contains malformed Unicode characters", SV("{:\xed\xbf\x80^}"), 42); // U+DC00
-    check_exception("The format specifier contains malformed Unicode characters", SV("{:\xed\xbf\xbf^}"), 42); // U+DFFF
+    check_exception("The format specifier contains malformed Unicode characters",
+                    std::string_view{"{:\xed\xa0\x80^}"},
+                    42); // U+D800
+    check_exception("The format specifier contains malformed Unicode characters",
+                    std::string_view{"{:\xed\xa0\xbf^}"},
+                    42); // U+DBFF
+    check_exception("The format specifier contains malformed Unicode characters",
+                    std::string_view{"{:\xed\xbf\x80^}"},
+                    42); // U+DC00
+    check_exception("The format specifier contains malformed Unicode characters",
+                    std::string_view{"{:\xed\xbf\xbf^}"},
+                    42); // U+DFFF
 
-    check_exception(
-        "The format specifier contains malformed Unicode characters", SV("{:\xf4\x90\x80\x80^}"), 42); // U+110000
-    check_exception(
-        "The format specifier contains malformed Unicode characters", SV("{:\xf4\x90\xbf\xbf^}"), 42); // U+11FFFF
+    check_exception("The format specifier contains malformed Unicode characters",
+                    std::string_view{"{:\xf4\x90\x80\x80^}"},
+                    42); // U+110000
+    check_exception("The format specifier contains malformed Unicode characters",
+                    std::string_view{"{:\xf4\x90\xbf\xbf^}"},
+                    42); // U+11FFFF
 
     check_exception("The format specifier contains malformed Unicode characters",
-                    SV("{:\x80^}"),
+                    std::string_view{"{:\x80^}"},
                     42); // Trailing code unit with no leading one.
     check_exception("The format specifier contains malformed Unicode characters",
-                    SV("{:\xc0^}"),
+                    std::string_view{"{:\xc0^}"},
                     42); // Missing trailing code unit.
     check_exception("The format specifier contains malformed Unicode characters",
-                    SV("{:\xe0\x80^}"),
+                    std::string_view{"{:\xe0\x80^}"},
                     42); // Missing trailing code unit.
     check_exception("The format specifier contains malformed Unicode characters",
-                    SV("{:\xf0\x80^}"),
+                    std::string_view{"{:\xf0\x80^}"},
                     42); // Missing two trailing code units.
     check_exception("The format specifier contains malformed Unicode characters",
-                    SV("{:\xf0\x80\x80^}"),
+                    std::string_view{"{:\xf0\x80\x80^}"},
                     42); // Missing trailing code unit.
 
 #ifndef TEST_HAS_NO_WIDE_CHARACTERS
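The inputs in fill.unicode.pass.cpp are malformed for two different reasons: some are structurally broken (a trailing code unit with no leading one, or a missing trailing code unit), and the rest decode to something that is not a Unicode scalar value (a surrogate in U+D800..U+DFFF, or a value above U+10FFFF). A minimal sketch of that distinction, assuming C++17; `decode_first`, `is_scalar_value` and `kMalformed` are hypothetical names, not libc++ internals, and overlong encodings are deliberately not rejected. Decoding the bytes this way also shows that a few trailing comments in the test are approximate (`\xed\xa0\xbf` is U+D83F, `\xed\xbf\x80` is U+DFC0, and `\xf4\x90\xbf\xbf` is U+110FFF), although every one of these values is still invalid, so each case remains malformed as intended.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical sentinel for "not a structurally complete UTF-8 sequence".
inline constexpr std::uint32_t kMalformed = 0xFFFFFFFF;

// Decodes the first UTF-8 sequence in `s` by structure only (lead byte plus
// the right number of 10xxxxxx trailing bytes). Surrogates, values above
// U+10FFFF, and overlong forms are returned as-is rather than rejected.
constexpr std::uint32_t decode_first(std::string_view s) {
  if (s.empty())
    return kMalformed;
  auto unit = [&](std::size_t i) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(s[i]));
  };

  const std::uint32_t lead = unit(0);
  std::size_t len = 0;
  std::uint32_t cp = 0;
  if (lead < 0x80) {                  // 0xxxxxxx
    len = 1;
    cp = lead;
  } else if ((lead & 0xE0) == 0xC0) { // 110xxxxx
    len = 2;
    cp = lead & 0x1F;
  } else if ((lead & 0xF0) == 0xE0) { // 1110xxxx
    len = 3;
    cp = lead & 0x0F;
  } else if ((lead & 0xF8) == 0xF0) { // 11110xxx
    len = 4;
    cp = lead & 0x07;
  } else {
    return kMalformed; // 10xxxxxx (no leading code unit) or 0xF8..0xFF
  }

  if (s.size() < len)
    return kMalformed; // input ends before the sequence is complete
  for (std::size_t i = 1; i < len; ++i) {
    if ((unit(i) & 0xC0) != 0x80)
      return kMalformed; // missing trailing code unit
    cp = (cp << 6) | (unit(i) & 0x3F);
  }
  return cp;
}

// A Unicode scalar value is any code point up to U+10FFFF except the
// surrogates U+D800..U+DFFF.
constexpr bool is_scalar_value(std::uint32_t cp) {
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Fill characters taken from the diff (the text after "{:").
static_assert(decode_first("\xed\xa0\x80^}") == 0xD800);
static_assert(!is_scalar_value(decode_first("\xf4\x90\x80\x80^}"))); // U+110000
static_assert(decode_first("\x80^}") == kMalformed);
static_assert(decode_first("\xe0\x80^}") == kMalformed);
static_assert(is_scalar_value(decode_first("\xe2\x82\xac^}"))); // U+20AC
```

Keeping decoding separate from the scalar-value check mirrors the two groups in the test (invalid scalar values versus stray or missing code units); a complete validator would additionally reject overlong forms such as `\xc0\x80`.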

frederick-vs-ja (Contributor) left a comment

LGTM!

That said, the PR description looks a bit outdated. CWG1656 has been resolved by P2029R4, which specifies that each of \x80..\xff in a u8 string literal produces exactly one char8_t array element. Before P2029R4 such use seemed to be ill-formed, but older compiler versions silently accepted it with differing meanings.

philnik777 (Contributor) left a comment

This looks sensible, but I'd really like to have @mordante's input on this before merging.

cpplearner (Contributor, Author)

ping @mordante


4 participants