[clang] Constant-evaluate format strings as last resort #135864
Clang's -Wformat checker can see through an inconsistent set of operations. We can fall back to the recently-updated constant string evaluation infrastructure when Clang's initial evaluation fails, for a second chance at figuring out what the format string is intended to be. This enables analyzing format strings that were built at compile time with std::string and other constexpr-capable types in C++, as long as all pieces are also constexpr-visible, and a number of other patterns.
Radar-ID: rdar://99940060
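For illustration, here is a minimal C++17 sketch of the kind of pattern the description refers to: a format string assembled from pieces during constant evaluation and then passed to a printf-family function. The names concat, kKeyValueFormat and formatKeyValue are hypothetical. Whether a given Clang build with this change actually checks the snprintf arguments against "%s=%d" is an assumption. The PR's own tests use a constexpr my_string class under C++20 instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Concatenates two string literals into one null-terminated std::array at
// compile time (C++17: std::array's non-const operator[] is constexpr).
template <std::size_t A, std::size_t B>
constexpr std::array<char, A + B - 1> concat(const char (&a)[A],
                                             const char (&b)[B]) {
  std::array<char, A + B - 1> out{};
  for (std::size_t i = 0; i + 1 < A; ++i)  // copy `a` without its null
    out[i] = a[i];
  for (std::size_t i = 0; i < B; ++i)  // copy `b`, including its null
    out[A - 1 + i] = b[i];
  return out;
}

// "%s=%d", assembled from two pieces during constant evaluation.
constexpr auto kKeyValueFormat = concat("%s", "=%d");
static_assert(kKeyValueFormat.size() == 6, "five characters plus the null");

// Hypothetical helper: writes "key=value" into buf and returns snprintf's
// result. The format argument is not a string literal, but every byte of it
// is visible to constant evaluation.
int formatKeyValue(char *buf, std::size_t size, const char *key, int value) {
  return std::snprintf(buf, size, kKeyValueFormat.data(), key, value);
}
```

Every byte of kKeyValueFormat comes from a constexpr variable, which satisfies the "all pieces are also constexpr-visible" condition from the description. A buffer filled at run time would still fall outside the fallback.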
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang
Author: None (apple-fcloutier)
Changes
I asked on the forums and people were generally supportive of the idea, so: Clang's -Wformat checker can see through an inconsistent set of operations. We can fall back to the recently-updated constant string evaluation infrastructure when Clang's initial evaluation fails for a second chance at figuring out what the format string is intended to be. This enables analyzing format strings that were built at compile-time with std::string and other constexpr-capable types in C++, as long as all pieces are also constexpr-visible, and a number of other patterns. As a side effect, it also enables
Radar-ID: rdar://99940060
Full diff: https://github.com/llvm/llvm-project/pull/135864.diff
7 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 77bf3355af9da..05566d66a65d2 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -265,6 +265,9 @@ related warnings within the method body.
``format_matches`` accepts an example valid format string as its third
argument. For more information, see the Clang attributes documentation.
+- Format string checking now supports the compile-time evaluation of format
+ strings as a fallback mechanism.
+
- Introduced a new statement attribute ``[[clang::atomic]]`` that enables
fine-grained control over atomic code generation on a per-statement basis.
Supported options include ``[no_]remote_memory``,
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index 20f70863a05b3..78eda8bc3c43e 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -791,7 +791,14 @@ class Expr : public ValueStmt {
const Expr *PtrExpression, ASTContext &Ctx,
EvalResult &Status) const;
- /// If the current Expr can be evaluated to a pointer to a null-terminated
+ /// Fill \c Into with the first characters that can be constant-evaluated
+ /// from this \c Expr . When encountering a null character, stop and return
+ /// \c true (the null is not returned in \c Into ). Return \c false if
+ /// evaluation runs off the end of the constant-evaluated string before it
+ /// encounters a null character.
+ bool tryEvaluateString(ASTContext &Ctx, std::string &Into) const;
+
+ /// If the current \c Expr can be evaluated to a pointer to a null-terminated
/// constant string, return the constant string (without the terminating
/// null).
std::optional<std::string> tryEvaluateString(ASTContext &Ctx) const;
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 3cb2731488fab..4139ff2737c76 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -10170,6 +10170,8 @@ def warn_format_bool_as_character : Warning<
"using '%0' format specifier, but argument has boolean value">,
InGroup<Format>;
def note_format_string_defined : Note<"format string is defined here">;
+def note_format_string_evaluated_to : Note<
+ "format string was constant-evaluated">;
def note_format_fix_specifier : Note<"did you mean to use '%0'?">;
def note_printf_c_str: Note<"did you mean to call the %0 method?">;
def note_format_security_fixit: Note<
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 80ece3c4ed7e1..fec92edf49096 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -17945,15 +17945,36 @@ bool Expr::tryEvaluateObjectSize(uint64_t &Result, ASTContext &Ctx,
static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
EvalInfo &Info, std::string *StringResult) {
- if (!E->getType()->hasPointerRepresentation() || !E->isPRValue())
+ QualType Ty = E->getType();
+ if (!E->isPRValue())
return false;
LValue String;
-
- if (!EvaluatePointer(E, String, Info))
+ QualType CharTy;
+ if (Ty->canDecayToPointerType()) {
+ if (E->isGLValue()) {
+ if (!EvaluateLValue(E, String, Info))
+ return false;
+ } else {
+ APValue &Value = Info.CurrentCall->createTemporary(
+ E, Ty, ScopeKind::FullExpression, String);
+ if (!EvaluateInPlace(Value, Info, String, E))
+ return false;
+ }
+ // The result is a pointer to the first element of the array.
+ auto *AT = Info.Ctx.getAsArrayType(Ty);
+ CharTy = AT->getElementType();
+ if (auto *CAT = dyn_cast<ConstantArrayType>(AT))
+ String.addArray(Info, E, CAT);
+ else
+ String.addUnsizedArray(Info, E, CharTy);
+ } else if (Ty->hasPointerRepresentation()) {
+ if (!EvaluatePointer(E, String, Info))
+ return false;
+ CharTy = Ty->getPointeeType();
+ } else {
return false;
-
- QualType CharTy = E->getType()->getPointeeType();
+ }
// Fast path: if it's a string literal, search the string value.
if (const StringLiteral *S = dyn_cast_or_null<StringLiteral>(
@@ -17995,13 +18016,16 @@ static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
}
}
-std::optional<std::string> Expr::tryEvaluateString(ASTContext &Ctx) const {
+bool Expr::tryEvaluateString(ASTContext &Ctx, std::string &StringResult) const {
Expr::EvalStatus Status;
EvalInfo Info(Ctx, Status, EvalInfo::EM_ConstantFold);
uint64_t Result;
- std::string StringResult;
+ return EvaluateBuiltinStrLen(this, Result, Info, &StringResult);
+}
- if (EvaluateBuiltinStrLen(this, Result, Info, &StringResult))
+std::optional<std::string> Expr::tryEvaluateString(ASTContext &Ctx) const {
+ std::string StringResult;
+ if (tryEvaluateString(Ctx, StringResult))
return StringResult;
return {};
}
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index bffd0dd461d3d..017be929ca18e 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -98,6 +98,7 @@
#include "llvm/Support/Locale.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/SaveAndRestore.h"
+#include "llvm/Support/SmallVectorMemoryBuffer.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/TargetParser/RISCVTargetParser.h"
#include "llvm/TargetParser/Triple.h"
@@ -5935,8 +5936,14 @@ static void CheckFormatString(
llvm::SmallBitVector &CheckedVarArgs, UncoveredArgHandler &UncoveredArg,
bool IgnoreStringsWithoutSpecifiers);
-static const Expr *maybeConstEvalStringLiteral(ASTContext &Context,
- const Expr *E);
+enum StringLiteralConstEvalResult {
+ SLCER_NotEvaluated,
+ SLCER_NotNullTerminated,
+ SLCER_Evaluated,
+};
+
+static StringLiteralConstEvalResult
+constEvalStringAsLiteral(Sema &S, const Expr *E, const StringLiteral *&SL);
// Determine if an expression is a string literal or constant string.
// If this function returns false on the arguments to a function expecting a
@@ -5968,14 +5975,9 @@ static StringLiteralCheckType checkFormatStringExpr(
switch (E->getStmtClass()) {
case Stmt::InitListExprClass:
- // Handle expressions like {"foobar"}.
- if (const clang::Expr *SLE = maybeConstEvalStringLiteral(S.Context, E)) {
- return checkFormatStringExpr(
- S, ReferenceFormatString, SLE, Args, APK, format_idx, firstDataArg,
- Type, CallType, /*InFunctionCall*/ false, CheckedVarArgs,
- UncoveredArg, Offset, IgnoreStringsWithoutSpecifiers);
- }
- return SLCT_NotALiteral;
+ // try to constant-evaluate the string
+ break;
+
case Stmt::BinaryConditionalOperatorClass:
case Stmt::ConditionalOperatorClass: {
// The expression is a literal if both sub-expressions were, and it was
@@ -6066,10 +6068,9 @@ static StringLiteralCheckType checkFormatStringExpr(
if (InitList->isStringLiteralInit())
Init = InitList->getInit(0)->IgnoreParenImpCasts();
}
- return checkFormatStringExpr(
- S, ReferenceFormatString, Init, Args, APK, format_idx,
- firstDataArg, Type, CallType,
- /*InFunctionCall*/ false, CheckedVarArgs, UncoveredArg, Offset);
+ InFunctionCall = false;
+ E = Init;
+ goto tryAgain;
}
}
@@ -6142,11 +6143,9 @@ static StringLiteralCheckType checkFormatStringExpr(
}
return SLCT_UncheckedLiteral;
}
- return checkFormatStringExpr(
- S, ReferenceFormatString, PVFormatMatches->getFormatString(),
- Args, APK, format_idx, firstDataArg, Type, CallType,
- /*InFunctionCall*/ false, CheckedVarArgs, UncoveredArg,
- Offset, IgnoreStringsWithoutSpecifiers);
+ E = PVFormatMatches->getFormatString();
+ InFunctionCall = false;
+ goto tryAgain;
}
}
@@ -6214,20 +6213,13 @@ static StringLiteralCheckType checkFormatStringExpr(
unsigned BuiltinID = FD->getBuiltinID();
if (BuiltinID == Builtin::BI__builtin___CFStringMakeConstantString ||
BuiltinID == Builtin::BI__builtin___NSStringMakeConstantString) {
- const Expr *Arg = CE->getArg(0);
- return checkFormatStringExpr(
- S, ReferenceFormatString, Arg, Args, APK, format_idx,
- firstDataArg, Type, CallType, InFunctionCall, CheckedVarArgs,
- UncoveredArg, Offset, IgnoreStringsWithoutSpecifiers);
+ E = CE->getArg(0);
+ goto tryAgain;
}
}
}
- if (const Expr *SLE = maybeConstEvalStringLiteral(S.Context, E))
- return checkFormatStringExpr(
- S, ReferenceFormatString, SLE, Args, APK, format_idx, firstDataArg,
- Type, CallType, /*InFunctionCall*/ false, CheckedVarArgs,
- UncoveredArg, Offset, IgnoreStringsWithoutSpecifiers);
- return SLCT_NotALiteral;
+ // try to constant-evaluate the string
+ break;
}
case Stmt::ObjCMessageExprClass: {
const auto *ME = cast<ObjCMessageExpr>(E);
@@ -6248,11 +6240,8 @@ static StringLiteralCheckType checkFormatStringExpr(
IgnoreStringsWithoutSpecifiers = true;
}
- const Expr *Arg = ME->getArg(FA->getFormatIdx().getASTIndex());
- return checkFormatStringExpr(
- S, ReferenceFormatString, Arg, Args, APK, format_idx, firstDataArg,
- Type, CallType, InFunctionCall, CheckedVarArgs, UncoveredArg,
- Offset, IgnoreStringsWithoutSpecifiers);
+ E = ME->getArg(FA->getFormatIdx().getASTIndex());
+ goto tryAgain;
}
}
@@ -6314,7 +6303,8 @@ static StringLiteralCheckType checkFormatStringExpr(
}
}
- return SLCT_NotALiteral;
+ // try to constant-evaluate the string
+ break;
}
case Stmt::UnaryOperatorClass: {
const UnaryOperator *UnaOp = cast<UnaryOperator>(E);
@@ -6331,26 +6321,79 @@ static StringLiteralCheckType checkFormatStringExpr(
}
}
- return SLCT_NotALiteral;
+ // try to constant-evaluate the string
+ break;
}
default:
+ // try to constant-evaluate the string
+ break;
+ }
+
+ const StringLiteral *FakeLiteral = nullptr;
+ switch (constEvalStringAsLiteral(S, E, FakeLiteral)) {
+ case SLCER_NotEvaluated:
return SLCT_NotALiteral;
+
+ case SLCER_NotNullTerminated:
+ S.Diag(Args[format_idx]->getBeginLoc(),
+ diag::warn_printf_format_string_not_null_terminated)
+ << Args[format_idx]->getSourceRange();
+ if (!InFunctionCall)
+ S.Diag(E->getBeginLoc(), diag::note_format_string_defined);
+ // Stop checking, as this might just mean we're missing a chunk of the
+ // format string and there would be other spurious format issues.
+ return SLCT_UncheckedLiteral;
+
+ case SLCER_Evaluated:
+ InFunctionCall = false;
+ E = FakeLiteral;
+ goto tryAgain;
}
}
-// If this expression can be evaluated at compile-time,
-// check if the result is a StringLiteral and return it
-// otherwise return nullptr
-static const Expr *maybeConstEvalStringLiteral(ASTContext &Context,
- const Expr *E) {
+static StringLiteralConstEvalResult
+constEvalStringAsLiteral(Sema &S, const Expr *E, const StringLiteral *&SL) {
+ // As a last resort, try to constant-evaluate the format string. If it
+ // evaluates to a string literal in the first place, we can point to that
+ // string literal in source and use that.
Expr::EvalResult Result;
- if (E->EvaluateAsRValue(Result, Context) && Result.Val.isLValue()) {
+ if (E->EvaluateAsRValue(Result, S.Context) && Result.Val.isLValue()) {
const auto *LVE = Result.Val.getLValueBase().dyn_cast<const Expr *>();
- if (isa_and_nonnull<StringLiteral>(LVE))
- return LVE;
+ if (auto *BaseSL = dyn_cast_or_null<StringLiteral>(LVE)) {
+ SL = BaseSL;
+ return SLCER_Evaluated;
+ }
}
- return nullptr;
+
+ // Otherwise, try to evaluate the expression as a string constant.
+ std::string FormatString;
+ if (!E->tryEvaluateString(S.Context, FormatString)) {
+ return FormatString.empty() ? SLCER_NotEvaluated : SLCER_NotNullTerminated;
+ }
+
+ std::unique_ptr<llvm::MemoryBuffer> MemBuf;
+ {
+ llvm::SmallString<80> EscapedString;
+ {
+ llvm::raw_svector_ostream OS(EscapedString);
+ OS << '"';
+ OS.write_escaped(FormatString);
+ OS << '"';
+ }
+ MemBuf.reset(new llvm::SmallVectorMemoryBuffer(std::move(EscapedString),
+ "<scratch space>", true));
+ }
+
+ // Plop that string into a scratch buffer, create a string literal and then
+ // go with that.
+ auto ScratchFile = S.getSourceManager().createFileID(std::move(MemBuf));
+ SourceLocation Begin = S.getSourceManager().getLocForStartOfFile(ScratchFile);
+ QualType SLType = S.Context.getStringLiteralArrayType(S.Context.CharTy,
+ FormatString.length());
+ SL = StringLiteral::Create(S.Context, FormatString,
+ StringLiteralKind::Ordinary, false, SLType, Begin);
+ return SLCER_Evaluated;
}
StringRef Sema::GetFormatStringTypeName(Sema::FormatStringType FST) {
@@ -6973,10 +7016,11 @@ void CheckFormatHandler::EmitFormatDiagnostic(
S.Diag(IsStringLocation ? ArgumentExpr->getExprLoc() : Loc, PDiag)
<< ArgumentExpr->getSourceRange();
- const Sema::SemaDiagnosticBuilder &Note =
- S.Diag(IsStringLocation ? Loc : StringRange.getBegin(),
- diag::note_format_string_defined);
-
+ SourceLocation DiagLoc = IsStringLocation ? Loc : StringRange.getBegin();
+ unsigned DiagID = S.getSourceManager().isWrittenInScratchSpace(DiagLoc)
+ ? diag::note_format_string_evaluated_to
+ : diag::note_format_string_defined;
+ const Sema::SemaDiagnosticBuilder &Note = S.Diag(DiagLoc, DiagID);
Note << StringRange;
Note << FixIt;
}
diff --git a/clang/test/Sema/format-strings.c b/clang/test/Sema/format-strings.c
index af30ad5d15fe2..a94e0619ce843 100644
--- a/clang/test/Sema/format-strings.c
+++ b/clang/test/Sema/format-strings.c
@@ -3,6 +3,11 @@
// RUN: %clang_cc1 -fblocks -fsyntax-only -verify -Wformat-nonliteral -isystem %S/Inputs -triple=x86_64-unknown-fuchsia %s
// RUN: %clang_cc1 -fblocks -fsyntax-only -verify -Wformat-nonliteral -isystem %S/Inputs -triple=x86_64-linux-android %s
+// expected-note@-5{{format string was constant-evaluated}}
+// ^^^ there will be a <scratch space> SourceLocation caused by the
+// test_consteval_init_array test, that -verify treats as if it showed up at
+// line 1 of this file.
+
#include <stdarg.h>
#include <stddef.h>
#define __need_wint_t
@@ -900,3 +905,12 @@ void test_promotion(void) {
// pointers
printf("%s", i); // expected-warning{{format specifies type 'char *' but the argument has type 'int'}}
}
+
+void test_consteval_init_array(void) {
+ const char buf_not_terminated[] = {'%', 55 * 2 + 5, '\n'}; // expected-note{{format string is defined here}}
+ printf(buf_not_terminated, "hello"); // expected-warning{{format string is not null-terminated}}
+
+ const char buf[] = {'%', 55 * 2 + 5, '\n', 0};
+ printf(buf, "hello"); // no-warning
+ printf(buf, 123); // expected-warning{{format specifies type 'char *' but the argument has type 'int'}}
+}
diff --git a/clang/test/SemaCXX/format-strings.cpp b/clang/test/SemaCXX/format-strings.cpp
index 48cf23999a94f..7b04ea7d8bc75 100644
--- a/clang/test/SemaCXX/format-strings.cpp
+++ b/clang/test/SemaCXX/format-strings.cpp
@@ -1,6 +1,14 @@
// RUN: %clang_cc1 -fsyntax-only -verify -Wformat-nonliteral -Wformat-non-iso -Wformat-pedantic -fblocks %s
// RUN: %clang_cc1 -fsyntax-only -verify -Wformat-nonliteral -Wformat-non-iso -fblocks -std=c++98 %s
// RUN: %clang_cc1 -fsyntax-only -verify -Wformat-nonliteral -Wformat-non-iso -Wformat-pedantic -fblocks -std=c++11 %s
+// RUN: %clang_cc1 -fsyntax-only -verify -Wformat-nonliteral -Wformat-non-iso -Wformat-pedantic -fblocks -std=c++20 %s
+
+#if __cplusplus >= 202000l
+// expected-note@-6{{format string was constant-evaluated}}
+// ^^^ there will be a <scratch space> SourceLocation caused by the
+// test_constexpr_string test, that -verify treats as if it showed up at
+// line 1 of this file.
+#endif
#include <stdarg.h>
@@ -238,3 +246,69 @@ void f(Scoped1 S1, Scoped2 S2) {
}
#endif
+
+#if __cplusplus >= 202000L
+class my_string {
+ char *data;
+ unsigned size;
+
+public:
+ template<unsigned N>
+ constexpr my_string(const char (&literal)[N]) {
+ data = new char[N+1];
+ for (size = 0; size < N; ++size) {
+ data[size] = literal[size];
+ if (data[size] == 0)
+ break;
+ }
+ data[size] = 0;
+ }
+
+ my_string(const my_string &) = delete;
+
+ constexpr my_string(my_string &&that) {
+ data = that.data;
+ size = that.size;
+ that.data = nullptr;
+ that.size = 0;
+ }
+
+ constexpr ~my_string() {
+ delete[] data;
+ }
+
+ template<unsigned N>
+ constexpr void append(const char (&literal)[N]) {
+ char *cat = new char[size + N + 1];
+ char *tmp = cat;
+ for (unsigned i = 0; i < size; ++i) {
+ *tmp++ = data[i];
+ }
+ for (unsigned i = 0; i < N; ++i) {
+ *tmp = literal[i];
+ if (*tmp == 0)
+ break;
+ ++tmp;
+ }
+ *tmp = 0;
+ delete[] data;
+ size = tmp - cat;
+ data = cat;
+ }
+
+ constexpr const char *c_str() const {
+ return data;
+ }
+};
+
+constexpr my_string const_string() {
+ my_string str("hello %s");
+ str.append(", %d");
+ return str;
+}
+
+void test_constexpr_string() {
+ printf(const_string().c_str(), "hello", 123); // no-warning
+ printf(const_string().c_str(), 123, 456); // expected-warning {{format specifies type 'char *' but the argument has type 'int'}}
+}
+#endif
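The tryEvaluateString contract documented in Expr.h and the three-way classification at the end of constEvalStringAsLiteral can be modeled roughly without any Clang types. In this sketch, a plain char buffer of known length stands in for the storage that constant evaluation can see. StringEvalResult, copyUntilNull and classify are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical mirror of the PR's SLCER_* outcomes.
enum class StringEvalResult { NotEvaluated, NotNullTerminated, Evaluated };

// Models the documented contract of Expr::tryEvaluateString(Ctx, Into):
// append characters to `into` until a null character is found (return true;
// the null itself is not appended), or return false if the visible storage
// ends first. `storage`/`length` stand in for what constant evaluation sees.
bool copyUntilNull(const char *storage, std::size_t length, std::string &into) {
  for (std::size_t i = 0; i < length; ++i) {
    if (storage[i] == '\0')
      return true;
    into.push_back(storage[i]);
  }
  return false;
}

// Models the classification in constEvalStringAsLiteral: a failed evaluation
// that produced no characters is "not evaluated", while one that produced
// some characters is reported as missing its null terminator.
StringEvalResult classify(const char *storage, std::size_t length,
                          std::string &out) {
  out.clear();
  if (copyUntilNull(storage, length, out))
    return StringEvalResult::Evaluated;
  return out.empty() ? StringEvalResult::NotEvaluated
                     : StringEvalResult::NotNullTerminated;
}
```

The two buffers in the test mirror buf and buf_not_terminated from test_consteval_init_array. The terminated buffer evaluates to "%s\n". The unterminated buffer yields the same characters but is reported as missing its null terminator.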
clang/test/Sema/format-strings.c (outdated)
@@ -3,6 +3,11 @@
// RUN: %clang_cc1 -fblocks -fsyntax-only -verify -Wformat-nonliteral -isystem %S/Inputs -triple=x86_64-unknown-fuchsia %s
// RUN: %clang_cc1 -fblocks -fsyntax-only -verify -Wformat-nonliteral -isystem %S/Inputs -triple=x86_64-linux-android %s
+
+// expected-note@-5{{format string was constant-evaluated}}
+// ^^^ there will be a <scratch space> SourceLocation caused by the
I have to say, I don't like this at all.
To clarify, in an actual diagnostic, it shows up like this:
<scratch space>:1:8: note: format string was constant-evaluated
    1 | "hello %s, %d"
      |        ^~
      |        %d
The "format string was constant-evaluated" note could instead say 'format string was constant-evaluated to "hello %s, %d"' and not have the scratch space text, or we could simply not show the constant-evaluated string. Either option is worse because we would be unable to point at the incorrect specifier in the format string. Given "format specifies type 'char *' but the argument has type 'int'", if your format string has two or three %s specifiers, there is no simple way for you to know which one the compiler is talking about.
The patch already supports the case where constant evaluation resolves to a string literal that exists in source. When that's not the case, I feel pretty strongly that we need to bring up the format string to the user somehow to show these diagnostics. Can you think of other ways to do this?
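The scratch buffer holds a quoted, escaped spelling of the evaluated string rather than the raw bytes, so the column a note points at depends on that spelling. The sketch below assumes escaping modeled on llvm::raw_ostream::write_escaped with octal escapes: backslash, tab, newline and double quote become backslash escapes, and other non-printable bytes become three-digit octal. quoteForScratchBuffer and specifierColumn are hypothetical helpers. specifierColumn only skips "%%" and does not parse full conversion specifications.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Approximates write_escaped (octal mode), wrapped in double quotes as the
// PR does before creating the scratch buffer.
std::string quoteForScratchBuffer(const std::string &s) {
  std::string out = "\"";
  for (unsigned char c : s) {
    switch (c) {
    case '\\': out += "\\\\"; break;
    case '\t': out += "\\t"; break;
    case '\n': out += "\\n"; break;
    case '"': out += "\\\""; break;
    default:
      if (std::isprint(c)) {
        out += static_cast<char>(c);
      } else {
        // Three-digit octal escape, e.g. byte 1 -> \001.
        out += '\\';
        out += static_cast<char>('0' + ((c >> 6) & 7));
        out += static_cast<char>('0' + ((c >> 3) & 7));
        out += static_cast<char>('0' + (c & 7));
      }
    }
  }
  out += '"';
  return out;
}

// Returns the 1-based column of the index-th '%' that starts a conversion in
// `line`, skipping "%%" literal percents, or 0 if there is no such '%'.
std::size_t specifierColumn(const std::string &line, unsigned index) {
  unsigned seen = 0;
  for (std::size_t i = 0; i < line.size(); ++i) {
    if (line[i] != '%')
      continue;
    if (i + 1 < line.size() && line[i + 1] == '%') {
      ++i;  // "%%" prints a literal percent sign
      continue;
    }
    if (seen++ == index)
      return i + 1;
  }
  return 0;
}
```

For "hello %s, %d", the first specifier lands at column 8, which matches the <scratch space>:1:8 location in the note above.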
That definitely clarifies a lot, do we do this anywhere else?
Not that I'm aware of. Before this change, <scratch space> buffers were used only for macro expansion. (The other specially-named buffers are <built-in> and <command line>, but we could really call it anything.) As far as I know, diagnosing based on the string result of compile-time evaluation is unprecedented and we need to do something new one way or another.
Like @shafik and @tbaederr, I have a slight concern with performance.
In general, we should avoid checking format strings when these diagnostics are not enabled. That would at least lead to less work in system headers.
Benchmarking sounds like a good idea.
But I don't have a better solution than using a scratch space.
@cor3ntin I looked into this and there are about 50 distinct DiagIDs for format-related issues, which is impractical to check ahead of time and impractical to maintain as format diagnostics expand. Most or all of them can be controlled as an aggregate by the -Wformat and -Wformat=2 warning groups. This would be practical to check, but I think that the only facilities we have to check whether diagnostics are enabled are based on DiagIDs rather than groups.
For what it's worth, I'm less worried than you: when the format string is a function call, we already try to evaluate it. However, the result is discarded if the lvalue base is not a string literal. This PR expands the technique as a universal fallback, but I expect the expensive case to be function calls since I think that's the only way to get control flow (aside from expression statements).
I can try to improve this to avoid having to evaluate the string twice, though.
I'm not too worried about performance, but I can see where the concerns come from. It may be worth it to put a branch up on https://llvm-compile-time-tracker.com/ to verify we're not slowing things down too much, but I also don't think it's strictly required. (Checking diagnostic IDs to see if the check is disabled would be really awkward and I think we should avoid it in this case.)
@@ -17945,15 +17945,36 @@ bool Expr::tryEvaluateObjectSize(uint64_t &Result, ASTContext &Ctx,
static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
EvalInfo &Info, std::string *StringResult) {
- if (!E->getType()->hasPointerRepresentation() || !E->isPRValue())
I think a better early exit would be:
    if (!Ty->hasPointerRepresentation() && !Ty->canDecayToPointerType())
      return false;
This would eliminate the need for the else { return false; } below. I think leaving the if (!E->isPRValue()) separate might be cleaner.
Mh, I don't love checking the same thing twice. I reorganized things a little differently to avoid the else { return false } branch.
@@ -10170,6 +10170,8 @@ def warn_format_bool_as_character : Warning<
"using '%0' format specifier, but argument has boolean value">,
InGroup<Format>;
def note_format_string_defined : Note<"format string is defined here">;
+def note_format_string_evaluated_to : Note<
+ "format string was constant-evaluated">;
I am concerned that this wording will lead to further confusion over what constitutes "constant evaluation".
Firstly, this "constant evaluation" is not a constant evaluation required by the language (in terms of "when" constant evaluation is supposed to occur).
Secondly, as all constant evaluation required by the language occurs as-if in a manifestly constant-evaluated context, this "constant evaluation" does not match "how" constant evaluation required by the language would behave.
The point is taken. Would you like to suggest an alternate wording?
> The point is taken. Would you like to suggest an alternate wording?
"format string was computed, for diagnostic purposes, to"?
I noodled on it and I think it's a little awkward, but I understand why we're trying to stay away from wording with a standardized definition. How do you feel about "format string resolved to a constant string"?
"computed format string is"?
"format string computed to"?
I find this format awkward when it doesn't have the source immediately after. For instance, clang-tidy displays the note without the <scratch space> contents. I like the current wording because it feels complete even when that's missing. (I know the C++ diagnostics have lots of "found this candidate", "candidate ignored because ...", "in template instantiation requested here", etc, but I think that they are a necessary evil rather than the format to strive for.)
As I understand your concerns, the main problem with my wording is that it needs an adjective to qualify "string" with that is ideally something other than "constant" because it's not used in the standard sense.
"non-literal format string evaluated/resolved/computed at compile time"? Or do you see something else in that vein that would work? There is one other diagnostic that says "compile time constant expression" (and I'm OK dropping "constant expression" for all the reasons above).
"non-literal format string evaluated/resolved/computed at compile time"
"computed format string (from non-literal) for diagnostic purposes"
or just
"computed format string for diagnostic purposes"?
+ printf(const_string().c_str(), "hello", 123); // no-warning
+ printf(const_string().c_str(), 123, 456); // expected-warning {{format specifies type 'char *' but the argument has type 'int'}}
+}
+#endif
Please add a test using if consteval in a meaningful manner.
See #135913 for other potential considerations.
FWIW, I don't see any new tests for these comments.
I added tests for a simple case, but I'm not sure what counts as "meaningful". With that said, I'm hitting the same problem that Hubert reported (which shipping Clang currently exhibits, and that my change does not address): https://godbolt.org/z/zTfGfGvKj
Yeah, I think the point @hubert-reinterpretcast was making is that this exacerbates an existing problem rather than introduces a new one. That's unfortunate, but perhaps we can live with it?
The context of the expression being evaluated during computation for the format string determines whether we can get a false positive/negative if we insist on getting a format string computation result.
We know whether or not if consteval should return true or false if the context is:
- in a manifestly constant-evaluated context, or
- outside of a manifestly constant-evaluated context and not "in"
  - a default member initializer, or
  - a constexpr function or the default arguments thereof.
For such cases, I think we should get the correct format string computation.
For the other cases, I think (at least in the long term) we should (by default) fail the format string computation attempt when if consteval is encountered.
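The rules above can be restated as a small decision function. This is only an encoding of the proposal in the comment, not Clang code. EvalSite and knownIfConstevalValue are hypothetical, and std::nullopt stands for "fail the format string computation".

```cpp
#include <cassert>
#include <optional>

// Hypothetical summary of where the expression being folded appears.
struct EvalSite {
  bool manifestlyConstantEvaluated;      // e.g. initializer of a constexpr variable
  bool inDefaultMemberInitializer;
  bool inConstexprFunctionOrDefaultArg;  // body or default argument of a constexpr function
};

// Returns the branch `if consteval` is known to take at `site`, or
// std::nullopt when it cannot be known and the format string computation
// should fail instead of guessing.
std::optional<bool> knownIfConstevalValue(const EvalSite &site) {
  if (site.manifestlyConstantEvaluated)
    return true;
  if (!site.inDefaultMemberInitializer && !site.inConstexprFunctionOrDefaultArg)
    return false;
  return std::nullopt;
}
```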
I don't know how much format strings are actually exercised there, but don't forget to run this through the compile time tracker.
clang/lib/AST/ExprConstant.cpp (outdated)
@@ -17945,15 +17945,36 @@ bool Expr::tryEvaluateObjectSize(uint64_t &Result, ASTContext &Ctx,
static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
We probably want to rename that method to EvaluateCString or something like that, given how we use it.
clang/lib/Sema/SemaChecking.cpp (outdated)
@@ -5935,8 +5936,14 @@ static void CheckFormatString(
llvm::SmallBitVector &CheckedVarArgs, UncoveredArgHandler &UncoveredArg,
bool IgnoreStringsWithoutSpecifiers);
-static const Expr *maybeConstEvalStringLiteral(ASTContext &Context,
- const Expr *E);
+enum StringLiteralConstEvalResult {
Suggested change:
- enum StringLiteralConstEvalResult {
+ enum StringLiteralConstantEvaluationResult {
clang/lib/Sema/SemaChecking.cpp
enum StringLiteralConstEvalResult { | ||
SLCER_NotEvaluated, | ||
SLCER_NotNullTerminated, | ||
SLCER_Evaluated, | ||
}; | ||
|
||
static StringLiteralConstEvalResult | ||
constEvalStringAsLiteral(Sema &S, const Expr *E, const StringLiteral *&SL); |
Suggested change (the enum is unchanged; only the function name differs):
  static StringLiteralConstEvalResult
- constEvalStringAsLiteral(Sema &S, const Expr *E, const StringLiteral *&SL);
+ EvaluateStringAndCreateLiteral(Sema &S, const Expr *E, const StringLiteral *&SL);
The name should really reflect that we might create a scratch space here.
Also, can we comment this function?
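As a starting point for that comment, here is a hedged, self-contained model of the three-way result. Only the enumerator names come from the diff; the meaning of each state is inferred from its name, and `classifyEvaluatedString` with its `std::optional<std::string_view>` input is a hypothetical stand-in for the real evaluation, not the PR's code.

```cpp
#include <optional>
#include <string_view>

// Enumerator names are copied from the diff under review; the comments give
// the assumed meaning of each state.
enum StringLiteralConstEvalResult {
  SLCER_NotEvaluated,      // no string could be constant-evaluated
  SLCER_NotNullTerminated, // characters were evaluated, but no '\0' was found
  SLCER_Evaluated,         // a NUL-terminated string was recovered
};

// Hypothetical classifier. `Bytes` models the evaluated characters, if any;
// presumably only the SLCER_Evaluated case goes on to build a StringLiteral
// in the scratch space (an assumption, not confirmed by the diff).
StringLiteralConstEvalResult
classifyEvaluatedString(std::optional<std::string_view> Bytes) {
  if (!Bytes)
    return SLCER_NotEvaluated;
  if (Bytes->find('\0') == std::string_view::npos)
    return SLCER_NotNullTerminated;
  return SLCER_Evaluated;
}
```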
clang/lib/Sema/SemaChecking.cpp
static const Expr *maybeConstEvalStringLiteral(ASTContext &Context,
const Expr *E) {
static StringLiteralConstEvalResult
constEvalStringAsLiteral(Sema &S, const Expr *E, const StringLiteral *&SL) {
Suggested change:
- constEvalStringAsLiteral(Sema &S, const Expr *E, const StringLiteral *&SL) {
+ EvaluateStringAndCreateLiteral(Sema &S, const Expr *E, const StringLiteral *&SL) {
clang/lib/Sema/SemaChecking.cpp
{ | ||
llvm::raw_svector_ostream OS(EscapedString); | ||
OS << '"'; | ||
OS.write_escaped(FormatString); |
You should not need to do any escaping here; the diagnostics engine should take care of that for you.
You probably want tests for that.
No, I really do need the escaping; otherwise, format strings containing quotes, newlines, and probably other characters will print incorrectly. For instance, the format string `hello "%s"` will print as `"hello "%s""` in the scratch space when it needs to be `"hello \"%s\""`. Even if we found this acceptable, it would break the logic that figures out the source location of a specifier within the string literal (this has to be computed at the point where there is a diagnostic to show, because Clang doesn't keep source locations for individual characters in a string literal). Keep in mind that this is essentially synthesized source code, not text being piped into a diagnostic.
I don't know how to add a test for it because I don't know how to get `clang -verify` to surface the code in a `<scratch space>` file.
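To illustrate why the escaping matters for synthesized scratch-space source, here is a minimal standalone sketch. `escapeBody`, `asCStringLiteral`, and `escapedOffset` are hypothetical helpers, loosely modeled on what `llvm::raw_ostream::write_escaped` does (backslashes, quotes, `\n`, `\t`, and octal escapes for other non-printable bytes); they are not the PR's code. The last helper shows how a byte offset into the evaluated format string shifts once the string is escaped, which is why the specifier-location logic depends on the escaped form being exact.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Escape evaluated bytes so they can be re-lexed as the body of a C string
// literal. Hypothetical helper, loosely modeled on raw_ostream::write_escaped.
std::string escapeBody(std::string_view Raw) {
  std::string Out;
  for (char C : Raw) {
    unsigned char U = static_cast<unsigned char>(C);
    switch (C) {
    case '\\': Out += "\\\\"; break;
    case '"':  Out += "\\\""; break;
    case '\n': Out += "\\n"; break;
    case '\t': Out += "\\t"; break;
    default:
      if (std::isprint(U)) {
        Out += C;
      } else {
        // Three-digit octal escape, e.g. byte 0x01 becomes "\001".
        Out += '\\';
        Out += static_cast<char>('0' + ((U >> 6) & 7));
        Out += static_cast<char>('0' + ((U >> 3) & 7));
        Out += static_cast<char>('0' + (U & 7));
      }
    }
  }
  return Out;
}

// Wrap the escaped body in quotes: what a scratch buffer would contain.
std::string asCStringLiteral(std::string_view Raw) {
  return "\"" + escapeBody(Raw) + "\"";
}

// Map an offset into the evaluated string to the offset of the same
// character inside the quoted, escaped literal (+1 for the opening quote).
std::size_t escapedOffset(std::string_view Raw, std::size_t RawOffset) {
  return 1 + escapeBody(Raw.substr(0, RawOffset)).size();
}
```

With the example from the comment, `hello "%s"` becomes `"hello \"%s\""`, and the `%` at byte 7 of the evaluated string ends up at byte 9 of the scratch buffer. Without the escaping, the synthesized literal would not even lex as intended, so any column computed from it would be wrong.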
My big concern here is what happens to applied fix-its when the fix is in the scratch space? Do we need to suppress the fix-its in that case?
Clang uses fixits in diagnostics to show what should change, which I think is useful:
<scratch space>:1:8: note: format string was constant-evaluated
    1 | "hello %s, %d"
      |        ^~
      |        %d
`%d` here is displayed because of the fixit attached to the diagnostics.
I ran some simple tests and this is what I get:
- with `-fdiagnostics-parseable-fixits`, you do get a diagnostic entry that looks like `fix-it:"<scratch space>":{1:4-1:6}:"%d"`
- with `-Xclang -fix-what-you-can`, clang completes with no error
- with `-Xclang -fix-what-you-can -Xclang -fixit-to-temporary`, no temporary file is created in `.` or the source directory (which were different for the purposes of that test)
> Clang uses fixits in diagnostics to show what should change, which I think is useful:
> <scratch space>:1:8: note: format string was constant-evaluated
>     1 | "hello %s, %d"
>       |        ^~
>       |        %d
> `%d` here is displayed because of the fixit attached to the diagnostics.
It's useful information, but we have ways which try to automatically apply fixes and we need to make sure those behave reasonably.
> I ran some simple tests and this is what I get:
> * with `-fdiagnostics-parseable-fixits`, you do get a diagnostic entry that looks like `fix-it:"<scratch space>":{1:4-1:6}:"%d"`
That seems reasonable.
> * with `-Xclang -fix-what-you-can`, clang completes with no error
That's good.
> * with `-Xclang -fix-what-you-can -Xclang -fixit-to-temporary`, no temporary file is created in `.` or the source directory (which were different for the purposes of that test)
I suppose that's reasonable. How about with `-fixit`, which tries to apply the fix to the source file? Similar question if you run the test via `clang-tidy` and try to apply all fixes automatically.
I tried with `-fixit` and had the same result. For clang-tidy, I tried this:
% cat /tmp/test.c
__attribute__((format(printf, 1, 2)))
int printf(const char *, ...);
int main() {
const char buf[] = {'"', '%', 's', '"', 0};
printf(buf, 123);
printf("%s", 123);
}
% bin/clang-tidy --fix /tmp/test.c
....
2 warnings generated.
/tmp/test.c:6:17: warning: format specifies type 'char *' but the argument has type 'int' [clang-diagnostic-format]
6 | printf(buf, 123);
| ~~~ ^~~
note: format string resolved to a constant string
/tmp/test.c:7:18: warning: format specifies type 'char *' but the argument has type 'int' [clang-diagnostic-format]
7 | printf("%s", 123);
| ~~ ^~~
| %d
/tmp/test.c:7:13: note: FIX-IT applied suggested code changes
7 | printf("%s", 123);
| ^
clang-tidy applied 1 of 1 suggested fixes.
In words: it fixes the `printf("%s", 123)` line to use `%d` and leaves the other `printf` alone without throwing a fuss (reporting "clang-tidy applied 1 of 1 suggested fixes"). Clang-tidy shows the "format string resolved to a constant string" note but not the scratch space contents. It's not ideal, but it's quite reasonable IMO.
clang/test/Sema/format-strings.c
@@ -3,6 +3,11 @@
// RUN: %clang_cc1 -fblocks -fsyntax-only -verify -Wformat-nonliteral -isystem %S/Inputs -triple=x86_64-unknown-fuchsia %s
// RUN: %clang_cc1 -fblocks -fsyntax-only -verify -Wformat-nonliteral -isystem %S/Inputs -triple=x86_64-linux-android %s

// expected-note@-5{{format string was constant-evaluated}}
// ^^^ there will be a <scratch space> SourceLocation caused by the
I'm not too worried about performance, but I can see where the concerns come from. It may be worth it to put a branch up on https://llvm-compile-time-tracker.com/ to verify we're not slowing things down too much, but I also don't think it's strictly required. (Checking diagnostic IDs to see if the check is disabled would be really awkward and I think we should avoid it in this case.)
clang/docs/ReleaseNotes.rst
@@ -265,6 +265,9 @@ related warnings within the method body.
``format_matches`` accepts an example valid format string as its third
argument. For more information, see the Clang attributes documentation.

- Format string checking now supports the compile-time evaluation of format
It may help users to understand the improvement if there's a small code example showing what wasn't checked and is now correctly caught. WDYT?
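For instance, a release-note example along these lines could work. It is a sketch, not text from the PR: `formatPair` and `kFormat` are hypothetical names, and the pattern is the char-array case from the clang-tidy experiment earlier in this thread, where the format string "resolved to a constant string". The code compiles and runs with or without the change; what differs is that `-Wformat` can now check the call.

```cpp
#include <cstddef>
#include <cstdio>

// Format string spelled as a char array rather than a string literal.
// Per this thread, -Wformat can now constant-evaluate it and check calls
// against "%s=%d".
constexpr char kFormat[] = {'%', 's', '=', '%', 'd', '\0'};

// Hypothetical helper: writes "Key=Value" into Out.
int formatPair(char *Out, std::size_t Size, const char *Key, int Value) {
  // Arguments match "%s=%d", so no warning is expected here. Passing
  // (Value, Key) instead is the kind of mismatch that should now produce
  // "format specifies type 'char *' but the argument has type 'int'".
  return std::snprintf(Out, Size, kFormat, Key, Value);
}
```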
clang/lib/Sema/SemaChecking.cpp
// Plop that string into a scratch buffer, create a string literal and then
// go with that.
auto ScratchFile = S.getSourceManager().createFileID(std::move(MemBuf));
Please spell out the type.
Several users of compile-time string evaluation can make good use of the special case where compile-time string evaluation resolves to a string literal in the source (for instance, to improve diagnostics). This changes Expr::tryEvaluateString to return a StringEvalResult, which can hold either a string literal and an offset or a std::string of evaluated characters.
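The shape described above can be sketched with `std::variant`. This is a hypothetical model rather than the PR's `StringEvalResult`: `FakeStringLiteral` stands in for `clang::StringLiteral`, and `StringEvalResultSketch`, `isSourceLiteral`, and `text` are illustrative names. It shows why callers benefit: when the result points into a literal that exists in the source, diagnostics can refer back to it; otherwise only the evaluated characters are available.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical stand-in for clang::StringLiteral: only the bytes it spells.
struct FakeStringLiteral {
  std::string_view Bytes;
};

// Case 1: evaluation ended inside a literal that exists in the source, so
// keep a pointer to it plus the byte offset where the result starts.
struct LiteralAndOffset {
  const FakeStringLiteral *Literal;
  std::size_t Offset;
};

// Hypothetical result type: either case 1, or (case 2) the evaluated
// characters copied into a std::string.
class StringEvalResultSketch {
public:
  explicit StringEvalResultSketch(LiteralAndOffset L) : Storage(L) {}
  explicit StringEvalResultSketch(std::string S) : Storage(std::move(S)) {}

  // True when diagnostics could point back at real source text.
  bool isSourceLiteral() const {
    return std::holds_alternative<LiteralAndOffset>(Storage);
  }

  // The evaluated string, regardless of how it is stored.
  std::string_view text() const {
    if (const auto *L = std::get_if<LiteralAndOffset>(&Storage))
      return L->Literal->Bytes.substr(L->Offset);
    return std::get<std::string>(Storage);
  }

private:
  std::variant<LiteralAndOffset, std::string> Storage;
};
```

A variant keeps the common literal case cheap (no copy of the characters) while still letting every caller read the string through one accessor.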
I have addressed the current feedback, except that I still haven't checked perf (I'll follow Aaron's directions on that soon), and I'm still working out what I need to check for
(The broken tests are because I updated the note wording and then did not update the tests. We have an ongoing conversation about that, so I'll fix them when we reach a resolution.)
printf(const_string().c_str(), "hello", 123); // no-warning
printf(const_string().c_str(), 123, 456); // expected-warning {{format specifies type 'char *' but the argument has type 'int'}}
}
#endif
FWIW, I don't see any new tests for these comments.
The pointer auth options PR would also benefit from this: it currently has an ad hoc and restricted version of this, as it needed to support strings produced by builtins and predated @cor3ntin's string evaluation change, so that was not even remotely an option at the time. I would much rather have us use a single string evaluation routine than the current behavior, but that would block the options PR on this one.
I'd appreciate it if you could fix the merge conflicts, btw; I'd like to apply the patch locally and play around with it. The code changes look pretty reasonable to me, but I had some questions about how some things were handled.
I asked on the forums and people were generally supportive of the idea, so:
Clang's -Wformat checker can see through an inconsistent set of operations. We can fall back to the recently-updated constant string evaluation infrastructure when Clang's initial evaluation fails for a second chance at figuring out what the format string is intended to be. This enables analyzing format strings that were built at compile-time with std::string and other constexpr-capable types in C++, as long as all pieces are also constexpr-visible, and a number of other patterns.
As a side effect, it also enables tryEvaluateString on char arrays (rather than only char pointers).
Radar-ID: rdar://99940060