[clang-tidy] Switch misc-confusable-identifiers check to a faster algorithm. #130369

Open
wants to merge 4 commits into
base: main

Conversation

zygoloid
Collaborator

@zygoloid zygoloid commented Mar 7, 2025

Optimizations:

  • Only build the skeleton for each identifier once, rather than once for each declaration of that identifier.
  • Only compute the contexts in which identifiers are declared for identifiers that have the same skeleton as another identifier in the translation unit.
  • Only compare pairs of declarations that are declared in related contexts, rather than comparing all pairs of declarations with the same skeleton.
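
The first two optimizations can be pictured with a minimal sketch, assuming a toy skeleton function in place of the real Unicode confusables mapping. `toySkeleton` and `collidingGroups` are hypothetical names used for illustration only and are not part of the patch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the check's skeleton(): it only folds a few ASCII
// look-alikes ('1' and 'I' to 'l', '0' to 'O').
std::string toySkeleton(std::string Name) {
  for (char &C : Name) {
    if (C == '1' || C == 'I')
      C = 'l';
    else if (C == '0')
      C = 'O';
  }
  return Name;
}

// Computes one skeleton per distinct identifier, then keeps only the
// skeletons shared by at least two different identifiers.
std::map<std::string, std::vector<std::string>>
collidingGroups(const std::set<std::string> &Identifiers) {
  std::map<std::string, std::vector<std::string>> SkeletonToNames;
  for (const std::string &Ident : Identifiers) {
    assert(!Ident.empty() && "identifiers are expected to be non-empty");
    SkeletonToNames[toySkeleton(Ident)].push_back(Ident);
  }
  // A skeleton with a single identifier cannot produce a warning, so no
  // context information needs to be computed for it.
  for (auto It = SkeletonToNames.begin(); It != SkeletonToNames.end();) {
    if (It->second.size() < 2)
      It = SkeletonToNames.erase(It);
    else
      ++It;
  }
  return SkeletonToNames;
}
```

With the identifiers `il`, `i1`, `count`, `jl`, `j1`, and `x`, only the two groups `{i1, il}` and `{j1, jl}` survive in this toy model, so the more expensive context work would be limited to those four names.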

Also simplify by removing the caching of enclosing DeclContext sets, because with the above changes we don't even compute the enclosing DeclContext sets in common cases. Instead, we terminate the traversal to enclosing DeclContexts immediately if we've already found another declaration in that context with the same identifier. (This optimization is not currently applied to the forallBases traversal, but could be applied there too if needed.)
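
A minimal sketch of the early termination described here, on a toy model rather than real `DeclContext` objects. `ToyContext`, `ToyDecl`, and the two functions below are hypothetical simplifications; the patch's `addToContext` also tracks whether an entry came from a derived class, which this sketch omits.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model: a context is a name plus a parent pointer.
struct ToyContext {
  std::string Name;
  const ToyContext *Parent;
};

struct ToyDecl {
  std::string Identifier;
  const ToyContext *Context;
};

using DeclsWithinContext =
    std::map<const ToyContext *, std::vector<const ToyDecl *>>;

// Records D as being inside DC. Returns false if the most recently recorded
// declaration in DC has the same identifier, so the caller can stop walking.
bool addToContext(DeclsWithinContext &Map, const ToyContext *DC,
                  const ToyDecl *D) {
  auto &Decls = Map[DC];
  if (!Decls.empty() && Decls.back()->Identifier == D->Identifier)
    return false;
  Decls.push_back(D);
  return true;
}

// Walks from D's context outwards and returns how many contexts were visited.
int addToEnclosingContexts(DeclsWithinContext &Map, const ToyDecl *D) {
  assert(D && D->Context && "declaration must live in some context");
  int Visited = 0;
  for (const ToyContext *DC = D->Context; DC; DC = DC->Parent) {
    ++Visited;
    if (!addToContext(Map, DC, D))
      break; // An earlier declaration with this identifier is already here.
  }
  return Visited;
}
```

In this sketch, a second declaration of `x` in the same innermost context visits one context instead of the whole chain. That relies on declarations being processed grouped by identifier, so that the earlier declaration was the last one recorded on every enclosing context.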

This also fixes two bugs that together caused the check to fail to find some of the issues it was looking for:

  • The old check skipped comparisons of declarations from different contexts unless both declarations were type template parameters. This caused the checker to not warn on some instances of the CVE it is intended to detect.
  • The old check skipped comparisons of declarations in all base classes other than the first one found by the traversal. This appears to be an oversight: the forallBases callback returned false rather than true, and returning false terminates the traversal.
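
The second bug comes down to the callback's return value. The following is a minimal sketch using a toy traversal with the same stop-on-false convention; `ToyClass`, `forAllBases`, and `collectBases` are hypothetical names, and the visiting order of the real clang traversal may differ.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy class model: a name plus its direct base classes.
struct ToyClass {
  std::string Name;
  std::vector<const ToyClass *> Bases;
};

// Visits direct and indirect bases of C, stopping as soon as the callback
// returns false. Returns true only if every callback invocation returned true.
bool forAllBases(const ToyClass &C,
                 const std::function<bool(const ToyClass &)> &Callback) {
  for (const ToyClass *Base : C.Bases) {
    assert(Base && "base pointers must be non-null");
    if (!Callback(*Base) || !forAllBases(*Base, Callback))
      return false;
  }
  return true;
}

// Collects base names; CallbackResult is what the callback returns each time.
std::vector<std::string> collectBases(const ToyClass &C, bool CallbackResult) {
  std::vector<std::string> Names;
  forAllBases(C, [&](const ToyClass &Base) {
    Names.push_back(Base.Name);
    return CallbackResult;
  });
  return Names;
}
```

For a `Derived2 : Mid2` hierarchy where `Mid2` has three bases, returning `false` from the callback collects only the first base visited, while returning `true` collects all four. That matches the behavior difference described in the bullet above.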

This also fixes an issue where the check would have false positives for template parameters and function parameters in some cases, because those parameters sometimes have a parent DeclContext that is the parent of the parameterized entity, or sometimes is the translation unit. In either case, this would cause warnings about declarations that are never visible together in any scope.

This decreases the runtime of this check, especially in the common case where there are few or no skeletons with two or more different identifiers. Running this check over LLVM, clang, and clang-tidy, the wall time for the check as reported by clang-tidy's internal profiler is reduced from 5202.86s to 3900.90s.

zygoloid added 2 commits March 7, 2025 23:19
- Only build the skeleton for each identifier once.
- Only compute the contexts in which identifiers are declared for
  identifiers that actually have collisions.
- Only compare pairs of declarations that are declared in related
  contexts, rather than comparing all pairs of declarations with the
  same skeleton.

This also fixes several bugs:

- The old check skipped comparisons of declarations from different
  contexts unless both declarations were type template parameters.
- The old check skipped comparisons of declarations in all base classes
  other than the first one found by the traversal.

This decreases the runtime of this check, especially in the common case
where there are few or no skeletons with two or more different
identifiers. From a sample invocation:

```
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.1556 (  1.4%)   0.1513 (  1.8%)   0.3069 (  1.5%)   0.3033 (  1.5%)  before
   0.0746 (  0.7%)   0.0751 (  0.8%)   0.1498 (  0.8%)   0.1491 (  0.8%)  after
```
@llvmbot
Member

llvmbot commented Mar 7, 2025

@llvm/pr-subscribers-clang-tools-extra

@llvm/pr-subscribers-clang-tidy

Author: Richard Smith (zygoloid)

Changes

Optimizations:

  • Only build the skeleton for each identifier once, rather than once for each declaration of that identifier.
  • Only compute the contexts in which identifiers are declared for identifiers that have the same skeleton as another identifier in the translation unit.
  • Only compare pairs of declarations that are declared in related contexts, rather than comparing all pairs of declarations with the same skeleton.

Also simplify by removing the caching of enclosing DeclContext sets, because with the above changes we don't even compute the enclosing DeclContext sets in common cases. Instead, we terminate the traversal to enclosing DeclContexts immediately if we've already found another declaration in that context with the same identifier. (This optimization is not currently applied to the forallBases traversal, but could be applied there too if needed.)

This also fixes two bugs that together caused the check to fail to find the issues it was looking for in most cases:

  • The old check skipped comparisons of declarations from different contexts unless both declarations were type template parameters. It's unclear what purpose the checks here were intended to serve, but they caused the checker to not warn on instances of the CVE it is intended to detect.
  • The old check skipped comparisons of declarations in all base classes other than the first one found by the traversal. This appears to be an oversight: the forallBases callback returned false rather than true, and returning false terminates the traversal.

This decreases the runtime of this check, especially in the common case where there are few or no skeletons with two or more different identifiers. From a sample invocation:

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.1556 (  1.4%)   0.1513 (  1.8%)   0.3069 (  1.5%)   0.3033 (  1.5%)  before
   0.0746 (  0.7%)   0.0751 (  0.8%)   0.1498 (  0.8%)   0.1491 (  0.8%)  after

Full diff: https://github.com/llvm/llvm-project/pull/130369.diff

3 Files Affected:

  • (modified) clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.cpp (+99-95)
  • (modified) clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.h (+5-19)
  • (modified) clang-tools-extra/test/clang-tidy/checkers/misc/confusable-identifiers.cpp (+31)
diff --git a/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.cpp b/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.cpp
index 6df565c9a9d69..70b948c0784db 100644
--- a/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.cpp
+++ b/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.cpp
@@ -1,5 +1,4 @@
-//===--- ConfusableIdentifierCheck.cpp -
-// clang-tidy--------------------------===//
+//===--- ConfusableIdentifierCheck.cpp - clang-tidy -----------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -89,90 +88,56 @@ static llvm::SmallString<64U> skeleton(StringRef Name) {
   return Skeleton;
 }
 
-static bool mayShadowImpl(const DeclContext *DC0, const DeclContext *DC1) {
-  return DC0 && DC0 == DC1;
-}
-
-static bool mayShadowImpl(const NamedDecl *ND0, const NamedDecl *ND1) {
-  return isa<TemplateTypeParmDecl>(ND0) || isa<TemplateTypeParmDecl>(ND1);
-}
-
-static bool isMemberOf(const ConfusableIdentifierCheck::ContextInfo *DC0,
-                       const ConfusableIdentifierCheck::ContextInfo *DC1) {
-  return llvm::is_contained(DC1->Bases, DC0->PrimaryContext);
-}
-
-static bool enclosesContext(const ConfusableIdentifierCheck::ContextInfo *DC0,
-                            const ConfusableIdentifierCheck::ContextInfo *DC1) {
-  if (DC0->PrimaryContext == DC1->PrimaryContext)
-    return true;
-
-  return llvm::is_contained(DC0->PrimaryContexts, DC1->PrimaryContext) ||
-         llvm::is_contained(DC1->PrimaryContexts, DC0->PrimaryContext);
+namespace {
+struct Entry {
+  const NamedDecl *ND;
+  bool FromDerivedClass;
+};
 }
 
-static bool mayShadow(const NamedDecl *ND0,
-                      const ConfusableIdentifierCheck::ContextInfo *DC0,
-                      const NamedDecl *ND1,
-                      const ConfusableIdentifierCheck::ContextInfo *DC1) {
-
-  if (!DC0->Bases.empty() && !DC1->Bases.empty()) {
-    // if any of the declaration is a non-private member of the other
-    // declaration, it's shadowed by the former
-
-    if (ND1->getAccess() != AS_private && isMemberOf(DC1, DC0))
-      return true;
-
-    if (ND0->getAccess() != AS_private && isMemberOf(DC0, DC1))
+using DeclsWithinContextMap =
+    llvm::DenseMap<const DeclContext *, llvm::SmallVector<Entry, 1>>;
+
+static bool addToContext(DeclsWithinContextMap &DeclsWithinContext,
+                         const DeclContext *DC, Entry E) {
+  auto &Decls = DeclsWithinContext[DC];
+  if (!Decls.empty() &&
+      Decls.back().ND->getIdentifier() == E.ND->getIdentifier()) {
+    // Already have a declaration with this identifier in this context. Don't
+    // track another one. This means that if an outer name is confusable with an
+    // inner name, we'll only diagnose the outer name once, pointing at the
+    // first inner declaration with that name.
+    if (Decls.back().FromDerivedClass && !E.FromDerivedClass) {
+      // Prefer the declaration that's not from the derived class, because that
+      // conflicts with more declarations.
+      Decls.back() = E;
       return true;
-  }
-
-  if (!mayShadowImpl(DC0->NonTransparentContext, DC1->NonTransparentContext) &&
-      !mayShadowImpl(ND0, ND1))
+    }
     return false;
-
-  return enclosesContext(DC0, DC1);
-}
-
-const ConfusableIdentifierCheck::ContextInfo *
-ConfusableIdentifierCheck::getContextInfo(const DeclContext *DC) {
-  const DeclContext *PrimaryContext = DC->getPrimaryContext();
-  auto [It, Inserted] = ContextInfos.try_emplace(PrimaryContext);
-  if (!Inserted)
-    return &It->second;
-
-  ContextInfo &Info = It->second;
-  Info.PrimaryContext = PrimaryContext;
-  Info.NonTransparentContext = PrimaryContext;
-
-  while (Info.NonTransparentContext->isTransparentContext()) {
-    Info.NonTransparentContext = Info.NonTransparentContext->getParent();
-    if (!Info.NonTransparentContext)
-      break;
   }
+  Decls.push_back(E);
+  return true;
+}
 
-  if (Info.NonTransparentContext)
-    Info.NonTransparentContext =
-        Info.NonTransparentContext->getPrimaryContext();
-
+static void addToEnclosingContexts(DeclsWithinContextMap &DeclsWithinContext,
+                                   const DeclContext *DC, const NamedDecl *ND) {
   while (DC) {
-    if (!isa<LinkageSpecDecl>(DC) && !isa<ExportDecl>(DC))
-      Info.PrimaryContexts.push_back(DC->getPrimaryContext());
-    DC = DC->getParent();
-  }
-
-  if (const auto *RD = dyn_cast<CXXRecordDecl>(PrimaryContext)) {
-    RD = RD->getDefinition();
-    if (RD) {
-      Info.Bases.push_back(RD);
-      RD->forallBases([&](const CXXRecordDecl *Base) {
-        Info.Bases.push_back(Base);
-        return false;
-      });
+    DC = DC->getNonTransparentContext()->getPrimaryContext();
+    if (!addToContext(DeclsWithinContext, DC, {ND, false}))
+      return;
+
+    if (const auto *RD = dyn_cast<CXXRecordDecl>(DC)) {
+      RD = RD->getDefinition();
+      if (RD) {
+        RD->forallBases([&](const CXXRecordDecl *Base) {
+          addToContext(DeclsWithinContext, Base, {ND, true});
+          return true;
+        });
+      }
     }
-  }
 
-  return &Info;
+    DC = DC->getParent();
+  }
 }
 
 void ConfusableIdentifierCheck::check(
@@ -181,7 +146,7 @@ void ConfusableIdentifierCheck::check(
   if (!ND)
     return;
 
-  IdentifierInfo *NDII = ND->getIdentifier();
+  const IdentifierInfo *NDII = ND->getIdentifier();
   if (!NDII)
     return;
 
@@ -189,29 +154,68 @@ void ConfusableIdentifierCheck::check(
   if (NDName.empty())
     return;
 
-  const ContextInfo *Info = getContextInfo(ND->getDeclContext());
+  NameToDecls[NDII].push_back(ND);
+}
 
-  llvm::SmallVector<Entry> &Mapped = Mapper[skeleton(NDName)];
-  for (const Entry &E : Mapped) {
-    if (!mayShadow(ND, Info, E.Declaration, E.Info))
-      continue;
+void ConfusableIdentifierCheck::onEndOfTranslationUnit() {
+  llvm::StringMap<llvm::SmallVector<const IdentifierInfo*, 1>> SkeletonToNames;
+  // Compute the skeleton for each identifier.
+  for (auto &[Ident, Decls] : NameToDecls) {
+    SkeletonToNames[skeleton(Ident->getName())].push_back(Ident);
+  }
 
-    const IdentifierInfo *ONDII = E.Declaration->getIdentifier();
-    StringRef ONDName = ONDII->getName();
-    if (ONDName == NDName)
+  // Visit each skeleton with more than one identifier.
+  for (auto &[Skel, Idents] : SkeletonToNames) {
+    if (Idents.size() < 2) {
       continue;
+    }
 
-    diag(ND->getLocation(), "%0 is confusable with %1") << ND << E.Declaration;
-    diag(E.Declaration->getLocation(), "other declaration found here",
-         DiagnosticIDs::Note);
-  }
+    // Find the declaration contexts that transitively contain each identifier.
+    DeclsWithinContextMap DeclsWithinContext;
+    for (const IdentifierInfo *II : Idents) {
+      for (const NamedDecl *ND : NameToDecls[II]) {
+        addToEnclosingContexts(DeclsWithinContext, ND->getDeclContext(), ND);
+      }
+    }
 
-  Mapped.push_back({ND, Info});
-}
+    // Check to see if any declaration is declared in a context that
+    // transitively contains another declaration with a different identifier but
+    // the same skeleton.
+    for (const IdentifierInfo *II : Idents) {
+      for (const NamedDecl *OuterND : NameToDecls[II]) {
+        const DeclContext *OuterDC = OuterND->getDeclContext()
+                                         ->getNonTransparentContext()
+                                         ->getPrimaryContext();
+        for (Entry Inner : DeclsWithinContext[OuterDC]) {
+          // Don't complain if the identifiers are the same.
+          if (OuterND->getIdentifier() == Inner.ND->getIdentifier())
+            continue;
+
+          // Don't complain about a derived-class name shadowing a base class
+          // private member.
+          if (OuterND->getAccess() == AS_private && Inner.FromDerivedClass)
+            continue;
+
+          // If the declarations are in the same context, only diagnose the
+          // later one.
+          if (OuterDC->Equals(
+                  Inner.ND->getDeclContext()->getNonTransparentContext()) &&
+              Inner.ND->getASTContext()
+                  .getSourceManager()
+                  .isBeforeInTranslationUnit(Inner.ND->getLocation(),
+                                             OuterND->getLocation()))
+            continue;
+
+          diag(Inner.ND->getLocation(), "%0 is confusable with %1")
+              << Inner.ND << OuterND;
+          diag(OuterND->getLocation(), "other declaration found here",
+                DiagnosticIDs::Note);
+        }
+      }
+    }
+  }
 
-void ConfusableIdentifierCheck::onEndOfTranslationUnit() {
-  Mapper.clear();
-  ContextInfos.clear();
+  NameToDecls.clear();
 }
 
 void ConfusableIdentifierCheck::registerMatchers(
diff --git a/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.h b/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.h
index f3b0c8ed00306..65669fb61961a 100644
--- a/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.h
+++ b/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.h
@@ -1,5 +1,4 @@
-//===--- ConfusableIdentifierCheck.h - clang-tidy
-//-------------------------------*- C++ -*-===//
+//===--- ConfusableIdentifierCheck.h - clang-tidy ---------------*- C++ -*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -11,7 +10,7 @@
 #define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_CONFUSABLE_IDENTIFIER_CHECK_H
 
 #include "../ClangTidyCheck.h"
-#include <unordered_map>
+#include "llvm/ADT/DenseMap.h"
 
 namespace clang::tidy::misc {
 
@@ -31,23 +30,10 @@ class ConfusableIdentifierCheck : public ClangTidyCheck {
     return TK_IgnoreUnlessSpelledInSource;
   }
 
-  struct ContextInfo {
-    const DeclContext *PrimaryContext;
-    const DeclContext *NonTransparentContext;
-    llvm::SmallVector<const DeclContext *> PrimaryContexts;
-    llvm::SmallVector<const CXXRecordDecl *> Bases;
-  };
-
 private:
-  struct Entry {
-    const NamedDecl *Declaration;
-    const ContextInfo *Info;
-  };
-
-  const ContextInfo *getContextInfo(const DeclContext *DC);
-
-  llvm::StringMap<llvm::SmallVector<Entry>> Mapper;
-  std::unordered_map<const DeclContext *, ContextInfo> ContextInfos;
+  llvm::DenseMap<const IdentifierInfo *,
+                 llvm::SmallVector<const NamedDecl *, 1>>
+      NameToDecls;
 };
 
 } // namespace clang::tidy::misc
diff --git a/clang-tools-extra/test/clang-tidy/checkers/misc/confusable-identifiers.cpp b/clang-tools-extra/test/clang-tidy/checkers/misc/confusable-identifiers.cpp
index cdfed7edb431d..acaf39973961d 100644
--- a/clang-tools-extra/test/clang-tidy/checkers/misc/confusable-identifiers.cpp
+++ b/clang-tools-extra/test/clang-tidy/checkers/misc/confusable-identifiers.cpp
@@ -74,6 +74,19 @@ template <typename t1, typename tl>
 // CHECK-MESSAGES: :[[#@LINE-2]]:20: note: other declaration found here
 void f9();
 
+namespace f10 {
+int il;
+namespace inner {
+  int i1;
+  // CHECK-MESSAGES: :[[#@LINE-1]]:7: warning: 'i1' is confusable with 'il' [misc-confusable-identifiers]
+  // CHECK-MESSAGES: :[[#@LINE-4]]:5: note: other declaration found here
+  int j1;
+  // CHECK-MESSAGES: :[[#@LINE-1]]:7: warning: 'j1' is confusable with 'jl' [misc-confusable-identifiers]
+  // CHECK-MESSAGES: :[[#@LINE+2]]:5: note: other declaration found here
+}
+int jl;
+}
+
 struct Base0 {
   virtual void mO0();
 
@@ -103,3 +116,21 @@ struct Derived1 : Base1 {
 
   long mI1(); // no warning: mII is private
 };
+
+struct Base2 {
+  long nO0;
+
+private:
+  long nII;
+};
+
+struct Mid2 : Base0, Base1, Base2 {
+};
+
+struct Derived2 : Mid2 {
+  long nOO;
+  // CHECK-MESSAGES: :[[#@LINE-1]]:8: warning: 'nOO' is confusable with 'nO0' [misc-confusable-identifiers]
+  // CHECK-MESSAGES: :[[#@LINE-12]]:8: note: other declaration found here
+
+  long nI1(); // no warning: nII is private
+};
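
The `f10` test case added in the diff above expects exactly two warnings. A minimal, self-contained sketch of the related-contexts comparison reproduces them on a toy model. `ToyScope`, `ToyDecl`, `toySkeleton`, and `findConfusable` are hypothetical names, not part of the patch, and the toy skeleton only folds a few ASCII look-alikes instead of using the real confusables data.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: a scope with a parent, and a declaration inside it.
struct ToyScope {
  const ToyScope *Parent;
};
struct ToyDecl {
  std::string Name;
  const ToyScope *Where;
  int Line; // Stands in for source order.
};

// Toy skeleton: folds '1' and 'I' to 'l', and '0' to 'O'.
std::string toySkeleton(std::string Name) {
  for (char &C : Name) {
    if (C == '1' || C == 'I')
      C = 'l';
    else if (C == '0')
      C = 'O';
  }
  return Name;
}

// Returns (inner name, outer name) pairs, where the inner declaration lives
// in the outer declaration's scope or in a scope nested inside it.
std::vector<std::pair<std::string, std::string>>
findConfusable(const std::vector<ToyDecl> &Decls) {
  // Map every scope to the declarations it transitively contains.
  std::map<const ToyScope *, std::vector<const ToyDecl *>> Within;
  for (const ToyDecl &D : Decls) {
    assert(D.Where && "declaration must live in some scope");
    for (const ToyScope *S = D.Where; S; S = S->Parent)
      Within[S].push_back(&D);
  }

  std::vector<std::pair<std::string, std::string>> Warnings;
  for (const ToyDecl &Outer : Decls) {
    for (const ToyDecl *Inner : Within[Outer.Where]) {
      // Only different identifiers with the same skeleton are of interest.
      if (Inner->Name == Outer.Name ||
          toySkeleton(Inner->Name) != toySkeleton(Outer.Name))
        continue;
      // In the same scope, report only the later declaration of the pair.
      if (Inner->Where == Outer.Where && Inner->Line < Outer.Line)
        continue;
      Warnings.emplace_back(Inner->Name, Outer.Name);
    }
  }
  return Warnings;
}
```

Feeding it `il` and `jl` in an outer scope and `i1` and `j1` in a nested scope yields the pairs (`i1`, `il`) and (`j1`, `jl`). Declarations in sibling scopes are never compared, because neither scope contains the other.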


github-actions bot commented Mar 7, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 1b75b9e665ee3c43de85c25f8d5f10d4efb3ca39 b7accb2fa7e66afaa81c5ad00bfe855f4b32a9c4 --extensions h,cpp -- clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.cpp clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.h clang-tools-extra/test/clang-tidy/checkers/misc/confusable-identifiers.cpp
View the diff from clang-format here.
diff --git a/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.cpp b/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.cpp
index 6793135b51..cfdcc62dad 100644
--- a/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.cpp
+++ b/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.cpp
@@ -127,8 +127,7 @@ static bool addToContext(DeclsWithinContextMap &DeclsWithinContext,
 }
 
 static void addToEnclosingContexts(DeclsWithinContextMap &DeclsWithinContext,
-                                   const Decl *Parent,
-                                   const NamedDecl *ND) {
+                                   const Decl *Parent, const NamedDecl *ND) {
   const Decl *Outer = Parent;
   while (Outer) {
     if (const auto *NS = dyn_cast<NamespaceDecl>(Outer))
@@ -160,8 +159,8 @@ void ConfusableIdentifierCheck::check(
   if (!ND)
     return;
 
-  addDeclToCheck(ND, cast<Decl>(ND->getDeclContext()
-                                    ->getNonTransparentContext()));
+  addDeclToCheck(ND,
+                 cast<Decl>(ND->getDeclContext()->getNonTransparentContext()));
 
   // Associate template parameters with this declaration of this template.
   if (const auto *TD = dyn_cast<TemplateDecl>(ND)) {
@@ -255,10 +254,9 @@ void ConfusableIdentifierCheck::registerMatchers(
       ast_matchers::parmVarDecl(), ast_matchers::templateTypeParmDecl(),
       ast_matchers::nonTypeTemplateParmDecl(),
       ast_matchers::templateTemplateParmDecl());
-  Finder->addMatcher(
-      ast_matchers::namedDecl(ast_matchers::unless(AnyParamDecl))
-          .bind("nameddecl"),
-      this);
+  Finder->addMatcher(ast_matchers::namedDecl(ast_matchers::unless(AnyParamDecl))
+                         .bind("nameddecl"),
+                     this);
 }
 
 } // namespace clang::tidy::misc

@PiotrZSL
Member

Could you run this check on, for example, the LLVM code base, and capture times before and after using the built-in clang-tidy profiler?

@zygoloid
Collaborator Author

Could you run this check on, for example, the LLVM code base, and capture times before and after using the built-in clang-tidy profiler?

Done and added results to commit message.

Wall time 5202.86s -> 3900.90s.
User time 2336.68s -> 1384.04s.
Sys time 2833.5s -> 2476.65s.
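
To put these totals in relative terms, here is a small sketch of the arithmetic. `reduction` is a hypothetical helper written for this note, not something the profiler provides.

```cpp
#include <cassert>

// Fractional reduction going from Before to After; 0.25 means 25% less time.
double reduction(double Before, double After) {
  assert(Before > 0 && "baseline time must be positive");
  return (Before - After) / Before;
}
```

Applied to the numbers above, this gives roughly a 25.0% reduction in wall time, 40.8% in user time, and 12.6% in system time.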

@PiotrZSL
Member

Perfect. Please do one more test: compare the findings before and after on, for example, LLVM, to see whether they catch the same things. As for the change, I will try to review it this week.

@zygoloid
Collaborator Author

Looking at the warnings across LLVM I found and fixed a few bugs. With that done:

Warnings before: https://gist.github.com/zygoloid/34584fae8789977d2032bd98549aefc8
Warnings after: https://gist.github.com/zygoloid/c6b625251f7acc594f65a93675b1016a

I turned off header filtering to get the full set of diagnostics here. Note that the before warnings have a lot of false positives, which unfortunately get repeated a lot due to the lack of filtering.

@zygoloid
Collaborator Author

With default filtering enabled for run-clang-tidy.py (no warnings in header files, sadly):

Before: https://gist.github.com/zygoloid/e06c7d6cb7309a5e3d38e9b2b4e45659
After: https://gist.github.com/zygoloid/11a04866a71513285b35c0cd5848953b

@zygoloid
Collaborator Author

I've done some point comparisons between the old and new output. I found:

  • Some cases the old warning warns on and the new one does not: the ones I looked at were all false positives that are now suppressed. For example, the old implementation produced warnings on conflicts between a template parameter and a declaration in an unrelated scope.
  • Some cases the new warning warns on and the old one does not: the ones I looked at were all true positives that are now diagnosed.
  • Some cases where the old warning produced more warnings than the new one, for the same name: I think these are OK; the new approach will only warn once in cases where the old warning would produce diagnostics for all pairs of bad declarations.

@zygoloid
Collaborator Author

Ping x6
