Copilot AI commented Nov 5, 2025

NormalizeToSubset generates duplicate decomp_N variable names when (1) the function already contains variables matching this pattern, or (2) normalization is applied multiple times to the same AST region.

Changes

Symbol table tracking

  • StatementDecomposer now builds a symbol table of existing variable/parameter names per function
  • Skips decomp_N names already in use, incrementing until finding an available name
  • Symbol table is cached per function and rebuilt when crossing function boundaries
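The name-skipping step in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `nextFreeName` and its parameters are hypothetical, and the symbol table is modeled as a plain `Set<string>` rather than anything collected from Clava join points.

```typescript
// Hypothetical helper (not part of Clava): returns the first `${prefix}${n}`
// with n >= startIndex that is not yet in the symbol table, and records it.
function nextFreeName(
  symbolTable: Set<string>,
  prefix: string,
  startIndex: number,
): { name: string; nextIndex: number } {
  let index = startIndex;
  while (symbolTable.has(`${prefix}${index}`)) {
    index++; // name already taken, try the next N
  }
  const name = `${prefix}${index}`;
  symbolTable.add(name); // keep the table in sync with the new declaration
  return { name, nextIndex: index + 1 };
}

// Names assumed to already exist in the function being normalized.
const existing = new Set(["x", "decomp_0", "decomp_1"]);
const first = nextFreeName(existing, "decomp_", 0);
const second = nextFreeName(existing, "decomp_", first.nextIndex);
```

Under these assumptions `first.name` is `decomp_2` and `second.name` is `decomp_3`, since `decomp_0` and `decomp_1` are skipped.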

Global ID generation option

  • Added useGlobalIds parameter (default: false) to use IdGenerator.next("decomp_")
  • Ensures unique names across multiple normalization passes without symbol table lookups
  • Exposed via NormalizeToSubset options parameter
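To illustrate why a global counter avoids clashes between passes, here is a small sketch. `GlobalIds` is a hypothetical stand-in written for this example and makes no claim about how the real `IdGenerator` is implemented; `runPass` merely simulates a pass that needs a given number of temporaries.

```typescript
// Hypothetical stand-in for a process-wide ID generator (NOT the real IdGenerator).
class GlobalIds {
  private static counters = new Map<string, number>();

  static next(prefix: string): string {
    const n = GlobalIds.counters.get(prefix) ?? 0;
    GlobalIds.counters.set(prefix, n + 1);
    return `${prefix}${n}`;
  }
}

// Simulates one normalization pass that creates `count` temporaries.
// With local numbering, every pass restarts at 0.
function runPass(count: number, useGlobalIds: boolean): string[] {
  const names: string[] = [];
  for (let local = 0; local < count; local++) {
    names.push(useGlobalIds ? GlobalIds.next("decomp_") : `decomp_${local}`);
  }
  return names;
}

const hasDuplicates = (xs: string[]): boolean => new Set(xs).size !== xs.length;

// Two passes over the same region, with and without global numbering.
const localNames = [...runPass(2, false), ...runPass(2, false)];
const globalNames = [...runPass(2, true), ...runPass(2, true)];
```

In this toy model the two locally numbered passes both produce `decomp_0` and `decomp_1`, while the globally numbered passes produce four distinct names.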

Usage

Default behavior prevents duplicates within a single normalization:

NormalizeToSubset(functionJp);  // Avoids existing decomp_N variables

For multiple normalization passes, use global IDs:

NormalizeToSubset(functionJp, { useGlobalIds: true });
NormalizeToSubset(functionJp, { useGlobalIds: true });  // No conflicts
Original prompt

This section details the original issue you should resolve

<issue_title>NormalizeToSubset is creating duplicated symbols</issue_title>
<issue_description>NormalizeToSubset, whenever it needs to create a new variable, assigns names based on the pattern "decomp_N", where N is the Nth variable created for that function.

The problem, however, is that if the function already contains variables that follow this pattern, e.g., decomp_0, the transformation will create another variable called decomp_0, as it doesn't check for clashing names. I think the fix here is two-fold:

  1. Create a symbol table of variable decls and params of the function before normalizing, and skip generating a decomp_N symbol if they already exist in the table;
  2. As this problem arises predominantly (but not always) from applying normalization multiple times over the same AST region, I suggest adding an option to use a global N, instead of having it be local to the function. That way, the transform will never generate two clashing decomp_N symbols, no matter how many times it's called over any part of the AST. Using IdGenerator.next("decomp_") would take care of this.</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 3 commits November 5, 2025 02:47
- Add symbol table to StatementDecomposer to track existing variable names
- Check symbol table before generating new decomp_N variable names
- Add useGlobalIds option to use global IdGenerator for unique names across multiple normalizations
- Update NormalizeToSubset to support useGlobalIds option
- Add test cases for duplicate symbol detection

Co-authored-by: tiagolascasas <[email protected]>
- Track current function to avoid rebuilding symbol table unnecessarily
- Rebuild symbol table when moving to a different function
- This prevents stale symbol tables when the same decomposer is used across different function scopes

Co-authored-by: tiagolascasas <[email protected]>
Copilot AI changed the title from "[WIP] Fix duplicated symbols in NormalizeToSubset" to "Fix NormalizeToSubset creating duplicate decomp_N symbols" Nov 5, 2025
Copilot AI requested a review from tiagolascasas November 5, 2025 02:54
Comment on lines +2 to +3
import type { Joinpoint } from "../../Joinpoints.js";
import { BinaryOp } from "../../Joinpoints.js";
Member

Suggested change
- import type { Joinpoint } from "../../Joinpoints.js";
- import { BinaryOp } from "../../Joinpoints.js";
+ import { BinaryOp, type Joinpoint } from "../../Joinpoints.js";

Member

Don't add random docs.

lm-sousa requested a review from Copilot November 6, 2025 14:28
Copilot AI left a comment

Pull Request Overview

This pull request adds symbol table management to the StatementDecomposer to prevent duplicate variable names when normalizing code. The changes introduce a useGlobalIds option and improve variable name generation to avoid conflicts with existing variables.

  • Adds symbol table tracking to detect existing variable names before generating new ones
  • Introduces useGlobalIds option for globally unique name generation across multiple normalizations
  • Updates documentation and adds tests for duplicate symbol prevention

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| Clava-JS/src-api/clava/opt/NormalizeToSubset.ts | Adds useGlobalIds option and passes it to StatementDecomposer constructor |
| Clava-JS/src-api/clava/opt/NormalizeToSubset.example.md | New documentation file with usage examples for the normalization function |
| Clava-JS/src-api/clava/code/StatementDecomposer.ts | Implements symbol table building and duplicate name checking logic |
| Clava-JS/src-api/clava/code/StatementDecomposer.test.ts | New test file verifying no duplicate symbols are created during normalization |

Comment on lines +63 to +67
// If we're in the same function as before, reuse the symbol table
if ($function === this.currentFunction && this.symbolTable !== undefined) {
  return;
}

Copilot AI Nov 6, 2025

The symbol table caching logic has a bug. When the same StatementDecomposer instance is used to process the same function multiple times (e.g., by different passes in NormalizeToSubset), the symbol table is cached and reused. However, since variables are added to the AST during processing, the cached symbol table becomes stale and doesn't include newly added variables. This can lead to duplicate variable names being generated.

The condition if ($function === this.currentFunction && this.symbolTable !== undefined) should be removed, and the symbol table should be rebuilt on every call to ensure it accurately reflects the current state of the AST.

Suggested change
- // If we're in the same function as before, reuse the symbol table
- if ($function === this.currentFunction && this.symbolTable !== undefined) {
-   return;
- }
+ // Always rebuild the symbol table to reflect the current state of the AST
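The staleness concern can be reproduced in a toy model. The sketch below is illustrative only: `ToyFunction`, `ToyDecomposer`, and `scenario` are hypothetical names, a function is reduced to a list of declared names, and the stale entry is assumed to come from a declaration added by some other pass between two calls.

```typescript
// Toy model: a "function" is just the list of names declared in it.
interface ToyFunction {
  decls: string[];
}

class ToyDecomposer {
  private readonly reuseCache: boolean;
  private cachedFor?: ToyFunction;
  private symbols?: Set<string>;

  constructor(reuseCache: boolean) {
    this.reuseCache = reuseCache;
  }

  // Declares a fresh decomp_N temporary in `fn` and returns its name.
  addTemp(fn: ToyFunction): string {
    let symbols = this.symbols;
    if (!this.reuseCache || this.cachedFor !== fn || symbols === undefined) {
      symbols = new Set(fn.decls); // rebuild from the current declarations
      this.symbols = symbols;
      this.cachedFor = fn;
    }
    let n = 0;
    while (symbols.has(`decomp_${n}`)) n++;
    const name = `decomp_${n}`;
    symbols.add(name);
    fn.decls.push(name);
    return name;
  }
}

function scenario(reuseCache: boolean): string[] {
  const fn: ToyFunction = { decls: ["a"] };
  const decomposer = new ToyDecomposer(reuseCache);
  decomposer.addTemp(fn); // declares decomp_0
  fn.decls.push("decomp_1"); // assumed: declared by another pass, unseen by the cache
  decomposer.addTemp(fn);
  return fn.decls;
}

const withCache = scenario(true);
const withRebuild = scenario(false);
```

With the cache reused, the second call hands out `decomp_1` again; with a rebuild on every call it moves on to `decomp_2`. Rebuilding trades some repeated traversal for correctness, which is the direction the suggestion above takes.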

Comment on lines +24 to +26
simplifyLoops: { forToWhile: true },
useGlobalIds: false,
...options
Copilot AI Nov 6, 2025

The options merging logic doesn't properly handle partial simplifyLoops objects. If a user passes { simplifyLoops: {} }, the spread operator replaces the default { forToWhile: true } wholesale, so simplifyLoops becomes {} and forToWhile reads as undefined instead of keeping its default value.

Consider using a deeper merge strategy:

const _options = {
  simplifyLoops: { forToWhile: true, ...options.simplifyLoops },
  useGlobalIds: options.useGlobalIds ?? false,
};
Suggested change
- simplifyLoops: { forToWhile: true },
- useGlobalIds: false,
- ...options
+ simplifyLoops: { forToWhile: true, ...(options.simplifyLoops ?? {}) },
+ useGlobalIds: options.useGlobalIds ?? false,
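The difference between the two merging strategies can be checked in isolation. The following is a self-contained sketch: the `LoopOptions` and `NormalizeOptions` interfaces and both helper functions are assumptions made for this example and are not taken from the Clava sources.

```typescript
// Assumed option shapes, for illustration only.
interface LoopOptions {
  forToWhile?: boolean;
}
interface NormalizeOptions {
  simplifyLoops?: LoopOptions;
  useGlobalIds?: boolean;
}

// Shallow merge: a user-supplied simplifyLoops replaces the default object.
function mergeShallow(options: NormalizeOptions = {}) {
  return { simplifyLoops: { forToWhile: true }, useGlobalIds: false, ...options };
}

// Nested merge: defaults are applied per field inside simplifyLoops.
function mergeNested(options: NormalizeOptions = {}) {
  return {
    simplifyLoops: { forToWhile: true, ...options.simplifyLoops },
    useGlobalIds: options.useGlobalIds ?? false,
  };
}

const shallow = mergeShallow({ simplifyLoops: {} });
const nested = mergeNested({ simplifyLoops: {} });
```

Here `shallow.simplifyLoops.forToWhile` is `undefined`, while `nested.simplifyLoops.forToWhile` keeps the default `true`. An explicit `forToWhile: undefined` would still override the default in both variants, because object spread copies properties whose value is `undefined`.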

const varName = `${this.tempPrefix}${this.startIndex}`;
if (this.useGlobalIds) {
// Use global IdGenerator to ensure unique names across multiple normalizations
return IdGenerator.next(this.tempPrefix);
Copilot AI Nov 6, 2025

When useGlobalIds is true, the newTempVarname() method uses IdGenerator.next() but doesn't check the generated name against existing variables in the symbol table. This could create naming conflicts if the source code already contains variables like decomp_N that happen to match names generated by the IdGenerator.

Consider adding a check against the symbol table even when using global IDs:

if (this.useGlobalIds) {
  let varName = IdGenerator.next(this.tempPrefix);
  // Ensure uniqueness against existing variables
  if (this.symbolTable !== undefined) {
    while (this.symbolTable.has(varName)) {
      varName = IdGenerator.next(this.tempPrefix);
    }
    this.symbolTable.add(varName);
  }
  return varName;
}
Suggested change
- return IdGenerator.next(this.tempPrefix);
+ let varName = IdGenerator.next(this.tempPrefix);
+ // Ensure uniqueness against existing variables in the symbol table
+ if (this.symbolTable !== undefined) {
+   while (this.symbolTable.has(varName)) {
+     varName = IdGenerator.next(this.tempPrefix);
+   }
+   this.symbolTable.add(varName);
+ }
+ return varName;

}
}

for (const [varName, count] of Object.entries(decompCounts)) {
Copilot AI Nov 6, 2025

Unused variable varName.

Suggested change
- for (const [varName, count] of Object.entries(decompCounts)) {
+ for (const [, count] of Object.entries(decompCounts)) {

Member

lm-sousa commented Nov 7, 2025

@tiagolascasas You have a lot to look at here. Even if this PR doesn't move forward, Copilot found several bugs in the implementation that should be fixed.

Development

Successfully merging this pull request may close these issues.

NormalizeToSubset is creating duplicated symbols

3 participants