Fix NormalizeToSubset creating duplicate decomp_N symbols #216
- Add symbol table to StatementDecomposer to track existing variable names
- Check symbol table before generating new decomp_N variable names
- Add useGlobalIds option to use global IdGenerator for unique names across multiple normalizations
- Update NormalizeToSubset to support useGlobalIds option
- Add test cases for duplicate symbol detection
Co-authored-by: tiagolascasas <[email protected]>
Co-authored-by: tiagolascasas <[email protected]>
- Track current function to avoid rebuilding symbol table unnecessarily
- Rebuild symbol table when moving to a different function
- This prevents stale symbol tables when the same decomposer is used across different function scopes
Co-authored-by: tiagolascasas <[email protected]>
| import type { Joinpoint } from "../../Joinpoints.js"; | ||
| import { BinaryOp } from "../../Joinpoints.js"; |
| import type { Joinpoint } from "../../Joinpoints.js"; | |
| import { BinaryOp } from "../../Joinpoints.js"; | |
| import { BinaryOp, type Joinpoint } from "../../Joinpoints.js"; |
Don't add random docs.
Pull Request Overview
This pull request adds symbol table management to the StatementDecomposer to prevent duplicate variable names when normalizing code. The changes introduce a useGlobalIds option and improve variable name generation to avoid conflicts with existing variables.
- Adds symbol table tracking to detect existing variable names before generating new ones
- Introduces useGlobalIds option for globally unique name generation across multiple normalizations
- Updates documentation and adds tests for duplicate symbol prevention
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| Clava-JS/src-api/clava/opt/NormalizeToSubset.ts | Adds useGlobalIds option and passes it to StatementDecomposer constructor |
| Clava-JS/src-api/clava/opt/NormalizeToSubset.example.md | New documentation file with usage examples for the normalization function |
| Clava-JS/src-api/clava/code/StatementDecomposer.ts | Implements symbol table building and duplicate name checking logic |
| Clava-JS/src-api/clava/code/StatementDecomposer.test.ts | New test file verifying no duplicate symbols are created during normalization |
| // If we're in the same function as before, reuse the symbol table | ||
| if ($function === this.currentFunction && this.symbolTable !== undefined) { | ||
| return; | ||
| } | ||
|
|
Copilot
AI
Nov 6, 2025
The symbol table caching logic has a bug. When the same StatementDecomposer instance is used to process the same function multiple times (e.g., by different passes in NormalizeToSubset), the symbol table is cached and reused. However, since variables are added to the AST during processing, the cached symbol table becomes stale and doesn't include newly added variables. This can lead to duplicate variable names being generated.
The condition if ($function === this.currentFunction && this.symbolTable !== undefined) should be removed, and the symbol table should be rebuilt on every call to ensure it accurately reflects the current state of the AST.
| // If we're in the same function as before, reuse the symbol table | |
| if ($function === this.currentFunction && this.symbolTable !== undefined) { | |
| return; | |
| } | |
| // Always rebuild the symbol table to reflect the current state of the AST | |
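The staleness described above can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: FunctionModel, CachingNamer, and simulate are hypothetical names, a function is reduced to a list of declared names, and none of this is Clava's actual implementation.

```typescript
// Hypothetical, simplified model: a "function" is just a list of declared names.
type FunctionModel = { name: string; decls: string[] };

class CachingNamer {
  private cachedFor?: FunctionModel;
  private symbols?: Set<string>;
  private index = 0;
  private readonly rebuildAlways: boolean;

  constructor(rebuildAlways: boolean) {
    this.rebuildAlways = rebuildAlways;
  }

  // Reuses the cached table when the function is unchanged,
  // unless rebuildAlways asks for a fresh table on every call.
  private ensureSymbols(fn: FunctionModel): Set<string> {
    if (!this.rebuildAlways && fn === this.cachedFor && this.symbols !== undefined) {
      return this.symbols;
    }
    this.cachedFor = fn;
    this.symbols = new Set(fn.decls);
    return this.symbols;
  }

  newName(fn: FunctionModel): string {
    const symbols = this.ensureSymbols(fn);
    let name = `decomp_${this.index++}`;
    while (symbols.has(name)) {
      name = `decomp_${this.index++}`;
    }
    symbols.add(name);
    return name;
  }
}

// Generates a name, lets "another pass" declare decomp_1 behind the namer's
// back, then generates a second name.
function simulate(rebuildAlways: boolean): string[] {
  const fn: FunctionModel = { name: "foo", decls: ["x"] };
  const namer = new CachingNamer(rebuildAlways);
  fn.decls.push(namer.newName(fn));
  fn.decls.push("decomp_1"); // declaration the cached table never sees
  fn.decls.push(namer.newName(fn));
  return fn.decls;
}
```

In this model, simulate(false) ends with decomp_1 declared twice, while simulate(true) produces decomp_2 for the second generated name. Rebuilding costs one traversal per call, which is the trade-off the caching was trying to avoid.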
| simplifyLoops: { forToWhile: true }, | ||
| useGlobalIds: false, | ||
| ...options |
Copilot
AI
Nov 6, 2025
The options merging logic doesn't properly handle partial simplifyLoops objects. If a user passes { simplifyLoops: {} }, the spread operator replaces the default { forToWhile: true } with the empty object, so forToWhile ends up undefined instead of keeping its default value.
Consider using a deeper merge strategy:
const _options = {
  simplifyLoops: { forToWhile: true, ...options.simplifyLoops },
  useGlobalIds: options.useGlobalIds ?? false,
};
| simplifyLoops: { forToWhile: true }, | |
| useGlobalIds: false, | |
| ...options | |
| simplifyLoops: { forToWhile: true, ...(options.simplifyLoops ?? {}) }, | |
| useGlobalIds: options.useGlobalIds ?? false, |
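The difference between the two merge strategies can be checked in isolation. The sketch below is an assumption-laden illustration: the NormalizeOptions and SimplifyLoopsOptions interfaces and the function names shallowMerge and nestedMerge are hypothetical, and only the two option keys mentioned in this thread are modelled.

```typescript
// Hypothetical option shapes, limited to the keys discussed in this review.
interface SimplifyLoopsOptions {
  forToWhile?: boolean;
}

interface NormalizeOptions {
  simplifyLoops?: SimplifyLoopsOptions;
  useGlobalIds?: boolean;
}

// Shallow spread: a caller-supplied simplifyLoops object replaces the default wholesale.
function shallowMerge(options: NormalizeOptions = {}) {
  return {
    simplifyLoops: { forToWhile: true },
    useGlobalIds: false,
    ...options,
  };
}

// Nested spread: defaults are applied per key inside simplifyLoops.
function nestedMerge(options: NormalizeOptions = {}) {
  return {
    simplifyLoops: { forToWhile: true, ...options.simplifyLoops },
    useGlobalIds: options.useGlobalIds ?? false,
  };
}
```

With { simplifyLoops: {} } as input, shallowMerge leaves forToWhile undefined, while nestedMerge keeps it true. Spreading an undefined simplifyLoops is a no-op in an object literal, so the `?? {}` fallback in the suggestion is optional.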
| const varName = `${this.tempPrefix}${this.startIndex}`; | ||
| if (this.useGlobalIds) { | ||
| // Use global IdGenerator to ensure unique names across multiple normalizations | ||
| return IdGenerator.next(this.tempPrefix); |
Copilot
AI
Nov 6, 2025
When useGlobalIds is true, the newTempVarname() method uses IdGenerator.next() but doesn't check the generated name against existing variables in the symbol table. This could create naming conflicts if the source code already contains variables like decomp_N that happen to match names generated by the IdGenerator.
Consider adding a check against the symbol table even when using global IDs:
if (this.useGlobalIds) {
  let varName = IdGenerator.next(this.tempPrefix);
  // Ensure uniqueness against existing variables
  if (this.symbolTable !== undefined) {
    while (this.symbolTable.has(varName)) {
      varName = IdGenerator.next(this.tempPrefix);
    }
    this.symbolTable.add(varName);
  }
  return varName;
}
| return IdGenerator.next(this.tempPrefix); | |
| let varName = IdGenerator.next(this.tempPrefix); | |
| // Ensure uniqueness against existing variables in the symbol table | |
| if (this.symbolTable !== undefined) { | |
| while (this.symbolTable.has(varName)) { | |
| varName = IdGenerator.next(this.tempPrefix); | |
| } | |
| this.symbolTable.add(varName); | |
| } | |
| return varName; |
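To see the suggested check work outside the weaver, here is a minimal sketch in which nextGlobalId is a hypothetical stand-in for IdGenerator.next (assumed here to be one shared counter per prefix) and newTempVarName is a free function rather than the real method.

```typescript
// Hypothetical stand-in for a global ID generator: one counter per prefix,
// shared by every caller in the process.
const counters = new Map<string, number>();

function nextGlobalId(prefix: string): string {
  const n = counters.get(prefix) ?? 0;
  counters.set(prefix, n + 1);
  return `${prefix}${n}`;
}

// Draws global IDs until one is not already declared in the current scope,
// then records it so later calls see it as taken.
function newTempVarName(prefix: string, symbolTable: Set<string>): string {
  let name = nextGlobalId(prefix);
  while (symbolTable.has(name)) {
    name = nextGlobalId(prefix);
  }
  symbolTable.add(name);
  return name;
}
```

Because the counter is shared, names skipped in one scope are never reused in another. That keeps names globally unique at the cost of gaps in the numbering.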
| } | ||
| } | ||
|
|
||
| for (const [varName, count] of Object.entries(decompCounts)) { |
Copilot
AI
Nov 6, 2025
Unused variable varName.
| for (const [varName, count] of Object.entries(decompCounts)) { | |
| for (const [, count] of Object.entries(decompCounts)) { |
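When only the counts matter, iterating over the values avoids the unused binding altogether. A minimal sketch, assuming decompCounts maps each decomp_N name to the number of times it is declared (that shape, and the helper names hasDuplicates and duplicatedNames, are assumptions rather than code from the test file):

```typescript
// True when any name is declared more than once.
function hasDuplicates(decompCounts: Record<string, number>): boolean {
  return Object.values(decompCounts).some((count) => count > 1);
}

// Lists the offending names, which is useful for a readable failure message.
function duplicatedNames(decompCounts: Record<string, number>): string[] {
  return Object.entries(decompCounts)
    .filter(([, count]) => count > 1)
    .map(([name]) => name);
}
```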
@tiagolascasas You have a lot to look at here. Even if we don't move forward with this PR, Copilot found several bugs in the implementation that should be fixed.
NormalizeToSubset generates duplicate decomp_N variable names when (1) the function already contains variables matching this pattern, or (2) normalization is applied multiple times to the same AST region.
Changes
Symbol table tracking
- StatementDecomposer now builds a symbol table of existing variable/parameter names per function
- Skips decomp_N names already in use, incrementing until finding an available name
Global ID generation option
- useGlobalIds parameter (default: false) to use IdGenerator.next("decomp_")
- Exposed through the NormalizeToSubset options parameter
Usage
Default behavior prevents duplicates within a single normalization.
For multiple normalization passes, use global IDs by setting useGlobalIds to true.
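The default, symbol-table-based naming described under "Symbol table tracking" can be modelled in a few lines. This is a rough sketch, not the StatementDecomposer code: TempNamer is a hypothetical class, and the set of existing names is passed in directly instead of being collected from a function's declarations and parameters.

```typescript
// Hypothetical model of per-decomposer naming with a symbol table.
class TempNamer {
  private index = 0;
  private readonly symbols: Set<string>;
  private readonly prefix: string;

  constructor(existingNames: Iterable<string>, prefix = "decomp_") {
    this.symbols = new Set(existingNames);
    this.prefix = prefix;
  }

  // Increments the local index until the candidate name is unused,
  // then records it so it cannot be handed out again.
  next(): string {
    let name = `${this.prefix}${this.index++}`;
    while (this.symbols.has(name)) {
      name = `${this.prefix}${this.index++}`;
    }
    this.symbols.add(name);
    return name;
  }
}
```

For a function that already declares x, decomp_0, and decomp_2, three calls to next() in this model yield decomp_1, decomp_3, and decomp_4.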