🐛 normalize paths in cache #864
Conversation
Signed-off-by: Pranav Gaikwad <[email protected]>
Walkthrough
Replaces the analyzer's internal map with a new public IncidentsCache and CacheValue, adds a map-backed, path-normalizing cache implementation, updates analyzer code to use the new cache API and initializes it in NewPipeAnalyzer, and changes Server.Accept to panic if NewPipeAnalyzer fails.
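For orientation, a minimal sketch of the API surface the walkthrough describes is below. The interface methods mirror the ones quoted later in this review; the CacheValue field names are illustrative assumptions, not the actual struct from cache.go.

```go
// Sketch only: IncidentsCache matches the interface quoted later in this
// review; CacheValue fields are placeholders for whatever the analyzer needs
// to rebuild rulesets (the real definitions live in
// kai_analyzer_rpc/pkg/service/cache.go).
package service

import "github.com/konveyor/analyzer-lsp/output/v1/konveyor"

// CacheValue carries a cached incident plus enough context to rebuild rulesets.
// Field names below are assumptions for illustration only.
type CacheValue struct {
	Incident      konveyor.Incident // the cached incident
	RulesetName   string            // hypothetical: ruleset that produced it
	ViolationName string            // hypothetical: violation that produced it
}

// IncidentsCache is keyed by normalized file path.
type IncidentsCache interface {
	Get(path string) ([]CacheValue, bool)
	Add(path string, value CacheValue)
	Delete(path string)
	Len() int
	Entries() map[string][]CacheValue
}
```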
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant Server
participant PipeAnalyzer
Client->>Server: Accept()
Server->>Server: NewPipeAnalyzer(...)
alt NewPipeAnalyzer fails (changed)
Server->>Server: panic(err)
Server--x Client: process terminates
else Success
Server->>PipeAnalyzer: Start/Handle requests
PipeAnalyzer-->>Server: Responses
Server-->>Client: Responses
end
sequenceDiagram
autonumber
participant Analyzer
participant IncidentsCache
participant RulesetBuilder
rect rgba(220,240,255,0.4)
note right of Analyzer: Add incidents (analysis)
Analyzer->>IncidentsCache: Add(path, CacheValue)
end
rect rgba(240,255,220,0.4)
note right of Analyzer: Remove resolved incidents
Analyzer->>IncidentsCache: Delete(path)
end
rect rgba(255,240,220,0.4)
note right of Analyzer: Build rulesets from cached entries
Analyzer->>IncidentsCache: Entries()
IncidentsCache-->>Analyzer: map[path][]CacheValue
Analyzer->>RulesetBuilder: Build from CacheValue fields
RulesetBuilder-->>Analyzer: Rulesets
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks (3 passed, 2 warnings)
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 3
🧹 Nitpick comments (5)
kai_analyzer_rpc/pkg/service/cache.go (2)
72-74: Entries exposes internal map allowing external mutation.
Returning the backing map risks accidental modification outside the cache's lock discipline.
Return a shallow copy:
-func (i *incidentsCache) Entries() map[string][]CacheValue {
-	return i.cache
-}
+func (i *incidentsCache) Entries() map[string][]CacheValue {
+	out := make(map[string][]CacheValue, len(i.cache))
+	for k, v := range i.cache {
+		cpy := make([]CacheValue, len(v))
+		copy(cpy, v)
+		out[k] = cpy
+	}
+	return out
+}
64-70: Keys() is unreachable from consumers (not in the interface).
NewIncidentsCache returns IncidentsCache, which lacks Keys(). Either expose it via the interface or remove to avoid dead API.
If unused, remove:

-func (i *incidentsCache) Keys() []string {
-	keys := make([]string, 0, len(i.cache))
-	for k := range i.cache {
-		keys = append(keys, k)
-	}
-	return keys
-}

kai_analyzer_rpc/pkg/service/analyzer.go (3)
266-268: Fix logr key/value usage.
Info requires key/value pairs; passing just a.cache.Len() results in malformed logs.

-	a.Logger.Info("Current cache len", a.cache.Len())
+	a.Logger.Info("Current cache len", "len", a.cache.Len())
351-369: Reduce noisy debug logging and avoid logging large incident payloads at info level.
Logging full incidents can be heavy; move to high verbosity or summarize.

-	a.Logger.Info("here update cache incident", "incident", i)
+	a.Logger.V(8).Info("caching incident", "uri", i.URI, "rule", violationName)
413-419: Optional: return rulesets in stable order.
Maps iterate in random order; sorting improves determinism across runs.

 r := []konveyor.RuleSet{}
 for _, ruleset := range ruleSetMap {
 	r = append(r, ruleset)
 }
-a.Logger.V(8).Info("ruleset from cache", "rulesets", r)
+sort.SliceStable(r, func(i, j int) bool { return r[i].Name < r[j].Name })
+a.Logger.V(8).Info("ruleset from cache", "rulesets", r)
 return r
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- kai_analyzer_rpc/pkg/rpc/server.go (1 hunks)
- kai_analyzer_rpc/pkg/service/analyzer.go (8 hunks)
- kai_analyzer_rpc/pkg/service/cache.go (1 hunks)
- kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1)
kai_analyzer_rpc/pkg/service/cache.go (1)
NewIncidentsCache(26-31)
kai_analyzer_rpc/pkg/service/analyzer.go (1)
kai_analyzer_rpc/pkg/service/cache.go (3)
IncidentsCache (11-17), NewIncidentsCache (26-31), CacheValue (19-24)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Trunk Check Runner
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (3)
kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1)
82-83: Cache initialization switch LGTM.
Using NewIncidentsCache(l) aligns this constructor with the new cache abstraction.
kai_analyzer_rpc/pkg/service/analyzer.go (2)
63-64: Type switch to IncidentsCache LGTM.
The abstraction improves encapsulation and enables path normalization in one place.
178-179: Cache construction LGTM.
Creating via NewIncidentsCache(log) ensures consistent normalization and logging.
 func (a *analyzer) invalidateCachePerFile(paths []string) {
 	for _, p := range paths {
 		a.Logger.Info("deleting cache entry for path", "path", p)
-		delete(a.cache, p)
+		a.cache.Delete(p)
 	}
 }
🛠️ Refactor suggestion
Normalize invalidate paths to analyzer root before deletion.
If clients send relative paths, deletion may miss absolute keys. Normalize to absolute (relative to a.location) before Delete.
func (a *analyzer) invalidateCachePerFile(paths []string) {
for _, p := range paths {
- a.Logger.Info("deleting cache entry for path", "path", p)
- a.cache.Delete(p)
+ norm := p
+ if !filepath.IsAbs(norm) {
+ norm = filepath.Join(a.location, norm)
+ }
+ norm = filepath.Clean(norm)
+ a.Logger.Info("deleting cache entry for path", "path", p, "normalized", norm)
+ a.cache.Delete(norm)
}
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original (diff hunk under review):

 func (a *analyzer) invalidateCachePerFile(paths []string) {
 	for _, p := range paths {
 		a.Logger.Info("deleting cache entry for path", "path", p)
-		delete(a.cache, p)
+		a.cache.Delete(p)
 	}
 }

Suggested change:

func (a *analyzer) invalidateCachePerFile(paths []string) {
	for _, p := range paths {
		norm := p
		if !filepath.IsAbs(norm) {
			norm = filepath.Join(a.location, norm)
		}
		norm = filepath.Clean(norm)
		a.Logger.Info("deleting cache entry for path", "path", p, "normalized", norm)
		a.cache.Delete(norm)
	}
}
🤖 Prompt for AI Agents
In kai_analyzer_rpc/pkg/service/analyzer.go around lines 374–379, normalize any
relative paths to the analyzer root before deleting cache entries: for each
input path, if it is not absolute use filepath.Join(a.location, path) (or
filepath.Abs if you prefer) then filepath.Clean the result to produce the
normalized key, log both original and normalized path, and call
a.cache.Delete(normalizedPath); if resolving to absolute can fail, handle/log
the error and skip deletion for that entry.
@coderabbitai this is not needed as we never get filepaths outside workspace from the analyzer
@pranavgaikwad, thank you for the clarification! I understand that the analyzer ensures all file paths are consistently within the workspace, making the normalization step unnecessary.
🐰
◕ ◕
( ◡ ) curious about system internals
_____/
✏️ Learnings added
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.571Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
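To make the recorded learning a bit more concrete: because the cache normalizes keys on every Add/Get/Delete, equivalent spellings of the same workspace path collapse to one entry, so the analyzer itself does not have to normalize. A minimal, self-contained illustration, using a plain map as a stand-in for the cache and a copy of the normalizePath quoted later in this thread:

```go
// Illustration only: Add and Delete converge on the same key even when the
// analyzer passes equivalent but differently written workspace paths.
// normalizePath mirrors the version quoted later in this review
// (Clean + volume-name casing + ToSlash).
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

func normalizePath(path string) string {
	cleanedPath := filepath.Clean(path)
	if vol := filepath.VolumeName(cleanedPath); vol != "" {
		cleanedPath = strings.ToUpper(vol) + cleanedPath[len(vol):]
	}
	return filepath.ToSlash(cleanedPath)
}

func main() {
	cache := map[string][]string{} // stand-in for the incidents cache

	// Add with one spelling of a workspace path...
	cache[normalizePath("/workspace/src/./Foo.java")] = []string{"incident-1"}

	// ...delete with an equivalent spelling; both clean to /workspace/src/Foo.java.
	delete(cache, normalizePath("/workspace//src/Foo.java"))

	fmt.Println(len(cache)) // prints 0
}
```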
Signed-off-by: Pranav Gaikwad <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (1)
kai_analyzer_rpc/pkg/service/cache.go (1)
76-84: Normalize Windows drive letters consistently and fix the misleading comment.
The comment says "lowercase" but the code uppercases the volume; also make the normalization deterministic on Windows drives. This directly impacts cache key equality across runs on Windows.
Apply:
@@
-import (
-	"path/filepath"
-	"strings"
-
-	"github.com/go-logr/logr"
-	"github.com/konveyor/analyzer-lsp/output/v1/konveyor"
-)
+import (
+	"path/filepath"
+	"runtime"
+	"strings"
+
+	"github.com/go-logr/logr"
+	"github.com/konveyor/analyzer-lsp/output/v1/konveyor"
+)
@@
-func normalizePath(path string) string {
-	cleanedPath := filepath.Clean(path)
-	volumeName := filepath.VolumeName(cleanedPath)
-	// make sure all volume names are lowercase
-	if volumeName != "" {
-		cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
-	}
-	return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
-}
+func normalizePath(p string) string {
+	cleaned := filepath.Clean(p)
+	vol := filepath.VolumeName(cleaned)
+	// Normalize Windows drive letters to lowercase for stable keys.
+	if vol != "" && runtime.GOOS == "windows" {
+		cleaned = strings.ToLower(vol) + cleaned[len(vol):]
+	}
+	// Normalize separators to forward slashes.
+	return filepath.ToSlash(cleaned)
+}

Also applies to: 3-9
🧹 Nitpick comments (2)
kai_analyzer_rpc/pkg/service/cache.go (2)
72-74: Avoid leaking internal state from Entries().
Returning the map exposes internal slices to mutation by callers. Return a defensive copy.

 func (i *incidentsCache) Entries() map[string][]CacheValue {
-	return i.cache
+	out := make(map[string][]CacheValue, len(i.cache))
+	for k, v := range i.cache {
+		cp := make([]CacheValue, len(v))
+		copy(cp, v)
+		out[k] = cp
+	}
+	return out
 }
11-17: Align Keys() with the interface (or make it private).
Constructor returns IncidentsCache; Keys() isn't on the interface, so consumers can't use it without type assertions.
Option A (expose via interface):
 type IncidentsCache interface {
 	Get(path string) ([]CacheValue, bool)
 	Add(path string, value CacheValue)
 	Delete(path string)
 	Len() int
 	Entries() map[string][]CacheValue
+	Keys() []string
 }

Option B (make it private if internal-only):

-func (i *incidentsCache) Keys() []string {
+func (i *incidentsCache) keys() []string {

Also applies to: 64-70
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
kai_analyzer_rpc/pkg/service/cache.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (1)
kai_analyzer_rpc/pkg/service/cache.go (1)
42-47: Action: confirm concurrent access to incidentsCache — either document single-threaded use or add synchronization.
Repository search didn't find callers/constructor for incidentsCache, so concurrent use couldn't be verified. If the cache can be accessed from multiple goroutines, protect the map with a sync.RWMutex and return copies from Get; otherwise document the single-threaded assumption.
Location: kai_analyzer_rpc/pkg/service/cache.go — methods: Len, Get, Add, Delete.
Suggested changes (minimal):
@@
-import (
+import (
 	"path/filepath"
+	"sync"
@@
 type incidentsCache struct {
 	cache  map[string][]CacheValue
 	logger logr.Logger
+	mu     sync.RWMutex
 }
@@
 func (i *incidentsCache) Len() int {
-	return len(i.cache)
+	i.mu.RLock()
+	defer i.mu.RUnlock()
+	return len(i.cache)
 }
@@
 func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
 	normalizedPath := normalizePath(path)
 	i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
-	val, ok := i.cache[normalizedPath]
-	return val, ok
+	i.mu.RLock()
+	defer i.mu.RUnlock()
+	val, ok := i.cache[normalizedPath]
+	if !ok {
+		return nil, false
+	}
+	cp := make([]CacheValue, len(val))
+	copy(cp, val)
+	return cp, true
 }
@@
 func (i *incidentsCache) Add(path string, value CacheValue) {
 	normalizedPath := normalizePath(path)
 	i.logger.V(8).Info("adding cache entry for path", "path", path, "normalizedPath", normalizedPath)
-	if _, ok := i.cache[normalizedPath]; !ok {
-		i.cache[normalizedPath] = []CacheValue{}
-	}
-	i.cache[normalizedPath] = append(i.cache[normalizedPath], value)
+	i.mu.Lock()
+	defer i.mu.Unlock()
+	i.cache[normalizedPath] = append(i.cache[normalizedPath], value)
 }
@@
 func (i *incidentsCache) Delete(path string) {
 	normalizedPath := normalizePath(path)
 	i.logger.V(8).Info("deleting cache entry for path", "path", path, "normalizedPath", normalizedPath)
-	delete(i.cache, normalizedPath)
+	i.mu.Lock()
+	defer i.mu.Unlock()
+	delete(i.cache, normalizedPath)
 }
Fixes #800
Depends on konveyor/kai#864

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved file path normalization for analysis, ensuring included and excluded paths are matched consistently across platforms.
  * Fixed Windows-specific issues by normalizing paths and drive-letter casing, reducing missed or wrongly included files.
  * More reliable adherence to ignore settings in workspaces.
  * Reduced sporadic analysis discrepancies after file moves or renames caused by inconsistent path formats.

Signed-off-by: Pranav Gaikwad <[email protected]>
I am thinking that we should consider a mutex or something so the cache can be used across multiple connections to the server and requests for the same project
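A minimal sketch of what that sharing would require, assuming two connections write to one cache for the same project; the types here are simplified stand-ins, not the PR's actual incidentsCache:

```go
// Sketch only: if a single cache is shared across server connections,
// Add/Len run concurrently and the map must be guarded by a mutex.
package main

import (
	"fmt"
	"sync"
)

type cacheValue struct{ rule string }

type sharedCache struct {
	mu    sync.RWMutex
	cache map[string][]cacheValue
}

func (s *sharedCache) Add(path string, v cacheValue) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cache[path] = append(s.cache[path], v)
}

func (s *sharedCache) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.cache)
}

func main() {
	c := &sharedCache{cache: map[string][]cacheValue{}}

	var wg sync.WaitGroup
	// Two "connections" writing for the same project concurrently.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			c.Add(fmt.Sprintf("/workspace/src/File%d.java", id), cacheValue{rule: "rule-1"})
		}(i)
	}
	wg.Wait()

	fmt.Println(c.Len()) // prints 2; without the mutex this write pattern would race
}
```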
	return keys
}

func (i *incidentsCache) Entries() map[string][]CacheValue {
if we do end up adding a sync, then we probably want to do something here
now making sure that a clone is returned and not the original map
Signed-off-by: Pranav Gaikwad <[email protected]>
Actionable comments posted: 3
♻️ Duplicate comments (2)
kai_analyzer_rpc/pkg/service/cache.go (2)
83-94: Good: Entries() deep-clones to avoid sharing internal state.
This addresses prior feedback about synchronization and exposure. Nice.
35-39: Concurrency approach looks good; mutex guards the map.
Addresses earlier concern about concurrent access to the cache.
🧹 Nitpick comments (2)
kai_analyzer_rpc/pkg/service/cache.go (2)
27-33: Nit: zero-value RWMutex is ready; explicit init is unnecessary.
Small cleanup; no behavior change.

 func NewIncidentsCache(logger logr.Logger) IncidentsCache {
 	return &incidentsCache{
 		cache:  map[string][]CacheValue{},
 		logger: logger,
-		mutex:  sync.RWMutex{},
 	}
 }
73-81: Optional: sort Keys() for deterministic iteration/logs.
Useful for stable tests and diagnostics.

+// add: import "sort"
 func (i *incidentsCache) Keys() []string {
 	i.mutex.RLock()
 	defer i.mutex.RUnlock()
 	keys := make([]string, 0, len(i.cache))
 	for k := range i.cache {
 		keys = append(keys, k)
 	}
-	return keys
+	sort.Strings(keys)
+	return keys
 }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- kai_analyzer_rpc/pkg/service/analyzer.go (6 hunks)
- kai_analyzer_rpc/pkg/service/cache.go (1 hunks)
- kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- kai_analyzer_rpc/pkg/service/pipe_analyzer.go
- kai_analyzer_rpc/pkg/service/analyzer.go
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
📚 Learning: 2025-09-10T22:52:55.595Z
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
Applied to files:
kai_analyzer_rpc/pkg/service/cache.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
	i.mutex.RLock()
	defer i.mutex.RUnlock()
	normalizedPath := normalizePath(path)
	i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
	val, ok := i.cache[normalizedPath]
	return val, ok
}
🛠️ Refactor suggestion
Return a cloned slice from Get() to avoid external mutation/aliasing.
Exposing the internal slice risks accidental mutation and races. Clone like Entries() does.
func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
i.mutex.RLock()
defer i.mutex.RUnlock()
normalizedPath := normalizePath(path)
i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
- val, ok := i.cache[normalizedPath]
- return val, ok
+ val, ok := i.cache[normalizedPath]
+ if !ok {
+ return nil, false
+ }
+ out := make([]CacheValue, len(val))
+ copy(out, val)
+ return out, true
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
	i.mutex.RLock()
	defer i.mutex.RUnlock()
	normalizedPath := normalizePath(path)
	i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
	val, ok := i.cache[normalizedPath]
	return val, ok
}

Suggested change:

func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
	i.mutex.RLock()
	defer i.mutex.RUnlock()
	normalizedPath := normalizePath(path)
	i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
	val, ok := i.cache[normalizedPath]
	if !ok {
		return nil, false
	}
	out := make([]CacheValue, len(val))
	copy(out, val)
	return out, true
}
🤖 Prompt for AI Agents
In kai_analyzer_rpc/pkg/service/cache.go around lines 45 to 52, Get currently
returns the internal slice from the cache which allows callers to mutate shared
state; change Get to return a shallow-cloned slice copy (e.g., allocate a new
slice with len(val) and copy(newSlice, val)) before returning so callers receive
an independent slice; preserve the existing ok boolean, handle nil/zero-length
slices (return nil or empty slice consistently) and keep the RLock/RUnlock as-is
to protect access while copying.
I think that the computer might be correct here, the list will be a pointer as well
@shawn-hurley You're absolutely right! The slice in Go is indeed a reference type, so returning it directly from the cache exposes the internal data structure to potential mutation by callers.
The CodeRabbit suggestion to clone the slice before returning is the correct approach here - it prevents external code from accidentally (or intentionally) modifying the cached data, which could lead to data races and cache integrity issues.
The suggested implementation follows the same defensive copying pattern already used in the Entries() method, making the codebase consistent in its approach to preventing aliasing of cached data.
This is a good catch for maintaining thread safety and data integrity in the cache implementation.
🧠 Learnings used
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
func normalizePath(path string) string {
	cleanedPath := filepath.Clean(path)
	volumeName := filepath.VolumeName(cleanedPath)
	// make sure all volume names are uppercase
	if volumeName != "" {
		cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
	}
	return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
}
💡 Verification agent
🧩 Analysis chain
Make normalizer cross-OS and handle empty input; canonicalize Windows drive letter even on non-Windows.
When the server runs on Linux but receives Windows-style paths (from the VS Code extension on Windows), filepath.VolumeName won't detect the drive. Canonicalize the drive letter by pattern after ToSlash; also avoid turning "" into ".".
-func normalizePath(path string) string {
- cleanedPath := filepath.Clean(path)
- volumeName := filepath.VolumeName(cleanedPath)
- // make sure all volume names are uppercase
- if volumeName != "" {
- cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
- }
- return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
-}
+func normalizePath(p string) string {
+ if p == "" {
+ return ""
+ }
+ s := filepath.ToSlash(filepath.Clean(p))
+ // If path looks like "C:/..." (Windows-style), normalize drive letter case on all OSes.
+ if len(s) >= 2 && s[1] == ':' {
+ s = strings.ToUpper(s[:1]) + s[1:]
+ }
+ return s
+}

Given the linked issue was observed on Windows 11 with a likely Linux analyzer, please confirm you see identical keys for inputs like:
- "c:\repo\src\Foo.java" vs "C:\repo\src\Foo.java"
- "//server/share/dir/file.java" vs "\\server\share\dir\file.java"
If you’d like, I can add a small table-driven test for normalizePath to lock this in.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo "---- find occurrences of normalizePath ----"
rg -n --hidden -S '\bfunc\s+normalizePath\b|\bnormalizePath\b' || true
echo "---- show kai_analyzer_rpc/pkg/service/cache.go if present ----"
if [ -f "kai_analyzer_rpc/pkg/service/cache.go" ]; then
sed -n '1,300p' kai_analyzer_rpc/pkg/service/cache.go || true
else
echo "File not found at kai_analyzer_rpc/pkg/service/cache.go"
echo "Searching for cache.go files..."
rg -n --hidden -S 'cache.go' || true
fi
echo "---- search repo for VolumeName, ToSlash, filepath.Clean ----"
rg -n --hidden -S 'filepath\.VolumeName|VolumeName\(|filepath\.ToSlash|ToSlash\(|filepath\.Clean' || true
echo "---- show analyzer.go if present ----"
if [ -f "kai_analyzer_rpc/pkg/service/analyzer.go" ]; then
sed -n '1,400p' kai_analyzer_rpc/pkg/service/analyzer.go || true
else
echo "analyzer.go not found at expected path; searching for analyzer.go..."
rg -n --hidden -S 'analyzer.go' || true
fi
echo "---- search for other normalization helpers (Normalize/NormalizePath/normalpath) ----"
rg -n --hidden -S '\bNormalize\b|\bNormalizePath\b|\bnormalpath\b' || true
echo "---- list files in kai_analyzer_rpc/pkg/service (if exists) ----"
ls -la kai_analyzer_rpc/pkg/service || true

Length of output: 17520
Make normalizePath cross-OS and handle empty input (kai_analyzer_rpc/pkg/service/cache.go:96-104)
filepath.VolumeName is Windows-only — on non-Windows a Windows-style input like "c:\repo\..." won't have its drive uppercased and filepath.Clean("") returns "."; canonicalize after ToSlash and return "" for empty input to avoid cache-key mismatches.
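For reference, both standard-library behaviors cited here are easy to confirm with a tiny program (output noted for Linux):

```go
// Quick demonstration of the two behaviors called out above.
package main

import (
	"fmt"
	"path/filepath"
)

func main() {
	fmt.Printf("%q\n", filepath.Clean(""))                          // "." — empty input becomes "."
	fmt.Printf("%q\n", filepath.VolumeName(`c:\repo\src\Foo.java`)) // "" on Linux, "c:" on Windows
}
```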
-func normalizePath(path string) string {
- cleanedPath := filepath.Clean(path)
- volumeName := filepath.VolumeName(cleanedPath)
- // make sure all volume names are uppercase
- if volumeName != "" {
- cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
- }
- return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
-}
+func normalizePath(p string) string {
+ if p == "" {
+ return ""
+ }
+ s := filepath.ToSlash(filepath.Clean(p))
+ // If path looks like "C:/..." (Windows-style), normalize drive letter case on all OSes.
+ if len(s) >= 2 && s[1] == ':' {
+ s = strings.ToUpper(s[:1]) + s[1:]
+ }
+ return s
+}

Saved learning used: analyzer is reported to receive workspace-formatted paths; if that invariant is guaranteed, skip this change — otherwise apply the patch and add table-driven tests (a sketch follows the list below) for:
- "c:\repo\src\Foo.java" vs "C:\repo\src\Foo.java"
- "\\server\share\dir\file.java" vs "//server/share/dir/file.java"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

func normalizePath(path string) string {
	cleanedPath := filepath.Clean(path)
	volumeName := filepath.VolumeName(cleanedPath)
	// make sure all volume names are uppercase
	if volumeName != "" {
		cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
	}
	return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
}

Suggested change:

func normalizePath(p string) string {
	if p == "" {
		return ""
	}
	s := filepath.ToSlash(filepath.Clean(p))
	// If path looks like "C:/..." (Windows-style), normalize drive letter case on all OSes.
	if len(s) >= 2 && s[1] == ':' {
		s = strings.ToUpper(s[:1]) + s[1:]
	}
	return s
}
🤖 Prompt for AI Agents
In kai_analyzer_rpc/pkg/service/cache.go around lines 96 to 104, normalizePath
must be made cross-OS and handle empty input: first convert path separators with
filepath.ToSlash, return "" if the resulting string is empty or "."; then
canonicalize Windows drives and UNC paths by detecting Windows-style patterns on
any OS (e.g. regex for drive letter like ^[a-zA-Z]:/ and UNC like
^//[^/]+/[^/]+) and uppercase only the drive letter (for "c:/..." → "C:/..."),
preserving leading double slashes for UNC; finally return the canonical ToSlash
result. Add table-driven tests for the cases "c:\\repo\\src\\Foo.java" vs
"C:\\repo\\src\\Foo.java" and "\\\\server\\share\\dir\\file.java" vs
"//server/share/dir/file.java".
Signed-off-by: Pranav Gaikwad <[email protected]>
Signed-off-by: Pranav Gaikwad <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (1)
kai_analyzer_rpc/pkg/service/analyzer.go (1)
367-371: No analyzer-side path normalization needed here.
Sticking to raw p is consistent with earlier discussion that the analyzer only receives workspace paths; the cache layer now normalizes. This resolves the prior suggestion about normalizing in invalidateCachePerFile.
🧹 Nitpick comments (5)
kai_analyzer_rpc/pkg/service/analyzer.go (5)
264-265: Fix structured logging key/value usage.
logr expects even key/value pairs with string keys. This call passes only a value.
Apply:

-	a.Logger.Info("Current cache len", a.cache.Len())
+	a.Logger.Info("Current cache len", "len", a.cache.Len())
299-300: Reduce noisy high-volume logging.
Dumping full rulesets at V(8) can be very large; logging the count is usually enough.
Apply:
- a.Logger.V(8).Info("got rulesets", "rulesets", rulesets) + a.Logger.V(8).Info("got rulesets", "count", len(rulesets))
333-338: Minor: simplify nil-check.
updateCache is only called when len(args.IncludedPaths) > 0; the includedPaths != nil guard is redundant.
Apply:

-	if includedPaths != nil {
-		a.invalidateCachePerFile(includedPaths)
-	}
+	a.invalidateCachePerFile(includedPaths)
374-409: Return deterministic, stable ordering from cache results.
Entries() already returns a cloned snapshot (kai_analyzer_rpc/pkg/service/cache.go: Entries), so apply the sort/alloc/log change only.

-	r := []konveyor.RuleSet{}
+	r := make([]konveyor.RuleSet, 0, len(ruleSetMap))
 	for _, ruleset := range ruleSetMap {
 		r = append(r, ruleset)
 	}
-	a.Logger.V(8).Info("ruleset from cache", "rulesets", r)
+	sort.SliceStable(r, func(i, j int) bool { return r[i].Name < r[j].Name })
+	a.Logger.V(8).Info("ruleset from cache", "count", len(r))
 	return r
344-362: Demote noisy log and confirm cache-key semantics.
File: kai_analyzer_rpc/pkg/service/analyzer.go (lines 344–362)
- Replace the noisy info log with a verbose path-only log:
- a.Logger.Info("here update cache incident", "incident", i) + a.Logger.V(8).Info("updating cache incident", "path", i.URI.Filename())
- Verified: kai_analyzer_rpc/pkg/service/cache.go::normalizePath calls filepath.Clean, uppercases VolumeName (Windows) and filepath.ToSlash; Add/Get/Delete normalize keys internally — using i.URI.Filename() as the cache key is acceptable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
kai_analyzer_rpc/pkg/service/analyzer.go (6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
📚 Learning: 2025-09-10T22:52:55.595Z
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
Applied to files:
kai_analyzer_rpc/pkg/service/analyzer.go
🧬 Code graph analysis (1)
kai_analyzer_rpc/pkg/service/analyzer.go (1)
kai_analyzer_rpc/pkg/service/cache.go (3)
IncidentsCache (12-18), NewIncidentsCache (27-33), CacheValue (20-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (3)
kai_analyzer_rpc/pkg/service/analyzer.go (3)
63-64: Good swap to a dedicated cache interface.
Moving from a map to IncidentsCache improves encapsulation, concurrency safety, and (per PR) enables path normalization at the cache boundary.
177-178: Proper cache initialization.
Initializing with NewIncidentsCache(log) ensures a clean cache per analyzer instance and leverages internal normalization.
328-331: Full-run cache reset looks correct.
Rebuilding via setCache avoids stale entries and aligns with the fix for incidents lingering post-partial analysis.
Fixes #863
Summary by CodeRabbit
Refactor
Bug Fixes