🐛 normalize paths in cache #864
Conversation
Signed-off-by: Pranav Gaikwad <[email protected]>
Walkthrough
Replaces the analyzer's internal map with a new public IncidentsCache and CacheValue, adds a map-backed, path-normalizing cache implementation, updates analyzer code to use the new cache API and initializes it in NewPipeAnalyzer, and changes Server.Accept to panic if NewPipeAnalyzer fails.
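For orientation, a minimal sketch of the API surface the walkthrough describes is below. The interface methods mirror the ones quoted later in this review; the CacheValue field names are illustrative assumptions, not the actual struct from cache.go.

```go
// Sketch only: IncidentsCache matches the interface quoted later in this
// review; CacheValue fields are placeholders for whatever the analyzer needs
// to rebuild rulesets (the real definitions live in
// kai_analyzer_rpc/pkg/service/cache.go).
package service

import "github.com/konveyor/analyzer-lsp/output/v1/konveyor"

// CacheValue carries a cached incident plus enough context to rebuild rulesets.
// Field names below are assumptions for illustration only.
type CacheValue struct {
	Incident      konveyor.Incident // the cached incident
	RulesetName   string            // hypothetical: ruleset that produced it
	ViolationName string            // hypothetical: violation that produced it
}

// IncidentsCache is keyed by normalized file path.
type IncidentsCache interface {
	Get(path string) ([]CacheValue, bool)
	Add(path string, value CacheValue)
	Delete(path string)
	Len() int
	Entries() map[string][]CacheValue
}
```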
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant Server
participant PipeAnalyzer
Client->>Server: Accept()
Server->>Server: NewPipeAnalyzer(...)
alt NewPipeAnalyzer fails (changed)
Server->>Server: panic(err)
Server--x Client: process terminates
else Success
Server->>PipeAnalyzer: Start/Handle requests
PipeAnalyzer-->>Server: Responses
Server-->>Client: Responses
end
sequenceDiagram
autonumber
participant Analyzer
participant IncidentsCache
participant RulesetBuilder
rect rgba(220,240,255,0.4)
note right of Analyzer: Add incidents (analysis)
Analyzer->>IncidentsCache: Add(path, CacheValue)
end
rect rgba(240,255,220,0.4)
note right of Analyzer: Remove resolved incidents
Analyzer->>IncidentsCache: Delete(path)
end
rect rgba(255,240,220,0.4)
note right of Analyzer: Build rulesets from cached entries
Analyzer->>IncidentsCache: Entries()
IncidentsCache-->>Analyzer: map[path][]CacheValue
Analyzer->>RulesetBuilder: Build from CacheValue fields
RulesetBuilder-->>Analyzer: Rulesets
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks (3 passed, 2 warnings)
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 3
🧹 Nitpick comments (5)
kai_analyzer_rpc/pkg/service/cache.go (2)
72-74: Entries exposes internal map allowing external mutation.
Returning the backing map risks accidental modification outside the cache's lock discipline.
Return a shallow copy:
-func (i *incidentsCache) Entries() map[string][]CacheValue {
-	return i.cache
-}
+func (i *incidentsCache) Entries() map[string][]CacheValue {
+	out := make(map[string][]CacheValue, len(i.cache))
+	for k, v := range i.cache {
+		cpy := make([]CacheValue, len(v))
+		copy(cpy, v)
+		out[k] = cpy
+	}
+	return out
+}
64-70: Keys() is unreachable from consumers (not in the interface).
NewIncidentsCache returns IncidentsCache, which lacks Keys(). Either expose it via the interface or remove to avoid dead API.
If unused, remove:

-func (i *incidentsCache) Keys() []string {
-	keys := make([]string, 0, len(i.cache))
-	for k := range i.cache {
-		keys = append(keys, k)
-	}
-	return keys
-}

kai_analyzer_rpc/pkg/service/analyzer.go (3)
266-268: Fix logr key/value usage.
Info requires key/value pairs; passing just a.cache.Len() results in malformed logs.

-	a.Logger.Info("Current cache len", a.cache.Len())
+	a.Logger.Info("Current cache len", "len", a.cache.Len())
351-369: Reduce noisy debug logging and avoid logging large incident payloads at info level.
Logging full incidents can be heavy; move to high verbosity or summarize.

-	a.Logger.Info("here update cache incident", "incident", i)
+	a.Logger.V(8).Info("caching incident", "uri", i.URI, "rule", violationName)
413-419: Optional: return rulesets in stable order.
Maps iterate in random order; sorting improves determinism across runs.

 r := []konveyor.RuleSet{}
 for _, ruleset := range ruleSetMap {
 	r = append(r, ruleset)
 }
-a.Logger.V(8).Info("ruleset from cache", "rulesets", r)
+sort.SliceStable(r, func(i, j int) bool { return r[i].Name < r[j].Name })
+a.Logger.V(8).Info("ruleset from cache", "rulesets", r)
 return r
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- kai_analyzer_rpc/pkg/rpc/server.go (1 hunks)
- kai_analyzer_rpc/pkg/service/analyzer.go (8 hunks)
- kai_analyzer_rpc/pkg/service/cache.go (1 hunks)
- kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1)
kai_analyzer_rpc/pkg/service/cache.go (1)
NewIncidentsCache(26-31)
kai_analyzer_rpc/pkg/service/analyzer.go (1)
kai_analyzer_rpc/pkg/service/cache.go (3)
IncidentsCache (11-17), NewIncidentsCache (26-31), CacheValue (19-24)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Trunk Check Runner
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (3)
kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1)
82-83: Cache initialization switch LGTM.
Using NewIncidentsCache(l) aligns this constructor with the new cache abstraction.
kai_analyzer_rpc/pkg/service/analyzer.go (2)
63-64: Type switch to IncidentsCache LGTM.
The abstraction improves encapsulation and enables path normalization in one place.
178-179: Cache construction LGTM.
Creating via NewIncidentsCache(log) ensures consistent normalization and logging.
 func (a *analyzer) invalidateCachePerFile(paths []string) {
 	for _, p := range paths {
 		a.Logger.Info("deleting cache entry for path", "path", p)
-		delete(a.cache, p)
+		a.cache.Delete(p)
 	}
 }
🛠️ Refactor suggestion
Normalize invalidate paths to analyzer root before deletion.
If clients send relative paths, deletion may miss absolute keys. Normalize to absolute (relative to a.location) before Delete.
func (a *analyzer) invalidateCachePerFile(paths []string) {
for _, p := range paths {
- a.Logger.Info("deleting cache entry for path", "path", p)
- a.cache.Delete(p)
+ norm := p
+ if !filepath.IsAbs(norm) {
+ norm = filepath.Join(a.location, norm)
+ }
+ norm = filepath.Clean(norm)
+ a.Logger.Info("deleting cache entry for path", "path", p, "normalized", norm)
+ a.cache.Delete(norm)
}
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original (diff hunk under review):

 func (a *analyzer) invalidateCachePerFile(paths []string) {
 	for _, p := range paths {
 		a.Logger.Info("deleting cache entry for path", "path", p)
-		delete(a.cache, p)
+		a.cache.Delete(p)
 	}
 }

Suggested change:

func (a *analyzer) invalidateCachePerFile(paths []string) {
	for _, p := range paths {
		norm := p
		if !filepath.IsAbs(norm) {
			norm = filepath.Join(a.location, norm)
		}
		norm = filepath.Clean(norm)
		a.Logger.Info("deleting cache entry for path", "path", p, "normalized", norm)
		a.cache.Delete(norm)
	}
}
🤖 Prompt for AI Agents
In kai_analyzer_rpc/pkg/service/analyzer.go around lines 374–379, normalize any
relative paths to the analyzer root before deleting cache entries: for each
input path, if it is not absolute use filepath.Join(a.location, path) (or
filepath.Abs if you prefer) then filepath.Clean the result to produce the
normalized key, log both original and normalized path, and call
a.cache.Delete(normalizedPath); if resolving to absolute can fail, handle/log
the error and skip deletion for that entry.
@coderabbitai this is not needed as we never get filepaths outside workspace from the analyzer
@pranavgaikwad, thank you for the clarification! I understand that the analyzer ensures all file paths are consistently within the workspace, making the normalization step unnecessary.
🐰
◕ ◕
( ◡ ) curious about system internals
_____/
✏️ Learnings added
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.571Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
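To make the recorded learning a bit more concrete: because the cache normalizes keys on every Add/Get/Delete, equivalent spellings of the same workspace path collapse to one entry, so the analyzer itself does not have to normalize. A minimal, self-contained illustration, using a plain map as a stand-in for the cache and a copy of the normalizePath quoted later in this thread:

```go
// Illustration only: Add and Delete converge on the same key even when the
// analyzer passes equivalent but differently written workspace paths.
// normalizePath mirrors the version quoted later in this review
// (Clean + volume-name casing + ToSlash).
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

func normalizePath(path string) string {
	cleanedPath := filepath.Clean(path)
	if vol := filepath.VolumeName(cleanedPath); vol != "" {
		cleanedPath = strings.ToUpper(vol) + cleanedPath[len(vol):]
	}
	return filepath.ToSlash(cleanedPath)
}

func main() {
	cache := map[string][]string{} // stand-in for the incidents cache

	// Add with one spelling of a workspace path...
	cache[normalizePath("/workspace/src/./Foo.java")] = []string{"incident-1"}

	// ...delete with an equivalent spelling; both clean to /workspace/src/Foo.java.
	delete(cache, normalizePath("/workspace//src/Foo.java"))

	fmt.Println(len(cache)) // prints 0
}
```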
Signed-off-by: Pranav Gaikwad <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (1)
kai_analyzer_rpc/pkg/service/cache.go (1)
76-84: Normalize Windows drive letters consistently and fix the misleading comment.
The comment says "lowercase" but the code uppercases the volume; also make the normalization deterministic on Windows drives. This directly impacts cache key equality across runs on Windows.
Apply:
@@
-import (
-	"path/filepath"
-	"strings"
-
-	"github.com/go-logr/logr"
-	"github.com/konveyor/analyzer-lsp/output/v1/konveyor"
-)
+import (
+	"path/filepath"
+	"runtime"
+	"strings"
+
+	"github.com/go-logr/logr"
+	"github.com/konveyor/analyzer-lsp/output/v1/konveyor"
+)
@@
-func normalizePath(path string) string {
-	cleanedPath := filepath.Clean(path)
-	volumeName := filepath.VolumeName(cleanedPath)
-	// make sure all volume names are lowercase
-	if volumeName != "" {
-		cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
-	}
-	return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
-}
+func normalizePath(p string) string {
+	cleaned := filepath.Clean(p)
+	vol := filepath.VolumeName(cleaned)
+	// Normalize Windows drive letters to lowercase for stable keys.
+	if vol != "" && runtime.GOOS == "windows" {
+		cleaned = strings.ToLower(vol) + cleaned[len(vol):]
+	}
+	// Normalize separators to forward slashes.
+	return filepath.ToSlash(cleaned)
+}

Also applies to: 3-9
🧹 Nitpick comments (2)
kai_analyzer_rpc/pkg/service/cache.go (2)
72-74: Avoid leaking internal state from Entries().
Returning the map exposes internal slices to mutation by callers. Return a defensive copy.

 func (i *incidentsCache) Entries() map[string][]CacheValue {
-	return i.cache
+	out := make(map[string][]CacheValue, len(i.cache))
+	for k, v := range i.cache {
+		cp := make([]CacheValue, len(v))
+		copy(cp, v)
+		out[k] = cp
+	}
+	return out
 }
11-17: Align Keys() with the interface (or make it private).
Constructor returns IncidentsCache; Keys() isn't on the interface, so consumers can't use it without type assertions.
Option A (expose via interface):
 type IncidentsCache interface {
 	Get(path string) ([]CacheValue, bool)
 	Add(path string, value CacheValue)
 	Delete(path string)
 	Len() int
 	Entries() map[string][]CacheValue
+	Keys() []string
 }

Option B (make it private if internal-only):

-func (i *incidentsCache) Keys() []string {
+func (i *incidentsCache) keys() []string {

Also applies to: 64-70
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
kai_analyzer_rpc/pkg/service/cache.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (1)
kai_analyzer_rpc/pkg/service/cache.go (1)
42-47: Action: confirm concurrent access to incidentsCache — either document single-threaded use or add synchronization.
Repository search didn't find callers/constructor for incidentsCache, so concurrent use couldn't be verified. If the cache can be accessed from multiple goroutines, protect the map with a sync.RWMutex and return copies from Get; otherwise document the single-threaded assumption.
Location: kai_analyzer_rpc/pkg/service/cache.go — methods: Len, Get, Add, Delete.
Suggested changes (minimal):
@@
-import (
+import (
 	"path/filepath"
+	"sync"
@@
 type incidentsCache struct {
 	cache  map[string][]CacheValue
 	logger logr.Logger
+	mu     sync.RWMutex
 }
@@
 func (i *incidentsCache) Len() int {
-	return len(i.cache)
+	i.mu.RLock()
+	defer i.mu.RUnlock()
+	return len(i.cache)
 }
@@
 func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
 	normalizedPath := normalizePath(path)
 	i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
-	val, ok := i.cache[normalizedPath]
-	return val, ok
+	i.mu.RLock()
+	defer i.mu.RUnlock()
+	val, ok := i.cache[normalizedPath]
+	if !ok {
+		return nil, false
+	}
+	cp := make([]CacheValue, len(val))
+	copy(cp, val)
+	return cp, true
 }
@@
 func (i *incidentsCache) Add(path string, value CacheValue) {
 	normalizedPath := normalizePath(path)
 	i.logger.V(8).Info("adding cache entry for path", "path", path, "normalizedPath", normalizedPath)
-	if _, ok := i.cache[normalizedPath]; !ok {
-		i.cache[normalizedPath] = []CacheValue{}
-	}
-	i.cache[normalizedPath] = append(i.cache[normalizedPath], value)
+	i.mu.Lock()
+	defer i.mu.Unlock()
+	i.cache[normalizedPath] = append(i.cache[normalizedPath], value)
 }
@@
 func (i *incidentsCache) Delete(path string) {
 	normalizedPath := normalizePath(path)
 	i.logger.V(8).Info("deleting cache entry for path", "path", path, "normalizedPath", normalizedPath)
-	delete(i.cache, normalizedPath)
+	i.mu.Lock()
+	defer i.mu.Unlock()
+	delete(i.cache, normalizedPath)
 }
Fixes #800
Depends on konveyor/kai#864

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved file path normalization for analysis, ensuring included and excluded paths are matched consistently across platforms.
  * Fixed Windows-specific issues by normalizing paths and drive-letter casing, reducing missed or wrongly included files.
  * More reliable adherence to ignore settings in workspaces.
  * Reduced sporadic analysis discrepancies after file moves or renames caused by inconsistent path formats.

Signed-off-by: Pranav Gaikwad <[email protected]>
I am thinking that we should consider a mutex or something so the cache can be used across multiple connections to the server and requests for the same project
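A minimal sketch of what that sharing would require, assuming two connections write to one cache for the same project; the types here are simplified stand-ins, not the PR's actual incidentsCache:

```go
// Sketch only: if a single cache is shared across server connections,
// Add/Len run concurrently and the map must be guarded by a mutex.
package main

import (
	"fmt"
	"sync"
)

type cacheValue struct{ rule string }

type sharedCache struct {
	mu    sync.RWMutex
	cache map[string][]cacheValue
}

func (s *sharedCache) Add(path string, v cacheValue) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cache[path] = append(s.cache[path], v)
}

func (s *sharedCache) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.cache)
}

func main() {
	c := &sharedCache{cache: map[string][]cacheValue{}}

	var wg sync.WaitGroup
	// Two "connections" writing for the same project concurrently.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			c.Add(fmt.Sprintf("/workspace/src/File%d.java", id), cacheValue{rule: "rule-1"})
		}(i)
	}
	wg.Wait()

	fmt.Println(c.Len()) // prints 2; without the mutex this write pattern would race
}
```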
	return keys
}

func (i *incidentsCache) Entries() map[string][]CacheValue {
if we do end up adding a sync, then we probably want to do something here
now making sure that a clone is returned and not the original map
Signed-off-by: Pranav Gaikwad <[email protected]>
Actionable comments posted: 3
♻️ Duplicate comments (2)
kai_analyzer_rpc/pkg/service/cache.go (2)
83-94: Good: Entries() deep-clones to avoid sharing internal state.
This addresses prior feedback about synchronization and exposure. Nice.
35-39: Concurrency approach looks good; mutex guards the map.
Addresses earlier concern about concurrent access to the cache.
🧹 Nitpick comments (2)
kai_analyzer_rpc/pkg/service/cache.go (2)
27-33: Nit: zero-value RWMutex is ready; explicit init is unnecessary.
Small cleanup; no behavior change.

 func NewIncidentsCache(logger logr.Logger) IncidentsCache {
 	return &incidentsCache{
 		cache:  map[string][]CacheValue{},
 		logger: logger,
-		mutex:  sync.RWMutex{},
 	}
 }
73-81: Optional: sort Keys() for deterministic iteration/logs.
Useful for stable tests and diagnostics.

+// add: import "sort"
 func (i *incidentsCache) Keys() []string {
 	i.mutex.RLock()
 	defer i.mutex.RUnlock()
 	keys := make([]string, 0, len(i.cache))
 	for k := range i.cache {
 		keys = append(keys, k)
 	}
-	return keys
+	sort.Strings(keys)
+	return keys
 }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- kai_analyzer_rpc/pkg/service/analyzer.go (6 hunks)
- kai_analyzer_rpc/pkg/service/cache.go (1 hunks)
- kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- kai_analyzer_rpc/pkg/service/pipe_analyzer.go
- kai_analyzer_rpc/pkg/service/analyzer.go
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
📚 Learning: 2025-09-10T22:52:55.595Z
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
Applied to files:
kai_analyzer_rpc/pkg/service/cache.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
	i.mutex.RLock()
	defer i.mutex.RUnlock()
	normalizedPath := normalizePath(path)
	i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
	val, ok := i.cache[normalizedPath]
	return val, ok
}
🛠️ Refactor suggestion
Return a cloned slice from Get() to avoid external mutation/aliasing.
Exposing the internal slice risks accidental mutation and races. Clone like Entries() does.
func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
i.mutex.RLock()
defer i.mutex.RUnlock()
normalizedPath := normalizePath(path)
i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
- val, ok := i.cache[normalizedPath]
- return val, ok
+ val, ok := i.cache[normalizedPath]
+ if !ok {
+ return nil, false
+ }
+ out := make([]CacheValue, len(val))
+ copy(out, val)
+ return out, true
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
	i.mutex.RLock()
	defer i.mutex.RUnlock()
	normalizedPath := normalizePath(path)
	i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
	val, ok := i.cache[normalizedPath]
	return val, ok
}

Suggested change:

func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
	i.mutex.RLock()
	defer i.mutex.RUnlock()
	normalizedPath := normalizePath(path)
	i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
	val, ok := i.cache[normalizedPath]
	if !ok {
		return nil, false
	}
	out := make([]CacheValue, len(val))
	copy(out, val)
	return out, true
}
🤖 Prompt for AI Agents
In kai_analyzer_rpc/pkg/service/cache.go around lines 45 to 52, Get currently
returns the internal slice from the cache which allows callers to mutate shared
state; change Get to return a shallow-cloned slice copy (e.g., allocate a new
slice with len(val) and copy(newSlice, val)) before returning so callers receive
an independent slice; preserve the existing ok boolean, handle nil/zero-length
slices (return nil or empty slice consistently) and keep the RLock/RUnlock as-is
to protect access while copying.
I think that the computer might be correct here, the list will be a pointer as well
@shawn-hurley You're absolutely right! The slice in Go is indeed a reference type, so returning it directly from the cache exposes the internal data structure to potential mutation by callers.
The CodeRabbit suggestion to clone the slice before returning is the correct approach here - it prevents external code from accidentally (or intentionally) modifying the cached data, which could lead to data races and cache integrity issues.
The suggested implementation follows the same defensive copying pattern already used in the Entries() method, making the codebase consistent in its approach to preventing aliasing of cached data.
This is a good catch for maintaining thread safety and data integrity in the cache implementation.
🧠 Learnings used
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
func normalizePath(path string) string {
	cleanedPath := filepath.Clean(path)
	volumeName := filepath.VolumeName(cleanedPath)
	// make sure all volume names are uppercase
	if volumeName != "" {
		cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
	}
	return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
}
💡 Verification agent
🧩 Analysis chain
Make normalizer cross-OS and handle empty input; canonicalize Windows drive letter even on non-Windows.
When the server runs on Linux but receives Windows-style paths (from the VS Code extension on Windows), filepath.VolumeName won't detect the drive. Canonicalize the drive letter by pattern after ToSlash; also avoid turning "" into ".".
-func normalizePath(path string) string {
- cleanedPath := filepath.Clean(path)
- volumeName := filepath.VolumeName(cleanedPath)
- // make sure all volume names are uppercase
- if volumeName != "" {
- cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
- }
- return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
-}
+func normalizePath(p string) string {
+ if p == "" {
+ return ""
+ }
+ s := filepath.ToSlash(filepath.Clean(p))
+ // If path looks like "C:/..." (Windows-style), normalize drive letter case on all OSes.
+ if len(s) >= 2 && s[1] == ':' {
+ s = strings.ToUpper(s[:1]) + s[1:]
+ }
+ return s
+}

Given the linked issue was observed on Windows 11 with a likely Linux analyzer, please confirm you see identical keys for inputs like:
- "c:\repo\src\Foo.java" vs "C:\repo\src\Foo.java"
- "//server/share/dir/file.java" vs "\\server\share\dir\file.java"
If you’d like, I can add a small table-driven test for normalizePath to lock this in.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo "---- find occurrences of normalizePath ----"
rg -n --hidden -S '\bfunc\s+normalizePath\b|\bnormalizePath\b' || true
echo "---- show kai_analyzer_rpc/pkg/service/cache.go if present ----"
if [ -f "kai_analyzer_rpc/pkg/service/cache.go" ]; then
sed -n '1,300p' kai_analyzer_rpc/pkg/service/cache.go || true
else
echo "File not found at kai_analyzer_rpc/pkg/service/cache.go"
echo "Searching for cache.go files..."
rg -n --hidden -S 'cache.go' || true
fi
echo "---- search repo for VolumeName, ToSlash, filepath.Clean ----"
rg -n --hidden -S 'filepath\.VolumeName|VolumeName\(|filepath\.ToSlash|ToSlash\(|filepath\.Clean' || true
echo "---- show analyzer.go if present ----"
if [ -f "kai_analyzer_rpc/pkg/service/analyzer.go" ]; then
sed -n '1,400p' kai_analyzer_rpc/pkg/service/analyzer.go || true
else
echo "analyzer.go not found at expected path; searching for analyzer.go..."
rg -n --hidden -S 'analyzer.go' || true
fi
echo "---- search for other normalization helpers (Normalize/NormalizePath/normalpath) ----"
rg -n --hidden -S '\bNormalize\b|\bNormalizePath\b|\bnormalpath\b' || true
echo "---- list files in kai_analyzer_rpc/pkg/service (if exists) ----"
ls -la kai_analyzer_rpc/pkg/service || true

Length of output: 17520
Make normalizePath cross-OS and handle empty input (kai_analyzer_rpc/pkg/service/cache.go:96-104)
filepath.VolumeName is Windows-only — on non-Windows a Windows-style input like "c:\repo\..." won't have its drive uppercased and filepath.Clean("") returns "."; canonicalize after ToSlash and return "" for empty input to avoid cache-key mismatches.
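For reference, both standard-library behaviors cited here are easy to confirm with a tiny program (output noted for Linux):

```go
// Quick demonstration of the two behaviors called out above.
package main

import (
	"fmt"
	"path/filepath"
)

func main() {
	fmt.Printf("%q\n", filepath.Clean(""))                          // "." — empty input becomes "."
	fmt.Printf("%q\n", filepath.VolumeName(`c:\repo\src\Foo.java`)) // "" on Linux, "c:" on Windows
}
```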
-func normalizePath(path string) string {
- cleanedPath := filepath.Clean(path)
- volumeName := filepath.VolumeName(cleanedPath)
- // make sure all volume names are uppercase
- if volumeName != "" {
- cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
- }
- return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
-}
+func normalizePath(p string) string {
+ if p == "" {
+ return ""
+ }
+ s := filepath.ToSlash(filepath.Clean(p))
+ // If path looks like "C:/..." (Windows-style), normalize drive letter case on all OSes.
+ if len(s) >= 2 && s[1] == ':' {
+ s = strings.ToUpper(s[:1]) + s[1:]
+ }
+ return s
+}

Saved learning used: analyzer is reported to receive workspace-formatted paths; if that invariant is guaranteed, skip this change — otherwise apply the patch and add table-driven tests (a sketch follows the list below) for:
- "c:\repo\src\Foo.java" vs "C:\repo\src\Foo.java"
- "\\server\share\dir\file.java" vs "//server/share/dir/file.java"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

func normalizePath(path string) string {
	cleanedPath := filepath.Clean(path)
	volumeName := filepath.VolumeName(cleanedPath)
	// make sure all volume names are uppercase
	if volumeName != "" {
		cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
	}
	return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
}

Suggested change:

func normalizePath(p string) string {
	if p == "" {
		return ""
	}
	s := filepath.ToSlash(filepath.Clean(p))
	// If path looks like "C:/..." (Windows-style), normalize drive letter case on all OSes.
	if len(s) >= 2 && s[1] == ':' {
		s = strings.ToUpper(s[:1]) + s[1:]
	}
	return s
}
🤖 Prompt for AI Agents
In kai_analyzer_rpc/pkg/service/cache.go around lines 96 to 104, normalizePath
must be made cross-OS and handle empty input: first convert path separators with
filepath.ToSlash, return "" if the resulting string is empty or "."; then
canonicalize Windows drives and UNC paths by detecting Windows-style patterns on
any OS (e.g. regex for drive letter like ^[a-zA-Z]:/ and UNC like
^//[^/]+/[^/]+) and uppercase only the drive letter (for "c:/..." → "C:/..."),
preserving leading double slashes for UNC; finally return the canonical ToSlash
result. Add table-driven tests for the cases "c:\\repo\\src\\Foo.java" vs
"C:\\repo\\src\\Foo.java" and "\\\\server\\share\\dir\\file.java" vs
"//server/share/dir/file.java".
Signed-off-by: Pranav Gaikwad <[email protected]>
Signed-off-by: Pranav Gaikwad <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (1)
kai_analyzer_rpc/pkg/service/analyzer.go (1)
367-371: No analyzer-side path normalization needed here.
Sticking to raw p is consistent with earlier discussion that the analyzer only receives workspace paths; the cache layer now normalizes. This resolves the prior suggestion about normalizing in invalidateCachePerFile.
🧹 Nitpick comments (5)
kai_analyzer_rpc/pkg/service/analyzer.go (5)
264-265: Fix structured logging key/value usage.
logr expects even key/value pairs with string keys. This call passes only a value.
Apply:

-	a.Logger.Info("Current cache len", a.cache.Len())
+	a.Logger.Info("Current cache len", "len", a.cache.Len())
299-300: Reduce noisy high-volume logging.
Dumping full rulesets at V(8) can be very large; logging the count is usually enough.
Apply:
- a.Logger.V(8).Info("got rulesets", "rulesets", rulesets) + a.Logger.V(8).Info("got rulesets", "count", len(rulesets))
333-338: Minor: simplify nil-check.
updateCache is only called when len(args.IncludedPaths) > 0; the includedPaths != nil guard is redundant.
Apply:

-	if includedPaths != nil {
-		a.invalidateCachePerFile(includedPaths)
-	}
+	a.invalidateCachePerFile(includedPaths)
374-409: Return deterministic, stable ordering from cache results.
Entries() already returns a cloned snapshot (kai_analyzer_rpc/pkg/service/cache.go: Entries), so apply the sort/alloc/log change only.

-	r := []konveyor.RuleSet{}
+	r := make([]konveyor.RuleSet, 0, len(ruleSetMap))
 	for _, ruleset := range ruleSetMap {
 		r = append(r, ruleset)
 	}
-	a.Logger.V(8).Info("ruleset from cache", "rulesets", r)
+	sort.SliceStable(r, func(i, j int) bool { return r[i].Name < r[j].Name })
+	a.Logger.V(8).Info("ruleset from cache", "count", len(r))
 	return r
344-362: Demote noisy log and confirm cache-key semantics.
File: kai_analyzer_rpc/pkg/service/analyzer.go (lines 344–362)
- Replace the noisy info log with a verbose path-only log:
- a.Logger.Info("here update cache incident", "incident", i) + a.Logger.V(8).Info("updating cache incident", "path", i.URI.Filename())
- Verified: kai_analyzer_rpc/pkg/service/cache.go::normalizePath calls filepath.Clean, uppercases VolumeName (Windows) and filepath.ToSlash; Add/Get/Delete normalize keys internally — using i.URI.Filename() as the cache key is acceptable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
kai_analyzer_rpc/pkg/service/analyzer.go (6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
📚 Learning: 2025-09-10T22:52:55.595Z
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
Applied to files:
kai_analyzer_rpc/pkg/service/analyzer.go
🧬 Code graph analysis (1)
kai_analyzer_rpc/pkg/service/analyzer.go (1)
kai_analyzer_rpc/pkg/service/cache.go (3)
IncidentsCache (12-18), NewIncidentsCache (27-33), CacheValue (20-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
- GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
- GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (3)
kai_analyzer_rpc/pkg/service/analyzer.go (3)
63-64: Good swap to a dedicated cache interface.
Moving from a map to IncidentsCache improves encapsulation, concurrency safety, and (per PR) enables path normalization at the cache boundary.
177-178: Proper cache initialization.
Initializing with NewIncidentsCache(log) ensures a clean cache per analyzer instance and leverages internal normalization.
328-331: Full-run cache reset looks correct.
Rebuilding via setCache avoids stale entries and aligns with the fix for incidents lingering post-partial analysis.
Fixes #863
Summary by CodeRabbit
Refactor
Bug Fixes