
Conversation

@pranavgaikwad
Contributor

@pranavgaikwad pranavgaikwad commented Sep 10, 2025

Fixes #863

Summary by CodeRabbit

  • Refactor

    • Added a unified in-memory incidents cache with standardized add/delete/list operations.
    • Normalized file paths in cache handling for consistent incident aggregation across platforms.
    • Analyzer updated to use the new cache and now constructs rulesets from cached entries.
    • Improved logging around cache size and ruleset construction.
  • Bug Fixes

    • Service now fails fast on analyzer initialization errors to avoid partial startup.

Signed-off-by: Pranav Gaikwad <[email protected]>
@coderabbitai
Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Replaces the analyzer's internal map with a new public IncidentsCache and CacheValue, adds a map-backed, path-normalizing cache implementation, updates analyzer code to use the new cache API and initializes it in NewPipeAnalyzer, and changes Server.Accept to panic if NewPipeAnalyzer fails.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Incidents cache implementation: kai_analyzer_rpc/pkg/service/cache.go | Adds IncidentsCache interface and CacheValue struct; implements NewIncidentsCache(logger) returning a map-backed incidentsCache with Get/Add/Delete/Entries/Len (and Keys), path normalization (normalizePath), concurrency control, and verbose logging. |
| Analyzer refactor to use cache API: kai_analyzer_rpc/pkg/service/analyzer.go | Replaces internal map[string][]cacheValue with IncidentsCache; removes private cacheValue type; introduces and uses public CacheValue; updates initialization, add/delete, iteration (Entries/Len), and ruleset construction to use the new cache API and fields; adjusts logging. |
| Pipe analyzer initialization: kai_analyzer_rpc/pkg/service/pipe_analyzer.go | In NewPipeAnalyzer, initialize analyzer cache via NewIncidentsCache(l) instead of a map literal; removes separate cache mutex initialization. |
| Server initialization error handling: kai_analyzer_rpc/pkg/rpc/server.go | In Server.Accept, on NewPipeAnalyzer failure the code now panics (panic(err)) instead of returning an error. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Server
  participant PipeAnalyzer
  Client->>Server: Accept()
  Server->>Server: NewPipeAnalyzer(...)
  alt NewPipeAnalyzer fails (changed)
    Server->>Server: panic(err)
    Server--x Client: process terminates
  else Success
    Server->>PipeAnalyzer: Start/Handle requests
    PipeAnalyzer-->>Server: Responses
    Server-->>Client: Responses
  end
sequenceDiagram
  autonumber
  participant Analyzer
  participant IncidentsCache
  participant RulesetBuilder
  rect rgba(220,240,255,0.4)
    note right of Analyzer: Add incidents (analysis)
    Analyzer->>IncidentsCache: Add(path, CacheValue)
  end
  rect rgba(240,255,220,0.4)
    note right of Analyzer: Remove resolved incidents
    Analyzer->>IncidentsCache: Delete(path)
  end
  rect rgba(255,240,220,0.4)
    note right of Analyzer: Build rulesets from cached entries
    Analyzer->>IncidentsCache: Entries()
    IncidentsCache-->>Analyzer: map[path][]CacheValue
    Analyzer->>RulesetBuilder: Build from CacheValue fields
    RulesetBuilder-->>Analyzer: Rulesets
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks (3 passed, 2 warnings)

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Out of Scope Changes Check | ⚠️ Warning | The PR includes changes that appear unrelated to the linked issue: server.go's Server.Accept now panics on analyzer initialization failure which changes startup error handling semantics and is not required to fix cache path normalization, and several internal types were made public (IncidentsCache, CacheValue, NewIncidentsCache) which expand the public API surface beyond the bug fix. These modifications increase risk and should be separated or justified as part of the same change. | Revert or move the server.go panic change and any unrelated API-export changes into a separate PR, or provide a short rationale and add compatibility tests; limit this PR to the cache normalization and add tests that demonstrate the fix for issue #863. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title directly describes the primary change — normalizing cache paths — and matches the modifications in cache.go and analyzer.go that introduce path normalization and a normalized incidents cache; it is short and focused. However the title contains an emoji which is unnecessary noise for PR history. It is not vague or unrelated to the changeset. |
| Linked Issues Check | ✅ Passed | The changes implement a normalized incidents cache (normalizePath) and replace the raw map with an IncidentsCache API used by the analyzer so that cache keys are canonicalized and deleted via normalized paths; this addresses the root cause described in issue #863 where path/casing differences prevented cache invalidation during partial analysis. The normalization logic (filepath.Clean, uppercasing volume name, ToSlash) specifically targets Windows-style path differences referenced in the issue. Based on the provided summaries, the code-level changes align with and should resolve the linked issue's objective. |

Poem

I nibble at paths, tidy each track,
Backslashes to slashes — no crumbs come back.
New cache bins store each incident bright,
Resolved ones vanish, rulesets reunite.
If startup trips, I thump — panic in sight! 🐇✨


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
kai_analyzer_rpc/pkg/service/cache.go (2)

72-74: Entries exposes internal map allowing external mutation.

Returning the backing map risks accidental modification outside cache’s lock discipline.

Return a shallow copy:

-func (i *incidentsCache) Entries() map[string][]CacheValue {
-	return i.cache
-}
+func (i *incidentsCache) Entries() map[string][]CacheValue {
+	out := make(map[string][]CacheValue, len(i.cache))
+	for k, v := range i.cache {
+		cpy := make([]CacheValue, len(v))
+		copy(cpy, v)
+		out[k] = cpy
+	}
+	return out
+}
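
The aliasing risk behind this nitpick can be reproduced in isolation. The following is a minimal sketch rather than the PR's code: `cacheValue` and `cloneEntries` are hypothetical stand-ins for `CacheValue` and the copy loop proposed for `Entries()`. Note that `copy` duplicates each struct by value, so any pointer or slice fields inside the real `CacheValue` would still be shared.

```go
package main

import "fmt"

// cacheValue is a hypothetical stand-in for the PR's CacheValue struct.
type cacheValue struct {
	Rule string
}

// cloneEntries copies the map and every slice in it, so callers cannot
// change the original map or its slices through the returned value.
func cloneEntries(in map[string][]cacheValue) map[string][]cacheValue {
	out := make(map[string][]cacheValue, len(in))
	for k, v := range in {
		cp := make([]cacheValue, len(v))
		copy(cp, v)
		out[k] = cp
	}
	return out
}

func main() {
	cache := map[string][]cacheValue{"a.java": {{Rule: "r1"}}}

	snapshot := cloneEntries(cache)
	snapshot["a.java"][0].Rule = "changed" // touches the copied slice only
	delete(snapshot, "a.java")             // removes the key from the copy only

	fmt.Println(len(cache), cache["a.java"][0].Rule, len(snapshot))
}
```

Had `Entries()` returned the backing map directly, both the `delete` and the field write would have changed the cache itself.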

64-70: Keys() is unreachable from consumers (not in the interface).

NewIncidentsCache returns IncidentsCache, which lacks Keys(). Either expose it via the interface or remove to avoid dead API.

If unused, remove:

-func (i *incidentsCache) Keys() []string {
-	keys := make([]string, 0, len(i.cache))
-	for k := range i.cache {
-		keys = append(keys, k)
-	}
-	return keys
-}
kai_analyzer_rpc/pkg/service/analyzer.go (3)

266-268: Fix logr key/value usage.

Info requires key/value pairs; passing just a.cache.Len() results in malformed logs.

-		a.Logger.Info("Current cache len", a.cache.Len())
+		a.Logger.Info("Current cache len", "len", a.cache.Len())

351-369: Reduce noisy debug logging and avoid logging large incident payloads at info level.

Logging full incidents can be heavy; move to high verbosity or summarize.

-				a.Logger.Info("here update cache incident", "incident", i)
+				a.Logger.V(8).Info("caching incident", "uri", i.URI, "rule", violationName)

413-419: Optional: return rulesets in stable order.

Maps iterate in random order; sorting improves determinism across runs.

 	r := []konveyor.RuleSet{}
 	for _, ruleset := range ruleSetMap {
 		r = append(r, ruleset)
 	}
-	a.Logger.V(8).Info("ruleset from cache", "rulesets", r)
+	sort.SliceStable(r, func(i, j int) bool { return r[i].Name < r[j].Name })
+	a.Logger.V(8).Info("ruleset from cache", "rulesets", r)
 	return r
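
A self-contained sketch of the ordering idea, assuming only that each ruleset has a `Name` field (`ruleSet` below is a hypothetical stand-in for `konveyor.RuleSet`). One caveat worth keeping in mind: because the slice is filled from a map in random order, a stable sort alone does not make ties deterministic, so names would need to be unique or a tiebreaker added.

```go
package main

import (
	"fmt"
	"sort"
)

// ruleSet is a hypothetical stand-in for konveyor.RuleSet; only Name is used.
type ruleSet struct {
	Name string
}

// sortedRuleSets flattens the map into a slice ordered by Name so that
// repeated calls return the same sequence regardless of map iteration order.
func sortedRuleSets(m map[string]ruleSet) []ruleSet {
	out := make([]ruleSet, 0, len(m))
	for _, rs := range m {
		out = append(out, rs)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	m := map[string]ruleSet{
		"b": {Name: "quarkus"},
		"a": {Name: "eap"},
		"c": {Name: "openjdk"},
	}
	for _, rs := range sortedRuleSets(m) {
		fmt.Println(rs.Name)
	}
}
```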
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 830ead0 and d5279c7.

📒 Files selected for processing (4)
  • kai_analyzer_rpc/pkg/rpc/server.go (1 hunks)
  • kai_analyzer_rpc/pkg/service/analyzer.go (8 hunks)
  • kai_analyzer_rpc/pkg/service/cache.go (1 hunks)
  • kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1)
kai_analyzer_rpc/pkg/service/cache.go (1)
  • NewIncidentsCache (26-31)
kai_analyzer_rpc/pkg/service/analyzer.go (1)
kai_analyzer_rpc/pkg/service/cache.go (3)
  • IncidentsCache (11-17)
  • NewIncidentsCache (26-31)
  • CacheValue (19-24)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Trunk Check Runner
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (3)
kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1)

82-83: Cache initialization switch LGTM.

Using NewIncidentsCache(l) aligns this constructor with the new cache abstraction.

kai_analyzer_rpc/pkg/service/analyzer.go (2)

63-64: Type switch to IncidentsCache LGTM.

The abstraction improves encapsulation and enables path normalization in one place.


178-179: Cache construction LGTM.

Creating via NewIncidentsCache(log) ensures consistent normalization and logging.

Comment on lines 374 to 379
 func (a *analyzer) invalidateCachePerFile(paths []string) {
 	for _, p := range paths {
 		a.Logger.Info("deleting cache entry for path", "path", p)
-		delete(a.cache, p)
+		a.cache.Delete(p)
 	}
 }
Contributor

@coderabbitai coderabbitai bot Sep 10, 2025

🛠️ Refactor suggestion

Normalize invalidate paths to analyzer root before deletion.

If clients send relative paths, deletion may miss absolute keys. Normalize to absolute (relative to a.location) before Delete.

 func (a *analyzer) invalidateCachePerFile(paths []string) {
 	for _, p := range paths {
-		a.Logger.Info("deleting cache entry for path", "path", p)
-		a.cache.Delete(p)
+		norm := p
+		if !filepath.IsAbs(norm) {
+			norm = filepath.Join(a.location, norm)
+		}
+		norm = filepath.Clean(norm)
+		a.Logger.Info("deleting cache entry for path", "path", p, "normalized", norm)
+		a.cache.Delete(norm)
 	}
 }
🤖 Prompt for AI Agents
In kai_analyzer_rpc/pkg/service/analyzer.go around lines 374–379, normalize any
relative paths to the analyzer root before deleting cache entries: for each
input path, if it is not absolute use filepath.Join(a.location, path) (or
filepath.Abs if you prefer) then filepath.Clean the result to produce the
normalized key, log both original and normalized path, and call
a.cache.Delete(normalizedPath); if resolving to absolute can fail, handle/log
the error and skip deletion for that entry.

Contributor Author

@coderabbitai this is not needed as we never get filepaths outside workspace from the analyzer

Contributor

@pranavgaikwad, thank you for the clarification! I understand that the analyzer ensures all file paths are consistently within the workspace, making the normalization step unnecessary.



✏️ Learnings added
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.571Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
kai_analyzer_rpc/pkg/service/cache.go (1)

76-84: Normalize Windows drive letters consistently and fix the misleading comment

The comment says “lowercase” but the code uppercases the volume; also make the normalization deterministic on Windows drives. This directly impacts cache key equality across runs on Windows.

Apply:

@@
-import (
-	"path/filepath"
-	"strings"
-
-	"github.com/go-logr/logr"
-	"github.com/konveyor/analyzer-lsp/output/v1/konveyor"
-)
+import (
+	"path/filepath"
+	"runtime"
+	"strings"
+
+	"github.com/go-logr/logr"
+	"github.com/konveyor/analyzer-lsp/output/v1/konveyor"
+)
@@
-func normalizePath(path string) string {
-	cleanedPath := filepath.Clean(path)
-	volumeName := filepath.VolumeName(cleanedPath)
-	// make sure all volume names are lowercase
-	if volumeName != "" {
-		cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
-	}
-	return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
+func normalizePath(p string) string {
+	cleaned := filepath.Clean(p)
+	vol := filepath.VolumeName(cleaned)
+	// Normalize Windows drive letters to lowercase for stable keys.
+	if vol != "" && runtime.GOOS == "windows" {
+		cleaned = strings.ToLower(vol) + cleaned[len(vol):]
+	}
+	// Normalize separators to forward slashes.
+	return filepath.ToSlash(cleaned)
 }

Also applies to: 3-9

🧹 Nitpick comments (2)
kai_analyzer_rpc/pkg/service/cache.go (2)

72-74: Avoid leaking internal state from Entries()

Returning the map exposes internal slices to mutation by callers. Return a defensive copy.

 func (i *incidentsCache) Entries() map[string][]CacheValue {
-	return i.cache
+	out := make(map[string][]CacheValue, len(i.cache))
+	for k, v := range i.cache {
+		cp := make([]CacheValue, len(v))
+		copy(cp, v)
+		out[k] = cp
+	}
+	return out
 }

11-17: Align Keys() with the interface (or make it private)

Constructor returns IncidentsCache; Keys() isn’t on the interface, so consumers can’t use it without type assertions.

Option A (expose via interface):

 type IncidentsCache interface {
 	Get(path string) ([]CacheValue, bool)
 	Add(path string, value CacheValue)
 	Delete(path string)
 	Len() int
 	Entries() map[string][]CacheValue
+	Keys() []string
 }

Option B (make it private if internal-only):

-func (i *incidentsCache) Keys() []string {
+func (i *incidentsCache) keys() []string {

Also applies to: 64-70

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5279c7 and 7893237.

📒 Files selected for processing (1)
  • kai_analyzer_rpc/pkg/service/cache.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (1)
kai_analyzer_rpc/pkg/service/cache.go (1)

42-47: Action: confirm concurrent access to incidentsCache — either document single-threaded use or add synchronization

Repository search didn't find callers/constructor for incidentsCache, so concurrent use couldn't be verified. If the cache can be accessed from multiple goroutines, protect the map with a sync.RWMutex and return copies from Get; otherwise document the single-threaded assumption.

Location: kai_analyzer_rpc/pkg/service/cache.go — methods: Len, Get, Add, Delete.

Suggested changes (minimal):

@@
-import (
+import (
 	"path/filepath"
+	"sync"
@@
 type incidentsCache struct {
 	cache  map[string][]CacheValue
 	logger logr.Logger
+	mu     sync.RWMutex
 }
@@
 func (i *incidentsCache) Len() int {
-	return len(i.cache)
+	i.mu.RLock()
+	defer i.mu.RUnlock()
+	return len(i.cache)
 }
@@
 func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
 	normalizedPath := normalizePath(path)
 	i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
-	val, ok := i.cache[normalizedPath]
-	return val, ok
+	i.mu.RLock()
+	defer i.mu.RUnlock()
+	val, ok := i.cache[normalizedPath]
+	if !ok {
+		return nil, false
+	}
+	cp := make([]CacheValue, len(val))
+	copy(cp, val)
+	return cp, true
 }
@@
 func (i *incidentsCache) Add(path string, value CacheValue) {
 	normalizedPath := normalizePath(path)
 	i.logger.V(8).Info("adding cache entry for path", "path", path, "normalizedPath", normalizedPath)
-	if _, ok := i.cache[normalizedPath]; !ok {
-		i.cache[normalizedPath] = []CacheValue{}
-	}
-	i.cache[normalizedPath] = append(i.cache[normalizedPath], value)
+	i.mu.Lock()
+	defer i.mu.Unlock()
+	i.cache[normalizedPath] = append(i.cache[normalizedPath], value)
 }
@@
 func (i *incidentsCache) Delete(path string) {
 	normalizedPath := normalizePath(path)
 	i.logger.V(8).Info("deleting cache entry for path", "path", path, "normalizedPath", normalizedPath)
-	delete(i.cache, normalizedPath)
+	i.mu.Lock()
+	defer i.mu.Unlock()
+	delete(i.cache, normalizedPath)
 }

pranavgaikwad added a commit to konveyor/editor-extensions that referenced this pull request Sep 11, 2025
Fixes #800 

Depends on konveyor/kai#864


## Summary by CodeRabbit

* **Bug Fixes**
  * Improved file path normalization for analysis, ensuring included and excluded paths are matched consistently across platforms.
  * Fixed Windows-specific issues by normalizing paths and drive-letter casing, reducing missed or wrongly included files.
  * More reliable adherence to ignore settings in workspaces.
  * Reduced sporadic analysis discrepancies after file moves or renames caused by inconsistent path formats.

Signed-off-by: Pranav Gaikwad <[email protected]>
Contributor

@shawn-hurley shawn-hurley left a comment

I am thinking that we should consider a mutex or something so the cache can be used across multiple connections to the server and requests for the same project

return keys
}

func (i *incidentsCache) Entries() map[string][]CacheValue {
Contributor

if we do end up adding a sync, then we probably want to do something here

Contributor Author

Now making sure that a clone is returned and not the original map.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
kai_analyzer_rpc/pkg/service/cache.go (2)

83-94: Good: Entries() deep-clones to avoid sharing internal state.

This addresses prior feedback about synchronization and exposure. Nice.


35-39: Concurrency approach looks good; mutex guards the map.

Addresses earlier concern about concurrent access to the cache.

🧹 Nitpick comments (2)
kai_analyzer_rpc/pkg/service/cache.go (2)

27-33: Nit: zero-value RWMutex is ready; explicit init is unnecessary.

Small cleanup; no behavior change.

 func NewIncidentsCache(logger logr.Logger) IncidentsCache {
   return &incidentsCache{
     cache:  map[string][]CacheValue{},
     logger: logger,
-    mutex:  sync.RWMutex{},
   }
 }

73-81: Optional: sort Keys() for deterministic iteration/logs.

Useful for stable tests and diagnostics.

+// add: import "sort"
 func (i *incidentsCache) Keys() []string {
   i.mutex.RLock()
   defer i.mutex.RUnlock()
   keys := make([]string, 0, len(i.cache))
   for k := range i.cache {
     keys = append(keys, k)
   }
-  return keys
+  sort.Strings(keys)
+  return keys
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7893237 and 07a2e1c.

📒 Files selected for processing (3)
  • kai_analyzer_rpc/pkg/service/analyzer.go (6 hunks)
  • kai_analyzer_rpc/pkg/service/cache.go (1 hunks)
  • kai_analyzer_rpc/pkg/service/pipe_analyzer.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • kai_analyzer_rpc/pkg/service/pipe_analyzer.go
  • kai_analyzer_rpc/pkg/service/analyzer.go
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.
📚 Learning: 2025-09-10T22:52:55.595Z
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.

Applied to files:

  • kai_analyzer_rpc/pkg/service/cache.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)

Comment on lines 45 to 52
func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
	i.mutex.RLock()
	defer i.mutex.RUnlock()
	normalizedPath := normalizePath(path)
	i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
	val, ok := i.cache[normalizedPath]
	return val, ok
}
Contributor

@coderabbitai coderabbitai bot Sep 11, 2025

🛠️ Refactor suggestion

Return a cloned slice from Get() to avoid external mutation/aliasing.

Exposing the internal slice risks accidental mutation and races. Clone like Entries() does.

 func (i *incidentsCache) Get(path string) ([]CacheValue, bool) {
   i.mutex.RLock()
   defer i.mutex.RUnlock()
   normalizedPath := normalizePath(path)
   i.logger.V(8).Info("getting cache entry for path", "path", path, "normalizedPath", normalizedPath)
-  val, ok := i.cache[normalizedPath]
-  return val, ok
+  val, ok := i.cache[normalizedPath]
+  if !ok {
+    return nil, false
+  }
+  out := make([]CacheValue, len(val))
+  copy(out, val)
+  return out, true
 }
🤖 Prompt for AI Agents
In kai_analyzer_rpc/pkg/service/cache.go around lines 45 to 52, Get currently
returns the internal slice from the cache which allows callers to mutate shared
state; change Get to return a shallow-cloned slice copy (e.g., allocate a new
slice with len(val) and call copy(newSlice, val)) before returning so callers
receive an independent slice; preserve the existing ok boolean, handle
nil/zero-length slices (return nil or empty slice consistently) and keep the
RLock/RUnlock as-is to protect access while copying.

Contributor

I think that the computer might be correct here, the list will be a pointer as well

Contributor

@shawn-hurley You're absolutely right! The slice in Go is indeed a reference type, so returning it directly from the cache exposes the internal data structure to potential mutation by callers.

The CodeRabbit suggestion to clone the slice before returning is the correct approach here - it prevents external code from accidentally (or intentionally) modifying the cached data, which could lead to data races and cache integrity issues.

The suggested implementation follows the same defensive copying pattern already used in the Entries() method, making the codebase consistent in its approach to preventing aliasing of cached data.

This is a good catch for maintaining thread safety and data integrity in the cache implementation.
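
To make the aliasing point concrete: assigning or returning a slice copies only its header (pointer, length, capacity), so both values keep pointing at the same backing array. The following is a small illustrative sketch; `sharesBackingArray` is a hypothetical helper written for this example, not part of the PR.

```go
package main

import "fmt"

// sharesBackingArray reports whether a write through alias is visible in
// base. It restores the element it touches before returning.
func sharesBackingArray(base, alias []string) bool {
	if len(base) == 0 || len(alias) == 0 {
		return false
	}
	old := alias[0]
	alias[0] = old + "!"
	shared := base[0] == alias[0]
	alias[0] = old
	return shared
}

func main() {
	cached := []string{"incident-1", "incident-2"}

	direct := cached                           // copies the slice header only
	cloned := append([]string(nil), cached...) // allocates a new backing array

	fmt.Println(sharesBackingArray(cached, direct)) // true
	fmt.Println(sharesBackingArray(cached, cloned)) // false
}
```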


🧠 Learnings used
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.

Comment on lines +96 to +104
func normalizePath(path string) string {
	cleanedPath := filepath.Clean(path)
	volumeName := filepath.VolumeName(cleanedPath)
	// make sure all volume names are uppercase
	if volumeName != "" {
		cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
	}
	return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
}
Contributor

💡 Verification agent

🧩 Analysis chain

Make normalizer cross-OS and handle empty input; canonicalize Windows drive letter even on non-Windows.

When the server runs on Linux but receives Windows-style paths (from the VS Code extension on Windows), filepath.VolumeName returns an empty string, so the drive letter is never detected. Canonicalize the drive letter by pattern after ToSlash, and avoid turning "" into ".".

-func normalizePath(path string) string {
-	cleanedPath := filepath.Clean(path)
-	volumeName := filepath.VolumeName(cleanedPath)
-	// make sure all volume names are uppercase
-	if volumeName != "" {
-		cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
-	}
-	return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
-}
+func normalizePath(p string) string {
+	if p == "" {
+		return ""
+	}
+	s := filepath.ToSlash(filepath.Clean(p))
+	// If path looks like "C:/..." (Windows-style), normalize drive letter case on all OSes.
+	if len(s) >= 2 && s[1] == ':' {
+		s = strings.ToUpper(s[:1]) + s[1:]
+	}
+	return s
+}

Given the linked issue was observed on Windows 11 with a likely Linux analyzer, please confirm you see identical keys for inputs like:

  • "c:\repo\src\Foo.java" vs "C:\repo\src\Foo.java"
  • "//server/share/dir/file.java" vs "\\server\share\dir\file.java"

If you’d like, I can add a small table-driven test for normalizePath to lock this in.


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo "---- find occurrences of normalizePath ----"
rg -n --hidden -S '\bfunc\s+normalizePath\b|\bnormalizePath\b' || true

echo "---- show kai_analyzer_rpc/pkg/service/cache.go if present ----"
if [ -f "kai_analyzer_rpc/pkg/service/cache.go" ]; then
  sed -n '1,300p' kai_analyzer_rpc/pkg/service/cache.go || true
else
  echo "File not found at kai_analyzer_rpc/pkg/service/cache.go"
  echo "Searching for cache.go files..."
  rg -n --hidden -S 'cache.go' || true
fi

echo "---- search repo for VolumeName, ToSlash, filepath.Clean ----"
rg -n --hidden -S 'filepath\.VolumeName|VolumeName\(|filepath\.ToSlash|ToSlash\(|filepath\.Clean' || true

echo "---- show analyzer.go if present ----"
if [ -f "kai_analyzer_rpc/pkg/service/analyzer.go" ]; then
  sed -n '1,400p' kai_analyzer_rpc/pkg/service/analyzer.go || true
else
  echo "analyzer.go not found at expected path; searching for analyzer.go..."
  rg -n --hidden -S 'analyzer.go' || true
fi

echo "---- search for other normalization helpers (Normalize/NormalizePath/normalpath) ----"
rg -n --hidden -S '\bNormalize\b|\bNormalizePath\b|\bnormalpath\b' || true

echo "---- list files in kai_analyzer_rpc/pkg/service (if exists) ----"
ls -la kai_analyzer_rpc/pkg/service || true

Length of output: 17520


Make normalizePath cross-OS and handle empty input (kai_analyzer_rpc/pkg/service/cache.go:96-104)

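filepath.VolumeName only recognizes volumes on Windows; on other platforms it returns "", so a Windows-style input like "c:\repo\..." won't have its drive letter uppercased. In addition, filepath.Clean("") returns ".". Canonicalize after ToSlash and return "" for empty input to avoid cache-key mismatches. Note that filepath.ToSlash only rewrites the host OS separator, so on Linux it leaves backslashes untouched; matching "\\server\share" against "//server/share" there requires replacing backslashes explicitly.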

-func normalizePath(path string) string {
-	cleanedPath := filepath.Clean(path)
-	volumeName := filepath.VolumeName(cleanedPath)
-	// make sure all volume names are uppercase
-	if volumeName != "" {
-		cleanedPath = strings.ToUpper(volumeName) + cleanedPath[len(volumeName):]
-	}
-	return filepath.ToSlash(cleanedPath) // prevent forward / backward slashes becoming a problem
-}
+func normalizePath(p string) string {
+	if p == "" {
+		return ""
+	}
+	s := filepath.ToSlash(filepath.Clean(p))
+	// If path looks like "C:/..." (Windows-style), normalize drive letter case on all OSes.
+	if len(s) >= 2 && s[1] == ':' {
+		s = strings.ToUpper(s[:1]) + s[1:]
+	}
+	return s
+}

Saved learning used: analyzer is reported to receive workspace-formatted paths; if that invariant is guaranteed, skip this change — otherwise apply the patch and add table-driven tests for:

  • "c:\repo\src\Foo.java" vs "C:\repo\src\Foo.java"
  • "\\server\share\dir\file.java" vs "//server/share/dir/file.java"
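
One way the table-driven cases above could be made to pass on any host OS is sketched below. It is an assumption-laden illustration, not the PR's code: `normalizePathPortable` is a hypothetical name, it swaps `path/filepath` for the OS-independent `path` package plus an explicit backslash replacement, and it assumes backslashes in incoming paths are always separators (which is not true for arbitrary POSIX file names).

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// normalizePathPortable is a hypothetical, OS-independent normalizer.
// Assumption: every backslash in the input is a path separator.
func normalizePathPortable(p string) string {
	if p == "" {
		return "" // avoid Clean("") == "."
	}
	s := strings.ReplaceAll(p, `\`, "/")
	// path.Clean collapses "//" to "/", so remember UNC-style prefixes.
	unc := strings.HasPrefix(s, "//")
	s = path.Clean(s)
	if unc {
		s = "/" + s
	}
	// Uppercase a Windows-style drive letter such as "c:/...".
	if len(s) >= 2 && s[1] == ':' {
		s = strings.ToUpper(s[:1]) + s[1:]
	}
	return s
}

func main() {
	cases := []struct{ a, b string }{
		{`c:\repo\src\Foo.java`, `C:\repo\src\Foo.java`},
		{`\\server\share\dir\file.java`, `//server/share/dir/file.java`},
	}
	for _, tc := range cases {
		na, nb := normalizePathPortable(tc.a), normalizePathPortable(tc.b)
		fmt.Println(na == nb, na)
	}
}
```

Both pairs normalize to the same key here ("C:/repo/src/Foo.java" and "//server/share/dir/file.java"), regardless of the OS the analyzer runs on.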
🤖 Prompt for AI Agents
In kai_analyzer_rpc/pkg/service/cache.go around lines 96 to 104, normalizePath
must be made cross-OS and handle empty input: first convert path separators with
filepath.ToSlash, return "" if the resulting string is empty or "."; then
canonicalize Windows drives and UNC paths by detecting Windows-style patterns on
any OS (e.g. regex for drive letter like ^[a-zA-Z]:/ and UNC like
^//[^/]+/[^/]+) and uppercase only the drive letter (for "c:/..." → "C:/..."),
preserving leading double slashes for UNC; finally return the canonical ToSlash
result. Add table-driven tests for the cases "c:\\repo\\src\\Foo.java" vs
"C:\\repo\\src\\Foo.java" and "\\\\server\\share\\dir\\file.java" vs
"//server/share/dir/file.java".

Signed-off-by: Pranav Gaikwad <[email protected]>
Signed-off-by: Pranav Gaikwad <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
kai_analyzer_rpc/pkg/service/analyzer.go (1)

367-371: No analyzer-side path normalization needed here.

Sticking to raw p is consistent with earlier discussion that analyzer only receives workspace paths; the cache layer now normalizes. This resolves the prior suggestion about normalizing in invalidateCachePerFile.

🧹 Nitpick comments (5)
kai_analyzer_rpc/pkg/service/analyzer.go (5)

264-265: Fix structured logging key/value usage.

logr expects alternating key/value arguments after the message, with string keys, so the argument count must be even. This call passes only a value with no key.

Apply:

- a.Logger.Info("Current cache len", a.cache.Len())
+ a.Logger.Info("Current cache len", "len", a.cache.Len())

299-300: Reduce noisy high-volume logging.

Dumping full rulesets at V(8) can be very large; logging the count is usually enough.

Apply:

- a.Logger.V(8).Info("got rulesets", "rulesets", rulesets)
+ a.Logger.V(8).Info("got rulesets", "count", len(rulesets))

333-338: Minor: simplify nil-check.

updateCache is only called when len(args.IncludedPaths) > 0; the includedPaths != nil guard is redundant.

Apply:

- if includedPaths != nil {
-   a.invalidateCachePerFile(includedPaths)
- }
+ a.invalidateCachePerFile(includedPaths)

374-409: Return deterministic, stable ordering from cache results.

Entries() already returns a cloned snapshot (kai_analyzer_rpc/pkg/service/cache.go: Entries), so apply the sort/alloc/log change only.

- r := []konveyor.RuleSet{}
+ r := make([]konveyor.RuleSet, 0, len(ruleSetMap))
 for _, ruleset := range ruleSetMap {
   r = append(r, ruleset)
 }
- a.Logger.V(8).Info("ruleset from cache", "rulesets", r)
+ sort.SliceStable(r, func(i, j int) bool { return r[i].Name < r[j].Name })
+ a.Logger.V(8).Info("ruleset from cache", "count", len(r))
 return r
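
Go randomizes map iteration order, which is why the unsorted slice can differ between calls. A minimal sketch of the suggested pattern follows; `ruleSet` is a hypothetical stand-in for konveyor.RuleSet (only a `Name` field is assumed) and `sortedRuleSets` is an illustrative helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// ruleSet is a hypothetical stand-in for konveyor.RuleSet.
type ruleSet struct {
	Name string
}

// sortedRuleSets flattens the map into a slice and orders it by Name,
// so repeated calls return the same sequence.
func sortedRuleSets(m map[string]ruleSet) []ruleSet {
	r := make([]ruleSet, 0, len(m)) // preallocate: final size is known
	for _, rs := range m {
		r = append(r, rs)
	}
	sort.SliceStable(r, func(i, j int) bool { return r[i].Name < r[j].Name })
	return r
}

func main() {
	m := map[string]ruleSet{
		"b": {Name: "b"},
		"a": {Name: "a"},
		"c": {Name: "c"},
	}
	for _, rs := range sortedRuleSets(m) {
		fmt.Println(rs.Name)
	}
}
```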

344-362: Demote noisy log and confirm cache-key semantics.

File: kai_analyzer_rpc/pkg/service/analyzer.go (lines 344–362)

  • Replace the noisy info log with a verbose path-only log:
- a.Logger.Info("here update cache incident", "incident", i)
+ a.Logger.V(8).Info("updating cache incident", "path", i.URI.Filename())
  • Verified: kai_analyzer_rpc/pkg/service/cache.go::normalizePath calls filepath.Clean, uppercases the volume name (Windows), and applies filepath.ToSlash; Add/Get/Delete normalize keys internally, so using i.URI.Filename() as the cache key is acceptable.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 299ecf0 and 6b28be5.

📒 Files selected for processing (1)
  • kai_analyzer_rpc/pkg/service/analyzer.go (6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: pranavgaikwad
PR: konveyor/kai#864
File: kai_analyzer_rpc/pkg/service/analyzer.go:374-379
Timestamp: 2025-09-10T22:52:55.595Z
Learning: In kai_analyzer_rpc/pkg/service/analyzer.go, the analyzer only receives file paths from within the workspace, so path normalization before cache operations is not needed as paths are already consistently formatted.

Applied to files:

  • kai_analyzer_rpc/pkg/service/analyzer.go
🧬 Code graph analysis (1)
kai_analyzer_rpc/pkg/service/analyzer.go (1)
kai_analyzer_rpc/pkg/service/cache.go (3)
  • IncidentsCache (12-18)
  • NewIncidentsCache (27-33)
  • CacheValue (20-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run e2e test (macos-latest, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (macos-13, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-22.04-arm, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (windows-latest, cmd, ChatOpenAI, kai-test-generation)
  • GitHub Check: Run e2e test (ubuntu-24.04, bash, ChatOpenAI, kai-test-generation)
  • GitHub Check: Check Types on PR Runner (ubuntu-latest, 3.12, x64)
🔇 Additional comments (3)
kai_analyzer_rpc/pkg/service/analyzer.go (3)

63-64: Good swap to a dedicated cache interface.

Moving from a map to IncidentsCache improves encapsulation, concurrency safety, and (per PR) enables path normalization at the cache boundary.


177-178: Proper cache initialization.

Initializing with NewIncidentsCache(log) ensures a clean cache per analyzer instance and leverages internal normalization.


328-331: Full-run cache reset looks correct.

Rebuilding via setCache avoids stale entries and aligns with the fix for incidents lingering post-partial analysis.
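
The difference between a full-run reset and a per-file invalidation can be sketched as below. This is a simplified model under stated assumptions, not the analyzer's implementation: `incident`, `cache`, `fullUpdate`, and `partialUpdate` are hypothetical names, and the real cache also normalizes keys and guards access with a lock.

```go
package main

import "fmt"

// incident is a hypothetical, reduced form of a cached incident.
type incident struct {
	Path    string
	Message string
}

// cache maps a file path to the incidents last reported for it.
type cache map[string][]incident

// fullUpdate discards all prior state and rebuilds from the results.
func fullUpdate(results []incident) cache {
	c := cache{}
	for _, in := range results {
		c[in.Path] = append(c[in.Path], in)
	}
	return c
}

// partialUpdate first drops entries for every analyzed path, then adds
// what the partial analysis found. A file whose issues were all fixed
// yields no results, so its stale entries must be deleted explicitly.
func (c cache) partialUpdate(includedPaths []string, results []incident) {
	for _, p := range includedPaths {
		delete(c, p)
	}
	for _, in := range results {
		c[in.Path] = append(c[in.Path], in)
	}
}

func main() {
	c := fullUpdate([]incident{
		{Path: "/repo/A.java", Message: "use jakarta"},
		{Path: "/repo/B.java", Message: "use jakarta"},
	})
	// A.java was fixed, so re-analyzing it yields no incidents.
	c.partialUpdate([]string{"/repo/A.java"}, nil)
	fmt.Println(len(c["/repo/A.java"]), len(c["/repo/B.java"])) // prints "0 1"
}
```

If the delete step used a key spelled differently from the one used on insert, the stale entry would survive, which is the kind of mismatch key normalization at the cache boundary is meant to prevent.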

Development

Successfully merging this pull request may close these issues.

Partial analysis does not remove the incidents that have been addressed
