
SaiHPE · 2026-05-17T09:25:03Z

Creating PR to merge feat into main

Summary by CodeRabbit

New Features
- Live AI-powered incident dashboard with WebSocket updates, evidence panel, timeline charts, panel health UI, topology visualization, and interactive investigations
- Automated Text→SQL investigations, AI-generated RCA, and automated SQL self-heal for broken dashboard panels
- Demo scenarios and UI controls for onboarding/testing
Documentation
- Added frontend README with setup notes
Chores
- Updated ignore files and added example environment config (Azure OpenAI, DB, webhook, CORS)

coderabbitai · 2026-05-17T09:25:16Z

📝 Walkthrough

Greenfield full-stack InsightSQL: FastAPI backend with asyncpg/Postgres schema and seed; Azure OpenAI LLM integrations for Text‑to‑SQL, RCA, and SQL healing; WebSocket real‑time broadcasts; and a Vite+React dashboard with hooks, components, and demo scenarios.

Changes

InsightSQL Application

Layer / File(s)	Summary
Backend Foundation: Config, Database Schema, Initialization `backend/.env.example`, `backend/.python-version`, `backend/app/__init__.py`, `backend/app/config.py`, `backend/app/db/__init__.py`, `backend/app/db/engine.py`, `backend/app/db/schema.sql`, `backend/app/db/seed.py`, `backend/pyproject.toml`	Adds runtime settings (Azure OpenAI, DB, webhook key), pins Python 3.12, implements asyncpg pool lifecycle (init/get/close), defines the ops schema with partitions and helper, and seed routines populating mock topology, metrics, backups, and dashboard panels.
Data Ingestion & SQL Validation `backend/app/ingestion/normalizer.py`, `backend/app/ingestion/router.py`, `backend/app/validation/executor.py`, `backend/app/validation/sqlglot_gate.py`	Webhook handlers for alerts/metrics/events, normalizers that persist alerts/metrics/events into ops tables, SQLGlot AST gate enforcing SELECT-only and allowed schemas, and read-only EXPLAIN/execute helpers returning structured ExecutionResult.
LLM Client & Prompts `backend/app/agent/llm.py`, `backend/app/agent/prompts.py`	Async Azure OpenAI wrapper with deployment rotation and failover, plus prompt templates and a schema-map builder used to ground investigation, panel, heal, and RCA prompts.
Agent Workflows: Investigate, RCA, Healer `backend/app/agent/text_to_sql.py`, `backend/app/agent/rca.py`, `backend/app/agent/healer.py`	Text‑to‑SQL investigation loop with retry/validation/explain/execution and evidence persistence, RCA generator that formats evidence/events and parses JSON LLM output, and healer that generates, validates, shadow-executes, and atomically promotes healed panel SQL.
REST API Endpoints & WebSocket Infrastructure `backend/app/main.py`, `backend/app/ingestion/router.py`, `backend/app/incidents/router.py`, `backend/app/panels/router.py`, `backend/app/topology/router.py`, `backend/app/ws/manager.py`	FastAPI app with lifespan DB init/seed, webhook/incidents/panels/topology/demo routers, panel execute/heal/break endpoints, WebSocket endpoint and ConnectionManager for typed UTC-timestamped broadcasts, and health check.
Interactive Demo Scenario with Prescripted Workflows `backend/app/demo/prescripted.py`, `backend/app/demo/router.py`, `backend/app/demo/scenarios.py`	Prescripted queries/RCA replies and a 3-incident orchestrator (SAP slowdown, compute degradation, SQL self-heal) with start/reset/status control and DB reset+seed behavior.
Frontend Scaffolding: Configuration, Entry Point, Main App `frontend/package.json`, `frontend/index.html`, `frontend/vite.config.js`, `frontend/eslint.config.js`, `frontend/README.md`, `frontend/src/main.jsx`, `frontend/src/App.jsx`	Adds Vite+React project files, ESLint flat config, project README, React entrypoint, and App component wiring WebSocket and dashboard state hooks and composing dashboard sections.
Frontend State Management & WebSocket Hook `frontend/src/hooks/useWebSocket.js`, `frontend/src/hooks/useDashboardState.js`	useWebSocket manages connection/reconnect and lastMessage; useDashboardState provides a reducer handling incidents/metrics/agent steps/evidence/RCA/panels/topology and initial data loading from API.
Frontend Dashboard Components `frontend/src/components/*` (Header, IncidentBanner, MetricCards, TimelineChart, TopologyGraph, AIReasoningChain, EvidencePanel, RCANarrative, PanelHealth, DemoControl)	Adds UI components for status/header, incident banner with elapsed impact, KPI cards, ECharts timeline, D3 topology graph, AI reasoning chain, evidence list, RCA narrative, panel health with SQL diff, and demo control dock.
Frontend Stylesheet: Design System, Animations, Layout, Responsive `frontend/src/index.css`	Comprehensive stylesheet with font-face, design tokens, layout, component styles, SQL diff visuals, demo dock, animations, and responsive rules.

🎯 4 (Complex) | ⏱️ ~60 minutes

sequenceDiagram
  participant Browser
  participant API as FastAPI
  participant LLM as AzureOpenAI
  participant DB as PostgreSQL
  participant WS as WS Manager

  Browser->>API: POST /api/incidents/{id}/ask (question)
  API->>DB: build schema map, read incident
  API->>LLM: generate_sql(prompt with schema_map)
  LLM-->>API: SQL text
  API->>DB: explain_query / execute_readonly (validate & shadow-run)
  DB-->>API: plan / rows
  API->>DB: INSERT ops.evidence_runs
  API->>WS: broadcast(evidence_added)
  WS-->>Browser: realtime update

🐰 A rabbit in code, nibbling at bugs,
Spins schema maps and fixes SQL hugs,
LLM whispers a healed query fine,
Seeding demo scenes with a hop and a rhyme,
Dashboard hums — all realtime, sublime!

SaiHPE · 2026-05-17T09:32:38Z

@coderabbitai help whats the status?

coderabbitai

Actionable comments posted: 2

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (32)

frontend/src/hooks/useWebSocket.js-3-4 (1)

3-4: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid hardcoding the WebSocket endpoint to localhost.

This will fail outside local dev and can break under HTTPS due to mixed-content restrictions. Make the endpoint environment-driven (or protocol-aware) so it works across environments.

Suggested fix

-const WS_URL = 'ws://localhost:8000/ws';
+const WS_URL =
+  import.meta.env.VITE_WS_URL ??
+  `${window.location.protocol === 'https:' ? 'wss' : 'ws'}://${window.location.host}/ws`;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/hooks/useWebSocket.js` around lines 3 - 4, The WS_URL constant
is hardcoded to ws://localhost:8000/ws which breaks non-local and HTTPS
deployments; update the logic that defines WS_URL (and leave RECONNECT_DELAYS
unchanged) to read an environment-configurable URL (this is a Vite project, so
use import.meta.env.VITE_WS_URL rather than process.env) and/or derive the
scheme from window.location.protocol (use wss: when page is https:, ws:
otherwise) and window.location.host as a fallback so the socket endpoint is
environment-driven and protocol-aware; change the WS_URL declaration to prefer
the env var, then fall back to a constructed ws/wss URL using
window.location.protocol and host.

frontend/src/components/IncidentBanner.jsx-5-8 (1)

5-8: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset elapsed on incident switch and depend on a stable incident identity.

The timer currently doesn’t reset for a new incident, and using the whole incident object as dependency can cause unnecessary interval restarts.

Suggested fix

   useEffect(() => {
+    setElapsed(0);
     const t = setInterval(() => setElapsed(e => e + 1), 1000);
     return () => clearInterval(t);
-  }, [incident]);
+  }, [incident?.incident_id]);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/IncidentBanner.jsx` around lines 5 - 8, The useEffect
in IncidentBanner.jsx currently depends on the whole incident object and doesn't
reset the elapsed counter when a new incident appears; update the effect to
depend on a stable identity (e.g., incident?.incident_id, as in the suggested
fix) instead of incident, and inside the effect reset the state with
setElapsed(0) before creating the interval (keep the interval logic using
setInterval and clearInterval as-is). This ensures the timer restarts only when
the incident identity changes and avoids unnecessary interval restarts due to
object reference changes.

frontend/src/components/TimelineChart.jsx-4-4 (1)

4-4: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard metricsTimeline with a default to prevent undefined access.

Line 9 and Line 76 assume an array is always passed. During initial load this can throw.

Suggested fix

-export default function TimelineChart({ metricsTimeline }) {
+export default function TimelineChart({ metricsTimeline = [] }) {

Also applies to: 9-9, 76-76

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/TimelineChart.jsx` at line 4, The component
TimelineChart assumes metricsTimeline is always an array and can throw when it
is undefined; guard it so the reads of metricsTimeline at the noted locations
(lines 9 and 76) cannot crash: either change the function signature to accept a
default (metricsTimeline = []) or create a local const safeMetrics =
metricsTimeline || [] and use safeMetrics where the code currently uses
metricsTimeline.

frontend/src/components/PanelHealth.jsx-5-6 (1)

5-6: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a safe default for panels to avoid first-render crashes.

Line 16 and Line 56 assume panels is always defined. If data is still loading, this throws before the empty state can render.

Suggested fix

-export default function PanelHealth({ panels, healing }) {
+export default function PanelHealth({ panels = [], healing }) {

Also applies to: 16-16, 56-56

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/PanelHealth.jsx` around lines 5 - 6, PanelHealth can
crash on first render because panels may be undefined; give panels a safe
default and guard its usage: change the component signature to accept a default
(e.g., function PanelHealth({ panels = [], healing }) ) and/or replace direct
uses with safe access (e.g., panels?.map or panels.length checks). Keep the
existing healing handling (healingEntries = Object.values(healing || {})) but
ensure anywhere PanelHealth reads panels (iterations or indexing) uses the
default or optional chaining to avoid runtime errors.

frontend/src/index.css-162-162 (1)

162-162: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keyframe names violate configured kebab-case pattern.

slideIn and fadeIn should be renamed and their animation references updated to clear lint errors.

Suggested fix

 .banner {
   background: var(--crit-bg);
@@
-  animation: slideIn 0.35s ease;
+  animation: slide-in 0.35s ease;
 }
@@
-@keyframes slideIn {
+@keyframes slide-in {
   from { opacity: 0; transform: translateY(-8px); }
   to { opacity: 1; transform: translateY(0); }
 }
-@keyframes fadeIn {
+@keyframes fade-in {
   from { opacity: 0; }
   to { opacity: 1; }
 }
@@
-.anim-in { animation: fadeIn 0.3s ease; }
+.anim-in { animation: fade-in 0.3s ease; }

Also applies to: 645-645, 649-649, 657-657

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/index.css` at line 162, Keyframe names slideIn and fadeIn
violate kebab-case linting; rename the `@keyframes` declarations (e.g., slide-in
and fade-in) and update all animation property references (e.g., animation:
slideIn 0.35s ease) to use the new kebab-case identifiers (animation: slide-in
0.35s ease) so the `@keyframes` declarations and every use site match; ensure you
update both the declarations and uses of slideIn and fadeIn across the file
(including other occurrences noted) to eliminate the lint errors.

frontend/src/components/PanelHealth.jsx-35-40 (1)

35-40: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Error-only healing entries are currently dropped.

Line 36 returns early when SQL fields are missing, which also suppresses h.error rendering for those entries.

Suggested fix

-        {healingEntries.map(h => {
-          if (!h.old_sql && !h.new_sql) return null;
+        {healingEntries.map(h => {
+          if (!h.error && !h.old_sql && !h.new_sql) return null;
           return (
             <div key={h.panel_id} className="anim-in">
               {h.error && <div className="error-pill">Error: {h.error}</div>}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/PanelHealth.jsx` around lines 35 - 40, The current
map over healingEntries in PanelHealth.jsx returns null whenever both h.old_sql
and h.new_sql are falsy, which unintentionally drops entries that carry only
h.error; update the rendering logic in the healingEntries.map callback (the
block that checks h.old_sql and h.new_sql and returns early) to still render a
container when h.error exists even if SQL fields are missing—i.e., change the
early return to only skip when there is no h.error and no SQL, and ensure the
existing error-pill render (h.error && <div className="error-pill">...) can run
for entries lacking SQL.

frontend/src/components/TopologyGraph.jsx-65-65 (1)

65-65: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Dependency array is too narrow and can leave stale topology content.

Rebuild logic depends only on counts, so same-length updates (renames, rewired edges, replaced IDs) won’t refresh the graph.

Suggested fix

-  }, [topology.nodes.length, topology.edges.length]);
+  }, [topology.nodes, topology.edges]);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/TopologyGraph.jsx` at line 65, The effect in
TopologyGraph.jsx currently only depends on topology.nodes.length and
topology.edges.length which misses content-only changes; update the useEffect
dependency so it reacts to actual topology changes — for example replace
[topology.nodes.length, topology.edges.length] with a dependency that tracks
node/edge content such as [topology] or [JSON.stringify(topology.nodes),
JSON.stringify(topology.edges)] (or include topology.nodes and topology.edges
arrays directly) in the useEffect inside the TopologyGraph component so renames,
rewired edges or ID replacements correctly trigger a rebuild.

backend/app/panels/router.py-131-142 (1)

131-142: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap INSERT and UPDATE operations in an explicit transaction to ensure atomic active-version switch.

The two separate conn.execute() calls are not atomic: if the INSERT succeeds and the UPDATE then fails, the panel is left with two active versions. Wrap both operations in async with conn.transaction(): so they commit or roll back together, and consider including the final UPDATE ops.dashboard_panels in the same transaction for consistency.

Suggested change

-        # Insert broken version
-        new_version = row["version_no"] + 1
-        await conn.execute(
-            """INSERT INTO ops.panel_query_versions
-               (panel_id, version_no, sql_text, generated_by, is_active)
-               VALUES ($1, $2, $3, 'human', true)""",
-            panel_id, new_version, broken_sql,
-        )
-        # Deactivate old version
-        await conn.execute(
-            """UPDATE ops.panel_query_versions SET is_active = false
-               WHERE panel_id = $1 AND version_no = $2""",
-            panel_id, row["version_no"],
-        )
+        new_version = row["version_no"] + 1
+        async with conn.transaction():
+            await conn.execute(
+                """UPDATE ops.panel_query_versions SET is_active = false
+                   WHERE panel_id = $1 AND version_no = $2""",
+                panel_id, row["version_no"],
+            )
+            await conn.execute(
+                """INSERT INTO ops.panel_query_versions
+                   (panel_id, version_no, sql_text, generated_by, is_active)
+                   VALUES ($1, $2, $3, 'human', true)""",
+                panel_id, new_version, broken_sql,
+            )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/panels/router.py` around lines 131 - 142, The INSERT into
ops.panel_query_versions and the subsequent UPDATE that deactivates the old
version must be executed inside a single database transaction to avoid transient
inconsistent state; wrap the two conn.execute(...) calls (and also the final
UPDATE ops.dashboard_panels if present in the same flow) in an async with
conn.transaction(): block so both the INSERT of the new version and the UPDATE
of the old version are atomic and will roll back together on error, locating the
changes around the existing calls to conn.execute(...) that insert into
ops.panel_query_versions and update ops.panel_query_versions (and the UPDATE
ops.dashboard_panels statement) in router.py.
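
The rollback behaviour this comment relies on can be illustrated with a minimal, standalone sketch. It deliberately swaps in the standard-library sqlite3 module for asyncpg/Postgres so it runs without a database server; the trimmed table layout, the helper names (make_db, active_versions, switch_active_version), and the fail_after_deactivate flag are hypothetical and exist only for the demonstration.

```python
import sqlite3


def make_db():
    """Create an in-memory table with one active version for panel 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE panel_query_versions "
        "(panel_id INTEGER, version_no INTEGER, sql_text TEXT, is_active INTEGER)"
    )
    conn.execute("INSERT INTO panel_query_versions VALUES (1, 1, 'SELECT 1', 1)")
    conn.commit()
    return conn


def active_versions(conn, panel_id):
    """Return the version numbers currently flagged active for a panel."""
    rows = conn.execute(
        "SELECT version_no FROM panel_query_versions "
        "WHERE panel_id = ? AND is_active = 1 ORDER BY version_no",
        (panel_id,),
    )
    return [r[0] for r in rows]


def switch_active_version(conn, panel_id, new_sql, fail_after_deactivate=False):
    """Deactivate the current version and insert the next one atomically."""
    row = conn.execute(
        "SELECT version_no FROM panel_query_versions "
        "WHERE panel_id = ? AND is_active = 1",
        (panel_id,),
    ).fetchone()
    if row is None:
        raise LookupError("no active version")
    # `with conn:` commits on success and rolls back if the body raises.
    with conn:
        conn.execute(
            "UPDATE panel_query_versions SET is_active = 0 "
            "WHERE panel_id = ? AND version_no = ?",
            (panel_id, row[0]),
        )
        if fail_after_deactivate:  # hypothetical failure injected for the demo
            raise RuntimeError("simulated failure between statements")
        conn.execute(
            "INSERT INTO panel_query_versions "
            "(panel_id, version_no, sql_text, is_active) VALUES (?, ?, ?, 1)",
            (panel_id, row[0] + 1, new_sql),
        )
    return row[0] + 1
```

With the failure injected, the deactivating UPDATE is rolled back and version 1 stays active; without it, exactly one new version becomes active. In the real handler the same shape would presumably be expressed with async with conn.transaction(): on the asyncpg connection.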

backend/app/main.py-42-47 (1)

42-47: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix CORS wildcard + credentials configuration.

Combining allow_origins=["*"] with allow_credentials=True conflicts with the CORS specification: a wildcard Access-Control-Allow-Origin is not permitted on credentialed responses, and browsers reject a response that pairs the two, so cross-origin requests that carry cookies or Authorization headers cannot be relied on. Use an explicit allowlist of trusted origins instead:

Suggested change

 app.add_middleware(
     CORSMiddleware,
-    allow_origins=["*"],
+    allow_origins=[
+        "http://localhost:5173",
+        # add deployed frontend origin(s) explicitly
+    ],
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/main.py` around lines 42 - 47, The CORS config in
app.add_middleware using CORSMiddleware currently sets allow_origins=["*"] while
allow_credentials=True which violates the CORS spec; update the CORSMiddleware
configuration in main.py (the app.add_middleware call) to replace the wildcard
origin with an explicit allowlist of trusted origins (e.g., load a list from an
env var like TRUSTED_ORIGINS or a settings variable and pass that list to
allow_origins) and keep allow_credentials=True only when the origins list is
explicit; ensure allow_methods and allow_headers remain as needed and validate
the trusted-origins parsing so the middleware receives a proper list rather than
a single comma string.
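
One way the origin parsing mentioned above might look is sketched below. The function name parse_trusted_origins and the TRUSTED_ORIGINS variable are hypothetical (the prompt only suggests "an env var like TRUSTED_ORIGINS"), and rejecting the wildcard outright is an assumption about the desired policy rather than something the PR specifies.

```python
def parse_trusted_origins(raw):
    """Split a comma-separated origin string into a clean list of origins.

    Blank entries are dropped, surrounding whitespace and trailing slashes
    are stripped, and the wildcard is refused because it must not be
    combined with allow_credentials=True.
    """
    origins = [o.strip().rstrip("/") for o in (raw or "").split(",") if o.strip()]
    if "*" in origins:
        raise ValueError("wildcard origin cannot be combined with credentials")
    return origins
```

The result would then be passed as allow_origins=parse_trusted_origins(os.environ.get("TRUSTED_ORIGINS", "http://localhost:5173")), so the middleware always receives a list rather than one comma-joined string.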

backend/app/panels/router.py-68-69 (1)

68-69: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use HTTP error status codes for missing panel query states.

These are client-visible error states that should return 404 status codes instead of success responses. Replace the error dict returns at lines 68 and 121 with proper HTTP exceptions. This requires importing HTTPException from fastapi.

Suggested change

-from fastapi import APIRouter
+from fastapi import APIRouter, HTTPException
@@
         if not row:
-            return {"error": "No active query for panel"}
+            raise HTTPException(status_code=404, detail="No active query for panel")
@@
         if not row:
-            return {"error": "No active query found"}
+            raise HTTPException(status_code=404, detail="No active query found")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/panels/router.py` around lines 68 - 69, Replace the
client-visible error dict responses with proper FastAPI HTTP exceptions: import
HTTPException from fastapi, then raise HTTPException(status_code=404,
detail="No active query for panel") where the code returns {"error": "No active
query for panel"} (line 68) and raise HTTPException(status_code=404, detail="No
active query found") where it returns {"error": "No active query found"} (line
121). Ensure you remove or replace the original return statements so the
endpoint raises the exception instead of returning a 200 JSON payload.

backend/app/main.py-62-71 (1)

62-71: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move manager.disconnect() to a finally block to ensure cleanup on all exceptions.

Currently, cleanup only happens when WebSocketDisconnect is caught. Other exceptions raised by websocket.receive_text() or manager.send_personal() will leave stale sockets in manager.active_connections, causing resource leaks.

Suggested change

 @app.websocket("/ws")
 async def websocket_endpoint(websocket: WebSocket):
     """WebSocket endpoint for real-time dashboard updates."""
     await manager.connect(websocket)
     try:
         while True:
             # Keep connection alive, listen for client messages
             data = await websocket.receive_text()
             # Client can send pings or commands
             if data == "ping":
                 await manager.send_personal(websocket, "pong", {})
-    except WebSocketDisconnect:
-        manager.disconnect(websocket)
+    except WebSocketDisconnect:
+        pass
+    finally:
+        manager.disconnect(websocket)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/main.py` around lines 62 - 71, The cleanup call
manager.disconnect(websocket) must run for all exit paths, not just on
WebSocketDisconnect; wrap the loop in try/finally so that after the async loop
using websocket.receive_text() and manager.send_personal(...) ends (whether due
to WebSocketDisconnect or any other exception) manager.disconnect(websocket) is
invoked in the finally block; leave the existing WebSocketDisconnect except
block only if you need special handling, but ensure the final cleanup is
performed in the finally section referencing manager.disconnect(websocket).
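
The effect of moving cleanup into finally can be shown with a small asyncio sketch that needs no web framework. FakeManager and FakeSocket are hypothetical stand-ins for the ConnectionManager and the WebSocket, and the built-in ConnectionError plays the role of WebSocketDisconnect; none of these names come from the PR.

```python
import asyncio


class FakeManager:
    """Stand-in for the ConnectionManager; tracks active sockets in a list."""

    def __init__(self):
        self.active_connections = []

    async def connect(self, ws):
        self.active_connections.append(ws)

    def disconnect(self, ws):
        if ws in self.active_connections:
            self.active_connections.remove(ws)


class FakeSocket:
    """Stand-in socket whose receive_text raises the configured exception."""

    def __init__(self, exc):
        self.exc = exc

    async def receive_text(self):
        raise self.exc


async def endpoint(manager, ws):
    await manager.connect(ws)
    try:
        while True:
            await ws.receive_text()
    except ConnectionError:
        pass  # expected disconnect path (stands in for WebSocketDisconnect)
    finally:
        manager.disconnect(ws)  # runs on every exit path, including errors


async def demo():
    manager = FakeManager()
    # Normal disconnect: swallowed by the except clause, cleaned up by finally.
    await endpoint(manager, FakeSocket(ConnectionError()))
    # Unexpected error: propagates to the caller, but finally still cleans up.
    try:
        await endpoint(manager, FakeSocket(ValueError("unexpected")))
    except ValueError:
        pass
    return len(manager.active_connections)
```

Running asyncio.run(demo()) leaves no sockets registered in either case, which is the property the suggested change is after.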

backend/app/incidents/router.py-35-35 (1)

35-35: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use HTTPException for error responses instead of returning error dictionaries with 200 status.

Lines 35 and 67 return error payloads as JSON with HTTP 200 (success) status. Use HTTPException to return proper HTTP error codes: 404 for the missing incident (line 35) and 422 for the missing question validation error (line 67).

Suggested change

-from fastapi import APIRouter
+from fastapi import APIRouter, HTTPException
@@
         if not incident:
-            return {"error": "Incident not found"}
+            raise HTTPException(status_code=404, detail="Incident not found")
@@
     question = body.get("question", "")
     if not question:
-        return {"error": "question is required"}
+        raise HTTPException(status_code=422, detail="question is required")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/incidents/router.py` at line 35, Replace the plain JSON error
returns with FastAPI HTTPException raises: where the incident lookup currently
returns {"error": "Incident not found"}, raise HTTPException(status_code=404,
detail="Incident not found"); likewise, where the missing-question branch
(around line 67) returns {"error": "question is required"}, raise
HTTPException(status_code=422, detail="question is required"); ensure you
import HTTPException from fastapi at the top of the module if not already
imported.

backend/app/ws/manager.py-35-39 (1)

35-39: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden broadcast loop against concurrent list mutation and blind exception swallowing.

Iterate over a snapshot and catch expected socket-send failures explicitly; avoid masking unrelated bugs. The current code iterates over self.active_connections directly while each send awaits, so a connect or disconnect that runs during the loop mutates the collection mid-iteration: a list can silently skip entries, and a set raises RuntimeError. Additionally, the broad except Exception: masks unexpected errors.

Suggested change

-from fastapi import WebSocket
+from fastapi import WebSocket, WebSocketDisconnect
@@
-        for connection in self.active_connections:
+        for connection in list(self.active_connections):
             try:
                 await connection.send_text(message_json)
-            except Exception:
+            except (WebSocketDisconnect, RuntimeError):
                 disconnected.append(connection)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/ws/manager.py` around lines 35 - 39, The broadcast loop iterates
self.active_connections directly and uses a broad except Exception, which is
fragile if the collection mutates during iteration and can hide real bugs; copy
the connections into a snapshot (e.g., list(self.active_connections)) before
iterating, catch only expected send failures (e.g., WebSocketDisconnect or
RuntimeError, as in the suggested change) when awaiting connection.send_text,
append those failed connections to disconnected, and after the loop call the
existing disconnect/cleanup logic for each entry; let other unexpected
exceptions propagate or log them instead of swallowing them, and do not catch
asyncio.CancelledError. Reference symbols: self.active_connections,
connection.send_text, disconnected, disconnect().
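
A minimal sketch of the snapshot-and-prune pattern described above, using plain asyncio and hypothetical Conn and Manager classes in place of the real WebSocket and ConnectionManager. The exception tuple (ConnectionError, RuntimeError) is an assumption standing in for whatever send failures the real socket raises.

```python
import asyncio


class Conn:
    """Hypothetical connection; a dead one raises on send."""

    def __init__(self, name, alive=True):
        self.name, self.alive, self.sent = name, alive, []

    async def send_text(self, msg):
        if not self.alive:
            raise RuntimeError("socket closed")
        self.sent.append(msg)


class Manager:
    def __init__(self, conns):
        self.active_connections = list(conns)

    def disconnect(self, conn):
        if conn in self.active_connections:
            self.active_connections.remove(conn)

    async def broadcast(self, msg):
        disconnected = []
        # Iterate over a copy so mutations during an await cannot affect the loop.
        for conn in list(self.active_connections):
            try:
                await conn.send_text(msg)
            except (ConnectionError, RuntimeError):
                disconnected.append(conn)
        # Prune failed sockets only after the loop has finished.
        for conn in disconnected:
            self.disconnect(conn)


def demo():
    a, b, c = Conn("a"), Conn("b", alive=False), Conn("c")
    manager = Manager([a, b, c])
    asyncio.run(manager.broadcast("hi"))
    return [x.name for x in manager.active_connections], a.sent, c.sent
```

The dead connection is dropped while the live ones on either side of it still receive the message; any exception outside the listed types would propagate to the caller instead of being recorded as a disconnect.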

backend/app/panels/router.py-80-86 (1)

80-86: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Catch database errors explicitly instead of masking programming errors.

The bare Exception catch silently records application bugs (e.g., attribute errors, type mismatches) as panel SQL failures. Catch asyncpg.PostgresError to only record legitimate database errors and let unexpected exceptions surface for debugging.

Suggested change

+import asyncpg
@@
-        except Exception as e:
+        except asyncpg.PostgresError as e:
             # Record failure
             await conn.execute(
                 """INSERT INTO ops.query_failures (panel_id, error_text, bad_sql)
                    VALUES ($1, $2, $3)""",
                 panel_id, str(e), row["sql_text"],
             )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/panels/router.py` around lines 80 - 86, Replace the broad "except
Exception as e" that records failures in ops.query_failures with an exception
handler that only catches asyncpg.PostgresError (add import asyncpg at the top
of the module), so that only database errors trigger the INSERT via
conn.execute; let other exceptions propagate (or re-raise) so programming errors
aren't masked. Specifically, update the try/except around the query execution
(the block that calls conn.execute to insert into ops.query_failures) to "except
asyncpg.PostgresError as e:" and preserve the existing INSERT logic inside that
handler.
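
The difference between a broad and a narrow handler can be sketched without a Postgres server by substituting sqlite3.Error for asyncpg.PostgresError. The run_panel_query helper, the resources table, and the in-memory failures list are hypothetical; only the shape of the except clause mirrors the suggestion.

```python
import sqlite3


def run_panel_query(conn, row, failures):
    """Execute a panel's SQL, recording database errors only."""
    try:
        return conn.execute(row["sql_text"]).fetchall()
    except sqlite3.Error as e:  # stands in for asyncpg.PostgresError
        failures.append((row["sql_text"], str(e)))
        return None


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resources (display_name TEXT)")
    conn.execute("INSERT INTO resources VALUES ('db-01')")
    failures = []
    ok = run_panel_query(conn, {"sql_text": "SELECT display_name FROM resources"}, failures)
    # A genuinely broken query is recorded as a panel failure.
    bad = run_panel_query(conn, {"sql_text": "SELECT resource_name FROM resources"}, failures)
    # A programming bug (row is None) is not a database error, so it surfaces.
    try:
        run_panel_query(conn, None, failures)
        bug_surfaced = False
    except TypeError:
        bug_surfaced = True
    return ok, bad, len(failures), bug_surfaced
```

Only the query against the missing column lands in failures; the TypeError from the bad call reaches the caller, where it can be debugged rather than being logged as a SQL failure.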

backend/app/demo/scenarios.py-274-296 (1)

274-296: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make panel-break mutation atomic to avoid partial panel state.

This sequence updates ops.panel_query_versions, ops.dashboard_panels, and ops.query_failures without a transaction. If any statement fails after is_active=false, the panel can be left with no active query version.

Suggested fix

     # Break the panel
     async with pool.acquire() as conn:
-        active = await conn.fetchrow(
-            "SELECT version_no, sql_text FROM ops.panel_query_versions WHERE panel_id = $1 AND is_active = true",
-            panel_id,
-        )
-        if not active:
-            return
-
-        broken_sql = active["sql_text"].replace("display_name", "resource_name")
-        new_version = active["version_no"] + 1
-
-        await conn.execute("UPDATE ops.panel_query_versions SET is_active = false WHERE panel_id = $1", panel_id)
-        await conn.execute(
-            """INSERT INTO ops.panel_query_versions (panel_id, version_no, sql_text, generated_by, is_active)
-               VALUES ($1, $2, $3, 'human', true)""",
-            panel_id, new_version, broken_sql,
-        )
-        await conn.execute("UPDATE ops.dashboard_panels SET status = 'failed' WHERE panel_id = $1", panel_id)
-        await conn.execute(
-            """INSERT INTO ops.query_failures (panel_id, error_text, bad_sql)
-               VALUES ($1, 'column \"resource_name\" does not exist', $2)""",
-            panel_id, broken_sql,
-        )
+        async with conn.transaction():
+            active = await conn.fetchrow(
+                "SELECT version_no, sql_text FROM ops.panel_query_versions WHERE panel_id = $1 AND is_active = true",
+                panel_id,
+            )
+            if not active:
+                return
+
+            broken_sql = active["sql_text"].replace("display_name", "resource_name")
+            new_version = active["version_no"] + 1
+
+            await conn.execute("UPDATE ops.panel_query_versions SET is_active = false WHERE panel_id = $1", panel_id)
+            await conn.execute(
+                """INSERT INTO ops.panel_query_versions (panel_id, version_no, sql_text, generated_by, is_active)
+                   VALUES ($1, $2, $3, 'human', true)""",
+                panel_id, new_version, broken_sql,
+            )
+            await conn.execute("UPDATE ops.dashboard_panels SET status = 'failed' WHERE panel_id = $1", panel_id)
+            await conn.execute(
+                """INSERT INTO ops.query_failures (panel_id, error_text, bad_sql)
+                   VALUES ($1, 'column \"resource_name\" does not exist', $2)""",
+                panel_id, broken_sql,
+            )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/demo/scenarios.py` around lines 274 - 296, The sequence that
updates ops.panel_query_versions, ops.dashboard_panels, and inserts into
ops.query_failures (using variables active, broken_sql, new_version) must be
executed atomically; wrap the block that sets is_active = false, INSERTs the new
version, updates dashboard_panels status, and INSERTs query_failures in a single
database transaction (use conn.transaction() or equivalent on the acquired conn)
so that any error rolls back all changes and prevents leaving no active version.

backend/app/demo/router.py-53-68 (1)

53-68: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run the reset DELETE sequence inside one transaction.

If any DELETE fails, the current approach can leave a partially reset database.

Suggested fix

     pool = await get_pool()
     async with pool.acquire() as conn:
-        # Truncate all telemetry and state tables (order matters for FK constraints)
-        await conn.execute("DELETE FROM ops.remediation_actions")
-        await conn.execute("DELETE FROM ops.evidence_runs")
-        await conn.execute("DELETE FROM ops.incidents")
-        await conn.execute("DELETE FROM ops.query_failures")
-        await conn.execute("DELETE FROM ops.panel_query_versions")
-        await conn.execute("DELETE FROM ops.dashboard_panels")
-        await conn.execute("DELETE FROM ops.sap_alerts")
-        await conn.execute("DELETE FROM ops.sap_backups")
-        await conn.execute("DELETE FROM ops.events_norm")
-        await conn.execute("DELETE FROM ops.alerts_raw")
-        await conn.execute("DELETE FROM ops.metrics_norm")
-        await conn.execute("DELETE FROM ops.resource_edges")
-        await conn.execute("DELETE FROM ops.resources")
+        async with conn.transaction():
+            # Truncate all telemetry and state tables (order matters for FK constraints)
+            await conn.execute("DELETE FROM ops.remediation_actions")
+            await conn.execute("DELETE FROM ops.evidence_runs")
+            await conn.execute("DELETE FROM ops.incidents")
+            await conn.execute("DELETE FROM ops.query_failures")
+            await conn.execute("DELETE FROM ops.panel_query_versions")
+            await conn.execute("DELETE FROM ops.dashboard_panels")
+            await conn.execute("DELETE FROM ops.sap_alerts")
+            await conn.execute("DELETE FROM ops.sap_backups")
+            await conn.execute("DELETE FROM ops.events_norm")
+            await conn.execute("DELETE FROM ops.alerts_raw")
+            await conn.execute("DELETE FROM ops.metrics_norm")
+            await conn.execute("DELETE FROM ops.resource_edges")
+            await conn.execute("DELETE FROM ops.resources")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/demo/router.py` around lines 53 - 68, The DELETE sequence
executed under async with pool.acquire() should be run inside a single database
transaction so any failure rolls back all prior deletes; update the block that
acquires a connection (the async with pool.acquire() usage in
backend/app/demo/router.py) to open a transaction (e.g., async with
conn.transaction():) and move all the await conn.execute("DELETE FROM ...")
calls into that transaction scope so the DB will commit only if all deletes
succeed and automatically rollback on error.

backend/app/demo/router.py-48-51 (1)

48-51: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Await task cancellation before wiping and reseeding state.

cancel() is fire-and-forget here; the demo task can still run while reset is deleting/reseeding tables, causing interleaved writes.

Suggested fix

     if _demo_state["task"] and not _demo_state["task"].done():
         _demo_state["task"].cancel()
+        try:
+            await _demo_state["task"]
+        except asyncio.CancelledError:
+            pass
         _demo_state["running"] = False

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/demo/router.py` around lines 48 - 51, The reset currently cancels
the background demo task with _demo_state["task"].cancel() but does not await
it, so the task may continue running while you wipe/reseed; change the flow to
call _demo_state["task"].cancel(), then await _demo_state["task"] (handling
asyncio.CancelledError and other exceptions) before clearing or reseeding tables
and setting _demo_state["running"]=False; ensure any exceptions from awaiting
the task are logged/handled and that _demo_state is only reset after the awaited
cancellation completes.
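
The cancel-then-await pattern described above can be sketched with plain asyncio. The helper name `cancel_and_wait` and the `worker` coroutine are hypothetical stand-ins for the reset handler and the demo task; this is a minimal sketch, not the project's actual code.

```python
import asyncio


async def cancel_and_wait(task):
    """Cancel a background task and return only after it has actually stopped."""
    if task is None or task.done():
        return
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        # Expected: the task acknowledged the cancellation.
        pass


async def demo() -> bool:
    async def worker():
        await asyncio.sleep(3600)  # stands in for the long-running demo sequence

    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # give the worker a chance to start
    await cancel_and_wait(task)
    # Only at this point is it safe to wipe and reseed shared state.
    return task.cancelled()
```

One caveat with this shape: if the caller itself is cancelled while waiting, that cancellation is swallowed too, so production code may want to distinguish the two cases.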

backend/app/demo/router.py-24-29 (1)

24-29: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Serialize demo startup state transitions and reject duplicate starts with HTTP 409.

The check/set on _demo_state["running"] is non-atomic, so concurrent /start requests can launch multiple demo tasks. Also, returning {"error": ...} with 200 makes client error handling unreliable.

Suggested fix

 import asyncio
-from fastapi import APIRouter, BackgroundTasks
+from fastapi import APIRouter, BackgroundTasks, HTTPException
 ...
 router = APIRouter()
+_demo_lock = asyncio.Lock()
 ...
 @router.post("/start")
 async def start_demo(background_tasks: BackgroundTasks):
     """Start the full 3-incident demo sequence."""
-    if _demo_state["running"]:
-        return {"error": "Demo already running"}
-
-    pool = await get_pool()
-    _demo_state["running"] = True
-    _demo_state["phase"] = "starting"
+    async with _demo_lock:
+        if _demo_state["running"]:
+            raise HTTPException(status_code=409, detail="Demo already running")
+        pool = await get_pool()
+        _demo_state["running"] = True
+        _demo_state["phase"] = "starting"
 ...
-    _demo_state["task"] = asyncio.create_task(_run())
+        _demo_state["task"] = asyncio.create_task(_run())

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/demo/router.py` around lines 24 - 29, The non-atomic check/set on
_demo_state["running"] can allow concurrent /start requests to race; wrap the
startup state transition in a serialized async lock (e.g., an asyncio.Lock or
FastAPI dependency) so only one coroutine may check-and-set _demo_state at a
time, perform the get_pool() call after acquiring the lock, and set
_demo_state["running"]=True and _demo_state["phase"]="starting" while holding
the lock; if _demo_state["running"] is already true, return an HTTP 409 response
(e.g., raise an HTTPException(status_code=409) or equivalent) instead of
returning a 200 JSON error so clients see a proper conflict status.
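
To make the race concrete, here is a small sketch without FastAPI. `DemoController` and `DemoAlreadyRunning` are hypothetical names; the exception stands in for `HTTPException(status_code=409)`, and `asyncio.sleep(0)` stands in for the `await get_pool()` call that yields control between the check and the set.

```python
import asyncio


class DemoAlreadyRunning(Exception):
    """Hypothetical stand-in for FastAPI's HTTPException(status_code=409)."""


class DemoController:
    def __init__(self) -> None:
        self.running = False
        self.lock = asyncio.Lock()

    async def start(self) -> str:
        # The lock turns check-and-set into one critical section, even though
        # the simulated get_pool() call below yields to the event loop.
        async with self.lock:
            if self.running:
                raise DemoAlreadyRunning("Demo already running")
            await asyncio.sleep(0)  # simulated `await get_pool()`
            self.running = True
        return "started"


async def race(n: int = 5) -> list:
    """Fire n concurrent start requests and collect results or exceptions."""
    controller = DemoController()
    return await asyncio.gather(
        *(controller.start() for _ in range(n)), return_exceptions=True
    )
```

Running `asyncio.run(race())` should yield exactly one `"started"` result, with every other request rejected.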

backend/app/config.py-1-34 (1)

1-34: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove duplicate Settings definition.

This entire file duplicates backend/app/__init__.py. Both files define identical Settings classes and get_settings() functions. Consolidate to a single location to prevent maintenance issues and confusion about which module to import from.
♻️ Recommended approach

Keep the configuration in backend/app/config.py (more explicit module name), and remove the duplicate from backend/app/__init__.py:

backend/app/__init__.py:
# Application package
Files importing Settings should use:
from app.config import get_settings
This eliminates duplication and establishes app.config as the canonical configuration source.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/config.py` around lines 1 - 34, Remove the duplicate Settings
class and get_settings function from backend/app/__init__.py and keep the
canonical definitions in backend/app/config.py (Settings, get_settings); then
update all imports across the codebase to import the configuration from
app.config (e.g. from app.config import get_settings) and simplify
backend/app/__init__.py to a minimal package initializer (no
Settings/get_settings definitions) so there is a single source of truth.

backend/.env.example-1-4 (1)

1-4: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace real credentials and endpoints with placeholders.

The example file contains what appear to be real Azure endpoints and database credentials. Example configuration should use placeholder values to prevent accidental credential leakage if this file is committed or shared.

🔒 Proposed fix with placeholder values

-AZURE_OPENAI_ENDPOINT=https://oai-gopoc-prod-northcentralus-001.openai.azure.com/
+AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/
 AZURE_OPENAI_KEY=your-key-here
 AZURE_OPENAI_API_VERSION=2024-12-01-preview
-DATABASE_URL=postgresql://insightsql:[email protected]:5432/insightsql
+DATABASE_URL=postgresql://username:password@localhost:5432/insightsql

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/.env.example` around lines 1 - 4, The .env.example currently contains
real-looking secrets; replace the concrete values for AZURE_OPENAI_ENDPOINT,
AZURE_OPENAI_KEY, AZURE_OPENAI_API_VERSION, and DATABASE_URL with generic
placeholder values (e.g., use example URLs/keys and a template DB connection
string) so no real endpoints/credentials remain in the example file; ensure the
placeholders clearly indicate they are fake (like YOUR_AZURE_ENDPOINT,
YOUR_AZURE_KEY, YOUR_API_VERSION, and
POSTGRESQL://USER:PASSWORD@HOST:PORT/DBNAME) and keep the same variable names so
consuming code examples remain valid.

backend/app/config.py-9-9 (1)

9-9: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove hardcoded database credentials from code.

The default value contains hardcoded credentials and an internal IP address. Credentials should never appear in source code, even as defaults. Force explicit configuration via environment variables.

🔒 Proposed fix

-    database_url: str = "postgresql://insightsql:[email protected]:5432/insightsql"
+    database_url: str = ""  # Required: set via DATABASE_URL environment variable

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/config.py` at line 9, The database_url default in config.py
contains hardcoded credentials; remove the literal default and require
configuration from the environment instead: change the database_url declaration
(symbol: database_url) so it is populated from an environment variable (e.g.,
os.environ["DATABASE_URL"] or via your settings loader) or left unset and raise
a clear error if not provided, ensuring no credential string remains in source;
update any usage of database_url to expect an explicitly provided value at
startup and add a short startup-time validation that fails fast when
DATABASE_URL is missing.
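
A fail-fast check of this kind can be sketched with the standard library alone. The function name `load_database_url` is hypothetical, and the project itself uses a Pydantic `Settings` class, so treat this as an illustration of the behavior rather than the intended implementation.

```python
import os


def load_database_url(env=os.environ) -> str:
    """Return DATABASE_URL, or fail at startup with a clear message.

    No default is kept in source, so no credential string can leak from it.
    """
    url = (env.get("DATABASE_URL") or "").strip()
    if not url:
        raise RuntimeError("DATABASE_URL is not set; refusing to start without it")
    return url
```

Accepting the environment as a parameter keeps the check easy to test without touching the real process environment.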

backend/app/__init__.py-7-7 (1)

7-7: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove hardcoded database credentials from code.

The default value contains hardcoded credentials and an internal IP address. Credentials should never appear in source code, even as defaults. Force explicit configuration via environment variables by using an empty string or raising an error if DATABASE_URL is not set.

🔒 Proposed fix to require explicit configuration

-    database_url: str = "postgresql://insightsql:[email protected]:5432/insightsql"
+    database_url: str = ""  # Required: set via DATABASE_URL environment variable

Or use Pydantic's field validation to enforce it:

+from pydantic import Field
+
 class Settings(BaseSettings):
     # Database
-    database_url: str = "postgresql://insightsql:[email protected]:5432/insightsql"
+    database_url: str = Field(..., description="PostgreSQL connection string")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/__init__.py` at line 7, The variable database_url in
backend/app/__init__.py currently contains hardcoded credentials; remove the
hardcoded default and require explicit configuration from environment or
validation instead. Change database_url so it does not include secrets (set to
an empty string or None by default) and read the real value from an environment
variable like DATABASE_URL at startup; if DATABASE_URL is missing, raise an
explicit error or use Pydantic/Field validation on the settings class to fail
fast. Update any code that references database_url to expect the new required
value and ensure no secrets remain in source.

backend/app/ingestion/router.py-12-67 (1)

12-67: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Protect ingestion endpoints with webhook authentication.

All three endpoints accept and process unauthenticated payloads. This allows arbitrary event/metric injection and broadcast abuse.

Use a shared secret signature (e.g., HMAC header) or API key dependency before calling normalizers/broadcast.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/ingestion/router.py` around lines 12 - 67, Add webhook
authentication to the ingestion endpoints (receive_alert, receive_metrics,
receive_compute_event) by verifying a shared secret before calling get_pool,
normalize_* functions or manager.broadcast: read the raw request body and a
signature header (e.g., X-Signature) or API key header (e.g., X-Api-Key),
compute/compare an HMAC using the shared secret from config/env, and return
401/raise an HTTPException when the header is missing or the signature/key is
invalid; apply the same verification logic to all three handlers so no
unauthenticated payload reaches normalize_alert, normalize_metrics or
normalize_compute_event or triggers manager.broadcast.
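
The signature check itself needs only the standard library. The sketch below assumes an HMAC-SHA256 hex digest of the raw body sent in a header such as `X-Signature`; the helper names `sign` and `is_valid_signature` are hypothetical, and wiring the result into a 401 response is left to the framework.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (what the sender puts in the header)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, header_value) -> bool:
    """Constant-time comparison; a missing or empty header is rejected."""
    if not header_value:
        return False
    expected = sign(secret, body).encode()
    # Compare bytes so that a non-ASCII header cannot raise inside compare_digest.
    return hmac.compare_digest(expected, header_value.encode())
```

Verifying against the raw bytes, before any JSON parsing, avoids mismatches caused by re-serialization.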

backend/app/validation/executor.py-22-24 (1)

22-24: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add execution guardrails (timeout + row cap) to prevent resource exhaustion.

Current execution can run very expensive queries and materialize unlimited result sets.

At minimum set a local statement_timeout in the transaction and cap rows returned (explicit limit or cursor/pagination).

Also applies to: 36-50

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/validation/executor.py` around lines 22 - 24, Add execution
guardrails inside the transaction around conn.transaction(): set a local
statement timeout (e.g. await conn.execute("SET LOCAL statement_timeout =
'5000ms'")) before executing EXPLAIN, and enforce a maximum row cap when
executing the actual query (either append/ensure a LIMIT n to the SQL or use a
server-side cursor/fetching API to read at most N rows). Update both the EXPLAIN
call that uses conn.fetch (plan_rows) and the subsequent query execution path
(the block around conn.transaction() and the code referenced in lines 36-50) to
apply the same statement_timeout and row cap behavior so expensive queries
cannot run indefinitely or materialize unlimited result sets.
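
The row-cap half of this guardrail can be sketched as a query wrapper. This assumes the SQL has already passed the SELECT-only gate and is a single statement without trailing comments. `cap_rows` and `demo_row_count` are hypothetical helpers, and sqlite3 is used only to make the example runnable. The `SET LOCAL statement_timeout` half is Postgres-specific and is not shown here.

```python
import sqlite3


def cap_rows(sql: str, max_rows: int) -> str:
    """Wrap a validated SELECT so it can return at most max_rows rows."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    inner = sql.strip().rstrip(";").strip()
    return f"SELECT * FROM ({inner}) AS capped LIMIT {int(max_rows)}"


def demo_row_count(max_rows: int = 3) -> int:
    """Run a capped query against a 10-row in-memory table and count the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
    return len(conn.execute(cap_rows("SELECT n FROM t;", max_rows)).fetchall())
```

Wrapping in a subquery keeps any LIMIT already present in the inner query intact while still enforcing the outer cap.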

backend/app/db/seed.py-8-22 (1)

8-22: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Make seeding idempotency atomic across concurrent startups.

Line 12’s pre-check and the subsequent inserts are not atomic. Two app instances can both pass the empty check, then both seed; ops.metrics_norm writes are not conflict-protected, so duplicates are possible.

Suggested direction

 async def seed_all(pool: asyncpg.Pool):
     """Run all seed functions."""
-    async with pool.acquire() as conn:
-        # Check if already seeded
-        count = await conn.fetchval("SELECT count(*) FROM ops.resources")
-        if count > 0:
-            print("[SEED] Database already seeded, skipping.")
-            return
-
-    await seed_resources(pool)
-    await seed_topology(pool)
-    await seed_baseline_metrics(pool)
-    await seed_baseline_backups(pool)
-    await seed_dashboard_panels(pool)
+    async with pool.acquire() as conn:
+        async with conn.transaction():
+            # Serialize seeding across instances
+            await conn.execute("SELECT pg_advisory_xact_lock(984321)")
+            count = await conn.fetchval("SELECT count(*) FROM ops.resources")
+            if count > 0:
+                print("[SEED] Database already seeded, skipping.")
+                return
+            # call conn-scoped seed helpers (or pass conn instead of pool)
+            ...
     print("[SEED] All seed data loaded.")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/db/seed.py` around lines 8 - 22, The current pre-check in
seed_all is not atomic so concurrent app startups can both see an empty
ops.resources and run seed_* functions (seed_resources, seed_topology,
seed_baseline_metrics, seed_baseline_backups, seed_dashboard_panels) causing
duplicate writes (notably to ops.metrics_norm). Make seeding atomic by acquiring
a database-wide lock or using an idempotent guard row inside a transaction
before running seeds: for example obtain a Postgres advisory lock (or INSERT
INTO a dedicated seed_control table with a unique key using INSERT ... ON
CONFLICT DO NOTHING and check the result) and only run seed_* when the
lock/guard indicates this process owns the seed operation; release the lock
after seeding. Ensure ops.metrics_norm inserts are using upsert/unique
constraints or guarded by the same atomic check to prevent duplicates.

backend/app/ingestion/normalizer.py-54-57 (1)

54-57: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard per-metric parsing so one bad value doesn’t drop the whole payload.

float(metric_value) can raise and abort the entire request. Skip invalid points (or collect errors) and insert valid metrics.

Proposed fix

     rows = []
     for metric_name, metric_value in metrics.items():
         unit = _infer_unit(metric_name)
-        rows.append((event_ts, resource_id, metric_name, float(metric_value), unit, json.dumps(labels)))
+        try:
+            value = float(metric_value)
+        except (TypeError, ValueError):
+            continue
+        rows.append((event_ts, resource_id, metric_name, value, unit, json.dumps(labels)))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/ingestion/normalizer.py` around lines 54 - 57, The loop that
appends rows uses float(metric_value) directly and a single bad metric will
abort processing; wrap the per-metric parsing inside a try/except around
converting metric_value to float (and around any unit inference if needed) in
the for metric_name, metric_value in metrics.items() loop, skip invalid metrics
instead of raising, and optionally collect or log parsing errors (e.g., append
to an errors list or call a logger) while still appending valid rows with
rows.append((event_ts, resource_id, metric_name, float_val,
_infer_unit(metric_name), json.dumps(labels))).
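
A standalone version of the guarded loop, which also collects the rejected metric names as the prompt suggests, might look like the following. `parse_metrics` is a hypothetical helper; the real normalizer also attaches timestamps, units, and labels.

```python
def parse_metrics(metrics: dict):
    """Split a metrics payload into (valid (name, value) pairs, rejected names)."""
    valid, rejected = [], []
    for name, raw in metrics.items():
        try:
            value = float(raw)
        except (TypeError, ValueError):
            # One bad point is recorded and skipped; the rest of the payload survives.
            rejected.append(name)
            continue
        valid.append((name, value))
    return valid, rejected
```

Returning the rejected names lets the caller log them or report them in the response instead of dropping them silently.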

backend/app/ingestion/normalizer.py-127-135 (1)

127-135: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize naive timestamps to UTC in parser.

datetime.fromisoformat(...) may return a naive datetime. Persisting that can shift event ordering depending on DB/session timezone.

Proposed fix

 def _parse_ts(ts_str: str | None) -> datetime:
@@
     try:
-        return datetime.fromisoformat(ts_str.replace("Z", "+00:00"))
+        dt = datetime.fromisoformat(ts_str.replace("Z", "+00:00"))
+        return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)
     except (ValueError, AttributeError):
         return datetime.now(timezone.utc)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/ingestion/normalizer.py` around lines 127 - 135, The _parse_ts
function can return naive datetimes from datetime.fromisoformat which can lead
to inconsistent ordering; update _parse_ts so that after parsing (in both
success and fallback paths) any naive datetime is converted to an aware UTC
datetime (e.g., check if parsed_dt.tzinfo is None and set/replace it to
timezone.utc) and continue to return timezone-aware datetimes (use timezone.utc
for the now() fallback as already used).
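
For reference, a self-contained variant of the parser is sketched below so its behavior on naive, offset, and invalid inputs can be checked in isolation. `parse_ts` is a hypothetical rewrite, not the project's `_parse_ts`, and it assumes naive input should be interpreted as UTC.

```python
from datetime import datetime, timezone


def parse_ts(ts_str):
    """Parse an ISO-8601 string into an aware datetime; naive input is assumed to be UTC."""
    try:
        dt = datetime.fromisoformat(ts_str.replace("Z", "+00:00"))
    except (ValueError, AttributeError):
        # Missing or unparseable input falls back to the current UTC time.
        return datetime.now(timezone.utc)
    # Attach UTC only when the string carried no offset of its own.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)
```

An input with an explicit offset, such as `2026-05-17T14:55:03+05:30`, keeps its offset and still compares equal to the same instant expressed in UTC.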

backend/app/db/seed.py-255-266 (1)

255-266: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Insert panel metadata and version in one transaction.

If Line 258 succeeds and Line 263 fails, you can leave a panel without an active query version.

Proposed fix

     async with pool.acquire() as conn:
-        for p in panels:
-            await conn.execute(
+        async with conn.transaction():
+            for p in panels:
+                await conn.execute(
                 """INSERT INTO ops.dashboard_panels (panel_id, panel_name, contract_json, status)
                    VALUES ($1, $2, $3::jsonb, 'active') ON CONFLICT DO NOTHING""",
                 p["panel_id"], p["panel_name"], p["contract_json"],
-            )
-            await conn.execute(
+                )
+                await conn.execute(
                 """INSERT INTO ops.panel_query_versions (panel_id, version_no, sql_text, generated_by, is_active)
                    VALUES ($1, 1, $2, 'human', true) ON CONFLICT DO NOTHING""",
                 p["panel_id"], p["sql"],
-            )
+                )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/db/seed.py` around lines 255 - 266, The two INSERTs executed
under async with pool.acquire() for each panel must be atomic so a panel row in
ops.dashboard_panels cannot be created without a matching active row in
ops.panel_query_versions; wrap the pair of statements for each p in a
transaction (use conn.transaction() or an explicit BEGIN/COMMIT via the
connection) so that both INSERTs (into ops.dashboard_panels with
panel_id/panel_name/contract_json and into ops.panel_query_versions with
version_no/sql_text/generated_by/is_active) succeed or both roll back on error.

backend/app/validation/sqlglot_gate.py-63-65 (1)

63-65: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return plain projected column names too.

extract_columns() only walks exp.Alias, so non-aliased outputs are dropped. For example, SELECT count(*) AS total, host_id ... comes back as ["total"], which will make any contract check built on this helper misread the actual result shape.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/validation/sqlglot_gate.py` around lines 63 - 65, extract_columns
currently only collects names from exp.Alias nodes so plain projected columns
are dropped; update extract_columns to iterate the SELECT projection expressions
(use parsed.find_all(exp.Select) or parsed.find(exp.Select).expressions) and for
each projection: if it's an exp.Alias use expr.alias, else if it's an exp.Column
or exp.Identifier, use the column name (e.g., expr.name or
expr.this.name), and as a fallback derive a sensible plain expression string;
adjust references to parsed.find_all(exp.Alias) to this new loop so both aliased
and non-aliased outputs are returned.

backend/app/agent/healer.py-130-160 (1)

130-160: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail the heal if the shadow run fails or the result shape drifts.

This block never checks shadow_result.success, and it still does not validate the returned columns against the panel contract before broadcasting "complete" and moving on to promotion. A query that passes EXPLAIN but fails at execution, or returns the wrong columns, can still be marked "healed" and pushed to the dashboard.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/agent/healer.py` around lines 130 - 160, The shadow run logic
after calling execute_readonly(…) must verify execution succeeded and that the
returned columns match the panel contract before broadcasting completion or
promoting the healed SQL; update the block using shadow_result.success to detect
execution failures (broadcast "validating_fix" status "failed" with error and
return an error) and compare shadow_result.columns (or the executed result
schema) against the panel's expected columns/contract (e.g., the panel contract
object used elsewhere for panel_id); if the shape drifts broadcast a failure
with details and return, otherwise continue and include row_count/plan_cost when
broadcasting "complete".
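
The two checks can be expressed as one small gate that runs before promotion. The `ExecutionResult` fields below (`success`, `columns`, `row_count`, `error`) are assumptions about its shape, and `check_shadow_result` is a hypothetical helper; the real contract format in `contract_json` may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExecutionResult:
    """Assumed shape; the real ExecutionResult in executor.py may differ."""
    success: bool
    columns: list = field(default_factory=list)
    row_count: int = 0
    error: Optional[str] = None


def check_shadow_result(result: ExecutionResult, expected_columns: list) -> Optional[str]:
    """Return None if the shadow run is promotable, else a human-readable reason."""
    if not result.success:
        return f"shadow execution failed: {result.error or 'unknown error'}"
    missing = [c for c in expected_columns if c not in result.columns]
    extra = [c for c in result.columns if c not in expected_columns]
    if missing or extra:
        return f"result shape drifted (missing={missing}, unexpected={extra})"
    return None
```

A non-None return value would be broadcast as a failed `validating_fix` step, and promotion would be skipped.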

backend/app/agent/rca.py-32-39 (1)

32-39: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope the event context to the incident being analyzed.

This query pulls the last 30 minutes of warning/critical events for the whole platform. That means a delayed RCA can miss the incident’s own events, while concurrent incidents can inject unrelated failures into the prompt. Use the incident’s start time and, if the schema supports it, the incident’s tenant/resources to bound this context.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/agent/rca.py` around lines 32 - 39, The current conn.fetch SQL
pulls the last 30 minutes of platform-wide events; change it to scope events to
the incident being analyzed by replacing the fixed time window with the incident
start time (use the incident.start_ts or incident.start_time parameter) and, if
available in ops.events_norm, add WHERE clauses to filter by tenant_id and/or
resource_id(s) (e.g., incident.tenant_id or incident.resource_ids) using
parameterized placeholders passed into conn.fetch; keep ordering and limit but
ensure the query binds the incident parameters so only events for that
incident’s time range and tenant/resources are returned.

backend/app/agent/llm.py-34-95 (1)

34-95: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add retry logic with failover to different deployments for transient OpenAI errors.

_next_deployment() only distributes requests across deployments; it provides no failover for failures. When client.chat.completions.create() encounters a 429, 5xx, or timeout, the request fails immediately without retrying on another deployment. Create a wrapper around the completion calls that catches transient OpenAI errors, retries remaining deployments, and only raises after all have been exhausted.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/agent/llm.py` around lines 34 - 95, The current calls to
client.chat.completions.create in generate_sql, generate_narrative, and
generate_json have no failover; add a retry wrapper function (e.g.,
call_with_deployment_failover) that: obtains the client via _get_client(),
iterates through deployments returned by repeatedly calling _next_deployment()
(or a snapshot list of deployments), calls client.chat.completions.create with
the original kwargs, catches transient errors (HTTP 429, any 5xx, and
network/timeout exceptions), logs or records the failure, then retries the same
request on the next deployment until all deployments are exhausted; only
re-raise the last caught error if all attempts fail. Use this wrapper inside
generate_sql/generate_narrative/generate_json instead of calling
client.chat.completions.create directly so failures automatically failover
across deployments.
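
The failover loop can be sketched independently of the OpenAI SDK. `TransientLLMError` is a hypothetical stand-in for whatever 429, 5xx, and timeout exceptions the client raises, and the deployment names are made up. In real code, the `call` argument would wrap `client.chat.completions.create`.

```python
import asyncio


class TransientLLMError(Exception):
    """Hypothetical stand-in for a 429, 5xx, or timeout from the provider."""


async def call_with_failover(deployments, call):
    """Try `call(deployment)` on each deployment in order; re-raise the last transient error."""
    last_error = None
    for deployment in deployments:
        try:
            return await call(deployment)
        except TransientLLMError as exc:
            last_error = exc  # remember it and move on to the next deployment
    if last_error is None:
        raise ValueError("no deployments configured")
    raise last_error


async def demo():
    attempts = []

    async def fake_call(deployment):
        attempts.append(deployment)
        if deployment != "gpt-c":
            raise TransientLLMError(f"{deployment} is rate limited")
        return f"answer from {deployment}"

    result = await call_with_failover(["gpt-a", "gpt-b", "gpt-c"], fake_call)
    return result, attempts
```

Non-transient errors (for example, an invalid request) are deliberately not caught, so they surface immediately instead of being retried on every deployment.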

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 89f8202e-aa0f-4a45-9114-a103bc10f6f7

📥 Commits

Reviewing files that changed from the base of the PR and between 2885b97 and b418174.

⛔ Files ignored due to path filters (9)

backend/uv.lock is excluded by !**/*.lock
frontend/package-lock.json is excluded by !**/package-lock.json
frontend/public/favicon.svg is excluded by !**/*.svg
frontend/public/fonts/HPE Graphik-Bold.otf is excluded by !**/*.otf
frontend/public/fonts/HPE Graphik-Light.otf is excluded by !**/*.otf
frontend/public/fonts/HPE Graphik-Medium.otf is excluded by !**/*.otf
frontend/public/fonts/HPE Graphik-Regular.otf is excluded by !**/*.otf
frontend/public/fonts/HPE Graphik-Semibold.otf is excluded by !**/*.otf
frontend/public/icons.svg is excluded by !**/*.svg

📒 Files selected for processing (57)

.gitignore
README.md
backend/.env.example
backend/.python-version
backend/app/__init__.py
backend/app/agent/__init__.py
backend/app/agent/healer.py
backend/app/agent/llm.py
backend/app/agent/prompts.py
backend/app/agent/rca.py
backend/app/agent/text_to_sql.py
backend/app/config.py
backend/app/db/__init__.py
backend/app/db/engine.py
backend/app/db/schema.sql
backend/app/db/seed.py
backend/app/demo/__init__.py
backend/app/demo/prescripted.py
backend/app/demo/router.py
backend/app/demo/scenarios.py
backend/app/incidents/__init__.py
backend/app/incidents/router.py
backend/app/ingestion/__init__.py
backend/app/ingestion/normalizer.py
backend/app/ingestion/router.py
backend/app/main.py
backend/app/panels/__init__.py
backend/app/panels/router.py
backend/app/topology/__init__.py
backend/app/topology/router.py
backend/app/validation/__init__.py
backend/app/validation/executor.py
backend/app/validation/sqlglot_gate.py
backend/app/ws/__init__.py
backend/app/ws/manager.py
backend/pyproject.toml
frontend/.gitignore
frontend/README.md
frontend/eslint.config.js
frontend/index.html
frontend/package.json
frontend/src/App.jsx
frontend/src/components/AIReasoningChain.jsx
frontend/src/components/DemoControl.jsx
frontend/src/components/EvidencePanel.jsx
frontend/src/components/Header.jsx
frontend/src/components/IncidentBanner.jsx
frontend/src/components/MetricCards.jsx
frontend/src/components/PanelHealth.jsx
frontend/src/components/RCANarrative.jsx
frontend/src/components/TimelineChart.jsx
frontend/src/components/TopologyGraph.jsx
frontend/src/hooks/useDashboardState.js
frontend/src/hooks/useWebSocket.js
frontend/src/index.css
frontend/src/main.jsx
frontend/vite.config.js

coderabbitai · 2026-05-17T09:33:05Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get PR status details
gh pr view 1 --repo SaiHPE/InsightSQL--Digithon --json title,state,mergeable,mergeStateStatus,reviewDecision,statusCheckRollup,reviews,assignees,reviewRequests,isDraft,headRefName,baseRefName,createdAt,updatedAt,comments,author

Length of output: 6682

@SaiHPE Here's the current status of PR #1 — Feature update (feat/insightsql-app → main):

Item	Status
PR State	🟢 Open
Mergeable	✅ Yes
Merge State	⚠️ Unstable (pending checks)
Human Reviews	None yet
CodeRabbit Review	🔄 In Progress

The review is still actively processing. Here's what's happening:

57 files are queued for review (backend Python/FastAPI code + frontend React components)
9 files were skipped due to path filters (lock files, .svg, .otf fonts)
The CodeRabbit status check shows PENDING since 09:25 UTC

The UNSTABLE merge state is because the CodeRabbit check hasn't completed yet. Once the review finishes, you'll get the full feedback here on the PR. It should complete shortly! 🐇

Frontend:
- useWebSocket: env-driven WS URL with protocol-aware fallback
- IncidentBanner: reset elapsed on incident switch, stable dependency
- TimelineChart: default metricsTimeline=[] to prevent crash
- PanelHealth: default panels=[], preserve error-only healing entries
- TopologyGraph: widen useEffect deps to track content changes
- index.css: rename keyframes to kebab-case (slide-in, fade-in)

Backend:
- main.py: fix CORS wildcard+credentials violation, move disconnect to finally
- panels/router.py: HTTPException 404, catch asyncpg.PostgresError, atomic version switch
- incidents/router.py: HTTPException 404/422 for errors
- ws/manager.py: iterate snapshot, catch only expected exceptions
- demo/router.py: await task cancel, transaction reset, asyncio.Lock startup, HTTP 409
- demo/scenarios.py: atomic transaction for panel break
- config.py: remove hardcoded credentials, require env var
- __init__.py: deduplicate Settings (canonical in config.py)
- .env.example: replace real credentials with placeholders
- ingestion/normalizer.py: guard float parsing, normalize naive timestamps to UTC
- ingestion/router.py: add webhook API key authentication
- validation/executor.py: add statement_timeout + row cap guardrails
- db/seed.py: atomic seeding with advisory lock, transaction for panel inserts
- validation/sqlglot_gate.py: reject non-ops schema tables, fix extract_columns
- agent/healer.py: atomic promotion with row lock, check shadow result + contract
- agent/rca.py: scope events to incident start time
- agent/llm.py: retry with failover across deployments on transient errors

SaiHPE · 2026-05-17T09:48:51Z

@CodeRabbit I made changes to the PR, please check.

coderabbitai · 2026-05-17T09:49:21Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get updated PR status
gh pr view 1 --repo SaiHPE/InsightSQL--Digithon --json title,state,mergeable,mergeStateStatus,reviewDecision,statusCheckRollup,reviews,isDraft,headRefName,baseRefName,updatedAt

Length of output: 65155

@SaiHPE Here's the updated status of PR #1:

Item	Status
PR State	🟢 Open
Mergeable	✅ Yes
Merge State	⚠️ Unstable
CodeRabbit Review	✅ Complete
Human Reviews	None yet

The CodeRabbit review has now completed and is marked ✅ SUCCESS. The review surfaced some important items to address before merging:

🔴 Critical (inline comments):

2 critical issues flagged directly on the diff

🟠 Major (32 comments), highlights include:

Hardcoded credentials in backend/app/config.py and backend/app/__init__.py (internal IP + DB password in source code)
Real endpoints in .env.example — should use placeholders only
CORS misconfiguration — allow_origins=["*"] + allow_credentials=True violates the spec and will be rejected by browsers
Missing database transactions in several places (panel break, demo reset, seeding)
Duplicate Settings class defined in both app/__init__.py and app/config.py
Hardcoded ws://localhost:8000/ws in the frontend WebSocket hook
No webhook authentication on ingestion endpoints

The merge state remains UNSTABLE likely because no human reviewer has approved yet. Addressing the credential/security findings above would be the highest priority before merging. 🐇
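
The CORS finding above can be illustrated with a small framework-free sketch. The helper name `resolve_cors` is an assumption for illustration and is not the PR's code; it only encodes the rule that a wildcard origin must not be combined with credentials.

```python
from typing import Dict, List, Union


def resolve_cors(
    raw_origins: str, allow_credentials: bool
) -> Dict[str, Union[List[str], bool]]:
    """Turn a comma-separated origin list into CORS settings.

    Hypothetical helper: rejects the wildcard + credentials combination
    that browsers refuse to honour.
    """
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]
    if not origins:
        raise ValueError("at least one CORS origin must be configured")
    if "*" in origins and allow_credentials:
        raise ValueError("wildcard origin cannot be combined with credentials")
    return {"allow_origins": origins, "allow_credentials": allow_credentials}
```

In a FastAPI app the resulting dict could plausibly be passed as keyword arguments to `CORSMiddleware`, which accepts `allow_origins` and `allow_credentials`; where the origin string comes from (for example an environment variable) is left open here.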

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (1)

backend/app/agent/llm.py (1)

74-77: ⚡ Quick win

Rename ambiguous variable l to line.

The single-letter l is easily confused with 1 or I. Ruff flagged this as E741.

Proposed fix

     if content.startswith("```"):
         lines = content.split("\n")
-        lines = [l for l in lines if not l.strip().startswith("```")]
+        lines = [line for line in lines if not line.strip().startswith("```")]
         content = "\n".join(lines).strip()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/agent/llm.py` around lines 74 - 77, The list comprehension using
the single-letter variable `l` is ambiguous (Ruff E741); update the
comprehension in the block that strips triple-backtick code fences so that `l`
is renamed to `line` (i.e., change `lines = [l for l in lines if not
l.strip().startswith("```")]` to use `line`), leaving the surrounding logic (`if
content.startswith("```")`, `lines = content.split("\n")`, and `content =
"\n".join(lines).strip()`) unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/agent/healer.py`:
- Around line 31-34: The code currently reads active_query (via conn.fetchrow ->
active_query) before the long LLM/validation flow and then later runs the
promotion transaction which deactivates the prior active version and promotes
the repaired SQL; re-check the active version inside that promotion transaction
to avoid promoting a stale fix: within the same transactional block (the code
that deactivates and promotes), re-query ops.panel_query_versions for the
current active row (checking version_no) and compare it to the original
active_query.version_no, abort or raise/skip the promotion if they differ, or
base the deactivation/promotion on the freshly fetched row; update any use sites
referencing active_query.version_no in the transaction to use the re-fetched
value so you never deactivate/promote against an out-of-date base.

In `@backend/app/agent/llm.py`:
- Around line 39-64: The _call_with_failover function can raise a TypeError when
settings.azure_openai_deployments is empty because last_error stays None; add an
explicit guard at the start of _call_with_failover (after obtaining
settings/num_deployments) to detect num_deployments == 0 and raise a clear
exception (e.g., RuntimeError or ValueError) with a descriptive message about no
deployments configured; ensure the error mentions
settings.azure_openai_deployments so callers and logs can diagnose the
misconfiguration instead of hitting raise last_error later.

In `@backend/app/agent/rca.py`:
- Around line 32-39: The events query in conn.fetch inside rca.py only filters
with incident["started_at"], so later unrelated critical/warning events can leak
in; update the SQL in the conn.fetch call (the code that assigns events) to also
constrain event_ts <= incident["ended_at"] (or a safe fallback if ended_at is
null), and add incident["ended_at"] as the additional query parameter so the RCA
time window is bounded by both incident["started_at"] and incident["ended_at"]
when fetching events.

In `@backend/app/incidents/router.py`:
- Around line 65-67: The handler currently does "question = body.get('question',
'')" and only checks falsiness, allowing non-strings or whitespace-only strings
through; update the validation to first ensure the value is a string
(isinstance(question, str)), then trim it (question = question.strip()) and
raise HTTPException(status_code=422, detail="question is required") if the
trimmed value is empty or not a string before calling the investigation/dispatch
logic (i.e., around the variable question in this router function).

In `@backend/app/ingestion/router.py`:
- Line 29: The webhook handlers for /alerts, /metrics, and /events call
Request.json() directly and don't handle json.JSONDecodeError; add a helper
async function named _read_json(request: Request) that wraps await
request.json() in a try/except catching json.JSONDecodeError and raises
HTTPException(status_code=400, detail="Malformed JSON body"), then replace each
direct call payload = await request.json() in the alert/metric/event handler
functions with payload = await _read_json(request) so malformed JSON returns a
400 instead of an internal 500.

In `@backend/app/validation/executor.py`:
- Around line 21-33: The call site in healer.py that passes the fetched
broken_sql directly into explain_query() must first run the sql through the
existing SQL validation (sqlglot_gate / validate_sql); update the healer.py flow
where broken_sql is used (the call near the explain_query invocation) to call
validate_sql(broken_sql) or sqlglot_gate(broken_sql) and, if validation fails,
short-circuit and return or propagate an ExecutionResult-style failure (include
the validation error message) instead of calling explain_query; only call
explain_query(broken_sql) when validation passes.

In `@frontend/src/index.css`:
- Around line 394-395: Replace the deprecated CSS declaration `word-break:
break-word` (found alongside `white-space: pre-wrap`) with the
standards-compliant `overflow-wrap: anywhere`; remove `word-break: break-word`
and add `overflow-wrap: anywhere` in the same rule so long, unbreakable tokens
wrap correctly in preformatted blocks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8681230d-84e6-43e9-92b5-2f19c02555c1

📥 Commits

Reviewing files that changed from the base of the PR and between b418174 and 31ddf17.

📒 Files selected for processing (23)

backend/.env.example
backend/app/__init__.py
backend/app/agent/healer.py
backend/app/agent/llm.py
backend/app/agent/rca.py
backend/app/config.py
backend/app/db/seed.py
backend/app/demo/router.py
backend/app/demo/scenarios.py
backend/app/incidents/router.py
backend/app/ingestion/normalizer.py
backend/app/ingestion/router.py
backend/app/main.py
backend/app/panels/router.py
backend/app/validation/executor.py
backend/app/validation/sqlglot_gate.py
backend/app/ws/manager.py
frontend/src/components/IncidentBanner.jsx
frontend/src/components/PanelHealth.jsx
frontend/src/components/TimelineChart.jsx
frontend/src/components/TopologyGraph.jsx
frontend/src/hooks/useWebSocket.js
frontend/src/index.css

✅ Files skipped from review due to trivial changes (1)

backend/app/ws/manager.py

🚧 Files skipped from review as they are similar to previous changes (10)

frontend/src/hooks/useWebSocket.js
backend/app/config.py
frontend/src/components/IncidentBanner.jsx
frontend/src/components/PanelHealth.jsx
frontend/src/components/TimelineChart.jsx
frontend/src/components/TopologyGraph.jsx
backend/app/demo/router.py
backend/app/main.py
backend/app/ingestion/normalizer.py
backend/app/panels/router.py

- healer.py: re-check active version inside promotion transaction to prevent stale fix; validate broken_sql through sqlglot_gate before explain_query
- llm.py: guard against empty deployments list; rename l -> line (Ruff E741)
- rca.py: bound event query with resolved_at upper limit
- incidents/router.py: type-check and strip question input
- ingestion/router.py: add _read_json helper for 400 on malformed JSON
- index.css: replace deprecated word-break: break-word with overflow-wrap: anywhere
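
The rca.py item in this commit message bounds the event query on both sides. A minimal sketch of computing that window is shown below. It uses `resolved_at` as named in the commit message (the review comment referred to `ended_at`); the function name `event_window` and the fallback to the current time for unresolved incidents are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def event_window(
    started_at: datetime,
    resolved_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return (lower, upper) bounds for the RCA events query.

    Assumption: an unresolved incident is bounded by the current time.
    All datetimes are expected to be timezone-aware.
    """
    upper = resolved_at or now or datetime.now(timezone.utc)
    if upper < started_at:
        raise ValueError("window end precedes incident start")
    return started_at, upper
```

The two values would then be bound as query parameters, for example against a condition such as `event_ts >= $1 AND event_ts <= $2`.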

coderabbitai

🧹 Nitpick comments (4)

backend/app/ingestion/router.py (3)

5-5: 💤 Low value

Unused import: hashlib.

The hashlib module is imported but never used. hmac.compare_digest doesn't require it.

🧹 Proposed fix

 import json
 import hmac
-import hashlib
 from fastapi import APIRouter, Request, HTTPException
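
For context on why `hashlib` is not needed here: a constant-time key check can be done with `hmac.compare_digest` alone. The sketch below is an assumption about how such a check might look; the function name `api_key_matches` is hypothetical and how the key is read from the request is left out.

```python
import hmac
from typing import Optional


def api_key_matches(provided: Optional[str], expected: str) -> bool:
    """Constant-time comparison of a request's API key with the configured one."""
    if not provided or not expected:
        # Missing key on either side never authenticates.
        return False
    # compare_digest accepts bytes (or ASCII-only str); encode to allow any text.
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```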

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/ingestion/router.py` at line 5, Remove the unused import of
hashlib from the top of the file (it is currently imported but never used);
leave the existing usage of hmac.compare_digest intact and only delete the line
"import hashlib" so there are no unused imports in router.py.

26-31: ⚡ Quick win

Use raise ... from None to suppress exception chaining.

Per Ruff B904, raising a new exception inside an except block should use from to clarify the exception chain. Here, suppressing the original JSONDecodeError is appropriate since you're converting it to an HTTP response.

🧹 Proposed fix

 async def _read_json(request: Request) -> dict:
     """Parse JSON body, returning 400 for malformed payloads."""
     try:
         return await request.json()
     except json.JSONDecodeError:
-        raise HTTPException(status_code=400, detail="Malformed JSON body")
+        raise HTTPException(status_code=400, detail="Malformed JSON body") from None

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/ingestion/router.py` around lines 26 - 31, In _read_json, when
catching json.JSONDecodeError and converting it to an HTTPException you should
suppress exception chaining; replace the bare raise with raising HTTPException
from None (i.e., raise HTTPException(status_code=400, detail="Malformed JSON
body") from None) so the original JSONDecodeError is not chained into the HTTP
response; update the except block in function _read_json accordingly.

41-42: 💤 Low value

Simplify the await pattern.

The two-line pattern is awkward and appears to be a refactoring artifact. Combine into a single line for clarity.

🧹 Proposed fix

-    result = normalize_alert(pool, payload)
-    result = await result
+    result = await normalize_alert(pool, payload)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/ingestion/router.py` around lines 41 - 42, The code uses an
unnecessary two-step await: first assigning result = normalize_alert(pool,
payload) then awaiting it on the next line; replace both lines with a single
awaited call by assigning result = await normalize_alert(pool, payload) so the
coroutine is awaited immediately (update the occurrence where normalize_alert is
called and the variable result is set).

backend/app/agent/healer.py (1)

9-9: 💤 Low value

Unused import: extract_columns.

The extract_columns function is imported but never used in this module.

🧹 Proposed fix

-from app.validation.sqlglot_gate import validate_sql, extract_columns
+from app.validation.sqlglot_gate import validate_sql

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/agent/healer.py` at line 9, The import line in healer.py brings
in extract_columns from app.validation.sqlglot_gate but that symbol is unused;
remove extract_columns from the import so only validate_sql is imported (i.e.,
change the import statement referencing validate_sql, removing extract_columns)
to eliminate the unused import warning and keep imports minimal.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: dda5c78f-5321-4ee4-b11e-daa1b6120eb3

📥 Commits

Reviewing files that changed from the base of the PR and between 31ddf17 and e7de696.

📒 Files selected for processing (6)

backend/app/agent/healer.py
backend/app/agent/llm.py
backend/app/agent/rca.py
backend/app/incidents/router.py
backend/app/ingestion/router.py
frontend/src/index.css

🚧 Files skipped from review as they are similar to previous changes (3)

backend/app/incidents/router.py
backend/app/agent/llm.py
frontend/src/index.css

SaiHPE · 2026-05-17T09:58:40Z

@CodeRabbit I have made changes in the PR, please check now.

coderabbitai · 2026-05-17T09:59:10Z

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get latest PR status and recent commits
gh pr view 1 --repo SaiHPE/InsightSQL--Digithon --json title,state,mergeable,mergeStateStatus,reviewDecision,statusCheckRollup,reviews,isDraft,updatedAt,commits

echo "---COMMITS---"
gh pr view 1 --repo SaiHPE/InsightSQL--Digithon --json commits --jq '.commits[-5:] | .[] | {oid: .oid[0:8], message: .messageHeadline, authoredDate: .authoredDate}'

Length of output: 86003

@SaiHPE Great news — the PR is in much better shape now! Here's the updated status:

Item	Status
PR State	🟢 Open
Mergeable	✅ Yes
Merge State	✅ CLEAN (was Unstable before)
CodeRabbit Review	✅ SUCCESS
Human Reviews	None yet

Your two fix commits (fix: address all CodeRabbit PR review findings and fix: address second round of CodeRabbit review findings) resolved all the critical and major issues. The latest review pass only found minor nitpicks:

backend/app/ingestion/router.py: Unused hashlib import, raise ... from None for cleaner exception chaining, and a two-step await that can be simplified to one line.
backend/app/agent/healer.py: Unused extract_columns import.

These are low-priority and non-blocking. The PR is ready for a human reviewer to approve and merge! 🐇

#1 🔴 DemoControl: store poll interval in useRef, clear on unmount
#2 🔴 App: wrap triggerGlow in useCallback, add to all useEffect deps
#3 🟡 DemoControl: fetch /api/demo/status on mount to sync completed
#4 🟡 App: track lastGlowedRef to scroll to most-recent section
#5 🔵 router.py: split long HTTPException raise across lines
#6 🔵 DemoControl: restore talking point display in demo dock
#7 🔵 index.css: remove redundant scroll-behavior: smooth

Review fixes:
#1 metricsTimeline already in initialState (no change needed; reviewer error)
#2 retryCount now uses useState for reactive updates (not stale ref)
#3 EventLog adds 5s periodic tick to keep relative timestamps fresh
#4 /metrics-baseline query capped with LIMIT 5000
#5 Incident 4 → INC-004 (clean numbering: INC-001, INC-002, INC-004)
#6 KPI card key includes status → React remounts on transition, re-triggering animation
#7 Removed redundant '|| panels.length === 0' from PanelHealth render
#8 Impact line hidden when $0, shows 'calculating…' during first seconds

coderabbitai Bot reviewed May 17, 2026

Comment thread backend/app/agent/healer.py Outdated

Comment thread backend/app/validation/sqlglot_gate.py

coderabbitai Bot reviewed May 17, 2026

SaiHPE merged commit f7805d7 into main May 17, 2026
1 check passed

G Sai Roopesh and others added 2 commits May 17, 2026 13:49

feat: InsightSQL — full-stack SAP operations copilot

6282e02

Resolve README.md conflict

b418174

Sai-Roopesh mentioned this pull request May 19, 2026

fix: 19-issue demo audit — story fixes, baseline loading, polish #9

Merged

coderabbitai Bot mentioned this pull request May 19, 2026

feat: Refactor dashboard and incident UI #10

Merged
