diff --git a/.agents/skills/impeccable/reference/codex.md b/.agents/skills/impeccable/reference/codex.md index 2ef1db5b9..0b35f3e34 100644 --- a/.agents/skills/impeccable/reference/codex.md +++ b/.agents/skills/impeccable/reference/codex.md @@ -4,16 +4,29 @@ This file is loaded by `$impeccable craft` when the harness has native image gen Read this *before* generating any images. The order matters, and the per-step user pauses are what keep generated imagery from drifting away from the brief. +### Four stop points before code + +Steps A through D each end with the user. Do not advance past any of them on your own read of the situation. + +1. **STOP after Step A questions.** Wait for answers. +2. **STOP after Step B palette generation.** Wait for "confirm palette." +3. **STOP after Step C mocks.** Wait for direction approval or delegation. +4. **Only after Step D approves a direction** do you return to craft.md Step 4 and write code. + +Prior shape approval does **not** satisfy any of these. Shape's "confirm or override" advances you into Step A; it is not a substitute for it. + ## Step A: Explore Directions with the User Before generating anything, run a brief direction conversation grounded in the shape brief. +**Step A is required even when shape just produced a confirmed brief.** The shape questions and Step A questions cover different ground: shape pins purpose, content, scope; Step A pins palette, atmosphere, and named visual references for the comps you're about to generate. The only time you can skip Step A is when the user has already answered these exact palette/atmosphere/reference questions in the same session. + Ask **2-3 targeted questions** about visual lane, color strategy, atmosphere, and named anchor references. Don't enumerate generic menus; tie each question to the shape brief's answers. Example shape-grounded questions: - "Brief says 'editorial restraint, Klim-adjacent.' 
Are we closer to a quiet specimen page or a magazine-spread feel with hero imagery?" - "Palette strategy from shape was 'Committed.' Want it warm-grounded (deep oxblood + cream) or cool-grounded (slate + paper white)?" -Stop and wait for answers. These pin the palette before any pixel gets generated. +**STOP and wait for answers.** These pin the palette before any pixel gets generated. Do not proceed to Step B until the user has responded. ## Step B: Generate the Brand Palette First @@ -23,7 +36,7 @@ Why palette first: mocks generated against a vague color sense produce noise tha Show the palette to the user. Ask one question: "This is the palette I'm locking in for the mocks. Confirm, or call out what to shift?" -Wait for confirmation. Do not generate mocks against an unconfirmed palette. +**STOP and wait for confirmation.** Do not generate mocks against an unconfirmed palette. "Probably good enough" is the wrong call here; the palette is the contract for everything downstream. ## Step C: Generate 1-3 Visual Mocks Against the Palette @@ -39,7 +52,7 @@ Use the `image_gen` tool directly (or via the imagegen skill when available). Do Show the comps. Ask what carries forward. Iterate until **one direction is approved** or the user explicitly delegates. -If the user delegates, pick the strongest direction and explain it from the brief, not personal taste. +**STOP and wait for the approval or the delegation.** Do not begin Step E or return to craft.md Step 4 until a single direction is named. If the user delegates, pick the strongest direction and explain it from the brief, not personal taste. Before moving to assets, summarize what to carry into code and what *not* to literalize from the mock. This is the handoff between visual exploration and semantic implementation. 
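Review note on the codex.md hunks above: the four stop points read as a strictly ordered gate sequence in which a later confirmation never makes up for an earlier one that is missing. A minimal sketch of that reading, not part of this patch; the signal names and the `firstOpenGate` / `mayWriteCode` helpers are hypothetical and exist only for illustration.

```javascript
// Hypothetical sketch: field and function names are illustrative, not from the skill.
// Each gate names the user signal that must arrive before the flow may advance.
const GATES = [
  { step: "A", signal: "answersReceived" },                // direction questions answered
  { step: "B", signal: "paletteConfirmed" },               // "confirm palette"
  { step: "C/D", signal: "directionApprovedOrDelegated" }, // one direction named
];

// Returns the first gate still waiting on the user, or null when every gate is closed.
// Shape approval is deliberately not an input: per the patch, it satisfies none of these.
function firstOpenGate(signals) {
  for (const gate of GATES) {
    if (signals[gate.signal] !== true) return gate.step;
  }
  return null;
}

// Code (craft.md Step 4) is allowed only when no gate is still open.
function mayWriteCode(signals) {
  return firstOpenGate(signals) === null;
}

// Usage: a confirmed palette does not help if Step A was skipped.
console.log(firstOpenGate({ paletteConfirmed: true }));
```

Walking the gates in order, rather than counting how many signals are true, is what prevents the "compressed gates" failure the patch describes: an out-of-order confirmation still reports the earliest open gate.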
diff --git a/.agents/skills/impeccable/reference/craft.md b/.agents/skills/impeccable/reference/craft.md index 5ab482412..831b08ab9 100644 --- a/.agents/skills/impeccable/reference/craft.md +++ b/.agents/skills/impeccable/reference/craft.md @@ -6,6 +6,19 @@ Before writing code, you need: PRODUCT.md loaded, register identified and the ma Treat any approved visual direction (generated mock or stated reference) as a concrete contract for composition, hierarchy, density, atmosphere, signature motifs, and distinctive visual moves. Don't let mocks replace structure, copy, accessibility, or state design. But if the live result lacks the approved direction's major ingredients, the implementation is wrong. +### Gates: do not compress + +Craft has **multiple user gates**, not one. When the harness has native image generation (Codex via `image_gen`), the gate sequence before code is: + +1. **Shape brief confirmed** (Step 1) +2. **Direction questions answered** (codex.md Step A) +3. **Palette confirmed** (codex.md Step B) +4. **One mock direction approved or delegated** (codex.md Step D) + +You must stop at every gate. **Shape confirmation alone is NOT a green light to start coding.** It is the green light to begin codex.md Step A. Compressing gates 2 through 4 because the shape brief felt complete is the dominant failure mode of this flow. + +When the harness lacks native image generation, gates 2-4 collapse into the brief itself, and shape confirmation does advance straight to code. + ## Step 0: Project Foundation Before shape, before code: figure out what kind of project you're working in. @@ -37,6 +50,8 @@ If the user already supplied a confirmed brief or ran shape separately, use it a When the original prompt + PRODUCT.md already answer scope, content, and visual direction with no real ambiguity, the shape output can be **compact** (3-5 bullets stating what you're building and the visual lane, ending with one or two specific questions or "confirm or override"). 
The full 10-section structured brief is reserved for genuinely ambiguous, multi-screen, or stakeholder-heavy tasks. Don't pad a clear brief into a long one to look thorough; equally, don't skip the pause to look efficient. +If the harness has native image generation (Codex), a compact shape's "confirm or override" advances to **Step 3 and the codex.md flow**, not to Step 4. Phrase the closing line accordingly: "Confirm or override; once we lock direction, I'll run a couple of palette and reference questions before generating any mocks." This stops the model from reading shape confirmation as code-green. + ## Step 2: Load References Based on the design brief's "Recommended References" section, consult the relevant impeccable reference files. At minimum, always consult: @@ -61,6 +76,8 @@ Whether you generated mocks or not: don't replace required imagery with generic ## Step 4: Build to Production Quality +**Precondition.** If Step 3 routed you to codex.md (native image generation available), Steps A through D in that file must be complete before any code: questions answered, palette confirmed, mocks generated, one direction approved or delegated. **Do not mention implementation, file paths, or patch plans until that's done.** A confirmed shape brief is not enough; the model that compressed those gates is the model that already failed this flow. + Implement the feature following the design brief. Build in passes so structure, visual system, states, motion/media, and responsive behavior each get deliberate attention. The list below is the definition of done, not inspiration. ### Production bar diff --git a/.agents/skills/impeccable/reference/critique.md b/.agents/skills/impeccable/reference/critique.md index 9c33472e7..4e84fcb55 100644 --- a/.agents/skills/impeccable/reference/critique.md +++ b/.agents/skills/impeccable/reference/critique.md @@ -1,109 +1,96 @@ -> **Additional context needed**: what the interface is trying to accomplish. 
+### Purpose -### Setup: Resolve Target and Load Ignore List +Resolve one stable target, run two independent assessments, synthesize a design critique, persist a snapshot, and ask the user what to improve next. The chat response is the primary deliverable; the snapshot is an archive/backlog for future commands. -Before gathering assessments, do two small bookkeeping steps. They cost almost nothing and they're what makes critique iterative across runs. +### Hard Invariants -1. **Resolve the primary artifact.** The user's phrasing ("the homepage", "the pricing flow") is not stable enough to track across runs. Resolve it to a concrete file path or URL: the same one you'd already need to scan code or open in a browser. Examples: - - "the homepage" → `site/pages/index.astro` (or `http://localhost:3000/` if you're inspecting live) - - "the settings modal" → the primary component file (e.g., `src/components/Settings.tsx`) - - "this page" → the URL or the page's source file - Prefer the source file path over the dev-server URL when both exist; ports drift between runs (`bun dev` vs `bun preview`), file paths don't. +- Assessment A (design review) and Assessment B (detector/browser evidence) are both required. +- Assessment A must finish before detector findings enter the parent synthesis context. Detector output is deterministic, but it still anchors judgment. +- If sub-agents are unavailable, fall back sequentially: finish and record Assessment A first, then run Assessment B, then synthesize. +- A skipped detector is a failed critique run unless `detect.mjs` is missing or crashes after a real attempt. +- Viewable targets require browser inspection when available. +- Any local server started only for critique visualization must run in the background, have a recorded stop method, and be stopped before final reporting unless the user asks to keep it. +- Do not claim a user-visible overlay exists unless script injection succeeded and the detector ran in the page. -2. 
**Compute the slug.** Run: +### Setup + +1. **Resolve the target** to a concrete file path or URL. Prefer a source path over a dev-server URL when both identify the same surface; ports drift, paths do not. + - "the homepage" -> `site/pages/index.astro` or `index.html` + - "the settings modal" -> the primary component file + - "this page" -> the current URL or source file +2. **Compute the slug**: ```bash node .agents/skills/impeccable/scripts/critique-storage.mjs slug "" ``` - Keep the printed slug. It identifies this target's stream across runs. If the command exits non-zero ("no stable slug for input"), skip persistence for this run and tell the user; the trend won't update but the critique still goes ahead. - -3. **Read the ignore list** at `.impeccable/critique/ignore.md` if it exists. Plain markdown; each non-empty, non-comment line is something the user has marked as "do not re-raise" (deferred tradeoffs, designer-intended deviations, detector false-positives the user accepts). When a finding's text matches a line here (case-insensitive substring against rule name or snippet), **drop it silently**. Do not mention it in the report. This is the ONLY input critique consumes from prior runs; anchoring on prior findings would defeat the point of independent assessment. - -### Gather Assessments + Keep it. If the command exits non-zero, skip persistence and trend for this run, but continue the critique. +3. **Read `.impeccable/critique/ignore.md`** if it exists. Drop matching findings silently; it is the only prior-run input critique consumes. -Launch two independent assessments. **Neither may see the other's output.** This isolation is what makes the combined score honest. Running both in one head silently anchors them to each other; do not shortcut it for cost, speed, or context-size reasons. +### Assessment Orchestration -Delegate each assessment to a separate sub-agent (Claude Code's `Agent` tool, Codex's subagent spawning, etc.). 
Each returns structured findings as text. Do NOT output findings to the user yet. +Delegate Assessment A and Assessment B to separate sub-agents when possible. They must not see each other's output. Do not show findings to the user until synthesis. -Fall back to sequential in-head work only if the environment genuinely cannot spawn sub-agents. +Codex sub-agent gate: +- If `spawn_agent` is exposed and the user explicitly allowed sub-agents, delegation, or parallel agent work, spawn A and B immediately. +- If `spawn_agent` is exposed but the user did not explicitly allow sub-agents, ask exactly once: "Impeccable critique is designed to run two independent sub-agents for an unanchored assessment. May I use sub-agents for this critique?" Then stop until the user answers. +- If allowed, spawn A and B. If declined, run sequentially and report `Assessment independence: degraded (sub-agents declined by user)`. +- If `spawn_agent` is not exposed, do not ask; run sequentially and report `Assessment independence: degraded (spawn_agent unavailable in this session)`. +- If spawning fails after permission, run sequentially and report `Assessment independence: degraded (sub-agent spawn failed: )`. +Prefer `fork_context: false` with self-contained prompts containing cwd, target, live URL, references, product context, and output contract. If using `fork_context: true`, omit `agent_type`, `model`, and `reasoning_effort`. -**Tab isolation**: When browser automation is available, each assessment MUST create its own new tab. Never reuse an existing tab, even if one is already open at the correct URL. This prevents the two assessments from interfering with each other's page state. - -#### Assessment A: LLM Design Review - -Read the relevant source files (HTML, CSS, JS/TS) and, if browser automation is available, visually inspect the live page. **Create a new tab** for this; do not reuse existing tabs. 
After navigation, label the tab by setting the document title: -```javascript -document.title = '[LLM] ' + document.title; -``` -Think like a design director. Evaluate: +If browser automation is available, each assessment creates its own new tab. Never reuse an existing tab, even if it is already at the right URL. -**AI Slop Detection (CRITICAL)**: Does this look like every other AI-generated interface? Review against ALL **DON'T** guidelines from the parent impeccable skill (already loaded in this context). Check for AI color palette, gradient text, dark glows, glassmorphism, hero metric layouts, identical card grids, generic fonts, and all other tells. **The test**: If someone said "AI made this," would you believe them immediately? +### Assessment A: Design Review -**Holistic Design Review**: visual hierarchy (eye flow, primary action clarity), information architecture (structure, grouping, cognitive load), emotional resonance (does it match brand and audience?), discoverability (are interactive elements obvious?), composition (balance, whitespace, rhythm), typography (hierarchy, readability, font choices), color (purposeful use, cohesion, accessibility), states & edge cases (empty, loading, error, success), microcopy (clarity, tone, helpfulness). +Read relevant source files and visually inspect the live page when browser automation is available. Think like a design director. -**Cognitive Load** (consult [cognitive-load](cognitive-load.md)): -- Run the 8-item cognitive load checklist. Report failure count: 0-1 = low (good), 2-3 = moderate, 4+ = critical. -- Count visible options at each decision point. If >4, flag it. -- Check for progressive disclosure: is complexity revealed only when needed? +Evaluate: +- **AI slop**: Would someone believe "AI made this" immediately? Check all DON'T guidance from the parent Impeccable skill. 
+- **Holistic design**: hierarchy, IA, emotional fit, discoverability, composition, typography, color, accessibility, states, copy, and edge cases. +- **Cognitive load**: consult [cognitive-load](cognitive-load.md); report checklist failures and decision points with >4 visible options. +- **Emotional journey**: peak-end rule, emotional valleys, reassurance at high-stakes moments. +- **Nielsen heuristics**: consult [heuristics-scoring](heuristics-scoring.md); score all 10 heuristics 0-4. -**Emotional Journey**: -- What emotion does this interface evoke? Is that intentional? -- **Peak-end rule**: Is the most intense moment positive? Does the experience end well? -- **Emotional valleys**: Check for anxiety spikes at high-stakes moments (payment, delete, commit). Are there design interventions (progress indicators, reassurance copy, undo options)? +Return: AI slop verdict, heuristic scores, cognitive load, emotional journey, 2-3 strengths, 3-5 priority issues, persona red flags, minor observations, and provocative questions. -**Nielsen's Heuristics** (consult [heuristics-scoring](heuristics-scoring.md)): -Score each of the 10 heuristics 0-4. This scoring will be presented in the report. +### Assessment B: Detector + Browser Evidence -Return structured findings covering: AI slop verdict, heuristic scores, cognitive load assessment, what's working (2-3 items), priority issues (3-5 with what/why/fix), minor observations, and provocative questions. +Run the bundled detector and browser visualization evidence. Assessment B is mandatory and must remain isolated from Assessment A until both are complete. -#### Assessment B: Automated Detection - -Run the bundled deterministic detector, which flags 27 specific patterns (AI slop tells + general design quality). 
- -**CLI scan**: +CLI scan: ```bash -npx impeccable detect --json [--fast] [target] +node .agents/skills/impeccable/scripts/detect.mjs --json [--fast] [target] ``` -- Pass HTML/JSX/TSX/Vue/Svelte files or directories as `[target]` (anything with markup). Do not pass CSS-only files. -- For URLs, skip the CLI scan (it requires Puppeteer). Use browser visualization instead. -- For large directories (200+ scannable files), use `--fast` (regex-only, skips jsdom) -- For 500+ files, narrow scope or ask the user -- Exit code 0 = clean, 2 = findings +- Pass markup files/directories as `[target]`; do not pass CSS-only files. +- For URLs, skip CLI scan and use browser visualization. +- For 200+ scannable files, use `--fast`; for 500+, narrow scope or ask. +- Exit code 0 = clean; 2 = findings. +- If the detector entrypoint is missing or fails to load, report deterministic scan unavailable and continue with browser/manual review. -**Browser visualization**: **required** when browser automation tools are available AND the target is a viewable page. The `[Human]` overlay tab is the user-facing deliverable; the critique is incomplete without it. Skip only if the target is not a viewable page (CSS-only file, non-browser target). +Browser visualization is required for a viewable target when browser automation is available. Use a localhost dev/static URL for local files; avoid `file://` unless the available browser explicitly supports this workflow. Overlay flow: -The overlay is a **visual aid for the user**. It highlights issues directly in their browser. Do NOT scroll through the page to screenshot overlays. Instead, read the console output to get the results programmatically. +1. Create a fresh tab and navigate. +2. 
Preflight mutable injection by setting `document.title` and appending a ` + * Re-scan: window.impeccableScan() + */ +(function () { +if (typeof window === 'undefined') return; +// --- cli/engine/shared/constants.mjs --- +// ─── Section 1: Constants ─────────────────────────────────────────────────── + +const SAFE_TAGS = new Set([ + 'blockquote', 'nav', 'a', 'input', 'textarea', 'select', + 'pre', 'code', 'span', 'th', 'td', 'tr', 'li', 'label', + 'button', 'hr', 'html', 'head', 'body', 'script', 'style', + 'link', 'meta', 'title', 'br', 'img', 'svg', 'path', 'circle', + 'rect', 'line', 'polyline', 'polygon', 'g', 'defs', 'use', +]); + +// Per-check safe-tags override for the border (side-tab / border-accent) +// rule. We intentionally re-allow