RFC: Structured Machine Summary for Evaluation Reports #456
syahmiharith
started this conversation in
RFC
RFC: Structured Machine Summary for Evaluation Reports
References
1. Problem
Career-Ops currently produces detailed markdown evaluation reports, but downstream tools need to extract structured insights from those reports.
The current approach relies on markdown structure, headings, regex patterns, and natural-language text. That works for human reading, but it is fragile for automation. Small wording or formatting changes can break pattern analysis, calibration, dashboards, or future reporting tools.
This affects features such as:
- `analyze-patterns.mjs`

The user-facing report should remain readable, but the system also needs a stable machine-readable contract inside each saved evaluation report.
2. Proposed Solution
Add a standardized fenced YAML `## Machine Summary` block to every evaluation report generated by interactive and batch evaluation modes.

The block should summarize the final evaluation in a strict, machine-readable shape while preserving the existing human-readable markdown report.
Example:
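As a rough illustration only, the sketch below shows what a report carrying such a block might look like and how a parser could locate it. The field names `hard_stops`, `soft_gaps`, `top_strengths`, `risk_level`, `confidence`, and `final_decision` appear in this RFC; the `score` field, all values, the report layout, and the helper name `extractMachineSummaryBlock` are assumptions made for the example, not the canonical schema.

```javascript
// Three backticks, built at runtime so this example does not nest code fences.
const FENCE = '`'.repeat(3);

// Hypothetical report excerpt: field names follow the RFC, values are invented.
const sampleReport = [
  '# Evaluation: Example Role',
  '',
  'Human-readable analysis goes here.',
  '',
  '## Machine Summary',
  '',
  FENCE + 'yaml',
  'final_decision: apply',
  'score: 4',
  'risk_level: low',
  'confidence: high',
  'hard_stops: []',
  'soft_gaps:',
  '  - Limited on-call experience',
  'top_strengths:',
  '  - Strong TypeScript background',
  FENCE,
  '',
].join('\n');

// Return the raw YAML text under "## Machine Summary", or null if absent.
function extractMachineSummaryBlock(markdown) {
  const pattern = new RegExp(
    '^## Machine Summary[ \\t]*\\r?\\n' + // the heading line
    '(?:[ \\t]*\\r?\\n)*' +               // optional blank lines
    FENCE + 'ya?ml[ \\t]*\\r?\\n' +       // opening fence tagged yaml or yml
    '([\\s\\S]*?)' +                      // block body, matched lazily
    '\\r?\\n' + FENCE + '[ \\t]*$',       // closing fence on its own line
    'm'
  );
  const match = pattern.exec(markdown);
  return match ? match[1] : null;
}

console.log(extractMachineSummaryBlock(sampleReport));
```

The extracted text would still need to go through a YAML parser and schema validation; this sketch only covers locating the block.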
The parser should:
- … `Machine Summary` block when available.
- … `hard_stops`, `soft_gaps`, and `top_strengths`.

3. Files Affected
Expected system-layer files:
- `modes/oferta.md`: … `Machine Summary` section to interactive evaluation output.
- `modes/batch.md`
- `batch/batch-prompt.md`
- `analyze-patterns.mjs`
- `package.json`
- `test-all.mjs`
- `docs/SCRIPTS.md`
- `CLAUDE.md`: … `Machine Summary` section.

Potentially affected but not required:
- `DATA_CONTRACT.md`: … `reports/*` format explicitly documented as containing both human-readable markdown and a structured machine-readable summary.

4. Data Contract Impact
This change does not introduce a new user-owned file.
It modifies the generated content format of `reports/*`, which are already part of the user layer. The new `Machine Summary` block mirrors information already present in the human-readable evaluation report.

No update process should modify existing user reports. Existing reports without the block should remain valid through legacy fallback parsing.
System-layer files are updated to produce and parse the new block going forward.
Data Contract summary:
- `reports/*`

5. Phases
Phase 1 — Define the Schema
Create the canonical `Machine Summary` schema and document:

Acceptance criteria:
Phase 2 — Emit Machine Summary in Evaluation Reports
Update evaluation prompts so generated reports include the structured block.
Acceptance criteria:
- `modes/oferta.md` emits the block.
- `modes/batch.md` emits the block.
- `batch/batch-prompt.md` emits the same block.

Phase 3 — Parse Structured Summaries
Update `analyze-patterns.mjs` to prefer the structured block.

Acceptance criteria:
- … `0` scores instead of treating them as falsy.

Phase 4 — Tests and Documentation
Add parser self-tests and script documentation.
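A parser self-test could be as small as the sketch below, which also exercises the Phase 3 requirement that `0` scores are not treated as falsy. The `--self-test` flag is named in this RFC; the helpers `pickScore` and `runSelfTest` and the test cases are hypothetical and are not taken from `analyze-patterns.mjs`.

```javascript
// Hypothetical helper: choose the structured score when one exists,
// otherwise the legacy score, otherwise null.
function pickScore(structuredScore, legacyScore) {
  // `??` only skips null and undefined, so a structured score of 0 is kept.
  // Writing `structuredScore || legacyScore` would wrongly discard 0.
  return structuredScore ?? legacyScore ?? null;
}

// Run a few table-driven checks and report whether all of them passed.
function runSelfTest() {
  const cases = [
    { name: 'zero score is preserved', actual: pickScore(0, 4.2), expected: 0 },
    { name: 'missing structured score falls back', actual: pickScore(undefined, 4.2), expected: 4.2 },
    { name: 'no score at all yields null', actual: pickScore(undefined, undefined), expected: null },
  ];
  const failures = cases.filter((c) => c.actual !== c.expected);
  for (const failure of failures) {
    console.error(`FAIL: ${failure.name}`);
  }
  return failures.length === 0;
}

// Wire the checks to the --self-test flag and signal failure via the exit code.
if (process.argv.includes('--self-test')) {
  process.exitCode = runSelfTest() ? 0 : 1;
}
```

Setting `process.exitCode` rather than exiting immediately is one way to let a wrapper such as `test-all.mjs` detect failures from the exit status.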
Acceptance criteria:
- `node --check analyze-patterns.mjs` passes.
- `node analyze-patterns.mjs --self-test` passes.
- `node test-all.mjs --quick` includes the parser self-test.
- `docs/SCRIPTS.md` documents the pattern analysis command and flags.
- `CLAUDE.md` references the structured report block where appropriate.

Phase 5 — Follow-Up Improvements
After the base schema is stable, consider follow-up PRs for:
These should be separate PRs unless maintainers prefer bundling them under this RFC.
6. Non-Goals
This RFC does not propose:
- … `reports/*`

7. Backward Compatibility
Existing reports should continue to work.
The parser should follow this priority:
- … `Machine Summary` if present and valid.

This allows users with older reports to keep using pattern analysis while newer reports become more reliable for automation.
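The priority order could be sketched roughly as follows. Everything here is an assumption for illustration: the helper names, the flat `key: value` reader (a real parser would likely use a YAML library), the `score` field, and the legacy `**Score:** 3.5/5` line format. The sketch also falls back silently when the block is invalid, which is only one of the behaviours this RFC leaves open.

```javascript
const FENCE = '`'.repeat(3); // three backticks, built at runtime

// Step 1: read the structured block. Returns null when missing or invalid.
function parseStructured(markdown) {
  const pattern = new RegExp(
    '^## Machine Summary\\s*\\n' + FENCE + 'yaml\\s*\\n' +
    '([\\s\\S]*?)\\n' + FENCE + '\\s*$',
    'm'
  );
  const match = pattern.exec(markdown);
  if (!match) return null;

  // Simplification: only flat `key: value` lines are read here.
  const fields = {};
  for (const line of match[1].split('\n')) {
    const pair = /^([a-z_]+):\s*(.+)$/.exec(line.trim());
    if (pair) fields[pair[1]] = pair[2];
  }

  // Valid only when the score is a finite number; 0 passes this check.
  const score = Number(fields.score);
  if (fields.score === undefined || !Number.isFinite(score)) return null;
  return {
    source: 'machine-summary',
    score,
    finalDecision: fields.final_decision ?? null,
  };
}

// Step 2: hypothetical legacy format, a line such as "**Score:** 3.5/5".
function parseLegacy(markdown) {
  const match = /\*\*Score:\*\*\s*([0-9]+(?:\.[0-9]+)?)\s*\/\s*5/.exec(markdown);
  if (!match) return null;
  return { source: 'legacy-markdown', score: Number(match[1]), finalDecision: null };
}

// Structured block first, legacy markdown second, null if neither applies.
function summarizeReport(markdown) {
  return parseStructured(markdown) ?? parseLegacy(markdown);
}
```

Tagging each result with its `source` would let downstream tools report how many summaries came from the structured block versus the legacy path.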
8. Risks and Mitigations
Risk: LLM-generated YAML may be inconsistent
Mitigation:
Risk: Parser accepts malformed data
Mitigation:
Risk: Machine Summary duplicates human-readable content
Mitigation:
Risk: Data Contract confusion
Mitigation:
- `reports/*` remain user-layer files.

9. Open Questions
- Should `Machine Summary` include a `schema_version` field now, or wait until the schema changes?
- … `risk_level`, `confidence`, and `final_decision`?
- … `Machine Summary` blocks fail loudly, warn, or silently fall back to legacy markdown parsing?
- Should `DATA_CONTRACT.md` explicitly mention that `reports/*` may contain machine-readable metadata?

10. Suggested PR Breakdown
- … `Machine Summary` with fallback.

PR #444 already combines much of PR 1-3 into one implementation. If maintainers prefer smaller review units, the same work can be split according to the phases above.
Links