fix(citationKey): resolve old-CLI field aliases in getCitationKey #419
Conversation
LLM output arriving via the old CLI path may carry legacy field names (fullPhrase/anchorText instead of sourceContext/sourceMatch) without going through parseCitation normalization. This caused getCitationKey to hash those fields as empty strings, producing collisions for citations that should be distinct.

Route sourceContext and sourceMatch through resolveField so all known alias variants hash identically to the canonical names.

Also hoist the single rawCitation cast to the top of the function, eliminating the duplicate as-unknown-as cast that previously existed for startPageId.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
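The effect described in this commit message can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the alias table, the signature of `resolveField`, the `CitationLike` type, and the helper names `naiveKey` and `resolvedKey` are hypothetical, and a toy FNV-1a hash stands in for whatever hashing the real `getCitationKey` performs. Only the field names `fullPhrase`, `anchorText`, and `source_context` come from this PR.

```typescript
// Hypothetical alias table: canonical field name -> legacy spellings.
// fullPhrase, anchorText and source_context are named in the PR;
// source_match is an assumed snake_case counterpart.
const FIELD_ALIASES: Record<string, readonly string[]> = {
  sourceContext: ["fullPhrase", "source_context"],
  sourceMatch: ["anchorText", "source_match"],
};

// Hypothetical stand-in for the project's resolveField: returns the first
// string found under the canonical name or any of its aliases, else "".
function resolveField(raw: Record<string, unknown>, canonical: string): string {
  for (const name of [canonical, ...(FIELD_ALIASES[canonical] ?? [])]) {
    const value = raw[name];
    if (typeof value === "string") return value;
  }
  return "";
}

// Toy 32-bit FNV-1a hash, used only to keep the sketch dependency-free.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Minimal citation shape for the sketch (not the project's real type).
interface CitationLike {
  type: string;
  sourceContext?: string;
  sourceMatch?: string;
}

// Pre-fix behaviour (sketch): direct property access, so legacy field
// names are invisible and both fields hash as empty strings.
function naiveKey(citation: CitationLike): string {
  return fnv1a([citation.type, citation.sourceContext ?? "", citation.sourceMatch ?? ""].join("|"));
}

// Post-fix behaviour (sketch): one cast at the top of the function, then
// every aliased field is read through resolveField.
function resolvedKey(citation: CitationLike): string {
  const raw = citation as unknown as Record<string, unknown>;
  return fnv1a([citation.type, resolveField(raw, "sourceContext"), resolveField(raw, "sourceMatch")].join("|"));
}

const legacyA = { type: "document", fullPhrase: "Revenue grew 45%", anchorText: "45%" } as unknown as CitationLike;
const legacyB = { type: "document", fullPhrase: "Costs fell 10%", anchorText: "10%" } as unknown as CitationLike;
const canonicalA: CitationLike = { type: "document", sourceContext: "Revenue grew 45%", sourceMatch: "45%" };

// Two distinct legacy citations collide when fields are read directly.
console.log(naiveKey(legacyA) === naiveKey(legacyB));
// With alias resolution they get distinct keys, and a legacy citation
// matches its canonical equivalent.
console.log(resolvedKey(legacyA) === resolvedKey(legacyB));
console.log(resolvedKey(legacyA) === resolvedKey(canonicalA));
```

In this sketch the naive key sees only `type` for both legacy citations, which is the collision the commit describes. Reading through `resolveField` restores the distinction without requiring callers to normalize first.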
✅ Playwright Test Report
Status: Tests passed
📊 Download Report & Snapshots (see Artifacts section)
Run ID: 24308142691
Code Review
Overall: This is a well-targeted, minimal fix for a real bug. The logic is correct and the refactor is clean. A few things worth addressing before merge:
What's Good
Issues
1. Missing test coverage for the bug being fixed (important)
The PR description says to verify that old-CLI field names produce the same key as canonical names, but
Suggested addition to
```typescript
describe("getCitationKey alias resolution", () => {
  it("old-CLI fullPhrase/anchorText aliases hash identically to canonical names", () => {
    const canonical: DocumentCitation = {
      type: "document",
      sourceContext: "Revenue grew 45% year-over-year to $2.3B",
      sourceMatch: "$2.3B",
      pageNumber: 2,
      lineIds: [20],
    };
    // Simulate an old-CLI citation with legacy field names
    const legacyCli = {
      type: "document",
      fullPhrase: "Revenue grew 45% year-over-year to $2.3B",
      anchorText: "$2.3B",
      pageNumber: 2,
      lineIds: [20],
    } as unknown as DocumentCitation;
    expect(getCitationKey(legacyCli)).toBe(getCitationKey(canonical));
  });
  it("old-CLI citations no longer hash as empty string (regression guard)", () => {
    const legacyCli = {
      type: "document",
      fullPhrase: "Some context",
      anchorText: "context",
    } as unknown as DocumentCitation;
    const emptyKey = getCitationKey({ type: "document" } as DocumentCitation);
    expect(getCitationKey(legacyCli)).not.toBe(emptyKey);
  });
});
```
2.
Minor Notes
Summary: The core fix is correct. Before merging, please add the alias-variant test case so the bug can't regress silently. The
…getVerificationKey scope

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Summary
- `getCitationKey` was accessing `citation.sourceContext` and `citation.sourceMatch` directly, bypassing alias resolution. Citations arriving via the old CLI path with legacy field names (`fullPhrase`, `anchorText`, `source_context`, etc.) were hashing those fields as empty strings, causing key collisions.
- Route `sourceContext` and `sourceMatch` through `resolveField` so all known alias variants hash identically to their canonical names, consistent with the existing `startPageId` alias handling already in the same function.
- Hoist the single `rawCitation` cast to the top of the function, eliminating a duplicate `as unknown as Record<string, unknown>` cast that was previously inlined for `startPageId`.

Test plan
- `bun run test`: all 315 unit tests pass
- `getCitationKey({ type: "document", fullPhrase: "x", anchorText: "y" } as Citation)` produces the same key as `getCitationKey({ type: "document", sourceContext: "x", sourceMatch: "y" } as Citation)`
- … `citationKeyStability.test.ts`)