
fix(citationKey): resolve old-CLI field aliases in getCitationKey #419

Merged
bensonwong merged 2 commits into main from fix/citation-key-alias-resolution on Apr 12, 2026

@bensonwong (Collaborator)

Summary

  • getCitationKey was accessing citation.sourceContext and citation.sourceMatch directly, bypassing alias resolution. Citations arriving via the old CLI path with legacy field names (fullPhrase, anchorText, source_context, etc.) were hashing those fields as empty strings, causing key collisions.
  • Routes sourceContext and sourceMatch through resolveField so all known alias variants hash identically to their canonical names — consistent with the existing startPageId alias handling already in the same function.
  • Hoists the single rawCitation cast to the top of the function, eliminating a duplicate as unknown as Record<string, unknown> cast that was previously inlined for startPageId.
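
A minimal sketch of the before/after behavior, written under assumptions rather than copied from the project: the simplified Citation interface, the FIELD_ALIAS_MAP contents (only the aliases named in this PR), and the resolveField signature are guesses, keyBeforeFix and keyAfterFix are hypothetical names, and JSON.stringify of the key inputs stands in for whatever hash getCitationKey actually computes.

```typescript
// Assumed, simplified Citation shape; the project's real type has more fields.
interface Citation {
  type: string;
  sourceContext?: string;
  sourceMatch?: string;
}

// Hypothetical alias table: canonical name -> legacy names, checked in order.
// Only the aliases mentioned in this PR are listed; the real FIELD_ALIAS_MAP may hold more.
const FIELD_ALIAS_MAP: Record<string, readonly string[]> = {
  sourceContext: ["fullPhrase", "source_context"],
  sourceMatch: ["anchorText"],
};

// Assumed resolveField behavior: canonical name first, then each alias; undefined if none is set.
function resolveField(raw: Record<string, unknown>, canonical: string): unknown {
  for (const name of [canonical, ...(FIELD_ALIAS_MAP[canonical] ?? [])]) {
    if (raw[name] !== undefined) return raw[name];
  }
  return undefined;
}

// Pre-fix shape (hypothetical): canonical properties are read directly,
// so a citation carrying only legacy names contributes "" for both fields.
function keyBeforeFix(citation: Citation): string {
  return JSON.stringify([citation.type, citation.sourceContext || "", citation.sourceMatch || ""]);
}

// Post-fix shape (hypothetical): one hoisted cast, and every aliased field goes through resolveField.
// JSON.stringify stands in for the real hash of the key inputs.
function keyAfterFix(citation: Citation): string {
  const rawCitation = citation as unknown as Record<string, unknown>;
  const sourceContext = (resolveField(rawCitation, "sourceContext") as string) || "";
  const sourceMatch = (resolveField(rawCitation, "sourceMatch") as string) || "";
  return JSON.stringify([citation.type, sourceContext, sourceMatch]);
}
```

Under these assumptions, keyBeforeFix returns the same string for any two legacy-only citations of the same type (the collision described above), while keyAfterFix keeps them distinct, hashes fullPhrase/anchorText exactly like sourceContext/sourceMatch, and leaves canonical-field keys unchanged because the canonical name is checked first.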

Test plan

  • Run bun run test — all 315 unit tests pass
  • Verify getCitationKey({ type: "document", fullPhrase: "x", anchorText: "y" } as Citation) produces the same key as getCitationKey({ type: "document", sourceContext: "x", sourceMatch: "y" } as Citation)
  • Confirm canonical-field citations still produce the same frozen hashes (regression fixtures in citationKeyStability.test.ts)

LLM output arriving via the old CLI path may carry legacy field names
(fullPhrase/anchorText instead of sourceContext/sourceMatch) without
going through parseCitation normalization. This caused getCitationKey
to hash those fields as empty strings, producing collisions for
citations that should be distinct.

Route sourceContext and sourceMatch through resolveField so all known
alias variants hash identically to the canonical names. Also hoists
the single rawCitation cast to the top of the function, eliminating
the duplicate as-unknown-as cast that previously existed for startPageId.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

vercel Bot commented Apr 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

4 Skipped Deployments

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agui-chat-deepcitation | Ignored | Preview | Apr 12, 2026 1:45pm |
| deepcitation-langchain-rag-chat | Ignored | Preview | Apr 12, 2026 1:45pm |
| mastra-rag-deepcitation | Ignored | Preview | Apr 12, 2026 1:45pm |
| nextjs-ai-sdk-deepcitation | Ignored | Preview | Apr 12, 2026 1:45pm |


github-actions Bot commented Apr 12, 2026

✅ Playwright Test Report

Status: Tests passed

📊 Download Report & Snapshots (see Artifacts section)

What's in the Visual Snapshots

The gallery includes visual snapshots for:

  • 🖥️ Desktop showcase (all variants × all states)
  • 📱 Mobile showcase (iPhone SE viewport)
  • 📟 Tablet showcase (iPad viewport)
  • 🔍 Popover states (verified, partial, not found)
  • 🔗 URL citation variants

Run ID: 24308142691


claude Bot commented Apr 12, 2026

Code Review

Overall: This is a well-targeted, minimal fix for a real bug. The logic is correct and the refactor is clean. A few things are worth addressing before merge:


What's Good

  • Fix is correct. resolveField tries the canonical name first, so existing citations with sourceContext/sourceMatch are unaffected. The regression fixture hashes will stay identical.
  • Hoisting rawCitation eliminates the repeated as unknown as Record<string, unknown> cast — cleaner and easier to audit.
  • Comment quality improved: the new comment names the specific old-CLI aliases (fullPhrase, anchorText) rather than the vague "LLMs sometimes emit...".

Issues

1. Missing test coverage for the bug being fixed (important)

The PR description says to verify that old-CLI field names produce the same key as canonical names, but citationKeyStability.test.ts has no test for this. If someone later removes fullPhrase or anchorText from FIELD_ALIAS_MAP, the bug silently regresses with no red test.

Suggested addition to citationKeyStability.test.ts:

```typescript
describe("getCitationKey alias resolution", () => {
  it("old-CLI fullPhrase/anchorText aliases hash identically to canonical names", () => {
    const canonical: DocumentCitation = {
      type: "document",
      sourceContext: "Revenue grew 45% year-over-year to $2.3B",
      sourceMatch: "$2.3B",
      pageNumber: 2,
      lineIds: [20],
    };
    // Simulate an old-CLI citation with legacy field names
    const legacyCli = {
      type: "document",
      fullPhrase: "Revenue grew 45% year-over-year to $2.3B",
      anchorText: "$2.3B",
      pageNumber: 2,
      lineIds: [20],
    } as unknown as DocumentCitation;

    expect(getCitationKey(legacyCli)).toBe(getCitationKey(canonical));
  });

  it("old-CLI citations no longer hash as empty string (regression guard)", () => {
    const legacyCli = {
      type: "document",
      fullPhrase: "Some context",
      anchorText: "context",
    } as unknown as DocumentCitation;
    const emptyKey = getCitationKey({ type: "document" } as DocumentCitation);

    expect(getCitationKey(legacyCli)).not.toBe(emptyKey);
  });
});
```

2. getVerificationKey does not use resolveField

getVerificationKey in the same file accesses verification.verifiedSourceContext and verification.verifiedSourceMatch directly. If verifications can arrive via a similar old-CLI path with aliased field names, it has the same bug. If verifications are always server-generated with canonical names this is fine — but worth a comment confirming that assumption to prevent a future copy-paste bug.


Minor Notes

  • The cast (resolveField(rawCitation, "sourceContext") as string) || "" is appropriate here — resolveField returns unknown by design, and this matches the pre-existing pattern for resolvedStartPageId.
  • lineIds and pageNumber are still accessed directly via citation.lineIds and citation.pageNumber. These fields don't appear to have old-CLI aliases in FIELD_ALIAS_MAP that would carry non-undefined values, so this is fine, but worth confirming.
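
One detail behind the first note: as string is a compile-time assertion only. At runtime a truthy non-string value (for example, a number stored under an alias) passes through unchanged, and only falsy values become "". The review accepts the cast because it matches the resolvedStartPageId pattern; if a runtime-checked variant were ever wanted, a typeof guard is one option. Both helpers below are hypothetical names used only to contrast the two behaviors, not code from the project.

```typescript
// Hypothetical: the cast-and-default pattern from the diff, wrapped for comparison.
// "as string" is erased at compile time, so a truthy non-string comes back unchanged.
function castOrEmpty(value: unknown): string {
  return (value as string) || "";
}

// Hypothetical runtime-checked alternative: anything that is not a string becomes "".
function asStringOrEmpty(value: unknown): string {
  return typeof value === "string" ? value : "";
}
```

For example, castOrEmpty(42) returns the number 42 despite its declared string return type, whereas asStringOrEmpty(42) returns "".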

Summary: The core fix is correct. Before merging, please add the alias-variant test case so the bug can't regress silently. The getVerificationKey concern is lower priority but worth acknowledging.

…getVerificationKey scope

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
bensonwong merged commit b29d9d1 into main on Apr 12, 2026
14 checks passed
bensonwong deleted the fix/citation-key-alias-resolution branch on April 12, 2026 13:52
bensonwong mentioned this pull request on Apr 12, 2026
4 tasks

Labels: None yet
Projects: None yet

1 participant