
Improve error messages for malformed CSL files #405

@YDX-2147483647


Due to the diversity of implementations, many CSL files go beyond the CSL spec, and some even go beyond the CSL-M extension.
Therefore, it would be quite helpful if we provided better error messages for malformed CSL files, for both end authors and developers.

  • Show the place of the error in the CSL file
  • Show what data did not match any variant of the untagged enum
  • Indicate whether the CSL file is really invalid or hayagriva does not support the feature yet
  • (Optional) Ignore minor errors (take them as warnings) and continue parsing
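
As a rough illustration of the four points above, here is a minimal std-only sketch of what such a diagnostic could carry. Every name in it (`Cause`, `Severity`, `CslDiagnostic`, `line_col`) is hypothetical and not part of hayagriva or citationberg. It assumes the parser can report the byte offset of the offending data, and it counts columns in bytes.

```rust
use std::fmt;

/// Why parsing failed (hypothetical classification, not an existing API).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Cause {
    /// The file itself violates the CSL specification.
    InvalidCsl,
    /// The file uses a feature that the parser does not support yet.
    Unsupported,
}

/// Minor problems could be downgraded to warnings so parsing can continue.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Severity {
    Warning,
    Error,
}

/// A diagnostic that records where the problem is and what did not match.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CslDiagnostic {
    line: usize,
    column: usize,
    /// Name of the type that was expected, e.g. "Term".
    expected: String,
    /// The data that did not match.
    found: String,
    cause: Cause,
    severity: Severity,
}

impl fmt::Display for CslDiagnostic {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let severity = match self.severity {
            Severity::Warning => "warning",
            Severity::Error => "error",
        };
        let cause = match self.cause {
            Cause::InvalidCsl => "invalid CSL",
            Cause::Unsupported => "not supported yet",
        };
        write!(
            f,
            "{}: line {}, column {}: `{}` did not match any variant of {} ({})",
            severity, self.line, self.column, self.found, self.expected, cause
        )
    }
}

/// Converts a byte offset into a 1-based (line, column) pair.
/// The column is counted in bytes, which is an assumption of this sketch.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let offset = offset.min(source.len());
    let before = &source.as_bytes()[..offset];
    let line = before.iter().filter(|&&b| b == b'\n').count() + 1;
    let line_start = before
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);
    (line, offset - line_start + 1)
}
```

With something like this, a caller could collect the `Warning` diagnostics, keep parsing, and abort only on an `Error`.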

Relates to typst/citationberg#22 and typst/citationberg#23, but this issue covers more.

Common errors

According to the survey, 222 of 302 (74%) CSL files in https://github.com/zotero-chinese/styles/tree/435cf756bf8bfaa193be236e27a347560ee39f54/src are considered malformed by hayagriva 799cfdc.
The error messages are as follows.

  • 196 (88%) × data did not match any variant of untagged enum Term
    (probably caused by space-et-al chinese et al. #291)
  • 8 (4%) × unknown variant institution, expected one of name, et-al, label, substitute
    (might relate to "Some entries in thesis and report bibliography items are not shown", #112)
  • 6 × data did not match any variant of untagged enum Variable
  • 4 × data did not match any variant of untagged enum TextTarget
  • 4 × missing field $value
  • 2 × unknown variant monograph, expected one of article, article-journal, article-magazine, article-newspaper, bill, book, broadcast, chapter, classic, collection, dataset, document, entry, entry-dictionary, entry-encyclopedia, event, figure, graphic, hearing, interview, legal_case, legislation, manuscript, map, motion_picture, musical_score, pamphlet, paper-conference, patent, performance, periodical, personal_communication, post, post-weblog, regulation, report, review, review-book, software, song, speech, standard, thesis, treaty, webpage
  • 1 × missing field term
  • 1 × unknown variant last, expected first or all

The duplicate field layout error should be the most common one at present.
It's caused by CSL-M and tracked by typst/citationberg#5.
However, there are recent efforts to improve it (e.g., #126), so I excluded this error by keeping only the last <layout> for <citation> and <bibliography> in each CSL file.
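
The exact script used for this preprocessing is not shown here. The following is only a sketch of one way to do it with plain string handling. The function names are hypothetical, and it assumes that <layout> elements are not nested, are never self-closing, and do not appear inside comments, and that each file has at most one <citation> and one <bibliography> element.

```rust
/// Returns the byte ranges of every `<layout ...>...</layout>` element.
/// Assumes layouts are neither nested nor self-closing.
fn layout_spans(text: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut pos = 0;
    while let Some(start) = text[pos..].find("<layout").map(|i| i + pos) {
        let Some(end) = text[start..]
            .find("</layout>")
            .map(|i| i + start + "</layout>".len())
        else {
            break;
        };
        spans.push((start, end));
        pos = end;
    }
    spans
}

/// Drops every `<layout>` element in `section` except the last one.
/// Whitespace around the removed elements is left untouched.
fn keep_last_layout(section: &str) -> String {
    let spans = layout_spans(section);
    let mut out = String::new();
    let mut cursor = 0;
    for &(start, end) in spans.iter().take(spans.len().saturating_sub(1)) {
        out.push_str(&section[cursor..start]);
        cursor = end;
    }
    out.push_str(&section[cursor..]);
    out
}

/// Applies `keep_last_layout` to the first `<tag>...</tag>` element, if any.
fn strip_in_section(csl: &str, tag: &str) -> String {
    let open = format!("<{tag}");
    let close = format!("</{tag}>");
    let Some(start) = csl.find(&open) else {
        return csl.to_string();
    };
    let Some(end) = csl[start..].find(&close).map(|i| i + start) else {
        return csl.to_string();
    };
    format!(
        "{}{}{}",
        &csl[..start],
        keep_last_layout(&csl[start..end]),
        &csl[end..]
    )
}

/// Keeps only the last `<layout>` in `<citation>` and in `<bibliography>`.
fn strip_extra_layouts(csl: &str) -> String {
    let once = strip_in_section(csl, "citation");
    strip_in_section(&once, "bibliography")
}
```

A real XML parser would be more robust than substring search. This version only aims to make the filtering step concrete.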

Links

The following discussions (in Chinese) describe the discrepancies in the CSL world.
