Malformed XML generated from malformed MARC #100

@jakub-id

We've received records that contain subfield delimiters (0x1F) inside subfield contents. Although this violates the MARC specification, yaz-marcdump processes these records without error. The generated line and JSON representations are valid, but the XML output is malformed:

yaz-marcdump -f marc8 -o marcxml malformed.mrc | xmllint --noout -
-:148: parser error : Unescaped '<' not allowed in attributes values
    <subfield code="树上又有什么让人惊讶的景象呢</subfield>
                                                              ^
-:148: parser error : attributes construct error
    <subfield code="树上又有什么让人惊讶的景象呢</subfield>
                                                              ^
-:148: parser error : Couldn't find end of Start Tag subfield line 148
    <subfield code="树上又有什么让人惊讶的景象呢</subfield>
                                                              ^
-:148: parser error : Opening and ending tag mismatch: datafield line 145 and subfield
    <subfield code="树上又有什么让人惊讶的景象呢</subfield>
                                                                         ^
-:150: parser error : Opening and ending tag mismatch: record line 2 and datafield
  </datafield>
              ^
-:200: parser error : Opening and ending tag mismatch: collection line 1 and record
</record>
         ^
-:201: parser error : Extra content at the end of the document
</collection>
^

Expected result: yaz-marcdump handles the records but warns and generates well-formed XML.
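
For illustration, here is a minimal sketch of the kind of check that could back such a warning. It is not YAZ code. It assumes an ISO 2709 / MARC 21 record layout (24-byte leader, 12-byte directory entries) and assumes a stray delimiter can be recognised because the byte following 0x1F is not a lowercase ASCII letter or digit. The function name `find_suspect_subfields` is hypothetical. A stray 0x1F that happens to be followed by a valid code character would not be caught this way.

```python
FIELD_END = 0x1E        # field terminator
SUBFIELD_DELIM = 0x1F   # subfield delimiter
# MARC 21 subfield codes are lowercase ASCII letters or digits.
VALID_CODES = frozenset(b"abcdefghijklmnopqrstuvwxyz0123456789")


def find_suspect_subfields(record: bytes):
    """Return (tag, offset) pairs where 0x1F is not followed by a valid code.

    `record` is one raw ISO 2709 record; `offset` is relative to the
    start of the field data. Hypothetical helper, for illustration only.
    """
    base = int(record[12:17])           # leader bytes 12-16: base address of data
    directory = record[24:base - 1]     # directory ends with a field terminator
    suspects = []
    for i in range(0, len(directory) - 11, 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii", "replace")
        length = int(entry[3:7])        # field length, including terminator
        start = int(entry[7:12])        # field start, relative to base address
        if tag.startswith("00"):        # control fields carry no subfields
            continue
        field = record[base + start:base + start + length]
        for pos, byte in enumerate(field):
            if byte != SUBFIELD_DELIM:
                continue
            following = field[pos + 1] if pos + 1 < len(field) else None
            if following not in VALID_CODES:
                suspects.append((tag, pos))
    return suspects
```

A caller could emit a warning for each reported pair and then escape or drop the offending byte before writing the `code` attribute, so that the XML stays well-formed.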

malformed.mrc.gz
