Bare carriage return (CR, U+000D) is not treated as a line ending (CommonMark §2.1) #1106

@owenlamont

Summary

CommonMark §2.1 defines a line ending as a line feed (LF), a carriage return (CR)
not followed by a line feed, or a carriage return followed by a line feed (CRLF).
pulldown-cmark recognizes LF and CRLF as line endings but not a bare CR, so a
document that uses bare \r line endings is parsed as if its lines were joined, and
block-level constructs that must start on their own line (fenced code blocks, ATX
headings, thematic breaks, …) are not recognized.

Reproduction

Cargo.toml:

[dependencies]
pulldown-cmark = "0.13.4"

src/main.rs:

use pulldown_cmark::{CodeBlockKind, Event, Options, Parser, Tag};

/// Does the document contain a fenced code block with info string `rust`?
fn has_fenced_block(markdown: &str) -> bool {
    Parser::new_ext(markdown, Options::empty()).any(|event| {
        matches!(
            event,
            Event::Start(Tag::CodeBlock(CodeBlockKind::Fenced(ref info)))
                if info.as_ref() == "rust"
        )
    })
}

fn main() {
    // The same fenced block written with each of the three line endings. Per
    // CommonMark §2.1 all three documents have the same line structure, so the
    // fence should be found in each.
    for (label, markdown) in [
        ("LF",   "```rust\nlet x = 1;\n```\n"),
        ("CRLF", "```rust\r\nlet x = 1;\r\n```\r\n"),
        ("CR",   "```rust\rlet x = 1;\r```\r"),
    ] {
        println!("{label:<4} fenced block detected: {}", has_fenced_block(markdown));
    }
}

cargo run prints:

LF   fenced block detected: true
CRLF fenced block detected: true
CR   fenced block detected: false

The fence in the bare-CR document is not detected: inspecting the events with
into_offset_iter() shows the ``` emitted as ordinary paragraph text (with
SoftBreak/HardBreak events) instead of a Start(CodeBlock(Fenced("rust"))) event.

Expected

Per CommonMark §2.1, a bare CR is a
line ending, so the CR document should parse identically to the LF and CRLF documents
and the fenced block should be detected.
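
Until the parser handles bare CR itself, one possible caller-side workaround is to rewrite each bare CR to LF before parsing. The sketch below is only that, a workaround and not a proposed fix for pulldown-cmark; `normalize_bare_cr` is a hypothetical helper name, not part of the crate. Because every bare `\r` is replaced by exactly one `\n` and CRLF pairs are left alone, the output has the same byte length as the input, so byte offsets reported for the normalized text should still line up with the original.

```rust
/// Hypothetical helper (not part of pulldown-cmark): replace each bare CR
/// with LF and leave CRLF pairs and existing LFs untouched.
fn normalize_bare_cr(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        // A CR counts as "bare" when the next character is not LF.
        if c == '\r' && chars.peek() != Some(&'\n') {
            out.push('\n');
        } else {
            out.push(c);
        }
    }
    out
}
```

The normalized string can then be passed to `Parser::new_ext` in place of the original input, as in `has_fenced_block(&normalize_bare_cr(markdown))` from the reproduction above.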

Reference

CommonMark spec §2.1 (Characters and
lines):

A line ending is a line feed (U+000A), a carriage return (U+000D) not followed
by a line feed, or a carriage return and a following line feed.
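
For illustration, the quoted rule can be sketched as a small line splitter in plain Rust. This is a minimal sketch of the spec's definition only, not how pulldown-cmark scans lines internally; `split_commonmark_lines` is a hypothetical name. A hand-written loop is used because the standard library's `str::lines` splits on LF and CRLF but not on a bare CR.

```rust
/// Hypothetical helper: split `input` into lines using the CommonMark §2.1
/// rule, where LF, CR not followed by LF, and CRLF each end a line.
fn split_commonmark_lines(input: &str) -> Vec<&str> {
    let bytes = input.as_bytes();
    let mut lines = Vec::new();
    let mut start = 0; // byte offset where the current line begins
    let mut i = 0;
    while i < bytes.len() {
        // Length in bytes of the line ending at position i, or 0 if none.
        let ending_len = match bytes[i] {
            b'\n' => 1,
            b'\r' if bytes.get(i + 1) == Some(&b'\n') => 2, // CRLF
            b'\r' => 1,                                     // bare CR
            _ => 0,
        };
        if ending_len == 0 {
            i += 1;
            continue;
        }
        // CR and LF are ASCII, so these offsets are valid char boundaries.
        lines.push(&input[start..i]);
        i += ending_len;
        start = i;
    }
    // Trailing text without a final line ending still counts as a line.
    if start < bytes.len() {
        lines.push(&input[start..]);
    }
    lines
}
```

Under this rule, the LF, CRLF, and bare-CR variants of a document split into the same sequence of lines, which is why they are expected to produce the same block structure.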
