Summary
CommonMark §2.1 defines a line ending as LF, a carriage return (CR) not followed by a line feed, or CRLF. pulldown-cmark recognizes LF and CRLF as line endings but not a bare CR. As a result, a document that uses bare `\r` line endings is parsed as if its lines were joined, and block-level constructs that require their own line (fenced code blocks, ATX headings, thematic breaks, …) are not recognized.
Reproduction
Cargo.toml:

```toml
[dependencies]
pulldown-cmark = "0.13.4"
```
src/main.rs:

```rust
use pulldown_cmark::{CodeBlockKind, Event, Options, Parser, Tag};

/// Does the document contain a fenced code block with info string `rust`?
fn has_fenced_block(markdown: &str) -> bool {
    Parser::new_ext(markdown, Options::empty()).any(|event| {
        matches!(
            event,
            Event::Start(Tag::CodeBlock(CodeBlockKind::Fenced(ref info)))
                if info.as_ref() == "rust"
        )
    })
}

fn main() {
    // The same fenced block written with each of the three line-ending styles.
    // Per CommonMark §2.1 all three have identical line structure, so the
    // fence should be found in each.
    for (label, markdown) in [
        ("LF", "```rust\nlet x = 1;\n```\n"),
        ("CRLF", "```rust\r\nlet x = 1;\r\n```\r\n"),
        ("CR", "```rust\rlet x = 1;\r```\r"),
    ] {
        println!("{label:<4} fenced block detected: {}", has_fenced_block(markdown));
    }
}
```
`cargo run` prints:

```text
LF   fenced block detected: true
CRLF fenced block detected: true
CR   fenced block detected: false
```
The bare-CR document's fence is not detected: `into_offset_iter()` yields the `` ``` `` as ordinary paragraph text (with `SoftBreak`/`HardBreak`) instead of a `Start(CodeBlock(Fenced("rust")))` event.
Expected
Per CommonMark §2.1, a bare CR is a line ending, so the CR document should parse identically to the LF and CRLF documents and the fenced block should be detected.
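Until bare CR is handled by the parser, one possible caller-side workaround is to rewrite CRLF and bare CR as LF before parsing. This is a minimal sketch, assuming that modifying the input text is acceptable; `normalize_line_endings` is a hypothetical helper, not part of pulldown-cmark.

```rust
/// Hypothetical helper (not a pulldown-cmark API): rewrite every CommonMark
/// line ending (CRLF or bare CR) as a single LF, leaving all other text intact.
fn normalize_line_endings(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\r' {
            // A CR followed by LF is one line ending: consume the LF too.
            if chars.peek() == Some(&'\n') {
                chars.next();
            }
            out.push('\n');
        } else {
            out.push(c);
        }
    }
    out
}
```

After normalization, the CR document from the reproduction is byte-for-byte identical to the LF document, so passing it on (e.g. `has_fenced_block(&normalize_line_endings(markdown))`) gives the parser the same input as the LF case. The trade-off is that source offsets, such as those from `into_offset_iter()`, then refer to the normalized string rather than the original, because each CRLF shrinks to one byte.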
Reference
CommonMark spec §2.1 (Characters and lines):

> A line ending is a line feed (U+000A), a carriage return (U+000D) not followed by a line feed, or a carriage return and a following line feed.
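To make the §2.1 rule concrete, here is a minimal line splitter that treats LF, CRLF, and bare CR as line endings. `split_commonmark_lines` is an illustrative name, not a pulldown-cmark API, and the sketch assumes a trailing line ending does not start an extra empty line. Applied to the three inputs from the reproduction, it returns the same three lines for each.

```rust
/// Illustrative splitter (hypothetical, not a pulldown-cmark API) following
/// CommonMark §2.1: a line ending is LF, CR not followed by LF, or CRLF.
/// Line endings are excluded from the returned slices, and a trailing line
/// ending does not produce an empty final line.
fn split_commonmark_lines(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut lines = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\n' => {
                lines.push(&text[start..i]);
                i += 1;
                start = i;
            }
            b'\r' => {
                lines.push(&text[start..i]);
                // CRLF is a single line ending; a bare CR is one as well.
                i += if bytes.get(i + 1) == Some(&b'\n') { 2 } else { 1 };
                start = i;
            }
            _ => i += 1,
        }
    }
    // Keep a final line that has no terminating line ending.
    if start < bytes.len() {
        lines.push(&text[start..]);
    }
    lines
}
```

Because `\n` and `\r` are ASCII, every slice boundary falls on a UTF-8 character boundary, so slicing by byte index is safe here.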