HTML link attributes are erased when parsed as HTML but not as Markdown

Pandoc correctly generates an HTML link with an ID & key-value attributes:

	$ echo '[foo](https://www.example.com){#foo key1=value1 key2=value2}' | pandoc -f markdown -w html
	<a href="https://www.example.com" id="foo" data-key1="value1" data-key2="value2">foo</a>

When Pandoc reads its own HTML back as *HTML* and writes either HTML or Markdown, the key-value attributes are silently erased:

	$ echo '[foo](https://www.example.com){#foo key1=value1 key2=value2}' | pandoc -f markdown -w html | pandoc -f html -w html
	<a href="https://www.example.com" id="foo">foo</a>
	$ echo '[foo](https://www.example.com){#foo key1=value1 key2=value2}' | pandoc -f markdown -w html | pandoc -f html -w markdown
	[foo](https://www.example.com){#foo}

But on reading its own HTML as *Markdown*, the data is preserved correctly:

	$ echo '[foo](https://www.example.com){#foo key1=value1 key2=value2}' | pandoc -f markdown -w html | pandoc -f markdown -w markdown
	```{=html}
	<p>
	```
	`<a href="https://www.example.com" id="foo" data-key1="value1" data-key2="value2">`{=html}foo`</a>`{=html}
	```{=html}
	</p>
	```
	$ echo '[foo](https://www.example.com){#foo key1=value1 key2=value2}' | pandoc -w html | pandoc -f markdown -w html
	<p>
	<a href="https://www.example.com" id="foo" data-key1="value1" data-key2="value2">foo</a>
	</p>

This turned out to be a serious problem for my link annotation code. I write the annotations as HTML, so naturally my processing code also used `readHtml`; unfortunately, that erases *most* (but not all) of the data. This fooled me for a while: when I checked the final generated HTML, I could see the classes/IDs were all still there, and I didn't notice that the `data-*` attributes were all gone. Debugging in ghci & the CLI was even more confusing, until I happened to check every possible pair of HTML/Markdown input/output formats and discovered that `readMarkdown` is better at reading HTML than `readHtml` is (!). Switching readers solved the immediate problem of silently stripped annotations, but introduced further downstream problems, like needing to strip the `<p></p>` surrounding fragments like titles/authors. So it would be good for this to be fixed.
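
Since the loss is silent, a cheap check on the number of `data-*` attributes before and after a round trip would have caught it sooner. The following is only a minimal sketch: `count_data_attrs` is a hypothetical helper (not part of Pandoc), and the two sample strings are copied from the transcripts above rather than produced by running Pandoc.

```shell
#!/bin/sh
# Hypothetical helper (not a pandoc feature): count data-* attributes
# in the HTML read from stdin.
count_data_attrs() {
    # grep -o prints each match on its own line; wc -l counts the lines;
    # tr strips the padding some wc implementations add.
    grep -o 'data-[A-Za-z0-9_-]*=' | wc -l | tr -d ' '
}

# Sample HTML taken from the transcripts above.
original='<a href="https://www.example.com" id="foo" data-key1="value1" data-key2="value2">foo</a>'
stripped='<a href="https://www.example.com" id="foo">foo</a>'

before=$(printf '%s\n' "$original" | count_data_attrs)
after=$(printf '%s\n' "$stripped" | count_data_attrs)

# Warn when the round trip dropped any data-* attributes.
if [ "$before" -ne "$after" ]; then
    echo "data-* attributes lost: $before before, $after after"
fi
```

To check a real round trip, `stripped` could instead be set from the output of `printf '%s\n' "$original" | pandoc -f html -w html`, assuming `pandoc` is on the `PATH`.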

HTML link attributes are erased when parsed as HTML but not as Markdown #6970
