An opinionated Rust port of Turndown.js, a robust HTML to Markdown converter. This crate provides a fast, reliable way to transform HTML documents into clean, readable Markdown format.
While there are several HTML to Markdown converters available in Rust, many lack
the flexibility and configurability needed to handle the complex conversion of
varied email HTML content. This version of turndown aims to fill that gap by
offering a highly customizable conversion process, allowing users to tailor the
output to their specific needs.
This crate was created to handle the mail conversion needs of RAVN Mail, the open-source email client for digital natives. Star RAVN on GitHub if you like it!
- Drop-in HTML to Markdown converter with sensible defaults for email content.
- Highly configurable conversion rules and formatting options.
- Support for multiple Markdown styles:
  - Heading styles: ATX (`# Heading`) or Setext (`Heading\n=======`)
  - Code block styles: Fenced (`` ``` ``) or Indented
  - Link styles: Inline or Reference
- Performs well on email newsletters, marketing emails and human-written emails.
- Optionally filters tracking pixels and other unnecessary elements commonly found in email HTML (disabled by default).
Add this to your `Cargo.toml`:

```toml
[dependencies]
turndown = "0.1"
```

Then convert HTML with the default options:

```rust
use turndown::Turndown;

let turndown = Turndown::new();
let html = "<h1>Hello</h1><p>This is a <strong>test</strong>.</p>";
let markdown = turndown.convert(html);
println!("{}", markdown);
// Output:
// # Hello
// This is a **test**.
```

To customize the output, pass a `TurndownOptions` value:

```rust
use turndown::{Turndown, TurndownOptions, HeadingStyle, CodeBlockStyle};

let html = "<h1>Hello</h1><p>This is a <strong>test</strong>.</p>";

// Start from the defaults and override only what you need.
let mut options = TurndownOptions::default();
options.heading_style = HeadingStyle::Setext;
options.code_block_style = CodeBlockStyle::Indented;
let turndown = Turndown::with_options(options);
let markdown = turndown.convert(html);
```

This crate includes a CLI tool for converting HTML to Markdown from the command line:

```bash
# Convert HTML from stdin to Markdown
echo "<h1>Hello</h1>" | turndown
```

The `TurndownOptions` struct provides fine-grained control over the conversion:
| Option | Type | Default | Description |
|---|---|---|---|
| `heading_style` | `HeadingStyle` | `Atx` | Heading style: `Atx` (`# Heading`) or `Setext` (`Heading\n=======`) |
| `hr` | `String` | `* * *` | String used to render horizontal rules |
| `bullet_list_marker` | `String` | `*` | Marker used for bullet lists (can be `*`, `+`, or `-`) |
| `code_block_style` | `CodeBlockStyle` | `Fenced` | Code block style: `Fenced` (`` ``` ``) or `Indented` |
| `fence` | `String` | `` ``` `` | Delimiter used for fenced code blocks |
| `em_delimiter` | `String` | `_` | Delimiter used for emphasis/italics (can be `_` or `*`) |
| `strong_delimiter` | `String` | `**` | Delimiter used for strong emphasis/bold |
| `link_style` | `LinkStyle` | `Inlined` | Link style: `Inlined` or `Referenced` |
| `link_reference_style` | `LinkReferenceStyle` | `Full` | Link reference style: `Full`, `Collapsed`, or `Shortcut` (only for the `Referenced` link style) |
| `br` | `String` | Two spaces | String used to represent line breaks in Markdown |
| `strip_tracking_images` | `bool` | `false` | Strip tracking pixels and beacons from output |
| `tracking_image_regex` | `Option<Regex>` | Sensible default | Custom regex pattern to identify tracking images |
| `strip_images_without_alt` | `bool` | `false` | Strip images that lack alt attributes |
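The three `link_reference_style` values are easiest to tell apart side by side. The sketch below is not taken from this crate: `RefStyle` and `render_reference` are hypothetical names, and the output follows the full, collapsed, and shortcut reference link forms defined by CommonMark. The crate's exact labels and spacing may differ.

```rust
/// Hypothetical stand-in for the crate's `LinkReferenceStyle` (not its real definition).
#[derive(Clone, Copy)]
enum RefStyle {
    Full,
    Collapsed,
    Shortcut,
}

/// Renders one link as (in-text reference, definition line) for the given style.
/// `id` is a numeric label and is only used by the `Full` style.
fn render_reference(style: RefStyle, text: &str, url: &str, id: usize) -> (String, String) {
    match style {
        // Full: the link text and a separate label.
        RefStyle::Full => (format!("[{text}][{id}]"), format!("[{id}]: {url}")),
        // Collapsed: the link text doubles as the label, followed by empty brackets.
        RefStyle::Collapsed => (format!("[{text}][]"), format!("[{text}]: {url}")),
        // Shortcut: the link text alone acts as the label.
        RefStyle::Shortcut => (format!("[{text}]"), format!("[{text}]: {url}")),
    }
}

fn main() {
    for style in [RefStyle::Full, RefStyle::Collapsed, RefStyle::Shortcut] {
        let (reference, definition) = render_reference(style, "RAVN", "https://example.com", 1);
        println!("{reference}\n\n{definition}\n");
    }
}
```

In all three forms the URL moves out of the running text into a definition line, which tends to keep link-heavy newsletters readable as plain text.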
```rust
use turndown::{
    Turndown, TurndownOptions, HeadingStyle, CodeBlockStyle, LinkStyle, LinkReferenceStyle,
};

// Use Setext headings and indented code blocks
let mut options = TurndownOptions::default();
options.heading_style = HeadingStyle::Setext;
options.code_block_style = CodeBlockStyle::Indented;
let turndown = Turndown::with_options(options);

// Strip tracking images from email newsletters
let mut options = TurndownOptions::default();
options.strip_tracking_images = true;
let turndown = Turndown::with_options(options);

// Use reference links and customize link reference style
let mut options = TurndownOptions::default();
options.link_style = LinkStyle::Referenced;
options.link_reference_style = LinkReferenceStyle::Collapsed;
let turndown = Turndown::with_options(options);
```

The conversion process works in two main stages:
- Parsing: HTML is parsed into a DOM tree using the `html5ever` parser.
- Conversion: The DOM tree is traversed and converted to Markdown using a rule-based system.
The rule-based system allows for customization and extension of the conversion logic.
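
To make the rule-based idea concrete, here is a minimal, self-contained sketch of the conversion stage. It does not show this crate's internals: the `Node` and `Rule` types, the `rules` and `convert` functions, and the hand-built tree (standing in for the `html5ever` parse result) are all assumptions made for illustration.

```rust
/// A tiny hand-built DOM; the real crate obtains its tree from html5ever.
enum Node {
    Text(String),
    Element { tag: String, children: Vec<Node> },
}

/// Hypothetical rule shape: a tag filter plus a replacement function.
struct Rule {
    matches: fn(&str) -> bool,
    /// Receives (tag, already-converted children) and returns Markdown.
    replace: fn(&str, &str) -> String,
}

fn rules() -> Vec<Rule> {
    vec![
        Rule { matches: |t| t == "h1", replace: |_, c| format!("# {c}\n\n") },
        Rule { matches: |t| t == "p", replace: |_, c| format!("{c}\n\n") },
        Rule { matches: |t| t == "strong" || t == "b", replace: |_, c| format!("**{c}**") },
        Rule { matches: |t| t == "em" || t == "i", replace: |_, c| format!("_{c}_") },
    ]
}

/// Depth-first traversal: children are converted first, then the first
/// matching rule wraps their output.
fn convert(node: &Node, rules: &[Rule]) -> String {
    match node {
        Node::Text(text) => text.clone(),
        Node::Element { tag, children } => {
            let inner: String = children.iter().map(|c| convert(c, rules)).collect();
            match rules.iter().find(|r| (r.matches)(tag.as_str())) {
                Some(rule) => (rule.replace)(tag.as_str(), &inner),
                // Tags without a rule contribute only their content.
                None => inner,
            }
        }
    }
}

fn el(tag: &str, children: Vec<Node>) -> Node {
    Node::Element { tag: tag.to_string(), children }
}

fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

fn main() {
    // Equivalent to <h1>Hello</h1><p>This is a <strong>test</strong>.</p>
    let doc = el("body", vec![
        el("h1", vec![text("Hello")]),
        el("p", vec![text("This is a "), el("strong", vec![text("test")]), text(".")]),
    ]);
    println!("{}", convert(&doc, &rules()).trim_end());
}
```

Because each rule is just a filter and a replacement, changing the output for one tag means swapping a single entry in the list, which is the kind of customization a rule-based design makes cheap.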
Licensed under either of:
- MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)
- Apache License, Version 2.0, (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.