An opinionated Rust port of Turndown.js for processing emails

ravnmail/turndown

turndown

An opinionated Rust port of Turndown.js, a robust HTML to Markdown converter. This crate provides a fast, reliable way to transform HTML documents into clean, readable Markdown format.

Motivation

While there are several HTML to Markdown converters available in Rust, many lack the flexibility and configurability needed to handle the complex conversion of varied email HTML content. This version of turndown aims to fill that gap by offering a highly customizable conversion process, allowing users to tailor the output to their specific needs.

This crate was created to handle the mail conversion needs of RAVN Mail, the open-source email client for digital natives. Star RAVN on GitHub if you like it!

Features

  • Drop-in HTML to Markdown converter with sensible defaults for email content.
  • Highly configurable conversion rules and formatting options.
  • Support for multiple Markdown styles:
    • Heading styles: ATX (# Heading) or Setext (Heading\n=======)
    • Code block styles: Fenced (```) or Indented
    • Link styles: Inline or Reference
  • Performs well on email newsletters, marketing emails and human-written emails.
  • Optionally filters tracking pixels and other unnecessary elements commonly found in email HTML (disabled by default).
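
To make the difference between the two heading styles concrete, here is a minimal standalone sketch. It does not call into this crate: the `HeadingStyle` enum and the `render_heading` function below are hypothetical stand-ins defined only for this illustration, and the crate's real output may differ in details such as underline width.

```rust
/// Hypothetical stand-in for a heading style option (not the crate's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum HeadingStyle {
    Atx,
    Setext,
}

/// Render a single heading in the requested style.
///
/// Setext headings only exist for levels 1 and 2 in Markdown, so this
/// sketch falls back to ATX for deeper levels.
pub fn render_heading(text: &str, level: usize, style: HeadingStyle) -> String {
    // Markdown defines heading levels 1 through 6.
    let level = level.clamp(1, 6);
    match (style, level) {
        (HeadingStyle::Setext, 1) | (HeadingStyle::Setext, 2) => {
            let underline_char = if level == 1 { "=" } else { "-" };
            // Underline as wide as the heading text (at least one character).
            let width = text.chars().count().max(1);
            format!("{}\n{}", text, underline_char.repeat(width))
        }
        // ATX: one '#' per level, then a space, then the text.
        _ => format!("{} {}", "#".repeat(level), text),
    }
}
```

Calling `render_heading("Hello", 1, HeadingStyle::Atx)` yields `# Hello`, while the Setext variant yields `Hello` followed by a line of `=` characters.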

Usage

Add this to your Cargo.toml:

[dependencies]
turndown = "0.1"

Configuration

Basic Example

use turndown::Turndown;

let turndown = Turndown::new();
let html = "<h1>Hello</h1><p>This is a <strong>test</strong>.</p>";
let markdown = turndown.convert(html);

println!("{}", markdown);
// Output:
// # Hello
// This is a **test**.

Advanced Configuration

use turndown::{Turndown, TurndownOptions, HeadingStyle, CodeBlockStyle};

let mut options = TurndownOptions::default();
options.heading_style = HeadingStyle::Setext;
options.code_block_style = CodeBlockStyle::Indented;

let turndown = Turndown::with_options(options);
let html = "<h1>Hello</h1><p>This is a <strong>test</strong>.</p>";
let markdown = turndown.convert(html);

Command-line Tool

This crate includes a CLI tool for converting HTML to Markdown from the command line:

# Convert HTML from stdin to Markdown
echo "<h1>Hello</h1>" | turndown

Configuration Options

The TurndownOptions struct provides fine-grained control over the conversion:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| heading_style | HeadingStyle | Atx | Heading style: Atx (# Heading) or Setext (Heading\n=======) |
| hr | String | * * * | String used to render horizontal rules |
| bullet_list_marker | String | * | Marker used for bullet lists (can be *, +, or -) |
| code_block_style | CodeBlockStyle | Fenced | Code block style: Fenced (```) or Indented |
| fence | String | ``` | Delimiter used for fenced code blocks |
| em_delimiter | String | _ | Delimiter used for emphasis/italics (can be _ or *) |
| strong_delimiter | String | ** | Delimiter used for strong emphasis/bold |
| link_style | LinkStyle | Inlined | Link style: Inlined or Referenced |
| link_reference_style | LinkReferenceStyle | Full | Link reference style: Full, Collapsed, or Shortcut (only applies to the Referenced link style) |
| br | String | Two spaces | String used to represent line breaks in Markdown |
| strip_tracking_images | bool | false | Strip tracking pixels and beacons from output |
| tracking_image_regex | Option<Regex> | Sensible default | Custom regex pattern to identify tracking images |
| strip_images_without_alt | bool | false | Strip images that lack alt attributes |
Configuration Examples

use turndown::{Turndown, TurndownOptions, HeadingStyle, CodeBlockStyle, LinkStyle};

// Use Setext headings and indented code blocks
let mut options = TurndownOptions::default();
options.heading_style = HeadingStyle::Setext;
options.code_block_style = CodeBlockStyle::Indented;

let turndown = Turndown::with_options(options);

// Strip tracking images from email newsletters
let mut options = TurndownOptions::default();
options.strip_tracking_images = true;

let turndown = Turndown::with_options(options);

// Use reference-style links (link_reference_style keeps its default, Full)
let mut options = TurndownOptions::default();
options.link_style = LinkStyle::Referenced;

let turndown = Turndown::with_options(options);
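
For readers wondering what "tracking image" detection can look like, here is a deliberately simple, standalone heuristic. It is an assumption for illustration only: the crate identifies tracking images with a regex (`tracking_image_regex`), whose default pattern is not shown in this README, whereas the hypothetical `looks_like_tracking_pixel` function below just checks for images declared as at most 1×1 in size.

```rust
/// Hypothetical heuristic: treat an <img> as a tracking pixel when both its
/// `width` and `height` attributes parse to 0 or 1 (optionally with "px").
///
/// `attrs` is a list of (name, value) pairs taken from the <img> tag.
pub fn looks_like_tracking_pixel(attrs: &[(&str, &str)]) -> bool {
    // Look up a dimension attribute by name and parse it as an integer.
    let dim = |name: &str| -> Option<u32> {
        attrs
            .iter()
            .find(|(k, _)| k.eq_ignore_ascii_case(name))
            .and_then(|(_, v)| v.trim().trim_end_matches("px").parse::<u32>().ok())
    };
    // Both dimensions must be present and tiny; anything else is kept.
    matches!((dim("width"), dim("height")), (Some(w), Some(h)) if w <= 1 && h <= 1)
}
```

A size check like this misses pixels sized via CSS and may need to be combined with URL patterns, which is presumably why a configurable regex is offered.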

Architecture

The conversion process works in two main stages:

  1. Parsing: HTML is parsed into a DOM tree using the html5ever parser.
  2. Conversion: The DOM tree is traversed and converted to Markdown using a rule-based system.

The rule-based system allows for customization and extension of the conversion logic.
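
The second stage can be pictured with a toy example. The sketch below is not the crate's implementation and does not use html5ever: it builds a tiny DOM-like tree by hand (the `Node` type and the `el`, `text`, and `convert` helpers are all hypothetical) and applies one rule per tag name during a depth-first traversal.

```rust
/// Minimal DOM-like tree, standing in for a parsed HTML document.
pub enum Node {
    Text(String),
    Element { tag: String, children: Vec<Node> },
}

/// Convenience constructor for an element node.
pub fn el(tag: &str, children: Vec<Node>) -> Node {
    Node::Element { tag: tag.to_string(), children }
}

/// Convenience constructor for a text node.
pub fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

/// Convert a node to Markdown: children first, then the rule for this tag.
pub fn convert(node: &Node) -> String {
    match node {
        Node::Text(t) => t.clone(),
        Node::Element { tag, children } => {
            // Recursively convert and concatenate the children.
            let inner: String = children.iter().map(convert).collect();
            // Each match arm plays the role of one conversion rule.
            match tag.as_str() {
                "h1" => format!("# {}\n\n", inner),
                "p" => format!("{}\n\n", inner),
                "strong" | "b" => format!("**{}**", inner),
                "em" | "i" => format!("_{}_", inner),
                // Unknown tags are transparent: keep only their content.
                _ => inner,
            }
        }
    }
}
```

Adding support for another element in this sketch means adding one more match arm, which is the appeal of a rule-based design: each rule is small and independent of the others.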

License

Licensed under either of:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
