
13 unstable releases

0.7.0 May 11, 2023
0.6.3 Apr 21, 2022
0.6.2 Dec 21, 2020
0.5.0 Nov 28, 2020
0.1.0 May 20, 2020

#842 in Parser implementations

65,232 downloads per month
Used in 73 crates (26 directly)

MIT license

55KB
591 lines

Html parser

A simple, general-purpose html/xhtml parser, available as both a library and a command-line binary, built on Pest.

Features

  • Parse html & xhtml (not xml processing instructions)
  • Parse html-documents
  • Parse html-fragments
  • Parse empty documents
  • Parse with the same api for both documents and fragments
  • Parse custom, non-standard, elements; <cat/>, <Cat/> and <C4-t/>
  • Removes comments
  • Removes dangling elements
  • Iterate over all nodes in the dom tree

What is it not

  • It's not a high-performance browser-grade parser
  • It's not suitable for html validation
  • It's not a parser that includes element selection or dom manipulation

If your requirements match any of the above, then you're most likely looking for one of the crates below:

Examples bin

Parse html file

    html_parser index.html

Parse stdin with pretty output

    curl <website> | html_parser -p

Examples lib

Parse html document

    use html_parser::Dom;

    fn main() {
        let html = r#"
            <!doctype html>
            <html lang="en">
                <head>
                    <meta charset="utf-8">
                    <title>Html parser</title>
                </head>
                <body>
                    <h1 id="a" class="b c">Hello world</h1>
                    </h1> <!-- comments & dangling elements are ignored -->
                </body>
            </html>"#;

        assert!(Dom::parse(html).is_ok());
    }

Parse html fragment

    use html_parser::Dom;

    fn main() {
        let html = "<div id=cat />";
        assert!(Dom::parse(html).is_ok());
    }

Print to json

    use html_parser::{Dom, Result};

    fn main() -> Result<()> {
        let html = "<div id=cat />";
        let json = Dom::parse(html)?.to_json_pretty()?;
        println!("{}", json);
        Ok(())
    }

Dependencies

~2.3–3.5MB
~68K SLoC