htmst is a Python library for parsing HTML into an AST that records the source position of every node.
uv add htmst
or
pip install htmst

from htmst import HtmlAst
html = """<span foo="bar">hi</span>"""
ast = HtmlAst(html)
print(ast.root.children[0].tag) # span
print(ast.root.children[0].start.row) # 0
print(ast.root.children[0].start.col) # 0
print(ast.root.children[0].end.row) # 0
print(ast.root.children[0].end.col) # 25
print(ast.root.children[0].attrs[0].name) # foo
print(ast.root.children[0].attrs[0].value) # bar
print(ast.root.children[0].attrs[0].start.row) # 0
print(ast.root.children[0].attrs[0].start.col) # 6
print(ast.root.children[0].attrs[0].end.row) # 0
print(ast.root.children[0].attrs[0].end.col) # 15

DoubleTagNode: represents double tags
SingleTagNode: represents single tags
AttrNode: represents attributes
TextNode: represents texts
CommentNode: represents comments
DoctypeNode: represents doctypes
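Each node has a start and an end position, each exposing a row and a col. In the example above both are zero-based, and the end position points just past the node's last character.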
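
One common use of such positions is to recover the exact source text a node came from. The following is a minimal sketch of that idea and does not use htmst itself: `Pos` and `slice_source` are hypothetical names introduced for illustration, and the sketch assumes zero-based rows and columns with an exclusive end, which is what the example output above suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pos:
    """Hypothetical stand-in for a node position (zero-based row and col)."""
    row: int
    col: int


def slice_source(source: str, start: Pos, end: Pos) -> str:
    """Return the text from start (inclusive) to end (exclusive)."""
    # Keep line endings so that summed line lengths equal character offsets.
    lines = source.splitlines(keepends=True)

    def offset(pos: Pos) -> int:
        # Length of every full line before pos.row, plus the column.
        return sum(len(line) for line in lines[:pos.row]) + pos.col

    return source[offset(start):offset(end)]


html = '<span foo="bar">hi</span>'
# Positions copied from the example output above.
print(slice_source(html, Pos(0, 0), Pos(0, 25)))  # <span foo="bar">hi</span>
print(slice_source(html, Pos(0, 6), Pos(0, 15)))  # foo="bar"
```

If the real position objects expose `row` and `col` as shown in the example, they could be passed in place of `Pos` without further changes, since the helper only reads those two attributes.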
Contributions are welcome! Please read the contributing guidelines for more information.
This project is licensed under the MIT License.