Thanks to visit codestin.com
Credit goes to github.com

Skip to content

html5lib.treebuilders.dom.dom2sax crashes on 'xml:lang' attribute #6

Closed
@gsnedders

Description

@gsnedders

http://code.google.com/p/html5lib/issues/detail?id=200

Reported by vovanec, Mar 6, 2012

A simple test case(my program has more complex handler implementation but the problem is reproducible with the default handler):

import xml.sax.handler
import html5lib

def test(html):
    handler = xml.sax.handler.ContentHandler()
    parser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder('dom'))
    dom = parser.parse(html)
    html5lib.treebuilders.dom.dom2sax(dom, handler)

html = '<html xml:lang="en">'
test(html)

With html5lib 0.95 it produces the following traceback:

python test.py 
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    test(html)
  File "test.py", line 10, in test
    html5lib.treebuilders.dom.dom2sax(dom, handler)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 271, in dom2sax
    for child in node.childNodes: dom2sax(child, handler, nsmap)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 256, in dom2sax
    del attributes[(attr.namespaceURI, attr.nodeName)]
KeyError: (None, u'xml:lang')

With previous versions(at least 0.11) there's no any error. I assume this attribute may be invalid in the xml namespace, but anyway I don't think it is ok for parser just to crash. I've seen A LOT of html documents that has such attribute in the real world.

Tested it with Python 2.6.5, Linux

Please advise.

Thanks,
--Vladimir

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions