Closed
Description
http://code.google.com/p/html5lib/issues/detail?id=200
Reported by vovanec, Mar 6, 2012
A simple test case (my program has a more complex handler implementation, but the problem is reproducible with the default handler):
import xml.sax.handler
import html5lib

def test(html):
    handler = xml.sax.handler.ContentHandler()
    parser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder('dom'))
    dom = parser.parse(html)
    html5lib.treebuilders.dom.dom2sax(dom, handler)

html = '<html xml:lang="en">'
test(html)
With html5lib 0.95 it produces the following traceback:
python test.py
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    test(html)
  File "test.py", line 10, in test
    html5lib.treebuilders.dom.dom2sax(dom, handler)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 271, in dom2sax
    for child in node.childNodes: dom2sax(child, handler, nsmap)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 256, in dom2sax
    del attributes[(attr.namespaceURI, attr.nodeName)]
KeyError: (None, u'xml:lang')
With previous versions (at least 0.11) there is no error. I assume this attribute may be invalid in the XML namespace, but I don't think it is OK for the parser to simply crash. I have seen a lot of real-world HTML documents that have this attribute.
Tested with Python 2.6.5 on Linux.
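One possible explanation, sketched below with the standard library only: in xml.dom.minidom, itemsNS() keys attributes by (namespaceURI, localName), while the failing line in the traceback looks up (namespaceURI, nodeName). For a prefixed attribute set without a namespace, such as xml:lang, those two keys differ. This is an assumption about the cause. I have not confirmed that dom2sax builds its attributes dict from itemsNS(), and the names build_element and lookup_key are made up for the illustration.

```python
from xml.dom import minidom


def build_element():
    """Create <html xml:lang="en"> with a non-namespaced setAttribute call."""
    doc = minidom.Document()
    element = doc.createElement("html")
    # No namespace is given, so the attribute's namespaceURI is None.
    element.setAttribute("xml:lang", "en")
    return element


element = build_element()

# Assumption: the attributes dict is built from itemsNS(), whose keys are
# (namespaceURI, localName) pairs.
attributes = dict(element.attributes.itemsNS())

attr = element.getAttributeNode("xml:lang")
# The key used by the failing line in the traceback: (namespaceURI, nodeName).
lookup_key = (attr.namespaceURI, attr.nodeName)

print(sorted(attributes.keys(), key=repr))
print(lookup_key in attributes)

# A tolerant delete: pop() with a default does not raise when the key is
# missing, unlike "del attributes[lookup_key]".
removed = attributes.pop(lookup_key, None)
```

If this is indeed the cause, a tolerant delete like the pop() call above would avoid the crash, although whether the attribute should then be passed on to the SAX handler is a separate question.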
Please advise.
Thanks,
--Vladimir