Bug report
Bug description:
When HTMLParser is initialized with convert_charrefs=False, it mishandles an invalid named entity reference (e.g., &A, which is not a valid HTML entity). The parser silently drops the & character and passes only the following A to handle_data. No error or warning is raised, so this looks like silent data loss.
    from html.parser import HTMLParser

    class MyParser(HTMLParser):
        def handle_data(self, data):
            print(f"handle_data received: {data!r}")

    parser_false = MyParser(convert_charrefs=False)
    parser_false.feed('&A')
    parser_false.close()

Output:

    handle_data received: 'A'
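For comparison, here is a minimal sketch that runs the same input through both settings of convert_charrefs and joins whatever handle_data receives. CollectingParser and parse_text are hypothetical helper names introduced only for this illustration; they are not part of html.parser. The sketch assumes only the documented HTMLParser methods feed(), close(), and handle_data().

```python
from html.parser import HTMLParser


class CollectingParser(HTMLParser):
    """Hypothetical helper: records every text chunk passed to handle_data."""

    def __init__(self, *, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_text(markup, convert_charrefs):
    """Feed markup to a fresh parser and return all reported text, joined."""
    parser = CollectingParser(convert_charrefs=convert_charrefs)
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    for flag in (True, False):
        # Print the text recovered under each setting so the two can be compared.
        print(f"convert_charrefs={flag}: {parse_text('&A', flag)!r}")
```

With convert_charrefs=True the text should come back intact as '&A'. On the 3.12 build described in this report, the convert_charrefs=False line is where the leading & goes missing.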
CPython versions tested on:
3.12
Operating systems tested on:
Linux
Linked PRs