From ab7341d85f03cf94d3dc4c97801dacc8a4704f17 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Sat, 20 Apr 2019 09:46:28 +0200 Subject: [PATCH 1/6] bpo-36673: Implement comment/PI parsing support for the TreeBuilder in ElementTree. --- Doc/library/xml.etree.elementtree.rst | 56 ++++- Lib/test/test_xml_etree.py | 77 +++++- Lib/xml/etree/ElementTree.py | 60 ++++- .../2019-04-20-09-50-32.bpo-36673.XF4Egb.rst | 3 + Modules/_elementtree.c | 237 +++++++++++++++--- Modules/clinic/_elementtree.c.h | 72 +++++- 6 files changed, 452 insertions(+), 53 deletions(-) create mode 100644 Misc/NEWS.d/next/Library/2019-04-20-09-50-32.bpo-36673.XF4Egb.rst diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index 9e2c295867ca3a..5c683c74f24e2a 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -523,8 +523,9 @@ Functions Parses an XML section into an element tree incrementally, and reports what's going on to the user. *source* is a filename or :term:`file object` containing XML data. *events* is a sequence of events to report back. The - supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and - ``"end-ns"`` (the "ns" events are used to get detailed namespace + supported events are the strings ``"start"``, ``"end"``, ``"comment"``, + ``"pi"``, ``"start-ns"`` and ``"end-ns"`` + (the "ns" events are used to get detailed namespace information). If *events* is omitted, only ``"end"`` events are reported. *parser* is an optional parser instance. If not given, the standard :class:`XMLParser` parser is used. *parser* must be a subclass of @@ -549,6 +550,10 @@ Functions .. deprecated:: 3.4 The *parser* argument. + .. versionchanged:: 3.8 + The ``comment`` and ``pi`` events were added. + + .. function:: parse(source, parser=None) Parses an XML section into an element tree. *source* is a filename or file @@ -1021,14 +1026,24 @@ TreeBuilder Objects ^^^^^^^^^^^^^^^^^^^ -.. 
class:: TreeBuilder(element_factory=None) +.. class:: TreeBuilder(element_factory=None, comment_factory=None, \ + pi_factory=None) Generic element structure builder. This builder converts a sequence of start, data, and end method calls to a well-formed element structure. You can use this class to build an element structure using a custom XML parser, - or a parser for some other XML-like format. *element_factory*, when given, - must be a callable accepting two positional arguments: a tag and - a dict of attributes. It is expected to return a new element instance. + or a parser for some other XML-like format. + + *element_factory*, when given, must be a callable accepting two positional + arguments: a tag and a dict of attributes. It is expected to return a new + element instance. + + The *comment_factory* and *pi_factory* functions, when given, should behave + like the :func:`Comment` and :func:`ProcessingInstruction` functions to + create comments and processing instructions. When not given, no comments + or processing instructions will be created. Note that these objects will + not currently be appended to the tree when they appear outside of the root + element. .. method:: close() @@ -1053,6 +1068,21 @@ TreeBuilder Objects Opens a new element. *tag* is the element name. *attrs* is a dictionary containing element attributes. Returns the opened element. + .. method:: comment(text) + + Adds a comment with the given *text*. If *comment_factory* is + :const:`None`, this will just return the text. + + .. versionadded:: 3.8 + + .. method:: pi(target, text) + + Adds a processing instruction with the given *target* name and *text*. If + *pi_factory* is :const:`None`, this will return a ``(target, text)`` + tuple. + + .. versionadded:: 3.8 + In addition, a custom :class:`TreeBuilder` object can provide the following method: @@ -1150,9 +1180,9 @@ XMLPullParser Objects callback target, :class:`XMLPullParser` collects an internal list of parsing events and lets the user read from it. 
*events* is a sequence of events to report back. The supported events are the strings ``"start"``, ``"end"``, - ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed - namespace information). If *events* is omitted, only ``"end"`` events are - reported. + ``"comment"``, ``"pi"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" events + are used to get detailed namespace information). If *events* is omitted, + only ``"end"`` events are reported. .. method:: feed(data) @@ -1172,6 +1202,10 @@ XMLPullParser Objects parser. The iterator yields ``(event, elem)`` pairs, where *event* is a string representing the type of event (e.g. ``"end"``) and *elem* is the encountered :class:`Element` object. + For ``start-ns`` events, the ``elem`` is a tuple ``(prefix, uri)`` naming + the declared namespace mapping. For ``end-ns`` events, the ``elem`` is + :const:`None`. For ``comment`` events, the second value is the comment + text and for ``pi`` events a tuple ``(target, text)``. Events provided in a previous call to :meth:`read_events` will not be yielded again. Events are consumed from the internal queue only when @@ -1191,6 +1225,10 @@ XMLPullParser Objects .. versionadded:: 3.4 + .. versionchanged:: 3.8 + The ``comment`` and ``pi`` events were added. 
+ + Exceptions ^^^^^^^^^^ diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index 14ce32af802624..c022906bd938bf 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -1193,6 +1193,9 @@ def _feed(self, parser, data, chunk_size=None): for i in range(0, len(data), chunk_size): parser.feed(data[i:i+chunk_size]) + def assert_events(self, parser, expected): + self.assertEqual(list(parser.read_events()), expected) + def assert_event_tags(self, parser, expected): events = parser.read_events() self.assertEqual([(action, elem.tag) for action, elem in events], @@ -1275,8 +1278,10 @@ def test_events(self): self.assert_event_tags(parser, []) parser = ET.XMLPullParser(events=('start', 'end')) - self._feed(parser, "\n") - self.assert_event_tags(parser, []) + self._feed(parser, "\n") + self.assert_events(parser, []) + + parser = ET.XMLPullParser(events=('start', 'end')) self._feed(parser, "\n") self.assert_event_tags(parser, [('start', 'root')]) self._feed(parser, "text") self.assertIsNone(parser.close()) + def test_events_comment(self): + parser = ET.XMLPullParser(events=('start', 'comment', 'end')) + self._feed(parser, "\n") + self.assert_events(parser, [('comment', ' text here ')]) + self._feed(parser, "\n") + self.assert_events(parser, [('comment', ' more text here ')]) + self._feed(parser, "text") + self.assert_event_tags(parser, [('start', 'root-tag')]) + self._feed(parser, "\n") + self.assert_events(parser, [('comment', ' inner comment')]) + self._feed(parser, "\n") + self.assert_event_tags(parser, [('end', 'root-tag')]) + self._feed(parser, "\n") + self.assert_events(parser, [('comment', ' outer comment ')]) + + parser = ET.XMLPullParser(events=('comment',)) + self._feed(parser, "\n") + self.assert_events(parser, [('comment', ' text here ')]) + + def test_events_pi(self): + parser = ET.XMLPullParser(events=('start', 'pi', 'end')) + self._feed(parser, "\n") + self.assert_events(parser, [('pi', ('pitarget', ''))]) + parser = 
ET.XMLPullParser(events=('pi',)) + self._feed(parser, "\n") + self.assert_events(parser, [('pi', ('pitarget', 'some text '))]) + + def test_events_sequence(self): # Test that events can be some sequence that's not just a tuple or list eventset = {'end', 'start'} @@ -2658,6 +2691,31 @@ class DummyBuilder(BaseDummyBuilder): parser.feed(self.sample1) self.assertIsNone(parser.close()) + def test_treebuilder_comment(self): + b = ET.TreeBuilder() + self.assertEqual(b.comment('ctext'), 'ctext') + + b = ET.TreeBuilder(comment_factory=ET.Comment) + self.assertEqual(b.comment('ctext').tag, ET.Comment) + self.assertEqual(b.comment('ctext').text, 'ctext') + + b = ET.TreeBuilder(comment_factory=len) + self.assertEqual(b.comment('ctext'), len('ctext')) + + def test_treebuilder_pi(self): + b = ET.TreeBuilder() + self.assertEqual(b.pi('target', None), ('target', None)) + + b = ET.TreeBuilder(pi_factory=ET.PI) + self.assertEqual(b.pi('target').tag, ET.PI) + self.assertEqual(b.pi('target').text, "target") + self.assertEqual(b.pi('pitarget', ' text ').tag, ET.PI) + self.assertEqual(b.pi('pitarget', ' text ').text, "pitarget text ") + + b = ET.TreeBuilder(pi_factory=lambda target, text: (len(target), text)) + self.assertEqual(b.pi('target'), (len('target'), None)) + self.assertEqual(b.pi('pitarget', ' text '), (len('pitarget'), ' text ')) + def test_treebuilder_elementfactory_none(self): parser = ET.XMLParser(target=ET.TreeBuilder(element_factory=None)) parser.feed(self.sample1) @@ -2678,6 +2736,21 @@ def foobar(self, x): e = parser.close() self._check_sample1_element(e) + def test_subclass_comment_pi(self): + class MyTreeBuilder(ET.TreeBuilder): + def foobar(self, x): + return x * 2 + + tb = MyTreeBuilder(comment_factory=ET.Comment, pi_factory=ET.PI) + self.assertEqual(tb.foobar(10), 20) + + parser = ET.XMLParser(target=tb) + parser.feed(self.sample1) + parser.feed('') + + e = parser.close() + self._check_sample1_element(e) + def test_element_factory(self): lst = [] def 
myfactory(tag, attrib): diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index c9e2f36835021e..c2fab3798d87ab 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1374,12 +1374,22 @@ class TreeBuilder: *element_factory* is an optional element factory which is called to create new Element instances, as necessary. + *comment_factory* is a factory to create comments. If not provided, + comments will not be inserted into the tree and "comment" pull parser + events will only return the plain text. + + *pi_factory* is a factory to create processing instructions. If not + provided, PIs will not be inserted into the tree and "pi" pull parser + events will only return a (target, text) tuple. """ - def __init__(self, element_factory=None): + def __init__(self, element_factory=None, comment_factory=None, pi_factory=None): self._data = [] # data collector self._elem = [] # element stack self._last = None # last element + self._root = None # root element self._tail = None # true if we're after an end tag + self._comment_factory = comment_factory + self._pi_factory = pi_factory if element_factory is None: element_factory = Element self._factory = element_factory @@ -1387,8 +1397,8 @@ def __init__(self, element_factory=None): def close(self): """Flush builder buffers and return toplevel document Element.""" assert len(self._elem) == 0, "missing end tags" - assert self._last is not None, "missing toplevel element" - return self._last + assert self._root is not None, "missing toplevel element" + return self._root def _flush(self): if self._data: @@ -1417,6 +1427,8 @@ def start(self, tag, attrs): self._last = elem = self._factory(tag, attrs) if self._elem: self._elem[-1].append(elem) + elif self._root is None: + self._root = elem self._elem.append(elem) self._tail = 0 return elem @@ -1435,6 +1447,39 @@ def end(self, tag): self._tail = 1 return self._last + def comment(self, text): + """Create a comment using the comment_factory. 
+ + If no factory is provided, comments are ignored + and the text returned as is. + + *text* is the text of the comment. + """ + if self._comment_factory is None: + return text + return self._handle_single(self._comment_factory, text) + + def pi(self, target, text=None): + """Create a processing instruction using the pi_factory. + + If no factory is provided, PIs are ignored and a (target, text) + tuple is returned. + + *target* is the target name of the processing instruction. + *text* is the data of the processing instruction, or ''. + """ + if self._pi_factory is None: + return (target, text) + return self._handle_single(self._pi_factory, target, text) + + def _handle_single(self, factory, *args): + self._flush() + self._last = elem = factory(*args) + if self._elem: + self._elem[-1].append(elem) + self._tail = 1 + return elem + # also see ElementTree and TreeBuilder class XMLParser: @@ -1519,6 +1564,15 @@ def handler(prefix, uri, event=event_name, append=append): def handler(prefix, event=event_name, append=append): append((event, None)) parser.EndNamespaceDeclHandler = handler + elif event_name == 'comment': + def handler(text, event=event_name, append=append, self=self): + append((event, self.target.comment(text))) + parser.CommentHandler = handler + elif event_name == 'pi': + def handler(pi_target, data, event=event_name, append=append, + self=self): + append((event, self.target.pi(pi_target, data))) + parser.ProcessingInstructionHandler = handler else: raise ValueError("unknown event %r" % event_name) diff --git a/Misc/NEWS.d/next/Library/2019-04-20-09-50-32.bpo-36673.XF4Egb.rst b/Misc/NEWS.d/next/Library/2019-04-20-09-50-32.bpo-36673.XF4Egb.rst new file mode 100644 index 00000000000000..76bf914e22b196 --- /dev/null +++ b/Misc/NEWS.d/next/Library/2019-04-20-09-50-32.bpo-36673.XF4Egb.rst @@ -0,0 +1,3 @@ +The TreeBuilder and XMLPullParser in xml.etree.ElementTree gained support +for parsing comments and processing instructions. +Patch by Stefan Behnel. 
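For illustration, a minimal sketch of how the new `TreeBuilder.comment()` and `TreeBuilder.pi()` methods might be called directly. It assumes an interpreter that already ships this change (Python 3.8 or later). The factories are passed explicitly because the behaviour with no factory differs between this patch and later patches in the series. The variable names and sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET

# Explicit factories, so the result does not depend on which default
# the builder falls back to when no factory is given.
builder = ET.TreeBuilder(comment_factory=ET.Comment, pi_factory=ET.PI)

comment = builder.comment("ctext")
pi = builder.pi("pitarget", "some text")

print(comment.tag is ET.Comment, comment.text)
print(pi.tag is ET.PI, pi.text)

# Any callable with a matching signature can stand in as a factory;
# `len` is purely illustrative and mirrors the tests in this patch.
counting = ET.TreeBuilder(comment_factory=len)
comment_length = counting.comment("ctext")
print(comment_length)
```

Note that `ET.PI` joins the target and the data with a single space, so the processing instruction's `text` holds both parts.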
diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c index 1e58cd05b51237..663337d42dc768 100644 --- a/Modules/_elementtree.c +++ b/Modules/_elementtree.c @@ -2385,6 +2385,8 @@ typedef struct { Py_ssize_t index; /* current stack size (0 means empty) */ PyObject *element_factory; + PyObject *comment_factory; + PyObject *pi_factory; /* element tracing */ PyObject *events_append; /* the append method of the list of events, or NULL */ @@ -2392,6 +2394,8 @@ typedef struct { PyObject *end_event_obj; PyObject *start_ns_event_obj; PyObject *end_ns_event_obj; + PyObject *comment_event_obj; + PyObject *pi_event_obj; } TreeBuilderObject; #define TreeBuilder_CheckExact(op) (Py_TYPE(op) == &TreeBuilder_Type) @@ -2413,6 +2417,8 @@ treebuilder_new(PyTypeObject *type, PyObject *args, PyObject *kwds) t->data = NULL; t->element_factory = NULL; + t->comment_factory = NULL; + t->pi_factory = NULL; t->stack = PyList_New(20); if (!t->stack) { Py_DECREF(t->this); @@ -2425,6 +2431,7 @@ treebuilder_new(PyTypeObject *type, PyObject *args, PyObject *kwds) t->events_append = NULL; t->start_event_obj = t->end_event_obj = NULL; t->start_ns_event_obj = t->end_ns_event_obj = NULL; + t->comment_event_obj = t->pi_event_obj = NULL; } return (PyObject *)t; } @@ -2433,17 +2440,35 @@ treebuilder_new(PyTypeObject *type, PyObject *args, PyObject *kwds) _elementtree.TreeBuilder.__init__ element_factory: object = NULL + comment_factory: object = NULL + pi_factory: object = NULL [clinic start generated code]*/ static int _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, - PyObject *element_factory) -/*[clinic end generated code: output=91cfa7558970ee96 input=1b424eeefc35249c]*/ + PyObject *element_factory, + PyObject *comment_factory, + PyObject *pi_factory) +/*[clinic end generated code: output=da49f5ab76aee6d6 input=9b7d938a273ab7ad]*/ { - if (element_factory) { + if (element_factory && element_factory != Py_None) { Py_INCREF(element_factory); Py_XSETREF(self->element_factory, 
element_factory); + } else { + Py_CLEAR(self->element_factory); + } + if (comment_factory && comment_factory != Py_None) { + Py_INCREF(comment_factory); + Py_XSETREF(self->comment_factory, comment_factory); + } else { + Py_CLEAR(self->comment_factory); + } + if (pi_factory && pi_factory != Py_None) { + Py_INCREF(pi_factory); + Py_XSETREF(self->pi_factory, pi_factory); + } else { + Py_CLEAR(self->pi_factory); } return 0; @@ -2452,6 +2477,8 @@ _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, static int treebuilder_gc_traverse(TreeBuilderObject *self, visitproc visit, void *arg) { + Py_VISIT(self->pi_event_obj); + Py_VISIT(self->comment_event_obj); Py_VISIT(self->end_ns_event_obj); Py_VISIT(self->start_ns_event_obj); Py_VISIT(self->end_event_obj); @@ -2462,6 +2489,8 @@ treebuilder_gc_traverse(TreeBuilderObject *self, visitproc visit, void *arg) Py_VISIT(self->last); Py_VISIT(self->data); Py_VISIT(self->stack); + Py_VISIT(self->pi_factory); + Py_VISIT(self->comment_factory); Py_VISIT(self->element_factory); return 0; } @@ -2469,6 +2498,8 @@ treebuilder_gc_traverse(TreeBuilderObject *self, visitproc visit, void *arg) static int treebuilder_gc_clear(TreeBuilderObject *self) { + Py_CLEAR(self->pi_event_obj); + Py_CLEAR(self->comment_event_obj); Py_CLEAR(self->end_ns_event_obj); Py_CLEAR(self->start_ns_event_obj); Py_CLEAR(self->end_event_obj); @@ -2478,6 +2509,8 @@ treebuilder_gc_clear(TreeBuilderObject *self) Py_CLEAR(self->data); Py_CLEAR(self->last); Py_CLEAR(self->this); + Py_CLEAR(self->pi_factory); + Py_CLEAR(self->comment_factory); Py_CLEAR(self->element_factory); Py_CLEAR(self->root); return 0; @@ -2569,7 +2602,7 @@ treebuilder_append_event(TreeBuilderObject *self, PyObject *action, PyObject *event = PyTuple_Pack(2, action, node); if (event == NULL) return -1; - res = PyObject_CallFunctionObjArgs(self->events_append, event, NULL); + res = _PyObject_FastCall(self->events_append, &event, 1); Py_DECREF(event); if (res == NULL) return -1; @@ -2593,7 
+2626,7 @@ treebuilder_handle_start(TreeBuilderObject* self, PyObject* tag, return NULL; } - if (!self->element_factory || self->element_factory == Py_None) { + if (!self->element_factory) { node = create_new_element(tag, attrib); } else if (attrib == Py_None) { attrib = PyDict_New(); @@ -2721,6 +2754,84 @@ treebuilder_handle_end(TreeBuilderObject* self, PyObject* tag) return (PyObject*) self->last; } +LOCAL(PyObject*) +treebuilder_handle_comment(TreeBuilderObject* self, PyObject* text) +{ + PyObject* comment = NULL; + PyObject* this; + + if (treebuilder_flush_data(self) < 0) { + return NULL; + } + + if (self->comment_factory) { + comment = _PyObject_FastCall(self->comment_factory, &text, 1); + if (!comment) + return NULL; + + this = self->this; + if (this != Py_None) { + if (treebuilder_add_subelement(this, comment) < 0) + goto error; + } + } else { + Py_INCREF(text); + comment = text; + } + + if (self->events_append && self->comment_event_obj) { + if (treebuilder_append_event(self, self->comment_event_obj, comment) < 0) + goto error; + } + + return comment; + + error: + Py_DECREF(comment); + return NULL; +} + +LOCAL(PyObject*) +treebuilder_handle_pi(TreeBuilderObject* self, PyObject* target, PyObject* text) +{ + PyObject* pi = NULL; + PyObject* this; + PyObject* stack[2] = {target, text}; + + if (treebuilder_flush_data(self) < 0) { + return NULL; + } + + if (self->pi_factory) { + pi = _PyObject_FastCall(self->pi_factory, stack, 2); + if (!pi) { + return NULL; + } + + this = self->this; + if (this != Py_None) { + if (treebuilder_add_subelement(this, pi) < 0) + goto error; + } + } else { + pi = PyTuple_Pack(2, target, text); + if (!pi) { + return NULL; + } + } + + if (self->events_append && self->pi_event_obj) { + if (treebuilder_append_event(self, self->pi_event_obj, pi) < 0) + goto error; + } + + return pi; + + error: + Py_DECREF(pi); + return NULL; +} + /* -------------------------------------------------------------------- */ /* methods (in alphabetical order) 
*/ @@ -2754,6 +2865,38 @@ _elementtree_TreeBuilder_end(TreeBuilderObject *self, PyObject *tag) return treebuilder_handle_end(self, tag); } +/*[clinic input] +_elementtree.TreeBuilder.comment + + text: object + / + +[clinic start generated code]*/ + +static PyObject * +_elementtree_TreeBuilder_comment(TreeBuilderObject *self, PyObject *text) +/*[clinic end generated code: output=22835be41deeaa27 input=47e7ebc48ed01dfa]*/ +{ + return treebuilder_handle_comment(self, text); +} + +/*[clinic input] +_elementtree.TreeBuilder.pi + + target: object + text: object = None + / + +[clinic start generated code]*/ + +static PyObject * +_elementtree_TreeBuilder_pi_impl(TreeBuilderObject *self, PyObject *target, + PyObject *text) +/*[clinic end generated code: output=21eb95ec9d04d1d9 input=349342bd79c35570]*/ +{ + return treebuilder_handle_pi(self, target, text); +} + LOCAL(PyObject*) treebuilder_done(TreeBuilderObject* self) { @@ -2925,7 +3068,7 @@ expat_set_error(enum XML_Error error_code, Py_ssize_t line, Py_ssize_t column, if (errmsg == NULL) return; - error = PyObject_CallFunctionObjArgs(st->parseerror_obj, errmsg, NULL); + error = _PyObject_FastCall(st->parseerror_obj, &errmsg, 1); Py_DECREF(errmsg); if (!error) return; @@ -2988,7 +3131,7 @@ expat_default_handler(XMLParserObject* self, const XML_Char* data_in, (TreeBuilderObject*) self->target, value ); else if (self->handle_data) - res = PyObject_CallFunctionObjArgs(self->handle_data, value, NULL); + res = _PyObject_FastCall(self->handle_data, &value, 1); else res = NULL; Py_XDECREF(res); @@ -3099,7 +3242,7 @@ expat_data_handler(XMLParserObject* self, const XML_Char* data_in, /* shortcut */ res = treebuilder_handle_data((TreeBuilderObject*) self->target, data); else if (self->handle_data) - res = PyObject_CallFunctionObjArgs(self->handle_data, data, NULL); + res = _PyObject_FastCall(self->handle_data, &data, 1); else res = NULL; @@ -3126,7 +3269,7 @@ expat_end_handler(XMLParserObject* self, const XML_Char* tag_in) else if 
(self->handle_end) { tag = makeuniversal(self, tag_in); if (tag) { - res = PyObject_CallFunctionObjArgs(self->handle_end, tag, NULL); + res = _PyObject_FastCall(self->handle_end, &tag, 1); Py_DECREF(tag); } } @@ -3176,21 +3319,31 @@ expat_end_ns_handler(XMLParserObject* self, const XML_Char* prefix_in) static void expat_comment_handler(XMLParserObject* self, const XML_Char* comment_in) { - PyObject* comment; - PyObject* res; + PyObject* comment = NULL; + PyObject* res = NULL; if (PyErr_Occurred()) return; - if (self->handle_comment) { + if (TreeBuilder_CheckExact(self->target)) { + /* shortcut */ + TreeBuilderObject *target = (TreeBuilderObject*) self->target; + comment = PyUnicode_DecodeUTF8(comment_in, strlen(comment_in), "strict"); - if (comment) { - res = PyObject_CallFunctionObjArgs(self->handle_comment, - comment, NULL); - Py_XDECREF(res); - Py_DECREF(comment); - } + if (!comment) + return; /* parser will look for errors */ + + res = treebuilder_handle_comment(target, comment); + } else if (self->handle_comment) { + comment = PyUnicode_DecodeUTF8(comment_in, strlen(comment_in), "strict"); + if (!comment) + return; + + res = _PyObject_FastCall(self->handle_comment, &comment, 1); } + + Py_XDECREF(res); + Py_DECREF(comment); } static void @@ -3258,26 +3411,30 @@ static void expat_pi_handler(XMLParserObject* self, const XML_Char* target_in, const XML_Char* data_in) { - PyObject* target; - PyObject* data; + PyObject* parcel; PyObject* res; if (PyErr_Occurred()) return; - if (self->handle_pi) { - target = PyUnicode_DecodeUTF8(target_in, strlen(target_in), "strict"); - data = PyUnicode_DecodeUTF8(data_in, strlen(data_in), "strict"); - if (target && data) { - res = PyObject_CallFunctionObjArgs(self->handle_pi, - target, data, NULL); - Py_XDECREF(res); - Py_DECREF(data); - Py_DECREF(target); - } else { - Py_XDECREF(data); - Py_XDECREF(target); + if (TreeBuilder_CheckExact(self->target)) { + /* shortcut: TreeBuilder does not handle PIs */ + TreeBuilderObject *target = 
(TreeBuilderObject*) self->target; + + if (target->events_append && target->pi_event_obj) { + parcel = Py_BuildValue("ss", target_in, data_in); + if (!parcel) + return; + treebuilder_append_event(target, target->pi_event_obj, parcel); + Py_DECREF(parcel); } + } else if (self->handle_pi) { + parcel = Py_BuildValue("ss", target_in, data_in); + if (!parcel) + return; + res = PyObject_Call(self->handle_pi, parcel, NULL); + Py_XDECREF(res); + Py_DECREF(parcel); } } @@ -3695,6 +3852,8 @@ _elementtree_XMLParser__setevents_impl(XMLParserObject *self, Py_CLEAR(target->end_event_obj); Py_CLEAR(target->start_ns_event_obj); Py_CLEAR(target->end_ns_event_obj); + Py_CLEAR(target->comment_event_obj); + Py_CLEAR(target->pi_event_obj); if (events_to_report == Py_None) { /* default is "end" only */ @@ -3740,6 +3899,18 @@ _elementtree_XMLParser__setevents_impl(XMLParserObject *self, (XML_StartNamespaceDeclHandler) expat_start_ns_handler, (XML_EndNamespaceDeclHandler) expat_end_ns_handler ); + } else if (strcmp(event_name, "comment") == 0) { + Py_XSETREF(target->comment_event_obj, event_name_obj); + EXPAT(SetCommentHandler)( + self->parser, + (XML_CommentHandler) expat_comment_handler + ); + } else if (strcmp(event_name, "pi") == 0) { + Py_XSETREF(target->pi_event_obj, event_name_obj); + EXPAT(SetProcessingInstructionHandler)( + self->parser, + (XML_ProcessingInstructionHandler) expat_pi_handler + ); } else { Py_DECREF(event_name_obj); Py_DECREF(events_seq); @@ -3882,6 +4053,8 @@ static PyMethodDef treebuilder_methods[] = { _ELEMENTTREE_TREEBUILDER_DATA_METHODDEF _ELEMENTTREE_TREEBUILDER_START_METHODDEF _ELEMENTTREE_TREEBUILDER_END_METHODDEF + _ELEMENTTREE_TREEBUILDER_COMMENT_METHODDEF + _ELEMENTTREE_TREEBUILDER_PI_METHODDEF _ELEMENTTREE_TREEBUILDER_CLOSE_METHODDEF {NULL, NULL} }; diff --git a/Modules/clinic/_elementtree.c.h b/Modules/clinic/_elementtree.c.h index d239c802583c6c..b1c5f8e25d205f 100644 --- a/Modules/clinic/_elementtree.c.h +++ b/Modules/clinic/_elementtree.c.h @@ 
-635,30 +635,46 @@ _elementtree_Element_set(ElementObject *self, PyObject *const *args, Py_ssize_t static int _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, - PyObject *element_factory); + PyObject *element_factory, + PyObject *comment_factory, + PyObject *pi_factory); static int _elementtree_TreeBuilder___init__(PyObject *self, PyObject *args, PyObject *kwargs) { int return_value = -1; - static const char * const _keywords[] = {"element_factory", NULL}; + static const char * const _keywords[] = {"element_factory", "comment_factory", "pi_factory", NULL}; static _PyArg_Parser _parser = {NULL, _keywords, "TreeBuilder", 0}; - PyObject *argsbuf[1]; + PyObject *argsbuf[3]; PyObject * const *fastargs; Py_ssize_t nargs = PyTuple_GET_SIZE(args); Py_ssize_t noptargs = nargs + (kwargs ? PyDict_GET_SIZE(kwargs) : 0) - 0; PyObject *element_factory = NULL; + PyObject *comment_factory = NULL; + PyObject *pi_factory = NULL; - fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 0, 1, 0, argsbuf); + fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 0, 3, 0, argsbuf); if (!fastargs) { goto exit; } if (!noptargs) { goto skip_optional_pos; } - element_factory = fastargs[0]; + if (fastargs[0]) { + element_factory = fastargs[0]; + if (!--noptargs) { + goto skip_optional_pos; + } + } + if (fastargs[1]) { + comment_factory = fastargs[1]; + if (!--noptargs) { + goto skip_optional_pos; + } + } + pi_factory = fastargs[2]; skip_optional_pos: - return_value = _elementtree_TreeBuilder___init___impl((TreeBuilderObject *)self, element_factory); + return_value = _elementtree_TreeBuilder___init___impl((TreeBuilderObject *)self, element_factory, comment_factory, pi_factory); exit: return return_value; @@ -680,6 +696,48 @@ PyDoc_STRVAR(_elementtree_TreeBuilder_end__doc__, #define _ELEMENTTREE_TREEBUILDER_END_METHODDEF \ {"end", (PyCFunction)_elementtree_TreeBuilder_end, METH_O, 
_elementtree_TreeBuilder_end__doc__}, +PyDoc_STRVAR(_elementtree_TreeBuilder_comment__doc__, +"comment($self, text, /)\n" +"--\n" +"\n"); + +#define _ELEMENTTREE_TREEBUILDER_COMMENT_METHODDEF \ + {"comment", (PyCFunction)_elementtree_TreeBuilder_comment, METH_O, _elementtree_TreeBuilder_comment__doc__}, + +PyDoc_STRVAR(_elementtree_TreeBuilder_pi__doc__, +"pi($self, target, text=None, /)\n" +"--\n" +"\n"); + +#define _ELEMENTTREE_TREEBUILDER_PI_METHODDEF \ + {"pi", (PyCFunction)(void(*)(void))_elementtree_TreeBuilder_pi, METH_FASTCALL, _elementtree_TreeBuilder_pi__doc__}, + +static PyObject * +_elementtree_TreeBuilder_pi_impl(TreeBuilderObject *self, PyObject *target, + PyObject *text); + +static PyObject * +_elementtree_TreeBuilder_pi(TreeBuilderObject *self, PyObject *const *args, Py_ssize_t nargs) +{ + PyObject *return_value = NULL; + PyObject *target; + PyObject *text = Py_None; + + if (!_PyArg_CheckPositional("pi", nargs, 1, 2)) { + goto exit; + } + target = args[0]; + if (nargs < 2) { + goto skip_optional; + } + text = args[1]; +skip_optional: + return_value = _elementtree_TreeBuilder_pi_impl(self, target, text); + +exit: + return return_value; +} + PyDoc_STRVAR(_elementtree_TreeBuilder_close__doc__, "close($self, /)\n" "--\n" @@ -853,4 +911,4 @@ _elementtree_XMLParser__setevents(XMLParserObject *self, PyObject *const *args, exit: return return_value; } -/*[clinic end generated code: output=440b5d90a4b86590 input=a9049054013a1b77]*/ +/*[clinic end generated code: output=94ec504fdbcea1d3 input=a9049054013a1b77]*/ From 2d2df114bb0b29d988ee45105951cd9b91c2d43c Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Sat, 20 Apr 2019 22:36:37 +0200 Subject: [PATCH 2/6] bpo-36673: Rewrite the comment/PI factory handling for the TreeBuilder in "_elementtree" to make it use the same factories as the ElementTree module, and to make it explicit when the comments/PIs are inserted into the tree and when they are not (which is the default). 
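As a rough sketch of the opt-in insertion this commit describes, the following compares a default builder with one created with `insert_comments=True` and `insert_pis=True`. It assumes the keyword names introduced by this patch, as released in Python 3.8. The sample document and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = "<root><!-- kept --><?target data?><a/></root>"

# Default builder: comments and PIs are parsed but left out of the tree.
plain_root = ET.fromstring(XML, parser=ET.XMLParser(target=ET.TreeBuilder()))

# Opt in explicitly to have them appended as children of the open element.
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
full_root = ET.fromstring(XML, parser=ET.XMLParser(target=builder))

print([child.tag for child in plain_root])
print([child.tag for child in full_root])
```

With the default builder only the `a` element should remain under the root. With both flags set, the comment and the processing instruction should appear before it.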
--- Doc/library/xml.etree.elementtree.rst | 32 +++--- Lib/test/test_xml_etree.py | 35 ++++--- Lib/xml/etree/ElementTree.py | 59 ++++++----- Modules/_elementtree.c | 136 ++++++++++++++++++++++---- Modules/clinic/_elementtree.c.h | 76 ++++++++++++-- 5 files changed, 258 insertions(+), 80 deletions(-) diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index 5c683c74f24e2a..1e4134aa1e4ad3 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -1026,13 +1026,13 @@ TreeBuilder Objects ^^^^^^^^^^^^^^^^^^^ -.. class:: TreeBuilder(element_factory=None, comment_factory=None, \ - pi_factory=None) +.. class:: TreeBuilder(element_factory=None, *, comment_factory=None, \ + pi_factory=None, insert_comments=False, insert_pis=False) Generic element structure builder. This builder converts a sequence of - start, data, and end method calls to a well-formed element structure. You - can use this class to build an element structure using a custom XML parser, - or a parser for some other XML-like format. + start, data, end, comment and pi method calls to a well-formed element + structure. You can use this class to build an element structure using + a custom XML parser, or a parser for some other XML-like format. *element_factory*, when given, must be a callable accepting two positional arguments: a tag and a dict of attributes. It is expected to return a new @@ -1040,10 +1040,10 @@ TreeBuilder Objects The *comment_factory* and *pi_factory* functions, when given, should behave like the :func:`Comment` and :func:`ProcessingInstruction` functions to - create comments and processing instructions. When not given, no comments - or processing instructions will be created. Note that these objects will - not currently be appended to the tree when they appear outside of the root - element. + create comments and processing instructions. When not given, the default + factories will be used. 
When *insert_comments* and/or *insert_pis* is true, + comments/pis will be inserted into the tree if they appear within the root + element (but not outside of it). .. method:: close() @@ -1068,6 +1068,7 @@ TreeBuilder Objects Opens a new element. *tag* is the element name. *attrs* is a dictionary containing element attributes. Returns the opened element. + .. method:: comment(text) Adds a comment with the given *text*. If *comment_factory* is @@ -1075,6 +1076,7 @@ TreeBuilder Objects .. versionadded:: 3.8 + .. method:: pi(target, text) Adds a processing instruction with the given *target* name and *text*. If @@ -1201,11 +1203,13 @@ XMLPullParser Objects data fed to the parser. The iterator yields ``(event, elem)`` pairs, where *event* is a string representing the type of event (e.g. ``"end"``) and *elem* is the - encountered :class:`Element` object. - For ``start-ns`` events, the ``elem`` is a tuple ``(prefix, uri)`` naming - the declared namespace mapping. For ``end-ns`` events, the ``elem`` is - :const:`None`. For ``comment`` events, the second value is the comment - text and for ``pi`` events a tuple ``(target, text)``. + encountered :class:`Element` object, or other context value as follows. + + * ``start``, ``end``: the current Element. + * ``comment``, ``pi``: the current comment / processing instruction + * ``start-ns``: a tuple ``(prefix, uri)`` naming the declared namespace + mapping. + * ``end-ns``: :const:`None` (this may change in a future version) Events provided in a previous call to :meth:`read_events` will not be yielded again. Events are consumed from the internal queue only when 
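The event values listed in the documentation hunk above can be sketched with a small pull-parser session. This assumes the behaviour as released in Python 3.8, where `comment` and `pi` events carry Comment and ProcessingInstruction elements. The helper `describe` and the sample document are hypothetical and not part of the patch.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "comment", "pi", "end"))
parser.feed("<root><!-- note --><?target some data?><child/></root>")
parser.close()  # finish parsing; already queued events stay readable


def describe(event, value):
    """Summarise one (event, value) pair as a printable tuple."""
    if event in ("comment", "pi"):
        # The value is a Comment / ProcessingInstruction element.
        return (event, value.text)
    return (event, value.tag)


summary = [describe(event, value) for event, value in parser.read_events()]
for item in summary:
    print(item)
```

Calling `close()` before `read_events()` is a defensive choice here, so the sketch does not rely on how much input the underlying parser has processed after a single `feed()`.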
Events are consumed from the internal queue only when diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index c022906bd938bf..94a22882cb8343 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -1194,7 +1194,10 @@ def _feed(self, parser, data, chunk_size=None): parser.feed(data[i:i+chunk_size]) def assert_events(self, parser, expected): - self.assertEqual(list(parser.read_events()), expected) + self.assertEqual( + [(event, (elem.tag, elem.text)) + for event, elem in parser.read_events()], + expected) def assert_event_tags(self, parser, expected): events = parser.read_events() @@ -1321,30 +1324,29 @@ def test_events(self): def test_events_comment(self): parser = ET.XMLPullParser(events=('start', 'comment', 'end')) self._feed(parser, "\n") - self.assert_events(parser, [('comment', ' text here ')]) + self.assert_events(parser, [('comment', (ET.Comment, ' text here '))]) self._feed(parser, "\n") - self.assert_events(parser, [('comment', ' more text here ')]) + self.assert_events(parser, [('comment', (ET.Comment, ' more text here '))]) self._feed(parser, "text") self.assert_event_tags(parser, [('start', 'root-tag')]) self._feed(parser, "\n") - self.assert_events(parser, [('comment', ' inner comment')]) + self.assert_events(parser, [('comment', (ET.Comment, ' inner comment'))]) self._feed(parser, "\n") self.assert_event_tags(parser, [('end', 'root-tag')]) self._feed(parser, "\n") - self.assert_events(parser, [('comment', ' outer comment ')]) + self.assert_events(parser, [('comment', (ET.Comment, ' outer comment '))]) parser = ET.XMLPullParser(events=('comment',)) self._feed(parser, "\n") - self.assert_events(parser, [('comment', ' text here ')]) + self.assert_events(parser, [('comment', (ET.Comment, ' text here '))]) def test_events_pi(self): parser = ET.XMLPullParser(events=('start', 'pi', 'end')) self._feed(parser, "\n") - self.assert_events(parser, [('pi', ('pitarget', ''))]) + self.assert_events(parser, [('pi', (ET.PI, 
'pitarget'))]) parser = ET.XMLPullParser(events=('pi',)) self._feed(parser, "\n") - self.assert_events(parser, [('pi', ('pitarget', 'some text '))]) - + self.assert_events(parser, [('pi', (ET.PI, 'pitarget some text '))]) def test_events_sequence(self): # Test that events can be some sequence that's not just a tuple or list @@ -1365,7 +1367,6 @@ def __next__(self): self._feed(parser, "bar") self.assert_event_tags(parser, [('start', 'foo'), ('end', 'foo')]) - def test_unknown_event(self): with self.assertRaises(ValueError): ET.XMLPullParser(events=('start', 'end', 'bogus')) @@ -2693,7 +2694,8 @@ class DummyBuilder(BaseDummyBuilder): def test_treebuilder_comment(self): b = ET.TreeBuilder() - self.assertEqual(b.comment('ctext'), 'ctext') + self.assertEqual(b.comment('ctext').tag, ET.Comment) + self.assertEqual(b.comment('ctext').text, 'ctext') b = ET.TreeBuilder(comment_factory=ET.Comment) self.assertEqual(b.comment('ctext').tag, ET.Comment) @@ -2704,7 +2706,8 @@ def test_treebuilder_comment(self): def test_treebuilder_pi(self): b = ET.TreeBuilder() - self.assertEqual(b.pi('target', None), ('target', None)) + self.assertEqual(b.pi('target', None).tag, ET.PI) + self.assertEqual(b.pi('target', None).text, 'target') b = ET.TreeBuilder(pi_factory=ET.PI) self.assertEqual(b.pi('target').tag, ET.PI) @@ -3408,6 +3411,12 @@ def test_main(module=None): # Copy the path cache (should be empty) path_cache = ElementPath._cache ElementPath._cache = path_cache.copy() + # Align the Comment/PI factories. 
+ if hasattr(ET, '_set_factories'): + old_factories = ET._set_factories(ET.Comment, ET.PI) + else: + old_factories = None + try: support.run_unittest(*test_classes) finally: @@ -3416,6 +3425,8 @@ def test_main(module=None): nsmap.clear() nsmap.update(nsmap_copy) ElementPath._cache = path_cache + if old_factories is not None: + ET._set_factories(*old_factories) # don't interfere with subsequent tests ET = pyET = None diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index c2fab3798d87ab..c6400480f5b4b4 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1374,22 +1374,30 @@ class TreeBuilder: *element_factory* is an optional element factory which is called to create new Element instances, as necessary. - *comment_factory* is a factory to create comments. If not provided, - comments will not be inserted into the tree and "comment" pull parser - events will only return the plain text. + *comment_factory* is a factory to create comments to be used instead of + the standard factory. If *insert_comments* is false (the default), + comments will not be inserted into the tree. - *pi_factory* is a factory to create processing instructions. If not - provided, PIs will not be inserted into the tree and "pi" pull parser - events will only return a (target, text) tuple. + *pi_factory* is a factory to create processing instructions to be used + instead of the standard factory. If *insert_pis* is false (the default), + processing instructions will not be inserted into the tree. 
""" - def __init__(self, element_factory=None, comment_factory=None, pi_factory=None): + def __init__(self, element_factory=None, *, + comment_factory=None, pi_factory=None, + insert_comments=False, insert_pis=False): self._data = [] # data collector self._elem = [] # element stack self._last = None # last element self._root = None # root element self._tail = None # true if we're after an end tag + if comment_factory is None: + comment_factory = Comment self._comment_factory = comment_factory + self.insert_comments = insert_comments + if pi_factory is None: + pi_factory = ProcessingInstruction self._pi_factory = pi_factory + self.insert_pis = insert_pis if element_factory is None: element_factory = Element self._factory = element_factory @@ -1450,34 +1458,28 @@ def end(self, tag): def comment(self, text): """Create a comment using the comment_factory. - If no factory is provided, comments are ignored - and the text returned as is. - *text* is the text of the comment. """ - if self._comment_factory is None: - return text - return self._handle_single(self._comment_factory, text) + return self._handle_single( + self._comment_factory, self.insert_comments, text) def pi(self, target, text=None): """Create a processing instruction using the pi_factory. - If no factory is provided, PIs are ignored and a (target, text) - tuple is returned. - *target* is the target name of the processing instruction. *text* is the data of the processing instruction, or ''. 
""" - if self._pi_factory is None: - return (target, text) - return self._handle_single(self._pi_factory, target, text) - - def _handle_single(self, factory, *args): - self._flush() - self._last = elem = factory(*args) - if self._elem: - self._elem[-1].append(elem) - self._tail = 1 + return self._handle_single( + self._pi_factory, self.insert_pis, target, text) + + def _handle_single(self, factory, insert, *args): + elem = factory(*args) + if insert: + self._flush() + self._last = elem + if self._elem: + self._elem[-1].append(elem) + self._tail = 1 return elem @@ -1694,7 +1696,10 @@ def close(self): # (see tests) _Element_Py = Element - # Element, SubElement, ParseError, TreeBuilder, XMLParser + # Element, SubElement, ParseError, TreeBuilder, XMLParser, _set_factories from _elementtree import * + from _elementtree import _set_factories except ImportError: pass +else: + _set_factories(Comment, ProcessingInstruction) diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c index 663337d42dc768..5481c61678712b 100644 --- a/Modules/_elementtree.c +++ b/Modules/_elementtree.c @@ -92,6 +92,8 @@ typedef struct { PyObject *parseerror_obj; PyObject *deepcopy_obj; PyObject *elementpath_obj; + PyObject *comment_factory; + PyObject *pi_factory; } elementtreestate; static struct PyModuleDef elementtreemodule; @@ -114,6 +116,8 @@ elementtree_clear(PyObject *m) Py_CLEAR(st->parseerror_obj); Py_CLEAR(st->deepcopy_obj); Py_CLEAR(st->elementpath_obj); + Py_CLEAR(st->comment_factory); + Py_CLEAR(st->pi_factory); return 0; } @@ -124,6 +128,8 @@ elementtree_traverse(PyObject *m, visitproc visit, void *arg) Py_VISIT(st->parseerror_obj); Py_VISIT(st->deepcopy_obj); Py_VISIT(st->elementpath_obj); + Py_VISIT(st->comment_factory); + Py_VISIT(st->pi_factory); return 0; } @@ -2396,6 +2402,9 @@ typedef struct { PyObject *end_ns_event_obj; PyObject *comment_event_obj; PyObject *pi_event_obj; + + char insert_comments; + char insert_pis; } TreeBuilderObject; #define TreeBuilder_CheckExact(op) 
(Py_TYPE(op) == &TreeBuilder_Type) @@ -2432,6 +2441,7 @@ treebuilder_new(PyTypeObject *type, PyObject *args, PyObject *kwds) t->start_event_obj = t->end_event_obj = NULL; t->start_ns_event_obj = t->end_ns_event_obj = NULL; t->comment_event_obj = t->pi_event_obj = NULL; + t->insert_comments = t->insert_pis = 0; } return (PyObject *)t; } @@ -2440,8 +2450,11 @@ treebuilder_new(PyTypeObject *type, PyObject *args, PyObject *kwds) _elementtree.TreeBuilder.__init__ element_factory: object = NULL + * comment_factory: object = NULL pi_factory: object = NULL + insert_comments: bool = False + insert_pis: bool = False [clinic start generated code]*/ @@ -2449,8 +2462,9 @@ static int _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, PyObject *element_factory, PyObject *comment_factory, - PyObject *pi_factory) -/*[clinic end generated code: output=da49f5ab76aee6d6 input=9b7d938a273ab7ad]*/ + PyObject *pi_factory, + int insert_comments, int insert_pis) +/*[clinic end generated code: output=8571d4dcadfdf952 input=1f967b5c245e0a71]*/ { if (element_factory && element_factory != Py_None) { Py_INCREF(element_factory); @@ -2458,17 +2472,31 @@ _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, } else { Py_CLEAR(self->element_factory); } - if (comment_factory && comment_factory != Py_None) { + + if (!comment_factory || comment_factory == Py_None) { + elementtreestate *st = ET_STATE_GLOBAL; + comment_factory = st->comment_factory; + } + if (comment_factory) { Py_INCREF(comment_factory); Py_XSETREF(self->comment_factory, comment_factory); + self->insert_comments = insert_comments; } else { Py_CLEAR(self->comment_factory); + self->insert_comments = 0; } - if (pi_factory && pi_factory != Py_None) { + + if (!pi_factory || pi_factory == Py_None) { + elementtreestate *st = ET_STATE_GLOBAL; + pi_factory = st->pi_factory; + } + if (pi_factory) { Py_INCREF(pi_factory); Py_XSETREF(self->pi_factory, pi_factory); + self->insert_pis = insert_pis; } else { 
Py_CLEAR(self->pi_factory); + self->insert_pis = 0; } return 0; @@ -2527,6 +2555,57 @@ treebuilder_dealloc(TreeBuilderObject *self) /* -------------------------------------------------------------------- */ /* helpers for handling of arbitrary element-like objects */ +/*[clinic input] +_elementtree._set_factories + + comment_factory: object + pi_factory: object + / + +Change the factories used to create comments and processing instructions. + +For internal use only. +[clinic start generated code]*/ + +static PyObject * +_elementtree__set_factories_impl(PyObject *module, PyObject *comment_factory, + PyObject *pi_factory) +/*[clinic end generated code: output=813b408adee26535 input=99d17627aea7fb3b]*/ +{ + elementtreestate *st = ET_STATE_GLOBAL; + PyObject *old; + + if (!PyCallable_Check(comment_factory) && comment_factory != Py_None) { + PyErr_Format(PyExc_TypeError, "Comment factory must be callable, not %.100s", + Py_TYPE(comment_factory)->tp_name); + return NULL; + } + if (!PyCallable_Check(pi_factory) && pi_factory != Py_None) { + PyErr_Format(PyExc_TypeError, "PI factory must be callable, not %.100s", + Py_TYPE(pi_factory)->tp_name); + return NULL; + } + + old = PyTuple_Pack(2, + st->comment_factory ? st->comment_factory : Py_None, + st->pi_factory ? 
st->pi_factory : Py_None); + + if (comment_factory == Py_None) { + Py_CLEAR(st->comment_factory); + } else { + Py_INCREF(comment_factory); + Py_XSETREF(st->comment_factory, comment_factory); + } + if (pi_factory == Py_None) { + Py_CLEAR(st->pi_factory); + } else { + Py_INCREF(pi_factory); + Py_XSETREF(st->pi_factory, pi_factory); + } + + return old; +} + static int treebuilder_set_element_text_or_tail(PyObject *element, PyObject **data, PyObject **dest, _Py_Identifier *name) @@ -2770,7 +2849,7 @@ treebuilder_handle_comment(TreeBuilderObject* self, PyObject* text) return NULL; this = self->this; - if (this != Py_None) { + if (self->insert_comments && this != Py_None) { if (treebuilder_add_subelement(this, comment) < 0) goto error; } @@ -2809,7 +2888,7 @@ treebuilder_handle_pi(TreeBuilderObject* self, PyObject* target, PyObject* text) } this = self->this; - if (this != Py_None) { + if (self->insert_pis && this != Py_None) { if (treebuilder_add_subelement(this, pi) < 0) goto error; } @@ -3411,31 +3490,51 @@ static void expat_pi_handler(XMLParserObject* self, const XML_Char* target_in, const XML_Char* data_in) { - PyObject* parcel; + PyObject* pi_target = NULL; + PyObject* data; PyObject* res; + PyObject* stack[2]; if (PyErr_Occurred()) return; if (TreeBuilder_CheckExact(self->target)) { - /* shortcut: TreeBuilder does not handle PIs */ + /* shortcut */ TreeBuilderObject *target = (TreeBuilderObject*) self->target; if (target->events_append && target->pi_event_obj) { - parcel = Py_BuildValue("ss", target_in, data_in); - if (!parcel) - return; - treebuilder_append_event(target, target->pi_event_obj, parcel); - Py_DECREF(parcel); + pi_target = PyUnicode_DecodeUTF8(target_in, strlen(target_in), "strict"); + if (!pi_target) + goto error; + data = PyUnicode_DecodeUTF8(data_in, strlen(data_in), "strict"); + if (!data) + goto error; + res = treebuilder_handle_pi(target, pi_target, data); + Py_XDECREF(res); + Py_DECREF(data); + Py_DECREF(pi_target); } } else if 
(self->handle_pi) { - parcel = Py_BuildValue("ss", target_in, data_in); - if (!parcel) - return; - res = PyObject_Call(self->handle_pi, parcel, NULL); + pi_target = PyUnicode_DecodeUTF8(target_in, strlen(target_in), "strict"); + if (!pi_target) + goto error; + data = PyUnicode_DecodeUTF8(data_in, strlen(data_in), "strict"); + if (!data) + goto error; + + stack[0] = pi_target; + stack[1] = data; + res = _PyObject_FastCall(self->handle_pi, stack, 2); Py_XDECREF(res); - Py_DECREF(parcel); + Py_DECREF(data); + Py_DECREF(pi_target); } + + return; + + error: + Py_XDECREF(pi_target); + return; } /* -------------------------------------------------------------------- */ @@ -4156,6 +4255,7 @@ static PyTypeObject XMLParser_Type = { static PyMethodDef _functions[] = { {"SubElement", (PyCFunction)(void(*)(void)) subelement, METH_VARARGS | METH_KEYWORDS}, + _ELEMENTTREE__SET_FACTORIES_METHODDEF {NULL, NULL} }; diff --git a/Modules/clinic/_elementtree.c.h b/Modules/clinic/_elementtree.c.h index b1c5f8e25d205f..0f55480140b315 100644 --- a/Modules/clinic/_elementtree.c.h +++ b/Modules/clinic/_elementtree.c.h @@ -637,23 +637,26 @@ static int _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, PyObject *element_factory, PyObject *comment_factory, - PyObject *pi_factory); + PyObject *pi_factory, + int insert_comments, int insert_pis); static int _elementtree_TreeBuilder___init__(PyObject *self, PyObject *args, PyObject *kwargs) { int return_value = -1; - static const char * const _keywords[] = {"element_factory", "comment_factory", "pi_factory", NULL}; + static const char * const _keywords[] = {"element_factory", "comment_factory", "pi_factory", "insert_comments", "insert_pis", NULL}; static _PyArg_Parser _parser = {NULL, _keywords, "TreeBuilder", 0}; - PyObject *argsbuf[3]; + PyObject *argsbuf[5]; PyObject * const *fastargs; Py_ssize_t nargs = PyTuple_GET_SIZE(args); Py_ssize_t noptargs = nargs + (kwargs ? 
PyDict_GET_SIZE(kwargs) : 0) - 0; PyObject *element_factory = NULL; PyObject *comment_factory = NULL; PyObject *pi_factory = NULL; + int insert_comments = 0; + int insert_pis = 0; - fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 0, 3, 0, argsbuf); + fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 0, 1, 0, argsbuf); if (!fastargs) { goto exit; } @@ -666,15 +669,70 @@ _elementtree_TreeBuilder___init__(PyObject *self, PyObject *args, PyObject *kwar goto skip_optional_pos; } } +skip_optional_pos: + if (!noptargs) { + goto skip_optional_kwonly; + } if (fastargs[1]) { comment_factory = fastargs[1]; if (!--noptargs) { - goto skip_optional_pos; + goto skip_optional_kwonly; } } - pi_factory = fastargs[2]; -skip_optional_pos: - return_value = _elementtree_TreeBuilder___init___impl((TreeBuilderObject *)self, element_factory, comment_factory, pi_factory); + if (fastargs[2]) { + pi_factory = fastargs[2]; + if (!--noptargs) { + goto skip_optional_kwonly; + } + } + if (fastargs[3]) { + insert_comments = PyObject_IsTrue(fastargs[3]); + if (insert_comments < 0) { + goto exit; + } + if (!--noptargs) { + goto skip_optional_kwonly; + } + } + insert_pis = PyObject_IsTrue(fastargs[4]); + if (insert_pis < 0) { + goto exit; + } +skip_optional_kwonly: + return_value = _elementtree_TreeBuilder___init___impl((TreeBuilderObject *)self, element_factory, comment_factory, pi_factory, insert_comments, insert_pis); + +exit: + return return_value; +} + +PyDoc_STRVAR(_elementtree__set_factories__doc__, +"_set_factories($module, comment_factory, pi_factory, /)\n" +"--\n" +"\n" +"Change the factories used to create comments and processing instructions.\n" +"\n" +"For internal use only."); + +#define _ELEMENTTREE__SET_FACTORIES_METHODDEF \ + {"_set_factories", (PyCFunction)(void(*)(void))_elementtree__set_factories, METH_FASTCALL, _elementtree__set_factories__doc__}, + +static PyObject * 
+_elementtree__set_factories_impl(PyObject *module, PyObject *comment_factory, + PyObject *pi_factory); + +static PyObject * +_elementtree__set_factories(PyObject *module, PyObject *const *args, Py_ssize_t nargs) +{ + PyObject *return_value = NULL; + PyObject *comment_factory; + PyObject *pi_factory; + + if (!_PyArg_CheckPositional("_set_factories", nargs, 2, 2)) { + goto exit; + } + comment_factory = args[0]; + pi_factory = args[1]; + return_value = _elementtree__set_factories_impl(module, comment_factory, pi_factory); exit: return return_value; @@ -911,4 +969,4 @@ _elementtree_XMLParser__setevents(XMLParserObject *self, PyObject *const *args, exit: return return_value; } -/*[clinic end generated code: output=94ec504fdbcea1d3 input=a9049054013a1b77]*/ +/*[clinic end generated code: output=386a68425d072b5c input=a9049054013a1b77]*/ From aa52e04255765c47cce2abd4c3b9845861a24eb5 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Sat, 20 Apr 2019 13:18:29 +0200 Subject: [PATCH 3/6] bpo-36676: Implement namespace prefix aware parsing support for the XMLParser target in ElementTree. --- Doc/library/xml.etree.elementtree.rst | 12 ++ Lib/test/test_xml_etree.py | 71 +++++++++- Lib/xml/etree/ElementTree.py | 30 +++- .../2019-04-20-13-10-34.bpo-36676.XF4Egb.rst | 3 + Modules/_elementtree.c | 133 +++++++++++++++--- 5 files changed, 221 insertions(+), 28 deletions(-) create mode 100644 Misc/NEWS.d/next/Library/2019-04-20-13-10-34.bpo-36676.XF4Egb.rst diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index 1e4134aa1e4ad3..413fe7485cfc7e 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -1169,6 +1169,18 @@ XMLParser Objects >>> parser.close() 4 + Additionally, if the target object provides one or both of the methods + ``start_ns(self, prefix, uri)`` and ``end_ns(self, prefix)``, then they + are called whenever the parser encounters a new namespace declaration. 
+ The ``prefix`` is ``''`` for the default namespace and the declared + namespace prefix otherwise. The ``start_ns()`` method is called before + the ``start()`` callback of the opening tag that defines the namespace, + and the ``end_ns()`` method is called after the corresponding ``end()`` + callback. + + .. versionchanged:: 3.8 + The ``start_ns()`` and ``end_ns()`` callbacks were added. + .. _elementtree-xmlpullparser-objects: diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index 94a22882cb8343..29aee69ed47757 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -18,7 +18,7 @@ import warnings import weakref -from itertools import product +from itertools import product, islice from test import support from test.support import TESTFN, findfile, import_fresh_module, gc_collect, swap_attr @@ -693,12 +693,17 @@ def pi(self, target, data): self.append(("pi", target, data)) def comment(self, data): self.append(("comment", data)) + def start_ns(self, prefix, uri): + self.append(("start-ns", prefix, uri)) + def end_ns(self, prefix): + self.append(("end-ns", prefix)) builder = Builder() parser = ET.XMLParser(target=builder) parser.feed(data) self.assertEqual(builder, [ ('pi', 'pi', 'data'), ('comment', ' comment '), + ('start-ns', '', 'namespace'), ('start', '{namespace}root'), ('start', '{namespace}element'), ('end', '{namespace}element'), @@ -707,6 +712,7 @@ def comment(self, data): ('start', '{namespace}empty-element'), ('end', '{namespace}empty-element'), ('end', '{namespace}root'), + ('end-ns', ''), ]) @@ -1193,14 +1199,19 @@ def _feed(self, parser, data, chunk_size=None): for i in range(0, len(data), chunk_size): parser.feed(data[i:i+chunk_size]) - def assert_events(self, parser, expected): + def assert_events(self, parser, expected, max_events=None): self.assertEqual( [(event, (elem.tag, elem.text)) - for event, elem in parser.read_events()], + for event, elem in islice(parser.read_events(), max_events)], expected) - def 
assert_event_tags(self, parser, expected): - events = parser.read_events() + def assert_event_tuples(self, parser, expected, max_events=None): + self.assertEqual( + list(islice(parser.read_events(), max_events)), + expected) + + def assert_event_tags(self, parser, expected, max_events=None): + events = islice(parser.read_events(), max_events) self.assertEqual([(action, elem.tag) for action, elem in events], expected) @@ -1275,6 +1286,56 @@ def test_ns_events(self): self.assertEqual(list(parser.read_events()), [('end-ns', None)]) self.assertIsNone(parser.close()) + def test_ns_events_start(self): + parser = ET.XMLPullParser(events=('start-ns', 'start', 'end')) + self._feed(parser, "\n") + self.assert_event_tuples(parser, [ + ('start-ns', ('', 'abc')), + ('start-ns', ('p', 'xyz')), + ], max_events=2) + self.assert_event_tags(parser, [ + ('start', '{abc}tag'), + ], max_events=1) + + self._feed(parser, "\n") + self.assert_event_tags(parser, [ + ('start', '{abc}child'), + ('end', '{abc}child'), + ]) + + self._feed(parser, "\n") + parser.close() + self.assert_event_tags(parser, [ + ('end', '{abc}tag'), + ]) + + def test_ns_events_start_end(self): + parser = ET.XMLPullParser(events=('start-ns', 'start', 'end', 'end-ns')) + self._feed(parser, "\n") + self.assert_event_tuples(parser, [ + ('start-ns', ('', 'abc')), + ('start-ns', ('p', 'xyz')), + ], max_events=2) + self.assert_event_tags(parser, [ + ('start', '{abc}tag'), + ], max_events=1) + + self._feed(parser, "\n") + self.assert_event_tags(parser, [ + ('start', '{abc}child'), + ('end', '{abc}child'), + ]) + + self._feed(parser, "\n") + parser.close() + self.assert_event_tags(parser, [ + ('end', '{abc}tag'), + ], max_events=1) + self.assert_event_tuples(parser, [ + ('end-ns', None), + ('end-ns', None), + ]) + def test_events(self): parser = ET.XMLPullParser(events=()) self._feed(parser, "\n") diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index c6400480f5b4b4..5b26ac72fd1aae 100644 --- 
a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1518,6 +1518,10 @@ def __init__(self, *, target=None, encoding=None): parser.StartElementHandler = self._start if hasattr(target, 'end'): parser.EndElementHandler = self._end + if hasattr(target, 'start_ns'): + parser.StartNamespaceDeclHandler = self._start_ns + if hasattr(target, 'end_ns'): + parser.EndNamespaceDeclHandler = self._end_ns if hasattr(target, 'data'): parser.CharacterDataHandler = target.data # miscellaneous callbacks @@ -1559,12 +1563,24 @@ def handler(tag, event=event_name, append=append, append((event, end(tag))) parser.EndElementHandler = handler elif event_name == "start-ns": - def handler(prefix, uri, event=event_name, append=append): - append((event, (prefix or "", uri or ""))) + # TreeBuilder does not implement .start_ns() + if hasattr(self.target, "start_ns"): + def handler(prefix, uri, event=event_name, append=append, + start_ns=self._start_ns): + append((event, start_ns(prefix, uri))) + else: + def handler(prefix, uri, event=event_name, append=append): + append((event, (prefix or '', uri or ''))) parser.StartNamespaceDeclHandler = handler elif event_name == "end-ns": - def handler(prefix, event=event_name, append=append): - append((event, None)) + # TreeBuilder does not implement .end_ns() + if hasattr(self.target, "end_ns"): + def handler(prefix, event=event_name, append=append, + end_ns=self._end_ns): + append((event, end_ns(prefix))) + else: + def handler(prefix, event=event_name, append=append): + append((event, None)) parser.EndNamespaceDeclHandler = handler elif event_name == 'comment': def handler(text, event=event_name, append=append, self=self): @@ -1595,6 +1611,12 @@ def _fixname(self, key): self._names[key] = name return name + def _start_ns(self, prefix, uri): + return self.target.start_ns(prefix or '', uri or '') + + def _end_ns(self, prefix): + return self.target.end_ns(prefix or '') + def _start(self, tag, attr_list): # Handler for expat's 
StartElementHandler. Since ordered_attributes # is set, the attributes are reported as a list of alternating diff --git a/Misc/NEWS.d/next/Library/2019-04-20-13-10-34.bpo-36676.XF4Egb.rst b/Misc/NEWS.d/next/Library/2019-04-20-13-10-34.bpo-36676.XF4Egb.rst new file mode 100644 index 00000000000000..e0bede81eec108 --- /dev/null +++ b/Misc/NEWS.d/next/Library/2019-04-20-13-10-34.bpo-36676.XF4Egb.rst @@ -0,0 +1,3 @@ +The XMLParser() in xml.etree.ElementTree provides namespace prefix context to the +parser target if it defines the callback methods "start_ns()" and/or "end_ns()". +Patch by Stefan Behnel. diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c index 5481c61678712b..50d0f20571bcea 100644 --- a/Modules/_elementtree.c +++ b/Modules/_elementtree.c @@ -2911,6 +2911,39 @@ treebuilder_handle_pi(TreeBuilderObject* self, PyObject* target, PyObject* text) return NULL; } +LOCAL(PyObject*) +treebuilder_handle_start_ns(TreeBuilderObject* self, PyObject* prefix, PyObject* uri) +{ + PyObject* parcel; + + if (self->events_append && self->start_ns_event_obj) { + parcel = PyTuple_Pack(2, prefix, uri); + if (!parcel) { + return NULL; + } + + if (treebuilder_append_event(self, self->start_ns_event_obj, parcel) < 0) { + Py_DECREF(parcel); + return NULL; + } + Py_DECREF(parcel); + } + + Py_RETURN_NONE; +} + +LOCAL(PyObject*) +treebuilder_handle_end_ns(TreeBuilderObject* self, PyObject* prefix) +{ + if (self->events_append && self->end_ns_event_obj) { + if (treebuilder_append_event(self, self->end_ns_event_obj, prefix) < 0) { + return NULL; + } + } + + Py_RETURN_NONE; +} + /* -------------------------------------------------------------------- */ /* methods (in alphabetical order) */ @@ -3046,6 +3079,8 @@ typedef struct { PyObject *names; + PyObject *handle_start_ns; + PyObject *handle_end_ns; PyObject *handle_start; PyObject *handle_data; PyObject *handle_end; @@ -3357,42 +3392,85 @@ expat_end_handler(XMLParserObject* self, const XML_Char* tag_in) } static void 
-expat_start_ns_handler(XMLParserObject* self, const XML_Char* prefix, - const XML_Char *uri) +expat_start_ns_handler(XMLParserObject* self, const XML_Char* prefix_in, + const XML_Char *uri_in) { - TreeBuilderObject *target = (TreeBuilderObject*) self->target; - PyObject *parcel; + PyObject* res = NULL; + PyObject* uri; + PyObject* prefix; + PyObject* stack[2]; if (PyErr_Occurred()) return; - if (!target->events_append || !target->start_ns_event_obj) - return; + if (!uri_in) + uri_in = ""; + if (!prefix_in) + prefix_in = ""; + + if (TreeBuilder_CheckExact(self->target)) { + /* shortcut - TreeBuilder does not actually implement .start_ns() */ + TreeBuilderObject *target = (TreeBuilderObject*) self->target; - if (!uri) - uri = ""; - if (!prefix) - prefix = ""; + if (target->events_append && target->start_ns_event_obj) { + prefix = PyUnicode_DecodeUTF8(prefix_in, strlen(prefix_in), "strict"); + if (!prefix) + return; + uri = PyUnicode_DecodeUTF8(uri_in, strlen(uri_in), "strict"); + if (!uri) + return; - parcel = Py_BuildValue("ss", prefix, uri); - if (!parcel) - return; - treebuilder_append_event(target, target->start_ns_event_obj, parcel); - Py_DECREF(parcel); + res = treebuilder_handle_start_ns(target, prefix, uri); + Py_DECREF(uri); + Py_DECREF(prefix); + } + } else if (self->handle_start_ns) { + prefix = PyUnicode_DecodeUTF8(prefix_in, strlen(prefix_in), "strict"); + if (!prefix) + return; + uri = PyUnicode_DecodeUTF8(uri_in, strlen(uri_in), "strict"); + if (!uri) + return; + + stack[0] = prefix; + stack[1] = uri; + res = _PyObject_FastCall(self->handle_start_ns, stack, 2); + Py_DECREF(uri); + Py_DECREF(prefix); + } + + Py_XDECREF(res); } static void expat_end_ns_handler(XMLParserObject* self, const XML_Char* prefix_in) { - TreeBuilderObject *target = (TreeBuilderObject*) self->target; + PyObject *res = NULL; + PyObject* prefix; if (PyErr_Occurred()) return; - if (!target->events_append) - return; + if (!prefix_in) + prefix_in = ""; - 
treebuilder_append_event(target, target->end_ns_event_obj, Py_None); + if (TreeBuilder_CheckExact(self->target)) { + /* shortcut - TreeBuilder does not actually implement .end_ns() */ + TreeBuilderObject *target = (TreeBuilderObject*) self->target; + + if (target->events_append && target->end_ns_event_obj) { + res = treebuilder_handle_end_ns(target, Py_None); + } + } else if (self->handle_end_ns) { + prefix = PyUnicode_DecodeUTF8(prefix_in, strlen(prefix_in), "strict"); + if (!prefix) + return; + + res = _PyObject_FastCall(self->handle_end_ns, &prefix, 1); + Py_DECREF(prefix); + } + + Py_XDECREF(res); } static void @@ -3546,6 +3624,7 @@ xmlparser_new(PyTypeObject *type, PyObject *args, PyObject *kwds) if (self) { self->parser = NULL; self->target = self->entity = self->names = NULL; + self->handle_start_ns = self->handle_end_ns = NULL; self->handle_start = self->handle_data = self->handle_end = NULL; self->handle_comment = self->handle_pi = self->handle_close = NULL; self->handle_doctype = NULL; @@ -3614,6 +3693,14 @@ _elementtree_XMLParser___init___impl(XMLParserObject *self, PyObject *target, } self->target = target; + self->handle_start_ns = PyObject_GetAttrString(target, "start_ns"); + if (ignore_attribute_error(self->handle_start_ns)) { + return -1; + } + self->handle_end_ns = PyObject_GetAttrString(target, "end_ns"); + if (ignore_attribute_error(self->handle_end_ns)) { + return -1; + } self->handle_start = PyObject_GetAttrString(target, "start"); if (ignore_attribute_error(self->handle_start)) { return -1; @@ -3645,6 +3732,12 @@ _elementtree_XMLParser___init___impl(XMLParserObject *self, PyObject *target, /* configure parser */ EXPAT(SetUserData)(self->parser, self); + if (self->handle_start_ns || self->handle_end_ns) + EXPAT(SetNamespaceDeclHandler)( + self->parser, + (XML_StartNamespaceDeclHandler) expat_start_ns_handler, + (XML_EndNamespaceDeclHandler) expat_end_ns_handler + ); EXPAT(SetElementHandler)( self->parser, (XML_StartElementHandler) 
expat_start_handler, @@ -3689,6 +3782,7 @@ xmlparser_gc_traverse(XMLParserObject *self, visitproc visit, void *arg) Py_VISIT(self->handle_end); Py_VISIT(self->handle_data); Py_VISIT(self->handle_start); + Py_VISIT(self->handle_start_ns); Py_VISIT(self->target); Py_VISIT(self->entity); @@ -3712,6 +3806,7 @@ xmlparser_gc_clear(XMLParserObject *self) Py_CLEAR(self->handle_end); Py_CLEAR(self->handle_data); Py_CLEAR(self->handle_start); + Py_CLEAR(self->handle_start_ns); Py_CLEAR(self->handle_doctype); Py_CLEAR(self->target); From 3b332644b22152865018fc91971fa3bc17373602 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Mon, 22 Apr 2019 08:27:03 +0200 Subject: [PATCH 4/6] bpo-36676: Add a test to check that a target with only an "end_ns()" callback receives the calls in the right order. --- Lib/test/test_xml_etree.py | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index 29aee69ed47757..0b03d077448838 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -13,6 +13,7 @@ import operator import pickle import sys +import textwrap import types import unittest import warnings @@ -715,6 +716,27 @@ def end_ns(self, prefix): ('end-ns', ''), ]) + def test_custom_builder_only_end_ns(self): + class Builder(list): + def end_ns(self, prefix): + self.append(("end-ns", prefix)) + + builder = Builder() + parser = ET.XMLParser(target=builder) + parser.feed(textwrap.dedent("""\ + + + + text + texttail + + + """)) + self.assertEqual(builder, [ + ('end-ns', 'a'), + ('end-ns', 'p'), + ('end-ns', ''), + ]) # Element.getchildren() and ElementTree.getiterator() are deprecated. @checkwarnings(("This method will be removed in future versions. " From 6c903a39f8245295edacfc6e133abb1d8009e571 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Wed, 1 May 2019 19:58:33 +0200 Subject: [PATCH 5/6] Fix reference leaks.
--- Modules/_elementtree.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c index 50d0f20571bcea..b69e3a45fe308f 100644 --- a/Modules/_elementtree.c +++ b/Modules/_elementtree.c @@ -3417,8 +3417,10 @@ expat_start_ns_handler(XMLParserObject* self, const XML_Char* prefix_in, if (!prefix) return; uri = PyUnicode_DecodeUTF8(uri_in, strlen(uri_in), "strict"); - if (!uri) + if (!uri) { + Py_DECREF(prefix); return; + } res = treebuilder_handle_start_ns(target, prefix, uri); Py_DECREF(uri); @@ -3429,8 +3431,10 @@ expat_start_ns_handler(XMLParserObject* self, const XML_Char* prefix_in, if (!prefix) return; uri = PyUnicode_DecodeUTF8(uri_in, strlen(uri_in), "strict"); - if (!uri) + if (!uri) { + Py_DECREF(prefix); return; + } stack[0] = prefix; stack[1] = uri; @@ -3783,6 +3787,8 @@ xmlparser_gc_traverse(XMLParserObject *self, visitproc visit, void *arg) Py_VISIT(self->handle_data); Py_VISIT(self->handle_start); Py_VISIT(self->handle_start_ns); + Py_VISIT(self->handle_end_ns); + Py_VISIT(self->handle_doctype); Py_VISIT(self->target); Py_VISIT(self->entity); @@ -3807,6 +3813,7 @@ xmlparser_gc_clear(XMLParserObject *self) Py_CLEAR(self->handle_data); Py_CLEAR(self->handle_start); Py_CLEAR(self->handle_start_ns); + Py_CLEAR(self->handle_end_ns); Py_CLEAR(self->handle_doctype); Py_CLEAR(self->target); From 555593bd453e42589d8aa031b007c2ae69396b0c Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Wed, 1 May 2019 20:09:33 +0200 Subject: [PATCH 6/6] Move the documentation of the start_ns() and end_ns() methods to a more appropriate place. 
--- Doc/library/xml.etree.elementtree.rst | 34 ++++++++++++++++----------- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index 413fe7485cfc7e..70ec6ff01a30a0 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -1087,7 +1087,7 @@ TreeBuilder Objects In addition, a custom :class:`TreeBuilder` object can provide the - following method: + following methods: .. method:: doctype(name, pubid, system) @@ -1097,6 +1097,23 @@ TreeBuilder Objects .. versionadded:: 3.2 + .. method:: start_ns(prefix, uri) + + Is called whenever the parser encounters a new namespace declaration, + before the ``start()`` callback for the opening element that defines it. + *prefix* is ``''`` for the default namespace and the declared + namespace prefix name otherwise. *uri* is the namespace URI. + + .. versionadded:: 3.8 + + .. method:: end_ns(prefix) + + Is called after the ``end()`` callback of an element that declared + a namespace prefix mapping, with the name of the *prefix* that went + out of scope. + + .. versionadded:: 3.8 + .. _elementtree-xmlparser-objects: @@ -1132,7 +1149,8 @@ XMLParser Objects :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method for each opening tag, its ``end(tag)`` method for each closing tag, and data - is processed by method ``data(data)``. :meth:`XMLParser.close` calls + is processed by method ``data(data)``. For further supported callback + methods, see the :class:`TreeBuilder` class. :meth:`XMLParser.close` calls *target*\'s method ``close()``. :class:`XMLParser` can be used not only for building a tree structure. 
This is an example of counting the maximum depth of an XML file:: @@ -1169,18 +1187,6 @@ XMLParser Objects >>> parser.close() 4 - Additionally, if the target object provides one or both of the methods - ``start_ns(self, prefix, uri)`` and ``end_ns(self, prefix)``, then they - are called whenever the parser encounters a new namespace declaration. - The ``prefix`` is ``''`` for the default namespace and the declared - namespace prefix otherwise. The ``start_ns()`` method is called before - the ``start()`` callback of the opening tag that defines the namespace, - and the ``end_ns()`` method is called after the corresponding ``end()`` - callback. - - .. versionchanged:: 3.8 - The ``start_ns()`` and ``end_ns()`` callbacks were added. - .. _elementtree-xmlpullparser-objects:
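Reviewer note: the relocated ``start_ns()`` / ``end_ns()`` documentation can be illustrated with a small parser target. This is a minimal sketch, not part of the patch: the ``NamespaceRecorder`` class, the ``urn:example`` URI, and the sample document are hypothetical. It assumes Python 3.8 or later, where these callbacks are available.

```python
import xml.etree.ElementTree as ET


class NamespaceRecorder:
    """Hypothetical parser target that records namespace and element events."""

    def __init__(self):
        self.events = []

    def start_ns(self, prefix, uri):
        # Invoked before start() of the element that declares the namespace.
        self.events.append(("start-ns", prefix, uri))

    def start(self, tag, attrs):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def end_ns(self, prefix):
        # Invoked after end() of the element whose declaration went out of scope.
        self.events.append(("end-ns", prefix))

    def close(self):
        # XMLParser.close() returns whatever the target's close() returns.
        return self.events


parser = ET.XMLParser(target=NamespaceRecorder())
parser.feed("<root xmlns:p='urn:example'><p:item/></root>")
events = parser.close()
```

In this sketch the recorded list should begin with the ``start-ns`` entry for prefix ``p`` and end with the matching ``end-ns`` entry, with the element events in between, which is the ordering the documentation describes.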