 XML Processing Modules
 ======================

+.. module:: xml
+   :synopsis: Package containing XML processing modules
+.. sectionauthor:: Christian Heimes <[email protected]>
+.. sectionauthor:: Georg Brandl <[email protected]>
+
+
 Python's interfaces for processing XML are grouped in the ``xml`` package.

+.. warning::
+
+   The XML modules are not secure against erroneous or maliciously
+   constructed data. If you need to parse untrusted or unauthenticated data see
+   :ref:`xml-vulnerabilities`.
+
+
 It is important to note that modules in the :mod:`xml` package require that
 there be at least one SAX-compliant XML parser available. The Expat parser is
 included with Python, so the :mod:`xml.parsers.expat` module will always be
@@ -27,3 +40,94 @@ The XML handling submodules are:

 * :mod:`xml.sax`: SAX2 base classes and convenience functions
 * :mod:`xml.parsers.expat`: the Expat parser binding
+
+
+.. _xml-vulnerabilities:
+
+XML vulnerabilities
+===================
+
+The XML processing modules are not secure against maliciously constructed data.
+An attacker can abuse XML features to mount denial of service attacks, to
+access local files, to open network connections to other machines, or
+to circumvent firewalls. These attacks rely on little-known features
+such as an inline `DTD`_ (document type definition) with entities.
+
+
+=========================  ========  =========  =========  ========  =========
+kind                       sax       etree      minidom    pulldom   xmlrpc
+=========================  ========  =========  =========  ========  =========
+billion laughs             **True**  **True**   **True**   **True**  **True**
+quadratic blowup           **True**  **True**   **True**   **True**  **True**
+external entity expansion  **True**  False (1)  False (2)  **True**  False (3)
+DTD retrieval              **True**  False      False      **True**  False
+decompression bomb         False     False      False      False     **True**
+=========================  ========  =========  =========  ========  =========
+
+1. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
+   ``ParseError`` when an entity occurs.
+2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
+   the unexpanded entity verbatim.
+3. :mod:`xmlrpclib` doesn't expand external entities and omits them.
+
+
+billion laughs / exponential entity expansion
+  The `Billion Laughs`_ attack, also known as exponential entity expansion,
+  uses multiple levels of nested entities. Each entity refers to another entity
+  several times; the final entity definition contains a small string. Eventually
+  the small string is expanded to several gigabytes. The exponential expansion
+  consumes lots of CPU time, too.
+
+quadratic blowup entity expansion
+  A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
+  entity expansion, too. Instead of nested entities, it repeats one large entity
+  of a couple of thousand characters over and over again. The attack isn't as
+  efficient as the exponential case, but it avoids triggering parser
+  countermeasures against heavily nested entities.
+
+external entity expansion
+  Entity declarations can contain more than just text for replacement. They can
+  also point to external resources by public identifiers or system identifiers.
+  System identifiers are standard URIs or can refer to local files. The XML
+  parser retrieves the resource with e.g. HTTP or FTP requests and embeds the
+  content into the XML document.
+
+DTD retrieval
+  Some XML libraries like Python's :mod:`xml.dom.pulldom` retrieve document type
+  definitions from remote or local locations. The feature has similar
+  implications to the external entity expansion issue.
+
+decompression bomb
+  The issue of decompression bombs (aka `ZIP bomb`_) applies to all XML libraries
+  that can parse compressed XML streams such as gzipped HTTP streams or LZMA-ed
+  files. For an attacker, it can reduce the amount of transmitted data by three
+  orders of magnitude or more.
+
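One way to guard against the decompression bomb described above is to cap how much data is ever inflated before it reaches an XML parser. The following is a minimal sketch using only the standard library; the helper name ``bounded_gunzip`` and its ``max_bytes`` parameter are hypothetical and not part of any stdlib module or of this commit.

```python
import gzip
import io


def bounded_gunzip(data, max_bytes):
    """Decompress gzip data, refusing to produce more than max_bytes.

    Hypothetical helper for illustration; not part of the standard library.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as stream:
        # Ask for one byte more than the limit: an oversized payload is
        # detected without inflating the rest of the stream.
        payload = stream.read(max_bytes + 1)
    if len(payload) > max_bytes:
        raise ValueError("decompressed data exceeds %d bytes" % max_bytes)
    return payload
```

The appropriate limit depends on the application; the point of the sketch is only that the size check happens during decompression rather than after it.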
+The documentation of `defusedxml`_ on PyPI has further information about
+all known attack vectors with examples and references.
+
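To make the scale of the two entity expansion attacks above concrete, the sketch below computes the expanded size arithmetically, without parsing anything. The function names and the sample figures are illustrative assumptions: nine levels of ten references to a three-character string is the commonly cited Billion Laughs layout, and the quadratic numbers are picked only to match "a couple of thousand characters".

```python
def exponential_size(levels, refs_per_level, base_len):
    """Expanded size when every entity references the previous one
    refs_per_level times, across the given number of nesting levels."""
    return base_len * refs_per_level ** levels


def quadratic_size(entity_len, references):
    """Expanded size when a single large entity is referenced repeatedly."""
    return entity_len * references


# Illustrative figures only (assumptions, not measurements).
laughs = exponential_size(levels=9, refs_per_level=10, base_len=3)
blowup = quadratic_size(entity_len=2000, references=100000)
print("billion laughs expands to", laughs, "characters")
print("quadratic blowup expands to", blowup, "characters")
```

Under these assumptions the nested document expands to 3,000,000,000 characters, while the flat one reaches 200,000,000 characters from an input of a few hundred kilobytes.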
+defused packages
+----------------
+
+`defusedxml`_ is a pure Python package with modified subclasses of all stdlib
+XML parsers that prevent any potentially malicious operation. Use of this
+package is recommended for any server code that parses untrusted XML data. The
+package also ships with example exploits and extended documentation on more
+XML exploits like XPath injection.
+
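The general idea of refusing risky constructs up front can be sketched with the standard library alone. The example below rejects any document that carries a DOCTYPE declaration, which rules out inline entity definitions altogether. It is not defusedxml's implementation; ``DTDForbidden`` and ``parse_without_dtd`` are hypothetical names, and only ``xml.parsers.expat.ParserCreate`` and the ``StartDoctypeDeclHandler`` hook are existing API.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_without_dtd(data):
    """Check that data is well-formed XML and carries no DTD.

    Hypothetical helper for illustration only.
    """
    parser = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Expat calls this hook as soon as it sees "<!DOCTYPE"; raising here
        # aborts the parse before any entity declaration is processed.
        raise DTDForbidden("DOCTYPE %r is not allowed" % name)

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.Parse(data, True)
```

Rejecting every DTD is a blunt policy that also refuses legitimate documents, so it suits only inputs that are never expected to declare one.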
+`defusedexpat`_ provides a modified libexpat and a patched replacement
+:mod:`pyexpat` extension module with countermeasures against entity expansion
+DoS attacks. Defusedexpat still allows a sane and configurable amount of entity
+expansions. The modifications will be merged into future releases of Python.
+
+The workarounds and modifications are not included in patch releases as they
+break backward compatibility. After all, inline DTD and entity expansion are
+well-defined XML features.
+
+
+.. _defusedxml: https://pypi.python.org/pypi/defusedxml/
+.. _defusedexpat: https://pypi.python.org/pypi/defusedexpat/
+.. _Billion Laughs: http://en.wikipedia.org/wiki/Billion_laughs
+.. _ZIP bomb: http://en.wikipedia.org/wiki/Zip_bomb
+.. _DTD: http://en.wikipedia.org/wiki/Document_Type_Definition
+