Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Commit 014f0e3

Browse files
committed
Massive addition of SAX documentation from Martin von Loewis
<[email protected]>. Reorganized to be more like other parts of the documentation in its arrangement, but with few content changes. This closes SourceForge patch #101850.
1 parent 110941a commit 014f0e3

4 files changed

Lines changed: 731 additions & 3 deletions

File tree

Doc/lib/xmlsax.tex

Lines changed: 25 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@ \section{\module{xml.sax} ---
66
functions.}
77
\moduleauthor{Lars Marius Garshol}{[email protected]}
88
\sectionauthor{Fred L. Drake, Jr.}{[email protected]}
9+
\sectionauthor{Martin v. L\"owis}{[email protected]}
910

1011
\versionadded{2.0}
1112

@@ -41,8 +42,29 @@ \section{\module{xml.sax} ---
4142
received as a parameter.
4243
\end{funcdesc}
4344

44-
45-
The SAX exceptions are also provided here:
45+
A typical SAX application uses three kinds of objects: readers,
46+
handlers and input sources. ``Reader'' in this context is another term
47+
for parser, ie. some piece of code that reads the bytes or characters
48+
from the input source, and produces a sequence of events. The events
49+
then get distributed to the handler objects, ie. the reader invokes a
50+
method on the handler. A SAX application must therefore obtain a
51+
handler object, create or open the input sources, create the handlers,
52+
and connect these objects all together. As the final step, parsing is
53+
invoked. During parsing
54+
55+
For these objects, only the interfaces are relevant; they are normally
56+
not instantiated by the application itself. Since Python does not have
57+
an explicit notion of interface, they are formally introduced as
58+
classes. The \class{InputSource}, \class{Locator},
59+
\class{AttributesImpl}, and \class{XMLReader} interfaces are defined
60+
in the module \refmodule{xml.sax.xmlreader}. The handler interfaces
61+
are defined in \refmodule{xml.sax.handler}. For convenience,
62+
\class{InputSource} (which is often instantiated directly) and the
63+
handler classes are also available from \module{xml.sax}. These
64+
classes are described below.
65+
66+
In addition to these classes, \module{xml.sax} provides the following
67+
exception classes.
4668

4769
\begin{excclassdesc}{SAXException}{msg\optional{, exception}}
4870
Encapsulate an XML error or warning. This class can contain basic
@@ -94,7 +116,7 @@ \section{\module{xml.sax} ---
94116
\end{seealso}
95117

96118

97-
\subsection{SAXException Objects \label{saxexception-objects}}
119+
\subsection{SAXException Objects \label{sax-exception-objects}}
98120

99121
The \class{SAXException} exception class supports the following
100122
methods:

Doc/lib/xmlsaxhandler.tex

Lines changed: 321 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,321 @@
1+
\section{\module{xml.sax.handler} ---
2+
Base classes for SAX handlers}
3+
4+
\declaremodule{standard}{xml.sax.handler}
5+
\modulesynopsis{Base classes for SAX event handlers.}
6+
\sectionauthor{Martin v. L\"owis}{[email protected]}
7+
\moduleauthor{Lars Marius Garshol}{[email protected]}
8+
9+
\versionadded{2.0}
10+
11+
12+
The SAX API defines four kinds of handlers: content handlers, DTD
13+
handlers, error handlers, and entity resolvers. Applications normally
14+
only need to implement those interfaces whose events they are
15+
interested in; they can implement the interfaces in a single object or
16+
in multiple objects. Handler implementations should inherit from the
17+
base classes provided in the module \module{xml.sax}, so that all
18+
methods get default implementations.
19+
20+
\begin{classdesc}{ContentHandler}{}
21+
This is the main callback interface in SAX, and the one most
22+
important to applications. The order of events in this interface
23+
mirrors the order of the information in the document.
24+
\end{classdesc}
25+
26+
\begin{classdesc}{DTDHandler}{}
27+
Handle DTD events.
28+
29+
This interface specifies only those DTD events required for basic
30+
parsing (unparsed entities and attributes).
31+
\end{classdesc}
32+
33+
\begin{classdesc}{EntityResolver}{}
34+
Basic interface for resolving entities. If you create an object
35+
implementing this interface, then register the object with your
36+
Parser, the parser will call the method in your object to resolve all
37+
external entities.
38+
\end{classdesc}
39+
40+
In addition to these classes, \module{xml.sax.handler} provides
41+
symbolic constants for the feature and property names.
42+
43+
\begin{datadesc}{feature_namespaces}
44+
Value: \code{"http://xml.org/sax/features/namespaces"}\\
45+
true: Perform Namespace processing (default).\\
46+
false: Optionally do not perform Namespace processing
47+
(implies namespace-prefixes).\\
48+
access: (parsing) read-only; (not parsing) read/write\\
49+
\end{datadesc}
50+
51+
\begin{datadesc}{feature_namespace_prefixes}
52+
Value: \code{"http://xml.org/sax/features/namespace-prefixes"}\\
53+
true: Report the original prefixed names and attributes used for Namespace
54+
declarations.\\
55+
false: Do not report attributes used for Namespace declarations, and
56+
optionally do not report original prefixed names (default).\\
57+
access: (parsing) read-only; (not parsing) read/write
58+
\end{datadesc}
59+
60+
\begin{datadesc}{feature_string_interning}
61+
Value: \code{"http://xml.org/sax/features/string-interning"}
62+
true: All element names, prefixes, attribute names, Namespace URIs, and
63+
local names are interned using the built-in intern function.\\
64+
false: Names are not necessarily interned, although they may be (default).\\
65+
access: (parsing) read-only; (not parsing) read/write
66+
\end{datadesc}
67+
68+
\begin{datadesc}{feature_validation}
69+
Value: \code{"http://xml.org/sax/features/validation"}\\
70+
true: Report all validation errors (implies external-general-entities and
71+
external-parameter-entities).\\
72+
false: Do not report validation errors.\\
73+
access: (parsing) read-only; (not parsing) read/write
74+
\end{datadesc}
75+
76+
\begin{datadesc}{feature_external_ges}
77+
Value: \code{"http://xml.org/sax/features/external-general-entities"}\\
78+
true: Include all external general (text) entities.\\
79+
false: Do not include external general entities.\\
80+
access: (parsing) read-only; (not parsing) read/write
81+
\end{datadesc}
82+
83+
\begin{datadesc}{feature_external_pes}
84+
Value: \code{"http://xml.org/sax/features/external-parameter-entities"}\\
85+
true: Include all external parameter entities, including the external
86+
DTD subset.\\
87+
false: Do not include any external parameter entities, even the external
88+
DTD subset.\\
89+
access: (parsing) read-only; (not parsing) read/write
90+
\end{datadesc}
91+
92+
\begin{datadesc}{all_features}
93+
List of all features.
94+
\end{datadesc}
95+
96+
\begin{datadesc}{property_lexical_handler}
97+
Value: \code{"http://xml.org/sax/properties/lexical-handler"}\\
98+
data type: xml.sax.sax2lib.LexicalHandler (not supported in Python 2)\\
99+
description: An optional extension handler for lexical events like comments.\\
100+
access: read/write
101+
\end{datadesc}
102+
103+
\begin{datadesc}{property_declaration_handler}
104+
Value: \code{"http://xml.org/sax/properties/declaration-handler"}\\
105+
data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2)\\
106+
description: An optional extension handler for DTD-related events other
107+
than notations and unparsed entities.\\
108+
access: read/write
109+
\end{datadesc}
110+
111+
\begin{datadesc}{property_dom_node}
112+
Value: \code{"http://xml.org/sax/properties/dom-node"}\\
113+
data type: org.w3c.dom.Node (not supported in Python 2) \\
114+
description: When parsing, the current DOM node being visited if this is
115+
a DOM iterator; when not parsing, the root DOM node for
116+
iteration.\\
117+
access: (parsing) read-only; (not parsing) read/write
118+
\end{datadesc}
119+
120+
\begin{datadesc}{property_xml_string}
121+
Value: \code{"http://xml.org/sax/properties/xml-string"}\\
122+
data type: String\\
123+
description: The literal string of characters that was the source for
124+
the current event.\\
125+
access: read-only
126+
\end{datadesc}
127+
128+
\begin{datadesc}{all_properties}
129+
List of all known property names.
130+
\end{datadesc}
131+
132+
133+
\subsection{ContentHandler Objects \label{content-handler-objects}}
134+
135+
Users are expected to subclass \class{ContentHandler} to support their
136+
application. The following methods are called by the parser on the
137+
appropriate events in the input document:
138+
139+
\begin{methoddesc}[ContentHandler]{setDocumentLocator}{locator}
140+
Called by the parser to give the application a locator for locating
141+
the origin of document events.
142+
143+
SAX parsers are strongly encouraged (though not absolutely required)
144+
to supply a locator: if it does so, it must supply the locator to
145+
the application by invoking this method before invoking any of the
146+
other methods in the DocumentHandler interface.
147+
148+
The locator allows the application to determine the end position of
149+
any document-related event, even if the parser is not reporting an
150+
error. Typically, the application will use this information for
151+
reporting its own errors (such as character content that does not
152+
match an application's business rules). The information returned by
153+
the locator is probably not sufficient for use with a search engine.
154+
155+
Note that the locator will return correct information only during
156+
the invocation of the events in this interface. The application
157+
should not attempt to use it at any other time.
158+
\end{methoddesc}
159+
160+
\begin{methoddesc}[ContentHandler]{startDocument}{}
161+
Receive notification of the beginning of a document.
162+
163+
The SAX parser will invoke this method only once, before any other
164+
methods in this interface or in DTDHandler (except for
165+
\method{setDocumentLocator()}).
166+
\end{methoddesc}
167+
168+
\begin{methoddesc}[ContentHandler]{endDocument}{}
169+
Receive notification of the end of a document.
170+
171+
The SAX parser will invoke this method only once, and it will be the
172+
last method invoked during the parse. The parser shall not invoke
173+
this method until it has either abandoned parsing (because of an
174+
unrecoverable error) or reached the end of input.
175+
\end{methoddesc}
176+
177+
\begin{methoddesc}[ContentHandler]{startPrefixMapping}{prefix, uri}
178+
Begin the scope of a prefix-URI Namespace mapping.
179+
180+
The information from this event is not necessary for normal
181+
Namespace processing: the SAX XML reader will automatically replace
182+
prefixes for element and attribute names when the
183+
\code{http://xml.org/sax/features/namespaces} feature is true (the
184+
default).
185+
186+
%% XXX This is not really the default, is it? MvL
187+
188+
There are cases, however, when applications need to use prefixes in
189+
character data or in attribute values, where they cannot safely be
190+
expanded automatically; the start/endPrefixMapping event supplies
191+
the information to the application to expand prefixes in those
192+
contexts itself, if necessary.
193+
194+
Note that start/endPrefixMapping events are not guaranteed to be
195+
properly nested relative to each-other: all
196+
\method{startPrefixMapping()} events will occur before the
197+
corresponding startElement event, and all \method{endPrefixMapping()}
198+
events will occur after the corresponding \method{endElement()} event,
199+
but their order is not guaranteed.
200+
\end{methoddesc}
201+
202+
\begin{methoddesc}[ContentHandler]{endPrefixMapping}{prefix}
203+
End the scope of a prefix-URI mapping.
204+
205+
See \method{startPrefixMapping()} for details. This event will always
206+
occur after the corresponding endElement event, but the order of
207+
endPrefixMapping events is not otherwise guaranteed.
208+
\end{methoddesc}
209+
210+
\begin{methoddesc}[ContentHandler]{startElement}{name, attrs}
211+
Signals the start of an element in non-namespace mode.
212+
213+
The \var{name} parameter contains the raw XML 1.0 name of the
214+
element type as a string and the \var{attrs} parameter holds an
215+
instance of the \class{Attributes} class containing the attributes
216+
of the element.
217+
\end{methoddesc}
218+
219+
\begin{methoddesc}[ContentHandler]{endElement}{name}
220+
Signals the end of an element in non-namespace mode.
221+
222+
The \var{name} parameter contains the name of the element type, just
223+
as with the startElement event.
224+
\end{methoddesc}
225+
226+
\begin{methoddesc}[ContentHandler]{startElementNS}{name, qname, attrs}
227+
Signals the start of an element in namespace mode.
228+
229+
The \var{name} parameter contains the name of the element type as a
230+
(uri, localname) tuple, the \var{qname} parameter the raw XML 1.0
231+
name used in the source document, and the \var{attrs} parameter
232+
holds an instance of the \class{AttributesNS} class containing the
233+
attributes of the element.
234+
235+
Parsers may set the \var{qname} parameter to \code{None}, unless the
236+
\code{http://xml.org/sax/features/namespace-prefixes} feature is
237+
activated.
238+
\end{methoddesc}
239+
240+
\begin{methoddesc}[ContentHandler]{endElementNS}{name, qname}
241+
Signals the end of an element in namespace mode.
242+
243+
The \var{name} parameter contains the name of the element type, just
244+
as with the startElementNS event, likewise the \var{qname} parameter.
245+
\end{methoddesc}
246+
247+
\begin{methoddesc}[ContentHandler]{characters}{content}
248+
Receive notification of character data.
249+
250+
The Parser will call this method to report each chunk of character
251+
data. SAX parsers may return all contiguous character data in a
252+
single chunk, or they may split it into several chunks; however, all
253+
of the characters in any single event must come from the same
254+
external entity so that the Locator provides useful information.
255+
256+
\var{content} may be a Unicode string or a byte string; the
257+
\code{expat} reader module produces always Unicode strings.
258+
\end{methoddesc}
259+
260+
\begin{methoddesc}[ContentHandler]{ignorableWhitespace}{}
261+
Receive notification of ignorable whitespace in element content.
262+
263+
Validating Parsers must use this method to report each chunk
264+
of ignorable whitespace (see the W3C XML 1.0 recommendation,
265+
section 2.10): non-validating parsers may also use this method
266+
if they are capable of parsing and using content models.
267+
268+
SAX parsers may return all contiguous whitespace in a single
269+
chunk, or they may split it into several chunks; however, all
270+
of the characters in any single event must come from the same
271+
external entity, so that the Locator provides useful
272+
information.
273+
\end{methoddesc}
274+
275+
\begin{methoddesc}[ContentHandler]{processingInstruction}{target, data}
276+
Receive notification of a processing instruction.
277+
278+
The Parser will invoke this method once for each processing
279+
instruction found: note that processing instructions may occur
280+
before or after the main document element.
281+
282+
A SAX parser should never report an XML declaration (XML 1.0,
283+
section 2.8) or a text declaration (XML 1.0, section 4.3.1) using
284+
this method.
285+
\end{methoddesc}
286+
287+
\begin{methoddesc}[ContentHandler]{skippedEntity}{name}
288+
Receive notification of a skipped entity.
289+
290+
The Parser will invoke this method once for each entity
291+
skipped. Non-validating processors may skip entities if they have
292+
not seen the declarations (because, for example, the entity was
293+
declared in an external DTD subset). All processors may skip
294+
external entities, depending on the values of the
295+
\code{http://xml.org/sax/features/external-general-entities} and the
296+
\code{http://xml.org/sax/features/external-parameter-entities}
297+
properties.
298+
\end{methoddesc}
299+
300+
301+
\subsection{DTDHandler Objects \label{dtd-handler-objects}}
302+
303+
\class{DTDHandler} instances provide the following methods:
304+
305+
\begin{methoddesc}[DTDHandler]{notationDecl}{name, publicId, systemId}
306+
Handle a notation declaration event.
307+
\end{methoddesc}
308+
309+
\begin{methoddesc}[DTDHandler]{unparsedEntityDecl}{name, publicId,
310+
systemId, ndata}
311+
Handle an unparsed entity declaration event.
312+
\end{methoddesc}
313+
314+
315+
\subsection{EntityResolver Objects \label{entity-resolver-objects}}
316+
317+
\begin{methoddesc}[EntityResolver]{resolveEntity}{publicId, systemId}
318+
Resolve the system identifier of an entity and return either the
319+
system identifier to read from as a string, or an InputSource to
320+
read from. The default implementation returns \var{systemId}.
321+
\end{methoddesc}

0 commit comments

Comments
 (0)