@@ -12,7 +12,7 @@ \section{Standard Module \sectcode{sgmllib}}
1212\stmodindex {htmllib}
1313
1414In particular, the parser is hardcoded to recognize the following
15- elements :
15+ constructs :
1616
1717\begin {itemize }
1818
@@ -22,13 +22,15 @@ \section{Standard Module \sectcode{sgmllib}}
2222`` \code {</\var {tag}>}'' , respectively.
2323
2424\item
25- Character references of the form `` \code {\&\# \var {name};}'' .
25+ Numeric character references of the form `` \code {\&\# \var {name};}'' .
2626
2727\item
2828Entity references of the form `` \code {\& \var {name};}'' .
2929
3030\item
31- SGML comments of the form `` \code {<!--\var {text}>}'' .
31+ SGML comments of the form `` \code {<!--\var {text}-->}'' . Note that
32+ spaces, tabs, and newlines are allowed between the trailing
33+ `` \code {>}'' and the immediately preceding `` \code {--}'' .
3234
3335\end {itemize }
3436
@@ -63,41 +65,83 @@ \section{Standard Module \sectcode{sgmllib}}
6365redefined version should always call \code {SGMLParser.close()}.
6466\end {funcdesc }
6567
68+ \begin {funcdesc }{handle_starttag}{tag\, method\, attributes}
69+ This method is called to handle start tags for which either a
70+ \code {start_\var {tag}()} or \code {do_\var {tag}()} method has been
71+ defined. The \code {tag} argument is the name of the tag converted to
72+ lower case, and the \code {method} argument is the bound method which
73+ should be used to support semantic interpretation of the start tag.
74+ The \var {attributes} argument is a list of (\var {name}, \var {value})
75+ pairs containing the attributes found inside the tag's \code {<>}
76+ brackets. The \var {name} has been translated to lower case and double
77+ quotes and backslashes in the \var {value} have been interpreted. For
78+ instance, for the tag \code {<A HREF="http://www.cwi.nl/"> }, this
79+ method would be called as \code {handle_starttag('a', \var {method},
80+ [('href', 'http://www.cwi.nl/')])}. The base implementation simply calls
81+ \code {method} with \code {attributes} as the only argument.
82+ \end {funcdesc}
83+
84+ \begin {funcdesc }{handle_endtag}{tag\, method}
85+
86+ This method is called to handle end tags for which an
87+ \code {end_\var {tag}()} method has been defined. The \code {tag}
88+ argument is the name of the tag converted to lower case, and the
89+ \code {method} argument is the bound method which should be used to
90+ support semantic interpretation of the end tag. If no
91+ \code {end_\var {tag}()} method is defined for the closing element, this
92+ handler is not called. The base implementation simply calls
93+ \code {method}.
94+ \end {funcdesc }
95+
96+ \begin {funcdesc }{handle_data}{data}
97+ This method is called to process arbitrary data. It is intended to be
98+ overridden by a derived class; the base class implementation does
99+ nothing.
100+ \end {funcdesc }
101+
66102\begin {funcdesc }{handle_charref}{ref}
67103This method is called to process a character reference of the form
68- `` \code {\&\# \var {ref};}'' where \var {ref} is a decimal number in the
104+ `` \code {\&\# \var {ref};}'' . In the base implementation, \var {ref} must
105+ be a decimal number in the
69106range 0-255. It translates the character to \ASCII {} and calls the
70107method \code {handle_data()} with the character as argument. If
71108\var {ref} is invalid or out of range, the method
72- \code {unknown_charref(\var {ref})} is called instead.
109+ \code {unknown_charref(\var {ref})} is called to handle the error. A
110+ subclass must override this method to provide support for named
111+ character entities.
73112\end {funcdesc }
74113
75114\begin {funcdesc }{handle_entityref}{ref}
76- This method is called to process an entity reference of the form
77- `` \code {\& \var {ref};}'' where \var {ref} is an alphabetic entity
115+ This method is called to process a general entity reference of the form
116+ `` \code {\& \var {ref};}'' where \var {ref} is a general entity
78117reference. It looks for \var {ref} in the instance (or class)
79- variable \code {entitydefs} which should give the entity's translation.
118+ variable \code {entitydefs} which should be a mapping from entity names
119+ to corresponding translations.
80120If a translation is found, it calls the method \code {handle_data()}
81121with the translation; otherwise, it calls the method
82- \code {unknown_entityref(\var {ref})}.
122+ \code {unknown_entityref(\var {ref})}. The default \code {entitydefs}
123+ defines translations for \code {\& amp;}, \code {\& apos;}, \code {\& gt;},
124+ \code {\& lt;}, and \code {\& quot;}.
83125\end {funcdesc }
84126
85- \begin {funcdesc }{handle_data}{data}
86- This method is called to process arbitrary data. It is intended to be
87- overridden by a derived class; the base class implementation does
88- nothing.
127+ \begin {funcdesc }{handle_comment}{comment}
128+ This method is called when a comment is encountered. The
129+ \code {comment} argument is a string containing the text between the
130+ `` \code {<!--}'' and `` \code {-->}'' delimiters, but not the delimiters
131+ themselves. For example, the comment `` \code {<!--text-->}'' will
132+ cause this method to be called with the argument \code {'text'}. The
133+ default method does nothing.
134+ \end {funcdesc }
135+
136+ \begin {funcdesc }{report_unbalanced}{tag}
137+ This method is called when an end tag is found which does not
138+ correspond to any open element.
89139\end {funcdesc }
90140
91141\begin {funcdesc }{unknown_starttag}{tag\, attributes}
92142This method is called to process an unknown start tag. It is intended
93143to be overridden by a derived class; the base class implementation
94- does nothing. The \var {attributes} argument is a list of
95- (\var {name}, \var {value}) pairs containing the attributes found inside
96- the tag's \code {<>} brackets. The \var {name} has been translated to
97- lower case and double quotes and backslashes in the \var {value} have
98- been interpreted. For instance, for the tag
99- \code {<A HREF="http://www.cwi.nl/"> }, this method would be
100- called as \code {unknown_starttag('a', [('href', 'http://www.cwi.nl/' )])}.
144+ does nothing.
101145\end {funcdesc }
102146
103147\begin {funcdesc }{unknown_endtag}{tag}
@@ -107,9 +151,9 @@ \section{Standard Module \sectcode{sgmllib}}
107151\end {funcdesc }
108152
109153\begin {funcdesc }{unknown_charref}{ref}
110- This method is called to process an unknown character reference. It
111- is intended to be overridden by a derived class; the base class
112- implementation does nothing.
154+ This method is called to process unresolvable numeric character
155+ references. It is intended to be overridden by a derived class; the
156+ base class implementation does nothing.
113157\end {funcdesc }
114158
115159\begin {funcdesc }{unknown_entityref}{ref}
@@ -127,22 +171,25 @@ \section{Standard Module \sectcode{sgmllib}}
127171\begin {funcdesc }{start_\var {tag}}{attributes}
128172This method is called to process an opening tag \var {tag}. It has
129173preference over \code {do_\var {tag}()}. The \var {attributes} argument
130- has the same meaning as described for \code {unknown_tag ()} above.
174+ has the same meaning as described for \code {handle_starttag ()} above.
131175\end {funcdesc }
132176
133177\begin {funcdesc }{do_\var {tag}}{attributes}
134178This method is called to process an opening tag \var {tag} that does
135179not come with a matching closing tag. The \var {attributes} argument
136- has the same meaning as described for \code {unknown_tag ()} above.
180+ has the same meaning as described for \code {handle_starttag ()} above.
137181\end {funcdesc }
138182
139183\begin {funcdesc }{end_\var {tag}}{}
140184This method is called to process a closing tag \var {tag}.
141185\end {funcdesc }
142186
143- Note that the parser maintains a stack of opening tags for which no
144- matching closing tag has been found yet. Only tags processed by
145- \code {start_\var {tag}()} are pushed on this stack. Definition of a
187+ Note that the parser maintains a stack of open elements for which no
188+ end tag has been found yet. Only tags processed by
189+ \code {start_\var {tag}()} are pushed on this stack. Definition of an
146190\code {end_\var {tag}()} method is optional for these tags. For tags
147191processed by \code {do_\var {tag}()} or by \code {unknown_tag()}, no
148- \code {end_\var {tag}()} method must be defined.
192+ \code {end_\var {tag}()} method must be defined; if defined, it will not
193+ be used. If both \code {start_\var {tag}()} and \code {do_\var {tag}()}
194+ methods exist for a tag, the \code {start_\var {tag}()} method takes
195+ precedence.
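
The dispatch rules described in this change (a `start_tag()` method takes precedence over `do_tag()`, only tags handled by `start_tag()` are pushed on the stack, and an `end_tag()` method defined for a `do_tag()` tag is not used) can be illustrated with a small self-contained sketch. This is a toy model written for illustration, not the `sgmllib` implementation: the class names `DispatchSketch` and `Demo`, the `feed_starttag()` and `feed_endtag()` entry points, and the `events` list are all hypothetical, and the choice between `report_unbalanced()` and `unknown_endtag()` for an end tag with no open element is an assumption.

```python
class DispatchSketch:
    """Toy model of the tag dispatch rules; a stand-in, not sgmllib's code."""

    def __init__(self):
        self.stack = []    # open elements that were handled by start_<tag>()
        self.events = []   # record of handler calls, for inspection

    def feed_starttag(self, tag, attributes):
        # Hypothetical entry point: dispatch one start tag.
        tag = tag.lower()
        method = getattr(self, 'start_' + tag, None)
        if method is not None:
            # start_<tag>() takes precedence; only these tags are stacked.
            self.stack.append(tag)
        else:
            method = getattr(self, 'do_' + tag, None)
        if method is None:
            self.unknown_starttag(tag, attributes)
        else:
            self.handle_starttag(tag, method, attributes)

    def feed_endtag(self, tag):
        # Hypothetical entry point: dispatch one end tag.
        tag = tag.lower()
        if tag not in self.stack:
            # Assumption: an end tag with no open element is reported as
            # unbalanced if an end_<tag>() method exists, else as unknown.
            if hasattr(self, 'end_' + tag):
                self.report_unbalanced(tag)
            else:
                self.unknown_endtag(tag)
            return
        # Close open elements down to and including the matching one.
        while self.stack:
            open_tag = self.stack.pop()
            method = getattr(self, 'end_' + open_tag, None)
            if method is not None:
                self.handle_endtag(open_tag, method)
            if open_tag == tag:
                break

    def handle_starttag(self, tag, method, attributes):
        method(attributes)   # base behaviour: call the bound method

    def handle_endtag(self, tag, method):
        method()             # base behaviour: call the bound method

    def unknown_starttag(self, tag, attributes):
        self.events.append(('unknown_starttag', tag, attributes))

    def unknown_endtag(self, tag):
        self.events.append(('unknown_endtag', tag))

    def report_unbalanced(self, tag):
        self.events.append(('report_unbalanced', tag))


class Demo(DispatchSketch):
    def start_a(self, attributes):
        self.events.append(('start_a', attributes))

    def end_a(self):
        self.events.append(('end_a',))

    def do_br(self, attributes):
        self.events.append(('do_br', attributes))

    def end_br(self):
        # Never called: 'br' is handled by do_br() and so is never stacked.
        self.events.append(('end_br',))


demo = Demo()
demo.feed_starttag('A', [('href', 'http://www.cwi.nl/')])
demo.feed_starttag('BR', [])
demo.feed_endtag('BR')
demo.feed_endtag('A')
demo.feed_starttag('X', [])
```

In this sketch `demo.events` records the handler calls in order, so it can be inspected to see that `end_br()` is never invoked while `end_a()` is.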