
Commit 42439ad

committed
(libsgmllib.tex): Revised documentation for SGML support.
1 parent 5812488 commit 42439ad

2 files changed

Lines changed: 152 additions & 58 deletions


Doc/lib/libsgmllib.tex

Lines changed: 76 additions & 29 deletions
@@ -12,7 +12,7 @@ \section{Standard Module \sectcode{sgmllib}}
1212
\stmodindex{htmllib}
1313

1414
In particular, the parser is hardcoded to recognize the following
15-
elements:
15+
constructs:
1616

1717
\begin{itemize}
1818

@@ -22,13 +22,15 @@ \section{Standard Module \sectcode{sgmllib}}
2222
``\code{</\var{tag}>}'', respectively.
2323

2424
\item
25-
Character references of the form ``\code{\&\#\var{name};}''.
25+
Numeric character references of the form ``\code{\&\#\var{name};}''.
2626

2727
\item
2828
Entity references of the form ``\code{\&\var{name};}''.
2929

3030
\item
31-
SGML comments of the form ``\code{<!--\var{text}>}''.
31+
SGML comments of the form ``\code{<!--\var{text}-->}''. Note that
32+
spaces, tabs, and newlines are allowed between the trailing
33+
``\code{>}'' and the immediately preceding ``\code{--}''.
3234

3335
\end{itemize}
3436

@@ -63,41 +65,83 @@ \section{Standard Module \sectcode{sgmllib}}
6365
redefined version should always call \code{SGMLParser.close()}.
6466
\end{funcdesc}
6567

68+
\begin{funcdesc}{handle_starttag}{tag\, method\, attributes}
69+
This method is called to handle start tags for which either a
70+
\code{start_\var{tag}()} or \code{do_\var{tag}()} method has been
71+
defined. The \code{tag} argument is the name of the tag converted to
72+
lower case, and the \code{method} argument is the bound method which
73+
should be used to support semantic interpretation of the start tag.
74+
The \var{attributes} argument is a list of (\var{name}, \var{value})
75+
pairs containing the attributes found inside the tag's \code{<>}
76+
brackets. The \var{name} has been translated to lower case and double
77+
quotes and backslashes in the \var{value} have been interpreted. For
78+
instance, for the tag \code{<A HREF="http://www.cwi.nl/">}, this
79+
method would be called as \code{unknown_starttag('a', [('href',
80+
'http://www.cwi.nl/')])}. The base implementation simply calls
81+
\code{method} with \code{attributes} as the only argument.
82+
\end{funcdesc}
83+
84+
\begin{funcdesc}{handle_endtag}{tag\, method}
85+
86+
This method is called to handle endtags for which an
87+
\code{end_\var{tag}()} method has been defined. The \code{tag}
88+
argument is the name of the tag converted to lower case, and the
89+
\code{method} argument is the bound method which should be used to
90+
support semantic interpretation of the end tag. If no
91+
\code{end_\var{tag}()} method is defined for the closing element, this
92+
handler is not called. The base implementation simply calls
93+
\code{method}.
94+
\end{funcdesc}
95+
96+
\begin{funcdesc}{handle_data}{data}
97+
This method is called to process arbitrary data. It is intended to be
98+
overridden by a derived class; the base class implementation does
99+
nothing.
100+
\end{funcdesc}
101+
66102
\begin{funcdesc}{handle_charref}{ref}
67103
This method is called to process a character reference of the form
68-
``\code{\&\#\var{ref};}'' where \var{ref} is a decimal number in the
104+
``\code{\&\#\var{ref};}''. In the base implementation, \var{ref} must
105+
be a decimal number in the
69106
range 0-255. It translates the character to \ASCII{} and calls the
70107
method \code{handle_data()} with the character as argument. If
71108
\var{ref} is invalid or out of range, the method
72-
\code{unknown_charref(\var{ref})} is called instead.
109+
\code{unknown_charref(\var{ref})} is called to handle the error. A
110+
subclass must override this method to provide support for named
111+
character entities.
73112
\end{funcdesc}
74113
75114
\begin{funcdesc}{handle_entityref}{ref}
76-
This method is called to process an entity reference of the form
77-
``\code{\&\var{ref};}'' where \var{ref} is an alphabetic entity
115+
This method is called to process a general entity reference of the form
116+
``\code{\&\var{ref};}'' where \var{ref} is a general entity
78117
reference. It looks for \var{ref} in the instance (or class)
79-
variable \code{entitydefs} which should give the entity's translation.
118+
variable \code{entitydefs} which should be a mapping from entity names
119+
to corresponding translations.
80120
If a translation is found, it calls the method \code{handle_data()}
81121
with the translation; otherwise, it calls the method
82-
\code{unknown_entityref(\var{ref})}.
122+
\code{unknown_entityref(\var{ref})}. The default \code{entitydefs}
123+
defines translations for \code{\&amp;}, \code{\&apos;}, \code{\&gt;},
124+
\code{\&lt;}, and \code{\&quot;}.
83125
\end{funcdesc}
84126
85-
\begin{funcdesc}{handle_data}{data}
86-
This method is called to process arbitrary data. It is intended to be
87-
overridden by a derived class; the base class implementation does
88-
nothing.
127+
\begin{funcdesc}{handle_comment}{comment}
128+
This method is called when a comment is encountered. The
129+
\code{comment} argument is a string containing the text between the
130+
``\code{<!--}'' and ``\code{-->}'' delimiters, but not the delimiters
131+
themselves. For example, the comment ``\code{<!--text-->}'' will
132+
cause this method to be called with the argument \code{'text'}. The
133+
default method does nothing.
134+
\end{funcdesc}
135+
136+
\begin{funcdesc}{report_unbalanced}{tag}
137+
This method is called when an end tag is found which does not
138+
correspond to any open element.
89139
\end{funcdesc}
90140
91141
\begin{funcdesc}{unknown_starttag}{tag\, attributes}
92142
This method is called to process an unknown start tag. It is intended
93143
to be overridden by a derived class; the base class implementation
94-
does nothing. The \var{attributes} argument is a list of
95-
(\var{name}, \var{value}) pairs containing the attributes found inside
96-
the tag's \code{<>} brackets. The \var{name} has been translated to
97-
lower case and double quotes and backslashes in the \var{value} have
98-
been interpreted. For instance, for the tag
99-
\code{<A HREF="http://www.cwi.nl/">}, this method would be
100-
called as \code{unknown_starttag('a', [('href', 'http://www.cwi.nl/')])}.
144+
does nothing.
101145
\end{funcdesc}
102146
103147
\begin{funcdesc}{unknown_endtag}{tag}
@@ -107,9 +151,9 @@ \section{Standard Module \sectcode{sgmllib}}
107151
\end{funcdesc}
108152
109153
\begin{funcdesc}{unknown_charref}{ref}
110-
This method is called to process an unknown character reference. It
111-
is intended to be overridden by a derived class; the base class
112-
implementation does nothing.
154+
This method is called to process unresolvable numeric character
155+
references. It is intended to be overridden by a derived class; the
156+
base class implementation does nothing.
113157
\end{funcdesc}
114158
115159
\begin{funcdesc}{unknown_entityref}{ref}
@@ -127,22 +171,25 @@ \section{Standard Module \sectcode{sgmllib}}
127171
\begin{funcdesc}{start_\var{tag}}{attributes}
128172
This method is called to process an opening tag \var{tag}. It has
129173
preference over \code{do_\var{tag}()}. The \var{attributes} argument
130-
has the same meaning as described for \code{unknown_tag()} above.
174+
has the same meaning as described for \code{handle_starttag()} above.
131175
\end{funcdesc}
132176
133177
\begin{funcdesc}{do_\var{tag}}{attributes}
134178
This method is called to process an opening tag \var{tag} that does
135179
not come with a matching closing tag. The \var{attributes} argument
136-
has the same meaning as described for \code{unknown_tag()} above.
180+
has the same meaning as described for \code{handle_starttag()} above.
137181
\end{funcdesc}
138182
139183
\begin{funcdesc}{end_\var{tag}}{}
140184
This method is called to process a closing tag \var{tag}.
141185
\end{funcdesc}
142186
143-
Note that the parser maintains a stack of opening tags for which no
144-
matching closing tag has been found yet. Only tags processed by
145-
\code{start_\var{tag}()} are pushed on this stack. Definition of a
187+
Note that the parser maintains a stack of open elements for which no
188+
end tag has been found yet. Only tags processed by
189+
\code{start_\var{tag}()} are pushed on this stack. Definition of an
146190
\code{end_\var{tag}()} method is optional for these tags. For tags
147191
processed by \code{do_\var{tag}()} or by \code{unknown_tag()}, no
148-
\code{end_\var{tag}()} method must be defined.
192+
\code{end_\var{tag}()} method must be defined; if defined, it will not
193+
be used. If both \code{start_\var{tag}()} and \code{do_\var{tag}()}
194+
methods exist for a tag, the \code{start_\var{tag}()} method takes
195+
precedence.

Doc/libsgmllib.tex

Lines changed: 76 additions & 29 deletions
@@ -12,7 +12,7 @@ \section{Standard Module \sectcode{sgmllib}}
1212
\stmodindex{htmllib}
1313

1414
In particular, the parser is hardcoded to recognize the following
15-
elements:
15+
constructs:
1616

1717
\begin{itemize}
1818

@@ -22,13 +22,15 @@ \section{Standard Module \sectcode{sgmllib}}
2222
``\code{</\var{tag}>}'', respectively.
2323

2424
\item
25-
Character references of the form ``\code{\&\#\var{name};}''.
25+
Numeric character references of the form ``\code{\&\#\var{name};}''.
2626

2727
\item
2828
Entity references of the form ``\code{\&\var{name};}''.
2929

3030
\item
31-
SGML comments of the form ``\code{<!--\var{text}>}''.
31+
SGML comments of the form ``\code{<!--\var{text}-->}''. Note that
32+
spaces, tabs, and newlines are allowed between the trailing
33+
``\code{>}'' and the immediately preceding ``\code{--}''.
3234

3335
\end{itemize}
3436

@@ -63,41 +65,83 @@ \section{Standard Module \sectcode{sgmllib}}
6365
redefined version should always call \code{SGMLParser.close()}.
6466
\end{funcdesc}
6567

68+
\begin{funcdesc}{handle_starttag}{tag\, method\, attributes}
69+
This method is called to handle start tags for which either a
70+
\code{start_\var{tag}()} or \code{do_\var{tag}()} method has been
71+
defined. The \code{tag} argument is the name of the tag converted to
72+
lower case, and the \code{method} argument is the bound method which
73+
should be used to support semantic interpretation of the start tag.
74+
The \var{attributes} argument is a list of (\var{name}, \var{value})
75+
pairs containing the attributes found inside the tag's \code{<>}
76+
brackets. The \var{name} has been translated to lower case and double
77+
quotes and backslashes in the \var{value} have been interpreted. For
78+
instance, for the tag \code{<A HREF="http://www.cwi.nl/">}, this
79+
method would be called as \code{unknown_starttag('a', [('href',
80+
'http://www.cwi.nl/')])}. The base implementation simply calls
81+
\code{method} with \code{attributes} as the only argument.
82+
\end{funcdesc}
83+
84+
\begin{funcdesc}{handle_endtag}{tag\, method}
85+
86+
This method is called to handle endtags for which an
87+
\code{end_\var{tag}()} method has been defined. The \code{tag}
88+
argument is the name of the tag converted to lower case, and the
89+
\code{method} argument is the bound method which should be used to
90+
support semantic interpretation of the end tag. If no
91+
\code{end_\var{tag}()} method is defined for the closing element, this
92+
handler is not called. The base implementation simply calls
93+
\code{method}.
94+
\end{funcdesc}
95+
96+
\begin{funcdesc}{handle_data}{data}
97+
This method is called to process arbitrary data. It is intended to be
98+
overridden by a derived class; the base class implementation does
99+
nothing.
100+
\end{funcdesc}
101+
66102
\begin{funcdesc}{handle_charref}{ref}
67103
This method is called to process a character reference of the form
68-
``\code{\&\#\var{ref};}'' where \var{ref} is a decimal number in the
104+
``\code{\&\#\var{ref};}''. In the base implementation, \var{ref} must
105+
be a decimal number in the
69106
range 0-255. It translates the character to \ASCII{} and calls the
70107
method \code{handle_data()} with the character as argument. If
71108
\var{ref} is invalid or out of range, the method
72-
\code{unknown_charref(\var{ref})} is called instead.
109+
\code{unknown_charref(\var{ref})} is called to handle the error. A
110+
subclass must override this method to provide support for named
111+
character entities.
73112
\end{funcdesc}
74113
75114
\begin{funcdesc}{handle_entityref}{ref}
76-
This method is called to process an entity reference of the form
77-
``\code{\&\var{ref};}'' where \var{ref} is an alphabetic entity
115+
This method is called to process a general entity reference of the form
116+
``\code{\&\var{ref};}'' where \var{ref} is a general entity
78117
reference. It looks for \var{ref} in the instance (or class)
79-
variable \code{entitydefs} which should give the entity's translation.
118+
variable \code{entitydefs} which should be a mapping from entity names
119+
to corresponding translations.
80120
If a translation is found, it calls the method \code{handle_data()}
81121
with the translation; otherwise, it calls the method
82-
\code{unknown_entityref(\var{ref})}.
122+
\code{unknown_entityref(\var{ref})}. The default \code{entitydefs}
123+
defines translations for \code{\&amp;}, \code{\&apos;}, \code{\&gt;},
124+
\code{\&lt;}, and \code{\&quot;}.
83125
\end{funcdesc}
84126
85-
\begin{funcdesc}{handle_data}{data}
86-
This method is called to process arbitrary data. It is intended to be
87-
overridden by a derived class; the base class implementation does
88-
nothing.
127+
\begin{funcdesc}{handle_comment}{comment}
128+
This method is called when a comment is encountered. The
129+
\code{comment} argument is a string containing the text between the
130+
``\code{<!--}'' and ``\code{-->}'' delimiters, but not the delimiters
131+
themselves. For example, the comment ``\code{<!--text-->}'' will
132+
cause this method to be called with the argument \code{'text'}. The
133+
default method does nothing.
134+
\end{funcdesc}
135+
136+
\begin{funcdesc}{report_unbalanced}{tag}
137+
This method is called when an end tag is found which does not
138+
correspond to any open element.
89139
\end{funcdesc}
90140
91141
\begin{funcdesc}{unknown_starttag}{tag\, attributes}
92142
This method is called to process an unknown start tag. It is intended
93143
to be overridden by a derived class; the base class implementation
94-
does nothing. The \var{attributes} argument is a list of
95-
(\var{name}, \var{value}) pairs containing the attributes found inside
96-
the tag's \code{<>} brackets. The \var{name} has been translated to
97-
lower case and double quotes and backslashes in the \var{value} have
98-
been interpreted. For instance, for the tag
99-
\code{<A HREF="http://www.cwi.nl/">}, this method would be
100-
called as \code{unknown_starttag('a', [('href', 'http://www.cwi.nl/')])}.
144+
does nothing.
101145
\end{funcdesc}
102146
103147
\begin{funcdesc}{unknown_endtag}{tag}
@@ -107,9 +151,9 @@ \section{Standard Module \sectcode{sgmllib}}
107151
\end{funcdesc}
108152
109153
\begin{funcdesc}{unknown_charref}{ref}
110-
This method is called to process an unknown character reference. It
111-
is intended to be overridden by a derived class; the base class
112-
implementation does nothing.
154+
This method is called to process unresolvable numeric character
155+
references. It is intended to be overridden by a derived class; the
156+
base class implementation does nothing.
113157
\end{funcdesc}
114158
115159
\begin{funcdesc}{unknown_entityref}{ref}
@@ -127,22 +171,25 @@ \section{Standard Module \sectcode{sgmllib}}
127171
\begin{funcdesc}{start_\var{tag}}{attributes}
128172
This method is called to process an opening tag \var{tag}. It has
129173
preference over \code{do_\var{tag}()}. The \var{attributes} argument
130-
has the same meaning as described for \code{unknown_tag()} above.
174+
has the same meaning as described for \code{handle_starttag()} above.
131175
\end{funcdesc}
132176
133177
\begin{funcdesc}{do_\var{tag}}{attributes}
134178
This method is called to process an opening tag \var{tag} that does
135179
not come with a matching closing tag. The \var{attributes} argument
136-
has the same meaning as described for \code{unknown_tag()} above.
180+
has the same meaning as described for \code{handle_starttag()} above.
137181
\end{funcdesc}
138182
139183
\begin{funcdesc}{end_\var{tag}}{}
140184
This method is called to process a closing tag \var{tag}.
141185
\end{funcdesc}
142186
143-
Note that the parser maintains a stack of opening tags for which no
144-
matching closing tag has been found yet. Only tags processed by
145-
\code{start_\var{tag}()} are pushed on this stack. Definition of a
187+
Note that the parser maintains a stack of open elements for which no
188+
end tag has been found yet. Only tags processed by
189+
\code{start_\var{tag}()} are pushed on this stack. Definition of an
146190
\code{end_\var{tag}()} method is optional for these tags. For tags
147191
processed by \code{do_\var{tag}()} or by \code{unknown_tag()}, no
148-
\code{end_\var{tag}()} method must be defined.
192+
\code{end_\var{tag}()} method must be defined; if defined, it will not
193+
be used. If both \code{start_\var{tag}()} and \code{do_\var{tag}()}
194+
methods exist for a tag, the \code{start_\var{tag}()} method takes
195+
precedence.
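
The dispatch rules in the revised text (precedence of \code{start_\var{tag}()} over \code{do_\var{tag}()}, the stack of open elements, the optional \code{end_\var{tag}()}) can be sketched in plain Python without importing the module. This is only an illustration under stated assumptions: the class \code{MiniDispatcher} and its methods \code{feed_start} and \code{feed_end} are hypothetical names, not part of \code{sgmllib}, and the way intervening open elements are closed is a simplification rather than the module's actual algorithm.

```python
class MiniDispatcher:
    """Hypothetical stand-in for the documented tag dispatch (not sgmllib itself)."""

    def __init__(self):
        self.stack = []  # open elements whose start tag went to start_<tag>()

    def feed_start(self, tag, attributes):
        tag = tag.lower()
        # start_<tag>() takes precedence over do_<tag>().
        method = getattr(self, 'start_' + tag, None)
        if method is not None:
            self.stack.append(tag)  # only start_<tag>() tags are pushed
        else:
            method = getattr(self, 'do_' + tag, None)
        if method is None:
            self.unknown_starttag(tag, attributes)
        else:
            self.handle_starttag(tag, method, attributes)

    def feed_end(self, tag):
        tag = tag.lower()
        if tag not in self.stack:
            # End tag that matches no open element.
            self.report_unbalanced(tag)
            return
        # Assumption: close every element opened after `tag`, then `tag` itself.
        while True:
            open_tag = self.stack.pop()
            method = getattr(self, 'end_' + open_tag, None)
            if method is not None:  # end_<tag>() is optional
                self.handle_endtag(open_tag, method)
            if open_tag == tag:
                break

    def handle_starttag(self, tag, method, attributes):
        method(attributes)  # base behaviour: call method with the attributes

    def handle_endtag(self, tag, method):
        method()  # base behaviour: call method with no arguments

    def unknown_starttag(self, tag, attributes):
        pass

    def report_unbalanced(self, tag):
        pass


class Demo(MiniDispatcher):
    """Records which handlers fire, in order."""

    def __init__(self):
        super().__init__()
        self.log = []

    def start_a(self, attributes):
        self.log.append(('start_a', attributes))

    def end_a(self):
        self.log.append(('end_a',))

    def do_br(self, attributes):
        self.log.append(('do_br', attributes))

    def end_br(self):
        self.log.append(('end_br',))  # defined, but never used for a do_ tag

    def unknown_starttag(self, tag, attributes):
        self.log.append(('unknown', tag))

    def report_unbalanced(self, tag):
        self.log.append(('unbalanced', tag))


demo = Demo()
demo.feed_start('A', [('href', 'http://www.cwi.nl/')])
demo.feed_start('BR', [])
demo.feed_end('br')
demo.feed_start('P', [])
demo.feed_end('a')
for entry in demo.log:
    print(entry)
```

In this sketch the \code{end_br()} method is never invoked, because \code{br} is handled by \code{do_br()} and so is never pushed on the stack; its end tag is reported as unbalanced instead.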
