|
| 1 | +\section{\module{stringprep} --- |
| 2 | + Internet String Preparation} |
| 3 | + |
| 4 | +\declaremodule{standard}{stringprep} |
| 5 | +\modulesynopsis{String preparation, as per RFC 3454} |
| 6 | +\moduleauthor{Martin v. L\"owis}{ [email protected]} |
| 7 | +\sectionauthor{Martin v. L\"owis}{ [email protected]} |
| 8 | + |
| 9 | +When identifying things (such as host names) on the Internet, it is |
| 10 | +often necessary to compare such identifications for |
| 11 | +``equality''. Exactly how this comparison is executed may depend on |
| 12 | +the application domain, e.g. whether it should be case-insensitive or |
| 13 | +not. It may also be necessary to restrict the possible |
| 14 | +identifications, to allow only identifications consisting of |
| 15 | +``printable'' characters. |
| 16 | + |
| 17 | +\rfc{3454} defines a procedure for ``preparing'' Unicode strings in |
| 18 | +internet protocols. Before passing strings onto the wire, they are |
| 19 | +processed with the preparation procedure, after which they have a |
| 20 | +certain normalized form. The RFC defines a set of tables, which can be |
| 21 | +combined into profiles. Each profile must define which tables it uses, |
| 22 | +and what other optional parts of the \code{stringprep} procedure are |
| 23 | +part of the profile. One example of a \code{stringprep} profile is |
| 24 | +\code{nameprep}, which is used for internationalized domain names. |
| 25 | + |
| 26 | +The module \module{stringprep} only exposes the tables from RFC |
| 27 | +3454. Because these tables would be very large if represented as |
| 28 | +dictionaries or lists, the module uses the Unicode character database |
| 29 | +internally. The module source code itself was generated using the |
| 30 | +\code{mkstringprep.py} utility. |
| 31 | + |
| 32 | +As a result, these tables are exposed as functions, not as data |
| 33 | +structures. There are two kinds of tables in the RFC: sets and |
| 34 | +mappings. For a set, \module{stringprep} provides the ``characteristic |
| 35 | +function'', i.e. a function that returns true if the parameter is part |
| 36 | +of the set. For mappings, it provides the mapping function: given the |
| 37 | +key, it returns the associated value. Below is a list of all functions |
| 38 | +available in the module. |
| 39 | + |
| 40 | +\begin{funcdesc}{in_table_a1}{code} |
| 41 | +Determine whether \var{code} is in table~A.1 (Unassigned code points |
| 42 | +in Unicode 3.2). |
| 43 | +\end{funcdesc} |
| 44 | + |
| 45 | +\begin{funcdesc}{in_table_b1}{code} |
| 46 | +Determine whether \var{code} is in table~B.1 (Commonly mapped to |
| 47 | +nothing). |
| 48 | +\end{funcdesc} |
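As a small illustration of a characteristic function, the sketch below removes every character of table~B.1 from a string. It assumes Python 3, where each function takes a one-character string; the helper name \code{strip_b1} is hypothetical and not part of the module.

```python
import stringprep


def strip_b1(text):
    """Drop every character that table B.1 maps to nothing (hypothetical helper)."""
    return "".join(ch for ch in text if not stringprep.in_table_b1(ch))


# U+00AD (SOFT HYPHEN) is listed in table B.1, so it is removed.
print(strip_b1("ex\u00adample"))
```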
| 49 | + |
| 50 | +\begin{funcdesc}{map_table_b2}{code} |
| 51 | +Return the mapped value for \var{code} according to table~B.2 |
| 52 | +(Mapping for case-folding used with NFKC). |
| 53 | +\end{funcdesc} |
| 54 | + |
| 55 | +\begin{funcdesc}{map_table_b3}{code} |
| 56 | +Return the mapped value for \var{code} according to table~B.3 |
| 57 | +(Mapping for case-folding used with no normalization). |
| 58 | +\end{funcdesc} |
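The two mapping functions can be applied character by character to case-fold a whole string. The following is a minimal sketch, assuming Python 3 string arguments; the helper \code{fold} and its \code{normalize} flag are hypothetical names introduced only for this example.

```python
import stringprep


def fold(text, normalize=True):
    """Case-fold text with table B.2 (or B.3 when normalize is false)."""
    table = stringprep.map_table_b2 if normalize else stringprep.map_table_b3
    # A mapped value may be longer than one character, so join the pieces.
    return "".join(table(ch) for ch in text)


# U+00DF (LATIN SMALL LETTER SHARP S) folds to the two characters "ss".
print(fold("Stra\u00dfe"))
```

Because a single code point may map to several characters, the result can be longer than the input.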
| 59 | + |
| 60 | +\begin{funcdesc}{in_table_c11}{code} |
| 61 | +Determine whether \var{code} is in table~C.1.1 |
| 62 | +(ASCII space characters). |
| 63 | +\end{funcdesc} |
| 64 | + |
| 65 | +\begin{funcdesc}{in_table_c12}{code} |
| 66 | +Determine whether \var{code} is in table~C.1.2 |
| 67 | +(Non-ASCII space characters). |
| 68 | +\end{funcdesc} |
| 69 | + |
| 70 | +\begin{funcdesc}{in_table_c11_c12}{code} |
| 71 | +Determine whether \var{code} is in table~C.1 |
| 72 | +(Space characters, union of C.1.1 and C.1.2). |
| 73 | +\end{funcdesc} |
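The relationship between the three space tables can be checked directly. This sketch assumes Python 3; \code{classify_space} is a hypothetical helper that simply collects the three results for one character.

```python
import stringprep


def classify_space(ch):
    """Return membership of ch in tables C.1.1, C.1.2 and C.1, in that order."""
    return (
        stringprep.in_table_c11(ch),
        stringprep.in_table_c12(ch),
        stringprep.in_table_c11_c12(ch),
    )


# An ASCII space, a no-break space (U+00A0) and an ordinary letter.
for ch in (" ", "\u00a0", "a"):
    print(repr(ch), classify_space(ch))
```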
| 74 | + |
| 75 | +\begin{funcdesc}{in_table_c21}{code} |
| 76 | +Determine whether \var{code} is in table~C.2.1 |
| 77 | +(ASCII control characters). |
| 78 | +\end{funcdesc} |
| 79 | + |
| 80 | +\begin{funcdesc}{in_table_c22}{code} |
| 81 | +Determine whether \var{code} is in table~C.2.2 |
| 82 | +(Non-ASCII control characters). |
| 83 | +\end{funcdesc} |
| 84 | + |
| 85 | +\begin{funcdesc}{in_table_c21_c22}{code} |
| 86 | +Determine whether \var{code} is in table~C.2 |
| 87 | +(Control characters, union of C.2.1 and C.2.2). |
| 88 | +\end{funcdesc} |
| 89 | + |
| 90 | +\begin{funcdesc}{in_table_c3}{code} |
| 91 | +Determine whether \var{code} is in table~C.3 |
| 92 | +(Private use). |
| 93 | +\end{funcdesc} |
| 94 | + |
| 95 | +\begin{funcdesc}{in_table_c4}{code} |
| 96 | +Determine whether \var{code} is in table~C.4 |
| 97 | +(Non-character code points). |
| 98 | +\end{funcdesc} |
| 99 | + |
| 100 | +\begin{funcdesc}{in_table_c5}{code} |
| 101 | +Determine whether \var{code} is in table~C.5 |
| 102 | +(Surrogate codes). |
| 103 | +\end{funcdesc} |
| 104 | + |
| 105 | +\begin{funcdesc}{in_table_c6}{code} |
| 106 | +Determine whether \var{code} is in table~C.6 |
| 107 | +(Inappropriate for plain text). |
| 108 | +\end{funcdesc} |
| 109 | + |
| 110 | +\begin{funcdesc}{in_table_c7}{code} |
| 111 | +Determine whether \var{code} is in table~C.7 |
| 112 | +(Inappropriate for canonical representation). |
| 113 | +\end{funcdesc} |
| 114 | + |
| 115 | +\begin{funcdesc}{in_table_c8}{code} |
| 116 | +Determine whether \var{code} is in table~C.8 |
| 117 | +(Change display properties or are deprecated). |
| 118 | +\end{funcdesc} |
| 119 | + |
| 120 | +\begin{funcdesc}{in_table_c9}{code} |
| 121 | +Determine whether \var{code} is in table~C.9 |
| 122 | +(Tagging characters). |
| 123 | +\end{funcdesc} |
| 124 | + |
| 125 | +\begin{funcdesc}{in_table_d1}{code} |
| 126 | +Determine whether \var{code} is in table~D.1 |
| 127 | +(Characters with bidirectional property ``R'' or ``AL''). |
| 128 | +\end{funcdesc} |
| 129 | + |
| 130 | +\begin{funcdesc}{in_table_d2}{code} |
| 131 | +Determine whether \var{code} is in table~D.2 |
| 132 | +(Characters with bidirectional property ``L''). |
| 133 | +\end{funcdesc} |
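To show how the tables combine into a profile, here is a rough sketch of the four steps that \rfc{3454} describes (map, normalize, prohibit, check bidirectional text). It is an illustration under stated assumptions, not the module's own code and not a conforming \code{nameprep} implementation: the function name \code{prepare} and the choice of prohibited tables are hypothetical, Python 3 is assumed, the unassigned code point check (table~A.1) is omitted, and normalization uses the Unicode version of the running interpreter rather than Unicode 3.2.

```python
import stringprep
import unicodedata

# Prohibited-output tables chosen for this illustration (an assumption;
# a real profile states its own selection).
PROHIBITED = (
    stringprep.in_table_c12,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def prepare(text):
    """Hypothetical profile: B.1 and B.2 mapping, NFKC, prohibition, bidi check."""
    # Step 1: map. Drop table B.1 characters, case-fold the rest with B.2.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in text
        if not stringprep.in_table_b1(ch)
    )
    # Step 2: normalize (current Unicode data, not Unicode 3.2).
    normalized = unicodedata.normalize("NFKC", mapped)
    # Step 3: reject any character found in a prohibited table.
    for ch in normalized:
        if any(is_member(ch) for is_member in PROHIBITED):
            raise ValueError("prohibited character %r" % ch)
    # Step 4: bidirectional check. A string with R or AL characters must
    # contain no L characters and must start and end with R or AL.
    if any(stringprep.in_table_d1(ch) for ch in normalized):
        if any(stringprep.in_table_d2(ch) for ch in normalized):
            raise ValueError("mixed left-to-right and right-to-left text")
        if not (stringprep.in_table_d1(normalized[0])
                and stringprep.in_table_d1(normalized[-1])):
            raise ValueError("right-to-left text must start and end with R or AL")
    return normalized


print(prepare("Exam\u00adple"))
```

Keeping the prohibited tables in a tuple makes it easy to vary the selection when experimenting with a different profile.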
| 134 | + |