| 1 | +% Documentation by ESR |
| 2 | +\section{Standard Module \module{multifile}} |
| 3 | +\stmodindex{multifile}
| 4 | +\label{module-multifile} |
| 5 | + |
| 6 | +The \class{MultiFile} object enables you to treat sections of a text
| 7 | +file as file-like input objects, with \method{readline()} returning
| 8 | +EOF (an empty string) when a given delimiter pattern is encountered. The
| 9 | +defaults of this class are designed to make it useful for parsing
| 10 | +MIME multipart messages, but by subclassing it and overriding methods
| 11 | +it can easily be adapted for more general use.
| 12 | + |
| 13 | +\begin{classdesc}{MultiFile}{fp\optional{, seekable}}
| 14 | +Create a multi-file. You must instantiate this class with an input
| 15 | +object argument for the \class{MultiFile} instance to get lines from,
| 16 | +such as a file object returned by \function{open()}.
| 17 | + |
| 18 | +\class{MultiFile} only ever looks at the input object's \method{readline()},
| 19 | +\method{seek()} and \method{tell()} methods, and the latter two are only
| 20 | +needed if you want random access to the individual sections. To use
| 21 | +\class{MultiFile} on a non-seekable stream object, set the optional
| 22 | +\var{seekable} argument (which defaults to 1) to 0; this avoids using the
| 23 | +input object's \method{seek()} and \method{tell()} methods at all.
| 24 | +\end{classdesc} |
| 25 | + |
| 26 | +It will be useful to know that in MultiFile's view of the world, text |
| 27 | +is composed of three kinds of lines: data, section-dividers, and |
| 28 | +end-markers. MultiFile is designed to support parsing of |
| 29 | +messages that may have multiple nested message parts, each with its |
| 30 | +own pattern for section-divider and end-marker lines. |
| 31 | + |
| 32 | +\subsection{MultiFile Objects} |
| 33 | +\label{MultiFile-objects} |
| 34 | + |
| 35 | +A \class{MultiFile} instance has the following methods: |
| 36 | + |
| 37 | +\begin{methoddesc}{push}{str} |
| 38 | +Push a boundary string \var{str}. When an appropriately decorated version of
| 39 | +this boundary is found as an input line, it will be interpreted as a
| 40 | +section-divider or end-marker and passed back as EOF. All subsequent
| 41 | +reads will also be passed back as EOF, until a \method{pop()} removes
| 42 | +the boundary or a \method{next()} call re-enables it.
| 43 | + |
| 44 | +It is possible to push more than one boundary. Encountering the |
| 45 | +most-recently-pushed boundary will return EOF; encountering any other |
| 46 | +boundary will raise an error. |
| 47 | +\end{methoddesc} |
| 48 | + |
| 49 | +\begin{methoddesc}{readline}{}
| 50 | +Read a line. If the line is data (not a section-divider or end-marker
| 51 | +or real EOF), return it. If the line matches the most-recently-pushed
| 52 | +boundary, return EOF and set \member{last} to 1 if the match is an
| 53 | +end-marker, or to 0 if it is not. If the line matches any other
| 54 | +pushed boundary, raise an error. If the line is a real EOF, raise an
| 55 | +error unless all boundaries have been popped.
| 56 | +\end{methoddesc} |
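The boundary-stack rules for \method{push()}, \method{readline()}, \method{next()} and \method{pop()} are easier to follow in code than in prose, so the following Python sketch models them over a plain list of lines. It is not the \module{multifile} implementation: the class \class{MiniMultiFile} and its \exception{Error} exception are hypothetical names, the reader is non-seekable, and boundaries are hard-wired to the default MIME decoration instead of going through \method{section_divider()}, \method{end_marker()} and \method{is_data()}.

\begin{verbatim}
class Error(Exception):
    """Hypothetical exception for boundary and EOF errors."""


class MiniMultiFile:
    """Hypothetical, non-seekable sketch of the boundary-stack rules.

    Reads from any iterable of lines and uses the default MIME
    decoration ('--' + boundary and '--' + boundary + '--').
    """

    def __init__(self, lines):
        self.lines = iter(lines)
        self.stack = []  # pushed boundaries, most recent last
        self.level = 0   # > 0 while a boundary is being reported as EOF
        self.last = 0    # 1 if the last EOF was an end-marker

    def push(self, sep):
        if self.level > 0:
            raise Error("push() while stopped at a boundary")
        self.stack.append(sep)

    def readline(self):
        if self.level > 0:
            return ""  # keep reporting EOF until next() or pop()
        line = next(self.lines, "")
        if not line:  # real EOF
            if self.stack:
                raise Error("real EOF with boundaries still pushed")
            return ""
        marker = line.rstrip()  # trailing whitespace is ignored
        # Try the most recently pushed boundary first.
        for depth, sep in enumerate(reversed(self.stack), start=1):
            if marker == "--" + sep:
                self.last = 0
                break
            if marker == "--" + sep + "--":
                self.last = 1
                break
        else:
            return line  # ordinary data line
        self.level = depth
        if depth > 1:
            raise Error("matched a boundary other than the innermost one")
        return ""  # EOF for the current section

    def readlines(self):
        result = []
        line = self.readline()
        while line:
            result.append(line)
            line = self.readline()
        return result

    def next(self):
        """Skip to the next section; return 1, or 0 after an end-marker."""
        while self.readline():
            pass
        if self.level > 1 or self.last:
            return 0
        self.level = 0  # re-enable the innermost boundary
        return 1

    def pop(self):
        if not self.stack:
            raise Error("pop() with no boundary pushed")
        self.stack.pop()
        if self.level <= 1:
            self.last = 0
        self.level = max(0, self.level - 1)


msg = ["preamble\n", "--XYZ\n", "part one\n",
       "--XYZ\r\n", "part two\n", "--XYZ--\n"]
mf = MiniMultiFile(msg)
mf.push("XYZ")
print(mf.readlines())  # ['preamble\n']
print(mf.next())       # 1
print(mf.readlines())  # ['part one\n']
print(mf.next())       # 1
print(mf.readlines())  # ['part two\n']
print(mf.last)         # 1
\end{verbatim}

Because the sketch searches the stack from the most recently pushed boundary outward, meeting an outer boundary while an inner one is still pushed raises \exception{Error}, which mirrors the behaviour described for \method{readline()}.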
| 57 | + |
| 58 | +\begin{methoddesc}{readlines}{}
| 59 | +Read all lines, up to the next section. Return them as a list of strings.
| 60 | +\end{methoddesc} |
| 61 | + |
| 62 | +\begin{methoddesc}{read}{}
| 63 | +Read all lines, up to the next section. Return them as a single
| 64 | +(multiline) string. Unlike a file's \method{read()}, this method takes no size argument.
| 65 | +\end{methoddesc} |
| 66 | + |
| 67 | +\begin{methoddesc}{next}{}
| 68 | +Skip lines to the next section (that is, read lines until a |
| 69 | +section-divider or end-marker has been consumed). Return 1 if there |
| 70 | +is such a section, 0 if an end-marker is seen. Re-enable the |
| 71 | +most-recently-pushed boundary. |
| 72 | +\end{methoddesc} |
| 73 | + |
| 74 | +\begin{methoddesc}{pop}{}
| 75 | +Pop a section boundary. This boundary will no longer be interpreted as EOF. |
| 76 | +\end{methoddesc} |
| 77 | + |
| 78 | +\begin{methoddesc}{seek}{pos\optional{, whence}}
| 79 | +Seek within the current section. Positions are relative to the start of the current section.
| 80 | +The \var{pos} and \var{whence} arguments are interpreted as for a file's \method{seek()}.
| 81 | +\end{methoddesc} |
| 82 | + |
| 83 | +\begin{methoddesc}{tell}{}
| 84 | +Return the current position. Positions are relative to the start of the current section.
| 85 | +\end{methoddesc} |
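Section-relative positions amount to subtracting the offset at which the section starts. The Python sketch below illustrates only that arithmetic, for \var{whence} values 0 and 1, using a hypothetical \class{SectionView} wrapper around \class{io.StringIO} (whose positions are plain character offsets). It does not reproduce the module's own bookkeeping or error checks.

\begin{verbatim}
import io


class SectionView:
    """Hypothetical wrapper: seek()/tell() relative to a section's start."""

    def __init__(self, fp):
        self.fp = fp
        self.start = fp.tell()  # absolute offset of the section's first line

    def tell(self):
        # Position relative to the start of the section.
        return self.fp.tell() - self.start

    def seek(self, pos, whence=0):
        if whence == 0:    # from the start of the section
            self.fp.seek(self.start + pos)
        elif whence == 1:  # from the current position
            self.fp.seek(self.fp.tell() + pos)
        else:
            raise ValueError("this sketch supports only whence 0 and 1")


fp = io.StringIO("header\nbody line\n")
fp.readline()           # consume "header\n"; the section starts here
view = SectionView(fp)
print(view.tell())      # 0
view.seek(5)            # 5 characters into "body line\n"
print(fp.readline())    # line
\end{verbatim}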
| 86 | + |
| 87 | +\begin{methoddesc}{is_data}{str}
| 88 | +Return 1 if \var{str} is certainly data and 0 if it might be a section
| 89 | +boundary. As written, it returns 0 for any line beginning with \samp{-{}-}
| 90 | +(as all MIME boundaries do), but it is declared so it can be
| 91 | +overridden in derived classes.
| 92 | + |
| 93 | +Note that this test is intended as a fast guard for the real
| 94 | +boundary tests; if it always returns 0 it will merely slow processing, |
| 95 | +not cause it to fail. |
| 96 | +\end{methoddesc} |
| 97 | + |
| 98 | +\begin{methoddesc}{section_divider}{str} |
| 99 | +Turn a boundary into a section-divider line. By default, this |
| 100 | +method prepends \samp{-{}-} (which MIME section boundaries have) but it is
| 101 | +declared so it can be overridden in derived classes. This method |
| 102 | +need not append LF or CR-LF, as comparison with the result ignores |
| 103 | +trailing whitespace. |
| 104 | +\end{methoddesc} |
| 105 | + |
| 106 | +\begin{methoddesc}{end_marker}{str} |
| 107 | +Turn a boundary string into an end-marker line. By default, this |
| 108 | +method prepends and appends \samp{-{}-} (like a MIME-multipart
| 109 | +end-of-message marker) but it is declared so it can be overridden
| 110 | +in derived classes. This method need not append LF or CR-LF, as |
| 111 | +comparison with the result ignores trailing whitespace. |
| 112 | +\end{methoddesc} |
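Adapting the class to a non-MIME format comes down to overriding these hooks in a derived class. The following Python sketch models them with a hypothetical base class \class{BoundaryRules}, whose defaults follow the descriptions above, and a hypothetical subclass for an invented format in which sections are delimited by \samp{=== \var{name}} and closed by \samp{=== \var{name} END}. The \method{match()} helper, which applies \method{is_data()} as a fast guard and ignores trailing whitespace, is likewise an assumption of the sketch rather than part of the module.

\begin{verbatim}
class BoundaryRules:
    """Hypothetical stand-in for the overridable hooks described above."""

    def is_data(self, line):
        # Fast guard: MIME boundaries always start with '--'.
        return not line.startswith("--")

    def section_divider(self, sep):
        return "--" + sep

    def end_marker(self, sep):
        return "--" + sep + "--"

    def match(self, line, sep):
        """Return 0 for a section-divider, 1 for an end-marker, None for data."""
        if self.is_data(line):
            return None
        marker = line.rstrip()  # trailing whitespace and CR-LF are ignored
        if marker == self.section_divider(sep):
            return 0
        if marker == self.end_marker(sep):
            return 1
        return None


class BannerRules(BoundaryRules):
    """Hypothetical non-MIME format: '=== name' and '=== name END'."""

    def is_data(self, line):
        return not line.startswith("===")

    def section_divider(self, sep):
        return "=== " + sep

    def end_marker(self, sep):
        return "=== " + sep + " END"


rules = BannerRules()
print(rules.match("=== chapter\r\n", "chapter"))           # 0
print(rules.match("=== chapter END\n", "chapter"))         # 1
print(rules.match("-- not a boundary here\n", "chapter"))  # None
\end{verbatim}

Only the three hooks change in the subclass; the matching logic in \method{match()} is inherited unchanged, which is the point of declaring the hooks as separate methods.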
| 113 | + |
| 114 | +Finally, \class{MultiFile} instances have two public instance variables: |
| 115 | + |
| 116 | +\begin{memberdesc}{level} Nesting depth of the current part.
| 117 | +\end{memberdesc} |
| 118 | + |
| 119 | +\begin{memberdesc}{last} |
| 120 | +1 if the last EOF passed back was for an end-of-message marker, 0 otherwise. |
| 121 | +\end{memberdesc} |
| 122 | + |
| 123 | +Example: |
| 124 | + |
| 125 | +\begin{verbatim} |
| 126 | + fp = MultiFile(sys.stdin, 0) |
| 127 | + fp.push(outer_boundary) |
| 128 | + message1 = fp.readlines() |
| 129 | + # We should now be either at real EOF or stopped on a message |
| 130 | + # boundary. Re-enable the outer boundary. |
| 131 | + fp.next() |
| 132 | + # Read another message with the same delimiter |
| 133 | + message2 = fp.readlines() |
| 134 | + # Re-enable that delimiter again |
| 135 | + fp.next() |
| 136 | + # Now look for a message subpart with a different boundary |
| 137 | + fp.push(inner_boundary) |
| 138 | + sub_header = fp.readlines() |
| 139 | + # If no exception has been raised, we're looking at the start of
| 140 | + # the message subpart. Reset and grab the subpart.
| 141 | + fp.next() |
| 142 | + sub_body = fp.readlines() |
| 143 | + # Got it. Now pop the inner boundary to re-enable the outer one. |
| 144 | + fp.pop() |
| 145 | + # Read to next outer boundary |
| 146 | + message3 = fp.readlines() |
| 147 | +\end{verbatim} |