Commit b7979c7

committed
Marc-Andre Lemburg <[email protected]>:
codecs module documentation, with some preliminary markup adjustments from FLD.
1 parent 9dc30bb commit b7979c7

1 file changed

Lines changed: 126 additions & 0 deletions

Doc/lib/libcodecs.tex

@@ -0,0 +1,126 @@
\section{\module{codecs} ---
Python codec registry and base classes}

\declaremodule{standard}{codecs}
\modulesynopsis{Encode and decode data and streams.}
\moduleauthor{Marc-Andre Lemburg}{[email protected]}
\sectionauthor{Marc-Andre Lemburg}{[email protected]}


\index{Unicode}
\index{Codecs}
\indexii{Codecs}{encode}
\indexii{Codecs}{decode}
\index{streams}
\indexii{stackable}{streams}

This module defines base classes for standard Python codecs (encoders
and decoders) and provides access to the internal Python codec
registry which manages the codec lookup process.

It defines the following functions:

\begin{funcdesc}{register}{search_function}
Register a codec search function. Search functions are expected to
take one argument, the encoding name in all lower case letters, and
return a tuple of functions \code{(\var{encoder}, \var{decoder},
\var{stream_reader}, \var{stream_writer})} taking the following
arguments:

\var{encoder} and \var{decoder}: These must be functions or methods
which have the same interface as the \method{encode()} and
\method{decode()} methods of Codec instances (see Codec Interface).
The functions or methods are expected to work in a stateless mode.

\var{stream_reader} and \var{stream_writer}: These must be
factory functions providing the following interface:

\code{factory(\var{stream}, \var{errors}='strict')}

The factory functions must return objects providing the interfaces
defined by the base classes \class{StreamReader} and
\class{StreamWriter}, respectively. Stream codecs can maintain state.

Possible values for \var{errors} are \code{'strict'} (raise an
exception in case of an encoding error), \code{'replace'} (replace
malformed data with a suitable replacement marker, such as
\code{'?'}) and \code{'ignore'} (ignore malformed data and continue
without further notice).

If a search function cannot find a given encoding, it should
return \code{None}.
\end{funcdesc}

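The following is a minimal sketch of a search function, not a reference
implementation. The names \code{demo_search} and \code{demoalias} are
hypothetical, the sketch simply reuses the functions of the existing
UTF-8 codec, and it assumes a current Python 3 interpreter (where
encoders return byte strings). It returns the plain four-tuple
described above.

```python
import codecs

# Borrow the four functions of an existing codec to keep the sketch short.
utf8_encode, utf8_decode, utf8_reader, utf8_writer = codecs.lookup("utf-8")


def demo_search(encoding):
    """Hypothetical search function that knows one made-up encoding name."""
    if encoding != "demoalias":
        return None  # not ours; the registry tries the next search function
    # (encoder, decoder, stream_reader, stream_writer)
    return (utf8_encode, utf8_decode, utf8_reader, utf8_writer)


codecs.register(demo_search)

# The registry lower-cases the name before calling the search function.
encoder, decoder, stream_reader, stream_writer = codecs.lookup("DemoAlias")

# Stateless encoders return the encoded data and the input length consumed.
encoded, consumed = encoder("caf\u00e9")
```

Returning \code{None} for every unknown name matters: it is what lets
several search functions coexist in the registry.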
\begin{funcdesc}{lookup}{encoding}
Look up a codec tuple in the Python codec registry and return the
function tuple as defined above.

Encodings are first looked up in the registry's cache. If not found,
the list of registered search functions is scanned. If no codec tuple
is found, a \exception{LookupError} is raised. Otherwise, the codec
tuple is stored in the cache and returned to the caller.
\end{funcdesc}

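As an illustration, the sketch below looks up a codec that ships with
Python, uses the returned encoder and decoder, and shows the failure
case. It assumes a current Python 3 interpreter; the unknown encoding
name is made up for the example.

```python
import codecs

# The result unpacks into the four functions described above.
encoder, decoder, stream_reader, stream_writer = codecs.lookup("latin-1")

data, chars_consumed = encoder("\u00fc")   # Unicode text -> encoded bytes
text, bytes_consumed = decoder(b"\xfc")    # encoded bytes -> Unicode text

# An encoding that no search function recognises raises LookupError.
try:
    codecs.lookup("no-such-encoding-demo")
    lookup_failed = False
except LookupError:
    lookup_failed = True
```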
To simplify working with encoded files or streams, the module
also defines these utility functions:

\begin{funcdesc}{open}{filename, mode\optional{, encoding=None, errors='strict', buffering=1}}
Open an encoded file using the given \var{mode} and return
a wrapped version providing transparent encoding/decoding.

Note: The wrapped version will only accept the object format defined
by the codecs, i.e.\ Unicode objects for most built-in codecs. Output is
also codec-dependent and will usually be Unicode as well.

\var{encoding} specifies the encoding which is to be used for the
file.

\var{errors} may be given to define the error handling. It defaults
to \code{'strict'}, which causes a \exception{ValueError} to be raised
if an encoding error occurs.

\var{buffering} has the same meaning as for the built-in
\function{open()} function. It defaults to line buffered.
\end{funcdesc}

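A short round trip may make the wrapping behaviour clearer. This is
only a sketch under some assumptions: it runs on a current Python 3
interpreter, the scratch file name is hypothetical, and recent Python
releases may steer users towards the built-in \function{open()}
instead.

```python
import codecs
import os
import tempfile

# Hypothetical scratch file used only for this example.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Writing: Unicode text goes in, UTF-8 encoded bytes land on disk.
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write("gr\u00fc\u00df dich\n")

# Reading: the bytes on disk come back as Unicode text.
with codecs.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# The underlying file really holds the encoded form.
with open(path, "rb") as raw:
    raw_bytes = raw.read()

# With errors='strict', data the codec cannot handle raises a ValueError
# (in practice a subclass of it).
try:
    with codecs.open(path, "r", encoding="ascii", errors="strict") as f:
        f.read()
    strict_failed = False
except ValueError:
    strict_failed = True

# With errors='ignore', the malformed bytes are silently dropped.
with codecs.open(path, "r", encoding="ascii", errors="ignore") as f:
    lossy_text = f.read()
```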
\begin{funcdesc}{EncodedFile}{file, input\optional{, output=None, errors='strict'}}
Return a wrapped version of \var{file} which provides transparent
encoding translation.

Strings written to the wrapped file are interpreted according to the
given \var{input} encoding and then written to the original file as
strings using the \var{output} encoding. The intermediate encoding will
usually be Unicode but depends on the specified codecs.

If \var{output} is not given, it defaults to \var{input}.

\var{errors} may be given to define the error handling. It defaults to
\code{'strict'}, which causes a \exception{ValueError} to be raised if
an encoding error occurs.
\end{funcdesc}

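The translation can be sketched with an in-memory file. The example
assumes a current Python 3 interpreter, where the encodings are passed
positionally (the parameter names there may differ from \var{input}
and \var{output}) and where the strings written and read are byte
strings.

```python
import codecs
import io

# Writing: Latin-1 data goes in, UTF-8 data reaches the underlying file.
raw = io.BytesIO()
wrapped = codecs.EncodedFile(raw, "latin-1", "utf-8")
wrapped.write(b"caf\xe9")          # e-acute as a single Latin-1 byte
stored = raw.getvalue()            # the same text, now UTF-8 encoded

# Reading goes the other way: UTF-8 in the file, Latin-1 handed back.
source = io.BytesIO(b"caf\xc3\xa9")
reader = codecs.EncodedFile(source, "latin-1", "utf-8")
recovered = reader.read()
```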
...XXX document codec base classes...

The module also provides the following constants, which are useful
for reading and writing platform-dependent files:

\begin{datadesc}{BOM}
\dataline{BOM_BE}
\dataline{BOM_LE}
\dataline{BOM32_BE}
\dataline{BOM32_LE}
\dataline{BOM64_BE}
\dataline{BOM64_LE}
These constants define the byte order marks (BOM) used in data
streams to indicate the byte order used in the stream or file.
\constant{BOM} is either \constant{BOM_BE} or \constant{BOM_LE}
depending on the platform's native byte order, while the others
represent big endian (\samp{_BE} suffix) and little endian
(\samp{_LE} suffix) byte order using 32-bit and 64-bit encodings.
\end{datadesc}
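One way these constants might be used is to guess the byte order of
incoming data from its first bytes. The helper
\code{sniff_byte_order} below is hypothetical and deliberately
simple: it only checks the two-byte marks \constant{BOM_BE} and
\constant{BOM_LE}, and it assumes a current Python 3 interpreter where
the constants are byte strings.

```python
import codecs
import sys

# BOM matches whichever of BOM_BE / BOM_LE is native on this platform.
native_bom = codecs.BOM_BE if sys.byteorder == "big" else codecs.BOM_LE
bom_is_native = (codecs.BOM == native_bom)


def sniff_byte_order(data):
    """Hypothetical helper: report the byte order named by a leading BOM."""
    if data.startswith(codecs.BOM_BE):
        return "big"
    if data.startswith(codecs.BOM_LE):
        return "little"
    return None  # no recognised byte order mark


big = sniff_byte_order(codecs.BOM_BE + b"\x00a")
little = sniff_byte_order(codecs.BOM_LE + b"a\x00")
unmarked = sniff_byte_order(b"plain bytes")
```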
