Commit 7b632a6

Just another intermediate version...
1 parent 1c462ad commit 7b632a6

2 files changed

Lines changed: 178 additions & 78 deletions

Doc/ref.tex

Lines changed: 89 additions & 39 deletions
@@ -1,5 +1,5 @@
11
% Format this file with latex.
2-
2+
33
\documentstyle[myformat]{report}
44

55
\title{\bf
@@ -65,17 +65,18 @@ \chapter{Introduction}
6565
lexical analysis. This should make the document better understandable
6666
to the average reader, but will leave room for ambiguities.
6767
Consequently, if you were coming from Mars and tried to re-implement
68-
Python from this document alone, you might in fact be implementing
69-
quite a different language. On the other hand, if you are using
68+
Python from this document alone, you might have to guess things and in
69+
fact you would be implementing quite a different language.
70+
On the other hand, if you are using
7071
Python and wonder what the precise rules about a particular area of
71-
the language are, you should be able to find it here.
72+
the language are, you should definitely be able to find it here.
7273

7374
It is dangerous to add too many implementation details to a language
7475
reference document -- the implementation may change, and other
7576
implementations of the same language may work differently. On the
7677
other hand, there is currently only one Python implementation, and
77-
particular quirks of it are sometimes worth mentioning, especially
78-
where it differs from the ``ideal'' specification.
78+
its particular quirks are sometimes worth mentioning, especially
79+
where the implementation imposes additional limitations.
7980

8081
Every Python implementation comes with a number of built-in and
8182
standard modules. These are not documented here, but in the separate
@@ -93,20 +94,20 @@ \section{Notation}
9394
lcletter: "a"..."z"
9495
\end{verbatim}
9596

96-
The first line says that a \verb\name\ is a \verb\lcletter\ followed by
97-
a sequence of zero or more \verb\lcletter\s and underscores. A
97+
The first line says that a \verb\name\ is an \verb\lcletter\ followed by
98+
a sequence of zero or more \verb\lcletter\s and underscores. An
9899
\verb\lcletter\ in turn is any of the single characters `a' through `z'.
99100
(This rule is actually adhered to for the names defined in syntax and
100101
grammar rules in this document.)
101102

102103
Each rule begins with a name (which is the name defined by the rule)
103-
followed by a colon. Each rule is wholly contained on one line. A
104-
vertical bar (\verb\|\) is used to separate alternatives, it is the
105-
least binding operator in this notation. A star (\verb\*\) means zero
106-
or more repetitions of the preceding item; likewise, a plus (\verb\+\)
107-
means one or more repetitions and a question mark (\verb\?\) zero or
108-
one (in other words, the preceding item is optional). These three
109-
operators bind as tight as possible; parentheses are used for
104+
and a colon, and is wholly contained on one line. A vertical bar
105+
(\verb\|\) is used to separate alternatives; it is the least binding
106+
operator in this notation. A star (\verb\*\) means zero or more
107+
repetitions of the preceding item; likewise, a plus (\verb\+\) means
108+
one or more repetitions, and a question mark (\verb\?\) zero or one
109+
(in other words, the preceding item is optional). These three
110+
operators bind as tightly as possible; parentheses are used for
110111
grouping. Literal strings are enclosed in double quotes. White space
111112
is only meaningful to separate tokens.
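
As an informal illustration of this notation, the rule described in the prose above (a name is an lcletter followed by zero or more lcletters and underscores) can be mirrored by a regular expression. This is only a sketch in present-day Python: the grammar line in the comment is reconstructed from the prose, and the names `NAME` and `is_name` are hypothetical helpers, not part of the reference.

```python
import re

# Reconstructed from the prose (an assumption, not quoted from the file):
#   name:     lcletter (lcletter | "_")*
#   lcletter: "a"..."z"
# The star binds as tightly as possible, so it applies to the whole
# parenthesized group, which here becomes a character class.
NAME = re.compile(r"[a-z][a-z_]*")


def is_name(text):
    """Return True if the entire string matches the `name` rule."""
    return NAME.fullmatch(text) is not None
```

For example, `is_name("lc_letter")` holds, while a string starting with an underscore or an uppercase letter is rejected because the first character must be an lcletter.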
112113

@@ -117,7 +118,7 @@ \section{Notation}
117118
informal description of the symbol defined; e.g., this could be used
118119
to describe the notion of `control character' if needed.
119120

120-
Although the notation used is almost the same, there is a big
121+
Even though the notation used is almost the same, there is a big
121122
difference between the meaning of lexical and syntactic definitions:
122123
a lexical definition operates on the individual characters of the
123124
input source, while a syntax definition operates on the stream of
@@ -131,22 +132,22 @@ \chapter{Lexical analysis}
131132

132133
\section{Line structure}
133134

134-
A Python program is divided in a number of logical lines. Statements
135-
do not straddle logical line boundaries except where explicitly
136-
indicated by the syntax (i.e., for compound statements). To this
137-
purpose, the end of a logical line is represented by the token
138-
NEWLINE.
135+
A Python program is divided into a number of logical lines.  The end of
136+
a logical line is represented by the token NEWLINE. Statements cannot
137+
cross logical line boundaries except where NEWLINE is allowed by the
138+
syntax (e.g., between statements in compound statements).
139139

140140
\subsection{Comments}
141141

142142
A comment starts with a hash character (\verb\#\) that is not part of
143-
a string literal, and ends at the end of the physical line. Comments
144-
are ignored by the syntax.
143+
a string literal, and ends at the end of the physical line. A comment
144+
always signifies the end of the logical line. Comments are ignored by
145+
the syntax.
145146

146147
\subsection{Line joining}
147148

148149
Two or more physical lines may be joined into logical lines using
149-
backslash characters (\verb/\/), as follows: When physical line ends
150+
backslash characters (\verb/\/), as follows: when a physical line ends
150151
in a backslash that is not part of a string literal or comment, it is
151152
joined with the following forming a single logical line, deleting the
152153
backslash and the following end-of-line character.
@@ -160,13 +161,14 @@ \subsection{Blank lines}
160161

161162
\subsection{Indentation}
162163

163-
Spaces and tabs at the beginning of a logical line are used to compute
164-
the indentation level of the line, which in turn is used to determine
165-
the grouping of statements.
164+
Leading whitespace (spaces and tabs) at the beginning of a logical
165+
line is used to compute the indentation level of the line, which in
166+
turn is used to determine the grouping of statements.
166167

167-
First, each tab is replaced by one to eight spaces such that the total
168-
number of spaces up to that point is a multiple of eight. The total
169-
number of spaces preceding the first non-blank character then
168+
First, tabs are replaced (from left to right) by one to eight spaces
169+
such that the total number of characters up to that point is a multiple of
170+
eight (this is intended to be the same rule as used by UNIX). The
171+
total number of spaces preceding the first non-blank character then
170172
determines the line's indentation. Indentation cannot be split over
171173
multiple physical lines using backslashes.
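
The tab rule in this hunk can be sketched in a few lines of Python. The function name `indentation_width` and the `tab_size` parameter are illustrative assumptions. The text only fixes the tab stop at eight columns.

```python
def indentation_width(line, tab_size=8):
    """Count the leading columns of a line, expanding tabs left to right."""
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            # A tab contributes one to eight columns: just enough to bring
            # the running total up to the next multiple of tab_size.
            width += tab_size - (width % tab_size)
        else:
            # First non-blank character: the indentation ends here.
            break
    return width
```

Under this sketch two spaces followed by a tab yield the same indentation as a single tab, since both end at column eight.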
172174

@@ -185,6 +187,38 @@ \subsection{Indentation}
185187
generated. At the end of the file, a DEDENT token is generated for
186188
each number remaining on the stack that is larger than zero.
187189

190+
Here is an example of a correctly (though confusingly) indented piece
191+
of Python code:
192+
193+
\begin{verbatim}
194+
def perm(l):
195+
if len(l) <= 1:
196+
return [l]
197+
r = []
198+
for i in range(len(l)):
199+
s = l[:i] + l[i+1:]
200+
p = perm(s)
201+
for x in p:
202+
r.append(l[i:i+1] + x)
203+
return r
204+
\end{verbatim}
205+
206+
The following example shows various indentation errors:
207+
208+
\begin{verbatim}
209+
def perm(l): # error: first line indented
210+
for i in range(len(l)): # error: not indented
211+
s = l[:i] + l[i+1:]
212+
p = perm(l[:i] + l[i+1:]) # error: unexpected indent
213+
for x in p:
214+
r.append(l[i:i+1] + x)
215+
return r # error: inconsistent indent
216+
\end{verbatim}
217+
218+
(Actually, the first three errors are detected by the parser; only the
219+
last error is found by the lexical analyzer -- the indentation of
220+
\verb\return r\ does not match a level popped off the stack.)
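
The stack discipline referred to here can be sketched as follows. Only the end-of-file rule and the "does not match a level popped off the stack" error are visible in this hunk. The rest (a stack that starts with a single zero, a push on deeper indentation, pops on shallower indentation) is assumed from the usual formulation of the algorithm. The function `indent_tokens` and the placeholder token `"LINE"` are hypothetical.

```python
def indent_tokens(widths):
    """Map per-line indentation widths to INDENT/DEDENT tokens.

    `widths` holds one indentation width per logical line; "LINE" stands in
    for the remaining tokens of that line.
    """
    stack = [0]  # assumed initial state: a single zero
    tokens = []
    for width in widths:
        if width > stack[-1]:
            # Deeper than the current level: push it and emit one INDENT.
            stack.append(width)
            tokens.append("INDENT")
        else:
            # Shallower: pop larger levels, one DEDENT per popped number.
            while width < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
            if width != stack[-1]:
                # The width matches no level on the stack.
                raise ValueError("inconsistent dedent at width %d" % width)
        tokens.append("LINE")
    # End of file: one DEDENT for each remaining number larger than zero.
    tokens.extend("DEDENT" for level in stack if level > 0)
    return tokens
```

With widths `[0, 8, 4]` the sketch raises an error on the third line, which is the kind of mismatch the lexical analyzer reports. The other errors in the example above would pass through this stage and be left to the parser.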
221+
188222
\section{Other tokens}
189223

190224
Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
@@ -205,12 +239,13 @@ \section{Identifiers}
205239
digit: "0"..."9"
206240
\end{verbatim}
207241

208-
Identifiers are unlimited in length. Case is significant.
242+
Identifiers are unlimited in length. Case is significant. Keywords
243+
are not identifiers.
209244

210245
\section{Keywords}
211246

212247
The following identifiers are used as reserved words, or {\em
213-
keywords} of the language, and may not be used as ordinary
248+
keywords} of the language, and cannot be used as ordinary
214249
identifiers. They must be spelled exactly as written here:
215250

216251
\begin{verbatim}
@@ -260,7 +295,7 @@ \subsection{String literals}
260295
\verb/\'/ & Single quote (\verb/'/) \\
261296
\verb/\a/ & ASCII Bell (BEL) \\
262297
\verb/\b/ & ASCII Backspace (BS) \\
263-
\verb/\E/ & ASCII Escape (ESC) \\
298+
%\verb/\E/ & ASCII Escape (ESC) \\
264299
\verb/\f/ & ASCII Formfeed (FF) \\
265300
\verb/\n/ & ASCII Linefeed (LF) \\
266301
\verb/\r/ & ASCII Carriage Return (CR) \\
@@ -272,13 +307,13 @@ \subsection{String literals}
272307
\end{tabular}
273308
\end{center}
274309

275-
For compatibility with in Standard C, up to three octal digits are
310+
In strict compatibility with Standard C, up to three octal digits are
276311
accepted, but an unlimited number of hex digits is taken to be part of
277312
the hex escape (and then the lower 8 bits of the resulting hex number
278-
are used...).
313+
are used in all current implementations...).
279314

280-
All unrecognized escape sequences are left in the string {\em
281-
unchanged}, i.e., the backslash is left in the string. (This rule is
315+
All unrecognized escape sequences are left in the string unchanged,
316+
i.e., {\em the backslash is left in the string.} (This rule is
282317
useful when debugging: if an escape sequence is mistyped, the
283318
resulting output is more easily recognized as broken. It also helps a
284319
great deal for string literals used as regular expressions or
@@ -313,6 +348,18 @@ \subsection{Numeric literals}
313348
exponent: ("e"|"E") ["+"|"-"] digit+
314349
\end{verbatim}
315350

351+
Some examples of numeric literals:
352+
353+
\begin{verbatim}
354+
1 1234567890 0177777 0x80000
355+
356+
357+
\end{verbatim}
358+
359+
Note that the definitions for literals do not include a sign; a phrase
360+
like \verb\-1\ is actually an expression composed of the operator
361+
\verb\-\ and the literal \verb\1\.
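
The point about signs can be checked against a present-day CPython, assuming Python 3.8 or later and its standard `ast` module (which postdates this document). The variable names are illustrative only.

```python
import ast

# Parse "-1" as an expression and inspect the resulting syntax tree.
node = ast.parse("-1", mode="eval").body

# The top-level node is a unary operation, not a numeric constant ...
is_unary_minus = isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub)

# ... and its operand is the unsigned literal 1.
operand_value = node.operand.value
```

Here `is_unary_minus` is true and `operand_value` is `1`, matching the statement that the literal itself carries no sign.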
362+
316363
\section{Operators}
317364

318365
The following tokens are operators:
@@ -336,13 +383,16 @@ \section{Delimiters}
336383
; , : . ` =
337384
\end{verbatim}
338385

339-
The following printing ASCII characters are currently not used;
340-
their occurrence is an unconditional error:
386+
The following printing ASCII characters are not used in Python (except
387+
in string literals and in comments). Their occurrence is an
388+
unconditional error:
341389

342390
\begin{verbatim}
343391
! @ $ " ?
344392
\end{verbatim}
345393

394+
They may be used by future versions of the language though!
395+
346396
\chapter{Execution model}
347397

348398
(XXX This chapter should explain the general model of the execution of
