Commit 0c2d8b8

Fixing a note on encoding declaration, its usage in urlopen based on review
comments from RDM and Ezio.
1 parent 5e73a81 commit 0c2d8b8

1 file changed

Lines changed: 24 additions & 17 deletions

Doc/library/urllib.request.rst

@@ -1072,30 +1072,37 @@ HTTPErrorProcessor Objects
 Examples
 --------
 
-This example gets the python.org main page and displays the first 100 bytes of
+This example gets the python.org main page and displays the first 300 bytes of
 it. ::
 
    >>> import urllib.request
    >>> f = urllib.request.urlopen('http://www.python.org/')
-   >>> print(f.read(100))
-   b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
-   <?xml-stylesheet href="./css/ht2html'
-
-Note that in Python 3, urlopen returns a bytes object by default. In many
-circumstances, you might expect the output of urlopen to be a string. This
-might be a carried over expectation from Python 2, where urlopen returned
-string or it might even the common usecase. In those cases, you should
-explicitly decode the bytes to string.
-
-In the examples below, we have chosen *utf-8* encoding for demonstration, you
-might choose the encoding which is suitable for the webpage you are
-requesting::
+   >>> print(f.read(300))
+   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
+   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
+   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
+   <title>Python Programming '
+
+Note that urlopen returns a bytes object. This is because there is no way
+for urlopen to automatically determine the encoding of the byte stream
+it receives from the HTTP server. In general, a program will decode
+the returned bytes object to string once it determines or guesses
+the appropriate encoding.
+
+The following W3C document, http://www.w3.org/International/O-charset , lists
+the various ways in which an (X)HTML or an XML document could have specified its
+encoding information.
+
+As the python.org website uses *utf-8* encoding as specified in its meta tag, we
+will use the same for decoding the bytes object. ::
 
    >>> import urllib.request
    >>> f = urllib.request.urlopen('http://www.python.org/')
-   >>> print(f.read(100).decode('utf-8')
-   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
-   <?xml-stylesheet href="./css/ht2html
+   >>> print(f.read(100).decode('utf-8'))
+   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+   "http://www.w3.org/TR/xhtml1/DTD/xhtm
+
 
 In the following example, we are sending a data-stream to the stdin of a CGI
 and reading the data it returns to us. Note that this example will only work
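
The note added in this commit says a program decodes the returned bytes once it determines or guesses the encoding. One place a program might look, besides the document's own meta tag, is the charset parameter of the HTTP Content-Type header. The following is a minimal sketch of that idea, not part of the commit: `decode_body` and its `fallback` parameter are hypothetical names, and the headers object is built by hand so the example runs without network access. It relies on the fact that the `headers` attribute of a urlopen response is an `email.message.Message` subclass.

```python
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw bytes using the charset declared in Content-Type.

    Hypothetical helper for illustration; `fallback` is the guess used
    when the header declares no charset.
    """
    # get_content_charset() returns the lowercased charset parameter of
    # the Content-Type header, or `fallback` if none is declared.
    charset = headers.get_content_charset(fallback)
    # errors="replace" avoids UnicodeDecodeError on mislabelled pages.
    return raw.decode(charset, errors="replace")


# Offline stand-in for the headers of a urlopen() response.
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
print(decode_body("café".encode("latin-1"), headers))
```

With a live response `f` from `urllib.request.urlopen`, the equivalent call would be `decode_body(f.read(), f.headers)`. The header can be missing or wrong, so a program that needs to be robust may still have to inspect the document itself.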
