     >>> body = fromstring(tag_soup).find('.//body')
     >>> body.text
-    u'\xa9\u20ac-\xf5\u01bd'
+    '\xa9\u20ac-\xf5\u01bd'
 
 If you want them back on the way out, you can just serialise with the
 default encoding, which is 'US-ASCII'.
@@ -139,10 +139,10 @@ Any other encoding will output the respective byte sequences.
     '\xc2\xa9\xe2\x82\xac-\xc3\xb5\xc6\xbd'
 
     >>> tostring(body, encoding='unicode')
-    u'\xa9\u20ac-\xf5\u01bd'
+    '\xa9\u20ac-\xf5\u01bd'
 
     >>> tostring(body, method="html", encoding='unicode')
-    u'\xa9\u20ac-\xf5\u01bd'
+    '\xa9\u20ac-\xf5\u01bd'
 
 
 Using soupparser as a fallback
diff --git a/doc/lxmlhtml.txt b/doc/lxmlhtml.txt
index fa9bf1bc7..d07eacb7e 100644
--- a/doc/lxmlhtml.txt
+++ b/doc/lxmlhtml.txt
@@ -433,7 +433,7 @@ You can, for instance, do:
     ...            name='John Smith',
     ...            phone='555-555-3949',
     ...            interest=set(['cats', 'llamas']))
-    >>> print tostring(form)
+    >>> print(tostring(form))
-    ...
-    ... spam spam SPAM!
-    ...a paragraph
-a paragraph
-a paragraph
-