-
-The ``Cleaner`` class supports several keyword arguments to control exactly
-which content is removed:
-
-.. sourcecode:: pycon
-
- >>> from lxml.html.clean import Cleaner
-
- >>> cleaner = Cleaner(page_structure=False, links=False)
- >>> print(cleaner.clean_html(html))
-
-
-
-
-
-
- a link
- another link
- a paragraph
- secret EVIL!
- of EVIL!
- Password:
- annoying EVIL!
- spam spam SPAM!
-
-
-
-
- >>> cleaner = Cleaner(style=True, links=True, add_nofollow=True,
- ... page_structure=False, safe_attrs_only=False)
-
- >>> print(cleaner.clean_html(html))
-
-
-
-
- a link
- another link
- a paragraph
- secret EVIL!
- of EVIL!
- Password:
- annoying EVIL!
- spam spam SPAM!
-
-
-
-
-You can also whitelist some otherwise dangerous content with
-``Cleaner(host_whitelist=['www.youtube.com'])``, which would allow
-embedded media from YouTube, while still filtering out embedded media
-from other sites.
-
-See the docstring of ``Cleaner`` for the details of what can be
-cleaned.
-
-
-autolink
---------
-
-In addition to cleaning up malicious HTML, ``lxml.html.clean``
-provides functions that transform HTML in other ways. One of these
-is autolinking::
-
- autolink(doc, ...)
-
- autolink_html(html, ...)
-
-This finds anything that looks like a link (e.g.,
-``http://example.com``) in the *text* of an HTML document, and
-turns it into an anchor. It avoids making bad links.
-
-Links in the elements ``