This library provides HTML5 element definitions for HTML Purifier, compliant with the WHATWG spec.
It is the most complete HTML5-compliant solution among those based on HTML Purifier. Besides providing the most extensive set of element definitions, it provides tidy/sanitization rules that transform the input into valid HTML5 output.
Install with Composer by running the following command:
composer require xemlock/htmlpurifier-html5
The most basic usage is similar to that of the original HTML Purifier. Create an HTML5-compatible config
using the HTMLPurifier_HTML5Config::createDefault() factory method, and then pass it to an HTMLPurifier instance:
$config = HTMLPurifier_HTML5Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html5 = $purifier->purify($dirty_html5);
To modify the config you can either create it from a configuration array passed to
HTMLPurifier_HTML5Config::create(), or call the set method on an already existing config instance.
For example, to allow IFRAMEs with YouTube videos you can do the following:
$config = HTMLPurifier_HTML5Config::create(array(
    'HTML.SafeIframe' => true,
    'URI.SafeIframeRegexp' => '%^//www\.youtube\.com/embed/%',
));
or equivalently:
$config = HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^//www\.youtube\.com/embed/%');
Apart from HTML Purifier's built-in configuration directives, the following new directives are also supported:
-
Attr.AllowedInputTypes
Version added: 0.1.12
Type: Lookup (or null)
Default: null
List of allowed input types, chosen from the types defined in the spec. By default, the setting is null, meaning there is no restriction on allowed types. An empty array means that no explicit type attributes are allowed, effectively making all inputs text inputs.
-
HTML.Forms
Version added: 0.1.12
Type: Boolean
Default: false
Whether or not to permit form elements in the user input, regardless of the %HTML.Trusted value. Please be very careful when using this functionality, as enabling forms in untrusted documents may allow for phishing attacks.
-
HTML.IframeAllowFullscreen
Version added: 0.1.11
Type: Boolean
Default: false
Whether or not to permit the allowfullscreen attribute on iframe tags. It requires either %HTML.SafeIframe or %HTML.Trusted to be true.
-
HTML.Link
Version added: 0.1.12
Type: Boolean
Default: false
Permit link tags in the user input, regardless of the %HTML.Trusted value. This effectively allows link tags without allowing other untrusted elements.
If enabled, URIs in link tags will not be matched against the whitelist specified in %URI.SafeLinkRegexp (unless %HTML.SafeLink is also enabled).
-
HTML.SafeLink
Version added: 0.1.12
Type: Boolean
Default: false
Whether to permit link tags in untrusted documents. This directive must be accompanied by a whitelist of permitted URIs via %URI.SafeLinkRegexp, otherwise no link tags will be allowed.
-
HTML.XHTML
Version added: 0.1.12
Type: Boolean
Default: false
While deprecated in the HTML 4.01 / XHTML 1.0 context, in HTML5 this directive is used for enabling support for namespaced attributes and XML self-closing tags.
When enabled, it causes the xml:lang attribute to take precedence over lang when both attributes are present on the same element.
-
URI.SafeLinkRegexp
Version added: 0.1.12
Type: String
Default: null
A PCRE regular expression that will be matched against a <link> URI. This directive only has an effect if %HTML.SafeLink is enabled. Here are some example values:
%^https?://localhost/% - Allow localhost URIs
Use Attr.AllowedRel to control permitted link relationship types.
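As an illustration of how the directives above might be combined, here is a minimal sketch. It assumes the package was installed with Composer and that the script sits next to the vendor/ directory. The input markup, the list of allowed input types and the allowed rel value are hypothetical, chosen only for demonstration.

```php
<?php
// Assumption: Composer's autoloader lives in ./vendor relative to this script.
require __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_HTML5Config::create(array(
    // Permit form elements without enabling HTML.Trusted.
    'HTML.Forms' => true,
    // Restrict explicit <input type="..."> values to this (hypothetical) list.
    'Attr.AllowedInputTypes' => array('text', 'email', 'checkbox'),
    // Permit <link> tags, but only for URIs matching the whitelist below.
    'HTML.SafeLink' => true,
    'URI.SafeLinkRegexp' => '%^https?://localhost/%',
    // Built-in HTML Purifier directive controlling permitted rel values.
    'Attr.AllowedRel' => array('stylesheet'),
    // allowfullscreen requires HTML.SafeIframe (or HTML.Trusted) to be true.
    'HTML.SafeIframe' => true,
    'URI.SafeIframeRegexp' => '%^//www\.youtube\.com/embed/%',
    'HTML.IframeAllowFullscreen' => true,
));

$purifier = new HTMLPurifier($config);

// Hypothetical untrusted input, for illustration only.
$dirty_html5 = '<form action="/subscribe" method="post">'
    . '<input type="email" name="email">'
    . '<input type="file" name="upload">'
    . '</form>'
    . '<link rel="stylesheet" href="http://localhost/style.css">';

echo $purifier->purify($dirty_html5);
```

Enabling HTML.Forms on untrusted input carries the phishing risk noted above, so a configuration like this is probably best limited to content from authors you already partly trust.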
Aside from HTML elements supported originally by HTML Purifier, this library adds support for the following HTML5 elements:
<article>, <aside>, <audio>, <bdi>, <data>, <details>, <dialog>, <figcaption>, <figure>, <footer>, <header>, <hgroup>, <main>, <mark>, <nav>, <picture>, <progress>, <section>, <source>, <summary>, <time>, <track>, <video>, <wbr>
as well as HTML5 attributes added to existing HTML elements, such as:
<a>, <del>, <fieldset>, <ins>, <script>
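To see some of these elements pass through the purifier, a short sketch along the following lines can be used. It assumes a Composer install with the autoloader in ./vendor, and the input markup is hypothetical.

```php
<?php
// Assumption: Composer's autoloader lives in ./vendor relative to this script.
require __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_HTML5Config::createDefault();
$purifier = new HTMLPurifier($config);

// Hypothetical input using several of the HTML5 elements listed above.
$dirty_html5 = '<article>'
    . '<header><h1>Title</h1></header>'
    . '<figure><img src="photo.jpg" alt="Photo"><figcaption>A caption</figcaption></figure>'
    . '<p>Published <time datetime="2020-01-01">on New Year\'s Day</time>, '
    . '<mark>highlighted</mark>.</p>'
    . '</article>';

echo $purifier->purify($dirty_html5);
```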
The MIT License (MIT). See the LICENSE file.