This is a Totara-specific fork of the HTML5 Purifier definition.
This library provides HTML5 element definitions for HTML Purifier, compliant with the WHATWG spec.
It is the most complete HTML5-compliant solution among all those based on HTML Purifier. Apart from providing the most extensive set of element definitions, it provides tidy/sanitization rules for transforming the input into valid HTML5 output.
Install with Composer by running the following command:
composer require xemlock/htmlpurifier-html5
The most basic usage is similar to the original HTML Purifier. Create an HTML5-compatible config
using the HTMLPurifier_HTML5Config::createDefault() factory method, and then pass it to an HTMLPurifier instance:
$config = HTMLPurifier_HTML5Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html5 = $purifier->purify($dirty_html5);
To modify the config, you can either pass a configuration array to
HTMLPurifier_HTML5Config::create(), or call the set() method on an existing config instance.
For example, to allow IFRAMEs with YouTube videos, you can do the following:
$config = HTMLPurifier_HTML5Config::create(array(
  'HTML.SafeIframe' => true,
  'URI.SafeIframeRegexp' => '%^//www\.youtube\.com/embed/%',
));
or equivalently:
$config = HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^//www\.youtube\.com/embed/%');
Apart from HTML Purifier's built-in configuration directives, the following new directives are also supported:
- Attr.AllowedInputTypes
  Version added: 0.1.12
  Type: Lookup (or null)
  Default: null
  List of allowed input types, chosen from the types defined in the spec. By default, the setting is null, meaning there is no restriction on allowed types. An empty array means that no explicit type attributes are allowed, effectively making all inputs text inputs.
- HTML.Forms
  Version added: 0.1.12
  Type: Boolean
  Default: false
  Whether or not to permit form elements in the user input, regardless of the %HTML.Trusted value. Please be very careful when using this functionality, as enabling forms in untrusted documents may allow for phishing attacks.
- HTML.IframeAllowFullscreen
  Version added: 0.1.11
  Type: Boolean
  Default: false
  Whether or not to permit the allowfullscreen attribute on iframe tags. It requires either %HTML.SafeIframe or %HTML.Trusted to be true.
- HTML.Link
  Version added: 0.1.12
  Type: Boolean
  Default: false
  Permit link tags in the user input, regardless of the %HTML.Trusted value. This effectively allows link tags without allowing other untrusted elements. If enabled, URIs in link tags will not be matched against a whitelist specified in %URI.SafeLinkRegexp (unless %HTML.SafeLink is also enabled).
- HTML.SafeLink
  Version added: 0.1.12
  Type: Boolean
  Default: false
  Whether to permit link tags in untrusted documents. This directive must be accompanied by a whitelist of permitted URIs via %URI.SafeLinkRegexp, otherwise no link tags will be allowed.
- HTML.XHTML
  Version added: 0.1.12
  Type: Boolean
  Default: false
  While deprecated in the HTML 4.01 / XHTML 1.0 context, in HTML5 it is used to enable support for namespaced attributes and XML self-closing tags. When enabled, it causes the xml:lang attribute to take precedence over lang when both attributes are present on the same element.
- URI.SafeLinkRegexp
  Version added: 0.1.12
  Type: String
  Default: null
  A PCRE regular expression that will be matched against a <link> URI. This directive only has an effect if %HTML.SafeLink is enabled. Here is an example value:
  - %^https?://localhost/% - Allow localhost URIs
  Use Attr.AllowedRel to control permitted link relationship types.
Aside from the HTML elements originally supported by HTML Purifier, this library adds support for the following HTML5 elements:
<article>, <aside>, <audio>, <bdi>, <data>, <details>, <dialog>, <figcaption>, <figure>, <footer>, <header>, <hgroup>, <main>, <mark>, <nav>, <picture>, <progress>, <section>, <source>, <summary>, <time>, <track>, <video>, <wbr>
as well as HTML5 attributes added to existing HTML elements, such as:
<a>, <del>, <fieldset>, <ins>, <script>
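A quick way to check how these elements are handled is to run a small fragment through a default HTML5 config and inspect the result. This is a sketch only: the sample markup is made up for illustration, and the exact output depends on the configuration and library version, so it is printed rather than asserted.

```php
// Illustrative only: purify a fragment that uses several of the HTML5
// elements listed above with the default HTML5 config.
$config = HTMLPurifier_HTML5Config::createDefault();
$purifier = new HTMLPurifier($config);

$dirty_html5 = '<article>'
    . '<header><h1>Release notes</h1>'
    . '<time datetime="2020-01-01">1 January 2020</time></header>'
    . '<details><summary>Changes</summary>'
    . '<p>Fixed a <mark>bug</mark>.</p></details>'
    . '</article>';

// Print the purified markup to see which elements and attributes survive.
echo $purifier->purify($dirty_html5);
```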
The MIT License (MIT). See the LICENSE file.