A Python module for writing pandoc filters
Pandoc filters are pipes that read a JSON serialization of the Pandoc AST from stdin, transform it in some way, and write it to stdout. They can be used with pandoc (>= 1.12) either using pipes:
pandoc -t json -s | ./caps.py | pandoc -f json
or using the --filter (or -F) command-line option:
pandoc --filter ./caps.py -s
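Stripped of any helper library, a filter is just a program that maps one JSON document on stdin to another on stdout. The sketch below uses only the standard library to illustrate that contract. It assumes the newer JSON layout, where the document body sits under a top-level "blocks" key, and it only removes horizontal rules at the top level. The names drop_rules and main are hypothetical and not part of pandocfilters.

```python
import json
import sys


def drop_rules(source):
    """Return the pandoc JSON string source with top-level horizontal rules removed."""
    doc = json.loads(source)
    # Assumes the newer layout: a dict whose "blocks" key holds the body.
    # Blocks nested inside lists, quotes, divs, etc. are not visited.
    doc["blocks"] = [b for b in doc["blocks"] if b.get("t") != "HorizontalRule"]
    return json.dumps(doc)


def main():
    # The whole filter contract: JSON in on stdin, JSON out on stdout.
    sys.stdout.write(drop_rules(sys.stdin.read()))
```

Saved as a script that ends with an if __name__ == "__main__": main() guard, it can be used with either of the invocations above in place of ./caps.py.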
For more on pandoc filters, see the pandoc documentation under --filter and the tutorial on writing filters.
Pandoc 1.16 added attributes to links and images, alongside their existing caption and target arguments. This required a change in pandocfilters that breaks backwards compatibility. Consequently, you should use:
- pandocfilters version <= 1.2.4 for pandoc versions 1.12--1.15, and
- pandocfilters version >= 1.3.0 for pandoc versions >= 1.16.
Pandoc 1.17.3 (pandoc-types 1.17.*) introduced a new JSON format. pandocfilters 1.4.0 should work with both the old and the new format.
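Because the two JSON layouts differ at the top level, code that handles raw pandoc JSON itself can tell them apart with a quick shape check. The helper below is hypothetical (split_document is not part of pandocfilters) and assumes the old layout is a two-element array, [{"unMeta": ...}, [blocks]], while the new one is an object with "pandoc-api-version", "meta" and "blocks" keys.

```python
import json


def split_document(source):
    """Return (meta, blocks) from a pandoc JSON string in either layout."""
    doc = json.loads(source)
    if isinstance(doc, dict) and "blocks" in doc:
        # New layout: metadata and blocks are named fields.
        return doc.get("meta", {}), doc["blocks"]
    if isinstance(doc, list) and len(doc) == 2:
        # Old layout: a two-element array; metadata is wrapped in "unMeta".
        return doc[0].get("unMeta", {}), doc[1]
    raise ValueError("unrecognized pandoc JSON layout")


old = '[{"unMeta": {}}, [{"t": "Para", "c": []}]]'
new = '{"pandoc-api-version": [1, 17, 0, 4], "meta": {}, "blocks": [{"t": "Para", "c": []}]}'
print(split_document(old) == split_document(new))  # True
```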
Run this inside the present directory:
python setup.py install
Or install from PyPI:
pip install pandocfilters
The main functions pandocfilters exports are:
- walk(x, action, format, meta)
  Walk a tree, applying an action to every object. Returns a modified tree. An action is a function of the form action(key, value, format, meta), where:
  - key is the type of the pandoc object (e.g. 'Str', 'Para')
  - value is the contents of the object (e.g. a string for 'Str', a list of inline elements for 'Para')
  - format is the target output format (as supplied by the format argument of walk)
  - meta is the document's metadata

  An action returns one of:
  - None: the object remains unchanged
  - a pandoc object: this replaces the original object
  - a list of pandoc objects: these replace the original object; the list is merged with the neighbors of the original object (spliced into the list the original object belongs to); returning an empty list deletes the object
- toJSONFilter(action)
  Like toJSONFilters, but takes a single action as argument.
- toJSONFilters(actions)
  Generate a JSON-to-JSON filter from stdin to stdout. The filter:
  - reads a JSON-formatted pandoc document from stdin
  - transforms it by walking the tree and performing the actions
  - writes a new JSON-formatted pandoc document to stdout

  The argument actions is a list of functions of the form action(key, value, format, meta), as described in more detail under walk.

  This function calls applyJSONFilters, with the format argument provided by the first command-line argument, if present. (Pandoc sets this by default when calling filters.)
- applyJSONFilters(actions, source, format="")
  Walk through a JSON structure and apply filters. This:
  - reads a JSON-formatted pandoc document from a source string
  - transforms it by walking the tree and performing the actions
  - returns a new JSON-formatted pandoc document as a string

  The actions argument is a list of functions (see walk for a full description). The argument source is a JSON object encoded as a string. The argument format is a string describing the output format. Returns the new JSON-formatted pandoc document.
- stringify(x)
  Walks the tree x and returns concatenated string content, leaving out all formatting.
- attributes(attrs)
  Returns an attribute list, constructed from the dictionary attrs.
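The walk rules above (None keeps an object, a single object replaces it, a list is spliced into the parent) can be illustrated with a simplified, stdlib-only re-implementation. This is a sketch written for this explanation, not pandocfilters' own source: it renames format to fmt to avoid shadowing the builtin, its stringify only understands Str and Space elements, and the action name unwrap_emph_drop_strong is hypothetical.

```python
def walk(x, action, fmt, meta):
    """Simplified sketch of the walk semantics described above."""
    if isinstance(x, list):
        out = []
        for item in x:
            if isinstance(item, dict) and "t" in item:
                res = action(item["t"], item.get("c"), fmt, meta)
                if res is None:
                    # None: keep the object, but still descend into it.
                    out.append(walk(item, action, fmt, meta))
                elif isinstance(res, list):
                    # A list is spliced into the parent; [] deletes the object.
                    out.extend(walk(r, action, fmt, meta) for r in res)
                else:
                    # A single object replaces the original.
                    out.append(walk(res, action, fmt, meta))
            else:
                out.append(walk(item, action, fmt, meta))
        return out
    if isinstance(x, dict):
        return {k: walk(v, action, fmt, meta) for k, v in x.items()}
    return x


def stringify(x):
    """Concatenate Str text, turning Space into ' '; formatting is dropped."""
    parts = []

    def collect(key, value, fmt, meta):
        if key == "Str":
            parts.append(value)
        elif key == "Space":
            parts.append(" ")
        return None  # keep walking into children

    walk(x, collect, "", {})
    return "".join(parts)


def unwrap_emph_drop_strong(key, value, fmt, meta):
    if key == "Emph":
        return value  # splice the emphasized inlines into the parent list
    if key == "Strong":
        return []     # delete strong text entirely
    return None       # leave everything else unchanged


inlines = [
    {"t": "Str", "c": "a"},
    {"t": "Space"},
    {"t": "Emph", "c": [{"t": "Str", "c": "b"}]},
    {"t": "Strong", "c": [{"t": "Str", "c": "c"}]},
]
result = walk(inlines, unwrap_emph_drop_strong, "html", {})
print(stringify(inlines))  # a bc
print(stringify(result))   # a b
```

Returning the Emph element's own value (its list of inlines) is what unwraps it: the list is spliced in where the Emph used to be.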
Most users will only need toJSONFilter. Here is a simple example of its use:
#!/usr/bin/env python

"""
Pandoc filter to convert all regular text to uppercase.
Code, link URLs, etc. are not affected.
"""

from pandocfilters import toJSONFilter, Str


def caps(key, value, format, meta):
    # Only Str elements (plain words) are changed; for every other
    # element the function returns None, leaving it unchanged.
    if key == 'Str':
        return Str(value.upper())


if __name__ == "__main__":
    toJSONFilter(caps)
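Because an action is an ordinary function, it can be checked directly without running pandoc. To keep the sketch runnable without pandocfilters installed, it swaps in a local stand-in for Str, assumed to build the same {"t": "Str", "c": text} dictionary that pandoc's JSON uses for Str elements.

```python
def Str(text):
    # Local stand-in for pandocfilters.Str (assumption: same output shape).
    return {"t": "Str", "c": text}


def caps(key, value, format, meta):
    # Same action as above: uppercase Str elements, leave the rest alone.
    if key == "Str":
        return Str(value.upper())


print(caps("Str", "hello", "html", {}))               # {'t': 'Str', 'c': 'HELLO'}
print(caps("Code", [["", [], []], "x"], "html", {}))  # None (unchanged)
```

In a real test suite you would import caps from your filter module, which uses pandocfilters' own Str.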
The examples subdirectory in the source repository contains the following filters. These filters should provide a useful starting point for developing your own pandoc filters.
- abc.py
- Pandoc filter to process code blocks with class abc containing ABC notation into images. Assumes that abcm2ps and ImageMagick's convert are in the path. Images are put in the abc-images directory.
- caps.py
- Pandoc filter to convert all regular text to uppercase. Code, link URLs, etc. are not affected.
- comments.py
- Pandoc filter that causes everything between <!-- BEGIN COMMENT --> and <!-- END COMMENT --> to be ignored. The comment lines must appear on lines by themselves, with blank lines surrounding them.
- deemph.py
- Pandoc filter that causes emphasized text to be displayed in ALL CAPS.
- deflists.py
- Pandoc filter to convert definition lists to bullet lists with the defined terms in strong emphasis (for compatibility with standard markdown).
- gabc.py
- Pandoc filter to convert code blocks with class "gabc" to LaTeX \gabcsnippet commands in LaTeX output, and to images in HTML output.
- graphviz.py
- Pandoc filter to process code blocks with class graphviz into graphviz-generated images.
- lilypond.py
- Pandoc filter to process code blocks with class "ly" containing Lilypond notation.
- metavars.py
- Pandoc filter to allow interpolation of metadata fields into a document. %{fields} will be replaced by the field's value, assuming it is of the type MetaInlines or MetaString.
- myemph.py
- Pandoc filter that causes emphasis to be rendered using the custom macro \myemph{...} rather than \emph{...} in LaTeX. Other output formats are unaffected.
- plantuml.py
- Pandoc filter to process code blocks with class plantuml to images. Needs plantuml.jar from http://plantuml.com/.
- theorem.py
- Pandoc filter to convert divs with class="theorem" to LaTeX theorem environments in LaTeX output, and to numbered theorems in HTML output.
- tikz.py
- Pandoc filter to process raw LaTeX tikz environments into images. Assumes that pdflatex is in the path, and that the standalone package is available. Also assumes that ImageMagick's convert is in the path. Images are put in the tikz-images directory.
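Filters such as comments.py rely on an action that keeps state between calls. The sketch below shows that idea in isolation; it is not the file's actual code. It assumes pandoc's markdown reader delivers each marker line as a RawBlock whose JSON value is a ["html", text] pair, and that the walk visits blocks in document order without descending into deleted ones, so a single flag is enough. The name make_comment_action is hypothetical.

```python
BEGIN = "<!-- BEGIN COMMENT -->"
END = "<!-- END COMMENT -->"


def make_comment_action():
    """Return an action that deletes every block between the two markers."""
    incomment = False

    def action(key, value, fmt, meta):
        nonlocal incomment
        if key == "RawBlock":
            raw_format, text = value
            if raw_format == "html" and text.strip() == BEGIN:
                incomment = True
                return []  # drop the marker itself
            if raw_format == "html" and text.strip() == END:
                incomment = False
                return []
        if incomment:
            return []  # inside a comment: delete this element
        return None    # outside: leave it unchanged

    return action


# Imitate a walk over top-level blocks only; a real filter would pass
# the action to toJSONFilter instead.
blocks = [
    {"t": "Para", "c": [{"t": "Str", "c": "kept"}]},
    {"t": "RawBlock", "c": ["html", BEGIN]},
    {"t": "Para", "c": [{"t": "Str", "c": "hidden"}]},
    {"t": "RawBlock", "c": ["html", END]},
    {"t": "Para", "c": [{"t": "Str", "c": "again"}]},
]
action = make_comment_action()
kept = [b for b in blocks if action(b["t"], b.get("c"), "html", {}) is None]
print([b["c"][0]["c"] for b in kept])  # ['kept', 'again']
```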