Query a document tree with selectors
Extracts nodes using a selector syntax that is a subset of the CSS selectors specification.
npm i mkql --save
For the command line interface, install mkdoc globally (npm i -g mkdoc).
- Install
- Usage
- Example
- Selectors
- Help
- API
- License
Pass selectors when creating the stream:
var ql = require('mkql')
, ast = require('mkast');
ast.src('Paragraph\n\n* 1\n* 2\n* 3\n\n```javascript\nvar foo;\n```')
.pipe(ql('p, ul, pre[info^=javascript]'))
.pipe(ast.stringify({indent: 2}))
.pipe(process.stdout);
mkcat README.md | mkql 'p, ul, pre[info^=javascript]' | mkout
printf 'Para 1\n\nPara 2\n\n* List item\n\n' | mkcat | mkql '*' | mkout -y
Implemented selectors work like their CSS counterparts, and in some cases extensions have been added that are specific to markdown tree nodes.
Types are based on the equivalent HTML element name, so to select a node of paragraph type use p; the universal selector * will select nodes of any type.
The map of standard HTML tag names to node types is:
p: paragraph
ul: list
ol: list
li: item
h1-h6: heading
pre: code_block
blockquote: block_quote
hr: thematic_break
code: code
em: emph
strong: strong
a: link
br: linebreak
img: image
Extensions for markdown specific types:
nl: softbreak
text: text
html: html_block
inline: html_inline
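The type map can be pictured as a plain lookup table. The sketch below is an illustration, not mkql's source: TYPE_MAP and matchesType are hypothetical names, and it skips the extra checks a real matcher needs (ul and ol share the list type and differ only in whether the list is ordered; h1-h6 differ by heading level).

```javascript
// Hypothetical lookup table from selector type names to node types,
// transcribed from the two lists above (not mkql's actual source).
const TYPE_MAP = {
  // Standard HTML tag names
  p: 'paragraph', ul: 'list', ol: 'list', li: 'item',
  h1: 'heading', h2: 'heading', h3: 'heading',
  h4: 'heading', h5: 'heading', h6: 'heading',
  pre: 'code_block', blockquote: 'block_quote', hr: 'thematic_break',
  code: 'code', em: 'emph', strong: 'strong', a: 'link',
  br: 'linebreak', img: 'image',
  // Markdown-specific extensions
  nl: 'softbreak', text: 'text', html: 'html_block', inline: 'html_inline'
};

// Returns true when a node type satisfies a type selector.
// Note: ul/ol and h1-h6 would need further checks (list kind, heading
// level) in a real matcher; this sketch compares node types only.
function matchesType(selector, nodeType) {
  if (selector === '*') return true; // universal selector matches any type
  return TYPE_MAP[selector] === nodeType;
}

console.log(matchesType('em', 'emph')); // true
console.log(matchesType('*', 'image')); // true
console.log(matchesType('p', 'heading')); // false
```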
Use whitespace for a descendant combinator or, if you prefer, the explicit >> notation from CSS4:
ol li
ol >> li
A selector such as ol li finds all descendants; use the child combinator when you only want direct children:
ol > li
The adjacent sibling combinator is supported; select all lists that are directly preceded by a paragraph:
p + ul
The following sibling combinator is supported; select code that is preceded by a text node:
p text ~ code
You can match on attributes in the usual way, but attributes are matched against tree nodes rather than HTML elements, so the attribute names often differ:
a[href^=http://domain.com]
See attribute selectors (MDN) for more information on the available operators.
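For reference, the standard CSS attribute operators described on MDN can be sketched over a node modelled as a plain object whose keys are the attribute names. matchAttribute is a hypothetical helper, not mkql code; the examples here only use =, ^=, $= and ~=, so whether mkql also accepts |= and *= is an assumption worth checking.

```javascript
// Sketch of standard CSS attribute-operator semantics (per MDN), applied
// to a plain object standing in for a tree node. Hypothetical helper.
function matchAttribute(node, name, op, value) {
  const actual = node[name];
  // [attr] with no operator only tests that the attribute is present
  if (op === undefined) return actual !== undefined && actual !== null;
  if (typeof actual !== 'string') return false;
  switch (op) {
    case '=': // exact match
      return actual === value;
    case '~=': // one word in a whitespace-separated list
      return value !== '' && !/\s/.test(value) && actual.split(/\s+/).includes(value);
    case '|=': // exact match, or prefix followed by a hyphen
      return actual === value || actual.startsWith(value + '-');
    case '^=': // prefix
      return value !== '' && actual.startsWith(value);
    case '$=': // suffix
      return value !== '' && actual.endsWith(value);
    case '*=': // substring
      return value !== '' && actual.includes(value);
    default:
      throw new Error('Unsupported operator: ' + op);
  }
}

// Stand-in for a link node; real node property names may differ.
const link = { href: 'http://domain.com/page', title: 'Example link' };
console.log(matchAttribute(link, 'href', '^=', 'http://domain.com')); // true
console.log(matchAttribute(link, 'title', '~=', 'link')); // true
console.log(matchAttribute(link, 'href', '$=', '.jpg')); // false
```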
The operator =~ (not to be confused with ~=) is a non-standard operator that matches against a regular expression pattern:
img[src=~\.(png|jpg)$]
For all nodes that have a literal property you may match on the literal attribute:
p text[literal~=example]
Nodes that have a literal property include:
pre: code_block
code: code
text: text
html: html_block
inline: html_inline
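The =~ and ~= operators look alike but behave differently on a literal. A minimal sketch, assuming =~ tests the value with a JavaScript RegExp (the regex flavour mkql uses is not stated here); matchesRegex and containsWord are hypothetical helpers.

```javascript
// =~ (non-standard): true when the value matches a regular expression.
// Assumption: the pattern is compiled with JavaScript's RegExp.
function matchesRegex(actual, pattern) {
  return typeof actual === 'string' && new RegExp(pattern).test(actual);
}

// ~= (standard CSS): true when the value contains the exact word,
// treating the value as a whitespace-separated list.
function containsWord(actual, word) {
  return typeof actual === 'string' && actual.split(/\s+/).includes(word);
}

const image = { type: 'image', src: 'photos/cat.png' };
const text = { type: 'text', literal: 'an example sentence' };

// img[src=~\.(png|jpg)$] (the backslash is doubled inside a JS string)
console.log(matchesRegex(image.src, '\\.(png|jpg)$')); // true
// text[literal~=example] matches the whole word...
console.log(containsWord(text.literal, 'example')); // true
// ...but not a fragment, whereas a regex matches anywhere in the value
console.log(containsWord(text.literal, 'exam')); // false
console.log(matchesRegex(text.literal, 'exam')); // true
```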
The content attribute is available for containers that can contain text nodes. This is a more powerful (but slower) method to match on the text content.
Consider the document:
Paragraph with some *emphasis* and *italic*.
If we select on the literal attribute we get a text node, for example:
p [literal^=emph]
This results in the child text node with a literal value of emphasis. Often we want to match the parent element instead; to do so, use the content attribute:
p [content^=emph]
This returns the emph node containing the text node matched by the previous literal query.
The value of the content attribute is all the child text nodes concatenated together, which is why matching on it is always slower than matching on the literal.
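The difference between the literal and content queries on the example paragraph can be sketched with a simplified tree. This is an illustration, not mkql's implementation: nodes are modelled as plain objects with a children array, only containers get a content value, and "child text nodes" is read as all descendant text nodes. contentOf and descendants are hypothetical helpers.

```javascript
// Only containers (nodes with children) expose a content value.
function contentOf(node) {
  if (!Array.isArray(node.children)) return undefined;
  return node.children
    .map((child) => (child.type === 'text' ? child.literal : contentOf(child) || ''))
    .join('');
}

// Depth-first list of every node below the given node (descendant combinator).
function descendants(node) {
  const out = [];
  for (const child of node.children || []) {
    out.push(child, ...descendants(child));
  }
  return out;
}

// Simplified tree for: Paragraph with some *emphasis* and *italic*.
const paragraph = {
  type: 'paragraph',
  children: [
    { type: 'text', literal: 'Paragraph with some ' },
    { type: 'emph', children: [{ type: 'text', literal: 'emphasis' }] },
    { type: 'text', literal: ' and ' },
    { type: 'emph', children: [{ type: 'text', literal: 'italic' }] },
    { type: 'text', literal: '.' }
  ]
};

// p [literal^=emph] -> the text node inside the first emph
const byLiteral = descendants(paragraph).filter(
  (n) => typeof n.literal === 'string' && n.literal.startsWith('emph')
);
// p [content^=emph] -> the emph node itself, because text nodes have no content
const byContent = descendants(paragraph).filter((n) => {
  const content = contentOf(n);
  return content !== undefined && content.startsWith('emph');
});

console.log(byLiteral.map((n) => n.type)); // [ 'text' ]
console.log(byContent.map((n) => n.type)); // [ 'emph' ]
console.log(contentOf(paragraph)); // Paragraph with some emphasis and italic.
```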
Links support the href and title attributes.
a[href^=http://]
a[title^=Example]
Images support the src and title attributes.
img[src$=.jpg]
img[title^=Example]
Code blocks support the info and fenced attributes.
pre[info^=javascript]
pre[fenced]
The list and item types (ul, ol and li) support the bullet and delimiter attributes.
This lets you select elements by the bullet character (unordered lists) or the delimiter (ordered lists). Valid values for the bullet attribute are +, * and -; valid values for the delimiter attribute are . and ).
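The bullet and delimiter values come from the list marker in the markdown source. A sketch, assuming CommonMark marker rules (bullets -, + or *; ordered markers of one to nine digits followed by . or )); listMarker is a hypothetical helper that reads a single source line and ignores edge cases such as thematic breaks written as * * *.

```javascript
// Hypothetical helper: returns the selector-relevant attributes for a list
// item's first line, or null when the line does not start with a list marker.
function listMarker(line) {
  // Bullet marker: up to three spaces of indentation, then -, + or *,
  // followed by whitespace or the end of the line.
  const bullet = /^ {0,3}([-+*])(?:\s|$)/.exec(line);
  if (bullet) return { tag: 'ul', bullet: bullet[1] };
  // Ordered marker: one to nine digits, then . or ).
  const ordered = /^ {0,3}\d{1,9}([.)])(?:\s|$)/.exec(line);
  if (ordered) return { tag: 'ol', delimiter: ordered[1] };
  return null;
}

console.log(listMarker('* first')); // { tag: 'ul', bullet: '*' }
console.log(listMarker('1) first')); // { tag: 'ol', delimiter: ')' }
console.log(listMarker('Paragraph')); // null

// ul[bullet=*] would select a list whose items use the * marker
const m = listMarker('* item');
console.log(m !== null && m.tag === 'ul' && m.bullet === '*'); // true
```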
This selector will match lists declared using the * character:
ul[bullet=*]
Or for all ordered lists declared using the 1) style:
ol[delimiter=)]
Use a descendant selector to get list items:
ul li[bullet=+]
The pseudo-classes :first-child, :last-child, :only-child and :nth-child are supported:
p a:first-child
p a:last-child
ul li:nth-child(5)
ul li:nth-child(2n+1)
ul li:nth-child(odd) /* same as above */
ul li:nth-child(2n)
ul li:nth-child(even) /* same as above */
ul li:only-child
See the :nth-child documentation (MDN) for more information.
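:nth-child(an+b) selects the children whose 1-based position equals a*n + b for some integer n >= 0 (odd is 2n+1, even is 2n). A sketch of that rule; parseNth and matchesNth are hypothetical helpers, not mkql's API, and they do not cover the full CSS grammar (for example the "of S" form).

```javascript
// Parses an :nth-child argument into { a, b } for the formula a*n + b.
function parseNth(expr) {
  const s = expr.replace(/\s+/g, '').toLowerCase();
  if (s === 'odd') return { a: 2, b: 1 };
  if (s === 'even') return { a: 2, b: 0 };
  const m = /^([+-]?\d*)n([+-]\d+)?$/.exec(s);
  if (m) {
    // "n" and "+n" mean a = 1, "-n" means a = -1
    const a = m[1] === '' || m[1] === '+' ? 1 : m[1] === '-' ? -1 : parseInt(m[1], 10);
    return { a, b: m[2] ? parseInt(m[2], 10) : 0 };
  }
  if (/^[+-]?\d+$/.test(s)) return { a: 0, b: parseInt(s, 10) }; // plain index
  throw new Error('Invalid :nth-child expression: ' + expr);
}

// True when position equals a*n + b for some integer n >= 0.
function matchesNth(position, { a, b }) {
  if (a === 0) return position === b;
  const diff = position - b;
  return diff % a === 0 && diff / a >= 0;
}

const positions = [1, 2, 3, 4, 5, 6];
console.log(positions.filter((p) => matchesNth(p, parseNth('odd')))); // [ 1, 3, 5 ]
console.log(positions.filter((p) => matchesNth(p, parseNth('2n')))); // [ 2, 4, 6 ]
console.log(positions.filter((p) => matchesNth(p, parseNth('5')))); // [ 5 ]
```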
The relational pseudo-class :has is useful for selecting parents based on a condition:
p:has(em)
a:has(> img)
The negation pseudo-class :not is also available:
p:not(:first-child)
Use the :empty pseudo-class to select nodes with no children:
p :empty
Use the pseudo-element prefix :: to select elements that are not represented directly in the tree.
The pseudo-elements used to select html_block and html_inline nodes by kind are:
::comment: select comments (<!-- -->)
::pi: select processing instructions (<? ?>)
::doctype: select doctype declarations (<!doctype html>)
::cdata: select CDATA declarations (<![CDATA[]]>)
::element: select block and inline elements (<div></div>)
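Each pseudo-element corresponds to a recognisable prefix of the raw HTML in an html_block or html_inline node. A rough sketch of that classification, assuming it works on the node's literal text; htmlKind is hypothetical and mkql's own rules may be stricter.

```javascript
// Hypothetical classifier: maps the raw HTML of an html_block/html_inline
// literal to the pseudo-element that would select it, or null.
function htmlKind(literal) {
  const s = literal.trim();
  if (s.startsWith('<!--')) return 'comment'; // ::comment
  if (s.startsWith('<?')) return 'pi'; // ::pi
  if (/^<!doctype/i.test(s)) return 'doctype'; // ::doctype
  if (s.startsWith('<![CDATA[')) return 'cdata'; // ::cdata
  if (/^<\/?[a-zA-Z]/.test(s)) return 'element'; // ::element
  return null;
}

const nodes = [
  { type: 'html_block', literal: '<!doctype html>' },
  { type: 'html_inline', literal: '<!-- note -->' },
  { type: 'html_inline', literal: '<span>' }
];

// Roughly what "::comment" selects: HTML nodes whose kind is comment
const comments = nodes.filter(
  (n) => (n.type === 'html_block' || n.type === 'html_inline') && htmlKind(n.literal) === 'comment'
);
console.log(comments.length); // 1
```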
::doctype /* select doctype declarations */
p ::comment /* select inline html comments */
Usage: mkql [-dprmnh] [--delete] [--preserve] [--range] [--multiple]
[--newline] [--help] [--version] <selector...>
mkql [-dprmnh] [--delete] [--preserve] [--multiple] [--newline] [--help]
[--version] --range <start-selector> [end-selector]
Query documents with selectors.
Options
-d, --delete Remove matched nodes
-p, --preserve Preserve text when deleting
-r, --range Execute a range query
-m, --multiple Include multiple ranges
-n, --newline Add line break between matches
-h, --help Display help and exit
--version Print the version and exit
compile(source)
Compile a source selector string to a tree representation.
Returns Object result tree.
- source (String): input selector.
range(start[, end])
Compile a range query.
When an end selector is given it must have the same number of
selectors in the list as the start selector.
If the end selector is not given the range will end when the start
selector matches again or the end of file is reached.
- start (String): selector to start the range match.
- end (String): selector to end the range match.
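The rule for a missing end selector can be sketched over a flat list of top-level nodes, with predicates standing in for compiled selectors. sliceRange is hypothetical: it returns only the first range (the --multiple flag suggests mkql can collect more), it does not model the rule that start and end selector lists must have the same length, and it assumes the node that closes a range is not part of it.

```javascript
// Hypothetical helper: returns the first range that opens at a node
// matching isStart. Assumption: the closing node is excluded.
function sliceRange(nodes, isStart, isEnd) {
  const begin = nodes.findIndex(isStart);
  if (begin === -1) return [];
  // Without an end selector, the range closes when the start selector matches again.
  const closes = isEnd || isStart;
  for (let i = begin + 1; i < nodes.length; i++) {
    if (closes(nodes[i])) return nodes.slice(begin, i);
  }
  return nodes.slice(begin); // reached the end of the file
}

const doc = [
  { type: 'heading', text: 'Intro' },
  { type: 'paragraph' },
  { type: 'paragraph' },
  { type: 'heading', text: 'Usage' },
  { type: 'list' }
];
const isHeading = (n) => n.type === 'heading';

console.log(sliceRange(doc, isHeading).length); // 3 (Intro and its two paragraphs)
console.log(sliceRange(doc, (n) => n.text === 'Usage').length); // 2 (runs to end of file)
```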
slice(source[, opts])
Execute a range query on the input nodes.
Returns Range query execution object.
- source (Object): compiled range query.
- opts (Object): range query options.
query(markdown, source[, opts])
Query a markdown document tree with a source selector.
If the markdown parameter is a string it is parsed into a document tree.
If the given source selector is a string it is compiled; otherwise it should be a previously compiled result tree.
If the source selector appears to be a range query the slice function is
called with the range query.
Returns Array list of matched nodes.
- markdown (Array|Object|String): input data.
- source (String|Object): input selector.
- opts (Object): query options.
ql([opts][, cb])
Run queries on an input stream.
Returns an output stream.
- opts (Object): processing options.
- cb (Function): callback function.
- input (Readable): input stream.
- output (Writable): output stream.
MIT
Created by mkdoc on April 24, 2016