Reference Manual
The lezer system consists of multiple modules, each distributed as a separate package on npm.
@lezer/common: The data structures for the syntax tree and the types shared between all parser implementations.
@lezer/lr: The LR parser runtime.
@lezer/highlight: A system for attaching highlighting information to syntax trees and using that to highlight code.
@lezer/generator: The parser generator, an offline build tool to create parse tables from a grammar description.
@lezer/common module
This package provides common data structures used by all Lezer-related parsing—those related to syntax trees and the generic interface of parsers. Their main use is the LR parsers generated by the parser generator, but for example the Markdown parser implements a different parsing algorithm using the same interfaces.
Trees
Lezer syntax trees are not abstract: they just tell you which nodes were parsed where, without providing additional information about their role or relation (beyond parent-child relations). This makes them rather unsuited for some purposes, but quick to construct and cheap to store.
-
class Tree A piece of syntax tree. There are two ways to approach these trees: the way they are actually stored in memory, and the convenient way.
Syntax trees are stored as a tree of Tree and TreeBuffer objects. By packing detail information into TreeBuffer leaf nodes, the representation is made a lot more memory-efficient.
However, when you want to actually work with tree nodes, this representation is very awkward, so most client code will want to use the TreeCursor or SyntaxNode interface instead, which provides a view on some part of this data structure, and can be used to move around to adjacent nodes.
-
new Tree() Construct a new tree. See also Tree.build.
-
props Per-node node props to associate with this node.
-
-
type: NodeType The type of the top node.
-
children: readonly (Tree | TreeBuffer)[] This node's child nodes.
-
positions: readonly number[] The positions (offsets relative to the start of this tree) of the children.
-
length: number The total length of this tree
-
cursor(mode?: IterMode = 0 as IterMode) → TreeCursor Get a tree cursor positioned at the top of the tree. Mode can be used to control which nodes the cursor visits.
-
cursorAt() → TreeCursor Get a tree cursor pointing into this tree at the given position and side (see moveTo).
-
topNode: SyntaxNode Get a syntax node object for the top of the tree.
-
resolve(pos: number, side?: -1 | 0 | 1 = 0) → SyntaxNode Get the syntax node at the given position. If side is -1, this will move into nodes that end at the position. If 1, it'll move into nodes that start at the position. With 0, it'll only enter nodes that cover the position from both sides.
Note that this will not enter overlays, and you often want resolveInner instead.
-
resolveInner(pos: number, side?: -1 | 0 | 1 = 0) → SyntaxNode Like resolve, but will enter overlaid nodes, producing a syntax node pointing into the innermost overlaid tree at the given position (with parent links going through all parent structure, including the host trees).
-
resolveStack(pos: number, side?: -1 | 0 | 1 = 0) → NodeIterator In some situations, it can be useful to iterate through all nodes around a position, including those in overlays that don't directly cover the position. This method gives you an iterator that will produce all nodes, from small to big, around the given position.
-
iterate() Iterate over the tree and its children, calling enter for any node that touches the from/to region (if given) before running over such a node's children, and leave (if given) when leaving the node. When enter returns false, that node will not have its children iterated over (or leave called).
-
prop<T>(prop: NodeProp<T>) → T | undefined Get the value of the given node prop for this node. Works with both per-node and per-type props.
-
propValues: readonly [number | NodeProp<any>, any][] Returns the node's per-node props in a format that can be passed to the Tree constructor.
-
balance(config?: Object = {}) → Tree Balance the direct children of this tree, producing a copy which may have children grouped into subtrees with type NodeType.none.
-
config -
makeTree?: fn() → Tree Function to create the newly balanced subtrees.
-
-
-
static empty: Tree The empty tree
-
static build(data: Object) → Tree Build a tree from a postfix-ordered buffer of node information, or a cursor over such a buffer.
-
data -
buffer: BufferCursor | readonly number[] The buffer or buffer cursor to read the node data from.
When this is an array, it should contain four values for every node in the tree.
- The first holds the node's type, as a node ID pointing into the given NodeSet.
- The second holds the node's start offset.
- The third the end offset.
- The fourth the amount of space taken up in the array by this node and its children. Since there's four values per node, this is the total number of nodes inside this node (children and transitive children) plus one for the node itself, times four.
Parent nodes should appear after child nodes in the array. As an example, a node of type 10 spanning positions 0 to 4, with two children, of type 11 and 12, might look like this:
[11, 0, 1, 4, 12, 2, 4, 4, 10, 0, 4, 12]
-
nodeSet: NodeSet The node types to use.
-
topID: number The id of the top node type.
-
start?: number The position the tree should start at. Defaults to 0.
-
bufferStart?: number The position in the buffer where the function should stop reading. Defaults to 0.
-
length?: number The length of the wrapping node. The end offset of the last child is used when not provided.
-
maxBufferLength?: number The maximum buffer length to use. Defaults to DefaultBufferLength.
-
reused?: readonly Tree[] An optional array holding reused nodes that the buffer can refer to.
-
minRepeatType?: number The first node type that indicates repeat constructs in this grammar.
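To make the array layout accepted by Tree.build more concrete, the following is a minimal sketch that decodes such a postfix-ordered array into plain nested objects. It does not use the library: PlainNode and readNode are hypothetical names introduced only for this illustration, and the sketch ignores reused nodes and any other special cases the real builder handles.

```typescript
// Hypothetical plain-object node, used only for this illustration.
interface PlainNode {
  type: number
  from: number
  to: number
  children: PlainNode[]
}

// Read the node whose four values end at index `end` (exclusive).
function readNode(buffer: readonly number[], end: number): PlainNode {
  const type = buffer[end - 4]
  const from = buffer[end - 3]
  const to = buffer[end - 2]
  const size = buffer[end - 1]
  const start = end - size // first array slot that belongs to this node
  const children: PlainNode[] = []
  // Children sit directly before the parent's own four values,
  // so walk them from last to first.
  let pos = end - 4
  while (pos > start) {
    children.unshift(readNode(buffer, pos))
    pos -= buffer[pos - 1] // skip the child and everything inside it
  }
  return {type, from, to, children}
}

// The example array from the description above.
const example = [11, 0, 1, 4, 12, 2, 4, 4, 10, 0, 4, 12]
const root = readNode(example, example.length)
```

Because each node records the total space it occupies, a reader can jump from a node to its preceding sibling without scanning the descendants in between.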
-
-
-
-
interface SyntaxNodeRef The set of properties provided by both SyntaxNode and TreeCursor. Note that, if you need an object that is guaranteed to stay stable in the future, you need to use the node accessor.
-
from: number The start position of the node.
-
to: number The end position of the node.
-
type: NodeType The type of the node.
-
name: string The name of the node (.type.name).
-
tree: Tree | null Get the tree that represents the current node, if any. Will return null when the node is in a tree buffer.
-
node: SyntaxNode Retrieve a stable syntax node at this position.
-
matchContext(context: readonly string[]) → boolean Test whether the node matches a given context—a sequence of direct parent nodes. Empty strings in the context array act as wildcards, other strings must match the ancestor node's name.
-
-
interface SyntaxNode extends SyntaxNodeRef A syntax node provides an immutable pointer to a given node in a tree. When iterating over large numbers of nodes, you may want to use a mutable cursor instead, which is more efficient.
-
parent: SyntaxNode | null The node's parent node, if any.
-
firstChild: SyntaxNode | null The first child, if the node has children.
-
lastChild: SyntaxNode | null The node's last child, if available.
-
childAfter(pos: number) → SyntaxNode | null The first child that ends after pos.
-
childBefore(pos: number) → SyntaxNode | null The last child that starts before pos.
-
enter() → SyntaxNode | null Enter the child at the given position. If side is -1 the child may end at that position, when 1 it may start there.
This will by default enter overlaid mounted trees. You can set overlays to false to disable that.
Similarly, when buffers is false this will not enter buffers, only nodes (which is mostly useful when looking for props, which cannot exist on buffer-allocated nodes).
-
nextSibling: SyntaxNode | null This node's next sibling, if any.
-
prevSibling: SyntaxNode | null This node's previous sibling.
-
cursor(mode?: IterMode) → TreeCursor A tree cursor starting at this node.
-
resolve(pos: number, side?: -1 | 0 | 1) → SyntaxNode Find the node around, before (if side is -1), or after (side is 1) the given position. Will look in parent nodes if the position is outside this node.
-
resolveInner(pos: number, side?: -1 | 0 | 1) → SyntaxNode Similar to resolve, but enter overlaid nodes.
-
enterUnfinishedNodesBefore(pos: number) → SyntaxNode Move the position to the innermost node before pos that looks like it is unfinished (meaning it ends in an error node or has a child ending in an error node right at its end).
-
toTree() → Tree Get a tree for this node. Will allocate one if it points into a buffer.
-
getChild() → SyntaxNode | null Get the first child of the given type (which may be a node name or a group name). If before is non-null, only return children that occur somewhere after a node with that name or group. If after is non-null, only return children that occur somewhere before a node with that name or group.
-
getChildren() → SyntaxNode[] Like getChild, but return all matching children, not just the first.
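The side argument taken by resolve and enter can be read as a small predicate on a child's extent. The sketch below is one reading of the descriptions above, not the library's own code; entersAt is a hypothetical helper name.

```typescript
type Side = -1 | 0 | 1

// Hypothetical helper: would a child spanning from..to be entered for `pos`?
function entersAt(from: number, to: number, pos: number, side: Side): boolean {
  if (side < 0) return from < pos && to >= pos // child may end at pos
  if (side > 0) return from <= pos && to > pos // child may start at pos
  return from < pos && to > pos                // child must cover pos on both sides
}

// A node spanning 0..4 and its neighbour spanning 4..8, queried at position 4.
const endsHere = entersAt(0, 4, 4, -1)
const startsHere = entersAt(4, 8, 4, 1)
const strictlyInside = entersAt(0, 4, 4, 0)
```

At a boundary between two nodes, side 0 enters neither of them, which is why callers that care about the node just before or just after a position pass -1 or 1.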
-
-
type NodeIterator Represents a sequence of nodes.
-
node: SyntaxNode -
next: NodeIterator | null
-
-
class TreeCursor implements SyntaxNodeRef A tree cursor object focuses on a given node in a syntax tree, and allows you to move to adjacent nodes.
-
type: NodeType The node's type.
-
name: string Shorthand for .type.name.
-
from: number The start source offset of this node.
-
to: number The end source offset.
-
firstChild() → boolean Move the cursor to this node's first child. When this returns false, the node has no child, and the cursor has not been moved.
-
lastChild() → boolean Move the cursor to this node's last child.
-
childAfter(pos: number) → boolean Move the cursor to the first child that ends after pos.
-
childBefore(pos: number) → boolean Move to the last child that starts before pos.
-
enter() → boolean Move the cursor to the child around pos. If side is -1 the child may end at that position, when 1 it may start there. This will also enter overlaid mounted trees unless overlays is set to false.
-
parent() → boolean Move to the node's parent node, if this isn't the top node.
-
nextSibling() → boolean Move to this node's next sibling, if any.
-
prevSibling() → boolean Move to this node's previous sibling, if any.
-
next(enter?: boolean = true) → boolean Move to the next node in a pre-order traversal, going from a node to its first child or, if the current node is empty or enter is false, its next sibling or the next sibling of the first parent node that has one.
-
prev(enter?: boolean = true) → boolean Move to the next node in a last-to-first pre-order traversal. A node is followed by its last child or, if it has none, its previous sibling or the previous sibling of the first parent node that has one.
-
moveTo(pos: number, side?: -1 | 0 | 1 = 0) → TreeCursor Move the cursor to the innermost node that covers pos. If side is -1, it will enter nodes that end at pos. If it is 1, it will enter nodes that start at pos.
-
node: SyntaxNode Get a syntax node at the cursor's current position.
-
tree: Tree | null Get the tree that represents the current node, if any. Will return null when the node is in a tree buffer.
-
iterate() Iterate over the current node and all its descendants, calling enter when entering a node and leave, if given, when leaving one. When enter returns false, any children of that node are skipped, and leave isn't called for it.
-
matchContext(context: readonly string[]) → boolean Test whether the current node matches a given context—a sequence of direct parent node names. Empty strings in the context array are treated as wildcards.
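The enter/leave contract described for iterate can be illustrated on a plain nested structure. This is a sketch only: SketchNode and walk are hypothetical stand-ins, not library types, and the real method works on cursors rather than object trees.

```typescript
// Hypothetical stand-in for a tree node, used only for this illustration.
interface SketchNode {
  name: string
  children: SketchNode[]
}

function walk(
  node: SketchNode,
  enter: (node: SketchNode) => boolean | void,
  leave?: (node: SketchNode) => void
): void {
  // Returning false from enter skips both the children and the leave call.
  if (enter(node) === false) return
  for (const child of node.children) walk(child, enter, leave)
  if (leave) leave(node)
}

const doc: SketchNode = {name: "Script", children: [
  {name: "String", children: [{name: "Escape", children: []}]},
  {name: "Number", children: []}
]}

const events: string[] = []
walk(
  doc,
  node => { events.push("enter " + node.name); return node.name != "String" },
  node => { events.push("leave " + node.name) }
)
```

Here the String node is reported by enter, but its Escape child is never visited and no leave event is recorded for it.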
-
-
enum IterMode Options that control iteration. Can be combined with the | operator to enable multiple ones.
ExcludeBuffers When enabled, iteration will only visit Tree objects, not nodes packed into TreeBuffers.
IncludeAnonymous Enable this to make iteration include anonymous nodes (such as the nodes that wrap repeated grammar constructs into a balanced tree).
IgnoreMounts By default, regular mounted nodes replace their base node in iteration. Enable this to ignore them instead.
IgnoreOverlays This option only applies in enter-style methods. It tells the library to not enter mounted overlays if one covers the given position.
-
class NodeWeakMap<T> Provides a way to associate values with pieces of trees. As long as that part of the tree is reused, the associated values can be retrieved from an updated tree.
-
set(node: SyntaxNode, value: T) Set the value for this syntax node.
-
get(node: SyntaxNode) → T | undefined Retrieve value for this syntax node, if it exists in the map.
-
cursorSet(cursor: TreeCursor, value: T) Set the value for the node that a cursor currently points to.
-
cursorGet(cursor: TreeCursor) → T | undefined Retrieve the value for the node that a cursor currently points to.
-
Node types
-
class NodeType Each node in a syntax tree has a node type associated with it.
-
name: string The name of the node type. Not necessarily unique, but if the grammar was written properly, different node types with the same name within a node set should play the same semantic role.
-
id: number The id of this node in its set. Corresponds to the term ids used in the parser.
-
prop<T>(prop: NodeProp<T>) → T | undefined Retrieves a node prop for this type. Will return undefined if the prop isn't present on this node.
-
isTop: boolean True when this is the top node of a grammar.
-
isSkipped: boolean True when this node is produced by a skip rule.
-
isError: boolean Indicates whether this is an error node.
-
isAnonymous: boolean When true, this node type doesn't correspond to a user-declared named node, for example because it is used to cache repetition.
-
is(name: string | number) → boolean Returns true when this node's name or one of its groups matches the given string.
-
static define(spec: Object) → NodeType Define a node type.
-
spec -
id: number The ID of the node type. When this type is used in a set, the ID must correspond to its index in the type array.
-
name?: string The name of the node type. Leave empty to define an anonymous node.
-
props?: readonly (NodePropSource | [NodeProp<any>, any])[] Node props to assign to the type. The value given for any given prop should correspond to the prop's type.
-
top?: boolean Whether this is a top node.
-
error?: boolean Whether this node counts as an error node.
-
skipped?: boolean Whether this node is a skipped node.
-
-
-
static none: NodeType An empty dummy node type to use when no actual type is available.
-
static match<T>(map: Object<T>) → fn(node: NodeType) → T | undefined Create a function from node types to arbitrary values by specifying an object whose property names are node or group names. Often useful with NodeProp.add. You can put multiple names, separated by spaces, in a single property name to map multiple node names to a single value.
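The lookup that NodeType.match describes (space-separated names in a property key, matched against a node's name or groups) might be sketched as follows. This is an illustration under assumptions, not the library implementation: TypeLike and matchByName are hypothetical, and the choice to check the name before the groups is an assumption of this sketch.

```typescript
// Minimal stand-in for the parts of a node type this sketch needs (hypothetical).
interface TypeLike {
  name: string
  groups?: readonly string[]
}

function matchByName<T>(map: {[selector: string]: T}): (type: TypeLike) => T | undefined {
  // Expand "A B" keys so that every individual name points at the value.
  const direct: {[name: string]: T} = Object.create(null)
  for (const selector of Object.keys(map))
    for (const name of selector.split(" ")) direct[name] = map[selector]
  return type => {
    // Assumption: the node's own name takes precedence over its groups.
    if (type.name in direct) return direct[type.name]
    for (const group of type.groups ?? []) if (group in direct) return direct[group]
    return undefined
  }
}

// Hypothetical node and group names, mapped to arbitrary values.
const indentUnits = matchByName({"Block ObjectExpression": 2, Expression: 0})
```

A function of this shape returns undefined for types that should not receive a value, which is the convention NodeProp.add relies on.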
-
-
class NodeSet A node set holds a collection of node types. It is used to compactly represent trees by storing their type ids, rather than a full pointer to the type object, in a numeric array. Each parser has a node set, and tree buffers can only store collections of nodes from the same set. A set can have a maximum of 2**16 (65536) node types in it, so that the ids fit into 16-bit typed array slots.
-
new NodeSet(types: readonly NodeType[]) Create a set with the given types. The id property of each type should correspond to its position within the array.
-
types: readonly NodeType[] The node types in this set, by id.
-
extend(...props: NodePropSource[]) → NodeSet Create a copy of this set with some node properties added. The arguments to this method can be created with NodeProp.add.
-
-
class NodeProp<T> Each node type or individual tree can have metadata associated with it in props. Instances of this class represent prop names.
-
new NodeProp(config?: Object = {}) Create a new node prop type.
-
config -
deserialize?: fn(str: string) → T The deserialize function to use for this prop, used for example when directly providing the prop from a grammar file. Defaults to a function that raises an error.
-
combine?: fn(a: T, b: T) → T If configuring another value for this prop when it already exists on a node should combine the old and new values, rather than overwrite the old value, you can pass a function that does the combining here.
-
perNode?: boolean By default, node props are stored in the node type. It can sometimes be useful to directly store information (usually related to the parsing algorithm) in nodes themselves. Set this to true to enable that for this prop.
-
-
-
perNode: boolean Indicates whether this prop is stored per node type or per tree node.
-
deserialize(str: string) → T A method that deserializes a value of this prop from a string. Can be used to allow a prop to be directly written in a grammar file.
-
add() → NodePropSource This is meant to be used with NodeSet.extend or LRParser.configure to compute prop values for each node type in the set. Takes a match object or function that returns undefined if the node type doesn't get this prop, and the prop's value if it does.
-
static closedBy: NodeProp<readonly string[]> Prop that is used to describe matching delimiters. For opening delimiters, this holds an array of node names (written as a space-separated string when declaring this prop in a grammar) for the node types of closing delimiters that match it.
-
static openedBy: NodeProp<readonly string[]> The inverse of closedBy. This is attached to closing delimiters, holding an array of node names of types of matching opening delimiters.
-
static group: NodeProp<readonly string[]> Used to assign node types to groups (for example, all node types that represent an expression could be tagged with an "Expression" group).
-
static isolate: NodeProp<"rtl" | "ltr" | "auto"> Attached to nodes to indicate these should be displayed in a bidirectional text isolate, so that direction-neutral characters on their sides don't incorrectly get associated with surrounding text. You'll generally want to set this for nodes that contain arbitrary text, like strings and comments, and for nodes that appear inside arbitrary text, like HTML tags. When not given a value, in a grammar declaration, defaults to "auto".
-
static contextHash: NodeProp<number> The hash of the context that the node was parsed in, if any. Used to limit reuse of contextual nodes.
-
static lookAhead: NodeProp<number> The distance beyond the end of the node that the tokenizer looked ahead for any of the tokens inside the node. (The LR parser only stores this when it is larger than 25, for efficiency reasons.)
-
static mounted: NodeProp<MountedTree> This per-node prop is used to replace a given node, or part of a node, with another tree. This is useful to include trees from different languages in mixed-language parsers.
-
-
type NodePropSource = fn(type: NodeType) → [NodeProp<any>, any] | null Type returned by NodeProp.add. Describes whether a prop should be added to a given node type in a node set, and what value it should have.
Buffers
Buffers are an optimization in the way Lezer trees are stored.
-
class TreeBuffer Tree buffers contain (type, start, end, endIndex) quads for each node. In such a buffer, nodes are stored in prefix order (parents before children, with the endIndex of the parent indicating which children belong to it).
-
new TreeBuffer() Create a tree buffer.
-
buffer: Uint16Array The buffer's content.
-
length: number The total length of the group of nodes in the buffer.
-
set: NodeSet The node set used in this buffer.
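The prefix-ordered quad layout can be illustrated with a small reader over a Uint16Array. This sketch assumes that endIndex is an index into the buffer pointing just past the node's last descendant; the text above only says that it indicates which children belong to the parent, so treat that detail as an assumption. QuadNode and readQuad are hypothetical names.

```typescript
// Hypothetical plain-object node, used only for this illustration.
interface QuadNode {
  type: number
  from: number
  to: number
  children: QuadNode[]
}

// Read the node whose quad starts at `index`.
function readQuad(buffer: ArrayLike<number>, index: number): QuadNode {
  // Assumption: the fourth value is the buffer index just past this node's content.
  const endIndex = buffer[index + 3]
  const children: QuadNode[] = []
  let child = index + 4 // the first child directly follows its parent
  while (child < endIndex) {
    children.push(readQuad(buffer, child))
    child = buffer[child + 3] // jump to the next sibling
  }
  return {type: buffer[index], from: buffer[index + 1], to: buffer[index + 2], children}
}

// A node of type 10 spanning 0..4 with children of type 11 (0..1) and 12 (2..4).
const quads = Uint16Array.from([10, 0, 4, 12, 11, 0, 1, 8, 12, 2, 4, 12])
const top = readQuad(quads, 0)
```

Compared with the postfix array passed to Tree.build, the parent comes first here, so a reader can descend from the top without first locating the end of the buffer.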
-
-
DefaultBufferLength: 1024 The default maximum length of a TreeBuffer node.
-
interface BufferCursor This is used by Tree.build as an abstraction for iterating over a tree buffer. A cursor initially points at the very last element in the buffer. Every time next() is called it moves on to the previous one.
-
pos: number The current buffer position (four times the number of nodes remaining).
-
id: number The node ID of the next node in the buffer.
-
start: number The start position of the next node in the buffer.
-
end: number The end position of the next node.
-
size: number The size of the next node (the number of nodes inside, counting the node itself, times 4).
-
next() Moves this.pos down by 4.
-
fork() → BufferCursor Create a copy of this cursor.
-
Parsing
-
abstract class Parser A superclass that parsers should extend.
-
abstract createParse() → PartialParse Start a parse for a single tree. This is the method concrete parser implementations must implement. Called by startParse, with the optional arguments resolved.
-
startParse() → PartialParse Start a parse, returning a partial parse object. fragments can be passed in to make the parse incremental.
By default, the entire input is parsed. You can pass ranges, which should be a sorted array of non-empty, non-overlapping ranges, to parse only those ranges. The tree returned in that case will start at ranges[0].from.
-
parse() → Tree Run a full parse, returning the resulting tree.
-
-
interface Input This is the interface parsers use to access the document. To run Lezer directly on your own document data structure, you have to write an implementation of it.
-
length: number The length of the document.
-
chunk(from: number) → string Get the chunk after the given position. The returned string should start at from and, if that isn't the end of the document, may be of any length greater than zero.
-
lineChunks: boolean Indicates whether the chunks already end at line breaks, so that client code that wants to work by-line can avoid re-scanning them for line breaks. When this is true, the result of chunk() should either be a single line break, or the content between from and the next line break.
-
read(from: number, to: number) → string Read the part of the document between the given positions.
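As an illustration of the Input contract, here is a minimal sketch of an implementation backed by a single string that sets lineChunks to true. InputLike is a local copy of the four members listed above so that the sketch stands alone, and LineChunkedInput is a hypothetical class name. A real document structure would usually avoid holding the whole text as one string.

```typescript
// Local copy of the documented members, so this sketch compiles without the package.
interface InputLike {
  readonly length: number
  chunk(from: number): string
  readonly lineChunks: boolean
  read(from: number, to: number): string
}

class LineChunkedInput implements InputLike {
  readonly lineChunks = true
  private text: string

  constructor(text: string) {
    this.text = text
  }

  get length(): number {
    return this.text.length
  }

  // Return either a single line break or the content up to the next line break.
  chunk(from: number): string {
    if (this.text[from] === "\n") return "\n"
    const lineEnd = this.text.indexOf("\n", from)
    return this.text.slice(from, lineEnd < 0 ? this.text.length : lineEnd)
  }

  read(from: number, to: number): string {
    return this.text.slice(from, to)
  }
}

const input = new LineChunkedInput("one\ntwo")
```

With this shape, chunk never returns a string that contains a line break in the middle, which is the guarantee that lineChunks advertises to by-line client code.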
-
-
interface PartialParse Interface used to represent an in-progress parse, which can be moved forward piece-by-piece.
-
advance() → Tree | null Advance the parse state by some amount. Will return the finished syntax tree when the parse completes.
-
parsedPos: number The position up to which the document has been parsed. Note that, in multi-pass parsers, this will stay back until the last pass has moved past a given position.
-
stopAt(pos: number) Tell the parse to not advance beyond the given position. advance will return a tree when the parse has reached the position. Note that, depending on the parser algorithm and the state of the parse when stopAt was called, that tree may contain nodes beyond the position. It is an error to call stopAt with a higher position than its current value.
-
stoppedAt: number | null Reports whether stopAt has been called on this parse.
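The advance/stopAt protocol can be sketched with a toy stand-in for a parser. Nothing below comes from the library: PartialParseLike mirrors the four members listed above, ToyParse is a hypothetical class that "parses" a fixed number of characters per step and returns a string instead of a tree, and finish is a hypothetical driver.

```typescript
// Local mirror of the documented members, generic in the result type.
interface PartialParseLike<T> {
  advance(): T | null
  readonly parsedPos: number
  stopAt(pos: number): void
  readonly stoppedAt: number | null
}

// Toy stand-in: consumes `step` characters per advance() call.
class ToyParse implements PartialParseLike<string> {
  parsedPos = 0
  stoppedAt: number | null = null
  private text: string
  private step: number

  constructor(text: string, step: number) {
    this.text = text
    this.step = step
  }

  advance(): string | null {
    const end = this.stoppedAt ?? this.text.length
    this.parsedPos = Math.min(end, this.parsedPos + this.step)
    // Only produce a result once the (possibly limited) end has been reached.
    return this.parsedPos >= end ? this.text.slice(0, end) : null
  }

  stopAt(pos: number): void {
    // Mirrors the rule that the stop position may only move backward.
    if (this.stoppedAt != null && this.stoppedAt < pos)
      throw new RangeError("Can't move stoppedAt forward")
    this.stoppedAt = pos
  }
}

// Drive a parse to completion by calling advance() until it yields a result.
function finish<T>(parse: PartialParseLike<T>): T {
  for (;;) {
    const done = parse.advance()
    if (done != null) return done
  }
}

const parse = new ToyParse("abcdefgh", 3)
parse.stopAt(5)
const result = finish(parse)
```

A caller that needs to stay responsive could run the same loop with a time budget, calling stopAt when the budget runs out and using the tree that advance then returns.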
-
-
type ParseWrapper = fn() → PartialParse Parse wrapper functions are supported by some parsers to inject additional parsing logic.
Incremental Parsing
Efficient reparsing happens by reusing parts of the original parsed structure.
-
class TreeFragment Tree fragments are used during incremental parsing to track parts of old trees that can be reused in a new parse. An array of fragments is used to track regions of an old tree whose nodes might be reused in new parses. Use the static applyChanges method to update fragments for document changes.
-
new TreeFragment() Construct a tree fragment. You'll usually want to use addTree and applyChanges instead of calling this directly.
-
from: number The start of the unchanged range pointed to by this fragment. This refers to an offset in the updated document (as opposed to the original tree).
-
to: number The end of the unchanged range.
-
tree: Tree The tree that this fragment is based on.
-
offset: number The offset between the fragment's tree and the document that this fragment can be used against. Add this when going from document to tree positions, subtract it to go from tree to document positions.
-
openStart: boolean Whether the start of the fragment represents the start of a parse, or the end of a change. (In the second case, it may not be safe to reuse some nodes at the start, depending on the parsing algorithm.)
-
openEnd: boolean Whether the end of the fragment represents the end of a full-document parse, or the start of a change.
-
static addTree() → readonly TreeFragment[] Create a set of fragments from a freshly parsed tree, or update an existing set of fragments by replacing the ones that overlap with a tree with content from the new tree. When partial is true, the parse is treated as incomplete, and the resulting fragment has openEnd set to true.
-
static applyChanges() → readonly TreeFragment[] Apply a set of edits to an array of fragments, removing or splitting fragments as necessary to remove edited ranges, and adjusting offsets for fragments that moved.
-
-
interface ChangedRange The TreeFragment.applyChanges method expects changed ranges in this format.
-
fromA: number The start of the change in the start document
-
toA: number The end of the change in the start document
-
fromB: number The start of the replacement in the new document
-
toB: number The end of the replacement in the new document
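For a single edit, the four fields of a changed range follow directly from the replaced span and the length of the inserted text. The sketch below shows that arithmetic; ChangedRangeLike is a local copy of the fields above, and replaceRange is a hypothetical helper that only handles one change at a time.

```typescript
// Local copy of the documented fields.
interface ChangedRangeLike {
  fromA: number
  toA: number
  fromB: number
  toB: number
}

// Describe replacing the old span from..to with `insert` (single edit only).
function replaceRange(from: number, to: number, insert: string): ChangedRangeLike {
  return {
    fromA: from,                // start in the old document
    toA: to,                    // end in the old document
    fromB: from,                // the edit starts at the same offset in the new document
    toB: from + insert.length   // and ends after the inserted text
  }
}

// "let x = 1" becomes "let x = 100": the "1" at 8..9 is replaced.
const grow = replaceRange(8, 9, "100")
// Deleting characters 2..5 leaves an empty range in the new document.
const cut = replaceRange(2, 5, "")
```

When several edits are combined, the B positions of later ranges also have to account for the length differences of the earlier ones, which this helper does not attempt.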
-
Mixed Parsing
-
parseMixed() → ParseWrapper Create a parse wrapper that, after the inner parse completes, scans its tree for mixed language regions with the nest function, runs the resulting inner parses, and then mounts their results onto the tree.
-
interface NestedParse Objects returned by the function passed to parseMixed should conform to this interface.
-
parser: Parser The parser to use for the inner region.
-
overlay?: readonly {from: number, to: number}[] | When this property is not given, the entire node is parsed with this parser, and it is mounted as a non-overlay node, replacing its host node in tree iteration.
When an array of ranges is given, only those ranges are parsed, and the tree is mounted as an overlay.
When a function is given, that function will be called for descendant nodes of the target node, not including child nodes that are covered by another nested parse, to determine the overlay ranges. When it returns true, the entire descendant is included, otherwise just the range given. The mixed parser will optimize range-finding in reused nodes, which means it's a good idea to use a function here when the target node is expected to have a large, deep structure.
-
-
class MountedTree A mounted tree, which can be stored on a tree node to indicate that parts of its content are represented by another tree.
-
new MountedTree() -
tree: Tree The inner tree.
-
overlay: readonly {from: number, to: number}[] | If this is null, this tree replaces the entire node (it will be included in the regular iteration instead of its host node). If not, only the given ranges are considered to be covered by this tree. This is used for trees that are mixed in a way that isn't strictly hierarchical. Such mounted trees are only entered by resolveInner and enter.
-
parser: Parser The parser used to create this subtree.
-
@lezer/lr module
This package provides an implementation of a GLR parser that works with the parse tables generated by the parser generator.
Parsing
-
class LRParser extends Parser Holds the parse tables for a given grammar, as generated by lezer-generator, and provides methods to parse content with.
-
nodeSet: NodeSet The nodes used in the trees emitted by this parser.
-
configure(config: ParserConfig) → LRParser Configure the parser. Returns a new parser instance that has the given settings modified. Settings not provided in config are kept from the original parser.
-
hasWrappers() → boolean Tells you whether any parse wrappers are registered for this parser.
-
getName(term: number) → string Returns the name associated with a given term. This will only work for all terms when the parser was generated with the --names option. By default, only the names of tagged terms are stored.
-
topNode: NodeType The type of top node produced by the parser.
-
-
interfaceParserConfig Configuration options when reconfiguring a parser.
-
props?: readonly NodePropSource[] Node prop values to add to the parser's node set.
-
top?: string The name of the @top declaration to parse from. If not specified, the first top rule declaration in the grammar is used.
-
dialect?: string A space-separated string of dialects to enable.
-
tokenizers?: {from: ExternalTokenizer, to: ExternalTokenizer}[] Replace the given external tokenizers with new ones.
-
specializers?: {}[] Replace external specializers with new ones.
-
contextTracker?: ContextTracker<any> Replace the context tracker with a new one.
-
strict?: boolean When true, the parser will raise an exception, rather than run its error-recovery strategies, when the input doesn't match the grammar.
-
wrap?: ParseWrapper Add a wrapper, which can extend parses created by this parser with additional logic (usually used to add mixed-language parsing).
-
bufferLength?: number The maximum length of the TreeBuffers generated in the output tree. Defaults to 1024.
-
-
classStack A parse stack. These are used internally by the parser to track parsing progress. They also provide some properties and methods that external code such as a tokenizer can use to get information about the parse state.
-
pos: number The input position up to which this stack has parsed.
-
context: any The stack's current context value, if any. Its type will depend on the context tracker's type parameter, or it will be null if there is no context tracker.
-
canShift(term: number) → boolean Check if the given term would be able to be shifted (optionally after some reductions) on this stack. This can be useful for external tokenizers that want to make sure they only provide a given token when it applies.
-
parser: LRParser Get the parser used by this stack.
-
dialectEnabled(dialectID: number) → boolean Test whether a given dialect (by numeric ID, as exported from the terms file) is enabled.
-
Tokenizers
-
classInputStream Tokenizers interact with the input through this interface. It presents the input as a stream of characters, tracking lookahead and hiding the complexity of ranges from tokenizer code.
-
next: number The character code of the next code unit in the input, or -1 when the stream is at the end of the input.
-
pos: number The current position of the stream. Note that, due to parses being able to cover non-contiguous ranges, advancing the stream does not always mean its position moves a single unit.
-
peek(offset: number) → number Look at a code unit near the stream position. .peek(0) equals .next, .peek(-1) gives you the previous character, and so on.
Note that looking around during tokenizing creates dependencies on potentially far-away content, which may reduce the effectiveness of incremental parsing (when looking forward) or even cause invalid reparses when looking backward more than 25 code units, since the library does not track lookbehind.
-
acceptToken(token: number, endOffset?: number = 0) Accept a token. By default, the end of the token is set to the current stream position, but you can pass an offset (relative to the stream position) to change that.
-
acceptTokenTo(token: number, endPos: number) Accept a token ending at a specific given position.
-
advance(n?: number = 1) → number Move the stream forward N (defaults to 1) code units. Returns the new value of next.
-
-
class ExternalTokenizer @external tokens declarations in the grammar should resolve to an instance of this class.
-
new ExternalTokenizer() Create a tokenizer. The first argument is the function that, given an input stream, scans for the types of tokens it recognizes at the stream's position, and calls acceptToken when it finds one.
-
options -
contextual?: boolean When set to true, mark this tokenizer as depending on the current parse stack, which prevents its result from being cached between parser actions at the same positions.
-
fallback?: boolean By default, when a tokenizer returns a token, that prevents tokenizers with lower precedence from even running. When fallback is true, the tokenizer is allowed to run when a previous tokenizer returned a token that didn't match any of the current state's actions.
-
extend?: boolean When set to true, tokenizing will not stop after this tokenizer has produced a token. (But it will still fail to reach this one if a higher-precedence tokenizer produced a token.)
-
-
-
-
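As a rough illustration of the scanning function passed to new ExternalTokenizer, the sketch below accepts a run of ASCII digits as a single token. The term id Digits and the FakeStream class are made up for this example: in a real grammar the id would come from the generated terms file, and the stream would be supplied by the parser. FakeStream only mimics the next, pos, advance, and acceptToken members described above so that the function can be exercised on its own.

```javascript
// Hypothetical term id; a real one comes from the generated terms file.
const Digits = 1

// Scanning function in the shape ExternalTokenizer expects: it inspects
// the stream and calls acceptToken when it recognizes a token.
function scanDigits(input) {
  let sawDigit = false
  // Character codes 48..57 are "0".."9"; next is -1 at end of input.
  while (input.next >= 48 && input.next <= 57) {
    input.advance()
    sawDigit = true
  }
  if (sawDigit) input.acceptToken(Digits)
}

// With @lezer/lr available this would be wrapped as:
//   new ExternalTokenizer(scanDigits)

// Stand-in for the parser's input stream (an assumption for
// demonstration, not the library's implementation).
class FakeStream {
  constructor(text, pos = 0) {
    this.text = text
    this.pos = pos
    this.token = null
  }
  get next() {
    return this.pos < this.text.length ? this.text.charCodeAt(this.pos) : -1
  }
  advance(n = 1) {
    this.pos += n
    return this.next
  }
  acceptToken(token, endOffset = 0) {
    this.token = {type: token, end: this.pos + endOffset}
  }
}

const stream = new FakeStream("123+4")
scanDigits(stream)
console.log(stream.token) // token of type Digits ending at offset 3
```

The function only looks forward and never calls peek with a negative offset, which keeps it clear of the lookbehind caveat mentioned for peek.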
class ContextTracker<T> Context trackers are used to track stateful context (such as indentation in the Python grammar, or parent elements in the XML grammar) needed by external tokenizers. You declare them in a grammar file as @context exportName from "module".
Context values should be immutable, and can be updated (replaced) on shift or reduce actions.
The export used in a @context declaration should be of this type.
-
new ContextTracker(spec: Object) Define a context tracker.
-
spec -
start: T The initial value of the context at the start of the parse.
-
shift?: fn() → T Update the context when the parser executes a shift action.
-
reduce?: fn() → T Update the context when the parser executes a reduce action.
-
reuse?: fn() → T Update the context when the parser reuses a node from a tree fragment.
-
hash?: fn(context: T) → number Reduce a context value to a number (for cheap storage and comparison). Only needed for strict contexts.
-
strict?: boolean By default, nodes can only be reused during incremental parsing if they were created in the same context as the one in which they are reused. Set this to false to disable that check (and the overhead of storing the hashes).
-
-
-
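A minimal sketch of a spec object for a context tracker, here counting brace nesting depth. The term ids are hypothetical, and the callback parameters (the current context first, then the term id) are an assumption, since the listing above shows the callbacks without arguments. The point is only that each callback returns a fresh value instead of mutating the old one, and that hash reduces the value to a number.

```javascript
// Hypothetical term ids; real ones come from the generated terms file.
const OpenBrace = 1, CloseBrace = 2

// Spec object tracking brace nesting depth as an immutable number.
const depthSpec = {
  start: 0,
  // Return a new value rather than mutating the old one.
  shift(context, term) {
    if (term === OpenBrace) return context + 1
    if (term === CloseBrace) return Math.max(0, context - 1)
    return context
  },
  // The context is already a small number, so it serves as its own hash.
  hash(context) { return context },
  strict: true
}

// With @lezer/lr available: new ContextTracker(depthSpec)

// Simulate the parser shifting three tokens.
let depth = depthSpec.start
for (const term of [OpenBrace, OpenBrace, CloseBrace]) {
  depth = depthSpec.shift(depth, term)
}
console.log(depth) // 1
```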
@lezer/highlight module
This package provides a vocabulary for syntax-highlighting code based on a Lezer syntax tree.
-
class Tag Highlighting tags are markers that denote a highlighting category. They are associated with parts of a syntax tree by a language mode, and then mapped to an actual CSS style by a highlighter.
Because syntax tree node types and highlight styles have to be able to talk the same language, CodeMirror uses a mostly closed vocabulary of syntax tags (as opposed to traditional open string-based systems, which make it hard for highlighting themes to cover all the tokens produced by the various languages).
It is possible to define your own highlighting tags for system-internal use (where you control both the language package and the highlighter), but such tags will not be picked up by regular highlighters (though you can derive them from standard tags to allow highlighters to fall back to those).
-
set: Tag[] The set of this tag and all its parent tags, starting with this one itself and sorted in order of decreasing specificity.
-
toString() → string -
static define(name?: string, parent?: Tag) → Tag Define a new tag. If parent is given, the tag is treated as a sub-tag of that parent, and highlighters that don't mention this tag will try to fall back to the parent tag (or grandparent tag, etc).
-
static defineModifier(name?: string) → fn(tag: Tag) → Tag Define a tag modifier, which is a function that, given a tag, will return a tag that is a subtag of the original. Applying the same modifier twice to a tag will return the same value (m1(t1) == m1(t1)) and applying multiple modifiers will, regardless of order, produce the same tag (m1(m2(t1)) == m2(m1(t1))).
When multiple modifiers are applied to a given base tag, each smaller set of modifiers is registered as a parent, so that for example m1(m2(m3(t1))) is a subtype of m1(m2(t1)), m1(m3(t1)), and so on.
-
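The fallback behaviour described for define can be pictured with a small stand-alone model. The makeTag and pickClass helpers below are invented for illustration and are not part of @lezer/highlight; they only mirror the documented idea that a tag's set lists the tag itself followed by its ancestors, in order of decreasing specificity.

```javascript
// Toy stand-in for Tag.define: set = [self, parent, grandparent, ...].
function makeTag(name, parent) {
  const tag = {name, set: []}
  tag.set = [tag, ...(parent ? parent.set : [])]
  return tag
}

// Walk the set and return the first class the theme provides.
function pickClass(tag, theme) {
  for (const candidate of tag.set) {
    const cls = theme.get(candidate)
    if (cls) return cls
  }
  return null
}

const literalTag = makeTag("literal")
const stringTag = makeTag("string", literalTag)
const docStringTag = makeTag("docString", stringTag)

// This theme only knows about the string tag, so docString falls back to it.
const toyTheme = new Map([[stringTag, "tok-string"]])
console.log(pickClass(docStringTag, toyTheme)) // "tok-string"
console.log(pickClass(literalTag, toyTheme))   // null
```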
The default set of highlighting tags.
This collection is heavily biased towards programming languages, and necessarily incomplete. A full ontology of syntactic constructs would fill a stack of books, and be impractical to write themes for. So try to make do with this set. If all else fails, open an issue to propose a new tag, or define a local custom tag for your use case.
Note that it is not obligatory to always attach the most specific tag possible to an element—if your grammar can't easily distinguish a certain type of element (such as a local variable), it is okay to style it as its more general variant (a variable).
For tags that extend some parent tag, the documentation links to the parent.
A comment.
A line comment.
A block comment.
A documentation comment.
Any kind of identifier.
The name of a variable.
A type name.
A tag name (subtag of typeName).
A property or field name.
An attribute name (subtag of propertyName).
The name of a class.
A label name.
A namespace name.
The name of a macro.
A literal value.
A string literal.
A documentation string.
A character literal (subtag of string).
An attribute value (subtag of string).
A number literal.
An integer number literal.
A floating-point number literal.
A boolean literal.
Regular expression literal.
An escape literal, for example a backslash escape in a string.
A color literal.
A URL literal.
A language keyword.
The keyword for the self or this object.
The keyword for null.
A keyword denoting some atomic value.
A keyword that represents a unit.
A modifier keyword.
A keyword that acts as an operator.
A control-flow related keyword.
A keyword that defines something.
A keyword related to defining or interfacing with modules.
An operator.
An operator that dereferences something.
Arithmetic-related operator.
Logical operator.
Bit operator.
Comparison operator.
Operator that updates its operand.
Operator that defines something.
Type-related operator.
Control-flow operator.
Program or markup punctuation.
Punctuation that separates things.
Bracket-style punctuation.
Angle brackets (usually < and > tokens).
Square brackets (usually [ and ] tokens).
Parentheses (usually ( and ) tokens). Subtag of bracket.
Braces (usually { and } tokens). Subtag of bracket.
Content, for example plain text in XML or markup documents.
Content that represents a heading.
A level 1 heading.
A level 2 heading.
A level 3 heading.
A level 4 heading.
A level 5 heading.
A level 6 heading.
A prose content separator (such as a horizontal rule).
Content that represents a list.
Content that represents a quote.
Content that is emphasized.
Content that is styled strong.
Content that is part of a link.
Content that is styled as code or monospace.
Content that has a strike-through style.
Inserted text in a change-tracking format.
Deleted text.
Changed text.
An invalid or unsyntactic element.
Metadata or meta-instruction.
Metadata that applies to the entire document.
Metadata that annotates or adds attributes to a given syntactic element.
Processing instruction or preprocessor directive. Subtag of meta.
Modifier that indicates that a given element is being defined. Expected to be used with the various name tags.
Modifier that indicates that something is constant. Mostly expected to be used with variable names.
Modifier used to indicate that a variable or property name is being called or defined as a function.
Modifier that can be applied to names to indicate that they belong to the language's standard environment.
Modifier that indicates a given name is local to some scope.
A generic variant modifier that can be used to tag language-specific alternative variants of some common tag. It is recommended for themes to define special forms of at least the string and variable name tags, since those come up a lot.
-
styleTags(spec: Object<Tag | readonly Tag[]>) → NodePropSource This function is used to add a set of tags to a language syntax via NodeSet.extend or LRParser.configure.
The argument object maps node selectors to highlighting tags or arrays of tags.
Node selectors may hold one or more (space-separated) node paths. Such a path can be a node name, or multiple node names (or * wildcards) separated by slash characters, as in "Block/Declaration/VariableName". Such a path matches the final node but only if its direct parent nodes are the other nodes mentioned. A * in such a path matches any parent, but only a single level. Wildcards that match multiple parents aren't supported, both for efficiency reasons and because Lezer trees make it rather hard to reason about what they would match.
A path can be ended with /... to indicate that the tag assigned to the node should also apply to all child nodes, even if they match their own style (by default, only the innermost style is used).
When a path ends in !, as in Attribute!, no further matching happens for the node's child nodes, and the entire node gets the given style.
In this notation, node names that contain /, !, *, or ... must be quoted as JSON strings.
For example:
parser.configure({props: [
  styleTags({
    // Style Number and BigNumber nodes
    "Number BigNumber": tags.number,
    // Style Escape nodes whose parent is String
    "String/Escape": tags.escape,
    // Style anything inside Attributes nodes
    "Attributes!": tags.meta,
    // Add a style to all content inside Italic nodes
    "Italic/...": tags.emphasis,
    // Style InvalidString nodes as both `string` and `invalid`
    "InvalidString": [tags.string, tags.invalid],
    // Style the node named "/" as punctuation
    '"/"': tags.punctuation
  })
]})
-
getStyleTags(node: SyntaxNodeRef) → {tags: readonly Tag[], opaque: boolean, inherit: boolean} | null Match a syntax node's highlight rules. If there's a match, return its set of tags, and whether it is opaque (uses a !) or applies to all child nodes (/...).
-
interface Highlighter A highlighter defines a mapping from highlighting tags and language scopes to CSS class names. They are usually defined via tagHighlighter or some wrapper around that, but it is also possible to implement them from scratch.
-
style(tags: readonly Tag[]) → string | null Get the set of classes that should be applied to the given set of highlighting tags, or null if this highlighter doesn't assign a style to the tags.
-
scope?: fn(node: NodeType) → boolean When given, the highlighter will only be applied to trees on whose top node this predicate returns true.
-
-
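To illustrate what implementing a highlighter from scratch could look like, here is a sketch of a plain object with the style and scope members listed above. The makeHighlighter helper and the class names are hypothetical, and the tag and node objects at the bottom are stand-ins that only carry the set and name properties the sketch relies on.

```javascript
// Build an object with the Highlighter shape from a Map of tag -> class.
function makeHighlighter(classes, topNodeName) {
  return {
    style(tags) {
      const found = []
      for (const tag of tags) {
        // tag.set lists the tag itself first, then its parents.
        const match = tag.set.find(candidate => classes.has(candidate))
        if (match) found.push(classes.get(match))
      }
      return found.length ? found.join(" ") : null
    },
    // Only apply to trees whose top node has the given name.
    scope(node) {
      return node.name === topNodeName
    }
  }
}

// Stand-in objects shaped like Tag and NodeType, for demonstration only.
const commentTag = {set: []}
commentTag.set = [commentTag]
const lineCommentTag = {set: []}
lineCommentTag.set = [lineCommentTag, commentTag]
const keywordTag = {set: []}
keywordTag.set = [keywordTag]

const highlighter = makeHighlighter(new Map([[commentTag, "my-comment"]]), "Script")
console.log(highlighter.style([lineCommentTag]))   // "my-comment"
console.log(highlighter.style([keywordTag]))       // null
console.log(highlighter.scope({name: "Script"}))   // true
```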
tagHighlighter() → Highlighter Define a highlighter from an array of tag/class pairs. Classes associated with more specific tags will take precedence.
-
options -
scope?: fn(node: NodeType) → boolean By default, highlighters apply to the entire document. You can scope them to a single language by providing the tree's top node type here.
-
all?: string Add a style to all tokens. Probably only useful in combination with scope.
-
-
-
highlightCode(putBreak: fn(),) Highlight the given tree with the given highlighter, calling putText for every piece of text, either with a set of classes or with the empty string when unstyled, and putBreak for every line break.
-
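The putText and putBreak callbacks lend themselves to building output incrementally. The sketch below collects them into an HTML string; makeHtmlCollector is a hypothetical helper, the argument order of putText (text first, then classes) is an assumption, and the calls at the bottom are simulated by hand instead of coming from highlightCode.

```javascript
// Collects highlightCode-style callbacks into an HTML string.
function makeHtmlCollector() {
  let html = ""
  const escapeHtml = text =>
    text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
  return {
    // Called for each piece of text; classes is "" when unstyled.
    putText(text, classes) {
      const safe = escapeHtml(text)
      html += classes ? `<span class="${classes}">${safe}</span>` : safe
    },
    // Called for each line break.
    putBreak() {
      html += "\n"
    },
    result() {
      return html
    }
  }
}

// Simulated calls, in the order a highlighter might emit them.
const collector = makeHtmlCollector()
collector.putText("let", "tok-keyword")
collector.putText(" x = a < b", "")
collector.putBreak()
console.log(collector.result())
```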
highlightTree() Highlight the given tree with the given highlighter. Often, the higher-level highlightCode function is easier to use.
-
putStyle(from: number, to: number, classes: string) Assign styling to a region of the text. Will be called, in order of position, for any ranges where more than zero classes apply. classes is a space-separated string of CSS classes.
-
from The start of the range to highlight.
-
to The end of the range.
-
-
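Because putStyle is only called for styled ranges, and in order of position, the unstyled gaps have to be filled in by the caller. A minimal sketch of that bookkeeping, with a hypothetical renderWithRanges helper and hand-simulated putStyle calls (HTML escaping is left out to keep it short):

```javascript
// Render text with the given styled ranges wrapped in spans. Ranges are
// assumed to be sorted and non-overlapping, as putStyle delivers them.
function renderWithRanges(code, ranges) {
  let html = "", pos = 0
  for (const {from, to, classes} of ranges) {
    html += code.slice(pos, from) // unstyled gap before the range
    html += `<span class="${classes}">${code.slice(from, to)}</span>`
    pos = to
  }
  return html + code.slice(pos) // unstyled tail
}

const styledRanges = []
// Callback in the shape documented for putStyle.
const putStyle = (from, to, classes) => styledRanges.push({from, to, classes})

// Simulated calls, in order of position.
putStyle(0, 3, "tok-keyword")
putStyle(8, 9, "tok-number")
console.log(renderWithRanges("let x = 1", styledRanges))
// <span class="tok-keyword">let</span> x = <span class="tok-number">1</span>
```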
classHighlighter: Highlighter This is a highlighter that adds stable, predictable classes to tokens, for styling with external CSS.
The following tags are mapped to their name prefixed with "tok-" (for example "tok-comment"):
link, heading, emphasis, strong, keyword, atom, bool, url, labelName, inserted, deleted, literal, string, number, variableName, typeName, namespace, className, macroName, propertyName, operator, comment, meta, punctuation, invalid
In addition, these mappings are provided:
regexp, escape, and special(string) are mapped to "tok-string2"
special(variableName) to "tok-variableName2"
local(variableName) to "tok-variableName tok-local"
definition(variableName) to "tok-variableName tok-definition"
definition(propertyName) to "tok-propertyName tok-definition"
@lezer/generator module
The parser generator is usually run through its command-line interface, but can also be invoked as a JavaScript function.
-
type BuildOptions -
fileName?: string The name of the grammar file.
-
warn?: fn(message: string) A function that should be called with warnings. The default is to call console.warn.
-
includeNames?: boolean Whether to include term names in the output file. Defaults to false.
-
moduleStyle?: string Determines the module system used by the output file. Can be either "cjs" (CommonJS) or "es" (ES2015 module), defaults to "es".
-
typeScript?: boolean Set this to true to output TypeScript code instead of plain JavaScript.
-
exportName?: string The name of the export that holds the parser in the output file. Defaults to "parser".
-
externalTokenizer?: fn(name: string, terms: Object<number>) → ExternalTokenizer When calling buildParser, this can be used to provide placeholders for external tokenizers.
-
externalPropSource?: fn(name: string) → NodePropSource Used by buildParser to resolve external prop sources.
-
externalSpecializer?: fn(name: string, terms: Object<number>) → fn(value: string, stack: Stack) → number Provide placeholders for external specializers when using buildParser.
-
externalProp?: fn(name: string) → NodeProp<any> If given, will be used to initialize external props in the parser returned by buildParser.
-
contextTracker?: ContextTracker<any> | null If given, will be used as context tracker in a parser built with buildParser.
-
-
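As a small illustration of the BuildOptions shape, the object below sets a few of the fields listed above and collects warnings in an array instead of sending them to console.warn. The file name and export name are hypothetical, and the call to the generator is only shown in a comment, since it needs @lezer/generator and a grammar text to run.

```javascript
const warnings = []

// Options object in the BuildOptions shape described above.
const buildOptions = {
  fileName: "example.grammar",  // hypothetical grammar file name
  includeNames: true,
  moduleStyle: "cjs",
  exportName: "exampleParser",  // hypothetical export name
  // Collect warnings instead of letting them go to console.warn.
  warn(message) { warnings.push(message) }
}

// With @lezer/generator available:
//   const {parser, terms} = buildParserFile(grammarText, buildOptions)

buildOptions.warn("simulated warning") // stands in for a generator warning
console.log(warnings.length) // 1
```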
buildParserFile(text: string, options?: BuildOptions = {}) → {parser: string, terms: string} Build the code that represents the parser tables for a given grammar description. The parser property in the return value holds the main file that exports the Parser instance. The terms property holds a declaration file that defines constants for all of the named terms in the grammar, holding their ids as values. This is useful when external code, such as a tokenizer, needs to be able to use these ids. It is recommended to run a tree-shaking bundler when importing this file, since you usually only need a handful of the many terms in your code.
-
buildParser(text: string, options?: BuildOptions = {}) → LRParser Build an in-memory parser instance for a given grammar. This is mostly useful for testing. If your grammar uses external tokenizers, you'll have to provide the externalTokenizer option for the returned parser to be able to parse anything.
-
class GenError extends Error The type of error raised when the parser generator finds an issue.