bfredl
Member

@bfredl bfredl commented Nov 8, 2018

NB: this is more a tracking issue than code anyone would actually want to merge or use for the moment. That said, it is possible to run a very hacky prototype by cloning tree-sitter/tree-sitter, tree-sitter/tree-sitter-c and JuliaStrings/utf8proc, placing them next to the neovim/ repo root (no build step is needed for these), and building from this branch. The :TSTest command starts a .c parser that tracks edits and displays the node at the cursor (use <Plug>(ts-expand) to traverse upwards).

The current demo uses luajit ffi for prototyping. We probably want to change to C API bindings, as (1) luajit is not available everywhere nvim is, and (2) we would need to write wrappers anyway for type safety/GC, so we could just as well write them in C. (Also, luajit ffi has an 80%-done feeling to it, with arbitrary limitations on conversions and signatures, and unhelpful error messages when you reach those limits.)

The goal is to make the syntax nodes accessible as lua objects, but other than that the design is quite open. We could add all the needed extension points (such as #9170) to core and have tree-sitter functionality essentially be an included lua plugin. Alternatively, all buffer/parser management logic could be written in C, with only the end result exposed to lua.

  • sane build system integration (bundled dependency? TS's own build system requires python2.7, but one can write a cmake substitute for libruntime only)
  • story for out-of-tree language support, build to .so files? Or just bundle everything?
  • safe lua bindings (including GC)
  • logic for maintaining a parser+current tree per buffer for supported filetypes
  • support for nested languages (python in vimL, etc)
  • syntax highlighting (by lua callback in win_update? benchmarking needed)

Ref #1767 (comment) and below.

@bfredl
Member Author

bfredl commented Nov 10, 2018

[Image: screenshot from 2018-11-10 16-42-15]

Implemented a rough prototype of callback-based syntax highlighting (rough as in very rough: multiline hl:s are not supported, for instance). Except for the highlight for user types, there shouldn't be any difference in this screenshot. Preliminary unscientific benchmarks show that this naive implementation is "at least not much slower, probably somewhat faster" than the regex highlighting, but more work needs to be done here. My current perf+flamegraph.pl setup doesn't play nicely with luajit, so I need to find something better there.

@ilAYAli
Contributor

ilAYAli commented Nov 22, 2018

@bfredl it would be nice to be able to run the highlighting engine in 'pass-through' mode like e.g. https://github.com/sharkdp/bat, so that nvim can be used as a 'cat' replacement with syntax highlighting. That would also allow this: https://www.reddit.com/r/vim/comments/9xpb18/file_preview_with_fzf_rg_bat_and_devicons/ without the need for an external program (and with consistent highlighting).

@justinmk
Member

@ilAYAli That sounds orthogonal, like something that we would enable at the API/UI layer rather than anything specific to this feature.

@ilAYAli
Contributor

ilAYAli commented Nov 22, 2018

@justinmk yes, it is somewhat orthogonal. However, just invoking the highlighting engine with a vim-compatible colorscheme seems like a natural development step/unit test.
It would be a very nice feature, but I will not clutter this thread further with unrelated feature requests ;)

@bfredl bfredl mentioned this pull request Nov 26, 2018
31 tasks
@bfredl bfredl force-pushed the tree-sitter branch 2 times, most recently from 8954067 to bd8b51e Compare December 7, 2018 19:11
@bfredl
Member Author

bfredl commented Dec 7, 2018

I began work on a proper lua API for tree-sitter trees and nodes. The simple TSCursor and TSSyntax test cases now use it. However, GC is not implemented yet, so it will continuously leak trees (though it shouldn't be hard to fix).

Also the parsing part is not ported yet, as I'm not sure of the division between C and lua. Maybe we should just do the parsing and change tracking in C, keeping a parser per buf_T. Though we might want to keep lua involved somehow for multi-language buffers.

@bfredl
Member Author

bfredl commented Dec 8, 2018

Now it should no longer leak trees. Fixing build system will be next.

@purpleP

purpleP commented Dec 8, 2018

@bfredl One of my constant pains with vim/neovim compared to using an IDE has been auto-indentation and adding closing parens and other things (like closing XML tags). There are plugins for that, but none of them work reliably enough, because to work reliably one should use an AST, and of course no plugin does that.

One option would be to use a language server to do all this, but I think vim/neovim should be able to do such basic tasks out of the box, and using a lazy incremental parser is a good fit.

@ghost

ghost commented Jan 8, 2019

The goal is to make the syntax nodes accessible as lua objects, but other than that the design is quite open.

I like the sound of this. If the syntax tree is available thru an API somehow, it should be fairly trivial to create text objects representing AST nodes.

e.g. motions cif or dif could operate on a function body consistently across all language syntax styles. Even in Javascript there are multiple ways to define a function which make it difficult to regex for.

Excited about this exploration and what future capabilities it could unlock (like semantic syntax highlighting)!

@purpleP

purpleP commented Jan 8, 2019

@bfredl I've started to look at highlight.c and syntax.c. I don't know what anyone else thinks about this, but IMHO for supported languages the right thing to do is to switch to tree-sitter instead of the existing ad-hoc vim solution.
I can't guarantee that I will have time to implement this, but if someone tells me exactly where to look, this seems to be a fairly trivial task.
Judging by vim's ability to highlight file as you type it already tries to parse buffer when it's changing. So the only thing that is missing is maybe the way to get range of changed characters which shouldn't be hard to implement.

I would like to know what you guys think about that. I don't see any reason for using vim's ad-hoc parsing when we can finally use a fast, proven solution. This should of course be an option that is turned off by default for a long time, I guess, because a lot of people have custom syntax files (I certainly do).

Since this is my biggest vim pain, I'm quite motivated to do this work. Finally: consistent syntax folding, consistent syntax highlighting, consistent matchpairs (my plan is to maybe move the matchpairs functionality into .c code somewhere, depending on what's more logical to do; it is in an internal .vim plugin right now, isn't it?) and auto-closing of parentheses, brackets and tags out of the box.

@bfredl
Member Author

bfredl commented Jan 8, 2019

@Breja that will indeed be possible as a lua plugin

@purpleP

So the only thing that is missing is maybe the way to get range of changed characters which shouldn't be hard to implement.

Currently I'm using linewise resolution, but it is of course inefficient, and inconvenient as tree-sitter also needs raw byte indices. I think the code in #5031 to do byte adjustments of marks will be useful; from that we should be able to extract buffer updates at byte resolution.

I hope to get back to this soon; this will be my next priority after finishing off the major loose ends of the ext-ui work (together with extmarks, but as mentioned the work is overlapping).
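
For readers unfamiliar with the byte bookkeeping involved, here is a minimal sketch of turning a linewise change into the three byte values that tree-sitter's TSInputEdit asks for (start_byte, old_end_byte, new_end_byte). It is not taken from the prototype: the function names, the list-of-lines buffer model, and the assumption of UTF-8 text with one newline byte per line are all illustrative.

```python
def byte_offset(lines, row, col_bytes=0):
    """Byte offset of (row, col_bytes) in a buffer stored as a list of lines.

    Assumption: UTF-8 text, every line terminated by a single newline byte.
    """
    # Encoded length of every preceding line, plus one newline byte each.
    preceding = sum(len(line.encode("utf-8")) + 1 for line in lines[:row])
    return preceding + col_bytes


def linewise_edit(old_lines, first, old_last, replacement):
    """Describe replacing old_lines[first:old_last] with `replacement`.

    Returns (start_byte, old_end_byte, new_end_byte). TSInputEdit also wants
    row/column points, which are omitted from this sketch.
    """
    start = byte_offset(old_lines, first)
    old_end = byte_offset(old_lines, old_last)
    new_len = sum(len(line.encode("utf-8")) + 1 for line in replacement)
    return start, old_end, start + new_len
```

With linewise resolution the edit always covers whole lines, which is why it over-reports changes compared to tracking the exact bytes touched.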

@bfredl
Member Author

bfredl commented Jan 8, 2019

@purpleP my primitive prototype for highlighting is at https://github.com/neovim/neovim/pull/9219/files#diff-729c310c1113b6a293dabed439428069R166. It can only do a plain mapping from node type names to highlight groups. If we want more complex rules than that, which we most likely do (i.e. only hl node A when it is part of node B etc.), that could be a good place to work on.
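
To make the difference between a plain mapping and a contextual rule concrete, here is a small sketch. It does not reproduce the prototype linked above: the SyntaxNode class, the two tables and the chosen node type names are illustrative stand-ins.

```python
class SyntaxNode:
    """Tiny stand-in for a syntax tree node (not the real tree-sitter API)."""

    def __init__(self, type, parent=None):
        self.type = type
        self.parent = parent


# Plain mapping: node type name -> highlight group.
PLAIN = {
    "string_literal": "String",
    "comment": "Comment",
    "number_literal": "Number",
}

# Contextual rules: (node type, required ancestor type) -> highlight group.
CONTEXTUAL = {
    ("identifier", "function_declarator"): "Function",
}


def highlight_group(node):
    """Return the highlight group for `node`, or None if nothing applies."""
    # Contextual rules win: walk up the ancestors looking for a match.
    ancestor = node.parent
    while ancestor is not None:
        group = CONTEXTUAL.get((node.type, ancestor.type))
        if group is not None:
            return group
        ancestor = ancestor.parent
    # Otherwise fall back to the plain type-name mapping.
    return PLAIN.get(node.type)
```

Here an identifier is only highlighted when it sits somewhere below a function_declarator, while string literals are highlighted wherever they appear.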

@skbolton

Really excited to see how this plays out. I have been beyond impressed with neovim and have loved learning vim through it. My main pain point with using it has been syntax, though. Term colors have gotten so much better, but it seems to me that highlighting has always been off in vim themes. Once I saw tree-sitter in atom, I started noticing how off all other colorization patterns are. Hope this keeps up!

@maxbrunsfeld

@bfredl In case this is helpful, there is now a reference implementation of syntax highlighting within the Tree-sitter repo itself. And we're starting to include syntax highlighting configuration files in the parser repos (e.g. tree-sitter-ruby) directly.

I'm not sure if you can use the library directly, because it's a Rust crate, but if you want, you could write your own code to consume the same highlighting config files. The library is here:
https://github.com/tree-sitter/tree-sitter/tree/master/highlight

And here's an example configuration file for JavaScript. The syntax highlighting is specified using CSS.
This CSS file is compiled into a JSON state machine, which the highlighting library consumes, and allows the properties to be matched in constant time while walking the tree.
https://github.com/tree-sitter/tree-sitter-javascript/blob/master/properties/highlights.css
https://github.com/tree-sitter/tree-sitter-javascript/blob/master/src/highlights.json

@bfredl
Member Author

bfredl commented Feb 23, 2019

@maxbrunsfeld Great to see! The current public API looks a bit limited, what are the plans to handle changing documents? If the viewport of edited text is line 500-550, would the model allow to efficiently start highlighting at line 500 (by starting from the root, but don't descend into subtrees that can be determined to be irrelevant)? An alternative, I guess, is for the consumer to save/restore the state machine at relevant points.

Also highlight.rs hardcodes a list of ~30 predefined scopes, this seems a bit inflexible. Traditionally, vim syntax files define quite fine grained highlight groups, but with fallback links. So the color scheme is allowed to define PythonTripleQuotes, but if it doesn't, String will be used.

What is the code for generating the JSON state machine, is it cli/src/properties.rs? What subset of css selectors are implemented? I guess docs are WIP, but just a list or something of what is supported would be nice.

@maxbrunsfeld

maxbrunsfeld commented Feb 25, 2019

The current public API looks a bit limited, what are the plans to handle changing documents?

Yeah, it is very limited currently. The Rust implementation was developed for an internal project where we deal with static documents only. But I would like to generalize the API to allow for highlighting specific regions, and consuming existing (edited) syntax trees.

If the viewport of edited text is line 500-550, would the model allow to efficiently start highlighting at line 500 (by starting from the root, but don't descend into subtrees that can be determined to be irrelevant)?

Yes, exactly. That's exactly what we do in Atom's syntax highlighting. It would be pretty easy to add some more flexible lower level APIs to the Rust library as well, I just haven't had a reason to add them yet. I may do it myself if I have time at some point. I'd also accept a PR for it if anyone else is interested in diving in.
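
The pruning idea discussed here (start at the root, skip subtrees that cannot overlap the viewport) can be sketched as follows. This is only an illustration under assumed names: RangeNode and nodes_in_rows are hypothetical, and neither Atom's implementation nor the Rust library is reproduced.

```python
class RangeNode:
    """Stand-in for a syntax node that knows which rows it spans."""

    def __init__(self, type, start_row, end_row, children=()):
        self.type = type
        self.start_row = start_row
        self.end_row = end_row
        self.children = list(children)  # assumed sorted by position


def nodes_in_rows(node, top, bottom):
    """Yield nodes overlapping rows [top, bottom], skipping other subtrees."""
    if node.end_row < top or node.start_row > bottom:
        return  # the whole subtree is outside the viewport
    yield node
    for child in node.children:
        if child.start_row > bottom:
            break  # children are ordered, so later siblings are outside too
        yield from nodes_in_rows(child, top, bottom)
```

For a viewport of rows 500-550, a function spanning rows 0-499 is rejected at its top node, so none of its descendants are ever visited.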

Also highlight.rs hardcodes a list of ~30 predefined scopes, this seems a bit inflexible. Traditionally, vim syntax files define quite fine grained highlight groups, but with fallback links.

Yeah, we did something more flexible when we implemented the Tree-sitter syntax highlighting for Atom, because we wanted to produce highlighting that was compatible with existing TextMate themes. Basically, we allowed the scope values to be arbitrary TextMate scope strings. But dealing with Atom's themes has been a bit frustrating for me. I've come to think it would be better if there were a simple, flat list of available scopes, and themes simply mapped each scope to a color. I really dislike that so many Vim themes and Atom themes are tightly coupled to specific syntax packages (applying styles to language-specific constructs like PythonTripleQuotes).

For my current application, it's especially helpful to have a fixed list of allowed scopes, because they all need to be styled by one theme. In order to allow for more fine-grained styling, (like coloring of triple-quoted strings in python), I really like CodeMirror's approach of having very general constructs like variable, variable-2, string, and string-2.

Anyway, that's my current thinking. It's definitely subject to change. I'd love for this construct to be useful for things like NeoVim. The current list of scopes is especially a WIP.

What is the code for generating the JSON state machine, is it cli/src/properties.rs? What subset of css selectors are implemented?

Yeah, sorry that this isn't documented yet. It is indeed properties.rs. The supported selectors are:

  • descendant selectors (foo bar)
  • immediate child selectors (foo > bar)
  • the SCSS & operator for nesting selectors (& > bar within a block)
  • the token attribute for matching anonymous nodes ([token="{"])
  • the text attribute for matching the text of a leaf node with a regex ([text='^[A-Z]'])
  • the :nth-child pseudo selector (identifier:nth-child(2))

And when tree-sitter/tree-sitter#271 (field names for syntax nodes) lands, you'll also be able to select nodes based on their field name using class syntax: function > .name.
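
As a rough illustration of what the first two selector kinds mean, here is a naive matcher for descendant and immediate child selectors over a node's path of type names from the root. It is explicitly not how properties.rs works (that compiles the rules into a state machine for constant-time matching); parse_selector and selector_matches are hypothetical helpers, and attribute selectors, & and :nth-child are left out.

```python
def parse_selector(selector):
    """Turn 'foo > bar baz' into [(' ', 'foo'), ('>', 'bar'), (' ', 'baz')].

    Each step carries the combinator that links it to the previous step;
    the combinator stored on the first step is unused.
    """
    steps, pending = [], " "
    for token in selector.split():
        if token == ">":
            pending = ">"
        else:
            steps.append((pending, token))
            pending = " "
    return steps


def selector_matches(steps, path):
    """Check whether the last node in `path` (root first) matches `steps`."""

    def go(step_index, path_index):
        combinator, node_type = steps[step_index]
        if path[path_index] != node_type:
            return False
        if step_index == 0:
            return True  # every step has been matched
        if path_index == 0:
            return False  # ran out of ancestors
        if combinator == ">":
            return go(step_index - 1, path_index - 1)  # direct parent only
        # Descendant combinator: try every ancestor, nearest first.
        return any(go(step_index - 1, j) for j in range(path_index - 1, -1, -1))

    return bool(steps) and bool(path) and go(len(steps) - 1, len(path) - 1)
```

The descendant case tries every ancestor rather than only the nearest one, because a nearer match can fail a following child combinator that a farther one satisfies.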

@bfredl
Member Author

bfredl commented Feb 25, 2019

@maxbrunsfeld Thanks for all the info. I'm not sure whether we would use the specific rust implementation (so far we essentially only have C/C++ and lua deps for runtime), but supporting the same model/file format would definitely make sense.

I really dislike that so many Vim themes and Atom themes are tightly coupled to specific syntax packages (applying styles to language-specific constructs like PythonTripleQuotes).

I mostly agree, but when used correctly these detailed names bring flexibility to the user. The user can change the links from specific groups to general groups without modifying either the syntax file or the color scheme file.

it's especially helpful to have a fixed list of allowed scopes, because they all need to be styled by one theme.

I think vim already handles this quite well, there is a set of ~30 canonical groups (as defined by runtime/syntax/syncolor.vim), and if a syntax definition uses anything else, it is supposed to define default links, so a standard color scheme is functional.
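
The fallback behaviour described here can be sketched in a few lines. This is a simplified model, not vim's actual implementation: resolve_style and both dictionaries are illustrative, with the color scheme modelled as group name to style and the syntax file's default links as group name to target group.

```python
def resolve_style(group, colorscheme, default_links):
    """Follow default links from `group` until the color scheme defines one.

    Returns the style, or None if the chain ends (or loops) without a hit.
    """
    seen = set()
    while group not in colorscheme:
        if group in seen or group not in default_links:
            return None  # broken or cyclic chain: nothing to apply
        seen.add(group)
        group = default_links[group]
    return colorscheme[group]
```

A scheme that only defines String still styles PythonTripleQuotes through the link, while a scheme that defines PythonTripleQuotes directly takes precedence over it.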

The supported selectors are: ...

Nice, thanks!

@bfredl
Member Author

bfredl commented Mar 24, 2019

Update: now the build with bundled deps will download and include tree-sitter automatically. It seems a bit messy though, maybe some of the build system people have a better idea to integrate it, also for non-bundled builds. I guess it depends on what story we want for distro builds. Currently I'm using the amalgamation-like strategy of adding tree-sitter/lib/src/lib.c as a single source file, which expects utf8proc.c to also be in the include path.

Also, the GC code suddenly stopped working, which is weird as I didn't change the code.

@justinmk
Member

it depends on what story we want for distro builds. Currently I'm using the amalgamation-like strategy of adding tree-sitter/lib/src/lib.c as a single source file

+1 for inlining it if we have a way of comparing upstream changes. Ideally upstream provides the amalgamation file, else a script does it.

If treesitter's API (not to mention ABI) will change frequently, it is not worth our time to dance around systems with old versions of the treesitter object files. OTOH, we can always make that choice later.

@asilvadesigns

super excited for this. awesome work.



#set(TREESITTER_URL https://github.com/tree-sitter/tree-sitter/archive/f30485f.tar.gz)
#set(TREESITTER_SHA256 cfe00c0b6f423adb082d6a5747d62eef96aab3d4aa9bd3f694524f0639ab272b)
Member

might want to remove that

Member Author

#10124 is the PR that will be merged first. I will clean-up/rebase this one after that (or probably open a new one, as github interface breaks with long issues/PRs).

@bfredl
Member Author

bfredl commented Sep 28, 2019

Initial support was merged in #10124. Work and discussion will continue in #11113. I will port over the existing syntax demo, but using queries instead.

@bfredl bfredl closed this Sep 28, 2019
@neovim neovim locked as resolved and limited conversation to collaborators Sep 28, 2019