[WIP] Tree-sitter integration #9219
Implemented a rough prototype of callback-based syntax highlighting (rough as in very rough: multiline highlights are not supported, for instance). Except for the highlighting of user types, there shouldn't be any difference in this screenshot. Preliminary unscientific benchmarks show that this naive implementation is "at least not much slower, probably somewhat faster" than the regex highlighting, but more work needs to be done here. My current
@bfredl it would be nice to be able to run the highlighting engine in 'pass-through' mode like e.g. https://github.com/sharkdp/bat, so that nvim can be used as a 'cat' replacement with syntax highlighting. That would also allow this: https://www.reddit.com/r/vim/comments/9xpb18/file_preview_with_fzf_rg_bat_and_devicons/ without the need for an external program (and with consistent highlighting).
@ilAYAli That sounds orthogonal, like something that we would enable at the API/UI layer rather than anything specific to this feature.
@justinmk yes, it is somewhat orthogonal. However, just invoking the highlighting engine with a vim-compatible colorscheme seems like a natural development step/unit test.
8954067 to bd8b51e
I began work on a proper Lua API for tree-sitter trees and nodes. The simple Also the parsing part is not ported yet, as I'm not sure of the division between C and Lua. Maybe we should just do the parsing and change tracking in C, keeping a parser per
Now it should no longer leak trees. Fixing the build system will be next.
@bfredl One of my constant pains with vim/neovim compared to using an IDE has been auto-indentation and adding closing parens and other things (like closing XML tags). There are plugins for that, but none of them work reliably enough, because to work reliably one should use an AST, and of course no plugin does that. One option would be to use a language server to do all this, but I think vim/neovim should be able to do such basic tasks out of the box, and using a lazy incremental parser is a good fit.
I like the sound of this. If the syntax tree is available thru an API somehow, it should be fairly trivial to create text objects representing AST nodes. e.g. motions Excited about this exploration and what future capabilities it could unlock (like semantic syntax highlighting)!
@bfredl I've started to look at I would like to know what you guys think about that. I don't see any reason for using vim's ad-hoc parsing when we can finally use a fast, proven solution. This should of course be an option turned off by default for a long time, I guess, because a lot of people have custom syntax files (I certainly do). Since this is my biggest vim pain I'm quite motivated to do this work. Finally consistent syntax folding, consistent syntax highlighting, consistent matchpairs (My plan is to maybe move matchpairs functionality in
@Breja that will indeed be possible as a lua plugin
Currently I'm using linewise resolution, but it is of course inefficient, and inconvenient as tree-sitter also needs raw byte indices. I think the code in #5031 to do byte adjustments of marks will be useful; from that we should be able to extract buffer updates at byte resolution. I hope to get back to this soon; this will be my next priority after finishing off the major loose ends of the ext-ui work (together with extmarks, but as mentioned the work is overlapping).
@purpleP my primitive prototype for highlighting is at https://github.com/neovim/neovim/pull/9219/files#diff-729c310c1113b6a293dabed439428069R166; it can only do a plain mapping from node type names to highlight groups. If we want more complex rules than that, which we most likely do (i.e. only highlight node A when it is part of node B, etc.), that could be a good place to work on.
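A minimal sketch of the two kinds of rule mentioned above: a plain mapping from node type names to highlight groups, plus one context-dependent rule ("only highlight node A when it is part of node B"). It is written in Python purely for illustration and is not the prototype's code; the `Node` class, both tables, and the node type names are hypothetical stand-ins, not the tree-sitter API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a syntax node (not the tree-sitter API)."""
    type: str
    start: int  # start byte, inclusive
    end: int    # end byte, exclusive
    children: List["Node"] = field(default_factory=list)


# Plain rules: node type name -> highlight group (illustrative names only).
TYPE_TO_GROUP: Dict[str, str] = {
    "comment": "Comment",
    "primitive_type": "Type",
}

# Context rules: (parent type, node type) -> highlight group.
CONTEXT_TO_GROUP: Dict[Tuple[str, str], str] = {
    ("function_declarator", "identifier"): "Function",
}


def highlights(node: Node, parent: Optional[Node] = None) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, group) in document order; context rules win over plain ones."""
    group = None
    if parent is not None:
        group = CONTEXT_TO_GROUP.get((parent.type, node.type))
    if group is None:
        group = TYPE_TO_GROUP.get(node.type)
    if group is not None:
        yield (node.start, node.end, group)
    for child in node.children:
        yield from highlights(child, node)


# A hand-built toy tree; byte offsets are made up for the example.
tree = Node("translation_unit", 0, 20, [
    Node("declaration", 0, 9, [
        Node("primitive_type", 0, 3),
        Node("function_declarator", 4, 8, [
            Node("identifier", 4, 5),            # highlighted: child of a declarator
            Node("parameter_list", 5, 8, [
                Node("identifier", 6, 7),        # not highlighted: no rule applies
            ]),
        ]),
    ]),
    Node("comment", 12, 20),
])

for span in highlights(tree):
    print(span)
```

Here the same `identifier` type is highlighted only when its parent is a `function_declarator`, which is the simplest form a parent-sensitive rule could take; deeper ancestor matching would need a richer rule format.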
Really excited to see how this plays out. I have been beyond impressed with neovim and have loved learning vim through it. My main pain point with using it has been syntax though. Term colors have gotten so much better but it seems highlighting has always been off in vim themes to me. Once I saw tree-sitter in atom I now notice how off all other colorization patterns are. Hope this keeps up!
@bfredl In case this is helpful, there is now a reference implementation of syntax highlighting within the Tree-sitter repo itself. And we're starting to include syntax highlighting configuration files in the parser repos (e.g. I'm not sure if you can use the library directly, because it's a Rust crate, but if you want, you could write your own code to consume the same highlighting config files. The library is here: And here's an example configuration file for JavaScript. The syntax highlighting is specified using CSS.
@maxbrunsfeld Great to see! The current public API looks a bit limited; what are the plans to handle changing documents? If the viewport of edited text is lines 500-550, would the model allow highlighting to start efficiently at line 500 (by starting from the root, but not descending into subtrees that can be determined to be irrelevant)? An alternative, I guess, is for the consumer to save/restore the state machine at relevant points. Also, what is the code for generating the JSON state machine, is it
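The pruning idea in the question above (start from the root, but skip any subtree that lies entirely outside the viewport) can be sketched as follows. This is an illustration in Python under assumed data structures, not the Rust library's or Atom's implementation; `Node`, `nodes_in_range`, and the synthetic tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical syntax node that records only the rows it spans."""
    type: str
    start_row: int
    end_row: int  # inclusive
    children: List["Node"] = field(default_factory=list)


def nodes_in_range(node: Node, top: int, bottom: int, stats: Dict[str, int]) -> Iterator[Node]:
    """Yield leaf nodes overlapping rows [top, bottom], pruning subtrees outside it."""
    stats["visited"] += 1
    if node.end_row < top or node.start_row > bottom:
        return  # the whole subtree is outside the viewport: do not descend
    if not node.children:
        yield node
        return
    for child in node.children:
        yield from nodes_in_range(child, top, bottom, stats)


# Synthetic document: 100 "functions" of 10 rows each, one statement per row.
root = Node("file", 0, 999, [
    Node("function", i * 10, i * 10 + 9,
         [Node("statement", r, r) for r in range(i * 10, i * 10 + 10)])
    for i in range(100)
])

stats = {"visited": 0}
leaves = list(nodes_in_range(root, 500, 550, stats))
print(len(leaves), "leaves in the viewport;", stats["visited"], "of 1101 nodes visited")
```

This sketch still scans every child of a node it descends into; since siblings are ordered by position, a binary search over the children would avoid even that, at the cost of slightly more code.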
Yeah, it is very limited currently. The Rust implementation was developed for an internal project where we deal with static documents only. But I would like to generalize the API to allow for highlighting specific regions, and consuming existing (edited) syntax trees.
Yes, exactly. That's exactly what we do in Atom's syntax highlighting. It would be pretty easy to add some more flexible lower level APIs to the Rust library as well, I just haven't had a reason to add them yet. I may do it myself if I have time at some point. I'd also accept a PR for it if anyone else is interested in diving in.
Yeah, we did something more flexible when we implemented the Tree-sitter syntax highlighting for Atom, because we wanted to produce highlighting that was compatible with existing TextMate themes. Basically, we allowed the For my current application, it's especially helpful to have a fixed list of allowed scopes, because they all need to be styled by one theme. In order to allow for more fine-grained styling, (like coloring of triple-quoted strings in python), I really like CodeMirror's approach of having very general constructs like Anyway, that's my current thinking. It's definitely subject to change. I'd love for this construct to be useful for things like NeoVim. The current list of scopes is especially a WIP.
Yeah, sorry that this isn't documented yet. It is indeed
And when tree-sitter/tree-sitter#271 (field names for syntax nodes) lands, you'll also be able to select nodes based on their field name using class syntax:
@maxbrunsfeld Thanks for all the info. I'm not sure whether we would use the specific Rust implementation (so far we essentially only have C/C++ and Lua deps for runtime), but supporting the same model/file format would definitely make sense.
I mostly agree, but when used correctly, these detailed names bring flexibility to the user. The user can change the links from specific groups to general groups, without modifying either the syntax file or the color scheme file.
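A simplified model of that linking scheme, in Python for illustration only: the colorscheme styles a few general groups, the syntax definition links specific groups to general ones, and the user can repoint a single link without touching either. The group names, colors, and the `resolve` helper are assumptions for the example; real highlight-link resolution in vim has more rules than this.

```python
from typing import Dict, Optional

# Colorscheme: styles are defined only for general groups (illustrative values).
styles: Dict[str, str] = {"Constant": "cyan", "Comment": "grey"}

# Syntax definition: specific groups link to more general ones.
# "pythonTripleQuotedString" is a hypothetical fine-grained group name.
links: Dict[str, str] = {
    "String": "Constant",
    "pythonTripleQuotedString": "String",
}


def resolve(group: str, links: Dict[str, str], styles: Dict[str, str]) -> Optional[str]:
    """Follow links until a styled group is found; None if unstyled or cyclic."""
    seen = set()
    while group not in styles:
        if group in seen or group not in links:
            return None
        seen.add(group)
        group = links[group]
    return styles[group]


# Default: the detailed group inherits the general string color.
print(resolve("pythonTripleQuotedString", links, styles))

# User override: repoint one link; syntax definition and colorscheme stay untouched.
user_links = {**links, "pythonTripleQuotedString": "Comment"}
print(resolve("pythonTripleQuotedString", user_links, styles))
```

The detailed name costs nothing when the user ignores it, since it falls through to the general group, and gives a hook for per-construct styling when they want one.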
I think vim already handles this quite well, there is a set of ~30 canonical groups (as defined by
Nice, thanks!
Update: now the build with bundled deps will download and include Also the GC code stopped working suddenly, which is weird as I didn't change the code..
+1 for inlining it if we have a way of comparing upstream changes. Ideally upstream provides the amalgamation file, else a script does it. If treesitter's API (not to mention ABI) will change frequently, it is not worth our time to dance around systems with old versions of the treesitter object files. OTOH, we can always make that choice later.
super excited for this. awesome work.
#set(TREESITTER_URL https://github.com/tree-sitter/tree-sitter/archive/f30485f.tar.gz)
#set(TREESITTER_SHA256 cfe00c0b6f423adb082d6a5747d62eef96aab3d4aa9bd3f694524f0639ab272b)
might want to remove that
#10124 is the PR that will be merged first. I will clean-up/rebase this one after that (or probably open a new one, as github interface breaks with long issues/PRs).
NB: more a tracking issue than any code that one would actually want to merge/use for the moment. Though it is possible to run a very hacky prototype by cloning tree-sitter/tree-sitter and tree-sitter/tree-sitter-c and JuliaStrings/utf8proc and placing them next to the neovim/ repo root (no build step for these), and building from this branch. The :TSTest command starts a .c parser that tracks edits and displays the node at the cursor (<Plug>(ts-expand) to traverse upwards).
The current demo uses luajit ffi for prototyping. We probably want to change to C API bindings as (1) luajit is not available everywhere nvim is (2) we would need to write wrappers anyway for typesafety/gc, could just as well do them in C. (also luajit ffi has this 80% done feeling to it, with arbitrary limitations to conversion and signatures, and unhelpful error messages when you reach the limits).
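The node-at-cursor lookup and upward traversal that the :TSTest demo performs can be sketched roughly as below. This is a Python illustration over a hand-built tree, not the branch's luajit ffi code; `Node`, `node_at`, and `expand` are hypothetical names, and the byte offsets correspond to the C fragment `return x + 1;`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical syntax node with a parent pointer (not the tree-sitter API)."""
    type: str
    start: int  # start byte, inclusive
    end: int    # end byte, exclusive
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def node_at(root: Node, byte: int) -> Optional[Node]:
    """Return the smallest node whose byte range contains `byte`, or None."""
    if not (root.start <= byte < root.end):
        return None
    node = root
    while True:
        for child in node.children:
            if child.start <= byte < child.end:
                node = child
                break
        else:
            return node  # no child contains the byte: this is the innermost node


def expand(node: Node) -> Node:
    """Move one level up, staying on the root when there is no parent."""
    return node.parent if node.parent is not None else node


# Toy tree for the 13-byte source "return x + 1;".
root = Node("translation_unit", 0, 13, [
    Node("return_statement", 0, 13, [
        Node("binary_expression", 7, 12, [
            Node("identifier", 7, 8),
            Node("number_literal", 11, 12),
        ]),
    ]),
])

cursor = node_at(root, 7)  # cursor on "x"
while True:
    print(cursor.type, cursor.start, cursor.end)
    if cursor.parent is None:
        break
    cursor = expand(cursor)
```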
The goal is to make the syntax nodes accessible as Lua objects, but other than that the design is quite open. We could add all the needed extension points (such as #9170) to core and have tree-sitter functionality essentially be an included Lua plugin. Alternatively, all buffer/parser management logic could be written in C, with only the end result exposed to Lua.
win_update? benchmarking needed)
Ref #1767 (comment) and below.