@ychin ychin commented Mar 13, 2025

Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary.
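For illustration, the single-range behavior described above can be approximated by trimming the common prefix and suffix of two lines and highlighting everything in between. This is a minimal Python sketch, not Vim's actual C code: the name `simple_changed_range` is hypothetical, and it ignores options such as icase and iwhite.

```python
def simple_changed_range(old, new):
    """Return half-open (start, end) ranges for old and new that cover
    every difference between the two lines with one span per line."""
    limit = min(len(old), len(new))

    # Length of the common prefix.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1

    # Length of the common suffix, never overlapping the prefix.
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1

    return (prefix, len(old) - suffix), (prefix, len(new) - suffix)


old = "int foo(int a, int b)"
new = "int bar(int a, long b)"
(old_start, old_end), (new_start, new_end) = simple_changed_range(old, new)
print(repr(old[old_start:old_end]))  # 'foo(int a, int'
print(repr(new[new_start:new_end]))  # 'bar(int a, long'
```

Only "foo" and "int" actually changed, but the unchanged "(int a, " in the middle ends up inside the highlighted span as well, which is the over-highlighting this PR addresses.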

Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts.

This change introduces a new diffopt option "inline:<type>". Setting it to "none" disables all inline highlighting, "simple" (the default) uses the old behavior, and "char" / "word" perform a character-wise or word-wise diff of the text within each diff block and highlight only the differences.

The new char/word inline diff only uses the internal xdiff, and respects diff options such as algorithm choice, icase, and the various iwhite options. indent-heuristics is always on to perform better sliding.

For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise.

For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly.

The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be highlighted correctly for the specific newline characters.
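To illustrate why diffing per block rather than per line matters, here is a small Python sketch. It is an assumption-laden stand-in: it uses Python's `difflib` instead of xdiff, and the helper name `block_char_diff` is hypothetical. Each block's lines are joined with newlines so that a newline is just another character in the diff.

```python
import difflib


def block_char_diff(old_lines, new_lines):
    """Character-diff two diff blocks as whole texts (newlines included).

    Returns a list of (tag, old_text, new_text) for the non-equal parts.
    """
    old = "".join(line + "\n" for line in old_lines)
    new = "".join(line + "\n" for line in new_lines)
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# One line on the left is broken into two lines on the right.
for change in block_char_diff(["foo(a, b)"], ["foo(a,", "    b)"]):
    print(change)
```

With a per-line comparison the second line on the right has nothing to be compared against; with a per-block comparison the only difference reported is the inserted newline and indentation.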

Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified.

This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future.

As a side change, this also removes the bounds checking introduced to diff_read(), as it was added to mask existing logic bugs that were properly fixed in #16768.


Notes

Marking this as draft/WIP as I am still fixing up a couple of minor issues, and need to write a fair amount of tests to cover this feature. That said, the feature should be fully usable now (other than diff_hlID() not working with this yet), and I would love it if people could give it a try and provide feedback. (Update: This PR is no longer draft)

Some extra notes below for those interested:

Some examples of what it looks like:

  • inline:simple (old behavior)
    • image
  • inline:char
    • image
  • inline:char,algorithm:patience,icase
    • image
  • inline:word
    • image

More complicated multi-line change (the diff comes from this PR itself):

image

Multi-buffer highlight:

image

Color schemes can define DiffTextAdd to highlight added texts differently from changed texts:

image

Name

Every diff program seems to call this feature something different. I mostly settled on "inline highlight" but am open to other suggestions.

Character diff

As mentioned above, I added a refinement post-process step to clean up the diffs. Just to show the problem and the before/after (first image is a naive implementation, and second is the one with a post-process step to merge small gaps):

image image

From surveying other popular diff programs, the ones that use character diff for inline highlighting (e.g. Meld, VSCode) all have to do similar work, via pre/post-processing to massage the diff results a little bit. VSCode for example does quite a bit of post-processing (merging small gaps, extending small diffs to word boundaries, etc), and I tried to just keep this minimal and go for the largest bang for the buck to make the result look decent without over-complicating the implementation (implemented in diff_refine_inline_char_highlight()). This step is heuristics based and could be changed in the future.
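The gap-merging idea can be sketched in a few lines of Python. This is only an illustration of the heuristic as described in this PR, not a port of diff_refine_inline_char_highlight(): the name `merge_small_gaps` is hypothetical, it works on one buffer's ranges only, and the threshold of 3 is taken from the upper end of the "1-3 characters" mentioned in the description.

```python
def merge_small_gaps(ranges, max_gap=3):
    """Merge sorted, non-overlapping half-open (start, end) ranges
    whenever the unchanged gap between two neighbors is <= max_gap."""
    merged = []
    for start, end in ranges:
        if merged and start - merged[-1][1] <= max_gap:
            # Gap is small: extend the previous range over it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Four small changed chunks from a naive character diff.
changed = [(0, 2), (3, 5), (10, 12), (14, 15)]
print(merge_small_gaps(changed))  # [(0, 5), (10, 15)]
```

The merged result highlights a few unchanged characters (the gaps at 2-3 and 12-14), trading precision for fewer, larger highlighted regions.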

Sliders

Another issue is sliders, where you could slide a diff left or right due to repeating texts. To fix this, I'm always forcing on indent-heuristics to piggyback on it for free white-space sliding alignment. FWIW I think indent-heuristics should be in the default diffopt setting value (since Git has had it on by default for a long time) but that's another discussion. Without / with indent-heuristics:

Without indent-heuristics (you can see how they don't align at whitespace boundaries):
image
With indent-heuristics forced on:
image

Note that this is not perfect. Since we are hijacking a feature designed for line diffing (where xdiff uses the white space and indentation of lines as a heuristic) rather than per-character diff, this doesn't help solve sliders at symbols (ideally the highlight should border the "+" sign):

image

If we want to fix this we should probably modify xdiff itself to add a new heuristics mode that's aware that it's doing character diffing and has heuristics scoring designed for intraline diff (e.g. VSCode has a scoring system that ranks whitespace, symbols, keywords, etc differently to slide the diff to the best location).

Word diff

This basically just splits the diff block along word boundaries instead of characters. For the most part it works fine. As mentioned, I'm defining a word based on the first buffer's iskeyword setting to have a consistent definition. Some other programs that use word diff (e.g. diffchar.vim or Git's own word-diff feature) allow you to define words in different ways or via a regex pattern. For a first pass I would rather keep it simple and provide a single inline:word option for now. We could always add a new inline:pattern etc setting later as it's extensible.
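As a rough illustration of word-wise diffing, the Python sketch below tokenizes text with a single keyword-character predicate (standing in for the first buffer's iskeyword) and diffs the token lists. Several things here are assumptions rather than facts about the PR: the helper names `split_words` and `word_diff` are hypothetical, the default predicate (alphanumerics plus underscore) is only an example, treating each non-keyword character as its own token is a guess at reasonable behavior, and `difflib` is used in place of xdiff.

```python
import difflib
from itertools import groupby


def split_words(text, is_keyword_char=lambda c: c.isalnum() or c == "_"):
    """Split text into tokens: runs of keyword characters stay together,
    every other character becomes its own token."""
    tokens = []
    for is_kw, group in groupby(text, key=is_keyword_char):
        chars = list(group)
        if is_kw:
            tokens.append("".join(chars))
        else:
            tokens.extend(chars)
    return tokens


def word_diff(old, new):
    """Return (tag, old_text, new_text) for each non-equal token run."""
    a, b = split_words(old), split_words(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, "".join(a[i1:i2]), "".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(split_words("foo_bar += 10"))
# ['foo_bar', ' ', '+', '=', ' ', '10']
print(word_diff("let count = 10", "let total = 10"))
# [('replace', 'count', 'total')]
```

Using one predicate for both sides is the point of the design choice above: if each buffer tokenized with its own iskeyword, the same text could split into different tokens and never match.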

Newline

As noted above, newlines can be diffed as well, but it will only show up when list option is set (note how the first newline on the right is highlighted):

image

I was debating back and forth how to show this information. VSCode for example highlights the entire rest of the line in the changed color, which I find to be too much. Meld essentially automatically shows the equivalent of listchar only where a change happens. I think this could be an interesting future extension, something along the lines of set diffopt+=auto-list where we automatically show list only when there is a diff highlight.

There's also an issue here where more than half of the color schemes (including bundled ones) I tried tend to not work well when list is on during diffing. I am still not sure whether this is because of how Vim combines the highlight attributes or it's the color schemes' fault. This is usually particularly problematic if the color scheme uses gui=inverse for NonText group. That's why most of the screenshots I took were using the third-party "iceberg" color scheme as it works the best in this situation.

linematch

Unfortunately this PR doesn't work perfectly with linematch as mentioned above. Here's a showcase of the issue:

linematch off:
image
linematch on:
image

Note how in the linematch example, the code splits the diff block into two, and therefore inline highlighting fails to recognize that the first part of the text ("another line of text…") corresponds to the other buffer, and it's marked as changed instead. I have some ideas how to fix it but for scope reasons decided not to tackle it for now. (This feature is more complicated when 3-way diff is involved, and the "correct" fix is not obvious and may require some additional improvements to multi-buffer diff to work well IMO)

Anticipated FAQs

Why not just take Git's word-diff code when we use xdiff already?

Word diff is an application layer feature in Git, not part of the xdiff library. It does something similar to this PR but is specific to Git. It isn't something we can just take.

Why not a plugin?

There's an existing plugin (diffchar.vim) that already does something similar. It does interesting things but it's limited by Vim's API. For example, it does not do live update when typing, does not handle multi-line blocks, and does not support multi-buffer diff'ing (this PR handles it fine as it uses Vim's builtin diff block merging in diff_read). Vim also has more internal knowledge and can control how the rendering is done instead of a plugin needing to bend over backwards to get this to work.

I also think this feature is a basic requirement for a competent diff program and should be part of the overall design. We should not force users to install a plugin to get access to such features.

diffchar.vim does have some interesting features, such as the ability to highlight corresponding text in the other buffer when the cursor is on a highlighted chunk; and for chunks where there's an added piece of text (meaning there's no corresponding text in the other buffer), it underlines neighboring text to show where it is inserted in the other buffer. I decided not to clone all of them in this PR.

I do think later on Vim could expose more APIs both for accessing the diff highlight information, and/or provide ways to modify it. E.g. If another diff plugin like Difftastic (which does syntax diffing using treesitter) wants to hook into Vim/Neovim, it should have an API where it could define where exactly the highlight should be. We could add something like set diffopt+=inline:expr.

Any performance issues?

I was planning to do some benchmarks, but in initial testing I didn't see any performance issues at all and therefore decided not to. Vim only calculates inline differences for the on-screen diff blocks, and the result is cached inside the diff block, so in the vast majority of cases this change will not cause much of a performance regression when it's on. It will only be an issue in degenerate cases where a single line is really long; from experience this can be slightly slow, but usually still within reason. Xdiff is also quite fast compared to other diff programs, so I don't think this should cause many issues.

Relevant links

neovim/neovim#29549
neovim/neovim#3433
https://github.com/rickhowe/diffchar.vim


TODOs

Current TODOs before PR is fully ready:

  • Fix diff_hlID() to work properly
  • Implement tests

There are further diff changes / improvements I would like to work on, but probably will wait to see how this PR goes first. Future potential options include making this work better with linematch, move detection, expose xdiff's --anchored feature, better API integration, etc.

@ychin ychin force-pushed the inline-diff-char-word branch 2 times, most recently from 623a6eb to 8ab47b9 Compare March 13, 2025 17:40
@ychin ychin marked this pull request as draft March 17, 2025 10:56
@ychin ychin force-pushed the inline-diff-char-word branch 3 times, most recently from 2426fb6 to ee58098 Compare March 18, 2025 08:06
ychin added 10 commits March 22, 2025 00:15
Previously in multi-buffer diff if a buffer doesn't have text in a
particular block, it would cause all the texts in this block to show up
as modified.

This helps making sure it will return the correct results when
inline:simple changed to inline:none, but also previously it wouldn't
respect changes to add/remove icase due to the caching. In general the
caching probably does more harm than good.

This is a drive-by change to fix current bug where a character with
composing character will not have icase work properly since the code was
ignoring it.

Vim just fills buffers as they go and don't re-order them if a buffer is
removed.

This happens when we have Å and å. Note that this doesn't fix normal
diffing regarding such texts and simple highlighting. That could be
fixed in the future.
@ychin ychin force-pushed the inline-diff-char-word branch from 5bca11a to 8c33618 Compare March 22, 2025 07:19
@ychin ychin force-pushed the inline-diff-char-word branch 2 times, most recently from 4a925aa to a6bad8a Compare March 22, 2025 12:20
Properly initialize df_changes to avoid uninitialize memory crash

Also, properly initialize df_lnum / df_count to clear them to zero.
We could make sure to never use the invalid values if the buffer is not
NULL but in practice that is quite tricky and is prone to error making
it easy to do memory bugs. Just set everything to 0 to make it easy to
reason about.
@ychin ychin force-pushed the inline-diff-char-word branch from a6bad8a to b479fc6 Compare March 22, 2025 12:35
Fix overflow when parsing change

Don't leak memory when diffing multiple buffers. Need to clear the dout
after every diff_read()
@ychin ychin force-pushed the inline-diff-char-word branch from 08576cb to 3f7f8e2 Compare March 22, 2025 14:06
@ychin ychin marked this pull request as ready for review March 22, 2025 16:53
@ychin ychin changed the title from "WIP: Improve diff inline highlighting using per-character/word diff" to "Improve diff inline highlighting using per-character/word diff" Mar 22, 2025
ychin commented Mar 22, 2025

Ok this PR should be ready. Let me know if there is any feedback or thoughts. I have implemented tests. Previously the PR was also crashing in Linux / Windows builds (apparently the macOS allocator automatically zeroes allocated memory, but other OSes don't); that is now fixed as well.

zeertzjq added a commit to zeertzjq/neovim that referenced this pull request Mar 28, 2025
Problem:  Diff mode's inline highlighting is lackluster. It only
          performs a line-by-line comparison, and calculates a single
          shortest range within a line that could encompass all the
          changes. In lines with multiple changes, or those that span
          multiple lines, this approach tends to end up highlighting
          much more than necessary.

Solution: Implement new inline highlighting modes by doing per-character
          or per-word diff within the diff block, and highlight only the
          relevant parts, add "inline:simple" to the defaults (which is
          the old behaviour)

This change introduces a new diffopt option "inline:<type>". Setting to
"none" will disable all inline highlighting, "simple" (the default) will
use the old behavior, "char" / "word" will perform a character/word-wise
diff of the texts within each diff block and only highlight the
differences.

The new char/word inline diff only use the internal xdiff, and will
respect diff options such as algorithm choice, icase, and misc iwhite
options. indent-heuristics is always on to perform better sliding.

For character highlight, a post-process of the diff results is first
applied before we show the highlight. This is because a naive diff will
create a result with a lot of small diff chunks and gaps, due to the
repetitive nature of individual characters. The post-process is a
heuristic-based refinement that attempts to merge adjacent diff blocks
if they are separated by a short gap (1-3 characters), and can be
further tuned in the future for better results. This process results in
more characters than necessary being highlighted but overall less visual
noise.

For word highlight, always use first buffer's iskeyword definition.
Otherwise if each buffer has different iskeyword settings we would not
be able to group words properly.

The char/word diffing is always per-diff block, not per line, meaning
that changes that span multiple lines will show up correctly.
Added/removed newlines are not shown by default, but if the user has
'list' set (with "eol" listchar defined), the eol character will be be
highlighted correctly for the specific newline characters.

Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by
default. It allows color schemes to use different colors for texts that
have been added within a line versus modified.

This doesn't interact with linematch perfectly currently. The linematch
feature splits up diff blocks into multiple smaller blocks for better
visual matching, which makes inline highlight less useful especially for
multi-line change (e.g. a line is broken into two lines). This could be
addressed in the future.

As a side change, this also removes the bounds checking introduced to
diff_read() as they were added to mask existing logic bugs that were
properly fixed in vim/vim#16768.

closes: vim/vim#16881

vim/vim@9943d47

Co-authored-by: Yee Cheng Chin <[email protected]>
SkohTV pushed a commit to SkohTV/neovim that referenced this pull request Mar 29, 2025
Problem:  Diff mode's inline highlighting is lackluster. It only
          performs a line-by-line comparison, and calculates a single
          shortest range within a line that could encompass all the
          changes. In lines with multiple changes, or those that span
          multiple lines, this approach tends to end up highlighting
          much more than necessary.

Solution: Implement new inline highlighting modes by doing per-character
          or per-word diff within the diff block, and highlight only the
          relevant parts, add "inline:simple" to the defaults (which is
          the old behaviour)

This change introduces a new diffopt option "inline:<type>". Setting to
"none" will disable all inline highlighting, "simple" (the default) will
use the old behavior, "char" / "word" will perform a character/word-wise
diff of the texts within each diff block and only highlight the
differences.
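
As a usage illustration (not part of the patch itself), the option could be
set from a vimrc roughly like this. This is a minimal sketch, assuming a Vim
build that includes this change (other messages in this thread refer to it as
9.1.1243); older versions will reject the unknown "inline:" item.

```vim
" Remove the old inline value first so only one inline: item is present.
set diffopt-=inline:simple
" Character-wise inline highlighting within each diff block.
set diffopt+=inline:char

" Alternatives described above (enable one at a time):
" set diffopt+=inline:word
" set diffopt+=inline:none
```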

The new char/word inline diff only uses the internal xdiff, and will
respect diff options such as algorithm choice, icase, and misc iwhite
options. indent-heuristic is always on to perform better sliding.

For character highlight, a post-process of the diff results is first
applied before we show the highlight. This is because a naive diff will
create a result with a lot of small diff chunks and gaps, due to the
repetitive nature of individual characters. The post-process is a
heuristic-based refinement that attempts to merge adjacent diff blocks
if they are separated by a short gap (1-3 characters), and can be
further tuned in the future for better results. This process results in
more characters than necessary being highlighted but overall less visual
noise.
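
The merging idea described above can be pictured with a small Vim script
sketch. This is not the code from the patch: MergeCloseRanges() and its
maxgap parameter are hypothetical names, the ranges are assumed to be sorted
[start, end) character offsets, and the real heuristic may weigh more than a
fixed gap threshold.

```vim
" Hypothetical helper, not part of Vim: merge sorted [start, end) ranges
" whose gap to the previous range is at most a:maxgap characters.
function! MergeCloseRanges(ranges, maxgap) abort
  let merged = []
  for [rstart, rend] in a:ranges
    if !empty(merged) && rstart - merged[-1][1] <= a:maxgap
      " Short gap: extend the previous range so the gap is highlighted too.
      let merged[-1][1] = max([merged[-1][1], rend])
    else
      call add(merged, [rstart, rend])
    endif
  endfor
  return merged
endfunction

" The first two ranges are 1 character apart and get merged; the third
" starts 7 characters after the merged range and stays separate.
echo MergeCloseRanges([[0, 2], [3, 5], [12, 14]], 3)
```

The trade-off is the one stated above: the gap characters end up highlighted
even though they did not change, in exchange for fewer tiny fragments.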

For word highlight, always use first buffer's iskeyword definition.
Otherwise if each buffer has different iskeyword settings we would not
be able to group words properly.

The char/word diffing is always per-diff block, not per line, meaning
that changes that span multiple lines will show up correctly.
Added/removed newlines are not shown by default, but if the user has
'list' set (with "eol" listchar defined), the eol character will be
highlighted correctly for the specific newline characters.
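
To see added or removed newlines as described, 'list' needs to be on with an
"eol" entry in 'listchars'. A minimal sketch; the "$" character is only an
example choice:

```vim
" Show end-of-line markers so inline diff can highlight changed newlines.
set list
set listchars+=eol:$
```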

Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by
default. It allows color schemes to use different colors for texts that
have been added within a line versus modified.
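
A color scheme or vimrc could then give the two groups distinct colors. The
values below are placeholders chosen for illustration, not colors taken from
the patch:

```vim
" Example colors only; pick values that suit your color scheme.
" DiffText: text modified within a line.
highlight DiffText    ctermbg=DarkYellow guibg=#5a4a00
" DiffTextAdd: text added within a line.
highlight DiffTextAdd ctermbg=DarkGreen  guibg=#1f4d2b
```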

This doesn't currently interact perfectly with linematch. The linematch
feature splits up diff blocks into multiple smaller blocks for better
visual matching, which makes inline highlight less useful, especially for
multi-line changes (e.g. a line is broken into two lines). This could be
addressed in the future.

As a side change, this also removes the bounds checks introduced in
diff_read(), as they were added to mask existing logic bugs that were
properly fixed in vim/vim#16768.

closes: vim/vim#16881

vim/vim@9943d47

Co-authored-by: Yee Cheng Chin <[email protected]>
@rickhowe

FYI: This might already be on your ToDo list, but word-wise inline diff does not treat a change of character class as a word boundary in a multibyte string. In contrast, the 'w' and 'b' cursor motions and the '\<' and '\>' regexp atoms do recognize such a word boundary.

:echo split('差分モード用のオプション設定', '\<\|\>')
['差分', 'モード', '用', 'の', 'オプション', '設定']

In mbyte.c, dbcs_class() and utf_class() are implemented to support them.

@ychin
Contributor Author

ychin commented Mar 31, 2025

I just tested and you are right. I'll submit a fix.

@ychin
Contributor Author

ychin commented Mar 31, 2025

FWIW iskeyword's documentation seems a little misleading / confusing. Multi-byte words are completely hard-coded based on Unicode class rules, and unaffected by iskeyword settings. The description of @ seems to be a little ambiguous:

'isfname' for a description of the format of this option. For '@' characters above 255 check the "word" character class (any character that is not white space or punctuation).

It does mean if you use word diff right now it's impossible to customize how it groups words for say Japanese or Chinese (which is specifically a problem since all the Chinese characters in a whole sentence get lumped into a single "word" unlike Japanese which gets to split on kanji/hiragana/katakana etc). I guess that's why inline:char exists (which IMO is still the better setting than inline:word for most use cases), and in the future we can implement inline:pattern to support custom regex. Right now inline:word should obey the same rule that Vim uses for words.

ychin added a commit to ychin/vim that referenced this pull request Apr 4, 2025
…word

Previously inline word diff simply used Vim's definition of keyword to
determine what is a word, which leads to multi-byte character classes
such as emojis and CJK (Chinese/Japanese/Korean) characters all
classifying as word characters, leading to entire sentences being
grouped as a single word which does not provide meaningful information.

Fix this by treating all non-alphanumeric characters (with class number
above 2) as non-word characters, as there is usually no benefit in using
word diff on them. These include CJK characters, emojis, and also
subscript/superscript numbers. Meanwhile, multi-byte characters like
Cyrillic and Greek letters will continue to be considered words.

Note that this is slightly inconsistent with how words are defined
elsewhere, as Vim usually considers any character with class >=2 to be
a "word".

Related: vim#16881 (diff inline highlight)
@ychin
Contributor Author

ychin commented Apr 4, 2025

I filed #17050 to fix this. The proposed solution is to simply treat all non-alphanumeric multi-byte characters as non-word in the context of word diff. It's slightly inconsistent with how Vim word motions work but I think this works best in this context. This mostly means emojis and CJK characters will treat each character as an individual word.

chrisbra pushed a commit that referenced this pull request Apr 4, 2025
Problem:  inline word diff treats multibyte chars as word char
          (after 9.1.1243)
Solution: treat all non-alphanumeric characters as non-word characters
          (Yee Cheng Chin)

Previously inline word diff simply used Vim's definition of keyword to
determine what is a word, which leads to multi-byte character classes
such as emojis and CJK (Chinese/Japanese/Korean) characters all
classifying as word characters, leading to entire sentences being
grouped as a single word which does not provide meaningful information
in a diff highlight.

Fix this by treating all non-alphanumeric characters (with class number
above 2) as non-word characters, as there is usually no benefit in using
word diff on them. These include CJK characters, emojis, and also
subscript/superscript numbers. Meanwhile, multi-byte characters like
Cyrillic and Greek letters will continue to be considered words.

Note that this is slightly inconsistent with how words are defined
elsewhere, as Vim usually considers any character with class >=2 to be
a "word".

related: #16881 (diff inline highlight)
closes: #17050

Signed-off-by: Yee Cheng Chin <[email protected]>
Signed-off-by: Christian Brabandt <[email protected]>
zeertzjq added a commit to zeertzjq/neovim that referenced this pull request Apr 5, 2025
Problem:  inline word diff treats multibyte chars as word char
          (after 9.1.1243)
Solution: treat all non-alphanumeric characters as non-word characters
          (Yee Cheng Chin)

Previously inline word diff simply used Vim's definition of keyword to
determine what is a word, which leads to multi-byte character classes
such as emojis and CJK (Chinese/Japanese/Korean) characters all
classifying as word characters, leading to entire sentences being
grouped as a single word which does not provide meaningful information
in a diff highlight.

Fix this by treating all non-alphanumeric characters (with class number
above 2) as non-word characters, as there is usually no benefit in using
word diff on them. These include CJK characters, emojis, and also
subscript/superscript numbers. Meanwhile, multi-byte characters like
Cyrillic and Greek letters will continue to be considered words.

Note that this is slightly inconsistent with how words are defined
elsewhere, as Vim usually considers any character with class >=2 to be
a "word".

related: vim/vim#16881 (diff inline highlight)
closes: vim/vim#17050

vim/vim@9aa120f

Co-authored-by: Yee Cheng Chin <[email protected]>
zeertzjq added a commit to neovim/neovim that referenced this pull request Apr 5, 2025
…har (#33323)

Problem:  inline word diff treats multibyte chars as word char
          (after 9.1.1243)
Solution: treat all non-alphanumeric characters as non-word characters
          (Yee Cheng Chin)

Previously inline word diff simply used Vim's definition of keyword to
determine what is a word, which leads to multi-byte character classes
such as emojis and CJK (Chinese/Japanese/Korean) characters all
classifying as word characters, leading to entire sentences being
grouped as a single word which does not provide meaningful information
in a diff highlight.

Fix this by treating all non-alphanumeric characters (with class number
above 2) as non-word characters, as there is usually no benefit in using
word diff on them. These include CJK characters, emojis, and also
subscript/superscript numbers. Meanwhile, multi-byte characters like
Cyrillic and Greek letters will continue to be considered words.

Note that this is slightly inconsistent with how words are defined
elsewhere, as Vim usually considers any character with class >=2 to be
a "word".

related: vim/vim#16881 (diff inline highlight)
closes: vim/vim#17050

vim/vim@9aa120f

Co-authored-by: Yee Cheng Chin <[email protected]>
wincent added a commit to wincent/wincent that referenced this pull request Aug 13, 2025
Repo:

- https://github.com/sindrets/diffview.nvim

Adds minimal config to make it usable for basic use, like:

    :DiffviewOpen

to show current diff with file tree on left, and split diff (two panes)
on the right; or:

    :DiffviewFileHistory

to show, um, file history at bottom, and split diff (two panes) above.

Main things I've noticed that needed changing so far:

- Needed to suppress default mapping for `<tab>` because I use that for
  toggling folds. Using `gn` (ie. "[g]o [n]ext") instead.
- For symmetry, using `gp` (ie. "[g]o [p]revious") as inverse of `gn`.
- Suppressed icons warning.

Changed the `diff` entry in `'fillchars'` based on a suggestion from the
README; the box-drawing character tiles nicely and makes it look like
sections of the buffer are crossed out. I could also have used "╳" but
that's a bit too heavyweight, visually.

Also reworked highlight colors because it's basically impossible to see
what's going on otherwise. There is no word-diff in Neovim:

- neovim/neovim#29549

But there is in Vim:

- vim/vim#16881

Until that gets backported (which it might never be), I can either put
up with the very ugly highlighting I'm using now, or use a plugin like:

- https://github.com/rickhowe/diffchar.vim

I'll think about that. But in any case, in the meantime, starting with
the plug-in I'm adding in this commit. For more detailed info on usage,
see:

- https://github.com/sindrets/diffview.nvim/blob/main/USAGE.md

Contains useful examples of standard commands like:

    :DiffviewOpen origin/HEAD...HEAD

for showing the diff of current branch since the merge base.

Or, to review the same commit-by-commit:

    :DiffviewFileHistory --range=origin/HEAD...HEAD --right-only --no-merges

Or, to see the latest stash:

    :DiffviewFileHistory -g --range=stash

* aspects/nvim/files/.config/nvim/pack/bundle/opt/diffview.nvim 0000000...4516612 (312):
  > feat(actions): added `select_{prev|next}_commit` (#512)
ychin added a commit to ychin/vim that referenced this pull request Sep 9, 2025
The default diff options have not been updated much despite new
functionality having been added to Vim.

- indent-heuristic: This has been enabled by default in Git since
  33de716 in 2017. Given that Vim uses xdiff from Git, it makes sense
  to track the default configuration from Git.

- inline:char: This turns on character-wise inline highlighting, which is
  generally much better than the default inline:simple. It has been
  implemented since vim#16881 and we have not seen reports of any issues
  with it, and it has received good feedback.
chrisbra pushed a commit that referenced this pull request Sep 11, 2025
Problem:  defaults: 'diffopt' option value can be improved
Solution: Update diffopt defaults to include "indent-heuristic" and
          "inline:char" (Yee Cheng Chin)

The default diff options have not been updated much despite new
functionality having been added to Vim.

- indent-heuristic: This has been enabled by default in Git since
  33de716 in 2017. Given that Vim uses xdiff from Git, it makes sense
  to track the default configuration from Git.

- inline:char: This turns on character-wise inline highlighting, which is
  generally much better than the default inline:simple. It has been
  implemented since #16881 and we have not seen reports of any issues
  with it, and it has received good feedback.
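
For anyone who prefers the previous behavior, the new items can be taken out
again in a vimrc. A minimal sketch, assuming a Vim build that already ships
these updated defaults:

```vim
" Go back to the old single-range inline highlighting.
set diffopt-=inline:char
set diffopt+=inline:simple
" Optionally turn the indent heuristic back off as well.
set diffopt-=indent-heuristic
```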

closes: #18255

Signed-off-by: Yee Cheng Chin <[email protected]>
Signed-off-by: Christian Brabandt <[email protected]>
zeertzjq added a commit to zeertzjq/neovim that referenced this pull request Sep 12, 2025
Problem:  defaults: 'diffopt' option value can be improved
Solution: Update diffopt defaults to include "indent-heuristic" and
          "inline:char" (Yee Cheng Chin)

The default diff options have not been updated much despite new
functionality having been added to Vim.

- indent-heuristic: This has been enabled by default in Git since
  33de716387 in 2017. Given that Vim uses xdiff from Git, it makes sense
  to track the default configuration from Git.

- inline:char: This turns on character-wise inline highlighting, which is
  generally much better than the default inline:simple. It has been
  implemented since vim/vim#16881 and we have not seen reports of any issues
  with it, and it has received good feedback.

closes: vim/vim#18255

vim/vim@976b365

Co-authored-by: Yee Cheng Chin <[email protected]>
zeertzjq added a commit to neovim/neovim that referenced this pull request Sep 12, 2025
…35727)

Problem:  defaults: 'diffopt' option value can be improved
Solution: Update diffopt defaults to include "indent-heuristic" and
          "inline:char" (Yee Cheng Chin)

The default diff options have not been updated much despite new
functionality having been added to Vim.

- indent-heuristic: This has been enabled by default in Git since
  33de716387 in 2017. Given that Vim uses xdiff from Git, it makes sense
  to track the default configuration from Git.

- inline:char: This turns on character-wise inline highlighting, which is
  generally much better than the default inline:simple. It has been
  implemented since vim/vim#16881 and we have not seen reports of any issues
  with it, and it has received good feedback.

closes: vim/vim#18255

vim/vim@976b365

Co-authored-by: Yee Cheng Chin <[email protected]>
dundargoc pushed a commit to dundargoc/neovim that referenced this pull request Sep 27, 2025
…eovim#35727)

Problem:  defaults: 'diffopt' option value can be improved
Solution: Update diffopt defaults to include "indent-heuristic" and
          "inline:char" (Yee Cheng Chin)

The default diff options have not been updated much despite new
functionality having been added to Vim.

- indent-heuristic: This has been enabled by default in Git since
  33de716387 in 2017. Given that Vim uses xdiff from Git, it makes sense
  to track the default configuration from Git.

- inline:char: This turns on character-wise inline highlighting, which is
  generally much better than the default inline:simple. It has been
  implemented since vim/vim#16881 and we have not seen reports of any issues
  with it, and it has received good feedback.

closes: vim/vim#18255

vim/vim@976b365

Co-authored-by: Yee Cheng Chin <[email protected]>