fix: support custom prefixes in Git diff parser #361

AlexandrosAlexiou · 2025-04-30T14:12:33Z

The Git diff parser previously expected only the standard "a/" and "b/" prefixes in diff headers, causing it to fail when parsing diffs with custom
prefixes like "i/" and "w/" (which are used by some Git tools).

This change modifies the `parse_old_new_file_header` method to:

- Accept any prefix pattern before the file paths
- Maintain the same functionality for standard Git diffs

Added a test case to verify that the parser can now handle custom prefixes.

Fixes error:

panicked at src/git/mod.rs:155:64:
called `Result::unwrap()` on an `Err` value: ParseError { input: "diff --git c/nvim/lazy-lock.json i/nvim/lazy-lock.json

altsem

Hi! Interesting.

I remember writing this code and realizing that parsing it becomes ambiguous. File paths may contain forward slashes, and files may contain spaces.

I'm thinking the least we could do is to make it a little more robust. Perhaps check for [a-z]/.

Sometimes the file paths are not present down in the `---`/`+++` lines:

--- i/file1.txt
+++ w/file2.txt

I've pushed a test to demonstrate this to master.
What do you think?
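
To make the ambiguity concrete, here is a minimal sketch of the `[a-z]/` check. It is not the code from this PR: `strip_letter_prefix` and `parse_git_header` are hypothetical names, and the sketch only handles headers where the old and new path are identical (no rename). Under that assumption the header can be split exactly in the middle, so spaces and slashes inside the path do no harm.

```rust
/// Strip a single lowercase-letter prefix such as "a/", "b/", "i/" or "w/".
/// (Hypothetical helper, for illustration only.)
fn strip_letter_prefix(s: &str) -> Option<&str> {
    let bytes = s.as_bytes();
    if bytes.len() > 2 && bytes[0].is_ascii_lowercase() && bytes[1] == b'/' {
        Some(&s[2..])
    } else {
        None
    }
}

/// Parse `diff --git X/path Y/path`, assuming both paths are identical.
/// Returns the shared path without its prefix.
fn parse_git_header(line: &str) -> Option<&str> {
    let rest = line.strip_prefix("diff --git ")?;
    // Two equally long halves joined by one space give an odd length.
    if rest.len() % 2 == 0 {
        return None;
    }
    let mid = rest.len() / 2;
    if rest.as_bytes()[mid] != b' ' {
        return None;
    }
    let old = strip_letter_prefix(&rest[..mid])?;
    let new = strip_letter_prefix(&rest[mid + 1..])?;
    // Reject renames: the midpoint split is only valid for equal paths.
    (old == new).then_some(old)
}
```

For `diff --git c/nvim/lazy-lock.json i/nvim/lazy-lock.json` this yields `nvim/lazy-lock.json`. A rename such as `a/old.txt b/new.txt` is rejected and would need other lines of the diff (`---`/`+++` or `rename from`/`rename to`) to resolve.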

codecov · 2025-05-01T20:02:54Z

Codecov Report

Attention: Patch coverage is 94.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 88.29%. Comparing base (6b9e124) to head (9725953).
Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/gitu_diff.rs | 94.00% | 3 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #361      +/-   ##
==========================================
+ Coverage   88.28%   88.29%   +0.01%     
==========================================
  Files          66       66              
  Lines        6613     6647      +34     
==========================================
+ Hits         5838     5869      +31     
- Misses        775      778       +3

The Git diff parser previously expected only the standard "a/" and "b/" prefixes in diff headers, causing it to fail when parsing diffs with custom prefixes like "i/" and "w/" (which are used by some Git tools).

This change modifies the `parse_old_new_file_header` method to:

- Accept any prefix pattern before the file paths
- Look for "/" character instead of hardcoded "a/" and "b/" prefixes
- Maintain the same functionality for standard Git diffs

Added a test case to verify that the parser can now handle custom prefixes.

Fixes error: "called `Result::unwrap()` on an `Err` value: ParseError { input: "diff --git i/..."

AlexandrosAlexiou · 2025-05-02T10:12:37Z

Hey! Just pulled from master and saw the test was failing. Fixed it in the latest commit though, and now everything's passing.

The code now handles those prefixes like "i/" and "w/" you mentioned, plus the unified diff format with the "---" and "+++" stuff.
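
As a rough illustration of the `---`/`+++` handling (not the actual implementation; `parse_file_line` is a hypothetical helper), the path can be taken from those lines when they are present. The sketch assumes a single lowercase-letter prefix and treats `/dev/null`, which Git prints for added or deleted files, as "no path".

```rust
/// Extract the path from a `--- X/path` or `+++ X/path` line.
/// `marker` is "---" or "+++". Returns `None` for `/dev/null` and for
/// lines without a single-letter prefix. (Illustrative sketch only.)
fn parse_file_line<'a>(line: &'a str, marker: &str) -> Option<&'a str> {
    let rest = line.strip_prefix(marker)?.strip_prefix(' ')?;
    // Tolerate trailing CR/LF, and a trailing tab, which can follow
    // names that contain spaces.
    let rest = rest.trim_end_matches(|c: char| c == '\r' || c == '\n' || c == '\t');
    if rest == "/dev/null" {
        return None;
    }
    let bytes = rest.as_bytes();
    if bytes.len() > 2 && bytes[0].is_ascii_lowercase() && bytes[1] == b'/' {
        Some(&rest[2..])
    } else {
        None
    }
}
```

Because these lines can be missing altogether, a caller would still need the `diff --git` header as a fallback.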

Anything else you think we should fix while I'm at it?

PS.
I'm curious though - any particular reason why you decided to roll your own parser instead of using libgit2 bindings? Wouldn't that handle all these edge cases automatically and give you access to Git's internal data structures directly?

Thanks for this really nice tool btw!

altsem · 2025-05-02T21:22:29Z

Without having looked at the source code, I wonder how Git itself handles parsing edge cases like this.

I think this approach should cover at least 99.9% of cases (unless someone puts their file inside an i/ directory...).

Perhaps using the regex crate would simplify the implementation? (it's already a dependency)

> PS.
> I'm curious though - any particular reason why you decided to roll your own parser instead of using libgit2 bindings? Wouldn't that handle all these edge cases automatically and give you access to Git's internal data structures directly?

Gitu used to use libgit2 to parse diffs, but I found the output quite hard to work with.
Especially when it came to working with generating patches / highlighting the text.
I wanted indices into the text, but I couldn't find any lib that did this.
My thought was, fixing these kinds of problems is easier, and will be worth it.
We'll see if that still holds true after some time x)
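
A small sketch of what "indices into the text" could look like. The types `HunkLine` and `LineKind` and the function `index_hunk_lines` are made up for illustration and are not Gitu's API. The point is only that each parsed item stores a byte range into the original diff rather than a copied string, which keeps patch generation and highlighting tied to exact positions.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq, Clone, Copy)]
enum LineKind {
    Added,
    Removed,
    Context,
}

/// One line of a hunk body, described by its position in the input.
#[derive(Debug, PartialEq)]
struct HunkLine {
    /// Byte range of the line in the input, without the line ending.
    range: Range<usize>,
    kind: LineKind,
}

/// Index the body lines of a single hunk (the `@@` line is skipped).
/// Assumes `input` holds a hunk only, with no `---`/`+++` file headers.
fn index_hunk_lines(input: &str) -> Vec<HunkLine> {
    let mut out = Vec::new();
    let mut start = 0;
    for line in input.split_inclusive('\n') {
        let content_len = line
            .trim_end_matches(|c: char| c == '\n' || c == '\r')
            .len();
        let kind = match line.as_bytes().first() {
            Some(b'+') => Some(LineKind::Added),
            Some(b'-') => Some(LineKind::Removed),
            Some(b' ') => Some(LineKind::Context),
            _ => None,
        };
        if let Some(kind) = kind {
            out.push(HunkLine { range: start..start + content_len, kind });
        }
        start += line.len();
    }
    out
}
```

Slicing the input with a stored range, for example `&input[line.range.clone()]`, gives back the exact text of that line.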

altsem · 2025-05-02T21:26:03Z

src/gitu_diff.rs

+
+        const CR_BYTE_SIZE: usize = 1;
+        let header_end =
+            if line_end > header_start && self.input.as_bytes()[line_end - CR_BYTE_SIZE] == b'\r' {


This would actually be LF_BYTE_SIZE, right?

AlexandrosAlexiou (May 3, 2025): Changed in the new impl.
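
For reference, a self-contained sketch of locating the end of a header line while tolerating both `\n` and `\r\n` endings. `line_content_end` is a hypothetical helper, not the code under review. It drops the one-byte `\r` that precedes the line feed when there is one.

```rust
/// Return the index just past the content of the line that begins at
/// `start`, excluding a trailing "\n" or "\r\n".
/// Panics if `start` is greater than `input.len()`.
fn line_content_end(input: &str, start: usize) -> usize {
    let bytes = input.as_bytes();
    // Position of the line feed, or the end of input if there is none.
    let line_end = bytes[start..]
        .iter()
        .position(|&b| b == b'\n')
        .map_or(bytes.len(), |i| start + i);
    // Step back over a carriage return that belongs to this line.
    if line_end > start && bytes[line_end - 1] == b'\r' {
        line_end - 1
    } else {
        line_end
    }
}
```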

AlexandrosAlexiou · 2025-05-03T09:05:33Z

Done using regex crate.

altsem

I'll merge it and just turn the regex into a LazyLock so that it's only initialized once.

gj!
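
A minimal sketch of the once-only initialization mentioned above, using `std::sync::LazyLock`. To stay free of external dependencies it uses a stand-in type, `PrefixPattern`, where the real code would presumably hold a `regex::Regex`; that type, `has_prefix`, and the `INIT_COUNT` counter are assumptions added only to make the behavior observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts initializer runs; present only to make the effect visible.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a compiled pattern (hypothetical, not the regex crate).
struct PrefixPattern {
    separator: u8,
}

impl PrefixPattern {
    /// Plays the role of an expensive compile step.
    fn compile() -> Self {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        PrefixPattern { separator: b'/' }
    }

    /// True if `s` starts with one lowercase letter and the separator.
    fn is_match(&self, s: &str) -> bool {
        let b = s.as_bytes();
        b.len() >= 2 && b[0].is_ascii_lowercase() && b[1] == self.separator
    }
}

/// Built on first access, then shared by every later call.
static PREFIX: LazyLock<PrefixPattern> = LazyLock::new(PrefixPattern::compile);

fn has_prefix(s: &str) -> bool {
    PREFIX.is_match(s)
}
```

However many times `has_prefix` is called, `PrefixPattern::compile` runs once. With the regex crate the initializer would be a closure around `Regex::new(...)` instead.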

AlexandrosAlexiou force-pushed the fix/support-alternative-diff-formats branch 4 times, most recently from 305ee12 to 723621b (May 1, 2025 16:17)

altsem reviewed May 1, 2025

AlexandrosAlexiou added 2 commits May 2, 2025 11:37

fix: handle file names with spaces

a188d78

AlexandrosAlexiou force-pushed the fix/support-alternative-diff-formats branch from 723621b to a188d78 (May 2, 2025 10:05)

altsem reviewed May 2, 2025

refactor: use regex crate for diff parser impl

9725953

altsem approved these changes May 4, 2025

altsem enabled auto-merge (squash) May 4, 2025 15:27

altsem merged commit 98ff22c into altsem:master May 4, 2025
5 checks passed
