@AlexandrosAlexiou (Contributor) commented Apr 30, 2025

The Git diff parser previously expected only the standard "a/" and "b/" prefixes in diff headers, causing it to fail when parsing diffs with custom
prefixes like "i/" and "w/" (which are used by some Git tools).

This change modifies the parse_old_new_file_header method to:

  • Accept any prefix pattern before the file paths
  • Maintain the same functionality for standard Git diffs

Added a test case to verify that the parser can now handle custom prefixes.

Fixes error:

panicked at src/git/mod.rs:155:64:
called `Result::unwrap()` on an `Err` value: ParseError { input: "diff --git c/nvim/lazy-lock.json i/nvim/lazy-lock.json

@AlexandrosAlexiou AlexandrosAlexiou force-pushed the fix/support-alternative-diff-formats branch 4 times, most recently from 305ee12 to 723621b Compare May 1, 2025 16:17

@altsem (Owner) left a comment

Hi! Interesting.

I remember writing this code and realizing that parsing it becomes ambiguous. File paths may contain forward slashes, and file names may contain spaces.

I'm thinking the least we could do is to make it a little more robust. Perhaps check for `[a-z]/`.

Sometimes the file paths are not present further down, in the lines:

--- i/file1.txt
+++ w/file2.txt

I've pushed a test to demonstrate this to master.
What do you think?
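
To make the ambiguity concrete, here is a minimal sketch (not the code from this PR). `has_single_letter_prefix` and `split_candidates` are hypothetical helpers: the first applies the `[a-z]/` check suggested above, the second lists every way the text after `diff --git ` could be split into an old and a new path when both sides must carry such a prefix.

```rust
/// Returns true if `path` starts with one lowercase ASCII letter followed by '/',
/// e.g. "a/", "b/", "i/" or "w/". (Hypothetical helper, not from the PR.)
fn has_single_letter_prefix(path: &str) -> bool {
    let bytes = path.as_bytes();
    bytes.len() > 2 && bytes[0].is_ascii_lowercase() && bytes[1] == b'/'
}

/// Lists every (old, new) split of the text after "diff --git " in which both
/// halves carry a `[a-z]/` prefix. More than one result means the header
/// cannot be split unambiguously by this check alone.
fn split_candidates(header: &str) -> Vec<(&str, &str)> {
    let Some(rest) = header.strip_prefix("diff --git ") else {
        return Vec::new();
    };
    rest.match_indices(' ')
        .map(|(i, _)| (&rest[..i], &rest[i + 1..]))
        .filter(|(old, new)| has_single_letter_prefix(old) && has_single_letter_prefix(new))
        .collect()
}

fn main() {
    // Custom prefixes split cleanly when the paths contain no spaces.
    let simple = split_candidates("diff --git i/file1.txt w/file2.txt");
    assert_eq!(simple, vec![("i/file1.txt", "w/file2.txt")]);

    // A file name that itself contains " b/" produces several plausible splits.
    let tricky = split_candidates("diff --git a/x b/y b/x b/y");
    println!("{} candidate splits: {tricky:?}", tricky.len());
}
```

The check rejects headers without a single-letter prefix, but it cannot decide between candidates when a path contains a space followed by something that looks like a prefix.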

codecov bot commented May 1, 2025

Codecov Report

Attention: Patch coverage is 94.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 88.29%. Comparing base (6b9e124) to head (9725953).
Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/gitu_diff.rs | 94.00% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #361      +/-   ##
==========================================
+ Coverage   88.28%   88.29%   +0.01%     
==========================================
  Files          66       66              
  Lines        6613     6647      +34     
==========================================
+ Hits         5838     5869      +31     
- Misses        775      778       +3     

The Git diff parser previously expected only the standard "a/" and "b/"
prefixes in diff headers, causing it to fail when parsing diffs with
custom prefixes like "i/" and "w/" (which are used by some Git tools).

This change modifies the `parse_old_new_file_header` method to:
- Accept any prefix pattern before the file paths
- Look for "/" character instead of hardcoded "a/" and "b/" prefixes
- Maintain the same functionality for standard Git diffs

Added a test case to verify that the parser can now handle custom
prefixes.

Fixes error: "called `Result::unwrap()` on an `Err` value: ParseError {
input: "diff --git i/..."
@AlexandrosAlexiou AlexandrosAlexiou force-pushed the fix/support-alternative-diff-formats branch from 723621b to a188d78 Compare May 2, 2025 10:05

@AlexandrosAlexiou (Contributor, Author) commented May 2, 2025

> Hi! Interesting.
>
> I remember writing this code and realizing that parsing it becomes ambiguous. File paths may contain forward slashes, and file names may contain spaces.
>
> I'm thinking the least we could do is to make it a little more robust. Perhaps check for `[a-z]/`.
>
> Sometimes the file paths are not present further down, in the lines:
>
> --- i/file1.txt
> +++ w/file2.txt
>
> I've pushed a test to demonstrate this to master. What do you think?

Hey! Just pulled from master and saw the test was failing. Fixed it in the latest commit though, and now everything's passing.

The code now handles those prefixes like "i/" and "w/" you mentioned, plus the unified diff format with the "---" and "+++" stuff.

Anything else you think we should fix while I'm at it?

PS.
I'm curious though - any particular reason why you decided to roll your own parser instead of using libgit2 bindings? Wouldn't that handle all these edge cases automatically and give you access to Git's internal data structures directly?

Thanks for this really nice tool btw!

@altsem (Owner) commented May 2, 2025

Without having looked at the source-code, I wonder how git itself handles parsing edge-cases like this.

I think that this way should cover 99.9% of cases at least (unless someone puts their file inside of an `i/` directory...).

Perhaps using the regex crate would simplify the implementation? (it's already a dependency)

> PS.
> I'm curious though - any particular reason why you decided to roll your own parser instead of using libgit2 bindings? Wouldn't that handle all these edge cases automatically and give you access to Git's internal data structures directly?

Gitu used to use libgit2 to parse diffs, but I found the output quite hard to work with.
Especially when it came to working with generating patches / highlighting the text.
I wanted indices into the text, but I couldn't find any lib that did this.
My thought was, fixing these kinds of problems is easier, and will be worth it.
We'll see if that still holds true after some time x)

src/gitu_diff.rs Outdated

const CR_BYTE_SIZE: usize = 1;
let header_end =
if line_end > header_start && self.input.as_bytes()[line_end - CR_BYTE_SIZE] == b'\r' {

@altsem (Owner) commented

This would actually be LF_BYTE_SIZE, right?

@AlexandrosAlexiou (Contributor, Author) replied

Changed in the new impl
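
For readers following the exchange above, here is a minimal sketch of the CRLF handling being discussed. It is not the implementation that was merged: `header_end` is a hypothetical free function, and it assumes that `line_end` is the byte index of the line's terminating `\n`. Under that assumption the byte being stepped over is the `\r`; if `line_end` instead pointed past the `\n`, the offset would have to account for the `\n` as well.

```rust
/// Given `input` and the byte index `line_end` of a line's terminating '\n'
/// (or `input.len()` if the last line has no terminator), returns the index
/// just past the last content byte, skipping a '\r' that precedes the '\n'.
/// Hypothetical helper for illustration; not taken from the PR.
fn header_end(input: &str, header_start: usize, line_end: usize) -> usize {
    const CR_BYTE_SIZE: usize = 1; // '\r' is a single byte in UTF-8
    if line_end > header_start && input.as_bytes()[line_end - CR_BYTE_SIZE] == b'\r' {
        line_end - CR_BYTE_SIZE
    } else {
        line_end
    }
}

fn main() {
    // The same header with LF and with CRLF line endings.
    for input in ["--- i/file1.txt\n", "--- i/file1.txt\r\n"] {
        let line_end = input.find('\n').unwrap_or(input.len());
        let end = header_end(input, 0, line_end);
        assert_eq!(&input[..end], "--- i/file1.txt");
    }
}
```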

@AlexandrosAlexiou (Contributor, Author) commented

> Without having looked at the source-code, I wonder how git itself handles parsing edge-cases like this.
>
> I think that this way should cover 99.9% of cases at least (unless someone puts their file inside of an `i/` directory...).
>
> Perhaps using the regex crate would simplify the implementation? (it's already a dependency)
>
> > PS.
> > I'm curious though - any particular reason why you decided to roll your own parser instead of using libgit2 bindings? Wouldn't that handle all these edge cases automatically and give you access to Git's internal data structures directly?
>
> Gitu used to use libgit2 to parse diffs, but I found the output quite hard to work with. Especially when it came to working with generating patches / highlighting the text. I wanted indices into the text, but I couldn't find any lib that did this. My thought was, fixing these kinds of problems is easier, and will be worth it. We'll see if that still holds true after some time x)

Done using regex crate.

@altsem (Owner) left a comment

I'll merge it and just turn the regex into a LazyLock so that it's only initialized once.

gj!
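
As a rough illustration of the "initialized once" idea mentioned above, here is a sketch using `std::sync::LazyLock`. To keep it dependency-free, a list of prefixes stands in for the compiled regex; with the regex crate the initializer would instead build a `Regex`. `PREFIXES`, `INIT_COUNT` and `has_known_prefix` are hypothetical names, not taken from the merged code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how often the initializer runs, to make the "only once" behaviour visible.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for an expensive-to-build value such as a compiled regex.
/// The initializer runs on first access and its result is reused afterwards.
static PREFIXES: LazyLock<Vec<String>> = LazyLock::new(|| {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    ('a'..='z').map(|c| format!("{c}/")).collect()
});

/// Hypothetical helper: true if `path` starts with one of the cached prefixes.
fn has_known_prefix(path: &str) -> bool {
    PREFIXES.iter().any(|p| path.starts_with(p.as_str()))
}

fn main() {
    assert!(has_known_prefix("i/file1.txt"));
    assert!(has_known_prefix("w/file2.txt"));
    assert!(!has_known_prefix("file3.txt"));
    // Three lookups, one initialization.
    assert_eq!(INIT_COUNT.load(Ordering::SeqCst), 1);
}
```

Without the `LazyLock`, a value built inside the parsing function would be reconstructed on every call, which matters when the parser runs once per diff header.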

@altsem altsem enabled auto-merge (squash) May 4, 2025 15:27
@altsem altsem merged commit 98ff22c into altsem:master May 4, 2025
5 checks passed