commit-graph: Use the commit-graph in revwalks #5765

Merged
merged 2 commits into from
Jul 26, 2021

Conversation

lhchavez
Contributor

@lhchavez lhchavez commented Jan 7, 2021

This change makes revwalks a bit faster by using the commit-graph file
(if present). The commit-graph allows much faster parsing of the commit
information because it requires near-zero I/O for each commit (aside from
reading a few dozen bytes off of a mmap(2)-ed file), instead of having to
read the ODB, inflate the commit, and parse it.
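
As a rough illustration of the "few dozen bytes" per commit: in the commit-graph file format, each Commit Data (CDAT) entry is a fixed-size record (36 bytes with SHA-1 OIDs). The following is a minimal sketch of decoding one such record from an in-memory buffer. `cdat_entry` and `cdat_entry_parse` are hypothetical names for illustration, not libgit2's API, and the sketch assumes SHA-1 and ignores octopus-merge edge lists.

```c
#include <stddef.h>
#include <stdint.h>

#define OID_RAWSZ 20                      /* assumption: SHA-1 OIDs */
#define CDAT_ENTRY_SIZE (OID_RAWSZ + 16)  /* tree OID + 2 parents + 8 bytes */
#define CDAT_NO_PARENT 0x70000000u        /* marker for "no parent" */

/* Hypothetical in-memory view of one Commit Data record. */
struct cdat_entry {
	const uint8_t *tree_oid;  /* points into the mapped file, no copy */
	uint32_t parent1;         /* graph position of the first parent */
	uint32_t parent2;         /* graph position of the second parent */
	uint32_t generation;      /* topological level (upper 30 bits) */
	uint64_t commit_time;     /* seconds since the epoch (lower 34 bits) */
};

/* All integers in the commit-graph file are stored big-endian. */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the record at graph position `pos`; returns -1 if out of range. */
static int cdat_entry_parse(struct cdat_entry *out, const uint8_t *cdat,
	size_t cdat_len, uint32_t pos)
{
	const uint8_t *p;
	uint32_t hi, lo;

	if (cdat_len / CDAT_ENTRY_SIZE <= pos)
		return -1;

	p = cdat + (size_t)pos * CDAT_ENTRY_SIZE;
	out->tree_oid = p;
	out->parent1 = read_be32(p + OID_RAWSZ);
	out->parent2 = read_be32(p + OID_RAWSZ + 4);

	hi = read_be32(p + OID_RAWSZ + 8);
	lo = read_be32(p + OID_RAWSZ + 12);
	out->generation = hi >> 2;
	out->commit_time = ((uint64_t)(hi & 0x3u) << 32) | lo;
	return 0;
}
```

No inflation or text parsing is involved: a handful of loads from the mapped region yields the tree, the parents, and the commit time.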

Part of: #5757

@lhchavez lhchavez mentioned this pull request Jan 7, 2021
8 tasks
@lhchavez lhchavez force-pushed the cgraph-revwalks branch 2 times, most recently from 520de8a to b3bf58f Compare January 7, 2021 04:54
Base automatically changed from master to main January 7, 2021 10:10
@lhchavez lhchavez force-pushed the cgraph-revwalks branch 3 times, most recently from e1fa62e to 853e270 Compare January 10, 2021 21:28
src/odb.c Outdated
@@ -786,6 +806,53 @@ static int odb_exists_1(
return (int)found;
}

int git_odb__get_commit_graph(git_commit_graph_file **out, git_odb *db)
Member

🤔 I think that returning ENOTFOUND would be a little more idiomatic in the case where the commit graph file does not exist.
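
A minimal sketch of that idiom, assuming a plain file on disk stands in for the commit graph: the lookup reports "not found" with a distinct code so callers can fall back quietly, and reserves the generic error for real failures. The `SKETCH_*` constants and function names are hypothetical; the values only mirror libgit2's convention of `GIT_ENOTFOUND` being -3 and generic errors being -1.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical error codes, modelled on libgit2's convention. */
enum {
	SKETCH_OK = 0,
	SKETCH_ERROR = -1,
	SKETCH_ENOTFOUND = -3
};

/* Open the commit-graph file; a missing file is not treated as a failure. */
static int sketch_open_commit_graph(FILE **out, const char *path)
{
	FILE *fp = fopen(path, "rb");

	*out = NULL;
	if (fp == NULL)
		return (errno == ENOENT) ? SKETCH_ENOTFOUND : SKETCH_ERROR;

	*out = fp;
	return SKETCH_OK;
}

/*
 * Caller side: returns 1 if a commit graph is available, 0 if the caller
 * should fall back to reading commits from the ODB, and < 0 on real errors.
 */
static int sketch_use_commit_graph(const char *path)
{
	FILE *fp;
	int error = sketch_open_commit_graph(&fp, path);

	if (error == SKETCH_ENOTFOUND)
		return 0;
	if (error < 0)
		return error;

	fclose(fp);
	return 1;
}
```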

src/odb.c Outdated
git_error_set(GIT_ERROR_ODB, "failed to acquire the odb lock");
return -1;
}
if (git_buf_len(&db->objects_dir) == 0 && git_buf_sets(&db->objects_dir, objects_dir) < 0) {
Member

I don't love this interaction between the ODB and the commit graph... It feels a little magical that when you're using the default backends, we store the object directory and a commit graph springs into being, but that's the only opportunity to use a commit graph. If I were adding my own backends, I would also want to be able to add a commit graph. This is notable for users like LibGit2Sharp which -- and I don't actually think that this was a good idea, but here we are -- do the moral equivalent of adding the default backends themselves. 🙃

Naively - and this is coming from a place where you know a lot more about this than I do, so these may be untenable suggestions, but - I would sort of expect:

  • the odb can have multiple commit graphs
  • users can add a commit graph (git_odb_add_commit_graph) much like they can add a backend.
  • git_odb__add_default_backends adds the commit graph for objects_dir

Do multiple commit graphs make sense here? Obviously it will complicate some of the logic, but unreasonably so?

Contributor Author

i can't think of a compelling use case to have multiple commit graphs, since a commit graph is a singleton entity of the repository. furthermore, it would complicate things conceptually, because the information from the graphs could be inconsistent among them. say, if one of the commit-graph files goes against the recommendation of not creating it from a shallow repository and the other one does not (arguably, this is the user shooting themselves in the foot). this is opposed to multiple backends, since conceptually one backend cannot contradict another (in theory at least. they can all "lie" and return a different object if they want to). and even repositories with multiple alternates use a single commit graph! https://git-scm.com/docs/commit-graph#_chains_across_multiple_object_directories

if we ignore the generation number contradiction problem outlined above, from the implementation POV, handling multiple commit graphs is not unreasonably complicated: change the singleton commit graph object to a vector of them. at query time, iterate through all of them and arbitrarily choose one entry (out of the commit graphs that contain that one commit).
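
a rough sketch of what that lookup could look like, with small integers standing in for OIDs and all names (`sketch_graph`, `sketch_graphs_find`) hypothetical. it also shows the contradiction problem: if two graphs disagree about a commit, the answer depends on iteration order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a commit id and the generation a graph claims. */
struct sketch_entry {
	uint32_t commit_id;
	uint32_t generation;
};

struct sketch_graph {
	const struct sketch_entry *entries;
	size_t count;
};

/* Look a commit up in a single graph (linear scan keeps the sketch short). */
static const struct sketch_entry *sketch_graph_find(
	const struct sketch_graph *graph, uint32_t commit_id)
{
	size_t i;

	for (i = 0; i < graph->count; i++) {
		if (graph->entries[i].commit_id == commit_id)
			return &graph->entries[i];
	}
	return NULL;
}

/* Look a commit up in a vector of graphs: the first graph that has it wins. */
static const struct sketch_entry *sketch_graphs_find(
	const struct sketch_graph *graphs, size_t graph_count, uint32_t commit_id)
{
	size_t i;

	for (i = 0; i < graph_count; i++) {
		const struct sketch_entry *entry =
			sketch_graph_find(&graphs[i], commit_id);
		if (entry != NULL)
			return entry;
	}
	return NULL;
}
```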

so with all that in mind, wdyt about adding a restriction of having at most one commit graph? i'm behind explicitly setting one into the odb (i didn't know about that LibGit2Sharp restriction!), so i would be following your expectations with one tweak: git_odb_set_commit_graph instead of git_odb_add_commit_graph.

Contributor Author

i just realized something else, hahaha. in order to avoid paying for what you don't use, it might be good to introduce a layer of indirection: if folks manually add a commit graph, that commit graph will be eagerly loaded AND error checked (to get a better feedback loop). but if the commit graph is implicitly added by git_odb__add_default_backends, it won't be loaded upfront, only the first time it's needed (kinda how it is right now).

this won't change the interface from what the expectations are, it's just going to be a little bit more code.
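
a small sketch of that indirection, under the assumption that "loading" just means checking the file signature (real commit-graph files start with `CGPH`). the struct and function names are hypothetical and a string stands in for the mapped file contents.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around the underlying commit-graph file. */
struct sketch_commit_graph {
	const char *data;  /* stand-in for the file contents, NULL if missing */
	int checked;       /* has validation already run? */
	int valid;         /* result of the validation */
	int load_count;    /* how many times the (expensive) load happened */
};

/* The expensive step: load and validate. Here it only checks the signature. */
static int sketch_graph_validate(struct sketch_commit_graph *cg)
{
	cg->load_count++;
	cg->checked = 1;
	cg->valid = cg->data != NULL && strncmp(cg->data, "CGPH", 4) == 0;
	return cg->valid ? 0 : -1;
}

/* Explicit path: the user set the graph, so load eagerly and report errors. */
static int sketch_graph_open(struct sketch_commit_graph *cg, const char *data)
{
	memset(cg, 0, sizeof(*cg));
	cg->data = data;
	return sketch_graph_validate(cg);
}

/* Implicit path (default backends): only remember where the data lives. */
static void sketch_graph_init_lazy(struct sketch_commit_graph *cg,
	const char *data)
{
	memset(cg, 0, sizeof(*cg));
	cg->data = data;
}

/* Query time: validate on first use, then reuse the cached result. */
static int sketch_graph_usable(struct sketch_commit_graph *cg)
{
	if (!cg->checked)
		sketch_graph_validate(cg);
	return cg->valid;
}
```

with this shape, a bad file surfaces as an error right away on the explicit path, while on the implicit path it just means the graph is silently not used.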

Member

> so with all that in mind, wdyt about adding a restriction of having at most one commit graph? i'm behind explicitly setting one into the odb (i didn't know about that LibGit2Sharp restriction!), so i would be following your expectations with one tweak: git_odb_set_commit_graph instead of git_odb_add_commit_graph.

👍

I was getting a little architecture astronauty, perhaps. 😁 Especially since it's all very theoretical that anybody would want a non-default commit-graph at the moment. I think that your recommendation makes sense!

Contributor Author

took a little bit of time, but done!

left the refactor as a separate commit for easier reviewing. lmk if you want me to squash them.

This change makes revwalks a bit faster by using the `commit-graph` file
(if present). The `commit-graph` allows much faster parsing of the commit
information because it requires near-zero I/O for each commit (aside from
reading a few dozen bytes off of a `mmap(2)`-ed file), instead of having to
read the ODB, inflate the commit, and parse it.

This is done by modifying `git_commit_list_parse()` and letting it use
the ODB-owned commit-graph file.

Part of: libgit2#5757

This change does a medium-size refactor of the git_commit_graph_file and
the interaction with the ODB. Now instead of the ODB owning a direct
reference to the git_commit_graph_file, there will be an intermediate
git_commit_graph. The main advantage of that is that now end users can
explicitly set a git_commit_graph that is eagerly checked for errors,
while still being able to lazily use the commit-graph in a regular ODB,
if the file is present.
@ethomson
Member

🎉 Thanks @lhchavez!

@ethomson ethomson merged commit 2370e49 into libgit2:main Jul 26, 2021
@lhchavez lhchavez deleted the cgraph-revwalks branch July 27, 2021 01:45