commit-graph: Use the commit-graph in revwalks #5765
src/odb.c
Outdated
@@ -786,6 +806,53 @@ static int odb_exists_1(
	return (int)found;
}

int git_odb__get_commit_graph(git_commit_graph_file **out, git_odb *db)
🤔 I think that returning ENOTFOUND would be a little more idiomatic in the case where the commit graph file does not exist.
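A minimal sketch of the convention suggested above, assuming stand-in types rather than the real libgit2 structs: every `sketch_*` name and the `cgraph` field are hypothetical, and `SKETCH_ENOTFOUND` only mirrors the value of `GIT_ENOTFOUND`. A missing commit graph is reported with a dedicated "not found" code, so callers can tell it apart from a generic `-1` failure.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the numeric value of libgit2's GIT_ENOTFOUND. */
#define SKETCH_ENOTFOUND (-3)

/* Hypothetical stand-ins for git_commit_graph_file and git_odb. */
typedef struct {
	int num_commits;
} sketch_graph_file;

typedef struct {
	sketch_graph_file *cgraph; /* NULL when no commit-graph file exists */
} sketch_odb;

/*
 * Returns 0 and sets *out when a commit graph is available.
 * Returns SKETCH_ENOTFOUND (and sets *out to NULL) when there is none.
 */
int sketch_odb_get_commit_graph(sketch_graph_file **out, sketch_odb *db)
{
	assert(out != NULL && db != NULL);

	*out = NULL;
	if (db->cgraph == NULL)
		return SKETCH_ENOTFOUND;

	*out = db->cgraph;
	return 0;
}
```

A caller can then treat `SKETCH_ENOTFOUND` as "fall back to the ODB" and propagate any other negative value as a real error.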
src/odb.c
Outdated
		git_error_set(GIT_ERROR_ODB, "failed to acquire the odb lock");
		return -1;
	}
	if (git_buf_len(&db->objects_dir) == 0 && git_buf_sets(&db->objects_dir, objects_dir) < 0) {
I don't love this interaction between the ODB and the commit graph... It feels a little magical that when you're using the default backends, we store the object directory and a commit graph springs into being, but that's the only opportunity to use a commit graph. If I were adding my own backends, I would also want to be able to add a commit graph. This is notable for users like LibGit2Sharp which -- and I don't actually think that this was a good idea, but here we are -- do the moral equivalent of adding the default backends themselves. 🙃
Naively - and this is coming from a place where you know a lot more about this than I do, so these may be untenable suggestions, but - I would sort of expect:
- the odb can have multiple commit graphs
- users can add a commit graph (`git_odb_add_commit_graph`) much like they can add a backend
- `git_odb__add_default_backends` adds the commit graph for `objects_dir`
Do multiple commit graphs make sense here? Obviously it will complicate some of the logic, but unreasonably so?
i can't think of a compelling use case to have multiple commit graphs, since a commit graph is a singleton entity of the repository. furthermore, it would complicate things conceptually, because the information from the graphs could be inconsistent among them. say, if one of the commit-graph files goes against the recommendation of not creating it from a shallow repository and the other one does not (arguably, this is the user shooting themselves in the foot). this is opposed to multiple backends, since conceptually one backend cannot contradict another (in theory at least. they can all "lie" and return a different object if they want to). and even repositories with multiple alternates use a single commit graph! https://git-scm.com/docs/commit-graph#_chains_across_multiple_object_directories
if we ignore the generation number contradiction problem outlined above, from the implementation POV, handling multiple commit graphs is not unreasonably complicated: change the singleton commit graph object to a vector of them. at query time, iterate through all of them and arbitrarily choose one entry (out of the commit graphs that contain that one commit).
so with all that in mind, wdyt about adding a restriction of having at most one commit graph? i'm behind explicitly setting one into the odb (i didn't know about that LibGit2Sharp restriction!), so i would be following your expectations with one tweak: `git_odb_set_commit_graph` instead of `git_odb_add_commit_graph`.
i just realized something else, hahaha. in order to avoid paying for what you don't use, it might be good to introduce a layer of indirection: if folks manually add a commit graph, that commit graph will be eagerly loaded AND error checked (to get a better feedback loop). but if the commit graph is implicitly added by `git_odb__add_default_backends`, it won't be loaded upfront, only the first time it's needed (kinda how it is right now).
this won't change the interface from what the expectations are, it's just going to be a little bit more code.
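The eager-versus-lazy split described above could look roughly like the following sketch. All `sketch_*` names, the loader callback, and the struct fields are hypothetical and are not libgit2's actual types. The lazy path only records how to load the file, while the eager path loads immediately so the caller sees any error up front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
	int num_commits;
} sketch_graph_file;

/* Hypothetical loader: returns 0 and sets *out, or a negative error code. */
typedef int (*sketch_loader)(sketch_graph_file **out, void *payload);

/* Intermediate object sitting between the ODB and the parsed file. */
typedef struct {
	sketch_graph_file *file; /* NULL until a load succeeds */
	sketch_loader load;
	void *payload;
	bool checked;            /* true once a load has been attempted */
	int load_error;          /* cached result of that attempt */
} sketch_commit_graph;

/* Lazy setup: remember how to load, but do no I/O yet. */
void sketch_graph_init_lazy(sketch_commit_graph *cg, sketch_loader load, void *payload)
{
	assert(cg != NULL && load != NULL);
	cg->file = NULL;
	cg->load = load;
	cg->payload = payload;
	cg->checked = false;
	cg->load_error = 0;
}

/* Loads on first use and caches the outcome, success or failure. */
int sketch_graph_get_file(sketch_graph_file **out, sketch_commit_graph *cg)
{
	if (!cg->checked) {
		cg->load_error = cg->load(&cg->file, cg->payload);
		cg->checked = true;
	}
	if (cg->load_error < 0) {
		*out = NULL;
		return cg->load_error;
	}
	*out = cg->file;
	return 0;
}

/* Eager setup: same object, but the load (and its error) happens right away. */
int sketch_graph_open_eager(sketch_commit_graph *cg, sketch_loader load, void *payload)
{
	sketch_graph_file *unused;
	sketch_graph_init_lazy(cg, load, payload);
	return sketch_graph_get_file(&unused, cg);
}

/* Demo loader: counts invocations via payload and hands back a static file. */
int sketch_counting_loader(sketch_graph_file **out, void *payload)
{
	static sketch_graph_file file = { 3 };
	*(int *)payload += 1;
	*out = &file;
	return 0;
}
```

Both paths share one accessor, so code that consumes the graph does not need to know which way it was set up.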
so with all that in mind, wdyt about adding a restriction of having at most one commit graph? i'm behind explicitly setting one into the odb (i didn't know about that LibGit2Sharp restriction!), so i would be following your expectations with one tweak: git_odb_set_commit_graph instead of git_odb_add_commit_graph.
👍
I was getting a little architecture astronauty, perhaps. 😁 Especially since it's all very theoretical that anybody would want a non-default commit-graph at the moment. I think that your recommendation makes sense!
took a little bit of time, but done!
left the refactor as a separate commit for easier reviewing. lmk if you want me to squash them.
This change makes revwalks a bit faster by using the `commit-graph` file (if present). This is thanks to the `commit-graph` allowing much faster parsing of the commit information by requiring near-zero I/O (aside from reading a few dozen bytes off of a `mmap(2)`-ed file) for each commit, instead of having to read the ODB, inflate the commit, and parse it. This is done by modifying `git_commit_list_parse()` and letting it use the ODB-owned commit-graph file. Part of: libgit2#5757
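As a rough illustration of the fast path this commit message describes, the sketch below tries a commit-graph lookup first and only falls back to the slow "read, inflate, parse" route on a miss. It is not the actual `git_commit_list_parse()` code: every `sketch_*` name is hypothetical, IDs are plain strings, and the lookup is a linear scan for brevity (the real file format uses a fanout table plus binary search over sorted OIDs).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ENOTFOUND (-3)

/* Hypothetical, simplified commit-graph entry and container. */
typedef struct {
	char id[41];
	long commit_time;
} sketch_entry;

typedef struct {
	const sketch_entry *entries;
	size_t count;
} sketch_graph;

typedef struct {
	long time;
	int parsed_from_graph; /* 1 if the fast path was used */
} sketch_commit;

/* Stand-in for the slow path: read from the ODB, inflate, parse. */
typedef int (*sketch_odb_parse)(sketch_commit *commit, const char *id);

/* Linear scan for brevity; tolerates a NULL graph (no commit-graph file). */
static int graph_lookup(const sketch_entry **out, const sketch_graph *g, const char *id)
{
	size_t i;
	for (i = 0; g != NULL && i < g->count; i++) {
		if (strcmp(g->entries[i].id, id) == 0) {
			*out = &g->entries[i];
			return 0;
		}
	}
	return SKETCH_ENOTFOUND;
}

/* Try the commit graph first; fall back to the ODB on a miss. */
int sketch_commit_parse(sketch_commit *commit, const char *id,
	const sketch_graph *g, sketch_odb_parse slow_path)
{
	const sketch_entry *e;

	assert(commit != NULL && id != NULL && slow_path != NULL);

	if (graph_lookup(&e, g, id) == 0) {
		commit->time = e->commit_time;
		commit->parsed_from_graph = 1;
		return 0;
	}

	commit->parsed_from_graph = 0;
	return slow_path(commit, id);
}

/* Demo slow path that pretends to parse a commit from the ODB. */
int sketch_fake_odb_parse(sketch_commit *commit, const char *id)
{
	(void)id;
	commit->time = 42;
	return 0;
}
```

Keeping the fallback inside one parse function means the revwalk itself does not change whether or not a commit-graph file is present.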
This change does a medium-size refactor of the git_commit_graph_file and the interaction with the ODB. Now instead of the ODB owning a direct reference to the git_commit_graph_file, there will be an intermediate git_commit_graph. The main advantage of that is that now end users can explicitly set a git_commit_graph that is eagerly checked for errors, while still being able to lazily use the commit-graph in a regular ODB, if the file is present.
🎉 Thanks @lhchavez!