odb: freshen existing objects when writing #3861
xref #3650
Note that git itself will only freshen a pack file once per execution to avoid thrashing a packfile with touches, which would be quite annoying. That's probably not appropriate for us but we can keep track of the last time we've freshened a packfile and re-touch it only every I'm all ears if you have a good idea for what that |
@ethomson What's the status here? Anything I could help with?
b732160 to 78e7595
@arthurschreiber Just needs a review and some sanity checking, especially on the 2-second number that I picked out of thin air.
5e6fab0 to 3754e57
return odb_freshen_1(db, id, true);

/* Failed to refresh, hence not found */
return 0;
I think the return code semantics are a tad confusing - I guess this results from odb_freshen replacing the git_odb_exists calls, where the exists-function is obviously returning a boolean value. In this case, though, I'd rather expect to see our usual error semantics with -1 as error code, as odb_freshen does not hint at a returned boolean value.
At the moment, I think that the similarity with refresh is beneficial, so I'm going to keep it returning a boolean for now, especially since it's just internal. But I think that odb.c has become a bit confusing overall, so I am going to give the whole file a closer look this weekend.
I don't really like the two-second span between refreshes, as the value seems rather arbitrary. On the other hand, we have to choose a value here, and I have no arguments for or against any other value besides the trade-off between thrashing and accuracy. Looks good besides this and the few comments I've left.
My initial thought was to just do one second, but I really wanted to avoid dealing with sub-second precision, so to avoid refreshing again right after a rollover from one second to the next (but within the same one-second span), I just chose two. An argument for going much larger (10 seconds) could certainly be made, but I'm not one to make it. I wonder if @peff has any thoughts here. Thanks for the review.
I considered doing a refresh-after-n-seconds like this in git, but didn't bother since we can usually assume our processes are short-lived. I agree it's a good thing for libgit2 to do. When picking
So...I think you could probably go much higher than you're at. But there's no reason to do so unless you're worried that calling |
When writing an object, we calculate its OID and see if it exists in the object database. If it does, we need to freshen the file that contains it.
Since multiple objects being written may all already exist in a single packfile, avoid freshening that packfile repeatedly in a tight loop. Instead, only freshen a given packfile at most once every 2 seconds.
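The throttling described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `struct pack`, `pack_should_freshen`, and `FRESHEN_INTERVAL_SECS` are hypothetical names, and the real touch of the file on disk is left out. It assumes whole-second timestamps, as `time()` typically provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical interval; the discussion above settles on 2 seconds. */
#define FRESHEN_INTERVAL_SECS 2

/* Hypothetical stand-in for a packfile handle (not a libgit2 type). */
struct pack {
	const char *name;
	time_t last_freshen; /* 0 means "never freshened" */
};

/*
 * Decide whether the pack should be touched at time `now`, and record
 * the touch time when it should.
 *
 * With whole-second timestamps, a difference of at least 2 guarantees
 * that more than one real second has elapsed. A threshold of 1 would
 * not: calls at 10.999s and 11.001s land in different whole seconds
 * even though only 2ms separate them.
 */
static bool pack_should_freshen(struct pack *p, time_t now)
{
	assert(p != NULL);

	if (p->last_freshen != 0 &&
	    now - p->last_freshen < FRESHEN_INTERVAL_SECS)
		return false; /* touched recently; skip to avoid thrashing */

	p->last_freshen = now; /* the caller would touch the file here */
	return true;
}
```

A caller writing many objects that all live in the same pack would call `pack_should_freshen(&p, time(NULL))` once per object and update the file's mtime only when it returns true, so a tight loop results in at most one touch per interval.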
3754e57 to 27051d4
When writing an object, we calculate its OID and see if it exists in the object database. If it does, we need to freshen the file that contains it.
This adds a new function to backends, freshen, which will freshen the object if it exists.
Previously, we would do a simple exists check during git_odb_write, after calculating the new object ID. Now, we will use the freshen function if it is implemented on a particular backend (falling back to the exists check if it isn't).
A slight disappointment is that we no longer look in the cache first to see if an object exists. To do that, we would need to store more information in the cache, namely the backend that the object was found in (or else we would need to go locate the object again to freshen it, and if we do that, there would be no point in looking in the cache at all). That's quite a yak to shave, it turns out. But it's probably not really a big deal, since this is only used in the write functions, where it is presumed that the object doesn't exist (and thus wouldn't be in the cache).
(So in the general case, this is as performant as before.)
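The freshen-or-exists fallback described above could look something like the sketch below. It is an illustration under assumptions, not the PR's code: `struct backend` is a deliberately simplified, hypothetical type (not libgit2's `git_odb_backend`), objects are identified by plain integers rather than OIDs, and `object_freshen`, `simple_exists`, and `simple_freshen` are made-up names. It returns a boolean, matching the semantics discussed in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified backend: holds a single object id. */
struct backend {
	bool (*exists)(struct backend *self, int id);
	/* Optional; NULL when the backend cannot freshen objects. */
	bool (*freshen)(struct backend *self, int id);
	int stored_id;
	int freshen_calls; /* bookkeeping for the example only */
};

/* Example callback: report whether the backend holds `id`. */
static bool simple_exists(struct backend *self, int id)
{
	return self->stored_id == id;
}

/* Example callback: freshen `id` if present; true means "found". */
static bool simple_freshen(struct backend *self, int id)
{
	if (self->stored_id != id)
		return false;
	self->freshen_calls++; /* a real backend would touch the file */
	return true;
}

/*
 * Return true if some backend already has the object. Backends that
 * implement freshen get to freshen it; the others fall back to a
 * plain exists check.
 */
static bool object_freshen(struct backend **backends, size_t n, int id)
{
	size_t i;

	assert(backends != NULL || n == 0);

	for (i = 0; i < n; i++) {
		struct backend *b = backends[i];

		if (b->freshen != NULL) {
			if (b->freshen(b, id))
				return true;
		} else if (b->exists != NULL && b->exists(b, id)) {
			return true;
		}
	}
	return false;
}
```

In this sketch a write path would call `object_freshen` after computing the object ID and skip the actual write when it returns true. Note that there is no cache lookup in front of the loop, which mirrors the trade-off described above.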