New caching #1454

Merged
merged 25 commits into from
Apr 22, 2013
vmg (Member) commented Apr 3, 2013

New cache code. Did this during the flight.

As long promised, here's a unified ODB object/parsed object cache that transparently upgrades between the two of them. It uses khash as the backend.

TODO:

  • Setters/getters for tunable cache parameters
  • Expiration of old entries (lru? random?)
  • Size calculation for parsed objects

wat

git__free(cache->nodes);
/* do not infinite loop if there's not enough entries to evict */
if (evict_count > kh_size(cache->map))
    return;
Member

Shouldn't we try to evict as many as we can (say, evict_count = kh_size(cache->map)) instead of just ignoring it?
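
The suggestion above amounts to clamping the eviction count instead of returning early. A minimal sketch, assuming a hypothetical `clamp_evict_count` helper that is not part of the PR, with plain `size_t` values standing in for `kh_size(cache->map)`:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the PR): limit the number of entries to
 * evict to what the cache actually holds, so an eviction loop always
 * terminates and still frees as much as it can.
 */
static size_t clamp_evict_count(size_t evict_count, size_t cache_size)
{
	return evict_count > cache_size ? cache_size : evict_count;
}
```

In the quoted hunk, the early `return` could then become something like `evict_count = clamp_evict_count(evict_count, kh_size(cache->map));`, assuming the rest of the eviction loop copes with an emptied map.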

arrbee (Member) commented Apr 10, 2013

So, I know it has always been this way, but I don't love it that the git_odb always contains an initialized git_cache even though it will mostly use the git_cache from the containing git_repository. I can imagine a number of alternatives, such as lazy allocation of the cache if the owner of the ODB is not set or what have you, and I'm sure you can think of them, too. If I step back from it, I guess we're only talking a fairly small memory savings (and one less git_mutex_init) so from that perspective it's probably just a minor optimization that's not worth it. Just doesn't feel as clean as it could be.

arrbee (Member) commented Apr 10, 2013

Also along the lines of things that are not introduced by this PR but probably could be addressed, git_repository_odb__weakptr does not seem thread safe, because it calls GIT_REFCOUNT_OWN after the repo->_odb has already been set. Probably the pointer to the newly opened ODB should be stored on the stack until the owner has been set, and then swapped in to the _odb pointer (preferably with something like __sync_val_compare_and_swap if we want to go there, although I think that's just protecting against a memory leak so probably not so critical).
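
The "initialize on the stack, then swap in" idea can be sketched as below. This is only an illustration under assumptions: `struct odb`, `struct repo`, and `repo_odb_weakptr` are hypothetical stand-ins, not libgit2's types, and the sketch uses C11 `<stdatomic.h>` rather than the `__sync_val_compare_and_swap` builtin named in the comment.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for git_odb and git_repository. */
struct odb  { void *owner; };
struct repo { _Atomic(struct odb *) odb; };

/*
 * Return the repository's ODB, creating it on first use. The new object
 * is fully set up (owner assigned) through a local pointer before it is
 * published, and it is published with a compare-and-swap so that a
 * thread losing the race frees its own copy instead of leaking it or
 * overwriting the winner's.
 */
static struct odb *repo_odb_weakptr(struct repo *repo)
{
	struct odb *current = atomic_load(&repo->odb);
	struct odb *fresh;

	if (current != NULL)
		return current;

	fresh = calloc(1, sizeof(*fresh));
	if (fresh == NULL)
		return NULL;
	fresh->owner = repo; /* set the owner before anyone else can see it */

	current = NULL; /* expected value: nothing published yet */
	if (atomic_compare_exchange_strong(&repo->odb, &current, fresh))
		return fresh;

	/* Another thread published first: drop ours and use theirs. */
	free(fresh);
	return current;
}
```

On a failed exchange, `atomic_compare_exchange_strong` writes the currently stored pointer into `current`, which is why the function can return it directly.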

arrbee (Member) commented Apr 10, 2013

Anyhow, let me get back to the actual content of this PR... BTW, so far it is looking awesome! I really like how smooth the cache usage is once you push the notion of RAW vs PARSED entities into the cache. The object.c code is nice to follow.

for (i = 0; i < 20; ++i)
    h = (h << 5) - h + oid->id[i];
khint_t h;
memcpy(&h, oid, sizeof(khint_t));
Member

So much better. Why do you put up with me? 😁
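
For readers comparing the two hash variants quoted above, here is a self-contained sketch of both. The type names `sketch_oid` and `sketch_hash` are hypothetical stand-ins for `git_oid` and `khint_t`, and starting the mixing loop from `h = 0` is an assumption, since the hunk does not show the initial value. The presumable rationale for the change is that a SHA-1 id is already well distributed, so its leading bytes can serve as the hash directly.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_OID_RAWSZ 20 /* bytes in a SHA-1 object id */

/* Hypothetical stand-ins for git_oid and khint_t. */
typedef struct { unsigned char id[SKETCH_OID_RAWSZ]; } sketch_oid;
typedef uint32_t sketch_hash;

/* Loop variant: fold all 20 bytes together as h = h * 31 + byte. */
static sketch_hash oid_hash_mix(const sketch_oid *oid)
{
	sketch_hash h = 0; /* assumed starting value */
	int i;

	for (i = 0; i < SKETCH_OID_RAWSZ; ++i)
		h = (h << 5) - h + oid->id[i];
	return h;
}

/* memcpy variant: reuse the leading bytes of the id as the hash. */
static sketch_hash oid_hash_prefix(const sketch_oid *oid)
{
	sketch_hash h;

	memcpy(&h, oid->id, sizeof(h));
	return h;
}
```

The prefix variant ignores everything after the first `sizeof(sketch_hash)` bytes, so ids sharing that prefix collide; the hash map still has to compare full ids on lookup.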

arrbee (Member) commented Apr 11, 2013

Reading this all over, what do you think about putting a callback void (*free_object)(void *self); into the git_cached_obj and have that set explicitly by git_object__from_odb_object() (or copied from the git_objects_table) and new_odb_object()? I think we could remove git_object__free() completely and replace the switch statement in git_cached_obj_decref(). My main concern is that this is a slippery slope for other such callbacks. If you think this is interesting and are too busy, I can submit another PR with the proposal...
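
A rough sketch of what such a callback could look like, using hypothetical simplified types (`struct cached_obj`, `struct blob`) rather than the real `git_cached_obj`. The plain `int` refcount is a simplification; a thread-safe cache would need atomic reference counting.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for git_cached_obj. */
struct cached_obj {
	int refcount;                    /* not atomic in this sketch */
	void (*free_object)(void *self); /* set by whoever creates the object */
};

/* Drop one reference; on the last one, let the object free itself. */
static void cached_obj_decref(struct cached_obj *obj)
{
	if (--obj->refcount == 0)
		obj->free_object(obj);
}

/* Example object type embedding the cache header as its first member. */
struct blob {
	struct cached_obj cached;
	char *data;
};

static int blobs_freed; /* bookkeeping for the sketch only */

static void blob_free(void *self)
{
	struct blob *b = self; /* valid: the header is the first member */

	free(b->data);
	free(b);
	blobs_freed++;
}

static struct blob *blob_new(void)
{
	struct blob *b = calloc(1, sizeof(*b));

	if (b == NULL)
		return NULL;
	b->cached.refcount = 1;
	b->cached.free_object = blob_free;
	return b;
}
```

With this shape, the decref path no longer needs to know the object type, which is the switch statement the comment proposes to remove.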

arrbee (Member) commented Apr 11, 2013

I've been trying to figure out the best way for me to add value while reviewing this PR. The core logic seems fine and until we introduce cache eviction, it seems to work fine and be pretty straightforward.

I'm somewhat concerned about thread safety, so I decided that I'd actually write some more tests to exercise the cache and to try loading objects from multiple threads. I have some stuff in progress and I thought I was seeing threading issues, but now I think I may just have some bugs in my test code. I'm still working on it and I'll try to push something tomorrow to help with more caching test code.

arrbee (Member) commented Apr 11, 2013

Hmm, no, I think these really are threading issues, although not coming quite from where I would expect them to arise. Let me poke around a little more to make sure I'm not being stupid, then I'll post the tests...

vmg (Member, Author) commented Apr 22, 2013

Man, this branch is getting outta control. Gonna try to merge it today even if it's not fully fleshed out.

vmg and others added 16 commits April 22, 2013 16:50

  • This uses the odb object accessors so we can change the internals
    more easily...

  • Add a git_cache_set_max_object_size method that does more checking
    around setting the max object size.  Also add a git_cache_size to
    read the number of objects currently in the cache.  This makes it
    easier to write tests.

  • When I was writing threading tests for the new cache, the main
    error I kept running into was a pack file having its content
    unmapped underneath the running thread.  This adds a lock around
    the routines that map and unmap the pack data so that threads can
    effectively reload the data when they need it.

    This also required reworking the error handling paths in a couple
    of places in the code, which I tried to make consistent.

  • This adds some basic tests for the oidmap just to make sure that
    collisions, etc. are dealt with correctly.

    This also adds some tests for the new caching that check if items
    are inserted (or not inserted) properly into the cache, and that
    the cache can hold up in a multithreaded environment without error.

  • This adds create and free callbacks to the git_objects_table so
    that more of the creation and destruction of objects can be table
    driven instead of using switch statements.  This also makes the
    semantics of certain object creation functions consistent so that
    we can make better use of function pointers.  This also fixes a
    theoretical error case where an object allocation fails and we
    end up storing NULL into the cache.

  • This unifies the object parse functions into one signature that
    takes an odb_object.

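
The pack-file locking described in the commit messages above can be illustrated roughly as follows. This is a sketch under assumptions, not the code from the PR: `struct pack` and its functions are hypothetical, `calloc`/`free` stand in for mapping and unmapping, and POSIX `pthread_mutex_t` stands in for libgit2's `git_mutex`.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pack file whose data can be (un)mapped. */
struct pack {
	pthread_mutex_t lock;
	char *data;  /* NULL while unmapped; calloc stands in for mmap */
	size_t size;
};

static int pack_init(struct pack *p, size_t size)
{
	p->data = NULL;
	p->size = size;
	return pthread_mutex_init(&p->lock, NULL);
}

/* Map the data if it is not mapped. Returns 0 on success, -1 on failure. */
static int pack_ensure_mapped(struct pack *p)
{
	int error = 0;

	pthread_mutex_lock(&p->lock);
	if (p->data == NULL) {
		p->data = calloc(1, p->size);
		if (p->data == NULL)
			error = -1;
	}
	pthread_mutex_unlock(&p->lock);
	return error;
}

/* Unmap the data; a later pack_ensure_mapped() call reloads it. */
static void pack_unmap(struct pack *p)
{
	pthread_mutex_lock(&p->lock);
	free(p->data);
	p->data = NULL;
	pthread_mutex_unlock(&p->lock);
}
```

Note that in this sketch the lock only serializes mapping and unmapping. It does not by itself keep the data mapped while another thread is still reading it, so a reader would need to re-check or hold its own reference.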
arrbee and others added 9 commits April 22, 2013 16:52

  • This builds on the earlier thread safety work to make it so that
    setting the odb, index, refdb, or config for a repository is done
    in a threadsafe manner with minimized locking time.  This is done
    by adding a lock to the repository object and using it to guard
    the assignment of the above listed pointers.  The lock is only
    held to assign the pointer value.

    This also contains some minor fixes to the other work with pack
    files to reduce the time that locks are held and fix an
    apparent memory leak.

  • This removes the lock from the repository object and changes the
    internals to use the new atomic git__compare_and_swap to update
    the _odb, _config, _index, and _refdb variables in a threadsafe
    manner.

  • The indexer was creating a packfile object separately from the
    code in pack.c which was a problem since I put a call to
    git_mutex_init into just pack.c.  This commit updates the pack
    function for creating a new pack object (i.e. git_packfile_check())
    so that it can be used in both places and then makes indexer.c
    use the shared initialization routine.

    There are also a few minor formatting and warning message fixes.

  • Rename git_packfile_check to git_packfile_alloc since it is now
    being used more in that capacity.  Fix the various places that use
    it.  Consolidate some repeated code in odb_pack.c related to the
    allocation of a new pack_backend.

vmg (Member, Author) commented Apr 22, 2013

...And this looks just dandy. Shipping now and iterating on this. Thank you @arrbee for all the help!

vmg pushed a commit that referenced this pull request Apr 22, 2013
vmg merged commit d08dd72 into development Apr 22, 2013
phatblat pushed a commit to phatblat/libgit2 that referenced this pull request Sep 13, 2014
nulltoken deleted the vmg/new-cache branch November 8, 2014 23:46

3 participants