New caching #1454

vmg · 2013-04-03T13:22:59Z

New cache code. Did this during the flight.

As long promised, here's a unified ODB object/parsed object cache that transparently upgrades between the two of them. It uses khash as the backend.

TODO:

Setters/getters for tunable cache parameters
Expiration of old entries (LRU? random?)
Size calculation for parsed objects
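To make the "transparently upgrades" idea concrete, here is a minimal sketch of a cache whose entries are either raw or parsed, where storing a parsed object over a raw one upgrades the entry and the reverse is ignored. Everything here is an assumption for illustration: the `toy_*` names are hypothetical, a small fixed array stands in for the khash map, and an `int` key stands in for the object id. It is not the code in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_kind { TOY_RAW, TOY_PARSED };

struct toy_entry {
	int key;             /* stands in for the object id */
	enum toy_kind kind;  /* raw ODB data or parsed object */
	const char *value;
	int used;
};

#define TOY_SLOTS 8
struct toy_cache { struct toy_entry slots[TOY_SLOTS]; };

static void toy_cache_init(struct toy_cache *c)
{
	memset(c, 0, sizeof(*c));
}

static struct toy_entry *toy_cache_find(struct toy_cache *c, int key)
{
	size_t i;
	for (i = 0; i < TOY_SLOTS; ++i)
		if (c->slots[i].used && c->slots[i].key == key)
			return &c->slots[i];
	return NULL;
}

/* Store an entry and return whatever ends up cached for `key`,
 * or NULL when the toy cache is full. */
static struct toy_entry *toy_cache_store(
	struct toy_cache *c, int key, enum toy_kind kind, const char *value)
{
	struct toy_entry *e = toy_cache_find(c, key);
	size_t i;

	if (e != NULL) {
		/* Upgrade raw -> parsed in place; never downgrade. */
		if (e->kind == TOY_RAW && kind == TOY_PARSED) {
			e->kind = TOY_PARSED;
			e->value = value;
		}
		return e;
	}

	for (i = 0; i < TOY_SLOTS; ++i) {
		if (!c->slots[i].used) {
			c->slots[i] = (struct toy_entry){ key, kind, value, 1 };
			return &c->slots[i];
		}
	}
	return NULL; /* full; a real cache would evict instead */
}
```

Callers only ever ask the cache for a key; whether they get a raw or a parsed entry depends on what has been stored so far.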

wat

carlosmn · 2013-04-05T11:20:22Z

src/cache.c

-	git__free(cache->nodes);
+	/* do not infinite loop if there's not enough entries to evict  */
+	if (evict_count > kh_size(cache->map))
+		return;


Shouldn't we try to evict as many as we can (say, evict_count = kh_size(cache->map)) instead of just ignoring it?
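One way to read this suggestion is to clamp the requested count instead of returning early. A minimal sketch, assuming a hypothetical helper `clamp_evict_count` where `available` stands in for `kh_size(cache->map)`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decide how many entries to actually evict.
 * `requested` is the evict_count asked for; `available` stands in
 * for kh_size(cache->map). */
static size_t clamp_evict_count(size_t requested, size_t available)
{
	/* Never ask for more than the map holds, so an eviction loop
	 * that counts down to zero always terminates. */
	return requested > available ? available : requested;
}
```

With the clamp in place, a request larger than the map simply empties the map rather than being skipped.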

arrbee · 2013-04-10T22:51:52Z

So, I know it has always been this way, but I don't love it that the git_odb always contains an initialized git_cache even though it will mostly use the git_cache from the containing git_repository. I can imagine a number of alternatives, such as lazy allocation of the cache if the owner of the ODB is not set or what have you, and I'm sure you can think of them, too. If I step back from it, I guess we're only talking a fairly small memory savings (and one less git_mutex_init) so from that perspective it's probably just a minor optimization that's not worth it. Just doesn't feel as clean as it could be.

arrbee · 2013-04-10T23:02:15Z

Also along the lines of things that are not introduced by this PR but probably could be addressed, git_repository_odb__weakptr does not seem thread safe, because it calls GIT_REFCOUNT_OWN after the repo->_odb has already been set. Probably the pointer to the newly opened ODB should be stored on the stack until the owner has been set, and then swapped in to the _odb pointer (preferably with something like __sync_val_compare_and_swap if we want to go there, although I think that's just protecting against a memory leak so probably not so critical).
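The pattern described here (build the object privately, set its owner, then publish it with a compare-and-swap) could look roughly like the sketch below. It uses C11 `<stdatomic.h>` rather than the GCC builtin mentioned above, and `toy_odb`, `toy_repo`, and `toy_repo_odb` are hypothetical stand-ins, not the libgit2 types or functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Stand-ins for git_odb / git_repository; not the libgit2 types. */
struct toy_odb { void *owner; };
struct toy_repo { _Atomic(struct toy_odb *) odb; };

/* Return the repo's ODB, creating it on first use. */
static struct toy_odb *toy_repo_odb(struct toy_repo *repo)
{
	struct toy_odb *odb = atomic_load(&repo->odb);
	struct toy_odb *expected = NULL;

	if (odb != NULL)
		return odb;

	/* Build the new object on the side and set its owner first,
	 * so no other thread can observe a half-initialized ODB. */
	odb = calloc(1, sizeof(*odb));
	if (odb == NULL)
		return NULL;
	odb->owner = repo;

	/* Publish only if nobody else got there first. */
	if (atomic_compare_exchange_strong(&repo->odb, &expected, odb))
		return odb;

	/* Lost the race: drop our copy (this is the leak the swap
	 * protects against) and use the winner's, now in `expected`. */
	free(odb);
	return expected;
}
```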

arrbee · 2013-04-10T23:04:08Z

Anyhow, let me get back to the actual content of this PR... BTW, so far it is looking awesome! I really like how smooth the cache usage is once you push the notion of RAW vs PARSED entities into the cache. The object.c code is nice to follow.

arrbee · 2013-04-10T23:15:08Z

src/oidmap.h

-	for (i = 0; i < 20; ++i)
-		h = (h << 5) - h + oid->id[i];
+	khint_t h;
+	memcpy(&h, oid, sizeof(khint_t));


So much better. Why do you put up with me? 😁
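For readers comparing the two versions of the hash, here is a standalone sketch of both. The assumptions: `toy_oid` stands in for `git_oid` (20 raw bytes), `uint32_t` stands in for `khint_t`, and the old loop is shown starting from `h = 0`, which the diff above does not show.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_OID_RAWSZ 20

/* Stand-in for git_oid: 20 raw bytes of SHA-1. */
struct toy_oid { unsigned char id[TOY_OID_RAWSZ]; };

/* Old approach: mix all 20 bytes (h = h * 31 + byte). */
static uint32_t toy_oid_hash_mix(const struct toy_oid *oid)
{
	uint32_t h = 0; /* starting value is an assumption */
	int i;
	for (i = 0; i < TOY_OID_RAWSZ; ++i)
		h = (h << 5) - h + oid->id[i];
	return h;
}

/* New approach: the id is already a cryptographic hash, so its
 * leading bytes can be used as the table hash directly. */
static uint32_t toy_oid_hash_prefix(const struct toy_oid *oid)
{
	uint32_t h;
	memcpy(&h, oid->id, sizeof(h));
	return h;
}
```

Two ids that share their first four bytes get the same prefix hash, which is fine for a hash table as long as key equality still compares the full id.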

arrbee · 2013-04-11T00:47:30Z

Reading this all over, what do you think about putting a callback void (*free_object)(void *self); into the git_cached_obj and have that set explicitly by git_object__from_odb_object() (or copied from the git_objects_table) and new_odb_object()? I think we could remove git_object__free() completely and replace the switch statement in git_cached_obj_decref(). My main concern is that this is a slippery slope for other such callbacks. If you think this is interesting and are too busy, I can submit another PR with the proposal...
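A rough sketch of what the proposed callback might look like, to make the trade-off easier to discuss. All names are hypothetical (`toy_cached_obj`, `toy_blob`, and so on), and the refcount here is a plain `int` for brevity, whereas a real implementation would need an atomic counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for git_cached_obj with the proposed callback added. */
struct toy_cached_obj {
	int refcount;                    /* NOT atomic in this sketch */
	void (*free_object)(void *self); /* set by whoever creates the object */
};

/* Drop one reference. The object frees itself through its callback,
 * so no switch on the object type is needed here. */
static void toy_cached_obj_decref(struct toy_cached_obj *obj)
{
	assert(obj != NULL && obj->refcount > 0);
	if (--obj->refcount == 0)
		obj->free_object(obj);
}

/* Example object type embedding the cached header as first member. */
struct toy_blob {
	struct toy_cached_obj cached;
	char *data;
};

static int toy_blobs_freed; /* counter for demonstration only */

static void toy_blob_free(void *self)
{
	struct toy_blob *blob = self;
	free(blob->data);
	free(blob);
	toy_blobs_freed++;
}

static struct toy_blob *toy_blob_new(void)
{
	struct toy_blob *blob = calloc(1, sizeof(*blob));
	if (blob == NULL)
		return NULL;
	blob->cached.refcount = 1;
	blob->cached.free_object = toy_blob_free;
	return blob;
}
```

The cost is one extra pointer per cached object; the benefit is that the decref path no longer needs to know about object types.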

arrbee · 2013-04-11T23:44:37Z

I've been trying to figure out the best way for me to add value while reviewing this PR. The core logic seems fine and until we introduce cache eviction, it seems to work fine and be pretty straightforward.

I'm somewhat concerned about thread safety, so I decided that I'd actually write some more tests to exercise the cache and to try loading objects from multiple threads. I have some stuff in progress and I thought I was seeing threading issues, but now I think I may just have some bugs in my test code. I'm still working on it and I'll try to push something tomorrow to help with more caching test code.

arrbee · 2013-04-11T23:59:59Z

Hmm, no, I think these really are threading issues, although not coming quite from where I would expect them to arise. Let me poke around a little more to make sure I'm not being stupid, then I'll post the tests...

vmg · 2013-04-22T14:45:27Z

Man, this branch is getting outta control. Gonna try to merge it today even if it's not fully fleshed out.

This uses the odb object accessors so we can change the internals more easily...

Add a git_cache_set_max_object_size method that does more checking around setting the max object size. Also add a git_cache_size to read the number of objects currently in the cache. This makes it easier to write tests.

When I was writing threading tests for the new cache, the main error I kept running into was a pack file having its content unmapped underneath the running thread. This adds a lock around the routines that map and unmap the pack data so that threads can effectively reload the data when they need it. This also required reworking the error handling paths in a couple of places in the code which I tried to make consistent.
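The locking idea can be sketched as follows, under heavy simplification: a heap buffer stands in for the mapped pack data, POSIX `pthread_mutex_t` stands in for `git_mutex`, and every `toy_*` name is hypothetical. The reader takes the lock, remaps if another thread dropped the mapping, and only then touches the data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PACK_SIZE 16

struct toy_pack {
	pthread_mutex_t lock;
	unsigned char *data; /* NULL while "unmapped" */
	int map_count;       /* how many times the data was (re)loaded */
};

static int toy_pack_init(struct toy_pack *p)
{
	p->data = NULL;
	p->map_count = 0;
	return pthread_mutex_init(&p->lock, NULL);
}

/* Caller must hold p->lock. Loads the data if it is not mapped. */
static int toy_pack_map_locked(struct toy_pack *p)
{
	if (p->data != NULL)
		return 0;
	p->data = malloc(TOY_PACK_SIZE);
	if (p->data == NULL)
		return -1;
	memset(p->data, 0xab, TOY_PACK_SIZE); /* fake pack contents */
	p->map_count++;
	return 0;
}

/* Drop the mapping; readers will reload it on their next access. */
static void toy_pack_unmap(struct toy_pack *p)
{
	pthread_mutex_lock(&p->lock);
	free(p->data);
	p->data = NULL;
	pthread_mutex_unlock(&p->lock);
}

/* Read one byte, remapping first if needed. Returns 0 on success. */
static int toy_pack_read(struct toy_pack *p, size_t off, unsigned char *out)
{
	int error = -1;

	pthread_mutex_lock(&p->lock);
	if (off < TOY_PACK_SIZE && toy_pack_map_locked(p) == 0) {
		*out = p->data[off];
		error = 0;
	}
	pthread_mutex_unlock(&p->lock);
	return error;
}
```

Holding the lock for the whole read is the simplest correct variant; the later commits in this PR talk about shortening the time locks are held, which this sketch does not attempt.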

This adds some basic tests for the oidmap just to make sure that collisions, etc. are dealt with correctly. This also adds some tests for the new caching that check if items are inserted (or not inserted) properly into the cache, and that the cache can hold up in a multithreaded environment without error.

This adds create and free callbacks to the git_objects_table so that more of the creation and destruction of objects can be table driven instead of using switch statements. This also makes the semantics of certain object creation functions consistent so that we can make better use of function pointers. This also fixes a theoretical error case where an object allocation fails and we end up storing NULL into the cache.
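As an illustration of table-driven creation and destruction, here is a small sketch with one row per object type. The table layout, the `toy_*` names, and the two example types are assumptions; the actual `git_objects_table` fields may differ.

```c
#include <assert.h>
#include <stdlib.h>

enum toy_type { TOY_BAD = 0, TOY_COMMIT, TOY_TREE, TOY_TYPE_COUNT };

struct toy_commit { int parents; };
struct toy_tree { int entries; };

static void *toy_commit_create(void) { return calloc(1, sizeof(struct toy_commit)); }
static void *toy_tree_create(void) { return calloc(1, sizeof(struct toy_tree)); }

/* One row per object type: how to create it and how to free it. */
static const struct toy_object_def {
	const char *name;
	void *(*create)(void);
	void (*free)(void *self);
} toy_objects_table[TOY_TYPE_COUNT] = {
	[TOY_BAD]    = { "", NULL, NULL },
	[TOY_COMMIT] = { "commit", toy_commit_create, free },
	[TOY_TREE]   = { "tree", toy_tree_create, free },
};

/* Create an object through the table; NULL on bad type or failed
 * allocation, which the caller must check before caching it. */
static void *toy_object_create(enum toy_type type)
{
	if (type <= TOY_BAD || type >= TOY_TYPE_COUNT)
		return NULL;
	return toy_objects_table[type].create();
}

/* Free an object through the table; no per-type switch needed. */
static void toy_object_free(enum toy_type type, void *obj)
{
	if (obj != NULL && type > TOY_BAD && type < TOY_TYPE_COUNT)
		toy_objects_table[type].free(obj);
}
```

Because every `create` function shares one signature and reports failure the same way, the caller has a single place to check for NULL before anything is stored in the cache.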

This unifies the object parse functions into one signature that takes an odb_object.

This builds on the earlier thread safety work to make it so that setting the odb, index, refdb, or config for a repository is done in a threadsafe manner with minimized locking time. This is done by adding a lock to the repository object and using it to guard the assignment of the above listed pointers. The lock is only held to assign the pointer value. This also contains some minor fixes to the other work with pack files to reduce the time that locks are held and to fix an apparent memory leak.

This removes the lock from the repository object and changes the internals to use the new atomic git__compare_and_swap to update the _odb, _config, _index, and _refdb variables in a threadsafe manner.

The indexer was creating a packfile object separately from the code in pack.c which was a problem since I put a call to git_mutex_init into just pack.c. This commit updates the pack function for creating a new pack object (i.e. git_packfile_check()) so that it can be used in both places and then makes indexer.c use the shared initialization routine. There are also a few minor formatting and warning message fixes.

Rename git_packfile_check to git_packfile_alloc since it is now being used more in that capacity. Fix the various places that use it. Consolidate some repeated code in odb_pack.c related to the allocation of a new pack_backend.

vmg · 2013-04-22T15:07:15Z

...And this looks just dandy. Shipping now and iterating on this. Thank you @arrbee for all the help!

New caching

carlosmn reviewed Apr 5, 2013

arrbee mentioned this pull request Apr 10, 2013

Beta/Alpha version release? #1336

Closed

arrbee reviewed Apr 10, 2013

vmg and others added 16 commits April 22, 2013 16:50

lol this worked first try wtf

5df1842

Per-object filtering

6b90e24

Random eviction

c4e91d4

What has science done.

8842c75

Duplicated type object

cf7850a

Per-object max size

064236c

Some stats

d9d423e

No longer needed

e16e268

Clear the cache when there are too many items to expire

e183e37

Global option setters

ee12272

Use git_odb_object_data/_size whereever possible

badd85a

This uses the odb object accessors so we can change the internals more easily...

Add range checking around cache opts

b12b72e

Add a git_cache_set_max_object_size method that does more checking around setting the max object size. Also add a git_cache_size to read the number of objects currently in the cache. This makes it easier to write tests.

Simplify object table parse functions

3f27127

This unifies the object parse functions into one signature that takes an odb_object.

arrbee and others added 9 commits April 22, 2013 16:52

clean up tree pointer casting

116bbdf

Add git__compare_and_swap and use it

e976b56

This removes the lock from the repository object and changes the internals to use the new atomic git__compare_and_swap to update the _odb, _config, _index, and _refdb variables in a threadsafe manner.

Fixes for Windows cas/threading stuff

c628918

Consolidate packfile allocation further

5d2d21e

Rename git_packfile_check to git_packfile_alloc since it is now being used more in that capacity. Fix the various places that use it. Consolidate some repeated code in odb_pack.c related to the allocation of a new pack_backend.

tests: Cleanup commit parse testing code

865e2dd

tests: Do not warn for unused variable

cf9709b

cache: Max cache size, and evict when the cache fills up

d877159

vmg pushed a commit that referenced this pull request Apr 22, 2013

Merge pull request #1454 from libgit2/vmg/new-cache

d08dd72

New caching

vmg merged commit d08dd72 into development Apr 22, 2013

phatblat pushed a commit to phatblat/libgit2 that referenced this pull request Sep 13, 2014

Merge pull request libgit2#1454 from libgit2/vmg/new-cache

586b9a5

New caching

nulltoken deleted the vmg/new-cache branch November 8, 2014 23:46
