New caching #1454
Conversation
git__free(cache->nodes);
/* do not infinite loop if there's not enough entries to evict */
if (evict_count > kh_size(cache->map))
	return;
Shouldn't we try to evict as many as we can (say, evict_count = kh_size(cache->map)) instead of just ignoring it?
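A minimal sketch of that suggestion, assuming the same names as the diff above (the clamp replaces the early return, so an oversized request still evicts whatever the map actually holds):

	/* Suggested variant (not the code in this PR): never try to evict
	 * more entries than the map currently contains. */
	if (evict_count > kh_size(cache->map))
		evict_count = kh_size(cache->map);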
So, I know it has always been this way, but I don't love it that the
Also along the lines of things that are not introduced by this PR but probably could be addressed,
Anyhow, let me get back to the actual content of this PR... BTW, so far it is looking awesome! I really like how smooth the cache usage is once you push the notion of RAW vs PARSED entities into the cache. The
for (i = 0; i < 20; ++i)
	h = (h << 5) - h + oid->id[i];
khint_t h;
memcpy(&h, oid, sizeof(khint_t));
So much better. Why do you put up with me? 😁
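For readers following along, a sketch of how the whole hash function might read with that change; the wrapper name is made up, and the point is simply that SHA-1 output is already uniformly distributed, so the oid's leading bytes can serve as the hash without a mixing loop:

	/* Hypothetical wrapper around the new approach: reuse the leading
	 * bytes of the (already well-distributed) SHA-1 as the hash value. */
	static khint_t oid_hash(const git_oid *oid)
	{
		khint_t h;
		memcpy(&h, oid->id, sizeof(khint_t));
		return h;
	}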
Reading this all over, what do you think about putting a callback
I've been trying to figure out the best way for me to add value while reviewing this PR. The core logic seems fine and until we introduce cache eviction, it seems to work fine and be pretty straightforward. I'm somewhat concerned about thread safety, so I decided that I'd actually write some more tests to exercise the cache and to try loading objects from multiple threads. I have some stuff in progress and I thought I was seeing threading issues, but now I think I may just have some bugs in my test code. I'm still working on it and I'll try to push something tomorrow to help with more caching test code.
Hmm, no, I think these really are threading issues, although not coming quite from where I would expect them to arise. Let me poke around a little more to make sure I'm not being stupid, then I'll post the tests...
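For anyone trying to reproduce this outside the project's test harness, a rough standalone sketch of the kind of stress test being described; the repository path and object id are placeholders, error handling is minimal, and the init calls reflect the libgit2 API of that era:

	/* Standalone sketch: several threads repeatedly look up the same
	 * object so the shared cache and pack-mapping paths run concurrently. */
	#include <git2.h>
	#include <pthread.h>

	#define NUM_THREADS 8

	static git_repository *g_repo;
	static git_oid g_oid;

	static void *lookup_worker(void *arg)
	{
		int i;
		(void)arg;
		for (i = 0; i < 1000; i++) {
			git_object *obj;
			if (git_object_lookup(&obj, g_repo, &g_oid, GIT_OBJ_ANY) == 0)
				git_object_free(obj);
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t threads[NUM_THREADS];
		int i;

		git_threads_init();
		git_repository_open(&g_repo, "/path/to/repo.git"); /* placeholder */
		git_oid_fromstr(&g_oid, "0123456789abcdef0123456789abcdef01234567"); /* placeholder */

		for (i = 0; i < NUM_THREADS; i++)
			pthread_create(&threads[i], NULL, lookup_worker, NULL);
		for (i = 0; i < NUM_THREADS; i++)
			pthread_join(threads[i], NULL);

		git_repository_free(g_repo);
		git_threads_shutdown();
		return 0;
	}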
Man, this branch is getting outta control. Gonna try to merge it today even if it's not fully fleshed out.
This uses the odb object accessors so we can change the internals more easily...
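For context, the public accessors involved look roughly like this in use (a small illustrative fragment, not code from the PR):

	/* Illustrative use of the git_odb_object accessors rather than
	 * reaching into the struct fields directly. */
	#include <git2.h>
	#include <stdio.h>

	static void print_object_info(git_odb_object *obj)
	{
		printf("type %d, size %zu bytes\n",
		       (int)git_odb_object_type(obj),
		       git_odb_object_size(obj));
		/* git_odb_object_data(obj) returns a read-only pointer to the
		 * raw object contents. */
	}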
Add a git_cache_set_max_object_size method that does more checking around setting the max object size. Also add a git_cache_size to read the number of objects currently in the cache. This makes it easier to write tests.
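The exact internal signatures are not shown in this thread, so the following usage is only an assumption based on the commit message; both parameter lists are guesses:

	/* Assumed usage in a test (signatures guessed from the commit
	 * message, not taken from the real internal header):
	 * cap cached blobs at 4 KiB, then check the cache's entry count. */
	git_cache_set_max_object_size(GIT_OBJ_BLOB, 4096);
	assert(git_cache_size(&cache) == expected_entries);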
When I was writing threading tests for the new cache, the main error I kept running into was a pack file having its content unmapped underneath the running thread. This adds a lock around the routines that map and unmap the pack data so that threads can effectively reload the data when they need it. This also required reworking the error handling paths in a couple places in the code, which I tried to make consistent.
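A generic sketch of the guarding pattern described; the struct and helper names below are illustrative, not the actual pack.c code (which uses libgit2's git_mutex wrapper rather than raw pthreads):

	/* Illustrative pattern: serialize mapping/unmapping of shared pack
	 * data so one thread cannot unmap contents another thread is using. */
	#include <pthread.h>
	#include <stddef.h>

	struct packfile {
		pthread_mutex_t lock;
		void *mapped;    /* mapped pack contents, NULL when unmapped */
	};

	int remap_pack_data(struct packfile *p); /* hypothetical helper */

	static int pack_ensure_mapped(struct packfile *p)
	{
		int error = 0;
		pthread_mutex_lock(&p->lock);
		if (p->mapped == NULL)
			error = remap_pack_data(p);
		pthread_mutex_unlock(&p->lock);
		return error;
	}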
This adds some basic tests for the oidmap just to make sure that collisions, etc. are dealt with correctly. This also adds some tests for the new caching that check if items are inserted (or not inserted) properly into the cache, and that the cache can hold up in a multithreaded environment without error.
This adds create and free callbacks to the git_objects_table so that more of the creation and destruction of objects can be table driven instead of using switch statements. This also makes the semantics of certain object creation functions consistent so that we can make better use of function pointers. This also fixes a theoretical error case where an object allocation fails and we end up storing NULL into the cache.
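A generic illustration of the table-driven idea; the names and entry layout are made up and do not match the real git_objects_table:

	/* Illustrative per-type dispatch table: creation and destruction
	 * are driven by function pointers instead of switch statements. */
	typedef struct {
		const char *name;                        /* e.g. "commit" */
		int (*create)(void **out, void *odb_obj);
		void (*free)(void *obj);
	} object_vtable;

	/* hypothetical per-type constructors and destructors */
	int commit_create(void **out, void *odb_obj);
	void commit_free(void *obj);
	int blob_create(void **out, void *odb_obj);
	void blob_free(void *obj);

	static const object_vtable object_types[] = {
		{ "commit", commit_create, commit_free },
		{ "blob",   blob_create,   blob_free },
	};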
This unifies the object parse functions into one signature that takes an odb_object.
This builds on the earlier thread safety work to make it so that setting the odb, index, refdb, or config for a repository is done in a threadsafe manner with minimized locking time. This is done by adding a lock to the repository object and using it to guard the assignment of the above listed pointers. The lock is only held to assign the pointer value. This also contains some minor fixes to the other work with pack files to reduce the time that locks are being held and to fix an apparent memory leak.
This removes the lock from the repository object and changes the internals to use the new atomic git__compare_and_swap to update the _odb, _config, _index, and _refdb variables in a threadsafe manner.
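An illustrative sketch of the lock-free setter pattern; libgit2 wraps the underlying primitive as git__compare_and_swap, while this fragment uses the GCC __sync builtin directly and invents the surrounding function:

	/* Illustrative setter: publish a new odb pointer only if no other
	 * thread has installed one yet; on losing the race, free ours and
	 * keep using the winner's. */
	#include <git2.h>

	static void repo_set_odb(git_odb **slot, git_odb *new_odb)
	{
		git_odb *existing =
			__sync_val_compare_and_swap(slot, NULL, new_odb);
		if (existing != NULL)
			git_odb_free(new_odb);  /* another thread won the race */
	}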
The indexer was creating a packfile object separately from the code in pack.c which was a problem since I put a call to git_mutex_init into just pack.c. This commit updates the pack function for creating a new pack object (i.e. git_packfile_check()) so that it can be used in both places and then makes indexer.c use the shared initialization routine. There are also a few minor formatting and warning message fixes.
Rename git_packfile_check to git_packfile_alloc since it is now being used more in that capacity. Fix the various places that use it. Consolidate some repeated code in odb_pack.c related to the allocation of a new pack_backend.
...And this looks just dandy. Shipping now and iterating on this. Thank you @arrbee for all the help!
New cache code. Did this during the flight.
As long promised, here's a unified ODB object/parsed object cache that transparently upgrades between the two of them. It uses khash as the backend.
TODO:
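To make the "transparent upgrade" idea concrete, a conceptual sketch with made-up names; the real cache is keyed on the oid and backed by khash, but the shape of the upgrade is the interesting part:

	/* Conceptual sketch: each slot holds either the RAW odb data or the
	 * PARSED object, and is upgraded in place the first time a caller
	 * needs the parsed form. All names here are invented. */
	enum cached_kind { CACHED_RAW, CACHED_PARSED };

	struct cached_entry {
		enum cached_kind kind;
		void *payload;   /* raw buffer or parsed object */
	};

	void *parse_object(void *raw); /* hypothetical parser */

	static void *cache_lookup_parsed(struct cached_entry *entry)
	{
		if (entry->kind == CACHED_RAW) {
			entry->payload = parse_object(entry->payload);
			entry->kind = CACHED_PARSED;
		}
		return entry->payload;
	}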
wat