JMMackenzie
Contributor

This adds a tool for merging an index with multiple shards into an index with a single shard.

I need help writing tests for the tool and verifying that it works.
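For readers who want a picture of what "merging multiple shards into a single shard" involves, here is a minimal, self-contained sketch. It is not the `MergeShards.java` code from this PR (that code is not shown in this conversation) and it does not use Lucene. It assumes a toy model in which each shard maps a term to a sorted list of shard-local document IDs. The names `ShardMergeSketch`, `Shard`, and `merge` are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a sharded inverted index; all names here are hypothetical. */
final class ShardMergeSketch {

    /** One shard: its document count and its postings (term -> ascending local doc IDs). */
    static final class Shard {
        final int numDocs;
        final Map<String, List<Integer>> postings;

        Shard(int numDocs, Map<String, List<Integer>> postings) {
            this.numDocs = numDocs;
            this.postings = postings;
        }
    }

    /**
     * Merges shards in the order given. Local doc IDs in shard i are shifted by the
     * total number of documents in shards 0..i-1, so merged IDs remain unique and
     * every merged postings list remains sorted.
     */
    static Map<String, List<Integer>> merge(List<Shard> shards) {
        Map<String, List<Integer>> merged = new TreeMap<>();
        int base = 0; // first global doc ID assigned to the current shard
        for (Shard shard : shards) {
            for (Map.Entry<String, List<Integer>> e : shard.postings.entrySet()) {
                List<Integer> out = merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
                for (int localId : e.getValue()) {
                    out.add(base + localId);
                }
            }
            base += shard.numDocs;
        }
        return merged;
    }

    public static void main(String[] args) {
        Shard a = new Shard(3, Map.of("index", List.of(0, 2), "merge", List.of(1)));
        Shard b = new Shard(2, Map.of("index", List.of(1), "shard", List.of(0, 1)));
        // prints {index=[0, 2, 4], merge=[1], shard=[3, 4]}
        System.out.println(merge(List.of(a, b)));
    }
}
```

A test for a real merge tool could follow the same shape: build a small multi-shard index, merge it, and check that document counts and postings match what a single-shard build of the same input produces.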


codecov bot commented Apr 8, 2025

Codecov Report

Attention: Patch coverage is 0% with 17 lines in your changes missing coverage. Please review.

Project coverage is 70.02%. Comparing base (1af285b) to head (a4574d4).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/main/java/io/anserini/index/MergeShards.java | 0.00% | 17 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2776      +/-   ##
============================================
- Coverage     70.12%   70.02%   -0.10%     
  Complexity     1316     1316              
============================================
  Files           186      187       +1     
  Lines         11922    11939      +17     
  Branches       1414     1415       +1     
============================================
  Hits           8360     8360              
- Misses         3035     3052      +17     
  Partials        527      527              


@vincent-4
Member

@JMMackenzie I will take a look and test this so we can get it merged soon. @b8zhong has added some tests, and I will correct and update them as necessary. Thanks again for this 👍
Which indexes are you using it with? Having a reference index would also help.

@JMMackenzie
Contributor Author

I usually use Anserini to build CIFF indexes that I import to other tools like PISA, but I almost always forget to use -optimize when I index. So this tool fixes that.

You could try it on MSMARCO-v1 and MSMARCO-v2 (passages) if you want something large. I'm not sure how long it would take for v2, though.

Thanks for your help!

@lintool self-requested a review April 15, 2025 12:14
@lintool
Member

lintool commented Apr 15, 2025

Actually, I approved before noticing broken tests. Should probably fix before merging...

@JMMackenzie
Contributor Author

I'm not sure why these tests fail - I can't seem to generate any meaningful output. It seems the VM is crashing (I get "The forked VM terminated without properly saying goodbye" from surefire). Someone with more experience may be able to find a solution.

4 participants