Conversation

Contributor

@prav0761 prav0761 commented Oct 13, 2025

Setup:

  • Username: @prav0761
  • Date: 2025-10-13
  • Commit: 4a2f9a0
  • OS: macOS Ventura
  • Python 3.11
  • Result: Everything worked successfully
  1. "Begin your journey here" worked perfectly, including the dataset downloads and data prep.
  2. The "BM25 Baselines for MS MARCO Passage Ranking in Anserini" chapter worked perfectly with just a few changes.
    2.1 The index wouldn't build with the command below because of a memory issue, so I reduced the threads from 9 to 4, after which it worked perfectly:
bin/run.sh io.anserini.index.IndexCollection \
  -collection JsonCollection \
  -input collections/msmarco-passage/collection_jsonl \
  -index indexes/msmarco-passage/lucene-index-msmarco \
  -generator DefaultLuceneDocumentGenerator \
  -threads 9 -storePositions -storeDocvectors -storeRaw
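
For reference, the adjusted indexing invocation from 2.1 might look like the sketch below. It assumes the command is run from the Anserini repository root; the `THREADS` variable and the existence check on `bin/run.sh` are illustrative additions, not part of the original command. Only the `-threads` value differs from the command above.

```shell
# Sketch only: same indexing command as above, with the thread count lowered.
# THREADS is a hypothetical helper variable introduced for illustration.
THREADS=4  # reduced from 9 to lower memory pressure during indexing

if [ -x bin/run.sh ]; then
  bin/run.sh io.anserini.index.IndexCollection \
    -collection JsonCollection \
    -input collections/msmarco-passage/collection_jsonl \
    -index indexes/msmarco-passage/lucene-index-msmarco \
    -generator DefaultLuceneDocumentGenerator \
    -threads "$THREADS" -storePositions -storeDocvectors -storeRaw
else
  # Guard so the snippet fails gently outside an Anserini checkout.
  echo "bin/run.sh not found; run this from the Anserini repository root" >&2
fi
```

A thread count of 4 is what worked on this machine; other machines may tolerate more or need fewer.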

    2.2 The retrieval step didn't work because of a memory issue as well, so I had to change the command in run.sh to:

java -cp `ls target/*-fatjar.jar` -Xms512M -Xmx8G --add-modules jdk.incubator.vector $@

    Here I changed -Xmx192G to -Xmx8G.
    2.3 After these changes, retrieval worked perfectly and all the scores matched the expected values.

  3. "Dense Retrieval for MS MARCO Passage Ranking in Anserini" worked successfully, and the scores matched the expected scores.

@prav0761 prav0761 closed this Oct 13, 2025
@prav0761 prav0761 reopened this Oct 13, 2025
@prav0761 prav0761 changed the title from "Add reproduction log entry for MS MARCO" to "Add reproduction log entry for Praveen[Anserini]" Oct 13, 2025
@lintool lintool merged commit d6d8a3e into castorini:master Oct 15, 2025
1 check passed