
Conversation

@henry4516
Contributor

Machine:
Was: macOS Sequoia 15.6.1
Now: UWaterloo student Linux server
openjdk 21
Python 3.13.2

The data preparation process worked well. However, while following the instructions that build Anserini, the download triggered when running MS MARCO en v1.5 was interrupted by accident (my fault: I didn't expect it to take long and didn't use the caffeinate command). As a result, the index file existed but was incomplete. When I ran the step again, it reported that the index file already existed, skipped the download, and went straight to decompressing the file, which led to errors. I tried to find the file so that I could delete it and re-download it, but I couldn't locate it. I may be misunderstanding something, but I would suggest adding a step that checks the integrity of downloaded files instead of using them directly.
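
As a rough illustration of the integrity check suggested above, here is a minimal Python sketch. It assumes an expected MD5 checksum for the downloaded archive is available from somewhere; the function names and the checksum source are hypothetical and are not taken from Anserini. The idea is to treat a file as usable only if its checksum matches, and to delete a partial file so that the next run downloads it again instead of trying to decompress it.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_download_intact(path: Path, expected_md5: str) -> bool:
    """True only if the file exists and its checksum matches (hypothetical helper)."""
    return path.is_file() and file_md5(path) == expected_md5.lower()


def ensure_intact_or_remove(path: Path, expected_md5: str) -> bool:
    """Keep a valid file; delete a partial or corrupt one so it can be re-downloaded."""
    if is_download_intact(path, expected_md5):
        return True
    path.unlink(missing_ok=True)  # no error if the file was never created
    return False
```

A download step could call `ensure_intact_or_remove` before its "file exists, skipping download" shortcut, so that an interrupted download is no longer mistaken for a finished one.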
When running Maven, the build appeared to succeed, but the compiled Anserini didn't produce the expected results. For instance, in the indexing section, it produced only a 1G index instead of the expected 4.3G. I rolled back to previous versions with git, but it still didn't work even after I redid everything, including the data preparation. Suspecting the OS, I switched from macOS to Linux. To keep the Java version consistent, I also re-downloaded JDK 21 and redid the setup, which took some time.
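
The size mismatch described above (1G instead of 4.3G) can be caught with a quick check after indexing. The following is only a sketch: the helper names and the 10% tolerance are assumptions chosen for illustration, and the expected size has to come from the instructions being followed.

```python
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root, recursively."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def index_looks_complete(root: Path, expected_bytes: int, tolerance: float = 0.1) -> bool:
    """True if the on-disk size is within a relative tolerance of the expected size.

    The tolerance is an arbitrary illustrative default, not a documented threshold.
    """
    actual = directory_size_bytes(root)
    return abs(actual - expected_bytes) <= tolerance * expected_bytes
```

For the case in this comment, calling `index_looks_complete` on the index directory with `expected_bytes=int(4.3 * 1024**3)` would flag a roughly 1G index as incomplete.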

The programs ran well after switching to the Linux server, apart from the CPU runtime limits, which mainly affected indexing and retrieval. For retrieval, I recall adjusting the number of threads to stay within the time limits: 4 was too many, 1 was too slow, and 2 was good enough.

@lintool self-requested a review October 15, 2025 23:37
@lintool merged commit e011b38 into castorini:master Oct 15, 2025
1 check passed

2 participants