Machine:
Was: macOS Sequoia 15.6.1
Now: UWaterloo student Linux server
OpenJDK 21
Python 3.13.2
Data preparation worked well. However, while following the instructions that build Anserini, the download for MS MARCO en v1.5 was interrupted by accident (my fault: I didn't expect it to take long and didn't run it under the caffeinate command). As a result, the index file exists but is incomplete. When I ran the step again, it reported that the index file exists, skipped the download, and went straight to decompressing the file, which led to errors. I tried to find the file so I could delete it and re-download it, but I couldn't locate it. I may be misunderstanding something, but I would suggest adding a check that verifies the integrity of downloaded files before using them.
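A minimal sketch of the kind of integrity check I have in mind, assuming a known-good checksum is available for each download. The function names and the choice of SHA-256 are hypothetical; this is not Anserini's actual download code, just an illustration of verifying a cached file before decompressing it.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large archive does not need to fit in memory."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_complete(path: Path, expected_digest: str, algorithm: str = "sha256") -> bool:
    """Return True only if the file exists and its digest matches the expected one."""
    if not path.is_file():
        return False
    return file_digest(path, algorithm) == expected_digest.lower()


def discard_if_corrupt(path: Path, expected_digest: str, algorithm: str = "sha256") -> bool:
    """Delete a partial or corrupt download; return True if the file was removed."""
    if path.exists() and not is_complete(path, expected_digest, algorithm):
        path.unlink()
        return True
    return False
```

With something like this, the "file exists, skipping download" branch would first call `discard_if_corrupt` on the cached archive, so an interrupted download is removed and fetched again instead of being decompressed.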
When running Maven, the build appeared to succeed, but the compiled Anserini didn't give the expected results. For instance, in the indexing section it produced only 1G of index data instead of the expected 4.3G. I went back to previous versions with git, but it still didn't work, even after redoing every step, including the data preparation. I suspected the OS, so I switched from macOS to Linux. To keep the program versions consistent, I also re-downloaded Java JDK 21 and redid the setup, which took some time.
Everything ran well after switching to the Linux server, except for the CPU runtime limits, which mainly affected indexing and retrieval. For retrieval, I recall changing the number of threads to meet the time requirements: 4 was too many, 1 was too slow, and 2 was good enough.