Added Anserini logs #3000

henry4516 · 2025-10-15T23:15:44Z

Machine:
Was: macOS Sequoia 15.6.1
Now: UWaterloo student linux server
openjdk 21
Python 3.13.2

Data preparation went smoothly. However, while following the instructions that compile Anserini, the download triggered when running MS MARCO en v1.5 was accidentally interrupted (my fault: I didn't expect it to take long and didn't run it under the caffeinate command). As a result, the index file exists but is incomplete. When I ran it again, it reported that the index file exists, skipped the download, and started decompressing the file, which led to errors. I tried to find the file so I could delete it and re-download it, but I couldn't locate it. I may be misunderstanding something, but I would suggest adding a check that verifies the integrity of downloaded files before using them.
When running Maven, the build appeared to go fine, but the compiled Anserini didn't give the expected results. For instance, in the indexing section it produced only a 1G index instead of the expected 4.3G. I rolled back ("git back") to previous versions, but it still didn't work, even after redoing everything, including the data. I suspected the OS, so I switched from macOS to Linux. To keep the program versions consistent, I also re-downloaded JDK 21 and redid the setup, which took some time.
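
The integrity check suggested above could look roughly like the following. This is only a minimal sketch, not Anserini's actual download code: the helper names `sha256_of` and `ensure_valid_download` are hypothetical, and it assumes a known-good SHA-256 checksum is available for each downloadable file, which may not match how the project publishes checksums.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large archive does not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_valid_download(path: Path, expected_sha256: str) -> bool:
    """Return True if the cached file can be reused, False if it must be downloaded.

    Hypothetical helper: a file that exists but fails the checksum (for example,
    an interrupted download) is deleted so the next attempt starts from scratch.
    """
    if not path.exists():
        return False
    if sha256_of(path) == expected_sha256.lower():
        return True
    path.unlink()  # incomplete or corrupted: remove it instead of decompressing it
    return False
```

With a check like this, "file exists" alone would no longer be enough to skip the download: an incomplete file would be removed and fetched again before any decompression is attempted.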

The programs ran well after switching to the Linux server, apart from the CPU runtime limits, which mattered most for indexing and retrieval. For retrieval, I remember changing the number of threads to meet the time requirements: 4 was too many, 1 was too slow, and 2 was good enough.

Added Anserini logs

4a0abf5

lintool self-requested a review October 15, 2025 23:37

lintool approved these changes Oct 15, 2025

lintool merged commit e011b38 into castorini:master Oct 15, 2025
1 check passed
