diff --git a/README.md b/README.md index 888a0637e1..dbc18854a4 100644 --- a/README.md +++ b/README.md @@ -533,6 +533,8 @@ Substitute the appropriate `$MODEL` from the table below. ### BRIGHT Regressions +BRIGHT is a retrieval benchmark described [here](https://arxiv.org/abs/2407.12883). + | Corpus | Baselines | |---------------------|:----------------------------------------------------------------:| | **StackExchange** | | @@ -553,11 +555,11 @@ Substitute the appropriate `$MODEL` from the table below. ### Available Corpora for Download -| Corpora | Size | Checksum | -|:---------------------------------------------------------------------------------------------------------------------------------------|-------:|:-----------------------------------| -| [Post-Processed](https://huggingface.co/datasets/castorini/collections-bright/resolve/main/bright-corpus.tar) | 297 MB | `d8c829f0e4468a8ce62768b6a1162158` | +| Corpora | Size | Checksum | +|:--------------------------------------------------------------------------------------------------------------|-------:|:-----------------------------------| +| [Post-Processed](https://huggingface.co/datasets/castorini/collections-bright/resolve/main/bright-corpus.tar) | 297 MB | `d8c829f0e4468a8ce62768b6a1162158` | -The [BRIGHT](https://arxiv.org/abs/2407.12883) corpus used here was processed from Hugging Face with these [scripts](https://github.com/ielab/llm-rankers/tree/main/Rank-R1/bright). +The BRIGHT corpora above were processed from Hugging Face with [these scripts](https://github.com/ielab/llm-rankers/tree/main/Rank-R1/bright).
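For a manual download, the corpus tarball in the table above can be fetched and checked directly. This is a hedged sketch, assuming `curl` and GNU `md5sum` are available on the path; the URL and checksum are the ones listed in the table.

```shell
# Download the post-processed BRIGHT corpus tarball, verify it against the
# MD5 checksum from the table above, and unpack it on success.
curl -L -O https://huggingface.co/datasets/castorini/collections-bright/resolve/main/bright-corpus.tar
echo "d8c829f0e4468a8ce62768b6a1162158  bright-corpus.tar" | md5sum -c -
tar xf bright-corpus.tar
```

On macOS, `md5 -q bright-corpus.tar` prints the digest for manual comparison, since BSD `md5` lacks a `-c` check mode.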
diff --git a/docs/fatjar-regressions/fatjar-regressions-v0.36.1.md b/docs/fatjar-regressions/fatjar-regressions-v0.36.1.md index 5a88f935a1..0f3d31d23d 100644 --- a/docs/fatjar-regressions/fatjar-regressions-v0.36.1.md +++ b/docs/fatjar-regressions/fatjar-regressions-v0.36.1.md @@ -53,7 +53,7 @@ The table below reports effectiveness (dev in terms of RR@100, DL21-DL23, RAGgy | BM25 doc (k1=0.9, b=0.4) | 0.1654 | 0.1732 | 0.5183 | 0.2991 | 0.2914 | 0.3631 | | BM25 doc-segmented (k1=0.9, b=0.4) | 0.1973 | 0.2000 | 0.5778 | 0.3576 | 0.3356 | 0.4227 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1 @@ -270,7 +270,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10, | cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 | | cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage @@ -440,7 +440,7 @@ The table below reports the effectiveness of the models (nDCG@10): | `climate-fever` | 0.1651 | 0.2129 | 0.2297 | 0.2298 | 0.3119 | 0.3117 | | `scifact` | 0.6789 | 0.6647 | 0.7041 | 0.7036 | 0.7408 | 0.7408 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir diff --git a/docs/fatjar-regressions/fatjar-regressions-v0.37.0.md b/docs/fatjar-regressions/fatjar-regressions-v0.37.0.md index 7e0ee10231..c64736ae8d 100644 --- a/docs/fatjar-regressions/fatjar-regressions-v0.37.0.md +++ b/docs/fatjar-regressions/fatjar-regressions-v0.37.0.md @@ -108,7 +108,7 @@ The table below reports effectiveness (dev 
in terms of RR@100, DL21-DL23, RAGgy | BM25 doc (k1=0.9, b=0.4) | 0.1654 | 0.1732 | 0.5183 | 0.2991 | 0.2914 | 0.3631 | | BM25 doc-segmented (k1=0.9, b=0.4) | 0.1973 | 0.2000 | 0.5778 | 0.3576 | 0.3356 | 0.4227 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1 @@ -278,7 +278,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10, | cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 | | cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage @@ -451,7 +451,7 @@ The table below reports the effectiveness of the models (nDCG@10): | `climate-fever` | 0.1651 | 0.2129 | 0.2297 | 0.2298 | 0.3119 | 0.3117 | 0.3119 | 0.3117 | | `scifact` | 0.6789 | 0.6647 | 0.7041 | 0.7036 | 0.7408 | 0.7408 | 0.7408 | 0.7408 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir diff --git a/docs/fatjar-regressions/fatjar-regressions-v0.38.0.md b/docs/fatjar-regressions/fatjar-regressions-v0.38.0.md index 753ec17f7b..0caefa4f92 100644 --- a/docs/fatjar-regressions/fatjar-regressions-v0.38.0.md +++ b/docs/fatjar-regressions/fatjar-regressions-v0.38.0.md @@ -108,7 +108,7 @@ The table below reports effectiveness (dev in terms of RR@100, DL21-DL23, RAGgy | BM25 doc (k1=0.9, b=0.4) | 0.1654 | 0.1732 | 0.5183 | 0.2991 | 0.2914 | 0.3631 | | BM25 doc-segmented (k1=0.9, b=0.4) | 0.1973 | 0.2000 | 0.5778 | 0.3576 | 0.3356 | 0.4227 | -The follow command will reproduce the above experiments: +The following command will 
reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1 @@ -278,7 +278,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10, | cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 | | cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage @@ -451,7 +451,7 @@ The table below reports the effectiveness of the models (nDCG@10): | `climate-fever` | 0.1651 | 0.2129 | 0.2297 | 0.2298 | 0.3119 | 0.3117 | 0.3119 | 0.3117 | | `scifact` | 0.6789 | 0.6647 | 0.7041 | 0.7036 | 0.7408 | 0.7408 | 0.7408 | 0.7408 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir diff --git a/docs/fatjar-regressions/fatjar-regressions-v0.39.0.md b/docs/fatjar-regressions/fatjar-regressions-v0.39.0.md index 3e4aafc437..495b60a5a5 100644 --- a/docs/fatjar-regressions/fatjar-regressions-v0.39.0.md +++ b/docs/fatjar-regressions/fatjar-regressions-v0.39.0.md @@ -148,7 +148,7 @@ The table below reports effectiveness (dev in terms of RR@100, DL21-DL23, RAGgy | BM25 doc (k1=0.9, b=0.4) | 0.1654 | 0.1732 | 0.5183 | 0.2991 | 0.2914 | 0.3631 | | BM25 doc-segmented (k1=0.9, b=0.4) | 0.1973 | 0.2000 | 0.5778 | 0.3576 | 0.3356 | 0.4227 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1 @@ -318,7 +318,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10, | cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 
0.7245 | | cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage @@ -491,7 +491,7 @@ The table below reports the effectiveness of the models (nDCG@10): | `climate-fever` | 0.1651 | 0.2129 | 0.2297 | 0.2298 | 0.3119 | 0.3117 | 0.3119 | 0.3117 | | `scifact` | 0.6789 | 0.6647 | 0.7041 | 0.7036 | 0.7408 | 0.7408 | 0.7408 | 0.7408 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir diff --git a/docs/fatjar-regressions/fatjar-regressions-v1.0.0.md b/docs/fatjar-regressions/fatjar-regressions-v1.0.0.md index 6c029a6da8..1ac3842de6 100644 --- a/docs/fatjar-regressions/fatjar-regressions-v1.0.0.md +++ b/docs/fatjar-regressions/fatjar-regressions-v1.0.0.md @@ -227,7 +227,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10, | cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 | | cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage @@ -402,7 +402,7 @@ The table below reports the effectiveness of the models (nDCG@10): | `climate-fever` | 0.1651 | 0.2129 | 0.2297 | 0.2298 | 0.3119 | 0.3117 | 0.3119 | 0.3117 | | `scifact` | 0.6789 | 0.6647 | 0.7041 | 0.7036 | 0.7408 | 0.7408 | 0.7408 | 0.7408 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir diff --git 
a/docs/fatjar-regressions/fatjar-regressions-v1.1.0.md b/docs/fatjar-regressions/fatjar-regressions-v1.1.0.md index 8c940006aa..75bdf1bf16 100644 --- a/docs/fatjar-regressions/fatjar-regressions-v1.1.0.md +++ b/docs/fatjar-regressions/fatjar-regressions-v1.1.0.md @@ -229,7 +229,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10, | cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 | | cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage @@ -400,7 +400,7 @@ The table below reports the effectiveness of the models (nDCG@10): | `climate-fever` | 0.1651 | 0.2129 | 0.2625 | 0.2625 | 0.3119 | 0.3117 | 0.3119 | 0.3117 | | `scifact` | 0.6789 | 0.6647 | 0.7140 | 0.7140 | 0.7408 | 0.7408 | 0.7408 | 0.7408 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir diff --git a/docs/fatjar-regressions/fatjar-regressions-v1.1.1.md b/docs/fatjar-regressions/fatjar-regressions-v1.1.1.md index b1b11e03ad..80946286de 100644 --- a/docs/fatjar-regressions/fatjar-regressions-v1.1.1.md +++ b/docs/fatjar-regressions/fatjar-regressions-v1.1.1.md @@ -229,7 +229,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10, | cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 | | cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage @@ -400,7 +400,7 @@ The table below 
reports the effectiveness of the models (nDCG@10): | `climate-fever` | 0.1651 | 0.2129 | 0.2625 | 0.2625 | 0.3119 | 0.3117 | 0.3119 | 0.3117 | | `scifact` | 0.6789 | 0.6647 | 0.7140 | 0.7140 | 0.7408 | 0.7408 | 0.7408 | 0.7408 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir diff --git a/docs/fatjar-regressions/fatjar-regressions-v1.2.0.md b/docs/fatjar-regressions/fatjar-regressions-v1.2.0.md index 98cb18380f..25d51d7e58 100644 --- a/docs/fatjar-regressions/fatjar-regressions-v1.2.0.md +++ b/docs/fatjar-regressions/fatjar-regressions-v1.2.0.md @@ -37,7 +37,9 @@ Using the [UMBRELA qrels](https://trec-rag.github.io/annoucements/umbrela-qrels/ | RAG24 Test (UMBRELA): nDCG@100 | 0.2563 | 0.4855 | | RAG24 Test (UMBRELA): Recall@100 | 0.1395 | 0.2547 | -See instructions below on how to reproduce these runs; more details can be found in the following paper: +See instructions below on how to reproduce these runs; more details can be found in the following two papers: + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. > Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. 
@@ -229,17 +231,75 @@ The table below reports the effectiveness of the models (dev in terms of RR@10, | cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 | | cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage ``` +To print out the commands that will generate the above runs without performing the runs, use the options `-dryRun -printCommands`. + +## MS MARCO V2.1 Segmented Documents + +❗ Beware, the (automatically downloaded) indexes for running these experiments take up several hundred GBs. + +The MS MARCO V2.1 collections were created for the [TREC RAG Track](https://trec-rag.github.io/). +There were two variants: the documents corpus and the segmented documents corpus. +The documents corpus served as the source of the segmented documents corpus, but the segmented documents corpus is the one used in official TREC RAG evaluations. 
+The following table reports nDCG@20 scores for various retrieval conditions: + +| | RAG 24 UMBRELA | RAG 24 NIST | +|-----------------------------------------------|:--------------:|:-----------:| +| baselines | 0.3198 | 0.2809 | +| SPLADE-v3 | 0.5167 | 0.4642 | +| Arctic-embed-l (`shard00`, HNSW int8 indexes) | 0.3003 | 0.2449 | +| Arctic-embed-l (`shard01`, HNSW int8 indexes) | 0.2599 | 0.2184 | +| Arctic-embed-l (`shard02`, HNSW int8 indexes) | 0.2661 | 0.2211 | +| Arctic-embed-l (`shard03`, HNSW int8 indexes) | 0.2705 | 0.2388 | +| Arctic-embed-l (`shard04`, HNSW int8 indexes) | 0.2937 | 0.2253 | +| Arctic-embed-l (`shard05`, HNSW int8 indexes) | 0.2590 | 0.2383 | +| Arctic-embed-l (`shard06`, HNSW int8 indexes) | 0.2444 | 0.2336 | +| Arctic-embed-l (`shard07`, HNSW int8 indexes) | 0.2417 | 0.2255 | +| Arctic-embed-l (`shard08`, HNSW int8 indexes) | 0.2847 | 0.2765 | +| Arctic-embed-l (`shard09`, HNSW int8 indexes) | 0.2432 | 0.2457 | + +The following command will reproduce the above experiments: + +```bash +java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1-doc-segmented +``` + +To print out the commands that will generate the above runs without performing the runs, use the options `-dryRun -printCommands`. + +## MS MARCO V2.1 Documents + +❗ Beware, the (automatically downloaded) indexes for running these experiments take up several hundred GBs. + +The MS MARCO V2.1 collections were created for the [TREC RAG Track](https://trec-rag.github.io/). +There were two variants: the documents corpus and the segmented documents corpus. +The documents corpus served as the source of the segmented documents corpus, but is not otherwise used in any formal evaluations. +It primarily served development purposes for the TREC 2024 RAG evaluation, where previous qrels from MS MARCO V2 and DL21-DL23 were "projected over" to this corpus. 
+ + The table below reports effectiveness (dev in terms of RR@100, DL21-DL23, RAGgy in terms of nDCG@10): + +| | dev | dev2 | DL21 | DL22 | DL23 | RAGgy | +|:-------------------|-------:|-------:|-------:|-------:|-------:|-------:| +| BM25 doc | 0.1654 | 0.1732 | 0.5183 | 0.2991 | 0.2914 | 0.3631 | +| BM25 doc-segmented | 0.1973 | 0.2000 | 0.5778 | 0.3576 | 0.3356 | 0.4227 | + +The following command will reproduce the above experiments: + +```bash +java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1 +``` + +To print out the commands that will generate the above runs without performing the runs, use the options `-dryRun -printCommands`. + ## BEIR ❗ Beware, the (automatically downloaded) indexes for running these experiments take up several hundred GBs. -Currently, Anserini provides support for the following models: +Here is a selection of models that are currently supported in Anserini: + Flat = BM25, "flat" bag-of-words baseline + MF = BM25, "multifield" bag-of-words baseline @@ -283,9 +343,41 @@ The table below reports the effectiveness of the models (nDCG@10): | `climate-fever` | 0.1651 | 0.2129 | 0.2625 | 0.2625 | 0.3119 | 0.3117 | 0.3119 | 0.3117 | | `scifact` | 0.6789 | 0.6647 | 0.7140 | 0.7140 | 0.7408 | 0.7408 | 0.7408 | 0.7408 | -The follow command will reproduce the above experiments: +The following command will reproduce the above experiments: ```bash java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir ``` +To print out the commands that will generate the above runs without performing the runs, use the options `-dryRun -printCommands`. + +## BRIGHT + +BRIGHT is a retrieval benchmark described [here](https://arxiv.org/abs/2407.12883). 
+The following table reports nDCG@10 scores for BM25 baselines: + +| Corpus | BM25 | +|--------------------|:------:| +| **StackExchange** | | +| Biology | 0.1824 | +| Earth Science | 0.2791 | +| Economics | 0.1645 | +| Psychology | 0.1342 | +| Robotics | 0.1091 | +| Stack Overflow | 0.1626 | +| Sustainable Living | 0.1613 | +| **Coding** | | +| LeetCode | 0.2471 | +| Pony | 0.0434 | +| **Theorems** | | +| AoPS | 0.0645 | +| TheoremQA-Q | 0.0733 | +| TheoremQA-T | 0.0214 | + +The following command will reproduce the above experiments: + +```bash +java -cp $ANSERINI_JAR io.anserini.reproduce.RunBright +``` + +To print out the commands that will generate the above runs without performing the runs, use the options `-dryRun -printCommands`. diff --git a/docs/prebuilt-indexes.md b/docs/prebuilt-indexes.md index 7abc631690..35ed4fa48d 100644 --- a/docs/prebuilt-indexes.md +++ b/docs/prebuilt-indexes.md @@ -1,11 +1,11 @@ # Anserini: Prebuilt Indexes Anserini ships with a number of prebuilt indexes. -This means that various indexes (inverted indexes, HNSW indexes, etc.) for common collections used in NLP and IR research have already been built and just needs to be downloaded (from UWaterloo/Hugging Face servers), which Anserini will handle automatically for you. +This means that various indexes (inverted indexes, HNSW indexes, etc.) for common collections used in NLP and IR research have already been built and just need to be downloaded (from UWaterloo and Hugging Face servers), which Anserini will handle automatically for you. -Bindings for the available prebuilt indexes are in [`io.anserini.index.IndexInfo`](https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexInfo.java) and below. +Bindings for the available prebuilt indexes are in [`io.anserini.index.IndexInfo`](https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexInfo.java) as Java enums. 
For example, if you specify `-index msmarco-v1-passage`, Anserini will know that you mean the Lucene index of the MS MARCO V1 passage corpus. -It will then download the index from the servers and cache locally. +It will then download the index from the specified location(s) and cache it locally. All of this happens automagically! ## Getting Started @@ -16,7 +16,7 @@ To download a prebuilt index and view its statistics, you can use the following bin/run.sh io.anserini.index.IndexReaderUtils -index cacm -stats ``` -The output of the command will be: +The output of the above command will be: ``` Index statistics @@ -27,28 +27,28 @@ unique terms: 14363 total terms: 320968 ``` -Note that unless the underlying index was built with the `-optimize` option (i.e., merging all index segments into a single segment), `unique_terms` will show -1. +Note that for inverted indexes, unless the underlying index was built with the `-optimize` option (i.e., merging all index segments into a single segment), `unique_terms` will show -1. Nope, that's not a bug. ## Managing Indexes -The downloaded index will by default be in `~/.cache/pyserini/indexes/`. -(Yes, `pyserini`; this is so prebuilt indexes from both Pyserini and Anserini can live in the same location.) +Downloaded indexes are by default stored in `~/.cache/pyserini/indexes/`. +(Yes, `pyserini`, that's not a bug — this is so prebuilt indexes can be shared between Pyserini and Anserini.) You can specify a custom cache directory by setting the environment variable `$ANSERINI_INDEX_CACHE` or the system property `anserini.index.cache`. Another helpful tip is to download and manage the indexes by hand. -All relevant information is stored in [`IndexInfo`](https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexInfo.java). 
-For example, `msmarco-v1-passage` can be downloaded from: +As an example, from [`IndexInfo`](https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexInfo.java) you can see that `msmarco-v1-passage` can be downloaded from: ``` https://huggingface.co/datasets/castorini/prebuilt-indexes-msmarco-v1/resolve/main/passage/original/lucene-inverted/tf/lucene-inverted.msmarco-v1-passage.20221004.252b5e.tar.gz ``` -and has an MD5 checksum of `678876e8c99a89933d553609a0fd8793`. -You can download, verify, and put anywhere you want. +The tarball has an MD5 checksum of `678876e8c99a89933d553609a0fd8793`. + +You can download, verify, unpack, and put the index anywhere you want. With `-index /path/to/index/` you'll get exactly the same output as `-index msmarco-v1-passage`, except now you've got fine-grained control over managing the index. -By manually managing the indexes, you can share indexes between multiple users to conserve space. +By manually managing indexes, you can share indexes between multiple users to conserve space. The schema of the index location in `~/.cache/pyserini/indexes/` is the tarball name (after unpacking), followed by a dot and the checksum, so `msmarco-v1-passage` lives in following location: ``` @@ -56,7 +56,7 @@ The schema of the index location in `~/.cache/pyserini/indexes/` is the tarball ``` You can download the index once, put in a common location, and have each user symlink to the actual index location. -Source would conform to the schema above, target would be where your index actually resides. +The source of the symlink would conform to the schema above, and the target of the symlink would be where your index actually resides. ## Recovering from Partial Downloads @@ -72,7 +72,9 @@ Then start over (e.g., rerun the command you were running before). Below is a summary of the prebuilt indexes that are currently available. 
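The symlinking setup described above can be sketched as follows. The shared location is illustrative (pick any path your users can read); the cache entry name follows the tarball-name-plus-checksum schema given for `msmarco-v1-passage`.

```shell
# Share one physical copy of the msmarco-v1-passage index between users:
# each user symlinks the expected cache entry to the shared location.
SHARED=/srv/shared-indexes/lucene-inverted.msmarco-v1-passage.20221004.252b5e  # illustrative path
CACHE="$HOME/.cache/pyserini/indexes"
mkdir -p "$CACHE"
# The cache entry (source of the symlink) is: tarball name + "." + MD5 checksum.
ln -sfn "$SHARED" "$CACHE/lucene-inverted.msmarco-v1-passage.20221004.252b5e.678876e8c99a89933d553609a0fd8793"
```

With the link in place, `-index msmarco-v1-passage` resolves through the cache entry to the shared copy, so only one user needs to hold the actual index on disk.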
-Note that this page is automatically generated from [this script](../src/test/java/io/anserini/doc/GeneratePrebuiltIndexesDocTest.java), so do not modify this page directly; modify the script instead. +Note that this page is automatically generated from [this test case](../src/test/java/io/anserini/doc/GeneratePrebuiltIndexesDocTest.java). +This means that the page is updated with every (successful) build. +Therefore, do not modify this page directly; modify the test case instead. ### Lucene Flat Indexes
diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.md index c6c66d6671..a239c1ab2e 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.md @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. 
The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.md index 47d5b3075e..ac73322ada 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.md @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. 
[A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.md index 0bf64a9894..d8b790f33c 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.md @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. 
-See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.md index 52bacea239..d352a65f6f 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.md @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. 
The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.md index dd44dc4ba3..5561da8e3f 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.md @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. 
[A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.md index a1cafb42b1..a23baa379f 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.md @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. 
-See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.md index 00e57dcd62..0a2c09ca87 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.md @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. 
The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.md index bfe7c28e66..d6397f358c 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.md @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. 
[A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.md index eab43b5a2d..8d86a8c0d7 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.md @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. 
-See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.md index 4b1448ab75..f0ac550d8f 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.md @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. 
The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.md index 2e067d7241..83565c5f5f 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.md @@ -8,9 +8,11 @@ Instructions for downloading the corpus can be found [here](https://trec-rag.git Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. 
+ +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. Here, we cover bag-of-words baselines where each _segment_ in the MS MARCO V2.1 segmented document corpus is treated as a unit of indexing. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.splade-v3.cached.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.splade-v3.cached.md index 6ab04a15b3..3401815ac1 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.splade-v3.cached.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.splade-v3.cached.md @@ -14,9 +14,11 @@ In these experiments, we are using cached queries (i.e., cached results of query Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. 
+ +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.splade-v3.cached.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.splade-v3.cached.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-nist.splade-v3.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-nist.splade-v3.onnx.md index a47cd5f847..5402a41c59 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-nist.splade-v3.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-nist.splade-v3.onnx.md @@ -14,9 +14,11 @@ In these experiments, we are using ONNX to perform query encoding on the fly. Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. 
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-nist.splade-v3.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.splade-v3.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.md index bcae2028b5..9127d00704 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.md @@ -14,9 +14,11 @@ This page documents experiments for `shard00`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. 
Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.md index e05f4e726f..ed4b85b8ec 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.md @@ -14,9 +14,11 @@ This page documents experiments for `shard01`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. 
The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.md index 83db3cf107..ecd511ba9e 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.md @@ -14,9 +14,11 @@ This page documents experiments for `shard02`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. 
[A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.md index 1a1f69a202..87922a6677 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.md @@ -14,9 +14,11 @@ This page documents experiments for `shard03`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. 
-UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.md index c9b09c42a9..41f72eced1 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.md @@ -14,9 +14,11 @@ This page documents experiments for `shard04`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. 
The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.md index 1c0292aeae..cb0a2098e9 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.md @@ -14,9 +14,11 @@ This page documents experiments for `shard05`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. 
[A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.md index a19980cd9c..6e0fb147f4 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.md @@ -14,9 +14,11 @@ This page documents experiments for `shard06`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. 
-UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.md index 5ed6ea18b8..5c087c9444 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.md @@ -14,9 +14,11 @@ This page documents experiments for `shard07`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. 
The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.md index 58ed39d4bb..08e91963db 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.md @@ -14,9 +14,11 @@ This page documents experiments for `shard08`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. 
[A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.md index 8a06d9baa9..38fd90edd8 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.md @@ -14,9 +14,11 @@ This page documents experiments for `shard09`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. 
-UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.md index 8400c96a1d..8db7d77375 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.md @@ -7,9 +7,11 @@ This corpus was derived from the MS MARCO V2 _segmented_ document corpus and pre Instructions for downloading the corpus can be found [here](https://trec-rag.github.io/annoucements/2024-corpus-finalization/). Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. Here, we cover bag-of-words baselines where each _segment_ in the MS MARCO V2.1 segmented document corpus is treated as a unit of indexing. 
diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.splade-v3.cached.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.splade-v3.cached.md index f638762f79..b14d3996fc 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.splade-v3.cached.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.splade-v3.cached.md @@ -13,9 +13,11 @@ See the [official SPLADE repo](https://github.com/naver/splade) and the followin In these experiments, we are using cached queries (i.e., cached results of query encoding). Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. 
The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.splade-v3.cached.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.splade-v3.cached.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.splade-v3.onnx.md b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.splade-v3.onnx.md index 078248ffd5..234553b12c 100644 --- a/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.splade-v3.onnx.md +++ b/docs/regressions/regressions-rag24-doc-segmented-test-umbrela.splade-v3.onnx.md @@ -13,9 +13,11 @@ See the [official SPLADE repo](https://github.com/naver/splade) and the followin In these experiments, we are using ONNX to perform query encoding on the fly. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. 
[A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/rag24-doc-segmented-test-umbrela.splade-v3.onnx.yaml). Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.splade-v3.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/java/io/anserini/reproduce/RunMsMarco.java b/src/main/java/io/anserini/reproduce/RunMsMarco.java index 0ab5faa1a6..d8b72c2de1 100644 --- a/src/main/java/io/anserini/reproduce/RunMsMarco.java +++ b/src/main/java/io/anserini/reproduce/RunMsMarco.java @@ -16,20 +16,13 @@ package io.anserini.reproduce; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.HashMap; -import java.util.HashSet; -import java.util.List; -import java.util.Map; -import java.util.Set; - +import io.anserini.reproduce.RunRepro.TrecEvalMetricDefinitions; import org.kohsuke.args4j.CmdLineException; import org.kohsuke.args4j.CmdLineParser; import org.kohsuke.args4j.Option; import org.kohsuke.args4j.ParserProperties; -import io.anserini.reproduce.RunRepro.TrecEvalMetricDefinitions; +import java.util.*; public class RunMsMarco { public static class Args extends RunRepro.Args { diff --git a/src/main/java/io/anserini/reproduce/RunRepro.java b/src/main/java/io/anserini/reproduce/RunRepro.java index 367848b560..545c071052 100644 --- a/src/main/java/io/anserini/reproduce/RunRepro.java +++ b/src/main/java/io/anserini/reproduce/RunRepro.java @@ -16,17 +16,22 @@ package io.anserini.reproduce; -import java.io.IOException; -import java.io.InputStream; -import java.net.URISyntaxException; -import java.util.*; -import java.io.File; - import com.fasterxml.jackson.annotation.JsonProperty; import com.fasterxml.jackson.databind.ObjectMapper; import com.fasterxml.jackson.dataformat.yaml.YAMLFactory; +import org.apache.commons.lang3.time.DurationFormatUtils; import org.kohsuke.args4j.Option; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.net.URISyntaxException; +import java.util.HashMap; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.TimeUnit; + public class RunRepro { // ANSI escape code for red text private static final String RED = "\u001B[31m"; @@ -76,6 +81,7 @@ public 
void run() throws IOException, InterruptedException, URISyntaxException { ProcessBuilder pb; Process process; + final long start = System.nanoTime(); for (Condition condition : config.conditions) { System.out.printf("# Running condition \"%s\": %s \n%n", condition.name, condition.display); for (Topic topic : condition.topics) { @@ -168,6 +174,9 @@ public void run() throws IOException, InterruptedException, URISyntaxException { } } } + + final long durationMillis = TimeUnit.MILLISECONDS.convert(System.nanoTime() - start, TimeUnit.NANOSECONDS); + System.out.println("Total run time: " + DurationFormatUtils.formatDuration(durationMillis, "HH:mm:ss")); } public static class Config { diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.template index 2f8a97566b..8bd9643fb2 100644 --- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.template +++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.template @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. 
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.template index 43e4af2818..4a869cb32c 100644 --- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.template +++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.template @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. 
-See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.template index 135d7922a7..f6a8364fd5 100644 --- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.template +++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.template @@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX, Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set. These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments. -See the following paper for more details: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.template
index 8c829082dc..5b7493fddf 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.template
@@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX,
 Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set.
 These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments.
 
-See the following paper for more details:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.template
index 5a3197ba63..5beb8e3422 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.template
@@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX,
 Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set.
 These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments.
 
-See the following paper for more details:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.template
index ae09073308..e1074762a8 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.template
@@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX,
 Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set.
 These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments.
 
-See the following paper for more details:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.template
index 1f9f445606..1fad86b622 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.template
@@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX,
 Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set.
 These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments.
 
-See the following paper for more details:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.template
index 283a96ba43..fed68a9f62 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.template
@@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX,
 Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set.
 These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments.
 
-See the following paper for more details:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.template
index d8136da6a7..4f194f8fab 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.template
@@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX,
 Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set.
 These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments.
 
-See the following paper for more details:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.template
index 9f65212af8..dc275579e5 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.template
@@ -15,9 +15,11 @@ In these experiments, we are performing query inference "on-the-fly" with ONNX,
 Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set.
 These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments.
 
-See the following paper for more details:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.splade-v3.cached.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.splade-v3.cached.template
index 5b8c6e76ca..75cfd8833e 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.splade-v3.cached.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.splade-v3.cached.template
@@ -14,9 +14,11 @@ In these experiments, we are using cached queries (i.e., cached results of query
 Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set.
 These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments.
 
-See the following paper for more details:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.splade-v3.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.splade-v3.onnx.template
index 273daceba6..f0b7cdf7d8 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.splade-v3.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.splade-v3.onnx.template
@@ -14,9 +14,11 @@ In these experiments, we are using ONNX to perform query encoding on the fly.
 Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set.
 These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments.
 
-See the following paper for more details:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.template
index 4b44548b79..70eb223ddf 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-nist.template
@@ -8,9 +8,11 @@ Instructions for downloading the corpus can be found [here](https://trec-rag.git
 Evaluation uses qrels over 89 topics from the TREC 2024 RAG Track test set.
 These qrels represent manual relevance judgments from NIST assessors, contrasted with automatically generated UMBRELA judgments.
 
-See the following paper for more details:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 Here, we cover bag-of-words baselines where each _segment_ in the MS MARCO V2.1 segmented document corpus is treated as a unit of indexing.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.template
index 1fc4a1864e..f07f244787 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.template
@@ -14,9 +14,11 @@ This page documents experiments for `shard00`; we expect the corpus to be in `ms
 In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes.
 Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set.
 
-UMBRELA is described in the following paper:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.template
index 417ff3778f..b825277b1a 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.template
@@ -14,9 +14,11 @@ This page documents experiments for `shard01`; we expect the corpus to be in `ms
 In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes.
 Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set.
 
-UMBRELA is described in the following paper:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.template
index d04a8e343f..bac89b779c 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.template
@@ -14,9 +14,11 @@ This page documents experiments for `shard02`; we expect the corpus to be in `ms
 In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes.
 Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set.
 
-UMBRELA is described in the following paper:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.template
index 3589697980..20a1c87fc7 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.template
@@ -14,9 +14,11 @@ This page documents experiments for `shard03`; we expect the corpus to be in `ms
 In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes.
 Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set.
 
-UMBRELA is described in the following paper:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.template
index be65265c22..7f213216e8 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.template
@@ -14,9 +14,11 @@ This page documents experiments for `shard04`; we expect the corpus to be in `ms
 In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes.
 Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set.
 
-UMBRELA is described in the following paper:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.template
index 6a64707b3e..0e8399632b 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.template
@@ -14,9 +14,11 @@ This page documents experiments for `shard05`; we expect the corpus to be in `ms
 In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes.
 Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set.
 
-UMBRELA is described in the following paper:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.template
index c249a6837e..19b28c2f2a 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.template
@@ -14,9 +14,11 @@ This page documents experiments for `shard06`; we expect the corpus to be in `ms
 In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes.
 Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set.
 
-UMBRELA is described in the following paper:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.template
index 59dc878d48..f4c66c1407 100644
--- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.template
+++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.template
@@ -14,9 +14,11 @@ This page documents experiments for `shard07`; we expect the corpus to be in `ms
 In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes.
 Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set.
 
-UMBRELA is described in the following paper:
+More details can be found in the following two papers:
 
-> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025.
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.
+
+> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.
 
 The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.template index 6d202ea8d6..96e8d36380 100644 --- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.template +++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.template @@ -14,9 +14,11 @@ This page documents experiments for `shard08`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. 
[A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.template index 0835e95d15..f261ea13ac 100644 --- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.template +++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.template @@ -14,9 +14,11 @@ This page documents experiments for `shard09`; we expect the corpus to be in `ms In these experiments, we are performing query inference "on-the-fly" with ONNX, using flat vector indexes. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. 
[A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.splade-v3.cached.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.splade-v3.cached.template index 4316c41ab8..cb8d36543a 100644 --- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.splade-v3.cached.template +++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.splade-v3.cached.template @@ -13,9 +13,11 @@ See the [official SPLADE repo](https://github.com/naver/splade) and the followin In these experiments, we are using cached queries (i.e., cached results of query encoding). Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. 
_Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.splade-v3.onnx.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.splade-v3.onnx.template index 7dce21ce30..c84209b1a4 100644 --- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.splade-v3.onnx.template +++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.splade-v3.onnx.template @@ -13,9 +13,11 @@ See the [official SPLADE repo](https://github.com/naver/splade) and the followin In these experiments, we are using ONNX to perform query encoding on the fly. Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. 
-UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. The exact configurations for these regressions are stored in [this YAML file](${yaml}). Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.template b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.template index 1a41df730e..31cffb9701 100644 --- a/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.template +++ b/src/main/resources/docgen/templates/rag24-doc-segmented-test-umbrela.template @@ -7,9 +7,11 @@ This corpus was derived from the MS MARCO V2 _segmented_ document corpus and pre Instructions for downloading the corpus can be found [here](https://trec-rag.github.io/annoucements/2024-corpus-finalization/). Evaluation uses (automatically generated) UMBRELA qrels over all 301 topics from the TREC 2024 RAG Track test set. -UMBRELA is described in the following paper: +More details can be found in the following two papers: -> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA. _Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)_, 2025. +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy. + +> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024. 
Here, we cover bag-of-words baselines where each _segment_ in the MS MARCO V2.1 segmented document corpus is treated as a unit of indexing. diff --git a/src/main/resources/reproduce/bright.yaml b/src/main/resources/reproduce/bright.yaml index e9d322928e..a7445a24b7 100644 --- a/src/main/resources/reproduce/bright.yaml +++ b/src/main/resources/reproduce/bright.yaml @@ -45,11 +45,11 @@ conditions: eval_key: bright-aops scores: - nDCG@10: 0.0645 + - topic_key: theoremqa-questions + eval_key: bright-theoremqa-questions + scores: + - nDCG@10: 0.0733 - topic_key: theoremqa-theorems eval_key: bright-theoremqa-theorems scores: - nDCG@10: 0.0214 - - topic_key: theoremqa-questions - eval_key: bright-theoremqa-questions - scores: - - nDCG@10: 0.0733 \ No newline at end of file diff --git a/src/main/resources/reproduce/msmarco-v2.1-doc-segmented.yaml b/src/main/resources/reproduce/msmarco-v2.1-doc-segmented.yaml index 99afeeaad6..207dd76e2c 100644 --- a/src/main/resources/reproduce/msmarco-v2.1-doc-segmented.yaml +++ b/src/main/resources/reproduce/msmarco-v2.1-doc-segmented.yaml @@ -201,7 +201,7 @@ conditions: display: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_html: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_row: "" - command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard00.arctic-embed-l.hnsw-int8 -topics $topics -topicReader TsvString -topicField title -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 + command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard00.arctic-embed-l.hnsw-int8 -topics $topics -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 topics: - topic_key: rag24.test eval_key: rag24.test-umbrela-all @@ -219,7 +219,7 @@ conditions: display: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_html: "ArcticEmbed-L w/ HNSW int8 (ONNX)" 
display_row: "" - command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard01.arctic-embed-l.hnsw-int8 -topics $topics -topicReader TsvString -topicField title -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 + command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard01.arctic-embed-l.hnsw-int8 -topics $topics -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 topics: - topic_key: rag24.test eval_key: rag24.test-umbrela-all @@ -237,7 +237,7 @@ conditions: display: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_html: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_row: "" - command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard02.arctic-embed-l.hnsw-int8 -topics $topics -topicReader TsvString -topicField title -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 + command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard02.arctic-embed-l.hnsw-int8 -topics $topics -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 topics: - topic_key: rag24.test eval_key: rag24.test-umbrela-all @@ -255,7 +255,7 @@ conditions: display: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_html: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_row: "" - command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard03.arctic-embed-l.hnsw-int8 -topics $topics -topicReader TsvString -topicField title -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 + command: java -cp $fatjar --add-modules jdk.incubator.vector 
io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard03.arctic-embed-l.hnsw-int8 -topics $topics -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 topics: - topic_key: rag24.test eval_key: rag24.test-umbrela-all @@ -273,7 +273,7 @@ conditions: display: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_html: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_row: "" - command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard04.arctic-embed-l.hnsw-int8 -topics $topics -topicReader TsvString -topicField title -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 + command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard04.arctic-embed-l.hnsw-int8 -topics $topics -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 topics: - topic_key: rag24.test eval_key: rag24.test-umbrela-all @@ -291,7 +291,7 @@ conditions: display: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_html: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_row: "" - command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard05.arctic-embed-l.hnsw-int8 -topics $topics -topicReader TsvString -topicField title -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 + command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard05.arctic-embed-l.hnsw-int8 -topics $topics -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 topics: - topic_key: rag24.test eval_key: rag24.test-umbrela-all @@ -309,7 +309,7 @@ conditions: display: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_html: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_row: "" - command: 
java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard06.arctic-embed-l.hnsw-int8 -topics $topics -topicReader TsvString -topicField title -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 + command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard06.arctic-embed-l.hnsw-int8 -topics $topics -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 topics: - topic_key: rag24.test eval_key: rag24.test-umbrela-all @@ -327,7 +327,7 @@ conditions: display: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_html: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_row: "" - command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard07.arctic-embed-l.hnsw-int8 -topics $topics -topicReader TsvString -topicField title -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 + command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard07.arctic-embed-l.hnsw-int8 -topics $topics -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 topics: - topic_key: rag24.test eval_key: rag24.test-umbrela-all @@ -345,7 +345,7 @@ conditions: display: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_html: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_row: "" - command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard08.arctic-embed-l.hnsw-int8 -topics $topics -topicReader TsvString -topicField title -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 + command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads 
-index msmarco-v2.1-doc-segmented-shard08.arctic-embed-l.hnsw-int8 -topics $topics -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 topics: - topic_key: rag24.test eval_key: rag24.test-umbrela-all @@ -363,7 +363,7 @@ conditions: display: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_html: "ArcticEmbed-L w/ HNSW int8 (ONNX)" display_row: "" - command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard09.arctic-embed-l.hnsw-int8 -topics $topics -topicReader TsvString -topicField title -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 + command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchHnswDenseVectors -threads $threads -index msmarco-v2.1-doc-segmented-shard09.arctic-embed-l.hnsw-int8 -topics $topics -encoder ArcticEmbedL -output $output -hits 250 -efSearch 1000 topics: - topic_key: rag24.test eval_key: rag24.test-umbrela-all @@ -381,7 +381,7 @@ conditions: display: "SPLADE-v3 (cached queries)" display_html: "SPLADE-v3 (cached queries)" display_row: "" - command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchCollection -threads $threads -index msmarco-v2.1-doc-segmented-splade-v3 -topics $topics -topicReader TsvString -output $output -impact -pretokenized -removeQuery -hits 1000 + command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchCollection -threads $threads -index msmarco-v2.1-doc-segmented-splade-v3 -topics $topics -output $output -impact -pretokenized -removeQuery -hits 1000 topics: - topic_key: rag24.test.splade-v3 eval_key: rag24.test-umbrela-all @@ -399,7 +399,7 @@ conditions: display: "SPLADE-v3 (ONNX)" display_html: "SPLADE-v3 (ONNX)" display_row: "" - command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchCollection -threads $threads -index msmarco-v2.1-doc-segmented-splade-v3 -topics $topics 
-topicReader TsvString -output $output -impact -pretokenized -removeQuery -hits 1000 -encoder SpladeV3 + command: java -cp $fatjar --add-modules jdk.incubator.vector io.anserini.search.SearchCollection -threads $threads -index msmarco-v2.1-doc-segmented-splade-v3 -topics $topics -output $output -impact -pretokenized -removeQuery -hits 1000 -encoder SpladeV3 topics: - topic_key: rag24.test eval_key: rag24.test-umbrela-all diff --git a/src/test/java/io/anserini/doc/GeneratePrebuiltIndexesDocTest.java b/src/test/java/io/anserini/doc/GeneratePrebuiltIndexesDocTest.java index 2b767bfcfa..c09c8e54cc 100644 --- a/src/test/java/io/anserini/doc/GeneratePrebuiltIndexesDocTest.java +++ b/src/test/java/io/anserini/doc/GeneratePrebuiltIndexesDocTest.java @@ -61,11 +61,11 @@ public void generateDocs() throws IOException { # Anserini: Prebuilt Indexes Anserini ships with a number of prebuilt indexes. - This means that various indexes (inverted indexes, HNSW indexes, etc.) for common collections used in NLP and IR research have already been built and just needs to be downloaded (from UWaterloo/Hugging Face servers), which Anserini will handle automatically for you. + This means that various indexes (inverted indexes, HNSW indexes, etc.) for common collections used in NLP and IR research have already been built and just need to be downloaded (from UWaterloo and Hugging Face servers), which Anserini will handle automatically for you. - Bindings for the available prebuilt indexes are in [`io.anserini.index.IndexInfo`](https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexInfo.java) and below. + Bindings for the available prebuilt indexes are in [`io.anserini.index.IndexInfo`](https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexInfo.java) as Java enums. For example, if you specify `-index msmarco-v1-passage`, Anserini will know that you mean the Lucene index of the MS MARCO V1 passage corpus.
- It will then download the index from the servers and cache locally. + It will then download the index from the specified location(s) and cache it locally. All of this happens automagically! ## Getting Started @@ -76,7 +76,7 @@ This means that various indexes (inverted indexes, HNSW indexes, etc.) for commo bin/run.sh io.anserini.index.IndexReaderUtils -index cacm -stats ``` - The output of the command will be: + The output of the above command will be: ``` Index statistics @@ -87,28 +87,28 @@ This means that various indexes (inverted indexes, HNSW indexes, etc.) for commo total terms: 320968 ``` - Note that unless the underlying index was built with the `-optimize` option (i.e., merging all index segments into a single segment), `unique_terms` will show -1. + Note that for inverted indexes, unless the underlying index was built with the `-optimize` option (i.e., merging all index segments into a single segment), `unique_terms` will show -1. Nope, that's not a bug. ## Managing Indexes - The downloaded index will by default be in `~/.cache/pyserini/indexes/`. - (Yes, `pyserini`; this is so prebuilt indexes from both Pyserini and Anserini can live in the same location.) + Downloaded indexes are by default stored in `~/.cache/pyserini/indexes/`. + (Yes, `pyserini`, that's not a bug — this is so prebuilt indexes can be shared between Pyserini and Anserini.) You can specify a custom cache directory by setting the environment variable `$ANSERINI_INDEX_CACHE` or the system property `anserini.index.cache`. Another helpful tip is to download and manage the indexes by hand. - All relevant information is stored in [`IndexInfo`](https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexInfo.java).
- For example, `msmarco-v1-passage` can be downloaded from: + As an example, from [`IndexInfo`](https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexInfo.java) you can see that `msmarco-v1-passage` can be downloaded from: ``` https://huggingface.co/datasets/castorini/prebuilt-indexes-msmarco-v1/resolve/main/passage/original/lucene-inverted/tf/lucene-inverted.msmarco-v1-passage.20221004.252b5e.tar.gz ``` - and has an MD5 checksum of `678876e8c99a89933d553609a0fd8793`. - You can download, verify, and put anywhere you want. + The tarball has an MD5 checksum of `678876e8c99a89933d553609a0fd8793`. + + You can download, verify, unpack, and put the index anywhere you want. With `-index /path/to/index/` you'll get exactly the same output as `-index msmarco-v1-passage`, except now you've got fine-grained control over managing the index. - By manually managing the indexes, you can share indexes between multiple users to conserve space. + By manually managing indexes, you can share indexes between multiple users to conserve space. The schema of the index location in `~/.cache/pyserini/indexes/` is the tarball name (after unpacking), followed by a dot and the checksum, so `msmarco-v1-passage` lives in following location: ``` @@ -116,7 +116,7 @@ This means that various indexes (inverted indexes, HNSW indexes, etc.) for commo ``` You can download the index once, put in a common location, and have each user symlink to the actual index location. - Source would conform to the schema above, target would be where your index actually resides. + The source of the symlink would conform to the schema above, and the target of the symlink would be where your index actually resides. ## Recovering from Partial Downloads @@ -132,7 +132,9 @@ Then start over (e.g., rerun the command you were running before). Below is a summary of the prebuilt indexes that are currently available. 
- Note that this page is automatically generated from [this script](../src/test/java/io/anserini/doc/GeneratePrebuiltIndexesDocTest.java), so do not modify this page directly; modify the script instead. + Note that this page is automatically generated from [this test case](../src/test/java/io/anserini/doc/GeneratePrebuiltIndexesDocTest.java). + This means that the page is updated with every (successful) build. + Therefore, do not modify this page directly; modify the test case instead. """);
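As a sanity check on the prebuilt-indexes hunks above: the documented cache layout is the tarball name (after unpacking), followed by a dot and the MD5 checksum, under `~/.cache/pyserini/indexes/`. A minimal sketch of that schema, using the `msmarco-v1-passage` tarball name and checksum quoted in the diff (the variable names here are ours, not Anserini's):

```shell
# Cache-path schema described in the prebuilt-indexes page:
#   ~/.cache/pyserini/indexes/<tarball name minus .tar.gz>.<md5>
tarball="lucene-inverted.msmarco-v1-passage.20221004.252b5e.tar.gz"
md5="678876e8c99a89933d553609a0fd8793"
stem="${tarball%.tar.gz}"   # strip the .tar.gz suffix
echo "${HOME}/.cache/pyserini/indexes/${stem}.${md5}"
```

This is the location you would symlink from when sharing a single copy of the index between users, as the diff suggests.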