10 changes: 6 additions & 4 deletions README.md
@@ -533,6 +533,8 @@ Substitute the appropriate `$MODEL` from the table below.

### BRIGHT Regressions

BRIGHT is a retrieval benchmark described [here](https://arxiv.org/abs/2407.12883).

| Corpus | Baselines |
|---------------------|:----------------------------------------------------------------:|
| **StackExchange** | |
@@ -553,11 +555,11 @@ Substitute the appropriate `$MODEL` from the table below.

### Available Corpora for Download

| Corpora                                                                                                        |   Size | Checksum                           |
|:---------------------------------------------------------------------------------------------------------------|-------:|:-----------------------------------|
| [Post-Processed](https://huggingface.co/datasets/castorini/collections-bright/resolve/main/bright-corpus.tar)   | 297 MB | `d8c829f0e4468a8ce62768b6a1162158` |

```diff
- The [BRIGHT](https://arxiv.org/abs/2407.12883) corpus used here was processed from Hugging Face with these [scripts](https://github.com/ielab/llm-rankers/tree/main/Rank-R1/bright).
+ The BRIGHT corpora above were processed from Hugging Face with [these scripts](https://github.com/ielab/llm-rankers/tree/main/Rank-R1/bright).
```
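
For reference, here is a minimal sketch for fetching and verifying the post-processed corpus, assuming the checksum listed above is an MD5 digest (its 32-hex-character form suggests so):

```bash
# Download the post-processed BRIGHT corpus and verify it
# (assumes the checksum in the table above is an MD5 digest).
wget https://huggingface.co/datasets/castorini/collections-bright/resolve/main/bright-corpus.tar
md5sum bright-corpus.tar
# Expected: d8c829f0e4468a8ce62768b6a1162158
tar xf bright-corpus.tar
```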

<hr/>

6 changes: 3 additions & 3 deletions docs/fatjar-regressions/fatjar-regressions-v0.36.1.md
@@ -53,7 +53,7 @@ The table below reports effectiveness (dev in terms of RR@100, DL21-DL23, RAGgy
| BM25 doc (<i>k<sub><small>1</small></sub></i>=0.9, <i>b</i>=0.4) | 0.1654 | 0.1732 | 0.5183 | 0.2991 | 0.2914 | 0.3631 |
| BM25 doc-segmented (<i>k<sub><small>1</small></sub></i>=0.9, <i>b</i>=0.4) | 0.1973 | 0.2000 | 0.5778 | 0.3576 | 0.3356 | 0.4227 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1
```

@@ -270,7 +270,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10,
| cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 |
| cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage
```

@@ -440,7 +440,7 @@ The table below reports the effectiveness of the models (nDCG@10):
| `climate-fever` | 0.1651 | 0.2129 | 0.2297 | 0.2298 | 0.3119 | 0.3117 |
| `scifact` | 0.6789 | 0.6647 | 0.7041 | 0.7036 | 0.7408 | 0.7408 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir
```

6 changes: 3 additions & 3 deletions docs/fatjar-regressions/fatjar-regressions-v0.37.0.md
@@ -108,7 +108,7 @@ The table below reports effectiveness (dev in terms of RR@100, DL21-DL23, RAGgy
| BM25 doc (<i>k<sub><small>1</small></sub></i>=0.9, <i>b</i>=0.4) | 0.1654 | 0.1732 | 0.5183 | 0.2991 | 0.2914 | 0.3631 |
| BM25 doc-segmented (<i>k<sub><small>1</small></sub></i>=0.9, <i>b</i>=0.4) | 0.1973 | 0.2000 | 0.5778 | 0.3576 | 0.3356 | 0.4227 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1
```

@@ -278,7 +278,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10,
| cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 |
| cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage
```

@@ -451,7 +451,7 @@ The table below reports the effectiveness of the models (nDCG@10):
| `climate-fever` | 0.1651 | 0.2129 | 0.2297 | 0.2298 | 0.3119 | 0.3117 | 0.3119 | 0.3117 |
| `scifact` | 0.6789 | 0.6647 | 0.7041 | 0.7036 | 0.7408 | 0.7408 | 0.7408 | 0.7408 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir
```

6 changes: 3 additions & 3 deletions docs/fatjar-regressions/fatjar-regressions-v0.38.0.md
@@ -108,7 +108,7 @@ The table below reports effectiveness (dev in terms of RR@100, DL21-DL23, RAGgy
| BM25 doc (<i>k<sub><small>1</small></sub></i>=0.9, <i>b</i>=0.4) | 0.1654 | 0.1732 | 0.5183 | 0.2991 | 0.2914 | 0.3631 |
| BM25 doc-segmented (<i>k<sub><small>1</small></sub></i>=0.9, <i>b</i>=0.4) | 0.1973 | 0.2000 | 0.5778 | 0.3576 | 0.3356 | 0.4227 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1
```

@@ -278,7 +278,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10,
| cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 |
| cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage
```

@@ -451,7 +451,7 @@ The table below reports the effectiveness of the models (nDCG@10):
| `climate-fever` | 0.1651 | 0.2129 | 0.2297 | 0.2298 | 0.3119 | 0.3117 | 0.3119 | 0.3117 |
| `scifact` | 0.6789 | 0.6647 | 0.7041 | 0.7036 | 0.7408 | 0.7408 | 0.7408 | 0.7408 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir
```

6 changes: 3 additions & 3 deletions docs/fatjar-regressions/fatjar-regressions-v0.39.0.md
@@ -148,7 +148,7 @@ The table below reports effectiveness (dev in terms of RR@100, DL21-DL23, RAGgy
| BM25 doc (<i>k<sub><small>1</small></sub></i>=0.9, <i>b</i>=0.4) | 0.1654 | 0.1732 | 0.5183 | 0.2991 | 0.2914 | 0.3631 |
| BM25 doc-segmented (<i>k<sub><small>1</small></sub></i>=0.9, <i>b</i>=0.4) | 0.1973 | 0.2000 | 0.5778 | 0.3576 | 0.3356 | 0.4227 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1
```

@@ -318,7 +318,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10,
| cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 |
| cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage
```

@@ -491,7 +491,7 @@ The table below reports the effectiveness of the models (nDCG@10):
| `climate-fever` | 0.1651 | 0.2129 | 0.2297 | 0.2298 | 0.3119 | 0.3117 | 0.3119 | 0.3117 |
| `scifact` | 0.6789 | 0.6647 | 0.7041 | 0.7036 | 0.7408 | 0.7408 | 0.7408 | 0.7408 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir
```

4 changes: 2 additions & 2 deletions docs/fatjar-regressions/fatjar-regressions-v1.0.0.md
@@ -227,7 +227,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10,
| cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 |
| cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage
```

@@ -402,7 +402,7 @@ The table below reports the effectiveness of the models (nDCG@10):
| `climate-fever` | 0.1651 | 0.2129 | 0.2297 | 0.2298 | 0.3119 | 0.3117 | 0.3119 | 0.3117 |
| `scifact` | 0.6789 | 0.6647 | 0.7041 | 0.7036 | 0.7408 | 0.7408 | 0.7408 | 0.7408 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir
```

4 changes: 2 additions & 2 deletions docs/fatjar-regressions/fatjar-regressions-v1.1.0.md
@@ -229,7 +229,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10,
| cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 |
| cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage
```

@@ -400,7 +400,7 @@ The table below reports the effectiveness of the models (nDCG@10):
| `climate-fever` | 0.1651 | 0.2129 | 0.2625 | 0.2625 | 0.3119 | 0.3117 | 0.3119 | 0.3117 |
| `scifact` | 0.6789 | 0.6647 | 0.7140 | 0.7140 | 0.7408 | 0.7408 | 0.7408 | 0.7408 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir
```

4 changes: 2 additions & 2 deletions docs/fatjar-regressions/fatjar-regressions-v1.1.1.md
@@ -229,7 +229,7 @@ The table below reports the effectiveness of the models (dev in terms of RR@10,
| cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 |
| cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage
```

@@ -400,7 +400,7 @@ The table below reports the effectiveness of the models (nDCG@10):
| `climate-fever` | 0.1651 | 0.2129 | 0.2625 | 0.2625 | 0.3119 | 0.3117 | 0.3119 | 0.3117 |
| `scifact` | 0.6789 | 0.6647 | 0.7140 | 0.7140 | 0.7408 | 0.7408 | 0.7408 | 0.7408 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir
```

100 changes: 96 additions & 4 deletions docs/fatjar-regressions/fatjar-regressions-v1.2.0.md
@@ -37,7 +37,9 @@ Using the [UMBRELA qrels](https://trec-rag.github.io/annoucements/umbrela-qrels/
| RAG24 Test (UMBRELA): nDCG@100 | 0.2563 | 0.4855 |
| RAG24 Test (UMBRELA): Recall@100 | 0.1395 | 0.2547 |

```diff
- See instructions below on how to reproduce these runs; more details can be found in the following paper:
+ See instructions below on how to reproduce these runs; more details can be found in the following two papers:
```

> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models Using UMBRELA.](https://dl.acm.org/doi/10.1145/3731120.3744605) Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025), pages 358-368, July 2025, Padua, Italy.

> Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, and Jimmy Lin. [A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look.](https://arxiv.org/abs/2411.08275) _arXiv:2411.08275_, November 2024.

@@ -229,17 +231,75 @@ The table below reports the effectiveness of the models (dev in terms of RR@10,
| cohere-embed-english-v3.0 w/ HNSW fp32 (cached queries) | 0.3647 | 0.6956 | 0.7245 |
| cohere-embed-english-v3.0 w/ HNSW int8 (cached queries) | 0.3656 | 0.6955 | 0.7262 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage
```

To print out the commands that will generate the above runs without performing the runs, use the options `-dryRun -printCommands`.
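
For example, the following invocation (the command above plus these options) prints the per-run commands without executing them:

```bash
# Dry run: list the underlying commands for the MS MARCO V1 passage runs.
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v1-passage -dryRun -printCommands
```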

## MS MARCO V2.1 Segmented Documents

❗ Beware, the (automatically downloaded) indexes for running these experiments take up several hundred GBs.

The MS MARCO V2.1 collections were created for the [TREC RAG Track](https://trec-rag.github.io/).
There were two variants: the documents corpus and the segmented documents corpus.
The documents corpus served as the source of the segmented documents corpus, but the segmented documents corpus is the one used in official TREC RAG evaluations.
The following table reports nDCG@20 scores for various retrieval conditions:

| | RAG 24 UMBRELA | RAG 24 NIST |
|-----------------------------------------------|:--------------:|:-----------:|
| baselines | 0.3198 | 0.2809 |
| SPLADE-v3 | 0.5167 | 0.4642 |
| Arctic-embed-l (`shard00`, HNSW int8 indexes) | 0.3003 | 0.2449 |
| Arctic-embed-l (`shard01`, HNSW int8 indexes) | 0.2599 | 0.2184 |
| Arctic-embed-l (`shard02`, HNSW int8 indexes) | 0.2661 | 0.2211 |
| Arctic-embed-l (`shard03`, HNSW int8 indexes) | 0.2705 | 0.2388 |
| Arctic-embed-l (`shard04`, HNSW int8 indexes) | 0.2937 | 0.2253 |
| Arctic-embed-l (`shard05`, HNSW int8 indexes) | 0.2590 | 0.2383 |
| Arctic-embed-l (`shard06`, HNSW int8 indexes) | 0.2444 | 0.2336 |
| Arctic-embed-l (`shard07`, HNSW int8 indexes) | 0.2417 | 0.2255 |
| Arctic-embed-l (`shard08`, HNSW int8 indexes) | 0.2847 | 0.2765 |
| Arctic-embed-l (`shard09`, HNSW int8 indexes) | 0.2432 | 0.2457 |

The following command will reproduce the above experiments:

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1-doc-segmented
```

To print out the commands that will generate the above runs without performing the runs, use the options `-dryRun -printCommands`.

## MS MARCO V2.1 Documents

❗ Beware, the (automatically downloaded) indexes for running these experiments take up several hundred GBs.

The MS MARCO V2.1 collections were created for the [TREC RAG Track](https://trec-rag.github.io/).
There were two variants: the documents corpus and the segmented documents corpus.
The documents corpus served as the source of the segmented documents corpus, but is not otherwise used in any formal evaluations.
It primarily served development purposes for the TREC 2024 RAG evaluation, where previous qrels from MS MARCO V2 and DL21-DL23 were "projected over" to this corpus.

The table below reports effectiveness (dev in terms of RR@100, DL21-DL23, RAGgy in terms of nDCG@10):

| | dev | dev2 | DL21 | DL22 | DL23 | RAGgy |
|:-------------------|-------:|-------:|-------:|-------:|-------:|-------:|
| BM25 doc | 0.1654 | 0.1732 | 0.5183 | 0.2991 | 0.2914 | 0.3631 |
| BM25 doc-segmented | 0.1973 | 0.2000 | 0.5778 | 0.3576 | 0.3356 | 0.4227 |
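
As a reminder of the metrics (standard IR definitions, not Anserini-specific): RR is the reciprocal of the rank of the first relevant document, averaged over queries, and nDCG@k normalizes the discounted cumulative gain of the ranking by that of an ideal ranking. One common formulation:

$$\mathrm{nDCG}@k = \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k}, \qquad \mathrm{DCG}@k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$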

The following command will reproduce the above experiments:

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1
```

To print out the commands that will generate the above runs without performing the runs, use the options `-dryRun -printCommands`.

## BEIR

❗ Beware, the (automatically downloaded) indexes for running these experiments take up several hundred GBs.

```diff
- Currently, Anserini provides support for the following models:
+ Here is a selection of models that are currently supported in Anserini:
```

+ Flat = BM25, "flat" bag-of-words baseline
+ MF = BM25, "multifield" bag-of-words baseline
@@ -283,9 +343,41 @@ The table below reports the effectiveness of the models (nDCG@10):
| `climate-fever` | 0.1651 | 0.2129 | 0.2625 | 0.2625 | 0.3119 | 0.3117 | 0.3119 | 0.3117 |
| `scifact` | 0.6789 | 0.6647 | 0.7140 | 0.7140 | 0.7408 | 0.7408 | 0.7408 | 0.7408 |

```diff
- The follow command will reproduce the above experiments:
+ The following command will reproduce the above experiments:
```

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunBeir
```

To print out the commands that will generate the above runs without performing the runs, use the options `-dryRun -printCommands`.

## BRIGHT

BRIGHT is a retrieval benchmark described [here](https://arxiv.org/abs/2407.12883).
The following table reports nDCG@10 scores for BM25 baselines:

| Corpus | BM25 |
|--------------------|:------:|
| **StackExchange** | |
| Biology | 0.1824 |
| Earth Science | 0.2791 |
| Economics | 0.1645 |
| Psychology | 0.1342 |
| Robotics | 0.1091 |
| Stack Overflow | 0.1626 |
| Sustainable Living | 0.1613 |
| **Coding** | |
| LeetCode | 0.2471 |
| Pony | 0.0434 |
| **Theorems** | |
| AoPS | 0.0645 |
| TheoremQA-Q | 0.0733 |
| TheoremQA-T | 0.0214 |

The following command will reproduce the above experiments:

```bash
java -cp $ANSERINI_JAR io.anserini.reproduce.RunBright
```

To print out the commands that will generate the above runs without performing the runs, use the options `-dryRun -printCommands`.