SPLADE-v3 fails index consistency checks #2291

@lintool

Failure:

```
>>> LuceneSearcher.from_prebuilt_index('bright-biology.splade-v3', verbose=True)
Attempting to initialize prebuilt index bright-biology.splade-v3.
/Users/jimmylin/.cache/pyserini/indexes/lucene-inverted.bright-biology.splade-v3.20250808.c6674a.559813ffede15ba7080af05383b64bde already exists, skipping download.
Oct 12, 2025 6:57:15 A.M. org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jimmylin/workspace/pyserini/pyserini/search/lucene/_searcher.py", line 85, in from_prebuilt_index
    index_reader.validate(prebuilt_index_name, verbose=verbose)
  File "/Users/jimmylin/workspace/pyserini/pyserini/index/lucene/_base.py", line 272, in validate
    raise ValueError('Prebuilt index fails consistency check: "documents" does not match!')
ValueError: Prebuilt index fails consistency check: "documents" does not match!
```

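The issue is that these indexes are defined in Anserini (in IndexInfo.java) and then imported into Pyserini. However, their entries don't have the relevant statistics (e.g., number of documents) defined, so the consistency check has no correct values to compare the downloaded index against.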
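To make the failure mode concrete, here is a minimal Python sketch of the kind of comparison that trips. It is not Pyserini's actual validate() code (which, per the traceback, lives in pyserini/index/lucene/_base.py); the function name check_consistency, the stat key names, and the numbers are assumptions for illustration. The point is that a statistic that was never recorded compares as None and can never match the real count:

```python
# Statistics compared by this sketch (assumed key names, for illustration only).
CHECKED_STATS = ("documents", "unique_terms", "total_terms")


def check_consistency(expected: dict, actual: dict) -> None:
    """Raise ValueError if any expected statistic differs from the index's actual value.

    A statistic missing from `expected` is looked up as None, so it can never
    equal a real count and the check fails on the first such key.
    """
    for key in CHECKED_STATS:
        if expected.get(key) != actual.get(key):
            raise ValueError(f'Prebuilt index fails consistency check: "{key}" does not match!')


if __name__ == "__main__":
    # Made-up values standing in for what the opened index reports.
    actual = {"documents": 100, "unique_terms": 50, "total_terms": 1000}
    check_consistency(dict(actual), actual)  # complete metadata: passes silently
    try:
        check_consistency({}, actual)  # no stats recorded, as with the splade-v3 entries
    except ValueError as err:
        print(err)
```

With empty metadata, the sketch raises the same message seen in the traceback, because "documents" is the first key it compares.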

This affects the following indexes:

```
% grep splade-v3 src/main/java/io/anserini/index/IndexInfo.java | grep BEIR
  BEIR_V1_0_0_TREC_COVID_SPLADE_V3("beir-v1.0.0-trec-covid.splade-v3",
  BEIR_V1_0_0_BIOASQ_SPLADE_V3("beir-v1.0.0-bioasq.splade-v3",
  BEIR_V1_0_0_NFCORPUS_SPLADE_V3("beir-v1.0.0-nfcorpus.splade-v3",
  BEIR_V1_0_0_NQ_SPLADE_V3("beir-v1.0.0-nq.splade-v3",
  BEIR_V1_0_0_HOTPOTQA_SPLADE_V3("beir-v1.0.0-hotpotqa.splade-v3",
  BEIR_V1_0_0_FIQA_SPLADE_V3("beir-v1.0.0-fiqa.splade-v3",
  BEIR_V1_0_0_SIGNAL1M_SPLADE_V3("beir-v1.0.0-signal1m.splade-v3",
  BEIR_V1_0_0_TREC_NEWS_SPLADE_V3("beir-v1.0.0-trec-news.splade-v3",
  BEIR_V1_0_0_ROBUST04_SPLADE_V3("beir-v1.0.0-robust04.splade-v3",
  BEIR_V1_0_0_ARGUANA_SPLADE_V3("beir-v1.0.0-arguana.splade-v3",
  BEIR_V1_0_0_WEBIS_TOUCHE2020_SPLADE_V3("beir-v1.0.0-webis-touche2020.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_ANDROID_SPLADE_V3("beir-v1.0.0-cqadupstack-android.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_ENGLISH_SPLADE_V3("beir-v1.0.0-cqadupstack-english.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_GAMING_SPLADE_V3("beir-v1.0.0-cqadupstack-gaming.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_GIS_SPLADE_V3("beir-v1.0.0-cqadupstack-gis.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_MATHEMATICA_SPLADE_V3("beir-v1.0.0-cqadupstack-mathematica.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_PHYSICS_SPLADE_V3("beir-v1.0.0-cqadupstack-physics.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_PROGRAMMERS_SPLADE_V3("beir-v1.0.0-cqadupstack-programmers.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_STATS_SPLADE_V3("beir-v1.0.0-cqadupstack-stats.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_TEX_SPLADE_V3("beir-v1.0.0-cqadupstack-tex.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_UNIX_SPLADE_V3("beir-v1.0.0-cqadupstack-unix.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_WEBMASTERS_SPLADE_V3("beir-v1.0.0-cqadupstack-webmasters.splade-v3",
  BEIR_V1_0_0_CQADUPSTACK_WORDPRESS_SPLADE_V3("beir-v1.0.0-cqadupstack-wordpress.splade-v3",
  BEIR_V1_0_0_QUORA_SPLADE_V3("beir-v1.0.0-quora.splade-v3",
  BEIR_V1_0_0_DBPEDIA_ENTITY_SPLADE_V3("beir-v1.0.0-dbpedia-entity.splade-v3",
  BEIR_V1_0_0_SCIDOCS_SPLADE_V3("beir-v1.0.0-scidocs.splade-v3",
  BEIR_V1_0_0_FEVER_SPLADE_V3("beir-v1.0.0-fever.splade-v3",
  BEIR_V1_0_0_CLIMATE_FEVER_SPLADE_V3("beir-v1.0.0-climate-fever.splade-v3",
  BEIR_V1_0_0_SCIFACT_SPLADE_V3("beir-v1.0.0-scifact.splade-v3",
```

```
% grep splade-v3 src/main/java/io/anserini/index/IndexInfo.java | grep BRIGHT
  BRIGHT_BIOLOGY_SPLADE_V3("bright-biology.splade-v3",
  BRIGHT_EARTH_SCIENCE_SPLADE_V3("bright-earth-science.splade-v3",
  BRIGHT_ECONOMICS_SPLADE_V3("bright-economics.splade-v3",
  BRIGHT_PSYCHOLOGY_SPLADE_V3("bright-psychology.splade-v3",
  BRIGHT_ROBOTICS_SPLADE_V3("bright-robotics.splade-v3",
  BRIGHT_STACKOVERFLOW_SPLADE_V3("bright-stackoverflow.splade-v3",
  BRIGHT_SUSTAINABLE_LIVING_SPLADE_V3("bright-sustainable-living.splade-v3",
  BRIGHT_PONY_SPLADE_V3("bright-pony.splade-v3",
  BRIGHT_LEETCODE_SPLADE_V3("bright-leetcode.splade-v3",
  BRIGHT_AOPS_SPLADE_V3("bright-aops.splade-v3",
  BRIGHT_THEOREMQA_THEOREMS_SPLADE_V3("bright-theoremqa-theorems.splade-v3",
  BRIGHT_THEOREMQA_QUESTIONS_SPLADE_V3("bright-theoremqa-questions.splade-v3",
```
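To audit which entries are affected without reading grep output by hand, a sketch along these lines could parse enum lines like the ones above and flag index names whose metadata lacks statistics. The regex, the registry dict and its key names, and the example.splade-v3 entry are all hypothetical; in practice the registry would be whatever mapping holds Pyserini's prebuilt index metadata.

```python
import re

# Matches enum lines such as:  BRIGHT_BIOLOGY_SPLADE_V3("bright-biology.splade-v3",
ENUM_LINE = re.compile(r'^\s*([A-Z0-9_]+)\("([^"]+)",')

# Statistics an entry is expected to carry (assumed key names).
REQUIRED_STATS = ("documents", "unique_terms", "total_terms")


def index_names(java_lines: list[str]) -> list[str]:
    """Extract prebuilt index names from IndexInfo.java-style enum lines."""
    names = []
    for line in java_lines:
        match = ENUM_LINE.match(line)
        if match:
            names.append(match.group(2))  # group 1 is the enum constant, group 2 the index name
    return names


def missing_stats(names: list[str], registry: dict) -> dict:
    """Return {name: [missing stat keys]} for every name with incomplete metadata."""
    report = {}
    for name in names:
        info = registry.get(name, {})
        missing = [key for key in REQUIRED_STATS if info.get(key) is None]
        if missing:
            report[name] = missing
    return report


if __name__ == "__main__":
    lines = [
        '  BRIGHT_BIOLOGY_SPLADE_V3("bright-biology.splade-v3",',
        '  EXAMPLE_SPLADE_V3("example.splade-v3",',  # hypothetical entry
    ]
    # Made-up registry: only the hypothetical entry has stats filled in.
    registry = {"example.splade-v3": {"documents": 10, "unique_terms": 20, "total_terms": 30}}
    print(missing_stats(index_names(lines), registry))
    # prints {'bright-biology.splade-v3': ['documents', 'unique_terms', 'total_terms']}
```

Treating both an absent key and an explicit None as missing catches entries that were copied over with placeholder fields as well as entries that have no stats at all.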

@lilyjge is this something you have time to tackle?
