Closed
castorini/anserini
#2993
Description
Failure:
>>> LuceneSearcher.from_prebuilt_index('bright-biology.splade-v3', verbose=True)
Attempting to initialize prebuilt index bright-biology.splade-v3.
/Users/jimmylin/.cache/pyserini/indexes/lucene-inverted.bright-biology.splade-v3.20250808.c6674a.559813ffede15ba7080af05383b64bde already exists, skipping download.
Oct 12, 2025 6:57:15 A.M. org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jimmylin/workspace/pyserini/pyserini/search/lucene/_searcher.py", line 85, in from_prebuilt_index
    index_reader.validate(prebuilt_index_name, verbose=verbose)
  File "/Users/jimmylin/workspace/pyserini/pyserini/index/lucene/_base.py", line 272, in validate
    raise ValueError('Prebuilt index fails consistency check: "documents" does not match!')
ValueError: Prebuilt index fails consistency check: "documents" does not match!
The issue is that these index symbols are defined in Anserini (in IndexInfo.java) and then imported into Pyserini, but they don't have the relevant stats (e.g., number of documents) defined, so the consistency check in validate fails.
Occurs for the following:
% grep splade-v3 src/main/java/io/anserini/index/IndexInfo.java | grep BEIR
BEIR_V1_0_0_TREC_COVID_SPLADE_V3("beir-v1.0.0-trec-covid.splade-v3",
BEIR_V1_0_0_BIOASQ_SPLADE_V3("beir-v1.0.0-bioasq.splade-v3",
BEIR_V1_0_0_NFCORPUS_SPLADE_V3("beir-v1.0.0-nfcorpus.splade-v3",
BEIR_V1_0_0_NQ_SPLADE_V3("beir-v1.0.0-nq.splade-v3",
BEIR_V1_0_0_HOTPOTQA_SPLADE_V3("beir-v1.0.0-hotpotqa.splade-v3",
BEIR_V1_0_0_FIQA_SPLADE_V3("beir-v1.0.0-fiqa.splade-v3",
BEIR_V1_0_0_SIGNAL1M_SPLADE_V3("beir-v1.0.0-signal1m.splade-v3",
BEIR_V1_0_0_TREC_NEWS_SPLADE_V3("beir-v1.0.0-trec-news.splade-v3",
BEIR_V1_0_0_ROBUST04_SPLADE_V3("beir-v1.0.0-robust04.splade-v3",
BEIR_V1_0_0_ARGUANA_SPLADE_V3("beir-v1.0.0-arguana.splade-v3",
BEIR_V1_0_0_WEBIS_TOUCHE2020_SPLADE_V3("beir-v1.0.0-webis-touche2020.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_ANDROID_SPLADE_V3("beir-v1.0.0-cqadupstack-android.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_ENGLISH_SPLADE_V3("beir-v1.0.0-cqadupstack-english.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_GAMING_SPLADE_V3("beir-v1.0.0-cqadupstack-gaming.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_GIS_SPLADE_V3("beir-v1.0.0-cqadupstack-gis.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_MATHEMATICA_SPLADE_V3("beir-v1.0.0-cqadupstack-mathematica.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_PHYSICS_SPLADE_V3("beir-v1.0.0-cqadupstack-physics.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_PROGRAMMERS_SPLADE_V3("beir-v1.0.0-cqadupstack-programmers.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_STATS_SPLADE_V3("beir-v1.0.0-cqadupstack-stats.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_TEX_SPLADE_V3("beir-v1.0.0-cqadupstack-tex.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_UNIX_SPLADE_V3("beir-v1.0.0-cqadupstack-unix.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_WEBMASTERS_SPLADE_V3("beir-v1.0.0-cqadupstack-webmasters.splade-v3",
BEIR_V1_0_0_CQADUPSTACK_WORDPRESS_SPLADE_V3("beir-v1.0.0-cqadupstack-wordpress.splade-v3",
BEIR_V1_0_0_QUORA_SPLADE_V3("beir-v1.0.0-quora.splade-v3",
BEIR_V1_0_0_DBPEDIA_ENTITY_SPLADE_V3("beir-v1.0.0-dbpedia-entity.splade-v3",
BEIR_V1_0_0_SCIDOCS_SPLADE_V3("beir-v1.0.0-scidocs.splade-v3",
BEIR_V1_0_0_FEVER_SPLADE_V3("beir-v1.0.0-fever.splade-v3",
BEIR_V1_0_0_CLIMATE_FEVER_SPLADE_V3("beir-v1.0.0-climate-fever.splade-v3",
BEIR_V1_0_0_SCIFACT_SPLADE_V3("beir-v1.0.0-scifact.splade-v3",
% grep splade-v3 src/main/java/io/anserini/index/IndexInfo.java | grep BRIGHT
BRIGHT_BIOLOGY_SPLADE_V3("bright-biology.splade-v3",
BRIGHT_EARTH_SCIENCE_SPLADE_V3("bright-earth-science.splade-v3",
BRIGHT_ECONOMICS_SPLADE_V3("bright-economics.splade-v3",
BRIGHT_PSYCHOLOGY_SPLADE_V3("bright-psychology.splade-v3",
BRIGHT_ROBOTICS_SPLADE_V3("bright-robotics.splade-v3",
BRIGHT_STACKOVERFLOW_SPLADE_V3("bright-stackoverflow.splade-v3",
BRIGHT_SUSTAINABLE_LIVING_SPLADE_V3("bright-sustainable-living.splade-v3",
BRIGHT_PONY_SPLADE_V3("bright-pony.splade-v3",
BRIGHT_LEETCODE_SPLADE_V3("bright-leetcode.splade-v3",
BRIGHT_AOPS_SPLADE_V3("bright-aops.splade-v3",
BRIGHT_THEOREMQA_THEOREMS_SPLADE_V3("bright-theoremqa-theorems.splade-v3",
BRIGHT_THEOREMQA_QUESTIONS_SPLADE_V3("bright-theoremqa-questions.splade-v3",
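To make the failure mode concrete, here is a minimal sketch of how a consistency check can fail when an index entry carries no stats. This is not Pyserini's actual validate() implementation; the function names, the dict layout, and the document count below are all hypothetical and chosen only for illustration.

```python
# Hypothetical sketch; NOT Pyserini's actual validate() implementation.
from typing import Optional


def check_stats(expected: dict, actual: dict, keys=("documents",)) -> Optional[str]:
    """Return the first key whose expected and actual values differ, else None."""
    for key in keys:
        # A stat that was never filled in (missing key) can never equal
        # the value computed from the index on disk.
        if expected.get(key) != actual.get(key):
            return key
    return None


def validate(expected: dict, actual: dict) -> None:
    """Raise ValueError if any expected stat disagrees with the actual index."""
    mismatch = check_stats(expected, actual)
    if mismatch is not None:
        raise ValueError(
            f'Prebuilt index fails consistency check: "{mismatch}" does not match!'
        )


# Made-up numbers, for illustration only.
actual_stats = {"documents": 1000}
complete_entry = {"documents": 1000}  # entry with stats filled in
incomplete_entry = {}                 # entry imported without stats

validate(complete_entry, actual_stats)  # passes silently

try:
    validate(incomplete_entry, actual_stats)
    message = None
except ValueError as err:
    message = str(err)
```

Under these assumptions, an entry with no stats fails the check even though the index on disk is intact, which is consistent with the error above appearing right after a successful "already exists, skipping download" step.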
@lilyjge is this something you have time to tackle?