castorini/quackir


QuackIR

QuackIR is a toolkit for reproducible information retrieval research with relational database management systems. Sparse retrieval is available with DuckDB, SQLite, and PostgreSQL. Dense and hybrid retrieval are available with DuckDB and PostgreSQL. Text analysis with the Porter tokenizer is provided by wrapping Pyserini's Lucene analyzer.
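To illustrate what "dense retrieval in a relational database" means, here is a toy sketch that stores embeddings as JSON text in SQLite and ranks documents with a cosine-similarity function registered through Python's `sqlite3.Connection.create_function`. This is an assumption-laden illustration, not QuackIR's implementation: QuackIR runs dense retrieval on DuckDB and PostgreSQL (pgvector), and the table layout, the `cosine` helper, and the toy vectors here are all hypothetical.

```python
import json
import math
import sqlite3

# Hypothetical helper: cosine similarity between two JSON-encoded vectors.
def cosine(a_json: str, b_json: str) -> float:
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

conn = sqlite3.connect(":memory:")
conn.create_function("cosine", 2, cosine)  # expose the helper to SQL

# Toy corpus: each row stores a document id and its embedding as JSON text.
conn.execute("CREATE TABLE corpus (id TEXT PRIMARY KEY, embedding TEXT)")
docs = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.8, 0.3],
    "d3": [0.7, 0.2, 0.1],
}
conn.executemany(
    "INSERT INTO corpus VALUES (?, ?)",
    [(doc_id, json.dumps(vec)) for doc_id, vec in docs.items()],
)

# Dense retrieval as a query: score every row, keep the top k.
query_vec = json.dumps([1.0, 0.0, 0.0])  # toy query embedding
top_k = conn.execute(
    "SELECT id, cosine(embedding, ?) AS score FROM corpus "
    "ORDER BY score DESC LIMIT 2",
    (query_vec,),
).fetchall()
print([doc_id for doc_id, _ in top_k])  # ['d1', 'd3']
```

This brute-force scan touches every row; production systems such as pgvector add vector indexes to avoid that.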

Installation

Clone Repository

git clone https://github.com/castorini/quackir.git --recurse-submodules

Install Dependencies

conda create -n quackir python=3.10
conda activate quackir
conda install -c conda-forge postgresql pgvector openjdk=21 maven -y
pip install -r requirements.txt

Initialize PostgreSQL

initdb -D mydb
pg_ctl -D mydb -l logfile start &
createdb quackir
psql quackir
-- the following statements run inside the psql prompt
create user postgres superuser;
create extension vector;
\q

Quick Start

To create a sparse index with DuckDB:

from quackir.index import DuckDBIndexer
from quackir import IndexType

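table_name = "corpus"
index_type = IndexType.SPARSE
corpus_file = "path/to/corpus.jsonl"  # placeholder: path to your corpus file

indexer = DuckDBIndexer()
indexer.init_table(table_name, index_type)   # create the table for a sparse index
indexer.load_table(table_name, corpus_file)  # load documents from the corpus file
indexer.fts_index(table_name)                # build the full-text search index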

indexer.close()
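As a rough mental model of the three indexing calls above, the sketch below performs analogous steps in SQLite using only the standard library: `init_table` roughly corresponds to `CREATE TABLE`, `load_table` to bulk-inserting rows from a corpus file, and `fts_index` to building a full-text index (here an FTS5 table with the Porter tokenizer). It is not QuackIR's code; the BEIR-style JSONL layout (`_id`, `title`, `text`) and the concatenation of title and text into one `contents` field are assumptions made for illustration.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical BEIR-style corpus: one JSON object per line (assumed format).
corpus_lines = [
    {"_id": "d1", "title": "Lobster roll", "text": "A lobster roll is a sandwich."},
    {"_id": "d2", "title": "Clam chowder", "text": "A thick soup made with clams."},
]
corpus_file = Path(tempfile.mkdtemp()) / "corpus.jsonl"
corpus_file.write_text("\n".join(json.dumps(d) for d in corpus_lines) + "\n")

conn = sqlite3.connect(":memory:")

# ~ init_table: a plain table holding the raw documents.
conn.execute("CREATE TABLE corpus (id TEXT PRIMARY KEY, contents TEXT)")

# ~ load_table: read the JSONL file and store title + text as one field.
with corpus_file.open() as f:
    rows = [(doc["_id"], f'{doc["title"]} {doc["text"]}') for doc in map(json.loads, f)]
conn.executemany("INSERT INTO corpus VALUES (?, ?)", rows)

# ~ fts_index: an FTS5 full-text index using the Porter stemming tokenizer.
conn.execute(
    "CREATE VIRTUAL TABLE corpus_fts USING fts5(id UNINDEXED, contents, tokenize='porter')"
)
conn.execute("INSERT INTO corpus_fts (id, contents) SELECT id, contents FROM corpus")

indexed = conn.execute("SELECT count(*) FROM corpus_fts").fetchone()[0]
print(indexed)  # 2
```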

To perform sparse retrieval:

from quackir.search import DuckDBSearcher
from quackir import SearchType

table_name = "corpus"
query = "what is a lobster roll"
search_type = SearchType.SPARSE

searcher = DuckDBSearcher()
results = searcher.search(
    search_type, query_string=query, table_names=[table_name]
)
print(results)

searcher.close()
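For intuition about what a sparse search does once the index exists, here is a self-contained sketch that answers the same example query with SQLite FTS5 and its built-in `bm25()` ranking function. It stands in for, and does not reproduce, `DuckDBSearcher`; the `sparse_search` helper, the toy documents, and the choice to OR the query terms together are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE corpus USING fts5(id UNINDEXED, contents, tokenize='porter')"
)
conn.executemany(
    "INSERT INTO corpus (id, contents) VALUES (?, ?)",
    [
        ("d1", "A lobster roll is a sandwich filled with lobster meat."),
        ("d2", "Clam chowder is a thick soup."),
        ("d3", "Rolls of bread are baked daily."),
    ],
)

def sparse_search(query: str, k: int = 10) -> list[tuple[str, float]]:
    """Hypothetical helper: bag-of-words BM25 search over the FTS5 table."""
    # Bare FTS5 terms are ANDed, so join them with OR; quoting each term
    # escapes FTS5 query syntax (embedded quotes are doubled).
    terms = [t for t in query.split() if t]
    match = " OR ".join('"' + t.replace('"', '""') + '"' for t in terms)
    rows = conn.execute(
        "SELECT id, bm25(corpus) AS score FROM corpus "
        "WHERE corpus MATCH ? ORDER BY score LIMIT ?",
        (match, k),
    ).fetchall()
    # FTS5's bm25() is lower (more negative) for better matches;
    # negate it so that higher scores mean more relevant.
    return [(doc_id, -score) for doc_id, score in rows]

results = sparse_search("what is a lobster roll")
print(results[0][0])  # d1
```

Because the tokenizer applies Porter stemming, "roll" also matches "Rolls" in `d3`; `d1` ranks first because it is the only document containing the rarer term "lobster".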

For command-line usage, see the documentation.

Reproduce

For step-by-step reproduction of BEIR experiments, see these docs.

To reproduce all BEIR experiments, run the following command and find the results in logs:

bash ./scripts/beir/run.sh

About

QuackIR is an IR toolkit built on DuckDB
