VectorChord-BM25 is a PostgreSQL extension for the BM25 ranking algorithm. We implemented the Block-WeakAnd algorithm for BM25 ranking inside PostgreSQL. It is recommended to use it together with pg_tokenizer.rs for customized tokenization.
For new users, we recommend the tensorchord/vchord-suite image to get started quickly; you can find more details in the VectorChord-images repository.
docker run \
--name vchord-suite \
-e POSTGRES_PASSWORD=postgres \
-p 5432:5432 \
-d tensorchord/vchord-suite:pg18-latest
# If you want to use ghcr image, you can change the image to `ghcr.io/tensorchord/vchord-suite:pg18-latest`.
# If you want to use a specific version, you can use a tag like `pg17-20250414`; supported versions can be found in the support matrix.
Once everything's set up, you can connect to the database using the psql command-line tool. The default username is postgres, and the default password is postgres. Here's how to connect:
psql -h localhost -p 5432 -U postgres
After connecting, run the following SQL to make sure the extensions are enabled:
CREATE EXTENSION IF NOT EXISTS pg_tokenizer CASCADE; -- for tokenizer
CREATE EXTENSION IF NOT EXISTS vchord_bm25 CASCADE; -- for bm25 ranking
The extension is mainly composed of three parts: the tokenizer, the bm25vector, and the bm25vector index. The tokenizer converts text into a bm25vector; a bm25vector is similar to a sparse vector and stores vocabulary IDs and their frequencies. The bm25vector index is used to speed up the search and ranking process.
To tokenize text, you can use the tokenize function. It takes two arguments: the text to tokenize and the tokenizer name.
Note
The tokenizer is provided by a separate extension, pg_tokenizer.rs; more details can be found here.
-- create a tokenizer
SELECT create_tokenizer('bert', $$
model = "bert_base_uncased" # using pre-trained model
$$);
-- tokenize text with bert tokenizer
SELECT tokenize('A quick brown fox jumps over the lazy dog.', 'bert')::bm25vector;
-- Output: {1012:1, 1037:1, 1996:1, 2058:1, 2829:1, 3899:1, 4248:1, 4419:1, 13971:1, 14523:1}
-- The output is a bm25vector; 1012:1 means the token with ID 1012 appears once in the text.
One thing special about the BM25 score is that it depends on global document frequencies: the score of a word in a document depends on how often that word appears across all documents. To calculate the BM25 score between a bm25vector and a query, you first need a document set, and then use the <&> operator.
-- Setup the document table
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
passage TEXT,
embedding bm25vector
);
INSERT INTO documents (passage) VALUES
('PostgreSQL is a powerful, open-source object-relational database system. It has over 15 years of active development.'),
('Full-text search is a technique for searching in plain-text documents or textual database fields. PostgreSQL supports this with tsvector.'),
('BM25 is a ranking function used by search engines to estimate the relevance of documents to a given search query.'),
('PostgreSQL provides many advanced features like full-text search, window functions, and more.'),
('Search and ranking in databases are important in building effective information retrieval systems.'),
('The BM25 ranking algorithm is derived from the probabilistic retrieval framework.'),
('Full-text search indexes documents to allow fast text queries. PostgreSQL supports this through its GIN and GiST indexes.'),
('The PostgreSQL community is active and regularly improves the database system.'),
('Relational databases such as PostgreSQL can handle both structured and unstructured data.'),
('Effective search ranking algorithms, such as BM25, improve search results by understanding relevance.');
Then tokenize it:
UPDATE documents SET embedding = tokenize(passage, 'bert');
Create the index on the bm25vector column so that we can collect the global document frequency.
CREATE INDEX documents_embedding_bm25 ON documents USING bm25 (embedding bm25_ops);
Now we can calculate the BM25 score between the query and the vectors. Note that the BM25 score in VectorChord-BM25 is negative: the more negative the score, the more relevant the document is. We intentionally make it negative so that the default ORDER BY returns the most relevant documents first.
-- to_bm25query(index_name, query_vector)
-- <&> is the operator to compute the BM25 score
SELECT id, passage, embedding <&> to_bm25query('documents_embedding_bm25', tokenize('PostgreSQL', 'bert')) AS bm25_score FROM documents;
And you can use ORDER BY to utilize the index and get the most relevant documents first, and faster.
SELECT id, passage, embedding <&> to_bm25query('documents_embedding_bm25', tokenize('PostgreSQL', 'bert')) AS rank
FROM documents
ORDER BY rank
LIMIT 10;
Using custom model
You can also build a custom model based on your own corpus easily.
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
passage TEXT,
embedding bm25vector
);
-- create a text analyzer to generate tokens that can be used to train the model
SELECT create_text_analyzer('text_analyzer1', $$
pre_tokenizer = "unicode_segmentation" # split texts according to the Unicode Standard Annex #29
[[character_filters]]
to_lowercase = {} # convert all characters to lowercase
[[character_filters]]
unicode_normalization = "nfkd" # normalize the text to Unicode Normalization Form KD
[[token_filters]]
skip_non_alphanumeric = {} # skip tokens whose characters are all non-alphanumeric
[[token_filters]]
stopwords = "nltk_english" # remove stopwords using the nltk dictionary
[[token_filters]]
stemmer = "english_porter2" # stem tokens using the English Porter2 stemmer
$$);
-- create a model to generate embeddings from the original passage
-- It'll train a model on the passage column and store the embeddings in the embedding column
SELECT create_custom_model_tokenizer_and_trigger(
tokenizer_name => 'tokenizer1',
model_name => 'model1',
text_analyzer_name => 'text_analyzer1',
table_name => 'documents',
source_column => 'passage',
target_column => 'embedding'
);
INSERT INTO documents (passage) VALUES
('PostgreSQL is a powerful, open-source object-relational database system. It has over 15 years of active development.'),
('Full-text search is a technique for searching in plain-text documents or textual database fields. PostgreSQL supports this with tsvector.'),
('BM25 is a ranking function used by search engines to estimate the relevance of documents to a given search query.'),
('PostgreSQL provides many advanced features like full-text search, window functions, and more.'),
('Search and ranking in databases are important in building effective information retrieval systems.'),
('The BM25 ranking algorithm is derived from the probabilistic retrieval framework.'),
('Full-text search indexes documents to allow fast text queries. PostgreSQL supports this through its GIN and GiST indexes.'),
('The PostgreSQL community is active and regularly improves the database system.'),
('Relational databases such as PostgreSQL can handle both structured and unstructured data.'),
('Effective search ranking algorithms, such as BM25, improve search results by understanding relevance.');
CREATE INDEX documents_embedding_bm25 ON documents USING bm25 (embedding bm25_ops);
SELECT id, passage, embedding <&> to_bm25query('documents_embedding_bm25', tokenize('PostgreSQL', 'tokenizer1')) AS rank
FROM documents
ORDER BY rank
LIMIT 10;
Using jieba for Chinese text
For Chinese text, you can use the jieba pre-tokenizer to segment the text into words, and then train a custom model on the segmented words.
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
passage TEXT,
embedding bm25vector
);
-- create a text analyzer which uses jieba pre-tokenizer
SELECT create_text_analyzer('text_analyzer1', $$
[pre_tokenizer.jieba]
$$);
SELECT create_custom_model_tokenizer_and_trigger(
tokenizer_name => 'tokenizer1',
model_name => 'model1',
text_analyzer_name => 'text_analyzer1',
table_name => 'documents',
source_column => 'passage',
target_column => 'embedding'
);
INSERT INTO documents (passage) VALUES
('红海早过了，船在印度洋面上开驶着，但是太阳依然不饶人地迟落早起，侵占去大部分的夜。'),
('夜仿佛纸浸了油变成半透明体，它给太阳拥抱住了，分不出身来，也许是给太阳陶醉了，所以夕照晚霞褪后的夜色也带着酡红。'),
('到红消醉醒，船舱里的睡人也一身腻汗地醒来，洗了澡赶到甲板上吹海风，又是一天开始。'),
('这是七月下旬，合中国旧历的三伏，一年最热的时候。在中国热得更比常年利害，事后大家都说是兵戈之象，因为这就是民国二十六年【一九三七年】。'),
('这条法国邮船白拉日隆子爵号（Vicomte de Bragelonne）正向中国开来。'),
('早晨八点多钟，冲洗过的三等舱甲板湿意未干，但已坐满了人，法国人、德国流亡出来的犹太人、印度人、安南人，不用说还有中国人。'),
('海风里早含着燥热，胖人身体给炎风吹干了，上一层汗结的盐霜，仿佛刚在巴勒斯坦的死海里洗过澡。'),
('毕竟是清晨，人的兴致还没给太阳晒萎，烘懒，说话做事都很起劲。'),
('那几个新派到安南或中国租界当警察的法国人，正围了那年轻善撒娇的犹太女人在调情。'),
('俾斯麦曾说过，法国公使大使的特点，就是一句外国话不会讲；这几位警察并不懂德文，居然传情达意，引得犹太女人格格地笑，比他们的外交官强多了。'),
('这女人的漂亮丈夫，在旁顾而乐之，因为他几天来，香烟、啤酒、柠檬水沾光了不少。'),
('红海已过，不怕热极引火，所以等一会甲板上零星果皮、纸片、瓶塞之外，香烟头定又遍处皆是。'),
('法国人的思想是有名的清楚，他的文章也明白干净，但是他的做事，无不混乱、肮脏、喧哗，但看这船上的乱糟糟。'),
('这船，倚仗人的机巧，载满人的扰攘，寄满人的希望，热闹地行着，每分钟把沾污了人气的一小方小面，还给那无情、无尽、无际的大海。');
CREATE INDEX documents_embedding_bm25 ON documents USING bm25 (embedding bm25_ops);
SELECT id, passage, embedding <&> to_bm25query('documents_embedding_bm25', tokenize('人', 'tokenizer1')) AS rank
FROM documents
ORDER BY rank
LIMIT 10;
Using lindera for Japanese text
For Japanese text, you can use the lindera model with its configuration.
It requires extra compile flags. We don't enable it by default, so you need to recompile the extension from source.
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
passage TEXT,
embedding bm25vector
);
-- using lindera config to customize the tokenizer, see https://github.com/lindera/lindera
SELECT create_lindera_model('lindera_ipadic', $$
[segmenter]
mode = "normal"
[segmenter.dictionary]
kind = "ipadic"
[[character_filters]]
kind = "unicode_normalize"
[character_filters.args]
kind = "nfkc"
[[character_filters]]
kind = "japanese_iteration_mark"
[character_filters.args]
normalize_kanji = true
normalize_kana = true
[[character_filters]]
kind = "mapping"
[character_filters.args.mapping]
"ใชใณใใฉ" = "Lindera"
[[token_filters]]
kind = "japanese_compound_word"
[token_filters.args]
kind = "ipadic"
tags = [ "ๅ่ฉ,ๆฐ", "ๅ่ฉ,ๆฅๅฐพ,ๅฉๆฐ่ฉ" ]
new_tag = "ๅ่ฉ,ๆฐ"
[[token_filters]]
kind = "japanese_number"
[token_filters.args]
tags = [ "ๅ่ฉ,ๆฐ" ]
[[token_filters]]
kind = "japanese_stop_tags"
[token_filters.args]
tags = [
"ๆฅ็ถ่ฉ",
"ๅฉ่ฉ",
"ๅฉ่ฉ,ๆ ผๅฉ่ฉ",
"ๅฉ่ฉ,ๆ ผๅฉ่ฉ,ไธ่ฌ",
"ๅฉ่ฉ,ๆ ผๅฉ่ฉ,ๅผ็จ",
"ๅฉ่ฉ,ๆ ผๅฉ่ฉ,้ฃ่ช",
"ๅฉ่ฉ,ไฟๅฉ่ฉ",
"ๅฉ่ฉ,ๅฏๅฉ่ฉ",
"ๅฉ่ฉ,้ๆๅฉ่ฉ",
"ๅฉ่ฉ,ไธฆ็ซๅฉ่ฉ",
"ๅฉ่ฉ,็ตๅฉ่ฉ",
"ๅฉ่ฉ,ๅฏๅฉ่ฉ๏ผไธฆ็ซๅฉ่ฉ๏ผ็ตๅฉ่ฉ",
"ๅฉ่ฉ,้ฃไฝๅ",
"ๅฉ่ฉ,ๅฏ่ฉๅ",
"ๅฉ่ฉ,็นๆฎ",
"ๅฉๅ่ฉ",
"่จๅท",
"่จๅท,ไธ่ฌ",
"่จๅท,่ชญ็น",
"่จๅท,ๅฅ็น",
"่จๅท,็ฉบ็ฝ",
"่จๅท,ๆฌๅผง้",
"ใใฎไป,้ๆ",
"ใใฃใฉใผ",
"้่จ่ช้ณ"
]
[[token_filters]]
kind = "japanese_katakana_stem"
[token_filters.args]
min = 3
[[token_filters]]
kind = "remove_diacritical_mark"
[token_filters.args]
japanese = false
$$);
SELECT create_tokenizer('lindera_ipadic', $$
model = "lindera_ipadic"
$$);
INSERT INTO documents (passage) VALUES
('どこで生れたかとんと見当がつかぬ。'),
('何でも薄暗いじめじめした所でニャーニャー泣いていた事だけは記憶している。'),
('吾輩はここで始めて人間というものを見た。'),
('しかもあとで聞くとそれは書生という人間中で一番獰悪な種族であったそうだ。'),
('この書生というのは時々我々を捕えて煮て食うという話である。'),
('しかしその当時は何という考もなかったから別段恐しいとも思わなかった。'),
('ただ彼の掌に載せられてスーと持ち上げられた時何だかフワフワした感じがあったばかりである。'),
('掌の上で少し落ちついて書生の顔を見たのがいわゆる人間というものの見始であろう。'),
('この時妙なものだと思った感じが今でも残っている。'),
('第一毛をもって装飾されべきはずの顔がつるつるしてまるで薬缶だ。'),
('その後猫にもだいぶ逢ったがこんな片輪には一度も出会わした事がない。'),
('のみならず顔の真中があまりに突起している。'),
('そうしてその穴の中から時々ぷうぷうと煙を吹く。'),
('どうも咽せぽくて実に弱った。'),
('これが人間の飲む煙草というものである事はようやくこの頃知った。');
UPDATE documents SET embedding = tokenize(passage, 'lindera_ipadic');
CREATE INDEX documents_embedding_bm25 ON documents USING bm25 (embedding bm25_ops);
SELECT id, passage, embedding <&> to_bm25query('documents_embedding_bm25', tokenize('書生', 'lindera_ipadic')) AS rank
FROM documents
ORDER BY rank
LIMIT 10;
Tokenizer configuration is a critical aspect of effective text processing, significantly impacting performance and accuracy. Here are some key considerations and options to help you choose the right tokenizer for your use case.
Tokenizers can be configured in two primary ways:
- Pre-Trained Models: Suitable for most standard use cases, these models are efficient and require minimal setup. They are ideal for general-purpose applications where the text aligns with the model's training data.
- Custom Models: Offer flexibility and superior accuracy for specialized texts. These models are trained on specific corpora, making them suitable for domains with unique terminology, such as technical fields or industry-specific jargon.
Usage details can be found in the pg_tokenizer documentation.
- Language and Script:
  - Space-Separated Languages (e.g., English, Spanish, German): Simple tokenizers such as `bert` (for English) or `unicode` tokenizers are effective here.
  - Non-Space-Separated Languages (e.g., Chinese, Japanese): These require specialized algorithms (pre-tokenizers) that understand language structure beyond simple spaces. You can refer to the Chinese and Japanese examples above.
  - Multilingual Data: Handling multiple languages within a single index requires tokenizers designed for multilingual support, such as `gemma2b` or `llmlingua2`, which efficiently manage diverse scripts and languages (a minimal sketch follows this list).
- Vocabulary Complexity:
  - Standard Language: For texts with common vocabulary, pre-trained models are sufficient. They handle everyday language efficiently without requiring extensive customization.
  - Specialized Texts: Technical terms, abbreviations (e.g., "k8s" for Kubernetes), or compound nouns may need custom models. Custom models can be trained to recognize domain-specific terms, ensuring accurate tokenization. Custom synonyms may also be necessary for precise results. See the custom model example above.
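As a rough sketch of the multilingual case, create_tokenizer can be pointed at a multilingual pre-trained model. The model name llmlingua2 below is taken from the preloading section and assumed to be available in your pg_tokenizer installation; the tokenizer name is arbitrary.
-- create a tokenizer backed by a multilingual pre-trained model (name assumed, see the preloading section)
SELECT create_tokenizer('multilingual', $$
model = "llmlingua2"
$$);
-- tokenize mixed-language text with it
SELECT tokenize('PostgreSQL 全文检索 と 日本語テキスト', 'multilingual')::bm25vector;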
For each connection, PostgreSQL will load the model the first time you use it. This may cause a delay for the first query. You can use the add_preload_model function to preload the model at server startup.
psql -c "SELECT add_preload_model('model1')"
# restart PostgreSQL for the change to take effect
sudo docker restart container_name # for pg_tokenizer running with docker
sudo systemctl restart postgresql.service # for pg_tokenizer running with systemd
The default preload model is llmlingua2. You can change it using the add_preload_model and remove_preload_model functions.
Note: A pre-trained model may take a lot of memory (100 MB for gemma2b, 200 MB for llmlingua2). If you have many models, consider the memory usage when preloading them.
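For example, to change which models are preloaded (a sketch using the add_preload_model and remove_preload_model functions named above; the model names are the ones discussed in this section):
SELECT remove_preload_model('llmlingua2');
SELECT add_preload_model('gemma2b');
-- restart PostgreSQL afterwards so the new preload list takes effect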
PostgreSQL supports full-text search using the tsvector data type and GIN indexes. Text is transformed into a tsvector, which tokenizes content into standardized lexemes, and a GIN index accelerates searchesโeven on large text fields. However, PostgreSQL lacks modern relevance scoring methods like BM25; it retrieves all matching documents and re-ranks them using ts_rank, which is inefficient and can obscure the most relevant results.
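For reference, the native approach looks roughly like this; it is standard tsvector/ts_rank SQL, shown only to illustrate the re-ranking pattern described above, using the documents table from the quick start:
-- native PostgreSQL full-text search: every matching row is scored with ts_rank
SELECT id, passage,
       ts_rank(to_tsvector('english', passage), plainto_tsquery('english', 'PostgreSQL')) AS rank
FROM documents
WHERE to_tsvector('english', passage) @@ plainto_tsquery('english', 'PostgreSQL')
ORDER BY rank DESC
LIMIT 10;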
ParadeDB is an alternative that positions PostgreSQL as a full-featured replacement for Elasticsearch. It offloads full-text search and filtering operations to Tantivy and includes BM25 among its features, though it uses a different query and filter syntax than PostgreSQL's native indexes.
In contrast, VectorChord-BM25 focuses exclusively on BM25 ranking within PostgreSQL. We implemented the Block-WeakAnd BM25 ranking algorithm from scratch and built it as a custom operator and index (similar to pgvector) to accelerate queries. It is designed to be lightweight, with a more native and intuitive API for better full-text search and ranking in PostgreSQL.
- The index will return up to `bm25_catalog.bm25_limit` results to PostgreSQL. Users need to increase `bm25_catalog.bm25_limit` when using larger LIMIT values or stricter filter conditions (see the example after this list).
- We currently have only tested against English. Other languages can be supported out of the box with a BPE tokenizer with a larger vocabulary, like tiktoken. Feel free to talk to us or raise an issue if you need more language support.
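For instance, if a query needs more than the default 100 candidates, raise the limit first (a sketch assuming the GUC can be set in your session; the table, index, and tokenizer are the ones from the quick start):
-- allow the index to return more candidates for a larger top-k query
SET bm25_catalog.bm25_limit = 1000;
SELECT id, passage, embedding <&> to_bm25query('documents_embedding_bm25', tokenize('PostgreSQL', 'bert')) AS rank
FROM documents
ORDER BY rank
LIMIT 500;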
- `bm25vector`: A specialized vector type for storing BM25-tokenized text. Structured as a sparse vector, it stores token IDs and their corresponding frequencies. For example, `{1:2, 2:1}` indicates that token ID 1 appears twice and token ID 2 appears once in the document.
- `bm25query`: A query type for BM25 ranking.
- `to_bm25query(index_name regclass, query_vector bm25vector) RETURNS bm25query`: Convert a tokenized query vector into a BM25 query against the given index.
- `bm25vector = bm25vector RETURNS boolean`: Check if two BM25 vectors are equal.
- `bm25vector <> bm25vector RETURNS boolean`: Check if two BM25 vectors are not equal.
- `bm25vector <&> bm25query RETURNS float4`: Calculate the negative BM25 score between the BM25 vector and query. The lower the score, the more relevant the document is. (This is intentionally designed to be negative for easier sorting.)
- `int[]::bm25vector` (implicit): Cast an integer array to a BM25 vector. The integer array represents token IDs, and the cast aggregates duplicates into frequencies, ignoring token order. For example, `{1, 2, 1}` will be cast to `{1:2, 2:1}` (token ID 1 appears twice, token ID 2 appears once).
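A quick sanity check of the cast and equality semantics described above (the token IDs are arbitrary):
-- duplicates are aggregated and order is ignored, so both arrays become {1:2, 2:1}
SELECT ARRAY[1, 2, 1]::bm25vector = ARRAY[1, 1, 2]::bm25vector;  -- returns true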
- `bm25_catalog.bm25_limit` (integer): The maximum number of documents to return in a search. Default is 100, minimum is -1, and maximum is 65535. When set to -1, it will perform a brute-force search and return all documents with scores greater than 0.
- `bm25_catalog.enable_index` (boolean): Whether to enable the bm25 index. Default is true.
- `bm25_catalog.segment_growing_max_page_size` (integer): The maximum page count of the growing segment. When the size of the growing segment exceeds this value, the segment will be sealed into a read-only segment. Default is 4,096, minimum is 1, and maximum is 1,000,000.
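For example, to sanity-check results without the index's top-k cutoff (again assuming these GUCs can be set in your session):
-- brute-force search: return all documents with a score greater than 0
SET bm25_catalog.bm25_limit = -1;
-- or disable the bm25 index entirely for the current session
SET bm25_catalog.enable_index = false;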
This software is licensed under a dual license model:
- GNU Affero General Public License v3 (AGPLv3): You may use, modify, and distribute this software under the terms of the AGPLv3.
- Elastic License v2 (ELv2): You may also use, modify, and distribute this software under the Elastic License v2, which has specific restrictions.
You may choose either license based on your needs. We welcome any commercial collaboration or support, so please email us at [email protected] with any questions or requests regarding the licenses.