Releases: groonga/groonga
Groonga 15.1.8 - 2025-10-31
In this release, semantic search is supported.
Improvements
[status] Added .["features"]["faiss"] entry
The .["features"] entry shows whether each feature is enabled or not, as in the following:
[
["..."],
{
"...": "...",
"features": {
"nfkc": true,
"mecab": true,
"...": "..."
},
"...": "..."
}
]
This release added faiss to the .["features"] entry:
[
["..."],
{
"...": "...",
"features": {
"nfkc": true,
"mecab": true,
"...": "...",
"faiss": true,
"...": "..."
},
"...": "..."
}
]
You can use .["features"]["faiss"] to know whether Faiss is enabled or not.
You need Faiss support to use the semantic search feature described next.
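For example, you can run the status command and look at ["features"]["faiss"] in its output, which has the same shape as shown above:
status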
Supported semantic search
This feature is still experimental and unstable.
Added the token_language_model_knn tokenizer and the language_model_knn function.
Using these together, you can perform semantic search.
Here is an example of semantic search:
select Data \
--filter 'language_model_knn(text, "male child")' \
--output_columns text
[
[
0,
0.0,
0.0
],
[
[
[
3
],
[
[
"text",
"ShortText"
]
],
[
"I am a boy."
],
[
"This is an apple."
],
[
"Groonga is a full text search engine."
]
]
]
]
The important point is that you search with text, just like keyword search.
Internally, Groonga creates embeddings automatically and uses them for the semantic search,
so you don't need to create embeddings and search with them yourself.
We omitted examples, but the same applies when loading the text to be searched:
simply load text into Groonga as before, and you can use semantic search.
You don't need to create and load embeddings, because Groonga generates them automatically here as well.
For details on setting up indexes and more, see the pages for token_language_model_knn tokenizer and language_model_knn function.
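For reference, here is a minimal sketch of the schema and data that the select example above assumes. The table layout (a TABLE_NO_KEY table named Data with a ShortText text column) is inferred from the output above, and the token_language_model_knn index setup is intentionally omitted; see the linked pages for the full setup.
table_create Data TABLE_NO_KEY
column_create Data text COLUMN_SCALAR ShortText
load --table Data
[
{"text": "I am a boy."},
{"text": "This is an apple."},
{"text": "Groonga is a full text search engine."}
]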
[language_model_vectorize] Added support for auto download from Hugging Face
This feature is still experimental and unstable.
Specifying a Hugging Face URI in model_name will automatically download the language model.
The model is downloaded during the first execution and placed in the Groonga database directory.
Subsequent executions use the local model files.
language_model_vectorize("hf:///groonga/all-MiniLM-L6-v2-Q4_K_M-GGUF", content)
This is convenient because Groonga handles downloading and placing the model for you.
[types] Added ShortBinary, Binary, and LargeBinary
This feature is still experimental and unstable.
Contributors
$ git shortlog -sn v15.1.7..
51 Sutou Kouhei
8 Abe Tomoaki
7 takuya kodama
5 Horimoto Yasuhiro
2 dependabot[bot]
Groonga 15.1.7 - 2025-09-29
In this release, NormalizerNFKC can normalize Japanese iteration marks, and an installation failure on AlmaLinux 10 is fixed.
Improvements
[grndb] Improved error handling for large database files
Previously, grndb terminated abnormally when processing database files that
exceeded the filesystem stat limit (such as files larger than 2GB on
Windows).
In this release, when grndb processes such files in the Groonga database
directory (the db file and related db.* files), it records an error for each
problematic file and continues to completion without aborting.
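For example, an invocation like the following (the database path is hypothetical) now records errors for such files and keeps going instead of aborting:
$ grndb check /var/lib/groonga/db/db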
[normalizer-nfkc-unify-iteration-mark] Added support for iteration marks with unify_iteration_mark option
The unify_iteration_mark option now supports additional iteration mark characters.
This option treats iteration marks as repeats of the immediately preceding character as below.
- Hiragana Iteration Mark ゝ (U+309D)
- Hiragana Voiced Iteration Mark ゞ (U+309E)
- Katakana Iteration Mark ヽ (U+30FD)
- Katakana Voiced Iteration Mark ヾ (U+30FE)
- Ideographic Iteration Mark 々 (U+3005) - limitation: only repeats the immediately preceding single character
- Vertical Ideographic Iteration Mark 〻 (U+303B) - limitation: only repeats the immediately preceding single character
Here is an example of using the unify_iteration_mark option.
normalize \
'NormalizerNFKC("unify_iteration_mark", true)' \
"こゝろ"
[
[
0,
1758763896.821301,
0.0001749992370605469
],
{
"normalized": "γγγ",
"types": [
],
"checks": [
]
}
]
For Ideographic Iteration Mark (々) and Vertical Ideographic Iteration Mark (〻), this feature only repeats the immediately preceding single character.
Patterns beyond repeating the previous single character are not supported, as in the following cases.
Examples:
- "部分々々" -> "部分部分"
- "多々々米" -> "多多多米"
Added new command to list available commands
A new command_list command has been added that returns a list of all available
Groonga commands.
Currently, this command returns only the command ID and name for each command.
Using this command could enable automatic generation of client library APIs
and help implement Groonga MCP (Model Context Protocol) servers.
In future releases, we plan to expand the output to include command summaries,
descriptions, and detailed argument information.
command_list
[
[
0,
1758764636.669152,
0.0002362728118896484
],
{
"cache_limit": {
"id": 150,
"name": "cache_limit"
},
...
}
]
[token_filter_stem] Added support for non-ASCII alphabets
Reported by Tsai, Xing Wei
Previously, the TokenFilterStem filter only worked with ASCII alphabets.
Now it supports stemming for non-ASCII alphabets such as Arabic.
Here is an example of using TokenFilterStem with Arabic text:
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenNgram \
--normalizer 'NormalizerNFKC("version", "16.0.0")' \
--token_filters 'TokenFilterStem("algorithm", "arabic")'
table_tokenize Terms "الكتاب مفيد" --mode ADD
[
[
0,
0.0,
0.0
],
[
{
"value": "ΩΨͺΨ§Ψ¨",
"position": 0,
"force_prefix": false,
"force_prefix_search": false
},
{
"value": "Ω
ΩΩΨ―",
"position": 1,
"force_prefix": false,
"force_prefix_search": false
}
]
]
Fixes
[almalinux] Fixed installation failure on AlmaLinux 10
Previously, installation could fail with a dnf GPG check error due to the
following outdated RPM GPG key in groonga-release.
Weβve removed the old key and now ship only the RSA4096 key, so installs work as
expected now.
$ dnf install -y --enablerepo=epel --enablerepo=crb groonga
...
error: Certificate 72A7496B45499429:
Policy rejects 72A7496B45499429: No binding signature at time 2025-09-24T09:35:25Z
Key import failed (code 2). Failing package is: groonga-15.1.5-1.el10.x86_64
GPG Keys are configured as: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-34839225, file:///etc/pki/rpm-gpg/RPM-GPG-KEY-45499429
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: GPG check FAILED
Who should upgrade?
Most users do not need to upgrade.
Only users who installed groonga-release on AlmaLinux 10 before 2025/09/24
need to upgrade to the latest package using the steps below.
If you installed groonga-release on AlmaLinux 10 on or after 2025/09/24,
no action is required.
$ sudo dnf upgrade --refresh groonga-release
After upgrading the package, groonga-release contains only the new key:
$ dnf repoquery -l --installed groonga-release
/etc/pki/rpm-gpg
/etc/pki/rpm-gpg/RPM-GPG-KEY-34839225
/etc/yum.repos.d
/etc/yum.repos.d/groonga-almalinux.repo
/etc/yum.repos.d/groonga-amazon-linux.repo
Thanks
- Tsai, Xing Wei
Contributors
$ git shortlog -sn v15.1.5..
15 takuya kodama
11 Horimoto Yasuhiro
7 Sutou Kouhei
1 Abe Tomoaki
1 dependabot[bot]
Groonga 15.1.5 - 2025-08-29
In this release, we added support for the KEY_LARGE flag for TABLE_PAT_KEY!
Improvements
[table_create] Added support for KEY_LARGE flag for TABLE_PAT_KEY
You can now use the KEY_LARGE flag with TABLE_PAT_KEY tables to expand the maximum total key size from 4GiB to 1TiB,
similar to TABLE_HASH_KEY tables as below. This allows you to store more keys in total.
table_create LargePaths TABLE_PAT_KEY|KEY_LARGE ShortText
[normalizer_nfkc] Added support for unify_hyphen_and_prolonged_sound_mark and remove_symbol combination
Previously, when the unify_hyphen_and_prolonged_sound_mark and remove_symbol options were enabled together,
hyphen characters were not removed as expected because they were not treated as symbols to be removed.
This release fixes this issue, so hyphen characters are now properly removed from the normalized text as below.
normalize \
'NormalizerNFKC("remove_symbol", true, \
"unify_hyphen_and_prolonged_sound_mark", true)' \
"090γΌ1234-5678"
[
[
0,
1756363926.409565,
0.0003023147583007812
],
{
"normalized": "09012345678",
"types": [
],
"checks": [
]
}
]
[almalinux] Added support for AlmaLinux 10
AlmaLinux 10 packages are now available.
You can install Groonga on AlmaLinux 10 using the standard package installation methods.
Fixes
[cmake] Fixed how to build/install
Patched by Tsutomu Katsube
The documentation included an incorrect -B option in the cmake --build and cmake --install commands,
which caused build errors.
The corrected commands are now:
cmake --build <Build directory path>
cmake --install <Build directory path>
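For reference, a typical full sequence looks like the following; the -B option belongs only on the configure step (the directory paths are placeholders):
cmake -S <Source directory path> -B <Build directory path>
cmake --build <Build directory path>
cmake --install <Build directory path>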
[table_create] Fixed a bug where KEY_LARGE flag was lost after executing truncate command
This issue meant that when you executed the truncate command on a TABLE_HASH_KEY table with the KEY_LARGE flag,
the table could no longer hold more than 4 GiB of total key data, because the KEY_LARGE flag was removed during the truncation.
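As a sketch of the affected scenario (the table name is chosen for illustration), the flag is now preserved across truncation:
table_create LargeKeys TABLE_HASH_KEY|KEY_LARGE ShortText
truncate LargeKeys
# Before this fix, LargeKeys lost the KEY_LARGE flag at this point and could no
# longer hold more than 4 GiB of total key data.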
Thanks
- Tsutomu Katsube
Contributors
$ git shortlog -sn v15.1.4..
29 Horimoto Yasuhiro
9 takuya kodama
3 Abe Tomoaki
2 dependabot[bot]
1 Sutou Kouhei
1 Tsutomu Katsube
Groonga 15.1.4 - 2025-07-29
In this release, we fixed a bug in the interval calculation between phrases in the *ONPP operator.
Improvements
[Ordered near phrase product search] Fixed a bug in the interval calculation between phrases
This problem may occur when we use *ONPP with MAX_ELEMENT_INTERVAL such as *ONPP-1,0,10"(abc bcd) (defg)".
If you don't use MAX_ELEMENT_INTERVAL, this problem doesn't occur.
Please refer to the following links for usage and syntax of *ONPP.
- *ONPP in query syntax: query-syntax-ordered-near-phrase-product-search-condition
- *ONPP in script syntax: script-syntax-ordered-near-phrase-product-search-operator
If this problem occurs, the following things may happen.
- Groonga may return records that shouldn't be matched.
- Groonga may not return records that should be matched.
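For illustration, such a condition is typically used with a match column in query syntax like the following (the Entries table and content column are hypothetical):
select Entries \
  --match_columns content \
  --query '*ONPP-1,0,10"(abc bcd) (defg)"' \
  --output_columns content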
Contributors
$ git shortlog -sn v15.1.3..
10 Horimoto Yasuhiro
2 Abe Tomoaki
Groonga 15.1.3 - 2025-07-18
Improvements
[Apache Arrow] Added support for Apache Arrow C++ 21.0.0
Contributors
$ git shortlog -sn v15.1.2..
13 Horimoto Yasuhiro
2 takuya kodama
1 Abe Tomoaki
1 Sutou Kouhei
Groonga 15.1.2 - 2025-07-07
Improvements
[Windows] Dropped support for Groonga packages built with Visual Studio 2019
We no longer provide the following packages as of this release.
- groonga-xx.x.x-x64-vs2019.zip
- groonga-xx.x.x-x64-vs2019-with-vcruntime.zip
Fixes
[Near phrase search] Fixed a bug in the interval calculation between phrases
This problem may occur when we use *NP, *NPP, or *ONP with MAX_ELEMENT_INTERVAL as below.
*NP-1,0,12,11"abc ef"
*NPP-1,0,10,9"(abc bcd) (ef)"
*ONP-1,0,5|6 "abc defghi jklmnop"
If you don't use MAX_ELEMENT_INTERVAL, this problem doesn't occur.
Please refer to the following links about usage and syntax of *NP, *NPP, or *ONP.
- *NP in query syntax: query-syntax-near-phrase-search-condition
- *NP in script syntax: script-syntax-near-phrase-search-operator
- *NPP in query syntax: query-syntax-near-phrase-product-search-condition
- *NPP in script syntax: script-syntax-near-phrase-product-search-operator
- *ONP in query syntax: query-syntax-ordered-near-phrase-search-condition
- *ONP in script syntax: script-syntax-ordered-near-phrase-search-operator
If this problem occurs, the following things may happen.
- Groonga may return records that shouldn't be hits.
- Groonga may not return records that should be returned as hits.
Contributors
$ git shortlog -sn v15.1.1..
24 Abe Tomoaki
19 Horimoto Yasuhiro
10 takuya kodama
2 Sutou Kouhei
Groonga 15.1.1 - 2025-06-02
This release updates TokenMecab to preserve user-defined entries with spaces as
single tokens.
Improvements
token_mecab: Fix unintended splitting of user-defined entries with spaces
Previously, TokenMecab split user-defined entries containing spaces
(e.g., "search engine") into separate tokens ("search" and "engine"). This
release fixes this issue, so entries with embedded spaces are now preserved and
handled as single tokens like "search engine" as follows.
tokenize TokenMecab "search engine" --output_pretty yes
[
[
0,
1748413131.972704,
0.0003032684326171875
],
[
{
"value": "search engine",
"position": 0,
"force_prefix": false,
"force_prefix_search": false
}
]
]
Fixes
Fixed many typos in documentation
GH-2332,
GH-2333,
GH-2334,
GH-2335,
GH-2336,
GH-2337,
GH-2338
Patched by Vasilii Lakhin.
Thanks
- Vasilii Lakhin
Contributors
$ git shortlog -sn v15.0.9..
9 takuya kodama
7 Vasilii Lakhin
6 Sutou Kouhei
2 Abe Tomoaki
2 Horimoto Yasuhiro
Groonga 15.0.9 - 2025-05-08
This release adds a tokenizer output option that makes token inspection simpler and
improves negative-division semantics for unsigned integers.
Improvements
tokenize/table_tokenize: Added output_style option
The output_style option for the tokenize/table_tokenize commands makes it easier to
focus on the tokens when you don't need the full attribute set.
Here is an example of using the output_style option.
tokenize TokenNgram "Fulltext Search" --output_style simple
[
[
0,
1746573056.540744,
0.0007045269012451172
],
[
"Fu",
"ul",
"ll",
"lt",
"te",
"ex",
"xt",
"t ",
" S",
"Se",
"ea",
"ar",
"rc",
"ch",
"h"
]
]
Clarified X / negative value semantics
Previously, only dividing X by -1 or -1.0 returned -X for unsigned integers.
From this release, dividing by any negative value will yield the mathematically
expected negative result as follows.
- Before: X / -2 might not return -(X / 2).
- After: X / -2 always returns -(X / 2).
This is a backward incompatible change but we assume that no user depends on
this behavior.
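As a sketch (the table and column names are hypothetical), the new behavior for an unsigned integer column looks like this:
table_create Numbers TABLE_NO_KEY
column_create Numbers value COLUMN_SCALAR UInt32
load --table Numbers
[
{"value": 10}
]
select Numbers --output_columns 'value, value / -2'
# 10 / -2 now returns -5, that is -(10 / 2).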
Contributors
$ git shortlog -sn v15.0.4..
Groonga 15.0.4 - 2025-03-29
Improvements
Clarified X / -1 and X / -1.0 semantics
In many languages, X / -1 and X / -1.0 return -X. But Groonga
may not return -X when X is an unsigned integer.
From this release, X / -1 and X / -1.0 always return -X.
This is a backward incompatible change but we assume that no user
depends on this behavior.
Groonga 15.0.3 - 2025-03-10
Improvements
offline-index-construction: Added support for parallel construction with table-hash-key lexicon
Parallel offline index construction iterates over sorted terms
internally. table-pat-key and table-dat-key can do this
effectively because they are tree-based. But table-hash-key
can't do it effectively because it's not tree-based. So we didn't
support parallel offline index construction with a
table-hash-key lexicon.
This release adds support for parallel offline index construction with a
table-hash-key lexicon. It sorts terms in the normal way, so it's
not as efficient: parallel offline index construction with a
table-hash-key lexicon will be slower than with
table-pat-key/table-dat-key, but it may still be faster than
sequential offline index construction with a table-hash-key
lexicon.
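As a rough sketch (the table, column, and tokenizer choices are illustrative), offline index construction happens when the index column is created after the data is loaded; whether it actually runs in parallel depends on your build and configuration:
table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR Text
load --table Memos
[
{"content": "Groonga is a fast full text search engine."}
]
table_create Words TABLE_HASH_KEY ShortText \
  --default_tokenizer TokenNgram
column_create Words memos_content COLUMN_INDEX|WITH_POSITION Memos content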