Releases: groonga/groonga
Groonga 15.1.8 - 2025-10-31
In this release, semantic search is supported.
Improvements
[status] Added .["features"]["faiss"] entry
The .["features"] entry shows whether each feature is enabled or not, as in the following:
[
["..."],
{
"...": "...",
"features": {
"nfkc": true,
"mecab": true,
"...": "..."
},
"...": "..."
}
]
This release added faiss to the .["features"] entry:
[
["..."],
{
"...": "...",
"features": {
"nfkc": true,
"mecab": true,
"...": "...",
"faiss": true,
"...": "..."
},
"...": "..."
}
]
You can use .["features"]["faiss"] to know whether Faiss is enabled or not.
You need Faiss support to use the semantic search feature described next.
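For example, you can run the status command and look at ["features"]["faiss"] in its output, which has the same shape as shown above:
status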
Supported semantic search
This feature is still experimental and unstable.
Added the token_language_model_knn tokenizer and the language_model_knn function.
Using these together, you can perform semantic search.
Here is an example of semantic search:
select Data \
--filter 'language_model_knn(text, "male child")' \
--output_columns text
[
[
0,
0.0,
0.0
],
[
[
[
3
],
[
[
"text",
"ShortText"
]
],
[
"I am a boy."
],
[
"This is an apple."
],
[
"Groonga is a full text search engine."
]
]
]
]
The important point is that you search with text, just like keyword search.
Internally, Groonga creates embeddings automatically and uses them for the semantic search,
so you don't need to create embeddings and search with them yourself.
We omitted examples, but the same applies when loading the text to be searched:
simply load text into Groonga as before, and you can use semantic search.
You don't need to create and load embeddings, because Groonga generates them automatically here as well.
For details on setting up indexes and more, see the pages for token_language_model_knn tokenizer and language_model_knn function.
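For reference, here is a minimal sketch of the schema and data that the select example above assumes. The table layout (a TABLE_NO_KEY table named Data with a ShortText text column) is inferred from the output above, and the token_language_model_knn index setup is intentionally omitted; see the linked pages for the full setup.
table_create Data TABLE_NO_KEY
column_create Data text COLUMN_SCALAR ShortText
load --table Data
[
{"text": "I am a boy."},
{"text": "This is an apple."},
{"text": "Groonga is a full text search engine."}
]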
[language_model_vectorize] Added support for auto download from Hugging Face
This feature is still experimental and unstable.
Specifying a Hugging Face URI in model_name will automatically download the language model.
The model is downloaded during the first execution and placed in the Groonga database directory.
Subsequent executions use the local model files.
language_model_vectorize("hf:///groonga/all-MiniLM-L6-v2-Q4_K_M-GGUF", content)
This is convenient because Groonga handles downloading and placing the model for you.
[types] Added ShortBinary, Binary, and LargeBinary
This feature is still experimental and unstable.
Contributors
$ git shortlog -sn v15.1.7..
51 Sutou Kouhei
8 Abe Tomoaki
7 takuya kodama
5 Horimoto Yasuhiro
2 dependabot[bot]
Groonga 15.1.7 - 2025-09-29
In this release, NormalizerNFKC can normalize Japanese iteration marks, and an installation failure on AlmaLinux 10 is fixed.
Improvements
[grndb] Improved error handling for large database files
Previously, grndb terminated abnormally when processing database files that
exceeded the filesystem stat limit (such as files larger than 2GB on
Windows).
In this release, when grndb processes such files in the Groonga database
directory (the db file and related db.* files), it records an error for each
problematic file and continues to completion without aborting.
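For example, an invocation like the following (the database path is hypothetical) now records errors for such files and keeps going instead of aborting:
$ grndb check /var/lib/groonga/db/db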
[normalizer-nfkc-unify-iteration-mark] Added support for iteration marks with unify_iteration_mark option
The unify_iteration_mark option now supports additional iteration mark characters.
This option treats iteration marks as repeats of the immediately preceding character as below.
- Hiragana Iteration Mark ゝ (U+309D)
- Hiragana Voiced Iteration Mark ゞ (U+309E)
- Katakana Iteration Mark ヽ (U+30FD)
- Katakana Voiced Iteration Mark ヾ (U+30FE)
- Ideographic Iteration Mark 々 (U+3005) - limitation: only repeats the immediately preceding single character
- Vertical Ideographic Iteration Mark 〻 (U+303B) - limitation: only repeats the immediately preceding single character
Here is an example of using the unify_iteration_mark option.
normalize \
'NormalizerNFKC("unify_iteration_mark", true)' \
"こゝろ"
[
[
0,
1758763896.821301,
0.0001749992370605469
],
{
"normalized": "γγγ",
"types": [
],
"checks": [
]
}
]
For Ideographic Iteration Mark (々) and Vertical Ideographic Iteration Mark (〻), this feature only repeats the immediately preceding single character.
Patterns beyond repeating the previous single character are not supported, as in the following cases.
Examples:
- "部分々々" -> "部分部分"
- "多々々米" -> "多多多米"
Added new command to list available commands
A new command_list command has been added that returns a list of all available
Groonga commands.
Currently, this command returns only the command ID and name for each command.
Using this command could enable automatic generation of client library APIs
and help implement Groonga MCP (Model Context Protocol) servers.
In future releases, we plan to expand the output to include command summaries,
descriptions, and detailed argument information.
command_list
[
[
0,
1758764636.669152,
0.0002362728118896484
],
{
"cache_limit": {
"id": 150,
"name": "cache_limit"
},
...
}
]
[token_filter_stem] Added support for non-ASCII alphabets
Reported by Tsai, Xing Wei
Previously, the TokenFilterStem filter only worked with ASCII alphabets.
Now it supports stemming for non-ASCII alphabets such as Arabic.
Here is an example of using TokenFilterStem with Arabic text:
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenNgram \
--normalizer 'NormalizerNFKC("version", "16.0.0")' \
--token_filters 'TokenFilterStem("algorithm", "arabic")'
table_tokenize Terms "الكتاب مفيد" --mode ADD
[
[
0,
0.0,
0.0
],
[
{
"value": "ΩΨͺΨ§Ψ¨",
"position": 0,
"force_prefix": false,
"force_prefix_search": false
},
{
"value": "Ω
ΩΩΨ―",
"position": 1,
"force_prefix": false,
"force_prefix_search": false
}
]
]
Fixes
[almalinux] Fixed installation failure on AlmaLinux 10
Previously, installation could fail with a dnf GPG check error due to the
following outdated RPM GPG key in groonga-release.
Weβve removed the old key and now ship only the RSA4096 key, so installs work as
expected now.
$ dnf install -y --enablerepo=epel --enablerepo=crb groonga
...
error: Certificate 72A7496B45499429:
Policy rejects 72A7496B45499429: No binding signature at time 2025-09-24T09:35:25Z
Key import failed (code 2). Failing package is: groonga-15.1.5-1.el10.x86_64
GPG Keys are configured as: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-34839225, file:///etc/pki/rpm-gpg/RPM-GPG-KEY-45499429
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: GPG check FAILED
Who should upgrade?
Most users do not need to upgrade.
Only users who installed groonga-release on AlmaLinux 10 before 2025/09/24
need to upgrade to the latest package using the steps below.
If you installed groonga-release on AlmaLinux 10 on or after 2025/09/24,
no action is required.
$ sudo dnf upgrade --refresh groonga-release
After upgrading the package, groonga-release contains only the new key:
$ dnf repoquery -l --installed groonga-release
/etc/pki/rpm-gpg
/etc/pki/rpm-gpg/RPM-GPG-KEY-34839225
/etc/yum.repos.d
/etc/yum.repos.d/groonga-almalinux.repo
/etc/yum.repos.d/groonga-amazon-linux.repo
Thanks
- Tsai, Xing Wei
Contributors
$ git shortlog -sn v15.1.5..
15 takuya kodama
11 Horimoto Yasuhiro
7 Sutou Kouhei
1 Abe Tomoaki
1 dependabot[bot]
Groonga 15.1.5 - 2025-08-29
In this release, we added support for the KEY_LARGE flag for TABLE_PAT_KEY!
Improvements
[table_create] Added support for KEY_LARGE flag for TABLE_PAT_KEY
You can now use the KEY_LARGE flag with TABLE_PAT_KEY tables to expand the maximum total key size from 4GiB to 1TiB,
similar to TABLE_HASH_KEY tables as below. This allows you to store more keys in total.
table_create LargePaths TABLE_PAT_KEY|KEY_LARGE ShortText
[normalizer_nfkc] Added support for unify_hyphen_and_prolonged_sound_mark and remove_symbol combination
Previously, when the unify_hyphen_and_prolonged_sound_mark and remove_symbol options were enabled together,
hyphen characters were not removed as expected because they were not treated as symbols to be removed.
This release fixes this issue, so hyphen characters are now properly removed from the normalized text as below.
normalize \
'NormalizerNFKC("remove_symbol", true, \
"unify_hyphen_and_prolonged_sound_mark", true)' \
"090γΌ1234-5678"
[
[
0,
1756363926.409565,
0.0003023147583007812
],
{
"normalized": "09012345678",
"types": [
],
"checks": [
]
}
]
[almalinux] Added support for AlmaLinux 10
AlmaLinux 10 packages are now available.
You can install Groonga on AlmaLinux 10 using the standard package installation methods.
Fixes
[cmake] Fixed how to build/install
Patched by Tsutomu Katsube
The documentation included an incorrect -B option in the cmake --build and cmake --install commands,
which caused build errors.
The corrected commands are now:
cmake --build <Build directory path>
cmake --install <Build directory path>
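For reference, a typical full sequence looks like the following; the -B option belongs only on the configure step (the directory paths are placeholders):
cmake -S <Source directory path> -B <Build directory path>
cmake --build <Build directory path>
cmake --install <Build directory path>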
[table_create] Fixed a bug where KEY_LARGE flag was lost after executing truncate command
This issue meant that when you executed the truncate command on a TABLE_HASH_KEY table with the KEY_LARGE flag,
the table could no longer hold more than 4 GiB of total key data, because the KEY_LARGE flag was removed during the truncation.
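As a sketch of the affected scenario (the table name is chosen for illustration), the flag is now preserved across truncation:
table_create LargeKeys TABLE_HASH_KEY|KEY_LARGE ShortText
truncate LargeKeys
# Before this fix, LargeKeys lost the KEY_LARGE flag at this point and could no
# longer hold more than 4 GiB of total key data.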
Thanks
- Tsutomu Katsube
Contributors
$ git shortlog -sn v15.1.4..
29 Horimoto Yasuhiro
9 takuya kodama
3 Abe Tomoaki
2 dependabot[bot]
1 Sutou Kouhei
1 Tsutomu Katsube
Groonga 15.1.4 - 2025-07-29
In this release, we fixed a bug in the interval calculation between phrases in the *ONPP operator.
Improvements
[Ordered near phrase product search] Fixed a bug in the interval calculation between phrases
This problem may occur when we use *ONPP with MAX_ELEMENT_INTERVAL such as *ONPP-1,0,10"(abc bcd) (defg)".
If you don't use MAX_ELEMENT_INTERVAL, this problem doesn't occur.
Please refer to the following links for usage and syntax of *ONPP.
- *ONPP in query syntax: query-syntax-ordered-near-phrase-product-search-condition
- *ONPP in script syntax: script-syntax-ordered-near-phrase-product-search-operator
If this problem occurs, the following things may happen.
- Groonga may return records that shouldn't be matched.
- Groonga may not return records that should be matched.
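For illustration, such a condition is typically used with a match column in query syntax like the following (the Entries table and content column are hypothetical):
select Entries \
  --match_columns content \
  --query '*ONPP-1,0,10"(abc bcd) (defg)"' \
  --output_columns content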
Contributors
$ git shortlog -sn v15.1.3..
10 Horimoto Yasuhiro
2 Abe Tomoaki
Groonga 15.1.3 - 2025-07-18
Improvements
[Apache Arrow] Added support for Apache Arrow C++ 21.0.0
Contributors
$ git shortlog -sn v15.1.2..
13 Horimoto Yasuhiro
2 takuya kodama
1 Abe Tomoaki
1 Sutou Kouhei
Groonga 15.1.2 - 2025-07-07
Improvements
[Windows] Dropped support for Groonga packages built with Visual Studio 2019
We no longer provide the following packages as of this release.
- groonga-xx.x.x-x64-vs2019.zip
- groonga-xx.x.x-x64-vs2019-with-vcruntime.zip
Fixes
[Near phrase search] Fixed a bug in the interval calculation between phrases
This problem may occur when we use *NP, *NPP, or *ONP with MAX_ELEMENT_INTERVAL as below.
*NP-1,0,12,11"abc ef"
*NPP-1,0,10,9"(abc bcd) (ef)"
*ONP-1,0,5|6 "abc defghi jklmnop"
If you don't use MAX_ELEMENT_INTERVAL, this problem doesn't occur.
Please refer to the following links about usage and syntax of *NP, *NPP, or *ONP.
- *NP in query syntax: query-syntax-near-phrase-search-condition
- *NP in script syntax: script-syntax-near-phrase-search-operator
- *NPP in query syntax: query-syntax-near-phrase-product-search-condition
- *NPP in script syntax: script-syntax-near-phrase-product-search-operator
- *ONP in query syntax: query-syntax-ordered-near-phrase-search-condition
- *ONP in script syntax: script-syntax-ordered-near-phrase-search-operator
If this problem occurs, the following things may happen.
- Groonga may return records that shouldn't be hits.
- Groonga may not return records that should be returned as hits.
Contributors
$ git shortlog -sn v15.1.1..
24 Abe Tomoaki
19 Horimoto Yasuhiro
10 takuya kodama
2 Sutou Kouhei
Groonga 15.1.1 - 2025-06-02
This release updates TokenMecab to preserve user-defined entries with spaces as
single tokens.
Improvements
token_mecab: Fix unintended splitting of user-defined entries with spaces
Previously, TokenMecab split user-defined entries containing spaces
(e.g., "search engine") into separate tokens ("search" and "engine"). This
release fixes this issue, so entries with embedded spaces are now preserved and
handled as single tokens like "search engine" as follows.
tokenize TokenMecab "search engine" --output_pretty yes
[
[
0,
1748413131.972704,
0.0003032684326171875
],
[
{
"value": "search engine",
"position": 0,
"force_prefix": false,
"force_prefix_search": false
}
]
]
Fixes
Fixed many typos in documentation
GH-2332,
GH-2333,
GH-2334,
GH-2335,
GH-2336,
GH-2337,
GH-2338
Patched by Vasilii Lakhin.
Thanks
- Vasilii Lakhin
Contributors
$ git shortlog -sn v15.0.9..
9 takuya kodama
7 Vasilii Lakhin
6 Sutou Kouhei
2 Abe Tomoaki
2 Horimoto Yasuhiro
Groonga 15.0.9 - 2025-05-08
This release adds a tokenizer output option that makes token inspection simpler and
improves negative-division semantics for unsigned integers.
Improvements
tokenize/table_tokenize: Added output_style option
The output_style option for the tokenize/table_tokenize commands makes it easier to
focus on the tokens when you don't need the full attribute set.
Here is an example of using the output_style option.
tokenize TokenNgram "Fulltext Search" --output_style simple
[
[
0,
1746573056.540744,
0.0007045269012451172
],
[
"Fu",
"ul",
"ll",
"lt",
"te",
"ex",
"xt",
"t ",
" S",
"Se",
"ea",
"ar",
"rc",
"ch",
"h"
]
]
Clarified X / negative value semantics
Previously, only dividing X by -1 or -1.0 returned -X for unsigned integers.
From this release, dividing by any negative value will yield the mathematically
expected negative result as follows.
- Before: X / -2 might not return -(X / 2).
- After: X / -2 always returns -(X / 2).
This is a backward incompatible change but we assume that no user depends on
this behavior.
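As a sketch (the table and column names are hypothetical), the new behavior for an unsigned integer column looks like this:
table_create Numbers TABLE_NO_KEY
column_create Numbers value COLUMN_SCALAR UInt32
load --table Numbers
[
{"value": 10}
]
select Numbers --output_columns 'value, value / -2'
# 10 / -2 now returns -5, that is -(10 / 2).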
Contributors
$ git shortlog -sn v15.0.4..
Groonga 15.0.4 - 2025-03-29
Improvements
Clarified X / -1 and X / -1.0 semantics
In many languages, X / -1 and X / -1.0 return -X. But Groonga
may not return -X when X is an unsigned integer.
From this release, X / -1 and X / -1.0 always return -X.
This is a backward incompatible change but we assume that no user
depends on this behavior.
Groonga 15.0.3 - 2025-03-10
Improvements
offline-index-construction: Added support for parallel construction with table-hash-key lexicon
Parallel offline index construction iterates over sorted terms
internally. table-pat-key and table-dat-key can do this
effectively because they are tree-based. But table-hash-key
can't do it effectively because it's not tree-based. So we didn't
support parallel offline index construction with a
table-hash-key lexicon.
This release adds support for parallel offline index construction with a
table-hash-key lexicon. It sorts terms in the normal way, so it's
not as efficient: parallel offline index construction with a
table-hash-key lexicon will be slower than with
table-pat-key/table-dat-key, but it may still be faster than
sequential offline index construction with a table-hash-key
lexicon.
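As a rough sketch (the table, column, and tokenizer choices are illustrative), offline index construction happens when the index column is created after the data is loaded; whether it actually runs in parallel depends on your build and configuration:
table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR Text
load --table Memos
[
{"content": "Groonga is a fast full text search engine."}
]
table_create Words TABLE_HASH_KEY ShortText \
  --default_tokenizer TokenNgram
column_create Words memos_content COLUMN_INDEX|WITH_POSITION Memos content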