All JavaScript stemmers have been transpiled from the Java implementation of the Snowball stemming algorithms using the ESJava transpiler.
This project provides not only pre-built JavaScript stemmers, but also a way to build new ones.
Stemmers for 20+ languages are packed in one file in two ECMAScript standards:
You can test the stemmers directly in the online demo.
Because the ESJava transpiler has several limitations, the build process has to be complemented by pre- and post-transpiling tweaks.
- Unix-like OS (or Cygwin on Windows)
- Node.js + npm
- rsync (for syncing Snowball repository, required only in specific scenarios)
- perl (for generating Java code from Snowball algorithms (SBL files), required only in specific scenarios)
- Building Java stemmers from most recent Snowball stemmers
- Creating a Java bundle
- Tweaking the Java bundle
- Transpiling the Java bundle to JavaScript
- Modifying the transpiled JavaScript
- Building Java stemmers from most recent Snowball stemmers
- Building Java stemmers from custom Snowball stemmers
- Creating a Java bundle
- Adding custom Java stemmers into the bundle
- Tweaking the Java bundle
- Transpiling the Java bundle to JavaScript
- Modifying the transpiled JavaScript
git clone https://github.com/mazko/jssnowball.git
cd jssnowball/
make bundle
- Change directory to jssnowball/snowball-master/
- Create a new subfolder in the algorithms folder and copy the given SBL file into it, renamed to stem_Unicode.sbl
- Add the stemmer configuration to libstemmer/modules.txt and libstemmer/modules_utf8.txt
- Add the stemmer to the GNUmakefile's libstemmer_algorithms variable
- Compile Snowball using make dist
Because ESJava can convert only a single file, all Java source files have to be bundled into one file first.
git checkout -- js_snowball/eclipse/
make bundle
Copy the Java stemmer code from jssnowball/snowball-master/java/org/tartarus/snowball/ext/ into jssnowball/js_snowball/lib/snowball.bundle.java.
It is also recommended to remove unused code such as copy_from, hashCode, etc. Here is an example using Eclipse EE Mars.1 Release (4.5.1):
source -> cleanup
Some Java constructs, such as reflection, cannot be translated to JavaScript directly. Such fragments have to be tweaked a bit.
Fortunately, most of them are in the common code, not in stemmers themselves (except for finnishStemmer). They are wrapped inside :es6: code :end: and should be edited as suggested in comments.
On top of that, these further tweaks are required:
- removing package names in method references (org.tartarus.snowball, java.lang)
- removing some overloaded methods
The result should match the original snowball.bundle.java file.
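To illustrate what removing a package name means in practice, here is a minimal sketch in JavaScript. The helper name stripPackagePrefixes and the sample input line are hypothetical and are not part of this project or its build scripts. The regular expression is deliberately blunt, so it would also shorten package and import statements; review the result by hand instead of applying it blindly to the whole bundle.

```javascript
// Hypothetical helper (not part of jssnowball): removes the two package
// prefixes named in the list above from a piece of Java source text.
function stripPackagePrefixes(javaSource) {
  // Matches "org.tartarus.snowball." or "java.lang." when an identifier follows.
  const prefix = /\b(?:org\.tartarus\.snowball|java\.lang)\.(?=[A-Za-z_$])/g;
  return javaSource.replace(prefix, '');
}

// Illustrative input line, not taken from the actual bundle.
const sampleJava = 'java.lang.StringBuilder sb = new java.lang.StringBuilder();';
console.log(stripPackagePrefixes(sampleJava));
```

Note that a sub-package reference such as org.tartarus.snowball.ext.finnishStemmer would be reduced to ext.finnishStemmer by this sketch, so such cases still need manual attention.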
npm i -g esjava babel-cli
npm i babel-preset-es2015 babel-plugin-transform-es2015-modules-umd
make esjava
In the final JavaScript files (stored in the jssnowball/js_snowball/lib/ directory), replace s.length() with s.length in the eq_s and eq_s_b methods. Otherwise the code throws TypeError: s.length is not a function.
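The following sketch shows why the call fails and what the replacement looks like. The helper name fixLengthCalls and the sample line are hypothetical and only illustrate the edit; in practice the change should be limited to the bodies of eq_s and eq_s_b.

```javascript
// In JavaScript, length is a property of a string, not a method,
// so calling it the Java way throws a TypeError.
function callsLengthAsMethod(s) {
  try {
    s.length();
    return 'no error';
  } catch (err) {
    return err instanceof TypeError ? 'TypeError' : 'other error';
  }
}

// Hypothetical helper (not part of jssnowball): rewrites s.length() to s.length
// in the text it is given.
function fixLengthCalls(jsSource) {
  return jsSource.replace(/\bs\.length\(\)/g, 's.length');
}

// Illustrative line, not copied from the generated files.
const beforeFix = 'if (this.limit - this.cursor < s.length()) return false;';

console.log(callsLengthAsMethod('abc'));
console.log(fixLengthCalls(beforeFix));
```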