Ktoken is a Kotlin Multiplatform BPE (byte pair encoding) tokenizer designed for use with OpenAI's models.
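As background, byte pair encoding splits text into bytes and then repeatedly merges the adjacent pair whose concatenation has the lowest merge rank, until no ranked pair remains. The toy sketch below illustrates only that merge loop. It is not Ktoken's implementation, and the rank table `toyRanks` and the functions `bpeEncode`/`bpeDecode` are hypothetical. Real encodings such as cl100k_base also pre-split text with a regex and use a learned table of roughly 100k ranks.

```kotlin
// Toy BPE sketch: NOT Ktoken's implementation. The rank table below is hypothetical.

// Hypothetical merge ranks: a lower rank is merged earlier, and the rank doubles as the token ID.
// Byte sequences are stored as ISO-8859-1 strings so that each Char stands for exactly one byte.
val toyRanks: Map<String, Int> = listOf(
    "h", "e", "l", "o", " ", "w", "r", "d", // single bytes (ranks 0..7)
    "he", "ll", "llo", "hello",             // ranks 8..11
    " w", "or", " wor", "ld", " world"      // ranks 12..16
).withIndex().associate { (rank, piece) -> piece to rank }

// Splits text into single-byte pieces (UTF-8 bytes, one Char per byte).
fun toBytePieces(text: String): MutableList<String> =
    text.toByteArray(Charsets.UTF_8)
        .map { b -> String(byteArrayOf(b), Charsets.ISO_8859_1) }
        .toMutableList()

// Repeatedly merges the adjacent pair whose concatenation has the lowest rank.
fun bpeEncode(text: String, ranks: Map<String, Int>): List<Int> {
    val pieces = toBytePieces(text)
    while (pieces.size > 1) {
        var bestIndex = -1
        var bestRank = Int.MAX_VALUE
        for (i in 0 until pieces.size - 1) {
            val rank = ranks[pieces[i] + pieces[i + 1]] ?: continue
            if (rank < bestRank) {
                bestRank = rank
                bestIndex = i
            }
        }
        if (bestIndex == -1) break // no mergeable pair left
        pieces[bestIndex] = pieces[bestIndex] + pieces[bestIndex + 1]
        pieces.removeAt(bestIndex + 1)
    }
    // Map each remaining piece to its rank; throws if a byte is missing from the table.
    return pieces.map { ranks.getValue(it) }
}

// Decoding concatenates the byte sequences of the token IDs and reads them back as UTF-8.
fun bpeDecode(tokens: List<Int>, ranks: Map<String, Int>): String {
    val byId = ranks.entries.associate { (piece, id) -> id to piece }
    val bytes = tokens.flatMap { byId.getValue(it).toByteArray(Charsets.ISO_8859_1).toList() }
    return String(bytes.toByteArray(), Charsets.UTF_8)
}
```

With this toy table, `bpeEncode("hello world", toyRanks)` merges down to the two pieces "hello" and " world" and returns `[11, 16]`, and `bpeDecode` maps those IDs back to "hello world".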
Install Ktoken by adding the dependency to your `build.gradle` file:

```groovy
repositories {
    mavenCentral()
}

dependencies {
    implementation "com.aallam.ktoken:ktoken:0.4.0"
}
```

Create a tokenizer for an encoding or for an OpenAI model, then encode and decode text:

```kotlin
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE)

// Or, for a specific model in the OpenAI API:
val tokenizer = Tokenizer.of(model = "gpt-4")

// Encode text into a list of token IDs
val tokens = tokenizer.encode("hello world")

// Decode a list of token IDs back into text
val text = tokenizer.decode(listOf(15339, 1917))
```

Ktoken operates in two modes: Local (the default on JVM) and Remote (the default on JS and Native).
In Local mode, use `LocalPbeLoader` to load encodings from local files:

```kotlin
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.SYSTEM))

// Or, for a specific model in the OpenAI API:
val tokenizer = Tokenizer.of(model = "gpt-4", loader = LocalPbeLoader(FileSystem.SYSTEM))
```

The JVM artifacts bundle the encoding files. Use `FileSystem.RESOURCES` to load them:

```kotlin
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.RESOURCES))
```

Note: this is the default behavior on JVM.
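To illustrate what a loader reads, here is a minimal parsing sketch. It assumes the encoding files follow the `.tiktoken` layout used by OpenAI's tiktoken: one entry per line, holding a base64-encoded token byte sequence, a space, and the token's rank. That layout is an assumption about Ktoken's files, and `parseBpeRanks` is a hypothetical helper, not part of Ktoken's API.

```kotlin
import java.util.Base64

// Hypothetical helper, NOT Ktoken's API. Assumes tiktoken-style lines "<base64 bytes> <rank>"
// and returns a map from token bytes (as an ISO-8859-1 string, one Char per byte) to rank.
fun parseBpeRanks(content: String): Map<String, Int> {
    val decoder = Base64.getDecoder()
    return content.lineSequence()
        .filter { it.isNotBlank() } // skip empty lines, e.g. a trailing newline
        .associate { line ->
            val (encoded, rank) = line.trim().split(" ")
            String(decoder.decode(encoded), Charsets.ISO_8859_1) to rank.toInt()
        }
}
```

For example, parsing the two lines `aGVsbG8= 0` and `IHdvcmxk 1` yields a map from "hello" to 0 and from " world" to 1, because those base64 strings decode to the bytes of "hello" and " world".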
To use Remote mode:

- Add an engine: include one of Ktor's client engines (for example, `ktor-client-okhttp`) in your dependencies.
- Use `RemoteBpeLoader` to load encodings from remote sources:

```kotlin
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = RemoteBpeLoader())

// Or, for a specific model in the OpenAI API:
val tokenizer = Tokenizer.of(model = "gpt-4", loader = RemoteBpeLoader())
```

Alternatively, you can use `ktoken-bom` to manage dependency versions by adding the following to your `build.gradle` file:

```groovy
dependencies {
    // Import the Ktoken BOM
    implementation platform('com.aallam.ktoken:ktoken-bom:0.4.0')

    // Define dependencies without versions
    implementation 'com.aallam.ktoken:ktoken'
    runtimeOnly 'io.ktor:ktor-client-okhttp'
}
```

For multiplatform projects, add the `ktoken` dependency to `commonMain` and select an engine for each target.
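A minimal `build.gradle.kts` sketch of that multiplatform setup, assuming a project with only JVM and JS targets. The Kotlin and Ktor versions below are assumptions; use the versions your project already depends on. Other targets, such as Apple or Linux native, need their own Ktor engine artifacts.

```kotlin
// build.gradle.kts: sketch assuming the Kotlin Multiplatform plugin with JVM + JS targets only.
plugins {
    kotlin("multiplatform") version "1.9.22" // assumed Kotlin version
}

repositories {
    mavenCentral()
}

val ktorVersion = "2.3.7" // assumed; match the Ktor version used in your project

kotlin {
    jvm()
    js { browser() }

    sourceSets {
        val commonMain by getting {
            dependencies {
                // Ktoken itself is declared once, for all targets
                implementation("com.aallam.ktoken:ktoken:0.4.0")
            }
        }
        val jvmMain by getting {
            dependencies {
                // Engine used on JVM when Remote mode is selected
                implementation("io.ktor:ktor-client-okhttp:$ktorVersion")
            }
        }
        val jsMain by getting {
            dependencies {
                // Engine for JS, where Remote mode is the default
                implementation("io.ktor:ktor-client-js:$ktorVersion")
            }
        }
    }
}
```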
Ktoken is open-source software and distributed under the MIT license. This project is not affiliated with nor endorsed by OpenAI.