Cross-platform framework for deploying LLM/VLM/TTS models locally in your app.
- Available in Flutter, React-Native and Kotlin Multiplatform.
- Supports any GGUF model you can find on Huggingface: Qwen, Gemma, Llama, DeepSeek, etc.
- Run LLMs, VLMs, Embedding Models, TTS models and more.
- Supports precisions from FP32 down to 2-bit quantized models, for efficiency and less device strain.
- Chat templates with Jinja2 support and token streaming.
- Install: execute the following command in your project terminal:

  ```bash
  flutter pub add cactus
  ```
- Flutter Text Completion

  ```dart
  import 'package:cactus/cactus.dart';

  final lm = await CactusLM.init(
    modelUrl: 'huggingface/gguf/link',
    contextSize: 2048,
  );

  final messages = [ChatMessage(role: 'user', content: 'Hello!')];
  final response = await lm.completion(messages, maxTokens: 100, temperature: 0.7);
  ```
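  The feature list above also mentions token streaming. The following is only a minimal sketch of how streamed output might be consumed; the `onToken` callback parameter is an assumption for illustration, not confirmed API, so check the Flutter Docs for the actual streaming interface.

  ```dart
  import 'package:cactus/cactus.dart';

  final lm = await CactusLM.init(
    modelUrl: 'huggingface/gguf/link',
    contextSize: 2048,
  );

  final messages = [ChatMessage(role: 'user', content: 'Tell me a short story.')];

  // Hypothetical streaming call: the `onToken` parameter is assumed here for
  // illustration. The idea is to render each token as it arrives instead of
  // waiting for the full response.
  final response = await lm.completion(
    messages,
    maxTokens: 200,
    temperature: 0.7,
    onToken: (token) => print(token),
  );
  ```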
- Flutter Embedding

  ```dart
  import 'package:cactus/cactus.dart';

  final lm = await CactusLM.init(
    modelUrl: 'huggingface/gguf/link',
    contextSize: 2048,
    generateEmbeddings: true,
  );

  final text = 'Your text to embed';
  final result = await lm.embedding(text);
  ```
- Flutter VLM Completion

  ```dart
  import 'package:cactus/cactus.dart';

  final vlm = await CactusVLM.init(
    modelUrl: 'huggingface/gguf/link',
    mmprojUrl: 'huggingface/gguf/mmproj/link',
  );

  final messages = [ChatMessage(role: 'user', content: 'Describe this image')];
  final response = await vlm.completion(
    messages,
    imagePaths: ['/absolute/path/to/image.jpg'],
    maxTokens: 200,
    temperature: 0.3,
  );
  ```
- Flutter Cloud Fallback

  ```dart
  import 'package:cactus/cactus.dart';

  final lm = await CactusLM.init(
    modelUrl: 'huggingface/gguf/link',
    contextSize: 2048,
    cactusToken: 'enterprise_token_here',
  );

  final messages = [ChatMessage(role: 'user', content: 'Hello!')];
  final response = await lm.completion(messages, maxTokens: 100, temperature: 0.7);

  // local (default): strictly run on-device only
  // localfirst: fall back to the cloud if the device fails
  // remotefirst: primarily remote, run locally if the API fails
  // remote: strictly run in the cloud
  final embedding = await lm.embedding('Your text', mode: 'localfirst');
  ```
N.B.: See the Flutter Docs for more.
- Install the `cactus-react-native` package:

  ```bash
  npm install cactus-react-native && npx pod-install
  ```
- React-Native Text Completion

  ```typescript
  import { CactusLM } from 'cactus-react-native';

  const { lm, error } = await CactusLM.init({
    model: '/path/to/model.gguf',
    n_ctx: 2048,
  });

  const messages = [{ role: 'user', content: 'Hello!' }];
  const params = { n_predict: 100, temperature: 0.7 };
  const response = await lm.completion(messages, params);
  ```
- React-Native Embedding

  ```typescript
  import { CactusLM } from 'cactus-react-native';

  const { lm, error } = await CactusLM.init({
    model: '/path/to/model.gguf',
    n_ctx: 2048,
    embedding: true,
  });

  const text = 'Your text to embed';
  const params = { normalize: true };
  const result = await lm.embedding(text, params);
  ```
- React-Native VLM

  ```typescript
  import { CactusVLM } from 'cactus-react-native';

  const { vlm, error } = await CactusVLM.init({
    model: '/path/to/vision-model.gguf',
    mmproj: '/path/to/mmproj.gguf',
  });

  const messages = [{ role: 'user', content: 'Describe this image' }];
  const params = {
    images: ['/absolute/path/to/image.jpg'],
    n_predict: 200,
    temperature: 0.3,
  };
  const response = await vlm.completion(messages, params);
  ```
- React-Native Cloud Fallback

  ```typescript
  import { CactusLM } from 'cactus-react-native';

  const { lm, error } = await CactusLM.init({
    model: '/path/to/model.gguf',
    n_ctx: 2048,
  }, undefined, 'enterprise_token_here');

  const messages = [{ role: 'user', content: 'Hello!' }];
  const params = { n_predict: 100, temperature: 0.7 };
  const response = await lm.completion(messages, params);

  // local (default): strictly run on-device only
  // localfirst: fall back to the cloud if the device fails
  // remotefirst: primarily remote, run locally if the API fails
  // remote: strictly run in the cloud
  const embedding = await lm.embedding('Your text', undefined, 'localfirst');
  ```
N.B.: See the React Native Docs for more.
- Add Maven Dependency: add the following to your KMP project's `build.gradle.kts`:

  ```kotlin
  kotlin {
      sourceSets {
          commonMain {
              dependencies {
                  implementation("com.cactus:library:0.2.4")
              }
          }
      }
  }
  ```
- Platform Setup:
  - Android: works automatically; native libraries are included.
  - iOS: in Xcode, go to File → Add Package Dependencies, paste https://github.com/cactus-compute/cactus, and click Add.
- Kotlin Multiplatform Text Completion

  ```kotlin
  import com.cactus.CactusLM
  import kotlinx.coroutines.runBlocking

  runBlocking {
      val lm = CactusLM(
          threads = 4,
          contextSize = 2048,
          gpuLayers = 0 // Set to 99 for full GPU offload
      )

      val downloadSuccess = lm.download(
          url = "path/to/huggingface/gguf",
          filename = "model_filename.gguf"
      )
      // Initialise with the same filename used for the download
      val initSuccess = lm.init("model_filename.gguf")

      val result = lm.completion(
          prompt = "Hello!",
          maxTokens = 100,
          temperature = 0.7f
      )
  }
  ```
- Kotlin Multiplatform Speech To Text

  ```kotlin
  import com.cactus.CactusSTT
  import kotlinx.coroutines.runBlocking

  runBlocking {
      val stt = CactusSTT(
          language = "en-US",
          sampleRate = 16000,
          maxDuration = 30
      )

      // Only supports the default Vosk STT model for Android & the Apple Foundation Model
      val downloadSuccess = stt.download()
      val initSuccess = stt.init()

      val result = stt.transcribe()
      result?.let { sttResult ->
          println("Transcribed: ${sttResult.text}")
          println("Confidence: ${sttResult.confidence}")
      }

      // Or transcribe from an audio file
      val fileResult = stt.transcribeFile("/path/to/audio.wav")
  }
  ```
- Kotlin Multiplatform VLM

  ```kotlin
  import com.cactus.CactusVLM
  import kotlinx.coroutines.runBlocking

  runBlocking {
      val vlm = CactusVLM(
          threads = 4,
          contextSize = 2048,
          gpuLayers = 0 // Set to 99 for full GPU offload
      )

      val downloadSuccess = vlm.download(
          modelUrl = "path/to/huggingface/gguf",
          mmprojUrl = "path/to/huggingface/mmproj/gguf",
          modelFilename = "model_filename.gguf",
          mmprojFilename = "mmproj_filename.gguf"
      )
      // Initialise with the same filenames used for the download
      val initSuccess = vlm.init("model_filename.gguf", "mmproj_filename.gguf")

      val result = vlm.completion(
          prompt = "Describe this image",
          imagePath = "/path/to/image.jpg",
          maxTokens = 200,
          temperature = 0.3f
      )
  }
  ```
N.B.: See the Kotlin Docs for more.
The Cactus backend is written in C/C++ and can run directly on phones, smart TVs, watches, speakers, cameras, laptops, etc. See the C++ Docs for more.
First, clone the repo with `git clone https://github.com/cactus-compute/cactus.git`, `cd` into it, and make all scripts executable with `chmod +x scripts/*.sh`.
- Flutter
  - Build the Android JNILibs with `scripts/build-flutter-android.sh`.
  - Build the Flutter Plugin with `scripts/build-flutter.sh` (MUST be run before using the example).
  - Navigate to the example app with `cd flutter/example`.
  - Open your simulator via Xcode or Android Studio; see the walkthrough if you have not done this before.
  - Always start the app with `flutter clean && flutter pub get && flutter run`.
  - Play with the app, and make changes to either the example app or the plugin as desired.
- React Native
  - Build the Android JNILibs with `scripts/build-react-android.sh`.
  - Build the React Native package with `scripts/build-react.sh`.
  - Navigate to the example app with `cd react/example`.
  - Set up your simulator via Xcode or Android Studio; see the walkthrough if you have not done this before.
  - Always start the app with `yarn && yarn ios` or `yarn && yarn android`.
  - Play with the app, and make changes to either the example app or the package as desired.
  - For now, if changes are made in the package, manually copy the changed files/folders into `examples/react/node_modules/cactus-react-native`.
- Kotlin Multiplatform
  - Build the Android JNILibs with `scripts/build-flutter-android.sh` (Flutter & Kotlin share the same JNILibs).
  - Build the Kotlin library with `scripts/build-kotlin.sh` (MUST be run before using the example).
  - Navigate to the example app with `cd kotlin/example`.
  - Open your simulator via Xcode or Android Studio; see the walkthrough if you have not done this before.
  - Start the app with `./gradlew :composeApp:run` for desktop, or use Android Studio/Xcode for mobile.
  - Play with the app, and make changes to either the example app or the library as desired.
- C/C++
  - Navigate to the example app with `cd cactus/example`.
  - There are multiple main files: `main_vlm`, `main_llm`, `main_embed`, `main_tts`.
  - Build both the libraries and the executables using `build.sh`.
  - Run one of the executables: `./cactus_vlm`, `./cactus_llm`, `./cactus_embed`, `./cactus_tts`.
  - Try different models and make changes as desired.
- Contributing
  - To contribute a bug fix, create a branch after making your changes with `git checkout -b <branch-name>` and submit a PR.
  - To contribute a feature, please raise an issue first so it can be discussed, to avoid overlapping with someone else's work.
  - Join our Discord.
| Device | Gemma3 1B Q4 (toks/sec) | Qwen3 4B Q4 (toks/sec) |
|---|---|---|
| iPhone 16 Pro Max | 54 | 18 |
| iPhone 16 Pro | 54 | 18 |
| iPhone 16 | 49 | 16 |
| iPhone 15 Pro Max | 45 | 15 |
| iPhone 15 Pro | 45 | 15 |
| iPhone 14 Pro Max | 44 | 14 |
| OnePlus 13 5G | 43 | 14 |
| Samsung Galaxy S24 Ultra | 42 | 14 |
| iPhone 15 | 42 | 14 |
| OnePlus Open | 38 | 13 |
| Samsung Galaxy S23 5G | 37 | 12 |
| Samsung Galaxy S24 | 36 | 12 |
| iPhone 13 Pro | 35 | 11 |
| OnePlus 12 | 35 | 11 |
| Galaxy S25 Ultra | 29 | 9 |
| OnePlus 11 | 26 | 8 |
| iPhone 13 mini | 25 | 8 |
| Redmi K70 Ultra | 24 | 8 |
| Xiaomi 13 | 24 | 8 |
| Samsung Galaxy S24+ | 22 | 7 |
| Samsung Galaxy Z Fold 4 | 22 | 7 |
| Xiaomi Poco F6 5G | 22 | 6 |
We provide a collection of recommended models on our HuggingFace Page.