Swift bindings for Cactus Compute.
Cactus is a framework for deploying LLMs locally in your app. It can serve as an alternative to FoundationModels, offering more model choices and better performance.
At the moment, this package provides a minimal, low-level Swifty interface over the Cactus C FFI, JSON Schema, Cactus Telemetry, and model downloading.
You must first download the model you want to use with `CactusModelsDirectory`; then you can create a `CactusLanguageModel` instance with the local model URL to start generating.
```swift
import Cactus

let modelURL = try await CactusModelsDirectory.shared.modelURL(for: .qwen3_0_6b())
let model = try CactusLanguageModel(from: modelURL)
let completion = try model.chatCompletion(
  messages: [
    .system("You are a philosopher, philosophize about anything."),
    .user("What is the meaning of life?")
  ]
)
```

`CactusModelsDirectory` now stores models in a nested structure:
```
<models-root>/<version>/<quantization>/__ordinary__/<slug>
<models-root>/<version>/<quantization>/<pro>/<slug>
```
The previous flat structure (`<slug>--<quantization>--<version>[--<pro>]`) is no longer used.
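For example, with a hypothetical slug `qwen3-0.6` at version `1.7.0` and `q8` quantization, a non-pro model would live at:

```
<models-root>/1.7.0/q8/__ordinary__/qwen3-0.6
```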
You can migrate existing directories with:
```swift
let result = try CactusModelsDirectory.shared.migrateFromv1_5Tov1_7Structure()
print("Migrated: \(result.migrated.map(\.request.slug))")
print("Removed: \(result.removed.map(\.request.slug))")During migration, models with versions older than v1.7 are removed.
> [!NOTE]
> The methods of `CactusLanguageModel` are synchronous and blocking, and the `CactusLanguageModel` class is not `Sendable`. This gives you the flexibility to use the model in non-isolated and synchronous contexts, but you should almost certainly avoid using it directly on the main thread. If you need concurrent access to the model, consider wrapping it in an actor.
```swift
// Actors are implicitly final, so no `final` modifier is needed.
actor LanguageModelActor {
  let model: CactusLanguageModel

  init(model: sending CactusLanguageModel) {
    self.model = model
  }

  func withIsolation<T, E: Error>(
    perform operation: (isolated LanguageModelActor) throws(E) -> sending T
  ) throws(E) -> sending T {
    try operation(self)
  }
}

@concurrent
func chatInBackground(
  with modelActor: LanguageModelActor
) async throws {
  try await modelActor.withIsolation { @Sendable modelActor in
    // You can access the model directly because the closure
    // is isolated to modelActor.
    let model = modelActor.model
    // ...
  }
}
```
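For instance, here's a minimal sketch of running a blocking completion inside the actor's isolation and sending only the result back out (the function and message content are illustrative):

```swift
@concurrent
func philosophize(with modelActor: LanguageModelActor) async throws {
  let completion = try await modelActor.withIsolation { @Sendable modelActor in
    // The non-Sendable model stays inside the actor's isolation; only
    // the freshly created completion value is sent back to the caller.
    try modelActor.model.chatCompletion(
      messages: [.user("What is the meaning of life?")]
    )
  }
  // Use the completion outside the actor.
  _ = completion
}
```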
The `CactusLanguageModel.chatCompletion` method provides a callback that allows you to stream tokens as they come in.

```swift
let completion = try model.chatCompletion(
  messages: [
    .system("You are a philosopher, philosophize about anything."),
    .user("What is the meaning of life?")
  ]
) { token in
  print(token)
}
```
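Since the callback fires as tokens are generated, you can accumulate a partial response, e.g. to drive a UI. A minimal sketch, where `render` is a hypothetical UI helper:

```swift
var partial = ""
let completion = try model.chatCompletion(
  messages: [.user("What is the meaning of life?")]
) { token in
  partial += token
  render(partial) // hypothetical UI update
}
```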
You can stream audio transcription results using `CactusTranscriptionStream`, which vends an async sequence of processed transcriptions.
```swift
import AVFoundation

let modelURL = try await CactusModelsDirectory.shared.modelURL(for: .whisperSmall())
let stream = try CactusTranscriptionStream(modelURL: modelURL, contextSize: 2048)
let task = Task {
  for try await chunk in stream {
    print(chunk.confirmed, chunk.pending)
  }
}
// `format` and `frameCount` come from your audio capture setup.
// AVAudioPCMBuffer's initializer is failable, not throwing.
let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frameCount)!
try await stream.insert(buffer: buffer)
let finalized = try await stream.finish()
print(finalized.confirmed)
_ = try await task.value
```
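As a sketch of real-time use (not part of the library's API), you could feed live microphone audio into the stream with `AVAudioEngine`:

```swift
// Assumes `stream` from above. Each tap callback forwards a captured
// buffer to the transcription stream.
let engine = AVAudioEngine()
let input = engine.inputNode
let micFormat = input.outputFormat(forBus: 0)
input.installTap(onBus: 0, bufferSize: 4096, format: micFormat) { buffer, _ in
  Task { try? await stream.insert(buffer: buffer) }
}
try engine.start()
```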
You can pass a list of function definitions to the model, which the model can then invoke based on the schema of arguments you provide. The function calling format is based on the JSON Schema format.

```swift
let completion = try model.chatCompletion(
  messages: [
    .system("You are a helpful assistant that can use tools."),
    .user("What is the weather in San Francisco?")
  ],
  functions: [
    CactusLanguageModel.FunctionDefinition(
      name: "get_weather",
      description: "Get the weather in a given location",
      parameters: .object(
        properties: [
          "location": .string(
            description: "City name, e.g. 'San Francisco'",
            minLength: 1,
            examples: ["San Francisco"]
          )
        ],
        required: ["location"]
      )
    )
  ]
)
// [
//   CactusLanguageModel.FunctionCall(
//     name: "get_weather",
//     arguments: ["location": "San Francisco"]
//   )
// ]
print(completion.functionCalls)
```
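From here you can dispatch each call to your own implementation. A minimal sketch, where `handleWeather` is a hypothetical helper:

```swift
for call in completion.functionCalls {
  switch call.name {
  case "get_weather":
    // `call.arguments` holds the model-generated arguments object.
    handleWeather(call.arguments)
  default:
    break
  }
}
```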
> [!NOTE]
> Smaller models may struggle to generate function arguments that match the `JSONSchema` you specify for the function. Therefore, the library provides a way to manually validate any value against the schema you provide to the model using the `JSONSchema.Validator` class.
```swift
let functionDefinition = CactusLanguageModel.FunctionDefinition(
  name: "search",
  description: "Find something",
  parameters: .object(
    properties: [
      "query": .string(minLength: 1)
    ]
  )
)
let completion = try model.chatCompletion(
  messages: messages,
  functions: [functionDefinition]
)
for functionCall in completion.functionCalls {
  try JSONSchema.Validator.shared.validate(
    value: .object(functionCall.arguments),
    with: functionDefinition.parameters
  )
}
```
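A sketch of one way to use this: keep only the calls whose arguments actually pass validation.

```swift
let validCalls = completion.functionCalls.filter { call in
  (try? JSONSchema.Validator.shared.validate(
    value: .object(call.arguments),
    with: functionDefinition.parameters
  )) != nil
}
```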
VLMs allow you to pass images to the model for analysis. You can pass an array of URLs to image files when creating a `CactusLanguageModel.ChatMessage`.

```swift
let modelURL = try await CactusModelsDirectory.shared.modelURL(for: .lfm2Vl_450m())
let model = try CactusLanguageModel(from: modelURL)
let completion = try model.chatCompletion(
  messages: [
    .system("You are a helpful assistant."),
    .user("What is going on here?", images: [imageURL])
  ]
)
```

Audio models allow you to transcribe audio files. You can pass the URL of an audio file to `CactusLanguageModel.transcribe` to transcribe it.
```swift
let modelURL = try await CactusModelsDirectory.shared.modelURL(for: .whisperSmall())
let model = try CactusLanguageModel(from: modelURL)
// See https://huggingface.co/openai/whisper-small#usage for more info on how to structure a
// whisper prompt.
let transcription = try model.transcribe(
  audio: audioFileURL,
  prompt: "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"
)
```

You can also transcribe directly from an `AVAudioPCMBuffer`.
```swift
import AVFoundation

// `format` and `frameCount` come from your audio capture setup.
// AVAudioPCMBuffer's initializer is failable, not throwing.
let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frameCount)!
try model.transcribe(
  buffer: buffer,
  prompt: "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"
)
```
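To transcribe real audio this way, you might fill the buffer from a file using `AVFoundation`'s `AVAudioFile`; a minimal sketch:

```swift
let file = try AVAudioFile(forReading: audioFileURL)
guard let buffer = AVAudioPCMBuffer(
  pcmFormat: file.processingFormat,
  frameCapacity: AVAudioFrameCount(file.length)
) else { fatalError("Unsupported audio format") }
try file.read(into: buffer)
let transcription = try model.transcribe(
  buffer: buffer,
  prompt: "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"
)
```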
You can generate embeddings by passing a `MutableSpan` buffer to `CactusLanguageModel.embeddings`, or you can obtain a `[Float]` directly from the overload that takes no buffer.

```swift
let embeddings: [Float] = try model.embeddings(for: "This is some text")

// OR (Using InlineArray); a distinct name avoids redeclaring `embeddings`.
var embeddingsBuffer = [2048 of Float](repeating: 0)
var span = embeddingsBuffer.mutableSpan
try model.embeddings(for: "This is some text", buffer: &span)
```

Audio models and VLMs also support audio and image embeddings respectively.
```swift
let imageEmbeddings = try model.imageEmbeddings(for: imageURL)
let audioEmbeddings = try model.audioEmbeddings(for: audioFileURL)

// OR (Using InlineArray)
var embeddings = [2048 of Float](repeating: 0)
var span = embeddings.mutableSpan
try model.imageEmbeddings(for: imageURL, buffer: &span)
try model.audioEmbeddings(for: audioFileURL, buffer: &span)
```

You can use embeddings to match similar strings for searching purposes using algorithms such as cosine similarity.
```swift
func cosineSimilarity<C: Collection>(_ a: C, _ b: C) throws -> Double
where C.Element: BinaryFloatingPoint {
  guard a.count == b.count else {
    struct LengthError: Error {}
    throw LengthError()
  }
  var dot = 0.0, normA = 0.0, normB = 0.0
  var ia = a.startIndex, ib = b.startIndex
  while ia != a.endIndex {
    let x = Double(a[ia])
    let y = Double(b[ib])
    dot += x * y
    normA += x * x
    normB += y * y
    ia = a.index(after: ia)
    ib = b.index(after: ib)
  }
  let denom = normA.squareRoot() * normB.squareRoot()
  return denom == 0 ? 0 : dot / denom
}

let fancy = try model.embeddings(for: "This is some fancy text")
let pretty = try model.embeddings(for: "This is some pretty text")
print(try cosineSimilarity(fancy, pretty))
```

RAG models allow you to query a corpus of documents for relevant information. Create a model with a corpus directory containing your documents, then use `ragQuery` to search them.
```swift
let corpusURL = URL.documentsDirectory.appending(path: "swift-corpus")
let modelURL = try await CactusModelsDirectory.shared.modelURL(for: .lfm2_1_2bRag())
let model = try CactusLanguageModel(from: modelURL, corpusDirectoryURL: corpusURL)
let result = try model.ragQuery(query: "What is async/await?")
// [
//   CactusLanguageModel.RAGChunk(
//     score: 0.85,
//     source: "document2.txt",
//     content: "Async and await are fundamental concepts..."
//   )
// ]
for chunk in result.chunks {
  print("Score: \(chunk.score), Source: \(chunk.source)")
  print("Content: \(chunk.content)")
}
```

The RAG query uses hybrid search, combining embeddings with BM25 rankings to find the most relevant document chunks.
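Assuming the RAG model also supports chat completions, you could ground a completion in the chunks retrieved above (the prompt format is illustrative, not a library convention):

```swift
let context = result.chunks.map(\.content).joined(separator: "\n\n")
let completion = try model.chatCompletion(
  messages: [
    .system("Answer using only the following context:\n\(context)"),
    .user("What is async/await?")
  ]
)
```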
On Android, you must create your own `CactusModelsDirectory` instance with a custom base URL.
```swift
import Cactus
import Android
import AndroidNativeAppGlue

@_silgen_name("android_main")
public func android_main(_ app: UnsafeMutablePointer<android_app>) {
  // internalDataPath is a C string, so convert it before building a URL.
  let path = String(cString: app.pointee.activity.pointee.internalDataPath)
  let modelsDirectory = CactusModelsDirectory(
    baseURL: URL(fileURLWithPath: path)
  )
  // ...
}
```

Alternatively, you could export a JNI function that creates the directory, and call that function from Kotlin.
```swift
// In JNI module MyAppSwift
import Cactus

public func setAndroidFilesDirectory(_ path: String) {
  let modelsDirectory = CactusModelsDirectory(
    baseURL: URL(fileURLWithPath: path)
  )
  // Store modelsDirectory somewhere your app can reach it.
}
```

```kotlin
// In Android App
import com.example.myapp.MyAppSwift
import android.os.Bundle
import androidx.activity.ComponentActivity

class MainActivity : ComponentActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        MyAppSwift.setAndroidFilesDirectory(applicationContext.filesDir.absolutePath)
        // ...
    }
}
```

The documentation for releases and main is available here.
You can add Swift Cactus to an Xcode project by adding this repository as a package dependency.
If you want to use Swift Cactus in a SwiftPM project, it's as simple as adding it to your `Package.swift`:

```swift
dependencies: [
  .package(url: "https://github.com/mhayes853/swift-cactus", from: "1.5.0")
]
```

And then adding the product to any target that needs access to the library:
```swift
.product(name: "Cactus", package: "swift-cactus")
```

This library is licensed under the MIT License. See LICENSE for details.