VecturaKit is a Swift-based vector database that provides local vector storage and retrieval for on-device apps.
Inspired by Dripfarm's SVDB, VecturaKit uses MLTensor and swift-embeddings for generating and managing embeddings. It ships with Model2Vec support, using the 32M-parameter potion-retrieval model as the default for fast static embeddings.
The framework offers VecturaKit as the core vector database with pluggable embedding providers. Use SwiftEmbedder for swift-embeddings integration, MLXEmbedder for Apple's MLX framework acceleration, or NLContextualEmbedder for Apple's NaturalLanguage framework with zero external dependencies.
It also includes CLI tools (vectura-cli and vectura-mlx-cli) for easily trying out the package.
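Model2Vec-style static embeddings work by looking up a precomputed vector for each token and mean-pooling the results, which is why they are so fast. A toy sketch of the idea (the 3-dimensional vocabulary below is hypothetical, not the real Model2Vec weights):

```swift
import Foundation

// Toy static-embedding table: each known token maps to a fixed vector.
let vocabulary: [String: [Float]] = [
    "swift": [0.9, 0.1, 0.0],
    "vector": [0.1, 0.8, 0.2],
    "database": [0.0, 0.7, 0.6],
]

// A static embedder tokenizes, looks each token up, and mean-pools.
func staticEmbedding(for text: String) -> [Float] {
    let tokens = text.lowercased().split(separator: " ").map(String.init)
    let vectors = tokens.compactMap { vocabulary[$0] }
    guard !vectors.isEmpty else { return [0, 0, 0] }
    var sum: [Float] = [0, 0, 0]
    for v in vectors {
        for i in 0..<3 { sum[i] += v[i] }
    }
    return sum.map { $0 / Float(vectors.count) }
}

let embedding = staticEmbedding(for: "Swift vector database")
print(embedding)
```

Because there is no neural network to run at query time, embedding a sentence reduces to a handful of dictionary lookups and an average.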
Explore the following books to learn more about AI and iOS development:
- Exploring On-Device AI for Apple Platforms Development
- Exploring AI-Assisted Coding for iOS Development
- Features
- Supported Platforms
- Installation
- Usage
- MLX Integration
- NaturalLanguage Integration
- Command Line Interface
- License
- Contributing
- Support
- Model2Vec Support: Uses the 32M-parameter potion-retrieval Model2Vec model as the default for fast static embeddings.
- Auto-Dimension Detection: Automatically detects embedding dimensions from models.
- On-Device Storage: Stores and manages vector embeddings locally.
- Hybrid Search: Combines vector similarity with BM25 text search for relevant search results (VecturaKit).
- Pluggable Search Engines: Implement custom search algorithms by conforming to the `VecturaSearchEngine` protocol.
- Batch Processing: Indexes documents in parallel for faster data ingestion.
- Persistent Storage: Automatically saves and loads document data, preserving the database state across app sessions.
- Configurable Search: Customizes search behavior with adjustable thresholds, result limits, and hybrid search weights.
- Custom Storage Location: Specifies a custom directory for database storage.
- Custom Storage Provider: Implements custom storage backends (SQLite, Core Data, cloud storage) by conforming to the `VecturaStorage` protocol.
- Memory Management Strategies: Choose between automatic, full-memory, or indexed modes to optimize performance for datasets ranging from thousands to millions of documents. Learn more
- MLX Support: Uses Apple's MLX framework for accelerated embedding generation through `MLXEmbedder`.
- NaturalLanguage Support: Uses Apple's NaturalLanguage framework for contextual embeddings with zero external dependencies through `NLContextualEmbedder`.
- CLI Tools: Includes `vectura-cli` (Swift embeddings) and `vectura-mlx-cli` (MLX embeddings) for database management and testing.
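Hybrid search blends two signals with a weighted sum: cosine similarity captures semantic closeness, while BM25 rewards exact keyword matches. A minimal sketch of how such blending can work, assuming a simple clamp-style normalization (VecturaKit's exact normalization may differ):

```swift
import Foundation

// Cosine similarity between two equal-length vectors.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    let dot = zip(a, b).map(*).reduce(0, +)
    let magA = sqrt(a.map { $0 * $0 }.reduce(0, +))
    let magB = sqrt(b.map { $0 * $0 }.reduce(0, +))
    guard magA > 0, magB > 0 else { return 0 }
    return dot / (magA * magB)
}

// Blend vector and text scores. The role given to bm25NormalizationFactor
// here (squashing raw BM25 into a cosine-comparable range) is an assumption
// for illustration only.
func hybridScore(
    cosine: Float, bm25: Float,
    hybridWeight: Float = 0.5, bm25NormalizationFactor: Float = 10.0
) -> Float {
    let normalizedBM25 = min(bm25 / bm25NormalizationFactor, 1.0)
    return hybridWeight * cosine + (1 - hybridWeight) * normalizedBM25
}

let score = hybridScore(cosine: 0.8, bm25: 5.0)
// 0.5 * 0.8 + 0.5 * 0.5 = 0.65
```

A `hybridWeight` of 1.0 would reduce this to pure vector search, while 0.0 would make it pure BM25 text ranking.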
- macOS 14.0 or later
- iOS 17.0 or later
- tvOS 17.0 or later
- visionOS 1.0 or later
- watchOS 10.0 or later
Swift Package Manager handles the distribution of Swift code and comes built into the Swift compiler.
To integrate VecturaKit into your project using Swift Package Manager, add the following dependency in your Package.swift file:
dependencies: [
.package(url: "https://github.com/rryam/VecturaKit.git", from: "2.3.1"),
],

VecturaKit uses the following Swift packages:
- swift-embeddings: Used in `VecturaKit` for generating text embeddings using various models.
- swift-argument-parser: Used for creating the command-line interface.
- mlx-swift-examples: Provides MLX-based embeddings and vector search capabilities, specifically for `VecturaMLXKit`.
Note: VecturaNLKit has no external dependencies beyond Apple's native NaturalLanguage framework.
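For context, a complete Package.swift wiring the dependency into a target might look like this (the app target name is illustrative, and the product name assumes it matches the `VecturaKit` module name):

```swift
// swift-tools-version:6.0
import PackageDescription

let package = Package(
    name: "MyApp",  // hypothetical app package
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        .package(url: "https://github.com/rryam/VecturaKit.git", from: "2.3.1"),
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // Product name assumed to match the module name.
                .product(name: "VecturaKit", package: "VecturaKit"),
            ]
        ),
    ]
)
```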
Get up and running with VecturaKit in minutes. Here is an example of adding and searching documents:
import VecturaKit
Task {
do {
let config = VecturaConfig(name: "my-db")
let embedder = SwiftEmbedder(modelSource: .default)
let vectorDB = try await VecturaKit(config: config, embedder: embedder)
// Add documents
let ids = try await vectorDB.addDocuments(texts: [
"The quick brown fox jumps over the lazy dog",
"Swift is a powerful programming language"
])
// Search documents
let results = try await vectorDB.search(query: "programming language", numResults: 5)
print("Found \(results.count) results!")
} catch {
print("Error: \(error)")
}
}

import Foundation
import VecturaKit
let config = VecturaConfig(
name: "my-vector-db",
directoryURL: nil, // Optional custom storage location
dimension: nil, // Auto-detect dimension from embedder (recommended)
searchOptions: VecturaConfig.SearchOptions(
defaultNumResults: 10,
minThreshold: 0.7,
hybridWeight: 0.5, // Balance between vector and text search
k1: 1.2, // BM25 parameters
b: 0.75,
bm25NormalizationFactor: 10.0
)
)
// Create an embedder (SwiftEmbedder uses swift-embeddings library)
let embedder = SwiftEmbedder(modelSource: .default)
let vectorDB = try await VecturaKit(config: config, embedder: embedder)

For large-scale datasets (100K+ documents):
let config = VecturaConfig(
name: "my-vector-db",
memoryStrategy: .indexed(candidateMultiplier: 10)
)
let vectorDB = try await VecturaKit(config: config, embedder: embedder)
// Reduced memory footprint with on-demand document loading

💡 Tip: See the Indexed Storage Guide for detailed information on memory strategies and performance optimization for large-scale datasets.
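The idea behind `candidateMultiplier` in the indexed strategy is to pull `numResults × multiplier` cheap candidates from a coarse index and run exact scoring only on that subset. A simplified sketch of that oversample-then-rerank pattern (not VecturaKit's internal implementation):

```swift
import Foundation

// Oversample-then-rerank: a coarse index returns cheap candidates,
// exact scoring runs only on that small subset.
func indexedSearch(
    query: [Float],
    coarseCandidates: (Int) -> [(id: Int, vector: [Float])],
    exactScore: ([Float], [Float]) -> Float,
    numResults: Int,
    candidateMultiplier: Int = 10
) -> [(id: Int, score: Float)] {
    let candidates = coarseCandidates(numResults * candidateMultiplier)
    return candidates
        .map { (id: $0.id, score: exactScore(query, $0.vector)) }
        .sorted { $0.score > $1.score }
        .prefix(numResults)
        .map { $0 }
}

// Usage with a dot-product scorer and an in-memory candidate list:
let candidates: [(id: Int, vector: [Float])] = [
    (1, [1, 0]), (2, [0.5, 0.5]), (3, [0, 1]),
]
let top = indexedSearch(
    query: [1, 0],
    coarseCandidates: { _ in candidates },
    exactScore: { q, v in zip(q, v).map(*).reduce(0, +) },
    numResults: 1
)
```

A larger multiplier improves recall at the cost of more exact-scoring work per query.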
📊 Performance: Check out the Performance Test Results for detailed benchmarking data and recommendations. For a documentation index, see Docs/.
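The `k1` and `b` values in `SearchOptions` are the standard Okapi BM25 parameters: `k1` controls term-frequency saturation and `b` controls document-length normalization. A per-term sketch of the textbook formula:

```swift
import Foundation

// Okapi BM25 score contribution of one term in one document.
func bm25TermScore(
    termFrequency tf: Float,
    documentLength: Float,
    averageDocumentLength: Float,
    inverseDocumentFrequency idf: Float,
    k1: Float = 1.2,
    b: Float = 0.75
) -> Float {
    // b = 0 disables length normalization; b = 1 applies it fully.
    let lengthNorm = 1 - b + b * (documentLength / averageDocumentLength)
    return idf * (tf * (k1 + 1)) / (tf + k1 * lengthNorm)
}

// With tf = 2 in an average-length document, lengthNorm = 1:
// score = idf * (2 * 2.2) / (2 + 1.2) = idf * 1.375
let score = bm25TermScore(
    termFrequency: 2, documentLength: 100,
    averageDocumentLength: 100, inverseDocumentFrequency: 1.0
)
```

Raising `k1` makes repeated occurrences of a term keep adding score for longer before saturating; the defaults above match the config values shown earlier.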
Single document:
let text = "Sample text to be embedded"
let documentId = try await vectorDB.addDocument(
text: text,
id: UUID() // Optional, will be generated if not provided
)

Multiple documents in batch:
let texts = [
"First document text",
"Second document text",
"Third document text"
]
let documentIds = try await vectorDB.addDocuments(
texts: texts,
ids: nil // Optional array of UUIDs
)

Search by text (hybrid search):
let results = try await vectorDB.search(
query: "search query",
numResults: 5, // Optional
threshold: 0.8 // Optional
)
for result in results {
print("Document ID: \(result.id)")
print("Text: \(result.text)")
print("Similarity Score: \(result.score)")
print("Created At: \(result.createdAt)")
}

Search by vector embedding:
// Using array literal
let results = try await vectorDB.search(
query: [0.1, 0.2, 0.3, ...], // Array literal matching config.dimension
numResults: 5, // Optional
threshold: 0.8 // Optional
)
// Or explicitly use SearchQuery enum
let embedding: [Float] = getEmbedding()
let results = try await vectorDB.search(
query: .vector(embedding),
numResults: 5,
threshold: 0.8
)

Update document:
try await vectorDB.updateDocument(
id: documentId,
newText: "Updated text"
)

Delete documents:
try await vectorDB.deleteDocuments(ids: [documentId1, documentId2])

Reset database:
try await vectorDB.reset()

Get document count:
let count = await vectorDB.documentCount
print("Database contains \(count) documents")

VecturaKit allows you to implement your own storage backend by conforming to the VecturaStorage protocol. This is useful for integrating with different storage systems like SQLite, Core Data, or cloud storage.
Define a custom storage provider:
import Foundation
import VecturaKit
final class MyCustomStorageProvider: VecturaStorage {
private var documents: [UUID: VecturaDocument] = [:]
func createStorageDirectoryIfNeeded() async throws {
// Initialize your storage system
}
func loadDocuments() async throws -> [VecturaDocument] {
// Load documents from your storage
return Array(documents.values)
}
func saveDocument(_ document: VecturaDocument) async throws {
// Save document to your storage
documents[document.id] = document
}
func deleteDocument(withID id: UUID) async throws {
// Delete document from your storage
documents.removeValue(forKey: id)
}
func updateDocument(_ document: VecturaDocument) async throws {
// Update document in your storage
documents[document.id] = document
}
func getTotalDocumentCount() async throws -> Int {
// Return total count (optional - protocol provides default implementation)
return documents.count
}
}

Use the custom storage provider:
let config = VecturaConfig(name: "my-db")
let customStorage = MyCustomStorageProvider()
let vectorDB = try await VecturaKit(
config: config,
storageProvider: customStorage
)
// Use vectorDB normally - all storage operations will use your custom provider
let documentId = try await vectorDB.addDocument(text: "Sample text")

VecturaKit supports custom search engine implementations by conforming to the VecturaSearchEngine protocol. This allows you to implement specialized search algorithms (pure vector, pure text, custom hybrid, or other ranking methods).
Define a custom search engine:
import Foundation
import VecturaKit
struct MyCustomSearchEngine: VecturaSearchEngine {
func search(
query: SearchQuery,
storage: VecturaStorage,
options: SearchOptions
) async throws -> [VecturaSearchResult] {
// Load documents from storage
let documents = try await storage.loadDocuments()
// Implement your custom search logic
// This example does a simple exact text match
guard case .text(let queryText) = query else {
return []
}
let results = documents.filter { doc in
doc.text.lowercased().contains(queryText.lowercased())
}.map { doc in
VecturaSearchResult(
id: doc.id,
text: doc.text,
score: 1.0,
createdAt: doc.createdAt
)
}
return Array(results.prefix(options.numResults))
}
func indexDocument(_ document: VecturaDocument) async throws {
// Optional: Update your search engine's internal index
}
func removeDocument(id: UUID) async throws {
// Optional: Remove from your search engine's internal index
}
}

Use the custom search engine:
let config = VecturaConfig(name: "my-db")
let embedder = SwiftEmbedder(modelSource: .default)
let customEngine = MyCustomSearchEngine()
let vectorDB = try await VecturaKit(
config: config,
embedder: embedder,
searchEngine: customEngine
)
// All searches will use your custom search engine
let results = try await vectorDB.search(query: "search query")

VecturaKit supports Apple's MLX framework through the MLXEmbedder for accelerated on-device machine learning performance.
import VecturaKit
import VecturaMLXKit
import MLXEmbedders

let config = VecturaConfig(
name: "my-mlx-vector-db",
dimension: nil // Auto-detect dimension from MLX embedder
)
// Create MLX embedder
let embedder = try await MLXEmbedder(configuration: .nomic_text_v1_5)
let vectorDB = try await VecturaKit(config: config, embedder: embedder)

let texts = [
"First document text",
"Second document text",
"Third document text"
]
let documentIds = try await vectorDB.addDocuments(texts: texts)

let results = try await vectorDB.search(
query: "search query",
numResults: 5, // Optional
threshold: 0.8 // Optional
)
for result in results {
print("Document ID: \(result.id)")
print("Text: \(result.text)")
print("Similarity Score: \(result.score)")
print("Created At: \(result.createdAt)")
}

Update document:
try await vectorDB.updateDocument(
id: documentId,
newText: "Updated text"
)

Delete documents:
try await vectorDB.deleteDocuments(ids: [documentId1, documentId2])

Reset database:
try await vectorDB.reset()

VecturaKit supports Apple's NaturalLanguage framework through the NLContextualEmbedder for contextual embeddings with zero external dependencies.
import VecturaKit
import VecturaNLKit

let config = VecturaConfig(
name: "my-nl-vector-db",
dimension: nil // Auto-detect dimension from NL embedder
)
// Create NLContextualEmbedder
let embedder = try await NLContextualEmbedder(
language: .english
)
let vectorDB = try await VecturaKit(config: config, embedder: embedder)

Available Options:
// Initialize with specific language
let embedder = try await NLContextualEmbedder(
language: .spanish
)
// Get model information
let modelInfo = await embedder.modelInfo
print("Language: \(modelInfo.language)")
if let dimension = modelInfo.dimension {
print("Dimension: \(dimension)")
} else {
print("Dimension: Not yet determined")
}

let texts = [
"Natural language understanding is fascinating",
"Swift makes iOS development enjoyable",
"Machine learning on device preserves privacy"
]
let documentIds = try await vectorDB.addDocuments(texts: texts)

let results = try await vectorDB.search(
query: "iOS programming",
numResults: 5, // Optional
threshold: 0.7 // Optional
)
for result in results {
print("Document ID: \(result.id)")
print("Text: \(result.text)")
print("Similarity Score: \(result.score)")
print("Created At: \(result.createdAt)")
}

Update document:
try await vectorDB.updateDocument(
id: documentId,
newText: "Updated text"
)

Delete documents:
try await vectorDB.deleteDocuments(ids: [documentId1, documentId2])

Reset database:
try await vectorDB.reset()

Key Features:
- Zero External Dependencies: Uses only Apple's native NaturalLanguage framework
- Contextual Embeddings: Considers surrounding context for more accurate semantic understanding
- Privacy-First: All processing happens on-device
- Language Support: Supports multiple languages (English, Spanish, French, German, Italian, Portuguese, and more)
- Auto-Detection: Automatically detects embedding dimensions
Performance Characteristics:
- Speed: Moderate (slower than Model2Vec, comparable to MLX)
- Accuracy: High contextual understanding for supported languages
- Memory: Efficient on-device processing
- Use Cases: Ideal for apps requiring semantic search without external dependencies
Platform Requirements:
- iOS 17.0+ / macOS 14.0+ / tvOS 17.0+ / visionOS 1.0+ / watchOS 10.0+
- NaturalLanguage framework (included with OS)
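NLContextualEmbedder builds on the NLContextualEmbedding API that the NaturalLanguage framework exposes directly. The sketch below shows roughly what driving that framework API on its own looks like; the mean pooling is an illustrative choice, not necessarily what NLContextualEmbedder does internally:

```swift
import NaturalLanguage

// NLContextualEmbedding is the OS-level contextual model. Its assets
// download on demand, so availability must be checked before loading.
do {
    if let embedding = NLContextualEmbedding(language: .english),
       embedding.hasAvailableAssets {
        try embedding.load()
        let sentence = "Swift makes iOS development enjoyable"
        let result = try embedding.embeddingResult(for: sentence, language: .english)

        // The result yields one vector per token; mean-pool them to get a
        // single sentence-level embedding.
        var sum = [Double](repeating: 0, count: embedding.dimension)
        var tokenCount = 0
        result.enumerateTokenVectors(in: sentence.startIndex..<sentence.endIndex) { vector, _ in
            for (i, value) in vector.enumerated() { sum[i] += value }
            tokenCount += 1
            return true  // keep enumerating
        }
        let sentenceEmbedding = sum.map { $0 / Double(max(tokenCount, 1)) }
        print("dimension: \(sentenceEmbedding.count)")
    }
} catch {
    print("Embedding failed: \(error)")
}
```

Because the token vectors depend on surrounding words, the same word gets different vectors in different sentences, which is what distinguishes this from static embeddings.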
VecturaKit includes command-line tools for database management with different embedding backends.
# Add documents (dimension auto-detected from model)
vectura add "First document" "Second document" "Third document" \
--db-name "my-vector-db"
# Search documents
vectura search "search query" \
--db-name "my-vector-db" \
--threshold 0.7 \
--num-results 5
# Update document
vectura update <document-uuid> "Updated text content" \
--db-name "my-vector-db"
# Delete documents
vectura delete <document-uuid-1> <document-uuid-2> \
--db-name "my-vector-db"
# Reset database
vectura reset \
--db-name "my-vector-db"
# Run demo with sample data
vectura mock \
--db-name "my-vector-db" \
--threshold 0.7 \
  --num-results 10

Common options for vectura-cli:
- `--db-name`, `-d`: Database name (default: "vectura-cli-db")
- `--dimension`, `-v`: Vector dimension (auto-detected by default)
- `--threshold`, `-t`: Minimum similarity threshold (default: 0.7)
- `--num-results`, `-n`: Number of results to return (default: 10)
- `--model-id`, `-m`: Model ID for embeddings (default: "minishlab/potion-retrieval-32M")
# Add documents
vectura-mlx add "First document" "Second document" "Third document" --db-name "my-mlx-vector-db"
# Search documents
vectura-mlx search "search query" --db-name "my-mlx-vector-db" --threshold 0.7 --num-results 5
# Update document
vectura-mlx update <document-uuid> "Updated text content" --db-name "my-mlx-vector-db"
# Delete documents
vectura-mlx delete <document-uuid-1> <document-uuid-2> --db-name "my-mlx-vector-db"
# Reset database
vectura-mlx reset --db-name "my-mlx-vector-db"
# Run demo with sample data
vectura-mlx mock --db-name "my-mlx-vector-db"

Options for vectura-mlx-cli:
- `--db-name`, `-d`: Database name (default: "vectura-mlx-cli-db")
- `--threshold`, `-t`: Minimum similarity threshold (default: no threshold)
- `--num-results`, `-n`: Number of results to return (default: 10)
VecturaKit is released under the MIT License. See the LICENSE file for more information. Copyright (c) 2025 Rudrank Riyam.
Contributions are welcome! Please fork the repository and submit a pull request with your improvements.
- Clone the repository
- Open `Package.swift` in Xcode or VS Code
- Run tests to ensure everything works: `swift test`
- Run performance benchmarks (optional): `swift test --filter BenchmarkSuite` (see Performance Tests)
- Make your changes and test them
- Follow SwiftLint rules (run `swiftlint lint`)
- Use Swift 6.0+ features where appropriate
- Maintain backward compatibility when possible
- Document public APIs with DocC comments