Open-source Memory-as-a-Service - Add persistent memory to your AI applications
Kernal is a self-hostable API that provides persistent memory capabilities for AI applications. It handles embedding generation, vector storage, and semantic retrieval through a simple REST API.
Features:
- Simple REST API: Store and retrieve memories with a few HTTP calls
- Semantic Search: AI-powered similarity search using vector embeddings
- OpenAI Embeddings: Uses OpenAI's embedding models for optimal quality
- Multi-Tenant: Secure isolation with tenant/container/user hierarchy
- Async Processing: Hybrid sync/async architecture for optimal performance
- Docker Compose: Complete stack deployment in one command
- Self-Hosted: Full control over your data and infrastructure
- Open Source: MIT licensed, community-driven development
Prerequisites:
- Docker & Docker Compose
- OpenAI API key (for embeddings)
- 2GB RAM minimum, 4GB recommended
1. Clone the repository

   ```bash
   git clone https://github.com/yourusername/kernal.git
   cd kernal
   ```

2. Configure environment

   ```bash
   cp .env.example .env
   nano .env  # Add your OPENAI_API_KEY and other settings
   ```

3. Start all services

   ```bash
   docker-compose up -d
   ```

4. Get your API key

   ```bash
   docker-compose logs memory-service | grep "Default API Key"
   ```
That's it! Your Kernal instance is running at http://localhost:8000
```bash
# Store a memory
curl -X POST "http://localhost:8000/api/v1/memories/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The user prefers dark mode and wants notifications enabled",
    "metadata": {"preferences": true},
    "tags": ["user-preference", "ui"]
  }'

# Search memories
curl -X POST "http://localhost:8000/api/v1/memories/search" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the UI preferences?",
    "limit": 5,
    "threshold": 0.7
  }'
```

Architecture:

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   FastAPI App   │───▶│    OpenAI API    │───▶│    Qdrant DB    │
│                 │     │   (Embeddings)   │     │    (Vectors)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
        │                        │
        ▼                        ▼
┌─────────────────┐     ┌──────────────────┐
│  PostgreSQL DB  │     │    Redis + RQ    │
│   (Metadata)    │     │   (Jobs/Cache)   │
└─────────────────┘     └──────────────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │ Embedding Worker │
                        │  (2x Replicas)   │
                        └──────────────────┘
```
Components:
- FastAPI: REST API server with async request handling
- OpenAI API: Generates embeddings via the `text-embedding-3-small` model
- Qdrant: Vector database for semantic similarity search
- PostgreSQL: Stores metadata, tenants, and API keys
- Redis: Job queue for background processing
- Workers: Background embedding generation (scalable)
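The hybrid sync/async flow boils down to one dispatch decision: embed inline when it will finish quickly, otherwise hand the request to a background worker and return a job ID. The sketch below is illustrative only; the function and the estimation step are assumptions, not the service's actual code, though the 0.2 s default mirrors the `EMBEDDING_TIMEOUT_THRESHOLD` setting described under configuration.

```python
def choose_processing_mode(estimated_embedding_seconds: float,
                           threshold_seconds: float = 0.2) -> str:
    """Return "sync" if the embedding is expected to finish within the
    threshold, otherwise "async" so a background worker handles it.
    (Hypothetical helper; how the service estimates duration is not shown here.)
    """
    return "sync" if estimated_embedding_seconds <= threshold_seconds else "async"

assert choose_processing_mode(0.05) == "sync"   # small payload: respond inline
assert choose_processing_mode(1.5) == "async"   # large payload: queue a job
```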
All requests require an API key:

```
Authorization: Bearer YOUR_API_KEY
```

Create a memory:

```
POST /api/v1/memories/
Content-Type: application/json

{
  "content": "Text to remember",
  "metadata": {"key": "value"},
  "tags": ["tag1", "tag2"],
  "user_id": "user123",
  "container_id": "app:user:user123"
}
```

Response: Returns a Memory object (sync) or a MemoryJobResponse with a job ID (async).
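Because a create call can come back either as a stored Memory or as a MemoryJobResponse, a client may want to branch on the response shape. The field names below (`id`, `job_id`, `status`) are assumptions for illustration, not the exact API schema:

```python
def is_async_response(payload: dict) -> bool:
    """Heuristic: an async submission returns a job reference rather than a
    stored memory. The "job_id" field name is an assumption."""
    return "job_id" in payload

# Illustrative payloads only; check /docs for the real schema
sync_payload = {"id": "mem_123", "content": "Text to remember"}
async_payload = {"job_id": "job_456", "status": "queued"}

assert not is_async_response(sync_payload)
assert is_async_response(async_payload)
```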
Search memories:

```
POST /api/v1/memories/search
Content-Type: application/json

{
  "query": "search text",
  "limit": 10,
  "threshold": 0.7,
  "filters": {"metadata.key": "value"},
  "tags": ["tag1"]
}
```

Get a memory:

```
GET /api/v1/memories/{memory_id}
```

Update a memory:

```
PUT /api/v1/memories/{memory_id}
Content-Type: application/json

{
  "content": "Updated text",
  "metadata": {"key": "new_value"}
}
```

Delete a memory:

```
DELETE /api/v1/memories/{memory_id}
```

Check job status:

```
GET /api/v1/memories/jobs/{job_id}/status
```

Response: Job status, queue position, and ETA.
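One way to consume the job-status endpoint is a small polling helper. This is a hedged sketch: the terminal status names (`finished`, `failed`) and the injected fetcher are assumptions, not the service's documented schema.

```python
import time
from typing import Callable, Dict

def wait_for_job(fetch_status: Callable[[], Dict], poll_interval: float = 1.0,
                 max_polls: int = 60) -> Dict:
    """Poll a job-status fetcher until the job reaches a terminal state.

    `fetch_status` would wrap GET /api/v1/memories/jobs/{job_id}/status;
    the terminal status names used here are assumptions.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") in ("finished", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not complete within the polling budget")

# Usage with a stand-in fetcher that completes on the third poll:
responses = iter([{"status": "queued"}, {"status": "started"}, {"status": "finished"}])
result = wait_for_job(lambda: next(responses), poll_interval=0.0)
assert result["status"] == "finished"
```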
Health and auth:

```
GET /api/v1/health
GET /api/v1/auth/tenant
GET /api/v1/auth/api-keys
```

Create an API key:

```
POST /api/v1/auth/api-keys
Content-Type: application/json

{
  "name": "production-key",
  "description": "API key for production environment"
}
```

Full API documentation is available at http://localhost:8000/docs (Swagger UI).
Key environment variables in .env:
| Variable | Description | Required | Default |
|---|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key for embeddings | Yes | - |
| `SECRET_KEY` | JWT secret for authentication | Yes | - |
| `DEPLOYMENT_SALT` | HMAC salt for API keys | Yes (prod) | `dev-salt` |
| `POSTGRES_PASSWORD` | Database password | Yes (prod) | `password` |
| `DEFAULT_EMBEDDING_MODEL` | OpenAI embedding model | No | `text-embedding-3-small` |
| `EMBEDDING_TIMEOUT_THRESHOLD` | Sync/async threshold (seconds) | No | `0.2` |
| `LOG_LEVEL` | Logging verbosity | No | `INFO` |
| `CORS_ORIGINS` | Allowed CORS origins | No | `*` |
See .env.example for all available options.
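For reference, a minimal `.env` for local development might look like this (all values are placeholders; see the table above and `.env.example` for the full set of options):

```ini
OPENAI_API_KEY=sk-your-key-here
SECRET_KEY=change-me-to-a-long-random-string
DEPLOYMENT_SALT=change-me-too
POSTGRES_PASSWORD=a-strong-password
DEFAULT_EMBEDDING_MODEL=text-embedding-3-small
LOG_LEVEL=INFO
CORS_ORIGINS=http://localhost:3000
```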
Create and manage tenants:
```bash
# Create new tenant
python manage.py create-tenant --tenant-id "my-app" --tenant-name "My Application"

# Generate additional API key
python manage.py generate-key --tenant-id "my-app" --key-name "production"

# List all tenants
python manage.py list-tenants

# Test connectivity
python manage.py test-embedding
python manage.py test-qdrant
```

Complete production-ready stack:

```bash
docker-compose up -d
```

Monitor services:

```bash
docker-compose ps
docker-compose logs -f memory-service
```

Coming soon - community contributions welcome!
- Security:
  - Set strong `SECRET_KEY` and `DEPLOYMENT_SALT`
  - Change the default `POSTGRES_PASSWORD`
  - Use an HTTPS reverse proxy (nginx, Caddy)
  - Restrict CORS origins
  - Keep your OpenAI API key secure
- Scaling:
  - Increase worker replicas: edit `deploy.replicas` in `docker-compose.yml`
  - Scale memory-service: `docker-compose up -d --scale memory-service=3`
  - Use external PostgreSQL/Redis for high availability
  - Monitor queue depth and adjust workers accordingly
- Monitoring:
  - Check the `/api/v1/health` endpoint
  - Monitor Docker container health
  - Track Redis queue depth
  - Set up log aggregation (ELK, Loki)
- Backup:
  - PostgreSQL: regular database dumps
  - Qdrant: back up the `/qdrant/storage` volume
  - Redis: AOF persistence enabled by default
```bash
# Clone repository
git clone https://github.com/yourusername/kernal.git
cd kernal

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install dependencies
pip install -r requirements.txt

# Start infrastructure
docker-compose up -d postgres redis qdrant

# Run API server
uvicorn memory_service.main:app --reload

# Run worker
python memory_service/worker.py
```

```bash
# Apply migrations
alembic upgrade head

# Create new migration
alembic revision --autogenerate -m "description"

# Rollback
alembic downgrade -1
```

```bash
# Format code
black .
isort .

# Type checking
mypy .

# Run tests (if available)
pytest
```

- AI Chatbots: Remember conversation context across sessions
- Personal Assistants: Store user preferences and history
- Customer Support: Track customer interactions and history
- Content Recommendation: Remember user interests and behaviors
- Knowledge Management: Semantic search across documentation
- RAG Applications: Retrieval-Augmented Generation pipelines
Python client:

```python
import requests
from typing import List, Dict, Optional

class KernalClient:
    def __init__(self, api_key: str, base_url: str = "http://localhost:8000/api/v1"):
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def create_memory(self, content: str, metadata: Optional[Dict] = None,
                      tags: Optional[List[str]] = None) -> Dict:
        """Create a new memory"""
        response = requests.post(
            f"{self.base_url}/memories/",
            headers=self.headers,
            json={"content": content, "metadata": metadata or {}, "tags": tags or []}
        )
        response.raise_for_status()
        return response.json()

    def search_memories(self, query: str, limit: int = 10,
                        threshold: float = 0.7) -> List[Dict]:
        """Search memories by semantic similarity"""
        response = requests.post(
            f"{self.base_url}/memories/search",
            headers=self.headers,
            json={"query": query, "limit": limit, "threshold": threshold}
        )
        response.raise_for_status()
        return response.json()

# Usage
client = KernalClient("your-api-key")
memory = client.create_memory("User loves Python", tags=["programming"])
results = client.search_memories("programming languages")
```

RAG pipeline:

```python
from typing import Dict, Optional

from openai import OpenAI

from kernal_client import KernalClient  # Use the client above

class RAGPipeline:
    def __init__(self, kernal_api_key: str, openai_api_key: str):
        self.kernal = KernalClient(kernal_api_key)
        self.openai = OpenAI(api_key=openai_api_key)

    def store_knowledge(self, content: str, metadata: Optional[Dict] = None):
        """Store information in the knowledge base"""
        return self.kernal.create_memory(content, metadata=metadata, tags=["knowledge"])

    def query(self, question: str) -> str:
        """Answer a question using the RAG pattern"""
        # Retrieve relevant context
        memories = self.kernal.search_memories(question, limit=5)
        context = "\n".join(m["content"] for m in memories)

        # Generate answer with context
        response = self.openai.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Answer based on the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
            ]
        )
        return response.choices[0].message.content

# Usage
rag = RAGPipeline("kernal-key", "openai-key")
rag.store_knowledge("The company was founded in 2020")
answer = rag.query("When was the company founded?")
```

Chatbot with memory:

```python
from kernal_client import KernalClient  # Use the client above

class MemoryChatbot:
    def __init__(self, kernal_api_key: str):
        self.kernal = KernalClient(kernal_api_key)
        self.conversation_history = []

    def chat(self, user_message: str) -> str:
        """Chat with memory of past conversations"""
        # Store user message
        self.kernal.create_memory(
            f"User: {user_message}",
            metadata={"type": "user_message"},
            tags=["conversation"]
        )

        # Retrieve relevant past conversations
        context = self.kernal.search_memories(user_message, limit=5)
        context_str = "\n".join(m["content"] for m in context)

        # Generate response (integrate with your LLM)
        # response = your_llm.generate(user_message, context=context_str)
        response = f"Response based on context: {context_str[:100]}..."

        # Store bot response
        self.kernal.create_memory(
            f"Bot: {response}",
            metadata={"type": "bot_response"},
            tags=["conversation"]
        )
        return response

# Usage
bot = MemoryChatbot("your-api-key")
print(bot.chat("I love Python programming"))
print(bot.chat("What do I like?"))  # Bot will remember
```

JavaScript (Node.js) client:

```javascript
const axios = require('axios');

class KernalClient {
  constructor(apiKey, baseUrl = 'http://localhost:8000/api/v1') {
    this.client = axios.create({
      baseURL: baseUrl,
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      }
    });
  }

  async createMemory(content, metadata = {}, tags = []) {
    const response = await this.client.post('/memories/', { content, metadata, tags });
    return response.data;
  }

  async searchMemories(query, limit = 10, threshold = 0.7) {
    const response = await this.client.post('/memories/search', { query, limit, threshold });
    return response.data;
  }

  async getMemory(memoryId) {
    const response = await this.client.get(`/memories/${memoryId}`);
    return response.data;
  }
}

// Usage
(async () => {
  const kernal = new KernalClient('your-api-key');

  const memory = await kernal.createMemory(
    'User prefers TypeScript',
    { category: 'preference' },
    ['programming', 'typescript']
  );
  console.log('Created:', memory.id);

  const results = await kernal.searchMemories('programming preferences');
  results.forEach(m => console.log(`- ${m.content} (${m.score})`));
})();
```

We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas needing help:
- Test coverage
- Documentation improvements
- Performance optimization
- Additional embedding providers
- Example implementations
MIT License - see LICENSE for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: See CLAUDE.md for architecture details
Built with:
- FastAPI - Web framework
- Qdrant - Vector database
- OpenAI - Embedding models
- PostgreSQL - Metadata storage
- Redis - Job queue
- Python-RQ - Background workers
Star this repo if you find it useful! ⭐