Open-source Memory-as-a-Service - Add persistent memory to your AI applications
Kernal is a self-hostable API that provides persistent memory capabilities for AI applications. It handles embedding generation, vector storage, and semantic retrieval through a simple REST API.
Features:
- Simple REST API: Store and retrieve memories with a few HTTP calls
- Semantic Search: AI-powered similarity search using vector embeddings
- OpenAI Embeddings: Uses OpenAI's embedding models for optimal quality
- Multi-Tenant: Secure isolation with tenant/container/user hierarchy
- Async Processing: Hybrid sync/async architecture for optimal performance
- Docker Compose: Complete stack deployment in one command
- Self-Hosted: Full control over your data and infrastructure
- Open Source: MIT licensed, community-driven development
Prerequisites:
- Docker & Docker Compose
- OpenAI API key (for embeddings)
- 2GB RAM minimum, 4GB recommended
1. Clone the repository

   ```bash
   git clone https://github.com/yourusername/kernal.git
   cd kernal
   ```

2. Configure environment

   ```bash
   cp .env.example .env
   nano .env  # Add your OPENAI_API_KEY and other settings
   ```

3. Start all services

   ```bash
   docker-compose up -d
   ```

4. Get your API key

   ```bash
   docker-compose logs memory-service | grep "Default API Key"
   ```
That's it! Your Kernal instance is running at http://localhost:8000
```bash
# Store a memory
curl -X POST "http://localhost:8000/api/v1/memories/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The user prefers dark mode and wants notifications enabled",
    "metadata": {"preferences": true},
    "tags": ["user-preference", "ui"]
  }'

# Search memories
curl -X POST "http://localhost:8000/api/v1/memories/search" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the UI preferences?",
    "limit": 5,
    "threshold": 0.7
  }'
```

Architecture:

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   FastAPI App   │───▶│    OpenAI API    │───▶│    Qdrant DB    │
│                 │     │   (Embeddings)   │     │    (Vectors)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
        │                        │
        ▼                        ▼
┌─────────────────┐     ┌──────────────────┐
│  PostgreSQL DB  │     │    Redis + RQ    │
│   (Metadata)    │     │   (Jobs/Cache)   │
└─────────────────┘     └──────────────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │ Embedding Worker │
                        │  (2x Replicas)   │
                        └──────────────────┘
```
Components:
- FastAPI: REST API server with async request handling
- OpenAI API: Generates embeddings via the `text-embedding-3-small` model
- Qdrant: Vector database for semantic similarity search
- PostgreSQL: Stores metadata, tenants, and API keys
- Redis: Job queue for background processing
- Workers: Background embedding generation (scalable)
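The hybrid sync/async flow boils down to one dispatch decision: embed inline when it will finish quickly, otherwise hand the request to a background worker and return a job ID. The sketch below is illustrative only; the function and the estimation step are assumptions, not the service's actual code, though the 0.2 s default mirrors the `EMBEDDING_TIMEOUT_THRESHOLD` setting described under configuration.

```python
def choose_processing_mode(estimated_embedding_seconds: float,
                           threshold_seconds: float = 0.2) -> str:
    """Return "sync" if the embedding is expected to finish within the
    threshold, otherwise "async" so a background worker handles it.
    (Hypothetical helper; how the service estimates duration is not shown here.)
    """
    return "sync" if estimated_embedding_seconds <= threshold_seconds else "async"

assert choose_processing_mode(0.05) == "sync"   # small payload: respond inline
assert choose_processing_mode(1.5) == "async"   # large payload: queue a job
```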
All requests require an API key:

```
Authorization: Bearer YOUR_API_KEY
```

Create a memory:

```
POST /api/v1/memories/
Content-Type: application/json

{
  "content": "Text to remember",
  "metadata": {"key": "value"},
  "tags": ["tag1", "tag2"],
  "user_id": "user123",
  "container_id": "app:user:user123"
}
```

Response: Returns a Memory object (sync) or a MemoryJobResponse with a job ID (async).
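Because a create call can come back either as a stored Memory or as a MemoryJobResponse, a client may want to branch on the response shape. The field names below (`id`, `job_id`, `status`) are assumptions for illustration, not the exact API schema:

```python
def is_async_response(payload: dict) -> bool:
    """Heuristic: an async submission returns a job reference rather than a
    stored memory. The "job_id" field name is an assumption."""
    return "job_id" in payload

# Illustrative payloads only; check /docs for the real schema
sync_payload = {"id": "mem_123", "content": "Text to remember"}
async_payload = {"job_id": "job_456", "status": "queued"}

assert not is_async_response(sync_payload)
assert is_async_response(async_payload)
```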
Search memories:

```
POST /api/v1/memories/search
Content-Type: application/json

{
  "query": "search text",
  "limit": 10,
  "threshold": 0.7,
  "filters": {"metadata.key": "value"},
  "tags": ["tag1"]
}
```

Get a memory:

```
GET /api/v1/memories/{memory_id}
```

Update a memory:

```
PUT /api/v1/memories/{memory_id}
Content-Type: application/json

{
  "content": "Updated text",
  "metadata": {"key": "new_value"}
}
```

Delete a memory:

```
DELETE /api/v1/memories/{memory_id}
```

Check job status:

```
GET /api/v1/memories/jobs/{job_id}/status
```

Response: Job status, queue position, and ETA.
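One way to consume the job-status endpoint is a small polling helper. This is a hedged sketch: the terminal status names (`finished`, `failed`) and the injected fetcher are assumptions, not the service's documented schema.

```python
import time
from typing import Callable, Dict

def wait_for_job(fetch_status: Callable[[], Dict], poll_interval: float = 1.0,
                 max_polls: int = 60) -> Dict:
    """Poll a job-status fetcher until the job reaches a terminal state.

    `fetch_status` would wrap GET /api/v1/memories/jobs/{job_id}/status;
    the terminal status names used here are assumptions.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") in ("finished", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not complete within the polling budget")

# Usage with a stand-in fetcher that completes on the third poll:
responses = iter([{"status": "queued"}, {"status": "started"}, {"status": "finished"}])
result = wait_for_job(lambda: next(responses), poll_interval=0.0)
assert result["status"] == "finished"
```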
Health and auth:

```
GET /api/v1/health
GET /api/v1/auth/tenant
GET /api/v1/auth/api-keys
```

Create an API key:

```
POST /api/v1/auth/api-keys
Content-Type: application/json

{
  "name": "production-key",
  "description": "API key for production environment"
}
```

Full API documentation is available at http://localhost:8000/docs (Swagger UI).
Key environment variables in .env:
| Variable | Description | Required | Default |
|---|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key for embeddings | Yes | - |
| `SECRET_KEY` | JWT secret for authentication | Yes | - |
| `DEPLOYMENT_SALT` | HMAC salt for API keys | Yes (prod) | `dev-salt` |
| `POSTGRES_PASSWORD` | Database password | Yes (prod) | `password` |
| `DEFAULT_EMBEDDING_MODEL` | OpenAI embedding model | No | `text-embedding-3-small` |
| `EMBEDDING_TIMEOUT_THRESHOLD` | Sync/async threshold (seconds) | No | `0.2` |
| `LOG_LEVEL` | Logging verbosity | No | `INFO` |
| `CORS_ORIGINS` | Allowed CORS origins | No | `*` |
See .env.example for all available options.
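For reference, a minimal `.env` for local development might look like this (all values are placeholders; see the table above and `.env.example` for the full set of options):

```ini
OPENAI_API_KEY=sk-your-key-here
SECRET_KEY=change-me-to-a-long-random-string
DEPLOYMENT_SALT=change-me-too
POSTGRES_PASSWORD=a-strong-password
DEFAULT_EMBEDDING_MODEL=text-embedding-3-small
LOG_LEVEL=INFO
CORS_ORIGINS=http://localhost:3000
```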
Create and manage tenants:
```bash
# Create new tenant
python manage.py create-tenant --tenant-id "my-app" --tenant-name "My Application"

# Generate additional API key
python manage.py generate-key --tenant-id "my-app" --key-name "production"

# List all tenants
python manage.py list-tenants

# Test connectivity
python manage.py test-embedding
python manage.py test-qdrant
```

Complete production-ready stack:

```bash
docker-compose up -d
```

Monitor services:

```bash
docker-compose ps
docker-compose logs -f memory-service
```

Coming soon - community contributions welcome!
- Security:
  - Set strong `SECRET_KEY` and `DEPLOYMENT_SALT`
  - Change the default `POSTGRES_PASSWORD`
  - Use an HTTPS reverse proxy (nginx, Caddy)
  - Restrict CORS origins
  - Keep your OpenAI API key secure
- Scaling:
  - Increase worker replicas: edit `deploy.replicas` in `docker-compose.yml`
  - Scale memory-service: `docker-compose up -d --scale memory-service=3`
  - Use external PostgreSQL/Redis for high availability
  - Monitor queue depth and adjust workers accordingly
- Monitoring:
  - Check the `/api/v1/health` endpoint
  - Monitor Docker container health
  - Track Redis queue depth
  - Set up log aggregation (ELK, Loki)
- Backup:
  - PostgreSQL: regular database dumps
  - Qdrant: back up the `/qdrant/storage` volume
  - Redis: AOF persistence enabled by default
```bash
# Clone repository
git clone https://github.com/yourusername/kernal.git
cd kernal

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install dependencies
pip install -r requirements.txt

# Start infrastructure
docker-compose up -d postgres redis qdrant

# Run API server
uvicorn memory_service.main:app --reload

# Run worker
python memory_service/worker.py
```

```bash
# Apply migrations
alembic upgrade head

# Create new migration
alembic revision --autogenerate -m "description"

# Rollback
alembic downgrade -1
```

```bash
# Format code
black .
isort .

# Type checking
mypy .

# Run tests (if available)
pytest
```

- AI Chatbots: Remember conversation context across sessions
- Personal Assistants: Store user preferences and history
- Customer Support: Track customer interactions and history
- Content Recommendation: Remember user interests and behaviors
- Knowledge Management: Semantic search across documentation
- RAG Applications: Retrieval-Augmented Generation pipelines
Python client:

```python
import requests
from typing import List, Dict, Optional

class KernalClient:
    def __init__(self, api_key: str, base_url: str = "http://localhost:8000/api/v1"):
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def create_memory(self, content: str, metadata: Optional[Dict] = None,
                      tags: Optional[List[str]] = None) -> Dict:
        """Create a new memory"""
        response = requests.post(
            f"{self.base_url}/memories/",
            headers=self.headers,
            json={"content": content, "metadata": metadata or {}, "tags": tags or []}
        )
        response.raise_for_status()
        return response.json()

    def search_memories(self, query: str, limit: int = 10,
                        threshold: float = 0.7) -> List[Dict]:
        """Search memories by semantic similarity"""
        response = requests.post(
            f"{self.base_url}/memories/search",
            headers=self.headers,
            json={"query": query, "limit": limit, "threshold": threshold}
        )
        response.raise_for_status()
        return response.json()

# Usage
client = KernalClient("your-api-key")
memory = client.create_memory("User loves Python", tags=["programming"])
results = client.search_memories("programming languages")
```

RAG pipeline:

```python
from typing import Dict, Optional

from openai import OpenAI

from kernal_client import KernalClient  # Use the client above

class RAGPipeline:
    def __init__(self, kernal_api_key: str, openai_api_key: str):
        self.kernal = KernalClient(kernal_api_key)
        self.openai = OpenAI(api_key=openai_api_key)

    def store_knowledge(self, content: str, metadata: Optional[Dict] = None):
        """Store information in the knowledge base"""
        return self.kernal.create_memory(content, metadata=metadata, tags=["knowledge"])

    def query(self, question: str) -> str:
        """Answer a question using the RAG pattern"""
        # Retrieve relevant context
        memories = self.kernal.search_memories(question, limit=5)
        context = "\n".join(m["content"] for m in memories)

        # Generate answer with context
        response = self.openai.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Answer based on the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
            ]
        )
        return response.choices[0].message.content

# Usage
rag = RAGPipeline("kernal-key", "openai-key")
rag.store_knowledge("The company was founded in 2020")
answer = rag.query("When was the company founded?")
```

Chatbot with memory:

```python
from kernal_client import KernalClient  # Use the client above

class MemoryChatbot:
    def __init__(self, kernal_api_key: str):
        self.kernal = KernalClient(kernal_api_key)
        self.conversation_history = []

    def chat(self, user_message: str) -> str:
        """Chat with memory of past conversations"""
        # Store user message
        self.kernal.create_memory(
            f"User: {user_message}",
            metadata={"type": "user_message"},
            tags=["conversation"]
        )

        # Retrieve relevant past conversations
        context = self.kernal.search_memories(user_message, limit=5)
        context_str = "\n".join(m["content"] for m in context)

        # Generate response (integrate with your LLM)
        # response = your_llm.generate(user_message, context=context_str)
        response = f"Response based on context: {context_str[:100]}..."

        # Store bot response
        self.kernal.create_memory(
            f"Bot: {response}",
            metadata={"type": "bot_response"},
            tags=["conversation"]
        )
        return response

# Usage
bot = MemoryChatbot("your-api-key")
print(bot.chat("I love Python programming"))
print(bot.chat("What do I like?"))  # Bot will remember
```

JavaScript (Node.js) client:

```javascript
const axios = require('axios');

class KernalClient {
  constructor(apiKey, baseUrl = 'http://localhost:8000/api/v1') {
    this.client = axios.create({
      baseURL: baseUrl,
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      }
    });
  }

  async createMemory(content, metadata = {}, tags = []) {
    const response = await this.client.post('/memories/', { content, metadata, tags });
    return response.data;
  }

  async searchMemories(query, limit = 10, threshold = 0.7) {
    const response = await this.client.post('/memories/search', { query, limit, threshold });
    return response.data;
  }

  async getMemory(memoryId) {
    const response = await this.client.get(`/memories/${memoryId}`);
    return response.data;
  }
}

// Usage
(async () => {
  const kernal = new KernalClient('your-api-key');

  const memory = await kernal.createMemory(
    'User prefers TypeScript',
    { category: 'preference' },
    ['programming', 'typescript']
  );
  console.log('Created:', memory.id);

  const results = await kernal.searchMemories('programming preferences');
  results.forEach(m => console.log(`- ${m.content} (${m.score})`));
})();
```

We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas needing help:
- Test coverage
- Documentation improvements
- Performance optimization
- Additional embedding providers
- Example implementations
MIT License - see LICENSE for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: See CLAUDE.md for architecture details
Built with:
- FastAPI - Web framework
- Qdrant - Vector database
- OpenAI - Embedding models
- PostgreSQL - Metadata storage
- Redis - Job queue
- Python-RQ - Background workers
Star this repo if you find it useful! ⭐