Dakkshin/kernalmemory

Kernal

Open-source Memory-as-a-Service - Add persistent memory to your AI applications

MIT License · Docker · FastAPI · Python 3.11+

Kernal is a self-hostable API that provides persistent memory capabilities for AI applications. It handles embedding generation, vector storage, and semantic retrieval through a simple REST API.

Features

  • Simple REST API: Store and retrieve memories with a few HTTP calls
  • Semantic Search: Similarity search over vector embeddings
  • OpenAI Embeddings: Uses OpenAI's embedding models (text-embedding-3-small by default)
  • Multi-Tenant: Secure isolation via a tenant/container/user hierarchy
  • Async Processing: Hybrid sync/async write path; slow embedding jobs are queued to background workers
  • Docker Compose: Complete stack deployment in one command
  • Self-Hosted: Full control over your data and infrastructure
  • Open Source: MIT licensed, community-driven development

Quick Start

Prerequisites

  • Docker & Docker Compose
  • OpenAI API key (for embeddings)
  • 2GB RAM minimum, 4GB recommended

Installation

  1. Clone the repository

    git clone https://github.com/yourusername/kernal.git
    cd kernal
  2. Configure environment

    cp .env.example .env
    nano .env  # Add your OPENAI_API_KEY and other settings
  3. Start all services

    docker-compose up -d
  4. Get your API key

    docker-compose logs memory-service | grep "Default API Key"

That's it! Your Kernal instance is running at http://localhost:8000

Quick Test

# Store a memory
curl -X POST "http://localhost:8000/api/v1/memories/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The user prefers dark mode and wants notifications enabled",
    "metadata": {"preferences": true},
    "tags": ["user-preference", "ui"]
  }'

# Search memories
curl -X POST "http://localhost:8000/api/v1/memories/search" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the UI preferences?",
    "limit": 5,
    "threshold": 0.7
  }'

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   FastAPI App   │───▶│  OpenAI API      │───▶│   Qdrant DB     │
│                 │    │  (Embeddings)    │    │   (Vectors)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐    ┌──────────────────┐
│  PostgreSQL DB  │    │  Redis + RQ      │
│  (Metadata)     │    │  (Jobs/Cache)    │
└─────────────────┘    └──────────────────┘
                                │
                                ▼
                     ┌──────────────────┐
                     │ Embedding Worker │
                     │  (2x Replicas)   │
                     └──────────────────┘

Components:

  • FastAPI: REST API server with async request handling
  • OpenAI API: Generates embeddings with the text-embedding-3-small model (configurable via DEFAULT_EMBEDDING_MODEL)
  • Qdrant: Vector database for semantic similarity search
  • PostgreSQL: Stores metadata, tenants, and API keys
  • Redis: Job queue for background processing
  • Workers: Background embedding generation (scalable)

API Documentation

Authentication

All requests require an API key:

Authorization: Bearer YOUR_API_KEY

Core Endpoints

Create Memory

POST /api/v1/memories/
Content-Type: application/json

{
  "content": "Text to remember",
  "metadata": {"key": "value"},
  "tags": ["tag1", "tag2"],
  "user_id": "user123",
  "container_id": "app:user:user123"
}

Response: a Memory object (sync) or a MemoryJobResponse containing a job ID (async)
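Because of the hybrid write path, a client should be prepared for either response shape. A minimal sketch — the "job_id" and "id" field names are assumptions about the two response types, not a confirmed schema:

```python
def handle_create_response(body: dict) -> str:
    """Summarize a create-memory response, sync or async."""
    if "job_id" in body:  # assumed async shape: MemoryJobResponse
        return f"queued as job {body['job_id']}"
    # assumed sync shape: a Memory object carrying an "id" field
    return f"stored memory {body.get('id', '<unknown>')}"
```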

Search Memories

POST /api/v1/memories/search
Content-Type: application/json

{
  "query": "search text",
  "limit": 10,
  "threshold": 0.7,
  "filters": {"metadata.key": "value"},
  "tags": ["tag1"]
}
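The filters field uses dotted metadata paths, as in the example above. A small helper can assemble the request body; the `metadata.`-prefix convention is taken from the example, everything else is a sketch:

```python
from typing import Dict, List, Optional

def build_search_payload(
    query: str,
    limit: int = 10,
    threshold: float = 0.7,
    metadata_filters: Optional[Dict[str, str]] = None,
    tags: Optional[List[str]] = None,
) -> Dict:
    """Assemble a /memories/search request body."""
    payload: Dict = {"query": query, "limit": limit, "threshold": threshold}
    if metadata_filters:
        # Dotted-path convention from the example: {"metadata.key": "value"}
        payload["filters"] = {f"metadata.{k}": v for k, v in metadata_filters.items()}
    if tags:
        payload["tags"] = tags
    return payload
```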

Get Memory

GET /api/v1/memories/{memory_id}

Update Memory

PUT /api/v1/memories/{memory_id}
Content-Type: application/json

{
  "content": "Updated text",
  "metadata": {"key": "new_value"}
}

Delete Memory

DELETE /api/v1/memories/{memory_id}

Check Job Status

GET /api/v1/memories/jobs/{job_id}/status

Response: Job status, queue position, and ETA
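For async writes, a client typically polls this endpoint until the job settles. A sketch using only the standard library — the terminal status names, BASE_URL, and response fields are assumptions; adjust them to your deployment:

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
TERMINAL_STATES = {"finished", "failed"}  # assumed status values

def fetch_job_status(job_id: str) -> dict:
    """GET the job-status endpoint and decode the JSON body."""
    url = f"{BASE_URL}/memories/jobs/{job_id}/status"
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def wait_for_job(job_id: str, fetch=fetch_job_status,
                 poll_interval: float = 0.5, timeout: float = 30.0) -> dict:
    """Poll until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not settle within {timeout}s")
```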

Management Endpoints

Health Check

GET /api/v1/health

Tenant Info

GET /api/v1/auth/tenant

List API Keys

GET /api/v1/auth/api-keys

Create Additional API Key

POST /api/v1/auth/api-keys
Content-Type: application/json

{
  "name": "production-key",
  "description": "API key for production environment"
}

Full API documentation available at http://localhost:8000/docs (Swagger UI)

Configuration

Key environment variables in .env:

Variable                      Description                      Required    Default
OPENAI_API_KEY                OpenAI API key for embeddings    Yes         -
SECRET_KEY                    JWT secret for authentication    Yes         -
DEPLOYMENT_SALT               HMAC salt for API keys           Yes (prod)  dev-salt
POSTGRES_PASSWORD             Database password                Yes (prod)  password
DEFAULT_EMBEDDING_MODEL       OpenAI embedding model           No          text-embedding-3-small
EMBEDDING_TIMEOUT_THRESHOLD   Sync/async threshold (seconds)   No          0.2
LOG_LEVEL                     Logging verbosity                No          INFO
CORS_ORIGINS                  Allowed CORS origins             No          *

See .env.example for all available options.
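For reference, a minimal .env might look like the following; every value here is a placeholder to replace before deploying:

```shell
# .env - placeholder values only
OPENAI_API_KEY=sk-your-key-here
SECRET_KEY=change-me-to-a-long-random-string
DEPLOYMENT_SALT=change-me-in-production
POSTGRES_PASSWORD=change-me-in-production
DEFAULT_EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_TIMEOUT_THRESHOLD=0.2
LOG_LEVEL=INFO
CORS_ORIGINS=https://app.example.com
```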

Management CLI

Create and manage tenants:

# Create new tenant
python manage.py create-tenant --tenant-id "my-app" --tenant-name "My Application"

# Generate additional API key
python manage.py generate-key --tenant-id "my-app" --key-name "production"

# List all tenants
python manage.py list-tenants

# Test connectivity
python manage.py test-embedding
python manage.py test-qdrant

Deployment

Docker Compose (Recommended)

Complete production-ready stack:

docker-compose up -d

Monitor services:

docker-compose ps
docker-compose logs -f memory-service

Kubernetes

Coming soon - community contributions welcome!

Production Considerations

  1. Security:

    • Set strong SECRET_KEY and DEPLOYMENT_SALT
    • Change default POSTGRES_PASSWORD
    • Use HTTPS reverse proxy (nginx, Caddy)
    • Restrict CORS origins
    • Keep OpenAI API key secure
  2. Scaling:

    • Increase worker replicas: Edit deploy.replicas in docker-compose.yml
    • Scale memory-service: docker-compose up -d --scale memory-service=3
    • Use external PostgreSQL/Redis for high availability
    • Monitor queue depth and adjust workers accordingly
  3. Monitoring:

    • Check /api/v1/health endpoint
    • Monitor Docker container health
    • Track Redis queue depth
    • Set up log aggregation (ELK, Loki)
  4. Backup:

    • PostgreSQL: Regular database dumps
    • Qdrant: Backup /qdrant/storage volume
    • Redis: AOF persistence enabled by default

Development

Setup Development Environment

# Clone repository
git clone https://github.com/yourusername/kernal.git
cd kernal

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install dependencies
pip install -r requirements.txt

# Start infrastructure
docker-compose up -d postgres redis qdrant

# Run API server
uvicorn memory_service.main:app --reload

# Run worker
python memory_service/worker.py

Database Migrations

# Apply migrations
alembic upgrade head

# Create new migration
alembic revision --autogenerate -m "description"

# Rollback
alembic downgrade -1

Code Quality

# Format code
black .
isort .

# Type checking
mypy .

# Run tests (if available)
pytest

Use Cases

  • AI Chatbots: Remember conversation context across sessions
  • Personal Assistants: Store user preferences and history
  • Customer Support: Track customer interactions and history
  • Content Recommendation: Remember user interests and behaviors
  • Knowledge Management: Semantic search across documentation
  • RAG Applications: Retrieval-Augmented Generation pipelines

Integration Examples

Python SDK Pattern

import requests
from typing import List, Dict, Optional

class KernalClient:
    def __init__(self, api_key: str, base_url: str = "http://localhost:8000/api/v1"):
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def create_memory(self, content: str, metadata: Optional[Dict] = None,
                     tags: Optional[List[str]] = None) -> Dict:
        """Create a new memory"""
        response = requests.post(
            f"{self.base_url}/memories/",
            headers=self.headers,
            json={"content": content, "metadata": metadata or {}, "tags": tags or []}
        )
        response.raise_for_status()
        return response.json()

    def search_memories(self, query: str, limit: int = 10,
                       threshold: float = 0.7) -> List[Dict]:
        """Search memories by semantic similarity"""
        response = requests.post(
            f"{self.base_url}/memories/search",
            headers=self.headers,
            json={"query": query, "limit": limit, "threshold": threshold}
        )
        response.raise_for_status()
        return response.json()

# Usage
client = KernalClient("your-api-key")
memory = client.create_memory("User loves Python", tags=["programming"])
results = client.search_memories("programming languages")

RAG (Retrieval-Augmented Generation)

from typing import Dict, Optional

from openai import OpenAI  # openai>=1.0 client interface
from kernal_client import KernalClient  # Use the client above

class RAGPipeline:
    def __init__(self, kernal_api_key: str, openai_api_key: str):
        self.kernal = KernalClient(kernal_api_key)
        self.openai = OpenAI(api_key=openai_api_key)

    def store_knowledge(self, content: str, metadata: Optional[Dict] = None):
        """Store information in the knowledge base"""
        return self.kernal.create_memory(content, metadata=metadata, tags=["knowledge"])

    def query(self, question: str) -> str:
        """Answer a question using the RAG pattern"""
        # Retrieve relevant context
        memories = self.kernal.search_memories(question, limit=5)
        context = "\n".join(m['content'] for m in memories)

        # Generate an answer grounded in the retrieved context
        response = self.openai.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Answer based on the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
            ]
        )
        return response.choices[0].message.content

# Usage
rag = RAGPipeline("kernal-key", "openai-key")
rag.store_knowledge("The company was founded in 2020")
answer = rag.query("When was the company founded?")

Chatbot with Conversation Memory

class MemoryChatbot:
    def __init__(self, kernal_api_key: str):
        self.kernal = KernalClient(kernal_api_key)
        self.conversation_history = []

    def chat(self, user_message: str) -> str:
        """Chat with memory of past conversations"""
        # Store user message
        self.kernal.create_memory(
            f"User: {user_message}",
            metadata={"type": "user_message"},
            tags=["conversation"]
        )

        # Retrieve relevant past conversations
        context = self.kernal.search_memories(user_message, limit=5)
        context_str = "\n".join([m['content'] for m in context])

        # Generate response (integrate with your LLM)
        # response = your_llm.generate(user_message, context=context_str)
        response = f"Response based on context: {context_str[:100]}..."

        # Store bot response
        self.kernal.create_memory(
            f"Bot: {response}",
            metadata={"type": "bot_response"},
            tags=["conversation"]
        )

        return response

# Usage
bot = MemoryChatbot("your-api-key")
print(bot.chat("I love Python programming"))
print(bot.chat("What do I like?"))  # Bot will remember

Node.js / JavaScript

const axios = require('axios');

class KernalClient {
  constructor(apiKey, baseUrl = 'http://localhost:8000/api/v1') {
    this.client = axios.create({
      baseURL: baseUrl,
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      }
    });
  }

  async createMemory(content, metadata = {}, tags = []) {
    const response = await this.client.post('/memories/', {
      content, metadata, tags
    });
    return response.data;
  }

  async searchMemories(query, limit = 10, threshold = 0.7) {
    const response = await this.client.post('/memories/search', {
      query, limit, threshold
    });
    return response.data;
  }

  async getMemory(memoryId) {
    const response = await this.client.get(`/memories/${memoryId}`);
    return response.data;
  }
}

// Usage
(async () => {
  const kernal = new KernalClient('your-api-key');

  const memory = await kernal.createMemory(
    'User prefers TypeScript',
    { category: 'preference' },
    ['programming', 'typescript']
  );
  console.log('Created:', memory.id);

  const results = await kernal.searchMemories('programming preferences');
  results.forEach(m => console.log(`- ${m.content} (${m.score})`));
})();

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Areas needing help:

  • Test coverage
  • Documentation improvements
  • Performance optimization
  • Additional embedding providers
  • Example implementations

License

MIT License - see LICENSE for details.

Star this repo if you find it useful!
