The plug-and-play memory layer for smart, contextual agents
Memlayer adds persistent, intelligent memory to any LLM, enabling agents that recall context across conversations, extract structured knowledge, and surface relevant information when it matters.
<100ms Fast Search • Noise-Aware Memory Gate • Multi-Tier Retrieval Modes • 100% Local • Zero Config
- Features
- Quick Start
- Key Concepts
- Memory Modes
- Search Tiers
- Providers
- Advanced Features
- Examples
- Performance
- Documentation
- Contributing
- Universal LLM Support: Works with OpenAI, Claude, Gemini, and Ollama models
- Plug-and-play: Install with `pip install memlayer` and get started in minutes with minimal setup
- Intelligent Memory Filtering: Three operation modes (LOCAL/ONLINE/LIGHTWEIGHT) automatically filter conversations for important information
- Hybrid Search: Combines vector similarity + knowledge graph traversal for accurate retrieval
- Three Search Tiers: Fast (<100ms), Balanced (<500ms), Deep (<2s) optimized for different use cases
- Knowledge Graph: Automatically extracts entities, relationships, and facts from conversations
- Proactive Reminders: Schedule tasks and get automatic reminders when they're due
- Built-in Observability: Trace every search operation with detailed performance metrics
- Flexible Storage: ChromaDB (vector) + NetworkX (graph) or graph-only mode
- Production Ready: Serverless-friendly with fast cold starts using online mode
pip install memlayer
from memlayer.wrappers.openai import OpenAI
# Initialize with memory capabilities
client = OpenAI(
model="gpt-4.1-mini",
storage_path="./memories",
user_id="user_123"
)
# Store information automatically
client.chat([
{"role": "user", "content": "My name is Alice and I work at TechCorp"}
])
# Retrieve information automatically (no manual prompting needed!)
response = client.chat([
{"role": "user", "content": "Where do I work?"}
])
# Response: "You work at TechCorp."That's it! Memlayer automatically:
- ✅ Filters salient information using ML-based classification
- ✅ Extracts structured facts, entities, and relationships
- ✅ Stores memories in hybrid vector + graph storage
- ✅ Retrieves relevant context for each query
- ✅ Injects memories seamlessly into LLM context
Not all conversation content is worth storing. Memlayer uses salience gates to intelligently filter:
- ✅ Save: Facts, preferences, user info, decisions, relationships
- ❌ Skip: Greetings, acknowledgments, filler words, meta-conversation
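To make the save/skip decision concrete, here is a minimal keyword-style gate in the spirit of the LIGHTWEIGHT mode. The function name and keyword lists are illustrative only and are not Memlayer's internal API; the LOCAL and ONLINE modes replace this kind of heuristic with embedding-based classification.
def looks_salient(message: str) -> bool:
    """Illustrative keyword gate: keep likely facts and decisions, drop chit-chat."""
    skip_markers = ("hi", "hello", "thanks", "ok", "got it")
    keep_markers = ("my name is", "i work at", "i prefer", "remind me", "we decided")
    text = message.lower().strip()
    if any(text.startswith(marker) for marker in skip_markers):
        return False
    return any(marker in text for marker in keep_markers)

print(looks_salient("Thanks, got it!"))                          # False -> skipped
print(looks_salient("My name is Alice and I work at TechCorp"))  # True -> saved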
Memories are stored in two complementary systems:
- Vector Store (ChromaDB): Semantic similarity search for facts
- Knowledge Graph (NetworkX): Entity relationships and structured knowledge
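A rough sketch of how the two layers complement each other, using ChromaDB and NetworkX directly. The collection name, paths, and sample data are made up for illustration; Memlayer manages both stores for you.
import chromadb
import networkx as nx

# Vector store: semantic similarity search over free-text facts
chroma = chromadb.PersistentClient(path="./memories/vectors")
facts = chroma.get_or_create_collection("facts")
facts.add(ids=["f1"], documents=["Alice works at TechCorp"])
hits = facts.query(query_texts=["Where does Alice work?"], n_results=1)
print(hits["documents"])  # [['Alice works at TechCorp']]

# Knowledge graph: explicit entities and relationships
graph = nx.DiGraph()
graph.add_edge("Alice", "TechCorp", predicate="works_at")
graph.add_edge("Alice", "Project Phoenix", predicate="leads")
print(list(graph.neighbors("Alice")))  # ['TechCorp', 'Project Phoenix']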
After each conversation, background threads:
- Extract facts, entities, and relationships using LLM
- Store facts in vector database with embeddings
- Build knowledge graph with entities and relationships
- Index everything for fast retrieval
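A simplified sketch of that background pattern, with a stubbed-out extraction step standing in for the real LLM call; the helper below is a placeholder, not Memlayer's internals.
import threading

def extract_knowledge(text: str) -> dict:
    # Placeholder for the LLM extraction call
    return {"facts": [text], "entities": ["Alice", "TechCorp"], "relationships": []}

def consolidate(conversation: str) -> None:
    knowledge = extract_knowledge(conversation)
    # ...embed and store facts, add entities/relationships to the graph, refresh indexes...
    print(f"stored {len(knowledge['facts'])} fact(s) in the background")

# Runs off the request path, so the user never waits for consolidation
worker = threading.Thread(target=consolidate, args=("Alice works at TechCorp",), daemon=True)
worker.start()
worker.join()  # only needed in this demo so the print appears before exit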
Memlayer offers three modes that control both memory filtering (salience) and storage:
client = OpenAI(salience_mode="local")- Filtering: Sentence-transformers ML model (high accuracy)
- Storage: ChromaDB (vector) + NetworkX (graph)
- Startup: ~10s (model loading)
- Best for: High-volume production, offline apps
- Cost: Free (no API calls)
client = OpenAI(salience_mode="online")- Filtering: OpenAI embeddings API (high accuracy)
- Storage: ChromaDB (vector) + NetworkX (graph)
- Startup: ~2s (no model loading!)
- Best for: Serverless, cloud functions, fast cold starts
- Cost: ~$0.0001 per operation
client = OpenAI(salience_mode="lightweight")- Filtering: Keyword-based (medium accuracy)
- Storage: NetworkX only (no vector storage!)
- Startup: <1s (instant)
- Best for: Prototyping, testing, low-resource environments
- Cost: Free (no embeddings at all)
Performance Comparison:
| Mode | Startup Time | Accuracy | API Cost | Storage |
|------|--------------|----------|----------|---------|
| LOCAL | ~10s | High | Free | Vector+Graph |
| ONLINE | ~2s | High | $0.0001/op | Vector+Graph |
| LIGHTWEIGHT | <1s | Medium | Free | Graph-only |
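One way to put this comparison into practice is to pick the mode from your deployment environment. This is a sketch that reuses the constructor arguments shown above; the MEMLAYER_MODE environment variable is your own configuration knob, not something Memlayer reads itself.
import os
from memlayer.wrappers.openai import OpenAI

# LOCAL suits long-running servers, ONLINE suits serverless cold starts,
# LIGHTWEIGHT suits quick prototypes (MEMLAYER_MODE is a hypothetical env var)
mode = os.getenv("MEMLAYER_MODE", "local")

client = OpenAI(
    model="gpt-4.1-mini",
    storage_path="./memories",
    user_id="user_123",
    salience_mode=mode,
)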
Memlayer provides three search tiers optimized for different latency requirements:
# Automatic - LLM chooses based on query complexity
response = client.chat([{"role": "user", "content": "What's my name?"}])- 2 vector search results
- No graph traversal
- Perfect for: Real-time chat, simple factual recall
# Automatic - handles most queries well
response = client.chat([{"role": "user", "content": "Tell me about my projects"}])- 5 vector search results
- No graph traversal
- Perfect for: General conversation, most use cases
# Explicit request or auto-detected for complex queries
response = client.chat([{
"role": "user",
"content": "Use deep search: Tell me everything about Alice and her relationships"
}])
- 10 vector search results
- Graph traversal enabled (entity extraction + 1-hop relationships)
- Perfect for: Research, "tell me everything", multi-hop reasoning
Memlayer works with all major LLM providers:
from memlayer.wrappers.openai import OpenAI
client = OpenAI(
model="gpt-4.1-mini", # or gpt-4.1, gpt-5, etc.
storage_path="./memories",
user_id="user_123"
)
from memlayer.wrappers.claude import Claude
client = Claude(
model="claude-4-sonnet",
storage_path="./memories",
user_id="user_123"
)
from memlayer.wrappers.gemini import Gemini
client = Gemini(
model="gemini-2.5-flash",
storage_path="./memories",
user_id="user_123"
)
from memlayer.wrappers.ollama import Ollama
client = Ollama(
host="http://localhost:11434",
model="qwen3:1.7b", # or llama3.2, mistral, etc.
storage_path="./memories",
user_id="user_123",
salience_mode="local" # Run 100% offline!
)
All providers share the same API - switch between them seamlessly!
# User schedules a task
client.chat([{
"role": "user",
"content": "Remind me to submit the report next Friday at 9am"
}])
# Later, when the task is due, Memlayer automatically injects it
response = client.chat([{"role": "user", "content": "What should I do today?"}])
# Response includes: "Don't forget to submit the report - it's due today at 9am!"
response = client.chat(messages)
# Inspect search performance
if client.last_trace:
print(f"Search tier: {client.last_trace.events[0].metadata.get('tier')}")
print(f"Total time: {client.last_trace.total_duration_ms}ms")
for event in client.last_trace.events:
print(f" {event.event_type}: {event.duration_ms}ms")# Control memory filtering strictness
client = OpenAI(
salience_threshold=-0.1 # Permissive (saves more)
# salience_threshold=0.0 # Balanced (default)
# salience_threshold=0.1 # Strict (saves less)
)
# Manually extract structured knowledge
kg = client.analyze_and_extract_knowledge(
"Alice leads Project Phoenix in the London office. The project uses Python and React."
)
print(kg["facts"]) # ["Alice leads Project Phoenix", ...]
print(kg["entities"]) # [{"name": "Alice", "type": "Person"}, ...]
print(kg["relationships"]) # [{"subject": "Alice", "predicate": "leads", "object": "Project Phoenix"}]Explore the examples/ directory for comprehensive examples:
# Getting started
python examples/01_basics/getting_started.py
# Try all three search tiers
python examples/02_search_tiers/fast_tier_example.py
python examples/02_search_tiers/balanced_tier_example.py
python examples/02_search_tiers/deep_tier_example.py
# Compare them side-by-side
python examples/02_search_tiers/tier_comparison.py
# Proactive task reminders
python examples/03_features/task_reminders.py
# Knowledge graph visualization
python examples/03_features/test_knowledge_graph.py
# Compare salience modes
python examples/04_benchmarks/compare_salience_modes.py
# Try different LLM providers
python examples/05_providers/openai_example.py
python examples/05_providers/claude_example.py
python examples/05_providers/gemini_example.py
python examples/05_providers/ollama_example.py
See examples/README.md for full documentation.
Real-world startup times from benchmarks:
| Mode | First Use | Savings | Trade-off |
|------|-----------|---------|-----------|
| LIGHTWEIGHT | ~5s | No embeddings | No semantic search |
| ONLINE | ~5s | 5s faster | Small API cost |
| LOCAL | ~10s | No API cost | 11s model loading |
Typical query latencies:
| Tier | Latency | Vector Results | Graph | Use Case |
|------|---------|----------------|-------|----------|
| Fast | 50-150ms | 2 | No | Real-time chat |
| Balanced | 200-600ms | 5 | No | General use |
| Deep | 800-2500ms | 10 | Yes | Research queries |
Background processing (non-blocking):
| Step | Time | Async |
|------|------|-------|
| Salience filtering | ~10ms | Yes |
| Knowledge extraction | ~1-2s | Yes (background thread) |
| Vector storage | ~50ms | Yes |
| Graph storage | ~20ms | Yes |
| Total (non-blocking) | ~0ms | User doesn't wait! |
- Basics Overview - Architecture, components, and how Memlayer works
- Quickstart Guide - Get up and running in 5 minutes
- Streaming Mode - Complete guide to streaming responses
- API Reference - Complete API documentation with all methods and parameters
- Providers Overview - Compare all providers, choose the right one
- Ollama Setup - Run completely offline with local models
- OpenAI - OpenAI configuration
- Claude - Anthropic Claude setup
- Gemini - Google Gemini configuration
- Examples Index - Comprehensive examples by category
- Provider Examples - Provider comparison and usage
The project exposes several runtime/configuration knobs you can tune to match latency, cost, and accuracy trade-offs. Detailed docs for each area live in the docs/ folder:
- docs/tuning/operation_mode.md — Architecture deep dive: how to choose between `online`, `local`, and `lightweight` modes, performance implications, storage composition, and deployment strategies.
- docs/tuning/intervals.md — Scheduler and curation interval configuration (`scheduler_interval_seconds`, `curation_interval_seconds`) and practical guidance.
- docs/tuning/salience_threshold.md — How to adjust `salience_threshold` and the expected behavior.
- docs/services/consolidation.md — Consolidation pipeline internals and how to call it programmatically (including `update_from_text`).
- docs/services/curation.md — How memory curation works, archiving rules, and how to run/stop the curation service.
- docs/storage/chroma.md — ChromaDB notes: metadata types, connection handling, and Windows file-lock guidance.
- docs/storage/networkx.md — Knowledge graph persistence, expected node schemas, and backup/restore tips.
Use these docs when tuning for production; each provides detailed, practical guidance.
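As a starting point, the main knobs can be passed together at construction time. This is a sketch based on the parameter names listed above; the example values and the exact combined constructor signature are assumptions, so verify them against the linked docs.
from memlayer.wrappers.openai import OpenAI

# Example values only; see docs/tuning/*.md for recommended settings
client = OpenAI(
    model="gpt-4.1-mini",
    storage_path="./memories",
    user_id="user_123",
    salience_mode="online",             # "local", "online", or "lightweight"
    salience_threshold=0.0,             # higher = stricter memory filtering
    scheduler_interval_seconds=60,      # assumed: how often due reminders are checked
    curation_interval_seconds=3600,     # assumed: how often curation/archiving runs
)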
# Clone repository
git clone https://github.com/divagr18/memlayer.git
cd memlayer
# Install dependencies
pip install -e .
# Run tests
python -m pytest tests/
# Run examples
python examples/01_basics/getting_started.py
memlayer/
├── memlayer/ # Core library
│ ├── wrappers/ # LLM provider wrappers
│ ├── storage/ # Storage backends (ChromaDB, NetworkX)
│ ├── services.py # Search & consolidation services
│ ├── ml_gate.py # Salience filtering
│ └── embedding_models.py # Embedding model implementations
├── examples/ # Organized examples by category
│ ├── 01_basics/
│ ├── 02_search_tiers/
│ ├── 03_features/
│ ├── 04_benchmarks/
│ └── 05_providers/
├── tests/ # Tests and benchmarks
├── docs/ # Documentation
└── README.md # This file
Contributions are welcome! Here's how you can help:
- Report bugs - Open an issue with reproduction steps
- Suggest features - Share your use case and requirements
- Submit PRs - Fix bugs, add features, improve docs
- Share examples - Show us what you've built!
Please keep PRs focused and include tests for new features.
- Author/Maintainer: Divyansh Agrawal
- Email: [email protected]
- GitHub: divagr18
- Issues: Report bugs or request features via GitHub Issues
For security vulnerabilities, please email directly with SECURITY in the subject line instead of opening a public issue.
MIT License - see LICENSE for details.
- Built with ChromaDB for vector storage
- Uses NetworkX for knowledge graph operations
- Powered by sentence-transformers for local embeddings
- Supports OpenAI, Anthropic, Google Gemini, and Ollama
Made with ❤️ for the AI community
Give your LLMs memory. Try Memlayer today!