Build AI applications, agents, and data pipelines with the Memvid Python SDK. Native Rust bindings deliver high performance with a Pythonic API.
Installation
pip install memvid-sdk
Requirements: Python 3.8+ on macOS, Linux, or Windows. Native bindings are included, so the core SDK needs no extra dependencies.
create() will OVERWRITE existing files without warning!
| Function | Purpose | If File Exists | Parameter Order |
|---|---|---|---|
| create(path) | Create new .mv2 file | DELETES all data | path first (kind is keyword-only) |
| use(kind, path) | Open existing .mv2 file | Preserves data | kind first, then path |
Always check whether the file exists before choosing between create() and use(); the Quick Start below shows the pattern.
Quick Start
from memvid_sdk import create, use
import os
path = 'knowledge.mv2'
# CRITICAL: Check if file exists to avoid data loss!
if os.path.exists(path):
    mem = use('basic', path)  # Open existing file (kind first!)
else:
    mem = create(path)  # Create new file (path first!)
# Lexical search (BM25) is enabled by default - no need to call enable_lex()
# Add documents
mem.put(
title='Meeting Notes',
label='notes',
metadata={'source': 'slack'},
text='Alice mentioned she works at Anthropic...'
)
# Search works immediately
results = mem.find('who works at AI companies?', k=5, mode='lex')
print(results['hits'])
# Ask questions
answer = mem.ask('What does Alice do?', k=5, mode='lex')
print(answer['answer'])
# Seal when done (commits changes)
mem.seal()
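The existence check in the Quick Start can be wrapped in a small helper so that every call site picks the right function and argument order. This is a minimal sketch, not part of the SDK: `open_or_create` is a hypothetical name, and `create_fn` / `use_fn` are assumed to be `memvid_sdk.create` and `memvid_sdk.use`, passed in as arguments so the helper can be tested without the SDK installed.

```python
import os
from typing import Any, Callable


def open_or_create(
    path: str,
    create_fn: Callable[..., Any],
    use_fn: Callable[..., Any],
    kind: str = "basic",
) -> Any:
    """Open `path` if it already exists, otherwise create it.

    Hypothetical helper: `create_fn` and `use_fn` are assumed to be
    memvid_sdk.create and memvid_sdk.use.
    """
    if os.path.exists(path):
        # Existing file: use() takes the adapter kind first, then the path.
        return use_fn(kind, path)
    # New file: create() takes the path first.
    return create_fn(path)
```

With the SDK installed, usage would look like `mem = open_or_create('knowledge.mv2', create, use)`. The check and the open are two separate steps, so this does not protect against another process creating the file in between.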
API Reference
| Category | Methods | Description |
|---|---|---|
| File Operations | create, use, close | Create, open, close memory files |
| Data Ingestion | put, put_many, put_file, put_files | Add documents with embeddings |
| Search | find, ask, timeline | Query your memory |
| Corrections | correct, correct_many | Store ground truth with retrieval boost |
| Memory Cards | memories, state, enrich, add_memory_cards | Structured fact extraction |
| Tables | put_pdf_tables, list_tables, get_table | PDF table extraction |
| Sessions | session_start, session_end, session_replay | Time-travel debugging |
| Tickets | sync_tickets, current_ticket, get_capacity | Capacity management |
| Security | lock, unlock, lock_who, lock_nudge | Encryption and access control |
| Utilities | verify, doctor | Maintenance and utilities |
Context Manager
import memvid_sdk as memvid
# Automatically closes when done
with memvid.use('basic', 'memory.mv2') as mem:
    mem.put('Doc', 'test', {}, text='Content')
    results = mem.find('query')
Core Functions
File Operations
from memvid_sdk import create, use, lock, unlock, info
# Create new memory file
mem = create('project.mv2')
# With options
mem = create(
'project.mv2',
enable_vec=True, # Enable vector index
enable_lex=True, # Enable lexical index
memory_id='mem_abc', # Bind to dashboard
api_key='mv_live_...' # Dashboard API key
)
# Open existing memory with adapter
mem = use('basic', 'project.mv2', mode='open')
# Available adapters
mem = use('langchain', 'file.mv2')
mem = use('llamaindex', 'file.mv2')
mem = use('crewai', 'file.mv2')
mem = use('autogen', 'file.mv2')
mem = use('haystack', 'file.mv2')
mem = use('langgraph', 'file.mv2')
mem = use('semantic-kernel', 'file.mv2')
mem = use('openai', 'file.mv2')
mem = use('google-adk', 'file.mv2')
# Get SDK info
sdk_info = info()
# Verify file integrity
result = mem.verify(deep=True)
# Repair and optimize
result = mem.doctor(
rebuild_time_index=True,
rebuild_vec_index=True,
vacuum=True
)
Context Manager Support
from memvid_sdk import create
# Automatically closes when done
with create('project.mv2') as mem:
    mem.put('Title', 'label', {}, text='Content')
    results = mem.find('query')
Data Ingestion
put() - Add Single Document
mem.put(
'Document Title', # title (required, positional)
'label', # label (required, positional)
{}, # metadata dict (required, can be {})
# Content (pass exactly one of text= or file=)
text='Document content...',
# file='/path/to/document.pdf',
# Optional
uri='mv2://docs/intro',
tags=['api', 'v2'],
labels=['public', 'reviewed'],
kind='markdown',
track='documentation',
# Embeddings
enable_embedding=True,
embedding_model='bge-small', # or 'openai', 'nomic', etc.
vector_compression=True, # 16x compression with PQ
# Behavior
auto_tag=True, # Auto-generate tags
extract_dates=True # Extract date mentions
)
put_many() - Batch Ingestion
docs = [
{'title': 'Doc 1', 'label': 'kb', 'text': 'First document'},
{'title': 'Doc 2', 'label': 'kb', 'text': 'Second document'},
{'title': 'Doc 3', 'label': 'kb', 'text': 'Third document'}
]
frame_ids = mem.put_many(
docs,
embedder=openai_embeddings, # Custom embedder
opts={
'compression_level': 3,
'enable_embedding': True,
'embedding_model': 'bge-small'
}
)
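For large collections it can be convenient to feed put_many() in fixed-size batches rather than one huge list. The sketch below shows only the batching step; `batched` is a hypothetical helper, and whether put_many() benefits from (or needs) batching is an assumption, not something the SDK documentation states.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical usage, assuming put_many() returns a list of frame IDs:
# frame_ids = []
# for batch in batched(docs, 500):
#     frame_ids.extend(mem.put_many(batch))
```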
put_file() - Document Parsing
Ingest documents directly from files. Supports PDF, DOCX, XLSX, PPTX, and more. The SDK automatically extracts text content and creates searchable frames.
# Single file ingestion
frames = mem.put_file('/path/to/report.pdf')
print(f'Ingested {len(frames)} frames from PDF')
# With options
frames = mem.put_file(
'/path/to/presentation.pptx',
chunk_size=1000, # Characters per chunk
chunk_overlap=200, # Overlap between chunks
enable_embedding=True, # Generate embeddings
embedding_model='bge-small'
)
# Excel/XLSX files
frames = mem.put_file('/path/to/data.xlsx')
# Each sheet becomes searchable content
# Word documents
frames = mem.put_file('/path/to/document.docx')
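To make the chunk_size and chunk_overlap options concrete, here is a minimal sketch of character-based chunking with overlap. It illustrates the general technique only; the SDK's actual chunker may split differently (for example on page or sentence boundaries), and `chunk_text` is a hypothetical function.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split `text` into chunks of up to `chunk_size` characters.

    Each chunk starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring chunks share `chunk_overlap` characters.
    Illustration only; not the SDK's implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

With the values from the example above (1000 and 200), a 2,200-character document yields three chunks of 1000, 1000, and 600 characters.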
put_files() - Batch Document Ingestion
Ingest multiple documents at once:
files = [
'/path/to/report.pdf',
'/path/to/slides.pptx',
'/path/to/data.xlsx',
'/path/to/notes.docx'
]
all_frames = mem.put_files(
files,
chunk_size=1000,
enable_embedding=True
)
print(f'Total frames: {len(all_frames)}')
Supported Formats:
- PDF - Text extraction with page-aware chunking
- DOCX - Microsoft Word documents
- XLSX - Excel spreadsheets (all sheets, formulas evaluated)
- PPTX - PowerPoint presentations (slide text and notes)
Dependencies: Document parsing requires optional dependencies. Install them with:
pip install "memvid-sdk[documents]"
# or install individually:
pip install pypdf openpyxl python-pptx python-docx
For XLSX files with formulas, the SDK extracts the calculated values, not the formula text. This ensures searchable, meaningful content.
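Before calling put_files(), it may help to separate files with a listed format from everything else. The sketch below checks only the four extensions named above; since the SDK is described as supporting "and more", treat `SUPPORTED_EXTENSIONS` as an assumed, incomplete set and `partition_supported` as a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumption: only the four formats listed in this section.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx"}


def partition_supported(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split `paths` into (supported, skipped) by file extension, case-insensitively."""
    supported: List[str] = []
    skipped: List[str] = []
    for p in paths:
        if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(p)
        else:
            skipped.append(p)
    return supported, skipped


# Hypothetical usage:
# supported, skipped = partition_supported(files)
# all_frames = mem.put_files(supported, chunk_size=1000)
```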
Search & Retrieval
find() - Hybrid Search
# Simple search
results = mem.find('budget projections')
# With options
results = mem.find(
'financial outlook',
mode='auto', # 'lex', 'sem', 'auto'
k=10, # Number of results
snippet_chars=480, # Snippet length
scope='track:meetings', # Scope filter
# Adaptive retrieval
adaptive=True,
min_relevancy=0.5,
max_k=100,
adaptive_strategy='combined', # 'relative', 'absolute', 'cliff', 'elbow'
# Time-travel
as_of_frame=100,
as_of_ts=1704067200,
# Custom embeddings
embedder=custom_embedder,
query_embedding_model='openai'
)
for hit in results['hits']:
    print(hit['title'], hit['snippet'])
ask() - LLM Q&A
import os
answer = mem.ask(
'What was decided about the budget?',
k=8,
mode='auto',
# LLM settings
model='gpt-4o-mini',
api_key=os.environ['OPENAI_API_KEY'],
llm_context_chars=120000,
# Privacy
mask_pii=True,
# Time filters
since=1704067200,
until=1706745600,
# Options
context_only=False, # Set True to skip synthesis
# Adaptive retrieval
adaptive=True,
min_relevancy=0.5
)
print(answer['answer'])
print(answer['hits']) # Source documents
Grounding & Hallucination Detection
The ask() response includes a grounding object that measures how well the answer is supported by context:
answer = mem.ask(
'What is the API endpoint?',
model='gpt-4o-mini',
api_key=os.environ['OPENAI_API_KEY']
)
# Check grounding quality
print(answer['grounding'])
# {
# 'score': 0.85,
# 'label': 'HIGH', # 'LOW', 'MEDIUM', or 'HIGH'
# 'sentence_count': 3,
# 'grounded_sentences': 3,
# 'has_warning': False,
# 'warning_reason': None
# }
# Check if follow-up is needed
follow_up = answer.get('follow_up')
if follow_up and follow_up['needed']:
    print('Low confidence:', follow_up['reason'])
    print('Try these instead:', follow_up['suggestions'])
Grounding Fields:
| Field | Type | Description |
|---|---|---|
| score | float | Grounding score from 0.0 to 1.0 |
| label | str | Quality label: LOW, MEDIUM, or HIGH |
| sentence_count | int | Sentences in the answer |
| grounded_sentences | int | Sentences supported by context |
| has_warning | bool | True if answer may be hallucinated |
| warning_reason | str? | Explanation if warning is present |
Follow-up Fields:
| Field | Type | Description |
|---|---|---|
| needed | bool | True if answer confidence is low |
| reason | str | Why confidence is low |
| hint | str | Helpful hint for the user |
| available_topics | list[str] | Topics in this memory |
| suggestions | list[str] | Suggested follow-up questions |
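The grounding and follow-up fields can be combined into a single decision about whether to show an answer as-is. The sketch below works on a plain dict shaped like the ask() response documented above; `review_answer` and the 0.7 threshold are hypothetical choices, not SDK defaults.

```python
from typing import Any, Dict, List


def review_answer(answer: Dict[str, Any], min_score: float = 0.7) -> Dict[str, Any]:
    """Summarise whether an ask() response looks well grounded.

    Hypothetical helper; `min_score` is an arbitrary example threshold.
    """
    grounding = answer.get("grounding") or {}
    follow_up = answer.get("follow_up") or {}
    reasons: List[str] = []

    score = grounding.get("score", 0.0)
    if score < min_score:
        reasons.append(f"grounding score {score:.2f} is below {min_score:.2f}")
    if grounding.get("has_warning"):
        reasons.append(grounding.get("warning_reason") or "grounding warning")
    if follow_up.get("needed"):
        reasons.append(follow_up.get("reason") or "follow-up needed")

    return {
        "trusted": not reasons,
        "reasons": reasons,
        "suggestions": follow_up.get("suggestions", []),
    }
```

A caller might display the answer directly when `trusted` is true and otherwise show the collected reasons together with the suggested follow-up questions.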
correct() - Ground Truth Corrections
Store authoritative corrections that take priority in future retrievals:
# Store a correction
frame_id = mem.correct('Ben Koenig reported to Chloe Nguyen before 2025')
# With options
frame_id = mem.correct(
'The API rate limit is 1000 req/min',
source='Engineering Team - Jan 2025',
topics=['API', 'rate limiting'],
boost=2.5 # Higher retrieval priority (default: 2.0)
)
# Batch corrections
frame_ids = mem.correct_many([
{'statement': 'OAuth tokens expire after 24 hours', 'topics': ['auth', 'OAuth']},
{'statement': 'Production DB is db.prod.example.com', 'source': 'Ops Team'}
])
# Verify correction is retrievable
results = mem.find('Ben Koenig reported to')
print(results['hits'][0]['snippet']) # Should show the correction
Use correct() to fix hallucinations or add verified facts. Corrections receive boosted retrieval scores and are labeled [Correction] in results.
Automatic Enrichment
# Extract facts using rules engine (fast, offline)
result = mem.enrich(engine='rules')
# Extract with LLM (more accurate)
result = mem.enrich(engine='openai')
# View extracted cards
memories = mem.memories()
print(memories['cards'])
# Filter by entity
alice_cards = mem.memories(entity='Alice')
# Get entity state (O(1) lookup)
alice = mem.state('Alice')
print(alice['slots'])
# {'employer': 'Anthropic', 'role': 'Engineer', 'location': 'SF'}
# Get stats
stats = mem.memories_stats()
print(stats['entityCount'], stats['cardCount'])
# List all entities
entities = mem.memory_entities()
Manual Memory Cards
# Add SPO triplets directly
result = mem.add_memory_cards([
{'entity': 'Alice', 'slot': 'employer', 'value': 'Anthropic'},
{'entity': 'Alice', 'slot': 'role', 'value': 'Senior Engineer'},
{'entity': 'Bob', 'slot': 'team', 'value': 'Infrastructure'}
])
print(result['added'], result['ids'])
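When facts already live in a per-entity dictionary, they can be flattened into the entity/slot/value shape that add_memory_cards() accepts. This is a small sketch; `cards_from_slots` is a hypothetical helper and simply mirrors the card layout shown above.

```python
from typing import Any, Dict, List


def cards_from_slots(entity: str, slots: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn {'slot': value, ...} for one entity into a list of card dicts."""
    return [
        {"entity": entity, "slot": slot, "value": value}
        for slot, value in slots.items()
    ]


# Hypothetical usage:
# result = mem.add_memory_cards(
#     cards_from_slots('Alice', {'employer': 'Anthropic', 'role': 'Senior Engineer'})
# )
```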
Export Facts
# Export to JSON
json_data = mem.export_facts(format='json')
# Export to CSV
csv_data = mem.export_facts(format='csv', entity='Alice')
# Export to N-Triples (RDF)
ntriples = mem.export_facts(format='ntriples')
Tables
# Extract tables from PDF
result = mem.put_pdf_tables('financial-report.pdf', embed_rows=True)
print(f"Extracted {result['tables_count']} tables")
# List all tables
tables = mem.list_tables()
for table in tables:
    print(table['table_id'], table['n_rows'], table['n_cols'])
# Get table data
data = mem.get_table('tbl_001', format='dict')
csv_data = mem.get_table('tbl_001', format='csv')
Time-Travel & Sessions
Timeline Queries
timeline = mem.timeline(
limit=50,
since=1704067200,
until=1706745600,
reverse=True,
as_of_frame=100
)
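The since and until values in these examples decode to 2024-01-01 and 2024-02-01 when read as Unix epoch seconds in UTC, so it seems reasonable to assume that is the expected unit. Under that assumption, the sketch below builds such values from datetime objects; `to_epoch_seconds` is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_epoch_seconds(dt: datetime) -> int:
    """Convert a datetime to Unix epoch seconds, treating naive values as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


since = to_epoch_seconds(datetime(2024, 1, 1))  # 1704067200
until = to_epoch_seconds(datetime(2024, 2, 1))  # 1706745600

# Hypothetical usage:
# timeline = mem.timeline(limit=50, since=since, until=until)
```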
Session Recording
# Start recording
session_id = mem.session_start('qa-test')
# Perform operations
mem.find('test query')
mem.ask('What happened?')
# Add checkpoint
mem.session_checkpoint()
# End session
summary = mem.session_end()
# List sessions
sessions = mem.session_list()
# Replay session with different params
replay = mem.session_replay(session_id, top_k=10, adaptive=True)
print(replay['match_rate'])
# Delete session
mem.session_delete(session_id)
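Because a recording should be ended even when an operation inside it fails, session_start() and session_end() pair naturally with a context manager. The sketch below assumes only the two method names shown above; `recorded_session` is a hypothetical wrapper, not an SDK feature.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def recorded_session(mem: Any, name: str) -> Iterator[Any]:
    """Start a session on `mem` and always end it, even if the body raises."""
    session_id = mem.session_start(name)
    try:
        yield session_id
    finally:
        # Assumption: session_end() takes no arguments, as in the example above.
        mem.session_end()


# Hypothetical usage:
# with recorded_session(mem, 'qa-test') as session_id:
#     mem.find('test query')
```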
Encryption & Security
from memvid_sdk import lock, unlock, lock_who, lock_nudge
# Encrypt to .mv2e capsule
encrypted_path = lock(
'project.mv2',
password='secret',
force=True
)
# Decrypt back to .mv2
decrypted_path = unlock(
'project.mv2e',
password='secret'
)
# Check who has the lock
lock_info = lock_who('project.mv2')
# Nudge stale lock
released = lock_nudge('project.mv2')
Tickets & Capacity
# Get current capacity
capacity = mem.get_capacity()
# Get current ticket info
ticket = mem.current_ticket()
# Sync tickets from dashboard
result = mem.sync_tickets('mem_abc123', api_key)
# Apply ticket manually
mem.apply_ticket(ticket_string)
# Get memory binding
binding = mem.get_memory_binding()
# Unbind from dashboard
mem.unbind_memory()
Cloud Project & Memory Management
Programmatically create projects and memories on the Memvid dashboard, then bind local .mv2 files to them.
from memvid_sdk import configure
configure({
"api_key": "mv2_your_api_key_here",
"dashboard_url": "https://memvid.com"
})
Create and List Projects
from memvid_sdk import create_project, list_projects
# Create a new project
project = create_project(
"My AI Project",
description="Knowledge base for my AI agent"
)
print(f"Project ID: {project['id']}")
print(f"Slug: {project['slug']}")
# List all projects
projects = list_projects()
for proj in projects:
    print(f"{proj['name']} ({proj['id']})")
Project Response Fields:
| Field | Type | Description |
|---|---|---|
| id | str | Unique project ID |
| organisation_id | str | Organisation ID |
| slug | str | URL-friendly slug |
| name | str | Project name |
| description | str? | Optional description |
| created_at | str | ISO 8601 timestamp |
| updated_at | str | ISO 8601 timestamp |
Create and List Memories
from memvid_sdk import create_memory, list_memories
# Create a memory in a project
memory = create_memory(
"Agent Memory",
description="Long-term memory for chatbot",
project_id=project["id"]
)
print(f"Memory ID: {memory['id']}")
print(f"Display Name: {memory['display_name']}")
# List all memories
all_memories = list_memories()
# List memories in a specific project
project_memories = list_memories(project_id=project["id"])
Bind Local File to Cloud Memory
from memvid_sdk import create
# Create local .mv2 file bound to cloud memory
mv = create("./agent.mv2", memory_id=memory["id"])
mv.enable_lex()  # Explicitly enable lexical search (described above as on by default)
# Add content
mv.put(title="Meeting Notes", label="notes", metadata={}, text="Today we discussed...")
# Search
results = mv.find("discussed", k=5)
# Seal to commit changes
mv.seal()
Complete Example
import os
from memvid_sdk import configure, create_project, create_memory, create
# 1. Configure SDK
configure({"api_key": os.environ["MEMVID_API_KEY"]})
# 2. Create project
project = create_project("Knowledge Base", description="Company docs")
# 3. Create cloud memory in project
memory = create_memory("Docs Memory", project_id=project["id"])
# 4. Create local file bound to cloud memory
mv = create("./docs.mv2", memory_id=memory["id"])
mv.enable_lex()
# 5. Add content
mv.put(title="API Guide", label="docs", metadata={}, text="API documentation...")
mv.put(title="FAQ", label="docs", metadata={}, text="Frequently asked questions...")
# 6. Search
results = mv.find("API", k=5)
print(f"Found {len(results['hits'])} results")
# 7. Seal to commit changes
mv.seal()
Embedding Providers
External Providers
import os
from memvid_sdk.embeddings import (
OpenAIEmbeddings,
GeminiEmbeddings,
MistralEmbeddings,
CohereEmbeddings,
VoyageEmbeddings,
NvidiaEmbeddings,
HuggingFaceEmbeddings,
get_embedder
)
# OpenAI
openai = OpenAIEmbeddings(
api_key=os.environ['OPENAI_API_KEY'],
model='text-embedding-3-small' # or 'text-embedding-3-large'
)
# Gemini
gemini = GeminiEmbeddings(
api_key=os.environ['GEMINI_API_KEY'],
model='text-embedding-004'
)
# Mistral
mistral = MistralEmbeddings(
api_key=os.environ['MISTRAL_API_KEY']
)
# Cohere
cohere = CohereEmbeddings(
api_key=os.environ['COHERE_API_KEY'],
model='embed-english-v3.0'
)
# Voyage
voyage = VoyageEmbeddings(
api_key=os.environ['VOYAGE_API_KEY'],
model='voyage-3'
)
# NVIDIA
nvidia = NvidiaEmbeddings(
api_key=os.environ['NVIDIA_API_KEY']
)
# HuggingFace (local)
hf = HuggingFaceEmbeddings(model='all-MiniLM-L6-v2')
# Factory function
embedder = get_embedder('openai', api_key='...')
# Use with put_many
mem.put_many(docs, embedder=openai)
# Use with find
mem.find('query', embedder=gemini)
Local Embeddings (No API Required)
from memvid_sdk.embeddings import LOCAL_EMBEDDING_MODELS
mem.put(
'Title', 'label', {},
text='content',
enable_embedding=True,
embedding_model=LOCAL_EMBEDDING_MODELS['BGE_SMALL'] # 384d, fast
)
# Available local models
LOCAL_EMBEDDING_MODELS['BGE_SMALL'] # 384d - fastest
LOCAL_EMBEDDING_MODELS['BGE_BASE'] # 768d - balanced
LOCAL_EMBEDDING_MODELS['NOMIC'] # 768d - general purpose
LOCAL_EMBEDDING_MODELS['GTE_LARGE'] # 1024d - highest quality
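The dimensions listed above translate directly into storage cost per vector. The sketch below is back-of-the-envelope arithmetic only: it assumes vectors are stored as 32-bit floats (4 bytes per dimension), applies the 16x figure quoted for vector_compression in the put() example, and ignores any index overhead. `MODEL_DIMS` and `vector_bytes` are hypothetical names.

```python
# Dimensions as listed for the local models above.
MODEL_DIMS = {"BGE_SMALL": 384, "BGE_BASE": 768, "NOMIC": 768, "GTE_LARGE": 1024}


def vector_bytes(model: str, n_vectors: int, compressed: bool = False) -> int:
    """Estimate raw vector storage in bytes.

    Assumptions: float32 storage (4 bytes per dimension) and a flat 16x
    reduction when compression is on; index overhead is not counted.
    """
    raw = MODEL_DIMS[model] * 4 * n_vectors
    return raw // 16 if compressed else raw


print(vector_bytes("BGE_SMALL", 1))                    # 1536
print(vector_bytes("BGE_SMALL", 1, compressed=True))   # 96
print(vector_bytes("GTE_LARGE", 1000))                 # 4096000
```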
Error Handling
from memvid_sdk import (
MemvidError,
CapacityExceededError, # MV001
TicketInvalidError, # MV002
TicketReplayError, # MV003
LexIndexDisabledError, # MV004
TimeIndexMissingError, # MV005
VerifyFailedError, # MV006
LockedError, # MV007
ApiKeyRequiredError, # MV008
MemoryAlreadyBoundError, # MV009
FrameNotFoundError, # MV010
VecIndexDisabledError, # MV011
CorruptFileError, # MV012
VecDimensionMismatchError, # MV014
EmbeddingFailedError, # MV015
EncryptionError, # MV016
NerModelNotAvailableError, # MV017
ClipIndexDisabledError # MV018
)
try:
    mem.put('Large file', 'data', {}, file='huge.bin')
except CapacityExceededError:
    print('Storage capacity exceeded (MV001)')
except LockedError:
    print('File locked by another process (MV007)')
except VecIndexDisabledError:
    print('Enable vector index first (MV011)')
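Some of these errors may be transient. A file locked by another process (MV007), for instance, might become available a moment later. The sketch below is a generic retry wrapper that takes the exception class as an argument, so it does not import the SDK; `retry_on` is a hypothetical helper, and treating LockedError as retryable is an assumption.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry_on(
    exc_type: Type[BaseException],
    fn: Callable[[], T],
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Call `fn`, retrying up to `attempts` times when `exc_type` is raised."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exc_type:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(delay)
    raise AssertionError("unreachable")


# Hypothetical usage:
# retry_on(LockedError, lambda: mem.put('Title', 'label', {}, text='Content'))
```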
# Get frame by URI
frame = mem.frame('mv2://docs/intro')
# Get raw binary data
data = mem.blob('mv2://docs/intro')
Utility Functions
from memvid_sdk import info, flush_analytics, is_telemetry_enabled, verify_single_file
# Get SDK info
sdk_info = info()
print(sdk_info['version'], sdk_info['platform'])
# Flush analytics
flush_analytics()
# Check telemetry status
enabled = is_telemetry_enabled()
# Verify no auxiliary files
verify_single_file('project.mv2')
Environment Variables
| Variable | Description |
|---|---|
| MEMVID_API_KEY | Dashboard API key for sync |
| OPENAI_API_KEY | OpenAI embeddings and LLM |
| GEMINI_API_KEY | Gemini embeddings |
| MISTRAL_API_KEY | Mistral embeddings |
| COHERE_API_KEY | Cohere embeddings |
| VOYAGE_API_KEY | Voyage embeddings |
| NVIDIA_API_KEY | NVIDIA embeddings |
| ANTHROPIC_API_KEY | Claude for entities |
| MEMVID_MODELS_DIR | Model cache directory |
| MEMVID_OFFLINE | Use cached models only |
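A start-up check can report missing keys before the first API call fails. The sketch below reads variable names from the table above; `missing_env` is a hypothetical helper, and which variables are actually required depends on the providers an application uses.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names from `names` that are unset or empty in `env`.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Hypothetical usage:
# missing = missing_env(['MEMVID_API_KEY', 'OPENAI_API_KEY'])
# if missing:
#     raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```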