Build AI applications, agents, and data pipelines with the Memvid Python SDK. Native Rust bindings deliver high performance with a Pythonic API.

Installation

pip install memvid-sdk

Requirements: Python 3.8+ on macOS, Linux, or Windows. Native bindings are included, so the core SDK needs no extra dependencies; document parsing uses optional extras (see put_files() below).

create() will OVERWRITE existing files without warning!

Function	Purpose	If File Exists	Parameter Order
`create(path)`	Create new .mv2 file	DELETES all data	path first (kind is keyword-only)
`use(kind, path)`	Open existing .mv2 file	Preserves data	kind first, then path

Always check whether the file exists before choosing between create() and use(), as the Quick Start example does.

Quick Start

from memvid_sdk import create, use
import os

path = 'knowledge.mv2'

# CRITICAL: Check if file exists to avoid data loss!
if os.path.exists(path):
    mem = use('basic', path)  # Open existing file (kind first!)
else:
    mem = create(path)  # Create new file (path first!)

# Lexical search (BM25) is enabled by default - no need to call enable_lex()

# Add documents
mem.put(
    title='Meeting Notes',
    label='notes',
    metadata={'source': 'slack'},
    text='Alice mentioned she works at Anthropic...'
)

# Search works immediately
results = mem.find('who works at AI companies?', k=5, mode='lex')
print(results['hits'])

# Ask questions
answer = mem.ask('What does Alice do?', k=5, mode='lex')
print(answer['answer'])

# Seal when done (commits changes)
mem.seal()

API Reference

Category	Methods	Description
File Operations	`create`, `use`, `close`	Create, open, close memory files
Data Ingestion	`put`, `put_many`, `put_file`, `put_files`	Add documents with embeddings
Search	`find`, `ask`, `timeline`	Query your memory
Corrections	`correct`, `correct_many`	Store ground truth with retrieval boost
Memory Cards	`memories`, `state`, `enrich`, `add_memory_cards`	Structured fact extraction
Tables	`put_pdf_tables`, `list_tables`, `get_table`	PDF table extraction
Sessions	`session_start`, `session_end`, `session_replay`	Time-travel debugging
Tickets	`sync_tickets`, `current_ticket`, `get_capacity`	Capacity management
Security	`lock`, `unlock`, `lock_who`, `lock_nudge`	Encryption and access control
Utilities	`verify`, `doctor`	Maintenance and utilities

Context Manager

import memvid_sdk as memvid

# Automatically closes when done
with memvid.use('basic', 'memory.mv2') as mem:
    mem.put('Doc', 'test', {}, text='Content')
    results = mem.find('query')

Core Functions

File Operations

from memvid_sdk import create, use, info

# Create new memory file
mem = create('project.mv2')

# With options
mem = create(
    'project.mv2',
    enable_vec=True,      # Enable vector index
    enable_lex=True,      # Enable lexical index
    memory_id='mem_abc',  # Bind to dashboard
    api_key='mv_live_...' # Dashboard API key
)

# Open existing memory with adapter
mem = use('basic', 'project.mv2', mode='open')

# Available adapters
mem = use('langchain', 'file.mv2')
mem = use('llamaindex', 'file.mv2')
mem = use('crewai', 'file.mv2')
mem = use('autogen', 'file.mv2')
mem = use('haystack', 'file.mv2')
mem = use('langgraph', 'file.mv2')
mem = use('semantic-kernel', 'file.mv2')
mem = use('openai', 'file.mv2')
mem = use('google-adk', 'file.mv2')

# Get SDK info
sdk_info = info()

# Verify file integrity
result = mem.verify(deep=True)

# Repair and optimize
result = mem.doctor(
    rebuild_time_index=True,
    rebuild_vec_index=True,
    vacuum=True
)

Context Manager Support

from memvid_sdk import create

# Automatically closes when done
with create('project.mv2') as mem:
    mem.put('Title', 'label', {}, text='Content')
    results = mem.find('query')

Data Ingestion

put() - Add Single Document

mem.put(
    'Document Title',      # title (required, positional)
    'label',               # label (required, positional)
    {},                    # metadata dict (required, can be {})

    # Content (pass one of these, not both)
    text='Document content...',
    # file='/path/to/document.pdf',

    # Optional
    uri='mv2://docs/intro',
    tags=['api', 'v2'],
    labels=['public', 'reviewed'],
    kind='markdown',
    track='documentation',

    # Embeddings
    enable_embedding=True,
    embedding_model='bge-small',   # or 'openai', 'nomic', etc.
    vector_compression=True,       # 16x compression with PQ

    # Behavior
    auto_tag=True,                 # Auto-generate tags
    extract_dates=True             # Extract date mentions
)

put_many() - Batch Ingestion

docs = [
    {'title': 'Doc 1', 'label': 'kb', 'text': 'First document'},
    {'title': 'Doc 2', 'label': 'kb', 'text': 'Second document'},
    {'title': 'Doc 3', 'label': 'kb', 'text': 'Third document'}
]

frame_ids = mem.put_many(
    docs,
    embedder=openai_embeddings,  # Custom embedder, e.g. an OpenAIEmbeddings instance (see Embedding Providers)
    opts={
        'compression_level': 3,
        'enable_embedding': True,
        'embedding_model': 'bge-small'
    }
)

put_file() - Document Parsing

Ingest documents directly from files. Supports PDF, DOCX, XLSX, PPTX, and more. The SDK automatically extracts text content and creates searchable frames.

# Single file ingestion
frames = mem.put_file('/path/to/report.pdf')
print(f'Ingested {len(frames)} frames from PDF')

# With options
frames = mem.put_file(
    '/path/to/presentation.pptx',
    chunk_size=1000,           # Characters per chunk
    chunk_overlap=200,         # Overlap between chunks
    enable_embedding=True,     # Generate embeddings
    embedding_model='bge-small'
)

# Excel/XLSX files
frames = mem.put_file('/path/to/data.xlsx')
# Each sheet becomes searchable content

# Word documents
frames = mem.put_file('/path/to/document.docx')
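
To make the chunk_size and chunk_overlap options concrete, here is a minimal sketch of character-based chunking with overlap. It is an illustration only: the function chunk_text is hypothetical, and the SDK's real chunker (which is described as page-aware for PDFs) may split text differently.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; not part of memvid_sdk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = chunk_text("x" * 2500, chunk_size=1000, chunk_overlap=200)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

With these settings each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.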

put_files() - Batch Document Ingestion

Ingest multiple documents at once:

files = [
    '/path/to/report.pdf',
    '/path/to/slides.pptx',
    '/path/to/data.xlsx',
    '/path/to/notes.docx'
]

all_frames = mem.put_files(
    files,
    chunk_size=1000,
    enable_embedding=True
)

print(f'Total frames: {len(all_frames)}')

Supported Formats:

PDF - Text extraction with page-aware chunking
DOCX - Microsoft Word documents
XLSX - Excel spreadsheets (all sheets, formulas evaluated)
PPTX - PowerPoint presentations (slide text and notes)

Dependencies: Document parsing requires optional dependencies. Install them with:

pip install memvid-sdk[documents]
# or install individually:
pip install pypdf openpyxl python-pptx python-docx

For XLSX files with formulas, the SDK extracts the calculated values, not the formula text. This ensures searchable, meaningful content.
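
Before ingesting files, it can help to confirm that the optional parsers are importable. The sketch below uses only the standard library and the four package names from the pip command above; the helper name missing_parsers is an assumption, not an SDK function.

```python
from importlib.util import find_spec

# pip package name -> module name it installs
PARSER_MODULES = {
    "pypdf": "pypdf",
    "openpyxl": "openpyxl",
    "python-pptx": "pptx",
    "python-docx": "docx",
}


def missing_parsers():
    """Return the pip names of optional parsers that are not installed."""
    return [
        package
        for package, module in PARSER_MODULES.items()
        if find_spec(module) is None
    ]


missing = missing_parsers()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All optional document parsers are available")
```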

Search & Retrieval

find() - Hybrid Search

# Simple search
results = mem.find('budget projections')

# With options
results = mem.find(
    'financial outlook',
    mode='auto',              # 'lex', 'sem', 'auto'
    k=10,                     # Number of results
    snippet_chars=480,        # Snippet length
    scope='track:meetings',   # Scope filter

    # Adaptive retrieval
    adaptive=True,
    min_relevancy=0.5,
    max_k=100,
    adaptive_strategy='combined',  # or 'relative', 'absolute', 'cliff', 'elbow'

    # Time-travel
    as_of_frame=100,
    as_of_ts=1704067200,

    # Custom embeddings
    embedder=custom_embedder,
    query_embedding_model='openai'
)

for hit in results['hits']:
    print(hit['title'], hit['snippet'])

ask() - LLM Q&A

import os

answer = mem.ask(
    'What was decided about the budget?',
    k=8,
    mode='auto',

    # LLM settings
    model='gpt-4o-mini',
    api_key=os.environ['OPENAI_API_KEY'],
    llm_context_chars=120000,

    # Privacy
    mask_pii=True,

    # Time filters
    since=1704067200,
    until=1706745600,

    # Options
    context_only=False,    # Set True to skip synthesis

    # Adaptive retrieval
    adaptive=True,
    min_relevancy=0.5
)

print(answer['answer'])
print(answer['hits'])  # Source documents

Grounding & Hallucination Detection

The ask() response includes a grounding object that measures how well the answer is supported by context:

import os

answer = mem.ask(
    'What is the API endpoint?',
    model='gpt-4o-mini',
    api_key=os.environ['OPENAI_API_KEY']
)

# Check grounding quality
print(answer['grounding'])
# {
#   'score': 0.85,
#   'label': 'HIGH',           # 'LOW', 'MEDIUM', or 'HIGH'
#   'sentence_count': 3,
#   'grounded_sentences': 3,
#   'has_warning': False,
#   'warning_reason': None
# }

# Check if follow-up is needed
follow_up = answer.get('follow_up')
if follow_up and follow_up['needed']:
    print('Low confidence:', follow_up['reason'])
    print('Try these instead:', follow_up['suggestions'])

Grounding Fields:

Field	Type	Description
`score`	`float`	Grounding score from 0.0 to 1.0
`label`	`str`	Quality label: `LOW`, `MEDIUM`, or `HIGH`
`sentence_count`	`int`	Sentences in the answer
`grounded_sentences`	`int`	Sentences supported by context
`has_warning`	`bool`	True if answer may be hallucinated
`warning_reason`	`str?`	Explanation if warning is present

Follow-up Fields:

Field	Type	Description
`needed`	`bool`	True if answer confidence is low
`reason`	`str`	Why confidence is low
`hint`	`str`	Helpful hint for the user
`available_topics`	`list[str]`	Topics in this memory
`suggestions`	`list[str]`	Suggested follow-up questions
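
One way to act on these fields is a small gate that decides whether to show an answer as-is. This is a sketch that works on a plain dict shaped like the response above; the helper should_trust and the 0.7 threshold are assumptions to tune for your application, not SDK defaults.

```python
def should_trust(answer, min_score=0.7):
    """Return True if the answer looks well grounded.

    Hypothetical helper; min_score is an arbitrary example threshold.
    """
    grounding = answer.get("grounding") or {}
    follow_up = answer.get("follow_up") or {}

    if grounding.get("has_warning"):
        return False  # the response flagged a possible hallucination
    if follow_up.get("needed"):
        return False  # the response reported low confidence
    return grounding.get("score", 0.0) >= min_score


sample = {
    "answer": "...",
    "grounding": {
        "score": 0.85,
        "label": "HIGH",
        "sentence_count": 3,
        "grounded_sentences": 3,
        "has_warning": False,
        "warning_reason": None,
    },
}
print(should_trust(sample))  # True
```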

correct() - Ground Truth Corrections

Store authoritative corrections that take priority in future retrievals:

# Store a correction
frame_id = mem.correct('Ben Koenig reported to Chloe Nguyen before 2025')

# With options
frame_id = mem.correct(
    'The API rate limit is 1000 req/min',
    source='Engineering Team - Jan 2025',
    topics=['API', 'rate limiting'],
    boost=2.5  # Higher retrieval priority (default: 2.0)
)

# Batch corrections
frame_ids = mem.correct_many([
    {'statement': 'OAuth tokens expire after 24 hours', 'topics': ['auth', 'OAuth']},
    {'statement': 'Production DB is db.prod.example.com', 'source': 'Ops Team'}
])

# Verify correction is retrievable
results = mem.find('Ben Koenig reported to')
print(results['hits'][0]['snippet'])  # Should show the correction

Use correct() to fix hallucinations or add verified facts. Corrections receive boosted retrieval scores and are labeled [Correction] in results.
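
To build intuition for how a boost can change ranking, the sketch below multiplies each hit's score by its boost and re-sorts. This is an assumption for illustration: the page does not document the SDK's actual scoring formula, and the 'score' and 'boost' keys on these sample hits are hypothetical.

```python
def rank_with_boost(hits):
    """Sort hits by score * boost, highest first (boost defaults to 1.0).

    Illustrative only; not the SDK's real ranking code.
    """
    return sorted(
        hits,
        key=lambda hit: hit["score"] * hit.get("boost", 1.0),
        reverse=True,
    )


hits = [
    {"title": "Old wiki page", "score": 0.60},
    {"title": "[Correction] rate limit", "score": 0.40, "boost": 2.0},
]
print([hit["title"] for hit in rank_with_boost(hits)])
# ['[Correction] rate limit', 'Old wiki page']
```

Under this model, a correction with a lower raw score (0.40 x 2.0 = 0.80) outranks an ordinary hit at 0.60.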

Memory Cards (Entity Extraction)

Automatic Enrichment

# Extract facts using rules engine (fast, offline)
result = mem.enrich(engine='rules')

# Extract with LLM (more accurate)
result = mem.enrich(engine='openai')

# View extracted cards
memories = mem.memories()
print(memories['cards'])

# Filter by entity
alice_cards = mem.memories(entity='Alice')

# Get entity state (O(1) lookup)
alice = mem.state('Alice')
print(alice['slots'])
# {'employer': 'Anthropic', 'role': 'Engineer', 'location': 'SF'}

# Get stats
stats = mem.memories_stats()
print(stats['entityCount'], stats['cardCount'])

# List all entities
entities = mem.memory_entities()

Manual Memory Cards

# Add SPO triplets directly
result = mem.add_memory_cards([
    {'entity': 'Alice', 'slot': 'employer', 'value': 'Anthropic'},
    {'entity': 'Alice', 'slot': 'role', 'value': 'Senior Engineer'},
    {'entity': 'Bob', 'slot': 'team', 'value': 'Infrastructure'}
])

print(result['added'], result['ids'])
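
Conceptually, entity/slot/value cards fold into a per-entity state like the one state() returns. The sketch below shows that idea with plain dicts and a last-write-wins rule; build_state is a hypothetical helper, and the SDK may resolve conflicting cards differently.

```python
from collections import defaultdict


def build_state(cards):
    """Fold entity/slot/value cards into {entity: {slot: value}}.

    Later cards overwrite earlier ones for the same entity and slot.
    """
    state = defaultdict(dict)
    for card in cards:
        state[card["entity"]][card["slot"]] = card["value"]
    return dict(state)


cards = [
    {"entity": "Alice", "slot": "employer", "value": "Anthropic"},
    {"entity": "Alice", "slot": "role", "value": "Senior Engineer"},
    {"entity": "Bob", "slot": "team", "value": "Infrastructure"},
]
state = build_state(cards)
print(state["Alice"])  # {'employer': 'Anthropic', 'role': 'Senior Engineer'}
```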

Export Facts

# Export to JSON
json_data = mem.export_facts(format='json')

# Export to CSV
csv_data = mem.export_facts(format='csv', entity='Alice')

# Export to N-Triples (RDF)
ntriples = mem.export_facts(format='ntriples')

Table Extraction

# Extract tables from PDF
result = mem.put_pdf_tables('financial-report.pdf', embed_rows=True)
print(f"Extracted {result['tables_count']} tables")

# List all tables
tables = mem.list_tables()
for table in tables:
    print(table['table_id'], table['n_rows'], table['n_cols'])

# Get table data
data = mem.get_table('tbl_001', format='dict')
csv_data = mem.get_table('tbl_001', format='csv')

Time-Travel & Sessions

Timeline Queries

timeline = mem.timeline(
    limit=50,
    since=1704067200,
    until=1706745600,
    reverse=True,
    as_of_frame=100
)
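
The since and until values above read as Unix timestamps in seconds. Assuming that interpretation (the page does not state the unit or time zone explicitly), they can be produced from calendar dates with the standard library; to_unix is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_unix(year, month, day):
    """Return the Unix timestamp (seconds) for midnight UTC on a date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


since = to_unix(2024, 1, 1)
until = to_unix(2024, 2, 1)
print(since, until)  # 1704067200 1706745600
```

Under that assumption, the example window covers January 2024 in UTC.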

Session Recording

# Start recording
session_id = mem.session_start('qa-test')

# Perform operations
mem.find('test query')
mem.ask('What happened?')

# Add checkpoint
mem.session_checkpoint()

# End session
summary = mem.session_end()

# List sessions
sessions = mem.session_list()

# Replay session with different params
replay = mem.session_replay(session_id, top_k=10, adaptive=True)
print(replay['match_rate'])

# Delete session
mem.session_delete(session_id)

Encryption & Security

from memvid_sdk import lock, unlock, lock_who, lock_nudge

# Encrypt to .mv2e capsule
encrypted_path = lock(
    'project.mv2',
    password='secret',
    force=True
)

# Decrypt back to .mv2
decrypted_path = unlock(
    'project.mv2e',
    password='secret'
)

# Check who has the lock
lock_info = lock_who('project.mv2')

# Nudge stale lock
released = lock_nudge('project.mv2')

Tickets & Capacity

# Get current capacity
capacity = mem.get_capacity()

# Get current ticket info
ticket = mem.current_ticket()

# Sync tickets from dashboard
result = mem.sync_tickets('mem_abc123', api_key)

# Apply ticket manually
mem.apply_ticket(ticket_string)

# Get memory binding
binding = mem.get_memory_binding()

# Unbind from dashboard
mem.unbind_memory()

Cloud Project & Memory Management

Programmatically create projects and memories on the Memvid dashboard, then bind local .mv2 files to them.

Configure SDK

from memvid_sdk import configure

configure({
    "api_key": "mv2_your_api_key_here",
    "dashboard_url": "https://memvid.com"
})

Create and List Projects

from memvid_sdk import create_project, list_projects

# Create a new project
project = create_project(
    "My AI Project",
    description="Knowledge base for my AI agent"
)
print(f"Project ID: {project['id']}")
print(f"Slug: {project['slug']}")

# List all projects
projects = list_projects()
for proj in projects:
    print(f"{proj['name']} ({proj['id']})")

Project Response Fields:

Field	Type	Description
`id`	`str`	Unique project ID
`organisation_id`	`str`	Organisation ID
`slug`	`str`	URL-friendly slug
`name`	`str`	Project name
`description`	`str?`	Optional description
`created_at`	`str`	ISO 8601 timestamp
`updated_at`	`str`	ISO 8601 timestamp

Create and List Memories

from memvid_sdk import create_memory, list_memories

# Create a memory in a project
memory = create_memory(
    "Agent Memory",
    description="Long-term memory for chatbot",
    project_id=project["id"]
)
print(f"Memory ID: {memory['id']}")
print(f"Display Name: {memory['display_name']}")

# List all memories
all_memories = list_memories()

# List memories in a specific project
project_memories = list_memories(project_id=project["id"])

Bind Local File to Cloud Memory

from memvid_sdk import create

# Create local .mv2 file bound to cloud memory
mv = create("./agent.mv2", memory_id=memory["id"])
mv.enable_lex()  # Enable lexical search explicitly (the Quick Start notes it is on by default)

# Add content
mv.put(title="Meeting Notes", label="notes", metadata={}, text="Today we discussed...")

# Search
results = mv.find("discussed", k=5)

# Seal when done (commits changes)
mv.seal()

Complete Example

import os
from memvid_sdk import configure, create_project, create_memory, create

# 1. Configure SDK
configure({"api_key": os.environ["MEMVID_API_KEY"]})

# 2. Create project
project = create_project("Knowledge Base", description="Company docs")

# 3. Create cloud memory in project
memory = create_memory("Docs Memory", project_id=project["id"])

# 4. Create local file bound to cloud memory
mv = create("./docs.mv2", memory_id=memory["id"])
mv.enable_lex()

# 5. Add content
mv.put(title="API Guide", label="docs", metadata={}, text="API documentation...")
mv.put(title="FAQ", label="docs", metadata={}, text="Frequently asked questions...")

# 6. Search
results = mv.find("API", k=5)
print(f"Found {len(results['hits'])} results")

# 7. Seal when done (commits changes)
mv.seal()

Embedding Providers

External Providers

import os

from memvid_sdk.embeddings import (
    OpenAIEmbeddings,
    GeminiEmbeddings,
    MistralEmbeddings,
    CohereEmbeddings,
    VoyageEmbeddings,
    NvidiaEmbeddings,
    HuggingFaceEmbeddings,
    get_embedder
)

# OpenAI
openai = OpenAIEmbeddings(
    api_key=os.environ['OPENAI_API_KEY'],
    model='text-embedding-3-small'  # or 'text-embedding-3-large'
)

# Gemini
gemini = GeminiEmbeddings(
    api_key=os.environ['GEMINI_API_KEY'],
    model='text-embedding-004'
)

# Mistral
mistral = MistralEmbeddings(
    api_key=os.environ['MISTRAL_API_KEY']
)

# Cohere
cohere = CohereEmbeddings(
    api_key=os.environ['COHERE_API_KEY'],
    model='embed-english-v3.0'
)

# Voyage
voyage = VoyageEmbeddings(
    api_key=os.environ['VOYAGE_API_KEY'],
    model='voyage-3'
)

# NVIDIA
nvidia = NvidiaEmbeddings(
    api_key=os.environ['NVIDIA_API_KEY']
)

# HuggingFace (local)
hf = HuggingFaceEmbeddings(model='all-MiniLM-L6-v2')

# Factory function
embedder = get_embedder('openai', api_key='...')

# Use with put_many
mem.put_many(docs, embedder=openai)

# Use with find
mem.find('query', embedder=gemini)

Local Embeddings (No API Required)

from memvid_sdk.embeddings import LOCAL_EMBEDDING_MODELS

mem.put(
    'Title', 'label', {},
    text='content',
    enable_embedding=True,
    embedding_model=LOCAL_EMBEDDING_MODELS['BGE_SMALL']  # 384d, fast
)

# Available local models
LOCAL_EMBEDDING_MODELS['BGE_SMALL']   # 384d - fastest
LOCAL_EMBEDDING_MODELS['BGE_BASE']    # 768d - balanced
LOCAL_EMBEDDING_MODELS['NOMIC']       # 768d - general purpose
LOCAL_EMBEDDING_MODELS['GTE_LARGE']   # 1024d - highest quality
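
The dimension of a model drives storage cost. As a rough back-of-the-envelope sketch, assuming uncompressed vectors are stored as 4-byte floats (an assumption, not something this page states) and taking the 16x figure from the vector_compression comment in put(), raw vector size can be estimated like this; vector_bytes is a hypothetical helper that ignores index overhead.

```python
# Dimensions as listed for the local models above
MODEL_DIMS = {"BGE_SMALL": 384, "BGE_BASE": 768, "NOMIC": 768, "GTE_LARGE": 1024}


def vector_bytes(n_vectors, dim, bytes_per_value=4, compression_ratio=1):
    """Estimate raw vector storage in bytes (index overhead not included)."""
    return n_vectors * dim * bytes_per_value // compression_ratio


raw = vector_bytes(10_000, MODEL_DIMS["BGE_SMALL"])
compressed = vector_bytes(10_000, MODEL_DIMS["BGE_SMALL"], compression_ratio=16)
print(raw, compressed)  # 15360000 960000
```

Under these assumptions, 10,000 frames embedded with BGE_SMALL need about 15 MB of raw vectors, or under 1 MB with 16x compression.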

Error Handling

from memvid_sdk import (
    MemvidError,
    CapacityExceededError,      # MV001
    TicketInvalidError,         # MV002
    TicketReplayError,          # MV003
    LexIndexDisabledError,      # MV004
    TimeIndexMissingError,      # MV005
    VerifyFailedError,          # MV006
    LockedError,                # MV007
    ApiKeyRequiredError,        # MV008
    MemoryAlreadyBoundError,    # MV009
    FrameNotFoundError,         # MV010
    VecIndexDisabledError,      # MV011
    CorruptFileError,           # MV012
    VecDimensionMismatchError,  # MV014
    EmbeddingFailedError,       # MV015
    EncryptionError,            # MV016
    NerModelNotAvailableError,  # MV017
    ClipIndexDisabledError      # MV018
)

try:
    mem.put('Large file', 'data', {}, file='huge.bin')
except CapacityExceededError:
    print('Storage capacity exceeded (MV001)')
except LockedError:
    print('File locked by another process (MV007)')
except VecIndexDisabledError:
    print('Enable vector index first (MV011)')
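
A locked file is often a transient condition, so one reasonable pattern is to retry with exponential backoff. The sketch below is self-contained: it defines a stand-in LockedError so it runs without the SDK, and retry_on_lock is a hypothetical helper. In real code, pass the LockedError imported from memvid_sdk as locked_exc.

```python
import time


class LockedError(Exception):
    """Stand-in for memvid_sdk.LockedError so this sketch runs on its own."""


def retry_on_lock(operation, attempts=3, base_delay=0.1, locked_exc=LockedError):
    """Call operation(), retrying with exponential backoff while it is locked."""
    for attempt in range(attempts):
        try:
            return operation()
        except locked_exc:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)


# Simulated operation that fails twice, then succeeds
calls = {"n": 0}


def flaky_put():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LockedError("file locked")
    return "frame-ok"


print(retry_on_lock(flaky_put, base_delay=0.01))  # frame-ok
```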

Asset Extraction

# Get frame by URI
frame = mem.frame('mv2://docs/intro')

# Get raw binary data
data = mem.blob('mv2://docs/intro')

Utility Functions

from memvid_sdk import info, flush_analytics, is_telemetry_enabled, verify_single_file

# Get SDK info
sdk_info = info()
print(sdk_info['version'], sdk_info['platform'])

# Flush analytics
flush_analytics()

# Check telemetry status
enabled = is_telemetry_enabled()

# Verify no auxiliary files
verify_single_file('project.mv2')

Environment Variables

Variable	Description
`MEMVID_API_KEY`	Dashboard API key for sync
`OPENAI_API_KEY`	OpenAI embeddings and LLM
`GEMINI_API_KEY`	Gemini embeddings
`MISTRAL_API_KEY`	Mistral embeddings
`COHERE_API_KEY`	Cohere embeddings
`VOYAGE_API_KEY`	Voyage embeddings
`NVIDIA_API_KEY`	NVIDIA embeddings
`ANTHROPIC_API_KEY`	Claude for entities
`MEMVID_MODELS_DIR`	Model cache directory
`MEMVID_OFFLINE`	Use cached models only
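
A quick startup check can catch a missing key before the first API call fails. The sketch below uses only the variable names from the table; the provider labels used as dictionary keys and the helper missing_keys are assumptions for illustration.

```python
import os

# Provider label (hypothetical) -> environment variable from the table above
PROVIDER_KEYS = {
    "memvid": "MEMVID_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "cohere": "COHERE_API_KEY",
    "voyage": "VOYAGE_API_KEY",
    "nvidia": "NVIDIA_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def missing_keys(providers, environ=None):
    """Return the env var names that are unset or empty for these providers."""
    env = os.environ if environ is None else environ
    return [PROVIDER_KEYS[p] for p in providers if not env.get(PROVIDER_KEYS[p])]


print(missing_keys(["openai", "cohere"], environ={"OPENAI_API_KEY": "sk-..."}))
# ['COHERE_API_KEY']
```

Passing environ explicitly keeps the check easy to test; leave it out to read the real process environment.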

Next Steps

Quickstart

Build your first AI memory in 5 minutes

Embedding Providers

Compare local and external embedding options

Framework Integrations

LangChain, LlamaIndex, and more

Memory Cards

O(1) entity lookups and fact extraction
