Build AI applications, agents, and data pipelines with the Memvid Python SDK. Native Rust bindings deliver high performance with a Pythonic API.

Installation

pip install memvid-sdk
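Requirements: Python 3.8+, macOS/Linux/Windows. Native bindings are included, so core features need no extra dependencies; document parsing uses optional extras (see the Dependencies note under put_files()).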

create() will OVERWRITE existing files without warning!
| Function | Purpose | If File Exists | Parameter Order |
| --- | --- | --- | --- |
| create(path) | Create new .mv2 file | DELETES all data | path first (kind is keyword-only) |
| use(kind, path) | Open existing .mv2 file | Preserves data | kind first, then path |

Always check whether the file exists before choosing; see the example below.

Quick Start

from memvid_sdk import create, use
import os

path = 'knowledge.mv2'

# CRITICAL: Check if file exists to avoid data loss!
if os.path.exists(path):
    mem = use('basic', path)  # Open existing file (kind first!)
else:
    mem = create(path)  # Create new file (path first!)

# Lexical search (BM25) is enabled by default - no need to call enable_lex()

# Add documents
mem.put(
    title='Meeting Notes',
    label='notes',
    metadata={'source': 'slack'},
    text='Alice mentioned she works at Anthropic...'
)

# Search works immediately
results = mem.find('who works at AI companies?', k=5, mode='lex')
print(results['hits'])

# Ask questions
answer = mem.ask('What does Alice do?', k=5, mode='lex')
print(answer['answer'])

# Seal when done (commits changes)
mem.seal()

API Reference

| Category | Methods | Description |
| --- | --- | --- |
| File Operations | create, use, close | Create, open, close memory files |
| Data Ingestion | put, put_many, put_file, put_files | Add documents with embeddings |
| Search | find, ask, timeline | Query your memory |
| Corrections | correct, correct_many | Store ground truth with retrieval boost |
| Memory Cards | memories, state, enrich, add_memory_cards | Structured fact extraction |
| Tables | put_pdf_tables, list_tables, get_table | PDF table extraction |
| Sessions | session_start, session_end, session_replay | Time-travel debugging |
| Tickets | sync_tickets, current_ticket, get_capacity | Capacity management |
| Security | lock, unlock, lock_who, lock_nudge | Encryption and access control |
| Utilities | verify, doctor | Maintenance and utilities |

Context Manager

import memvid_sdk as memvid

# Automatically closes when done
with memvid.use('basic', 'memory.mv2') as mem:
    mem.put('Doc', 'test', {}, text='Content')
    results = mem.find('query')

Core Functions

File Operations

from memvid_sdk import create, use, lock, unlock, info

# Create new memory file
mem = create('project.mv2')

# With options
mem = create(
    'project.mv2',
    enable_vec=True,      # Enable vector index
    enable_lex=True,      # Enable lexical index
    memory_id='mem_abc',  # Bind to dashboard
    api_key='mv_live_...' # Dashboard API key
)

# Open existing memory with adapter
mem = use('basic', 'project.mv2', mode='open')

# Available adapters
mem = use('langchain', 'file.mv2')
mem = use('llamaindex', 'file.mv2')
mem = use('crewai', 'file.mv2')
mem = use('autogen', 'file.mv2')
mem = use('haystack', 'file.mv2')
mem = use('langgraph', 'file.mv2')
mem = use('semantic-kernel', 'file.mv2')
mem = use('openai', 'file.mv2')
mem = use('google-adk', 'file.mv2')

# Get SDK info
sdk_info = info()

# Verify file integrity
result = mem.verify(deep=True)

# Repair and optimize
result = mem.doctor(
    rebuild_time_index=True,
    rebuild_vec_index=True,
    vacuum=True
)

Context Manager Support

from memvid_sdk import create

# Automatically closes when done
with create('project.mv2') as mem:
    mem.put('Title', 'label', {}, text='Content')
    results = mem.find('query')

Data Ingestion

put() - Add Single Document

mem.put(
    'Document Title',      # title (required, positional)
    'label',               # label (required, positional)
    {},                    # metadata dict (required, can be {})

    # Content (pass exactly one of these)
    text='Document content...',
    # file='/path/to/document.pdf',   # alternative to text=

    # Optional
    uri='mv2://docs/intro',
    tags=['api', 'v2'],
    labels=['public', 'reviewed'],
    kind='markdown',
    track='documentation',

    # Embeddings
    enable_embedding=True,
    embedding_model='bge-small',   # or 'openai', 'nomic', etc.
    vector_compression=True,       # 16x compression with PQ

    # Behavior
    auto_tag=True,                 # Auto-generate tags
    extract_dates=True             # Extract date mentions
)
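
To put the "16x compression with PQ" comment in concrete terms, here is a back-of-the-envelope calculation. It assumes uncompressed vectors are stored as 32-bit floats, which this page does not state; the 384-dimension figure is the one listed for the small BGE model under Local Embeddings. `vector_bytes` is a hypothetical helper, not an SDK function.

```python
# Assumption: uncompressed embeddings are float32 (4 bytes per dimension).
BYTES_PER_FLOAT32 = 4


def vector_bytes(dims: int, compression_ratio: int = 1) -> float:
    """Approximate storage per vector, in bytes, at a given compression ratio."""
    return dims * BYTES_PER_FLOAT32 / compression_ratio


raw = vector_bytes(384)             # 384-dimensional vector, uncompressed
compressed = vector_bytes(384, 16)  # same vector at the stated 16x ratio

print(raw, compressed)  # 1536.0 96.0
```

Under these assumptions, a million 384-dimensional vectors would shrink from roughly 1.5 GB to under 100 MB.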

put_many() - Batch Ingestion

docs = [
    {'title': 'Doc 1', 'label': 'kb', 'text': 'First document'},
    {'title': 'Doc 2', 'label': 'kb', 'text': 'Second document'},
    {'title': 'Doc 3', 'label': 'kb', 'text': 'Third document'}
]

frame_ids = mem.put_many(
    docs,
    embedder=openai_embeddings,  # Custom embedder instance (see Embedding Providers)
    opts={
        'compression_level': 3,
        'enable_embedding': True,
        'embedding_model': 'bge-small'
    }
)

put_file() - Document Parsing

Ingest documents directly from files. Supports PDF, DOCX, XLSX, PPTX, and more. The SDK automatically extracts text content and creates searchable frames.
# Single file ingestion
frames = mem.put_file('/path/to/report.pdf')
print(f'Ingested {len(frames)} frames from PDF')

# With options
frames = mem.put_file(
    '/path/to/presentation.pptx',
    chunk_size=1000,           # Characters per chunk
    chunk_overlap=200,         # Overlap between chunks
    enable_embedding=True,     # Generate embeddings
    embedding_model='bge-small'
)

# Excel/XLSX files
frames = mem.put_file('/path/to/data.xlsx')
# Each sheet becomes searchable content

# Word documents
frames = mem.put_file('/path/to/document.docx')
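
The `chunk_size` and `chunk_overlap` options are easiest to picture with a small sketch. The function below is not the SDK's chunker (its internals are not described on this page); `chunk_text` is a hypothetical helper showing one common way to split text into fixed-size character windows that overlap.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.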

put_files() - Batch Document Ingestion

Ingest multiple documents at once:
files = [
    '/path/to/report.pdf',
    '/path/to/slides.pptx',
    '/path/to/data.xlsx',
    '/path/to/notes.docx'
]

all_frames = mem.put_files(
    files,
    chunk_size=1000,
    enable_embedding=True
)

print(f'Total frames: {len(all_frames)}')
Supported Formats:
  • PDF - Text extraction with page-aware chunking
  • DOCX - Microsoft Word documents
  • XLSX - Excel spreadsheets (all sheets, formulas evaluated)
  • PPTX - PowerPoint presentations (slide text and notes)
Dependencies: Document parsing requires optional dependencies. Install them with:
pip install "memvid-sdk[documents]"   # quotes keep shells such as zsh from expanding the brackets
# or install individually:
pip install pypdf openpyxl python-pptx python-docx
For XLSX files with formulas, the SDK extracts the calculated values, not the formula text. This ensures searchable, meaningful content.

Search & Retrieval

# Simple search
results = mem.find('budget projections')

# With options
results = mem.find(
    'financial outlook',
    mode='auto',              # 'lex', 'sem', 'auto'
    k=10,                     # Number of results
    snippet_chars=480,        # Snippet length
    scope='track:meetings',   # Scope filter

    # Adaptive retrieval
    adaptive=True,
    min_relevancy=0.5,
    max_k=100,
    adaptive_strategy='combined',  # or 'relative', 'absolute', 'cliff', 'elbow'

    # Time-travel
    as_of_frame=100,
    as_of_ts=1704067200,

    # Custom embeddings
    embedder=custom_embedder,
    query_embedding_model='openai'
)

for hit in results['hits']:
    print(hit['title'], hit['snippet'])
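
The adaptive options are named above but not specified. As a rough illustration of the idea, the sketch below applies two cutoff rules to a list of scores: an absolute floor, and a relative one expressed as a fraction of the top score. The function name, the exact rules, and the way `min_relevancy` and `max_k` are interpreted are all assumptions made for illustration, not the SDK's algorithm.

```python
from typing import List


def adaptive_cutoff(scores: List[float], min_relevancy: float = 0.5,
                    max_k: int = 100, strategy: str = "absolute") -> List[float]:
    """Keep the top scores that pass a cutoff, up to max_k results.

    'absolute': keep scores >= min_relevancy.
    'relative': keep scores >= min_relevancy * best score.
    """
    ranked = sorted(scores, reverse=True)[:max_k]
    if not ranked:
        return []
    if strategy == "absolute":
        threshold = min_relevancy
    elif strategy == "relative":
        threshold = min_relevancy * ranked[0]
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return [s for s in ranked if s >= threshold]


scores = [0.9, 0.8, 0.48, 0.3, 0.1]
print(adaptive_cutoff(scores, strategy="absolute"))  # [0.9, 0.8]
print(adaptive_cutoff(scores, strategy="relative"))  # [0.9, 0.8, 0.48]
```

Either rule returns a variable number of results instead of a fixed `k`, which is the point of adaptive retrieval: weak matches are dropped rather than padded in.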

ask() - LLM Q&A

import os

answer = mem.ask(
    'What was decided about the budget?',
    k=8,
    mode='auto',

    # LLM settings
    model='gpt-4o-mini',
    api_key=os.environ['OPENAI_API_KEY'],
    llm_context_chars=120000,

    # Privacy
    mask_pii=True,

    # Time filters
    since=1704067200,
    until=1706745600,

    # Options
    context_only=False,    # Set True to skip synthesis

    # Adaptive retrieval
    adaptive=True,
    min_relevancy=0.5
)

print(answer['answer'])
print(answer['hits'])  # Source documents

Grounding & Hallucination Detection

The ask() response includes a grounding object that measures how well the answer is supported by context:
import os

answer = mem.ask(
    'What is the API endpoint?',
    model='gpt-4o-mini',
    api_key=os.environ['OPENAI_API_KEY']
)

# Check grounding quality
print(answer['grounding'])
# {
#   'score': 0.85,
#   'label': 'HIGH',           # 'LOW', 'MEDIUM', or 'HIGH'
#   'sentence_count': 3,
#   'grounded_sentences': 3,
#   'has_warning': False,
#   'warning_reason': None
# }

# Check if follow-up is needed
follow_up = answer.get('follow_up')
if follow_up and follow_up['needed']:
    print('Low confidence:', follow_up['reason'])
    print('Try these instead:', follow_up['suggestions'])
Grounding Fields:

| Field | Type | Description |
| --- | --- | --- |
| score | float | Grounding score from 0.0 to 1.0 |
| label | str | Quality label: LOW, MEDIUM, or HIGH |
| sentence_count | int | Sentences in the answer |
| grounded_sentences | int | Sentences supported by context |
| has_warning | bool | True if answer may be hallucinated |
| warning_reason | str? | Explanation if warning is present |

Follow-up Fields:

| Field | Type | Description |
| --- | --- | --- |
| needed | bool | True if answer confidence is low |
| reason | str | Why confidence is low |
| hint | str | Helpful hint for the user |
| available_topics | list[str] | Topics in this memory |
| suggestions | list[str] | Suggested follow-up questions |
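
One way to act on these fields is a small gate that decides whether an answer should be shown as-is or flagged for review. The sketch below works on a plain dictionary shaped like the response documented above; `needs_review` and the 0.7 score threshold are illustrative assumptions, not part of the SDK.

```python
from typing import Any, Dict, Optional


def needs_review(answer: Dict[str, Any], min_score: float = 0.7) -> Optional[str]:
    """Return a reason string if the answer should be double-checked, else None."""
    grounding = answer.get("grounding") or {}
    follow_up = answer.get("follow_up") or {}

    if grounding.get("has_warning"):
        return grounding.get("warning_reason") or "grounding warning"
    if grounding.get("label") == "LOW":
        return "grounding label is LOW"
    if grounding.get("score", 0.0) < min_score:  # threshold is an arbitrary choice
        return "grounding score below threshold"
    if follow_up.get("needed"):
        return follow_up.get("reason") or "follow-up suggested"
    return None


# Sample shaped like the grounding object shown earlier in this section.
sample = {"grounding": {"score": 0.85, "label": "HIGH",
                        "has_warning": False, "warning_reason": None}}
print(needs_review(sample))  # None
```

A caller could pass the dictionary returned by `ask()` and fall back to showing the source hits whenever a reason comes back.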

correct() - Ground Truth Corrections

Store authoritative corrections that take priority in future retrievals:
# Store a correction
frame_id = mem.correct('Ben Koenig reported to Chloe Nguyen before 2025')

# With options
frame_id = mem.correct(
    'The API rate limit is 1000 req/min',
    source='Engineering Team - Jan 2025',
    topics=['API', 'rate limiting'],
    boost=2.5  # Higher retrieval priority (default: 2.0)
)

# Batch corrections
frame_ids = mem.correct_many([
    {'statement': 'OAuth tokens expire after 24 hours', 'topics': ['auth', 'OAuth']},
    {'statement': 'Production DB is db.prod.example.com', 'source': 'Ops Team'}
])

# Verify correction is retrievable
results = mem.find('Ben Koenig reported to')
print(results['hits'][0]['snippet'])  # Should show the correction
Use correct() to fix hallucinations or add verified facts. Corrections receive boosted retrieval scores and are labeled [Correction] in results.
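
To see why a boost changes what comes back first, consider the toy re-ranking below. How the SDK actually combines `boost` with a hit's score is not described here; multiplying the two is an assumption, and `rerank_with_boost` and the sample hits are hypothetical.

```python
from typing import Dict, List


def rerank_with_boost(hits: List[Dict]) -> List[Dict]:
    """Sort hits by score * boost, treating a missing boost as 1.0."""
    return sorted(hits, key=lambda h: h["score"] * h.get("boost", 1.0), reverse=True)


hits = [
    {"title": "Old wiki page", "score": 0.8},
    {"title": "[Correction] rate limit", "score": 0.5, "boost": 2.0},
]
print([h["title"] for h in rerank_with_boost(hits)])
# ['[Correction] rate limit', 'Old wiki page']
```

With the default boost of 2.0, the correction outranks a page that matched the query more closely on raw score alone.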

Memory Cards (Entity Extraction)

Automatic Enrichment

# Extract facts using rules engine (fast, offline)
result = mem.enrich(engine='rules')

# Extract with LLM (more accurate)
result = mem.enrich(engine='openai')

# View extracted cards
memories = mem.memories()
print(memories['cards'])

# Filter by entity
alice_cards = mem.memories(entity='Alice')

# Get entity state (O(1) lookup)
alice = mem.state('Alice')
print(alice['slots'])
# {'employer': 'Anthropic', 'role': 'Engineer', 'location': 'SF'}

# Get stats
stats = mem.memories_stats()
print(stats['entityCount'], stats['cardCount'])

# List all entities
entities = mem.memory_entities()

Manual Memory Cards

# Add subject-predicate-object (SPO) triplets directly
result = mem.add_memory_cards([
    {'entity': 'Alice', 'slot': 'employer', 'value': 'Anthropic'},
    {'entity': 'Alice', 'slot': 'role', 'value': 'Senior Engineer'},
    {'entity': 'Bob', 'slot': 'team', 'value': 'Infrastructure'}
])

print(result['added'], result['ids'])
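
The relationship between cards and the per-entity `slots` returned by `state()` can be pictured as a fold over triplets. The sketch below is a plain-Python illustration under the assumption that a later card replaces an earlier one for the same entity and slot; `build_state` is a hypothetical helper and the SDK's own conflict handling may differ.

```python
from collections import defaultdict
from typing import Dict, List


def build_state(cards: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Fold (entity, slot, value) cards into {entity: {slot: value}}.

    Later cards overwrite earlier ones for the same entity and slot.
    """
    state: Dict[str, Dict[str, str]] = defaultdict(dict)
    for card in cards:
        state[card["entity"]][card["slot"]] = card["value"]
    return dict(state)


cards = [
    {"entity": "Alice", "slot": "employer", "value": "Anthropic"},
    {"entity": "Alice", "slot": "role", "value": "Engineer"},
    {"entity": "Alice", "slot": "role", "value": "Senior Engineer"},
    {"entity": "Bob", "slot": "team", "value": "Infrastructure"},
]
state = build_state(cards)
print(state["Alice"])  # {'employer': 'Anthropic', 'role': 'Senior Engineer'}
```

Keying the result by entity is what makes a single-entity lookup a constant-time dictionary access.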

Export Facts

# Export to JSON
json_data = mem.export_facts(format='json')

# Export to CSV
csv_data = mem.export_facts(format='csv', entity='Alice')

# Export to N-Triples (RDF)
ntriples = mem.export_facts(format='ntriples')
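
For readers unfamiliar with N-Triples, each fact becomes one line of the form `<subject> <predicate> "object" .`. The sketch below serializes a single card that way. The `http://example.org/` namespace and the `card_to_ntriple` helper are hypothetical, and the IRIs that `export_facts()` actually emits may look different.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not one the SDK defines


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples string literals require escaping."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def card_to_ntriple(entity: str, slot: str, value: str) -> str:
    """Render one (entity, slot, value) card as a single N-Triples line."""
    subject = f"<{BASE}entity/{quote(entity, safe='')}>"
    predicate = f"<{BASE}slot/{quote(slot, safe='')}>"
    return f'{subject} {predicate} "{escape_literal(value)}" .'


print(card_to_ntriple("Alice", "employer", "Anthropic"))
# <http://example.org/entity/Alice> <http://example.org/slot/employer> "Anthropic" .
```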

Table Extraction

# Extract tables from PDF
result = mem.put_pdf_tables('financial-report.pdf', embed_rows=True)
print(f"Extracted {result['tables_count']} tables")

# List all tables
tables = mem.list_tables()
for table in tables:
    print(table['table_id'], table['n_rows'], table['n_cols'])

# Get table data
data = mem.get_table('tbl_001', format='dict')
csv_data = mem.get_table('tbl_001', format='csv')

Time-Travel & Sessions

Timeline Queries

timeline = mem.timeline(
    limit=50,
    since=1704067200,
    until=1706745600,
    reverse=True,
    as_of_frame=100
)
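
The `since`, `until`, and `as_of_ts` values used on this page look like Unix timestamps in seconds; that reading is an assumption, since the units are not stated. Under it, the two values above correspond to midnight UTC on 1 January and 1 February 2024, which the sketch below reproduces with the standard library. `to_unix` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_unix(year: int, month: int, day: int) -> int:
    """Unix timestamp (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


since = to_unix(2024, 1, 1)
until = to_unix(2024, 2, 1)
print(since, until)  # 1704067200 1706745600
```

Passing `tzinfo=timezone.utc` matters: a naive `datetime` would be interpreted in the machine's local time zone and shift the window by several hours.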

Session Recording

# Start recording
session_id = mem.session_start('qa-test')

# Perform operations
mem.find('test query')
mem.ask('What happened?')

# Add checkpoint
mem.session_checkpoint()

# End session
summary = mem.session_end()

# List sessions
sessions = mem.session_list()

# Replay session with different params
replay = mem.session_replay(session_id, top_k=10, adaptive=True)
print(replay['match_rate'])

# Delete session
mem.session_delete(session_id)

Encryption & Security

from memvid_sdk import lock, unlock, lock_who, lock_nudge

# Encrypt to .mv2e capsule
encrypted_path = lock(
    'project.mv2',
    password='secret',
    force=True
)

# Decrypt back to .mv2
decrypted_path = unlock(
    'project.mv2e',
    password='secret'
)

# Check who has the lock
lock_info = lock_who('project.mv2')

# Nudge stale lock
released = lock_nudge('project.mv2')

Tickets & Capacity

# Get current capacity
capacity = mem.get_capacity()

# Get current ticket info
ticket = mem.current_ticket()

# Sync tickets from dashboard (api_key holds your dashboard API key)
result = mem.sync_tickets('mem_abc123', api_key)

# Apply ticket manually (ticket_string holds the ticket to apply)
mem.apply_ticket(ticket_string)

# Get memory binding
binding = mem.get_memory_binding()

# Unbind from dashboard
mem.unbind_memory()

Cloud Project & Memory Management

Programmatically create projects and memories on the Memvid dashboard, then bind local .mv2 files to them.

Configure SDK

from memvid_sdk import configure

configure({
    "api_key": "mv2_your_api_key_here",
    "dashboard_url": "https://memvid.com"
})

Create and List Projects

from memvid_sdk import create_project, list_projects

# Create a new project
project = create_project(
    "My AI Project",
    description="Knowledge base for my AI agent"
)
print(f"Project ID: {project['id']}")
print(f"Slug: {project['slug']}")

# List all projects
projects = list_projects()
for proj in projects:
    print(f"{proj['name']} ({proj['id']})")
Project Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| id | str | Unique project ID |
| organisation_id | str | Organisation ID |
| slug | str | URL-friendly slug |
| name | str | Project name |
| description | str? | Optional description |
| created_at | str | ISO 8601 timestamp |
| updated_at | str | ISO 8601 timestamp |

Create and List Memories

from memvid_sdk import create_memory, list_memories

# Create a memory in a project
memory = create_memory(
    "Agent Memory",
    description="Long-term memory for chatbot",
    project_id=project["id"]
)
print(f"Memory ID: {memory['id']}")
print(f"Display Name: {memory['display_name']}")

# List all memories
all_memories = list_memories()

# List memories in a specific project
project_memories = list_memories(project_id=project["id"])

Bind Local File to Cloud Memory

from memvid_sdk import create

# Create local .mv2 file bound to cloud memory
mv = create("./agent.mv2", memory_id=memory["id"])
mv.enable_lex()  # Enable lexical search

# Add content
mv.put(title="Meeting Notes", label="notes", metadata={}, text="Today we discussed...")

# Search
results = mv.find("discussed", k=5)

# Seal to commit changes
mv.seal()

Complete Example

import os
from memvid_sdk import configure, create_project, create_memory, create

# 1. Configure SDK
configure({"api_key": os.environ["MEMVID_API_KEY"]})

# 2. Create project
project = create_project("Knowledge Base", description="Company docs")

# 3. Create cloud memory in project
memory = create_memory("Docs Memory", project_id=project["id"])

# 4. Create local file bound to cloud memory
mv = create("./docs.mv2", memory_id=memory["id"])
mv.enable_lex()

# 5. Add content
mv.put(title="API Guide", label="docs", metadata={}, text="API documentation...")
mv.put(title="FAQ", label="docs", metadata={}, text="Frequently asked questions...")

# 6. Search
results = mv.find("API", k=5)
print(f"Found {len(results['hits'])} results")

# 7. Seal to commit changes
mv.seal()

Embedding Providers

External Providers

import os

from memvid_sdk.embeddings import (
    OpenAIEmbeddings,
    GeminiEmbeddings,
    MistralEmbeddings,
    CohereEmbeddings,
    VoyageEmbeddings,
    NvidiaEmbeddings,
    HuggingFaceEmbeddings,
    get_embedder
)

# OpenAI
openai = OpenAIEmbeddings(
    api_key=os.environ['OPENAI_API_KEY'],
    model='text-embedding-3-small'  # or 'text-embedding-3-large'
)

# Gemini
gemini = GeminiEmbeddings(
    api_key=os.environ['GEMINI_API_KEY'],
    model='text-embedding-004'
)

# Mistral
mistral = MistralEmbeddings(
    api_key=os.environ['MISTRAL_API_KEY']
)

# Cohere
cohere = CohereEmbeddings(
    api_key=os.environ['COHERE_API_KEY'],
    model='embed-english-v3.0'
)

# Voyage
voyage = VoyageEmbeddings(
    api_key=os.environ['VOYAGE_API_KEY'],
    model='voyage-3'
)

# NVIDIA
nvidia = NvidiaEmbeddings(
    api_key=os.environ['NVIDIA_API_KEY']
)

# HuggingFace (local)
hf = HuggingFaceEmbeddings(model='all-MiniLM-L6-v2')

# Factory function
embedder = get_embedder('openai', api_key='...')

# Use with put_many
mem.put_many(docs, embedder=openai)

# Use with find
mem.find('query', embedder=gemini)

Local Embeddings (No API Required)

from memvid_sdk.embeddings import LOCAL_EMBEDDING_MODELS

mem.put(
    'Title', 'label', {},
    text='content',
    enable_embedding=True,
    embedding_model=LOCAL_EMBEDDING_MODELS['BGE_SMALL']  # 384d, fast
)

# Available local models
LOCAL_EMBEDDING_MODELS['BGE_SMALL']   # 384d - fastest
LOCAL_EMBEDDING_MODELS['BGE_BASE']    # 768d - balanced
LOCAL_EMBEDDING_MODELS['NOMIC']       # 768d - general purpose
LOCAL_EMBEDDING_MODELS['GTE_LARGE']   # 1024d - highest quality
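
Because these models produce vectors of different lengths, switching models on an existing file is a likely way to run into a dimension mismatch (the Error Handling section lists `VecDimensionMismatchError`). The sketch below is a stand-alone pre-flight check built from the dimensions in the comments above; `MODEL_DIMS` and `compatible` are hypothetical names, not SDK exports.

```python
# Dimensions as listed in the comments above.
MODEL_DIMS = {"BGE_SMALL": 384, "BGE_BASE": 768, "NOMIC": 768, "GTE_LARGE": 1024}


def compatible(index_model: str, query_model: str) -> bool:
    """True if both models produce vectors of the same length."""
    return MODEL_DIMS[index_model] == MODEL_DIMS[query_model]


print(compatible("BGE_BASE", "NOMIC"))       # True  (both 768d)
print(compatible("BGE_SMALL", "GTE_LARGE"))  # False (384d vs 1024d)
```

Matching dimensions is necessary but not sufficient: vectors from different models live in different spaces, so in practice the same model should be used for both indexing and querying.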

Error Handling

from memvid_sdk import (
    MemvidError,
    CapacityExceededError,      # MV001
    TicketInvalidError,         # MV002
    TicketReplayError,          # MV003
    LexIndexDisabledError,      # MV004
    TimeIndexMissingError,      # MV005
    VerifyFailedError,          # MV006
    LockedError,                # MV007
    ApiKeyRequiredError,        # MV008
    MemoryAlreadyBoundError,    # MV009
    FrameNotFoundError,         # MV010
    VecIndexDisabledError,      # MV011
    CorruptFileError,           # MV012
    VecDimensionMismatchError,  # MV014
    EmbeddingFailedError,       # MV015
    EncryptionError,            # MV016
    NerModelNotAvailableError,  # MV017
    ClipIndexDisabledError      # MV018
)

try:
    mem.put('Large file', 'data', {}, file='huge.bin')
except CapacityExceededError:
    print('Storage capacity exceeded (MV001)')
except LockedError:
    print('File locked by another process (MV007)')
except VecIndexDisabledError:
    print('Enable vector index first (MV011)')
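
A lock held by another process is often transient, so one possible pattern is to retry with a short backoff before giving up. The sketch below is self-contained: it defines a stand-in `LockedError` so it runs without the SDK, and `with_retry` is a hypothetical helper. Whether retrying suits a given workload, and how long locks are typically held, are assumptions to validate.

```python
import time


class LockedError(Exception):
    """Stand-in for memvid_sdk.LockedError so this sketch runs without the SDK."""


def with_retry(operation, retries=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on LockedError, wait and retry with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except LockedError:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Demo: an operation that fails twice, then succeeds.
calls = {"n": 0}


def flaky_put():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LockedError("file locked")
    return "ok"


print(with_retry(flaky_put, sleep=lambda _: None))  # ok
```

In real use the stand-in class would be replaced by the imported `LockedError`, and `operation` would wrap the `mem.put(...)` call, for example with a `lambda`.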

Asset Extraction

# Get frame by URI
frame = mem.frame('mv2://docs/intro')

# Get raw binary data
data = mem.blob('mv2://docs/intro')

Utility Functions

from memvid_sdk import info, flush_analytics, is_telemetry_enabled, verify_single_file

# Get SDK info
sdk_info = info()
print(sdk_info['version'], sdk_info['platform'])

# Flush analytics
flush_analytics()

# Check telemetry status
enabled = is_telemetry_enabled()

# Verify no auxiliary files
verify_single_file('project.mv2')

Environment Variables

| Variable | Description |
| --- | --- |
| MEMVID_API_KEY | Dashboard API key for sync |
| OPENAI_API_KEY | OpenAI embeddings and LLM |
| GEMINI_API_KEY | Gemini embeddings |
| MISTRAL_API_KEY | Mistral embeddings |
| COHERE_API_KEY | Cohere embeddings |
| VOYAGE_API_KEY | Voyage embeddings |
| NVIDIA_API_KEY | NVIDIA embeddings |
| ANTHROPIC_API_KEY | Claude for entities |
| MEMVID_MODELS_DIR | Model cache directory |
| MEMVID_OFFLINE | Use cached models only |
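
A startup check can report missing keys up front instead of failing on the first API call. The sketch below uses only the standard library; `missing_vars` is a hypothetical helper, and which variables a given application needs depends on the providers it uses.

```python
import os
from typing import Iterable, List, Mapping


def missing_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


# Demo with an explicit mapping so the result does not depend on the real environment.
demo_env = {"OPENAI_API_KEY": "sk-test"}
print(missing_vars(["MEMVID_API_KEY", "OPENAI_API_KEY"], env=demo_env))
# ['MEMVID_API_KEY']
```

Calling `missing_vars([...])` without the `env` argument checks the real process environment.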
