Research · January 2025

Memvid achieves 85.7% on the LoCoMo benchmark

A 35% relative improvement over leading memory systems on long-term conversational understanding.

Overall Results
Categories 1-4 Accuracy

Memvid:        85.65%
Full-context:  72.90%
Mem0ᵍ:         68.44%
Mem0:          66.88%
Zep:           65.99%
LangMem:       58.10%
OpenAI:        52.90%

Benchmark: LoCoMo · 10 conversations · ~26K tokens each · LLM-as-Judge evaluation

Source: Baseline results from arXiv:2504.19413. Some vendors dispute these figures; see paper for methodology.

By Category

Category             Memvid   Mem0     Mem0ᵍ    Zep      OpenAI   Δ vs Avg
Single-hop           80.1%    67.1%    65.7%    61.7%    63.8%    +24%
Multi-hop            80.4%    51.1%    47.2%    41.4%    42.9%    +76%
Temporal             71.9%    55.5%    58.1%    49.3%    21.7%    +56%
World-knowledge      91.1%    72.9%    75.7%    76.6%    62.3%    +27%
Adversarial          77.8%
Overall (Cat. 1-4)   85.65%   66.88%   68.44%   65.99%   52.90%   +35%

Following standard methodology, the adversarial category is excluded from the primary metric.

Baseline figures sourced from arXiv:2504.19413. Results for some systems are disputed by their vendors.
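The Δ vs Avg column is consistent with Memvid's score relative to the mean of the four baseline columns (Mem0, Mem0ᵍ, Zep, OpenAI). A minimal sketch of that calculation, assuming this is indeed how the column is derived:

```python
def delta_vs_avg(memvid_score, baseline_scores):
    """Relative improvement (%) of memvid_score over the mean of the baselines."""
    avg = sum(baseline_scores) / len(baseline_scores)
    return round(100 * (memvid_score / avg - 1))

# Overall (Cat. 1-4): baselines are Mem0, Mem0ᵍ, Zep, OpenAI
print(delta_vs_avg(85.65, [66.88, 68.44, 65.99, 52.90]))  # → 35

# Multi-hop row
print(delta_vs_avg(80.4, [51.1, 47.2, 41.4, 42.9]))       # → 76
```

The same formula reproduces every Δ value in the table above.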

Configuration

Embedding:     text-embedding-3-large
Answer Model:  GPT-4o
Judge Model:   GPT-4o-mini
Search:        Hybrid (BM25 + Semantic)
Top-K:         60
Questions:     1,986
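The configuration lists hybrid search combining BM25 with semantic retrieval, but does not say how the two result lists are merged. One common fusion method is reciprocal rank fusion (RRF); the sketch below uses hypothetical document IDs, and the `k=60` constant here is RRF's smoothing parameter, not the Top-K retrieval setting above:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top results from each retriever
bm25_top = ["d3", "d1", "d7"]      # lexical (BM25) ranking
semantic_top = ["d1", "d9", "d3"]  # embedding-similarity ranking

# d1 and d3 rise to the top because both retrievers rank them highly
print(rrf_fuse([bm25_top, semantic_top]))
```

RRF is attractive for hybrid search because it needs only ranks, not comparable scores, so BM25 and cosine-similarity results fuse without normalization.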

Reproduce the Results

Open source benchmark suite

Our benchmark implementation is fully open source. Run the complete evaluation suite yourself and verify the results.

memvid/memvidbench
bun run bench:full