Benchmarks

This folder contains the evaluation suites used by EverOS to measure memory quality and agent self-evolution. Use these benchmarks to reproduce reported results, compare memory systems, or evaluate new agent learning methods.

Included Benchmarks

Benchmark	What it measures	Start here
EverMemBench	Long-term memory quality in multi-person group conversations, including factual recall, applied reasoning, and personalized generalization.	EverMemBench/
EvoAgentBench	Agent self-evolution across information retrieval, reasoning, software engineering, code implementation, and knowledge-work tasks.	EvoAgentBench/

How to Use This Folder

Start with EverMemBench/ to evaluate memory retrieval and answer quality.
Start with EvoAgentBench/ to evaluate whether agents improve from past experience.
See the Benchmarks and Evaluation sections of the top-level README for a project-level overview of these benchmarks.
