Benchmarks

This folder contains the evaluation suites used by EverOS to measure memory quality and agent self-evolution. Use these benchmarks to reproduce reported results, compare memory systems, or evaluate new agent learning methods.

Included Benchmarks

| Benchmark | What it measures | Start here |
| --- | --- | --- |
| EverMemBench | Long-term memory quality in multi-person group conversations, including factual recall, applied reasoning, and personalized generalization. | EverMemBench/ |
| EvoAgentBench | Agent self-evolution across information retrieval, reasoning, software engineering, code implementation, and knowledge-work tasks. | EvoAgentBench/ |

How to Use This Folder

  • Start with EverMemBench/ to evaluate memory retrieval and answer quality.
  • Start with EvoAgentBench/ to evaluate whether agents improve from past experience.
  • For the project-level benchmark overview, see the Benchmarks and Evaluation sections of the top-level README.