Atlas is a system for continual learning from agent workflows, designed to close the loop between real-world agent execution and model improvement. The platform is composed of two main components:
- The Atlas SDK is the runtime component that wraps existing agent systems in a dual-agent reasoning loop. It captures causality data (student attempt → teacher intervention → outcome) and streams rich telemetry to a database.
- Atlas Core (this repository) is the offline training engine that uses this data to train improved models via methods like on-policy distillation (GKD) and reinforcement learning (GRPO).
The SDK captures causality traces and feeds the reward system; Atlas Core trains new teacher checkpoints from this data.
- **Prepare the runtime export**

  ```bash
  # From the atlas-sdk repo after running adaptive episodes
  atlas init  # optional helper to launch Postgres
  arc-atlas review sessions --database-url postgresql://atlas:atlas@localhost:5433/atlas --status pending
  # Approve or quarantine as needed, then export approved sessions
  arc-atlas --database-url postgresql://atlas:atlas@localhost:5433/atlas \
    --include-status approved \
    --output traces/runtime.jsonl
  ```

  Each record carries `triage_dossier`, `adaptive_summary`, persona usage/updates, plan/step traces, and reward payloads, the exact inputs Atlas Core expects (see the illustrative record after this list).
- **Train your model**

  Use this repository's training pipeline to update your teacher model from runtime traces. Atlas Core supports multiple training methods depending on your needs; see Training Methods for detailed guidance.

  ```bash
  # Example: GRPO training
  python scripts/run_offline_pipeline.py \
    --export-path traces/runtime.jsonl \
    output_dir=results/teacher-grpo
  ```

  Override Hydra arguments (model, batch size, GPUs) as needed; the helper wires up `configs/run/teacher_rcl.yaml` by default.
- **Redeploy the checkpoint**

  Point the runtime SDK at your output directory (e.g., `results/teacher-grpo/rl_checkpoint/`) to load the new teacher, then rerun `atlas.core.run` to close the loop.
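For orientation, the sketch below shows roughly what one exported line in `traces/runtime.jsonl` might look like. It is illustrative only: the top-level fields match the list above, but the nested keys and values are assumptions; the authoritative schema is whatever the `arc-atlas` export writes.

```jsonc
// Illustrative shape only; consult the SDK export docs for the actual schema.
{
  "triage_dossier": { "summary": "task context captured before the attempt" },
  "adaptive_summary": "what the teacher adapted during this episode",
  "persona": { "usage": ["..."], "updates": [] },
  "plan": [
    {
      "step": 1,
      "student_attempt": "...",
      "teacher_intervention": "...",
      "outcome": "..."
    }
  ],
  "reward": { "score": 0.82, "rationale": "..." }
}
```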
Atlas Core provides flexible training capabilities for different scenarios:
- GRPO – Reinforcement learning from reward signals in runtime traces. Updates teacher policies by optimizing for task success and efficiency.
- GKD – Distill large models into smaller, deployment-optimized variants. 9-30× faster training than GRPO for creating compact production models.
- SFT – Supervised fine-tuning on approved traces. Direct imitation learning from high-quality runtime episodes.
Each method uses the same Postgres-backed dataset infrastructure and Hydra configuration system. All training methods support direct database access, reward filtering, and multi-turn conversation workflows.
See the Training Guide for detailed comparisons, configuration options, and when to use each method.
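As an illustration of the shared Hydra setup, overrides are appended to the same pipeline command shown in the quickstart. The key names below (`model.name`, `trainer.per_device_train_batch_size`) are hypothetical placeholders; check `configs/run/teacher_rcl.yaml` and the Training Guide for the actual config structure.

```bash
# Sketch only: the Hydra override keys below are hypothetical placeholders;
# see configs/run/teacher_rcl.yaml for the real group and key names.
python scripts/run_offline_pipeline.py \
  --export-path traces/runtime.jsonl \
  output_dir=results/teacher-grpo-7b \
  model.name=Qwen/Qwen2.5-7B-Instruct \
  trainer.per_device_train_batch_size=2
```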
Need scoring without training? Import `RIMReward` directly:

```python
from RIM.reward_adapter import RIMReward

reward_system = RIMReward(config_path="configs/rim_config.yaml")
score = reward_system.evaluate(prompt="...", response="...")
print(score.score, score.rationale)
```

- Atlas Core Docs – Offline training guides, reward system reference, architecture deep dives
- SDK Docs – Runtime orchestration, export/review CLI, online adaptation
- Evaluation Harnesses – Learning, runtime, and reward harness workflows
- Technical Report – Research, benchmarks, and methodology
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements-py312.txt
```

Need GPU-backed training? Install PyTorch matching your CUDA stack, then run `pip install -r requirements-py312.txt`. On Linux/CUDA environments the pinned `bitsandbytes` wheel will install automatically; on macOS or Windows it is skipped.
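As a sketch, a GPU setup might install the PyTorch wheel matching the local CUDA toolkit first and then the pinned requirements; the CUDA 12.1 index URL below is only an example, so substitute the one for your driver and toolkit.

```bash
# Example only: pick the PyTorch index URL that matches your CUDA version.
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements-py312.txt
```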
- Format / lint: `ruff check .`
- Tests: `pytest`
- Docs sanity: `mintlify broken-links` (requires interactive prompt today)
- Type checking: `pyright` (covers `train.py`, offline CLI helpers, and the runtime trace ingest path; see `pyrightconfig.json`)
We track major changes in CHANGELOG.md.
MIT © Arc Computer
