🛠️
Building

Sponsoring

@tjbck

Organizations

@deepdrive

Temporal service

Go 16,293 1,153 Updated Oct 24, 2025

Provider-agnostic, open-source evaluation infrastructure for language models

Python 626 73 Updated Oct 24, 2025

The glamourous AI coding agent for your favourite terminal 💘

Go 14,186 763 Updated Oct 24, 2025

Trae Agent is an LLM-based agent for general purpose software engineering tasks.

Python 9,750 1,008 Updated Sep 24, 2025

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Python 1,337 116 Updated Oct 9, 2025

"Context engineering is the delicate art and science of filling the context window with just the right information for the next step." — Andrej Karpathy. A frontier, first-principles handbook inspi…

Python 7,334 811 Updated Sep 30, 2025

Generate a timeline of your day, automatically

Swift 3,661 152 Updated Oct 22, 2025

User-friendly WebUI for LLMs (Formerly Ollama WebUI)

Svelte 4 Updated Feb 17, 2025

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

C++ 358 23 Updated Apr 11, 2025

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

121 6 Updated Mar 5, 2025

The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Python 1,443 192 Updated Sep 3, 2024

📄🧠 PageIndex: Document Index for Reasoning-based RAG

Python 2,881 214 Updated Oct 14, 2025

An associative memory system that stores and retrieves experiences using the 5W1H framework (Who, What, When, Where, Why, How) and content-addressable memory.

Python 172 28 Updated Sep 15, 2025
TypeScript 154 33 Updated Jul 9, 2025

Evals for Context Memory

Python 1 Updated Aug 22, 2025

Frontier Models playing the board game Diplomacy.

Python 597 86 Updated Sep 5, 2025

Native mobile client for OpenWebUI. Chat with your self‑hosted AI.

Dart 574 45 Updated Oct 23, 2025

Run claude code in somewhat safe and isolated yolo mode

Shell 37 7 Updated Aug 14, 2025

A CLI tool for analyzing Claude Code/Codex CLI usage from local JSONL files.

TypeScript 8,658 268 Updated Oct 21, 2025

Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

TypeScript 116 37 Updated Oct 23, 2025
Python 113 14 Updated Oct 16, 2025

Collection of evals for Inspect AI

Python 262 185 Updated Oct 23, 2025

Public repository containing METR's DVC pipeline for eval data analysis

Python 122 24 Updated Apr 6, 2025
Python 68 14 Updated Oct 22, 2025

SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents

Python 69 9 Updated Oct 13, 2025

OpenAI Frontier Evals

Python 923 105 Updated Oct 21, 2025
Python 94 12 Updated Sep 12, 2025

Code search MCP for Claude Code. Make entire codebase the context for any coding agent.

TypeScript 4,199 367 Updated Sep 16, 2025

Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.

Shell 218 261 Updated Oct 21, 2025