Stars
Provider-agnostic, open-source evaluation infrastructure for language models
The glamourous AI coding agent for your favourite terminal 💘
Trae Agent is an LLM-based agent for general-purpose software engineering tasks.
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
"Context engineering is the delicate art and science of filling the context window with just the right information for the next step." — Andrej Karpathy. A frontier, first-principles handbook inspi…
Generate a timeline of your day, automatically
crizCraig / open-webui
Forked from open-webui/open-webui. User-friendly WebUI for LLMs (Formerly Ollama WebUI)
The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
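A minimal usage sketch of the Levenshtein package's two core calls, `distance` and `ratio`; the exact similarity value in the comment is approximate and shown only for illustration.

```python
import Levenshtein

# Minimum number of single-character edits (insert, delete, substitute)
print(Levenshtein.distance("kitten", "sitting"))  # 3

# Normalised string similarity in [0, 1]; higher means more similar
print(Levenshtein.ratio("kitten", "sitting"))  # roughly 0.6
```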
The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
📄🧠 PageIndex: Document Index for Reasoning-based RAG
An associative memory system that stores and retrieves experiences using the 5W1H framework (Who, What, When, Where, Why, How) and content-addressable memory.
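A hypothetical sketch of the idea behind a 5W1H-keyed, content-addressable store; this is not the project's actual API, and all names here (`Experience`, `MemoryStore`, `store`, `recall`) are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Experience:
    # The 5W1H fields describing a single stored experience
    who: str
    what: str
    when: str
    where: str
    why: str
    how: str

class MemoryStore:
    """Toy content-addressable memory: any subset of 5W1H fields can serve as the query."""

    def __init__(self) -> None:
        self._memories: list[Experience] = []

    def store(self, exp: Experience) -> None:
        self._memories.append(exp)

    def recall(self, **query: str) -> list[Experience]:
        # Return every experience whose fields match all supplied query values
        return [m for m in self._memories
                if all(getattr(m, field) == value for field, value in query.items())]

memory = MemoryStore()
memory.store(Experience(who="agent", what="ran tests", when="2024-06-01",
                        where="CI", why="pre-merge check", how="pytest"))
print(memory.recall(who="agent", where="CI"))
```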
Frontier Models playing the board game Diplomacy.
Native mobile client for OpenWebUI. Chat with your self‑hosted AI.
Run Claude Code in a somewhat safe and isolated YOLO mode
A CLI tool for analyzing Claude Code/Codex CLI usage from local JSONL files.
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
Collection of evals for Inspect AI
Public repository containing METR's DVC pipeline for eval data analysis
SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents
Code search MCP for Claude Code. Make entire codebase the context for any coding agent.
Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.