Stars
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Streamline on-policy/off-policy distillation workflows in a few lines of code
My learning notes for machine learning systems (MLSys).
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
A simple plug-in framework that corrects bias and computes confidence intervals when reporting LLM-as-a-judge evaluations, and an adaptive algorithm that efficiently allocates calibration samples to r…
A calm, CLI-native way to semantically grep everything: code, images, PDFs, and more.
A non-saturating, open-ended environment for evaluating LLMs in Factorio
[Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
A challenging aggregation benchmark for long-context models
Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature convergence and unlock greater RL potential.
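For context on the limitation mentioned here, a minimal statement of the textbook PPO clipped surrogate objective that ASPO is described as improving on (standard notation, not taken from the Archer2.0 repository):

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
                  \operatorname{clip}\!\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Clipping the probability ratio bounds each policy update, which stabilizes training but can also suppress useful gradient signal; that trade-off is the kind of PPO-Clip limitation the repository says ASPO addresses.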
Harbor is a framework for running agent evaluations and creating and using RL environments.
Easy, safe evaluation of arbitrary Python code
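As a rough illustration of what "safe evaluation" involves (a minimal sketch using only the Python standard library; this is not the repository's actual API), untrusted code can be run in a separate interpreter process with a hard timeout:

```python
import subprocess
import sys

def run_untrusted(code: str, timeout_s: float = 5.0) -> str:
    """Execute a Python snippet in a fresh subprocess with a wall-clock timeout.

    This provides process isolation and a timeout only; real sandboxes add
    filesystem, network, and memory restrictions on top of this boundary.
    """
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and site dirs
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout

# Example: evaluate a snippet and capture its stdout.
print(run_untrusted("print(sum(range(10)))"))  # -> 45
```

Production sandboxes layer stronger isolation (containers, seccomp filters, or microVMs) on top of this basic process boundary.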
Daytona is a secure and elastic infrastructure for running AI-generated code
Ultrafast serverless GPU inference, sandboxes, and background jobs
A fully customizable and self-hosted sandboxing solution for AI agent code execution and computer use. It features out-of-the-box support for backtracking, a simple REST API and Python SDK, automat…
Python & JS/TS SDK for running AI-generated code/code interpreting in your AI app
This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning"
Content of the On-Line Encyclopedia of Integer Sequences (OEIS)
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
A framework for the evaluation of autoregressive code generation language models.
The 100-line AI agent that solves GitHub issues or helps you on your command line. Radically simple: no huge configs, no giant monorepo, yet it scores >74% on SWE-bench Verified!
Official code of "StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs".
Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping" by Zhiheng Xi et al.