Stars
Run any GUI app in the terminal❗
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
Code search MCP for Claude Code. Make entire codebase the context for any coding agent.
Golang implementation of the Raft consensus protocol
A vector search SQLite extension that runs anywhere!
Trae Agent is an LLM-based agent for general purpose software engineering tasks.
AI Agent that handles engineering tasks end-to-end: integrates with developers’ tools, plans, executes, and iterates until it achieves a successful result.
Efficient Compute-Communication Overlap for Distributed LLM Inference
A throughput-oriented high-performance serving framework for LLMs
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A lightweight design for computation-communication overlap.
Distributed Compiler based on Triton for Parallel Systems
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Fast and memory-efficient exact attention
FlashInfer: Kernel Library for LLM Serving
An open source, self-hosted implementation of the Tailscale control server
Optimized primitives for collective multi-GPU communication
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.