yuhong-zhong
C++ 56 20 Updated May 31, 2025

Query-Adaptive Vector Search

C++ 59 12 Updated Oct 27, 2025

Run any GUI app in the terminal❗

TypeScript 6,711 149 Updated Oct 26, 2025
C 5 Updated Aug 26, 2025

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 3,246 335 Updated Oct 27, 2025

Infrastructure that's powering E2B Cloud.

Go 683 180 Updated Oct 27, 2025

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,839 244 Updated Oct 22, 2025

Code search MCP for Claude Code. Make entire codebase the context for any coding agent.

TypeScript 4,254 373 Updated Sep 16, 2025

Golang implementation of the Raft consensus protocol

Go 8,806 1,040 Updated Sep 28, 2025

A vector search SQLite extension that runs anywhere!

C 6,323 239 Updated Jan 24, 2025

Trae Agent is an LLM-based agent for general purpose software engineering tasks.

Python 9,787 1,013 Updated Sep 24, 2025

AI Agent that handles engineering tasks end-to-end: integrates with developers’ tools, plans, executes, and iterates until it achieves a successful result.

Rust 3,356 288 Updated Sep 23, 2025

Efficient Compute-Communication Overlap for Distributed LLM Inference

Python 61 4 Updated Oct 1, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 909 44 Updated Oct 22, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,683 1,499 Updated Oct 22, 2025

A lightweight design for computation-communication overlap.

Cuda 182 8 Updated Oct 10, 2025

NCCL Tests

Cuda 1,306 322 Updated Oct 25, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,201 99 Updated Oct 17, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,154 82 Updated Aug 28, 2025

Fast and memory-efficient exact attention

Python 20,198 2,090 Updated Oct 28, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 3,968 542 Updated Oct 28, 2025

An open source, self-hosted implementation of the Tailscale control server

Go 32,005 1,706 Updated Oct 28, 2025

Ultra and Unified CCL

C++ 629 51 Updated Oct 28, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,186 1,049 Updated Oct 18, 2025

[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents

Python 435 69 Updated Oct 27, 2025

NVIDIA Inference Xfer Library (NIXL)

C++ 688 171 Updated Oct 28, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,160 409 Updated Oct 28, 2025
Jupyter Notebook 124 12 Updated Nov 11, 2024

🙌 OpenHands: Code Less, Make More

Python 64,497 7,831 Updated Oct 28, 2025
Python 1 1 Updated May 12, 2025