Stars
Example code for using DC QP to provide RDMA READ and WRITE operations to remote GPU memory
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
High performance Transformer implementation in C++.
Disaggregated serving system for Large Language Models (LLMs).
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Distributed MoE in a Single Kernel [NeurIPS '25]
Source code for the X Recommendation Algorithm
How to optimize some algorithms in CUDA.
Ongoing research training transformer models at scale
Tensors and Dynamic neural networks in Python with strong GPU acceleration
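Kuboard is a Kubernetes-based microservice management interface. It also provides free Chinese-language Kubernetes tutorials, getting-started guides, an installation manual for the latest Kubernetes v1.23.4 (k8s install), and online Q&A, and is continuously updated.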
C++ implementation of Raft core logic as a replication library
Cross-platform, customizable ML solutions for live and streaming media.
A Decentralized Asynchronous Distributed Training framework
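The officially designated worldwide GitHub of "Run Studies" (润学), collecting its aims, program, theory, and assorted examples of "running" (emigrating); it addresses three questions: why to run, where to run, and how to run; and aims to become the core religion and core belief of the new Chinese people.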
Visualizer for neural network, deep learning and machine learning models
TensorFlow GNN is a library to build Graph Neural Networks on the TensorFlow platform.
Python package built to ease deep learning on graph, on top of existing DL frameworks.
Browser-based frontend to gdb (gnu debugger). Add breakpoints, view the stack, visualize data structures, and more in C, C++, Go, Rust, and Fortran. Run gdbgui from the terminal and a new tab will …
Demo for the "Talking Head Anime from a Single Image."
QUDA is a library for performing calculations in lattice QCD on GPUs.
Optimized primitives for collective multi-GPU communication
Open files with xdg-open on Bash for Windows in Windows applications. Read only mirror from GitLab, see link 👉
Repo for counting stars and contributing. Press F to pay respect to glorious developers.