📄 Paper | 📚 Documentation | Discord | WeChat Group | 🇨🇳 中文
VCCL is a collective communication library for GPUs. It provides communication primitives such as all-reduce, all-gather, reduce, broadcast, reduce-scatter, and general send/recv. It is compatible with PCIe, NVLink, and NVSwitch, and supports cross-node communication via InfiniBand Verbs or TCP/IP sockets. It can be used in single-node/multi-node, multi-process (e.g., MPI), or single-process applications.
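Since VCCL is developed on top of NCCL 2.26.6, the standard NCCL API should apply unchanged; the sketch below shows a minimal single-process all-reduce across all visible GPUs (standard NCCL 2.x calls; error checking elided).

```c
// Minimal single-process all-reduce sketch using the NCCL-compatible API
// that VCCL inherits (standard NCCL 2.x calls; error checking elided).
#include <nccl.h>
#include <cuda_runtime.h>
#include <stdio.h>

#define MAX_DEV 8

int main(void) {
  int nDev = 0;
  cudaGetDeviceCount(&nDev);
  if (nDev > MAX_DEV) nDev = MAX_DEV;

  ncclComm_t comms[MAX_DEV];
  cudaStream_t streams[MAX_DEV];
  float *sendbuf[MAX_DEV], *recvbuf[MAX_DEV];
  const size_t count = 1 << 20;  // elements per GPU

  // One device buffer pair and one stream per GPU.
  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(i);
    cudaMalloc((void **)&sendbuf[i], count * sizeof(float));
    cudaMalloc((void **)&recvbuf[i], count * sizeof(float));
    cudaStreamCreate(&streams[i]);
  }

  // Create one communicator per visible GPU in a single call.
  ncclCommInitAll(comms, nDev, NULL);

  // Group the per-GPU calls so they execute as one collective.
  ncclGroupStart();
  for (int i = 0; i < nDev; ++i)
    ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  ncclGroupEnd();

  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(i);
    cudaStreamSynchronize(streams[i]);
  }
  for (int i = 0; i < nDev; ++i) ncclCommDestroy(comms[i]);
  printf("all-reduce done on %d GPU(s)\n", nDev);
  return 0;
}
```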
VCCL redefines the GPU communication experience with three core capabilities: High Efficiency, High Availability, and High Visibility.
**High Efficiency**
Inspired by the DPDK design philosophy, VCCL introduces a “DPDK-Like P2P” high-performance scheduling mechanism, ensuring that GPUs remain fully utilized.
In the early days of high-speed networking on CPUs, achieving 10Gbps network performance was nearly impossible due to kernel stack overhead (multiple memory copies, interrupt handling inefficiencies). DPDK solved this by leveraging hugepage memory + zero-copy and moving the data path from kernel space to user space.
Similarly, current CUDA still faces limitations in communication/computation scheduling and API granularity (public sources note that ~20 of the 132 SMs on the H800 GPU are reserved for communication). VCCL adopts an analogous optimization strategy: offloading communication tasks from the GPU CUDA stack to the CPU side, combined with zero-copy and global load balancing across pipeline-parallel (PP) workflows.
In training dense models with hundreds of billions of parameters, our internal benchmarks show that cluster-wide training compute efficiency improves by ~2% over state-of-the-art baselines (more about using zero-copy for training). Note: the SM-Free mode currently does not support fault tolerance or telemetry; this is planned as future work.
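At the API level the pattern is unchanged: communication is still enqueued on CUDA streams, so a pipeline stage can overlap its neighbor exchange with compute issued on another stream. Below is a hedged sketch of that pattern; `pipeline_exchange`, `peer_prev`, and `peer_next` are illustrative names, not VCCL APIs.

```c
// Illustrative pipeline-parallel neighbor exchange. ncclSend/ncclRecv are
// standard NCCL point-to-point calls; the surrounding names are placeholders.
#include <nccl.h>
#include <cuda_runtime.h>

void pipeline_exchange(ncclComm_t comm, cudaStream_t comm_stream,
                       const float *send_acts, float *recv_acts,
                       size_t count, int peer_prev, int peer_next) {
  // Grouping the send and recv lets both progress without deadlock.
  ncclGroupStart();
  ncclSend(send_acts, count, ncclFloat, peer_next, comm, comm_stream);
  ncclRecv(recv_acts, count, ncclFloat, peer_prev, comm, comm_stream);
  ncclGroupEnd();
  // Kernels enqueued on a different stream overlap with this transfer;
  // with CPU-side scheduling the transfer itself should not occupy SMs.
}
```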
**High Availability**
Provides a lightweight local-recovery fault-tolerance mechanism that handles NIC failures and switch faults without significantly increasing system overhead. Concretely, when a link failure occurs, VCCL migrates the traffic within one iteration by creating a backup QP. Once link integrity is re-established, VCCL seamlessly restores traffic to the primary QP. In practice, this reduces overall training interruption rates by over 50% (more about fault tolerance).
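The mechanism can be pictured with a rough sketch; the `vccl_conn` structure and callbacks below are hypothetical illustrations of the idea, not VCCL's internal API. Each connection pre-creates a backup QP, a failed work completion triggers migration to it, and traffic falls back once the primary link recovers.

```c
// Hypothetical failover sketch (not VCCL's actual internals): a connection
// keeps a pre-created backup queue pair and repoints traffic on a CQ error.
#include <infiniband/verbs.h>
#include <stdbool.h>

struct vccl_conn {                 /* hypothetical connection state */
  struct ibv_qp *primary_qp;
  struct ibv_qp *backup_qp;        /* created up front on another path */
  struct ibv_qp *active_qp;
  bool primary_healthy;
};

/* Invoked when polling the CQ returns a failed work completion. */
static void on_completion_error(struct vccl_conn *c, struct ibv_wc *wc) {
  if (wc->status != IBV_WC_SUCCESS && c->active_qp == c->primary_qp) {
    c->primary_healthy = false;
    c->active_qp = c->backup_qp;   /* migrate traffic within one iteration */
    /* ...re-post the in-flight work requests on the backup QP... */
  }
}

/* Invoked when the primary link is detected healthy again. */
static void on_link_restored(struct vccl_conn *c) {
  c->primary_healthy = true;
  c->active_qp = c->primary_qp;    /* seamlessly fall back to the primary */
}
```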
**High Visibility**
Offers microsecond-level sliding-window flow telemetry, enabling efficient bottleneck localization and congestion detection for performance tuning (more about flow telemetry).
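As a rough illustration of what sliding-window flow telemetry means, the sketch below attributes bytes to fixed microsecond windows in a ring of counters; it is a concept sketch only, not VCCL's telemetry implementation.

```c
// Illustrative microsecond-granularity sliding-window byte counter.
// A concept sketch only; not VCCL's actual telemetry implementation.
#include <stdint.h>
#include <time.h>

#define WINDOWS   1024   /* ring of per-window counters */
#define WINDOW_US 100    /* 100 microseconds per window */

struct flow_stats {
  uint64_t bytes[WINDOWS];
  uint64_t base_us;      /* timestamp corresponding to window 0 */
};

static uint64_t now_us(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000000ull + (uint64_t)ts.tv_nsec / 1000;
}

/* Attribute n bytes of traffic to the current time window. */
static void record_bytes(struct flow_stats *s, uint64_t n) {
  uint64_t w = (now_us() - s->base_us) / WINDOW_US;
  s->bytes[w % WINDOWS] += n;  /* a real impl would reset stale slots */
}
```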
For more information about VCCL and how to use it, please refer to the VCCL documentation.
Note: Currently, only source builds are supported.
```shell
$ git clone https://github.com/sii-research/VCCL.git
$ cd VCCL
$ make -j src.build
```

If CUDA is not installed under /usr/local/cuda:

```shell
$ make src.build CUDA_HOME=<path to cuda install>
```

Build artifacts are placed in the build/ directory (can be customized via BUILDDIR).
By default, VCCL compiles for all supported architectures. To speed up builds and reduce binary size, redefine NVCC_GENCODE (in makefiles/common.mk) to include only your target architecture(s):
```shell
# Example: build only for Hopper architecture (H100/H200, sm_90)
$ make -j80 src.build NVCC_GENCODE="-gencode=arch=compute_90,code=sm_90"
```

To install VCCL on your system, build a package and install it as root:
```shell
# Debian/Ubuntu
sudo apt install -y build-essential devscripts debhelper fakeroot
make pkg.debian.build
ls build/pkg/deb/

# RedHat/CentOS
sudo yum install -y rpm-build rpmdevtools
make pkg.redhat.build
ls build/pkg/rpm/

# OS-agnostic tarball
make pkg.txz.build
ls build/pkg/txz/
```

Tests for VCCL are maintained separately at NVIDIA NCCL Tests.
```shell
$ git clone https://github.com/NVIDIA/nccl-tests.git
$ cd nccl-tests
$ make
$ ./build/all_reduce_perf -b 8 -e 256M -f 2 -g <ngpus>
```
- This project is developed based on nccl_2.26.6-1 and retains upstream copyright and license information in the relevant files.
- See the LICENSE file for detailed terms.
- Thanks to the open-source community (including but not limited to NCCL and nccl-tests) for their outstanding work.