
Pre-built wheel for use in the aa-api pixi environment (PyTorch 2.8.0+cu128, Python 3.12, CUDA 12.8).

New in this release

use_default_stream_as_comm_stream: New Buffer.__init__ option to reuse the default CUDA stream for communication instead of allocating a separate one from the pool. When enabled, all stream_wait synchronization between comm and compute streams is skipped (since they're the same stream).
num_recv_tokens_per_expert_as_cuda: New kwarg on dispatch() / intranode_dispatch() / internode_dispatch(). When True, returns num_recv_tokens_per_expert as a CUDA int32 tensor (via cudaMemcpyAsync on the comm stream) instead of a Python list[int], avoiding a CPU→Python roundtrip.
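Code that consumes the per-expert counts may need to accept both return types described above. The following is a minimal sketch, not part of DeepEP: `expert_counts_on_host` and `FakeTensor` are hypothetical names, and the only assumption about the CUDA result is that it exposes `.tolist()` the way a `torch.Tensor` does. It runs without a GPU.

```python
def expert_counts_on_host(counts):
    """Return per-expert receive counts as a plain Python list of ints.

    Accepts a list/tuple of ints (the default return type) or any
    tensor-like object with .tolist() (for example a torch.Tensor).
    On a CUDA tensor, .tolist() blocks until the data is on the host,
    so call this only where a host-side value is really needed.
    """
    if isinstance(counts, (list, tuple)):
        return [int(c) for c in counts]
    return [int(c) for c in counts.tolist()]


class FakeTensor:
    """Hypothetical stand-in for a CUDA int32 tensor, for illustration only."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


# Both input shapes normalize to the same host-side list.
print(expert_counts_on_host([3, 0, 5]))
print(expert_counts_on_host(FakeTensor([3, 0, 5])))
```

Keeping the counts on the device and converting only at the point where Python control flow depends on them preserves the benefit of the new kwarg.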

Key changes from etongit/DeepEP v1.2.1

Relocatable RPATH: Uses $ORIGIN/nvidia/nvshmem/lib instead of hardcoded build path, so the wheel works in any environment with nvidia-nvshmem-cu12 installed via pip.
nvshmem dependency: nvidia-nvshmem-cu12>=3.5.19 declared in pyproject.toml so pip pulls the correct version.
NVSHMEM 3.5.19: Required for CoreWeave IB device naming (ibpX instead of mlx5_X). Set NVSHMEM_HCA_PREFIX=ibp at runtime.
Commit 48bd800: Adds CUDA device initialization and synchronization in buffer.py.
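Two of the items above can be illustrated in a few lines: how a `$ORIGIN`-relative RPATH entry resolves against the location of the installed extension module, and how to default `NVSHMEM_HCA_PREFIX` before the library is loaded. This is a sketch under assumptions: the helper names and the example `.so` path are hypothetical, and the expansion only mimics what the dynamic loader does.

```python
import os
from pathlib import PurePosixPath


def expand_origin(rpath_entry, shared_object):
    """Expand $ORIGIN to the directory that contains the shared object."""
    origin = str(PurePosixPath(shared_object).parent)
    return rpath_entry.replace("$ORIGIN", origin)


def ensure_hca_prefix(env=None, prefix="ibp"):
    """Set NVSHMEM_HCA_PREFIX if it is not already set; return its value."""
    env = os.environ if env is None else env
    env.setdefault("NVSHMEM_HCA_PREFIX", prefix)
    return env["NVSHMEM_HCA_PREFIX"]


# Hypothetical install location of the compiled extension module.
so_path = "/env/lib/python3.12/site-packages/deep_ep_cpp.so"
print(expand_origin("$ORIGIN/nvidia/nvshmem/lib", so_path))

# Use a scratch dict here so the example does not touch the real environment.
scratch_env = {}
print(ensure_hca_prefix(scratch_env))
```

Because the entry is relative to the extension's own directory, the lookup lands in the `nvidia/nvshmem/lib` directory next to it in the same site-packages tree, wherever the environment lives. In a real job, `ensure_hca_prefix()` would be called with no arguments before importing the extension.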

Build environment

Built on a compute node with GPU access using the aa-api pixi env's Python and PyTorch:

srun --nodes=1 --ntasks=1 --gres=gpu:1 bash -c '
cd /mnt/main0/home/zlin/code/DeepEP
export PIXI_ENV=/mnt/main0/home/zlin/code/evos/aa-api/.pixi/envs/default
unset CXX CC CFLAGS CXXFLAGS CPPFLAGS LDFLAGS
export CUDA_HOME=/usr/local/cuda-12.8
export PATH=/usr/local/cuda-12.8/bin:/usr/local/bin:/usr/bin:$PIXI_ENV/bin:$PATH
rm -rf dist/ build/ *.egg-info
python setup.py bdist_wheel
'

Key requirements:

Must build on a GPU node (needs libcuda.so)
Must unset CXX, CC, and the compiler/linker flag variables (CFLAGS, CXXFLAGS, CPPFLAGS, LDFLAGS); the pixi conda compilers can't handle CUDA
Must use the target env's Python/torch to match ABI (torch 2.8.0 ≠ torch 2.10.0)
Python 3.12, PyTorch 2.8.0+cu128, CUDA 12.8
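The version requirements above can be checked before building or installing. The sketch below is an illustration only: `TARGET` and `check_build_target` are hypothetical names, and the comparison is a plain string/tuple match against the versions listed here.

```python
# Versions this wheel was built for, taken from the requirements above.
TARGET = {"python": (3, 12), "torch": "2.8.0+cu128", "cuda": "12.8"}


def check_build_target(python_version, torch_version, cuda_version):
    """Return a list of human-readable mismatches (empty if all match)."""
    problems = []
    if tuple(python_version[:2]) != TARGET["python"]:
        problems.append(f"Python {tuple(python_version[:2])} != {TARGET['python']}")
    if torch_version != TARGET["torch"]:
        problems.append(f"torch {torch_version} != {TARGET['torch']}")
    if cuda_version != TARGET["cuda"]:
        problems.append(f"CUDA {cuda_version} != {TARGET['cuda']}")
    return problems


# A matching environment reports no problems; a torch 2.10.0 one reports one.
print(check_build_target((3, 12, 4), "2.8.0+cu128", "12.8"))
print(check_build_target((3, 12, 4), "2.10.0+cu128", "12.8"))
```

In the target environment the arguments would typically be `sys.version_info`, `torch.__version__`, and `torch.version.cuda`; note that some PyTorch builds report a version string without the `+cu128` suffix, in which case the exact match here would need loosening.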

Usage in pixi.toml

deep-ep = { url = "https://github.com/ebetica/DeepEP/releases/download/v1.2.1-fix2/deep_ep-1.2.1+536a37a-cp312-cp312-linux_x86_64.whl" }
nvidia-nvshmem-cu12 = "==3.5.19"
libnvshmem3 = "==3.5.19"
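The wheel filename in the URL encodes the tags an environment must satisfy. As a rough sketch (the function name `parse_wheel_url` is hypothetical, and only the standard `name-version[-build]-python-abi-platform.whl` layout is handled), the fields can be pulled apart like this:

```python
def parse_wheel_url(url):
    """Split a wheel URL's filename into its name, version, and tag fields."""
    filename = url.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:
        # An optional build tag sits between the version and the python tag.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    # The part after "+" is the local version label (here a commit hash).
    public, _, local = version.partition("+")
    return {
        "name": name,
        "version": public,
        "local": local,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


url = (
    "https://github.com/ebetica/DeepEP/releases/download/v1.2.1-fix2/"
    "deep_ep-1.2.1+536a37a-cp312-cp312-linux_x86_64.whl"
)
print(parse_wheel_url(url))
```

The `cp312` tags correspond to the Python 3.12 requirement, and the local label identifies the commit the wheel was built from.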

Pre-built wheel for use in the aa-api pixi environment (PyTorch 2.8.0+cu128, Python 3.12, CUDA 12.8).

New in this release

use_default_stream_as_comm_stream: New Buffer.__init__ option to reuse the default CUDA stream for communication instead of allocating a separate one from the pool. When enabled, all stream_wait synchronization between comm and compute streams is skipped (since they're the same stream).
num_recv_tokens_per_expert_as_cuda: New kwarg on dispatch() / intranode_dispatch() / internode_dispatch(). When True, returns num_recv_tokens_per_expert as a CUDA int32 tensor (via cudaMemcpyAsync on the comm stream) instead of a Python list[int], avoiding a CPU→Python roundtrip.

Key changes from etongit/DeepEP v1.2.1

Relocatable RPATH: Uses $ORIGIN/nvidia/nvshmem/lib instead of hardcoded build path, so the wheel works in any environment with nvidia-nvshmem-cu12 installed via pip.
nvshmem dependency: nvidia-nvshmem-cu12>=3.5.19 declared in pyproject.toml so pip pulls the correct version.
NVSHMEM 3.5.19: Required for CoreWeave IB device naming (ibpX instead of mlx5_X). Set NVSHMEM_HCA_PREFIX=ibp at runtime.
Commit 48bd800: Adds CUDA device initialization and synchronization in buffer.py.

Build environment

Built on a compute node with GPU access using the aa-api pixi env's Python and PyTorch:

srun --nodes=1 --ntasks=1 --gres=gpu:1 bash -c '
cd /mnt/main0/home/zlin/code/DeepEP
export PIXI_ENV=/mnt/main0/home/zlin/code/evos/aa-api/.pixi/envs/default
unset CXX CC CFLAGS CXXFLAGS CPPFLAGS LDFLAGS
export CUDA_HOME=/usr/local/cuda-12.8
export PATH=/usr/local/cuda-12.8/bin:/usr/local/bin:/usr/bin:$PIXI_ENV/bin:$PATH
rm -rf dist/ build/ *.egg-info
python setup.py bdist_wheel
'

Key requirements:

Must build on a GPU node (needs libcuda.so)
Must unset CXX, CC, and the compiler/linker flag variables (CFLAGS, CXXFLAGS, CPPFLAGS, LDFLAGS); the pixi conda compilers can't handle CUDA
Must use the target env's Python/torch to match ABI (torch 2.8.0 ≠ torch 2.10.0)
Python 3.12, PyTorch 2.8.0+cu128, CUDA 12.8

Usage in pixi.toml

deep-ep = { url = "https://github.com/ebetica/DeepEP/releases/download/v1.2.2/deep_ep-1.2.1+ab2bb8b-cp312-cp312-linux_x86_64.whl" }
nvidia-nvshmem-cu12 = "==3.5.19"
libnvshmem3 = "==3.5.19"

Pre-built wheel for use in the aa-api pixi environment (PyTorch 2.8.0+cu128, Python 3.12, CUDA 12.8).

Key changes from etongit/DeepEP v1.2.1:

Relocatable RPATH: Uses $ORIGIN/nvidia/nvshmem/lib instead of hardcoded build path, so the wheel works in any environment with nvidia-nvshmem-cu12 installed via pip.
nvshmem dependency: nvidia-nvshmem-cu12>=3.5.19 declared in pyproject.toml so pip pulls the correct version.
NVSHMEM 3.5.19: Required for CoreWeave IB device naming (ibpX instead of mlx5_X). Set NVSHMEM_HCA_PREFIX=ibp at runtime.
Commit 48bd800: Adds CUDA device initialization and synchronization in buffer.py.

Build environment

Built on a compute node with GPU access using the aa-api pixi env's Python and PyTorch:

srun --nodes=1 --ntasks=1 --gres=gpu:1 bash -c '
cd /mnt/main0/home/zlin/code/DeepEP
export PIXI_ENV=/mnt/main0/home/zlin/code/evos/aa-api/.pixi/envs/default
unset CXX CC CFLAGS CXXFLAGS CPPFLAGS LDFLAGS
export CUDA_HOME=/usr/local/cuda-12.8
export PATH=/usr/local/cuda-12.8/bin:/usr/local/bin:/usr/bin:$PIXI_ENV/bin:$PATH
rm -rf dist/ build/ *.egg-info
python setup.py bdist_wheel
'

Key requirements:

Must build on a GPU node (needs libcuda.so)
Must unset CXX, CC, and the compiler/linker flag variables (CFLAGS, CXXFLAGS, CPPFLAGS, LDFLAGS); the pixi conda compilers can't handle CUDA
Must use the target env's Python/torch to match ABI (torch 2.8.0 ≠ torch 2.10.0)
Python 3.12, PyTorch 2.8.0+cu128, CUDA 12.8

Usage in pixi.toml

deep-ep = { url = "https://github.com/ebetica/DeepEP/releases/download/v1.2.1-fix/deep_ep-1.2.1+bdd0d6f-cp312-cp312-linux_x86_64.whl" }
nvidia-nvshmem-cu12 = "==3.5.19"
libnvshmem3 = "==3.5.19"

Releases: ebetica/DeepEP

DeepEP wheel with comm stream + cuda expert counts

DeepEP wheel with relocatable RPATH
