[AMD] Add mori-shmem backend support #145

Merged
wenlei-bao merged 18 commits into ByteDance-Seed:main from jhchouuu:mori_shmem_support
Jan 27, 2026
Conversation

@jhchouuu
Collaborator

Description

Add mori-shmem backend support for AMD GPUs

Changes

  • Integrated mori-shmem library as a submodule
  • Added backend switching via TRITON_DIST_SHMEM_BACKEND environment variable
  • Added test suite for mori-shmem functionality

Copilot AI review requested due to automatic review settings December 19, 2025 00:46
@CLAassistant

CLAassistant commented Dec 19, 2025

CLA assistant check
All committers have signed the CLA.

Copilot AI left a comment

Pull request overview

This PR adds support for the mori-shmem backend as an alternative to rocshmem for AMD GPU distributed computing. The implementation introduces a new backend selection mechanism via the TRITON_DIST_SHMEM_BACKEND environment variable and integrates the mori library as a git submodule.

  • Adds mori-shmem as a selectable backend alongside rocshmem for AMD GPUs
  • Implements dynamic backend switching through environment variable configuration
  • Provides comprehensive test coverage for mori-shmem APIs

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 19 comments.

File Description
scripts/build_mori_shmem.sh Build script for compiling mori library and linking bitcode files for device execution
python/triton_dist/utils.py Core utilities for backend selection, initialization, and version/hash management
python/triton_dist/test/amd/test_mori_shmem_api.py Test suite validating mori-shmem basic operations and device-level APIs
python/triton_dist/language/extra/libshmem_device.py Updated module proxy to route to mori-shmem device library based on backend
python/triton_dist/language/extra/hip/libmori_shmem_device.py Device-level API bindings for mori-shmem operations (my_pe, n_pes, int_p)
python/triton_dist/jit.py JIT compilation integration with backend-specific module initialization
.gitmodules Adds mori library as a git submodule dependency
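The "module proxy" described for libshmem_device.py can be pictured as an object that forwards attribute lookups to whichever device library matches the selected backend. The sketch below only illustrates that pattern: `BackendProxy`, the stand-in modules, and the values they return are hypothetical and are not taken from the PR.

```python
import os
import types


def _make_fake_device_lib(name, pe):
    # Stand-in for a real device library module; purely illustrative.
    mod = types.ModuleType(name)
    mod.my_pe = lambda: pe
    return mod


class BackendProxy:
    """Forward attribute lookups to the device library for the active backend."""

    def __init__(self, libraries, pick_backend):
        self._libraries = libraries
        self._pick_backend = pick_backend

    def __getattr__(self, name):
        # __getattr__ only runs for names not found on the instance, so the
        # private fields set in __init__ never recurse through here.
        backend = self._pick_backend()
        try:
            lib = self._libraries[backend]
        except KeyError:
            raise RuntimeError(
                f"no device library registered for {backend!r}") from None
        return getattr(lib, name)


libshmem_device = BackendProxy(
    {
        "rocshmem": _make_fake_device_lib("fake_rocshmem", 0),
        "mori_shmem": _make_fake_device_lib("fake_mori_shmem", 1),
    },
    pick_backend=lambda: os.environ.get("TRITON_DIST_SHMEM_BACKEND", "rocshmem"),
)
```

Because the backend is resolved on each lookup, callers keep writing `libshmem_device.my_pe()` regardless of which library sits behind the proxy.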
Comments suppressed due to low confidence (3)

python/triton_dist/test/amd/test_mori_shmem_api.py:28

  • Import of 'time' is not used.
import time

python/triton_dist/test/amd/test_mori_shmem_api.py:29

  • Import of 'shutil' is not used.
import shutil

python/triton_dist/test/amd/test_mori_shmem_api.py:31

  • Import of 'dist' is not used.
import torch.distributed as dist

@preminstrel preminstrel self-requested a review December 19, 2025 18:35
Copilot AI review requested due to automatic review settings December 22, 2025 12:32
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Copilot AI review requested due to automatic review settings December 26, 2025 06:08
Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

@wenlei-bao wenlei-bao self-requested a review December 29, 2025 19:41


# Simple helper to wrap mori shmem pointer as torch tensor
class MoriShmemBuffer:
Contributor

This code and everything below it shouldn't be here. Can you please make it consistent with the other shmem components? @jhchouuu

Collaborator Author

Understood, it was just a temporary placement before. Do you prefer to maintain it in triton-dist like rocshmem, or directly in mori_shmem like nvshmem4py?

Contributor

@jhchouuu For now we can follow rocshmem; it can be refactored like nvshmem4py later.

Collaborator Author

It is now wrapped in mori_shmem, similar to nvshmem4py.



@core.extern
def my_pe(_semantic=None):
Contributor

Where are the other shmem APIs? Ideally this should be consistent with the other shmem backends. @jhchouuu

Collaborator Author

In another branch that doesn't involve mr, we need to discuss the API encapsulation method, as this will add a parameter qp_id for selecting the QP during RDMA communication. Maybe I should talk with @XG-zheng offline?

Collaborator Author

And the CI has encountered a build error due to failure in pulling the submodules of mori. I will fix this issue.

* Feature: add more mori_shmem bitcode wrappers && add mori_shmem bandwidth test

* Feature: mori_shmem support dl op call && small refactor

* fix ci && move tensor create to mori library

* Refine setup.py
Copilot AI review requested due to automatic review settings January 8, 2026 11:06
Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.

 def get_rocshmem_home():
     return os.getenv("ROCSHMEM_HOME",
-                     Path(__file__).parent.parent / "shmem" / "rocshmem_bind" / "rocshmem_build" / "install")
+                     Path(__file__).parent.parent.parent / "shmem" / "rocshmem_bind" / "rocshmem_build" / "install")
Copilot AI Jan 8, 2026

The path traverses up three parent directories (parent.parent.parent) which appears incorrect. This changes the rocshmem default path from the original two parents to three, which would break existing rocshmem installations. The original path used Path(__file__).parent.parent, and this should remain unchanged.

Collaborator Author

Path corrected to use three parents - the shmem directory is at project root, not inside python/.
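A short worked example makes the two-versus-three-parents question concrete. It assumes that `get_rocshmem_home` lives in python/triton_dist/utils.py and uses a hypothetical checkout at `/repo`; only the relative layout matters.

```python
from pathlib import PurePosixPath

# Hypothetical checkout location; the file's position inside the repo is
# an assumption based on the changed-files summary for this PR.
source_file = PurePosixPath("/repo/python/triton_dist/utils.py")
suffix = PurePosixPath("shmem/rocshmem_bind/rocshmem_build/install")

# Two parents stop at the python/ directory.
two_up = source_file.parent.parent / suffix
# Three parents reach the project root, where shmem/ is said to live.
three_up = source_file.parent.parent.parent / suffix

print(two_up)    # /repo/python/shmem/rocshmem_bind/rocshmem_build/install
print(three_up)  # /repo/shmem/rocshmem_bind/rocshmem_build/install
```

Under that assumption, only the three-parent form resolves to a `shmem` directory at the project root, which matches the author's reply.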

Copilot AI review requested due to automatic review settings January 8, 2026 12:13
Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

Copilot AI review requested due to automatic review settings January 27, 2026 05:25
Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Comment thread .gitmodules
[submodule "3rdparty/mori"]
path = 3rdparty/mori
url = https://github.com/ROCm/mori.git
branch = jiahzhou/triton_dis_support
Copilot AI Jan 27, 2026

Corrected spelling of 'jiahzhou' to 'jiahzhou'. However, note that this appears to be referencing a development branch which may not be appropriate for production. Consider using a stable release branch instead.

Suggested change
branch = jiahzhou/triton_dis_support

@@ -82,8 +88,8 @@ jobs:
- name: E2E tests
run: |
bash ./scripts/build_e2e_env.sh --download_model
Copilot AI Jan 27, 2026

The change from Qwen3-32B to Qwen3-0.6B significantly reduces the model size for testing. While this may speed up CI, ensure this smaller model still provides adequate coverage for the test scenarios. Consider documenting why this change was made (e.g., CI resource constraints) in the PR description or commit message.

Suggested change
bash ./scripts/build_e2e_env.sh --download_model
bash ./scripts/build_e2e_env.sh --download_model
# Use the smaller Qwen3-0.6B model in CI to keep AMD E2E tests within runtime and memory limits.
# The larger Qwen3-32B variants below remain commented out until the CI image supports them (e.g., flash-attention).

 def get_rocshmem_home():
     return os.getenv("ROCSHMEM_HOME",
-                     Path(__file__).parent.parent / "shmem" / "rocshmem_bind" / "rocshmem_build" / "install")
+                     Path(__file__).parent.parent.parent / "shmem" / "rocshmem_bind" / "rocshmem_build" / "install")
Copilot AI Jan 27, 2026

The path has been changed from parent.parent to parent.parent.parent, adding an extra level of directory traversal. This breaks the relative path resolution for rocshmem. The original path structure should be maintained to ensure rocshmem can be located correctly.

Suggested change
- Path(__file__).parent.parent.parent / "shmem" / "rocshmem_bind" / "rocshmem_build" / "install")
+ Path(__file__).parent.parent / "shmem" / "rocshmem_bind" / "rocshmem_build" / "install")

return get_rocshmem_version()
elif backend == 'mori_shmem':
return get_mori_version()
return "unknown"
Copilot AI Jan 27, 2026

When neither CUDA nor HIP is available, or when backend detection fails, returning 'unknown' could mask configuration issues. Consider raising an exception or logging a warning to make debugging easier.
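One possible shape for this suggestion is sketched below. It is not the code in the PR: the function name `shmem_version_or_warn`, the `version_getters` mapping, and the `strict` flag are all hypothetical, introduced only to show the warn-or-raise choice.

```python
import warnings


def shmem_version_or_warn(backend, version_getters, strict=False):
    """Look up a backend's version, making the fallback path visible.

    `version_getters` maps backend names to zero-argument callables.
    With strict=True an unrecognized backend raises; otherwise a warning
    is emitted before the "unknown" placeholder is returned.
    """
    getter = version_getters.get(backend)
    if getter is not None:
        return getter()
    message = f"cannot determine SHMEM version for backend {backend!r}"
    if strict:
        raise RuntimeError(message)
    warnings.warn(message, RuntimeWarning, stacklevel=2)
    return "unknown"
```

The warning variant keeps existing callers working, while `strict=True` suits code paths where a missing backend should stop the run.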

return get_rocshmem_hash()
elif backend == 'mori_shmem':
return get_mori_shmem_hash()
return "unknown"
Copilot AI Jan 27, 2026

Similar to get_shmem_version, returning 'unknown' silently when neither CUDA nor HIP is detected could hide configuration problems. Consider raising an exception or logging a warning.

Comment thread python/setup.py
raise RuntimeError(f"Unknown TRITON_DIST_SHMEM_BACKEND: {shmem_backend}. Must be 'mori_shmem' or 'rocshmem'")

# Also build if explicitly requested via env var (for backward compatibility)
if check_env_flag("TRITON_DISTRIBUTED_BUILD_PYROCSHMEM", "0") and shmem_backend != "rocshmem":
Copilot AI Jan 27, 2026

The logic for building rocshmem when TRITON_DISTRIBUTED_BUILD_PYROCSHMEM is set seems redundant since rocshmem is already built based on TRITON_DIST_SHMEM_BACKEND. This could lead to building rocshmem twice in some scenarios. Consider simplifying this logic or clarifying the intended behavior in a comment.
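One way to make the intended behavior explicit is to compute the list of backends to build in a single place. The sketch below assumes the legacy flag is meant to request an additional rocshmem build when another backend is selected; `plan_shmem_builds` and its parameters are hypothetical names, not part of setup.py.

```python
def plan_shmem_builds(shmem_backend, force_pyrocshmem=False):
    """Return the backends to build, in order, with no duplicates.

    `force_pyrocshmem` stands in for the legacy
    TRITON_DISTRIBUTED_BUILD_PYROCSHMEM flag (an assumption of this sketch).
    """
    if shmem_backend not in ("mori_shmem", "rocshmem"):
        raise RuntimeError(
            f"Unknown TRITON_DIST_SHMEM_BACKEND: {shmem_backend}. "
            "Must be 'mori_shmem' or 'rocshmem'")
    builds = [shmem_backend]
    # Mirrors the `shmem_backend != "rocshmem"` guard in the quoted line:
    # rocshmem is added only when it is not already the selected backend.
    if force_pyrocshmem and "rocshmem" not in builds:
        builds.append("rocshmem")
    return builds
```

For example, `plan_shmem_builds("mori_shmem", force_pyrocshmem=True)` yields both backends, while `plan_shmem_builds("rocshmem", force_pyrocshmem=True)` yields rocshmem once.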

@wenlei-bao wenlei-bao merged commit 1242224 into ByteDance-Seed:main Jan 27, 2026
9 of 11 checks passed
5 participants