Prototype version 2 to support NCCL zero copy #112241

Closed
minsii wants to merge 1 commit into pytorch:main from minsii:nccl-zero-copy-v2

Conversation

@minsii minsii commented Oct 27, 2023

Manually ported from the D50726970 stack for OSS testing; not for review.

pytorch-bot Bot commented Oct 27, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112241

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit f35b398 with merge base 7f143d7 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added the release notes: distributed (c10d) release notes category label Oct 27, 2023
Comment thread c10/cuda/CUDACachingAllocator.cpp Outdated

void attachAllocatorTraceTracker(AllocatorTraceTracker tracker) {
enable_trace_tracker_ = true;
trace_trackers_.emplace_back(tracker);

Suggested change
- trace_trackers_.emplace_back(tracker);
+ trace_trackers_.emplace_back(std::move(tracker));
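The reasoning behind the suggestion can be illustrated with a small sketch. It assumes the tracker is taken by value, as in the snippet above, and uses a hypothetical `CountingTracker` type (not part of PyTorch) that counts copy constructions so the difference is observable. `attachByCopy` and `attachByMove` are likewise illustrative names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tracker callback; it counts copy constructions
// through an external counter so the cost of each variant can be observed.
struct CountingTracker {
  int* copies;
  explicit CountingTracker(int* c) : copies(c) {}
  CountingTracker(const CountingTracker& other) : copies(other.copies) {
    ++*copies;  // a real callback might copy captured state here
  }
  CountingTracker(CountingTracker&& other) noexcept : copies(other.copies) {}
  void operator()() const {}
};

// Original form: the by-value parameter is copied into the vector.
void attachByCopy(std::vector<CountingTracker>& trackers, CountingTracker t) {
  trackers.emplace_back(t);
}

// Suggested form: the by-value parameter is moved into the vector.
void attachByMove(std::vector<CountingTracker>& trackers, CountingTracker t) {
  trackers.emplace_back(std::move(t));
}
```

Because the parameter is already a private copy owned by the function, moving it into the container avoids a second copy without changing what the caller observes.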

@minsii minsii force-pushed the nccl-zero-copy-v2 branch from ce3fff9 to 6379707 Compare October 30, 2023 06:20
@minsii minsii marked this pull request as draft October 30, 2023 06:20

minsii commented Oct 30, 2023

The cache allocator commit is reviewed via #112238

This PR is intended to test the NCCL PG-level change together with the cache allocator commit.

@minsii minsii force-pushed the nccl-zero-copy-v2 branch 3 times, most recently from 0932427 to bdb1a2e Compare November 2, 2023 19:29
Summary:
We need to register all cache segments allocated by the cache allocator so that NCCL can apply zero-copy algorithms to collective and point-to-point operations.

How to track and register all cache segments:
- This change registers a register hook and a deregister hook with the cache allocator as action tracker callbacks, tracking SEGMENT_ALLOC and SEGMENT_FREE trace entries, respectively. When a SEGMENT_ALLOC entry is tracked, the register hook registers the segment with the PG's communicators on the same device. Similarly, when a SEGMENT_FREE entry is tracked, the deregister hook deregisters the segment before cudaFree.
- When a new NCCL communicator is created, it dumps the snapshot from the cache allocator and registers all existing cache segments at once.
- When an NCCL communicator is aborted, it deregisters all segments that were registered by this communicator.
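
The three cases in the summary above can be sketched in a self-contained toy model. Everything here is hypothetical and simplified: `FakeAllocator`, `FakeComm`, and this `TraceEntry` are illustrative stand-ins, not the PyTorch or NCCL types, and "registration" is modeled as inserting an address into a set rather than calling into NCCL.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified trace entry (a real allocator records more).
enum class Action { SEGMENT_ALLOC, SEGMENT_FREE };
struct TraceEntry {
  Action action;
  int device;
  void* addr;
  std::size_t size;
};
using TraceTracker = std::function<void(const TraceEntry&)>;

// Stand-in for the cache allocator: owns segments and notifies trackers.
class FakeAllocator {
 public:
  void attachTracker(TraceTracker t) { trackers_.emplace_back(std::move(t)); }

  void allocSegment(int device, void* addr, std::size_t size) {
    segments_[addr] = TraceEntry{Action::SEGMENT_ALLOC, device, addr, size};
    notify(segments_[addr]);
  }

  void freeSegment(void* addr) {
    auto it = segments_.find(addr);
    if (it == segments_.end()) return;
    TraceEntry entry = it->second;
    entry.action = Action::SEGMENT_FREE;
    notify(entry);        // hooks run before the simulated free
    segments_.erase(it);  // stands in for cudaFree
  }

  // Snapshot of all live segments, for communicators created later.
  std::vector<TraceEntry> snapshot() const {
    std::vector<TraceEntry> out;
    for (const auto& kv : segments_) out.push_back(kv.second);
    return out;
  }

 private:
  void notify(const TraceEntry& e) {
    for (const auto& t : trackers_) t(e);
  }
  std::map<void*, TraceEntry> segments_;
  std::vector<TraceTracker> trackers_;
};

// Stand-in for one communicator bound to a single device.
class FakeComm {
 public:
  FakeComm(int device, const FakeAllocator& alloc) : device_(device) {
    // On creation, register every segment that already exists.
    for (const auto& e : alloc.snapshot()) onTrace(e);
  }
  void onTrace(const TraceEntry& e) {
    if (e.device != device_) return;  // only segments on the same device
    if (e.action == Action::SEGMENT_ALLOC) {
      registered_.insert(e.addr);
    } else {
      registered_.erase(e.addr);
    }
  }
  void abort() { registered_.clear(); }  // deregister everything on abort
  std::size_t numRegistered() const { return registered_.size(); }

 private:
  int device_;
  std::set<void*> registered_;
};
```

In this model a caller would attach a lambda that forwards each entry to `FakeComm::onTrace`; segments allocated before the communicator exists are picked up through `snapshot()`, and later allocations and frees arrive through the tracker.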

Test Plan: See test in D50726971

Reviewed By: wconstab

Differential Revision: D50726970
@minsii minsii force-pushed the nccl-zero-copy-v2 branch from bdb1a2e to f35b398 Compare November 3, 2023 16:33

minsii commented Nov 3, 2023

Closing this prototype PR because the final version has been exported as #112850.

@minsii minsii closed this Nov 3, 2023