[Transformer] Manipulate NCCL_NET for hybrid IB/Socket setups #1620

Merged: crcrpar merged 30 commits into NVIDIA:master from Aidyn-A:transformer_nccl_socket_for_p2p on Apr 19, 2023.

Conversation

@Aidyn-A (Collaborator) commented Mar 17, 2023

No description provided.

Three review threads on apex/transformer/parallel_state.py (now outdated)
@Aidyn-A changed the title from "[WIP] [Transformer] Manipulate NCCL_NET" to "[Transformer] Manipulate NCCL_NET for hybrid IB/Socket setups" on Apr 13, 2023
Two outdated review threads and one current review thread on apex/transformer/parallel_state.py
@crcrpar added this to the 23.05 milestone on Apr 18, 2023
Review thread on apex/transformer/parallel_state.py
@crcrpar merged commit 817e818 into NVIDIA:master on Apr 19, 2023
yuanzhedong pushed a commit to yuanzhedong/apex that referenced this pull request on Jul 14, 2023:
…#1620)

* set NCCL Socket for p2p
* clean up
* refactor parallel_state.py
* update parallel_state
* fix minor bug
* fix minor bug 2
* Update parallel_state.py
* introduce NET_BLOCK_SIZE
* fix small bug
* Better cross-block check
* some cleanups
* more cleanups
* more cleanups
* more cleanup
* swap lines
* update else
* minor fix
* default to NCCL
* apply suggestions and add more description
* revert one line change
* move default_nccl_net down
* forgot one line
* remove unnecessary spaces
* apply nit-picking
* raise RuntimeError if NCCL_SOCKET_IFNAME missing
* Revert HAS_UCC
* Add missing space
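The commit history above mentions NET_BLOCK_SIZE, a cross-block check, defaulting to NCCL, and raising a RuntimeError when NCCL_SOCKET_IFNAME is missing, but this page does not show the actual code in apex/transformer/parallel_state.py. The sketch below illustrates the selection idea only, under loudly stated assumptions: the helpers `same_block` and `choose_nccl_net` are hypothetical (not apex APIs), a "block" is assumed to be `net_block_size` consecutive global ranks sharing an InfiniBand fabric, and groups that span blocks are assumed to fall back to TCP sockets. `"IB"` and `"Socket"` are real NCCL_NET values, and NCCL_SOCKET_IFNAME is the real NCCL variable naming the socket interface; how and when the chosen value is handed to NCCL is not shown here.

```python
import os
from typing import Mapping, Optional, Sequence


def same_block(rank_a: int, rank_b: int, net_block_size: int) -> bool:
    """Hypothetical helper: True if both global ranks fall in the same block
    of `net_block_size` consecutive ranks (assumed cluster layout)."""
    if net_block_size <= 0:
        raise ValueError("net_block_size must be positive")
    return rank_a // net_block_size == rank_b // net_block_size


def choose_nccl_net(
    group_ranks: Sequence[int],
    net_block_size: int,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical policy for picking an NCCL_NET value for one group.

    Groups entirely inside one block use InfiniBand ("IB"); groups that
    cross blocks use TCP sockets ("Socket"), which requires
    NCCL_SOCKET_IFNAME to be set.
    """
    if not group_ranks:
        raise ValueError("group_ranks must not be empty")
    env = os.environ if env is None else env
    first = group_ranks[0]
    # Cross-block check: every rank must share the first rank's block.
    if all(same_block(first, r, net_block_size) for r in group_ranks):
        return "IB"
    # Socket transport needs an explicit interface; fail loudly otherwise.
    if "NCCL_SOCKET_IFNAME" not in env:
        raise RuntimeError(
            "NCCL_SOCKET_IFNAME must be set when a group crosses network blocks"
        )
    return "Socket"


if __name__ == "__main__":
    # A pipeline p2p pair inside one block of 8 ranks stays on IB.
    print(choose_nccl_net([0, 1], net_block_size=8, env={}))  # IB
    # Ranks 7 and 8 sit in different blocks, so sockets are chosen.
    print(choose_nccl_net([7, 8], net_block_size=8,
                          env={"NCCL_SOCKET_IFNAME": "eth0"}))  # Socket
```

Passing `env` explicitly keeps the check testable without touching the real process environment; in a real launcher it would default to `os.environ`.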

3 participants