I have two computers. When I run ifconfig, the first one shows
‘eno1
lo’
The second computer shows
‘eth0
lo’
What should I set NCCL_SOCKET_IFNAME to on each node? Should I specify lo on both?
I receive errors like
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:410, unhandled system error, NCCL version 2.4.8
and
connection reset by peer
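
A minimal sketch of what setting the variable separately on each node could look like, assuming each node should use its own non-loopback interface (eno1 on one, eth0 on the other), since lo only reaches the local machine. The helper pick_interface is hypothetical, not part of NCCL or PyTorch, and taking the first non-loopback name is a simplification that may pick the wrong device on machines with several interfaces.

```python
import os
import socket


def pick_interface(names):
    """Return the first interface name that is not the loopback device, or None."""
    for name in names:
        if name != "lo":
            return name
    return None


if __name__ == "__main__":
    # socket.if_nameindex() lists (index, name) pairs for the local interfaces.
    names = [name for _, name in socket.if_nameindex()]
    iface = pick_interface(names)
    if iface is not None:
        # Assumption: this runs before the process group is initialized,
        # so NCCL sees the value when it starts up.
        os.environ["NCCL_SOCKET_IFNAME"] = iface
    print(iface)
```

Run on each node, this would set a different value per machine rather than the same name on both.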