adaptation_of_mlu_device_based_on_refactor_xccl_primitive#10647

Merged
fpzh2011 merged 7 commits into master from adaptation_of_mlu_device_based_on_refactor_xccl_primitive on Jul 31, 2025

Conversation

@Flowingsun007
Contributor

No description provided.

@ShawnXuan ShawnXuan requested a review from oneflow-ci-bot June 18, 2025 06:19
@Flowingsun007 Flowingsun007 marked this pull request as ready for review June 24, 2025 03:09
…ve' of github.com:Oneflow-Inc/oneflow into adaptation_of_mlu_device_based_on_refactor_xccl_primitive

@github-actions
Contributor

github-actions Bot commented Jul 1, 2025

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.7ms (= 4367.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.6ms (= 5755.3ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.32 (= 57.6ms / 43.7ms)

OneFlow resnet50 time: 26.4ms (= 2635.9ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.2ms (= 3817.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.45 (= 38.2ms / 26.4ms)

OneFlow resnet50 time: 18.8ms (= 3767.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.7ms (= 7147.5ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.90 (= 35.7ms / 18.8ms)

OneFlow resnet50 time: 17.5ms (= 3504.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 33.3ms (= 6667.9ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.90 (= 33.3ms / 17.5ms)

OneFlow resnet50 time: 17.6ms (= 3518.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.9ms (= 5776.1ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.64 (= 28.9ms / 17.6ms)

OneFlow swin dataloader time: 0.212s (= 42.469s / 200, num_workers=1)
PyTorch swin dataloader time: 0.129s (= 25.776s / 200, num_workers=1)
Relative speed: 0.607 (= 0.129s / 0.212s)

OneFlow swin dataloader time: 0.054s (= 10.877s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.423s / 200, num_workers=4)
Relative speed: 0.591 (= 0.032s / 0.054s)

OneFlow swin dataloader time: 0.031s (= 6.268s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.321s / 200, num_workers=8)
Relative speed: 0.530 (= 0.017s / 0.031s)

❌ OneFlow resnet50 time: 49.3ms (= 4932.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.8ms (= 6580.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 65.8ms / 49.3ms)

OneFlow resnet50 time: 37.0ms (= 3702.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 48.2ms (= 4818.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 48.2ms / 37.0ms)

OneFlow resnet50 time: 29.6ms (= 5927.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 40.3ms (= 8057.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 40.3ms / 29.6ms)

OneFlow resnet50 time: 26.5ms (= 5293.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 41.3ms (= 8251.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 41.3ms / 26.5ms)

OneFlow resnet50 time: 25.6ms (= 5116.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 35.9ms (= 7179.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.40 (= 35.9ms / 25.6ms)

@github-actions
Contributor

github-actions Bot commented Jul 1, 2025

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

Comment thread cmake/oneflow.cmake Outdated
get_directory_property(EXISTING_DEFS COMPILE_DEFINITIONS)

if(NOT "WITH_DEVICES" IN_LIST EXISTING_DEFS)
add_definitions(-DWITH_DEVICES)
Contributor Author

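When building for devices such as NPU/XPU/MLU, DEVICES_ENABLED is True, and a WITH_DEVICES macro is defined to indicate multi-device support. This makes it easier to manage device-specific code paths uniformly later on.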

// before:
#if defined(WITH_CUDA) || defined(WITH_NPU) || defined(WITH_MLU) || defined(WITH_XPU) ...

// after: 
#if defined(WITH_CUDA) || defined(WITH_DEVICES)
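
For illustration only, here is a minimal sketch of how a source file might consume the consolidated macro. The names `kHasAcceleratorBackend` and `BackendLabel` are hypothetical and do not come from the OneFlow sources; the sketch assumes `WITH_CUDA` and `WITH_DEVICES` are supplied by the build system (for example via `add_definitions(-DWITH_DEVICES)`) and are never defined in the file itself.

```cpp
#include <string>

// Hypothetical helper, not taken from the OneFlow sources.
// WITH_CUDA and WITH_DEVICES are assumed to be passed in by the build
// system (e.g. -DWITH_DEVICES); nothing in this file defines them.
#if defined(WITH_CUDA) || defined(WITH_DEVICES)
constexpr bool kHasAcceleratorBackend = true;
#else
constexpr bool kHasAcceleratorBackend = false;
#endif

// Returns a short label describing which code path was compiled in.
inline std::string BackendLabel() {
  return kHasAcceleratorBackend ? "accelerator" : "cpu-only";
}
```

With this shape, adding another device backend should only require the build system to set `WITH_DEVICES`; the guard in each source file stays unchanged.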

@github-actions
Contributor

github-actions Bot commented Jul 1, 2025

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.3ms (= 4329.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.5ms (= 5750.3ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.33 (= 57.5ms / 43.3ms)

OneFlow resnet50 time: 26.6ms (= 2657.4ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.1ms (= 3712.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.40 (= 37.1ms / 26.6ms)

OneFlow resnet50 time: 18.6ms (= 3712.5ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.2ms (= 7037.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.90 (= 35.2ms / 18.6ms)

OneFlow resnet50 time: 17.6ms (= 3520.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 31.3ms (= 6264.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.78 (= 31.3ms / 17.6ms)

OneFlow resnet50 time: 17.4ms (= 3484.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.6ms (= 5921.1ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.70 (= 29.6ms / 17.4ms)

OneFlow swin dataloader time: 0.199s (= 39.804s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.570s / 200, num_workers=1)
Relative speed: 0.642 (= 0.128s / 0.199s)

OneFlow swin dataloader time: 0.057s (= 11.474s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.410s / 200, num_workers=4)
Relative speed: 0.559 (= 0.032s / 0.057s)

OneFlow swin dataloader time: 0.030s (= 6.022s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.342s / 200, num_workers=8)
Relative speed: 0.555 (= 0.017s / 0.030s)

❌ OneFlow resnet50 time: 49.7ms (= 4974.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.1ms (= 6610.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 66.1ms / 49.7ms)

OneFlow resnet50 time: 37.6ms (= 3762.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 45.5ms (= 4550.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 45.5ms / 37.6ms)

OneFlow resnet50 time: 27.9ms (= 5584.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 40.3ms (= 8051.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.44 (= 40.3ms / 27.9ms)

OneFlow resnet50 time: 25.2ms (= 5034.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.2ms (= 7643.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 38.2ms / 25.2ms)

OneFlow resnet50 time: 24.8ms (= 4956.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.0ms (= 7197.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.45 (= 36.0ms / 24.8ms)

@fpzh2011 fpzh2011 merged commit f485d7e into master Jul 31, 2025
20 checks passed
@fpzh2011 fpzh2011 deleted the adaptation_of_mlu_device_based_on_refactor_xccl_primitive branch July 31, 2025 07:54

4 participants