Conversation

Contributor

@amd-eochoalo commented Nov 4, 2025

Sample run here https://github.com/iree-org/iree/actions/runs/19085734122/job/54525888442

I'm not yet sure why I had to disable some of the initial tests on GPUs. They were failing. After I made pytest run sequentially the failures seem to have stopped, but maybe the cause was something else?

@amd-eochoalo marked this pull request as ready for review November 5, 2025 00:08

Comment on lines 83 to 88
pytest iree-test-suites/pytorch_ops/ \
  -rpfE \
  --timeout=30 \
  --durations=20 \
  --report-log=${LOG_FILE_PATH} \
  --config-files=${CONFIG_FILE_PATH}

Member

How many tests is this going to run in parallel?

Contributor Author

@amd-eochoalo Nov 5, 2025

This no longer runs in parallel. I removed parallel test execution in 9c911f7 because of a PkgCI run that failed, specifically this one: https://github.com/iree-org/iree/actions/runs/19083410154/job/54518386104.

I disabled parallelism because, after I had disabled the tests that originally failed, a different set of tests started failing. I decided to open this up for review to surface the issue. We may want to merge the CPU-only part first and then investigate the failures on the GPUs. It may also be a different CI issue altogether.
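
For anyone reproducing the sequential versus parallel comparison locally, here is a minimal Python sketch that builds the same pytest invocation with parallelism as an opt-in. The flags are copied from the command under review. `-n` is pytest-xdist's worker-count option; whether the earlier parallel runs used pytest-xdist is an assumption, and `report.json` and `config.json` are placeholder paths, not files from this PR.

```python
import shlex


def build_pytest_command(log_file_path, config_file_path, numprocesses=None):
    """Return the pytest argv for the pytorch_ops suite.

    numprocesses=None keeps the run sequential, as in the current PR.
    A positive integer appends pytest-xdist's `-n` flag (assumes the
    pytest-xdist plugin is installed; this thread does not confirm it).
    """
    cmd = [
        "pytest",
        "iree-test-suites/pytorch_ops/",
        "-rpfE",
        "--timeout=30",
        "--durations=20",
        f"--report-log={log_file_path}",
        f"--config-files={config_file_path}",
    ]
    if numprocesses is not None:
        if numprocesses < 1:
            raise ValueError("numprocesses must be a positive integer")
        cmd += ["-n", str(numprocesses)]
    return cmd


# Placeholder paths, for illustration only.
sequential = build_pytest_command("report.json", "config.json")
parallel = build_pytest_command("report.json", "config.json", numprocesses=4)

print(shlex.join(sequential))
print(shlex.join(parallel))
```

Running both variants against the same runner would help separate failures caused by parallel execution from failures that occur regardless of it.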

@@ -0,0 +1,18 @@
{
"config_name": "gpu_hip_rdna3",

Member

we should probably call this gpu_hip_gfx1100, since there are also other gfx ip versions under the rdna3 umbrella

Contributor Author

"config_name": "gpu_vulkan",
"iree_compile_flags": [
  "--iree-hal-target-device=vulkan",
  "--iree-opt-level=O0"

Member

do these work in -O3? If yes, we probably want to use that

Contributor Author

Let me set up a run with O3. I copied over ONNX's settings. Let me know if changing the ONNX setting would also be a good idea.

- name: amdgpu_hip_rdna3_O3
  numprocesses: 1
  config-file: onnx_ops_gpu_hip_rdna3_O3.json
  runs-on: [Linux, X64, gfx1100]
- name: amdgpu_vulkan_O0
  numprocesses: 4
  config-file: onnx_ops_gpu_vulkan_O0.json
  runs-on: [Linux, X64, rdna3]

I will update this comment after the run has either failed or succeeded.

Member

We disabled O3 in onnx because of some regression but we'd prefer to run pytorch in O3 if it works
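
To make the O0 versus O3 choice concrete, here is a small Python sketch that generates the Vulkan config for either level. It reproduces only the two keys visible in the diff excerpt above (`config_name` and `iree_compile_flags`); the real file has more lines that are not shown in this thread, and the helper name `make_vulkan_config` is made up for illustration.

```python
import json

# Only the two optimization levels discussed in this thread.
DISCUSSED_OPT_LEVELS = ("O0", "O3")


def make_vulkan_config(opt_level):
    """Build a partial config dict mirroring the fragment under review."""
    if opt_level not in DISCUSSED_OPT_LEVELS:
        raise ValueError(f"unexpected opt level: {opt_level!r}")
    return {
        "config_name": "gpu_vulkan",
        "iree_compile_flags": [
            "--iree-hal-target-device=vulkan",
            f"--iree-opt-level={opt_level}",
        ],
    }


o0 = make_vulkan_config("O0")
o3 = make_vulkan_config("O3")

# The two variants differ only in the last compile flag.
print(json.dumps(o3, indent=2))
```

Keeping the level as a single parameter makes it cheap to run the same suite at both levels and compare results before settling on O3.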

Contributor Author

Yes, it worked! Here's the link to the successful test https://github.com/iree-org/iree/actions/runs/19104758812

62c69ed

Comment on lines 37 to 42
- name: amdgpu_hip_rdna3_O3
  config-file: torch_ops_gpu_hip_gfx1100_O3.json
  runs-on: [Linux, X64, gfx1100]
- name: amdgpu_vulkan_O0
  config-file: torch_ops_gpu_vulkan_O3.json
  runs-on: [Linux, X64, rdna3]

Contributor

Let's add mi300 also

Contributor

Ideally we should have mi250 also, but okay for now

Contributor Author

@amd-eochoalo Nov 6, 2025

edef06d

(I'm running a test pkgCI here https://github.com/iree-org/iree/actions/runs/19139123834)

Please note that once the changes you requested here iree-org/iree-test-suites#128 are merged, I'll just need to revert the last commit.

Rerunning here after disabling the failing test: https://github.com/iree-org/iree/actions/runs/19140548793 (f8b1c7d)
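
A side note on the matrix entries quoted in this thread (lines 37 to 42 of the workflow): the entry named `amdgpu_vulkan_O0` points at `torch_ops_gpu_vulkan_O3.json`, so the job name and the config file carry different optimization-level suffixes. Whether that was intended or was fixed in a later commit is not visible here. A minimal Python sketch of a consistency check, with the two entries copied from the excerpt and all helper names hypothetical:

```python
import re

# Entries copied from the workflow excerpt quoted in this review thread.
MATRIX = [
    {"name": "amdgpu_hip_rdna3_O3",
     "config-file": "torch_ops_gpu_hip_gfx1100_O3.json"},
    {"name": "amdgpu_vulkan_O0",
     "config-file": "torch_ops_gpu_vulkan_O3.json"},
]

# Matches a trailing "_O<digit>" suffix, optionally followed by ".json".
OPT_SUFFIX = re.compile(r"_(O[0-3])(?:\.json)?$")


def opt_level_of(text):
    """Return the optimization-level suffix of a name, or None if absent."""
    match = OPT_SUFFIX.search(text)
    return match.group(1) if match else None


def mismatched_entries(matrix):
    """Return names of entries whose name and config-file suffixes differ."""
    return [
        entry["name"]
        for entry in matrix
        if opt_level_of(entry["name"]) != opt_level_of(entry["config-file"])
    ]


print(mismatched_entries(MATRIX))
```

With the two entries above, the check reports `amdgpu_vulkan_O0` and leaves the HIP entry alone.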

@ScottTodd removed their request for review November 6, 2025 16:39

uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
  repository: iree-org/iree-test-suites
  ref: main

Contributor Author

@Groverkss as discussed offline, below you have a commit to one of your branches. I have created a PR for your branch here: iree-org/iree-test-suites#127. I think it is best to have these reference main. That way, as iree-test-suites changes, the code pulled in here gets updated automatically. If you prefer a ref to a specific commit ID, let me know and I can change this one here.

Member

Please pin to a specific commit (from the main branch) and not main. That allows you to make breaking changes in the other repository and then take time adapting to them.
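
As a rough illustration of what "pinned" means here, this Python sketch treats a `ref` as pinned only if it is a full 40-character hexadecimal commit SHA. It checks the format only; it cannot tell whether the commit actually lives on the main branch. The function name is hypothetical, and the example SHA is the `actions/checkout` pin from the snippet above, used purely as a sample of a full SHA.

```python
import re

# A full Git commit SHA-1 is 40 lowercase hexadecimal characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_ref(ref):
    """Return True if ref looks like a full commit SHA, not a branch or tag."""
    return FULL_SHA.fullmatch(ref) is not None


print(is_pinned_ref("main"))
print(is_pinned_ref("08c6903cd8c0fde910a37f88322edcfb5dd907a8"))
print(is_pinned_ref("5e3554c"))
```

Abbreviated SHAs such as `5e3554c` are rejected on purpose in this sketch, since a short prefix is not guaranteed to stay unambiguous as a repository grows.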

Contributor Author

Changed 5e3554c. Thanks!

Contributor

I accidentally committed a reference to my branch. It should always be a reference to a commit on main, and not the main branch itself, as Scott mentioned.

@amd-eochoalo merged commit adb3bdb into iree-org:main Nov 7, 2025
44 of 45 checks passed
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Nov 19, 2025
---------

Signed-off-by: Erick Ochoa <[email protected]>
pstarkcdpr pushed a commit to pstarkcdpr/iree that referenced this pull request Nov 28, 2025
4 participants