[ci] add ray-cpp-wheel-build, ray-wheel-build to build+upload #59557

andrew-anyscale · 2025-12-18T19:28:03Z

Adds a script that takes a prebuilt wheel image and extracts the wheel from it.
Ports the current wheel build+upload steps to use the prebuilt wheel image.

Topic: ci-ray-wheel-wanda
Relative: ray-wheel-wanda

* Adds script to take prebuilt wheel image, and extract
* Ports current wheel build+upload to use prebuilt wheel image

Topic: ci-ray-wheel-wanda
Relative: ray-wheel-wanda
Signed-off-by: andrew <[email protected]>

andrew-anyscale · 2025-12-18T19:28:05Z

Reviews in this chain:
└#59555 [ci] [local] Add wanda definition for ray, ray-cpp whl
　├#59557 [ci] add ray-cpp-wheel-build, ray-wheel-build to build+upload
　└#59558 [ci] [local] Add wanda definition for ray-image-cpu, cuda

andrew-anyscale · 2025-12-18T19:28:05Z

#	head	base	diff	date	summary
0	5eea6a7b	d5acb37b	diff	Dec 18 19:28 PM	4 files changed, 580 insertions(+), 3 deletions(-)

gemini-code-assist

Code Review

This pull request refactors the wheel building and uploading process to use pre-built images from a 'wanda' cache. It introduces a new script, extract_wanda_wheel.py, to pull these images and extract the wheel files. The changes look good and the new script is well-tested, though I have a few suggestions. I've pointed out some significant duplication in the Buildkite pipeline configuration that could be reduced with YAML anchors. I also found a potential issue with error handling in the new extraction script and a flaw in one of the new tests where the mock was doing the work of the code it was supposed to be testing. Addressing these points will improve the maintainability and robustness of the new CI steps.

gemini-code-assist · 2025-12-18T19:30:02Z

ci/ray_ci/automation/test_extract_wanda_wheel.py

+                elif args[0:2] == ["docker", "cp"]:
+                    # Simulate copying by actually copying the fake wheel
+                    import shutil
+
+                    for whl in docker_temp.glob("*.whl"):
+                        shutil.copy2(whl, output_dir / whl.name)
+                    result.returncode = 0
+                    result.stdout = ""
+                    result.stderr = ""


The mock for subprocess.run in test_extract_wheel_success doesn't correctly simulate the behavior of docker cp. The code under test (_extract_wheel) calls docker cp to copy files into a temporary directory (temp_path), but your mock bypasses this and copies the fake wheel file directly into the final output_dir.

This means the logic inside _extract_wheel that is supposed to find the wheel in temp_path and copy it to output_dir is never actually executed or tested. The test passes because the mock is doing the work that the function under test is supposed to do.

The mock for docker cp should copy the file to the destination specified in the docker cp command arguments, which is temp_path in the function under test.

Suggested change

-                elif args[0:2] == ["docker", "cp"]:
-                    # Simulate copying by actually copying the fake wheel
-                    import shutil
-
-                    for whl in docker_temp.glob("*.whl"):
-                        shutil.copy2(whl, output_dir / whl.name)
-                    result.returncode = 0
-                    result.stdout = ""
-                    result.stderr = ""
+                elif args[0:2] == ["docker", "cp"]:
+                    # Simulate copying by actually copying the fake wheel
+                    # to the destination path provided in the command.
+                    import shutil
+
+                    dest_path = Path(args[3])
+                    for whl in docker_temp.glob("*.whl"):
+                        shutil.copy2(whl, dest_path / whl.name)
+                    result.returncode = 0
+                    result.stdout = ""
+                    result.stderr = ""

gemini-code-assist · 2025-12-18T19:30:02Z

.buildkite/build.rayci.yml

+  - name: ray-wheel-build
+    label: "wanda: wheel py{{matrix}} (x86_64)"
+    wanda: ci/docker/ray-wheel.wanda.yaml
+    matrix:
+      - "3.10"
+      - "3.11"
+      - "3.12"
+      - "3.13"
+    env:
+      PYTHON_VERSION: "{{matrix}}"
+      ARCH_SUFFIX: ""
+      HOSTTYPE: "x86_64"
+      MANYLINUX_VERSION: "251216.3835fc5"
+    tags:
+      - release_wheels
+      - linux_wheels
+      - oss
+    depends_on:
+      - ray-core-build
+      - ray-dashboard-build
+      - ray-java-build
+
+  - name: ray-cpp-core-build
+    label: "wanda: cpp core py{{matrix}} (x86_64)"
+    wanda: ci/docker/ray-cpp-core.wanda.yaml
+    matrix:
+      - "3.10"
+      - "3.11"
+      - "3.12"
+      - "3.13"
+    env:
+      PYTHON_VERSION: "{{matrix}}"
+      ARCH_SUFFIX: ""
+      HOSTTYPE: "x86_64"
+      MANYLINUX_VERSION: "251216.3835fc5"
+    tags:
+      - release_wheels
+      - oss
+    depends_on:
+      - ray-core-build
+
+  - name: ray-cpp-wheel-build
+    label: "wanda: cpp wheel py{{matrix}} (x86_64)"
+    wanda: ci/docker/ray-cpp-wheel.wanda.yaml
+    matrix:
+      - "3.10"
+      - "3.11"
+      - "3.12"
+      - "3.13"
+    env:
+      PYTHON_VERSION: "{{matrix}}"
+      ARCH_SUFFIX: ""
+      HOSTTYPE: "x86_64"
+      MANYLINUX_VERSION: "251216.3835fc5"
+    tags:
+      - release_wheels
+      - oss
+    depends_on:
+      - ray-wheel-build
+      - ray-cpp-core-build


The steps ray-wheel-build, ray-cpp-core-build, and ray-cpp-wheel-build share a lot of configuration, specifically the matrix and env blocks. This duplication can make the file harder to maintain. For example, if you need to add or remove a Python version, you have to do it in three different places.

You can use YAML anchors and aliases to define these blocks once and reuse them, which would make this file more DRY (Don't Repeat Yourself).

Example:

    x-python-matrix: &python-matrix
      - "3.10"
      - "3.11"
      - "3.12"
      - "3.13"

    x-default-env: &default-env
      PYTHON_VERSION: "{{matrix}}"
      ARCH_SUFFIX: ""
      HOSTTYPE: "x86_64"
      MANYLINUX_VERSION: "251216.3835fc5"

    steps:
      # ...
      - name: ray-wheel-build
        # ...
        matrix: *python-matrix
        env: *default-env
      # ...
      - name: ray-cpp-core-build
        # ...
        matrix: *python-matrix
        env: *default-env
      # ...
      - name: ray-cpp-wheel-build
        # ...
        matrix: *python-matrix
        env: *default-env
      # ...
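Whether the rayci pipeline parser accepts top-level `x-` keys and aliases is not confirmed in this thread. As a hedge that works either way, a small stdlib-only check could assert that the three steps keep identical `matrix` and `env` blocks. This is a sketch, not YAML anchors themselves: it assumes the pipeline has already been parsed into Python dicts, and `find_config_drift` is a hypothetical name.

```python
from typing import Any

# Blocks that the review comment says are duplicated across steps.
SHARED_KEYS = ("matrix", "env")


def find_config_drift(
    steps: list[dict[str, Any]], names: list[str]
) -> list[str]:
    """Return messages for steps whose shared blocks differ from the first one.

    Hypothetical helper: `steps` is the already-parsed list of pipeline steps,
    `names` selects the steps that are expected to stay in sync.
    """
    selected = [s for s in steps if s.get("name") in names]
    if not selected:
        return []
    reference = selected[0]
    problems = []
    for step in selected[1:]:
        for key in SHARED_KEYS:
            if step.get(key) != reference.get(key):
                problems.append(
                    f"{step['name']}: '{key}' differs from {reference['name']}"
                )
    return problems
```

An empty result means the selected steps agree; any message points at the step and block that drifted.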

gemini-code-assist · 2025-12-18T19:30:02Z

ci/ray_ci/automation/extract_wanda_wheel.py

+        subprocess.run(
+            ["docker", "rm", container_id],
+            capture_output=True,
+        )


The subprocess.run call to remove the docker container uses capture_output=True but does not check for errors. If docker rm fails, the error will be suppressed and the script will continue, potentially leaving dangling containers behind. This could consume resources on the build agent over time. It's safer to ensure the command succeeds by adding check=True.

Suggested change

-        subprocess.run(
-            ["docker", "rm", container_id],
-            capture_output=True,
-        )
+        subprocess.run(
+            ["docker", "rm", container_id],
+            capture_output=True,
+            check=True,
+        )
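One design consideration: if this `docker rm` call runs in a cleanup path such as a `finally` block (an assumption, since the surrounding code is not shown in this comment), `check=True` would raise `CalledProcessError` there and could mask an earlier extraction error. A sketch of an alternative that surfaces the failure in the logs without raising follows; `remove_container` and its `run` parameter are hypothetical names, not part of the PR.

```python
import logging
import subprocess
from typing import Callable

logger = logging.getLogger(__name__)


def remove_container(
    container_id: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Remove a container; log (rather than raise) if `docker rm` fails.

    Hypothetical helper: `run` is injectable so the sketch can be exercised
    without a Docker daemon.
    """
    result = run(
        ["docker", "rm", container_id],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Report the failure so dangling containers are visible in CI logs.
        logger.warning(
            "docker rm %s failed (exit %d): %s",
            container_id,
            result.returncode,
            result.stderr.strip(),
        )
        return False
    return True
```

The caller can still decide to fail the step on a `False` return; the point is only that the decision is explicit rather than suppressed.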

cursor · 2025-12-18T19:38:33Z

ci/ray_ci/automation/test_extract_wanda_wheel.py

+                    import shutil
+
+                    for whl in docker_temp.glob("*.whl"):
+                        shutil.copy2(whl, output_dir / whl.name)


Bug: Test mock copies to wrong directory, bypassing extraction logic

The test mock for docker cp copies wheel files directly to output_dir, but the actual _extract_wheel function copies container contents to an internal temp_path, then searches that path with temp_path.rglob("*.whl") and copies found wheels to output_dir. Since the mock bypasses temp_path entirely, the production code's glob-and-copy loop never actually executes during the test, meaning the core extraction logic is untested. The test passes only because the mock directly populates output_dir, giving false confidence in code coverage.
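A self-contained sketch of the fix, under stated assumptions: `extract_wheel` below is a hypothetical, simplified stand-in that only mirrors the behavior described in this comment (`docker cp` into a temp dir, `rglob("*.whl")`, copy to `output_dir`); it is not the PR's `_extract_wheel`. The fake `docker cp` writes to the destination given in the command arguments, so the glob-and-copy loop actually runs.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from unittest import mock


def extract_wheel(container_id: str, output_dir: Path) -> int:
    """Hypothetical stand-in for _extract_wheel; returns the wheel count."""
    with tempfile.TemporaryDirectory() as tmp:
        temp_path = Path(tmp)
        subprocess.run(
            ["docker", "cp", f"{container_id}:/", str(temp_path)],
            capture_output=True,
            check=True,
        )
        count = 0
        for whl in temp_path.rglob("*.whl"):
            shutil.copy2(whl, output_dir / whl.name)
            count += 1
        return count


def make_fake_run(docker_temp: Path):
    """Build a subprocess.run replacement that honors docker cp's destination."""

    def fake_run(args, **kwargs):
        result = mock.Mock(returncode=0, stdout="", stderr="")
        if args[0:2] == ["docker", "cp"]:
            dest_path = Path(args[3])  # destination argument of `docker cp`
            for whl in docker_temp.glob("*.whl"):
                shutil.copy2(whl, dest_path / whl.name)
        return result

    return fake_run


def run_demo() -> list[str]:
    """Run the stand-in against the fake and list what landed in output_dir."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
        docker_temp, output_dir = Path(src), Path(out)
        (docker_temp / "ray-0.0.0-py3-none-any.whl").write_bytes(b"fake")
        with mock.patch("subprocess.run", side_effect=make_fake_run(docker_temp)):
            extract_wheel("abc123", output_dir)
        return sorted(p.name for p in output_dir.iterdir())
```

With this shape, a wheel can only reach `output_dir` through the function under test, so a regression in the glob-and-copy loop would make the test fail.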

cursor · 2025-12-18T19:38:33Z

ci/ray_ci/automation/extract_wanda_wheel.py

+                wheel_count += 1
+
+            if wheel_count == 0:
+                logger.warning(f"  No wheel files found in {image_name}")


Bug: Script succeeds silently when no wheels are extracted

When _extract_wheel finds no .whl files in the container image, it only logs a warning and continues. Similarly, when main completes with an empty output directory, it logs a warning but exits successfully with code 0. This could cause silent CI failures where the subsequent copy_build_artifacts.sh wheel step runs against an empty .whl directory, potentially uploading nothing without failing the build.

Additional Locations (1)

ci/ray_ci/automation/extract_wanda_wheel.py#L218-L220
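A minimal sketch of making the empty case fatal, assuming the script can simply return a non-zero exit code when nothing was extracted. `require_wheels` and this `main` are hypothetical and do not reproduce the PR's actual functions.

```python
import sys
from pathlib import Path


def require_wheels(output_dir: Path) -> list[Path]:
    """Return the extracted wheels, or raise if the directory has none.

    Hypothetical helper, not part of the PR.
    """
    wheels = sorted(output_dir.glob("*.whl"))
    if not wheels:
        raise FileNotFoundError(f"No wheel files found in {output_dir}")
    return wheels


def main(output_dir: Path) -> int:
    """Exit-code wrapper: 1 when no wheels were extracted, else 0."""
    try:
        wheels = require_wheels(output_dir)
    except FileNotFoundError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(f"Extracted {len(wheels)} wheel(s)")
    return 0
```

Called as `sys.exit(main(Path(".whl")))`, an empty directory would fail the CI step before the upload step runs.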

[ci] add ray-cpp-wheel-build, ray-wheel-build to build+upload

5eea6a7

andrew-anyscale requested a review from a team as a code owner December 18, 2025 19:28

andrew-anyscale mentioned this pull request Dec 18, 2025

[ci] [local] Add wanda definition for ray-image-cpu, cuda #59558

Open

andrew-anyscale added the ci label Dec 18, 2025

andrew-anyscale mentioned this pull request Dec 18, 2025

[ci] [local] Add wanda definition for ray, ray-cpp whl #59555

Open

gemini-code-assist bot reviewed Dec 18, 2025

cursor bot reviewed Dec 18, 2025

ray-gardener bot added the devprod label Dec 19, 2025
