@andrew-anyscale
Contributor

  • Adds a script that takes a prebuilt wheel image and extracts the wheel files from it
  • Ports the current wheel build+upload steps to use the prebuilt wheel image

Topic: ci-ray-wheel-wanda
Relative: ray-wheel-wanda

Signed-off-by: andrew [email protected]

@andrew-anyscale andrew-anyscale requested a review from a team as a code owner December 18, 2025 19:28
@andrew-anyscale
Contributor Author

Reviews in this chain:
#59555 [ci] [local] Add wanda definition for ray, ray-cpp whl
 ├#59557 [ci] add ray-cpp-wheel-build, ray-wheel-build to build+upload
 └#59558 [ci] [local] Add wanda definition for ray-image-cpu, cuda

@andrew-anyscale
Contributor Author

| # | head | base | diff | date | summary |
|---|------|------|------|------|---------|
| 0 | 5eea6a7b | d5acb37b | diff | Dec 18 19:28 PM | 4 files changed, 580 insertions(+), 3 deletions(-) |

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the wheel building and uploading process to use pre-built images from a 'wanda' cache. It introduces a new script, extract_wanda_wheel.py, to pull these images and extract the wheel files. The changes look good and the new script is well-tested, though I have a few suggestions. I've pointed out some significant duplication in the Buildkite pipeline configuration that could be reduced with YAML anchors. I also found a potential issue with error handling in the new extraction script and a flaw in one of the new tests where the mock was doing the work of the code it was supposed to be testing. Addressing these points will improve the maintainability and robustness of the new CI steps.

Comment on lines +63 to +71
elif args[0:2] == ["docker", "cp"]:
    # Simulate copying by actually copying the fake wheel
    import shutil

    for whl in docker_temp.glob("*.whl"):
        shutil.copy2(whl, output_dir / whl.name)
    result.returncode = 0
    result.stdout = ""
    result.stderr = ""

high

The mock for subprocess.run in test_extract_wheel_success doesn't correctly simulate the behavior of docker cp. The code under test (_extract_wheel) calls docker cp to copy files into a temporary directory (temp_path), but your mock bypasses this and copies the fake wheel file directly into the final output_dir.

This means the logic inside _extract_wheel that is supposed to find the wheel in temp_path and copy it to output_dir is never actually executed or tested. The test passes because the mock is doing the work that the function under test is supposed to do.

The mock for docker cp should copy the file to the destination specified in the docker cp command arguments, which is temp_path in the function under test.

Suggested change
- elif args[0:2] == ["docker", "cp"]:
-     # Simulate copying by actually copying the fake wheel
-     import shutil
-     for whl in docker_temp.glob("*.whl"):
-         shutil.copy2(whl, output_dir / whl.name)
-     result.returncode = 0
-     result.stdout = ""
-     result.stderr = ""
+ elif args[0:2] == ["docker", "cp"]:
+     # Simulate copying by actually copying the fake wheel
+     # to the destination path provided in the command.
+     import shutil
+     dest_path = Path(args[3])
+     for whl in docker_temp.glob("*.whl"):
+         shutil.copy2(whl, dest_path / whl.name)
+     result.returncode = 0
+     result.stdout = ""
+     result.stderr = ""
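
To illustrate the point, here is a minimal, self-contained sketch of a test in which the mock honors the `docker cp` destination, so the glob-and-copy loop in the function under test actually runs. The function `extract_wheels`, the `docker create` step, the container path `/`, and the fake wheel name are assumptions made for illustration; they stand in for `_extract_wheel` and are not taken from the PR.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from unittest import mock


def extract_wheels(image: str, output_dir: Path) -> int:
    """Hypothetical stand-in for _extract_wheel (not the PR's code)."""
    container_id = subprocess.run(
        ["docker", "create", image], capture_output=True, text=True, check=True
    ).stdout.strip()
    count = 0
    with tempfile.TemporaryDirectory() as tmp:
        temp_path = Path(tmp)
        # Copy container contents into a private temp dir (path is assumed).
        subprocess.run(
            ["docker", "cp", f"{container_id}:/", str(temp_path)],
            capture_output=True,
            check=True,
        )
        # This is the logic the original mock bypassed.
        for whl in temp_path.rglob("*.whl"):
            shutil.copy2(whl, output_dir / whl.name)
            count += 1
    subprocess.run(["docker", "rm", container_id], capture_output=True)
    return count


def make_fake_run(fake_wheel: Path):
    """Build a subprocess.run replacement that writes to the cp destination."""

    def fake_run(args, **kwargs):
        result = mock.MagicMock(returncode=0, stdout="", stderr="")
        if args[0:2] == ["docker", "create"]:
            result.stdout = "fake-container-id\n"
        elif args[0:2] == ["docker", "cp"]:
            # args[3] is the destination given on the command line.
            shutil.copy2(fake_wheel, Path(args[3]) / fake_wheel.name)
        return result

    return fake_run


def run_demo() -> list:
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
        fake_wheel = Path(src) / "ray-0.0.0-cp310-cp310-manylinux2014_x86_64.whl"
        fake_wheel.write_bytes(b"fake")
        with mock.patch("subprocess.run", side_effect=make_fake_run(fake_wheel)):
            count = extract_wheels("fake-image", Path(out))
        return [count, sorted(p.name for p in Path(out).iterdir())]
```

With this shape, the wheel only reaches the output directory if the function under test finds it in its temp dir and copies it, so a regression in that loop would fail the test.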

Comment on lines +38 to +97
  - name: ray-wheel-build
    label: "wanda: wheel py{{matrix}} (x86_64)"
    wanda: ci/docker/ray-wheel.wanda.yaml
    matrix:
      - "3.10"
      - "3.11"
      - "3.12"
      - "3.13"
    env:
      PYTHON_VERSION: "{{matrix}}"
      ARCH_SUFFIX: ""
      HOSTTYPE: "x86_64"
      MANYLINUX_VERSION: "251216.3835fc5"
    tags:
      - release_wheels
      - linux_wheels
      - oss
    depends_on:
      - ray-core-build
      - ray-dashboard-build
      - ray-java-build

  - name: ray-cpp-core-build
    label: "wanda: cpp core py{{matrix}} (x86_64)"
    wanda: ci/docker/ray-cpp-core.wanda.yaml
    matrix:
      - "3.10"
      - "3.11"
      - "3.12"
      - "3.13"
    env:
      PYTHON_VERSION: "{{matrix}}"
      ARCH_SUFFIX: ""
      HOSTTYPE: "x86_64"
      MANYLINUX_VERSION: "251216.3835fc5"
    tags:
      - release_wheels
      - oss
    depends_on:
      - ray-core-build

  - name: ray-cpp-wheel-build
    label: "wanda: cpp wheel py{{matrix}} (x86_64)"
    wanda: ci/docker/ray-cpp-wheel.wanda.yaml
    matrix:
      - "3.10"
      - "3.11"
      - "3.12"
      - "3.13"
    env:
      PYTHON_VERSION: "{{matrix}}"
      ARCH_SUFFIX: ""
      HOSTTYPE: "x86_64"
      MANYLINUX_VERSION: "251216.3835fc5"
    tags:
      - release_wheels
      - oss
    depends_on:
      - ray-wheel-build
      - ray-cpp-core-build

medium

The steps ray-wheel-build, ray-cpp-core-build, and ray-cpp-wheel-build share a lot of configuration, specifically the matrix and env blocks. This duplication can make the file harder to maintain. For example, if you need to add or remove a Python version, you have to do it in three different places.

You can use YAML anchors and aliases to define these blocks once and reuse them, which would make this file more DRY (Don't Repeat Yourself).

Example:

x-python-matrix: &python-matrix
  - "3.10"
  - "3.11"
  - "3.12"
  - "3.13"

x-default-env: &default-env
  PYTHON_VERSION: "{{matrix}}"
  ARCH_SUFFIX: ""
  HOSTTYPE: "x86_64"
  MANYLINUX_VERSION: "251216.3835fc5"

steps:
  # ...
  - name: ray-wheel-build
    # ...
    matrix: *python-matrix
    env: *default-env
    # ...
  - name: ray-cpp-core-build
    # ...
    matrix: *python-matrix
    env: *default-env
    # ...
  - name: ray-cpp-wheel-build
    # ...
    matrix: *python-matrix
    env: *default-env
    # ...

Comment on lines +105 to +108
subprocess.run(
    ["docker", "rm", container_id],
    capture_output=True,
)

medium

The subprocess.run call to remove the docker container uses capture_output=True but does not check for errors. If docker rm fails, the error will be suppressed and the script will continue, potentially leaving dangling containers behind. This could consume resources on the build agent over time. It's safer to ensure the command succeeds by adding check=True.

Suggested change
- subprocess.run(
-     ["docker", "rm", container_id],
-     capture_output=True,
- )
+ subprocess.run(
+     ["docker", "rm", container_id],
+     capture_output=True,
+     check=True,
+ )
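
If raising from the cleanup step is undesirable (for example, if the removal runs in a `finally` block, where a new exception would replace one raised during extraction), a possible alternative is to inspect the return code and log the failure. The sketch below assumes that structure, which is not confirmed by the diff shown here; the helper name `remove_container` and the injectable `run` parameter are hypothetical.

```python
import logging
import subprocess

logger = logging.getLogger(__name__)


def remove_container(container_id: str, run=subprocess.run) -> bool:
    """Remove a container; log stderr and return False if docker rm fails.

    `run` is injectable so the helper can be exercised without Docker.
    """
    result = run(["docker", "rm", container_id], capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the failure instead of silently discarding captured output.
        logger.warning(
            "docker rm %s failed: %s", container_id, result.stderr.strip()
        )
        return False
    return True
```

Either approach makes a failed `docker rm` visible; the difference is whether the failure aborts the script or is only reported.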

import shutil

for whl in docker_temp.glob("*.whl"):
    shutil.copy2(whl, output_dir / whl.name)

Bug: Test mock copies to wrong directory, bypassing extraction logic

The test mock for docker cp copies wheel files directly to output_dir, but the actual _extract_wheel function copies container contents to an internal temp_path, then searches that path with temp_path.rglob("*.whl") and copies found wheels to output_dir. Since the mock bypasses temp_path entirely, the production code's glob-and-copy loop never actually executes during the test, meaning the core extraction logic is untested. The test passes only because the mock directly populates output_dir, giving false confidence in code coverage.


    wheel_count += 1

if wheel_count == 0:
    logger.warning(f" No wheel files found in {image_name}")

Bug: Script succeeds silently when no wheels are extracted

When _extract_wheel finds no .whl files in the container image, it only logs a warning and continues. Similarly, when main completes with an empty output directory, it logs a warning but exits successfully with code 0. This could cause silent CI failures where the subsequent copy_build_artifacts.sh wheel step runs against an empty .whl directory, potentially uploading nothing without failing the build.
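
One way to make this case fail loudly is to check the output directory after extraction and exit non-zero when it holds no wheels. This is a minimal sketch; the helper name `require_wheels` is hypothetical and not part of the script under review.

```python
from pathlib import Path


def require_wheels(output_dir: Path) -> list:
    """Return the extracted wheels, or exit with an error if there are none."""
    wheels = sorted(output_dir.glob("*.whl"))
    if not wheels:
        # SystemExit with a string prints it to stderr and exits with code 1.
        raise SystemExit(f"No wheel files found in {output_dir}")
    return wheels
```

Calling such a check at the end of `main` would stop the pipeline before the upload step runs against an empty directory.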


@ray-gardener ray-gardener bot added the devprod label Dec 19, 2025