[ci] add ray-cpp-wheel-build, ray-wheel-build to build+upload #59557
base: andrew/revup/master/ray-wheel-wanda
* Adds a script that takes a prebuilt wheel image and extracts the wheel files from it
* Ports the current wheel build+upload to use the prebuilt wheel image

Topic: ci-ray-wheel-wanda
Relative: ray-wheel-wanda
Signed-off-by: andrew <[email protected]>
Code Review
This pull request refactors the wheel building and uploading process to use pre-built images from a 'wanda' cache. It introduces a new script, extract_wanda_wheel.py, to pull these images and extract the wheel files. The changes look good and the new script is well-tested, though I have a few suggestions. I've pointed out some significant duplication in the Buildkite pipeline configuration that could be reduced with YAML anchors. I also found a potential issue with error handling in the new extraction script and a flaw in one of the new tests where the mock was doing the work of the code it was supposed to be testing. Addressing these points will improve the maintainability and robustness of the new CI steps.
    elif args[0:2] == ["docker", "cp"]:
        # Simulate copying by actually copying the fake wheel
        import shutil

        for whl in docker_temp.glob("*.whl"):
            shutil.copy2(whl, output_dir / whl.name)
        result.returncode = 0
        result.stdout = ""
        result.stderr = ""
The mock for subprocess.run in test_extract_wheel_success doesn't correctly simulate the behavior of docker cp. The code under test (_extract_wheel) calls docker cp to copy files into a temporary directory (temp_path), but your mock bypasses this and copies the fake wheel file directly into the final output_dir.
This means the logic inside _extract_wheel that is supposed to find the wheel in temp_path and copy it to output_dir is never actually executed or tested. The test passes because the mock is doing the work that the function under test is supposed to do.
The mock for docker cp should copy the file to the destination specified in the docker cp command arguments, which is temp_path in the function under test.
Suggested change:
-    elif args[0:2] == ["docker", "cp"]:
-        # Simulate copying by actually copying the fake wheel
-        import shutil
-        for whl in docker_temp.glob("*.whl"):
-            shutil.copy2(whl, output_dir / whl.name)
-        result.returncode = 0
-        result.stdout = ""
-        result.stderr = ""
+    elif args[0:2] == ["docker", "cp"]:
+        # Simulate copying by actually copying the fake wheel
+        # to the destination path provided in the command.
+        import shutil
+        dest_path = Path(args[3])
+        for whl in docker_temp.glob("*.whl"):
+            shutil.copy2(whl, dest_path / whl.name)
+        result.returncode = 0
+        result.stdout = ""
+        result.stderr = ""
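The effect of the suggestion can be checked in isolation. The following is a minimal sketch, not the PR's code: `extract_wheels` is a hypothetical stand-in for `_extract_wheel` (whose body is not shown here), assumed to run `docker cp` into a temporary directory and then copy any wheels it finds into `output_dir`. The wheel filename is also made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from unittest import mock


def extract_wheels(container_id: str, output_dir: Path) -> int:
    """Hypothetical stand-in for _extract_wheel (assumed shape, not the real code)."""
    with tempfile.TemporaryDirectory() as tmp:
        temp_path = Path(tmp)
        # The destination is the 4th argv element, i.e. args[3] in the mock.
        subprocess.run(
            ["docker", "cp", f"{container_id}:/", str(temp_path)],
            check=True,
            capture_output=True,
        )
        count = 0
        # This glob-and-copy loop is the logic the test is meant to exercise.
        for whl in temp_path.rglob("*.whl"):
            shutil.copy2(whl, output_dir / whl.name)
            count += 1
        return count


def make_fake_run(fake_wheel_dir: Path):
    """Build a subprocess.run replacement that honours the docker cp destination."""

    def fake_run(args, **kwargs):
        if args[0:2] == ["docker", "cp"]:
            dest_path = Path(args[3])
            for whl in fake_wheel_dir.glob("*.whl"):
                shutil.copy2(whl, dest_path / whl.name)
        return subprocess.CompletedProcess(args, 0, stdout="", stderr="")

    return fake_run


def run_demo() -> list:
    """Run the stand-in under the mock and return the filenames left in output_dir."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
        (Path(src) / "ray-0.0.0-py3-none-any.whl").write_text("fake")
        with mock.patch("subprocess.run", side_effect=make_fake_run(Path(src))):
            extract_wheels("fake-container", Path(out))
        return sorted(p.name for p in Path(out).iterdir())
```

Because the mock only writes to the `docker cp` destination, the wheel can reach `output_dir` only through the function's own glob-and-copy loop, so deleting that loop would make the test fail.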
  - name: ray-wheel-build
    label: "wanda: wheel py{{matrix}} (x86_64)"
    wanda: ci/docker/ray-wheel.wanda.yaml
    matrix:
      - "3.10"
      - "3.11"
      - "3.12"
      - "3.13"
    env:
      PYTHON_VERSION: "{{matrix}}"
      ARCH_SUFFIX: ""
      HOSTTYPE: "x86_64"
      MANYLINUX_VERSION: "251216.3835fc5"
    tags:
      - release_wheels
      - linux_wheels
      - oss
    depends_on:
      - ray-core-build
      - ray-dashboard-build
      - ray-java-build

  - name: ray-cpp-core-build
    label: "wanda: cpp core py{{matrix}} (x86_64)"
    wanda: ci/docker/ray-cpp-core.wanda.yaml
    matrix:
      - "3.10"
      - "3.11"
      - "3.12"
      - "3.13"
    env:
      PYTHON_VERSION: "{{matrix}}"
      ARCH_SUFFIX: ""
      HOSTTYPE: "x86_64"
      MANYLINUX_VERSION: "251216.3835fc5"
    tags:
      - release_wheels
      - oss
    depends_on:
      - ray-core-build

  - name: ray-cpp-wheel-build
    label: "wanda: cpp wheel py{{matrix}} (x86_64)"
    wanda: ci/docker/ray-cpp-wheel.wanda.yaml
    matrix:
      - "3.10"
      - "3.11"
      - "3.12"
      - "3.13"
    env:
      PYTHON_VERSION: "{{matrix}}"
      ARCH_SUFFIX: ""
      HOSTTYPE: "x86_64"
      MANYLINUX_VERSION: "251216.3835fc5"
    tags:
      - release_wheels
      - oss
    depends_on:
      - ray-wheel-build
      - ray-cpp-core-build
The steps ray-wheel-build, ray-cpp-core-build, and ray-cpp-wheel-build share a lot of configuration, specifically the matrix and env blocks. This duplication can make the file harder to maintain. For example, if you need to add or remove a Python version, you have to do it in three different places.
You can use YAML anchors and aliases to define these blocks once and reuse them, which would make this file more DRY (Don't Repeat Yourself).
Example:
x-python-matrix: &python-matrix
  - "3.10"
  - "3.11"
  - "3.12"
  - "3.13"
x-default-env: &default-env
  PYTHON_VERSION: "{{matrix}}"
  ARCH_SUFFIX: ""
  HOSTTYPE: "x86_64"
  MANYLINUX_VERSION: "251216.3835fc5"
steps:
  # ...
  - name: ray-wheel-build
    # ...
    matrix: *python-matrix
    env: *default-env
  # ...
  - name: ray-cpp-core-build
    # ...
    matrix: *python-matrix
    env: *default-env
  # ...
  - name: ray-cpp-wheel-build
    # ...
    matrix: *python-matrix
    env: *default-env
  # ...

    subprocess.run(
        ["docker", "rm", container_id],
        capture_output=True,
    )
The subprocess.run call to remove the docker container uses capture_output=True but does not check for errors. If docker rm fails, the error will be suppressed and the script will continue, potentially leaving dangling containers behind. This could consume resources on the build agent over time. It's safer to ensure the command succeeds by adding check=True.
Suggested change:
-    subprocess.run(
-        ["docker", "rm", container_id],
-        capture_output=True,
-    )
+    subprocess.run(
+        ["docker", "rm", container_id],
+        capture_output=True,
+        check=True,
+    )
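One thing to keep in mind with `check=True` together with `capture_output=True`: the raised `subprocess.CalledProcessError` carries the captured stderr, but nothing prints it unless the caller does. A minimal sketch of a cleanup helper that surfaces it follows; `remove_container` and its `docker_cmd` parameter are hypothetical names introduced here (the parameter exists only so the helper can be exercised without Docker), not part of the PR.

```python
import subprocess


def remove_container(container_id: str, docker_cmd=("docker",)) -> None:
    """Remove a container and fail loudly, including stderr, if removal fails.

    docker_cmd is a hypothetical injection point for tests; production
    callers would leave it at its default.
    """
    cmd = [*docker_cmd, "rm", container_id]
    try:
        subprocess.run(cmd, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as exc:
        # With capture_output=True the stderr is only available on the
        # exception, so include it in the error message explicitly.
        raise RuntimeError(
            f"failed to remove container {container_id} "
            f"(exit code {exc.returncode}): {exc.stderr.strip()}"
        ) from exc
```

Whether a failed cleanup should abort the script or only be logged is a design choice for the author; the sketch takes the stricter option that the suggestion implies.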
    import shutil

    for whl in docker_temp.glob("*.whl"):
        shutil.copy2(whl, output_dir / whl.name)
Bug: Test mock copies to wrong directory, bypassing extraction logic
The test mock for docker cp copies wheel files directly to output_dir, but the actual _extract_wheel function copies container contents to an internal temp_path, then searches that path with temp_path.rglob("*.whl") and copies found wheels to output_dir. Since the mock bypasses temp_path entirely, the production code's glob-and-copy loop never actually executes during the test, meaning the core extraction logic is untested. The test passes only because the mock directly populates output_dir, giving false confidence in code coverage.
        wheel_count += 1

    if wheel_count == 0:
        logger.warning(f" No wheel files found in {image_name}")
Bug: Script succeeds silently when no wheels are extracted
When _extract_wheel finds no .whl files in the container image, it only logs a warning and continues. Similarly, when main completes with an empty output directory, it logs a warning but exits successfully with code 0. This could cause silent CI failures where the subsequent copy_build_artifacts.sh wheel step runs against an empty .whl directory, potentially uploading nothing without failing the build.
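One possible way to close this gap is to turn the empty-output case into a hard failure. The sketch below is illustrative only: `require_wheels` is a hypothetical helper, and it assumes the wheels land directly in the output directory (as with the `.whl` directory mentioned above).

```python
from pathlib import Path


def require_wheels(output_dir: Path) -> list:
    """Return the extracted wheels, or exit non-zero if there are none."""
    wheels = sorted(output_dir.glob("*.whl"))
    if not wheels:
        # SystemExit with a string prints it to stderr and exits with code 1,
        # so the CI step fails instead of uploading nothing.
        raise SystemExit(f"No wheel files found in {output_dir}")
    return wheels
```

Called at the end of `main`, a check like this would stop the pipeline before the upload step runs against an empty directory.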