[FusilliPlugin] Re-Enable build and CI and add CODEOWNERS etc.#2269
AaronStGeorge wants to merge 7 commits
jayhawk-commits left a comment
The files outside projects/fusilli-plugin directory look fine to me.
@amd-jnovotny Could somebody from docs approve this PR? This adds a new project to rocm-libraries, which includes some .md files so it pulls in the docs team.
| #===----------------------------------------------------------------------===# | ||
| # | ||
| # This script is copied from fusilli: |
Since you're downloading fusilli (at a given hash), is it possible to reuse this script from there instead of cloning it?
sjain-stanford left a comment
fusilli-plugin/ changes LGTM apart from a non-blocking comment.
PR 2 ... Note: TheRock CI will still not be enabled. TheRock CI is the main CI platform for rocm-libraries, but a project must build as part of TheRock to participate.
PR N+1 Remove CI added in PR 2 as things will now be tested through TheRock CI.
I'm okay with the end state, but this intermediate state has some details that concern me a bit.
For example, none of the other subprojects have their own dedicated workflows and I don't see this fitting into the existing gardener rotations (https://github.com/ROCm/rocm-libraries/blob/develop/docs/gardening.md) until this uses the same workflow style as other subprojects.
| defaults: | ||
| run: | ||
| # --noprofile skips loading ~/.bash_profile | ||
| # --norc skips loading ~/.bashrc | ||
| # -exo pipefail: | ||
| # -e exit immediately if any command fails | ||
| # -x print each command before executing | ||
| # -o pipefail ensure pipeline fails if any command in it fails | ||
| # {0} github actions placeholder for the command to run | ||
| shell: bash --noprofile --norc -exo pipefail {0} |
Comments should also explain why this is customized. GitHub Actions has good defaults for bash as the shell anyways, so I'm not sure what this is changing.
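To make the question concrete, here is a minimal sketch of what `-o pipefail` changes, runnable outside of any workflow. As far as I recall from the GitHub Actions documentation, an explicit `shell: bash` already expands to `bash --noprofile --norc -eo pipefail {0}`, which would make `-x` the only addition here; treat that as an assumption to verify rather than a statement about this workflow.

```shell
#!/usr/bin/env bash
# Compare the exit status of a failing pipeline with and without pipefail.
# Each case runs in its own child bash so the options do not leak between them.

# Without pipefail: the pipeline reports the status of its last command (true).
status_without=$(bash --noprofile --norc -c 'false | true; echo "$?"')

# With pipefail: the pipeline reports the status of the failing command (false).
status_with=$(bash --noprofile --norc -o pipefail -c 'false | true; echo "$?"')

echo "without pipefail: ${status_without}"
echo "with pipefail:    ${status_with}"
```

The first status is 0 and the second is 1, so a step such as `build | tee log.txt` only fails the job on a build error when pipefail is set.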
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| name: ["gfx942_clang20_debug"] | ||
| include: | ||
| - name: gfx942_clang20_debug |
Why is this using a matrix with only one element?
| - name: build fusilli-plugin | ||
| # The docker container mounts the working directory as a volume, so we | ||
| # must run from github.workspace to ensure fusilli-plugin can reach into | ||
| # ../sharkfuser for its fusilli dependency. | ||
| working-directory: ${{ github.workspace }} | ||
| run: | | ||
| ${{ env.FUSILLI_PLUGIN_DIR }}/build_tools/docker/exec_docker_ci.sh \ | ||
| bash -c "cd projects/fusilli-plugin && \ | ||
| cmake -GNinja -S. -Bbuild \ | ||
| ${{ matrix.fusilli-plugin-cmake-options }} && \ | ||
| cmake --build build --target all" |
Requiring Docker specific code with bundled source/dependencies for building is a huge red flag to me.
- Does this project have plans for Windows support?
- What is in the dockerfile?
- Can the entire workflow run in a container instead of just this specific step via a script?
- Can the build run on a CPU machine and tests run on a GPU machine?
| # ROCm requires accesses to the host's /dev/kfd and /dev/dri/* device nodes, typically | ||
| # owned by the `render` and `video` groups. The groups' GIDs in the container must | ||
| # match the host's to access the resources. Sometimes the device nodes may be owned by | ||
| # dynamic GIDs (that don't belong to the `render` or `video` groups). So instead of | ||
| # adding user to the GIDs of named groups (obtained from `getent group render` or | ||
| # `getent group video`), we simply check the owning GID of the device nodes on the host | ||
| # and pass it to `docker run` with `--group-add=<GID>`. | ||
| for DEV in /dev/kfd /dev/dri/*; do |
Relating to my prior comment, there should be a separation between build and test containers and requirements. Mixing the two risks release builds generating artifacts that are not portable.
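For readers following along, the device-node logic described in the quoted comment could look roughly like the sketch below. This is not the script from the PR (only its first loop line is shown above); `group_add_opts` is a hypothetical helper name, and the sketch assumes GNU coreutils `stat`.

```shell
#!/usr/bin/env bash
# Build `--group-add=<GID>` options from the owning GIDs of device nodes.
# Hypothetical helper, not the PR's script; assumes GNU coreutils `stat`.
group_add_opts() {
  local dev gid opts=""
  for dev in "$@"; do
    # Skip unmatched globs and nodes that do not exist on this host.
    [ -e "${dev}" ] || continue
    # Owning group ID of the node, whatever group name (if any) it maps to.
    gid=$(stat -c '%g' "${dev}")
    # Add each GID only once.
    case " ${opts} " in
      *" --group-add=${gid} "*) ;;
      *) opts="${opts:+${opts} }--group-add=${gid}" ;;
    esac
  done
  printf '%s\n' "${opts}"
}

# Usage on a ROCm host (prints an empty line if the nodes are absent):
group_add_opts /dev/kfd /dev/dri/*
```

Because these options only matter for running on a GPU, they would belong to a test container invocation and not to a build-only one, which is the separation the comment above asks for.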
| -v "${PWD}":/workspace \ | ||
| ${DOCKER_RUN_DEVICE_OPTS} \ | ||
| --security-opt seccomp=unconfined \ | ||
| ghcr.io/sjain-stanford/compiler-dev-ubuntu-24.04:main@sha256:d52a5eb21ce21509f5fd1064074ba34f7ad8810c5d5c6caff9790149c8e05b3c \ |
No using dockerfiles from user repositories. This needs to be moved to a shared location if used at all.
To be stronger on this, this is a red flag which prevents merging this as is.
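If a rule like this were to be enforced mechanically, a check along the following lines might work. It is only a sketch: `ALLOWED_PREFIX` and `image_ref_ok` are hypothetical names, and `ghcr.io/example-org/` is a placeholder, not an agreed shared location.

```shell
#!/usr/bin/env bash
# Hypothetical policy check: accept only digest-pinned images that live under
# an allowed registry namespace. ALLOWED_PREFIX is a placeholder, not a policy.
ALLOWED_PREFIX="ghcr.io/example-org/"

image_ref_ok() {
  local ref="$1"
  local digest_re='@sha256:[0-9a-f]{64}$'
  # The image must live under the allowed namespace.
  [[ "${ref}" == "${ALLOWED_PREFIX}"* ]] || return 1
  # The image must be pinned to a sha256 digest (64 hex characters).
  [[ "${ref}" =~ ${digest_re} ]] || return 1
  return 0
}
```

With this placeholder prefix, the image reference quoted above would be rejected for its namespace even though it is already pinned by digest.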
| # Options | ||
| option(FUSILLI_PLUGIN_USE_LOCAL_FUSILLI "Use local Fusilli build from ../sharkfuser" ON) | ||
| set(FUSILLI_HASH "5970f13835942213c7d4fd6171b936ee9fe9be73" CACHE STRING "Git hash for Fusilli") | ||
| set(HIP_DNN_HASH "4e0a0452cfcb8fdb86e9c40a6e43debab4d4ecbc" CACHE STRING "Git hash for hipDNN") |
While in this superrepo, this must depend on the source in https://github.com/ROCm/rocm-libraries/tree/develop/projects/hipdnn, not fetch from a fixed commit hash.
I strongly agree with Scott here. We cannot have another project that pulls its own version from source sitting next to it. Both need to work at HEAD.
Is a
FetchContent_Declare(
  hipdnn
  SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/../hipdnn
)
acceptable? Once fusilli-plugin is fully integrated into TheRock, the plan was to use find_package. IIUC, find_package + TheRock dependency providers ensure that the build uses the hipDNN built as part of TheRock. This was an intermediate state.
Most (all?) other licenses in rocm-libraries subprojects are MIT. Should we pursue a re-licensing of this project from "Apache v2 with LLVM Exceptions" to MIT?
This indeed needs to be clarified. Same applies to the workflow file added as part of the PR and licensed under Apache 2.0
Apache 2.0 W LLVM exceptions is marked "pre-approved for use in AMD projects" on the OSS License List so I think it's fine. That being said, with only three contributors it would be quite easy to re-license so we might want to move towards MIT for consistency.
marbre left a comment
I agree with Scott and see (the same) red flags which need to be addressed before this can be merged.
| # Licensed under the Apache License v2.0 with LLVM Exceptions. | ||
| # See https://llvm.org/LICENSE.txt for license information. | ||
| # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
Not sure this should be the default license for workflows.
@jayhawk-commits Done
@marbre @ScottTodd thanks for the review! Most of the blocking comments seem to be on temporary pieces intended to keep previous (non-TheRock-based) CI during the transition into TheRock. I think the simplest path forward is simply to go without CI while in a transitionary state. Effort that can be put into fixing intermediate state could be better allocated towards getting to the desired end state where fusilli-plugin is part of TheRock build and enabled through TheRock CI.
@AaronStGeorge let's drop the intermediate CI entirely and skip to working towards the end state. No point trying to get to a better intermediate state that is transient / throw-away anyway. Please remove the CI bits from here and let's try to land this soon.
RFC0004-Fusilli-IREE-Kernel-Provider-hipDNN decided that fusilli plugin will be hosted in fusilli repo rather than rocm-libraries.
Fusilli removal from rocm-libraries will happen in #2777
Motivation
As part of the effort to create an IREE/Fusilli-based hipDNN plugin in TheRock, this PR enables GitHub Actions based CI for fusilli-plugin, adds a CODEOWNERS section, and updates the label mapping for the PR labeling bot.
Technical Details
This PR is part of a series that will move the initial efforts toward bringing an IREE/Fusilli-based hipDNN plugin into TheRock (see RFC0004-Fusilli-IREE-Kernel-Provider-hipDNN.md):
- fusilli-plugin: shark-ai -> rocm-libraries #2252
- CODEOWNERS etc. #2269 (this PR)
Test Plan
CI now runs fusilli-plugin CI.
Test Result
CI looks green!
Submission Checklist