Add initial AI api-review configuration #2489
Conversation
Hello @JoelSpeed! Some important instructions when contributing to openshift/api:
Just some general questions for learning purposes.
Nothing worth blocking this on IMO, especially if you are finding it useful.
**Explanation:** [Why this change is needed]
I'll run a comprehensive API review for the OpenShift API changes in the specified GitHub PR.
This looks like something an LLM would spit out when you ask for a review - is it necessary to include this in the instructions?
Ah, this whole section looks like it is something an LLM spit out as an execution plan. Should something like this be hand-rolled with explicit instructions on how to conduct the review and important considerations?
I guess my curiosity here is if we made an explicit guidelines type document that humans could follow, an LLM should be able to follow along relatively easily and we can potentially enforce more nuanced guardrails.
Not worth blocking this on - more so asking questions for myself as I've not worked with LLMs in this capacity before.
This is an artefact of how these documents are built. Using Claude locally, I've given it text based prompts, and it has converted that into instructions that it can read. So 95% of this document is it translating my instructions and feedback into rules that it can later apply.
Hmm. That is interesting. So this file is automatically updated with more detailed instructions by Claude as the AGENTS.md file is updated?
If you update this file by hand to "improve it", what happens?
You can improve it by hand, that's also fine, and I have made a few edits here and there. But the bulk was generated (then pruned - it was more verbose), and tested again to see if it was producing good results.
is it necessary to include this in the instructions?
I've found that giving explicit examples, when you want a particular format of output, works quite well. Unfortunately, it seems best with monkey-see-monkey-do style learning in this department :(
I guess my curiosity here is if we made an explicit guidelines type document that humans could follow, an LLM should be able to follow along relatively easily and we can potentially enforce more nuanced guardrails.
I think this should work, and it's sort of what the ### API review section in AGENTS.md is, but you would still need something similar to this document for the command, to give it a guide on how you want the output and communication style, alongside extra prompts for when it fails to follow the rules...
If you update this file by hand to "improve it", what happens?
I've found that often, what makes sense to me (as a human reading the document) the AI will ignore, or it may degrade performance. Conversing with the model and asking it what seems to be the issue (why didn't you listen to my instructions?) does seem to help - but it yields different structures to what we'd normally expect, e.g. for documentation to be consumed by humans.
This does lead into one of the bigger problems with automating tasks like this, which is: how do you ensure consistent output, and avoid regressions when experimenting with prompts? You've got a probabilistic system you need to build confidence in, and don't want to rely on manual checks or 'vibes'. Evals are the way the industry seems to be moving, but the OSS tooling all seems pretty heavyweight for automating internal tasks / reviews.
One approach for API review that may work is directing the command (or a version of it) to output JSON:
```json
{
  "summary": "…",
  "issues": [
    {"rule": "fieldDocumentation", "msg": "...", "lines": "…"},
    {"rule": "optionalFieldBehaviour", "msg": "...", "lines": "…"}
  ],
  "meta": {"model": "claude-xxx", "rules_version": "<commit?>"}
}
```
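A minimal sketch of what consuming that output could look like on the Go side, assuming the schema above (the package, type, and function names here are illustrative only; nothing like this exists in the repo today):

```go
package review

import "encoding/json"

// Issue mirrors one entry in the "issues" array of the proposed output.
type Issue struct {
	Rule  string `json:"rule"`
	Msg   string `json:"msg"`
	Lines string `json:"lines"`
}

// Meta mirrors the "meta" object, recording which model and rules produced the review.
type Meta struct {
	Model        string `json:"model"`
	RulesVersion string `json:"rules_version"`
}

// Review is the top-level structure the command would be asked to emit.
type Review struct {
	Summary string  `json:"summary"`
	Issues  []Issue `json:"issues"`
	Meta    Meta    `json:"meta"`
}

// ParseReview decodes the raw JSON emitted by the command so a test harness
// can assert on which rules were flagged.
func ParseReview(raw []byte) (*Review, error) {
	var r Review
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	return &r, nil
}
```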
Having sets of real API PRs where we can write units that Expect the correct issues to be caught. Something like:
- golden/: real API doc chunks + your expected issue set (ground truth).
- synthetic/: synthetic snippets, each targeting exactly one rule. These are your 'unit tests'.
Then you can compare the existing rules/prompt against a changed version, and hopefully catch any regressions. There are lots of ways to extend this too, e.g. mutating synthetic snippets and still expecting issues to be caught, or using existing reviews (e.g. merged PRs with comment chains + changes requested) and an LLM-as-a-judge style system.
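As a rough sketch, a synthetic 'unit test' could then be a plain Go table test, assuming the Review type above and a hypothetical runReview helper that invokes the command on a snippet and returns its parsed output (the snippet file names are also made up):

```go
package review

import "testing"

// Sketch only: runReview is a hypothetical helper that runs the /api-review
// command against a snippet file and returns the parsed Review output.
func TestSyntheticRules(t *testing.T) {
	cases := []struct {
		name     string
		snippet  string // path to a synthetic snippet targeting exactly one rule
		wantRule string // the rule the review is expected to flag
	}{
		{"missing field docs", "synthetic/field_missing_docs.go", "fieldDocumentation"},
		{"optional field without behaviour", "synthetic/optional_no_behaviour.go", "optionalFieldBehaviour"},
	}

	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			review, err := runReview(tc.snippet) // hypothetical helper, not real code
			if err != nil {
				t.Fatalf("running review: %v", err)
			}
			flagged := map[string]bool{}
			for _, issue := range review.Issues {
				flagged[issue.Rule] = true
			}
			if !flagged[tc.wantRule] {
				t.Errorf("expected rule %q to be flagged, got %+v", tc.wantRule, review.Issues)
			}
		})
	}
}
```

The point is only that each case pins one rule, so a prompt change that stops a rule from firing shows up as a test failure rather than a 'vibe'.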
Either way, it's not simple... unfortunately :(
I'll run a comprehensive API review for the OpenShift API changes in the specified GitHub PR.
Do we need to create a separate H1 heading to explain to the LLM that these are the steps for how to actually conduct the review?
Weirdly, "I'll run a comprehensive API review for the OpenShift API changes in the specified GitHub PR." seems to work just fine. I've had instances (e.g. on ccapio) where adding headings led to the agent ignoring them 🤦
It's the sort of thing you want to build something to allow you to test :/
### Testing
```bash
make test-unit    # Run unit tests
make integration  # Run integration tests (in tests/ directory)
```
Should we teach it how to run the integration tests with more focused arguments?
That way, running the integration tests doesn't take longer than necessary when reviewing a change that only impacts a subset of the APIs/tests?
Yes, that's a good idea, I'll have a go at that
/lgtm
As this will be an iterative process of working out what works, this seems like a good place to start.
/hold for focussing integration tests
/test remaining-required
Overriding unmatched contexts:
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: theobarberbany
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@openshift-ci-robot: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@openshift-ci-robot: Overrode contexts on behalf of openshift-ci-robot: ci/prow/e2e-aws-ovn, ci/prow/e2e-aws-ovn-hypershift, ci/prow/e2e-aws-ovn-hypershift-conformance, ci/prow/e2e-aws-ovn-techpreview, ci/prow/e2e-aws-serial-1of2, ci/prow/e2e-aws-serial-2of2, ci/prow/e2e-aws-serial-techpreview-1of2, ci/prow/e2e-aws-serial-techpreview-2of2, ci/prow/e2e-azure, ci/prow/e2e-gcp, ci/prow/e2e-upgrade, ci/prow/e2e-upgrade-out-of-change In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@JoelSpeed: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Couple of TODOs which I'll leave as a note here, so keep the hold for now:
This adds an initial AGENTS.md configuration for how to perform API review via an AI agent such as Claude.
It also implements a /api-review command that can be used locally to review PRs by anyone who has claude installed. I hope we can get folks using this to self-help, but my long-term goal is to integrate this into coderabbit or some other review tool that can post the comments directly on the PR.
As an example of the output, see #2488 (comment)
Current highlights of its instructions: