
Conversation

@JoelSpeed (Contributor)

This adds an initial AGENTS.md configuration for how to perform API reviews via an AI agent such as Claude.

It also implements a /api-review command that can be used locally to review PRs, for anyone who has Claude installed.

I hope we can get folks using this to self-help, but my long-term goal is to integrate this into coderabbit or some other review tool that can post the comments directly on the PR.

As an example of the output, see #2488 (comment)

Current highlights of its instructions (see the sketch after the list):

  • Explaining valid values
  • Ensuring markers are documented in the godoc
  • Identifying cross-field validations that aren't enforced
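
To make these concrete, here is a hypothetical field (not from this PR) showing the conventions the first two bullets target: valid values spelled out in the godoc, matching the enum marker that enforces them.

```go
package example

// OperatorSpec is a hypothetical type, included only to illustrate the conventions above.
type OperatorSpec struct {
	// logLevel sets the verbosity of logs produced by the operator.
	// Valid values are "Normal", "Debug" and "Trace".
	// When omitted, the platform chooses a default, which is currently "Normal".
	// +kubebuilder:validation:Enum=Normal;Debug;Trace
	// +optional
	LogLevel string `json:"logLevel,omitempty"`
}
```

A review following these rules would, for example, flag the enum marker above if the godoc did not list the valid values.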


openshift-ci bot commented Sep 19, 2025

Hello @JoelSpeed! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the API review process for your PR.

@openshift-ci bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Sep 19, 2025

@everettraven (Contributor) left a comment


Just some general questions for learning purposes.

Nothing worth blocking this on IMO, especially if you are finding it useful.

> **Explanation:** [Why this change is needed]


> I'll run a comprehensive API review for the OpenShift API changes in the specified GitHub PR.
Contributor

This looks like something an LLM would spit out when you ask for a review - is it necessary to include this in the instructions?

Contributor

Ah, this whole section looks like it is something an LLM spit out as an execution plan. Should something like this be hand-rolled with explicit instructions on how to conduct the review and important considerations?

I guess my curiosity here is that if we made an explicit guidelines-type document that humans could follow, an LLM should be able to follow along relatively easily, and we could potentially enforce more nuanced guardrails.

Not worth blocking this on - more so asking questions for myself as I've not worked with LLMs in this capacity before.

@JoelSpeed (Contributor, Author)

This is an artefact of how these documents are built. Using Claude locally, I've given it text-based prompts, and it has converted those into instructions that it can read. So 95% of this document is it translating my instructions and feedback into rules that it can later apply.

@everettraven (Contributor) Sep 19, 2025

Hmm. That is interesting. So this file is automatically updated with more detailed instructions by Claude as the AGENTS.md file is updated?

If you update this file by hand to "improve it", what happens?

@JoelSpeed (Contributor, Author)

You can improve it by hand, that's also fine, and I have made a few edits here and there. But the bulk was generated (then pruned; it was more verbose), and tested again to see if it was producing good results.

@theobarberbany (Contributor) Sep 22, 2025

> is it necessary to include this in the instructions?

I've found giving explicit examples, when you want a particular format of output, to work quite well. Unfortunately, it seems to do best with monkey-see-monkey-do style learning in this department :(

> I guess my curiosity here is if we made an explicit guidelines-type document that humans could follow, an LLM should be able to follow along relatively easily and we can potentially enforce more nuanced guardrails.

I think this should work, and is sort of what the `### API review` section in AGENTS.md is, but you would still need something similar to this document for the command, to give it a guide on how you want the output and communication style, alongside extra prompts for when it fails to follow the rules...

> If you update this file by hand to "improve it", what happens?

I've found that often, what makes sense to me (as a human reading the document), the AI will ignore, or it may degrade performance. Conversing with the model and asking it what seems to be the issue (why didn't you listen to my instructions?) does seem to help, but yields different structures to what we'd normally expect, e.g. for documentation to be consumed by humans.

This does lead into one of the bigger problems with automating tasks like this, which is: how do you ensure consistent output, and avoid regressions when experimenting with prompts? You've got a probabilistic system you need to build confidence in, and you don't want to rely on manual checks or 'vibes'. Evals are the way the industry seems to be moving, but the OSS tooling all seems pretty heavyweight for automating internal tasks/reviews.

One approach for API review that may work could be directing the command (or a version of it) to output JSON:

```json
{
  "summary": "",
  "issues": [
    {"rule": "fieldDocumentation", "msg": "...", "lines": ""},
    {"rule": "optionalFieldBehaviour", "msg": "...", "lines": ""}
  ],
  "meta": {"model": "claude-xxx", "rules_version": "<commit?>"}
}
```
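
As a minimal sketch, assuming the JSON shape above, the output could then be consumed with Go types like these (all names hypothetical):

```go
package review

// ReviewResult mirrors the hypothetical JSON shape sketched above.
type ReviewResult struct {
	Summary string  `json:"summary"`
	Issues  []Issue `json:"issues"`
	Meta    Meta    `json:"meta"`
}

// Issue is a single rule violation reported by the agent.
type Issue struct {
	Rule  string `json:"rule"`  // e.g. "fieldDocumentation"
	Msg   string `json:"msg"`   // human-readable explanation
	Lines string `json:"lines"` // affected line range in the diff
}

// Meta records provenance, so eval runs are comparable over time.
type Meta struct {
	Model        string `json:"model"`
	RulesVersion string `json:"rules_version"`
}
```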

We could then have sets of real API PRs, against which we can write units that expect the correct issues to be caught. Something like:

  • `golden/`: real API doc chunks + your expected issue set (ground truth).
  • `synthetic/`: synthetic snippets, each targeting exactly one rule. These are your 'unit tests'.

Then you can compare the existing rules/prompt against a changed version, and hopefully catch any regressions. There are lots of ways to extend this too, e.g. mutating synthetic snippets and still expecting issues to be caught, or using existing reviews (e.g. merged PRs with comment chains + changes requested) with an LLM-as-a-judge style system.
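
As a rough sketch of what one of those synthetic 'unit tests' might look like in Go, building on the ReviewResult type above; runReview, the file paths, and the rule names are all hypothetical:

```go
package review

import "testing"

// runReview is a hypothetical helper: it would invoke the /api-review
// command against a single snippet and unmarshal its JSON output.
func runReview(t *testing.T, snippet string) ReviewResult {
	t.Helper()
	t.Skipf("not implemented: would review %s", snippet) // placeholder body
	return ReviewResult{}
}

// TestSyntheticSnippets checks that each synthetic snippet triggers
// the one rule it was written to target.
func TestSyntheticSnippets(t *testing.T) {
	cases := []struct {
		snippet  string // path under synthetic/
		wantRule string // the single rule this snippet targets
	}{
		{"synthetic/missing_godoc.go", "fieldDocumentation"},
		{"synthetic/optional_no_behaviour.go", "optionalFieldBehaviour"},
	}

	for _, tc := range cases {
		result := runReview(t, tc.snippet)

		found := false
		for _, issue := range result.Issues {
			if issue.Rule == tc.wantRule {
				found = true
				break
			}
		}
		if !found {
			t.Errorf("%s: expected rule %q to fire, got %+v", tc.snippet, tc.wantRule, result.Issues)
		}
	}
}
```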

Either way, it's not simple... unfortunately :(



> I'll run a comprehensive API review for the OpenShift API changes in the specified GitHub PR.

Contributor

Do we need to create a separate H1 heading to explain to the LLM that these are the steps for how to actually conduct the review?

Contributor

Weirdly, "I'll run a comprehensive API review for the OpenShift API changes in the specified GitHub PR." seems to work just fine. I've had instances (e.g. on ccapio) where adding headings led to the agent ignoring them 🤦

It's the sort of thing you want to build something to allow you to test :/

### Testing
```bash
make test-unit   # Run unit tests
make integration # Run integration tests (in tests/ directory)
```
Contributor

Should we teach it how to run the integration tests with more focused arguments?

That way, running the integration tests doesn't take longer than necessary when reviewing a change that only impacts a subset of the APIs/tests.

@JoelSpeed (Contributor, Author)

Yes, that's a good idea, I'll have a go at that

@theobarberbany (Contributor)

/lgtm

As this will be an iterative process of working out what works, this seems like a good place to start.

/hold for focussing integration tests

@openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Sep 22, 2025
@openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) Sep 22, 2025
@openshift-ci-robot

/test remaining-required

Overriding unmatched contexts:
/override ci/prow/e2e-aws-ovn ci/prow/e2e-aws-ovn-hypershift ci/prow/e2e-aws-ovn-hypershift-conformance ci/prow/e2e-aws-ovn-techpreview ci/prow/e2e-aws-serial-1of2 ci/prow/e2e-aws-serial-2of2 ci/prow/e2e-aws-serial-techpreview-1of2 ci/prow/e2e-aws-serial-techpreview-2of2 ci/prow/e2e-azure ci/prow/e2e-gcp ci/prow/e2e-upgrade ci/prow/e2e-upgrade-out-of-change


openshift-ci bot commented Sep 22, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: theobarberbany
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


openshift-ci bot commented Sep 22, 2025

@openshift-ci-robot: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test build
/test e2e-aws-ovn
/test e2e-aws-ovn-hypershift
/test e2e-aws-ovn-hypershift-conformance
/test e2e-aws-ovn-techpreview
/test e2e-aws-serial-1of2
/test e2e-aws-serial-2of2
/test e2e-aws-serial-techpreview-1of2
/test e2e-aws-serial-techpreview-2of2
/test e2e-azure
/test e2e-gcp
/test e2e-upgrade
/test e2e-upgrade-out-of-change
/test images
/test integration
/test lint
/test minor-e2e-upgrade-minor
/test minor-images
/test okd-scos-images
/test unit
/test verify
/test verify-client-go
/test verify-crd-schema
/test verify-crdify
/test verify-deps
/test verify-feature-promotion

The following commands are available to trigger optional jobs:

/test okd-scos-e2e-aws-ovn

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-api-master-build
pull-ci-openshift-api-master-images
pull-ci-openshift-api-master-integration
pull-ci-openshift-api-master-lint
pull-ci-openshift-api-master-minor-images
pull-ci-openshift-api-master-okd-scos-images
pull-ci-openshift-api-master-unit
pull-ci-openshift-api-master-verify
pull-ci-openshift-api-master-verify-client-go
pull-ci-openshift-api-master-verify-crd-schema
pull-ci-openshift-api-master-verify-crdify
pull-ci-openshift-api-master-verify-deps
pull-ci-openshift-api-master-verify-feature-promotion

In response to this:

/test remaining-required

Overriding unmatched contexts:
/override ci/prow/e2e-aws-ovn ci/prow/e2e-aws-ovn-hypershift ci/prow/e2e-aws-ovn-hypershift-conformance ci/prow/e2e-aws-ovn-techpreview ci/prow/e2e-aws-serial-1of2 ci/prow/e2e-aws-serial-2of2 ci/prow/e2e-aws-serial-techpreview-1of2 ci/prow/e2e-aws-serial-techpreview-2of2 ci/prow/e2e-azure ci/prow/e2e-gcp ci/prow/e2e-upgrade ci/prow/e2e-upgrade-out-of-change

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


openshift-ci bot commented Sep 22, 2025

@openshift-ci-robot: Overrode contexts on behalf of openshift-ci-robot: ci/prow/e2e-aws-ovn, ci/prow/e2e-aws-ovn-hypershift, ci/prow/e2e-aws-ovn-hypershift-conformance, ci/prow/e2e-aws-ovn-techpreview, ci/prow/e2e-aws-serial-1of2, ci/prow/e2e-aws-serial-2of2, ci/prow/e2e-aws-serial-techpreview-1of2, ci/prow/e2e-aws-serial-techpreview-2of2, ci/prow/e2e-azure, ci/prow/e2e-gcp, ci/prow/e2e-upgrade, ci/prow/e2e-upgrade-out-of-change

In response to this:

/test remaining-required

Overriding unmatched contexts:
/override ci/prow/e2e-aws-ovn ci/prow/e2e-aws-ovn-hypershift ci/prow/e2e-aws-ovn-hypershift-conformance ci/prow/e2e-aws-ovn-techpreview ci/prow/e2e-aws-serial-1of2 ci/prow/e2e-aws-serial-2of2 ci/prow/e2e-aws-serial-techpreview-1of2 ci/prow/e2e-aws-serial-techpreview-2of2 ci/prow/e2e-azure ci/prow/e2e-gcp ci/prow/e2e-upgrade ci/prow/e2e-upgrade-out-of-change

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


openshift-ci bot commented Sep 22, 2025

@JoelSpeed: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JoelSpeed (Contributor, Author)

A couple of TODOs, which I'll leave as a note here, so keep the hold for now:

  • Determine if we can API review the local changes since master, so folks can run this before they create a PR
  • Check that if I have existing changes checked out locally, Claude won't throw those away while running the /api-review command
