Deterministic LLM testing in your CI/CD pipeline. This action evaluates recorded LLM outputs against defined contracts and fails PRs when violations are detected.
- **Zero network calls**: tests run on recorded fixtures
- **Rich reporting**: HTML, JUnit, JSON output formats
- **PR comments**: automatic violation summaries
- **Budget tracking**: cost and latency monitoring
- **Flexible checks**: JSON schema, regex, numeric bounds, string contains/equals, list/set equality, file diff, custom functions
Regression fail PR · Cost gate PR · Assertion fail PR
[Links to live PRs and GIFs to be inserted after publishing]
Below are example screenshots of the HTML report generated by this action.
| Before | After |
|---|---|
```yaml
name: PromptProof
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: geminimir/promptproof-action@v0
        with:
          config: promptproof.yaml
          baseline-ref: origin/main
          runs: 3
          seed: 1337
          max-run-cost: 2.50
          report-artifact: promptproof-report
          mode: gate
```

| Input | Description | Default |
|---|---|---|
| `config` | Path to `promptproof.yaml` | `promptproof.yaml` |
| `baseline-ref` | Git ref to load baseline snapshot from (e.g., `origin/main`) | |
| `runs` | Number of runs for flake control | |
| `seed` | Seed for flake control determinism | |
| `max-run-cost` | Maximum total cost for this run (USD) | |
| `report-artifact` | Name of uploaded report artifact | `promptproof-report` |
| `mode` | `gate` (fail) or `report-only` (warn). Defaults to config. | |
| `format` | Output format (`html`, `junit`, `json`, or `sarif`) | |
| `regress` | Also compare to local baseline | `false` |
| `node-version` | Node.js version | `20` |
| `snapshot-on-success` | Create snapshot after successful run | `false` |
| `snapshot-promote-on-main` | Promote snapshot to baseline on main | `false` |
| `snapshot-tag` | Optional snapshot tag | |
| Output | Description |
|---|---|
| `violations` | Number of violations found |
| `passed` | Number of fixtures that passed |
| `failed` | Number of fixtures that failed |
| `failed-tests` | Alias for `failed` |
| `total-cost` | Total cost (USD) of this evaluation |
| `regressions` | New failures vs baseline (when regression comparison is enabled) |
| `report-path` | Path to generated report |
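These outputs can be consumed by later workflow steps via a step `id`. A minimal sketch (the step id `eval` and the summary step are illustrative, not part of the action):

```yaml
- uses: geminimir/promptproof-action@v0
  id: eval
  with:
    config: promptproof.yaml
- name: Summarize evaluation
  if: always()  # run the summary even when the gate fails
  run: |
    echo "Violations: ${{ steps.eval.outputs.violations }}"
    echo "Cost (USD): ${{ steps.eval.outputs.total-cost }}"
```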
Create a promptproof.yaml file in your repository:
```yaml
schema_version: pp.v1
fixtures: fixtures/outputs.jsonl
checks:
  - id: no_pii
    type: regex_forbidden
    target: text
    patterns:
      - "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}"
budgets:
  cost_usd_per_run_max: 0.50
  latency_ms_p95_max: 2000
mode: fail
```

When using `format: sarif`, ensure your workflow grants Code Scanning upload permissions:
```yaml
permissions:
  contents: read
  security-events: write
```

An advanced example with baseline comparison, flake control, and a cost gate:

```yaml
- uses: geminimir/promptproof-action@v0
  with:
    config: promptproof.yaml
    baseline-ref: origin/main  # pull last green snapshot from main
    regress: true              # also compare with any local baseline
    runs: 5                    # flake control
    seed: 42                   # deterministic nondeterminism
    max-run-cost: 1.75         # cost gate for the entire suite
    format: junit              # emit JUnit XML for test tab
    mode: gate                 # fail on violations
```

To emit SARIF for Code Scanning:

```yaml
- uses: geminimir/promptproof-action@v0
  with:
    config: promptproof.yaml
    format: sarif
```

To enforce a cost budget without failing the job directly:

```yaml
- uses: geminimir/promptproof-action@v0
  with:
    config: promptproof.yaml
    max-run-cost: 1.00
    mode: report-only  # never fail directly
```

Then, in Branch protection, require the "PromptProof" check so the PR is blocked when the budget is exceeded.
```yaml
- uses: geminimir/promptproof-action@v0
  with:
    config: promptproof.yaml
    format: junit
    report-artifact: promptproof-report
    snapshot-on-success: true
    snapshot-promote-on-main: true
    snapshot-tag: nightly
```

No API keys required. Use sample fixtures to see a green run:
```yaml
name: PromptProof
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: geminimir/promptproof-action@v0
        with:
          config: example/promptproof.yaml
          format: html
          mode: report-only
```

This uses recorded fixtures under `example/fixtures/`, so CI makes no network calls.
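A fixture file holds recorded model outputs, one JSON object per line of the `.jsonl` file. The exact schema depends on how you record outputs; the field names below (`id`, `text`, `cost_usd`, `latency_ms`) are illustrative assumptions, not the documented format:

```json
{"id": "ticket-001", "text": "Thanks for reaching out! We've issued the refund.", "cost_usd": 0.0004, "latency_ms": 812}
```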
- Settings → Branches → Branch protection rules → Add rule
- Branch name pattern: `main`
- Enable "Require status checks to pass" → select "PromptProof"
- Save
To run multiple evaluation suites, use a build matrix:

```yaml
strategy:
  matrix:
    suite: [support, sales, docs]
steps:
  - uses: geminimir/promptproof-action@v0
    with:
      config: promptproof-${{ matrix.suite }}.yaml
```

The action automatically comments on PRs with:
- Violation summary grouped by check type
- Key metrics (cost, latency, pass/fail counts)
- Expandable details for each violation type
- A permalink to the run artifacts
Reports are uploaded as artifacts and retained for 30 days:
- HTML report for human review
- JSON report for programmatic access
- JUnit XML for test result visualization
- SARIF report for Code Scanning
MIT