Deterministic LLM testing in your CI/CD pipeline. This action evaluates recorded LLM outputs against defined contracts and fails PRs when violations are detected.
- **Zero network calls**: tests run on recorded fixtures
- **Rich reporting**: HTML, JUnit, JSON output formats
- **PR comments**: automatic violation summaries
- **Budget tracking**: cost and latency monitoring
- **Flexible checks**: JSON schema, regex, numeric bounds, string contains/equals, list/set equality, file diff, custom functions
Regression fail PR · Cost gate PR · Assertion fail PR
[Links to live PRs and GIFs to be inserted after publishing]
Below are example screenshots of the HTML report generated by this action.
| Before | After |
|---|---|
```yaml
name: PromptProof
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: geminimir/promptproof-action@v0
        with:
          config: promptproof.yaml
          baseline-ref: origin/main
          runs: 3
          seed: 1337
          max-run-cost: 2.50
          report-artifact: promptproof-report
          mode: gate
```

| Input | Description | Default |
|---|---|---|
| `config` | Path to `promptproof.yaml` | `promptproof.yaml` |
| `baseline-ref` | Git ref to load baseline snapshot from (e.g., `origin/main`) | |
| `runs` | Number of runs for flake control | |
| `seed` | Seed for flake control determinism | |
| `max-run-cost` | Maximum total cost for this run (USD) | |
| `report-artifact` | Name of uploaded report artifact | `promptproof-report` |
| `mode` | `gate` (fail) or `report-only` (warn). Defaults to config. | |
| `format` | Output format (`html`, `junit`, `json`, or `sarif`) | |
| `regress` | Also compare to local baseline | `false` |
| `node-version` | Node.js version | `20` |
| `snapshot-on-success` | Create snapshot after successful run | `false` |
| `snapshot-promote-on-main` | Promote snapshot to baseline on main | `false` |
| `snapshot-tag` | Optional snapshot tag | |
| Output | Description |
|---|---|
| `violations` | Number of violations found |
| `passed` | Number of fixtures that passed |
| `failed` | Number of fixtures that failed |
| `failed-tests` | Alias for `failed` |
| `total-cost` | Total cost (USD) of this evaluation |
| `regressions` | New failures vs baseline (when regression comparison is enabled) |
| `report-path` | Path to generated report |
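These outputs can be consumed by later workflow steps via a step `id`. A minimal sketch (the step id `eval` and the summary step are illustrative, not part of the action):

```yaml
- uses: geminimir/promptproof-action@v0
  id: eval
  with:
    config: promptproof.yaml
- name: Summarize evaluation
  if: always()  # run the summary even when the gate fails
  run: |
    echo "Violations: ${{ steps.eval.outputs.violations }}"
    echo "Cost (USD): ${{ steps.eval.outputs.total-cost }}"
```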
Create a promptproof.yaml file in your repository:
```yaml
schema_version: pp.v1
fixtures: fixtures/outputs.jsonl
checks:
  - id: no_pii
    type: regex_forbidden
    target: text
    patterns:
      - "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}"
budgets:
  cost_usd_per_run_max: 0.50
  latency_ms_p95_max: 2000
mode: fail
```

When using `format: sarif`, ensure your workflow grants Code Scanning upload permissions:
```yaml
permissions:
  contents: read
  security-events: write
```

An advanced example with baseline comparison, flake control, and a cost gate:

```yaml
- uses: geminimir/promptproof-action@v0
  with:
    config: promptproof.yaml
    baseline-ref: origin/main  # pull last green snapshot from main
    regress: true              # also compare with any local baseline
    runs: 5                    # flake control
    seed: 42                   # deterministic nondeterminism
    max-run-cost: 1.75         # cost gate for the entire suite
    format: junit              # emit JUnit XML for test tab
    mode: gate                 # fail on violations
```

To emit SARIF for Code Scanning:

```yaml
- uses: geminimir/promptproof-action@v0
  with:
    config: promptproof.yaml
    format: sarif
```

To enforce a cost budget without failing the job directly:

```yaml
- uses: geminimir/promptproof-action@v0
  with:
    config: promptproof.yaml
    max-run-cost: 1.00
    mode: report-only  # never fail directly
```

Then, in Branch protection, require the "PromptProof" check so the PR is blocked when the budget is exceeded.
```yaml
- uses: geminimir/promptproof-action@v0
  with:
    config: promptproof.yaml
    format: junit
    report-artifact: promptproof-report
    snapshot-on-success: true
    snapshot-promote-on-main: true
    snapshot-tag: nightly
```

No API keys required. Use sample fixtures to see a green run:
```yaml
name: PromptProof
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: geminimir/promptproof-action@v0
        with:
          config: example/promptproof.yaml
          format: html
          mode: report-only
```

This uses recorded fixtures under `example/fixtures/`, so CI makes no network calls.
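A fixture file holds recorded model outputs, one JSON object per line of the `.jsonl` file. The exact schema depends on how you record outputs; the field names below (`id`, `text`, `cost_usd`, `latency_ms`) are illustrative assumptions, not the documented format:

```json
{"id": "ticket-001", "text": "Thanks for reaching out! We've issued the refund.", "cost_usd": 0.0004, "latency_ms": 812}
```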
- Settings → Branches → Branch protection rules → Add rule
- Branch name pattern: `main`
- Enable "Require status checks to pass" → select "PromptProof"
- Save
To run multiple evaluation suites, use a build matrix:

```yaml
strategy:
  matrix:
    suite: [support, sales, docs]
steps:
  - uses: geminimir/promptproof-action@v0
    with:
      config: promptproof-${{ matrix.suite }}.yaml
```

The action automatically comments on PRs with:
- Violation summary grouped by check type
- Key metrics (cost, latency, pass/fail counts)
- Expandable details for each violation type
- A permalink to the run artifacts
Reports are uploaded as artifacts and retained for 30 days:
- HTML report for human review
- JSON report for programmatic access
- JUnit XML for test result visualization
- SARIF report for Code Scanning
MIT