Apache-2.0 OR MIT

gaze-cli

Gaze command-line interface

Part of the Gaze workspace — a reversible PII pseudonymization runtime for agentic LLM workflows.

This crate publishes the gaze binary. It is the process boundary used by shell integrations and language adapters that should not link the Rust library directly.

The CLI reads from stdin, writes JSON to stdout, and emits sanitized structured errors to stderr. Panic handling is overridden so dependency panics do not dump raw input or backtraces into caller logs.
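A language adapter can treat that contract as a plain subprocess call. The following is a minimal Python sketch, not an official adapter: `run_clean`, `GazeError`, and the default `argv` are illustrative names, and the only behavior assumed is what this README states (text in on stdin, JSON out on stdout, a JSON error line on stderr with a non-zero exit).

```python
import json
import subprocess


class GazeError(Exception):
    """Raised when the CLI exits non-zero; carries the parsed stderr envelope."""

    def __init__(self, exit_code, envelope):
        super().__init__(f"gaze exited {exit_code}: {envelope}")
        self.exit_code = exit_code
        self.envelope = envelope


def _last_json_line(stderr_text):
    """Return the last stderr line that parses as JSON, or None."""
    for line in reversed(stderr_text.splitlines()):
        try:
            return json.loads(line)
        except json.JSONDecodeError:
            continue
    return None


def run_clean(text, argv=("gaze", "clean", "--policy", "policy.toml")):
    """Pipe `text` to the CLI on stdin and return the parsed stdout JSON."""
    proc = subprocess.run(
        list(argv),
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    if proc.returncode != 0:
        # stderr carries the sanitized structured error, never the raw input.
        envelope = _last_json_line(proc.stderr.decode("utf-8", "replace"))
        raise GazeError(proc.returncode, envelope)
    return json.loads(proc.stdout.decode("utf-8"))
```

Keeping the binary behind `argv` means the adapter never links the Rust library and can be exercised against a stand-in process in tests.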

Cargo

Install from crates.io:

$ cargo install gaze-cli

Build from the workspace root:

$ cargo build -p gaze-cli

The default build includes the gaze proxy subcommands, which run a local LLM API proxy.

Build with the MCP installer/server surface:

$ cargo build -p gaze-cli --features mcp

Run from the workspace root:

$ cargo run -p gaze-cli -- clean --policy policy.toml

The installed binary name is gaze.

Subcommands

Current subcommands in src/commands/mod.rs:

Subcommand	Purpose
`clean`	Reads raw UTF-8 text from stdin and emits `{"clean_text","session_blob","stats"}` JSON.
`daemon`	Runs a long-lived JSONL stdio cleaner with one process-level pipeline and per-`session_id` manifests.
`restore`	Reads `{"session_blob","text"}` JSON from stdin and emits restored `{"text"}` JSON, plus `restore_warning` when tolerant restore allows an unknown token.
`audit query`	Prints filtered audit metadata rows from a `--audit-db` SQLite log, opened read-only.
`audit export`	Exports filtered audit metadata rows in JSONL (default) for downstream processing.
`audit safety-net query`	Prints filtered `safety_net_log` metadata rows from a `--audit-db` SQLite log, opened read-only.
`document clean`	OCRs PNG/JPG/PDF input into a SafeBundle. Requires `--features document`.
`mcp install`	Installs `gaze mcp serve` into supported MCP client configs. Requires `--features mcp`.
`mcp doctor`	Diagnoses MCP runtime dependencies, client config, and AGENTS.md guidance. Requires `--features mcp`.
`mcp serve`	Runs the stdio MCP server exposing `gaze_read_file` and `gaze_read_text`. Requires `--features mcp`.
`proxy serve/start/stop/status/restart`	Runs or manages the local LLM API proxy. Included in the default build.

Audit logging is captured on clean via --audit-db <path>; the audit query and audit export subcommands read the same database back.

Daemon mode

gaze daemon --policy policy.toml keeps one pipeline alive and reads one JSON request per stdin line:

{"session_id":"conversation-1","text":"Contact alice@example.invalid"}

Each stdout line is either a clean response:

{"session_id":"conversation-1","clean_text":"Contact <...:Email_1>","manifest":[],"tokens":[]}

or a typed protocol/cleaning error:

{"session_id":null,"error":"JsonMalformed","detail":"malformed JSON line"}
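A client of the daemon only needs to frame one JSON object per line and tell the two response shapes apart. The helper names below (`encode_request`, `decode_response`) are hypothetical, and the sketch assumes only the fields shown in the examples above.

```python
import json


def encode_request(session_id, text):
    """Serialize one daemon request as a single newline-terminated JSON line."""
    # json.dumps escapes embedded newlines, so the request stays on one line.
    return json.dumps({"session_id": session_id, "text": text}) + "\n"


def decode_response(line):
    """Return ("ok", payload) for a clean response or ("error", payload)."""
    payload = json.loads(line)
    if "error" in payload:
        return "error", payload
    if "clean_text" in payload:
        return "ok", payload
    raise ValueError("unrecognized daemon response shape")
```

Note that an error line may carry `"session_id": null`, as in the malformed-JSON example, so callers should not rely on the session id to route errors.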

Sessions are isolated by session_id. A session is evicted in LRU order once --session-cap (default 1000) is exceeded, or after --session-idle-timeout seconds without use (default 3600). Separately, --idle-timeout (default 1800) exits the whole process after stdin inactivity. On SIGINT or SIGTERM the daemon finishes the current line, flushes stdout and audit writes, then exits. Daemon audit rows are stamped with provenance_stage = "daemon".
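To make the two eviction rules concrete, here is a toy model of a session table. It is an illustration only, not the daemon's implementation: `SessionTable` is a hypothetical name, and evicting the oldest entry as soon as the count exceeds the cap is an assumption about how the cap is applied.

```python
from collections import OrderedDict


class SessionTable:
    """Toy session store: LRU eviction beyond a cap plus idle-time eviction."""

    def __init__(self, cap=1000, idle_timeout=3600.0):
        self.cap = cap
        self.idle_timeout = idle_timeout
        self._sessions = OrderedDict()  # session_id -> (last_used, manifest)

    def touch(self, session_id, now):
        """Fetch or create the manifest for `session_id`, marking it most recent."""
        self.evict_idle(now)
        _, manifest = self._sessions.pop(session_id, (None, []))
        self._sessions[session_id] = (now, manifest)
        while len(self._sessions) > self.cap:
            self._sessions.popitem(last=False)  # drop the least recently used
        return manifest

    def evict_idle(self, now):
        """Drop sessions whose last use is older than the idle timeout."""
        # Entries are ordered oldest-first, so stop at the first fresh one.
        while self._sessions:
            session_id, (last_used, _) = next(iter(self._sessions.items()))
            if now - last_used <= self.idle_timeout:
                break
            del self._sessions[session_id]

    def __contains__(self, session_id):
        return session_id in self._sessions
```

The practical consequence for callers is the same under either rule: a session_id that has been evicted starts again with an empty manifest.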

MCP installation

The mcp feature embeds the rmcp stdio server into the gaze binary and registers gaze-document tools:

$ cargo install gaze-cli --features mcp
$ gaze mcp install --client=claude-code
$ gaze mcp doctor

install always writes the absolute std::env::current_exe() path into the client config:

{
  "mcpServers": {
    "gaze": {
      "command": "/absolute/path/to/gaze",
      "args": ["mcp", "serve"],
      "env": {}
    }
  }
}
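For readers who manage client configs by hand, the effect on the JSON file can be approximated as a merge that leaves other servers alone. This is a sketch under that assumption, not the installer's code; `merge_gaze_server` is a hypothetical helper, and whether the real installer preserves unrelated keys or formatting is not stated here.

```python
import json


def merge_gaze_server(config_text, gaze_path):
    """Return config JSON text with an `mcpServers.gaze` entry added or replaced."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["gaze"] = {"command": gaze_path, "args": ["mcp", "serve"], "env": {}}
    return json.dumps(config, indent=2) + "\n"
```

Pass an absolute path for `gaze_path`, matching what install writes.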

Supported clients:

Client	Default config path
`claude-code`	`./.mcp.json` in the current project.
`claude-desktop`	macOS `~/Library/Application Support/Claude/claude_desktop_config.json`; Windows `%APPDATA%\Claude\claude_desktop_config.json`; Linux config dir fallback.
`cursor`	`./.cursor/mcp.json` in the current project.
`all`	Updates all supported client paths.

Use --agents-md <path> to choose where the marker-fenced Gaze MCP guidance section is written. By default, install updates ./AGENTS.md. Use --skip-agents-md to update only client JSON, and --dry-run to preview without writing.

doctor checks whether the current binary matches client config entries, whether tesseract and pdfium are available for gaze_read_file, whether the manifest directory is writable, and whether the AGENTS.md guidance section is present. Warnings do not fail by default; --strict exits non-zero on warnings. Use --json for machine-readable output.

serve runs the stdio MCP server:

$ gaze mcp serve --manifest-dir ~/.local/share/gaze/mcp-manifests --max-file-size 26214400

The server covers the data-source to model path only. It does not filter text the user pastes directly into a chat UI.

`clean`

$ printf '%s' 'Email alice@example.invalid now' \
  | gaze clean --policy policy.toml

Flags:

Flag	Meaning
`--policy <path>`	Optional `policy.toml` path. Production integrations should pass one.
`--format <json>`	Output format. Only `json` is accepted. Defaults to `json`.
`--session-ttl <secs>`	Override persistent session TTL from policy.
`--session-scope <scope>`	Override `[session].scope` from policy.
`--locale <tag[,tag...]>`	Active locale fallback chain, comma separated and priority ordered.
`--ner-threshold <float>`	Override policy `[ner]` threshold. Must be between `0.0` and `1.0` inclusive.
`--ner-model-dir <path>`	Override `[ner].model_dir` from policy.
`--ner-locale <tag>`	Override `[ner].locale` from policy.
`--rulepack-bundled <name[,name...]>`	Override `[policy.rulepacks].bundled`. Comma separated. `core-extended` is deprecated since v0.8.0; use `core --locale=<lang>` for explicit locale-gated activation.
`--rulepack-path <path>`	Override `[policy.rulepacks].paths`. Repeatable.
`--max-bytes <bytes>`	Stdin byte cap. Defaults to `10485760`.
`--context-json <path>`	Typed context envelope with dictionaries, class map, and fields.
`--audit-db <path>`	Optional SQLite redaction-log database path for metadata-only audit entries.
`--safety-net <kind>`	Optional observer-only safety net. Accepts `openai-filter` (v0.6+) or `kiji-distilbert` (v0.8+). Activates the post-clean leak audit.
`--safety-net-backend <backend>`	v0.8 single-backend selector: `openai-filter` or `kiji-distilbert`. When set alongside `--safety-net=<kind>`, this flag wins and lets adopters swap the Pass-3 implementation without re-typing the legacy `--safety-net` value. Cannot be combined with `--safety-net-registry`.
`--safety-net-registry`	Enables locale-aware Pass-3 dispatch through `LocaleAwareModelRegistry`. Requires one or more `--safety-net-add` flags.
`--safety-net-add <backend>`	Adds one backend to the registry. Repeatable. First resolved backend wins for v1.
`--openai-filter-command <path>`	Path to the local OpenAI Privacy Filter `opf` command. Required with the `openai-filter` backend.
`--openai-filter-checkpoint <path>`	Path to the OPF checkpoint or model directory. Required with the `openai-filter` backend.
`--kiji-backend <backend>`	Kiji runtime backend: `subprocess` (default, compatibility path) or `ort` (in-process ONNX Runtime).
`--opf-command <path>` / `--opf-checkpoint <path>`	Registry-example aliases for the OpenAI Privacy Filter command and checkpoint.
`--opf-locales <tag[,tag...]>`	Native locales for the OpenAI Privacy Filter registry entry. Empty keeps the backend default.
`--kiji-distilbert-command <path>`	Path to the local Kiji DistilBERT subprocess command. Required with the `kiji-distilbert` backend.
`--kiji-distilbert-model-dir <path>`	Path to the pinned Kiji DistilBERT model directory (must contain `SHA256SUMS`, `labels.json`, `model.onnx`, `tokenizer.json`). Required with the `kiji-distilbert` backend.
`--kiji-distilbert-locales <tag[,tag...]>`	Native locales for the Kiji DistilBERT registry entry. Empty keeps the backend default.
`--safety-net-timeout-ms <ms>`	Subprocess deadline. Defaults to `5000`.
`--safety-net-input-limit-bytes <bytes>`	Clean-text input cap forwarded to the safety net. Defaults to `1048576`.
`--safety-net-mode <strict\|tolerant\|redact\|resolve>`	Production action on `Uncovered`/`PartialBleed` suspects. `strict` exits `3`; `tolerant` emits warnings on stderr and continues (dev-only, fires a stderr warning on every invocation); `redact` overwrites the suspect span with a sentinel and records an audit row; `resolve` promotes the suspect into a synthetic custom-recognizer match and re-runs the resolver. Defaults to `resolve`. Mode catalog and posture guide: `docs/architecture/safety-net-modes.md`.
`--safety-net-fallback <strict\|tolerant\|redact>`	Cascade action when `--safety-net-mode` is `redact` or `resolve` and the primary action cannot be honored for a specific suspect (manifest overlap or grapheme-cluster break for `redact`; validator-veto, missing mandatory anchor, or residual suspect after the one-shot resolve pass for `resolve`). Ignored when `--safety-net-mode` is `strict` or `tolerant`. Defaults to `redact`. One-hop cascade only. `tolerant` requires `GAZE_ALLOW_TOLERANT=1`. Composition matrix and audit-row delta: `docs/architecture/safety-net-modes.md`.
`--safety-net-resolve-threshold <float>`	Confidence threshold for `--safety-net-mode resolve`. Suspects below threshold are dropped before candidate construction. Defaults to `0.7`. `0.0` disables filtering; `1.0` disables resolve entirely.
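The interaction between --safety-net-mode and --safety-net-fallback can be read as a small decision function. The sketch below models only the rules stated in the table (fallback ignored for strict and tolerant, one hop, tolerant fallback gated on GAZE_ALLOW_TOLERANT=1). `planned_action` and `primary_ok` are hypothetical names, raising an error for an ungated tolerant fallback is an assumption, and the case where mode and fallback are the same value is not covered by this README; the authoritative matrix is in docs/architecture/safety-net-modes.md.

```python
def planned_action(mode, fallback="redact", primary_ok=True, allow_tolerant=False):
    """Return the action applied to one suspect under the documented cascade."""
    if mode in ("strict", "tolerant"):
        return mode  # the fallback flag is ignored for these modes
    if mode not in ("redact", "resolve"):
        raise ValueError(f"unknown mode: {mode}")
    if primary_ok:
        return mode
    if fallback not in ("strict", "tolerant", "redact"):
        raise ValueError(f"unknown fallback: {fallback}")
    if fallback == "tolerant" and not allow_tolerant:
        # Assumption: a missing opt-in is rejected rather than silently changed.
        raise ValueError("tolerant fallback requires GAZE_ALLOW_TOLERANT=1")
    return fallback  # one hop only: the fallback is never cascaded again
```

With the defaults, `planned_action("resolve", primary_ok=False)` returns `"redact"`, which matches the documented default pairing of resolve with a redact fallback.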

When --policy is omitted, the CLI runs a stub email pipeline so the process surface can be exercised. Production use should pass --policy.

Safety net

The optional --safety-net=<kind> flag activates the observer-only safety net documented in docs/architecture/safety-nets.md. The safety net runs after the deterministic clean and reports suspected leaks against the manifest of emitted tokens. It cannot mutate the clean text and cannot affect restore.

Safety-net backends

Two observer-only backends are available; pick one via --safety-net-backend <backend> (v0.8) or the legacy --safety-net=<kind>. Both share the strict/tolerant exit-code contract, the LeakReport shape, and the safety_net_log audit table.

openai-filter (v0.6+) wraps the official openai/privacy-filter subprocess. Strengths: eight typed labels covering Person, Email, Phone, URL, Address, Date, Account number, Secret; documented operating points; mature upstream. Trade-offs: heavier model and slower per-clean latency; no first-party fetch path; runtime depends on a third-party Python install the operator pins.

kiji-distilbert (v0.8+) wraps a pinned DistilBERT NER model. It supports --kiji-backend=subprocess for the existing external command path and --kiji-backend=ort for in-process ONNX Runtime inference without a Python install. Strengths: lightweight ONNX-served weights, straightforward fetch script, second NER opinion at the chokepoint that complements OPF's class set. Trade-offs: narrower closed label set (person, location, organization, miscellaneous) so financial-secret or account-number suspects are not surfaced; pinned-artifact contract requires SHA256SUMS present on disk (Axis-1 fail-closed — no silent disable). The default remains subprocess for backwards compatibility.

Setup

The safety-net features are gated off by default. Build with one or both:

$ cargo build -p gaze-cli --features safety-net-openai
$ cargo build -p gaze-cli --features safety-net-kiji
$ cargo build -p gaze-cli --features safety-net-openai,safety-net-kiji

The opf command must be installed from a pinned upstream Git revision or an official release of the openai/privacy-filter repository. Adopters should record the exact upstream Git SHA or tag they install in their deployment manifest. The adapter does not download or update the checkpoint; bring-your-own-binary plus bring-your-own-weights is the v0.6 contract.

Pin the install path with GAZE_OPENAI_FILTER_OPF=/opt/opf/bin/opf or pass --openai-filter-command=<path> per invocation. The command path must be a regular file (not a symlink) when given as an absolute path, and the checkpoint directory must be owned by the current user with mode 0700 and no group/world write bits.
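Operators may want to check these constraints in a deployment script before the first clean. The following POSIX-only sketch mirrors the rules in the paragraph above; `check_opf_paths` is a hypothetical helper and the CLI's own checks may differ in detail (for example in how it treats a symlinked checkpoint directory).

```python
import os
import stat


def check_opf_paths(command_path, checkpoint_dir):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if os.path.isabs(command_path):
        # lstat does not follow symlinks, so a symlink is reported as such.
        mode = os.lstat(command_path).st_mode
        if not stat.S_ISREG(mode):
            problems.append("command is not a regular file")
    info = os.stat(checkpoint_dir)
    if not stat.S_ISDIR(info.st_mode):
        problems.append("checkpoint is not a directory")
    if info.st_uid != os.getuid():
        problems.append("checkpoint is not owned by the current user")
    if stat.S_IMODE(info.st_mode) != 0o700:
        problems.append("checkpoint mode is not 0700")
    return problems
```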

If the checkpoint is missing, the CLI fails closed with exit 3 and variant WeightsMissing before any subprocess spawn. Initialization failures are cached for the lifetime of the process so missing-checkpoint errors do not retry on every clean.

The Kiji DistilBERT backend follows the same bring-your-own pattern. The model directory must contain SHA256SUMS, labels.json, model.onnx, and tokenizer.json; populate it via scripts/fetch-kiji-safetynet-model.sh. A missing artifact (including a missing SHA256SUMS) fails closed with the typed SafetyNetArtifactMissing envelope and exit code 2 before the subprocess is spawned — Axis-1 reliability never silent-disables a requested backend.
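A pre-flight check for the model directory can be as small as listing which of the four required files are absent. `missing_artifacts` is a hypothetical helper; it only tests for presence and does not verify the hashes recorded in SHA256SUMS.

```python
from pathlib import Path

REQUIRED_ARTIFACTS = ("SHA256SUMS", "labels.json", "model.onnx", "tokenizer.json")


def missing_artifacts(model_dir):
    """Return the required file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).is_file()]
```

An empty list means the directory has the expected shape; anything else would make the CLI fail closed as described above.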

Synthetic example — strict mode

$ printf '%s' 'Email alice@example.invalid or call 555-0100 now' \
  | gaze clean \
      --policy=policy.toml \
      --safety-net=openai-filter \
      --safety-net-mode=strict \
      --openai-filter-command=/opt/opf/bin/opf \
      --openai-filter-checkpoint=/opt/opf/checkpoint

A clean run emits the standard {clean_text, session_blob, stats} JSON plus a leak_report block on stdout:

{
  "clean_text": "Email <{session_hex}:Email_1> or call <{session_hex}:Phone_1> now",
  "session_blob": "<base64>",
  "stats": {"detections": 2},
  "leak_report": {
    "stats": {
      "suspect_count": 0,
      "uncovered_count": 0,
      "partial_bleed_count": 0,
      "class_mismatch_count": 0,
      "locale_skipped_count": 0
    }
  }
}

Exit code 0 and suspect_count = 0 is the contract for "no leaks".
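A caller can encode that contract directly. In this sketch `is_leak_free` is a hypothetical helper; treating a missing leak_report block as "not verified" is a conservative assumption rather than documented behavior.

```python
import json


def is_leak_free(exit_code, stdout_text):
    """True only when the run exited 0 and reported zero suspects."""
    if exit_code != 0:
        return False
    report = json.loads(stdout_text).get("leak_report")
    if report is None:
        return False  # no safety net ran, so nothing was verified
    return report["stats"]["suspect_count"] == 0
```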

Synthetic example — tolerant mode

$ printf '%s' 'Sender: Bob Example, phone +44 113 496 0123' \
  | gaze clean \
      --policy=policy.toml \
      --safety-net=openai-filter \
      --openai-filter-command=/opt/opf/bin/opf \
      --openai-filter-checkpoint=/opt/opf/checkpoint \
      --safety-net-mode=tolerant

If the safety net reports an Uncovered or PartialBleed suspect that the deterministic pipeline missed, tolerant mode emits a stderr warning and exits 0:

{"warning":"SafetyNet","variant":"SuspectedLeak","count":1}

Strict mode (the v0.7.x default; now opt-in via --safety-net-mode strict) would exit 3 with the JSON error {"error":"SafetyNet","exit":3,"variant":"SuspectedLeak"} and stdout would be empty. ClassMismatch suspects always warn but never fail strict mode, because the manifest still tokenized the bytes — only the class disagrees. The default mode in v0.8.x+ is resolve with a redact fallback (see the flag table above and the mode catalog).

Synthetic example — Kiji DistilBERT backend

$ printf '%s' 'Alice Example mailed the package to Berlin' \
  | gaze clean \
      --policy=policy.toml \
      --safety-net=kiji-distilbert \
      --safety-net-backend=kiji-distilbert \
      --kiji-distilbert-command=/opt/kiji/bin/kiji \
      --kiji-distilbert-model-dir="$HOME/.local/share/gaze/models/kiji-distilbert"

The --safety-net-backend flag overrides any legacy --safety-net=<kind> value, so a deployment can keep its existing --safety-net=openai-filter invocation and flip to Kiji by adding one flag. Suspects emit the same LeakReport shape; the safety_net_id field switches to kiji-distilbert.

Approved synthetic PII

All examples in this README use project-approved synthetic fixtures so the fixture-citation and no-tenant-knowledge gates remain green:

Emails: <local>@example.invalid, *.invalid, *.test. RFC 6761 guarantees these never resolve.
US/CA phones: 555-01xx range (555-0100 through 555-0199), reserved under the North American Numbering Plan for fictional use.
UK phones: Ofcom drama ranges (e.g. +44 113 496 0xxx), reserved by Ofcom for fictional use.
Names: Alice Example, Bob Example. Avoid real public-figure names.

Do not paste real customer or operator data into examples or fixtures — the fixture-citation-lint xtask gate will reject any literal that looks real or that is not cited from a checked-in test.

Latency budget

Each safety-net check spawns one opf subprocess. The default subprocess deadline is 5000 ms; tighten it via --safety-net-timeout-ms for latency-sensitive callers. On timeout the adapter sends SIGKILL, reaps the process, and returns exit 3 with variant Timeout. The safety net does not currently amortize subprocess startup across calls; a long-lived helper is filed for post-v0.6.0 (todo #303).
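The deadline behavior described above (kill on timeout, then reap) is a standard subprocess pattern. The sketch below shows it in Python for callers that wrap their own helpers; `run_with_deadline` is a hypothetical name and this is not the adapter's code.

```python
import subprocess


def run_with_deadline(argv, payload, timeout_ms=5000):
    """Run `argv`, feeding `payload` on stdin; return stdout or None on timeout."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        stdout, _ = proc.communicate(payload, timeout=timeout_ms / 1000)
    except subprocess.TimeoutExpired:
        proc.kill()         # SIGKILL on POSIX
        proc.communicate()  # reap the child and drain its pipes
        return None
    return stdout
```

Because every call pays process startup, a tighter deadline mostly helps bound the worst case; it does not reduce the typical per-clean cost.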

Audit

Combine the safety net with --audit-db <path> to persist metadata-only suspect rows into the safety_net_log table. Query the rows back with gaze audit safety-net query (see below). The schema and the bytes-free invariants are documented in docs/architecture/safety-nets.md.

`restore`

$ printf '%s' '{"session_blob":"<base64>","text":"Email <token> now"}' \
  | gaze restore

Flags:

Flag	Meaning
`--format <json>`	Output format. Only `json` is accepted. Defaults to `json`.
`--restore-mode <strict\|tolerant>`	Unknown-token handling. Defaults to `strict`.
`--max-bytes <bytes>`	Stdin byte cap. Defaults to `10485760`.

strict restore fails on unknown tokens. tolerant restore preserves unknown tokens and returns a warning in the JSON response.
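The difference between the two modes can be illustrated with a toy substitution over a plain token-to-value mapping. This is a deliberately simplified model: the real restore works from the signed session_blob, the token pattern is inferred from the example output in this README, and the shape of the `restore_warning` value here is invented for illustration.

```python
import re

# Assumed token shape, inferred from examples such as <{session_hex}:Email_1>.
TOKEN = re.compile(r"<[0-9a-f]+:[A-Za-z]+_\d+>")


def restore(text, mapping, mode="strict"):
    """Replace known tokens; fail (strict) or keep and warn (tolerant) on unknown ones."""
    unknown = []

    def substitute(match):
        token = match.group(0)
        if token in mapping:
            return mapping[token]
        unknown.append(token)
        return token  # tolerant mode leaves the token untouched

    restored = TOKEN.sub(substitute, text)
    if unknown and mode == "strict":
        raise KeyError(f"UnknownToken: {unknown[0]}")
    result = {"text": restored}
    if unknown:
        result["restore_warning"] = {"unknown_tokens": unknown}
    return result
```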

`audit query`

Reads the SQLite redaction log written by gaze clean --audit-db <path> and prints filtered metadata rows as tab-separated values. The DB is opened read-only via OpenFlags::SQLITE_OPEN_READ_ONLY, so the audit CLI cannot write back to the log even if compromised.

$ gaze audit query --audit-db audit.sqlite --class email --action tokenize

Filters:

Flag	Meaning
`--audit-db <path>`	Required. SQLite redaction-log database path.
`--class <pii_class>`	Filter by PII class such as `email`, `name`, or `custom:term`.
`--source <name>`	Filter by source recognizer name.
`--action <kind>`	Filter by action: `tokenize`, `redact`, `preserve`.
`--document-kind <kind>`	Filter by document kind: `text`, `structured`.
`--from <iso8601>`	Include rows whose `created_at` is at or after this timestamp (since v0.4.4).
`--to <iso8601>`	Include rows whose `created_at` is at or before this timestamp (since v0.4.4).

Time-filtered queries omit rows from legacy v0.4.3 audit DBs whose created_at is NULL, because a SQL comparison against NULL never evaluates to true. Unfiltered queries still surface those rows.
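The NULL behavior is easy to reproduce with an in-memory SQLite database. The table and column layout below is hypothetical (the real audit schema is not shown in this README); only the comparison semantics are the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; stands in for an audit log with one legacy NULL row.
conn.execute(
    "CREATE TABLE audit_rows (id INTEGER PRIMARY KEY, pii_class TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO audit_rows (pii_class, created_at) VALUES (?, ?)",
    [
        ("email", None),
        ("email", "2026-05-01T00:00:00Z"),
        ("name", "2026-05-10T00:00:00Z"),
    ],
)

unfiltered = conn.execute("SELECT id FROM audit_rows ORDER BY id").fetchall()
# `NULL >= ?` evaluates to NULL, which WHERE treats as false, so row 1 drops out.
since = conn.execute(
    "SELECT id FROM audit_rows WHERE created_at >= ? ORDER BY id",
    ("2026-05-01T00:00:00Z",),
).fetchall()
```

Here `unfiltered` contains all three rows while `since` contains only rows 2 and 3, which is why legacy rows vanish as soon as --from or --to is supplied.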

`audit export`

Same filter set as audit query, with output destined for downstream processing rather than the terminal:

$ gaze audit export --audit-db audit.sqlite --format jsonl --output redactions.jsonl

Flag	Meaning
`--format <jsonl>`	Export format. JSONL is the default and currently the only supported format.
`--output <path>`	Optional output file. Defaults to stdout.

Exported JSON rows include created_at since v0.4.4. The export ships a restricted column set so raw PII payloads stay outside the export surface.

`audit safety-net query`

Reads the safety_net_log rows written by gaze clean --audit-db <path> --safety-net <kind> and prints them as tab-separated values. The DB is opened read-only.

$ gaze audit safety-net query \
    --audit-db audit.sqlite \
    --leak-kind uncovered \
    --field-path '$.user.email'

Filters:

Flag	Meaning
`--audit-db <path>`	Required. SQLite redaction-log database path.
`--leak-kind <kind>`	Filter by `uncovered`, `partial_bleed`, or `class_mismatch`.
`--raw-label <label>`	Filter by validated upstream label, e.g. `private_email`.
`--mapped-class <pii_class>`	Filter by Gaze class produced by the class map.
`--field-path <selector>`	Filter by structured-document field path, e.g. `$.user.email`.
`--from <iso8601>`	Include rows whose `created_at` is at or after this timestamp.
`--to <iso8601>`	Include rows whose `created_at` is at or before this timestamp.

The safety_net_log table stores metadata only — raw_label is the validated upstream label, not the upstream raw text. See docs/architecture/safety-nets.md for the full schema.

Exit codes

Exit codes are defined by CliError in src/error.rs.

Exit	Variants
`0`	Success, help, version output, or tolerant-mode safety-net runs that produced only stderr warnings.
`1`	`StdinParse`, `EmptyInput`, `InputTooLarge`, `InvalidEncoding`.
`2`	`PolicyConfig`, including unsupported format, invalid policy, invalid locale, invalid NER threshold, unknown rulepack, unsupported CLI column rules, `SafetyNetConfig` (missing backend command/checkpoint or safety-net flags supplied without the matching feature), or `SafetyNetArtifactMissing` (Axis-1 fail-closed when a backend's pinned artifact is absent, including a missing `SHA256SUMS` for the Kiji DistilBERT backend).
`3`	`UnknownToken`, `InvalidSignature`, `InvalidBlobVersion`, `BlobExpired`, `Pipeline`, sanitized panic path, and `SafetyNetFailure` variants: `Unavailable`, `WeightsMissing`, `ModelUnavailable`, `InputTooLarge`, `Timeout`, `Runtime`, `InvalidOutput`, `SuspectedLeak` (strict mode only).
`4`	`Io`, `PolicyOpen`.

Safety-net summary: exit 3 means the safety net (or strict mode) closed the door; exit 0 with leak_report.stats.suspect_count = 0 means a clean run; exit 0 plus stderr {"warning":"SafetyNet",...} means tolerant mode reported suspects without blocking.

Stderr is JSON with the error variant and exit code, for example:

{"error":"PolicyConfig","exit":2}

UnknownToken includes the unknown token string because the token is already a pseudonym emitted by Gaze, not raw PII.

Full exit-code catalog (including 5 document, 6 mcp, 7 proxy feature codes) and the stability guarantee for each variant: docs/metrics.md.
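Callers that branch on failures can combine the exit code with the stderr envelope. In this sketch the category labels ("input", "config", and so on) are this example's own shorthand for the rows of the table above, not names used by the CLI, and `describe_failure` is a hypothetical helper.

```python
import json

EXIT_CATEGORIES = {
    0: "success",
    1: "input",    # StdinParse, EmptyInput, InputTooLarge, InvalidEncoding
    2: "config",   # PolicyConfig, SafetyNetConfig, SafetyNetArtifactMissing
    3: "runtime",  # restore/pipeline failures and SafetyNetFailure variants
    4: "io",       # Io, PolicyOpen
}


def describe_failure(exit_code, stderr_text):
    """Pair the exit-code category with the variant named on stderr, if any."""
    category = EXIT_CATEGORIES.get(exit_code, "feature-specific")
    variant = None
    for line in stderr_text.splitlines():
        try:
            envelope = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(envelope, dict) and "error" in envelope:
            # Prefer the finer-grained "variant" field when it is present.
            variant = envelope.get("variant", envelope["error"])
    return category, variant
```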

Policy path

clean --policy <path> loads the TOML policy through gaze::Policy, loads bundled/path rulepacks, resolves locale precedence, builds a pipeline with gaze-assembly, then exports the session as session_blob.

For policy schema details, see docs/policy.md.
