Title: Grounding Gemma 4 in Human Lived Experience: A Convention for Verifiable, Consent-Gated AI Alignment
DOI (Viability Condition paper): 10.5281/zenodo.18144681
Authors: Benjamin Haslam (Bazzer) and Garrett Sutherland — collaborative entry
The canonical submission writeup is WRITEUP.md. This README is the GitHub repo entry point; the writeup is what gets pasted into the Kaggle submission form. License is Apache 2.0 (LICENSE + NOTICE). Current promoted candidate is guard-v7 + v42 at anchor 4d0d7bf05ea2cc8d323b08982329455c72a999bd6da5a75a8b136a81b8ad8bb8 (H26 verdict).

Submitted snapshot: Kaggle submission filed 2026-05-18 from commit ec7db2e; see docs/submission_manifest_2026-05-18.md for the frozen submission set and docs/research_record_map.md for the post-submission reader map.

Video: https://youtu.be/p5ZprNkIAEM
AI systems trained on synthetic data can maintain semantic grounding only when the rate of externally-verified human correction exceeds the rate of internally-generated error — the Viability Condition: Ceff(t) > E(t).
This notebook demonstrates how Gemma 4's function-calling capability can be used to build a governance loop that monitors and maintains this condition in real time using:
- The HAIC Maestro gateway — verified grounding interviews (Ceff)
- The PRISM geometry library — activation-level E(t) measurement
- A Merkle-auditable participation receipt — proof the condition is met
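The inequality Ceff(t) > E(t) can be read as a simple per-timestep check. The following is a minimal sketch under stated assumptions: the class and function names are hypothetical, the rates are plain floats, and nothing here reflects how viability/viability_condition.py actually computes either quantity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViabilityReading:
    """One observation at time t (hypothetical container, not the repo's type)."""
    c_eff: float  # rate of externally verified human correction, Ceff(t)
    e: float      # rate of internally generated error, E(t)


def viability_margin(reading: ViabilityReading) -> float:
    """Return Ceff(t) - E(t); a positive margin means the condition holds."""
    return reading.c_eff - reading.e


def is_viable(reading: ViabilityReading) -> bool:
    """Strict inequality, matching Ceff(t) > E(t): equality does not pass."""
    return viability_margin(reading) > 0.0


# Illustrative values only; they are not measurements from the project.
readings = [ViabilityReading(0.30, 0.10), ViabilityReading(0.10, 0.10)]
statuses = [is_viable(r) for r in readings]
print(statuses)
```

The second reading sits exactly on the boundary and fails, because the condition is a strict inequality.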
The governance pipeline spans four composable layers, each enforcing the Viability Condition at a different time scale. Every gradient signal is traceable from operator click through to federation commit.
┌──────────────────────────────────────────────────────────────────────┐
│ L4 SYSTEM Viability Condition Ceff(t) > E(t) │
│ federation viability/distributed_viability.py │
│ per round decision: commit · rollback · alert_operator │
└────────────────────────────────┬─────────────────────────────────────┘
│ accepts / rejects fragments
┌────────────────────────────────▼─────────────────────────────────────┐
│ L3 FRAGMENT DiLoCo Fragment Verifier │
│ per learner tools/diloco_fragment_verifier.py │
│ per round Merkle integrity · consent · shape · norms │
└────────────────────────────────┬─────────────────────────────────────┘
│ accepts / rejects round contributions
┌────────────────────────────────▼─────────────────────────────────────┐
│ L2 SESSION Six Convention-Session Viability Gates │
│ per session viability/session_gates.py │
│ entropy_reduction · extraction_risk · prism_consistency│
│ participation_covenant · federated_exchange · epistemic│
└────────────────────────────────┬─────────────────────────────────────┘
│ admits / rejects training_signal
┌────────────────────────────────▼─────────────────────────────────────┐
│ L1 STEP TTT Gates (error_bias BLOCKING) │
│ per device viability/ttt_gates.py + tools/edge_ttt_adapter.py │
│ error_bias (BLOCK) · weight_drift (warn) · rate (warn) │
└──────────────────────────────────────────────────────────────────────┘
Plus a structured decision vocabulary for enforcement-consequential
observations (deforestation, structural damage):
tools/enforcement_evidence_contract.py with the four-action contract
accept · refine · defer · skip.
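The L1 row of the diagram distinguishes one blocking gate (error_bias) from two warn-only gates (weight_drift, rate). One way to picture that split is sketched below. The types and the admit_step function are hypothetical illustrations, not the API of viability/ttt_gates.py.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    BLOCK = "block"  # a failure stops the adaptation step
    WARN = "warn"    # a failure is recorded but the step proceeds


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    severity: Severity


def admit_step(results: list[GateResult]) -> tuple[bool, list[str]]:
    """Admit the step unless a BLOCK gate failed; return failed WARN gate names."""
    blocked = any(r.severity is Severity.BLOCK and not r.passed for r in results)
    warnings = [r.name for r in results if r.severity is Severity.WARN and not r.passed]
    return (not blocked, warnings)


# Gate names come from the diagram; the pass/fail values are made up.
example = [
    GateResult("error_bias", True, Severity.BLOCK),
    GateResult("weight_drift", False, Severity.WARN),
    GateResult("rate", True, Severity.WARN),
]
admitted, warned = admit_step(example)
print(admitted, warned)
```

In this sketch a failed warn-only gate never changes the admit decision; it only shows up in the returned warning list.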
Try it:
python tools/federated_round_demo.py --n-learners 5 --bias-fraction 0.4

This produces a Merkle-anchored JSON receipt for one synthetic federation round.
Test it:
python -m pytest tests/ # 797 tests
python experiments/runtime_loop_stress_test.py # 7 streams
bash scripts/verify_all.sh # all of the above + receipts

See docs/runtime_grounding_loop_2026-05-11.md and docs/diloco_integration_2026-05-11.md for the full architecture.
gemma4good/
├── notebook/
│ └── haic_gemma4_governance.ipynb ← main Kaggle submission
├── tools/
│ ├── haic_tools.py ← 7 function-calling tool implementations
│ ├── incremental_grounding.py ← session-driven continual learning
│ ├── eval_leakage_check.py ← Gate 2: scenario-vs-shard hash check
│ ├── check_promotion.py ← Gate decision: PROMOTED/BLOCKED CLI
│ ├── evaluate_promotion.py ← single-entry pipeline wrapper
│ ├── eval_receipt.py ← Merkle-anchored eval receipt
│ ├── edge_ttt_adapter.py ← Layer 1: per-device runtime adaptation
│ ├── diloco_fragment_verifier.py ← Layer 3: per-fragment Merkle/consent/shape
│ ├── enforcement_evidence_contract.py ← VLA-style 8-key evidence + 4 actions
│ └── federated_round_demo.py ← End-to-end CLI demo of all four layers
├── experiments/
│ ├── canonical_eval.py ← CURRENT canonical eval driver (post-v42)
│ ├── rubrics.py ← canonical strict + v1 rubrics (stable API)
│ ├── sgt_harness.py ← rigorous SGT (Garrett Sutherland's)
│ ├── sgt_extended_scenarios.py ← 10 grounding + 5 security scenarios
│ ├── run_v38_sgt.py ← BEAST runner (1-turn, pre-v42 era)
│ ├── inspect_security_responses.py ← failure-mode dissection helper
│ ├── kaggle_cell_rigorous_sgt.py ← drop-in cell for kaggle build scripts
│ ├── runtime_loop_stress_test.py ← 7-stream end-to-end runtime loop validation
│ ├── runtime_loop_stress_report.json ← receipt-anchored stress test result
│ ├── prism_geometry_trajectory.py ← v55–v58 PRISM qh scan
│ ├── h19_*.jsonl ← H19 predeclared Unicode/multi-msg suites
│ ├── h26_*.jsonl / h26_*.py ← final H26 multi-language gate suite
│ ├── v42_guard_v7_h26_canonical.json ← submitted H26 anchor 4d0d7bf…
│ └── archive/ ← v43–v59 notebook builders, legacy evals
├── tests/
│ └── test_*.py ← 797 unit tests covering eval + four layers
├── prism_integration/ ← Prism geometry wrappers (E(t) source)
├── maestro_integration/ ← Maestro gateway client
├── viability/
│ ├── viability_condition.py ← Original single-node Ceff(t) > E(t)
│ ├── distributed_viability.py ← Layer 4: federated Ceff_global > E_global
│ ├── session_gates.py ← Layer 2: six convention-session gates
│ └── ttt_gates.py ← Layer 1: TTT runtime adaptation gates
├── utils/
│ └── merkle.py ← shared SHA3-256 + Merkle root utilities
├── notebook/
│ ├── haic_gemma4_governance.ipynb ← main submission (Scenarios 1-5)
│ └── _scenario5_insert.py ← one-shot builder for Scenario 5 cells
├── assets/ ← Diagrams, images
└── docs/
├── evaluation_doctrine.md ← six-gate model evaluation doctrine
├── promotion_workflow.md ← end-to-end promotion pipeline
├── v39_recipe.md ← next training run proposal
├── audit_humanai_convention_pipeline.md
│ ← gap analysis vs upstream pipeline
├── writeup_addendum_2026-05-08.md
│ ← rigorous re-eval companion to WRITEUP
├── integration_notes.md ← Maestro + Prism code interfaces
├── viability_condition.md ← Full theoretical framework
├── diloco_integration_2026-05-11.md ← Layer 3/4: DiLoCo theory + scenarios
├── runtime_grounding_loop_2026-05-11.md ← Four-layer architecture walkthrough
├── simsat_incorporation_decisions_2026-05-11.md ← What ported from SimSat
├── autonomous_session_2026-05-11.md ← Overnight session operator brief
├── v43_v44_verdict_2026-05-10.md ← v43/v44 model verdict
├── v45_verdict_2026-05-10.md ← v45 H4d verdict (superseded by canonical eval)
├── v46_verdict_2026-05-11.md ← v46 DPO verdict: H4e REFUTED
├── canonical_eval_verdict_2026-05-11.md ← single-source-of-truth eval + SHA3-256 anchor
├── strict_rubric_finding_2026-05-11.md ← strict explicit-refusal classifier
├── system_prompt_artifact_finding_2026-05-11.md ← OLD vs NEW prompt analysis
└── nla_training_cost_analysis_2026-05-11.md ← NLA Stage 1/2 cost decision doc
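The tree lists utils/merkle.py as the shared SHA3-256 and Merkle root utilities. For readers unfamiliar with the construction, here is a minimal sketch of a Merkle root over a list of byte leaves. It assumes the common convention of duplicating the last node on odd-sized levels and applies no domain separation between leaves and inner nodes; the repo's actual conventions may differ, and the function names are hypothetical.

```python
import hashlib


def sha3(data: bytes) -> bytes:
    """SHA3-256 digest of the given bytes."""
    return hashlib.sha3_256(data).digest()


def merkle_root(leaves: list[bytes]) -> str:
    """Hex Merkle root over the leaves (assumed convention, see lead-in)."""
    if not leaves:
        return sha3(b"").hex()  # assumed value for an empty tree
    level = [sha3(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [sha3(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


# Illustrative leaves; real receipts would hash serialized records.
root = merkle_root([b"receipt-1", b"receipt-2", b"receipt-3"])
print(root)
```

Changing any leaf changes the root, which is what makes a receipt anchored this way tamper-evident.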
This local project root has one active lane and one archive:
- <repo-root> - current branch main (tracks origin/main). All work happens here.
- <repo-root>\_local_worktrees\_archive\local-history-safety - archived safety lane, not for active development.
Local-only artifacts live under:
- <repo-root>\_local_state\archives
- <repo-root>\_local_state\backups
- <repo-root>\_local_state\regressions
- <repo-root>\_local_state\logs
- <repo-root>\_local_notes
History note. Prior to 2026-05-11 the repo used a dual-branch "runtime master + public main" pattern with unrelated histories. That pattern was retired; the master branch and the public-main worktree no longer exist. main is now the only working branch.
The submission notebook (notebook/haic_gemma4_governance.ipynb) uses Gemma 4's
native function-calling format with five active governance tools (Scenarios 1–4)
plus one advisory audit (Scenario 6):
| # | Tool | Role | Implementation |
|---|---|---|---|
| 1 | assess_wellbeing_domain | Map scenario to GFS wellbeing domains + vulnerability | inline, tools/haic_tools.py |
| 2 | verify_consent_and_provenance | Check consent layers + data lineage | inline, Maestro /v1/session/consent |
| 3 | run_prism_analysis | Measure activation geometry (E(t)) | prism_integration/prism_client.py |
| 4 | audit_activation_explanation | NLA: explain what the model is reasoning about | tools/audit_activation_explanation.py (MockNLA until Gemma-4 NLA trained) |
| 5 | generate_alignment_receipt | Finalize Merkle-anchored governance receipt | inline GovernanceTrace.finalize() |

| Tool | Role | Implementation |
|---|---|---|
| audit_provenance | Cisco MPK statistical model-derivation check | tools/audit_provenance.py (score ≥ 0.75 high / ≥ 0.65 weak) |
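The advisory audit row gives two score thresholds (≥ 0.75 high, ≥ 0.65 weak). A minimal sketch of that banding follows. The function name and the "none" label for scores below 0.65 are assumptions, since the table does not name that case, and this is not the code in tools/audit_provenance.py.

```python
def provenance_band(score: float) -> str:
    """Map a provenance score to a band using the two thresholds in the table."""
    if score >= 0.75:
        return "high"
    if score >= 0.65:
        return "weak"
    return "none"  # assumed label; the table does not name this band


# Illustrative scores only.
bands = [provenance_band(s) for s in (0.80, 0.75, 0.70, 0.65, 0.50)]
print(bands)
```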
NLA honest-scope note: Tool 4 (audit_activation_explanation) uses MockNLA
today — no Gemma-4-E2B NLA has been trained yet. The mock is deterministic and
audit-stable. The contract is forward-compatible: a trained Gemma-4 NLA plugs
in with zero consumer-code changes. See docs/nla_training_cost_analysis_2026-05-11.md.
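The forward-compatibility claim amounts to consumers depending on an interface rather than on the mock itself. The sketch below illustrates that idea with typing.Protocol. Every name in it (ActivationExplainer, MockExplainer, audit) is hypothetical and chosen for illustration; the real contract lives in tools/audit_activation_explanation.py and may look different.

```python
import hashlib
from typing import Protocol


class ActivationExplainer(Protocol):
    """Hypothetical contract: any backend that can explain an activation summary."""

    def explain(self, activation_summary: str) -> str: ...


class MockExplainer:
    """Deterministic stand-in: the same input always yields the same text."""

    def explain(self, activation_summary: str) -> str:
        digest = hashlib.sha3_256(activation_summary.encode("utf-8")).hexdigest()[:12]
        return f"mock-explanation:{digest}"


def audit(explainer: ActivationExplainer, activation_summary: str) -> dict:
    """Consumer code: depends only on the protocol, not on a concrete backend."""
    return {
        "explanation": explainer.explain(activation_summary),
        "backend": type(explainer).__name__,
    }


first = audit(MockExplainer(), "layer-12 summary")
second = audit(MockExplainer(), "layer-12 summary")
print(first == second)
```

Under this design a trained backend would only need to provide the same explain method; audit would not change.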
# Start Maestro in test mode
cd <humanai-convention-root>\maestro
MAESTRO_LAUNCH_MODE=test MAESTRO_JWT_SECRET=$(python -c "import secrets; print(secrets.token_hex(32))") \
python -m uvicorn apps.gateway.main:app --reload --port 8000
# Run the notebook
cd <repo-root>
jupyter notebook notebook/haic_gemma4_governance.ipynb

- docs/project_goal_2026-05-13.md - current scientific charter for the submission: governance proof first, fine-tuning as falsifiable appendix
- docs/submission_manifest_2026-05-18.md - exact submitted snapshot and load-bearing file set
- docs/research_record_map.md - post-submission map of governance, fine-tuning, guard, and reproducibility tracks
- docs/submission_alignment_2026-05-13.md - current submission posture, load-bearing documents, and claim discipline
- docs/v57_production_candidate_plan_2026-05-14.md - precommitted path for a possible v42-plus live replacement; no promotion without H15 passing
- docs/v57_canonical_verdict_2026-05-14.md - v57 H15 failed; v42 remains live production reference
- docs/v58_precommit_plan_2026-05-14.md - boundary-first v58 design after the v42/v55/v57 failure taxonomy
- docs/v58_canonical_verdict_2026-05-14.md - v58 H16 failed on non-compensatory direct-injection and disclosure-preview gates
- docs/v59_precommit_plan_2026-05-14.md - targeted residual patch plan and real artifact/quantization/eval trail
- docs/v59_canonical_verdict_2026-05-14.md - latest fine-tuning endpoint: v59 H17 failed; v42 remains the live production reference
- docs/viability_condition.md - the mathematical foundation
- docs/evaluation_doctrine.md - the six gates that govern model promotion
- docs/promotion_workflow.md - end-to-end pipeline (rigorous SGT → leakage receipt → six-gate decision → Merkle eval receipt)
- docs/integration_notes.md - code interfaces for Maestro and Prism
- tools/haic_tools.py - tool implementations
- docs/canonical_eval_verdict_2026-05-11.md - canonical eval methodology + SHA3-256 self-anchor
- experiments/v42_canonical_old_prompt.json - v42 anchor e5976055… (5 seeds, n=100)
- experiments/v46_canonical_old_prompt.json - v46 DPO anchor 95252de7… (H4e REFUTED)
- docs/v46_verdict_2026-05-11.md - v46 DPO verdict: H4e refuted (strict refusal 13.8% → 2.6%)
- docs/strict_rubric_finding_2026-05-11.md - strict classifier methodology
- docs/system_prompt_artifact_finding_2026-05-11.md - OLD vs NEW prompt artifact
- docs/nla_training_cost_analysis_2026-05-11.md - NLA Stage 1/2 cost analysis + decision
- docs/v55_canonical_verdict_2026-05-14.md - best balanced fine-tuned result so far, not promoted
- docs/v56_canonical_verdict_2026-05-14.md - targeted mixed SFT negative result and stop condition
- docs/v57_production_candidate_plan_2026-05-14.md - precommitted curated-target production-candidate hypothesis
- docs/v57_canonical_verdict_2026-05-14.md - curated-target production candidate failed H15 and is not promoted
- docs/v58_canonical_verdict_2026-05-14.md - boundary-first SFT improved aggregate and explicit refusal, but failed H16 non-compensatory gates
- docs/v59_canonical_verdict_2026-05-14.md - strongest fine-tuned result to date, but failed H17 direct-injection and jailbreak gates; not promoted
- docs/v42_boundary_guard_precommit_2026-05-14.md - H18 guard design: deterministic FastAPI proxy (port 8082, 16 rules, 4 classes) around v42; phrase updated to EXPLICIT_REFUSAL language after H18 first run
- docs/v42_guard_h18_verdict_2026-05-15.md - H18 first run FAIL on H18b rubric artifact; documented phrase evolution across three iterations
- docs/v42_guard_h18r4_verdict_2026-05-15.md - H18r4 PASS (all 13 gates); original guard + v42 promotion; anchor 18e2c5a5…; 16 rules, 60 tests. Historical anchor for the ASCII-only attack surface.
- docs/v42_guard_known_limitations_2026-05-15.md - original security gaps that H18r4 did NOT anchor (Unicode bypass = L-01, now CLOSED by H20; multi-message scan = L-02, still open and deferred to H21)
- docs/h19_precommit_hypothesis_2026-05-16.md - H19 hypothesis: close the Unicode-bypass and multi-message gaps with normalization + per-message scan, with predeclared FP suite
- docs/h19_verdict_2026-05-16.md - H19 FAIL per predeclared gates; Unicode mitigation proven (H19-B 20/20, H19-C 0/31 FP) but multi-message D-gates failed due to suite-design confound. Honest negative verdict.
- docs/h20_precommit_hypothesis_2026-05-16.md - H20: clean re-test isolating Unicode normalization only; multi-message claim deferred.
- docs/h20_verdict_2026-05-16.md - H20 PASS (all 14 gates); guard-v3 + v42; anchor 56ce960993f9…; closes L-01 Unicode-bypass.
- docs/h21_precommit_hypothesis_2026-05-16.md - H21: clean re-test isolating multi-message attack closure only; system-role rejection deferred. Suite-design fix verified all 25 attack payloads fire a v3 rule when sent as single messages (preventing the H19 confound).
- docs/h21_verdict_2026-05-16.md - H21 PASS (all 15 gates); guard-v4 + v42; anchor d916ef63…; closes L-02 multi-message-scan.
- docs/h22_precommit_hypothesis_2026-05-16.md - H22: client-supplied role: system rejection (the residual L-02b), with explicit D2b predicate for legitimate operator pos-0 system prompts (the fix for the H19-D2 precommit-vs-suite confound).
- docs/h22_verdict_2026-05-16.md - H22 PASS (all 16 gates); guard-v5 + v42 superseded H21; anchor 5f2e796cf5af…; closes L-02b.
- docs/h23_verdict_2026-05-16.md - H23 PASS at threshold; encoded-payload behavioral defense held and surfaced L-08.
- docs/h24_verdict_2026-05-16.md - H24 PASS; leet-fold closes L-08 and promotes guard-v6 + v42.
- docs/h25_verdict_2026-05-16.md - H25 FAIL; native-language attack bypass confirmed and documented as L-09.
- docs/h26_verdict_2026-05-17.md - H26 PASS; multi-language rules close L-09 and promote guard-v7 + v42, the submitted candidate.
- docs/discipline_is_the_contribution.md - 1,200-word essay on predeclared, non-compensatory, anchored evaluation as the contribution.
- docs/compliance_one_pager.md - EU AI Act / NIST AI RMF / US AG / OECD mapping of HAIC primitives.
- docs/why_this_matters.md - 5-minute public-facing articulation
- Public Kaggle notebook: benhaslam/haic-guard-v42-reproducibility-demo-h18r4 - runs the H18r4 guard demo end-to-end on Kaggle T4 in under a minute and emits a SHA3-anchored receipt
To evaluate any candidate (v42 and later — uses canonical_eval):
# 1. Start the candidate via llama-server, then run canonical eval.
python experiments/canonical_eval.py \
--model-id haic-gemma4-v<N> \
--server-url http://127.0.0.1:8081 \
--scenarios experiments/sgt_scenarios_v2.jsonl \
--system-prompt-variant old \
--seeds 7 13 23 42 100 \
--n-samples 20 \
--focused-n 100 \
--out experiments/v<N>_canonical.json \
--failure-sidecar experiments/v<N>_failures.jsonl
# 2. The canonical JSON contains all 13 H-series gate results and a SHA3-256
# self-anchor. Gate thresholds are predeclared in the matching docs/hypothesis
# file BEFORE the run (per evaluation_doctrine.md).
# 3. Verdict: write docs/v<N>_canonical_verdict_<DATE>.md citing the anchor,
# gate results, and PASS/FAIL decision per the precommit gates.

For the v38-v40 era six-gate decision pipeline (BEAST runner, leakage check, check_promotion CLI), see docs/promotion_workflow.md. That pipeline is still functional for historical reproduction, but post-v42 promotion uses canonical_eval above.
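The evaluation steps above refer to a SHA3-256 self-anchor inside the canonical JSON. One plausible way such an anchor can work is to hash a canonical serialization of the result with the anchor field excluded, then store the digest in that field. The sketch below assumes exactly that scheme; the field names and serialization choices are hypothetical and are not taken from experiments/canonical_eval.py.

```python
import hashlib
import json


def self_anchor(result: dict, anchor_key: str = "anchor") -> str:
    """SHA3-256 over the canonical JSON of the result, excluding the anchor field."""
    body = {k: v for k, v in result.items() if k != anchor_key}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()


def verify_anchor(result: dict, anchor_key: str = "anchor") -> bool:
    """True when the stored anchor matches a fresh recomputation."""
    return result.get(anchor_key) == self_anchor(result, anchor_key)


# Placeholder content; real canonical JSON carries the gate results and metadata.
result = {"model_id": "example-model", "gates": {"gate_a": True, "gate_b": False}}
result["anchor"] = self_anchor(result)
ok_before = verify_anchor(result)

result["gates"]["gate_b"] = True  # simulate tampering after anchoring
ok_after = verify_anchor(result)
print(ok_before, ok_after)
```

Sorting keys and fixing the separators keeps the digest stable across runs, so a verdict document can cite the anchor and anyone can recompute it.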
The pipeline is non-compensatory: any one of the six gates failing blocks
promotion. See docs/promotion_workflow.md
for the full procedure and the v38 disposition under it.
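Non-compensatory means a strong result on one gate cannot offset a failure on another. A minimal sketch of such a decision rule is shown here. The function and gate names are hypothetical placeholders; the PROMOTED/BLOCKED labels follow the description of tools/check_promotion.py in the tree, but this is not that tool's code.

```python
def promotion_decision(gates: dict[str, bool]) -> tuple[str, list[str]]:
    """Return ("PROMOTED", []) only if every gate passed, else ("BLOCKED", failed)."""
    if not gates:
        raise ValueError("no gate results supplied; refusing to decide")
    failed = sorted(name for name, passed in gates.items() if not passed)
    return ("BLOCKED" if failed else "PROMOTED", failed)


# Placeholder gate names; the real six gates are defined in docs/evaluation_doctrine.md.
all_pass = {f"gate_{i}": True for i in range(1, 7)}
one_fail = dict(all_pass, gate_4=False)

print(promotion_decision(all_pass))
print(promotion_decision(one_fail))
```

There is no weighting or averaging anywhere in the rule, which is the point: five passes and one failure still yield BLOCKED.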