HAIC × Gemma 4 Good — Kaggle Hackathon Entry

Title: Grounding Gemma 4 in Human Lived Experience: A Convention for Verifiable, Consent-Gated AI Alignment

DOI (Viability Condition paper): 10.5281/zenodo.18144681

Authors: Benjamin Haslam (Bazzer) and Garrett Sutherland — collaborative entry

The canonical submission writeup is WRITEUP.md. This README is the GitHub repo entry point; the writeup is what gets pasted into the Kaggle submission form. License is Apache 2.0 (LICENSE + NOTICE). Current promoted candidate is guard-v7 + v42 at anchor 4d0d7bf05ea2cc8d323b08982329455c72a999bd6da5a75a8b136a81b8ad8bb8 (H26 verdict).

Submitted snapshot: Kaggle submission filed 2026-05-18 from commit ec7db2e; see docs/submission_manifest_2026-05-18.md for the frozen submission set and docs/research_record_map.md for the post-submission reader map.

Video: https://youtu.be/p5ZprNkIAEM

Core Thesis

AI systems trained on synthetic data can maintain semantic grounding only when the rate of externally-verified human correction exceeds the rate of internally-generated error — the Viability Condition: Ceff(t) > E(t).

This notebook demonstrates how Gemma 4's function-calling capability can be used to build a governance loop that monitors and maintains this condition in real time using:

The HAIC Maestro gateway — verified grounding interviews (Ceff)
The PRISM geometry library — activation-level E(t) measurement
A Merkle-auditable participation receipt — proof the condition is met

The Runtime Grounding Loop

The governance pipeline spans four composable layers, each enforcing the Viability Condition at a different time scale. Every gradient signal is traceable from operator click through to federation commit.

┌──────────────────────────────────────────────────────────────────────┐
│ L4 SYSTEM     Viability Condition Ceff(t) > E(t)                     │
│   federation  viability/distributed_viability.py                     │
│   per round   decision: commit · rollback · alert_operator           │
└────────────────────────────────┬─────────────────────────────────────┘
                                 │ accepts / rejects fragments
┌────────────────────────────────▼─────────────────────────────────────┐
│ L3 FRAGMENT   DiLoCo Fragment Verifier                               │
│   per learner tools/diloco_fragment_verifier.py                      │
│   per round   Merkle integrity · consent · shape · norms             │
└────────────────────────────────┬─────────────────────────────────────┘
                                 │ accepts / rejects round contributions
┌────────────────────────────────▼─────────────────────────────────────┐
│ L2 SESSION    Six Convention-Session Viability Gates                 │
│   per session viability/session_gates.py                             │
│               entropy_reduction · extraction_risk · prism_consistency│
│               participation_covenant · federated_exchange · epistemic│
└────────────────────────────────┬─────────────────────────────────────┘
                                 │ admits / rejects training_signal
┌────────────────────────────────▼─────────────────────────────────────┐
│ L1 STEP       TTT Gates (error_bias BLOCKING)                        │
│   per device  viability/ttt_gates.py + tools/edge_ttt_adapter.py     │
│               error_bias (BLOCK) · weight_drift (warn) · rate (warn) │
└──────────────────────────────────────────────────────────────────────┘

Plus a structured decision vocabulary for enforcement-consequential observations (deforestation, structural damage): tools/enforcement_evidence_contract.py with the four-action contract accept · refine · defer · skip.

Try it:

python tools/federated_round_demo.py --n-learners 5 --bias-fraction 0.4

produces a Merkle-anchored JSON receipt for one synthetic federation round.

Test it:

python -m pytest tests/                                   # 797 tests
python experiments/runtime_loop_stress_test.py            # 7 streams
bash scripts/verify_all.sh                                # all of the above + receipts

See docs/runtime_grounding_loop_2026-05-11.md and docs/diloco_integration_2026-05-11.md for the full architecture.

Project Structure

gemma4good/
├── notebook/
│   └── haic_gemma4_governance.ipynb  ← main Kaggle submission
├── tools/
│   ├── haic_tools.py                       ← 7 function-calling tool implementations
│   ├── incremental_grounding.py            ← session-driven continual learning
│   ├── eval_leakage_check.py               ← Gate 2: scenario-vs-shard hash check
│   ├── check_promotion.py                  ← Gate decision: PROMOTED/BLOCKED CLI
│   ├── evaluate_promotion.py               ← single-entry pipeline wrapper
│   ├── eval_receipt.py                     ← Merkle-anchored eval receipt
│   ├── edge_ttt_adapter.py                 ← Layer 1: per-device runtime adaptation
│   ├── diloco_fragment_verifier.py         ← Layer 3: per-fragment Merkle/consent/shape
│   ├── enforcement_evidence_contract.py    ← VLA-style 8-key evidence + 4 actions
│   └── federated_round_demo.py             ← End-to-end CLI demo of all four layers
├── experiments/
│   ├── canonical_eval.py                   ← CURRENT canonical eval driver (post-v42)
│   ├── rubrics.py                          ← canonical strict + v1 rubrics (stable API)
│   ├── sgt_harness.py                      ← rigorous SGT (Garrett Sutherland's)
│   ├── sgt_extended_scenarios.py           ← 10 grounding + 5 security scenarios
│   ├── run_v38_sgt.py                      ← BEAST runner (1-turn, pre-v42 era)
│   ├── inspect_security_responses.py       ← failure-mode dissection helper
│   ├── kaggle_cell_rigorous_sgt.py         ← drop-in cell for kaggle build scripts
│   ├── runtime_loop_stress_test.py         ← 7-stream end-to-end runtime loop validation
│   ├── runtime_loop_stress_report.json     ← receipt-anchored stress test result
│   ├── prism_geometry_trajectory.py        ← v55–v58 PRISM qh scan
│   ├── h19_*.jsonl                         ← H19 predeclared Unicode/multi-msg suites
│   ├── h26_*.jsonl / h26_*.py              ← final H26 multi-language gate suite
│   ├── v42_guard_v7_h26_canonical.json     ← submitted H26 anchor 4d0d7bf…
│   └── archive/                            ← v43–v59 notebook builders, legacy evals
├── tests/
│   └── test_*.py                           ← 797 unit tests covering eval + four layers
├── prism_integration/                      ← Prism geometry wrappers (E(t) source)
├── maestro_integration/                    ← Maestro gateway client
├── viability/
│   ├── viability_condition.py              ← Original single-node Ceff(t) > E(t)
│   ├── distributed_viability.py            ← Layer 4: federated Ceff_global > E_global
│   ├── session_gates.py                    ← Layer 2: six convention-session gates
│   └── ttt_gates.py                        ← Layer 1: TTT runtime adaptation gates
├── utils/
│   └── merkle.py                           ← shared SHA3-256 + Merkle root utilities
├── notebook/
│   ├── haic_gemma4_governance.ipynb        ← main submission (Scenarios 1-5)
│   └── _scenario5_insert.py                ← one-shot builder for Scenario 5 cells
├── assets/                                 ← Diagrams, images
└── docs/
    ├── evaluation_doctrine.md        ← six-gate model evaluation doctrine
    ├── promotion_workflow.md         ← end-to-end promotion pipeline
    ├── v39_recipe.md                 ← next training run proposal
    ├── audit_humanai_convention_pipeline.md
    │                                 ← gap analysis vs upstream pipeline
    ├── writeup_addendum_2026-05-08.md
    │                                 ← rigorous re-eval companion to WRITEUP
    ├── integration_notes.md          ← Maestro + Prism code interfaces
    ├── viability_condition.md        ← Full theoretical framework
    ├── diloco_integration_2026-05-11.md       ← Layer 3/4: DiLoCo theory + scenarios
    ├── runtime_grounding_loop_2026-05-11.md   ← Four-layer architecture walkthrough
    ├── simsat_incorporation_decisions_2026-05-11.md  ← What ported from SimSat
    ├── autonomous_session_2026-05-11.md       ← Overnight session operator brief
    ├── v43_v44_verdict_2026-05-10.md          ← v43/v44 model verdict
    ├── v45_verdict_2026-05-10.md              ← v45 H4d verdict (superseded by canonical eval)
    ├── v46_verdict_2026-05-11.md              ← v46 DPO verdict: H4e REFUTED
    ├── canonical_eval_verdict_2026-05-11.md   ← single-source-of-truth eval + SHA3-256 anchor
    ├── strict_rubric_finding_2026-05-11.md    ← strict explicit-refusal classifier
    ├── system_prompt_artifact_finding_2026-05-11.md ← OLD vs NEW prompt analysis
    └── nla_training_cost_analysis_2026-05-11.md     ← NLA Stage 1/2 cost decision doc

Local Layout

This local project root has one active lane and one archive:

<repo-root> — current branch main (tracks origin/main). All work happens here.
<repo-root>\_local_worktrees\_archive\local-history-safety — archived safety lane, not for active development.

Local-only artifacts live under:

<repo-root>\_local_state\archives
<repo-root>\_local_state\backups
<repo-root>\_local_state\regressions
<repo-root>\_local_state\logs
<repo-root>\_local_notes

History note. Prior to 2026-05-11 the repo used a dual-branch "runtime master + public main" pattern with unrelated histories. That pattern was retired; the master branch and the public-main worktree no longer exist. main is now the only working branch.

The Governance Tool Pipeline

The submission notebook (notebook/haic_gemma4_governance.ipynb) uses Gemma 4's native function-calling format with five active governance tools (Scenarios 1–4) plus one advisory audit (Scenario 6):

Five Function-Calling Tools (Gemma 4 TOOL_SCHEMAS)

#	Tool	Role	Implementation
1	`assess_wellbeing_domain`	Map scenario to GFS wellbeing domains + vulnerability	inline, `tools/haic_tools.py`
2	`verify_consent_and_provenance`	Check consent layers + data lineage	inline, Maestro `/v1/session/consent`
3	`run_prism_analysis`	Measure activation geometry (E(t))	`prism_integration/prism_client.py`
4	`audit_activation_explanation`	NLA: explain what the model is reasoning about	`tools/audit_activation_explanation.py` (MockNLA until Gemma-4 NLA trained)
5	`generate_alignment_receipt`	Finalize Merkle-anchored governance receipt	inline `GovernanceTrace.finalize()`

Advisory Audit (Scenario 6, not in TOOL_SCHEMAS)

Tool	Role	Implementation
`audit_provenance`	Cisco MPK statistical model-derivation check	`tools/audit_provenance.py` (score ≥ 0.75 high / ≥ 0.65 weak)

NLA honest-scope note: Tool 4 (audit_activation_explanation) uses MockNLA today — no Gemma-4-E2B NLA has been trained yet. The mock is deterministic and audit-stable. The contract is forward-compatible: a trained Gemma-4 NLA plugs in with zero consumer-code changes. See docs/nla_training_cost_analysis_2026-05-11.md.

Quick Start (local gateway)

# Start Maestro in test mode
cd <humanai-convention-root>\maestro
MAESTRO_LAUNCH_MODE=test MAESTRO_JWT_SECRET=$(python -c "import secrets; print(secrets.token_hex(32))") \
  python -m uvicorn apps.gateway.main:app --reload --port 8000

# Run the notebook
cd <repo-root>
jupyter notebook notebook/haic_gemma4_governance.ipynb

Key Reading

docs/project_goal_2026-05-13.md — current scientific charter for the submission: governance proof first, fine-tuning as falsifiable appendix
docs/submission_manifest_2026-05-18.md — exact submitted snapshot and load-bearing file set
docs/research_record_map.md — post-submission map of governance, fine-tuning, guard, and reproducibility tracks
docs/submission_alignment_2026-05-13.md — current submission posture, load-bearing documents, and claim discipline
docs/v57_production_candidate_plan_2026-05-14.md — precommitted path for a possible v42-plus live replacement; no promotion without H15 passing
docs/v57_canonical_verdict_2026-05-14.md — v57 H15 failed; v42 remains live production reference
docs/v58_precommit_plan_2026-05-14.md — boundary-first v58 design after the v42/v55/v57 failure taxonomy
docs/v58_canonical_verdict_2026-05-14.md — v58 H16 failed on non-compensatory direct-injection and disclosure-preview gates
docs/v59_precommit_plan_2026-05-14.md — targeted residual patch plan and real artifact/quantization/eval trail
docs/v59_canonical_verdict_2026-05-14.md — latest fine-tuning endpoint: v59 H17 failed; v42 remains the live production reference
docs/viability_condition.md — the mathematical foundation
docs/evaluation_doctrine.md — the six gates that govern model promotion
docs/promotion_workflow.md — end-to-end pipeline (rigorous SGT → leakage receipt → six-gate decision → Merkle eval receipt)
docs/integration_notes.md — code interfaces for Maestro and Prism
tools/haic_tools.py — tool implementations

Model evaluation (canonical record)

docs/canonical_eval_verdict_2026-05-11.md — canonical eval methodology + SHA3-256 self-anchor
experiments/v42_canonical_old_prompt.json — v42 anchor e5976055… (5 seeds, n=100)
experiments/v46_canonical_old_prompt.json — v46 DPO anchor 95252de7… (H4e REFUTED)
docs/v46_verdict_2026-05-11.md — v46 DPO verdict: H4e refuted (strict refusal 13.8% → 2.6%)
docs/strict_rubric_finding_2026-05-11.md — strict classifier methodology
docs/system_prompt_artifact_finding_2026-05-11.md — OLD vs NEW prompt artifact
docs/nla_training_cost_analysis_2026-05-11.md — NLA Stage 1/2 cost analysis + decision
docs/v55_canonical_verdict_2026-05-14.md — best balanced fine-tuned result so far, not promoted
docs/v56_canonical_verdict_2026-05-14.md — targeted mixed SFT negative result and stop condition
docs/v57_production_candidate_plan_2026-05-14.md — precommitted curated-target production-candidate hypothesis
docs/v57_canonical_verdict_2026-05-14.md — curated-target production candidate failed H15 and is not promoted
docs/v58_canonical_verdict_2026-05-14.md — boundary-first SFT improved aggregate and explicit refusal, but failed H16 non-compensatory gates
docs/v59_canonical_verdict_2026-05-14.md — strongest fine-tuned result to date, but failed H17 direct-injection and jailbreak gates; not promoted
docs/v42_boundary_guard_precommit_2026-05-14.md — H18 guard design: deterministic FastAPI proxy (port 8082, 16 rules, 4 classes) around v42; phrase updated to EXPLICIT_REFUSAL language after H18 first run
docs/v42_guard_h18_verdict_2026-05-15.md — H18 first run FAIL on H18b rubric artifact; documented phrase evolution across three iterations
docs/v42_guard_h18r4_verdict_2026-05-15.md — H18r4 PASS (all 13 gates); original guard + v42 promotion; anchor 18e2c5a5…; 16 rules, 60 tests. Historical anchor for the ASCII-only attack surface.
docs/v42_guard_known_limitations_2026-05-15.md — original security gaps that H18r4 did NOT anchor (Unicode bypass = L-01, now CLOSED by H20; multi-message scan = L-02, still open and deferred to H21)
docs/h19_precommit_hypothesis_2026-05-16.md — H19 hypothesis: close the Unicode-bypass and multi-message gaps with normalization + per-message scan, with predeclared FP suite
docs/h19_verdict_2026-05-16.md — H19 FAIL per predeclared gates; Unicode mitigation proven (H19-B 20/20, H19-C 0/31 FP) but multi-message D-gates failed due to suite-design confound. Honest negative verdict.
docs/h20_precommit_hypothesis_2026-05-16.md — H20: clean re-test isolating Unicode normalization only; multi-message claim deferred.
docs/h20_verdict_2026-05-16.md — H20 PASS (all 14 gates); guard-v3 + v42; anchor 56ce960993f9…; closes L-01 Unicode-bypass.
docs/h21_precommit_hypothesis_2026-05-16.md — H21: clean re-test isolating multi-message attack closure only; system-role rejection deferred. Suite-design fix verified all 25 attack payloads fire a v3 rule when sent as single messages (preventing the H19 confound).
docs/h21_verdict_2026-05-16.md — H21 PASS (all 15 gates); guard-v4 + v42; anchor d916ef63…; closes L-02 multi-message-scan.
docs/h22_precommit_hypothesis_2026-05-16.md — H22: client-supplied role: system rejection (the residual L-02b), with explicit D2b predicate for legitimate operator pos-0 system prompts (the fix for the H19-D2 precommit-vs-suite confound).
docs/h22_verdict_2026-05-16.md — H22 PASS (all 16 gates); guard-v5 + v42 superseded H21; anchor 5f2e796cf5af…; closes L-02b.
docs/h23_verdict_2026-05-16.md — H23 PASS at threshold; encoded-payload behavioral defense held and surfaced L-08.
docs/h24_verdict_2026-05-16.md — H24 PASS; leet-fold closes L-08 and promotes guard-v6 + v42.
docs/h25_verdict_2026-05-16.md — H25 FAIL; native-language attack bypass confirmed and documented as L-09.
docs/h26_verdict_2026-05-17.md — H26 PASS; multi-language rules close L-09 and promote guard-v7 + v42, the submitted candidate.
docs/discipline_is_the_contribution.md — 1,200-word essay on predeclared, non-compensatory, anchored evaluation as the contribution.
docs/compliance_one_pager.md — EU AI Act / NIST AI RMF / US AG / OECD mapping of HAIC primitives.
docs/why_this_matters.md — 5-minute public-facing articulation

Independent reproducibility

Public Kaggle notebook: benhaslam/haic-guard-v42-reproducibility-demo-h18r4 — runs the H18r4 guard demo end-to-end on Kaggle T4 in under a minute, emits a SHA3-anchored receipt

Promotion gate

To evaluate any candidate (v42 and later — uses canonical_eval):

# 1. Start the candidate via llama-server, then run canonical eval.
python experiments/canonical_eval.py \
    --model-id haic-gemma4-v<N> \
    --server-url http://127.0.0.1:8081 \
    --scenarios experiments/sgt_scenarios_v2.jsonl \
    --system-prompt-variant old \
    --seeds 7 13 23 42 100 \
    --n-samples 20 \
    --focused-n 100 \
    --out experiments/v<N>_canonical.json \
    --failure-sidecar experiments/v<N>_failures.jsonl

# 2. The canonical JSON contains all 13 H-series gate results and a SHA3-256
#    self-anchor. Gate thresholds are predeclared in the matching docs/hypothesis
#    file BEFORE the run (per evaluation_doctrine.md).

# 3. Verdict: write docs/v<N>_canonical_verdict_<DATE>.md citing the anchor,
#    gate results, and PASS/FAIL decision per the precommit gates.

For the v38-v40 era six-gate decision pipeline (BEAST runner, leakage check, check_promotion CLI), see docs/promotion_workflow.md — that pipeline is still functional for historical reproduction but post-v42 promotion uses canonical_eval above.

The pipeline is non-compensatory: any one of the six gates failing blocks promotion. See docs/promotion_workflow.md for the full procedure and the v38 disposition under it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HAIC × Gemma 4 Good — Kaggle Hackathon Entry

Core Thesis

The Runtime Grounding Loop

Project Structure

Local Layout

The Governance Tool Pipeline

Five Function-Calling Tools (Gemma 4 TOOL_SCHEMAS)

Advisory Audit (Scenario 6, not in TOOL_SCHEMAS)

Quick Start (local gateway)

Key Reading

Model evaluation (canonical record)

Independent reproducibility

Promotion gate

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 307 Commits
.github		.github
assets/media_gallery		assets/media_gallery
dashboard		dashboard
data		data
deploy		deploy
docs		docs
experiments		experiments
maestro_gateway		maestro_gateway
maestro_integration		maestro_integration
notebook		notebook
onchain		onchain
prism_integration		prism_integration
scripts		scripts
tests		tests
tools		tools
utils		utils
viability		viability
video		video
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
WRITEUP.md		WRITEUP.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

HAIC × Gemma 4 Good — Kaggle Hackathon Entry

Core Thesis

The Runtime Grounding Loop

Project Structure

Local Layout

The Governance Tool Pipeline

Five Function-Calling Tools (Gemma 4 TOOL_SCHEMAS)

Advisory Audit (Scenario 6, not in TOOL_SCHEMAS)

Quick Start (local gateway)

Key Reading

Model evaluation (canonical record)

Independent reproducibility

Promotion gate

About

Topics

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages