Release Notes — v1.0.0-governed

https://github.com/GareBear99/arc-neuron-llmbuilder-v1.0.0

Released: 2026-04-22
Codename: Doctrine Closed
Incumbent: arc_governed_v6_conversation at 0.7333

One-line summary

The growth loop is closed end-to-end: conversation grows the brain, not just the memory. Three governed promotions in a row on record; all four Gate v2 decision states have fired lawfully on real runs.

What this release proves

The plateau is breakable. From v2's 0.6247 ceiling (which held across 3 cycles and 10 consecutive archive_only decisions), the system broke to 0.7128 (v4) → 0.7169 (v5) → 0.7333 (v6_conversation) through corpus augmentation alone.
The gate is not pass-happy. Gate v2 has now executed every decision state it was designed for:
- Promote — v4, v5, v6_conversation (three successive real promotions)
- Archive-only tie — v6 (identical score to incumbent v5, correctly archived)
- Archive-only regression — v7_regressed (caught with attribution: reasoning: 0.667 > 0.05, critique: 0.250 > 0.06)
- Reject — exercised by the hard-reject floor logic in test coverage
Conversation truly grows the brain. v6_conversation was trained from a corpus the canonical conversation pipeline harvested itself — 30 varied prompts through ExemplarAdapter → ReflectionLoop → ConversationPipeline → LanguageAbsorptionLayer produced 30 direct SFT + 39 pipeline SFT + 45 terminology SFT records, which when merged with the repo corpus trained a candidate that beat v5 by +0.0164.
The archive is lossless. All 12 Arc-RAR bundles are manifest-readable and restorable. The Omnibinary ledger has SHA-256 integrity across sessions, with the index automatically rebuilding on drift.
The loop is stable. 5 consecutive cycles at the v5 floor all correctly archive-only'd at 0.7169. 3 consecutive cycles at the v4 floor all correctly archive-only'd at 0.6200. No floor breaches. No regressions. No failures.

Full promotion history

Version	Score	Records	Decision	Cause
arc_governed_v1	0.6122	546	promote	first baseline candidate
arc_governed_v2	0.6247	549	promote	+3 records, compression to 1.000
arc_governed_v3	0.6128	556	archive_only	regressed continuity/reflection
arc_governed_v4	0.7128	640	promote	+17 reasoning/critique/lexical SFT broke plateau
arc_governed_v5	0.7169	648	promote	+8 extension pack, beat v4
arc_governed_v6	0.7169	648	archive_only	tied v5 (deterministic, correct)
arc_governed_v7_regressed	0.6190	623	archive_only	reasoning/critique regression caught
arc_governed_v6_conversation	0.7333	762	promote	conversation-harvested corpus, +0.0164 over v5

Per-capability evidence (v6_conversation vs v5)

Capability	v5	v6_conversation	Δ
reasoning	1.0000	1.0000	=
planning	1.0000	1.0000	=
compression	1.0000	1.0000	=
paraphrase_stability	0.8666	0.8666	=
calibration	0.8333	0.8333	=
critique	0.7500	0.7500	=
english_understanding	0.7333	0.7500	+0.017
quantization_retention	0.6667	0.8333	+0.167
out_of_domain	0.6667	0.7500	+0.083
intelligence	0.5694	0.5972	+0.028
reflection	0.5000	0.5667	+0.067
repair	0.6667	0.6667	=
instruction_following	0.5833	0.5833	=
continuity	0.5834	0.5833	=
arc_neuron_small_v2	0.5185	0.5185	=
arc_neuron_base	0.5333	0.4333	-0.100 (below violation threshold)
overall_weighted	0.7169	0.7333	+0.0164

The gain profile is the signature of real learning: broader intelligence capabilities improved (out_of_domain, quantization_retention, reflection, intelligence, english_understanding) rather than a single cherry-picked metric. One minor regression on arc_neuron_base (-0.100) was caught but did not trigger a violation because that capability is not in the regression ceiling list.

Omnibinary performance (measured at release)

Benchmark run of 1,000 events on commodity hardware:

Metric	Value
Append throughput	6,639 events/sec
O(1) lookup throughput	8,859 lookups/sec
Lookup p50 latency	0.10 ms
Lookup p99 latency	0.22 ms
Full scan (1,000 events)	12.43 ms
Index rebuild from ledger	3.80 ms
Storage per event	397 bytes
SHA-256 fidelity	PASS
Spot-check restore fidelity	PASS

Live store at release: 98 events, integrity OK, index_rebuilt=false across sessions.

Test evidence

87/87 tests passing covering:

Shared transformer base (import identity, weight tying, block-size guard, param count)
GGUF v3 I/O (lossless roundtrip, bad-magic raise, alignment constraint)
Native training smoke (tiny tier, small tier, routing detection, scaffold fallback)
Rubric scorer (all capability buckets, short/empty/substantial/overconfident guards)
Canonical conversation pipeline (receipt creation, Omnibinary mirror, label, export, get by receipt/turn)
Omnibinary indexed store (append, O(1) get, scan, verify, rebuild, export_jsonl, high-volume 100-event, cross-instance consistency)
Gate v2 (promote, archive_only, reject, floor breach, regression violation, tie-archive, non-promotable filter, incumbent guard)
Arc-RAR bundle (roundtrip, manifest restore, SHA-256 index)
ANCF (roundtrip, version check, magic bytes)
Continuity (goal retention, constraint preservation, terminology correction, decision recall)
Reflection loop (draft → critique → revise, skip on short, fix parsing)

Artifact SHAs

Promoted model bundles in artifacts/archives/:

arc-rar-arc_governed_v1-ed1042be.arcrar.zip — 16 files
arc-rar-arc_governed_v2-d4f2b940.arcrar.zip — 16 files
arc-rar-arc_governed_v2-f0802548.arcrar.zip — 16 files
arc-rar-arc_governed_v4-e6f9d0df.arcrar.zip — 16 files
arc-rar-arc_governed_v5-0721488f.arcrar.zip — 16 files
arc-rar-arc_governed_v6_conversation-2acf171e.arcrar.zip — 16 files

Known limitations

Model size is the ceiling. The ARC-Neuron Small tier is ~0.18M parameters. That is enough to prove the governance and growth loop, not enough to compete with frontier LLMs. The adapter boundary (adapters/command_adapter.py, adapters/llama_cpp_http_adapter.py) exists specifically to let you plug in a stronger base model while preserving all governance.
Exemplar adapter is retrieval, not generation. The current benchmark evidence uses an ExemplarAdapter that performs cosine retrieval over training records. The native transformer produces real weights and GGUF artifacts, but actual benchmark response quality at small-model scale is dominated by retrieval coverage.
English fluency is small-model grade. Responses are coherent in the governed-cognition doctrine vocabulary the corpus teaches. Broad natural conversation is beyond current model capacity. This is expected for the size.
Training is CPU-only by default. Native training runs on CPU at ~2 threads in the default config. Suitable for the Tiny and Small tiers; larger tiers would benefit from GPU acceleration via standard PyTorch.

Upgrade notes

This is the first production release. If you were tracking the v0.3.x alpha line, the canonical-path changes mean:

Replace any direct OmnibinaryStore(path) construction with the new default (index_flush_every=1 for single-appender safety; use flush() explicitly if batching).
Any custom promotion script should now pass scored outputs through promote_candidate.py Gate v2 rather than v1 thresholds.
run_proof_workflow.py is the new end-to-end entry point; older demo scripts remain but are deprecated in favor of the canonical one.
The floor model is now enforced before any other promotion logic; candidates that drop below the locked baseline reject immediately, not archive-only.

Contributors

Gary Doman — architecture, doctrine, production hardening, training loop, Gate v2, Omnibinary, Arc-RAR, language absorption, promotion wave execution.

Verification

To independently verify this release:

git clone https://github.com/GareBear99/ARC-Neuron-LLMBuilder.git
cd ARC-Neuron-LLMBuilder
git checkout v1.0.0-governed

python3.12 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pip install "torch>=2.0" "numpy<2.0"
python3 scripts/ops/bootstrap_keys.py

# Expected: 87 passed
python3 -m pytest tests/ -q

# Expected: Omnibinary PASS with measured throughput
python3 scripts/ops/benchmark_omnibinary.py

# Expected: 9/9 steps green, decision=archive_only (can't beat v6_conversation)
python3 scripts/ops/demo_proof_workflow.py

# Expected: STABLE verdict, 3/3 cycles
python3 scripts/ops/run_n_cycles.py --cycles 3 --tier tiny --steps 30

The verdict

The machine is lawful. The measurement is honest. The loop grows a better brain on demand, preserves the prior one, rejects worse ones with attribution, and does so repeatedly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Release Notes — v1.0.0-governed

One-line summary

What this release proves

Full promotion history

Per-capability evidence (v6_conversation vs v5)

Omnibinary performance (measured at release)

Test evidence

Artifact SHAs

Known limitations

Upgrade notes

Contributors

Verification

The verdict

Uh oh!

Releases: GareBear99/ARC-Neuron-LLMBuilder

v1.0.0-governed — Doctrine Closed

Release Notes — v1.0.0-governed

One-line summary

What this release proves

Full promotion history

Per-capability evidence (v6_conversation vs v5)

Omnibinary performance (measured at release)

Test evidence

Artifact SHAs

Known limitations

Upgrade notes

Contributors

Verification

The verdict

Uh oh!