Styxx v3.5.0 — the Cognitive Instruction Set

2026-04-22

Headline features

  1. styxx.steer + styxx.cogvm: CIS v0 (Cognitive Instruction Set). The first open-source runtime for programmable residual-stream control of any HuggingFace decoder model. Supports multi-concept composition and conditional dispatch on live probe readings.
  2. styxx.hallucination: a runtime fabrication detector with 3 modes (verdict / streaming / auto-halt). Uses a new behavioral-label confab probe (AUC 0.800 at layer 11).
  3. Multi-vendor probe atlas: refuse probes ship for Llama-3.2-1B, Llama-3.2-3B, Qwen-2.5-1.5B, and Phi-3.5-mini. The first open cross-vendor cognitive direction library.
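
To make the auto-halt mode and "conditional dispatch on live probe readings" concrete, here is a minimal sketch of the general idea, not the styxx API. The names `probe_score` and `generate_with_auto_halt`, the logistic form of the probe, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
from typing import Iterable, Iterator, Sequence, Tuple


def probe_score(hidden: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical linear probe: logistic reading in [0, 1] for one hidden-state vector."""
    logit = sum(h * w for h, w in zip(hidden, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def generate_with_auto_halt(
    steps: Iterable[Tuple[str, Sequence[float]]],
    weights: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not a value from the release
) -> Iterator[str]:
    """Yield tokens until the probe reading on a step reaches the threshold."""
    for token, hidden in steps:
        if probe_score(hidden, weights) >= threshold:
            return  # halt before emitting the flagged token
        yield token


# Toy stream of (token, hidden state) pairs with a 1-dimensional hidden state.
stream = [("The", [-2.0]), ("capital", [-1.0]), ("is", [3.0]), ("Paris", [-2.0])]
emitted = list(generate_with_auto_halt(stream, weights=[1.0]))
```

In this toy stream the third step pushes the probe reading above the threshold, so generation stops after two tokens. A verdict mode would instead score the finished output once, and a streaming mode would report each reading without halting.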

Landmark research results

Safety bypass on Llama-3.2-1B

Single-direction, multi-position residual steering drops the refusal rate on unsafe prompts from 97% to 17% at α=3.0 (n=60 held-out). This reproduces Arditi et al. at the 1B scale with open data.
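
As a rough illustration of single-direction, multi-position steering, the sketch below shifts every position's residual vector along one unit direction. It assumes additive steering with a signed strength; the release does not state whether styxx adds, subtracts, or ablates the direction, and the helper names `unit`, `steer`, and `projection` are hypothetical.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def steer(hidden_states: Sequence[Sequence[float]], direction: Sequence[float], alpha: float) -> List[List[float]]:
    """Add alpha * unit(direction) to the residual vector at every position."""
    d = unit(direction)
    return [[h + alpha * di for h, di in zip(pos, d)] for pos in hidden_states]


def projection(vec: Sequence[float], direction: Sequence[float]) -> float:
    """Scalar projection of vec onto the unit direction."""
    return sum(x * di for x, di in zip(vec, unit(direction)))


# Two positions, 2-dimensional residual stream; steer against a toy "refuse" direction.
states = [[1.0, 0.0], [0.0, 2.0]]
refuse_dir = [0.0, 2.0]
steered = steer(states, refuse_dir, alpha=-3.0)  # sign convention is an assumption
```

Each position's projection onto the direction moves by exactly alpha, while components orthogonal to the direction are unchanged. In a real model the same update would typically be applied inside a forward hook on the chosen layer.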

Gradient-free capability amplification

On TruthfulQA MC1 with Llama-3.2-1B, accuracy rises from a 32.5% baseline to 39.5% (+7.0pp) at α=1.0 under multi-layer patching with a supervised correct-vs-incorrect answer direction. A random-direction control validates the effect: at α=0.5, random directions lower accuracy by 5.3pp while the trained direction raises it by 6.0pp, a gap of 11.3pp. This reproduces Representation Engineering at the 1B scale, with a random control.
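
The random-direction control can be sketched as follows: draw a random unit vector of the same dimension as the trained direction, steer with it at the same α, and compare the accuracy deltas. The helper names and the Gaussian sampling scheme are assumptions for illustration; only the percentage-point figures come from the results above.

```python
import math
import random
from typing import List


def random_unit_direction(dim: int, seed: int = 0) -> List[float]:
    """Sample an isotropic random direction: Gaussian components, then normalize."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def control_gap_pp(trained_delta_pp: float, random_delta_pp: float) -> float:
    """Gap in percentage points between the trained and the random direction."""
    return round(trained_delta_pp - random_delta_pp, 1)


control = random_unit_direction(dim=16)
gap = control_gap_pp(trained_delta_pp=6.0, random_delta_pp=-5.3)  # reported deltas at alpha=0.5
headline_lift = round(39.5 - 32.5, 1)  # reported baseline vs. steered accuracy at alpha=1.0
```

Normalizing the control to unit length keeps its steering magnitude comparable to a unit-norm trained direction, so any accuracy difference reflects the direction rather than the size of the perturbation.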

Concept geometry

The refuse, sycophant-pressure, and confab-prompt probe directions at shared layer 10 of Llama-1B sit at pairwise angles of 86°–92°, the near-orthogonal spacing expected of random high-dimensional vectors. The concepts are modular. This is the first empirical measurement.
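
For intuition on why angles near 90° match random spacing, the sketch below computes pairwise angles between named direction vectors and applies it to random Gaussian vectors. The function names and the dimension of 2048 are chosen for illustration; the vectors are random stand-ins, not the shipped probes.

```python
import itertools
import math
import random
from typing import Dict, List, Sequence, Tuple


def angle_deg(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle between two non-zero vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding error
    return math.degrees(math.acos(cos))


def pairwise_angles(directions: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Angle for every unordered pair of named directions."""
    return {
        (a, b): angle_deg(directions[a], directions[b])
        for a, b in itertools.combinations(directions, 2)
    }


def random_directions(names: Sequence[str], dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Random Gaussian stand-ins for probe directions (illustration only)."""
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 1.0) for _ in range(dim)] for name in names}


angles = pairwise_angles(random_directions(["refuse", "sycophant", "confab"], dim=2048))
```

For independent random vectors in d dimensions the cosine has a standard deviation of about 1/√d, so at d=2048 the angles cluster within a degree or two of 90°.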

Universal Cognitive Basis v0

Cross-model direction transfer grid:

| Transfer | cos | Verdict |
|---|---|---|
| Llama-1B → Llama-3B (within family) | +0.464 | Strong |
| Llama-1B → Qwen-1.5B (cross-vendor) | +0.362 | Moderate |
| Llama-1B → Phi-3.5 | +0.150 | Weak |
| Qwen-1.5B → Phi-3.5 | +0.043 | Essentially random |

The naive linear UCB hypothesis holds only partially: transfer is strong within a family and weakens as vendor safety training diverges. It is falsified for the hardest pair (Qwen-1.5B → Phi-3.5, cos +0.043).

CognitiveBench v0 — first cross-vendor cognitive audit

50-prompt fake-entity fabrication battery, same scoring for every model:

| Vendor | Model | Fabrication |
|---|---|---|
| Anthropic | claude-haiku-4-5 | 14% |
| Meta | Llama-3.2-1B | 56% |
| Meta | Llama-3.2-3B | 62% |
| Alibaba | Qwen-2.5-1.5B | (running) |
| Microsoft | Phi-3.5-mini | (running) |

Scale alone doesn't improve fabrication resistance: Llama-3B fabricates more than Llama-1B (62% vs. 56%). Safety training and architecture, not just parameter count, carry the signal.
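
The fabrication column can be read as a simple proportion over the 50-prompt battery. The sketch below assumes each response receives a binary fabricated / not-fabricated label, which the release does not spell out; `fabrication_rate` is a hypothetical helper, not part of the benchmark code.

```python
from typing import Sequence


def fabrication_rate(verdicts: Sequence[bool]) -> float:
    """Percentage of responses labeled as fabrications (True = fabricated)."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    return round(100.0 * sum(verdicts) / len(verdicts), 1)


# Under the binary-label assumption, the reported rates correspond to these counts out of 50.
battery_size = 50
rates = {
    fabricated: fabrication_rate([True] * fabricated + [False] * (battery_size - fabricated))
    for fabricated in (7, 28, 31)
}
```

With 50 prompts, one prompt is worth 2 percentage points, so the 56% and 62% figures differ by three prompts.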

Papers in repo

  • papers/cognitive-instruction-set-v0-filled.md
  • papers/universal-cognitive-basis-v0.md
  • papers/capability-amplification-v0.md
  • docs/cognet-protocol-v0.md

Reproducibility

bash scripts/reproduce-cis-v0.sh

Takes ~25 min on an RTX 4070-class GPU. The full run covers probe training for the 4 models, the causal α-sweep, concept geometry, and the cogvm demo.

Install

pip install styxx==3.5.0
# For local-model probes (tier 1):
pip install 'styxx[tier1]==3.5.0'

What ships in the wheel

  • 7 trained probes (refuse × 4 models + 3 concepts on Llama-1B)
  • 4 papers + spec
  • Full CogVM runtime
  • Hallucination detector API
  • Production calibration utility

Acknowledgements

Builds on published work from:

  • Arditi et al. 2024 — "Refusal in Language Models is Mediated by a Single Direction"
  • Zou et al. 2023 — "Representation Engineering"
  • Marks & Tegmark 2024 — "The Geometry of Truth"
  • Turner et al. 2023 — "Activation Addition"

License

MIT (code), CC-BY-4.0 (atlas + papers).

Patents

Extends the Fathom Cognitive Atlas + Cognitive Metrology patent stack (US Provisional 64/020,489, 64/021,113, 64/026,964).