An interpretable causal diffusion language model.
Steerling-8B combines masked diffusion language modeling with concept decomposition, enabling:
- Generation: Non-autoregressive text generation via confidence-based unmasking
- Attribution: Decompose predictions into known concept contributions
- Steering: Intervene on concept activations to control generation
- Embeddings: Extract hidden, composed, known, or unknown representations
```bash
pip install steerling
```

```python
from steerling import SteerlingGenerator, GenerationConfig

generator = SteerlingGenerator.from_pretrained("guidelabs/steerling-8b")
text = generator.generate(
    "The key to understanding neural networks is",
    GenerationConfig(max_new_tokens=100, seed=42),
)
print(text)
```

| Property | Value |
|---|---|
| Parameters | ~8B |
| Architecture | CausalDiffusionLM + Interpretable Concept Head |
| Context Length | 4096 |
| Vocabulary | 100,281 (cl100k_base + specials) |
| Known Concepts | 33,732 |
| Unknown Concepts | 101,196 |
| GQA | 32 heads, 4 KV heads |
| Precision | bfloat16 |
Steerling uses block-causal attention (bidirectional within 64-token blocks, causal across blocks) with masked diffusion training. At inference, tokens are generated by iteratively unmasking positions in order of model confidence. The interpretable concept heads decompose transformer hidden states h into:
```
h → known_features + unk_hat + epsilon = composed → lm_head → logits
```

- known_features: weighted sum of the top-k learned concept embeddings
- unk_hat: residual features captured by a factorized unknown head
- epsilon: small correction term for reconstruction fidelity
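The decomposition above can be illustrated with a toy sketch (plain Python, not the Steerling implementation; all numbers are made up, and only the names `h`, `known_features`, and `unk_hat` come from the README):

```python
# Toy illustration of the concept decomposition: a hidden vector splits into
# a known-concept part, an unknown-head residual, and a small epsilon
# correction, and the three parts sum back to the "composed" vector that
# feeds the LM head.
h = [0.8, -1.2, 0.5, 2.0]               # transformer hidden state (made up)
known_features = [0.5, -1.0, 0.3, 1.5]  # top-k concept contribution (made up)
unk_hat = [0.25, -0.15, 0.18, 0.45]     # factorized unknown head (made up)

# epsilon is whatever the two heads fail to reconstruct
epsilon = [hi - k - u for hi, k, u in zip(h, known_features, unk_hat)]

# composed ≈ h by construction (up to floating-point rounding)
composed = [k + u + e for k, u, e in zip(known_features, unk_hat, epsilon)]
assert all(abs(c - hi) < 1e-9 for c, hi in zip(composed, h))
```

Because epsilon is defined as the reconstruction residual, the identity holds exactly; in the real model the interesting quantity is how much of `h` the known concepts explain on their own.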
```bash
# From PyPI
pip install steerling

# From source
git clone https://github.com/guidelabs/steerling.git
cd steerling
pip install -e ".[dev]"

# With evaluation support
pip install -e ".[all]"
```
- Where can I read more about the details of this architecture?
  You can read more about the architecture in these blog posts: Scaling Interpretable Models with 8B Parameters and Causal Diffusion Language Models. We will be releasing a more detailed technical report in a few months.
- This is a base model; what about an instruction-tuned model?
  Stay tuned.
- Is training code available?
  This release is inference-only, so the training code is not included. If you're interested in training or fine-tuning, please reach out to [email protected].
- What dataset did you train on?
  We trained on an augmented version of Nemotron-CC-HQ, for a total of about 1.35 trillion tokens.
- What is block-causal attention?
  Standard causal attention only lets each token attend to previous tokens. Block-causal attention groups tokens into blocks (of, say, 64 tokens) and allows bidirectional attention within each block while maintaining causal ordering across blocks. This gives the model local bidirectional context while preserving the ability to generate sequentially. See the blog post Causal Diffusion Language Models for more details.
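The attention rule just described can be sketched as a boolean mask (an illustrative stand-alone snippet, not code from the `steerling` package):

```python
# Block-causal mask: position i may attend to position j iff j's block
# index is at or before i's block index. This is bidirectional inside a
# block and strictly causal across blocks.
def block_causal_mask(seq_len: int, block: int = 64) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j."""
    return [[(j // block) <= (i // block) for j in range(seq_len)]
            for i in range(seq_len)]

mask = block_causal_mask(8, block=4)
assert mask[0][3]      # first block: token 0 sees a later token in its block
assert not mask[3][4]  # no attention into a future block
assert mask[5][2]      # later blocks still attend to earlier blocks
```

With `block=1` this reduces to standard causal attention; with `block=seq_len` it becomes fully bidirectional, so the block size interpolates between the two regimes.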
- What are "known" and "unknown" concepts?
  The model decomposes its internal representations into two parts:
  - Known concepts (33,732): supervised, learned features that correspond to patterns a human can identify.
  - Unknown concepts (101,196): features that capture the signal in the hidden representations that known concepts don't explain.

  Together they reconstruct the full hidden state up to a small error: hidden ≈ known_features + unknown_features + epsilon.
- How do I find concept IDs for steering?
  Over the coming weeks, we will publish a full walkthrough of how to extract concepts from and steer Steerling-8B.
- What GPU do I need?
  Steerling-8B in bfloat16 requires approximately 18 GB of VRAM. It fits on a single H100, A100 (40 GB or 80 GB), A6000 (48 GB), or RTX 4090 (24 GB).
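The 18 GB figure is easy to sanity-check with back-of-the-envelope arithmetic (a rough sketch, not an official measurement):

```python
# Weights-only memory for an ~8B-parameter model in bfloat16:
# each parameter takes 2 bytes.
params = 8e9
bytes_per_param = 2  # bfloat16
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")  # ~16 GB
# Activations, the KV cache, and framework overhead account for the
# remaining ~2 GB, giving the ~18 GB total quoted above.
```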
- Can I fine-tune this model?
  Yes, although fine-tuning code is not included in this package. This is currently an inference-only release; if there is sufficient demand, we will support fine-tuning in a future release.
- What tokenizer does Steerling-8B use?
  Steerling uses OpenAI's `cl100k_base` tokenizer (via tiktoken) with 4 additional special tokens: `<|pad|>`, `<|bos|>`, `<|endofchunk|>`, and `<|mask|>`, for a total vocabulary of 100,281 tokens.
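The vocabulary accounting works out as follows (a small sketch; the 100,277 base size is tiktoken's reported `n_vocab` for `cl100k_base`, not a number from this README):

```python
# cl100k_base contributes 100,277 tokens; Steerling's four added special
# tokens bring the total to the 100,281 quoted in the spec table.
CL100K_BASE_SIZE = 100_277
specials = ["<|pad|>", "<|bos|>", "<|endofchunk|>", "<|mask|>"]
vocab_size = CL100K_BASE_SIZE + len(specials)
print(vocab_size)  # 100281
```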
- Can I use this with the Hugging Face transformers library?
  Not directly. Steerling uses a custom architecture (block-causal attention, concept heads) that isn't in the transformers library. Use the `steerling` package instead, which provides `SteerlingGenerator.from_pretrained()` with a similar interface.
- How do I get training data attributions?
  This release is a lightweight version of the pipeline, so it doesn't directly support training data attribution. We have provided notebooks that enable concept and feature attributions. If you're interested in training data attribution, please reach out to Guide Labs.
The Steerling source code is released under the Apache License 2.0.
The model weights are provided for research and evaluation purposes. The weights were trained on datasets with varying license terms, including Nemotron-CC-HQ and Dolmino Mix. Some training data includes synthetic content generated by third-party models with their own license terms. We are currently reviewing the implications of these upstream licenses for downstream use of the model weights. Please check back for updates on the weight licensing terms.
For questions about commercial use of the model weights, contact us at [email protected].