An interpretable causal diffusion language model.
Steerling-8B combines masked diffusion language modeling with concept decomposition, enabling:
- Generation: Non-autoregressive text generation via confidence-based unmasking
- Attribution: Decompose predictions into known concept contributions
- Steering: Intervene on concept activations to control generation
- Embeddings: Extract hidden, composed, known, or unknown representations
```bash
pip install steerling
```

```python
from steerling import SteerlingGenerator, GenerationConfig

generator = SteerlingGenerator.from_pretrained("guidelabs/steerling-8b")
text = generator.generate(
    "The key to understanding neural networks is",
    GenerationConfig(max_new_tokens=100, seed=42),
)
print(text)
```

| Property | Value |
|---|---|
| Parameters | ~8B |
| Architecture | CausalDiffusionLM + Interpretable Concept Head |
| Context Length | 4096 |
| Vocabulary | 100,281 (cl100k_base + specials) |
| Known Concepts | 33,732 |
| Unknown Concepts | 101,196 |
| GQA | 32 heads, 4 KV heads |
| Precision | bfloat16 |
Steerling uses block-causal attention (bidirectional within 64-token blocks, causal across blocks) with masked diffusion training. At inference, tokens are generated by iteratively unmasking positions in order of model confidence. The interpretable concept heads decompose transformer hidden states h into:
```
h → known_features + unk_hat + epsilon = composed → lm_head → logits
```

- known_features: weighted sum of the top-k learned concept embeddings
- unk_hat: residual features captured by a factorized unknown head
- epsilon: small correction term for reconstruction fidelity
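The decomposition above can be illustrated with a toy sketch (plain Python, not the Steerling implementation; all numbers are made up, and only the names `h`, `known_features`, and `unk_hat` come from the README):

```python
# Toy illustration of the concept decomposition: a hidden vector splits into
# a known-concept part, an unknown-head residual, and a small epsilon
# correction, and the three parts sum back to the "composed" vector that
# feeds the LM head.
h = [0.8, -1.2, 0.5, 2.0]               # transformer hidden state (made up)
known_features = [0.5, -1.0, 0.3, 1.5]  # top-k concept contribution (made up)
unk_hat = [0.25, -0.15, 0.18, 0.45]     # factorized unknown head (made up)

# epsilon is whatever the two heads fail to reconstruct
epsilon = [hi - k - u for hi, k, u in zip(h, known_features, unk_hat)]

# composed ≈ h by construction (up to floating-point rounding)
composed = [k + u + e for k, u, e in zip(known_features, unk_hat, epsilon)]
assert all(abs(c - hi) < 1e-9 for c, hi in zip(composed, h))
```

Because epsilon is defined as the reconstruction residual, the identity holds exactly; in the real model the interesting quantity is how much of `h` the known concepts explain on their own.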
```bash
# From PyPI
pip install steerling

# From source
git clone https://github.com/guidelabs/steerling.git
cd steerling
pip install -e ".[dev]"

# With evaluation support
pip install -e ".[all]"
```
- Where can I read more about the details of this architecture?
  You can read more about the architecture in these blog posts: Scaling Interpretable Models with 8B Parameters and Causal Diffusion Language Models. We will be releasing a more detailed technical report in a few months.
- This is a base model; what about an instruction-tuned model?
  Stay tuned.
- Is training code available?
  This release is inference-only, so the training code is not included. If you're interested in training or fine-tuning, please reach out to [email protected].
- What dataset did you train on?
  We trained on an augmented version of Nemotron-CC-HQ, for a total of about 1.35 trillion tokens.
- What is block-causal attention?
  Standard causal attention only lets each token attend to previous tokens. Block-causal attention groups tokens into blocks (of, say, 64 tokens) and allows bidirectional attention within each block while maintaining causal ordering across blocks. This gives the model local bidirectional context while preserving the ability to generate sequentially. See the blog post Causal Diffusion Language Models for more details.
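The attention rule just described can be sketched as a boolean mask (an illustrative stand-alone snippet, not code from the `steerling` package):

```python
# Block-causal mask: position i may attend to position j iff j's block
# index is at or before i's block index. This is bidirectional inside a
# block and strictly causal across blocks.
def block_causal_mask(seq_len: int, block: int = 64) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j."""
    return [[(j // block) <= (i // block) for j in range(seq_len)]
            for i in range(seq_len)]

mask = block_causal_mask(8, block=4)
assert mask[0][3]      # first block: token 0 sees a later token in its block
assert not mask[3][4]  # no attention into a future block
assert mask[5][2]      # later blocks still attend to earlier blocks
```

With `block=1` this reduces to standard causal attention; with `block=seq_len` it becomes fully bidirectional, so the block size interpolates between the two regimes.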
- What are "known" and "unknown" concepts?
  The model decomposes its internal representations into two parts:
  - Known concepts (33,732): supervised, learned features that correspond to patterns a human can identify.
  - Unknown concepts (101,196): features that capture the signal in the hidden representations that known concepts don't explain.

  Together they reconstruct the full hidden state up to a small error: hidden ≈ known_features + unknown_features + epsilon.
- How do I find concept IDs for steering?
  Over the coming weeks, we will publish a full walkthrough of how to extract concepts from and steer Steerling-8B.
- What GPU do I need?
  Steerling-8B in bfloat16 requires approximately 18 GB of VRAM. It fits on a single H100, A100 (40 GB or 80 GB), A6000 (48 GB), or RTX 4090 (24 GB).
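The 18 GB figure is easy to sanity-check with back-of-the-envelope arithmetic (a rough sketch, not an official measurement):

```python
# Weights-only memory for an ~8B-parameter model in bfloat16:
# each parameter takes 2 bytes.
params = 8e9
bytes_per_param = 2  # bfloat16
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")  # ~16 GB
# Activations, the KV cache, and framework overhead account for the
# remaining ~2 GB, giving the ~18 GB total quoted above.
```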
- Can I fine-tune this model?
  Yes, although fine-tuning code is not included in this package. This is currently an inference-only release; if there is sufficient demand, we will support fine-tuning in a future release.
- What tokenizer does Steerling-8B use?
  Steerling uses OpenAI's `cl100k_base` tokenizer (via tiktoken) with 4 additional special tokens: `<|pad|>`, `<|bos|>`, `<|endofchunk|>`, and `<|mask|>`, for a total vocabulary of 100,281 tokens.
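The vocabulary accounting works out as follows (a small sketch; the 100,277 base size is tiktoken's reported `n_vocab` for `cl100k_base`, not a number from this README):

```python
# cl100k_base contributes 100,277 tokens; Steerling's four added special
# tokens bring the total to the 100,281 quoted in the spec table.
CL100K_BASE_SIZE = 100_277
specials = ["<|pad|>", "<|bos|>", "<|endofchunk|>", "<|mask|>"]
vocab_size = CL100K_BASE_SIZE + len(specials)
print(vocab_size)  # 100281
```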
- Can I use this with the Hugging Face transformers library?
  Not directly. Steerling uses a custom architecture (block-causal attention, concept heads) that isn't in the transformers library. Use the `steerling` package instead, which provides `SteerlingGenerator.from_pretrained()` with a similar interface.
- How do I get training data attributions?
  This release is a lightweight version of the pipeline, so it doesn't directly support training data attribution. We have provided notebooks that enable concept and feature attributions. If you're interested in training data attribution, please reach out to Guide Labs.
The Steerling source code is released under the Apache License 2.0.
The model weights are provided for research and evaluation purposes. The weights were trained on datasets with varying license terms, including Nemotron-CC-HQ and Dolmino Mix. Some training data includes synthetic content generated by third-party models with their own license terms. We are currently reviewing the implications of these upstream licenses for downstream use of the model weights. Please check back for updates on the weight licensing terms.
For questions about commercial use of the model weights, contact us at [email protected].