We study whether categorical refusal tokens enable controllable and interpretable safety behavior in language models.
Mechanistic interpretability tool that visualizes GPT-2's layer-by-layer predictions using the logit lens technique.
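As a rough illustration of the technique (not this repository's code), a logit-lens pass in TransformerLens amounts to projecting each layer's residual stream through the final LayerNorm and the unembedding; the model and prompt below are placeholders.

```python
# Minimal logit-lens sketch with TransformerLens (illustrative, not the repo's code).
# For each layer, project the residual stream through the final LayerNorm and the
# unembedding to see which token the model "currently" favours at that depth.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The Eiffel Tower is located in the city of")
_, cache = model.run_with_cache(tokens)

with torch.no_grad():
    for layer in range(model.cfg.n_layers):
        resid = cache["resid_post", layer]              # [batch, pos, d_model]
        layer_logits = model.unembed(model.ln_final(resid))
        top = layer_logits[0, -1].argmax().item()       # prediction at the last position
        print(f"layer {layer:2d} -> {model.to_single_str_token(top)!r}")
```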
Mechanistic interpretability tool that detects induction heads in GPT-2 using TransformerLens.
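A common way to scan for induction heads (a sketch under assumptions, not necessarily this repo's scoring or threshold) is to feed a repeated random sequence and measure how strongly each head attends from a token back to the token just after its previous occurrence:

```python
# Rough induction-head scan: repeat a random token sequence and score each head by
# its average attention along the "induction stripe" (offset 1 - seq_len).
# The 0.4 threshold is an arbitrary assumption for this sketch.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
seq_len, batch = 50, 4
rand = torch.randint(100, model.cfg.d_vocab, (batch, seq_len))
tokens = torch.cat([rand, rand], dim=-1).to(model.cfg.device)   # sequence repeated once
_, cache = model.run_with_cache(tokens)

for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer]                   # [batch, head, q_pos, k_pos]
    stripe = pattern.diagonal(offset=1 - seq_len, dim1=-2, dim2=-1)
    scores = stripe.mean(dim=(0, -1))                   # one score per head
    for head, score in enumerate(scores):
        if score > 0.4:
            print(f"L{layer}H{head}: induction score {score:.2f}")
```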
Causal intervention framework for mechanistic interpretability research. Implements activation patching methodology for identifying causally important components in transformer language models.
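In TransformerLens terms, the core of activation patching looks roughly like the following (a minimal sketch with an IOI-style prompt pair; the layer, position, and metric are illustrative choices, not this framework's defaults or API):

```python
# Minimal activation-patching sketch (illustrative, not the framework's API).
# Run clean and corrupted prompts, overwrite one residual-stream position in the
# corrupted run with the clean activation, and see how the answer logit moves.
from functools import partial
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

clean = model.to_tokens("When John and Mary went to the store, Mary gave a drink to")
corrupt = model.to_tokens("When John and Mary went to the store, John gave a drink to")
answer = model.to_single_token(" John")

def answer_logit(logits):
    return logits[0, -1, answer].item()

_, clean_cache = model.run_with_cache(clean)

def patch_resid(resid, hook, pos):
    # Copy the clean activation into the corrupted forward pass at one position.
    resid[:, pos, :] = clean_cache[hook.name][:, pos, :]
    return resid

layer, pos = 8, 10  # assumed layer; pos 10 is the second name in this tokenization
patched = model.run_with_hooks(
    corrupt,
    fwd_hooks=[(f"blocks.{layer}.hook_resid_pre", partial(patch_resid, pos=pos))],
)

print("clean    :", answer_logit(model(clean)))
print("corrupted:", answer_logit(model(corrupt)))
print("patched  :", answer_logit(patched))
```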
A research tool for studying how deception emerges in multi-agent LLM systems and detecting it through activation analysis.
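As a toy illustration of activation-based detection (not this repo's dataset or method): cache residual-stream activations for two contrasting sets of statements and fit a linear probe. The prompts, probe layer, and truthful-vs-deceptive framing below are placeholder assumptions.

```python
# Toy activation-analysis sketch: a linear probe on GPT-2 residual activations.
# All prompts, the probe layer, and the labels are illustrative stand-ins.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
LAYER = 6  # assumed probe layer

truthful = ["The sky is blue.", "Paris is in France.", "Two plus two is four."]
deceptive = ["The sky is green.", "Paris is in Japan.", "Two plus two is five."]

def last_token_features(texts):
    feats = []
    with torch.no_grad():
        for text in texts:
            _, cache = model.run_with_cache(model.to_tokens(text))
            feats.append(cache["resid_post", LAYER][0, -1].cpu())
    return torch.stack(feats).numpy()

X = np.concatenate([last_token_features(truthful), last_token_features(deceptive)])
y = np.array([0] * len(truthful) + [1] * len(deceptive))

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", probe.score(X, y))
```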
"Arithmetic Without Algorithms": Mechanistic analysis of arithmetic failure ("5+5=6") in GPT-2 Small using Induction Heads and Sparse Autoencoders (SAEs).
Forensic suite for mechanistic interpretability in transformers, implementing 0.0054 Basal Accountability Gradients for auditing model logic with TransformerLens and SAELens.
Code used for reverse-engineering a “Query-Gated Courier” circuit in Gemma-2-2B for role-gated retrieval.
🧩 Simplify causal intervention in transformer models with this modular library for accurate circuit analysis and behavior identification.