Introduction to Mechanistic Interpretability, Superposition and Sparse Autoencoders

In this post, we will explore the concepts of Superposition and Sparse Autoencoders in the context of mechanistic interpretability. We'll build a spar...

Dec 25, 2025 AI Safety, Alignment, Mechanistic Interpretability, Transformers, Superposition, Sparse Autoencoders, SAE, Natural Language Processing, NLP

Representation Engineering - I: Steering Language Models With Activation Engineering

Implement Activation Addition (ActAdd) and Contrastive Activation Addition (CAA) to steer language models at inference time without training. Learn how adding vectors to the residual stream changes behavior, with practical code implementations and analysis of both methods.

Dec 20, 2025 AI Safety, Alignment, Representation Engineering, Activation Engineering, Controllability, Natural Language Processing, NLP, Large Language Models, LLMs, Transformers

Building GPT-2 From Scratch: Mechanistic Interpretability View

In this post, we're going to build GPT-2 from the ground up, implementing every component ourselves and understanding exactly how this remarkable architecture works.

Dec 19, 2025 Large Language Model, Transformers, Mechanistic Interpretability, Natural Language Processing, AI Safety

Visualizing Attention: See what an LLM sees.

Learn how attention mechanisms work in transformers by visualizing what LLMs see when processing text. Discover how attention connects semantically related tokens (like Paris → French), understand the Query-Key-Value framework, and explore how different attention heads specialize in syntax, semantics, and coreference.

Dec 19, 2025 Natural Language Processing, NLP, Large Language Models, LLMs, Transformers

Supervised Finetuning in LLM training workflow

Learn how supervised fine-tuning (SFT) fits into the LLM training pipeline. This post explains the three-step process (pretraining → SFT → alignment), demonstrates SFT implementation with a practical example, and shows how fine-tuning transforms a base model into a task-specific assistant.

Dec 18, 2025 Natural Language Processing, NLP, Large Language Models, LLMs, Transformers

From Words to Meaning: Implementing Word2Vec from Scratch

Word embeddings are one of the most transformative developments in Natural Language Processing (NLP). They solve a fundamental problem: how can we rep...

Dec 17, 2025 Natural Language Processing

Primer on Large Language Model (LLM) Inference Optimizations: 3. Model Architecture Optimizations

Exploring model architecture optimizations for Large Language Model (LLM) inference, focusing on Grouped Query Attention (GQA) and Mixture of Experts (MoE) techniques.

Nov 15, 2024 Large Language Model, Inference Optimization, Natural Language Processing

Scaling Laws in Large Language Models

Scaling laws in AI offer a quantitative framework for understanding the relationship between model size, data, and compute resources. Learn about the Chinchilla scaling law, power laws, and the future of large models.

Nov 7, 2024 Large Language Model, Emergent Capabilities, Natural Language Processing

Primer on Large Language Model (LLM) Inference Optimizations: 2. Introduction to Artificial Intelligence (AI) Accelerators

Exploration of AI accelerators and their impact on deploying Large Language Models (LLMs) at scale.

Nov 5, 2024 Large Language Model, Inference Optimization, AI Accelerators, Natural Language Processing

Primer on Large Language Model (LLM) Inference Optimizations: 1. Background and Problem Formulation

Overview of Large Language Model (LLM) inference, its importance, challenges, and key problem formulations.

Oct 31, 2024 Large Language Model, Inference Optimization, Natural Language Processing