A production-ready, fully type-annotated GPT-2 implementation from scratch in PyTorch.
MicroGPT is a clean, educational implementation of the GPT-2 Medium architecture (355M parameters) built from first principles with detailed explanations and comprehensive testing.
- config.json - All hyperparameters (architecture, training, fine-tuning)
- config.py - Loads config.json into typed GPTConfig dataclass
- micro_gpt.py (800+ lines) - Complete GPT-2 implementation
- ✅ 100% Type Annotated - Full type hints
- ✅ Production Ready - Clean, maintainable code
- ✅ GPT-2 Medium Architecture - 355M parameters
- Components: LayerNorm, CausalSelfAttention, FeedForward, TransformerBlock, GPT2
- main.py - Pre-training on OpenWebText dataset (GPT-2 tokenizer)
- fine_tune_micro_gpt.py - Fine-tuning for professional chatbot (Stanford Human Preferences)
- inference_micro_gpt.py - Interactive chat interface
- device.py - Device detection (CUDA/MPS/CPU)
- test_micro_gpt.py (2,715 lines) - 65 tests, 99% coverage
- test_fine_tune_micro_gpt.py - 23 tests for fine-tuning
- test_inference_micro_gpt.py - 34 tests for inference
- GPT2_Tutorial.pdf - Complete transformer architecture tutorial
- README.md - This file
- FILES.md - Complete file inventory
```shell
# Clone repository
git clone <repository-url>
cd microgpt

# Create virtual environment
python -m venv .venv
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows

# Install dependencies
pip install -r requirements.txt
```

Required packages (requirements.txt):

- torch - PyTorch framework
- tiktoken - OpenAI's tokenizer
- datasets - Hugging Face datasets
- pytest - Testing framework
- pytest-cov - Coverage reporting
- markdown - Markdown to HTML conversion
- weasyprint - HTML to PDF conversion
- pygments - Syntax highlighting
1. Pre-training on OpenWebText (creates base language model)

```shell
python main.py
```

- Loads 20M examples from the OpenWebText dataset
- Uses the GPT-2 BPE tokenizer (vocab size: 50,257)
- Trains for 300k steps with a cosine LR schedule
- Saves checkpoint to `checkpoints/best_val.pt` (overwritten whenever validation loss improves)
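The warmup-plus-cosine schedule described above can be sketched as follows (`lr_at` is a hypothetical helper matching the hyperparameters listed later in this README; the actual schedule lives in main.py):

```python
import math


def lr_at(step: int,
          max_lr: float = 3e-4,
          min_lr: float = 3e-5,
          warmup_steps: int = 2000,
          max_steps: int = 300_000) -> float:
    """Cosine decay from max_lr to min_lr after a linear warmup."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps       # linear warmup
    if step >= max_steps:
        return min_lr                                   # floor after decay
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)
```

The schedule peaks at 3e-4 once warmup ends, then decays smoothly to the 3e-5 floor at step 300,000.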
2. Fine-tuning for Chatbot (adds conversational abilities)

```shell
python fine_tune_micro_gpt.py
```

- Loads the pre-trained checkpoint from step 1
- Fine-tunes on 20M tokens from the Stanford Human Preferences dataset
- Adds professional identity training ("I am MicroGPT, created by Kevin Thomas")
- Saves fine-tuned checkpoint to `checkpoints/finetuned_best_val.pt` (overwritten whenever validation loss improves)
3. Interactive Chat

```shell
python inference_micro_gpt.py
```

- Loads the fine-tuned checkpoint from step 2
- Provides an interactive chat interface
- Produces professional, consistent responses (temperature=0.7, top_p=0.9)
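The sampling step behind those defaults can be sketched in plain Python; `sample_next` is a hypothetical helper, not part of the repo (the real inference path works on torch logit tensors):

```python
import math
import random


def sample_next(logits: list[float], temperature: float = 0.7, top_p: float = 0.9) -> int:
    """Temperature-scaled nucleus (top-p) sampling over raw logits."""
    # Softmax with temperature (subtract max for numerical stability)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of tokens whose cumulative probability >= top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalise over the kept set and draw one token
    mass = sum(probs[i] for i in kept)
    r, acc = random.random() * mass, 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

Lower temperature sharpens the distribution; top_p then discards the unlikely tail, which is why the chatbot's answers stay consistent.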
```python
from micro_gpt import GPT2, GPT2Config
from config import load_config
import torch
import tiktoken

# Load config
cfg = load_config("config.json")

# Create model with GPT-2 Medium config
config = GPT2Config(
    block_size=cfg.block_size,  # 1024
    vocab_size=cfg.vocab_size,  # 50257 (GPT-2 tokenizer)
    n_layer=cfg.n_layer,        # 24
    n_head=cfg.n_head,          # 16
    n_embd=cfg.n_embd,          # 1024
    dropout=cfg.dropout,        # 0.1
    bias=cfg.bias,              # True
)
model = GPT2(config)

# Generate text
tokenizer = tiktoken.get_encoding("gpt2")
context_tokens = tokenizer.encode("The quick brown")
context = torch.tensor([context_tokens])
output = model.generate(context, max_new_tokens=50, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0].tolist()))
```

View the built-in documentation:

```python
from micro_gpt import GPT2

help(GPT2)           # View class documentation
help(GPT2.generate)  # View method documentation
```

Run the full test suite:

```shell
pytest test_micro_gpt.py -v
```

Expected output:
```
collected 65 items

test_micro_gpt.py::TestSelfAttentionHead::test_initialization PASSED [  1%]
test_micro_gpt.py::TestSelfAttentionHead::test_forward_shape PASSED [  3%]
...
========================== 65 passed in 2.86s ==========================
```
```shell
# Test only the MicroGPT model
pytest test_micro_gpt.py::TestMicroGPT -v

# Test only the integration tests
pytest test_micro_gpt.py::TestIntegration -v

# Test a specific function
pytest test_micro_gpt.py::TestMicroGPT::test_forward_with_targets -v
```

Generate a coverage report:

```shell
# Terminal + HTML report
pytest test_micro_gpt.py -v --cov=test_micro_gpt --cov-report=html --cov-report=term

# View HTML report
open htmlcov/index.html  # macOS
```

Output:
```
Name               Stmts   Miss  Cover   Missing
------------------------------------------------
test_micro_gpt.py    521      1    99%   1018
------------------------------------------------
TOTAL                521      1    99%
```
The GPT2_Tutorial.pdf provides a comprehensive guide to understanding GPT from scratch. It's designed for high school students and beginners, covering:
- Introduction - What is GPT and how language models work
- Tokenization - Breaking text into tokens
- Vocabulary - Building and using a vocabulary
- Token Embeddings - Converting tokens to vectors
- Positional Embeddings - Encoding position information
- Residual Stream - Data flow through the model
- Self-Attention - How attention mechanisms work
- Multi-Head Attention - Parallel attention computation
- Feed-Forward Networks - Processing within positions
- Transformer Block - Combining components
- Model Architecture - Complete GPT structure
- Training - How the model learns
- Text Generation - Producing new text
Regenerate PDF:

```shell
python convert_tutorial_to_pdf.py
```

| Component | Purpose |
|---|---|
| `LayerNorm` | Pre-LayerNorm (GPT-2 style) |
| `CausalSelfAttention` | Fused multi-head causal attention |
| `FeedForward` | MLP with 4x expansion and GELU |
| `TransformerBlock` | Pre-LN transformer block |
| `GPT2` | Complete GPT-2 language model |
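The pre-LN wiring of TransformerBlock can be sketched as below. This is a simplified stand-in, not the repo's code: the hypothetical `Block` uses torch's stock `nn.MultiheadAttention` in place of the fused CausalSelfAttention, and omits dropout and the configurable bias.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Pre-LN transformer block: x + attn(ln1(x)), then x + mlp(ln2(x))."""

    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),  # 4x expansion
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        T = x.size(1)
        # Causal mask: each position may only attend to itself and the past
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a                       # residual around attention
        x = x + self.mlp(self.ln2(x))   # residual around MLP
        return x
```

Normalising *before* each sub-layer (pre-LN, GPT-2 style) keeps the residual stream unnormalised, which stabilises training of deep stacks.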
Architecture (GPT-2 Medium):

- vocab_size: 50257 (GPT-2 tokenizer)
- n_embd: 1024
- block_size: 1024
- n_head: 16
- n_layer: 24
- dropout: 0.1
- Parameters: ~355M (~1.4 GB of fp32 weights)
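The ~355M figure follows from these numbers. A back-of-the-envelope count, assuming bias=True (as configured) and a weight-tied LM head as in GPT-2:

```python
# Back-of-the-envelope parameter count for the GPT-2 Medium config above
vocab_size, block_size, n_embd, n_layer = 50257, 1024, 1024, 24

embeddings = vocab_size * n_embd + block_size * n_embd  # token + positional
per_block = (
    2 * (2 * n_embd)                    # two LayerNorms (weight + bias each)
    + n_embd * 3 * n_embd + 3 * n_embd  # fused QKV projection
    + n_embd * n_embd + n_embd          # attention output projection
    + n_embd * 4 * n_embd + 4 * n_embd  # MLP up-projection (4x expansion)
    + 4 * n_embd * n_embd + n_embd      # MLP down-projection
)
final_ln = 2 * n_embd
total = embeddings + n_layer * per_block + final_ln
print(f"{total / 1e6:.1f}M parameters")  # -> 354.8M, i.e. the ~355M quoted
```

At 4 bytes per fp32 parameter this is roughly 1.4 GB of weights, matching the figure above.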
Pre-training (main.py):
- Dataset: OpenWebText (20M examples)
- Batch size: 4
- Learning rate: 3e-4 → 3e-5 (cosine decay)
- Warmup steps: 2000
- Training steps: 300,000
Fine-tuning (fine_tune_micro_gpt.py):
- Dataset: Stanford Human Preferences (20M tokens)
- Learning rate: 1e-5
- Epochs: 10,000
- Temperature: 0.7 (balanced responses)
- Top-p: 0.9 (nucleus sampling)
- Max new tokens: 150
| Config | Parameters | Memory |
|---|---|---|
| GPT-2 Small (n=768) | ~124M | ~16 GB |
| GPT-2 Medium (n=1024) | ~355M | ~40 GB |
| GPT-2 Large (n=1280) | ~774M | ~60 GB |
Note: All parameters are configurable in config.json, the single source of truth for the entire project.
MIT License
Kevin Thomas
- Email: [email protected]
- GitHub: @mytechnotalent
Built for educational purposes to help students understand transformer architecture from first principles.