A production-ready, fully type-annotated GPT-2 implementation from scratch in PyTorch.
MicroGPT is a clean, educational implementation of the GPT-2 Medium architecture (355M parameters) built from first principles with detailed explanations and comprehensive testing.
- config.json - All hyperparameters (architecture, training, fine-tuning)
- config.py - Loads config.json into typed GPTConfig dataclass
- micro_gpt.py (800+ lines) - Complete GPT-2 implementation
- ✅ 100% Type Annotated - Full type hints
- ✅ Production Ready - Clean, maintainable code
- ✅ GPT-2 Medium Architecture - 355M parameters
- Components: LayerNorm, CausalSelfAttention, FeedForward, TransformerBlock, GPT2
- main.py - Pre-training on OpenWebText dataset (GPT-2 tokenizer)
- fine_tune_micro_gpt.py - Fine-tuning for professional chatbot (Stanford Human Preferences)
- inference_micro_gpt.py - Interactive chat interface
- device.py - Device detection (CUDA/MPS/CPU)
- test_micro_gpt.py (2,715 lines) - 65 tests, 99% coverage
- test_fine_tune_micro_gpt.py - 23 tests for fine-tuning
- test_inference_micro_gpt.py - 34 tests for inference
- GPT2_Tutorial.pdf - Complete transformer architecture tutorial
- README.md - This file
- FILES.md - Complete file inventory
```shell
# Clone repository
git clone <repository-url>
cd microgpt

# Create virtual environment
python -m venv .venv
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows

# Install dependencies
pip install -r requirements.txt
```

Required packages (requirements.txt):

- torch - PyTorch framework
- tiktoken - OpenAI's tokenizer
- datasets - Hugging Face datasets
- pytest - Testing framework
- pytest-cov - Coverage reporting
- markdown - Markdown to HTML conversion
- weasyprint - HTML to PDF conversion
- pygments - Syntax highlighting
1. Pre-training on OpenWebText (creates base language model)

```shell
python main.py
```

- Loads 20M examples from the OpenWebText dataset
- Uses the GPT-2 BPE tokenizer (vocab size: 50,257)
- Trains for 300k steps with a cosine LR schedule
- Saves checkpoint to `checkpoints/best_val.pt` (overwritten whenever validation loss improves)
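The warmup-plus-cosine schedule described above can be sketched as follows (`lr_at` is a hypothetical helper matching the hyperparameters listed later in this README; the actual schedule lives in main.py):

```python
import math


def lr_at(step: int,
          max_lr: float = 3e-4,
          min_lr: float = 3e-5,
          warmup_steps: int = 2000,
          max_steps: int = 300_000) -> float:
    """Cosine decay from max_lr to min_lr after a linear warmup."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps       # linear warmup
    if step >= max_steps:
        return min_lr                                   # floor after decay
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)
```

The schedule peaks at 3e-4 once warmup ends, then decays smoothly to the 3e-5 floor at step 300,000.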
2. Fine-tuning for Chatbot (adds conversational abilities)

```shell
python fine_tune_micro_gpt.py
```

- Loads the pre-trained checkpoint from step 1
- Fine-tunes on 20M tokens from the Stanford Human Preferences dataset
- Adds professional identity training ("I am MicroGPT, created by Kevin Thomas")
- Saves fine-tuned checkpoint to `checkpoints/finetuned_best_val.pt` (overwritten whenever validation loss improves)
3. Interactive Chat

```shell
python inference_micro_gpt.py
```

- Loads the fine-tuned checkpoint from step 2
- Provides an interactive chat interface
- Produces professional, consistent responses (temperature=0.7, top_p=0.9)
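The sampling step behind those defaults can be sketched in plain Python; `sample_next` is a hypothetical helper, not part of the repo (the real inference path works on torch logit tensors):

```python
import math
import random


def sample_next(logits: list[float], temperature: float = 0.7, top_p: float = 0.9) -> int:
    """Temperature-scaled nucleus (top-p) sampling over raw logits."""
    # Softmax with temperature (subtract max for numerical stability)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of tokens whose cumulative probability >= top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalise over the kept set and draw one token
    mass = sum(probs[i] for i in kept)
    r, acc = random.random() * mass, 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

Lower temperature sharpens the distribution; top_p then discards the unlikely tail, which is why the chatbot's answers stay consistent.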
```python
from micro_gpt import GPT2, GPT2Config
from config import load_config
import torch
import tiktoken

# Load config
cfg = load_config("config.json")

# Create model with GPT-2 Medium config
config = GPT2Config(
    block_size=cfg.block_size,  # 1024
    vocab_size=cfg.vocab_size,  # 50257 (GPT-2 tokenizer)
    n_layer=cfg.n_layer,        # 24
    n_head=cfg.n_head,          # 16
    n_embd=cfg.n_embd,          # 1024
    dropout=cfg.dropout,        # 0.1
    bias=cfg.bias,              # True
)
model = GPT2(config)

# Generate text
tokenizer = tiktoken.get_encoding("gpt2")
context_tokens = tokenizer.encode("The quick brown")
context = torch.tensor([context_tokens])
output = model.generate(context, max_new_tokens=50, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0].tolist()))
```

View the built-in documentation:

```python
from micro_gpt import GPT2

help(GPT2)           # View class documentation
help(GPT2.generate)  # View method documentation
```

Run the full test suite:

```shell
pytest test_micro_gpt.py -v
```

Expected output:
```
collected 65 items

test_micro_gpt.py::TestSelfAttentionHead::test_initialization PASSED [  1%]
test_micro_gpt.py::TestSelfAttentionHead::test_forward_shape PASSED [  3%]
...
========================== 65 passed in 2.86s ==========================
```
```shell
# Test only the MicroGPT model
pytest test_micro_gpt.py::TestMicroGPT -v

# Test only the integration tests
pytest test_micro_gpt.py::TestIntegration -v

# Test a specific function
pytest test_micro_gpt.py::TestMicroGPT::test_forward_with_targets -v
```

Generate a coverage report:

```shell
# Terminal + HTML report
pytest test_micro_gpt.py -v --cov=test_micro_gpt --cov-report=html --cov-report=term

# View HTML report
open htmlcov/index.html  # macOS
```

Output:
```
Name               Stmts   Miss  Cover   Missing
------------------------------------------------
test_micro_gpt.py    521      1    99%   1018
------------------------------------------------
TOTAL                521      1    99%
```
The GPT2_Tutorial.pdf provides a comprehensive guide to understanding GPT from scratch. It's designed for high school students and beginners, covering:
- Introduction - What is GPT and how language models work
- Tokenization - Breaking text into tokens
- Vocabulary - Building and using a vocabulary
- Token Embeddings - Converting tokens to vectors
- Positional Embeddings - Encoding position information
- Residual Stream - Data flow through the model
- Self-Attention - How attention mechanisms work
- Multi-Head Attention - Parallel attention computation
- Feed-Forward Networks - Processing within positions
- Transformer Block - Combining components
- Model Architecture - Complete GPT structure
- Training - How the model learns
- Text Generation - Producing new text
Regenerate PDF:

```shell
python convert_tutorial_to_pdf.py
```

| Component | Purpose |
|---|---|
| `LayerNorm` | Pre-LayerNorm (GPT-2 style) |
| `CausalSelfAttention` | Fused multi-head causal attention |
| `FeedForward` | MLP with 4x expansion and GELU |
| `TransformerBlock` | Pre-LN transformer block |
| `GPT2` | Complete GPT-2 language model |
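The pre-LN wiring of TransformerBlock can be sketched as below. This is a simplified stand-in, not the repo's code: the hypothetical `Block` uses torch's stock `nn.MultiheadAttention` in place of the fused CausalSelfAttention, and omits dropout and the configurable bias.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Pre-LN transformer block: x + attn(ln1(x)), then x + mlp(ln2(x))."""

    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),  # 4x expansion
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        T = x.size(1)
        # Causal mask: each position may only attend to itself and the past
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a                       # residual around attention
        x = x + self.mlp(self.ln2(x))   # residual around MLP
        return x
```

Normalising *before* each sub-layer (pre-LN, GPT-2 style) keeps the residual stream unnormalised, which stabilises training of deep stacks.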
Architecture (GPT-2 Medium):

- vocab_size: 50257 (GPT-2 tokenizer)
- n_embd: 1024
- block_size: 1024
- n_head: 16
- n_layer: 24
- dropout: 0.1
- Parameters: ~355M (~1.4 GB of fp32 weights)
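The ~355M figure follows from these numbers. A back-of-the-envelope count, assuming bias=True (as configured) and a weight-tied LM head as in GPT-2:

```python
# Back-of-the-envelope parameter count for the GPT-2 Medium config above
vocab_size, block_size, n_embd, n_layer = 50257, 1024, 1024, 24

embeddings = vocab_size * n_embd + block_size * n_embd  # token + positional
per_block = (
    2 * (2 * n_embd)                    # two LayerNorms (weight + bias each)
    + n_embd * 3 * n_embd + 3 * n_embd  # fused QKV projection
    + n_embd * n_embd + n_embd          # attention output projection
    + n_embd * 4 * n_embd + 4 * n_embd  # MLP up-projection (4x expansion)
    + 4 * n_embd * n_embd + n_embd      # MLP down-projection
)
final_ln = 2 * n_embd
total = embeddings + n_layer * per_block + final_ln
print(f"{total / 1e6:.1f}M parameters")  # -> 354.8M, i.e. the ~355M quoted
```

At 4 bytes per fp32 parameter this is roughly 1.4 GB of weights, matching the figure above.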
Pre-training (main.py):
- Dataset: OpenWebText (20M examples)
- Batch size: 4
- Learning rate: 3e-4 → 3e-5 (cosine decay)
- Warmup steps: 2000
- Training steps: 300,000
Fine-tuning (fine_tune_micro_gpt.py):
- Dataset: Stanford Human Preferences (20M tokens)
- Learning rate: 1e-5
- Epochs: 10,000
- Temperature: 0.7 (balanced responses)
- Top-p: 0.9 (nucleus sampling)
- Max new tokens: 150
| Config | Parameters | Memory |
|---|---|---|
| GPT-2 Small (n=768) | ~124M | ~16 GB |
| GPT-2 Medium (n=1024) | ~355M | ~40 GB |
| GPT-2 Large (n=1280) | ~774M | ~60 GB |
Note: All parameters are configurable in config.json, the single source of truth for the entire project.
MIT License
Kevin Thomas
- Email: [email protected]
- GitHub: @mytechnotalent
Built for educational purposes to help students understand transformer architecture from first principles.