
FREE Reverse Engineering Self-Study Course HERE


MicroGPT

A production-ready, fully type-annotated GPT-2 implementation from scratch in PyTorch.

MicroGPT is a clean, educational implementation of the GPT-2 Medium architecture (355M parameters) built from first principles with detailed explanations and comprehensive testing.

🎯 Core Files

Configuration (Single Source of Truth)

  • config.json - All hyperparameters (architecture, training, fine-tuning)
  • config.py - Loads config.json into typed GPTConfig dataclass

Model Implementation

  • micro_gpt.py (800+ lines) - Complete GPT-2 implementation
    • 100% Type Annotated - Full type hints
    • Production Ready - Clean, maintainable code
    • GPT-2 Medium Architecture - 355M parameters
    • Components: LayerNorm, CausalSelfAttention, FeedForward, TransformerBlock, GPT2

Training Pipeline

  • main.py - Pre-training on OpenWebText dataset (GPT-2 tokenizer)
  • fine_tune_micro_gpt.py - Fine-tuning for a professional chatbot (Stanford Human Preferences dataset)
  • inference_micro_gpt.py - Interactive chat interface
  • device.py - Device detection (CUDA/MPS/CPU)

Testing

  • test_micro_gpt.py (2,715 lines) - 65 tests, 99% coverage
  • test_fine_tune_micro_gpt.py - 23 tests for fine-tuning
  • test_inference_micro_gpt.py - 34 tests for inference

Documentation

  • GPT2_Tutorial.pdf - Complete transformer architecture tutorial
  • README.md - This file
  • FILES.md - Complete file inventory

⚙️ Installation

# Clone repository
git clone <repository-url>
cd microgpt

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # macOS/Linux
# .venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

Required packages (requirements.txt):

  • torch - PyTorch framework
  • tiktoken - OpenAI's tokenizer
  • datasets - Hugging Face datasets
  • pytest - Testing framework
  • pytest-cov - Coverage reporting
  • markdown - Markdown to HTML conversion
  • weasyprint - HTML to PDF conversion
  • pygments - Syntax highlighting
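The list above corresponds to a minimal, unpinned requirements.txt along these lines (version pins are omitted here; pin versions as needed for reproducibility):

```
torch
tiktoken
datasets
pytest
pytest-cov
markdown
weasyprint
pygments
```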

🚀 Usage

Complete Training Workflow

1. Pre-training on OpenWebText (creates base language model)

python main.py
  • Loads 20M examples from OpenWebText dataset
  • Uses GPT-2 BPE tokenizer (vocab size: 50,257)
  • Trains for 300k steps with cosine LR schedule
  • Saves best checkpoint to checkpoints/best_val.pt (overwritten whenever validation loss improves)
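The cosine schedule described above (3e-4 decaying to 3e-5 over 300k steps, with 2,000 warmup steps) can be sketched as follows. This is a minimal stand-alone illustration, not the repo's actual code; the function name and the linear-warmup shape are assumptions.

```python
import math

MAX_LR, MIN_LR = 3e-4, 3e-5
WARMUP_STEPS, TOTAL_STEPS = 2_000, 300_000

def lr_at(step: int) -> float:
    """Return the learning rate for a given training step."""
    if step < WARMUP_STEPS:
        # Linear warmup from ~0 up to MAX_LR
        return MAX_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from MAX_LR down to MIN_LR
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + coeff * (MAX_LR - MIN_LR)

print(f"{lr_at(0):.2e}, {lr_at(2_000):.2e}, {lr_at(300_000):.2e}")
```

The same shape is available off the shelf via torch.optim.lr_scheduler if you prefer not to hand-roll it.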

2. Fine-tuning for Chatbot (adds conversational abilities)

python fine_tune_micro_gpt.py
  • Loads pre-trained checkpoint from step 1
  • Fine-tunes on 20M tokens from Stanford Human Preferences dataset
  • Adds professional identity training ("I am MicroGPT, created by Kevin Thomas")
  • Saves fine-tuned checkpoint to checkpoints/finetuned_best_val.pt (overwritten whenever validation loss improves)

3. Interactive Chat

python inference_micro_gpt.py
  • Loads fine-tuned checkpoint from step 2
  • Provides interactive chat interface
  • Professional, consistent responses (temperature=0.7, top_p=0.9)
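The temperature=0.7, top_p=0.9 decoding above combines temperature scaling with nucleus (top-p) sampling. Here is a pure-Python sketch of that combination for illustration; the repo's generate() operates on PyTorch tensors, and this helper's name and exact renormalization details are assumptions.

```python
import math
import random

def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=None):
    """Sample a token index from logits with temperature + nucleus filtering."""
    rng = rng or random.Random()
    # Temperature scaling, then a numerically stable softmax
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of tokens whose cumulative probability >= top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the nucleus and draw one index
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

Lower temperature sharpens the distribution; top-p then discards the unreliable low-probability tail, which is why the chatbot's responses stay consistent.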

Direct Model Usage

from micro_gpt import GPT2, GPT2Config
from config import load_config
import torch
import tiktoken

# Load config
cfg = load_config("config.json")

# Create model with GPT-2 Medium config
config = GPT2Config(
    block_size=cfg.block_size,      # 1024
    vocab_size=cfg.vocab_size,      # 50257 (GPT-2 tokenizer)
    n_layer=cfg.n_layer,            # 24
    n_head=cfg.n_head,              # 16
    n_embd=cfg.n_embd,              # 1024
    dropout=cfg.dropout,            # 0.1
    bias=cfg.bias,                  # True
)
model = GPT2(config)

# Generate text
tokenizer = tiktoken.get_encoding("gpt2")
context_tokens = tokenizer.encode("The quick brown")
context = torch.tensor([context_tokens])
output = model.generate(context, max_new_tokens=50, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0].tolist()))

View Documentation

from micro_gpt import GPT2
help(GPT2)              # View class documentation
help(GPT2.generate)     # View method documentation

🧪 Testing

Run All Tests

pytest test_micro_gpt.py -v

Expected output:

collected 65 items

test_micro_gpt.py::TestSelfAttentionHead::test_initialization PASSED      [  1%]
test_micro_gpt.py::TestSelfAttentionHead::test_forward_shape PASSED       [  3%]
...
====================================================== 65 passed in 2.86s ========

Run Specific Tests

# Test only MicroGPT model
pytest test_micro_gpt.py::TestMicroGPT -v

# Test only integration tests
pytest test_micro_gpt.py::TestIntegration -v

# Test specific function
pytest test_micro_gpt.py::TestMicroGPT::test_forward_with_targets -v

Coverage Report

# Terminal + HTML report
pytest test_micro_gpt.py -v --cov=test_micro_gpt --cov-report=html --cov-report=term

# View HTML report
open htmlcov/index.html  # macOS

Output:

Name                Stmts   Miss  Cover   Missing
-------------------------------------------------
test_micro_gpt.py     521      1    99%   1018
-------------------------------------------------
TOTAL                 521      1    99%

📖 Tutorial

The GPT2_Tutorial.pdf provides a comprehensive guide to understanding GPT from scratch. It's designed for high school students and beginners, covering:

  1. Introduction - What is GPT and how language models work
  2. Tokenization - Breaking text into tokens
  3. Vocabulary - Building and using a vocabulary
  4. Token Embeddings - Converting tokens to vectors
  5. Positional Embeddings - Encoding position information
  6. Residual Stream - Data flow through the model
  7. Self-Attention - How attention mechanisms work
  8. Multi-Head Attention - Parallel attention computation
  9. Feed-Forward Networks - Processing within positions
  10. Transformer Block - Combining components
  11. Model Architecture - Complete GPT structure
  12. Training - How the model learns
  13. Text Generation - Producing new text
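Steps 2-4 above (tokenization, vocabulary, embeddings) can be previewed with a toy character-level scheme; the real model uses GPT-2 byte-pair encoding via tiktoken instead, but the encode/decode round trip works the same way.

```python
text = "hello"
vocab = sorted(set(text))                     # build a tiny vocabulary
stoi = {ch: i for i, ch in enumerate(vocab)}  # string -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> string

encode = lambda s: [stoi[c] for c in s]             # text -> token ids
decode = lambda ids: "".join(itos[i] for i in ids)  # token ids -> text

ids = encode("hello")
print(ids)
```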

Regenerate PDF:

python convert_tutorial_to_pdf.py

🔧 Model Architecture

Components

  • LayerNorm - Pre-LayerNorm normalization (GPT-2 style)
  • CausalSelfAttention - Fused multi-head causal attention
  • FeedForward - MLP with 4x expansion and GELU
  • TransformerBlock - Pre-LN transformer block
  • GPT2 - Complete GPT-2 language model
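The causal mask at the heart of CausalSelfAttention can be shown with a single-head, pure-Python sketch: each position attends only to itself and earlier positions. This is an illustration under that assumption, not the repo's fused multi-head PyTorch implementation.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def causal_attention(q, k, v):
    """q, k, v: lists of d-dim vectors, one per sequence position."""
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scores against positions <= t only: this IS the causal mask
        scores = [sum(q[t][i] * k[s][i] for i in range(d)) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        out.append([sum(weights[s] * v[s][i] for s in range(t + 1))
                    for i in range(d)])
    return out
```

Note that position 0 can only see itself, so its output is exactly v[0]; in the real module the mask is applied by setting future scores to -inf before the softmax.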

Current Configuration (config.json)

Architecture (GPT-2 Medium):

  • vocab_size: 50257 (GPT-2 tokenizer)
  • n_embd: 1024
  • block_size: 1024
  • n_head: 16
  • n_layer: 24
  • dropout: 0.1
  • Parameters: ~355M (~1.4GB)
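A back-of-the-envelope check confirms the config above lands near 355M parameters. The formulas assume standard GPT-2 weight shapes (learned positional embeddings, weight-tied output head, biases enabled); they are not taken from the repo's code.

```python
vocab_size, block_size = 50_257, 1_024
n_layer, n_embd = 24, 1_024

embeddings = vocab_size * n_embd + block_size * n_embd    # token + positional
attn = 3 * n_embd * n_embd + 3 * n_embd                   # fused QKV + bias
attn += n_embd * n_embd + n_embd                          # output projection
mlp = 2 * (n_embd * 4 * n_embd) + 4 * n_embd + n_embd     # 4x expand + project
norms = 4 * n_embd                                        # two LayerNorms (gain, bias)
per_layer = attn + mlp + norms

total = embeddings + n_layer * per_layer + 2 * n_embd     # + final LayerNorm
print(f"~{total / 1e6:.0f}M parameters")                  # ~355M
```

At 4 bytes per fp32 weight this is ~1.4 GB on disk, matching the figure above.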

Pre-training (main.py):

  • Dataset: OpenWebText (20M examples)
  • Batch size: 4
  • Learning rate: 3e-4 → 3e-5 (cosine decay)
  • Warmup steps: 2000
  • Training steps: 300,000

Fine-tuning (fine_tune_micro_gpt.py):

  • Dataset: Stanford Human Preferences (20M tokens)
  • Learning rate: 1e-5
  • Epochs: 10,000
  • Temperature: 0.7 (balanced responses)
  • Top-p: 0.9 (nucleus sampling)
  • Max new tokens: 150

Memory Usage

  • GPT-2 Small (n_embd=768): ~124M parameters, ~16 GB
  • GPT-2 Medium (n_embd=1024): ~355M parameters, ~40 GB
  • GPT-2 Large (n_embd=1280): ~774M parameters, ~60 GB
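As a rough sanity check on these training-memory figures: under fp32 AdamW, each parameter typically costs 4 bytes for the weight, 4 for its gradient, and 8 for the two optimizer moments. The rest of the totals above is presumably activations and framework overhead, which dominate at batch size 4 x 1024 tokens; the exact split is an assumption, not measured from the repo.

```python
params = 355e6
bytes_per_param = 4 + 4 + 8   # weight + gradient + AdamW moments (fp32)
fixed_gb = params * bytes_per_param / 1e9
print(f"~{fixed_gb:.1f} GB before activations")
```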

Note: All parameters are configurable in config.json, the single source of truth for the entire project.

📜 License

MIT License

👤 Author

Kevin Thomas

🙏 Acknowledgments

Built for educational purposes to help students understand transformer architecture from first principles.
