The most atomic way to train and inference a GPT in modern C++.
A faithful C++20 incarnation of Andrej Karpathy's microgpt.py — a complete autograd engine, transformer, tokenizer, optimizer, training loop, and inference pipeline in a single header file.
Zero dependencies beyond the C++20 standard library.
`microgpt.hpp` (~370 lines) contains everything:

- Autograd engine — `Value` with a `shared_ptr` computation graph, reverse-mode autodiff
- Operators — `+`, `*`, `-`, `/`, `pow`, `log`, `exp`, `relu` with full gradient support
- Linear algebra — `linear()`, `softmax()`, `rmsnorm()` on `Vec`/`Mat` types
- GPT model — token/position embeddings, multi-head causal self-attention with KV cache, MLP, residual connections
- Tokenizer — character-level with BOS token
- Adam optimizer — with bias-corrected moments and learning rate decay
- Training — cross-entropy loss, full backprop through the entire model
- Inference — temperature-scaled autoregressive sampling
```sh
cmake -B build && cmake --build build

# Train on a names dataset
echo -e "alice\nbob\ncharlie\ndave\neve\nfrank\ngrace" > input.txt
./build/microgpt input.txt 500

# Run tests
./build/microgpt_test
```

27 Google Tests covering the full stack:
- ValueForward (9 tests) — forward pass for all operations
- ValueBackward (9 tests) — gradients + numerical gradient check
- Components (5 tests) — linear, softmax, rmsnorm, softmax backward
- Tokenizer (1 test) — encode/decode roundtrip
- GPT (2 tests) — forward produces finite logits, KV cache grows
- Training (1 test) — loss decreases over 100 steps

```
[==========] Running 27 tests from 6 test suites.
[  PASSED  ] 27 tests.
```
Default config (tiny, CPU-friendly):
| Parameter | Value |
|---|---|
| `n_embd` | 16 |
| `n_head` | 4 |
| `n_layer` | 1 |
| `block_size` | 16 |
microgpt-cpp is based on Andrej Karpathy's microgpt.py.